{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,10]],"date-time":"2026-02-10T18:30:29Z","timestamp":1770748229663,"version":"3.49.0"},"reference-count":75,"publisher":"Springer Science and Business Media LLC","issue":"4","license":[{"start":{"date-parts":[[2022,10,23]],"date-time":"2022-10-23T00:00:00Z","timestamp":1666483200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,10,23]],"date-time":"2022-10-23T00:00:00Z","timestamp":1666483200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100012704","name":"University of Agder","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100012704","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Int J Speech Technol"],"published-print":{"date-parts":[[2022,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>To meet the human perceived quality of experience (QoE) while communicating over various Voice over Internet protocol (VoIP) applications, for example Google Meet, Microsoft Skype, Apple FaceTime, etc. a precise speech quality assessment metric is needed. The metric should be able to detect and segregate different types of noise degradations present in the surroundings before measuring and monitoring the quality of speech in real-time. Our research is motivated by the lack of clear evidence presenting speech quality metric that can firstly distinguish different types of noise degradations before providing speech quality prediction decision. 
To that end, this paper presents a novel non-intrusive speech quality assessment metric using context-aware neural networks in which the noise class (context) of the degraded or noisy speech signal is first identified using a classifier, and then deep neural network (DNN) based speech quality metrics (SQMs) are trained and optimized for each noise class to obtain the noise class-specific (context-specific) optimized speech quality predictions (MOS scores). The noisy speech signals, that is, clean speech signals degraded by different types of background noise, are taken from the NOIZEUS speech corpus. Results demonstrate that even with the small number of speech samples available from the NOIZEUS speech corpus, the proposed metric outperforms, in different contexts, the metric where the contexts are not classified before speech quality prediction.<\/jats:p>","DOI":"10.1007\/s10772-022-10011-y","type":"journal-article","created":{"date-parts":[[2022,10,23]],"date-time":"2022-10-23T10:02:46Z","timestamp":1666519366000},"page":"947-965","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["Non-intrusive speech quality assessment using context-aware neural networks"],"prefix":"10.1007","volume":"25","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3800-7235","authenticated-orcid":false,"given":"Rahul Kumar","family":"Jaiswal","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0028-507X","authenticated-orcid":false,"given":"Rajesh Kumar","family":"Dubey","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,10,23]]},"reference":[{"key":"10011_CR1","doi-asserted-by":"crossref","unstructured":"Avila, A. R., Gamper, H., Reddy, C., Cutler, R., Tashev, I., & Gehrke, J. (2019). Non-intrusive speech quality assessment using neural networks. 
In IEEE international conference on acoustics, speech and signal processing (ICASSP), 2019 (pp. 631\u2013635).","DOI":"10.1109\/ICASSP.2019.8683175"},{"issue":"3","key":"10011_CR2","doi-asserted-by":"publisher","first-page":"116","DOI":"10.25046\/aj020316","volume":"2","author":"S Belarouci","year":"2017","unstructured":"Belarouci, S., & Chikh, M. A. (2017). Medical imbalanced data classification. Advances in Science, Technology and Engineering Systems Journal, 2(3), 116\u2013124.","journal-title":"Advances in Science, Technology and Engineering Systems Journal"},{"issue":"2","key":"10011_CR3","first-page":"281","volume":"13","author":"JA Bergstra","year":"2012","unstructured":"Bergstra, J. A., & Bengio, Y. (2012). Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13(2), 281\u2013305.","journal-title":"Journal of Machine Learning Research"},{"key":"10011_CR4","volume-title":"ITU-T Recommendation G.107: The E-Model, a computational model for use in transmission planning","author":"JA Bergstra","year":"2003","unstructured":"Bergstra, J. A., & Middelburg, C. (2003). ITU-T Recommendation G.107: The E-Model, a computational model for use in transmission planning. International Telecommunication Union."},{"key":"10011_CR5","unstructured":"Bruhn, S., Grancharov, V., & Kleijn, W. B. (2012). Low-complexity, non-intrusive speech quality assessment. US Patent, 8,195,449."},{"key":"10011_CR6","doi-asserted-by":"crossref","unstructured":"Catellier, A. A., & Voran, S. D. (2020). Wawenets: A no-reference convolutional waveform-based approach to estimating narrowband and wideband speech quality. In IEEE international conference on acoustics, speech and signal processing (ICASSP), (pp. 331\u2013335).","DOI":"10.1109\/ICASSP40776.2020.9054204"},{"key":"10011_CR7","doi-asserted-by":"crossref","unstructured":"Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. 
In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, 2016 (pp. 785\u2013794).","DOI":"10.1145\/2939672.2939785"},{"key":"10011_CR8","doi-asserted-by":"crossref","unstructured":"Chinen, M., Lim, F. S., Skoglund, J., Gureev, N., O\u2019Gorman, F., & Hines, A. (2020). ViSQOL v3: An open source production ready objective speech and audio metric. In Twelfth international conference on quality of multimedia experience, 2020 (pp. 1\u20136). IEEE.","DOI":"10.1109\/QoMEX48832.2020.9123150"},{"key":"10011_CR9","unstructured":"Chowdhury, A., Yang, J., & Drineas, P. (2018). An iterative, sketching-based framework for ridge regression. In International conference on machine learning, 2018 (pp. 989\u2013998)."},{"issue":"4","key":"10011_CR10","doi-asserted-by":"publisher","first-page":"113","DOI":"10.1109\/97.1001645","volume":"9","author":"I Cohen","year":"2002","unstructured":"Cohen, I. (2002). Optimal speech enhancement under signal presence uncertainty using log-spectral amplitude estimator. IEEE Signal Processing Letters, 9(4), 113\u2013116.","journal-title":"IEEE Signal Processing Letters"},{"key":"10011_CR11","first-page":"1","volume":"24","author":"N Das","year":"2020","unstructured":"Das, N., Chakraborty, S., Chaki, J., Padhy, N., & Dey, N. (2020). Fundamentals, present and future perspectives of speech enhancement. International Journal of Speech Technology, 24, 1\u201319.","journal-title":"International Journal of Speech Technology"},{"key":"10011_CR12","volume-title":"Emerging trends in image processing, computer vision and pattern recognition","author":"L Deligiannidis","year":"2014","unstructured":"Deligiannidis, L., & Arabnia, H. R. (2014). Emerging trends in image processing, computer vision and pattern recognition. Morgan Kaufmann."},{"key":"10011_CR13","doi-asserted-by":"crossref","unstructured":"Dimitrakopoulos, G. N., Vrahatis, A. G., Plagianakos, V., & Sgarbas, K. (2018). 
Pathway analysis using XGBoost classification in biomedical data. In Proceedings of the 10th Hellenic conference on artificial intelligence, 2018 (pp. 1\u20136).","DOI":"10.1145\/3200947.3201029"},{"key":"10011_CR14","unstructured":"Dozat, T. (2016). Incorporating Nesterov momentum into Adam. In 4th International conference on learning representations (ICLR), 2016."},{"key":"10011_CR15","unstructured":"Drummond, C., & Holte, R. C. (2003). C 4.5, class imbalance, and cost sensitivity: Why under-sampling beats over-sampling. In 20th International conference on machine learning (ICML) workshop on learning from imbalanced data sets, 2003."},{"issue":"1","key":"10011_CR16","doi-asserted-by":"publisher","first-page":"89","DOI":"10.1007\/s10772-012-9162-4","volume":"16","author":"RK Dubey","year":"2013","unstructured":"Dubey, R. K., & Kumar, A. (2013). Non-intrusive speech quality assessment using several combinations of auditory features. International Journal of Speech Technology, 16(1), 89\u2013101.","journal-title":"International Journal of Speech Technology"},{"key":"10011_CR17","doi-asserted-by":"crossref","unstructured":"Dubey, R. K., & Kumar, A. (2015). Comparison of subjective and objective speech quality assessment for different degradation\/noise conditions. In IEEE international conference on signal processing and communication, 2015 (pp. 261\u2013266).","DOI":"10.1109\/ICSPCom.2015.7150659"},{"key":"10011_CR18","doi-asserted-by":"publisher","first-page":"114","DOI":"10.1016\/j.dsp.2017.07.020","volume":"70","author":"RK Dubey","year":"2017","unstructured":"Dubey, R. K., & Kumar, A. (2017). Non-intrusive speech quality estimation as combination of estimates using multiple time-scale auditory features. Digital Signal Processing, 70, 114\u2013124.","journal-title":"Digital Signal Processing"},{"key":"10011_CR19","doi-asserted-by":"crossref","unstructured":"Eisen, M., Zhang, C., Chamon, L. F., Lee, D. D., & Ribeiro, A. (2018). 
Online deep learning in wireless communication systems. In 52nd Asilomar conference on signals, systems, and computers (ACSSC), 2018 (pp. 1289\u20131293). IEEE.","DOI":"10.1109\/ACSSC.2018.8645312"},{"key":"10011_CR20","volume-title":"Digital signal processing: An experimental approach","author":"S Engelberg","year":"2008","unstructured":"Engelberg, S. (2008). Digital signal processing: An experimental approach. Springer."},{"issue":"10","key":"10011_CR21","doi-asserted-by":"publisher","first-page":"1526","DOI":"10.1109\/5.168664","volume":"80","author":"Y Ephraim","year":"1992","unstructured":"Ephraim, Y. (1992). Statistical-model-based speech enhancement systems. Proceedings of the IEEE, 80(10), 1526\u20131555.","journal-title":"Proceedings of the IEEE"},{"issue":"6","key":"10011_CR22","doi-asserted-by":"publisher","first-page":"1109","DOI":"10.1109\/TASSP.1984.1164453","volume":"32","author":"Y Ephraim","year":"1984","unstructured":"Ephraim, Y., & Malah, D. (1984). Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator. IEEE Transactions on Acoustics, Speech, and Signal Processing, 32(6), 1109\u20131121.","journal-title":"IEEE Transactions on Acoustics, Speech, and Signal Processing"},{"issue":"6","key":"10011_CR23","doi-asserted-by":"publisher","first-page":"1935","DOI":"10.1109\/TASL.2006.883253","volume":"14","author":"TH Falk","year":"2006","unstructured":"Falk, T. H., & Chan, W. Y. (2006). Single-ended speech quality measurement using machine learning methods. IEEE Transactions on Audio, Speech, and Language Processing, 14(6), 1935\u20131947.","journal-title":"IEEE Transactions on Audio, Speech, and Language Processing"},{"issue":"7","key":"10011_CR24","doi-asserted-by":"publisher","first-page":"1766","DOI":"10.1109\/TASL.2010.2052247","volume":"18","author":"TH Falk","year":"2010","unstructured":"Falk, T. H., Zheng, C., & Chan, W. Y. (2010). 
A non-intrusive quality and intelligibility measure of reverberant and dereverberated speech. IEEE Transactions on Audio, Speech, and Language Processing, 18(7), 1766\u20131774.","journal-title":"IEEE Transactions on Audio, Speech, and Language Processing"},{"key":"10011_CR25","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-98074-4","volume-title":"Learning from imbalanced datasets","author":"A Fern\u00e1ndez","year":"2018","unstructured":"Fern\u00e1ndez, A., Garc\u00eda, S., Galar, M., Prati, R. C., Krawczyk, B., & Herrera, F. (2018). Learning from imbalanced datasets (Vol. 11). Springer."},{"issue":"4","key":"10011_CR26","first-page":"1189","volume":"29","author":"JH Friedman","year":"2001","unstructured":"Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29(4), 1189\u20131232.","journal-title":"Annals of Statistics"},{"key":"10011_CR27","doi-asserted-by":"crossref","unstructured":"Fu, S. W., Tsao, Y., Hwang, H. T., & Wang, H. M. (2018). Quality-Net: An end-to-end non-intrusive speech quality assessment model based on BLSTM. In Interspeech, 2018.","DOI":"10.21437\/Interspeech.2018-1802"},{"key":"10011_CR28","doi-asserted-by":"publisher","first-page":"95","DOI":"10.1016\/j.specom.2018.01.008","volume":"98","author":"T Fukuda","year":"2018","unstructured":"Fukuda, T., Ichikawa, O., & Nishimura, M. (2018). Detecting breathing sounds in realistic Japanese telephone conversations and its application to automatic speech recognition. Speech Communication, 98, 95\u2013103.","journal-title":"Speech Communication"},{"key":"10011_CR29","unstructured":"Glorot, X., & Bengio, Y. (2010). Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the 13th international conference on artificial intelligence and statistics, 2010 (pp. 
249\u2013256)."},{"key":"10011_CR30","unstructured":"Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2014). Generative nets. In Advances in neural information processing systems, 2014 (pp 2672\u20132680)."},{"issue":"6","key":"10011_CR31","doi-asserted-by":"publisher","first-page":"1948","DOI":"10.1109\/TASL.2006.883250","volume":"14","author":"V Grancharov","year":"2006","unstructured":"Grancharov, V., Zhao, D. Y., Lindblom, J., & Kleijn, W. B. (2006). Low-complexity, non-intrusive speech quality assessment. IEEE Transactions on Audio, Speech, and Language Processing, 14(6), 1948\u20131956.","journal-title":"IEEE Transactions on Audio, Speech, and Language Processing"},{"issue":"8","key":"10011_CR32","doi-asserted-by":"publisher","first-page":"799","DOI":"10.1109\/89.966083","volume":"9","author":"H Gustafsson","year":"2001","unstructured":"Gustafsson, H., Nordholm, S. E., & Claesson, I. (2001). Spectral subtraction using reduced delay convolution and adaptive averaging. IEEE Transactions on Speech and Audio Processing, 9(8), 799\u2013807.","journal-title":"IEEE Transactions on Speech and Audio Processing"},{"key":"10011_CR33","doi-asserted-by":"crossref","unstructured":"Hines, A., Gillen, E., & Harte, N. (2015a). Measuring and monitoring speech quality for voice over IP with POLQA, ViSQOL and P.563. In INTERSPEECH, 2015, Dresden, Germany.","DOI":"10.21437\/Interspeech.2015-171"},{"key":"10011_CR34","first-page":"1","volume":"1","author":"A Hines","year":"2015","unstructured":"Hines, A., Skoglund, J., Kokaram, A. C., & Harte, N. (2015). ViSQOL: An objective speech quality model. EURASIP Journal on Audio, Speech, and Music Processing, 1, 1\u201318.","journal-title":"EURASIP Journal on Audio, Speech, and Music Processing"},{"key":"10011_CR35","unstructured":"Hirsch, H. G., & Pearce, D. (2000). 
The Aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions. In ASR2000\u2014Automatic Speech Recognition: Challenges for the new millennium, ISCA tutorial and research workshop (ITRW), 2000, Paris, France."},{"issue":"1","key":"10011_CR36","doi-asserted-by":"publisher","first-page":"69","DOI":"10.1007\/s10772-016-9389-6","volume":"20","author":"J Holub","year":"2017","unstructured":"Holub, J., Avetisyan, H., & Isabelle, S. (2017). Subjective speech quality measurement repeatability: Comparison of laboratory test results. International Journal of Speech Technology, 20(1), 69\u201374.","journal-title":"International Journal of Speech Technology"},{"key":"10011_CR37","doi-asserted-by":"publisher","first-page":"334","DOI":"10.1109\/TSA.2003.814458","volume":"11","author":"Y Hu","year":"2003","unstructured":"Hu, Y., & Loizou, P. C. (2003). A generalized subspace approach for enhancing speech corrupted by colored noise. IEEE Transactions on Speech and Audio Processing, 11, 334\u2013341.","journal-title":"IEEE Transactions on Speech and Audio Processing"},{"issue":"1","key":"10011_CR38","doi-asserted-by":"publisher","first-page":"59","DOI":"10.1109\/TSA.2003.819949","volume":"12","author":"Y Hu","year":"2004","unstructured":"Hu, Y., & Loizou, P. C. (2004). Speech enhancement based on wavelet thresholding the multitaper spectrum. IEEE Transactions on Speech and Audio Processing, 12(1), 59\u201367.","journal-title":"IEEE Transactions on Speech and Audio Processing"},{"key":"10011_CR39","unstructured":"Hu, Y., & Loizou, P. C. (2006). Subjective comparison of speech enhancement algorithms. In IEEE international conference on acoustics speech and signal processing, Vol. 1, (pp. 153\u2013156)."},{"issue":"1","key":"10011_CR40","doi-asserted-by":"publisher","first-page":"229","DOI":"10.1109\/TASL.2007.911054","volume":"16","author":"Y Hu","year":"2007","unstructured":"Hu, Y., & Loizou, P. C. (2007). 
Evaluation of objective quality measures for speech enhancement. IEEE Transactions on Audio, Speech, and Language Processing, 16(1), 229\u2013238.","journal-title":"IEEE Transactions on Audio, Speech, and Language Processing"},{"key":"10011_CR41","unstructured":"ITU. (1996). ITU-T Recommendation P.800: Methods for subjective determination of transmission quality. ITU."},{"key":"10011_CR42","unstructured":"ITU. (1998). ITU-T coded-speech database: Series P, Supplement 23. ITU."},{"key":"10011_CR43","unstructured":"ITU. (2004). ITU-T recommendation P.563: Single-ended method for objective speech quality assessment in narrow-band telephony applications. ITU."},{"key":"10011_CR44","unstructured":"ITU. (2011). ITU-T recommendation P.863: Perceptual objective listening quality assessment (POLQA). ITU."},{"key":"10011_CR45","doi-asserted-by":"crossref","unstructured":"Jahromi, H. Z., Hines, A., & Delanev, D. T. (2018). Towards application-aware networking: ML-based end-to-end application KPI\/QoE metrics characterization in SDN. In Tenth international conference on ubiquitous and future networks (ICUFN), 2018 (pp. 126\u2013131).","DOI":"10.1109\/ICUFN.2018.8436625"},{"key":"10011_CR46","doi-asserted-by":"crossref","unstructured":"Jain, R., Damoulas, T., & Kontokosta, C. (2014). Towards data-driven energy consumption forecasting of multi-family residential buildings: feature selection via the lasso. In Computing in civil and building engineering, 2014 (pp. 1675\u20131682).","DOI":"10.1061\/9780784413616.208"},{"key":"10011_CR47","doi-asserted-by":"crossref","unstructured":"Jaiswal, R. (2022). Performance analysis of voice activity detector in presence of non-stationary noise. In Proceedings of the 11th international conference on robotics, vision, signal processing and power applications (RoViSP), 2022 (pp. 59\u201365). Springer.","DOI":"10.1007\/978-981-16-8129-5_10"},{"key":"10011_CR48","unstructured":"Jaiswal, R., & Hines, A. (2018). 
The sound of silence: How traditional and deep learning based voice activity detection influences speech quality monitoring. In 26th Irish conference on artificial intelligence and cognitive science (AICS), 2018 (pp. 174\u2013185)."},{"key":"10011_CR49","doi-asserted-by":"crossref","unstructured":"Jaiswal, R., & Hines, A. (2020). Towards a non-intrusive context-aware speech quality model. In 31st Irish signals and systems conference, 2020 (pp. 1\u20135). IEEE.","DOI":"10.1109\/ISSC49989.2020.9180171"},{"key":"10011_CR50","doi-asserted-by":"crossref","unstructured":"Kamath, S., & Loizou, P. (2002). A multi-band spectral subtraction method for enhancing speech corrupted by colored noise. In IEEE international conference on acoustics speech and signal processing,\u00a0Vol. 4, (pp. 4160\u20134164).","DOI":"10.1109\/ICASSP.2002.5745591"},{"issue":"1","key":"10011_CR51","doi-asserted-by":"publisher","first-page":"221","DOI":"10.1002\/bltj.20228","volume":"12","author":"DS Kim","year":"2007","unstructured":"Kim, D. S., & Tarraf, A. (2007). ANIQUE+: A new American national standard for non-intrusive estimation of narrow-band speech quality. Bell Labs Technical Journal, 12(1), 221\u2013236.","journal-title":"Bell Labs Technical Journal"},{"key":"10011_CR52","unstructured":"Kingma, D. P., & Ba, J. (2015). Adam: A method for stochastic optimization. In 3rd International conference on learning representations, 2015."},{"issue":"3","key":"10011_CR53","doi-asserted-by":"publisher","first-page":"551","DOI":"10.1007\/s10115-017-1059-8","volume":"53","author":"Y Li","year":"2017","unstructured":"Li, Y., Li, T., & Liu, H. (2017). Recent advances in feature selection and its applications. 
Knowledge and Information Systems, 53(3), 551\u2013577.","journal-title":"Knowledge and Information Systems"},{"issue":"6","key":"10011_CR54","doi-asserted-by":"publisher","first-page":"1924","DOI":"10.1109\/TASL.2006.883177","volume":"14","author":"L Malfait","year":"2006","unstructured":"Malfait, L., Berger, J., & Kastner, M. (2006). P.563\u2014The ITU-T standard for single-ended speech quality assessment. IEEE Transactions on Audio, Speech, and Language Processing, 14(6), 1924\u20131934.","journal-title":"IEEE Transactions on Audio, Speech, and Language Processing"},{"issue":"2","key":"10011_CR55","doi-asserted-by":"publisher","first-page":"141","DOI":"10.1002\/bimj.201200088","volume":"55","author":"RJ Meijer","year":"2013","unstructured":"Meijer, R. J., & Goeman, J. J. (2013). Efficient approximate k-fold and leave-one-out cross-validation for ridge regression. Biometrical Journal, 55(2), 141\u2013155.","journal-title":"Biometrical Journal"},{"issue":"2","key":"10011_CR56","doi-asserted-by":"publisher","first-page":"159","DOI":"10.1109\/89.824700","volume":"8","author":"U Mittal","year":"2000","unstructured":"Mittal, U., & Phamdo, N. (2000). Signal\/noise KLT based approach for enhancing speech degraded by colored noise. IEEE Transactions on Speech and Audio Processing, 8(2), 159\u2013167.","journal-title":"IEEE Transactions on Speech and Audio Processing"},{"issue":"6","key":"10011_CR57","doi-asserted-by":"publisher","first-page":"18","DOI":"10.1109\/MSP.2011.942469","volume":"28","author":"S M\u00f6ller","year":"2011","unstructured":"M\u00f6ller, S., Chan, W. Y., C\u00f4t\u00e9, N., Falk, T. H., Raake, A., & W\u00e4ltermann, M. (2011). Speech quality estimation: Models and trends. IEEE Signal Processing Magazine, 28(6), 18\u201328.","journal-title":"IEEE Signal Processing Magazine"},{"key":"10011_CR58","doi-asserted-by":"publisher","first-page":"574","DOI":"10.1016\/j.csl.2016.11.003","volume":"46","author":"AH Moore","year":"2017","unstructured":"Moore, A. 
H., Parada, P. P., & Naylor, P. A. (2017). Speech enhancement for robust automatic speech recognition: Evaluation using a baseline system and instrumental measures. Computer Speech and Language, 46, 574\u2013584.","journal-title":"Computer Speech and Language"},{"key":"10011_CR59","doi-asserted-by":"crossref","unstructured":"Ooster, J., Huber, R., & Meyer, B. T. (2018). Prediction of perceived speech quality using deep machine listening. In INTERSPEECH, 2018 (pp. 976\u2013980).","DOI":"10.21437\/Interspeech.2018-1374"},{"issue":"9","key":"10011_CR60","first-page":"1","volume":"6","author":"J Ramirez","year":"2007","unstructured":"Ramirez, J., G\u00f3rriz, J. M., & Segura, J. C. (2007). Voice activity detection: Fundamentals and speech recognition system robustness. Robust Speech Recognition and Understanding, 6(9), 1\u201322.","journal-title":"Robust Speech Recognition and Understanding"},{"key":"10011_CR61","doi-asserted-by":"publisher","first-page":"101205","DOI":"10.1016\/j.csl.2021.101205","volume":"69","author":"MK Reddy","year":"2021","unstructured":"Reddy, M. K., Helkkula, P., Keerthana, Y. M., Kaitue, K., Minkkinen, M., Tolppanen, H., et al. (2021). The automatic detection of heart failure using speech signals. Computer Speech and Language, 69, 101205.","journal-title":"Computer Speech and Language"},{"key":"10011_CR62","doi-asserted-by":"crossref","unstructured":"Rix, A. W., Beerends, J. G., Hollier, M. P., & Hekstra, A. P. (2001). Perceptual evaluation of speech quality (PESQ)\u2014A new method for speech quality assessment of telephone networks and codecs. In IEEE international conference on acoustics, speech, and signal processing, 2001, Vol. 2, (pp. 749\u2013752).","DOI":"10.1109\/ICASSP.2001.941023"},{"issue":"4","key":"10011_CR63","doi-asserted-by":"publisher","first-page":"1051","DOI":"10.1007\/s10772-019-09645-2","volume":"22","author":"N Saleem","year":"2019","unstructured":"Saleem, N., & Khattak, M. I. (2019). 
A review of supervised learning algorithms for single channel speech enhancement. International Journal of Speech Technology, 22(4), 1051\u20131075.","journal-title":"International Journal of Speech Technology"},{"key":"10011_CR64","doi-asserted-by":"crossref","unstructured":"Scalart, P., et al. (1996). Speech enhancement based on a priori signal to noise estimation. In IEEE international conference on acoustics, speech, and signal processing, 1996, Vol. 2, (pp. 629\u2013632).","DOI":"10.1109\/ICASSP.1996.543199"},{"issue":"3","key":"10011_CR65","doi-asserted-by":"publisher","first-page":"201","DOI":"10.1016\/j.specom.2007.01.006","volume":"49","author":"M Shami","year":"2007","unstructured":"Shami, M., & Verhelst, W. (2007). An evaluation of the robustness of existing supervised machine learning approaches to the classification of emotions in speech. Speech Communication, 49(3), 201\u2013212.","journal-title":"Speech Communication"},{"key":"10011_CR66","doi-asserted-by":"publisher","first-page":"84","DOI":"10.1016\/j.specom.2016.03.005","volume":"80","author":"D Sharma","year":"2016","unstructured":"Sharma, D., Wang, Y., Naylor, P. A., & Brookes, M. (2016). A data-driven non-intrusive measure of speech quality and intelligibility. Speech Communication, 80, 84\u201394.","journal-title":"Speech Communication"},{"issue":"3","key":"10011_CR67","doi-asserted-by":"publisher","first-page":"585","DOI":"10.1007\/s10772-018-9537-2","volume":"22","author":"N Shome","year":"2019","unstructured":"Shome, N., Laskar, R. H., & Das, D. (2019). Reference free speech quality estimation for diverse data condition. International Journal of Speech Technology, 22(3), 585\u2013599.","journal-title":"International Journal of Speech Technology"},{"key":"10011_CR68","doi-asserted-by":"publisher","first-page":"101861","DOI":"10.1016\/j.sysarc.2020.101861","volume":"112","author":"J Singh","year":"2021","unstructured":"Singh, J., & Singh, J. (2021). 
A survey on machine learning-based malware detection in executable files. Journal of Systems Architecture, 112, 101861.","journal-title":"Journal of Systems Architecture"},{"key":"10011_CR69","doi-asserted-by":"publisher","first-page":"27","DOI":"10.1016\/j.specom.2021.03.004","volume":"130","author":"MH Soni","year":"2021","unstructured":"Soni, M. H., & Patil, H. A. (2021). Non-intrusive quality assessment of noise-suppressed speech using unsupervised deep features. Speech Communication, 130, 27\u201344.","journal-title":"Speech Communication"},{"issue":"1","key":"10011_CR70","first-page":"1929","volume":"15","author":"N Srivastava","year":"2014","unstructured":"Srivastava, N., et al. (2014). Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1), 1929\u20131958.","journal-title":"The Journal of Machine Learning Research"},{"key":"10011_CR71","doi-asserted-by":"crossref","unstructured":"Sun, H., Chen, X., Shi, Q., Hong, M., Fu, X., & Sidiropoulos, N. D. (2017). Learning to optimize: Training deep neural networks for wireless resource management. In 18th IEEE international workshop on signal processing advances in wireless communications, 2017 (pp. 1\u20136).","DOI":"10.1109\/SPAWC.2017.8227766"},{"key":"10011_CR72","doi-asserted-by":"publisher","first-page":"13","DOI":"10.1016\/j.specom.2019.04.002","volume":"110","author":"J Wang","year":"2019","unstructured":"Wang, J., Shan, Y., Xie, X., & Kuang, J. (2019). Output-based speech quality assessment using autoencoder and support vector regression. Speech Communication, 110, 13\u201320.","journal-title":"Speech Communication"},{"key":"10011_CR73","doi-asserted-by":"crossref","unstructured":"Yang, H., et al. (2016). Parametric-based non-intrusive speech quality assessment by deep neural network. In IEEE international conference on digital signal processing (DSP), 2016 (pp. 
99\u2013103).","DOI":"10.1109\/ICDSP.2016.7868524"},{"issue":"1","key":"10011_CR74","doi-asserted-by":"publisher","first-page":"114","DOI":"10.1109\/LWC.2017.2757490","volume":"7","author":"H Ye","year":"2017","unstructured":"Ye, H., Li, G. Y., & Juang, B. H. (2017). Power of deep learning for channel estimation and signal detection in OFDM systems. IEEE Wireless Communications Letters, 7(1), 114\u2013117.","journal-title":"IEEE Wireless Communications Letters"},{"key":"10011_CR75","doi-asserted-by":"crossref","unstructured":"Ye, H., Li, G. Y., Juang, B. H. F., & Sivanesan, K. (2018). Channel agnostic end-to-end learning based communication systems with conditional GAN. In IEEE GLOBECOM Workshop, 2018 (pp. 1\u20135).","DOI":"10.1109\/GLOCOMW.2018.8644250"}],"container-title":["International Journal of Speech Technology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10772-022-10011-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10772-022-10011-y\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10772-022-10011-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,3,9]],"date-time":"2023-03-09T17:22:36Z","timestamp":1678382556000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10772-022-10011-y"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,10,23]]},"references-count":75,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2022,12]]}},"alternative-id":["10011"],"URL":"https:\/\/doi.org\/10.1007\/s10772-022-10011-y","relation":{},"ISSN":["1381-2416","1572-8110"],"issn-type":[{"value":"1381-2416","type":"print"},{"value":"1572-8110","type"
:"electronic"}],"subject":[],"published":{"date-parts":[[2022,10,23]]},"assertion":[{"value":"22 November 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"5 October 2022","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"23 October 2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors would like to declare that there is no conflict of interest with this manuscript.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}