{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,17]],"date-time":"2026-01-17T08:03:24Z","timestamp":1768637004523,"version":"3.49.0"},"reference-count":32,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2021,1,7]],"date-time":"2021-01-07T00:00:00Z","timestamp":1609977600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,1,7]],"date-time":"2021-01-07T00:00:00Z","timestamp":1609977600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100003329","name":"Ministerio de Econom\u00eda y Competitividad","doi-asserted-by":"publisher","award":["TIN2017-85854-C4-1-R"],"award-info":[{"award-number":["TIN2017-85854-C4-1-R"]}],"id":[{"id":"10.13039\/501100003329","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100010067","name":"Gobierno de Arag\u00f3n","doi-asserted-by":"publisher","award":["Reference Group T3617R"],"award-info":[{"award-number":["Reference Group T3617R"]}],"id":[{"id":"10.13039\/501100010067","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J AUDIO SPEECH MUSIC PROC."],"published-print":{"date-parts":[[2021,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>The progressive paradigm is a promising strategy to optimize network performance for speech enhancement purposes. Recent works have shown different strategies to improve the accuracy of speech enhancement solutions based on this mechanism. This paper studies the progressive speech enhancement using convolutional and residual neural network architectures and explores two criteria for loss function optimization: weighted and uniform progressive. 
This work carries out the evaluation on simulated and real speech samples with reverberation and added noise using REVERB and VoiceHome datasets. Experimental results show a variety of achievements among the loss function optimization criteria and the network architectures. Results show that the progressive design strengthens the model and increases the robustness to distortions due to reverberation and noise.<\/jats:p>","DOI":"10.1186\/s13636-020-00191-3","type":"journal-article","created":{"date-parts":[[2021,1,7]],"date-time":"2021-01-07T14:03:57Z","timestamp":1610028237000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":21,"title":["Progressive loss functions for speech enhancement with deep neural networks"],"prefix":"10.1186","volume":"2021","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9407-5817","authenticated-orcid":false,"given":"Jorge","family":"Llombart","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Dayana","family":"Ribas","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Antonio","family":"Miguel","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Luis","family":"Vicente","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Alfonso","family":"Ortega","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Eduardo","family":"Lleida","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2021,1,7]]},"reference":[{"key":"191_CR1","doi-asserted-by":"publisher","unstructured":"J. Llombart, D. Ribas, A. Miguel, L. Vicente, A. Ortega, E. Lleida, in Proc. Interspeech 2019. Progressive speech enhancement with residual connections (Graz, 2019), pp. 
3193\u20133197. https:\/\/doi.org\/10.21437\/Interspeech.2019-1748.","DOI":"10.21437\/Interspeech.2019-1748"},{"key":"191_CR2","doi-asserted-by":"crossref","unstructured":"T. Gao, J. Du, L. R. Dai, C. H. Lee, in Interspeech 2016. SNR-based progressive learning of deep neural network for speech enhancement, (San Francisco, 2016), pp. 3713\u20133717.","DOI":"10.21437\/Interspeech.2016-224"},{"key":"191_CR3","first-page":"5054","volume-title":"IEEE International Conference on Acoustic, Speech and Signal Processing (ICASSP)","author":"T. Gao","year":"2018","unstructured":"T. Gao, J. Du, L. R. Dai, C. H. Lee, in IEEE International Conference on Acoustic, Speech and Signal Processing (ICASSP). Densely connected progressive learning for LSTM-based speech enhancement (IEEEAlberta, 2018), pp. 5054\u20135058."},{"key":"191_CR4","doi-asserted-by":"publisher","first-page":"5039","DOI":"10.1109\/ICASSP.2018.8462068","volume-title":"2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","author":"M. H. Soni","year":"2018","unstructured":"M. H. Soni, N. Shah, H. A. Patil, in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Time-frequency masking-based speech enhancement using generative adversarial network (IEEEAlberta, 2018), pp. 5039\u20135043."},{"key":"191_CR5","unstructured":"H. S. Choi, J. H. Kim, J. Huh, A. Kim, J. W. Ha, K. Lee, in International Conference on Learning Representations (ICLR 2018). Phase-aware speech enhancement with deep complex u-net (Vancouver, 2018)."},{"key":"191_CR6","doi-asserted-by":"crossref","unstructured":"J. Abdulbaqi, Y. Gu, I. Marsic, RHR-Net: a residual hourglass recurrent neural network for speech enhancement. arXiv preprint arXiv:1904.07294 (2019). Accessed 06 July 2020.","DOI":"10.1109\/ICASSP40776.2020.9053544"},{"key":"191_CR7","unstructured":"S. W. Fu, Y. Tsao, X. Lu, in Interspeech 2016. 
SNR-aware convolutional neural network modeling for speech enhancement (San Francisco, 2016), pp. 3768\u20133772."},{"key":"191_CR8","doi-asserted-by":"crossref","unstructured":"S. R. Park, J. Lee, in Interspeech 2017. A fully convolutional neural network for speech enhancement (Stockholm, 2017), pp. 1993\u20131997.","DOI":"10.21437\/Interspeech.2017-1465"},{"key":"191_CR9","first-page":"2401","volume-title":"IEEE International Conference on Acoustic, Speech and Signal Processing (ICASSP)","author":"H. Zhao","year":"2018","unstructured":"H. Zhao, S. Zarar, I. Tashev, C. H. Lee, in IEEE International Conference on Acoustic, Speech and Signal Processing (ICASSP). Convolutional-recurrent neural networks for speech enhancement (IEEEAlberta, 2018), pp. 2401\u20132405."},{"key":"191_CR10","doi-asserted-by":"crossref","unstructured":"Z. Chen, Y. Huang, J. Li, Y. Gong, in Interspeech 2017. Improving mask learning based speech enhancement system with restoration layers and residual connection (Stockholm, 2017), pp. 3632\u20133637.","DOI":"10.21437\/Interspeech.2017-515"},{"key":"191_CR11","doi-asserted-by":"crossref","unstructured":"J. Llombart, A. Miguel, A. Ortega, E. Lleida, in IberSPEECH 2018. Wide residual networks 1D for automatic text punctuation (Barcelona, 2018), pp. 296\u2013300.","DOI":"10.21437\/IberSPEECH.2018-62"},{"key":"191_CR12","doi-asserted-by":"publisher","unstructured":"J. Llombart, D. Ribas, A. Miguel, L. Vicente, A. Ortega, E. Lleida, in Proc. Interspeech 2019. Speech enhancement with wide residual networks in reverberant environments (Graz, 2019), pp. 1811\u20131815. https:\/\/doi.org\/10.21437\/Interspeech.2019-1745.","DOI":"10.21437\/Interspeech.2019-1745"},{"key":"191_CR13","unstructured":"L. Wyse, Audio spectrogram representations for processing with convolutional neural networks. arXiv preprint arXiv:1706.09559 (2017). 
Accessed 06 July 2020."},{"key":"191_CR14","doi-asserted-by":"publisher","first-page":"8360","DOI":"10.1109\/ICASSP.2019.8682194","volume-title":"ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","author":"S. Kiranyaz","year":"2019","unstructured":"S. Kiranyaz, T. Ince, O. Abdeljaber, O. Avci, M. Gabbouj, in ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 1-D convolutional neural networks for signal processing applications (IEEEBrighton, 2019), pp. 8360\u20138364."},{"issue":"1","key":"191_CR15","doi-asserted-by":"publisher","first-page":"102","DOI":"10.1109\/TASLP.2016.2623559","volume":"25","author":"B. Wu","year":"2016","unstructured":"B. Wu, K. Li, M. Yang, C. H. Lee, A reverberation-time-aware approach to speech dereverberation based on deep neural networks. IEEE\/ACM Trans. Audio Speech Lang. Process.25(1), 102\u2013111 (2016).","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"191_CR16","unstructured":"A. Rousseau, P. Del\u00e9glise, Y. Esteve, in LREC 2014. Enhancing the TED-LIUM corpus with selected data for language modeling and more TED talks, (Reykjavik, 2014), pp. 3935\u20133939."},{"key":"191_CR17","doi-asserted-by":"publisher","first-page":"5206","DOI":"10.1109\/ICASSP.2015.7178964","volume-title":"2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","author":"V. Panayotov","year":"2015","unstructured":"V. Panayotov, G. Chen, D. Povey, S. Khudanpur, in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Librispeech: an ASR corpus based on public domain audio books (IEEESouth Brisbane, 2015), pp. 5206\u20135210."},{"key":"191_CR18","first-page":"27403","volume":"93","author":"J. S. Garofolo","year":"1993","unstructured":"J. S. Garofolo, L. F. Lamel, W. M. Fisher, J. G. Fiscus, D. S. Pallett, DARPA TIMIT acoustic-phonetic continous speech corpus CD-ROM. 
NIST speech disc 1-1.1. NASA STI\/Recon Tech. Rep. n. 93:, 27403 (1993).","journal-title":"NASA STI\/Recon Tech. Rep. n"},{"issue":"4","key":"191_CR19","doi-asserted-by":"publisher","first-page":"943","DOI":"10.1121\/1.382599","volume":"65","author":"J. B. Allen","year":"1979","unstructured":"J. B. Allen, D. A. Berkley, Image method for efficiently simulating small-room acoustics. J. Acoust. Soc. Am.65(4), 943\u2013950 (1979).","journal-title":"J. Acoust. Soc. Am."},{"key":"191_CR20","unstructured":"D. Snyder, G. Chen, D. Povey, MUSAN: a music, speech, and noise corpus. arXiv:1510.08484v1 (2015). http:\/\/arxiv.org\/abs\/1510.08484. Accessed 08 July 2020."},{"key":"191_CR21","first-page":"1","volume-title":"Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA-13)","author":"K. Kinoshita","year":"2013","unstructured":"K. Kinoshita, M. Delcroix, T. Yoshioka, T. Nakatani, E. Habets, R. Haeb-Umbach, V. Leutnant, A. Sehr, W. Kellermann, R. Maas, S. Gannot, B. Raj, in Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA-13). The REVERB challenge: a common evaluation framework for dereverberation and recognition of reverberant speech (IEEENew Paltz, 2013), pp. 1\u20134."},{"key":"191_CR22","doi-asserted-by":"crossref","unstructured":"N. Bertin, E. Camberlein, E. Vincent, R. Lebarbenchon, S. Peillon, \u00c9. Lamand\u00e9, S. Sivasankaran, F. Bimbot, I. Illina, A. Tom, et al, in Interspeech 2016. A French corpus for distant-microphone speech processing in real homes (San Francisco, 2016), pp. 2781\u20132785.","DOI":"10.21437\/Interspeech.2016-1384"},{"key":"191_CR23","unstructured":"N. Bertin, E. Camberlein, R. Lebarbenchon, E. Vincent, S. Sivasankaran, I. Illina, F. Bimbot, 106. VoiceHome-2, an extended corpus for multichannel speech processing in real homes, (2019), pp. 
68\u201378."},{"key":"191_CR24","first-page":"81","volume-title":"IEEE International Conference on Acoustic, Speech and Signal Processing (ICASSP)","author":"T. Robinson","year":"1995","unstructured":"T. Robinson, J. Fransen, D. Pye, J. Foote, S. Renals, in IEEE International Conference on Acoustic, Speech and Signal Processing (ICASSP). WSJCAM0: a British English speech corpus for large vocabulary continuous speech recognition (IEEEDetroit, 1995), pp. 81\u201384."},{"key":"191_CR25","doi-asserted-by":"publisher","first-page":"357","DOI":"10.1109\/ASRU.2005.1566470","volume-title":"Proceedings of the 2005 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU-05)","author":"M. Lincoln","year":"2005","unstructured":"M. Lincoln, I. McCowan, J. Vepa, H. K. Maganti, in Proceedings of the 2005 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU-05). The multi-channel Wall Street Journal audio visual corpus (MC-WSJ-AV): specification and initial experiments (IEEEPhiladelphia, 2005), pp. 357\u2013362."},{"key":"191_CR26","doi-asserted-by":"crossref","unstructured":"C. Kim, R. M. Stern, in Ninth Annual Conference of the International Speech Communication Association (Interspeech 2008). Robust signal-to-noise ratio estimation based on waveform amplitude distribution analysis (Brisbane, 2008).","DOI":"10.21437\/Interspeech.2008-644"},{"issue":"7","key":"191_CR27","doi-asserted-by":"publisher","first-page":"1766","DOI":"10.1109\/TASL.2010.2052247","volume":"18","author":"T. H. Falk","year":"2010","unstructured":"T. H. Falk, C. Zheng, W. Y. Chan, A non-intrusive quality and intelligibility measure of reverberant and dereverberated speech. IEEE Trans. Audio Speech Lang. Process.18(7), 1766\u20131774 (2010).","journal-title":"IEEE Trans. Audio Speech Lang. Process."},{"key":"191_CR28","unstructured":"J. F. Santos, M. Senoussaoui, T. H. Falk, in Proc. Int. Workshop Acoust. Signal Enhancement (IWAENC 2014). 
An updated objective intelligibility estimation metric for normal hearing listeners under noise and reverberation (Antibes - Juan les Pins, 2014), pp. 55\u201359."},{"key":"191_CR29","volume-title":"Speech quality assessment. in: multimedia analysis, processing and communications","author":"P. C. Loizou","year":"2011","unstructured":"P. C. Loizou, Speech quality assessment. in: multimedia analysis, processing and communications (Springer, Berlin, 2011)."},{"key":"191_CR30","doi-asserted-by":"publisher","first-page":"749","DOI":"10.1109\/ICASSP.2001.941023","volume-title":"2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No. 01CH37221), vol. 2","author":"A. W. Rix","year":"2001","unstructured":"A. W. Rix, J. G. Beerends, M. P. Hollier, A. P. Hekstra, in 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No. 01CH37221), vol. 2. Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs (IEEESalt Lake City, 2001), pp. 749\u2013752."},{"key":"191_CR31","unstructured":"L. Drude, J. Heymann, C. Boeddeker, R. Haeb-Umbach, NARA-WPE: a Python package for weighted prediction error dereverberation in Numpy and Tensorflow for online and offline processing, (Stuttgart, 2018)."},{"key":"191_CR32","unstructured":"T. Nakatani, T. Yoshioka, K. Kinoshita, M. Miyoshi, B. -H. Juang, 18. Speech dereverberation based on variance-normalized delayed linear prediction, (2010), pp. 
1717\u20131731."}],"container-title":["EURASIP Journal on Audio, Speech, and Music Processing"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s13636-020-00191-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1186\/s13636-020-00191-3\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s13636-020-00191-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,10]],"date-time":"2022-12-10T22:28:32Z","timestamp":1670711312000},"score":1,"resource":{"primary":{"URL":"https:\/\/asmp-eurasipjournals.springeropen.com\/articles\/10.1186\/s13636-020-00191-3"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,1,7]]},"references-count":32,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2021,12]]}},"alternative-id":["191"],"URL":"https:\/\/doi.org\/10.1186\/s13636-020-00191-3","relation":{},"ISSN":["1687-4722"],"issn-type":[{"value":"1687-4722","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,1,7]]},"assertion":[{"value":"8 May 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"2 December 2020","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"7 January 2021","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"The authors declare that they have no competing interests.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"1"}}