{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T02:57:53Z","timestamp":1760151473445,"version":"build-2065373602"},"reference-count":36,"publisher":"MDPI AG","issue":"7","license":[{"start":{"date-parts":[[2022,3,22]],"date-time":"2022-03-22T00:00:00Z","timestamp":1647907200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Monaural speech enhancement aims to remove background noise from an audio recording containing speech in order to improve its clarity and intelligibility. Currently, the most successful solutions for speech enhancement use deep neural networks. In a typical setting, such neural networks process the noisy input signal once and produce a single enhanced signal. However, it was recently shown that a U-Net-based network can be trained in a way that allows it to process the same input signal multiple times in order to enhance the speech even further. Unfortunately, this was tested only for two-iteration enhancement. In the current research, we extend previous efforts and demonstrate how multi-forward-pass speech enhancement can be successfully applied to other architectures, namely the ResBLSTM and Transformer-Net. Moreover, we test the three architectures with up to five iterations, thus identifying the method\u2019s limit in terms of performance gain. In our experiments, we used audio samples from the WSJ0, Noisex-92, and DCASE datasets and measured speech enhancement quality using SI-SDR, STOI, and PESQ. The results show that performing speech enhancement up to five times still brings improvements to speech intelligibility, but the gain becomes smaller with each iteration. 
Nevertheless, performing five iterations instead of two gives an additional 0.6 dB SI-SDR and a four-percentage-point STOI gain. However, these increments are not equal across architectures: the U-Net and Transformer-Net benefit more from the multi-forward pass than the ResBLSTM.<\/jats:p>","DOI":"10.3390\/s22072440","type":"journal-article","created":{"date-parts":[[2022,3,22]],"date-time":"2022-03-22T23:30:23Z","timestamp":1647991823000},"page":"2440","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["Speech Enhancement by Multiple Propagation through the Same Neural Network"],"prefix":"10.3390","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9388-0494","authenticated-orcid":false,"given":"Tomasz","family":"Grzywalski","sequence":"first","affiliation":[{"name":"Institute of Automatic Control and Robotics, Poznan University of Technology, 60-965 Poznan, Poland"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4603-8894","authenticated-orcid":false,"given":"Szymon","family":"Drgas","sequence":"additional","affiliation":[{"name":"Institute of Automatic Control and Robotics, Poznan University of Technology, 60-965 Poznan, Poland"}]}],"member":"1968","published-online":{"date-parts":[[2022,3,22]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"32","DOI":"10.1109\/MSPEC.2017.7864754","article-title":"Deep learning reinvents the hearing aid","volume":"54","author":"Wang","year":"2017","journal-title":"IEEE Spectr."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Ananthakrishnan, K.S., and Dogancay, K. (2009, January 23\u201326). Recent trends and challenges in speech-separation systems research\u2014A tutorial review. 
Proceedings of the TENCON 2009\u20142009 IEEE Region 10 Conference, Singapore.","DOI":"10.1109\/TENCON.2009.5396022"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Mowlaee, P., Saeidi, R., Christensen, M.G., and Martin, R. (2012, January 25\u201330). Subjective and objective quality assessment of single-channel speech separation algorithms. Proceedings of the 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Kyoto, Japan.","DOI":"10.1109\/ICASSP.2012.6287819"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"1849","DOI":"10.1109\/TASLP.2014.2352935","article-title":"On training targets for supervised speech separation","volume":"22","author":"Wang","year":"2014","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process. (TASLP)"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"1492","DOI":"10.1109\/TASLP.2017.2696307","article-title":"Time-frequency masking in the complex domain for speech dereverberation and denoising","volume":"25","author":"Williamson","year":"2017","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Huang, P.S., Kim, M., Hasegawa-Johnson, M., and Smaragdis, P. (2014, January 4\u20139). Deep learning for monaural speech separation. Proceedings of the 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Florence, Italy.","DOI":"10.1109\/ICASSP.2014.6853860"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"2136","DOI":"10.1109\/TASLP.2015.2468583","article-title":"Joint optimization of masks and deep recurrent neural networks for monaural source separation","volume":"23","author":"Huang","year":"2015","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Park, S.R., and Lee, J. (2016). A fully convolutional neural network for speech enhancement. 
arXiv.","DOI":"10.21437\/Interspeech.2017-1465"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"4705","DOI":"10.1121\/1.4986931","article-title":"Long short-term memory for speaker generalization in supervised speech separation","volume":"141","author":"Chen","year":"2017","journal-title":"J. Acoust. Soc. Am."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Zhao, H., Zarar, S., Tashev, I., and Lee, C.H. (2018). Convolutional-Recurrent Neural Networks for Speech Enhancement. arXiv.","DOI":"10.1109\/ICASSP.2018.8462155"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"189","DOI":"10.1109\/TASLP.2018.2876171","article-title":"Gated residual networks with dilated convolutions for monaural speech enhancement","volume":"27","author":"Tan","year":"2019","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"3143","DOI":"10.21437\/Interspeech.2019-2782","article-title":"Monaural Speech Enhancement with Dilated Convolutions","volume":"2019","author":"Pirhosseinloo","year":"2019","journal-title":"Proc. Interspeech"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Pandey, A., and Wang, D. (2019, January 12\u201317). TCNN: Temporal Convolutional Neural Network for Real-Time Speech Enhancement in The Time Domain. Proceedings of the ICASSP 2019\u20142019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK.","DOI":"10.1109\/ICASSP.2019.8683634"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"44","DOI":"10.1016\/j.specom.2019.06.002","article-title":"Deep learning for minimum mean-square error approaches to speech enhancement","volume":"111","author":"Nicolson","year":"2019","journal-title":"Speech Commun."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Kim, J., El-Khamy, M., and Lee, J. (2017). Residual LSTM: Design of a deep recurrent architecture for distant speech recognition. 
arXiv.","DOI":"10.21437\/Interspeech.2017-477"},{"key":"ref_16","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., and Polosukhin, I. (2017). Attention Is All You Need. arXiv, Available online: http:\/\/xxx.lanl.gov\/abs\/1706.03762."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Wang, K., He, B., and Zhu, W.P. (2021). TSTNN: Two-stage Transformer based Neural Network for Speech Enhancement in the Time Domain. arXiv, Available online: http:\/\/xxx.lanl.gov\/abs\/2103.09963.","DOI":"10.1109\/ICASSP39728.2021.9413740"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Xu, Z., Jiang, T., Li, C., and Yu, J. (2021, January 24\u201326). An Attention-augmented Fully Convolutional Neural Network for Monaural Speech Enhancement. Proceedings of the 2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP), Hong Kong, China.","DOI":"10.1109\/ISCSLP49672.2021.9362114"},{"key":"ref_19","unstructured":"Zhang, Q., Song, Q., Ni, Z., Nicolson, A., and Li, H. (2021). Time-Frequency Attention for Monaural Speech Enhancement. arXiv, Available online: http:\/\/xxx.lanl.gov\/abs\/2111.07518."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Shaw, P., Uszkoreit, J., and Vaswani, A. (2018, January 1\u20136). Self-Attention with Relative Position Representations. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), New Orleans, LA, USA.","DOI":"10.18653\/v1\/N18-2074"},{"key":"ref_21","unstructured":"Liu, X., Yu, H.F., Dhillon, I., and Hsieh, C.J. (2020, January 13\u201318). Learning to Encode Position for Transformer with Continuous Dynamical Model. Proceedings of the 37th International Conference on Machine Learning, Vienna, Austria."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Grzywalski, T., and Drgas, S. (2018, January 19\u201321). 
Application of recurrent U-net architecture to speech enhancement. Proceedings of the Signal Processing: Algorithms, Architectures, Arrangements, and Applications (SPA), Poznan, Poland.","DOI":"10.23919\/SPA.2018.8563364"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Grzywalski, T., and Drgas, S. (2019, January 12\u201317). Using recurrences in time and frequency within U-net architecture for speech enhancement. Proceedings of the ICASSP 2019\u20142019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK.","DOI":"10.1109\/ICASSP.2019.8682830"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"107347","DOI":"10.1016\/j.apacoust.2020.107347","article-title":"Speech enhancement using progressive learning-based convolutional recurrent neural network","volume":"166","author":"Li","year":"2020","journal-title":"Appl. Acoust."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Grzywalski, T., and Drgas, S. (2020, January 23\u201325). Speech enhancement by iterating forward pass through U-net. Proceedings of the 2020 Signal Processing: Algorithms, Architectures, Arrangements, and Applications (SPA), Poznan, Poland.","DOI":"10.23919\/SPA50552.2020.9241307"},{"key":"ref_26","first-page":"3713","article-title":"SNR-Based Progressive Learning of Deep Neural Network for Speech Enhancement","volume":"2016","author":"Gao","year":"2016","journal-title":"Interspeech"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Gao, T., Du, J., Dai, L.R., and Lee, C.H. (2018, January 15\u201320). Densely connected progressive learning for lstm-based speech enhancement. 
Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada.","DOI":"10.1109\/ICASSP.2018.8461861"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long Short-Term Memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Comput."},{"key":"ref_29","unstructured":"Clevert, D.A., Unterthiner, T., and Hochreiter, S. (2015). Fast and accurate deep network learning by exponential linear units (elus). arXiv."},{"key":"ref_30","unstructured":"Garofalo, J., Graff, D., Paul, D., and Pallett, D. (2022, February 17). Csr-i (wsj0) Complete. Linguistic Data Consortium, Philadelphia. Available online: https:\/\/catalog.ldc.upenn.edu\/LDC93S6A."},{"key":"ref_31","unstructured":"Stowell, D., and Plumbley, M.D. (2013). An Open Dataset for Research on Audio Field Recording Archives: Freefield1010. arXiv, Available online: http:\/\/xxx.lanl.gov\/abs\/1309.5275."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Reddy, C.K., Dubey, H., Koishida, K., Nair, A., Gopal, V., Cutler, R., Braun, S., Gamper, H., Aichner, R., and Srinivasan, S. (September, January 30). INTERSPEECH 2021 Deep Noise Suppression Challenge. Proceedings of the INTERSPEECH 2021, Brno, Czech Republic.","DOI":"10.21437\/Interspeech.2021-1609"},{"key":"ref_33","unstructured":"(2019, September 28). Freesound. Available online: https:\/\/freesound.org\/."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"247","DOI":"10.1016\/0167-6393(93)90095-3","article-title":"Assessment for automatic speech recognition: II. 
NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems","volume":"12","author":"Varga","year":"1993","journal-title":"Speech Commun."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"992","DOI":"10.1109\/TASLP.2019.2907016","article-title":"Sound event detection in the DCASE 2017 Challenge","volume":"27","author":"Mesaros","year":"2019","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process. (TASLP)"},{"key":"ref_36","unstructured":"Kingma, D.P., and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/7\/2440\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T22:40:56Z","timestamp":1760136056000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/7\/2440"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,3,22]]},"references-count":36,"journal-issue":{"issue":"7","published-online":{"date-parts":[[2022,4]]}},"alternative-id":["s22072440"],"URL":"https:\/\/doi.org\/10.3390\/s22072440","relation":{},"ISSN":["1424-8220"],"issn-type":[{"type":"electronic","value":"1424-8220"}],"subject":[],"published":{"date-parts":[[2022,3,22]]}}}