{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,24]],"date-time":"2026-03-24T03:13:41Z","timestamp":1774322021110,"version":"3.50.1"},"reference-count":117,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2023,1,26]],"date-time":"2023-01-26T00:00:00Z","timestamp":1674691200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"subvention financial means","award":["0211\/SBAD\/0222"],"award-info":[{"award-number":["0211\/SBAD\/0222"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>This paper presents recent advances in low-latency, single-channel, deep neural network-based speech enhancement systems. The sources of latency and their acceptable values in different applications are described. This is followed by an analysis of the constraints imposed on neural network architectures. Specifically, the causal units used in deep neural networks are presented and discussed in the context of their properties, such as the number of parameters, the receptive field, and computational complexity. This is followed by a discussion of techniques used to reduce the computational complexity and memory requirements of the neural networks used in this task. 
Finally, the techniques used by the winners of the latest speech enhancement challenges (DNS, Clarity) are shown and compared.<\/jats:p>","DOI":"10.3390\/s23031380","type":"journal-article","created":{"date-parts":[[2023,1,27]],"date-time":"2023-01-27T01:27:58Z","timestamp":1674782878000},"page":"1380","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":17,"title":["A Survey on Low-Latency DNN-Based Speech Enhancement"],"prefix":"10.3390","volume":"23","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4603-8894","authenticated-orcid":false,"given":"Szymon","family":"Drgas","sequence":"first","affiliation":[{"name":"Institute of Automatic Control and Robotics, Poznan University of Technology, Piotrowo 3A Street, 60-965 Poznan, Poland"}]}],"member":"1968","published-online":{"date-parts":[[2023,1,26]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Taal, C.H., Hendriks, R.C., Heusdens, R., and Jensen, J. (2010, January 14\u201319). A short-time objective intelligibility measure for time-frequency weighted noisy speech. Proceedings of the 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, Dallas, TX, USA.","DOI":"10.1109\/ICASSP.2010.5495701"},{"key":"ref_2","unstructured":"Rix, A.W., Beerends, J.G., Hollier, M.P., and Hekstra, A.P. (2001, January 7\u201311). Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs. Proceedings of the 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing, Salt Lake City, UT, USA."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Ullah, R., Wuttisittikulkij, L., Chaudhary, S., Parnianifard, A., Shah, S., Ibrar, M., and Wahab, F.E. (2022). End-to-End Deep Convolutional Recurrent Models for Noise Robust Waveform Speech Enhancement. 
Sensors, 22.","DOI":"10.3390\/s22207782"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"1849","DOI":"10.1109\/TASLP.2014.2352935","article-title":"On training targets for supervised speech separation","volume":"22","author":"Wang","year":"2014","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process. (TASLP)"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Erdogan, H., Hershey, J.R., Watanabe, S., and Le Roux, J. (2015, January 19\u201324). Phase-sensitive and recognition-boosted speech separation using deep recurrent neural networks. Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), South Brisbane, QLD, Australia.","DOI":"10.1109\/ICASSP.2015.7178061"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"1492","DOI":"10.1109\/TASLP.2017.2696307","article-title":"Time-frequency masking in the complex domain for speech dereverberation and denoising","volume":"25","author":"Williamson","year":"2017","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process. (TASLP)"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Huang, P.S., Kim, M., Hasegawa-Johnson, M., and Smaragdis, P. (2014, January 4\u20139). Deep learning for monaural speech separation. Proceedings of the 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Florence, Italy.","DOI":"10.1109\/ICASSP.2014.6853860"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"2136","DOI":"10.1109\/TASLP.2015.2468583","article-title":"Joint optimization of masks and deep recurrent neural networks for monaural source separation","volume":"23","author":"Huang","year":"2015","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Park, S.R., and Lee, J. (2016). A fully convolutional neural network for speech enhancement. 
arXiv.","DOI":"10.21437\/Interspeech.2017-1465"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"189","DOI":"10.1109\/TASLP.2018.2876171","article-title":"Gated residual networks with dilated convolutions for monaural speech enhancement","volume":"27","author":"Tan","year":"2019","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process. (TASLP)"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Pirhosseinloo, S., and Brumberg, J.S. (2019, January 15\u201319). Monaural Speech Enhancement with Dilated Convolutions. Proceedings of the Interspeech, Graz, Austria.","DOI":"10.21437\/Interspeech.2019-2782"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Pandey, A., and Wang, D. (2019, January 12\u201317). TCNN: Temporal Convolutional Neural Network for Real-Time Speech Enhancement in The Time Domain. Proceedings of the ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK.","DOI":"10.1109\/ICASSP.2019.8683634"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"4705","DOI":"10.1121\/1.4986931","article-title":"Long short-term memory for speaker generalization in supervised speech separation","volume":"141","author":"Chen","year":"2017","journal-title":"J. Acoust. Soc. Am."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Zhao, H., Zarar, S., Tashev, I., and Lee, C.H. (2018). Convolutional-Recurrent Neural Networks for Speech Enhancement. arXiv.","DOI":"10.1109\/ICASSP.2018.8462155"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Grzywalski, T., and Drgas, S. (2018, January 19\u201321). Application of recurrent U-net architecture to speech enhancement. Proceedings of the Signal Processing: Algorithms, Architectures, Arrangements, and Applications (SPA), Poznan, Poland.","DOI":"10.23919\/SPA.2018.8563364"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Grzywalski, T., and Drgas, S. (2019, January 12\u201317). 
Using recurrences in time and frequency within U-net architecture for speech enhancement. Proceedings of the ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK.","DOI":"10.1109\/ICASSP.2019.8682830"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"18617","DOI":"10.1007\/s11042-022-12632-6","article-title":"Speech enhancement using U-nets with wide-context units","volume":"81","author":"Grzywalski","year":"2022","journal-title":"Multimed. Tools Appl."},{"key":"ref_18","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., and Polosukhin, I. (2017). Attention Is All You Need. arXiv."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Wang, K., He, B., and Zhu, W.P. (2021). TSTNN: Two-stage Transformer based Neural Network for Speech Enhancement in the Time Domain. arXiv.","DOI":"10.1109\/ICASSP39728.2021.9413740"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Xu, Z., Jiang, T., Li, C., and Yu, J. (2021, January 24\u201327). An Attention-augmented Fully Convolutional Neural Network for Monaural Speech Enhancement. Proceedings of the 2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP), Hong Kong, China.","DOI":"10.1109\/ISCSLP49672.2021.9362114"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Zhang, Q., Song, Q., Ni, Z., Nicolson, A., and Li, H. (2022, January 23\u201327). Time-Frequency Attention for Monaural Speech Enhancement. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore.","DOI":"10.1109\/ICASSP43922.2022.9746454"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Shaw, P., Uszkoreit, J., and Vaswani, A. (2018, January 1\u20136). Self-Attention with Relative Position Representations. 
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), New Orleans, LA, USA.","DOI":"10.18653\/v1\/N18-2074"},{"key":"ref_23","first-page":"6327","article-title":"Learning to Encode Position for Transformer with Continuous Dynamical Model","volume":"Volume 119","author":"Singh","year":"2020","journal-title":"Proceedings of the 37th International Conference on Machine Learning"},{"key":"ref_24","unstructured":"Han, S., Pool, J., Tran, J., and Dally, W. (2015, January 7\u201312). Learning both weights and connections for efficient neural network. Proceedings of the 28th International Conference on Neural Information Processing Systems, Montreal, QC, Canada."},{"key":"ref_25","unstructured":"Garofolo, J.S., Lamel, L.F., Fisher, W.M., Fiscus, J.G., Pallett, D.S., Dahlgren, N.L., and Zue, V. (1993). TIMIT Acoustic Phonetic Continuous Speech Corpus, Linguistic Data Consortium."},{"key":"ref_26","unstructured":"Garofalo, J., Graff, D., Paul, D., and Pallett, D. (2007). Csr-i (wsj0) Complete, Linguistic Data Consortium."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Veaux, C., Yamagishi, J., and King, S. (2013, January 25\u201327). The voice bank corpus: Design, collection and data analysis of a large regional accent speech database. Proceedings of the 2013 International Conference Oriental COCOSDA Held Jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA\/CASLRE), Gurgaon, India.","DOI":"10.1109\/ICSDA.2013.6709856"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Panayotov, V., Chen, G., Povey, D., and Khudanpur, S. (2015, January 19\u201324). Librispeech: An asr corpus based on public domain audio books. 
Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), South Brisbane, QLD, Australia.","DOI":"10.1109\/ICASSP.2015.7178964"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"035081","DOI":"10.1121\/1.4799597","article-title":"The diverse environments multi-channel acoustic noise database (demand): A database of multichannel environmental noise recordings","volume":"19","author":"Thiemann","year":"2013","journal-title":"Proc. Meet. Acoust."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"247","DOI":"10.1016\/0167-6393(93)90095-3","article-title":"Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems","volume":"12","author":"Varga","year":"1993","journal-title":"Speech Commun."},{"key":"ref_31","unstructured":"Stowell, D., and Plumbley, M.D. (2013). An open dataset for research on audio field recording archives: Freefield1010. arXiv."},{"key":"ref_32","unstructured":"Neyshabur, B., Li, Z., Bhojanapalli, S., LeCun, Y., and Srebro, N. (2018). Towards understanding the role of over-parametrization in generalization of neural networks. arXiv."},{"key":"ref_33","unstructured":"ITU-T (1996). One-Way Transmission Time, International Telecommunication Union. Recommendation G.114."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"321","DOI":"10.1080\/14992020802644871","article-title":"Audiovisual asynchrony detection and speech perception in hearing-impaired listeners with cochlear implants: A preliminary analysis","volume":"48","author":"Pisoni","year":"2009","journal-title":"Int. J. Audiol."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"182","DOI":"10.1097\/00003446-199906000-00002","article-title":"Tolerable hearing aid delays. I. 
Estimation of limits imposed by the auditory path alone using simulated hearing losses","volume":"20","author":"Stone","year":"1999","journal-title":"Ear Hear."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"225","DOI":"10.1097\/00003446-200504000-00009","article-title":"Tolerable hearing-aid delays: IV. Effects on subjective disturbance during speech production by hearing-impaired subjects","volume":"26","author":"Stone","year":"2005","journal-title":"Ear Hear."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1080\/14992027.2017.1367848","article-title":"Tolerable delay for speech production and perception: Effects of hearing ability and experience with hearing aids","volume":"57","author":"Goehring","year":"2018","journal-title":"Int. J. Audiol."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Rethage, D., Pons, J., and Serra, X. (2018, January 15\u201320). A wavenet for speech denoising. Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada.","DOI":"10.1109\/ICASSP.2018.8462417"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"1270","DOI":"10.1109\/TASLP.2021.3064421","article-title":"Dense CNN with self-attention for time-domain speech enhancement","volume":"29","author":"Pandey","year":"2021","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process. (TASLP)"},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"1256","DOI":"10.1109\/TASLP.2019.2915167","article-title":"Conv-tasnet: Surpassing ideal time\u2013frequency magnitude masking for speech separation","volume":"27","author":"Luo","year":"2019","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process. (TASLP)"},{"key":"ref_41","first-page":"9458","article-title":"Phasen: A phase-and-harmonics-aware speech enhancement network","volume":"34","author":"Yin","year":"2020","journal-title":"Proc. Aaai Conf. Artif. 
Intell."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"53","DOI":"10.1109\/TASLP.2018.2870725","article-title":"Two-stage deep learning for noisy-reverberant speech enhancement","volume":"27","author":"Zhao","year":"2018","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process. (TASLP)"},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"1702","DOI":"10.1109\/TASLP.2018.2842159","article-title":"Supervised speech separation based on deep learning: An overview","volume":"26","author":"Wang","year":"2018","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long short-term memory","volume":"9","author":"Schmidhuber","year":"1997","journal-title":"Neural Comput."},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Graves, A. (2013). Generating sequences with recurrent neural networks. arXiv.","DOI":"10.1007\/978-3-642-24797-2_3"},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Cho, K., Van Merri\u00ebnboer, B., Bahdanau, D., and Bengio, Y. (2014). On the properties of neural machine translation: Encoder-decoder approaches. arXiv.","DOI":"10.3115\/v1\/W14-4012"},{"key":"ref_47","unstructured":"Bai, S., Kolter, J.Z., and Koltun, V. (2018). An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv."},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Choi, H.S., Park, S., Lee, J.H., Heo, H., Jeon, D., and Lee, K. (2021, January 6\u201311). Real-time denoising and dereverberation with tiny recurrent u-net. Proceedings of the ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada.","DOI":"10.1109\/ICASSP39728.2021.9414852"},{"key":"ref_49","unstructured":"Yu, F., and Koltun, V. (2015). Multi-scale context aggregation by dilated convolutions. 
arXiv."},{"key":"ref_50","unstructured":"Ronneberger, O., Fischer, P., and Brox, T. (2015). Medical Image Computing and Computer-Assisted Intervention\u2014MICCAI 2015, Proceedings of the 18th International Conference, Munich, Germany, 5\u20139 October 2015, Springer."},{"key":"ref_51","unstructured":"Macartney, C., and Weyde, T. (2018). Improved speech enhancement with the wave-u-net. arXiv."},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Grzywalski, T., and Drgas, S. (2022). Speech Enhancement by Multiple Propagation through the Same Neural Network. Sensors, 22.","DOI":"10.3390\/s22072440"},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Tan, K., and Wang, D. (2018, January 2\u20136). A Convolutional Recurrent Neural Network for Real-Time Speech Enhancement. Proceedings of the Interspeech, Hyderabad, India.","DOI":"10.21437\/Interspeech.2018-1405"},{"key":"ref_54","unstructured":"Liu, P.J., Saleh, M., Pot, E., Goodrich, B., Sepassi, R., Kaiser, L., and Shazeer, N. (2018). Generating wikipedia by summarizing long sequences. arXiv."},{"key":"ref_55","unstructured":"Huang, C.Z.A., Vaswani, A., Uszkoreit, J., Shazeer, N., Simon, I., Hawthorne, C., Dai, A.M., Hoffman, M.D., Dinculescu, M., and Eck, D. (2018). Music transformer. 
arXiv."},{"key":"ref_56","doi-asserted-by":"crossref","first-page":"80","DOI":"10.1016\/j.specom.2020.10.004","article-title":"Masked multi-head self-attention for causal speech enhancement","volume":"125","author":"Nicolson","year":"2020","journal-title":"Speech Commun."},{"key":"ref_57","doi-asserted-by":"crossref","first-page":"44","DOI":"10.1016\/j.specom.2019.06.002","article-title":"Deep learning for minimum mean-square error approaches to speech enhancement","volume":"111","author":"Nicolson","year":"2019","journal-title":"Speech Commun."},{"key":"ref_58","doi-asserted-by":"crossref","first-page":"1404","DOI":"10.1109\/TASLP.2020.2987441","article-title":"DeepMMSE: A deep learning approach to MMSE-based noise power spectral density estimation","volume":"28","author":"Zhang","year":"2020","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process. (TASLP)"},{"key":"ref_59","unstructured":"Oostermeijer, K., Wang, Q., and Du, J. (September, January 30). Lightweight Causal Transformer with Local Self-Attention for Real-Time Speech Enhancement. Proceedings of the Interspeech, Brno, Czech Republic."},{"key":"ref_60","unstructured":"Freire, P.J., Srivallapanondh, S., Napoli, A., Prilepsky, J.E., and Turitsyn, S.K. (2022). Computational complexity evaluation of neural network applications in signal processing. arXiv."},{"key":"ref_61","doi-asserted-by":"crossref","unstructured":"Valin, J.M. (2018, January 29\u201331). A hybrid DSP\/deep learning approach to real-time full-band speech enhancement. Proceedings of the 2018 IEEE 20th international workshop on multimedia signal processing (MMSP), Vancouver, BC, Canada.","DOI":"10.1109\/MMSP.2018.8547084"},{"key":"ref_62","doi-asserted-by":"crossref","unstructured":"Guti\u00e9rrez-Mu\u00f1oz, M., Gonz\u00e1lez-Salazar, A., and Coto-Jim\u00e9nez, M. (2019). Evaluation of mixed deep neural networks for reverberant speech enhancement. 
Biomimetics, 5.","DOI":"10.20944\/preprints201910.0376.v1"},{"key":"ref_63","doi-asserted-by":"crossref","unstructured":"Khandelwal, P., MacGlashan, J., Wurman, P., and Stone, P. (June, January 30). Efficient Real-Time Inference in Temporal Convolution Networks. Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi\u2019an, China.","DOI":"10.1109\/ICRA48506.2021.9560784"},{"key":"ref_64","unstructured":"Mauler, D., and Martin, R. (2007, January 3\u20137). A low delay, variable resolution, perfect reconstruction spectral analysis-synthesis system for speech enhancement. Proceedings of the 2007 15th European Signal Processing Conference, Poznan, Poland."},{"key":"ref_65","doi-asserted-by":"crossref","unstructured":"Wang, Z.Q., Wichern, G., Watanabe, S., and Roux, J.L. (2022). STFT-Domain Neural Speech Enhancement with Very Low Algorithmic Latency. arXiv.","DOI":"10.1109\/TASLP.2022.3224285"},{"key":"ref_66","doi-asserted-by":"crossref","unstructured":"Wang, S., Naithani, G., Politis, A., and Virtanen, T. (2021, January 23\u201327). Deep neural network based low-latency speech separation with asymmetric analysis-synthesis window pair. Proceedings of the 2021 29th European Signal Processing Conference (EUSIPCO), Dublin, Ireland.","DOI":"10.23919\/EUSIPCO54536.2021.9616165"},{"key":"ref_67","unstructured":"Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., and Adam, H. (2017). Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv."},{"key":"ref_68","doi-asserted-by":"crossref","first-page":"740","DOI":"10.1109\/72.248452","article-title":"Pruning algorithms-a survey","volume":"4","author":"Reed","year":"1993","journal-title":"IEEE Trans. Neural Netw."},{"key":"ref_69","unstructured":"Liu, J., Tripathi, S., Kurup, U., and Shah, M. (2020). Pruning algorithms to accelerate convolutional neural networks for edge applications: A survey. 
arXiv."},{"key":"ref_70","doi-asserted-by":"crossref","unstructured":"Srinivas, S., and Babu, R.V. (2015). Data-free parameter pruning for deep neural networks. arXiv.","DOI":"10.5244\/C.29.31"},{"key":"ref_71","unstructured":"LeCun, Y., Denker, J., and Solla, S. (1989, January 27\u201330). Optimal brain damage. Proceedings of the Advances in Neural Information Processing Systems, NIPS Conference, Denver, CO, USA."},{"key":"ref_72","doi-asserted-by":"crossref","first-page":"1785","DOI":"10.1109\/TASLP.2021.3082282","article-title":"Towards model compression for deep learning based speech enhancement","volume":"29","author":"Tan","year":"2021","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process. (TASLP)"},{"key":"ref_73","doi-asserted-by":"crossref","unstructured":"Tan, K., and Wang, D. (2021, January 6\u201311). Compressing deep neural networks for efficient speech enhancement. Proceedings of the ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada.","DOI":"10.1109\/ICASSP39728.2021.9413536"},{"key":"ref_74","doi-asserted-by":"crossref","unstructured":"Ye, F., Tsao, Y., and Chen, F. (2019, January 18\u201321). Subjective feedback-based neural network pruning for speech enhancement. Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Lanzhou, China.","DOI":"10.1109\/APSIPAASC47483.2019.9023330"},{"key":"ref_75","first-page":"11","article-title":"IEEE standard 754 for binary floating-point arithmetic","volume":"754","author":"Kahan","year":"1996","journal-title":"Lect. Notes Status IEEE"},{"key":"ref_76","doi-asserted-by":"crossref","unstructured":"Nicodemo, N., Naithani, G., Drossos, K., Virtanen, T., and Saletti, R. (2021, January 18\u201321). Memory requirement reduction of deep neural networks for field programmable gate arrays using low-bit quantization of parameters. 
Proceedings of the 2020 28th European Signal Processing Conference (EUSIPCO), Amsterdam, The Netherlands.","DOI":"10.23919\/Eusipco47968.2020.9287739"},{"key":"ref_77","unstructured":"Bhandare, A., Sripathi, V., Karkada, D., Menon, V., Choi, S., Datta, K., and Saletore, V. (2019). Efficient 8-bit quantization of transformer neural machine language translation model. arXiv."},{"key":"ref_78","doi-asserted-by":"crossref","unstructured":"Nguyen, H.D., Alexandridis, A., and Mouchtaris, A. (2020, January 25\u201329). Quantization Aware Training with Absolute-Cosine Regularization for Automatic Speech Recognition. Proceedings of the Interspeech, Shanghai, China.","DOI":"10.21437\/Interspeech.2020-1991"},{"key":"ref_79","doi-asserted-by":"crossref","unstructured":"Jacob, B., Kligys, S., Chen, B., Zhu, M., Tang, M., Howard, A., Adam, H., and Kalenichenko, D. (2018, January 18\u201323). Quantization and training of neural networks for efficient integer-arithmetic-only inference. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00286"},{"key":"ref_80","doi-asserted-by":"crossref","unstructured":"Lin, Y.C., Hsu, Y.T., Fu, S.W., Tsao, Y., and Kuo, T.W. (2019, January 15\u201319). IA-NET: Acceleration and Compression of Speech Enhancement Using Integer-Adder Deep Neural Network. Proceedings of the Interspeech, Graz, Austria.","DOI":"10.21437\/Interspeech.2019-1207"},{"key":"ref_81","doi-asserted-by":"crossref","unstructured":"Sainath, T.N., Kingsbury, B., Sindhwani, V., Arisoy, E., and Ramabhadran, B. (2013, January 26\u201331). Low-rank matrix factorization for deep neural network training with high-dimensional output targets. Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver, BC, Canada.","DOI":"10.1109\/ICASSP.2013.6638949"},{"key":"ref_82","unstructured":"Denil, M., Shakibi, B., Dinh, L., Ranzato, M., and De Freitas, N. 
(2013, January 5\u201310). Predicting parameters in deep learning. Proceedings of the 26th International Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA."},{"key":"ref_83","doi-asserted-by":"crossref","first-page":"1253","DOI":"10.1137\/S0895479896305696","article-title":"A multilinear singular value decomposition","volume":"21","author":"Vandewalle","year":"2000","journal-title":"SIAM J. Matrix Anal. Appl."},{"key":"ref_84","doi-asserted-by":"crossref","unstructured":"Zdunek, R., and Gabor, M. (2022, January 18\u201323). Nested compression of convolutional neural networks with Tucker-2 decomposition. Proceedings of the 2022 International Joint Conference on Neural Networks (IJCNN), Padua, Italy.","DOI":"10.1109\/IJCNN55064.2022.9892959"},{"key":"ref_85","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_86","unstructured":"Simonyan, K., and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv."},{"key":"ref_87","doi-asserted-by":"crossref","first-page":"279","DOI":"10.1007\/BF02289464","article-title":"Some mathematical notes on three-mode factor analysis","volume":"31","author":"Tucker","year":"1966","journal-title":"Psychometrika"},{"key":"ref_88","doi-asserted-by":"crossref","first-page":"141","DOI":"10.1109\/MSP.2012.2211477","article-title":"The mnist database of handwritten digit images for machine learning research","volume":"29","author":"Deng","year":"2012","journal-title":"IEEE Signal Process. Mag."},{"key":"ref_89","unstructured":"Krizhevsky, A., and Hinton, G. (2009). 
Learning Multiple Layers of Features from Tiny Images, University of Toronto."},{"key":"ref_90","doi-asserted-by":"crossref","first-page":"2837","DOI":"10.1109\/TASLP.2020.3030495","article-title":"A model compression method with matrix product operators for speech enhancement","volume":"28","author":"Sun","year":"2020","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process. (TASLP)"},{"key":"ref_91","doi-asserted-by":"crossref","unstructured":"Tjandra, A., Sakti, S., and Nakamura, S. (2018, January 8\u201313). Tensor decomposition for compressing recurrent neural network. Proceedings of the 2018 International Joint Conference on Neural Networks (IJCNN), Rio de Janeiro, Brazil.","DOI":"10.1109\/IJCNN.2018.8489213"},{"key":"ref_92","doi-asserted-by":"crossref","first-page":"105","DOI":"10.1002\/1099-128X(200005\/06)14:3<105::AID-CEM582>3.0.CO;2-I","article-title":"Towards a standardized notation and terminology in multiway analysis","volume":"14","author":"Kiers","year":"2000","journal-title":"J. Chemom."},{"key":"ref_93","doi-asserted-by":"crossref","unstructured":"Qi, J., Yang, C.H.H., Chen, P.Y., and Tejedor, J. (2022). Exploiting Low-Rank Tensor-Train Deep Neural Networks Based on Riemannian Gradient Descent With Illustrations of Speech Processing. arXiv.","DOI":"10.31219\/osf.io\/gdqnz"},{"key":"ref_94","unstructured":"Hinton, G., Vinyals, O., and Dean, J. (2015). Distilling the knowledge in a neural network. arXiv."},{"key":"ref_95","doi-asserted-by":"crossref","unstructured":"Thakker, M., Eskimez, S.E., Yoshioka, T., and Wang, H. (2022). Fast Real-time Personalized Speech Enhancement: End-to-End Enhancement Network (E3Net) and Knowledge Distillation. arXiv.","DOI":"10.21437\/Interspeech.2022-10962"},{"key":"ref_96","doi-asserted-by":"crossref","unstructured":"Kobayashi, K., and Toda, T. (2021, January 18\u201321). Implementation of low-latency electrolaryngeal speech enhancement based on multi-task CLDNN. 
Proceedings of the 2020 28th European Signal Processing Conference (EUSIPCO), Amsterdam, The Netherlands.","DOI":"10.23919\/Eusipco47968.2020.9287721"},{"key":"ref_97","unstructured":"Campos, V., Jou, B., Gir\u00f3-i Nieto, X., Torres, J., and Chang, S.F. (2017). Skip rnn: Learning to skip state updates in recurrent neural networks. arXiv."},{"key":"ref_98","doi-asserted-by":"crossref","unstructured":"Fedorov, I., Stamenovic, M., Jensen, C., Yang, L.C., Mandell, A., Gan, Y., Mattina, M., and Whatmough, P.N. (2020). TinyLSTMs: Efficient neural speech enhancement for hearing aids. arXiv.","DOI":"10.21437\/Interspeech.2020-1864"},{"key":"ref_99","doi-asserted-by":"crossref","first-page":"2411","DOI":"10.1109\/TASLP.2022.3190738","article-title":"Inference skipping for more efficient real-time speech enhancement with parallel RNNs","volume":"30","author":"Le","year":"2022","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"ref_100","doi-asserted-by":"crossref","unstructured":"Kim, S., and Kim, M. (2022, January 23\u201327). Bloom-Net: Blockwise Optimization for Masking Networks toward Scalable and Efficient Speech Enhancement. Proceedings of the ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore.","DOI":"10.1109\/ICASSP43922.2022.9746767"},{"key":"ref_101","unstructured":"Kaya, Y., Hong, S., and Dumitras, T. (2019, January 9\u201315). Shallow-deep networks: Understanding and mitigating network overthinking. Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA."},{"key":"ref_102","doi-asserted-by":"crossref","unstructured":"Li, A., Zheng, C., Zhang, L., and Li, X. (2021, January 23\u201327). Learning to inference with early exit in the progressive speech enhancement. Proceedings of the 2021 29th European Signal Processing Conference (EUSIPCO), Dublin, Ireland.","DOI":"10.23919\/EUSIPCO54536.2021.9616248"},{"key":"ref_103","doi-asserted-by":"crossref","unstructured":"Reddy, C.K., Beyrami, E., Dubey, H., Gopal, V., Cheng, R., Cutler, R., Matusevych, S., Aichner, R., Aazami, A., and Braun, S. (2020). The interspeech 2020 deep noise suppression challenge: Datasets, subjective speech quality and testing framework. arXiv.","DOI":"10.21437\/Interspeech.2020-3038"},{"key":"ref_104","doi-asserted-by":"crossref","unstructured":"Reddy, C.K., Beyrami, E., Pool, J., Cutler, R., Srinivasan, S., and Gehrke, J. (2019, January 15\u201319). A Scalable Noisy Speech Dataset and Online Subjective Test Framework. Proceedings of the Interspeech 2019, Graz, Austria.","DOI":"10.21437\/Interspeech.2019-3087"},{"key":"ref_105","doi-asserted-by":"crossref","unstructured":"Hu, Y., Liu, Y., Lv, S., Xing, M., Zhang, S., Fu, Y., Wu, J., Zhang, B., and Xie, L. (2020). DCCRN: Deep complex convolution recurrent network for phase-aware speech enhancement. arXiv.","DOI":"10.21437\/Interspeech.2020-2537"},{"key":"ref_106","doi-asserted-by":"crossref","unstructured":"Reddy, C.K., Dubey, H., Gopal, V., Cutler, R., Braun, S., Gamper, H., Aichner, R., and Srinivasan, S. (2021, January 6\u201311). ICASSP 2021 deep noise suppression challenge. Proceedings of the ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada.","DOI":"10.1109\/ICASSP39728.2021.9415105"},{"key":"ref_107","doi-asserted-by":"crossref","unstructured":"Li, A., Liu, W., Luo, X., Zheng, C., and Li, X. (2021, January 6\u201311). ICASSP 2021 deep noise suppression challenge: Decoupling magnitude and phase optimization with a two-stage deep network. Proceedings of the ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada.","DOI":"10.1109\/ICASSP39728.2021.9414062"},{"key":"ref_108","unstructured":"Rusu, A.A., Rabinowitz, N.C., Desjardins, G., Soyer, H., Kirkpatrick, J., Kavukcuoglu, K., Pascanu, R., and Hadsell, R. (2016). Progressive neural networks. arXiv."},{"key":"ref_109","doi-asserted-by":"crossref","first-page":"107511","DOI":"10.1016\/j.apacoust.2020.107511","article-title":"FLGCNN: A novel fully convolutional neural network for end-to-end monaural speech enhancement with utterance-based objective functions","volume":"170","author":"Zhu","year":"2020","journal-title":"Appl. Acoust."},{"key":"ref_110","unstructured":"Van den Oord, A., Kalchbrenner, N., Espeholt, L., Vinyals, O., Graves, A., and Kavukcuoglu, K. (2016, January 5\u201310). Conditional image generation with pixelcnn decoders. Proceedings of the 30th International Conference on Neural Information Processing Systems, Barcelona, Spain."},{"key":"ref_111","doi-asserted-by":"crossref","unstructured":"Reddy, C.K., Dubey, H., Koishida, K., Nair, A., Gopal, V., Cutler, R., Braun, S., Gamper, H., Aichner, R., and Srinivasan, S. (September, January 30). INTERSPEECH 2021 Deep Noise Suppression Challenge. Proceedings of the Interspeech, Brno, Czech Republic.","DOI":"10.21437\/Interspeech.2021-1609"},{"key":"ref_112","doi-asserted-by":"crossref","unstructured":"Li, A., Liu, W., Luo, X., Yu, G., Zheng, C., and Li, X. (2021). A simultaneous denoising and dereverberation framework with target decoupling. arXiv.","DOI":"10.21437\/Interspeech.2021-1137"},{"key":"ref_113","doi-asserted-by":"crossref","unstructured":"Dubey, H., Gopal, V., Cutler, R., Aazami, A., Matusevych, S., Braun, S., Eskimez, S.E., Thakker, M., Yoshioka, T., and Gamper, H. (2022, January 23\u201327). ICASSP 2022 deep noise suppression challenge. Proceedings of the ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore.","DOI":"10.1109\/ICASSP43922.2022.9747230"},{"key":"ref_114","doi-asserted-by":"crossref","unstructured":"Zhang, G., Yu, L., Wang, C., and Wei, J. (2022, January 23\u201327). Multi-scale temporal frequency convolutional network with axial attention for speech enhancement. Proceedings of the ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore.","DOI":"10.1109\/ICASSP43922.2022.9746610"},{"key":"ref_115","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1109\/LSP.2019.2955818","article-title":"Deep filtering: Signal extraction and reconstruction using complex time-frequency filters","volume":"27","author":"Mack","year":"2019","journal-title":"IEEE Signal Process. Lett."},{"key":"ref_116","doi-asserted-by":"crossref","unstructured":"Graetzer, S., Barker, J., Cox, T.J., Akeroyd, M., Culling, J.F., Naylor, G., Porter, E., and Viveros Munoz, R. (September, January 30). Clarity-2021 challenges: Machine learning challenges for advancing hearing aid processing. Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. International Speech Communication Association (ISCA), Brno, Czech Republic.","DOI":"10.21437\/Interspeech.2021-1574"},{"key":"ref_117","unstructured":"Tu, Z., Zhang, J., Ma, N., and Barker, J. (2021, January 16\u201317). A Two-Stage End-to-End System for Speech-in-Noise Hearing Aid Processing. Proceedings of the Machine Learning Challenges for Hearing Aids (Clarity-2021), Online. Available online: https:\/\/claritychallenge.org\/clarity2021-workshop\/."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/3\/1380\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T18:16:12Z","timestamp":1760120172000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/3\/1380"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,1,26]]},"references-count":117,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2023,2]]}},"alternative-id":["s23031380"],"URL":"https:\/\/doi.org\/10.3390\/s23031380","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,1,26]]}}}