{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,21]],"date-time":"2025-10-21T00:30:58Z","timestamp":1761006658339,"version":"build-2065373602"},"reference-count":35,"publisher":"MDPI AG","issue":"10","license":[{"start":{"date-parts":[[2025,10,20]],"date-time":"2025-10-20T00:00:00Z","timestamp":1760918400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Science and Technology Project of Henan Province","award":["232102210027"],"award-info":[{"award-number":["232102210027"]}]},{"name":"Henan Youth Natural Science Foundation","award":["242300420695"],"award-info":[{"award-number":["242300420695"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Symmetry"],"abstract":"<jats:p>Deep neural network-based approaches have made remarkable progress in monaural speech enhancement. Nevertheless, current cutting-edge approaches remain vulnerable to complex acoustic scenarios. We propose a Symmetric Combined Convolution Network with ConvLSTM (SCCN) for monaural speech enhancement. Specifically, the Combined Convolution Block uses parallel convolution branches, including a standard convolution and two different depthwise separable convolutions, to reinforce feature extraction along both the depthwise and channelwise dimensions. Similarly, Combined Deconvolution Blocks are stacked to construct the convolutional decoder. Moreover, we introduce exponentially increasing dilation between convolutional kernel elements in the encoder and decoder, which expands the receptive fields. Meanwhile, grouped ConvLSTM layers are exploited to capture the interdependency of spatial and temporal information. 
The experimental results demonstrate that the proposed SCCN method achieves an average STOI of 86.00% and an average PESQ of 2.43, outperforming the state-of-the-art baseline methods and confirming its effectiveness in enhancing speech quality.<\/jats:p>","DOI":"10.3390\/sym17101768","type":"journal-article","created":{"date-parts":[[2025,10,20]],"date-time":"2025-10-20T09:23:34Z","timestamp":1760952214000},"page":"1768","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Symmetric Combined Convolution with Convolutional Long Short-Term Memory for Monaural Speech Enhancement"],"prefix":"10.3390","volume":"17","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3896-234X","authenticated-orcid":false,"given":"Yang","family":"Xian","sequence":"first","affiliation":[{"name":"School of Computer Science and Technology, Zhengzhou University of Light Industry, Zhengzhou 450002, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yujin","family":"Fu","sequence":"additional","affiliation":[{"name":"College of Mathematics and Information Science, Zhengzhou University of Light Industry, Zhengzhou 450002, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Peixu","family":"Xing","sequence":"additional","affiliation":[{"name":"College of Mathematics and Information Science, Zhengzhou University of Light Industry, Zhengzhou 450002, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4722-5915","authenticated-orcid":false,"given":"Hongwei","family":"Tao","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Zhengzhou University of Light Industry, Zhengzhou 450002, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yang","family":"Sun","sequence":"additional","affiliation":[{"name":"Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, 
Oxford OX3 7LF, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2025,10,20]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Yousif, S.T., and Mahmmod, B.M. (2025). Speech Enhancement Algorithms: A Systematic Literature Review. Algorithms, 18.","DOI":"10.3390\/a18050272"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"109649","DOI":"10.1016\/j.apacoust.2023.109649","article-title":"Distributed parameterized topology-independent noise reduction in acoustic sensor networks","volume":"213","author":"Chang","year":"2023","journal-title":"Appl. Acoust."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"23312165231209913","DOI":"10.1177\/23312165231209913","article-title":"Sixty years of frequency-domain monaural speech enhancement: From traditional to deep learning methods","volume":"27","author":"Zheng","year":"2023","journal-title":"Trends Hear."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"3118","DOI":"10.1109\/TASLP.2021.3120639","article-title":"Quantization-aware binaural MWF based noise reduction incorporating external wireless devices","volume":"29","author":"Zhang","year":"2021","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"2731","DOI":"10.1109\/TASLP.2020.3028264","article-title":"Spatially correct rate-constrained noise reduction for binaural hearing aids in wireless acoustic sensor networks","volume":"28","author":"Amini","year":"2020","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"103405","DOI":"10.1016\/j.asej.2025.103405","article-title":"Deep neural networks for speech enhancement and speech recognition: A systematic review","volume":"16","author":"Natarajan","year":"2025","journal-title":"Ain Shams Eng. 
J."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"1218","DOI":"10.1109\/TSA.2005.860851","article-title":"New insights into the noise reduction Wiener filter","volume":"14","author":"Chen","year":"2006","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"ref_8","first-page":"389","article-title":"Improved wavelet denoising via empirical Wiener filtering","volume":"3169","author":"Ghael","year":"1997","journal-title":"Wavelet Appl. Signal Image Process. V"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"1261","DOI":"10.1109\/TIT.2005.844072","article-title":"Mutual information and minimum mean-square error in Gaussian channels","volume":"51","author":"Guo","year":"2005","journal-title":"IEEE Trans. Inf. Theory."},{"key":"ref_10","first-page":"1182","article-title":"Spectral subtraction based on minimum statistics","volume":"6","author":"Martin","year":"1994","journal-title":"Power"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"65","DOI":"10.1109\/LSP.2013.2291240","article-title":"An experimental study on speech enhancement based on deep neural networks","volume":"21","author":"Xu","year":"2013","journal-title":"IEEE Signal Process Lett."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"332","DOI":"10.1177\/1084713808326455","article-title":"Time-frequency masking for speech separation and its potential for hearing aid design","volume":"12","author":"Wang","year":"2008","journal-title":"Trends Amplif."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"982","DOI":"10.1109\/TASLP.2015.2416653","article-title":"Learning spectral mapping for speech dereverberation and denoising","volume":"23","author":"Han","year":"2015","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. 
Process."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"2136","DOI":"10.1109\/TASLP.2015.2468583","article-title":"Joint optimization of masks and deep recurrent neural networks for monaural source separation","volume":"23","author":"Huang","year":"2015","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Weninger, F.J., Erdogan, H., Watanabe, S., Vincent, E., Le Roux, J., Hershey, J.R., and Schuller, B.W. (2015). Speech enhancement with LSTM recurrent neural networks and its application to noise-robust ASR. Latent Variable Analysis and Signal Separation, 12th International Conference, LVA\/ICA 2015, Liberec, Czech Republic, 25\u201328 August 2015, Proceedings, Springer.","DOI":"10.1007\/978-3-319-22482-4_11"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Fu, S.-W., Hu, T.-Y., Tsao, Y., and Lu, X. (2017, January 25\u201328). Complex spectrogram enhancement by convolutional neural network with multi-metrics learning. Proceedings of the 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP), Tokyo, Japan.","DOI":"10.1109\/MLSP.2017.8168119"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Park, S.R., and Lee, J.W. (2017, January 20\u201324). A fully convolutional neural network for speech enhancement. Proceedings of the Interspeech 2017, Stockholm, Sweden.","DOI":"10.21437\/Interspeech.2017-1465"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Tan, K., and Wang, D.L. (2018, January 2\u20136). A convolutional recurrent neural network for real-time speech enhancement. 
Proceedings of the Interspeech 2018, Hyderabad, India.","DOI":"10.21437\/Interspeech.2018-1405"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"189","DOI":"10.1109\/TASLP.2018.2876171","article-title":"Gated residual networks with dilated convolutions for monaural speech enhancement","volume":"27","author":"Tan","year":"2018","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"1179","DOI":"10.1109\/TASLP.2019.2913512","article-title":"A new framework for CNN-based speech enhancement in the time domain","volume":"27","author":"Pandey","year":"2019","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Pascual, S., Bonafonte, A., and Serr\u00e0, J. (2017, January 20\u201324). SEGAN: Speech Enhancement Generative Adversarial Network. Proceedings of the Interspeech 2017, Stockholm, Sweden.","DOI":"10.21437\/Interspeech.2017-1428"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"2477","DOI":"10.1109\/TASLP.2024.3393718","article-title":"CMGAN: Conformer-Based Metric-GAN for Monaural Speech Enhancement","volume":"32","author":"Abdulatif","year":"2024","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Zhang, X.Y., Zhou, X.Y., Lin, M.X., and Sun, J. (2018, January 18\u201322). Shufflenet: An extremely efficient convolutional neural network for mobile devices. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00716"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Chollet, F. (2017, January 21\u201326). Xception: Deep learning with depthwise separable convolutions. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.195"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"97","DOI":"10.1016\/j.neunet.2021.05.017","article-title":"Convolutional fusion network for monaural speech enhancement","volume":"143","author":"Xian","year":"2021","journal-title":"Neural Netw."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Strake, M., Defraene, B., Fluyt, K., Tirry, W., and Fingscheidt, T. (2020, January 4\u20138). Fully convolutional recurrent networks for speech enhancement. Proceedings of the ICASSP 2020\u20142020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain.","DOI":"10.1109\/ICASSP40776.2020.9054230"},{"key":"ref_27","first-page":"802","article-title":"Convolutional LSTM network: A machine learning approach for precipitation nowcasting","volume":"28","author":"Shi","year":"2015","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Thiemann, J., Ito, N., and Vincent, E. (2013, January 2\u20137). The diverse environments multi-channel acoustic noise database (DEMAND): A database of multichannel environmental noise recordings. Proceedings of the Meetings on Acoustics, Montreal, QC, Canada.","DOI":"10.1121\/1.4799597"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Garofolo, J.S., Lamel, L.F., Fisher, W.M., Fiscus, J.G., Pallett, D.S., and Dahlgren, N.L. (1993). DARPA TIMIT Acoustic Phonetic Continuous Speech Corpus CDROM. NIST Interagency\/Internal Report (NISTIR), National Institute of Standards and Technology.","DOI":"10.6028\/NIST.IR.4930"},{"key":"ref_30","unstructured":"Botinhao, C.V., Wang, X., Takaki, S., and Yamagishi, J. (2016, January 13\u201315). Investigating RNN-based speech enhancement methods for noise-robust text-to-speech. 
Proceedings of the 9th ISCA Workshop on Speech Synthesis (SSW 9), Sunnyvale, CA, USA."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"247","DOI":"10.1016\/0167-6393(93)90095-3","article-title":"Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems","volume":"12","author":"Varga","year":"1993","journal-title":"Speech Commun."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"2067","DOI":"10.1109\/TASL.2010.2041110","article-title":"A tandem algorithm for pitch estimation and voiced speech segregation","volume":"18","author":"Hu","year":"2010","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"229","DOI":"10.1109\/TASL.2007.911054","article-title":"Evaluation of objective quality measures for speech enhancement","volume":"16","author":"Hu","year":"2008","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"2125","DOI":"10.1109\/TASL.2011.2114881","article-title":"An algorithm for intelligibility prediction of time-frequency weighted noisy speech","volume":"19","author":"Taal","year":"2011","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"3100","DOI":"10.1121\/1.1872572","article-title":"Localizing nearby sound sources in a classroom: Binaural room impulse responses","volume":"117","author":"Kopco","year":"2005","journal-title":"J. Acoust. Soc. 
Am."}],"container-title":["Symmetry"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2073-8994\/17\/10\/1768\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,20]],"date-time":"2025-10-20T09:40:39Z","timestamp":1760953239000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2073-8994\/17\/10\/1768"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,10,20]]},"references-count":35,"journal-issue":{"issue":"10","published-online":{"date-parts":[[2025,10]]}},"alternative-id":["sym17101768"],"URL":"https:\/\/doi.org\/10.3390\/sym17101768","relation":{},"ISSN":["2073-8994"],"issn-type":[{"value":"2073-8994","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,10,20]]}}}