{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,10]],"date-time":"2026-02-10T18:49:51Z","timestamp":1770749391614,"version":"3.50.0"},"reference-count":27,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2021,10,24]],"date-time":"2021-10-24T00:00:00Z","timestamp":1635033600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,10,24]],"date-time":"2021-10-24T00:00:00Z","timestamp":1635033600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"The Graduate Student Innovation Fund of Xi'an University of Post and Telecommunications","award":["CXJJLD202003"],"award-info":[{"award-number":["CXJJLD202003"]}]},{"name":"The Key Research and Development Program of Shaanxi Province of China","award":["2020SF-377"],"award-info":[{"award-number":["2020SF-377"]}]},{"name":"The Key Research and Development Program of Shaanxi Province of China","award":["2019GY-086"],"award-info":[{"award-number":["2019GY-086"]}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["EURASIP J. Adv. Signal Process."],"published-print":{"date-parts":[[2021,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Speech is easily interfered by external environment in reality, which results in the loss of important features. Deep learning has become a popular speech enhancement method because of its superior potential in solving nonlinear mapping problems for complex features. However, the deficiency of traditional deep learning methods is the weak learning capability of important information from previous time steps and long-term event dependencies between the time-series data. 
To overcome this problem, we propose a novel speech enhancement method based on the fused features of a deep neural network (DNN) and a gated recurrent unit (GRU) network. The proposed method uses the GRU to reduce the number of parameters of the DNN and to acquire the context information of the speech, which improves the quality and intelligibility of the enhanced speech. First, a DNN with multiple hidden layers is used to learn the mapping relationship between the logarithmic power spectrum (LPS) features of noisy speech and clean speech. Second, the LPS feature output by the DNN is fused with the noisy speech as the input of the GRU network to compensate for the missing context information. Finally, the GRU network is used to learn the mapping relationship between these fused features and the LPS features of clean speech. The proposed model is experimentally compared with traditional speech enhancement models, including DNN, CNN, LSTM and GRU. Experimental results demonstrate that the PESQ, SSNR and STOI of the proposed algorithm are improved by 30.72%, 39.84% and 5.53%, respectively, compared with the noisy signal under the condition of matched noise. Under the condition of unmatched noise, the PESQ and STOI of the algorithm are improved by 23.8% and 37.36%, respectively. 
The advantage of the proposed method is that it uses the key information of features to suppress noise in both matched and unmatched noise cases and the proposed method outperforms other common methods in speech enhancement.<\/jats:p>","DOI":"10.1186\/s13634-021-00813-8","type":"journal-article","created":{"date-parts":[[2021,10,24]],"date-time":"2021-10-24T15:02:37Z","timestamp":1635087757000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":18,"title":["Speech enhancement from fused features based on deep neural network and gated recurrent unit network"],"prefix":"10.1186","volume":"2021","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5184-2705","authenticated-orcid":false,"given":"Youming","family":"Wang","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jiali","family":"Han","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Tianqi","family":"Zhang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Didi","family":"Qing","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2021,10,24]]},"reference":[{"key":"813_CR1","doi-asserted-by":"publisher","DOI":"10.1201\/b14529","volume-title":"Speech Enhancement: Theory and Practice","author":"PC Loizou","year":"2013","unstructured":"P.C. Loizou, Speech Enhancement: Theory and Practice, 2nd edn. (CRC Press, Cambridge, 2013)","edition":"2"},{"key":"813_CR2","unstructured":"C. Valentinibotinhao, J. Yamagishi, S. King, Evaluating speech intelligibility enhancement for HMM-based synthetic speech in noise (2012)"},{"key":"813_CR3","doi-asserted-by":"crossref","unstructured":"H.N. Moritz, T. Roux, Triggered attention for end-to-end speech recognition. 
In: Icassp IEEE International Conference on Acoustics (IEEE, 2019).","DOI":"10.1109\/ICASSP.2019.8683510"},{"issue":"1","key":"813_CR4","doi-asserted-by":"publisher","first-page":"223","DOI":"10.1121\/1.382239","volume":"65","author":"TV Sreenivas","year":"1979","unstructured":"T.V. Sreenivas, P. Rao, Pitch extraction from corrupted harmonics of the power spectrum. J Acoust Soc Am 65(1), 223\u2013228 (1979)","journal-title":"J Acoust Soc Am"},{"key":"813_CR5","unstructured":"C. Fdlwa, Vanessa Aparecida de Moraes Weber b e, C. Gvm, et al. Recognition of Pantaneira cattle breed using computer vision and convolutional neural networks-ScienceDirect. Comput. Electron. Agric. 175."},{"key":"813_CR6","unstructured":"Analysis of DNN speech signal enhancement for robust speaker recognition (2018)"},{"key":"813_CR7","doi-asserted-by":"publisher","first-page":"117","DOI":"10.1016\/j.procs.2018.10.359","volume":"143","author":"PP Barman","year":"2018","unstructured":"P.P. Barman et al., A RNN based approach for next word prediction in assamese phonetic transcription. Procedia Comput Sci 143, 117\u2013123 (2018)","journal-title":"Procedia Comput Sci"},{"key":"813_CR8","doi-asserted-by":"publisher","first-page":"44","DOI":"10.1016\/j.specom.2019.06.002","volume":"111","author":"A Nicolson","year":"2019","unstructured":"A. Nicolson, K.K. Paliwal, Deep learning for minimum mean-square error approaches to speech enhancement. Speech Commun 111, 44\u201355 (2019)","journal-title":"Speech Commun"},{"key":"813_CR9","unstructured":"A. Adeel, M. Gogate, A. Hussain, Contextual audio-visual switching for speech enhancement in real-world environments (2018)"},{"issue":"12","key":"813_CR10","doi-asserted-by":"publisher","first-page":"2263","DOI":"10.1109\/TASLP.2016.2602884","volume":"24","author":"Y Qian","year":"2017","unstructured":"Y. Qian, M. Bi, T. Tian et al., Very deep convolutional neural networks for noise robust speech recognition. 
IEEE\/ACM Trans Audio Speech Lang Process 24(12), 2263\u20132276 (2017)","journal-title":"IEEE\/ACM Trans Audio Speech Lang Process"},{"key":"813_CR11","unstructured":"X. G. Lu, Y. Tsao, S. Matsuda et al. Speech enhancement based on deep denoising autoencoder (2013)"},{"issue":"1","key":"813_CR12","doi-asserted-by":"publisher","first-page":"65","DOI":"10.1109\/LSP.2013.2291240","volume":"21","author":"Y Xu","year":"2013","unstructured":"Y. Xu, J. Du, L.R. Dai et al., An experimental study on speech enhancement based on deep neural networks. IEEE Signal Process. Lett. 21(1), 65\u201368 (2013)","journal-title":"IEEE Signal Process. Lett."},{"key":"813_CR13","doi-asserted-by":"crossref","unstructured":"F. Weninger, H. Erdogan, Speech enhancement with LSTM recurrent neural networks and its application to noise-robust ASR, in: International Conference on Latent Variable Analysis and Signal Separation (Liberec, Czech Republic, 2015), pp. 91\u201394","DOI":"10.1007\/978-3-319-22482-4_11"},{"key":"813_CR14","doi-asserted-by":"crossref","unstructured":"J. Lee, K. Kim, T. Shabestary, H. Kang. Deep bi-directional long short-term memory based speech enhancement for wind noise reduction, in: Hands-Free Speech Communications and Microphone Arrays (HSCMA) (San Francisco, USA, 2017), pp. 41\u201350","DOI":"10.1109\/HSCMA.2017.7895558"},{"key":"813_CR15","doi-asserted-by":"crossref","unstructured":"F. Weninger, J. R. Hershey, J. Le Roux, B. Schuller. Discriminatively trained recurrent neural networks for single-channel speech separation, in: Signal and Information Processing (GlobalSIP), 2014 IEEE Global Conference on (IEEE, 2014), pp. 577\u2013581","DOI":"10.1109\/GlobalSIP.2014.7032183"},{"key":"813_CR16","doi-asserted-by":"crossref","unstructured":"F. Weninger, F. Eyben, B. Schulle. Single-channel speech separation with memory-enhanced recurrent neural networks, in: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (IEEE, 2014), pp. 
3709\u20133713","DOI":"10.1109\/ICASSP.2014.6854294"},{"key":"813_CR17","doi-asserted-by":"publisher","first-page":"7","DOI":"10.1109\/TASLP.2014.2364452","volume":"23","author":"Y Xu","year":"2015","unstructured":"Y. Xu, J. Du, L. Dai, C. Lee, A regression approach to speech enhancement based on deep neural networks. IEEE-ACM. Trans. Audio Speech Lang Process 23, 7\u201319 (2015)","journal-title":"IEEE-ACM. Trans. Audio Speech Lang Process"},{"key":"813_CR18","doi-asserted-by":"crossref","unstructured":"J. M. Valin, A hybrid DSP\/deep learning approach to real-time full-band speech enhancement, in: IEEE 20th International Workshop on Multimedia Signal Processing (2018), pp. 1\u20135","DOI":"10.1109\/MMSP.2018.8547084"},{"key":"813_CR19","doi-asserted-by":"publisher","first-page":"358","DOI":"10.1016\/j.patrec.2020.11.009","volume":"140","author":"Z Zhu","year":"2020","unstructured":"Z. Zhu, W. Dai, Y. Hu, Speech emotion recognition model based on Bi-GRU and focal loss-ScienceDirect. Pattern Recognit Lett 140, 358\u2013365 (2020)","journal-title":"Pattern Recognit Lett"},{"key":"813_CR20","first-page":"755","volume":"50","author":"AW Rix","year":"2002","unstructured":"A.W. Rix, M.P. Hollier, A.P. Hekstra, Perceptual evaluation of speech quality (PESQ) the new ITU standard for end-to-end speech quality assessment Part 1\u2014time-delay compensation. J. Audio Eng. Soc. 50, 755\u2013764 (2002)","journal-title":"J. Audio Eng. Soc."},{"key":"813_CR21","doi-asserted-by":"publisher","first-page":"68","DOI":"10.1049\/iet-its.2016.0208","volume":"11","author":"Z Zhao","year":"2017","unstructured":"Z. Zhao, W. Chen, X. Wu, LSTM network: a deep learning approach for short-term traffic forecast. Intell. Transp. Syst. IET 11, 68\u201375 (2017)","journal-title":"Intell. Transp. Syst. IET"},{"key":"813_CR22","unstructured":"J. Chung, C. Gulcehre, K. H. 
Cho, Empirical evaluation of gated recurrent neural networks on sequence modeling (2014)"},{"key":"813_CR23","doi-asserted-by":"publisher","first-page":"239","DOI":"10.1002\/hipo.22806","volume":"29","author":"US Bhalla","year":"2017","unstructured":"U.S. Bhalla, Dendrites, deep learning, and sequences in the hippocampus. Hippocampus 29, 239\u2013251 (2017)","journal-title":"Hippocampus"},{"key":"813_CR24","doi-asserted-by":"publisher","first-page":"167","DOI":"10.1016\/j.jbi.2018.05.016","volume":"83","author":"W Stephen","year":"2018","unstructured":"W. Stephen, L. Sijia, S. Sunghwan, Modeling asynchronous event sequences with RNNs. J. Biomed. Infrom. 83, 167\u2013177 (2018)","journal-title":"J. Biomed. Infrom."},{"key":"813_CR25","unstructured":"ITU-T, Rec. P.862: Perceptual evaluation of speech quality (PESQ): an objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs, in: International Telecommun Union-Telecommun Standardization Sector (2001)"},{"key":"813_CR26","doi-asserted-by":"crossref","unstructured":"C. H. Taal, R. C. Hendriks, R. Heusdens. A short-time objective intelligibility measure for time-frequency weighted noisy speech, in: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (Dallas, USA, 2010), pp. 4214\u20137","DOI":"10.1109\/ICASSP.2010.5495701"},{"key":"813_CR27","doi-asserted-by":"crossref","unstructured":"S. Kim, M. Maity, M, Kim. Incremental binarization on recurrent neural networks for single-channel source separation (2019). pp. 
376\u2013380","DOI":"10.1109\/ICASSP.2019.8682595"}],"container-title":["EURASIP Journal on Advances in Signal Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13634-021-00813-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s13634-021-00813-8\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13634-021-00813-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,10,24]],"date-time":"2021-10-24T15:10:49Z","timestamp":1635088249000},"score":1,"resource":{"primary":{"URL":"https:\/\/asp-eurasipjournals.springeropen.com\/articles\/10.1186\/s13634-021-00813-8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,10,24]]},"references-count":27,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2021,12]]}},"alternative-id":["813"],"URL":"https:\/\/doi.org\/10.1186\/s13634-021-00813-8","relation":{},"ISSN":["1687-6180"],"issn-type":[{"value":"1687-6180","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,10,24]]},"assertion":[{"value":"25 May 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"13 October 2021","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"24 October 2021","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to 
participate"}},{"value":"The manuscript does not contain any individual person\u2019s data in any form (including individual details, images or videos), and therefore, the consent to publish is not applicable to this article.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare that they have no competing interests.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"104"}}