{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,30]],"date-time":"2025-07-30T13:07:14Z","timestamp":1753880834454,"version":"3.41.2"},"reference-count":29,"publisher":"World Scientific Pub Co Pte Ltd","issue":"02","funder":[{"DOI":"10.13039\/501100012165","name":"Key Technologies Research and Development Program","doi-asserted-by":"publisher","award":["2018YFB0203801"],"award-info":[{"award-number":["2018YFB0203801"]}],"id":[{"id":"10.13039\/501100012165","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61572510","61702529","61502511"],"award-info":[{"award-number":["61572510","61702529","61502511"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Special Fund for Public Welfare","award":["GYHY201306003"],"award-info":[{"award-number":["GYHY201306003"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["J CIRCUIT SYST COMP"],"published-print":{"date-parts":[[2022,1,30]]},"abstract":"<jats:p> Conventional time\u2013frequency (TF) domain source separation methods mainly focus on predicting TF-masks or speech spectrums, where complex ideal ratio mask (cIRM) is an effective target for speech enhancement and separation. However, some recent studies employ a real-valued network, such as a general convolutional neural network (CNN) and a recurrent neural network (RNN), to predict a complex-valued mask or a spectrogram target, leading to the unbalanced training results of real and imaginary parts. In this paper, to estimate the complex-valued target more accurately, a novel U-shaped complex network for the complex signal approximation (uCSA) method is proposed. The uCSA is an adaptive front-end time-domain separation method, which tackles the monaural source separation problem in three ways. 
First, we design and implement a complex U-shaped network architecture comprising well-defined complex-valued encoder and decoder blocks, as well as complex-valued bidirectional Long Short-Term Memory (BLSTM) layers, to carry out complex-valued operations. Second, the cIRM is the training target of our uCSA method, optimized by signal approximation (SA), which takes advantage of both real and imaginary components of the complex-valued spectrum. Third, we re-formulate STFT and inverse STFT in differentiable form, and the model is trained with the scale-invariant source-to-noise ratio (SI-SNR) loss, achieving end-to-end training of the speech source separation task. Moreover, the proposed uCSA models are evaluated on the WSJ0-2mix dataset, a corpus commonly used by many supervised speech separation methods. Extensive experimental results indicate that our proposed method obtains state-of-the-art performance on the basis of the perceptual evaluation of speech quality (PESQ) and the short-time objective intelligibility (STOI) metrics. <\/jats:p>","DOI":"10.1142\/s0218126622500281","type":"journal-article","created":{"date-parts":[[2021,8,27]],"date-time":"2021-08-27T08:46:11Z","timestamp":1630053971000},"source":"Crossref","is-referenced-by-count":1,"title":["End-to-End Monaural Speech Separation with a Deep Complex U-Shaped Network"],"prefix":"10.1142","volume":"31","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7945-9580","authenticated-orcid":false,"given":"Wen","family":"Zhang","sequence":"first","affiliation":[{"name":"College of Meteorology and Oceanography, College of Computer Science and Technology, National University of Defense Technology, Changsha, P. R. China,"}]},{"given":"Xiaoyong","family":"Li","sequence":"additional","affiliation":[{"name":"College of Meteorology and Oceanography, College of Computer Science and Technology, National University of Defense Technology, Changsha, P. R. 
China,"}]},{"given":"Aolong","family":"Zhou","sequence":"additional","affiliation":[{"name":"College of Meteorology and Oceanography, College of Computer Science and Technology, National University of Defense Technology, Changsha, P. R. China,"}]},{"given":"Kefeng","family":"Deng","sequence":"additional","affiliation":[{"name":"College of Meteorology and Oceanography, College of Computer Science and Technology, National University of Defense Technology, Changsha, P. R. China,"}]},{"given":"Kaijun","family":"Ren","sequence":"additional","affiliation":[{"name":"College of Meteorology and Oceanography, College of Computer Science and Technology, National University of Defense Technology, Changsha, P. R. China,"}]},{"given":"Junqiang","family":"Song","sequence":"additional","affiliation":[{"name":"College of Meteorology and Oceanography, College of Computer Science and Technology, National University of Defense Technology, Changsha, P. R. China,"}]}],"member":"219","published-online":{"date-parts":[[2021,8,26]]},"reference":[{"key":"S0218126622500281BIB001","doi-asserted-by":"publisher","DOI":"10.1109\/TASLP.2018.2842159"},{"key":"S0218126622500281BIB002","first-page":"406","volume-title":"Proc. 2019 IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP)","author":"Aroudi A.","year":"2019"},{"key":"S0218126622500281BIB003","first-page":"2374","volume-title":"Proc. Interspeech","author":"Haider F.","year":"2019"},{"key":"S0218126622500281BIB004","first-page":"6381","volume-title":"Proc. 2019 IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP)","author":"He Y.","year":"2019"},{"key":"S0218126622500281BIB005","first-page":"4628","volume-title":"Proc. 2014 IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP)","author":"Han K.","year":"2014"},{"key":"S0218126622500281BIB006","first-page":"1562","volume-title":"Proc. 2014 IEEE Int. Conf. 
Acoustics, Speech and Signal Processing (ICASSP)","author":"Huang P.-S.","year":"2014"},{"key":"S0218126622500281BIB009","doi-asserted-by":"crossref","first-page":"1570","DOI":"10.1109\/TASLP.2018.2821903","volume":"26","author":"Fu S.-W.","year":"2018","journal-title":"IEEE\/ACM Trans. Audio, Speech, Lang. Process."},{"key":"S0218126622500281BIB011","first-page":"696","volume-title":"Proc. 2018 IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP)","author":"Luo Y.","year":"2018"},{"key":"S0218126622500281BIB013","doi-asserted-by":"crossref","first-page":"1256","DOI":"10.1109\/TASLP.2019.2915167","volume":"27","author":"Luo Y.","year":"2019","journal-title":"IEEE\/ACM Trans. Audio, Speech, Lang. Process."},{"key":"S0218126622500281BIB014","first-page":"46","volume-title":"Proc. 2020 IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP)","author":"Luo Y.","year":"2020"},{"key":"S0218126622500281BIB015","doi-asserted-by":"crossref","first-page":"7","DOI":"10.1109\/TASLP.2014.2364452","volume":"23","author":"Xu Y.","year":"2014","journal-title":"IEEE\/ACM Trans. Audio, Speech, Lang. Process."},{"key":"S0218126622500281BIB016","doi-asserted-by":"publisher","DOI":"10.1016\/j.specom.2006.09.003"},{"key":"S0218126622500281BIB017","first-page":"7092","volume-title":"Proc. 2013 IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP)","author":"Narayanan A.","year":"2013"},{"key":"S0218126622500281BIB018","doi-asserted-by":"publisher","DOI":"10.1109\/TASLP.2014.2352935"},{"first-page":"1528","volume-title":"Proc. Thirteenth Annual Conf. International Speech Communication Association","author":"Wang Y.","key":"S0218126622500281BIB019"},{"key":"S0218126622500281BIB020","first-page":"708","volume-title":"Proc. 2015 IEEE Int. Conf. 
Acoustics, Speech and Signal Processing (ICASSP)","author":"Erdogan H.","year":"2015"},{"key":"S0218126622500281BIB021","doi-asserted-by":"crossref","first-page":"483","DOI":"10.1109\/TASLP.2015.2512042","volume":"24","author":"Williamson D. S.","year":"2015","journal-title":"IEEE\/ACM Trans. Audio, Speech, Lang. Process."},{"key":"S0218126622500281BIB022","first-page":"1","volume-title":"Proc. 2020 Int. Joint Conf. Neural Networks (IJCNN)","author":"Zhang W.","year":"2020"},{"key":"S0218126622500281BIB023","first-page":"5746","volume-title":"Proc. 2019 IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP)","author":"Liu Y.","year":"2019"},{"key":"S0218126622500281BIB024","doi-asserted-by":"crossref","first-page":"100158","DOI":"10.1016\/j.bdr.2020.100158","volume":"22","author":"Zhang W.","year":"2020","journal-title":"Big Data Res."},{"key":"S0218126622500281BIB025","first-page":"6865","volume-title":"Proc. 2019 IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP)","author":"Tan K.","year":"2019"},{"key":"S0218126622500281BIB026","first-page":"9458","volume-title":"AAAI","author":"Yin D.","year":"2020"},{"volume-title":"Int. Conf. Learning Representations","year":"2019","author":"Choi H.-S.","key":"S0218126622500281BIB027"},{"key":"S0218126622500281BIB029","first-page":"31","volume-title":"Proc. 2016 IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP)","author":"Hershey J. R.","year":"2016"},{"key":"S0218126622500281BIB030","first-page":"241","volume-title":"Proc. 2017 IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP)","author":"Yu D.","year":"2017"},{"key":"S0218126622500281BIB031","doi-asserted-by":"crossref","first-page":"1901","DOI":"10.1109\/TASLP.2017.2726762","volume":"25","author":"Kolb\u00e6k M.","year":"2017","journal-title":"IEEE\/ACM Trans. Audio, Speech, Lang. 
Process."},{"key":"S0218126622500281BIB033","doi-asserted-by":"crossref","first-page":"4705","DOI":"10.1121\/1.4986931","volume":"141","author":"Chen J.","year":"2017","journal-title":"J. Acoust. Soc. Amer."},{"key":"S0218126622500281BIB036","doi-asserted-by":"publisher","DOI":"10.1109\/TSA.2005.858005"},{"key":"S0218126622500281BIB037","first-page":"1","volume-title":"Proc. 2016 IEEE Int. Workshop on Acoustic Signal Enhancement (IWAENC)","author":"Wang Z.","year":"2016"}],"container-title":["Journal of Circuits, Systems and Computers"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.worldscientific.com\/doi\/pdf\/10.1142\/S0218126622500281","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,2,25]],"date-time":"2022-02-25T09:35:55Z","timestamp":1645781755000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.worldscientific.com\/doi\/abs\/10.1142\/S0218126622500281"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,8,26]]},"references-count":29,"journal-issue":{"issue":"02","published-print":{"date-parts":[[2022,1,30]]}},"alternative-id":["10.1142\/S0218126622500281"],"URL":"https:\/\/doi.org\/10.1142\/s0218126622500281","relation":{},"ISSN":["0218-1266","1793-6454"],"issn-type":[{"type":"print","value":"0218-1266"},{"type":"electronic","value":"1793-6454"}],"subject":[],"published":{"date-parts":[[2021,8,26]]},"article-number":"2250028"}}