{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T01:47:19Z","timestamp":1760147239747,"version":"build-2065373602"},"reference-count":43,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2023,1,17]],"date-time":"2023-01-17T00:00:00Z","timestamp":1673913600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Symmetry"],"abstract":"<jats:p>Thanks to the use of deep neural networks (DNNs), microphone array speech separation methods have achieved impressive performance. However, most existing neural beamforming methods explicitly follow traditional beamformer formulas, which possibly causes sub-optimal performance. In this study, a pre-separation and all-neural beamformer framework is proposed for multi-channel speech separation without following the solutions of the conventional beamformers, such as the minimum variance distortionless response (MVDR) beamformer. More specifically, the proposed framework includes two modules, namely the pre-separation module and the all-neural beamforming module. The pre-separation module is used to obtain pre-separated speech and interference, which are further utilized by the all-neural beamforming module to obtain frame-level beamforming weights without computing the spatial covariance matrices. The evaluation results of the multi-channel speech separation tasks, including speech enhancement subtasks and speaker separation subtasks, demonstrate that the proposed method is more effective than several advanced baselines. Furthermore, this method can be used for symmetrical stereo speech.<\/jats:p>","DOI":"10.3390\/sym15020261","type":"journal-article","created":{"date-parts":[[2023,1,17]],"date-time":"2023-01-17T05:40:02Z","timestamp":1673934002000},"page":"261","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["A Pre-Separation and All-Neural Beamformer Framework for Multi-Channel Speech Separation"],"prefix":"10.3390","volume":"15","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4319-9326","authenticated-orcid":false,"given":"Wupeng","family":"Xie","sequence":"first","affiliation":[{"name":"Information Science Academy, China Electronics Technology Group Corporation, Beijing 100041, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5799-5091","authenticated-orcid":false,"given":"Xiaoxiao","family":"Xiang","sequence":"additional","affiliation":[{"name":"Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100190, China"},{"name":"Key Laboratory of Electromagnetic Radiation and Sensing Technology, Chinese Academy of Sciences, Beijing 100190, China"},{"name":"School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100049, China"}]},{"given":"Xiaojuan","family":"Zhang","sequence":"additional","affiliation":[{"name":"Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100190, China"},{"name":"Key Laboratory of Electromagnetic Radiation and Sensing Technology, Chinese Academy of Sciences, Beijing 100190, China"}]},{"given":"Guanghong","family":"Liu","sequence":"additional","affiliation":[{"name":"Information Science Academy, China Electronics Technology Group Corporation, Beijing 100041, China"}]}],"member":"1968","published-online":{"date-parts":[[2023,1,17]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"1256","DOI":"10.1109\/TASLP.2019.2915167","article-title":"Conv-TasNet: Surpassing Ideal Time\u2013Frequency Magnitude Masking for Speech Separation","volume":"27","author":"Luo","year":"2019","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"1455","DOI":"10.1109\/LSP.2021.3093859","article-title":"A Convolutional Network With Multi-Scale and Attention Mechanisms for End-to-End Single-Channel Speech Enhancement","volume":"28","author":"Xiang","year":"2021","journal-title":"IEEE Signal Process. Lett."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Pandey, A., Xu, B., Kumar, A., Donley, J., Calamia, P., and Wang, D. (2022, January 23\u201327). TPARN: Triple-path Attentive Recurrent Network for Time-domain Multichannel Speech Enhancement. Proceedings of the ICASSP 2022\u20142022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore.","DOI":"10.1109\/ICASSP43922.2022.9747373"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"1849","DOI":"10.1109\/TASLP.2014.2352935","article-title":"On Training Targets for Supervised Speech Separation","volume":"22","author":"Wang","year":"2014","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"1829","DOI":"10.1109\/TASLP.2021.3079813","article-title":"Two Heads are Better Than One: A Two-Stage Complex Spectral Mapping Approach for Monaural Speech Enhancement","volume":"29","author":"Li","year":"2021","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"2092","DOI":"10.1109\/TASLP.2019.2941148","article-title":"Divide and Conquer: A Deep CASA Approach to Talker-Independent Monaural Speaker Separation","volume":"27","author":"Liu","year":"2019","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"2109","DOI":"10.1109\/TASLP.2020.3007779","article-title":"Causal Deep CASA for Monaural Talker-Independent Speaker Separation","volume":"28","author":"Liu","year":"2020","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Li, A., Liu, W., Zheng, C., and Li, X. (2022, January 23\u201327). Embedding and Beamforming: All-neural Causal Beamformer for Multichannel Speech Enhancement. Proceedings of the ICASSP 2022\u20142022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore.","DOI":"10.1109\/ICASSP43922.2022.9746432"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"425","DOI":"10.1109\/89.622565","article-title":"A signal subspace tracking algorithm for microphone array processing of speech","volume":"5","author":"Affes","year":"1997","journal-title":"IEEE Trans. Speech Audio Process."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"1614","DOI":"10.1109\/78.934132","article-title":"Signal enhancement using beamforming and nonstationarity with applications to speech","volume":"49","author":"Gannot","year":"2001","journal-title":"IEEE Trans. Signal Process."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Erdogan, H., Hershey, J., Watanabe, S., Mandel, M.I., and Roux, J.L. (2016, January 8\u201312). Improved MVDR Beamforming Using Single-Channel Mask Prediction Networks. Proceedings of the Interspeech 2016, San Francisco, CA, USA.","DOI":"10.21437\/Interspeech.2016-552"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Xiao, X., Zhao, S., Jones, D.L., Chng, E.S., and Li, H. (2017, January 5\u20139). On Time-Frequency Mask Estimation for MVDR Beamforming with Application in Robust Speech Recognition. Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA.","DOI":"10.1109\/ICASSP.2017.7952756"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Qian, K., Zhang, Y., Chang, S., Yang, X., Florencio, D., and Hasegawa-Johnson, M. (2018, January 15\u201320). Deep Learning Based Speech Beamforming. Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada.","DOI":"10.1109\/ICASSP.2018.8462430"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"1778","DOI":"10.1109\/TASLP.2020.2998279","article-title":"Complex Spectral Mapping for Single- and Multi-Channel Speech Enhancement and Robust ASR","volume":"28","author":"Wang","year":"2020","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Gu, R., Zhang, S., Chen, L., Xu, Y., Yu, M., Su, D., Zou, Y., and Yu, D. (2020, January 4\u20138). Enhancing End-to-End Multi-Channel Speech Separation Via Spatial Feature Learning. Proceedings of the ICASSP 2020\u20142020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain.","DOI":"10.1109\/ICASSP40776.2020.9053092"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"457","DOI":"10.1109\/TASLP.2018.2881912","article-title":"Combining Spectral and Spatial Features for Deep Learning Based Blind Speaker Separation","volume":"27","author":"Wang","year":"2019","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Zhang, J., Zoril\u0103, C., Doddipatla, R., and Barker, J. (2020, January 4\u20138). On End-to-end Multi-channel Time Domain Speech Separation in Reverberant Environments. Proceedings of the ICASSP 2020\u20142020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain.","DOI":"10.1109\/ICASSP40776.2020.9053833"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"542","DOI":"10.1109\/JSTSP.2020.2987209","article-title":"Audio-Visual Speech Separation and Dereverberation With a Two-Stage Multimodal Network","volume":"14","author":"Tan","year":"2020","journal-title":"IEEE J. Sel. Top. Signal Process."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"2001","DOI":"10.1109\/TASLP.2021.3083405","article-title":"Multi-microphone Complex Spectral Mapping for Utterance-wise and Continuous Speech Separation","volume":"29","author":"Wang","year":"2021","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"1853","DOI":"10.1109\/TASLP.2021.3082318","article-title":"Deep Learning Based Real-Time Speech Enhancement for Dual-Microphone Mobile Phones","volume":"29","author":"Tan","year":"2021","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"605","DOI":"10.1109\/TASLP.2022.3145319","article-title":"Neural Spectrospatial Filtering","volume":"30","author":"Tan","year":"2022","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Zhang, Z., Xu, Y., Yu, M., Zhang, S.X., Chen, L., and Yu, D. (2021, January 6\u201311). ADL-MVDR: All Deep Learning MVDR Beamformer for Target Speech Separation. Proceedings of the ICASSP 2021\u20142021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada.","DOI":"10.1109\/ICASSP39728.2021.9413594"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Xu, Y., Zhang, Z., Yu, M., Zhang, S.X., and Yu, D. (September, January 30). Generalized Spatio-Temporal RNN Beamformer for Target Speech Separation. Proceedings of the Interspeech 2021, Brno, Czech Republic.","DOI":"10.21437\/Interspeech.2021-430"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Luo, Y., Han, C., Mesgarani, N., Ceolini, E., and Liu, S.C. (2019, January 14\u201318). FaSNet: Low-Latency Adaptive Beamforming for Multi-Microphone Audio Processing. Proceedings of the 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Singapore.","DOI":"10.1109\/ASRU46091.2019.9003849"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"1541","DOI":"10.1109\/LSP.2022.3188178","article-title":"Distributed Microphones Speech Separation by Learning Spatial Information With Recurrent Neural Network","volume":"29","author":"Xiang","year":"2022","journal-title":"IEEE Signal Process. Lett."},{"key":"ref_26","unstructured":"Ba, J.L., Kiros, J.R., and Hinton, G.E. (2016). Layer normalization. arXiv."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2015, January 7\u201313). Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.123"},{"key":"ref_28","unstructured":"Ballas, N., Yao, L., Pal, C., and Courville, A. (2015). Delving Deeper into Convolutional Networks for Learning Video Representations. arXiv."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"1270","DOI":"10.1109\/TASLP.2021.3064421","article-title":"Dense CNN With Self-Attention for Time-Domain Speech Enhancement","volume":"29","author":"Pandey","year":"2021","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"1901","DOI":"10.1109\/TASLP.2017.2726762","article-title":"Multitalker Speech Separation With Utterance-Level Permutation Invariant Training of Deep Recurrent Neural Networks","volume":"25","author":"Yu","year":"2017","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Panayotov, V., Chen, G., Povey, D., and Khudanpur, S. (2015, January 19\u201324). Librispeech: An ASR Corpus Based on Public Domain Audio Books. Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), South Brisbane, QLD, Australia.","DOI":"10.1109\/ICASSP.2015.7178964"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Reddy, C.K.A., Gopal, V., Cutler, R., Beyrami, E., CHENG, R., Dubey, H., Matusevych, S., Aichner, R., Aazami, A., and Braun, S. (2020, January 25\u201329). The INTERSPEECH 2020 Deep Noise Suppression Challenge: Datasets, Subjective Testing Framework, and Challenge Results. Proceedings of the Interspeech 2020, Shanghai, China.","DOI":"10.21437\/Interspeech.2020-3038"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"247","DOI":"10.1016\/0167-6393(93)90095-3","article-title":"Assessment for Automatic Speech Recognition: II. NOISEX-92: A Database and an Experiment to Study the Effect of Additive Noise on Speech Recognition Systems","volume":"12","author":"Varga","year":"1993","journal-title":"Speech Commun."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"1752","DOI":"10.1109\/TASLP.2021.3078640","article-title":"Group Communication With Context Codec for Lightweight Source Separation","volume":"29","author":"Luo","year":"2021","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"ref_35","unstructured":"Kingma, D.P., and Ba, J.L. (2015, January 7\u20139). Adam: A method for Stochastic Optimization. Proceedings of the 3rd International Conference on Learning Representations (ICLR 2015), San Diego, CA, USA."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"380","DOI":"10.1109\/TASLP.2019.2955276","article-title":"Learning Complex Spectral Mapping With Gated Convolutional Recurrent Networks for Monaural Speech Enhancement","volume":"28","author":"Tan","year":"2020","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"105","DOI":"10.1109\/LSP.2021.3128374","article-title":"A Nested U-Net With Self-Attention and Dense Connectivity for Monaural Speech Enhancement","volume":"29","author":"Xiang","year":"2022","journal-title":"IEEE Signal Process. Lett."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Luo, Y., Chen, Z., Mesgarani, N., and Yoshioka, T. (2020, January 4\u20138). End-to-end Microphone Permutation and Number Invariant Multi-channel Speech Separation. Proceedings of the ICASSP 2020\u20142020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain.","DOI":"10.1109\/ICASSP40776.2020.9054177"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Wang, Z., and Wang, D. (2020, January 4\u20138). Multi-Microphone Complex Spectral Mapping for Speech Dereverberation. Proceedings of the ICASSP 2020\u20142020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain.","DOI":"10.1109\/ICASSP40776.2020.9053610"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Luo, Y. (2021). A Time-domain Generalized Wiener Filter for Multi-channel Speech Separation. arXiv.","DOI":"10.1109\/TASLP.2022.3205750"},{"key":"ref_41","unstructured":"Lee, D., Kim, S., and Choi, J.W. (2021). Inter-channel Conv-TasNet for Multichannel Speech Enhancement. arXiv."},{"key":"ref_42","unstructured":"Rix, A.W., Beerends, J.G., Hollier, M.P., and Hekstra, A.P. (2001, January 7\u201311). Perceptual Evaluation of Speech Quality (PESQ)-A New Method for Speech Quality Assessment of Telephone Networks and Codecs. Proceedings of the 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing, Salt Lake City, UT, USA."},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"2125","DOI":"10.1109\/TASL.2011.2114881","article-title":"An Algorithm for Intelligibility Prediction of Time\u2013Frequency Weighted Noisy Speech","volume":"19","author":"Taal","year":"2011","journal-title":"IEEE Trans. Audio Speech Lang. Process."}],"container-title":["Symmetry"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2073-8994\/15\/2\/261\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T18:08:20Z","timestamp":1760119700000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2073-8994\/15\/2\/261"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,1,17]]},"references-count":43,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2023,2]]}},"alternative-id":["sym15020261"],"URL":"https:\/\/doi.org\/10.3390\/sym15020261","relation":{},"ISSN":["2073-8994"],"issn-type":[{"type":"electronic","value":"2073-8994"}],"subject":[],"published":{"date-parts":[[2023,1,17]]}}}