{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,1]],"date-time":"2025-11-01T05:44:07Z","timestamp":1761975847590,"version":"build-2065373602"},"reference-count":29,"publisher":"MDPI AG","issue":"11","license":[{"start":{"date-parts":[[2025,10,30]],"date-time":"2025-10-30T00:00:00Z","timestamp":1761782400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Innovative Funds Plan of Henan University 341 of Technology","award":["2022ZKCJ13"],"award-info":[{"award-number":["2022ZKCJ13"]}]},{"name":"Open Project of Scientific Research Platform of Henan University of 342 Technology Grain Information Processing Center","award":["KFJJ2023011"],"award-info":[{"award-number":["KFJJ2023011"]}]},{"name":"Natural Science Project of Henan 343 Provincial Department of Science and Technology, Technology Research Projects","award":["242102211027"],"award-info":[{"award-number":["242102211027"]}]},{"name":"Fund of the Institute of Complexity Science, Henan University of Technology","award":["CSKFJJ-2025-49"],"award-info":[{"award-number":["CSKFJJ-2025-49"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>In cross-corpus scenarios, inappropriate feature-processing methods tend to cause the loss of key emotional information. Additionally, deep neural networks contain substantial redundancy, which triggers domain shift issues and impairs the generalization ability of emotion recognition systems. To address these challenges, this study proposes a cross-corpus speech emotion recognition model based on attention-driven feature refinement and spatial reconstruction. Specifically, the proposed approach consists of three key components: first, an autoencoder integrated with a multi-head attention mechanism to enhance the model\u2019s ability to focus on the emotional components of acoustic features during the feature compression process of the autoencoder network; second, a feature refinement and spatial reconstruction module designed to further improve the extraction of emotional features, with a gating mechanism employed to optimize the feature reconstruction process; finally, the Charbonnier loss function adopted as the loss metric during training to minimize the difference between features from the source domain and target domain, thereby enhancing the cross-domain robustness of the model. 
Experimental results demonstrated that the proposed method achieved an average recognition accuracy of 46.75% across six sets of cross-corpus experiments, representing an improvement of 4.17% to 14.33% compared with traditional domain adaptation methods.<\/jats:p>","DOI":"10.3390\/info16110945","type":"journal-article","created":{"date-parts":[[2025,10,31]],"date-time":"2025-10-31T05:28:43Z","timestamp":1761888523000},"page":"945","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Cross-Corpus Speech Emotion Recognition Based on Attention-Driven Feature Refinement and Spatial Reconstruction"],"prefix":"10.3390","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9194-0297","authenticated-orcid":false,"given":"Huawei","family":"Tao","sequence":"first","affiliation":[{"name":"Key Laboratory of Grain Information Processing and Control, Ministry of Education, Henan University of Technology, Zhengzhou 450001, China"},{"name":"Henan Key Laboratory of Grain Photoelectric Detection and Control, Henan University of Technology, Zhengzhou 450001, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-0008-091X","authenticated-orcid":false,"given":"Yixing","family":"Jiang","sequence":"additional","affiliation":[{"name":"Key Laboratory of Grain Information Processing and Control, Ministry of Education, Henan University of Technology, Zhengzhou 450001, China"},{"name":"Henan Key Laboratory of Grain Photoelectric Detection and Control, Henan University of Technology, Zhengzhou 450001, China"}]},{"given":"Qianqian","family":"Li","sequence":"additional","affiliation":[{"name":"School of Mechanical and Electrical Engineering, Zhengzhou Business University, Zhengzhou 451200, China"}]},{"given":"Li","family":"Zhao","sequence":"additional","affiliation":[{"name":"School of Information Science and Engineering, Southeast University, Nanjing 210096, China"}]},{"given":"Zhizhe","family":"Yang","sequence":"additional","affiliation":[{"name":"Yunnan Chinese Language and Culture College, Yunnan Normal University, Kunming 650504, China"}]}],"member":"1968","published-online":{"date-parts":[[2025,10,30]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Zhang, S., Liu, R., Tao, X., and Zhao, X. (2021). Deep Cross-Corpus Speech Emotion Recognition: Recent Advances and Perspectives. Front. Neurorobot., 15.","DOI":"10.3389\/fnbot.2021.784514"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Singh, J., Saheer, L.B., and Faust, O. (2023). Speech Emotion Recognition Using Attention Model. Int. J. Environ. Res. Public Health, 20.","DOI":"10.3390\/ijerph20065140"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Latif, S., Rana, R., Khalifa, S., Jurdak, R., and Schuller, B. (2022). Self-Supervised Adversarial Domain Adaptation for Cross-Corpus and Cross-Language Speech Emotion Recognition. arXiv.","DOI":"10.1109\/TAFFC.2022.3167013"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"101586","DOI":"10.1016\/j.csl.2023.101586","article-title":"A Semi-Supervised High-Quality Pseudo Labels Algorithm Based on Multi-Constraint Optimization for Speech Deception Detection","volume":"85","author":"Tao","year":"2024","journal-title":"Comput. 
Speech  Lang."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"32","DOI":"10.1186\/s13636-022-00264-5","article-title":"Cross-Corpus Speech Emotion Recognition Using Subspace Learning and Domain Adaptation","volume":"2022","author":"Cao","year":"2022","journal-title":"J. Audio Speech Music Process."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Yang, J., Liu, J., Huang, K., Xia, J., Zhu, Z., and Zhang, H. (2024). Single- and Cross-Lingual Speech Emotion Recognition Based on WavLM Domain Emotion Embedding. Electronics, 13.","DOI":"10.3390\/electronics13071380"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Pastor, M.A., Ribas, D., Ortega, A., Miguel, A., and Lleida, E. (2023). Cross-Corpus Training Strategy for Speech Emotion Recognition Using Self-Supervised Representations. Appl. Sci., 13.","DOI":"10.3390\/app13169062"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Naeeni, N., and Nasersharif, B. (2025). Feature and Classifier-Level Domain Adaptation in DistilHuBERT for Cross-Corpus Speech Emotion Recognition. Comput. Biol. Med., 194.","DOI":"10.1016\/j.compbiomed.2025.110510"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"110814","DOI":"10.1016\/j.knosys.2023.110814","article-title":"Cross-Corpus Speech Emotion Recognition Using Transfer Learning and Attention-Based Fusion of Wav2Vec2 and Prosody Features","volume":"277","author":"Naderi","year":"2023","journal-title":"Knowl.-Based Syst."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"1564","DOI":"10.1109\/TCDS.2021.3123979","article-title":"Convolutional-Recurrent Neural Networks with Multiple Attention Mechanisms for Speech Emotion Recognition","volume":"14","author":"Jiang","year":"2021","journal-title":"IEEE Trans. Cogn. Dev. Syst."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Yu, S., Meng, J., Fan, W., Chen, Y., Zhu, B., Yu, H., Xie, Y., and Sun, Q. (2024). Speech Emotion Recognition Using Dual-Stream Representation and Cross-Attention Fusion. Electronics, 13.","DOI":"10.3390\/electronics13112191"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Schuller, B., Steidl, S., Batliner, A., Burkhardt, F., Devillers, L., M\u00fcller, C., and Narayanan, S.S. (2010, January 26\u201330). The Interspeech 2010 paralinguistic challenge. Proceedings of the Interspeech 2010, Makuhari, Japan.","DOI":"10.21437\/Interspeech.2010-739"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Eyben, F., Wollmer, M., and Schuller, B. (2010, January 25\u201329). Opensmile: The munich versatile and fast open-source audio feature extractor. Proceedings of the 18th ACM International Conference on Multimedia, Firenze, Italy.","DOI":"10.1145\/1873951.1874246"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Schuller, B., Steidl, S., and Batliner, A. (2009, January 6\u201310). The Interspeech 2009 Emotion Challenge. Proceedings of the INTERSPEECH 2009, 10th Annual Conference of the International Speech Communication Association, Brighton, UK.","DOI":"10.21437\/Interspeech.2009-103"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Schuller, B., Steidl, S., Batliner, A., Schiel, F., and Krajewski, J. (2011, January 27\u201331). The Interspeech 2011 Speaker State Challenge. 
Proceedings of the 12th Annual Conference of the International Speech Communication Association, Florence, Italy.","DOI":"10.21437\/Interspeech.2011-801"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Schuller, B., Steidl, S., Batliner, A., N\u00f6th, E., Vinciarelli, A., Burkhardt, F., van Son, R., Weninger, F., Eyben, F., and Bocklet, T. (2012, January 9\u201313). The Interspeech 2012 Speaker Trait Challenge. Proceedings of the 13th Annual Conference of the International Speech Communication Association, Portland, OR, USA.","DOI":"10.21437\/Interspeech.2012-86"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Schuller, B., Steidl, S., Batliner, A., Vinciarelli, A., Scherer, K., Ringeval, F., Chetouani, M., Weninger, F., Eyben, F., and Marchi, E. (2013, January 25\u201329). The Interspeech 2013 Computational Paralinguistics Challenge: Social Signal, Conflict, Emotion, Autism. Proceedings of the 14th Annual Conference of the International Speech Communication Association, Lyon, France.","DOI":"10.21437\/Interspeech.2013-56"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"190","DOI":"10.1109\/TAFFC.2015.2457417","article-title":"The Geneva Minimalistic Acoustic Parameter Set (GeMAPS) for Voice Research and Affective Computing","volume":"7","author":"Eyben","year":"2015","journal-title":"IEEE Trans. Affect. Comput."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Barkan, O., and Tsiris, D. (2019, January 12\u201317). Deep Synthesizer Parameter Estimation. Proceedings of the ICASSP 2019\u2014IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK.","DOI":"10.1109\/ICASSP.2019.8682964"},{"key":"ref_20","unstructured":"Wu, X., Xu, X., Liu, J., Wang, H., Hu, B., and Nie, F. (2019). Supervised Feature Selection with Orthogonal Regression and Feature Weighting. arXiv."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Li, J., Wen, Y., and He, L. (2023, January 18\u201322). SCConv: Spatial and Channel Reconstruction Convolution for Feature Redundancy. Proceedings of the 2023 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada.","DOI":"10.1109\/CVPR52729.2023.00596"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Yao, J., Zhu, Z., Yuan, M., Li, L., and Wang, M. (2025). The Detection of Maize Leaf Disease Based on an Improved Real-Time Detection Transformer Model. Symmetry, 17.","DOI":"10.3390\/sym17060808"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Burkhardt, F., Paeschke, A., Rolfes, M., Sendlmeier, W., and Weiss, B. (2005, January 4\u20138). A Database of German Emotional Speech. Proceedings of the Interspeech 2005-Eurospeech, 9th European Conference on Speech Communication and Technology, Lisbon, Portugal.","DOI":"10.21437\/Interspeech.2005-446"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Martin, O., Kotsia, I., Macq, B., and Pitas, I. (2006, January 3\u20137). The eNTERFACE\u201905 Audiovisual Emotion Database. Proceedings of the 22nd International Conference on Data Engineering Workshops (ICDEW\u201906), Washington, DC, USA.","DOI":"10.1109\/ICDEW.2006.145"},{"key":"ref_25","unstructured":"Tao, J., Liu, F., Zhang, M., and Jia, H. (2008, January 21). Design of Speech Corpus for Mandarin Text-to-Speech. 
Proceedings of the Blizzard Challenge 2008 Workshop, Brisbane, Australia."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"307","DOI":"10.1109\/TASLP.2019.2955252","article-title":"Transfer Sparse Discriminant Subspace Learning for Cross-Corpus Speech Emotion Recognition","volume":"28","author":"Zhang","year":"2019","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Zhang, J., Jiang, L., Zong, Y., Zheng, W., and Zhao, L. (2021, January 6\u201311). Cross-Corpus Speech Emotion Recognition Using Joint Distribution Adaptive Regression. Proceedings of the ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada.","DOI":"10.1109\/ICASSP39728.2021.9414372"},{"key":"ref_28","first-page":"3279","article-title":"Cross-Corpus Speech Emotion Recognition Based on Deep Autoencoder Subdomain Adaptation","volume":"38","author":"Zhuang","year":"2021","journal-title":"J. Comput. Appl. Res. (J. Comput. Appl.)"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Ganin, Y., Ustinova, E., Ajakan, H., Germain, P., Larochelle, H., Laviolette, F., Marchand, M., and Lempitsky, V. (2017). Domain-Adversarial Training of Neural Networks. Advances in Computer Vision and Pattern Recognition, Springer.","DOI":"10.1007\/978-3-319-58347-1_10"}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/16\/11\/945\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,11,1]],"date-time":"2025-11-01T05:32:48Z","timestamp":1761975168000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/16\/11\/945"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,10,30]]},"references-count":29,"journal-issue":{"issue":"11","published-online":{"date-parts":[[2025,11]]}},"alternative-id":["info16110945"],"URL":"https:\/\/doi.org\/10.3390\/info16110945","relation":{},"ISSN":["2078-2489"],"issn-type":[{"type":"electronic","value":"2078-2489"}],"subject":[],"published":{"date-parts":[[2025,10,30]]}}}
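
Note: the abstract names the Charbonnier loss as the training metric that minimizes the difference between source-domain and target-domain features. A minimal sketch of that loss follows, assuming a PyTorch setting; the function name charbonnier_loss, the epsilon default, the mean reduction, and the batch shapes are illustrative assumptions, as the record does not specify them.

    import torch

    def charbonnier_loss(source_feats: torch.Tensor,
                         target_feats: torch.Tensor,
                         eps: float = 1e-3) -> torch.Tensor:
        # Charbonnier penalty: a smooth, robust variant of the L1 distance,
        # sqrt(diff^2 + eps^2), averaged over all feature elements.
        # eps is a hypothetical default; the record gives no value.
        diff = source_feats - target_feats
        return torch.sqrt(diff * diff + eps * eps).mean()

    # Usage: align bottleneck features from two corpora (hypothetical shapes).
    src = torch.randn(32, 256)   # source-domain feature batch
    tgt = torch.randn(32, 256)   # target-domain feature batch
    loss = charbonnier_loss(src, tgt)

Unlike a plain L1 or L2 distance, the epsilon term keeps the gradient well behaved near zero while staying nearly linear for large differences, which is consistent with the abstract's goal of robust cross-domain feature alignment.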