{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,21]],"date-time":"2026-02-21T20:21:26Z","timestamp":1771705286780,"version":"3.50.1"},"reference-count":38,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2022,12,30]],"date-time":"2022-12-30T00:00:00Z","timestamp":1672358400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Henan Province Key Scientific Research Projects Plan of Colleges and Universities","award":["22A520004"],"award-info":[{"award-number":["22A520004"]}]},{"name":"Henan Province Key Scientific Research Projects Plan of Colleges and Universities","award":["22A510001"],"award-info":[{"award-number":["22A510001"]}]},{"name":"Henan Province Key Scientific Research Projects Plan of Colleges and Universities","award":["22A510013"],"award-info":[{"award-number":["22A510013"]}]},{"name":"Henan Province Key Scientific Research Projects Plan of Colleges and Universities","award":["61975053"],"award-info":[{"award-number":["61975053"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["22A520004"],"award-info":[{"award-number":["22A520004"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["22A510001"],"award-info":[{"award-number":["22A510001"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["22A510013"],"award-info":[{"award-number":["22A510013"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61975053"],"award-info":[{"award-number":["61975053"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>The absence of labeled samples limits the development of speech emotion recognition (SER). Data augmentation is an effective way to address sample sparsity. However, there is a lack of research on data augmentation algorithms in the field of SER. In this paper, the effectiveness of classical acoustic data augmentation methods in SER is analyzed, based on which a strong generalized speech emotion recognition model based on effective data augmentation is proposed. The model uses a multi-channel feature extractor consisting of multiple sub-networks to extract emotional representations. Different kinds of augmented data that can effectively improve SER performance are fed into the sub-networks, and the emotional representations are obtained by the weighted fusion of the output feature maps of each sub-network. And in order to make the model robust to unseen speakers, we employ adversarial training to generalize emotion representations. A discriminator is used to estimate the Wasserstein distance between the feature distributions of different speakers and to force the feature extractor to learn the speaker-invariant emotional representations by adversarial training. 
The simulation experimental results on the IEMOCAP corpus show that the performance of the proposed method is 2\u20139% ahead of the related SER algorithm, which proves the effectiveness of the proposed method.<\/jats:p>","DOI":"10.3390\/e25010068","type":"journal-article","created":{"date-parts":[[2022,12,30]],"date-time":"2022-12-30T03:31:17Z","timestamp":1672371077000},"page":"68","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":7,"title":["Strong Generalized Speech Emotion Recognition Based on Effective Data Augmentation"],"prefix":"10.3390","volume":"25","author":[{"given":"Huawei","family":"Tao","sequence":"first","affiliation":[{"name":"Key Laboratory of Food Information Processing and Control, Ministry of Education, Henan University of Technology, Zhengzhou 450001, China"},{"name":"Henan Engineering Laboratory of Grain IOT Technology, Henan University of Technology, Zhengzhou 450001, China"}]},{"given":"Shuai","family":"Shan","sequence":"additional","affiliation":[{"name":"Key Laboratory of Food Information Processing and Control, Ministry of Education, Henan University of Technology, Zhengzhou 450001, China"}]},{"given":"Ziyi","family":"Hu","sequence":"additional","affiliation":[{"name":"Key Laboratory of Food Information Processing and Control, Ministry of Education, Henan University of Technology, Zhengzhou 450001, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2672-5483","authenticated-orcid":false,"given":"Chunhua","family":"Zhu","sequence":"additional","affiliation":[{"name":"Key Laboratory of Food Information Processing and Control, Ministry of Education, Henan University of Technology, Zhengzhou 450001, China"},{"name":"Henan Engineering Laboratory of Grain IOT Technology, Henan University of Technology, Zhengzhou 450001, China"}]},{"given":"Hongyi","family":"Ge","sequence":"additional","affiliation":[{"name":"Key Laboratory of Food Information Processing and Control, Ministry of Education, Henan University of Technology, Zhengzhou 450001, China"},{"name":"Henan Engineering Laboratory of Grain IOT Technology, Henan University of Technology, Zhengzhou 450001, China"}]}],"member":"1968","published-online":{"date-parts":[[2022,12,30]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"13","DOI":"10.1111\/ecc.13033","article-title":"Automated screening for distress: A perspective for the future","volume":"28","author":"Rana","year":"2019","journal-title":"Eur. J. Cancer Care"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"695","DOI":"10.1109\/TASLP.2022.3145287","article-title":"Multi-Classifier Interactive Learning for Ambiguous Speech Emotion Recognition","volume":"30","author":"Zhou","year":"2022","journal-title":"IEEE-ACM Trans. Audio Speech Lang."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"19","DOI":"10.1016\/j.compedu.2019.103649","article-title":"Affective computing in education: A systematic review and future research","volume":"142","author":"Yadegaridehkordi","year":"2019","journal-title":"Comput. Educ."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Aldeneh, Z., and Provost, E.M. (2017, January 5\u20139). Using regional saliency for speech emotion recognition. 
Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA.","DOI":"10.1109\/ICASSP.2017.7952655"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"67","DOI":"10.1109\/TAFFC.2016.2515617","article-title":"MSP-IMPROV: An Acted Corpus of Dyadic Interactions to Study Emotion Perception","volume":"8","author":"Busso","year":"2017","journal-title":"IEEE Trans. Affect. Comput."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Li, H., Tu, M., Huang, J., Narayanan, S., and Georgiou, P. (2020, January 4\u20138). Speaker-Invariant Affective Representation Learning via Adversarial Training. Proceedings of the ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain.","DOI":"10.1109\/ICASSP40776.2020.9054580"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Braunschweiler, N., Doddipatla, R., Keizer, S., and Stoyanchev, S. (2021, January 13\u201317). A Study on Cross-Corpus Speech Emotion Recognition and Data Augmentation. Proceedings of the 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Cartagena, Colombia.","DOI":"10.1109\/ASRU51503.2021.9687987"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Mujaddidurrahman, A., Ernawan, F., Wibowo, A., Sarwoko, E.A., Sugiharto, A., and Wahyudi, M.D.R. (2021, January 24\u201326). Speech Emotion Recognition Using 2D-CNN with Data Augmentation. Proceedings of the 2021 International Conference on Software Engineering & Computer Systems and 4th International Conference on Computational Science and Information Management (ICSECS-ICOCSIM), Pekan, Malaysia.","DOI":"10.1109\/ICSECS52883.2021.00130"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Burkhardt, F., Paeschke, A., Rolfes, M., Sendlmeier, W.F., and Weiss, B. (2005, January 4\u20138). A database of German emotional speech. Proceedings of the Interspeech, Lisbon, Portugal.","DOI":"10.21437\/Interspeech.2005-446"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Liu, J., and Wang, H. (2021, January 6\u201311). A Speech Emotion Recognition Framework for Better Discrimination of Confusions. Proceedings of the Interspeech, Toronto, ON, Canada.","DOI":"10.21437\/Interspeech.2021-718"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Park, D.S., Chan, W., Zhang, Y., Chiu, C.-C., Zoph, B., Cubuk, E.D., and Le, Q.V. (2019). SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition. arXiv.","DOI":"10.21437\/Interspeech.2019-2680"},{"key":"ref_12","unstructured":"Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. (2014). Generative Adversarial Networks. arXiv."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Chatziagapi, A., Paraskevopoulos, G., Sgouropoulos, D., Pantazopoulos, G., Nikandrou, M., Giannakopoulos, T., Katsamanis, A., Potamianos, A., and Narayanan, S. (2019, January 15\u201319). Data Augmentation Using GANs for Speech Emotion Recognition. Proceedings of the Interspeech, Graz, Austria.","DOI":"10.21437\/Interspeech.2019-2561"},{"key":"ref_14","unstructured":"Mariani, G., Scheidegger, F., Istrate, R., Bekas, C., and Malossi, C. (2018). BAGAN: Data Augmentation with Balancing GAN. arXiv."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Yi, L., and Mak, M.W. (2019, January 18\u201321). Adversarial Data Augmentation Network for Speech Emotion Recognition. 
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Lanzhou, China.","DOI":"10.1109\/APSIPAASC47483.2019.9023347"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"107646","DOI":"10.1016\/j.patcog.2020.107646","article-title":"Tackling mode collapse in multi-generator GANs with orthogonal vectors","volume":"110","author":"Li","year":"2021","journal-title":"Pattern Recognit."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"125868","DOI":"10.1109\/ACCESS.2019.2938007","article-title":"Speech Emotion Recognition from 3D Log-Mel Spectrograms with Deep Learning Network","volume":"7","author":"Meng","year":"2019","journal-title":"IEEE Access"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"1803","DOI":"10.1109\/TASLP.2022.3171965","article-title":"ISNet: Individual Standardization Network for Speech Emotion Recognition","volume":"30","author":"Fan","year":"2022","journal-title":"IEEE-ACM Trans. Audio Speech Lang."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Xu, Y., Kong, Q., Wang, W., and Plumbley, M.D. (2018, January 15\u201320). Large-Scale Weakly Supervised Audio Classification Using Gated Convolutional Neural Network. Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada.","DOI":"10.1109\/ICASSP.2018.8461975"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Gui, J., Li, Y., Chen, K., Siebert, J., and Chen, Q. (2022, January 23\u201327). End-to-End ASR-Enhanced Neural Network for Alzheimer\u2019s Disease Diagnosis. Proceedings of the ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore.","DOI":"10.1109\/ICASSP43922.2022.9747856"},{"key":"ref_21","unstructured":"Arjovsky, M., Chintala, S., and Bottou, L. (2017). Wasserstein GAN. arXiv."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Lounnas, K., Lichouri, M., and Abbas, M. (2022, January 17\u201318). Analysis of the Effect of Audio Data Augmentation Techniques on Phone Digit Recognition for Algerian Arabic Dialect. Proceedings of the 2022 International Conference on Advanced Aspects of Software Engineering (ICAASE), Constantine, Algeria.","DOI":"10.1109\/ICAASE56196.2022.9931574"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Hailu, N., Siegert, I., and N\u00fcrnberger, A. (2020, January 21\u201324). Improving Automatic Speech Recognition Utilizing Audio-codecs for Data Augmentation. Proceedings of the 2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP), Tampere, Finland.","DOI":"10.1109\/MMSP48831.2020.9287127"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Zhao, W., and Yin, B. (2022, January 8\u20139). Environmental sound classification based on pitch shifting. Proceedings of the 2022 International Seminar on Computer Science and Engineering Technology (SCSET), Indianapolis, IN, USA.","DOI":"10.1109\/SCSET55041.2022.00070"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Lin, W., and Mak, M.W. (2022, January 23\u201327). Robust Speaker Verification Using Population-Based Data Augmentation. Proceedings of the ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore.","DOI":"10.1109\/ICASSP43922.2022.9746956"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). 
Deep Residual Learning for Image Recognition. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Han, S., Leng, F., and Jin, Z. (2021, January 14\u201316). Speech Emotion Recognition with a ResNet-CNN-Transformer Parallel Neural Network. Proceedings of the 2021 International Conference on Communications, Information System and Computer Engineering (CISCE), Beijing, China.","DOI":"10.1109\/CISCE52179.2021.9445906"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"1675","DOI":"10.1109\/TASLP.2021.3076364","article-title":"Speech Emotion Recognition Considering Nonverbal Vocalization in Affective Conversations","volume":"29","author":"Hsu","year":"2021","journal-title":"IEEE-ACM Trans. Audio Speech Lang."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Jiang, X., Guo, Y., Xiong, X., and Tian, H. (2021, January 10\u201312). A Speech Emotion Recognition Method Based on Improved Residual Network. Proceedings of the 2021 3rd International Academic Exchange Conference on Science and Technology Innovation (IAECST), Guangzhou, China.","DOI":"10.1109\/IAECST54258.2021.9695727"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Luo, D., Zou, Y., and Huang, D. (2017, January 12\u201315). Speech emotion recognition via ensembling neural networks. Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Kuala Lumpur, Malaysia.","DOI":"10.1109\/APSIPA.2017.8282242"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Xu, M., Zhang, F., and Khan, S.U. (2020, January 6\u20138). Improve Accuracy of Speech Emotion Recognition with Attention Head Fusion. Proceedings of the 2020 10th Annual Computing and Communication Workshop and Conference (CCWC), Las Vegas, NV, USA.","DOI":"10.1109\/CCWC47524.2020.9031207"},{"key":"ref_32","unstructured":"Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., and Courville, A. (2017). Improved Training of Wasserstein GANs. arXiv."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"335","DOI":"10.1007\/s10579-008-9076-6","article-title":"IEMOCAP: Interactive emotional dyadic motion capture database","volume":"42","author":"Busso","year":"2008","journal-title":"Lang. Resour. Eval."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.specom.2013.07.011","article-title":"Compensating for speaker or lexical variabilities in speech for emotion recognition","volume":"57","author":"Mariooryad","year":"2014","journal-title":"Speech Commun."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Chenchah, F., and Lachiri, Z. (2019, January 16\u201318). Impact of emotion type on emotion recognition through vocal channel. Proceedings of the 2019 International Conference on Signal, Control and Communication (SCC), Hammamet, Tunisia.","DOI":"10.1109\/SCC47175.2019.9116103"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Mirsamadi, S., Barsoum, E., and Zhang, C. (2017, January 5\u20139). Automatic speech emotion recognition using recurrent neural networks with local attention. 
Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA.","DOI":"10.1109\/ICASSP.2017.7952552"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"360","DOI":"10.1109\/TAFFC.2017.2730187","article-title":"Emotion Classification Using Segmentation of Vowel-Like and Non-Vowel-Like Regions","volume":"10","author":"Deb","year":"2019","journal-title":"IEEE Trans. Affect. Comput."},{"key":"ref_38","first-page":"2579","article-title":"Visualizing Data using t-SNE","volume":"9","author":"Hinton","year":"2008","journal-title":"J. Mach. Learn. Res."}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/25\/1\/68\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T01:56:21Z","timestamp":1760147781000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/25\/1\/68"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,12,30]]},"references-count":38,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2023,1]]}},"alternative-id":["e25010068"],"URL":"https:\/\/doi.org\/10.3390\/e25010068","relation":{},"ISSN":["1099-4300"],"issn-type":[{"value":"1099-4300","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,12,30]]}}}
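The record above is a Crossref works-API response: a {"status", "message-type", "message"} envelope around the bibliographic payload. Below is a minimal Python sketch of retrieving and parsing such a record, assuming the public api.crossref.org REST endpoint and the third-party requests package; all field names used are taken from the record itself.

import requests

DOI = "10.3390/e25010068"

# Fetch the work record; Crossref wraps the payload in the envelope shown above.
resp = requests.get(f"https://api.crossref.org/works/{DOI}", timeout=30)
resp.raise_for_status()
work = resp.json()["message"]

# "title" arrives as a list; "author" as objects with "given"/"family" parts.
title = work["title"][0]
authors = [f'{a.get("given", "")} {a.get("family", "")}'.strip()
           for a in work.get("author", [])]

print(title)
print(", ".join(authors))
print(f'{work.get("reference-count", 0)} references; '
      f'cited {work.get("is-referenced-by-count", 0)} times')

Since Crossref records are sparsely populated, any field beyond the ones shown here should be accessed with .get() rather than direct indexing.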