{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,27]],"date-time":"2025-11-27T06:37:56Z","timestamp":1764225476277,"version":"build-2065373602"},"reference-count":49,"publisher":"MDPI AG","issue":"11","license":[{"start":{"date-parts":[[2018,11,2]],"date-time":"2018-11-02T00:00:00Z","timestamp":1541116800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100003725","name":"National Research Foundation of Korea","doi-asserted-by":"publisher","award":["2011-0030079"],"award-info":[{"award-number":["2011-0030079"]}],"id":[{"id":"10.13039\/501100003725","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Personalized emotion recognition provides an individual training model for each target user in order to mitigate the accuracy problem when using general training models collected from multiple users. Existing personalized speech emotion recognition research has a cold-start problem that requires a large amount of emotionally-balanced data samples from the target user when creating the personalized training model. Such research is difficult to apply in real environments due to the difficulty of collecting numerous target user speech data with emotionally-balanced label samples. Therefore, we propose the Robust Personalized Emotion Recognition Framework with the Adaptive Data Boosting Algorithm to solve the cold-start problem. The proposed framework incrementally provides a customized training model for the target user by reinforcing the dataset by combining the acquired target user speech with speech from other users, followed by applying SMOTE (Synthetic Minority Over-sampling Technique)-based data augmentation. The proposed method proved to be adaptive across a small number of target user datasets and emotionally-imbalanced data environments through iterative experiments using the IEMOCAP (Interactive Emotional Dyadic Motion Capture) database.<\/jats:p>","DOI":"10.3390\/s18113744","type":"journal-article","created":{"date-parts":[[2018,11,5]],"date-time":"2018-11-05T04:26:39Z","timestamp":1541391999000},"page":"3744","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":21,"title":["Adaptive Data Boosting Technique for Robust Personalized Speech Emotion in Emotionally-Imbalanced Small-Sample Environments"],"prefix":"10.3390","volume":"18","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3675-2258","authenticated-orcid":false,"given":"Jaehun","family":"Bang","sequence":"first","affiliation":[{"name":"Department of Computer Science and Engineering, Kyung Hee University, (Global Campus), 1732, Deogyeong-daero, Giheung-gu, Yongin-si, Gyeonggi-do 17104, Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Taeho","family":"Hur","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, Kyung Hee University, (Global Campus), 1732, Deogyeong-daero, Giheung-gu, Yongin-si, Gyeonggi-do 17104, Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Dohyeong","family":"Kim","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, Kyung Hee University, (Global Campus), 1732, Deogyeong-daero, Giheung-gu, Yongin-si, Gyeonggi-do 17104, Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9172-2935","authenticated-orcid":false,"given":"Thien","family":"Huynh-The","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, Kyung Hee University, (Global Campus), 1732, Deogyeong-daero, Giheung-gu, Yongin-si, Gyeonggi-do 17104, Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jongwon","family":"Lee","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, Kyung Hee University, (Global Campus), 1732, Deogyeong-daero, Giheung-gu, Yongin-si, Gyeonggi-do 17104, Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yongkoo","family":"Han","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, Kyung Hee University, (Global Campus), 1732, Deogyeong-daero, Giheung-gu, Yongin-si, Gyeonggi-do 17104, Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Oresti","family":"Banos","sequence":"additional","affiliation":[{"name":"Department of Computer Architecture and Computer Technology, University of Granada, C\/Periodista Daniel Saucedo Aranda s\/n, E-18071 Granada, Spain"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jee-In","family":"Kim","sequence":"additional","affiliation":[{"name":"Department of Smart ICT Convergence, Konkuk University, 120 Neungdong-ro, Gwangjin-gu, Seoul 05029, Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sungyoung","family":"Lee","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, Kyung Hee University, (Global Campus), 1732, Deogyeong-daero, Giheung-gu, Yongin-si, Gyeonggi-do 17104, Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2018,11,2]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Baltru\u0161aitis, T., Ahuja, C., and Morency, L.P. (2018). Multimodal Machine Learning: A Survey and Taxonomy. IEEE Trans. Pattern Anal. Mach. Intell.","DOI":"10.1109\/TPAMI.2018.2798607"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"99","DOI":"10.1007\/s12193-015-0195-2","article-title":"Emonets: Multimodal deep learning approaches for emotion recognition in video","volume":"10","author":"Kahou","year":"2016","journal-title":"J. Multimodal User Interfaces"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"90","DOI":"10.1145\/3129340","article-title":"Speech emotion recognition: Two decades in a nutshell, benchmarks, and ongoing trends","volume":"61","author":"Schuller","year":"2018","journal-title":"Commun. ACM"},{"key":"ref_4","first-page":"62","article-title":"Emotional Prediction and Content Profile Estimation in Evaluating Audiovisual Mediated Communication","volume":"2","author":"Kotsakis","year":"2014","journal-title":"Int. J. Monit. Surveill. Technol. Res."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"1458","DOI":"10.3390\/s150101458","article-title":"Time-frequency feature representation using multi-resolution texture analysis and acoustic activity detector for real-life speech emotion recognition","volume":"15","author":"Wang","year":"2015","journal-title":"Sensors"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Zhu, L., Chen, L., Zhao, D., Zhou, J., and Zhang, W. (2017). Emotion recognition from chinese speech for smart affective services using a combination of SVM and DBN. Sensors, 17.","DOI":"10.3390\/s17071694"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Xiao, Z., Dellandr\u00e9a, E., Chen, L., and Dou, W. (2009, January 10\u201312). Recognition of emotions in speech by a hierarchical approach. Proceedings of the 2009 3rd International Conference on Affective Computing and Intelligent Interaction and Workshops, Amsterdam, The Netherlands.","DOI":"10.1109\/ACII.2009.5349587"},{"key":"ref_8","first-page":"401","article-title":"A Study on the Improvement of Emotion Recognition by Gender Discrimination","volume":"45","author":"Cho","year":"2008","journal-title":"J. IEEK"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"98","DOI":"10.1016\/j.inffus.2017.02.003","article-title":"A review of affective computing: From unimodal analysis to multimodal fusion","volume":"37","author":"Poria","year":"2017","journal-title":"Inf. Fusion"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Trigeorgis, G., Ringeval, F., Brueckner, R., Marchi, E., Nicolaou, M.A., Schuller, B., and Zafeiriou, S. (2016, January 20\u201325). Adieu features? end-to-end speech emotion recognition using a deep convolutional recurrent network. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Shanghai, China.","DOI":"10.1109\/ICASSP.2016.7472669"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"1576","DOI":"10.1109\/TMM.2017.2766843","article-title":"Speech Emotion Recognition Using Deep Convolutional Neural Network and Discriminant Temporal Pyramid Matching","volume":"20","author":"Zhang","year":"2018","journal-title":"IEEE Trans. Multimed."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Liu, Z.T., Wu, M., Cao, W.H., Mao, J.W., Xu, J.P., and Tan, G.Z. (2017). Speech emotion recognition based on feature selection and extreme learning machine decision tree. Neurocomputing.","DOI":"10.1016\/j.neucom.2017.07.050"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"1028","DOI":"10.1016\/j.neucom.2017.09.049","article-title":"Efficient and effective strategies for cross-corpus acoustic emotion recognition","volume":"275","author":"Kaya","year":"2018","journal-title":"Neurocomputing"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"562","DOI":"10.17743\/jaes.2017.0022","article-title":"Supervised Vocal-Based Emotion Recognition Using Multiclass Support Vector Machine, Random Forests and Adaboost","volume":"65","author":"Noroozi","year":"2017","journal-title":"J. Audio Eng. Soc."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Gosztolya, G., Busa-Fekete, R., and Toth, L. (2013, January 25\u201329). Detecting Autism, Emotions and Social Signals Using AdaBoost. Proceedings of the Interspeech, Lyon, France.","DOI":"10.21437\/Interspeech.2013-71"},{"key":"ref_16","unstructured":"Liu, T., Fang, S., Zhao, Y., Wang, P., and Zhang, J. (arXiv, 2015). Implementation of training convolutional neural networks, arXiv."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Emer\u0161i\u010d, \u017d., \u0160tepec, D., \u0160truc, V., and Peer, P. (arXiv, 2017). Training convolutional neural networks with limited training data for ear recognition in the wild, arXiv.","DOI":"10.1109\/FG.2017.123"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Vryzas, N., Liatsou, A., Kotsakis, R., Dimoulas, C., and Kalliris, G. (2017, January 23\u201326). Augmenting Drama: A Speech Emotion-Controlled Stage Lighting Framework. Proceedings of the AudioMostly 2017 Conference, London, UK.","DOI":"10.1145\/3123514.3123557"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"457","DOI":"10.17743\/jaes.2018.0036","article-title":"Speech Emotion Recognition for Performance Interaction","volume":"66","author":"Vryzas","year":"2018","journal-title":"J. Audio Eng. Soc."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Abdelwahab, M., and Busso, C. (2015, January 19\u201324). Supervised domain adaptation for emotion recognition from speech. Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brisbane, QLD, Australia.","DOI":"10.1109\/ICASSP.2015.7178934"},{"key":"ref_21","unstructured":"Shinoda, K. (2011, January 18\u201321). Speaker adaptation techniques for speech recognition using probabilistic models. Proceedings of the APSIPA ASC 2011, Xi\u2019an, China."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"386","DOI":"10.1109\/T-AFFC.2013.26","article-title":"Iterative Feature Normalization Scheme for Automatic Emotion Detection from Speech","volume":"4","author":"Busso","year":"2013","journal-title":"IEEE Trans. Affect. Comput."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Busso, C., Metallinou, A., and Narayanan, S. (2011, January 22\u201327). Iterative feature normalization for emotional speech detection. Proceedings of the 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic.","DOI":"10.1109\/ICASSP.2011.5947652"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Wang, Y., Du, S., and Zhan, Y. (2008, January 18\u201320). Adaptive and optimal classification of speech emotion recognition. Proceedings of the 2008 Fourth International Conference on Natural Computation 2008, ICNC\u201908, Jinan, China.","DOI":"10.1109\/ICNC.2008.713"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Mao, Q., Xue, W., Rao, Q., Zhang, F., and Zhan, Y. (2016, January 20\u201325). Domain adaptation for speech emotion recognition by sharing priors between related source and target classes. Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Shanghai, China.","DOI":"10.1109\/ICASSP.2016.7472149"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Mishra, T., and Dimitriadis, D. (2013, January 25\u201329). Incremental emotion recognition. Proceedings of the Interspeech, Lyon, France.","DOI":"10.21437\/Interspeech.2013-254"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Abdelwahab, M., and Busso, C. (2017, January 5\u20139). Incremental adaptation using active learning for acoustic emotion recognition. Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA.","DOI":"10.1109\/ICASSP.2017.7953140"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"126","DOI":"10.1016\/j.engappai.2016.02.018","article-title":"Multistage data selection-based unsupervised speaker adaptation for personalized speech emotion recognition","volume":"52","author":"Kim","year":"2016","journal-title":"Eng. Appl. Artif. Intell."},{"key":"ref_29","unstructured":"McKay, C., Fujinaga, I., and Depalle, P. (2005, January 11\u201315). jAudio: A feature extraction library. Proceedings of the International Conference on Music Information Retrieval, London, UK."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"27","DOI":"10.5815\/ijigsp.2014.06.04","article-title":"Silence Removal and Endpoint Detection of Speech Signal for Text Independent Speaker Identification","volume":"6","author":"Sahoo","year":"2014","journal-title":"Int. J. Image, Graph. Signal Process."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Vryzas, N., Vrysis, L., Kotsakis, R., and Dimoulas, C. (2018, January 6\u20137). Speech Emotion Recognition Adapted to Multimodal Semantic Repositories. Proceedings of the 13th International Workshop on Semantic and Social Media Adaptation and Personalization, Zaragoza, Spain.","DOI":"10.1109\/SMAP.2018.8501881"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Anagnostopoulos, C.N., and Iliou, T. (2010). Towards emotion recognition from speech: Definition, problems and the materials of research. Semantics in Adaptive and Personalized Services, Springer.","DOI":"10.1007\/978-3-642-11684-1_8"},{"key":"ref_33","first-page":"400","article-title":"A Review: Speech Emotion Recognition","volume":"6","author":"Peerzade","year":"2018","journal-title":"Int. J. Comput. Sci. Eng."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Chamoli, A., Semwal, A., and Saikia, N. (2017, January 19\u201320). Detection of emotion in analysis of speech using linear predictive coding techniques (LPC). Proceedings of the 2017 International Conference on IEEE Inventive Systems and Control (ICISC), Coimbatore, India.","DOI":"10.1109\/ICISC.2017.8068642"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"321","DOI":"10.1613\/jair.953","article-title":"SMOTE: Synthetic minority over-sampling technique","volume":"16","author":"Chawla","year":"2002","journal-title":"J. Artif. Intell. Res."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Ijaz, M., Alfian, G., Syafrudin, M., and Rhee, J. (2018). Hybrid Prediction Model for Type 2 Diabetes and Hypertension Using DBSCAN-Based Outlier Detection, Synthetic Minority Over Sampling Technique (SMOTE), and Random Forest. Appl. Sci., 8.","DOI":"10.3390\/app8081325"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"415","DOI":"10.1002\/ajpa.1330860307","article-title":"Euclidean distance matrix analysis: A coordinate-free approach for comparing biological shapes using landmark data","volume":"86","author":"Lele","year":"1991","journal-title":"Am. J. Phys. Anthropol."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Trampe, D., Quoidbach, J., and Taquet, M. (2015). Emotions in everyday life. PLoS ONE, 10.","DOI":"10.1371\/journal.pone.0145450"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"685","DOI":"10.1080\/02664760500079464","article-title":"A generalized normal distribution","volume":"32","author":"Nadarajah","year":"2005","journal-title":"J. Appl. Stat."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Huang, C., Liang, R., Wang, Q., Xi, J., Zha, C., and Zhao, L. (2013). Practical speech emotion recognition based on online learning: From acted data to elicited data. Math. Probl. Eng.","DOI":"10.1155\/2013\/265819"},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"315","DOI":"10.1016\/j.ipm.2008.09.003","article-title":"Acoustic feature selection for automatic emotion recognition from speech","volume":"45","author":"Rong","year":"2009","journal-title":"Inf. Process. Manag."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"335","DOI":"10.1007\/s10579-008-9076-6","article-title":"IEMOCAP: Interactive emotional dyadic motion capture database","volume":"42","author":"Busso","year":"2008","journal-title":"Lang. Resour. Eval."},{"key":"ref_43","unstructured":"Chernykh, V., Sterling, G., and Prihodko, P. (arXiv, 2017). Emotion recognition from speech with recurrent neural networks, arXiv."},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Martin, O., Kotsia, I., Macq, B., and Pitas, I. (2006, January 3\u20137). The enterface\u201905 audio-visual emotion database. Proceedings of the 22nd International Conference on IEEE Data Engineering Workshops, Atlanta, GA, USA.","DOI":"10.1109\/ICDEW.2006.145"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Schuller, B., Steidl, S., and Batliner, A. (2009, January 6\u201310). The interspeech 2009 emotion challenge. Proceedings of the Tenth Annual Conference of the International Speech Communication Association, Brighton, UK.","DOI":"10.21437\/Interspeech.2009-103"},{"key":"ref_46","unstructured":"Jackson, P., and Haq, S. (2014). Surrey Audio-Visual Expressed Emotion (Savee) Database, University of Surrey."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Hoens, T.R., and Chawla, N.V. (2013). Imbalanced Datasets: From Sampling to Classifiers. Imbalanced Learning: Foundations, Algorithms, and Applications, Wiley.","DOI":"10.1002\/9781118646106.ch3"},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"10","DOI":"10.1145\/1656274.1656278","article-title":"The WEKA data mining software: An update","volume":"11","author":"Hall","year":"2009","journal-title":"ACM SIGKDD Explor. Newsl."},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Chavhan, Y.D., Yelure, B.S., and Tayade, K.N. (2015, January 26\u201327). Speech emotion recognition using RBF kernel of LIBSVM. Proceedings of the 2015 2nd International Conference on IEEE Electronics and Communication Systems (ICECS), Coimbatore, India.","DOI":"10.1109\/ECS.2015.7124760"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/18\/11\/3744\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T15:27:42Z","timestamp":1760196462000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/18\/11\/3744"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,11,2]]},"references-count":49,"journal-issue":{"issue":"11","published-online":{"date-parts":[[2018,11]]}},"alternative-id":["s18113744"],"URL":"https:\/\/doi.org\/10.3390\/s18113744","relation":{},"ISSN":["1424-8220"],"issn-type":[{"type":"electronic","value":"1424-8220"}],"subject":[],"published":{"date-parts":[[2018,11,2]]}}}