{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,19]],"date-time":"2025-10-19T00:10:19Z","timestamp":1760832619246,"version":"build-2065373602"},"reference-count":40,"publisher":"MDPI AG","issue":"10","license":[{"start":{"date-parts":[[2025,10,16]],"date-time":"2025-10-16T00:00:00Z","timestamp":1760572800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100007782","name":"Tshwane University of Technology","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100007782","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Computers"],"abstract":"<jats:p>Emotion detection significantly impacts healthcare by enabling personalized patient care and improving treatment outcomes. Single-modality emotion recognition often lacks reliability due to the complexity and subjectivity of human emotions. This study proposes a multi-modal emotion detection platform integrating visual, audio, and heart rate data using AI techniques, including convolutional neural networks and support vector machines. The system outperformed single-modality approaches, demonstrating enhanced accuracy and robustness. This improvement underscores the value of multi-modal AI in emotion detection, offering potential benefits across healthcare, education, and human\u2013computer interaction.<\/jats:p>","DOI":"10.3390\/computers14100441","type":"journal-article","created":{"date-parts":[[2025,10,17]],"date-time":"2025-10-17T07:33:50Z","timestamp":1760686430000},"page":"441","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Multi-Modal Emotion Detection and Tracking System Using AI Techniques"],"prefix":"10.3390","volume":"14","author":[{"given":"Werner","family":"Mostert","sequence":"first","affiliation":[{"name":"Department of Electrical Engineering, Tshwane University of Technology, Pretoria 0001, South Africa"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7250-3665","authenticated-orcid":false,"given":"Anish","family":"Kurien","sequence":"additional","affiliation":[{"name":"Department of Electrical Engineering, Tshwane University of Technology, Pretoria 0001, South Africa"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6060-8200","authenticated-orcid":false,"given":"Karim","family":"Djouani","sequence":"additional","affiliation":[{"name":"Department of Electrical Engineering, Tshwane University of Technology, Pretoria 0001, South Africa"},{"name":"LISSI Laboratory, University Paris-Est Cr\u00e9teil (UPEC), 94400 Cr\u00e9teil, France"}]}],"member":"1968","published-online":{"date-parts":[[2025,10,16]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"199","DOI":"10.1097\/PTS.0b013e3181f6c01a","article-title":"Emotional influences in patient safety","volume":"6","author":"Croskerry","year":"2010","journal-title":"J. Patient Saf."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"121","DOI":"10.1007\/s00138-021-01249-8","article-title":"Early, intermediate and late fusion strategies for robust deep learning-based multimodal action recognition","volume":"32","author":"Boulahia","year":"2021","journal-title":"Mach. Vis. Appl."},{"key":"ref_3","first-page":"4101914","article-title":"MM-RNN: A Multimodal RNN for Precipitation Nowcasting","volume":"61","author":"Ma","year":"2023","journal-title":"IEEE Trans. 
Geosci. Remote. Sens."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Xie, B., Sidulova, M., and Park, C.H. (2021). Article robust multimodal emotion recognition from conversation with transformer-based crossmodality the title fusion. Sensors, 21.","DOI":"10.3390\/s21144913"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Younis, E.M., Zaki, S.M., Kanjo, E., and Houssein, E.H. (2022). Evaluating Ensemble Learning Methods for Multi-Modal Emotion Recognition Using Sensor Data Fusion. Sensors, 22.","DOI":"10.3390\/s22155611"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"1588","DOI":"10.1109\/TSMCB.2004.825930","article-title":"Facial expression recognition using constructive feedforward neural networks","volume":"34","author":"Ma","year":"2004","journal-title":"IEEE Trans. Syst. Man, Cybern. Part B (Cybern.)"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"1092","DOI":"10.1016\/j.asej.2014.04.011","article-title":"Sentiment analysis algorithms and applications: A survey","volume":"5","author":"Medhat","year":"2014","journal-title":"Ain Shams Eng. J."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"105486","DOI":"10.1016\/j.engappai.2022.105486","article-title":"Improved Deep CNN-based Two Stream Super Resolution and Hybrid Deep Model-based Facial Emotion Recognition","volume":"116","author":"Ullah","year":"2022","journal-title":"Eng. Appl. Artif. Intell."},{"key":"ref_9","first-page":"463","article-title":"Facial Expression Detection and Recognition Through VIOLA-JONES Algorithm and HCNN Using LSTM Method","volume":"7","author":"Kumar","year":"2021","journal-title":"Int. J. Sci. Res. Comput. Sci. Eng. Inf. Technol."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Shin, D.H., Chung, K., and Park, R.C. (2019). Detection of emotion using multi-block deep learning in a self-management interview app. Appl. Sci., 9.","DOI":"10.3390\/app9224830"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Li, S., Deng, W., and Du, J.P. (2017, January 22\u201325). Reliable crowdsourcing and deep locality-preserving learning for expression recognition in the wild. Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.277"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"139","DOI":"10.1145\/3422622","article-title":"Generative adversarial networks","volume":"63","author":"Goodfellow","year":"2020","journal-title":"Commun. Acm"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"5135","DOI":"10.1007\/s10994-023-06367-0","article-title":"Robust generative adversarial network","volume":"112","author":"Zhang","year":"2023","journal-title":"Mach. Learn."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Dupr\u00e9, D., Krumhuber, E.G., K\u00fcster, D., and McKeown, G.J. (2020). A performance comparison of eight commercially available automatic classifiers for facial affect recognition. PLoS ONE, 15.","DOI":"10.1371\/journal.pone.0231968"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"e13722","DOI":"10.2196\/13722","article-title":"Opportunities and pitfalls in applying emotion recognition software for persons with a visual impairment: Simulated real life conversations","volume":"7","author":"Buimer","year":"2019","journal-title":"JMIR mHealth uHealth"},{"key":"ref_16","unstructured":"Somers, M. (2023, February 03). Emotion AI, Explained. 
Available online: https:\/\/mitsloan.mit.edu\/ideas-made-to-matter\/emotion-ai-explained."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Livingstone, S.R., and Russo, F.A. (2018). The ryerson audio-visual database of emotional speech and song (ravdess): A dynamic, multimodal set of facial and vocal expressions in north American english. PLoS ONE, 13.","DOI":"10.1371\/journal.pone.0196391"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Schuller, B., Steidl, S., Batliner, A., Vinciarelli, A., Scherer, K., Ringeval, F., Chetouani, M., Weninger, F., Eyben, F., and Marchi, E. (2013, January 25\u201329). The INTERSPEECH 2013 computational paralinguistics challenge: Social signals, conflict, emotion, autism. Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, Lyon, France.","DOI":"10.21437\/Interspeech.2013-56"},{"key":"ref_19","unstructured":"Petrushin, V.A. (2023, February 03). Emotion in speech: Recognition and application to call centers. In Proceedings of the Intelligent Engineering Systems Through Artificial Neural Networks, 1999; Volume 9. Available online: https:\/\/www.researchgate.net\/publication\/2611186_Emotion_in_Speech_Recognition_and_Application_to_Call_Centers."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Sebe, N., Cohen, I., Gevers, T., and Huang, T. (2006, January 20\u201324). Emotion Recognition Based on Joint Visual and Audio Cues. Proceedings of the 18th International Conference on Pattern Recognition (ICPR\u201906), Hong Kong, China.","DOI":"10.1109\/ICPR.2006.489"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Nimitsurachat, P., and Washington, P. (2024). Audio-Based Emotion Recognition Using Self-Supervised Learning on an Engineered Feature Space. AI, 5.","DOI":"10.3390\/ai5010011"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"35173","DOI":"10.1007\/s11042-022-13363-4","article-title":"A voice-based real-time emotion detection technique using recurrent neural network empowered feature modelling","volume":"81","author":"Chamishka","year":"2022","journal-title":"Multimed. Tools Appl."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1007\/s10919-015-0209-5","article-title":"Effect of Acting Experience on Emotion Expression and Recognition in Voice: Non-Actors Provide Better Stimuli than Expected","volume":"39","author":"Grass","year":"2015","journal-title":"J. Nonverbal Behav."},{"key":"ref_24","unstructured":"Wilting, J., Krahmer, E., and Swerts, M. (2006, January 17\u201321). Real vs. acted emotional speech. Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, Pittsburgh, PA, USA."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"6455053","DOI":"10.1155\/2021\/6455053","article-title":"An Effective Deep Learning Model for Automated Detection of Myocardial Infarction Based on Ultrashort-Term Heart Rate Variability Analysis","volume":"2021","author":"Shahnawaz","year":"2021","journal-title":"Math. Probl. Eng."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"394","DOI":"10.1016\/j.biopsycho.2010.03.010","article-title":"Autonomic nervous system activity in emotion: A review","volume":"84","author":"Kreibig","year":"2010","journal-title":"Biol. Psychol."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Wang, L., Hao, J., and Zhou, T.H. (2023). ECG Multi-Emotion Recognition Based on Heart Rate Variability Signal Features Mining. 
Sensors, 23.","DOI":"10.3390\/s23208636"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Luo, X., Wang, R., Zhou, Y.X., and Xie, W. (2024). The relationship between emotional disorders and heart rate variability: A Mendelian randomization study. PLoS ONE, 19.","DOI":"10.1371\/journal.pone.0298998"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Sep\u00falveda, A., Castillo, F., Palma, C., and Rodriguez-Fernandez, M. (2021). Emotion recognition from ecg signals using wavelet scattering and machine learning. Appl. Sci., 11.","DOI":"10.3390\/app11114945"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"293","DOI":"10.1038\/s41597-020-00630-y","article-title":"K-EmoCon, a multimodal sensor dataset for continuous emotion recognition in naturalistic conversations","volume":"7","author":"Park","year":"2020","journal-title":"Sci. Data"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Mamieva, D., Abdusalomov, A.B., Kutlimuratov, A., Muminov, B., and Whangbo, T.K. (2023). Multimodal Emotion Detection via Attention-Based Fusion of Extracted Facial and Speech Features. Sensors, 23.","DOI":"10.3390\/s23125475"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Devi, C.A., Renuka, D.K., Pooventhiran, G., Harish, D., Yadav, S., and Thirunarayan, K. (2023). Towards enhancing emotion recognition via multimodal framework. J. Intell. Fuzzy Syst., 44.","DOI":"10.3233\/JIFS-220280"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"167","DOI":"10.1016\/j.eij.2020.07.005","article-title":"A 3D-convolutional neural network framework with ensemble learning techniques for multi-modal emotion recognition","volume":"22","author":"Salama","year":"2021","journal-title":"Egypt. Informatics J."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Hina, I., Shaukat, A., and Akram, M.U. (2022, January 24\u201326). Multimodal Emotion Recognition using Deep Learning Architectures. Proceedings of the 2022 2nd International Conference on Digital Futures and Transformative Technologies, ICoDT2 2022, Rawalpindi, Pakistan.","DOI":"10.1109\/ICoDT255437.2022.9787437"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"133180","DOI":"10.1109\/ACCESS.2020.3010311","article-title":"Multimodal Fused Emotion Recognition about Expression-EEG Interaction and Collaboration Using Deep Learning","volume":"8","author":"Wu","year":"2020","journal-title":"IEEE Access"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"103","DOI":"10.1016\/j.inffus.2020.01.011","article-title":"Emotion recognition using multi-modal data and machine learning techniques: A tutorial and review","volume":"59","author":"Zhang","year":"2020","journal-title":"Inf. Fusion"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"37","DOI":"10.1016\/j.procs.2019.04.009","article-title":"Recognizing emotion from speech based on age and gender using hierarchical models","volume":"151","author":"Shaqra","year":"2019","journal-title":"Procedia Comput. Sci."},{"key":"ref_38","first-page":"3","article-title":"Experimental Methods for Inducing Basic Emotions: A Qualitative Review","volume":"11","author":"Siedlecka","year":"2018","journal-title":"Emot. Rev."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Shaffer, F., and Ginsberg, J.P. (2017). An Overview of Heart Rate Variability Metrics and Norms. Front. 
Public Health, 5.","DOI":"10.3389\/fpubh.2017.00258"},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"683","DOI":"10.1016\/j.imavis.2012.06.005","article-title":"Static and dynamic 3D facial expression recognition: A comprehensive survey","volume":"30","author":"Sandbach","year":"2012","journal-title":"Image Vis. Comput."}],"container-title":["Computers"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2073-431X\/14\/10\/441\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,18]],"date-time":"2025-10-18T04:18:30Z","timestamp":1760761110000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2073-431X\/14\/10\/441"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,10,16]]},"references-count":40,"journal-issue":{"issue":"10","published-online":{"date-parts":[[2025,10]]}},"alternative-id":["computers14100441"],"URL":"https:\/\/doi.org\/10.3390\/computers14100441","relation":{},"ISSN":["2073-431X"],"issn-type":[{"type":"electronic","value":"2073-431X"}],"subject":[],"published":{"date-parts":[[2025,10,16]]}}}
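
The record above is a Crossref "work" message. As a minimal sketch of how such a record can be fetched and navigated, the Python below retrieves this article's metadata from Crossref's public REST API route https://api.crossref.org/works/{DOI} (a documented endpoint requiring no API key) and reads a few of the fields shown above. The User-Agent string and its mailto address are illustrative placeholders, not part of the record.

import json
import urllib.request

# DOI taken from the record above.
DOI = "10.3390/computers14100441"

# Crossref serves the same "work" message shown above at /works/{DOI}.
# The mailto in the User-Agent is a hypothetical placeholder; Crossref
# asks polite clients to identify themselves, but no API key is needed.
url = f"https://api.crossref.org/works/{DOI}"
req = urllib.request.Request(
    url, headers={"User-Agent": "metadata-demo/0.1 (mailto:you@example.org)"}
)

with urllib.request.urlopen(req) as resp:
    record = json.load(resp)

msg = record["message"]
print(msg["title"][0])  # article title
print("; ".join(f"{a['given']} {a['family']}" for a in msg["author"]))
print("Published:", "-".join(map(str, msg["published"]["date-parts"][0])))
print("References:", msg["reference-count"])

Note how the access paths mirror the structure of the record: everything of interest sits under "message", titles and container titles are arrays, and dates are nested "date-parts" arrays rather than ISO strings.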