{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,2]],"date-time":"2026-05-02T07:01:37Z","timestamp":1777705297254,"version":"3.51.4"},"reference-count":15,"publisher":"SAGE Publications","issue":"2","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["IFS"],"published-print":{"date-parts":[[2023,1,30]]},"abstract":"<jats:p>Emotional AI is the next era of AI to play a major role in various fields such as entertainment, health care, self-paced online education, etc., considering clues from multiple sources. In this work, we propose a multimodal emotion recognition system extracting information from speech, motion capture, and text data. The main aim of this research is to improve the unimodal architectures to outperform the state of the art and combine them to build a robust multimodal fusion architecture. We developed 1D and 2D CNN-LSTM time-distributed models for speech, a hybrid CNN-LSTM model for motion capture data, and a BERT-based model for text data to achieve state-of-the-art results, and attempted both concatenation-based decision-level fusion and Deep CCA-based feature-level fusion schemes. The proposed speech and mocap models achieve emotion recognition accuracies of 65.08% and 67.51%, respectively, and the BERT-based text model achieves an accuracy of 72.60%. The decision-level fusion approach significantly improves the accuracy of detecting emotions on the IEMOCAP and MELD datasets. 
This approach achieves 80.20% accuracy on IEMOCAP, which is 8.61% higher than state-of-the-art methods, and 63.52% and 61.65% in 5-class and 7-class classification on the MELD dataset, both of which are higher than the state of the art.<\/jats:p>","DOI":"10.3233\/jifs-220280","type":"journal-article","created":{"date-parts":[[2022,11,4]],"date-time":"2022-11-04T11:36:17Z","timestamp":1667561777000},"page":"2455-2470","source":"Crossref","is-referenced-by-count":4,"title":["Towards enhancing emotion recognition via multimodal framework"],"prefix":"10.1177","volume":"44","author":[{"given":"C.","family":"Akalya devi","sequence":"first","affiliation":[{"name":"Department of Information Technology, PSG College of Technology, Coimbatore, India"}]},{"given":"D.","family":"Karthika Renuka","sequence":"additional","affiliation":[{"name":"Department of Information Technology, PSG College of Technology, Coimbatore, India"}]},{"given":"G.","family":"Pooventhiran","sequence":"additional","affiliation":[{"name":"Qualcomm India Private Limited Chennai, India"}]},{"given":"D.","family":"Harish","sequence":"additional","affiliation":[{"name":"Software AG, Bangalore, India"}]},{"given":"Shweta","family":"Yadav","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, Wright State University, Dayton, OH, USA"}]},{"given":"Krishnaprasad","family":"Thirunarayan","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, Wright State University, Dayton, OH, USA"}]}],"member":"179","reference":[{"key":"10.3233\/JIFS-220280_ref1","doi-asserted-by":"crossref","first-page":"1175","DOI":"10.1109\/34.954607","article-title":"Toward machine emotional intelligence: Analysis of affective physiological state","volume":"23","author":"Picard","year":"2001","journal-title":"IEEE Transactions on Pattern Analysis and Machine 
Intelligence"},{"key":"10.3233\/JIFS-220280_ref4","doi-asserted-by":"crossref","first-page":"124","DOI":"10.1037\/h0030377","article-title":"Constants across cultures in the face and emotion","volume":"17","author":"Ekman","year":"1971","journal-title":"Journal of Personality and Social Psychology"},{"key":"10.3233\/JIFS-220280_ref5","doi-asserted-by":"crossref","first-page":"335","DOI":"10.1007\/s10579-008-9076-6","article-title":"IEMOCAP: Interactive emotional dyadic motion capture database","volume":"42","author":"Busso","year":"2008","journal-title":"Language Resources and Evaluation"},{"key":"10.3233\/JIFS-220280_ref8","doi-asserted-by":"publisher","DOI":"10.21437\/SMM.2018-5"},{"key":"10.3233\/JIFS-220280_ref11","first-page":"37","article-title":"Designing affective video games to support the social-emotional development of teenagers with autism spectrum disorders","volume":"7","author":"Khandaker","year":"2009","journal-title":"Annual Review of Cybertherapy and Telemedicine"},{"issue":"5","key":"10.3233\/JIFS-220280_ref17","doi-asserted-by":"crossref","first-page":"4709","DOI":"10.3233\/JIFS-179020","article-title":"Predicting emotional intensity in social networks","volume":"36","author":"Rodriguez","year":"2019","journal-title":"Journal of Intelligent & Fuzzy Systems"},{"key":"10.3233\/JIFS-220280_ref18","first-page":"423","article-title":"Reusing neural speech representations for auditory emotion recognition","volume":"1","author":"Lakomkin","year":"2017","journal-title":"Proceedings of the Eighth International Joint Conference on Natural Language Processing"},{"key":"10.3233\/JIFS-220280_ref21","doi-asserted-by":"crossref","first-page":"101894","DOI":"10.1016\/j.bspc.2020.101894","article-title":"Speech emotion recognition with deep convolutional neural networks","volume":"59","author":"Issa","year":"2020","journal-title":"Biomedical Signal Processing and 
Control"},{"key":"10.3233\/JIFS-220280_ref22","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2017-917"},{"key":"10.3233\/JIFS-220280_ref29","doi-asserted-by":"crossref","first-page":"17","DOI":"10.1109\/MIS.2018.2882362","article-title":"Multimodal sentiment analysis: Addressing key issues and setting up the baselines","volume":"33","author":"Poria","year":"2018","journal-title":"IEEE Intelligent Systems"},{"key":"10.3233\/JIFS-220280_ref32","doi-asserted-by":"crossref","first-page":"873","DOI":"10.18653\/v1\/P17-1081","article-title":"Context-dependent sentiment analysis in user-generated videos","volume":"1","author":"Poria","year":"2017","journal-title":"Proceedings of the 55th annual meeting of the association for computational linguistics"},{"key":"10.3233\/JIFS-220280_ref37","doi-asserted-by":"crossref","first-page":"98","DOI":"10.1016\/j.inffus.2017.02.003","article-title":"A review of affective computing: From unimodal analysis to multimodal fusion","volume":"37","author":"Poria","year":"2017","journal-title":"Information Fusion"},{"key":"10.3233\/JIFS-220280_ref38","doi-asserted-by":"crossref","first-page":"312","DOI":"10.1016\/j.bspc.2018.08.035","article-title":"Speech emotion recognition using deep 1d & 2d cnn lstm networks","volume":"47","author":"Zhao","year":"2019","journal-title":"Biomedical Signal Processing and Control"},{"key":"10.3233\/JIFS-220280_ref39","first-page":"183","article-title":"A CNN-assisted enhanced audio signal processing for speech emotion recognition","volume":"20","author":"Kwon","year":"2020","journal-title":"Sensors"},{"key":"10.3233\/JIFS-220280_ref40","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long short-term memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Computation"}],"container-title":["Journal of Intelligent & Fuzzy 
Systems"],"original-title":[],"link":[{"URL":"https:\/\/content.iospress.com\/download?id=10.3233\/JIFS-220280","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T09:43:27Z","timestamp":1777455807000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/full\/10.3233\/JIFS-220280"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,1,30]]},"references-count":15,"journal-issue":{"issue":"2"},"URL":"https:\/\/doi.org\/10.3233\/jifs-220280","relation":{},"ISSN":["1064-1246","1875-8967"],"issn-type":[{"value":"1064-1246","type":"print"},{"value":"1875-8967","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,1,30]]}}}