{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,21]],"date-time":"2026-03-21T21:30:13Z","timestamp":1774128613396,"version":"3.50.1"},"reference-count":54,"publisher":"SAGE Publications","issue":"4","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["IDA"],"published-print":{"date-parts":[[2021,7,9]]},"abstract":"<jats:p>Emotion recognition in conversations is crucial as there is an urgent need to improve the overall experience of human-computer interactions. A promising improvement in this field is to develop a model that can effectively extract adequate contexts of a test utterance. We introduce a novel model, termed hierarchical memory networks (HMN), to address the issues of recognizing utterance level emotions. HMN divides the contexts into different aspects and employs different step lengths to represent the weights of these aspects. To model the self dependencies, HMN takes independent local memory networks to model these aspects. Further, to capture the interpersonal dependencies, HMN employs global memory networks to integrate the local outputs into global storages. Such storages can generate contextual summaries and help to find the emotional dependent utterance that is most relevant to the test utterance. With an attention-based multi-hops scheme, these storages are then merged with the test utterance using an addition operation in the iterations. 
Experiments on the IEMOCAP dataset show our model outperforms the compared methods with accuracy improvement.<\/jats:p>","DOI":"10.3233\/ida-205183","type":"journal-article","created":{"date-parts":[[2021,7,13]],"date-time":"2021-07-13T18:24:02Z","timestamp":1626200642000},"page":"1031-1045","source":"Crossref","is-referenced-by-count":4,"title":["Multimodal emotion recognition with hierarchical memory networks"],"prefix":"10.1177","volume":"25","author":[{"given":"Helang","family":"Lai","sequence":"first","affiliation":[{"name":"Guangdong Justice Police Vocational College, Guangzhou, Guangdong, China"},{"name":"School of Computer Science, South China Normal University, Guangzhou, Guangdong, China"}]},{"given":"Keke","family":"Wu","sequence":"additional","affiliation":[{"name":"Shenzhen Institute of Information Technology, Shenzhen, Guangdong, China"}]},{"given":"Lingli","family":"Li","sequence":"additional","affiliation":[{"name":"Guangdong Justice Police Vocational College, Guangzhou, Guangdong, China"}]}],"member":"179","reference":[{"issue":"6","key":"10.3233\/IDA-205183_ref1","doi-asserted-by":"crossref","first-page":"1477","DOI":"10.3233\/IDA-140267","article-title":"Multilingual emotion classifier using unsupervised pattern extraction from microblog data","volume":"20","author":"Argueta","year":"2016","journal-title":"Intelligent Data Analysis"},{"issue":"6","key":"10.3233\/IDA-205183_ref2","doi-asserted-by":"crossref","first-page":"1393","DOI":"10.3233\/IDA-163181","article-title":"On the need of hierarchical emotion classification: detecting the implicit feature using constrained topic model","volume":"21","author":"Zhang","year":"2017","journal-title":"Intelligent Data Analysis"},{"issue":"1","key":"10.3233\/IDA-205183_ref3","doi-asserted-by":"crossref","first-page":"227","DOI":"10.3233\/IDA-173781","article-title":"An efficient density-based clustering with side information and active learning: a case study for facial expression recognition 
task","volume":"23","author":"Vu","year":"2019","journal-title":"Intelligent Data Analysis"},{"key":"10.3233\/IDA-205183_ref4","doi-asserted-by":"crossref","unstructured":"D. Hazarika, S. Poria, A. Zadeh, E. Cambria, L.-P. Morency and R. Zimmermann, Conversational memory network for emotion recognition in dyadic dialogue videos, in: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Vol. 1 (Long Papers), 2018, pp.\u00a02122\u20132132.","DOI":"10.18653\/v1\/N18-1193"},{"key":"10.3233\/IDA-205183_ref5","doi-asserted-by":"crossref","unstructured":"N. Majumder, S. Poria, D. Hazarika, R. Mihalcea, A. Gelbukh and E. Cambria, Dialoguernn: An attentive rnn for emotion detection in conversations, in: Proceedings of the AAAI Conference on Artificial Intelligence, Vol.\u00a033, 2019, pp.\u00a06818\u20136825.","DOI":"10.1609\/aaai.v33i01.33016818"},{"key":"10.3233\/IDA-205183_ref6","doi-asserted-by":"crossref","unstructured":"T. Young, E. Cambria, I. Chaturvedi, H. Zhou, S. Biswas and M. Huang, Augmenting end-to-end dialogue systems with commonsense knowledge, in: Thirty-Second AAAI Conference on Artificial Intelligence, 2018.","DOI":"10.1609\/aaai.v32i1.11923"},{"key":"10.3233\/IDA-205183_ref8","doi-asserted-by":"crossref","first-page":"98","DOI":"10.1016\/j.inffus.2017.02.003","article-title":"A review of affective computing: From unimodal analysis to multimodal fusion","volume":"37","author":"Poria","year":"2017","journal-title":"Information Fusion"},{"key":"10.3233\/IDA-205183_ref9","doi-asserted-by":"crossref","unstructured":"H. Zhou, M. Huang, T. Zhang, X. Zhu and B. Liu, Emotional chatting machine: Emotional conversation generation with internal and external memory, in: Thirty-Second AAAI Conference on Artificial Intelligence, 2018.","DOI":"10.1609\/aaai.v32i1.11325"},{"key":"10.3233\/IDA-205183_ref10","doi-asserted-by":"crossref","unstructured":"D. Hazarika, S. Poria, R. 
Mihalcea, E. Cambria and R. Zimmermann, ICON: Interactive Conversational Memory Network for Multimodal Emotion Detection, in: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 2018, pp.\u00a02594\u20132604.","DOI":"10.18653\/v1\/D18-1280"},{"key":"10.3233\/IDA-205183_ref11","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/S0191-3085(00)22002-9","article-title":"How emotions work: the social functions of emotional expression in negotiations","volume":"22","author":"Morris","year":"2000","journal-title":"Research in Organizational Behavior"},{"issue":"2","key":"10.3233\/IDA-205183_ref12","doi-asserted-by":"crossref","first-page":"202","DOI":"10.1111\/j.1467-6486.2012.01087.x","article-title":"Emotional dynamics and strategizing processes: a study of strategic conversations in top team meetings","volume":"51","author":"Liu","year":"2014","journal-title":"Journal of Management Studies"},{"issue":"2","key":"10.3233\/IDA-205183_ref13","doi-asserted-by":"crossref","first-page":"256","DOI":"10.1037\/a0024756","article-title":"Changing emotion dynamics: individual differences in the effect of anticipatory social stress on emotional inertia","volume":"12","author":"Koval","year":"2012","journal-title":"Emotion"},{"issue":"7","key":"10.3233\/IDA-205183_ref14","doi-asserted-by":"crossref","first-page":"984","DOI":"10.1177\/0956797610372634","article-title":"Emotional inertia and psychological maladjustment","volume":"21","author":"Kuppens","year":"2010","journal-title":"Psychological Science"},{"key":"10.3233\/IDA-205183_ref15","unstructured":"C. 
Navarretta, Mirroring facial expressions and emotions in dyadic conversations, in: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), 2016, pp.\u00a0469\u2013474."},{"issue":"1","key":"10.3233\/IDA-205183_ref16","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1109\/T-AFFC.2010.10","article-title":"Affective computing: from laughter to IEEE","volume":"1","author":"Picard","year":"2010","journal-title":"IEEE Transactions on Affective Computing"},{"issue":"4","key":"10.3233\/IDA-205183_ref17","doi-asserted-by":"crossref","first-page":"384","DOI":"10.1037\/0003-066X.48.4.384","article-title":"Facial expression and emotion","volume":"48","author":"Ekman","year":"1993","journal-title":"American Psychologist"},{"key":"10.3233\/IDA-205183_ref19","doi-asserted-by":"crossref","unstructured":"D. Datcu and L.J. Rothkrantz, Emotion recognition using bimodal data fusion, in: Proceedings of the 12th International Conference on Computer Systems and Technologies, ACM, 2011, pp.\u00a0122\u2013128.","DOI":"10.1145\/2023607.2023629"},{"key":"10.3233\/IDA-205183_ref20","doi-asserted-by":"crossref","unstructured":"C.O. Alm, D. Roth and R. Sproat, Emotions from text: machine learning for text-based emotion prediction, in: Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, Association for Computational Linguistics, 2005, pp.\u00a0579\u2013586.","DOI":"10.3115\/1220575.1220648"},{"key":"10.3233\/IDA-205183_ref21","doi-asserted-by":"crossref","unstructured":"C. Strapparava and R. 
Mihalcea, Annotating and identifying emotions in text, in: Intelligent Information Access, Springer, 2010, pp.\u00a021\u201338.","DOI":"10.1007\/978-3-642-14000-6_2"},{"issue":"2","key":"10.3233\/IDA-205183_ref22","doi-asserted-by":"crossref","first-page":"423","DOI":"10.1109\/TPAMI.2018.2798607","article-title":"Multimodal machine learning: a survey and taxonomy","volume":"41","author":"Baltru\u0161aitis","year":"2018","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"10.3233\/IDA-205183_ref23","doi-asserted-by":"crossref","first-page":"124","DOI":"10.1016\/j.knosys.2018.07.041","article-title":"Multimodal sentiment analysis using hierarchical fusion with context modeling","volume":"161","author":"Majumder","year":"2018","journal-title":"Knowledge-Based Systems"},{"issue":"2","key":"10.3233\/IDA-205183_ref24","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1109\/T-AFFC.2011.37","article-title":"Multimodal emotion recognition in response to videos","volume":"3","author":"Soleymani","year":"2011","journal-title":"IEEE Transactions on Affective Computing"},{"key":"10.3233\/IDA-205183_ref26","doi-asserted-by":"crossref","unstructured":"M. Chen, S. Wang, P.P. Liang, T. Baltru\u0161aitis, A. Zadeh and L.-P. Morency, Multimodal sentiment analysis with word-level fusion and reinforcement learning, in: Proceedings of the 19th ACM International Conference on Multimodal Interaction, ACM, 2017, pp.\u00a0163\u2013171.","DOI":"10.1145\/3136755.3136801"},{"key":"10.3233\/IDA-205183_ref27","doi-asserted-by":"crossref","first-page":"45","DOI":"10.1016\/j.knosys.2014.05.005","article-title":"Sentic patterns: Dependency-based rules for concept-level sentiment analysis","volume":"69","author":"Poria","year":"2014","journal-title":"Knowledge-Based Systems"},{"key":"10.3233\/IDA-205183_ref28","doi-asserted-by":"crossref","unstructured":"A. Zadeh, P.P. Liang, S. Poria, P. Vij, E. Cambria and L.-P. 
Morency, Multi-attention recurrent network for human communication comprehension, in: Thirty-Second AAAI Conference on Artificial Intelligence, 2018.","DOI":"10.1609\/aaai.v32i1.12024"},{"key":"10.3233\/IDA-205183_ref29","doi-asserted-by":"crossref","unstructured":"S. Poria, E. Cambria, D. Hazarika, N. Majumder, A. Zadeh and L.-P. Morency, Context-dependent sentiment analysis in user-generated videos, in: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Vol. 1: Long Papers), 2017, pp.\u00a0873\u2013883.","DOI":"10.18653\/v1\/P17-1081"},{"issue":"3","key":"10.3233\/IDA-205183_ref30","doi-asserted-by":"crossref","first-page":"536","DOI":"10.1007\/s10489-013-0478-9","article-title":"An SVM-AdaBoost facial expression recognition system","volume":"40","author":"Owusu","year":"2014","journal-title":"Applied Intelligence"},{"issue":"5","key":"10.3233\/IDA-205183_ref31","doi-asserted-by":"crossref","first-page":"1091","DOI":"10.3233\/IDA-184311","article-title":"An attention-gated convolutional neural network for sentence classification","volume":"23","author":"Liu","year":"2019","journal-title":"Intelligent Data Analysis"},{"issue":"2","key":"10.3233\/IDA-205183_ref32","doi-asserted-by":"crossref","first-page":"259","DOI":"10.3233\/IDA-183842","article-title":"Utilizing Recurrent Neural Network for topic discovery in short text scenarios","volume":"23","author":"Lu","year":"2019","journal-title":"Intelligent Data Analysis"},{"key":"10.3233\/IDA-205183_ref33","doi-asserted-by":"crossref","unstructured":"S. Ebrahimi Kahou, V. Michalski, K. Konda, R. Memisevic and C. Pal, Recurrent neural networks for emotion recognition in video, in: Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, 2015, pp.\u00a0467\u2013474.","DOI":"10.1145\/2818346.2830596"},{"key":"10.3233\/IDA-205183_ref34","doi-asserted-by":"crossref","unstructured":"A. Zadeh, P.P. Liang, N. Mazumder, S. Poria, E. Cambria and L.-P. 
Morency, Memory fusion network for multi-view sequential learning, in: Thirty-Second AAAI Conference on Artificial Intelligence, 2018.","DOI":"10.1609\/aaai.v32i1.12021"},{"issue":"3","key":"10.3233\/IDA-205183_ref37","doi-asserted-by":"crossref","first-page":"55","DOI":"10.1109\/MCI.2018.2840738","article-title":"Recent trends in deep learning based natural language processing","volume":"13","author":"Young","year":"2018","journal-title":"IEEE Computational Intelligence Magazine"},{"issue":"2","key":"10.3233\/IDA-205183_ref38","doi-asserted-by":"crossref","first-page":"463","DOI":"10.1007\/s10489-018-1286-z","article-title":"Formulation of a hybrid expertise retrieval system in community question answering services","volume":"49","author":"Kundu","year":"2019","journal-title":"Applied Intelligence"},{"issue":"2","key":"10.3233\/IDA-205183_ref39","doi-asserted-by":"crossref","first-page":"634","DOI":"10.1007\/s10489-019-01544-y","article-title":"User correlation model for question recommendation in community question answering","volume":"50","author":"Fu","year":"2020","journal-title":"Applied Intelligence"},{"key":"10.3233\/IDA-205183_ref40","unstructured":"A. Kumar, O. Irsoy, P. Ondruska, M. Iyyer, J. Bradbury, I. Gulrajani, V. Zhong, R. Paulus and R. 
Socher, Ask me anything: Dynamic memory networks for natural language processing, in: International Conference on Machine Learning, 2016, pp.\u00a01378\u20131387."},{"key":"10.3233\/IDA-205183_ref41","doi-asserted-by":"crossref","first-page":"108","DOI":"10.1016\/j.knosys.2014.06.011","article-title":"EmoSenticSpace: A novel framework for affective common-sense reasoning","volume":"69","author":"Poria","year":"2014","journal-title":"Knowledge-Based Systems"},{"issue":"2","key":"10.3233\/IDA-205183_ref42","doi-asserted-by":"crossref","first-page":"296","DOI":"10.1007\/s10489-010-0222-7","article-title":"A ranking method for example based machine translation results by learning from user feedback","volume":"35","author":"Daybelge","year":"2011","journal-title":"Applied Intelligence"},{"issue":"3","key":"10.3233\/IDA-205183_ref43","doi-asserted-by":"crossref","first-page":"534","DOI":"10.1007\/s10489-016-0846-3","article-title":"Speech translation system for english to dravidian languages","volume":"46","author":"Sangeetha","year":"2017","journal-title":"Applied Intelligence"},{"key":"10.3233\/IDA-205183_ref44","doi-asserted-by":"crossref","unstructured":"R. Kar, A. Konar, A. Chakraborty, B.S. Bhattacharya and A.K. 
Nagar, EEG source localization by memory network analysis of subjects engaged in perceiving emotions from facial expressions, in: 2015 International Joint Conference on Neural Networks (IJCNN), IEEE, 2015, pp.\u00a01\u20138.","DOI":"10.1109\/IJCNN.2015.7280705"},{"issue":"5","key":"10.3233\/IDA-205183_ref48","doi-asserted-by":"crossref","first-page":"599","DOI":"10.1177\/02654075030205002","article-title":"Emotion regulation in romantic relationships: The cognitive consequences of concealing feelings","volume":"20","author":"Richards","year":"2003","journal-title":"Journal of Social and Personal Relationships"},{"key":"10.3233\/IDA-205183_ref49","doi-asserted-by":"crossref","first-page":"35","DOI":"10.1016\/j.riob.2008.04.007","article-title":"Emotion cycles: on the social influence of emotion in organizations","volume":"28","author":"Hareli","year":"2008","journal-title":"Research in Organizational Behavior"},{"key":"10.3233\/IDA-205183_ref50","first-page":"026","article-title":"Textbased emotion transformation analysis","volume":"9","author":"Yang","year":"2011","journal-title":"Computer Engineering & Science"},{"issue":"6","key":"10.3233\/IDA-205183_ref51","doi-asserted-by":"crossref","first-page":"146","DOI":"10.1109\/CC.2013.6549266","article-title":"Emotional state transition model based on stimulus and personality characteristics","volume":"10","author":"Xiaolan","year":"2013","journal-title":"China Communications"},{"key":"10.3233\/IDA-205183_ref52","doi-asserted-by":"crossref","unstructured":"M.-C. Sun, S.-H. Hsu, M.-C. Yang and J.-H. 
Chien, Context-aware cascade attention-based RNN for video emotion recognition, in: 2018 First Asian Conference on Affective Computing and Intelligent Interaction (ACII Asia), IEEE, 2018, pp.\u00a01\u20136.","DOI":"10.1109\/ACIIAsia.2018.8470372"},{"issue":"2","key":"10.3233\/IDA-205183_ref53","doi-asserted-by":"crossref","first-page":"184","DOI":"10.1109\/T-AFFC.2011.40","article-title":"Context-sensitive learning for enhanced audiovisual emotion classification","volume":"3","author":"Metallinou","year":"2012","journal-title":"IEEE Transactions on Affective Computing"},{"key":"10.3233\/IDA-205183_ref55","doi-asserted-by":"crossref","unstructured":"F. Eyben, M. W\u00f6llmer and B. Schuller, Opensmile: the munich versatile and fast open-source audio feature extractor, in: Proceedings of the 18th ACM International Conference on Multimedia, ACM, 2010, pp.\u00a01459\u20131462.","DOI":"10.1145\/1873951.1874246"},{"key":"10.3233\/IDA-205183_ref56","doi-asserted-by":"crossref","unstructured":"D. Tran, L. Bourdev, R. Fergus, L. Torresani and M. Paluri, Learning spatiotemporal features with 3d convolutional networks, in: Proceedings of the IEEE International Conference on Computer Vision, 2015, pp.\u00a04489\u20134497.","DOI":"10.1109\/ICCV.2015.510"},{"issue":"8","key":"10.3233\/IDA-205183_ref58","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long short-term memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Computation"},{"issue":"4","key":"10.3233\/IDA-205183_ref59","doi-asserted-by":"crossref","first-page":"335","DOI":"10.1007\/s10579-008-9076-6","article-title":"IEMOCAP: Interactive emotional dyadic motion capture database","volume":"42","author":"Busso","year":"2008","journal-title":"Language Resources and Evaluation"},{"key":"10.3233\/IDA-205183_ref60","doi-asserted-by":"crossref","unstructured":"E. Cambria, D. Hazarika, S. Poria, A. Hussain and R. 
Subramanyam, Benchmarking multimodal sentiment analysis, in: International Conference on Computational Linguistics and Intelligent Text Processing, Springer, 2017, pp.\u00a0166\u2013179.","DOI":"10.1007\/978-3-319-77116-8_13"},{"issue":"1","key":"10.3233\/IDA-205183_ref62","first-page":"1929","article-title":"Dropout: a simple way to prevent neural networks from overfitting","volume":"15","author":"Srivastava","year":"2014","journal-title":"The Journal of Machine Learning Research"},{"issue":"Feb","key":"10.3233\/IDA-205183_ref63","first-page":"281","article-title":"Random search for hyper-parameter optimization","volume":"13","author":"Bergstra","year":"2012","journal-title":"Journal of Machine Learning Research"},{"key":"10.3233\/IDA-205183_ref65","doi-asserted-by":"crossref","unstructured":"S. Poria, I. Chaturvedi, E. Cambria and A. Hussain, Convolutional MKL based multimodal emotion recognition and sentiment analysis, in: 2016 IEEE 16th International Conference on Data Mining (ICDM), IEEE, 2016, pp.\u00a0439\u2013448.","DOI":"10.1109\/ICDM.2016.0055"},{"key":"10.3233\/IDA-205183_ref66","unstructured":"V. P\u00e9rez-Rosas, R. Mihalcea and L.-P. 
Morency, Utterance-level multimodal sentiment analysis, in: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2013, pp.\u00a0973\u2013982."}],"container-title":["Intelligent Data Analysis"],"original-title":[],"link":[{"URL":"https:\/\/content.iospress.com\/download?id=10.3233\/IDA-205183","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,11]],"date-time":"2025-03-11T09:06:12Z","timestamp":1741683972000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/full\/10.3233\/IDA-205183"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,7,9]]},"references-count":54,"journal-issue":{"issue":"4"},"URL":"https:\/\/doi.org\/10.3233\/ida-205183","relation":{},"ISSN":["1088-467X","1571-4128"],"issn-type":[{"value":"1088-467X","type":"print"},{"value":"1571-4128","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,7,9]]}}}