{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,20]],"date-time":"2026-03-20T16:21:23Z","timestamp":1774023683401,"version":"3.50.1"},"reference-count":70,"publisher":"Association for Computing Machinery (ACM)","issue":"3s","license":[{"start":{"date-parts":[[2022,10,31]],"date-time":"2022-10-31T00:00:00Z","timestamp":1667174400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"King Saud University, Riyadh, Saudi Arabia","award":["RSP-2021\/32"],"award-info":[{"award-number":["RSP-2021\/32"]}]},{"DOI":"10.13039\/501100001809","name":"National Science Foundation of China","doi-asserted-by":"crossref","award":["62006212"],"award-info":[{"award-number":["62006212"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Fund of State Key Lab. for Novel Software Technology in Nanjing University","award":["KFKT2021B41"],"award-info":[{"award-number":["KFKT2021B41"]}]},{"name":"Industrial Science and Technology Research Project of Henan Province","award":["222102210031, 212102210418, 202102210387"],"award-info":[{"award-number":["222102210031, 212102210418, 202102210387"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2022,10,31]]},"abstract":"<jats:p>The recent booming of artificial intelligence (AI) applications, e.g., affective robots, human-machine interfaces, autonomous vehicles, and so on, has produced a great number of multi-modal records of human communication. Such data often carry latent subjective users\u2019 attitudes and opinions, which provides a practical and feasible path to realize the connection between human emotion and intelligence services. 
Sentiment and emotion analysis of multi-modal records is of great value to improve the intelligence level of affective services. However, how to find an optimal manner to learn people\u2019s sentiments and emotional representations has been a difficult problem, since both of them involve subtle mind activity. To solve this problem, a lot of approaches have been published, but most of them are insufficient to mine sentiment and emotion, since they have treated sentiment analysis and emotion recognition as two separate tasks. The interaction between them has been neglected, which limits the efficiency of sentiment and emotion representation learning. In this work, emotion is seen as the external expression of sentiment, while sentiment is the essential nature of emotion. We thus argue that they are strongly related to each other where one\u2019s judgment helps the decision of the other. The key challenges are multi-modal fused representation and the interaction between sentiment and emotion. To solve such issues, we design an external knowledge enhanced multi-task representation learning network, termed KAMT. The major elements contain two attention mechanisms, which are inter-modal and inter-task attentions and an external knowledge augmentation layer. The external knowledge augmentation layer is used to extract the vector of the participant\u2019s gender, age, occupation, and of overall color or shape. The main use of inter-modal attention is to capture effective multi-modal fused features. Inter-task attention is designed to model the correlation between sentiment analysis and emotion classification. 
We perform experiments on three widely used datasets, and the experimental performance proves the effectiveness of the KAMT model.<\/jats:p>","DOI":"10.1145\/3527175","type":"journal-article","created":{"date-parts":[[2022,3,25]],"date-time":"2022-03-25T13:08:35Z","timestamp":1648213715000},"page":"1-23","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":9,"title":["Affective Interaction: Attentive Representation Learning for Multi-Modal Sentiment Classification"],"prefix":"10.1145","volume":"18","author":[{"given":"Yazhou","family":"Zhang","sequence":"first","affiliation":[{"name":"Software Engineering College, Zhengzhou University of Light Industry, Zhengzhou, P.R. China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Prayag","family":"Tiwari","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Aalto University, Espoo, Finland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Lu","family":"Rong","sequence":"additional","affiliation":[{"name":"Faculty of Social Sciences and Liberal Arts, University College Sedaya International, Kuala Lumpur, Malaysia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Rui","family":"Chen","sequence":"additional","affiliation":[{"name":"Software Engineering College, Zhengzhou University of Light Industry, Zhengzhou, P.R. China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Nojoom A.","family":"Alnajem","sequence":"additional","affiliation":[{"name":"Department of Software Engineering, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"M. 
Shamim","family":"Hossain","sequence":"additional","affiliation":[{"name":"Department of Software Engineering, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2022,11]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2018.2885117"},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.inffus.2021.02.013"},{"key":"e_1_3_2_4_2","unstructured":"Bharathi Raja Chakravarthi K. P. Soman Rahul Ponnusamy Prasanna Kumar Kumaresan Kingston Pal Thamburaj John P. McCrae et al. 2021. DravidianMultiModality: A dataset for multi-modal sentiment analysis in tamil and malayalam. arXiv:2106.04853. Retrieved from https:\/\/arxiv.org\/abs\/2106.04853."},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.401"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1007\/s00521-020-05616-w"},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2020.3023871"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11036-017-0929-3"},{"key":"e_1_3_2_9_2","first-page":"4171","volume-title":"Proceedings of the NAACL-HLT 2019: Annual Conference of the North American Chapter of the Association for Computational Linguistics","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the NAACL-HLT 2019: Annual Conference of the North American Chapter of the Association for Computational Linguistics. 
4171\u20134186."},{"key":"e_1_3_2_10_2","first-page":"4171","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Devlin Jacob","year":"2018","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina N. Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 4171\u20134186."},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2021.108107"},{"key":"e_1_3_2_12_2","doi-asserted-by":"crossref","unstructured":"Kawin Ethayarajh. 2019. How contextual are contextualized word representations? Comparing the geometry of BERT, ELMo, and GPT-2 embeddings. arXiv:1909.00512. Retrieved from https:\/\/arxiv.org\/abs\/1909.00512.","DOI":"10.18653\/v1\/D19-1006"},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.3390\/brainsci10100687"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1145\/3241056"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/JIOT.2017.2772959"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.inffus.2018.09.008"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.ins.2019.07.040"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2019.01.019"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11042-020-10285-x"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.semeval-1.150"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2019.2963630"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/CCI.2016.7778967"},{"key":"e_1_3_2_23_2","first-page":"1","volume-title
":"Proceedings of the10th Italian Information Retrieval Workshop","author":"Li Qiuchi","year":"2019","unstructured":"Qiuchi Li and Massimo Melucci. 2019. Quantum-inspired multimodal representation. In Proceedings of the10th Italian Information Retrieval Workshop. 1\u20132."},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2020.107700"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/BIBM52615.2021.9669461"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2015.2444731"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.3390\/e23101349"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP39728.2021.9413608"},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.findings-emnlp.74"},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/JIOT.2020.3033129"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i02.5492"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.semeval-1.149"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSAC.2020.3020654"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2019.2955637"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P17-1081"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1007\/s00521-020-05102-3"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2018.2883866"},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.semeval-1.99"},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.procs.2016.06.095"},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICAIBD.2018.8396178"},{"key":"e_1_3_2_41_2","first-page":"6105","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Tan Mingxing","year":"2019","unstructured":"Mingxing Tan and 
Quoc Le. 2019. Efficientnet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning. PMLR, 6105\u20136114."},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1656"},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2021.107598"},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.semeval-1.160"},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.patrec.2021.03.025"},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2021.107676"},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2017.2672829"},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1145\/3394171.3413690"},{"key":"e_1_3_2_49_2","unstructured":"Wenmeng Yu Hua Xu Ziqi Yuan and Jiele Wu. 2021. Learning modality-specific representations with self-supervised multi-task learning for multimodal sentiment analysis. arXiv:2102.04830. Retrieved from https:\/\/arxiv.org\/abs\/2102.04830."},{"issue":"4","key":"e_1_3_2_50_2","first-page":"49","article-title":"Multi-modal sentiment analysis based on cross-modal context-aware attention","volume":"5","author":"Yuzhu Wang","year":"2021","unstructured":"Wang Yuzhu, Xie Jun, Chen Bo, and Xu Xinying. 2021. Multi-modal sentiment analysis based on cross-modal context-aware attention. 
Data Analysis and Knowledge Discovery 5, 4 (2021), 49\u201359.","journal-title":"Data Analysis and Knowledge Discovery"},{"key":"e_1_3_2_51_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P18-1208"},{"key":"e_1_3_2_52_2","doi-asserted-by":"publisher","DOI":"10.1016\/S0923-5965(02)00084-X"},{"key":"e_1_3_2_53_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.future.2017.09.048"},{"key":"e_1_3_2_54_2","doi-asserted-by":"publisher","DOI":"10.1109\/tmm.2015.2393635"},{"key":"e_1_3_2_55_2","doi-asserted-by":"publisher","DOI":"10.1007\/s12559-021-09855-4"},{"key":"e_1_3_2_56_2","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2019\/755"},{"key":"e_1_3_2_57_2","doi-asserted-by":"publisher","DOI":"10.1109\/BIBM52615.2021.9669546"},{"key":"e_1_3_2_58_2","doi-asserted-by":"publisher","DOI":"10.1109\/TFUZZ.2021.3072492"},{"key":"e_1_3_2_59_2","doi-asserted-by":"publisher","DOI":"10.1109\/MNET.011.2000400"},{"key":"e_1_3_2_60_2","doi-asserted-by":"publisher","DOI":"10.1109\/MNET.2019.1800344"},{"key":"e_1_3_2_61_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2018.2882744"},{"key":"e_1_3_2_62_2","doi-asserted-by":"publisher","DOI":"10.1109\/JIOT.2020.3013710"},{"key":"e_1_3_2_63_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01012-6"},{"key":"e_1_3_2_64_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.inffus.2020.04.003"},{"key":"e_1_3_2_65_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.tcs.2018.04.029"},{"key":"e_1_3_2_66_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neunet.2020.10.001"},{"key":"e_1_3_2_67_2","doi-asserted-by":"publisher","DOI":"10.1007\/s12193-015-0207-2"},{"key":"e_1_3_2_68_2","doi-asserted-by":"publisher","DOI":"10.1145\/3448734.3450913"},{"key":"e_1_3_2_69_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2019.2905048"},{"key":"e_1_3_2_70_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.inffus.2019.03.003"},{"key":"e_1_3_2_71_2","doi-asserted-by":"crossref","unstructured":"M. S. Hossain et al. 2016. 
Audio-visual emotion recognition using big data towards 5G. Mob. Networks Appl. 21 5 (2016) 753\u2013763.","DOI":"10.1007\/s11036-016-0685-9"}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3527175","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3527175","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T17:51:01Z","timestamp":1750182661000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3527175"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,10,31]]},"references-count":70,"journal-issue":{"issue":"3s","published-print":{"date-parts":[[2022,10,31]]}},"alternative-id":["10.1145\/3527175"],"URL":"https:\/\/doi.org\/10.1145\/3527175","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,10,31]]},"assertion":[{"value":"2021-11-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-03-14","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-11-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}