{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,27]],"date-time":"2025-11-27T06:45:31Z","timestamp":1764225931080,"version":"3.41.0"},"reference-count":51,"publisher":"Association for Computing Machinery (ACM)","issue":"5","license":[{"start":{"date-parts":[[2024,1,11]],"date-time":"2024-01-11T00:00:00Z","timestamp":1704931200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Researchers Supporting Project","award":["RSP2024R32"],"award-info":[{"award-number":["RSP2024R32"]}]},{"name":"King Saud University, Riyadh, Saudi Arabia"},{"name":"Foundation of Key Laboratory of Dependable Service Computing in Cyber-Physical-Society (Ministry of Education), Chongqing University","award":["CPSDSC202103"],"award-info":[{"award-number":["CPSDSC202103"]}]},{"DOI":"10.13039\/501100001809","name":"National Science Foundation of China","doi-asserted-by":"crossref","award":["62006212, 61702462"],"award-info":[{"award-number":["62006212, 61702462"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Fellowship from the China Postdoctoral Science Foundation","award":["2023M733907"],"award-info":[{"award-number":["2023M733907"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2024,5,31]]},"abstract":"<jats:p>Sentiment and sarcasm are intimately related and complex, as sarcasm often deliberately elicits an emotional response to achieve its specific purpose. Current challenges in multi-modal sentiment and sarcasm joint detection mainly include multi-modal representation fusion and the modeling of the intrinsic relationship between sentiment and sarcasm. 
To address these challenges, we propose a single-input stream self-adaptive representation learning model (<jats:bold>SRLM<\/jats:bold>) for sentiment and sarcasm joint recognition. Specifically, we divide the image into blocks to learn its serialized features and fuse textual features as input to the target model. Then, we introduce an adaptive representation learning network using a gated network approach for sarcasm and sentiment classification. In this framework, each task is equipped with its dedicated expert network responsible for learning task-specific information, while the shared expert knowledge is acquired and weighted through the gating network. Finally, comprehensive experiments conducted on two publicly available datasets, namely Memotion and MUStARD, demonstrate the effectiveness of the proposed model when compared to state-of-the-art baselines. The results reveal a notable improvement in the performance of sentiment and sarcasm tasks.<\/jats:p>","DOI":"10.1145\/3635311","type":"journal-article","created":{"date-parts":[[2023,12,5]],"date-time":"2023-12-05T12:02:03Z","timestamp":1701777723000},"page":"1-17","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["Self-Adaptive Representation Learning Model for Multi-Modal Sentiment and Sarcasm Joint Analysis"],"prefix":"10.1145","volume":"20","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5699-0176","authenticated-orcid":false,"given":"Yazhou","family":"Zhang","sequence":"first","affiliation":[{"name":"Key Laboratory of Dependable Service Computing in Cyber Physical Society, Ministry of Education, Chongqing University and Artificial Intelligence Laboratory, China Mobile Communication Group Tianjin Co., and Zhengzhou University of Light Industry, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-4607-8661","authenticated-orcid":false,"given":"Yang","family":"Yu","sequence":"additional","affiliation":[{"name":"Software Engineering 
College, Zhengzhou University of Light Industry, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-3810-8164","authenticated-orcid":false,"given":"Mengyao","family":"Wang","sequence":"additional","affiliation":[{"name":"Software Engineering College, Zhengzhou University of Light Industry, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3462-8931","authenticated-orcid":false,"given":"Min","family":"Huang","sequence":"additional","affiliation":[{"name":"Software Engineering College, Zhengzhou University of Light Industry, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5906-9422","authenticated-orcid":false,"given":"M. Shamim","family":"Hossain","sequence":"additional","affiliation":[{"name":"Department of Software Engineering, College of Computer and Information Sciences, King Saud University, Saudi Arabia"}]}],"member":"320","published-online":{"date-parts":[[2024,1,11]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"crossref","first-page":"113","DOI":"10.1016\/B978-0-12-804412-4.00007-3","volume-title":"Sentiment Analysis in Social Networks","author":"Farias DI Hern\u00e1ndez","year":"2017","unstructured":"DI Hern\u00e1ndez Farias and Paolo Rosso. 2017. Irony, sarcasm, and sentiment analysis. Sentiment Analysis in Social Networks. Elsevier, 113\u2013128."},{"issue":"3","key":"e_1_3_1_3_2","doi-asserted-by":"crossref","first-page":"30","DOI":"10.1109\/93.939998","article-title":"Reusable multimedia content in Web based learning systems","volume":"8","author":"Saddik Abdulmotaleb El","year":"2001","unstructured":"Abdulmotaleb El Saddik, Stefan Fischer, and Ralf Steinmetz. 2001. Reusable multimedia content in Web based learning systems. 
IEEE MultiMedia 8, 3 (2001), 30\u201338.","journal-title":"IEEE MultiMedia"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11227-022-05026-w"},{"key":"e_1_3_1_5_2","doi-asserted-by":"crossref","first-page":"13099","DOI":"10.18653\/v1\/2023.acl-long.732","volume-title":"Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Zhang Xiaoheng","year":"2023","unstructured":"Xiaoheng Zhang and Yang Li. 2023. A cross-modality context fusion and semantic refinement network for emotion recognition in conversation. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 13099\u201313110."},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.inffus.2018.09.008"},{"key":"e_1_3_1_7_2","doi-asserted-by":"crossref","unstructured":"Yue Deng Wenxuan Zhang Sinno Jialin Pan and Lidong Bing. 2023. Bidirectional generative framework for cross-domain aspect-based sentiment analysis. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics . 12272\u201312285.","DOI":"10.18653\/v1\/2023.acl-long.686"},{"key":"e_1_3_1_8_2","first-page":"508","volume-title":"Proceedings of the 2022 IEEE Smartworld, Ubiquitous Intelligence & Computing, Scalable Computing & Communications, Digital Twin, Privacy Computing, Metaverse, Autonomous & Trusted Vehicles (SmartWorld\/UIC\/ScalCom\/DigitalTwin\/PriComp\/Meta)","author":"Liu Yi","year":"2022","unstructured":"Yi Liu, Zengwei Zheng, Binbin Zhou, Jianhua Ma, Lin Sun, and Ruichen Xia. 2022. Multimodal sarcasm detection based on multimodal sentiment co-training. In Proceedings of the 2022 IEEE Smartworld, Ubiquitous Intelligence & Computing, Scalable Computing & Communications, Digital Twin, Privacy Computing, Metaverse, Autonomous & Trusted Vehicles (SmartWorld\/UIC\/ScalCom\/DigitalTwin\/PriComp\/Meta). 
IEEE, 508\u2013515."},{"key":"e_1_3_1_9_2","first-page":"2540","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Wen Changsong","year":"2023","unstructured":"Changsong Wen, Guoli Jia, and Jufeng Yang. 2023. DIP: Dual incongruity perceiving network for sarcasm detection. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 2540\u20132550."},{"key":"e_1_3_1_10_2","first-page":"704","volume-title":"Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing","author":"Riloff Ellen","year":"2013","unstructured":"Ellen Riloff, Ashequl Qadir, Prafulla Surve, Lalindra De Silva, Nathan Gilbert, and Ruihong Huang. 2013. Sarcasm as contrast between a positive sentiment and negative situation. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. 704\u2013714."},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.patrec.2023.02.023"},{"key":"e_1_3_1_12_2","doi-asserted-by":"crossref","unstructured":"Yazhou Zhang Yang Yu Dongming Zhao Zuhe Li Bo Wang Yuexian Hou Prayag Tiwari and Jing Qin. 2023. Learning multi-task commonness and uniqueness for multi-modal sarcasm detection and sentiment analysis in conversation. IEEE Transactions on Artificial Intelligence 1 1 (2023) 1\u201313.","DOI":"10.1109\/TAI.2023.3298328"},{"key":"e_1_3_1_13_2","doi-asserted-by":"crossref","unstructured":"Md. Shad Akhtar Dushyant Singh Chauhan Deepanway Ghosal Soujanya Poria Asif Ekbal and Pushpak Bhattacharyya. 2019. Multi-task learning for multi-modal emotion recognition and sentiment analysis. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies NAACL-HLT . 
370\u2013379.","DOI":"10.18653\/v1\/N19-1034"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/TASLP.2022.3178204"},{"key":"e_1_3_1_15_2","unstructured":"Alexey Dosovitskiy Lucas Beyer Alexander Kolesnikov Dirk Weissenborn Xiaohua Zhai Thomas Unterthiner Mostafa Dehghani Matthias Minderer Georg Heigold Sylvain Gelly Jakob Uszkoreit and Neil Houlsby. 2021. An image is worth \\(16\\times 16\\) words: Transformers for image recognition at scale. In 9th International Conference on Learning Representations (ICLR\u201921) ."},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSEN.2018.2867221"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2018.2885279"},{"key":"e_1_3_1_18_2","article-title":"Attention is all you need","volume":"30","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Proceedings of the 31st International Conference Advances in Neural Information Processing Systems 30 (2017).","journal-title":"In Proceedings of the 31st International Conference Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_19_2","unstructured":"Jacob Devlin Ming-Wei Chang Kenton Lee and Kristina Toutanova. 2019. BERT: pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies NAACL-HLT . 
4171\u20134186."},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/LSP.2021.3139856"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP40776.2020.9053012"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D18-1382"},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10489-022-03343-4"},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2023.119721"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.inffus.2022.11.022"},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1145\/2964284.2964321"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.nlpbt-1.3"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11042-022-12122-9"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1239"},{"key":"e_1_3_1_31_2","first-page":"9507","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","volume":"37","author":"Qiao Yang","year":"2023","unstructured":"Yang Qiao, Liqiang Jing, Xuemeng Song, Xiaolin Chen, Lei Zhu, and Liqiang Nie. 2023. Mutual-enhanced incongruity learning network for multi-modal sarcasm detection. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 37. 9507\u20139515."},{"key":"e_1_3_1_32_2","first-page":"1455","volume-title":"Proceedings of the 2022 IEEE Smartworld, Ubiquitous Intelligence & Computing, Scalable Computing & Communications, Digital Twin, Privacy Computing, Metaverse, Autonomous & Trusted Vehicles (SmartWorld\/UIC\/ScalCom\/DigitalTwin\/PriComp\/Meta)","author":"Lu Xinkai","year":"2022","unstructured":"Xinkai Lu, Ying Qian, Yan Yang, and Wenrao Pang. 2022. Sarcasm detection of dual multimodal contrastive attention networks. 
In Proceedings of the 2022 IEEE Smartworld, Ubiquitous Intelligence & Computing, Scalable Computing & Communications, Digital Twin, Privacy Computing, Metaverse, Autonomous & Trusted Vehicles (SmartWorld\/UIC\/ScalCom\/DigitalTwin\/PriComp\/Meta). IEEE, 1455\u20131460."},{"key":"e_1_3_1_33_2","doi-asserted-by":"crossref","first-page":"1767","DOI":"10.18653\/v1\/2022.acl-long.124","volume-title":"Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","volume":"1","author":"Liang Bin","year":"2022","unstructured":"Bin Liang, Chenwei Lou, Xiang Li, Min Yang, Lin Gui, Yulan He, Wenjie Pei, and Ruifeng Xu. 2022. Multi-modal sarcasm detection via cross-modal graph convolutional network. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vol. 1. Association for Computational Linguistics, 1767\u20131777."},{"key":"e_1_3_1_34_2","doi-asserted-by":"crossref","unstructured":"Hui Liu Wenya Wang and Haoliang Li. 2022. Towards multi-modal sarcasm detection via hierarchical congruity modeling with knowledge enhancement. In 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP\u201922) . 4995\u20135006.","DOI":"10.18653\/v1\/2022.emnlp-main.333"},{"key":"e_1_3_1_35_2","doi-asserted-by":"crossref","first-page":"6506","DOI":"10.18653\/v1\/2023.findings-acl.407","volume-title":"Findings of the Association for Computational Linguistics: ACL 2023","author":"Tan Yue","year":"2023","unstructured":"Yue Tan, Bo Wang, Anqi Liu, Dongming Zhao, Kun Huang, Ruifang He, and Yuexian Hou. 2023. Guiding dialogue agents to complex semantic targets by dynamically completing knowledge graph. In Findings of the Association for Computational Linguistics: ACL 2023. 
6506\u20136518."},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/MIS.2019.2904691"},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2023.121068"},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.1134\/S105466182101017X"},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.401"},{"key":"e_1_3_1_40_2","doi-asserted-by":"crossref","unstructured":"Yaochen Liu Yazhou Zhang and Dawei Song. 2023. A quantum probability driven framework for joint multi-modal sarcasm sentiment and emotion analysis. IEEE Transactions on Affective Computing 1 (2023) 1\u201315.","DOI":"10.1109\/TAFFC.2023.3279145"},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/JIOT.2017.2772959"},{"key":"e_1_3_1_42_2","volume-title":"Proceedings of the 14th International Workshop on Semantic Evaluation, Sep. Association for Computational Linguistics","author":"Sharma Chhavi","year":"2020","unstructured":"Chhavi Sharma, William Paka, Deepesh Bhageria Scott, Amitava Das, Soujanya Poria, Tanmoy Chakraborty, and Bj\u00f6rn Gamb\u00e4ck. 2020. Task report: Memotion analysis 1.0@ semeval 2020: The visuo-lingual metaphor. In Proceedings of the 14th International Workshop on Semantic Evaluation, Sep. Association for Computational Linguistics."},{"key":"e_1_3_1_43_2","first-page":"3852","volume-title":"Proceedings of the ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing","author":"Cramer Aurora Linh","year":"2019","unstructured":"Aurora Linh Cramer, Ho-Hsiang Wu, Justin Salamon, and Juan Pablo Bello. 2019. Look, listen, and learn more: Design choices for deep audio embeddings. In Proceedings of the ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing. 
IEEE, 3852\u20133856."},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.1007\/s00521-020-05102-3"},{"key":"e_1_3_1_45_2","first-page":"6105","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Tan Mingxing","year":"2019","unstructured":"Mingxing Tan and Quoc Le. 2019. Efficientnet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning. PMLR, 6105\u20136114."},{"key":"e_1_3_1_46_2","doi-asserted-by":"crossref","unstructured":"Amir Zadeh Minghai Chen Soujanya Poria Erik Cambria and Louis-Philippe Morency. 2017. Tensor fusion network for multimodal sentiment analysis. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing . 1103\u20131114.","DOI":"10.18653\/v1\/D17-1115"},{"key":"e_1_3_1_47_2","first-page":"3930","volume-title":"Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision","author":"Pramanick Shraman","year":"2022","unstructured":"Shraman Pramanick, Aniket Roy, and Vishal M. Patel. 2022. Multimodal learning using optimal transport for sarcasm and humor detection. In Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision. 3930\u20133940."},{"key":"e_1_3_1_48_2","first-page":"6558","volume-title":"Proceedings of the Conference Association for Computational Linguistics. Meeting","volume":"2019","author":"Tsai Yao-Hung Hubert","year":"2019","unstructured":"Yao-Hung Hubert Tsai, Shaojie Bai, Paul Pu Liang, J. Zico Kolter, Louis-Philippe Morency, and Ruslan Salakhutdinov. 2019. Multimodal transformer for unaligned multimodal language sequences. In Proceedings of the Conference Association for Computational Linguistics. Meeting, Vol. 2019. 
NIH Public Access, 6558."},{"key":"e_1_3_1_49_2","doi-asserted-by":"publisher","DOI":"10.1145\/3343031.3351034"},{"key":"e_1_3_1_50_2","doi-asserted-by":"crossref","unstructured":"Yazhou Zhang Ao Jia Bo Wang Peng Zhang Dongming Zhao Pu Li Yuexian Hou Xiaojia Jin Dawei Song and Jing Qin. 2023. M3gat: A multi-modal multi-task interactive graph attention network for conversational sentiment analysis and emotion recognition. ACM Transactions on Information Systems 42 1 (2023) 1\u201332.","DOI":"10.1145\/3593583"},{"key":"e_1_3_1_51_2","doi-asserted-by":"publisher","DOI":"10.1007\/s12559-020-09807-4"},{"key":"e_1_3_1_52_2","doi-asserted-by":"crossref","unstructured":"George-Alexandru Vlad George-Eduard Zaharia Dumitru-Clementin Cercel Costin Chiru and Stefan Trausan Matu. 2020. Upb at semeval-2020 task 8: Joint textual and visual modeling in a multi-task learning architecture for memotion analysis. In Proceedings of the Fourteenth Workshop on Semantic Evaluation . 1208\u20131214.","DOI":"10.18653\/v1\/2020.semeval-1.160"}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and 
Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3635311","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3635311","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:36:21Z","timestamp":1750178181000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3635311"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,1,11]]},"references-count":51,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2024,5,31]]}},"alternative-id":["10.1145\/3635311"],"URL":"https:\/\/doi.org\/10.1145\/3635311","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"type":"print","value":"1551-6857"},{"type":"electronic","value":"1551-6865"}],"subject":[],"published":{"date-parts":[[2024,1,11]]},"assertion":[{"value":"2023-09-27","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-11-27","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-01-11","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}