{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,30]],"date-time":"2026-03-30T18:11:42Z","timestamp":1774894302042,"version":"3.50.1"},"reference-count":50,"publisher":"Association for Computing Machinery (ACM)","issue":"5","license":[{"start":{"date-parts":[[2025,5,22]],"date-time":"2025-05-22T00:00:00Z","timestamp":1747872000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62276073"],"award-info":[{"award-number":["62276073"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Innovation Project of Guangxi Graduate Education","award":["YCBZ2024115"],"award-info":[{"award-number":["YCBZ2024115"]}]},{"name":"Guangxi Collaborative Innovation Center of Multi-source Information Integration and Intelligent Processing"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2025,5,31]]},"abstract":"<jats:p>Multi-Modal Sarcasm Detection (MSD) aims to combine multiple modal information to identify implicit sarcastic sentiment. However, the significance of commonsense knowledge in implicit emotion recognition has been frequently overlooked. Additionally, the important visual emotions associated with textual modal sarcastic cues are typically dispersed throughout the entire image, making it difficult to accurately focus on crucial sarcastic features. Therefore, we propose a Knowledge-Aware Focused Graph Convolutional Networks (KFGC-Net) to tackle these issues. Specifically, we first construct a cross-modal knowledge-aware graph based on commonsense concepts for each instance. This graph explicitly establishes connections between significant visual sentiments and relevant textual tokens of knowledge. Next, we integrate the transformer encoder optimized with convolutional operations with the Convolutional Block Attention Module to compensate for the model\u2019s lack of attention to important features. Finally, we design a Global Modality Synergistic Fusion (GMSF) block, aiming to model the global relationships in each modality for complementing global sarcasm detection result. Notably, we analyze the proposed framework by testing it on several benchmark datasets, and the results outperform the existing state of the art.<\/jats:p>","DOI":"10.1145\/3722115","type":"journal-article","created":{"date-parts":[[2025,3,7]],"date-time":"2025-03-07T20:17:01Z","timestamp":1741378621000},"page":"1-22","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":7,"title":["Multi-Modal Sarcasm Detection via Knowledge-Aware Focused Graph Convolutional Networks"],"prefix":"10.1145","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0009-0001-6461-3266","authenticated-orcid":false,"given":"Xingjie","family":"Zhuang","sequence":"first","affiliation":[{"name":"Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, China and Guangxi Key Lab of Multi-source Information Mining and Security, Guangxi Normal University, Guilin, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-6913-8223","authenticated-orcid":false,"given":"Fengling","family":"Zhou","sequence":"additional","affiliation":[{"name":"Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, China and Guangxi Key Lab of Multi-source Information Mining and Security, Guangxi Normal University, Guilin, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5313-6134","authenticated-orcid":false,"given":"Zhixin","family":"Li","sequence":"additional","affiliation":[{"name":"Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, China and Guangxi Key Lab of Multi-source Information Mining and Security, Guangxi Normal University, Guilin, China"}]}],"member":"320","published-online":{"date-parts":[[2025,5,22]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00636"},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2018.2798607"},{"key":"e_1_3_2_4_2","first-page":"574","volume-title":"Proceedings of the 9th International AAAI Conference on Web and Social Media","author":"Bamman David","year":"2015","unstructured":"David Bamman and Noah Smith. 2015. Contextualized sarcasm detection on Twitter. In Proceedings of the 9th International AAAI Conference on Web and Social Media, 574\u2013577."},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1239"},{"key":"e_1_3_2_6_2","unstructured":"Jacob Devlin Ming-Wei Chang Kenton Lee and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805. Retrieved from https:\/\/arxiv.org\/abs\/1810.04805"},{"key":"e_1_3_2_7_2","unstructured":"Alexey Dosovitskiy Lucas Beyer Alexander Kolesnikov Dirk Weissenborn Xiaohua Zhai Thomas Unterthiner Mostafa Dehghani Matthias Minderer Georg Heigold Sylvain Gelly et al. 2020. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv:2010.11929. Retrieved from https:\/\/arxiv.org\/abs\/2010.11929"},{"key":"e_1_3_2_8_2","first-page":"17933","volume-title":"Proceedings of the 38th AAAI Conference on Artificial Intelligence","author":"Du Hang","year":"2024","unstructured":"Hang Du, Guoshun Nan, Sicheng Zhang, Binzhu Xie, Junrui Xu, Hehe Fan, Qimei Cui, Xiaofeng Tao, and Xudong Jiang. 2024. DocMSU: A comprehensive benchmark for document-level multimodal sarcasm understanding. In Proceedings of the 38th AAAI Conference on Artificial Intelligence, 17933\u201317941."},{"key":"e_1_3_2_9_2","first-page":"2846","volume-title":"Proceedings of 2022 IEEE International Conference on Image Processing","author":"Fu Jinmiao","year":"2022","unstructured":"Jinmiao Fu, Shaoyuan Xu, Huidong Liu, Yang Liu, Ning Xie, Chien-Chih Wang, Jia Liu, Yi Sun, and Bryan Wang. 2022. Cma-clip: Cross-modality attention clip for text-image classification. In Proceedings of 2022 IEEE International Conference on Image Processing, 2846\u20132850."},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neunet.2005.06.042"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_12_2","first-page":"18354","volume-title":"Proceedings of the 38th AAAI Conference on Artificial Intelligence","author":"Jia Mengzhao","year":"2024","unstructured":"Mengzhao Jia, Can Xie, and Liqiang Jing. 2024. Debiasing multimodal sarcasm detection with contrastive learning. In Proceedings of the 38th AAAI Conference on Artificial Intelligence, 18354\u201318362."},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/P15-2124"},{"key":"e_1_3_2_14_2","unstructured":"Yoon Kim. 2014. Convolutional neural networks for sentence classification. arXiv:1408.5882. Retrieved from https:\/\/arxiv.org\/abs\/1408.5882"},{"key":"e_1_3_2_15_2","unstructured":"Thomas N. Kipf and Max Welling. 2016. Semi-supervised classification with graph convolutional networks. arXiv:1609.02907. Retrieved from https:\/\/arxiv.org\/abs\/1609.02907"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1145\/3480963"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1145\/3474085.3475190"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.acl-long.124"},{"key":"e_1_3_2_19_2","doi-asserted-by":"crossref","unstructured":"Hui Liu Wenya Wang and Haoliang Li. 2022. Towards multi-modal sarcasm detection via hierarchical congruity modeling with knowledge enhancement. arXiv:2210.03501. Retrieved from https:\/\/arxiv.org\/abs\/2210.03501","DOI":"10.18653\/v1\/2022.emnlp-main.333"},{"key":"e_1_3_2_20_2","unstructured":"Yinhan Liu Myle Ott Naman Goyal Jingfei Du Mandar Joshi Danqi Chen Omer Levy Mike Lewis Luke Zettlemoyer and Veselin Stoyanov. 2019. Roberta: A robustly optimized bert pretraining approach. arXiv:1907.11692. Retrieved from https:\/\/arxiv.org\/abs\/1907.11692"},{"key":"e_1_3_2_21_2","first-page":"871","volume-title":"Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing","author":"Liu Yaochen","year":"2021","unstructured":"Yaochen Liu, Yazhou Zhang, Qiuchi Li, Benyou Wang, and Dawei Song. 2021. What does your smile mean? Jointly detecting multi-modal sarcasm and sentiment using quantum probability. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 871\u2013880."},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2024.128874"},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1145\/3715139"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.findings-emnlp.124"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/WACV51458.2022.00062"},{"key":"e_1_3_2_26_2","first-page":"213","volume-title":"Proceedings of the 25th International Conference on Computational Linguistics","author":"Pt\u00e1\u010dek Tom\u00e1\u0161","year":"2014","unstructured":"Tom\u00e1\u0161 Pt\u00e1\u010dek, Ivan Habernal, and Jun Hong. 2014. Sarcasm detection on Czech and English Twitter. In Proceedings of the 25th International Conference on Computational Linguistics, 213\u2013223."},{"key":"e_1_3_2_27_2","first-page":"10834","volume-title":"Proceedings of the 2023 Association for Computational Linguistics","author":"Qin Libo","year":"2023","unstructured":"Libo Qin, Shijue Huang, Qiguang Chen, Chenran Cai, Yudi Zhang, Bin Liang, Wanxiang Che, and Ruifeng Xu. 2023. MMSD2.0: Towards a reliable multi-modal sarcasm detection system. In Proceedings of the 2023 Association for Computational Linguistics, 10834\u201310845."},{"key":"e_1_3_2_28_2","first-page":"8748","volume-title":"Proceedings of the 38th International Conference on Machine Learning","author":"Radford Alec","year":"2021","unstructured":"Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. 2021. Learning transferable visual models from natural language supervision. In Proceedings of the 38th International Conference on Machine Learning, 8748\u20138763."},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2016.2577031"},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10115-013-0652-8"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1145\/2964284.2964321"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1145\/3634706"},{"key":"e_1_3_2_33_2","first-page":"4444","volume-title":"Proceedings of the 31th AAAI Conference on Artificial Intelligence","author":"Speer Robyn","year":"2017","unstructured":"Robyn Speer, Joshua Chin, and Catherine Havasi. 2017. Conceptnet 5.5: An open multilingual graph of general knowledge. In Proceedings of the 31th AAAI Conference on Artificial Intelligence, 4444\u20134451."},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P18-1093"},{"key":"e_1_3_2_35_2","first-page":"162","volume-title":"Proceedings of the 4th International AAAI Conference on Web and Social Media","author":"Tsur Oren","year":"2010","unstructured":"Oren Tsur, Dmitry Davidov, and Ari Rappoport. 2010. ICWSM\u2014A great catchy name: Semi-supervised recognition of sarcastic sentences in online product reviews. In Proceedings of the 4th International AAAI Conference on Web and Social Media, 162\u2013169."},{"key":"e_1_3_2_36_2","first-page":"6000","article-title":"Attention is all you need","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, 6000\u20136010.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.1145\/3572915"},{"key":"e_1_3_2_38_2","doi-asserted-by":"crossref","first-page":"8164","DOI":"10.18653\/v1\/2022.acl-long.562","volume-title":"Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Wang Jiquan","year":"2022","unstructured":"Jiquan Wang, Lin Sun, Yi Liu, Meizhi Shao, and Zengwei Zheng. 2022. Multimodal sarcasm target identification in tweets. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 8164\u20138175."},{"key":"e_1_3_2_39_2","first-page":"2595","volume-title":"Proceedings of the 36th AAAI Conference on Artificial Intelligence","author":"Wang Yuan","year":"2022","unstructured":"Yuan Wang, Min Cao, Zhenfeng Fan, and Silong Peng. 2022. Learning to detect 3D facial landmarks via heatmap regression with graph convolutional network. In Proceedings of the 36th AAAI Conference on Artificial Intelligence, 2595\u20132603."},{"key":"e_1_3_2_40_2","first-page":"9151","volume-title":"Proceedings of the 38th AAAI Conference on Artificial Intelligence","author":"Wei Yiwei","year":"2024","unstructured":"Yiwei Wei, Shaozu Yuan, Hengyang Zhou, Longbiao Wang, Zhiling Yan, Ruosong Yang, and Meng Chen. 2024. G \\(\\wedge\\) 2SAM: Graph-based global semantic awareness method for multimodal sarcasm detection. In Proceedings of the 38th AAAI Conference on Artificial Intelligence, 9151\u20139159."},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.00250"},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01234-2_1"},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1145\/3308558.3313735"},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.349"},{"key":"e_1_3_2_45_2","first-page":"328","volume-title":"Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)","author":"Yang Xiaocui","year":"2021","unstructured":"Xiaocui Yang, Shi Feng, Yifei Zhang, and Daling Wang. 2021. Multimodal sentiment detection based on multi-channel graph neural networks. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 328\u2013339."},{"key":"e_1_3_2_46_2","first-page":"2449","volume-title":"Proceedings of the 26th International Conference on Computational Linguistics","author":"Zhang Meishan","year":"2016","unstructured":"Meishan Zhang, Yue Zhang, and Guohong Fu. 2016. Tweet sarcasm detection using deep neural network. In Proceedings of the 26th International Conference on Computational Linguistics, 2449\u20132460."},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","DOI":"10.1145\/3694973"},{"key":"e_1_3_2_48_2","first-page":"971","volume-title":"Proceedings of the 2024 IEEE International Conference on Data Mining","author":"Zhuang Xingjie","year":"2024","unstructured":"Xingjie Zhuang and Zhixin Li. 2024. Multi-modal sarcasm detection via dual synergetic perception graph convolutional networks. In Proceedings of the 2024 IEEE International Conference on Data Mining, 971\u2013976."},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.engappai.2024.109884"},{"key":"e_1_3_2_50_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2025.113029"},{"key":"e_1_3_2_51_2","doi-asserted-by":"publisher","DOI":"10.1145\/3627673.3679570"}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3722115","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3722115","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T18:43:50Z","timestamp":1750272230000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3722115"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,5,22]]},"references-count":50,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2025,5,31]]}},"alternative-id":["10.1145\/3722115"],"URL":"https:\/\/doi.org\/10.1145\/3722115","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,5,22]]},"assertion":[{"value":"2024-06-12","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-02-24","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-05-22","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}