{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:57:17Z","timestamp":1750309037762,"version":"3.41.0"},"reference-count":46,"publisher":"Association for Computing Machinery (ACM)","issue":"8","license":[{"start":{"date-parts":[[2023,8,24]],"date-time":"2023-08-24T00:00:00Z","timestamp":1692835200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["U22B2059, 62222213, 62072423"],"award-info":[{"award-number":["U22B2059, 62222213, 62072423"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"USTC Research Funds of the Double First-Class Initiative","award":["YD2150002009"],"award-info":[{"award-number":["YD2150002009"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Asian Low-Resour. Lang. Inf. Process."],"published-print":{"date-parts":[[2023,8,31]]},"abstract":"<jats:p>Recent years have witnessed the booming of online social media platforms with embracing the popular service called \u201cTime-Sync Comment\u201d, which supports the viewers to share their time-sync opinions along with video content. In this way, we observe that numerous semantically-altered terms, or \u201cMemes\u201d, were created by niche users to express their unique ideas and emotions, and further attracted a large group of viewers with better activity and enthusiasm. Unfortunately, since the memes were created based on domain-specific knowledge and semantically varied depending on the multimodal context in videos, newcomers may fail to comprehend the semantic connotation of memes, which may severely impair their user-experiences. To deal with this issue, in this article, we propose a novel meme explanation framework, called ProMDE, to automatically capture and comprehend the memes in time-sync comments, which could further benefit the viewers with meme explanation service. Specifically, we first iteratively reconstruct the original time-sync comments compared with visual embedding to detect the semantically-altered terms as meme candidates. Afterward, based on the guides from the domain-specific corpus, visual and textual features will be fused to represent the context-aware multimodal cues. Moreover, to accurately describe the commonly-seen homophones in memes, i.e., they have the same pronunciation but different word-spelling expressions, we integrate the phonetic symbols as an additional modality to enhance the framework. Finally, we utilize a Transformer-based decoder to generate the natural language explanation for captured memes. Extensive experiments on a large real-world dataset prove that our framework could significantly outperform several state-of-the-art baseline methods, demonstrating the efficacy of modeling multimodal context and pronunciation for meme detection and explanation.<\/jats:p>","DOI":"10.1145\/3612920","type":"journal-article","created":{"date-parts":[[2023,8,2]],"date-time":"2023-08-02T12:10:36Z","timestamp":1690978236000},"page":"1-17","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["Comprehending the Gossips: Meme Explanation in Time-Sync Video Comment via Multimodal Cues"],"prefix":"10.1145","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0009-0009-7453-5781","authenticated-orcid":false,"given":"Zheyong","family":"Xie","sequence":"first","affiliation":[{"name":"School of Computer Science and Technology, University of Science and Technology of China, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1986-9710","authenticated-orcid":false,"given":"Weidong","family":"He","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, University of Science and Technology of China, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4246-5386","authenticated-orcid":false,"given":"Tong","family":"Xu","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, University of Science and Technology of China, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3206-6827","authenticated-orcid":false,"given":"Shiwei","family":"Wu","sequence":"additional","affiliation":[{"name":"School of Data Science, University of Science and Technology of China, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4817-482X","authenticated-orcid":false,"given":"Chen","family":"Zhu","sequence":"additional","affiliation":[{"name":"School of Management, University of Science and Technology of China, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-2642-3652","authenticated-orcid":false,"given":"Ping","family":"Yang","sequence":"additional","affiliation":[{"name":"Alibaba Inc., China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4835-4102","authenticated-orcid":false,"given":"Enhong","family":"Chen","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, University of Science and Technology of China, China"}]}],"member":"320","published-online":{"date-parts":[[2023,8,24]]},"reference":[{"key":"e_1_3_2_2_2","first-page":"10","volume-title":"Proceedings of the International Conference on Image Processing, Computer Vision, and Pattern Recognition (IPCV)","author":"Amirian Soheyla","year":"2019","unstructured":"Soheyla Amirian, Khaled Rasheed, Thiab R. Taha, and Hamid R. Arabnia. 2019. A short review on image caption generation with deep learning. In Proceedings of the International Conference on Image Processing, Computer Vision, and Pattern Recognition (IPCV). The Steering Committee of The World Congress in Computer Science, Computer Engineering and Applied Computing (WorldComp), 10\u201318."},{"key":"e_1_3_2_3_2","volume-title":"3rd International Conference on Learning Representations, ICLR 2015, Conference Track Proceedings","author":"Bahdanau Dzmitry","year":"2015","unstructured":"Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In 3rd International Conference on Learning Representations, ICLR 2015, Conference Track Proceedings, Yoshua Bengio and Yann LeCun (Eds.). International Conference on Learning Representations 2015, San Diego."},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1179"},{"key":"e_1_3_2_5_2","volume-title":"The Selfish Gene (New Ed.)","author":"Dawkins Richard","year":"1989","unstructured":"Richard Dawkins. 1989. The Selfish Gene (New Ed.). Oxford University Press, Oxford, New York."},{"key":"e_1_3_2_6_2","first-page":"10563","volume-title":"Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, AAAI 2022, Thirty-Fourth Conference on Innovative Applications of Artificial Intelligence, IAAI 2022, The Twelveth Symposium on Educational Advances in Artificial Intelligence, EAAI 2022 Virtual Event, February 22 - March 1, 2022","author":"Desai Poorav","year":"2022","unstructured":"Poorav Desai, Tanmoy Chakraborty, and Md. Shad Akhtar. 2022. Nice perfume. how long did you marinate in it? Multimodal sarcasm explanation. In Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, AAAI 2022, Thirty-Fourth Conference on Innovative Applications of Artificial Intelligence, IAAI 2022, The Twelveth Symposium on Educational Advances in Artificial Intelligence, EAAI 2022 Virtual Event, February 22 - March 1, 2022. AAAI Press, 10563\u201310571."},{"key":"e_1_3_2_7_2","volume-title":"Proceedings of the 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021","author":"Dosovitskiy Alexey","year":"2021","unstructured":"Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. 2021. An image is worth 16x16 words: Transformers for image recognition at scale. In Proceedings of the 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021."},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.365"},{"key":"e_1_3_2_9_2","first-page":"2672","volume-title":"Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, December 8-13 2014, Montreal, Quebec, Canada","author":"Goodfellow Ian J.","year":"2014","unstructured":"Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron C. Courville, and Yoshua Bengio. 2014. Generative adversarial nets. In Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, December 8-13 2014, Montreal, Quebec, Canada. 2672\u20132680."},{"key":"e_1_3_2_10_2","unstructured":"Jiaxi Gu Xiaojun Meng Guansong Lu Lu Hou Niu Minzhe Xiaodan Liang Lewei Yao Runhui Huang Wei Zhang Xin Jiang Chunjing XU and Hang Xu. 2022. Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark. In Advances in Neural Information Processing Systems Curran Associates Inc. 26418\u201326431."},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P16-1141"},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46493-0_1"},{"key":"e_1_3_2_13_2","doi-asserted-by":"crossref","unstructured":"Sepp Hochreiter and J\u00fcrgen Schmidhuber. 1997. Long short-term memory. Neural Computation 9 8 (1997) 1735\u20131780.","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"e_1_3_2_14_2","first-page":"1233","volume-title":"Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","year":"2016","unstructured":"Ting-Hao (Kenneth) Huang, Francis Ferraro, Nasrin Mostafazadeh, Ishan Misra, Aishwarya Agrawal, Jacob Devlin, Ross Girshick, Xiaodong He, Pushmeet Kohli, Dhruv Batra, C. Lawrence Zitnick, Devi Parikh, Lucy Vanderwende, Michel Galley, and Margaret Mitchell. 2016. Visual storytelling. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 1233\u20131239."},{"key":"e_1_3_2_15_2","doi-asserted-by":"crossref","unstructured":"Kushal Kafle and Christopher Kanan. 2017. Visual question answering: Datasets algorithms and future challenges. Computer Vision and Image Understanding 163 (2017) 3\u201320.","DOI":"10.1016\/j.cviu.2017.06.005"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01216-8_35"},{"key":"e_1_3_2_17_2","doi-asserted-by":"crossref","unstructured":"Peter Koch. 2016. Meaning change and semantic shifts. The Lexical Typology of Semantic Shifts 58 (2016) 21.","DOI":"10.1515\/9783110377675-002"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1145\/2702123.2702349"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.703"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P18-2023"},{"key":"e_1_3_2_21_2","doi-asserted-by":"crossref","unstructured":"Yongqi Li Wenjie Li and Liqiang Nie. 2022. Dynamic graph reasoning for conversational open-domain question answering. ACM Transactions on Information Systems (TOIS) 40 4 (2022) 1\u201324.","DOI":"10.1145\/3498557"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1145\/3172944.3172966"},{"key":"e_1_3_2_23_2","first-page":"74","volume-title":"Proceedings of the Text Summarization Branches Out","author":"Lin Chin-Yew","year":"2004","unstructured":"Chin-Yew Lin. 2004. ROUGE: A package for automatic evaluation of summaries. In Proceedings of the Text Summarization Branches Out. Association for Computational Linguistics, 74\u201381."},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICIS.2014.6912126"},{"key":"e_1_3_2_25_2","doi-asserted-by":"crossref","unstructured":"Maofu Liu Huijun Hu Lingjun Li Yan Yu and Weili Guan. 2020. Chinese image caption generation via visual attention and topic modeling. IEEE Transactions on Cybernetics 52 2 (2020) 1247\u20131257.","DOI":"10.1109\/TCYB.2020.2997034"},{"key":"e_1_3_2_26_2","doi-asserted-by":"crossref","unstructured":"Maofu Liu Lingjun Li Huijun Hu Weili Guan and Jing Tian. 2020. Image caption generation with dual attention mechanism. Information Processing & Management 57 2 (2020) 102178.","DOI":"10.1016\/j.ipm.2019.102178"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v30i1.10383"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-16142-2_32"},{"key":"e_1_3_2_29_2","doi-asserted-by":"crossref","unstructured":"Guangyi Lv Kun Zhang Le Wu Enhong Chen Tong Xu Qi Liu and Weidong He. 2022. Understanding the users and videos by mining a novel danmu dataset. IEEE Transactions on Big Data 8 2 (2022) 535\u2013551.","DOI":"10.1109\/TBDATA.2019.2950411"},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33016810"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2019.00091"},{"key":"e_1_3_2_32_2","unstructured":"Ron Mokady Amir Hertz and Amit H. Bermano. 2021. ClipCap: CLIP prefix for image captioning. arXiv:2111.09734. Retrieved from https:\/\/arxiv.org\/abs\/2111.09734"},{"key":"e_1_3_2_33_2","first-page":"311","volume-title":"Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics","author":"Papineni Kishore","year":"2002","unstructured":"Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. 311\u2013318."},{"key":"e_1_3_2_34_2","unstructured":"Jesus Perez-Martin Benjamin Bustos and Magdalena Salda\u00f1a. 2020. Semantic search of memes on twitter. arXiv:2002.01462. Retrieved from https:\/\/arxiv.org\/abs\/2002.01462"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P17-1099"},{"key":"e_1_3_2_36_2","first-page":"66","volume-title":"Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)","author":"Shoemark Philippa","year":"2020","unstructured":"Philippa Shoemark, Farhana Ferdousi Liza, Dong Nguyen, Scott Hale, and Barbara McGillivray. 2020. Room to glo: A systematic comparison of semantic change detection approaches with word embeddings. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics, 66\u201376."},{"key":"e_1_3_2_37_2","volume-title":"Proceedings of the 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings","author":"Smith Samuel L.","year":"2017","unstructured":"Samuel L. Smith, David H. P. Turban, Steven Hamblin, and Nils Y. Hammerla. 2017. Offline bilingual word vectors, orthogonal transformations and the inverted softmax. In Proceedings of the 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings."},{"key":"e_1_3_2_38_2","first-page":"3104","volume-title":"Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, December 8-13 2014, Montreal, Quebec, Canada","author":"Sutskever Ilya","year":"2014","unstructured":"Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to sequence learning with neural networks. In Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, December 8-13 2014, Montreal, Quebec, Canada. 3104\u20133112."},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-main.682"},{"key":"e_1_3_2_40_2","first-page":"5998","volume-title":"Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017. 5998\u20136008."},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.1145\/2623330.2623625"},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/IIAI-AAI.2014.65"},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1145\/2757513.2757516"},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v31i1.10753"},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.585"},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D17-1207"},{"key":"e_1_3_2_47_2","doi-asserted-by":"crossref","unstructured":"Hui Zhong Zaiyi Chen Chuan Qin Zai Huang Vincent W. Zheng Tong Xu and Enhong Chen. 2020. Adam revisited: A weighted past gradients perspective. Front. Comput. Sci. 14 5 (2020).","DOI":"10.1007\/s11704-019-8457-x"}],"container-title":["ACM Transactions on Asian and Low-Resource Language Information Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3612920","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3612920","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T22:29:18Z","timestamp":1750285758000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3612920"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,8,24]]},"references-count":46,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2023,8,31]]}},"alternative-id":["10.1145\/3612920"],"URL":"https:\/\/doi.org\/10.1145\/3612920","relation":{},"ISSN":["2375-4699","2375-4702"],"issn-type":[{"type":"print","value":"2375-4699"},{"type":"electronic","value":"2375-4702"}],"subject":[],"published":{"date-parts":[[2023,8,24]]},"assertion":[{"value":"2022-10-23","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-07-23","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-08-24","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}