{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,29]],"date-time":"2025-12-29T18:45:28Z","timestamp":1767033928820,"version":"3.48.0"},"publisher-location":"New York, NY, USA","reference-count":20,"publisher":"ACM","funder":[{"name":"Japan Science and Technology Agency (JST) and A*STAR","award":["R24I6IR136"],"award-info":[{"award-number":["R24I6IR136"]}]},{"name":"National Research Foundation, Singapore","award":["AISG2-GC-2022-005 & DesCartes"],"award-info":[{"award-number":["AISG2-GC-2022-005 & DesCartes"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2025,10,13]]},"DOI":"10.1145\/3747327.3764895","type":"proceedings-article","created":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T14:04:34Z","timestamp":1760191474000},"page":"185-189","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Contextualized Visual Storytelling for Conversational Chatbot in Education"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3012-9222","authenticated-orcid":false,"given":"Hui Li","family":"Tan","sequence":"first","affiliation":[{"name":"Institute for Infocomm Research (I2R), Agency for Science, Technology, and Research (A*STAR), Singapore, Singapore"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6501-302X","authenticated-orcid":false,"given":"Ying","family":"Gu","sequence":"additional","affiliation":[{"name":"Institute for Infocomm Research (I2R), Agency for Science, Technology, and Research (A*STAR), Singapore, Singapore"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3758-1975","authenticated-orcid":false,"given":"Liyuan","family":"Li","sequence":"additional","affiliation":[{"name":"Institute for Infocomm Research (I2R), Agency for Science, Technology, and Research (A*STAR), Singapore, Singapore"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8123-8982","authenticated-orcid":false,"given":"Mei Chee","family":"Leong","sequence":"additional","affiliation":[{"name":"Institute for Infocomm Research (I2R), Agency for Science, Technology, and Research (A*STAR), Singapore, Singapore"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0872-5877","authenticated-orcid":false,"given":"Nancy F.","family":"Chen","sequence":"additional","affiliation":[{"name":"Institute for Infocomm Research (I2R), Agency for Science, Technology, and Research (A*STAR), Singapore, Singapore and College of Computing and Data Science (CCDS), Nanyang Technological University (NTU), Singapore, Singapore"}]}],"member":"320","published-online":{"date-parts":[[2025,10,12]]},"reference":[{"key":"e_1_3_3_2_2_2","unstructured":"[n. d.]. ChatGPT. https:\/\/openai.com\/index\/chatgpt\/. Accessed: 2025-07-02."},{"key":"e_1_3_3_2_3_2","unstructured":"[n. d.]. Claude. https:\/\/claude.ai\/new. Accessed: 2025-07-02."},{"key":"e_1_3_3_2_4_2","unstructured":"Mohsen Balavar Wenli Yang David Herbert and Soonja Yeom. 2025. Enhancing tutoring systems by leveraging tailored promptings and domain knowledge with Large Language Models. arxiv:https:\/\/arXiv.org\/abs\/2505.02849\u00a0[cs.CY] https:\/\/arxiv.org\/abs\/2505.02849"},{"key":"e_1_3_3_2_5_2","first-page":"65","volume-title":"Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and\/or Summarization","author":"Banerjee Satanjeev","year":"2005","unstructured":"Satanjeev Banerjee and Alon Lavie. 2005. METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments. In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and\/or Summarization, Jade Goldstein, Alon Lavie, Chin-Yew Lin, and Clare Voss (Eds.). Association for Computational Linguistics, Ann Arbor, Michigan, 65\u201372. https:\/\/aclanthology.org\/W05-0909\/"},{"key":"e_1_3_3_2_6_2","volume-title":"Bringing words to life: Robust vocabulary instruction","author":"Beck Isabel\u00a0L","year":"2013","unstructured":"Isabel\u00a0L Beck, Margaret\u00a0G McKeown, and Linda Kucan. 2013. Bringing words to life: Robust vocabulary instruction. Guilford Press."},{"key":"e_1_3_3_2_7_2","unstructured":"Mehar Bhatia Sahithya Ravi Aditya Chinchure Eunjeong Hwang and Vered Shwartz. 2024. From Local Concepts to Universals: Evaluating the Multicultural Understanding of Vision-Language Models. arxiv:https:\/\/arXiv.org\/abs\/2407.00263\u00a0[cs.CL] https:\/\/arxiv.org\/abs\/2407.00263"},{"key":"e_1_3_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511815355"},{"key":"e_1_3_3_2_9_2","unstructured":"Shudong Liu Yiqiao Jin Cheng Li Derek\u00a0F. Wong Qingsong Wen Lichao Sun Haipeng Chen Xing Xie and Jindong Wang. 2025. CultureVLM: Characterizing and Improving Cultural Understanding of Vision-Language Models for over 100 Countries. arxiv:https:\/\/arXiv.org\/abs\/2501.01282\u00a0[cs.AI] https:\/\/arxiv.org\/abs\/2501.01282"},{"key":"e_1_3_3_2_10_2","unstructured":"Shravan Nayak Kanishk Jain Rabiul Awal Siva Reddy Sjoerd van Steenkiste Lisa\u00a0Anne Hendricks Karolina Sta\u0144czak and Aishwarya Agrawal. 2024. Benchmarking Vision Language Models for Cultural Understanding. arxiv:https:\/\/arXiv.org\/abs\/2407.10920\u00a0[cs.CV] https:\/\/arxiv.org\/abs\/2407.10920"},{"key":"e_1_3_3_2_11_2","unstructured":"Teresa Wai\u00a0See Ong and Quevada\u00a0Hannah Rocafort. 2023. The Bilingual Dreams: Reasons to Learn Mother Tongues and Practical Tips. https:\/\/singteach.nie.edu.sg\/2023\/04\/28\/the-bilingual-dreams-reasons-to-learn-mother-tongues-and-practical-tips\/"},{"key":"e_1_3_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.3115\/1073083.1073135"},{"key":"e_1_3_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2024.findings-emnlp.797"},{"key":"e_1_3_3_2_14_2","unstructured":"Quan Sun Yuxin Fang Ledell Wu Xinlong Wang and Yue Cao. 2023. EVA-CLIP: Improved Training Techniques for CLIP at Scale. arxiv:https:\/\/arXiv.org\/abs\/2303.15389\u00a0[cs.CV] https:\/\/arxiv.org\/abs\/2303.15389"},{"key":"e_1_3_3_2_15_2","unstructured":"Ramakrishna Vedantam C.\u00a0Lawrence Zitnick and Devi Parikh. 2015. CIDEr: Consensus-based Image Description Evaluation. arxiv:https:\/\/arXiv.org\/abs\/1411.5726\u00a0[cs.CV] https:\/\/arxiv.org\/abs\/1411.5726"},{"key":"e_1_3_3_2_16_2","volume-title":"Mind in society: The development of higher psychological processes","author":"Vygotsky L.S.","year":"1978","unstructured":"L.S. Vygotsky. 1978. Mind in society: The development of higher psychological processes. Harvard University Press, Cambridge, MA."},{"key":"e_1_3_3_2_17_2","unstructured":"Jialian Wu Jianfeng Wang Zhengyuan Yang Zhe Gan Zicheng Liu Junsong Yuan and Lijuan Wang. 2022. GRiT: A Generative Region-to-text Transformer for Object Understanding. arxiv:https:\/\/arXiv.org\/abs\/2212.00280\u00a0[cs.CV] https:\/\/arxiv.org\/abs\/2212.00280"},{"key":"e_1_3_3_2_18_2","unstructured":"Zhiyu Wu Xiaokang Chen Zizheng Pan Xingchao Liu Wen Liu Damai Dai Huazuo Gao Yiyang Ma Chengyue Wu Bingxuan Wang Zhenda Xie Yu Wu Kai Hu Jiawei Wang Yaofeng Sun Yukun Li Yishi Piao Kang Guan Aixin Liu Xin Xie Yuxiang You Kai Dong Xingkai Yu Haowei Zhang Liang Zhao Yisong Wang and Chong Ruan. 2024. DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding. arxiv:https:\/\/arXiv.org\/abs\/2412.10302\u00a0[cs.CV] https:\/\/arxiv.org\/abs\/2412.10302"},{"key":"e_1_3_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2024\/180"},{"key":"e_1_3_3_2_20_2","unstructured":"Yuzhong Zhao Yue Liu Zonghao Guo Weijia Wu Chen Gong Fang Wan and Qixiang Ye. 2024. ControlCap: Controllable Region-level Captioning. arxiv:https:\/\/arXiv.org\/abs\/2401.17910\u00a0[cs.CV] https:\/\/arxiv.org\/abs\/2401.17910"},{"key":"e_1_3_3_2_21_2","unstructured":"Jinguo Zhu Weiyun Wang Zhe Chen Zhaoyang Liu Shenglong Ye Lixin Gu Hao Tian Yuchen Duan Weijie Su Jie Shao Zhangwei Gao Erfei Cui Xuehui Wang Yue Cao Yangzhou Liu Xingguang Wei Hongjie Zhang Haomin Wang Weiye Xu Hao Li Jiahao Wang Nianchen Deng Songze Li Yinan He Tan Jiang Jiapeng Luo Yi Wang Conghui He Botian Shi Xingcheng Zhang Wenqi Shao Junjun He Yingtong Xiong Wenwen Qu Peng Sun Penglong Jiao Han Lv Lijun Wu Kaipeng Zhang Huipeng Deng Jiaye Ge Kai Chen Limin Wang Min Dou Lewei Lu Xizhou Zhu Tong Lu Dahua Lin Yu Qiao Jifeng Dai and Wenhai Wang. 2025. InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models. arxiv:https:\/\/arXiv.org\/abs\/2504.10479\u00a0[cs.CV] https:\/\/arxiv.org\/abs\/2504.10479"}],"event":{"name":"ICMI Companion '25: Companion Proceedings of the 27th International Conference on Multimodal Interaction","sponsor":["SIGCHI ACM Special Interest Group on Computer-Human Interaction"],"location":"Canberra Australia","acronym":"ICMI Companion '25"},"container-title":["Companion Proceedings of the 27th International Conference on Multimodal Interaction"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3747327.3764895","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,16]],"date-time":"2025-12-16T21:08:01Z","timestamp":1765919281000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3747327.3764895"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,10,12]]},"references-count":20,"alternative-id":["10.1145\/3747327.3764895","10.1145\/3747327"],"URL":"https:\/\/doi.org\/10.1145\/3747327.3764895","relation":{},"subject":[],"published":{"date-parts":[[2025,10,12]]},"assertion":[{"value":"2025-10-12","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}