{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,5]],"date-time":"2025-12-05T12:30:09Z","timestamp":1764937809198,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":57,"publisher":"ACM","license":[{"start":{"date-parts":[[2024,10,28]],"date-time":"2024-10-28T00:00:00Z","timestamp":1730073600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/https:\/\/doi.org\/10.13039\/501100018532","name":"Major Scientific and Technological Innovation Project of Shandong Province","doi-asserted-by":"publisher","award":["Grant No. 2021ZLGX05 and Grant No. 2020CXGC010705"],"award-info":[{"award-number":["Grant No. 2021ZLGX05 and Grant No. 2020CXGC010705"]}],"id":[{"id":"10.13039\/https:\/\/doi.org\/10.13039\/501100018532","id-type":"DOI","asserted-by":"publisher"}]},{"name":"the Natural Science Foundation of Shandong Province","award":["Grant No. ZR2023QF154"],"award-info":[{"award-number":["Grant No. ZR2023QF154"]}]},{"name":"National Key R&D Program of China","award":["Grant No. 2023YFB3307500"],"award-info":[{"award-number":["Grant No. 2023YFB3307500"]}]},{"name":"The Open Project of Anhui Provincial Key Laboratory of Multimodal Cognitive Computation, Anhui University","award":["Grant No. MMC202420"],"award-info":[{"award-number":["Grant No. MMC202420"]}]},{"name":"Special Funding Program of Shandong Taishan Scholars Project"},{"DOI":"10.13039\/https:\/\/doi.org\/10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["Grant No. 62306087"],"award-info":[{"award-number":["Grant No. 
62306087"]}],"id":[{"id":"10.13039\/https:\/\/doi.org\/10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2024,10,28]]},"DOI":"10.1145\/3664647.3681589","type":"proceedings-article","created":{"date-parts":[[2024,10,26]],"date-time":"2024-10-26T06:59:49Z","timestamp":1729925989000},"page":"3219-3228","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["Can We Debias Multimodal Large Language Models via Model Editing?"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0009-0003-4688-901X","authenticated-orcid":false,"given":"Zecheng","family":"Wang","sequence":"first","affiliation":[{"name":"Harbin Institute of Technology, WeiHai, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-1929-2407","authenticated-orcid":false,"given":"Xinye","family":"Li","sequence":"additional","affiliation":[{"name":"Harbin Institute of Technology, WeiHai, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-9670-4621","authenticated-orcid":false,"given":"Zhanyue","family":"Qin","sequence":"additional","affiliation":[{"name":"Harbin Institute of Technology, WeiHai, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8419-0109","authenticated-orcid":false,"given":"Chunshan","family":"Li","sequence":"additional","affiliation":[{"name":"Harbin Institute of Technology, WeiHai, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8800-4513","authenticated-orcid":false,"given":"Zhiying","family":"Tu","sequence":"additional","affiliation":[{"name":"Harbin Institute of Technology, WeiHai, 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2973-7252","authenticated-orcid":false,"given":"Dianhui","family":"Chu","sequence":"additional","affiliation":[{"name":"Harbin Institute of Technology, WeiHai, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5200-2265","authenticated-orcid":false,"given":"Dianbo","family":"Sui","sequence":"additional","affiliation":[{"name":"Harbin Institute of Technology, WeiHai, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2024,10,28]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.emnlp-main.114"},{"key":"e_1_3_2_1_2_1","unstructured":"Jean-Baptiste Alayrac Jeff Donahue Pauline Luc Antoine Miech Iain Barr Yana Hasson Karel Lenc Arthur Mensch Katherine Millican Malcolm Reynolds et al. 2022. Flamingo: a visual language model for few-shot learning. Advances in neural information processing systems Vol. 35 (2022) 23716--23736."},{"key":"e_1_3_2_1_3_1","volume-title":"Yash Bhalgat, Wonsuk Yang, Hannah Rose Kirk, Aleksandar Shtedritski, and Max Bain.","author":"Berg Hugo","year":"2022","unstructured":"Hugo Berg, Siobhan Mackenzie Hall, Yash Bhalgat, Wonsuk Yang, Hannah Rose Kirk, Aleksandar Shtedritski, and Max Bain. 2022. A Prompt Array Keeps the Bias Away: Debiasing Vision-Language Models with Adversarial Learning. arxiv: 2203.11933 [cs.LG]"},{"key":"e_1_3_2_1_4_1","volume-title":"Language Models are Few-Shot Learners. arxiv","author":"Brown Tom B.","year":"2020","unstructured":"Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. 
Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. 2020. Language Models are Few-Shot Learners. arxiv: 2005.14165 [cs.CL]"},{"key":"e_1_3_2_1_5_1","volume-title":"Counterfactual Samples Synthesizing for Robust Visual Question Answering. arxiv","author":"Chen Long","year":"2020","unstructured":"Long Chen, Xin Yan, Jun Xiao, Hanwang Zhang, Shiliang Pu, and Yueting Zhuang. 2020. Counterfactual Samples Synthesizing for Robust Visual Question Answering. arxiv: 2003.06576 [cs.CV]"},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"crossref","unstructured":"Yunliang Chen and Jungseock Joo. 2021. Understanding and Mitigating Annotation Bias in Facial Expression Recognition. arxiv: 2108.08504 [cs.CV]","DOI":"10.1109\/ICCV48922.2021.01471"},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"crossref","unstructured":"Siyuan Cheng Bozhong Tian Qingbin Liu Xi Chen Yongheng Wang Huajun Chen and Ningyu Zhang. 2023. Can We Edit Multimodal Large Language Models? arxiv: 2310.08475 [cs.CL]","DOI":"10.18653\/v1\/2023.emnlp-main.856"},{"key":"e_1_3_2_1_8_1","volume-title":"Xing","author":"Chiang Wei-Lin","year":"2023","unstructured":"Wei-Lin Chiang, Zhuohan Li, Zi Lin, Ying Sheng, Zhanghao Wu, Hao Zhang, Lianmin Zheng, Siyuan Zhuang, Yonghao Zhuang, Joseph E. Gonzalez, Ion Stoica, and Eric P. Xing. 2023. Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality. 
https:\/\/lmsys.org\/blog\/2023-03--30-vicuna\/"},{"key":"e_1_3_2_1_9_1","unstructured":"Aakanksha Chowdhery Sharan Narang Jacob Devlin Maarten Bosma Gaurav Mishra Adam Roberts Paul Barham Hyung Won Chung Charles Sutton Sebastian Gehrmann Parker Schuh Kensen Shi Sasha Tsvyashchenko Joshua Maynez Abhishek Rao Parker Barnes Yi Tay Noam Shazeer Vinodkumar Prabhakaran Emily Reif Nan Du Ben Hutchinson Reiner Pope James Bradbury Jacob Austin Michael Isard Guy Gur-Ari Pengcheng Yin Toju Duke Anselm Levskaya Sanjay Ghemawat Sunipa Dev Henryk Michalewski Xavier Garcia Vedant Misra Kevin Robinson Liam Fedus Denny Zhou Daphne Ippolito David Luan Hyeontaek Lim Barret Zoph Alexander Spiridonov Ryan Sepassi David Dohan Shivani Agrawal Mark Omernick Andrew M. Dai Thanumalayan Sankaranarayana Pillai Marie Pellat Aitor Lewkowycz Erica Moreira Rewon Child Oleksandr Polozov Katherine Lee Zongwei Zhou Xuezhi Wang Brennan Saeta Mark Diaz Orhan Firat Michele Catasta Jason Wei Kathy Meier-Hellstern Douglas Eck Jeff Dean Slav Petrov and Noah Fiedel. 2022. PaLM: Scaling Language Modeling with Pathways. arxiv: 2204.02311 [cs.CL]"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2023.3261988"},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1"},{"key":"e_1_3_2_1_12_1","unstructured":"Ekberjan Derman. 2021. Dataset Bias Mitigation Through Analysis of CNN Training Scores. arxiv: 2106.14829 [cs.CV]"},{"key":"e_1_3_2_1_13_1","volume-title":"Attenuating Bias in Word Vectors. arxiv","author":"Dev Sunipa","year":"2019","unstructured":"Sunipa Dev and Jeff Phillips. 2019. Attenuating Bias in Word Vectors. arxiv: 1901.07656 [cs.CL]"},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1"},{"key":"e_1_3_2_1_15_1","volume-title":"Fraser and Svetlana Kiritchenko","author":"Kathleen","year":"2024","unstructured":"Kathleen C. Fraser and Svetlana Kiritchenko. 2024. 
Examining Gender and Racial Bias in Large Vision-Language Models Using a Novel Dataset of Parallel Images. arxiv: 2402.05779 [cs.CY]"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1"},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1"},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"crossref","unstructured":"Akshat Gupta Dev Sajnani and Gopala Anumanchipalli. 2024. A Unified Framework for Model Editing. arxiv: 2403.14236 [cs.LG]","DOI":"10.18653\/v1\/2024.findings-emnlp.903"},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.eacl-main.199"},{"key":"e_1_3_2_1_20_1","unstructured":"Jianqiang Huang Yu Qin Jiaxin Qi Qianru Sun and Hanwang Zhang. 2021. Deconfounded Visual Grounding. arxiv: 2112.15324 [cs.CV]"},{"key":"e_1_3_2_1_21_1","unstructured":"Zeyu Huang Yikang Shen Xiaofeng Zhang Jie Zhou Wenge Rong and Zhang Xiong. 2023. Transformer-Patcher: One Mistake worth One Neuron. arxiv: 2301.09785 [cs.CL]"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.eacl-main.126"},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.1611835114"},{"key":"e_1_3_2_1_24_1","unstructured":"Jing Yu Koh Daniel Fried and Ruslan Salakhutdinov. 2023. Generating Images with Multimodal Language Models. arxiv: 2305.17216 [cs.CL]"},{"volume-title":"Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision (WACV). 3001--3010","author":"Kolling Camila","key":"e_1_3_2_1_25_1","unstructured":"Camila Kolling, Martin More, Nathan Gavenski, Eduardo Pooch, Ot\u00e1vio Parraga, and Rodrigo C. Barros. 2022. Efficient Counterfactual Debiasing for Visual Question Answering. In Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision (WACV). 3001--3010."},{"key":"e_1_3_2_1_26_1","unstructured":"Junnan Li Dongxu Li Silvio Savarese and Steven Hoi. 2023. 
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models. arxiv: 2301.12597 [cs.CV]"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v36i2.20052"},{"key":"e_1_3_2_1_28_1","unstructured":"Haotian Liu Chunyuan Li Qingyang Wu and Yong Jae Lee. 2023. Visual Instruction Tuning. arxiv: 2304.08485 [cs.CV]"},{"key":"e_1_3_2_1_29_1","unstructured":"Jun-Yu Ma Jia-Chen Gu Zhen-Hua Ling Quan Liu and Cong Liu. 2023. Untying the Reversal Curse via Bidirectional Language Model Editing. arxiv: 2310.10322 [cs.CL]"},{"key":"e_1_3_2_1_30_1","unstructured":"Shengyu Mao Ningyu Zhang Xiaohan Wang Mengru Wang Yunzhi Yao Yong Jiang Pengjun Xie Fei Huang and Huajun Chen. 2023. Editing Personality for LLMs. arxiv: 2310.02168 [cs.CL]"},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00331"},{"key":"e_1_3_2_1_32_1","unstructured":"Kevin Meng David Bau Alex J Andonian and Yonatan Belinkov. 2022. Locating and Editing Factual Associations in GPT. In Advances in Neural Information Processing Systems Alice H. Oh Alekh Agarwal Danielle Belgrave and Kyunghyun Cho (Eds.). https:\/\/openreview.net\/forum?id=-h6WAS6eE4"},{"key":"e_1_3_2_1_33_1","volume-title":"Alex Andonian, Yonatan Belinkov, and David Bau.","author":"Meng Kevin","year":"2023","unstructured":"Kevin Meng, Arnab Sen Sharma, Alex Andonian, Yonatan Belinkov, and David Bau. 2023. Mass-Editing Memory in a Transformer. arxiv: 2210.07229 [cs.CL]"},{"key":"e_1_3_2_1_34_1","volume-title":"Manning","author":"Mitchell Eric","year":"2022","unstructured":"Eric Mitchell, Charles Lin, Antoine Bosselut, Chelsea Finn, and Christopher D. Manning. 2022. Fast Model Editing at Scale. 
arxiv: 2110.11309 [cs.LG]"},{"key":"e_1_3_2_1_35_1","volume-title":"Proceedings of the 39th International Conference on Machine Learning (Proceedings of Machine Learning Research","volume":"162","author":"Mitchell Eric","year":"2022","unstructured":"Eric Mitchell, Charles Lin, Antoine Bosselut, Christopher D Manning, and Chelsea Finn. 2022. Memory-Based Model Editing at Scale. In Proceedings of the 39th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 162), Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesvari, Gang Niu, and Sivan Sabato (Eds.). PMLR, 15817--15831. https:\/\/proceedings.mlr.press\/v162\/mitchell22a.html"},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2020.2981912"},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2020.3041503"},{"key":"e_1_3_2_1_39_1","volume-title":"Gabriel S Sim\u00f5es, and Rodrigo C Barros","author":"Parraga Ot\u00e1vio","year":"2022","unstructured":"Ot\u00e1vio Parraga, Martin D More, Christian M Oliveira, Nathan S Gavenski, Lucas S Kupssinsk\u00fc, Adilson Medronha, Luis V Moura, Gabriel S Sim\u00f5es, and Rodrigo C Barros. 2022. Debiasing methods for fairer neural models in vision and language research: A survey. arXiv preprint arXiv:2211.05617 (2022)."},{"key":"e_1_3_2_1_40_1","volume-title":"Debiasing Multimodal Models via Causal Information Minimization. arXiv preprint arXiv:2311.16941","author":"Patil Vaidehi","year":"2023","unstructured":"Vaidehi Patil, Adyasha Maharana, and Mohit Bansal. 2023. Debiasing Multimodal Models via Causal Information Minimization. arXiv preprint arXiv:2311.16941 (2023)."},{"key":"e_1_3_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01042"},{"key":"e_1_3_2_1_42_1","unstructured":"Tiago Salvador Stephanie Cairns Vikram Voleti Noah Marshall and Adam Oberman. 2022. FairCal: Fairness Calibration for Face Verification. 
arxiv: 2106.03761 [cs.CV]"},{"key":"e_1_3_2_1_43_1","doi-asserted-by":"crossref","unstructured":"Ashish Seth Mayur Hemani and Chirag Agarwal. 2023. DeAR: Debiasing Vision-Language Models with Additive Residuals. arxiv: 2303.10431 [cs.CV]","DOI":"10.1109\/CVPR52729.2023.00659"},{"key":"e_1_3_2_1_44_1","volume-title":"Editable Neural Networks. In International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=HJedXaEtvS","author":"Sinitsin Anton","year":"2020","unstructured":"Anton Sinitsin, Vsevolod Plokhotnyuk, Dmitry Pyrkin, Sergei Popov, and Artem Babenko. 2020. Editable Neural Networks. In International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=HJedXaEtvS"},{"key":"e_1_3_2_1_45_1","unstructured":"Chenmien Tan Ge Zhang and Jie Fu. 2024. Massive Editing for Large Language Models via Meta Learning. arxiv: 2311.04661 [cs.CL]"},{"key":"e_1_3_2_1_46_1","volume-title":"Mitigating Gender Bias in Captioning Systems. arxiv","author":"Tang Ruixiang","year":"2021","unstructured":"Ruixiang Tang, Mengnan Du, Yuening Li, Zirui Liu, Na Zou, and Xia Hu. 2021. Mitigating Gender Bias in Captioning Systems. arxiv: 2006.08315 [cs.CV]"},{"key":"e_1_3_2_1_47_1","unstructured":"Hugo Touvron Thibaut Lavril Gautier Izacard Xavier Martinet Marie-Anne Lachaux Timoth\u00e9e Lacroix Baptiste Rozi\u00e8re Naman Goyal Eric Hambro Faisal Azhar Aurelien Rodriguez Armand Joulin Edouard Grave and Guillaume Lample. 2023. LLaMA: Open and Efficient Foundation Language Models. arxiv: 2302.13971 [cs.CL]"},{"key":"e_1_3_2_1_48_1","doi-asserted-by":"crossref","unstructured":"Jialu Wang Yang Liu and Xin Eric Wang. 2021. Are Gender-Neutral Queries Really Gender-Neutral? Mitigating Gender Bias in Image Search. arxiv: 2109.05433 [cs.CV]","DOI":"10.18653\/v1\/2021.emnlp-main.151"},{"key":"e_1_3_2_1_49_1","unstructured":"Jianhao Yan Futing Wang Yafu Li and Yue Zhang. 2024. Potential and Challenges of Model Editing for Social Debiasing. 
arxiv: 2402.13462 [cs.CL]"},{"key":"e_1_3_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/3382507.3418889"},{"key":"e_1_3_2_1_51_1","doi-asserted-by":"crossref","unstructured":"Xu Yang Hanwang Zhang Guojun Qi and Jianfei Cai. 2021. Causal Attention for Vision-Language Tasks. arxiv: 2103.03493 [cs.CV]","DOI":"10.1109\/CVPR46437.2021.00972"},{"key":"e_1_3_2_1_52_1","unstructured":"Shukang Yin Chaoyou Fu Sirui Zhao Ke Li Xing Sun Tong Xu and Enhong Chen. 2023. A Survey on Multimodal Large Language Models. arxiv: 2306.13549 [cs.CV]"},{"key":"e_1_3_2_1_53_1","unstructured":"Ningyu Zhang Yunzhi Yao Bozhong Tian Peng Wang Shumin Deng Mengru Wang Zekun Xi Shengyu Mao Jintian Zhang Yuansheng Ni Siyuan Cheng Ziwen Xu Xin Xu Jia-Chen Gu Yong Jiang Pengjun Xie Fei Huang Lei Liang Zhiqiang Zhang Xiaowei Zhu Jun Zhou and Huajun Chen. 2024. A Comprehensive Study of Knowledge Editing for Large Language Models. arxiv: 2401.01286 [cs.CL]"},{"key":"e_1_3_2_1_54_1","volume-title":"Todor Mihaylov, Myle Ott, Sam Shleifer, Kurt Shuster, Daniel Simig, Punit Singh Koura, Anjali Sridhar, Tianlu Wang, and Luke Zettlemoyer.","author":"Zhang Susan","year":"2022","unstructured":"Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab, Xian Li, Xi Victoria Lin, Todor Mihaylov, Myle Ott, Sam Shleifer, Kurt Shuster, Daniel Simig, Punit Singh Koura, Anjali Sridhar, Tianlu Wang, and Luke Zettlemoyer. 2022. OPT: Open Pre-trained Transformer Language Models. arxiv: 2205.01068 [cs.CL]"},{"key":"e_1_3_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.emnlp-main.296"},{"key":"e_1_3_2_1_56_1","unstructured":"Kaizhi Zheng Xuehai He and Xin Eric Wang. 2024. MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens. 
arxiv: 2310.02239 [cs.CV]"},{"key":"e_1_3_2_1_57_1","volume-title":"Proceedings of the 2nd Conference of the Asia-Pacific","author":"Zhou Kankan","year":"2022","unstructured":"Kankan Zhou, Eason Lai, and Jing Jiang. 2022. VLStereoSet: A Study of Stereotypical Bias in Pre-trained Vision-Language Models. In Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Yulan He, Heng Ji, Sujian Li, Yang Liu, and Chua-Hui Chang (Eds.). Association for Computational Linguistics, Online only, 527--538. https:\/\/aclanthology.org\/2022.aacl-main.40"},{"key":"e_1_3_2_1_58_1","unstructured":"Deyao Zhu Jun Chen Xiaoqian Shen Xiang Li and Mohamed Elhoseiny. 2023. MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models. arxiv: 2304.10592 [cs.CV]"}],"event":{"name":"MM '24: The 32nd ACM International Conference on Multimedia","sponsor":["SIGMM ACM Special Interest Group on Multimedia"],"location":"Melbourne VIC Australia","acronym":"MM '24"},"container-title":["Proceedings of the 32nd ACM International Conference on 
Multimedia"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3664647.3681589","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3664647.3681589","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:17:49Z","timestamp":1750295869000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3664647.3681589"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,10,28]]},"references-count":57,"alternative-id":["10.1145\/3664647.3681589","10.1145\/3664647"],"URL":"https:\/\/doi.org\/10.1145\/3664647.3681589","relation":{},"subject":[],"published":{"date-parts":[[2024,10,28]]},"assertion":[{"value":"2024-10-28","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}