{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,15]],"date-time":"2026-03-15T04:48:20Z","timestamp":1773550100981,"version":"3.50.1"},"reference-count":36,"publisher":"Association for Computing Machinery (ACM)","issue":"3","funder":[{"name":"Science and Technology Development Fund of Macau, Macau SAR","award":["0035\/2023\/ITP1 and 0021\/2023\/RIA1"],"award-info":[{"award-number":["0035\/2023\/ITP1 and 0021\/2023\/RIA1"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2026,3,31]]},"abstract":"<jats:p>With the rapid advancement of large-scale model technology, Visual Question Answering (VQA)\u2014a core subfield of multimodal research\u2014increasingly relies on these models to address complex challenges. This trend is especially evident in Knowledge-based VQA (KB-VQA), which requires integrating external knowledge. While most studies approach KB-VQA using explicit or implicit knowledge bases, recent studies employ in-context learning to guide large language models (LLMs) with implicit knowledge (e.g., PICa and Prophet). However, existing sample selection strategies for in-context learning are oversimplified and fail to adequately leverage the tacit knowledge encoded within LLMs. To address this limitation, we propose an adaptive sample selection strategy that integrates triple similarity calculations (question-image, question-caption, and question-pre-answer) and dynamically assembles the most relevant samples using weighted combinations, thereby effectively activating the large model\u2019s implicit knowledge. To evaluate the performance of our proposed approach, we conducted experiments on benchmark datasets. 
Results demonstrate that our method (PLMAS) achieves state-of-the-art performance on both the OK-VQA and A-OKVQA datasets.<\/jats:p>","DOI":"10.1145\/3777476","type":"journal-article","created":{"date-parts":[[2025,11,19]],"date-time":"2025-11-19T16:05:03Z","timestamp":1763568303000},"page":"1-21","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["PLMAS: Adaptive Sample Selection for Prompting LLMs in Knowledge-Based Visual Question Answering"],"prefix":"10.1145","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0009-0001-1701-7496","authenticated-orcid":false,"given":"Jian","family":"Li","sequence":"first","affiliation":[{"name":"School of Computer Science and Engineering, Macau University of Science and Technology, Macau, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-4354-8371","authenticated-orcid":false,"given":"Quanxing","family":"Xu","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, Macau University of Science and Technology, Macau, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8313-5749","authenticated-orcid":false,"given":"Ling","family":"Zhou","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, Macau University of Science and Technology, Macau, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8153-9977","authenticated-orcid":false,"given":"Feifei","family":"Zhang","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, Tianjin University of Technology, Tianjin, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1769-6126","authenticated-orcid":false,"given":"Rubing","family":"Huang","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, Macau University of Science and Technology, Macau, China and Macau University of Science and Technology Zhuhai MUST Science and Technology Research Institute, Zhuhai, 
China"}]}],"member":"320","published-online":{"date-parts":[[2026,2,27]]},"reference":[{"key":"e_1_3_1_2_1","first-page":"23716","volume-title":"Proceedings of the 36th International Conference on Neural Information Processing System","author":"Alayrac Jean-Baptiste","year":"2022","unstructured":"Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc, Antoine Miech, Iain Barr, Yana Hasson, Karel Lenc, Arthur Mensch, Katherine Millican, Malcolm Reynolds, et al. 2022. Flamingo: A visual language model for few-shot learning. In Proceedings of the 36th International Conference on Neural Information Processing System, 23716\u201323736."},{"key":"e_1_3_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.279"},{"key":"e_1_3_1_4_1","doi-asserted-by":"publisher","DOI":"10.5555\/3495724.3495883"},{"key":"e_1_3_1_5_1","unstructured":"Abhimanyu Dubey Abhinav Jauhri Abhinav Pandey Abhishek Kadian Ahmad Al-Dahle Aiesha Letman Akhil Mathur Alan Schelten Amy Yang Angela Fan et al. 2024. The Llama 3 herd of models. arXiv:2407.21783. Retrieved from https:\/\/arxiv.org\/abs\/2407.21783"},{"key":"e_1_3_1_6_1","doi-asserted-by":"publisher","DOI":"10.1111\/j.1469-1809.1936.tb02137.x"},{"key":"e_1_3_1_7_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.naacl-main.70"},{"key":"e_1_3_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.01046"},{"key":"e_1_3_1_9_1","unstructured":"Daya Guo Dejian Yang Haowei Zhang Junxiao Song Ruoyu Zhang Runxin Xu Qihao Zhu Shirong Ma Peiyi Wang Xiao Bi et al. 2025. DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning. arXiv:2501.12948. 
Retrieved from https:\/\/arxiv.org\/abs\/2501.12948"},{"key":"e_1_3_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00277"},{"key":"e_1_3_1_11_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-20059-5_38"},{"key":"e_1_3_1_12_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.emnlp-main.488"},{"key":"e_1_3_1_13_1","first-page":"10560","volume-title":"Proceedings of the 36th International Conference on Neural Information Processing Systems","author":"Lin Yuanze","year":"2022","unstructured":"Yuanze Lin, Yujia Xie, Dongdong Chen, Yichong Xu, Chenguang Zhu, and Lu Yuan. 2022. REVIVE: Regional visual representation matters in knowledge-based visual question answering. In Proceedings of the 36th International Conference on Neural Information Processing Systems, 10560\u201310571."},{"key":"e_1_3_1_14_1","unstructured":"Aixin Liu Bei Feng Bing Xue Bingxuan Wang Bochao Wu Chengda Lu Chenggang Zhao Chengqi Deng Chenyu Zhang Chong Ruan et al. 2024. DeepSeek-V3 technical report. arXiv:2412.19437. Retrieved from https:\/\/arxiv.org\/abs\/2412.19437"},{"key":"e_1_3_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/3734220"},{"key":"e_1_3_1_16_1","first-page":"13","volume-title":"Proceedings of the 33rd International Conference on Neural Information Processing Systems","author":"Lu Jiasen","year":"2019","unstructured":"Jiasen Lu, Dhruv Batra, Devi Parikh, and Stefan Lee. 2019. ViLBERT: Pretraining task-agnostic visiolinguistic representations for vision-and-language tasks. 
In Proceedings of the 33rd International Conference on Neural Information Processing Systems, 13\u201323."},{"key":"e_1_3_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01389"},{"key":"e_1_3_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00331"},{"key":"e_1_3_1_19_1","unstructured":"Thomas Mesnard Cassidy Hardin Robert Dadashi Surya Bhupatiraju Shreya Pathak Laurent Sifre Morgane Rivi\u00e8re Mihir Sanjay Kale Juliette Love Pouya Tafti et al. 2024. Gemma: Open models based on gemini research and technology. arXiv:2403.08295. Retrieved from https:\/\/arxiv.org\/abs\/2403.08295"},{"key":"e_1_3_1_20_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.naacl-main.201"},{"key":"e_1_3_1_21_1","unstructured":"Ron Mokady Amir Hertz and Amit H. Bermano. 2021. ClipCap: CLIP prefix for image captioning. arXiv:2111.09734. Retrieved from https:\/\/arxiv.org\/abs\/2111.09734"},{"key":"e_1_3_1_22_1","unstructured":"Morgane Riviere Shreya Pathak Pier Giuseppe Sessa Cassidy Hardin Surya Bhupatiraju L\u00e9onard Hussenot Thomas Mesnard Bobak Shahriari Alexandre Ram\u00e9 Johan Ferret et al. 2024. Gemma 2: Improving open language models at a practical size. arXiv:2408.00118. Retrieved from https:\/\/arxiv.org\/abs\/2408.00118"},{"key":"e_1_3_1_23_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-20074-8_9"},{"key":"e_1_3_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.01438"},{"key":"e_1_3_1_25_1","unstructured":"Hugo Touvron Thibaut Lavril Gautier Izacard Xavier Martinet Marie-Anne Lachaux Timoth\u00e9e Lacroix Baptiste Rozi\u00e8re Naman Goyal Eric Hambro Faisal Azhar et al. 2023a. LLaMA: Open and efficient foundation language models. arXiv:2302.13971. Retrieved from https:\/\/arxiv.org\/abs\/2302.13971"},{"key":"e_1_3_1_26_1","unstructured":"Hugo Touvron Louis Martin Kevin Stone Peter Albert Amjad Almahairi Yasmine Babaei Nikolay Bashlykov Soumya Batra Prajjwal Bhargava Shruti Bhosale et al. 2023b. 
Llama 2: Open foundation and fine-tuned chat models. arXiv:2307.09288. Retrieved from https:\/\/arxiv.org\/abs\/2307.09288"},{"key":"e_1_3_1_27_1","unstructured":"Jianfeng Wang Zhengyuan Yang Xiaowei Hu Linjie Li Kevin Lin Zhe Gan Zicheng Liu Ce Liu and Lijuan Wang. 2022a. GIT: A generative image-to-text transformer for vision and language. arXiv:2205.14100. Retrieved from https:\/\/arxiv.org\/abs\/2205.14100"},{"key":"e_1_3_1_28_1","first-page":"23318","volume-title":"Proceedings of the 39th International Conference on Machine Learning","author":"Wang Peng","year":"2022","unstructured":"Peng Wang, An Yang, Rui Men, Junyang Lin, Shuai Bai, Zhikang Li, Jianxin Ma, Chang Zhou, Jingren Zhou, and Hongxia Yang. 2022b. OFA: Unifying architectures, tasks, and modalities through a simple sequence-to-sequence learning framework. In Proceedings of the 39th International Conference on Machine Learning, 23318\u201323340."},{"key":"e_1_3_1_29_1","unstructured":"Jason Wei Maarten Bosma Vincent Y. Zhao Kelvin Guu Adams Wei Yu Brian Lester Nan Du Andrew M. Dai and Quoc V. Le. 2021. Finetuned language models are zero-shot learners. arXiv:2109.01652. Retrieved from https:\/\/arxiv.org\/abs\/2109.01652"},{"key":"e_1_3_1_30_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v36i3.20174"},{"key":"e_1_3_1_31_1","first-page":"1","article-title":"Bool prompt with decomposition and enhancement: Zero-shot VQA based on pvlms","author":"Xu Liyong","year":"2025","unstructured":"Liyong Xu, Yifan Jiao, and Bing-Kun Bao. 2025. Bool prompt with decomposition and enhancement: Zero-shot VQA based on pvlms. 
ACM Transactions on Multimedia Computing, Communications, and Applications 21, 9 (2025), 260:1\u2013260:21.","journal-title":"ACM Transactions on Multimedia Computing, Communications, and Applications"},{"key":"e_1_3_1_32_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2025.126951"},{"key":"e_1_3_1_33_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v36i3.20215"},{"key":"e_1_3_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.10"},{"key":"e_1_3_1_35_1","doi-asserted-by":"publisher","DOI":"10.1093\/nsr\/nwae403"},{"key":"e_1_3_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/3571819"},{"key":"e_1_3_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/3715141"}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3777476","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,15]],"date-time":"2026-03-15T03:49:54Z","timestamp":1773546594000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3777476"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,2,27]]},"references-count":36,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2026,3,31]]}},"alternative-id":["10.1145\/3777476"],"URL":"https:\/\/doi.org\/10.1145\/3777476","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,2,27]]},"assertion":[{"value":"2025-07-22","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-11-05","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2026-02-27","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}