{
  "status": "ok",
  "message-type": "work",
  "message-version": "1.0.0",
  "message": {
    "indexed": {
      "date-parts": [[2024, 10, 18]],
      "date-time": "2024-10-18T04:27:38Z",
      "timestamp": 1729225658065,
      "version": "3.27.0"
    },
    "reference-count": 0,
    "publisher": "IOS Press",
    "isbn-type": [{"type": "electronic", "value": "9781643685489"}],
    "license": [
      {
        "start": {
          "date-parts": [[2024, 10, 16]],
          "date-time": "2024-10-16T00:00:00Z",
          "timestamp": 1729036800000
        },
        "content-version": "unspecified",
        "delay-in-days": 0,
        "URL": "https://creativecommons.org/licenses/by-nc/4.0/"
      }
    ],
    "content-domain": {"domain": [], "crossmark-restriction": false},
    "short-container-title": [],
    "published-print": {"date-parts": [[2024, 10, 16]]},
    "abstract": "<jats:p>Pre-trained Large Language Models (LLMs) have demonstrated prominent generalization to various linguistic tasks. However, due to the inherent modality and task discrepancy, parameter-efficient transfer learning for adapting LLMs to vision-language (VL) tasks remains challenging, which may struggle with excessive extra computation and data expenditure for VL pre-training and disconnection between multi-modal representations. This paper concentrates on the parameter-efficient adaptation of LLMs to VL tasks without inflexible multi-modal alignment pre-training on additional image-text pairs. Inspired by Instruction Tuning and the nature of multi-modal representation learning, we propose Multi-modal Prompt Tuning for Language Models (MPT4LM). This method provides text-relevant visual prompts via a plug-and-play Cross-Attention module and integrates them with textual Learnable Instruction as multi-modal prompts into LLMs. We further assemble MPT4LM with the currently prevalent Adapter approach to alleviate the trainable parameter scale and facilitate the collaboration of multi-modal prompts. We evaluate MPT4LM upon two representative LLMs: LLAMA-2 and Flan-T5, over two VL tasks: Visual Question Answering (VQAv2.0, GQA) and Visual Entailment (SNLI-VE). Extensive experimental results reveal that MPT4LM achieves state-of-the-art performance among prompting methods with only fine-tuning about 0.65% of the parameters of backbones, indicating a better trade-off between computation and data overhead and model performance. Our code is available at: https://github.com/YzM1a0/MPT4LM.</jats:p>",
    "DOI": "10.3233/faia240515",
    "type": "book-chapter",
    "created": {
      "date-parts": [[2024, 10, 17]],
      "date-time": "2024-10-17T12:43:54Z",
      "timestamp": 1729169034000
    },
    "source": "Crossref",
    "is-referenced-by-count": 0,
    "title": ["MPT4LM: Multi-Modal Prompt Tuning Makes Pre-Trained Large Language Models Better Vision-Language Learners"],
    "prefix": "10.3233",
    "author": [
      {
        "given": "Yongzhu",
        "family": "Miao",
        "sequence": "first",
        "affiliation": [{"name": "College of Computer Science and Technology, National University of Defense Technology, Changsha, P.R. China, miaoyz@nudt.edu.cn, tangjintao@nudt.edu.cn, shashali@nudt.edu.cn, tingwang@nudt.edu.cn"}]
      },
      {
        "given": "Jintao",
        "family": "Tang",
        "sequence": "additional",
        "affiliation": [{"name": "College of Computer Science and Technology, National University of Defense Technology, Changsha, P.R. China, miaoyz@nudt.edu.cn, tangjintao@nudt.edu.cn, shashali@nudt.edu.cn, tingwang@nudt.edu.cn"}]
      },
      {
        "given": "Shasha",
        "family": "Li",
        "sequence": "additional",
        "affiliation": [{"name": "College of Computer Science and Technology, National University of Defense Technology, Changsha, P.R. China, miaoyz@nudt.edu.cn, tangjintao@nudt.edu.cn, shashali@nudt.edu.cn, tingwang@nudt.edu.cn"}]
      },
      {
        "given": "Ting",
        "family": "Wang",
        "sequence": "additional",
        "affiliation": [{"name": "College of Computer Science and Technology, National University of Defense Technology, Changsha, P.R. China, miaoyz@nudt.edu.cn, tangjintao@nudt.edu.cn, shashali@nudt.edu.cn, tingwang@nudt.edu.cn"}]
      }
    ],
    "member": "7437",
    "container-title": ["Frontiers in Artificial Intelligence and Applications", "ECAI 2024"],
    "original-title": [],
    "link": [
      {
        "URL": "https://ebooks.iospress.nl/pdf/doi/10.3233/FAIA240515",
        "content-type": "unspecified",
        "content-version": "vor",
        "intended-application": "similarity-checking"
      }
    ],
    "deposited": {
      "date-parts": [[2024, 10, 17]],
      "date-time": "2024-10-17T12:43:55Z",
      "timestamp": 1729169035000
    },
    "score": 1,
    "resource": {"primary": {"URL": "https://ebooks.iospress.nl/doi/10.3233/FAIA240515"}},
    "subtitle": [],
    "short-title": [],
    "issued": {"date-parts": [[2024, 10, 16]]},
    "ISBN": ["9781643685489"],
    "references-count": 0,
    "URL": "https://doi.org/10.3233/faia240515",
    "relation": {},
    "ISSN": ["0922-6389", "1879-8314"],
    "issn-type": [
      {"type": "print", "value": "0922-6389"},
      {"type": "electronic", "value": "1879-8314"}
    ],
    "subject": [],
    "published": {"date-parts": [[2024, 10, 16]]}
  }
}