{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,19]],"date-time":"2026-06-19T16:39:47Z","timestamp":1781887187168,"version":"3.54.5"},"reference-count":160,"publisher":"Association for Computing Machinery (ACM)","issue":"7","funder":[{"name":"Key R & D Projects of the Ministry of Science and Technology","award":["2022ZD0119100"],"award-info":[{"award-number":["2022ZD0119100"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Comput. Surv."],"published-print":{"date-parts":[[2026,5,31]]},"abstract":"<jats:p>\n                    This article surveys research works in the quickly advancing field of instruction tuning (IT), a crucial technique to enhance the capabilities and controllability of large language models (LLMs). Instruction tuning refers to the process of further training LLMs on a dataset consisting of\n                    <jats:sc>(instruction, output)<\/jats:sc>\n                    pairs in a supervised fashion, which bridges the gap between the next-word prediction objective of LLMs and the users\u2019 objective of having LLMs adhere to human instructions. In this work, we make a systematic review of the literature, including the general methodology of IT, the construction of IT datasets, the training of IT models, and applications to different modalities, domains, and applications, along with analysis of aspects that influence the outcome of IT (e.g., generation of instruction outputs, size of the instruction dataset). 
We also review the potential pitfalls of IT and criticism against it, along with efforts pointing out current deficiencies of existing strategies, and suggest some avenues for fruitful research.\n                  <\/jats:p>","DOI":"10.1145\/3777411","type":"journal-article","created":{"date-parts":[[2025,11,17]],"date-time":"2025-11-17T11:53:21Z","timestamp":1763380401000},"page":"1-36","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":58,"title":["Instruction Tuning for Large Language Models: A Survey"],"prefix":"10.1145","volume":"58","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0030-8289","authenticated-orcid":false,"given":"Shengyu","family":"Zhang","sequence":"first","affiliation":[{"name":"Zhejiang University","place":["Hangzhou, China"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9839-0438","authenticated-orcid":false,"given":"Linfeng","family":"Dong","sequence":"additional","affiliation":[{"name":"Zhejiang University","place":["Hangzhou, China"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-6006-3203","authenticated-orcid":false,"given":"Xiaoya","family":"Li","sequence":"additional","affiliation":[{"name":"University of Washington","place":["Seattle, United States"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-5341-3293","authenticated-orcid":false,"given":"Sen","family":"Zhang","sequence":"additional","affiliation":[{"name":"Zhejiang University","place":["Hangzhou, China"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-1261-669X","authenticated-orcid":false,"given":"Xiaofei","family":"Sun","sequence":"additional","affiliation":[{"name":"Zhejiang University","place":["Hangzhou, 
China"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-8068-2660","authenticated-orcid":false,"given":"Shuhe","family":"Wang","sequence":"additional","affiliation":[{"name":"Peking University","place":["Beijing, China"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3851-191X","authenticated-orcid":false,"given":"Jiwei","family":"Li","sequence":"additional","affiliation":[{"name":"Zhejiang University","place":["Hangzhou, China"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-6974-2542","authenticated-orcid":false,"given":"Runyi","family":"Hu","sequence":"additional","affiliation":[{"name":"Zhejiang University","place":["Hangzhou, China"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6595-6650","authenticated-orcid":false,"given":"Tianwei","family":"Zhang","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, Nanyang Technological University","place":["Singapore, Singapore"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-5030-3469","authenticated-orcid":false,"given":"Guoyin","family":"Wang","sequence":"additional","affiliation":[{"name":"Alibaba Group","place":["Hangzhou, China"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2139-8807","authenticated-orcid":false,"given":"Fei","family":"Wu","sequence":"additional","affiliation":[{"name":"Zhejiang University","place":["Hangzhou, China"]}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2026,1,8]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"crossref","unstructured":"Vaibhav Adlakha Parishad BehnamGhader Xing Han Lu Nicholas Meade and Siva Reddy. 2023. 
Evaluating correctness and faithfulness of instruction-following models for question answering. arXiv:2307.16877. Retrieved from https:\/\/arxiv.org\/abs\/2307.16877","DOI":"10.1162\/tacl_a_00667"},{"key":"e_1_3_2_3_2","unstructured":"Ebtesam Almazrouei Hamza Alobeidli Abdulaziz Alshamsi Alessandro Cappelli Ruxandra Cojocaru Merouane Debbah Etienne Goffinet et\u00a0al. 2023. Falcon-40B: An open large language model with state-of-the-art performance. Retrieved from https:\/\/huggingface.co\/tiiuae\/falcon-40b-instruct. (Last accessed date: 2025.12.01)."},{"key":"e_1_3_2_4_2","unstructured":"Anas Awadalla Irena Gao Joshua Gardner Jack Hessel Yusuf Hanafy Wanrong Zhu Kalyani Marathe Yonatan Bitton Samir Gadre Jenia Jitsev et\u00a0al. 2023. OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models. arXiv:2308.01390. Retrieved from https:\/\/arxiv.org\/abs\/2308.01390"},{"key":"e_1_3_2_5_2","unstructured":"Stephen H. Bach Victor Sanh Zheng Xin Yong Albert Webson Colin Raffel Nihal V. Nayak Abheesht Sharma et\u00a0al. 2022. PromptSource: An integrated development environment and repository for natural language prompts. arXiv:2202.01279. Retrieved from https:\/\/arxiv.org\/abs\/2202.01279"},{"key":"e_1_3_2_6_2","unstructured":"Yuntao Bai Andy Jones Kamal Ndousse Amanda Askell Anna Chen Nova DasSarma Dawn Drain Stanislav Fort Deep Ganguli Tom Henighan et\u00a0al. 2022. Training a helpful and harmless assistant with reinforcement learning from human feedback. arXiv:2204.05862. Retrieved from https:\/\/arxiv.org\/abs\/2204.05862"},{"key":"e_1_3_2_7_2","unstructured":"Yuntao Bai Saurav Kadavath Sandipan Kundu Amanda Askell Jackson Kernion Andy Jones Anna Chen Anna Goldie Azalia Mirhoseini Cameron McKinnon et\u00a0al. 2022. Constitutional AI: Harmlessness from AI feedback. arXiv:2212.08073. 
Retrieved from https:\/\/arxiv.org\/abs\/2212.08073"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00175"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1609\/icwsm.v14i1.7347"},{"key":"e_1_3_2_10_2","unstructured":"Stella Rose Biderman Hailey Schoelkopf Quentin G. Anthony Herbie Bradley Kyle O\u2019Brien Eric Hallahan Mohammad Aflah Khan Shivanshu Purohit USVSN Sai Prashanth Edward Raff et\u00a0al. 2023. Pythia: A suite for analyzing large language models across training and scaling. arXiv:2304.01373. Retrieved from https:\/\/arxiv.org\/abs\/2304.01373"},{"key":"e_1_3_2_11_2","doi-asserted-by":"crossref","unstructured":"Sid Black Stella Rose Biderman Eric Hallahan Quentin G. Anthony Leo Gao Laurence Golding Horace He Connor Leahy Kyle McDonell Jason Phang et\u00a0al. 2022. GPT-NeoX-20B: An open-source autoregressive language model. arXiv:2204.06745. Retrieved from https:\/\/arxiv.org\/abs\/2204.06745","DOI":"10.18653\/v1\/2022.bigscience-1.9"},{"key":"e_1_3_2_12_2","unstructured":"Rishi Bommasani Drew A. Hudson Ehsan Adeli Russ B. Altman Simran Arora Sydney von Arx Michael S. Bernstein Jeannette Bohg Antoine Bosselut Emma Brunskill et\u00a0al. 2021. On the opportunities and risks of foundation models. arXiv:2108.07258. Retrieved from https:\/\/arxiv.org\/abs\/2108.07258"},{"key":"e_1_3_2_13_2","doi-asserted-by":"crossref","unstructured":"Tim Brooks Aleksander Holynski and Alexei A. Efros. 2022. InstructPix2Pix: Learning to follow image editing instructions. arXiv:2211.09800. Retrieved from https:\/\/arxiv.org\/abs\/2211.09800","DOI":"10.1109\/CVPR52729.2023.01764"},{"key":"e_1_3_2_14_2","first-page":"1877","article-title":"Language models are few-shot learners","volume":"33","author":"Brown Tom","year":"2020","unstructured":"Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D. Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et\u00a0al. 2020. 
Language models are few-shot learners. Advances in Neural Information Processing Systems 33 (2020), 1877\u20131901.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_15_2","unstructured":"Tom B. Brown Benjamin Mann Nick Ryder Melanie Subbiah Jared Kaplan Prafulla Dhariwal Arvind Neelakantan et\u00a0al. 2020. Language models are few-shot learners. arXiv:2005.14165. Retrieved from https:\/\/arxiv.org\/abs\/2005.14165"},{"key":"e_1_3_2_16_2","unstructured":"Sahil Chaudhary. 2023. Code alpaca: An instruction-following llama model for code generation. Retrieved from https:\/\/github.com\/sahil280114\/codealpaca. (Last accessed date: 2025.12.01)."},{"key":"e_1_3_2_17_2","unstructured":"Guiming Hardy Chen Shunian Chen Ruifei Zhang Junying Chen Xiangbo Wu Zhiyi Zhang Zhihong Chen Jianquan Li Xiang Wan and Benyou Wang. 2024. ALLaVA: Harnessing GPT4V-synthesized data for a lite vision-language model. arXiv:2402.11684. Retrieved from https:\/\/arxiv.org\/abs\/2402.11684"},{"key":"e_1_3_2_18_2","doi-asserted-by":"crossref","unstructured":"Lin Chen Jisong Li Xiaoyi Dong Pan Zhang Conghui He Jiaqi Wang Feng Zhao and Dahua Lin. 2023. Sharegpt4v: Improving large multi-modal models with better captions. arXiv:2311.12793. Retrieved from https:\/\/arxiv.org\/abs\/2311.12793","DOI":"10.1007\/978-3-031-72643-9_22"},{"key":"e_1_3_2_19_2","unstructured":"Mark Chen Jerry Tworek Heewoo Jun Qiming Yuan Henrique Ponde De Oliveira Pinto Jared Kaplan Harri Edwards Yuri Burda Nicholas Joseph Greg Brockman et\u00a0al. 2021. Evaluating large language models trained on code. arXiv:2107.03374. Retrieved from https:\/\/arxiv.org\/abs\/2107.03374"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.findings-acl.835"},{"key":"e_1_3_2_21_2","unstructured":"Zixiang Chen Yihe Deng Huizhuo Yuan Kaixuan Ji and Quanquan Gu. 2024. Self-play fine-tuning converts weak language models to strong language models. arXiv:2401.01335. 
Retrieved from https:\/\/arxiv.org\/abs\/2401.01335"},{"key":"e_1_3_2_22_2","article-title":"Vicuna: An open-source chatbot impressing gpt-4 with 90%* chatgpt quality","author":"Chiang Wei-Lin","year":"2023","unstructured":"Wei-Lin Chiang, Zhuohan Li, Zi Lin, Ying Sheng, Zhanghao Wu, Hao Zhang, Lianmin Zheng, Siyuan Zhuang, Yonghao Zhuang, Joseph E. Gonzalez, et\u00a0al. 2023. Vicuna: An open-source chatbot impressing gpt-4 with 90%* chatgpt quality. Retrieved April 14, 2023 from https:\/\/vicuna.lmsys.org, https:\/\/github.com\/lm-sys\/FastChat","journal-title":"Retrieved April 14, 2023 from https:\/\/vicuna.lmsys.org"},{"key":"e_1_3_2_23_2","unstructured":"Aakanksha Chowdhery Sharan Narang Jacob Devlin Maarten Bosma Gaurav Mishra Adam Roberts Paul Barham et\u00a0al. 2022. PaLM: Scaling language modeling with pathways. arxiv:2204.02311 [cs.CL]. Retrieved from https:\/\/arxiv.org\/abs\/2204.02311"},{"key":"e_1_3_2_24_2","unstructured":"Hyung Won Chung Le Hou S. Longpre Barret Zoph Yi Tay William Fedus Eric Li et\u00a0al. 2022. Scaling instruction-finetuned language models. arXiv:2210.11416. Retrieved from https:\/\/arxiv.org\/abs\/2210.11416 https:\/\/huggingface.co\/google\/flan-t5-xxl"},{"key":"e_1_3_2_25_2","unstructured":"Christopher Clark Kenton Lee Ming-Wei Chang Tom Kwiatkowski Michael Collins and Kristina Toutanova. 2019. BoolQ: Exploring the surprising difficulty of natural yes\/no questions. arXiv:1905.10044. Retrieved from https:\/\/arxiv.org\/abs\/1905.10044"},{"key":"e_1_3_2_26_2","unstructured":"Peter Clark Isaac Cowhey Oren Etzioni Tushar Khot Ashish Sabharwal Carissa Schoenick and Oyvind Tafjord. 2018. Think you have solved question answering? Try ARC the AI2 reasoning challenge. arXiv:1803.05457. 
Retrieved from https:\/\/arxiv.org\/abs\/1803.05457"},{"key":"e_1_3_2_27_2","unstructured":"Karl Cobbe Vineet Kosaraju Mohammad Bavarian Mark Chen Heewoo Jun Lukasz Kaiser Matthias Plappert Jerry Tworek Jacob Hilton Reiichiro Nakano Christopher Hesse and John Schulman. 2021. Training verifiers to solve math word problems. arXiv:2110.14168. Retrieved from https:\/\/arxiv.org\/abs\/2110.14168"},{"key":"e_1_3_2_28_2","author":"Collective OpenAccess AI","year":"2023","unstructured":"OpenAccess AI Collective. 2023. software: huggingface.co\/openaccess-ai-collective\/minotaur-15b (2023). Retrieved from https:\/\/huggingface.co\/openaccess-ai-collective\/minotaur-15b. (Last accessed date: 2025.12.01).","journal-title":"software: huggingface.co\/openaccess-ai-collective\/minotaur-15b"},{"key":"e_1_3_2_29_2","unstructured":"Mike Conover Matt Hayes Ankit Mathur Xiangrui Meng Jianwei Xie Jun Wan Sam Shah Ali Ghodsi Patrick Wendell Matei Zaharia et\u00a0al. 2023. Free dolly: Introducing the world\u2019s first truly open instruction-tuned LLM. Retrieved from https:\/\/huggingface.co\/datasets\/databricks\/databricks-dolly-15k. (Last accessed date: 2025.12.01)."},{"key":"e_1_3_2_30_2","unstructured":"Mike Conover Matt Hayes Ankit Mathur Jianwei Xie Jun Wan Sam Shah Ali Ghodsi Patrick Wendell Matei Zaharia and Reynold Xin. 2023. Free Dolly: Introducing the World\u2019s First Truly Open Instruction-Tuned LLM. Retrieved from https:\/\/github.com\/project-baize\/baize-chatbot. (Last accessed date: 2025.12.01)."},{"key":"e_1_3_2_31_2","unstructured":"Wenliang Dai Junnan Li Dongxu Li Anthony Meng Huat Tiong Junqi Zhao Weisheng Wang Boyang Li Pascale Fung and Steven Hoi. 2023. InstructBLIP: Towards general-purpose vision-language models with instruction tuning. arXiv:2305.06500. 
Retrieved from https:\/\/arxiv.org\/abs\/2305.06500"},{"key":"e_1_3_2_32_2","volume-title":"Proceedings of the Conference on Neural Information Processing Systems","author":"Dao Tri","year":"2022","unstructured":"Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, and Christopher R\u00e9. 2022. FlashAttention: Fast and memory-efficient exact attention with IO-awareness. In Proceedings of the Conference on Neural Information Processing Systems."},{"key":"e_1_3_2_33_2","unstructured":"Tim Dettmers Artidoro Pagnoni Ari Holtzman and Luke Zettlemoyer. 2023. Qlora: Efficient finetuning of quantized LLMs. arXiv:2305.14314. Retrieved from https:\/\/arxiv.org\/abs\/2305.14314"},{"key":"e_1_3_2_34_2","doi-asserted-by":"crossref","unstructured":"Ning Ding Yulin Chen Bokai Xu Yujia Qin Zhi Zheng Shengding Hu Zhiyuan Liu Maosong Sun and Bowen Zhou. 2023. Enhancing chat language models by scaling high-quality instructional conversations. arXiv:2305.14233. Retrieved from https:\/\/arxiv.org\/abs\/2305.14233 https:\/\/github.com\/thunlp\/UltraChat#data","DOI":"10.18653\/v1\/2023.emnlp-main.183"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1038\/s42256-023-00626-4"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.acl-long.26"},{"key":"e_1_3_2_37_2","unstructured":"Yann Dubois Bal\u00e1zs Galambosi Percy Liang and Tatsunori B. Hashimoto. 2024. Length-controlled alpacaeval: A simple way to debias automatic evaluators. arXiv:2404.04475. Retrieved from https:\/\/arxiv.org\/abs\/2404.04475"},{"key":"e_1_3_2_38_2","unstructured":"Jon Durbin. 2023. Airoboros. software: https:\/\/github.com\/jondurbin\/airoboros. (Last accessed date: 2025.12.01)."},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.eng.2021.12.012"},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.5555\/3586589.3586709"},{"key":"e_1_3_2_41_2","unstructured":"Jun Gao Huan Zhao Changlong Yu and Ruifeng Xu. 2023. 
Exploring the feasibility of ChatGPT for event extraction. arXiv:2303.03836. Retrieved from https:\/\/arxiv.org\/abs\/2303.03836"},{"key":"e_1_3_2_42_2","unstructured":"Tianyu Gao Howard Yen Jiatong Yu and Danqi Chen. 2023. Enabling large language models to generate text with citations. arXiv:2305.14627. Retrieved from https:\/\/arxiv.org\/abs\/2305.14627"},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.01457"},{"key":"e_1_3_2_44_2","unstructured":"Tao Gong Chengqi Lyu Shilong Zhang Yudong Wang Miao Zheng Qianmengke Zhao Kuikun Liu Wenwei Zhang Ping Luo and Kai Chen. 2023. MultiModal-GPT: A vision and language model for dialogue with humans. arXiv:2305.04790. Retrieved from https:\/\/arxiv.org\/abs\/2305.04790"},{"key":"e_1_3_2_45_2","unstructured":"Arnav Gudibande Eric Wallace Charlie Snell Xinyang Geng Hao Liu Pieter Abbeel Sergey Levine and Dawn Song. 2023. The false promise of imitating proprietary LLMs. arXiv:2305.15717. Retrieved from https:\/\/arxiv.org\/abs\/2305.15717"},{"key":"e_1_3_2_46_2","unstructured":"Suriya Gunasekar Yi Zhang Jyoti Aneja Caio C\u00e9sar Teodoro Mendes Allie Del Giorno Sivakanth Gopi Mojan Javaheripi Piero Kauffmann Gustavo de Rosa Olli Saarikivi et\u00a0al. 2023. Textbooks are all you need. arXiv:2306.11644. Retrieved from https:\/\/arxiv.org\/abs\/2306.11644 https:\/\/huggingface.co\/microsoft\/phi-1"},{"key":"e_1_3_2_47_2","unstructured":"Dan Hendrycks Collin Burns Steven Basart Andy Zou Mantas Mazeika Dawn Song and Jacob Steinhardt. 2020. Measuring massive multitask language understanding. arXiv:2009.03300. Retrieved from https:\/\/arxiv.org\/abs\/2009.03300"},{"key":"e_1_3_2_48_2","unstructured":"Dan Hendrycks Collin Burns Saurav Kadavath Akul Arora Steven Basart Eric Tang Dawn Song and Jacob Steinhardt. 2021. Measuring mathematical problem solving with the math dataset. arXiv:2103.03874. 
Retrieved from https:\/\/arxiv.org\/abs\/2103.03874"},{"key":"e_1_3_2_49_2","doi-asserted-by":"crossref","unstructured":"Or Honovich Thomas Scialom Omer Levy and Timo Schick. 2022. Unnatural instructions: Tuning language models with (almost) no human labor. arXiv:2212.09689. Retrieved from https:\/\/arxiv.org\/abs\/2212.09689 https:\/\/github.com\/allenai\/natural-instructions-v1","DOI":"10.18653\/v1\/2023.acl-long.806"},{"key":"e_1_3_2_50_2","first-page":"2790","volume-title":"Proceedings of the 36th International Conference on Machine Learning","volume":"97","author":"Houlsby Neil","year":"2019","unstructured":"Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin de Laroussilhe, Andrea Gesmundo, Mona Attariyan, and Sylvain Gelly. 2019. Parameter-efficient transfer learning for NLP. In Proceedings of the 36th International Conference on Machine Learning, Vol. 97. PMLR, 2790\u20132799."},{"key":"e_1_3_2_51_2","unstructured":"Edward J. Hu Yelong Shen Phillip Wallis Zeyuan Allen-Zhu Yuanzhi Li Shean Wang Lu Wang and Weizhu Chen. 2021. Lora: Low-rank adaptation of large language models. arXiv:2106.09685. Retrieved from https:\/\/arxiv.org\/abs\/2106.09685"},{"key":"e_1_3_2_52_2","unstructured":"Jie Huang and Kevin Chen-Chuan Chang. 2022. Towards reasoning in large language models: A survey. arXiv:2212.10403. Retrieved from https:\/\/arxiv.org\/abs\/2212.10403"},{"key":"e_1_3_2_53_2","unstructured":"Hamish Ivison Akshita Bhagia Yizhong Wang Hannaneh Hajishirzi and Matthew E. Peters. 2022. HINT: Hypernetwork instruction tuning for efficient zero-shot generalisation. arXiv:2212.10315. Retrieved from https:\/\/arxiv.org\/abs\/2212.10315"},{"key":"e_1_3_2_54_2","unstructured":"Srinivas Iyer Xiaojuan Lin Ramakanth Pasunuru Todor Mihaylov Daniel Simig Ping Yu Kurt Shuster Tianlu Wang Qing Liu Punit Singh Koura et\u00a0al. 2022. OPT-IML: Scaling language model instruction meta learning through the lens of generalization. arXiv:2212.12017. 
Retrieved from https:\/\/arxiv.org\/abs\/2212.12017 https:\/\/huggingface.co\/facebook\/opt-iml-30b"},{"key":"e_1_3_2_55_2","unstructured":"JosephusCheung. 2021. Guanaco: Generative Universal Assistant for Natural-language Adaptive Context-aware Omnilingual outputs. Retrieved from https:\/\/huggingface.co\/datasets\/JosephusCheung\/GuanacoDataset. (Last accessed date: 2025.12.01)."},{"key":"e_1_3_2_56_2","doi-asserted-by":"crossref","unstructured":"Daniel Khashabi Sewon Min Tushar Khot Ashish Sabharwal Oyvind Tafjord Peter Clark and Hannaneh Hajishirzi. 2020. UnifiedQA: Crossing format boundaries with a single QA system. arXiv:2005.00700. Retrieved from https:\/\/arxiv.org\/abs\/2005.00700 https:\/\/github.com\/allenai\/unifiedqa","DOI":"10.18653\/v1\/2020.findings-emnlp.171"},{"key":"e_1_3_2_57_2","unstructured":"Andreas K\u00f6pf Yannic Kilcher Dimitri von R\u00fctte Sotiris Anagnostidis Zhi-Rui Tam Keith Stevens Abdullah Barhoum Nguyen Minh Duc Oliver Stanley Rich\u00e1rd Nagyfi et\u00a0al. 2023. OpenAssistant conversations\u2013democratizing large language model alignment. arXiv:2304.07327. Retrieved from https:\/\/arxiv.org\/abs\/2304.07327 https:\/\/github.com\/LAION-AI\/Open-Assistant"},{"key":"e_1_3_2_58_2","unstructured":"Po-Nien Kung and Nanyun Peng. 2023. Do models really learn to follow instructions? An empirical study of instruction tuning. arXiv:2305.11383. Retrieved from https:\/\/arxiv.org\/abs\/2305.11383"},{"key":"e_1_3_2_59_2","unstructured":"LAION.ai. 2023. Oig: The Open Instruction Generalist dataset. Retrieved from https:\/\/github.com\/LAION-AI\/Open-Instruction-Generalist. (Last accessed date: 2025.12.01)."},{"key":"e_1_3_2_60_2","doi-asserted-by":"publisher","DOI":"10.1145\/3491102.3502030"},{"key":"e_1_3_2_61_2","unstructured":"Bo Li Gexiang Fang Yang Yang Quansen Wang Wei Ye Wen Zhao and Shikun Zhang. 2023. 
Evaluating ChatGPT\u2019s information extraction capabilities: An assessment of performance explainability calibration and faithfulness. arXiv:2304.11633. Retrieved from https:\/\/arxiv.org\/abs\/2304.11633"},{"key":"e_1_3_2_62_2","unstructured":"Bo Li Yuanhan Zhang Liangyu Chen Jinghao Wang Jingkang Yang and Ziwei Liu. 2023. Otter: A multi-modal model with in-context instruction tuning. arXiv:2305.03726. Retrieved from https:\/\/arxiv.org\/abs\/2305.03726"},{"key":"e_1_3_2_63_2","unstructured":"Guohao Li Hasan Abed Al Kader Hammoud Hani Itani Dmitrii Khizbullin and Bernard Ghanem. 2023. CAMEL: Communicative agents for \u201cMind\u201d exploration of large scale language model society. Advances in Neural Information Processing Systems 37 (2023) 51991\u201352008."},{"key":"e_1_3_2_64_2","volume-title":"Proceedings of the International Conference on Machine Learning (ICML)","author":"Li Junnan","year":"2023","unstructured":"Junnan Li, Dongxu Li, Silvio Savarese, and Steven Hoi. 2023. BLIP-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. In Proceedings of the International Conference on Machine Learning (ICML)."},{"key":"e_1_3_2_65_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2024.acl-long.599"},{"key":"e_1_3_2_66_2","unstructured":"Kunchang Li Yinan He Yi Wang Yizhuo Li Wenhai Wang Ping Luo Yali Wang Limin Wang and Yu Qiao. 2023. VideoChat: Chat-centric video understanding. arXiv:2305.06355. Retrieved from https:\/\/arxiv.org\/abs\/2305.06355"},{"key":"e_1_3_2_67_2","unstructured":"Raymond Li Loubna Ben Allal Yangtian Zi Niklas Muennighoff Denis Kocetkov Chenghao Mou Marc Marone Christopher Akiki Jia Li Jenny Chim et\u00a0al. 2023. StarCoder: May the source be with you! arXiv:2305.06161. Retrieved from https:\/\/arxiv.org\/abs\/2305.06161"},{"key":"e_1_3_2_68_2","unstructured":"Xian Li Ping Yu Chunting Zhou Timo Schick Luke Zettlemoyer Omer Levy Jason Weston and Mike Lewis. 2023. 
Self-alignment with instruction backtranslation. arXiv:2308.06259. Retrieved from https:\/\/arxiv.org\/abs\/2308.06259"},{"key":"e_1_3_2_69_2","unstructured":"Xuechen Li Tianyi Zhang Yann Dubois Rohan Taori Ishaan Gulrajani Carlos Guestrin Percy Liang and Tatsunori B. Hashimoto. 2023. AlpacaEval: An automatic evaluator of instruction-following models. Retrieved from https:\/\/github.com\/tatsu-lab\/alpaca_eval. (Last accessed date: 2025.12.01). GitHub repository (5 2023)."},{"key":"e_1_3_2_70_2","unstructured":"Yuanzhi Li S\u00e9bastien Bubeck Ronen Eldan Allie Del Giorno Suriya Gunasekar and Yin Tat Lee. 2023. Textbooks are all you need II: phi-1.5 technical report. arXiv:2309.05463. Retrieved from https:\/\/arxiv.org\/abs\/2309.05463"},{"key":"e_1_3_2_71_2","article-title":"Holistic evaluation of language models","author":"Liang Percy","year":"2022","unstructured":"Percy Liang, Rishi Bommasani, Tony Lee, Dimitris Tsipras, Dilara Soylu, Michihiro Yasunaga, Yian Zhang, et\u00a0al. 2022. Holistic evaluation of language models. Annals of the New York Academy of Sciences 1525, 1 (2022), 140\u2013146.","journal-title":"Annals of the New York Academy of Sciences"},{"key":"e_1_3_2_72_2","unstructured":"Bill Yuchen Lin Yuntian Deng Khyathi Chandu Faeze Brahman Abhilasha Ravichander Valentina Pyatkin Nouha Dziri Ronan Le Bras and Yejin Choi. 2024. WILDBENCH: Benchmarking LLMs with challenging tasks from real users in the wild. arXiv:2406.04770. Retrieved from https:\/\/arxiv.org\/abs\/2406.04770"},{"key":"e_1_3_2_73_2","unstructured":"Weixiong Lin Ziheng Zhao Xiaoman Zhang Chaoyi Wu Ya Zhang Yanfeng Wang and Weidi Xie. 2023. PMC-CLIP: Contrastive language-image pre-training using biomedical documents. arXiv:2303.07240. Retrieved from https:\/\/arxiv.org\/abs\/2303.07240"},{"key":"e_1_3_2_74_2","unstructured":"Haotian Liu Chunyuan Li Qingyang Wu and Yong Jae Lee. 2023. Visual instruction tuning. arXiv:2304.08485. 
Retrieved from https:\/\/arxiv.org\/abs\/2304.08485"},{"key":"e_1_3_2_75_2","unstructured":"Hanmeng Liu Zhiyang Teng Leyang Cui Chaoli Zhang Qiji Zhou and Yue Zhang. 2023. LogiCoT: Logical chain-of-thought instruction-tuning data collection with GPT-4. arXiv:2305.12147. Retrieved from https:\/\/arxiv.org\/abs\/2305.12147 https:\/\/github.com\/csitfun\/LogiCoT"},{"key":"e_1_3_2_76_2","unstructured":"Jiachang Liu Dinghan Shen Yizhe Zhang Bill Dolan Lawrence Carin and Weizhu Chen. 2021. What makes good in-context examples for GPT-3? arXiv:2101.06804. Retrieved from https:\/\/arxiv.org\/abs\/2101.06804"},{"key":"e_1_3_2_77_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.eng.2023.10.006"},{"key":"e_1_3_2_78_2","unstructured":"Shayne Longpre Le Hou Tu Vu Albert Webson Hyung Won Chung Yi Tay Denny Zhou Quoc V. Le Barret Zoph Jason Wei et\u00a0al. 2023. The flan collection: Designing data and methods for effective instruction tuning. arXiv:2301.13688. Retrieved from https:\/\/arxiv.org\/abs\/2301.13688 https:\/\/github.com\/google-research\/FLAN"},{"key":"e_1_3_2_79_2","unstructured":"Ziyang Luo Can Xu Pu Zhao Qingfeng Sun Xiubo Geng Wenxiang Hu Chongyang Tao Jing Ma Qingwei Lin and Daxin Jiang. 2023. WizardCoder: Empowering Code Large Language Models with Evol-Instruct. Retrieved from https:\/\/github.com\/nlpxucan\/WizardLM. (Last accessed date: 2025.12.01)."},{"key":"e_1_3_2_80_2","doi-asserted-by":"crossref","unstructured":"Kai Lv Yuging Yang Tengxiao Liu Qi jie Gao Qipeng Guo and Xipeng Qiu. 2024. Full parameter fine-tuning for large language models with limited resources. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL). Association for Computational Linguistics 8187\u20138198.","DOI":"10.18653\/v1\/2024.acl-long.445"},{"key":"e_1_3_2_81_2","unstructured":"Swaroop Mishra Daniel Khashabi Chitta Baral and Hannaneh Hajishirzi. 2021. Cross-task generalization via natural language crowdsourcing instructions. arXiv:2104.08773. 
Retrieved from https:\/\/arxiv.org\/abs\/2104.08773"},{"key":"e_1_3_2_82_2","unstructured":"Arindam Mitra Luciano Del Corro Shweti Mahajan Andres Codas Clarisse Simoes Sahaj Agarwal Xuxi Chen Anastasia Razdaibiedina Erik Jones Kriti Aggarwal et\u00a0al. 2023. Orca 2: Teaching small language models how to reason. arXiv:2311.11045. Retrieved from https:\/\/arxiv.org\/abs\/2311.11045"},{"key":"e_1_3_2_83_2","doi-asserted-by":"crossref","unstructured":"Niklas Muennighoff Thomas Wang Lintang Sutawika Adam Roberts Stella Biderman Teven Le Scao M Saiful Bari Sheng Shen Zheng-Xin Yong Hailey Schoelkopf et\u00a0al. 2022. Crosslingual generalization through multitask finetuning. arXiv:2211.01786. Retrieved from https:\/\/arxiv.org\/abs\/2211.01786 https:\/\/github.com\/bigscience-workshop\/xmtf","DOI":"10.18653\/v1\/2023.acl-long.891"},{"key":"e_1_3_2_84_2","unstructured":"Subhabrata Mukherjee Arindam Mitra Ganesh Jawahar Sahaj Agarwal Hamid Palangi and Ahmed Awadallah. 2023. Orca: Progressive learning from complex explanation traces of gpt-4. arXiv:2306.02707. Retrieved from https:\/\/arxiv.org\/abs\/2306.02707 https:\/\/huggingface.co\/datasets\/Open-Orca\/OpenOrca"},{"key":"e_1_3_2_85_2","unstructured":"NousResearch. 2023. software: https:\/\/huggingface.co\/NousResearch\/Nous-Hermes-13b. (Last accessed date: 2025.12.01)."},{"key":"e_1_3_2_86_2","unstructured":"OpenAI. 2022. Introducing ChatGPT. Blog post https:\/\/openai.com\/index\/chatgpt. (Last accessed date: 2025.12.01)."},{"key":"e_1_3_2_87_2","unstructured":"OpenAI. 2023. GPT-4 technical report. arXiv:2303.08774. Retrieved from https:\/\/arxiv.org\/abs\/2303.08774"},{"key":"e_1_3_2_88_2","first-page":"27730","article-title":"Training language models to follow instructions with human feedback","volume":"35","author":"Ouyang Long","year":"2022","unstructured":"Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et\u00a0al. 
2022. Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems 35 (2022), 27730\u201327744.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_89_2","doi-asserted-by":"crossref","unstructured":"Arnold Overwijk Chenyan Xiong Xiao Liu Cameron VandenBerg and Jamie Callan. 2022. Clueweb22: 10 billion web documents with visual and semantic information. arXiv:2211.15848. Retrieved from https:\/\/arxiv.org\/abs\/2211.15848","DOI":"10.1145\/3477495.3536321"},{"key":"e_1_3_2_90_2","unstructured":"Guilherme Penedo Quentin Malartic Daniel Hesslow Ruxandra Cojocaru Alessandro Cappelli Hamza Alobeidli Baptiste Pannier Ebtesam Almazrouei and Julien Launay. 2023. The refinedweb dataset for falcon LLM: Outperforming curated corpora with web data and web data only. arXiv:2306.01116. Retrieved from https:\/\/arxiv.org\/abs\/2306.01116"},{"key":"e_1_3_2_91_2","unstructured":"Baolin Peng Chunyuan Li Pengcheng He Michel Galley and Jianfeng Gao. 2023. Instruction tuning with gpt-4. arXiv:2304.03277. Retrieved from https:\/\/arxiv.org\/abs\/2304.03277 https:\/\/github.com\/Instruction-Tuning-with-GPT-4\/GPT-4-LLM"},{"key":"e_1_3_2_92_2","doi-asserted-by":"crossref","unstructured":"Jing Qian Li Dong Yelong Shen Furu Wei and Weizhu Chen. 2022. Controllable natural language generation with contrastive prefixes. arXiv:2202.13257. Retrieved from https:\/\/arxiv.org\/abs\/2202.13257","DOI":"10.18653\/v1\/2022.findings-acl.229"},{"key":"e_1_3_2_93_2","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Radford Alec","year":"2021","unstructured":"Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et\u00a0al. 2021. Learning transferable visual models from natural language supervision. 
In Proceedings of the International Conference on Machine Learning."},{"issue":"8","key":"e_1_3_2_94_2","first-page":"9","article-title":"Language models are unsupervised multitask learners","volume":"1","author":"Radford Alec","year":"2019","unstructured":"Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et\u00a0al. 2019. Language models are unsupervised multitask learners. OpenAI blog 1, 8 (2019), 9.","journal-title":"OpenAI blog"},{"key":"e_1_3_2_95_2","unstructured":"Jack W. Rae Sebastian Borgeaud Trevor Cai Katie Millican Jordan Hoffmann Francis Song John Aslanides Sarah Henderson Roman Ring Susannah Young et\u00a0al. 2021. Scaling language models: Methods analysis & insights from training gopher. arXiv:2112.11446. Retrieved from https:\/\/arxiv.org\/abs\/2112.11446"},{"key":"e_1_3_2_96_2","unstructured":"Colin Raffel Noam M. Shazeer Adam Roberts Katherine Lee Sharan Narang Michael Matena Yanqi Zhou Wei Li and Peter J. Liu. 2019. Exploring the limits of transfer learning with a unified text-to-text transformer. arXiv:1910.10683. Retrieved from https:\/\/arxiv.org\/abs\/1910.10683"},{"key":"e_1_3_2_97_2","doi-asserted-by":"publisher","DOI":"10.1145\/3394486.3406703"},{"key":"e_1_3_2_98_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01042"},{"key":"e_1_3_2_99_2","unstructured":"Victor Sanh Albert Webson Colin Raffel Stephen H. Bach Lintang Sutawika Zaid Alyafeai Antoine Chaffin Arnaud Stiegler Teven Le Scao Arun Raja et\u00a0al. 2021. Multitask prompted training enables zero-shot task generalization. arXiv:2110.08207. Retrieved from https:\/\/arxiv.org\/abs\/2110.08207 https:\/\/huggingface.co\/datasets\/bigscience\/P3"},{"key":"e_1_3_2_100_2","unstructured":"Teven Le Scao Angela Fan Christopher Akiki Elizabeth-Jane Pavlick Suzana Ili\u2019c Daniel Hesslow Roman Castagn\u2019e et\u00a0al. 2022. BLOOM: A 176B-parameter open-access multilingual language model. arXiv:2211.05100. 
Retrieved from https:\/\/arxiv.org\/abs\/2211.05100"},{"key":"e_1_3_2_101_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.eacl-main.20"},{"key":"e_1_3_2_102_2","unstructured":"John Schulman Filip Wolski Prafulla Dhariwal Alec Radford and Oleg Klimov. 2017. Proximal policy optimization algorithms. arXiv:1707.06347. Retrieved from https:\/\/arxiv.org\/abs\/1707.06347"},{"key":"e_1_3_2_103_2","unstructured":"Aarohi Srivastava Abhinav Rastogi Abhishek Rao Abu Awal Md Shoeb Abubakar Abid Adam Fisch Adam R. Brown et\u00a0al. 2023. Beyond the imitation game: Quantifying and extrapolating the capabilities of language models. arxiv:2206.04615 [cs.CL]. Retrieved from https:\/\/arxiv.org\/abs\/2206.04615"},{"key":"e_1_3_2_104_2","unstructured":"Aarohi Srivastava Abhinav Rastogi Abhishek Rao Abu Awal Md Shoeb Abubakar Abid Adam Fisch Adam R. Brown Adam Santoro Aditya Gupta Adri\u00e0 Garriga-Alonso et\u00a0al. 2022. Beyond the imitation game: Quantifying and extrapolating the capabilities of language models. arXiv:2206.04615. Retrieved from https:\/\/arxiv.org\/abs\/2206.04615"},{"key":"e_1_3_2_105_2","unstructured":"Weiwei Sun Hengyi Cai Hongshen Chen Pengjie Ren Zhumin Chen Maarten de Rijke and Zhaochun Ren. 2023. Answering ambiguous questions via iterative prompting. arXiv:2307.03897. Retrieved from https:\/\/arxiv.org\/abs\/2307.03897"},{"key":"e_1_3_2_106_2","unstructured":"Xiaofei Sun Linfeng Dong Xiaoya Li Zhen Wan Shuhe Wang Tianwei Zhang Jiwei Li Fei Cheng Lingjuan Lyu Fei Wu et\u00a0al. 2023. Pushing the limits of ChatGPT on NLP tasks. arXiv:2306.09719. Retrieved from https:\/\/arxiv.org\/abs\/2306.09719"},{"key":"e_1_3_2_107_2","unstructured":"Xiaofei Sun Xiaoya Li Jiwei Li Fei Wu Shangwei Guo Tianwei Zhang and Guoyin Wang. 2023. Text classification via large language models. arXiv:2305.08377. 
Retrieved from https:\/\/arxiv.org\/abs\/2305.08377"},{"key":"e_1_3_2_108_2","doi-asserted-by":"crossref","unstructured":"Mirac Suzgun Nathan Scales Nathanael Sch\u00e4rli Sebastian Gehrmann Yi Tay Hyung Won Chung Aakanksha Chowdhery Quoc V. Le Ed H. Chi Denny Zhou et\u00a0al. 2022. Challenging big-bench tasks and whether chain-of-thought can solve them. arXiv:2210.09261. Retrieved from https:\/\/arxiv.org\/abs\/2210.09261","DOI":"10.18653\/v1\/2023.findings-acl.824"},{"issue":"6","key":"e_1_3_2_109_2","first-page":"7","article-title":"Alpaca: A strong, replicable instruction-following model","volume":"3","author":"Taori Rohan","year":"2023","unstructured":"Rohan Taori, Ishaan Gulrajani, Tianyi Zhang, Yann Dubois, Xuechen Li, Carlos Guestrin, Percy Liang, and Tatsunori B. Hashimoto. 2023. Alpaca: A strong, replicable instruction-following model. Stanford Center for Research on Foundation Models. https:\/\/crfm.stanford.edu\/2023\/03\/13\/alpaca.html 3, 6 (2023), 7. Retrieved from https:\/\/github.com\/tatsu-lab\/stanford_alpaca","journal-title":"Stanford Center for Research on Foundation Models. https:\/\/crfm.stanford.edu\/2023\/03\/13\/alpaca.html"},{"key":"e_1_3_2_110_2","unstructured":"Rohan Taori Ishaan Gulrajani Tianyi Zhang Yann Dubois Xuechen Li Carlos Guestrin Percy Liang and Tatsunori B. Hashimoto. 2023. Stanford Alpaca: An Instruction-following LLaMA model. Retrieved from https:\/\/github.com\/tatsu-lab\/stanford_alpaca. (Last accessed date: 2025.12.01)."},{"key":"e_1_3_2_111_2","unstructured":"Yi Tay Mostafa Dehghani Vinh Q. Tran Xavier Garcia Jason Wei Xuezhi Wang Hyung Won Chung Dara Bahri Tal Schuster Steven Zheng et\u00a0al. 2023. Ul2: Unifying language learning paradigms. The Eleventh International Conference on Learning Representations (ICLR 2023)."},{"key":"e_1_3_2_112_2","unstructured":"Romal Thoppilan Daniel De Freitas Jamie Hall Noam Shazeer Apoorv Kulshreshtha Heng-Tze Cheng Alicia Jin Taylor Bos Leslie Baker Yu Du et\u00a0al. 2022. 
Lamda: Language models for dialog applications. arXiv:2201.08239. Retrieved from https:\/\/arxiv.org\/abs\/2201.08239"},{"key":"e_1_3_2_113_2","article-title":"MOSS","author":"Tianxiang Sun","year":"2023","unstructured":"Sun Tianxiang and Qiu Xipeng. 2023. MOSS. Blog post txsun1997.github.io\/blogs\/moss.html (2023). Retrieved from https:\/\/github.com\/OpenLMLab\/MOSS. (Last accessed date: 2025.12.01).","journal-title":"Blog post txsun1997.github.io\/blogs\/moss.html"},{"key":"e_1_3_2_114_2","unstructured":"Hugo Touvron Thibaut Lavril Gautier Izacard Xavier Martinet Marie-Anne Lachaux Timoth\u00e9e Lacroix Baptiste Rozi\u00e8re Naman Goyal Eric Hambro Faisal Azhar et\u00a0al. 2023. Llama: Open and efficient foundation language models. arXiv:2302.13971. Retrieved from https:\/\/arxiv.org\/abs\/2302.13971"},{"key":"e_1_3_2_115_2","unstructured":"Hugo Touvron Thibaut Lavril Gautier Izacard Xavier Martinet Marie-Anne Lachaux Timoth\u00e9e Lacroix Baptiste Rozi\u00e8re Naman Goyal Eric Hambro et\u00a0al. 2023. LLaMA: Open and efficient foundation language models. arXiv:2302.13971. Retrieved from https:\/\/arxiv.org\/abs\/2302.13971"},{"key":"e_1_3_2_116_2","doi-asserted-by":"crossref","unstructured":"Zhen Wan Fei Cheng Zhuoyuan Mao Qianying Liu Haiyue Song Jiwei Li and Sadao Kurohashi. 2023. Gpt-re: In-context learning for relation extraction using large language models. arXiv:2305.02105. Retrieved from https:\/\/arxiv.org\/abs\/2305.02105","DOI":"10.18653\/v1\/2023.emnlp-main.214"},{"key":"e_1_3_2_117_2","first-page":"23318","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Wang Peng","year":"2022","unstructured":"Peng Wang, An Yang, Rui Men, Junyang Lin, Shuai Bai, Zhikang Li, Jianxin Ma, Chang Zhou, Jingren Zhou, and Hongxia Yang. 2022. Ofa: Unifying architectures, tasks, and modalities through a simple sequence-to-sequence learning framework. In Proceedings of the International Conference on Machine Learning. 
PMLR, 23318\u201323340."},{"key":"e_1_3_2_118_2","unstructured":"Shuhe Wang Xiaofei Sun Xiaoya Li Rongbin Ouyang Fei Wu Tianwei Zhang Jiwei Li and Guoyin Wang. 2023. Gpt-ner: Named entity recognition via large language models. arXiv:2304.10428. Retrieved from https:\/\/arxiv.org\/abs\/2304.10428"},{"key":"e_1_3_2_119_2","unstructured":"Xuezhi Wang Jason Wei Dale Schuurmans Quoc Le Ed Huai hsin Chi and Denny Zhou. 2022. Self-consistency improves chain of thought reasoning in language models. arXiv:2203.11171. Retrieved from https:\/\/arxiv.org\/abs\/2203.11171"},{"key":"e_1_3_2_120_2","unstructured":"Yizhong Wang Hamish Ivison Pradeep Dasigi Jack Hessel Tushar Khot Khyathi Raghavi Chandu David Wadden Kelsey MacMillan Noah A. Smith Iz Beltagy and Hanna Hajishirzi. 2023. How far can camels go? Exploring the state of instruction tuning on open resources. arXiv:2306.04751. Retrieved from https:\/\/arxiv.org\/abs\/2306.04751 https:\/\/github.com\/allenai\/open-instruct"},{"key":"e_1_3_2_121_2","doi-asserted-by":"crossref","unstructured":"Yizhong Wang Yeganeh Kordi Swaroop Mishra Alisa Liu Noah A. Smith Daniel Khashabi and Hannaneh Hajishirzi. 2022. Self-instruct: Aligning language model with self generated instructions. arXiv:2212.10560. Retrieved from https:\/\/arxiv.org\/abs\/2212.10560 https:\/\/github.com\/yizhongw\/self-instruct","DOI":"10.18653\/v1\/2023.acl-long.754"},{"key":"e_1_3_2_122_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.emnlp-main.340"},{"key":"e_1_3_2_123_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.emnlp-main.340"},{"key":"e_1_3_2_124_2","volume-title":"Proceedings of the 41st International Conference on Machine Learning, ICML 2024","author":"Wei Fangyun","year":"2024","unstructured":"Fangyun Wei, Xi Chen, and Lin Luo. 2024. Rethinking generative large language model evaluation for semantic comprehension. In Proceedings of the 41st International Conference on Machine Learning, ICML 2024. 
OpenReview.net."},{"key":"e_1_3_2_125_2","unstructured":"Jason Wei Xuezhi Wang Dale Schuurmans Maarten Bosma Ed Huai hsin Chi F. Xia Quoc Le and Denny Zhou. 2022. Chain of thought prompting elicits reasoning in large language models. arXiv:2201.11903. Retrieved from https:\/\/arxiv.org\/abs\/2201.11903"},{"key":"e_1_3_2_126_2","unstructured":"Yuxiang Wei Zhe Wang Jiawei Liu Yifeng Ding and Lingming Zhang. 2023. Magicoder: Source code is all you need. arXiv:2312.02120. Retrieved from https:\/\/arxiv.org\/abs\/2312.02120 https:\/\/github.com\/ise-uiuc\/magicoder?tab=readme-ov-file#-dataset"},{"key":"e_1_3_2_127_2","unstructured":"Sarah Wiegreffe Jack Hessel Swabha Swayamdipta Mark Riedl and Yejin Choi. 2021. Reframing human-AI collaboration for generating free-text explanations. arXiv:2112.08674. Retrieved from https:\/\/arxiv.org\/abs\/2112.08674"},{"key":"e_1_3_2_128_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.eng.2024.12.008"},{"key":"e_1_3_2_129_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.emnlp-main.39"},{"key":"e_1_3_2_130_2","unstructured":"Canwen Xu Daya Guo Nan Duan and Julian McAuley. 2023. Baize: An open-source chat model with parameter-efficient tuning on self-chat data. arXiv:2304.01196. Retrieved from https:\/\/arxiv.org\/abs\/2304.01196"},{"key":"e_1_3_2_131_2","unstructured":"Can Xu Qingfeng Sun Kai Zheng Xiubo Geng Pu Zhao Jiazhan Feng Chongyang Tao and Daxin Jiang. 2023. WizardLM: Empowering Large Language Models to Follow Complex Instructions. Retrieved from https:\/\/github.com\/nlpxucan\/evol-instruct. (Last accessed date: 2025.12.01)."},{"key":"e_1_3_2_132_2","unstructured":"Zhiyang Xu Chao Feng Rulin Shao Trevor Ashby Ying Shen Di Jin Yu Cheng Qifan Wang and Lifu Huang. 2024. Vision-flan: Scaling human-labeled tasks in visual instruction tuning. arXiv:2402.11690. Retrieved from https:\/\/arxiv.org\/abs\/2402.11690"},{"key":"e_1_3_2_133_2","unstructured":"Zhiyang Xu Ying Shen and Lifu Huang. 2022. 
MultiInstruct: Improving multi-modal zero-shot learning via instruction tuning. arXiv:2212.10773. Retrieved from https:\/\/arxiv.org\/abs\/2212.10773 https:\/\/github.com\/VT-NLP\/MultiInstruct"},{"key":"e_1_3_2_134_2","unstructured":"Fuzhao Xue Kabir Jain Mahir Hitesh Shah Zangwei Zheng and Yang You. 2023. Instruction in the Wild: A User-based Instruction Dataset. Retrieved from https:\/\/github.com\/XueFuzhao\/InstructionWild. (Last accessed date: 2025.12.01)."},{"key":"e_1_3_2_135_2","unstructured":"Jingfeng Yang Hongye Jin Ruixiang Tang Xiaotian Han Qizhang Feng Haoming Jiang Bing Yin and Xia Hu. 2023. Harnessing the power of LLMs in practice: A survey on ChatGPT and beyond. arXiv:2304.13712. Retrieved from https:\/\/arxiv.org\/abs\/2304.13712"},{"key":"e_1_3_2_136_2","unstructured":"Kexin Yang Dayiheng Liu Wenqiang Lei Baosong Yang Mingfeng Xue Boxing Chen and Jun Xie. 2022. Tailor: A prompt-based approach to attribute-based controlled text generation. arXiv:2204.13362. Retrieved from https:\/\/arxiv.org\/abs\/2204.13362"},{"key":"e_1_3_2_137_2","doi-asserted-by":"crossref","unstructured":"Kevin Yang Nanyun Peng Yuandong Tian and Dan Klein. 2022. Re3: Generating longer stories with recursive reprompting and revision. arXiv:2210.06774. Retrieved from https:\/\/arxiv.org\/abs\/2210.06774","DOI":"10.18653\/v1\/2022.emnlp-main.296"},{"key":"e_1_3_2_138_2","unstructured":"Sherry Yang Ofir Nachum Yilun Du Jason Wei Pieter Abbeel and Dale Schuurmans. 2023. Foundation models for decision making: Problems methods and opportunities. arXiv:2303.04129. Retrieved from https:\/\/arxiv.org\/abs\/2303.04129"},{"key":"e_1_3_2_139_2","article-title":"The dawn of LMMs: Preliminary explorations with gpt-4v (ision)","author":"Yang Zhengyuan","year":"2023","unstructured":"Zhengyuan Yang, Linjie Li, Kevin Lin, Jianfeng Wang, Chung-Ching Lin, Zicheng Liu, and Lijuan Wang. 2023. The dawn of LMMs: Preliminary explorations with gpt-4v (ision). arXiv preprint arXiv:2309.17421. 
Retrieved from https:\/\/arxiv.org\/abs\/2309.17421","journal-title":"arXiv preprint arXiv:2309.17421"},{"key":"e_1_3_2_140_2","unstructured":"Zhenfei Yin Jiong Wang Jianjian Cao Zhelun Shi Dingning Liu Mukai Li Lu Sheng Lei Bai Xiaoshui Huang Zhiyong Wang et\u00a0al. 2023. LAMM: Language-assisted multi-modal instruction-tuning dataset framework and benchmark. arXiv:2306.06687. Retrieved from https:\/\/arxiv.org\/abs\/2306.06687 https:\/\/github.com\/OpenLAMM\/LAMM"},{"key":"e_1_3_2_141_2","unstructured":"Jun Yu Yutong Dai Xiaokang Liu Jin Huang Yishan Shen Ke Zhang Rong Zhou Eashan Adhikarla Wenxuan Ye Yixin Liu et\u00a0al. 2024. Unleashing the power of multi-task learning: A comprehensive survey spanning traditional deep and pretrained foundation model eras. arXiv:2404.18961. Retrieved from https:\/\/arxiv.org\/abs\/2404.18961"},{"key":"e_1_3_2_142_2","unstructured":"Zhaojian Yu Xin Zhang Ning Shang Yangyu Huang Can Xu Yishujie Zhao Wenxiang Hu and Qiufeng Yin. 2023. Wavecoder: Widespread and versatile enhanced instruction tuning with refined data generation. arXiv:2312.14187. Retrieved from https:\/\/arxiv.org\/abs\/2312.14187"},{"key":"e_1_3_2_143_2","unstructured":"YuLan-Chat-Team. 2023. YuLan-Chat: An Open-Source Bilingual Chatbot. Retrieved from https:\/\/github.com\/RUC-GSAI\/YuLan-Chat. (Last accessed date: 2025.12.01)."},{"key":"e_1_3_2_144_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.acl-short.1"},{"key":"e_1_3_2_145_2","unstructured":"Ge Zhang Yemin Shi Ruibo Liu Ruibin Yuan Yizhi Li Siwei Dong Yu Shu Zhaoqun Li Zekun Wang Chenghua Lin et\u00a0al. 2023. Chinese open instruction generalist: A preliminary release. arXiv:2304.07987. Retrieved from https:\/\/arxiv.org\/abs\/2304.07987 https:\/\/github.com\/BAAI-Zlab\/COIG"},{"key":"e_1_3_2_146_2","doi-asserted-by":"crossref","unstructured":"Hang Zhang Xin Li and Lidong Bing. 2023. Video-llama: An instruction-tuned audio-visual language model for video understanding. arXiv:2306.02858. 
Retrieved from https:\/\/arxiv.org\/abs\/2306.02858","DOI":"10.18653\/v1\/2023.emnlp-demo.49"},{"key":"e_1_3_2_147_2","unstructured":"Susan Zhang Stephen Roller Naman Goyal Mikel Artetxe Moya Chen Shuohui Chen Christopher Dewan Mona T. Diab Xian Li Xi Victoria Lin et\u00a0al. 2022. OPT: Open pre-trained transformer language models. arXiv:2205.01068. Retrieved from https:\/\/arxiv.org\/abs\/2205.01068"},{"key":"e_1_3_2_148_2","unstructured":"Xiaoman Zhang Chaoyi Wu Ziheng Zhao Weixiong Lin Ya Zhang Yanfeng Wang and Weidi Xie. 2023. PMC-VQA: Visual instruction tuning for medical visual question answering. arXiv:2305.10415. Retrieved from https:\/\/arxiv.org\/abs\/2305.10415 https:\/\/github.com\/xiaoman-zhang\/PMC-VQA"},{"key":"e_1_3_2_149_2","unstructured":"Yuanhan Zhang Qinghong Sun Yichun Zhou Zexin He Zhenfei Yin Kun Wang Lu Sheng Yu Qiao Jing Shao and Ziwei Liu. 2022. Bamboo: Building mega-scale vision dataset continually with human-machine synergy. arXiv:2203.07845. Retrieved from https:\/\/arxiv.org\/abs\/2203.07845"},{"key":"e_1_3_2_150_2","unstructured":"Ziyin Zhang Lizhen Xu Zhaokun Jiang Hongkun Hao and Rui Wang. 2024. Multiple-choice questions are efficient and Robust LLM evaluators. arXiv:2405.11966. Retrieved from https:\/\/arxiv.org\/abs\/2405.11966"},{"key":"e_1_3_2_151_2","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Zhao Tony","year":"2021","unstructured":"Tony Zhao, Eric Wallace, Shi Feng, Dan Klein, and Sameer Singh. 2021. Calibrate before use: Improving few-shot performance of language models. In Proceedings of the International Conference on Machine Learning."},{"key":"e_1_3_2_152_2","unstructured":"Wenting Zhao Xiang Ren Jack Hessel Claire Cardie Yejin Choi and Yuntian Deng. 2024. Wildchat: 1m chatgpt interaction logs in the wild. arXiv:2405.01470. 
Retrieved from https:\/\/arxiv.org\/abs\/2405.01470 https:\/\/huggingface.co\/datasets\/allenai\/WildChat"},{"key":"e_1_3_2_153_2","unstructured":"Wayne Xin Zhao Kun Zhou Junyi Li Tianyi Tang Xiaolei Wang Yupeng Hou Yingqian Min Beichen Zhang Junjie Zhang Zican Dong et\u00a0al. 2023. A survey of large language models. arXiv:2303.18223. Retrieved from https:\/\/arxiv.org\/abs\/2303.18223"},{"key":"e_1_3_2_154_2","article-title":"Judging LLM-as-a-judge with MT-bench and chatbot arena","volume":"36","author":"Zheng Lianmin","year":"2023","unstructured":"Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric Xing, et\u00a0al. 2023. Judging LLM-as-a-judge with MT-bench and chatbot arena. Advances in Neural Information Processing Systems 36 (2023), 46595\u201346623.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_155_2","first-page":"46595","article-title":"Judging LLM-as-a-judge with MT-bench and chatbot arena","volume":"36","author":"Zheng Lianmin","year":"2023","unstructured":"Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric Xing, et\u00a0al. 2023. Judging LLM-as-a-judge with MT-bench and chatbot arena. Advances in Neural Information Processing Systems 36 (2023), 46595\u201346623.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_156_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2024.findings-naacl.149"},{"key":"e_1_3_2_157_2","unstructured":"Chunting Zhou Pengfei Liu Puxin Xu Srini Iyer Jiao Sun Yuning Mao Xuezhe Ma Avia Efrat Ping Yu L. Yu et\u00a0al. 2023. LIMA: Less is more for alignment. arXiv:2305.11206. Retrieved from https:\/\/arxiv.org\/abs\/2305.11206 https:\/\/huggingface.co\/datasets\/GAIR\/lima"},{"key":"e_1_3_2_158_2","unstructured":"Jeffrey Zhou Tianjian Lu Swaroop Mishra Siddhartha Brahma Sujoy Basu Yi Luan Denny Zhou and Le Hou. 
2023. Instruction-following evaluation for large language models. arXiv:2311.07911. Retrieved from https:\/\/arxiv.org\/abs\/2311.07911"},{"key":"e_1_3_2_159_2","unstructured":"Banghua Zhu Evan Frick Tianhao Wu Hanlin Zhu and Jiantao Jiao. 2023. Starling-7b: Improving LLM helpfulness & harmlessness with RLAIF. (2023). Retrieved from https:\/\/huggingface.co\/datasets\/berkeley-nest\/Nectar. (Last accessed date: 2025.12.01)."},{"key":"e_1_3_2_160_2","unstructured":"Deyao Zhu Jun Chen Xiaoqian Shen Xiang Li and Mohamed Elhoseiny. 2023. MiniGPT-4: Enhancing vision-language understanding with advanced large language models. arXiv:2304.10592. Retrieved from https:\/\/arxiv.org\/abs\/2304.10592"},{"key":"e_1_3_2_161_2","unstructured":"Ziyu Zhuang Qiguang Chen Longxuan Ma Mingda Li Yi Han Yushan Qian Haopeng Bai Zixian Feng Weinan Zhang and Ting Liu. 2023. Through the lens of core competency: Survey on evaluation of large language models. arXiv:2308.07902. Retrieved from https:\/\/arxiv.org\/abs\/2308.07902"}],"container-title":["ACM Computing 
Surveys"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3777411","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,1,8]],"date-time":"2026-01-08T15:57:17Z","timestamp":1767887837000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3777411"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,1,8]]},"references-count":160,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2026,5,31]]}},"alternative-id":["10.1145\/3777411"],"URL":"https:\/\/doi.org\/10.1145\/3777411","relation":{},"ISSN":["0360-0300","1557-7341"],"issn-type":[{"value":"0360-0300","type":"print"},{"value":"1557-7341","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,1,8]]},"assertion":[{"value":"2023-11-08","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-10-17","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2026-01-08","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}