{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,9,16]],"date-time":"2024-09-16T19:14:06Z","timestamp":1726514046590},"reference-count":143,"publisher":"Association for Computing Machinery (ACM)","issue":"6","funder":[{"name":"NSF","award":["IIS-2224843, IIS-1900990"]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Knowl. Discov. Data"],"published-print":{"date-parts":[[2024,7,31]]},"abstract":"\n This article presents a comprehensive and practical guide for practitioners and end-users working with Large Language Models (LLMs) in their downstream Natural Language Processing (NLP) tasks. We provide discussions and insights into the usage of LLMs from the perspectives of models, data, and downstream tasks. First, we offer an introduction and brief summary of current language models. Then, we discuss the influence of pre-training data, training data, and test data. Most importantly, we provide a detailed discussion about the use and non-use cases of large language models for various natural language processing tasks, such as knowledge-intensive tasks, traditional natural language understanding tasks, generation tasks, emergent abilities, and considerations for specific tasks. We present various use cases and non-use cases to illustrate the practical applications and limitations of LLMs in real-world scenarios. We also try to understand the importance of data and the specific challenges associated with each NLP task. Furthermore, we explore the impact of spurious biases on LLMs and delve into other essential considerations, such as efficiency, cost, and latency, to ensure a comprehensive understanding of deploying LLMs in practice. 
This comprehensive guide aims to provide researchers and practitioners with valuable insights and best practices for working with LLMs, thereby enabling the successful implementation of these models in a wide range of NLP tasks. A curated list of practical guide resources for LLMs, regularly updated, can be found at https:\/\/github.com\/Mooler0410\/LLMsPracticalGuide. An LLM evolutionary tree, editable yet regularly updated, can be found at llmtree.ai.","DOI":"10.1145\/3649506","type":"journal-article","created":{"date-parts":[[2024,2,28]],"date-time":"2024-02-28T12:52:27Z","timestamp":1709124747000},"page":"1-32","update-policy":"http:\/\/dx.doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":48,"title":["Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond"],"prefix":"10.1145","volume":"18","author":[{"ORCID":"http:\/\/orcid.org\/0000-0001-9713-1792","authenticated-orcid":false,"given":"Jingfeng","family":"Yang","sequence":"first","affiliation":[{"name":"Amazon.com Inc, Seattle, United States"}]},{"ORCID":"http:\/\/orcid.org\/0009-0009-7920-632X","authenticated-orcid":false,"given":"Hongye","family":"Jin","sequence":"additional","affiliation":[{"name":"Texas A&M University College Station, College Station, United States"}]},{"ORCID":"http:\/\/orcid.org\/0000-0001-6476-2336","authenticated-orcid":false,"given":"Ruixiang","family":"Tang","sequence":"additional","affiliation":[{"name":"Rice University, Houston, United States"}]},{"ORCID":"http:\/\/orcid.org\/0000-0003-1175-7925","authenticated-orcid":false,"given":"Xiaotian","family":"Han","sequence":"additional","affiliation":[{"name":"Computer Science and Engineering, Texas A&M University, College Station, United States"}]},{"ORCID":"http:\/\/orcid.org\/0000-0002-2574-0270","authenticated-orcid":false,"given":"Qizhang","family":"Feng","sequence":"additional","affiliation":[{"name":"Texas A&M University 
College Station, College Station, United States"}]},{"ORCID":"http:\/\/orcid.org\/0000-0003-0789-525X","authenticated-orcid":false,"given":"Haoming","family":"Jiang","sequence":"additional","affiliation":[{"name":"Amazon.com Inc, Seattle, United States"}]},{"ORCID":"http:\/\/orcid.org\/0009-0001-7289-3667","authenticated-orcid":false,"given":"Shaochen","family":"Zhong","sequence":"additional","affiliation":[{"name":"Rice University, Houston, United States"}]},{"ORCID":"http:\/\/orcid.org\/0000-0002-5890-0031","authenticated-orcid":false,"given":"Bing","family":"Yin","sequence":"additional","affiliation":[{"name":"Amazon.com Inc, Seattle, United States"}]},{"ORCID":"http:\/\/orcid.org\/0000-0003-2234-3226","authenticated-orcid":false,"given":"Xia","family":"Hu","sequence":"additional","affiliation":[{"name":"Computer Science, Rice University, Houston, United States"}]}],"member":"320","published-online":{"date-parts":[[2024,4,26]]},"reference":[{"key":"e_1_3_3_2_2","unstructured":"New York Times. [n. d.]. ChatGPT Is Banned in Italy Over Privacy Concerns\u2014The New York Times. Retrieved from https:\/\/www.nytimes.com\/2023\/03\/31\/technology\/chatgpt-italy-ban.html(accessed on 04\/23\/2023)."},{"key":"e_1_3_3_3_2","unstructured":"Lambda Labs. [n.d.]. OpenAI\u2019s GPT-3 Language Model: A Technical Overview. Retrieved from https:\/\/lambdalabs.com\/blog\/demystifying-gpt-3#1(accessed on 03\/02\/2023)."},{"key":"e_1_3_3_4_2","unstructured":"OpenAI. [n.d.]. Pricing. Retrieved from https:\/\/openai.com\/pricing(accessed on 03\/02\/2023)."},{"key":"e_1_3_3_5_2","doi-asserted-by":"crossref","unstructured":"Joshua Ainslie Tao Lei Michiel de Jong Santiago Onta\u00f1\u00f3n Siddhartha Brahma Yury Zemlyanskiy David Uthus Mandy Guo James Lee-Thorp Yi Tay et\u00a0al. 2023. Colt5: Faster long-range transformers with conditional computation. 
Retrieved from https:\/\/arXiv:2303.09752","DOI":"10.18653\/v1\/2023.emnlp-main.309"},{"key":"e_1_3_3_6_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.acl-short.16"},{"key":"e_1_3_3_7_2","doi-asserted-by":"publisher","DOI":"10.1038\/d41586-023-00641-w"},{"key":"e_1_3_3_8_2","unstructured":"Jacob Austin Augustus Odena Maxwell Nye Maarten Bosma Henryk Michalewski David Dohan Ellen Jiang Carrie Cai Michael Terry Quoc Le et\u00a0al. 2021. Program synthesis with large language models. Retrieved from https:\/\/arXiv:2108.07732"},{"key":"e_1_3_3_9_2","unstructured":"Yuntao Bai Saurav Kadavath Sandipan Kundu Amanda Askell Jackson Kernion Andy Jones Anna Chen Anna Goldie Azalia Mirhoseini Cameron McKinnon et\u00a0al. 2022. Constitutional AI: Harmlessness from AI feedback. Retrieved from https:\/\/arXiv:2212.08073"},{"key":"e_1_3_3_10_2","first-page":"642","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Bao Hangbo","year":"2020","unstructured":"Hangbo Bao, Li Dong, Furu Wei, Wenhui Wang, Nan Yang, Xiaodong Liu, Yu Wang, Jianfeng Gao, Songhao Piao, Ming Zhou, et\u00a0al. 2020. Unilmv2: Pseudo-masked language models for unified language model pre-training. In Proceedings of the International Conference on Machine Learning. PMLR, 642\u2013652."},{"key":"e_1_3_3_11_2","first-page":"1533","volume-title":"Proceedings of the Conference on Empirical Methods in Natural Language Processing","author":"Berant Jonathan","year":"2013","unstructured":"Jonathan Berant, Andrew Chou, Roy Frostig, and Percy Liang. 2013. Semantic parsing on freebase from question-answer pairs. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. 1533\u20131544."},{"key":"e_1_3_3_12_2","doi-asserted-by":"crossref","unstructured":"Maciej Besta Nils Blach Ales Kubicek Robert Gerstenberger Lukas Gianinazzi Joanna Gajda Tomasz Lehmann Michal Podstawski Hubert Niewiadomski Piotr Nyczyk et\u00a0al. 2023. 
Graph of thoughts: Solving elaborate problems with large language models. Retrieved from https:\/\/arXiv:2308.09687","DOI":"10.1609\/aaai.v38i16.29720"},{"key":"e_1_3_3_13_2","doi-asserted-by":"publisher","DOI":"10.12840\/issn.2255-4165.017"},{"key":"e_1_3_3_14_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W16-2301"},{"key":"e_1_3_3_15_2","unstructured":"Rishi Bommasani Drew A. Hudson Ehsan Adeli Russ Altman Simran Arora Sydney von Arx Michael S. Bernstein Jeannette Bohg Antoine Bosselut Emma Brunskill et\u00a0al. 2021. On the opportunities and risks of foundation models. Retrieved from https:\/\/arXiv:2108.07258"},{"key":"e_1_3_3_16_2","doi-asserted-by":"publisher","DOI":"10.1145\/3308560.3317593"},{"key":"e_1_3_3_17_2","doi-asserted-by":"crossref","unstructured":"Samuel R. Bowman Gabor Angeli Christopher Potts and Christopher D. Manning. 2015. A large annotated corpus for learning natural language inference. Retrieved from https:\/\/arXiv:1508.05326","DOI":"10.18653\/v1\/D15-1075"},{"key":"e_1_3_3_18_2","unstructured":"Samuel R. Bowman Jeeyoon Hyun Ethan Perez Edwin Chen Craig Pettit Scott Heiner Kamile Lukosuite Amanda Askell Andy Jones Anna Chen et\u00a0al. 2022. Measuring progress on scalable oversight for large language models. Retrieved from https:\/\/arXiv:2211.03540"},{"key":"e_1_3_3_19_2","first-page":"1877","article-title":"Language models are few-shot learners","volume":"33","author":"Brown Tom","year":"2020","unstructured":"Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D. Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell et\u00a0al. 2020. Language models are few-shot learners. Adv. Neural Info. Process. Syst. 33 (2020), 1877\u20131901.","journal-title":"Adv. Neural Info. Process. 
Syst."},{"key":"e_1_3_3_20_2","first-page":"77","volume-title":"Proceedings of the Conference on Fairness, Accountability and Transparency","author":"Buolamwini Joy","year":"2018","unstructured":"Joy Buolamwini and Timnit Gebru. 2018. Gender shades: Intersectional accuracy disparities in commercial gender classification. In Proceedings of the Conference on Fairness, Accountability and Transparency. PMLR, 77\u201391."},{"key":"e_1_3_3_21_2","unstructured":"Yupeng Chang Xu Wang Jindong Wang Yuan Wu Kaijie Zhu Hao Chen Linyi Yang Xiaoyuan Yi Cunxiang Wang Yidong Wang et\u00a0al. 2023. A survey on evaluation of large language models. Retrieved from https:\/\/arXiv:2307.03109"},{"key":"e_1_3_3_22_2","unstructured":"Guanzheng Chen Xin Li Zaiqiao Meng Shangsong Liang and Lidong Bing. 2023. Clex: Continuous length extrapolation for large language models. Retrieved from https:\/\/arXiv:2310.16450"},{"key":"e_1_3_3_23_2","unstructured":"Mark Chen Jerry Tworek Heewoo Jun Qiming Yuan Henrique Ponde de Oliveira Pinto Jared Kaplan Harri Edwards Yuri Burda Nicholas Joseph Greg Brockman et\u00a0al. 2021. Evaluating large language models trained on code. Retrieved from https:\/\/arXiv:2107.03374"},{"key":"e_1_3_3_24_2","unstructured":"Xi Chen Xiao Wang Soravit Changpinyo A. J. Piergiovanni Piotr Padlewski Daniel Salz Sebastian Goodman Adam Grycner Basil Mustafa Lucas Beyer et\u00a0al. 2022. Pali: A jointly-scaled multilingual language-image model. Retrieved from https:\/\/arXiv:2209.06794"},{"key":"e_1_3_3_25_2","doi-asserted-by":"crossref","unstructured":"Eunsol Choi He He Mohit Iyyer Mark Yatskar Wen-tau Yih Yejin Choi Percy Liang and Luke Zettlemoyer. 2018. QuAC: Question answering in context. Retrieved from https:\/\/arXiv:1808.07036 (2018).","DOI":"10.18653\/v1\/D18-1241"},{"key":"e_1_3_3_26_2","unstructured":"Aakanksha Chowdhery Sharan Narang Jacob Devlin Maarten Bosma Gaurav Mishra Adam Roberts Paul Barham Hyung Won Chung Charles Sutton Sebastian Gehrmann et\u00a0al. 2022. 
Palm: Scaling language modeling with pathways. Retrieved from https:\/\/arXiv:2204.02311"},{"key":"e_1_3_3_27_2","unstructured":"Zheng Chu Jingchang Chen Qianglong Chen Weijiang Yu Tao He Haotian Wang Weihua Peng Ming Liu Bing Qin and Ting Liu. 2023. A survey of chain of thought reasoning: Advances frontiers and future. Retrieved from https:\/\/arXiv:2309.15402"},{"key":"e_1_3_3_28_2","unstructured":"Hyung Won Chung Le Hou Shayne Longpre Barret Zoph Yi Tay William Fedus Eric Li Xuezhi Wang Mostafa Dehghani Siddhartha Brahma et\u00a0al. 2022. Scaling instruction-finetuned language models. Retrieved from https:\/\/arXiv:2210.11416"},{"key":"e_1_3_3_29_2","unstructured":"Peter Clark Isaac Cowhey Oren Etzioni Tushar Khot Ashish Sabharwal Carissa Schoenick and Oyvind Tafjord. 2018. Think you have solved question answering? try arc the ai2 reasoning challenge. Retrieved from https:\/\/arXiv:1803.05457"},{"key":"e_1_3_3_30_2","unstructured":"Karl Cobbe Vineet Kosaraju Mohammad Bavarian Mark Chen Heewoo Jun Lukasz Kaiser Matthias Plappert Jerry Tworek Jacob Hilton Reiichiro Nakano et\u00a0al. 2021. Training verifiers to solve math word problems. Retrieved from https:\/\/arXiv:2110.14168"},{"key":"e_1_3_3_31_2","unstructured":"Haixing Dai Zhengliang Liu Wenxiong Liao Xiaoke Huang Zihao Wu Lin Zhao Wei Liu Ninghao Liu Sheng Li Dajiang Zhu et\u00a0al. 2023. ChatAug: Leveraging ChatGPT for Text Data Augmentation. Retrieved from https:\/\/arXiv:2302.13007"},{"key":"e_1_3_3_32_2","unstructured":"Jacob Devlin Ming-Wei Chang Kenton Lee and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. Retrieved from https:\/\/arXiv:1810.04805"},{"key":"e_1_3_3_33_2","doi-asserted-by":"crossref","unstructured":"Bosheng Ding Chengwei Qin Linlin Liu Lidong Bing Shafiq Joty and Boyang Li. 2022. Is GPT-3 a Good Data Annotator? 
Retrieved from https:\/\/arXiv:2212.10450","DOI":"10.18653\/v1\/2023.acl-long.626"},{"key":"e_1_3_3_34_2","doi-asserted-by":"publisher","DOI":"10.1145\/3531146.3533234"},{"key":"e_1_3_3_35_2","unstructured":"Qingxiu Dong Lei Li Damai Dai Ce Zheng Zhiyong Wu Baobao Chang Xu Sun Jingjing Xu and Zhifang Sui. 2022. A survey for in-context learning. Retrieved from https:\/\/arXiv:2301.00234"},{"key":"e_1_3_3_36_2","unstructured":"Mengnan Du Fengxiang He Na Zou Dacheng Tao and Xia Hu. 2022. Shortcut learning of large language models in natural language understanding: A survey. Retrieved from https:\/\/arXiv:2208.11857"},{"key":"e_1_3_3_37_2","unstructured":"Corentin Duchene Henri Jamet Pierre Guillaume and Reda Dehak. 2023. A benchmark for toxic comment classification on Civil Comments dataset. Retrieved from https:\/\/arXiv:2301.11125"},{"key":"e_1_3_3_38_2","unstructured":"Jinlan Fu See-Kiong Ng Zhengbao Jiang and Pengfei Liu. 2023. Gptscore: Evaluate as you desire. Retrieved from https:\/\/arXiv:2302.04166"},{"key":"e_1_3_3_39_2","article-title":"OpenAGI: When LLM meets domain experts","author":"Ge Yingqiang","year":"2023","unstructured":"Yingqiang Ge, Wenyue Hua, Kai Mei, Jianchao Ji, Juntao Tan, Shuyuan Xu, Zelong Li, and Yongfeng Zhang. 2023. OpenAGI: When LLM meets domain experts. In Proceedings of the Conference on Advances in Neural Information Processing Systems (NeurIPS\u201923).","journal-title":"Proceedings of the Conference on Advances in Neural Information Processing Systems (NeurIPS\u201923)"},{"key":"e_1_3_3_40_2","doi-asserted-by":"publisher","DOI":"10.1038\/s42256-020-00257-z"},{"key":"e_1_3_3_41_2","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00370"},{"key":"e_1_3_3_42_2","doi-asserted-by":"crossref","unstructured":"Fabrizio Gilardi Meysam Alizadeh and Ma\u00ebl Kubli. 2023. ChatGPT outperforms crowd-workers for text-annotation tasks. 
Retrieved from https:\/\/arXiv:2303.15056","DOI":"10.1073\/pnas.2305016120"},{"key":"e_1_3_3_43_2","unstructured":"Tanya Goyal Junyi Jessy Li and Greg Durrett. 2022. News summarization and evaluation in the era of gpt-3. Retrieved from https:\/\/arXiv:2209.12356"},{"key":"e_1_3_3_44_2","unstructured":"Suriya Gunasekar Yi Zhang Jyoti Aneja Caio C\u00e9sar Teodoro Mendes Allie Del Giorno Sivakanth Gopi Mojan Javaheripi Piero Kauffmann Gustavo de Rosa Olli Saarikivi et\u00a0al. 2023. Textbooks are all you need. Retrieved from https:\/\/arXiv:2306.11644"},{"key":"e_1_3_3_45_2","doi-asserted-by":"crossref","unstructured":"Mandy Guo Joshua Ainslie David Uthus Santiago Ontanon Jianmo Ni Yun-Hsuan Sung and Yinfei Yang. 2021. LongT5: Efficient text-to-text transformer for long sequences. Retrieved from https:\/\/arXiv:2112.07916","DOI":"10.18653\/v1\/2022.findings-naacl.55"},{"key":"e_1_3_3_46_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v31i1.10742"},{"key":"e_1_3_3_47_2","unstructured":"Xiaochuang Han Daniel Simig Todor Mihaylov Yulia Tsvetkov Asli Celikyilmaz and Tianlu Wang. 2023. Understanding in-context learning via supportive pretraining data. Retrieved from https:\/\/arXiv:2306.15091"},{"key":"e_1_3_3_48_2","unstructured":"Dan Hendrycks Collin Burns Steven Basart Andy Zou Mantas Mazeika Dawn Song and Jacob Steinhardt. 2020. Measuring massive multitask language understanding. Retrieved from https:\/\/arXiv:2009.03300"},{"key":"e_1_3_3_49_2","unstructured":"Jordan Hoffmann Sebastian Borgeaud Arthur Mensch Elena Buchatskaya Trevor Cai Eliza Rutherford Diego de Las Casas Lisa Anne Hendricks Johannes Welbl Aidan Clark et\u00a0al. 2022. Training compute-optimal large language models. Retrieved from https:\/\/arXiv:2203.15556"},{"key":"e_1_3_3_50_2","unstructured":"Edward J. Hu Yelong Shen Phillip Wallis Zeyuan Allen-Zhu Yuanzhi Li Shean Wang Lu Wang and Weizhu Chen. 2021. Lora: Low-rank adaptation of large language models. 
Retrieved from https:\/\/arXiv:2106.09685"},{"key":"e_1_3_3_51_2","unstructured":"Hang Hua Xingjian Li Dejing Dou Cheng-Zhong Xu and Jiebo Luo. 2022. Fine-tuning Pre-trained Language Models with Noise Stability Regularization. Retrieved from https:\/\/arXiv:2206.05658"},{"key":"e_1_3_3_52_2","doi-asserted-by":"crossref","unstructured":"Jie Huang and Kevin Chen-Chuan Chang. 2022. Towards reasoning in large language models: A survey. Retrieved from https:\/\/arXiv:2212.10403","DOI":"10.18653\/v1\/2023.findings-acl.67"},{"key":"e_1_3_3_53_2","unstructured":"Gautier Izacard Patrick Lewis Maria Lomeli Lucas Hosseini Fabio Petroni Timo Schick Jane Dwivedi-Yu Armand Joulin Sebastian Riedel and Edouard Grave. 2022. Few-shot learning with retrieval augmented language models. http:\/\/arxiv.org\/abs\/2208.03299"},{"key":"e_1_3_3_54_2","unstructured":"Wenxiang Jiao Wenxuan Wang Jen-tse Huang Xing Wang Shuming Shi and Zhaopeng Tu. 2023. Is ChatGPT a good translator? Yes with GPT-4 as the engine. arXiv preprint arXiv:2301.08745 (2023)."},{"key":"e_1_3_3_55_2","unstructured":"Hongye Jin Xiaotian Han Jingfeng Yang Zhimeng Jiang Chia-Yuan Chang and Xia Hu. 2023. Growlength: Accelerating llms pretraining by progressively growing training length. Retrieved from https:\/\/arXiv:2310.00576"},{"key":"e_1_3_3_56_2","unstructured":"Hongye Jin Xiaotian Han Jingfeng Yang Zhimeng Jiang Zirui Liu Chia-Yuan Chang Huiyuan Chen and Xia Hu. 2024. LLM maybe LongLM: Self-extend LLM context window without tuning. Retrieved from https:\/\/arXiv:2401.01325"},{"key":"e_1_3_3_57_2","doi-asserted-by":"crossref","unstructured":"Mandar Joshi Eunsol Choi Daniel S. Weld and Luke Zettlemoyer. 2017. Triviaqa: A large scale distantly supervised challenge dataset for reading comprehension. Retrieved from https:\/\/arXiv:1705.03551 (2017).","DOI":"10.18653\/v1\/P17-1147"},{"key":"e_1_3_3_58_2","unstructured":"Jared Kaplan Sam McCandlish Tom Henighan Tom B. 
Brown Benjamin Chess Rewon Child Scott Gray Alec Radford Jeffrey Wu and Dario Amodei. 2020. Scaling laws for neural language models. Retrieved from https:\/\/arXiv:2001.08361"},{"key":"e_1_3_3_59_2","doi-asserted-by":"crossref","unstructured":"Akhil Kedia Mohd Abbas Zaidi and Haejun Lee. 2022. FiE: Building a global probability space by leveraging early fusion in encoder for open-domain question answering. Retrieved from https:\/\/arXiv:2211.10147","DOI":"10.18653\/v1\/2022.emnlp-main.285"},{"key":"e_1_3_3_60_2","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.1611835114"},{"key":"e_1_3_3_61_2","unstructured":"Tom Kocmi and Christian Federmann. 2023. Large language models are state-of-the-art evaluators of translation quality. Retrieved from https:\/\/arXiv:2302.14520"},{"key":"e_1_3_3_62_2","doi-asserted-by":"crossref","unstructured":"Lingkai Kong Haoming Jiang Yuchen Zhuang Jie Lyu Tuo Zhao and Chao Zhang. 2020. Calibrated language model fine-tuning for in-and out-of-distribution data. Retrieved from https:\/\/arXiv:2010.11506","DOI":"10.18653\/v1\/2020.emnlp-main.102"},{"key":"e_1_3_3_63_2","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00276"},{"key":"e_1_3_3_64_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.findings-acl.85"},{"key":"e_1_3_3_65_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.naacl-main.49"},{"key":"e_1_3_3_66_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.findings-emnlp.409"},{"key":"e_1_3_3_67_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.emnlp-main.682"},{"key":"e_1_3_3_68_2","doi-asserted-by":"crossref","unstructured":"Mike Lewis Yinhan Liu Naman Goyal Marjan Ghazvininejad Abdelrahman Mohamed Omer Levy Ves Stoyanov and Luke Zettlemoyer. 2019. Bart: Denoising sequence-to-sequence pre-training for natural language generation translation and comprehension. 
Retrieved from https:\/\/arXiv:1910.13461","DOI":"10.18653\/v1\/2020.acl-main.703"},{"key":"e_1_3_3_69_2","unstructured":"Xiang Lisa Li and Percy Liang. 2021. Prefix-tuning: Optimizing continuous prompts for generation. Retrieved from https:\/\/arXiv:2101.00190"},{"key":"e_1_3_3_70_2","unstructured":"Percy Liang Rishi Bommasani Tony Lee Dimitris Tsipras Dilara Soylu Michihiro Yasunaga Yian Zhang Deepak Narayanan Yuhuai Wu Ananya Kumar et\u00a0al. 2022. Holistic evaluation of language models. Retrieved from https:\/\/arXiv:2211.09110"},{"key":"e_1_3_3_71_2","first-page":"74","volume-title":"Text Summarization Branches Out","author":"Lin Chin-Yew","year":"2004","unstructured":"Chin-Yew Lin. 2004. Rouge: A package for automatic evaluation of summaries. In Text Summarization Branches Out. Association for Computational Linguistics, 74\u201381."},{"key":"e_1_3_3_72_2","doi-asserted-by":"crossref","unstructured":"Wang Ling Dani Yogatama Chris Dyer and Phil Blunsom. 2017. Program induction by rationale generation: Learning to solve and explain algebraic word problems. Retrieved from https:\/\/arXiv:1705.04146","DOI":"10.18653\/v1\/P17-1015"},{"key":"e_1_3_3_73_2","doi-asserted-by":"crossref","unstructured":"Xiao Liu Kaixuan Ji Yicheng Fu Zhengxiao Du Zhilin Yang and Jie Tang. 2021. P-tuning v2: Prompt tuning can be comparable to fine-tuning universally across scales and tasks. Retrieved from https:\/\/arXiv:2110.07602","DOI":"10.18653\/v1\/2022.acl-short.8"},{"key":"e_1_3_3_74_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.acl-short.8"},{"key":"e_1_3_3_75_2","doi-asserted-by":"crossref","unstructured":"Yang Liu Dan Iter Yichong Xu Shuohang Wang Ruochen Xu and Chenguang Zhu. 2023. GPTEval: NLG Evaluation using GPT-4 with Better Human Alignment. Retrieved from arxiv:2303.16634","DOI":"10.18653\/v1\/2023.emnlp-main.153"},{"key":"e_1_3_3_76_2","doi-asserted-by":"crossref","unstructured":"Yixin Liu Pengfei Liu Dragomir Radev and Graham Neubig. 2022. 
BRIO: Bringing order to abstractive summarization. Retrieved from https:\/\/arXiv:2203.16804","DOI":"10.18653\/v1\/2022.acl-long.207"},{"key":"e_1_3_3_77_2","unstructured":"Yinhan Liu Myle Ott Naman Goyal Jingfei Du Mandar Joshi Danqi Chen Omer Levy Mike Lewis Luke Zettlemoyer and Veselin Stoyanov. 2019. Roberta: A robustly optimized BERT pretraining approach. Retrieved from https:\/\/arXiv:1907.11692"},{"key":"e_1_3_3_78_2","unstructured":"Shayne Longpre Le Hou Tu Vu Albert Webson Hyung Won Chung Yi Tay Denny Zhou Quoc V. Le Barret Zoph Jason Wei et\u00a0al. 2023. The flan collection: Designing data and methods for effective instruction tuning. Retrieved from https:\/\/arXiv:2301.13688"},{"key":"e_1_3_3_79_2","doi-asserted-by":"crossref","unstructured":"Yao Lu Max Bartolo Alastair Moore Sebastian Riedel and Pontus Stenetorp. 2021. Fantastically ordered prompts and where to find them: Overcoming few-shot prompt order sensitivity. Retrieved from https:\/\/arXiv:2104.08786","DOI":"10.18653\/v1\/2022.acl-long.556"},{"key":"e_1_3_3_80_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.acl-long.556"},{"key":"e_1_3_3_81_2","first-page":"2441","article-title":"Luna: Linear unified nested attention","volume":"34","author":"Ma Xuezhe","year":"2021","unstructured":"Xuezhe Ma, Xiang Kong, Sinong Wang, Chunting Zhou, Jonathan May, Hao Ma, and Luke Zettlemoyer. 2021. Luna: Linear unified nested attention. Adv. Neural Info. Process. Syst. 34 (2021), 2441\u20132453.","journal-title":"Adv. Neural Info. Process. Syst."},{"key":"e_1_3_3_82_2","doi-asserted-by":"publisher","DOI":"10.5555\/2002472.2002491"},{"key":"e_1_3_3_83_2","unstructured":"Ian McKenzie Alexander Lyzhov Alicia Parrish Ameya Prabhu Aaron Mueller Najoung Kim Sam Bowman and Ethan Perez. 2023. Inverse Scaling Prize: Second Round Winners. 
Retrieved from https:\/\/irmckenzie.co.uk\/round2"},{"key":"e_1_3_3_84_2","doi-asserted-by":"crossref","unstructured":"Ramesh Nallapati Bowen Zhou Caglar Gulcehre Bing Xiang et\u00a0al. 2016. Abstractive text summarization using sequence-to-sequence RNNs and beyond. Retrieved from https:\/\/arXiv:1602.06023","DOI":"10.18653\/v1\/K16-1028"},{"key":"e_1_3_3_85_2","doi-asserted-by":"crossref","unstructured":"Shashi Narayan Shay B. Cohen and Mirella Lapata. 2018. Don\u2019t give me the details just the summary! Topic-aware convolutional neural networks for extreme summarization. Retrieved from https:\/\/arXiv:1808.08745","DOI":"10.18653\/v1\/D18-1206"},{"key":"e_1_3_3_86_2","first-page":"660","article-title":"MS MARCO: A human generated machine reading comprehension dataset","volume":"2640","author":"Nguyen Tri","year":"2016","unstructured":"Tri Nguyen, Mir Rosenberg, Xia Song, Jianfeng Gao, Saurabh Tiwary, Rangan Majumder, and Li Deng. 2016. MS MARCO: A human generated machine reading comprehension dataset. Choice 2640 (2016), 660.","journal-title":"Choice"},{"key":"e_1_3_3_87_2","doi-asserted-by":"crossref","unstructured":"Yixin Nie Adina Williams Emily Dinan Mohit Bansal Jason Weston and Douwe Kiela. 2019. Adversarial NLI: A new benchmark for natural language understanding. Retrieved from https:\/\/arXiv:1910.14599","DOI":"10.18653\/v1\/2020.acl-main.441"},{"key":"e_1_3_3_88_2","unstructured":"OpenAI. [n.d.]. GPT-4 System Card. Retrieved from https:\/\/cdn.openai.com\/papers\/gpt-4-system-card.pdf"},{"key":"e_1_3_3_89_2","unstructured":"OpenAI. 2023. GPT-4 Technical Report. Retrieved from arxiv:2303.08774"},{"key":"e_1_3_3_90_2","first-page":"27730","article-title":"Training language models to follow instructions with human feedback","volume":"35","author":"Ouyang Long","year":"2022","unstructured":"Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray et\u00a0al. 2022. 
Training language models to follow instructions with human feedback. Adv. Neural Info. Process. Syst. 35 (2022), 27730\u201327744.","journal-title":"Adv. Neural Info. Process. Syst."},{"key":"e_1_3_3_91_2","unstructured":"Ankit Pal. 2022. Promptify: Structured Output from LLMs. Retrieved from https:\/\/github.com\/promptslab\/Promptify. Prompt-Engineering components for NLP tasks in Python."},{"key":"e_1_3_3_92_2","first-page":"311","volume-title":"Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics","author":"Papineni Kishore","year":"2002","unstructured":"Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. 311\u2013318."},{"key":"e_1_3_3_93_2","doi-asserted-by":"crossref","unstructured":"Arkil Patel Satwik Bhattamishra and Navin Goyal. 2021. Are NLP models really able to solve simple math word problems? Retrieved from https:\/\/arXiv:2103.07191","DOI":"10.18653\/v1\/2021.naacl-main.168"},{"key":"e_1_3_3_94_2","unstructured":"Bowen Peng Jeffrey Quesnelle Honglu Fan and Enrico Shippole. 2023. YaRN: Efficient context window extension of large language models. Retrieved from https:\/\/arXiv:2309.00071"},{"key":"e_1_3_3_95_2","unstructured":"Chengwei Qin Aston Zhang Zhuosheng Zhang Jiaao Chen Michihiro Yasunaga and Diyi Yang. 2023. Is ChatGPT a general-purpose natural language processing task solver? Retrieved from https:\/\/arXiv:2302.06476"},{"key":"e_1_3_3_96_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2022.04.020"},{"key":"e_1_3_3_97_2","doi-asserted-by":"publisher","DOI":"10.5555\/3455716.3455856"},{"key":"e_1_3_3_98_2","doi-asserted-by":"crossref","unstructured":"Pranav Rajpurkar Robin Jia and Percy Liang. 2018. Know what you don\u2019t know: Unanswerable questions for SQuAD. 
Retrieved from https:\/\/arXiv:1806.03822","DOI":"10.18653\/v1\/P18-2124"},{"key":"e_1_3_3_99_2","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00266"},{"key":"e_1_3_3_100_2","first-page":"15","article-title":"Transfer learning in natural language processing tutorial","author":"Ruder Sebastian","year":"2019","unstructured":"Sebastian Ruder, Matthew Peters, Swabha Swayamdipta, and Thomas Wolf. 2019. Transfer learning in natural language processing tutorial. In Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL HTL\u201919). 15.","journal-title":"Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL HTL\u201919)"},{"key":"e_1_3_3_101_2","unstructured":"Erik F. Sang and Fien De Meulder. 2003. Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition. Retrieved from https:\/\/arXiv:cs\/0306050"},{"key":"e_1_3_3_102_2","unstructured":"Victor Sanh Albert Webson Colin Raffel Stephen H. Bach Lintang Sutawika Zaid Alyafeai Antoine Chaffin Arnaud Stiegler Teven Le Scao Arun Raja et\u00a0al. 2021. Multitask prompted training enables zero-shot task generalization. Retrieved from https:\/\/arXiv:2110.08207"},{"key":"e_1_3_3_103_2","unstructured":"Teven Le Scao Angela Fan Christopher Akiki Ellie Pavlick Suzana Ili\u0107 Daniel Hesslow Roman Castagn\u00e9 Alexandra Sasha Luccioni Fran\u00e7ois Yvon Matthias Gall\u00e9 et\u00a0al. 2022. Bloom: A 176b-parameter open-access multilingual language model. Retrieved from https:\/\/arXiv:2211.05100"},{"key":"e_1_3_3_104_2","unstructured":"Lingfeng Shen Aayush Mishra and Daniel Khashabi. 2023. Do pretrained transformers really learn in-context by gradient descent? 
Retrieved from https:\/\/arXiv:2310.08540"},{"key":"e_1_3_3_105_2","first-page":"1631","volume-title":"Proceedings of the Conference on Empirical Methods in Natural Language Processing","author":"Socher Richard","year":"2013","unstructured":"Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D. Manning, Andrew Y. Ng, and Christopher Potts. 2013. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. 1631\u20131642."},{"key":"e_1_3_3_106_2","unstructured":"Aarohi Srivastava Abhinav Rastogi Abhishek Rao Abu Awal Md Shoeb Abubakar Abid Adam Fisch Adam R. Brown Adam Santoro Aditya Gupta Adri\u00e0 Garriga-Alonso et\u00a0al. 2022. Beyond the imitation game: Quantifying and extrapolating the capabilities of language models. Retrieved from https:\/\/arXiv:2206.04615"},{"key":"e_1_3_3_107_2","unstructured":"Ruixiang Tang Yu-Neng Chuang and Xia Hu. 2023. The science of detecting LLM-generated texts. Retrieved from https:\/\/arXiv:2303.07205"},{"key":"e_1_3_3_108_2","doi-asserted-by":"publisher","DOI":"10.1145\/3442381.3449950"},{"key":"e_1_3_3_109_2","unstructured":"Ruixiang Tang Xiaotian Han Xiaoqian Jiang and Xia Hu. 2023. Does synthetic data generation of llms help clinical text mining? Retrieved from https:\/\/arXiv:2303.04360"},{"key":"e_1_3_3_110_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.findings-acl.284"},{"key":"e_1_3_3_111_2","unstructured":"Rohan Taori Ishaan Gulrajani Tianyi Zhang Yann Dubois Xuechen Li Carlos Guestrin Percy Liang and Tatsunori B. Hashimoto. 2023. Stanford Alpaca: An Instruction-following LLaMA model. 
Retrieved from https:\/\/github.com\/tatsu-lab\/stanford_alpaca"},{"key":"e_1_3_3_112_2","first-page":"10183","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Tay Yi","year":"2021","unstructured":"Yi Tay, Dara Bahri, Donald Metzler, Da-Cheng Juan, Zhe Zhao, and Che Zheng. 2021. Synthesizer: Rethinking self-attention for transformer models. In Proceedings of the International Conference on Machine Learning. PMLR, 10183\u201310192."},{"key":"e_1_3_3_113_2","article-title":"DDXPlus: A new dataset for automatic medical diagnosis","author":"Tchango Arsene Fansi","year":"2022","unstructured":"Arsene Fansi Tchango, Rishab Goel, Zhi Wen, Julien Martel, and Joumana Ghosn. 2022. DDXPlus: A new dataset for automatic medical diagnosis. Proceedings of the Neural Information Processing Systems\u2014Track on Datasets and Benchmarks. Retrieved from https:\/\/arxiv.org\/abs\/2205.09148","journal-title":"Proceedings of the Neural Information Processing Systems\u2014Track on Datasets and Benchmarks"},{"key":"e_1_3_3_114_2","unstructured":"Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timoth\u00e9e Lacroix, Baptiste Rozi\u00e8re, Naman Goyal, Eric Hambro, Faisal Azhar, and others. 2023. LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971 (2023)."},{"key":"e_1_3_3_115_2","unstructured":"Jonathan Uesato, Nate Kushman, Ramana Kumar, Francis Song, Noah Siegel, Lisa Wang, Antonia Creswell, Geoffrey Irving, and Irina Higgins. 2022. Solving math word problems with process- and outcome-based feedback. Retrieved from https:\/\/arxiv.org\/abs\/2211.14275"},{"key":"e_1_3_3_116_2","article-title":"SuperGLUE: A stickier benchmark for general-purpose language understanding systems","volume":"32","author":"Wang Alex","year":"2019","unstructured":"Alex Wang, Yada Pruksachatkun, Nikita Nangia, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel Bowman. 2019. 
SuperGLUE: A stickier benchmark for general-purpose language understanding systems. Adv. Neural Info. Process. Syst. 32 (2019).","journal-title":"Adv. Neural Info. Process. Syst."},{"key":"e_1_3_3_117_2","doi-asserted-by":"crossref","unstructured":"Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman. 2018. GLUE: A multi-task benchmark and analysis platform for natural language understanding. Retrieved from https:\/\/arxiv.org\/abs\/1804.07461","DOI":"10.18653\/v1\/W18-5446"},{"key":"e_1_3_3_118_2","unstructured":"Jindong Wang, Xixu Hu, Wenxin Hou, Hao Chen, Runkai Zheng, Yidong Wang, Linyi Yang, Haojun Huang, Wei Ye, Xiubo Geng, et\u00a0al. 2023. On the robustness of ChatGPT: An adversarial and out-of-distribution perspective. Retrieved from https:\/\/arxiv.org\/abs\/2302.12095"},{"key":"e_1_3_3_119_2","doi-asserted-by":"crossref","unstructured":"Jiaan Wang, Yunlong Liang, Fandong Meng, Haoxiang Shi, Zhixu Li, Jinan Xu, Jianfeng Qu, and Jie Zhou. 2023. Is ChatGPT a good NLG evaluator? A preliminary study. Retrieved from https:\/\/arxiv.org\/abs\/2303.04048","DOI":"10.18653\/v1\/2023.newsum-1.1"},{"key":"e_1_3_3_120_2","first-page":"22964","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Wang Thomas","year":"2022","unstructured":"Thomas Wang, Adam Roberts, Daniel Hesslow, Teven Le Scao, Hyung Won Chung, Iz Beltagy, Julien Launay, and Colin Raffel. 2022. What language model architecture and pretraining objective works best for zero-shot generalization? In Proceedings of the International Conference on Machine Learning. PMLR, 22964\u201322984."},{"key":"e_1_3_3_121_2","doi-asserted-by":"crossref","unstructured":"Wenhui Wang, Hangbo Bao, Li Dong, Johan Bjorck, Zhiliang Peng, Qiang Liu, Kriti Aggarwal, Owais Khan Mohammed, Saksham Singhal, Subhojit Som, et\u00a0al. 2022. Image as a foreign language: BEiT pretraining for all vision and vision-language tasks. 
Retrieved from https:\/\/arxiv.org\/abs\/2208.10442","DOI":"10.1109\/CVPR52729.2023.01838"},{"key":"e_1_3_3_122_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.naacl-main.167"},{"key":"e_1_3_3_123_2","unstructured":"Jason Wei, Maarten Bosma, Vincent Y. Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M. Dai, and Quoc V. Le. 2021. Fine-tuned language models are zero-shot learners. Retrieved from https:\/\/arxiv.org\/abs\/2109.01652"},{"key":"e_1_3_3_124_2","article-title":"Emergent abilities of large language models","author":"Wei Jason","year":"2022","unstructured":"Jason Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, Sebastian Borgeaud, Dani Yogatama, Maarten Bosma, Denny Zhou, Donald Metzler, Ed H. Chi, Tatsunori Hashimoto, Oriol Vinyals, Percy Liang, Jeff Dean, and William Fedus. 2022. Emergent abilities of large language models. Trans. Mach. Learn. Res. (2022). Retrieved from https:\/\/openreview.net\/forum?id=yzkSU5zdwD","journal-title":"Trans. Mach. Learn. Res."},{"key":"e_1_3_3_125_2","doi-asserted-by":"crossref","unstructured":"Jason Wei, Yi Tay, and Quoc V. Le. 2022. Inverse scaling can become U-shaped. Retrieved from https:\/\/arxiv.org\/abs\/2211.02011","DOI":"10.18653\/v1\/2023.emnlp-main.963"},{"key":"e_1_3_3_126_2","unstructured":"Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Ed Chi, Quoc Le, and Denny Zhou. 2022. Chain of thought prompting elicits reasoning in large language models. Retrieved from https:\/\/arxiv.org\/abs\/2201.11903"},{"key":"e_1_3_3_127_2","doi-asserted-by":"crossref","unstructured":"Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, R\u00e9mi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, and Alexander M. Rush. 2020. HuggingFace\u2019s Transformers: State-of-the-art Natural Language Processing. 
Retrieved from https:\/\/arxiv.org\/abs\/1910.03771","DOI":"10.18653\/v1\/2020.emnlp-demos.6"},{"key":"e_1_3_3_128_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.00780"},{"key":"e_1_3_3_129_2","doi-asserted-by":"crossref","unstructured":"Jingfeng Yang, Aditya Gupta, Shyam Upadhyay, Luheng He, Rahul Goel, and Shachi Paul. 2022. TableFormer: Robust transformer modeling for table-text encoding. Retrieved from https:\/\/arxiv.org\/abs\/2203.00274","DOI":"10.18653\/v1\/2022.acl-long.40"},{"key":"e_1_3_3_130_2","doi-asserted-by":"crossref","unstructured":"Jingfeng Yang, Haoming Jiang, Qingyu Yin, Danqing Zhang, Bing Yin, and Diyi Yang. 2022. SEQZERO: Few-shot compositional semantic parsing with sequential prompts and zero-shot models. Retrieved from https:\/\/arxiv.org\/abs\/2205.07381","DOI":"10.18653\/v1\/2022.findings-naacl.5"},{"key":"e_1_3_3_131_2","first-page":"446","volume-title":"Proceedings of the 6th Conference on Machine Translation","author":"Yang Jian","year":"2021","unstructured":"Jian Yang, Shuming Ma, Haoyang Huang, Dongdong Zhang, Li Dong, Shaohan Huang, Alexandre Muzio, Saksham Singhal, Hany Hassan, Xia Song, and Furu Wei. 2021. Multilingual machine translation systems from Microsoft for WMT21 shared task. In Proceedings of the 6th Conference on Machine Translation. Association for Computational Linguistics, Online, 446\u2013455. Retrieved from https:\/\/aclanthology.org\/2021.wmt-1.54"},{"key":"e_1_3_3_132_2","unstructured":"Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao, and Karthik Narasimhan. 2023. Tree of thoughts: Deliberate problem solving with large language models. Retrieved from https:\/\/arxiv.org\/abs\/2305.10601"},{"key":"e_1_3_3_133_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1404"},{"key":"e_1_3_3_134_2","unstructured":"Kang Min Yoo, Dongju Park, Jaewook Kang, Sang-Woo Lee, and Woomyeong Park. 2021. GPT3Mix: Leveraging large-scale language models for text augmentation. 
Retrieved from https:\/\/arxiv.org\/abs\/2104.08826"},{"key":"e_1_3_3_135_2","unstructured":"Jiayi Yuan, Ruixiang Tang, Xiaoqian Jiang, and Xia Hu. 2023. LLM for patient-trial matching: Privacy-aware data augmentation towards better performance and generalizability. Retrieved from https:\/\/arxiv.org\/abs\/2303.16756"},{"key":"e_1_3_3_136_2","unstructured":"Daochen Zha, Zaid Pervaiz Bhat, Kwei-Herng Lai, Fan Yang, Zhimeng Jiang, Shaochen Zhong, and Xia Hu. 2023. Data-centric artificial intelligence: A survey. Retrieved from https:\/\/arxiv.org\/abs\/2303.10158"},{"key":"e_1_3_3_137_2","first-page":"11328","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Zhang Jingqing","year":"2020","unstructured":"Jingqing Zhang, Yao Zhao, Mohammad Saleh, and Peter Liu. 2020. PEGASUS: Pre-training with extracted gap-sentences for abstractive summarization. In Proceedings of the International Conference on Machine Learning. PMLR, 11328\u201311339."},{"key":"e_1_3_3_138_2","unstructured":"Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab, Xian Li, Xi Victoria Lin, et\u00a0al. 2022. OPT: Open pre-trained transformer language models. Retrieved from https:\/\/arxiv.org\/abs\/2205.01068"},{"key":"e_1_3_3_139_2","doi-asserted-by":"crossref","unstructured":"Tianyi Zhang, Faisal Ladhak, Esin Durmus, Percy Liang, Kathleen McKeown, and Tatsunori B. Hashimoto. 2023. Benchmarking large language models for news summarization. Retrieved from https:\/\/arxiv.org\/abs\/2301.13848","DOI":"10.1162\/tacl_a_00632"},{"key":"e_1_3_3_140_2","unstructured":"Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, et\u00a0al. 2023. A survey of large language models. 
Retrieved from https:\/\/arxiv.org\/abs\/2303.18223"},{"key":"e_1_3_3_141_2","first-page":"12697","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Zhao Zihao","year":"2021","unstructured":"Zihao Zhao, Eric Wallace, Shi Feng, Dan Klein, and Sameer Singh. 2021. Calibrate before use: Improving few-shot performance of language models. In Proceedings of the International Conference on Machine Learning. PMLR, 12697\u201312706."},{"key":"e_1_3_3_142_2","unstructured":"Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, and Dacheng Tao. 2023. Can ChatGPT understand too? A comparative study on ChatGPT and fine-tuned BERT. Retrieved from https:\/\/arxiv.org\/abs\/2302.10198"},{"key":"e_1_3_3_143_2","unstructured":"Ce Zhou, Qian Li, Chen Li, Jun Yu, Yixin Liu, Guangjing Wang, Kai Zhang, Cheng Ji, Qiben Yan, Lifang He, et\u00a0al. 2023. A comprehensive survey on pretrained foundation models: A history from BERT to ChatGPT. Retrieved from https:\/\/arxiv.org\/abs\/2302.09419"},{"key":"e_1_3_3_144_2","unstructured":"Kaiyang Zhou, Ziwei Liu, Yu Qiao, Tao Xiang, and Chen Change Loy. 2022. Domain generalization: A survey. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 45, 4 (2022), 4396\u20134415."}],"container-title":["ACM Transactions on Knowledge Discovery from Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3649506","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,4,27]],"date-time":"2024-04-27T12:07:06Z","timestamp":1714219626000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3649506"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,4,26]]},"references-count":143,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2024,7,31]]}},"alternative-id":["10.1145\/3649506"],"URL":"http:\/\/dx.doi.org\/10.1145\/3649506","relation":{},"ISSN":["1556-4681","1556-472X"],"issn-type":[{"value":"1556-4681","type":"print"},{"value":"1556-472X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,4,26]]},"assertion":[{"value":"2023-06-06","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-01-16","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-04-26","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}