{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,9]],"date-time":"2026-04-09T07:35:53Z","timestamp":1775720153657,"version":"3.50.1"},"reference-count":227,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2023,9,14]],"date-time":"2023-09-14T00:00:00Z","timestamp":1694649600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Comput. Surv."],"published-print":{"date-parts":[[2024,2,29]]},"abstract":"<jats:p>Large, pre-trained language models (PLMs) such as BERT and GPT have drastically changed the Natural Language Processing (NLP) field. For numerous NLP tasks, approaches leveraging PLMs have achieved state-of-the-art performance. The key idea is to learn a generic, latent representation of language from a generic task once, then share it across disparate NLP tasks. Language modeling serves as the generic task, one with abundant self-supervised text available for extensive training. This article presents the key fundamental concepts of PLM architectures and a comprehensive view of the shift to PLM-driven NLP techniques. It surveys work applying the pre-training then fine-tuning, prompting, and text generation approaches. In addition, it discusses PLM limitations and suggested directions for future research.<\/jats:p>","DOI":"10.1145\/3605943","type":"journal-article","created":{"date-parts":[[2023,6,27]],"date-time":"2023-06-27T12:32:43Z","timestamp":1687869163000},"page":"1-40","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":977,"title":["Recent Advances in Natural Language Processing via Large Pre-trained Language Models: A Survey"],"prefix":"10.1145","volume":"56","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6114-8418","authenticated-orcid":false,"given":"Bonan","family":"Min","sequence":"first","affiliation":[{"name":"Amazon AWS AI Labs, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2288-3847","authenticated-orcid":false,"given":"Hayley","family":"Ross","sequence":"additional","affiliation":[{"name":"Harvard University, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9859-7313","authenticated-orcid":false,"given":"Elior","family":"Sulem","sequence":"additional","affiliation":[{"name":"University of Pennsylvania, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6826-8113","authenticated-orcid":false,"given":"Amir Pouran Ben","family":"Veyseh","sequence":"additional","affiliation":[{"name":"University of Oregon, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3768-4736","authenticated-orcid":false,"given":"Thien Huu","family":"Nguyen","sequence":"additional","affiliation":[{"name":"University of Oregon, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0890-7670","authenticated-orcid":false,"given":"Oscar","family":"Sainz","sequence":"additional","affiliation":[{"name":"University of the Basque Country (UPV\/EHU), Spain"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0195-4899","authenticated-orcid":false,"given":"Eneko","family":"Agirre","sequence":"additional","affiliation":[{"name":"University of the Basque Country (UPV\/EHU), Spain"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2707-2376","authenticated-orcid":false,"given":"Ilana","family":"Heintz","sequence":"additional","affiliation":[{"name":"Synoptic Engineering, 
USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-1447-5173","authenticated-orcid":false,"given":"Dan","family":"Roth","sequence":"additional","affiliation":[{"name":"University of Pennsylvania, USA"}]}],"member":"320","published-online":{"date-parts":[[2023,9,14]]},"reference":[{"key":"e_1_3_2_2_2","volume-title":"Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Abend Omri","year":"2013","unstructured":"Omri Abend and Ari Rappoport. 2013. Universal Conceptual Cognitive Annotation (UCCA). In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)."},{"key":"e_1_3_2_3_2","unstructured":"Zeyuan Allen-Zhu and Yuanzhi Li. 2021. Towards understanding ensemble knowledge distillation and self-distillation in deep learning. https:\/\/arxiv.org\/abs\/2012.09816"},{"key":"e_1_3_2_4_2","unstructured":"Asaf Amrami and Yoav Goldberg. 2019. Towards better substitution-based word sense induction. https:\/\/arxiv.org\/abs\/1905.12598"},{"key":"e_1_3_2_5_2","doi-asserted-by":"crossref","unstructured":"Mikel Artetxe Jingfei Du Naman Goyal Luke Zettlemoyer and Ves Stoyanov. 2022. On the Role of Bidirectionality in Language Model Pre-Training. https:\/\/arxiv.org\/abs\/2205.11726","DOI":"10.18653\/v1\/2022.findings-emnlp.293"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-main.27"},{"key":"e_1_3_2_7_2","unstructured":"Stephen H. Bach Victor Sanh Zheng-Xin Yong Albert Webson Colin Raffel Nihal V. Nayak Abheesht Sharma Taewoon Kim M. Saiful Bari Thibault Fevry Zaid Alyafeai Manan Dey Andrea Santilli Zhiqing Sun Srulik Ben-David Canwen Xu Gunjan Chhablani Han Wang Jason Alan Fries Maged S. Al-shaibani Shanya Sharma Urmish Thakker Khalid Almubarak Xiangru Tang Dragomir Radev Mike Tian-Jian Jiang and Alexander M. Rush. 2022. PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts. https:\/\/arxiv.org\/abs\/2202.01279"},{"key":"e_1_3_2_8_2","unstructured":"Geoff Bacon and Terry Regier. 2019. Does BERT agree? Evaluating knowledge of structure dependence through agreement relations. https:\/\/arxiv.org\/abs\/1908.09892"},{"key":"e_1_3_2_9_2","volume-title":"Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse","author":"Banarescu Laura","year":"2013","unstructured":"Laura Banarescu, Claire Bonial, Shu Cai, Madalina Georgescu, Kira Griffitt, Ulf Hermjakob, Kevin Knight, Philipp Koehn, Martha Palmer, and Nathan Schneider. 2013. Abstract meaning representation for sembanking. In Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse."},{"key":"e_1_3_2_10_2","unstructured":"Jack Bandy and Nicholas Vincent. 2021. Addressing \u201cDocumentation Debt\u201d in Machine Learning Research: A Retrospective Datasheet for BookCorpus. https:\/\/arxiv.org\/abs\/2105.05241"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1165"},{"key":"e_1_3_2_12_2","unstructured":"Eyal Ben-David Nadav Oved and Roi Reichart. 2021. PADA: A prompt-based autoregressive approach for adaptation to unseen domains. 
https:\/\/arxiv.org\/abs\/2102.12206"},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1145\/3442188.3445922"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.463"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D15-1075"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.152"},{"key":"e_1_3_2_17_2","first-page":"15834","article-title":"The lottery ticket hypothesis for pre-trained BERT networks","volume":"33","author":"Chen Tianlong","year":"2020","unstructured":"Tianlong Chen, Jonathan Frankle, Shiyu Chang, Sijia Liu, Yang Zhang, Zhangyang Wang, and Michael Carbin. 2020. The lottery ticket hypothesis for pre-trained BERT networks. Advances in Neural Information Processing Systems 33 (2020), 15834\u201315846.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_18_2","doi-asserted-by":"crossref","unstructured":"Xiang Chen Ningyu Zhang Xin Xie Shumin Deng Yunzhi Yao Chuanqi Tan Fei Huang Luo Si and Huajun Chen. 2021. KnowPrompt: Knowledge-aware prompt-tuning with synergistic optimization for relation extraction. https:\/\/arxiv.org\/abs\/2104.07650","DOI":"10.1145\/3485447.3511998"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.spnlp-1.9"},{"key":"e_1_3_2_20_2","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Clark Christopher","year":"2019","unstructured":"Christopher Clark, Kenton Lee, Ming-Wei Chang, Tom Kwiatkowski, Michael Collins, and Kristina Toutanova. 2019. BoolQ: Exploring the surprising difficulty of natural yes\/no questions. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics, Minneapolis, Minnesota."},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W19-4828"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1145\/1390156.1390177"},{"key":"e_1_3_2_23_2","doi-asserted-by":"crossref","unstructured":"Alexis Conneau Kartikay Khandelwal Naman Goyal Vishrav Chaudhary Guillaume Wenzek Francisco Guzm\u00e1n Edouard Grave Myle Ott Luke Zettlemoyer and Veselin Stoyanov. 2020. Unsupervised cross-lingual representation learning at scale. https:\/\/arxiv.org\/abs\/1911.02116","DOI":"10.18653\/v1\/2020.acl-main.747"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D18-1269"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.eacl-main.301"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1007\/BF00994018"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.findings-acl.161"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.5555\/2463101"},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1285"},{"key":"e_1_3_2_30_2","first-page":"933","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Dauphin Yann N.","year":"2017","unstructured":"Yann N. Dauphin, Angela Fan, Michael Auli, and David Grangier. 2017. Language modeling with gated convolutional networks. In Proceedings of the International Conference on Machine Learning. 
PMLR, 933\u2013941."},{"key":"e_1_3_2_31_2","volume-title":"Proceedings of the 9th International Conference on Learning Representations (ICLR\u201921)","author":"Cao Nicola De","year":"2021","unstructured":"Nicola De Cao, Gautier Izacard, Sebastian Riedel, and Fabio Petroni. 2021. Autoregressive entity retrieval. In Proceedings of the 9th International Conference on Learning Representations (ICLR\u201921)."},{"key":"e_1_3_2_32_2","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies."},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.emnlp-main.98"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-main.49"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.acl-short.83"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.naacl-main.70"},{"key":"e_1_3_2_37_2","unstructured":"Avia Efrat and Omer Levy. 2020. The turking test: Can language models understand instructions? https:\/\/arxiv.org\/abs\/2010.11982"},{"issue":"19","key":"e_1_3_2_38_2","article-title":"Why does unsupervised pre-training help deep learning?","volume":"11","author":"Erhan Dumitru","year":"2010","unstructured":"Dumitru Erhan, Yoshua Bengio, Aaron Courville, Pierre-Antoine Manzagol, Pascal Vincent, and Samy Bengio. 2010. Why does unsupervised pre-training help deep learning? Journal of Machine Learning Research 11, 19 (2010).","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","unstructured":"Jack W. Rae Sebastian Borgeaud Trevor Cai Katie Millican Jordan Hoffmann Francis Song John Aslanides Sarah Henderson Roman Ring Susannah Young et\u00a0al. 2021. Scaling Language Models: Methods Analysis and Insights from Training Gopher. DOI:10.48550\/ARXIV.2112.11446","DOI":"10.48550\/ARXIV.2112.11446"},{"key":"e_1_3_2_40_2","first-page":"1877","volume-title":"Advances in Neural Information Processing Systems","author":"al. Tom Brown et","year":"2020","unstructured":"Tom Brown et al. 2020. Language models are few-shot learners. In Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin (Eds.), Vol. 33. Curran Associates, Inc., 1877\u20131901."},{"key":"e_1_3_2_41_2","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations","year":"2020","unstructured":"Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Remi Louf, Morgan Funtowicz, et\u00a0al. 2020. Transformers: State-of-the-art natural language processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations."},{"key":"e_1_3_2_42_2","unstructured":"Victor Sanh Albert Webson Colin Raffel Stephen H. Bach Lintang Sutawika Zaid Alyafeai Antoine Chaffin Arnaud Stiegler Teven Le Scao Arun Raja et\u00a0al. 2021. Multitask prompted training enables zero-shot task generalization.
arxiv:2110.08207."},{"key":"e_1_3_2_43_2","unstructured":"Yu Sun et al. 2021. ERNIE 3.0: Large-scale knowledge enhanced pre-training for language understanding and generation. https:\/\/arxiv.org\/abs\/2107.02137"},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00298"},{"key":"e_1_3_2_45_2","doi-asserted-by":"crossref","unstructured":"Joe Davison Joshua Feldman and Alexander Rush. 2019. Commonsense knowledge mining from pretrained models. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics. 1173\u20131178. https:\/\/arxiv.org\/abs\/1909.00505","DOI":"10.18653\/v1\/D19-1109"},{"key":"e_1_3_2_46_2","article-title":"Probing and fine-tuning reading comprehension models for few-shot event extraction","author":"Feng Rui","year":"2020","unstructured":"Rui Feng, Jie Yuan, and Chao Zhang. 2020. Probing and fine-tuning reading comprehension models for few-shot event extraction. arXiv:2010.11325.","journal-title":"arXiv:2010.11325"},{"key":"e_1_3_2_47_2","unstructured":"Steven Fincke Shantanu Agarwal Scott Miller and Elizabeth Boschee. 2021. Language model priming for cross-lingual event extraction. https:\/\/arxiv.org\/abs\/2109.12383"},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.7551\/mitpress\/7585.001.0001"},{"key":"e_1_3_2_49_2","article-title":"The pile: An 800GB dataset of diverse text for language modeling","author":"Gao Leo","year":"2020","unstructured":"Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, Charles Foster, Jason Phang, Horace He, Anish Thite, Noa Nabeshima, et\u00a0al. 2020. The pile: An 800GB dataset of diverse text for language modeling. arXiv:2101.00027.","journal-title":"arXiv:2101.00027"},{"key":"e_1_3_2_50_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.nlp4convai-1.10"},{"key":"e_1_3_2_51_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W19-5932"},{"key":"e_1_3_2_52_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.acl-long.295"},{"key":"e_1_3_2_53_2","volume-title":"Findings of the Association for Computational Linguistics: EMNLP 2020","author":"Gehman Samuel","year":"2020","unstructured":"Samuel Gehman, Suchin Gururangan, Maarten Sap, Yejin Choi, and Noah A. Smith. 2020. RealToxicityPrompts: Evaluating neural toxic degeneration in language models. In Findings of the Association for Computational Linguistics: EMNLP 2020. Online."},{"key":"e_1_3_2_54_2","unstructured":"Yoav Goldberg. 2019. Assessing BERT\u2019s syntactic abilities. https:\/\/arxiv.org\/abs\/1901.05287"},{"key":"e_1_3_2_55_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N18-1108"},{"key":"e_1_3_2_56_2","doi-asserted-by":"crossref","unstructured":"Demi Guo Alexander M. Rush and Yoon Kim. 2021. Parameter-efficient transfer learning with diff pruning. arXiv:2012.07463.","DOI":"10.18653\/v1\/2021.acl-long.378"},{"key":"e_1_3_2_57_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1533"},{"key":"e_1_3_2_58_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.acl-long.381"},{"key":"e_1_3_2_59_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.eacl-main.261"},{"key":"e_1_3_2_60_2","unstructured":"Xu Han Weilin Zhao Ning Ding Zhiyuan Liu and Maosong Sun. 2021. PTR: Prompt tuning with rules for text classification.
https:\/\/arxiv.org\/abs\/2105.11259"},{"key":"e_1_3_2_61_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.eacl-main.316"},{"key":"e_1_3_2_62_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.772"},{"key":"e_1_3_2_63_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.244"},{"key":"e_1_3_2_64_2","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"e_1_3_2_65_2","doi-asserted-by":"publisher","DOI":"10.1145\/3467017"},{"key":"e_1_3_2_66_2","volume-title":"International Conference on Machine Learning","author":"Houlsby Neil","year":"2019","unstructured":"Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin De Laroussilhe, Andrea Gesmundo, Mona Attariyan, and Sylvain Gelly. 2019. Parameter-efficient transfer learning for NLP. In International Conference on Machine Learning. PMLR."},{"key":"e_1_3_2_67_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P18-1031"},{"key":"e_1_3_2_68_2","volume-title":"Findings of the Association for Computational Linguistics (ACL-IJCNLP\u201921)","author":"Hsu Chao-Chun","year":"2021","unstructured":"Chao-Chun Hsu, Eric Lind, Luca Soldaini, and Alessandro Moschitti. 2021. Answer generation for retrieval-based question answering systems. In Findings of the Association for Computational Linguistics (ACL-IJCNLP\u201921)."},{"key":"e_1_3_2_69_2","doi-asserted-by":"crossref","unstructured":"Patrick Huber Armen Aghajanyan Barlas O\u011fuz Dmytro Okhonko Wen-tau Yih Sonal Gupta and Xilun Chen. 2021. CCQA: A new web-scale question answering dataset for model pre-training. https:\/\/arxiv.org\/abs\/2110.07731","DOI":"10.18653\/v1\/2022.findings-naacl.184"},{"key":"e_1_3_2_70_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.conll-1.49"},{"key":"e_1_3_2_71_2","unstructured":"Minyoung Huh Pulkit Agrawal and Alexei A. Efros. 2016. What makes ImageNet good for transfer learning? arXiv:1608.08614."},{"key":"e_1_3_2_72_2","first-page":"2","volume-title":"Proceedings of the 6th Linguistic Annotation Workshop","author":"Ivanova Angelina","year":"2012","unstructured":"Angelina Ivanova, Stephan Oepen, Lilja \u00d8vrelid, and Dan Flickinger. 2012. Who did what to whom? A contrastive study of syntacto-semantic dependencies. In Proceedings of the 6th Linguistic Annotation Workshop. Association for Computational Linguistics, 2\u201311."},{"key":"e_1_3_2_73_2","doi-asserted-by":"crossref","unstructured":"Peter Izsak Moshe Berchansky and Omer Levy. 2021. How to train BERT with an academic budget. https:\/\/arxiv.org\/abs\/2104.07705","DOI":"10.18653\/v1\/2021.emnlp-main.831"},{"key":"e_1_3_2_74_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/S19-2002"},{"key":"e_1_3_2_75_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-main.479"},{"key":"e_1_3_2_76_2","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00324"},{"key":"e_1_3_2_77_2","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00300"},{"key":"e_1_3_2_78_2","doi-asserted-by":"crossref","unstructured":"Mandar Joshi Danqi Chen Yinhan Liu Daniel S. Weld Luke Zettlemoyer and Omer Levy. 2020. SpanBERT: Improving pre-training by representing and predicting spans. Transactions of the Association for Computational Linguistics 8 (2020) 64\u201377.","DOI":"10.1162\/tacl_a_00300"},{"key":"e_1_3_2_79_2","doi-asserted-by":"crossref","unstructured":"Mandar Joshi Omer Levy Daniel S. Weld and Luke Zettlemoyer. 2019. BERT for coreference resolution: Baselines and analysis.
arxiv:1908.09091.","DOI":"10.18653\/v1\/D19-1588"},{"key":"e_1_3_2_80_2","article-title":"Scaling laws for neural language models","author":"Kaplan Jared","year":"2020","unstructured":"Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. 2020. Scaling laws for neural language models. arXiv:2001.08361.","journal-title":"arXiv:2001.08361"},{"key":"e_1_3_2_81_2","volume-title":"Findings of the Association for Computational Linguistics: EMNLP 2020","author":"Khashabi Daniel","year":"2020","unstructured":"Daniel Khashabi, Sewon Min, Tushar Khot, Ashish Sabharwal, Oyvind Tafjord, Peter Clark, and Hannaneh Hajishirzi. 2020. UNIFIEDQA: Crossing format boundaries with a single QA system. In Findings of the Association for Computational Linguistics: EMNLP 2020."},{"key":"e_1_3_2_82_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1445"},{"key":"e_1_3_2_83_2","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00447"},{"key":"e_1_3_2_84_2","volume-title":"Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021","author":"Kumar Sawan","year":"2021","unstructured":"Sawan Kumar and Partha Talukdar. 2021. Reordering examples helps during priming-based few-shot learning. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021. Association for Computational Linguistics, Online."},{"key":"e_1_3_2_85_2","doi-asserted-by":"crossref","unstructured":"John Lafferty and Chengxiang Zhai. 2001. Document language models query models and risk minimization for information retrieval. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. 111\u2013119.","DOI":"10.1145\/383952.383970"},{"key":"e_1_3_2_86_2","unstructured":"John D. Lafferty Andrew McCallum and Fernando C. N. Pereira. 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the Eighteenth International Conference on Machine Learning. 282\u2013289."},{"key":"e_1_3_2_87_2","unstructured":"Huiyuan Lai Antonio Toral and Malvina Nissim. 2021. Thank you BART! Rewarding pre-trained models improves formality style transfer. https:\/\/arxiv.org\/abs\/2105.06947"},{"key":"e_1_3_2_88_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D18-1549"},{"key":"e_1_3_2_89_2","article-title":"ALBERT: A lite BERT for self-supervised learning of language representations","author":"Lan Zhenzhong","year":"2020","unstructured":"Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. 2020. ALBERT: A lite BERT for self-supervised learning of language representations. arXiv:1909.11942.","journal-title":"arXiv:1909.11942"},{"key":"e_1_3_2_90_2","volume-title":"Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Scao Teven Le","year":"2021","unstructured":"Teven Le Scao and Alexander Rush. 2021. How many data points is a prompt worth? In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.
Association for Computational Linguistics, Online."},{"key":"e_1_3_2_91_2","doi-asserted-by":"publisher","DOI":"10.1038\/nature14539"},{"key":"e_1_3_2_92_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N18-2108"},{"key":"e_1_3_2_93_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.emnlp-main.243"},{"key":"e_1_3_2_94_2","volume-title":"Proceedings of the 13th International Conference on the Principles of Knowledge Representation and Reasoning.","author":"Levesque Hector","year":"2012","unstructured":"Hector Levesque, Ernest Davis, and Leora Morgenstern. 2012. The Winograd schema challenge. In Proceedings of the 13th International Conference on the Principles of Knowledge Representation and Reasoning."},{"key":"e_1_3_2_95_2","volume-title":"Advances in Neural Information Processing Systems","author":"Levy Omer","year":"2014","unstructured":"Omer Levy and Yoav Goldberg. 2014. Neural word embedding as implicit matrix factorization. Advances in Neural Information Processing Systems, Vol. 27. Curran Associates, Inc."},{"key":"e_1_3_2_96_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/K17-1034"},{"key":"e_1_3_2_97_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.703"},{"key":"e_1_3_2_98_2","volume-title":"Findings of the Association for Computational Linguistics: EMNLP 2020","author":"Li Fayuan","year":"2020","unstructured":"Fayuan Li, Weihua Peng, Yuguang Chen, Quan Wang, Lu Pan, Yajuan Lyu, and Yong Zhu. 2020. Event extraction as multi-turn question answering. In Findings of the Association for Computational Linguistics: EMNLP 2020. Association for Computational Linguistics, Online."},{"key":"e_1_3_2_99_2","unstructured":"Junyi Li Tianyi Tang Wayne Xin Zhao and Ji-Rong Wen. 2021. Pretrained language models for text generation: A survey. https:\/\/arxiv.org\/abs\/2105.10311"},{"key":"e_1_3_2_100_2","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1198"},{"key":"e_1_3_2_101_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.naacl-main.69"},{"key":"e_1_3_2_102_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-5505"},{"key":"e_1_3_2_103_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.519"},{"key":"e_1_3_2_104_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1129"},{"key":"e_1_3_2_105_2","unstructured":"Xiang Lisa Li and Percy Liang. 2021. Prefix-tuning: Optimizing continuous prompts for generation. (2021). https:\/\/arxiv.org\/abs\/2101.00190"},{"key":"e_1_3_2_106_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.acl-long.510"},{"key":"e_1_3_2_107_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.465"},{"key":"e_1_3_2_108_2","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00115"},{"key":"e_1_3_2_109_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-main.128"},{"key":"e_1_3_2_110_2","unstructured":"Jiachang Liu Dinghan Shen Yizhe Zhang Bill Dolan Lawrence Carin and Weizhu Chen. 2021. What makes good in-context examples for GPT-3? https:\/\/arxiv.org\/abs\/2101.06804"},{"key":"e_1_3_2_111_2","unstructured":"Pengfei Liu Weizhe Yuan Jinlan Fu Zhengbao Jiang Hiroaki Hayashi and Graham Neubig. 2021. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. arXiv:2107.13586."},{"key":"e_1_3_2_112_2","unstructured":"Xiao Liu Yanan Zheng Zhengxiao Du Ming Ding Yujie Qian Zhilin Yang and Jie Tang. 2021. GPT understands, too.
https:\/\/arxiv.org\/abs\/2103.10385"},{"key":"e_1_3_2_113_2","article-title":"RoBERTa: A robustly optimized BERT pretraining approach","author":"Liu Yinhan","year":"2019","unstructured":"Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv:1907.11692.","journal-title":"arXiv:1907.11692"},{"key":"e_1_3_2_114_2","doi-asserted-by":"crossref","unstructured":"Robert L. Logan IV Ivana Bala\u017eevi\u0107 Eric Wallace Fabio Petroni Sameer Singh and Sebastian Riedel. 2021. Cutting down on prompts and parameters: Simple few-shot learning with language models. https:\/\/arxiv.org\/abs\/2106.13353","DOI":"10.18653\/v1\/2022.findings-acl.222"},{"key":"e_1_3_2_115_2","doi-asserted-by":"crossref","unstructured":"Yao Lu Max Bartolo Alastair Moore Sebastian Riedel and Pontus Stenetorp. 2021. Fantastically ordered prompts and where to find them: Overcoming few-shot prompt order sensitivity. https:\/\/arxiv.org\/abs\/2104.08786","DOI":"10.18653\/v1\/2022.acl-long.556"},{"key":"e_1_3_2_116_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.acl-long.217"},{"key":"e_1_3_2_117_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.acl-short.42"},{"key":"e_1_3_2_118_2","volume-title":"The CHILDES Project: Tools for Analyzing Talk. Transcription Format and Programs","author":"MacWhinney Brian","year":"2000","unstructured":"Brian MacWhinney. 2000. The CHILDES Project: Tools for Analyzing Talk. Transcription Format and Programs. Vol. 1. Psychology Press."},{"key":"e_1_3_2_119_2","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00304"},{"key":"e_1_3_2_120_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N18-2089"},{"key":"e_1_3_2_121_2","article-title":"Efficient estimation of word representations in vector space","author":"Mikolov Tomas","year":"2013","unstructured":"Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. arXiv:1301.3781.","journal-title":"arXiv:1301.3781"},{"key":"e_1_3_2_122_2","doi-asserted-by":"crossref","unstructured":"Swaroop Mishra Daniel Khashabi Chitta Baral and Hannaneh Hajishirzi. 2021. Cross-task generalization via natural language crowdsourcing instructions. https:\/\/arxiv.org\/abs\/2104.08773","DOI":"10.18653\/v1\/2022.acl-long.244"},{"key":"e_1_3_2_123_2","doi-asserted-by":"crossref","unstructured":"Mahdi Namazifar Alexandros Papangelis Gokhan Tur and Dilek Hakkani-T\u00fcr. 2020. Language model is all you need: Natural language understanding as question answering. https:\/\/arxiv.org\/abs\/2011.03023","DOI":"10.1109\/ICASSP39728.2021.9413810"},{"key":"e_1_3_2_124_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1451"},{"key":"e_1_3_2_125_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.eacl-demos.10"},{"key":"e_1_3_2_126_2","unstructured":"Xuan-Phi Nguyen Shafiq Joty Steven C. H. Hoi and Richard Socher. 2020. Tree-structured attention with hierarchical accumulation. https:\/\/arxiv.org\/abs\/2002.08046"},{"key":"e_1_3_2_127_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.441"},{"key":"e_1_3_2_128_2","volume-title":"Findings of the Association for Computational Linguistics (EMNLP)","author":"Nogueira Rodrigo","year":"2020","unstructured":"Rodrigo Nogueira, Zhiying Jiang, Ronak Pradeep, and Jimmy Lin. 2020. Document ranking with a pretrained sequence-to-sequence model.
In Findings of the Association for Computational Linguistics (EMNLP)."},{"key":"e_1_3_2_129_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-main.134"},{"key":"e_1_3_2_130_2","unstructured":"Long Ouyang Jeff Wu Xu Jiang Diogo Almeida Carroll L. Wainwright Pamela Mishkin Chong Zhang Sandhini Agarwal Katarina Slama Alex Ray John Schulman Jacob Hilton Fraser Kelton Luke Miller Maddie Simens Amanda Askell Peter Welinder Paul Christiano Jan Leike and Ryan Lowe. 2022. Training language models to follow instructions with human feedback. https:\/\/arxiv.org\/abs\/2203.02155"},{"key":"e_1_3_2_131_2","volume-title":"Proceedings of the 9th International Conference on Learning Representations (ICLR\u201921)","author":"Paolini Giovanni","year":"2021","unstructured":"Giovanni Paolini, Ben Athiwaratkun, Jason Krone, Jie Ma, Alessandro Achille, Rishita Anubhai, Cicero dos Santos Nogueira, Bing Xiang, and Stefano Soatto. 2021. Structured prediction as translation between augmented natural languages. In Proceedings of the 9th International Conference on Learning Representations (ICLR\u201921)."},{"key":"e_1_3_2_132_2","doi-asserted-by":"publisher","DOI":"10.1109\/MC.2022.3148714"},{"key":"e_1_3_2_133_2","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1162"},{"key":"e_1_3_2_134_2","unstructured":"Ethan Perez Douwe Kiela and Kyunghyun Cho. 2021. True few-shot learning with language models. https:\/\/arxiv.org\/abs\/2105.11447"},{"key":"e_1_3_2_135_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N18-1202"},{"key":"e_1_3_2_136_2","volume-title":"Proceedings of AKBC","author":"Petroni Fabio","year":"2020","unstructured":"Fabio Petroni, Patrick Lewis, Aleksandra Piktus, Tim Rockt\u00e4schel, Yuxiang Wu, Alexander H. Miller, and Sebastian Riedel. 2020. How context affects language models\u2019 factual predictions. In Proceedings of AKBC."},{"key":"e_1_3_2_137_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1250"},{"key":"e_1_3_2_138_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-demos.7"},{"key":"e_1_3_2_139_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-main.617"},{"key":"e_1_3_2_140_2","doi-asserted-by":"crossref","unstructured":"Jay M. Ponte and W. Bruce Croft. 2017. A language modeling approach to information retrieval. ACM SIGIR Forum 51 2 (2017) 202\u2013208.","DOI":"10.1145\/3130348.3130368"},{"key":"e_1_3_2_141_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-main.259"},{"key":"e_1_3_2_142_2","unstructured":"Raul Puri and Bryan Catanzaro. 2019. Zero-shot text classification with generative language models. https:\/\/arxiv.org\/abs\/1912.10165"},{"key":"e_1_3_2_143_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.naacl-main.410"},{"key":"e_1_3_2_144_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11431-020-1647-3"},{"key":"e_1_3_2_145_2","first-page":"12","article-title":"Improving language understanding by generative pre-training","author":"Radford Alec","year":"2018","unstructured":"Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. 2018. Improving language understanding by generative pre-training. OpenAI Blog (2018), 12.","journal-title":"OpenAI Blog"},{"issue":"8","key":"e_1_3_2_146_2","first-page":"9","article-title":"Language models are unsupervised multitask learners","volume":"1","year":"2019","unstructured":"Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners. 
OpenAI Blog 1, 8 (2019), 9.","journal-title":"OpenAI Blog"},{"key":"e_1_3_2_147_2","series-title":"Proceedings of Machine Learning Research","first-page":"2435","volume-title":"Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics","volume":"108","author":"Radiya-Dixit Evani","year":"2020","unstructured":"Evani Radiya-Dixit and Xin Wang. 2020. How fine can fine-tuning be? Learning efficient language models. In Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (Proceedings of Machine Learning Research, Vol. 108), Silvia Chiappa and Roberto Calandra (Eds.). 2435\u20132443."},{"key":"e_1_3_2_148_2","volume-title":"Journal of Machine Learning Research","author":"Raffel Colin","year":"2020","unstructured":"Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2020. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research (2020)."},{"key":"e_1_3_2_149_2","doi-asserted-by":"crossref","unstructured":"Pranav Rajpurkar Robin Jia and Percy Liang. 2018. Know what you don\u2019t know: Unanswerable questions for SQuAD. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Association for Computational Linguistics. 784\u2013789.","DOI":"10.18653\/v1\/P18-2124"},{"key":"e_1_3_2_150_2","volume-title":"Findings of the Association for Computational Linguistics (ACL-IJCNLP\u201921)","author":"Ren Liliang","year":"2021","unstructured":"Liliang Ren, Chenkai Sun, Heng Ji, and Julia Hockenmaier. 2021. HySPA: Hybrid span generation for scalable text-to-graph extraction. In Findings of the Association for Computational Linguistics (ACL-IJCNLP\u201921)."},{"key":"e_1_3_2_151_2","doi-asserted-by":"publisher","DOI":"10.1145\/3411763.3451760"},{"key":"e_1_3_2_152_2","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00349"},{"key":"e_1_3_2_153_2","doi-asserted-by":"publisher","DOI":"10.1145\/3366423.3380064"},{"key":"e_1_3_2_154_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-main.689"},{"key":"e_1_3_2_155_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.eacl-main.228"},{"key":"e_1_3_2_156_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.emnlp-main.92"},{"key":"e_1_3_2_157_2","first-page":"44","volume-title":"Proceedings of the 11th Global Wordnet Conference","author":"Sainz Oscar","year":"2021","unstructured":"Oscar Sainz and German Rigau. 2021. Ask2Transformers: Zero-shot domain labelling with pretrained language models. In Proceedings of the 11th Global Wordnet Conference. Global Wordnet Association, University of South Africa (UNISA), 44\u201352."},{"key":"e_1_3_2_158_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.240"},{"key":"e_1_3_2_159_2","unstructured":"Victor Sanh Lysandre Debut Julien Chaumond and Thomas Wolf. 2020. DistilBERT, a distilled version of BERT: Smaller, faster, cheaper and lighter. https:\/\/arxiv.org\/abs\/1910.01108"},{"key":"e_1_3_2_160_2","unstructured":"Victor Sanh Albert Webson Colin Raffel Stephen H. Bach Lintang Sutawika Zaid Alyafeai Antoine Chaffin Arnaud Stiegler Teven Le Scao Arun Raja Manan Dey M.
Saiful Bari Canwen Xu Urmish Thakker Shanya Sharma Sharma Eliza Szczechla Taewoon Kim Gunjan Chhablani Nihal Nayak Debajyoti Datta Jonathan Chang Mike Tian-Jian Jiang Han Wang Matteo Manica Sheng Shen Zheng Xin Yong Harshit Pandey Rachel Bawden Thomas Wang Trishala Neeraj Jos Rozen Abheesht Sharma Andrea Santilli Thibault Fevry Jason Alan Fries Ryan Teehan Tali Bers Stella Biderman Leo Gao Thomas Wolf and Alexander M. Rush. 2022. Multitask prompted training enables zero-shot task generalization. https:\/\/arxiv.org\/abs\/2110.08207"},{"key":"e_1_3_2_161_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.coling-main.488"},{"key":"e_1_3_2_162_2","unstructured":"Timo Schick and Hinrich Sch\u00fctze. 2019. Rare words: A major problem for contextualized embeddings and how to fix it by attentive mimicking. https:\/\/arxiv.org\/abs\/1904.06707"},{"key":"e_1_3_2_163_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.368"},{"key":"e_1_3_2_164_2","unstructured":"Timo Schick and Hinrich Sch\u00fctze. 2020. Few-shot text generation with pattern-exploiting training. (2020). https:\/\/arxiv.org\/abs\/2012.11926"},{"key":"e_1_3_2_165_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.eacl-main.20"},{"key":"e_1_3_2_166_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.naacl-main.185"},{"key":"e_1_3_2_167_2","doi-asserted-by":"crossref","unstructured":"Timo Schick and Hinrich Sch\u00fctze. 2021. Generating datasets with pretrained language models. https:\/\/arxiv.org\/abs\/2104.07540","DOI":"10.18653\/v1\/2021.emnlp-main.555"},{"key":"e_1_3_2_168_2","article-title":"Self-diagnosis and self-debiasing: A proposal for reducing corpus-based bias in NLP","volume":"9","author":"Schick Timo","year":"2021","unstructured":"Timo Schick, Sahana Udupa, and Hinrich Sch\u00fctze. 2021. Self-diagnosis and self-debiasing: A proposal for reducing corpus-based bias in NLP. Transactions of the Association for Computational Linguistics 9 (2021).","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"e_1_3_2_169_2","doi-asserted-by":"publisher","DOI":"10.1145\/3381831"},{"key":"e_1_3_2_170_2","doi-asserted-by":"crossref","unstructured":"Richard Shin Christopher Lin Sam Thomson Charles Chen Subhro Roy Emmanouil Antonios Platanios Adam Pauls Dan Klein Jason Eisner and Benjamin Van Durme. 2021. Constrained language models yield few-shot semantic parsers.","DOI":"10.18653\/v1\/2021.emnlp-main.608"},{"key":"e_1_3_2_171_2","doi-asserted-by":"crossref","unstructured":"Taylor Shin Yasaman Razeghi Robert L. Logan IV Eric Wallace and Sameer Singh. 2020. AutoPrompt: Eliciting knowledge from language models with automatically generated prompts. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) Association for Computational Linguistics. 4222\u20134235.","DOI":"10.18653\/v1\/2020.emnlp-main.346"},{"key":"e_1_3_2_172_2","unstructured":"F\u00e1bio Souza Rodrigo Nogueira and Roberto Lotufo. 2020. Portuguese named entity recognition using BERT-CRF. https:\/\/arxiv.org\/abs\/1909.10649"},{"key":"e_1_3_2_173_2","first-page":"5986","volume-title":"Proceedings of the 36th International Conference on Machine Learning","author":"Stickland Asa Cooper","year":"2019","unstructured":"Asa Cooper Stickland and Iain Murray. 2019. BERT and PALs: Projected attention layers for efficient adaptation in multi-task learning. In Proceedings of the 36th International Conference on Machine Learning.
PMLR, 5986\u20135995."},{"key":"e_1_3_2_174_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1355"},{"key":"e_1_3_2_175_2","first-page":"37","volume-title":"Proceedings of the 1st International Workshop on NLP Solutions for Under Resourced Languages (NSURL\u201919) co-located with ICNLSP 2019 - Short Papers","author":"Taher Ehsan","year":"2019","unstructured":"Ehsan Taher, Seyed Abbas Hoseini, and Mehrnoush Shamsfard. 2019. Beheshti-NER: Persian named entity recognition using BERT. In Proceedings of the 1st International Workshop on NLP Solutions for Under Resourced Languages (NSURL\u201919) co-located with ICNLSP 2019 - Short Papers. Association for Computational Linguistics, Trento, Italy, 37\u201342."},{"key":"e_1_3_2_176_2","doi-asserted-by":"crossref","unstructured":"Alon Talmor Yanai Elazar Yoav Goldberg and Jonathan Berant. 2020. oLMpics\u2014On what language model pre-training captures. https:\/\/arxiv.org\/abs\/1912.13283","DOI":"10.1162\/tacl_a_00342"},{"key":"e_1_3_2_177_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1452"},{"key":"e_1_3_2_178_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N18-1074"},{"key":"e_1_3_2_179_2","unstructured":"Hugo Touvron Thibaut Lavril Gautier Izacard Xavier Martinet Marie-Anne Lachaux Timoth\u00e9e Lacroix Baptiste Rozi\u00e8re Naman Goyal Eric Hambro Faisal Azhar Aurelien Rodriguez Armand Joulin Edouard Grave and Guillaume Lample. 2023. LLaMA: Open and efficient foundation language models. https:\/\/arxiv.org\/abs\/2302.13971"},{"key":"e_1_3_2_180_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D18-1503"},{"key":"e_1_3_2_181_2","unstructured":"Trieu H. Trinh and Quoc V. Le. 2019. A simple method for commonsense reasoning. https:\/\/arxiv.org\/abs\/1806.02847"},{"key":"e_1_3_2_182_2","unstructured":"Maria Tsimpoukelli Jacob Menick Serkan Cabi S. M. Ali Eslami Oriol Vinyals and Felix Hill. 2021. Multimodal few-shot learning with frozen language models. https:\/\/arxiv.org\/abs\/2106.13884"},{"key":"e_1_3_2_183_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-main.180"},{"key":"e_1_3_2_184_2","volume-title":"Advances in Neural Information Processing Systems","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in Neural Information Processing Systems, Vol. 30. Curran Associates, Inc."},{"key":"e_1_3_2_185_2","article-title":"Pointer networks","volume":"28","author":"Vinyals Oriol","year":"2015","unstructured":"Oriol Vinyals, Meire Fortunato, and Navdeep Jaitly. 2015. Pointer networks. Advances in Neural Information Processing Systems, Vol. 28.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_186_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1221"},{"key":"e_1_3_2_187_2","unstructured":"Ben Wang. 2021. Mesh-transformer-JAX: Model-parallel implementation of transformer language model with JAX. Retrieved May 2021 from https:\/\/github.com\/kingoflolz\/mesh-transformer-jax."},{"key":"e_1_3_2_188_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.acl-long.251"},{"key":"e_1_3_2_189_2","unstructured":"Sinong Wang Han Fang Madian Khabsa Hanzi Mao and Hao Ma. 2021. Entailment as few-shot learner. https:\/\/arxiv.org\/abs\/2104.14690"},{"key":"e_1_3_2_190_2","unstructured":"Thomas Wang Adam Roberts Daniel Hesslow Teven Le Scao Hyung Won Chung Iz Beltagy Julien Launay and Colin Raffel.
2022. What language model architecture and pretraining objective work best for zero-shot generalization? https:\/\/arxiv.org\/abs\/2204.05832"},{"key":"e_1_3_2_191_2","doi-asserted-by":"crossref","unstructured":"Xinyu Wang Yong Jiang Nguyen Bach Tao Wang Zhongqiang Huang Fei Huang and Kewei Tu. 2021. Automated concatenation of embeddings for structured prediction. https:\/\/arxiv.org\/abs\/2010.05006","DOI":"10.18653\/v1\/2021.acl-long.206"},{"key":"e_1_3_2_192_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.emnlp-main.340"},{"key":"e_1_3_2_193_2","volume-title":"Proceedings of the Society for Computation in Linguistics 2020","author":"Warstadt Alex","year":"2020","unstructured":"Alex Warstadt, Alicia Parrish, Haokun Liu, Anhad Mohananey, Wei Peng, Sheng-Fu Wang, and Samuel R. Bowman. 2020. BLiMP: A benchmark of linguistic minimal pairs for English. In Proceedings of the Society for Computation in Linguistics 2020."},{"key":"e_1_3_2_194_2","unstructured":"Alex Warstadt Amanpreet Singh and Samuel R. Bowman. 2019. CoLA: The corpus of linguistic acceptability (with added annotations). http:\/\/nyu-mll.github.io\/cola."},{"key":"e_1_3_2_195_2","doi-asserted-by":"crossref","unstructured":"Albert Webson and Ellie Pavlick. 2021. Do prompt-based models really understand the meaning of their prompts? https:\/\/arxiv.org\/abs\/2109.01247","DOI":"10.18653\/v1\/2022.naacl-main.167"},{"key":"e_1_3_2_196_2","unstructured":"Jason Wei Maarten Bosma Vincent Y. Zhao Kelvin Guu Adams Wei Yu Brian Lester Nan Du Andrew M. Dai and Quoc V. Le. 2021. Finetuned language models are zero-shot learners. https:\/\/arxiv.org\/abs\/2109.01652"},{"key":"e_1_3_2_197_2","article-title":"Ethical and social risks of harm from language models","author":"Weidinger Laura","year":"2021","unstructured":"Laura Weidinger, John Mellor, Maribeth Rauh, Conor Griffin, Jonathan Uesato, Po-Sen Huang, Myra Cheng, Mia Glaese, Borja Balle, Atoosa Kasirzadeh, et\u00a0al. 2021. Ethical and social risks of harm from language models. arXiv:2112.04359.","journal-title":"arXiv:2112.04359"},{"key":"e_1_3_2_198_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N18-1101"},{"key":"e_1_3_2_199_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.622"},{"key":"e_1_3_2_200_2","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00363"},{"key":"e_1_3_2_201_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.acl-long.420"},{"key":"e_1_3_2_202_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.acl-long.188"},{"key":"e_1_3_2_203_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.acl-long.451"},{"key":"e_1_3_2_204_2","volume-title":"Advances in Neural Information Processing Systems","author":"Yang Zhilin","year":"2019","unstructured":"Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Russ R. Salakhutdinov, and Quoc V. Le. 2019. XLNet: Generalized autoregressive pretraining for language understanding. Advances in Neural Information Processing Systems, Vol. 32. Curran Associates, Inc."},{"key":"e_1_3_2_205_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.acl-long.122"},{"key":"e_1_3_2_206_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1404"},{"key":"e_1_3_2_207_2","volume-title":"Advances in Neural Information Processing Systems","author":"Yosinski Jason","year":"2014","unstructured":"Jason Yosinski, Jeff Clune, Yoshua Bengio, and Hod Lipson. 2014. How transferable are features in deep neural networks? Advances in Neural Information Processing Systems, Vol. 27.
Curran Associates, Inc."},{"key":"e_1_3_2_208_2","unstructured":"Wenhao Yu Chenguang Zhu Zaitang Li Zhiting Hu Qingyun Wang Heng Ji and Meng Jiang. 2021. A survey of knowledge-enhanced text generation. https:\/\/arxiv.org\/abs\/2010.04389"},{"key":"e_1_3_2_209_2","unstructured":"Weizhe Yuan Graham Neubig and Pengfei Liu. 2021. BARTScore: Evaluating generated text as text generation. https:\/\/arxiv.org\/abs\/2106.11520"},{"key":"e_1_3_2_210_2","volume-title":"Proceedings of the 5th Swiss Text Analytics Conference and the 16th Conference on Natural Language Processing (SwissText\/KONVENS\u201920)","volume":"2007","author":"Zaczynska Karolina","year":"2020","unstructured":"Karolina Zaczynska, Nils Feldhus, Robert Schwarzenberg, Aleksandra Gabryszak, and Sebastian M\u00f6ller. 2020. Evaluating German transformer language models with syntactic agreement tests. In Proceedings of the 5th Swiss Text Analytics Conference and the 16th Conference on Natural Language Processing (SwissText\/KONVENS\u201920). CoRR abs\/2007.03765."},{"key":"e_1_3_2_211_2","doi-asserted-by":"crossref","unstructured":"Elad Ben Zaken Shauli Ravfogel and Yoav Goldberg. 2021. BitFit: Simple parameter-efficient fine-tuning for transformer-based masked language-models. https:\/\/arxiv.org\/abs\/2106.10199","DOI":"10.18653\/v1\/2022.acl-short.1"},{"key":"e_1_3_2_212_2","doi-asserted-by":"crossref","unstructured":"Jeffrey O. Zhang Alexander Sax Amir Zamir Leonidas Guibas and Jitendra Malik. 2020. Side-Tuning: A baseline for network adaptation via additive side networks. https:\/\/arxiv.org\/abs\/1912.13503","DOI":"10.1007\/978-3-030-58580-8_41"},{"key":"e_1_3_2_213_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1009"},{"key":"e_1_3_2_214_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.acl-short.64"},{"key":"e_1_3_2_215_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.acl-long.90"},{"key":"e_1_3_2_216_2","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2020\/560"},{"key":"e_1_3_2_217_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i05.6510"},{"key":"e_1_3_2_218_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-main.174"},{"key":"e_1_3_2_219_2","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2020\/546"},{"key":"e_1_3_2_220_2","unstructured":"Tony Z. Zhao Eric Wallace Shi Feng Dan Klein and Sameer Singh. 2021. Calibrate before use: Improving few-shot performance of language models. https:\/\/arxiv.org\/abs\/2102.09690"},{"key":"e_1_3_2_221_2","volume-title":"Findings of the Association for Computational Linguistics (EMNLP)","author":"Zhong Ruiqi","year":"2021","unstructured":"Ruiqi Zhong, Kristy Lee, Zheng Zhang, and Dan Klein. 2021. Adapting language models for zero-shot learning by meta-tuning on dataset and prompt collections. In Findings of the Association for Computational Linguistics (EMNLP)."},{"key":"e_1_3_2_222_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.naacl-main.398"},{"key":"e_1_3_2_223_2","volume-title":"Findings of the Association for Computational Linguistics: EMNLP 2020","author":"Zhou Junru","year":"2020","unstructured":"Junru Zhou, Zhuosheng Zhang, Hai Zhao, and Shuailiang Zhang. 2020. LIMIT-BERT: Linguistics informed multi-task BERT. In Findings of the Association for Computational Linguistics: EMNLP 2020. Association for Computational Linguistics, Online."},{"key":"e_1_3_2_224_2","unstructured":"Li Zhou and Kevin Small. 2020. Multi-domain dialogue state tracking as dynamic knowledge graph enhanced question answering.
https:\/\/arxiv.org\/abs\/1911.06192"},{"key":"e_1_3_2_225_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.397"},{"key":"e_1_3_2_226_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i16.17720"},{"key":"e_1_3_2_227_2","unstructured":"Jinhua Zhu Yingce Xia Lijun Wu Di He Tao Qin Wengang Zhou Houqiang Li and Tie-Yan Liu. 2020. Incorporating BERT into neural machine translation. https:\/\/arxiv.org\/abs\/2002.06823"},{"key":"e_1_3_2_228_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.11"}],"container-title":["ACM Computing Surveys"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3605943","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3605943","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:36:19Z","timestamp":1750178179000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3605943"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,9,14]]},"references-count":227,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2024,2,29]]}},"alternative-id":["10.1145\/3605943"],"URL":"https:\/\/doi.org\/10.1145\/3605943","relation":{},"ISSN":["0360-0300","1557-7341"],"issn-type":[{"value":"0360-0300","type":"print"},{"value":"1557-7341","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,9,14]]},"assertion":[{"value":"2022-01-09","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-06-08","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-09-14","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}
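
Editor's note (not part of the Crossref record above): the JSON above is a Crossref REST API "work" response. The minimal Python sketch below shows how a record like it can be retrieved and its main fields read. It assumes the third-party requests package is installed; the DOI and the field names (message, title, container-title, is-referenced-by-count, reference) are taken directly from the record, while everything else (variable names, printed fields) is illustrative only.

import requests

# DOI taken from the record above.
DOI = "10.1145/3605943"

# The Crossref REST API returns the same envelope as the record above:
# {"status": "ok", "message-type": "work", ..., "message": {...}}
resp = requests.get(f"https://api.crossref.org/works/{DOI}", timeout=30)
resp.raise_for_status()
work = resp.json()["message"]

print(work["title"][0])                # article title
print(work["container-title"][0])      # journal: ACM Computing Surveys
print(work["is-referenced-by-count"])  # citation count at query time
print(work["references-count"])        # 227 in the record above

# References are deposited as a list of dicts under "reference"; an entry
# may carry a resolved "DOI", an "unstructured" citation string, or both.
cited_dois = [r["DOI"] for r in work.get("reference", []) if "DOI" in r]
print(len(cited_dois), "cited works were deposited with a resolved DOI")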