{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,4]],"date-time":"2026-06-04T15:18:28Z","timestamp":1780586308868,"version":"3.54.1"},"reference-count":288,"publisher":"Association for Computing Machinery (ACM)","issue":"9","license":[{"start":{"date-parts":[[2024,4,25]],"date-time":"2024-04-25T00:00:00Z","timestamp":1714003200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62222215"],"award-info":[{"award-number":["62222215"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Beijing Natural Science Foundation","award":["4222027 and L233008"],"award-info":[{"award-number":["4222027 and L233008"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Comput. Surv."],"published-print":{"date-parts":[[2024,10,31]]},"abstract":"<jats:p>Text Generation aims to produce plausible and readable text in human language from input data. The resurgence of deep learning has greatly advanced this field, in particular, with the help of neural generation models based on pre-trained language models (PLMs). Text generation based on PLMs is viewed as a promising approach in both academia and industry. In this article, we provide a survey on the utilization of PLMs in text generation. We begin with introducing two key aspects of applying PLMs to text generation: (1) how to design an effective PLM to serve as the generation model; and (2) how to effectively optimize PLMs given the reference text and to ensure that the generated texts satisfy special text properties. Then, we show the major challenges that have arisen in these aspects, as well as possible solutions for them. 
We also include a summary of various useful resources and typical text generation applications based on PLMs. Finally, we highlight the future research directions which will further improve these PLMs for text generation. This comprehensive survey is intended to help researchers interested in text generation problems to learn the core concepts, the main techniques and the latest developments in this area based on PLMs.<\/jats:p>","DOI":"10.1145\/3649449","type":"journal-article","created":{"date-parts":[[2024,3,7]],"date-time":"2024-03-07T11:49:07Z","timestamp":1709812147000},"page":"1-39","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":162,"title":["Pre-Trained Language Models for Text Generation: A Survey"],"prefix":"10.1145","volume":"56","author":[{"ORCID":"https:\/\/orcid.org\/0009-0007-0480-5593","authenticated-orcid":false,"given":"Junyi","family":"Li","sequence":"first","affiliation":[{"name":"Renmin University of China, Beijing, China and Universit\u00e9 de Montr\u00e9al, Montr\u00e9al, Canada"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4352-8222","authenticated-orcid":false,"given":"Tianyi","family":"Tang","sequence":"additional","affiliation":[{"name":"Renmin University of China, Beijing, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8333-6196","authenticated-orcid":false,"given":"Wayne Xin","family":"Zhao","sequence":"additional","affiliation":[{"name":"Renmin University of China, Beijing, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1556-3335","authenticated-orcid":false,"given":"Jian-Yun","family":"Nie","sequence":"additional","affiliation":[{"name":"Universit\u00e9 de Montr\u00e9al, Montr\u00e9al, 
Canada"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9777-9676","authenticated-orcid":false,"given":"Ji-Rong","family":"Wen","sequence":"additional","affiliation":[{"name":"Renmin University of China, Beijing, China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2024,4,25]]},"reference":[{"key":"e_1_3_2_2_2","unstructured":"Daniel Adiwardana Minh-Thang Luong David R. So Jamie Hall Noah Fiedel Romal Thoppilan Zi Yang Apoorv Kulshreshtha Gaurav Nemade Yifeng Lu and Quoc V. Le. 2020. Towards a human-like open-domain chatbot. CoRR abs\/2001.09977 (2020)."},{"key":"e_1_3_2_3_2","volume-title":"WMT@ACL","author":"Agarwal Abhaya","year":"2008","unstructured":"Abhaya Agarwal and Alon Lavie. 2008. Meteor, M-BLEU and M-TER: Evaluation metrics for high-correlation with human rankings of machine translation output. In WMT@ACL."},{"key":"e_1_3_2_4_2","unstructured":"Alpaca-LoRA. 2023. Instruct-Tune LLaMA on Consumer Hardware. https:\/\/github.com\/tloen\/alpaca-lora. (2023)."},{"key":"e_1_3_2_5_2","unstructured":"Amanda Askell Yuntao Bai Anna Chen Dawn Drain Deep Ganguli Tom Henighan Andy Jones Nicholas Joseph Benjamin Mann Nova DasSarma Nelson Elhage Zac Hatfield-Dodds Danny Hernandez Jackson Kernion Kamal Ndousse Catherine Olsson Dario Amodei Tom B. Brown Jack Clark Sam McCandlish Chris Olah and Jared Kaplan. 2021. A general language assistant as a laboratory for alignment. CoRR abs\/2112.00861 (2021)."},{"key":"e_1_3_2_6_2","volume-title":"ICLR","author":"Bahdanau Dzmitry","year":"2015","unstructured":"Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In ICLR."},{"key":"e_1_3_2_7_2","volume-title":"ACL","author":"Bai Yu","year":"2021","unstructured":"Yu Bai, Yang Gao, and Heyan Huang. 2021. Cross-lingual abstractive summarization with limited parallel resources. 
In ACL."},{"key":"e_1_3_2_8_2","volume-title":"IEEvaluation@ACL","author":"Banerjee Satanjeev","year":"2005","unstructured":"Satanjeev Banerjee and Alon Lavie. 2005. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In IEEvaluation@ACL."},{"key":"e_1_3_2_9_2","unstructured":"Hangbo Bao Li Dong Furu Wei Wenhui Wang Nan Yang Xiaodong Liu Yu Wang Jianfeng Gao Songhao Piao Ming Zhou and Hsiao-Wuen Hon. 2020. UniLMv2: Pseudo-masked language models for unified language model pre-training. In ICML (Proceedings of Machine Learning Research Vol. 119). PMLR 642\u2013652."},{"key":"e_1_3_2_10_2","volume-title":"ACL","author":"Bao Siqi","year":"2020","unstructured":"Siqi Bao, Huang He, Fan Wang, Hua Wu, and Haifeng Wang. 2020. PLATO: Pre-trained dialogue generation model with discrete latent variable. In ACL."},{"key":"e_1_3_2_11_2","volume-title":"EACL","author":"Belz Anja","year":"2006","unstructured":"Anja Belz and Ehud Reiter. 2006. Comparing automatic and human evaluation of NLG systems. In EACL."},{"key":"e_1_3_2_12_2","volume-title":"EMNLP","author":"Bi Bin","year":"2020","unstructured":"Bin Bi, Chenliang Li, Chen Wu, Ming Yan, Wei Wang, Songfang Huang, Fei Huang, and Luo Si. 2020. PALM: Pre-training an autoencoding & autoregressive language model for context-conditioned generation. In EMNLP."},{"key":"e_1_3_2_13_2","volume-title":"1st Conference on Machine Translation (WMT 2016, colocated with ACL 2016)","author":"Bojar Ondrej","year":"2016","unstructured":"Ondrej Bojar, Yvette Graham, Amir Kamran, and Milos Stanojevic. 2016. Results of the WMT16 metrics shared task. In 1st Conference on Machine Translation (WMT 2016, colocated with ACL 2016)."},{"key":"e_1_3_2_14_2","article-title":"A statistical approach to machine translation","author":"Brown Peter F.","year":"1990","unstructured":"Peter F. Brown, John Cocke, Stephen Della Pietra, Vincent J. Della Pietra, Frederick Jelinek, John D. Lafferty, Robert L. Mercer, and Paul S. 
Roossin. 1990. A statistical approach to machine translation. Comput. Linguistics (1990).","journal-title":"Comput. Linguistics"},{"key":"e_1_3_2_15_2","article-title":"An estimate of an upper bound for the entropy of english","author":"Brown Peter F.","year":"1992","unstructured":"Peter F. Brown, Stephen Della Pietra, Vincent J. Della Pietra, Jennifer C. Lai, and Robert L. Mercer. 1992. An estimate of an upper bound for the entropy of english. Comput. Linguistics (1992).","journal-title":"Comput. Linguistics"},{"key":"e_1_3_2_16_2","volume-title":"TMI","author":"Brown Ralf","year":"1995","unstructured":"Ralf Brown and Robert Frederking. 1995. Applying statistical english language modeling to symbolic machine translation. In TMI."},{"key":"e_1_3_2_17_2","unstructured":"Tom B. Brown Benjamin Mann Nick Ryder Melanie Subbiah Jared Kaplan Prafulla Dhariwal Arvind Neelakantan Pranav Shyam Girish Sastry Amanda Askell Sandhini Agarwal Ariel Herbert-Voss Gretchen Krueger Tom Henighan Rewon Child Aditya Ramesh Daniel M. Ziegler Jeffrey Wu Clemens Winter Christopher Hesse Mark Chen Eric Sigler Mateusz Litwin Scott Gray Benjamin Chess Jack Clark Christopher Berner Sam McCandlish Alec Radford Ilya Sutskever and Dario Amodei. 2020. Language models are few-shot learners. In NeurIPS Hugo Larochelle Marc\u2019Aurelio Ranzato Raia Hadsell Maria-Florina Balcan and Hsuan-Tien Lin (Eds.)."},{"key":"e_1_3_2_18_2","unstructured":"S\u00e9bastien Bubeck Varun Chandrasekaran Ronen Eldan Johannes Gehrke Eric Horvitz Ece Kamar Peter Lee Yin Tat Lee Yuanzhi Li Scott M. Lundberg Harsha Nori Hamid Palangi Marco T\u00falio Ribeiro and Yi Zhang. 2023. Sparks of artificial general intelligence: Early experiments with GPT-4. CoRR abs\/2303.12712 (2023)."},{"key":"e_1_3_2_19_2","volume-title":"NGT@EMNLP-IJCNLP","author":"Budzianowski Pawel","year":"2019","unstructured":"Pawel Budzianowski and Ivan Vulic. 2019. 
\u201cHello, it\u2019s GPT-2 - How can I help you?\u201d Towards the use of pretrained language models for task-oriented dialogue systems. In NGT@EMNLP-IJCNLP."},{"key":"e_1_3_2_20_2","volume-title":"EMNLP","author":"Cahyawijaya Samuel","year":"2021","unstructured":"Samuel Cahyawijaya, Genta Indra Winata, Bryan Wilie, Karissa Vincentio, Xiaohong Li, Adhiguna Kuncoro, Sebastian Ruder, Zhi Yuan Lim, Syafri Bahar, Masayu Leylia Khodra, Ayu Purwarianti, and Pascale Fung. 2021. IndoNLG: Benchmark and resources for evaluating indonesian natural language generation. In EMNLP."},{"key":"e_1_3_2_21_2","volume-title":"3rd Workshop on Statistical Machine Translation, WMT@ACL 2008","author":"Callison-Burch Chris","year":"2008","unstructured":"Chris Callison-Burch, Cameron S. Fordyce, Philipp Koehn, Christof Monz, and Josh Schroeder. 2008. Further meta-evaluation of machine translation. In 3rd Workshop on Statistical Machine Translation, WMT@ACL 2008."},{"key":"e_1_3_2_22_2","volume-title":"ACL\/IJCNLP","author":"Cao Shuyang","year":"2021","unstructured":"Shuyang Cao and Lu Wang. 2021. Controllable open-ended question generation with a new question type ontology. In ACL\/IJCNLP."},{"key":"e_1_3_2_23_2","article-title":"Evaluation of text generation: A survey","author":"Celikyilmaz Asli","year":"2020","unstructured":"Asli Celikyilmaz, Elizabeth Clark, and Jianfeng Gao. 2020. Evaluation of text generation: A survey. arXiv (2020).","journal-title":"arXiv"},{"key":"e_1_3_2_24_2","article-title":"VisualGPT: Data-efficient adaptation of pretrained language models for image captioning","author":"Chen Jun","year":"2021","unstructured":"Jun Chen, Han Guo, Kai Yi, Boyang Li, and Mohamed Elhoseiny. 2021. VisualGPT: Data-efficient adaptation of pretrained language models for image captioning. arXiv (2021).","journal-title":"arXiv"},{"key":"e_1_3_2_25_2","volume-title":"EMNLP","author":"Chen Jiaao","year":"2020","unstructured":"Jiaao Chen and Diyi Yang. 2020. 
Multi-view sequence-to-sequence models with conversational structure for abstractive dialogue summarization. In EMNLP."},{"key":"e_1_3_2_26_2","volume-title":"EMNLP","author":"Chen Jiaao","year":"2021","unstructured":"Jiaao Chen and Diyi Yang. 2021. Simple conversational data augmentation for semi-supervised abstractive dialogue summarization. In EMNLP."},{"key":"e_1_3_2_27_2","volume-title":"NAACL-HLT","author":"Chen Jiaao","year":"2021","unstructured":"Jiaao Chen and Diyi Yang. 2021. Structure-aware abstractive conversation summarization via discourse and action graphs. In NAACL-HLT."},{"key":"e_1_3_2_28_2","volume-title":"NeurIPS","author":"Chen Liqun","year":"2018","unstructured":"Liqun Chen, Shuyang Dai, Chenyang Tao, Haichao Zhang, Zhe Gan, Dinghan Shen, Yizhe Zhang, Guoyin Wang, Ruiyi Zhang, and Lawrence Carin. 2018. Adversarial text generation via feature-mover\u2019s distance. In NeurIPS."},{"key":"e_1_3_2_29_2","article-title":"Extending context window of large language models via positional interpolation","volume":"2306","author":"Chen Shouyuan","year":"2023","unstructured":"Shouyuan Chen, Sherman Wong, Liangjian Chen, and Yuandong Tian. 2023. Extending context window of large language models via positional interpolation. CoRR abs\/2306.15595 (2023).","journal-title":"CoRR"},{"key":"e_1_3_2_30_2","volume-title":"EMNLP","author":"Chen Wenhu","year":"2020","unstructured":"Wenhu Chen, Yu Su, Xifeng Yan, and William Yang Wang. 2020. KGPT: Knowledge-grounded pre-training for data-to-text generation. In EMNLP."},{"key":"e_1_3_2_31_2","article-title":"Microsoft COCO captions: Data collection and evaluation server","author":"Chen Xinlei","year":"2015","unstructured":"Xinlei Chen, Hao Fang, Tsung-Yi Lin, Ramakrishna Vedantam, Saurabh Gupta, Piotr Doll\u00e1r, and C. Lawrence Zitnick. 2015. Microsoft COCO captions: Data collection and evaluation server. 
arXiv (2015).","journal-title":"arXiv"},{"key":"e_1_3_2_32_2","volume-title":"ACL","author":"Chen Yen-Chun","year":"2020","unstructured":"Yen-Chun Chen, Zhe Gan, Yu Cheng, Jingzhou Liu, and Jingjing Liu. 2020. Distilling knowledge learned in BERT for text generation. In ACL."},{"key":"e_1_3_2_33_2","volume-title":"AAAI","author":"Chen Yi-Syuan","year":"2021","unstructured":"Yi-Syuan Chen and Hong-Han Shuai. 2021. Meta-transfer learning for low-resource abstractive summarization. In AAAI."},{"key":"e_1_3_2_34_2","volume-title":"ACL","author":"Chen Zhiyu","year":"2020","unstructured":"Zhiyu Chen, Harini Eavani, Wenhu Chen, Yinyin Liu, and William Yang Wang. 2020. Few-shot NLG with pre-trained language model. In ACL."},{"key":"e_1_3_2_35_2","volume-title":"ACL","author":"Chiang David Cheng-Han","year":"2023","unstructured":"David Cheng-Han Chiang and Hung-yi Lee. 2023. Can large language models be an alternative to human evaluations?. In ACL."},{"key":"e_1_3_2_36_2","volume-title":"NeurIPS","author":"Conneau Alexis","year":"2019","unstructured":"Alexis Conneau and Guillaume Lample. 2019. Cross-lingual language model pretraining. In NeurIPS."},{"key":"e_1_3_2_37_2","unstructured":"Marta R. Costa-juss\u00e0 James Cross Onur \u00c7elebi Maha Elbayad Kenneth Heafield Kevin Heffernan Elahe Kalbassi Janice Lam Daniel Licht Jean Maillard Anna Y. Sun Skyler Wang Guillaume Wenzek Al Youngblood Bapi Akula Lo\u00efc Barrault Gabriel Mejia Gonzalez Prangthip Hansanti John Hoffman Semarley Jarrett Kaushik Ram Sadagopan Dirk Rowe Shannon Spruit Chau Tran Pierre Andrews Necip Fazil Ayan Shruti Bhosale Sergey Edunov Angela Fan Cynthia Gao Vedanuj Goswami Francisco Guzm\u00e1n Philipp Koehn Alexandre Mourachko Christophe Ropers Safiyyah Saleem Holger Schwenk and Jeff Wang. 2022. No language left behind: Scaling human-centered machine translation. CoRR abs\/2207.04672 (2022)."},{"key":"e_1_3_2_38_2","unstructured":"Raj Dabre. YANMTT: Yet Another Neural Machine Translation Toolkit. 
(n.d.). https:\/\/github.com\/prajdabre\/yanmtt"},{"key":"e_1_3_2_39_2","article-title":"A survey of multilingual neural machine translation","author":"Dabre Raj","year":"2020","unstructured":"Raj Dabre, Chenhui Chu, and Anoop Kunchukuttan. 2020. A survey of multilingual neural machine translation. CSUR (2020).","journal-title":"CSUR"},{"key":"e_1_3_2_40_2","volume-title":"EMNLP","author":"Dabre Raj","year":"2019","unstructured":"Raj Dabre, Atsushi Fujita, and Chenhui Chu. 2019. Exploiting multilingualism through multistage fine-tuning for low-resource neural machine translation. In EMNLP."},{"key":"e_1_3_2_41_2","volume-title":"Findings of ACL","author":"Dabre Raj","year":"2022","unstructured":"Raj Dabre, Himani Shrotriya, Anoop Kunchukuttan, Ratish Puduppully, Mitesh M. Khapra, and Pratyush Kumar. 2022. IndicBART: A pre-trained model for indic natural language generation. In Findings of ACL."},{"key":"e_1_3_2_42_2","volume-title":"ICLR","author":"Dathathri Sumanth","year":"2020","unstructured":"Sumanth Dathathri, Andrea Madotto, Janice Lan, Jane Hung, Eric Frank, Piero Molino, Jason Yosinski, and Rosanne Liu. 2020. Plug and play language models: A simple approach to controlled text generation. In ICLR."},{"key":"e_1_3_2_43_2","article-title":"LLM.int8(): 8-Bit matrix multiplication for transformers at scale","volume":"2208","author":"Dettmers Tim","year":"2022","unstructured":"Tim Dettmers, Mike Lewis, Younes Belkada, and Luke Zettlemoyer. 2022. LLM.int8(): 8-Bit matrix multiplication for transformers at scale. CoRR abs\/2208.07339 (2022).","journal-title":"CoRR"},{"key":"e_1_3_2_44_2","volume-title":"NAACL-HLT","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. 
In NAACL-HLT."},{"key":"e_1_3_2_45_2","article-title":"DAGA: Data augmentation with a generation approach for low-resource tagging tasks","author":"Ding Bosheng","year":"2020","unstructured":"Bosheng Ding, Linlin Liu, Lidong Bing, Canasai Kruengkrai, Thien Hai Nguyen, Shafiq R. Joty, Luo Si, and Chunyan Miao. 2020. DAGA: Data augmentation with a generation approach for low-resource tagging tasks. arXiv (2020).","journal-title":"arXiv"},{"key":"e_1_3_2_46_2","volume-title":"NeurIPS","author":"Dong Li","year":"2019","unstructured":"Li Dong, Nan Yang, Wenhui Wang, Furu Wei, Xiaodong Liu, Yu Wang, Jianfeng Gao, Ming Zhou, and Hsiao-Wuen Hon. 2019. Unified language model pre-training for natural language understanding and generation. In NeurIPS."},{"key":"e_1_3_2_47_2","volume-title":"EMNLP","author":"Dong Yue","year":"2020","unstructured":"Yue Dong, Shuohang Wang, Zhe Gan, Yu Cheng, Jackie Chi Kit Cheung, and Jingjing Liu. 2020. Multi-fact correction in abstractive text summarization. In EMNLP."},{"key":"e_1_3_2_48_2","article-title":"All NLP tasks are generation tasks: A general pretraining framework","author":"Du Zhengxiao","year":"2021","unstructured":"Zhengxiao Du, Yujie Qian, Xiao Liu, Ming Ding, Jiezhong Qiu, Zhilin Yang, and Jie Tang. 2021. All NLP tasks are generation tasks: A general pretraining framework. arXiv preprint arXiv:2103.10360 (2021).","journal-title":"arXiv preprint arXiv:2103.10360"},{"key":"e_1_3_2_49_2","doi-asserted-by":"crossref","DOI":"10.1016\/j.eswa.2020.113679","article-title":"Automatic text summarization: A comprehensive survey","author":"El-Kassas Wafaa S.","year":"2021","unstructured":"Wafaa S. El-Kassas, Cherif R. Salama, Ahmed A. Rafea, and Hoda K. Mohamed. 2021. Automatic text summarization: A comprehensive survey. Expert Syst. Appl. (2021).","journal-title":"Expert Syst. Appl."},{"key":"e_1_3_2_50_2","volume-title":"NeurIPS","author":"Elsayed Gamaleldin F.","year":"2018","unstructured":"Gamaleldin F. 
Elsayed, Dilip Krishnan, Hossein Mobahi, Kevin Regan, and Samy Bengio. 2018. Large margin deep networks for classification. In NeurIPS."},{"key":"e_1_3_2_51_2","volume-title":"NAACL-HLT","author":"Fabbri Alexander R.","year":"2021","unstructured":"Alexander R. Fabbri, Simeng Han, Haoyuan Li, Haoran Li, Marjan Ghazvininejad, Shafiq R. Joty, Dragomir R. Radev, and Yashar Mehdad. 2021. Improving zero and few-shot abstractive summarization with intermediate fine-tuning and data augmentation. In NAACL-HLT."},{"key":"e_1_3_2_52_2","article-title":"Summeval: Re-evaluating summarization evaluation","author":"Fabbri Alexander R.","year":"2021","unstructured":"Alexander R. Fabbri, Wojciech Kry\u015bci\u0144ski, Bryan McCann, Caiming Xiong, Richard Socher, and Dragomir Radev. 2021. Summeval: Re-evaluating summarization evaluation. TACL (2021).","journal-title":"TACL"},{"key":"e_1_3_2_53_2","volume-title":"ICLR","author":"Fan Angela","year":"2020","unstructured":"Angela Fan, Edouard Grave, and Armand Joulin. 2020. Reducing transformer depth on demand with structured dropout. In ICLR."},{"key":"e_1_3_2_54_2","volume-title":"ACL","author":"Fan Angela","year":"2018","unstructured":"Angela Fan, Mike Lewis, and Yann N. Dauphin. 2018. Hierarchical neural story generation. In ACL."},{"key":"e_1_3_2_55_2","article-title":"Unsupervised pre-training for sequence to sequence speech recognition","author":"Fan Zhiyun","year":"2019","unstructured":"Zhiyun Fan, Shiyu Zhou, and Bo Xu. 2019. Unsupervised pre-training for sequence to sequence speech recognition. arXiv preprint arXiv:1910.12418 (2019).","journal-title":"arXiv preprint arXiv:1910.12418"},{"key":"e_1_3_2_56_2","article-title":"Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity","volume":"23","author":"Fedus William","year":"2022","unstructured":"William Fedus, Barret Zoph, and Noam Shazeer. 2022. Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity. 
J. Mach. Learn. Res. 23 (2022).","journal-title":"J. Mach. Learn. Res."},{"key":"e_1_3_2_57_2","article-title":"A survey on dialogue summarization: Recent advances and new frontiers","author":"Feng Xiachong","year":"2021","unstructured":"Xiachong Feng, Xiaocheng Feng, and Bing Qin. 2021. A survey on dialogue summarization: Recent advances and new frontiers. arXiv preprint arXiv:2107.03175 (2021).","journal-title":"arXiv preprint arXiv:2107.03175"},{"key":"e_1_3_2_58_2","doi-asserted-by":"crossref","unstructured":"Zhangyin Feng Daya Guo Duyu Tang Nan Duan Xiaocheng Feng Ming Gong Linjun Shou Bing Qin Ting Liu Daxin Jiang and Ming Zhou. 2020. CodeBERT: A pre-trained model for programming and natural languages. In EMNLP Findings (Findings of ACL Vol. EMNLP 2020) Trevor Cohn Yulan He and Yang Liu (Eds.). Association for Computational Linguistics 1536\u20131547.","DOI":"10.18653\/v1\/2020.findings-emnlp.139"},{"key":"e_1_3_2_59_2","volume-title":"NeurIPS","author":"Frantar Elias","year":"2022","unstructured":"Elias Frantar and Dan Alistarh. 2022. Optimal brain compression: A framework for accurate post-training quantization and pruning. In NeurIPS."},{"key":"e_1_3_2_60_2","article-title":"GPTQ: Accurate post-training quantization for generative pre-trained transformers","author":"Frantar Elias","year":"2022","unstructured":"Elias Frantar, Saleh Ashkboos, Torsten Hoefler, and Dan Alistarh. 2022. GPTQ: Accurate post-training quantization for generative pre-trained transformers. arXiv preprint arXiv:2210.17323 (2022).","journal-title":"arXiv preprint arXiv:2210.17323"},{"key":"e_1_3_2_61_2","doi-asserted-by":"crossref","unstructured":"Markus Freitag Nitika Mathur Chi-kiu Lo Eleftherios Avramidis Ricardo Rei Brian Thompson Tom Kocmi Fr\u00e9d\u00e9ric Blain Daniel Deutsch Craig Stewart Chrysoula Zerva Sheila Castilho Alon Lavie and George F. Foster. 2023. Results of WMT23 metrics shared task: Metrics might be guilty but references are not innocent. 
In Proceedings of the Eighth Conference on Machine Translation WMT 2023 Singapore December 6-7 2023 Philipp Koehn Barry Haddon Tom Kocmi and Christof Monz (Eds.). Association for Computational Linguistics 578\u2013628.","DOI":"10.18653\/v1\/2023.wmt-1.51"},{"key":"e_1_3_2_62_2","doi-asserted-by":"crossref","unstructured":"Prakhar Ganesh Yao Chen Xin Lou Mohammad Ali Khan Yin Yang Hassan Sajjad Preslav Nakov Deming Chen and Marianne Winslett. 2020. Compressing large-scale transformer-based models: A case study on BERT. Trans. Assoc. Comput. Linguistics 9 (2021) 1061\u20131080.","DOI":"10.1162\/tacl_a_00413"},{"key":"e_1_3_2_63_2","volume-title":"ACL","author":"Gao Tianyu","year":"2021","unstructured":"Tianyu Gao, Adam Fisch, and Danqi Chen. 2021. Making pre-trained language models better few-shot learners. In ACL."},{"key":"e_1_3_2_64_2","article-title":"Neural language generation: Formulation, methods, and evaluation","author":"Garbacea Cristina","year":"2020","unstructured":"Cristina Garbacea and Qiaozhu Mei. 2020. Neural language generation: Formulation, methods, and evaluation. arXiv preprint arXiv:2007.15780 (2020).","journal-title":"arXiv preprint arXiv:2007.15780"},{"key":"e_1_3_2_65_2","volume-title":"EMNLP Findings","author":"Garcia Xavier","year":"2020","unstructured":"Xavier Garcia, Pierre Foret, Thibault Sellam, and Ankur P. Parikh. 2020. A multilingual view of unsupervised machine translation. In EMNLP Findings."},{"key":"e_1_3_2_66_2","doi-asserted-by":"crossref","DOI":"10.3390\/app11073184","article-title":"A survey on bias in deep NLP","author":"Garrido-Mu\u00f1oz Ismael","year":"2021","unstructured":"Ismael Garrido-Mu\u00f1oz, Arturo Montejo-R\u00e1ez, Fernando Mart\u00ednez-Santiago, and L. Alfonso Ure\u00f1a-L\u00f3pez. 2021. A survey on bias in deep NLP. Applied Sciences (2021).","journal-title":"Applied Sciences"},{"key":"e_1_3_2_67_2","doi-asserted-by":"crossref","unstructured":"Sebastian Gehrmann Tosin P. 
Adewumi Karmanya Aggarwal Pawan Sasanka Ammanamanchi Aremu Anuoluwapo Antoine Bosselut Khyathi Raghavi Chandu Miruna-Adriana Clinciu Dipanjan Das Kaustubh D. Dhole Wanyu Du Esin Durmus Ondrej Dusek Chris Emezue Varun Gangal Cristina Garbacea Tatsunori Hashimoto Yufang Hou Yacine Jernite Harsh Jhamtani Yangfeng Ji Shailza Jolly Dhruv Kumar Faisal Ladhak Aman Madaan Mounica Maddela Khyati Mahajan Saad Mahamood Bodhisattwa Prasad Majumder Pedro Henrique Martins Angelina McMillan-Major Simon Mille Emiel van Miltenburg Moin Nadeem Shashi Narayan Vitaly Nikolaev Rubungo Andre Niyongabo Salomey Osei Ankur P. Parikh Laura Perez-Beltrachini Niranjan Ramesh Rao Vikas Raunak Juan Diego Rodriguez Sashank Santhanam Jo\u00e3o Sedoc Thibault Sellam Samira Shaikh Anastasia Shimorina Marco Antonio Sobrevilla Cabezudo Hendrik Strobelt Nishant Subramani Wei Xu Diyi Yang Akhila Yerukola and Jiawei Zhou. 2021. The gem benchmark: Natural language generation its evaluation and metrics. CoRR abs\/2102.01672 (2021).","DOI":"10.18653\/v1\/2021.gem-1.10"},{"key":"e_1_3_2_68_2","doi-asserted-by":"crossref","unstructured":"Sebastian Gehrmann Abhik Bhattacharjee Abinaya Mahendiran Alex Wang Alexandros Papangelis Aman Madaan Angelina McMillan-Major Anna Shvets Ashish Upadhyay and Bernd Bohnet. 2022. Gemv2: Multilingual NLG benchmarking in a single line of code. In Proceedings of the The 2022 Conference on Empirical Methods in Natural Language Processing EMNLP 2022 - System Demonstrations Abu Dhabi UAE December 7-11 2022 Wanxiang Che and Ekaterina Shutova (Eds.). Association for Computational Linguistics 266\u2013281.","DOI":"10.18653\/v1\/2022.emnlp-demos.27"},{"key":"e_1_3_2_69_2","article-title":"On the strengths of cross-attention in pretrained transformers for machine translation","author":"Gheini Mozhdeh","year":"2021","unstructured":"Mozhdeh Gheini, Xiang Ren, and Jonathan May. 2021. On the strengths of cross-attention in pretrained transformers for machine translation. 
arXiv preprint arXiv:2104.08771 (2021).","journal-title":"arXiv preprint arXiv:2104.08771"},{"key":"e_1_3_2_70_2","article-title":"A survey of quantization methods for efficient neural network inference","author":"Gholami Amir","year":"2021","unstructured":"Amir Gholami, Sehoon Kim, Zhen Dong, Zhewei Yao, Michael W. Mahoney, and Kurt Keutzer. 2021. A survey of quantization methods for efficient neural network inference. arXiv (2021).","journal-title":"arXiv"},{"key":"e_1_3_2_71_2","unstructured":"Amelia Glaese Nat McAleese Maja Trebacz John Aslanides Vlad Firoiu Timo Ewalds Maribeth Rauh Laura Weidinger Martin J. Chadwick Phoebe Thacker Lucy Campbell-Gillingham Jonathan Uesato Po-Sen Huang Ramona Comanescu Fan Yang Abigail See Sumanth Dathathri Rory Greig Charlie Chen Doug Fritz Jaume Sanchez Elias Richard Green Sona Mokr\u00e1 Nicholas Fernando Boxi Wu Rachel Foley Susannah Young Iason Gabriel William Isaac John Mellor Demis Hassabis Koray Kavukcuoglu Lisa Anne Hendricks and Geoffrey Irving. 2022. Improving alignment of dialogue agents via targeted human judgements. CoRR abs\/2209.14375 (2022)."},{"key":"e_1_3_2_72_2","volume-title":"ACL","author":"Golovanov Sergey","year":"2019","unstructured":"Sergey Golovanov, Rauf Kurbanov, Sergey I. Nikolenko, Kyryl Truskovskyi, Alexander Tselousov, and Thomas Wolf. 2019. Large-scale transfer learning for natural language generation. In ACL."},{"key":"e_1_3_2_73_2","volume-title":"COLING","author":"Gong Heng","year":"2020","unstructured":"Heng Gong, Yawei Sun, Xiaocheng Feng, Bing Qin, Wei Bi, Xiaojiang Liu, and Ting Liu. 2020. TableGPT: Few-shot table-to-text generation with table structure reconstruction and content matching. In COLING."},{"key":"e_1_3_2_74_2","volume-title":"EMNLP Findings","author":"Goodwin Travis R.","year":"2020","unstructured":"Travis R. Goodwin, Max E. Savery, and Dina Demner-Fushman. 2020. Towards zero shot conditional summarization with adaptive multi-task fine-tuning. 
In EMNLP Findings."},{"key":"e_1_3_2_75_2","volume-title":"RepL4NLP@ACL","author":"Gordon Mitchell A.","year":"2020","unstructured":"Mitchell A. Gordon, Kevin Duh, and Nicholas Andrews. 2020. Compressing BERT: Studying the effects of weight pruning on transfer learning. In RepL4NLP@ACL."},{"key":"e_1_3_2_76_2","volume-title":"ICLR (Poster)","author":"Gu Jiatao","year":"2018","unstructured":"Jiatao Gu, James Bradbury, Caiming Xiong, Victor O. K. Li, and Richard Socher. 2018. Non-autoregressive neural machine translation. In ICLR (Poster)."},{"key":"e_1_3_2_77_2","volume-title":"ACL\/IJCNLP Short","author":"Gu Jing","year":"2021","unstructured":"Jing Gu, Qingyang Wu, Chongruo Wu, Weiyan Shi, and Zhou Yu. 2021. PRAL: A tailored pre-training model for task-oriented dialog generation. In ACL\/IJCNLP Short."},{"key":"e_1_3_2_78_2","volume-title":"AAAI","author":"Gu Xiaodong","year":"2021","unstructured":"Xiaodong Gu, Kang Min Yoo, and Jung-Woo Ha. 2021. DialogBERT: Discourse-aware response generation via learning to recover and rank utterances. In AAAI."},{"key":"e_1_3_2_79_2","article-title":"Response generation with context-aware prompt learning","author":"Gu Xiaodong","year":"2021","unstructured":"Xiaodong Gu, Kang Min Yoo, and Sang-Woo Lee. 2021. Response generation with context-aware prompt learning. arXiv preprint arXiv:2111.02643 (2021).","journal-title":"arXiv preprint arXiv:2111.02643"},{"key":"e_1_3_2_80_2","article-title":"A knowledge-enhanced pretraining model for commonsense story generation","author":"Guan Jian","year":"2020","unstructured":"Jian Guan, Fei Huang, Minlie Huang, Zhihao Zhao, and Xiaoyan Zhu. 2020. A knowledge-enhanced pretraining model for commonsense story generation. TACL (2020).","journal-title":"TACL"},{"key":"e_1_3_2_81_2","volume-title":"ICLR","author":"Gunel Beliz","year":"2021","unstructured":"Beliz Gunel, Jingfei Du, Alexis Conneau, and Veselin Stoyanov. 2021. 
Supervised contrastive learning for pre-trained language model fine-tuning. In ICLR."},{"key":"e_1_3_2_82_2","article-title":"Reweighted proximal pruning for large-scale language representation","author":"Guo Fu-Ming","year":"2019","unstructured":"Fu-Ming Guo, Sijia Liu, Finlay S. Mungall, Xue Lin, and Yanzhi Wang. 2019. Reweighted proximal pruning for large-scale language representation. arXiv preprint arXiv:1909.12486 (2019).","journal-title":"arXiv preprint arXiv:1909.12486"},{"key":"e_1_3_2_83_2","volume-title":"NAACL-HLT (Findings)","author":"Guo Mandy","year":"2022","unstructured":"Mandy Guo, Joshua Ainslie, David C. Uthus, Santiago Onta\u00f1\u00f3n, Jianmo Ni, Yun-Hsuan Sung, and Yinfei Yang. 2022. LongT5: Efficient text-to-text transformer for long sequences. In NAACL-HLT (Findings)."},{"key":"e_1_3_2_84_2","volume-title":"ACL","author":"Guo Yue","year":"2022","unstructured":"Yue Guo, Yi Yang, and Ahmed Abbasi. 2022. Auto-Debias: Debiasing masked language models with automated biased prompts. In ACL."},{"key":"e_1_3_2_85_2","doi-asserted-by":"crossref","unstructured":"Xu Han Zhengyan Zhang Ning Ding Yuxian Gu Xiao Liu Yuqi Huo Jiezhong Qiu Yuan Yao Ao Zhang Liang Zhang Wentao Han Minlie Huang Qin Jin Yanyan Lan Yang Liu Zhiyuan Liu Zhiwu Lu Xipeng Qiu Ruihua Song Jie Tang Ji-Rong Wen Jinhui Yuan Wayne Xin Zhao and Jun Zhu. 2021. Pre-trained models: Past present and future. AI Open 2 (2021) 225\u2013250.","DOI":"10.1016\/j.aiopen.2021.08.002"},{"key":"e_1_3_2_86_2","volume-title":"COLING","author":"Harkous Hamza","year":"2020","unstructured":"Hamza Harkous, Isabel Groves, and Amir Saffari. 2020. Have your text and use it too! End-to-end neural data-to-text generation with semantic fidelity. In COLING."},{"key":"e_1_3_2_87_2","doi-asserted-by":"crossref","unstructured":"Sadid A. Hasan and Oladimeji Farri. 2019. Clinical natural language processing with deep learning. 
In Data Science for Healthcare - Methodologies and Applications Sergio Consoli Diego Reforgiato Recupero and Milan Petkovic (Eds.). Springer 147\u2013171.","DOI":"10.1007\/978-3-030-05249-2_5"},{"key":"e_1_3_2_88_2","volume-title":"ICLR","author":"He Junxian","year":"2022","unstructured":"Junxian He, Chunting Zhou, Xuezhe Ma, Taylor Berg-Kirkpatrick, and Graham Neubig. 2022. Towards a unified view of parameter-efficient transfer learning. In ICLR."},{"key":"e_1_3_2_89_2","volume-title":"ICLR","author":"Holtzman Ari","year":"2020","unstructured":"Ari Holtzman, Jan Buys, Li Du, Maxwell Forbes, and Yejin Choi. 2020. The curious case of neural text degeneration. In ICLR."},{"key":"e_1_3_2_90_2","volume-title":"NeurIPS","author":"Hosseini-Asl Ehsan","year":"2020","unstructured":"Ehsan Hosseini-Asl, Bryan McCann, Chien-Sheng Wu, Semih Yavuz, and Richard Socher. 2020. A simple language model for task-oriented dialogue. In NeurIPS."},{"key":"e_1_3_2_91_2","volume-title":"NeurIPS","author":"Hou Lu","year":"2020","unstructured":"Lu Hou, Zhiqi Huang, Lifeng Shang, Xin Jiang, Xiao Chen, and Qun Liu. 2020. DynaBERT: Dynamic BERT with adaptive width and depth. In NeurIPS."},{"key":"e_1_3_2_92_2","volume-title":"ICML","author":"Houlsby Neil","year":"2019","unstructured":"Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin de Laroussilhe, Andrea Gesmundo, Mona Attariyan, and Sylvain Gelly. 2019. Parameter-efficient transfer learning for NLP. In ICML."},{"key":"e_1_3_2_93_2","volume-title":"ICLR","author":"Hu Edward J.","year":"2022","unstructured":"Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. 2022. LoRA: Low-rank adaptation of large language models. In ICLR."},{"key":"e_1_3_2_94_2","volume-title":"ACL\/IJCNLP","author":"Hua Xinyu","year":"2021","unstructured":"Xinyu Hua, Ashwin Sreevatsa, and Lu Wang. 2021. DYPLOC: Dynamic planning of content using mixed language models for text generation. 
In ACL\/IJCNLP."},{"key":"e_1_3_2_95_2","article-title":"Not all languages are created equal in LLMs: Improving multilingual capability by cross-lingual-thought prompting","author":"Huang Haoyang","year":"2023","unstructured":"Haoyang Huang, Tianyi Tang, Dongdong Zhang, Wayne Xin Zhao, Ting Song, et\u00a0al. 2023. Not all languages are created equal in LLMs: Improving multilingual capability by cross-lingual-thought prompting. arXiv (2023).","journal-title":"arXiv"},{"key":"e_1_3_2_96_2","article-title":"Challenges in building intelligent open-domain dialog systems","author":"Huang Minlie","year":"2020","unstructured":"Minlie Huang, Xiaoyan Zhu, and Jianfeng Gao. 2020. Challenges in building intelligent open-domain dialog systems. TOIS (2020).","journal-title":"TOIS"},{"key":"e_1_3_2_97_2","volume-title":"ACL\/IJCNLP Findings","author":"Huang Xinting","year":"2021","unstructured":"Xinting Huang, Jianzhong Qi, Yu Sun, and Rui Zhang. 2021. Latent reasoning for low-resource question generation. In ACL\/IJCNLP Findings."},{"key":"e_1_3_2_98_2","unstructured":"Yanping Huang Youlong Cheng Ankur Bapna Orhan Firat Dehao Chen Mia Xu Chen HyoukJoong Lee Jiquan Ngiam Quoc V. Le Yonghui Wu and Zhifeng Chen. 2019. GPipe: Efficient training of giant neural networks using pipeline parallelism. In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019 NeurIPS 2019 December 8-14 2019 Vancouver BC Canada. 103\u2013112."},{"key":"e_1_3_2_99_2","doi-asserted-by":"crossref","unstructured":"Touseef Iqbal and Shaima Qureshi. 2022. The survey: Text generation models in deep learning. J. King Saud Univ. Comput. Inf. Sci. 34 6 Part A (2022) 2515\u20132528.","DOI":"10.1016\/j.jksuci.2020.04.001"},{"key":"e_1_3_2_100_2","doi-asserted-by":"crossref","DOI":"10.1162\/neco.1991.3.1.79","article-title":"Adaptive mixtures of local experts","author":"Jacobs Robert A.","year":"1991","unstructured":"Robert A. Jacobs, Michael I. 
Jordan, Steven J. Nowlan, and Geoffrey E. Hinton. 1991. Adaptive mixtures of local experts. Neural Comput. (1991).","journal-title":"Neural Comput."},{"key":"e_1_3_2_101_2","article-title":"How can we know what language models know","author":"Jiang Zhengbao","year":"2020","unstructured":"Zhengbao Jiang, Frank F. Xu, Jun Araki, and Graham Neubig. 2020. How can we know what language models know. TACL (2020).","journal-title":"TACL"},{"key":"e_1_3_2_102_2","volume-title":"NeurIPS","author":"Jiang Zihang","year":"2020","unstructured":"Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, and Shuicheng Yan. 2020. ConvBERT: Improving BERT with span-based dynamic convolution. In NeurIPS."},{"key":"e_1_3_2_103_2","volume-title":"EMNLP Findings","author":"Jiao Xiaoqi","year":"2020","unstructured":"Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, and Qun Liu. 2020. TinyBERT: Distilling BERT for natural language understanding. In EMNLP Findings."},{"key":"e_1_3_2_104_2","volume-title":"ACL","author":"Jin Di","year":"2020","unstructured":"Di Jin, Zhijing Jin, Joey Tianyi Zhou, Lisa Orii, and Peter Szolovits. 2020. Hooks in the headline: Learning to generate headlines with controlled styles. In ACL."},{"key":"e_1_3_2_105_2","volume-title":"ACL","author":"Joshi Mandar","year":"2017","unstructured":"Mandar Joshi, Eunsol Choi, Daniel S. Weld, and Luke Zettlemoyer. 2017. TriviaQA: A large scale distantly supervised challenge dataset for reading comprehension. In ACL."},{"key":"e_1_3_2_106_2","volume-title":"EMNLP","author":"Kale Mihir","year":"2020","unstructured":"Mihir Kale and Abhinav Rastogi. 2020. Template guided text generation for task-oriented dialogue. 
In EMNLP."},{"key":"e_1_3_2_107_2","article-title":"AMMUS : A survey of transformer-based pretrained models in natural language processing","author":"Kalyan Katikapalli Subramanyam","year":"2021","unstructured":"Katikapalli Subramanyam Kalyan, Ajit Rajasekharan, and Sivanesan Sangeetha. 2021. AMMUS : A survey of transformer-based pretrained models in natural language processing. arXiv preprint arXiv:2108.05542 (2021).","journal-title":"arXiv preprint arXiv:2108.05542"},{"key":"e_1_3_2_108_2","article-title":"Scaling laws for neural language models","author":"Kaplan Jared","year":"2020","unstructured":"Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. 2020. Scaling laws for neural language models. arXiv (2020).","journal-title":"arXiv"},{"key":"e_1_3_2_109_2","unstructured":"Andrej Karpathy. 2023. nanoGPT: The Simplest Fastest Repository for Training\/Finetuning Medium-Sized GPTs."},{"key":"e_1_3_2_110_2","article-title":"The impact of positional encoding on length generalization in transformers","volume":"2305","author":"Kazemnejad Amirhossein","year":"2023","unstructured":"Amirhossein Kazemnejad, Inkit Padhi, Karthikeyan Natesan Ramamurthy, Payel Das, and Siva Reddy. 2023. The impact of positional encoding on length generalization in transformers. CoRR abs\/2305.19466 (2023).","journal-title":"CoRR"},{"key":"e_1_3_2_111_2","article-title":"CTRL: A conditional transformer language model for controllable generation","author":"Keskar Nitish Shirish","year":"2019","unstructured":"Nitish Shirish Keskar, Bryan McCann, Lav R. Varshney, Caiming Xiong, and Richard Socher. 2019. CTRL: A conditional transformer language model for controllable generation. 
arXiv preprint arXiv:1909.05858 (2019).","journal-title":"arXiv preprint arXiv:1909.05858"},{"key":"e_1_3_2_112_2","volume-title":"ICLR","author":"Khalifa Muhammad","year":"2021","unstructured":"Muhammad Khalifa, Hady Elsahar, and Marc Dymetman. 2021. A distributional approach to controlled text generation. In ICLR."},{"key":"e_1_3_2_113_2","volume-title":"EACL","author":"Kilickaya Mert","year":"2017","unstructured":"Mert Kilickaya, Aykut Erdem, Nazli Ikizler-Cinbis, and Erkut Erdem. 2017. Re-evaluating automatic metrics for image captioning. In EACL."},{"key":"e_1_3_2_114_2","unstructured":"James Kirkpatrick Razvan Pascanu Neil C. Rabinowitz Joel Veness Guillaume Desjardins Andrei A. Rusu Kieran Milan John Quan Tiago Ramalho Agnieszka Grabska-Barwinska Demis Hassabis Claudia Clopath Dharshan Kumaran and Raia Hadsell. 2016. Overcoming catastrophic forgetting in neural networks. CoRR abs\/1612.00796 (2016)."},{"key":"e_1_3_2_115_2","volume-title":"EMNLP-IJCNLP: System Demonstrations","author":"Kreutzer Julia","year":"2019","unstructured":"Julia Kreutzer, Jasmijn Bastings, and Stefan Riezler. 2019. Joey NMT: A minimalist NMT toolkit for novices. In EMNLP-IJCNLP: System Demonstrations."},{"key":"e_1_3_2_116_2","volume-title":"EMNLP","author":"Krishna Kalpesh","year":"2020","unstructured":"Kalpesh Krishna, John Wieting, and Mohit Iyyer. 2020. Reformulating unsupervised style transfer as paraphrase generation. In EMNLP."},{"key":"e_1_3_2_117_2","volume-title":"EMNLP","author":"Kryscinski Wojciech","year":"2018","unstructured":"Wojciech Kryscinski, Romain Paulus, Caiming Xiong, and Richard Socher. 2018. Improving abstraction in text summarization. In EMNLP."},{"key":"e_1_3_2_118_2","doi-asserted-by":"crossref","unstructured":"Aman Kumar Himani Shrotriya Prachi Sahu Amogh Mishra Raj Dabre Ratish Puduppully Anoop Kunchukuttan Mitesh M. Khapra and Pratyush Kumar. 2022. Indicnlg benchmark: Multilingual datasets for diverse nlg tasks in indic languages. 
In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing EMNLP 2022 Abu Dhabi United Arab Emirates December 7-11 2022. Association for Computational Linguistics 5363\u20135394.","DOI":"10.18653\/v1\/2022.emnlp-main.360"},{"key":"e_1_3_2_119_2","volume-title":"EMNLP","author":"Lample Guillaume","year":"2018","unstructured":"Guillaume Lample, Myle Ott, Alexis Conneau, Ludovic Denoyer, and Marc\u2019Aurelio Ranzato. 2018. Phrase-based & neural unsupervised machine translation. In EMNLP."},{"key":"e_1_3_2_120_2","volume-title":"ACL\/IJCNLP Short","author":"Le Hang","year":"2021","unstructured":"Hang Le, Juan Miguel Pino, Changhan Wang, Jiatao Gu, Didier Schwab, and Laurent Besacier. 2021. Lightweight adapter tuning for multilingual speech translation. In ACL\/IJCNLP Short."},{"key":"e_1_3_2_121_2","article-title":"Deep learning","author":"LeCun Yann","year":"2015","unstructured":"Yann LeCun, Yoshua Bengio, and Geoffrey E. Hinton. 2015. Deep learning. Nat. (2015).","journal-title":"Nat."},{"key":"e_1_3_2_122_2","unstructured":"Dmitry Lepikhin HyoukJoong Lee Yuanzhong Xu Dehao Chen Orhan Firat Yanping Huang Maxim Krikun Noam Shazeer and Zhifeng Chen. 2021. GShard: Scaling giant models with conditional computation and automatic sharding. In 9th International Conference on Learning Representations ICLR 2021 Virtual Event Austria May 3-7 2021. OpenReview.net."},{"key":"e_1_3_2_123_2","volume-title":"International Conference on Machine Learning","author":"Leviathan Yaniv","year":"2023","unstructured":"Yaniv Leviathan, Matan Kalman, and Yossi Matias. 2023. Fast inference from transformers via speculative decoding. In International Conference on Machine Learning."},{"key":"e_1_3_2_124_2","volume-title":"ACL","author":"Lewis Mike","year":"2020","unstructured":"Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Veselin Stoyanov, and Luke Zettlemoyer. 2020. 
BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In ACL."},{"key":"e_1_3_2_125_2","volume-title":"NAACL-HLT","author":"Li Jiwei","year":"2016","unstructured":"Jiwei Li, Michel Galley, Chris Brockett, Jianfeng Gao, and Bill Dolan. 2016. A diversity-promoting objective function for neural conversation models. In NAACL-HLT."},{"key":"e_1_3_2_126_2","volume-title":"EMNLP","author":"Li Jiwei","year":"2014","unstructured":"Jiwei Li and Eduard H. Hovy. 2014. A model of coherence based on distributed sentence representation. In EMNLP."},{"key":"e_1_3_2_127_2","volume-title":"CIKM","author":"Li Junyi","year":"2020","unstructured":"Junyi Li, Siqing Li, Wayne Xin Zhao, Gaole He, Zhicheng Wei, Nicholas Jing Yuan, and Ji-Rong Wen. 2020. Knowledge-enhanced personalized review generation with capsule graph neural network. In CIKM."},{"key":"e_1_3_2_128_2","volume-title":"EMNLP","author":"Li Jianquan","year":"2020","unstructured":"Jianquan Li, Xiaokang Liu, Honghong Zhao, Ruifeng Xu, Min Yang, and Yaohong Jin. 2020. BERT-EMD: Many-to-many layer mapping for BERT compression with earth mover\u2019s distance. In EMNLP."},{"key":"e_1_3_2_129_2","volume-title":"ACL Demonstration","author":"Li Junyi","year":"2021","unstructured":"Junyi Li, Tianyi Tang, Gaole He, Jinhao Jiang, Xiaoxuan Hu, Puzhao Xie, Zhipeng Chen, Zhuohao Yu, Wayne Xin Zhao, and Ji-Rong Wen. 2021. TextBox: A unified, modularized, and extensible framework for text generation. In ACL Demonstration."},{"key":"e_1_3_2_130_2","volume-title":"EMNLP","author":"Li Junyi","year":"2022","unstructured":"Junyi Li, Tianyi Tang, Wayne Xin Zhao, Jian-Yun Nie, and Ji-Rong Wen. 2022. ELMER: A non-autoregressive pre-trained language model for efficient and effective text generation. 
In EMNLP."},{"key":"e_1_3_2_131_2","volume-title":"ACL\/IJCNLP Findings","author":"Li Junyi","year":"2021","unstructured":"Junyi Li, Tianyi Tang, Wayne Xin Zhao, Zhicheng Wei, Nicholas Jing Yuan, and Ji-Rong Wen. 2021. Few-shot knowledge graph-to-text generation with pretrained language models. In ACL\/IJCNLP Findings."},{"key":"e_1_3_2_132_2","article-title":"Pretrained language models for text generation: A survey","author":"Li Junyi","year":"2021","unstructured":"Junyi Li, Tianyi Tang, Wayne Xin Zhao, and Ji-Rong Wen. 2021. Pretrained language models for text generation: A survey. arXiv preprint arXiv:2105.10311 (2021).","journal-title":"arXiv preprint arXiv:2105.10311"},{"key":"e_1_3_2_133_2","volume-title":"SIGIR","author":"Li Junyi","year":"2021","unstructured":"Junyi Li, Wayne Xin Zhao, Zhicheng Wei, Nicholas Jing Yuan, and Ji-Rong Wen. 2021. Knowledge-based review generation by coherence enhanced text planning. In SIGIR."},{"key":"e_1_3_2_134_2","volume-title":"ACL","author":"Li Junyi","year":"2019","unstructured":"Junyi Li, Wayne Xin Zhao, Ji-Rong Wen, and Yang Song. 2019. Generating long and informative reviews with aspect-aware coarse-to-fine decoding. In ACL."},{"key":"e_1_3_2_135_2","volume-title":"ACL","author":"Li Piji","year":"2020","unstructured":"Piji Li, Haisong Zhang, Xiaojiang Liu, and Shuming Shi. 2020. Rigid formats controlled text generation. In ACL."},{"key":"e_1_3_2_136_2","first-page":"12286","volume-title":"ACL (1)","author":"Li Xiang Lisa","year":"2023","unstructured":"Xiang Lisa Li, Ari Holtzman, Daniel Fried, Percy Liang, Jason Eisner, Tatsunori Hashimoto, Luke Zettlemoyer, and Mike Lewis. 2023. Contrastive decoding: Open-ended text generation as optimization. In ACL (1). Association for Computational Linguistics, 12286\u201312312."},{"key":"e_1_3_2_137_2","volume-title":"ACL","author":"Li Xiang Lisa","year":"2021","unstructured":"Xiang Lisa Li and Percy Liang. 2021. Prefix-Tuning: Optimizing continuous prompts for generation. 
In ACL."},{"key":"e_1_3_2_138_2","volume-title":"ACL\/IJCNLP","author":"Li Zekang","year":"2021","unstructured":"Zekang Li, Jinchao Zhang, Zhengcong Fei, Yang Feng, and Jie Zhou. 2021. Conversations are not flat: Modeling the dynamic information flow across dialogue utterances. In ACL\/IJCNLP."},{"key":"e_1_3_2_139_2","volume-title":"EMNLP Findings","author":"Li Zuchao","year":"2020","unstructured":"Zuchao Li, Hai Zhao, Rui Wang, Masao Utiyama, and Eiichiro Sumita. 2020. Reference language based unsupervised neural machine translation. In EMNLP Findings."},{"key":"e_1_3_2_140_2","article-title":"Let\u2019s verify step by step","volume":"2305","author":"Lightman Hunter","year":"2023","unstructured":"Hunter Lightman, Vineet Kosaraju, Yura Burda, Harrison Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, and Karl Cobbe. 2023. Let\u2019s verify step by step. CoRR abs\/2305.20050 (2023).","journal-title":"CoRR"},{"key":"e_1_3_2_141_2","volume-title":"Text Summarization Branches Out","author":"Lin Chin-Yew","year":"2004","unstructured":"Chin-Yew Lin. 2004. Rouge: A package for automatic evaluation of summaries. In Text Summarization Branches Out."},{"key":"e_1_3_2_142_2","volume-title":"ECCV","author":"Lin Tsung-Yi","year":"2014","unstructured":"Tsung-Yi Lin, Michael Maire, Serge J. Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Doll\u00e1r, and C. Lawrence Zitnick. 2014. Microsoft COCO: Common objects in context. In ECCV."},{"key":"e_1_3_2_143_2","volume-title":"EMNLP","author":"Lin Zehui","year":"2020","unstructured":"Zehui Lin, Xiao Pan, Mingxuan Wang, Xipeng Qiu, Jiangtao Feng, Hao Zhou, and Lei Li. 2020. Pre-training multilingual neural machine translation by leveraging alignment information. 
In EMNLP."},{"key":"e_1_3_2_144_2","volume-title":"ACL\/IJCNLP Findings","author":"Liu Dayiheng","year":"2021","unstructured":"Dayiheng Liu, Yu Yan, Yeyun Gong, Weizhen Qi, Hang Zhang, Jian Jiao, Weizhu Chen, Jie Fu, Linjun Shou, Ming Gong, Pengcheng Wang, Jiusheng Chen, Daxin Jiang, Jiancheng Lv, Ruofei Zhang, Winnie Wu, Ming Zhou, and Nan Duan. 2021. GLGE: A new general language generation evaluation benchmark. In ACL\/IJCNLP Findings."},{"key":"e_1_3_2_145_2","volume-title":"EMNLP Findings","author":"Liu Junpeng","year":"2021","unstructured":"Junpeng Liu, Yanyan Zou, Hainan Zhang, Hongshen Chen, Zhuoye Ding, Caixia Yuan, and Xiaojie Wang. 2021. Topic-aware contrastive learning for abstractive dialogue summarization. In EMNLP Findings."},{"key":"e_1_3_2_146_2","article-title":"Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing","author":"Liu Pengfei","year":"2021","unstructured":"Pengfei Liu, Weizhe Yuan, Jinlan Fu, Zhengbao Jiang, Hiroaki Hayashi, and Graham Neubig. 2021. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. arXiv (2021).","journal-title":"arXiv"},{"key":"e_1_3_2_147_2","volume-title":"ACL","author":"Liu Siyang","year":"2022","unstructured":"Siyang Liu, Sahand Sabour, Yinhe Zheng, Pei Ke, Xiaoyan Zhu, and Minlie Huang. 2022. Rethinking and refining the distinct metric. In ACL."},{"key":"e_1_3_2_148_2","volume-title":"EMNLP","author":"Liu Shilei","year":"2021","unstructured":"Shilei Liu, Xiaofeng Zhao, Bochao Li, Feiliang Ren, Longhui Zhang, and Shujuan Yin. 2021. A three-stage learning framework for low-resource knowledge-grounded dialogue generation. In EMNLP."},{"key":"e_1_3_2_149_2","volume-title":"NAACL-HLT","author":"Liu Yixin","year":"2021","unstructured":"Yixin Liu, Zi-Yi Dou, and Pengfei Liu. 2021. RefSum: Refactoring neural summarization. 
In NAACL-HLT."},{"key":"e_1_3_2_150_2","article-title":"Multilingual denoising pre-training for neural machine translation","author":"Liu Yinhan","year":"2020","unstructured":"Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, and Luke Zettlemoyer. 2020. Multilingual denoising pre-training for neural machine translation. TACL (2020).","journal-title":"TACL"},{"key":"e_1_3_2_151_2","volume-title":"EMNLP\/IJCNLP","author":"Liu Yang","year":"2019","unstructured":"Yang Liu and Mirella Lapata. 2019. Text summarization with pretrained encoders. In EMNLP\/IJCNLP."},{"key":"e_1_3_2_152_2","article-title":"RoBERTa: A robustly optimized BERT pretraining approach","author":"Liu Yinhan","year":"2019","unstructured":"Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv (2019).","journal-title":"arXiv"},{"key":"e_1_3_2_153_2","volume-title":"AAAI","author":"Liu Ye","year":"2021","unstructured":"Ye Liu, Yao Wan, Lifang He, Hao Peng, and Philip S. Yu. 2021. KG-BART: Knowledge graph-augmented BART for generative commonsense reasoning. In AAAI."},{"key":"e_1_3_2_154_2","volume-title":"AAAI","author":"Liu Yuchen","year":"2020","unstructured":"Yuchen Liu, Jiajun Zhang, Hao Xiong, Long Zhou, Zhongjun He, Hua Wu, Haifeng Wang, and Chengqing Zong. 2020. Synchronous speech recognition and speech-to-text translation with interactive decoding. In AAAI."},{"key":"e_1_3_2_155_2","volume-title":"ACL\/IJCNLP Findings","author":"Liu Zihan","year":"2021","unstructured":"Zihan Liu, Genta Indra Winata, and Pascale Fung. 2021. Continual mixed-language pre-training for extremely low-resource neural machine translation. 
In ACL\/IJCNLP Findings."},{"key":"e_1_3_2_156_2","volume-title":"NetBERT: A Pre-trained Language Representation Model for Computer Networking","author":"Louis Antoine","year":"2020","unstructured":"Antoine Louis. 2020. NetBERT: A Pre-trained Language Representation Model for Computer Networking. Ph.D. Dissertation."},{"key":"e_1_3_2_157_2","volume-title":"ACL\/IJCNLP","author":"Luo Fuli","year":"2021","unstructured":"Fuli Luo, Wei Wang, Jiahao Liu, Yijia Liu, Bin Bi, Songfang Huang, Fei Huang, and Luo Si. 2021. VECO: Variable and flexible cross-lingual pre-training for language understanding and generation. In ACL\/IJCNLP."},{"key":"e_1_3_2_158_2","unstructured":"Huaishao Luo Lei Ji Botian Shi Haoyang Huang Nan Duan Tianrui Li Xilin Chen and Ming Zhou. 2020. UNIVL: A unified video and language pre-training model for multimodal understanding and generation. CoRR abs\/2002.06353 (2020)."},{"key":"e_1_3_2_159_2","volume-title":"EMNLP Findings","author":"Magooda Ahmed","year":"2021","unstructured":"Ahmed Magooda and Diane J. Litman. 2021. Mitigating data scarceness through data synthesis, augmentation and curriculum for abstractive summarization. In EMNLP Findings."},{"key":"e_1_3_2_160_2","volume-title":"NAACL-HLT","author":"Majumder Bodhisattwa Prasad","year":"2021","unstructured":"Bodhisattwa Prasad Majumder, Sudha Rao, Michel Galley, and Julian J. McAuley. 2021. Ask what\u2019s missing and what\u2019s useful: Improving clarification question generation using global knowledge. In NAACL-HLT."},{"key":"e_1_3_2_161_2","volume-title":"ACL\/IJCNLP","author":"Manakul Potsawee","year":"2021","unstructured":"Potsawee Manakul and Mark J. F. Gales. 2021. Long-span summarization via local attention and content selection. In ACL\/IJCNLP."},{"key":"e_1_3_2_162_2","volume-title":"EMNLP\/IJCNLP","author":"Mao Huanru Henry","year":"2019","unstructured":"Huanru Henry Mao, Bodhisattwa Prasad Majumder, Julian J. McAuley, and Garrison W. Cottrell. 2019. 
Improving neural story generation by targeted common sense grounding. In EMNLP\/IJCNLP."},{"key":"e_1_3_2_163_2","volume-title":"ACL","author":"Mathur Nitika","year":"2020","unstructured":"Nitika Mathur, Timothy Baldwin, and Trevor Cohn. 2020. Tangled up in BLEU: Reevaluating the evaluation of automatic machine translation evaluation metrics. In ACL."},{"key":"e_1_3_2_164_2","volume-title":"ACL\/IJCNLP Findings","author":"Maurya Kaushal Kumar","year":"2021","unstructured":"Kaushal Kumar Maurya, Maunendra Sankar Desarkar, Yoshinobu Kano, and Kumari Deepshikha. 2021. ZmBART: An unsupervised cross-lingual transfer framework for language generation. In ACL\/IJCNLP Findings."},{"key":"e_1_3_2_165_2","doi-asserted-by":"crossref","DOI":"10.1162\/tacl_a_00536","article-title":"Locally typical sampling","author":"Meister Clara","year":"2023","unstructured":"Clara Meister, Tiago Pimentel, Gian Wiher, and Ryan Cotterell. 2023. Locally typical sampling. TACL (2023).","journal-title":"TACL"},{"key":"e_1_3_2_166_2","unstructured":"Jacob Menick Maja Trebacz Vladimir Mikulik John Aslanides H. Francis Song Martin J. Chadwick Mia Glaese Susannah Young Lucy Campbell-Gillingham Geoffrey Irving and Nat McAleese. 2022. Teaching language models to support answers with verified quotes. CoRR abs\/2203.11147 (2022)."},{"key":"e_1_3_2_167_2","article-title":"Mixed precision training","author":"Micikevicius Paulius","year":"2017","unstructured":"Paulius Micikevicius, Sharan Narang, Jonah Alben, Gregory F. Diamos, Erich Elsen, David Garc\u00eda, Boris Ginsburg, Michael Houston, Oleksii Kuchaiev, Ganesh Venkatesh, and Hao Wu. 2017. Mixed precision training. arXiv (2017).","journal-title":"arXiv"},{"key":"e_1_3_2_168_2","volume-title":"EMNLP","author":"Mihaylov Todor","year":"2018","unstructured":"Todor Mihaylov, Peter Clark, Tushar Khot, and Ashish Sabharwal. 2018. Can a suit of armor conduct electricity? a new dataset for open book question answering. 
In EMNLP."},{"key":"e_1_3_2_169_2","unstructured":"Jay Mody. picoGPT: An unnecessarily tiny implementation of GPT-2 in NumPy. (n.d.)."},{"key":"e_1_3_2_170_2","article-title":"Correcting length bias in neural machine translation","author":"Murray Kenton","year":"2018","unstructured":"Kenton Murray and David Chiang. 2018. Correcting length bias in neural machine translation. arXiv (2018).","journal-title":"arXiv"},{"key":"e_1_3_2_171_2","volume-title":"NeurIPS","author":"Nagrani Arsha","year":"2021","unstructured":"Arsha Nagrani, Shan Yang, Anurag Arnab, Aren Jansen, Cordelia Schmid, and Chen Sun. 2021. Attention bottlenecks for multimodal fusion. In NeurIPS."},{"key":"e_1_3_2_172_2","unstructured":"Reiichiro Nakano Jacob Hilton Suchir Balaji Jeff Wu Long Ouyang Christina Kim Christopher Hesse Shantanu Jain Vineet Kosaraju William Saunders Xu Jiang Karl Cobbe Tyna Eloundou Gretchen Krueger Kevin Button Matthew Knight Benjamin Chess and John Schulman. 2021. WebGPT: Browser-assisted question-answering with human feedback. arXiv (2021)."},{"key":"e_1_3_2_173_2","doi-asserted-by":"crossref","unstructured":"Feng Nan C\u00edcero Nogueira dos Santos Henghui Zhu Patrick Ng Kathleen R. McKeown Ramesh Nallapati Dejiao Zhang Zhiguo Wang Andrew O. Arnold and Bing Xiang. 2021. Improving factual consistency of abstractive summarization via question answering. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing ACL\/IJCNLP 2021 (Volume1: Long Papers) Virtual Event August 1-6 2021 Chengqing Zong Fei Xia Wenjie Li and Roberto Navigli (Eds.). Association for Computational Linguistics 6881\u20136894.","DOI":"10.18653\/v1\/2021.acl-long.536"},{"key":"e_1_3_2_174_2","unstructured":"Piotr Nawrot. 2023. nanoT5: Fast & simple repository for pre-training and fine-tuning T5-style models. 
(2023)."},{"key":"e_1_3_2_175_2","volume-title":"EMNLP","author":"Nguyen Thong","year":"2021","unstructured":"Thong Nguyen, Anh Tuan Luu, Truc Lu, and Tho Quan. 2021. Enriching and controlling global semantics for text summarization. In EMNLP."},{"key":"e_1_3_2_176_2","article-title":"GPT-4 technical report","year":"2023","unstructured":"OpenAI. 2023. GPT-4 technical report. OpenAI (2023).","journal-title":"OpenAI"},{"key":"e_1_3_2_177_2","volume-title":"NAACL-HLT Demonstrations","author":"Ott Myle","year":"2019","unstructured":"Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, and Michael Auli. 2019. Fairseq: A fast, extensible toolkit for sequence modeling. In NAACL-HLT Demonstrations."},{"key":"e_1_3_2_178_2","unstructured":"Long Ouyang Jeffrey Wu Xu Jiang Diogo Almeida Carroll L. Wainwright Pamela Mishkin Chong Zhang Sandhini Agarwal Katarina Slama Alex Ray John Schulman Jacob Hilton Fraser Kelton Luke Miller Maddie Simens Amanda Askell Peter Welinder Paul F. Christiano Jan Leike and Ryan Lowe. 2022. Training language models to follow instructions with human feedback. In Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022 NeurIPS 2022 New Orleans LA USA November 28 - December 9 2022."},{"key":"e_1_3_2_179_2","volume-title":"ACL","author":"Papineni Kishore","year":"2002","unstructured":"Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: A method for automatic evaluation of machine translation. In ACL."},{"key":"e_1_3_2_180_2","volume-title":"EMNLP Findings","author":"Pascual Damian","year":"2021","unstructured":"Damian Pascual, Beni Egressy, Clara Meister, Ryan Cotterell, and Roger Wattenhofer. 2021. A plug-and-play method for controlled text generation. 
In EMNLP Findings."},{"key":"e_1_3_2_181_2","volume-title":"AAAI","author":"Pasunuru Ramakanth","year":"2021","unstructured":"Ramakanth Pasunuru, Asli Celikyilmaz, Michel Galley, Chenyan Xiong, Yizhe Zhang, Mohit Bansal, and Jianfeng Gao. 2021. Data augmentation for abstractive query-focused multi-document summarization. In AAAI."},{"key":"e_1_3_2_182_2","volume-title":"NAACL-HLT","author":"Pasunuru Ramakanth","year":"2021","unstructured":"Ramakanth Pasunuru, Mengwen Liu, Mohit Bansal, Sujith Ravi, and Markus Dreyer. 2021. Efficiently summarizing text and graph encodings of multi-document clusters. In NAACL-HLT."},{"key":"e_1_3_2_183_2","volume-title":"ICLR (Poster)","author":"Paulus Romain","year":"2018","unstructured":"Romain Paulus, Caiming Xiong, and Richard Socher. 2018. A deep reinforced model for abstractive summarization. In ICLR (Poster)."},{"key":"e_1_3_2_184_2","volume-title":"EMNLP Findings","author":"Peng Baolin","year":"2020","unstructured":"Baolin Peng, Chenguang Zhu, Chunyuan Li, Xiujun Li, Jinchao Li, Michael Zeng, and Jianfeng Gao. 2020. Few-shot natural language generation for task-oriented dialog. In EMNLP Findings."},{"key":"e_1_3_2_185_2","article-title":"Sentence encoders on stilts: Supplementary training on intermediate labeled-data tasks","author":"Phang Jason","year":"2018","unstructured":"Jason Phang, Thibault F\u00e9vry, and Samuel R. Bowman. 2018. Sentence encoders on stilts: Supplementary training on intermediate labeled-data tasks. arXiv preprint arXiv:1811.01088 (2018).","journal-title":"arXiv preprint arXiv:1811.01088"},{"key":"e_1_3_2_186_2","volume-title":"ACL","author":"Pires Telmo","year":"2019","unstructured":"Telmo Pires, Eva Schlinger, and Dan Garrette. 2019. How multilingual is multilingual BERT?. In ACL."},{"key":"e_1_3_2_187_2","volume-title":"WMT","author":"Popovic Maja","year":"2017","unstructured":"Maja Popovic. 2017. chrF++: Words helping character n-grams. 
In WMT."},{"key":"e_1_3_2_188_2","volume-title":"NAACL-HLT","author":"Post Matt","year":"2018","unstructured":"Matt Post and David Vilar. 2018. Fast lexically constrained decoding with dynamic beam allocation for neural machine translation. In NAACL-HLT."},{"key":"e_1_3_2_189_2","volume-title":"ICLR","author":"Press Ofir","year":"2022","unstructured":"Ofir Press, Noah A. Smith, and Mike Lewis. 2022. Train short, test long: Attention with linear biases enables input length extrapolation. In ICLR."},{"key":"e_1_3_2_190_2","unstructured":"Weizhen Qi Yeyun Gong Jian Jiao Yu Yan Weizhu Chen Dayiheng Liu Kewen Tang Houqiang Li Jiusheng Chen Ruofei Zhang Ming Zhou and Nan Duan. 2021. BANG: Bridging autoregressive and non-autoregressive generation with large scale pretraining. In Proceedings of the 38th International Conference on Machine Learning ICML 2021 18-24 July 2021 Virtual Event (Proceedings of Machine Learning Research Vol. 139) Marina Meila and Tong Zhang (Eds.). PMLR 8630\u20138639."},{"key":"e_1_3_2_191_2","volume-title":"EMNLP Findings","author":"Qi Weizhen","year":"2020","unstructured":"Weizhen Qi, Yu Yan, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang, and Ming Zhou. 2020. ProphetNet: Predicting future n-gram for sequence-to-sequence pre-training. In EMNLP Findings."},{"key":"e_1_3_2_192_2","article-title":"Pre-trained models for natural language processing: A survey","author":"Qiu Xipeng","year":"2020","unstructured":"Xipeng Qiu, Tianxiang Sun, Yige Xu, Yunfan Shao, Ning Dai, and Xuanjing Huang. 2020. Pre-trained models for natural language processing: A survey. arXiv preprint arXiv:2003.08271 (2020).","journal-title":"arXiv preprint arXiv:2003.08271"},{"key":"e_1_3_2_193_2","unstructured":"Alec Radford Karthik Narasimhan Tim Salimans and Ilya Sutskever. 2018. Improving Language Understanding by Generative Pre-Training. 
(2018)."},{"key":"e_1_3_2_194_2","article-title":"Language models are unsupervised multitask learners","author":"Radford Alec","year":"2019","unstructured":"Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners. OpenAI blog (2019).","journal-title":"OpenAI blog"},{"key":"e_1_3_2_195_2","article-title":"Exploring the limits of transfer learning with a unified text-to-text transformer","author":"Raffel Colin","year":"2020","unstructured":"Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2020. Exploring the limits of transfer learning with a unified text-to-text transformer. JMLR (2020).","journal-title":"JMLR"},{"key":"e_1_3_2_196_2","first-page":"20","volume-title":"SC","author":"Rajbhandari Samyam","year":"2020","unstructured":"Samyam Rajbhandari, Jeff Rasley, Olatunji Ruwase, and Yuxiong He. 2020. ZeRO: Memory optimizations toward training trillion parameter models. In SC. 20."},{"key":"e_1_3_2_197_2","doi-asserted-by":"crossref","DOI":"10.1145\/3567592","article-title":"Neural machine translation for low-resource languages: A survey","author":"Ranathunga Surangika","year":"2023","unstructured":"Surangika Ranathunga, En-Shiun Annie Lee, Marjana Prifti Skenduli, Ravi Shekhar, Mehreen Alam, and Rishemjit Kaur. 2023. Neural machine translation for low-resource languages: A survey. ACM Comput. Surv. (2023).","journal-title":"ACM Comput. Surv."},{"key":"e_1_3_2_198_2","volume-title":"EMNLP","author":"Rashkin Hannah","year":"2020","unstructured":"Hannah Rashkin, Asli Celikyilmaz, Yejin Choi, and Jianfeng Gao. 2020. PlotMachines: Outline-conditioned generation with dynamic plot state tracking. In EMNLP."},{"key":"e_1_3_2_199_2","volume-title":"SIGKDD","author":"Rasley Jeff","year":"2020","unstructured":"Jeff Rasley, Samyam Rajbhandari, Olatunji Ruwase, and Yuxiong He. 2020. 
DeepSpeed: System optimizations enable training deep learning models with over 100 billion parameters. In SIGKDD."},{"key":"e_1_3_2_200_2","volume-title":"EMNLP","author":"Rei Ricardo","year":"2020","unstructured":"Ricardo Rei, Craig Stewart, Ana C. Farinha, and Alon Lavie. 2020. COMET: A neural framework for MT evaluation. In EMNLP."},{"key":"e_1_3_2_201_2","volume-title":"EMNLP","author":"Reid Machel","year":"2021","unstructured":"Machel Reid, Junjie Hu, Graham Neubig, and Yutaka Matsuo. 2021. AfroMT: Pretraining strategies and reproducible benchmarks for translation of 8 African languages. In EMNLP."},{"key":"e_1_3_2_202_2","volume-title":"EMNLP\/IJCNLP","author":"Ren Shuo","year":"2019","unstructured":"Shuo Ren, Yu Wu, Shujie Liu, Ming Zhou, and Shuai Ma. 2019. Explicit cross-lingual pre-training for unsupervised machine translation. In EMNLP\/IJCNLP."},{"key":"e_1_3_2_203_2","article-title":"Investigating pretrained language models for graph-to-text generation","author":"Ribeiro Leonardo F. R.","year":"2020","unstructured":"Leonardo F. R. Ribeiro, Martin Schmitt, Hinrich Sch\u00fctze, and Iryna Gurevych. 2020. Investigating pretrained language models for graph-to-text generation. arXiv preprint arXiv:2007.08426 (2020).","journal-title":"arXiv preprint arXiv:2007.08426"},{"key":"e_1_3_2_204_2","volume-title":"EMNLP","author":"Ribeiro Leonardo F. R.","year":"2021","unstructured":"Leonardo F. R. Ribeiro, Yue Zhang, and Iryna Gurevych. 2021. Structural adapters in pretrained language models for amr-to-text generation. In EMNLP."},{"key":"e_1_3_2_205_2","volume-title":"EACL","author":"Roller Stephen","year":"2021","unstructured":"Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Eric Michael Smith, Y-Lan Boureau, and Jason Weston. 2021. Recipes for building an open-domain chatbot. 
In EACL."},{"key":"e_1_3_2_206_2","article-title":"Leveraging pre-trained checkpoints for sequence generation tasks","author":"Rothe Sascha","year":"2020","unstructured":"Sascha Rothe, Shashi Narayan, and Aliaksei Severyn. 2020. Leveraging pre-trained checkpoints for sequence generation tasks. TACL (2020).","journal-title":"TACL"},{"key":"e_1_3_2_207_2","volume-title":"ACL","author":"Sai Ananya B.","year":"2023","unstructured":"Ananya B. Sai, Tanay Dixit, Vignesh Nagarajan, Anoop Kunchukuttan, Pratyush Kumar, Mitesh M. Khapra, and Raj Dabre. 2023. IndicMT Eval: A dataset to meta-evaluate machine translation metrics for indian languages. In ACL."},{"key":"e_1_3_2_208_2","article-title":"Abstractive summarization with combination of pre-trained sequence-to-sequence and saliency models","author":"Saito Itsumi","year":"2020","unstructured":"Itsumi Saito, Kyosuke Nishida, Kosuke Nishida, and Junji Tomita. 2020. Abstractive summarization with combination of pre-trained sequence-to-sequence and saliency models. arXiv preprint arXiv:2003.13028 (2020).","journal-title":"arXiv preprint arXiv:2003.13028"},{"key":"e_1_3_2_209_2","doi-asserted-by":"crossref","first-page":"1408","DOI":"10.1162\/tacl_a_00434","article-title":"Self-diagnosis and self-debiasing: A proposal for reducing corpus-based bias in NLP","volume":"9","author":"Schick Timo","year":"2021","unstructured":"Timo Schick, Sahana Udupa, and Hinrich Sch\u00fctze. 2021. Self-diagnosis and self-debiasing: A proposal for reducing corpus-based bias in NLP. Trans. Assoc. Comput. Linguistics 9 (2021), 1408\u20131424.","journal-title":"Trans. Assoc. Comput. Linguistics"},{"key":"e_1_3_2_210_2","article-title":"Proximal policy optimization algorithms","author":"Schulman John","year":"2017","unstructured":"John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. 2017. Proximal policy optimization algorithms. 
arXiv preprint arXiv:1707.06347 (2017).","journal-title":"arXiv preprint arXiv:1707.06347"},{"key":"e_1_3_2_211_2","volume-title":"NeurIPS","author":"Scialom Thomas","year":"2020","unstructured":"Thomas Scialom, Paul-Alexis Dray, Sylvain Lamprier, Benjamin Piwowarski, and Jacopo Staiano. 2020. ColdGANs: Taming language GANs with cautious sampling strategies. In NeurIPS."},{"key":"e_1_3_2_212_2","volume-title":"ACL","author":"See Abigail","year":"2017","unstructured":"Abigail See, Peter J. Liu, and Christopher D. Manning. 2017. Get to the point: Summarization with pointer-generator networks. In ACL."},{"key":"e_1_3_2_213_2","volume-title":"ACL","author":"Sellam Thibault","year":"2020","unstructured":"Thibault Sellam, Dipanjan Das, and Ankur P. Parikh. 2020. BLEURT: Learning robust metrics for text generation. In ACL."},{"key":"e_1_3_2_214_2","article-title":"On accurate evaluation of GANs for language generation","author":"Semeniuta Stanislau","year":"2018","unstructured":"Stanislau Semeniuta, Aliaksei Severyn, and Sylvain Gelly. 2018. On accurate evaluation of GANs for language generation. arXiv preprint arXiv:1806.04936 (2018).","journal-title":"arXiv preprint arXiv:1806.04936"},{"key":"e_1_3_2_215_2","article-title":"Human vs automatic metrics: On the importance of correlation design","author":"Shimorina Anastasia","year":"2018","unstructured":"Anastasia Shimorina. 2018. Human vs automatic metrics: On the importance of correlation design. arXiv (2018).","journal-title":"arXiv"},{"key":"e_1_3_2_216_2","volume-title":"EMNLP","author":"Shin Taylor","year":"2020","unstructured":"Taylor Shin, Yasaman Razeghi, Robert L. Logan IV, Eric Wallace, and Sameer Singh. 2020. AutoPrompt: Eliciting knowledge from language models with automatically generated prompts. 
In EMNLP."},{"key":"e_1_3_2_217_2","article-title":"Megatron-LM: Training multi-billion parameter language models using model parallelism","author":"Shoeybi Mohammad","year":"2019","unstructured":"Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper, and Bryan Catanzaro. 2019. Megatron-LM: Training multi-billion parameter language models using model parallelism. arXiv (2019).","journal-title":"arXiv"},{"key":"e_1_3_2_218_2","unstructured":"Karan Singhal Shekoofeh Azizi Tao Tu S. Sara Mahdavi Jason Wei Hyung Won Chung Nathan Scales Ajay Kumar Tanwani Heather Cole-Lewis Stephen Pfohl Perry Payne Martin Seneviratne Paul Gamble Chris Kelly Nathanael Sch\u00e4rli Aakanksha Chowdhery Philip Andrew Mansfield Blaise Ag\u00fcera y Arcas Dale R. Webster Gregory S. Corrado Yossi Matias Katherine Chou Juraj Gottweis Nenad Tomasev Yun Liu Alvin Rajkomar Joelle K. Barral Christopher Semturs Alan Karthikesalingam and Vivek Natarajan. 2022. Large language models encode clinical knowledge. CoRR abs\/2212.13138 (2022)."},{"key":"e_1_3_2_219_2","doi-asserted-by":"crossref","unstructured":"Linda B. Smith and Michael Gasser. 2005. The Development of Embodied Cognition: Six Lessons from Babies. (2005).","DOI":"10.1162\/1064546053278973"},{"key":"e_1_3_2_220_2","volume-title":"ICML","author":"Song Kaitao","year":"2019","unstructured":"Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, and Tie-Yan Liu. 2019. MASS: Masked sequence to sequence pre-training for language generation. In ICML."},{"key":"e_1_3_2_221_2","unstructured":"Aarohi Srivastava Abhinav Rastogi Abhishek Rao Abu Awal Md Shoeb Abubakar Abid Adam Fisch Adam R. Brown Adam Santoro Aditya Gupta Adri\u00e0 Garriga-Alonso Agnieszka Kluska Aitor Lewkowycz Akshat Agarwal Alethea Power Alex Ray Alex Warstadt Alexander W. Kocurek Ali Safaya Ali Tazarv Alice Xiang Alicia Parrish Allen Nie Aman Hussain Amanda Askell Amanda Dsouza Ameet Rahane Anantharaman S. 
Iyer Anders Andreassen Andrea Santilli Andreas Stuhlm\u00fcller Andrew M. Dai Andrew La Andrew K. Lampinen Andy Zou Angela Jiang Angelica Chen Anh Vuong Animesh Gupta Anna Gottardi Antonio Norelli Anu Venkatesh Arash Gholamidavoodi Arfa Tabassum Arul Menezes Arun Kirubarajan Asher Mullokandov Ashish Sabharwal Austin Herrick Avia Efrat Aykut Erdem and Ayla Karakas. 2022. Beyond the imitation game: Quantifying and extrapolating the capabilities of language models. CoRR abs\/2206.04615 (2022)."},{"key":"e_1_3_2_222_2","doi-asserted-by":"crossref","first-page":"11974","DOI":"10.1109\/ACCESS.2021.3051315","article-title":"A survey of contrastive and counterfactual explanation generation methods for explainable artificial intelligence","volume":"9","author":"Stepin Ilia","year":"2021","unstructured":"Ilia Stepin, Jose M. Alonso, Alejandro Catala, and Mart\u00edn Pereira-Fari\u00f1a. 2021. A survey of contrastive and counterfactual explanation generation methods for explainable artificial intelligence. IEEE Access 9 (2021), 11974\u201312001.","journal-title":"IEEE Access"},{"key":"e_1_3_2_223_2","volume-title":"NeurIPS","author":"Stern Mitchell","year":"2018","unstructured":"Mitchell Stern, Noam Shazeer, and Jakob Uszkoreit. 2018. Blockwise parallel decoding for deep autoregressive models. In NeurIPS."},{"key":"e_1_3_2_224_2","volume-title":"EACL","author":"Stickland Asa Cooper","year":"2021","unstructured":"Asa Cooper Stickland, Xian Li, and Marjan Ghazvininejad. 2021. Recipes for adapting pre-trained monolingual and multilingual models to machine translation. In EACL."},{"key":"e_1_3_2_225_2","unstructured":"Jianlin Su. 2023. Transformer Upgrade Path: 12 Infinite Extrapolation of ReRoPE?"},{"key":"e_1_3_2_226_2","article-title":"RoFormer: Enhanced transformer with rotary position embedding","volume":"2104","author":"Su Jianlin","year":"2021","unstructured":"Jianlin Su, Yu Lu, Shengfeng Pan, Bo Wen, and Yunfeng Liu. 2021. 
RoFormer: Enhanced transformer with rotary position embedding. CoRR abs\/2104.09864 (2021).","journal-title":"CoRR"},{"key":"e_1_3_2_227_2","volume-title":"NeurIPS","author":"Su Yixuan","year":"2022","unstructured":"Yixuan Su, Tian Lan, Yan Wang, Dani Yogatama, Lingpeng Kong, and Nigel Collier. 2022. A contrastive framework for neural text generation. In NeurIPS."},{"key":"e_1_3_2_228_2","volume-title":"ICCV","author":"Sun Chen","year":"2019","unstructured":"Chen Sun, Austin Myers, Carl Vondrick, Kevin Murphy, and Cordelia Schmid. 2019. VideoBERT: A joint model for video and language representation learning. In ICCV."},{"key":"e_1_3_2_229_2","article-title":"A length-extrapolatable transformer","author":"Sun Yutao","year":"2022","unstructured":"Yutao Sun, Li Dong, Barun Patra, Shuming Ma, Shaohan Huang, Alon Benhaim, Vishrav Chaudhary, Xia Song, and Furu Wei. 2022. A length-extrapolatable transformer. arXiv (2022).","journal-title":"arXiv"},{"key":"e_1_3_2_230_2","unstructured":"Yu Sun Shuohuan Wang Shikun Feng Siyu Ding Chao Pang Junyuan Shang Jiaxiang Liu Xuyi Chen Yanbin Zhao Yuxiang Lu Weixin Liu Zhihua Wu Weibao Gong Jianzhong Liang Zhizhou Shang Peng Sun Wei Liu Xuan Ouyang Dianhai Yu Hao Tian Hua Wu and Haifeng Wang. 2021. ERNIE 3.0: Large-scale knowledge enhanced pre-training for language understanding and generation. arXiv (2021)."},{"key":"e_1_3_2_231_2","volume-title":"NIPS","author":"Sutskever Ilya","year":"2014","unstructured":"Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to sequence learning with neural networks. In NIPS."},{"key":"e_1_3_2_232_2","volume-title":"ACL\/IJCNLP Findings","author":"Tang Yuqing","year":"2021","unstructured":"Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, and Angela Fan. 2021. Multilingual translation from denoising pre-training. 
In ACL\/IJCNLP Findings."},{"key":"e_1_3_2_233_2","volume-title":"NAACL","author":"Tao Tao","year":"2006","unstructured":"Tao Tao, Xuanhui Wang, Qiaozhu Mei, and ChengXiang Zhai. 2006. Language model information retrieval with document expansion. In NAACL."},{"key":"e_1_3_2_234_2","volume-title":"ICLR","author":"Tay Yi","year":"2023","unstructured":"Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Jason Wei, Xuezhi Wang, Hyung Won Chung, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Denny Zhou, Neil Houlsby, and Donald Metzler. 2023. UL2: Unifying language learning paradigms. In ICLR."},{"key":"e_1_3_2_235_2","article-title":"LLaMA: Open and efficient foundation language models","volume":"2302","author":"Touvron Hugo","year":"2023","unstructured":"Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timoth\u00e9e Lacroix, Baptiste Rozi\u00e8re, Naman Goyal, Eric Hambro, Faisal Azhar, Aur\u00e9lien Rodriguez, Armand Joulin, Edouard Grave, and Guillaume Lample. 2023. LLaMA: Open and efficient foundation language models. 
CoRR abs\/2302.13971 (2023).","journal-title":"CoRR"},{"key":"e_1_3_2_236_2","unstructured":"Hugo Touvron Louis Martin Kevin Stone Peter Albert Amjad Almahairi Yasmine Babaei Nikolay Bashlykov Soumya Batra Prajjwal Bhargava Shruti Bhosale Dan Bikel Lukas Blecher Cristian Canton-Ferrer Moya Chen Guillem Cucurull David Esiobu Jude Fernandes Jeremy Fu Wenyin Fu Brian Fuller Cynthia Gao Vedanuj Goswami Naman Goyal Anthony Hartshorn Saghar Hosseini Rui Hou Hakan Inan Marcin Kardas Viktor Kerkez Madian Khabsa Isabel Kloumann Artem Korenev Punit Singh Koura Marie-Anne Lachaux Thibaut Lavril Jenya Lee Diana Liskovich Yinghai Lu Yuning Mao Xavier Martinet Todor Mihaylov Pushkar Mishra Igor Molybog Yixin Nie Andrew Poulton Jeremy Reizenstein Rashi Rungta Kalyan Saladi Alan Schelten Ruan Silva Eric Michael Smith Ranjan Subramanian Xiaoqing Ellen Tan Binh Tang Ross Taylor Adina Williams Jian Xiang Kuan Puxin Xu Zheng Yan Iliyan Zarov Yuchen Zhang Angela Fan Melanie Kambadur Sharan Narang Aur\u00e9lien Rodriguez Robert Stojnic Sergey Edunov and Thomas Scialom. 2023. Llama 2: Open foundation and fine-tuned chat models. CoRR abs\/2307.09288 (2023)."},{"key":"e_1_3_2_237_2","unstructured":"Jonathan Uesato Nate Kushman Ramana Kumar H. Francis Song Noah Y. Siegel Lisa Wang Antonia Creswell Geoffrey Irving and Irina Higgins. 2022. Solving math word problems with process- and outcome-based feedback. CoRR abs\/2211.14275 (2022)."},{"key":"e_1_3_2_238_2","volume-title":"NIPS","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In NIPS."},{"key":"e_1_3_2_239_2","article-title":"Diverse beam search: Decoding diverse solutions from neural sequence models","author":"Vijayakumar Ashwin K.","year":"2016","unstructured":"Ashwin K. Vijayakumar, Michael Cogswell, Ramprasath R. 
Selvaraju, Qing Sun, Stefan Lee, David Crandall, and Dhruv Batra. 2016. Diverse beam search: Decoding diverse solutions from neural sequence models. arXiv (2016).","journal-title":"arXiv"},{"key":"e_1_3_2_240_2","article-title":"Unsupervised cross-lingual word embedding by multilingual neural language models","author":"Wada Takashi","year":"2018","unstructured":"Takashi Wada and Tomoharu Iwata. 2018. Unsupervised cross-lingual word embedding by multilingual neural language models. arXiv preprint arXiv:1809.02306 (2018).","journal-title":"arXiv preprint arXiv:1809.02306"},{"key":"e_1_3_2_241_2","volume-title":"NeurIPS","author":"Wang Alex","year":"2019","unstructured":"Alex Wang, Yada Pruksachatkun, Nikita Nangia, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman. 2019. SuperGLUE: A stickier benchmark for general-purpose language understanding systems. In NeurIPS."},{"key":"e_1_3_2_242_2","volume-title":"ICLR","author":"Wang Alex","year":"2019","unstructured":"Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman. 2019. GLUE: A multi-task benchmark and analysis platform for natural language understanding. In ICLR."},{"key":"e_1_3_2_243_2","volume-title":"ICML","author":"Wang Thomas","year":"2022","unstructured":"Thomas Wang, Adam Roberts, Daniel Hesslow, Teven Le Scao, Hyung Won Chung, Iz Beltagy, Julien Launay, and Colin Raffel. 2022. What language model architecture and pretraining objective works best for zero-shot generalization? In ICML."},{"key":"e_1_3_2_244_2","first-page":"6827","volume-title":"ICCV","author":"Wang Teng","year":"2021","unstructured":"Teng Wang, Ruimao Zhang, Zhichao Lu, Feng Zheng, Ran Cheng, and Ping Luo. 2021. End-to-end dense video captioning with parallel decoding. In ICCV. IEEE, 6827\u20136837."},{"key":"e_1_3_2_245_2","volume-title":"ECIR","author":"Wang Wei","year":"2021","unstructured":"Wei Wang, Piji Li, and Hai-Tao Zheng. 2021. 
Consistency and coherency enhanced story generation. In ECIR."},{"key":"e_1_3_2_246_2","volume-title":"NAACL-HLT Industry","author":"Wang Xiaohui","year":"2021","unstructured":"Xiaohui Wang, Ying Xiong, Yang Wei, Mingxuan Wang, and Lei Li. 2021. LightSeq: A high performance inference library for transformers. In NAACL-HLT Industry."},{"key":"e_1_3_2_247_2","article-title":"Perplexity from PLM is unreliable for evaluating text quality","author":"Wang Yequan","year":"2022","unstructured":"Yequan Wang, Jiawen Deng, Aixin Sun, and Xuying Meng. 2022. Perplexity from PLM is unreliable for evaluating text quality. arXiv preprint arXiv:2210.05892 (2022).","journal-title":"arXiv preprint arXiv:2210.05892"},{"key":"e_1_3_2_248_2","article-title":"Measuring and reducing gendered correlations in pre-trained models","volume":"2010","author":"Webster Kellie","year":"2020","unstructured":"Kellie Webster, Xuezhi Wang, Ian Tenney, Alex Beutel, Emily Pitler, Ellie Pavlick, Jilin Chen, and Slav Petrov. 2020. Measuring and reducing gendered correlations in pre-trained models. CoRR abs\/2010.06032 (2020).","journal-title":"CoRR"},{"key":"e_1_3_2_249_2","unstructured":"Jason Wei Yi Tay Rishi Bommasani Colin Raffel Barret Zoph Sebastian Borgeaud Dani Yogatama Maarten Bosma Denny Zhou Donald Metzler Ed H. Chi Tatsunori Hashimoto Oriol Vinyals Percy Liang Jeff Dean and William Fedus. 2022. Emergent abilities of large language models. Trans. Mach. Learn. Res. 2022 (2022)."},{"key":"e_1_3_2_250_2","doi-asserted-by":"crossref","unstructured":"Thomas Wolf Lysandre Debut Victor Sanh Julien Chaumond Clement Delangue Anthony Moi Pierric Cistac Tim Rault R\u00e9mi Louf Morgan Funtowicz Joe Davison Sam Shleifer Patrick von Platen Clara Ma Yacine Jernite Julien Plu Canwen Xu Teven Le Scao Sylvain Gugger Mariama Drame Quentin Lhoest and Alexander M. Rush. 2020. Transformers: State-of-the-art natural language processing. 
In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations EMNLP 2020 - Demos Online November 16-20 2020 Qun Liu and David Schlangen (Eds.). Association for Computational Linguistics 38\u201345.","DOI":"10.18653\/v1\/2020.emnlp-demos.6"},{"key":"e_1_3_2_251_2","article-title":"TransferTransfo: A transfer learning approach for neural network based conversational agents","author":"Wolf Thomas","year":"2019","unstructured":"Thomas Wolf, Victor Sanh, Julien Chaumond, and Clement Delangue. 2019. TransferTransfo: A transfer learning approach for neural network based conversational agents. arXiv preprint arXiv:1901.08149 (2019).","journal-title":"arXiv preprint arXiv:1901.08149"},{"key":"e_1_3_2_252_2","volume-title":"NLPCC","author":"Xia Qiaolin","year":"2021","unstructured":"Qiaolin Xia, Haoyang Huang, Nan Duan, Dongdong Zhang, Lei Ji, Zhifang Sui, Edward Cui, Taroon Bharti, and Ming Zhou. 2021. XGPT: Cross-modal generative pre-training for image captioning. In NLPCC."},{"key":"e_1_3_2_253_2","article-title":"SmoothQuant: Accurate and efficient post-training quantization for large language models","author":"Xiao Guangxuan","year":"2022","unstructured":"Guangxuan Xiao, Ji Lin, Micka\u00ebl Seznec, Julien Demouth, and Song Han. 2022. SmoothQuant: Accurate and efficient post-training quantization for large language models. arXiv (2022).","journal-title":"arXiv"},{"key":"e_1_3_2_254_2","volume-title":"EMNLP","author":"Xu Peng","year":"2020","unstructured":"Peng Xu, Mostofa Patwary, Mohammad Shoeybi, Raul Puri, Pascale Fung, Anima Anandkumar, and Bryan Catanzaro. 2020. MEGATRON-CNTRL: Controllable story generation with external knowledge using large-scale language models. In EMNLP."},{"key":"e_1_3_2_255_2","volume-title":"ACL\/IJCNLP","author":"Xue Lanqing","year":"2021","unstructured":"Lanqing Xue, Kaitao Song, Duocai Wu, Xu Tan, Nevin L. Zhang, Tao Qin, Wei-Qiang Zhang, and Tie-Yan Liu. 2021. 
DeepRapper: Neural rap generation with rhyme and rhythm modeling. In ACL\/IJCNLP."},{"key":"e_1_3_2_256_2","article-title":"FastSeq: Make sequence generation faster","author":"Yan Yu","year":"2021","unstructured":"Yu Yan, Fei Hu, Jiusheng Chen, Nikhil Bhendawade, Ting Ye, Yeyun Gong, Nan Duan, Desheng Cui, Bingyu Chi, and Ruifei Zhang. 2021. FastSeq: Make sequence generation faster. arXiv preprint arXiv:2106.04718 (2021).","journal-title":"arXiv preprint arXiv:2106.04718"},{"key":"e_1_3_2_257_2","volume-title":"AAAI","author":"Yang Jiacheng","year":"2020","unstructured":"Jiacheng Yang, Mingxuan Wang, Hao Zhou, Chengqi Zhao, Weinan Zhang, Yong Yu, and Lei Li. 2020. Towards making the most of BERT in neural machine translation. In AAAI."},{"key":"e_1_3_2_258_2","volume-title":"NeurIPS","author":"Yang Zhilin","year":"2019","unstructured":"Zhilin Yang, Zihang Dai, Yiming Yang, Jaime G. Carbonell, Ruslan Salakhutdinov, and Quoc V. Le. 2019. XLNet: Generalized autoregressive pretraining for language understanding. In NeurIPS."},{"key":"e_1_3_2_259_2","volume-title":"EMNLP","author":"Yang Zhen","year":"2020","unstructured":"Zhen Yang, Bojie Hu, Ambyera Han, Shen Huang, and Qi Ju. 2020. CSP: Code-switching pre-training for neural machine translation. In EMNLP."},{"key":"e_1_3_2_260_2","volume-title":"EMNLP Findings","author":"Yang Ziyi","year":"2020","unstructured":"Ziyi Yang, Chenguang Zhu, Robert Gmyr, Michael Zeng, Xuedong Huang, and Eric Darve. 2020. TED: A pretrained unsupervised summarization model with theme modeling and denoising. In EMNLP Findings."},{"key":"e_1_3_2_261_2","volume-title":"ACL","author":"You Weiqiu","year":"2020","unstructured":"Weiqiu You, Simeng Sun, and Mohit Iyyer. 2020. Hard-coded gaussian attention for neural machine translation. In ACL."},{"key":"e_1_3_2_262_2","volume-title":"AAAI","author":"Yu Lantao","year":"2017","unstructured":"Lantao Yu, Weinan Zhang, Jun Wang, and Yong Yu. 2017. 
SeqGAN: Sequence generative adversarial nets with policy gradient. In AAAI."},{"key":"e_1_3_2_263_2","unstructured":"Manzil Zaheer Guru Guruganesh Kumar Avinava Dubey Joshua Ainslie Chris Alberti Santiago Onta\u00f1\u00f3n Philip Pham Anirudh Ravula Qifan Wang Li Yang and Amr Ahmed. 2020. Big Bird: Transformers for longer sequences. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020 NeurIPS 2020 December 6-12 2020 Virtual."},{"key":"e_1_3_2_264_2","volume-title":"ACSW","author":"Zaib Munazza","year":"2020","unstructured":"Munazza Zaib, Quan Z. Sheng, and Wei Emma Zhang. 2020. A short survey of pre-trained language models for conversational AI-A new age in NLP. In ACSW."},{"key":"e_1_3_2_265_2","volume-title":"ICLR","author":"Zeng Aohan","year":"2023","unstructured":"Aohan Zeng, Xiao Liu, Zhengxiao Du, Zihan Wang, Hanyu Lai, Ming Ding, Zhuoyi Yang, Yifan Xu, Wendi Zheng, Xiao Xia, Weng Lam Tam, Zixuan Ma, et\u00a0al. 2023. GLM-130B: An open bilingual pre-trained model. In ICLR."},{"key":"e_1_3_2_266_2","article-title":"PanGu-\\(\\alpha\\): Large-scale autoregressive pretrained Chinese language models with auto-parallel computation","author":"Zeng Wei","year":"2021","unstructured":"Wei Zeng, Xiaozhe Ren, Teng Su, Hui Wang, Yi Liao, Zhiwei Wang, Xin Jiang, ZhenZhang Yang, Kaisheng Wang, Xiaoda Zhang, Chen Li, Ziyan Gong, Yifan Yao, Xinjing Huang, Jun Wang, Jianfeng Yu, Qi Guo, Yue Yu, Yan Zhang, Jin Wang, Hengtao Tao, Dasen Yan, Zexuan Yi, Fang Peng, Fangqing Jiang, Han Zhang, Lingfeng Deng, Yehong Zhang, Zhe Lin, Chao Zhang, Shaojie Zhang, Mingyue Guo, Shanzhi Gu, Gaojun Fan, Yaowei Wang, Xuefeng Jin, Qun Liu, and Yonghong Tian. 2021. PanGu-\\(\\alpha\\): Large-scale autoregressive pretrained Chinese language models with auto-parallel computation. 
arXiv preprint arXiv:2104.12369 (2021).","journal-title":"arXiv preprint arXiv:2104.12369"},{"key":"e_1_3_2_267_2","article-title":"Generalized conditioned dialogue generation based on pre-trained language model","author":"Zeng Yan","year":"2020","unstructured":"Yan Zeng and Jian-Yun Nie. 2020. Generalized conditioned dialogue generation based on pre-trained language model. arXiv preprint arXiv:2010.11140 (2020).","journal-title":"arXiv preprint arXiv:2010.11140"},{"key":"e_1_3_2_268_2","volume-title":"NAACL-HLT","author":"Zeng Yan","year":"2021","unstructured":"Yan Zeng and Jian-Yun Nie. 2021. A simple and efficient multi-task learning approach for conditioned dialogue generation. In NAACL-HLT."},{"key":"e_1_3_2_269_2","volume-title":"CIKM","author":"Zhai ChengXiang","year":"2001","unstructured":"ChengXiang Zhai and John D. Lafferty. 2001. Model-based feedback in the language modeling approach to information retrieval. In CIKM."},{"key":"e_1_3_2_270_2","volume-title":"ICML","author":"Zhang Jingqing","year":"2020","unstructured":"Jingqing Zhang, Yao Zhao, Mohammad Saleh, and Peter J. Liu. 2020. PEGASUS: Pre-training with extracted gap-sentences for abstractive summarization. In ICML."},{"key":"e_1_3_2_271_2","volume-title":"EMNLP Findings","author":"Zhang Longxiang","year":"2021","unstructured":"Longxiang Zhang, Renato Negrinho, Arindam Ghosh, Vasudevan Jagannathan, Hamid Reza Hassanzadeh, Thomas Schaaf, and Matthew R. Gormley. 2021. Leveraging pretrained models for automatic summarization of doctor-patient conversations. In EMNLP Findings."},{"key":"e_1_3_2_272_2","article-title":"Adaptive budget allocation for parameter-efficient fine-tuning","author":"Zhang Qingru","year":"2023","unstructured":"Qingru Zhang, Minshuo Chen, Alexander Bukharin, Pengcheng He, Yu Cheng, Weizhu Chen, and Tuo Zhao. 2023. Adaptive budget allocation for parameter-efficient fine-tuning. 
arXiv preprint arXiv:2303.10512 (2023).","journal-title":"arXiv preprint arXiv:2303.10512"},{"key":"e_1_3_2_273_2","volume-title":"ICLR","author":"Zhang Tianyi","year":"2020","unstructured":"Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger, and Yoav Artzi. 2020. BERTScore: Evaluating text generation with BERT. In ICLR."},{"key":"e_1_3_2_274_2","volume-title":"ACL","author":"Zhang Xingxing","year":"2019","unstructured":"Xingxing Zhang, Furu Wei, and Ming Zhou. 2019. HIBERT: Document level pre-training of hierarchical bidirectional transformers for document summarization. In ACL."},{"key":"e_1_3_2_275_2","volume-title":"ACL","author":"Zhang Yizhe","year":"2020","unstructured":"Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, and Bill Dolan. 2020. DIALOGPT: Large-scale generative pre-training for conversational response generation. In ACL."},{"key":"e_1_3_2_276_2","doi-asserted-by":"crossref","unstructured":"Zhengyan Zhang Yuxian Gu Xu Han Shengqi Chen Chaojun Xiao Zhenbo Sun Yuan Yao Fanchao Qi Jian Guan Pei Ke Yanzheng Cai Guoyang Zeng Zhixing Tan Zhiyuan Liu Minlie Huang Wentao Han Yang Liu Xiaoyan Zhu and Maosong Sun. 2021. CPM-2: Large-scale cost-effective pre-trained language models. AI Open 2 (2021) 216\u2013224.","DOI":"10.1016\/j.aiopen.2021.12.003"},{"key":"e_1_3_2_277_2","volume-title":"ACL","author":"Zhang Zhengyan","year":"2019","unstructured":"Zhengyan Zhang, Xu Han, Zhiyuan Liu, Xin Jiang, Maosong Sun, and Qun Liu. 2019. ERNIE: Enhanced language representation with informative entities. 
In ACL."},{"key":"e_1_3_2_278_2","article-title":"CPM: A large-scale generative chinese pre-trained language model","author":"Zhang Zhengyan","year":"2020","unstructured":"Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, YuSheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, and Maosong Sun. 2020. CPM: A large-scale generative chinese pre-trained language model. arXiv preprint arXiv:2012.00413 (2020).","journal-title":"arXiv preprint arXiv:2012.00413"},{"key":"e_1_3_2_279_2","volume-title":"NeurIPS","author":"Zhang Zhilu","year":"2018","unstructured":"Zhilu Zhang and Mert R. Sabuncu. 2018. Generalized cross entropy loss for training deep neural networks with noisy labels. In NeurIPS."},{"key":"e_1_3_2_280_2","article-title":"Recent advances and challenges in task-oriented dialog systems","author":"Zhang Zheng","year":"2020","unstructured":"Zheng Zhang, Ryuichi Takanobu, Qi Zhu, MinLie Huang, and XiaoYan Zhu. 2020. Recent advances and challenges in task-oriented dialog systems. Sci. China Technol. Sci. (2020).","journal-title":"Sci. China Technol. Sci."},{"key":"e_1_3_2_281_2","unstructured":"Wayne Xin Zhao Kun Zhou Junyi Li Tianyi Tang Xiaolei Wang Yupeng Hou Yingqian Min Beichen Zhang Junjie Zhang Zican Dong Yifan Du Chen Yang Yushuo Chen Zhipeng Chen Jinhao Jiang Ruiyang Ren Yifan Li Xinyu Tang Zikang Liu Peiyu Liu Jian-Yun Nie and Ji-Rong Wen. 2023. A survey of large language models. arXiv preprint arXiv:2303.18223 (2023)."},{"key":"e_1_3_2_282_2","volume-title":"ACL","author":"Zheng Hao","year":"2019","unstructured":"Hao Zheng and Mirella Lapata. 2019. Sentence centrality revisited for unsupervised summarization. 
In ACL."},{"key":"e_1_3_2_283_2","article-title":"DialogLM: Pre-trained model for long dialogue understanding and summarization","author":"Zhong Ming","year":"2021","unstructured":"Ming Zhong, Yang Liu, Yichong Xu, Chenguang Zhu, and Michael Zeng. 2021. DialogLM: Pre-trained model for long dialogue understanding and summarization. arXiv preprint arXiv:2109.02492 (2021).","journal-title":"arXiv preprint arXiv:2109.02492"},{"key":"e_1_3_2_284_2","article-title":"The design and implementation of xiaoice, an empathetic social chatbot","author":"Zhou Li","year":"2020","unstructured":"Li Zhou, Jianfeng Gao, Di Li, and Heung-Yeung Shum. 2020. The design and implementation of xiaoice, an empathetic social chatbot. Comput. Linguistics (2020).","journal-title":"Comput. Linguistics"},{"key":"e_1_3_2_285_2","article-title":"Controlled text generation with natural language instructions","volume":"2304","author":"Zhou Wangchunshu","year":"2023","unstructured":"Wangchunshu Zhou, Yuchen Eleanor Jiang, Ethan Wilcox, Ryan Cotterell, and Mrinmaya Sachan. 2023. Controlled text generation with natural language instructions. CoRR abs\/2304.14293 (2023).","journal-title":"CoRR"},{"key":"e_1_3_2_286_2","volume-title":"ICLR","author":"Zhou Wangchunshu","year":"2021","unstructured":"Wangchunshu Zhou, Dong-Ho Lee, Ravi Kiran Selvam, Seyeon Lee, and Xiang Ren. 2021. Pre-training text-to-text transformers for concept-centric common sense. In ICLR."},{"key":"e_1_3_2_287_2","volume-title":"SIGIR","author":"Zhu Yaoming","year":"2018","unstructured":"Yaoming Zhu, Sidi Lu, Lei Zheng, Jiaxian Guo, Weinan Zhang, Jun Wang, and Yong Yu. 2018. Texygen: A benchmarking platform for text generation models. In SIGIR."},{"key":"e_1_3_2_288_2","volume-title":"ACL","author":"Zmigrod Ran","year":"2019","unstructured":"Ran Zmigrod, S. J. Mielke, Hanna M. Wallach, and Ryan Cotterell. 2019. Counterfactual data augmentation for mitigating gender stereotypes in languages with rich morphology. 
In ACL."},{"key":"e_1_3_2_289_2","volume-title":"EMNLP","author":"Zou Yicheng","year":"2021","unstructured":"Yicheng Zou, Bolin Zhu, Xingwu Hu, Tao Gui, and Qi Zhang. 2021. Low-resource dialogue summarization with domain-agnostic multi-source pretraining. In EMNLP."}],"container-title":["ACM Computing Surveys"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3649449","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3649449","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T00:03:16Z","timestamp":1750291396000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3649449"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,4,25]]},"references-count":288,"journal-issue":{"issue":"9","published-print":{"date-parts":[[2024,10,31]]}},"alternative-id":["10.1145\/3649449"],"URL":"https:\/\/doi.org\/10.1145\/3649449","relation":{},"ISSN":["0360-0300","1557-7341"],"issn-type":[{"value":"0360-0300","type":"print"},{"value":"1557-7341","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,4,25]]},"assertion":[{"value":"2022-05-13","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-01-31","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-04-25","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}