{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,11]],"date-time":"2026-05-11T15:50:08Z","timestamp":1778514608400,"version":"3.51.4"},"reference-count":304,"publisher":"Association for Computing Machinery (ACM)","issue":"5","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Comput. Surv."],"published-print":{"date-parts":[[2026,4,30]]},"abstract":"<jats:p>\n                    The challenge of effectively and efficiently adapting statically pre-trained Large Language Models (LLMs) to ever-evolving data distributions remains predominant. When tailored for specific needs, pre-trained LLMs often suffer from significant performance degradation in previous knowledge domains\u2014a phenomenon known as\n                    <jats:italic toggle=\"yes\">\u201ccatastrophic forgetting\u201d<\/jats:italic>\n                    . While extensively studied in the Continual Learning (CL) community, this problem presents new challenges in the context of LLMs. In this survey, we provide a comprehensive overview and detailed discussion of the current research progress on LLMs within the context of CL. Besides the introduction of the preliminary knowledge, this survey is structured into four main sections: we first describe an overview of continually learning LLMs, consisting of two directions of continuity:\n                    <jats:italic toggle=\"yes\">vertical continuity (or vertical continual learning)<\/jats:italic>\n                    , i.e., continual adaptation from general to specific capabilities, and\n                    <jats:italic toggle=\"yes\">horizontal continuity (or horizontal continual learning)<\/jats:italic>\n                    , i.e., continual adaptation across time and domains (Section\u00a0\n                    <jats:xref ref-type=\"sec\">3<\/jats:xref>\n                    ). 
Following vertical continuity, we summarize three stages of learning LLMs in the context of modern CL: Continual Pre-Training (CPT), Domain-Adaptive Pre-training (DAP), and Continual Fine-Tuning (CFT) (Section\u00a0\n                    <jats:xref ref-type=\"sec\">4<\/jats:xref>\n                    ). We then provide an overview of evaluation protocols for continual learning with LLMs, along with currently available data sources (Section\u00a0\n                    <jats:xref ref-type=\"sec\">5<\/jats:xref>\n                    ). Finally, we discuss intriguing questions related to continual learning for LLMs (Section\u00a0\n                    <jats:xref ref-type=\"sec\">6<\/jats:xref>\n                    ). This survey sheds light on the relatively understudied domain of continually pre-training, adapting, and fine-tuning large language models, suggesting the necessity for greater attention from the community. Key areas requiring immediate focus include the development of practical and accessible evaluation benchmarks, along with methodologies specifically designed to counter forgetting and enable knowledge transfer within the evolving landscape of LLM learning paradigms. 
The full list of articles examined in this survey is available at https:\/\/github.com\/Wang-ML-Lab\/llm-continual-learning-survey.\n                  <\/jats:p>","DOI":"10.1145\/3735633","type":"journal-article","created":{"date-parts":[[2025,5,14]],"date-time":"2025-05-14T07:28:21Z","timestamp":1747207701000},"page":"1-42","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":38,"title":["Continual Learning of Large Language Models: A Comprehensive Survey"],"prefix":"10.1145","volume":"58","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8431-3703","authenticated-orcid":false,"given":"Haizhou","family":"Shi","sequence":"first","affiliation":[{"name":"Department of Computer Science, Rutgers The State University of New Jersey","place":["New Brunswick, United States"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-8276-5417","authenticated-orcid":false,"given":"Zihao","family":"Xu","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Rutgers The State University of New Jersey","place":["New Brunswick, United States"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7696-9454","authenticated-orcid":false,"given":"Hengyi","family":"Wang","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Rutgers The State University of New Jersey","place":["New Brunswick, United States"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-9225-9188","authenticated-orcid":false,"given":"Weiyi","family":"Qin","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Rutgers The State University of New Jersey","place":["New Brunswick, United States"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-5287-1296","authenticated-orcid":false,"given":"Wenyuan","family":"Wang","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Rutgers The State University of New Jersey","place":["New Brunswick, United 
States"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6099-2628","authenticated-orcid":false,"given":"Yibin","family":"Wang","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Rutgers The State University of New Jersey","place":["New Brunswick, United States"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0068-9042","authenticated-orcid":false,"given":"Zifeng","family":"Wang","sequence":"additional","affiliation":[{"name":"Cloud AI Research, Google Inc","place":["Mountain View, United States"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3133-8867","authenticated-orcid":false,"given":"Sayna","family":"Ebrahimi","sequence":"additional","affiliation":[{"name":"Google DeepMind","place":["Mountain View, United States"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7308-938X","authenticated-orcid":false,"given":"Hao","family":"Wang","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Rutgers The State University of New Jersey","place":["New Brunswick, United States"]}]}],"member":"320","published-online":{"date-parts":[[2025,11,20]]},"reference":[{"key":"e_1_3_3_2_2","unstructured":"Josh Achiam Steven Adler Sandhini Agarwal Lama Ahmad Ilge Akkaya Florencia Leoni Aleman Diogo Almeida Janko Altenschmidt Sam Altman Shyamal Anadkat et\u00a0al. 2023. Gpt-4 technical report. arXiv:2303.08774. Retrieved from https:\/\/arxiv.org\/abs\/2303.08774"},{"key":"e_1_3_3_3_2","unstructured":"Emre Can Acikgoz Osman Batur \u0130nce Rayene Bench Arda An\u0131l Boz \u0130lker Kesen Aykut Erdem and Erkut Erdem. 2024. Hippocrates: An open-source framework for advancing large language models in healthcare. arXiv:2404.16621. Retrieved from https:\/\/arxiv.org\/abs\/2404.16621"},{"key":"e_1_3_3_4_2","unstructured":"Mayank Agarwal Yikang Shen Bailin Wang Yoon Kim and Jie Chen. 2024. Structured Code Representations Enable Data-Efficient Adaptation of Code Language Models. arXiv:2401.10716. 
Retrieved from https:\/\/arxiv.org\/abs\/2401.10716"},{"key":"e_1_3_3_5_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01219-9_9"},{"key":"e_1_3_3_6_2","doi-asserted-by":"crossref","first-page":"2514","DOI":"10.1145\/3447548.3467162","volume-title":"Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining","author":"Hombaiah Spurthi Amba","year":"2021","unstructured":"Spurthi Amba Hombaiah, Tao Chen, Mingyang Zhang, Michael Bendersky, and Marc Najork. 2021. Dynamic language models for continuously evolving content. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 2514\u20132524."},{"key":"e_1_3_3_7_2","unstructured":"Rohan Anil Andrew M. Dai Orhan Firat Melvin Johnson Dmitry Lepikhin Alexandre Passos Siamak Shakeri Emanuel Taropa Paige Bailey Zhifeng Chen et\u00a0al. 2023. Palm 2 technical report. arXiv:2305.10403. Retrieved from https:\/\/arxiv.org\/abs\/2305.10403"},{"key":"e_1_3_3_8_2","unstructured":"Dogu Araci. 2019. FinBERT: Financial Sentiment Analysis with Pre-trained Language Models. arXiv:1908.10063. Retrieved from https:\/\/arxiv.org\/abs\/1908.10063"},{"key":"e_1_3_3_9_2","unstructured":"Giuseppe Attanasio Debora Nozza Federico Bianchi and Dirk Hovy. 2023. Is It Worth the (Environmental) Cost? Limited Evidence for Temporal Adaptation via Continuous Training. arXiv:2210.07365. Retrieved from https:\/\/arxiv.org\/abs\/2210.07365"},{"key":"e_1_3_3_10_2","unstructured":"Zhangir Azerbayev Hailey Schoelkopf Keiran Paster Marco Dos Santos Stephen McAleer Albert Q. Jiang Jia Deng Stella Biderman and Sean Welleck. 2023. Llemma: An open language model for mathematics. arXiv:2310.10631. Retrieved from https:\/\/arxiv.org\/abs\/2310.10631"},{"key":"e_1_3_3_11_2","unstructured":"Xueying Bai Jinghuan Shang Yifan Sun and Niranjan Balasubramanian. 2023. Enhancing Continual Learning with Global Prototypes: Counteracting Negative Representation Drift. arXiv:2205.12186. 
Retrieved from https:\/\/arxiv.org\/abs\/2205.12186"},{"key":"e_1_3_3_12_2","unstructured":"Yuntao Bai Andy Jones Kamal Ndousse Amanda Askell Anna Chen Nova DasSarma Dawn Drain Stanislav Fort Deep Ganguli Tom Henighan et\u00a0al. 2022. Training a helpful and harmless assistant with reinforcement learning from human feedback. arXiv:2204.05862. Retrieved from https:\/\/arxiv.org\/abs\/2204.05862"},{"key":"e_1_3_3_13_2","first-page":"65","volume-title":"Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and\/or Summarization","author":"Banerjee Satanjeev","year":"2005","unstructured":"Satanjeev Banerjee and Alon Lavie. 2005. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and\/or Summarization. 65\u201372."},{"key":"e_1_3_3_14_2","doi-asserted-by":"crossref","unstructured":"Jason Baumgartner Savvas Zannettou Brian Keegan Megan Squire and Jeremy Blackburn. 2020. The Pushshift Reddit Dataset. arXiv:2001.08435. Retrieved from https:\/\/arxiv.org\/abs\/2001.08435","DOI":"10.1609\/icwsm.v14i1.7347"},{"key":"e_1_3_3_15_2","doi-asserted-by":"crossref","unstructured":"Shai Ben-David John Blitzer Koby Crammer Alex Kulesza Fernando Pereira and Jennifer Wortman Vaughan. 2010. A theory of learning from different domains. Machine Learning 79 (2010) 151\u2013175.","DOI":"10.1007\/s10994-009-5152-4"},{"key":"e_1_3_3_16_2","unstructured":"Zhen Bi Ningyu Zhang Yida Xue Yixin Ou Daxiong Ji Guozhou Zheng and Huajun Chen. 2023. OceanGPT: A large language model for ocean science tasks. arXiv:2310.02031. 
Retrieved from https:\/\/arxiv.org\/abs\/2310.02031"},{"key":"e_1_3_3_17_2","first-page":"2397","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Biderman Stella","year":"2023","unstructured":"Stella Biderman, Hailey Schoelkopf, Quentin Gregory Anthony, Herbie Bradley, Kyle O\u2019Brien, Eric Hallahan, Mohammad Aflah Khan, Shivanshu Purohit, USVSN Sai Prashanth, Edward Raff, et\u00a0al. 2023. Pythia: A suite for analyzing large language models across training and scaling. In Proceedings of the International Conference on Machine Learning. PMLR, 2397\u20132430."},{"key":"e_1_3_3_18_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.coling-main.574"},{"key":"e_1_3_3_19_2","doi-asserted-by":"crossref","unstructured":"Yonatan Bisk Rowan Zellers Jianfeng Gao Yejin Choi and others. 2020. Piqa: Reasoning about physical commonsense in natural language. In Proceedings of the AAAI Conference on Artificial Intelligence. 7432\u20137439.","DOI":"10.1609\/aaai.v34i05.6239"},{"key":"e_1_3_3_20_2","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/W14-3302"},{"key":"e_1_3_3_21_2","unstructured":"Jorg Bornschein Yazhe Li and Amal Rannen-Triki. 2024. Transformers for Supervised Online Continual Learning. arXiv:2403.01554. Retrieved from https:\/\/arxiv.org\/abs\/2403.01554"},{"key":"e_1_3_3_22_2","unstructured":"Lucas Bourtoule Varun Chandrasekaran Christopher A. Choquette-Choo Hengrui Jia Adelin Travers Baiwu Zhang David Lie and Nicolas Papernot. 2020. Machine Unlearning. arXiv:1912.03817. Retrieved from https:\/\/arxiv.org\/abs\/1912.03817"},{"key":"e_1_3_3_23_2","unstructured":"Tom Brown Benjamin Mann Nick Ryder Melanie Subbiah Jared D. Kaplan Prafulla Dhariwal Arvind Neelakantan Pranav Shyam Girish Sastry Amanda Askell et\u00a0al. 2020. Language models are few-shot learners. 
Advances in Neural Information Processing Systems 33 (2020) 1877\u20131901."},{"key":"e_1_3_3_24_2","first-page":"3339","volume-title":"Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC\u201916)","author":"Br\u00fcmmer Martin","year":"2016","unstructured":"Martin Br\u00fcmmer, Milan Dojchinovski, and Sebastian Hellmann. 2016. Dbpedia abstracts: A large-scale, open, multilingual NLP training corpus. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC\u201916). 3339\u20133343."},{"key":"e_1_3_3_25_2","unstructured":"Pietro Buzzega Matteo Boschini Angelo Porrello Davide Abati and Simone Calderara. 2020. Dark experience for general continual learning: A strong simple baseline. Advances in Neural Information Processing Systems 33 (2020) 15920\u201315930."},{"key":"e_1_3_3_26_2","unstructured":"He Cao Zijing Liu Xingyu Lu Yuan Yao and Yu Li. 2023. InstructMol: Multi-modal integration for building a versatile and reliable molecular assistant in drug discovery. arXiv:2311.16208. Retrieved from https:\/\/arxiv.org\/abs\/2311.16208"},{"key":"e_1_3_3_27_2","unstructured":"Xusheng Cao Haori Lu Linlan Huang Xialei Liu and Ming-Ming Cheng. 2024. Generative multi-modal models are good class incremental learners. IEEE Computer Vision and Pattern Recognition (2024)."},{"key":"e_1_3_3_28_2","unstructured":"Caselaw Access Project. 2018. Caselaw Access Project. Retrieved from https:\/\/case.law\/"},{"key":"e_1_3_3_29_2","doi-asserted-by":"crossref","unstructured":"Ilias Chalkidis Tommaso Pasini Sheng Zhang Letizia Tomada Sebastian Felix Schwemer and Anders S\u00f8gaard. 2022. Fairlex: A multilingual benchmark for evaluating fairness in legal text processing. arXiv:2203.07228. 
Retrieved from https:\/\/arxiv.org\/abs\/2203.07228","DOI":"10.18653\/v1\/2022.acl-long.301"},{"key":"e_1_3_3_30_2","volume-title":"Proceedings of the ICLR","author":"Chaudhry Arslan","year":"2019","unstructured":"Arslan Chaudhry, Marc\u2019Aurelio Ranzato, Marcus Rohrbach, and Mohamed Elhoseiny. 2019. Efficient lifelong learning with a-GEM. In Proceedings of the ICLR."},{"key":"e_1_3_3_31_2","unstructured":"Arslan Chaudhry Marcus Rohrbach Mohamed Elhoseiny Thalaiyasingam Ajanthan Puneet K. Dokania Philip HS Torr and Marc\u2019Aurelio Ranzato. 2019. On tiny episodic memories in continual learning. arXiv:1902.10486. Retrieved from https:\/\/arxiv.org\/abs\/1902.10486"},{"key":"e_1_3_3_32_2","doi-asserted-by":"crossref","unstructured":"Ciprian Chelba Tomas Mikolov Mike Schuster Qi Ge Thorsten Brants Phillipp Koehn and Tony Robinson. 2014. One Billion Word Benchmark for Measuring Progress in Statistical Language Modeling. arXiv:1312.3005. Retrieved from https:\/\/arxiv.org\/abs\/1312.3005","DOI":"10.21437\/Interspeech.2014-564"},{"key":"e_1_3_3_33_2","doi-asserted-by":"crossref","unstructured":"Cheng Chen Junchen Zhu Xu Luo Hengtao Shen Lianli Gao and Jingkuan Song. 2024. CoIN: A Benchmark of Continual Instruction tuNing for Multimodel Large Language Model. arXiv:2403.08350. Retrieved from https:\/\/arxiv.org\/abs\/2403.08350","DOI":"10.52202\/079017-1844"},{"key":"e_1_3_3_34_2","unstructured":"Junying Chen Xidong Wang Anningzhe Gao Feng Jiang Shunian Chen Hongbo Zhang Dingjie Song Wenya Xie Chuyi Kong Jianquan Li Xiang Wan Haizhou Li and Benyou Wang. 2023. HuatuoGPT-II one-stage training for medical adaption of LLMs. arXiv:2311.09774. 
Retrieved from https:\/\/arxiv.org\/abs\/2311.09774"},{"key":"e_1_3_3_35_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-main.634"},{"key":"e_1_3_3_36_2","first-page":"5383","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Chen Wuyang","year":"2023","unstructured":"Wuyang Chen, Yanqi Zhou, Nan Du, Yanping Huang, James Laudon, Zhifeng Chen, and Claire Cui. 2023. Lifelong language pretraining with distribution-specialized experts. In Proceedings of the International Conference on Machine Learning. PMLR, 5383\u20135395."},{"key":"e_1_3_3_37_2","unstructured":"Xuxi Chen Zhendong Wang Daouda Sow Junjie Yang Tianlong Chen Yingbin Liang Mingyuan Zhou and Zhangyang Wang. 2024. Take the bull by the horns: Hard sample-reweighted continual training improves LLM generalization. arXiv:2402.14270. Retrieved from https:\/\/arxiv.org\/abs\/2402.14270"},{"key":"e_1_3_3_38_2","unstructured":"Yongrui Chen Shenyu Zhang Guilin Qi and Xinnan Guo. 2024. Parameterizing context: Unleashing the power of parameter-efficient fine-tuning and in-context tuning for continual table semantic parsing. Advances in Neural Information Processing Systems 36 (2024)."},{"key":"e_1_3_3_39_2","volume-title":"Lifelong Machine Learning","author":"Chen Zhiyuan","unstructured":"Zhiyuan Chen and Bing Liu. [n.d.]. Lifelong Machine Learning. Springer."},{"key":"e_1_3_3_40_2","unstructured":"Daixuan Cheng Shaohan Huang and Furu Wei. 2024. Adapting Large Language Models via Reading Comprehension. arXiv:2309.09530. Retrieved from https:\/\/arxiv.org\/abs\/2309.09530"},{"key":"e_1_3_3_41_2","unstructured":"Aakanksha Chowdhery Sharan Narang Jacob Devlin Maarten Bosma Gaurav Mishra Adam Roberts Paul Barham Hyung Won Chung Charles Sutton Sebastian Gehrmann et\u00a0al. 2023. Palm: Scaling language modeling with pathways. 
Journal of Machine Learning Research 24 240 (2023) 1\u2013113."},{"key":"e_1_3_3_42_2","unstructured":"Peter Clark Isaac Cowhey Oren Etzioni Tushar Khot Ashish Sabharwal Carissa Schoenick and Oyvind Tafjord. 2018. Think you have solved question answering? Try arc the ai2 reasoning challenge. arXiv:1803.05457. Retrieved from https:\/\/arxiv.org\/abs\/1803.05457"},{"key":"e_1_3_3_43_2","unstructured":"Pierre Colombo Telmo Pessoa Pires Malik Boudiaf Dominic Culver Rui Melo Caio Corro Andre F. T. Martins Fabrizio Esposito Vera L\u00facia Raposo Sofia Morgado and Michael Desa. 2024. SaulLM-7B: A pioneering Large Language Model for Law. arXiv:2403.03883. Retrieved from https:\/\/arxiv.org\/abs\/2403.03883"},{"key":"e_1_3_3_44_2","unstructured":"Together Computer. 2023. RedPajama: an Open Dataset for Training Large Language Models. Retrieved from https:\/\/github.com\/togethercomputer\/RedPajama-Data"},{"key":"e_1_3_3_45_2","doi-asserted-by":"crossref","unstructured":"Andrea Cossu Tinne Tuytelaars Antonio Carta Lucia Passaro Vincenzo Lomonaco and Davide Bacciu. 2022. Continual Pre-Training Mitigates Forgetting in Language and Vision. arXiv:2205.09357. Retrieved from https:\/\/arxiv.org\/abs\/2205.09357","DOI":"10.2139\/ssrn.4495233"},{"key":"e_1_3_3_46_2","unstructured":"Payel Das Subhajit Chaudhury Elliot Nelson Igor Melnyk Sarath Swaminathan Sihui Dai Aur\u00e9lie Lozano Georgios Kollias Vijil Chenthamarakshan Soham Dan et\u00a0al. 2024. Larimar: Large language models with episodic memory control. arXiv:2403.11901. Retrieved from https:\/\/arxiv.org\/abs\/2403.11901"},{"key":"e_1_3_3_47_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1606"},{"key":"e_1_3_3_48_2","doi-asserted-by":"crossref","unstructured":"Nicola De Cao Wilker Aziz and Ivan Titov. 2021. Editing factual knowledge in language models. arXiv:2104.08164. 
Retrieved from https:\/\/arxiv.org\/abs\/2104.08164","DOI":"10.18653\/v1\/2021.emnlp-main.522"},{"key":"e_1_3_3_49_2","doi-asserted-by":"crossref","unstructured":"Cheng Deng Tianhang Zhang Zhongmou He Yi Xu Qiyuan Chen Yuanyuan Shi Luoyi Fu Weinan Zhang Xinbing Wang Chenghu Zhou Zhouhan Lin and Junxian He. 2023. K2: A Foundation Language Model for Geoscience Knowledge Understanding and Utilization. arXiv:2306.05064. Retrieved from https:\/\/arxiv.org\/abs\/2306.05064","DOI":"10.1145\/3616855.3635772"},{"key":"e_1_3_3_50_2","volume-title":"Proceedings of the CVPR09","author":"Deng J.","year":"2009","unstructured":"J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. 2009. ImageNet: A large-scale hierarchical image database. In Proceedings of the CVPR09."},{"key":"e_1_3_3_51_2","unstructured":"Tim Dettmers Artidoro Pagnoni Ari Holtzman and Luke Zettlemoyer. 2023. QLoRA: Efficient finetuning of quantized LLMs. arXiv:2305.14314. Retrieved from https:\/\/arxiv.org\/abs\/2305.14314"},{"key":"e_1_3_3_52_2","unstructured":"Jacob Devlin Ming-Wei Chang Kenton Lee and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805. Retrieved from https:\/\/arxiv.org\/abs\/1810.04805"},{"key":"e_1_3_3_53_2","doi-asserted-by":"crossref","unstructured":"Bhuwan Dhingra Jeremy R. Cole Julian Martin Eisenschlos Daniel Gillick Jacob Eisenstein and William W. Cohen. 2022. Time-aware language models as temporal knowledge bases. Transactions of the Association for Computational Linguistics 10 (2022) 257\u2013273.","DOI":"10.1162\/tacl_a_00459"},{"key":"e_1_3_3_54_2","doi-asserted-by":"crossref","unstructured":"Qingxiu Dong Damai Dai Yifan Song Jingjing Xu Zhifang Sui and Lei Li. 2022. Calibrating factual knowledge in pretrained language models. arXiv:2210.03329. 
Retrieved from https:\/\/arxiv.org\/abs\/2210.03329","DOI":"10.18653\/v1\/2022.findings-emnlp.438"},{"key":"e_1_3_3_55_2","unstructured":"Alexey Dosovitskiy Lucas Beyer Alexander Kolesnikov Dirk Weissenborn Xiaohua Zhai Thomas Unterthiner Mostafa Dehghani Matthias Minderer Georg Heigold Sylvain Gelly et\u00a0al. 2020. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv:2010.11929. Retrieved from https:\/\/arxiv.org\/abs\/2010.11929"},{"key":"e_1_3_3_56_2","unstructured":"Longxu Dou Qian Liu Guangtao Zeng Jia Guo Jiahui Zhou Wei Lu and Min Lin. 2024. Sailor: Open language models for south-east asia. arXiv:2404.03608. Retrieved from https:\/\/arxiv.org\/abs\/2404.03608"},{"key":"e_1_3_3_57_2","unstructured":"Dheeru Dua Yizhong Wang Pradeep Dasigi Gabriel Stanovsky Sameer Singh and Matt Gardner. 2019. DROP: A reading comprehension benchmark requiring discrete reasoning over paragraphs. arXiv:1903.00161. Retrieved from https:\/\/arxiv.org\/abs\/1903.00161"},{"key":"e_1_3_3_58_2","unstructured":"Sayna Ebrahimi Mohamed Elhoseiny Trevor Darrell and Marcus Rohrbach. 2019. Uncertainty-guided continual learning with bayesian neural networks. arXiv:1906.02425. Retrieved from https:\/\/arxiv.org\/abs\/1906.02425"},{"key":"e_1_3_3_59_2","first-page":"386","volume-title":"Proceedings of the 16th European Conference on Computer Vision\u2013ECCV 2020, Glasgow, UK, August 23\u201328, 2020, Part XI 16","author":"Ebrahimi Sayna","year":"2020","unstructured":"Sayna Ebrahimi, Franziska Meier, Roberto Calandra, Trevor Darrell, and Marcus Rohrbach. 2020. Adversarial continual learning. In Proceedings of the 16th European Conference on Computer Vision\u2013ECCV 2020, Glasgow, UK, August 23\u201328, 2020, Part XI 16. 
Springer, 386\u2013402."},{"key":"e_1_3_3_60_2","volume-title":"Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)","author":"Elsahar Hady","year":"2018","unstructured":"Hady Elsahar, Pavlos Vougiouklis, Arslen Remaci, Christophe Gravier, Jonathon Hare, Frederique Laforest, and Elena Simperl. 2018. T-rex: A large scale alignment of natural language with knowledge base triples. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)."},{"key":"e_1_3_3_61_2","unstructured":"Kazuki Fujii Taishi Nakamura Mengsay Loem Hiroki Iida Masanari Ohi Kakeru Hattori Hirai Shota Sakae Mizuki Rio Yokota and Naoaki Okazaki. 2024. Continual pre-training for cross-lingual LLM adaptation: Enhancing japanese language capabilities. arXiv:2404.17790. Retrieved from https:\/\/arxiv.org\/abs\/2404.17790"},{"key":"e_1_3_3_62_2","unstructured":"Yaroslav Ganin Evgeniya Ustinova Hana Ajakan Pascal Germain Hugo Larochelle Fran\u00e7ois Laviolette Mario Marchand and Victor Lempitsky. 2016. Domain-adversarial training of neural networks. The Journal of Machine Learning Research 17 1 (2016) 2096\u20132030."},{"key":"e_1_3_3_63_2","volume-title":"Proceedings of the 12th International Conference on Learning Representations (ICLR)","author":"Garg Saurabh","year":"2024","unstructured":"Saurabh Garg, Mehrdad Farajtabar, Hadi Pouransari, Raviteja Vemulapalli, Sachin Mehta, Oncel Tuzel, Vaishaal Shankar, and Fartash Faghri. 2024. TiC-CLIP: Continual training of CLIP models. In Proceedings of the 12th International Conference on Learning Representations (ICLR). Retrieved from https:\/\/openreview.net\/forum?id=TLADT8Wrhn"},{"key":"e_1_3_3_64_2","doi-asserted-by":"crossref","unstructured":"Evangelia Gogoulou Timoth\u00e9e Lesort Magnus Boman and Joakim Nivre. 2024. Continual Learning Under Language Shift. arXiv:2311.01200. 
Retrieved from https:\/\/arxiv.org\/abs\/2311.01200","DOI":"10.1007\/978-3-031-70563-2_6"},{"key":"e_1_3_3_65_2","unstructured":"Aaron Gokaslan and Vanya Cohen. 2019. OpenWebText Corpus. Retrieved from http:\/\/Skylion007.github.io\/OpenWebTextCorpus"},{"key":"e_1_3_3_66_2","doi-asserted-by":"crossref","unstructured":"Yash Goyal Tejas Khot Douglas Summers-Stay Dhruv Batra and Devi Parikh. 2017. Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering. arXiv:1612.00837. Retrieved from https:\/\/arxiv.org\/abs\/1612.00837","DOI":"10.1109\/CVPR.2017.670"},{"key":"e_1_3_3_67_2","doi-asserted-by":"publisher","unstructured":"Yu Gu Robert Tinn Hao Cheng Michael Lucas Naoto Usuyama Xiaodong Liu Tristan Naumann Jianfeng Gao and Hoifung Poon. 2021. Domain-specific language model pretraining for biomedical natural language processing. ACM Transactions on Computing for Healthcare 3 1 (2021) 1\u201323. DOI:10.1145\/3458754","DOI":"10.1145\/3458754"},{"key":"e_1_3_3_68_2","unstructured":"Daya Guo Qihao Zhu Dejian Yang Zhenda Xie Kai Dong Wentao Zhang Guanting Chen Xiao Bi Y. Wu Y. K. Li Fuli Luo Yingfei Xiong and Wenfeng Liang. 2024. DeepSeek-Coder: When the Large Language Model Meets Programming\u2014The Rise of Code Intelligence. arXiv:2401.14196. Retrieved from https:\/\/arxiv.org\/abs\/2401.14196"},{"key":"e_1_3_3_69_2","unstructured":"Zhen Guo and Yining Hua. 2023. Continuous Training and Fine-tuning for Domain-Specific Language Models in Medical Question Answering. arXiv:2311.00204. Retrieved from https:\/\/arxiv.org\/abs\/2311.00204"},{"key":"e_1_3_3_70_2","unstructured":"Kshitij Gupta Benjamin Th\u00e9rien Adam Ibrahim Mats L. Richter Quentin Anthony Eugene Belilovsky Irina Rish and Timoth\u00e9e Lesort. 2023. Continual Pre-Training of Large Language Models: How to (re)warm your model? arXiv:2308.04014. 
Retrieved from https:\/\/arxiv.org\/abs\/2308.04014"},{"key":"e_1_3_3_71_2","doi-asserted-by":"crossref","unstructured":"Danna Gurari Qing Li Abigale J. Stangl Anhong Guo Chi Lin Kristen Grauman Jiebo Luo and Jeffrey P. Bigham. 2018. VizWiz Grand Challenge: Answering Visual Questions from Blind People. arXiv:1802.08218. Retrieved from https:\/\/arxiv.org\/abs\/1802.08218","DOI":"10.1109\/CVPR.2018.00380"},{"key":"e_1_3_3_72_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.naacl-main.407"},{"key":"e_1_3_3_73_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.740"},{"key":"e_1_3_3_74_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.emnlp-main.436"},{"key":"e_1_3_3_75_2","volume-title":"Proceedings of the Advances in Neural Information Processing Systems","author":"Hartvigsen Thomas","year":"2023","unstructured":"Thomas Hartvigsen, Swami Sankaranarayanan, Hamid Palangi, Yoon Kim, and Marzyeh Ghassemi. 2023. Aging with GRACE: Lifelong model editing with discrete key-value adaptors. In Proceedings of the Advances in Neural Information Processing Systems."},{"key":"e_1_3_3_76_2","unstructured":"Peter Hase Mohit Bansal Been Kim and Asma Ghandeharioun. 2023. Does localization inform editing? Surprising differences in causality-based localization vs. Knowledge Editing in Language Models (2023)."},{"key":"e_1_3_3_77_2","unstructured":"Peter Hase Mona Diab Asli Celikyilmaz Xian Li Zornitsa Kozareva Veselin Stoyanov Mohit Bansal and Srinivasan Iyer. 2021. Do language models have beliefs? Methods for detecting updating and visualizing model beliefs. arXiv:2111.13654. Retrieved from https:\/\/arxiv.org\/abs\/2111.13654"},{"key":"e_1_3_3_78_2","unstructured":"Jinghan He Haiyun Guo Ming Tang and Jinqiao Wang. 2023. Continual Instruction Tuning for Large Multimodal Models. arXiv:2311.16206. 
Retrieved from https:\/\/arxiv.org\/abs\/2311.16206"},{"key":"e_1_3_3_79_2","doi-asserted-by":"publisher","DOI":"10.1145\/2872427.2883037"},{"key":"e_1_3_3_80_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.eacl-main.95"},{"key":"e_1_3_3_81_2","unstructured":"Yongquan He Xuancheng Huang Minghao Tang Lingxun Meng Xiang Li Wei Lin Wenyuan Zhang and Yifu Gao. 2024. Don\u2019t Half-listen: Capturing Key-part Information in Continual Instruction Tuning. arXiv:2403.10056. Retrieved from https:\/\/arxiv.org\/abs\/2403.10056"},{"key":"e_1_3_3_82_2","unstructured":"Dan Hendrycks Collin Burns Steven Basart Andrew Critch Jerry Li Dawn Song and Jacob Steinhardt. 2023. Aligning AI With Shared Human Values. arXiv:2008.02275. Retrieved from https:\/\/arxiv.org\/abs\/2008.02275"},{"key":"e_1_3_3_83_2","unstructured":"Dan Hendrycks Collin Burns Steven Basart Andy Zou Mantas Mazeika Dawn Song and Jacob Steinhardt. 2021. Measuring massive multitask language understanding. Proceedings of the International Conference on Learning Representations (2021)."},{"key":"e_1_3_3_84_2","doi-asserted-by":"crossref","unstructured":"Daniel Hewlett Alexandre Lacoste Llion Jones Illia Polosukhin Andrew Fandrianto Jay Han Matthew Kelcey and David Berthelot. 2016. Wikireading: A novel large-scale language understanding task over wikipedia. arXiv:1608.03542. Retrieved from https:\/\/arxiv.org\/abs\/1608.03542","DOI":"10.18653\/v1\/P16-1145"},{"key":"e_1_3_3_85_2","unstructured":"Jordan Hoffmann Sebastian Borgeaud Arthur Mensch Elena Buchatskaya Trevor Cai Eliza Rutherford Diego de Las Casas Lisa Anne Hendricks Johannes Welbl Aidan Clark et\u00a0al. 2022. Training compute-optimal large language models. arXiv:2203.15556. Retrieved from https:\/\/arxiv.org\/abs\/2203.15556"},{"key":"e_1_3_3_86_2","doi-asserted-by":"crossref","unstructured":"Chenhui Hu Pengfei Cao Yubo Chen Kang Liu and Jun Zhao. 2024. WilKE: Wise-Layer Knowledge Editor for Lifelong Knowledge Editing. arXiv:2402.10987. 
Retrieved from https:\/\/arxiv.org\/abs\/2402.10987","DOI":"10.18653\/v1\/2024.findings-acl.207"},{"key":"e_1_3_3_87_2","unstructured":"Edward J. Hu Yelong Shen Phillip Wallis Zeyuan Allen-Zhu Yuanzhi Li Shean Wang Lu Wang and Weizhu Chen. 2021. Lora: Low-rank adaptation of large language models. arXiv:2106.09685. Retrieved from https:\/\/arxiv.org\/abs\/2106.09685"},{"key":"e_1_3_3_88_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Hu Edward J.","year":"2022","unstructured":"Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. 2022. LoRA: Low-rank adaptation of large language models. In Proceedings of the International Conference on Learning Representations. Retrieved from https:\/\/openreview.net\/forum?id=nZeVKeeFYf9"},{"key":"e_1_3_3_89_2","unstructured":"Yebowen Hu Tim Ganter Hanieh Deilamsalehy Franck Dernoncourt Hassan Foroosh and Fei Liu. 2023. MeetingBank: A Benchmark Dataset for Meeting Summarization. arXiv:2305.17529. Retrieved from https:\/\/arxiv.org\/abs\/2305.17529"},{"key":"e_1_3_3_90_2","unstructured":"Jianheng Huang Leyang Cui Ante Wang Chengyi Yang Xinting Liao Linfeng Song Junfeng Yao and Jinsong Su. 2024. Mitigating Catastrophic Forgetting in Large Language Models with Self-Synthesized Rehearsal. arXiv:2403.01244. Retrieved from https:\/\/arxiv.org\/abs\/2403.01244"},{"key":"e_1_3_3_91_2","doi-asserted-by":"crossref","unstructured":"Lifu Huang Ronan Le Bras Chandra Bhagavatula and Yejin Choi. 2019. Cosmos QA: Machine Reading Comprehension with Contextual Commonsense Reasoning. arXiv:1909.00277. Retrieved from https:\/\/arxiv.org\/abs\/1909.00277","DOI":"10.18653\/v1\/D19-1243"},{"key":"e_1_3_3_92_2","unstructured":"Quzhe Huang Mingxu Tao Zhenwei An Chen Zhang Cong Jiang Zhibin Chen Zirui Wu and Yansong Feng. 2023. Lawyer LLaMA technical report. arXiv:2305.15062. 
Retrieved from https:\/\/arxiv.org\/abs\/2305.15062"},{"key":"e_1_3_3_93_2","unstructured":"Zeyu Huang Yikang Shen Xiaofeng Zhang Jie Zhou Wenge Rong and Zhang Xiong. 2023. Transformer-patcher: One mistake worth one neuron. arXiv:2301.09785. Retrieved from https:\/\/arxiv.org\/abs\/2301.09785"},{"key":"e_1_3_3_94_2","doi-asserted-by":"crossref","unstructured":"Drew A. Hudson and Christopher D. Manning. 2019. GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering. arXiv:1902.09506. Retrieved from https:\/\/arxiv.org\/abs\/1902.09506","DOI":"10.1109\/CVPR.2019.00686"},{"key":"e_1_3_3_95_2","unstructured":"Adam Ibrahim Benjamin Th\u00e9rien Kshitij Gupta Mats L. Richter Quentin Anthony Timoth\u00e9e Lesort Eugene Belilovsky and Irina Rish. 2024. Simple and scalable strategies to continually pre-train large language models. arXiv:2403.08763. Retrieved from https:\/\/arxiv.org\/abs\/2403.08763"},{"key":"e_1_3_3_96_2","doi-asserted-by":"crossref","unstructured":"Joel Jang Seonghyeon Ye Changho Lee Sohee Yang Joongbo Shin Janghoon Han Gyeonghun Kim and Minjoon Seo. 2022. TemporalWiki: A lifelong benchmark for training and evaluating ever-evolving language models. EMNLP 2022.","DOI":"10.18653\/v1\/2022.emnlp-main.418"},{"key":"e_1_3_3_97_2","volume-title":"Proceedings of the ICLR","author":"Jang Joel","year":"2022","unstructured":"Joel Jang, Seonghyeon Ye, Sohee Yang, Joongbo Shin, Janghoon Han, Gyeonghun Kim, Stanley Jungkyu Choi, and Minjoon Seo. 2022. Towards continual knowledge learning of language models. In Proceedings of the ICLR."},{"key":"e_1_3_3_98_2","doi-asserted-by":"crossref","unstructured":"Jiaming Ji Tianyi Qiu Boyuan Chen Borong Zhang Hantao Lou Kaile Wang Yawen Duan Zhonghao He Jiayi Zhou Zhaowei Zhang Fanzhi Zeng Kwan Yee Ng Juntao Dai Xuehai Pan Aidan O\u2019Gara Yingshan Lei Hua Xu Brian Tse Jie Fu Stephen McAleer Yaodong Yang Yizhou Wang Song-Chun Zhu Yike Guo and Wen Gao. 2024. AI Alignment: A Comprehensive Survey. 
arXiv:2310.19852. Retrieved from https:\/\/arxiv.org\/abs\/2310.19852","DOI":"10.1145\/3770749"},{"key":"e_1_3_3_99_2","unstructured":"Zhengbao Jiang Zhiqing Sun Weijia Shi Pedro Rodriguez Chunting Zhou Graham Neubig Xi Victoria Lin Wen tau Yih and Srinivasan Iyer. 2024. Instruction-tuned Language Models are Better Knowledge Learners. arXiv:2402.12847. Retrieved from https:\/\/arxiv.org\/abs\/2402.12847"},{"key":"e_1_3_3_100_2","unstructured":"Xisen Jin and Xiang Ren. 2024. What Will My Model Forget? Forecasting Forgotten Examples in Language Model Refinement. arXiv:2402.01865. Retrieved from https:\/\/arxiv.org\/abs\/2402.01865"},{"key":"e_1_3_3_101_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.bigscience-1.1"},{"key":"e_1_3_3_102_2","volume-title":"Principles of Neural Science","author":"Kandel Eric R.","year":"2000","unstructured":"Eric R. Kandel, James H. Schwartz, Thomas M. Jessell, Steven Siegelbaum, A. James Hudspeth, Sarah Mack, et\u00a0al. 2000. Principles of Neural Science. McGraw-hill New York."},{"key":"e_1_3_3_103_2","unstructured":"Jared Kaplan Sam McCandlish Tom Henighan Tom B. Brown Benjamin Chess Rewon Child Scott Gray Alec Radford Jeffrey Wu and Dario Amodei. 2020. Scaling laws for neural language models. arXiv:2001.08361. Retrieved from https:\/\/arxiv.org\/abs\/2001.08361"},{"key":"e_1_3_3_104_2","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1086"},{"key":"e_1_3_3_105_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.emnlp-main.695"},{"key":"e_1_3_3_106_2","unstructured":"Zixuan Ke and Bing Liu. 2023. Continual Learning of Natural Language Processing Tasks: A Survey. arXiv:2211.12701. Retrieved from https:\/\/arxiv.org\/abs\/2211.12701"},{"key":"e_1_3_3_107_2","volume-title":"Proceedings of the NeurIPS","author":"Ke Zixuan","year":"2021","unstructured":"Zixuan Ke, Bing Liu, Nianzu Ma, Hu Xu, and Shu Lei. 2021. Achieving forgetting prevention and knowledge transfer in continual learning. 
In Proceedings of the NeurIPS."},{"key":"e_1_3_3_108_2","volume-title":"Proceedings of the 11th International Conference on Learning Representations","author":"Ke Zixuan","year":"2022","unstructured":"Zixuan Ke, Yijia Shao, Haowei Lin, Tatsuya Konishi, Gyuhak Kim, and Bing Liu. 2022. Continual pre-training of language models. In Proceedings of the 11th International Conference on Learning Representations."},{"key":"e_1_3_3_109_2","first-page":"1","volume-title":"Proceedings of the 8th Edition of the Swiss Text Analytics Conference","author":"Kew Tannon","year":"2023","unstructured":"Tannon Kew, Marek Kostrzewa, and Sarah Ebling. 2023. 20 Minuten: A multi-task news summarisation dataset for german. In Proceedings of the 8th Edition of the Swiss Text Analytics Conference, Hatem Ghorbel, Maria Sokhn, Mark Cieliebak, Manuela H\u00fcrlimann, Emmanuel de Salis, and Jonathan Guerne (Eds.). Association for Computational Linguistics, Neuchatel, Switzerland, 1\u201313. Retrieved from https:\/\/aclanthology.org\/2023.swisstext-1.1"},{"key":"e_1_3_3_110_2","first-page":"252","volume-title":"Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)","author":"Khashabi Daniel","year":"2018","unstructured":"Daniel Khashabi, Snigdha Chaturvedi, Michael Roth, Shyam Upadhyay, and Dan Roth. 2018. Looking beyond the surface: A challenge set for reading comprehension over multiple sentences. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). 252\u2013262."},{"key":"e_1_3_3_111_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/K17-1010"},{"key":"e_1_3_3_112_2","doi-asserted-by":"crossref","unstructured":"Tushar Khot Peter Clark Michal Guerquin Peter Jansen and Ashish Sabharwal. 2020. QASC: A Dataset for Question Answering via Sentence Composition. 
arXiv:1910.11473. Retrieved from https:\/\/arxiv.org\/abs\/1910.11473","DOI":"10.1609\/aaai.v34i05.6319"},{"key":"e_1_3_3_113_2","first-page":"5065","volume-title":"Proceedings of the Advances in Neural Information Processing Systems","volume":"35","author":"Kim Gyuhak","year":"2022","unstructured":"Gyuhak Kim, Changnan Xiao, Tatsuya Konishi, Zixuan Ke, and Bing Liu. 2022. A theoretical study on solving continual learning. In Proceedings of the Advances in Neural Information Processing Systems, S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh (Eds.). Vol. 35, Curran Associates, Inc., 5065\u20135079. Retrieved from https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2022\/file\/20f44da80080d76bbc35bca0027f14e6-Paper-Conference.pdf"},{"key":"e_1_3_3_114_2","doi-asserted-by":"crossref","unstructured":"James Kirkpatrick Razvan Pascanu Neil Rabinowitz Joel Veness Guillaume Desjardins Andrei A. Rusu Kieran Milan John Quan Tiago Ramalho Agnieszka Grabska-Barwinska et\u00a0al. 2017. Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences 114 13 (2017) 3521\u20133526.","DOI":"10.1073\/pnas.1611835114"},{"key":"e_1_3_3_115_2","doi-asserted-by":"crossref","unstructured":"Tom Kwiatkowski Jennimaria Palomaki Olivia Redfield Michael Collins Ankur Parikh Chris Alberti Danielle Epstein Illia Polosukhin Jacob Devlin Kenton Lee et\u00a0al. 2019. Natural questions: A benchmark for question answering research. Transactions of the Association for Computational Linguistics 7 (2019) 453\u2013466.","DOI":"10.1162\/tacl_a_00276"},{"key":"e_1_3_3_116_2","doi-asserted-by":"crossref","unstructured":"Guokun Lai Qizhe Xie Hanxiao Liu Yiming Yang and Eduard Hovy. 2017. Race: Large-scale reading comprehension dataset from examinations. arXiv:1704.04683. 
Retrieved from https:\/\/arxiv.org\/abs\/1704.04683","DOI":"10.18653\/v1\/D17-1082"},{"key":"e_1_3_3_117_2","unstructured":"Angeliki Lazaridou Adhi Kuncoro Elena Gribovskaya Devang Agrawal Adam Liska Tayfun Terzi Mai Gimenez Cyprien de Masson d\u2019Autume Tomas Kocisky Sebastian Ruder et\u00a0al. 2021. Mind the gap: Assessing temporal generalization in neural language models. Advances in Neural Information Processing Systems 34 (2021) 29348\u201329363."},{"key":"e_1_3_3_118_2","doi-asserted-by":"crossref","unstructured":"Omer Levy Minjoon Seo Eunsol Choi and Luke Zettlemoyer. 2017. Zero-shot relation extraction via reading comprehension. arXiv:1706.04115. Retrieved from https:\/\/arxiv.org\/abs\/1706.04115","DOI":"10.18653\/v1\/K17-1034"},{"key":"e_1_3_3_119_2","unstructured":"Chen-An Li and Hung-Yi Lee. 2024. Examining Forgetting in Continual Pre-training of Aligned Large Language Models. arXiv:2401.03129. Retrieved from https:\/\/arxiv.org\/abs\/2401.03129"},{"key":"e_1_3_3_120_2","unstructured":"Daliang Li Ankit Singh Rawat Manzil Zaheer Xin Wang Michal Lukasik Andreas Veit Felix Yu and Sanjiv Kumar. 2022. Large language models with controllable working memory. arXiv:2211.05110. Retrieved from https:\/\/arxiv.org\/abs\/2211.05110"},{"key":"e_1_3_3_121_2","unstructured":"Haitao Li Qingyao Ai Jia Chen Qian Dong Zhijing Wu Yiqun Liu Chong Chen and Qi Tian. 2024. BLADE: Enhancing black-box large language models with small domain-specific models. arXiv:2403.18365. Retrieved from https:\/\/arxiv.org\/abs\/2403.18365"},{"key":"e_1_3_3_122_2","unstructured":"Jiangtong Li Yuxuan Bian Guoxuan Wang Yang Lei Dawei Cheng Zhijun Ding and Changjun Jiang. 2023. CFGPT: Chinese Financial Assistant with Large Language Model. arXiv:2309.10654. Retrieved from https:\/\/arxiv.org\/abs\/2309.10654"},{"key":"e_1_3_3_123_2","unstructured":"Junnan Li Dongxu Li Silvio Savarese and Steven Hoi. 2023. 
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models. arXiv:2301.12597. Retrieved from https:\/\/arxiv.org\/abs\/2301.12597"},{"key":"e_1_3_3_124_2","unstructured":"Linyang Li and Xipeng Qiu. 2023. Continual Model Evolvement with Inner-Product Restriction. Retrieved from https:\/\/openreview.net\/forum?id=fn0BQK5T8p"},{"key":"e_1_3_3_125_2","doi-asserted-by":"crossref","unstructured":"Zhizhong Li and Derek Hoiem. 2017. Learning without forgetting. IEEE Transactions on Pattern Analysis and Machine Intelligence 40 12 (2017) 2935\u20132947.","DOI":"10.1109\/TPAMI.2017.2773081"},{"key":"e_1_3_3_126_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.acl-long.223"},{"key":"e_1_3_3_127_2","first-page":"74","volume-title":"Proceedings of the Text Summarization Branches Out","author":"Lin Chin-Yew","year":"2004","unstructured":"Chin-Yew Lin. 2004. Rouge: A package for automatic evaluation of summaries. In Proceedings of the Text Summarization Branches Out. 74\u201381."},{"key":"e_1_3_3_128_2","doi-asserted-by":"crossref","unstructured":"Kevin Lin Oyvind Tafjord Peter Clark and Matt Gardner. 2019. Reasoning Over Paragraph Effects in Situations. arXiv:1908.05852. Retrieved from https:\/\/arxiv.org\/abs\/1908.05852","DOI":"10.18653\/v1\/D19-5808"},{"key":"e_1_3_3_129_2","doi-asserted-by":"crossref","unstructured":"Yong Lin Hangyu Lin Wei Xiong Shizhe Diao Jianmeng Liu Jipeng Zhang Rui Pan Haoxiang Wang Wenbin Hu Hanning Zhang Hanze Dong Renjie Pi Han Zhao Nan Jiang Heng Ji Yuan Yao and Tong Zhang. 2024. Mitigating the Alignment Tax of RLHF. arXiv:2309.06256. 
Retrieved from https:\/\/arxiv.org\/abs\/2309.06256","DOI":"10.18653\/v1\/2024.emnlp-main.35"},{"key":"e_1_3_3_130_2","unstructured":"Zhouhan Lin Cheng Deng Le Zhou Tianhang Zhang Yi Xu Yutong Xu Zhongmou He Yuanyuan Shi Beiya Dai Yunchong Song Boyi Zeng Qiyuan Chen Tao Shi Tianyu Huang Yiwei Xu Shu Wang Luoyi Fu Weinan Zhang Junxian He Chao Ma Yunqiang Zhu Xinbing Wang and Chenghu Zhou. 2023. GeoGalactica: A Scientific Large Language Model in Geoscience. arXiv:2401.00434. Retrieved from https:\/\/arxiv.org\/abs\/2401.00434"},{"key":"e_1_3_3_131_2","unstructured":"Zhenghao Lin Zhibin Gou Yeyun Gong Xiao Liu Yelong Shen Ruochen Xu Chen Lin Yujiu Yang Jian Jiao Nan Duan et\u00a0al. 2024. Rho-1: Not all tokens are what you need. arXiv:2404.07965. Retrieved from https:\/\/arxiv.org\/abs\/2404.07965"},{"key":"e_1_3_3_132_2","unstructured":"Haotian Liu Chunyuan Li Qingyang Wu and Yong Jae Lee. 2023. Visual Instruction Tuning. arXiv:2304.08485. Retrieved from https:\/\/arxiv.org\/abs\/2304.08485"},{"key":"e_1_3_3_133_2","unstructured":"Haokun Liu Derek Tam Mohammed Muqeeth Jay Mohta Tenghao Huang Mohit Bansal and Colin A Raffel. 2022. Few-shot parameter-efficient fine-tuning is better and cheaper than in-context learning. Advances in Neural Information Processing Systems 35 (2022) 1950\u20131965."},{"key":"e_1_3_3_134_2","unstructured":"Yinhan Liu Myle Ott Naman Goyal Jingfei Du Mandar Joshi Danqi Chen Omer Levy Mike Lewis Luke Zettlemoyer and Veselin Stoyanov. 2019. Roberta: A robustly optimized bert pretraining approach. arXiv:1907.11692. Retrieved from https:\/\/arxiv.org\/abs\/1907.11692"},{"key":"e_1_3_3_135_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.447"},{"key":"e_1_3_3_136_2","doi-asserted-by":"crossref","unstructured":"Vincenzo Lomonaco Davide Maltoni and Lorenzo Pellegrini. 2020. Rehearsal-Free Continual Learning over Small Non-I.I.D. Batches. arXiv:1907.03799. 
Retrieved from https:\/\/arxiv.org\/abs\/1907.03799","DOI":"10.1109\/CVPRW50498.2020.00131"},{"key":"e_1_3_3_137_2","unstructured":"David Lopez-Paz and Marc\u2019Aurelio Ranzato. 2017. Gradient episodic memory for continual learning. Advances in Neural Information Processing Systems 30 (2017)."},{"key":"e_1_3_3_138_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.acl-demo.25"},{"key":"e_1_3_3_139_2","unstructured":"Dakuan Lu Hengkui Wu Jiaqing Liang Yipei Xu Qianyu He Yipeng Geng Mengkun Han Yingsi Xin and Yanghua Xiao. 2023. BBT-Fin: Comprehensive construction of chinese financial domain pre-trained language model corpus and benchmark. arXiv:2302.09432. Retrieved from https:\/\/arxiv.org\/abs\/2302.09432"},{"key":"e_1_3_3_140_2","unstructured":"Pengyuan Lu Michele Caprio Eric Eaton and Insup Lee. 2023. IBCL: Zero-shot Model Generation for Task Tradeoffs in Continual Learning. arXiv:2310.02995. Retrieved from https:\/\/arxiv.org\/abs\/2310.02995"},{"key":"e_1_3_3_141_2","unstructured":"Pan Lu Swaroop Mishra Tony Xia Liang Qiu Kai-Wei Chang Song-Chun Zhu Oyvind Tafjord Peter Clark and Ashwin Kalyan. 2022. Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering. arXiv:2209.09513. Retrieved from https:\/\/arxiv.org\/abs\/2209.09513"},{"key":"e_1_3_3_142_2","unstructured":"Shuai Lu Daya Guo Shuo Ren Junjie Huang Alexey Svyatkovskiy Ambrosio Blanco Colin Clement Dawn Drain Daxin Jiang Duyu Tang Ge Li Lidong Zhou Linjun Shou Long Zhou Michele Tufano Ming Gong Ming Zhou Nan Duan Neel Sundaresan Shao Kun Deng Shengyu Fu and Shujie Liu. 2021. CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation. arXiv:2102.04664. Retrieved from https:\/\/arxiv.org\/abs\/2102.04664"},{"key":"e_1_3_3_143_2","unstructured":"Haipeng Luo Qingfeng Sun Can Xu Pu Zhao Jianguang Lou Chongyang Tao Xiubo Geng Qingwei Lin Shifeng Chen and Dongmei Zhang. 2023. 
Wizardmath: Empowering mathematical reasoning for large language models via reinforced evol-instruct. arXiv:2308.09583. Retrieved from https:\/\/arxiv.org\/abs\/2308.09583"},{"key":"e_1_3_3_144_2","doi-asserted-by":"publisher","unstructured":"Renqian Luo Liai Sun Yingce Xia Tao Qin Sheng Zhang Hoifung Poon and Tie-Yan Liu. 2022. BioGPT: Generative pre-trained transformer for biomedical text generation and mining. Briefings in Bioinformatics 23 6 (2022). DOI:10.1093\/bib\/bbac409","DOI":"10.1093\/bib\/bbac409"},{"key":"e_1_3_3_145_2","unstructured":"Yun Luo Zhen Yang Xuefeng Bai Fandong Meng Jie Zhou and Yue Zhang. 2023. Investigating Forgetting in Pre-Trained Representations Through Continual Learning. arXiv:2305.05968. Retrieved from https:\/\/arxiv.org\/abs\/2305.05968"},{"key":"e_1_3_3_146_2","unstructured":"Yun Luo Zhen Yang Fandong Meng Yafu Li Jie Zhou and Yue Zhang. 2023. An Empirical Study of Catastrophic Forgetting in Large Language Models During Continual Fine-tuning. arXiv:2308.08747. Retrieved from https:\/\/arxiv.org\/abs\/2308.08747"},{"key":"e_1_3_3_147_2","unstructured":"Yizhen Luo Jiahuan Zhang Siqi Fan Kai Yang Yushuai Wu Mu Qiao and Zaiqing Nie. 2023. Biomedgpt: Open multimodal generative pre-trained transformer for biomedicine. arXiv:2308.09442. Retrieved from https:\/\/arxiv.org\/abs\/2308.09442"},{"key":"e_1_3_3_148_2","unstructured":"Ziyang Luo Can Xu Pu Zhao Qingfeng Sun Xiubo Geng Wenxiang Hu Chongyang Tao Jing Ma Qingwei Lin and Daxin Jiang. 2023. WizardCoder: Empowering Code Large Language Models with Evol-Instruct. arXiv:2306.08568. Retrieved from https:\/\/arxiv.org\/abs\/2306.08568"},{"key":"e_1_3_3_149_2","unstructured":"Shirong Ma Shen Huang Shulin Huang Xiaobin Wang Yangning Li Hai-Tao Zheng Pengjun Xie Fei Huang and Yong Jiang. 2023. EcomGPT-CT: Continual Pre-training of E-commerce Large Language Models with Semi-structured Data. arXiv:2312.15696. 
Retrieved from https:\/\/arxiv.org\/abs\/2312.15696"},{"key":"e_1_3_3_150_2","doi-asserted-by":"publisher","DOI":"10.5555\/2002472.2002491"},{"key":"e_1_3_3_151_2","doi-asserted-by":"publisher","unstructured":"Zheda Mai Ruiwen Li Jihwan Jeong David Quispe Hyunwoo Kim and Scott Sanner. 2022. Online continual learning in image classification: An empirical survey. Neurocomputing 469 (2022) 28\u201351. DOI:10.1016\/j.neucom.2021.10.021","DOI":"10.1016\/j.neucom.2021.10.021"},{"key":"e_1_3_3_152_2","unstructured":"Junhua Mao Jonathan Huang Alexander Toshev Oana Camburu Alan Yuille and Kevin Murphy. 2016. Generation and Comprehension of Unambiguous Object Descriptions. arXiv:1511.02283. Retrieved from https:\/\/arxiv.org\/abs\/1511.02283"},{"key":"e_1_3_3_153_2","unstructured":"Vittorio Mazzia Alessandro Pedrani Andrea Caciolai Kay Rottmann and Davide Bernardi. 2023. A survey on knowledge editing of neural networks. arXiv:2310.19704. Retrieved from https:\/\/arxiv.org\/abs\/2310.19704"},{"key":"e_1_3_3_154_2","unstructured":"David McCaffary. 2021. Towards continual task learning in artificial neural networks: Current approaches and insights from neuroscience. arXiv:2112.14146. Retrieved from https:\/\/arxiv.org\/abs\/2112.14146"},{"key":"e_1_3_3_155_2","doi-asserted-by":"crossref","unstructured":"James L. McClelland Bruce L. McNaughton and Randall C. O\u2019Reilly. 1995. Why there are complementary learning systems in the hippocampus and neocortex: Insights from the successes and failures of connectionist models of learning and memory. Psychological Review 102 3 (1995) 419.","DOI":"10.1037\/\/0033-295X.102.3.419"},{"key":"e_1_3_3_156_2","doi-asserted-by":"publisher","unstructured":"Michael McCloskey and Neal J. Cohen. 1989. Catastrophic interference in connectionist networks: The sequential learning problem. Psychology of Learning and Motivation Vol. 24. Academic Press 109\u2013165. 
DOI:10.1016\/S0079-7421(08)60536-8","DOI":"10.1016\/S0079-7421(08)60536-8"},{"key":"e_1_3_3_157_2","unstructured":"Sanket Vaibhav Mehta Darshan Patil Sarath Chandar and Emma Strubell. 2023. An empirical investigation of the role of pre-training in lifelong learning. Journal of Machine Learning Research 24 214 (2023) 1\u201350. Retrieved from http:\/\/jmlr.org\/papers\/v24\/22-0496.html"},{"key":"e_1_3_3_158_2","unstructured":"Kevin Meng David Bau Alex Andonian and Yonatan Belinkov. 2022. Locating and editing factual associations in GPT. Advances in Neural Information Processing Systems 35 (2022) 17359\u201317372."},{"key":"e_1_3_3_159_2","unstructured":"Kevin Meng Arnab Sen Sharma Alex Andonian Yonatan Belinkov and David Bau. 2022. Mass-editing memory in a transformer. arXiv:2210.07229. Retrieved from https:\/\/arxiv.org\/abs\/2210.07229"},{"key":"e_1_3_3_160_2","doi-asserted-by":"crossref","unstructured":"Sewon Min Xinxi Lyu Ari Holtzman Mikel Artetxe Mike Lewis Hannaneh Hajishirzi and Luke Zettlemoyer. 2022. Rethinking the role of demonstrations: What makes in-context learning work? arXiv:2202.12837. Retrieved from https:\/\/arxiv.org\/abs\/2202.12837","DOI":"10.18653\/v1\/2022.emnlp-main.759"},{"key":"e_1_3_3_161_2","volume-title":"Proceedings of the ICDAR","author":"Mishra Anand","year":"2019","unstructured":"Anand Mishra, Shashank Shekhar, Ajeet Kumar Singh, and Anirban Chakraborty. 2019. OCR-VQA: Visual question answering by reading text in images. In Proceedings of the ICDAR."},{"key":"e_1_3_3_162_2","unstructured":"Swaroop Mishra Daniel Khashabi Chitta Baral and Hannaneh Hajishirzi. 2021. Natural Instructions: Benchmarking generalization to new tasks from natural language instructions. arXiv:2104.08773. Retrieved from https:\/\/arxiv.org\/abs\/2104.08773"},{"key":"e_1_3_3_163_2","doi-asserted-by":"crossref","unstructured":"Swaroop Mishra Arindam Mitra Neeraj Varshney Bhavdeep Sachdeva Peter Clark Chitta Baral and Ashwin Kalyan. 2022. 
NumGLUE: A Suite of Fundamental yet Challenging Mathematical Reasoning Tasks. arXiv:2204.05660. Retrieved from https:\/\/arxiv.org\/abs\/2204.05660","DOI":"10.18653\/v1\/2022.acl-long.246"},{"key":"e_1_3_3_164_2","unstructured":"Eric Mitchell Charles Lin Antoine Bosselut Chelsea Finn and Christopher D Manning. 2021. Fast model editing at scale. arXiv:2110.11309. Retrieved from https:\/\/arxiv.org\/abs\/2110.11309"},{"key":"e_1_3_3_165_2","first-page":"15817","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Mitchell Eric","year":"2022","unstructured":"Eric Mitchell, Charles Lin, Antoine Bosselut, Christopher D Manning, and Chelsea Finn. 2022. Memory-based model editing at scale. In Proceedings of the International Conference on Machine Learning. PMLR, 15817\u201315831."},{"key":"e_1_3_3_166_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.acl-long.703"},{"key":"e_1_3_3_167_2","doi-asserted-by":"publisher","unstructured":"Arghavan Moradi Dakhel Vahid Majdinasab Amin Nikanjam Foutse Khomh Michel C. Desmarais and Zhen Ming (Jack) Jiang. 2023. GitHub copilot AI pair programmer: Asset or liability? Journal of Systems and Software 203 (2023) 111734. DOI:10.1016\/j.jss.2023.111734","DOI":"10.1016\/j.jss.2023.111734"},{"key":"e_1_3_3_168_2","unstructured":"Taishi Nakamura Mayank Mishra Simone Tedeschi Yekun Chai Jason T. Stillerman Felix Friedrich Prateek Yadav Tanmay Laud Vu Minh Chien Terry Yue Zhuo et\u00a0al. 2024. Aurora-M: The first open source multilingual language model red-teamed according to the US executive order. arXiv:2404.00399. Retrieved from https:\/\/arxiv.org\/abs\/2404.00399"},{"key":"e_1_3_3_169_2","doi-asserted-by":"crossref","unstructured":"Tuan Dung Nguyen Yuan-Sen Ting Ioana Ciuca Charlie O\u2019Neill Zechang Sun Maja Jablonska Sandor Kruk Ernest Perkowski Jack W. Miller Jason Li Josh Peek Kartheik Iyer Tomasz R\u00f3zanski Pranav Khetarpal Sharaf Zaman David Brodrick Sergio J. 
Rodr\u00edguez M\u00e9ndez Thang Bui Alyssa Goodman Alberto Accomazzi Jill P. Naiman Jesse Cranney Kevin Schawinski and UniverseTBD. 2023. AstroLLaMA: Towards specialized foundation models in astronomy. arXiv:2309.06126. Retrieved from https:\/\/arxiv.org\/abs\/2309.06126","DOI":"10.18653\/v1\/2023.wiesp-1.7"},{"key":"e_1_3_3_170_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1018"},{"key":"e_1_3_3_171_2","unstructured":"Zixuan Ni Haizhou Shi Siliang Tang Longhui Wei Qi Tian and Yueting Zhuang. 2021. Revisiting catastrophic forgetting in class incremental learning. arXiv:2107.12308. Retrieved from https:\/\/arxiv.org\/abs\/2107.12308"},{"key":"e_1_3_3_172_2","doi-asserted-by":"publisher","DOI":"10.5555\/3618408.3619495"},{"key":"e_1_3_3_173_2","unstructured":"Erik Nijkamp Bo Pang Hiroaki Hayashi Lifu Tu Huan Wang Yingbo Zhou Silvio Savarese and Caiming Xiong. 2023. CodeGen: An open large language model for code with multi-turn program synthesis. ICLR (2023)."},{"key":"e_1_3_3_174_2","unstructured":"OpenAI. 2022. Introducing chatgpt. Retrieved from https:\/\/openai.com\/blog\/chatgpt"},{"key":"e_1_3_3_175_2","unstructured":"Long Ouyang Jeff Wu Xu Jiang Diogo Almeida Carroll L. Wainwright Pamela Mishkin Chong Zhang Sandhini Agarwal Katarina Slama Alex Ray John Schulman Jacob Hilton Fraser Kelton Luke Miller Maddie Simens Amanda Askell Peter Welinder Paul Christiano Jan Leike and Ryan Lowe. 2022. Training language models to follow instructions with human feedback. arXiv:2203.02155. Retrieved from https:\/\/arxiv.org\/abs\/2203.02155"},{"key":"e_1_3_3_176_2","doi-asserted-by":"crossref","unstructured":"Christophe Pallier Stanislas Dehaene J-B Poline Denis LeBihan A-M Argenti Emmanuel Dupoux and Jacques Mehler. 2003. Brain imaging of language plasticity in adopted adults: Can a second language replace the first? 
Cerebral Cortex 13 2 (2003) 155\u2013161.","DOI":"10.1093\/cercor\/13.2.155"},{"key":"e_1_3_3_177_2","first-page":"311","volume-title":"Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics","author":"Papineni Kishore","year":"2002","unstructured":"Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. 311\u2013318."},{"key":"e_1_3_3_178_2","unstructured":"Indraneil Paul Jun Luo Goran Glava\u0161 and Iryna Gurevych. 2024. IRCoder: Intermediate Representations Make Language Models Robust Multilingual Code Generators. arXiv:2403.03894. Retrieved from https:\/\/arxiv.org\/abs\/2403.03894"},{"key":"e_1_3_3_179_2","volume-title":"Theoretical foundations of multi-task lifelong learning","author":"Pentina Anastasia","year":"2016","unstructured":"Anastasia Pentina. 2016. Theoretical foundations of multi-task lifelong learning. Ph. D. Dissertation."},{"key":"e_1_3_3_180_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1250"},{"key":"e_1_3_3_181_2","first-page":"3698","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Prabhu Ameya","year":"2023","unstructured":"Ameya Prabhu, Hasan Abed Al Kader Hammoud, Puneet K. Dokania, Philip H. S. Torr, Ser-Nam Lim, Bernard Ghanem, and Adel Bibi. 2023. Computationally budgeted continual learning: What does matter?. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 3698\u20133707."},{"key":"e_1_3_3_182_2","unstructured":"Ameya Prabhu Zhipeng Cai Puneet Dokania Philip Torr Vladlen Koltun and Ozan Sener. 2023. Online Continual Learning Without the Storage Constraint. arXiv:2305.09253. 
Retrieved from https:\/\/arxiv.org\/abs\/2305.09253"},{"key":"e_1_3_3_183_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Qin Chengwei","year":"2021","unstructured":"Chengwei Qin and Shafiq Joty. 2021. LFPT5: A unified framework for lifelong few-shot language learning based on prompt tuning of T5. In Proceedings of the International Conference on Learning Representations."},{"key":"e_1_3_3_184_2","doi-asserted-by":"crossref","unstructured":"Yujia Qin Cheng Qian Xu Han Yankai Lin Huadong Wang Ruobing Xie Zhiyuan Liu Maosong Sun and Jie Zhou. 2023. Recyclable tuning for continual pre-training. arXiv:2305.08702. Retrieved from https:\/\/arxiv.org\/abs\/2305.08702","DOI":"10.18653\/v1\/2023.findings-acl.723"},{"key":"e_1_3_3_185_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.findings-acl.220"},{"key":"e_1_3_3_186_2","first-page":"8748","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Radford Alec","year":"2021","unstructured":"Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et\u00a0al. 2021. Learning transferable visual models from natural language supervision. In Proceedings of the International Conference on Machine Learning. PMLR, 8748\u20138763."},{"key":"e_1_3_3_187_2","unstructured":"Alec Radford Jeffrey Wu Rewon Child David Luan Dario Amodei Ilya Sutskever et\u00a0al. 2019. Language models are unsupervised multitask learners. OpenAI Blog 1 8 (2019) 9."},{"key":"e_1_3_3_188_2","unstructured":"Rafael Rafailov Archit Sharma Eric Mitchell Christopher D. Manning Stefano Ermon and Chelsea Finn. 2024. Direct preference optimization: Your language model is secretly a reward model. 
Advances in Neural Information Processing Systems 36 (2024)."},{"key":"e_1_3_3_189_2","unstructured":"Rafael Rafailov Archit Sharma Eric Mitchell Christopher D Manning Stefano Ermon and Chelsea Finn. 2024. Direct preference optimization: Your language model is secretly a reward model. Advances in Neural Information Processing Systems 36 (2024)."},{"key":"e_1_3_3_190_2","unstructured":"Colin Raffel Noam Shazeer Adam Roberts Katherine Lee Sharan Narang Michael Matena Yanqi Zhou Wei Li and Peter J Liu. 2020. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research 21 140 (2020) 1\u201367."},{"key":"e_1_3_3_191_2","doi-asserted-by":"crossref","unstructured":"Pranav Rajpurkar Robin Jia and Percy Liang. 2018. Know what you don\u2019t know: Unanswerable questions for SQuAD. arXiv:1806.03822. Retrieved from https:\/\/arxiv.org\/abs\/1806.03822","DOI":"10.18653\/v1\/P18-2124"},{"key":"e_1_3_3_192_2","unstructured":"Rahul Ramesh and Pratik Chaudhari. 2021. Model zoo: A growing \u201cbrain\u201d that learns continually. arXiv:2106.03027. Retrieved from https:\/\/arxiv.org\/abs\/2106.03027"},{"key":"e_1_3_3_193_2","unstructured":"Hubert Ramsauer Bernhard Sch\u00e4fl Johannes Lehner Philipp Seidl Michael Widrich Thomas Adler Lukas Gruber Markus Holzleitner Milena Pavlovi\u0107 Geir Kjetil Sandve Victor Greiff David Kreil Michael Kopp G\u00fcnter Klambauer Johannes Brandstetter and Sepp Hochreiter. 2021. Hopfield Networks is All You Need. arXiv:2008.02217. Retrieved from https:\/\/arxiv.org\/abs\/2008.02217"},{"key":"e_1_3_3_194_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.587"},{"key":"e_1_3_3_195_2","unstructured":"Machel Reid Nikolay Savinov Denis Teplyashin Dmitry Lepikhin Timothy Lillicrap Jean-baptiste Alayrac Radu Soricut Angeliki Lazaridou Orhan Firat Julian Schrittwieser et\u00a0al. 2024. Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context. arXiv:2403.05530. 
Retrieved from https:\/\/arxiv.org\/abs\/2403.05530"},{"key":"e_1_3_3_196_2","unstructured":"Matthew Riemer Ignacio Cases Robert Ajemian Miao Liu Irina Rish Yuhai Tu and Gerald Tesauro. 2018. Learning to learn without forgetting by maximizing transfer and minimizing interference. arXiv:1810.11910. Retrieved from https:\/\/arxiv.org\/abs\/1810.11910"},{"key":"e_1_3_3_197_2","unstructured":"Hippolyt Ritter Aleksandar Botev and David Barber. 2018. Online structured laplace approximations for overcoming catastrophic forgetting. Advances in Neural Information Processing Systems 31 (2018)."},{"key":"e_1_3_3_198_2","unstructured":"Subendhu Rongali Abhyuday Jagannatha Bhanu Pratap Singh Rawat and Hong Yu. 2021. Continual Domain-Tuning for Pretrained Language Models. arXiv:2004.02288. Retrieved from https:\/\/arxiv.org\/abs\/2004.02288"},{"key":"e_1_3_3_199_2","doi-asserted-by":"publisher","DOI":"10.1145\/3488560.3498529"},{"key":"e_1_3_3_200_2","unstructured":"Baptiste Rozi\u00e8re Jonas Gehring Fabian Gloeckle Sten Sootla Itai Gat Xiaoqing Ellen Tan Yossi Adi Jingyu Liu Romain Sauvestre Tal Remez J\u00e9r\u00e9my Rapin Artyom Kozhevnikov Ivan Evtimov Joanna Bitton Manish Bhatt Cristian Canton Ferrer Aaron Grattafiori Wenhan Xiong Alexandre D\u00e9fossez Jade Copet Faisal Azhar Hugo Touvron Louis Martin Nicolas Usunier Thomas Scialom and Gabriel Synnaeve. 2024. Code Llama: Open Foundation Models for Code. arXiv:2308.12950. Retrieved from https:\/\/arxiv.org\/abs\/2308.12950"},{"key":"e_1_3_3_201_2","unstructured":"Andre Niyongabo Rubungo Craig Arnold Barry P. Rand and Adji Bousso Dieng. 2023. LLM-Prop: Predicting physical and electronic properties of crystalline solids from their text descriptions. arXiv:2310.14029. Retrieved from https:\/\/arxiv.org\/abs\/2310.14029"},{"key":"e_1_3_3_202_2","unstructured":"Andrei A Rusu Neil C Rabinowitz Guillaume Desjardins Hubert Soyer James Kirkpatrick Koray Kavukcuoglu Razvan Pascanu and Raia Hadsell. 2016. 
Progressive neural networks. arXiv:1606.04671. Retrieved from https:\/\/arxiv.org\/abs\/1606.04671"},{"key":"e_1_3_3_203_2","unstructured":"Keisuke Sakaguchi Ronan Le Bras Chandra Bhagavatula and Yejin Choi. 2019. WinoGrande: An Adversarial Winograd Schema Challenge at Scale. arXiv:1907.10641. Retrieved from https:\/\/arxiv.org\/abs\/1907.10641"},{"key":"e_1_3_3_204_2","unstructured":"Victor Sanh Albert Webson Colin Raffel Stephen H. Bach Lintang Sutawika Zaid Alyafeai Antoine Chaffin Arnaud Stiegler Teven Le Scao Arun Raja Manan Dey M Saiful Bari Canwen Xu Urmish Thakker Shanya Sharma Sharma Eliza Szczechla Taewoon Kim Gunjan Chhablani Nihal Nayak Debajyoti Datta Jonathan Chang Mike Tian-Jian Jiang Han Wang Matteo Manica Sheng Shen Zheng Xin Yong Harshit Pandey Rachel Bawden Thomas Wang Trishala Neeraj Jos Rozen Abheesht Sharma Andrea Santilli Thibault Fevry Jason Alan Fries Ryan Teehan Tali Bers Stella Biderman Leo Gao Thomas Wolf and Alexander M. Rush. 2022. Multitask Prompted Training Enables Zero-Shot Task Generalization. arXiv:2110.08207. Retrieved from https:\/\/arxiv.org\/abs\/2110.08207"},{"key":"e_1_3_3_205_2","unstructured":"Fahad Sarfraz Elahe Arani and Bahram Zonooz. 2023. Error sensitivity modulation based experience replay: Mitigating abrupt representation drift in continual learning. arXiv:2302.11344. Retrieved from https:\/\/arxiv.org\/abs\/2302.11344"},{"key":"e_1_3_3_206_2","unstructured":"John Schulman Filip Wolski Prafulla Dhariwal Alec Radford and Oleg Klimov. 2017. Proximal Policy Optimization Algorithms. arXiv:1707.06347. Retrieved from https:\/\/arxiv.org\/abs\/1707.06347"},{"key":"e_1_3_3_207_2","doi-asserted-by":"crossref","unstructured":"Tal Schuster Adam Fisch and Regina Barzilay. 2021. Get your vitamin C! robust fact verification with contrastive evidence. arXiv:2103.08541. 
Retrieved from https:\/\/arxiv.org\/abs\/2103.08541","DOI":"10.18653\/v1\/2021.naacl-main.52"},{"key":"e_1_3_3_208_2","first-page":"4528","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Schwarz Jonathan","year":"2018","unstructured":"Jonathan Schwarz, Wojciech Czarnecki, Jelena Luketina, Agnieszka Grabska-Barwinska, Yee Whye Teh, Razvan Pascanu, and Raia Hadsell. 2018. Progress and compress: A scalable framework for continual learning. In Proceedings of the International Conference on Machine Learning. PMLR, 4528\u20134537."},{"key":"e_1_3_3_209_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.emnlp-main.410"},{"key":"e_1_3_3_210_2","doi-asserted-by":"crossref","unstructured":"Agam Shah Suvan Paturi and Sudheer Chava. 2023. Trillion Dollar Words: A New Financial Dataset Task and Market Analysis. arXiv:2305.07972. Retrieved from https:\/\/arxiv.org\/abs\/2305.07972","DOI":"10.18653\/v1\/2023.acl-long.368"},{"key":"e_1_3_3_211_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.acl-short.109"},{"key":"e_1_3_3_212_2","unstructured":"Noam Shazeer Azalia Mirhoseini Krzysztof Maziarz Andy Davis Quoc Le Geoffrey Hinton and Jeff Dean. 2017. Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. arXiv:1701.06538. Retrieved from https:\/\/arxiv.org\/abs\/1701.06538"},{"key":"e_1_3_3_213_2","unstructured":"Junhong Shen Neil Tenenholtz James Brian Hall David Alvarez-Melis and Nicolo Fusi. 2024. Tag-LLM: Repurposing general-purpose LLMs for specialized domains. arXiv:2402.05140. Retrieved from https:\/\/arxiv.org\/abs\/2402.05140"},{"key":"e_1_3_3_214_2","unstructured":"Haizhou Shi and Hao Wang. 2024. A unified approach to domain incremental learning with memory: Theory and algorithm. 
Advances in Neural Information Processing Systems 36 (2024)."},{"key":"e_1_3_3_215_2","doi-asserted-by":"crossref","unstructured":"Amanpreet Singh Vivek Natarajan Meet Shah Yu Jiang Xinlei Chen Dhruv Batra Devi Parikh and Marcus Rohrbach. 2019. Towards VQA Models That Can Read. arXiv:1904.08920. Retrieved from https:\/\/arxiv.org\/abs\/1904.08920","DOI":"10.1109\/CVPR.2019.00851"},{"key":"e_1_3_3_216_2","unstructured":"Anton Sinitsin Vsevolod Plokhotnyuk Dmitriy Pyrkin Sergei Popov and Artem Babenko. 2020. Editable neural networks. arXiv:2004.00345. Retrieved from https:\/\/arxiv.org\/abs\/2004.00345"},{"key":"e_1_3_3_217_2","doi-asserted-by":"crossref","unstructured":"Luca Soldaini Rodney Kinney Akshita Bhagia Dustin Schwenk David Atkinson Russell Authur Ben Bogin Khyathi Chandu Jennifer Dumas Yanai Elazar Valentin Hofmann Ananya Harsh Jha Sachin Kumar Li Lucy Xinxi Lyu Nathan Lambert Ian Magnusson Jacob Morrison Niklas Muennighoff Aakanksha Naik Crystal Nam Matthew E. Peters Abhilasha Ravichander Kyle Richardson Zejiang Shen Emma Strubell Nishant Subramani Oyvind Tafjord Pete Walsh Luke Zettlemoyer Noah A. Smith Hannaneh Hajishirzi Iz Beltagy Dirk Groeneveld Jesse Dodge and Kyle Lo. 2024. Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research. arXiv:2402.00159. Retrieved from https:\/\/arxiv.org\/abs\/2402.00159","DOI":"10.18653\/v1\/2024.acl-long.840"},{"key":"e_1_3_3_218_2","unstructured":"Chenyang Song Xu Han Zheni Zeng Kuai Li Chen Chen Zhiyuan Liu Maosong Sun and Tao Yang. 2023. ConPET: Continual Parameter-Efficient Tuning for Large Language Models. arXiv:2309.14763. Retrieved from https:\/\/arxiv.org\/abs\/2309.14763"},{"key":"e_1_3_3_219_2","doi-asserted-by":"crossref","unstructured":"Demin Song Honglin Guo Yunhua Zhou Shuhao Xing Yudong Wang Zifan Song Wenwei Zhang Qipeng Guo Hang Yan Xipeng Qiu and Dahua Lin. 2024. Code Needs Comments: Enhancing Code LLMs with Comment Augmentation. arXiv:2402.13013. 
Retrieved from https:\/\/arxiv.org\/abs\/2402.13013","DOI":"10.18653\/v1\/2024.findings-acl.809"},{"key":"e_1_3_3_220_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.findings-emnlp.418"},{"key":"e_1_3_3_221_2","unstructured":"Qiushi Sun Zhirui Chen Fangzhi Xu Kanzhi Cheng Chang Ma Zhangyue Yin Jianing Wang Chengcheng Han Renyu Zhu Shuai Yuan Qipeng Guo Xipeng Qiu Pengcheng Yin Xiaoli Li Fei Yuan Lingpeng Kong Xiang Li and Zhiyong Wu. 2024. A Survey of Neural Code Intelligence: Paradigms Advances and Beyond. arXiv:2403.14734. Retrieved from https:\/\/arxiv.org\/abs\/2403.14734"},{"key":"e_1_3_3_222_2","doi-asserted-by":"publisher","unstructured":"Yu Sun Shuohuan Wang Yukun Li Shikun Feng Hao Tian Hua Wu and Haifeng Wang. 2020. ERNIE 2.0: A continual pre-training framework for language understanding. Proceedings of the AAAI Conference on Artificial Intelligence 34 05 (2020) 8968\u20138975. DOI:10.1609\/aaai.v34i05.6428","DOI":"10.1609\/aaai.v34i05.6428"},{"key":"e_1_3_3_223_2","unstructured":"Kosuke Takahashi Takahiro Omi Kosuke Arima and Tatsuya Ishigaki. 2024. Pretraining and updating language-and domain-specific large language model: A case study in japanese business domain. arXiv:2404.08262. Retrieved from https:\/\/arxiv.org\/abs\/2404.08262"},{"key":"e_1_3_3_224_2","volume-title":"Proceedings of the 11th International Conference on Learning Representations","author":"Tao Mingxu","year":"2022","unstructured":"Mingxu Tao, Yansong Feng, and Dongyan Zhao. 2022. Can bert refrain from forgetting on sequential tasks? A probing study. In Proceedings of the 11th International Conference on Learning Representations."},{"key":"e_1_3_3_225_2","unstructured":"DeepSeek-AI Team. 2024. DeepSeek LLM: Scaling Open-Source Language Models with Longtermism. arXiv:2401.02954. 
Retrieved from https:\/\/arxiv.org\/abs\/2401.02954"},{"key":"e_1_3_3_226_2","unstructured":"Gemini Team Rohan Anil Sebastian Borgeaud Yonghui Wu Jean-Baptiste Alayrac Jiahui Yu Radu Soricut Johan Schalkwyk Andrew M. Dai Anja Hauth et\u00a0al. 2023. Gemini: A family of highly capable multimodal models. arXiv:2312.11805. Retrieved from https:\/\/arxiv.org\/abs\/2312.11805"},{"key":"e_1_3_3_227_2","unstructured":"StarCode Team. 2023. StarCoder: may the source be with you! arXiv:2305.06161. Retrieved from https:\/\/arxiv.org\/abs\/2305.06161"},{"key":"e_1_3_3_228_2","unstructured":"StarCoder2 Team. 2024. StarCoder 2 and The Stack v2: The Next Generation. arXiv:2402.19173. Retrieved from https:\/\/arxiv.org\/abs\/2402.19173"},{"key":"e_1_3_3_229_2","doi-asserted-by":"crossref","unstructured":"James Thorne Andreas Vlachos Christos Christodoulopoulos and Arpit Mittal. 2018. FEVER: A large-scale dataset for fact extraction and VERification. arXiv:1803.05355. Retrieved from https:\/\/arxiv.org\/abs\/1803.05355","DOI":"10.18653\/v1\/W18-5501"},{"key":"e_1_3_3_230_2","unstructured":"David Thulke Yingbo Gao Petrus Pelser Rein Brune Rricha Jalota Floris Fok Michael Ramos Ian van Wyk Abdallah Nasir Hayden Goldstein et\u00a0al. 2024. ClimateGPT: Towards AI synthesizing interdisciplinary research on climate change. arXiv:2401.09646. Retrieved from https:\/\/arxiv.org\/abs\/2401.09646"},{"key":"e_1_3_3_231_2","unstructured":"Hugo Touvron Thibaut Lavril Gautier Izacard Xavier Martinet Marie-Anne Lachaux Timoth\u00e9e Lacroix Baptiste Rozi\u00e8re Naman Goyal Eric Hambro Faisal Azhar et\u00a0al. 2023. Llama: Open and efficient foundation language models. arXiv:2302.13971. Retrieved from https:\/\/arxiv.org\/abs\/2302.13971"},{"key":"e_1_3_3_232_2","unstructured":"Hugo Touvron Louis Martin Kevin Stone Peter Albert Amjad Almahairi Yasmine Babaei Nikolay Bashlykov Soumya Batra Prajjwal Bhargava Shruti Bhosale et\u00a0al. 2023. Llama 2: Open foundation and fine-tuned chat models. 
arXiv:2307.09288. Retrieved from https:\/\/arxiv.org\/abs\/2307.09288"},{"key":"e_1_3_3_233_2","doi-asserted-by":"crossref","unstructured":"Gido M. Van de Ven Tinne Tuytelaars and Andreas S Tolias. 2022. Three types of incremental learning. Nature Machine Intelligence 4 12 (2022) 1185\u20131197.","DOI":"10.1038\/s42256-022-00568-3"},{"key":"e_1_3_3_234_2","unstructured":"Eli Verwimp Rahaf Aljundi Shai Ben-David Matthias Bethge Andrea Cossu Alexander Gepperth Tyler L. Hayes Eyke H\u00fcllermeier Christopher Kanan Dhireesha Kudithipudi Christoph H. Lampert Martin Mundt Razvan Pascanu Adrian Popescu Andreas S. Tolias Joost van de Weijer Bing Liu Vincenzo Lomonaco Tinne Tuytelaars and Gido M. van de Ven. 2024. Continual Learning: Applications and the Road Forward. arXiv:2311.11908. Retrieved from https:\/\/arxiv.org\/abs\/2311.11908"},{"key":"e_1_3_3_235_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W17-4508"},{"key":"e_1_3_3_236_2","unstructured":"Ben Wang and Aran Komatsuzaki. 2021. GPT-J-6B: A 6 Billion Parameter Autoregressive Language Model. Retrieved from https:\/\/github.com\/kingoflolz\/mesh-transformer-jax"},{"key":"e_1_3_3_237_2","first-page":"254","volume-title":"Proceedings of the 17th European Conference on Computer Vision\u2013ECCV 2022, Tel Aviv, Israel, October 23\u201327, 2022, Part XXVI","author":"Wang Liyuan","year":"2022","unstructured":"Liyuan Wang, Xingxing Zhang, Qian Li, Jun Zhu, and Yi Zhong. 2022. CoSCL: Cooperation of small continual learners is stronger than a big one. In Proceedings of the 17th European Conference on Computer Vision\u2013ECCV 2022, Tel Aviv, Israel, October 23\u201327, 2022, Part XXVI. Springer, 254\u2013271."},{"key":"e_1_3_3_238_2","doi-asserted-by":"publisher","unstructured":"Liyuan Wang Xingxing Zhang Hang Su and Jun Zhu. 2024. A comprehensive survey of continual learning: Theory method and application. IEEE Transactions on Pattern Analysis and Machine Intelligence (2024) 1\u201320. 
DOI:10.1109\/TPAMI.2024.3367329","DOI":"10.1109\/TPAMI.2024.3367329"},{"key":"e_1_3_3_239_2","unstructured":"Peng Wang Zexi Li Ningyu Zhang Ziwen Xu Yunzhi Yao Yong Jiang Pengjun Xie Fei Huang and Huajun Chen. 2024. WISE: Rethinking the knowledge memory for lifelong model editing of large language models. arXiv:2405.14768. Retrieved from https:\/\/arxiv.org\/abs\/2405.14768"},{"key":"e_1_3_3_240_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.findings-acl.121"},{"key":"e_1_3_3_241_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.findings-emnlp.715"},{"key":"e_1_3_3_242_2","unstructured":"Xiao Wang Yuansen Zhang Tianze Chen Songyang Gao Senjie Jin Xianjun Yang Zhiheng Xi Rui Zheng Yicheng Zou Tao Gui Qi Zhang and Xuanjing Huang. 2023. TRACE: A Comprehensive Benchmark for Continual Learning in Large Language Models. arXiv:2310.06762. Retrieved from https:\/\/arxiv.org\/abs\/2310.06762"},{"key":"e_1_3_3_243_2","doi-asserted-by":"crossref","unstructured":"Yue Wang Hung Le Akhilesh Deepak Gotmare Nghi D. Q. Bui Junnan Li and Steven C. H. Hoi. 2023. CodeT5+: Open Code Large Language Models for Code Understanding and Generation. arXiv:2305.07922. Retrieved from https:\/\/arxiv.org\/abs\/2305.07922","DOI":"10.18653\/v1\/2023.emnlp-main.68"},{"key":"e_1_3_3_244_2","unstructured":"Yifan Wang Yafei Liu Chufan Shi Haoling Li Chen Chen Haonan Lu and Yujiu Yang. 2024. InsCL: A Data-efficient Continual Learning Paradigm for Fine-tuning Large Language Models with Instructions. arXiv:2403.11435. 
Retrieved from https:\/\/arxiv.org\/abs\/2403.11435"},{"key":"e_1_3_3_245_2","doi-asserted-by":"crossref","unstructured":"Yizhong Wang Swaroop Mishra Pegah Alipoormolabashi Yeganeh Kordi Amirreza Mirzaei Anjana Arunkumar Arjun Ashok Arut Selvan Dhanasekaran Atharva Naik David Stap Eshaan Pathak Giannis Karamanolakis Haizhi Gary Lai Ishan Purohit Ishani Mondal Jacob Anderson Kirby Kuznia Krima Doshi Maitreya Patel Kuntal Kumar Pal Mehrad Moradshahi Mihir Parmar Mirali Purohit Neeraj Varshney Phani Rohitha Kaza Pulkit Verma Ravsehaj Singh Puri Rushang Karia Shailaja Keyur Sampat Savan Doshi Siddhartha Mishra Sujan Reddy Sumanta Patro Tanay Dixit Xudong Shen Chitta Baral Yejin Choi Noah A. Smith Hannaneh Hajishirzi and Daniel Khashabi. 2022. Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks. arXiv:2204.07705. Retrieved from https:\/\/arxiv.org\/abs\/2204.07705","DOI":"10.18653\/v1\/2022.emnlp-main.340"},{"key":"e_1_3_3_246_2","volume-title":"Proceedings of the EMNLP","author":"Wang Yue","year":"2021","unstructured":"Yue Wang, Weishi Wang, Shafiq Joty, and Steven C. H. Hoi. 2021. CodeT5: Identifier-aware unified pre-trained encoder-decoder models for code understanding and generation. In Proceedings of the EMNLP."},{"key":"e_1_3_3_247_2","doi-asserted-by":"crossref","unstructured":"Zifeng Wang Chun-Liang Li Vincent Perot Long T. Le Jin Miao Zizhao Zhang Chen-Yu Lee and Tomas Pfister. 2024. CodecLM: Aligning language models with tailored synthetic data. arXiv:2404.05875. Retrieved from https:\/\/arxiv.org\/abs\/2404.05875","DOI":"10.18653\/v1\/2024.findings-naacl.235"},{"key":"e_1_3_3_248_2","unstructured":"Zifeng Wang Zheng Zhan Yifan Gong Geng Yuan Wei Niu Tong Jian Bin Ren Stratis Ioannidis Yanzhi Wang and Jennifer Dy. 2022. Sparcl: Sparse continual learning on the edge. 
Advances in Neural Information Processing Systems 35 (2022) 20366\u201320380."},{"key":"e_1_3_3_249_2","doi-asserted-by":"crossref","unstructured":"Zifeng Wang Zizhao Zhang Sayna Ebrahimi Ruoxi Sun Han Zhang Chen-Yu Lee Xiaoqi Ren Guolong Su Vincent Perot Jennifer Dy et\u00a0al. 2022. DualPrompt: Complementary prompting for rehearsal-free continual learning. In Proceedings of the European Conference on Computer Vision.","DOI":"10.1007\/978-3-031-19809-0_36"},{"key":"e_1_3_3_250_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.00024"},{"key":"e_1_3_3_251_2","unstructured":"Jason Wei Maarten Bosma Vincent Y. Zhao Kelvin Guu Adams Wei Yu Brian Lester Nan Du Andrew M. Dai and Quoc V. Le. 2021. Finetuned language models are zero-shot learners. arXiv:2109.01652. Retrieved from https:\/\/arxiv.org\/abs\/2109.01652"},{"key":"e_1_3_3_252_2","unstructured":"Jason Wei Maarten Bosma Vincent Y. Zhao Kelvin Guu Adams Wei Yu Brian Lester Nan Du Andrew M. Dai and Quoc V. Le. 2022. Finetuned Language Models Are Zero-Shot Learners. arXiv:2109.01652. Retrieved from https:\/\/arxiv.org\/abs\/2109.01652"},{"key":"e_1_3_3_253_2","unstructured":"Jason Wei Yi Tay Rishi Bommasani Colin Raffel Barret Zoph Sebastian Borgeaud Dani Yogatama Maarten Bosma Denny Zhou Donald Metzler et\u00a0al. 2022. Emergent abilities of large language models. arXiv:2206.07682. Retrieved from https:\/\/arxiv.org\/abs\/2206.07682"},{"key":"e_1_3_3_254_2","unstructured":"Jason Wei Xuezhi Wang Dale Schuurmans Maarten Bosma Fei Xia Ed Chi Quoc V. Le Denny Zhou et\u00a0al. 2022. Chain-of-thought prompting elicits reasoning in large language models. 
Advances in Neural Information Processing Systems 35 (2022) 24824\u201324837."},{"key":"e_1_3_3_255_2","doi-asserted-by":"publisher","DOI":"10.1145\/3611643.3616244"},{"key":"e_1_3_3_256_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.findings-acl.48"},{"key":"e_1_3_3_257_2","volume-title":"Proceedings of the NeurIPS 2023 Workshop on Distribution Shifts (DistShifts)","author":"Wistuba Martin","year":"2023","unstructured":"Martin Wistuba, Prabhu Teja Sivaprasad, Lukas Balles, and Giovanni Zappella. 2023. Continual learning with low rank adaptation. In Proceedings of the NeurIPS 2023 Workshop on Distribution Shifts (DistShifts). Retrieved from https:\/\/www.amazon.science\/publications\/continual-learning-with-low-rank-adaptation"},{"key":"e_1_3_3_258_2","unstructured":"Chengyue Wu Yukang Gan Yixiao Ge Zeyu Lu Jiahao Wang Ye Feng Ping Luo and Ying Shan. 2024. LLaMA Pro: Progressive LLaMA with Block Expansion. arXiv:2401.02415. Retrieved from https:\/\/arxiv.org\/abs\/2401.02415"},{"key":"e_1_3_3_259_2","unstructured":"Chaoyi Wu Weixiong Lin Xiaoman Zhang Ya Zhang Yanfeng Wang and Weidi Xie. 2023. Pmc-llama: Towards building open-source language models for medicine. arXiv:2305.10415. Retrieved from https:\/\/arxiv.org\/abs\/2305.10415"},{"key":"e_1_3_3_260_2","unstructured":"Shijie Wu Ozan Irsoy Steven Lu Vadim Dabravolski Mark Dredze Sebastian Gehrmann Prabhanjan Kambadur David S. Rosenberg and Gideon Mann. 2023. BloombergGPT: A large language model for finance. arXiv:2303.17564. Retrieved from https:\/\/arxiv.org\/abs\/2303.17564"},{"key":"e_1_3_3_261_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Wu Tongtong","year":"2021","unstructured":"Tongtong Wu, Massimo Caccia, Zhuang Li, Yuan-Fang Li, Guilin Qi, and Gholamreza Haffari. 2021. Pretrained language model in continual learning: A comparative study. 
In Proceedings of the International Conference on Learning Representations."},{"key":"e_1_3_3_262_2","unstructured":"Tongtong Wu Linhao Luo Yuan-Fang Li Shirui Pan Thuy-Trang Vu and Gholamreza Haffari. 2024. Continual Learning for Large Language Models: A Survey. arXiv:2402.01364. Retrieved from https:\/\/arxiv.org\/abs\/2402.01364"},{"key":"e_1_3_3_263_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00046"},{"key":"e_1_3_3_264_2","unstructured":"Yan Wu Greg Wayne Alex Graves and Timothy Lillicrap. 2018. The kanerva machine: A generative distributed memory. arXiv:1804.01756. Retrieved from https:\/\/arxiv.org\/abs\/1804.01756"},{"key":"e_1_3_3_265_2","doi-asserted-by":"publisher","DOI":"10.1145\/3580305.3599891"},{"key":"e_1_3_3_266_2","unstructured":"Qianqian Xie Qingyu Chen Aokun Chen Cheng Peng Yan Hu Fongci Lin Xueqing Peng Jimin Huang Jeffrey Zhang Vipina Keloth et\u00a0al. 2024. Me LLaMA: Foundation large language models for medical applications. arXiv:2402.12749. Retrieved from https:\/\/arxiv.org\/abs\/2402.12749"},{"key":"e_1_3_3_267_2","unstructured":"Qianqian Xie Weiguang Han Xiao Zhang Yanzhao Lai Min Peng Alejandro Lopez-Lira and Jimin Huang. 2023. PIXIU: A large language model instruction data and evaluation benchmark for finance. arXiv:2306.05443. Retrieved from https:\/\/arxiv.org\/abs\/2306.05443"},{"key":"e_1_3_3_268_2","unstructured":"Sang Michael Xie Shibani Santurkar Tengyu Ma and Percy S Liang. 2024. Data selection for language models via importance resampling. Advances in Neural Information Processing Systems 36 (2024)."},{"key":"e_1_3_3_269_2","unstructured":"Yong Xie Karan Aggarwal and Aitzaz Ahmad. 2023. Efficient Continual Pre-training for Building Domain Specific Large Language Models. arXiv:2311.08545. Retrieved from https:\/\/arxiv.org\/abs\/2311.08545"},{"key":"e_1_3_3_270_2","unstructured":"Can Xu Qingfeng Sun Kai Zheng Xiubo Geng Pu Zhao Jiazhan Feng Chongyang Tao and Daxin Jiang. 2023. 
Wizardlm: Empowering large language models to follow complex instructions. arXiv:2304.12244. Retrieved from https:\/\/arxiv.org\/abs\/2304.12244"},{"key":"e_1_3_3_271_2","unstructured":"Hu Xu Bing Liu Lei Shu and Philip S. Yu. 2019. BERT Post-Training for Review Reading Comprehension and Aspect-based Sentiment Analysis. arXiv:1904.02232. Retrieved from https:\/\/arxiv.org\/abs\/1904.02232"},{"key":"e_1_3_3_272_2","unstructured":"Siqiao Xue Fan Zhou Yi Xu Hongyu Zhao Shuo Xie Qingyang Dai Caigao Jiang James Zhang Jun Zhou Dacheng Xiu and Hongyuan Mei. 2023. WeaverBird: Empowering financial decision-making with large language model knowledge base and search engine. arXiv:2308.05361. Retrieved from https:\/\/arxiv.org\/abs\/2308.05361"},{"key":"e_1_3_3_273_2","doi-asserted-by":"publisher","DOI":"10.1109\/BIBM58861.2023.10385733"},{"key":"e_1_3_3_274_2","unstructured":"Shu Yang Muhammad Asif Ali Cheng-Long Wang Lijie Hu and Di Wang. 2024. MoRAL: MoE Augmented LoRA for LLMs\u2019 Lifelong Learning. arXiv:2402.11260. Retrieved from https:\/\/arxiv.org\/abs\/2402.11260"},{"key":"e_1_3_3_275_2","unstructured":"Xianjun Yang Junfeng Gao Wenxin Xue and Erik Alexandersson. 2024. PLLaMa: An open-source large language model for plant science. arXiv:2401.01600. Retrieved from https:\/\/arxiv.org\/abs\/2401.01600"},{"key":"e_1_3_3_276_2","unstructured":"Yanlai Yang Matt Jones Michael C. Mozer and Mengye Ren. 2024. Reawakening knowledge: Anticipatory recovery from catastrophic interference via structured training. arXiv:2403.09613. Retrieved from https:\/\/arxiv.org\/abs\/2403.09613"},{"key":"e_1_3_3_277_2","unstructured":"Yutao Yang Jie Zhou Xuanwen Ding Tianyu Huai Shunyu Liu Qin Chen Liang He and Yuan Xie. 2024. Recent advances of foundation language models-based continual learning: A survey. arXiv:2405.18653. 
Retrieved from https:\/\/arxiv.org\/abs\/2405.18653"},{"key":"e_1_3_3_278_2","unstructured":"Shunyu Yao Dian Yu Jeffrey Zhao Izhak Shafran Tom Griffiths Yuan Cao and Karthik Narasimhan. 2024. Tree of thoughts: Deliberate problem solving with large language models. Advances in Neural Information Processing Systems 36 (2024)."},{"key":"e_1_3_3_279_2","unstructured":"\u00c7a\u011fatay Y\u0131ld\u0131z Nishaanth Kanna Ravichandran Prishruit Punia Matthias Bethge and Beyza Ermis. 2024. Investigating continual pretraining in large language models: Insights and implications. arXiv:2402.17400. Retrieved from https:\/\/arxiv.org\/abs\/2402.17400"},{"key":"e_1_3_3_280_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.acl-long.218"},{"key":"e_1_3_3_281_2","unstructured":"Lang Yu Qin Chen Jie Zhou and Liang He. 2023. MELO: Enhancing model editing with neuron-indexed dynamic LoRA. arXiv:2312.11795. Retrieved from https:\/\/arxiv.org\/abs\/2312.11795"},{"key":"e_1_3_3_282_2","doi-asserted-by":"publisher","DOI":"10.1145\/3533767.3534219"},{"key":"e_1_3_3_283_2","unstructured":"Xiang Yue Xingwei Qu Ge Zhang Yao Fu Wenhao Huang Huan Sun Yu Su and Wenhu Chen. 2023. Mammoth: Building math generalist models through hybrid instruction tuning. arXiv:2309.05653. Retrieved from https:\/\/arxiv.org\/abs\/2309.05653"},{"key":"e_1_3_3_284_2","unstructured":"Rowan Zellers Ari Holtzman Hannah Rashkin Yonatan Bisk Ali Farhadi Franziska Roesner and Yejin Choi. 2019. Defending against neural fake news. Advances in Neural Information Processing Systems 32 (2019)."},{"key":"e_1_3_3_285_2","unstructured":"Yuexiang Zhai Shengbang Tong Xiao Li Mu Cai Qing Qu Yong Jae Lee and Yi Ma. 2023. Investigating the Catastrophic Forgetting in Multimodal Large Language Models. arXiv:2309.10313. Retrieved from https:\/\/arxiv.org\/abs\/2309.10313"},{"key":"e_1_3_3_286_2","unstructured":"Dan Zhang Ziniu Hu Sining Zhoubian Zhengxiao Du Kaiyu Yang Zihan Wang Yisong Yue Yuxiao Dong and Jie Tang. 2024. 
SciGLM: Training scientific language models with self-reflective instruction annotation and tuning. arXiv:2401.07950. Retrieved from https:\/\/arxiv.org\/abs\/2401.07950"},{"key":"e_1_3_3_287_2","unstructured":"Han Zhang Lin Gui Yuanzhao Zhai Hui Wang Yu Lei and Ruifeng Xu. 2023. Copf: Continual learning human preference through optimal policy fitting. arXiv:2310.15694. Retrieved from https:\/\/arxiv.org\/abs\/2310.15694"},{"key":"e_1_3_3_288_2","unstructured":"Han Zhang Yu Lei Lin Gui Min Yang Yulan He Hui Wang and Ruifeng Xu. [n.d.]. CPPO: Continual learning for reinforcement learning with human feedback. ([n.d.])."},{"key":"e_1_3_3_289_2","doi-asserted-by":"crossref","unstructured":"Shengyu Zhang Linfeng Dong Xiaoya Li Sen Zhang Xiaofei Sun Shuhe Wang Jiwei Li Runyi Hu Tianwei Zhang Fei Wu and Guoyin Wang. 2024. Instruction Tuning for Large Language Models: A Survey. arXiv:2308.10792. Retrieved from https:\/\/arxiv.org\/abs\/2308.10792","DOI":"10.1145\/3777411"},{"key":"e_1_3_3_290_2","doi-asserted-by":"publisher","DOI":"10.1145\/3583780.3615285"},{"key":"e_1_3_3_291_2","volume-title":"Proceedings of the Advances in Neural Information Processing Systems","volume":"28","author":"Zhang Xiang","year":"2015","unstructured":"Xiang Zhang, Junbo Zhao, and Yann LeCun. 2015. Character-level convolutional networks for text classification. In Proceedings of the Advances in Neural Information Processing Systems, C. Cortes, N. Lawrence, D. Lee, M. Sugiyama, and R. Garnett (Eds.). Vol. 28, Curran Associates, Inc. 
Retrieved from https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2015\/file\/250cf8b51c773f3f8dc8b4be867a9a02-Paper.pdf"},{"key":"e_1_3_3_292_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.acl-long.255"},{"key":"e_1_3_3_293_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.findings-emnlp.633"},{"key":"e_1_3_3_294_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.acl-long.747"},{"key":"e_1_3_3_295_2","unstructured":"Haokun Zhao Haixia Han Jie Shi Chengyu Du Jiaqing Liang and Yanghua Xiao. 2024. Large language model can continue evolving from mistakes. arXiv:2404.08707. Retrieved from https:\/\/arxiv.org\/abs\/2404.08707"},{"key":"e_1_3_3_296_2","doi-asserted-by":"publisher","unstructured":"Hanbin Zhao Hui Wang Yongjian Fu Fei Wu and Xi Li. 2022. Memory-efficient class-incremental learning for image classification. IEEE Transactions on Neural Networks and Learning Systems 33 10 (2022) 5966\u20135977. DOI:10.1109\/TNNLS.2021.3072041","DOI":"10.1109\/TNNLS.2021.3072041"},{"key":"e_1_3_3_297_2","unstructured":"Shu Zhao Xiaohan Zou Tan Yu and Huijuan Xu. 2024. Reconstruct before Query: Continual Missing Modality Learning with Decomposed Prompt Collaboration. arXiv:2403.11373. Retrieved from https:\/\/arxiv.org\/abs\/2403.11373"},{"key":"e_1_3_3_298_2","unstructured":"Weixiang Zhao Shilong Wang Yulin Hu Yanyan Zhao Bing Qin Xuanyu Zhang Qing Yang Dongliang Xu and Wanxiang Che. 2024. SAPT: A Shared Attention Framework for Parameter-Efficient Continual Learning of Large Language Models. arXiv:2401.08295. Retrieved from https:\/\/arxiv.org\/abs\/2401.08295"},{"key":"e_1_3_3_299_2","unstructured":"Junhao Zheng Qianli Ma Zhen Liu Binquan Wu and Huawen Feng. 2024. Beyond Anti-Forgetting: Multimodal Continual Instruction Tuning with Positive Forward Transfer. arXiv:2401.09181. Retrieved from https:\/\/arxiv.org\/abs\/2401.09181"},{"key":"e_1_3_3_300_2","unstructured":"Junhao Zheng Shengjie Qiu and Qianli Ma. 2023. Learn or Recall? 
Revisiting Incremental Learning with Pre-trained Language Models. arXiv:2312.07887. Retrieved from https:\/\/arxiv.org\/abs\/2312.07887"},{"key":"e_1_3_3_301_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.01752"},{"key":"e_1_3_3_302_2","unstructured":"Ziqiang Zheng Jipeng Zhang Tuan-Anh Vu Shizhe Diao Yue Him Wong Tim and Sai-Kit Yeung. 2023. MarineGPT: Unlocking secrets of ocean to the public. arXiv:2310.13596. Retrieved from https:\/\/arxiv.org\/abs\/2310.13596"},{"key":"e_1_3_3_303_2","doi-asserted-by":"crossref","unstructured":"Ben Zhou Daniel Khashabi Qiang Ning and Dan Roth. 2019. \u201cGoing on a vacation\u201d takes longer than \u201cGoing for a walk\u201d: A Study of Temporal Commonsense Understanding. arXiv:1909.03065. Retrieved from https:\/\/arxiv.org\/abs\/1909.03065","DOI":"10.18653\/v1\/D19-1332"},{"key":"e_1_3_3_304_2","unstructured":"Wangchunshu Zhou Dong-Ho Lee Ravi Kiran Selvam Seyeon Lee Bill Yuchen Lin and Xiang Ren. 2021. Pre-training text-to-text transformers for concept-centric common sense. (2021)."},{"key":"e_1_3_3_305_2","unstructured":"Didi Zhu Zhongyi Sun Zexi Li Tao Shen Ke Yan Shouhong Ding Kun Kuang and Chao Wu. 2024. Model Tailor: Mitigating Catastrophic Forgetting in Multi-modal Large Language Models. arXiv:2402.12048. 
Retrieved from https:\/\/arxiv.org\/abs\/2402.12048"}],"container-title":["ACM Computing Surveys"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3735633","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,11,20]],"date-time":"2025-11-20T13:33:35Z","timestamp":1763645615000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3735633"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,11,20]]},"references-count":304,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2026,4,30]]}},"alternative-id":["10.1145\/3735633"],"URL":"https:\/\/doi.org\/10.1145\/3735633","relation":{},"ISSN":["0360-0300","1557-7341"],"issn-type":[{"value":"0360-0300","type":"print"},{"value":"1557-7341","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,11,20]]},"assertion":[{"value":"2024-07-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-05-08","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-11-20","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}