{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,17]],"date-time":"2026-06-17T16:21:24Z","timestamp":1781713284275,"version":"3.54.5"},"reference-count":467,"publisher":"Association for Computing Machinery (ACM)","issue":"6","funder":[{"DOI":"10.13039\/100000183","name":"Army Research Office","doi-asserted-by":"crossref","award":["W911NF21-1-0198"],"award-info":[{"award-number":["W911NF21-1-0198"]}],"id":[{"id":"10.13039\/100000183","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Department of Homeland Security CINA","award":["17STCIN0000105-00"],"award-info":[{"award-number":["17STCIN0000105-00"]}]},{"name":"Cisco Faculty Research Award"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Intell. Syst. Technol."],"published-print":{"date-parts":[[2025,12,31]]},"abstract":"<jats:p>\n                    Large language models (LLMs) have demonstrated emergent abilities in text generation, question answering, and reasoning, facilitating various tasks and domains. Despite their proficiency in various tasks, LLMs like PaLM 540B and Llama-3.1 405B face limitations due to large parameter sizes and computational demands, often requiring cloud API use, which raises privacy concerns, limits real-time applications on edge devices, and increases fine-tuning costs. Additionally, LLMs often underperform in specialized domains such as healthcare and law due to insufficient domain-specific knowledge, necessitating specialized models. Therefore, Small Language Models (SLMs) are increasingly favored for their low inference latency, cost-effectiveness, efficient development, and easy customization and adaptability. 
These models are particularly well-suited for resource-limited environments and domain knowledge acquisition, addressing LLMs\u2019 challenges and proving ideal for applications that require localized data handling for privacy, minimal inference latency for efficiency, and domain knowledge acquisition through lightweight fine-tuning. The rising demand for SLMs has spurred extensive research and development. However, a comprehensive survey investigating issues related to the definition, acquisition, application, enhancement, and reliability of SLMs remains lacking, prompting us to conduct a detailed survey on these topics. The definition of SLMs varies widely; thus, to standardize, we propose defining SLMs by their capability to perform specialized tasks and suitability for resource-constrained settings, setting boundaries based on the minimal size for emergent abilities and the maximum size sustainable under resource constraints. For other aspects, we provide a taxonomy of relevant models\/methods and develop general frameworks for each category to enhance and utilize SLMs effectively. 
We have compiled the collected SLM models and related methods on GitHub:\n                    <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/github.com\/FairyFali\/SLMs-Survey\">https:\/\/github.com\/FairyFali\/SLMs-Survey<\/jats:ext-link>\n                    .\n                  <\/jats:p>","DOI":"10.1145\/3768165","type":"journal-article","created":{"date-parts":[[2025,9,18]],"date-time":"2025-09-18T18:43:57Z","timestamp":1758221037000},"page":"1-87","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":59,"title":["A Comprehensive Survey of Small Language Models in the Era of Large Language Models: Techniques, Enhancements, Applications, Collaboration with LLMs, and Trustworthiness"],"prefix":"10.1145","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0009-0000-8321-6365","authenticated-orcid":false,"given":"Fali","family":"Wang","sequence":"first","affiliation":[{"name":"The Pennsylvania State University, University Park, PA, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-6153-2739","authenticated-orcid":false,"given":"Zhiwei","family":"Zhang","sequence":"additional","affiliation":[{"name":"The Pennsylvania State University, University Park, PA, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0283-4693","authenticated-orcid":false,"given":"Xianren","family":"Zhang","sequence":"additional","affiliation":[{"name":"The Pennsylvania State University, University Park, PA, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-8378-7632","authenticated-orcid":false,"given":"Zongyu","family":"Wu","sequence":"additional","affiliation":[{"name":"The Pennsylvania State University, University Park, PA, 
USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-0975-8280","authenticated-orcid":false,"given":"TzuHao","family":"Mo","sequence":"additional","affiliation":[{"name":"University of Pennsylvania, Philadelphia, PA, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2368-8410","authenticated-orcid":false,"given":"Qiuhao","family":"Lu","sequence":"additional","affiliation":[{"name":"The University of Texas Health Science Center at Houston, Houston, TX, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-4978-5868","authenticated-orcid":false,"given":"Wanjing","family":"Wang","sequence":"additional","affiliation":[{"name":"The University of Texas Health Science Center at Houston, Houston, TX, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-7292-4011","authenticated-orcid":false,"given":"Rui","family":"Li","sequence":"additional","affiliation":[{"name":"The University of Texas Health Science Center at Houston, Houston, TX, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3673-786X","authenticated-orcid":false,"given":"Junjie","family":"Xu","sequence":"additional","affiliation":[{"name":"The Pennsylvania State University, University Park, PA, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1554-2761","authenticated-orcid":false,"given":"Xianfeng","family":"Tang","sequence":"additional","affiliation":[{"name":"Amazon.com Inc., Palo Alto, CA, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5257-6843","authenticated-orcid":false,"given":"Qi","family":"He","sequence":"additional","affiliation":[{"name":"Amazon.com Inc., Palo Alto, CA, 
USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4985-8724","authenticated-orcid":false,"given":"Yao","family":"Ma","sequence":"additional","affiliation":[{"name":"Rensselaer Polytechnic Institute, Troy, NY, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7367-3626","authenticated-orcid":false,"given":"Ming","family":"Huang","sequence":"additional","affiliation":[{"name":"The University of Texas Health Science Center at Houston, Houston, TX, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3448-4878","authenticated-orcid":false,"given":"Suhang","family":"Wang","sequence":"additional","affiliation":[{"name":"The Pennsylvania State University, University Park, PA, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2025,11,24]]},"reference":[{"key":"e_1_3_2_2_2","unstructured":"Marah Abdin Sam Ade Jacobs Ammar Ahmad Awan Jyoti Aneja Ahmed Awadallah Hany Awadalla Nguyen Bach Amit Bahree Arash Bakhtiari Harkirat Behl et al. 2024. Phi-3 technical report: A highly capable language model locally on your phone. arXiv:2404.14219. Retrieved from https:\/\/arxiv.org\/abs\/2404.14219"},{"key":"e_1_3_2_3_2","unstructured":"Josh Achiam Steven Adler Sandhini Agarwal Lama Ahmad Ilge Akkaya Florencia Leoni Aleman Diogo Almeida Janko Altenschmidt Sam Altman Shyamal Anadkat et al. 2023. Gpt-4 Technical Report. arXiv:2303.08774. Retrieved from https:\/\/arxiv.org\/abs\/2303.08774"},{"key":"e_1_3_2_4_2","unstructured":"Emre Can Acikgoz Osman Batur \u0130nce Rayene Bench Arda An\u0131l Boz \u0130lker Kesen Aykut Erdem and Erkut Erdem. 2024. Hippocrates: An open-source framework for advancing large language models in healthcare. arXiv:2404.16621. 
Retrieved from https:\/\/arxiv.org\/abs\/2404.16621"},{"key":"e_1_3_2_5_2","unstructured":"Harshavardhan Adepu Zhanpeng Zeng Li Zhang and Vikas Singh. 2024. FrameQuant: Flexible low-bit quantization for transformers. arXiv:2403.06082. Retrieved from https:\/\/arxiv.org\/abs\/2403.06082"},{"key":"e_1_3_2_6_2","unstructured":"Abien Fred Agarap. 2018. Deep learning using rectified linear units (ReLU). arXiv:1803.08375. Retrieved from http:\/\/arxiv.org\/abs\/1803.08375"},{"key":"e_1_3_2_7_2","volume-title":"The Twelfth International Conference on Learning Representations (ICLR \u201924)","author":"Agarwal Rishabh","year":"2024","unstructured":"Rishabh Agarwal, Nino Vieillard, Yongchao Zhou, Piotr Stanczyk, Sabela Ramos Garea, Matthieu Geist, and Olivier Bachem. 2024. On-policy distillation of language models: Learning from self-generated mistakes. In The Twelfth International Conference on Learning Representations (ICLR \u201924)."},{"key":"e_1_3_2_8_2","unstructured":"Meta AI. 2024. Llama 3.2: Revolutionizing edge AI and vision with open customizable models. Retrieved from https:\/\/ai.meta.com\/blog\/llama-3-2-connect-2024-vision-edge-mobile-devices\/. Accessed: September 25 2024."},{"key":"e_1_3_2_9_2","doi-asserted-by":"crossref","unstructured":"Joshua Ainslie James Lee-Thorp Michiel de Jong Yury Zemlyanskiy Federico Lebr\u00f3n and Sumit Sanghai. 2023. GQA: Training generalized multi-query transformer models from multi-head checkpoints. arXiv:2305.13245. Retrieved from https:\/\/arxiv.org\/abs\/2305.13245","DOI":"10.18653\/v1\/2023.emnlp-main.298"},{"key":"e_1_3_2_10_2","unstructured":"Ali Al-Lawati Jason Lucas Zhiwei Zhang Prasenjit Mitra and Suhang Wang. 2025. Graph-based molecular in-context learning grounded on morgan fingerprints. arXiv:2502.05414. Retrieved from https:\/\/arxiv.org\/abs\/2502.05414"},{"key":"e_1_3_2_11_2","unstructured":"Loubna Ben Allal Anton Lozhkov Elie Bakouch Leandro von Werra and Thomas Wolf. 2024. 
SmolLM - blazingly fast and remarkably powerful. arXiv:2409.00286v1. Retrieved from https:\/\/arxiv.org\/abs\/2409.00286v1"},{"key":"e_1_3_2_12_2","unstructured":"Ebtesam Almazrouei Hamza Alobeidli Abdulaziz Alshamsi Alessandro Cappelli Ruxandra Cojocaru M\u00e9rouane Debbah \u00c9tienne Goffinet Daniel Hesslow Julien Launay Quentin Malartic et al. 2023. The falcon series of open language models. arXiv:2311.16867. Retrieved from https:\/\/arxiv.org\/abs\/2311.16867"},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.artint.2024.104145"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/SC41404.2022.00051"},{"key":"e_1_3_2_15_2","first-page":"10865","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","author":"An Yongqi","year":"2024","unstructured":"Yongqi An, Xu Zhao, Tao Yu, Ming Tang, and Jinqiao Wang. 2024. Fluctuation-based adaptive structured pruning for large language models. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 38, AAAI, 10865\u201310873."},{"key":"e_1_3_2_16_2","unstructured":"Rohan Anil Andrew M. Dai Orhan Firat Melvin Johnson Dmitry Lepikhin Alexandre Passos Siamak Shakeri Emanuel Taropa Paige Bailey Zhifeng Chen et al. 2023. Palm 2 Technical Report. arXiv:2305.10403. Retrieved from https:\/\/arxiv.org\/abs\/2305.10403"},{"issue":"1","key":"e_1_3_2_17_2","article-title":"The Claude 3 model family: Opus, Sonnet, Haiku","author":"AI Anthropic","year":"2024","unstructured":"AI Anthropic. 2024. The Claude 3 model family: Opus, Sonnet, Haiku. Claude-3 Model Card, 1, (2024). Retrieved from https:\/\/api.semanticscholar.org\/CorpusID:268232499","journal-title":"Claude-3 Model Card"},{"key":"e_1_3_2_18_2","unstructured":"David Anugraha Genta Indra Winata Chenyue Li Patrick Amadeus Irawan and En-Shiun Annie Lee. 2024. ProxyLM: Predicting language model performance on multilingual tasks via proxy models. arXiv:2406.09334. 
Retrieved from https:\/\/arxiv.org\/abs\/2406.09334"},{"key":"e_1_3_2_19_2","unstructured":"Viraat Aryabumi Yixuan Su Raymond Ma Adrien Morisot Ivan Zhang Acyr Locatelli Marzieh Fadaee Ahmet \u00dcst\u00fcn and Sara Hooker. 2024. To code or not to code? Exploring impact of code in pre-training. arXiv:2408.10914. Retrieved from https:\/\/arxiv.org\/abs\/2408.10914"},{"key":"e_1_3_2_20_2","volume-title":"The Twelfth International Conference on Learning Representations","author":"Asai Akari","year":"2024","unstructured":"Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, and Hannaneh Hajishirzi. 2024. Self-RAG: Learning to retrieve, generate, and critique through self-reflection. In The Twelfth International Conference on Learning Representations. Retrieved from https:\/\/openreview.net\/forum?id=hSyW5go0v8"},{"key":"e_1_3_2_21_2","volume-title":"The Twelfth International Conference on Learning Representations","author":"Ashkboos Saleh","year":"2024","unstructured":"Saleh Ashkboos, Maximilian L. Croci, Marcelo Gennari do Nascimento, Torsten Hoefler, and James Hensman. 2024. SliceGPT: Compress large language models by deleting rows and columns. In The Twelfth International Conference on Learning Representations. Retrieved from https:\/\/openreview.net\/forum?id=vXxardq6db"},{"key":"e_1_3_2_22_2","unstructured":"Jacob Austin Augustus Odena Maxwell Nye Maarten Bosma Henryk Michalewski David Dohan Ellen Jiang Carrie Cai Michael Terry Quoc Le et al. 2021. Program synthesis with large language models. arXiv:2108.07732. Retrieved from https:\/\/arxiv.org\/abs\/2108.07732"},{"key":"e_1_3_2_23_2","doi-asserted-by":"crossref","first-page":"967","DOI":"10.18653\/v1\/2023.findings-emnlp.68","volume-title":"Findings of the Association for Computational Linguistics: EMNLP \u201923","author":"Azaria Amos","year":"2023","unstructured":"Amos Azaria, and Tom Mitchell. 2023. The internal state of an LLM knows when it\u2019s lying. 
In Findings of the Association for Computational Linguistics: EMNLP \u201923. Association for Computational Linguistics, 967\u2013976."},{"key":"e_1_3_2_24_2","unstructured":"Zhangir Azerbayev Hailey Schoelkopf Keiran Paster Marco Dos Santos Stephen McAleer Albert Q. Jiang Jia Deng Stella Biderman and Sean Welleck. 2023. Llemma: An open language model for mathematics. arXiv:2310.10631. Retrieved from https:\/\/arxiv.org\/abs\/2310.10631"},{"key":"e_1_3_2_25_2","unstructured":"Jinze Bai Shuai Bai Yunfei Chu Zeyu Cui Kai Dang Xiaodong Deng Yang Fan Wenbin Ge Yu Han Fei Huang et al. 2023. Qwen Technical Report. arXiv:2309.16609. Retrieved from https:\/\/arxiv.org\/abs\/2309.16609"},{"key":"e_1_3_2_26_2","first-page":"830","volume-title":"Proceedings of the International AAAI Conference on Web and Social Media","author":"Baumgartner Jason","year":"2020","unstructured":"Jason Baumgartner, Savvas Zannettou, Brian Keegan, Megan Squire, and Jeremy Blackburn. 2020. The pushshift reddit dataset. In Proceedings of the International AAAI Conference on Web and Social Media, Vol.14, AAAI, 830\u2013839."},{"key":"e_1_3_2_27_2","volume-title":"The Thirty-Eighth Annual Conference on Neural Information Processing Systems","author":"Beck Maximilian","year":"2024","unstructured":"Maximilian Beck, Korbinian P\u00f6ppel, Markus Spanring, Andreas Auer, Oleksandra Prudnikova, Michael K. Kopp, G\u00fcnter Klambauer, Johannes Brandstetter, and Sepp Hochreiter. 2024. xLSTM: Extended long Short-Term memory. In The Thirty-Eighth Annual Conference on Neural Information Processing Systems. Retrieved from https:\/\/openreview.net\/forum?id=ARAxPPIAhq"},{"key":"e_1_3_2_28_2","unstructured":"Marco Bellagente Jonathan Tow Dakota Mahan Duy Phung Maksym Zhuravinskyi Reshinth Adithyan James Baicoianu Ben Brooks Nathan Cooper Ashish Datta et al. 2024. Stable LM 2 1.6B Technical Report. arXiv:2402.17834. 
Retrieved from https:\/\/arxiv.org\/abs\/2402.17834"},{"key":"e_1_3_2_29_2","unstructured":"Loubna Ben Allal Anton Lozhkov Guilherme Penedo Thomas Wolf and Leandro von Werra. 2024. SmolLM-Corpus. Retrieved from https:\/\/huggingface.co\/datasets\/HuggingFaceTB\/smollm-corpus"},{"key":"e_1_3_2_30_2","unstructured":"Benjamin Bergner Andrii Skliar Amelie Royer Tijmen Blankevoort Yuki Asano and Babak Ehteshami Bejnordi. 2024. Think big generate quick: LLM-to-SLM for fast autoregressive decoding. arXiv:2402.16844. Retrieved from https:\/\/arxiv.org\/abs\/2402.16844"},{"key":"e_1_3_2_31_2","doi-asserted-by":"crossref","unstructured":"Milan Bhan Jean-Noel Vittaut Nicolas Chesneau and Marie-Jeanne Lesot. 2024. Self-AMPLIFY: Improving small language models with self post Hoc explanations. arXiv:2402.12038. Retrieved from https:\/\/arxiv.org\/abs\/2402.12038","DOI":"10.18653\/v1\/2024.emnlp-main.615"},{"key":"e_1_3_2_32_2","unstructured":"Zhen Bi Ningyu Zhang Yida Xue Yixin Ou Daxiong Ji Guozhou Zheng and Huajun Chen. 2023. OceanGPT: A large language model for ocean science tasks. arXiv:2310.02031. Retrieved from https:\/\/arxiv.org\/abs\/2310.02031"},{"key":"e_1_3_2_33_2","unstructured":"Stella Biderman Hailey Schoelkopf Quentin Anthony Herbie Bradley Kyle O\u2019Brien Eric Hallahan Mohammad Aflah Khan Shivanshu Purohit Usvsn Sai Prashanth Edward Raff Aviya Skowron Lintang Sutawika and Oskar van der Wal. 2023. Pythia: A suite for analyzing large language models across training and scaling. arXiv:2304.01373. Retrieved from https:\/\/arxiv.org\/abs\/2304.01373"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i05.6239"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","unstructured":"Sid Black Leo Gao Phil Wang Connor Leahy and Stella Biderman. 2021. GPT-Neo: Large Scale Autoregressive Language Modeling with Mesh-Tensorflow. 
10.5281\/zenodo.5297715.","DOI":"10.5281\/zenodo.5297715"},{"key":"e_1_3_2_36_2","unstructured":"Elliot Bolton Abhinav Venigalla Michihiro Yasunaga David Hall Betty Xiong Tony Lee Roxana Daneshjou Jonathan Frankle Percy Liang Michael Carbin et al. 2024. BioMedLM: A 2.7 b parameter language model trained on biomedical text. arXiv:2403.18421. Retrieved from https:\/\/arxiv.org\/abs\/2403.18421"},{"key":"e_1_3_2_37_2","unstructured":"William James Bolton Rafael Poyiadzi Edward R. Morrell Gabriela van Bergen Gonzalez Bueno and Lea Goetz. 2024. RAmBLA: A framework for evaluating the reliability of LLMs as assistants in the biomedical domain. arXiv:2403.14578. Retrieved from https:\/\/arxiv.org\/abs\/2403.14578"},{"key":"e_1_3_2_38_2","doi-asserted-by":"crossref","unstructured":"Luiz Bonifacio Hugo Abonizio Marzieh Fadaee and Rodrigo Nogueira. 2022. Inpars: Data augmentation for information retrieval using large language models. arXiv:2202.05144. Retrieved from https:\/\/arxiv.org\/abs\/2202.05144","DOI":"10.1145\/3477495.3531863"},{"key":"e_1_3_2_39_2","first-page":"1877","volume-title":"Advances in Neural Information Processing Systems","author":"Brown Tom","year":"2020","unstructured":"Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D. Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020. Language models are few-shot learners. In Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin (Eds.), Vol. 33. Curran Associates, Inc., 1877\u20131901. Retrieved from https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2020\/file\/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf"},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2020.3010274"},{"key":"e_1_3_2_41_2","unstructured":"Weilin Cai Juyong Jiang Fan Wang Jing Tang Sunghun Kim and Jiayi Huang. 2025. A survey on mixture of experts. 
IEEE Transactions on Knowledge & Data Engineering 37 7 (2025) 3896\u20133915."},{"key":"e_1_3_2_42_2","unstructured":"Zheng Cai Maosong Cao Haojiong Chen Kai Chen Keyu Chen Xin Chen Xun Chen Zehui Chen Zhi Chen Pei Chu et al. 2024. InternLM2 Technical Report. arXiv:2403.17297. Retrieved from https:\/\/arxiv.org\/abs\/2403.17297"},{"key":"e_1_3_2_43_2","first-page":"2633","volume-title":"30th USENIX Security Symposium (USENIX Security \u201921)","author":"Carlini Nicholas","year":"2021","unstructured":"Nicholas Carlini, Florian Tramer, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, Tom Brown, Dawn Song, Ulfar Erlingsson, et al. 2021. Extracting training data from large language models. In 30th USENIX Security Symposium (USENIX Security \u201921). JMLR, 2633\u20132650."},{"key":"e_1_3_2_44_2","unstructured":"Samuel Carreira Tom\u00e1s Marques Jos\u00e9 Ribeiro and Carlos Grilo. 2023. Revolutionizing mobile interaction: Enabling a 3 billion parameter GPT LLM on mobile. arXiv:2310.01434. Retrieved from https:\/\/arxiv.org\/abs\/2310.01434"},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.nlp4convai-1.5"},{"key":"e_1_3_2_46_2","volume-title":"International Conference on Learning Representations","author":"Chang Wei-Cheng","unstructured":"Wei-Cheng Chang, X. Yu Felix, Yin-Wen Chang, Yiming Yang, and Sanjiv Kumar. 2020. Pre-training tasks for embedding-based large-scale retrieval. In International Conference on Learning Representations."},{"key":"e_1_3_2_47_2","unstructured":"Patrick Chao Edoardo Debenedetti Alexander Robey Maksym Andriushchenko Francesco Croce Vikash Sehwag Edgar Dobriban Nicolas Flammarion George J. Pappas Florian Tramer et al. 2024. Jailbreakbench: An open robustness benchmark for jailbreaking large language models. arXiv:2404.01318. 
Retrieved from https:\/\/arxiv.org\/abs\/2404.01318"},{"key":"e_1_3_2_48_2","doi-asserted-by":"crossref","unstructured":"Dong Chen Shuo Zhang Yueting Zhuang Siliang Tang Qidong Liu Hua Wang and Mingliang Xu. 2024. Improving large models with small models: Lower costs and better performance. arXiv:2406.15471. Retrieved from https:\/\/arxiv.org\/abs\/2406.15471","DOI":"10.2139\/ssrn.5185499"},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v38i10.29003"},{"key":"e_1_3_2_50_2","unstructured":"Hardy Chen Haoqin Tu Fali Wang Hui Liu Xianfeng Tang Xinya Du Yuyin Zhou and Cihang Xie. 2025. SFT or RL? an early investigation into training R1-like reasoning large vision-language models. arXiv:2504.11468. Retrieved from https:\/\/arxiv.org\/abs\/2504.11468"},{"key":"e_1_3_2_51_2","doi-asserted-by":"crossref","first-page":"6805","DOI":"10.18653\/v1\/2023.findings-emnlp.454","volume-title":"Findings of the Association for Computational Linguistics: EMNLP \u201923","author":"Chen Hongzhan","year":"2023","unstructured":"Hongzhan Chen, Siyue Wu, Xiaojun Quan, Rui Wang, Ming Yan, and Ji Zhang. 2023. MCC-KD: Multi-CoT consistent knowledge distillation. In Findings of the Association for Computational Linguistics: EMNLP \u201923. Association for Computational Linguistics, 6805\u20136820."},{"key":"e_1_3_2_52_2","unstructured":"Lihu Chen and Ga\u00ebl Varoquaux. 2024. What is the role of small models in the LLM era: A survey. arXiv:2409.06857. Retrieved from https:\/\/arxiv.org\/abs\/2409.06857"},{"key":"e_1_3_2_53_2","unstructured":"Mark Chen Jerry Tworek Heewoo Jun Qiming Yuan Henrique Ponde De Oliveira Pinto Jared Kaplan Harri Edwards Yuri Burda Nicholas Joseph Greg Brockman et al. 2021. Evaluating large language models trained on code. arXiv:2107.03374. Retrieved from https:\/\/arxiv.org\/abs\/2107.03374"},{"key":"e_1_3_2_54_2","unstructured":"Tianyi Chen Tianyu Ding Badal Yadav Ilya Zharkov and Luming Liang. 2023. 
Lorashear: Efficient large language model structured pruning and knowledge recovery. arXiv:2310.18356. Retrieved from https:\/\/arxiv.org\/abs\/2310.18356"},{"key":"e_1_3_2_55_2","unstructured":"Wei Chen Zhiyuan Li and Mingyuan Ma. 2024. Octopus: On-device language model for function calling of software APIs. arXiv:2404.01549. Retrieved from https:\/\/arxiv.org\/abs\/2404.01549"},{"key":"e_1_3_2_56_2","doi-asserted-by":"crossref","first-page":"11215","DOI":"10.18653\/v1\/2022.emnlp-main.770","volume-title":"Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP","author":"Chen Yangyi","year":"2022","unstructured":"Yangyi Chen, Fanchao Qi, Hongcheng Gao, Zhiyuan Liu, and Maosong Sun. 2022. Textual backdoor attacks can Be more harmful via two simple tricks. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP. Springer-Verlag, 11215\u201311221."},{"key":"e_1_3_2_57_2","unstructured":"Yangyi Chen Xingyao Wang Hao Peng and Heng Ji. 2024. A single transformer for scalable vision-language modeling. arXiv:2407.06438. Retrieved from https:\/\/arxiv.org\/abs\/2407.06438"},{"key":"e_1_3_2_58_2","unstructured":"Zeming Chen Alejandro Hern\u00e1ndez Cano Angelika Romanou Antoine Bonnet Kyle Matoba Francesco Salvi Matteo Pagliardini Simin Fan Andreas K\u00f6pf Amirkeivan Mohtashami et al. 2023. MEDITRON-70B: Scaling medical pretraining for large language models. arXiv:2311.16079. Retrieved from https:\/\/arxiv.org\/abs\/2311.16079"},{"key":"e_1_3_2_59_2","doi-asserted-by":"crossref","first-page":"3697","DOI":"10.18653\/v1\/2021.emnlp-main.300","volume-title":"Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing","author":"Chen Zhiyu","year":"2021","unstructured":"Zhiyu Chen, Wenhu Chen, Charese Smiley, Sameena Shah, Iana Borova, Dylan Langdon, Reema Moussa, Matt Beane, Ting-Hao Huang, Bryan R. Routledge, et al. 2021. 
FinQA: A dataset of numerical reasoning over financial data. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. ACM, 3697\u20133711."},{"key":"e_1_3_2_60_2","doi-asserted-by":"crossref","first-page":"6279","DOI":"10.18653\/v1\/2022.emnlp-main.421","volume-title":"Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing","author":"Chen Zhiyu","year":"2022","unstructured":"Zhiyu Chen, Shiyang Li, Charese Smiley, Zhiqiang Ma, Sameena Shah, and William Yang Wang. 2022. ConvFinQA: Exploring the chain of numerical reasoning in conversational finance question answering. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 6279\u20136292."},{"key":"e_1_3_2_61_2","doi-asserted-by":"crossref","unstructured":"Xiaoxue Cheng Junyi Li Wayne Xin Zhao Hongzhi Zhang Fuzheng Zhang Di Zhang Kun Gai and Ji-Rong Wen. 2024. Small agent can also rock! Empowering small language models as hallucination detector. arXiv:2406.11277. Retrieved from https:\/\/arxiv.org\/abs\/2406.11277","DOI":"10.18653\/v1\/2024.emnlp-main.809"},{"key":"e_1_3_2_62_2","unstructured":"Steffi Chern Zhulin Hu Yuqing Yang Ethan Chern Yuan Guo Jiahe Jin Binjie Wang and Pengfei Liu. 2024. BeHonest: Benchmarking honesty of large language models. arXiv:2406.13261. Retrieved from https:\/\/arxiv.org\/abs\/2406.13261"},{"key":"e_1_3_2_63_2","first-page":"35","volume-title":"Proceedings of the First Edition of the Workshop on the Scaling Behavior of Large Language Models (SCALE-LLM 2024)","author":"Chia Yew Ken","year":"2024","unstructured":"Yew Ken Chia, Pengfei Hong, Lidong Bing, and Soujanya Poria. 2024. InstructEval: Towards holistic evaluation of Instruction-Tuned large language models. In Proceedings of the First Edition of the Workshop on the Scaling Behavior of Large Language Models (SCALE-LLM 2024). 
Association for Computational Linguistics, 35\u201364."},{"key":"e_1_3_2_64_2","unstructured":"Wei-Lin Chiang Zhuohan Li Zi Lin Ying Sheng Zhanghao Wu Hao Zhang Lianmin Zheng Siyuan Zhuang Yonghao Zhuang Joseph E. Gonzalez Ion Stoica and Eric P. Xing. 2023. Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality. Retrieved from https:\/\/lmsys.org\/blog\/2023-03-30-vicuna\/"},{"key":"e_1_3_2_65_2","unstructured":"Yae Jee Cho Luyang Liu Zheng Xu Aldi Fahrezi and Gauri Joshi. 2024. Heterogeneous LoRA for federated fine-tuning of on-device foundation models. arXiv:2401.06432. Retrieved from https:\/\/arxiv.org\/abs\/2401.06432"},{"key":"e_1_3_2_66_2","doi-asserted-by":"publisher","DOI":"10.1145\/3477495.3531986"},{"key":"e_1_3_2_67_2","doi-asserted-by":"publisher","DOI":"10.5555\/3722577.3722647"},{"key":"e_1_3_2_68_2","unstructured":"Christopher Clark Kenton Lee Ming-Wei Chang Tom Kwiatkowski Michael Collins and Kristina Toutanova. 2019. BoolQ: Exploring the surprising difficulty of natural yes\/no questions. arXiv:1905.10044. Retrieved from https:\/\/arxiv.org\/abs\/1905.10044"},{"key":"e_1_3_2_69_2","unstructured":"Peter Clark Isaac Cowhey Oren Etzioni Tushar Khot Ashish Sabharwal Carissa Schoenick and Oyvind Tafjord. 2018. Think you have solved question answering? Try ARC the AI2 reasoning challenge. arXiv:1803.05457. Retrieved from https:\/\/arxiv.org\/abs\/1803.05457"},{"key":"e_1_3_2_70_2","unstructured":"Karl Cobbe Vineet Kosaraju Mohammad Bavarian Mark Chen Heewoo Jun Lukasz Kaiser Matthias Plappert Jerry Tworek Jacob Hilton Reiichiro Nakano et al. 2021. Training verifiers to solve math word problems. arXiv:2110.14168. Retrieved from https:\/\/arxiv.org\/abs\/2110.14168"},{"key":"e_1_3_2_71_2","unstructured":"Together Computer. 2023. RedPajama: An Open Dataset for Training Large Language Models. 
Retrieved from https:\/\/github.com\/togethercomputer\/RedPajama-Data"},{"key":"e_1_3_2_72_2","unstructured":"Mike Conover Matt Hayes Ankit Mathur Jianwei Xie Jun Wan Sam Shah Ali Ghodsi Patrick Wendell Matei Zaharia and Reynold Xin. 2023. Free Dolly: Introducing the World\u2019s First Truly Open Instruction-Tuned LLM. Retrieved from https:\/\/www.databricks.com\/blog\/2023\/04\/12\/dolly-first-open-commercially-viable-instruction-tuned-llm"},{"key":"e_1_3_2_73_2","doi-asserted-by":"crossref","unstructured":"Nick Craswell Bhaskar Mitra Emine Yilmaz and Daniel Campos. 2021. Overview of the TREC 2020 deep learning track. arXiv:2102.07662. Retrieved from https:\/\/arxiv.org\/abs\/2102.07662","DOI":"10.6028\/NIST.SP.1266.deep-overview"},{"key":"e_1_3_2_74_2","unstructured":"Justin Cui Wei-Lin Chiang Ion Stoica and Cho-Jui Hsieh. 2024. OR-bench: An over-refusal benchmark for large language models. arXiv:2405.20947. Retrieved from https:\/\/arxiv.org\/abs\/2405.20947"},{"key":"e_1_3_2_75_2","unstructured":"Shiyao Cui Zhenyu Zhang Yilong Chen Wenyuan Zhang Tianyun Liu Siqi Wang and Tingwen Liu. 2023. Fft: Towards harmlessness evaluation and analysis for LLMS with factuality fairness toxicity. arXiv:2311.18580. Retrieved from https:\/\/arxiv.org\/abs\/2311.18580"},{"key":"e_1_3_2_76_2","unstructured":"Luigi Daniele and Suphava Deeprasit. 2023. Amplify-Instruct: Synthetically generated diverse multi-turn conversations for efficient LLM training. arXiv Preprint. Retrieved from https:\/\/huggingface.co\/datasets\/LDJnr\/Capybara"},{"key":"e_1_3_2_77_2","volume-title":"The Twelfth International Conference on Learning Representations","author":"Dao Tri","unstructured":"Tri Dao. 2024. FlashAttention-2: Faster attention with better parallelism and work partitioning. 
In The Twelfth International Conference on Learning Representations."},{"key":"e_1_3_2_78_2","first-page":"16344","article-title":"Flashattention: Fast and memory-efficient exact attention with io-awareness","volume":"35","author":"Dao Tri","year":"2022","unstructured":"Tri Dao, Dan Fu, Stefano Ermon, Atri Rudra, and Christopher R\u00e9. 2022. Flashattention: Fast and memory-efficient exact attention with io-awareness. Adv. Neural Inf. Process. Syst. 35, (2022), 16344\u201316359.","journal-title":"Adv. Neural Inf. Process. Syst"},{"key":"e_1_3_2_79_2","unstructured":"Tri Dao and Albert Gu. 2024. Transformers are SSMs: Generalized models and efficient algorithms through structured state space duality. arXiv:2405.21060. Retrieved from https:\/\/arxiv.org\/abs\/2405.21060"},{"key":"e_1_3_2_80_2","unstructured":"Rocktim Jyoti Das Liqun Ma and Zhiqiang Shen. 2023. Beyond size: How gradients shape pruning decisions in large language models. arXiv:2311.04902. Retrieved from https:\/\/arxiv.org\/abs\/2311.04902"},{"key":"e_1_3_2_81_2","doi-asserted-by":"publisher","DOI":"10.1145\/2020408.2020578"},{"key":"e_1_3_2_82_2","first-page":"1693","volume-title":"Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics","author":"Delobelle Pieter","year":"2022","unstructured":"Pieter Delobelle, Ewoenam Kwaku Tokpo, Toon Calders, and Bettina Berendt. 2022. Measuring fairness with biased rulers: A comparative study on bias metrics for pre-trained language models. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, 1693\u20131706."},{"key":"e_1_3_2_83_2","unstructured":"Yongheng Deng Ziqing Qiao Ju Ren Yang Liu and Yaoxue Zhang. 2023. Mutual enhancement of large and small language models with cross-silo knowledge transfer. arXiv:2312.05842. 
Retrieved from https:\/\/arxiv.org\/abs\/2312.05842"},{"key":"e_1_3_2_84_2","volume-title":"Advances in Neural Information Processing Systems","author":"Dettmers Tim","year":"2022","unstructured":"Tim Dettmers, Mike Lewis, Younes Belkada, and Luke Zettlemoyer. 2022. GPT3.int8(): 8-bit Matrix Multiplication for Transformers at Scale. In Advances in Neural Information Processing Systems, Alice H. Oh, Alekh Agarwal, Danielle Belgrave, and Kyunghyun Cho (Eds.). Retrieved from https:\/\/openreview.net\/forum?id=dXiGWqBoxaD."},{"key":"e_1_3_2_85_2","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2305.14314"},{"key":"e_1_3_2_86_2","first-page":"7750","volume-title":"International Conference on Machine Learning","author":"Dettmers Tim","year":"2023","unstructured":"Tim Dettmers, and Luke Zettlemoyer. 2023. The case for 4-bit precision: K -bit inference scaling laws. In International Conference on Machine Learning. PMLR, 7750\u20137774."},{"key":"e_1_3_2_87_2","first-page":"4171","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Vol. 1 (Long and Short Papers). Association for Computational Linguistics, 4171\u20134186."},{"key":"e_1_3_2_88_2","unstructured":"Nolan Dey Gurpreet Gosal Zhiming Chen Hemant Khachane William Marshall Ribhu Pathria Marvin Tom and Joel Hestness. 2023. Cerebras-GPT: Open Compute-Optimal Language Models Trained on the Cerebras Wafer-Scale Cluster. 
CoRR abs\/2304.03208."},{"key":"e_1_3_2_89_2","doi-asserted-by":"crossref","first-page":"3275","DOI":"10.18653\/v1\/2024.findings-emnlp.187","volume-title":"Findings of the Association for Computational Linguistics: EMNLP 2024","author":"Martinez Richard Diehl","year":"2024","unstructured":"Richard Diehl Martinez, Pietro Lesci, and Paula Buttery. 2024. Tending Towards Stability: Convergence Challenges in Small Language Models. In Findings of the Association for Computational Linguistics: EMNLP 2024, Yaser Al-Onaizan, Mohit Bansal, and Yun-Nung Chen (Eds.). Association for Computational Linguistics, 3275\u20133286. https:\/\/doi.org\/10.18653\/v1\/2024.findings-emnlp.187"},{"key":"e_1_3_2_90_2","doi-asserted-by":"crossref","unstructured":"Ning Ding Yulin Chen Bokai Xu Yujia Qin Zhi Zheng Shengding Hu Zhiyuan Liu Maosong Sun and Bowen Zhou. 2023. Enhancing chat language models by scaling high-quality instructional conversations. arXiv:2305.14233. Retrieved from https:\/\/arxiv.org\/abs\/2305.14233","DOI":"10.18653\/v1\/2023.emnlp-main.183"},{"key":"e_1_3_2_91_2","unstructured":"Tinghe Ding. 2024. MobileAgent: Enhancing mobile control via human-machine interaction and SOP integration. arXiv:2401.04124. Retrieved from https:\/\/arxiv.org\/abs\/2401.04124"},{"key":"e_1_3_2_92_2","unstructured":"Ricardo Dominguez-Olmedo Moritz Hardt and Celestine Mendler-D\u00fcnner. 2023. Questioning the survey responses of large language models. arXiv:2306.07951. Retrieved from https:\/\/arxiv.org\/abs\/2306.07951"},{"key":"e_1_3_2_93_2","doi-asserted-by":"publisher","DOI":"10.1145\/3583780.3614923"},{"key":"e_1_3_2_94_2","unstructured":"Xin Dong Yonggan Fu Shizhe Diao Wonmin Byeon Zijia Chen Ameya Sunil Mahabaleshwarkar Shih-Yang Liu Matthijs Van Keirsbilck Min-Hung Chen Yoshi Suhara et al. 2024. Hymba: A hybrid-head architecture for small language models. arXiv:2411.13676. 
Retrieved from https:\/\/arxiv.org\/abs\/2411.13676"},{"key":"e_1_3_2_95_2","unstructured":"Xinrun Du Zhouliang Yu Songyang Gao Ding Pan Yuyang Cheng Ziyang Ma Ruibin Yuan Xingwei Qu Jiaheng Liu Tianyu Zheng Xinchen Luo Guorui Zhou Wenhu Chen and Ge Zhang. 2024. Chinese tiny LLM: Pretraining a Chinese-centric large language model. arXiv:2404.04167. Retrieved from https:\/\/arxiv.org\/abs\/2404.04167"},{"key":"e_1_3_2_96_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.acl-long.26"},{"key":"e_1_3_2_97_2","unstructured":"Abhimanyu Dubey Abhinav Jauhri Abhinav Pandey Abhishek Kadian Ahmad Al-Dahle Aiesha Letman Akhil Mathur Alan Schelten Amy Yang Angela Fan et al. 2024. The Llama 3 herd of models. arXiv:2407.21783. Retrieved from https:\/\/arxiv.org\/abs\/2407.21783"},{"key":"e_1_3_2_98_2","doi-asserted-by":"crossref","unstructured":"Kazuki Egashira Mark Vero Robin Staab Jingxuan He and Martin Vechev. 2024. Exploiting LLM quantization. arXiv:2405.18137. Retrieved from https:\/\/arxiv.org\/abs\/2405.18137","DOI":"10.52202\/079017-1319"},{"key":"e_1_3_2_99_2","unstructured":"Ronen Eldan and Yuanzhi Li. 2023. Tinystories: How small can language models be and still speak coherent English?. arXiv:2305.07759. Retrieved from https:\/\/arxiv.org\/abs\/2305.07759"},{"key":"e_1_3_2_100_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neunet.2017.12.012"},{"key":"e_1_3_2_101_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.emnlp-main.230"},{"key":"e_1_3_2_102_2","unstructured":"Hugging Face. 2024. SmolVLM - small yet mighty Vision Language Model. Retrieved from https:\/\/huggingface.co\/blog\/smolvlm. Accessed: November 26 2024."},{"key":"e_1_3_2_103_2","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2101.03961"},{"key":"e_1_3_2_104_2","unstructured":"Shangbin Feng Weijia Shi Yuyang Bai Vidhisha Balachandran Tianxing He and Yulia Tsvetkov. 2023. Knowledge card: Filling LLMs\u2019 knowledge gaps with plug-in specialized language models. arXiv:2305.09955. 
Retrieved from https:\/\/arxiv.org\/abs\/2305.09955"},{"key":"e_1_3_2_105_2","first-page":"10323","volume-title":"International Conference on Machine Learning","author":"Frantar Elias","year":"2023","unstructured":"Elias Frantar, and Dan Alistarh. 2023. SparseGPT: Massive language models can be accurately pruned in one-shot. In International Conference on Machine Learning. PMLR, 10323\u201310337."},{"key":"e_1_3_2_106_2","volume-title":"The Eleventh International Conference on Learning Representations","author":"Frantar Elias","year":"2023","unstructured":"Elias Frantar, Saleh Ashkboos, Torsten Hoefler, and Dan Alistarh. 2023. GPTQ: Accurate post-training quantization for generative pre-trained transformers. In The Eleventh International Conference on Learning Representations."},{"key":"e_1_3_2_107_2","unstructured":"Yao Fu Hao Peng and Tushar Khot. 2022. How does GPT Obtain its Ability? Tracing Emergent Abilities of Language Models to their Sources. Yao Fu\u2019s Notion (Dec. 2022) Retrieved from https:\/\/yaofu.notion.site\/How-does-GPT-Obtain-its-Ability-Tracing-Emergent-Abilities-of-Language-Models-to-their-Sources-b9a57ac0fcf74f30a1ab9e3e36fa1dc1"},{"key":"e_1_3_2_108_2","first-page":"10421","volume-title":"International Conference on Machine Learning","author":"Fu Yao","unstructured":"Yao Fu, Hao Peng, Litu Ou, Ashish Sabharwal, and Tushar Khot. 2023. Specializing smaller language models towards multi-step reasoning. In International Conference on Machine Learning. PMLR, 10421\u201310430."},{"key":"e_1_3_2_109_2","doi-asserted-by":"publisher","DOI":"10.1145\/3594871"},{"key":"e_1_3_2_110_2","unstructured":"Leo Gao Stella Biderman Sid Black Laurence Golding Travis Hoppe Charles Foster Jason Phang Horace He Anish Thite Noa Nabeshima et al. 2020. The Pile: An 800GB dataset of diverse text for language modeling. arXiv:2101.00027. 
Retrieved from https:\/\/arxiv.org\/abs\/2101.00027"},{"key":"e_1_3_2_111_2","first-page":"2843","volume-title":"Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics","author":"Gao Luyu","year":"2022","unstructured":"Luyu Gao, and Jamie Callan. 2022. Unsupervised corpus aware language model pre-training for dense passage retrieval. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, Vol. 1 Long Papers, Association for Computational Linguistics, 2843\u20132853."},{"key":"e_1_3_2_112_2","doi-asserted-by":"publisher","unstructured":"Leo Gao Jonathan Tow Baber Abbasi Stella Biderman Sid Black Anthony DiPofi Charles Foster Laurence Golding Jeffrey Hsu Alain Le Noac\u2019h et al. 2024. The Language Model Evaluation Harness. DOI: 10.5281\/zenodo.12608602","DOI":"10.5281\/zenodo.12608602"},{"key":"e_1_3_2_113_2","volume-title":"The Thirty-Eighth Annual Conference on Neural Information Processing Systems","author":"Gao Shangqian","year":"2024","unstructured":"Shangqian Gao, Chi-Heng Lin, Ting Hua, Zheng Tang, Yilin Shen, Hongxia Jin, and Yen-Chang Hsu. 2024. DISP-LLM: Dimension-Independent structural pruning for large language models. In The Thirty-Eighth Annual Conference on Neural Information Processing Systems. Retrieved from https:\/\/openreview.net\/forum?id=YxaY6tHgg0"},{"key":"e_1_3_2_114_2","volume-title":"The Twelfth International Conference on Learning Representations","author":"Ge Suyu","year":"2024","unstructured":"Suyu Ge, Yunan Zhang, Liyuan Liu, Minjia Zhang, Jiawei Han, and Jianfeng Gao. 2024. Model tells you what to discard: Adaptive KV cache compression for LLMs. In The Twelfth International Conference on Learning Representations. Retrieved from https:\/\/openreview.net\/forum?id=uNrFpDPMyo"},{"key":"e_1_3_2_115_2","doi-asserted-by":"crossref","unstructured":"Alex Gichamba Tewodros Kederalah Idris Brian Ebiyau Eric Nyberg and Teruko Mitamura. 2024. 
ColBERT retrieval and ensemble response scoring for language model question answering. arXiv:2408.10808. Retrieved from https:\/\/arxiv.org\/abs\/2408.10808","DOI":"10.1109\/GCWkshp64532.2024.11100297"},{"key":"e_1_3_2_116_2","unstructured":"Karan Goel. 2024. The OnDevice Intelligence Update. Retrieved from https:\/\/www.cartesia.ai\/blog\/on-device"},{"key":"e_1_3_2_117_2","unstructured":"Aaron Gokaslan Vanya Cohen Ellie Pavlick and Stefanie Tellex. 2019. Openwebtext corpus."},{"key":"e_1_3_2_118_2","doi-asserted-by":"publisher","DOI":"10.1145\/3593042"},{"key":"e_1_3_2_119_2","unstructured":"Dirk Groeneveld Iz Beltagy Pete Walsh Akshita Bhagia Rodney Kinney Oyvind Tafjord Ananya Harsh Jha Hamish Ivison Ian Magnusson Yizhong Wang et al. 2024. OLMo: Accelerating the science of language models. arXiv:2402.00838. Retrieved from https:\/\/arxiv.org\/abs\/2402.00838"},{"key":"e_1_3_2_120_2","unstructured":"Albert Gu and Tri Dao. 2023. Mamba: Linear-time sequence modeling with selective state spaces. arXiv:2312.00752. Retrieved from https:\/\/arxiv.org\/abs\/2312.00752"},{"key":"e_1_3_2_121_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2024.findings-acl.447"},{"key":"e_1_3_2_122_2","volume-title":"The Twelfth International Conference on Learning Representations","author":"Gu Yuxian","year":"2024","unstructured":"Yuxian Gu, Li Dong, Furu Wei, and Minlie Huang. 2024. MiniLLM: Knowledge distillation of large language models. In The Twelfth International Conference on Learning Representations."},{"key":"e_1_3_2_123_2","unstructured":"Suriya Gunasekar Yi Zhang Jyoti Aneja Caio C\u00e9sar Teodoro Mendes Allie Del Giorno Sivakanth Gopi Mojan Javaheripi Piero Kauffmann Gustavo de Rosa Olli Saarikivi Adil Salim Shital Shah Harkirat Singh Behl Xin Wang S\u00e9bastien Bubeck Ronen Eldan Adam Tauman Kalai Yin Tat Lee and Yuanzhi Li. 2023. Textbooks are all you need. arXiv:2306.11644. 
Retrieved from https:\/\/arxiv.org\/abs\/2306.11644"},{"key":"e_1_3_2_124_2","unstructured":"Daya Guo Qihao Zhu Dejian Yang Zhenda Xie Kai Dong Wentao Zhang Guanting Chen Xiao Bi Yu Wu Y. K. Li et al. 2024. DeepSeek-Coder: When the Large Language Model Meets Programming\u2013The Rise of Code Intelligence. arXiv preprint arXiv:2401.14196."},{"key":"e_1_3_2_125_2","volume-title":"Forty-First International Conference on Machine Learning","author":"Guo Jinyang","year":"2024","unstructured":"Jinyang Guo, Jianyu Wu, Zining Wang, Jiaheng Liu, Ge Yang, Yifu Ding, Ruihao Gong, Haotong Qin, and Xianglong Liu. 2024. Compressing large language models by joint sparsification and quantization. In Forty-First International Conference on Machine Learning."},{"key":"e_1_3_2_126_2","unstructured":"Shangwei Guo Chunlong Xie Jiwei Li Lingjuan Lyu and Tianwei Zhang. 2022. Threats to pre-trained language models: Survey and taxonomy. arXiv preprint arXiv:2202.06862."},{"key":"e_1_3_2_127_2","unstructured":"Song Guo Jiahang Xu Li Lyna Zhang and Mao Yang. 2023. Compresso: Structured pruning with collaborative prompting learns compact large language models. arXiv preprint arXiv:2310.05015."},{"key":"e_1_3_2_128_2","unstructured":"Zhen Guo Peiqi Wang Yanwei Wang and Shangdi Yu. 2023. Improving small language models on PubMedQA via generative data augmentation. arXiv:2305.07804. Retrieved from https:\/\/arxiv.org\/abs\/2305.07804"},{"key":"e_1_3_2_129_2","first-page":"1135","article-title":"Learning both weights and connections for efficient neural network","volume":"28","author":"Han Song","year":"2015","unstructured":"Song Han, Jeff Pool, John Tran, and William Dally. 2015. Learning both weights and connections for efficient neural network. Adv. Neural Inf. Process. Syst., 28 (2015), 1135\u20131143.","journal-title":"Adv. Neural Inf. Process. 
Syst."},{"key":"e_1_3_2_130_2","doi-asserted-by":"publisher","DOI":"10.1145\/3662006.3662067"},{"key":"e_1_3_2_131_2","unstructured":"Tim Hartill Diana Benavides-Prado Michael Witbrock and Patricia J. Riddle. 2023. Answering unseen questions with smaller language models using rationale generation and dense retrieval. arXiv:2308.04711. Retrieved from https:\/\/arxiv.org\/abs\/2308.04711"},{"key":"e_1_3_2_132_2","unstructured":"Tim Hartill Neset Tan Michael Witbrock and Patricia J. Riddle. 2023. Teaching smaller language models to generalise to unseen compositional questions. arXiv:2308.00946. Retrieved from https:\/\/arxiv.org\/abs\/2308.00946"},{"key":"e_1_3_2_133_2","doi-asserted-by":"crossref","unstructured":"Kai He Rui Mao Qika Lin Yucheng Ruan Xiang Lan Mengling Feng and Erik Cambria. 2023. A survey of large language models for healthcare: From data technology and applications to accountability and ethics. arXiv:2310.05694. Retrieved from https:\/\/arxiv.org\/abs\/2310.05694","DOI":"10.2139\/ssrn.4809363"},{"key":"e_1_3_2_134_2","volume-title":"The Eleventh International Conference on Learning Representations","author":"He Pengcheng","year":"2023","unstructured":"Pengcheng He, Jianfeng Gao, and Weizhu Chen. 2023. DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled embedding sharing. In The Eleventh International Conference on Learning Representations. Retrieved from https:\/\/openreview.net\/forum?id=sE7-XhLxHA"},{"key":"e_1_3_2_135_2","unstructured":"Pengcheng He Xiaodong Liu Jianfeng Gao and Weizhu Chen. 2020. Deberta: Decoding-enhanced bert with disentangled attention. arXiv:2006.03654. Retrieved from https:\/\/arxiv.org\/abs\/2006.03654"},{"key":"e_1_3_2_136_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2022.109835"},{"key":"e_1_3_2_137_2","unstructured":"Dan Hendrycks Collin Burns Steven Basart Andy Zou Mantas Mazeika Dawn Song and Jacob Steinhardt. 2020. Measuring massive multitask language understanding. 
arXiv:2009.03300. Retrieved from https:\/\/arxiv.org\/abs\/2009.03300"},{"key":"e_1_3_2_138_2","volume-title":"International Conference on Learning Representations","author":"Hendrycks Dan","year":"2021","unstructured":"Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, and Jacob Steinhardt. 2021. Measuring massive multitask language understanding. In International Conference on Learning Representations. Retrieved from https:\/\/openreview.net\/forum?id=d7KBjmI3GmQ"},{"key":"e_1_3_2_139_2","unstructured":"Dan Hendrycks and Kevin Gimpel. 2016. Gaussian error linear units (gelus). arXiv:1606.08415. Retrieved from https:\/\/arxiv.org\/abs\/1606.08415"},{"key":"e_1_3_2_140_2","unstructured":"Geoffrey Hinton Oriol Vinyals and Jeff Dean. 2015. Distilling the knowledge in a neural network. arXiv:1503.02531. Retrieved from https:\/\/arxiv.org\/abs\/1503.02531"},{"key":"e_1_3_2_141_2","doi-asserted-by":"publisher","DOI":"10.5555\/2998981.2999048"},{"key":"e_1_3_2_142_2","volume-title":"Proceedings of the Forty-First International Conference on Machine Learning","author":"Hong Junyuan","year":"2024","unstructured":"Junyuan Hong, Jinhao Duan, Chenhui Zhang, Zhangheng Li, Chulin Xie, Kelsey Lieberman, James Diffenderfer, Brian R. Bartoldson, Ajay Kumar Jaiswal, Kaidi Xu, et al. 2024. Decoding compressed trust: Scrutinizing the trustworthiness of efficient LLMs under compression. In Proceedings of the Forty-First International Conference on Machine Learning, ICML. Retrieved from https:\/\/openreview.net\/forum?id=e3Dpq3WdMv"},{"key":"e_1_3_2_143_2","unstructured":"Yutong Meng Yuhao Wang Hongcheng Liu and Yusheng Liao. 2023. XieZhi: Chinese law large language model. 
Retrieved from https:\/\/github.com\/LiuHC0428\/LAW_GPT"},{"key":"e_1_3_2_144_2","first-page":"2790","volume-title":"International Conference on Machine Learning","author":"Houlsby Neil","year":"2019","unstructured":"Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin De Laroussilhe, Andrea Gesmundo, Mona Attariyan, and Sylvain Gelly. 2019. Parameter-efficient transfer learning for NLP. In International Conference on Machine Learning. PMLR, 2790\u20132799."},{"key":"e_1_3_2_145_2","doi-asserted-by":"crossref","first-page":"8003","DOI":"10.18653\/v1\/2023.findings-acl.507","volume-title":"Findings of the Association for Computational Linguistics: ACL \u201923","author":"Hsieh Cheng-Yu","year":"2023","unstructured":"Cheng-Yu Hsieh, Chun-Liang Li, Chih-Kuan Yeh, Hootan Nakhost, Yasuhisa Fujii, Alex Ratner, Ranjay Krishna, Chen-Yu Lee, and Tomas Pfister. 2023. Distilling step-by-step! outperforming larger language models with less training data and smaller model sizes. In Findings of the Association for Computational Linguistics: ACL \u201923. Association for Computational Linguistics, 8003\u20138017."},{"key":"e_1_3_2_146_2","unstructured":"Edward J. Hu Yelong Shen Phillip Wallis Zeyuan Allen-Zhu Yuanzhi Li Shean Wang Lu Wang and Weizhu Chen. 2021. Lora: Low-rank adaptation of large language models. arXiv:2106.09685. Retrieved from https:\/\/arxiv.org\/abs\/2106.09685"},{"key":"e_1_3_2_147_2","unstructured":"Shengding Hu Yuge Tu Xu Han Chaoqun He Ganqu Cui Xiang Long Zhi Zheng Yewei Fang Yuxiang Huang Weilin Zhao et al. 2024. Minicpm: Unveiling the potential of small language models with scalable training strategies. arXiv:2404.06395. Retrieved from https:\/\/arxiv.org\/abs\/2404.06395"},{"key":"e_1_3_2_148_2","unstructured":"Xing Hu Yuan Chen Dawei Yang Sifan Zhou Zhihang Yuan Jiangyong Yu and Chen Xu. 2024. I-LLM: Efficient integer-only inference for fully-quantized low-bit large language models. arXiv:2405.17849. 
Retrieved from https:\/\/arxiv.org\/abs\/2405.17849"},{"key":"e_1_3_2_149_2","volume-title":"The 2023 Conference on Empirical Methods in Natural Language Processing","author":"Huang Jiaxin","unstructured":"Jiaxin Huang, Shixiang Shane Gu, Le Hou, Yuexin Wu, Xuezhi Wang, Hongkun Yu, and Jiawei Han. 2023. Large language models can Self-Improve. In The 2023 Conference on Empirical Methods in Natural Language Processing."},{"key":"e_1_3_2_150_2","unstructured":"Lei Huang Weijiang Yu Weitao Ma Weihong Zhong Zhangyin Feng Haotian Wang Qianglong Chen Weihua Peng Xiaocheng Feng Bing Qin et al. 2023. A survey on hallucination in large language models: Principles taxonomy challenges and open questions. arXiv:2311.05232. Retrieved from https:\/\/arxiv.org\/abs\/2311.05232"},{"key":"e_1_3_2_151_2","unstructured":"Wei Huang Yangdong Liu Haotong Qin Ying Li Shiming Zhang Xianglong Liu Michele Magno and Xiaojuan Qi. 2024. Billm: Pushing the limit of post-training quantization for LLMS. arXiv:2402.04291. Retrieved from https:\/\/arxiv.org\/abs\/2402.04291"},{"key":"e_1_3_2_152_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2024.findings-emnlp.927"},{"key":"e_1_3_2_153_2","first-page":"62991","article-title":"C-eval: A multi-level multi-discipline chinese evaluation suite for foundation models","volume":"36","author":"Huang Yuzhen","year":"2024","unstructured":"Yuzhen Huang, Yuzhuo Bai, Zhihao Zhu, Junlei Zhang, Jinghan Zhang, Tangjun Su, Junteng Liu, Chuancheng Lv, Yikai Zhang, Yao Fu, et al. 2024. C-eval: A multi-level multi-discipline chinese evaluation suite for foundation models. Adv. Neural Inf. Process. Syst. 36 (2024), 62991\u201363010.","journal-title":"Adv. Neural Inf. Process. Syst"},{"key":"e_1_3_2_154_2","unstructured":"Yangsibo Huang Samyak Gupta Mengzhou Xia Kai Li and Danqi Chen. 2023. Catastrophic jailbreak of open-source LLMs via exploiting generation. arXiv:2310.06987. 
Retrieved from https:\/\/arxiv.org\/abs\/2310.06987"},{"key":"e_1_3_2_155_2","unstructured":"Yuheng Huang Jiayang Song Zhijie Wang Shengming Zhao Huaming Chen Felix Juefei-Xu and Lei Ma. 2023. Look before you leap: An exploratory study of uncertainty measurement for large language models. arXiv:2307.10236. Retrieved from https:\/\/arxiv.org\/abs\/2307.10236"},{"key":"e_1_3_2_156_2","volume-title":"International Conference on Learning Representations","author":"Humeau Samuel","unstructured":"Samuel Humeau, Kurt Shuster, Marie-Anne Lachaux, and Jason Weston. 2020. Poly-encoders: Architectures and pre-training strategies for fast and accurate multi-sentence scoring. In International Conference on Learning Representations."},{"key":"e_1_3_2_157_2","unstructured":"Hakan Inan Kartikeya Upasani Jianfeng Chi Rashi Rungta Krithika Iyer Yuning Mao Michael Tontchev Qing Hu Brian Fuller Davide Testuggine et al. 2023. Llama guard: LLM-based input-output safeguard for human-AI conversations. arXiv:2312.06674. Retrieved from https:\/\/arxiv.org\/abs\/2312.06674"},{"key":"e_1_3_2_158_2","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1991.3.1.79"},{"key":"e_1_3_2_159_2","volume-title":"Microsoft Research Blog","author":"Javaheripi Mojan","year":"2023","unstructured":"Mojan Javaheripi, S\u00e9bastien Bubeck, Marah Abdin, Jyoti Aneja, Caio C\u00e9sar Teodoro Mendes, Weizhu Chen, Allie Del Giorno, Ronen Eldan, Sivakanth Gopi, et al. 2023. Phi-2: The surprising power of small language models. In Microsoft Research Blog."},{"key":"e_1_3_2_160_2","doi-asserted-by":"crossref","first-page":"15459","DOI":"10.18653\/v1\/2023.findings-emnlp.1033","volume-title":"Findings of the Association for Computational Linguistics: EMNLP \u201923","author":"Jeong Soyeong","year":"2023","unstructured":"Soyeong Jeong, Jinheon Baek, Sukmin Cho, Sung Hwang, and Jong C. Park. 2023. Test-Time Self-Adaptive small language models for question answering. 
In Findings of the Association for Computational Linguistics: EMNLP \u201923. Association for Computational Linguistics, 15459\u201315469."},{"key":"e_1_3_2_161_2","unstructured":"Ananya Harsh Jha Tom Sherborne Evan Pete Walsh Dirk Groeneveld Emma Strubell and Iz Beltagy. 2024. Just CHOP: Embarrassingly simple LLM compression. arXiv:2305.14864. Retrieved from https:\/\/arxiv.org\/abs\/2305.14864"},{"key":"e_1_3_2_162_2","unstructured":"Yixin Ji Yang Xiang Juntao Li Wei Chen Zhongyi Liu Kehai Chen and Min Zhang. 2024. Feature-based low-rank compression of large language models via Bayesian optimization. arXiv:2405.10616. Retrieved from https:\/\/arxiv.org\/abs\/2405.10616"},{"key":"e_1_3_2_163_2","first-page":"1827","volume-title":"Findings of the Association for Computational Linguistics: EMNLP \u201923","author":"Ji Ziwei","year":"2023","unstructured":"Ziwei Ji, Tiezheng Yu, Yan Xu, Nayeon Lee, Etsuko Ishii, and Pascale Fung. 2023. Towards mitigating LLM hallucination via self reflection. In Findings of the Association for Computational Linguistics: EMNLP \u201923. Association for Computational Linguistics, 1827\u20131843."},{"key":"e_1_3_2_164_2","unstructured":"Albert Q. Jiang Alexandre Sablayrolles Arthur Mensch Chris Bamford Devendra Singh Chaplot Diego de las Casas Florian Bressand Gianna Lengyel Guillaume Lample Lucile Saulnier et al. 2023. Mistral 7B. arXiv:2310.06825. Retrieved from https:\/\/arxiv.org\/abs\/2310.06825"},{"key":"e_1_3_2_165_2","unstructured":"Albert Q. Jiang Alexandre Sablayrolles Antoine Roux Arthur Mensch Blanche Savary Chris Bamford Devendra Singh Chaplot Diego de las Casas Emma Bou Hanna Florian Bressand et al. 2024. Mixtral of experts. arXiv:2401.04088. 
Retrieved from https:\/\/arxiv.org\/abs\/2401.04088"},{"key":"e_1_3_2_166_2","volume-title":"ICLR 2024 Workshop on Mathematical and Empirical Understanding of Foundation Models","author":"Jiang Huiqiang","year":"2024","unstructured":"Huiqiang Jiang, Qianhui Wu, Xufang Luo, Dongsheng Li, Chin-Yew Lin, Yuqing Yang, and Lili Qiu. 2024. LongLLMLingua: Accelerating and enhancing LLMs in long context scenarios via prompt compression. In ICLR 2024 Workshop on Mathematical and Empirical Understanding of Foundation Models. Retrieved from https:\/\/openreview.net\/forum?id=9YvfRrpmyw"},{"key":"e_1_3_2_167_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1259"},{"key":"e_1_3_2_168_2","doi-asserted-by":"crossref","unstructured":"Rudolph Emil Kalman. 1960. A new approach to linear filtering and prediction problems. Trans. ASME D 82 (1960) 35\u201344.","DOI":"10.1115\/1.3662552"},{"key":"e_1_3_2_169_2","unstructured":"Hao Kang Qingru Zhang Souvik Kundu Geonhwa Jeong Zaoxing Liu Tushar Krishna and Tuo Zhao. 2024. Gear: An efficient KV cache compression recipe for near-lossless generative inference of LLM. arXiv:2403.05527. Retrieved from https:\/\/arxiv.org\/abs\/2403.05527"},{"key":"e_1_3_2_170_2","unstructured":"Jared Kaplan Sam McCandlish Tom Henighan Tom B. Brown Benjamin Chess Rewon Child Scott Gray Alec Radford Jeffrey Wu and Dario Amodei. 2020. Scaling laws for neural language models. arXiv:2001.08361. Retrieved from https:\/\/arxiv.org\/abs\/2001.08361"},{"key":"e_1_3_2_171_2","unstructured":"Omar Khattab Arnav Singhvi Paridhi Maheshwari Zhiyuan Zhang Keshav Santhanam Sri Vardhamanan Saiful Haq Ashutosh Sharma Thomas T. Joshi Hanna Moazam et al. 2023. DSPy: Compiling declarative language model calls into self-improving pipelines. arXiv:2310.03714. 
Retrieved from https:\/\/arxiv.org\/abs\/2310.03714"},{"key":"e_1_3_2_172_2","first-page":"36187","article-title":"Memory-efficient fine-tuning of compressed large language models via Sub-4-bit integer quantization","volume":"36","author":"Kim Jeonghoon","year":"2023","unstructured":"Jeonghoon Kim, Jung Hyun Lee, Sungdong Kim, Joonsuk Park, Kang Min Yoo, Se Jung Kwon, and Dongsoo Lee. 2023. Memory-efficient fine-tuning of compressed large language models via Sub-4-bit integer quantization. Adv. Neural Inf. Process. Syst. 36 (2023), 36187\u201336207.","journal-title":"Adv. Neural Inf. Process. Syst"},{"key":"e_1_3_2_173_2","first-page":"42097","article-title":"Token-scaled logit distillation for ternary weight generative language models","volume":"36","author":"Kim Minsoo","year":"2023","unstructured":"Minsoo Kim, Sihwa Lee, Janghwan Lee, Sukjin Hong, Du-Seong Chang, Wonyong Sung, and Jungwook Choi. 2023. Token-scaled logit distillation for ternary weight generative language models. Adv. Neural Inf. Process. Syst. 36 (2023), 42097\u201342118.","journal-title":"Adv. Neural Inf. Process. Syst"},{"key":"e_1_3_2_174_2","unstructured":"Sehoon Kim Coleman Hooper Amir Gholami Zhen Dong Xiuyu Li Sheng Shen Michael W. Mahoney and Kurt Keutzer. 2023. Squeezellm: Dense-and-sparse quantization. arXiv:2306.07629. Retrieved from https:\/\/arxiv.org\/abs\/2306.07629"},{"key":"e_1_3_2_175_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D16-1139"},{"key":"e_1_3_2_176_2","unstructured":"Young Jin Kim Raffy Fahim and Hany Hassan Awadalla. 2023. Mixture of quantized experts (MoQE): Complementary effect of low-bit quantization and robustness. arXiv:2310.02410. Retrieved from https:\/\/arxiv.org\/abs\/2310.02410"},{"key":"e_1_3_2_177_2","unstructured":"Jongwoo Ko Sungnyun Kim Tianyi Chen and Se-Young Yun. 2024. DistiLLM: Towards streamlined distillation for large language models. arXiv:2402.03898. 
Retrieved from https:\/\/arxiv.org\/abs\/2402.03898"},{"key":"e_1_3_2_178_2","unstructured":"Denis Kocetkov Raymond Li Loubna Ben Allal Jia Li Chenghao Mou Carlos Mu\u00f1oz Ferrandis Yacine Jernite Margaret Mitchell Sean Hughes Thomas Wolf et al. 2022. The stack: 3 TB of permissively licensed source code. arXiv:2211.15533. Retrieved from https:\/\/arxiv.org\/abs\/2211.15533"},{"key":"e_1_3_2_179_2","volume-title":"Proceedings of the Eleventh International Conference on Learning Representations","author":"Kuhn Lorenz","year":"2023","unstructured":"Lorenz Kuhn, Yarin Gal, and Sebastian Farquhar. 2023. Semantic uncertainty: Linguistic invariances for uncertainty estimation in natural language generation. In Proceedings of the Eleventh International Conference on Learning Representations. Retrieved from https:\/\/openreview.net\/forum?id=VD-AYtP0dve"},{"key":"e_1_3_2_180_2","unstructured":"Divyanshu Kumar Anurakt Kumar Sahil Agarwal and Prashanth Harshangi. 2024. Fine-tuning quantization and LLMs: Navigating unintended outcomes. arXiv:2404.04392. Retrieved from https:\/\/arxiv.org\/abs\/2404.04392"},{"key":"e_1_3_2_181_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2024.emnlp-industry.99"},{"key":"e_1_3_2_182_2","doi-asserted-by":"crossref","unstructured":"Yanis Labrak Adrien Bazoge Emmanuel Morin Pierre-Antoine Gourraud Mickael Rouvier and Richard Dufour. 2024. Biomistral: A collection of open-source pretrained large language models for medical domains. arXiv:2402.10373. 
Retrieved from https:\/\/arxiv.org\/abs\/2402.10373","DOI":"10.18653\/v1\/2024.findings-acl.348"},{"key":"e_1_3_2_183_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1131"},{"key":"e_1_3_2_184_2","first-page":"31809","article-title":"The bigscience roots corpus: A 1.6 tb composite multilingual dataset","volume":"35","author":"Lauren\u00e7on Hugo","year":"2022","unstructured":"Hugo Lauren\u00e7on, Lucile Saulnier, Thomas Wang, Christopher Akiki, Albert Villanova del Moral, Teven Le Scao, Leandro Von Werra, Chenghao Mou, Eduardo Gonz\u00e1lez Ponferrada, Huu Nguyen, et al. 2022. The bigscience roots corpus: A 1.6 tb composite multilingual dataset. Adv. Neural Inf. Process. Syst. 35 (2022), 31809\u201331826.","journal-title":"Adv. Neural Inf. Process. Syst"},{"key":"e_1_3_2_185_2","unstructured":"Teven Le Scao Angela Fan Christopher Akiki Ellie Pavlick Suzana Ili\u0107 Daniel Hesslow Roman Castagn\u00e9 Alexandra Sasha Luccioni Fran\u00e7ois Yvon Matthias Gall\u00e9 et al. 2023. Bloom: A 176b-parameter open-access multilingual language model. arXiv:2211.05100. Retrieved from https:\/\/arxiv.org\/abs\/2211.05100"},{"key":"e_1_3_2_186_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2024.emnlp-main.977"},{"key":"e_1_3_2_187_2","first-page":"2835","volume-title":"Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING \u201924)","author":"Lee Jooyoung","year":"2024","unstructured":"Jooyoung Lee, Fan Yang, Thanh Tran, Qian Hu, Emre Barut, and Kai-Wei Chang. 2024. Can small language models help large language models reason better?: LM-Guided chain-of-thought. 
In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING \u201924), 2835\u20132843."},{"key":"e_1_3_2_188_2","unstructured":"Benjamin Lefaudeux Francisco Massa Diana Liskovich Wenhan Xiong Vittorio Caggiano Sean Naren Min Xu Jieru Hu Marta Tintore Susan Zhang et al. 2022. xFormers: A modular and hackable transformer modelling library. Retrieved from https:\/\/github.com\/facebookresearch\/xformers"},{"key":"e_1_3_2_189_2","unstructured":"Jimmy Lei Ba Jamie Ryan Kiros and Geoffrey E. Hinton. 2016. Layer normalization. arXiv:1607.06450. Retrieved from https:\/\/arxiv.org\/abs\/1607.06450v1"},{"key":"e_1_3_2_190_2","first-page":"3843","article-title":"Solving quantitative reasoning problems with language models","volume":"35","author":"Lewkowycz Aitor","year":"2022","unstructured":"Aitor Lewkowycz, Anders Andreassen, David Dohan, Ethan Dyer, Henryk Michalewski, Vinay Ramasesh, Ambrose Slone, Cem Anil, Imanol Schlag, Theo Gutman-Solo, et al. 2022. Solving quantitative reasoning problems with language models. Adv. Neural Inf. Process. Syst. 35, (2022), 3843\u20133857.","journal-title":"Adv. Neural Inf. Process. Syst"},{"key":"e_1_3_2_191_2","unstructured":"Chenglin Li Qianglong Chen Liangyue Li Caiyu Wang Yicheng Li Zulong Chen and Yin Zhang. 2023. Mixed distillation helps smaller language model better reasoning. arXiv:2312.10730. Retrieved from https:\/\/arxiv.org\/abs\/2312.10730"},{"key":"e_1_3_2_192_2","unstructured":"Guangyan Li Yongqiang Tang and Wensheng Zhang. 2024. LoRAP: Transformer sub-layers deserve differentiated structured compression for large language models. arXiv:2404.09695. Retrieved from https:\/\/arxiv.org\/abs\/2404.09695"},{"key":"e_1_3_2_193_2","unstructured":"Haitao Li Qingyao Ai Jia Chen Qian Dong Zhijing Wu Yiqun Liu Chong Chen and Qi Tian. 2024. BLADE: Enhancing black-box large language models with small domain-specific models. arXiv:2403.18365. 
Retrieved from https:\/\/arxiv.org\/abs\/2403.18365"},{"key":"e_1_3_2_194_2","first-page":"54","volume-title":"Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics","author":"Li Haoran","year":"2024","unstructured":"Haoran Li, Dadi Guo, Donghao Li, Wei Fan, Qi Hu, Xin Liu, Chunkit Chan, Duanyi Yao, Yuan Yao, and Yangqiu Song. 2024. PrivLM-Bench: A multi-level privacy evaluation benchmark for language models. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 54\u201373."},{"key":"e_1_3_2_195_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.emnlp-main.397"},{"key":"e_1_3_2_196_2","unstructured":"Jeffrey Li Alex Fang Georgios Smyrnis Maor Ivgi Matt Jordan Samir Gadre Hritik Bansal Etash Guha Sedrick Keh Kushal Arora et al. 2024. DataComp-LM: In search of the next generation of training sets for language models. arXiv:2406.11794. Retrieved from https:\/\/arxiv.org\/abs\/2406.11794"},{"key":"e_1_3_2_197_2","unstructured":"Luchang Li Sheng Qian Jie Lu Lunxi Yuan Rui Wang and Qin Xie. 2024. Transformer-lite: high-efficiency deployment of large language models on mobile phone GPUs. arXiv:2403.20041. Retrieved from https:\/\/arxiv.org\/abs\/2403.20041"},{"key":"e_1_3_2_198_2","unstructured":"Pingzhi Li Xiaolong Jin Yu Cheng and Tianlong Chen. 2024. Examining post-training quantization for mixture-of-experts: A benchmark. arXiv:2406.08155. Retrieved from https:\/\/arxiv.org\/abs\/2406.08155"},{"key":"e_1_3_2_199_2","unstructured":"Quan Li Tianxiang Zhao Lingwei Chen Junjie Xu and Suhang Wang. 2024. Enhancing graph neural networks with limited labeled data by actively distilling knowledge from large language models. arXiv:2407.13989. 
Retrieved from https:\/\/arxiv.org\/abs\/2407.13989"},{"key":"e_1_3_2_200_2","unstructured":"Raymond Li Loubna Ben Allal Yangtian Zi Niklas Muennighoff Denis Kocetkov Chenghao Mou Marc Marone Christopher Akiki Jia Li Jenny Chim et al. 2023. StarCoder: May the source be with you! arXiv:2305.06161. Retrieved from https:\/\/arxiv.org\/abs\/2305.06161"},{"key":"e_1_3_2_201_2","unstructured":"Shengrui Li Xueting Han and Jing Bai. 2024. Nuteprune: Efficient progressive pruning with numerous teachers for large language models. arXiv:2402.09773. Retrieved from https:\/\/arxiv.org\/abs\/2402.09773"},{"key":"e_1_3_2_202_2","unstructured":"Tianlin Li Qian Liu Tianyu Pang Chao Du Qing Guo Yang Liu and Min Lin. 2024. Purifying large language models by ensembling a small language model. arXiv:2402.14845. Retrieved from https:\/\/arxiv.org\/abs\/2402.14845"},{"key":"e_1_3_2_203_2","first-page":"12286","volume-title":"Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics","author":"Li Xiang Lisa","year":"2023","unstructured":"Xiang Lisa Li, Ari Holtzman, Daniel Fried, Percy Liang, Jason Eisner, Tatsunori B. Hashimoto, Luke Zettlemoyer, and Mike Lewis. 2023. Contrastive decoding: Open-ended text generation as optimization. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, Vol. 1 Long Papers, Association for Computational Linguistics, 12286\u201312312."},{"key":"e_1_3_2_204_2","unstructured":"Xiang Lisa Li and Percy Liang. 2021. Prefix-tuning: Optimizing continuous prompts for generation. arXiv:2101.00190. Retrieved from https:\/\/arxiv.org\/abs\/2101.00190"},{"key":"e_1_3_2_205_2","unstructured":"Yuanzhi Li S\u00e9bastien Bubeck Ronen Eldan Allie Del Giorno Suriya Gunasekar and Yin Tat Lee. 2023. Textbooks are all you need II: Phi-1.5 Technical Report. arXiv:2309.05463. 
Retrieved from https:\/\/arxiv.org\/abs\/2309.05463"},{"key":"e_1_3_2_206_2","unstructured":"Yun Li Lin Niu Xipeng Zhang Kai Liu Jianchen Zhu and Zhanhui Kang. 2023. E-Sparse: Boosting the large language model inference through entropy-based N:M sparsity. arXiv:2310.15929. Retrieved from https:\/\/arxiv.org\/abs\/2310.15929"},{"key":"e_1_3_2_207_2","first-page":"374","volume-title":"Proceedings of the Fourth ACM International Conference on AI in Finance","author":"Li Yinheng","year":"2023","unstructured":"Yinheng Li, Shaofei Wang, Han Ding, and Hang Chen. 2023. Large language models in finance: A survey. In Proceedings of the Fourth ACM International Conference on AI in Finance, 374\u2013382."},{"key":"e_1_3_2_208_2","unstructured":"Wing Lian Guan Wang Bleys Goodson Eugene Pentland Austin Cook Chanvichet Vong and Teknium. 2023. SlimOrca: An Open Dataset of GPT-4 Augmented FLAN Reasoning Traces with Verification. Retrieved from https:\/\/huggingface.co\/Open-Orca\/SlimOrca"},{"key":"e_1_3_2_209_2","doi-asserted-by":"crossref","first-page":"14133","DOI":"10.18653\/v1\/2024.findings-acl.840","volume-title":"Findings of the Association for Computational Linguistics ACL \u201924","author":"Liang Jinggui","year":"2024","unstructured":"Jinggui Liang, Lizi Liao, Hao Fei, and Jing Jiang. 2024. Synergizing large language models and Pre-Trained smaller models for conversational intent discovery. In Findings of the Association for Computational Linguistics ACL \u201924. Association for Computational Linguistics, 14133\u201314147."},{"key":"e_1_3_2_210_2","unstructured":"Percy Liang Rishi Bommasani Tony Lee Dimitris Tsipras Dilara Soylu Michihiro Yasunaga Yian Zhang Deepak Narayanan Yuhuai Wu Ananya Kumar et al. 2023. Holistic evaluation of language models. In TMLR \u201923. 
Retrieved from https:\/\/openreview.net\/forum?id=iO4LZibEqW"},{"key":"e_1_3_2_211_2","unstructured":"Opher Lieber Barak Lenz Hofit Bata Gal Cohen Jhonathan Osin Itay Dalmedigos Erez Safahi Shaked Meirom Yonatan Belinkov Shai Shalev-Shwartz et al. 2024. Jamba: A hybrid transformer-Mamba language model. arXiv:2403.19887. Retrieved from https:\/\/arxiv.org\/abs\/2403.19887"},{"key":"e_1_3_2_212_2","doi-asserted-by":"publisher","DOI":"10.1145\/3589334.3645467"},{"key":"e_1_3_2_213_2","first-page":"87","article-title":"AWQ: Activation-aware weight quantization for on-device LLM compression and acceleration","volume":"6","author":"Lin Ji","year":"2024","unstructured":"Ji Lin, Jiaming Tang, Haotian Tang, Shang Yang, Wei-Ming Chen, Wei-Chen Wang, Guangxuan Xiao, Xingyu Dang, Chuang Gan, and Song Han. 2024. AWQ: Activation-aware weight quantization for on-device LLM compression and acceleration. Proc. Mach. Learn. Syst. 6 (2024), 87\u2013100.","journal-title":"Proc. Mach. Learn. Syst"},{"key":"e_1_3_2_214_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.acl-long.229"},{"key":"e_1_3_2_215_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.emnlp-main.616"},{"key":"e_1_3_2_216_2","unstructured":"Zhenghao Lin Zhibin Gou Yeyun Gong Xiao Liu Yelong Shen Ruochen Xu Chen Lin Yujiu Yang Jian Jiao Nan Duan et al. 2024. Rho-1: Not all tokens are what you need. arXiv:2404.07965. Retrieved from https:\/\/arxiv.org\/abs\/2404.07965"},{"key":"e_1_3_2_217_2","unstructured":"Aixin Liu Bei Feng Bin Wang Bingxuan Wang Bo Liu Chenggang Zhao Chengqi Dengr Chong Ruan Damai Dai Daya Guo et al. 2024. DeepSeek-v2: A strong economical and efficient mixture-of-experts language model. arXiv:2405.04434. Retrieved from https:\/\/arxiv.org\/abs\/2405.04434"},{"key":"e_1_3_2_218_2","unstructured":"Alisa Liu Xiaochuang Han Yizhong Wang Yulia Tsvetkov Yejin Choi and Noah A. Smith. 2024. Tuning language models by proxy. arXiv:2401.08565. 
Retrieved from https:\/\/arxiv.org\/abs\/2401.08565"},{"key":"e_1_3_2_219_2","unstructured":"Jiashuo Liu Zheyan Shen Yue He Xingxuan Zhang Renzhe Xu Han Yu and Peng Cui. 2021. Towards out-of-distribution generalization: A survey. arXiv:2108.13624. Retrieved from https:\/\/arxiv.org\/abs\/2108.13624"},{"key":"e_1_3_2_220_2","doi-asserted-by":"publisher","DOI":"10.1145\/3616855.3635845"},{"key":"e_1_3_2_221_2","first-page":"388","volume-title":"Proceedings of the 2024 on Innovation and Technology in Computer Science Education","author":"Liu Suqing","year":"2024","unstructured":"Suqing Liu, Zezhu Yu, Feiran Huang, Yousef Bulbulia, Andreas Bergen, and Michael Liut. 2024. Can small language models with retrieval-augmented generation replace large language models when learning computer science? In Proceedings of the 2024 on Innovation and Technology in Computer Science Education, Vol. 1, ACM, 388\u2013393."},{"key":"e_1_3_2_222_2","unstructured":"Wei Liu Weihao Zeng Keqing He Yong Jiang and Junxian He. 2023. What makes good data for alignment? a comprehensive study of automatic data selection in instruction tuning. arXiv:2312.15685. Retrieved from https:\/\/arxiv.org\/abs\/2312.15685"},{"key":"e_1_3_2_223_2","unstructured":"Yinpeng Liu Jiawei Liu Xiang Shi Qikai Cheng and Wei Lu. 2024. Let\u2019s learn step by step: Enhancing in-context learning ability with curriculum learning. arXiv:2402.10738. Retrieved from https:\/\/arxiv.org\/abs\/2402.10738"},{"key":"e_1_3_2_224_2","unstructured":"Yinhan Liu Myle Ott Naman Goyal Jingfei Du Mandar Joshi Danqi Chen Omer Levy Mike Lewis Luke Zettlemoyer and Veselin Stoyanov. 2019. Roberta: A robustly optimized BERT pretraining approach. arXiv:1907.11692. Retrieved from https:\/\/arxiv.org\/abs\/1907.11692"},{"key":"e_1_3_2_225_2","unstructured":"Yanming Liu Xinyue Peng Xuhong Zhang Weihao Liu Jianwei Yin Jiannan Cao and Tianyu Du. 2024. 
RA-ISF: Learning to answer and understand from retrieval augmentation via iterative self-feedback. arXiv:2403.06840. Retrieved from https:\/\/arxiv.org\/abs\/2403.06840"},{"key":"e_1_3_2_226_2","first-page":"52342","article-title":"Scissorhands: Exploiting the persistence of importance hypothesis for LLM KV cache compression at test time","volume":"36","author":"Liu Zichang","year":"2023","unstructured":"Zichang Liu, Aditya Desai, Fangshuo Liao, Weitao Wang, Victor Xie, Zhaozhuo Xu, Anastasios Kyrillidis, and Anshumali Shrivastava. 2023. Scissorhands: Exploiting the persistence of importance hypothesis for LLM KV cache compression at test time. Adv. Neural Inf. Process. Syst. 36 (2023), 52342\u201352364.","journal-title":"Adv. Neural Inf. Process. Syst"},{"key":"e_1_3_2_227_2","unstructured":"Zechun Liu Barlas Oguz Changsheng Zhao Ernie Chang Pierre Stock Yashar Mehdad Yangyang Shi Raghuraman Krishnamoorthi and Vikas Chandra. 2023. LLM-QAT: Data-free quantization aware training for large language models. arXiv:2305.17888. Retrieved from https:\/\/arxiv.org\/abs\/2305.17888"},{"key":"e_1_3_2_228_2","unstructured":"Zechun Liu Changsheng Zhao Forrest Iandola Chen Lai Yuandong Tian Igor Fedorov Yunyang Xiong Ernie Chang Yangyang Shi Raghuraman Krishnamoorthi et al. 2024. Mobilellm: Optimizing sub-billion parameter language models for on-device use cases. arXiv:2402.14905. Retrieved from https:\/\/arxiv.org\/abs\/2402.14905"},{"key":"e_1_3_2_229_2","doi-asserted-by":"crossref","first-page":"11065","DOI":"10.18653\/v1\/2024.findings-acl.658","volume-title":"Findings of the Association for Computational Linguistics ACL \u201924","author":"Long Lin","year":"2024","unstructured":"Lin Long, Rui Wang, Ruixuan Xiao, Junbo Zhao, Xiao Ding, Gang Chen, and Haobo Wang. 2024. On LLMs-Driven synthetic data generation, curation, and evaluation: A survey. In Findings of the Association for Computational Linguistics ACL \u201924. 
Association for Computational Linguistics, 11065\u201311082."},{"key":"e_1_3_2_230_2","first-page":"22631","volume-title":"International Conference on Machine Learning","author":"Longpre Shayne","year":"2023","unstructured":"Shayne Longpre, Le Hou, Tu Vu, Albert Webson, Hyung Won Chung, Yi Tay, Denny Zhou, Quoc V. Le, Barret Zoph, Jason Wei, et al. 2023. The flan collection: Designing data and methods for effective instruction tuning. In International Conference on Machine Learning. PMLR, 22631\u201322648."},{"key":"e_1_3_2_231_2","unstructured":"Shayne Longpre Robert Mahari Anthony Chen Naana Obeng-Marnu Damien Sileo William Brannon Niklas Muennighoff Nathan Khazam Jad Kabbara Kartik Perisetla et al. 2023. The data provenance initiative: A large scale audit of dataset licensing & attribution in AI. arXiv:2310.16787. Retrieved from https:\/\/arxiv.org\/abs\/2310.16787"},{"key":"e_1_3_2_232_2","unstructured":"Anton Lozhkov Raymond Li Loubna Ben Allal Federico Cassano Joel Lamy-Poirier Nouamane Tazi Ao Tang Dmytro Pykhtar Jiawei Liu Yuxiang Wei et al. 2024. StarCoder 2 and the stack v2: The next generation. arXiv:2402.19173. Retrieved from https:\/\/arxiv.org\/abs\/2402.19173"},{"key":"e_1_3_2_233_2","doi-asserted-by":"publisher","DOI":"10.1145\/3340531.3412747"},{"key":"e_1_3_2_234_2","unstructured":"Zhenyan Lu Xiang Li Dongqi Cai Rongjie Yi Fangming Liu Xiwen Zhang Nicholas D. Lane and Mengwei Xu. 2024. Small language models: Survey measurements and insights. arXiv:2409.15790. Retrieved from https:\/\/arxiv.org\/abs\/2409.15790"},{"key":"e_1_3_2_235_2","unstructured":"Haitong Luo Xuying Meng Suhang Wang Tianxiang Zhao Fali Wang Hanyun Cao and Yujun Zhang. 2024. Enhance graph alignment for large language models. arXiv:2410.11370. 
Retrieved from https:\/\/arxiv.org\/abs\/2410.11370"},{"key":"e_1_3_2_236_2","doi-asserted-by":"publisher","DOI":"10.1093\/bib\/bbac409"},{"key":"e_1_3_2_237_2","unstructured":"Shuming Ma Hongyu Wang Lingxiao Ma Lei Wang Wenhui Wang Shaohan Huang Li Dong Ruiping Wang Jilong Xue and Furu Wei. 2024. The era of 1-bit LLMS: All large language models are in 1.58 bits. arXiv:2402.17764. Retrieved from https:\/\/arxiv.org\/abs\/2402.17764"},{"key":"e_1_3_2_238_2","first-page":"21702","article-title":"Llm-pruner: On the structural pruning of large language models","volume":"36","author":"Ma Xinyin","year":"2023","unstructured":"Xinyin Ma, Gongfan Fang, and Xinchao Wang. 2023. Llm-pruner: On the structural pruning of large language models. Adv. Neural Inf. Process. Syst. 36 (2023), 21702\u201321720.","journal-title":"Adv. Neural Inf. Process. Syst"},{"key":"e_1_3_2_239_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.emnlp-main.322"},{"key":"e_1_3_2_240_2","unstructured":"Yubo Ma Yixin Cao YongChing Hong and Aixin Sun. 2023. Large language model is not a good few-shot information extractor but a good reranker for hard samples! arXiv:2303.08559. Retrieved from https:\/\/arxiv.org\/abs\/2303.08559"},{"key":"e_1_3_2_241_2","first-page":"2394","volume-title":"2023 9th International Conference on Computer and Communications (ICCC)","author":"Ma Yuhan","year":"2023","unstructured":"Yuhan Ma, Chenyou Fan, and Haiqi Jiang. 2023. Sci-cot: Leveraging large language models for enhanced knowledge distillation in small models for scientific QA. In 2023 9th International Conference on Computer and Communications (ICCC). IEEE, 2394\u20132398."},{"key":"e_1_3_2_242_2","volume-title":"The Twelfth International Conference on Learning Representations","author":"Ma Yingwei","year":"2024","unstructured":"Yingwei Ma, Yue Liu, Yue Yu, Yuanliang Zhang, Yu Jiang, Changjian Wang, and Shanshan Li. 2024. At which training stage does code data help LLMs reasoning? 
In The Twelfth International Conference on Learning Representations. Retrieved from https:\/\/openreview.net\/forum?id=KIPJKST4gw"},{"key":"e_1_3_2_243_2","unstructured":"Ian Magnusson Akshita Bhagia Valentin Hofmann Luca Soldaini Ananya Harsh Jha Oyvind Tafjord Dustin Schwenk Evan Pete Walsh Yanai Elazar Kyle Lo et al. 2023. Paloma: A benchmark for evaluating language model fit. arXiv:2312.10523. Retrieved from https:\/\/arxiv.org\/abs\/2312.10523"},{"key":"e_1_3_2_244_2","unstructured":"Dakota Mahan Ryan Carlow Louis Castricato Nathan Cooper and Christian Laforte. [n.\u2009d.] Stable beluga models. Retrieved from https:\/\/huggingface.co\/stabilityai\/StableBeluga2"},{"key":"e_1_3_2_245_2","unstructured":"Vladimir Malinovskii Denis Mazur Ivan Ilin Denis Kuznedelev Konstantin Burlachenko Kai Yi Dan Alistarh and Peter Richtarik. 2024. PV-tuning: Beyond straight-through estimation for extreme LLM compression. arXiv:2405.14852. Retrieved from https:\/\/arxiv.org\/abs\/2405.14852"},{"key":"e_1_3_2_246_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.emnlp-main.557"},{"key":"e_1_3_2_247_2","volume-title":"Workshop on Efficient Systems for Foundation Models II (ICML \u201924)","author":"Mehta Sachin","year":"2024","unstructured":"Sachin Mehta, Mohammad Hossein Sekhavat, Qingqing Cao, Maxwell Horton, Yanzi Jin, Chenfan Sun, Seyed Iman Mirzadeh, Mahyar Najibi, Dmitry Belenko, Peter Zatloukal, et al. 2024. OpenELM: An efficient language model family with open training and inference framework. In Workshop on Efficient Systems for Foundation Models II (ICML \u201924)."},{"key":"e_1_3_2_248_2","doi-asserted-by":"crossref","unstructured":"Dheeraj Mekala Alex Nguyen and Jingbo Shang. 2024. Smaller language models are capable of selecting instruction-tuning training data for larger language models. arXiv:2402.10430. 
Retrieved from https:\/\/arxiv.org\/abs\/2402.10430","DOI":"10.18653\/v1\/2024.findings-acl.623"},{"key":"e_1_3_2_249_2","unstructured":"Xin Men Mingyu Xu Qingyu Zhang Bingning Wang Hongyu Lin Yaojie Lu Xianpei Han and Weipeng Chen. 2024. ShortGPT: Layers in large language models are more redundant than you expect. arXiv:2403.03853. Retrieved from https:\/\/arxiv.org\/abs\/2403.03853"},{"key":"e_1_3_2_250_2","unstructured":"Stephen Merity Caiming Xiong James Bradbury and Richard Socher. 2016. Pointer sentinel mixture models. arXiv:1609.07843. Retrieved from https:\/\/arxiv.org\/abs\/1609.07843"},{"key":"e_1_3_2_251_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.emnlp-main.741"},{"key":"e_1_3_2_252_2","unstructured":"Go Min-Su. 2024. Deep Learning Bible - 8. Large Language Models. WikiDocs. Retrieved from https:\/\/wikidocs.net\/237419"},{"key":"e_1_3_2_253_2","volume-title":"The Twelfth International Conference on Learning Representations","author":"Mitchell Eric","year":"2024","unstructured":"Eric Mitchell, Rafael Rafailov, Archit Sharma, Chelsea Finn, and Christopher D. Manning. 2024. An emulator for fine-tuning large language models using small language models. In The Twelfth International Conference on Learning Representations. Retrieved from https:\/\/openreview.net\/forum?id=Eo7kv0sllr"},{"key":"e_1_3_2_254_2","unstructured":"Arindam Mitra Luciano Del Corro Shweti Mahajan Andres Codas Clarisse Simoes Sahaj Agarwal Xuxi Chen Anastasia Razdaibiedina Erik Jones Kriti Aggarwal et al. 2023. Orca 2: Teaching small language models how to reason. arXiv:2311.11045. Retrieved from https:\/\/arxiv.org\/abs\/2311.11045"},{"key":"e_1_3_2_255_2","first-page":"2775","volume-title":"Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Mo Lingbo","year":"2024","unstructured":"Lingbo Mo, Boshi Wang, Muhao Chen, and Huan Sun. 2024. 
How trustworthy are Open-Source LLMs? An assessment under malicious demonstrations shows their vulnerabilities. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Vol. 1 Long Papers, Association for Computational Linguistics, 2775\u20132792."},{"key":"e_1_3_2_256_2","volume-title":"The Twelfth International Conference on Learning Representations","author":"Morris John Xavier","year":"2024","unstructured":"John Xavier Morris, Wenting Zhao, Justin T. Chiu, Vitaly Shmatikov, and Alexander M. Rush. 2024. Language model inversion. In The Twelfth International Conference on Learning Representations. Retrieved from https:\/\/openreview.net\/forum?id=t9dWHpGkPj"},{"key":"e_1_3_2_257_2","unstructured":"Maximilian Mozes Xuanli He Bennett Kleinberg and Lewis D. Griffin. 2023. Use of LLMS for illicit purposes: Threats prevention measures and vulnerabilities. arXiv:2308.12833. Retrieved from https:\/\/arxiv.org\/abs\/2308.12833"},{"key":"e_1_3_2_258_2","volume-title":"The Thirteenth International Conference on Learning Representations","author":"Muennighoff Niklas","year":"2025","unstructured":"Niklas Muennighoff, Luca Soldaini, Dirk Groeneveld, Kyle Lo, Jacob Morrison, Sewon Min, Weijia Shi, Evan Pete Walsh, Oyvind Tafjord, Nathan Lambert, et al. 2025. OLMoE: Open mixture-of-experts language models. In The Thirteenth International Conference on Learning Representations. Retrieved from https:\/\/openreview.net\/forum?id=xXTkbTBmqq"},{"key":"e_1_3_2_259_2","unstructured":"Subhabrata Mukherjee Arindam Mitra Ganesh Jawahar Sahaj Agarwal Hamid Palangi and Ahmed Awadallah. 2023. Orca: Progressive learning from complex explanation traces of GPT-4. arXiv:2306.02707. 
Retrieved from https:\/\/arxiv.org\/abs\/2306.02707"},{"key":"e_1_3_2_260_2","unstructured":"Saurav Muralidharan Sharath Turuvekere Sreenivas Raviraj Joshi Marcin Chochowski Mostofa Patwary Mohammad Shoeybi Bryan Catanzaro Jan Kautz and Pavlo Molchanov. 2024. Compact language models via pruning and knowledge distillation. arXiv:2407.14679. Retrieved from https:\/\/arxiv.org\/abs\/2407.14679"},{"key":"e_1_3_2_261_2","unstructured":"Rithesh Murthy Liangwei Yang Juntao Tan Tulika Manoj Awalgaonkar Yilun Zhou Shelby Heinecke Sachin Desai Jason Wu Ran Xu Sarah Tan Jianguo Zhang Zhiwei Liu Shirley Kokane Zuxin Liu Ming Zhu Huan Wang Caiming Xiong and Silvio Savarese. 2024. MobileAIBench: Benchmarking LLMs and LMMs for on-device use cases. arXiv:2406.10290. Retrieved from https:\/\/arxiv.org\/abs\/2406.10290"},{"key":"e_1_3_2_262_2","unstructured":"Kalyan Nakka Jimmy Dani and Nitesh Saxena. 2024. Is on-device AI broken and exploitable? Assessing the trust and ethics in small language models. arXiv:2406.05364. Retrieved from https:\/\/arxiv.org\/abs\/2406.05364"},{"key":"e_1_3_2_263_2","doi-asserted-by":"publisher","DOI":"10.1145\/3597503.3639187"},{"key":"e_1_3_2_264_2","volume-title":"Forty-First International Conference on Machine Learning","author":"Nawrot Piotr","year":"2024","unstructured":"Piotr Nawrot, Adrian \u0141a\u0144cucki, Marcin Chochowski, David Tarjan, and Edoardo Ponti. 2024. Dynamic memory compression: Retrofitting LLMs for accelerated inference. In Forty-First International Conference on Machine Learning. Retrieved from https:\/\/openreview.net\/forum?id=tDRYrAkOB7"},{"key":"e_1_3_2_265_2","first-page":"4226","volume-title":"Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING \u201924)","author":"Nguyen Thuat","year":"2024","unstructured":"Thuat Nguyen, Chien Van Nguyen, Viet Dac Lai, Hieu Man, Nghia Trung Ngo, Franck Dernoncourt, Ryan A. Rossi, and Thien Huu Nguyen. 2024. 
CulturaX: A cleaned, enormous, and multilingual dataset for large language models in 167 languages. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING \u201924). Association for Computational Linguistics, 4226\u20134237."},{"key":"e_1_3_2_266_2","doi-asserted-by":"crossref","first-page":"49","DOI":"10.18653\/v1\/2023.wiesp-1.7","volume-title":"Proceedings of the Second Workshop on Information Extraction from Scientific Publications","author":"Nguyen Tuan Dung","year":"2023","unstructured":"Tuan Dung Nguyen, Yuan-Sen Ting, Ioana Ciuca, Charles O\u2019Neill, Ze-Chang Sun, Maja Jab\u0142o\u0144ska, Sandor Kruk, Ernest Perkowski, Jack Miller, Jason Jason Jingsh Li, et al. 2023. AstroLLaMA: Towards specialized foundation models in astronomy. In Proceedings of the Second Workshop on Information Extraction from Scientific Publications. Association for Computational Linguistics, 49\u201355."},{"key":"e_1_3_2_267_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.emnlp-main.669"},{"key":"e_1_3_2_268_2","unstructured":"Rodrigo Nogueira and Kyunghyun Cho. 2019. Passage re-ranking with BERT. arXiv:1901.04085. Retrieved from https:\/\/arxiv.org\/abs\/1901.04085"},{"key":"e_1_3_2_269_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.csi.2023.103766"},{"key":"e_1_3_2_270_2","unstructured":"OpenAI. 2024. GPT-4o mini: Advancing cost-efficient intelligence. Retrieved from https:\/\/openai.com\/index\/gpt-4o-mini-advancing-cost-efficient-intelligence\/. Accessed: July 18 2024."},{"key":"e_1_3_2_271_2","unstructured":"OpenAI. 2024. Hello GPT-4o. Retrieved from https:\/\/openai.com\/index\/hello-gpt-4o\/. 
Accessed: May 13 2024."},{"key":"e_1_3_2_272_2","first-page":"27730","article-title":"Training language models to follow instructions with human feedback","volume":"35","author":"Ouyang Long","year":"2022","unstructured":"Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al. 2022. Training language models to follow instructions with human feedback. Adv. Neural Inf. Process. Syst. 35 (2022), 27730\u201327744.","journal-title":"Adv. Neural Inf. Process. Syst"},{"key":"e_1_3_2_273_2","first-page":"47124","article-title":"Propagating knowledge updates to lms through distillation","volume":"36","author":"Padmanabhan Shankar","year":"2023","unstructured":"Shankar Padmanabhan, Yasumasa Onoe, Michael Zhang, Greg Durrett, and Eunsol Choi. 2023. Propagating knowledge updates to lms through distillation. Adv. Neural Inf. Process. Syst. 36 (2023), 47124\u201347142.","journal-title":"Adv. Neural Inf. Process. Syst"},{"key":"e_1_3_2_274_2","unstructured":"Jupinder Parmar Shrimai Prabhumoye Joseph Jennings Mostofa Patwary Sandeep Subramanian Dan Su Chen Zhu Deepak Narayanan Aastha Jhunjhunwala Ayush Dattagupta et al. 2024. Nemotron-4 15B Technical Report. arXiv:2402.16819. Retrieved from https:\/\/arxiv.org\/abs\/2402.16819"},{"key":"e_1_3_2_275_2","volume-title":"The Twelfth International Conference on Learning Representations","author":"Paster Keiran","year":"2024","unstructured":"Keiran Paster, Marco Dos Santos, Zhangir Azerbayev, and Jimmy Ba. 2024. OpenWebMath: An open dataset of High-Quality mathematical web text. In The Twelfth International Conference on Learning Representations. Retrieved from https:\/\/openreview.net\/forum?id=jKHmjlpViu"},{"key":"e_1_3_2_276_2","unstructured":"Guilherme Penedo Hynek Kydl\u00ed\u010dek Loubna Ben Allal Anton Lozhkov Margaret Mitchell Colin Raffel Leandro Von Werra and Thomas Wolf. 2024. 
The fineweb datasets: Decanting the web for the finest text data at scale. arXiv:2406.17557. Retrieved from https:\/\/arxiv.org\/abs\/2406.17557"},{"key":"e_1_3_2_277_2","unstructured":"Guilherme Penedo Quentin Malartic Daniel Hesslow Ruxandra Cojocaru Alessandro Cappelli Hamza Alobeidli Baptiste Pannier Ebtesam Almazrouei and Julien Launay. 2023. The RefinedWeb dataset for Falcon LLM: Outperforming curated corpora with web data and web data only. arXiv:2306.01116. Retrieved from https:\/\/arxiv.org\/abs\/2306.01116"},{"key":"e_1_3_2_278_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.findings-emnlp.936"},{"key":"e_1_3_2_279_2","unstructured":"Baolin Peng Chunyuan Li Pengcheng He Michel Galley and Jianfeng Gao. 2023. Instruction tuning with GPT-4. arXiv:2304.03277. Retrieved from https:\/\/arxiv.org\/abs\/2304.03277"},{"key":"e_1_3_2_280_2","unstructured":"Zhiyuan Peng Xuyang Wu Qifan Wang and Yi Fang. 2023. Soft prompt tuning for augmenting dense retrieval with large language models. arXiv:2307.08303. Retrieved from https:\/\/arxiv.org\/abs\/2307.08303"},{"key":"e_1_3_2_281_2","first-page":"13387","volume-title":"Findings of ACL \u201923","author":"Perez Ethan","year":"2023","unstructured":"Ethan Perez, Sam Ringer, Kamile Lukosiute, Karina Nguyen, Edwin Chen, et al. 2023. Discovering language model behaviors with Model-Written evaluations. In Findings of ACL \u201923. Association for Computational Linguistics, 13387\u201313434."},{"key":"e_1_3_2_282_2","unstructured":"Pascal Pfeiffer Philipp Singer Yauhen Babakhin Gabor Fodor Nischay Dhankhar and Sri Satish Ambati. 2024. H2O-Danube3 Technical Report. arXiv:2407.09276. 
Retrieved from https:\/\/arxiv.org\/abs\/2407.09276"},{"key":"e_1_3_2_283_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2024.findings-emnlp.617"},{"key":"e_1_3_2_284_2","first-page":"606","article-title":"Efficiently scaling transformer inference","volume":"5","author":"Pope Reiner","year":"2023","unstructured":"Reiner Pope, Sholto Douglas, Aakanksha Chowdhery, Jacob Devlin, James Bradbury, Jonathan Heek, Kefan Xiao, Shivani Agrawal, and Jeff Dean. 2023. Efficiently scaling transformer inference. Proc. Mach. Learn. Syst. 5, 606\u2013624.","journal-title":"Proc. Mach. Learn. Syst"},{"key":"e_1_3_2_285_2","volume-title":"International Conference on Learning Representations","author":"Press Ofir","year":"2022","unstructured":"Ofir Press, Noah Smith, and Mike Lewis. 2022. Train short, test long: Attention with linear biases enables input length extrapolation. In International Conference on Learning Representations. Retrieved from https:\/\/openreview.net\/forum?id=R8sQPpGCv0"},{"key":"e_1_3_2_286_2","unstructured":"Ruiyang Qin Jun Xia Zhenge Jia Meng Jiang Ahmed Abbasi Peipei Zhou Jingtong Hu and Yiyu Shi. 2023. Enabling on-device large language model personalization with self-supervised data selection and synthesis. arXiv:2311.12275. Retrieved from https:\/\/arxiv.org\/abs\/2311.12275"},{"key":"e_1_3_2_287_2","volume-title":"The Twelfth International Conference on Learning Representations","author":"Qin Yujia","year":"2024","unstructured":"Yujia Qin, Shihao Liang, Yining Ye, Kunlun Zhu, Lan Yan, Yaxi Lu, Yankai Lin, Xin Cong, Xiangru Tang, Bill Qian, et al. 2024. ToolLLM: Facilitating large language models to master 16000+ real-world APIs. In The Twelfth International Conference on Learning Representations."},{"key":"e_1_3_2_288_2","unstructured":"Haohao Qu Liangbo Ning Rui An Wenqi Fan Tyler Derr Hui Liu Xin Xu and Qing Li. 2024. A survey of Mamba. arXiv:2408.01129. 
Retrieved from https:\/\/arxiv.org\/abs\/2408.01129"},{"key":"e_1_3_2_289_2","unstructured":"Haohao Qu Yifeng Zhang Liangbo Ning Wenqi Fan and Qing Li. 2024. SSD4Rec: A structured state space duality model for efficient sequential recommendation. arXiv:2409.01192. Retrieved from https:\/\/arxiv.org\/abs\/2409.01192"},{"issue":"8","key":"e_1_3_2_290_2","first-page":"9","article-title":"Language models are unsupervised multitask learners","volume":"1","author":"Radford Alec","year":"2019","unstructured":"Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. 2019. Language models are unsupervised multitask learners. OpenAI Blog 1, 8 (2019), 9.","journal-title":"OpenAI Blog"},{"key":"e_1_3_2_291_2","first-page":"53728","article-title":"Direct preference optimization: Your language model is secretly a reward model","volume":"36","author":"Rafailov Rafael","year":"2023","unstructured":"Rafael Rafailov, Archit Sharma, Eric Mitchell, Christopher D. Manning, Stefano Ermon, and Chelsea Finn. 2023. Direct preference optimization: Your language model is secretly a reward model. Adv. Neural Inf. Process. Syst. 36 (2023), 53728\u201353741.","journal-title":"Adv. Neural Inf. Process. Syst"},{"key":"e_1_3_2_292_2","doi-asserted-by":"publisher","DOI":"10.5555\/3455716.3455856"},{"key":"e_1_3_2_293_2","doi-asserted-by":"crossref","unstructured":"Mohammad Wali Ur Rahman Murad Mehrab Abrar Hunter Gibbons Copening Salim Hariri Sicong Shao Pratik Satam and Soheil Salehi. 2023. Quantized transformer language model implementations on edge devices. arXiv:2310.03971. 
Retrieved from https:\/\/arxiv.org\/abs\/2310.03971","DOI":"10.1109\/ICMLA58977.2023.00104"},{"key":"e_1_3_2_294_2","doi-asserted-by":"publisher","DOI":"10.1109\/SC41405.2020.00024"},{"key":"e_1_3_2_295_2","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00605"},{"key":"e_1_3_2_296_2","first-page":"15762","volume-title":"Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics","author":"Ramesh Krithika","year":"2023","unstructured":"Krithika Ramesh, Arnav Chavan, Shrey Pandit, and Sunayana Sitaram. 2023. A comparative study on the impact of model compression techniques on fairness in language models. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, Vol. 1 Long Papers, Association for Computational Linguistics, 15762\u201315782. Retrieved from https:\/\/aclanthology.org\/2023.acl-long.878"},{"key":"e_1_3_2_297_2","doi-asserted-by":"publisher","DOI":"10.1145\/1540276.1540302"},{"key":"e_1_3_2_298_2","article-title":"Androidinthewild: A large-scale dataset for android device control","volume":"36","author":"Rawles Christopher","year":"2024","unstructured":"Christopher Rawles, Alice Li, Daniel Rodriguez, Oriana Riva, and Timothy Lillicrap. 2024. Androidinthewild: A large-scale dataset for android device control. Adv. Neural Inf. Process. Syst. 36,. 2024.","journal-title":"Adv. Neural Inf. Process. Syst"},{"key":"e_1_3_2_299_2","doi-asserted-by":"publisher","DOI":"10.1561\/1500000019"},{"key":"e_1_3_2_300_2","unstructured":"Baptiste Roziere Jonas Gehring Fabian Gloeckle Sten Sootla Itai Gat Xiaoqing Ellen Tan Yossi Adi Jingyu Liu Tal Remez J\u00e9r\u00e9my Rapin et al. 2023. Code Llama: Open foundation models for code. arXiv:2308.12950. Retrieved from https:\/\/arxiv.org\/abs\/2308.12950"},{"key":"e_1_3_2_301_2","unstructured":"Caitlin Sadowski and Greg Levin. 2007. Simhash: Hash-Based Similarity Detection. 
Technical report, Google."},{"key":"e_1_3_2_302_2","doi-asserted-by":"publisher","DOI":"10.1145\/3474381"},{"key":"e_1_3_2_303_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2025.realm-1.14"},{"key":"e_1_3_2_304_2","unstructured":"Rico Sennrich Jannis Vamvas and Alireza Mohammadshahi. 2023. Mitigating hallucinations and off-target machine translation with source-contrastive and language-contrastive decoding. arXiv:2309.07098. Retrieved from https:\/\/arxiv.org\/abs\/2309.07098"},{"key":"e_1_3_2_305_2","unstructured":"Zeyang Sha and Yang Zhang. 2024. Prompt stealing attacks against large language models. arXiv:2402.12959. Retrieved from https:\/\/arxiv.org\/abs\/2402.12959"},{"key":"e_1_3_2_306_2","unstructured":"Yu Shang Yu Li Fengli Xu and Yong Li. 2024. Synergy-of-thoughts: Eliciting efficient reasoning in hybrid language models. arXiv:2402.02563. Retrieved from https:\/\/arxiv.org\/abs\/2402.02563"},{"key":"e_1_3_2_307_2","unstructured":"Yuzhang Shang Zhihang Yuan Qiang Wu and Zhen Dong. 2023. PB-LLM: Partially binarized large language models. arXiv:2310.00034. Retrieved from https:\/\/arxiv.org\/abs\/2310.00034"},{"key":"e_1_3_2_308_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP48485.2024.10445737"},{"key":"e_1_3_2_309_2","unstructured":"Mrinank Sharma Meg Tong Tomasz Korbak David Duvenaud Amanda Askell Samuel R. Bowman Newton Cheng Esin Durmus Zac Hatfield-Dodds Scott R. Johnston et al. 2023. Towards understanding sycophancy in language models. arXiv:2310.13548. Retrieved from https:\/\/arxiv.org\/abs\/2310.13548"},{"key":"e_1_3_2_310_2","unstructured":"Erfan Shayegani Md Abdullah Al Mamun Yu Fu Pedram Zaree Yue Dong and Nael Abu-Ghazaleh. 2023. Survey of vulnerabilities in large language models revealed by adversarial attacks. arXiv:2310.10844. Retrieved from https:\/\/arxiv.org\/abs\/2310.10844"},{"key":"e_1_3_2_311_2","unstructured":"Noam Shazeer. 2019. Fast transformer decoding: One write-head is all you need. arXiv:1911.02150. 
Retrieved from https:\/\/arxiv.org\/abs\/1911.02150"},{"key":"e_1_3_2_312_2","unstructured":"Noam Shazeer. 2020. Glu variants improve transformer. arXiv:2002.05202. Retrieved from https:\/\/arxiv.org\/abs\/2002.05202"},{"key":"e_1_3_2_313_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.emnlp-main.112"},{"key":"e_1_3_2_314_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2024.findings-acl.582"},{"key":"e_1_3_2_315_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2024.emnlp-main.929"},{"key":"e_1_3_2_316_2","first-page":"31094","volume-title":"International Conference on Machine Learning","author":"Sheng Ying","year":"2023","unstructured":"Ying Sheng, Lianmin Zheng, Binhang Yuan, Zhuohan Li, Max Ryabinin, Beidi Chen, Percy Liang, Christopher R\u00e9, Ion Stoica, and Ce Zhang. 2023. Flexgen: High-throughput generative inference of large language models with a single gpu. In International Conference on Machine Learning. PMLR, 31094\u201331116."},{"key":"e_1_3_2_317_2","doi-asserted-by":"publisher","DOI":"10.1145\/3626772.3657683"},{"key":"e_1_3_2_318_2","unstructured":"Mohammad Shoeybi Mostofa Patwary Raul Puri Patrick LeGresley Jared Casper and Bryan Catanzaro. 2019. Megatron-LM: Training multi-billion parameter language models using model parallelism. arXiv:1909.08053. Retrieved from https:\/\/arxiv.org\/abs\/1909.08053"},{"key":"e_1_3_2_319_2","unstructured":"Parshin Shojaee Kazem Meidani Shashank Gupta Amir Barati Farimani and Chandan K. Reddy. 2024. LLM-SR: Scientific equation discovery via programming with large language models. arXiv:2404.18400. Retrieved from https:\/\/arxiv.org\/abs\/2404.18400"},{"key":"e_1_3_2_320_2","doi-asserted-by":"publisher","DOI":"10.5555\/3600270.3601308"},{"key":"e_1_3_2_321_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41586-023-06291-2"},{"key":"e_1_3_2_322_2","unstructured":"Daria Soboleva Faisal Al-Khateeb Robert Myers Jacob R. Steeves Joel Hestness and Nolan Dey. 2023. 
SlimPajama: A 627B token cleaned and deduplicated version of RedPajama. Retrieved from https:\/\/cerebras.ai\/blog\/slimpajama-a-627b-token-cleaned-and-deduplicated-version-of-redpajama; https:\/\/huggingface.co\/datasets\/cerebras\/SlimPajama-627B"},{"key":"e_1_3_2_323_2","unstructured":"Luca Soldaini Rodney Kinney Akshita Bhagia Dustin Schwenk David Atkinson Russell Authur Ben Bogin Khyathi Chandu Jennifer Dumas Yanai Elazar et al. 2024. Dolma: An Open Corpus of Three Trillion Tokens for Language Model Pretraining Research. arXiv:2402.00159. Retrieved from https:\/\/arxiv.org\/abs\/2402.00159"},{"key":"e_1_3_2_324_2","unstructured":"Sofia Eleni Spatharioti David M. Rothschild Daniel G. Goldstein and Jake M. Hofman. 2023. Comparing traditional and LLM-based search for consumer choice: A randomized experiment. arXiv:2307.03744. Retrieved from https:\/\/arxiv.org\/abs\/2307.03744"},{"key":"e_1_3_2_325_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2023.127063"},{"key":"e_1_3_2_326_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v38i17.29872"},{"key":"e_1_3_2_327_2","unstructured":"Lichao Sun Yue Huang Haoran Wang Siyuan Wu Qihui Zhang Chujie Gao Yixin Huang Wenhan Lyu Yixuan Zhang Xiner Li et al. 2024. TrustLLM: Trustworthiness in large language models. arXiv:2401.05561. Retrieved from https:\/\/arxiv.org\/abs\/2401.05561"},{"key":"e_1_3_2_328_2","volume-title":"Proceedings of the Twelfth International Conference on Learning Representations","author":"Sun Mingjie","year":"2024","unstructured":"Mingjie Sun, Zhuang Liu, Anna Bair, and J. Zico Kolter. 2024. A simple and effective pruning approach for large language models. In Proceedings of the Twelfth International Conference on Learning Representations. ICLR."},{"key":"e_1_3_2_329_2","unstructured":"Zhiqing Sun Hongkun Yu Xiaodan Song Renjie Liu Yiming Yang and Denny Zhou. 2020. Mobilebert: A compact task-agnostic BERT for resource-limited devices. arXiv:2004.02984. 
Retrieved from https:\/\/arxiv.org\/abs\/2004.02984"},{"key":"e_1_3_2_330_2","doi-asserted-by":"crossref","first-page":"9275","DOI":"10.18653\/v1\/2020.emnlp-main.746","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"Swayamdipta Swabha","year":"2020","unstructured":"Swabha Swayamdipta, Roy Schwartz, Nicholas Lourie, Yizhong Wang, Hannaneh Hajishirzi, Noah A. Smith, and Yejin Choi. 2020. Dataset cartography: Mapping and diagnosing datasets with training dynamics. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 9275\u20139293."},{"key":"e_1_3_2_331_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N18-1059"},{"key":"e_1_3_2_332_2","unstructured":"Jiejun Tan Zhicheng Dou Yutao Zhu Peidong Guo Kun Fang and Ji-Rong Wen. 2024. Small models big insights: leveraging slim proxy models to decide when and what to retrieve for LLMs. arXiv:2402.12052. Retrieved from https:\/\/arxiv.org\/abs\/2402.12052"},{"key":"e_1_3_2_333_2","unstructured":"Qiaoyu Tang Ziliang Deng Hongyu Lin Xianpei Han Qiao Liang Boxi Cao and Le Sun. 2023. Toolalpaca: Generalized tool learning for language models with 3000 simulated cases. arXiv:2306.05301. Retrieved from https:\/\/arxiv.org\/abs\/2306.05301"},{"key":"e_1_3_2_334_2","unstructured":"Xuemei Tang Jun Wang and Qi Su. 2024. Small language model is a good guide for large language model in Chinese entity relation extraction. arXiv:2402.14373. Retrieved from https:\/\/arxiv.org\/abs\/2402.14373"},{"key":"e_1_3_2_335_2","unstructured":"Yehui Tang Fangcheng Liu Yunsheng Ni Yuchuan Tian Zheyuan Bai Yi-Qi Hu Sichao Liu Shangling Jui Kai Han and Yunhe Wang. 2024. Rethinking optimization and architecture for tiny language models. arXiv:2402.02791. 
Retrieved from https:\/\/arxiv.org\/abs\/2402.02791"},{"key":"e_1_3_2_336_2","unstructured":"Rohan Taori Ishaan Gulrajani Tianyi Zhang Yann Dubois Xuechen Li Carlos Guestrin Percy Liang and Tatsunori B. Hashimoto. 2023. Stanford Alpaca: An instruction-following LLaMA model. Retrieved from https:\/\/github.com\/tatsu-lab\/stanford_alpaca"},{"key":"e_1_3_2_337_2","unstructured":"Ross Taylor Marcin Kardas Guillem Cucurull Thomas Scialom Anthony Hartshorn Elvis Saravia Andrew Poulton Viktor Kerkez and Robert Stojnic. 2022. Galactica: A large language model for science. arXiv:2211.09085. Retrieved from https:\/\/arxiv.org\/abs\/2211.09085"},{"key":"e_1_3_2_338_2","unstructured":"CodeGemma Team. 2024. CodeGemma: Open code models based on Gemma. arXiv:2406.11409. Retrieved from https:\/\/arxiv.org\/abs\/2406.11409"},{"key":"e_1_3_2_339_2","unstructured":"Gemma Team Thomas Mesnard Cassidy Hardin Robert Dadashi Surya Bhupatiraju Shreya Pathak Laurent Sifre Morgane Rivi\u00e8re Mihir Sanjay Kale Juliette Love et al. 2024. Gemma: Open models based on gemini research and technology. arXiv:2403.08295. Retrieved from https:\/\/arxiv.org\/abs\/2403.08295"},{"key":"e_1_3_2_340_2","unstructured":"Morgane Riviere Shreya Pathak Pier Giuseppe Sessa Cassidy Hardin Surya Bhupatiraju L\u00e9onard Hussenot Thomas Mesnard Bobak Shahriari Alexandre Ram\u00e9 et al. 2024. Gemma 2: Improving open language models at a practical size. arXiv:2408.00118. Retrieved from https:\/\/arxiv.org\/abs\/2408.00118"},{"key":"e_1_3_2_341_2","unstructured":"TensorOpera Team. 2024. TensorOpera Unveils Fox Foundation Model: A Pioneering Small Language Model (SLM) for Cloud and Edge. Retrieved from https:\/\/blog.tensoropera.ai\/tensoropera-unveils-fox-foundation-model-a-pioneering-open-source-slm-leading-the-way-against-tech-giants\/. 
Accessed: June 13, 2024."},{"key":"e_1_3_2_342_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICPR.2016.7900006"},{"key":"e_1_3_2_343_2","unstructured":"Omkar Thawakar Ashmal Vayani Salman Khan Hisham Cholakal Rao M. Anwer Michael Felsberg Tim Baldwin Eric P. Xing and Fahad Shahbaz Khan. 2024. Mobillama: Towards accurate and lightweight fully transparent GPT. arXiv:2402.16840. Retrieved from https:\/\/arxiv.org\/abs\/2402.16840"},{"key":"e_1_3_2_344_2","unstructured":"Ye Tian Baolin Peng Linfeng Song Lifeng Jin Dian Yu Haitao Mi and Dong Yu. 2024. Toward self-improvement of LLMs via imagination searching and criticizing. arXiv:2404.12253. Retrieved from https:\/\/arxiv.org\/abs\/2404.12253"},{"key":"e_1_3_2_345_2","unstructured":"Hugo Touvron Thibaut Lavril Gautier Izacard Xavier Martinet Marie-Anne Lachaux Timoth\u00e9e Lacroix Baptiste Rozi\u00e8re Naman Goyal Eric Hambro Faisal Azhar et al. 2023. Llama: Open and efficient foundation language models. arXiv:2302.13971. Retrieved from https:\/\/arxiv.org\/abs\/2302.13971"},{"key":"e_1_3_2_346_2","unstructured":"Hugo Touvron Louis Martin Kevin Stone Peter Albert Amjad Almahairi Yasmine Babaei Nikolay Bashlykov Soumya Batra Prajjwal Bhargava Shruti Bhosale et al. 2023. Llama 2: Open foundation and fine-tuned chat models. arXiv:2307.09288. Retrieved from https:\/\/arxiv.org\/abs\/2307.09288"},{"key":"e_1_3_2_347_2","unstructured":"Jonathan Tow Marco Bellagente Dakota Mahan and Carlos Riquelme. 2024. StableLM 3B 4E1T. Retrieved from https:\/\/huggingface.co\/stabilityai\/stablelm-3b-4e1t"},{"key":"e_1_3_2_348_2","unstructured":"Trieu H. Trinh and Quoc V. Le. 2018. A simple method for commonsense reasoning. arXiv:1806.02847. Retrieved from https:\/\/arxiv.org\/abs\/1806.02847"},{"key":"e_1_3_2_349_2","unstructured":"Adina Trufinescu. 2024. Discover the New Multi-Lingual High-Quality Phi-3.5 SLMs. 
Retrieved from https:\/\/techcommunity.microsoft.com\/t5\/ai-azure-ai-services-blog\/discover-the-new-multi-lingual-high-quality-phi-3-5-slms\/ba-p\/4225280"},{"key":"e_1_3_2_350_2","unstructured":"Dennis Ulmer Martin Gubri Hwaran Lee Sangdoo Yun and Seong Joon Oh. 2024. Calibrating large language models using their generations only. arXiv:2403.05973. Retrieved from https:\/\/arxiv.org\/abs\/2403.05973"},{"key":"e_1_3_2_351_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41591-022-01713-6"},{"key":"e_1_3_2_352_2","unstructured":"Chien Van Nguyen Xuan Shen Ryan Aponte Yu Xia Samyadeep Basu Zhengmian Hu Jian Chen Mihir Parmar Sasidhar Kunapuli Joe Barrow et al. 2024. A survey of small language models. arXiv:2410.20011. Retrieved from https:\/\/arxiv.org\/abs\/2410.20011"},{"key":"e_1_3_2_353_2","doi-asserted-by":"publisher","DOI":"10.5555\/3295222.3295349"},{"key":"e_1_3_2_354_2","first-page":"7360","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Veksler Olga","year":"2023","unstructured":"Olga Veksler. 2023. Test time adaptation with regularized loss for weakly supervised salient object detection. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 7360\u20137369."},{"key":"e_1_3_2_355_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-57959-7"},{"key":"e_1_3_2_356_2","first-page":"238","volume-title":"International Conference on Neural Information Processing","author":"Wan Yuxian","year":"2023","unstructured":"Yuxian Wan, Wenlin Zhang, and Zhen Li. 2023. Multi-Task feature Self-Distillation for Semi-Supervised machine translation. In International Conference on Neural Information Processing. 
Springer, 238\u2013254."},{"key":"e_1_3_2_357_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W18-5446"},{"key":"e_1_3_2_358_2","volume-title":"Proceedings of the Annual Conference on Neural Information Processing Systems","author":"Wang Boxin","year":"2023","unstructured":"Boxin Wang, Weixin Chen, Hengzhi Pei, Chulin Xie, Mintong Kang, Chenhui Zhang, Chejian Xu, Zidi Xiong, Ritik Dutta, Rylan Schaeffer, et al. 2023. DecodingTrust: A comprehensive assessment of trustworthiness in GPT models. In Proceedings of the Annual Conference on Neural Information Processing Systems."},{"key":"e_1_3_2_359_2","unstructured":"Boxin Wang Chejian Xu Shuohang Wang Zhe Gan Yu Cheng Jianfeng Gao Ahmed Hassan Awadallah and Bo Li. 2021. Adversarial glue: A multi-task benchmark for robustness evaluation of language models. arXiv:2111.02840. Retrieved from https:\/\/arxiv.org\/abs\/2111.02840"},{"key":"e_1_3_2_360_2","doi-asserted-by":"publisher","DOI":"10.1145\/3711896.3736563"},{"key":"e_1_3_2_361_2","unstructured":"Fali Wang Hui Liu Zhenwei Dai Jingying Zeng Zhiwei Zhang Zongyu Wu Chen Luo Zhen Li Xianfeng Tang Qi He et al. 2025. AgentTTS: Large language model agent for test-time compute-optimal scaling strategy in complex tasks. arXiv:2508.00890. Retrieved from https:\/\/arxiv.org\/abs\/2508.00890"},{"key":"e_1_3_2_362_2","doi-asserted-by":"publisher","unstructured":"Guan Wang Sijie Cheng Qiying Yu and Changling Liu. 2023. OpenLLMs: Less is More for Open-Source Models. DOI: 10.5281\/zenodo.8105775","DOI":"10.5281\/zenodo.8105775"},{"key":"e_1_3_2_363_2","volume-title":"The Twelfth International Conference on Learning Representations","author":"Wang Guan","year":"2024","unstructured":"Guan Wang, Sijie Cheng, Xianyuan Zhan, Xiangang Li, Sen Song, and Yang Liu. 2024. OpenChat: Advancing open-source language models with Mixed-Quality data. In The Twelfth International Conference on Learning Representations. 
Retrieved from https:\/\/openreview.net\/forum?id=AOJyfhWYHf"},{"key":"e_1_3_2_364_2","unstructured":"Hongyu Wang Shuming Ma Li Dong Shaohan Huang Huaijie Wang Lingxiao Ma Fan Yang Ruiping Wang Yi Wu and Furu Wei. 2023. Bitnet: Scaling 1-bit transformers for large language models. arXiv:2310.11453. Retrieved from https:\/\/arxiv.org\/abs\/2310.11453"},{"key":"e_1_3_2_365_2","volume-title":"ICLR 2023 Workshop on Trustworthy and Reliable Large-Scale Machine Learning Models","author":"Wang Jindong","unstructured":"Jindong Wang, H. U. Xixu, Wenxin Hou, Hao Chen, Runkai Zheng, Yidong Wang, Linyi Yang, Wei Ye, Haojun Huang, Xiubo Geng, et al. n.d. On the robustness of ChatGPT: An adversarial and out-of-distribution perspective. In ICLR 2023 Workshop on Trustworthy and Reliable Large-Scale Machine Learning Models."},{"key":"e_1_3_2_366_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2024.acl-long.642"},{"key":"e_1_3_2_367_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.acl-long.304"},{"key":"e_1_3_2_368_2","unstructured":"Wenxiao Wang Wei Chen Yicong Luo Yongliu Long Zhengkai Lin Liye Zhang Binbin Lin Deng Cai and Xiaofei He. 2024. Model compression and efficient inference for large language models: A survey. arXiv:2402.09748. Retrieved from https:\/\/arxiv.org\/abs\/2402.09748"},{"key":"e_1_3_2_369_2","unstructured":"Wenhui Wang Furu Wei Li Dong Hangbo Bao Nan Yang and Ming Zhou. 2020. MiniLM: Deep self-attention distillation for task-agnostic compression of pre-trained transformers. arXiv:2002.10957. Retrieved from https:\/\/arxiv.org\/abs\/2002.10957"},{"key":"e_1_3_2_370_2","volume-title":"Forty-First International Conference on Machine Learning","author":"Wang Xiaoxuan","year":"2024","unstructured":"Xiaoxuan Wang, Ziniu Hu, Pan Lu, Yanqiao Zhu, Jieyu Zhang, Satyen Subramaniam, Arjun R. Loomba, Shichang Zhang, Yizhou Sun, and Wei Wang. 2024. SciBench: Evaluating College-Level scientific Problem-Solving abilities of large language models. 
In Forty-First International Conference on Machine Learning. Retrieved from https:\/\/openreview.net\/forum?id=bq1JEgioLr"},{"key":"e_1_3_2_371_2","first-page":"896","volume-title":"Findings of the Association for Computational Linguistics: EACL \u201924","author":"Wang Yuxia","year":"2024","unstructured":"Yuxia Wang, Haonan Li, Xudong Han, Preslav Nakov, and Timothy Baldwin. 2024. Do-Not-Answer: Evaluating Safeguards in LLMs. In Findings of the Association for Computational Linguistics: EACL \u201924. Association for Computational Linguistics, 896\u2013911. Retrieved from https:\/\/aclanthology.org\/2024.findings-eacl.61"},{"key":"e_1_3_2_372_2","doi-asserted-by":"crossref","first-page":"10303","DOI":"10.18653\/v1\/2023.findings-emnlp.691","volume-title":"Findings of the Association for Computational Linguistics: EMNLP \u201923","author":"Wang Yile","year":"2023","unstructured":"Yile Wang, Peng Li, Maosong Sun, and Yang Liu. 2023. Self-Knowledge guided retrieval augmentation for large language models. In Findings of the Association for Computational Linguistics: EMNLP \u201923. Association for Computational Linguistics, 10303\u201310315."},{"key":"e_1_3_2_373_2","unstructured":"Yubo Wang Xueguang Ma and Wenhu Chen. 2023. Augmenting black-box LLMS with medical textbooks for clinical question answering. arXiv:2309.02233. Retrieved from https:\/\/arxiv.org\/abs\/2309.02233"},{"key":"e_1_3_2_374_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.emnlp-main.340"},{"key":"e_1_3_2_375_2","doi-asserted-by":"publisher","DOI":"10.1145\/3534678.3539073"},{"key":"e_1_3_2_376_2","doi-asserted-by":"publisher","DOI":"10.1145\/3589334.3645671"},{"key":"e_1_3_2_377_2","unstructured":"Yuqing Wang and Yun Zhao. 2024. RUPBench: Benchmarking reasoning under perturbations for robustness evaluation in large language models. arXiv:2406.11020. 
Retrieved from https:\/\/arxiv.org\/abs\/2406.11020"},{"key":"e_1_3_2_378_2","volume-title":"International Conference on Learning Representations","author":"Wei Jason","year":"2022","unstructured":"Jason Wei, Maarten Bosma, Vincent Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M. Dai, and Quoc V. Le. 2022. Finetuned language models are Zero-Shot learners. In International Conference on Learning Representations. Retrieved from https:\/\/openreview.net\/forum?id=gEZrGCozdqR"},{"key":"e_1_3_2_379_2","unstructured":"Yuxiang Wei Zhe Wang Jiawei Liu Yifeng Ding and Lingming Zhang. 2023. Magicoder: Source code is all you need. arXiv:2312.02120. Retrieved from https:\/\/arxiv.org\/abs\/2312.02120"},{"key":"e_1_3_2_380_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.findings-emnlp.210"},{"key":"e_1_3_2_381_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W17-4413"},{"key":"e_1_3_2_382_2","doi-asserted-by":"crossref","unstructured":"Hao Wen Yuanchun Li Guohong Liu Shanhui Zhao Tao Yu Toby Jia-Jun Li Shiqi Jiang Yunhao Liu Yaqin Zhang and Yunxin Liu. 2024. AutoDroid: LLM-powered task automation in android. arXiv:2308.15272. Retrieved from https:\/\/arxiv.org\/abs\/2308.15272","DOI":"10.1145\/3636534.3649379"},{"key":"e_1_3_2_383_2","volume-title":"Forty-First International Conference on Machine Learning","author":"Wettig Alexander","year":"2024","unstructured":"Alexander Wettig, Aatmik Gupta, Saumya Malik, and Danqi Chen. 2024. QuRating: Selecting High-Quality data for training language models. In Forty-First International Conference on Machine Learning. 
Retrieved from https:\/\/openreview.net\/forum?id=GLGYYqPwjy"},{"key":"e_1_3_2_384_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N18-1101"},{"key":"e_1_3_2_385_2","doi-asserted-by":"publisher","DOI":"10.1145\/3404835.3463069"},{"key":"e_1_3_2_386_2","first-page":"944","volume-title":"Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Wu Minghao","year":"2024","unstructured":"Minghao Wu, Abdul Waheed, Chiyu Zhang, Muhammad Abdul-Mageed, and Alham Fikri Aji. 2024. LaMini-LM: A diverse herd of distilled models from Large-Scale instructions. In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), Yvette Graham and Matthew Purver (Eds.). Association for Computational Linguistics, 944\u2013964. Retrieved from https:\/\/aclanthology.org\/2024.eacl-long.57"},{"key":"e_1_3_2_387_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2024.findings-naacl.2"},{"key":"e_1_3_2_388_2","doi-asserted-by":"publisher","DOI":"10.1145\/3589334.3645494"},{"key":"e_1_3_2_389_2","unstructured":"Zhuofeng Wu He Bai Aonan Zhang Jiatao Gu V. G. Vydiswaran Navdeep Jaitly and Yizhe Zhang. 2024. Divide-or-conquer? Which part should you distill your LLM? arXiv:2402.15000. Retrieved from https:\/\/arxiv.org\/abs\/2402.15000"},{"key":"e_1_3_2_390_2","unstructured":"Nuwa Xi Yuhan Chen Sendong Zhao Haochun Wang Bing Qin and Ting Liu. 2024. AS-ES learning: Towards efficient CoT learning in small models. arXiv:2403.01969. Retrieved from https:\/\/arxiv.org\/abs\/2403.01969"},{"key":"e_1_3_2_391_2","volume-title":"The Twelfth International Conference on Learning Representations","author":"Xia Mengzhou","year":"2024","unstructured":"Mengzhou Xia, Tianyu Gao, Zhiyuan Zeng, and Danqi Chen. 2024. Sheared LLaMA: Accelerating language model pre-training via structured pruning. 
In The Twelfth International Conference on Learning Representations. Retrieved from https:\/\/openreview.net\/forum?id=09iOdaeOzp"},{"key":"e_1_3_2_392_2","first-page":"38087","volume-title":"International Conference on Machine Learning","author":"Xiao Guangxuan","year":"2023","unstructured":"Guangxuan Xiao, Ji Lin, Mickael Seznec, Hao Wu, Julien Demouth, and Song Han. 2023. Smoothquant: Accurate and efficient post-training quantization for large language models. In International Conference on Machine Learning. PMLR, 38087\u201338099."},{"key":"e_1_3_2_393_2","unstructured":"Tinghao Xie Xiangyu Qi Yi Zeng Yangsibo Huang Udari Madhushani Sehwag Kaixuan Huang Luxi He Boyi Wei Dacheng Li Ying Sheng et al. 2024. Sorry-bench: Systematically evaluating large language model safety refusal behaviors. arXiv:2406.14598. Retrieved from https:\/\/arxiv.org\/abs\/2406.14598"},{"key":"e_1_3_2_394_2","unstructured":"Tong Xie Yuwei Wan Wei Huang Zhenyu Yin Yixuan Liu Shaozhou Wang Qingyuan Linghu Chunyu Kit Clara Grazian Wenjie Zhang et al. 2023. Darwin series: Domain specific large language models for natural science. arXiv:2308.13565. Retrieved from https:\/\/arxiv.org\/abs\/2308.13565"},{"key":"e_1_3_2_395_2","unstructured":"Weikai Xie Li Zhang Shihe Wang Rongjie Yi and Mengwei Xu. 2024. DroidCall: A dataset for LLM-powered android intent invocation. arXiv:2412.00402. Retrieved from https:\/\/arxiv.org\/abs\/2412.00402"},{"key":"e_1_3_2_396_2","unstructured":"Xuan Xie Jiayang Song Zhehua Zhou Yuheng Huang Da Song and Lei Ma. 2024. Online safety analysis for LLMs: A benchmark an assessment and a path forward. arXiv:2404.08517. Retrieved from https:\/\/arxiv.org\/abs\/2404.08517"},{"key":"e_1_3_2_397_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.eacl-main.8"},{"key":"e_1_3_2_398_2","unstructured":"Can Xu Qingfeng Sun Kai Zheng Xiubo Geng Pu Zhao Jiazhan Feng Chongyang Tao and Daxin Jiang. 2023. 
Wizardlm: Empowering large language models to follow complex instructions. arXiv:2304.12244. Retrieved from https:\/\/arxiv.org\/abs\/2304.12244"},{"key":"e_1_3_2_399_2","unstructured":"Canwen Xu Yichong Xu Shuohang Wang Yang Liu Chenguang Zhu and Julian McAuley. 2023. Small models are valuable plug-ins for large language models. arXiv:2305.08848. Retrieved from https:\/\/arxiv.org\/abs\/2305.08848"},{"key":"e_1_3_2_400_2","unstructured":"Daliang Xu Wangsong Yin Xin Jin Ying Zhang Shiyun Wei Mengwei Xu and Xuanzhe Liu. 2023. LLMCad: Fast and scalable on-device large language model inference. arXiv:2309.04255. Retrieved from https:\/\/arxiv.org\/abs\/2309.04255"},{"key":"e_1_3_2_401_2","unstructured":"Daliang Xu Hao Zhang Liming Yang Ruiqi Liu Gang Huang Mengwei Xu and Xuanzhe Liu. 2024. Empowering 1000 tokens\/second on-device LLM prefilling with MLLM-NPU. arXiv:2407.05858. Retrieved from https:\/\/arxiv.org\/abs\/2407.05858"},{"key":"e_1_3_2_402_2","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/W15-1509"},{"key":"e_1_3_2_403_2","unstructured":"Minrui Xu Niyato Dusit Jiawen Kang Zehui Xiong Shiwen Mao Zhu Han Dong In Kim and Khaled B. Letaief. 2024. When large language model agents meet 6G networks: Perception grounding and alignment. arXiv:2401.07764. Retrieved from https:\/\/arxiv.org\/abs\/2401.07764"},{"key":"e_1_3_2_404_2","unstructured":"Mengwei Xu Wangsong Yin Dongqi Cai Rongjie Yi Daliang Xu Qipeng Wang Bingyang Wu Yihao Zhao Chen Yang Shihe Wang et al. 2024. A survey of resource-efficient llm and multimodal foundation models. arXiv:2401.08092. Retrieved from https:\/\/arxiv.org\/abs\/2401.08092"},{"key":"e_1_3_2_405_2","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00563"},{"key":"e_1_3_2_406_2","unstructured":"Yuzhuang Xu Xu Han Zonghan Yang Shuo Wang Qingfu Zhu Zhiyuan Liu Weidong Liu and Wanxiang Che. 2024. OneBit: Towards extremely low-bit large language models. arXiv:2402.11295. 
Retrieved from https:\/\/arxiv.org\/abs\/2402.11295"},{"key":"e_1_3_2_407_2","doi-asserted-by":"crossref","unstructured":"Shi-Qi Yan Jia-Chen Gu Yun Zhu and Zhen-Hua Ling. 2024. Corrective retrieval augmented generation. arXiv:2401.15884. Retrieved from https:\/\/arxiv.org\/abs\/2401.15884","DOI":"10.2139\/ssrn.5267341"},{"key":"e_1_3_2_408_2","unstructured":"An Yang Baosong Yang Binyuan Hui Bo Zheng Bowen Yu Chang Zhou Chengpeng Li Chengyuan Li Dayiheng Liu Fei Huang et al. 2024. Qwen2 Technical Report. arXiv:2407.10671. Retrieved from https:\/\/arxiv.org\/abs\/2407.10671"},{"key":"e_1_3_2_409_2","unstructured":"Chuanpeng Yang Wang Lu Yao Zhu Yidong Wang Qian Chen Chenlong Gao Bingjie Yan and Yiqiang Chen. 2024. Survey on knowledge distillation for large language models: Methods evaluation and application. arXiv:2407.01885. Retrieved from https:\/\/arxiv.org\/abs\/2407.01885"},{"key":"e_1_3_2_410_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2024.findings-emnlp.612"},{"key":"e_1_3_2_411_2","volume-title":"The Twelfth International Conference on Learning Representations","author":"Yang Kevin","year":"2024","unstructured":"Kevin Yang, Dan Klein, Asli Celikyilmaz, Nanyun Peng, and Yuandong Tian. 2024. RLCD: Reinforcement learning from contrastive distillation for LM alignment. In The Twelfth International Conference on Learning Representations. Retrieved from https:\/\/openreview.net\/forum?id=v3XXtxWKi6"},{"key":"e_1_3_2_412_2","doi-asserted-by":"publisher","DOI":"10.1145\/3589334.3648137"},{"key":"e_1_3_2_413_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.findings-acl.806"},{"key":"e_1_3_2_414_2","volume-title":"The Twelfth International Conference on Learning Representations","author":"Yang Linyi","year":"2024","unstructured":"Linyi Yang, Shuibai Zhang, Zhuohao Yu, Guangsheng Bao, Yidong Wang, Jindong Wang, Ruochen Xu, Wei Ye, Xing Xie, Weizhu Chen, and Yue Zhang. 2024. 
Supervised knowledge makes large language models better In-context learners. In The Twelfth International Conference on Learning Representations. Retrieved from https:\/\/openreview.net\/forum?id=bAMPOUF227"},{"key":"e_1_3_2_415_2","unstructured":"Runming Yang Taiqiang Wu Jiahao Wang Pengfei Hu Ngai Wong and Yujiu Yang. 2024. LLM-Neo: Parameter efficient knowledge distillation for large language models. arXiv:2411.06839. Retrieved from https:\/\/arxiv.org\/abs\/2411.06839"},{"key":"e_1_3_2_416_2","doi-asserted-by":"crossref","unstructured":"Yifei Yang Zouying Cao and Hai Zhao. 2024. Laco: Large language model pruning via layer collapse. arXiv:2402.11187. Retrieved from https:\/\/arxiv.org\/abs\/2402.11187","DOI":"10.18653\/v1\/2024.findings-emnlp.372"},{"key":"e_1_3_2_417_2","volume-title":"The Thirty-Eighth Annual Conference on Neural Information Processing Systems","author":"Yang Yu","year":"2024","unstructured":"Yu Yang, Siddhartha Mishra, Jeffrey N. Chiang, and Baharan Mirzasoleiman. 2024. SmallToLarge (S2L): Scalable data selection for fine-tuning large language models by summarizing training trajectories of small models. In The Thirty-Eighth Annual Conference on Neural Information Processing Systems. Retrieved from https:\/\/openreview.net\/forum?id=K9IGlMQpif"},{"key":"e_1_3_2_418_2","doi-asserted-by":"crossref","unstructured":"Yizhe Yang Huashan Sun Jiawei Li Runheng Liu Yinghao Li Yuhang Liu Heyan Huang and Yang Gao. 2023. Mindllm: Pre-training lightweight large language model from scratch evaluations and domain applications. arXiv:2310.15777. Retrieved from https:\/\/arxiv.org\/abs\/2310.15777","DOI":"10.2139\/ssrn.4710644"},{"key":"e_1_3_2_419_2","unstructured":"Zhou Yang Zhaochun Ren Wang Yufeng Shizhong Peng Haizhou Sun Xiaofei Zhu and Xiangwen Liao. 2024. Enhancing empathetic response generation by augmenting LLMs with small-scale empathetic models. arXiv:2402.11801. 
Retrieved from https:\/\/arxiv.org\/abs\/2402.11801"},{"key":"e_1_3_2_420_2","unstructured":"Yunzhi Yao Shaohan Huang Wenhui Wang Li Dong and Furu Wei. 2021. Adapt-and-distill: Developing small fast and effective pretrained language models for domains. arXiv:2106.13474. Retrieved from https:\/\/arxiv.org\/abs\/2106.13474"},{"key":"e_1_3_2_421_2","unstructured":"Mert Yazan Suzan Verberne and Frederik Situmeang. 2024. The impact of quantization on retrieval-augmented generation: An analysis of small LLMs. arXiv:2406.10251. Retrieved from https:\/\/arxiv.org\/abs\/2406.10251"},{"key":"e_1_3_2_422_2","unstructured":"Rongjie Yi Liwei Guo Shiyun Wei Ao Zhou Shangguang Wang and Mengwei Xu. 2023. EdgeMoE: Fast on-device inference of MoE-based large language models. arXiv:2308.14352. Retrieved from https:\/\/arxiv.org\/abs\/2308.14352"},{"key":"e_1_3_2_423_2","unstructured":"Rongjie Yi Xiang Li Weikai Xie Zhenyan Lu Chenghua Wang Ao Zhou Shangguang Wang Xiwen Zhang and Mengwei Xu. 2024. PhoneLM: An efficient and capable small language model family through principled pre-training. arXiv:2411.05046. Retrieved from https:\/\/arxiv.org\/abs\/2411.05046"},{"key":"e_1_3_2_424_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P16-2033"},{"key":"e_1_3_2_425_2","unstructured":"Wangsong Yin Mengwei Xu Yuanchun Li and Xuanzhe Liu. 2024. LLM as a system service on mobile devices. arXiv:2403.11805. Retrieved from https:\/\/arxiv.org\/abs\/2403.11805"},{"key":"e_1_3_2_426_2","volume-title":"The Twelfth International Conference on Learning Representations","author":"Yu Longhui","year":"2024","unstructured":"Longhui Yu, Weisen Jiang, Han Shi, Jincheng Yu, Zhengying Liu, Yu Zhang, James Kwok, Zhenguo Li, Adrian Weller, and Weiyang Liu. 2024. MetaMath: Bootstrap your own mathematical questions for large language models. In The Twelfth International Conference on Learning Representations. 
Retrieved from https:\/\/openreview.net\/forum?id=N8N0hgNDRt"},{"key":"e_1_3_2_427_2","unstructured":"Yue Yu Wei Ping Zihan Liu Boxin Wang Jiaxuan You Chao Zhang Mohammad Shoeybi and Bryan Catanzaro. 2024. Rankrag: Unifying context ranking with retrieval-augmented generation in LLMs. arXiv:2407.02485. Retrieved from https:\/\/arxiv.org\/abs\/2407.02485"},{"key":"e_1_3_2_428_2","unstructured":"Zhongzhi Yu Zheng Wang Yuhan Li Haoran You Ruijie Gao Xiaoya Zhou Sreenidhi Reedy Bommu Yang Katie Zhao and Yingyan Celine Lin. 2024. EDGE-LLM: Enabling efficient large language model adaptation on edge devices via layerwise unified compression and adaptive layer tuning and voting. arXiv:2406.15758. Retrieved from https:\/\/arxiv.org\/abs\/2406.15758"},{"key":"e_1_3_2_429_2","doi-asserted-by":"publisher","DOI":"10.1145\/3636534.3649361"},{"key":"e_1_3_2_430_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.aiopen.2021.06.001"},{"key":"e_1_3_2_431_2","unstructured":"Xiaohan Yuan Jinfeng Li Dongxia Wang Yuefeng Chen Xiaofeng Mao Longtao Huang Hui Xue Wenhai Wang Kui Ren and Jingyi Wang. 2024. S-Eval: Automatic and adaptive test generation for benchmarking safety evaluation of large language models. arXiv:2405.14191. Retrieved from https:\/\/arxiv.org\/abs\/2405.14191"},{"key":"e_1_3_2_432_2","volume-title":"The Twelfth International Conference on Learning Representations","author":"Yuan Youliang","year":"2024","unstructured":"Youliang Yuan, Wenxiang Jiao, Wenxuan Wang, Jen Tse Huang, Pinjia He, Shuming Shi, and Zhaopeng Tu. 2024. GPT-4 is too smart to be safe: Stealthy chat with LLMs via cipher. In The Twelfth International Conference on Learning Representations. Retrieved from https:\/\/openreview.net\/forum?id=MbfAK4s61A"},{"key":"e_1_3_2_433_2","unstructured":"Shengbin Yue Wei Chen Siyuan Wang Bingxuan Li Chenchen Shen Shujun Liu Yuxuan Zhou Yao Xiao Song Yun Wei Lin et al. 2023. Disc-lawllm: Fine-tuning large language models for intelligent legal services. 
arXiv:2309.11325. Retrieved from https:\/\/arxiv.org\/abs\/2309.11325"},{"key":"e_1_3_2_434_2","unstructured":"Xiang Yue Xingwei Qu Ge Zhang Yao Fu Wenhao Huang Huan Sun Yu Su and Wenhu Chen. 2024. Mammoth: Building math generalist models through hybrid instruction tuning. In The Twelfth International Conference on Learning Representations. Retrieved from https:\/\/openreview.net\/forum?id=yLClGs770I"},{"key":"e_1_3_2_435_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1472"},{"key":"e_1_3_2_436_2","doi-asserted-by":"publisher","DOI":"10.5555\/3454287.3455099"},{"key":"e_1_3_2_437_2","doi-asserted-by":"publisher","DOI":"10.5555\/3454287.3455397"},{"key":"e_1_3_2_438_2","unstructured":"Cheng Zhang Jianyi Cheng George A. Constantinides and Yiren Zhao. 2024. LQER: low-rank quantization error reconstruction for LLMs. arXiv:2402.02446. Retrieved from https:\/\/arxiv.org\/abs\/2402.02446"},{"key":"e_1_3_2_439_2","doi-asserted-by":"crossref","unstructured":"Collin Zhang John X. Morris and Vitaly Shmatikov. 2024. Extracting prompts by inverting LLM outputs. arXiv:2405.15012. Retrieved from https:\/\/arxiv.org\/abs\/2405.15012","DOI":"10.18653\/v1\/2024.emnlp-main.819"},{"key":"e_1_3_2_440_2","unstructured":"Chen Zhang Dawei Song Zheyu Ye and Yan Gao. 2023. Towards the law of capacity gap in distilling language models. arXiv:2311.07052. Retrieved from https:\/\/arxiv.org\/abs\/2311.07052"},{"key":"e_1_3_2_441_2","unstructured":"Dan Zhang Ziniu Hu Sining Zhoubian Zhengxiao Du Kaiyu Yang Zihan Wang Yisong Yue Yuxiao Dong and Jie Tang. 2024. Sciglm: Training scientific language models with self-reflective instruction annotation and tuning. arXiv:2401.07950. Retrieved from https:\/\/arxiv.org\/abs\/2401.07950"},{"key":"e_1_3_2_442_2","unstructured":"Di Zhang Wei Liu Qian Tan Jingdan Chen Hang Yan Yuliang Yan Jiatong Li Weiran Huang Xiangyu Yue Dongzhan Zhou et al. 2024. ChemLLM: A chemical large language model. arXiv:2402.06852. 
Retrieved from https:\/\/arxiv.org\/abs\/2402.06852"},{"key":"e_1_3_2_443_2","doi-asserted-by":"crossref","unstructured":"Kaiyan Zhang Jianyu Wang Ermo Hua Biqing Qi Ning Ding and Bowen Zhou. 2024. Cogenesis: A framework collaborating large and small language models for secure context-aware instruction following. arXiv:2403.03129. Retrieved from https:\/\/arxiv.org\/abs\/2403.03129","DOI":"10.18653\/v1\/2024.acl-long.235"},{"key":"e_1_3_2_444_2","unstructured":"Mingyang Zhang Hao Chen Chunhua Shen Zhen Yang Linlin Ou Xinyi Yu and Bohan Zhuang. 2023. Loraprune: Pruning meets low-rank parameter-efficient fine-tuning. arXiv:2305.18403. Retrieved from https:\/\/arxiv.org\/abs\/2305.18403"},{"key":"e_1_3_2_445_2","unstructured":"Peiyuan Zhang Guangtao Zeng Tianduo Wang and Wei Lu. 2024. TinyLlama: An open-source small language model. arXiv:2401.02385. Retrieved from https:\/\/arxiv.org\/abs\/2401.02385"},{"key":"e_1_3_2_446_2","unstructured":"Susan Zhang Stephen Roller Naman Goyal Mikel Artetxe Moya Chen Shuohui Chen Christopher Dewan Mona Diab Xian Li Xi Victoria Lin et al. 2022. Opt: Open pre-trained transformer language models. arXiv:2205.01068. Retrieved from https:\/\/arxiv.org\/abs\/2205.01068"},{"key":"e_1_3_2_447_2","doi-asserted-by":"crossref","first-page":"2612","DOI":"10.1145\/3343031.3356070","volume-title":"Proceedings of the 27th ACM International Conference on Multimedia","author":"Zhang Xinran","year":"2019","unstructured":"Xinran Zhang, Xin Yuan, Yunwei Li, and Yanru Zhang. 2019. Cold-Start representation learning: A recommendation approach with bert4Movie and movie2Vec. In Proceedings of the 27th ACM International Conference on Multimedia. ACM, 2612\u20132616."},{"key":"e_1_3_2_448_2","volume-title":"The Twelfth International Conference on Learning Representations","author":"Zhang Yingtao","year":"2024","unstructured":"Yingtao Zhang, Haoli Bai, Haokun Lin, Jialin Zhao, Lu Hou, and Carlo Vittorio Cannistraci. 2024. 
Plug-and-play: An efficient post-training pruning method for large language models. In The Twelfth International Conference on Learning Representations."},{"key":"e_1_3_2_449_2","unstructured":"Yiming Zhang Nicholas Carlini and Daphne Ippolito. 2024. Effective prompt extraction from language models. arXiv:2307.06865. Retrieved from https:\/\/arxiv.org\/abs\/2307.06865"},{"key":"e_1_3_2_450_2","doi-asserted-by":"publisher","DOI":"10.5555\/3666122.3667628"},{"key":"e_1_3_2_451_2","volume-title":"Forty-First International Conference on Machine Learning","author":"Zhao Bowen","year":"2024","unstructured":"Bowen Zhao, Hannaneh Hajishirzi, and Qingqing Cao. 2024. APT: Adaptive pruning and tuning pretrained language models for efficient training and inference. In Forty-First International Conference on Machine Learning."},{"key":"e_1_3_2_452_2","unstructured":"Junchen Zhao Yurun Song Simeng Liu Ian G. Harris and Sangeetha Abdu Jyothi. 2023. LinguaLinked: A distributed large language model inference system for mobile devices. arXiv:2312.00388. Retrieved from https:\/\/arxiv.org\/abs\/2312.00388"},{"key":"e_1_3_2_453_2","doi-asserted-by":"crossref","unstructured":"Juntao Zhao Borui Wan Yanghua Peng Haibin Lin and Chuan Wu. 2024. LLM-PQ: Serving LLM on heterogeneous clusters with phase-aware partition and adaptive quantization. arXiv:2403.01136. Retrieved from https:\/\/arxiv.org\/abs\/2403.01136","DOI":"10.1145\/3627535.3638480"},{"key":"e_1_3_2_454_2","doi-asserted-by":"crossref","unstructured":"Kun Zhao Bohao Yang Chen Tang Chenghua Lin and Liang Zhan. 2024. SLIDE: A framework integrating small and large language models for open-domain dialogues evaluation. arXiv:2405.15924. Retrieved from https:\/\/arxiv.org\/abs\/2405.15924","DOI":"10.18653\/v1\/2024.findings-acl.911"},{"key":"e_1_3_2_455_2","unstructured":"Theodore Zhao Mu Wei J. Samuel Preston and Hoifung Poon. 2023. 
Automatic calibration and error correction for large language models via Pareto optimal self-supervision. arXiv:2306.16564. Retrieved from https:\/\/arxiv.org\/abs\/2306.16564"},{"key":"e_1_3_2_456_2","unstructured":"Youpeng Zhao Ming Lin Huadong Tang Qiang Wu and Jun Wang. 2024. Merino: entropy-driven design for generative language models on IoT devices. arXiv:2403.07921. Retrieved from https:\/\/arxiv.org\/abs\/2403.07921"},{"key":"e_1_3_2_457_2","unstructured":"Zhengyun Zhao Qiao Jin Fangyuan Chen Tuorui Peng and Sheng Yu. 2022. PMC-patients: A large-scale dataset of patient summaries and relations for benchmarking retrieval-based clinical decision support systems. arXiv:2202.13876. Retrieved from https:\/\/arxiv.org\/abs\/2202.13876"},{"key":"e_1_3_2_458_2","volume-title":"The Thirty-Eighth Annual Conference on Neural Information Processing Systems","author":"Zhou Zhanhui","year":"2024","unstructured":"Zhanhui Zhou, Zhixuan Liu, Jie Liu, Zhichen Dong, Chao Yang, and Yu Qiao. 2024. Weak-to-strong search: Align large language models via searching over small language models. In The Thirty-Eighth Annual Conference on Neural Information Processing Systems. Retrieved from https:\/\/openreview.net\/forum?id=dOJ6CqWDf1"},{"key":"e_1_3_2_459_2","first-page":"3277","volume-title":"Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing","author":"Zhu Fengbin","year":"2021","unstructured":"Fengbin Zhu, Wenqiang Lei, Youcheng Huang, Chao Wang, Shuo Zhang, Jiancheng Lv, Fuli Feng, and Tat-Seng Chua. 2021. TAT-QA: A question answering benchmark on a hybrid of tabular and textual content in finance. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Vol. 
1 Long Papers, Association for Computational Linguistics, 3277\u20133287."},{"key":"e_1_3_2_460_2","unstructured":"Jiachen Zhu Jianghao Lin Xinyi Dai Bo Chen Rong Shan Jieming Zhu Ruiming Tang Yong Yu and Weinan Zhang. 2024. Lifelong personalized low-rank adaptation of large language models for recommendation. arXiv:2408.03533. Retrieved from https:\/\/arxiv.org\/abs\/2408.03533"},{"key":"e_1_3_2_461_2","unstructured":"Kaijie Zhu Jindong Wang Jiaheng Zhou Zichen Wang Hao Chen Yidong Wang Linyi Yang Wei Ye Yue Zhang Neil Zhenqiang Gong et al. 2023. Promptbench: Towards evaluating the robustness of large language models on adversarial prompts. arXiv:2306.04528. Retrieved from https:\/\/arxiv.org\/abs\/2306.04528"},{"key":"e_1_3_2_462_2","unstructured":"Xunyu Zhu Jian Li Yong Liu Can Ma and Weiping Wang. 2023. A survey on model compression for large language models. arXiv:2308.07633. Retrieved from https:\/\/arxiv.org\/abs\/2308.07633"},{"key":"e_1_3_2_463_2","unstructured":"Yun Zhu Yinxiao Liu Felix Stahlberg Shankar Kumar Yu Hui Chen Liangchen Luo Lei Shu Renjie Liu Jindong Chen and Lei Meng. 2023. Towards an on-device agent for text rewriting. arXiv:2308.11807. Retrieved from https:\/\/arxiv.org\/abs\/2308.11807"},{"key":"e_1_3_2_464_2","doi-asserted-by":"publisher","DOI":"10.3390\/su13148039"},{"key":"e_1_3_2_465_2","unstructured":"Terry Yue Zhuo Yujin Huang Chunyang Chen and Zhenchang Xing. 2023. Red teaming chatGPT via jailbreaking: Bias robustness reliability and toxicity. arXiv:2301.12867. Retrieved from https:\/\/arxiv.org\/abs\/2301.12867"},{"key":"e_1_3_2_466_2","unstructured":"Andy Zou Zifan Wang Nicholas Carlini Milad Nasr Zico Kolter and Matt Fredrikson. 2023. Universal and transferable adversarial attacks on aligned language models. arXiv:2307.15043. 
Retrieved from https:\/\/arxiv.org\/abs\/2307.15043"},{"key":"e_1_3_2_467_2","doi-asserted-by":"publisher","DOI":"10.1145\/3568681"},{"key":"e_1_3_2_468_2","unstructured":"Jingwei Zuo Maksim Velikanov Dhia Eddine Rhaiem Ilyas Chahed Younes Belkada Guillaume Kunsch and Hakim Hacid. 2024. Falcon Mamba: The first competitive attention-free 7B language model. arXiv:2410.05355. Retrieved from https:\/\/arxiv.org\/abs\/2410.05355"}],"container-title":["ACM Transactions on Intelligent Systems and Technology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3768165","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,11,24]],"date-time":"2025-11-24T15:11:47Z","timestamp":1763997107000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3768165"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,11,24]]},"references-count":467,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2025,12,31]]}},"alternative-id":["10.1145\/3768165"],"URL":"https:\/\/doi.org\/10.1145\/3768165","relation":{},"ISSN":["2157-6904","2157-6912"],"issn-type":[{"value":"2157-6904","type":"print"},{"value":"2157-6912","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,11,24]]},"assertion":[{"value":"2025-01-02","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-08-23","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-11-24","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}