{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,20]],"date-time":"2026-02-20T01:02:05Z","timestamp":1771549325094,"version":"3.50.1"},"reference-count":206,"publisher":"Springer Science and Business Media LLC","issue":"11","license":[{"start":{"date-parts":[[2026,2,19]],"date-time":"2026-02-19T00:00:00Z","timestamp":1771459200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2026,2,19]],"date-time":"2026-02-19T00:00:00Z","timestamp":1771459200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Front. Comput. Sci."],"published-print":{"date-parts":[[2026,11]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Recently, the reasoning capabilities of Large Reasoning Models (LRMs), such as DeepSeek-R1, have witnessed significant advancements through computationally intensive \u201cslow thinking\u201d processes. These models have demonstrated impressive performance across a variety of complex reasoning tasks. However, despite their remarkable success, LRMs come with substantial computational demands that pose considerable challenges in terms of resource consumption, scalability, and accessibility. In contrast, Small Reasoning Models (SRMs), which are often distilled from larger models, offer a more efficient alternative while still achieving competitive performance. Beyond their efficiency, SRMs frequently exhibit distinct capabilities and cognitive trajectories compared with their larger counterparts, making them particularly interesting from both practical and theoretical perspectives. In this work, we provide a timely and comprehensive survey of recently published research focused on SRMs. We first review the current landscape of SRMs. 
Then, we analyze diverse training paradigms and inference techniques tailored to enhance the reasoning capabilities of SRMs. Furthermore, we offer an extensive review of domain-specific applications where SRMs have been effectively leveraged. Finally, we discuss promising future research directions that aim to bridge existing gaps. By consolidating recent advances, this survey serves as an essential reference for researchers and practitioners interested in leveraging or developing SRMs to unlock advanced reasoning functionalities with improved efficiency.<\/jats:p>","DOI":"10.1007\/s11704-025-50990-0","type":"journal-article","created":{"date-parts":[[2026,2,19]],"date-time":"2026-02-19T22:33:33Z","timestamp":1771540413000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["A short survey on small reasoning models: training, inference, applications, and research directions"],"prefix":"10.1007","volume":"20","author":[{"given":"Chengyu","family":"Wang","sequence":"first","affiliation":[]},{"given":"Taolin","family":"Zhang","sequence":"additional","affiliation":[]},{"given":"Richang","family":"Hong","sequence":"additional","affiliation":[]},{"given":"Jun","family":"Huang","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2026,2,19]]},"reference":[{"key":"50990_CR1","unstructured":"Zhao W X, Zhou K, Li J, Tang T, Wang X, Hou Y, Min Y, Zhang B, Zhang J, Dong Z, Du Y, Yang C, Chen Y, Chen Z, Jiang J, Ren R, Li Y, Tang X, Liu Z, Liu P, Nie J Y, Wen J R. A survey of large language models. 2023, arXiv preprint arXiv: 2303.18223"},{"key":"50990_CR2","doi-asserted-by":"crossref","unstructured":"Xu F, Hao Q, Zong Z, Wang J, Zhang Y, Wang J, Lan X, Gong J, Ouyang T, Meng F, Shao C, Yan Y, Yang Q, Song Y, Ren S, Hu X, Li Y, Feng J, Gao C, Li Y. Towards large reasoning models: a survey of reinforced reasoning with large language models. 
2025, arXiv preprint arXiv: 2501.09686","DOI":"10.1016\/j.patter.2025.101370"},{"key":"50990_CR3","unstructured":"DeepSeek-AI. DeepSeek-R1: incentivizing reasoning capability in LLMs via reinforcement learning. 2025, arXiv preprint arXiv: 2501.12948"},{"key":"50990_CR4","first-page":"1800","volume-title":"Proceedings of the 36th International Conference on Neural Information Processing Systems","author":"J Wei","year":"2022","unstructured":"Wei J, Wang X, Schuurmans D, Bosma M, Ichter B, Xia F, Chi E H, Le Q V, Zhou D. Chain-of-thought prompting elicits reasoning in large language models. In: Proceedings of the 36th International Conference on Neural Information Processing Systems. 2022, 1800"},{"key":"50990_CR5","first-page":"420","volume-title":"Proceedings of the 40th International Conference on Machine Learning","author":"Y Fu","year":"2023","unstructured":"Fu Y, Peng H, Ou L, Sabharwal A, Khot T. Specializing smaller language models towards multi-step reasoning. In: Proceedings of the 40th International Conference on Machine Learning. 2023, 420"},{"key":"50990_CR6","doi-asserted-by":"publisher","first-page":"1773","DOI":"10.18653\/v1\/2023.acl-short.151","volume-title":"Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)","author":"L C Magister","year":"2023","unstructured":"Magister L C, Mallinson J, Adamek J, Malmi E, Severyn A. Teaching small language models to reason. In: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 2023, 1773\u20131781"},{"key":"50990_CR7","doi-asserted-by":"publisher","first-page":"7059","DOI":"10.18653\/v1\/2023.findings-acl.441","volume-title":"Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023","author":"K Shridhar","year":"2023","unstructured":"Shridhar K, Stolfo A, Sachan M. Distilling reasoning capabilities into smaller language models. 
In: Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023. 2023, 7059\u20137073"},{"issue":"1","key":"50990_CR8","doi-asserted-by":"publisher","first-page":"30","DOI":"10.1109\/MIS.2024.3517792","volume":"40","author":"Q Zhang","year":"2025","unstructured":"Zhang Q, Liu Z, Pan S. The rise of small language models. IEEE Intelligent Systems, 2025, 40(1): 30\u201337","journal-title":"IEEE Intelligent Systems"},{"key":"50990_CR9","first-page":"12413","volume-title":"Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023","author":"J Yan","year":"2023","unstructured":"Yan J, Wang C, Zhang T, He X, Huang J, Zhang W. From complex to simple: unraveling the cognitive tree for reasoning with small language models. In: Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023. 2023, 12413\u201312425"},{"key":"50990_CR10","first-page":"1","volume-title":"Proceedings of the 12th International Conference on Learning Representations","author":"B Zhang","year":"2024","unstructured":"Zhang B, Liu Z, Cherry C, Firat O. When scaling meets LLM finetuning: the effect of data, model and finetuning method. In: Proceedings of the 12th International Conference on Learning Representations. 2024, 1\u201320"},{"key":"50990_CR11","first-page":"18234","volume-title":"Proceedings of the 38th AAAI Conference on Artificial Intelligence","author":"L Hu","year":"2024","unstructured":"Hu L, He H, Wang D, Zhao Z, Shao Y, Nie L. LLM vs small model? Large language model based text augmentation enhanced personality detection model. In: Proceedings of the 38th AAAI Conference on Artificial Intelligence. 2024, 18234\u201318242"},{"key":"50990_CR12","unstructured":"Plaat A, Wong A, Verberne S, Broekens J, van Stein N, B\u00e4ck T. Reasoning with large language models, a survey. 
2024, arXiv preprint arXiv: 2407.11511"},{"key":"50990_CR13","doi-asserted-by":"publisher","first-page":"1049","DOI":"10.18653\/v1\/2023.findings-acl.67","volume-title":"Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023","author":"J Huang","year":"2023","unstructured":"Huang J, Chang K C C. Towards reasoning in large language models: a survey. In: Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023. 2023, 1049\u20131065"},{"key":"50990_CR14","doi-asserted-by":"publisher","first-page":"11574","DOI":"10.18653\/v1\/2024.emnlp-main.646","volume-title":"Proceedings of 2024 Conference on Empirical Methods in Natural Language Processing","author":"P Giadikiaroglou","year":"2024","unstructured":"Giadikiaroglou P, Lymperaiou M, Filandrianos G, Stamou G. Puzzle solving using reasoning of large language models: a survey. In: Proceedings of 2024 Conference on Empirical Methods in Natural Language Processing. 2024, 11574\u201311591"},{"key":"50990_CR15","first-page":"225","volume-title":"Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop","author":"J Ahn","year":"2024","unstructured":"Ahn J, Verma R, Lou R, Liu D, Zhang R, Yin W. Large language models for mathematical reasoning: progresses and challenges. In: Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop. 2024, 225\u2013237"},{"issue":"9","key":"50990_CR16","doi-asserted-by":"publisher","first-page":"199348","DOI":"10.1007\/s11704-024-40330-z","volume":"19","author":"X Zhang","year":"2025","unstructured":"Zhang X, Wang D, Dou L, Zhu Q, Che W. A survey of table reasoning with large language models. 
Frontiers of Computer Science, 2025, 19(9): 199348","journal-title":"Frontiers of Computer Science"},{"issue":"12","key":"50990_CR17","doi-asserted-by":"publisher","first-page":"1912613","DOI":"10.1007\/s11704-025-50480-3","volume":"19","author":"S Wei","year":"2025","unstructured":"Wei S, Tong Y, Zhou Z, Xu Y, Gao J, Wei T, He T, Lv W. Federated reasoning LLMs: a survey. Frontiers of Computer Science, 2025, 19(12): 1912613","journal-title":"Frontiers of Computer Science"},{"key":"50990_CR18","unstructured":"Hui B, Yang J, Cui Z, Yang J, Liu D, Zhang L, Liu T, Zhang J, Yu B, Dang K, Yang A, Men R, Huang F, Ren X, Ren X, Zhou J, Lin J. Qwen2.5-coder technical report. 2024, arXiv preprint arXiv: 2409.12186"},{"key":"50990_CR19","unstructured":"Guo D, Zhu Q, Yang D, Xie Z, Dong K, Zhang W, Chen G, Bi X, Wu Y, Li Y K, Luo F, Xiong Y, Liang W. DeepSeek-coder: when the large language model meets programming-the rise of code intelligence. 2024, arXiv preprint arXiv: 2401.14196"},{"key":"50990_CR20","unstructured":"Lozhkov A, Li R, Allal L B, Cassano F, Lamy-Poirier J, Tazi N, Tang A, Pykhtar D, Liu J, Wei Y, Liu T, Tian M, Kocetkov D, Zucker A, Belkada Y, Wang Z, Liu Q, Abulkhanov D, Paul I, Li Z, Li W D, Risdal M, Li J, Zhu J, Zhuo T Y, Zheltonozhskii E, Dade N O O, Yu W, Krau\u00df L, Jain N, Su Y, He X, Dey M, Abati E, Chai Y, Muennighoff N, Tang X, Oblokulov M, Akiki C, Marone M, Mou C, Mishra M, Gu A, Hui B, Dao T, Zebaze A, Dehaene O, Patry N, Xu C, McAuley J, Hu H, Scholak T, Paquet S, Robinson J, Anderson C J, Chapados N, Patwary M, Tajbakhsh N, Jernite Y, Ferrandis C M, Zhang L, Hughes S, Wolf T, Guha A, von Werra L, de Vries H. StarCoder 2 and the stack v2: the next generation. 2024, arXiv preprint arXiv: 2402.19173"},{"key":"50990_CR21","unstructured":"Jiang J, Wang F, Shen J, Kim S, Kim S. A survey on large language models for code generation. 
2024, arXiv preprint arXiv: 2406.00515"},{"key":"50990_CR22","unstructured":"Yang A, Zhang B, Hui B, Gao B, Yu B, Li C, Liu D, Tu J, Zhou J, Lin J, Lu K, Xue M, Lin R, Liu T, Ren X, Zhang Z. Qwen2.5-math technical report: toward mathematical expert model via self-improvement. 2024, arXiv preprint arXiv: 2409.12122"},{"key":"50990_CR23","unstructured":"Shao Z, Wang P, Zhu Q, Xu R, Song J, Bi X, Zhang H, Zhang M, Li Y K, Wu Y, Guo D. DeepSeekMath: pushing the limits of mathematical reasoning in open language models. 2024, arXiv preprint arXiv: 2402.03300"},{"key":"50990_CR24","unstructured":"Ying H, Zhang S, Li L, Zhou Z, Shao Y, Fei Z, Ma Y, Hong J, Liu K, Wang Z, Wang Y, Wu Z, Li S, Zhou F, Liu H, Zhang S, Zhang W, Yan H, Qiu X, Wang J, Chen K, Lin D. InternLM-Math: open math large language models toward verifiable reasoning. 2024, arXiv preprint arXiv: 2402.06332"},{"key":"50990_CR25","doi-asserted-by":"crossref","unstructured":"Muennighoff N, Yang Z, Shi W, Li X L, Fei-Fei L, Hajishirzi H, Zettlemoyer L, Liang P, Cand\u00e8s E, Hashimoto T. s1: simple test-time scaling. 2025, arXiv preprint arXiv: 2501.19393","DOI":"10.18653\/v1\/2025.emnlp-main.1025"},{"key":"50990_CR26","unstructured":"Zhao Y, Yin H, Zeng B, Wang H, Shi T, Lyu C, Wang L, Luo W, Zhang K. Marco-o1: towards open reasoning models for open-ended solutions. 2024, arXiv preprint arXiv: 2411.14405"},{"key":"50990_CR27","unstructured":"Cai W, Wang C, Yan J, Huang J, Fang X. Reasoning with OmniThought: a large CoT dataset with verbosity and cognitive difficulty annotations. 2025, arXiv preprint arXiv: 2505.10937"},{"key":"50990_CR28","unstructured":"Rein D, Hou B L, Stickland A C, Petty J, Pang R Y, Dirani J, Michael J, Bowman S R. GPQA: a graduate-level google-proof Q&A benchmark. 
2023, arXiv preprint arXiv: 2311.12022"},{"key":"50990_CR29","first-page":"1","volume-title":"Proceedings of the 13th International Conference on Learning Representations","author":"N Jain","year":"2025","unstructured":"Jain N, Han K, Gu A, Li W D, Yan F, Zhang T, Wang S, Solar-Lezama A, Sen K, Stoica I. LiveCodeBench: holistic and contamination free evaluation of large language models for code. In: Proceedings of the 13th International Conference on Learning Representations. 2025, 1\u201341"},{"key":"50990_CR30","first-page":"1","volume-title":"Proceedings of the 12th International Conference on Learning Representations","author":"H Lightman","year":"2024","unstructured":"Lightman H, Kosaraju V, Burda Y, Edwards H, Baker B, Lee T, Leike J, Schulman J, Sutskever I, Cobbe K. Let\u2019s verify step by step. In: Proceedings of the 12th International Conference on Learning Representations. 2024, 1\u201324"},{"key":"50990_CR31","unstructured":"Chen J, Cai Z, Ji K, Wang X, Liu W, Wang R, Hou J, Wang B. HuatuoGPT-o1, towards medical complex reasoning with LLMs. 2024, arXiv preprint arXiv: 2412.18925"},{"key":"50990_CR32","unstructured":"Yuan W, Yu J, Jiang S, Padthe K, Li Y, Wang D, Kulikov I, Cho K, Tian Y, Weston J E, Li X. NaturalReasoning: reasoning in the wild with 2.8M challenging questions. 2025, arXiv preprint arXiv: 2502.13124"},{"key":"50990_CR33","unstructured":"Lu D, Tan X, Xu R, Yao T, Qu C, Chu W, Xu Y, Qi Y. SCP-116K: a high-quality problem-solution dataset and a generalized pipeline for automated extraction in the higher education science domain. 2025, arXiv preprint arXiv: 2501.15587"},{"key":"50990_CR34","first-page":"2909","volume-title":"Proceedings of the 13th Language Resources and Evaluation Conference","author":"M Mikulov\u00e1","year":"2022","unstructured":"Mikulov\u00e1 M, Straka M, \u0160t\u011bp\u00e1nek J, \u0160t\u011bp\u00e1nkov\u00e1 B, Hajic J. Quality and efficiency of manual annotation: pre-annotation bias. 
In: Proceedings of the 13th Language Resources and Evaluation Conference. 2022, 2909\u20132918"},{"key":"50990_CR35","first-page":"168","volume-title":"Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations","author":"H Kim","year":"2024","unstructured":"Kim H, Mitra K, Chen R L, Rahman S, Zhang D. MEGAnno+: a human-LLM collaborative annotation system. In: Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations. 2024, 168\u2013176"},{"key":"50990_CR36","doi-asserted-by":"publisher","first-page":"15609","DOI":"10.18653\/v1\/2024.emnlp-main.874","volume-title":"Proceedings of 2024 Conference on Empirical Methods in Natural Language Processing","author":"J Li","year":"2024","unstructured":"Li J. Human-LLM hybrid text answer aggregation for crowd annotations. In: Proceedings of 2024 Conference on Empirical Methods in Natural Language Processing. 2024, 15609\u201315622"},{"key":"50990_CR37","first-page":"303","volume-title":"Proceedings of 2024 CHI Conference on Human Factors in Computing Systems","author":"X Wang","year":"2024","unstructured":"Wang X, Kim H, Rahman S, Mitra K, Miao Z. Human-LLM collaborative annotation through effective verification of LLM labels. In: Proceedings of 2024 CHI Conference on Human Factors in Computing Systems. 2024, 303"},{"key":"50990_CR38","doi-asserted-by":"publisher","first-page":"9048","DOI":"10.18653\/v1\/2024.emnlp-main.511","volume-title":"Proceedings of 2024 Conference on Empirical Methods in Natural Language Processing","author":"R Movva","year":"2024","unstructured":"Movva R, Koh P W, Pierson E. Annotation alignment: comparing LLM and human annotations of conversational safety. In: Proceedings of 2024 Conference on Empirical Methods in Natural Language Processing. 
2024, 9048\u20139062"},{"key":"50990_CR39","first-page":"2997","volume-title":"Proceedings of the 37th International Conference on Neural Information Processing Systems","author":"T Schick","year":"2023","unstructured":"Schick T, Dwivedi-Yu J, Dess\u00ed R, Raileanu R, Lomeli M, Hambro E, Zettlemoyer L, Cancedda N, Scialom T. Toolformer: language models can teach themselves to use tools. In: Proceedings of the 37th International Conference on Neural Information Processing Systems. 2023, 2997"},{"key":"50990_CR40","doi-asserted-by":"publisher","first-page":"2983","DOI":"10.1145\/3626772.3661381","volume-title":"Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval","author":"H Wang","year":"2024","unstructured":"Wang H, Qin Y, Lin Y, Pan J Z, Wong K F. Empowering large language models: tool learning for real-world interaction. In: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2024, 2983\u20132986"},{"key":"50990_CR41","first-page":"3550","volume-title":"Proceedings of 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)","author":"S Qiao","year":"2024","unstructured":"Qiao S, Gui H, Lv C, Jia Q, Chen H, Zhang N. Making language models better tool learners with execution feedback. In: Proceedings of 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). 2024, 3550\u20133568"},{"issue":"7","key":"50990_CR42","doi-asserted-by":"publisher","first-page":"6728","DOI":"10.1109\/LRA.2024.3410155","volume":"9","author":"T Kwon","year":"2024","unstructured":"Kwon T, Palo N D, Johns E. Language models as zero-shot trajectory generators. 
IEEE Robotics and Automation Letters, 2024, 9(7): 6728\u20136735","journal-title":"IEEE Robotics and Automation Letters"},{"key":"50990_CR43","doi-asserted-by":"publisher","first-page":"8003","DOI":"10.18653\/v1\/2023.findings-acl.507","volume-title":"Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023","author":"C Y Hsieh","year":"2023","unstructured":"Hsieh C Y, Li C L, Yeh C K, Nakhost H, Fujii Y, Ratner A, Krishna R, Lee C Y, Pfister T. Distilling step-by-step! Outperforming larger language models with less training data and smaller model sizes. In: Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023. 2023, 8003\u20138017"},{"key":"50990_CR44","doi-asserted-by":"publisher","first-page":"14852","DOI":"10.18653\/v1\/2023.acl-long.830","volume-title":"Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"N Ho","year":"2023","unstructured":"Ho N, Schmid L, Yun S Y. Large language models are reasoning teachers. In: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2023, 14852\u201314882"},{"key":"50990_CR45","doi-asserted-by":"publisher","first-page":"2665","DOI":"10.18653\/v1\/2023.acl-long.150","volume-title":"Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"L H Li","year":"2023","unstructured":"Li L H, Hessel J, Yu Y, Ren X, Chang K W, Choi Y. Symbolic chain-of-thought distillation: small models can also \u201cthink\u201d step-by-step. In: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 
2023, 2665\u20132679"},{"key":"50990_CR46","doi-asserted-by":"publisher","first-page":"6030","DOI":"10.18653\/v1\/2024.findings-emnlp.350","volume-title":"Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024","author":"Y Yue","year":"2024","unstructured":"Yue Y, Wang C, Huang J, Wang P. Distilling instruction-following abilities of large language models with task-aware curriculum planning. In: Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024. 2024, 6030\u20136054"},{"key":"50990_CR47","first-page":"431","volume-title":"Proceedings of the 31st International Conference on Computational Linguistics: Industry Track","author":"Y Yue","year":"2025","unstructured":"Yue Y, Wang C, Huang J, Wang P. Building a family of data augmentation models for low-cost LLM fine-tuning on the cloud. In: Proceedings of the 31st International Conference on Computational Linguistics: Industry Track. 2025, 431\u2013444"},{"key":"50990_CR48","doi-asserted-by":"publisher","first-page":"1028","DOI":"10.18653\/v1\/2024.acl-long.58","volume-title":"Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Z Yang","year":"2024","unstructured":"Yang Z, Pang T, Feng H, Wang H, Chen W, Zhu M, Liu Q. Self-distillation bridges distribution gap in language model fine-tuning. In: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2024, 1028\u20131043"},{"key":"50990_CR49","first-page":"1954","volume-title":"Proceedings of the 41st International Conference on Machine Learning","author":"Z Tang","year":"2024","unstructured":"Tang Z, Zhang X, Wang B, Wei F. MathScale: scaling instruction tuning for mathematical reasoning. In: Proceedings of the 41st International Conference on Machine Learning. 
2024, 1954"},{"key":"50990_CR50","first-page":"709","volume-title":"Proceedings of the 41st International Conference on Machine Learning","author":"A Havrilla","year":"2024","unstructured":"Havrilla A, Raparthy S, Nalmpantis C, Dwivedi-Yu J, Zhuravynski M, Hambro E, Raileanu R. GLoRe: when, where, and how to improve LLM reasoning via global and local refinements. In: Proceedings of the 41st International Conference on Machine Learning. 2024, 709"},{"key":"50990_CR51","first-page":"3003","volume-title":"Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics","author":"S Qiao","year":"2024","unstructured":"Qiao S, Zhang N, Fang R, Luo Y, Zhou W, Jiang Y E, Lv C, Chen H. AutoAct: automatic agent learning from scratch for QA via self-planning. In: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics. 2024, 3003\u20133021"},{"key":"50990_CR52","unstructured":"Li C, Dong G, Xue M, Peng R, Wang X, Liu D. DotaMath: decomposition of thought with code assistance and self-correction for mathematical reasoning. 2024, arXiv preprint arXiv: 2407.04078"},{"key":"50990_CR53","unstructured":"Song Y, Yin D, Yue X, Huang J, Li S, Lin B Y. Trial and error: exploration-based trajectory optimization for LLM agents. 2024, arXiv preprint arXiv: 2403.02502"},{"key":"50990_CR54","unstructured":"Motwani S R, Smith C, Das R J, Rybchuk M, Torr P H S, Laptev I, Pizzati F, Clark R, de Witt C S. MALT: improving reasoning with multiagent LLM training. 2024, arXiv preprint arXiv: 2412.01928"},{"key":"50990_CR55","first-page":"1","volume-title":"Proceedings of the 13th International Conference on Learning Representations","author":"A Kumar","year":"2025","unstructured":"Kumar A, Zhuang V, Agarwal R, Su Y, Co-Reyes J D, Singh A, Baumli K, Iqbal S, Bishop C, Roelofs R, Zhang L M, McKinney K, Shrivastava D, Paduraru C, Tucker G, Precup D, Behbahani F, Faust A. Training language models to self-correct via reinforcement learning. 
In: Proceedings of the 13th International Conference on Learning Representations. 2025, 1\u201327"},{"key":"50990_CR56","doi-asserted-by":"publisher","first-page":"10495","DOI":"10.18653\/v1\/2025.findings-acl.547","volume-title":"Proceedings of the Findings of the Association for Computational Linguistics: ACL 2025","author":"Z Zhang","year":"2025","unstructured":"Zhang Z, Zheng C, Wu Y, Zhang B, Lin R, Yu B, Liu D, Zhou J, Lin J. The lessons of developing process reward models in mathematical reasoning. In: Proceedings of the Findings of the Association for Computational Linguistics: ACL 2025. 2025, 10495\u201310516"},{"key":"50990_CR57","first-page":"1","volume-title":"Proceedings of the 13th International Conference on Learning Representations","author":"H Luo","year":"2025","unstructured":"Luo H, Sun Q, Xu C, Zhao P, Lou J G, Tao C, Geng X, Lin Q, Chen S, Tang Y, Zhang D. WizardMath: empowering mathematical reasoning for large language models via Reinforced Evol-Instruct. In: Proceedings of the 13th International Conference on Learning Representations. 2025, 1\u201337"},{"key":"50990_CR58","doi-asserted-by":"publisher","first-page":"9426","DOI":"10.18653\/v1\/2024.acl-long.510","volume-title":"Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"P Wang","year":"2024","unstructured":"Wang P, Li L, Shao Z, Xu R, Dai D, Li Y, Chen D, Wu Y, Sui Z. Math-shepherd: verify and reinforce LLMs step-by-step without human annotations. In: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2024, 9426\u20139439"},{"key":"50990_CR59","doi-asserted-by":"publisher","first-page":"7309","DOI":"10.18653\/v1\/2024.findings-emnlp.429","volume-title":"Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024","author":"Z Wang","year":"2024","unstructured":"Wang Z, Li Y, Wu Y, Luo L, Hou L, Yu H, Shang J. 
Multi-step problem solving through a verifier: An empirical analysis on model-induced process supervision. In: Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024. 2024, 7309\u20137319"},{"key":"50990_CR60","first-page":"2066","volume-title":"Proceedings of the 38th International Conference on Neural Information Processing Systems","author":"D Zhang","year":"2024","unstructured":"Zhang D, Zhoubian S, Hu Z, Yue Y, Dong Y, Tang J. ReST-MCTS*: LLM self-training via process reward guided tree search. In: Proceedings of the 38th International Conference on Neural Information Processing Systems. 2024, 2066"},{"key":"50990_CR61","doi-asserted-by":"publisher","first-page":"13659","DOI":"10.18653\/v1\/2024.acl-long.738","volume-title":"Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Z Chen","year":"2024","unstructured":"Chen Z, White M, Mooney R, Payani A, Su Y, Sun H. When is tree search useful for LLM planning? it depends on the discriminator. In: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2024, 13659\u201313678"},{"key":"50990_CR62","unstructured":"Wang C, Yan J, Zhang W, Huang J. Towards better parameter-efficient fine-tuning for large language models: a position paper. 2023, arXiv preprint arXiv: 2311.13126"},{"key":"50990_CR63","unstructured":"Han Z, Gao C, Liu J, Zhang J, Zhang S Q. Parameter-efficient finetuning for large models: a comprehensive survey. 2024, arXiv preprint arXiv: 2403.14608"},{"key":"50990_CR64","first-page":"1","volume-title":"Proceedings of the 10th International Conference on Learning Representations","author":"E J Hu","year":"2022","unstructured":"Hu E J, Shen Y, Wallis P, Allen-Zhu Z, Li Y, Wang S, Wang L, Chen W. LoRA: low-rank adaptation of large language models. In: Proceedings of the 10th International Conference on Learning Representations. 
2022, 1\u201313"},{"key":"50990_CR65","first-page":"441","volume-title":"Proceedings of the 37th International Conference on Neural Information Processing Systems","author":"T Dettmers","year":"2023","unstructured":"Dettmers T, Pagnoni A, Holtzman A, Zettlemoyer L. QLORA: efficient finetuning of quantized LLMs. In: Proceedings of the 37th International Conference on Neural Information Processing Systems. 2023, 441"},{"key":"50990_CR66","first-page":"1","volume-title":"Proceedings of the 11th International Conference on Learning Representations","author":"Q Zhang","year":"2023","unstructured":"Zhang Q, Chen M, Bukharin A, He P, Cheng Y, Chen W, Zhao T. Adaptive budget allocation for parameter-efficient fine-tuning. In: Proceedings of the 11th International Conference on Learning Representations. 2023, 1\u201317"},{"key":"50990_CR67","first-page":"1279","volume-title":"Proceedings of the 20th European Conference on Computer Systems","author":"G Sheng","year":"2024","unstructured":"Sheng G, Zhang C, Ye Z, Wu X, Zhang W, Zhang R, Peng Y, Lin H, Wu C. HybridFlow: a flexible and efficient RLHF framework. In: Proceedings of the 20th European Conference on Computer Systems. 2024, 1279\u20131297"},{"key":"50990_CR68","unstructured":"Mei Z, Fu W, Li K, Wang G, Zhang H, Wu Y. ReaLHF: optimized RLHF training for large language models through parameter reallocation. 2024, arXiv preprint arXiv: 2406.14088"},{"key":"50990_CR69","volume-title":"Proceedings of the 8th Conference on Machine Learning and Systems","author":"Z Mei","year":"2025","unstructured":"Mei Z, Fu W, Li K, Wang G, Zhang H, Wu Y. Real: efficient RLHF training of large language models with parameter reallocation. In: Proceedings of the 8th Conference on Machine Learning and Systems. 2025"},{"key":"50990_CR70","doi-asserted-by":"crossref","unstructured":"Wen L, Cai Y, Xiao F, He X, An Q, Duan Z, Du Y, Liu J, Tang L, Lv X, Zou H, Deng Y, Jia S, Zhang X. 
Light-R1: curriculum SFT, DPO and RL for long cot from scratch and beyond. 2025, arXiv preprint arXiv: 2503.10460","DOI":"10.18653\/v1\/2025.acl-industry.24"},{"key":"50990_CR71","unstructured":"Hu J, Wu X, Wang W, Xianyu, Zhang D, Cao Y. OpenRLHF: an easy-to-use, scalable and high-performance RLHF framework. 2024, arXiv preprint arXiv: 2405.11143"},{"key":"50990_CR72","first-page":"2011","volume-title":"Proceedings of the 36th International Conference on Neural Information Processing Systems","author":"L Ouyang","year":"2022","unstructured":"Ouyang L, Wu J, Jiang X, Almeida D, Wainwright C L, Mishkin P, Zhang C, Agarwal S, Slama K, Ray A, Schulman J, Hilton J, Kelton F, Miller L, Simens M, Askell A, Welinder P, Christiano P, Leike J, Lowe R. Training language models to follow instructions with human feedback. In: Proceedings of the 36th International Conference on Neural Information Processing Systems. 2022, 2011"},{"key":"50990_CR73","unstructured":"Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O. Proximal policy optimization algorithms. 2017, arXiv preprint arXiv: 1707.06347"},{"key":"50990_CR74","unstructured":"Bai Y, Kadavath S, Kundu S, Askell A, Kernion J, Jones A, Chen A, Goldie A, Mirhoseini A, McKinnon C, Chen C, Olsson C, Olah C, Hernandez D, Drain D, Ganguli D, Li D, Tran-Johnson E, Perez E, Kerr J, Mueller J, Ladish J, Landau J, Ndousse K, Lukosuite K, Lovitt L, Sellitto M, Elhage N, Schiefer N, Mercado N, DasSarma N, Lasenby R, Larson R, Ringer S, Johnston S, Kravec S, El Showk S, Fort S, Lanham T, Telleen-Lawton T, Conerly T, Henighan T, Hume T, Bowman S R, Hatfield-Dodds Z, Mann B, Amodei D, Joseph N, McCandlish S, Brown T, Kaplan J. Constitutional AI: harmlessness from AI feedback. 
2022, arXiv preprint arXiv: 2212.08073"},{"key":"50990_CR75","first-page":"2338","volume-title":"Proceedings of the 37th International Conference on Neural Information Processing Systems","author":"R Rafailov","year":"2023","unstructured":"Rafailov R, Sharma A, Mitchell E, Ermon S, Manning C D, Finn C. Direct preference optimization: your language model is secretly a reward model. In: Proceedings of the 37th International Conference on Neural Information Processing Systems. 2023, 2338"},{"key":"50990_CR76","unstructured":"Ethayarajh K, Xu W, Muennighoff N, Jurafsky D, Kiela D. KTO: model alignment as prospect theoretic optimization. 2024, arXiv preprint arXiv: 2402.01306"},{"key":"50990_CR77","doi-asserted-by":"publisher","first-page":"9954","DOI":"10.18653\/v1\/2024.findings-acl.592","volume-title":"Proceedings of the Findings of the Association for Computational Linguistics: ACL 2024","author":"A Amini","year":"2024","unstructured":"Amini A, Vieira T, Cotterell R. Direct preference optimization with an offset. In: Proceedings of the Findings of the Association for Computational Linguistics: ACL 2024. 2024, 9954\u20139972"},{"key":"50990_CR78","first-page":"3946","volume-title":"Proceedings of the 38th International Conference on Neural Information Processing Systems","author":"Y Meng","year":"2024","unstructured":"Meng Y, Xia M, Chen D. SimPO: simple preference optimization with a reference-free reward. In: Proceedings of the 38th International Conference on Neural Information Processing Systems. 2024, 3946"},{"key":"50990_CR79","first-page":"7601","volume-title":"Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics","author":"L Q Trung","year":"2024","unstructured":"Trung L Q, Zhang X, Jie Z, Sun P, Jin X, Li H. ReFT: reasoning with reinforced fine-tuning. In: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics. 
2024, 7601\u20137614"},{"key":"50990_CR80","unstructured":"Kazemnejad A, Aghajohari M, Portelance E, Sordoni A, Reddy S, Courville A, Le Roux N. VinePPO: unlocking RL potential for LLM reasoning through refined credit assignment. 2024, arXiv preprint arXiv: 2410.01679"},{"key":"50990_CR81","unstructured":"Wang T, Chen J, Han X, Bai J. CPL: critical plan step learning boosts LLM generalization in reasoning tasks. 2024, arXiv preprint arXiv: 2409.08642"},{"key":"50990_CR82","unstructured":"Hwang H, Kim D, Kim S, Ye S, Seo M. Self-explore to avoid the pit: improving the reasoning capabilities of language models with fine-grained rewards. 2024, arXiv preprint arXiv: 2404.10346"},{"key":"50990_CR83","first-page":"1","volume-title":"Proceedings of the 13th International Conference on Learning Representations","author":"A Setlur","year":"2025","unstructured":"Setlur A, Nagpal C, Fisch A, Geng X, Eisenstein J, Agarwal R, Agarwal A, Berant J, Kumar A. Rewarding progress: scaling automated process verifiers for LLM reasoning. In: Proceedings of the 13th International Conference on Learning Representations. 2025, 1\u201331"},{"key":"50990_CR84","doi-asserted-by":"publisher","first-page":"7889","DOI":"10.18653\/v1\/2024.findings-emnlp.463","volume-title":"Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024","author":"G Chen","year":"2024","unstructured":"Chen G, Liao M, Li C, Fan K. Step-level value preference optimization for mathematical reasoning. In: Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024. 2024, 7889\u20137903"},{"key":"50990_CR85","unstructured":"Xie Y, Goyal A, Zheng W, Kan M Y, Lillicrap T P, Kawaguchi K, Shieh M. Monte Carlo tree search boosts reasoning via iterative preference learning. 2024, arXiv preprint arXiv: 2405.00451"},{"key":"50990_CR86","unstructured":"Wang C, Deng Y, Lyu Z, Zeng L, He J, Yan S, An B. Q*: improving multi-step reasoning for LLMs with deliberative planning. 
2024, arXiv preprint arXiv: 2406.14283"},{"key":"50990_CR87","unstructured":"Guan X, Zhang L L, Liu Y, Shang N, Sun Y, Zhu Y, Yang F, Yang M. rStar-math: small LLMs can master math reasoning with self-evolved deep thinking. 2025, arXiv preprint arXiv: 2501.04519"},{"key":"50990_CR88","first-page":"159","volume-title":"Proceedings of the 34th International Conference on Neural Information Processing Systems","author":"T B Brown","year":"2020","unstructured":"Brown T B, Mann B, Ryder N, Subbiah M, Kaplan J, Dhariwal P, Neelakantan A, Shyam P, Sastry G, Askell A, Agarwal S, Herbert-Voss A, Krueger G, Henighan T, Child R, Ramesh A, Ziegler D M, Wu J, Winter C, Hesse C, Chen M, Sigler E, Litwin M, Gray S, Chess B, Clark J, Berner C, McCandlish S, Radford A, Sutskever I, Amodei D. Language models are few-shot learners. In: Proceedings of the 34th International Conference on Neural Information Processing Systems. 2020, 159"},{"key":"50990_CR89","first-page":"2656","volume-title":"Proceedings of 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"T R Chiang","year":"2019","unstructured":"Chiang T R, Chen Y N. Semantically-aligned equation generation for solving and reasoning math word problems. In: Proceedings of 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019, 2656\u20132668"},{"key":"50990_CR90","unstructured":"Nye M I, Andreassen A J, Gur-Ari G, Michalewski H, Austin J, Bieber D, Dohan D, Lewkowycz A, Bosma M, Luan D, Sutton C, Odena A. Show your work: Scratchpads for intermediate computation with language models. 2021, arXiv preprint arXiv: 2112.00114"},{"key":"50990_CR91","unstructured":"Snell C, Lee J, Xu K, Kumar A. Scaling LLM test-time compute optimally can be more effective than scaling model parameters. 
2024, arXiv preprint arXiv: 2408.03314"},{"key":"50990_CR92","unstructured":"Wu Y, Sun Z, Li S, Welleck S, Yang Y. An empirical analysis of compute-optimal inference for problem-solving with language models. 2024, arXiv preprint arXiv: 2408.00724"},{"key":"50990_CR93","first-page":"1","volume-title":"Proceedings of the 11th International Conference on Learning Representations","author":"X Wang","year":"2023","unstructured":"Wang X, Wei J, Schuurmans D, Le Q V, Chi E H, Narang S, Chowdhery A, Zhou D. Self-consistency improves chain of thought reasoning in language models. In: Proceedings of the 11th International Conference on Learning Representations. 2023, 1\u201324"},{"key":"50990_CR94","first-page":"517","volume-title":"Proceedings of the 37th International Conference on Neural Information Processing Systems","author":"S Yao","year":"2023","unstructured":"Yao S, Yu D, Zhao J, Shafran I, Griffiths T L, Cao Y, Narasimhan K. Tree of thoughts: Deliberate problem solving with large language models. In: Proceedings of the 37th International Conference on Neural Information Processing Systems. 2023, 517"},{"key":"50990_CR95","first-page":"1797","volume-title":"Proceedings of the 41st International Conference on Machine Learning","author":"B Sel","year":"2024","unstructured":"Sel B, Al-Tawaha A, Khattar V, Jia R, Jin M. Algorithm of thoughts: enhancing exploration of ideas in large language models. In: Proceedings of the 41st International Conference on Machine Learning. 2024, 1797"},{"key":"50990_CR96","first-page":"1","volume-title":"Proceedings of the 12th International Conference on Learning Representations","author":"X Ning","year":"2024","unstructured":"Ning X, Lin Z, Zhou Z, Wang Z, Yang H, Wang Y. Skeleton-of-thought: prompting LLMs for efficient parallel generation. In: Proceedings of the 12th International Conference on Learning Representations. 2024, 1\u201351"},{"key":"50990_CR97","unstructured":"Long J. Large language model guided tree-of-thought. 
2023, arXiv preprint arXiv: 2305.08291"},{"key":"50990_CR98","first-page":"467","volume-title":"Proceedings of the 41st International Conference on Machine Learning","author":"Y Du","year":"2024","unstructured":"Du Y, Li S, Torralba A, Tenenbaum J B, Mordatch I. Improving factuality and reasoning in language models through multiagent debate. In: Proceedings of the 41st International Conference on Machine Learning. 2024, 467"},{"key":"50990_CR99","doi-asserted-by":"publisher","first-page":"17889","DOI":"10.18653\/v1\/2024.emnlp-main.992","volume-title":"Proceedings of 2024 Conference on Empirical Methods in Natural Language Processing","author":"T Liang","year":"2024","unstructured":"Liang T, He Z, Jiao W, Wang X, Wang Y, Wang R, Yang Y, Shi S, Tu Z. Encouraging divergent thinking in large language models through multi-agent debate. In: Proceedings of 2024 Conference on Empirical Methods in Natural Language Processing. 2024, 17889\u201317904"},{"key":"50990_CR100","first-page":"257","volume-title":"Proceedings of 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)","author":"Z Wang","year":"2024","unstructured":"Wang Z, Mao S, Wu W, Ge T, Wei F, Ji H. Unleashing the emergent cognitive synergy in large language models: a task-solving agent through multi-persona self-collaboration. In: Proceedings of 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). 2024, 257\u2013279"},{"key":"50990_CR101","doi-asserted-by":"publisher","first-page":"14544","DOI":"10.18653\/v1\/2024.acl-long.782","volume-title":"Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"J Zhang","year":"2024","unstructured":"Zhang J, Xu X, Zhang N, Liu R, Hooi B, Deng S. Exploring collaboration mechanisms for LLM agents: a social psychology view. 
In: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2024, 14544\u201314607"},{"key":"50990_CR102","doi-asserted-by":"publisher","first-page":"15174","DOI":"10.18653\/v1\/2024.acl-long.810","volume-title":"Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"C Qian","year":"2024","unstructured":"Qian C, Liu W, Liu H, Chen N, Dang Y, Li J, Yang C, Chen W, Su Y, Cong X, Xu J, Li D, Liu Z, Sun M. ChatDev: communicative agents for software development. In: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2024, 15174\u201315186"},{"key":"50990_CR103","first-page":"1","volume-title":"Proceedings of the 12th International Conference on Learning Representations","author":"S Hong","year":"2024","unstructured":"Hong S, Zhuge M, Chen J, Zheng X, Cheng Y, Wang J, Zhang C, Wang Z, Yau S K S, Lin Z, Zhou L, Ran C, Xiao L, Wu C, Schmidhuber J. MetaGPT: meta programming for a multi-agent collaborative framework. In: Proceedings of the 12th International Conference on Learning Representations. 2024, 1\u201329"},{"key":"50990_CR104","first-page":"1","volume-title":"Proceedings of the 12th International Conference on Learning Representations","author":"S Holt","year":"2024","unstructured":"Holt S, Luyten M R, van der Schaar M. L2MAC: large language model automatic computer for extensive code generation. In: Proceedings of the 12th International Conference on Learning Representations. 2024, 1\u201361"},{"key":"50990_CR105","unstructured":"Wu Q, Bansal G, Zhang J, Wu Y, Zhang S, Zhu E, Li B, Jiang L, Zhang X, Wang C. AutoGen: enabling next-gen LLM applications via multi-agent conversation framework. 2023, arXiv preprint arXiv: 2308.08155"},{"key":"50990_CR106","unstructured":"Yan Y, Zhang Y, Huang K. 
Depending on yourself when you should: mentoring LLM with RL agents to become the master in cybersecurity games. 2024, arXiv preprint arXiv: 2403.17674"},{"key":"50990_CR107","first-page":"627","volume-title":"Proceedings of the 33rd International Joint Conference on Artificial Intelligence","author":"Z Zhou","year":"2024","unstructured":"Zhou Z, Hu B, Zhao C, Zhang P, Liu B. Large language model as a policy teacher for training reinforcement learning agents. In: Proceedings of the 33rd International Joint Conference on Artificial Intelligence. 2024, 627"},{"key":"50990_CR108","doi-asserted-by":"publisher","first-page":"14165","DOI":"10.18653\/v1\/2023.acl-long.792","volume-title":"Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"D Jiang","year":"2023","unstructured":"Jiang D, Ren X, Lin B Y. LLM-blender: ensembling large language models with pairwise ranking and generative fusion. In: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2023, 14165\u201314178"},{"key":"50990_CR109","first-page":"2597","volume-title":"Proceedings of the 41st International Conference on Machine Learning","author":"M Zhuge","year":"2024","unstructured":"Zhuge M, Wang W, Kirsch L, Faccio F, Khizbullin D, Schmidhuber J. GPTSwarm: language agents as optimizable graphs. In: Proceedings of the 41st International Conference on Machine Learning. 2024, 2597"},{"key":"50990_CR110","doi-asserted-by":"publisher","first-page":"24013","DOI":"10.18653\/v1\/2025.acl-long.1170","volume-title":"Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Z Wang","year":"2025","unstructured":"Wang Z, Wang Y, Liu X, Ding L, Zhang M, Liu J, Zhang M. AgentDropout: dynamic agent elimination for token-efficient and high-performance LLM-based multi-agent collaboration. 
In: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2025, 24013\u201324035"},{"key":"50990_CR111","first-page":"11567","volume-title":"Proceedings of 2025 Conference on Empirical Methods in Natural Language Processing","author":"S Wang","year":"2025","unstructured":"Wang S, Tan Z, Chen Z, Zhou S, Chen T, Li J. AnyMAC: cascading flexible multi-agent collaboration via next-agent prediction. In: Proceedings of 2025 Conference on Empirical Methods in Natural Language Processing. 2025, 11567\u20131157"},{"key":"50990_CR112","unstructured":"Jiang M, Ruan Y, Lastras L, Kapanipathi P, Hashimoto T. Putting it all into context: simplifying agents with LCLMs. 2025, arXiv preprint arXiv: 2505.08120"},{"key":"50990_CR113","unstructured":"Zheng C, Liu Z, Xie E, Li Z, Li Y. Progressive-hint prompting improves reasoning in large language models. 2023, arXiv preprint arXiv: 2304.09797"},{"key":"50990_CR114","unstructured":"Liu Z, Zhang Y, Li P, Liu Y, Yang D. Dynamic LLM-agent network: an LLM-agent collaboration framework with agent team optimization. 2023, arXiv preprint arXiv: 2310.02170"},{"key":"50990_CR115","unstructured":"Shinn N, Labash B, Gopinath A. Reflexion: an autonomous agent with dynamic memory and self-reflection. 2023, arXiv preprint arXiv: 2303.11366"},{"key":"50990_CR116","unstructured":"Fu Y, Peng H, Khot T, Lapata M. Improving language model negotiation with self-play and in-context learning from AI feedback. 2023, arXiv preprint arXiv: 2305.10142"},{"key":"50990_CR117","doi-asserted-by":"publisher","first-page":"1720","DOI":"10.18653\/v1\/2024.findings-naacl.112","volume-title":"Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024","author":"P Chen","year":"2024","unstructured":"Chen P, Zhang S, Han B. CoMM: collaborative multi-agent, multi-reasoning-path prompting for complex problem solving. 
In: Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024. 2024, 1720\u20131738"},{"issue":"2","key":"50990_CR118","doi-asserted-by":"publisher","first-page":"483","DOI":"10.1007\/s10994-022-06286-6","volume":"112","author":"E Pesce","year":"2023","unstructured":"Pesce E, Montana G. Learning multi-agent coordination through connectivity-driven communication. Machine Learning, 2023, 112(2): 483\u2013514","journal-title":"Machine Learning"},{"key":"50990_CR119","first-page":"743","volume-title":"Proceedings of the 44th Annual Meeting of the Cognitive Science Society","author":"Y Liu","year":"2022","unstructured":"Liu Y, Dou Y, Li Y, Xu X, Liu D. Temporal dynamic weighted graph convolution for multi-agent reinforcement learning. In: Proceedings of the 44th Annual Meeting of the Cognitive Science Society. 2022, 743\u2013749"},{"key":"50990_CR120","first-page":"1","volume-title":"Proceedings of the 12th International Conference on Learning Representations","author":"S Hu","year":"2024","unstructured":"Hu S, Shen L, Zhang Y, Tao D. Learning multi-agent communication from graph modeling perspective. In: Proceedings of the 12th International Conference on Learning Representations. 2024, 1\u201316"},{"key":"50990_CR121","first-page":"1","volume-title":"Proceedings of the 12th International Conference on Learning Representations","author":"C M Chan","year":"2024","unstructured":"Chan C M, Chen W, Su Y, Yu J, Xue W, Zhang S, Fu J, Liu Z. ChatEval: towards better LLM-based evaluators through multi-agent debate. In: Proceedings of the 12th International Conference on Learning Representations. 2024, 1\u201315"},{"key":"50990_CR122","unstructured":"Zelikman E, Lorch E, Mackey L, Kalai A T. Self-taught optimizer (STOP): recursively self-improving code generation. 
2023, arXiv preprint arXiv: 2310.02304"},{"key":"50990_CR123","unstructured":"Khattab O, Singhvi A, Maheshwari P, Zhang Z, Santhanam K, Vardhamanan S, Haq S, Sharma A, Joshi T T, Moazam H, Miller H, Zaharia M, Potts C. DSPy: compiling declarative language model calls into self-improving pipelines. 2023, arXiv preprint arXiv: 2310.03714"},{"key":"50990_CR124","first-page":"1","volume-title":"Proceedings of the 13th International Conference on Learning Representations","author":"C Qian","year":"2025","unstructured":"Qian C, Xie Z, Wang Y, Liu W, Zhu K, Xia H, Dang Y, Du Z, Chen W, Yang C, Liu Z, Sun M. Scaling large language model-based multi-agent collaboration. In: Proceedings of the 13th International Conference on Learning Representations. 2025, 1\u201318"},{"key":"50990_CR125","first-page":"1910","volume-title":"Proceedings of the 36th International Conference on Neural Information Processing Systems","author":"G Lample","year":"2022","unstructured":"Lample G, Lachaux M A, Lavril T, Martinet X, Hayat A, Ebner G, Rodriguez A, Lacroix T. HyperTree proof search for neural theorem proving. In: Proceedings of the 36th International Conference on Neural Information Processing Systems. 2022, 1910"},{"key":"50990_CR126","unstructured":"Bi Z, Han K, Liu C, Tang Y, Wang Y. Forest-of-thought: scaling test-time compute for enhancing LLM reasoning. 2024, arXiv preprint arXiv: 2412.09078"},{"key":"50990_CR127","volume-title":"Proceedings of the 6th Conference on Machine Learning and Systems","author":"R Pope","year":"2023","unstructured":"Pope R, Douglas S, Chowdhery A, Devlin J, Bradbury J, Heek J, Xiao K, Agrawal S, Dean J. Efficiently scaling transformer inference. In: Proceedings of the 6th Conference on Machine Learning and Systems. 2023"},{"key":"50990_CR128","unstructured":"Chen Y, Pan X, Li Y, Ding B, Zhou J. A simple and provable scaling law for the test-time compute of large language models. 
2024, arXiv preprint arXiv: 2411.19477"},{"key":"50990_CR129","first-page":"1","volume-title":"Proceedings of the 11th International Conference on Learning Representations","author":"S Welleck","year":"2023","unstructured":"Welleck S, Lu X, West P, Brahman F, Shen T, Khashabi D, Choi Y. Generating sequences by learning to self-correct. In: Proceedings of the 11th International Conference on Learning Representations. 2023, 1\u201319"},{"key":"50990_CR130","first-page":"2019","volume-title":"Proceedings of the 37th International Conference on Neural Information Processing Systems","author":"A Madaan","year":"2023","unstructured":"Madaan A, Tandon N, Gupta P, Hallinan S, Gao L, Wiegreffe S, Alon U, Dziri N, Prabhumoye S, Yang Y, Gupta S, Majumder B P, Hermann K, Welleck S, Yazdanbakhsh A, Clark P. SELF-REFINE: iterative refinement with self-feedback. In: Proceedings of the 37th International Conference on Neural Information Processing Systems. 2023, 2019"},{"key":"50990_CR131","first-page":"1","volume-title":"Proceedings of the 12th International Conference on Learning Representations","author":"X Chen","year":"2024","unstructured":"Chen X, Lin M, Sch\u00e4rli N, Zhou D. Teaching large language models to self-debug. In: Proceedings of the 12th International Conference on Learning Representations. 2024, 1\u201381"},{"key":"50990_CR132","unstructured":"Chen J, Ren J, Chen X, Yang C, Sun R, Arik S \u00d6. SETS: leveraging self-verification and self-correction for improved test-time scaling. 2025, arXiv preprint arXiv: 2501.19306"},{"key":"50990_CR133","unstructured":"Hou Z, Lv X, Lu R, Zhang J, Li Y, Yao Z, Li J, Tang J, Dong Y. T1: advancing language model reasoning through reinforcement learning and inference scaling. 2025, arXiv preprint arXiv: 2501.11651"},{"key":"50990_CR134","unstructured":"Lee K H, Fischer I, Wu Y H, Marwood D, Baluja S, Schuurmans D, Chen X. Evolving deeper LLM thinking. 
2025, arXiv preprint arXiv: 2501.09891"},{"key":"50990_CR135","doi-asserted-by":"publisher","first-page":"14035","DOI":"10.18653\/v1\/2023.emnlp-main.867","volume-title":"Proceedings of 2023 Conference on Empirical Methods in Natural Language Processing","author":"S Choi","year":"2023","unstructured":"Choi S, Fang T, Wang Z, Song Y. KCTS: knowledge-constrained tree search decoding with token-level hallucination detection. In: Proceedings of 2023 Conference on Empirical Methods in Natural Language Processing. 2023, 14035\u201314053"},{"key":"50990_CR136","first-page":"1","volume-title":"Proceedings of the 11th International Conference on Learning Representations","author":"S Zhang","year":"2023","unstructured":"Zhang S, Chen Z, Shen Y, Ding M, Tenenbaum J B, Gan C. Planning with large language models for code generation. In: Proceedings of the 11th International Conference on Learning Representations. 2023, 1\u201328"},{"key":"50990_CR137","first-page":"2572","volume-title":"Proceedings of the 41st International Conference on Machine Learning","author":"A Zhou","year":"2024","unstructured":"Zhou A, Yan K, Shlapentokh-Rothman M, Wang H, Wang Y X. Language agent tree search unifies reasoning, acting, and planning in language models. In: Proceedings of the 41st International Conference on Machine Learning. 2024, 2572"},{"key":"50990_CR138","first-page":"1802","volume-title":"Proceedings of the 37th International Conference on Neural Information Processing Systems","author":"Y Xie","year":"2023","unstructured":"Xie Y, Kawaguchi K, Zhao Y, Zhao J X, Kan M Y, He J, Xie M Q. Self-evaluation guided beam search for reasoning. In: Proceedings of the 37th International Conference on Neural Information Processing Systems. 2023, 1802"},{"key":"50990_CR139","unstructured":"Gandhi K, Lee D, Grand G, Liu M, Cheng W, Sharma A, Goodman N D. Stream of search (SoS): learning to search in language. 
2024, arXiv preprint arXiv: 2404.03683"},{"key":"50990_CR140","unstructured":"Xin H, Guo D, Shao Z, Ren Z, Zhu Q, Liu B, Ruan C, Li W, Liang X. DeepSeek-prover: advancing theorem proving in LLMs through large-scale synthetic data. 2024, arXiv preprint arXiv: 2405.14333"},{"key":"50990_CR141","unstructured":"Ankner Z, Paul M, Cui B, Chang J D, Ammanabrolu P. Critique-out-loud reward models. 2024, arXiv preprint arXiv: 2408.11791"},{"key":"50990_CR142","unstructured":"Wan Y, Wu J, Abdulhai M, Shani L, Jaques N. Enhancing personalized multi-turn dialogue with curiosity reward. 2025, arXiv preprint arXiv: 2504.03206"},{"key":"50990_CR143","doi-asserted-by":"crossref","unstructured":"Heo D, Rim D N, Choi H. Dynamic preference multi-objective reinforcement learning for internet network management. 2025, arXiv preprint arXiv: 2506.13153","DOI":"10.24135\/ICONIP24"},{"key":"50990_CR144","doi-asserted-by":"publisher","first-page":"11214","DOI":"10.18653\/v1\/2025.acl-long.549","volume-title":"Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"C Li","year":"2025","unstructured":"Li C, Zhang H, Xu Y, Xue H, Ao X, He Q. Gradient-adaptive policy optimization: towards multi-objective alignment of large language models. In: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 
2025, 11214\u201311232"},{"key":"50990_CR145","doi-asserted-by":"publisher","first-page":"49","DOI":"10.18653\/v1\/2023.wiesp-1.7","volume-title":"Proceedings of the 2nd Workshop on Information Extraction from Scientific Publications","author":"T D Nguyen","year":"2023","unstructured":"Nguyen T D, Ting Y S, Ciuc\u0103 I, O\u2019Neill C, Sun Z C, Jab\u0142onska M, Kruk S, Perkowski E, Miller J, Li J J, Peek J, Iyer K, R\u00f3\u017ca\u0144ski T, Khetarpal P, Zaman S, Brodrick D, Rodr\u00edguez M\u00e9ndez S J, Bui T, Goodman A, Accomazzi A, Naiman J, Cranney J, Schawinski K, R\u0103ileanu R, UniverseTBD. AstroLLaMA: towards specialized foundation models in astronomy. In: Proceedings of the 2nd Workshop on Information Extraction from Scientific Publications. 2023, 49\u201355"},{"key":"50990_CR146","first-page":"4489","volume-title":"Proceedings of the ACM Web Conference","author":"K Yang","year":"2024","unstructured":"Yang K, Zhang T, Kuang Z, Xie Q, Huang J, Ananiadou S. MentaLLaMA: interpretable mental health analysis on social media with large language models. In: Proceedings of the ACM Web Conference. 2024, 4489\u20134500"},{"key":"50990_CR147","unstructured":"Zhang D, Hu Z, Zhoubian S, Du Z, Yang K, Wang Z, Yue Y, Dong Y, Tang J. SciGLM: training scientific language models with self-reflective instruction annotation and tuning. 2024, arXiv preprint arXiv: 2401.07950"},{"key":"50990_CR148","unstructured":"Zhang D, Liu W, Tan Q, Chen J, Yan H, Yan Y, Li J, Huang W, Yue X, Zhou D, Zhang S, Su M, Zhong H, Li Y, Ouyang W. ChemLLM: a chemical large language model. 2024, arXiv preprint arXiv: 2402.06852"},{"key":"50990_CR149","unstructured":"Acikgoz E C, Ince O B, Bench R, Boz A A, Kesen I, Erdem A, Erdem E. Hippocrates: an open-source framework for advancing large language models in healthcare. 2024, arXiv preprint arXiv: 2404.16621"},{"key":"50990_CR150","unstructured":"Yang Y, Sun H, Li J, Liu R, Li Y, Liu Y, Huang H, Gao Y. 
MindLLM: pre-training lightweight large language model from scratch, evaluations and domain applications. 2023, arXiv preprint arXiv: 2310.15777"},{"key":"50990_CR151","doi-asserted-by":"publisher","first-page":"460","DOI":"10.18653\/v1\/2021.findings-acl.40","volume-title":"Proceedings of the Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021","author":"Y Yao","year":"2021","unstructured":"Yao Y, Huang S, Wang W, Dong L, Wei F. Adapt-and-distill: developing small, fast and effective pretrained language models for domains. In: Proceedings of the Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021. 2021, 460\u2013470"},{"key":"50990_CR152","first-page":"954","volume-title":"Proceedings of 2024 Conference on Empirical Methods in Natural Language Processing","author":"O Dige","year":"2024","unstructured":"Dige O, Arneja D, Yau T F, Zhang Q, Bolandraftar M, Zhu X, Khattak F K. Can machine unlearning reduce social bias in language models. In: Proceedings of 2024 Conference on Empirical Methods in Natural Language Processing. 2024, 954\u2013969"},{"key":"50990_CR153","unstructured":"Schmidgall S, Harris C, Essien I, Olshvang D, Rahman T, Kim J W, Ziaei R, Eshraghian J, Abadir P, Chellappa R. Addressing cognitive bias in medical language models. 2024, arXiv preprint arXiv: 2402.08113"},{"key":"50990_CR154","doi-asserted-by":"publisher","first-page":"14653","DOI":"10.18653\/v1\/2024.emnlp-main.812","volume-title":"Proceedings of 2024 Conference on Empirical Methods in Natural Language Processing","author":"M M Manerba","year":"2024","unstructured":"Manerba M M, Stanczak K, Guidotti R, Augenstein I. Social bias probing: fairness benchmarking for language models. In: Proceedings of 2024 Conference on Empirical Methods in Natural Language Processing. 
2024, 14653\u201314671"},{"key":"50990_CR155","first-page":"588","volume-title":"Proceedings of the 17th International Conference on Agents and Artificial Intelligence-Volume 1: ICAART","author":"N Upreti","year":"2025","unstructured":"Upreti N, Ciupa J, Belle V. Towards developing ethical reasoners: Integrating probabilistic reasoning and decision-making for complex AI systems. In: Proceedings of the 17th International Conference on Agents and Artificial Intelligence-Volume 1: ICAART. 2025, 588\u2013599"},{"key":"50990_CR156","unstructured":"Zhao Z, Jin Q, Yu S. PMC-patients: a large-scale dataset of patient notes and relations extracted from case reports in PubMed central. 2022, arXiv preprint arXiv: 2202.13876"},{"key":"50990_CR157","first-page":"2567","volume-title":"Proceedings of 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)","author":"Q Jin","year":"2019","unstructured":"Jin Q, Dhingra B, Liu Z, Cohen W W, Lu X. PubMedQA: a dataset for biomedical research question answering. In: Proceedings of 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019, 2567\u20132577"},{"key":"50990_CR158","unstructured":"Chen Z, Hern\u00e1ndez Cano A, Romanou A, Bonnet A, Matoba K, Salvi F, Pagliardini M, Fan S, K\u00f6pf A, Mohtashami A, Sallinen A, Sakhaeirad A, Swamy V, Krawczuk I, Bayazit D, Marmet A, Montariol S, Hartley M A, Jaggi M, Bosselut A. MEDITRON-70B: scaling medical pretraining for large language models. 2023, arXiv preprint arXiv: 2311.16079"},{"key":"50990_CR159","unstructured":"Bolton E, Venigalla A, Yasunaga M, Hall D, Xiong B, Lee T, Daneshjou R, Frankle J, Liang P, Carbin M, Manning C D. BioMedLM: a 2.7B parameter language model trained on biomedical text. 
2024, arXiv preprint arXiv: 2403.18421"},{"key":"50990_CR160","unstructured":"Gao L, Biderman S, Black S, Golding L, Hoppe T, Foster C, Phang J, He H, Thite A, Nabeshima N, Presser S, Leahy C. The pile: an 800GB dataset of diverse text for language modeling. 2021, arXiv preprint arXiv: 2101.00027"},{"key":"50990_CR161","unstructured":"OpenAI. GPT-4 technical report. 2023, arXiv preprint arXiv: 2303.08774"},{"key":"50990_CR162","first-page":"1","volume-title":"Proceedings of the 12th International Conference on Learning Representations","author":"Z Azerbayev","year":"2024","unstructured":"Azerbayev Z, Schoelkopf H, Paster K, Santos M D, McAleer S M, Jiang A Q, Deng J, Biderman S, Welleck S. Llemma: an open language model for mathematics. In: Proceedings of the 12th International Conference on Learning Representations. 2024, 1\u201328"},{"key":"50990_CR163","unstructured":"Rozi\u00e8re B, Gehring J, Gloeckle F, Sootla S, Gat I, Tan X E, Adi Y, Liu J, Remez T, Rapin J, Kozhevnikov A, Evtimov I, Bitton J, Bhatt M, Canton Ferrer C, Grattafiori A, Xiong W, D\u00e9fossez A, Copet J, Azhar F, Touvron H, Martin L, Usunier N, Scialom T, Synnaeve G. Code Llama: open foundation models for code. 2023, arXiv preprint arXiv: 2308.12950"},{"key":"50990_CR164","first-page":"1","volume-title":"Proceedings of the 9th International Conference on Learning Representations","author":"D Hendrycks","year":"2021","unstructured":"Hendrycks D, Burns C, Basart S, Zou A, Mazeika M, Song D, Steinhardt J. Measuring massive multitask language understanding. In: Proceedings of the 9th International Conference on Learning Representations. 2021, 1\u201327"},{"key":"50990_CR165","unstructured":"Cobbe K, Kosaraju V, Bavarian M, Chen M, Jun H, Kaiser L, Plappert M, Tworek J, Hilton J, Nakano R, Hesse C, Schulman J. Training verifiers to solve math word problems. 
2021, arXiv preprint arXiv: 2110.14168"},{"key":"50990_CR166","first-page":"278","volume-title":"Proceedings of the 36th International Conference on Neural Information Processing Systems","author":"A Lewkowycz","year":"2022","unstructured":"Lewkowycz A, Andreassen A, Dohan D, Dyer E, Michalewski H, Ramasesh V V, Slone A, Anil C, Schlag I, Gutman-Solo T, Wu Y, Neyshabur B, Gur-Ari G, Misra V. Solving quantitative reasoning problems with language models. In: Proceedings of the 36th International Conference on Neural Information Processing Systems. 2022, 278"},{"key":"50990_CR167","unstructured":"Cai Z, Cao M, Chen H, Chen K, Chen K, Chen X, Chen X, Chen Z, Chen Z, Chu P, Dong X, Duan H, Fan Q, Fei Z, Gao Y, Ge J, Gu C, Gu Y, Gui T, Guo A, Guo Q, He C, Hu Y, Huang T, Jiang T, Jiao P, Jin Z, Lei Z, Li J, Li J, Li L, Li S, Li W, Li Y, Liu H, Liu J, Hong J, Liu K, Liu K, Liu X, Lv C, Lv H, Lv K, Ma L, Ma R, Ma Z, Ning W, Ouyang L, Qiu J, Qu Y, Shang F, Shao Y, Song D, Song Z, Sui Z, Sun P, Sun Y, Tang H, Wang B, Wang G, Wang J, Wang J, Wang R, Wang Y, Wang Z, Wei X, Weng Q, Wu F, Xiong Y, Xu C, Xu R, Yan H, Yan Y, Yang X, Ye H, Ying H, Yu J, Yu J, Zang Y, Zhang C, Zhang L, Zhang P, Zhang P, Zhang R, Zhang S, Zhang S, Zhang W, Zhang W, Zhang X, Zhang X, Zhao H, Zhao Q, Zhao X, Zhou F, Zhou Z, Zhuo J, Zou Y, Qiu X, Qiao Y, Lin D. InternLM2 technical report. 
2024, arXiv preprint arXiv: 2403.17297"},{"key":"50990_CR168","unstructured":"Touvron H, Martin L, Stone K, Albert P, Almahairi A, Babaei Y, Bashlykov N, Batra S, Bhargava P, Bhosale S, Bikel D, Blecher L, Canton Ferrer C, Chen M, Cucurull G, Esiobu D, Fernandes J, Fu J, Fu W, Fuller B, Gao C, Goswami V, Goyal N, Hartshorn A, Hosseini S, Hou R, Inan H, Kardas M, Kerkez V, Khabsa M, Kloumann I, Korenev A, Koura P S, Lachaux M A, Lavril T, Lee J, Liskovich D, Lu Y, Mao Y, Martinet X, Mihaylov T, Mishra P, Molybog I, Nie Y, Poulton A, Reizenstein J, Rungta R, Saladi K, Schelten A, Silva R, Smith E M, Subramanian R, Tan X E, Tang B, Taylor R, Williams A, Kuan J X, Xu P, Yan Z, Zarov I, Zhang Y, Fan A, Kambadur M, Narang S, Rodriguez A, Stojnic R, Edunov S, Scialom T. Llama 2: Open foundation and fine-tuned chat models. 2023, arXiv preprint arXiv: 2307.09288"},{"key":"50990_CR169","unstructured":"Gunasekar S, Zhang Y, Aneja J, Mendes C C T, Giorno A D, Gopi S, Javaheripi M, Kauffmann P, de Rosa G, Saarikivi O, Salim A, Shah S, Behl H S, Wang X, Bubeck S, Eldan R, Kalai A T, Lee Y T, Li Y. Textbooks are all you need. 
2023, arXiv preprint arXiv: 2306.11644"},{"key":"50990_CR170","unstructured":"Abdin M, Jacobs S A, Awan A A, Aneja J, Awadallah A, Awadalla H, Bach N, Bahree A, Bakhtiari A, Behl H, Benhaim A, Bilenko M, Bjorck J, Bubeck S, Cai M, Mendes C C T, Chen W, Chaudhary V, Chopra P, Giorno A D, de Rosa G, Dixon M, Eldan R, Iter D, Garg A, Goswami A, Gunasekar S, Haider E, Hao J, Hewett R J, Huynh J, Javaheripi M, Jin X, Kauffmann P, Karampatziakis N, Kim D, Khademi M, Kurilenko L, Lee J R, Lee Y T, Li Y, Liang C, Liu W, Lin E, Lin Z, Madan P, Mitra A, Modi H, Nguyen A, Norick B, Patra B, Perez-Becker D, Portet T, Pryzant R, Qin H, Radmilac M, Rosset C, Roy S, Ruwase O, Saarikivi O, Saied A, Salim A, Santacroce M, Shah S, Shang N, Sharma H, Song X, Tanaka M, Wang X, Ward R, Wang G, Witte P A, Wyatt M, Xu C, Xu J, Yadav S, Yang F, Yang Z, Yu D, Zhang C, Zhang C, Zhang J, Zhang L L, Zhang Y, Zhang Y, Zhang Y, Zhou X. Phi-3 technical report: a highly capable language model locally on your phone. 2024, arXiv preprint arXiv: 2404.14219"},{"key":"50990_CR171","unstructured":"Dubey A, Jauhri A, Pandey A, Kadian A, Al-Dahle A, et al. The Llama 3 herd of models. 2024, arXiv preprint arXiv: 2407.21783"},{"key":"50990_CR172","unstructured":"Gemma Team. Gemma 2: improving open language models at a practical size. 2024, arXiv preprint arXiv: 2408.00118"},{"key":"50990_CR173","unstructured":"Rozi\u00e8re B, Gehring J, Gloeckle F, Sootla S, Gat I, Tan X E, Adi Y, Liu J, Sauvestre R, Remez T, Rapin J, Kozhevnikov A, Evtimov I, Bitton J, Bhatt M, Ferrer C C, Grattafiori A, Xiong W, D\u00e9fossez A, Copet J, Azhar F, Touvron H, Martin L, Usunier N, Scialom T, Synnaeve G. Code Llama: open foundation models for code. 2024, arXiv preprint arXiv: 2308.12950"},{"key":"50990_CR174","unstructured":"Zhou Z, Shi J X, Song P X, Yang X W, Jin Y X, Guo L Z, Li Y F. LawGPT: a Chinese legal knowledge-enhanced large language model. 
2024, arXiv preprint arXiv: 2406.04614"},{"key":"50990_CR175","unstructured":"Huang Q, Tao M, An Z, Zhang C, Jiang C, Chen Z, Wu Z, Feng Y. Lawyer LLaMA technical report. 2023, arXiv preprint arXiv: 2305.15062"},{"key":"50990_CR176","unstructured":"Cui J, Li Z, Yan Y, Chen B, Yuan L. ChatLaw: open-source legal large language model with integrated external knowledge bases. 2023, arXiv preprint arXiv: 2306.16092"},{"key":"50990_CR177","unstructured":"Liu Z, Guo X, Lou F, Zeng L, Niu J, Wang Z, Xu J, Cai W, Yang Z, Zhao X, Li C, Xu S, Chen D, Chen Y, Bai Z, Zhang L. Fin-R1: a large language model for financial reasoning through reinforcement learning. 2025, arXiv preprint arXiv: 2503.16252"},{"issue":"9","key":"50990_CR178","doi-asserted-by":"publisher","first-page":"243","DOI":"10.1007\/s10462-024-10896-y","volume":"57","author":"Z Lin","year":"2024","unstructured":"Lin Z, Guan S, Zhang W, Zhang H, Li Y, Zhang H. Towards trustworthy LLMs: a review on debiasing and dehallucinating in large language models. Artificial Intelligence Review, 2024, 57(9): 243","journal-title":"Artificial Intelligence Review"},{"key":"50990_CR179","doi-asserted-by":"publisher","first-page":"56","DOI":"10.18653\/v1\/2025.trustnlp-main.5","volume-title":"Proceedings of the 5th Workshop on Trustworthy NLP (TrustNLP 2025)","author":"G Liu","year":"2025","unstructured":"Liu G, Xue Z, Zhang X, Wang R, Johnson K. Smaller large language models can do moral self-correction. In: Proceedings of the 5th Workshop on Trustworthy NLP (TrustNLP 2025). 2025, 56\u201365"},{"key":"50990_CR180","first-page":"6614","volume-title":"Proceedings of 2024 IEEE International Conference on Big Data","author":"K Nakagawa","year":"2024","unstructured":"Nakagawa K, Hirano M, Fujimoto Y. Evaluating company-specific biases in financial sentiment analysis using large language models. In: Proceedings of 2024 IEEE International Conference on Big Data. 
2024, 6614\u20136623"},{"issue":"1","key":"50990_CR181","doi-asserted-by":"publisher","first-page":"428","DOI":"10.1038\/s41746-025-01790-0","volume":"8","author":"A Mahajan","year":"2025","unstructured":"Mahajan A, Obermeyer Z, Daneshjou R, Lester J, Powell D. Cognitive bias in clinical large language models. npj Digital Medicine, 2025, 8(1): 428","journal-title":"npj Digital Medicine"},{"key":"50990_CR182","first-page":"1","volume-title":"Proceedings of the 13th International Conference on Learning Representations","author":"S Wang","year":"2025","unstructured":"Wang S, Wang P, Zhou T, Dong Y, Tan Z, Li J. CEB: compositional evaluation benchmark for fairness in large language models. In: Proceedings of the 13th International Conference on Learning Representations. 2025, 1\u201342"},{"key":"50990_CR183","first-page":"7075","volume-title":"Proceedings of 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"A Li","year":"2025","unstructured":"Li A, Zhao J, Liang B, Gui L, Wang H, Zeng X, Liang X, Wong K F, Xu R. Mitigating biases of large language models in stance detection with counterfactual augmented calibration. In: Proceedings of 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies. 2025, 7075\u20137092"},{"key":"50990_CR184","doi-asserted-by":"publisher","first-page":"5999","DOI":"10.18653\/v1\/2024.emnlp-main.344","volume-title":"Proceedings of 2024 Conference on Empirical Methods in Natural Language Processing","author":"R D Martinez","year":"2024","unstructured":"Martinez R D, Goriely Z, Caines A, Buttery P, Beinborn L. Mitigating frequency bias and anisotropy in language model pre-training with syntactic smoothing. In: Proceedings of 2024 Conference on Empirical Methods in Natural Language Processing. 
2024, 5999\u20136011"},{"key":"50990_CR185","doi-asserted-by":"publisher","first-page":"109698","DOI":"10.1016\/j.compeleceng.2024.109698","volume":"120","author":"H Kibriya","year":"2024","unstructured":"Kibriya H, Khan W Z, Siddiqa A, Khan M K. Privacy issues in large language models: a survey. Computers and Electrical Engineering, 2024, 120: 109698","journal-title":"Computers and Electrical Engineering"},{"issue":"1","key":"50990_CR186","doi-asserted-by":"publisher","first-page":"47","DOI":"10.20532\/cit.2024.1005778","volume":"32","author":"Z Gao","year":"2024","unstructured":"Gao Z, Liu X, Lan Y, Yang Z. A brief survey on safety of large language models. CIT Journal of Computing and Information Technology, 2024, 32(1): 47\u201364","journal-title":"CIT Journal of Computing and Information Technology"},{"key":"50990_CR187","unstructured":"Fan M, Chen C, Wang C, Huang J. On the trustworthiness landscape of state-of-the-art generative models: a comprehensive survey. 2023, arXiv preprint arXiv: 2307.16680"},{"key":"50990_CR188","first-page":"1","volume-title":"Proceedings of the 12th International Conference on Learning Representations","author":"J Liu","year":"2024","unstructured":"Liu J, Gong R, Wei X, Dong Z, Cai J, Zhuang B. QLLM: accurate and efficient low-bitwidth quantization for large language models. In: Proceedings of the 12th International Conference on Learning Representations. 2024, 1\u201323"},{"key":"50990_CR189","unstructured":"Guo Y, Kong F, Li X, Li H, Chen W, Tian X, Cai J, Zhang Y, Liu S. decoupleQ: towards 2-bit post-training uniform quantization via decoupling parameters into integer and floating points. 
2024, arXiv preprint arXiv: 2404.12759"},{"key":"50990_CR190","doi-asserted-by":"publisher","first-page":"4483","DOI":"10.18653\/v1\/2025.acl-long.225","volume-title":"Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"J Zhao","year":"2025","unstructured":"Zhao J, Zhang M, Wang M, Shang Y, Zhang K, Guan W, Wang Y, Zhang M. PTQ1.61: push the real limit of extremely low-bit post-training quantization methods for large language models. In: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2025, 4483\u20134502"},{"key":"50990_CR191","unstructured":"Liu Z, Zhao C, Huang H, Chen S, Zhang J, Zhao J, Roy S, Jin L, Xiong Y, Shi Y, Xiao L, Tian Y, Soran B, Krishnamoorthi R, Blankevoort T, Chandra V. ParetoQ: scaling laws in extremely low-bit LLM quantization. 2025, arXiv preprint arXiv: 2502.02631"},{"key":"50990_CR192","doi-asserted-by":"publisher","first-page":"2002","DOI":"10.18653\/v1\/2025.acl-long.99","volume-title":"Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"H Jeon","year":"2025","unstructured":"Jeon H, Kim Y, Kim J J. L4Q: parameter efficient quantization-aware fine-tuning on large language models. In: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2025, 2002\u20132024"},{"key":"50990_CR193","first-page":"1","volume-title":"Proceedings of the 13th International Conference on Learning Representations","author":"G Zhang","year":"2025","unstructured":"Zhang G, Yue Y, Li Z, Yun S, Wan G, Wang K, Cheng D, Yu J X, Chen T. Cut the crap: an economical communication pipeline for LLM-based multi-agent systems. In: Proceedings of the 13th International Conference on Learning Representations. 2025, 1\u201340"},{"key":"50990_CR194","unstructured":"Liu L, Qu Z, Chen Z, Ding Y, Xie Y. 
Transformer acceleration with dynamic sparse attention. 2021, arXiv preprint arXiv: 2110.11299"},{"key":"50990_CR195","unstructured":"Liu R, Sun Y, Zhang M, Bai H, Yu X, Yu T, Yuan C, Hou L. Quantization hurts reasoning? An empirical study on quantized reasoning models. 2025, arXiv preprint arXiv: 2504.04823"},{"key":"50990_CR196","unstructured":"Africa D D, Weiss Y, Buttery P, Martinez R D. Learning dynamics of meta-learning in small model pretraining. 2025, arXiv preprint arXiv: 2508.02189"},{"key":"50990_CR197","unstructured":"Yang W, Yue X, Chaudhary V, Han X. Speculative thinking: enhancing small-model reasoning with large model guidance at inference time. 2025, arXiv preprint arXiv: 2504.12329"},{"key":"50990_CR198","doi-asserted-by":"publisher","first-page":"40","DOI":"10.5220\/0013123800003896","volume-title":"Proceedings of the 13th International Conference on Model-Based Software and Systems Engineering","author":"N Sinani","year":"2025","unstructured":"Sinani N, Salma S, Boutot P, Mustafiz S. Towards a domain-specific modelling environment for reinforcement learning. In: Proceedings of the 13th International Conference on Model-Based Software and Systems Engineering. 2025, 40\u201351"},{"key":"50990_CR199","unstructured":"Cheng Z, Hao S, Liu T, Zhou F, Xie Y, Yao F, Bian Y, Zhuang Y, Dey N, Zha Y, Gu Y, Zhou K, Wang Y, Li Y, Fan R, She J, Gao C, Saparov A, Li H, Killian T W, Yurochkin M, Liu Z, Xing E P, Hu Z. Revisiting reinforcement learning for LLM reasoning from a cross-domain perspective. 2025, arXiv preprint arXiv: 2506.14965"},{"key":"50990_CR200","doi-asserted-by":"crossref","unstructured":"Aralimatti R, Shakhadri S A G, KR K, Angadi K B. Fine-tuning small language models for domain-specific AI: an edge AI perspective. 2025, arXiv preprint arXiv: 2503.01933","DOI":"10.20944\/preprints202502.2128.v1"},{"key":"50990_CR201","unstructured":"Zhou H, Li X, Wang R, Cheng M, Zhou T, Hsieh C J. 
R1-zero\u2019s \u201caha moment\u201d in visual reasoning on a 2B non-SFT model. 2025, arXiv preprint arXiv: 2503.05132"},{"key":"50990_CR202","unstructured":"Shen H, Liu P, Li J, Fang C, Ma Y, Liao J, Shen Q, Zhang Z, Zhao K, Zhang Q, Xu R, Zhao T. VLM-R1: a stable and generalizable R1-style large vision-language model. 2025, arXiv preprint arXiv: 2504.07615"},{"key":"50990_CR203","unstructured":"Liu Z, Sun Z, Zang Y, Dong X, Cao Y, Duan H, Lin D, Wang J. Visual-RFT: visual reinforcement fine-tuning. 2025, arXiv preprint arXiv: 2503.01785"},{"key":"50990_CR204","first-page":"1","volume-title":"Proceedings of 2024 IEEE Wireless Communications and Networking Conference","author":"X Zhang","year":"2024","unstructured":"Zhang X, Liu J, Xiong Z, Huang Y, Xie G, Zhang R. Edge intelligence optimization for large language model inference with batching and quantization. In: Proceedings of 2024 IEEE Wireless Communications and Networking Conference. 2024, 1\u20136"},{"key":"50990_CR205","first-page":"1","volume-title":"Proceedings of 2025 IEEE International Symposium on Circuits and Systems","author":"Y Hu","year":"2025","unstructured":"Hu Y, Yuan Z, Gao W, Zhang S, Liu Y. An integer-only quantization framework for edge deployment of large language models. In: Proceedings of 2025 IEEE International Symposium on Circuits and Systems. 2025, 1\u20135"},{"key":"50990_CR206","unstructured":"Kandala S V, Medaranga P, Varshney A. TinyLLM: a framework for training and deploying language models at the edge computers. 
2024, arXiv preprint arXiv: 2412.15304"}],"container-title":["Frontiers of Computer Science"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11704-025-50990-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11704-025-50990-0","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11704-025-50990-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,2,20]],"date-time":"2026-02-20T00:05:59Z","timestamp":1771545959000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11704-025-50990-0"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,2,19]]},"references-count":206,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2026,11]]}},"alternative-id":["50990"],"URL":"https:\/\/doi.org\/10.1007\/s11704-025-50990-0","relation":{},"ISSN":["2095-2228","2095-2236"],"issn-type":[{"value":"2095-2228","type":"print"},{"value":"2095-2236","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,2,19]]},"assertion":[{"value":"9 July 2025","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"9 September 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"19 February 2026","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"The authors declare that they have no competing interests or financial conflicts to disclose.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"2011366"}}