{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,30]],"date-time":"2026-04-30T17:40:47Z","timestamp":1777570847102,"version":"3.51.4"},"reference-count":236,"publisher":"Association for Computing Machinery (ACM)","issue":"8","funder":[{"DOI":"10.13039\/501100001809","name":"NSFC","doi-asserted-by":"crossref","award":["62495093"],"award-info":[{"award-number":["62495093"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Jiangsu Science Foundation","award":["BK20243039"],"award-info":[{"award-number":["BK20243039"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Comput. Surv."],"published-print":{"date-parts":[[2026,6,30]]},"abstract":"<jats:p>Mathematical reasoning has long represented one of the most fundamental and challenging frontiers in artificial intelligence research. In recent years, large language models (LLMs) have achieved significant advances in this area. This survey examines the development of mathematical reasoning abilities in LLMs through two high-level cognitive phases: comprehension, where models gain mathematical understanding via diverse pretraining strategies, and answer generation, which has progressed from direct prediction to step-by-step Chain-of-Thought (CoT) reasoning. We review methods for enhancing mathematical reasoning, ranging from training-free prompting to fine-tuning approaches such as supervised fine-tuning and reinforcement learning, and discuss recent work on extended CoT and \u201ctest-time scaling\u201d. Despite notable progress, fundamental challenges remain in terms of capacity, efficiency, and generalization. To address these issues, we highlight promising research directions, including advanced pretraining and knowledge augmentation techniques, formal reasoning frameworks, and meta-generalization through principled learning paradigms. 
This survey tries to provide some insights for researchers interested in enhancing reasoning capabilities of LLMs and for those seeking to apply these techniques to other domains.<\/jats:p>","DOI":"10.1145\/3786333","type":"journal-article","created":{"date-parts":[[2025,12,25]],"date-time":"2025-12-25T12:07:58Z","timestamp":1766664478000},"page":"1-35","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["A Survey on Large Language Models for Mathematical Reasoning"],"prefix":"10.1145","volume":"58","author":[{"ORCID":"https:\/\/orcid.org\/0009-0006-1545-9457","authenticated-orcid":false,"given":"Peng-Yuan","family":"Wang","sequence":"first","affiliation":[{"name":"National Key Laboratory for Novel Software Technology, School of Artificial Intelligence, Nanjing University","place":["Nanjing, China"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-2515-3200","authenticated-orcid":false,"given":"Tian-Shuo","family":"Liu","sequence":"additional","affiliation":[{"name":"National Key Laboratory for Novel Software Technology, School of Artificial Intelligence, Nanjing University","place":["Nanjing, China"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-0526-1321","authenticated-orcid":false,"given":"Chenyang","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Artificial Intelligence, Nanjing University","place":["Nanjing, China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0449-002X","authenticated-orcid":false,"given":"Ziniu","family":"Li","sequence":"additional","affiliation":[{"name":"The Chinese University of Hong Kong - Shenzhen","place":["Shenzhen, China"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-0494-9292","authenticated-orcid":false,"given":"Yidi","family":"Wang","sequence":"additional","affiliation":[{"name":"National Key Laboratory for Novel Software Technology, School of Artificial Intelligence, Nanjing University","place":["Nanjing, 
China"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-5491-5779","authenticated-orcid":false,"given":"Shu","family":"Yan","sequence":"additional","affiliation":[{"name":"School of Computer Science, Nanjing University","place":["Nanjing, China"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-3421-606X","authenticated-orcid":false,"given":"Chengxing","family":"Jia","sequence":"additional","affiliation":[{"name":"National Key Laboratory for Novel Software Technology, School of Artificial Intelligence, Nanjing University","place":["Nanjing, China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2663-7600","authenticated-orcid":false,"given":"Xu-Hui","family":"Liu","sequence":"additional","affiliation":[{"name":"National Key Laboratory for Novel Software Technology, School of Artificial Intelligence, Nanjing University","place":["Nanjing, China"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-6453-8535","authenticated-orcid":false,"given":"Xinwei","family":"Chen","sequence":"additional","affiliation":[{"name":"polixir","place":["Nanjing, China"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-4315-6976","authenticated-orcid":false,"given":"Jiacheng","family":"Xu","sequence":"additional","affiliation":[{"name":"Nanyang Technological University","place":["Singapore, Singapore"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1052-5447","authenticated-orcid":false,"given":"Yang","family":"Yu","sequence":"additional","affiliation":[{"name":"National Key Laboratory for Novel Software Technology, School of Artificial Intelligence, Nanjing University","place":["Nanjing, China"]}]}],"member":"320","published-online":{"date-parts":[[2026,2,4]]},"reference":[{"key":"e_1_3_4_2_2","article-title":"SIKeD: Self-guided iterative knowledge distillation for mathematical reasoning","volume":"2410","author":"Adarsh Shivam","year":"2024","unstructured":"Shivam Adarsh, Kumar Shridhar, Caglar Gulcehre, Nicholas Monath, and Mrinmaya Sachan. 2024. 
SIKeD: Self-guided iterative knowledge distillation for mathematical reasoning. CoRR abs\/2410.18574 (2024).","journal-title":"CoRR"},{"key":"e_1_3_4_3_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2024.acl-long.662"},{"key":"e_1_3_4_4_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2024.eacl-srw.17"},{"key":"e_1_3_4_5_2","volume-title":"Proceedings of the 41st International Conference on Machine Learning","author":"Allen-Zhu Zeyuan","year":"2024","unstructured":"Zeyuan Allen-Zhu and Yuanzhi Li. 2024. Physics of language models: Part 3.1, knowledge storage and extraction. In Proceedings of the 41st International Conference on Machine Learning."},{"key":"e_1_3_4_6_2","article-title":"Mathqa: Towards interpretable math word problem solving with operation-based formalisms","volume":"1905","author":"Amini Aida","year":"2019","unstructured":"Aida Amini, Saadia Gabriel, Peter Lin, Rik Koncel-Kedziorski, Yejin Choi, and Hannaneh Hajishirzi. 2019. Mathqa: Towards interpretable math word problem solving with operation-based formalisms. CoRR abs\/1905.13319 (2019).","journal-title":"CoRR"},{"key":"e_1_3_4_7_2","article-title":"Learning from mistakes makes LLM better reasoner","volume":"2310","author":"An Shengnan","year":"2023","unstructured":"Shengnan An, Zexiong Ma, Zeqi Lin, Nanning Zheng, Jian-Guang Lou, and Weizhu Chen. 2023. Learning from mistakes makes LLM better reasoner. CoRR abs\/2310.20689 (2023).","journal-title":"CoRR"},{"key":"e_1_3_4_8_2","article-title":"Critique-out-loud reward models","volume":"2408","author":"Ankner Zachary","year":"2024","unstructured":"Zachary Ankner, Mansheej Paul, Brandon Cui, Jonathan D. Chang, and Prithviraj Ammanabrolu. 2024. Critique-out-loud reward models. CoRR abs\/2408.11791 (2024).","journal-title":"CoRR"},{"key":"e_1_3_4_9_2","volume-title":"Advances in Neural Information Processing Systems 31","author":"Anthony Thomas","year":"2017","unstructured":"Thomas Anthony, Zheng Tian, and David Barber. 2017. 
Thinking fast and slow with deep learning and tree search. In Advances in Neural Information Processing Systems 31."},{"key":"e_1_3_4_10_2","article-title":"Chain-of-thought reasoning in the wild is not always faithful","volume":"2503","author":"Arcuschin Iv\u00e1n","year":"2025","unstructured":"Iv\u00e1n Arcuschin, Jett Janiak, Robert Krzyzanowski, Senthooran Rajamanoharan, Neel Nanda, and Arthur Conmy. 2025. Chain-of-thought reasoning in the wild is not always faithful. CoRR abs\/2503.08679 (2025).","journal-title":"CoRR"},{"key":"e_1_3_4_11_2","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511804090"},{"key":"e_1_3_4_12_2","article-title":"A latent variable model approach to PMI-based word embeddings","volume":"1502","author":"Arora Sanjeev","year":"2019","unstructured":"Sanjeev Arora, Yuanzhi Li, Yingyu Liang, Tengyu Ma, and Andrej Risteski. 2019. A latent variable model approach to PMI-based word embeddings. CoRR abs\/1502.03520 (2019).","journal-title":"CoRR"},{"key":"e_1_3_4_13_2","volume-title":"The 12th International Conference on Learning Representations","author":"Asai Akari","year":"2023","unstructured":"Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, and Hannaneh Hajishirzi. 2023. Self-rag: Learning to retrieve, generate, and critique through self-reflection. In The 12th International Conference on Learning Representations."},{"key":"e_1_3_4_14_2","article-title":"Program synthesis with large language models","volume":"2108","author":"Austin Jacob","year":"2021","unstructured":"Jacob Austin, Augustus Odena, Maxwell Nye, Maarten Bosma, Henryk Michalewski, David Dohan, Ellen Jiang, Carrie Cai, Michael Terry, Quoc Le, et\u00a0al. 2021. Program synthesis with large language models. 
CoRR abs\/2108.07732 (2021).","journal-title":"CoRR"},{"key":"e_1_3_4_15_2","volume-title":"The 12th International Conference on Learning Representations","author":"Azerbayev Zhangir","year":"2024","unstructured":"Zhangir Azerbayev, Hailey Schoelkopf, Keiran Paster, Marco Dos Santos, Stephen Marcus McAleer, Albert Q. Jiang, Jia Deng, Stella Biderman, and Sean Welleck. 2024. Llemma: An open language model for mathematics. In The 12th International Conference on Learning Representations."},{"key":"e_1_3_4_16_2","article-title":"Smaller, weaker, yet better: Training LLM reasoners via compute-optimal sampling","volume":"2408","author":"Bansal Hritik","year":"2024","unstructured":"Hritik Bansal, Arian Hosseini, Rishabh Agarwal, Vinh Q. Tran, and Mehran Kazemi. 2024. Smaller, weaker, yet better: Training LLM reasoners via compute-optimal sampling. CoRR abs\/2408.16737 (2024).","journal-title":"CoRR"},{"key":"e_1_3_4_17_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v38i16.29720"},{"key":"e_1_3_4_18_2","article-title":"Forest-of-thought: Scaling test-time compute for enhancing LLM reasoning","volume":"2412","author":"Bi Zhenni","year":"2024","unstructured":"Zhenni Bi, Kai Han, Chuanjian Liu, Yehui Tang, and Yunhe Wang. 2024. Forest-of-thought: Scaling test-time compute for enhancing LLM reasoning. CoRR abs\/2412.09078 (2024).","journal-title":"CoRR"},{"key":"e_1_3_4_19_2","article-title":"Natural language input for a computer problem solving system","author":"Bobrow Daniel","year":"1964","unstructured":"Daniel Bobrow et\u00a0al. 1964. Natural language input for a computer problem solving system. Ph. D. Thesis, Department of Mathematics (1964).","journal-title":"Ph. D. Thesis, Department of Mathematics"},{"key":"e_1_3_4_20_2","article-title":"Large language monkeys: Scaling inference compute with repeated sampling","volume":"2407","author":"Brown Bradley","year":"2024","unstructured":"Bradley Brown, Jordan Juravsky, Ryan Ehrlich, Ronald Clark, Quoc V. 
Le, Christopher R\u00e9, and Azalia Mirhoseini. 2024. Large language monkeys: Scaling inference compute with repeated sampling. CoRR abs\/2407.21787 (2024).","journal-title":"CoRR"},{"key":"e_1_3_4_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCIAIG.2012.2186810"},{"key":"e_1_3_4_23_2","article-title":"Demystifying long chain-of-thought reasoning in LLMs","volume":"2502","author":"Chang Edward Y.","year":"2025","unstructured":"Edward Y. Chang, Yuxuan Tong, Morry Niu, Graham Neubig, and Xiang Yue. 2025. Demystifying long chain-of-thought reasoning in LLMs. CoRR abs\/2502.03373 (2025).","journal-title":"CoRR"},{"key":"e_1_3_4_24_2","doi-asserted-by":"publisher","DOI":"10.1609\/aiide.v4i1.18700"},{"key":"e_1_3_4_25_2","article-title":"xverify: Efficient answer verifier for reasoning model evaluations","volume":"2504","author":"Chen Ding","year":"2025","unstructured":"Ding Chen, Qingchen Yu, Pengyuan Wang, Wentao Zhang, Bo Tang, Feiyu Xiong, Xinchi Li, Minchuan Yang, and Zhiyu Li. 2025. xverify: Efficient answer verifier for reasoning model evaluations. CoRR abs\/2504.10481 (2025).","journal-title":"CoRR"},{"key":"e_1_3_4_26_2","doi-asserted-by":"publisher","DOI":"10.52202\/079017-0870"},{"key":"e_1_3_4_27_2","article-title":"Learning to reason with search for LLMs via reinforcement learning","volume":"2503","author":"Chen Mingyang","year":"2025","unstructured":"Mingyang Chen, Tianpeng Li, Haoze Sun, Yijie Zhou, Chenzheng Zhu, Fan Yang, Zenan Zhou, Weipeng Chen, Haofen Wang, Jeff Z. Pan, et\u00a0al. 2025. 
Learning to reason with search for LLMs via reinforcement learning. CoRR abs\/2503.19470 (2025).","journal-title":"CoRR"},{"key":"e_1_3_4_28_2","article-title":"Towards reasoning era: A survey of long chain-of-thought for reasoning large language models","volume":"2503","author":"Chen Qiguang","year":"2025","unstructured":"Qiguang Chen, Libo Qin, Jinhao Liu, Dengyun Peng, Jiannan Guan, Peng Wang, Mengkang Hu, Yuhang Zhou, Te Gao, and Wanxiang Che. 2025. Towards reasoning era: A survey of long chain-of-thought for reasoning large language models. CoRR abs\/2503.09567 (2025).","journal-title":"CoRR"},{"key":"e_1_3_4_29_2","volume-title":"Proceeding of the 12th International Conference on Learning Representations","author":"Chen Sijia","year":"2024","unstructured":"Sijia Chen, Baochun Li, and Di Niu. 2024. Boosting of thoughts: Trial-and-error problem solving with large language models. In Proceeding of the 12th International Conference on Learning Representations."},{"key":"e_1_3_4_30_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.emnlp-main.489"},{"key":"e_1_3_4_31_2","article-title":"Reasoning models don\u2019t always say what they think","volume":"2505","author":"Chen Yanda","year":"2025","unstructured":"Yanda Chen, Joe Benton, Ansh Radhakrishnan, Jonathan Uesato, Carson Denison, John Schulman, Arushi Somani, Peter Hase, Misha Wagner, Fabien Roger, et\u00a0al. 2025. Reasoning models don\u2019t always say what they think. CoRR abs\/2505.05410 (2025).","journal-title":"CoRR"},{"key":"e_1_3_4_32_2","unstructured":"Ethan Chern Haoyang Zou Xuefeng Li Jiewen Hu Kehua Feng Junlong Li and Pengfei Liu. 2023. Generative AI for Math: Abel. 
Retrieved from https:\/\/github.com\/GAIR-NLP\/abel"},{"key":"e_1_3_4_33_2","article-title":"Training verifiers to solve math word problems","volume":"2110","author":"Cobbe Karl","year":"2021","unstructured":"Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, et\u00a0al. 2021. Training verifiers to solve math word problems. CoRR abs\/2110.14168 (2021).","journal-title":"CoRR"},{"key":"e_1_3_4_34_2","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pbio.1001293"},{"key":"e_1_3_4_35_2","unstructured":"OpenCompass Contributors. 2023. Opencompass: A universal evaluation platform for foundation models. https:\/\/github.com\/opencompass\/opencompass"},{"key":"e_1_3_4_36_2","doi-asserted-by":"publisher","DOI":"10.5555\/1777826.1777833"},{"key":"e_1_3_4_37_2","article-title":"Beyond imitation: Learning key reasoning steps from dual chain-of-thoughts in reasoning distillation","volume":"2405","author":"Dai Chengwei","year":"2024","unstructured":"Chengwei Dai, Kun Li, Wei Zhou, and Songlin Hu. 2024. Beyond imitation: Learning key reasoning steps from dual chain-of-thoughts in reasoning distillation. CoRR abs\/2405.19737 (2024).","journal-title":"CoRR"},{"key":"e_1_3_4_38_2","article-title":"MATHSENSEI: A tool-augmented large language model for mathematical reasoning","volume":"2402","author":"Das Debrup","year":"2024","unstructured":"Debrup Das, Debopriyo Banerjee, Somak Aditya, and Ashish Kulkarni. 2024. MATHSENSEI: A tool-augmented large language model for mathematical reasoning. 
CoRR abs\/2402.17231 (2024).","journal-title":"CoRR"},{"key":"e_1_3_4_39_2","article-title":"Metacognitive capabilities of LLMs: An exploration in mathematical problem solving","volume":"2405","author":"Didolkar Aniket","year":"2024","unstructured":"Aniket Didolkar, Anirudh Goyal, Nan Rosemary Ke, Siyuan Guo, Michal Valko, Timothy Lillicrap, Danilo Rezende, Yoshua Bengio, Michael Mozer, and Sanjeev Arora. 2024. Metacognitive capabilities of LLMs: An exploration in mathematical problem solving. CoRR abs\/2405.12205 (2024).","journal-title":"CoRR"},{"key":"e_1_3_4_40_2","article-title":"Unleashing reasoning capability of LLMs via scalable question synthesis from scratch","volume":"2410","author":"Ding Yuyang","year":"2024","unstructured":"Yuyang Ding, Xinyu Shi, Xiaobo Liang, Juntao Li, Qiaoming Zhu, and Min Zhang. 2024. Unleashing reasoning capability of LLMs via scalable question synthesis from scratch. CoRR abs\/2410.18693 (2024).","journal-title":"CoRR"},{"key":"e_1_3_4_41_2","article-title":"SBI-RAG: Enhancing math word problem solving for students through schema-based instruction and retrieval-augmented generation","volume":"2410","author":"Dixit Prakhar","year":"2024","unstructured":"Prakhar Dixit and Tim Oates. 2024. SBI-RAG: Enhancing math word problem solving for students through schema-based instruction and retrieval-augmented generation. CoRR abs\/2410.13293 (2024).","journal-title":"CoRR"},{"key":"e_1_3_4_42_2","article-title":"The llama 3 herd of models","volume":"2407","author":"Dubey Abhimanyu","year":"2024","unstructured":"Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, et\u00a0al. 2024. The llama 3 herd of models. 
CoRR abs\/2407.21783 (2024).","journal-title":"CoRR"},{"key":"e_1_3_4_43_2","doi-asserted-by":"publisher","DOI":"10.5555\/601134"},{"key":"e_1_3_4_44_2","article-title":"Towards analyzing and understanding the limitations of DPO: A theoretical perspective","volume":"2404","author":"Feng Duanyu","year":"2024","unstructured":"Duanyu Feng, Bowen Qin, Chen Huang, Zheng Zhang, and Wenqiang Lei. 2024. Towards analyzing and understanding the limitations of DPO: A theoretical perspective. CoRR abs\/2404.04626 (2024).","journal-title":"CoRR"},{"key":"e_1_3_4_45_2","volume-title":"Advances in Neural Information Processing Systems 37","author":"Feng Guhao","year":"2023","unstructured":"Guhao Feng, Bohang Zhang, Yuntian Gu, Haotian Ye, Di He, and Liwei Wang. 2023. Towards revealing the mystery behind chain of thought: A theoretical perspective. In Advances in Neural Information Processing Systems 37."},{"key":"e_1_3_4_46_2","article-title":"Step-by-step reasoning for math problems via twisted sequential monte carlo","volume":"2410","author":"Feng Shengyu","year":"2024","unstructured":"Shengyu Feng, Xiang Kong, Shuang Ma, Aonan Zhang, Dong Yin, Chong Wang, Ruoming Pang, and Yiming Yang. 2024. Step-by-step reasoning for math problems via twisted sequential monte carlo. CoRR abs\/2410.01920 (2024).","journal-title":"CoRR"},{"key":"e_1_3_4_47_2","doi-asserted-by":"publisher","DOI":"10.3758\/BF03207654"},{"key":"e_1_3_4_48_2","doi-asserted-by":"publisher","DOI":"10.1152\/physrev.00006.2011"},{"key":"e_1_3_4_49_2","article-title":"Cognitive behaviors that enable self-improving reasoners, or, four habits of highly effective stars","volume":"2503","author":"Gandhi Kanishk","year":"2025","unstructured":"Kanishk Gandhi, Ayush Chakravarthy, Anikait Singh, Nathan Lile, and Noah D. Goodman. 2025. Cognitive behaviors that enable self-improving reasoners, or, four habits of highly effective stars. 
CoRR abs\/2503.01307 (2025).","journal-title":"CoRR"},{"key":"e_1_3_4_50_2","article-title":"LLM critics help catch bugs in mathematics: Towards a better mathematical verifier with natural language feedback","volume":"2406","author":"Gao Bofei","year":"2024","unstructured":"Bofei Gao, Zefan Cai, Runxin Xu, Peiyi Wang, Ce Zheng, Runji Lin, Keming Lu, Junyang Lin, Chang Zhou, Wen Xiao, et\u00a0al. 2024. LLM critics help catch bugs in mathematics: Towards a better mathematical verifier with natural language feedback. CoRR abs\/2406.14024 (2024).","journal-title":"CoRR"},{"key":"e_1_3_4_51_2","volume-title":"Proceedings of the 40th International Conference on Machine Learning","author":"Gao Luyu","year":"2023","unstructured":"Luyu Gao, Aman Madaan, Shuyan Zhou, Uri Alon, Pengfei Liu, Yiming Yang, Jamie Callan, and Graham Neubig. 2023. Pal: Program-aided language models. In Proceedings of the 40th International Conference on Machine Learning."},{"key":"e_1_3_4_52_2","article-title":"Retrieval-augmented generation for large language models: A survey","volume":"2312","author":"Gao Yunfan","year":"2023","unstructured":"Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, Jiawei Sun, Haofen Wang, and Haofen Wang. 2023. Retrieval-augmented generation for large language models: A survey. CoRR abs\/2312.10997 (2023).","journal-title":"CoRR"},{"key":"e_1_3_4_53_2","article-title":"Rlef: Grounding code LLMs in execution feedback with reinforcement learning","volume":"2410","author":"Gehring Jonas","year":"2024","unstructured":"Jonas Gehring, Kunhao Zheng, Jade Copet, Vegard Mella, Taco Cohen, and Gabriel Synnaeve. 2024. Rlef: Grounding code LLMs in execution feedback with reinforcement learning. 
CoRR abs\/2410.02089 (2024).","journal-title":"CoRR"},{"key":"e_1_3_4_54_2","article-title":"Visual description grounding reduces hallucinations and boosts reasoning in lvlms","volume":"2405","author":"Ghosh Sreyan","year":"2024","unstructured":"Sreyan Ghosh, Chandra Kiran Reddy Evuru, Sonal Kumar, Utkarsh Tyagi, Oriol Nieto, Zeyu Jin, and Dinesh Manocha. 2024. Visual description grounding reduces hallucinations and boosts reasoning in lvlms. CoRR abs\/2405.15683 (2024).","journal-title":"CoRR"},{"key":"e_1_3_4_55_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10994-013-5407-y"},{"key":"e_1_3_4_56_2","article-title":"rStar-Math: Small LLMs can master math reasoning with self-evolved deep thinking","volume":"2501","author":"Guan Xinyu","year":"2025","unstructured":"Xinyu Guan, Li Lyna Zhang, Yifei Liu, Ning Shang, Youran Sun, Yi Zhu, Fan Yang, and Mao Yang. 2025. rStar-Math: Small LLMs can master math reasoning with self-evolved deep thinking. CoRR abs\/2501.04519 (2025).","journal-title":"CoRR"},{"key":"e_1_3_4_57_2","article-title":"DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning","volume":"2501","author":"Guo Daya","year":"2025","unstructured":"Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, et\u00a0al. 2025. DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning. CoRR abs\/2501.12948 (2025).","journal-title":"CoRR"},{"key":"e_1_3_4_58_2","article-title":"Learning beyond pattern matching? assaying mathematical understanding in LLMs","volume":"2405","author":"Guo Siyuan","year":"2024","unstructured":"Siyuan Guo, Aniket Didolkar, Nan Rosemary Ke, Anirudh Goyal, Ferenc Husz\u00e1r, and Bernhard Sch\u00f6lkopf. 2024. Learning beyond pattern matching? assaying mathematical understanding in LLMs. 
CoRR abs\/2405.15485 (2024).","journal-title":"CoRR"},{"key":"e_1_3_4_59_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.emnlp-main.507"},{"key":"e_1_3_4_60_2","article-title":"Olympiadbench: A challenging benchmark for promoting agi with olympiad-level bilingual multimodal scientific problems","volume":"2402","author":"He Chaoqun","year":"2024","unstructured":"Chaoqun He, Renjie Luo, Yuzhuo Bai, Shengding Hu, Zhen Leng Thai, Junhao Shen, Jinyi Hu, Xu Han, Yujie Huang, Yuxiang Zhang, et\u00a0al. 2024. Olympiadbench: A challenging benchmark for promoting agi with olympiad-level bilingual multimodal scientific problems. CoRR abs\/2402.14008 (2024).","journal-title":"CoRR"},{"key":"e_1_3_4_61_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2024.emnlp-main.125"},{"key":"e_1_3_4_62_2","article-title":"Teacherlm: Teaching to fish rather than giving the fish, language modeling likewise","volume":"2310","author":"He Nan","year":"2023","unstructured":"Nan He, Hanyu Lai, Chenyang Zhao, Zirui Cheng, Junting Pan, Ruoyu Qin, Ruofan Lu, Rui Lu, Yunchen Zhang, Gangming Zhao, et\u00a0al. 2023. Teacherlm: Teaching to fish rather than giving the fish, language modeling likewise. CoRR abs\/2310.19019 (2023).","journal-title":"CoRR"},{"key":"e_1_3_4_63_2","article-title":"Measuring massive multitask language understanding","volume":"2009","author":"Hendrycks Dan","year":"2020","unstructured":"Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, and Jacob Steinhardt. 2020. Measuring massive multitask language understanding. CoRR abs\/2009.03300 (2020).","journal-title":"CoRR"},{"key":"e_1_3_4_64_2","volume-title":"The 35th Conference on Neural Information Processing Systems Datasets and Benchmarks Track","author":"Hendrycks Dan","year":"2021","unstructured":"Dan Hendrycks, Collin Burns, Saurav Kadavath, Akul Arora, Steven Basart, Eric Tang, Dawn Song, and Jacob Steinhardt. 2021. Measuring mathematical problem solving with the math dataset. 
In The 35th Conference on Neural Information Processing Systems Datasets and Benchmarks Track."},{"key":"e_1_3_4_65_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.acl-long.830"},{"key":"e_1_3_4_66_2","article-title":"V-star: Training verifiers for self-taught reasoners","volume":"2402","author":"Hosseini Arian","year":"2024","unstructured":"Arian Hosseini, Xingdi Yuan, Nikolay Malkin, Aaron Courville, Alessandro Sordoni, and Rishabh Agarwal. 2024. V-star: Training verifiers for self-taught reasoners. CoRR abs\/2402.06457 (2024).","journal-title":"CoRR"},{"key":"e_1_3_4_67_2","article-title":"REINFORCE++: A simple and efficient approach for aligning large language models","volume":"2501","author":"Hu Jian","year":"2025","unstructured":"Jian Hu. 2025. REINFORCE++: A simple and efficient approach for aligning large language models. CoRR abs\/2501.03262 (2025).","journal-title":"CoRR"},{"key":"e_1_3_4_68_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.emnlp-main.67"},{"key":"e_1_3_4_69_2","article-title":"Key-point-driven data synthesis with its enhancement on mathematical reasoning","volume":"2403","author":"Huang Yiming","year":"2024","unstructured":"Yiming Huang, Xiao Liu, Yeyun Gong, Zhibin Gou, Yelong Shen, Nan Duan, and Weizhu Chen. 2024. Key-point-driven data synthesis with its enhancement on mathematical reasoning. CoRR abs\/2403.02333 (2024).","journal-title":"CoRR"},{"key":"e_1_3_4_70_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.acl-industry.4"},{"key":"e_1_3_4_71_2","article-title":"Controlling large language model with latent actions","volume":"2503","author":"Jia Chengxing","year":"2025","unstructured":"Chengxing Jia, Ziniu Li, Pengyuan Wang, Yi-Chen Li, Zhenyu Hou, Yuxiao Dong, and Yang Yu. 2025. Controlling large language model with latent actions. 
CoRR abs\/2503.21383 (2025).","journal-title":"CoRR"},{"key":"e_1_3_4_72_2","article-title":"BWArea Model: Learning world model, inverse dynamics, and policy for controllable language generation","volume":"2405","author":"Jia Chengxing","year":"2024","unstructured":"Chengxing Jia, Pengyuan Wang, Ziniu Li, Yi-Chen Li, Zhilong Zhang, Nan Tang, and Yang Yu. 2024. BWArea Model: Learning world model, inverse dynamics, and policy for controllable language generation. CoRR abs\/2405.17039 (2024).","journal-title":"CoRR"},{"key":"e_1_3_4_73_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2024.findings-acl.397"},{"key":"e_1_3_4_74_2","article-title":"Leveraging training data in few-shot prompting for numerical reasoning","volume":"2305","author":"Jie Zhanming","year":"2023","unstructured":"Zhanming Jie and Wei Lu. 2023. Leveraging training data in few-shot prompting for numerical reasoning. CoRR abs\/2305.18170 (2023).","journal-title":"CoRR"},{"key":"e_1_3_4_75_2","article-title":"Search-R1: Training LLMs to reason and leverage search engines with reinforcement learning","volume":"2503","author":"Jin Bowen","year":"2025","unstructured":"Bowen Jin, Hansi Zeng, Zhenrui Yue, Dong Wang, Hamed Zamani, and Jiawei Han. 2025. Search-R1: Training LLMs to reason and leverage search engines with reinforcement learning. CoRR abs\/2503.09516 (2025).","journal-title":"CoRR"},{"key":"e_1_3_4_76_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2024.findings-acl.108"},{"key":"e_1_3_4_77_2","article-title":"Vineppo: Unlocking RL potential for LLM reasoning through refined credit assignment","volume":"2410","author":"Kazemnejad Amirhossein","year":"2024","unstructured":"Amirhossein Kazemnejad, Milad Aghajohari, Eva Portelance, Alessandro Sordoni, Siva Reddy, Aaron Courville, and Nicolas Le Roux. 2024. Vineppo: Unlocking RL potential for LLM reasoning through refined credit assignment. 
CoRR abs\/2410.01679 (2024).","journal-title":"CoRR"},{"key":"e_1_3_4_78_2","volume-title":"Advances in Neural Information Processing Systems 36","author":"Kojima Takeshi","year":"2022","unstructured":"Takeshi Kojima, Shixiang Shane Gu, Machel Reid, Yutaka Matsuo, and Yusuke Iwasawa. 2022. Large language models are zero-shot reasoners. In Advances in Neural Information Processing Systems 36."},{"key":"e_1_3_4_79_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N16-1136"},{"key":"e_1_3_4_80_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D13-1161"},{"key":"e_1_3_4_81_2","article-title":"Step-DPO: Step-wise preference optimization for long-chain reasoning of LLMs","volume":"2406","author":"Lai Xin","year":"2024","unstructured":"Xin Lai, Zhuotao Tian, Yukang Chen, Senqiao Yang, Xiangru Peng, and Jiaya Jia. 2024. Step-DPO: Step-wise preference optimization for long-chain reasoning of LLMs. CoRR abs\/2406.18629 (2024).","journal-title":"CoRR"},{"key":"e_1_3_4_82_2","article-title":"Beyond A*: Better planning with transformers via search dynamics bootstrapping","volume":"2402","author":"Lehnert Lucas","year":"2024","unstructured":"Lucas Lehnert, Sainbayar Sukhbaatar, DiJia Su, Qinqing Zheng, Paul Mcvay, Michael Rabbat, and Yuandong Tian. 2024. Beyond A*: Better planning with transformers via search dynamics bootstrapping. CoRR abs\/2402.14083 (2024).","journal-title":"CoRR"},{"key":"e_1_3_4_83_2","article-title":"Retrieval-augmented generation to improve math question-answering: Trade-offs between groundedness and human preference","volume":"2310","author":"Levonian Zachary","year":"2023","unstructured":"Zachary Levonian, Chenglu Li, Wangda Zhu, Anoushka Gade, Owen Henkel, Millie-Ellen Postle, and Wanli Xing. 2023. Retrieval-augmented generation to improve math question-answering: Trade-offs between groundedness and human preference. 
CoRR abs\/2310.03184 (2023).","journal-title":"CoRR"},{"key":"e_1_3_4_84_2","article-title":"Solving quantitative reasoning problems with language models","volume":"2206","author":"Lewkowycz Aitor","year":"2022","unstructured":"Aitor Lewkowycz, Anders Andreassen, David Dohan, Ethan Dyer, Henryk Michalewski, Vinay V. Ramasesh, Ambrose Slone, Cem Anil, Imanol Schlag, Theo Gutman-Solo, et\u00a0al. 2022. Solving quantitative reasoning problems with language models. CoRR abs\/2206.14858 (2022).","journal-title":"CoRR"},{"key":"e_1_3_4_85_2","volume-title":"The 39th Annual Conference on Neural Information Processing Systems","author":"Li Chengpeng","year":"2025","unstructured":"Chengpeng Li, Zhengyang Tang, Ziniu Li, Mingfeng Xue, Keqin Bao, Tian Ding, Ruoyu Sun, Benyou Wang, Xiang Wang, Junyang Lin, et\u00a0al. 2025. Teaching language models to reason with tools. In The 39th Annual Conference on Neural Information Processing Systems."},{"key":"e_1_3_4_86_2","article-title":"Common 7b language models already possess strong math capabilities","volume":"2403","author":"Li Chen","year":"2024","unstructured":"Chen Li, Weiqi Wang, Jingcheng Hu, Yixuan Wei, Nanning Zheng, Han Hu, Zheng Zhang, and Houwen Peng. 2024. Common 7b language models already possess strong math capabilities. CoRR abs\/2403.04706 (2024).","journal-title":"CoRR"},{"key":"e_1_3_4_87_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2024.acl-long.551"},{"key":"e_1_3_4_88_2","first-page":"9","article-title":"Numinamath: The largest public dataset in ai4maths with 860k pairs of competition math problems and solutions","author":"Li Jia","year":"2024","unstructured":"Jia Li, Edward Beeching, Lewis Tunstall, Ben Lipkin, Roman Soletskyi, Shengyi Huang, Kashif Rasul, Longhui Yu, Albert Q Jiang, Ziju Shen, et\u00a0al. 2024. Numinamath: The largest public dataset in ai4maths with 860k pairs of competition math problems and solutions. 
Hugging Face Repository 13, 9 (2024), 9.","journal-title":"Hugging Face Repository"},{"key":"e_1_3_4_89_2","article-title":"GSM-Plus: A comprehensive benchmark for evaluating the robustness of LLMs as mathematical problem solvers","volume":"2402","author":"Li Qintong","year":"2024","unstructured":"Qintong Li, Leyang Cui, Xueliang Zhao, Lingpeng Kong, and Wei Bi. 2024. GSM-Plus: A comprehensive benchmark for evaluating the robustness of LLMs as mathematical problem solvers. CoRR abs\/2402.19255 (2024).","journal-title":"CoRR"},{"key":"e_1_3_4_90_2","article-title":"A survey on LLM test-time compute via search: Tasks, LLM profiling, search algorithms, and relevant frameworks","volume":"2501","author":"Li Xinzhe","year":"2025","unstructured":"Xinzhe Li. 2025. A survey on LLM test-time compute via search: Tasks, LLM profiling, search algorithms, and relevant frameworks. CoRR abs\/2501.10069 (2025).","journal-title":"CoRR"},{"key":"e_1_3_4_91_2","article-title":"ToRL: Scaling tool-integrated RL","volume":"2503","author":"Li Xuefeng","year":"2025","unstructured":"Xuefeng Li, Haoyang Zou, and Pengfei Liu. 2025. ToRL: Scaling tool-integrated RL. CoRR abs\/2503.23383 (2025).","journal-title":"CoRR"},{"key":"e_1_3_4_92_2","article-title":"TreePO: Bridging the gap of policy optimization and efficacy and inference efficiency with heuristic tree-based modeling","volume":"2508","author":"Li Yizhi","year":"2025","unstructured":"Yizhi Li, Qingshui Gu, Zhoufutu Wen, Ziniu Li, Tianshun Xing, Shuyue Guo, Tianyu Zheng, Xin Zhou, Xingwei Qu, Wangchunshu Zhou, et\u00a0al. 2025. TreePO: Bridging the gap of policy optimization and efficacy and inference efficiency with heuristic tree-based modeling. 
CoRR abs\/2508.17445 (2025).","journal-title":"CoRR"},{"key":"e_1_3_4_93_2","article-title":"Making large language models better reasoners with step-aware verifier","volume":"2206","author":"Li Yifei","year":"2022","unstructured":"Yifei Li, Zeqi Lin, Shizhuo Zhang, Qiang Fu, Bei Chen, Jian-Guang Lou, and Weizhu Chen. 2022. Making large language models better reasoners with step-aware verifier. CoRR abs\/2206.02336 (2022).","journal-title":"CoRR"},{"key":"e_1_3_4_94_2","article-title":"Generalist reward models: Found inside large language models","volume":"2506","author":"Li Yi-Chen","year":"2025","unstructured":"Yi-Chen Li, Tian Xu, Yang Yu, Xuqin Zhang, Xiong-Hui Chen, Zhongxiang Ling, Ningjing Chao, Lei Yuan, and Zhi-Hua Zhou. 2025. Generalist reward models: Found inside large language models. CoRR abs\/2506.23235 (2025).","journal-title":"CoRR"},{"key":"e_1_3_4_95_2","volume-title":"The 13th International Conference on Learning Representations","author":"Li Ziniu","year":"2025","unstructured":"Ziniu Li, Congliang Chen, Tian Xu, Zeyu Qin, Jiancong Xiao, Zhi-Quan Luo, and Ruoyu Sun. 2025. Preserving diversity in supervised fine-tuning of large language models. In The 13th International Conference on Learning Representations."},{"key":"e_1_3_4_96_2","article-title":"Knapsack RL: Unlocking exploration of LLMs via optimizing budget allocation","volume":"2509","author":"Li Ziniu","year":"2025","unstructured":"Ziniu Li, Congliang Chen, Tianyun Yang, Tian Ding, Ruoyu Sun, Ge Zhang, Wenhao Huang, and Zhi-Quan Luo. 2025. Knapsack RL: Unlocking exploration of LLMs via optimizing budget allocation. CoRR abs\/2509.25849 (2025).","journal-title":"CoRR"},{"key":"e_1_3_4_97_2","unstructured":"Ziniu Li, Pengyuan Wang, Tian Xu, Tian Ding, Ruoyu Sun, and Yang Yu. 2025. Review of Reinforcement Learning for Large Language Models: Formulations, Algorithms, and Opportunities. 
http:\/\/www.liziniu.org\/docs\/RL4LLM_Survey.pdf"},{"key":"e_1_3_4_98_2","volume-title":"The Second Tiny Papers Track at International Conference on Learning Representations","author":"Li Ziniu","year":"2024","unstructured":"Ziniu Li, Tian Xu, and Yang Yu. 2024. When is RL better than DPO in RLHF? A representation and optimization perspective. In The Second Tiny Papers Track at International Conference on Learning Representations."},{"key":"e_1_3_4_99_2","volume-title":"The 41st International Conference on Machine Learning","author":"Li Ziniu","year":"2024","unstructured":"Ziniu Li, Tian Xu, Yushun Zhang, Zhihang Lin, Yang Yu, Ruoyu Sun, and Zhi-Quan Luo. 2024. ReMax: A simple, effective, and efficient reinforcement learning method for aligning large language models. In The 41st International Conference on Machine Learning."},{"key":"e_1_3_4_100_2","article-title":"From system 1 to system 2: A survey of reasoning large language models","volume":"2502","author":"Li Zhong-Zhi","year":"2025","unstructured":"Zhong-Zhi Li, Duzhen Zhang, Ming-Liang Zhang, Jiaxin Zhang, Zengyan Liu, Yuxuan Yao, Haotian Xu, Junhao Zheng, Pei-Jie Wang, Xiuyi Chen, et\u00a0al. 2025. From system 1 to system 2: A survey of reasoning large language models. CoRR abs\/2502.17419 (2025).","journal-title":"CoRR"},{"key":"e_1_3_4_101_2","article-title":"From system 1 to system 2: A survey of reasoning large language models","volume":"2502","author":"Li Zhong-Zhi","year":"2025","unstructured":"Zhong-Zhi Li, Duzhen Zhang, Ming-Liang Zhang, Jiaxin Zhang, Zengyan Liu, Yuxuan Yao, Haotian Xu, Junhao Zheng, Pei-Jie Wang, Xiuyi Chen, et\u00a0al. 2025. From system 1 to system 2: A survey of reasoning large language models. 
CoRR abs\/2502.17419 (2025).","journal-title":"CoRR"},{"key":"e_1_3_4_102_2","doi-asserted-by":"publisher","DOI":"10.1145\/3656580"},{"key":"e_1_3_4_103_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.emnlp-main.889"},{"key":"e_1_3_4_104_2","article-title":"Let\u2019s verify step by step","volume":"2305","author":"Lightman Hunter","year":"2023","unstructured":"Hunter Lightman, Vineet Kosaraju, Yura Burda, Harri Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, and Karl Cobbe. 2023. Let\u2019s verify step by step. CoRR abs\/2305.20050 (2023).","journal-title":"CoRR"},{"key":"e_1_3_4_105_2","article-title":"Plan and budget: Effective and efficient test-time scaling on large language model reasoning","volume":"2505","author":"Lin Junhong","year":"2025","unstructured":"Junhong Lin, Xinyue Zeng, Jie Zhu, Song Wang, Julian Shun, Jun Wu, and Dawei Zhou. 2025. Plan and budget: Effective and efficient test-time scaling on large language model reasoning. CoRR abs\/2505.16122 (2025).","journal-title":"CoRR"},{"key":"e_1_3_4_106_2","article-title":"On the limited generalization capability of the implicit reward model induced by direct preference optimization","volume":"2409","author":"Lin Yong","year":"2024","unstructured":"Yong Lin, Skyler Seto, Maartje ter Hoeve, Katherine Metcalf, Barry-John Theobald, Xuan Wang, Yizhe Zhang, Chen Huang, and Tong Zhang. 2024. On the limited generalization capability of the implicit reward model induced by direct preference optimization. CoRR abs\/2409.03650 (2024).","journal-title":"CoRR"},{"key":"e_1_3_4_107_2","article-title":"Step-KTO: Optimizing mathematical reasoning through stepwise binary feedback","volume":"2501","author":"Lin Yen-Ting","year":"2025","unstructured":"Yen-Ting Lin, Di Jin, Tengyu Xu, Tianhao Wu, Sainbayar Sukhbaatar, Chen Zhu, Yun He, Yun-Nung Chen, Jason Weston, Yuandong Tian, et\u00a0al. 2025. Step-KTO: Optimizing mathematical reasoning through stepwise binary feedback. 
CoRR abs\/2501.10799 (2025).","journal-title":"CoRR"},{"key":"e_1_3_4_108_2","unstructured":"Jack Lindsey, Wes Gurnee, Emmanuel Ameisen, Brian Chen, Adam Pearce, Nicholas L. Turner, Craig Citro, David Abrahams, Shan Carter, Basil Hosmer, Jonathan Marcus, Michael Sklar, Adly Templeton, Trenton Bricken, Callum McDougall, Hoagy Cunningham, Thomas Henighan, Adam Jermyn, Andy Jones, Andrew Persic, Zhenyi Qi, T. Ben Thompson, Sam Zimmerman, Kelley Rivoire, Thomas Conerly, Chris Olah, and Joshua Batson. 2025. On the Biology of a Large Language Model. https:\/\/transformer-circuits.pub\/2025\/attribution-graphs\/biology.html"},{"key":"e_1_3_4_109_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P17-1015"},{"key":"e_1_3_4_110_2","article-title":"TinyGSM: Achieving >80% on GSM8K with small language models","volume":"2312","author":"Liu Bingbin","year":"2023","unstructured":"Bingbin Liu, Sebastien Bubeck, Ronen Eldan, Janardhan Kulkarni, Yuanzhi Li, Anh Nguyen, Rachel Ward, and Yi Zhang. 2023. TinyGSM: Achieving >80% on GSM8K with small language models. CoRR abs\/2312.09241 (2023).","journal-title":"CoRR"},{"key":"e_1_3_4_111_2","article-title":"Augmenting math word problems via iterative question composing","volume":"2401","author":"Liu Haoxiong","year":"2024","unstructured":"Haoxiong Liu, Yifan Zhang, Yifan Luo, and Andrew Chi-Chih Yao. 2024. Augmenting math word problems via iterative question composing. CoRR abs\/2401.09003 (2024).","journal-title":"CoRR"},{"key":"e_1_3_4_112_2","volume-title":"The Thirteenth International Conference on Learning Representations","author":"Liu Tian-Shuo","year":"2025","unstructured":"Tian-Shuo Liu, Xu-Hui Liu, Ruifeng Chen, Lixuan Jin, Pengyuan Wang, Zhilong Zhang, and Yang Yu. 2025. Semantic temporal abstraction via vision-language model guidance for efficient reinforcement learning. 
In The Thirteenth International Conference on Learning Representations."},{"key":"e_1_3_4_113_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.acl-long.817"},{"key":"e_1_3_4_114_2","article-title":"Improve mathematical reasoning in language models by automated process supervision","volume":"2406","author":"Luo Liangchen","year":"2024","unstructured":"Liangchen Luo, Yinxiao Liu, Rosanne Liu, Samrat Phatale, Harsh Lara, Yunxuan Li, Lei Shu, Yun Zhu, Lei Meng, Jiao Sun, et\u00a0al. 2024. Improve mathematical reasoning in language models by automated process supervision. CoRR abs\/2406.06592 (2024).","journal-title":"CoRR"},{"key":"e_1_3_4_115_2","unstructured":"Michael Luo Sijun Tan Justin Wong Xiaoxiang Shi William Y. Tang Manan Roongta Colin Cai Jeffrey Luo Tianjun Zhang Li Erran Li et\u00a0al. 2025. DeepScaleR: Surpassing O1-Preview with a 1.5B Model by Scaling RL. https:\/\/pretty-radio-b75.notion.site\/DeepScaleR-Surpassing-O1-Preview-with-a-1-5B-Model-by-Scaling-RL-19681902c1468005bed8ca303013a4e2Notion Blog."},{"key":"e_1_3_4_116_2","volume-title":"Forty-second International Conference on Machine Learning","author":"Ma Zehong","year":"2025","unstructured":"Zehong Ma, Shiliang Zhang, Longhui Wei, and Qi Tian. 2025. Efficient multi-modal long context learning for training-free adaptation. In Forty-second International Conference on Machine Learning. Retrieved from https:\/\/openreview.net\/forum?id=6Rvs8jluQP"},{"key":"e_1_3_4_117_2","article-title":"American invitational mathematics examination\u2014aime","year":"2024","unstructured":"MAA. 2024. American invitational mathematics examination\u2014aime. In American Invitational Mathematics Examination - AIME 2024, February 2024. 
","journal-title":"American Invitational Mathematics Examination (AIME), February 2024."},{"key":"e_1_3_4_118_2","volume-title":"Advances in Neural Information Processing Systems 36","author":"Madaan Aman","year":"2023","unstructured":"Aman Madaan, Niket Tandon, Prakhar Gupta, Skyler Hallinan, Luyu Gao, Sarah Wiegreffe, Uri Alon, Nouha Dziri, Shrimai Prabhumoye, Yiming Yang, et\u00a0al. 2023. Self-refine: Iterative refinement with self-feedback. In Advances in Neural Information Processing Systems 36."},{"key":"e_1_3_4_119_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.acl-short.151"},{"key":"e_1_3_4_120_2","article-title":"A survey in mathematical language processing","volume":"2205","author":"Meadows Jordan","year":"2022","unstructured":"Jordan Meadows and Andre Freitas. 2022. A survey in mathematical language processing. CoRR abs\/2205.15231 (2022).","journal-title":"CoRR"},{"key":"e_1_3_4_121_2","article-title":"SimPO: Simple preference optimization with a reference-free reward","volume":"2405","author":"Meng Yu","year":"2024","unstructured":"Yu Meng, Mengzhou Xia, and Danqi Chen. 2024. SimPO: Simple preference optimization with a reference-free reward. CoRR abs\/2405.14734 (2024).","journal-title":"CoRR"},{"key":"e_1_3_4_122_2","volume-title":"The 12th International Conference on Learning Representations","author":"Merrill William","year":"2024","unstructured":"William Merrill and Ashish Sabharwal. 2024. The expressive power of transformers with chain of thought. In The 12th International Conference on Learning Representations."},{"key":"e_1_3_4_123_2","article-title":"A diverse corpus for evaluating and developing English math word problem solvers","volume":"2106","author":"Miao Shen-Yun","year":"2021","unstructured":"Shen-Yun Miao, Chao-Chun Liang, and Keh-Yih Su. 2021. A diverse corpus for evaluating and developing English math word problem solvers. 
CoRR abs\/2106.15772 (2021).","journal-title":"CoRR"},{"key":"e_1_3_4_124_2","article-title":"GSM-Symbolic: Understanding the limitations of mathematical reasoning in large language models","volume":"2410","author":"Mirzadeh Seyed-Iman","year":"2024","unstructured":"Seyed-Iman Mirzadeh, Keivan Alizadeh, Hooman Shahrokhi, Oncel Tuzel, Samy Bengio, and Mehrdad Farajtabar. 2024. GSM-Symbolic: Understanding the limitations of mathematical reasoning in large language models. CoRR abs\/2410.05229 (2024).","journal-title":"CoRR"},{"key":"e_1_3_4_125_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.emnlp-main.392"},{"key":"e_1_3_4_126_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-79876-5_37"},{"key":"e_1_3_4_127_2","article-title":"s1: Simple test-time scaling","volume":"2501","author":"Muennighoff Niklas","year":"2025","unstructured":"Niklas Muennighoff, Zitong Yang, Weijia Shi, Xiang Lisa Li, Li Fei-Fei, Hannaneh Hajishirzi, Luke Zettlemoyer, Percy Liang, Emmanuel Cand\u00e8s, and Tatsunori Hashimoto. 2025. s1: Simple test-time scaling. CoRR abs\/2501.19393 (2025).","journal-title":"CoRR"},{"key":"e_1_3_4_128_2","unstructured":"OpenAI. 2024. Learning to reason with LLMs. Retrieved from https:\/\/openai.com\/index\/learning-to-reason-with-llms\/"},{"key":"e_1_3_4_129_2","volume-title":"International Conference on Learning Representations","author":"Pang Jing-Cheng","year":"2023","unstructured":"Jing-Cheng Pang, Pengyuan Wang, Kaiyuan Li, Xiong-Hui Chen, Jiacheng Xu, Zongzhang Zhang, and Yang Yu. 2023. Language model self-improvement by reinforcement learning contemplation. In International Conference on Learning Representations."},{"key":"e_1_3_4_130_2","volume-title":"Advances in Neural Information Processing Systems 38","author":"Pang Richard Yuanzhe","year":"2024","unstructured":"Richard Yuanzhe Pang, Weizhe Yuan, He He, Kyunghyun Cho, Sainbayar Sukhbaatar, and Jason Weston. 2024. Iterative reasoning preference optimization. 
In Advances in Neural Information Processing Systems 38."},{"key":"e_1_3_4_131_2","volume-title":"Proceedings of the 12th International Conference on Learning Representations","author":"Paster Keiran","year":"2024","unstructured":"Keiran Paster, Marco Dos Santos, Zhangir Azerbayev, and Jimmy Ba. 2024. OpenWebMath: An open dataset of high-quality mathematical web text. In Proceedings of the 12th International Conference on Learning Representations."},{"key":"e_1_3_4_132_2","article-title":"Are NLP models really able to solve simple math word problems?","volume":"2103","author":"Patel Arkil","year":"2021","unstructured":"Arkil Patel, Satwik Bhattamishra, and Navin Goyal. 2021. Are NLP models really able to solve simple math word problems? CoRR abs\/2103.07191 (2021).","journal-title":"CoRR"},{"key":"e_1_3_4_133_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2024.eacl-long.67"},{"key":"e_1_3_4_134_2","article-title":"ReGenesis: LLMs can grow into reasoning generalists via self-improvement","volume":"2410","author":"Peng Xiangyu","year":"2024","unstructured":"Xiangyu Peng, Congying Xia, Xinyi Yang, Caiming Xiong, Chien-Sheng Wu, and Chen Xing. 2024. ReGenesis: LLMs can grow into reasoning generalists via self-improvement. CoRR abs\/2410.02108 (2024).","journal-title":"CoRR"},{"key":"e_1_3_4_135_2","volume-title":"Advances in Neural Information Processing Systems 36","author":"Prystawski Ben","year":"2023","unstructured":"Ben Prystawski, Michael Li, and Noah D. Goodman. 2023. Why think step by step? Reasoning emerges from the locality of experience. In Advances in Neural Information Processing Systems 36."},{"key":"e_1_3_4_136_2","article-title":"A survey of efficient reasoning for large reasoning models: Language, multimodality, and beyond","volume":"2503","author":"Qu Xiaoye","year":"2025","unstructured":"Xiaoye Qu, Yafu Li, Zhaochen Su, Weigao Sun, Jianhao Yan, Dongrui Liu, Ganqu Cui, Daizong Liu, Shuxian Liang, Junxian He, et\u00a0al. 2025. 
A survey of efficient reasoning for large reasoning models: Language, multimodality, and beyond. CoRR abs\/2503.21614 (2025).","journal-title":"CoRR"},{"key":"e_1_3_4_137_2","volume-title":"Advances in Neural Information Processing Systems","author":"Qu Yuxiao","year":"2024","unstructured":"Yuxiao Qu, Tianjun Zhang, Naman Garg, and Aviral Kumar. 2024. Recursive introspection: Teaching language model agents how to self-improve. In Advances in Neural Information Processing Systems."},{"issue":"8","key":"e_1_3_4_138_2","first-page":"9","article-title":"Language models are unsupervised multitask learners","volume":"1","author":"Radford Alec","year":"2019","unstructured":"Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et\u00a0al. 2019. Language models are unsupervised multitask learners. OpenAI blog 1, 8 (2019), 9.","journal-title":"OpenAI blog"},{"key":"e_1_3_4_139_2","volume-title":"Advances in Neural Information Processing Systems 37","author":"Rafailov Rafael","year":"2023","unstructured":"Rafael Rafailov, Archit Sharma, Eric Mitchell, Christopher D. Manning, Stefano Ermon, and Chelsea Finn. 2023. Direct preference optimization: Your language model is secretly a reward model. In Advances in Neural Information Processing Systems 37."},{"key":"e_1_3_4_140_2","article-title":"Proximal policy optimization algorithms","volume":"1707","author":"Schulman John","year":"2017","unstructured":"John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. 2017. Proximal policy optimization algorithms. CoRR abs\/1707.06347 (2017).","journal-title":"CoRR"},{"key":"e_1_3_4_141_2","article-title":"Rewarding progress: Scaling automated process verifiers for LLM reasoning","volume":"2410","author":"Setlur Amrith","year":"2024","unstructured":"Amrith Setlur, Chirag Nagpal, Adam Fisch, Xinyang Geng, Jacob Eisenstein, Rishabh Agarwal, Alekh Agarwal, Jonathan Berant, and Aviral Kumar. 2024. 
Rewarding progress: Scaling automated process verifiers for LLM reasoning. CoRR abs\/2410.08146 (2024).","journal-title":"CoRR"},{"key":"e_1_3_4_142_2","article-title":"DeepSeekMath: Pushing the limits of mathematical reasoning in open language models","volume":"2402","author":"Shao Zhihong","year":"2024","unstructured":"Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, Y. K. Li, Y. Wu, et\u00a0al. 2024. DeepSeekMath: Pushing the limits of mathematical reasoning in open language models. CoRR abs\/2402.03300 (2024).","journal-title":"CoRR"},{"key":"e_1_3_4_143_2","article-title":"Satori: Reinforcement learning with chain-of-action-thought enhances LLM reasoning via autoregressive search","volume":"2502","author":"Shen Maohao","year":"2025","unstructured":"Maohao Shen, Guangtao Zeng, Zhenting Qi, Zhang-Wei Hong, Zhenfang Chen, Wei Lu, Gregory W. Wornell, Subhro Das, David Cox, and Chuang Gan. 2025. Satori: Reinforcement learning with chain-of-action-thought enhances LLM reasoning via autoregressive search. CoRR abs\/2502.02508 (2025).","journal-title":"CoRR"},{"key":"e_1_3_4_144_2","article-title":"LLM with tools: A survey","volume":"2409","author":"Shen Zhuocheng","year":"2024","unstructured":"Zhuocheng Shen. 2024. LLM with tools: A survey. CoRR abs\/2409.18807 (2024).","journal-title":"CoRR"},{"key":"e_1_3_4_145_2","volume-title":"International Conference on Machine Learning","author":"Shi Freda","year":"2023","unstructured":"Freda Shi, Xinyun Chen, Kanishka Misra, Nathan Scales, David Dohan, Ed H. Chi, Nathanael Sch\u00e4rli, and Denny Zhou. 2023. Large language models can be easily distracted by irrelevant context. 
In International Conference on Machine Learning."},{"key":"e_1_3_4_146_2","article-title":"Language models are multilingual chain-of-thought reasoners","volume":"2210","author":"Shi Freda","year":"2022","unstructured":"Freda Shi, Mirac Suzgun, Markus Freitag, Xuezhi Wang, Suraj Srivats, Soroush Vosoughi, Hyung Won Chung, Yi Tay, Sebastian Ruder, Denny Zhou, et\u00a0al. 2022. Language models are multilingual chain-of-thought reasoners. CoRR abs\/2210.03057 (2022).","journal-title":"CoRR"},{"key":"e_1_3_4_147_2","article-title":"Beyond human data: Scaling self-training for problem-solving with language models","author":"Singh Avi","year":"2024","unstructured":"Avi Singh, John D. Co-Reyes, Rishabh Agarwal, Ankesh Anand, Piyush Patil, Xavier Garcia, Peter J. Liu, James Harrison, Jaehoon Lee, Kelvin Xu, et\u00a0al. 2024. Beyond human data: Scaling self-training for problem-solving with language models. Transactions on Machine Learning Research (2024).","journal-title":"Transactions on Machine Learning Research"},{"key":"e_1_3_4_148_2","doi-asserted-by":"publisher","DOI":"10.1145\/365691.365960"},{"key":"e_1_3_4_149_2","article-title":"Scaling LLM test-time compute optimally can be more effective than scaling model parameters","volume":"2408","author":"Snell Charlie","year":"2024","unstructured":"Charlie Snell, Jaehoon Lee, Kelvin Xu, and Aviral Kumar. 2024. Scaling LLM test-time compute optimally can be more effective than scaling model parameters. CoRR abs\/2408.03314 (2024).","journal-title":"CoRR"},{"key":"e_1_3_4_150_2","article-title":"To cot or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning","volume":"2409","author":"Sprague Zayne","year":"2024","unstructured":"Zayne Sprague, Fangcong Yin, Juan Diego Rodriguez, Dongwei Jiang, Manya Wadhwa, Prasann Singhal, Xinyu Zhao, Xi Ye, Kyle Mahowald, and Greg Durrett. 2024. To cot or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning. 
CoRR abs\/2409.12183 (2024).","journal-title":"CoRR"},{"key":"e_1_3_4_151_2","article-title":"To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning","volume":"2409","author":"Sprague Zayne","year":"2024","unstructured":"Zayne Sprague, Fangcong Yin, Juan Diego Rodriguez, Dongwei Jiang, Manya Wadhwa, Prasann Singhal, Xinyu Zhao, Xi Ye, Kyle Mahowald, and Greg Durrett. 2024. To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning. CoRR abs\/2409.12183 (2024).","journal-title":"CoRR"},{"key":"e_1_3_4_152_2","volume-title":"Findings of the Association for Computational Linguistics","author":"Srivastava Pragya","year":"2024","unstructured":"Pragya Srivastava, Manuj Malik, Vivek Gupta, Tanuja Ganu, and Dan Roth. 2024. Evaluating LLMs\u2019 mathematical reasoning in financial document question answering. In Findings of the Association for Computational Linguistics."},{"key":"e_1_3_4_153_2","article-title":"Stop overthinking: A survey on efficient reasoning for large language models","volume":"2503","author":"Sui Yang","year":"2025","unstructured":"Yang Sui, Yu-Neng Chuang, Guanchu Wang, Jiamu Zhang, Tianyi Zhang, Jiayi Yuan, Hongyi Liu, Andrew Wen, Hanjie Chen, Xia Hu, et\u00a0al. 2025. Stop overthinking: A survey on efficient reasoning for large language models. CoRR abs\/2503.16419 (2025).","journal-title":"CoRR"},{"key":"e_1_3_4_154_2","article-title":"ToolAlpaca: Generalized tool learning for language models with 3000 simulated cases","volume":"2306","author":"Tang Qiaoyu","year":"2023","unstructured":"Qiaoyu Tang, Ziliang Deng, Hongyu Lin, Xianpei Han, Qiao Liang, Boxi Cao, and Le Sun. 2023. ToolAlpaca: Generalized tool learning for language models with 3000 simulated cases. 
CoRR abs\/2306.05301 (2023).","journal-title":"CoRR"},{"key":"e_1_3_4_155_2","article-title":"A comprehensive survey of the Lean 4 theorem prover: Architecture, applications, and advances","volume":"2501","author":"Tang Xichen","year":"2025","unstructured":"Xichen Tang. 2025. A comprehensive survey of the Lean 4 theorem prover: Architecture, applications, and advances. CoRR abs\/2501.18639 (2025).","journal-title":"CoRR"},{"key":"e_1_3_4_156_2","volume-title":"Proceedings of the 41st International Conference on Machine Learning","author":"Tang Zhengyang","year":"2024","unstructured":"Zhengyang Tang, Xingxing Zhang, Benyou Wang, and Furu Wei. 2024. MathScale: Scaling instruction tuning for mathematical reasoning. In Proceedings of the 41st International Conference on Machine Learning."},{"key":"e_1_3_4_157_2","article-title":"Kimi k1.5: Scaling reinforcement learning with LLMs","volume":"2501","author":"Team Kimi","year":"2025","unstructured":"Kimi Team, Angang Du, Bofei Gao, Bowei Xing, Changjiu Jiang, Cheng Chen, Cheng Li, Chenjun Xiao, Chenzhuang Du, Chonghua Liao, et\u00a0al. 2025. Kimi k1.5: Scaling reinforcement learning with LLMs. CoRR abs\/2501.12599 (2025).","journal-title":"CoRR"},{"key":"e_1_3_4_158_2","unstructured":"Adly Templeton, Tom Conerly, Jonathan Marcus, Jack Lindsey, Trenton Bricken, Brian Chen, Adam Pearce, Craig Citro, Emmanuel Ameisen, Andy Jones, Hoagy Cunningham, Nicholas L. Turner, Callum McDougall, Monte MacDiarmid, C. Daniel Freeman, Theodore R. Sumers, Edward Rees, Joshua Batson, Adam Jermyn, Shan Carter, Chris Olah, and Tom Henighan. 2024. Scaling monosemanticity: Extracting interpretable features from Claude 3 Sonnet. Transformer Circuits Thread (2024)."},{"key":"e_1_3_4_159_2","article-title":"OpenMathInstruct-2: Accelerating AI for math with massive open-source instruction data","volume":"2410","author":"Toshniwal Shubham","year":"2024","unstructured":"Shubham Toshniwal, Wei Du, Ivan Moshkov, Branislav Kisacanin, Alexan Ayrapetyan, and Igor Gitman. 
2024. OpenMathInstruct-2: Accelerating AI for math with massive open-source instruction data. CoRR abs\/2410.01560 (2024).","journal-title":"CoRR"},{"key":"e_1_3_4_160_2","volume-title":"Advances in Neural Information Processing Systems 38","author":"Toshniwal Shubham","year":"2024","unstructured":"Shubham Toshniwal, Ivan Moshkov, Sean Narenthiran, Daria Gitman, Fei Jia, and Igor Gitman. 2024. OpenMathInstruct-1: A 1.8 million math instruction tuning dataset. In Advances in Neural Information Processing Systems 38."},{"key":"e_1_3_4_161_2","article-title":"LLaMA: Open and efficient foundation language models","volume":"2302","author":"Touvron Hugo","year":"2023","unstructured":"Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timoth\u00e9e Lacroix, Baptiste Rozi\u00e8re, Naman Goyal, Eric Hambro, Faisal Azhar, et\u00a0al. 2023. LLaMA: Open and efficient foundation language models. CoRR abs\/2302.13971 (2023).","journal-title":"CoRR"},{"key":"e_1_3_4_162_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2024.acl-long.410"},{"key":"e_1_3_4_163_2","doi-asserted-by":"crossref","unstructured":"A. M. Turing. 1950. Computing machinery and intelligence. Mind LIX, 236 (1950), 433-460.","DOI":"10.1093\/mind\/LIX.236.433"},{"key":"e_1_3_4_164_2","article-title":"Why can large language models generate correct chain-of-thoughts?","volume":"2310","author":"Tutunov Rasul","year":"2023","unstructured":"Rasul Tutunov, Antoine Grosnit, Juliusz Ziomek, Jun Wang, and Haitham Bou-Ammar. 2023. Why can large language models generate correct chain-of-thoughts? 
CoRR abs\/2310.13571 (2023).","journal-title":"CoRR"},{"key":"e_1_3_4_165_2","doi-asserted-by":"publisher","DOI":"10.5555\/3692070.3694110"},{"key":"e_1_3_4_166_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.acl-long.153"},{"key":"e_1_3_4_167_2","article-title":"Planning in natural language improves LLM search for code generation","volume":"2409","author":"Wang Evan Z.","year":"2024","unstructured":"Evan Z. Wang, Federico Cassano, Catherine Wu, Yunfeng Bai, William Song, Vaskar Nath, Ziwen Han, Sean M. Hendryx, Summer Yue, and Hugh Zhang. 2024. Planning in natural language improves LLM search for code generation. CoRR abs\/2409.03733 (2024).","journal-title":"CoRR"},{"key":"e_1_3_4_168_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2024.acl-long.271"},{"key":"e_1_3_4_169_2","volume-title":"Advances in Neural Information Processing Systems 38","author":"Wang Ke","year":"2024","unstructured":"Ke Wang, Junting Pan, Weikang Shi, Zimu Lu, Houxing Ren, Aojun Zhou, Mingjie Zhan, and Hongsheng Li. 2024. Measuring multimodal mathematical reasoning with math-vision dataset. In Advances in Neural Information Processing Systems 38."},{"key":"e_1_3_4_170_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.acl-long.147"},{"key":"e_1_3_4_171_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2024.acl-long.510"},{"key":"e_1_3_4_172_2","doi-asserted-by":"publisher","DOI":"10.5555\/3709347.3743852"},{"key":"e_1_3_4_173_2","volume-title":"Proceedings of the 11th International Conference on Learning Representations","author":"Wang Xuezhi","year":"2023","unstructured":"Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc V. Le, Ed H. Chi, Sharan Narang, Aakanksha Chowdhery, and Denny Zhou. 2023. Self-consistency improves chain of thought reasoning in language models. 
In Proceedings of the 11th International Conference on Learning Representations."},{"key":"e_1_3_4_174_2","article-title":"Chain-of-thought reasoning without prompting","volume":"2402","author":"Wang Xuezhi","year":"2024","unstructured":"Xuezhi Wang and Denny Zhou. 2024. Chain-of-thought reasoning without prompting. CoRR abs\/2402.10200 (2024).","journal-title":"CoRR"},{"key":"e_1_3_4_175_2","article-title":"Thoughts are all over the place: On the underthinking of o1-like LLMs","volume":"2501","author":"Wang Yue","year":"2025","unstructured":"Yue Wang, Qiuzhi Liu, Jiahao Xu, Tian Liang, Xingyu Chen, Zhiwei He, Linfeng Song, Dian Yu, Juntao Li, Zhuosheng Zhang, et\u00a0al. 2025. Thoughts are all over the place: On the underthinking of o1-like LLMs. CoRR abs\/2501.18585 (2025).","journal-title":"CoRR"},{"key":"e_1_3_4_176_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D17-1088"},{"key":"e_1_3_4_177_2","volume-title":"Advances in Neural Information Processing Systems 37","author":"Wang Zengzhi","year":"2024","unstructured":"Zengzhi Wang, Xuefeng Li, Rui Xia, and Pengfei Liu. 2024. MathPile: A billion-token-scale pretraining corpus for math. In Advances in Neural Information Processing Systems 37."},{"key":"e_1_3_4_178_2","doi-asserted-by":"publisher","DOI":"10.1145\/321105.321107"},{"key":"e_1_3_4_179_2","article-title":"Emergent abilities of large language models","author":"Wei Jason","year":"2022","unstructured":"Jason Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, Sebastian Borgeaud, Dani Yogatama, Maarten Bosma, Denny Zhou, Donald Metzler, et\u00a0al. 2022. Emergent abilities of large language models. Transactions on Machine Learning Research 2022 (2022).","journal-title":"Transactions on Machine Learning Research"},{"key":"e_1_3_4_180_2","volume-title":"Advances in Neural Information Processing Systems 36","author":"Wei Jason","year":"2022","unstructured":"Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed H. 
Chi, Quoc V. Le, and Denny Zhou. 2022. Chain-of-thought prompting elicits reasoning in large language models. In Advances in Neural Information Processing Systems 36."},{"key":"e_1_3_4_181_2","article-title":"CMATH: Can your language model pass Chinese elementary school math test?","volume":"2306","author":"Wei Tianwen","year":"2023","unstructured":"Tianwen Wei, Jian Luan, Wei Liu, Shuang Dong, and Bin Wang. 2023. CMATH: Can your language model pass Chinese elementary school math test? CoRR abs\/2306.16636 (2023).","journal-title":"CoRR"},{"key":"e_1_3_4_182_2","volume-title":"Proceedings of the 1st Neural Information Processing Systems Track on Datasets and Benchmarks","author":"Welleck Sean","year":"2021","unstructured":"Sean Welleck, Jiacheng Liu, Ronan Le Bras, Hannaneh Hajishirzi, Yejin Choi, and Kyunghyun Cho. 2021. NaturalProofs: Mathematical theorem proving in natural language. In Proceedings of the 1st Neural Information Processing Systems Track on Datasets and Benchmarks."},{"key":"e_1_3_4_183_2","volume-title":"Proceedings of the 11th International Conference on Learning Representations","author":"Welleck Sean","year":"2023","unstructured":"Sean Welleck, Ximing Lu, Peter West, Faeze Brahman, Tianxiao Shen, Daniel Khashabi, and Yejin Choi. 2023. Generating sequences by learning to self-correct. In Proceedings of the 11th International Conference on Learning Representations."},{"key":"e_1_3_4_184_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.findings-emnlp.167"},{"key":"e_1_3_4_185_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-981-97-1717-0_12"},{"key":"e_1_3_4_186_2","article-title":"Analyzing chain-of-thought prompting in large language models via gradient-based feature attributions","volume":"2307","author":"Wu Skyler","year":"2023","unstructured":"Skyler Wu, Eric Meng Shen, Charumathi Badrinath, Jiaqi Ma, and Himabindu Lakkaraju. 2023. 
Analyzing chain-of-thought prompting in large language models via gradient-based feature attributions. CoRR abs\/2307.13339 (2023).","journal-title":"CoRR"},{"key":"e_1_3_4_187_2","article-title":"Inference scaling laws: An empirical analysis of compute-optimal inference for problem-solving with language models","volume":"2408","author":"Wu Yangzhen","year":"2024","unstructured":"Yangzhen Wu, Zhiqing Sun, Shanda Li, Sean Welleck, and Yiming Yang. 2024. Inference scaling laws: An empirical analysis of compute-optimal inference for problem-solving with language models. CoRR abs\/2408.00724 (2024).","journal-title":"CoRR"},{"key":"e_1_3_4_188_2","volume-title":"Advances in Neural Information Processing Systems 37","author":"Wu Zeqiu","year":"2023","unstructured":"Zeqiu Wu, Yushi Hu, Weijia Shi, Nouha Dziri, Alane Suhr, Prithviraj Ammanabrolu, Noah A. Smith, Mari Ostendorf, and Hannaneh Hajishirzi. 2023. Fine-grained human feedback gives better rewards for language model training. In Advances in Neural Information Processing Systems 37."},{"key":"e_1_3_4_189_2","article-title":"A minimalist approach to LLM reasoning: from rejection sampling to reinforce","volume":"2504","author":"Xiong Wei","year":"2025","unstructured":"Wei Xiong, Jiarui Yao, Yuhui Xu, Bo Pang, Lei Wang, Doyen Sahoo, Junnan Li, Nan Jiang, Tong Zhang, Caiming Xiong, et\u00a0al. 2025. A minimalist approach to LLM reasoning: from rejection sampling to reinforce. CoRR abs\/2504.11343 (2025).","journal-title":"CoRR"},{"key":"e_1_3_4_190_2","article-title":"Self-rewarding correction for mathematical reasoning","volume":"2502","author":"Xiong Wei","year":"2025","unstructured":"Wei Xiong, Hanning Zhang, Chenlu Ye, Lichang Chen, Nan Jiang, and Tong Zhang. 2025. Self-rewarding correction for mathematical reasoning. 
CoRR abs\/2502.19613 (2025).","journal-title":"CoRR"},{"key":"e_1_3_4_191_2","article-title":"Towards large reasoning models: A survey of reinforced reasoning with large language models","volume":"2501","author":"Xu Fengli","year":"2025","unstructured":"Fengli Xu, Qianyue Hao, Zefang Zong, Jingwei Wang, Yunke Zhang, Jingyi Wang, Xiaochong Lan, Jiahui Gong, Tianjian Ouyang, Fanjin Meng, et\u00a0al. 2025. Towards large reasoning models: A survey of reinforced reasoning with large language models. CoRR abs\/2501.09686 (2025).","journal-title":"CoRR"},{"key":"e_1_3_4_192_2","article-title":"Is DPO superior to PPO for LLM Alignment? A Comprehensive Study","volume":"2404","author":"Xu Shusheng","year":"2024","unstructured":"Shusheng Xu, Wei Fu, Jiaxuan Gao, Wenjie Ye, Weilin Liu, Zhiyu Mei, Guangju Wang, Chao Yu, and Yi Wu. 2024. Is DPO superior to PPO for LLM Alignment? A Comprehensive Study. CoRR abs\/2404.10719 (2024).","journal-title":"CoRR"},{"key":"e_1_3_4_193_2","article-title":"A survey of mathematical reasoning in the era of multimodal large language model: Benchmark, method & challenges","volume":"2412","author":"Yan Yibo","year":"2024","unstructured":"Yibo Yan, Jiamin Su, Jianxiang He, Fangteng Fu, Xu Zheng, Yuanhuiyi Lyu, Kun Wang, Shen Wang, Qingsong Wen, and Xuming Hu. 2024. A survey of mathematical reasoning in the era of multimodal large language model: Benchmark, method & challenges. 
CoRR abs\/2412.11936 (2024).","journal-title":"CoRR"},{"key":"e_1_3_4_194_2","unstructured":"An Yang, Baosong Yang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Zhou, Chengpeng Li, Chengyuan Li, Dayiheng Liu, Fei Huang, Guanting Dong, Haoran Wei, Huan Lin, Jialong Tang, Jialin Wang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Ma, Jianxin Yang, Jin Xu, Jingren Zhou, Jinze Bai, Jinzheng He, Junyang Lin, Kai Dang, Keming Lu, Keqin Chen, Kexin Yang, Mei Li, Mingfeng Xue, Na Ni, Pei Zhang, Peng Wang, Ru Peng, Rui Men, Ruize Gao, Runji Lin, Shijie Wang, Shuai Bai, Sinan Tan, Tianhang Zhu, Tianhao Li, Tianyu Liu, Wenbin Ge, Xiaodong Deng, Xiaohuan Zhou, Xingzhang Ren, Xinyu Zhang, Xipin Wei, Xuancheng Ren, Xuejing Liu, Yang Fan, Yang Yao, Yichang Zhang, Yu Wan, Yunfei Chu, Yuqiong Liu, Zeyu Cui, Zhenru Zhang, Zhifang Guo, and Zhihao Fan. 2024. Qwen2 technical report. CoRR abs\/2407.10671 (2024)."},{"key":"e_1_3_4_195_2","article-title":"Qwen2.5-Math technical report: Toward mathematical expert model via self-improvement","volume":"2409","author":"Yang An","year":"2024","unstructured":"An Yang, Beichen Zhang, Binyuan Hui, Bofei Gao, Bowen Yu, Chengpeng Li, Dayiheng Liu, Jianhong Tu, Jingren Zhou, Junyang Lin, et\u00a0al. 2024. Qwen2.5-Math technical report: Toward mathematical expert model via self-improvement. CoRR abs\/2409.12122 (2024).","journal-title":"CoRR"},{"key":"e_1_3_4_196_2","volume-title":"Advances in Neural Information Processing Systems 36","author":"Yang Kaiyu","year":"2023","unstructured":"Kaiyu Yang, Aidan M. Swope, Alex Gu, Rahul Chalamala, Peiyang Song, Shixing Yu, Saad Godil, Ryan J. Prenger, and Animashree Anandkumar. 2023. LeanDojo: Theorem proving with retrieval-augmented language models. In Advances in Neural Information Processing Systems 36."},{"key":"e_1_3_4_197_2","article-title":"Looped transformers are better at learning learning algorithms","volume":"2311","author":"Yang Liu","year":"2024","unstructured":"Liu Yang, Kangwook Lee, Robert Nowak, and Dimitris Papailiopoulos. 2024.
Looped transformers are better at learning learning algorithms. CoRR abs\/2311.12424 (2024).","journal-title":"CoRR"},{"key":"e_1_3_4_198_2","volume-title":"Advances in Neural Information Processing Systems","author":"Yang Sherry","year":"2022","unstructured":"Sherry Yang, Dale Schuurmans, Pieter Abbeel, and Ofir Nachum. 2022. Chain of thought imitation with procedure cloning. In Advances in Neural Information Processing Systems."},{"key":"e_1_3_4_199_2","article-title":"Bridging formal language with chain-of-thought reasoning to geometry problem solving","volume":"2508","author":"Yang Tianyun","year":"2025","unstructured":"Tianyun Yang, Yunwen Li, Ziniu Li, Zhihang Lin, Ruoyu Sun, and Tian Ding. 2025. Bridging formal language with chain-of-thought reasoning to geometry problem solving. CoRR abs\/2508.09099 (2025).","journal-title":"CoRR"},{"key":"e_1_3_4_200_2","article-title":"LemmaHead: RAG assisted proof generation using large language models","volume":"2501","author":"Yang Tianbo","year":"2025","unstructured":"Tianbo Yang, Mingqi Yan, Hongyi Zhao, and Tianshuo Yang. 2025. LemmaHead: RAG assisted proof generation using large language models. CoRR abs\/2501.15797 (2025).","journal-title":"CoRR"},{"key":"e_1_3_4_201_2","volume-title":"Advances in Neural Information Processing Systems 37","author":"Yao Shunyu","year":"2023","unstructured":"Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Tom Griffiths, Yuan Cao, and Karthik Narasimhan. 2023. Tree of thoughts: Deliberate problem solving with large language models. In Advances in Neural Information Processing Systems 37."},{"key":"e_1_3_4_202_2","article-title":"Physics of language models: Part 2.2, how to learn from mistakes on grade-school math problems","volume":"2408","author":"Ye Tian","year":"2024","unstructured":"Tian Ye, Zicheng Xu, Yuanzhi Li, and Zeyuan Allen-Zhu. 2024. Physics of language models: Part 2.2, how to learn from mistakes on grade-school math problems. 
CoRR abs\/2408.16293 (2024).","journal-title":"CoRR"},{"key":"e_1_3_4_203_2","article-title":"LIMO: Less is more for reasoning","volume":"2502","author":"Ye Yixin","year":"2025","unstructured":"Yixin Ye, Zhen Huang, Yang Xiao, Ethan Chern, Shijie Xia, and Pengfei Liu. 2025. LIMO: Less is more for reasoning. CoRR abs\/2502.03387 (2025).","journal-title":"CoRR"},{"key":"e_1_3_4_204_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2024.acl-long.131"},{"key":"e_1_3_4_205_2","article-title":"Lean workbook: A large-scale lean problem set formalized from natural language math problems","volume":"2406","author":"Ying Huaiyuan","year":"2024","unstructured":"Huaiyuan Ying, Zijian Wu, Yihan Geng, Jiayu Wang, Dahua Lin, and Kai Chen. 2024. Lean workbook: A large-scale lean problem set formalized from natural language math problems. CoRR abs\/2406.03847 (2024).","journal-title":"CoRR"},{"key":"e_1_3_4_206_2","volume-title":"Proceedings of the 12th International Conference on Learning Representations","author":"Yu Longhui","year":"2024","unstructured":"Longhui Yu, Weisen Jiang, Han Shi, Jincheng Yu, Zhengying Liu, Yu Zhang, James Kwok, Zhenguo Li, Adrian Weller, and Weiyang Liu. 2024. MetaMath: Bootstrap your own mathematical questions for large language models. In Proceedings of the 12th International Conference on Learning Representations."},{"key":"e_1_3_4_207_2","article-title":"DAPO: An open-source LLM reinforcement learning system at scale","volume":"2503","author":"Yu Qiying","year":"2025","unstructured":"Qiying Yu, Zheng Zhang, Ruofei Zhu, Yufeng Yuan, Xiaochen Zuo, Yu Yue, Tiantian Fan, Gaohong Liu, Lingjun Liu, Xin Liu, et\u00a0al. 2025. DAPO: An open-source LLM reinforcement learning system at scale.
CoRR abs\/2503.14476 (2025).","journal-title":"CoRR"},{"key":"e_1_3_4_208_2","article-title":"FormalMATH: Benchmarking formal mathematical reasoning of large language models","volume":"2505","author":"Yu Zhouliang","year":"2025","unstructured":"Zhouliang Yu, Ruotian Peng, Keyi Ding, Yizhe Li, Zhongyuan Peng, Minghao Liu, Yifan Zhang, Zheng Yuan, Huajian Xin, Wenhao Huang, et\u00a0al. 2025. FormalMATH: Benchmarking formal mathematical reasoning of large language models. CoRR abs\/2505.02735 (2025).","journal-title":"CoRR"},{"key":"e_1_3_4_209_2","article-title":"Free process rewards without process labels","volume":"2412","author":"Yuan Lifan","year":"2024","unstructured":"Lifan Yuan, Wendi Li, Huayu Chen, Ganqu Cui, Ning Ding, Kaiyan Zhang, Bowen Zhou, Zhiyuan Liu, and Hao Peng. 2024. Free process rewards without process labels. CoRR abs\/2412.01981 (2024).","journal-title":"CoRR"},{"key":"e_1_3_4_210_2","volume-title":"Proceedings of the 12th International Conference on Learning Representations","author":"Yue Xiang","year":"2024","unstructured":"Xiang Yue, Xingwei Qu, Ge Zhang, Yao Fu, Wenhao Huang, Huan Sun, Yu Su, and Wenhu Chen. 2024. MAmmoTH: Building math generalist models through hybrid instruction tuning. In Proceedings of the 12th International Conference on Learning Representations."},{"key":"e_1_3_4_211_2","article-title":"MAmmoTH2: Scaling instructions from the web","volume":"2405","author":"Yue Xiang","year":"2024","unstructured":"Xiang Yue, Tuney Zheng, Ge Zhang, and Wenhu Chen. 2024. MAmmoTH2: Scaling instructions from the web. CoRR abs\/2405.03548 (2024).","journal-title":"CoRR"},{"key":"e_1_3_4_212_2","article-title":"Does reinforcement learning really incentivize reasoning capacity in LLMs beyond the base model?","volume":"2504","author":"Yue Yang","year":"2025","unstructured":"Yang Yue, Zhiqi Chen, Rui Lu, Andrew Zhao, Zhaokai Wang, Shiji Song, and Gao Huang. 2025.
Does reinforcement learning really incentivize reasoning capacity in LLMs beyond the base model? CoRR abs\/2504.13837 (2025).","journal-title":"CoRR"},{"key":"e_1_3_4_213_2","article-title":"Quiet-STaR: Language models can teach themselves to think before speaking","volume":"2403","author":"Zelikman Eric","year":"2024","unstructured":"Eric Zelikman, Georges Harik, Yijia Shao, Varuna Jayasiri, Nick Haber, and Noah D. Goodman. 2024. Quiet-STaR: Language models can teach themselves to think before speaking. CoRR abs\/2403.09629 (2024).","journal-title":"CoRR"},{"key":"e_1_3_4_214_2","article-title":"STaR: Bootstrapping reasoning with reasoning","author":"Zelikman Eric","year":"2022","unstructured":"Eric Zelikman, Yuhuai Wu, Jesse Mu, and Noah Goodman. 2022. STaR: Bootstrapping reasoning with reasoning. Advances in Neural Information Processing Systems 35 (2022), 15476\u201315488.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_4_215_2","article-title":"SimpleRL-Zoo: Investigating and taming zero reinforcement learning for open base models in the wild","volume":"2503","author":"Zeng Weihao","year":"2025","unstructured":"Weihao Zeng, Yuzhen Huang, Qian Liu, Wei Liu, Keqing He, Zejun Ma, and Junxian He. 2025. SimpleRL-Zoo: Investigating and taming zero reinforcement learning for open base models in the wild. CoRR abs\/2503.18892 (2025).","journal-title":"CoRR"},{"key":"e_1_3_4_216_2","doi-asserted-by":"publisher","DOI":"10.5555\/3692070.3694477"},{"key":"e_1_3_4_217_2","volume-title":"Advances in Neural Information Processing Systems 37","author":"Zhang Beichen","year":"2023","unstructured":"Beichen Zhang, Kun Zhou, Xilin Wei, Xin Zhao, Jing Sha, Shijin Wang, and Ji-Rong Wen. 2023. Evaluating and improving tool-augmented computation-intensive math reasoning.
In Advances in Neural Information Processing Systems 37."},{"key":"e_1_3_4_218_2","article-title":"The gap of semantic parsing: A survey on automatic math word problem solvers","volume":"1808","author":"Zhang Dongxiang","year":"2019","unstructured":"Dongxiang Zhang, Lei Wang, Luming Zhang, Bing Tian Dai, and Heng Tao Shen. 2019. The gap of semantic parsing: A survey on automatic math word problem solvers. CoRR abs\/1808.07290 (2019).","journal-title":"CoRR"},{"key":"e_1_3_4_219_2","article-title":"LLaMA-Berry: Pairwise optimization for o1-like olympiad-level mathematical reasoning","volume":"2410","author":"Zhang Di","year":"2024","unstructured":"Di Zhang, Jianbo Wu, Jingdi Lei, Tong Che, Jiatong Li, Tong Xie, Xiaoshui Huang, Shufei Zhang, Marco Pavone, Yuqiang Li, et\u00a0al. 2024. LLaMA-Berry: Pairwise optimization for o1-like olympiad-level mathematical reasoning. CoRR abs\/2410.02884 (2024).","journal-title":"CoRR"},{"key":"e_1_3_4_220_2","article-title":"ReST-MCTS*: LLM self-training via process reward guided tree search","volume":"2406","author":"Zhang Dan","year":"2024","unstructured":"Dan Zhang, Sining Zhoubian, Ziniu Hu, Yisong Yue, Yuxiao Dong, and Jie Tang. 2024. ReST-MCTS*: LLM self-training via process reward guided tree search. CoRR abs\/2406.03816 (2024).","journal-title":"CoRR"},{"key":"e_1_3_4_221_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2025.findings-naacl.51"},{"key":"e_1_3_4_222_2","article-title":"Reasoning with reinforced functional token tuning","volume":"2502","author":"Zhang Kongcheng","year":"2025","unstructured":"Kongcheng Zhang, Qi Yao, Baisheng Lai, Jiaxing Huang, Wenkai Fang, Dacheng Tao, Mingli Song, and Shunyu Liu. 2025. Reasoning with reinforced functional token tuning.
CoRR abs\/2502.13389 (2025).","journal-title":"CoRR"},{"key":"e_1_3_4_223_2","article-title":"Generative verifiers: Reward modeling as next-token prediction","volume":"2408","author":"Zhang Lunjun","year":"2024","unstructured":"Lunjun Zhang, Arian Hosseini, Hritik Bansal, Mehran Kazemi, Aviral Kumar, and Rishabh Agarwal. 2024. Generative verifiers: Reward modeling as next-token prediction. CoRR abs\/2408.15240 (2024).","journal-title":"CoRR"},{"key":"e_1_3_4_224_2","article-title":"Autonomous data selection with language models for mathematical texts","volume":"2402","author":"Zhang Yifan","year":"2024","unstructured":"Yifan Zhang, Yifan Luo, Yang Yuan, and Andrew C. Yao. 2024. Autonomous data selection with language models for mathematical texts. CoRR abs\/2402.07625 (2024).","journal-title":"CoRR"},{"key":"e_1_3_4_225_2","volume-title":"Proceedings of the 11th International Conference on Learning Representations","author":"Zhang Zhuosheng","year":"2023","unstructured":"Zhuosheng Zhang, Aston Zhang, Mu Li, and Alex Smola. 2023. Automatic chain of thought prompting in large language models. In Proceedings of the 11th International Conference on Learning Representations."},{"key":"e_1_3_4_226_2","unstructured":"Bartosz Piotrowski, Zhangir Azerbayev, and Edward Ayers. 2023. Proof-pile. Retrieved from https:\/\/huggingface.co\/datasets\/hoskinson-center\/proof-pile (2023)."},{"key":"e_1_3_4_227_2","article-title":"Echo chamber: RL post-training amplifies behaviors learned in pretraining","volume":"2504","author":"Zhao Rosie","year":"2025","unstructured":"Rosie Zhao, Alexandru Meterez, Sham Kakade, Cengiz Pehlevan, Samy Jelassi, and Eran Malach. 2025. Echo chamber: RL post-training amplifies behaviors learned in pretraining.
CoRR abs\/2504.07912 (2025).","journal-title":"CoRR"},{"key":"e_1_3_4_228_2","article-title":"A survey of large language models","volume":"2303","author":"Zhao Wayne Xin","year":"2023","unstructured":"Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, et\u00a0al. 2023. A survey of large language models. CoRR abs\/2303.18223 (2023).","journal-title":"CoRR"},{"key":"e_1_3_4_229_2","article-title":"Automatic curriculum expert iteration for reliable LLM reasoning","volume":"2410","author":"Zhao Zirui","year":"2024","unstructured":"Zirui Zhao, Hanze Dong, Amrita Saha, Caiming Xiong, and Doyen Sahoo. 2024. Automatic curriculum expert iteration for reliable LLM reasoning. CoRR abs\/2410.07627 (2024).","journal-title":"CoRR"},{"key":"e_1_3_4_230_2","article-title":"ProcessBench: Identifying process errors in mathematical reasoning","volume":"2412","author":"Zheng Chujie","year":"2024","unstructured":"Chujie Zheng, Zhenru Zhang, Beichen Zhang, Runji Lin, Keming Lu, Bowen Yu, Dayiheng Liu, Jingren Zhou, and Junyang Lin. 2024. ProcessBench: Identifying process errors in mathematical reasoning. CoRR abs\/2412.06559 (2024).","journal-title":"CoRR"},{"key":"e_1_3_4_231_2","volume-title":"Advances in Neural Information Processing Systems 36","author":"Zheng Ge","year":"2023","unstructured":"Ge Zheng, Bin Yang, Jiajin Tang, Hong-Yu Zhou, and Sibei Yang. 2023. DDCoT: Duty-distinct chain-of-thought prompting for multimodal reasoning in language models. In Advances in Neural Information Processing Systems 36."},{"key":"e_1_3_4_232_2","article-title":"MiniF2F: A cross-system benchmark for formal olympiad-level mathematics","volume":"2109","author":"Zheng Kunhao","year":"2021","unstructured":"Kunhao Zheng, Jesse Michael Han, and Stanislas Polu. 2021. MiniF2F: A cross-system benchmark for formal olympiad-level mathematics.
CoRR abs\/2109.00110 (2021).","journal-title":"CoRR"},{"key":"e_1_3_4_233_2","volume-title":"The 12th International Conference on Learning Representations","author":"Zheng Longtao","year":"2024","unstructured":"Longtao Zheng, Rundong Wang, Xinrun Wang, and Bo An. 2024. Synapse: Trajectory-as-exemplar prompting with memory for computer control. In The 12th International Conference on Learning Representations."},{"key":"e_1_3_4_234_2","article-title":"Achieving >97% on GSM8K: Deeply understanding the problems makes LLMs better reasoners","volume":"2404","author":"Zhong Qihuang","year":"2024","unstructured":"Qihuang Zhong, Kang Wang, Ziyang Xu, Juhua Liu, Liang Ding, Bo Du, and Dacheng Tao. 2024. Achieving >97% on GSM8K: Deeply understanding the problems makes LLMs better reasoners. CoRR abs\/2404.14963 (2024).","journal-title":"CoRR"},{"key":"e_1_3_4_235_2","article-title":"Reinforced MLLM: A survey on RL-based reasoning in multimodal large language models","volume":"2504","author":"Zhou Guanghao","year":"2025","unstructured":"Guanghao Zhou, Panjia Qiu, Cen Chen, Jie Wang, Zheming Yang, Jian Xu, and Minghui Qiu. 2025. Reinforced MLLM: A survey on RL-based reasoning in multimodal large language models. CoRR abs\/2504.21277 (2025).","journal-title":"CoRR"},{"key":"e_1_3_4_236_2","article-title":"Teaching algorithmic reasoning via in-context learning","volume":"2211","author":"Zhou Hattie","year":"2022","unstructured":"Hattie Zhou, Azade Nova, Hugo Larochelle, Aaron C. Courville, Behnam Neyshabur, and Hanie Sedghi. 2022. Teaching algorithmic reasoning via in-context learning. 
CoRR abs\/2211.09066 (2022).","journal-title":"CoRR"},{"key":"e_1_3_4_237_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11432-023-3823-6"}],"container-title":["ACM Computing Surveys"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3786333","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,2,4]],"date-time":"2026-02-04T12:17:23Z","timestamp":1770207443000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3786333"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,2,4]]},"references-count":236,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2026,6,30]]}},"alternative-id":["10.1145\/3786333"],"URL":"https:\/\/doi.org\/10.1145\/3786333","relation":{},"ISSN":["0360-0300","1557-7341"],"issn-type":[{"value":"0360-0300","type":"print"},{"value":"1557-7341","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,2,4]]},"assertion":[{"value":"2025-06-12","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-11-29","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2026-02-04","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}