{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,25]],"date-time":"2026-04-25T07:33:34Z","timestamp":1777102414629,"version":"3.51.4"},"reference-count":101,"publisher":"Association for Computing Machinery (ACM)","issue":"5","funder":[{"name":"Natural Science Foundation of Shanghai","award":["24ZR1405000"],"award-info":[{"award-number":["24ZR1405000"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Softw. Eng. Methodol."],"published-print":{"date-parts":[[2026,5,31]]},"abstract":"<jats:p>\n                    Automated log analysis is crucial to ensure the high availability and reliability of complex systems. The advent of Large Language Models (LLMs) in Natural Language Processing (NLP) has ushered in a new era of language model-driven automated log analysis, garnering significant interest. Within this field, two primary paradigms based on language models for log analysis have become prominent. Small Language Models (SLMs) (such as BERT) follow the\n                    <jats:italic toggle=\"yes\">pre-train and fine-tune<\/jats:italic>\n                    paradigm, focusing on the specific log analysis task through fine-tuning on supervised datasets. On the other hand, LLMs (such as ChatGPT) following the\n                    <jats:italic toggle=\"yes\">in-context learning<\/jats:italic>\n                    paradigm, analyze logs by providing a few examples in prompt contexts without updating parameters. Despite their respective strengths, both models exhibit inherent limitations. By comparing SLMs and LLMs, we notice that SLMs are more cost-effective but less powerful, whereas LLMs with large parameters are highly powerful but expensive and inefficient. 
To balance performance against inference cost in automated log analysis, this article introduces an adaptive log analysis framework, AdaptiveLog, which effectively reduces the costs associated with the LLM while ensuring superior results. The framework pairs an LLM with an SLM, strategically allocating the LLM to tackle complex logs while delegating simpler logs to the SLM. Specifically, to efficiently query the LLM, we propose an adaptive selection strategy based on the uncertainty estimation of the SLM, where the LLM is invoked only when the SLM is uncertain. In addition, to enhance the reasoning ability of the LLM in log analysis tasks, we propose a novel prompt strategy that retrieves similar error-prone cases as references, enabling the model to leverage past error experiences and learn solutions from these cases. We evaluate AdaptiveLog on different log analysis tasks. Extensive experiments demonstrate that AdaptiveLog achieves state-of-the-art results across these tasks, elevating the overall accuracy of log analysis while maintaining cost efficiency. 
Our source code and detailed experimental data are available at\n                    <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/github.com\/LeaperOvO\/AdaptiveLog-review\">https:\/\/github.com\/LeaperOvO\/AdaptiveLog-review<\/jats:ext-link>\n                    .\n                  <\/jats:p>","DOI":"10.1145\/3749840","type":"journal-article","created":{"date-parts":[[2025,7,22]],"date-time":"2025-07-22T22:19:56Z","timestamp":1753222796000},"page":"1-42","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["AdaptiveLog: An Adaptive Log Analysis Framework with the Collaboration of Large and Small Language Model"],"prefix":"10.1145","volume":"35","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-5974-5988","authenticated-orcid":false,"given":"Lipeng","family":"Ma","sequence":"first","affiliation":[{"name":"School of Computer Science, Fudan University, Shanghai, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6473-9272","authenticated-orcid":false,"given":"Weidong","family":"Yang","sequence":"additional","affiliation":[{"name":"School of Computer Science, Fudan University, Shanghai, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9229-7555","authenticated-orcid":false,"given":"Yixuan","family":"Li","sequence":"additional","affiliation":[{"name":"School of Computer Science, Fudan University, Shanghai, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3219-9996","authenticated-orcid":false,"given":"Ben","family":"Fei","sequence":"additional","affiliation":[{"name":"School of Computer Science, Fudan University, Shanghai, 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3289-0533","authenticated-orcid":false,"given":"Mingjie","family":"Zhou","sequence":"additional","affiliation":[{"name":"School of Computer Science, Fudan University, Shanghai, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-5175-7667","authenticated-orcid":false,"given":"Shuhao","family":"Li","sequence":"additional","affiliation":[{"name":"School of Computer Science, Fudan University, Shanghai, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0736-6457","authenticated-orcid":false,"given":"Sihang","family":"Jiang","sequence":"additional","affiliation":[{"name":"School of Computer Science, Fudan University, Shanghai, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2083-4307","authenticated-orcid":false,"given":"Bo","family":"Xu","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Donghua University, Shanghai, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8403-9591","authenticated-orcid":false,"given":"Yanghua","family":"Xiao","sequence":"additional","affiliation":[{"name":"School of Computer Science, Fudan University, Shanghai, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2026,4,24]]},"reference":[{"key":"e_1_3_3_2_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.inffus.2021.05.008"},{"key":"e_1_3_3_3_2","unstructured":"Josh Achiam Steven Adler Sandhini Agarwal Lama Ahmad Ilge Akkaya Florencia Leoni Aleman Diogo Almeida Janko Altenschmidt Sam Altman Shyamal Anadkat et al. 2023. GPT-4 technical report. arXiv:2303.08774. 
Retrieved from https:\/\/arxiv.org\/abs\/2303.08774"},{"key":"e_1_3_3_4_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2024.eacl-srw.17"},{"key":"e_1_3_3_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNSM.2024.3358730"},{"key":"e_1_3_3_6_2","doi-asserted-by":"publisher","DOI":"10.1145\/3674805.3686684"},{"key":"e_1_3_3_7_2","unstructured":"Tom B. Brown. 2020. Language models are few-shot learners. arXiv:2005.14165. Retrieved from https:\/\/arxiv.org\/abs\/2005.14165"},{"key":"e_1_3_3_8_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11280-024-01276-1"},{"key":"e_1_3_3_9_2","article-title":"Fast greedy map inference for determinantal point process to improve recommendation diversity","volume":"31","author":"Chen Laming","year":"2018","unstructured":"Laming Chen, Guoxin Zhang, and Eric Zhou. 2018. Fast greedy map inference for determinantal point process to improve recommendation diversity. In Proceedings of the Advances in Neural Information Processing Systems, Vol. 31.","journal-title":"Proceedings of the Advances in Neural Information Processing Systems"},{"key":"e_1_3_3_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISSRE5003.2020.00013"},{"key":"e_1_3_3_11_2","doi-asserted-by":"publisher","DOI":"10.1080\/08839514.2022.2145642"},{"key":"e_1_3_3_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/3627703.3629553"},{"key":"e_1_3_3_13_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.artmed.2011.06.002"},{"key":"e_1_3_3_14_2","doi-asserted-by":"publisher","DOI":"10.1145\/3338906.3338916"},{"key":"e_1_3_3_15_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.emnlp-main.755"},{"key":"e_1_3_3_16_2","unstructured":"Jacob Devlin. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805. Retrieved from https:\/\/arxiv.org\/abs\/1810.04805"},{"key":"e_1_3_3_17_2","unstructured":"Qingxiu Dong Lei Li Damai Dai Ce Zheng Zhiyong Wu Baobao Chang Xu Sun Jingjing Xu and Zhifang Sui. 2022. 
A survey on in-context learning. arXiv:2301.00234. Retrieved from https:\/\/arxiv.org\/abs\/2301.00234"},{"key":"e_1_3_3_18_2","doi-asserted-by":"publisher","DOI":"10.1145\/3133956.3134015"},{"key":"e_1_3_3_19_2","doi-asserted-by":"publisher","DOI":"10.1145\/3597503.3639219"},{"key":"e_1_3_3_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/FTXS56515.2022.00006"},{"key":"e_1_3_3_21_2","doi-asserted-by":"publisher","DOI":"10.1145\/3588195.3595943"},{"key":"e_1_3_3_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE48619.2023.00128"},{"key":"e_1_3_3_23_2","volume-title":"Proceedings of the 11th International Conference on Learning Representations","author":"Frantar Elias","year":"2022","unstructured":"Elias Frantar, Saleh Ashkboos, Torsten Hoefler, and Dan Alistarh. 2022. OPTQ: Accurate quantization for generative pre-trained transformers. In Proceedings of the 11th International Conference on Learning Representations."},{"key":"e_1_3_3_24_2","first-page":"10421","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Fu Yao","year":"2023","unstructured":"Yao Fu, Hao Peng, Litu Ou, Ashish Sabharwal, and Tushar Khot. 2023. Specializing smaller language models towards multi-step reasoning. In Proceedings of the International Conference on Machine Learning. PMLR, 10421\u201310430."},{"key":"e_1_3_3_25_2","first-page":"1050","volume-title":"In Proceedings of the International Conference on Machine Learning","author":"Gal Yarin","year":"2016","unstructured":"Yarin Gal and Zoubin Ghahramani. 2016. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In Proceedings of the International Conference on Machine Learning. 
PMLR, 1050\u20131059."},{"key":"e_1_3_3_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSC.2020.2993728"},{"key":"e_1_3_3_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/ASE56229.2023.00109"},{"key":"e_1_3_3_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00355"},{"key":"e_1_3_3_29_2","volume-title":"Proceedings of the 12th International Conference on Learning Representations","author":"Gu Yuxian","year":"2024","unstructured":"Yuxian Gu, Li Dong, Furu Wei, and Minlie Huang. 2024. MiniLLM: Knowledge distillation of large language models. In Proceedings of the 12th International Conference on Learning Representations."},{"key":"e_1_3_3_30_2","unstructured":"Wei Guan Jian Cao Shiyou Qian Jianqi Gao and Chun Ouyang. 2024. LogLLM: Log-based anomaly detection using large language models. arXiv:2411.08561. Retrieved from https:\/\/arxiv.org\/abs\/2411.08561"},{"key":"e_1_3_3_31_2","unstructured":"Daya Guo Qihao Zhu Dejian Yang Zhenda Xie Kai Dong Wentao Zhang Guanting Chen Xiao Bi Yu Wu Y. K. Li et al. 2024. DeepSeek-Coder: When the large language model meets programming\u2014The rise of code intelligence. arXiv:2401.14196. Retrieved from https:\/\/arxiv.org\/abs\/2401.14196"},{"key":"e_1_3_3_32_2","doi-asserted-by":"publisher","DOI":"10.1145\/3691620.3695475"},{"key":"e_1_3_3_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICWS.2017.13"},{"key":"e_1_3_3_34_2","doi-asserted-by":"publisher","DOI":"10.1145\/3460345"},{"key":"e_1_3_3_35_2","doi-asserted-by":"publisher","DOI":"10.1145\/3660773"},{"key":"e_1_3_3_36_2","doi-asserted-by":"crossref","unstructured":"Cheng-Yu Hsieh Chun-Liang Li Chih-Kuan Yeh Hootan Nakhost Yasuhisa Fujii Alexander Ratner Ranjay Krishna Chen-Yu Lee and Tomas Pfister. 2023. Distilling step-by-step! Outperforming larger language models with less training data and smaller model sizes. arXiv:2305.02301. 
Retrieved from https:\/\/arxiv.org\/abs\/2305.02301","DOI":"10.18653\/v1\/2023.findings-acl.507"},{"key":"e_1_3_3_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNSM.2020.3034647"},{"key":"e_1_3_3_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2023.3257518"},{"key":"e_1_3_3_39_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2024.naacl-long.389"},{"key":"e_1_3_3_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE48619.2023.00125"},{"key":"e_1_3_3_41_2","unstructured":"Zhihan Jiang Jinyang Liu Zhuangbin Chen Yichen Li Junjie Huang Yintong Huo Pinjia He Jiazhen Gu and Michael R. Lyu. 2023. LILAC: Log parsing using LLMs with adaptive parsing cache. arXiv:2310.01796. Retrieved from https:\/\/arxiv.org\/abs\/2310.01796"},{"key":"e_1_3_3_42_2","unstructured":"James Joyce. 2003. Bayes\u2019 theorem. Retrieved from https:\/\/seop.illc.uva.nl\/entries\/bayes-theorem\/"},{"key":"e_1_3_3_43_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.lindif.2023.102274"},{"key":"e_1_3_3_44_2","article-title":"What uncertainties do we need in Bayesian deep learning for computer vision","volume":"30","author":"Kendall Alex","year":"2017","unstructured":"Alex Kendall and Yarin Gal. 2017. What uncertainties do we need in Bayesian deep learning for computer vision? In Proceedings of the Advances in Neural Information Processing Systems, Vol. 30.","journal-title":"Proceedings of the Advances in Neural Information Processing Systems"},{"key":"e_1_3_3_45_2","doi-asserted-by":"publisher","DOI":"10.1007\/BF00155578"},{"key":"e_1_3_3_46_2","unstructured":"Bhawesh Kumar Charlie Lu Gauri Gupta Anil Palepu David Bellamy Ramesh Raskar and Andrew Beam. 2023. Conformal prediction with large language models for multi-choice question answering. arXiv:2305.18404. 
Retrieved from https:\/\/arxiv.org\/abs\/2305.18404"},{"key":"e_1_3_3_47_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.findings-acl.395"},{"key":"e_1_3_3_48_2","doi-asserted-by":"publisher","DOI":"10.1109\/ASE51524.2021.9678773"},{"key":"e_1_3_3_49_2","doi-asserted-by":"publisher","DOI":"10.1145\/3510003.3510155"},{"key":"e_1_3_3_50_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE48619.2023.00204"},{"key":"e_1_3_3_51_2","doi-asserted-by":"publisher","DOI":"10.1145\/3654966"},{"key":"e_1_3_3_52_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.asoc.2023.110689"},{"key":"e_1_3_3_53_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISSRE5003.2020.00018"},{"key":"e_1_3_3_54_2","unstructured":"Yichen Li Yintong Huo Zhihan Jiang Renyi Zhong Pinjia He Yuxin Su and Michael R. Lyu. 2023. Exploring the effectiveness of LLMs in automated logging generation: An empirical study. arXiv:2307.05950. Retrieved from https:\/\/arxiv.org\/abs\/2307.05950"},{"key":"e_1_3_3_55_2","article-title":"Is your code generated by ChatGPT really correct? Rigorous evaluation of large language models for code generation","volume":"36","author":"Liu Jiawei","year":"2024","unstructured":"Jiawei Liu, Chunqiu Steven Xia, Yuyao Wang, and Lingming Zhang. 2024. Is your code generated by ChatGPT really correct? Rigorous evaluation of large language models for code generation. In Proceedings of the Advances in Neural Information Processing Systems, Vol. 36.","journal-title":"Proceedings of the Advances in Neural Information Processing Systems"},{"key":"e_1_3_3_56_2","unstructured":"Yinhan Liu Myle Ott Naman Goyal Jingfei Du Mandar Joshi Danqi Chen Omer Levy Mike Lewis Luke Zettlemoyer and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv:1907.11692. 
Retrieved from https:\/\/arxiv.org\/abs\/1907.11692"},{"key":"e_1_3_3_57_2","doi-asserted-by":"publisher","DOI":"10.1145\/3643916.3644408"},{"key":"e_1_3_3_58_2","doi-asserted-by":"publisher","DOI":"10.1145\/3485447.3511993"},{"key":"e_1_3_3_59_2","doi-asserted-by":"publisher","DOI":"10.1109\/DASC\/PiCom\/DataCom\/CyberSciTec.2018.00037"},{"key":"e_1_3_3_60_2","doi-asserted-by":"publisher","DOI":"10.1145\/3597503.3623304"},{"key":"e_1_3_3_61_2","unstructured":"Xiaoxue Ma Huiqi Zou Jacky Keung Pinjia He Yishu Li Xiao Yu and Federica Sarro. 2024. On the influence of data resampling for deep learning-based log anomaly detection: Insights and recommendations. arXiv:2405.03489. Retrieved from https:\/\/arxiv.org\/abs\/2405.03489"},{"key":"e_1_3_3_62_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.findings-emnlp.710"},{"key":"e_1_3_3_63_2","doi-asserted-by":"publisher","DOI":"10.1145\/3597503.3639150"},{"key":"e_1_3_3_64_2","doi-asserted-by":"publisher","DOI":"10.1159\/000113158"},{"key":"e_1_3_3_65_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2013.21"},{"key":"e_1_3_3_66_2","doi-asserted-by":"publisher","DOI":"10.1145\/3510003.3510096"},{"key":"e_1_3_3_67_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISSRE62328.2024.00016"},{"key":"e_1_3_3_68_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCC-DSS-SmartCity-DependSys60770.2023.00045"},{"key":"e_1_3_3_69_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNSM.2023.3239522"},{"key":"e_1_3_3_70_2","doi-asserted-by":"publisher","DOI":"10.1002\/jcpy.1313"},{"key":"e_1_3_3_71_2","unstructured":"Devjeet Roy Xuchao Zhang Rashi Bhave Chetan Bansal Pedro Las-Casas Rodrigo Fonseca and Saravan Rajmohan. 2024. Exploring LLM-based agents for root cause analysis. arXiv:2403.04123. Retrieved from https:\/\/arxiv.org\/abs\/2403.04123"},{"key":"e_1_3_3_72_2","unstructured":"Pranab Sahoo Ayush Kumar Singh Sriparna Saha Vinija Jain Samrat Mondal and Aman Chadha. 2024. 
A systematic survey of prompt engineering in large language models: Techniques and applications. arXiv:2402.07927. Retrieved from https:\/\/arxiv.org\/abs\/2402.07927"},{"key":"e_1_3_3_73_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE48619.2023.00185"},{"key":"e_1_3_3_74_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISSRE59848.2023.00083"},{"key":"e_1_3_3_75_2","doi-asserted-by":"publisher","DOI":"10.1109\/IWQoS57198.2023.10188759"},{"key":"e_1_3_3_76_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2024.3368208"},{"key":"e_1_3_3_77_2","doi-asserted-by":"publisher","DOI":"10.1109\/ASE56229.2023.00043"},{"key":"e_1_3_3_78_2","unstructured":"Xuezhi Wang Jason Wei Dale Schuurmans Quoc Le Ed Chi Sharan Narang Aakanksha Chowdhery and Denny Zhou. 2022. Self-consistency improves chain of thought reasoning in language models. arXiv:2203.11171. Retrieved from https:\/\/arxiv.org\/abs\/2203.11171"},{"key":"e_1_3_3_79_2","doi-asserted-by":"publisher","DOI":"10.1017\/S0269888900007098"},{"key":"e_1_3_3_80_2","unstructured":"Jerry Wei Jason Wei Yi Tay Dustin Tran Albert Webson Yifeng Lu Xinyun Chen Hanxiao Liu Da Huang Denny Zhou et al. 2023. Larger language models do in-context learning differently. arXiv:2303.03846. Retrieved from https:\/\/arxiv.org\/abs\/2303.03846"},{"key":"e_1_3_3_81_2","doi-asserted-by":"crossref","unstructured":"Thorsten Wittkopp Philipp Wiesner and Odej Kao. 2024. LogRCA: Log-based root cause analysis for distributed services. arXiv:2405.13599. Retrieved from https:\/\/arxiv.org\/abs\/2405.13599","DOI":"10.1007\/978-3-031-69766-1_25"},{"key":"e_1_3_3_82_2","unstructured":"Yi Xiao Van-Hoang Le and Hongyu Zhang. 2024. Stronger cheaper and demonstration-free log parsing with LLMs. arXiv:2406.06156. 
Retrieved from https:\/\/arxiv.org\/abs\/2406.06156"},{"key":"e_1_3_3_83_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33017322"},{"key":"e_1_3_3_84_2","doi-asserted-by":"publisher","DOI":"10.1145\/3660800"},{"key":"e_1_3_3_85_2","doi-asserted-by":"publisher","DOI":"10.1145\/3597503.3623326"},{"key":"e_1_3_3_86_2","doi-asserted-by":"publisher","DOI":"10.1145\/3597503.3639155"},{"key":"e_1_3_3_87_2","volume-title":"Proceedings of the 13th International Conference on Learning Representations","author":"Xu Junjielong","year":"2025","unstructured":"Junjielong Xu, Qinan Zhang, Zhiqing Zhong, Shilin He, Chaoyun Zhang, Qingwei Lin, Dan Pei, Pinjia He, Dongmei Zhang, and Qi Zhang. 2025. OpenRCA: Can large language models locate the root cause of software failures? In Proceedings of the 13th International Conference on Learning Representations."},{"key":"e_1_3_3_88_2","unstructured":"Xiaohan Xu Ming Li Chongyang Tao Tao Shen Reynold Cheng Jinyang Li Can Xu Dacheng Tao and Tianyi Zhou. 2024. A survey on knowledge distillation of large language models. arXiv:2402.13116. Retrieved from https:\/\/arxiv.org\/abs\/2402.13116"},{"key":"e_1_3_3_89_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE43902.2021.00130"},{"key":"e_1_3_3_90_2","doi-asserted-by":"publisher","DOI":"10.1145\/3597503.3623308"},{"key":"e_1_3_3_91_2","doi-asserted-by":"publisher","DOI":"10.1145\/3597503.3623316"},{"key":"e_1_3_3_92_2","article-title":"Evaluating and improving tool-augmented computation-intensive math reasoning","volume":"36","author":"Zhang Beichen","year":"2024","unstructured":"Beichen Zhang, Kun Zhou, Xilin Wei, Xin Zhao, Jing Sha, Shijin Wang, and Ji-Rong Wen. 2024. Evaluating and improving tool-augmented computation-intensive math reasoning. In Proceedings of the Advances in Neural Information Processing Systems, Vol. 
36.","journal-title":"Proceedings of the Advances in Neural Information Processing Systems"},{"key":"e_1_3_3_93_2","doi-asserted-by":"publisher","DOI":"10.1145\/3597503.3639205"},{"key":"e_1_3_3_94_2","doi-asserted-by":"publisher","DOI":"10.1145\/3179405"},{"key":"e_1_3_3_95_2","doi-asserted-by":"publisher","DOI":"10.1145\/3715005"},{"key":"e_1_3_3_96_2","doi-asserted-by":"publisher","DOI":"10.1145\/3338906.3338931"},{"key":"e_1_3_3_97_2","doi-asserted-by":"publisher","DOI":"10.1145\/3468264.3473919"},{"key":"e_1_3_3_98_2","doi-asserted-by":"publisher","DOI":"10.1145\/3468264.3473933"},{"key":"e_1_3_3_99_2","first-page":"12697","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Zhao Zihao","year":"2021","unstructured":"Zihao Zhao, Eric Wallace, Shi Feng, Dan Klein, and Sameer Singh. 2021. Calibrate before use: Improving few-shot performance of language models. In Proceedings of the International Conference on Machine Learning. PMLR, 12697\u201312706."},{"key":"e_1_3_3_100_2","doi-asserted-by":"publisher","DOI":"10.1145\/3637528.3671810"},{"key":"e_1_3_3_101_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISSRE59848.2023.00071"},{"key":"e_1_3_3_102_2","doi-asserted-by":"publisher","DOI":"10.1145\/3631972"}],"container-title":["ACM Transactions on Software Engineering and 
Methodology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3749840","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,25]],"date-time":"2026-04-25T06:34:19Z","timestamp":1777098859000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3749840"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,4,24]]},"references-count":101,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2026,5,31]]}},"alternative-id":["10.1145\/3749840"],"URL":"https:\/\/doi.org\/10.1145\/3749840","relation":{},"ISSN":["1049-331X","1557-7392"],"issn-type":[{"value":"1049-331X","type":"print"},{"value":"1557-7392","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,4,24]]},"assertion":[{"value":"2025-01-19","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-07-13","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2026-04-24","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}