{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,8]],"date-time":"2026-05-08T21:33:53Z","timestamp":1778276033281,"version":"3.51.4"},"reference-count":75,"publisher":"Association for Computing Machinery (ACM)","issue":"ISSTA","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Softw. Eng."],"published-print":{"date-parts":[[2025,6,22]]},"abstract":"<jats:p>Large language models (LLMs) have demonstrated remarkable performance in code generation, significantly enhancing the coding efficiency of developers. Recent advancements in LLM-based agents have led to significant progress in end-to-end automatic software engineering (ASE), particularly in software maintenance (e.g., fixing software issues) and evolution (e.g., adding new features). Despite these encouraging advances, current research faces two major challenges. First, state-of-the-art performance primarily depends on closed-source models like GPT-4, which significantly limits the technology\u2019s accessibility and potential for customization in diverse software engineering tasks. This dependence also raises concerns about data privacy, particularly when handling sensitive codebases. Second, these models are predominantly trained on static code data, lacking a deep understanding of the dynamic interactions, iterative problem-solving processes, and evolutionary characteristics inherent in software development. Consequently, they may face challenges in navigating complex project structures and generating contextually relevant solutions, which can affect their practical utility in real-world scenarios.<\/jats:p>\n          <jats:p>\n            To address these challenges, our study adopts a software engineering perspective. 
We recognize that real-world software maintenance and evolution processes encompass not only static code data but also developers\u2019 thought processes, utilization of external tools, and the interaction between different functional personnel. Our objective is to develop an open-source large language model specifically optimized for software improvement, aiming to match the performance of closed-source alternatives while offering greater accessibility and customization potential. Consequently, we introduce the\n            <jats:bold>Lingma SWE-GPT<\/jats:bold>\n            series, comprising Lingma SWE-GPT 7B and Lingma SWE-GPT 72B. By learning from and simulating real-world code submission activities, Lingma SWE-GPT systematically incorporates the dynamic interactions and iterative problem-solving inherent in the software development process\u2014such as repository understanding, fault localization, and patch generation\u2014thereby achieving a more comprehensive understanding of software improvement processes. We conducted experimental evaluations using the SWE-bench-Verified benchmark (comprising 500 real GitHub issues), recently proposed by OpenAI. The results demonstrate that\n            <jats:bold>Lingma SWE-GPT 72B successfully resolves 30.20% of the GitHub issues<\/jats:bold>\n            , marking a significant improvement in automatic issue resolution (22.76% relative improvement compared to Llama 3.1 405B), approaching the performance of closed-source models (GPT-4o resolves 31.80% of the issues). 
Notably, Lingma SWE-GPT 7B resolves 18.20% of the issues, surpassing the 17.20% resolution rate of Llama 3.1 70B, highlighting the potential for applying smaller models to ASE tasks.\n          <\/jats:p>","DOI":"10.1145\/3728981","type":"journal-article","created":{"date-parts":[[2025,6,22]],"date-time":"2025-06-22T10:52:56Z","timestamp":1750589576000},"page":"2362-2383","source":"Crossref","is-referenced-by-count":1,"title":["SWE-GPT: A Process-Centric Language Model for Automated Software Improvement"],"prefix":"10.1145","volume":"2","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7630-4113","authenticated-orcid":false,"given":"Yingwei","family":"Ma","sequence":"first","affiliation":[{"name":"Alibaba Group, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3779-5885","authenticated-orcid":false,"given":"Rongyu","family":"Cao","sequence":"additional","affiliation":[{"name":"Alibaba Group, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0311-5474","authenticated-orcid":false,"given":"Yongchang","family":"Cao","sequence":"additional","affiliation":[{"name":"Alibaba Group, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5068-2260","authenticated-orcid":false,"given":"Yue","family":"Zhang","sequence":"additional","affiliation":[{"name":"Alibaba Group, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2075-4283","authenticated-orcid":false,"given":"Jue","family":"Chen","sequence":"additional","affiliation":[{"name":"Alibaba Group, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7060-9482","authenticated-orcid":false,"given":"Yibo","family":"Liu","sequence":"additional","affiliation":[{"name":"Alibaba Group, Beijing, 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0034-5853","authenticated-orcid":false,"given":"Yuchen","family":"Liu","sequence":"additional","affiliation":[{"name":"Alibaba Group, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4179-6979","authenticated-orcid":false,"given":"Binhua","family":"Li","sequence":"additional","affiliation":[{"name":"Alibaba Group, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2795-3114","authenticated-orcid":false,"given":"Fei","family":"Huang","sequence":"additional","affiliation":[{"name":"Alibaba Group, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-6992-6699","authenticated-orcid":false,"given":"Yongbin","family":"Li","sequence":"additional","affiliation":[{"name":"Alibaba Group, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,6,22]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Diogo Almeida, Janko Altenschmidt, Sam Altman, and Shyamal Anadkat.","author":"Achiam Josh","year":"2023","unstructured":"Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, and Shyamal Anadkat. 2023. Gpt-4 technical report. arXiv preprint arXiv:2303.08774."},{"key":"e_1_2_1_2_1","volume-title":"Proceedings of the 21st International Conference on Mining Software Repositories. 202\u2013206","author":"AlOmar Eman Abdullah","year":"2024","unstructured":"Eman Abdullah AlOmar, Anushkrishna Venkatakrishnan, Mohamed Wiem Mkaouer, Christian Newman, and Ali Ouni. 2024. How to refactor this code? An exploratory study on developer-ChatGPT refactoring conversations. In Proceedings of the 21st International Conference on Mining Software Repositories. 
202\u2013206."},{"key":"e_1_2_1_3_1","unstructured":"Anthropic. 2024. Introducing Claude 3.5 Sonnet. https:\/\/www.anthropic.com\/news\/claude-3-5-sonnet"},{"key":"e_1_2_1_4_1","unstructured":"Anthropic. 2024. Introducing the next generation of Claude. https:\/\/www.anthropic.com\/news\/claude-3-family"},{"key":"e_1_2_1_5_1","unstructured":"Carlos E. Jimenez John Yang Jiayi Geng. 2024. SWE-bench Lite: A Canonical Subset for Efficient Evaluation of Language Models as Software Engineers. https:\/\/www.swebench.com\/lite.html"},{"key":"e_1_2_1_6_1","volume-title":"Proceedings of SDAIR-94","author":"Cavnar William B","year":"1994","unstructured":"William B Cavnar and John M Trenkle. 1994. N-gram-based text categorization. In Proceedings of SDAIR-94, 3rd annual symposium on document analysis and information retrieval. 161175, 14."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/ASE51524.2021.9678559"},{"key":"e_1_2_1_8_1","unstructured":"Dong Chen Shaoxin Lin Muhan Zeng Daoguang Zan Jian-Gang Wang Anton Cheshkov Jun Sun Hao Yu Guoliang Dong and Artem Aliev. 2024. CodeR: Issue Resolving with Multi-Agent and Task Graphs. arXiv preprint arXiv:2406.01304."},{"key":"e_1_2_1_9_1","unstructured":"Mark Chen Jerry Tworek Heewoo Jun Qiming Yuan Henrique Ponde de Oliveira Pinto Jared Kaplan Harri Edwards Yuri Burda Nicholas Joseph Greg Brockman Alex Ray Raul Puri Gretchen Krueger Michael Petrov Heidy Khlaaf Girish Sastry Pamela Mishkin Brooke Chan Scott Gray Nick Ryder Mikhail Pavlov Alethea Power Lukasz Kaiser Mohammad Bavarian Clemens Winter Philippe Tillet Felipe Petroski Such Dave Cummings Matthias Plappert Fotios Chantzis Elizabeth Barnes Ariel Herbert-Voss William Hebgen Guss Alex Nichol Alex Paino Nikolas Tezak Jie Tang Igor Babuschkin Suchir Balaji Shantanu Jain William Saunders Christopher Hesse Andrew N. 
Carr Jan Leike Josh Achiam Vedant Misra Evan Morikawa Alec Radford Matthew Knight Miles Brundage Mira Murati Katie Mayer Peter Welinder Bob McGrew Dario Amodei Sam McCandlish Ilya Sutskever and Wojciech Zaremba. 2021. Evaluating Large Language Models Trained on Code. arxiv:2107.03374."},{"key":"e_1_2_1_10_1","unstructured":"Xinyun Chen Maxwell Lin Nathanael Sch\u00e4rli and Denny Zhou. 2023. Teaching large language models to self-debug. arXiv preprint arXiv:2304.05128."},{"key":"e_1_2_1_11_1","unstructured":"Cognition. 2023. Introducing Devin. https:\/\/www.cognition.ai\/introducing-devin"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3649825"},{"key":"e_1_2_1_13_1","unstructured":"Linyuan Gong Mostafa Elhoushi and Alvin Cheung. 2024. AST-T5: Structure-Aware Pretraining for Code Generation and Understanding. arXiv preprint arXiv:2401.03003."},{"key":"e_1_2_1_14_1","volume-title":"Zijuan Lin, and Liyang Zhou.","author":"Hong Sirui","year":"2023","unstructured":"Sirui Hong, Xiawu Zheng, Jonathan Chen, Yuheng Cheng, Jinlin Wang, Ceyao Zhang, Zili Wang, Steven Ka Shing Yau, Zijuan Lin, and Liyang Zhou. 2023. Metagpt: Meta programming for multi-agent collaborative framework. arXiv preprint arXiv:2308.00352."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/3609437.3609439"},{"key":"e_1_2_1_16_1","unstructured":"Huggingface Open LLM Leaderboard. 2024. Dataset Card for Evaluation run of Qwen. https:\/\/huggingface.co\/datasets\/open-llm-leaderboard\/Qwen__Qwen2-72B-details"},{"key":"e_1_2_1_17_1","unstructured":"Binyuan Hui Jian Yang Zeyu Cui Jiaxi Yang Dayiheng Liu Lei Zhang Tianyu Liu Jiajun Zhang Bowen Yu and Kai Dang. 2024. Qwen2. 5-coder technical report. arXiv preprint arXiv:2409.12186."},{"key":"e_1_2_1_18_1","volume-title":"Automatic Code Annotation Generation Based on Heterogeneous Graph Structure. In 2023 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER). 
497\u2013508","author":"Jiang Zhijie","year":"2023","unstructured":"Zhijie Jiang, Haixu Xiong, Yingwei Ma, Yao Zhang, Yan Ding, Yun Xiong, and Shanshan Li. 2023. Automatic Code Annotation Generation Based on Heterogeneous Graph Structure. In 2023 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER). 497\u2013508."},{"key":"e_1_2_1_19_1","volume-title":"Swe-bench: Can language models resolve real-world github issues? arXiv preprint arXiv:2310.06770.","author":"Jimenez Carlos E","year":"2023","unstructured":"Carlos E Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, and Karthik Narasimhan. 2023. Swe-bench: Can language models resolve real-world github issues? arXiv preprint arXiv:2310.06770."},{"key":"e_1_2_1_20_1","doi-asserted-by":"crossref","unstructured":"Jiaolong Kong Mingfei Cheng Xiaofei Xie Shangqing Liu Xiaoning Du and Qi Guo. 2024. ContrastRepair: Enhancing Conversation-Based Automated Program Repair via Contrastive Test Case Pairs. arXiv preprint arXiv:2403.01971.","DOI":"10.1145\/3719345"},{"key":"e_1_2_1_21_1","volume-title":"Jen-tse Huang, Zhouruixin Zhu, Lingming Zhang, and Michael R Lyu.","author":"Lee Cheryl","year":"2024","unstructured":"Cheryl Lee, Chunqiu Steven Xia, Jen-tse Huang, Zhouruixin Zhu, Lingming Zhang, and Michael R Lyu. 2024. A Unified Debugging Approach via LLM-Based Multi-Agent Synergy. arXiv preprint arXiv:2404.17153."},{"key":"e_1_2_1_22_1","volume-title":"2023 IEEE 34th International Symposium on Software Reliability Engineering (ISSRE). 275\u2013286","author":"Li Jiaying","year":"2023","unstructured":"Jiaying Li, Yan Lei, Shanshan Li, Haifang Zhou, Yue Yu, Zhouyang Jia, Yingwei Ma, and Teng Wang. 2023. A two-stage framework for ambiguous classification in software engineering. In 2023 IEEE 34th International Symposium on Software Reliability Engineering (ISSRE). 
275\u2013286."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/3597207"},{"key":"e_1_2_1_24_1","unstructured":"Meiziniu Li Dongze Li Jianmeng Liu Jialun Cao Yongqiang Tian and Shing-Chi Cheung. 2024. DLLens: Testing Deep Learning Libraries via LLM-aided Synthesis. arXiv preprint arXiv:2406.07944."},{"key":"e_1_2_1_25_1","volume-title":"Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 5430\u20135441","author":"Liu Bingchang","year":"2024","unstructured":"Bingchang Liu, Chaoyu Chen, Zi Gong, Cong Liao, Huan Wang, Zhichao Lei, Ming Liang, Dajun Chen, Min Shen, and Hailian Zhou. 2024. Mftcoder: Boosting code llms with multitask fine-tuning. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 5430\u20135441."},{"key":"e_1_2_1_26_1","unstructured":"Junwei Liu Kaixin Wang Yixuan Chen Xin Peng Zhenpeng Chen Lingming Zhang and Yiling Lou. 2024. Large Language Model-Based Agents for Software Engineering: A Survey. arXiv preprint arXiv:2409.02977."},{"key":"e_1_2_1_27_1","volume-title":"Yuyao Wang, and Lingming Zhang.","author":"Liu Jiawei","year":"2024","unstructured":"Jiawei Liu, Chunqiu Steven Xia, Yuyao Wang, and Lingming Zhang. 2024. Is your code generated by chatgpt really correct? rigorous evaluation of large language models for code generation. Advances in Neural Information Processing Systems, 36 (2024)."},{"key":"e_1_2_1_28_1","unstructured":"Xiangyan Liu Bo Lan Zhiyuan Hu Yang Liu Zhicheng Zhang Wenmeng Zhou Fei Wang and Michael Shieh. 2024. CodexGraph: Bridging Large Language Models and Code Repositories via Code Graph Databases. arXiv preprint arXiv:2408.03910."},{"key":"e_1_2_1_29_1","unstructured":"Yizhou Liu Pengfei Gao Xinchen Wang Chao Peng and Zhao Zhang. 2024. MarsCode Agent: AI-native Automated Bug Fixing. 
arXiv preprint arXiv:2409.00899."},{"key":"e_1_2_1_30_1","volume-title":"Federico Cassano, Joel Lamy-Poirier, Nouamane Tazi, Ao Tang, Dmytro Pykhtar, Jiawei Liu, and Yuxiang Wei.","author":"Lozhkov Anton","year":"2024","unstructured":"Anton Lozhkov, Raymond Li, Loubna Ben Allal, Federico Cassano, Joel Lamy-Poirier, Nouamane Tazi, Ao Tang, Dmytro Pykhtar, Jiawei Liu, and Yuxiang Wei. 2024. StarCoder 2 and The Stack v2: The Next Generation. arXiv preprint arXiv:2402.19173."},{"key":"e_1_2_1_31_1","doi-asserted-by":"crossref","unstructured":"Qinyu Luo Yining Ye Shihao Liang Zhong Zhang Yujia Qin Yaxi Lu Yesai Wu Xin Cong Yankai Lin and Yingli Zhang. 2024. RepoAgent: An LLM-Powered Open-Source Framework for Repository-level Code Documentation Generation. arXiv preprint arXiv:2402.16667.","DOI":"10.18653\/v1\/2024.emnlp-demo.46"},{"key":"e_1_2_1_32_1","volume-title":"Wizardcoder: Empowering code large language models with evol-instruct. arXiv preprint arXiv:2306.08568.","author":"Luo Ziyang","year":"2023","unstructured":"Ziyang Luo, Can Xu, Pu Zhao, Qingfeng Sun, Xiubo Geng, Wenxiang Hu, Chongyang Tao, Jing Ma, Qingwei Lin, and Daxin Jiang. 2023. Wizardcoder: Empowering code large language models with evol-instruct. arXiv preprint arXiv:2306.08568."},{"key":"e_1_2_1_33_1","unstructured":"Yingwei Ma Yue Liu Yue Yu Yuanliang Zhang Yu Jiang Changjian Wang and Shanshan Li. 2023. At Which Training Stage Does Code Data Help LLMs Reasoning? arXiv preprint arXiv:2309.16298."},{"key":"e_1_2_1_34_1","unstructured":"Yingwei Ma Qingping Yang Rongyu Cao Binhua Li Fei Huang and Yongbin Li. 2024. How to Understand Whole Software Repository? arXiv preprint arXiv:2406.01422."},{"key":"e_1_2_1_35_1","volume-title":"2023 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER). 120\u2013131","author":"Ma Yingwei","year":"2023","unstructured":"Yingwei Ma, Yue Yu, Shanshan Li, Zhouyang Jia, Jun Ma, Rulin Xu, Wei Dong, and Xiangke Liao. 2023. 
Mulcs: Towards a unified deep representation for multilingual code search. In 2023 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER). 120\u2013131."},{"key":"e_1_2_1_36_1","unstructured":"Meta. 2024. Introducing Llama 3.1. https:\/\/ai.meta.com\/blog\/meta-llama-3-1\/"},{"key":"e_1_2_1_37_1","volume-title":"Swayam Singh, Xiangru Tang, Leandro Von Werra, and Shayne Longpre.","author":"Muennighoff Niklas","year":"2023","unstructured":"Niklas Muennighoff, Qian Liu, Armel Zebaze, Qinkai Zheng, Binyuan Hui, Terry Yue Zhuo, Swayam Singh, Xiangru Tang, Leandro Von Werra, and Shayne Longpre. 2023. Octopack: Instruction tuning code large language models. arXiv preprint arXiv:2308.07124."},{"key":"e_1_2_1_38_1","unstructured":"Ansong Ni Miltiadis Allamanis Arman Cohan Yinlin Deng Kensen Shi Charles Sutton and Pengcheng Yin. 2024. NExT: Teaching Large Language Models to Reason about Code Execution. arXiv preprint arXiv:2404.14662."},{"key":"e_1_2_1_39_1","unstructured":"OpenAI. 2024. Introducing GPT-4o. https:\/\/openai.com\/index\/hello-gpt-4o\/"},{"key":"e_1_2_1_40_1","unstructured":"OpenAI. 2024. Introducing SWE-bench Verified. https:\/\/openai.com\/index\/introducing-swe-bench-verified\/"},{"key":"e_1_2_1_41_1","volume-title":"Rahul Krishna, Divya Sankar, Lambert Pouguem Wassi, Michele Merler, Boris Sobolev, Raju Pavuluri, Saurabh Sinha, and Reyhaneh Jabbarvand.","author":"Pan Rangeet","year":"2023","unstructured":"Rangeet Pan, Ali Reza Ibrahimzada, Rahul Krishna, Divya Sankar, Lambert Pouguem Wassi, Michele Merler, Boris Sobolev, Raju Pavuluri, Saurabh Sinha, and Reyhaneh Jabbarvand. 2023. Understanding the effectiveness of large language models in code translation. arXiv preprint arXiv:2308.03109."},{"key":"e_1_2_1_42_1","unstructured":"Zhenyu Pan Rongyu Cao Yongchang Cao Yingwei Ma Binhua Li Fei Huang Han Liu and Yongbin Li. 2024. Codev-Bench: How Do LLMs Understand Developer-Centric Code Completion? 
arXiv preprint arXiv:2410.01353."},{"key":"e_1_2_1_43_1","unstructured":"Paul Gauthier. 2024. Aider is ai pair programming in your terminal.. https:\/\/aider.chat\/2024"},{"key":"e_1_2_1_44_1","unstructured":"Shuo Ren Daya Guo Shuai Lu Long Zhou Shujie Liu Duyu Tang Neel Sundaresan Ming Zhou Ambrosio Blanco and Shuai Ma. 2020. Codebleu: a method for automatic evaluation of code synthesis. arXiv preprint arXiv:2009.10297."},{"key":"e_1_2_1_45_1","volume-title":"Yossi Adi, Jingyu Liu, Romain Sauvestre, and Tal Remez.","author":"Roziere Baptiste","year":"2023","unstructured":"Baptiste Roziere, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Romain Sauvestre, and Tal Remez. 2023. Code llama: Open foundation models for code. arXiv preprint arXiv:2308.12950."},{"key":"e_1_2_1_46_1","unstructured":"Bo Shen Jiaxin Zhang Taihong Chen Daoguang Zan Bing Geng An Fu Muhan Zeng Ailun Yu Jichuan Ji and Jingyang Zhao. 2023. Pangu-coder2: Boosting large language models for code with ranking feedback. arXiv preprint arXiv:2307.14936."},{"key":"e_1_2_1_47_1","unstructured":"Yuling Shi Songsong Wang Chengcheng Wan and Xiaodong Gu. 2024. From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging. arXiv preprint arXiv:2410.01215."},{"key":"e_1_2_1_48_1","volume-title":"2023 30th Asia-Pacific Software Engineering Conference (APSEC). 151\u2013160","author":"Shirafuji Atsushi","year":"2023","unstructured":"Atsushi Shirafuji, Yusuke Oda, Jun Suzuki, Makoto Morishita, and Yutaka Watanobe. 2023. Refactoring programs using large language models with few-shot examples. In 2023 30th Asia-Pacific Software Engineering Conference (APSEC). 151\u2013160."},{"key":"e_1_2_1_49_1","volume-title":"Codegemma: Open code models based on gemma. arXiv preprint arXiv:2406.11409.","author":"Team CodeGemma","year":"2024","unstructured":"CodeGemma Team. 2024. Codegemma: Open code models based on gemma. 
arXiv preprint arXiv:2406.11409."},{"key":"e_1_2_1_50_1","unstructured":"Gemini Team Rohan Anil Sebastian Borgeaud Yonghui Wu Jean-Baptiste Alayrac Jiahui Yu Radu Soricut Johan Schalkwyk Andrew M Dai and Anja Hauth. 2023. Gemini: a family of highly capable multimodal models. arXiv preprint arXiv:2312.11805."},{"key":"e_1_2_1_51_1","first-page":"202","article-title":"Comparison of jaccard, dice, cosine similarity coefficient to find best fitness value for web retrieved documents using genetic algorithm","volume":"2","author":"Thada Vikas","year":"2013","unstructured":"Vikas Thada and Vivek Jaglan. 2013. Comparison of jaccard, dice, cosine similarity coefficient to find best fitness value for web retrieved documents using genetic algorithm. International Journal of Innovations in Engineering and Technology, 2, 4 (2013), 202\u2013205.","journal-title":"International Journal of Innovations in Engineering and Technology"},{"key":"e_1_2_1_52_1","doi-asserted-by":"crossref","unstructured":"Yuchen Tian Weixiang Yan Qian Yang Qian Chen Wen Wang Ziyang Luo and Lei Ma. 2024. CodeHalu: Code Hallucinations in LLMs Driven by Execution-based Verification. arXiv preprint arXiv:2405.00253.","DOI":"10.1609\/aaai.v39i24.34717"},{"key":"e_1_2_1_53_1","unstructured":"Xingyao Wang Yangyi Chen Lifan Yuan Yizhe Zhang Yunzhu Li Hao Peng and Heng Ji. 2024. Executable Code Actions Elicit Better LLM Agents. arxiv:2402.01030."},{"key":"e_1_2_1_54_1","volume-title":"Forty-first International Conference on Machine Learning.","author":"Wei Yuxiang","year":"2024","unstructured":"Yuxiang Wei, Zhe Wang, Jiawei Liu, Yifeng Ding, and Lingming Zhang. 2024. Magicoder: Empowering code generation with oss-instruct. In Forty-first International Conference on Machine Learning."},{"key":"e_1_2_1_55_1","volume-title":"Agentless: Demystifying llm-based software engineering agents. 
arXiv preprint arXiv:2407.01489.","author":"Xia Chunqiu Steven","year":"2024","unstructured":"Chunqiu Steven Xia, Yinlin Deng, Soren Dunn, and Lingming Zhang. 2024. Agentless: Demystifying llm-based software engineering agents. arXiv preprint arXiv:2407.01489."},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1145\/3597503.3639121"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/3691620.3695064"},{"key":"e_1_2_1_58_1","unstructured":"Ruiyang Xu Jialun Cao Yaojie Lu Hongyu Lin Xianpei Han Ben He Shing-Chi Cheung and Le Sun. 2024. CRUXEval-X: A Benchmark for Multilingual Code Reasoning Understanding and Execution. arXiv preprint arXiv:2408.13001."},{"key":"e_1_2_1_59_1","volume-title":"ACWRecommender: A Tool for Validating Actionable Warnings with Weak Supervision. In 2023 38th IEEE\/ACM International Conference on Automated Software Engineering (ASE). 1876\u20131880","author":"Xue Zhipeng","year":"2023","unstructured":"Zhipeng Xue, Zhipeng Gao, Xing Hu, and Shanping Li. 2023. ACWRecommender: A Tool for Validating Actionable Warnings with Weak Supervision. In 2023 38th IEEE\/ACM International Conference on Automated Software Engineering (ASE). 1876\u20131880."},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1145\/3650212.3680368"},{"key":"e_1_2_1_61_1","volume-title":"Better Debugging: Combining Static Analysis and LLMs for Explainable Crashing Fault Localization. arXiv preprint arXiv:2408.12070.","author":"Yan Jiwei","year":"2024","unstructured":"Jiwei Yan, Jinhao Huang, Chunrong Fang, Jun Yan, and Jian Zhang. 2024. Better Debugging: Combining Static Analysis and LLMs for Explainable Crashing Fault Localization. arXiv preprint arXiv:2408.12070."},{"key":"e_1_2_1_62_1","unstructured":"An Yang Baosong Yang Binyuan Hui Bo Zheng Bowen Yu Chang Zhou Chengpeng Li Chengyuan Li Dayiheng Liu and Fei Huang. 2024. Qwen2 technical report. 
arXiv preprint arXiv:2407.10671."},{"key":"e_1_2_1_63_1","volume-title":"Swe-agent: Agent-computer interfaces enable automated software engineering. arXiv preprint arXiv:2405.15793.","author":"Yang John","year":"2024","unstructured":"John Yang, Carlos E Jimenez, Alexander Wettig, Kilian Lieret, Shunyu Yao, Karthik Narasimhan, and Ofir Press. 2024. Swe-agent: Agent-computer interfaces enable automated software engineering. arXiv preprint arXiv:2405.15793."},{"key":"e_1_2_1_64_1","volume-title":"Wavecoder: Widespread and versatile enhanced instruction tuning with refined data generation. arXiv preprint arXiv:2312.14187.","author":"Yu Zhaojian","year":"2023","unstructured":"Zhaojian Yu, Xin Zhang, Ning Shang, Yangyu Huang, Can Xu, Yishujie Zhao, Wenxiang Hu, and Qiufeng Yin. 2023. Wavecoder: Widespread and versatile enhanced instruction tuning with refined data generation. arXiv preprint arXiv:2312.14187."},{"key":"e_1_2_1_65_1","first-page":"15476","article-title":"Star: Bootstrapping reasoning with reasoning","volume":"35","author":"Zelikman Eric","year":"2022","unstructured":"Eric Zelikman, Yuhuai Wu, Jesse Mu, and Noah Goodman. 2022. Star: Bootstrapping reasoning with reasoning. Advances in Neural Information Processing Systems, 35 (2022), 15476\u201315488.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_1_66_1","doi-asserted-by":"crossref","unstructured":"Kechi Zhang Jia Li Ge Li Xianjie Shi and Zhi Jin. 2024. CodeAgent: Enhancing Code Generation with Tool-Integrated Agent Systems for Real-World Repo-level Coding Challenges. arXiv preprint arXiv:2401.07339.","DOI":"10.18653\/v1\/2024.acl-long.737"},{"key":"e_1_2_1_67_1","unstructured":"Kexun Zhang Weiran Yao Zuxin Liu Yihao Feng Zhiwei Liu Rithesh Murthy Tian Lan Lei Li Renze Lou and Jiacheng Xu. 2024. Diversity empowers intelligence: Integrating expertise of software engineering agents. 
arXiv preprint arXiv:2408.07060."},{"key":"e_1_2_1_68_1","volume-title":"Shin Hwei Tan, and Chengnian Sun","author":"Zhang Mengxiao","year":"2023","unstructured":"Mengxiao Zhang, Yongqiang Tian, Zhenyang Xu, Yiwen Dong, Shin Hwei Tan, and Chengnian Sun. 2023. Lampr: Boosting the Effectiveness of Language-Generic Program Reduction via Large Language Models. arXiv preprint arXiv:2312.13064."},{"key":"e_1_2_1_69_1","volume-title":"Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis. 261\u2013273","author":"Zhang Mengxiao","year":"2024","unstructured":"Mengxiao Zhang, Yongqiang Tian, Zhenyang Xu, Yiwen Dong, Shin Hwei Tan, and Chengnian Sun. 2024. LPR: Large Language Models-Aided Program Reduction. In Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis. 261\u2013273."},{"key":"e_1_2_1_70_1","doi-asserted-by":"publisher","DOI":"10.1145\/3650212.3680384"},{"key":"e_1_2_1_71_1","unstructured":"Yuwei Zhao Ziyang Luo Yuchen Tian Hongzhan Lin Weixiang Yan Annan Li and Jing Ma. 2024. CodeJudge-Eval: Can Large Language Models be Good Judges in Code Understanding? arXiv preprint arXiv:2408.10718."},{"key":"e_1_2_1_72_1","volume-title":"Jie Fu, Wenhu Chen, and Xiang Yue.","author":"Zheng Tianyu","year":"2024","unstructured":"Tianyu Zheng, Ge Zhang, Tianhao Shen, Xueling Liu, Bill Yuchen Lin, Jie Fu, Wenhu Chen, and Xiang Yue. 2024. Opencodeinterpreter: Integrating code generation with execution and refinement. arXiv preprint arXiv:2402.14658."},{"key":"e_1_2_1_73_1","volume-title":"MOSS: Enabling Code-Driven Evolution and Context Management for AI Agents. arXiv preprint arXiv:2409.16120.","author":"Zhu Ming","year":"2024","unstructured":"Ming Zhu and Yi Zhou. 2024. MOSS: Enabling Code-Driven Evolution and Context Management for AI Agents. arXiv preprint arXiv:2409.16120."},{"key":"e_1_2_1_74_1","volume-title":"DOMAINEVAL: An Auto-Constructed Benchmark for Multi-Domain Code Generation. 
arXiv preprint arXiv:2408.13204.","author":"Zhu Qiming","year":"2024","unstructured":"Qiming Zhu, Jialun Cao, Yaojie Lu, Hongyu Lin, Xianpei Han, Le Sun, and Shing-Chi Cheung. 2024. DOMAINEVAL: An Auto-Constructed Benchmark for Multi-Domain Code Generation. arXiv preprint arXiv:2408.13204."},{"key":"e_1_2_1_75_1","unstructured":"Qihao Zhu Daya Guo Zhihong Shao Dejian Yang Peiyi Wang Runxin Xu Y Wu Yukun Li Huazuo Gao and Shirong Ma. 2024. DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence. arXiv preprint arXiv:2406.11931."}],"container-title":["Proceedings of the ACM on Software Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3728981","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,1]],"date-time":"2025-08-01T20:15:01Z","timestamp":1754079301000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3728981"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,22]]},"references-count":75,"journal-issue":{"issue":"ISSTA","published-print":{"date-parts":[[2025,6,22]]}},"alternative-id":["10.1145\/3728981"],"URL":"https:\/\/doi.org\/10.1145\/3728981","relation":{},"ISSN":["2994-970X"],"issn-type":[{"value":"2994-970X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,6,22]]}}}