{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,11]],"date-time":"2026-05-11T22:49:18Z","timestamp":1778539758781,"version":"3.51.4"},"reference-count":47,"publisher":"Association for Computing Machinery (ACM)","issue":"FSE","license":[{"start":{"date-parts":[[2024,7,12]],"date-time":"2024-07-12T00:00:00Z","timestamp":1720742400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Softw. Eng."],"published-print":{"date-parts":[[2024,7,12]]},"abstract":"<jats:p>\n                    Large language models (LLMs) have shown impressive effectiveness in various software engineering tasks, including automated program repair (APR). In this study, we take a deep dive into automated bug localization and repair utilizing LLMs. In contrast to many deep learning-based APR methods that assume known bug locations, rely on line-level localization tools, or address bug prediction and fixing in one step, our approach uniquely employs LLMs to predict bug location at the token level and subsequently utilizes them for bug fixing. This methodological separation of bug localization and fixing using different LLMs enables effective integration of diverse contextual information and improved incorporation of inductive biases. We introduce\n                    <jats:bold>\n                      Toggle:\n                      <jats:underline>To<\/jats:underline>\n                      ken-\n                      <jats:underline>G<\/jats:underline>\n                      ranulated Bug\n                      <jats:underline>L<\/jats:underline>\n                      ocalization and R\n                      <jats:underline>e<\/jats:underline>\n                      pair\n                    <\/jats:bold>\n                    , a comprehensive program repair framework that integrates a bug localization model, an adjustment model to address tokenizer inconsistencies, and a bug-fixing model. Toggle takes a buggy function as input and generates a complete corrected function. We investigate various styles of prompting to the bug fixing model to identify the most effective prompts that better utilize the inductive bias and significantly outperform others.\n                    <jats:bold>Toggle<\/jats:bold>\n                    achieves the new state-of-the-art performance on the CodeXGLUE code refinement benchmark, and exhibits better and comparable performance on several other widely-used APR datasets, including Defects4J. In the Defects4J benchmark, our approach consistently ranks above other methods, achieving superior results in the Top-10, Top-30, Top-50, and Top-100 metrics. Besides examining Toggle\u2019s generalizability to unseen data, evaluating the effectiveness of various prompts, we also investigate the impact of additional contextual information such as buggy lines and code comments on bug localization, and explore the importance of the adjustment model. Our extensive experiments offer valuable insights and answers to critical research questions.\n                  <\/jats:p>","DOI":"10.1145\/3660773","type":"journal-article","created":{"date-parts":[[2024,7,12]],"date-time":"2024-07-12T10:22:09Z","timestamp":1720779729000},"page":"1471-1493","source":"Crossref","is-referenced-by-count":61,"title":["A Deep Dive into Large Language Models for Automated Bug Localization and Repair"],"prefix":"10.1145","volume":"1","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7282-061X","authenticated-orcid":false,"given":"Soneya Binta","family":"Hossain","sequence":"first","affiliation":[{"name":"University of Virginia, Charlottesville, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8518-2576","authenticated-orcid":false,"given":"Nan","family":"Jiang","sequence":"additional","affiliation":[{"name":"Purdue University, West Lafayette, USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-6770-0880","authenticated-orcid":false,"given":"Qiang","family":"Zhou","sequence":"additional","affiliation":[{"name":"Amazon Web Services, Santa Clara, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4916-1131","authenticated-orcid":false,"given":"Xiaopeng","family":"Li","sequence":"additional","affiliation":[{"name":"Amazon Web Services, Santa Clara, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2300-738X","authenticated-orcid":false,"given":"Wen-Hao","family":"Chiang","sequence":"additional","affiliation":[{"name":"Amazon Web Services, Santa Clara, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0139-8028","authenticated-orcid":false,"given":"Yingjun","family":"Lyu","sequence":"additional","affiliation":[{"name":"Amazon Web Services, Santa Clara, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6194-7930","authenticated-orcid":false,"given":"Hoan","family":"Nguyen","sequence":"additional","affiliation":[{"name":"Amazon Web Services, Santa Clara, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2393-854X","authenticated-orcid":false,"given":"Omer","family":"Tripp","sequence":"additional","affiliation":[{"name":"Amazon Web Services, Santa Clara, USA"}]}],"member":"320","published-online":{"date-parts":[[2024,7,12]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","unstructured":"Rui Abreu Peter Zoeteweij and Arjan J.C. van Gemund. 2007. On the Accuracy of Spectrum-based Fault Localization. In Testing: Academic and Industrial Conference Practice and Research Techniques - MUTATION (TAICPART-MUTATION 2007). 89\u201398. https:\/\/doi.org\/10.1109\/TAIC.PART.2007.13 10.1109\/TAIC.PART.2007.13","DOI":"10.1109\/TAIC.PART.2007.13"},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2020.3020502"},{"key":"e_1_3_1_4_2","unstructured":"Angelica Chen J\u00e9r\u00e9my Scheurer Tomasz Korbak Jon Ander Campos Jun Shern Chan Samuel R. Bowman Kyunghyun Cho and Ethan Perez. 2023. Improving Code Generation by Training with Natural Language Feedback. arXiv:2303.16749 [cs.SE]"},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","unstructured":"Bei Chen Fengji Zhang Anh Nguyen Daoguang Zan Zeqi Lin Jian-Guang Lou and Weizhu Chen. 2022. CodeT: Code Generation with Generated Tests. https:\/\/doi.org\/10.48550\/ARXIV.2207.10397 10.48550\/ARXIV.2207.10397","DOI":"10.48550\/ARXIV.2207.10397"},{"key":"e_1_3_1_6_2","unstructured":"Zimin Chen Steve James Kommrusch Michele Tufano Louis-No\u00ebl Pouchet Denys Poshyvanyk and Martin Monperrus. 2019. SequenceR: Sequence-to-Sequence Learning for End-to-End Program Repair. IEEE Transactions on Software Engineering (2019)."},{"key":"e_1_3_1_7_2","unstructured":"Jianlei Chi Yu Qu Ting Liu Qinghua Zheng and Heng Yin. 2022. SeqTrans: Automatic Vulnerability Fix via Sequence to Sequence Learning. arXiv:2010.10805 [cs.CR]"},{"key":"e_1_3_1_8_2","first-page":"2383","volume-title":"32nd USENIX Security Symposium (USENIX Security 23)","author":"Christou Neophytos","year":"2023","unstructured":"Neophytos Christou, Di Jin, Vaggelis Atlidakis, Baishakhi Ray, and Vasileios P. Kemerlis. 2023. IvySyn: Automated Vulnerability Discovery in Deep Learning Frameworks. In 32nd USENIX Security Symposium (USENIX Security 23). USENIX Association, Anaheim, CA, 2383\u20132400. https:\/\/www.usenix.org\/conference\/usenixsecurity23\/presentation\/christou"},{"key":"e_1_3_1_9_2","unstructured":"CodeParrot. [n. d.]. https:\/\/github.com\/huggingface\/transformers\/tree\/main\/examples\/research_projects\/codeparrot."},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1145\/3597926.3598067"},{"key":"e_1_3_1_11_2","unstructured":"Elizabeth Dinella Hanjun Dai Ziyang Li Mayur Naik Le Song and Ke Wang. 2020. Hoppity: Learning Graph Transformations to Detect and Fix Bugs in Programs. In International Conference on Learning Representations."},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.findings-emnlp.139"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1145\/3318162"},{"key":"e_1_3_1_14_2","unstructured":"Tatsunori B. Hashimoto Kelvin Guu Yonatan Oren and Percy Liang. 2018. A Retrieve-and-Edit Framework for Predicting Structured Outputs. In Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018 NeurIPS 2018 December 3-8 2018 Montr\u00e9al Canada Samy Bengio Hanna M. Wallach Hugo Larochelle Kristen Grauman Nicol\u00f2 Cesa-Bianchi and Roman Garnett (Eds.). 10073\u201310083. https:\/\/proceedings.neurips.cc\/paper\/2018\/hash\/cd17d3ce3b64f227987cd92cd701cc58-Abstract.html"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.emnlp-main.282"},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1145\/3611643.3616265"},{"key":"e_1_3_1_17_2","unstructured":"Yaojie Hu Xingjian Shi Qiang Zhou and Lee Pike. 2022. Fix Bugs with Transformer through a Neural-Symbolic Edit Grammar. arXiv preprint arXiv:2204.06643 (2022)."},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE48619.2023.00125"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","unstructured":"Nan Jiang Thibaud Lutellier Yiling Lou Lin Tan Dan Goldwasser and Xiangyu Zhang. 2023. KNOD: Domain Knowledge Distilled Tree Decoder for Automated Program Repair. In 2023 IEEE\/ACM 45th International Conference on Software Engineering (ICSE). 1251\u20131263. https:\/\/doi.org\/10.1109\/ICSE48619.2023.00111 10.1109\/ICSE48619.2023.00111","DOI":"10.1109\/ICSE48619.2023.00111"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","unstructured":"Nan Jiang Thibaud Lutellier and Lin Tan. 2021. CURE: Code-Aware Neural Machine Translation for Automatic Program Repair. In 2021 IEEE\/ACM 43rd International Conference on Software Engineering (ICSE). 1161\u20131173. https:\/\/doi.org\/10.1109\/ICSE43902.2021.00107 10.1109\/ICSE43902.2021.00107","DOI":"10.1109\/ICSE43902.2021.00107"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1145\/2610384.2628055"},{"key":"e_1_3_1_22_2","unstructured":"Hung Le Yue Wang Akhilesh Deepak Gotmare Silvio Savarese and Steven C.H. Hoi. 2022. CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning. arXiv preprint abs\/2207.01780 (2022)."},{"key":"e_1_3_1_23_2","unstructured":"CodeXGLUE Leaderboard. 2023. https:\/\/microsoft.github.io\/CodeXGLUE\/. Accessed: 2023-09-27."},{"key":"e_1_3_1_24_2","first-page":"602","volume-title":"ICSE (Seoul, South Korea)","author":"Li Yi","year":"2020","unstructured":"Yi Li, Shaohua Wang, and Tien N. Nguyen. 2020. DLFix: Context-Based Code Transformation Learning for Automated Program Repair. In ICSE (Seoul, South Korea). ACM, 602\u2013614."},{"key":"e_1_3_1_25_2","unstructured":"Zhiyu Li Shuai Lu Daya Guo Nan Duan Shailesh Jannu Grant Jenks Deep Majumder Jared Green Alexey Svyatkovskiy Shengyu Fu et al. 2022. CodeReviewer: Pre-Training for Automating Code Review Activities. arXiv preprint arXiv:2203.09095 (2022)."},{"key":"e_1_3_1_26_2","unstructured":"Jiate Liu Yiqin Zhu Kaiwen Xiao Qiang Fu Xiao Han Wei Yang and Deheng Ye. 2023. RLTF: Reinforcement Learning from Unit Test Feedback. arXiv:2307.04349 [cs.AI]"},{"key":"e_1_3_1_27_2","unstructured":"Shuai Lu Daya Guo Shuo Ren Junjie Huang Alexey Svyatkovskiy Ambrosio Blanco Colin Clement Dawn Drain Daxin Jiang Duyu Tang et al. 2021. CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation. arXiv preprint arXiv:2102.04664 (2021)."},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1145\/3395363.3397369"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","unstructured":"Xiangxin Meng Xu Wang Hongyu Zhang Hailong Sun Xudong Liu and Chunming Hu. 2023. Template-based Neural Program Repair. In 2023 IEEE\/ACM 45th International Conference on Software Engineering (ICSE). 1456\u20131468. https:\/\/doi.org\/10.1109\/ICSE48619.2023.00127 10.1109\/ICSE48619.2023.00127","DOI":"10.1109\/ICSE48619.2023.00127"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1145\/3105906"},{"key":"e_1_3_1_31_2","unstructured":"Martin Monperrus. 2020. The living review on automated program repair. (2020)."},{"key":"e_1_3_1_32_2","unstructured":"Erik Nijkamp Bo Pang Hiroaki Hayashi Lifu Tu Huan Wang Yingbo Zhou Silvio Savarese and Caiming Xiong. 2023. CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis. arXiv:2203.13474 [cs.LG]"},{"key":"e_1_3_1_33_2","doi-asserted-by":"crossref","unstructured":"Sheena Panthaplackel Miltiadis Allamanis and Marc Brockschmidt. 2020. Copy that! Editing Sequences by Copying Spans. arXiv:2006.04771 [cs.LG]","DOI":"10.1609\/aaai.v35i15.17606"},{"key":"e_1_3_1_34_2","doi-asserted-by":"crossref","unstructured":"Long Phan Hieu Tran Daniel Le Hieu Nguyen James Anibal Alec Peltekian and Yanfang Ye. 2021. CoTexT: Multi-task Learning with Code-Text Transformer. arXiv preprint arXiv:2105.08645 (2021).","DOI":"10.18653\/v1\/2021.nlp4prog-1.5"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","unstructured":"Benjamin Steenhoek Md Mahbubur Rahman Richard Jiles and Wei Le. 2023. An Empirical Study of Deep Learning Models for Vulnerability Detection. In 2023 IEEE\/ACM 45th International Conference on Software Engineering (ICSE). 2237\u20132248. https:\/\/doi.org\/10.1109\/ICSE48619.2023.00188 10.1109\/ICSE48619.2023.00188","DOI":"10.1109\/ICSE48619.2023.00188"},{"key":"e_1_3_1_36_2","doi-asserted-by":"crossref","unstructured":"Daniel Tarlow Subhodeep Moitra Andrew Rice Zimin Chen Pierre-Antoine Manzagol Charles Sutton and Edward Aftandilian. 2020. Learning to fix build errors with graph2diff neural networks. In Proceedings of the IEEE\/ACM 42nd International Conference on Software Engineering Workshops. 19\u201320.","DOI":"10.1145\/3387940.3392181"},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1145\/3340544"},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.emnlp-main.685"},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1145\/3597926.3598135"},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE48619.2023.00129"},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.1145\/3540250.3549101"},{"key":"e_1_3_1_42_2","unstructured":"Danning Xie Byungwoo Yoo Nan Jiang Mijung Kim Lin Tan Xiangyu Zhang and Judy S. Lee. 2023. Impact of Large Language Models on Generating Software Specifications. arXiv:2306.03324 [cs.SE]"},{"key":"e_1_3_1_43_2","unstructured":"Frank F. Xu Uri Alon Graham Neubig and Vincent J. Hellendoorn. 2022. A Systematic Evaluation of Large Language Models of Code. arXiv:2202.13169 [cs.PL]"},{"key":"e_1_3_1_44_2","unstructured":"Ziyu Yao Frank F. Xu Pengcheng Yin Huan Sun and Graham Neubig. 2021. Learning Structural Edits via Incremental Tree Transformations. In 9th International Conference on Learning Representations ICLR 2021 Virtual Event Austria May 3-7 2021. OpenReview.net. https:\/\/openreview.net\/forum?id=v9hAX77--cZ"},{"key":"e_1_3_1_45_2","doi-asserted-by":"publisher","unstructured":"He Ye Matias Martinez and Martin Monperrus. 2022. Neural Program Repair with Execution-based Backpropagation. In Proceedings of the International Conference on Software Engineering. https:\/\/doi.org\/10.1145\/3510003.3510222 10.1145\/3510003.3510222","DOI":"10.1145\/3510003.3510222"},{"key":"e_1_3_1_46_2","unstructured":"Pengcheng Yin Graham Neubig Miltiadis Allamanis Marc Brockschmidt and Alexander L. Gaunt. 2019. Learning to Represent Edits. In 7th International Conference on Learning Representations ICLR 2019 New Orleans LA USA May 6-9 2019. OpenReview.net. https:\/\/openreview.net\/forum?id=BJl6AjC5F7"},{"key":"e_1_3_1_47_2","doi-asserted-by":"publisher","DOI":"10.1145\/3468264.3468544"},{"key":"e_1_3_1_48_2","doi-asserted-by":"publisher","unstructured":"Qihao Zhu Zeyu Sun Wenjie Zhang Yingfei Xiong and Lu Zhang. 2023. Tare: Type-Aware Neural Program Repair. In 2023 IEEE\/ACM 45th International Conference on Software Engineering (ICSE). 1443\u20131455. https:\/\/doi.org\/10.1109\/ICSE48619.2023.00126 10.1109\/ICSE48619.2023.00126","DOI":"10.1109\/ICSE48619.2023.00126"}],"container-title":["Proceedings of the ACM on Software Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3660773","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3660773","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,2,4]],"date-time":"2026-02-04T08:00:52Z","timestamp":1770192052000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3660773"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,7,12]]},"references-count":47,"journal-issue":{"issue":"FSE","published-print":{"date-parts":[[2024,7,12]]}},"alternative-id":["10.1145\/3660773"],"URL":"https:\/\/doi.org\/10.1145\/3660773","relation":{},"ISSN":["2994-970X"],"issn-type":[{"value":"2994-970X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,7,12]]}}}