{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,14]],"date-time":"2026-03-14T21:03:44Z","timestamp":1773522224005,"version":"3.50.1"},"reference-count":93,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2024,12,30]],"date-time":"2024-12-30T00:00:00Z","timestamp":1735516800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"National Key Research and Development Program of China","award":["2022YFF0711404"],"award-info":[{"award-number":["2022YFF0711404"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62172214"],"award-info":[{"award-number":["62172214"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Natural Science Foundation of Jiangsu Province, China","award":["BK20201250, BK20210279"],"award-info":[{"award-number":["BK20201250, BK20210279"]}]},{"name":"CCF-Huawei Populus Grove Fund"},{"name":"European Research Council (ERC) under the European Union\u2019s Horizon 2020 research and innovation program","award":["949014"],"award-info":[{"award-number":["949014"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Softw. Eng. Methodol."],"published-print":{"date-parts":[[2025,1,31]]},"abstract":"<jats:p>\n            Recent years have seen a rise in Neural Program Repair (NPR) systems in the software engineering community, which adopt advanced deep learning techniques to automatically fix bugs. Having a comprehensive understanding of existing systems can facilitate new improvements in this area and provide practical instructions for users. However, we observe two potential weaknesses in the current evaluation of NPR systems: \u2460 published systems are trained with varying data, and \u2461 NPR systems are roughly evaluated through the number of totally fixed bugs. Questions such as\n            <jats:italic>what types of bugs are repairable for current systems<\/jats:italic>\n            cannot be answered yet. Consequently, researchers cannot make target improvements in this area and users have no idea of the real affair of existing systems. In this article, we perform a systematic evaluation of the existing nine state-of-the-art NPR systems. To perform a fair and detailed comparison, we (1) build a new benchmark and framework that supports training and validating the nine systems with unified data and (2) evaluate re-trained systems with detailed performance analysis, especially on the effectiveness and the efficiency. We believe our benchmark tool and evaluation results could offer practitioners the real affairs of current NPR systems and the implications of further facilitating the improvements of NPR.\n          <\/jats:p>","DOI":"10.1145\/3688834","type":"journal-article","created":{"date-parts":[[2024,8,19]],"date-time":"2024-08-19T17:04:36Z","timestamp":1724087076000},"page":"1-35","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["Benchmarking and Categorizing the Performance of Neural Program Repair Systems for Java"],"prefix":"10.1145","volume":"34","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7426-1594","authenticated-orcid":false,"given":"Wenkang","family":"Zhong","sequence":"first","affiliation":[{"name":"State Key Laboratory for Novel Software and Technology, Nanjing University, Nanjing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9270-5072","authenticated-orcid":false,"given":"Chuanyi","family":"Li","sequence":"additional","affiliation":[{"name":"State Key Laboratory for Novel Software and Technology, Nanjing University, Nanjing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0145-615X","authenticated-orcid":false,"given":"Kui","family":"Liu","sequence":"additional","affiliation":[{"name":"Huawei Software Engineering Application Technology Lab, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1773-0942","authenticated-orcid":false,"given":"Jidong","family":"Ge","sequence":"additional","affiliation":[{"name":"State Key Laboratory for Novel Software and Technology, Nanjing University, Nanjing, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-1102-9584","authenticated-orcid":false,"given":"Bin","family":"Luo","sequence":"additional","affiliation":[{"name":"State Key Laboratory for Novel Software and Technology, Nanjing University, Nanjing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7270-9869","authenticated-orcid":false,"given":"Tegawend\u00e9 F.","family":"Bissyand\u00e9","sequence":"additional","affiliation":[{"name":"University of Luxembourg, Luxembourg, Luxembourg"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8237-429X","authenticated-orcid":false,"given":"Vincent","family":"Ng","sequence":"additional","affiliation":[{"name":"University of Texas at Dallas, Richardson, TX, USA"}]}],"member":"320","published-online":{"date-parts":[[2024,12,30]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10664-021-09989-x"},{"key":"e_1_3_2_3_2","first-page":"780","volume-title":"Proceedings of the 38th International Conference on Machine Learning, ICML 2021, Virtual Event (Proceedings of Machine Learning Research, Vol. 139)","author":"Berabi Berkay","year":"2021","unstructured":"Berkay Berabi, Jingxuan He, Veselin Raychev, and Martin T. Vechev. 2021. TFix: Learning to fix coding errors with a text-to-text transformer. In Proceedings of the 38th International Conference on Machine Learning, ICML 2021, Virtual Event (Proceedings of Machine Learning Research, Vol. 139). Marina Meila and Tong Zhang (Eds.), PMLR, 780\u2013791. Retrieved from http:\/\/proceedings.mlr.press\/v139\/berabi21a.html"},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2020.3020502"},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2019.2940179"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1109\/32.177364"},{"key":"e_1_3_2_7_2","first-page":"2067","volume-title":"Proceedings of the 32nd International Conference on Machine Learning, ICML 2015","author":"Chung Junyoung","year":"2015","unstructured":"Junyoung Chung, \u00c7aglar G\u00fcl\u00e7ehre, Kyunghyun Cho, and Yoshua Bengio. 2015. Gated feedback recurrent neural networks. In Proceedings of the 32nd International Conference on Machine Learning, ICML 2015 (JMLR Workshop and Conference Proceedings, Vol. 37). Francis R. Bach and David M. Blei (Eds.), JMLR.org, 2067\u20132075. Retrieved from http:\/\/proceedings.mlr.press\/v37\/chung15.html"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.18653\/V1\/N19-1423"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1145\/3324884.3416587"},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1145\/3338906.3338911"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1145\/2896921.2896931"},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.findings-emnlp.139"},{"issue":"4","key":"e_1_3_2_13_2","article-title":"RobustNPR: Evaluating the Robustness of Neural Program Repair Models","volume":"36","author":"Ge Hongliang","year":"2023","unstructured":"Hongliang Ge, Wenkang Zhong, Chuanyi Li, Jidong Ge, Hao Hu, and Bin Luo. 2023. RobustNPR: Evaluating the Robustness of Neural Program Repair Models. J. Softw. Evol. Process. 36, 4 (2024).","journal-title":"J. Softw. Evol. Process."},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/ASE.2019.00116"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10664-021-10100-7"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE.2012.6227211"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2011.104"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.18653\/V1\/2022.ACL-LONG.499"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1016\/0893-6080(89)90020-8"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/ASE56229.2023.00181"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","unstructured":"Kai Huang Zhengzi Xu Su Yang Hongyu Sun Xuejun Li Zheng Yan and Yuqing Zhang. 2023b. A survey on automated program repair techniques. arXiv:2303.18184. Retrieved from 10.48550\/arXiv.2303.18184","DOI":"10.48550\/arXiv.2303.18184"},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1145\/3571730"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/ASE.2019.00033"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1145\/3213846.3213871"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE48619.2023.00125"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE48619.2023.00111"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE43902.2021.00107"},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE43902.2021.00069"},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1145\/3611643.3613892"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1609\/AAAI.V37I4.25642"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1145\/2610384.2628055"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1145\/3379597.3387491"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10664-019-09780-z"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1145\/3338906.3338935"},{"key":"e_1_3_2_36_2","unstructured":"Suhua Lei Huan Zhang Ke Wang and Zhendong Su. 2018. How training data affect the accuracy and robustness of neural networks for image classification. (2018). Retrieved from https:\/\/openreview.net\/forum?id=HklKWhC5F7"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.1145\/3510003.3510177"},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.1145\/3360588"},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1145\/3135932.3135941"},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICST.2019.00020"},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/SANER.2019.8667970"},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1145\/3293882.3330577"},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jss.2020.110817"},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.1145\/3377811.3380338"},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.1145\/3395363.3397369"},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/SANER.2019.8667991"},{"key":"e_1_3_2_47_2","article-title":"Common weakness enumeration","volume":"24","author":"Martin Robert A.","year":"2007","unstructured":"Robert A. Martin. 2007. Common weakness enumeration. Mitre Corp. 24 (2007).","journal-title":"Mitre Corp"},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10664-016-9470-4"},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.1145\/2931037.2948705"},{"key":"e_1_3_2_50_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-99241-9_3"},{"key":"e_1_3_2_51_2","doi-asserted-by":"publisher","DOI":"10.1109\/MSR52588.2021.00063"},{"key":"e_1_3_2_52_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE48619.2023.00127"},{"key":"e_1_3_2_53_2","doi-asserted-by":"publisher","DOI":"10.1145\/2568225.2568324"},{"key":"e_1_3_2_54_2","doi-asserted-by":"publisher","DOI":"10.1145\/3105906"},{"key":"e_1_3_2_55_2","unstructured":"Martin Monperrus Matias Martinez He Ye Fernanda Madeiral Thomas Durieux and Zhongxing Yu. 2021. Megadiff: A dataset of 600k Java source code changes categorized by diff size. arXiv:2108.04631. Retrieved from https:\/\/arxiv.org\/abs\/2108.04631"},{"key":"e_1_3_2_56_2","doi-asserted-by":"publisher","DOI":"10.1145\/3180155.3182533"},{"key":"e_1_3_2_57_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2020.2998785"},{"key":"e_1_3_2_58_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE.2019.00089"},{"key":"e_1_3_2_59_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jss.2020.110538"},{"key":"e_1_3_2_60_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-35289-8_5"},{"key":"e_1_3_2_61_2","doi-asserted-by":"publisher","DOI":"10.1145\/3524459.3527351"},{"key":"e_1_3_2_62_2","doi-asserted-by":"publisher","DOI":"10.1145\/2771783.2771791"},{"key":"e_1_3_2_63_2","first-page":"140:1","article-title":"Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer","volume":"21","author":"Raffel Colin","year":"2020","unstructured":"Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2020. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. J. Mach. Learn. Res. 21 (2020), 140:1\u2013140:67. DOI: http:\/\/jmlr.org\/papers\/v21\/20-074.html","journal-title":"J. Mach. Learn. Res"},{"key":"e_1_3_2_64_2","doi-asserted-by":"publisher","DOI":"10.1145\/3196398.3196473"},{"key":"e_1_3_2_65_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE.2019.00020"},{"key":"e_1_3_2_66_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE-Companion58688.2023.00028"},{"key":"e_1_3_2_67_2","doi-asserted-by":"publisher","unstructured":"Andr\u00e9 Silva Sen Fang and Martin Monperrus. 2023. RepairLLaMA: Efficient representations and fine-tuned adapters for program repair. arXiv:2312.15698. Retrieved from 10.48550\/ARXIV.2312.15698","DOI":"10.48550\/ARXIV.2312.15698"},{"key":"e_1_3_2_68_2","doi-asserted-by":"publisher","DOI":"10.1145\/2786805.2786825"},{"key":"e_1_3_2_69_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.findings-acl.111"},{"key":"e_1_3_2_70_2","doi-asserted-by":"publisher","DOI":"10.1145\/3340544"},{"key":"e_1_3_2_71_2","first-page":"5998","volume-title":"Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017. Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S. V. N. Vishwanathan, and Roman Garnett (Eds.), 5998\u20136008. Retrieved from https:\/\/proceedings.neurips.cc\/paper\/2017\/hash\/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html"},{"key":"e_1_3_2_72_2","doi-asserted-by":"publisher","DOI":"10.18653\/V1\/2021.EMNLP-MAIN.685"},{"key":"e_1_3_2_73_2","doi-asserted-by":"publisher","DOI":"10.1145\/3468264.3468600"},{"key":"e_1_3_2_74_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE48619.2023.00129"},{"key":"e_1_3_2_75_2","doi-asserted-by":"publisher","DOI":"10.1145\/3540250.3549101"},{"key":"e_1_3_2_76_2","doi-asserted-by":"publisher","unstructured":"Chunqiu Steven Xia and Lingming Zhang. 2023. Keep the conversation going: Fixing 162 out of 337 bugs for $0.42 each using ChatGPT. arXiv:2304.00385. Retrieved from 10.48550\/ARXIV.2304.00385","DOI":"10.48550\/ARXIV.2304.00385"},{"key":"e_1_3_2_77_2","unstructured":"Jiahong Xiang Xiaoyang Xu Fanchu Kong Mingyuan Wu Haotian Zhang and Yuqun Zhang. 2024. How far can we go with practical function-level program repair? arXiv:2404.12833."},{"key":"e_1_3_2_78_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE.2017.45"},{"key":"e_1_3_2_79_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2020.2987862"},{"key":"e_1_3_2_80_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2016.2560811"},{"key":"e_1_3_2_81_2","doi-asserted-by":"publisher","DOI":"10.1145\/3551349.3556893"},{"key":"e_1_3_2_82_2","doi-asserted-by":"publisher","DOI":"10.1145\/3551349.3556926"},{"key":"e_1_3_2_83_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10664-020-09920-w"},{"key":"e_1_3_2_84_2","doi-asserted-by":"publisher","DOI":"10.1145\/3510003.3510222"},{"key":"e_1_3_2_85_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10664-017-9552-y"},{"key":"e_1_3_2_86_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2018.2874648"},{"key":"e_1_3_2_87_2","doi-asserted-by":"publisher","DOI":"10.1145\/3360004"},{"key":"e_1_3_2_88_2","doi-asserted-by":"publisher","unstructured":"Quanjun Zhang Chunrong Fang Yuxiang Ma Weisong Sun and Zhenyu Chen. 2023a. A survey of learning-based automated program repair. arXiv:2301.03270. Retrieved from 10.48550\/arXiv.2301.03270","DOI":"10.48550\/arXiv.2301.03270"},{"key":"e_1_3_2_89_2","doi-asserted-by":"publisher","DOI":"10.1109\/ASE56229.2023.00063"},{"key":"e_1_3_2_90_2","doi-asserted-by":"publisher","DOI":"10.1145\/3551349.3556943"},{"key":"e_1_3_2_91_2","doi-asserted-by":"publisher","DOI":"10.1145\/3545258.3545268"},{"key":"e_1_3_2_92_2","first-page":"860","volume-title":"Proceedings of the 2024 IEEE\/ACM 46th International Conference on Software Engineering (ICSE)","author":"Zhu Qihao","year":"2024","unstructured":"Qihao Zhu, Qingyuan Liang, Zeyu Sun, Yingfei Xiong, Lu Zhang, and Shengyu Cheng. 2024. GrammarT5: Grammar-integrated pretrained encoder-decoder neural model for code. In Proceedings of the 2024 IEEE\/ACM 46th International Conference on Software Engineering (ICSE). IEEE Computer Society, 860\u2013860."},{"key":"e_1_3_2_93_2","doi-asserted-by":"publisher","DOI":"10.1145\/3468264.3468544"},{"key":"e_1_3_2_94_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE48619.2023.00126"}],"container-title":["ACM Transactions on Software Engineering and Methodology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3688834","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3688834","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T00:04:10Z","timestamp":1750291450000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3688834"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,12,30]]},"references-count":93,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2025,1,31]]}},"alternative-id":["10.1145\/3688834"],"URL":"https:\/\/doi.org\/10.1145\/3688834","relation":{},"ISSN":["1049-331X","1557-7392"],"issn-type":[{"value":"1049-331X","type":"print"},{"value":"1557-7392","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,12,30]]},"assertion":[{"value":"2023-09-23","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-07-11","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-12-30","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}