{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,21]],"date-time":"2026-03-21T21:44:16Z","timestamp":1774129456444,"version":"3.50.1"},"reference-count":57,"publisher":"Association for Computing Machinery (ACM)","issue":"FSE","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Softw. Eng."],"published-print":{"date-parts":[[2025,6,19]]},"abstract":"<jats:p>\n            Large Language Models (LLMs) have achieved remarkable success in various applications, particularly in code-related tasks such as code generation and program repair, setting new performance benchmarks. However, the extensive use of large training corpora raises concerns about whether these achievements stem from genuine understanding or mere memorization of training data\u2014a question often overlooked in current research. This paper aims to study the memorization issue within LLM-based program repair by investigating whether the correct patches generated by LLMs are the result of memorization. The key challenge lies in the absence of ground truth for confirming memorization, leading to various ad-hoc methods designed for its detection. To address this challenge, we first propose a general framework that formalizes memorization detection as a general hypothesis testing problem, where existing approaches can be unified by defining a\n            <jats:italic toggle=\"yes\">low-probability event<\/jats:italic>\n            under the\n            <jats:italic toggle=\"yes\">null hypothesis<\/jats:italic>\n            that the data is not memorized. 
The occurrence of such an event leads to the rejection of the null hypothesis, indicating potential memorization.\n          <\/jats:p>\n          <jats:p>Based on this framework, we design two specific methods (i.e., low-probability events) to detect potential memorization: 1) basic ground-truth matching, and 2) reassessment after substantial code mutation. We investigate the memorization issue in LLM-based program repair using two datasets: Defects4J, a widely used benchmark that is likely included in the training data, and GitBug-Java, a new dataset that is unlikely to be part of the training data. Our findings reveal that a significant portion of correct patches exactly match the ground truths in Defects4J (e.g., 78.83% and 87.42% on GPT-3.5 and CodeLlama-7b, respectively). Moreover, even after significant modifications to the buggy code, where the original repairs should not be generated, a considerable percentage of bugs (e.g., 81.82% on GPT-3.5 and 88.24% on CodeLlama-7b) continue to be fixed exactly as in the original bug fixes, indicating a high likelihood of memorization. Furthermore, we evaluate existing memorization detection methods and demonstrate their ineffectiveness in this context (e.g., most AUROCs are below 0.5). The theoretical analysis under our hypothesis testing framework shows that their defined events may not meet the requirements for being low-probability. 
The study highlights the critical need for more robust and rigorous evaluations in LLM-based software engineering research, ensuring a clear distinction between true problem-solving capabilities and mere memorization.<\/jats:p>","DOI":"10.1145\/3729390","type":"journal-article","created":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T15:16:02Z","timestamp":1750346162000},"page":"2712-2734","source":"Crossref","is-referenced-by-count":7,"title":["Demystifying Memorization in LLM-Based Program Repair via a General Hypothesis Testing Framework"],"prefix":"10.1145","volume":"2","author":[{"ORCID":"https:\/\/orcid.org\/0009-0001-8248-1981","authenticated-orcid":false,"given":"Jiaolong","family":"Kong","sequence":"first","affiliation":[{"name":"Singapore Management University, Singapore, Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1288-6502","authenticated-orcid":false,"given":"Xiaofei","family":"Xie","sequence":"additional","affiliation":[{"name":"Singapore Management University, Singapore, Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5598-4006","authenticated-orcid":false,"given":"Shangqing","family":"Liu","sequence":"additional","affiliation":[{"name":"Nanjing University, Nanjing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,6,19]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"2023. Did chatgpt cheat on your test? https:\/\/hitz-zentroa.github.io\/lm-contamination\/blog\/"},{"key":"e_1_2_1_2_1","unstructured":"2023. A Massively Spiffy Yet Delicately Unobtrusive Compression Library. 
https:\/\/zlib.net\/"},{"key":"e_1_2_1_3_1","volume-title":"Diogo Almeida, Janko Altenschmidt, Sam Altman, and Shyamal Anadkat.","author":"Achiam Josh","year":"2023","unstructured":"Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, and Shyamal Anadkat. 2023. Gpt-4 technical report. arXiv preprint arXiv:2303.08774."},{"key":"e_1_2_1_4_1","unstructured":"Anonymous. 2024. MemPrompt. https:\/\/sites.google.com\/view\/memprompt"},{"key":"e_1_2_1_5_1","volume-title":"Hypothesis testing, type I and type II errors. Industrial psychiatry journal, 18, 2","author":"Banerjee Amitav","year":"2009","unstructured":"Amitav Banerjee, UB Chitnis, SL Jadhav, JS Bhawalkar, and S Chaudhury. 2009. Hypothesis testing, type I and type II errors. Industrial psychiatry journal, 18, 2 (2009), 127\u2013131."},{"key":"e_1_2_1_6_1","doi-asserted-by":"crossref","first-page":"359","DOI":"10.1163\/156852802321016541","article-title":"The development of modus ponens in antiquity: From Aristotle to the 2nd century AD","volume":"47","author":"Bobzien Susanne","year":"2002","unstructured":"Susanne Bobzien. 2002. The development of modus ponens in antiquity: From Aristotle to the 2nd century AD. Phronesis, 47, 4 (2002), 359\u2013394.","journal-title":"Phronesis"},{"key":"e_1_2_1_7_1","unstructured":"Susanne Bobzien. 2006. Ancient logic."},{"key":"e_1_2_1_8_1","unstructured":"Islem Bouzenia, Premkumar Devanbu, and Michael Pradel. 2024. RepairAgent: An Autonomous LLM-Based Agent for Program Repair. arXiv preprint arXiv:2403.17134."},{"key":"e_1_2_1_9_1","doi-asserted-by":"crossref","first-page":"565","DOI":"10.2307\/3325045","article-title":"Decision processes for low probability events: Policy implications","volume":"8","author":"Camerer Colin F","year":"1989","unstructured":"Colin F Camerer and Howard Kunreuther. 1989. Decision processes for low probability events: Policy implications. 
Journal of policy analysis and management, 8, 4 (1989), 565\u2013592.","journal-title":"Journal of policy analysis and management"},{"key":"e_1_2_1_10_1","unstructured":"Nicholas Carlini, Daphne Ippolito, Matthew Jagielski, Katherine Lee, Florian Tramer, and Chiyuan Zhang. 2022. Quantifying memorization across neural language models. arXiv preprint arXiv:2202.07646."},{"key":"e_1_2_1_11_1","volume-title":"28th USENIX security symposium (USENIX security 19). 267\u2013284.","author":"Carlini Nicholas","unstructured":"Nicholas Carlini, Chang Liu, \u00dalfar Erlingsson, Jernej Kos, and Dawn Song. 2019. The secret sharer: Evaluating and testing unintended memorization in neural networks. In 28th USENIX security symposium (USENIX security 19). 267\u2013284."},{"key":"e_1_2_1_12_1","volume-title":"30th USENIX Security Symposium (USENIX Security 21)","author":"Carlini Nicholas","year":"2021","unstructured":"Nicholas Carlini, Florian Tramer, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, Tom Brown, Dawn Song, and Ulfar Erlingsson. 2021. Extracting training data from large language models. In 30th USENIX Security Symposium (USENIX Security 21). 2633\u20132650."},{"key":"e_1_2_1_13_1","doi-asserted-by":"crossref","first-page":"359","DOI":"10.1006\/csla.1999.0128","article-title":"An empirical study of smoothing techniques for language modeling","volume":"13","author":"Chen Stanley F","year":"1999","unstructured":"Stanley F Chen and Joshua Goodman. 1999. An empirical study of smoothing techniques for language modeling. 
Computer Speech & Language, 13, 4 (1999), 359\u2013394.","journal-title":"Computer Speech & Language"},{"key":"e_1_2_1_14_1","first-page":"1","article-title":"Palm: Scaling language modeling with pathways","volume":"24","author":"Chowdhery Aakanksha","year":"2023","unstructured":"Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, and Sebastian Gehrmann. 2023. Palm: Scaling language modeling with pathways. Journal of Machine Learning Research, 24, 240 (2023), 1\u2013113.","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_2_1_15_1","doi-asserted-by":"crossref","unstructured":"Yihong Dong, Xue Jiang, Huanyu Liu, Zhi Jin, and Ge Li. 2024. Generalization or Memorization: Data Contamination and Trustworthy Evaluation for Large Language Models. arXiv preprint arXiv:2402.15938.","DOI":"10.18653\/v1\/2024.findings-acl.716"},{"key":"e_1_2_1_16_1","volume-title":"International Conference on Machine Learning. 5547\u20135569","author":"Du Nan","year":"2022","unstructured":"Nan Du, Yanping Huang, Andrew M Dai, Simon Tong, Dmitry Lepikhin, Yuanzhong Xu, Maxim Krikun, Yanqi Zhou, Adams Wei Yu, and Orhan Firat. 2022. Glam: Efficient scaling of language models with mixture-of-experts. In International Conference on Machine Learning. 5547\u20135569."},{"key":"e_1_2_1_17_1","doi-asserted-by":"crossref","unstructured":"Sarah Fakhoury, Aaditya Naik, Georgios Sakkas, Saikat Chakraborty, and Shuvendu K Lahiri. 2024. LLM-based Test-driven Interactive Code Generation: User Study and Empirical Evaluation. arXiv preprint arXiv:2404.10100.","DOI":"10.1109\/TSE.2024.3428972"},{"key":"e_1_2_1_18_1","volume-title":"2023 IEEE\/ACM 45th International Conference on Software Engineering (ICSE). 1469\u20131481","author":"Fan Zhiyu","year":"2023","unstructured":"Zhiyu Fan, Xiang Gao, Martin Mirchev, Abhik Roychoudhury, and Shin Hwei Tan. 2023. Automated repair of programs from large language models. 
In 2023 IEEE\/ACM 45th International Conference on Software Engineering (ICSE). 1469\u20131481."},{"key":"e_1_2_1_19_1","unstructured":"Shahriar Golchin and Mihai Surdeanu. 2023. Data contamination quiz: A tool to detect and estimate contamination in large language models. arXiv preprint arXiv:2311.06233."},{"key":"e_1_2_1_20_1","unstructured":"Shahriar Golchin and Mihai Surdeanu. 2023. Time travel in llms: Tracing data contamination in large language models. arXiv preprint arXiv:2308.08493."},{"key":"e_1_2_1_21_1","volume-title":"Proceedings of the 46th IEEE\/ACM International Conference on Software Engineering. 1\u201313","author":"Guo Qi","year":"2024","unstructured":"Qi Guo, Junming Cao, Xiaofei Xie, Shangqing Liu, Xiaohong Li, Bihuan Chen, and Xin Peng. 2024. Exploring the potential of chatgpt in automated code refinement: An empirical study. In Proceedings of the 46th IEEE\/ACM International Conference on Software Engineering. 1\u201313."},{"key":"e_1_2_1_22_1","volume-title":"Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis. 313\u2013324","author":"Guo Qi","year":"2024","unstructured":"Qi Guo, Xiaohong Li, Xiaofei Xie, Shangqing Liu, Ze Tang, Ruitao Feng, Junjie Wang, Jidong Ge, and Lei Bu. 2024. FT2Ra: A Fine-Tuning-Inspired Approach to Retrieval-Augmented Code Completion. In Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis. 313\u2013324."},{"key":"e_1_2_1_23_1","volume-title":"2023 IEEE International Conference on Software Maintenance and Evolution (ICSME). 136\u2013146","author":"Hao Sichong","year":"2023","unstructured":"Sichong Hao, Xianjun Shi, Hongwei Liu, and Yanjun Shu. 2023. Enhancing Code Language Models for Program Repair by Curricular Fine-tuning Framework. In 2023 IEEE International Conference on Software Maintenance and Evolution (ICSME). 
136\u2013146."},{"key":"e_1_2_1_24_1","unstructured":"Valentin Hartmann, Anshuman Suri, Vincent Bindschaedler, David Evans, Shruti Tople, and Robert West. 2023. SoK: Memorization in General-Purpose Large Language Models. arXiv preprint arXiv:2310.18362."},{"key":"e_1_2_1_25_1","unstructured":"D\u00e1vid Hidv\u00e9gi, Khashayar Etemadi, Sofia Bobadilla, and Martin Monperrus. 2024. CigaR: Cost-efficient Program Repair with LLMs. arXiv preprint arXiv:2402.06598."},{"key":"e_1_2_1_26_1","volume-title":"Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 1646\u20131656","author":"Jin Matthew","year":"2023","unstructured":"Matthew Jin, Syed Shahriar, Michele Tufano, Xin Shi, Shuai Lu, Neel Sundaresan, and Alexey Svyatkovskiy. 2023. Inferfix: End-to-end program repair with llms. In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 1646\u20131656."},{"key":"e_1_2_1_27_1","volume-title":"Proceedings of the 2014 international symposium on software testing and analysis. 437\u2013440","author":"Just Ren\u00e9","year":"2014","unstructured":"Ren\u00e9 Just, Darioush Jalali, and Michael D Ernst. 2014. Defects4J: A database of existing faults to enable controlled testing studies for Java programs. In Proceedings of the 2014 international symposium on software testing and analysis. 437\u2013440."},{"key":"e_1_2_1_28_1","volume-title":"Contrastrepair: Enhancing conversation-based automated program repair via contrastive test case pairs. arXiv preprint arXiv:2403.01971.","author":"Kong Jiaolong","year":"2024","unstructured":"Jiaolong Kong, Mingfei Cheng, Xiaofei Xie, Shangqing Liu, Xiaoning Du, and Qi Guo. 2024. Contrastrepair: Enhancing conversation-based automated program repair via contrastive test case pairs. arXiv preprint arXiv:2403.01971."},{"key":"e_1_2_1_29_1","unstructured":"Yucheng Li. 2023. 
Estimating contamination via perplexity: Quantifying memorisation in language model evaluation. arXiv preprint arXiv:2309.10677."},{"key":"e_1_2_1_30_1","volume-title":"Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 1035\u20131047","author":"Li Zhiyu","year":"2022","unstructured":"Zhiyu Li, Shuai Lu, Daya Guo, Nan Duan, Shailesh Jannu, Grant Jenks, Deep Majumder, Jared Green, Alexey Svyatkovskiy, and Shengyu Fu. 2022. Automating code review activities by large-scale pre-training. In Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 1035\u20131047."},{"key":"e_1_2_1_31_1","volume-title":"Yuyao Wang, and Lingming Zhang.","author":"Liu Jiawei","year":"2024","unstructured":"Jiawei Liu, Chunqiu Steven Xia, Yuyao Wang, and Lingming Zhang. 2024. Is your code generated by chatgpt really correct? rigorous evaluation of large language models for code generation. Advances in Neural Information Processing Systems, 36 (2024)."},{"key":"e_1_2_1_32_1","volume-title":"Proceedings of the 28th ACM SIGSOFT international symposium on software testing and analysis. 31\u201342","author":"Liu Kui","year":"2019","unstructured":"Kui Liu, Anil Koyuncu, Dongsun Kim, and Tegawend\u00e9 F Bissyand\u00e9. 2019. Tbar: Revisiting template-based automated program repair. In Proceedings of the 28th ACM SIGSOFT international symposium on software testing and analysis. 31\u201342."},{"key":"e_1_2_1_33_1","first-page":"1","article-title":"Automated commit intelligence by pre-training","volume":"33","author":"Liu Shangqing","year":"2024","unstructured":"Shangqing Liu, Yanzhou Li, Xiaofei Xie, Wei Ma, Guozhu Meng, and Yang Liu. 2024. Automated commit intelligence by pre-training. 
ACM Transactions on Software Engineering and Methodology, 33, 8 (2024), 1\u201330.","journal-title":"ACM Transactions on Software Engineering and Methodology"},{"key":"e_1_2_1_34_1","unstructured":"Lezhi Ma, Shangqing Liu, Yi Li, Xiaofei Xie, and Lei Bu. 2024. SpecGen: Automated Generation of Formal Program Specifications via Large Language Models. arXiv preprint arXiv:2401.08807."},{"key":"e_1_2_1_35_1","doi-asserted-by":"crossref","unstructured":"Justus Mattern, Fatemehsadat Mireshghallah, Zhijing Jin, Bernhard Sch\u00f6lkopf, Mrinmaya Sachan, and Taylor Berg-Kirkpatrick. 2023. Membership inference attacks against language models via neighbourhood comparison. arXiv preprint arXiv:2305.18462.","DOI":"10.18653\/v1\/2023.findings-acl.719"},{"key":"e_1_2_1_36_1","doi-asserted-by":"crossref","unstructured":"Gary H McClelland, William D Schulze, and Don L Coursey. 1993. Insurance for low-probability hazards: A bimodal response to unlikely events. Making Decisions About Liability And Insurance: A Special Issue of the Journal of Risk and Uncertainty 95\u2013116.","DOI":"10.1007\/978-94-011-2192-7_7"},{"key":"e_1_2_1_37_1","volume-title":"International Conference on Machine Learning. 26106\u201326128","author":"Ni Ansong","year":"2023","unstructured":"Ansong Ni, Srini Iyer, Dragomir Radev, Veselin Stoyanov, Wen-tau Yih, Sida Wang, and Xi Victoria Lin. 2023. Lever: Learning to verify language-to-code generation with execution. In International Conference on Machine Learning. 26106\u201326128."},{"key":"e_1_2_1_38_1","volume-title":"2023 IEEE 34th International Symposium on Software Reliability Engineering Workshops (ISSREW). 112\u2013119","author":"Purba Moumita Das","year":"2023","unstructured":"Moumita Das Purba, Arpita Ghosh, Benjamin J Radford, and Bill Chu. 2023. Software vulnerability detection using large language models. In 2023 IEEE 34th International Symposium on Software Reliability Engineering Workshops (ISSREW). 
112\u2013119."},{"key":"e_1_2_1_39_1","doi-asserted-by":"crossref","first-page":"107066","DOI":"10.1016\/j.infsof.2022.107066","article-title":"Memorization and generalization in neural code intelligence models","volume":"153","author":"Islam Rabin Md Rafiqul","year":"2023","unstructured":"Md Rafiqul Islam Rabin, Aftab Hussain, Mohammad Amin Alipour, and Vincent J Hellendoorn. 2023. Memorization and generalization in neural code intelligence models. Information and Software Technology, 153 (2023), 107066.","journal-title":"Information and Software Technology"},{"key":"e_1_2_1_40_1","volume-title":"then Q: Conditionals and the Foundations of Reasoning","author":"Sanford David","unstructured":"David Sanford. 2011. If P, then Q: Conditionals and the Foundations of Reasoning. Routledge."},{"key":"e_1_2_1_41_1","unstructured":"Weijia Shi, Anirudh Ajith, Mengzhou Xia, Yangsibo Huang, Daogao Liu, Terra Blevins, Danqi Chen, and Luke Zettlemoyer. 2023. Detecting pretraining data from large language models. arXiv preprint arXiv:2310.16789."},{"key":"e_1_2_1_42_1","unstructured":"Andr\u00e9 Silva, Sen Fang, and Martin Monperrus. 2023. RepairLLaMA: Efficient Representations and Fine-Tuned Adapters for Program Repair. arXiv preprint arXiv:2312.15698."},{"key":"e_1_2_1_43_1","volume-title":"GitBug-Java: A Reproducible Benchmark of Recent Java Bugs. In 2024 IEEE\/ACM 21st International Conference on Mining Software Repositories (MSR). 118\u2013122","author":"Silva Andr\u00e9","year":"2024","unstructured":"Andr\u00e9 Silva, Nuno Saavedra, and Martin Monperrus. 2024. GitBug-Java: A Reproducible Benchmark of Recent Java Bugs. In 2024 IEEE\/ACM 21st International Conference on Mining Software Repositories (MSR). 118\u2013122."},{"key":"e_1_2_1_44_1","volume-title":"Proceedings of the 46th IEEE\/ACM International Conference on Software Engineering. 1\u201313","author":"Steenhoek Benjamin","year":"2024","unstructured":"Benjamin Steenhoek, Hongyang Gao, and Wei Le. 2024. 
Dataflow Analysis-Inspired Deep Learning for Efficient Vulnerability Detection. In Proceedings of the 46th IEEE\/ACM International Conference on Software Engineering. 1\u201313."},{"key":"e_1_2_1_45_1","unstructured":"Shubham Ugare, Tarun Suresh, Hangoo Kang, Sasa Misailovic, and Gagandeep Singh. 2024. Improving llm code generation with grammar augmentation. arXiv preprint arXiv:2403.01632."},{"key":"e_1_2_1_46_1","volume-title":"Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 146\u2013158","author":"Wang Weishi","year":"2023","unstructured":"Weishi Wang, Yue Wang, Shafiq Joty, and Steven CH Hoi. 2023. Rap-gen: Retrieval-augmented patch generation with codet5 for automatic program repair. In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 146\u2013158."},{"key":"e_1_2_1_47_1","volume-title":"Brian Lester, Nan Du, Andrew M Dai, and Quoc V Le.","author":"Wei Jason","year":"2021","unstructured":"Jason Wei, Maarten Bosma, Vincent Y Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M Dai, and Quoc V Le. 2021. Finetuned language models are zero-shot learners. arXiv preprint arXiv:2109.01652."},{"key":"e_1_2_1_48_1","volume-title":"Introduction to robust estimation and hypothesis testing","author":"Wilcox Rand R","unstructured":"Rand R Wilcox. 2011. Introduction to robust estimation and hypothesis testing. Academic press."},{"key":"e_1_2_1_49_1","unstructured":"Chunqiu Steven Xia, Yifeng Ding, and Lingming Zhang. 2023. Revisiting the plastic surgery hypothesis via large language models. arXiv preprint arXiv:2303.10494."},{"key":"e_1_2_1_50_1","volume-title":"2023 IEEE\/ACM 45th International Conference on Software Engineering (ICSE). 1482\u20131494","author":"Xia Chunqiu Steven","year":"2023","unstructured":"Chunqiu Steven Xia, Yuxiang Wei, and Lingming Zhang. 2023. 
Automated program repair in the era of large pre-trained language models. In 2023 IEEE\/ACM 45th International Conference on Software Engineering (ICSE). 1482\u20131494."},{"key":"e_1_2_1_51_1","volume-title":"Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 959\u2013971","author":"Xia Chunqiu Steven","year":"2022","unstructured":"Chunqiu Steven Xia and Lingming Zhang. 2022. Less training, more repairing please: revisiting automated program repair via zero-shot learning. In Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 959\u2013971."},{"key":"e_1_2_1_52_1","unstructured":"Chunqiu Steven Xia and Lingming Zhang. 2023. Keep the Conversation Going: Fixing 162 out of 337 bugs for 0.42 each using ChatGPT. arXiv preprint arXiv:2304.00385."},{"key":"e_1_2_1_53_1","volume-title":"Proceedings of the IEEE\/ACM 46th International Conference on Software Engineering. 1\u201313","author":"Yang Zhou","year":"2024","unstructured":"Zhou Yang, Zhipeng Zhao, Chenyu Wang, Jieke Shi, Dongsun Kim, Donggyun Han, and David Lo. 2024. Unveiling memorization in code models. In Proceedings of the IEEE\/ACM 46th International Conference on Software Engineering. 1\u201313."},{"key":"e_1_2_1_54_1","volume-title":"Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis. 1274\u20131286","author":"Yin Xin","year":"2024","unstructured":"Xin Yin, Chao Ni, Shaohua Wang, Zhenhao Li, Limin Zeng, and Xiaohu Yang. 2024. Thinkrepair: Self-directed automated program repair. In Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis. 1274\u20131286."},{"key":"e_1_2_1_55_1","volume-title":"Proceedings of the 31st ACM SIGSOFT international symposium on software testing and analysis. 
678\u2013690","author":"Yuan Wei","year":"2022","unstructured":"Wei Yuan, Quanjun Zhang, Tieke He, Chunrong Fang, Nguyen Quoc Viet Hung, Xiaodong Hao, and Hongzhi Yin. 2022. CIRCLE: continual repair across programming languages. In Proceedings of the 31st ACM SIGSOFT international symposium on software testing and analysis. 678\u2013690."},{"key":"e_1_2_1_56_1","volume-title":"2023 38th IEEE\/ACM International Conference on Automated Software Engineering (ASE). 535\u2013547","author":"Zhang Quanjun","year":"2023","unstructured":"Quanjun Zhang, Chunrong Fang, Tongke Zhang, Bowen Yu, Weisong Sun, and Zhenyu Chen. 2023. Gamma: Revisiting template-based automated program repair via mask prediction. In 2023 38th IEEE\/ACM International Conference on Automated Software Engineering (ASE). 535\u2013547."},{"key":"e_1_2_1_57_1","volume-title":"Proceedings of the 2024 ACM\/IEEE 44th International Conference on Software Engineering: New Ideas and Emerging Results. 47\u201351","author":"Zhou Xin","year":"2024","unstructured":"Xin Zhou, Ting Zhang, and David Lo. 2024. Large language model for vulnerability detection: Emerging results and future directions. In Proceedings of the 2024 ACM\/IEEE 44th International Conference on Software Engineering: New Ideas and Emerging Results. 
47\u201351."}],"container-title":["Proceedings of the ACM on Software Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3729390","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T15:19:20Z","timestamp":1750346360000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3729390"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,19]]},"references-count":57,"journal-issue":{"issue":"FSE","published-print":{"date-parts":[[2025,6,19]]}},"alternative-id":["10.1145\/3729390"],"URL":"https:\/\/doi.org\/10.1145\/3729390","relation":{},"ISSN":["2994-970X"],"issn-type":[{"value":"2994-970X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,6,19]]}}}