{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,12]],"date-time":"2026-05-12T19:18:17Z","timestamp":1778613497615,"version":"3.51.4"},"reference-count":43,"publisher":"Association for Computing Machinery (ACM)","issue":"OOPSLA1","license":[{"start":{"date-parts":[[2025,4,9]],"date-time":"2025-04-09T00:00:00Z","timestamp":1744156800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Program. Lang."],"published-print":{"date-parts":[[2025,4,9]]},"abstract":"<jats:p>Software updates, including bug repair and feature additions, are frequent in modern applications but they often leave test suites outdated, resulting in undetected bugs and increased chances of system failures. A recent study by Meta revealed that 14%-22% of software failures stem from outdated tests that fail to reflect changes in the codebase. This highlights the need to keep tests in sync with code changes to ensure software reliability.\nIn this paper, we present UTFix, a novel approach for repairing unit tests when their corresponding focal methods undergo changes. UTFix addresses two critical issues: assertion failure and reduced code coverage caused by changes in the focal method. Our approach leverages language models to repair unit tests by providing contextual information such as static code slices, dynamic code slices, and failure messages. We evaluate UTFix on our generated synthetic benchmarks (Tool-Bench), and real-world benchmarks. Tool-Bench includes diverse changes from popular open-source Python GitHub projects, where UTFix successfully repaired 89.2% of assertion failures and achieved 100% code coverage for 96 tests out of 369 tests. 
On the real-world benchmarks, UTFix repairs 60% of assertion failures while achieving 100% code coverage for 19 out of 30 unit tests. To the best of our knowledge, this is the first comprehensive study focused on unit test in evolving Python projects. Our contributions include the development of UTFix, the creation of Tool-Bench and real-world benchmarks, and the demonstration of the effectiveness of LLM-based methods in addressing unit test failures due to software evolution.<\/jats:p>","DOI":"10.1145\/3720419","type":"journal-article","created":{"date-parts":[[2025,4,9]],"date-time":"2025-04-09T13:48:26Z","timestamp":1744206506000},"page":"143-168","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":8,"title":["UTFix: Change Aware Unit Test Repairing using LLM"],"prefix":"10.1145","volume":"9","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0599-8215","authenticated-orcid":false,"given":"Shanto","family":"Rahman","sequence":"first","affiliation":[{"name":"University of Texas at Austin, Austin, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5739-013X","authenticated-orcid":false,"given":"Sachit","family":"Kuhar","sequence":"additional","affiliation":[{"name":"Amazon Web Services, Santa Clara, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4261-090X","authenticated-orcid":false,"given":"Berk","family":"Cirisci","sequence":"additional","affiliation":[{"name":"Amazon Web Services, Berlin, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0575-6320","authenticated-orcid":false,"given":"Pranav","family":"Garg","sequence":"additional","affiliation":[{"name":"Amazon Web Services, New York, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6338-1432","authenticated-orcid":false,"given":"Shiqi","family":"Wang","sequence":"additional","affiliation":[{"name":"Meta, New York, 
USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-3163-0310","authenticated-orcid":false,"given":"Xiaofei","family":"Ma","sequence":"additional","affiliation":[{"name":"Amazon Web Services, New York, USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-4566-8767","authenticated-orcid":false,"given":"Anoop","family":"Deoras","sequence":"additional","affiliation":[{"name":"Amazon Web Services, Santa Clara, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3406-5235","authenticated-orcid":false,"given":"Baishakhi","family":"Ray","sequence":"additional","affiliation":[{"name":"Amazon Web Services, New York, USA"}]}],"member":"320","published-online":{"date-parts":[[2025,4,9]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"2024. Claude-3.5-sonnet. https:\/\/www.anthropic.com\/news\/claude-3-5-sonnet"},{"key":"e_1_2_1_2_1","unstructured":"2024. huggingface. https:\/\/huggingface.co\/"},{"key":"e_1_2_1_3_1","unstructured":"2024. langchain. https:\/\/api.python.langchain.com\/en\/latest\/chat_message_histories\/langchain_community.chat_message_histories.in_memory.ChatMessageHistory.html"},{"key":"e_1_2_1_4_1","unstructured":"2024. Tox. 
https:\/\/tox.wiki\/en\/latest\/config.html."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/1985793.1985898"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/93548.93576"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/3663529.3663839"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.3390\/app11104673"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/24.963124"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/1985793.1985978"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/ASE.2009.17"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3510003.3510141"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/2025113.2025179"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICST.2013.51"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/2685612"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE.2012.6227172"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/2568225.2568278"},{"key":"e_1_2_1_18_1","unstructured":"Siqi Gu Chunrong Fang Quanjun Zhang Fangyuan Tian Jianyi Zhou and Zhenyu Chen. 2024. Improving LLM-based Unit test generation via Template-based Repair. arXiv preprint arXiv:2408.03095."},{"key":"e_1_2_1_19_1","unstructured":"Sepehr Hashtroudi Jiho Shin Hadi Hemmati and Song Wang. 2023. Automated test case generation using code models and domain adaptation. 
arXiv preprint arXiv:2308.08033."},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/3690928"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICST.2019.00030"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/3678167"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/3510454.3516829"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/2025113.2025172"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICST.2012.103"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE.2007.37"},{"key":"e_1_2_1_27_1","volume-title":"The economic impacts of inadequate infrastructure for software testing","author":"Planning Strategic","year":"2002","unstructured":"Strategic Planning. 2002. The economic impacts of inadequate infrastructure for software testing. National Institute of Standards and Technology, 1 (2002)."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICST60714.2024.00018"},{"key":"e_1_2_1_29_1","volume-title":"Ranking Relevant Tests for Order-Dependent Flaky Tests. In International Conference on Software Engineering.","author":"Rahman Shanto","year":"2025","unstructured":"Shanto Rahman, Bala Naren Chanumolu, Suzzana Rafi, August Shi, and Wing Lam. 2025. Ranking Relevant Tests for Order-Dependent Flaky Tests. In International Conference on Software Engineering."},{"key":"e_1_2_1_30_1","volume-title":"FlakeSync: Automatically Repairing Async Flaky Tests. In International Conference on Software Engineering. 1\u201312","author":"Rahman Shanto","year":"2024","unstructured":"Shanto Rahman and August Shi. 2024. FlakeSync: Automatically Repairing Async Flaky Tests. In International Conference on Software Engineering. 
1\u201312."},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/3643769"},{"key":"e_1_2_1_32_1","article-title":"An empirical evaluation of using large language models for automated unit test generation","author":"Sch\u00e4fer Max","year":"2023","unstructured":"Max Sch\u00e4fer, Sarah Nadi, Aryaz Eghbali, and Frank Tip. 2023. An empirical evaluation of using large language models for automated unit test generation. IEEE Transactions on Software Engineering.","journal-title":"IEEE Transactions on Software Engineering."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICSTW.2015.7107476"},{"key":"e_1_2_1_34_1","unstructured":"2025. https:\/\/sites.google.com\/view\/utfix."},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICSM.2015.7332456"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/3691620.3695501"},{"key":"e_1_2_1_37_1","volume-title":"Denny Zhou, et al.","author":"Wei Jason","year":"2022","unstructured":"Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. 2022. Chain-of-thought prompting elicits reasoning in large language models. Advances in neural information processing systems, 35 (2022), 24824\u201324837."},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.1984.5010248"},{"key":"e_1_2_1_39_1","unstructured":"Zhuokui Xie Yinghao Chen Chen Zhi Shuiguang Deng and Jianwei Yin. 2023. ChatUniTest: a ChatGPT-based automated unit test generation tool. arXiv preprint arXiv:2305.04764."},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/APSEC.2014.51"},{"key":"e_1_2_1_41_1","unstructured":"Chen Yang Junjie Chen Bin Lin Jianyi Zhou and Ziqi Wang. 2024. Enhancing LLM-based Test Generation for Hard-to-Cover Branches via Program Analysis. 
arXiv preprint arXiv:2404.04966."},{"key":"e_1_2_1_42_1","article-title":"Automated Test Case Repair Using Language Models","author":"Yaraghi Ahmadreza Saboor","year":"2024","unstructured":"Ahmadreza Saboor Yaraghi, Darren Holden, Nafiseh Kahani, and Lionel Briand. 2024. Automated Test Case Repair Using Language Models. IEEE Transactions on Software Engineering.","journal-title":"IEEE Transactions on Software Engineering."},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/3533767.3534396"}],"container-title":["Proceedings of the ACM on Programming Languages"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3720419","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3720419","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T17:07:29Z","timestamp":1760029649000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3720419"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,4,9]]},"references-count":43,"journal-issue":{"issue":"OOPSLA1","published-print":{"date-parts":[[2025,4,9]]}},"alternative-id":["10.1145\/3720419"],"URL":"https:\/\/doi.org\/10.1145\/3720419","relation":{},"ISSN":["2475-1421"],"issn-type":[{"value":"2475-1421","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,4,9]]},"assertion":[{"value":"2024-10-16","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-02-18","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-04-09","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}