{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,3]],"date-time":"2026-06-03T13:09:25Z","timestamp":1780492165523,"version":"3.54.1"},"reference-count":106,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2025,9,9]],"date-time":"2025-09-09T00:00:00Z","timestamp":1757376000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["MAKE"],"abstract":"<jats:p>Automated test case generation aims to improve software testing by reducing the manual effort required to create test cases. Recent advancements in large language models (LLMs), with their ability to understand natural language and generate code, have identified new opportunities to enhance this process. In this review, the focus is on the use of LLMs in test case generation to identify the effectiveness of the proposed methods compared with existing tools and potential directions for future research. A literature search was conducted using online resources, filtering the studies based on the defined inclusion and exclusion criteria. This paper presents the findings from the selected studies according to the three research questions and further categorizes the findings based on the common themes. These findings highlight the opportunities and challenges associated with the use of LLMs in this domain. Although improvements were observed in metrics such as test coverage, usability, and correctness, limitations such as inconsistent performance and compilation errors were highlighted. This provides a state-of-the-art review of LLM-based test case generation, emphasizing the potential of LLMs to improve automated testing while identifying areas for further advancements.<\/jats:p>","DOI":"10.3390\/make7030097","type":"journal-article","created":{"date-parts":[[2025,9,10]],"date-time":"2025-09-10T12:04:55Z","timestamp":1757505895000},"page":"97","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":10,"title":["A Review of Large Language Models for Automated Test Case Generation"],"prefix":"10.3390","volume":"7","author":[{"ORCID":"https:\/\/orcid.org\/0009-0000-7339-0183","authenticated-orcid":false,"given":"Arda","family":"Celik","sequence":"first","affiliation":[{"name":"Department of Electrical, Computer and Software Engineering, Ontario Tech University, Oshawa, ON L1G 0C5, Canada"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0472-5757","authenticated-orcid":false,"given":"Qusay H.","family":"Mahmoud","sequence":"additional","affiliation":[{"name":"Department of Electrical, Computer and Software Engineering, Ontario Tech University, Oshawa, ON L1G 0C5, Canada"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2025,9,9]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Putra, S.J., Sugiarti, Y., Prayoga, B.Y., Samudera, D.W., and Khairani, D. (2023, January 10\u201311). Analysis of Strengths and Weaknesses of Software Testing Strategies: Systematic Literature Review. Proceedings of the 2023 11th International Conference on Cyber and IT Service Management (CITSM), Makassar, Indonesia.","DOI":"10.1109\/CITSM60085.2023.10455226"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"106093","DOI":"10.1109\/ACCESS.2022.3211949","article-title":"Evolution of Software Testing Strategies and Trends: Semantic Content Analysis of Software Research Corpus of the Last 40 Years","volume":"10","author":"Gurcan","year":"2022","journal-title":"IEEE Access"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"2809","DOI":"10.1007\/s10664-020-09815-w","article-title":"What Am I Testing and Where? Comparing Testing Procedures Based on Lightweight Requirements Annotations","volume":"25","author":"Pudlitz","year":"2020","journal-title":"Empir. Softw. Eng."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"68905","DOI":"10.1109\/ACCESS.2021.3077755","article-title":"Exploring the Profiles of Software Testing Jobs in the United States","volume":"9","author":"Kassab","year":"2021","journal-title":"IEEE Access"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"51086","DOI":"10.1109\/ACCESS.2024.3384459","article-title":"The Impact of Software Testing on Serverless Applications","volume":"12","author":"Hewawasam","year":"2024","journal-title":"IEEE Access"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Alshahwan, N., Harman, M., and Marginean, A. (2023, January 16\u201320). Software Testing Research Challenges: An Industrial Perspective. Proceedings of the 2023 IEEE Conference on Software Testing, Verification and Validation (ICST), Dublin, Ireland.","DOI":"10.1109\/ICST57152.2023.00008"},{"key":"ref_7","first-page":"4925","article-title":"How Developers Engineer Test Cases: An Observational Study","volume":"48","author":"Aniche","year":"2021","journal-title":"IEEE Trans. Softw. Eng."},{"key":"ref_8","unstructured":"Zhao, W.X., Zhou, K., Li, J., Tang, T., Wang, X., Hou, Y., Min, Y., Zhang, B., Zhang, J., and Dong, Z. (2024). A Survey of Large Language Models. arXiv."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"911","DOI":"10.1109\/TSE.2024.3368208","article-title":"Software Testing With Large Language Models: Survey, Landscape, and Vision","volume":"50","author":"Wang","year":"2024","journal-title":"IEEE Trans. Softw. Eng."},{"key":"ref_10","unstructured":"Chen, L., Guo, Q., Jia, H., Zeng, Z., Wang, X., Xu, Y., Wu, J., Wang, Y., Gao, Q., and Wang, J. (2024). A Survey on Evaluating Large Language Models in Code Generation Tasks. arXiv."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"26839","DOI":"10.1109\/ACCESS.2024.3365742","article-title":"A Review on Large Language Models: Architectures, Applications, Taxonomies, Open Issues and Challenges","volume":"12","author":"Raiaan","year":"2024","journal-title":"IEEE Access"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Fan, A., Gokkaya, B., Harman, M., Lyubarskiy, M., Sengupta, S., Yoo, S., and Zhang, J.M. (2023, January 14\u201320). Large Language Models for Software Engineering: Survey and Open Problems. Proceedings of the 2023 IEEE\/ACM International Conference on Software Engineering: Future of Software Engineering (ICSE-FoSE), Melbourne, Australia.","DOI":"10.1109\/ICSE-FoSE59343.2023.00008"},{"key":"ref_13","unstructured":"(2017). ISO\/IEC\/IEEE International Standard\u2014Systems and Software Engineering\u2014Vocabulary (Standard No. ISO\/IEC\/IEEE 24765:2017(E))."},{"key":"ref_14","unstructured":"Mayeda, M., and Andrews, A. (2021). Evaluating Software Testing Techniques: A Systematic Mapping Study. Advances in Computers, Missouri University of Science and Technology."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1016\/bs.adcom.2017.11.003","article-title":"Emerging Software Testing Technologies","volume":"Volume 108","author":"Lonetti","year":"2018","journal-title":"Advances in Computers"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"106567","DOI":"10.1016\/j.infsof.2021.106567","article-title":"Test Case Generation for Agent-Based Models: A Systematic Literature Review","volume":"135","author":"Clark","year":"2021","journal-title":"Inf. Softw. Technol."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Hou, X., Zhao, Y., Liu, Y., Yang, Z., Wang, K., Li, L., Luo, X., Lo, D., Grundy, J., and Wang, H. (2024). Large Language Models for Software Engineering: A Systematic Literature Review. arXiv.","DOI":"10.1145\/3695988"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"1340","DOI":"10.1109\/TSE.2024.3382365","article-title":"ChatGPT vs. SBST: A Comparative Assessment of Unit Test Suite Generation","volume":"50","author":"Tang","year":"2024","journal-title":"IEEE Trans. Softw. Eng."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Chen, Y., Hu, Z., Zhi, C., Han, J., Deng, S., and Yin, J. (2024). ChatUniTest: A Framework for LLM-Based Test Generation. arXiv.","DOI":"10.1145\/3663529.3663801"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Rao, N., Jain, K., Alon, U., Goues, C.L., and Hellendoorn, V.J. (2023, January 11\u201315). CAT-LM Training Language Models on Aligned Code And Tests. Proceedings of the 2023 38th IEEE\/ACM International Conference on Automated Software Engineering (ASE), Luxembourg.","DOI":"10.1109\/ASE56229.2023.00193"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Lemieux, C., Inala, J.P., Lahiri, S.K., and Sen, S. (2023, January 14\u201320). CodaMosa: Escaping Coverage Plateaus in Test Generation with Pre-trained Large Language Models. Proceedings of the 2023 IEEE\/ACM 45th International Conference on Software Engineering (ICSE), Melbourne, Australia.","DOI":"10.1109\/ICSE48619.2023.00085"},{"key":"ref_22","unstructured":"Zhang, Q., Fang, C., Gu, S., Shang, Y., Chen, Z., and Xiao, L. (2025). Large Language Models for Unit Testing: A Systematic Literature Review. arXiv."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Yi, G., Chen, Z., Chen, Z., Wong, W.E., and Chau, N. (2023, January 22\u201326). Exploring the Capability of ChatGPT in Test Generation. Proceedings of the 2023 IEEE 23rd International Conference on Software Quality, Reliability, and Security Companion (QRS-C), Chiang Mai, Thailand.","DOI":"10.1109\/QRS-C60940.2023.00013"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Elvira, T., Procko, T.T., Couder, J.O., and Ochoa, O. (2023, January 24\u201327). Digital Rubber Duck: Leveraging Large Language Models for Extreme Programming. Proceedings of the 2023 Congress in Computer Science, Computer Engineering, & Applied Computing (CSCE), Las Vegas, NV, USA.","DOI":"10.1109\/CSCE60160.2023.00051"},{"key":"ref_25","unstructured":"Chen, B., Zhang, F., Nguyen, A., Zan, D., Lin, Z., Lou, J.-G., and Chen, W. (2022). CodeT: Code Generation with Generated Tests. arXiv."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Yuan, Z., Lou, Y., Liu, M., Ding, S., Wang, K., Chen, Y., and Peng, X. (2024). No More Manual Tests? Evaluating and Improving ChatGPT for Unit Test Generation. arXiv.","DOI":"10.1145\/3660783"},{"key":"ref_27","unstructured":"Kitchenham, B., and Charters, S. (2007). Guidelines for Performing Systematic Literature Reviews in Software Engineering, EBSE."},{"key":"ref_28","unstructured":"Tufano, M., Drain, D., Svyatkovskiy, A., Deng, S.K., and Sundaresan, N. (2021). Unit Test Case Generation with Transformers and Focal Context. arXiv."},{"key":"ref_29","unstructured":"Li, V., and Doiron, N. (2023). Prompting Code Interpreter to Write Better Unit Tests on Quixbugs Functions. arXiv."},{"key":"ref_30","unstructured":"Zhang, Y., Song, W., Ji, Z., Yao, D., and Meng, N. (2023). How well does LLM generate security tests?. arXiv."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Guilherme, V., and Vincenzi, A. (2023, January 25\u201329). An Initial Investigation of ChatGPT Unit Test Generation Capability. Proceedings of the 8th Brazilian Symposium on Systematic and Automated Software Testing, Campo Grande, MS, Brazil.","DOI":"10.1145\/3624032.3624035"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Siddiq, M.L., Da Silva Santos, J.C., Tanvir, R.H., Ulfat, N., Al Rifat, F., and Lopes, V.C. Using Large Language Models to Generate JUnit Tests: An Empirical Study. Proceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering, Salerno Italy, 18\u201321 June 2024.","DOI":"10.1145\/3661167.3661216"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Yang, L., Yang, C., Gao, S., Wang, W., Wang, B., Zhu, Q., Chu, X., Zhou, J., Liang, G., and Wang, Q. (2024, January 27). On the Evaluation of Large Language Models in Unit Test Generation. Proceedings of the 39th IEEE\/ACM International Conference on Automated Software Engineering, Sacramento, CA, USA.","DOI":"10.1145\/3691620.3695529"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Chang, H.-F., and Shirazi, M.S. (2025). A Systematic Approach for Assessing Large Language Models\u2019 Test Case Generation Capability. arXiv.","DOI":"10.3390\/software4010005"},{"key":"ref_35","unstructured":"Xu, J., Pang, B., Qu, J., Hayashi, H., Xiong, C., and Zhou, Y. (2025). CLOVER: A Test Case Generation Benchmark with Coverage, Long-Context, and Verification. arXiv."},{"key":"ref_36","unstructured":"Wang, Y., Xia, C., Zhao, W., Du, J., Miao, C., Deng, Z., Yu, P.S., and Xing, C. (2025). ProjectTest: A Project-level LLM Unit Test Generation Benchmark and Impact of Error Fixing Mechanisms. arXiv."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Bayr\u0131, V., and Demirel, E. (2023, January 21\u201322). AI-Powered Software Testing: The Impact of Large Language Models on Testing Methodologies. Proceedings of the 2023 4th International Informatics and Software Engineering Conference (IISEC), Ankara, T\u00fcrkiye.","DOI":"10.1109\/IISEC59749.2023.10391027"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Plein, L., Ou\u00e9draogo, W.C., Klein, J., and Bissyand\u00e9, T.F. (2024, January 14\u201320). Automatic Generation of Test Cases based on Bug Reports: A Feasibility Study with Large Language Models. Proceedings of the 2024 IEEE\/ACM 46th International Conference on Software Engineering: Companion Proceedings, Lisbon, Portugal.","DOI":"10.1145\/3639478.3643119"},{"key":"ref_39","unstructured":"Heiko, K., Virendra, A., Soumyadip, B., and Chandrika, K.R. (2024). Automated Control Logic Test Case Generation using Large Language Models. arXiv."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Yin, H., Mohammed, H., and Boyapati, S. (2024, January 21\u201323). Leveraging Pre-Trained Large Language Models (LLMs) for On-Premises Comprehensive Automated Test Case Generation: An Empirical Study. Proceedings of the 2024 9th International Conference on Intelligent Informatics and Biomedical Sciences (ICIIBMS), Okinawa, Japan.","DOI":"10.1109\/ICIIBMS62405.2024.10792720"},{"key":"ref_41","unstructured":"Rao, N., Gilbert, E., Green, H., Ramananandro, T., Swamy, N., Le Goues, C., and Fakhoury, S. (2025). DiffSpec: Differential Testing with LLMs using Natural Language Specifications and Code Artifacts. arXiv."},{"key":"ref_42","unstructured":"Zhang, Q., Shang, Y., Fang, C., Gu, S., Zhou, J., and Chen, Z. (2024). TestBench: Evaluating Class-Level Test Case Generation Capability of Large Language Models. arXiv."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Jiri, M., Emese, B., and Medlen, P. (2024, January 15\u201318). Leveraging Large Language Models for Python Unit Test. Proceedings of the 2024 IEEE International Conference on Artificial Intelligence Testing (AITest), Shanghai, China.","DOI":"10.1109\/AITest62860.2024.00020"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Ryan, G., Jain, S., Shang, M., Wang, S., Ma, X., Ramanathan, M.K., and Ray, B. (2024). Code-Aware Prompting: A study of Coverage Guided Test Generation in Regression Setting using LLM. arXiv.","DOI":"10.1145\/3643769"},{"key":"ref_45","unstructured":"Gao, S., Wang, C., Gao, C., Jiao, X., Chong, C.Y., Gao, S., and Lyu, M. (2025). The Prompt Alchemist: Automated LLM-Tailored Prompt Optimization for Test Case Generation. arXiv."},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Wang, W., Yang, C., Wang, Z., Huang, Y., Chu, Z., Song, D., Zhang, L., Chen, A.R., and Ma, L. (2025). TESTEVAL: Benchmarking Large Language Models for Test Case Generation. arXiv.","DOI":"10.18653\/v1\/2025.findings-naacl.197"},{"key":"ref_47","unstructured":"Ou\u00e9draogo, W.C., Kabor\u00e9, K., Li, Y., Tian, H., Koyuncu, A., Klein, J., Lo, D., and Bissyand\u00e9, T.F. (2024). Large-scale, Independent and Comprehensive study of the power of LLMs for test case generation. arXiv."},{"key":"ref_48","unstructured":"Khelladi, D.E., Reux, C., and Acher, M. (2025). Unify and Triumph: Polyglot, Diverse, and Self-Consistent Generation of Unit Tests with LLMs. arXiv."},{"key":"ref_49","unstructured":"Sharma, R.K., Halleux, J.D., Barke, S., and Zorn, B. (2025). PromptPex: Automatic Test Generation for Language Model Prompts. arXiv."},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"103942","DOI":"10.1016\/j.csi.2024.103942","article-title":"Evaluating large language models for software testing","volume":"93","author":"Li","year":"2025","journal-title":"Comput. Stand. Interfaces"},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Godage, T., Nimishan, S., Vasanthapriyan, S., Palanisamy, V., Joseph, C., and Thuseethan, S. (2025, January 19\u201320). Evaluating the Effectiveness of Large Language Models in Automated Unit Test Generation. Proceedings of the 2025 5th International Conference on Advanced Research in Computing (ICARC), Belihuloya, Sri Lanka.","DOI":"10.1109\/ICARC64760.2025.10962997"},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Roy Chowdhury, S., Sridhara, G., Raghavan, A.K., Bose, J., Mazumdar, S., Singh, H., Sugumaran, S.B., and Britto, R. (2024, January 18\u201321). Static Program Analysis Guided LLM Based Unit Test Generation. Proceedings of the 8th International Conference on Data Science and Management of Data (12th ACM IKDD CODS and 30th COMAD), Jodhpur, India.","DOI":"10.1145\/3703323.3703742"},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Kang, S., Yoon, J., and Yoo, S. (2023, January 14\u201320). Large Language Models are Few-shot Testers: Exploring LLM-based General Bug Reproduction. Proceedings of the 2023 IEEE\/ACM 45th International Conference on Software Engineering (ICSE), Melbourne, Australia.","DOI":"10.1109\/ICSE48619.2023.00194"},{"key":"ref_54","unstructured":"Lahiri, S.K., Fakhoury, S., Naik, A., Sakkas, G., Chakraborty, S., Musuvathi, M., Choudhury, P., von Veh, C., Inala, J.P., and Wang, C. (2023). Interactive Code Generation via Test-Driven User-Intent Formalization. arXiv."},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"Yu, S., Fang, C., Ling, Y., Wu, C., and Chen, Z. (2023, January 22\u201326). LLM for Test Script Generation and Migration: Challenges, Capabilities, and Opportunities. Proceedings of the 2023 IEEE 23rd International Conference on Software Quality, Reliability, and Security (QRS), Chiang Mai, Thailand.","DOI":"10.1109\/QRS60937.2023.00029"},{"key":"ref_56","unstructured":"Nashid, N., Bouzenia, I., Pradel, M., and Mesbah, A. (2025). Issue2Test: Generating Reproducing Test Cases from Issue Reports. arXiv."},{"key":"ref_57","unstructured":"Chen, M., Liu, Z., Tao, H., Hong, Y., Lo, D., Xia, X., and Sun, J. (November, January 27). B4: Towards Optimal Assessment of Plausible Code Solutions with Plausible Tests. Proceedings of the 39th IEEE\/ACM International Conference on Automated Software Engineering, Sacramento, CA, USA."},{"key":"ref_58","unstructured":"Ni, C., Wang, X., Chen, L., Zhao, D., Cai, Z., Wang, S., and Yang, X. (2024). CasModaTest: A Cascaded and Model-agnostic Self-directed Framework for Unit Test Generation. arXiv."},{"key":"ref_59","unstructured":"Liu, R., Zhang, Z., Hu, Y., Lin, Y., Gao, X., and Sun, H. (2025). LLM-based Unit Test Generation for Dynamically-Typed Programs. arXiv."},{"key":"ref_60","doi-asserted-by":"crossref","unstructured":"Bhatia, S., Gandhi, T., Kumar, D., and Jalote, P. (2024, January 20). Unit Test Generation using Generative AI: A Comparative Performance Analysis of Autogeneration Tools. Proceedings of the 1st International Workshop on Large Language Models for Code, Lisbon, Portugal.","DOI":"10.1145\/3643795.3648396"},{"key":"ref_61","doi-asserted-by":"crossref","first-page":"85","DOI":"10.1109\/TSE.2023.3334955","article-title":"An Empirical Evaluation of Using Large Language Models for Automated Unit Test Generation","volume":"50","author":"Nadi","year":"2024","journal-title":"IEEE Trans. Softw. Eng."},{"key":"ref_62","unstructured":"Kumar, N.A., and Lan, A. (2024). Using Large Language Models for Student-Code Guided Test Case Generation in Computer Science Education. arXiv."},{"key":"ref_63","unstructured":"Wang, Z., Liu, K., Li, G., and Jin, Z. (November, January 27). HITS: High-coverage LLM-based Unit Test Generation via Method Slicing. Proceedings of the 39th IEEE\/ACM International Conference on Automated Software Engineering, Sacramento, CA, USA."},{"key":"ref_64","doi-asserted-by":"crossref","unstructured":"Etemadi, K., Mohammadi, B., Su, Z., and Monperrus, M. (2024). Mokav: Execution-driven Differential Testing with LLMs. arXiv.","DOI":"10.1016\/j.jss.2025.112571"},{"key":"ref_65","unstructured":"Yang, C., Chen, J., Lin, B., Zhou, J., and Wang, Z. (2024). Enhancing LLM-based Test Generation for Hard-to-Cover Branches via Program Analysis. arXiv."},{"key":"ref_66","doi-asserted-by":"crossref","unstructured":"Alshahwan, N., Chheda, J., Finegenova, A., Gokkaya, B., Harman, M., Harper, I., Marginean, A., Sengupta, S., and Wang, E. (2024). Automated Unit Test Improvement using Large Language Models at Meta. arXiv.","DOI":"10.1145\/3663529.3663839"},{"key":"ref_67","doi-asserted-by":"crossref","first-page":"2897","DOI":"10.1145\/3729398","article-title":"CoverUp: Effective High Coverage Test Generation for Python","volume":"2","author":"Pizzorno","year":"2025","journal-title":"Proc. ACM Softw. Eng."},{"key":"ref_68","unstructured":"Jain, K., and Goues, C.L. (2025). TestForge: Feedback-Driven, Agentic Test Suite Generation. arXiv."},{"key":"ref_69","unstructured":"Gu, S., Nashid, N., and Mesbah, A. (2025). LLM Test Generation via Iterative Hybrid Program Analysis. arXiv."},{"key":"ref_70","doi-asserted-by":"crossref","unstructured":"Straubinger, P., Kreis, M., Lukasczyk, S., and Fraser, G. (2025). Mutation Testing via Iterative Large Language Model-Driven Scientific Debugging. arXiv.","DOI":"10.1109\/ICSTW64639.2025.10962485"},{"key":"ref_71","unstructured":"Zhang, Z., Liu, X., Lin, Y., Gao, X., Sun, H., and Yuan, Y. (2024). LLM-based Unit Test Generation via Property Retrieval. arXiv."},{"key":"ref_72","unstructured":"Zhong, Z., Wang, S., Wang, H., Wen, S., Guan, H., Tao, Y., and Liu, Y. (2024). Advancing Bug Detection in Fastjson2 with Large Language Models Driven Unit Test Generation. arXiv."},{"key":"ref_73","doi-asserted-by":"crossref","unstructured":"Pan, R., Kim, M., Krishna, R., Pavuluri, R., and Sinha, S. (2025). ASTER: Natural and Multi-language Unit Test Generation with LLMs. arXiv.","DOI":"10.1109\/ICSE-SEIP66354.2025.00042"},{"key":"ref_74","unstructured":"Gu, S., Zhang, Q., Li, K., Fang, C., Tian, F., Zhu, L., Zhou, J., and Chen, Z. (2025). TestART: Improving LLM-based Unit Testing via Co-evolution of Automated Generation and Repair Iteration. arXiv."},{"key":"ref_75","doi-asserted-by":"crossref","unstructured":"Li, K., Yu, H., Guo, T., Cao, S., and Yuan, Y. (2025). CoCoEvo: Co-Evolution of Programs and Test Cases to Enhance Code Generation. arXiv.","DOI":"10.36227\/techrxiv.173930984.47381789\/v1"},{"key":"ref_76","unstructured":"Cheng, R., Tufano, M., Cito, J., Cambronero, J., Rondon, P., Wei, R., Sun, A., and Chandra, S. (2025). Agentic Bug Reproduction for Effective Automated Program Repair at Google. arXiv."},{"key":"ref_77","doi-asserted-by":"crossref","first-page":"2113","DOI":"10.1145\/3728970","article-title":"STRUT: Structured Seed Case Guided Unit Test Generation for C Programs using LLMs","volume":"2","author":"Liu","year":"2025","journal-title":"Proc. ACM Softw. Eng."},{"key":"ref_78","doi-asserted-by":"crossref","unstructured":"Alagarsamy, S., Tantithamthavorn, C., and Aleti, A. (2023). A3Test: Assertion-Augmented Automated Test Case Generation. arXiv.","DOI":"10.2139\/ssrn.4724885"},{"key":"ref_79","doi-asserted-by":"crossref","unstructured":"Shin, J., Hashtroudi, S., Hemmati, H., and Wang, S. (2024, January 16\u201320). Domain Adaptation for Code Model-Based Unit Test Case Generation. Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, Vienna, Austria.","DOI":"10.1145\/3650212.3680354"},{"key":"ref_80","doi-asserted-by":"crossref","unstructured":"Rehan, S., Al-Bander, B., and Ahmad, A.A.-S. (2025). Harnessing Large Language Models for Automated Software Testing: A Leap Towards Scalable Test Case Generation. Electronics, 14.","DOI":"10.3390\/electronics14071463"},{"key":"ref_81","doi-asserted-by":"crossref","unstructured":"He, Y., Huang, J., Rong, Y., Guo, Y., Wang, E., and Chen, H. (2024, January 16\u201320). UniTSyn: A Large-Scale Dataset Capable of Enhancing the Prowess of Large Language Models for Program Testing. Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, Vienna, Austria.","DOI":"10.1145\/3650212.3680342"},{"key":"ref_82","doi-asserted-by":"crossref","unstructured":"Tufano, M., Drain, D., Svyatkovskiy, A., and Sundaresan, N. (2022, January 17\u201318). Generating accurate assert statements for unit test cases using pretrained transformers. Proceedings of the 3rd ACM\/IEEE International Conference on Automation of Software Test, Pittsburgh, PA, USA.","DOI":"10.1145\/3524481.3527220"},{"key":"ref_83","doi-asserted-by":"crossref","first-page":"3721128","DOI":"10.1145\/3721128","article-title":"Improving Deep Assertion Generation via Fine-Tuning Retrieval-Augmented Pre-trained Language Models","volume":"34","author":"Zhang","year":"2025","journal-title":"ACM Trans. Softw. Eng. Methodol."},{"key":"ref_84","doi-asserted-by":"crossref","unstructured":"Primbs, S., Fein, B., and Fraser, G. (2025, January 28\u201329). AsserT5: Test Assertion Generation Using a Fine-Tuned Code Language Model. Proceedings of the 2025 IEEE\/ACM International Conference on Automation of Software Test (AST), Ottawa, ON, Canada.","DOI":"10.1109\/AST66626.2025.00008"},{"key":"ref_85","unstructured":"Storhaug, A., and Li, J. (2024). Parameter-Efficient Fine-Tuning of Large Language Models for Unit Test Generation: An Empirical Study. arXiv."},{"key":"ref_86","doi-asserted-by":"crossref","unstructured":"Alagarsamy, S., Tantithamthavorn, C., Takerngsaksiri, W., Arora, C., and Aleti, A. (2025). Enhancing Large Language Models for Text-to-Testcase Generation. arXiv.","DOI":"10.2139\/ssrn.4732705"},{"key":"ref_87","doi-asserted-by":"crossref","first-page":"1678","DOI":"10.1145\/3728951","article-title":"A Large-Scale Empirical Study on Fine-Tuning Large Language Models for Unit Testing","volume":"2","author":"Shang","year":"2025","journal-title":"Proc. ACM Softw. Eng."},{"key":"ref_88","doi-asserted-by":"crossref","first-page":"107468","DOI":"10.1016\/j.infsof.2024.107468","article-title":"Effective test generation using pre-trained Large Language Models and mutation testing","volume":"171","author":"Dakhel","year":"2024","journal-title":"Inf. Softw. Technol."},{"key":"ref_89","doi-asserted-by":"crossref","unstructured":"Steenhoek, B., Tufano, M., Sundaresan, N., and Svyatkovskiy, A. (2025). Reinforcement Learning from Automatic Feedback for High-Quality Unit Test Generation. arXiv.","DOI":"10.1109\/DeepTest66595.2025.00011"},{"key":"ref_90","doi-asserted-by":"crossref","unstructured":"Li, T.-O., Zong, W., Wang, Y., Tian, H., and Cheung, S.-C. (2023, January 11\u201315). Nuances are the Key: Unlocking ChatGPT to Find Failure-Inducing Tests with Differential Prompting. Proceedings of the 2023 38th IEEE\/ACM International Conference on Automated Software Engineering (ASE), Luxembourg.","DOI":"10.1109\/ASE56229.2023.00089"},{"key":"ref_91","doi-asserted-by":"crossref","unstructured":"Liu, K., Chen, Z., Liu, Y., Zhang, J.M., Harman, M., Han, Y., Ma, Y., Dong, Y., Li, G., and Huang, G. (2024). LLM-Powered Test Case Generation for Detecting Tricky Bugs. arXiv.","DOI":"10.18653\/v1\/2025.acl-long.20"},{"key":"ref_92","doi-asserted-by":"crossref","unstructured":"Zhang, J., Hu, X., Xia, X., Cheung, S.-C., and Li, S. (2025). Automated Unit Test Generation via Chain of Thought Prompt and Reinforcement Learning from Coverage Feedback. ACM Trans. Softw. Eng. Methodol.","DOI":"10.1145\/3745765"},{"key":"ref_93","doi-asserted-by":"crossref","unstructured":"Sapozhnikov, A., Olsthoorn, M., Panichella, A., Kovalenko, V., and Derakhshanfar, P. (2024, January 14\u201320). TestSpark: IntelliJ IDEA\u2019s Ultimate Test Generation Companion. Proceedings of the 2024 IEEE\/ACM 46th International Conference on Software Engineering: Companion Proceedings, Lisbon, Portugal.","DOI":"10.1145\/3639478.3640024"},{"key":"ref_94","unstructured":"Li, J., Shen, J., Su, Y., and Lyu, M.R. (2025). LLM-assisted Mutation for Whitebox API Testing. arXiv."},{"key":"ref_95","unstructured":"Li, K., and Yuan, Y. (2024). Large Language Models as Test Case Generators: Performance Evaluation and Enhancement. arXiv."},{"key":"ref_96","unstructured":"Huang, D., Zhang, J.M., Luck, M., Bu, Q., Qing, Y., and Cui, H. (2024). AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation. arXiv."},{"key":"ref_97","unstructured":"M\u00fcndler, N., M\u00fcller, M.N., He, J., and Vechev, M. (2025). SWT-Bench: Testing and Validating Real-World Bug-Fixes with Code Agents. arXiv."},{"key":"ref_98","unstructured":"Taherkhani, H., and Hemmati, H. (2024). VALTEST: Automated Validation of Language Model Generated Test Cases. arXiv."},{"key":"ref_99","doi-asserted-by":"crossref","unstructured":"Lops, A., Narducci, F., Ragone, A., Trizio, M., and Bartolini, C. (2024). A System for Automated Unit Test Generation Using Large Language Models and Assessment of Generated Test Suites. arXiv.","DOI":"10.1109\/ICSTW64639.2025.10962454"},{"key":"ref_100","doi-asserted-by":"crossref","first-page":"20","DOI":"10.1007\/s10515-025-00496-7","article-title":"LLM-enhanced evolutionary test generation for untyped languages","volume":"32","author":"Yang","year":"2025","journal-title":"Autom. Softw. Eng."},{"key":"ref_101","doi-asserted-by":"crossref","unstructured":"Xu, J., Xu, J., Chen, T., and Ma, X. (2024, January 1\u20135). Symbolic Execution with Test Cases Generated by Large Language Models. Proceedings of the 2024 IEEE 24th International Conference on Software Quality, Reliability and Security (QRS), Cambridge, UK.","DOI":"10.1109\/QRS62785.2024.00031"},{"key":"ref_102","doi-asserted-by":"crossref","unstructured":"Zhang, Y., Lu, Q., Liu, K., Dou, W., Zhu, J., Qian, L., Zhang, C., Lin, Z., and Wei, J. (2025). CITYWALK: Enhancing LLM-Based C++ Unit Test Generation via Project-Dependency Awareness and Language-Specific Knowledge. arXiv.","DOI":"10.1145\/3763791"},{"key":"ref_103","doi-asserted-by":"crossref","first-page":"85","DOI":"10.1007\/s10664-025-10635-z","article-title":"Enriching automatic test case generation by extracting relevant test inputs from bug reports","volume":"30","author":"Plein","year":"2025","journal-title":"Empir. Softw. Eng."},{"key":"ref_104","unstructured":"Xu, W., Pei, H., Yang, J., Shi, Y., Zhang, Y., and Zhao, Q. (2024). Exploring Critical Testing Scenarios for Decision-Making Policies: An LLM Approach. arXiv."},{"key":"ref_105","doi-asserted-by":"crossref","unstructured":"Duvvuru, V.S.A., Zhang, B., Vierhauser, M., and Agrawal, A. (2025). LLM-Agents Driven Automated Simulation Testing and Analysis of small Uncrewed Aerial Systems. arXiv.","DOI":"10.1109\/ICSE55347.2025.00223"},{"key":"ref_106","doi-asserted-by":"crossref","unstructured":"Petrovic, N., Lebioda, K., Zolfaghari, V., Schamschurko, A., Kirchner, S., and Purschke, N. (2024, January 26\u201329). LLM-Driven Testing for Autonomous Driving Scenarios. Proceedings of the 2024 2nd International Conference on Foundation and Large Language Models (FLLM), Dubai, United Arab Emirates.","DOI":"10.1109\/FLLM63129.2024.10852505"}],"container-title":["Machine Learning and Knowledge Extraction"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-4990\/7\/3\/97\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T18:42:45Z","timestamp":1760035365000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-4990\/7\/3\/97"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,9,9]]},"references-count":106,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2025,9]]}},"alternative-id":["make7030097"],"URL":"https:\/\/doi.org\/10.3390\/make7030097","relation":{},"ISSN":["2504-4990"],"issn-type":[{"value":"2504-4990","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,9,9]]}}}