{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,2]],"date-time":"2026-03-02T03:58:58Z","timestamp":1772423938217,"version":"3.50.1"},"reference-count":30,"publisher":"MDPI AG","issue":"10","license":[{"start":{"date-parts":[[2025,9,30]],"date-time":"2025-09-30T00:00:00Z","timestamp":1759190400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100009076","name":"Capgemini","doi-asserted-by":"crossref","award":["A01"],"award-info":[{"award-number":["A01"]}],"id":[{"id":"10.13039\/100009076","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["www.mdpi.com"],"crossmark-restriction":true},"short-container-title":["Data"],"abstract":"<jats:p>This study explores the potential of generative AI, specifically Large Language Models (LLMs), in automating unit test generation in Python 3.13. We analyze tests, both those created by programmers and those generated by LLM models, for fifty source code cases. Our main focus is on how the choice of model, the difficulty of the source code, and the prompting strategy influence the quality of the generated tests. The results show that AI models can help automate test creation for simple code, but their effectiveness decreases for more complex tasks. We introduce an embedding-based similarity analysis to assess how closely AI-generated tests resemble human-written ones, revealing that AI outputs often lack semantic diversity. The study also highlights the potential of AI models for rapid test prototyping, which can significantly speed up the software development cycle. However, further customization and training of the models on specific use cases is needed to achieve greater precision. 
Our findings provide practical insights into integrating LLMs into software testing workflows and emphasize the importance of prompt design and model selection.<\/jats:p>","DOI":"10.3390\/data10100156","type":"journal-article","created":{"date-parts":[[2025,9,30]],"date-time":"2025-09-30T14:16:29Z","timestamp":1759241789000},"page":"156","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Automated Test Generation Using Large Language Models"],"prefix":"10.3390","volume":"10","author":[{"ORCID":"https:\/\/orcid.org\/0009-0006-2089-3290","authenticated-orcid":false,"given":"Marcin","family":"Andrzejewski","sequence":"first","affiliation":[{"name":"GenerativeAI Academic Research Team (GART), Capgemini Insights & Data, 54-202 Wroclaw, Poland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-6666-172X","authenticated-orcid":false,"given":"Nina","family":"Dubicka","sequence":"additional","affiliation":[{"name":"GenerativeAI Academic Research Team (GART), Capgemini Insights & Data, 54-202 Wroclaw, Poland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-1679-2134","authenticated-orcid":false,"given":"J\u0119drzej","family":"Podolak","sequence":"additional","affiliation":[{"name":"GenerativeAI Academic Research Team (GART), Capgemini Insights & Data, 54-202 Wroclaw, Poland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-4099-6981","authenticated-orcid":false,"given":"Marek","family":"Kowal","sequence":"additional","affiliation":[{"name":"GenerativeAI Academic Research Team (GART), Capgemini Insights & Data, 54-202 Wroclaw, Poland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9133-4388","authenticated-orcid":false,"given":"Jakub","family":"Si\u0142ka","sequence":"additional","affiliation":[{"name":"Faculty of 
Applied Mathematics, Silesian University of Technology, 44-100 Gliwice, Poland"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2025,9,30]]},"reference":[{"key":"ref_1","unstructured":"Minaee, S., Mikolov, T., Nikzad, N., Chenaghlu, M., Socher, R., Amatriain, X., and Gao, J. (2024). Large Language Models: A Survey. arXiv."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"22","DOI":"10.1109\/MS.2006.91","article-title":"A survey of unit testing practices","volume":"23","author":"Runeson","year":"2006","journal-title":"IEEE Softw."},{"key":"ref_3","first-page":"319","article-title":"Unit testing: Test early, test often","volume":"19","author":"Olan","year":"2003","journal-title":"J. Comput. Sci. Coll."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"McMinn, P. (2011, January 21\u201325). Search-Based Software Testing: Past, Present and Future. Proceedings of the 2011 IEEE Fourth International Conference on Software Testing, Verification and Validation Workshops, Berlin, Germany.","DOI":"10.1109\/ICSTW.2011.100"},{"key":"ref_5","unstructured":"Zhao, W.X., Zhou, K., Li, J., Tang, T., Wang, X., Hou, Y., Min, Y., Zhang, B., Zhang, J., and Dong, Z. (2023). A Survey of Large Language Models. arXiv."},{"key":"ref_6","unstructured":"Jiang, J., Wang, F., Shen, J., Kim, S., and Kim, S. (2024). A Survey on Large Language Models for Code Generation. arXiv."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Wang, J., Huang, Y., Chen, C., Liu, Z., Wang, S., and Wang, Q. (2024). Software Testing with Large Language Models: Survey, Landscape, and Vision. arXiv.","DOI":"10.1109\/TSE.2024.3368208"},{"key":"ref_8","unstructured":"Pizzorno, J.A., and Berger, E.D. (2024). CoverUp: Coverage-Guided LLM-Based Test Generation. arXiv."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Ryan, G., Jain, S., Shang, M., Wang, S., Ma, X., Ramanathan, M.K., and Ray, B. (2024). 
Code-Aware Prompting: A study of Coverage Guided Test Generation in Regression Setting using LLM. arXiv.","DOI":"10.1145\/3643769"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Wong, M.F., and Tan, C.W. (2025). Aligning Crowd-Sourced Human Feedback for Reinforcement Learning on Code Generation by Large Language Models. arXiv.","DOI":"10.1109\/TBDATA.2024.3524104"},{"key":"ref_11","unstructured":"Chen, B., Zhang, F., Nguyen, A., Zan, D., Lin, Z., Lou, J.G., and Chen, W. (2022). CodeT: Code Generation with Generated Tests. arXiv."},{"key":"ref_12","unstructured":"Bi, Z., Zhang, N., Jiang, Y., Deng, S., Zheng, G., and Chen, H. (2023). When Do Program-of-Thoughts Work for Reasoning? arXiv."},{"key":"ref_13","unstructured":"Astels, D. (2003). Test Driven Development: A Practical Guide, Prentice Hall Professional Technical Reference."},{"key":"ref_14","unstructured":"Reese, J. (2025, April 12). Unit Testing Best Practices for .NET. Microsoft Learn. Available online: https:\/\/learn.microsoft.com\/en-us\/dotnet\/core\/testing\/unit-testing-best-practices."},{"key":"ref_15","unstructured":"(2024, July 04). Website: Introducing Meta Llama 3: The Most Capable Openly Available LLM to Date. Available online: https:\/\/ai.meta.com\/blog\/meta-llama-3\/."},{"key":"ref_16","unstructured":"Jiang, A.Q., Sablayrolles, A., Mensch, A., Bamford, C., Chaplot, D.S., de las Casas, D., Bressand, F., Lengyel, G., Lample, G., and Saulnier, L. (2023). Mistral 7B. arXiv."},{"key":"ref_17","unstructured":"Team, V. (2024, July 04). Website: Llama 3 8B vs. Mistral 7B: Small LLM Pricing Considerations. Available online: https:\/\/www.vantage.sh\/blog\/best-small-llm-llama-3-8b-vs-mistral-7b-cost."},{"key":"ref_18","unstructured":"Anthropic (2025, April 06). The Claude 3 Model Family: Opus, Sonnet, Haiku. Claude-3 Model Card. 
Available online: https:\/\/assets.anthropic.com\/m\/61e7d27f8c8f5919\/original\/Claude-3-Model-Card.pdf."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Ivankovi\u0107, M., Petrovi\u0107, G., Just, R., and Fraser, G. (2019, January 26\u201330). Code coverage at Google. Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Tallinn, Estonia.","DOI":"10.1145\/3338906.3340459"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"1340","DOI":"10.1109\/TSE.2024.3382365","article-title":"ChatGPT vs SBST: A comparative assessment of unit test suite generation","volume":"50","author":"Tang","year":"2024","journal-title":"IEEE Trans. Softw. Eng."},{"key":"ref_21","unstructured":"Chen, Z., and Monperrus, M. (2019). A literature study of embeddings on source code. arXiv."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Kenter, T., and De Rijke, M. (2015, January 18\u201323). Short text similarity with word embeddings. Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, Melbourne, Australia.","DOI":"10.1145\/2806416.2806475"},{"key":"ref_23","unstructured":"Pouly, M. (2024). Estimating Text Similarity based on Semantic Concept Embeddings. arXiv."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Steck, H., Ekanadham, C., and Kallus, N. (2024, January 13\u201317). Is cosine-similarity of embeddings really about similarity? Proceedings of the Companion Proceedings of the ACM on Web Conference 2024, Singapore.","DOI":"10.1145\/3589335.3651526"},{"key":"ref_25","unstructured":"Chen, B., Zhang, Z., Langren\u00e9, N., and Zhu, S. (2023). Unleashing the potential of prompt engineering in Large Language Models: A comprehensive review. 
arXiv."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"493","DOI":"10.1038\/s41586-023-06647-8","article-title":"Role play with large language models","volume":"623","author":"Shanahan","year":"2023","journal-title":"Nature"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Logan IV, R.L., Bala\u017eevi\u0107, I., Wallace, E., Petroni, F., Singh, S., and Riedel, S. (2021). Cutting down on prompts and parameters: Simple few-shot learning with language models. arXiv.","DOI":"10.18653\/v1\/2022.findings-acl.222"},{"key":"ref_28","first-page":"24824","article-title":"Chain-of-thought prompting elicits reasoning in large language models","volume":"35","author":"Wei","year":"2022","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_29","unstructured":"Beer, R., Feix, A., Guttzeit, T., Muras, T., M\u00fcller, V., Rauscher, M., Sch\u00e4ffler, F., and L\u00f6we, W. (2024). Examination of Code generated by Large Language Models. arXiv."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"34","DOI":"10.1109\/MS.2025.3549628","article-title":"From code generation to software testing: AI Copilot with context-based RAG","volume":"42","author":"Wang","year":"2025","journal-title":"IEEE 
Softw."}],"container-title":["Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2306-5729\/10\/10\/156\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,2]],"date-time":"2025-10-02T04:28:54Z","timestamp":1759379334000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2306-5729\/10\/10\/156"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,9,30]]},"references-count":30,"journal-issue":{"issue":"10","published-online":{"date-parts":[[2025,10]]}},"alternative-id":["data10100156"],"URL":"https:\/\/doi.org\/10.3390\/data10100156","relation":{},"ISSN":["2306-5729"],"issn-type":[{"value":"2306-5729","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,9,30]]}}}