{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,16]],"date-time":"2026-03-16T20:38:14Z","timestamp":1773693494695,"version":"3.50.1"},"reference-count":50,"publisher":"Wiley","issue":"1","license":[{"start":{"date-parts":[[2024,12,25]],"date-time":"2024-12-25T00:00:00Z","timestamp":1735084800000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/onlinelibrary.wiley.com\/termsAndConditions#vor"}],"content-domain":{"domain":["onlinelibrary.wiley.com"],"crossmark-restriction":true},"short-container-title":["Computer Assisted Learning"],"published-print":{"date-parts":[[2025,2]]},"abstract":"<jats:title>ABSTRACT<\/jats:title><jats:sec><jats:title>Background<\/jats:title><jats:p>Automatically graded programming assignments provide instant feedback to students and significantly reduce manual grading time for instructors. However, creating comprehensive suites of test cases for programming problems within automatic graders can be time\u2010consuming and complex. The effort needed to define test suites may deter some instructors from creating additional problems or lead to inadequate test coverage, potentially resulting in misleading feedback on student solutions. Such limitations may reduce student access to the well\u2010documented benefits of timely feedback when learning programming.<\/jats:p><\/jats:sec><jats:sec><jats:title>Objectives<\/jats:title><jats:p>We evaluate the effectiveness of using Large Language Models (LLMs), as part of a larger workflow, to automatically generate test suites for CS1\u2010level programming problems.<\/jats:p><\/jats:sec><jats:sec><jats:title>Methods<\/jats:title><jats:p>Each problem's statement and reference solution are provided to GPT\u20104 to produce a test suite that can be used by an autograder. We evaluate our proposed approach using a sample of 26 problems, and more than 25,000 attempted solutions to those problems, submitted by students in an introductory programming course. We compare the performance of the LLM\u2010generated test suites against the instructor\u2010created test suites for each problem.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results and Conclusions<\/jats:title><jats:p>Our findings reveal that LLM\u2010generated test suites can correctly identify most valid solutions, and for most problems are at least as comprehensive as the instructor test suites. Additionally, the LLM\u2010generated test suites exposed ambiguities in some problem statements, underscoring their potential to improve both autograding and instructional design.<\/jats:p><\/jats:sec>","DOI":"10.1111\/jcal.13100","type":"journal-article","created":{"date-parts":[[2024,12,26]],"date-time":"2024-12-26T07:04:05Z","timestamp":1735196645000},"update-policy":"https:\/\/doi.org\/10.1002\/crossmark_policy","source":"Crossref","is-referenced-by-count":16,"title":["Automating Autograding: Large Language Models as Test Suite Generators for Introductory Programming"],"prefix":"10.1111","volume":"41","author":[{"ORCID":"https:\/\/orcid.org\/0009-0004-6003-3370","authenticated-orcid":false,"given":"Umar","family":"Alkafaween","sequence":"first","affiliation":[{"name":"Department of Computer Science Al Hussein Technical University  Amman Jordan"}]},{"given":"Ibrahim","family":"Albluwi","sequence":"additional","affiliation":[{"name":"Department of Computer Science Princess Sumaya University for Technology  Amman Jordan"}]},{"given":"Paul","family":"Denny","sequence":"additional","affiliation":[{"name":"Department of Computer Science University of Auckland  Auckland New Zealand"}]}],"member":"311","published-online":{"date-parts":[[2024,12,25]]},"reference":[{"key":"e_1_2_11_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/3491140.3528282"},{"key":"e_1_2_11_3_1","unstructured":"Alkafaween U.2024a.\u201cRuby Code to Run Python Generated llm Unit Tests.\u201dhttps:\/\/github.com\/umar221b\/autograder_llm_test_generator\/blob\/master\/lib\/code_execution\/runners\/unit_test_runners\/python3_unit_test_runner.rb."},{"key":"e_1_2_11_4_1","unstructured":"Alkafaween U.2024b.\u201cA Small App That Talks to Openaito Generate Tests for cs1 Programming Problems.\u201dhttps:\/\/github.com\/umar221b\/autograder_llm_test_generator."},{"key":"e_1_2_11_5_1","unstructured":"Alkafaween U.2024c.\u201cSystem Prompts to Generate Unit Tests For Python.\u201dhttps:\/\/github.com\/umar221b\/autograder_llm_test_generator\/blob\/master\/app\/services\/llm_services\/llm_chat_query_templates\/system_prompts.yml."},{"key":"e_1_2_11_6_1","article-title":"AI\u2010Enhanced Autocorrection of Programming Exercises: How Effective is Gpt\u20103.5?","volume":"13","author":"Azaiz I.","year":"2023","journal-title":"arXiv preprint arXiv 2311.10737"},{"key":"e_1_2_11_7_1","first-page":"31","volume-title":"Proceedings of the 2024 on Innovation and Technology in Computer Science Education","author":"Azaiz I.","year":"2024"},{"issue":"3","key":"e_1_2_11_8_1","article-title":"Enhancing the Learning Process in Programming Courses Through an Automated Feedback and Assignment Management System","volume":"17","author":"Bai X.","year":"2016","journal-title":"Issues in Information Systems"},{"key":"e_1_2_11_9_1","first-page":"500","volume-title":"Proceedings of the 54th ACM Technical Symposium on Computer Science Education","author":"Becker B. A.","year":"2023"},{"key":"e_1_2_11_10_1","unstructured":"Bengtsson D. andA.Kaliff.2023.Assessment Accuracy of a Large Language Model on Programming Assignments."},{"key":"e_1_2_11_11_1","article-title":"Teaching Large Language Models to Self\u2010Debug","author":"Chen X.","year":"2023","journal-title":"arXiv preprint arXiv 2304.05128"},{"key":"e_1_2_11_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3626252.3630909"},{"key":"e_1_2_11_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/3649217.3653574"},{"key":"e_1_2_11_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/3624720"},{"key":"e_1_2_11_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/3636243.3636256"},{"key":"e_1_2_11_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/HSI.2018.8431146"},{"key":"e_1_2_11_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/2899415.2899454"},{"key":"e_1_2_11_18_1","doi-asserted-by":"publisher","DOI":"10.3390\/su11205568"},{"key":"e_1_2_11_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/FIE43999.2019.9028686"},{"key":"e_1_2_11_20_1","unstructured":"Hui B.2023.\u201cAn Awesome and Curated List of Best Code\u2010Llm For Research.\u201dhttps:\/\/github.com\/huybery\/Awesome\u2010Code\u2010LLM."},{"key":"e_1_2_11_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/3231711"},{"key":"e_1_2_11_22_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.caeai.2023.100151"},{"key":"e_1_2_11_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/3587102.3588785"},{"key":"e_1_2_11_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/3545945.3569770"},{"key":"e_1_2_11_25_1","first-page":"1","volume-title":"Proceedings of the 23rd Koli Calling International Conference on Computing Education Research","author":"Liffiton M.","year":"2024"},{"key":"e_1_2_11_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/2810041"},{"key":"e_1_2_11_27_1","first-page":"491","volume-title":"Proceedings of the 2023 Conference on Innovation and Technology in Computer Science Education","author":"Messer M.","year":"2023"},{"key":"e_1_2_11_28_1","first-page":"388","volume-title":"Proceedings of the 54th ACM Technical Symposium on Computer Science Education","author":"Mitra J.","year":"2023"},{"key":"e_1_2_11_29_1","unstructured":"Nilsson F. andJ.Tuvstedt.2023.Gpt\u20104 as an Automatic Grader: The Accuracy of Grades Set By Gpt\u20104 on Introductory Programming Assignments."},{"key":"e_1_2_11_30_1","unstructured":"OpenAI.2023.\u201cA Small App That Talks to Openaito Generate Tests For cs1 Programming Problems.\u201dhttps:\/\/platform.openai.com\/docs\/api\u2010reference\/chat\/create#chat\u2010create\u2010temperature."},{"key":"e_1_2_11_31_1","unstructured":"OpenAI.2024.\u201cAbout gpt4 and gpt4 Turbo.\u201dhttps:\/\/platform.openai.com\/docs\/models\/gpt\u20104\u2010and\u2010gpt\u20104\u2010turbo."},{"key":"e_1_2_11_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/3513140"},{"key":"e_1_2_11_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/3059009.3059026"},{"key":"e_1_2_11_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/3568812.3603476"},{"key":"e_1_2_11_35_1","first-page":"4","article-title":"Automated Assessment of Programming Assignments","volume":"13","author":"Pieterse V.","year":"2013","journal-title":"CSERC"},{"key":"e_1_2_11_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/3372782.3406263"},{"key":"e_1_2_11_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/3623762.3633499"},{"key":"e_1_2_11_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/3230977.3230981"},{"key":"e_1_2_11_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/FIE43999.2019.9028450"},{"key":"e_1_2_11_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/3501385.3543957"},{"key":"e_1_2_11_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2023.3334955"},{"issue":"6","key":"e_1_2_11_42_1","first-page":"69","article-title":"Impact of Auto\u2010Grading on an Introductory Computing Course","volume":"28","author":"Sherman M.","year":"2013","journal-title":"Journal of Computing Sciences in Colleges"},{"key":"e_1_2_11_43_1","article-title":"Reflexion: Language agents with verbal reinforcement learning","volume":"36","author":"Shinn N.","year":"2024","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_11_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/3661167.3661216"},{"key":"e_1_2_11_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2023.3317694"},{"key":"e_1_2_11_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/3657604.3662039"},{"key":"e_1_2_11_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/TALE.2015.7386010"},{"key":"e_1_2_11_48_1","first-page":"267","volume-title":"Proceedings of the Fifth Australasian Conference on Computing Education","author":"Venables A.","year":"2003"},{"key":"e_1_2_11_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2024.3368208"},{"key":"e_1_2_11_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/3491102.3517582"},{"key":"e_1_2_11_51_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v38i1.27798"}],"container-title":["Journal of Computer Assisted Learning"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/pdf\/10.1111\/jcal.13100","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,4,25]],"date-time":"2025-04-25T14:31:41Z","timestamp":1745591501000},"score":1,"resource":{"primary":{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/10.1111\/jcal.13100"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,12,25]]},"references-count":50,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2025,2]]}},"alternative-id":["10.1111\/jcal.13100"],"URL":"https:\/\/doi.org\/10.1111\/jcal.13100","archive":["Portico"],"relation":{},"ISSN":["0266-4909","1365-2729"],"issn-type":[{"value":"0266-4909","type":"print"},{"value":"1365-2729","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,12,25]]},"assertion":[{"value":"2024-03-25","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-11-12","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-12-25","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}],"article-number":"e13100"}}