{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,30]],"date-time":"2026-06-30T15:42:02Z","timestamp":1782834122519,"version":"3.54.5"},"reference-count":35,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2024,10,16]],"date-time":"2024-10-16T00:00:00Z","timestamp":1729036800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,10,16]],"date-time":"2024-10-16T00:00:00Z","timestamp":1729036800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100018777","name":"Nile University","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100018777","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Neural Comput & Applic"],"published-print":{"date-parts":[[2025,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>In recent years, large language models (LLMs) have gained significant traction across various domains, including education. This paper explores the application of LLMs in grading programming assignments. By leveraging data collected from existing programming assignments and their corresponding grades, we aim to develop a robust LLM-based grading system. We also incorporate augmented data representing various grading scenarios to enhance the model\u2019s performance and ensure comprehensive coverage across all grading levels. Our approach involves training the LLM on this combined dataset to enable accurate and consistent evaluation of programming assignments. The proposed model, BeGrading, aims to reduce the grading burden on educators and provide timely and objective feedback to students. 
Compared to the Codestral model, our proposed model demonstrates an absolute difference rate of 19%, equivalent to <jats:inline-formula>\n              <jats:alternatives>\n                <jats:tex-math>$$\\pm 0.95$$<\/jats:tex-math>\n                <mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                  <mml:mrow>\n                    <mml:mo>\u00b1<\/mml:mo>\n                    <mml:mn>0.95<\/mml:mn>\n                  <\/mml:mrow>\n                <\/mml:math>\n              <\/jats:alternatives>\n            <\/jats:inline-formula> out of 5. This is acceptable for using a small, fine-tuned model with optimized data. Additionally, the Codestral model compared to the dataset optimized score shows a difference of 15% equivalent to a margin of <jats:inline-formula>\n              <jats:alternatives>\n                <jats:tex-math>$$\\pm 0.75$$<\/jats:tex-math>\n                <mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                  <mml:mrow>\n                    <mml:mo>\u00b1<\/mml:mo>\n                    <mml:mn>0.75<\/mml:mn>\n                  <\/mml:mrow>\n                <\/mml:math>\n              <\/jats:alternatives>\n            <\/jats:inline-formula> out of 5. 
Preliminary results demonstrate the potential of LLMs to perform grading tasks with a high degree of reliability, opening avenues for further research and practical applications in automated education systems.<\/jats:p>","DOI":"10.1007\/s00521-024-10449-y","type":"journal-article","created":{"date-parts":[[2024,10,16]],"date-time":"2024-10-16T06:01:55Z","timestamp":1729058515000},"page":"1027-1040","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":25,"title":["BeGrading: large language models for enhanced feedback in programming education"],"prefix":"10.1007","volume":"37","author":[{"ORCID":"https:\/\/orcid.org\/0009-0000-0316-2724","authenticated-orcid":false,"given":"Mina","family":"Yousef","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2695-5692","authenticated-orcid":false,"given":"Kareem","family":"Mohamed","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9482-8412","authenticated-orcid":false,"given":"Walaa","family":"Medhat","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1615-0236","authenticated-orcid":false,"given":"Ensaf 
Hussein","family":"Mohamed","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7332-0759","authenticated-orcid":false,"given":"Ghada","family":"Khoriba","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9553-8265","authenticated-orcid":false,"given":"Tamer","family":"Arafa","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2024,10,16]]},"reference":[{"key":"10449_CR1","doi-asserted-by":"crossref","unstructured":"Ahmed UZ, Kumar P, Karkare A, Kar P, Gulwani S (2018) Compilation error repair: for the student programs, from the student programs. In: Proceedings of the international conference on software engineering, pp 78\u201387","DOI":"10.1145\/3183377.3183383"},{"key":"10449_CR2","unstructured":"Bellman J (2016) Jsymtester: symbolic execution framework for java pathfinder. Master\u2019s thesis, Unknown"},{"key":"10449_CR3","unstructured":"Bengtsson D, Kaliff A (2023) Assessment accuracy of a large language model on programming assignments. Degree project in computer science and engineering, first cycle, KTH Royal Institute of Technology"},{"key":"10449_CR4","doi-asserted-by":"crossref","unstructured":"Bhatia S, Kohli P, Singh R (2018) Neuro-symbolic program corrector for introductory programming assignments. In: Proceedings of the international conference on software engineering, pp 60\u201370","DOI":"10.1145\/3180155.3180219"},{"key":"10449_CR5","unstructured":"Boudewijn Nadia (2016) Automated grading of java assignments. 
Master\u2019s thesis, Utrecht University"},{"key":"10449_CR6","first-page":"1877","volume":"33","author":"TB Brown","year":"2020","unstructured":"Brown TB, Mann B, Ryder N, Subbiah M, Kaplan J, Dhariwal P, Neelakantan A, Shyam P, Sastry G, Askell A et al (2020) Language models are few-shot learners. Adv Neural Inform Process Syst 33:1877\u20131901","journal-title":"Adv Neural Inform Process Syst"},{"key":"10449_CR7","doi-asserted-by":"crossref","unstructured":"Brumley D, Caballero J, Liang Z, Newsome J, Song D (2007) Towards automatic discovery of deviations in binary implementations with applications to error detection and fingerprint generation. In: Proceedings of 16th USENIX security symposium on USENIX security symposium, vol 15, no 1\u201315, p 16","DOI":"10.1109\/SP.2006.41"},{"key":"10449_CR8","doi-asserted-by":"crossref","unstructured":"Day M, Penumala MR, Gonzalez-Sanchez J (2019) Annete: an intelligent tutoring companion embedded into the eclipse ide. In: IEEE First international conference on cognitive machine intelligence, pp 71\u201380","DOI":"10.1109\/CogMI48466.2019.00018"},{"key":"10449_CR9","unstructured":"Douce C et\u00a0al (2005) Automated grading of java assignments using black-box testing. Int J Comput Sci Educ"},{"key":"10449_CR10","doi-asserted-by":"crossref","unstructured":"Dunder N, Lundborg S, Wong J, Viberg O (2024) Kattis versus chatgpt: assessment and evaluation of programming tasks in the age of artificial intelligence. In: Proceedings of the 14th learning analytics and knowledge conference (LAK \u201924), ACM, pp 821\u2013827","DOI":"10.1145\/3636555.3636882"},{"key":"10449_CR11","doi-asserted-by":"crossref","unstructured":"Gan W, Qi Z, Wu J, Lin J (2023) Large language models in education: vision and opportunities. 
In: 2023 IEEE international conference on big data (BigData), dec IEEE Computer Society, Los Alamitos, CA, pp 4776\u20134785","DOI":"10.1109\/BigData59044.2023.10386291"},{"key":"10449_CR12","unstructured":"Gao Y, Zhang Y, Liu B (2022) Generating synthetic programming assignments for training automated grading systems. In: Proceedings of the 2022 ACM conference on learning at scale, pp 101\u2013110"},{"key":"10449_CR13","unstructured":"Goedicke M, Striewe M (2013) Static analysis of java code: tools and techniques. J Softw Eng"},{"key":"10449_CR14","doi-asserted-by":"crossref","unstructured":"Gupta R, Kanade A, Shevade S (2019) Deep reinforcement learning for syntactic error repair in student programs. In: Proceedings of the AAAI conference on artificial intelligence, pp 930\u2013937","DOI":"10.1609\/aaai.v33i01.3301930"},{"key":"10449_CR15","doi-asserted-by":"crossref","unstructured":"Gupta R, Pal S, Kanade A, Shevade S (2017) Deepfix: fixing common c language errors by deep learning. In: Proceedings of the AAAI conference on artificial intelligence, pp 1345\u20131351","DOI":"10.1609\/aaai.v31i1.10742"},{"key":"10449_CR16","unstructured":"Jones M, Smith S (2022) Hybrid model for grading programming assignments using LLMS. In: Proceedings of the 2022 ACM conference on learning at scale, pp 201\u2013210"},{"issue":"1\u20132","key":"10449_CR17","first-page":"25","volume":"14","author":"Z Ke","year":"2000","unstructured":"Ke Z (2000) Automated essay scoring: a cross-disciplinary perspective. Artif Intell Rev 14(1\u20132):25\u201334","journal-title":"Artif Intell Rev"},{"issue":"2","key":"10449_CR18","first-page":"123","volume":"30","author":"Z Ke","year":"2020","unstructured":"Ke Z, Xie B (2020) Automated feedback mechanisms for programming education. 
Int J Artif Intell Educ 30(2):123\u2013145","journal-title":"Int J Artif Intell Educ"},{"issue":"7","key":"10449_CR19","doi-asserted-by":"publisher","first-page":"385","DOI":"10.1145\/360248.360252","volume":"19","author":"JC King","year":"1976","unstructured":"King JC (1976) Symbolic execution and program testing. Commun ACM 19(7):385\u2013394","journal-title":"Commun ACM"},{"key":"10449_CR20","doi-asserted-by":"crossref","unstructured":"Lagakis P, Demetriadis S, Psathas G (2024) Automated grading in coding exercises using large language models. In: Proceedings of the 17th international conference on interactive mobile communication technologies and learning (IMCL 2023), Springer, pp 363\u2013373","DOI":"10.1007\/978-3-031-54327-2_37"},{"key":"10449_CR21","doi-asserted-by":"crossref","unstructured":"Timotej L, Martin M, Ivan B (2017) Automatic extraction of AST patterns for debugging student programs. In: Lecture notes in computer science vol 10331, pp 162\u2013174","DOI":"10.1007\/978-3-319-61425-0_14"},{"key":"10449_CR22","unstructured":"Liu X, Wang S, Wang P, Wu D (2024) Automatic grading of programming assignments: an approach based on formal semantics. In: Proceedings of the international conference on software engineering, University Park, PA, ACM, pp 123\u2013134"},{"key":"10449_CR23","doi-asserted-by":"crossref","unstructured":"Liu X, Liu Y, Tang J (2021) What makes good in-context examples for gpt-3? arXiv:2101.06804","DOI":"10.18653\/v1\/2022.deelio-1.10"},{"issue":"1","key":"10449_CR24","first-page":"1","volume":"12","author":"E Mayfield","year":"2020","unstructured":"Mayfield E, Black A (2020) Should we use AI to grade essays? 
J Educ Data Min 12(1):1\u20137","journal-title":"J Educ Data Min"},{"issue":"1","key":"10449_CR25","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3636515","volume":"1","author":"M Messer","year":"2023","unstructured":"Messer M, Brown NCC, K\u00f6lling M, Shi M (2023) Automated grading and feedback tools for programming education: a systematic review. ACM Trans Comput Educ 1(1):1\u201343","journal-title":"ACM Trans Comput Educ"},{"issue":"4","key":"10449_CR26","doi-asserted-by":"publisher","first-page":"1647","DOI":"10.1109\/TR.2016.2570554","volume":"65","author":"J Ming","year":"2016","unstructured":"Ming J, Zhang F, Wu D, Liu P, Zhu S (2016) Deviation-based obfuscation-resilient program equivalence checking with application to software plagiarism detection. IEEE Trans Reliab 65(4):1647\u20131664","journal-title":"IEEE Trans Reliab"},{"key":"10449_CR27","unstructured":"Mistral (2024) Introducing codestral: a revolutionary approach to code generation. Accessed 02 Aug 2024"},{"issue":"20","key":"10449_CR28","doi-asserted-by":"publisher","first-page":"358","DOI":"10.35631\/IJMOE.620027","volume":"6","author":"M Munisamy","year":"2024","unstructured":"Munisamy M, Osman SZ, Sanmugam M (2024) Code, click, learn: a systematic review of online assessment tools in 21st century programming education. Int J Mod Educ 6(20):358\u2013377","journal-title":"Int J Mod Educ"},{"issue":"3","key":"10449_CR29","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3513140","volume":"22","author":"JC Paiva","year":"2022","unstructured":"Paiva JC, Leal P, Figueira \u00c1 (2022) Automated assessment in computer science education: a state-of-the-art review. ACM Trans Comput Educ 22(3):1\u201340","journal-title":"ACM Trans Comput Educ"},{"key":"10449_CR30","unstructured":"Piech C, Bassen J, Huang J, Ganguli S, Sahami M, Guibas L, Sohl-Dickstein J (2015) Deep knowledge tracing. 
In: Advances in neural information processing systems, pp 505\u2013513"},{"key":"10449_CR31","unstructured":"Piech C, Huang J, Nguyen A, Phulsuksombati M, Sahami M, Guibas L (2015) Learning program embeddings to propagate feedback on student code. In: Proceedings of the 32nd international conference on machine learning, pp 1093\u20131102"},{"key":"10449_CR32","doi-asserted-by":"crossref","unstructured":"Saikkonen R, Malmi L, Korhonen A (2001) Fully automatic assessment of programming exercises. In: Proceedings of the 6th annual conference on innovation and technology in computer science education, pp 133\u2013136","DOI":"10.1145\/377435.377666"},{"key":"10449_CR33","doi-asserted-by":"crossref","unstructured":"Taghipour K, Ng HT (2016) A neural approach to automated essay scoring. In: Proceedings of the 2016 conference on empirical methods in natural language processing, pp 1882\u20131891","DOI":"10.18653\/v1\/D16-1193"},{"issue":"1","key":"10449_CR34","doi-asserted-by":"publisher","first-page":"111","DOI":"10.1080\/03057267.2020.1735757","volume":"56","author":"X Zhai","year":"2020","unstructured":"Zhai X, Yin Y, Pellegrino JW, Haudek KC, Shi L (2020) Applying machine learning in science assessment: a systematic review. Stud Sci Educ 56(1):111\u2013151","journal-title":"Stud Sci Educ"},{"key":"10449_CR35","doi-asserted-by":"crossref","unstructured":"Zhang F, Wu D, Liu P, Zhu S (2014) Program logic based software plagiarism detection. In: 2014 IEEE 25th international symposium on software reliability engineering, IEEE, pp. 
66\u201377","DOI":"10.1109\/ISSRE.2014.18"}],"container-title":["Neural Computing and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00521-024-10449-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s00521-024-10449-y\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00521-024-10449-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,1,24]],"date-time":"2025-01-24T00:53:35Z","timestamp":1737680015000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s00521-024-10449-y"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,10,16]]},"references-count":35,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2025,1]]}},"alternative-id":["10449"],"URL":"https:\/\/doi.org\/10.1007\/s00521-024-10449-y","relation":{},"ISSN":["0941-0643","1433-3058"],"issn-type":[{"value":"0941-0643","type":"print"},{"value":"1433-3058","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,10,16]]},"assertion":[{"value":"14 June 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"9 September 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"16 October 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"There is no Conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}