{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,4]],"date-time":"2026-03-04T19:31:16Z","timestamp":1772652676297,"version":"3.50.1"},"reference-count":38,"publisher":"Elsevier BV","issue":"6","license":[{"start":{"date-parts":[[2025,9,25]],"date-time":"2025-09-25T00:00:00Z","timestamp":1758758400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,9,25]],"date-time":"2025-09-25T00:00:00Z","timestamp":1758758400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001774","name":"University of Sydney","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100001774","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Int J Artif Intell Educ"],"published-print":{"date-parts":[[2025,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Student data from online programming courses can provide valuable insights into how students approach problems and acquire critical computational skills, as well as about the challenges they face in the process. However, extracting these insights from data can be challenging, especially when dealing with large, complex and multi-dimensional datasets. In this paper, we propose a machine learning approach to predicting student progress at a module level in large-scale online programming courses. Our approach defines suitable content interaction features from the log data that measure student engagement with course material, and then creates a decision tree classifier to predict performance on the last problem in the module. 
Using data from four large-scale programming courses for upper primary and high school students, we demonstrate that this approach can produce accurate predictions of student progress and dropouts. Additionally, our approach provides interpretable tree-based visualisations that identify key course materials and programming tasks for successful course completion, and also highlights differences in student behaviour between courses. The predictive models were found to be informative by educators for improving the design of these online courses. The intrinsically explainable decision trees provided competitive accuracy compared to more advanced black-box models. By providing insights into the pedagogical value of course content, our approach facilitates a data-driven approach to increasing learning outcomes.<\/jats:p>","DOI":"10.1007\/s40593-025-00510-9","type":"journal-article","created":{"date-parts":[[2025,9,25]],"date-time":"2025-09-25T16:12:54Z","timestamp":1758816774000},"page":"3614-3644","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["A Machine Learning Approach for Predicting Student Progress in Online Programming Education"],"prefix":"10.1016","volume":"35","author":[{"given":"Vincent","family":"Zhang","sequence":"first","affiliation":[]},{"given":"Bryn","family":"Jeffries","sequence":"additional","affiliation":[]},{"given":"Irena","family":"Koprinska","sequence":"additional","affiliation":[]}],"member":"78","published-online":{"date-parts":[[2025,9,25]]},"reference":[{"key":"510_CR1","doi-asserted-by":"crossref","unstructured":"Abdelrahman, G., Wang, Q., Nunes, B. (2023). Knowledge tracing: A survey. ACM Computing Surveys. 55(11)","DOI":"10.1145\/3569576"},{"issue":"3","key":"510_CR2","first-page":"1","volume":"11","author":"J Berens","year":"2019","unstructured":"Berens, J., Schneider, K., Gortz, S., Oster, S., & Burghoff, J. (2019). 
Early detection of students at risk - predicting student dropouts using administrative student data from german universities and machine learning methods. Journal of Educational Data Mining., 11(3), 1\u201341.","journal-title":"Journal of Educational Data Mining."},{"issue":"2","key":"510_CR3","doi-asserted-by":"publisher","first-page":"133","DOI":"10.1207\/s15327051hci0102_3","volume":"1","author":"J Bonar","year":"1985","unstructured":"Bonar, J., & Soloway, E. (1985). Preprogramming knowledge: A major source of misconceptions in novice programmers. Human-Computer Interaction., 1(2), 133\u2013161.","journal-title":"Human-Computer Interaction."},{"issue":"1","key":"510_CR4","doi-asserted-by":"publisher","first-page":"17","DOI":"10.1109\/TLT.2016.2616312","volume":"10","author":"R Conijn","year":"2017","unstructured":"Conijn, R., Snijders, C., Kleingeld, A., & Matzat, U. (2017). Predicting student performance from lms data: A comparison of 17 blended courses using moodle lms. IEEE Transactions on Learning Technologies., 10(1), 17\u201329.","journal-title":"IEEE Transactions on Learning Technologies."},{"issue":"4","key":"510_CR5","doi-asserted-by":"publisher","first-page":"253","DOI":"10.1007\/BF01099821","volume":"4","author":"AT Corbett","year":"1995","unstructured":"Corbett, A. T., & Anderson, J. R. (1995). Knowledge tracing: Modelling the acquisition of procedural knowledge. User Modelling and User-Adapted Interaction., 4(4), 253\u2013278.","journal-title":"User Modelling and User-Adapted Interaction."},{"issue":"2","key":"510_CR6","doi-asserted-by":"publisher","first-page":"49","DOI":"10.1145\/1138403.1138432","volume":"38","author":"NB Dale","year":"2006","unstructured":"Dale, N. B. (2006). Most difficult topics in CS1: Results of an online survey of educators. ACM SIGCSE Bulletin., 38(2), 49\u201353.","journal-title":"ACM SIGCSE Bulletin."},{"key":"510_CR7","doi-asserted-by":"crossref","unstructured":"Dsilva, V., Schleiss, J., Stober, S. (2023). 
Trustworthy academic risk prediction with explainable boosting machines. In: Proceedings of the International Conference on Artificial Intelligence in Education, pp. 463\u2013475","DOI":"10.1007\/978-3-031-36272-9_38"},{"issue":"6","key":"510_CR8","doi-asserted-by":"publisher","first-page":"1437","DOI":"10.1109\/TKDE.2003.1245283","volume":"15","author":"MA Hall","year":"2003","unstructured":"Hall, M. A., & Holmes, G. (2003). Benchmarking attribute selection techniques for discrete class data mining. IEEE Transactions on Knowledge and Data Engineering., 15(6), 1437\u20131447.","journal-title":"IEEE Transactions on Knowledge and Data Engineering."},{"key":"510_CR9","doi-asserted-by":"crossref","unstructured":"Jeffries, B., Baldwin, T., Zalk, M., Taylor, B. (2020). Online tutoring to support programming exercises. In: Proceedings of the Twenty-Second Australasian Computing Education Conference, pp. 56\u201365","DOI":"10.1145\/3373165.3373172"},{"issue":"4","key":"510_CR10","doi-asserted-by":"publisher","first-page":"450","DOI":"10.1109\/TLT.2017.2689017","volume":"10","author":"T Kaser","year":"2017","unstructured":"Kaser, T., Klingler, S., Schwing, A. G., & Gross, M. (2017). Dynamic Bayesian networks for student modeling. IEEE Transactions on Learning Technologies., 10(4), 450\u2013462.","journal-title":"IEEE Transactions on Learning Technologies."},{"key":"510_CR11","doi-asserted-by":"crossref","unstructured":"Kemper, L., Vorhoff, G., & Wigger, B.U. (2020). Predicting student dropout: A machine learning approach. European Journal of Higher Education., 10(1), 28\u201347.","DOI":"10.1080\/21568235.2020.1718520"},{"key":"510_CR12","doi-asserted-by":"crossref","unstructured":"Koedinger, K. R., McLaughlin, E. A., Jia, J. Z., Bier, N. L. (2016). Is the doer effect a causal relationship? In: Proceedings of the International Conference on Learning Analytics and Knowledge, pp. 
388\u2013397","DOI":"10.1145\/2883851.2883957"},{"key":"510_CR13","doi-asserted-by":"crossref","unstructured":"Koprinska, I., Stretton, J., Yacef, K. (2015). Predicting student performance from multiple data sources. In: Proceedings of the International Conference on Artificial Intelligence in Education, pp. 678\u2013681","DOI":"10.1007\/978-3-319-19773-9_90"},{"key":"510_CR14","unstructured":"Koprinska, I., Stretton, J., Yacef, K. (2015). Students at risk: detection and remediation. In: Proceedings of the International Conference on Educational Data Mining, pp. 512\u2013515"},{"key":"510_CR15","unstructured":"Kuzilek, J., Hlosta, M., Herrmannova, D., Zdrahal, Z., Vaclavek, J., Wolff, A. (2015). Ou analyse: analysing at-risk students at the open university authors. Learning Analytics Review, pp. 1\u201316"},{"key":"510_CR16","doi-asserted-by":"crossref","unstructured":"Kuzilek, J., Hlosta, M., Zdrahal, Z. (2017) Open university learning analytics dataset. Scientific Data. (4)","DOI":"10.1038\/sdata.2017.171"},{"key":"510_CR17","unstructured":"Lundberg, S. M., Lee, S.-I. (2017). A unified approach to interpreting model predictions. In: Proceedings of the International Conference on Neural Information Processing Systems, pp. 4765\u20134774"},{"issue":"3","key":"510_CR18","doi-asserted-by":"publisher","first-page":"950","DOI":"10.1016\/j.compedu.2009.05.010","volume":"53","author":"I Lykourentzou","year":"2009","unstructured":"Lykourentzou, I., Giannoukos, I., Nikolopoulos, V., Mpardis, G., & Loumos, V. (2009). Dropout prediction in e-learning courses through the combination of machine learning techniques. Computers & Education., 53(3), 950\u2013965.","journal-title":"Computers & Education."},{"key":"510_CR19","unstructured":"McBroom, J., Jeffries, B., Koprinska, I., Yacef, K. (2016). Mining behaviors of students in autograding submission system logs. In: Educational Data Mining, pp. 
159\u2013166"},{"key":"510_CR20","doi-asserted-by":"crossref","unstructured":"McBroom, J., Paassen, B., Jeffries, B., Koprinska, I., Yacef, K. (2021). Progress networks as a tool for analysing student programming difficulties. In: Proceedings of the Australasian Conference on Computing Education, pp. 158\u2013167","DOI":"10.1145\/3441636.3442366"},{"key":"510_CR21","doi-asserted-by":"crossref","unstructured":"McBroom, J., Yacef, K., Koprinska, I. (2020). Detect: A hierarchical clustering algorithm for behavioural trends in temporal educational data. In: Artificial Intelligence in Education, pp. 374\u2013385. Springer, Cham","DOI":"10.1007\/978-3-030-52237-7_30"},{"key":"510_CR22","doi-asserted-by":"crossref","unstructured":"Pardos, Z. A., Heffernan, N. T. (2010). Modeling individualization in a Bayesian networks implementation of knowledge tracing. In: De\u00a0Bra, P., Kobsa, A., Chin, D. (eds.) Proceedings of the 18th International Conference on User Modeling, Adaptation, and Personalization (UMAP), pp. 255\u2013266","DOI":"10.1007\/978-3-642-13470-8_24"},{"key":"510_CR23","doi-asserted-by":"crossref","unstructured":"Pereira, F. D., Oliveira, E., Cristea, A., Fernandes, D., Silva, L., Aguiar, G., Alamri, A., Alshehri, M. (2019). Early dropout prediction for programming courses supported by online judges. In: Proceedings of the International Conference on Artificial Intelligence in Education, pp. 67\u201372","DOI":"10.1007\/978-3-030-23207-8_13"},{"issue":"6","key":"510_CR24","doi-asserted-by":"publisher","first-page":"759","DOI":"10.1109\/TKDE.2008.138","volume":"21","author":"D Perera","year":"2009","unstructured":"Perera, D., Kay, J., Koprinska, I., Yacef, K., & Za\u00efane, O. (2009). Clustering and sequential pattern mining of online collaborative learning data. 
IEEE Transactions on Knowledge and Data Engineering., 21(6), 759\u2013772.","journal-title":"IEEE Transactions on Knowledge and Data Engineering."},{"key":"510_CR25","unstructured":"Piech, C., Bassen, J., Huang, J., Ganguli, S., Sahami, M., Guibas, L., Sohl-Dickstein, J. (2015). Deep knowledge tracing. In: Proceedings of the International Conference on Neural Information Processing Systems, pp. 505\u2013513"},{"key":"510_CR26","doi-asserted-by":"crossref","unstructured":"Retzlaff, C. O., Angerschmid, A., Saranti, A., Schneeberger, D., R\u00f6ttger, R., M\u00fcller, H., Holzinger, A. (2024). Post-hoc vs ante-hoc explanations: xAI design guidelines for data scientists. Cognitive Systems Research, 86","DOI":"10.1016\/j.cogsys.2024.101243"},{"key":"510_CR27","doi-asserted-by":"crossref","unstructured":"Ribeiro, M. T., Singh, S., Guestrin, C. (2016). Why should I trust you?: Explaining the predictions of any classifier. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1135\u20131144","DOI":"10.1145\/2939672.2939778"},{"key":"510_CR28","doi-asserted-by":"publisher","DOI":"10.1016\/j.compedu.2020.104108","volume":"163","author":"M Riestra-Gonz\u00e1lez","year":"2021","unstructured":"Riestra-Gonz\u00e1lez, M., Puerto Paule-Ru\u00edz, M., & Ortin, F. (2021). Massive lms log data analysis for the early prediction of course-agnostic student performance. Computers & Education., 163, Article 104108.","journal-title":"Computers & Education."},{"key":"510_CR29","doi-asserted-by":"publisher","first-page":"458","DOI":"10.1016\/j.compedu.2013.06.009","volume":"68","author":"C Romero","year":"2013","unstructured":"Romero, C., L\u00f3pez, M.-I., Luna, J.-M., & Ventura, S. (2013). Predicting students\u2019 final performance from participation in on-line discussion forums. 
Computers & Education., 68, 458\u2013472.","journal-title":"Computers & Education."},{"key":"510_CR30","doi-asserted-by":"crossref","unstructured":"Sentance, S., Waite, J., Kallia, M. (2019). Teachers\u2019 experiences of using primm to teach programming in school. In: Proceedings of the 50th ACM Technical Symposium on Computer Science Education, pp. 476\u2013482. ACM, New York, NY, USA","DOI":"10.1145\/3287324.3287477"},{"key":"510_CR31","doi-asserted-by":"publisher","DOI":"10.1016\/j.compedu.2019.103676","volume":"143","author":"N Tomasevic","year":"2020","unstructured":"Tomasevic, N., Gvozdenovic, N., & Vranes, S. (2020). An overview and comparison of supervised data mining techniques for student exam performance prediction. Computers & Education., 143, Article 103676.","journal-title":"Computers & Education."},{"key":"510_CR32","volume-title":"How to expand and improve computer science education around the world","author":"E Vegas","year":"2021","unstructured":"Vegas, E., Hansen, M., & Fowler, B. (2021). How to expand and improve computer science education around the world. Brookings Institution: Technical report."},{"key":"510_CR33","doi-asserted-by":"crossref","unstructured":"Wagner, K., Volkening, H., Basyigit, S., Merceron, A., Sauer, P., Pinkwart, N. (2023). Which approach best predicts dropouts in higher education? In: International Conference on Computer Supported Education, pp. 15\u201326","DOI":"10.5220\/0011838100003470"},{"key":"510_CR34","doi-asserted-by":"crossref","unstructured":"Weintrop, D., Wilensky, U. (2017). Comparing block-based and text-based programming in high school computer science classrooms. ACM Transactions on Computing Education, 18(1)","DOI":"10.1145\/3089799"},{"key":"510_CR35","doi-asserted-by":"publisher","first-page":"17","DOI":"10.1145\/234867.234872","volume":"28","author":"LE Winslow","year":"1996","unstructured":"Winslow, L. E. (1996). Programming pedagogy\u2014a psychological overview. 
ACM SIGCSE Bulletin., 28, 17\u201322.","journal-title":"ACM SIGCSE Bulletin."},{"key":"510_CR36","volume-title":"Data Mining: Practical Machine Learning Tools and Techniques","author":"IH Witten","year":"2016","unstructured":"Witten, I. H., Frank, E., Hall, M. A., & Pal, C. J. (2016). Data Mining: Practical Machine Learning Tools and Techniques (4th ed.). San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.","edition":"4"},{"key":"510_CR37","doi-asserted-by":"crossref","unstructured":"Yudelson, M. V., Koedinger, K. R., Gordon, G. J. (2013). Individualized Bayesian knowledge tracing models. In: Proceedings of the International Conference on Artificial Intelligence in Education, pp. 171\u2013180","DOI":"10.1007\/978-3-642-39112-5_18"},{"key":"510_CR38","doi-asserted-by":"crossref","unstructured":"Zhang, V., Jeffries, B., Koprinska, I. (2023). Predicting progress in a large-scale online programming course. In: Proceedings of the International Conference on Artificial Intelligence in Education, pp. 
810\u2013816","DOI":"10.1007\/978-3-031-36272-9_76"}],"container-title":["International Journal of Artificial Intelligence in Education"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40593-025-00510-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s40593-025-00510-9","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40593-025-00510-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,4]],"date-time":"2026-03-04T18:12:47Z","timestamp":1772647967000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s40593-025-00510-9"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,9,25]]},"references-count":38,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2025,12]]}},"alternative-id":["510"],"URL":"https:\/\/doi.org\/10.1007\/s40593-025-00510-9","relation":{},"ISSN":["1560-4292","1560-4306"],"issn-type":[{"value":"1560-4292","type":"print"},{"value":"1560-4306","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,9,25]]},"assertion":[{"value":"3 October 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"12 August 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"25 September 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing 
interests"}}]}}