{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,2]],"date-time":"2026-05-02T15:30:01Z","timestamp":1777735801940,"version":"3.51.4"},"reference-count":25,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2024,1,4]],"date-time":"2024-01-04T00:00:00Z","timestamp":1704326400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,1,4]],"date-time":"2024-01-04T00:00:00Z","timestamp":1704326400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Discov Artif Intell"],"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Utilizing a dataset sourced from a higher education institution, this study aims to assess the efficacy of diverse machine learning algorithms in predicting student dropout and academic success. Our focus was on algorithms capable of effectively handling imbalanced data. To tackle class imbalance, we employed the SMOTE resampling technique. We applied a range of algorithms, including Decision Tree (DT), Support Vector Machine (SVM), Random Forest (RF), as well as boosting algorithms such as Gradient Boosting (GB), Extreme Gradient Boosting (XGBoost), CatBoost (CB), and Light Gradient Boosting Machine (LB). To enhance the models' performance, we conducted hyperparameter tuning using Optuna. Additionally, we employed the Isolation Forest (IF) method to identify outliers or anomalies within the dataset. Notably, our findings indicate that boosting algorithms, particularly LightGBM and CatBoost with Optuna, outperformed traditional classification methods. Our study's generalizability to other contexts is constrained due to its reliance on a single dataset, with inherent limitations. Nevertheless, this research provides valuable insights into the effectiveness of various machine learning algorithms for predicting student dropout and academic success. By benchmarking these algorithms, our project offers guidance to both researchers and practitioners in their choice of suitable approaches for similar predictive tasks.<\/jats:p>","DOI":"10.1007\/s44163-023-00079-z","type":"journal-article","created":{"date-parts":[[2024,1,4]],"date-time":"2024-01-04T18:02:16Z","timestamp":1704391336000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":73,"title":["Supervised machine learning algorithms for predicting student dropout and academic success: a comparative study"],"prefix":"10.1007","volume":"4","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8250-1340","authenticated-orcid":false,"given":"Alice","family":"Villar","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5344-2991","authenticated-orcid":false,"given":"Carolina Robledo Velini","family":"de Andrade","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,1,4]]},"reference":[{"key":"79_CR1","doi-asserted-by":"publisher","first-page":"346","DOI":"10.1016\/j.childyouth.2018.11.030","volume":"96","author":"JY Chung","year":"2019","unstructured":"Chung JY, Lee S. Dropout early warning systems for high school students using machine learning. Child Youth Serv Rev. 2019;96:346\u201353.","journal-title":"Child Youth Serv Rev"},{"issue":"6","key":"79_CR2","doi-asserted-by":"publisher","first-page":"1028","DOI":"10.1080\/10494820.2019.1709209","volume":"30","author":"AF Gkontzis","year":"2022","unstructured":"Gkontzis AF, Kotsiantis S, Panagiotakopoulos CT, Verykios VS. A predictive analytics framework as a countermeasure for attrition of students. Interact Learn Environ. 2022;30(6):1028\u201343.","journal-title":"Interact Learn Environ"},{"key":"79_CR3","doi-asserted-by":"publisher","DOI":"10.2139\/ssrn.3275433","author":"J Berens","year":"2018","unstructured":"Berens J, Schneider K, G\u00f6rtz S, Oster S, Burghoff J. Early detection of students at risk\u2013predicting student dropouts using administrative student data and machine learning methods. SSRN J. 2018. https:\/\/doi.org\/10.2139\/ssrn.3275433.","journal-title":"SSRN J"},{"key":"79_CR4","doi-asserted-by":"publisher","first-page":"166","DOI":"10.1007\/978-3-030-72657-7_16","volume-title":"Trends and applications in information systems and technologies","author":"MV Martins","year":"2021","unstructured":"Martins MV, Tolledo D, Machado J, Baptista LM, Realinho V. Early prediction of student\u2019s performance in higher education: a case study. In: Rocha \u00c1, Adeli H, Dzemyda G, Moreira F, Correia AMR, editors. Trends and applications in information systems and technologies, vol. 9. Berlin: Springer International Publishing; 2021. p. 166\u201375."},{"issue":"2","key":"79_CR5","doi-asserted-by":"publisher","first-page":"321","DOI":"10.1016\/j.eswa.2013.07.046","volume":"41","author":"D Thammasiri","year":"2014","unstructured":"Thammasiri D, Delen D, Meesad P, Kasap N. A critical assessment of imbalanced class distribution problem: the case of predicting freshmen student attrition. Expert Syst Appl. 2014;41(2):321\u201330.","journal-title":"Expert Syst Appl"},{"key":"79_CR6","doi-asserted-by":"publisher","first-page":"108288","DOI":"10.1016\/j.asoc.2021.108288","volume":"115","author":"A Islam","year":"2022","unstructured":"Islam A, Belhaouari SB, Rehman AU, Bensmail H. KNNOR: An oversampling technique for imbalanced datasets. Appl Soft Comput. 2022;115:108288.","journal-title":"Appl Soft Comput"},{"issue":"3","key":"79_CR7","doi-asserted-by":"publisher","first-page":"1042","DOI":"10.3390\/app10031042","volume":"10","author":"JL Rastrollo-Guerrero","year":"2020","unstructured":"Rastrollo-Guerrero JL, G\u00f3mez-Pulido JA, Dur\u00e1n-Dom\u00ednguez A. Analyzing and predicting students\u2019 performance by means of machine learning: a review. Appl Sci. 2020;10(3):1042.","journal-title":"Appl Sci"},{"issue":"5","key":"79_CR8","doi-asserted-by":"publisher","first-page":"1","DOI":"10.4018\/IJWLTT.20220901.oa4","volume":"17","author":"T Hamim","year":"2022","unstructured":"Hamim T, Benabbou F, Sael N. Student profile modeling using boosting algorithms. Int J Web-Based Learn Teach Technol. 2022;17(5):1\u201313.","journal-title":"Int J Web-Based Learn Teach Technol"},{"key":"79_CR9","doi-asserted-by":"crossref","unstructured":"Tenpipat W, Akkarajitsakul K. Student dropout prediction: a KMUTT case study. In: 2020 1st international conference on big data analytics and practices (IBDAP). IEEE. 2020. pp. 1\u20135.","DOI":"10.1109\/IBDAP50342.2020.9245457"},{"key":"79_CR10","doi-asserted-by":"publisher","first-page":"103724","DOI":"10.1016\/j.compedu.2019.103724","volume":"145","author":"KF Hew","year":"2020","unstructured":"Hew KF, Hu X, Qiao C, Tang Y. What predicts student satisfaction with MOOCs: a gradient boosting trees supervised machine learning and sentiment analysis approach. Comput Educ. 2020;145:103724.","journal-title":"Comput Educ"},{"key":"79_CR11","doi-asserted-by":"publisher","first-page":"335","DOI":"10.1016\/j.jbusres.2018.02.012","volume":"94","author":"E Fernandes","year":"2019","unstructured":"Fernandes E, Holanda M, Victorino M, Borges V, Carvalho R, Van Erven G. Educational data mining: predictive analysis of academic performance of public school students in the capital of Brazil. J Bus Res. 2019;94:335\u201343.","journal-title":"J Bus Res"},{"issue":"7","key":"79_CR12","doi-asserted-by":"publisher","first-page":"e13209","DOI":"10.2196\/13209","volume":"7","author":"A Doryab","year":"2019","unstructured":"Doryab A, Villalba DK, Chikersal P, Dutcher JM, Tumminia M, Liu X, Dey AK. Identifying behavioral phenotypes of loneliness and social isolation with passive sensing: statistical analysis, data mining and machine learning of smartphone and fitbit data. JMIR mHealth uHealth. 2019;7(7):e13209.","journal-title":"JMIR mHealth uHealth"},{"key":"79_CR13","doi-asserted-by":"publisher","first-page":"587413","DOI":"10.3389\/fpsyg.2020.587413","volume":"11","author":"C Wang","year":"2020","unstructured":"Wang C, Zhao H, Zhang H. Chinese college students have higher anxiety in new semester of online learning during COVID-19: a machine learning approach. Front Psychol. 2020;11:587413.","journal-title":"Front Psychol"},{"key":"79_CR14","doi-asserted-by":"publisher","first-page":"2231","DOI":"10.3389\/fpsyg.2018.02231","volume":"9","author":"X Qiao","year":"2018","unstructured":"Qiao X, Jiao H. Data mining techniques in analyzing process data: a didactic. Front Psychol. 2018;9:2231.","journal-title":"Front Psychol"},{"key":"79_CR15","doi-asserted-by":"publisher","first-page":"2111","DOI":"10.2147\/NDT.S262004","volume":"16","author":"F Ge","year":"2020","unstructured":"Ge F, Zhang D, Wu L, Mu H. Predicting psychological state among Chinese undergraduate students in the COVID-19 epidemic: a longitudinal study using a machine learning. Neuropsychiatr Dis Treat. 2020;16:2111\u20138.","journal-title":"Neuropsychiatr Dis Treat"},{"key":"79_CR16","doi-asserted-by":"publisher","DOI":"10.1080\/10494820.2021.1928235","author":"A Asselman","year":"2021","unstructured":"Asselman A, Khaldi M, Aammou S. Enhancing the prediction of student performance based on the machine learning XGBoost algorithm. Interact Learn Environ. 2021. https:\/\/doi.org\/10.1080\/10494820.2021.1928235.","journal-title":"Interact Learn Environ"},{"issue":"6","key":"79_CR17","doi-asserted-by":"publisher","first-page":"e0217639","DOI":"10.1371\/journal.pone.0217639","volume":"14","author":"JS Jung","year":"2019","unstructured":"Jung JS, Park SJ, Kim EY, Na KS, Kim YJ, Kim KG. Prediction models for high risk of suicide in Korean adolescents using machine learning techniques. PLoS ONE. 2019;14(6):e0217639.","journal-title":"PLoS ONE"},{"issue":"2","key":"79_CR18","doi-asserted-by":"publisher","first-page":"1527","DOI":"10.1007\/s10639-020-10316-y","volume":"26","author":"R Costa-Mendes","year":"2021","unstructured":"Costa-Mendes R, Oliveira T, Castelli M, Cruz-Jesus F. A machine learning approximation of the 2015 Portuguese high school student grades: a hybrid approach. Educ Inf Technol. 2021;26(2):1527\u201347.","journal-title":"Educ Inf Technol"},{"key":"79_CR19","doi-asserted-by":"publisher","first-page":"199","DOI":"10.1007\/s11145-020-10070-0","volume":"34","author":"J Chen","year":"2021","unstructured":"Chen J, Zhang Y, Hu J. Synergistic effects of instruction and affect factors on high-and low-ability disparities in elementary students\u2019 reading literacy. Read Writ. 2021;34:199\u2013230.","journal-title":"Read Writ"},{"key":"79_CR20","doi-asserted-by":"publisher","first-page":"140731","DOI":"10.1109\/ACCESS.2021.3119596","volume":"9","author":"A Nabil","year":"2021","unstructured":"Nabil A, Seyam M, Abou-Elfetouh A. Prediction of students\u2019 academic performance based on courses\u2019 grades using deep neural networks. IEEE Access. 2021;9:140731\u201346.","journal-title":"IEEE Access"},{"issue":"1","key":"79_CR21","doi-asserted-by":"publisher","first-page":"105","DOI":"10.3390\/su11010105","volume":"11","author":"SMR Abidi","year":"2018","unstructured":"Abidi SMR, Hussain M, Xu Y, Zhang W. Prediction of confusion attempting algebra homework in an intelligent tutoring system through machine learning techniques for educational sustainable development. Sustainability. 2018;11(1):105.","journal-title":"Sustainability"},{"key":"79_CR22","doi-asserted-by":"publisher","first-page":"100066","DOI":"10.1016\/j.caeai.2022.100066","volume":"3","author":"J Niyogisubizo","year":"2022","unstructured":"Niyogisubizo J, Liao L, Nziyumva E, Murwanashyaka E, Nshimyumukiza PC. Predicting student\u2019s dropout in university classes using two-layer ensemble machine learning approach: a novel stacked generalization. Comput Educ Artif Intell. 2022;3:100066.","journal-title":"Comput Educ Artif Intell"},{"issue":"12","key":"79_CR23","doi-asserted-by":"publisher","first-page":"8541","DOI":"10.1021\/acs.est.2c01778","volume":"56","author":"L Zhang","year":"2022","unstructured":"Zhang L, Li X, Chen H, Wu Z, Hu M, Yao M. Haze air pollution health impacts of breath-borne VOCs. Environ Sci Technol. 2022;56(12):8541\u201351.","journal-title":"Environ Sci Technol"},{"key":"79_CR24","doi-asserted-by":"publisher","DOI":"10.1002\/9781118548387","volume-title":"Applied logistic regression","author":"DW Hosmer Jr","year":"2013","unstructured":"Hosmer DW Jr, Lemeshow S, Sturdivant RX. Applied logistic regression, vol. 398. Hoboken: John Wiley & Sons; 2013."},{"issue":"21","key":"79_CR25","doi-asserted-by":"publisher","first-page":"8187","DOI":"10.3390\/ijerph17218187","volume":"17","author":"J Cohen","year":"2020","unstructured":"Cohen J, Wright-Berryman J, Rohlfs L, Wright D, Campbell M, Gingrich D, Pestian J. A feasibility study using a machine learning suicide risk prediction model based on open-ended interview language in adolescent therapy sessions. Int J Environ Res Public Health. 2020;17(21):8187.","journal-title":"Int J Environ Res Public Health"}],"container-title":["Discover Artificial Intelligence"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s44163-023-00079-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s44163-023-00079-z\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s44163-023-00079-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,1,4]],"date-time":"2024-01-04T18:03:36Z","timestamp":1704391416000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s44163-023-00079-z"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,1,4]]},"references-count":25,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2024,12]]}},"alternative-id":["79"],"URL":"https:\/\/doi.org\/10.1007\/s44163-023-00079-z","relation":{},"ISSN":["2731-0809"],"issn-type":[{"value":"2731-0809","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,1,4]]},"assertion":[{"value":"27 June 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"14 September 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"4 January 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"We declare that we have no significant competing financial, professional, or personal interests that might have influenced the performance or presentation of the work described in this manuscript.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"2"}}