{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,22]],"date-time":"2025-10-22T22:25:36Z","timestamp":1761171936821,"version":"build-2065373602"},"reference-count":61,"publisher":"MDPI AG","issue":"10","license":[{"start":{"date-parts":[[2025,10,18]],"date-time":"2025-10-18T00:00:00Z","timestamp":1760745600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>Student dropout remains a persistent challenge in higher education, with substantial personal, institutional, and societal costs. We developed a modular dropout prediction pipeline that couples data preprocessing with multi-model benchmarking and a governance-ready explainability layer. Using 17,883 undergraduate records from a Moroccan higher education institution, we evaluated nine algorithms (logistic regression (LR), decision tree (DT), random forest (RF), k-nearest neighbors (k-NN), support vector machine (SVM), gradient boosting, Extreme Gradient Boosting (XGBoost), Na\u00efve Bayes (NB), and multilayer perceptron (MLP)). On our test set, XGBoost attained an area under the receiver operating characteristic curve (AUC\u2013ROC) of 0.993, F1-score of 0.911, and recall of 0.944. Subgroup reporting supported governance and fairness: across credit\u2013load bins, recall remained high and stable (e.g., &lt;9 credits: precision 0.85, recall 0.932; 9\u201312: 0.886\/0.969; &gt;12: 0.915\/0.936), with full TP\/FP\/FN\/TN provided. A Shapley additive explanations (SHAP)-based layer identified risk and protective factors (e.g., administrative deadlines, cumulative GPA, and passed-course counts), surfaced ambiguous and anomalous cases for human review, and offered case-level diagnostics. 
To assess generalization, we replicated our findings on a public dataset (UCI\u2013Portugal; tables only): XGBoost remained the top-ranked (F1-score 0.792, AUC\u2013ROC 0.922). Overall, boosted ensembles combined with SHAP delivered high accuracy, transparent attribution, and governance-ready outputs, enabling responsible early-warning implementation for student retention.<\/jats:p>","DOI":"10.3390\/a18100662","type":"journal-article","created":{"date-parts":[[2025,10,20]],"date-time":"2025-10-20T12:27:29Z","timestamp":1760963249000},"page":"662","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["A Modular and Explainable Machine Learning Pipeline for Student Dropout Prediction in Higher Education"],"prefix":"10.3390","volume":"18","author":[{"ORCID":"https:\/\/orcid.org\/0009-0007-6951-1254","authenticated-orcid":false,"given":"Abdelkarim","family":"Bettahi","sequence":"first","affiliation":[{"name":"AMIPS Research Team, E3S Research Center, Computer Science Department, Mohammadia School of Engineers, Mohammed V University in Rabat, Avenue Ibn Sina B.P. 765, Rabat 10090, Morocco"}]},{"given":"Fatima-Zahra","family":"Belouadha","sequence":"additional","affiliation":[{"name":"AMIPS Research Team, E3S Research Center, Computer Science Department, Mohammadia School of Engineers, Mohammed V University in Rabat, Avenue Ibn Sina B.P. 765, Rabat 10090, Morocco"}]},{"given":"Hamid","family":"Harroud","sequence":"additional","affiliation":[{"name":"School of Science and Engineering, Al Akhawayn University in Ifrane, Ifrane 53000, Morocco"}]}],"member":"1968","published-online":{"date-parts":[[2025,10,18]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Ara\u00fajo, S.O., Peres, R.S., Ramalho, J.C., Lidon, F., and Barata, J. (2023). Machine Learning Applications in Agriculture: Current Trends, Challenges, and Future Perspectives. 
Agronomy, 13.","DOI":"10.3390\/agronomy13122976"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"An, Q., Rahman, S., Zhou, J., and Kang, J.J. (2023). A Comprehensive Review on Machine Learning in Healthcare Industry: Classification, Restrictions, Opportunities and Challenges. Sensors, 23.","DOI":"10.3390\/s23094178"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"160","DOI":"10.1007\/s42979-021-00592-x","article-title":"Machine Learning: Algorithms, Real-World Applications and Research Directions","volume":"2","author":"Sarker","year":"2021","journal-title":"SN Comput. Sci."},{"key":"ref_4","unstructured":"Correia, M., and Colombini, E.L. (2021). Attention, please! A survey of Neural Attention Models in Deep Learning. arXiv."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3503044","article-title":"A General Survey on Attention Mechanisms in Deep Learning","volume":"55","author":"Brauwers","year":"2023","journal-title":"ACM Comput. Surv."},{"key":"ref_6","first-page":"121107","article-title":"Visual Attention Methods in Deep Learning: An In-Depth Survey","volume":"235","author":"Hassanin","year":"2024","journal-title":"Expert Syst. Appl."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Quimiz-Moreira, M., Delgadillo, R., Parraga-Alava, J., Maculan, N., and Mauricio, D. (2025). Factors, prediction, explainability, and simulating university dropout through machine learning: A systematic review, 2012\u20132024. Computation, 13.","DOI":"10.3390\/computation13080198"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Alghamdi, S., Soh, B., and Li, A. (2025). ISELDP: An enhanced dropout prediction model using a stacked ensemble approach for in-session learning platforms. Electronics, 14.","DOI":"10.3390\/electronics14132568"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Ba\u00f1eres, D., Rodr\u00edguez, M.E., Guerrero-Rold\u00e1n, A.E., and Karadeniz, A. (2020). 
An early warning system to detect at-risk students in online higher education. Appl. Sci., 10.","DOI":"10.3390\/app10134427"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Padmasiri, P., and Kasthuriarachchi, S. (2024, January 3). Interpretable prediction of student dropout using explainable AI models. Proceedings of the International Research Conference on Smart Computing and Systems Engineering (SCSE), Colombo, Sri Lanka.","DOI":"10.1109\/SCSE61872.2024.10550525"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"100203","DOI":"10.1016\/j.teler.2025.100203","article-title":"The integration of explainable AI in educational data mining for student academic performance prediction and support system","volume":"18","author":"Islam","year":"2025","journal-title":"Telemat. Inform. Rep."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Demirt\u00fcrk, B., and Haruno\u011flu, T. (2025). A comparative analysis of different machine learning algorithms developed with hyperparameter optimization in the prediction of student academic success. Appl. Sci., 15.","DOI":"10.3390\/app15115879"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"2","DOI":"10.1007\/s44163-023-00079-z","article-title":"Supervised machine learning algorithms for predicting student dropout and academic success: A comparative study","volume":"4","author":"Villar","year":"2024","journal-title":"Discov. Artif. Intell."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Psyridou, M., Prezja, F., Torppa, M., Lerkkanen, M.-K., Poikkeus, A.-M., and Vasalampi, K. (2024). Machine learning predicts upper secondary education dropout as early as the end of primary school. arXiv.","DOI":"10.1038\/s41598-024-63629-0"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Martinez, A.L.J., Sood, K., and Mahto, R. (2025). Early detection of at-risk students using machine learning. 
Foundations of Computer Science and Frontiers in Education: Computer Science and Computer Engineering, Springer Nature.","DOI":"10.1007\/978-3-031-85930-4_36"},{"key":"ref_16","first-page":"C5MC89","article-title":"Predict Students\u2019 Dropout and Academic Success [Dataset]","volume":"10","author":"Realinho","year":"2021","journal-title":"UCI Mach. Learn. Repos."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"118","DOI":"10.2478\/eurodl-2014-0008","article-title":"Predicting dropout student: An application of data mining methods in an online education program","volume":"17","author":"Yukselturk","year":"2014","journal-title":"Eur. J. Open Distance e-Learn."},{"key":"ref_18","unstructured":"Dekker, G.W., Pechenizkiy, M., and Vleeshouwers, J.M. (2009, July 1\u20133). Predicting students drop out: A case study. Proceedings of the 2nd International Conference on Educational Data Mining (EDM), Cordoba, Spain."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Hassan, M.A., Muse, A.H., and Nadarajah, S. (2024). Predicting student dropout rates using supervised machine learning: Insights from the 2022 national education accessibility survey in Somaliland. Appl. Sci., 14.","DOI":"10.3390\/app14177593"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"El Mahmoudi, A., Chaoui, N.E.H., and Chaoui, H. (2025). Predictive analytics leveraging a machine learning approach to identify students\u2019 reasons for dropping out of university. Appl. Sci., 15.","DOI":"10.3390\/app15158496"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Panizales, W., Lagunzad, H., Ferrer, M., Villanueva, C., Kalaw, D., and Garcia, K. (2025, January 26\u201329). Predicting student dropout rates in online education platforms utilizing Naive Bayes algorithm. 
Proceedings of the 16th International Conference on E-Education, E-Business, E-Management and E-Learning (IC4e), Tokyo, Japan.","DOI":"10.1109\/IC4e65071.2025.11075311"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Kuntintara, W., Warabuntaweesuk, P., and Rattapasakorn, S. (2024, January 23\u201324). Student dropout prediction using machine learning. Proceedings of the 9th International Conference on Business and Industrial Research (ICBIR), Bangkok, Thailand.","DOI":"10.1109\/ICBIR61386.2024.10875840"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Chen, L.-S., Huynh-Cam, T.-T., Nalluri, V., Lu, T.-C., and Agrawal, S. (2024, January 28\u201330). Determining important features for first-year student dropouts using artificial intelligence algorithms. Proceedings of the 6th International Workshop on Artificial Intelligence and Education (WAIE), Tokyo, Japan.","DOI":"10.1109\/WAIE63876.2024.00030"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Kalita, E., Alfarwan, A.M., El Aouifi, H., Kukkar, A., Hussain, S., Ali, T., and Gaftandzhieva, S. (2025). Predicting student academic performance using Bi-LSTM: A deep learning framework with SHAP-based interpretability and statistical validation. Front. Educ., 10.","DOI":"10.3389\/feduc.2025.1581247"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1007\/s13278-025-01446-7","article-title":"Kanformer: An attention-enhanced deep learning model for predicting student performance in virtual learning environments","volume":"15","author":"Alnasyan","year":"2025","journal-title":"Soc. Netw. Anal. Min."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Yang, S., Xiao, Y., and Meng, F. (2024, January 18\u201320). Deep learning-based method for predicting student dropouts in MOOCs. 
Proceedings of the 7th International Conference on Machine Learning and Natural Language Processing (MLNLP), Chengdu, China.","DOI":"10.1109\/MLNLP63328.2024.10800676"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Kakarla, S., and Mahany, D. (2024, January 28\u201330). Implementation of artificial neural networks and machine learning algorithms to evaluate and predict educational student dropout metrics. Proceedings of the 6th International Workshop on Artificial Intelligence and Education (WAIE), Tokyo, Japan.","DOI":"10.1109\/WAIE63876.2024.00026"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Castro, R.Q., Garcia, K.C., and Jarrin, E.P. (2025, January 18\u201320). Predictive modeling and explainability for academic dropout risk detection using machine learning. Proceedings of the Eleventh International Conference on eDemocracy & eGovernment (ICEDEG), Bern, Switzerland.","DOI":"10.1109\/ICEDEG65568.2025.11081590"},{"key":"ref_29","first-page":"105","article-title":"An investigation into dropout indicators in secondary technical education using explainable artificial intelligence","volume":"20","author":"Silva","year":"2025","journal-title":"IEEE Rev. Iberoam. Tecnol. Aprendiz."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"P\u00e9rez, M., Navarrete, D., Baldeon-Calisto, M., Guerrero, Y., and Sarmiento, A. (2025, January 29\u201330). Unlocking student success: Applying machine learning for predicting student dropout in higher education. Proceedings of the 2025 13th International Symposium on Digital Forensics and Security (ISDFS), Antalya, Turkey.","DOI":"10.1109\/ISDFS65363.2025.11012013"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Abdullah, A., Ali, R.H., Koutaly, R., Khan, T.A., and Ahmad, I. (2025, January 18\u201320). Enhancing student retention: Predictive machine learning models for identifying and preventing university dropout. 
Proceedings of the 2025 International Conference on Innovation in Artificial Intelligence and Internet of Things (AIIT), Dubai, United Arab Emirates.","DOI":"10.1109\/AIIT63112.2025.11082926"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Ribeiro, M.T., Singh, S., and Guestrin, C. (2016, August 13\u201317). \u201cWhy should I trust you?\u201d: Explaining the predictions of any classifier. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD \u201916), San Francisco, CA, USA.","DOI":"10.1145\/2939672.2939778"},{"key":"ref_33","unstructured":"Lundberg, S.M., and Lee, S.-I. (2017, December 4\u20139). A unified approach to interpreting model predictions. Proceedings of the 31st International Conference on Neural Information Processing Systems (NeurIPS), Long Beach, CA, USA. Available online: https:\/\/proceedings.neurips.cc\/paper\/2017\/file\/8a20a8621978632d76c43dfd28b67767-Paper.pdf."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"52138","DOI":"10.1109\/ACCESS.2018.2870052","article-title":"Peeking inside the black-box: A survey on explainable artificial intelligence (XAI)","volume":"6","author":"Adadi","year":"2018","journal-title":"IEEE Access"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"601","DOI":"10.1109\/TSMCC.2010.2053532","article-title":"Educational Data Mining: A Review of the State of the Art","volume":"40","author":"Romero","year":"2010","journal-title":"IEEE Trans. Syst. Man Cybern. C Appl. Rev."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Little, R.J.A., and Rubin, D.B. (2019). Statistical Analysis with Missing Data, Wiley. [3rd ed.].","DOI":"10.1002\/9781119482260"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Baker, R.S., and Inventado, P.S. (2014). Educational data mining and learning analytics. 
Learning Analytics, Springer.","DOI":"10.1007\/978-1-4614-3305-7_4"},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"493","DOI":"10.1007\/s40593-015-0048-x","article-title":"Student modeling based on problem solving times","volume":"25","year":"2015","journal-title":"Int. J. Artif. Intell. Educ."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach. Learn."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"81","DOI":"10.1023\/A:1022643204877","article-title":"Induction of decision trees","volume":"1","author":"Quinlan","year":"1986","journal-title":"Mach. Learn."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Chen, T., and Guestrin, C. (2016, August 13\u201317). XGBoost: A scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD \u201916), San Francisco, CA, USA.","DOI":"10.1145\/2939672.2939785"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Hosmer, D.W., and Lemeshow, S. (2000). Introduction to the Logistic Regression Model. Applied Logistic Regression, Wiley. [2nd ed.].","DOI":"10.1002\/0471722146"},{"key":"ref_43","unstructured":"Rish, I. (2001, August 4). An empirical study of the naive Bayes classifier. Proceedings of the IJCAI 2001 Workshop on Empirical Methods in Artificial Intelligence, Seattle, WA, USA."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1109\/TIT.1967.1053964","article-title":"Nearest neighbor pattern classification","volume":"13","author":"Cover","year":"1967","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"273","DOI":"10.1023\/A:1022627411411","article-title":"Support-vector networks","volume":"20","author":"Cortes","year":"1995","journal-title":"Mach. 
Learn."},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"1189","DOI":"10.1214\/aos\/1013203451","article-title":"Greedy function approximation: A gradient boosting machine","volume":"29","author":"Friedman","year":"2001","journal-title":"Ann. Stat."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"386","DOI":"10.1037\/h0042519","article-title":"The perceptron: A probabilistic model for information storage and organization in the brain","volume":"65","author":"Rosenblatt","year":"1958","journal-title":"Psychol. Rev."},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"103676","DOI":"10.1016\/j.compedu.2019.103676","article-title":"An overview and comparison of supervised data mining techniques for student exam performance prediction","volume":"143","author":"Tomasevic","year":"2020","journal-title":"Comput. Educ."},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Dass, S., Gary, K., and Cunningham, J. (2021). Predicting student dropout in self-paced MOOC course using random forest model. Information, 12.","DOI":"10.3390\/info12110476"},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"102474","DOI":"10.1016\/j.techsoc.2024.102474","article-title":"Predicting student dropouts with machine learning: An empirical study in Finnish higher education","volume":"76","author":"Vaarma","year":"2024","journal-title":"Technol. Soc."},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"436","DOI":"10.1038\/nature14539","article-title":"Deep learning","volume":"521","author":"LeCun","year":"2015","journal-title":"Nature"},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Carballo-Mend\u00edvil, B., Arellano-Gonz\u00e1lez, A., R\u00edos-V\u00e1zquez, N.J., and Lizardi-Duarte, M.P. (2025). Predicting student dropout from day one: XGBoost-based early warning system using pre-enrollment data. Appl. 
Sci., 15.","DOI":"10.3390\/app15169202"},{"key":"ref_53","doi-asserted-by":"crossref","first-page":"887","DOI":"10.35631\/IJMOE.724064","article-title":"Ensemble learning in educational data analysis for improved prediction of student performance: A literature review","volume":"7","author":"Shir","year":"2025","journal-title":"Int. J. Mod. Educ."},{"key":"ref_54","doi-asserted-by":"crossref","unstructured":"Mojumder Anik, M.S., Israt Zerin, S., Ayman, U., Muntakim, M.A., Atif Asif Khan Akash, M., and Imam Bijoy, M.H. (2025, January 27\u201329). Dropout prediction of university students in Bangladesh using machine learning technique. Proceedings of the 2025 International Conference on Electrical, Computer and Communication Engineering (ECCE), Cox\u2019s Bazar, Bangladesh.","DOI":"10.1109\/ECCE64574.2025.11012934"},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"Davis, J., and Goadrich, M. (2006, June 25\u201329). The relationship between precision\u2013recall and ROC curves. Proceedings of the 23rd International Conference on Machine Learning (ICML \u201906), Pittsburgh, PA, USA.","DOI":"10.1145\/1143844.1143874"},{"key":"ref_56","doi-asserted-by":"crossref","first-page":"e0118432","DOI":"10.1371\/journal.pone.0118432","article-title":"The precision\u2013recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets","volume":"10","author":"Saito","year":"2015","journal-title":"PLoS ONE"},{"key":"ref_57","doi-asserted-by":"crossref","first-page":"274","DOI":"10.1007\/s40593-023-00331-8","article-title":"Interpretable dropout prediction: Towards XAI-based personalized intervention","volume":"34","author":"Nagy","year":"2024","journal-title":"Int. J. Artif. Intell. Educ."},{"key":"ref_58","doi-asserted-by":"crossref","unstructured":"Wachter, S., Mittelstadt, B., and Russell, C. (2018). Counterfactual explanations without opening the black box: Automated decisions and the GDPR. 
arXiv.","DOI":"10.2139\/ssrn.3063289"},{"key":"ref_59","unstructured":"Hardt, M., Price, E., and Srebro, N. (2016). Equality of opportunity in supervised learning. arXiv."},{"key":"ref_60","doi-asserted-by":"crossref","first-page":"206","DOI":"10.1038\/s42256-019-0048-x","article-title":"Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead","volume":"1","author":"Rudin","year":"2019","journal-title":"Nat. Mach. Intell."},{"key":"ref_61","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1145\/3236386.3241340","article-title":"The mythos of model interpretability: In machine learning, the concept of interpretability is both important and slippery","volume":"16","author":"Lipton","year":"2018","journal-title":"Queue"}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/18\/10\/662\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,22]],"date-time":"2025-10-22T04:20:54Z","timestamp":1761106854000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/18\/10\/662"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,10,18]]},"references-count":61,"journal-issue":{"issue":"10","published-online":{"date-parts":[[2025,10]]}},"alternative-id":["a18100662"],"URL":"https:\/\/doi.org\/10.3390\/a18100662","relation":{},"ISSN":["1999-4893"],"issn-type":[{"type":"electronic","value":"1999-4893"}],"subject":[],"published":{"date-parts":[[2025,10,18]]}}}