{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,10]],"date-time":"2026-04-10T07:33:04Z","timestamp":1775806384216,"version":"3.50.1"},"reference-count":41,"publisher":"MDPI AG","issue":"8","license":[{"start":{"date-parts":[[2023,4,8]],"date-time":"2023-04-08T00:00:00Z","timestamp":1680912000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"SATDAP\u2014Capacita\u00e7\u00e3o da Administra\u00e7\u00e3o P\u00fablica 388","award":["POCI-05-5762-FSE-000191"],"award-info":[{"award-number":["POCI-05-5762-FSE-000191"]}]},{"name":"SATDAP\u2014Capacita\u00e7\u00e3o da Administra\u00e7\u00e3o P\u00fablica 388","award":["UIDB\/05064\/2020"],"award-info":[{"award-number":["UIDB\/05064\/2020"]}]},{"DOI":"10.13039\/501100001871","name":"Portuguese Foundation for Science and Technology","doi-asserted-by":"publisher","award":["POCI-05-5762-FSE-000191"],"award-info":[{"award-number":["POCI-05-5762-FSE-000191"]}],"id":[{"id":"10.13039\/501100001871","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001871","name":"Portuguese Foundation for Science and Technology","doi-asserted-by":"publisher","award":["UIDB\/05064\/2020"],"award-info":[{"award-number":["UIDB\/05064\/2020"]}],"id":[{"id":"10.13039\/501100001871","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Applied Sciences"],"abstract":"<jats:p>The application of intelligent systems in the higher education sector is an active field of research, powered by the abundance of available data and by the urgency to define effective, data-driven strategies to overcome students\u2019 dropout and improve students\u2019 academic performance. This work applies machine learning techniques to develop prediction models that can contribute to the early detection of students at risk of dropping out or not finishing their degree in due time. It also evaluates the best moment for performing the prediction along the student\u2019s enrollment year. The models are built on data of undergraduate students from a Polytechnic University in Portugal, enrolled between 2009 and 2017, comprising academic, social\u2013demographic, and macroeconomic information at three different phases during the first academic year of the students. Five machine learning algorithms are used to train prediction models at each phase, and the most relevant features for the top performing models are identified. Results show that the best models use Random Forest, either incorporating strategies to deal with the imbalanced nature of the data or using such strategies at the data level. The best results are obtained at the end of the first semester, when some information about the academic performance after enrollment is already available. The overall results compare fairly with some similar works that address the early prediction of students\u2019 dropout or academic performance.<\/jats:p>","DOI":"10.3390\/app13084702","type":"journal-article","created":{"date-parts":[[2023,4,10]],"date-time":"2023-04-10T03:26:06Z","timestamp":1681097166000},"page":"4702","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":35,"title":["Multi-Class Phased Prediction of Academic Performance and Dropout in Higher Education"],"prefix":"10.3390","volume":"13","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1358-8638","authenticated-orcid":false,"given":"M\u00f3nica V.","family":"Martins","sequence":"first","affiliation":[{"name":"Polythecnic Institute of Portalegre, 7300-110 Portalegre, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4769-5706","authenticated-orcid":false,"given":"Lu\u00eds","family":"Baptista","sequence":"additional","affiliation":[{"name":"Polythecnic Institute of Portalegre, 7300-110 Portalegre, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0776-1665","authenticated-orcid":false,"given":"Jorge","family":"Machado","sequence":"additional","affiliation":[{"name":"Polythecnic Institute of Portalegre, 7300-110 Portalegre, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6962-3490","authenticated-orcid":false,"given":"Valentim","family":"Realinho","sequence":"additional","affiliation":[{"name":"Polythecnic Institute of Portalegre, 7300-110 Portalegre, Portugal"},{"name":"VALORIZA\u2014Research Center for Endogenous Resource Valorization, 7300-555 Portalegre, Portugal"}]}],"member":"1968","published-online":{"date-parts":[[2023,4,8]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"484","DOI":"10.1080\/07294360.2017.1404557","article-title":"Student psychological distress and degree dropout or completion: A discrete-time, competing risks survival analysis","volume":"37","author":"Cvetkovski","year":"2018","journal-title":"High. Educ. Res. Dev."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"812","DOI":"10.1080\/01425692.2013.816042","article-title":"Interrupted trajectories: The impact of academic failure on the social mobility of working-class students","volume":"34","author":"Byrom","year":"2013","journal-title":"Br. J. Sociol. Educ."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Rastrollo-Guerrero, J.L., G\u00f3mez-Pulido, J.A., and Dur\u00e1n-Dom\u00ednguez, A. (2020). Analyzing and predicting students\u2019 performance by means of machine learning: A review. Appl. Sci., 10.","DOI":"10.3390\/app10031042"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1186\/s41239-020-0177-7","article-title":"Predicting academic success in higher education: Literature review and best practices","volume":"17","author":"Alyahyan","year":"2020","journal-title":"Int. J. Educ. Technol. High. Educ."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"36","DOI":"10.1016\/j.dss.2018.09.001","article-title":"Early segmentation of students according to their academic performance: A predictive modelling approach","volume":"115","author":"Freitas","year":"2018","journal-title":"Decis. Support Syst."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"134","DOI":"10.1016\/j.knosys.2018.07.042","article-title":"Predicting academic performance by considering student heterogeneity","volume":"161","author":"Helal","year":"2018","journal-title":"Knowl.-Based Syst."},{"key":"ref_7","first-page":"711","article-title":"Deep learning with data transformation and factor analysis for student performance prediction","volume":"11","author":"Dien","year":"2020","journal-title":"Int. J. Adv. Comput. Sci. Appl."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"347","DOI":"10.1111\/bjet.12836","article-title":"The potential for student performance prediction in small cohorts with minimal available attributes","volume":"51","author":"Wakelam","year":"2020","journal-title":"Br. J. Educ. Technol."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"67899","DOI":"10.1109\/ACCESS.2020.2986809","article-title":"Comparing Different Resampling Methods in Predicting Students\u2019 Performance Using Machine Learning Techniques","volume":"8","author":"Ghorbani","year":"2020","journal-title":"IEEE Access"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"55462","DOI":"10.1109\/ACCESS.2020.2981905","article-title":"Using data mining techniques to predict student performance to support decision making in university admission systems","volume":"8","author":"Mengash","year":"2020","journal-title":"IEEE Access"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1186\/s40561-022-00192-z","article-title":"Educational data mining: Prediction of students\u2019 academic performance using machine learning algorithms","volume":"9","year":"2022","journal-title":"Smart Learn. Environ."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"206","DOI":"10.25046\/aj040425","article-title":"Predictive modelling of student dropout using ensemble classifier method in higher education","volume":"4","author":"Hutagaol","year":"2019","journal-title":"Adv. Sci. Technol. Eng. Syst."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"28","DOI":"10.1080\/21568235.2020.1718520","article-title":"Predicting student dropout: A machine learning approach","volume":"10","author":"Kemper","year":"2020","journal-title":"Eur. J. High. Educ."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Kabathova, J., and Drlik, M. (2021). Towards predicting student\u2019s dropout in university courses using different machine learning techniques. Appl. Sci., 11.","DOI":"10.3390\/app11073130"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Bottcher, A., Thurner, V., Hafner, T., and Hertle, J. (2021, January 21\u201323). A data science-based approach for identifying counseling needs in first-year students. Proceedings of the IEEE Global Engineering Education Conference, EDUCON, Vienna, Austria.","DOI":"10.1109\/EDUCON46332.2021.9454042"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"133076","DOI":"10.1109\/ACCESS.2021.3115851","article-title":"A real-life machine learning experience for predicting university dropout at different stages using academic data","volume":"9","author":"Preciado","year":"2021","journal-title":"IEEE Access"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"69","DOI":"10.1007\/s10639-020-10260-x","article-title":"A two-phase machine learning approach for predicting student outcomes","volume":"26","author":"Iatrellis","year":"2021","journal-title":"Educ. Inf. Technol."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"987","DOI":"10.1007\/s10115-019-01374-x","article-title":"Identifying at-risk students based on the phased prediction model","volume":"62","author":"Chen","year":"2020","journal-title":"Knowl. Inf. Syst."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"453","DOI":"10.1038\/s41598-021-03867-8","article-title":"Predicting students\u2019 performance in e-learning using learning process and behaviour data","volume":"12","author":"Qiu","year":"2022","journal-title":"Sci. Rep."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3152714","article-title":"Transfer-Learning Methods in Programming Course Outcome Prediction","volume":"4","author":"Lagus","year":"2018","journal-title":"ACM Trans. Comput. Educ."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Nagy, M., and Molontay, R. (2018, January 21\u201323). Predicting Dropout in Higher Education Based on Secondary School Performance. Proceedings of the INES 2018\u2014IEEE 22nd International Conference on Intelligent Engineering Systems, Las Palmas de Gran Canaria, Spain.","DOI":"10.1109\/INES.2018.8523888"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"1048","DOI":"10.1007\/s11162-019-09546-y","article-title":"Predicting University Students\u2019 Academic Success and Major Using Random Forests","volume":"60","author":"Beaulac","year":"2019","journal-title":"Res. High. Educ."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"70","DOI":"10.1186\/s40537-020-00349-y","article-title":"Boosting methods for multi-class imbalanced data classification: An experimental review","volume":"7","author":"Tanha","year":"2020","journal-title":"J. Big Data"},{"key":"ref_24","first-page":"176","article-title":"Classification with class imbalance problem: A review","volume":"7","author":"Ali","year":"2015","journal-title":"Int. J. Adv. Soft Comput. Its Appl."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"321","DOI":"10.1613\/jair.953","article-title":"SMOTE: Synthetic minority over-sampling technique","volume":"16","author":"Chawla","year":"2002","journal-title":"J. Artif. Intell. Res."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"4","DOI":"10.1504\/IJKESDP.2011.039875","article-title":"Borderline over-sampling for imbalanced data classification","volume":"3","author":"Nguyen","year":"2011","journal-title":"Int. J. Knowl. Eng. Soft Data Paradig."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"185","DOI":"10.1109\/TSMCA.2009.2029559","article-title":"RUSBoost: A hybrid approach to alleviating class imbalance","volume":"40","author":"Seiffert","year":"2010","journal-title":"IEEE Trans. Syst. Man Cybern. Part A Systems Humans"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"23","DOI":"10.1007\/3-540-59119-2_166","article-title":"A decision-theoretic generalization of on-line learning and an application to boosting","volume":"904","author":"Freund","year":"1995","journal-title":"Lect. Notes Comput. Sci."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach. Learn."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"81","DOI":"10.1007\/BF00116251","article-title":"Induction of decision trees","volume":"1","author":"Quinlan","year":"1986","journal-title":"Mach. Learn."},{"key":"ref_31","unstructured":"Chen, C., Liaw, A., and Breiman, L. (2004). Using Random Forest to Learn Imbalanced Data, University of California. Technical Report."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"539","DOI":"10.1109\/TSMCB.2008.2007853","article-title":"Exploratory undersampling for class-imbalance learning","volume":"39","author":"Liu","year":"2009","journal-title":"IEEE Trans. Syst. Man Cybern. Part Cybern."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Martins, M.V., Tolledo, D., Machado, J., Baptista, L.M., and Realinho, V. (2021). Early Prediction of Student\u2019s Performance in Higher Education: A Case Study, Springer International Publishing.","DOI":"10.1007\/978-3-030-72657-7_16"},{"key":"ref_34","unstructured":"McKinney, W. (July, January 28). Data Structures for Statistical Computing in Python. Proceedings of the 9th Python in Science Conference, Austin, TX, USA."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"3021","DOI":"10.21105\/joss.03021","article-title":"Seaborn: Statistical data visualization","volume":"6","author":"Waskom","year":"2021","journal-title":"J. Open Source Softw."},{"key":"ref_36","first-page":"2825","article-title":"Scikit-learn: Machine learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J. Mach. Learn. Res."},{"key":"ref_37","first-page":"559","article-title":"Imbalanced-learn: A python toolbox to tackle the curse of imbalanced datasets in machine learning","volume":"18","author":"Nogueira","year":"2017","journal-title":"J. Mach. Learn. Res."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"122","DOI":"10.1186\/s40537-021-00514-x","article-title":"A literature review on one-class classification and its potential applications in big data","volume":"8","author":"Seliya","year":"2021","journal-title":"J. Big Data"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"5453","DOI":"10.1109\/ACCESS.2020.3002791","article-title":"Academic performance prediction based on multisource, multifeature behavioral data","volume":"9","author":"Zhao","year":"2021","journal-title":"IEEE Access"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Gallego, M.G., Perez de los Cobos, A.P., and Gallego, J.C.G. (2021). Identifying Students at Risk to Academic Dropout in Higher Education. Educ. Sci., 11.","DOI":"10.3390\/educsci11080427"},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"105","DOI":"10.1177\/0020720916688484","article-title":"Predicting performance of electrical engineering students using cognitive and non-cognitive features for identification of potential dropouts","volume":"54","author":"Sultana","year":"2017","journal-title":"Int. J. Electr. Eng. Educ."}],"container-title":["Applied Sciences"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2076-3417\/13\/8\/4702\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T19:12:24Z","timestamp":1760123544000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2076-3417\/13\/8\/4702"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,4,8]]},"references-count":41,"journal-issue":{"issue":"8","published-online":{"date-parts":[[2023,4]]}},"alternative-id":["app13084702"],"URL":"https:\/\/doi.org\/10.3390\/app13084702","relation":{},"ISSN":["2076-3417"],"issn-type":[{"value":"2076-3417","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,4,8]]}}}