{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,28]],"date-time":"2026-05-28T00:42:16Z","timestamp":1779928936841,"version":"3.53.1"},"reference-count":52,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2021,4,20]],"date-time":"2021-04-20T00:00:00Z","timestamp":1618876800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>Data mining is employed to extract useful information and to detect patterns from often large data sets, closely related to knowledge discovery in databases and data science. In this investigation, we formulate models based on machine learning algorithms to extract relevant information predicting student retention at various levels, using higher education data and specifying the relevant variables involved in the modeling. Then, we utilize this information to help the process of knowledge discovery. We predict student retention at each of three levels during their first, second, and third years of study, obtaining models with an accuracy that exceeds 80% in all scenarios. These models allow us to adequately predict the level when dropout occurs. Among the machine learning algorithms used in this work are: decision trees, k-nearest neighbors, logistic regression, naive Bayes, random forest, and support vector machines, of which the random forest technique performs the best. We detect that secondary educational score and the community poverty index are important predictive variables, which have not been previously reported in educational studies of this type. The dropout assessment at various levels reported here is valid for higher education institutions around the world with similar conditions to the Chilean case, where dropout rates affect the efficiency of such institutions. Having the ability to predict dropout based on student\u2019s data enables these institutions to take preventative measures, avoiding the dropouts. In the case study, balancing the majority and minority classes improves the performance of the algorithms.<\/jats:p>","DOI":"10.3390\/e23040485","type":"journal-article","created":{"date-parts":[[2021,4,20]],"date-time":"2021-04-20T01:41:25Z","timestamp":1618882885000},"page":"485","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":124,"title":["Knowledge Discovery for Higher Education Student Retention Based on Data Mining: Machine Learning Algorithms and Case Study in Chile"],"prefix":"10.3390","volume":"23","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2803-3579","authenticated-orcid":false,"given":"Carlos A.","family":"Palacios","sequence":"first","affiliation":[{"name":"Departamento de Obras Civiles, Universidad Cat\u00f3lica del Maule, Talca 3480112, Chile"},{"name":"Programa de Mag\u00edster en Gesti\u00f3n de Operaciones, Facultad de Ingenier\u00eda, Universidad de Talca, Curic\u00f3 3344158, Chile"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3506-3787","authenticated-orcid":false,"given":"Jos\u00e9 A.","family":"Reyes-Su\u00e1rez","sequence":"additional","affiliation":[{"name":"Departamento de Bioinform\u00e1tica, Facultad de Ingenier\u00eda, Universidad de Talca, Talca 3460000, Chile"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9355-1308","authenticated-orcid":false,"given":"Lorena A.","family":"Bearzotti","sequence":"additional","affiliation":[{"name":"Escuela de Ingenier\u00eda en Transporte, Pontificia Universidad Cat\u00f3lica de Valpara\u00edso, Valpara\u00edso 2362807, Chile"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4755-3270","authenticated-orcid":false,"given":"V\u00edctor","family":"Leiva","sequence":"additional","affiliation":[{"name":"Escuela de Ingenier\u00eda Industrial, Pontificia Universidad Cat\u00f3lica de Valpara\u00edso, Valpara\u00edso 2362807, Chile"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1832-4444","authenticated-orcid":false,"given":"Carolina","family":"Marchant","sequence":"additional","affiliation":[{"name":"Facultad de Ciencias B\u00e1sicas, Universidad Cat\u00f3lica del Maule, Talca 3480112, Chile"},{"name":"ANID-Millennium Science Initiative Program-Millennium Nucleus Center for the Discovery of Structures in Complex Data, Santiago 7820244, Chile"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2021,4,20]]},"reference":[{"key":"ref_1","unstructured":"Berry, M., and Linoff, G. (1997). Big Data, Data Mining, and Machine Learning: Value Creation for Business Leaders and Practitioners, Wiley."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"221","DOI":"10.1016\/j.techfore.2019.01.005","article-title":"Recent developments of control charts, identification of big data sources and future trends of current research","volume":"144","author":"Aykroyd","year":"2019","journal-title":"Technol. Forecast. Soc. Chang."},{"key":"ref_3","first-page":"37","article-title":"From data mining to knowledge discovery in databases","volume":"17","author":"Fayyad","year":"1996","journal-title":"AI Mag."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Adhikari, A., and Adhikari, J. (2015). Advances in Knowledge Discovery in Databases, Springer.","DOI":"10.1007\/978-3-319-13212-9"},{"key":"ref_5","unstructured":"Tan, P., Steinbach, M., Karpatne, A., and Kumar, V. (2018). Introduction to Data Mining, Pearson Education."},{"key":"ref_6","unstructured":"Hastie, T., and Tibshirani, R. (2016). The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"498","DOI":"10.1016\/j.dss.2010.06.003","article-title":"A comparative analysis of machine learning techniques for student retention management","volume":"49","author":"Delen","year":"2010","journal-title":"Decis. Support Syst."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"1150","DOI":"10.1016\/j.dss.2012.10.040","article-title":"A comparative analysis of machine learning systems for measuring the impact of knowledge management practices","volume":"54","author":"Delen","year":"2013","journal-title":"Decis. Support Syst."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"613","DOI":"10.1007\/s11162-006-9009-4","article-title":"Institutional selectivity and institutional expenditures: Examining organizational factors that contribute to retention and graduation","volume":"47","author":"Schuh","year":"2006","journal-title":"Res. High. Educ."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Hooshyar, D., Pedaste, M., and Yang, Y. (2020). Mining educational data to predict students\u2019 performance through procrastination behavior. Entropy, 22.","DOI":"10.3390\/e22010012"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Qu, S., Li, K., Wu, B., Zhang, X., and Zhu, K. (2019). Predicting student performance and deficiency in mastering knowledge points in MOOCs using multi-task learning. Entropy, 21.","DOI":"10.3390\/e21121216"},{"key":"ref_12","unstructured":"Aguayo, I., and G\u00f3mez, G. (2011). Evolution in the Number of Enrollments in the Higher Education System, 1983\u20132010, Chilean Higher Education Information System. (In Spanish)."},{"key":"ref_13","unstructured":"SIES (2018). Registered Ration Report in Higher Education in Chile, Chilean Ministry of Education. (In Spanish)."},{"key":"ref_14","unstructured":"MINEDUC (2012). Dropout in Higher Education in Chile, Chilean Ministry of Education (MINEDUC). (In Spanish)."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"537","DOI":"10.1007\/s10639-017-9616-z","article-title":"Educational data mining applications and tasks: A survey of the last 10 years","volume":"23","author":"Bakhshinategh","year":"2018","journal-title":"Educ. Inf. Technol."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"89","DOI":"10.3102\/00346543045001089","article-title":"Dropout of higher education: A theoretical synthesis of recent research","volume":"45","author":"Tinto","year":"1975","journal-title":"Rev. Educ. Res."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"91","DOI":"10.31619\/caledu.n17.409","article-title":"Models of analysis of student desertion in higher education","volume":"17","author":"Himmel","year":"2002","journal-title":"Calid. Educ."},{"key":"ref_18","unstructured":"McGaw, B., Peterson, P., and Baker, E. (2010). Data mining for education. International Encyclopedia of Education, Elsevier."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"135","DOI":"10.1016\/j.eswa.2006.04.005","article-title":"Educational data mining: A survey from 1995 to 2005","volume":"33","author":"Romero","year":"2007","journal-title":"Expert Syst. Appl."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"12","DOI":"10.1002\/widm.1075","article-title":"Data mining in education","volume":"3","author":"Romero","year":"2013","journal-title":"Data Min. Knowl. Discov."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"601","DOI":"10.1109\/TSMCC.2010.2053532","article-title":"Educational data mining: A review of the state of the art","volume":"40","author":"Romero","year":"2010","journal-title":"IEE Trans. Syst. Man Cybern. Part Appl. Rev."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1007\/978-3-319-02738-8_1","article-title":"Which contribution does EDM provide to computer-based learning environments?","volume":"524","author":"Bousbia","year":"2014","journal-title":"Stud. Comput. Intell."},{"key":"ref_23","unstructured":"Dekker, G.W., Pechenizkiy, M., and Vleeshouwers, J.M. (2009, January 1\u20133). Predicting students dropout: A case study. Proceedings of the Second International Working Group on Educational Data Mining, Cordoba, Spain."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"950","DOI":"10.1016\/j.compedu.2009.05.010","article-title":"Dropout prediction in e-learning courses through the combination of machine learning techniques","volume":"53","author":"Lykourentzou","year":"2009","journal-title":"Comput. Educ."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"14984","DOI":"10.1016\/j.eswa.2011.05.048","article-title":"Learning patterns of university student retention","volume":"38","author":"Nandeshwar","year":"2011","journal-title":"Expert Syst. Appl."},{"key":"ref_26","unstructured":"Fischer-Angulo, E.S. (2012). Model for the Automation of the Process of Determining the Risk of Desertion in University Students. [Master\u2019s Thesis, Universidad de Chile]. (In Spanish)."},{"key":"ref_27","first-page":"44","article-title":"Deep learning approach for predicting university dropout: A case study at Roma Tre University","volume":"16","author":"Agrusti","year":"2020","journal-title":"J. E-Learn. Knowl. Soc."},{"key":"ref_28","first-page":"743","article-title":"Early prediction of university dropouts\u2014A random forest approach","volume":"240","author":"Behr","year":"2020","journal-title":"J. Econ. Stat."},{"key":"ref_29","unstructured":"Bogard, M., Helbig, T., Huff, G., and James, C. (2014, June 19). A Comparison of Empirical Models for Predicting Student Retention. Working Paper. Available online: https:\/\/www.wku.edu\/instres\/documents\/comparison_of_empirical_models.pdf."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Bittencourt, I., Cukurova, M., Muldner, K., Luckin, R., and Mill\u00e1n, E. (2020). Student Dropout Prediction. Artificial Intelligence in Education, Springer.","DOI":"10.1007\/978-3-030-52240-7"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Pena-Ayala, A. (2014). Modeling student performance in higher education using data mining. Educational Data Mining, Springer.","DOI":"10.1007\/978-3-319-02738-8"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"497","DOI":"10.1007\/s13042-015-0341-x","article-title":"Improved student dropout prediction in Thai University using ensemble of mixed-type data clusterings","volume":"8","author":"Boongoen","year":"2017","journal-title":"Int. J. Mach. Learn. Cybern."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Lauria, E., Baron, J., Devireddy, M., Sundararaju, V., and Jayaprakash, S. (2012). Mining academic data to improve college student retention: An open source perspective. Proceedings of the 2nd International Conference on Learning Analytics and Knowledge, ACM.","DOI":"10.1145\/2330601.2330637"},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"241","DOI":"10.3926\/jotse.922","article-title":"Predicting computer engineering students dropout in cuban higher education with pre-enrollment and early performance data","volume":"10","author":"Callejas","year":"2020","journal-title":"J. Technol. Sci. Educ."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Manh\u00e3es, L.M., da Cruz, S.M., and Zimbrao, G. (2014). Wave: An architecture for predicting dropout in undergraduate courses using edm. Proceedings of the 29th Annual ACM Symposium on Applied Computing, ACM.","DOI":"10.1145\/2554850.2555135"},{"key":"ref_36","unstructured":"Mellalieu, P. (August, January 31). Predicting success, excellence, and retention from students early course performance: Progress results from a data-mining-based decision support system in a first year tertiary education program. Proceedings of the International Conference of the International Council for Higher Education, Miami, FL, USA."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"563","DOI":"10.2190\/CS.16.4.e","article-title":"Exploring student characteristics of retention that lead to graduation in higher education using data mining models","volume":"16","author":"Raju","year":"2015","journal-title":"J. Coll. Stud. Retention: Res. Theory Pract."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Rodr\u00edguez-Mu\u00f1iz, L.J., Bernardo, A.B., Esteban, M., and D\u00edaz, I. (2019). Dropout and transfer paths: What are the risky profiles when analyzing university persistence with machine learning techniques?. PLoS ONE, 14.","DOI":"10.1371\/journal.pone.0218796"},{"key":"ref_39","first-page":"33","article-title":"Data mining: Prediction of school desertion using the algorithm of decision trees and the algorithm of the nearest k neighbors","volume":"779","author":"Valero","year":"2005","journal-title":"Ene"},{"key":"ref_40","first-page":"113","article-title":"Mining education data to predict students retention: A comparative study","volume":"10","author":"Yadav","year":"2012","journal-title":"Int. J. Comput. Sci. Inf. Secur."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"307","DOI":"10.6339\/JDS.2010.08(2).574","article-title":"A data mining approach for identifying predictors of student retention from sophomore to junior year","volume":"8","author":"Yu","year":"2010","journal-title":"J. Data Sci."},{"key":"ref_42","unstructured":"Guti\u00e9rrez-Salazar, H. (2010). Proposed Extension of Kohonen Self-Organized Maps Using Fuzzy Logic to Be Used in Data Mining, a Practical Case. [Master\u2019s Thesis, Universidad Cat\u00f3lica del Maule]. (In Spanish)."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Roy, R. (2001). Building the KDD Roadmap. Industrial Knowledge Management, Springer.","DOI":"10.1007\/978-1-4471-0351-6"},{"key":"ref_44","unstructured":"Olson, D.L., and Delen, D. (2008). Advanced Data Mining Techniques, Springer."},{"key":"ref_45","unstructured":"Yang, Y., and Pedersen, J. (1997, January 8\u201312). A comparative study on feature selection in text categorization. Proceedings of the Fourteenth International Conference on Machine Learning, San Francisco, CA, USA."},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"156","DOI":"10.19101\/IJACR.2018.839045","article-title":"Machine learning approach for reducing students dropout rates","volume":"9","author":"Mduma","year":"2019","journal-title":"Int. J. Adv. Comput. Res."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"861","DOI":"10.1016\/j.patrec.2005.10.010","article-title":"An introduction to ROC analysis","volume":"27","author":"Fawcett","year":"2006","journal-title":"Pattern Recognit. Lett."},{"key":"ref_48","unstructured":"Daniel, W.W. (1990). Applied Nonparametric Statistics, PWS-Kent Pulisher."},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Rodriguez-Fdez, I., Canosa, A., Mucientes, M., and Bugarin, A. (2015, January 2\u20135). STAC: A web platform for the comparison of algorithms using statistical tests. Proceedings of the 2015 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), Istanbul, Turkey.","DOI":"10.1109\/FUZZ-IEEE.2015.7337889"},{"key":"ref_50","unstructured":"Social Observatory (2013). Incidence of Poverty at the Community Level, According to Estimation Methodology for Small Areas, Chile 2009 and 2011, Ministry of Social Development. (In Spanish)."},{"key":"ref_51","unstructured":"Arrau, F., and Loiseau, V. (2003). Dropout in Higher Education in Chile, Library of the National Congress of Chile. (In Spanish)."},{"key":"ref_52","doi-asserted-by":"crossref","first-page":"321","DOI":"10.1613\/jair.953","article-title":"Smote: Synthetic minority over-sampling technique","volume":"16","author":"Chawla","year":"2002","journal-title":"J. Artif. Intell. Res."}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/23\/4\/485\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T05:49:59Z","timestamp":1760161799000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/23\/4\/485"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,4,20]]},"references-count":52,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2021,4]]}},"alternative-id":["e23040485"],"URL":"https:\/\/doi.org\/10.3390\/e23040485","relation":{},"ISSN":["1099-4300"],"issn-type":[{"value":"1099-4300","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,4,20]]}}}