{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,20]],"date-time":"2026-02-20T19:01:50Z","timestamp":1771614110229,"version":"3.50.1"},"reference-count":28,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2025,8,8]],"date-time":"2025-08-08T00:00:00Z","timestamp":1754611200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Office of Data and Strategic Analytics at Louisiana State University"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["MAKE"],"abstract":"<jats:p>Predicting undergraduate student success is critical for informing timely interventions and improving outcomes in higher education. This study leverages over a decade of historical data from Louisiana State University (LSU) to forecast graduation outcomes using advanced machine learning techniques, with a focus on convolutional autoencoders (CAEs). We detail the data processing and transformation steps, including feature selection and imputation, to construct a robust dataset. The CAE effectively extracts meaningful latent features, validated through low-dimensional t-SNE visualizations that reveal clear clusters based on class labels, differentiating students likely to graduate from those at risk. A two-year gap strategy is introduced to ensure rigorous evaluation and simulate real-world conditions by predicting outcomes on unseen future data. Our results demonstrate the promise of CAE-derived embeddings for dimensionality reduction and computational efficiency, with competitive performance in downstream classification tasks. While models trained on embeddings showed slightly reduced performance compared to raw input data, with accuracies of 83% and 85%, respectively, their compactness and computational efficiency highlight their potential for large-scale analyses. 
The study emphasizes the importance of rigorous preprocessing, feature engineering, and evaluation protocols. By combining these approaches, we provide actionable insights and adaptive modeling strategies to support robust and generalizable predictive systems, enabling educators and administrators to enhance student success initiatives in dynamic educational environments.<\/jats:p>","DOI":"10.3390\/make7030080","type":"journal-article","created":{"date-parts":[[2025,8,8]],"date-time":"2025-08-08T09:56:42Z","timestamp":1754647002000},"page":"80","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Harnessing Large-Scale University Registrar Data for Predictive Insights: A Data-Driven Approach to Forecasting Undergraduate Student Success with Convolutional Autoencoders"],"prefix":"10.3390","volume":"7","author":[{"given":"Mohammad Erfan","family":"Shoorangiz","sequence":"first","affiliation":[{"name":"Department of Mechanical and Industrial Engineering, Louisiana State University, Baton Rouge, LA 70803, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6204-2869","authenticated-orcid":false,"given":"Michal","family":"Brylinski","sequence":"additional","affiliation":[{"name":"Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA"},{"name":"Center for Computation and Technology, Louisiana State University, Baton Rouge, LA 70803, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2025,8,8]]},"reference":[{"key":"ref_1","unstructured":"National Center for Education Statistics (2024, October 01). 
Undergraduate Retention and Graduation Rates, Available online: https:\/\/nces.ed.gov\/programs\/coe\/indicator\/ctr."},{"key":"ref_2","first-page":"5","article-title":"The impact of socioeconomic status on educational attainment: A comprehensive review","volume":"12","author":"Gupta","year":"2024","journal-title":"Int. J. Educ. Res."},{"key":"ref_3","first-page":"1","article-title":"Factors affecting the graduation rates of university students from underrepresented populations","volume":"11","author":"Creighton","year":"2007","journal-title":"Int. Electron. J. Leadersh. Learn."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"19","DOI":"10.14507\/epaa.v12n19.2004","article-title":"Predicting higher education graduation rates from institutional characteristics and resource allocation","volume":"12","author":"Hamrick","year":"2004","journal-title":"Educ. Policy Anal. Arch."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Martinez, L.A.J., Sood, K., and Mahto, R. (2024). Early detection of at-risk students using machine learning. arXiv.","DOI":"10.1007\/978-3-031-85930-4_36"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"742","DOI":"10.1109\/JSTSP.2017.2692560","article-title":"A machine learning approach for tracking and predicting student performance in degree programs","volume":"115","author":"Xu","year":"2017","journal-title":"IEEE J. Sel. Top. Signal Process."},{"key":"ref_7","unstructured":"Pojon, M. (2017). Using Machine Learning to Predict Student Performance. [Master\u2019s Thesis, University of Tampere]."},{"key":"ref_8","first-page":"13247","article-title":"Early prediction of student academic performance based on machine learning algorithms: A case study of bachelor\u2019s degree students in KSA","volume":"29","author":"Algarni","year":"2023","journal-title":"Educ. Inf. Technol."},{"key":"ref_9","unstructured":"Anderson, H., Boodhwani, A., and Baker, R.S. (2019, January 4\u20138). 
Predicting graduation at a public R1 university. Proceedings of the 9th International Learning Analytics and Knowledge Conference, Tempe, AZ, USA."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"23451","DOI":"10.1109\/ACCESS.2024.3361479","article-title":"Predicting university student graduation using academic performance and machine learning: A systematic literature review","volume":"12","author":"Pelima","year":"2024","journal-title":"IEEE Access"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"84","DOI":"10.1145\/3065386","article-title":"ImageNet classification with deep convolutional neural networks","volume":"60","author":"Krizhevsky","year":"2017","journal-title":"Commun. ACM"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"389","DOI":"10.1016\/j.compbiomed.2017.08.022","article-title":"A deep convolutional neural network model to classify heartbeats","volume":"89","author":"Acharya","year":"2017","journal-title":"Comput. Biol. Med."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"e2389","DOI":"10.7717\/peerj-cs.2389","article-title":"Deep learning-based anomaly detection using one-dimensional convolutional neural networks (1D CNN) in machine centers (MCT) and computer numerical control (CNC) machines","volume":"10","author":"Athar","year":"2024","journal-title":"PeerJ Comput. Sci."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"127229","DOI":"10.1109\/ACCESS.2023.3332125","article-title":"One-Dimensional Convolutional Neural Network Model for Local Road Annual Average Daily Traffic Estimation","volume":"11","author":"Mathew","year":"2023","journal-title":"IEEE Access"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"066053","DOI":"10.1088\/1741-2552\/ac4430","article-title":"A 1D CNN for high accuracy classification and transfer learning in motor imagery EEG-based brain-computer interface","volume":"18","author":"Mattioli","year":"2022","journal-title":"J. 
Neural Eng."},{"key":"ref_16","first-page":"73","article-title":"A Review of the F-Measure: Its History, Properties, Criticism, and Alternatives","volume":"56","author":"Christen","year":"2023","journal-title":"ACM Comput. Surv."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"29","DOI":"10.1148\/radiology.143.1.7063747","article-title":"The meaning and use of the area under a receiver operating characteristic (ROC) curve","volume":"143","author":"Hanley","year":"1982","journal-title":"Radiology"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"471","DOI":"10.1093\/aje\/kwf068","article-title":"Geocoding and monitoring of US socioeconomic inequalities in mortality and cancer incidence: Does the choice of area-based measure and geographic level matter? The Public Health Disparities Geocoding Project","volume":"156","author":"Krieger","year":"2002","journal-title":"Am. J. Epidemiol."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"555","DOI":"10.1177\/02537176211046525","article-title":"Z scores, standard scores, and composite test scores explained","volume":"43","author":"Andrade","year":"2021","journal-title":"Indian J. Psychol. Med."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Masci, J., Meier, U., Cire\u015fan, D., and Schmidhuber, J. (2011). Stacked Convolutional Auto-Encoders for Hierarchical Feature Extraction. Artificial Neural Networks and Machine Learning\u2014ICANN 2011, Proceedings of the International Conference on Artificial Neural Networks, Espoo, Finland, 14\u201317 June 2011, Springer.","DOI":"10.1007\/978-3-642-21735-7_7"},{"key":"ref_21","unstructured":"Xu, B., Wang, N., Chen, T., and Li, M. (2015). Empirical evaluation of rectified activations in convolutional network. arXiv."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random Forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach. 
Learn."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"249","DOI":"10.1016\/j.neunet.2018.07.011","article-title":"A systematic study of the class imbalance problem in convolutional neural networks","volume":"106","author":"Buda","year":"2018","journal-title":"Neural Netw."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1109\/TIT.1967.1053964","article-title":"Nearest neighbor pattern classification","volume":"13","author":"Cover","year":"1967","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_25","first-page":"2579","article-title":"Visualizing high-dimensional data using t-SNE","volume":"9","author":"Hinton","year":"2008","journal-title":"J. Mach. Learn. Res."},{"key":"ref_26","first-page":"1625","article-title":"Handling missing values when applying classification models","volume":"8","author":"Provost","year":"2007","journal-title":"J. Mach. Learn. Res."},{"key":"ref_27","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, \u0141., and Polosukhin, I. (2017). Attention Is all you need. arXiv."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"44","DOI":"10.1145\/2523813","article-title":"A survey on concept drift adaptation","volume":"46","author":"Gama","year":"2014","journal-title":"ACM Comput. 
Surv."}],"container-title":["Machine Learning and Knowledge Extraction"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-4990\/7\/3\/80\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T18:26:17Z","timestamp":1760034377000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-4990\/7\/3\/80"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,8,8]]},"references-count":28,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2025,9]]}},"alternative-id":["make7030080"],"URL":"https:\/\/doi.org\/10.3390\/make7030080","relation":{},"ISSN":["2504-4990"],"issn-type":[{"value":"2504-4990","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,8,8]]}}}