{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,30]],"date-time":"2026-03-30T19:13:44Z","timestamp":1774898024919,"version":"3.50.1"},"reference-count":91,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2023,2,27]],"date-time":"2023-02-27T00:00:00Z","timestamp":1677456000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Canada\u2019s International Development Research Centre, Ottawa, Canada and the Swedish International Development Cooperation Agency","award":["109704-001\/002"],"award-info":[{"award-number":["109704-001\/002"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Data"],"abstract":"<jats:p>Predicting student dropout is a challenging problem in the education sector. This is due to an imbalance in student dropout data, mainly because the number of registered students is always higher than the number of dropout students. Developing a model without taking the data imbalance issue into account may lead to an ungeneralized model. In this study, different data balancing techniques were applied to improve prediction accuracy in the minority class while maintaining a satisfactory overall classification performance. Random Over Sampling, Random Under Sampling, Synthetic Minority Over Sampling, SMOTE with Edited Nearest Neighbor and SMOTE with Tomek links were tested, along with three popular classification models: Logistic Regression, Random Forest, and Multi-Layer Perceptron. Publicly accessible datasets from Tanzania and India were used to evaluate the effectiveness of balancing techniques and prediction models. The results indicate that SMOTE with Edited Nearest Neighbor achieved the best classification performance on the 10-fold holdout sample. 
Furthermore, Logistic Regression correctly classified the largest number of dropout students (57348 for the Uwezo dataset and 13430 for the India dataset) using the confusion matrix as the evaluation matrix. The applications of these models allow for the precise prediction of at-risk students and the reduction of dropout rates.<\/jats:p>","DOI":"10.3390\/data8030049","type":"journal-article","created":{"date-parts":[[2023,2,27]],"date-time":"2023-02-27T03:29:48Z","timestamp":1677468588000},"page":"49","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":39,"title":["Data Balancing Techniques for Predicting Student Dropout Using Machine Learning"],"prefix":"10.3390","volume":"8","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4364-3124","authenticated-orcid":false,"given":"Neema","family":"Mduma","sequence":"first","affiliation":[{"name":"Department of Information and Communication Sciences and Engineering, The Nelson Mandela African Institution of Science and Technology, Arusha P.O. Box 447, Tanzania"}]}],"member":"1968","published-online":{"date-parts":[[2023,2,27]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"13","DOI":"10.1093\/bib\/bbs006","article-title":"Class-imbalanced classifiers for high-dimensional data","volume":"14","author":"Lin","year":"2013","journal-title":"Brief. Bioinform."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"113","DOI":"10.1016\/j.ins.2013.07.007","article-title":"An insight into classification with imbalanced data: Empirical results and current trends on using data intrinsic characteristics","volume":"250","author":"Palade","year":"2013","journal-title":"Inf. Sci."},{"key":"ref_3","unstructured":"Krawczyk, B. (2015). 
Proceedings of the 9th International Conference on Computer Recognition Systems CORES 2015, Springer International Publishing."},{"key":"ref_4","unstructured":"Galar, M., Fern\u00e1ndez, A., Barrenechea, E., Bustince, H., and Herrera, F. (2016). Proceedings of the 9th International Conference on Computer Recognition Systems CORES 2015, Springer International Publishing."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"221","DOI":"10.1007\/s13748-016-0094-0","article-title":"Learning from imbalanced data: Open challenges and future directions","volume":"5","author":"Krawczyk","year":"2016","journal-title":"Prog. Artif. Intell."},{"key":"ref_6","unstructured":"Borowska, K., and Topczewska, M. (2016). Proceedings of the 9th International Conference on Computer Recognition Systems CORES 2015, Springer International Publishing."},{"key":"ref_7","unstructured":"Mazumder, R.U., Begum, S.A., and Biswas, D. (2015). Proceedings of Fourth International Conference on Soft Computing for Problem Solving, Springer."},{"key":"ref_8","unstructured":"Abdi, L., and Hashemi, S. (2014). Proceedings of the Third International Conference on Soft Computing for Problem Solving, Springer."},{"key":"ref_9","first-page":"338","article-title":"A Survey on Methods to Handle Imbalance Dataset","volume":"4","author":"Sonak","year":"2015","journal-title":"Int. J. Comput. Sci. Mob. Comput."},{"key":"ref_10","first-page":"1552","article-title":"Imbalance class problems in data mining: A review","volume":"14","author":"Ali","year":"2019","journal-title":"Indones. J. Electr. Eng. Comput. Sci."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Realinho, V., Machado, J., Baptista, L., and Martins, M.V. (2022). Predicting Student Dropout and Academic Success. 
Data, 7.","DOI":"10.3390\/data7110146"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"321","DOI":"10.1016\/j.eswa.2013.07.046","article-title":"A critical assessment of imbalanced class distribution problem: The case of predicting freshmen student attrition","volume":"41","author":"Thammasiri","year":"2013","journal-title":"Expert Syst. Appl."},{"key":"ref_13","unstructured":"UNESCO (2017). Estimation of the Numbers and Rates of Out-of-school Children and Adolescents Using Administrative and Household Survey Data, UNESCO Institute for Statistics."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Valles-coral, M.A., Salazar-ram, L., Injante, R., Hernandez-torres, E.A., Ju, J., Navarro-cabrera, J.R., Pinedo, L., and Vidaurre-rojas, P. (2022). Density-Based Unsupervised Learning Algorithm to Categorize College Students into Dropout Risk Levels. Data, 7.","DOI":"10.3390\/data7110165"},{"key":"ref_15","unstructured":"Mduma, N. (2020). Data Driven Approach for Predicting Student Dropout in Secondary Schools. [Ph.D. Thesis, NM-AIST]."},{"key":"ref_16","unstructured":"Gao, T. (2015). Hybrid Classification Approach of SMOTE and Instance Selection for Imbalanced Datasets. [Ph.D. Thesis, Iowa State University]."},{"key":"ref_17","unstructured":"Hoens, T.R., and Chawla, N.V. (2013). Imbalanced Learning: Foundations, Algorithms, and Applications, John Wiley & Sons, Inc."},{"key":"ref_18","first-page":"1","article-title":"Classification of Imbalance Data using Tomek Link (T-Link) Combined with Random Under-sampling (RUS) as a Data Reduction Method","volume":"1","author":"Elhassan","year":"2016","journal-title":"J. Inform. Data Min."},{"key":"ref_19","unstructured":"Santoso, B., Wijayanto, H., Notodiputro, K.A., and Sartono, B. (2017). 
IOP Conference Series: Earth and Environmental Science, IOP Publishing."},{"key":"ref_20","first-page":"7","article-title":"Influence of minority class instance types on SMOTE imbalanced data oversampling","volume":"74","author":"Skryjomski","year":"2017","journal-title":"Proc. Mach. Learn. Res."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Yu, X., Zhou, M., Chen, X., Deng, L., and Wang, L. (2017). Using Class Imbalance Learning for Cross-Company Defect Prediction. Int. Conf. Softw. Eng. Knowl. Eng., 117\u2013122.","DOI":"10.18293\/SEKE2017-035"},{"key":"ref_22","unstructured":"Douzas, G., and Bacao, F. (2017). Geometric SMOTE: Effective oversampling for imbalanced learning through a geometric extension of SMOTE. arXiv."},{"key":"ref_23","unstructured":"Shilbayeh, S.A. (2015). Cost Sensitive Meta Learning, School of Computing, Science and Engineering, University of Salford."},{"key":"ref_24","first-page":"8","article-title":"Literature Survey on Educational Dropout Prediction","volume":"7","author":"Kumar","year":"2017","journal-title":"Int. J. Educ. Manag. Eng."},{"key":"ref_25","first-page":"225","article-title":"Predicting Students\u2019 Dropout at University Using Artificial Neural Networks","volume":"7","author":"Siri","year":"2015","journal-title":"Ital. J. Sociol. Educ."},{"key":"ref_26","unstructured":"Oancea, B., Dragoescu, R., and Ciucu, S. (2013, January 23\u201325). Predicting Students\u2019 Results in Higher Education Using Neural Networks. Proceedings of the International Conference on Applied Information and Communication Technologies, Baku, Azerbaijan."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"1157","DOI":"10.21917\/ijsc.2016.0161","article-title":"Enhanced Prediction of Student Dropouts Using Fuzzy Inference System and Logistic Regression","volume":"6","author":"Saranya","year":"2016","journal-title":"ICTACT J. 
Soft Comput."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Fei, M., and Yeung, D.Y. (2015, January 14\u201317). Temporal Models for Predicting Student Dropout in Massive Open Online Courses. Proceedings of the 2015 IEEE International Conference on Data Mining Workshop (ICDMW), Atlantic City, NJ, USA.","DOI":"10.1109\/ICDMW.2015.174"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"1481","DOI":"10.1016\/j.sbspro.2015.02.296","article-title":"A Recommender for Improving the Student Academic Performance","volume":"180","author":"Goga","year":"2015","journal-title":"Procedia Soc. Behav. Sci."},{"key":"ref_30","first-page":"166","article-title":"Exploiting Academic Records for Predicting Student Drop Out: A case study in Brazilian higher education","volume":"7","author":"Sales","year":"2016","journal-title":"J. Inf. Data Manag."},{"key":"ref_31","unstructured":"Nagrecha, S., Dillon, J.Z., and Chawla, N.V. (2017). Proceedings of the 26th International Conference on World Wide Web Companion, ACM."},{"key":"ref_32","unstructured":"Aulck, L., Velagapudi, N., Blumenstock, J., and West, J. (2017). Predicting Student Dropout in Higher Education. ICML Workshop on #Data4Good: Machine Learning in Social Good Applications 2016. arXiv, 16\u201320."},{"key":"ref_33","unstructured":"Halland, R., Igel, C., and Alstrup, S. (2015, January 22\u201323). High-School Dropout Prediction Using Machine Learning: A Danish Large-scale Study. Proceedings of the 23rd European Symposium on Artificial Neural Networks, Bruges, Belgium."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"28","DOI":"10.1080\/21568235.2020.1718520","article-title":"Predicting student dropout: A machine learning approach","volume":"10","author":"Kemper","year":"2020","journal-title":"Eur. J. High. 
Educ."},{"key":"ref_35","first-page":"186332","article-title":"Determinant Factors for Undergraduate Student\u2019s Dropout in Accounting Studies Department of A Brazilian Public University","volume":"34","year":"2018","journal-title":"Fed. Univ. Minas Gerais"},{"key":"ref_36","unstructured":"Nath, S.R., Ferris, D., Kabir, M.M., Chowdhury, T., and Hossain, A. (2017). Transition and Dropout in Lower Income Countries: Case Studies of Secondary Education in Bangladesh and Uganda. World Innov. Summit Educ., Available online: https:\/\/www.wise-qatar.org\/app\/uploads\/2019\/04\/rr.3.2017_brac.pdf."},{"key":"ref_37","unstructured":"Wang, X., and Schneider, H. (2018). A Study of Modelling Approaches for Predicting Dropout in a Business College, Louisiana State University."},{"key":"ref_38","first-page":"1","article-title":"An Analysis of Dropout Predictors within a State High School Graduation Panel","volume":"5","author":"Franklin","year":"2014","journal-title":"Schooling"},{"key":"ref_39","first-page":"249","article-title":"Analytical and experimental investigation of steel friction dampers and horizontal brake pads in chevron frames under cyclic loads","volume":"15","author":"Helou","year":"2018","journal-title":"Issues Inf. Sci. Inf. Technol. Educ."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Aguiar, E., Dame, N., Miller, D., Yuhas, B., and Addison, K.L. (2015). Who, When, and Why: A Machine Learning Approach to Prioritizing Students at Risk of not Graduating High School on Time. ACM, 93\u2013102.","DOI":"10.1145\/2723576.2723619"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Rovira, S., Puertas, E., and Igual, L. (2017). Data-driven System to Predict Academic Grades and Dropout. PLoS ONE, 12.","DOI":"10.1371\/journal.pone.0171207"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Mgala, M., and Mbogho, A. (2015, January 15\u201318). Data-driven Intervention-level Prediction Modeling for Academic Performance. 
Proceedings of the Seventh International Conference on Information and Communication Technologies and Development, Singapore.","DOI":"10.1145\/2737856.2738012"},{"key":"ref_43","first-page":"1","article-title":"Multi-layer Perceptron and Pruning","volume":"1","author":"Voyant","year":"2017","journal-title":"Turk. J. Forecast."},{"key":"ref_44","first-page":"26","article-title":"Multilayer Perceptron: Architecture Optimization and Training","volume":"4","author":"Ramchoun","year":"2016","journal-title":"Int. J. Interact. Multimed. Artif. Intell."},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"1425","DOI":"10.48084\/etasr.936","article-title":"Comparison of Multilayer Perceptron and Radial Basis Function Neural Networks in Predicting the Success of New Product Development","volume":"7","author":"Fesghandis","year":"2017","journal-title":"Eng. Technol. Appl. Sci. Res."},{"key":"ref_46","first-page":"353","article-title":"Advancements in Multi-Layer Perceptron Training to Improve Classification","volume":"5","author":"Rani","year":"2017","journal-title":"Int. J. Recent Innov. Trends Comput. Commun."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"1325","DOI":"10.1007\/s12040-015-0602-9","article-title":"Multilayer perceptron neural network for downscaling rainfall in arid region: A case study of Baluchistan, Pakistan","volume":"124","author":"Ahmed","year":"2015","journal-title":"J. Earth Syst. Sci."},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"1529","DOI":"10.3390\/rs70201529","article-title":"Multilayer perceptron neural networks model for meteosat second generation SEVIRI daytime cloud masking","volume":"7","author":"Taravat","year":"2015","journal-title":"Remote Sens."},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Wu, Z., Lin, W., Zhang, Z., Wen, A., and Lin, L. (2017, January 21\u201324). An Ensemble Random Forest Algorithm for Insurance Big Data Analysis. 
Proceedings of the 2017 IEEE International Conference on Computational Science and Engineering and IEEE\/IFIP International Conference on Embedded and Ubiquitous Computing, CSE and EUC 2017, Guangzhou, China.","DOI":"10.1109\/CSE-EUC.2017.99"},{"key":"ref_50","first-page":"1","article-title":"Submitted to the Annals of Statistics","volume":"45","author":"Compo","year":"2017","journal-title":"Ann. Stat."},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"197","DOI":"10.1007\/s11749-016-0481-7","article-title":"A Random Forest Guided Tour","volume":"25","author":"Biau","year":"2015","journal-title":"TEST"},{"key":"ref_52","first-page":"196","article-title":"A Comparative Study on Decision Tree and Random Forest Using R Tool","volume":"4","author":"Prajwala","year":"2015","journal-title":"Ijarcce"},{"key":"ref_53","doi-asserted-by":"crossref","first-page":"73","DOI":"10.1145\/3130332.3130346","article-title":"Scalability and Performance of Random Forest based Learning-to-Rank for Information Retrieval","volume":"Volume 51","author":"Ibrahim","year":"2017","journal-title":"ACM SIGIR Forum"},{"key":"ref_54","first-page":"58","article-title":"Random Forest for Land Cover Classification","volume":"4","author":"Kulkarni","year":"2016","journal-title":"Int. J. Recent Innov. Trends Comput. Commun."},{"key":"ref_55","doi-asserted-by":"crossref","first-page":"2449","DOI":"10.1093\/bioinformatics\/bty087","article-title":"A new approach for interpreting Random Forest models and its application to the biology of ageing","volume":"34","author":"Fabris","year":"2018","journal-title":"Bioinformatics"},{"key":"ref_56","doi-asserted-by":"crossref","first-page":"251","DOI":"10.23956\/ijarcsse\/V7I1\/01113","article-title":"Random Forest: A Review","volume":"7","author":"Goel","year":"2017","journal-title":"Int. J. Adv. Res. Comput. Sci. Softw. 
Eng."},{"key":"ref_57","first-page":"24","article-title":"Classification of the Fire Station Requirement with Using Machine Learning Algorithms","volume":"11","year":"2019","journal-title":"I.J. Inf. Technol. Comput. Sci."},{"key":"ref_58","unstructured":"Klusowski, J.M. (2018). Complete Analysis of a Random Forest Model, Rutgers University."},{"key":"ref_59","doi-asserted-by":"crossref","unstructured":"Tyralis, H., and Papacharalampous, G. (2017). Variable selection in time series forecasting using random forests. Algorithms, 10.","DOI":"10.3390\/a10040114"},{"key":"ref_60","doi-asserted-by":"crossref","first-page":"285","DOI":"10.5194\/isprs-archives-XLI-B2-285-2016","article-title":"Modeling urban dynamics using random forest: Implementing Roc and Toc for model evaluation","volume":"41","author":"Ahmadlou","year":"2016","journal-title":"Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. ISPRS Arch."},{"key":"ref_61","doi-asserted-by":"crossref","first-page":"28","DOI":"10.1016\/j.bdr.2017.07.003","article-title":"Random Forests for Big Data","volume":"9","author":"Genuer","year":"2015","journal-title":"Big Data Res."},{"key":"ref_62","doi-asserted-by":"crossref","first-page":"357","DOI":"10.12691\/ajams-2-6-1","article-title":"Application of Binary Logistic Regression in Assessing Risk Factors Affecting the Prevalence of Toxoplasmosis","volume":"2","author":"Kudakwashe","year":"2014","journal-title":"Am. J. Appl. Math. Stat."},{"key":"ref_63","doi-asserted-by":"crossref","first-page":"12","DOI":"10.11613\/BM.2014.003","article-title":"Understanding logistic regression analysis","volume":"24","author":"Sperandei","year":"2014","journal-title":"Biochem. Med."},{"key":"ref_64","doi-asserted-by":"crossref","first-page":"154","DOI":"10.4040\/jkan.2013.43.2.154","article-title":"An introduction to logistic regression: From basic concepts to interpretation with particular attention to nursing domain","volume":"43","author":"Park","year":"2013","journal-title":"J. 
Korean Acad. Nurs."},{"key":"ref_65","doi-asserted-by":"crossref","first-page":"120","DOI":"10.5539\/ijsp.v6n6p120","article-title":"A New Method for Logistic Model Assessment","volume":"6","author":"Shu","year":"2017","journal-title":"Int. J. Stat. Probab."},{"key":"ref_66","doi-asserted-by":"crossref","unstructured":"Ameri, S., Fard, M.J., Chinnam, R.B., and Reddy, C.K. (2016). Survival Analysis based Framework for Early Prediction of Student Dropouts. ACM, 903\u2013912.","DOI":"10.1145\/2983323.2983351"},{"key":"ref_67","doi-asserted-by":"crossref","unstructured":"Lakkaraju, H., Aguiar, E., Shan, C., Miller, D., Bhanpuri, N., Ghani, R., and Addison, K.L. (2015, January 10-13). A Machine Learning Framework to Identify Students at Risk of Adverse Academic Outcomes. Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, NSW, Australia.","DOI":"10.1145\/2783258.2788620"},{"key":"ref_68","doi-asserted-by":"crossref","first-page":"042087","DOI":"10.1088\/1757-899X\/263\/4\/042087","article-title":"Assessment of various supervised learning algorithms using different performance metrics","volume":"263","author":"Laxkar","year":"2017","journal-title":"IOP Conf. Ser. Mater. Sci. Eng."},{"key":"ref_69","doi-asserted-by":"crossref","unstructured":"Maggo, S., and Gupta, C. (2014). A Machine Learning based Efficient Software Reusability Prediction Model for Java Based Object Oriented Software. I.J. Inf. Technol. Comput. Sci., 1\u201313.","DOI":"10.5815\/ijitcs.2014.02.01"},{"key":"ref_70","doi-asserted-by":"crossref","unstructured":"Liang, J., Li, C., and Zheng, L. (2016, January 23\u201325). Machine learning application in MOOCs: Dropout prediction. 
Proceedings of the ICCSE 2016 11th International Conference on Computer Science and Education, Nagoya, Japan.","DOI":"10.1109\/ICCSE.2016.7581554"},{"key":"ref_71","first-page":"83","article-title":"Class imbalance problem in data mining: Review","volume":"2","author":"Longadge","year":"2013","journal-title":"Int. J. Comput. Sci. Netw."},{"key":"ref_72","first-page":"11","article-title":"Prediction of student dropout from a university in Turkey using data balancing techniques","volume":"108","author":"Yilmaz","year":"2020","journal-title":"Comput. Educ."},{"key":"ref_73","first-page":"1","article-title":"Applying data balancing techniques to predict student dropout using machine learning","volume":"5","author":"Mesut","year":"2017","journal-title":"Int. J. Adv. Comput. Technol."},{"key":"ref_74","first-page":"1","article-title":"Prediction of Student Dropouts Using Machine Learning Techniques","volume":"5","author":"Antar","year":"2020","journal-title":"Int. J. Comput. Appl."},{"key":"ref_75","first-page":"430","article-title":"Application of data balancing techniques to predict student dropout using machine learning","volume":"11","author":"Jain","year":"2018","journal-title":"Int. J. Comput. Appl."},{"key":"ref_76","doi-asserted-by":"crossref","unstructured":"Barros, T.M., Neto, P.A., Silva, I., and Guedes, L.A. (2019). Predictive models for imbalanced data: A school dropout perspective. Educ. Sci., 9.","DOI":"10.3390\/educsci9040275"},{"key":"ref_77","first-page":"249","article-title":"Supervised Machine Learning: A Review of Classification Techniques","volume":"31","author":"Kotsiantis","year":"2007","journal-title":"Informatica"},{"key":"ref_78","doi-asserted-by":"crossref","first-page":"20","DOI":"10.1145\/1007730.1007735","article-title":"A Study of the Behavior of Several Methods for Balancing Machine Learning Training Data","volume":"6","author":"Batista","year":"2004","journal-title":"SIGKDD Explor. 
Newsl."},{"key":"ref_79","doi-asserted-by":"crossref","first-page":"226","DOI":"10.1016\/j.dss.2012.01.016","article-title":"Preprocessing Unbalanced Data Using Support Vector Machine","volume":"53","author":"Farquad","year":"2012","journal-title":"Decis. Support Syst."},{"key":"ref_80","doi-asserted-by":"crossref","first-page":"245","DOI":"10.1007\/s10115-011-0465-6","article-title":"SMOTE-RSB *: A Hybrid Preprocessing Approach Based on Oversampling and Undersampling for High Imbalanced Data-sets Using SMOTE and Rough Sets Theory","volume":"33","author":"Ramentol","year":"2012","journal-title":"Knowl. Inf. Syst."},{"key":"ref_81","doi-asserted-by":"crossref","first-page":"5718","DOI":"10.1016\/j.eswa.2008.06.108","article-title":"Cluster-based Under-sampling Approaches for Imbalanced Data Distributions","volume":"36","author":"Yen","year":"2009","journal-title":"Expert Syst. Appl."},{"key":"ref_82","doi-asserted-by":"crossref","first-page":"434","DOI":"10.1109\/TR.2013.2259203","article-title":"Using Class Imbalance Learning for Software Defect Prediction","volume":"62","author":"Wang","year":"2013","journal-title":"IEEE Trans. Reliab."},{"key":"ref_83","doi-asserted-by":"crossref","first-page":"4626","DOI":"10.1016\/j.eswa.2008.05.027","article-title":"Handling Class Imbalance in Customer Churn Prediction","volume":"36","author":"Burez","year":"2009","journal-title":"Expert Syst. Appl."},{"key":"ref_84","doi-asserted-by":"crossref","unstructured":"Prusa, J., Khoshgoftaar, T.M., Dittman, D.J., and Napolitano, A. (2015, January 13\u201315). Using Random Undersampling to Alleviate Class Imbalance on Tweet Sentiment Data. Proceedings of the IEEE 16th International Conference on Information Reuse and Integration, IRI 2015, San Francisco, CA, USA.","DOI":"10.1109\/IRI.2015.39"},{"key":"ref_85","unstructured":"Aulck, L., Aras, R., Li, L., Heureux, C.L., Lu, P., and West, J. (2017). STEM-ming the Tide: Predicting STEM Attrition Using Student Transcript Data. 
arXiv."},{"key":"ref_86","doi-asserted-by":"crossref","first-page":"1250003","DOI":"10.1142\/S0219720012500035","article-title":"Adjusted Geometric-mean: A Novel Performance Measure for Imbalanced Bioinformatics Datasets Learning","volume":"10","author":"Batuwita","year":"2012","journal-title":"J. Bioinform. Comput. Biol."},{"key":"ref_87","doi-asserted-by":"crossref","first-page":"1074","DOI":"10.1016\/j.eswa.2014.08.025","article-title":"Geometric mean based boosting algorithm with over-sampling to resolve data imbalance problem for bankruptcy prediction","volume":"42","author":"Kim","year":"2015","journal-title":"Expert Syst. Appl."},{"key":"ref_88","unstructured":"Mgala, M. (2016). Investigating Prediction Modelling of Academic Performance for Students in Rural Schools in Kenya. [Ph.D. Thesis, University of Cape Town]."},{"key":"ref_89","doi-asserted-by":"crossref","first-page":"215","DOI":"10.1007\/s13748-019-00172-4","article-title":"Instance Selection Improves Geometric Mean Accuracy: A Study on Imbalanced Data Classification","volume":"8","author":"Kuncheva","year":"2019","journal-title":"Prog. Artif. Intell."},{"key":"ref_90","unstructured":"Hakim, A. (2019). Performance Evaluation of Machine Learning Techniques for Early Prediction of Brain Strokes. [Ph.D. Thesis, United International University]."},{"key":"ref_91","unstructured":"Amin, M.Z., and Ali, A. (2017). Performance Evaluation of Supervised Machine Learning Classifiers for Predicting Healthcare Operational Decisions. Tech. 
Rep."}],"container-title":["Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2306-5729\/8\/3\/49\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T18:43:30Z","timestamp":1760121810000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2306-5729\/8\/3\/49"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,2,27]]},"references-count":91,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2023,3]]}},"alternative-id":["data8030049"],"URL":"https:\/\/doi.org\/10.3390\/data8030049","relation":{},"ISSN":["2306-5729"],"issn-type":[{"value":"2306-5729","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,2,27]]}}}