{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,28]],"date-time":"2026-04-28T20:49:00Z","timestamp":1777409340205,"version":"3.51.4"},"reference-count":60,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2025,2,20]],"date-time":"2025-02-20T00:00:00Z","timestamp":1740009600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Data"],"abstract":"<jats:p>Education is crucial for leading a productive life and obtaining necessary resources. Higher education institutions are progressively incorporating artificial intelligence into conventional teaching methods as a result of innovations in technology. As a high academic record raises a university\u2019s ranking and increases student career chances, predicting learning success has been a central focus in education. Both performance analysis and providing high-quality instruction are challenges faced by modern schools. Maintaining high academic standards, juggling life and academics, and adjusting to technology are problems that students must overcome. In this study, we present a comprehensive dataset, SAPEx-D (Student Academic Performance Exploration), designed to predict student performance, encompassing a wide array of personal, familial, academic, and behavioral factors. Our data collection effort at Air University, Islamabad, Pakistan, involved both online and paper questionnaires completed by students across multiple departments, ensuring diverse representation. After meticulous preprocessing to remove duplicates and entries with significant missing values, we retained 494 valid responses. The dataset includes detailed attributes such as demographic information, parental education and occupation, study habits, reading frequencies, and transportation modes. 
To facilitate robust analysis, we encoded ordinal attributes using label encoding and nominal attributes using one-hot encoding, expanding our dataset from 38 to 88 attributes. Feature scaling was performed to standardize the range and distribution of data, using a normalization technique. Our analysis revealed that factors such as degree major, parental education, reading frequency, and scholarship type significantly influence student performance. The machine learning models applied to this dataset, including Gradient Boosting and Random Forest, demonstrated high accuracy and robustness, underscoring the dataset\u2019s potential for insightful academic performance prediction. In terms of model performance, Gradient Boosting achieved an accuracy of 68.7% and an F1-score of 68% for the eight-class classification task. For the three-class classification, Random Forest outperformed other models, reaching an accuracy of 80.8% and an F1-score of 78%. These findings highlight the importance of comprehensive data in understanding and predicting academic outcomes, paving the way for more personalized and effective educational strategies.<\/jats:p>","DOI":"10.3390\/data10030027","type":"journal-article","created":{"date-parts":[[2025,2,20]],"date-time":"2025-02-20T04:03:17Z","timestamp":1740024197000},"page":"27","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":12,"title":["SAPEx-D: A Comprehensive Dataset for Predictive Analytics in Personalized Education Using Machine Learning"],"prefix":"10.3390","volume":"10","author":[{"given":"Muhammad","family":"Adnan Aslam","sequence":"first","affiliation":[{"name":"Department of Creative Technologies, Faculty of Computing and Artificial Intelligence (FCAI), Air University, Islamabad 44000, Pakistan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Fiza","family":"Murtaza","sequence":"additional","affiliation":[{"name":"Department of Creative 
Technologies, Faculty of Computing and Artificial Intelligence (FCAI), Air University, Islamabad 44000, Pakistan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Muhammad","family":"Ehatisham Ul Haq","sequence":"additional","affiliation":[{"name":"Department of Creative Technologies, Faculty of Computing and Artificial Intelligence (FCAI), Air University, Islamabad 44000, Pakistan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0202-9061","authenticated-orcid":false,"given":"Amanullah","family":"Yasin","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Bahria University, BSEAS, H11, Islamabad 44000, Pakistan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9087-4814","authenticated-orcid":false,"given":"Numan","family":"Ali","sequence":"additional","affiliation":[{"name":"Department of Creative Technologies, Faculty of Computing and Artificial Intelligence (FCAI), Air University, Islamabad 44000, Pakistan"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2025,2,20]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Thai-Nghe, N., Horv\u2019th, T., and Schmidt-Thieme, L. (2011, January 6\u20138). Personalized Forecasting Student Performance. Proceedings of the 2011 IEEE 11th International Conference on Advanced Learning Technologies, Athens, GA, USA.","DOI":"10.1109\/ICALT.2011.130"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"George, B., and Wooden, O. (2023). Managing the Strategic Transformation of Higher Education through Artificial Intelligence. Adm. Sci., 13.","DOI":"10.3390\/admsci13090196"},{"key":"ref_3","first-page":"188","article-title":"Study Factors for Student Performance Applying Data Mining Regression Model Approach","volume":"21","author":"Khan","year":"2021","journal-title":"IJCSNS Int. J. Comput. Sci. Netw. 
Secur."},{"key":"ref_4","first-page":"993","article-title":"Factors Affecting Student Performance in E-Learning: A Case Study of Higher Educational Institutions in Indonesia","volume":"8","author":"Marlina","year":"2021","journal-title":"J. Asian Financ. Econ. Bus."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Albreiki, B., Zaki, N., and Alashwal, H. (2021). A Systematic Literature Review of Student\u2019 Performance Prediction Using Machine Learning Techniques. Educ. Sci., 11.","DOI":"10.3390\/educsci11090552"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"5:1","DOI":"10.1147\/JRD.2015.2458631","article-title":"On early prediction of risks in academic performance for students","volume":"59","author":"Ikbal","year":"2015","journal-title":"IBM J. Res. Dev."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"135","DOI":"10.1016\/j.eswa.2006.04.005","article-title":"Educational data mining: A survey from 1995 to 2005","volume":"33","author":"Romero","year":"2007","journal-title":"Expert Syst. Appl."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Ali, N., and Ullah, S. (2024). The effect of interactive tutorial information and purpose built virtual chemistry laboratory on students\u2019 performance. Multimed. Tools Appl., 1\u201320.","DOI":"10.1007\/s11042-024-19808-2"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"012005","DOI":"10.1088\/1742-6596\/1496\/1\/012005","article-title":"Predicting Student Drop-Out in Higher Institution Using Data Mining Techniques","volume":"1496","author":"Nasir","year":"2020","journal-title":"J. Phys. Conf. Ser."},{"key":"ref_10","first-page":"1","article-title":"Enhancing Student Performance Prediction via Educational Data Mining on Academic data","volume":"23","author":"Alamgir","year":"2023","journal-title":"Inform. Educ."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Zhang, Y., Yun, Y., An, R., Cui, J., Dai, H., and Shang, X. (2021). 
Educational Data Mining Techniques for Student Performance Prediction: Method Review and Comparison Analysis. Front. Psychol., 12.","DOI":"10.3389\/fpsyg.2021.698490"},{"key":"ref_12","first-page":"1","article-title":"A Review: An Approach for Secondary School Students Performance using Machine Learning and Data Mining","volume":"12","author":"Patel","year":"2024","journal-title":"Int. J. Intell. Syst. Appl. Eng."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Gull, H., Saqib, M., Iqbal, S.Z., and Saeed, S. (2020, January 6\u20138). Improving Learning Experience of Students by Early Prediction of Student Performance using Machine Learning. Proceedings of the 2020 IEEE International Conference for Innovation in Technology (INOCON), Bangaluru, India.","DOI":"10.1109\/INOCON50539.2020.9298266"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"5264","DOI":"10.1109\/ACCESS.2019.2963503","article-title":"Analysis of the Factors Influencing Learners\u2019 Performance Prediction With Learning Analytics","volume":"8","author":"Pong","year":"2020","journal-title":"IEEE Access"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Uskov, V.L., Bakken, J.P., Byerly, A., and Shah, A. (2019, January 8\u201311). Machine Learning-based Predictive Analytics of Student Academic Performance in STEM Education. Proceedings of the 2019 IEEE Global Engineering Education Conference (EDUCON), Dubai, United Arab Emirates.","DOI":"10.1109\/EDUCON.2019.8725237"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"247","DOI":"10.1016\/j.chb.2017.01.047","article-title":"Evaluating the effectiveness of educational data mining techniques for early prediction of students\u2019 academic failure in introductory programming courses","volume":"73","author":"Costa","year":"2017","journal-title":"Comput. 
Human Behav."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Chandrahasa, S., Ganeshan, S.M.S., Maddineni, R.C., Divya, M.S., Tumuluru, P., and Suneetha, B. (2023, January 23\u201325). Machine Learning Algorithms based Student Performance Prediction based on Previous Records. Proceedings of the 2023 7th International Conference on Computing Methodologies and Communication (ICCMC), Erode, India.","DOI":"10.1109\/ICCMC56507.2023.10084099"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"95608","DOI":"10.1109\/ACCESS.2021.3093563","article-title":"Multiclass Prediction Model for Student Grade Prediction Using Machine Learning","volume":"9","author":"Bujang","year":"2021","journal-title":"IEEE Access"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Althaph, B., Sreenivasu, S.V.N., and Reddy, D.V. (2023, January 23\u201325). Student Performance Analysis with Ensemble Progressive Prediction. Proceedings of the 2023 5th International Conference on Smart Systems and Inventive Technology (ICSSIT), Tirunelveli, India.","DOI":"10.1109\/ICSSIT55814.2023.10060910"},{"key":"ref_20","unstructured":"Saraswathi, K., Devadharshini, B., Kavina, S., and Srinidhi, S. (2023, January 23\u201325). Prediction on Impact of Electronic Gadgets in Students Life using Machine Learning. 
Proceedings of the 2023 7th International Conference on Computing Methodologies and Communication (ICCMC), Erode, India."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"55462","DOI":"10.1109\/ACCESS.2020.2981905","article-title":"Using Data Mining Techniques to Predict Student Performance to Support Decision Making in University Admission Systems","volume":"8","author":"Mengash","year":"2020","journal-title":"IEEE Access"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"27579","DOI":"10.1109\/ACCESS.2023.3250702","article-title":"Early Predicting of Students Performance in Higher Education","volume":"11","author":"Alhazmi","year":"2023","journal-title":"IEEE Access"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"6286","DOI":"10.1109\/ACCESS.2022.3143035","article-title":"An Efficient Data Mining Technique for Assessing Satisfaction Level With Online Learning for Higher Education Students During the COVID-19","volume":"10","author":"Abdelkader","year":"2022","journal-title":"IEEE Access"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Sekeroglu, B., Dimililer, K., and Tuncal, K. (2019, January 2\u20134). Student Performance Prediction and Classification Using Machine Learning Algorithms. Proceedings of the 2019 8th International Conference on Educational and Information Technology, Cambridge, UK.","DOI":"10.1145\/3318396.3318419"},{"key":"ref_25","unstructured":"Alamri, L.H., Almuslim, R.S., Alotibi, M.S., Alkadi, D.K., Khan, I.U., and Aslam, N. (2020, January 17\u201319). Predicting Student Academic Performance using Support Vector Machine and Random Forest. Proceedings of the 2020 3rd International Conference on Education Technology Management, London, UK."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Rustia, R.A., Cruz, M.M.A., Burac, M.A.P., and Palaoag, T.D. (2018, January 8\u201310). Predicting Student\u2019s Board Examination Performance using Classification Algorithms. 
Proceedings of the 2018 7th International Conference on Software and Computer Applications, Kuantan, Malaysia.","DOI":"10.1145\/3185089.3185101"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Zong, J., Cui, C., Ma, Y., Yao, L., Chen, M., and Yin, Y. (2020, January 19\u201323). Behavior-driven Student Performance Prediction with Tri-branch Convolutional Neural Network. Proceedings of the 29th ACM International Conference on Information & Knowledge Management, Virtual Event, Ireland.","DOI":"10.1145\/3340531.3412110"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Kusumawardani, S.S., Alfarozi, S.A.I., Pradana, C., Ratnaningsih, D.J., Aluicius, I.E., and Chaeruman, U.A. (2023). Predicting student performance based on their interaction activities in a learning management system using machine learning method. Education Technology in the New Normal: Now and Beyond, Routledge.","DOI":"10.1201\/9781003353423-25"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"981","DOI":"10.1007\/s11633-021-1312-1","article-title":"Data Augmentation and Deep Neuro-fuzzy Network for Student Performance Prediction with MapReduce Framework","volume":"18","author":"Baruah","year":"2021","journal-title":"Int. J. Autom. Comput."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"23427","DOI":"10.1007\/s11042-022-14083-5","article-title":"Deep auto encoder based on a transient search capsule network for student performance prediction","volume":"82","author":"Rahul","year":"2023","journal-title":"Multimed. Tools Appl."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Musek, J. (2024). Personality and Well-Being. 
Personality Psychology, Springer Nature.","DOI":"10.1007\/978-3-031-55308-0"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"158","DOI":"10.1111\/jopy.12301","article-title":"Unique Associations Between Big Five Personality Aspects and Multiple Dimensions of Well-Being","volume":"86","author":"Sun","year":"2018","journal-title":"J. Pers."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Tjandra, E., Kusumawardani, S.S., and Ferdiana, R. (2021, January 25\u201326). Student performance prediction in higher education: A comprehensive review. Proceedings of the International Conference on Informatics, Technology, and Engineering 2021 (Incite 2021): Leveraging Smart Engineering, Online.","DOI":"10.1063\/5.0080187"},{"key":"ref_34","unstructured":"Uyeno, R., Zhang, S., and Chin-Chance, S. (2024, April 16). The Role of Demographic Factors in Predicting Student Performance on a State Reading Test. Online Submission, Available online: https:\/\/files.eric.ed.gov\/fulltext\/ED493067.pdf."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Kamal, P., and Ahuja, S. (2019). Academic Performance Prediction Using Data Mining Techniques: Identification of Influential Factors Effecting the Academic Performance in Undergrad Professional Course. Harmony Search and Nature Inspired Optimization Algorithms, Springer.","DOI":"10.1007\/978-981-13-0761-4_79"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"557","DOI":"10.1016\/j.paid.2005.02.013","article-title":"The relationship between the big five personality traits and academic motivation","volume":"39","author":"Komarraju","year":"2005","journal-title":"Personal. Individ. Differ."},{"key":"ref_37","unstructured":"Mccormick, P., and Vongrey, G. (2023). 
Exploring the Occupational Health and Longevity of New General and Special Education Teachers: A Five Year Study of Novice Teachers Prepared by Bethel University, Bethel University."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"14","DOI":"10.3126\/njmr.v6i4.61991","article-title":"Does the Staff Bonus Fund Decrease the Net Profit? Empirical Insights from Nepalese Commercial Banks","volume":"6","author":"Bhattarai","year":"2023","journal-title":"Nepal J. Multidiscip. Res."},{"key":"ref_39","first-page":"1650","article-title":"Analisis Karakter Kesabaran Mahasiswa dalam Mengatasi Stress Akademik di Universitas Negeri Medan","volume":"3","author":"Siregar","year":"2024","journal-title":"J-CEKI J. Cendekia Ilm."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1016\/j.gltp.2022.04.020","article-title":"A review: Data pre-processing and data augmentation techniques","volume":"3","author":"Maharana","year":"2022","journal-title":"Glob. Transit. Proc."},{"key":"ref_41","first-page":"375","article-title":"The Effect of Using Data Pre-Processing by Imputations in Handling Missing Values","volume":"10","author":"Karrar","year":"2022","journal-title":"Indones. J. Electr. Eng. Inform."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Hedeker, D. (2008). Multilevel Models for Ordinal and Nominal Variables. Handbook of Multilevel Analysis, Springer.","DOI":"10.1007\/978-0-387-73186-5_6"},{"key":"ref_43","first-page":"3560","article-title":"Learnable Weighting of Intra-attribute Distances for Categorical Data Clustering with Nominal and Ordinal Attributes","volume":"44","author":"Zhang","year":"2021","journal-title":"IEEE Trans. Pattern Anal. Mach. 
Intell."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"758","DOI":"10.1109\/TCYB.2020.2983073","article-title":"A New Distance Metric Exploiting Heterogeneous Interattribute Relationship for Ordinal-and-Nominal-Attribute Data Clustering","volume":"52","author":"Zhang","year":"2022","journal-title":"IEEE Trans. Cybern."},{"key":"ref_45","first-page":"4917","article-title":"Multi-Dimensional Classification via Sparse Label Encoding","volume":"139","author":"Jia","year":"2021","journal-title":"Proc. Mach. Learn. Res."},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"472","DOI":"10.1080\/1540496X.2020.1825935","article-title":"Missing Data Preprocessing in Credit Classification: One-Hot Encoding or Imputation?","volume":"58","author":"Yu","year":"2022","journal-title":"Emerg. Mark. Financ. Trade"},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"105524","DOI":"10.1016\/j.asoc.2019.105524","article-title":"Investigating the impact of data normalization on classification performance","volume":"97","author":"Singh","year":"2020","journal-title":"Appl. Soft Comput."},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"23","DOI":"10.11591\/csit.v5i1.pp23-31","article-title":"Predicting students\u2019 success level in an examination using advanced linear regression and extreme gradient boosting","volume":"5","author":"Wahyuningsih","year":"2024","journal-title":"Comput. Sci. Inf. Technol."},{"key":"ref_49","first-page":"240","article-title":"Enhancing Educational Assessment: Predicting and Visualizing Student Performance using EDA and Machine Learning Techniques","volume":"37","author":"Parkavi","year":"2024","journal-title":"J. Eng. Educ. Transform."},{"key":"ref_50","first-page":"994","article-title":"Predicting Students\u2019 Academic Performance Through Machine Learning Classifiers: A Study Employing the Naive Bayes Classifier (NBC)","volume":"15","author":"Zheng","year":"2024","journal-title":"Int. J. Adv. Comput. Sci. 
Appl."},{"key":"ref_51","first-page":"2949","article-title":"Jurnal Mantik Modification of random forest method to predict student graduation data","volume":"7","author":"Dewi","year":"2024","journal-title":"J. Mantik"},{"key":"ref_52","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1186\/s40561-022-00192-z","article-title":"Educational data mining: Prediction of students\u2019 academic performance using machine learning algorithms","volume":"9","year":"2022","journal-title":"Smart Learn. Environ."},{"key":"ref_53","doi-asserted-by":"crossref","first-page":"103676","DOI":"10.1016\/j.compedu.2019.103676","article-title":"An overview and comparison of supervised data mining techniques for student exam performance prediction","volume":"143","author":"Tomasevic","year":"2020","journal-title":"Comput. Educ."},{"key":"ref_54","doi-asserted-by":"crossref","unstructured":"Naidu, G., Zuva, T., and Sibanda, E.M. (2023, January 3\u20135). A Review of Evaluation Metrics in Machine Learning Algorithms. Proceedings of the Computer Science On-line Conference, Online.","DOI":"10.1007\/978-3-031-35314-7_2"},{"key":"ref_55","doi-asserted-by":"crossref","first-page":"1","DOI":"10.5121\/ijdkp.2015.5201","article-title":"A Review on Evaluation Metrics for Data Classification Evaluations","volume":"5","author":"Hossin","year":"2015","journal-title":"Int. J. Data Min. Knowl. Manag. Process"},{"key":"ref_56","first-page":"599","article-title":"Classification Model Evaluation Metrics","volume":"12","author":"Vujovic","year":"2021","journal-title":"Int. J. Adv. Comput. Sci. Appl."},{"key":"ref_57","doi-asserted-by":"crossref","unstructured":"Losada, D.E., and Fern\u00e1ndez-Luna, J.M. (2005). A Probabilistic Interpretation of Precision, Recall and F-Score, with Implication for Evaluation. Advances in Information Retrieval, Springer.","DOI":"10.1007\/b107096"},{"key":"ref_58","doi-asserted-by":"crossref","unstructured":"Rainio, O., Teuho, J., and Kl\u00e9n, R. (2024). 
Evaluation metrics and statistical tests for machine learning. Sci. Rep., 14.","DOI":"10.1038\/s41598-024-56706-x"},{"key":"ref_59","first-page":"2229","article-title":"Evaluation: From precision, recall and F-measure to ROC, informedness, markedness & correlation","volume":"2","author":"Powers","year":"2011","journal-title":"J. Mach. Learn. Technol."},{"key":"ref_60","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3606367","article-title":"A Review of the F-Measure: Its History, Properties, Criticism, and Alternatives","volume":"56","author":"Christen","year":"2023","journal-title":"ACM Comput. Surv."}],"container-title":["Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2306-5729\/10\/3\/27\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T16:38:31Z","timestamp":1760027911000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2306-5729\/10\/3\/27"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,2,20]]},"references-count":60,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2025,3]]}},"alternative-id":["data10030027"],"URL":"https:\/\/doi.org\/10.3390\/data10030027","relation":{},"ISSN":["2306-5729"],"issn-type":[{"value":"2306-5729","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,2,20]]}}}