{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,11]],"date-time":"2025-11-11T15:51:36Z","timestamp":1762876296691,"version":"build-2065373602"},"reference-count":36,"publisher":"MDPI AG","issue":"7","license":[{"start":{"date-parts":[[2021,7,19]],"date-time":"2021-07-19T00:00:00Z","timestamp":1626652800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Data"],"abstract":"<jats:p>Data-driven solutions to societal challenges continue to bring new dimensions to our daily lives. For example, while good-quality education is a well-acknowledged foundation of sustainable development, innovation and creativity, variations in student attainment and general performance remain commonplace. Developing data-driven solutions hinges on two fronts: technical and application. The former relates to the modelling perspective, where two of the major challenges are the impact of data randomness and general variations in definitions, typically referred to as concept drift in machine learning. The latter relates to devising data-driven solutions to address real-life challenges such as identifying potential triggers of pedagogical performance, which aligns with the Sustainable Development Goal (SDG) #4: Quality Education. A total of 3145 pedagogical data points were obtained from the central data collection platform for the United Arab Emirates (UAE) Ministry of Education (MoE). Using simple data visualisation and machine learning techniques via a generic algorithm for sampling, measuring and assessing, the paper highlights research pathways for educationists and data scientists to attain unified goals in an interdisciplinary context. 
Its novelty derives from embedded capacity to address data randomness and concept drift by minimising modelling variations and yielding consistent results across samples. Results show that intricate relationships among data attributes describe the invariant conditions that practitioners in the two overlapping fields of data science and education must identify.<\/jats:p>","DOI":"10.3390\/data6070077","type":"journal-article","created":{"date-parts":[[2021,7,19]],"date-time":"2021-07-19T10:07:37Z","timestamp":1626689257000},"page":"77","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["Dealing with Randomness and Concept Drift in Large Datasets"],"prefix":"10.3390","volume":"6","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1134-547X","authenticated-orcid":false,"given":"Kassim S.","family":"Mwitondi","sequence":"first","affiliation":[{"name":"Industry & Innovation Research Institute, College of Business, Technology & Engineering, Sheffield Hallam University, 9410 Cantor Building, City Campus, 153 Arundel Street, Sheffield S1 2NU, UK"}]},{"given":"Raed A.","family":"Said","sequence":"additional","affiliation":[{"name":"Faculty of Management, Canadian University Dubai, Al Safa Street-Al Wasl, City Walk Mall, Dubai P.O. Box 415053, United Arab Emirates"}]}],"member":"1968","published-online":{"date-parts":[[2021,7,19]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"247","DOI":"10.1016\/j.chb.2017.01.047","article-title":"Evaluating the effectiveness of educational data mining techniques for early prediction of students\u2019 academic failure in introductory programming courses","volume":"73","author":"Costa","year":"2017","journal-title":"Comput. Hum. Behav."},{"key":"ref_2","unstructured":"Wilson, K. (2020). What does it mean to do teaching? A qualitative study of resistance to Flipped Learning in a higher education context. Teach. High. 
Educ., 1\u201314."},{"key":"ref_3","first-page":"1","article-title":"Modeling engagement of programming students using unsupervised machine learning technique","volume":"6","author":"Marshall","year":"2018","journal-title":"GSTF J. Comput."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"282","DOI":"10.1016\/j.compedu.2014.03.002","article-title":"Modelling and quantifying the behaviours of students in lecture capture environments","volume":"75","author":"Brooks","year":"2014","journal-title":"Comput. Educ."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"36","DOI":"10.1016\/j.dss.2018.09.001","article-title":"Early segmentation of students according to their academic performance: A predictive modelling approach","volume":"115","author":"Freitas","year":"2018","journal-title":"Decis. Support Syst."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"65","DOI":"10.5944\/ried.23.2.26470","article-title":"Data-Driven Educational Algorithms Pedagogical Framing","volume":"23","year":"2020","journal-title":"Revista Iberoamericana de Educaci\u00f3n a Distancia"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"293","DOI":"10.12785\/jsap\/020312","article-title":"A data-based method for harmonising heterogeneous data modelling techniques across data mining applications","volume":"2","author":"Mwitondi","year":"2013","journal-title":"J. Stat. Appl. Probab."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"106031","DOI":"10.1016\/j.cie.2019.106031","article-title":"Machine learning based concept drift detection for predictive maintenance","volume":"137","author":"Zenisek","year":"2019","journal-title":"Comput. Ind. Eng."},{"key":"ref_9","unstructured":"CHEDS (2018). Center For Higher Education Data and Statistics."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Japkowicz, N., and Stefanowski, J. (2016). An Overview of Concept Drift Applications. 
Big Data Analysis: New Algorithms for a New Society, Springer International Publishing.","DOI":"10.1007\/978-3-319-26989-4"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"56","DOI":"10.1016\/j.inffus.2006.11.002","article-title":"Dynamic integration of classifiers for handling concept drift","volume":"9","author":"Tsymbal","year":"2008","journal-title":"Inf. Fusion"},{"key":"ref_12","unstructured":"SILPA (2019). Standards for Institutional Licensure and Program Accreditation."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"WDS247","DOI":"10.2481\/dsj.WDS-045","article-title":"A Data-Driven Method for Selecting Optimal Models Based on Graphical Visualisation of Differences in Sequentially Fitted ROC Model Parameters","volume":"12","author":"Mwitondi","year":"2013","journal-title":"Data Sci. J."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"758","DOI":"10.1016\/j.ipm.2018.01.010","article-title":"A survey towards an integration of big data analytics to big insights for value-creation","volume":"54","author":"Saggi","year":"2018","journal-title":"Inf. Process. 
Manag."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"75","DOI":"10.1007\/s11528-015-0842-1","article-title":"The skinny on big data in education: Learning analytics simplified","volume":"59","author":"Reyes","year":"2015","journal-title":"TechTrends"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"350","DOI":"10.1016\/j.neucom.2017.01.026","article-title":"Machine learning on big data: Opportunities and challenges","volume":"237","author":"Zhou","year":"2017","journal-title":"Neurocomputing"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"249","DOI":"10.1007\/s41664-018-0068-2","article-title":"On Splitting Training and Validation Set: A Comparative Study of Cross-Validation, Bootstrap and Systematic Sampling for Estimating the Generalization Performance of Supervised Learning","volume":"2","author":"Xu","year":"2018","journal-title":"J. Anal. Test."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Chen, S., Dorn, S., Lell, M., Kachelrie\u00df, M., and Maier, A. (2018). Manifold Learning-Based Data Sampling for Model Training, Springer.","DOI":"10.1007\/978-3-662-56537-7_70"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Mwitondi, K., Munyakazi, I., and Gatsheni, B. (2020). A robust machine learning approach to SDG data segmentation. J. Big Data, 7.","DOI":"10.1186\/s40537-020-00373-y"},{"key":"ref_20","unstructured":"Mwitondi, K., Munyakazi, I., and Gatsheni, B. (2018, January 12\u201315). Amenability of the United Nations Sustainable Development Goals to Big Data Modelling. Proceedings of the International Workshop on Data Science-Present and Future of Open Data and Open Science, Joint Support Centre for Data Science Research, Mishima Citizens Cultural Hall, Mishima, Shizuoka, Japan."},{"key":"ref_21","unstructured":"Mwitondi, K., Munyakazi, I., and Gatsheni, B. (2018, January 19\u201321). An Interdisciplinary Data-Driven Framework for Development Science. 
Proceedings of the DIRISA National Research Data Workshop, CSIR ICC, Pretoria, South Africa."},{"key":"ref_22","unstructured":"Drori, I., Krishnamurthy, Y., Lourenco, R., Rampin, R., Cho, K., Silva, C., and Freire, J. (2019). Automatic Machine Learning by Pipeline Synthesis using Model-Based Reinforcement Learning and a Grammar. arXiv."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"961","DOI":"10.1162\/neco.2006.18.4.961","article-title":"Feature Scaling for Kernel Fisher Discriminant Analysis Using Leave-One-Out Cross Validation","volume":"18","author":"Bo","year":"2006","journal-title":"Neural Comput."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Galkin, F., Aliper, A., Putin, E., Kuznetsov, I., Gladyshev, V.N., and Zhavoronkov, A. (2018). Human microbiome aging clocks based on deep learning and tandem of permutation feature importance and accumulated local effects. bioRxiv.","DOI":"10.1101\/507780"},{"key":"ref_25","first-page":"102360","article-title":"A robust domain partitioning intrusion detection method","volume":"48","author":"Mwitondi","year":"2019","journal-title":"J. Inf. Secur. Appl."},{"key":"ref_26","unstructured":"Looney, C.G. (1997). Pattern Recognition Using Neural Networks: Theory and Algorithms for Engineers and Scientists, Oxford University Press."},{"key":"ref_27","unstructured":"Webb, A. (2005). Statistical Pattern Recognition, Wiley."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"181","DOI":"10.1111\/j.2517-6161.1995.tb02023.x","article-title":"Deletion Influence and Masking in Regression","volume":"57","author":"Lawrence","year":"1995","journal-title":"J. R. Stat. Society. Ser. B (Methodol.)"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"697","DOI":"10.1080\/03610928908829928","article-title":"Masking and swamping effects on tests for multiple outliers in normal sample","volume":"18","author":"Bendre","year":"1989","journal-title":"Commun. Stat. 
Theory Methods"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"555","DOI":"10.1177\/0165551511412705","article-title":"A conceptual framework for managing very diverse data for complex, interdisciplinary science","volume":"37","author":"Parsons","year":"2011","journal-title":"J. Inf. Sci."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"753","DOI":"10.1080\/00221546.2018.1441107","article-title":"Academic Engagement and Student Success: Do High-Impact Practices Mean Higher Graduation Rates?","volume":"89","author":"Johnson","year":"2018","journal-title":"J. High. Educ."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"333","DOI":"10.1016\/j.chb.2016.02.074","article-title":"The impact of learning design on student behaviour, satisfaction and performance: A cross-institutional comparison across 151 modules","volume":"60","author":"Rienties","year":"2016","journal-title":"Comput. Hum. Behav."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Lerman, R. (2019). Do firms benefit from apprenticeship investments?. IZA World Labor.","DOI":"10.15185\/izawol.55.v2"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Di Meglio, G., Barge-Gil, A., Cami\u00f1a, E., and Moreno, L. (2021, July 15). Knocking on Employment\u00b4s Door: Internships and Job Attainment. Munich Personal RePEc Archive 2019. Available online: https:\/\/mpra.ub.uni-muenchen.de\/95712\/1\/MPRA_paper_95712.pdf.","DOI":"10.1007\/s10734-020-00643-x"},{"key":"ref_35","unstructured":"Kennedy, J., and Eberhart, R. (December, January 27). Particle swarm optimization. Proceedings of the ICNN\u201995\u2014International Conference on Neural Networks, Perth, WA, Australia."},{"key":"ref_36","unstructured":"Shi, Y., and Eberhart, R. (1998, January 4\u20139). A modified particle swarm optimizer. Proceedings of the 1998 IEEE International Conference on Evolutionary Computation Proceedings, IEEE World Congress on Computational Intelligence (Cat. 
No.98TH8360), Anchorage, AK, USA."}],"container-title":["Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2306-5729\/6\/7\/77\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T06:31:53Z","timestamp":1760164313000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2306-5729\/6\/7\/77"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,7,19]]},"references-count":36,"journal-issue":{"issue":"7","published-online":{"date-parts":[[2021,7]]}},"alternative-id":["data6070077"],"URL":"https:\/\/doi.org\/10.3390\/data6070077","relation":{},"ISSN":["2306-5729"],"issn-type":[{"type":"electronic","value":"2306-5729"}],"subject":[],"published":{"date-parts":[[2021,7,19]]}}}