{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,12]],"date-time":"2026-01-12T14:47:39Z","timestamp":1768229259531,"version":"3.49.0"},"reference-count":53,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2026,1,12]],"date-time":"2026-01-12T00:00:00Z","timestamp":1768176000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2026,1,12]],"date-time":"2026-01-12T00:00:00Z","timestamp":1768176000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100015720","name":"Universidad de Extremadura","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100015720","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Int J Data Sci Anal"],"published-print":{"date-parts":[[2026,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>The elevated rates of dropout within academic institutions have prompted the use of Artificial Intelligence (AI) to tackle this issue. These efforts often rely mainly on administrative and academic data, lacking personal information about students. In a previous study, we explored machine learning models to leverage this data and harness their knowledge-extraction capabilities. However, a critical factor, the availability of labeled data, was not addressed. Obtaining these data may be challenging due to their distribution across different systems or the considerable time required to collect them, especially when new degrees are being implemented. The lack of labeled data is a major obstacle for institutions that do not possess them so that they are unable to take advantage of the full potential of AI for their purposes. 
Clustering algorithms have conventionally been employed to uncover latent patterns within unlabeled data. These unsupervised algorithms may reduce the need for data labeling; nonetheless, they necessitate rigorous validation of the resulting clusters, particularly when dealing with datasets encompassing numerical and categorical attributes. This paper introduces a comparison of various clustering algorithms to discern the most appropriate technique for uncovering the underlying factors contributing to university student attrition, employing unlabeled data. The novelty lies not only in the algorithmic comparison but also in their integration with diverse data preprocessing methodologies, streamlining the selection of the optimal combination including advanced data transformations for the harmonization of numerical and categorical information. It is illustrated through a real-world case utilizing academic data from a Spanish university, providing empirical validation for the proposed methodology. We also conducted an exploratory analysis to identify the factors behind cluster formation. 
The insights gained can be extrapolated to analogous experiments where social or economic data are scarce, and most of the available attributes are academic in nature.<\/jats:p>","DOI":"10.1007\/s41060-025-00965-y","type":"journal-article","created":{"date-parts":[[2026,1,12]],"date-time":"2026-01-12T08:40:20Z","timestamp":1768207220000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["An empirical evaluation of clustering processes for early detection of university dropout"],"prefix":"10.1007","volume":"22","author":[{"given":"Fran","family":"Melchor","sequence":"first","affiliation":[]},{"given":"Jos\u00e9 M.","family":"Conejero","sequence":"additional","affiliation":[]},{"given":"Antonio Jes\u00fas","family":"Fern\u00e1ndez-Garc\u00eda","sequence":"additional","affiliation":[]},{"given":"Fernando","family":"S\u00e1nchez-Figueroa","sequence":"additional","affiliation":[]},{"given":"Roberto","family":"Rodr\u00edguez-Echeverr\u00eda","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2026,1,12]]},"reference":[{"key":"965_CR1","volume-title":"Indicadores Sint\u00e9ticos de las Universidades Espa\u00f1olas","author":"F P\u00e9rez","year":"2019","unstructured":"P\u00e9rez, F., Ald\u00e1s, J.: Indicadores Sint\u00e9ticos de las Universidades Espa\u00f1olas. Fundaci\u00f3n BBVA e Ivie, Tlaxcala Barrio de Tlaxcala, Mexico (2019)"},{"key":"965_CR2","doi-asserted-by":"publisher","first-page":"189069","DOI":"10.1109\/ACCESS.2020.3031572","volume":"8","author":"AJ Fern\u00e1ndez-Garc\u00eda","year":"2020","unstructured":"Fern\u00e1ndez-Garc\u00eda, A.J., Rodr\u00edguez-Echeverr\u00eda, R., Preciado, J.C., Manzano, J.M.C., S\u00e1nchez-Figueroa, F.: Creating a recommender system to support higher education students in the subject enrollment decision. IEEE Access. 8, 189069\u2013189088 (2020). 
https:\/\/doi.org\/10.1109\/ACCESS.2020.3031572","journal-title":"IEEE Access."},{"key":"965_CR3","doi-asserted-by":"publisher","first-page":"133076","DOI":"10.1109\/ACCESS.2021.3115851","volume":"9","author":"AJ Fern\u00e1ndez-Garc\u00eda","year":"2021","unstructured":"Fern\u00e1ndez-Garc\u00eda, A.J., Preciado, J.C., Melchor, F., Rodriguez-Echeverria, R., Conejero, J.M., S\u00e1nchez-Figueroa, F.: A real-life machine learning experience for predicting university dropout at different stages using academic data. IEEE Access. 9, 133076\u2013133090 (2021). https:\/\/doi.org\/10.1109\/ACCESS.2021.3115851","journal-title":"IEEE Access."},{"key":"965_CR4","doi-asserted-by":"publisher","unstructured":"Palani, K., Stynes, P., Pathak, P.,: Clustering techniques to identify low-engagement student levels. In: International Conference On Computer Supported Education, CSEDU - Proceedings, vol. 2, pp. 248\u2013257 (2021). https:\/\/doi.org\/10.5220\/0010456802480257","DOI":"10.5220\/0010456802480257"},{"key":"965_CR5","doi-asserted-by":"publisher","unstructured":"Agarwal, K., Maheshwari, E., Roy, C., Pandey, M., Rautray, S.S.: Analyzing student performance in engineering placement using data mining. In: Lecture notes on data engineering and communications technologies vol. 28, pp. 171\u2013181. Springer, New York, USA (2019). https:\/\/doi.org\/10.1007\/978-981-13-6459-4_18","DOI":"10.1007\/978-981-13-6459-4_18"},{"issue":"2","key":"965_CR6","doi-asserted-by":"publisher","first-page":"187","DOI":"10.1007\/s12351-008-0032-4","volume":"11","author":"Y Psaromiligkos","year":"2011","unstructured":"Psaromiligkos, Y., Orfanidou, M., Kytagias, C., Zafiri, E.: Mining log data for the analysis of learners\u2019 behaviour in web-based learning management systems. Oper. Res. Int. J. 11(2), 187\u2013200 (2011). https:\/\/doi.org\/10.1007\/s12351-008-0032-4","journal-title":"Oper. Res. Int. 
J."},{"key":"965_CR7","doi-asserted-by":"publisher","unstructured":"Baruque, C.B., Amaral, M.A., Barcellos, A., Da Silva, J.C., Freitas, C.J.: Longo, analysing users\u2019 access logs in moodle to improve e learning. In: Euro American conference on telematics and information systems - proceedings of the 2007 Euro American conference on telematics and information systems, EATIS 2007. Association for computing machinery, New York, USA (2007). https:\/\/doi.org\/10.1145\/1352694.1352767","DOI":"10.1145\/1352694.1352767"},{"key":"965_CR8","doi-asserted-by":"crossref","unstructured":"Rodriguez, M.Z., Comin, C.H., Casanova, D., Bruno, O.M., Amancio, D.R., Costa, L.d.F., Rodrigues, F.A.: Clustering algorithms: a comparative approach. Plos One. 14(1), 1\u201334 (2019)","DOI":"10.1371\/journal.pone.0210236"},{"issue":"3","key":"965_CR9","first-page":"161","volume":"15","author":"F Agrusti","year":"2019","unstructured":"Agrusti, F., Bonavolont\u00e0, G., Mezzini, M.: University dropout prediction through educational data mining techniques: a systematic review. J. E-Learn. Knowl. Soc. 15(3), 161\u2013182 (2019)","journal-title":"J. E-Learn. Knowl. Soc."},{"issue":"2","key":"965_CR10","doi-asserted-by":"publisher","first-page":"456","DOI":"10.1111\/obes.12277","volume":"81","author":"D Sansone","year":"2019","unstructured":"Sansone, D.: Beyond early warning indicators: high school dropout and machine learning. Oxf. Bull. Econ. Stat. 81(2), 456\u2013485 (2019). https:\/\/doi.org\/10.1111\/obes.12277","journal-title":"Oxf. Bull. Econ. Stat."},{"issue":"3","key":"965_CR11","doi-asserted-by":"publisher","first-page":"468","DOI":"10.3390\/electronics11030468","volume":"11","author":"MM Tamada","year":"2022","unstructured":"Tamada, M.M., Giusti, R., Netto, JFd.M.: Predicting students at risk of dropout in technical course using lms logs. 
Electron 11(3), 468 (2022)","journal-title":"Electron"},{"issue":"1","key":"965_CR12","doi-asserted-by":"publisher","first-page":"104","DOI":"10.12928\/telkomnika.v21i1.24238","volume":"21","author":"R Widayanti","year":"2023","unstructured":"Widayanti, R., Madenda, S., Wibowo, E.P., Anwar, K.: SOM-SIS approach to auto summary of clustering results on university academic performance. Telkomnika (Telecommun. Comp. Electron Control) 21(1), 104\u2013112 (2023)","journal-title":"Telkomnika (Telecommun. Comp. Electron Control)"},{"key":"965_CR13","doi-asserted-by":"publisher","unstructured":"Balioti, V., Tzimopoulos, C., Evangelides, C.: Multi-criteria decision making using TOPSIS method under fuzzy environment. application in spillway selection. In: Proceedings 2018, vol. 2, p. 637. Proceedings, MDPI, Basel, Switzerland (2018). https:\/\/doi.org\/10.3390\/proceedings2110637 . https:\/\/www.mdpi.com\/2504-3900\/2\/11\/637\/htmhttps:\/\/www.mdpi.com\/2504-3900\/2\/11\/637","DOI":"10.3390\/proceedings2110637"},{"key":"965_CR14","doi-asserted-by":"publisher","unstructured":"Al-shargabi, A.A., Nusari, A.N.: Discovering vital patterns from UST students data by applying data mining techniques. In: 2010 The 2nd international conference on computer and automation engineering, ICCAE 2010, vol. 2, pp. 547\u2013551 (2010). https:\/\/doi.org\/10.1109\/ICCAE.2010.5451653","DOI":"10.1109\/ICCAE.2010.5451653"},{"issue":"1","key":"965_CR15","doi-asserted-by":"publisher","first-page":"32","DOI":"10.1007\/s44196-022-00087-4","volume":"15","author":"T Wang","year":"2022","unstructured":"Wang, T., Xiao, B., Ma, W.: Student behavior data analysis based on association rule mining. Int. J. Comput. Intell. Syst. 15(1), 32 (2022)","journal-title":"Int. J. Comput. Intell. 
Syst."},{"key":"965_CR16","doi-asserted-by":"publisher","DOI":"10.1186\/s13673-016-0083-0","author":"N Iam-On","year":"2017","unstructured":"Iam-On, N., Boongoen, T.: Generating descriptive model for student dropout: a review of clustering approach. Springer (2017). https:\/\/doi.org\/10.1186\/s13673-016-0083-0","journal-title":"Springer"},{"issue":"19","key":"965_CR17","doi-asserted-by":"publisher","first-page":"9467","DOI":"10.3390\/app12199467","volume":"12","author":"AF Mohamed Nafuri","year":"2022","unstructured":"Mohamed Nafuri, A.F., Sani, N.S., Zainudin, N.F.A., Rahman, A.H.A., Aliff, M.: Clustering analysis for classifying student academic performance in higher education. Appl. Sci. 12(19), 9467 (2022)","journal-title":"Appl. Sci."},{"issue":"2","key":"965_CR18","doi-asserted-by":"publisher","first-page":"325","DOI":"10.1111\/ejed.12433","volume":"56","author":"A Behr","year":"2021","unstructured":"Behr, A., Giese, M., Teguim Kamdjou, H.D., Theune, K.: Motives for dropping out from higher education\u2013an analysis of bachelor\u2019s degree students in Germany. Eur. J. Educ. 56(2), 325\u2013343 (2021). https:\/\/doi.org\/10.1111\/ejed.12433","journal-title":"Eur. J. Educ."},{"key":"965_CR19","doi-asserted-by":"publisher","unstructured":"Marchal, S., Szyller, S.: Detecting organized ecommerce fraud using scalable categorical clustering. In: Proceedings of the 35th annual computer security applications conference. ACSAC \u201919, pp. 215\u2013228. Association for computing machinery, New York, NY, USA (2019). https:\/\/doi.org\/10.1145\/3359789.3359810","DOI":"10.1145\/3359789.3359810"},{"key":"965_CR20","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0210236","author":"MZ Rodriguez","year":"2019","unstructured":"Rodriguez, M.Z., Comin, C.H., Casanova, D., Bruno, O.M., Amancio, D.R., Rodrigues, F.A., Costa, Ld.F.: Rodrigues, clustering algorithms: a comparative approach. Plos One. (2019). 
https:\/\/doi.org\/10.1371\/journal.pone.0210236","journal-title":"Plos One."},{"issue":"12","key":"965_CR21","doi-asserted-by":"publisher","first-page":"1650","DOI":"10.1109\/TPAMI.2002.1114856","volume":"24","author":"U Maulik","year":"2002","unstructured":"Maulik, U., Bandyopadhyay, S.: Performance evaluation of some clustering algorithms and validity indices. IEEE Trans. Pattern Anal. Mach. Intell. 24(12), 1650\u20131654 (2002). https:\/\/doi.org\/10.1109\/TPAMI.2002.1114856","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"965_CR22","doi-asserted-by":"publisher","first-page":"68","DOI":"10.1016\/j.knosys.2018.10.019","volume":"164","author":"AJ Fern\u00e1ndez-Garc\u00eda","year":"2019","unstructured":"Fern\u00e1ndez-Garc\u00eda, A.J., Iribarne, L., Corral, A., Criado, J., Wang, J.Z.: A recommender system for component-based applications using machine learning techniques. Knowl.based syst. 164, 68\u201384 (2019). https:\/\/doi.org\/10.1016\/j.knosys.2018.10.019","journal-title":"Knowl.based syst."},{"key":"965_CR23","doi-asserted-by":"publisher","unstructured":"Ramasubramanian, K., Singh, A.: Feature engineering, pp. 181\u2013217. Apress, Berkeley, CA (2017). https:\/\/doi.org\/10.1007\/978-1-4842-2334-5_5","DOI":"10.1007\/978-1-4842-2334-5_5"},{"issue":"3","key":"965_CR24","doi-asserted-by":"publisher","first-page":"482","DOI":"10.1111\/j.1751-5823.2009.00095_18.x","volume":"77","author":"T Hastie","year":"2009","unstructured":"Hastie, T., Tibshirani, R., Friedman, J.: The elements of statistical learning: Data mining, inference, and prediction. Int. Stat. Rev. 77(3), 482\u2013482 (2009). https:\/\/doi.org\/10.1111\/j.1751-5823.2009.00095_18.x","journal-title":"Int. Stat. 
Rev."},{"issue":"4","key":"965_CR25","doi-asserted-by":"publisher","first-page":"563","DOI":"10.2190\/CS.16.4.e","volume":"16","author":"D Raju","year":"2015","unstructured":"Raju, D., Schumacker, R.: Exploring student characteristics of retention that lead to graduation in higher education using data mining models. J. Coll. Stud. Retent.: Res. Theory Pract. 16(4), 563\u2013591 (2015). https:\/\/doi.org\/10.2190\/CS.16.4.e","journal-title":"J. Coll. Stud. Retent.: Res. Theory Pract."},{"issue":"1","key":"965_CR26","doi-asserted-by":"publisher","first-page":"28","DOI":"10.1080\/21568235.2020.1718520","volume":"10","author":"L Kemper","year":"2020","unstructured":"Kemper, L., Vorhoff, G., Wigger, B.U.: Predicting student dropout: a machine learning approach. European J. High. Educ. 10(1), 28\u201347 (2020). https:\/\/doi.org\/10.1080\/21568235.2020.1718520","journal-title":"European J. High. Educ."},{"key":"965_CR27","doi-asserted-by":"publisher","unstructured":"Perez, B., Castellanos, C., Correal, D.: Applying data mining techniques to predict student dropout: a case study. 2018 IEEE 1st colombian conference on applications in computational intelligence, ColCACI 2018 - Proceedings (2018) https:\/\/doi.org\/10.1109\/COLCACI.2018.8484847","DOI":"10.1109\/COLCACI.2018.8484847"},{"key":"965_CR28","doi-asserted-by":"publisher","first-page":"164","DOI":"10.1186\/S13052-025-01983-Z","volume":"51","author":"Y Tao","year":"2025","unstructured":"Tao, Y., Cheng, W., Zhen, H., Shen, J., Guan, H., Liu, Z.: Global time-trend analysis and projections of disease burden for neuroblastic tumors: a worldwide study from 1990 to 2021. Italian J. Pediatr 51, 164 (2025). https:\/\/doi.org\/10.1186\/S13052-025-01983-Z","journal-title":"Italian J. Pediatr"},{"key":"965_CR29","doi-asserted-by":"publisher","unstructured":"Freedman, D., Diaconis, P.: On the histogram as a density estimator:l2 theory. 
Zeitschrift f\u00fcr Wahrscheinlichkeitstheorie und Verwandte Gebiete 57, 453\u2013476 (1981) https:\/\/doi.org\/10.1007\/BF01025868\/METRICS","DOI":"10.1007\/BF01025868\/METRICS"},{"key":"965_CR30","doi-asserted-by":"publisher","unstructured":"Fern\u00e1ndez-Garc\u00eda, A.J., Iribarne, L., Corral, A., Criado, J.: A comparison of feature selection methods to optimize predictive models based on decision forest algorithms for academic data analysis. In: advances in intelligent systems and computing, vol. 745, pp. 338\u2013347. Springer, New York, USA (2018). https:\/\/doi.org\/10.1007\/978-3-319-77703-0_35","DOI":"10.1007\/978-3-319-77703-0_35"},{"issue":"66\u201371","key":"965_CR31","first-page":"13","volume":"10","author":"L Van Der Maaten","year":"2009","unstructured":"Van Der Maaten, L., Postma, E.O., Van Den Herik, H.J., et al.: Dimensionality reduction: a comparative review. J. Mach. Learn. Res. 10(66\u201371), 13 (2009)","journal-title":"J. Mach. Learn. Res."},{"key":"965_CR32","first-page":"20","volume":"2","author":"B Mohammed","year":"2021","unstructured":"Mohammed, B., Hasan, S., Abdulazeez, A.M.: A review of principal component analysis algorithm for dimensionality reduction. J. Soft Comput. Data Mining 2, 20\u201330 (2021)","journal-title":"J. Soft Comput. Data Mining"},{"key":"965_CR33","doi-asserted-by":"publisher","first-page":"2","DOI":"10.1186\/1745-6150-2-2","volume":"2","author":"R Cangelosi","year":"2007","unstructured":"Cangelosi, R., Goriely, A.: Component retention in principal component analysis with application to cdna microarray data. Biol. Direct 2, 2 (2007). https:\/\/doi.org\/10.1186\/1745-6150-2-2","journal-title":"Biol. Direct"},{"key":"965_CR34","doi-asserted-by":"publisher","unstructured":"Gionis, A., Mannila, H., Tsaparas, P.: Clustering aggregation. In: Proceedings - international conference on data engineering, pp. 341\u2013352 (2005). 
https:\/\/doi.org\/10.1109\/ICDE.2005.34","DOI":"10.1109\/ICDE.2005.34"},{"key":"965_CR35","doi-asserted-by":"publisher","unstructured":"Guha, S., Rastogi, R., Shim, K.: ROCK: a robust clustering algorithm for categorical attributes. Proceedings - international conference on data engineering, 512\u2013521 (1999) https:\/\/doi.org\/10.1109\/icde.1999.754967","DOI":"10.1109\/icde.1999.754967"},{"key":"965_CR36","doi-asserted-by":"crossref","unstructured":"Barbar\u00e1, D., Couto, J., Li, Y.: COOLCAT: an entropy-based algorithm for categorical clustering. In: International conference on information and knowledge management, proceedings, pp. 582\u2013589 (2002)","DOI":"10.1145\/584792.584888"},{"issue":"1","key":"965_CR37","doi-asserted-by":"publisher","first-page":"35","DOI":"10.1007\/s00357-001-0004-3","volume":"18","author":"A Chaturvedi","year":"2001","unstructured":"Chaturvedi, A., Foods, K., Green, P.E., Carroll, J.D.: K-modes clustering. J. Classif. 18(1), 35\u201355 (2001). https:\/\/doi.org\/10.1007\/s00357-001-0004-3","journal-title":"J. Classif."},{"key":"965_CR38","doi-asserted-by":"publisher","unstructured":"Mannor, S., Jin, X., Han, J., Jin, X., Han, J., Jin, X., Han, J., Zhang, X.: K-means clustering. In: Encyclopedia of machine learning, pp. 563\u2013564. Springer, New York, USA (2011). https:\/\/doi.org\/10.1007\/978-0-387-30164-8_425","DOI":"10.1007\/978-0-387-30164-8_425"},{"key":"965_CR39","doi-asserted-by":"publisher","unstructured":"Zepeda-Mendoza, M.L., Resendis-Antonio, O.: Hierarchical agglomerative clustering. in: encyclopedia of systems biology, pp. 886\u2013887. 
Springer, New York, USA (2013).https:\/\/doi.org\/10.1007\/978-1-4419-9863-7_1371","DOI":"10.1007\/978-1-4419-9863-7_1371"},{"issue":"11","key":"965_CR40","doi-asserted-by":"publisher","first-page":"0188274","DOI":"10.1371\/journal.pone.0188274","volume":"12","author":"M Hummel","year":"2017","unstructured":"Hummel, M., Edelmann, D., Kopp-Schneider, A.: Clustering of samples and variables with mixed-type data. Plos One 12(11), 0188274 (2017)","journal-title":"Plos One"},{"issue":"3","key":"965_CR41","doi-asserted-by":"publisher","first-page":"155","DOI":"10.1016\/j.ins.2003.03.011","volume":"159","author":"S Hirano","year":"2004","unstructured":"Hirano, S., Sun, X., Tsumoto, S.: Comparison of clustering methods for clinical databases. Inf. Sci. 159(3), 155\u2013165 (2004). https:\/\/doi.org\/10.1016\/j.ins.2003.03.011","journal-title":"Inf. Sci."},{"key":"965_CR42","doi-asserted-by":"crossref","unstructured":"Reynolds, D.A.: Gaussian mixture models. In: Encyclopedia of biometrics (2009)","DOI":"10.1007\/978-0-387-73003-5_196"},{"key":"965_CR43","doi-asserted-by":"publisher","unstructured":"Shahapure, K.R., Nicholas, C.: Cluster quality analysis using silhouette score. In: 2020 IEEE 7th international conference on data science and advanced analytics (DSAA), pp. 747\u2013748 (2020). https:\/\/doi.org\/10.1109\/DSAA49011.2020.00096","DOI":"10.1109\/DSAA49011.2020.00096"},{"key":"965_CR44","unstructured":"Zhao, Y., Karypis, G.: Criterion functions for document clustering: Experiments and analysis (2001)"},{"key":"965_CR45","doi-asserted-by":"crossref","unstructured":"Li, T., Ma, S., Ogihara, M.: Entropy-based criterion in categorical clustering. In: Proceedings of the twenty-first international conference on machine learning, p. 68 (2004)","DOI":"10.1145\/1015330.1015404"},{"key":"965_CR46","doi-asserted-by":"publisher","unstructured":"Serra, A., Greco, D., Tagliaferri, R.: Impact of different metrics on multi-view clustering. 
In: 2015 international joint conference on neural networks (IJCNN), pp. 1\u20138 (2015). https:\/\/doi.org\/10.1109\/IJCNN.2015.7280445","DOI":"10.1109\/IJCNN.2015.7280445"},{"key":"965_CR47","doi-asserted-by":"publisher","unstructured":"Senoussaoui, M., Kenny, P., Dumouchel, P., Stafylakis, T.: Efficient iterative mean shift based cosine dissimilarity for multi-recording speaker clustering. In: 2013 IEEE international conference on acoustics, speech and signal processing, pp. 7712\u20137715 (2013). https:\/\/doi.org\/10.1109\/ICASSP.2013.6639164","DOI":"10.1109\/ICASSP.2013.6639164"},{"issue":"2","key":"965_CR48","first-page":"459","volume":"36","author":"W He","year":"2023","unstructured":"He, W., Hung, J.-L., Liu, L.: Impact of big data analytics on banking: a case study. J. Enterp. Inf. Manag. 36(2), 459\u2013479 (2023)","journal-title":"J. Enterp. Inf. Manag."},{"key":"965_CR49","unstructured":"Alghofaili, Y.: Interpretable K-Means: clusters feature importances (2020). https:\/\/towardsdatascience.com\/interpretable-k-means-clusters-feature-importances-7e516eeb8d3c"},{"key":"965_CR50","doi-asserted-by":"crossref","unstructured":"Zhao, Q., Xu, M., Fr\u00e4nti, P.,: Sum-of-squares based cluster validity index and significance analysis. In: International conference on adaptive and natural computing algorithms, pp. 313\u2013322 (2009). Springer","DOI":"10.1007\/978-3-642-04921-7_32"},{"key":"965_CR51","doi-asserted-by":"publisher","unstructured":"Gonz\u00e1lez, F.J.M., Rodriguez-Echeverria, R., Conejero, J.M., Prieto, A.E., Rodriguez-Echeverriaz, J.D.G.: A model-driven approach for systematic reproducibility and replicability of data science projects. In: Franch, X., Poels, G., Gailly, F., Snoeck, M. (eds.) advanced information systems engineering - 34th international conference, caise 2022, leuven, belgium, june 6-10, 2022, proceedings. lecture notes in computer science, vol. 13295, pp. 147\u2013163. Springer (2022). 
https:\/\/doi.org\/10.1007\/978-3-031-07472-1_9","DOI":"10.1007\/978-3-031-07472-1_9"},{"key":"965_CR52","doi-asserted-by":"publisher","unstructured":"Ji, J., Bai, T., Zhou, C., Ma, C., Wang, Z.: An improved k-prototypes clustering algorithm for mixed numeric and categorical data. Neurocomputing 120, 590\u2013596 (2013) https:\/\/doi.org\/10.1016\/j.neucom.2013.04.011 . image feature detection and description","DOI":"10.1016\/j.neucom.2013.04.011"},{"issue":"2","key":"965_CR53","doi-asserted-by":"publisher","first-page":"1313","DOI":"10.1007\/s10586-017-0818-3","volume":"20","author":"X Liu","year":"2017","unstructured":"Liu, X., Yang, Q., He, L.: A novel dbscan with entropy and probability for mixed data. Clust. Comput. 20(2), 1313\u20131323 (2017). https:\/\/doi.org\/10.1007\/s10586-017-0818-3","journal-title":"Clust. Comput."}],"container-title":["International Journal of Data Science and Analytics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s41060-025-00965-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s41060-025-00965-y","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s41060-025-00965-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,1,12]],"date-time":"2026-01-12T08:40:24Z","timestamp":1768207224000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s41060-025-00965-y"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,1,12]]},"references-count":53,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2026,12]]}},"alternative-id":["965"],"URL":"https:\/\/doi.org\/10.1007\/s41060-025-00965-y","relation":{},"ISSN":["2364-415X","2364-4168"],"issn-type
":[{"value":"2364-415X","type":"print"},{"value":"2364-4168","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,1,12]]},"assertion":[{"value":"18 March 2025","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"1 August 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"12 January 2026","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no Conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}],"article-number":"25"}}