{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,15]],"date-time":"2026-04-15T20:09:04Z","timestamp":1776283744384,"version":"3.50.1"},"reference-count":51,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2025,11,5]],"date-time":"2025-11-05T00:00:00Z","timestamp":1762300800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["MAKE"],"abstract":"<jats:p>Spectral clustering has established itself as a powerful technique for data partitioning across various domains due to its ability to handle complex cluster structures. However, its computational efficiency remains a challenge, especially with large datasets. In this paper, we propose an enhancement of spectral clustering by integrating Cover tree data structure to optimize the nearest neighbor search, a crucial step in the construction of similarity graphs. Cover trees are a type of spatial tree that allow for efficient exact nearest neighbor queries in high-dimensional spaces. By embedding this technique into the spectral clustering framework, we achieve significant reductions in computational cost while maintaining clustering accuracy. Through extensive experiments on random, synthetic, and real-world datasets, we demonstrate that our approach outperforms traditional spectral clustering methods in terms of scalability and execution speed, without compromising the quality of the resultant clusters. This work provides a more efficient utilization of spectral clustering in big data applications.<\/jats:p>","DOI":"10.3390\/make7040139","type":"journal-article","created":{"date-parts":[[2025,11,5]],"date-time":"2025-11-05T16:09:47Z","timestamp":1762358987000},"page":"139","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Cover Tree-Optimized Spectral Clustering: Efficient Nearest Neighbor Search for Large-Scale Data Partitioning"],"prefix":"10.3390","volume":"7","author":[{"ORCID":"https:\/\/orcid.org\/0009-0004-1189-1352","authenticated-orcid":false,"given":"Abderrafik","family":"Laakel Hemdanou","sequence":"first","affiliation":[{"name":"Department of Mathematics and Computer Science, Normal Higher School, Abdelmalek Essaadi University, P.O. Box 209, Tetouan 93000, Morocco"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-5334-2383","authenticated-orcid":false,"given":"Youssef","family":"Achtoun","sequence":"additional","affiliation":[{"name":"Department of Mathematics and Computer Science, Normal Higher School, Abdelmalek Essaadi University, P.O. Box 209, Tetouan 93000, Morocco"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sara","family":"Mouali","sequence":"additional","affiliation":[{"name":"Department of Mathematics and Computer Science, Normal Higher School, Abdelmalek Essaadi University, P.O. Box 209, Tetouan 93000, Morocco"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8270-2660","authenticated-orcid":false,"given":"Mohammed","family":"Lamarti Sefian","sequence":"additional","affiliation":[{"name":"Department of Mathematics and Computer Science, Normal Higher School, Abdelmalek Essaadi University, P.O. Box 209, Tetouan 93000, Morocco"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Vesna","family":"\u0160e\u0161um \u010cavi\u0107","sequence":"additional","affiliation":[{"name":"Faculty of Civil Engineering, University of Belgrade, Bulevar Kralja Aleksandra 73, 11000 Belgrade, Serbia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8254-6688","authenticated-orcid":false,"given":"Stojan","family":"Radenovi\u0107","sequence":"additional","affiliation":[{"name":"Faculty of Mechanical Engineering, University of Belgrade, Kraljice Marije 16, 11120 Belgrade, Serbia"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2025,11,5]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"395","DOI":"10.1007\/s11222-007-9033-z","article-title":"A Tutorial on Spectral Clustering","volume":"17","year":"2007","journal-title":"Stat. Comput."},{"key":"ref_2","unstructured":"Ng, A.Y., Jordan, M.I., and Weiss, Y. (2001, January 3\u20138). On Spectral Clustering: Analysis and an Algorithm. Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic (NIPS\u201901), Vancouver, BC, Canada."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"888","DOI":"10.1109\/34.868688","article-title":"Normalized Cuts and Image Segmentation","volume":"22","author":"Shi","year":"2000","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_4","unstructured":"Li, M., Bi, W., Kwok, J.T., and Lu, B.L. (2011, January 16\u201322). Large-Scale Spectral Clustering on Graphs. Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence (IJCAI\u201911), Barcelona, Spain."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"214","DOI":"10.1109\/TPAMI.2004.1262185","article-title":"Spectral Grouping Using the Nystrom Method","volume":"26","author":"Fowlkes","year":"2004","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Beygelzimer, A., Kakade, S., and Langford, J. (2006, January 25\u201329). Cover Trees for Nearest Neighbor. Proceedings of the 23rd International Conference on Machine Learning (ICML \u201906), Pittsburgh, PA, USA.","DOI":"10.1145\/1143844.1143857"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"121607","DOI":"10.1016\/j.apenergy.2023.121607","article-title":"Interpretable building energy consumption forecasting using spectral clustering algorithm and temporal fusion transformers architecture","volume":"349","author":"Zheng","year":"2023","journal-title":"Appl. Energy"},{"key":"ref_8","unstructured":"Elkin, Y. (2022). A New Compressed Cover Tree for k-Nearest Neighbour Search and the Stable-Under-Noise Mergegram of a Point Cloud. [Ph.D. Thesis, University of Liverpool]."},{"key":"ref_9","unstructured":"Driver, H.E., and Kroeber, A.L. (1932). Quantitative Expression of Cultural Relationships, University of California."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"508","DOI":"10.1037\/h0055441","article-title":"A Technique for Measuring Like-Mindedness","volume":"33","author":"Zubin","year":"1938","journal-title":"J. Abnorm. Soc. Psychol."},{"key":"ref_11","unstructured":"Tryon, R.C. (1939). Cluster Analysis: Correlation Profile and Orthometric Analysis for the Isolation of Unities in Mind and Personality, Edward Brothers."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"241","DOI":"10.1007\/BF02289588","article-title":"Hierarchical Clustering Schemes","volume":"32","author":"Johnson","year":"1967","journal-title":"Psychometrika"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"661","DOI":"10.3390\/math11030661","article-title":"Reduced Clustering Method Based on the Inversion Formula Density Estimation","volume":"11","author":"Lukauskas","year":"2023","journal-title":"Mathematics"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"4459","DOI":"10.1007\/s11192-022-04463-x","article-title":"A New Clustering Method to Explore the Dynamics of Research Communities","volume":"127","author":"Cambe","year":"2022","journal-title":"Scientometrics"},{"key":"ref_15","unstructured":"Chung, F.R.K. (1997). Spectral Graph Theory, American Mathematical Society."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"298","DOI":"10.21136\/CMJ.1973.101168","article-title":"Algebraic Connectivity of Graphs","volume":"23","author":"Fiedler","year":"1973","journal-title":"Czechoslov. Math. J."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Lehoucq, R.B., Sorensen, D.C., and Yang, C. (1998). ARPACK Users\u2019 Guide: Solution of Large-Scale Eigenvalue Problems with Implicitly Restarted Arnoldi Methods. Software, Environments, and Tools, SIAM.","DOI":"10.1137\/1.9780898719628"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"2165","DOI":"10.1109\/TGRS.2008.2011432","article-title":"Generalization of Subpixel Analysis for Hyperspectral Data with Flexibility in Spectral Similarity Measures","volume":"47","author":"Chen","year":"2009","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"1969","DOI":"10.1007\/s11280-019-00731-8","article-title":"Spectral clustering via half-quadratic optimization","volume":"23","author":"Zhu","year":"2020","journal-title":"World Wide Web"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"20149","DOI":"10.1007\/s11042-017-4566-4","article-title":"Improved spectral clustering based on Nystr\u00f6m method","volume":"76","author":"Zhan","year":"2017","journal-title":"Multimed. Tools Appl."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1007\/s12667-017-0240-1","article-title":"Improved spectral clustering for multi-objective controlled islanding of power grid","volume":"10","author":"Goubko","year":"2017","journal-title":"Energy Syst."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"1303","DOI":"10.1007\/s11590-020-01639-3","article-title":"Improved analysis of spectral algorithm for clustering","volume":"15","author":"Mizutani","year":"2021","journal-title":"Optim. Lett."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Yan, J., Cheng, D., Zong, M., and Deng, Z. (2014, January 19\u201324). Improved Spectral Clustering Algorithm Based on Similarity Measure. Proceedings of the International Conference on Advanced Data Mining and Applications, Guilin, China.","DOI":"10.1007\/978-3-319-14717-8_50"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"443","DOI":"10.1007\/s00453-012-9740-5","article-title":"Improved Approximation Algorithms for the Min-max Tree Cover and Bounded Tree Cover Problems","volume":"69","author":"Khani","year":"2011","journal-title":"Algorithmica"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"100301","DOI":"10.1016\/j.caeai.2024.100301","article-title":"Comparative analysis of feature selection and extraction methods for student performance prediction across different machine learning models","volume":"7","author":"Hemdanou","year":"2024","journal-title":"Comput. Educ. Artif. Intell."},{"key":"ref_26","unstructured":"Elkin, Y.G., and Kurlin, V. (2021, January 18\u201324). A new near-linear time algorithm for k-nearest neighbor search using a compressed cover tree. Proceedings of the International Conference on Machine Learning, Online."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Ren, S., Zhang, S., and Wu, T. (2020). An Improved Spectral Clustering Community Detection Algorithm Based on Probability Matrix. Discrete Dynamics in Nature and Society, John Wiley & Sons Ltd.","DOI":"10.1155\/2020\/4540302"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"146","DOI":"10.1109\/TAI.2021.3065894","article-title":"A survey on multiview clustering","volume":"2","author":"Chao","year":"2021","journal-title":"IEEE Trans. Artif. Intell."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"5365","DOI":"10.1109\/TWC.2022.3233483","article-title":"Improved Spectral Efficiency in STAR-RIS Aided Uplink Communication Using Rate Splitting Multiple Access","volume":"22","author":"Katwe","year":"2023","journal-title":"IEEE Trans. Wirel. Commun."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"123","DOI":"10.1016\/j.acha.2022.02.004","article-title":"Improved spectral convergence rates for graph Laplacians on \u03f5-graphs and k-NN graphs","volume":"60","author":"Calder","year":"2022","journal-title":"Appl. Comput. Harmon. Anal."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Wang, W., Li, J., and Lu, S. (2023). Application of Signal Denoising Technology Based on Improved Spectral Subtraction in Arc Fault Detection. Electronics, 12.","DOI":"10.3390\/electronics12143147"},{"key":"ref_32","unstructured":"Fisher, R.A. (1988). Iris, UCI Machine Learning Repository."},{"key":"ref_33","unstructured":"Charytanowicz, M., Niewczas, J., Kulczycki, P., Kowalski, P., and Lukasik, S. (2012). Seeds, UCI Machine Learning Repository."},{"key":"ref_34","unstructured":"German, B. (1987). Glass Identification, UCI Machine Learning Repository."},{"key":"ref_35","unstructured":"Kaur, G., and Rani, L. (2023, January 23\u201324). Mall Customer Segmentation Using K-Means Clustering. Proceedings of the International Conference on Data Analytics and Management, London, UK."},{"key":"ref_36","unstructured":"Wolberg, W., Mangasarian, O., Street, N., and Street, W. (1995). Breast Cancer Wisconsin (Diagnostic), UCI Machine Learning Repository."},{"key":"ref_37","first-page":"433","article-title":"A supervised machine learning algorithm for arrhythmia analysis","volume":"24","author":"Acar","year":"1997","journal-title":"Comput. Cardiol."},{"key":"ref_38","unstructured":"Nene, S.A., Nayar, S.K., and Murase, H. (1996). Columbia Object Image Library (Coil-20), Columbia University. Technical Report CUCS-005-96."},{"key":"ref_39","unstructured":"G\u00fcveni, H.A., Demir\u00f6z, G., and Ilter, N. (1998). Dermatology Dataset, UCI Machine Learning Repository."},{"key":"ref_40","unstructured":"Hofmann, H. (1994). Statlog (German Credit Data), UCI Machine Learning Repository."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"304","DOI":"10.1016\/0002-9149(89)90524-9","article-title":"International application of a new probability algorithm for the diagnosis of coronary artery disease","volume":"64","author":"Detrano","year":"1989","journal-title":"Am. J. Cardiol."},{"key":"ref_42","first-page":"262","article-title":"Classification of radar returns from the ionosphere using neural networks","volume":"10","author":"Sigillito","year":"1989","journal-title":"Johns Hopkins APL Tech. Dig."},{"key":"ref_43","unstructured":"Hopkins, M., Reeber, E., Forman, G., and Suermondt, J. (1999). Spambase Dataset, UCI Machine Learning Repository."},{"key":"ref_44","unstructured":"Street, W.N., Wolberg, W.H., and Mangasarian, O.L. (February, January 31). Nuclear feature extraction for breast tumor diagnosis. Proceedings of the IS and T\/SPIE Symposium on Electronic Imaging: Science and Technology, San Jose, CA, USA."},{"key":"ref_45","unstructured":"Yin, H., Aryani, A., Petrie, S., Nambissan, A., Astudillo, A., and Cao, S. (2024). A Rapid Review of Clustering Algorithms. arXiv."},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"165","DOI":"10.1007\/s40745-015-0040-1","article-title":"A Comprehensive Survey of Clustering Algorithms","volume":"2","author":"Xu","year":"2015","journal-title":"Ann. Data Sci."},{"key":"ref_47","first-page":"1","article-title":"Statistical Comparisons of Classifiers over Multiple Data Sets","volume":"7","year":"2006","journal-title":"J. Mach. Learn. Res."},{"key":"ref_48","unstructured":"Blair, M.D., Huang, X., and Sogge, C.D. (2022). Improved spectral projection estimates. J. Eur. Math. Soc."},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Achtoun, Y., Gardasevi\u0107-Filipovi\u0107, M., Mitrovi\u0107, S., and Radenovi\u0107, S. (2024). On Pre\u0161i\u0107-Type Mappings: Survey. Symmetry, 16.","DOI":"10.3390\/sym16040415"},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Hemdanou, A.L., Achtoun, Y., Sefian, M.L., Tahiri, I., and Afia, A.E. (2025). Random Normed k-Means: A Paradigm-Shift in Clustering within Probabilistic Metric Spaces. arXiv.","DOI":"10.2139\/ssrn.5230900"},{"key":"ref_51","unstructured":"Hemdanou, A.L., Barkouk, H., Lamarti Sefian, M., Achtoun, Y., and Tahiri, I. (2024, January 17\u201323). Determinant Factors for Predicting Student Academic Success in Portuguese High Schools. Proceedings of the International Conference on Advanced Intelligent Systems for Sustainable Development, Agadir, Morocco."}],"container-title":["Machine Learning and Knowledge Extraction"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-4990\/7\/4\/139\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,11,5]],"date-time":"2025-11-05T16:39:12Z","timestamp":1762360752000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-4990\/7\/4\/139"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,11,5]]},"references-count":51,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["make7040139"],"URL":"https:\/\/doi.org\/10.3390\/make7040139","relation":{},"ISSN":["2504-4990"],"issn-type":[{"value":"2504-4990","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,11,5]]}}}