{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,7]],"date-time":"2026-03-07T01:41:07Z","timestamp":1772847667660,"version":"3.50.1"},"reference-count":30,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2020,9,27]],"date-time":"2020-09-27T00:00:00Z","timestamp":1601164800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Informatics"],"abstract":"<jats:p>The K-means algorithm is a well-known and widely used clustering algorithm due to its simplicity and convergence properties. However, one of the drawbacks of the algorithm is its instability. This paper presents improvements to the K-means algorithm using a K-dimensional tree (Kd-tree) data structure. The proposed Kd-tree is utilized as a data structure to enhance the choice of initial centers of the clusters and to reduce the number of the nearest neighbor searches required by the algorithm. The developed framework also includes an efficient center insertion technique leading to an incremental operation that overcomes the instability problem of the K-means algorithm. The results of the proposed algorithm were compared with those obtained from the K-means algorithm, K-medoids, and K-means++ in an experiment using six different datasets. The results demonstrated that the proposed algorithm provides superior and more stable clustering solutions.<\/jats:p>","DOI":"10.3390\/informatics7040038","type":"journal-article","created":{"date-parts":[[2020,9,27]],"date-time":"2020-09-27T22:24:42Z","timestamp":1601245482000},"page":"38","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["Tree-Based Algorithm for Stable and Efficient Data Clustering"],"prefix":"10.3390","volume":"7","author":[{"given":"Hasan","family":"Aljabbouli","sequence":"first","affiliation":[{"name":"Department of Information Systems, Colorado State University Global, Salida Way, Aurora, CO 80526, USA"}]},{"given":"Abdullah","family":"Albizri","sequence":"additional","affiliation":[{"name":"Department of Information Management &amp; Business Analytics, Montclair State University, Montclair, NJ 07043, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0407-9217","authenticated-orcid":false,"given":"Antoine","family":"Harfouche","sequence":"additional","affiliation":[{"name":"Department of SEGMI\u2013CEROS, University Paris Nanterre, 92000 Nanterre, France"}]}],"member":"1968","published-online":{"date-parts":[[2020,9,27]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Berkhin, P. (2006). A survey of clustering data mining techniques. Grouping Multidimensional Data, Springer.","DOI":"10.1007\/3-540-28349-8_2"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Abdullah, S.S., Rostamzadeh, N., Sedig, K., Garg, A.X., and McArthur, E. (2020). Visual Analytics for Dimension Reduction and Cluster Analysis of High Dimensional Electronic Health Records. Informatics, 7.","DOI":"10.3390\/informatics7020017"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"103397","DOI":"10.1016\/j.jbi.2020.103397","article-title":"FilterK: A new outlier detection method for k-means clustering of physical activity","volume":"104","author":"Jones","year":"2020","journal-title":"J. Biomed. Inform."},{"key":"ref_4","unstructured":"Jain, A.K., and Dubes, R.C. (1988). Algorithms for Clustering Data, Prentice Hall."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"264","DOI":"10.1145\/331499.331504","article-title":"Data Clustering: A Review","volume":"31","author":"Jain","year":"1999","journal-title":"ACM Comput. Surv."},{"key":"ref_6","unstructured":"MacQueen, J.B. (1967). Some Methods for Classification and Analysis of Multivariate Observations. 5th Berkeley Symposium on Mathematical Statistics and Probability, University of California Press."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Dobbins, C., and Rawassizadeh, R. (2018). Towards Clustering of Mobile and Smartwatch Accelerometer Data for Physical Activity Recognition. Informatics, 5.","DOI":"10.3390\/informatics5020029"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"1798","DOI":"10.1109\/TPAMI.2006.226","article-title":"Evaluation of stability of k-means cluster ensembles with respect to random initialization","volume":"28","author":"Kuncheva","year":"2006","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_9","unstructured":"Rakhlin, A., and Caponnetto, A. (2016, January 4\u20137). Stability of K-means clustering. Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"255","DOI":"10.1348\/000711007X184849","article-title":"Stability analysis in K-means clustering","volume":"61","author":"Steinley","year":"2008","journal-title":"Br. J. Math. Stat. Psychol."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1348\/000711005X48266","article-title":"K-means Clustering: A Half-Century Synthesis","volume":"59","author":"Steinley","year":"2007","journal-title":"Br. J. Math. Stat. Psychol."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"88","DOI":"10.4018\/jdm.2004100105","article-title":"Clustering Schema Elements for Semantic Integration of Heterogeneous Data Sources","volume":"15","author":"Zhao","year":"2004","journal-title":"J. Database Manag."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"149","DOI":"10.1016\/j.neucom.2019.07.048","article-title":"Fast and stable clustering analysis based on Grid-mapping K-means algorithm and new clustering validity index","volume":"363","author":"Zhu","year":"2019","journal-title":"Neurocomputing"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"1293","DOI":"10.1016\/j.patrec.2004.04.007","article-title":"Cluster Center Initialization Algorithm for K-means Clustering","volume":"25","author":"Khan","year":"2004","journal-title":"Pattern Recognit. Lett."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"24","DOI":"10.1007\/s11859-009-0106-z","article-title":"Stable initialization scheme for k-means clustering","volume":"14","author":"Xu","year":"2009","journal-title":"Wuhan Univ. J. Nat. Sci."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Arora, P., Virmani, D., Jindal, H., and Sharma, M. (2016, January 19\u201320). Sorted K-means towards the enhancement of K-means to form stable clusters. Proceedings of the International Conference on Communication and Networks, Ahmedabad, India.","DOI":"10.1007\/978-981-10-2750-5_50"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"214","DOI":"10.1145\/358841.358850","article-title":"Multidimensional Divide and Conquer","volume":"23","author":"Bentley","year":"1980","journal-title":"Commun. ACM"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"209","DOI":"10.1145\/355744.355745","article-title":"An Algorithm for Finding Best Matches in Logarithmic Expected Time","volume":"2","author":"Friedman","year":"1977","journal-title":"ACM Trans. Math. Softw."},{"key":"ref_19","unstructured":"Moore, A. (1999). Very Fast EM-Based Mixture Model Clustering Using Multiresolution Kd-trees. Advances in Neural Information Processing Systems II (NIPS), MIT Press."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Pelleg, D., and Moore, A. (1999, January 15\u201318). Accelerating Exact K-means Algorithms with Geometric Reasoning. Proceedings of the 5th ACM International Conference of the Special Interest Group on Knowledge Discovery and Data Mining (ACM-SIGKDD-99), San Diego, CA, USA.","DOI":"10.1145\/312129.312248"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Pelleg, D., and Moore, A. (2000). Accelerating Exact K-Means Algorithms with Geometric Reasoning-Technical Report, School of Computer Science, Carnegie Mellon University.","DOI":"10.1145\/312129.312248"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"67","DOI":"10.1613\/jair.453","article-title":"Cached Sufficient Statistics for Efficient Machine Learning with Large Datasets","volume":"8","author":"Moore","year":"1998","journal-title":"J. Artif. Intell. Res."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"881","DOI":"10.1109\/TPAMI.2002.1017616","article-title":"An Efficient K-means Clustering Algorithm: Analysis and Implementation","volume":"24","author":"Kanungo","year":"2002","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_24","unstructured":"Hussein, N. (2002). A Fast Greedy K-Means Algorithm. [Master\u2019s Thesis, University of Amsterdam]."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"451","DOI":"10.1016\/S0031-3203(02)00060-2","article-title":"The Global K-means Clustering Algorithm","volume":"36","author":"Likas","year":"2003","journal-title":"Pattern Recognit."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"965","DOI":"10.1016\/j.patrec.2007.01.001","article-title":"A Method for Initialising the K-means Clustering Algorithm Using Kd-Trees","volume":"28","author":"Redmond","year":"2007","journal-title":"Pattern Recognit. Lett."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"2551","DOI":"10.1016\/j.patcog.2009.02.014","article-title":"A fast k-means clustering algorithm using cluster center displacement","volume":"42","author":"Lai","year":"2009","journal-title":"Pattern Recognit."},{"key":"ref_28","unstructured":"Asuncion, A., and Newman, D.J. (2020, January 15). UCI Machine Learning Repository. Available online: http:\/\/www.ics.uci.edu\/~mlearn\/MLRepository.html."},{"key":"ref_29","unstructured":"Johnson, R.A., and Wichern, D.W. (2001). Applied Multivariate Statistical Analysis, Prentice Hall. [5th ed.]."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"303","DOI":"10.1023\/A:1016308404627","article-title":"Techniques of Cluster Algorithms in Data Mining","volume":"6","author":"Grabmeier","year":"2002","journal-title":"Data Min. Knowl. Discov."}],"container-title":["Informatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2227-9709\/7\/4\/38\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T10:14:04Z","timestamp":1760177644000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2227-9709\/7\/4\/38"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,9,27]]},"references-count":30,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2020,12]]}},"alternative-id":["informatics7040038"],"URL":"https:\/\/doi.org\/10.3390\/informatics7040038","relation":{},"ISSN":["2227-9709"],"issn-type":[{"value":"2227-9709","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,9,27]]}}}