{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,10]],"date-time":"2026-06-10T16:53:22Z","timestamp":1781110402454,"version":"3.54.1"},"reference-count":44,"publisher":"MDPI AG","issue":"6","license":[{"start":{"date-parts":[[2022,6,2]],"date-time":"2022-06-02T00:00:00Z","timestamp":1654128000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Basic Science Research Program through the National Research Foundation (NRF) of Korea funded by the Ministry of Education","award":["2018R1D1A1B07048948"],"award-info":[{"award-number":["2018R1D1A1B07048948"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Symmetry"],"abstract":"<jats:p>The importance of unsupervised clustering methods is well established in the statistics and machine learning literature. Many sophisticated unsupervised classification techniques have been made available to deal with a growing number of datasets. Due to its simplicity and efficiency in clustering a large dataset, the k-means clustering algorithm is still popular and widely used in the machine learning community. However, as with other clustering methods, it requires one to choose the balanced number of clusters in advance. This paper\u2019s primary emphasis is to develop a novel method for finding the optimum number of clusters, k, using a data-driven approach. Taking into account the cluster symmetry property, the k-means algorithm is applied multiple times to a range of k values within which the balanced optimum k value is expected. This is based on the uniqueness and symmetrical nature among the centroid values for the clusters produced, and we chose the final k value as the one for which symmetry is observed. We evaluated the proposed algorithm\u2019s performance on different simulated datasets with controlled parameters and also on real datasets taken from the UCI machine learning repository. We also evaluated the performance of the proposed method with the aim of remote sensing, such as in deforestation and urbanization, using satellite images of the Islamabad region in Pakistan, taken from the Sentinel-2B satellite of the United States Geological Survey. From the experimental results and real data analysis, it is concluded that the proposed algorithm has better accuracy and minimum root mean square error than the existing methods.<\/jats:p>","DOI":"10.3390\/sym14061149","type":"journal-article","created":{"date-parts":[[2022,6,3]],"date-time":"2022-06-03T08:01:18Z","timestamp":1654243278000},"page":"1149","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":36,"title":["Model Selection Using K-Means Clustering Algorithm for the Symmetrical Segmentation of Remote Sensing Datasets"],"prefix":"10.3390","volume":"14","author":[{"given":"Ishfaq","family":"Ali","sequence":"first","affiliation":[{"name":"Department of Statistics, Abdul Wali Khan University, Mardan 23200, Pakistan"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Atiq Ur","family":"Rehman","sequence":"additional","affiliation":[{"name":"Department of Mathematics and Statistics, Faculty of Basic and Applied Sciences, International Islamic University, Islamabad 44000, Pakistan"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3919-8136","authenticated-orcid":false,"given":"Dost Muhammad","family":"Khan","sequence":"additional","affiliation":[{"name":"Department of Statistics, Abdul Wali Khan University, Mardan 23200, Pakistan"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Zardad","family":"Khan","sequence":"additional","affiliation":[{"name":"Department of Statistics, Abdul Wali Khan University, Mardan 23200, Pakistan"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7337-7608","authenticated-orcid":false,"given":"Muhammad","family":"Shafiq","sequence":"additional","affiliation":[{"name":"Department of Information and Communication Engineering, Yeungnam University, Gyeongsan 38541, Korea"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7186-2156","authenticated-orcid":false,"given":"Jin-Ghoo","family":"Choi","sequence":"additional","affiliation":[{"name":"Department of Information and Communication Engineering, Yeungnam University, Gyeongsan 38541, Korea"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2022,6,2]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Caraka, R.E., Chen, R.C., Huang, S.W., Chiou, S.Y., Gio, P.U., and Pardamean, B. (2022). Big data ordination towards intensive care event count cases using fast computing GLLVMS. BMC Med. Res. Methodol., 22.","DOI":"10.1186\/s12874-022-01538-4"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Bhadani, A.K., and Jothimani, D. (2016). Big data: Challenges, opportunities, and realities. Effective Big Data Management and Opportunities for Implementation, IGI Global.","DOI":"10.4018\/978-1-5225-0182-4.ch001"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"267","DOI":"10.1109\/TETC.2014.2330519","article-title":"A survey of clustering algorithms for big data: Taxonomy and empirical analysis","volume":"2","author":"Fahad","year":"2014","journal-title":"IEEE Trans. Emerg. Top. Comput."},{"key":"ref_4","unstructured":"Silipo, R., Adae, I., Hart, A., and Berthold, M. (2014). Seven Techniques for Dimensionality Reduction, KNIME."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Mart\u00edn-Fern\u00e1ndez, J.D., Luna-Romera, J.M., Pontes, B., and Riquelme-Santos, J.C. (2019, January 13\u201315). Indexes to Find the Optimal Number of Clusters in a Hierarchical Clustering. Proceedings of the International Workshop on Soft Computing Models in Industrial and Environmental Applications, Seville, Spain.","DOI":"10.1007\/978-3-030-20055-8_1"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"105928","DOI":"10.1016\/j.asoc.2019.105928","article-title":"Fuzzy C-means clustering through SSIM and patch for image segmentation","volume":"87","author":"Tang","year":"2020","journal-title":"Appl. Soft Comput."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"185","DOI":"10.1109\/TFUZZ.2018.2883033","article-title":"Deviation-Sparse Fuzzy C-Means With Neighbor Information Constraint","volume":"27","author":"Zhang","year":"2019","journal-title":"IEEE Trans. Fuzzy Syst."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"78","DOI":"10.1016\/j.asoc.2018.06.033","article-title":"A novel internal validity index based on the cluster centre and the nearest neighbour cluster","volume":"71","author":"Zhou","year":"2018","journal-title":"Appl. Soft Comput."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Ye, F., Chen, Z., Qian, H., Li, R., Chen, C., and Zheng, Z. (2018). New approaches in multi-view clustering. Recent Applications in Data Clustering, IntechOpen.","DOI":"10.5772\/intechopen.75598"},{"key":"ref_10","unstructured":"MacQueen, J. (1965\u20137, January 27). Some methods for classification and analysis of multivariate observations. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Oakland, CA, USA."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"651","DOI":"10.1016\/j.patrec.2009.09.011","article-title":"Data clustering: 50 years beyond K-means","volume":"31","author":"Jain","year":"2010","journal-title":"Pattern Recognit. Lett."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"150","DOI":"10.1016\/j.ins.2015.06.008","article-title":"Kernel penalized k-means: A feature selection method based on kernel k-means","volume":"322","author":"Maldonado","year":"2015","journal-title":"Inf. Sci."},{"key":"ref_13","unstructured":"Du, L., Zhou, P., Shi, L., Wang, H., Fan, M., Wang, W., and Shen, Y.D. (2015, January 25\u201331). Robust multiple kernel k-means using l21-norm. Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, Buenos Aires, Argentina."},{"key":"ref_14","unstructured":"Wang, S., Gittens, A., and Mahoney, M.W. (2017). Scalable kernel k-means clustering with nystrom approximation: Relative-error bounds. arXiv."},{"key":"ref_15","first-page":"1191","article-title":"Multiple kernel k-means with incomplete kernels","volume":"42","author":"Liu","year":"2019","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"588","DOI":"10.17706\/jcp.13.6.588-595","article-title":"Bisecting K-means Algorithm Based on K-valued Selfdetermining and Clustering Center Optimization","volume":"13","author":"Di","year":"2018","journal-title":"J. Comput."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"15","DOI":"10.5430\/air.v7n1p15","article-title":"Estimating the number of clusters using diversity","volume":"7","author":"Kingrani","year":"2017","journal-title":"Artif. Intell. Res."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"3007","DOI":"10.1109\/TNNLS.2016.2608001","article-title":"Method for Determining the Optimal Number of Clusters Based on Agglomerative Hierarchical Clustering","volume":"28","author":"Zhou","year":"2017","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"159","DOI":"10.1007\/BF02294245","article-title":"An examination of procedures for determining the number of clusters in a data set","volume":"50","author":"Milligan","year":"1985","journal-title":"Psychometrika"},{"key":"ref_20","unstructured":"Shafeeq, A., and Hareesha, K. (2012, January 26\u201328). Dynamic clustering of data with modified k-means algorithm. Proceedings of the 2012 Conference on Information and Computer Networks, Singapore."},{"key":"ref_21","unstructured":"Hamerly, G., and Elkan, C. (2004). Learning the k in k-means. Advances in Neural Information Processing Systems, MIT Press."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"411","DOI":"10.1111\/1467-9868.00293","article-title":"Estimating the number of clusters in a data set via the gap statistic","volume":"63","author":"Tibshirani","year":"2001","journal-title":"J. R. Stat. Soc. Ser. B"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Feng, Y., and Hamerly, G. (2007). PG-means: Learning the number of clusters in data. Advances in Neural Information Processing Systems, MIT Press.","DOI":"10.7551\/mitpress\/7503.003.0054"},{"key":"ref_24","unstructured":"Ray, S., and Turi, R.H. (1999, January 27\u201329). Determination of number of clusters in k-means clustering and application in colour image segmentation. Proceedings of the 4th International Conference on Advances in Pattern Recognition and Digital Techniques, Calcutta, India."},{"key":"ref_25","first-page":"97","article-title":"An efficient incremental clustering algorithm","volume":"3","author":"Gupta","year":"2013","journal-title":"World Comput. Sci. Inf. Technol. J"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"414","DOI":"10.1016\/j.ins.2017.05.024","article-title":"Curvature-based method for determining the number of clusters","volume":"415","author":"Zhang","year":"2017","journal-title":"Inf. Sci."},{"key":"ref_27","first-page":"90","article-title":"Review on determining number of Cluster in K-Means Clustering","volume":"1","author":"Kodinariya","year":"2013","journal-title":"Int. J."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"9227","DOI":"10.1007\/s00500-019-04449-7","article-title":"A cluster validity evaluation method for dynamically determining the near-optimal number of clusters","volume":"24","author":"Li","year":"2020","journal-title":"Soft Comput."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Shao, X., Lee, H., Liu, Y., and Shen, B. (2017, January 11\u201313). Automatic K selection method for the K\u2014Means algorithm. Proceedings of the 2017 4th International Conference on Systems and Informatics (ICSAI), Hangzhou, China.","DOI":"10.1109\/ICSAI.2017.8248533"},{"key":"ref_30","unstructured":"Duda, R.O., and Hart, P.E. (1973). Pattern Classification and Scene Analysis, Wiley."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1080\/03610927408827101","article-title":"A dendrite method for cluster analysis","volume":"3","author":"Harabasz","year":"1974","journal-title":"Commun. Stat. Theory Methods"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"95","DOI":"10.1080\/01969727408546059","article-title":"Well-separated clusters and optimal fuzzy partitions","volume":"4","author":"Dunn","year":"1974","journal-title":"J. Cybern."},{"key":"ref_33","unstructured":"Hartigan, J.A. (1975). Clustering Algorithms, John Wiley & Sons, Inc."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"224","DOI":"10.1109\/TPAMI.1979.4766909","article-title":"A cluster separation measure","volume":"PAMI-1","author":"Davies","year":"1979","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"53","DOI":"10.1016\/0377-0427(87)90125-7","article-title":"Silhouettes: A graphical aid to the interpretation and validation of cluster analysis","volume":"20","author":"Rousseeuw","year":"1987","journal-title":"J. Comput. Appl. Math."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"23","DOI":"10.2307\/2531893","article-title":"A criterion for determining the number of groups in a data set using sum-of-squares clustering","volume":"44","author":"Krzanowski","year":"1988","journal-title":"Biometrics"},{"key":"ref_37","unstructured":"Tou, J.T., and Gonzalez, R.C. (1974). Pattern Recognition Principles, Addison-Wesley Publishing Company."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Gordon, A. (1999). Classification, Chapman and Hall.","DOI":"10.1201\/9780367805302"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"1159","DOI":"10.1080\/01621459.1967.10500923","article-title":"On some invariant criteria for grouping data","volume":"62","author":"Friedman","year":"1967","journal-title":"J. Am. Stat. Assoc."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"1072","DOI":"10.1037\/0033-2909.83.6.1072","article-title":"A general statistical framework for assessing categorical clustering in free recall","volume":"83","author":"Hubert","year":"1976","journal-title":"Psychol. Bull."},{"key":"ref_41","unstructured":"Dua, D., and Graff, C. (2017). UCI Machine Learning Repository, University of California Irvine."},{"key":"ref_42","unstructured":"Guyon, I., Von Luxburg, U., and Williamson, R.C. (2009). Clustering: Science or art. NIPS 2009 Workshop on Clustering Theory, NIPS."},{"key":"ref_43","unstructured":"Hijmans, R.J. (2012, April 03). Raster: Geographic Data Analysis and Modeling. R Package. Available online: https:\/\/CRAN.R-project.org\/package=raster."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"29","DOI":"10.1186\/s40537-019-0188-1","article-title":"Bayesian mixture models and their Big Data implementations with application to invasive species presence-only data","volume":"6","author":"Ullah","year":"2019","journal-title":"J. Big Data"}],"container-title":["Symmetry"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2073-8994\/14\/6\/1149\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T23:23:46Z","timestamp":1760138626000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2073-8994\/14\/6\/1149"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,6,2]]},"references-count":44,"journal-issue":{"issue":"6","published-online":{"date-parts":[[2022,6]]}},"alternative-id":["sym14061149"],"URL":"https:\/\/doi.org\/10.3390\/sym14061149","relation":{},"ISSN":["2073-8994"],"issn-type":[{"value":"2073-8994","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,6,2]]}}}