{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,8]],"date-time":"2026-04-08T22:35:59Z","timestamp":1775687759944,"version":"3.50.1"},"reference-count":51,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2017,9,6]],"date-time":"2017-09-06T00:00:00Z","timestamp":1504656000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>Clustering is an unsupervised machine learning and pattern recognition method. In general, in addition to revealing hidden groups of similar observations and clusters, their number needs to be determined. Internal clustering validation indices estimate this number without any external information. The purpose of this article is to evaluate, empirically, characteristics of a representative set of internal clustering validation indices with many datasets. The prototype-based clustering framework includes multiple, classical and robust, statistical estimates of cluster location so that the overall setting of the paper is novel. General observations on the quality of validation indices and on the behavior of different variants of clustering algorithms will be given.<\/jats:p>","DOI":"10.3390\/a10030105","type":"journal-article","created":{"date-parts":[[2017,9,6]],"date-time":"2017-09-06T11:23:34Z","timestamp":1504697014000},"page":"105","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":84,"title":["Comparison of Internal Clustering Validation Indices for Prototype-Based Clustering"],"prefix":"10.3390","volume":"10","author":[{"given":"Joonas","family":"H\u00e4m\u00e4l\u00e4inen","sequence":"first","affiliation":[{"name":"Faculty of Information Technology, University of Jyvaskyla, P.O. 
Box 35, FI-40014 Jyvaskyla, Finland"}]},{"given":"Susanne","family":"Jauhiainen","sequence":"additional","affiliation":[{"name":"Faculty of Information Technology, University of Jyvaskyla, P.O. Box 35, FI-40014 Jyvaskyla, Finland"}]},{"given":"Tommi","family":"K\u00e4rkk\u00e4inen","sequence":"additional","affiliation":[{"name":"Faculty of Information Technology, University of Jyvaskyla, P.O. Box 35, FI-40014 Jyvaskyla, Finland"}]}],"member":"1968","published-online":{"date-parts":[[2017,9,6]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"264","DOI":"10.1145\/331499.331504","article-title":"Data clustering: A review","volume":"31","author":"Jain","year":"1999","journal-title":"ACM Comput. Surv."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Aggarwal, C.C., and Reddy, C.K. (2013). Data Clustering: Algorithms and Applications, CRC Press.","DOI":"10.1201\/b15410"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"841","DOI":"10.1109\/34.85677","article-title":"A validity measure for fuzzy clustering","volume":"13","author":"Xie","year":"1991","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"651","DOI":"10.1016\/j.patrec.2009.09.011","article-title":"Data clustering: 50 years beyond K-means","volume":"31","author":"Jain","year":"2010","journal-title":"Pattern Recognit. Lett."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Zaki, M.J., and Meira, W. (2014). Data Mining and Analysis: Fundamental Concepts and Algorithms, Cambridge University Press.","DOI":"10.1017\/CBO9780511810114"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Saarela, M., H\u00e4m\u00e4l\u00e4inen, J., and K\u00e4rkk\u00e4inen, T. (2017, January 23\u201326). Feature Ranking of Large, Robust, and Weighted Clustering Result. 
Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining, Jeju, Korea.","DOI":"10.1007\/978-3-319-57454-7_8"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"129","DOI":"10.1109\/TIT.1982.1056489","article-title":"Least squares quantization in PCM","volume":"28","author":"Lloyd","year":"1982","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"7444","DOI":"10.1016\/j.eswa.2013.07.002","article-title":"Cluster center initialization algorithm for K-modes clustering","volume":"40","author":"Khan","year":"2013","journal-title":"Expert Syst. Appl."},{"key":"ref_9","unstructured":"Arthur, D., and Vassilvitskii, S. (2007, January 7\u20139). K-means++: The advantages of careful seeding. Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, New Orleans, LA, USA."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"645","DOI":"10.1109\/TNN.2005.845141","article-title":"Survey of clustering algorithms","volume":"16","author":"Xu","year":"2005","journal-title":"IEEE Trans. Neural Netw."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"133","DOI":"10.1109\/TSMCC.2008.2007252","article-title":"A survey of evolutionary algorithms for clustering","volume":"39","author":"Hruschka","year":"2009","journal-title":"IEEE Trans. Syst. Man Cybern. Part C Appl. Rev."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Miller, H., and Han, J. (2001). Spatial Clustering Methods in Data Mining: A Survey. Geographic Data Mining and Knowledge Discovery, CRC Press.","DOI":"10.1201\/b12382"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Huber, P.J. (1981). Robust Statistics, John Wiley & Sons Inc.","DOI":"10.1002\/0471725250"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Rousseeuw, P.J., and Leroy, A.M. (1987). 
Robust Regression and Outlier Detection, John Wiley & Sons Inc.","DOI":"10.1002\/0471725382"},{"key":"ref_15","unstructured":"Hettmansperger, T.P., and McKean, J.W. (1998). Robust Nonparametric Statistical Methods, Edward Arnold."},{"key":"ref_16","first-page":"3","article-title":"Analysing Student Performance using Sparse Data of Core Bachelor Courses","volume":"7","author":"Saarela","year":"2015","journal-title":"J. Educ. Data Min."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"837","DOI":"10.1162\/089976604322860721","article-title":"Robust Formulations for Training Multilayer Perceptrons","volume":"16","author":"Heikkola","year":"2004","journal-title":"Neural Comput."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"137","DOI":"10.1007\/s11634-010-0062-7","article-title":"The k-step spatial sign covariance matrix","volume":"4","author":"Croux","year":"2010","journal-title":"Adv. Data Anal. Classif."},{"key":"ref_19","unstructured":"\u00c4yr\u00e4m\u00f6, S. (2006). Knowledge Mining Using Robust Clustering. [Ph.D. Thesis, University of Jyv\u00e4skyl\u00e4]. Jyv\u00e4skyl\u00e4 Studies in Computing 63."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1145\/584091.584093","article-title":"A mathematical theory of communication","volume":"5","author":"Shannon","year":"2001","journal-title":"ACM SIGMOBILE Mob. Comput. Commun. Rev."},{"key":"ref_21","first-page":"583","article-title":"Cluster ensembles\u2014A knowledge reuse framework for combining multiple partitions","volume":"3","author":"Strehl","year":"2002","journal-title":"J. Mach. Learn. Res."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"77","DOI":"10.1016\/j.datak.2014.07.008","article-title":"WB-index: A sum-of-squares based index for cluster validity","volume":"92","author":"Zhao","year":"2014","journal-title":"Data Knowl. 
Eng."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"224","DOI":"10.1109\/TPAMI.1979.4766909","article-title":"A cluster separation measure","volume":"PAMI-1","author":"Davies","year":"1979","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1080\/03610927408827101","article-title":"A dendrite method for cluster analysis","volume":"3","author":"Harabasz","year":"1974","journal-title":"Commun. Stat. Theory Methods"},{"key":"ref_25","unstructured":"Ray, S., and Turi, R.H. (1999, January 27\u201329). Determination of number of clusters in k-means clustering and application in colour image segmentation. Proceedings of the 4th International Conference on Advances in Pattern Recognition and Digital Techniques, Calcutta, India."},{"key":"ref_26","first-page":"27","article-title":"Internal versus external cluster validation indexes","volume":"5","author":"Abundez","year":"2011","journal-title":"Int. J. Comput. Commun."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"107","DOI":"10.1023\/A:1012801612483","article-title":"On clustering validation techniques","volume":"17","author":"Halkidi","year":"2001","journal-title":"J. Intell. Inf. Syst."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"1798","DOI":"10.1109\/TPAMI.2006.226","article-title":"Evaluation of stability of k-means cluster ensembles with respect to random initialization","volume":"28","author":"Kuncheva","year":"2006","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"56","DOI":"10.1109\/TEVC.2006.877146","article-title":"An evolutionary approach to multiobjective clustering","volume":"11","author":"Handl","year":"2007","journal-title":"IEEE Trans. Evolut. Comput."},{"key":"ref_30","unstructured":"Jauhiainen, S., and K\u00e4rkk\u00e4inen, T. (2017, January 26\u201328). A Simple Cluster Validation Index with Maximal Coverage. 
Proceedings of the European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2017), Bruges, Belgium."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"2353","DOI":"10.1016\/j.patrec.2005.04.007","article-title":"New indices for cluster validity assessment","volume":"26","author":"Kim","year":"2005","journal-title":"Pattern Recognit. Lett."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"1650","DOI":"10.1109\/TPAMI.2002.1114856","article-title":"Performance evaluation of some clustering algorithms and validity indices","volume":"24","author":"Maulik","year":"2002","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"243","DOI":"10.1016\/j.patcog.2012.07.021","article-title":"An extensive comparative study of cluster validity indices","volume":"46","author":"Arbelaitz","year":"2013","journal-title":"Pattern Recognit."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Liu, Y., Li, Z., Xiong, H., Gao, X., and Wu, J. (2010, January 13\u201317). Understanding of internal clustering validation measures. Proceedings of the 2010 IEEE 10th International Conference on Data Mining (ICDM), Sydney, Australia.","DOI":"10.1109\/ICDM.2010.35"},{"key":"ref_35","first-page":"338","article-title":"Performance measures for densed and arbitrary shaped clusters","volume":"6","author":"Agrawal","year":"2015","journal-title":"Int. J. Comput. Sci. Commun."},{"key":"ref_36","unstructured":"Halkidi, M., and Vazirgiannis, M. (December, January 29). Clustering validity assessment: Finding the optimal partitioning of a data set. 
Proceedings of the IEEE International Conference on Data Mining (ICDM 2001), San Jose, CA, USA."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"135","DOI":"10.1007\/s12530-012-9046-5","article-title":"A dynamic split-and-merge approach for evolving cluster models","volume":"3","author":"Lughofer","year":"2012","journal-title":"Evol. Syst."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"54","DOI":"10.1016\/j.ins.2015.01.010","article-title":"Autonomous data stream clustering implementing split-and-merge concepts\u2014Towards a plug-and-play approach","volume":"304","author":"Lughofer","year":"2015","journal-title":"Inf. Sci."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Ordonez, C. (2003, January 13). Clustering binary data streams with K-means. Proceedings of the 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, San Diego, CA, USA.","DOI":"10.1145\/882082.882087"},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"578","DOI":"10.1016\/j.ejor.2004.06.014","article-title":"A new nonsmooth optimization algorithm for minimum sum-of-squares clustering problems","volume":"170","author":"Bagirov","year":"2006","journal-title":"Eur. J. Oper. Res."},{"key":"ref_41","unstructured":"Karmitsa, N., Bagirov, A., and Taheri, S. (2016). MSSC Clustering of Large Data using the Limited Memory Bundle Method, Discussion Paper; University of Turku."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1023\/A:1004655007088","article-title":"Nonmonotone and monotone active-set methods for image restoration, Part 1: Convergence analysis","volume":"106","author":"Majava","year":"2000","journal-title":"J. Optim. Theory Appl."},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"499","DOI":"10.1023\/B:JOTA.0000006687.57272.b6","article-title":"Augmented Lagrangian Active Set Methods for Obstacle Problems","volume":"119","author":"Kunisch","year":"2003","journal-title":"J. Optim. 
Theory Appl."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"353","DOI":"10.1007\/s00607-004-0097-8","article-title":"Denoising of smooth images using L1-fitting","volume":"74","author":"Kunisch","year":"2005","journal-title":"Computing"},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"487","DOI":"10.1016\/j.patcog.2003.06.005","article-title":"Validity index for crisp and fuzzy clusters","volume":"37","author":"Pakhira","year":"2004","journal-title":"Pattern Recognit."},{"key":"ref_46","unstructured":"Desgraupes, B. (2017, September 06). \u201cClusterCrit: Clustering Indices\u201d. Available online: https:\/\/cran.r-project.org\/web\/packages\/clusterCrit\/."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"159","DOI":"10.1007\/BF02294245","article-title":"An examination of procedures for determining the number of clusters in a data set","volume":"50","author":"Milligan","year":"1985","journal-title":"Psychometrika"},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Fr\u00e4nti, P., and Sieranoja, S. (2017). K-means properties on six clustering benchmark datasets. Algorithms, submitted.","DOI":"10.1007\/s10489-018-1238-7"},{"key":"ref_49","unstructured":"Saarela, M., and K\u00e4rkk\u00e4inen, T. (2015, January 26\u201329). Do country stereotypes exist in educational data? A clustering approach for large, sparse, and weighted data. Proceedings of the 8th International Conference on Educational Data Mining (EDM 2015), Madrid, Spain."},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Verleysen, M., and Fran\u00e7ois, D. (2005, January 14\u201316). The Curse of Dimensionality in Data Mining and Time Series Prediction. Proceedings of the International Work-Conference on Artificial Neural Networks (IWANN), Cadiz, Spain.","DOI":"10.1007\/11494669_93"},{"key":"ref_51","unstructured":"Wartiainen, P., and K\u00e4rkk\u00e4inen, T. (2015, January 22\u201324). 
Hierarchical, prototype-based clustering of multiple time series with missing values. Proceedings of the 23rd European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2015), Bruges, Belgium."}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/10\/3\/105\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T18:44:16Z","timestamp":1760208256000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/10\/3\/105"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,9,6]]},"references-count":51,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2017,9]]}},"alternative-id":["a10030105"],"URL":"https:\/\/doi.org\/10.3390\/a10030105","relation":{},"ISSN":["1999-4893"],"issn-type":[{"value":"1999-4893","type":"electronic"}],"subject":[],"published":{"date-parts":[[2017,9,6]]}}}