{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,9]],"date-time":"2026-01-09T01:00:42Z","timestamp":1767920442149,"version":"3.49.0"},"reference-count":19,"publisher":"Springer Science and Business Media LLC","issue":"4","license":[{"start":{"date-parts":[[2020,5,18]],"date-time":"2020-05-18T00:00:00Z","timestamp":1589760000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2020,5,18]],"date-time":"2020-05-18T00:00:00Z","timestamp":1589760000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100000781","name":"European Research Council","doi-asserted-by":"publisher","award":["647209"],"award-info":[{"award-number":["647209"]}],"id":[{"id":"10.13039\/501100000781","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Comput Stat"],"published-print":{"date-parts":[[2020,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>We improve instability-based methods for the selection of the number of clusters <jats:italic>k<\/jats:italic> in cluster analysis by developing a corrected clustering distance that corrects for the unwanted influence of the distribution of cluster sizes on cluster instability. We show that our corrected instability measure outperforms current instability-based measures across the whole sequence of possible <jats:italic>k<\/jats:italic>, overcoming limitations of current instability-based methods for large <jats:italic>k<\/jats:italic>. We also compare, for the first time, model-based and model-free approaches to determining cluster-instability and find their performance to be comparable. 
We make our method available in the R-package .<\/jats:p>","DOI":"10.1007\/s00180-020-00981-5","type":"journal-article","created":{"date-parts":[[2020,5,18]],"date-time":"2020-05-18T16:51:46Z","timestamp":1589820706000},"page":"1879-1894","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":14,"title":["Estimating the number of clusters via a corrected clustering instability"],"prefix":"10.1007","volume":"35","author":[{"given":"Jonas M. B.","family":"Haslbeck","sequence":"first","affiliation":[]},{"given":"Dirk U.","family":"Wulff","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2020,5,18]]},"reference":[{"key":"981_CR1","doi-asserted-by":"crossref","unstructured":"Ben-David S, Von Luxburg U, P\u00e1l D (2006) A sober look at clustering stability. In: International conference on computational learning theory. Springer, Berlin, pp 5\u201319","DOI":"10.1007\/11776420_4"},{"key":"981_CR2","doi-asserted-by":"crossref","unstructured":"Ben-Hur A, Elisseeff A, Guyon I (2001) A stability based method for discovering structure in clustered data. In: Pacific symposium on biocomputing, vol 7, pp 6\u201317","DOI":"10.1142\/9789812799623_0002"},{"key":"981_CR3","unstructured":"Bengio Y, Vincent P, Paiement JF, Delalleau O, Ouimet M, Le\u00a0Roux N (2003) Spectral clustering and kernel PCA are learning eigenfunctions. Technical report 1239, D\u00e9partement d\u2019Informatique et recherche operationelle, Universite de Montreal. http:\/\/citeseerx.ist.psu.edu\/viewdoc\/download?doi=10.1.1.448.5357&rep=rep1&type=pdf. Accessed 17 Mar 2020"},{"key":"981_CR4","unstructured":"Dick T, Wong E, Dann C (2014) How many random restarts are enough? Technical report. http:\/\/www.cs.cmu.edu\/~epxing\/Class\/10715\/project-reports\/DannDickWong.pdf. 
Accessed 17 Mar 2020"},{"issue":"3","key":"981_CR5","doi-asserted-by":"publisher","first-page":"468","DOI":"10.1016\/j.csda.2011.09.003","volume":"56","author":"Y Fang","year":"2012","unstructured":"Fang Y, Wang J (2012) Selection of the number of clusters via the bootstrap method. Comput Stat Data Anal 56(3):468\u2013477","journal-title":"Comput Stat Data Anal"},{"key":"981_CR6","volume-title":"The elements of statistical learning","author":"J Friedman","year":"2001","unstructured":"Friedman J, Hastie T, Tibshirani R (2001) The elements of statistical learning, vol 1. Springer, Berlin"},{"key":"981_CR7","doi-asserted-by":"publisher","first-page":"27","DOI":"10.1016\/j.csda.2013.11.012","volume":"73","author":"A Fujita","year":"2014","unstructured":"Fujita A, Takahashi DY, Patriota AG (2014) A non-parametric method to estimate the number of clusters. Comput Stat Data Anal 73:27\u201339","journal-title":"Comput Stat Data Anal"},{"key":"981_CR8","volume-title":"Clustering algorithms","author":"JA Hartigan","year":"1975","unstructured":"Hartigan JA (1975) Clustering algorithms. Wiley, New York"},{"issue":"1","key":"981_CR9","doi-asserted-by":"publisher","first-page":"258","DOI":"10.1016\/j.csda.2006.11.025","volume":"52","author":"C Hennig","year":"2007","unstructured":"Hennig C (2007) Cluster-wise assessment of cluster stability. Comput Stat Data Anal 52(1):258\u2013271","journal-title":"Comput Stat Data Anal"},{"key":"981_CR10","doi-asserted-by":"publisher","first-page":"53","DOI":"10.1016\/j.patrec.2015.04.009","volume":"64","author":"C Hennig","year":"2015","unstructured":"Hennig C (2015) What are the true clusters? Pattern Recognit Lett 64:53\u201362","journal-title":"Pattern Recognit Lett"},{"key":"981_CR11","doi-asserted-by":"publisher","first-page":"1350","DOI":"10.1214\/aos\/1176348772","volume":"20","author":"BG Leroux","year":"1992","unstructured":"Leroux BG (1992) Consistent estimation of a mixing distribution. 
Ann Stat 20:1350\u20131360","journal-title":"Ann Stat"},{"key":"981_CR12","first-page":"849","volume":"2","author":"AY Ng","year":"2002","unstructured":"Ng AY, Jordan MI, Weiss Y (2002) On spectral clustering: analysis and an algorithm. Adv Neural Inf Process Syst 2:849\u2013856","journal-title":"Adv Neural Inf Process Syst"},{"key":"981_CR13","doi-asserted-by":"publisher","first-page":"53","DOI":"10.1016\/0377-0427(87)90125-7","volume":"20","author":"PJ Rousseeuw","year":"1987","unstructured":"Rousseeuw PJ (1987) Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J Comput Appl Math 20:53\u201365","journal-title":"J Comput Appl Math"},{"issue":"2","key":"981_CR14","doi-asserted-by":"publisher","first-page":"461","DOI":"10.1214\/aos\/1176344136","volume":"6","author":"G Schwarz","year":"1978","unstructured":"Schwarz G et al (1978) Estimating the dimension of a model. Ann Stat 6(2):461\u2013464","journal-title":"Ann Stat"},{"key":"981_CR15","first-page":"113","volume":"2","author":"RJ Steele","year":"2010","unstructured":"Steele RJ, Raftery AE (2010) Performance of bayesian model selection criteria for gaussian mixture models. Front Stat Decis Mak Bayesian Anal 2:113\u2013130","journal-title":"Front Stat Decis Mak Bayesian Anal"},{"issue":"463","key":"981_CR16","doi-asserted-by":"publisher","first-page":"750","DOI":"10.1198\/016214503000000666","volume":"98","author":"CA Sugar","year":"2003","unstructured":"Sugar CA, James GM (2003) Finding the number of clusters in a dataset: an information-theoretic approach. J Am Stat Assoc 98(463):750\u2013763","journal-title":"J Am Stat Assoc"},{"issue":"3","key":"981_CR17","doi-asserted-by":"publisher","first-page":"511","DOI":"10.1198\/106186005X59243","volume":"14","author":"R Tibshirani","year":"2005","unstructured":"Tibshirani R, Walther G (2005) Cluster validation by prediction strength. 
J Comput Graph Stat 14(3):511\u2013528","journal-title":"J Comput Graph Stat"},{"issue":"2","key":"981_CR18","doi-asserted-by":"publisher","first-page":"411","DOI":"10.1111\/1467-9868.00293","volume":"63","author":"R Tibshirani","year":"2001","unstructured":"Tibshirani R, Walther G, Hastie T (2001) Estimating the number of clusters in a data set via the gap statistic. J R Stat Soc Ser B Stat Methodol 63(2):411\u2013423","journal-title":"J R Stat Soc Ser B Stat Methodol"},{"issue":"4","key":"981_CR19","doi-asserted-by":"publisher","first-page":"893","DOI":"10.1093\/biomet\/asq061","volume":"97","author":"J Wang","year":"2010","unstructured":"Wang J (2010) Consistent selection of the number of clusters via crossvalidation. Biometrika 97(4):893\u2013904","journal-title":"Biometrika"}],"container-title":["Computational Statistics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00180-020-00981-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s00180-020-00981-5\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00180-020-00981-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,5,17]],"date-time":"2021-05-17T23:16:21Z","timestamp":1621293381000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s00180-020-00981-5"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,5,18]]},"references-count":19,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2020,12]]}},"alternative-id":["981"],"URL":"https:\/\/doi.org\/10.1007\/s00180-020-00981-5","relation":{},"ISSN":["0943-4062","1613-9658"],"issn-type":[{"value":"0943-4062","type":"print"},{"value":"1613-9658","
type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,5,18]]},"assertion":[{"value":"4 September 2017","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"12 March 2020","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"18 May 2020","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}