{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,17]],"date-time":"2026-07-17T12:43:51Z","timestamp":1784292231419,"version":"3.55.0"},"reference-count":36,"publisher":"Springer Science and Business Media LLC","issue":"S1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2013,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>K-means clustering is widely used for exploratory data analysis. While its dependence on initialisation is well-known, it is common practice to assume that the partition with lowest sum-of-squares (SSQ) total i.e. within cluster variance, is both reproducible under repeated initialisations and also the closest that k-means can provide to true structure, when applied to synthetic data. We show that this is generally the case for small numbers of clusters, but for values of k that are still of theoretical and practical interest, similar values of SSQ can correspond to markedly different cluster partitions.<\/jats:p>\n          <jats:p>This paper extends stability measures previously presented in the context of finding optimal values of cluster number, into a component of a 2-d map of the local minima found by the k-means algorithm, from which not only can values of k be identified for further analysis but, more importantly, it is made clear whether the best SSQ is a suitable solution or whether obtaining a consistently good partition requires further application of the stability index. The proposed method is illustrated by application to five synthetic datasets replicating a real world breast cancer dataset with varying data density, and a large bioinformatics dataset.<\/jats:p>","DOI":"10.1186\/1471-2105-14-s1-s8","type":"journal-article","created":{"date-parts":[[2013,1,14]],"date-time":"2013-01-14T13:16:27Z","timestamp":1358169387000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":31,"title":["Finding reproducible cluster partitions for the k-means algorithm"],"prefix":"10.1186","volume":"14","author":[{"given":"Paulo JG","family":"Lisboa","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Terence A","family":"Etchells","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Ian H","family":"Jarman","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Simon J","family":"Chambers","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2013,1,14]]},"reference":[{"key":"5582_CR1","doi-asserted-by":"publisher","first-page":"2247","DOI":"10.1093\/bioinformatics\/btm320","volume":"23","author":"GC Tseng","year":"2007","unstructured":"Tseng GC: Penalized and weighted K-means for clustering with scattered objects and prior information in high-throughput biological data. Bioinformatics. 2007, 23: 2247-55. 10.1093\/bioinformatics\/btm320.","journal-title":"Bioinformatics"},{"key":"5582_CR2","doi-asserted-by":"publisher","first-page":"118","DOI":"10.1093\/bioinformatics\/btq569","volume":"27","author":"S Yu","year":"2011","unstructured":"Yu S, Liu X, Tranchevent L-C, Gl\u00e4nzel W, Suykens JaK, De Moor B, Moreau Y: Optimized data fusion for K-means Laplacian clustering. Bioinformatics. 2011, 27: 118-26. 10.1093\/bioinformatics\/btq569.","journal-title":"Bioinformatics"},{"key":"5582_CR3","doi-asserted-by":"publisher","first-page":"316","DOI":"10.1504\/IJDMB.2010.033523","volume":"4","author":"B Chen","year":"2010","unstructured":"Chen B, He J, Pellicer S, Pan Y: Using Hybrid Hierarchical K-means (HHK) clustering algorithm for protein sequence motif Super-Rule-Tree (SRT) structure construction. International Journal of Data Mining and Bioinformatics. 2010, 4: 316-10.1504\/IJDMB.2010.033523.","journal-title":"International Journal of Data Mining and Bioinformatics"},{"key":"5582_CR4","first-page":"1","volume":"4","author":"H-H Bock","year":"2008","unstructured":"Bock H-H: Origins and extensions of the k-means algorithm in cluster analysis. Electronic Journ@l for History of Probability and Statistics. 2008, 4: 1-18.","journal-title":"Electronic Journ@l for History of Probability and Statistics"},{"key":"5582_CR5","doi-asserted-by":"publisher","first-page":"651","DOI":"10.1016\/j.patrec.2009.09.011","volume":"31","author":"AK Jain","year":"2010","unstructured":"Jain AK: Data clustering: 50 years beyond K-means. Pattern Recognition Letters. 2010, 31: 651-666. 10.1016\/j.patrec.2009.09.011.","journal-title":"Pattern Recognition Letters"},{"key":"5582_CR6","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1348\/000711005X48266","volume":"59","author":"D Steinley","year":"2006","unstructured":"Steinley D: K-means clustering: a half-century synthesis. The British journal of mathematical and statistical psychology. 2006, 59: 1-34. 10.1348\/000711005X48266.","journal-title":"The British journal of mathematical and statistical psychology"},{"key":"5582_CR7","doi-asserted-by":"publisher","first-page":"625","DOI":"10.1145\/1143844.1143923","volume-title":"Proceedings of the 23rd international conference on Machine learning - ICML '06","author":"M Meil\u0103","year":"2006","unstructured":"Meil\u0103 M: The uniqueness of a good optimum for K-means. Proceedings of the 23rd international conference on Machine learning - ICML '06. 2006, New York, New York, USA: ACM Press, 625-632."},{"key":"5582_CR8","first-page":"203","volume-title":"Skandinavisk Aktuarietidskrift","author":"T Dalenius","year":"1950","unstructured":"Dalenius T: The problem of optimal stratification I. Skandinavisk Aktuarietidskrift. 1950, 203-213."},{"key":"5582_CR9","first-page":"133","volume-title":"Skandinavisk Aktuarietidskrift","author":"T Dalenius","year":"1951","unstructured":"Dalenius T, Gurney M: The problem of optimal stratification II. Skandinavisk Aktuarietidskrift. 1951, 133-148."},{"key":"5582_CR10","first-page":"801","volume":"1","author":"H Steinhaus","year":"1956","unstructured":"Steinhaus H: Sur la division des corp materiels en parties. Bull Acad Polon Sci. 1956, 1: 801-804.","journal-title":"Bull Acad Polon Sci"},{"key":"5582_CR11","doi-asserted-by":"publisher","first-page":"100","DOI":"10.2307\/2346830","volume":"28","author":"Ja Hartigan","year":"1979","unstructured":"Hartigan Ja, Wong MA: Algorithm AS 136: A K-Means Clustering Algorithm. Applied Statistics. 1979, 28: 100-10.2307\/2346830.","journal-title":"Applied Statistics"},{"key":"5582_CR12","volume-title":"University of San Diego, California, Tech Report","author":"N Alldrin","year":"2003","unstructured":"Alldrin N, Smith A, Turnbull D: Clustering with EM and K-means. University of San Diego, California, Tech Report. 2003"},{"key":"5582_CR13","doi-asserted-by":"publisher","first-page":"176","DOI":"10.1016\/j.patcog.2007.05.018","volume":"41","author":"M Filippone","year":"2008","unstructured":"Filippone M, Camastra F, Masulli F, Rovetta S: A survey of kernel and spectral methods for clustering. Pattern Recognition. 2008, 41: 176-190. 10.1016\/j.patcog.2007.05.018.","journal-title":"Pattern Recognition"},{"key":"5582_CR14","doi-asserted-by":"publisher","first-page":"283","DOI":"10.1023\/A:1009769707641","volume":"2","author":"Z Huang","year":"1998","unstructured":"Huang Z: Extensions to the k-means algorithm for clustering large data sets with categorical values. Data Mining and Knowledge Discovery. 1998, 2: 283-304. 10.1023\/A:1009769707641.","journal-title":"Data Mining and Knowledge Discovery"},{"key":"5582_CR15","doi-asserted-by":"publisher","first-page":"647","DOI":"10.1007\/s11336-008-9058-z","volume":"73","author":"D Steinley","year":"2008","unstructured":"Steinley D, Hubert L: Order-Constrained Solutions in K-Means Clustering: Even Better Than Being Globally Optimal. Psychometrika. 2008, 73: 647-664. 10.1007\/s11336-008-9058-z.","journal-title":"Psychometrika"},{"key":"5582_CR16","doi-asserted-by":"publisher","first-page":"178","DOI":"10.1037\/1082-989X.11.2.178","volume":"11","author":"D Steinley","year":"2006","unstructured":"Steinley D: Profiling local optima in K-means clustering: developing a diagnostic technique. Psychological methods. 2006, 11: 178-92.","journal-title":"Psychological methods"},{"key":"5582_CR17","first-page":"6","volume-title":"Pac Symp Biocomput","author":"A Ben-Hur","year":"2002","unstructured":"Ben-Hur A, Elisseeff A, Guyon I: A stability based method for discovering structure in clustered data. Pac Symp Biocomput. 2002, 6-17."},{"key":"5582_CR18","doi-asserted-by":"publisher","first-page":"255","DOI":"10.1348\/000711007X184849","volume":"61","author":"D Steinley","year":"2008","unstructured":"Steinley D: Stability analysis in K-means clustering. The British journal of mathematical and statistical psychology. 2008, 61: 255-73. 10.1348\/000711007X184849.","journal-title":"The British journal of mathematical and statistical psychology"},{"key":"5582_CR19","doi-asserted-by":"publisher","first-page":"1299","DOI":"10.1162\/089976604773717621","volume":"16","author":"T Lange","year":"2004","unstructured":"Lange T, Roth V, Braun ML, Buhmann JM: Stability-based validation of clustering solutions. Neural computation. 2004, 16: 1299-323. 10.1162\/089976604773717621.","journal-title":"Neural computation"},{"key":"5582_CR20","volume-title":"Eighth International Meeting on Computational Intelligence Methods in Bioinformatics and Biostatistics","author":"PJG Lisboa","year":"2011","unstructured":"Lisboa PJG, Jarman IH, Etchells TA, Chambers SJ, Bacciu D, Whittaker J, Garibaldi JM, Ortega-Martorell S, Vellido A, Ellis IH: Discovering hidden pathways in bioinformatics. Eighth International Meeting on Computational Intelligence Methods in Bioinformatics and Biostatistics. 2011"},{"key":"5582_CR21","doi-asserted-by":"publisher","first-page":"645","DOI":"10.1109\/TNN.2005.845141","volume":"16","author":"R Xu","year":"2005","unstructured":"Xu R, Wunsch D: Survey of clustering algorithms. IEEE transactions on neural networks\/a publication of the IEEE Neural Networks Council. 2005, 16: 645-78. 10.1109\/TNN.2005.845141.","journal-title":"IEEE transactions on neural networks\/a publication of the IEEE Neural Networks Council"},{"key":"5582_CR22","volume-title":"Clustering methods for the analysis of dna microarray data","author":"R Tibshirani","year":"1999","unstructured":"Tibshirani R, Hastie T, Eisen M, Ross D, Botstein D, Brown P: Clustering methods for the analysis of dna microarray data. 1999"},{"key":"5582_CR23","doi-asserted-by":"publisher","first-page":"271","DOI":"10.1109\/ICIME.2009.53","volume-title":"2009 International Conference on Information Management and Engineering","author":"R Suresh","year":"2009","unstructured":"Suresh R, Dinakaran K, Valarmathie P: Model Based Modified K-Means Clustering for Microarray Data. 2009 International Conference on Information Management and Engineering. 2009, 271-273."},{"key":"5582_CR24","first-page":"401","volume-title":"Third IEEE Symposium on Bioinformatics and Bioengineering, 2003. Proceedings","author":"F-X Wu","year":"2003","unstructured":"Wu F-X, Zhang W-J, Kusalik A: Determination of the minimum sample size in microarray experiments to cluster genes using k-means clustering. Third IEEE Symposium on Bioinformatics and Bioengineering, 2003. Proceedings. 2003, IEEE Comput. Soc, 401-406."},{"key":"5582_CR25","doi-asserted-by":"publisher","first-page":"295","DOI":"10.1007\/s11222-010-9169-0","volume":"21","author":"G Menardi","year":"2009","unstructured":"Menardi G: Density-based Silhouette diagnostics for clustering methods. Statistics and Computing. 2009, 21: 295-308.","journal-title":"Statistics and Computing"},{"key":"5582_CR26","doi-asserted-by":"publisher","first-page":"1798","DOI":"10.1109\/TPAMI.2006.226","volume":"28","author":"LI Kuncheva","year":"2006","unstructured":"Kuncheva LI, Vetrov DP: Evaluation of stability of k-means cluster ensembles with respect to random initialization. IEEE transactions on pattern analysis and machine intelligence. 2006, 28: 1798-808.","journal-title":"IEEE transactions on pattern analysis and machine intelligence"},{"key":"5582_CR27","unstructured":"Agresti A: Categorical Data Analysis. Wiley, 1990-558."},{"key":"5582_CR28","volume-title":"Mathematical Methods of Statistics. (PMS-9)","author":"H Cramer","year":"1999","unstructured":"Cramer H: Mathematical Methods of Statistics. (PMS-9). 1999, Princeton University Press"},{"key":"5582_CR29","doi-asserted-by":"publisher","first-page":"386","DOI":"10.1037\/1082-989X.9.3.386","volume":"9","author":"D Steinley","year":"2004","unstructured":"Steinley D: Properties of the Hubert-Arabie adjusted Rand index. Psychological methods. 2004, 9: 386-96.","journal-title":"Psychological methods"},{"key":"5582_CR30","doi-asserted-by":"publisher","first-page":"340","DOI":"10.1002\/ijc.21004","volume":"116","author":"DM Abd El-Rehim","year":"2005","unstructured":"Abd El-Rehim DM, Ball G, Pinder SE, Rakha E, Paish C, Robertson JFR, Macmillan D, Blamey RW, Ellis IO: High-throughput protein expression analysis using tissue microarray technology of a large well-characterised series identifies biologically distinct classes of breast cancer confirming recent cDNA expression analyses. International journal of cancer. 2005, 116: 340-50. 10.1002\/ijc.21004.","journal-title":"International journal of cancer"},{"key":"5582_CR31","doi-asserted-by":"publisher","first-page":"1814","DOI":"10.1016\/j.patrec.2008.05.021","volume":"29","author":"P Lisboa","year":"2008","unstructured":"Lisboa P, Ellis I, Green a, Ambrogi F, Dias M: Cluster-based visualisation with scatter matrices. Pattern Recognition Letters. 2008, 29: 1814-1823. 10.1016\/j.patrec.2008.05.021.","journal-title":"Pattern Recognition Letters"},{"key":"5582_CR32","doi-asserted-by":"publisher","first-page":"401","DOI":"10.1109\/T-C.1969.222678","volume":"C-18","author":"JW Sammon","year":"1969","unstructured":"Sammon JW: A Nonlinear Mapping for Data Structure Analysis. IEEE Transactions on Computers. 1969, C-18: 401-409.","journal-title":"IEEE Transactions on Computers"},{"key":"5582_CR33","doi-asserted-by":"publisher","first-page":"311","DOI":"10.1002\/1520-6661(200009\/10)9:5<311::AID-MFM12>3.0.CO;2-9","volume":"9","author":"D Ayres de Campos","year":"2000","unstructured":"Ayres de Campos D, Bernardes J, Garrido A, Marques de S\u00e1 J, Pereira Leite L: Sisporto 2.0: A program for automated analysis of cardiotocograms. The Journal of Maternal-Fetal Medicine. 2000, 9: 311-318. 10.1002\/1520-6661(200009\/10)9:5<311::AID-MFM12>3.0.CO;2-9.","journal-title":"The Journal of Maternal-Fetal Medicine"},{"key":"5582_CR34","volume-title":"UCI Machine Learning Repository","author":"A Frank","year":"2010","unstructured":"Frank A, Asuncion A: UCI Machine Learning Repository. 2010"},{"key":"5582_CR35","doi-asserted-by":"publisher","first-page":"2246","DOI":"10.1016\/j.artint.2011.09.003","volume":"176","author":"T Chen","year":"2012","unstructured":"Chen T, Zhang NL, Liu T, Poon KM, Wang Y: Model-based multidimensional clustering of categorical data. Artificial Intelligence. 2012, 176: 2246-2269. 10.1016\/j.artint.2011.09.003.","journal-title":"Artificial Intelligence"},{"key":"5582_CR36","doi-asserted-by":"publisher","first-page":"411","DOI":"10.1111\/1467-9868.00293","volume":"63","author":"R Tibshirani","year":"2001","unstructured":"Tibshirani R, Walther G, Hastie T: Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2001, 63: 411-423. 10.1111\/1467-9868.00293.","journal-title":"Journal of the Royal Statistical Society: Series B (Statistical Methodology)"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-14-S1-S8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T21:11:38Z","timestamp":1630530698000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-14-S1-S8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,1]]},"references-count":36,"journal-issue":{"issue":"S1","published-print":{"date-parts":[[2013,1]]}},"alternative-id":["5582"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-14-s1-s8","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2013,1]]},"assertion":[{"value":"14 January 2013","order":1,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"S8"}}