{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,4,21]],"date-time":"2025-04-21T11:07:59Z","timestamp":1745233679449,"version":"3.37.3"},"reference-count":47,"publisher":"Springer Science and Business Media LLC","issue":"3","license":[{"start":{"date-parts":[[2020,10,2]],"date-time":"2020-10-02T00:00:00Z","timestamp":1601596800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2020,10,2]],"date-time":"2020-10-02T00:00:00Z","timestamp":1601596800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100003500","name":"Universit\u00e0 degli Studi di Padova","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100003500","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Adv Data Anal Classif"],"published-print":{"date-parts":[[2021,9]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>With the recent growth in data availability and complexity, and the associated outburst of elaborate modelling approaches, model selection tools have become a lifeline, providing objective criteria to deal with this increasingly challenging landscape. In fact, basing predictions and inference on a single model may be limiting if not harmful; ensemble approaches, which combine different models, have been proposed to overcome the selection step, and proven fruitful especially in the supervised learning framework. Conversely, these approaches have been scantily explored in the unsupervised setting. In this work we focus on the model-based clustering formulation, where a plethora of mixture models, with different number of components and parametrizations, is typically estimated. We propose an ensemble clustering approach that circumvents the single best model paradigm, while improving stability and robustness of the partitions. A new density estimator, being a convex linear combination of the density estimates in the ensemble, is introduced and exploited for group assignment. As opposed to the standard case, where clusters are typically associated to the components of the selected mixture model, we define partitions by borrowing the modal, or nonparametric, formulation of the clustering problem, where groups are linked with high-density regions. Staying in the density-based realm we thus show how blending together parametric and nonparametric approaches may be beneficial from a clustering perspective.<\/jats:p>","DOI":"10.1007\/s11634-020-00423-6","type":"journal-article","created":{"date-parts":[[2020,10,2]],"date-time":"2020-10-02T07:02:58Z","timestamp":1601622178000},"page":"599-623","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["Better than the best? Answers via model ensemble in density-based clustering"],"prefix":"10.1007","volume":"15","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2929-3850","authenticated-orcid":false,"given":"Alessandro","family":"Casa","sequence":"first","affiliation":[]},{"given":"Luca","family":"Scrucca","sequence":"additional","affiliation":[]},{"given":"Giovanna","family":"Menardi","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2020,10,2]]},"reference":[{"issue":"3","key":"423_CR1","doi-asserted-by":"publisher","first-page":"228","DOI":"10.1038\/nmeth.2365","volume":"10","author":"N Aghaeepour","year":"2013","unstructured":"Aghaeepour N, Finak G, Hoos H, Mosmann T, Brinkman R, Gottardo R, Scheuermann R, FlowCAP Consortium, DREAM Consortium (2013) Critical assessment of automated flow cytometry data analysis techniques. Nat Methods 10(3):228","journal-title":"Nat Methods"},{"issue":"4","key":"423_CR2","doi-asserted-by":"publisher","first-page":"715","DOI":"10.1093\/biomet\/83.4.715","volume":"83","author":"A Azzalini","year":"1996","unstructured":"Azzalini A, Dalla Valle A (1996) The multivariate skew-normal distribution. Biometrika 83(4):715\u2013726","journal-title":"Biometrika"},{"key":"423_CR3","doi-asserted-by":"crossref","unstructured":"Banfield J, Raftery AE (1993) Model-based Gaussian and non-Gaussian clustering. Biometrics 49(3):803\u2013821","DOI":"10.2307\/2532201"},{"issue":"2","key":"423_CR4","doi-asserted-by":"publisher","first-page":"332","DOI":"10.1198\/jcgs.2010.08111","volume":"19","author":"JP Baudry","year":"2010","unstructured":"Baudry JP, Raftery AE, Celeux G, Lo K, Gottardo R (2010) Combining mixture components for clustering. J Comput Graph Stat 19(2):332\u2013353","journal-title":"J Comput Graph Stat"},{"issue":"7","key":"423_CR5","doi-asserted-by":"publisher","first-page":"719","DOI":"10.1109\/34.865189","volume":"22","author":"C Biernacki","year":"2000","unstructured":"Biernacki C, Celeux G, Govaert G (2000) Assessing a mixture model for clustering with the integrated completed likelihood. IEEE T Pattern Anal 22(7):719\u2013725","journal-title":"IEEE T Pattern Anal"},{"issue":"5","key":"423_CR6","doi-asserted-by":"publisher","first-page":"781","DOI":"10.1016\/0031-3203(94)00125-6","volume":"28","author":"G Celeux","year":"1995","unstructured":"Celeux G, Govaert G (1995) Gaussian parsimonious clustering models. Pattern Recognit 28(5):781\u2013793","journal-title":"Pattern Recognit"},{"issue":"2","key":"423_CR7","doi-asserted-by":"publisher","first-page":"379","DOI":"10.1007\/s11634-018-0308-3","volume":"13","author":"JE Chac\u00f3n","year":"2019","unstructured":"Chac\u00f3n JE (2019) Mixture model modal clustering. Adv Data Anal Classif 13(2):379\u2013404","journal-title":"Adv Data Anal Classif"},{"key":"423_CR8","doi-asserted-by":"publisher","DOI":"10.1201\/9780429485572","volume-title":"Multivariate kernel smoothing and its applications","author":"JE Chac\u00f3n","year":"2018","unstructured":"Chac\u00f3n JE, Duong T (2018) Multivariate kernel smoothing and its applications. Chapman and Hall\/CRC, London"},{"issue":"8","key":"423_CR9","doi-asserted-by":"publisher","first-page":"790","DOI":"10.1109\/34.400568","volume":"17","author":"Y Cheng","year":"1995","unstructured":"Cheng Y (1995) Mean shift, mode seeking, and clustering. IEEE Trans Pattern Anal 17(8):790\u2013799","journal-title":"IEEE Trans Pattern Anal"},{"key":"423_CR10","volume-title":"Model selection and model averaging","author":"G Claeskens","year":"2008","unstructured":"Claeskens G, Hjort N (2008) Model selection and model averaging. Cambridge University Press, Cambridge"},{"issue":"1","key":"423_CR11","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1111\/j.2517-6161.1977.tb01600.x","volume":"39","author":"A Dempster","year":"1977","unstructured":"Dempster A, Laird N, Rubin D (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B Stat Methodol 39(1):1\u201322","journal-title":"J R Stat Soc Ser B Stat Methodol"},{"issue":"2","key":"423_CR12","doi-asserted-by":"publisher","first-page":"139","DOI":"10.1023\/A:1007607513941","volume":"40","author":"T Dietterich","year":"2000","unstructured":"Dietterich T (2000) An experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting, and randomization. Mach Learn 40(2):139\u2013157","journal-title":"Mach Learn"},{"key":"423_CR13","unstructured":"Duong T (2019) ks: Kernel Smoothing. R package version 1.11.4. https:\/\/CRAN.R-project.org\/package=ks. Accessed Aug 2019"},{"key":"423_CR14","unstructured":"Fern XZ, Brodley CE (2003) Random projection for high dimensional data clustering: a cluster ensemble approach. In: Proceedings of the 20th international conference on machine learning, pp 186\u2013193"},{"issue":"2","key":"423_CR15","doi-asserted-by":"publisher","first-page":"179","DOI":"10.1111\/j.1469-1809.1936.tb02137.x","volume":"7","author":"R Fisher","year":"1936","unstructured":"Fisher R (1936) The use of multiple measurements in taxonomic problems. Ann Eugen 7(2):179\u2013188","journal-title":"Ann Eugen"},{"issue":"3","key":"423_CR16","first-page":"189","volume":"25","author":"M Forina","year":"1986","unstructured":"Forina M, Armanino C, Castino M, Ubigli M (1986) Multivariate data analysis as a discriminating method of the origin of wines. Vitis 25(3):189\u2013201","journal-title":"Vitis"},{"issue":"458","key":"423_CR17","doi-asserted-by":"publisher","first-page":"611","DOI":"10.1198\/016214502760047131","volume":"97","author":"C Fraley","year":"2002","unstructured":"Fraley C, Raftery AE (2002) Model-based clustering, discriminant analysis, and density estimation. J Am Stat Assoc 97(458):611\u2013631","journal-title":"J Am Stat Assoc"},{"key":"423_CR18","volume-title":"The elements of statistical learning","author":"J Friedman","year":"2001","unstructured":"Friedman J, Hastie T, Tibshirani R (2001) The elements of statistical learning. Springer, New York"},{"issue":"1","key":"423_CR19","doi-asserted-by":"publisher","first-page":"32","DOI":"10.1109\/TIT.1975.1055330","volume":"21","author":"K Fukunaga","year":"1975","unstructured":"Fukunaga K, Hostetler L (1975) The estimation of the gradient of a density function, with applications in pattern recognition. IEEE Trans Inform Theory 21(1):32\u201340","journal-title":"IEEE Trans Inform Theory"},{"issue":"1","key":"423_CR20","doi-asserted-by":"publisher","first-page":"127","DOI":"10.1007\/s00180-012-0374-5","volume":"28","author":"M Glodek","year":"2013","unstructured":"Glodek M, Schels M, Schwenker F (2013) Ensemble Gaussian mixture models for probability density estimation. Comput Stat 28(1):127\u2013138","journal-title":"Comput Stat"},{"issue":"1","key":"423_CR21","doi-asserted-by":"publisher","first-page":"3","DOI":"10.1007\/s11634-010-0058-3","volume":"4","author":"C Hennig","year":"2010","unstructured":"Hennig C (2010) Methods for merging Gaussian mixture components. Adv Data Anal Classif 4(1):3\u201334","journal-title":"Adv Data Anal Classif"},{"issue":"1","key":"423_CR22","doi-asserted-by":"publisher","first-page":"193","DOI":"10.1007\/BF01908075","volume":"2","author":"L Hubert","year":"1985","unstructured":"Hubert L, Arabie P (1985) Comparing partitions. J Classif 2(1):193\u2013218","journal-title":"J Classif"},{"key":"423_CR23","doi-asserted-by":"crossref","unstructured":"Kuncheva L, Hadjitodorov S (2004) Using diversity in cluster ensembles. In: 2004 IEEE international conference on systems, man and cybernetics, vol\u00a02. IEEE, pp 1214\u20131219","DOI":"10.1109\/ICSMC.2004.1399790"},{"issue":"1","key":"423_CR24","doi-asserted-by":"publisher","first-page":"21","DOI":"10.1017\/S0266466605050036","volume":"21","author":"H Leeb","year":"2005","unstructured":"Leeb H, P\u00f6tscher B (2005) Model selection and inference: facts and fiction. Econom Theory 21(1):21\u201359","journal-title":"Econom Theory"},{"issue":"3","key":"423_CR25","doi-asserted-by":"publisher","first-page":"547","DOI":"10.1198\/106186005X59586","volume":"14","author":"J Li","year":"2005","unstructured":"Li J (2005) Clustering based on a multilayer mixture model. J Comput Graph Stat 14(3):547\u2013568","journal-title":"J Comput Graph Stat"},{"key":"423_CR26","first-page":"1687","volume":"8","author":"J Li","year":"2007","unstructured":"Li J, Ray S, Lindsay B (2007) A nonparametric statistical approach to clustering via mode identification. J Mach Learn Res 8:1687\u20131723","journal-title":"J Mach Learn Res"},{"issue":"428","key":"423_CR27","doi-asserted-by":"publisher","first-page":"1535","DOI":"10.1080\/01621459.1994.10476894","volume":"89","author":"D Madigan","year":"1994","unstructured":"Madigan D, Raftery AE (1994) Model selection and accounting for model uncertainty in graphical models using Occam\u2019s window. J Am Stat Assoc 89(428):1535\u20131546","journal-title":"J Am Stat Assoc"},{"issue":"2","key":"423_CR28","doi-asserted-by":"publisher","first-page":"285","DOI":"10.1080\/10618600.2016.1200472","volume":"26","author":"G Malsiner-Walli","year":"2017","unstructured":"Malsiner-Walli G, Fr\u00fchwirth-Schnatter S, Gr\u00fcn B (2017) Identifying mixtures of mixtures using Bayesian estimation. J Comput Graph Stat 26(2):285\u2013295","journal-title":"J Comput Graph Stat"},{"issue":"3","key":"423_CR29","doi-asserted-by":"publisher","first-page":"413","DOI":"10.1111\/insr.12109","volume":"84","author":"G Menardi","year":"2016","unstructured":"Menardi G (2016) A review on modal clustering. Int Stat Rev 84(3):413\u2013433","journal-title":"Int Stat Rev"},{"issue":"1\u20132","key":"423_CR30","doi-asserted-by":"publisher","first-page":"91","DOI":"10.1023\/A:1023949509487","volume":"52","author":"S Monti","year":"2003","unstructured":"Monti S, Tamayo P, Mesirov J, Golub T (2003) Consensus clustering: a resampling-based method for class discovery and visualization of gene expression microarray data. Mach Learn 52(1\u20132):91\u2013118","journal-title":"Mach Learn"},{"key":"423_CR31","unstructured":"R Core Team (2019) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https:\/\/www.R-project.org\/. Accessed Aug 2019"},{"issue":"3","key":"423_CR32","doi-asserted-by":"publisher","first-page":"260","DOI":"10.3103\/S1066530707030052","volume":"16","author":"P Rigollet","year":"2007","unstructured":"Rigollet P, Tsybakov A (2007) Linear and convex aggregation of density estimators. Math Methods Stat 16(3):260\u2013280","journal-title":"Math Methods Stat"},{"key":"423_CR33","unstructured":"Russell N, Murphy TB, Raftery AE (2015) Bayesian model averaging in model-based clustering and density estimation. arXiv preprint arXiv:1506.09035"},{"issue":"2","key":"423_CR34","doi-asserted-by":"publisher","first-page":"461","DOI":"10.1214\/aos\/1176344136","volume":"6","author":"G Schwarz","year":"1978","unstructured":"Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6(2):461\u2013464","journal-title":"Ann Stat"},{"key":"423_CR35","doi-asserted-by":"publisher","DOI":"10.1002\/9781118575574","volume-title":"Multivariate density estimation: theory, practice, and visualization","author":"D Scott","year":"2015","unstructured":"Scott D (2015) Multivariate density estimation: theory, practice, and visualization. Wiley, New York"},{"key":"423_CR36","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1016\/j.csda.2015.01.006","volume":"93","author":"L Scrucca","year":"2016","unstructured":"Scrucca L (2016) Identifying connected components in Gaussian finite mixture models for clustering. Comput Stat Data Anal 93:5\u201317","journal-title":"Comput Stat Data Anal"},{"key":"423_CR37","unstructured":"Scrucca L (2020) A fast and efficient modal EM algorithm for Gaussian mixtures. arXiv preprint arXiv:2002.03600"},{"issue":"4","key":"423_CR38","doi-asserted-by":"publisher","first-page":"447","DOI":"10.1007\/s11634-015-0220-z","volume":"9","author":"L Scrucca","year":"2015","unstructured":"Scrucca L, Raftery AE (2015) Improved initialisation of model-based clustering using Gaussian hierarchical partitions. Adv Data Anal Classif 9(4):447\u2013460","journal-title":"Adv Data Anal Classif"},{"issue":"1","key":"423_CR39","doi-asserted-by":"publisher","first-page":"289","DOI":"10.32614\/RJ-2016-021","volume":"8","author":"L Scrucca","year":"2016","unstructured":"Scrucca L, Fop M, Murphy TB, Raftery AE (2016) mclust 5: clustering, classification and density estimation using Gaussian finite mixture models. R J 8(1):289","journal-title":"R J"},{"issue":"1\u20132","key":"423_CR40","doi-asserted-by":"publisher","first-page":"59","DOI":"10.1023\/A:1007511322260","volume":"36","author":"P Smyth","year":"1999","unstructured":"Smyth P, Wolpert D (1999) Linearly combining density estimators via stacking. Mach Learn 36(1\u20132):59\u201383","journal-title":"Mach Learn"},{"issue":"9","key":"423_CR41","doi-asserted-by":"publisher","first-page":"727","DOI":"10.1002\/cyto.a.22106","volume":"81","author":"J Spidlen","year":"2012","unstructured":"Spidlen J, Breuer K, Rosenberg C, Kotecha N, Brinkman R (2012) Flowrepository: a resource of annotated flow cytometry datasets associated with peer-reviewed publications. Cytom Part A 81(9):727\u2013731","journal-title":"Cytom Part A"},{"key":"423_CR42","first-page":"583","volume":"3","author":"A Strehl","year":"2002","unstructured":"Strehl A, Ghosh J (2002) Cluster ensembles\u2014a knowledge reuse framework for combining multiple partitions. J Mach Learn Res 3:583\u2013617","journal-title":"J Mach Learn Res"},{"issue":"1","key":"423_CR43","doi-asserted-by":"publisher","first-page":"025","DOI":"10.1007\/s00357-003-0004-6","volume":"20","author":"W Stuetzle","year":"2003","unstructured":"Stuetzle W (2003) Estimating the cluster tree of a density by analyzing the minimal spanning tree of a sample. J Classif 20(1):025\u2013047","journal-title":"J Classif"},{"key":"423_CR44","volume-title":"Statistical learning with sparsity: the lasso and generalizations","author":"R Tibshirani","year":"2015","unstructured":"Tibshirani R, Wainwright M, Hastie T (2015) Statistical learning with sparsity: the lasso and generalizations. Chapman and Hall, London"},{"issue":"1","key":"423_CR45","doi-asserted-by":"publisher","first-page":"43","DOI":"10.1007\/s11222-017-9793-z","volume":"29","author":"C Viroli","year":"2019","unstructured":"Viroli C, McLachlan G (2019) Deep Gaussian mixture models. Stat Comput 29(1):43\u201351","journal-title":"Stat Comput"},{"key":"423_CR46","unstructured":"Wang K, Ng A, McLachlan G (2018) EMMIXskew: the EM algorithm and skew mixture distribution. https:\/\/CRAN.R-project.org\/package=EMMIXskew. R package version 1.0.3"},{"issue":"2","key":"423_CR47","doi-asserted-by":"publisher","first-page":"197","DOI":"10.1007\/s11634-014-0182-6","volume":"9","author":"Y Wei","year":"2015","unstructured":"Wei Y, McNicholas PD (2015) Mixture model averaging for clustering. Adv Data Anal Classif 9(2):197\u2013217","journal-title":"Adv Data Anal Classif"}],"container-title":["Advances in Data Analysis and Classification"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11634-020-00423-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11634-020-00423-6\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11634-020-00423-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,8,15]],"date-time":"2024-08-15T10:41:11Z","timestamp":1723718471000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11634-020-00423-6"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,10,2]]},"references-count":47,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2021,9]]}},"alternative-id":["423"],"URL":"https:\/\/doi.org\/10.1007\/s11634-020-00423-6","relation":{},"ISSN":["1862-5347","1862-5355"],"issn-type":[{"type":"print","value":"1862-5347"},{"type":"electronic","value":"1862-5355"}],"subject":[],"published":{"date-parts":[[2020,10,2]]},"assertion":[{"value":"6 March 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"24 July 2020","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"14 September 2020","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"2 October 2020","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Compliance with ethical standards"}},{"value":"The authors declare that they have no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}