{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,12]],"date-time":"2026-03-12T15:36:20Z","timestamp":1773329780453,"version":"3.50.1"},"reference-count":37,"publisher":"Oxford University Press (OUP)","issue":"11","license":[{"start":{"date-parts":[[2023,10,17]],"date-time":"2023-10-17T00:00:00Z","timestamp":1697500800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"H2020-LongITools"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023,11,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>In consensus clustering, a clustering algorithm is used in combination with a subsampling procedure to detect stable clusters. Previous studies on both simulated and real data suggest that consensus clustering outperforms native algorithms.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We extend here consensus clustering to allow for attribute weighting in the calculation of pairwise distances using existing regularized approaches. We propose a procedure for the calibration of the number of clusters (and regularization parameter) by maximizing the sharp score, a novel stability score calculated directly from consensus clustering outputs, making it extremely computationally competitive. Our simulation study shows better clustering performances of (i) approaches calibrated by maximizing the sharp score compared to existing calibration scores and (ii) weighted compared to unweighted approaches in the presence of features that do not contribute to cluster definition. Application on real gene expression data measured in lung tissue reveals clear clusters corresponding to different lung cancer subtypes.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>The R package sharp (version \u22651.4.3) is available on CRAN at https:\/\/CRAN.R-project.org\/package=sharp.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btad635","type":"journal-article","created":{"date-parts":[[2023,10,17]],"date-time":"2023-10-17T19:48:03Z","timestamp":1697572083000},"source":"Crossref","is-referenced-by-count":3,"title":["Automated calibration of consensus weighted distance-based clustering approaches using sharp"],"prefix":"10.1093","volume":"39","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0781-3624","authenticated-orcid":false,"given":"Barbara","family":"Bodinier","sequence":"first","affiliation":[{"name":"Department of Epidemiology and Biostatistics, Imperial College London , Norfolk place , London W2 1PG, United Kingdom"}]},{"given":"Dragana","family":"Vuckovic","sequence":"additional","affiliation":[{"name":"Department of Epidemiology and Biostatistics, Imperial College London , Norfolk place , London W2 1PG, United Kingdom"}]},{"given":"Sabrina","family":"Rodrigues","sequence":"additional","affiliation":[{"name":"Department of Epidemiology and Biostatistics, Imperial College London , Norfolk place , London W2 1PG, United Kingdom"}]},{"given":"Sarah","family":"Filippi","sequence":"additional","affiliation":[{"name":"Department of Mathematics, Imperial College London , London SW7 2RH, United Kingdom"}]},{"given":"Julien","family":"Chiquet","sequence":"additional","affiliation":[{"name":"UMR MIA Paris-Saclay, AgroParisTech\/INRAE , Palaiseau 91123, France"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8341-5436","authenticated-orcid":false,"given":"Marc","family":"Chadeau-Hyam","sequence":"additional","affiliation":[{"name":"Department of Epidemiology and Biostatistics, Imperial College London , Norfolk place , London W2 1PG, United Kingdom"}]}],"member":"286","published-online":{"date-parts":[[2023,10,17]]},"reference":[{"key":"2023110617181311700_btad635-B1","doi-asserted-by":"crossref","DOI":"10.1201\/9781351074988","volume-title":"The New S Language","author":"Becker","year":"2018"},{"key":"2023110617181311700_btad635-B2","doi-asserted-by":"crossref","first-page":"13790","DOI":"10.1073\/pnas.191502998","article-title":"Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses","volume":"98","author":"Bhattacharjee","year":"2001","journal-title":"Proc Natl Acad Sci USA"},{"key":"2023110617181311700_btad635-B3","doi-asserted-by":"crossref","first-page":"qlad058","DOI":"10.1093\/jrsssc\/qlad058","article-title":"Automated calibration for stability selection in penalised regression and graphical models","author":"Bodinier","year":"2023","journal-title":"J R Stat Soc Series C Appl Stat"},{"key":"2023110617181311700_btad635-B4","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1080\/03610927408827101","article-title":"A dendrite method for cluster analysis","volume":"3","author":"Calinski","year":"1974","journal-title":"Comm Stat Theory Methods"},{"key":"2023110617181311700_btad635-B5","first-page":"101","volume-title":"The Normal Distribution","author":"Casella","year":"2012"},{"key":"2023110617181311700_btad635-B6","doi-asserted-by":"crossref","first-page":"224","DOI":"10.1109\/TPAMI.1979.4766909","article-title":"A cluster separation measure","volume":"1","author":"Davies","year":"1979","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"2023110617181311700_btad635-B7","author":"Dua","year":"2019"},{"key":"2023110617181311700_btad635-B8","doi-asserted-by":"crossref","first-page":"611","DOI":"10.1198\/016214502760047131","article-title":"Model-based clustering, discriminant analysis, and density estimation","volume":"97","author":"Fraley","year":"2002","journal-title":"J Am Stat Assoc"},{"key":"2023110617181311700_btad635-B9","doi-asserted-by":"crossref","first-page":"815","DOI":"10.1111\/j.1467-9868.2004.02059.x","article-title":"Clustering objects on subsets of attributes (with discussion)","volume":"66","author":"Friedman","year":"2004","journal-title":"J R Stat Soc Series B Stat Methodol"},{"key":"2023110617181311700_btad635-B10","doi-asserted-by":"crossref","first-page":"e1010577","DOI":"10.1371\/journal.pcbi.1010577","article-title":"Fast and interpretable consensus clustering via minipatch learning","volume":"18","author":"Gan","year":"2022","journal-title":"PLoS Comput Biol"},{"key":"2023110617181311700_btad635-B11","doi-asserted-by":"crossref","first-page":"5079","DOI":"10.1200\/JCO.2005.05.1748","article-title":"Gene expression profiling reveals reproducible human lung adenocarcinoma subtypes in multiple independent patient cohorts","volume":"24","author":"Hayes","year":"2006","journal-title":"J Clin Oncol"},{"key":"2023110617181311700_btad635-B12","doi-asserted-by":"crossref","first-page":"244","DOI":"10.32614\/RJ-2022-020","article-title":"Palmer archipelago penguins data in the palmerpenguins R package\u2014an alternative to Anderson\u2019s irises","volume":"14","author":"Horst","year":"2022","journal-title":"R J"},{"key":"2023110617181311700_btad635-B13","doi-asserted-by":"crossref","first-page":"193","DOI":"10.1007\/BF01908075","article-title":"Comparing partitions","volume":"2","author":"Hubert","year":"1985","journal-title":"J Classif"},{"key":"2023110617181311700_btad635-B14","doi-asserted-by":"crossref","first-page":"367","DOI":"10.1038\/s41586-018-0590-4","article-title":"Single-cell transcriptomics of 20 mouse organs creates a Tabula Muris","volume":"562","author":"Iram","year":"2018","journal-title":"Nature"},{"key":"2023110617181311700_btad635-B15","doi-asserted-by":"crossref","first-page":"1816","DOI":"10.1038\/s41598-020-58766-1","article-title":"M3C: Monte Carlo reference-based consensus clustering","volume":"10","author":"John","year":"2020","journal-title":"Sci Rep"},{"key":"2023110617181311700_btad635-B16","doi-asserted-by":"crossref","first-page":"514","DOI":"10.1007\/s00357-017-9240-z","article-title":"rCOSA: a software package for clustering objects on subsets of attributes","volume":"34","author":"Kampert","year":"2017","journal-title":"J Classif"},{"key":"2023110617181311700_btad635-B17","volume-title":"Finding Groups in Data: An Introduction to Cluster Analysis","author":"Kaufman","year":"2009"},{"key":"2023110617181311700_btad635-B18","doi-asserted-by":"crossref","first-page":"483","DOI":"10.1038\/nmeth.4236","article-title":"SC3: consensus clustering of single-cell RNA-seq data","volume":"14","author":"Kiselev","year":"2017","journal-title":"Nat Methods"},{"key":"2023110617181311700_btad635-B19","author":"Maechler","year":"2022"},{"key":"2023110617181311700_btad635-B20","volume-title":"Applied Statistics and Probability for Engineers","author":"Montgomery","year":"2010"},{"key":"2023110617181311700_btad635-B21","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1023\/A:1023949509487","article-title":"Consensus clustering: a resampling-based method for class discovery and visualization of gene expression microarray data","volume":"52","author":"Monti","year":"2003","journal-title":"Mach Learn"},{"key":"2023110617181311700_btad635-B22","doi-asserted-by":"crossref","first-page":"2843","DOI":"10.1093\/bioinformatics\/bty1049","article-title":"PINSPlus: a tool for tumor subtype discovery in integrated genomic data","volume":"35","author":"Nguyen","year":"2019","journal-title":"Bioinformatics"},{"key":"2023110617181311700_btad635-B23","doi-asserted-by":"crossref","first-page":"2025","DOI":"10.1101\/gr.215129.116","article-title":"A novel approach for data integration and disease subtyping","volume":"27","author":"Nguyen","year":"2017","journal-title":"Genome Res"},{"key":"2023110617181311700_btad635-B24","doi-asserted-by":"crossref","first-page":"2011","DOI":"10.1093\/bib\/bbz138","article-title":"Clustering and variable selection evaluation of 13 unsupervised methods for multi-omics data integration","volume":"21","author":"Pierre-Jean","year":"2020","journal-title":"Brief Bioinform"},{"key":"2023110617181311700_btad635-B25","doi-asserted-by":"crossref","first-page":"846","DOI":"10.1080\/01621459.1971.10482356","article-title":"Objective criteria for the evaluation of clustering methods","volume":"66","author":"Rand","year":"1971","journal-title":"J Am Stat Assoc"},{"key":"2023110617181311700_btad635-B26","doi-asserted-by":"crossref","first-page":"53","DOI":"10.1016\/0377-0427(87)90125-7","article-title":"Silhouettes: a graphical aid to the interpretation and validation of cluster analysis","volume":"20","author":"Rousseeuw","year":"1987","journal-title":"J Comput Appl Math"},{"key":"2023110617181311700_btad635-B27","doi-asserted-by":"crossref","first-page":"6207","DOI":"10.1038\/srep06207","article-title":"Critical limitations of consensus clustering in class discovery","volume":"4","author":"\u0218enbabao\u011flu","year":"2014","journal-title":"Sci Rep"},{"key":"2023110617181311700_btad635-B28","doi-asserted-by":"crossref","first-page":"15545","DOI":"10.1073\/pnas.0506580102","article-title":"Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles","volume":"102","author":"Subramanian","year":"2005","journal-title":"Proc Natl Acad Sci USA"},{"key":"2023110617181311700_btad635-B29","doi-asserted-by":"crossref","first-page":"411","DOI":"10.1111\/1467-9868.00293","article-title":"Estimating the number of clusters in a data set via the gap statistic","volume":"63","author":"Tibshirani","year":"2001","journal-title":"J R Stat Soc Series B Stat Methodol"},{"key":"2023110617181311700_btad635-B30","volume-title":"Information Retrieval","author":"Van Rijsbergen","year":"1979"},{"key":"2023110617181311700_btad635-B31","first-page":"235","article-title":"Clustering stability: an overview","volume":"2","author":"Von Luxburg","year":"2010","journal-title":"Found Trends Mach Learn"},{"key":"2023110617181311700_btad635-B32","doi-asserted-by":"crossref","first-page":"1113","DOI":"10.1038\/ng.2764","article-title":"The cancer genome atlas pan-cancer analysis project","volume":"45","author":"Weinstein","year":"2013","journal-title":"Nat Genet"},{"key":"2023110617181311700_btad635-B33","doi-asserted-by":"crossref","first-page":"1572","DOI":"10.1093\/bioinformatics\/btq170","article-title":"ConsensusClusterPlus: a class discovery tool with confidence assessments and item tracking","volume":"26","author":"Wilkerson","year":"2010","journal-title":"Bioinformatics"},{"key":"2023110617181311700_btad635-B34","doi-asserted-by":"crossref","first-page":"713","DOI":"10.1198\/jasa.2010.tm09415","article-title":"A framework for feature selection in clustering","volume":"105","author":"Witten","year":"2010","journal-title":"J Am Stat Assoc"},{"key":"2023110617181311700_btad635-B35","doi-asserted-by":"crossref","first-page":"1131","DOI":"10.1038\/nsmb.2660","article-title":"Single-cell RNA-seq profiling of human preimplantation embryos and embryonic stem cells","volume":"20","author":"Yan","year":"2013","journal-title":"Nat Struct Mol Biol"},{"key":"2023110617181311700_btad635-B36","first-page":"103","author":"Zhang","year":"1996"},{"key":"2023110617181311700_btad635-B37","doi-asserted-by":"crossref","first-page":"1704","DOI":"10.1016\/j.neucom.2009.12.029","article-title":"Spectral clustering with eigenvector selection based on entropy ranking","volume":"73","author":"Zhao","year":"2010","journal-title":"Neurocomputing"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btad635\/52191190\/btad635.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/11\/btad635\/52761666\/btad635.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/11\/btad635\/52761666\/btad635.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,6]],"date-time":"2023-11-06T17:20:26Z","timestamp":1699291226000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btad635\/7320014"}},"subtitle":[],"editor":[{"given":"Macha","family":"Nikolski","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2023,10,17]]},"references-count":37,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2023,11,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btad635","relation":{},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2023,11,1]]},"published":{"date-parts":[[2023,10,17]]},"article-number":"btad635"}}