{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,12]],"date-time":"2026-03-12T04:11:42Z","timestamp":1773288702065,"version":"3.50.1"},"reference-count":43,"publisher":"Oxford University Press (OUP)","issue":"23","license":[{"start":{"date-parts":[[2018,6,22]],"date-time":"2018-06-22T00:00:00Z","timestamp":1529625600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2018,12,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>One of the most important research areas in personalized medicine is the discovery of clinically relevant disease subtypes. This is usually accomplished by exploring gene expression data with unsupervised clustering methods. With the advent of multiple omics technologies, data integration methods have been developed to achieve better patient separability. However, these methods do not guarantee that patients in different clusters are separable in terms of survival.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We propose a new methodology that first computes a robust and sparse correlation matrix of the genes, then decomposes it and projects the patient data onto the first m spectral components of the correlation matrix. A clustering algorithm that is robust and adaptive to noise is then applied. The clustering is set up to optimize the separation between the survival curves estimated for each cluster. 
The method identifies clusters that have different omics signatures as well as statistically significant differences in survival time. The proposed methodology is tested on five cancer datasets downloaded from The Cancer Genome Atlas repository. The proposed method is compared with the Similarity Network Fusion (SNF) approach and with model-based clustering using Student\u2019s t-distribution (TMIX). Our method achieves better survival separability, even though it uses a single gene expression view whereas SNF uses a multi-view approach. Finally, a pathway-based analysis is performed to highlight the biological processes that differentiate the resulting patient groups.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>Our R source code is available online at https:\/\/github.com\/angy89\/RobustClusteringPatientSubtyping<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/bty502","type":"journal-article","created":{"date-parts":[[2018,6,20]],"date-time":"2018-06-20T03:29:09Z","timestamp":1529465349000},"page":"4064-4072","source":"Crossref","is-referenced-by-count":28,"title":["Robust clustering of noisy high-dimensional gene expression data for patients subtyping"],"prefix":"10.1093","volume":"34","author":[{"given":"Pietro","family":"Coretto","sequence":"first","affiliation":[{"name":"Department of Economics and Statistics, STATLAB, University of Salerno, Fisciano, SA, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3374-1492","authenticated-orcid":false,"given":"Angela","family":"Serra","sequence":"additional","affiliation":[{"name":"Department of Management 
and Innovation Systems, NeuRoNeLab, University of Salerno, Fisciano, SA, Italy"}]},{"given":"Roberto","family":"Tagliaferri","sequence":"additional","affiliation":[{"name":"Department of Management and Innovation Systems, NeuRoNeLab, University of Salerno, Fisciano, SA, Italy"}]}],"member":"286","published-online":{"date-parts":[[2018,6,22]]},"reference":[{"key":"2023012712292088400_bty502-B1","doi-asserted-by":"crossref","first-page":"3558","DOI":"10.1093\/bioinformatics\/btx464","article-title":"Towards clinically more relevant dissection of patient heterogeneity via survival-based Bayesian clustering","volume":"33","author":"Ahmad","year":"2017","journal-title":"Bioinformatics"},{"key":"2023012712292088400_bty502-B2","doi-asserted-by":"crossref","first-page":"803","DOI":"10.2307\/2532201","article-title":"Model-based Gaussian and non-Gaussian clustering","volume":"49","author":"Banfield","year":"1993","journal-title":"Biometrics"},{"key":"2023012712292088400_bty502-B3","doi-asserted-by":"crossref","first-page":"719","DOI":"10.1109\/34.865189","article-title":"Assessing a mixture model for clustering with the integrated completed likelihood","volume":"22","author":"Biernacki","year":"2000","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell"},{"key":"2023012712292088400_bty502-B4","doi-asserted-by":"crossref","first-page":"4164","DOI":"10.1073\/pnas.0308531101","article-title":"Metagenes and molecular pattern discovery using matrix factorization","volume":"101","author":"Brunet","year":"2004","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023012712292088400_bty502-B5","doi-asserted-by":"crossref","first-page":"1648","DOI":"10.1080\/01621459.2015.1100996","article-title":"Robust improper maximum likelihood: tuning, computation, and a comparison with other methods for robust Gaussian clustering","volume":"111","author":"Coretto","year":"2016","journal-title":"J. Am. Stat. 
Assoc"},{"key":"2023012712292088400_bty502-B6","first-page":"1","article-title":"Consistency, breakdown robustness, and algorithms for robust improper maximum likelihood clustering","volume":"18","author":"Coretto","year":"2017","journal-title":"J. Mach. Learn. Res"},{"key":"2023012712292088400_bty502-B7","doi-asserted-by":"crossref","first-page":"D972","DOI":"10.1093\/nar\/gkw838","article-title":"The Comparative Toxicogenomics Database: update 2017","volume":"45","author":"Davis","year":"2017","journal-title":"Nucleic Acids Res"},{"key":"2023012712292088400_bty502-B8","first-page":"653","volume-title":"Robustness and Outliers","author":"Escudero","year":"2015"},{"key":"2023012712292088400_bty502-B9","doi-asserted-by":"crossref","first-page":"1324","DOI":"10.1214\/07-AOS515","article-title":"A general trimming approach to robust cluster analysis","volume":"36","author":"Garc\u00eda-Escudero","year":"2008","journal-title":"Ann. Stat"},{"key":"2023012712292088400_bty502-B10","author":"Green","year":"2006"},{"key":"2023012712292088400_bty502-B11","doi-asserted-by":"crossref","DOI":"10.1007\/978-0-387-21606-5","volume-title":"The Elements of Statistical Learning","author":"Hastie","year":"2001"},{"key":"2023012712292088400_bty502-B12","doi-asserted-by":"crossref","first-page":"1313","DOI":"10.1214\/009053604000000571","article-title":"Breakdown points for maximum likelihood estimators of location\u2013scale mixtures","volume":"32","author":"Hennig","year":"2004","journal-title":"Ann. Stat"},{"key":"2023012712292088400_bty502-B13","doi-asserted-by":"crossref","first-page":"197","DOI":"10.1089\/omi.2015.0020","article-title":"The promise of multi-omics and clinical data integration to identify and target personalized healthcare approaches in autism spectrum disorders","volume":"19","author":"Higdon","year":"2015","journal-title":"OMICS J. Integr. 
Biol"},{"key":"2023012712292088400_bty502-B14","doi-asserted-by":"crossref","first-page":"184","DOI":"10.1038\/nrclinonc.2010.227","article-title":"Predictive, personalized, preventive, participatory (P4) cancer medicine","volume":"8","author":"Hood","year":"2011","journal-title":"Nat. Rev. Clin. Oncol"},{"key":"2023012712292088400_bty502-B15","doi-asserted-by":"crossref","first-page":"346","DOI":"10.1016\/j.gene.2013.08.027","article-title":"Insights into significant pathways and gene interaction networks underlying breast cancer cell line MCF-7 treated with 17\u03b2-estradiol (E2)","volume":"533","author":"Huan","year":"2014","journal-title":"Gene"},{"key":"2023012712292088400_bty502-B16","doi-asserted-by":"crossref","first-page":"D353","DOI":"10.1093\/nar\/gkw1092","article-title":"KEGG: new perspectives on genomes, pathways, diseases and drugs","volume":"45","author":"Kanehisa","year":"2017","journal-title":"Nucleic Acids Res"},{"key":"2023012712292088400_bty502-B17","doi-asserted-by":"crossref","first-page":"575","DOI":"10.1111\/j.1699-0463.1997.tb05056.x","article-title":"The cell cycle in breast cancer","volume":"105","author":"Landberg","year":"1997","journal-title":"APMIS"},{"key":"2023012712292088400_bty502-B18","doi-asserted-by":"crossref","first-page":"93","DOI":"10.1016\/j.jbo.2016.02.008","article-title":"The role of osteoclasts in breast cancer bone metastasis","volume":"5","author":"Le Pape","year":"2016","journal-title":"J. 
Bone Oncol"},{"key":"2023012712292088400_bty502-B19","doi-asserted-by":"crossref","first-page":"e0165457","DOI":"10.1371\/journal.pone.0165457","article-title":"Integrated multiple \u201c-omics\u201d data reveal subtypes of hepatocellular carcinoma","volume":"11","author":"Liu","year":"2016","journal-title":"PLoS One"},{"key":"2023012712292088400_bty502-B20","doi-asserted-by":"crossref","first-page":"37","DOI":"10.2147\/IMCRJ.S76488","article-title":"Toxoplasmosis complicating lung cancer: a case report","volume":"8","author":"Lu","year":"2015","journal-title":"Int. Med. Case Rep. J"},{"key":"2023012712292088400_bty502-B21","doi-asserted-by":"crossref","first-page":"630","DOI":"10.1126\/science.306.5696.630","article-title":"Getting the noise out of gene arrays","volume":"306","author":"Marshall","year":"2004","journal-title":"Science"},{"key":"2023012712292088400_bty502-B22","doi-asserted-by":"crossref","DOI":"10.1002\/0471721182","volume-title":"Finite Mixture Models","author":"McLachlan","year":"2000"},{"key":"2023012712292088400_bty502-B23","doi-asserted-by":"crossref","first-page":"413","DOI":"10.1093\/bioinformatics\/18.3.413","article-title":"A mixture model-based approach to the clustering of microarray expression data","volume":"18","author":"McLachlan","year":"2002","journal-title":"Bioinformatics"},{"key":"2023012712292088400_bty502-B24","doi-asserted-by":"crossref","first-page":"489","DOI":"10.1056\/NEJMp1114866","article-title":"Preparing for precision medicine","volume":"366","author":"Mirnezami","year":"2012","journal-title":"N. Engl. J. Med"},{"key":"2023012712292088400_bty502-B25","first-page":"332","article-title":"Robust methods of estimation of correlation-coefficient","volume":"48","author":"Pasman","year":"1987","journal-title":"Automat. 
Remote Control"},{"key":"2023012712292088400_bty502-B26","doi-asserted-by":"crossref","first-page":"339","DOI":"10.1023\/A:1008981510081","article-title":"Robust mixture modelling using the t distribution","volume":"10","author":"Peel","year":"2000","journal-title":"Stat. Comput"},{"key":"2023012712292088400_bty502-B27","doi-asserted-by":"crossref","first-page":"747","DOI":"10.1038\/35021093","article-title":"Molecular portraits of human breast tumours","volume":"406","author":"Perou","year":"2000","journal-title":"Nature"},{"key":"2023012712292088400_bty502-B28","doi-asserted-by":"crossref","first-page":"27","DOI":"10.1186\/s13073-016-0281-4","article-title":"COINCIDE: a framework for discovery of patient subtypes across multiple datasets","volume":"8","author":"Planey","year":"2016","journal-title":"Genome Med"},{"key":"2023012712292088400_bty502-B29","doi-asserted-by":"crossref","first-page":"156","DOI":"10.1016\/j.ceb.2004.02.003","article-title":"Endocytosis and cancer","volume":"16","author":"Polo","year":"2004","journal-title":"Curr. Opin. Cell Biol"},{"key":"2023012712292088400_bty502-B30","doi-asserted-by":"crossref","first-page":"e47","DOI":"10.1093\/nar\/gkv007","article-title":"limma powers differential expression analyses for RNA-sequencing and microarray studies","volume":"43","author":"Ritchie","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2023012712292088400_bty502-B31","doi-asserted-by":"crossref","first-page":"53","DOI":"10.1016\/0377-0427(87)90125-7","article-title":"Silhouettes: a graphical aid to the interpretation and validation of cluster analysis","volume":"20","author":"Rousseeuw","year":"1987","journal-title":"J. Comput. Appl. 
Math"},{"key":"2023012712292088400_bty502-B32","doi-asserted-by":"crossref","first-page":"152","DOI":"10.1186\/1471-2288-13-152","article-title":"Restricted mean survival time: an alternative to the hazard ratio for the design and analysis of randomized trials with a time-to-event outcome","volume":"13","author":"Royston","year":"2013","journal-title":"BMC Med. Res. Methodol"},{"key":"2023012712292088400_bty502-B33","doi-asserted-by":"crossref","first-page":"70","DOI":"10.1109\/MIS.2015.60","article-title":"Subtyping: what it is and its role in precision medicine","volume":"30","author":"Saria","year":"2015","journal-title":"IEEE Intell. Syst"},{"key":"2023012712292088400_bty502-B34","doi-asserted-by":"crossref","first-page":"625","DOI":"10.1093\/bioinformatics\/btx642","article-title":"Robust and sparse correlation matrix estimation for the analysis of high-dimensional genomics data","volume":"34","author":"Serra","year":"2017","journal-title":"Bioinformatics"},{"key":"2023012712292088400_bty502-B35","doi-asserted-by":"crossref","first-page":"214","DOI":"10.1186\/bcr2886","article-title":"NF-\u03baB, stem cells and breast cancer: the links get stronger","volume":"13","author":"Shostak","year":"2011","journal-title":"Breast Cancer Res"},{"key":"2023012712292088400_bty502-B36","doi-asserted-by":"crossref","DOI":"10.1038\/srep24949","article-title":"Pan-cancer subtyping in a 2D-map shows substructures that are driven by specific combinations of molecular characteristics","volume":"6","author":"Taskesen","year":"2016","journal-title":"Sci. 
Rep"},{"key":"2023012712292088400_bty502-B37","doi-asserted-by":"crossref","first-page":"267","DOI":"10.1007\/BF02289263","article-title":"Who belongs in the family?","volume":"18","author":"Thorndike","year":"1953","journal-title":"Psychometrika"},{"key":"2023012712292088400_bty502-B38","doi-asserted-by":"crossref","first-page":"411","DOI":"10.1111\/1467-9868.00293","article-title":"Estimating the number of clusters in a data set via the gap statistic","volume":"63","author":"Tibshirani","year":"2001","journal-title":"J. R. Stat. Soc. Ser. B (Stat. Methodol.)"},{"key":"2023012712292088400_bty502-B39","doi-asserted-by":"crossref","first-page":"725","DOI":"10.1080\/02841860801995396","article-title":"The value of TOP2A gene copy number variation as a biomarker in breast cancer: update of DBCG trial 89D","volume":"47","author":"Vang Nielsen","year":"2008","journal-title":"Acta Oncol"},{"key":"2023012712292088400_bty502-B40","doi-asserted-by":"crossref","first-page":"333","DOI":"10.1038\/nmeth.2810","article-title":"Similarity network fusion for aggregating data types on a genomic scale","volume":"11","author":"Wang","year":"2014","journal-title":"Nat. Methods"},{"key":"2023012712292088400_bty502-B41","author":"Wang","year":"2018"},{"key":"2023012712292088400_bty502-B42","doi-asserted-by":"crossref","first-page":"977","DOI":"10.1093\/bioinformatics\/17.10.977","article-title":"Model-based clustering and data transformations for gene expression data","volume":"17","author":"Yeung","year":"2001","journal-title":"Bioinformatics"},{"key":"2023012712292088400_bty502-B43","doi-asserted-by":"crossref","first-page":"284","DOI":"10.1089\/omi.2011.0118","article-title":"clusterProfiler: an R package for comparing biological themes among gene clusters","volume":"16","author":"Yu","year":"2012","journal-title":"OMICS J. Integr. 
Biol"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/23\/4064\/48920041\/bioinformatics_34_23_4064.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/23\/4064\/48920041\/bioinformatics_34_23_4064.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,27]],"date-time":"2023-01-27T13:16:30Z","timestamp":1674825390000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/34\/23\/4064\/5043009"}},"subtitle":[],"editor":[{"given":"Jonathan","family":"Wren","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2018,6,22]]},"references-count":43,"journal-issue":{"issue":"23","published-print":{"date-parts":[[2018,12,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bty502","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2018,12,1]]},"published":{"date-parts":[[2018,6,22]]}}}