{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T20:34:56Z","timestamp":1772138096099,"version":"3.50.1"},"reference-count":47,"publisher":"Oxford University Press (OUP)","issue":"23","license":[{"start":{"date-parts":[[2019,5,11]],"date-time":"2019-05-11T00:00:00Z","timestamp":1557532800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/501100005416","name":"Research Council of Norway","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100005416","id-type":"DOI","asserted-by":"publisher"}]},{"name":"BigInsight"},{"DOI":"10.13039\/100008730","name":"Norwegian Cancer Society","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100008730","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100006095","name":"South-Eastern Norway Regional Health Authority","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100006095","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2019,12,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Unsupervised clustering is important in disease subtyping, among having other genomic applications. As genomic data has become more multifaceted, how to cluster across data sources for more precise subtyping is an ever more important area of research. 
Many of the methods proposed so far, including iCluster and Cluster of Cluster Assignments (COCAs), make an unreasonable assumption of a common clustering across all data sources, and those that do not are fewer and tend to be computationally intensive.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We propose a Bayesian parametric model for integrative, unsupervised clustering across data sources. In our two-way latent structure model, samples are clustered in relation to each specific data source, distinguishing it from methods like COCAs and iCluster, but cluster labels have across-dataset meaning, allowing cluster information to be shared between data sources. A common scaling across data sources is not required, and inference is obtained by a Gibbs Sampler, which we improve with a warm start strategy and modified density functions to robustify and speed convergence. Posterior interpretation allows for inference on common clusterings occurring among subsets of data sources. An interesting statistical formulation of the model results in sampling from closed-form posteriors despite incorporation of a complex latent structure. We fit the model with Gaussian and more general densities, which influences the degree of across-dataset cluster label sharing. Uniquely among integrative clustering models, our formulation makes no nestedness assumptions of samples across data sources so that a sample missing data from one genomic source can be clustered according to its existing data sources. We apply our model to a Norwegian breast cancer cohort of ductal carcinoma in situ and invasive tumors, comprised of somatic copy-number alteration, methylation and expression datasets. We find enrichment in the Her2 subtype and ductal carcinoma among those observations exhibiting greater cluster correspondence across expression and CNA data. 
In general, there are few pan-genomic clusterings, suggesting that models assuming a common clustering across genomic data sources might yield misleading results.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>The model is implemented in an R package called twl (\u2018two-way latent\u2019), available on CRAN. Data for analysis are available within the R package.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btz381","type":"journal-article","created":{"date-parts":[[2019,5,1]],"date-time":"2019-05-01T15:13:32Z","timestamp":1556723612000},"page":"4886-4897","source":"Crossref","is-referenced-by-count":9,"title":["A Bayesian two-way latent structure model for genomic data integration reveals few pan-genomic cluster subtypes in a breast cancer cohort"],"prefix":"10.1093","volume":"35","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3174-1656","authenticated-orcid":false,"given":"David M","family":"Swanson","sequence":"first","affiliation":[{"name":"Oslo Centre for Biostatistics and Epidemiology, Oslo University Hospital , Oslo, Norway"}]},{"given":"Tonje","family":"Lien","sequence":"additional","affiliation":[{"name":"Department of Cancer Genetics, Institute for Cancer Research, Oslo University Hospital , Oslo, Norway"}]},{"given":"Helga","family":"Bergholtz","sequence":"additional","affiliation":[{"name":"Department of Cancer Genetics, Institute for Cancer Research, Oslo University Hospital , Oslo, Norway"}]},{"given":"Therese","family":"S\u00f8rlie","sequence":"additional","affiliation":[{"name":"Department of Cancer Genetics, Institute for Cancer Research, Oslo University 
Hospital , Oslo, Norway"},{"name":"Institute of Clinical Medicine, University of Oslo , Oslo, Norway"}]},{"given":"Arnoldo","family":"Frigessi","sequence":"additional","affiliation":[{"name":"Oslo Centre for Biostatistics and Epidemiology, Oslo University Hospital , Oslo, Norway"},{"name":"Oslo Centre for Biostatistics and Epidemiology, University of Oslo , Oslo, Norway"}]}],"member":"286","published-online":{"date-parts":[[2019,5,11]]},"reference":[{"key":"2023013108304825300_btz381-B1","doi-asserted-by":"crossref","first-page":"431.","DOI":"10.1186\/s13059-014-0431-1","article-title":"Genome-driven integrated classification of breast cancer validated in over 7500 samples","volume":"15","author":"Ali","year":"2014","journal-title":"Genome Biol"},{"key":"2023013108304825300_btz381-B2","doi-asserted-by":"crossref","first-page":"1161","DOI":"10.1198\/016214501753381814","article-title":"Analysis of data from viral DNA microchips","volume":"96","author":"Amaratunga","year":"2001","journal-title":"J. Am. Stat. Assoc"},{"key":"2023013108304825300_btz381-B3","doi-asserted-by":"crossref","first-page":"803.","DOI":"10.2307\/2532201","article-title":"Model-based Gaussian and non-Gaussian clustering","volume":"49","author":"Banfield","year":"1993","journal-title":"Biometrics"},{"key":"2023013108304825300_btz381-B4","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12918-015-0211-x","article-title":"Integrating genetics and epigenetics in breast cancer: biological insights, experimental, computational methods and therapeutic potential","volume":"9","author":"Cava","year":"2015","journal-title":"BMC Syst. 
Biol"},{"key":"2023013108304825300_btz381-B5","doi-asserted-by":"crossref","first-page":"e0176278.","DOI":"10.1371\/journal.pone.0176278","article-title":"Integrative clustering of multi-level omic data based on non-negative matrix factorization algorithm","volume":"12","author":"Chalise","year":"2017","journal-title":"PLoS One"},{"key":"2023013108304825300_btz381-B6","first-page":"202.","article-title":"Integrative clustering methods for high-dimensional molecular data","volume":"3","author":"Chalise","year":"2014","journal-title":"Transl. Cancer Res"},{"key":"2023013108304825300_btz381-B7","doi-asserted-by":"crossref","first-page":"355","DOI":"10.1007\/s10115-008-0134-6","article-title":"Non-negative matrix factorization for semi-supervised data clustering","volume":"17","author":"Chen","year":"2008","journal-title":"Knowl. Inf. Syst"},{"key":"2023013108304825300_btz381-B8","doi-asserted-by":"crossref","first-page":"1648","DOI":"10.1080\/01621459.2015.1100996","article-title":"Robust improper maximum likelihood: tuning, computation, and a comparison with other methods for robust Gaussian clustering","volume":"111","author":"Coretto","year":"2016","journal-title":"J. Am. Stat. Assoc"},{"key":"2023013108304825300_btz381-B9","first-page":"39.","article-title":"Consistency, breakdown robustness, and algorithms for robust improper maximum likelihood clustering","volume":"18","author":"Coretto","year":"2017","journal-title":"J. Mach. Learn. 
Res"},{"key":"2023013108304825300_btz381-B10","doi-asserted-by":"crossref","first-page":"346","DOI":"10.1038\/nature10983","article-title":"The genomic and transcriptomic architecture of 2000 breast tumours reveals novel subgroups","volume":"486","author":"Curtis","year":"2012","journal-title":"Nature"},{"key":"2023013108304825300_btz381-B11","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1093\/biostatistics\/kxh025","article-title":"Bayesian latent variable models for mixed discrete outcomes","volume":"6","author":"Dunson","year":"2005","journal-title":"Biostatistics"},{"key":"2023013108304825300_btz381-B12","doi-asserted-by":"crossref","first-page":"e1005781.","DOI":"10.1371\/journal.pcbi.1005781","article-title":"Clusternomics: integrative context-dependent clustering for heterogeneous datasets","volume":"13","author":"Gabasova","year":"2017","journal-title":"PLoS Comput. Biol"},{"key":"2023013108304825300_btz381-B13","doi-asserted-by":"crossref","first-page":"537","DOI":"10.1093\/biostatistics\/kxw005","article-title":"Integrative clustering of high-dimensional data with joint and individual clusters","volume":"17","author":"Hellton","year":"2016","journal-title":"Biostatistics"},{"key":"2023013108304825300_btz381-B14","doi-asserted-by":"crossref","first-page":"1313","DOI":"10.1214\/009053604000000571","article-title":"Breakdown points for maximum likelihood estimators of location-scale mixtures","volume":"32","author":"Hennig","year":"2004","journal-title":"Ann. 
Stat"},{"key":"2023013108304825300_btz381-B15","doi-asserted-by":"crossref","first-page":"929","DOI":"10.1016\/j.cell.2014.06.049","article-title":"Multiplatform analysis of 12 cancer types reveals molecular classification within and across tissues of origin","volume":"158","author":"Hoadley","year":"2014","journal-title":"Cell"},{"key":"2023013108304825300_btz381-B16","doi-asserted-by":"crossref","first-page":"305","DOI":"10.1093\/bib\/bbr056","article-title":"Lessons from a decade of integrating cancer copy number alterations with gene expression profiles","volume":"13","author":"Huang","year":"2012","journal-title":"Brief. Bioinform"},{"key":"2023013108304825300_btz381-B17","doi-asserted-by":"crossref","first-page":"3290","DOI":"10.1093\/bioinformatics\/bts595","article-title":"Bayesian correlated clustering to integrate multiple datasets","volume":"28","author":"Kirk","year":"2012","journal-title":"Bioinformatics"},{"key":"2023013108304825300_btz381-B18","doi-asserted-by":"crossref","first-page":"671","DOI":"10.1126\/science.220.4598.671","article-title":"Optimization by simulated annealing","volume":"220","author":"Kirkpatrick","year":"1983","journal-title":"Science"},{"key":"2023013108304825300_btz381-B19","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1038\/nature11412","article-title":"Comprehensive molecular portraits of human breast tumours","volume":"490","author":"Koboldt","year":"2012","journal-title":"Nature"},{"key":"2023013108304825300_btz381-B20","doi-asserted-by":"crossref","first-page":"1327","DOI":"10.1214\/11-AOAS533","article-title":"Integrative model-based clustering of microarray methylation and expression data","volume":"6","author":"Kormaksson","year":"2012","journal-title":"Ann. Appl. 
Stat"},{"key":"2023013108304825300_btz381-B21","doi-asserted-by":"crossref","first-page":"299","DOI":"10.1038\/nrc3721","article-title":"Principles and methods of integrative genomic analyses in cancer","volume":"14","author":"Kristensen","year":"2014","journal-title":"Nat. Rev. Cancer"},{"key":"2023013108304825300_btz381-B22","doi-asserted-by":"crossref","first-page":"1166","DOI":"10.1016\/j.celrep.2016.06.051","article-title":"Molecular features of subtype-specific progression from ductal carcinoma in situ to invasive breast cancer","volume":"16","author":"Lesurf","year":"2016","journal-title":"Cell Rep"},{"key":"2023013108304825300_btz381-B24","doi-asserted-by":"crossref","first-page":"2610","DOI":"10.1093\/bioinformatics\/btt425","article-title":"Bayesian consensus clustering","volume":"29","author":"Lock","year":"2013","journal-title":"Bioinformatics"},{"key":"2023013108304825300_btz381-B25","doi-asserted-by":"crossref","first-page":"523","DOI":"10.1214\/12-AOAS597","article-title":"Joint and individual variation explained (JIVE) for integrated analysis of multiple data types","volume":"7","author":"Lock","year":"2013","journal-title":"Ann. Appl. Stat"},{"key":"2023013108304825300_btz381-B23","first-page":"1","article-title":"Integrative clustering reveals a novel split in the luminal A subtype of breast cancer with impact on outcome","volume":"19","author":"Miriam","year":"2017","journal-title":"Breast Cancer Res"},{"key":"2023013108304825300_btz381-B26","doi-asserted-by":"crossref","first-page":"4245","DOI":"10.1073\/pnas.1208949110","article-title":"Pattern discovery and cancer gene identification in integrated cancer genomic data","volume":"110","author":"Mo","year":"2013","journal-title":"Proc. Natl. Acad. 
Sci"},{"key":"2023013108304825300_btz381-B27","doi-asserted-by":"crossref","first-page":"71","DOI":"10.1093\/biostatistics\/kxx017","article-title":"A fully Bayesian latent variable model for integrative clustering analysis of multi-type omics data","volume":"19","author":"Mo","year":"2018","journal-title":"Biostatistics"},{"key":"2023013108304825300_btz381-B28","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12864-015-1994-2","article-title":"Changes in correlation between promoter methylation and gene expression in cancer","volume":"16","author":"Moarii","year":"2015","journal-title":"BMC Genomics"},{"key":"2023013108304825300_btz381-B29","doi-asserted-by":"crossref","first-page":"357","DOI":"10.1016\/j.molonc.2010.06.007","article-title":"Molecular diversity in ductal carcinoma in situ (DCIS) and early invasive breast cancer","volume":"4","author":"Muggerud","year":"2010","journal-title":"Mol. Oncol"},{"key":"2023013108304825300_btz381-B30","doi-asserted-by":"crossref","first-page":"704","DOI":"10.1016\/j.molonc.2013.02.018","article-title":"Influence of DNA copy number and mRNA levels on the expression of breast cancer related proteins","volume":"7","author":"Myhre","year":"2013","journal-title":"Mol. 
Oncol"},{"key":"2023013108304825300_btz381-B31","first-page":"1","article-title":"Expression and methylation patterns partition luminal-A breast tumors into distinct prognostic subgroups","volume":"18","author":"Netanely","year":"2016","journal-title":"Breast Cancer Res"},{"key":"2023013108304825300_btz381-B32","doi-asserted-by":"crossref","first-page":"591.","DOI":"10.1186\/1471-2164-13-591","article-title":"Copynumber: efficient algorithms for single-and multi-track copy number segmentation","volume":"13","author":"Nilsen","year":"2012","journal-title":"BMC Genomics"},{"key":"2023013108304825300_btz381-B33","doi-asserted-by":"crossref","first-page":"636","DOI":"10.1172\/JCI40724","article-title":"Cellular and genetic diversity in the progression of in situ human breast carcinomas to an invasive phenotype","volume":"120","author":"Park","year":"2010","journal-title":"J. Clin. Invest"},{"key":"2023013108304825300_btz381-B34","doi-asserted-by":"crossref","first-page":"1160","DOI":"10.1200\/JCO.2008.18.1370","article-title":"Supervised risk predictor of breast cancer based on intrinsic subtypes","volume":"27","author":"Parker","year":"2009","journal-title":"J. Clin. 
Oncol"},{"key":"2023013108304825300_btz381-B35","year":"2017"},{"key":"2023013108304825300_btz381-B36","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/1471-2105-7-280","article-title":"Integrated biclustering of heterogeneous genome-wide datasets for the inference of global regulatory networks","volume":"7","author":"Reiss","year":"2006","journal-title":"BMC Bioinform"},{"key":"2023013108304825300_btz381-B37","doi-asserted-by":"crossref","first-page":"2906","DOI":"10.1093\/bioinformatics\/btp543","article-title":"Integrative clustering of multiple genomic data types using a joint latent variable model with application to breast and lung cancer subtype analysis","volume":"25","author":"Shen","year":"2009","journal-title":"Bioinformatics"},{"key":"2023013108304825300_btz381-B38","doi-asserted-by":"crossref","first-page":"269","DOI":"10.1214\/12-AOAS578","article-title":"Sparse integrative clustering of multiple omics data sets","volume":"7","author":"Shen","year":"2013","journal-title":"Ann. Appl. Stat"},{"key":"2023013108304825300_btz381-B39","doi-asserted-by":"crossref","first-page":"10869","DOI":"10.1073\/pnas.191367098","article-title":"Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications","volume":"98","author":"S\u00f8rlie","year":"2001","journal-title":"Proc. Natl. Acad. Sci"},{"key":"2023013108304825300_btz381-B40","doi-asserted-by":"crossref","first-page":"8418","DOI":"10.1073\/pnas.0932692100","article-title":"Repeated observation of breast tumor subtypes in independent gene expression data sets","volume":"100","author":"S\u00f8rlie","year":"2003","journal-title":"Proc. Natl. Acad. 
Sci"},{"key":"2023013108304825300_btz381-B41","doi-asserted-by":"crossref","first-page":"i268","DOI":"10.1093\/bioinformatics\/btv244","article-title":"Integrating different data types by regularized unsupervised multiple kernel learning with application to cancer subtype discovery","volume":"31","author":"Speicher","year":"2015","journal-title":"Bioinformatics"},{"key":"2023013108304825300_btz381-B42","doi-asserted-by":"crossref","first-page":"3009","DOI":"10.1093\/nar\/gky131","article-title":"The association between copy number aberration, DNA methylation and gene expression in tumor samples","volume":"46","author":"Sun","year":"2018","journal-title":"Nucleic Acids Res"},{"key":"2023013108304825300_btz381-B43","doi-asserted-by":"crossref","first-page":"325","DOI":"10.2217\/epi.12.21","article-title":"Complete pipeline for Infinium \u00ae Human Methylation 450k BeadChip data processing using subset quantile normalization for accurate DNA methylation estimation","volume":"4","author":"Touleimat","year":"2012","journal-title":"Epigenomics"},{"key":"2023013108304825300_btz381-B44","doi-asserted-by":"crossref","first-page":"155","DOI":"10.1038\/nature13600","article-title":"Clonal evolution in breast cancer revealed by single nucleus genome sequencing","volume":"512","author":"Wang","year":"2014","journal-title":"Nature"},{"key":"2023013108304825300_btz381-B45","doi-asserted-by":"crossref","first-page":"1394","DOI":"10.1038\/bjc.2013.496","article-title":"Review of processing and analysis methods for DNA methylation array data","volume":"109","author":"Wilhelm-Benartzi","year":"2013","journal-title":"Br. J. Cancer"},{"key":"2023013108304825300_btz381-B46","doi-asserted-by":"crossref","first-page":"127ps10","DOI":"10.1126\/scitranslmed.3003854","article-title":"Intratumor heterogeneity: seeing the wood for the trees","volume":"4","author":"Yap","year":"2012","journal-title":"Sci. Transl. 
Med"},{"key":"2023013108304825300_btz381-B47","doi-asserted-by":"crossref","first-page":"9379","DOI":"10.1093\/nar\/gks725","article-title":"Discovery of multi-dimensional modules by integrative analysis of cancer genomic data","volume":"40","author":"Zhang","year":"2012","journal-title":"Nucleic Acids Res"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btz381\/28768971\/btz381.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/35\/23\/4886\/48978493\/bioinformatics_35_23_4886.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/35\/23\/4886\/48978493\/bioinformatics_35_23_4886.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,31]],"date-time":"2023-01-31T12:35:56Z","timestamp":1675168556000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/35\/23\/4886\/5488120"}},"subtitle":[],"editor":[{"given":"John","family":"Hancock","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2019,5,11]]},"references-count":47,"journal-issue":{"issue":"23","published-print":{"date-parts":[[2019,12,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btz381","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/387076","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2019,12,1]]},"published":{"date-parts":[[2019,5,11]]}}}