{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T14:12:17Z","timestamp":1740147137501,"version":"3.37.3"},"reference-count":34,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2020,6,1]],"date-time":"2020-06-01T00:00:00Z","timestamp":1590969600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2020,6,12]],"date-time":"2020-06-12T00:00:00Z","timestamp":1591920000000},"content-version":"vor","delay-in-days":11,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100000266","name":"Engineering and Physical Sciences Research Council","doi-asserted-by":"publisher","award":["1654596"],"award-info":[{"award-number":["1654596"]}],"id":[{"id":"10.13039\/501100000266","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100012338","name":"Alan Turing Institute","doi-asserted-by":"publisher","award":["TU\/D\/000013"],"award-info":[{"award-number":["TU\/D\/000013"]}],"id":[{"id":"10.13039\/100012338","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Adv Data Anal Classif"],"published-print":{"date-parts":[[2020,6]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>We present a novel nonparametric Bayesian approach for performing cluster analysis in a context where observational units have data arising from multiple sources. Our approach uses a particle Gibbs sampler for inference in which cluster allocations are jointly updated using a conditional particle filter within a Gibbs sampler, improving the mixing of the MCMC chain. We develop several approaches to improving the computational performance of our algorithm. These methods can achieve greater than an order-of-magnitude improvement in performance at no cost to accuracy and can be applied more broadly to Bayesian inference for mixture models with a single dataset. We apply our algorithm to the discovery of risk cohorts amongst 243 patients presenting with kidney renal clear cell carcinoma, using samples from the Cancer Genome Atlas, for which there are gene expression, copy number variation, DNA methylation, protein expression and microRNA data. We identify 4 distinct consensus subtypes and show they are prognostic for survival rate (<jats:inline-formula><jats:alternatives><jats:tex-math>$$p &lt; 0.0001$$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                  <mml:mrow>\n                    <mml:mi>p<\/mml:mi>\n                    <mml:mo>&lt;<\/mml:mo>\n                    <mml:mn>0.0001<\/mml:mn>\n                  <\/mml:mrow>\n                <\/mml:math><\/jats:alternatives><\/jats:inline-formula>).<\/jats:p>","DOI":"10.1007\/s11634-020-00401-y","type":"journal-article","created":{"date-parts":[[2020,6,12]],"date-time":"2020-06-12T13:02:36Z","timestamp":1591966956000},"page":"463-484","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["ParticleMDI: particle Monte Carlo methods for the cluster analysis of multiple datasets with applications to cancer subtype identification"],"prefix":"10.1007","volume":"14","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2663-519X","authenticated-orcid":false,"given":"Nathan","family":"Cunningham","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4828-7368","authenticated-orcid":false,"given":"Jim E.","family":"Griffin","sequence":"additional","affiliation":[]},{"given":"David L.","family":"Wild","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2020,6,12]]},"reference":[{"issue":"3","key":"401_CR1","doi-asserted-by":"publisher","first-page":"269","DOI":"10.1111\/j.1467-9868.2009.00736.x","volume":"72","author":"C Andrieu","year":"2010","unstructured":"Andrieu C, Doucet A, Holenstein R (2010) Particle Markov chain Monte Carlo methods. J R Stat Soc Ser B Stat Methodol 72(3):269\u2013342","journal-title":"J R Stat Soc Ser B Stat Methodol"},{"key":"401_CR2","unstructured":"Bernardo JM, Smith AF (2001) Bayesian Theory"},{"issue":"28","key":"401_CR3","first-page":"1","volume":"18","author":"A Bouchard-C\u00f4t\u00e9","year":"2017","unstructured":"Bouchard-C\u00f4t\u00e9 A, Doucet A, Roth A (2017) Particle Gibbs split-merge sampling for Bayesian inference in mixture models. J Mach Learn Res 18(28):1\u201339","journal-title":"J Mach Learn Res"},{"issue":"3","key":"401_CR4","doi-asserted-by":"publisher","first-page":"539","DOI":"10.1093\/biomet\/89.3.539","volume":"89","author":"N Chopin","year":"2002","unstructured":"Chopin N (2002) A sequential particle filter method for static models. Biometrika 89(3):539\u2013552","journal-title":"Biometrika"},{"issue":"3","key":"401_CR5","doi-asserted-by":"publisher","first-page":"1855","DOI":"10.3150\/14-BEJ629","volume":"21","author":"N Chopin","year":"2015","unstructured":"Chopin N, Singh SS (2015) On particle Gibbs sampling. Bernoulli 21(3):1855\u20131883","journal-title":"Bernoulli"},{"key":"401_CR6","unstructured":"Cunningham N, Griffin JE, Wild DL, Lee A (2019) Bayesian Statistics: New Challenges and New Generations, vol 2018, Springer"},{"issue":"656\u2013704","key":"401_CR7","first-page":"3","volume":"12","author":"A Doucet","year":"2009","unstructured":"Doucet A, Johansen AM (2009) A tutorial on particle filtering and smoothing: fifteen years later. Handb Nonlinear Filter 12(656\u2013704):3","journal-title":"Handb Nonlinear Filter"},{"issue":"1","key":"401_CR8","doi-asserted-by":"publisher","first-page":"11","DOI":"10.1023\/B:STCO.0000009418.04621.cd","volume":"14","author":"P Fearnhead","year":"2004","unstructured":"Fearnhead P (2004) Particle filters for mixture models with an unknown number of components. Stat Comput 14(1):11\u201321","journal-title":"Stat Comput"},{"issue":"2","key":"401_CR9","doi-asserted-by":"publisher","first-page":"179","DOI":"10.1111\/j.1469-1809.1936.tb02137.x","volume":"7","author":"RA Fisher","year":"1936","unstructured":"Fisher RA (1936) The use of multiple measurements in taxonomic problems. Ann Eugen 7(2):179\u2013188","journal-title":"Ann Eugen"},{"issue":"2","key":"401_CR10","doi-asserted-by":"publisher","first-page":"367","DOI":"10.1214\/09-BA414","volume":"4","author":"A Fritsch","year":"2009","unstructured":"Fritsch A, Ickstadt K et al (2009) Improved criteria for clustering based on the posterior similarity matrix. Bayesian Anal 4(2):367\u2013391","journal-title":"Bayesian Anal"},{"issue":"10","key":"401_CR11","doi-asserted-by":"publisher","first-page":"e1005781","DOI":"10.1371\/journal.pcbi.1005781","volume":"13","author":"E Gabasova","year":"2017","unstructured":"Gabasova E, Reid J, Wernisch L (2017) Clusternomics: integrative context-dependent clustering for heterogeneous datasets. PLoS Comput Biol 13(10):e1005781","journal-title":"PLoS Comput Biol"},{"issue":"2","key":"401_CR12","doi-asserted-by":"publisher","first-page":"355","DOI":"10.1111\/1467-9469.00242","volume":"28","author":"PJ Green","year":"2001","unstructured":"Green PJ, Richardson S (2001) Modelling heterogeneity with and without the Dirichlet process. Scand J Stat 28(2):355\u2013375","journal-title":"Scand J Stat"},{"issue":"1","key":"401_CR13","doi-asserted-by":"publisher","first-page":"131","DOI":"10.1007\/s11222-015-9612-3","volume":"27","author":"J Griffin","year":"2014","unstructured":"Griffin J (2014) Sequential Monte Carlo methods for mixtures with normalized random measures with independent increments priors. Stat Comput 27(1):131\u2013145","journal-title":"Stat Comput"},{"key":"401_CR14","doi-asserted-by":"crossref","unstructured":"Hol JD, Schon TB, Gustafsson F (2006) On resampling algorithms for particle filters. In: nonlinear statistical signal processing workshop, 2006 IEEE, IEEE, pp 79\u201382","DOI":"10.1109\/NSSPW.2006.4378824"},{"issue":"2","key":"401_CR15","doi-asserted-by":"publisher","first-page":"269","DOI":"10.2307\/3315951","volume":"30","author":"H Ishwaran","year":"2002","unstructured":"Ishwaran H, Zarepour M (2002) Exact and approximate sum representations for the Dirichlet process. Can J Stat 30(2):269\u2013283","journal-title":"Can J Stat"},{"key":"401_CR16","unstructured":"Kassambara A, Kosinski M (2018) survminer: Drawing Survival Curves using \u2019ggplot2\u2019. R package version (4):2"},{"issue":"24","key":"401_CR17","doi-asserted-by":"publisher","first-page":"3290","DOI":"10.1093\/bioinformatics\/bts595","volume":"28","author":"P Kirk","year":"2012","unstructured":"Kirk P, Griffin JE, Savage RS, Ghahramani Z, Wild DL (2012) Bayesian correlated clustering to integrate multiple datasets. Bioinformatics 28(24):3290\u20133297","journal-title":"Bioinformatics"},{"key":"401_CR18","doi-asserted-by":"publisher","first-page":"CIN-S38000","DOI":"10.4137\/CIN.S38000","volume":"15","author":"N Lawlor","year":"2016","unstructured":"Lawlor N, Fabbri A, Guan P, George J, Karuturi RKM (2016) multiclust: an r-package for identifying biologically relevant clusters in cancer transcriptome profiles. Cancer Inf 15:CIN-S38000","journal-title":"Cancer Inf"},{"issue":"1","key":"401_CR19","doi-asserted-by":"publisher","first-page":"e0116774","DOI":"10.1371\/journal.pone.0116774","volume":"10","author":"H Li","year":"2015","unstructured":"Li H, Han D, Hou Y, Chen H, Chen Z (2015) Statistical inference methods for two crossing survival curves: a comparison of methods. PLoS One 10(1):e0116774","journal-title":"PLoS One"},{"issue":"430","key":"401_CR20","doi-asserted-by":"publisher","first-page":"567","DOI":"10.1080\/01621459.1995.10476549","volume":"90","author":"JS Liu","year":"1995","unstructured":"Liu JS, Chen R (1995) Blind deconvolution via sequential imputations. J Am Stat Assoc 90(430):567\u2013576","journal-title":"J Am Stat Assoc"},{"issue":"20","key":"401_CR21","doi-asserted-by":"publisher","first-page":"2610","DOI":"10.1093\/bioinformatics\/btt425","volume":"29","author":"EF Lock","year":"2013","unstructured":"Lock EF, Dunson DB (2013) Bayesian consensus clustering. Bioinformatics 29(20):2610\u20132616","journal-title":"Bioinformatics"},{"issue":"2","key":"401_CR22","doi-asserted-by":"publisher","first-page":"747","DOI":"10.1214\/14-AOAS726","volume":"8","author":"D McParland","year":"2014","unstructured":"McParland D, Gormley IC, McCormick TH, Clark SJ, Kabudula CW, Collinson MA (2014) Clustering South African households based on their asset status using latent variable models. Ann Appl Stat 8(2):747","journal-title":"Ann Appl Stat"},{"issue":"28","key":"401_CR23","doi-asserted-by":"publisher","first-page":"4548","DOI":"10.1002\/sim.7371","volume":"36","author":"D McParland","year":"2017","unstructured":"McParland D, Phillips CM, Brennan L, Roche HM, Gormley IC (2017) Clustering high-dimensional mixed data to uncover sub-phenotypes: joint analysis of phenotypic and genotypic data. Stat Med 36(28):4548\u20134569","journal-title":"Stat Med"},{"issue":"8","key":"401_CR24","doi-asserted-by":"publisher","first-page":"1222","DOI":"10.1093\/bioinformatics\/bth068","volume":"20","author":"M Medvedovic","year":"2004","unstructured":"Medvedovic M, Yeung K, Bumgarner R (2004) Bayesian mixture model based clustering of replicated microarray data. Bioinformatics 20(8):1222\u20131232","journal-title":"Bioinformatics"},{"issue":"1\u20132","key":"401_CR25","doi-asserted-by":"publisher","first-page":"91","DOI":"10.1023\/A:1023949509487","volume":"52","author":"S Monti","year":"2003","unstructured":"Monti S, Tamayo P, Mesirov J, Golub T (2003) Consensus clustering: a resampling-based method for class discovery and visualization of gene expression microarray data. Mach Learn 52(1\u20132):91\u2013118","journal-title":"Mach Learn"},{"key":"401_CR26","unstructured":"Murphy KP (2007) Conjugate Bayesian analysis of the Gaussian distribution. Tech. rep"},{"issue":"336","key":"401_CR27","doi-asserted-by":"publisher","first-page":"846","DOI":"10.1080\/01621459.1971.10482356","volume":"66","author":"WM Rand","year":"1971","unstructured":"Rand WM (1971) Objective criteria for the evaluation of clustering methods. J Am Stat Assoc 66(336):846\u2013850","journal-title":"J Am Stat Assoc"},{"issue":"4","key":"401_CR28","doi-asserted-by":"publisher","first-page":"615","DOI":"10.1109\/TCBB.2007.70269","volume":"6","author":"C Rasmussen","year":"2009","unstructured":"Rasmussen C, de la Cruz B, Ghahramani Z, Wild D (2009) Modeling and visualizing uncertainty in gene expression clusters using Dirichlet process mixtures. IEEE\/ACM Trans Comput Biol Bioinf 6(4):615\u2013628","journal-title":"IEEE\/ACM Trans Comput Biol Bioinf"},{"issue":"5","key":"401_CR29","doi-asserted-by":"publisher","first-page":"689","DOI":"10.1111\/j.1467-9868.2011.00781.x","volume":"73","author":"J Rousseau","year":"2011","unstructured":"Rousseau J, Mengersen K (2011) Asymptotic behaviour of the posterior distribution in overfitted mixture models. J R Stat Soc Ser B Stat Methodol 73(5):689\u2013710","journal-title":"J R Stat Soc Ser B Stat Methodol"},{"key":"401_CR30","unstructured":"Savage RS, Ghahramani Z, Griffin JE, Kirk P, Wild DL (2013) Identifying cancer subtypes in glioblastoma by combining genomic, transcriptomic and epigenomic data. arXiv preprint arXiv:1304.3577"},{"issue":"22","key":"401_CR31","doi-asserted-by":"publisher","first-page":"2906","DOI":"10.1093\/bioinformatics\/btp543","volume":"25","author":"R Shen","year":"2009","unstructured":"Shen R, Olshen AB, Ladanyi M (2009) Integrative clustering of multiple genomic data types using a joint latent variable model with application to breast and lung cancer subtype analysis. Bioinformatics 25(22):2906\u20132912","journal-title":"Bioinformatics"},{"issue":"1","key":"401_CR32","doi-asserted-by":"publisher","first-page":"125","DOI":"10.1007\/s11336-007-9019-y","volume":"73","author":"D Steinley","year":"2008","unstructured":"Steinley D, Brusco MJ (2008) Selection of variables in cluster analysis: an empirical comparison of eight procedures. Psychometrika 73(1):125","journal-title":"Psychometrika"},{"issue":"1","key":"401_CR33","doi-asserted-by":"publisher","first-page":"156","DOI":"10.1093\/biomet\/64.1.156","volume":"64","author":"RE Tarone","year":"1977","unstructured":"Tarone RE, Ware J (1977) On distribution-free tests for equality of survival distributions. Biometrika 64(1):156\u2013160","journal-title":"Biometrika"},{"issue":"7","key":"401_CR34","doi-asserted-by":"publisher","first-page":"644","DOI":"10.1038\/nbt.2940","volume":"32","author":"Y Yuan","year":"2014","unstructured":"Yuan Y, Van Allen EM, Omberg L, Wagle N, Amin-Mansour A, Sokolov A, Byers LA, Xu Y, Hess KR, Diao L et al (2014) Assessing the clinical utility of cancer genomic and proteomic data across tumor types. Nat Biotechnol 32(7):644","journal-title":"Nat Biotechnol"}],"container-title":["Advances in Data Analysis and Classification"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11634-020-00401-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11634-020-00401-y\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11634-020-00401-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,6,11]],"date-time":"2021-06-11T23:59:02Z","timestamp":1623455942000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11634-020-00401-y"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,6]]},"references-count":34,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2020,6]]}},"alternative-id":["401"],"URL":"https:\/\/doi.org\/10.1007\/s11634-020-00401-y","relation":{},"ISSN":["1862-5347","1862-5355"],"issn-type":[{"type":"print","value":"1862-5347"},{"type":"electronic","value":"1862-5355"}],"subject":[],"published":{"date-parts":[[2020,6]]},"assertion":[{"value":"15 January 2019","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"29 March 2020","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"14 May 2020","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"12 June 2020","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}