{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,12]],"date-time":"2025-12-12T13:19:16Z","timestamp":1765545556602,"version":"3.37.0"},"reference-count":36,"publisher":"Oxford University Press (OUP)","issue":"12","license":[{"start":{"date-parts":[[2016,10,2]],"date-time":"2016-10-02T00:00:00Z","timestamp":1475366400000},"content-version":"vor","delay-in-days":2685,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/2.0\/uk\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2009,6,15]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: Analysis of array comparative genomic hybridization (aCGH) data for recurrent DNA copy number alterations from a cohort of patients can yield distinct sets of molecular signatures or profiles. This can be due to the presence of heterogeneous cancer subtypes within a supposedly homogeneous population.<\/jats:p><jats:p>Results: We propose a novel statistical method for automatically detecting such subtypes or clusters. Our approach is model based: each cluster is defined in terms of a sparse profile, which contains the locations of unusually frequent alterations. The profile is represented as a hidden Markov model. Samples are assigned to clusters based on their similarity to the cluster's profile. We simultaneously infer the cluster assignments and the cluster profiles using an expectation maximization-like algorithm. We show, using a realistic simulation study, that our method is significantly more accurate than standard clustering techniques. We then apply our method to two clinical datasets. In particular, we examine previously reported aCGH data from a cohort of 106 follicular lymphoma patients, and discover clusters that are known to correspond to clinically relevant subgroups. 
In addition, we examine a cohort of 92 diffuse large B-cell lymphoma patients, and discover previously unreported clusters of biological interest which have inspired followup clinical research on an independent cohort.<\/jats:p><jats:p>Availability: Software and synthetic datasets are available at http:\/\/www.cs.ubc.ca\/\u223csshah\/acgh as part of the CNA-HMMer package.<\/jats:p><jats:p>Contact: \u00a0sshah@bccrc.ca<\/jats:p><jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btp205","type":"journal-article","created":{"date-parts":[[2009,5,28]],"date-time":"2009-05-28T15:48:54Z","timestamp":1243525734000},"page":"i30-i38","source":"Crossref","is-referenced-by-count":16,"title":["Model-based clustering of array CGH data"],"prefix":"10.1093","volume":"25","author":[{"given":"Sohrab P.","family":"Shah","sequence":"first","affiliation":[{"name":"1 Department of Computer Science, University of British Columbia, 201-2366 Main Mall Vancouver, BC V6T 1Z4 Canada and 2British Columbia Cancer Agency, 600 W 10th Ave Vancouver, BC V5Z 4E6 Canada"},{"name":"1 Department of Computer Science, University of British Columbia, 201-2366 Main Mall Vancouver, BC V6T 1Z4 Canada and 2British Columbia Cancer Agency, 600 W 10th Ave Vancouver, BC V5Z 4E6 Canada"}]},{"suffix":"Jr","given":"K-John","family":"Cheung","sequence":"additional","affiliation":[{"name":"1 Department of Computer Science, University of British Columbia, 201-2366 Main Mall Vancouver, BC V6T 1Z4 Canada and 2British Columbia Cancer Agency, 600 W 10th Ave Vancouver, BC V5Z 4E6 Canada"}]},{"given":"Nathalie A.","family":"Johnson","sequence":"additional","affiliation":[{"name":"1 Department of Computer Science, University of British Columbia, 201-2366 Main Mall Vancouver, BC V6T 1Z4 Canada and 2British Columbia Cancer Agency, 600 W 10th Ave Vancouver, BC V5Z 4E6 
Canada"}]},{"given":"Guillaume","family":"Alain","sequence":"additional","affiliation":[{"name":"1 Department of Computer Science, University of British Columbia, 201-2366 Main Mall Vancouver, BC V6T 1Z4 Canada and 2British Columbia Cancer Agency, 600 W 10th Ave Vancouver, BC V5Z 4E6 Canada"}]},{"given":"Randy D.","family":"Gascoyne","sequence":"additional","affiliation":[{"name":"1 Department of Computer Science, University of British Columbia, 201-2366 Main Mall Vancouver, BC V6T 1Z4 Canada and 2British Columbia Cancer Agency, 600 W 10th Ave Vancouver, BC V5Z 4E6 Canada"}]},{"given":"Douglas E.","family":"Horsman","sequence":"additional","affiliation":[{"name":"1 Department of Computer Science, University of British Columbia, 201-2366 Main Mall Vancouver, BC V6T 1Z4 Canada and 2British Columbia Cancer Agency, 600 W 10th Ave Vancouver, BC V5Z 4E6 Canada"}]},{"given":"Raymond T.","family":"Ng","sequence":"additional","affiliation":[{"name":"1 Department of Computer Science, University of British Columbia, 201-2366 Main Mall Vancouver, BC V6T 1Z4 Canada and 2British Columbia Cancer Agency, 600 W 10th Ave Vancouver, BC V5Z 4E6 Canada"}]},{"given":"Kevin P.","family":"Murphy","sequence":"additional","affiliation":[{"name":"1 Department of Computer Science, University of British Columbia, 201-2366 Main Mall Vancouver, BC V6T 1Z4 Canada and 2British Columbia Cancer Agency, 600 W 10th Ave Vancouver, BC V5Z 4E6 Canada"}]}],"member":"286","published-online":{"date-parts":[[2009,5,27]]},"reference":[{"key":"2023013112040704500_B1","doi-asserted-by":"crossref","first-page":"9067","DOI":"10.1073\/pnas.0402932101","article-title":"High-resolution characterization of the pancreatic adenocarcinoma genome","volume":"101","author":"Aguirre","year":"2004","journal-title":"Proc. Natl Acad. Sci. 
USA"},{"key":"2023013112040704500_B2","article-title":"Probabilistic models in noisy environments \u2013 and their application to a visual prosthesis for the blind","volume-title":"PhD Thesis.","author":"Archambeau","year":"2005"},{"key":"2023013112040704500_B3","doi-asserted-by":"crossref","first-page":"3183","DOI":"10.1182\/blood-2005-04-1399","article-title":"Diffuse large b-cell lymphoma subgroups have distinct genetic profiles that influence tumor biology and improve gene-expression-based survival prediction","volume":"106","author":"Bea","year":"2005","journal-title":"Blood"},{"key":"2023013112040704500_B4","doi-asserted-by":"crossref","first-page":"259","DOI":"10.1111\/j.2517-6161.1986.tb01412.x","article-title":"On the statistical analysis of dirty pictures","volume":"48","author":"Besag","year":"1986","journal-title":"J. R. Stat. Soc. Ser. B"},{"volume-title":"Pattern Recognition and Machine Learning.","year":"2006","author":"Bishop","key":"2023013112040704500_B5"},{"key":"2023013112040704500_B6","first-page":"47","article-title":"Using Dirichlet mixture priors to derive Hidden Markov models for protein families","volume-title":"Proceedings of the 1st International Conference on Intelligent Systems for Molecular Biology.","author":"Brown","year":"1993"},{"key":"2023013112040704500_B7","doi-asserted-by":"crossref","first-page":"137","DOI":"10.1182\/blood-2008-02-140616","article-title":"Genome-wide profiling of follicular lymphoma by array comparative genomic hybridization reveals prognostically significant DNA copy number imbalances","volume":"113","author":"Cheung","year":"2008","journal-title":"Blood"},{"key":"2023013112040704500_B8","doi-asserted-by":"crossref","first-page":"553","DOI":"10.1038\/nature06914","article-title":"Translating insights from the cancer genome into clinical 
practice","volume":"242","author":"Chin","year":"2008","journal-title":"Nature"},{"key":"2023013112040704500_B9","doi-asserted-by":"crossref","first-page":"R215","DOI":"10.1186\/gb-2007-8-10-r215","article-title":"High-resolution aCGH and expression profiling identifies a novel genomic subtype of ER negative breast cancer","volume":"8","author":"Chin","year":"2007","journal-title":"Genome Biol."},{"key":"2023013112040704500_B10","doi-asserted-by":"crossref","first-page":"2013","DOI":"10.1093\/nar\/gkm076","article-title":"QuantiSNP: an objective Bayes Hidden-Markov Model to detect and accurately map copy number variation using SNP genotyping data","volume":"35","author":"Colella","year":"2007","journal-title":"Nucleic Acids Res."},{"key":"2023013112040704500_B11","doi-asserted-by":"crossref","first-page":"50","DOI":"10.1038\/scientificamerican0307-50","article-title":"Mapping the cancer genome. Pinpointing the genes involved in cancer will help chart a new course across the complex landscape of human malignancies","volume":"296","author":"Collins","year":"2007","journal-title":"Sci. Am."},{"key":"2023013112040704500_B12","doi-asserted-by":"crossref","first-page":"1827","DOI":"10.1093\/hmg\/ddh195","article-title":"Comprehensive whole genome array CGH profiling of mantle cell lymphoma model genomes","volume":"13","author":"de Leeuw","year":"2004","journal-title":"Hum. Mol. Genet."},{"key":"2023013112040704500_B13","first-page":"1","article-title":"Maximum likelihood from incomplete data via the EM algorithm","volume":"34","author":"Dempster","year":"1977","journal-title":"J. R. Stat. Soc. Ser. 
B"},{"key":"2023013112040704500_B14","doi-asserted-by":"crossref","first-page":"1149","DOI":"10.1101\/gr.5076506","article-title":"STAC: A method for testing the significance of DNA copy number aberrations across multiple array-CGH experiments","volume":"16","author":"Diskin","year":"2006","journal-title":"Genome Res."},{"volume-title":"Markov Chain Monte Carlo in Practice.","year":"1996","author":"Gilks","key":"2023013112040704500_B15"},{"key":"2023013112040704500_B16","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1002\/gcc.10314","article-title":"Identification of cytogenetic subgroups and karyotypic pathways of clonal evolution in follicular lymphomas","volume":"39","author":"H\u00f6glund","year":"2004","journal-title":"Genes Chromosomes Cancer"},{"key":"2023013112040704500_B17","doi-asserted-by":"crossref","first-page":"1778","DOI":"10.1002\/ijc.23270","article-title":"BAC array CGH distinguishes mutually exclusive alterations that define clinicogenetic subtypes of gliomas","volume":"122","author":"Idbaih","year":"2008","journal-title":"Int. J. Cancer"},{"key":"2023013112040704500_B18","doi-asserted-by":"crossref","first-page":"299","DOI":"10.1038\/ng1307","article-title":"A tiling resolution DNA microarray with complete coverage of the human genome","volume":"36","author":"Ishkanian","year":"2004","journal-title":"Nat. 
Genet."},{"key":"2023013112040704500_B19","doi-asserted-by":"crossref","first-page":"a477","DOI":"10.1182\/blood.V112.11.477.477","article-title":"Deletion in chromosome 17p12 and gains in chromosome 9q33.3 by array comparative hybridization are associated with R-CHOP treatment failure in patients with diffuse large B cell lymphoma","volume":"111","author":"Johnson","year":"2008","journal-title":"Blood"},{"key":"2023013112040704500_B20","doi-asserted-by":"crossref","first-page":"286","DOI":"10.1002\/path.2112","article-title":"Genetic intra-tumour heterogeneity in epithelial ovarian cancer and its implications for molecular diagnosis of tumours","volume":"211","author":"Khalique","year":"2007","journal-title":"J. Pathol."},{"key":"2023013112040704500_B21","doi-asserted-by":"crossref","first-page":"e13","DOI":"10.1093\/nar\/gkm1143","article-title":"Identification of cancer genes using a statistical framework for multiexperiment analysis of nondiscretized array CGH data","volume":"36","author":"Klijn","year":"2008","journal-title":"Nucleic Acids Res."},{"key":"2023013112040704500_B22","doi-asserted-by":"crossref","first-page":"1154","DOI":"10.1109\/TPAMI.2004.71","article-title":"Simultaneous Feature selection and clustering using mixture models","volume":"26","author":"Law","year":"2004","journal-title":"IEEE Trans. Pattern Anal. Mach. 
Intell."},{"key":"2023013112040704500_B23","doi-asserted-by":"crossref","first-page":"1098","DOI":"10.1002\/gcc.20496","article-title":"ArrayCGH-based classification of neuroblastoma into genomic subgroups","volume":"46","author":"Michels","year":"2007","journal-title":"Genes Chromosomes Cancer"},{"key":"2023013112040704500_B24","doi-asserted-by":"crossref","first-page":"747","DOI":"10.1038\/35021093","article-title":"Molecular portraits of human breast tumours","volume":"406","author":"Perou","year":"2000","journal-title":"Nature"},{"issue":"Suppl","key":"2023013112040704500_B25","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1038\/ng1569","article-title":"Array comparative genomic hybridization and its applications in cancer","volume":"37","author":"Pinkel","year":"2005","journal-title":"Nat. Genet."},{"key":"2023013112040704500_B26","doi-asserted-by":"crossref","first-page":"168","DOI":"10.1198\/016214506000000113","article-title":"Variable selection for model-based clustering","volume":"101","author":"Raftery","year":"2006","journal-title":"J. Am. Stat. Assoc."},{"key":"2023013112040704500_B27","doi-asserted-by":"crossref","first-page":"849","DOI":"10.1093\/bioinformatics\/btl004","article-title":"Computation of recurrent minimal genomic alterations from array-CGH data","volume":"22","author":"Rouveirol","year":"2006","journal-title":"Bioinformatics"},{"key":"2023013112040704500_B28","doi-asserted-by":"crossref","first-page":"687","DOI":"10.1214\/07-AOAS155","article-title":"Hidden Markov models for the assessment of chromosomal alterations using high-throughput SNP arrays","volume":"2","author":"Scharpf","year":"2008","journal-title":"Ann. Appl. 
Stat."},{"key":"2023013112040704500_B29","doi-asserted-by":"crossref","first-page":"431","DOI":"10.1093\/bioinformatics\/btl238","article-title":"Integrating copy number polymorphisms into array CGH analysis using a robust HMM","volume":"22","author":"Shah","year":"2006","journal-title":"Bioinformatics"},{"key":"2023013112040704500_B30","doi-asserted-by":"crossref","first-page":"450","DOI":"10.1093\/bioinformatics\/btm221","article-title":"Modeling recurrent DNA copy number alterations in array CGH data","volume":"23","author":"Shah","year":"2007","journal-title":"Bioinformatics"},{"key":"2023013112040704500_B31","doi-asserted-by":"crossref","first-page":"2667","DOI":"10.1016\/j.ejca.2004.08.021","article-title":"Molecular portraits of breast cancer: tumour subtypes as distinct disease entities","volume":"40","author":"Sorlie","year":"2004","journal-title":"Eur. J. Cancer"},{"key":"2023013112040704500_B32","volume-title":"Introduction to Data Mining.","author":"Tan","year":"2005","edition":"First"},{"key":"2023013112040704500_B33","doi-asserted-by":"crossref","first-page":"9625","DOI":"10.1073\/pnas.0504126102","article-title":"High-resolution genomic profiles of human lung cancer","volume":"102","author":"Tonon","year":"2005","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023013112040704500_B34","doi-asserted-by":"crossref","first-page":"575","DOI":"10.1080\/0094965031000136012","article-title":"A new partitioning around medoids algorithm","volume":"73","author":"van der Laan","year":"2003","journal-title":"J. Stat. Comput. 
Simul."},{"key":"2023013112040704500_B35","first-page":"484","article-title":"Nonparametric testing for DNA copy number induced differential mRNA gene expression","volume":"9","author":"van Wieringen","year":"2008","journal-title":"Biometrics"},{"key":"2023013112040704500_B36","doi-asserted-by":"crossref","first-page":"9991","DOI":"10.1073\/pnas.1732008100","article-title":"A gene expression-based method to diagnose clinically distinct subgroups of diffuse large b cell lymphoma","volume":"100","author":"Wright","year":"2003","journal-title":"Proc. Natl Acad. Sci. USA"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/25\/12\/i30\/48994580\/bioinformatics_25_12_i30.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/25\/12\/i30\/48994580\/bioinformatics_25_12_i30.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,2,9]],"date-time":"2025-02-09T19:02:49Z","timestamp":1739127769000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/25\/12\/i30\/189221"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2009,5,27]]},"references-count":36,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2009,6,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btp205","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"type":"electronic","value":"1367-4811"},{"type":"print","value":"1367-4803"}],"subject":[],"published-other":{"date-parts":[[2009,6,15]]},"published":{"date-parts":[[2009,5,27]]}}}