{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T02:03:02Z","timestamp":1774922582244,"version":"3.50.1"},"reference-count":38,"publisher":"Oxford University Press (OUP)","issue":"21","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2010,11,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: In recent years, work has been carried out on clustering gene expression microarray data. Some approaches are developed from an algorithmic viewpoint whereas others are developed via the application of mixture models. In this article, a family of eight mixture models which utilizes the factor analysis covariance structure is extended to 12 models and applied to gene expression microarray data. This modelling approach builds on previous work by introducing a modified factor analysis covariance structure, leading to a family of 12 mixture models, including parsimonious models. This family of models allows for the modelling of the correlation between gene expression levels even when the number of samples is small. Parameter estimation is carried out using a variant of the expectation\u2013maximization algorithm and model selection is achieved using the Bayesian information criterion. This expanded family of Gaussian mixture models, known as the expanded parsimonious Gaussian mixture model (EPGMM) family, is then applied to two well-known gene expression data sets.<\/jats:p><jats:p>Results: The performance of the EPGMM family of models is quantified using the adjusted Rand index. This family of models gives very good performance, relative to existing popular clustering techniques, when applied to real gene expression microarray data.<\/jats:p><jats:p>Availability: The reduced, preprocessed data that were analysed are available at www.paulmcnicholas.info<\/jats:p><jats:p>Contact: \u00a0pmcnicho@uoguelph.ca<\/jats:p>","DOI":"10.1093\/bioinformatics\/btq498","type":"journal-article","created":{"date-parts":[[2010,8,30]],"date-time":"2010-08-30T00:12:56Z","timestamp":1283127176000},"page":"2705-2712","source":"Crossref","is-referenced-by-count":157,"title":["Model-based clustering of microarray expression data via latent Gaussian mixture models"],"prefix":"10.1093","volume":"26","author":[{"given":"Paul D.","family":"McNicholas","sequence":"first","affiliation":[{"name":"1 Department of Mathematics & Statistics, University of Guelph, Guelph, ON, Canada, N1G2W1 and 2School of Mathematical Sciences, University College Dublin, Belfield, Dublin 4, Ireland"}]},{"given":"Thomas Brendan","family":"Murphy","sequence":"additional","affiliation":[{"name":"1 Department of Mathematics & Statistics, University of Guelph, Guelph, ON, Canada, N1G2W1 and 2School of Mathematical Sciences, University College Dublin, Belfield, Dublin 4, Ireland"}]}],"member":"286","published-online":{"date-parts":[[2010,8,29]]},"reference":[{"key":"2023012507544566800_B1","doi-asserted-by":"crossref","first-page":"289","DOI":"10.1017\/S0370164600022070","article-title":"On Bernoulli's numerical solution of algebraic equations","volume":"46","author":"Aitken","year":"1926","journal-title":"Proc. R. Soc. Edinb."},{"key":"2023012507544566800_B2","doi-asserted-by":"crossref","first-page":"716","DOI":"10.1109\/TAC.1974.1100705","article-title":"A new look at the statistical model identification","volume":"19","author":"Akaike","year":"1974","journal-title":"IEEE Trans. Automat. Contr."},{"key":"2023012507544566800_B3","doi-asserted-by":"crossref","first-page":"6745","DOI":"10.1073\/pnas.96.12.6745","article-title":"Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays","volume":"96","author":"Alon","year":"1999","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012507544566800_B4","doi-asserted-by":"crossref","first-page":"719","DOI":"10.1109\/34.865189","article-title":"Assessing a mixture model for clustering with the integrated completed likelihood","volume":"22","author":"Biernacki","year":"2000","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"2023012507544566800_B5","doi-asserted-by":"crossref","first-page":"373","DOI":"10.1007\/BF01720593","article-title":"The distribution of the likelihood ratio for mixtures of densities from the one-parameter exponential family","volume":"46","author":"B\u00f6hning","year":"1994","journal-title":"Ann. Inst. Stat. Math."},{"key":"2023012507544566800_B6","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1111\/j.2517-6161.1977.tb01600.x","article-title":"Maximum likelihood from incomplete data via the EM algorithm","volume":"39","author":"Dempster","year":"1977","journal-title":"J. R. Stat. Soc. Ser. B"},{"key":"2023012507544566800_B7","doi-asserted-by":"crossref","first-page":"77","DOI":"10.1198\/016214502753479248","article-title":"Comparison of discrimination methods for the classification of tumors using gene expression data","volume":"97","author":"Dudoit","year":"2002","journal-title":"J. Am. Stat. Assoc."},{"key":"2023012507544566800_B8","doi-asserted-by":"crossref","first-page":"297","DOI":"10.1007\/s003579900058","article-title":"MCLUST: software for model-based cluster analysis","volume":"16","author":"Fraley","year":"1999","journal-title":"J. Classif."},{"key":"2023012507544566800_B9","doi-asserted-by":"crossref","first-page":"611","DOI":"10.1198\/016214502760047131","article-title":"Model-based clustering, discriminant analysis, and density estimation","volume":"97","author":"Fraley","year":"2002","journal-title":"J. Am. Stat. Assoc."},{"key":"2023012507544566800_B10","volume-title":"Finite Mixture and Markov Switching Models.","author":"Fr\u00fchwirth-Schnatter","year":"2006"},{"key":"2023012507544566800_B11","doi-asserted-by":"crossref","first-page":"12079","DOI":"10.1073\/pnas.210134797","article-title":"Coupled two-way clustering analysis of gene microarray data","volume":"97","author":"Getz","year":"2000","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012507544566800_B12","article-title":"The EM algorithm for factor analyzers","volume-title":"Technical Report CRG-TR-96-1","author":"Ghahramani","year":"1997"},{"key":"2023012507544566800_B13","doi-asserted-by":"crossref","first-page":"531","DOI":"10.1126\/science.286.5439.531","article-title":"Molecular classification of cancer: class discovery and class prediction by gene expression monitoring","volume":"286","author":"Golub","year":"1999","journal-title":"Science"},{"key":"2023012507544566800_B14","doi-asserted-by":"crossref","first-page":"100","DOI":"10.2307\/2346830","article-title":"A k-means clustering algorithm","volume":"28","author":"Hartigan","year":"1979","journal-title":"Appl. Stat."},{"key":"2023012507544566800_B15","doi-asserted-by":"crossref","first-page":"193","DOI":"10.1007\/BF01908075","article-title":"Comparing partitions","volume":"2","author":"Hubert","year":"1985","journal-title":"J. Classif."},{"key":"2023012507544566800_B16","doi-asserted-by":"crossref","DOI":"10.1002\/9780470316801","volume-title":"Finding Groups in Data: An Introduction to Cluster Analysis.","author":"Kaufman","year":"1990"},{"key":"2023012507544566800_B17","doi-asserted-by":"crossref","DOI":"10.5479\/sil.322586.39088000898585","volume-title":"M\u00e9chanique Analitique.","author":"Lagrange","year":"1788"},{"key":"2023012507544566800_B18","article-title":"Mixture models: theory, geometry and applications","volume-title":"NSF-CBMS Regional Conference Series in Probability and Statistics","author":"Lindsay","year":"1995"},{"key":"2023012507544566800_B19","first-page":"41","article-title":"Bayesian model assessment in factor analysis","volume":"14","author":"Lopes","year":"2004","journal-title":"Stat. Sin."},{"key":"2023012507544566800_B20","doi-asserted-by":"crossref","DOI":"10.1002\/9780470191613","volume-title":"The EM Algorithm and Extensions","author":"McLachlan","year":"2008","edition":"2"},{"key":"2023012507544566800_B21","doi-asserted-by":"crossref","DOI":"10.1002\/0471721182","volume-title":"Finite Mixture Models.","author":"McLachlan","year":"2000"},{"key":"2023012507544566800_B22","first-page":"599","article-title":"Mixtures of factor analyzers","volume-title":"Seventh International Conference on Machine Learning.","author":"McLachlan","year":"2000"},{"key":"2023012507544566800_B23","doi-asserted-by":"crossref","first-page":"412","DOI":"10.1093\/bioinformatics\/18.3.413","article-title":"A mixture model-based approach to the clustering of microarray expression data","volume":"18","author":"McLachlan","year":"2002","journal-title":"Bioinformatics"},{"key":"2023012507544566800_B24","doi-asserted-by":"crossref","DOI":"10.1002\/047172842X","volume-title":"Analyzing Microarray Gene Expression Data.","author":"McLachlan","year":"2004"},{"key":"2023012507544566800_B25","doi-asserted-by":"crossref","first-page":"1608","DOI":"10.1093\/bioinformatics\/btl148","article-title":"A simple implementation of a normal mixture approach to differential gene expression in multiclass microarrays","volume":"22","author":"McLachlan","year":"2006","journal-title":"Bioinformatics"},{"key":"2023012507544566800_B26","doi-asserted-by":"crossref","first-page":"285","DOI":"10.1007\/s11222-008-9056-0","article-title":"Parsimonious Gaussian mixture models","volume":"18","author":"McNicholas","year":"2008","journal-title":"Stat. Comput."},{"key":"2023012507544566800_B27","doi-asserted-by":"crossref","first-page":"153","DOI":"10.1002\/cjs.10047","article-title":"Model-based clustering of longitudinal data","volume":"38","author":"McNicholas","year":"2010","journal-title":"Can. J. Stat."},{"key":"2023012507544566800_B28","doi-asserted-by":"crossref","first-page":"711","DOI":"10.1016\/j.csda.2009.02.011","article-title":"Serial and parallel implementations of model-based clustering via parsimonious Gaussian mixture models","volume":"54","author":"McNicholas","year":"2010","journal-title":"Comput. Stat. Data Anal."},{"key":"2023012507544566800_B29","doi-asserted-by":"crossref","first-page":"267","DOI":"10.1093\/biomet\/80.2.267","article-title":"Maximum likelihood estimation via the ECM algorithm: a general framework","volume":"80","author":"Meng","year":"1993","journal-title":"Biometrika"},{"key":"2023012507544566800_B30","doi-asserted-by":"crossref","first-page":"511","DOI":"10.1111\/1467-9868.00082","article-title":"The EM algorithm \u2014 an old folk song sung to a fast new tune (with discussion)","volume":"59","author":"Meng","year":"1997","journal-title":"J. R. Stat. Soc. Ser. B"},{"key":"2023012507544566800_B31","doi-asserted-by":"crossref","first-page":"846","DOI":"10.1080\/01621459.1971.10482356","article-title":"Objective criteria for the evaluation of clustering methods","volume":"66","author":"Rand","year":"1971","journal-title":"J. Am. Stat. Assoc."},{"key":"2023012507544566800_B32","volume-title":"R: A Language and Environment for Statistical Computing.","author":"R Development Core Team","year":"2010"},{"key":"2023012507544566800_B33","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1214\/aos\/1176344136","article-title":"Estimating the dimension of a model","volume":"6","author":"Schwarz","year":"1978","journal-title":"Ann. Stat."},{"key":"2023012507544566800_B34","doi-asserted-by":"crossref","first-page":"72","DOI":"10.2307\/1412159","article-title":"The proof and measurement of association between two things","volume":"15","author":"Spearman","year":"1904","journal-title":"Am. J. Psychol."},{"key":"2023012507544566800_B35","doi-asserted-by":"crossref","first-page":"443","DOI":"10.1162\/089976699300016728","article-title":"Mixtures of probabilistic principal component analysers","volume":"11","author":"Tipping","year":"1999","journal-title":"Neural Comput."},{"key":"2023012507544566800_B36","first-page":"235","article-title":"Clustering stability: an overview","volume":"2","author":"von Luxburg","year":"2009","journal-title":"Found. Trends Mach. Learn."},{"key":"2023012507544566800_B37","article-title":"Inverting Modified Matrices","volume-title":"Statistical Research Group Memorandum Report no. 42.","author":"Woodbury","year":"1950"},{"key":"2023012507544566800_B38","doi-asserted-by":"crossref","first-page":"977","DOI":"10.1093\/bioinformatics\/17.10.977","article-title":"Model-based clustering and data transformations for gene expression data","volume":"17","author":"Yeung","year":"2001","journal-title":"Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/26\/21\/2705\/48853820\/bioinformatics_26_21_2705.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/26\/21\/2705\/48853820\/bioinformatics_26_21_2705.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,2,25]],"date-time":"2025-02-25T06:54:44Z","timestamp":1740466484000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/26\/21\/2705\/212794"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010,8,29]]},"references-count":38,"journal-issue":{"issue":"21","published-print":{"date-parts":[[2010,11,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btq498","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2010,11,1]]},"published":{"date-parts":[[2010,8,29]]}}}