{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,23]],"date-time":"2026-01-23T12:12:08Z","timestamp":1769170328008,"version":"3.49.0"},"reference-count":32,"publisher":"Oxford University Press (OUP)","issue":"2","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2014,1,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: RNA-seq technology has been widely adopted as an attractive alternative to microarray-based methods to study global gene expression. However, robust statistical tools to analyze these complex datasets are still lacking. By grouping genes with similar expression profiles across treatments, cluster analysis provides insight into gene functions and networks, and hence is an important technique for RNA-seq data analysis.<\/jats:p>\n               <jats:p>Results: In this manuscript, we derive clustering algorithms based on appropriate probability models for RNA-seq data. An expectation-maximization (EM) algorithm and two stochastic versions of the EM algorithm are described. In addition, a likelihood-based initialization strategy is proposed to improve the clustering algorithms. Moreover, we present a model-based hybrid-hierarchical clustering method to generate a tree structure that allows visualization of relationships among clusters as well as flexibility in choosing the number of clusters. Results from both simulation studies and analysis of a maize RNA-seq dataset show that our proposed methods provide better clustering results than alternative methods such as the K-means algorithm and hierarchical clustering methods that are not based on probability models.<\/jats:p>\n               <jats:p>Availability and implementation: An R package, MBCluster.Seq, has been developed to implement our proposed algorithms. 
This R package provides fast computation and is publicly available at http:\/\/www.r-project.org.<\/jats:p>\n               <jats:p>Contact: \u00a0sy@swufe.edu.cn; pliu@iastate.edu<\/jats:p>\n               <jats:p>Supplementary Information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btt632","type":"journal-article","created":{"date-parts":[[2013,11,5]],"date-time":"2013-11-05T01:38:58Z","timestamp":1383615538000},"page":"197-205","source":"Crossref","is-referenced-by-count":114,"title":["Model-based clustering for RNA-seq data"],"prefix":"10.1093","volume":"30","author":[{"given":"Yaqing","family":"Si","sequence":"first","affiliation":[{"name":"1 School of Statistics, Southwestern University of Finance and Economics, Chengdu, Sichuan 611130, China, 2 Department of Statistics, Iowa State University, Ames, IA 50011, USA, 3 Institute of Tropical Biosciences and Biotechnology (ITBB), Chinese Academy of Tropical Agriculture Sciences (CATAS), Haikou, Hainan 571101, China and 4 Enterprise Institute for Renewable Fuels, Donald Danforth Plant Science Center, St. 
Louis, MO 63132, USA"}]},{"given":"Peng","family":"Liu","sequence":"additional","affiliation":[{"name":"1 School of Statistics, Southwestern University of Finance and Economics, Chengdu, Sichuan 611130, China, 2 Department of Statistics, Iowa State University, Ames, IA 50011, USA, 3 Institute of Tropical Biosciences and Biotechnology (ITBB), Chinese Academy of Tropical Agriculture Sciences (CATAS), Haikou, Hainan 571101, China and 4 Enterprise Institute for Renewable Fuels, Donald Danforth Plant Science Center, St. Louis, MO 63132, USA"}]},{"given":"Pinghua","family":"Li","sequence":"additional","affiliation":[{"name":"1 School of Statistics, Southwestern University of Finance and Economics, Chengdu, Sichuan 611130, China, 2 Department of Statistics, Iowa State University, Ames, IA 50011, USA, 3 Institute of Tropical Biosciences and Biotechnology (ITBB), Chinese Academy of Tropical Agriculture Sciences (CATAS), Haikou, Hainan 571101, China and 4 Enterprise Institute for Renewable Fuels, Donald Danforth Plant Science Center, St. Louis, MO 63132, USA"}]},{"given":"Thomas P.","family":"Brutnell","sequence":"additional","affiliation":[{"name":"1 School of Statistics, Southwestern University of Finance and Economics, Chengdu, Sichuan 611130, China, 2 Department of Statistics, Iowa State University, Ames, IA 50011, USA, 3 Institute of Tropical Biosciences and Biotechnology (ITBB), Chinese Academy of Tropical Agriculture Sciences (CATAS), Haikou, Hainan 571101, China and 4 Enterprise Institute for Renewable Fuels, Donald Danforth Plant Science Center, St. 
Louis, MO 63132, USA"}]}],"member":"286","published-online":{"date-parts":[[2013,11,4]]},"reference":[{"key":"2023012710392653500_btt632-B1","doi-asserted-by":"crossref","first-page":"R106","DOI":"10.1186\/gb-2010-11-10-r106","article-title":"Differential expression analysis for sequence count data","volume":"11","author":"Anders","year":"2010","journal-title":"Genome Biol."},{"key":"2023012710392653500_btt632-B2","first-page":"1027","article-title":"K-means++: the advantages of careful seeding","volume-title":"Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms","author":"Arthur","year":"2007"},{"key":"2023012710392653500_btt632-B3","doi-asserted-by":"crossref","first-page":"119","DOI":"10.1111\/j.1467-9868.2007.00629.x","article-title":"Clustering using objective functions and stochastic search","volume":"70","author":"Booth","year":"2008","journal-title":"J. R. Stat. Soc. Series B"},{"key":"2023012710392653500_btt632-B4","doi-asserted-by":"crossref","first-page":"94","DOI":"10.1186\/1471-2105-11-94","article-title":"Evaluation of statistical methods for normalization and differential expression in mRNA-seq experiments","volume":"11","author":"Bullard","year":"2010","journal-title":"BMC Bioinformatics"},{"key":"2023012710392653500_btt632-B5","doi-asserted-by":"crossref","first-page":"315","DOI":"10.1016\/0167-9473(92)90042-E","article-title":"A classification EM algorithm for clustering and two stochastic versions","volume":"14","author":"Celeux","year":"1992","journal-title":"Comput. Stat. Data Anal."},{"key":"2023012710392653500_btt632-B6","doi-asserted-by":"crossref","first-page":"270","DOI":"10.1137\/S1064827596311451","article-title":"Algorithms for model-based Gaussian hierarchical clustering","volume":"20","author":"Fraley","year":"1999","journal-title":"SIAM J. Sci. 
Comput."},{"key":"2023012710392653500_btt632-B7","doi-asserted-by":"crossref","first-page":"611","DOI":"10.1198\/016214502760047131","article-title":"Model-based clustering, discriminant analysis, and density estimation","volume":"97","author":"Fraley","year":"2002","journal-title":"J. Am. Stat. Assoc."},{"key":"2023012710392653500_btt632-B8","doi-asserted-by":"crossref","first-page":"103","DOI":"10.1109\/4235.771164","article-title":"Clustering with a genetically optimized approach","volume":"3","author":"Hall","year":"1999","journal-title":"IEEE Trans. Evol. Comput."},{"key":"2023012710392653500_btt632-B9","doi-asserted-by":"crossref","first-page":"1060","DOI":"10.1038\/ng.703","article-title":"The developmental dynamics of the maize leaf transcriptome","volume":"42","author":"Li","year":"2010","journal-title":"Nat. Genet."},{"key":"2023012710392653500_btt632-B10","doi-asserted-by":"crossref","first-page":"1091","DOI":"10.1042\/BST0361091","article-title":"Next-generation sequencing: applications beyond genomes","volume":"36","author":"Marguerat","year":"2008","journal-title":"Biochem. Soc. Trans."},{"key":"2023012710392653500_btt632-B11","doi-asserted-by":"crossref","first-page":"1509","DOI":"10.1101\/gr.079558.108","article-title":"RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays","volume":"18","author":"Marioni","year":"2008","journal-title":"Genome Res."},{"key":"2023012710392653500_btt632-B12","doi-asserted-by":"crossref","first-page":"76","DOI":"10.1177\/096228029700600106","article-title":"On the EM algorithm for overdispersed count data","volume":"6","author":"McLachlan","year":"1997","journal-title":"Stat. Methods Med. Res."},{"key":"2023012710392653500_btt632-B13","doi-asserted-by":"crossref","first-page":"9","DOI":"10.1023\/A:1007648401407","article-title":"An experimental comparison of model-based clustering methods","volume":"42","author":"Meila","year":"2001","journal-title":"Mach. 
Learn."},{"key":"2023012710392653500_btt632-B14","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1038\/nrg2626","article-title":"Sequencing technologies \u2013 the next generation","volume":"11","author":"Metzker","year":"2010","journal-title":"Nat. Rev. Genet."},{"key":"2023012710392653500_btt632-B15","doi-asserted-by":"crossref","first-page":"621","DOI":"10.1038\/nmeth.1226","article-title":"Mapping and quantifying mammalian transcriptomes by RNA-seq","volume":"5","author":"Mortazavi","year":"2008","journal-title":"Nat. Methods"},{"key":"2023012710392653500_btt632-B16","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1166\/jctn.2005.2977","article-title":"Evolutionary fuzzy clustering algorithm with knowledge-based evaluation and applications for gene expression profiling","volume":"2","author":"Park","year":"2005","journal-title":"J. Comput. Theor. Nanosci."},{"key":"2023012710392653500_btt632-B17","doi-asserted-by":"crossref","first-page":"35","DOI":"10.1152\/physiolgenomics.00138.2002","article-title":"Clustering gene expression data using adaptive double self-organizing map","volume":"14","author":"Ressom","year":"2003","journal-title":"Physiol. 
Genomics"},{"key":"2023012710392653500_btt632-B18","doi-asserted-by":"crossref","first-page":"R25","DOI":"10.1186\/gb-2010-11-3-r25","article-title":"A scaling normalization method for differential expression analysis of RNA-seq data","volume":"11","author":"Robinson","year":"2010","journal-title":"Genome Biol."},{"key":"2023012710392653500_btt632-B19","doi-asserted-by":"crossref","first-page":"321","DOI":"10.1093\/biostatistics\/kxm030","article-title":"Small-sample estimation of negative binomial dispersion, with applications to SAGE data","volume":"9","author":"Robinson","year":"2008","journal-title":"Biostatistics"},{"key":"2023012710392653500_btt632-B20","doi-asserted-by":"crossref","first-page":"139","DOI":"10.1093\/bioinformatics\/btp616","article-title":"edgeR: a Bioconductor package for differential expression analysis of digital gene expression data","volume":"26","author":"Robinson","year":"2010","journal-title":"Bioinformatics"},{"key":"2023012710392653500_btt632-B21","doi-asserted-by":"crossref","first-page":"2210","DOI":"10.1109\/5.726788","article-title":"Deterministic annealing for clustering, compression, classification, regression, and related optimization problems","volume":"86","author":"Rose","year":"1998","journal-title":"Proc. IEEE"},{"key":"2023012710392653500_btt632-B22","first-page":"583","article-title":"Cluster ensembles - a knowledge reuse framework for combining partitions","volume":"3","author":"Strehl","year":"2002","journal-title":"J. Mach. Learn. 
Res."},{"key":"2023012710392653500_btt632-B23","doi-asserted-by":"crossref","first-page":"956","DOI":"10.1126\/science.1160342","article-title":"A global view of gene activity and alternative splicing by deep sequencing of the human transcriptome","volume":"321","author":"Sultan","year":"2008","journal-title":"Science"},{"key":"2023012710392653500_btt632-B24","doi-asserted-by":"crossref","first-page":"2907","DOI":"10.1073\/pnas.96.6.2907","article-title":"Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation","volume":"96","author":"Tamayo","year":"1999","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012710392653500_btt632-B25","first-page":"599","article-title":"Model-based hierarchical clustering","volume-title":"Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence","author":"Vaithyanathan","year":"2000"},{"key":"2023012710392653500_btt632-B26","doi-asserted-by":"crossref","first-page":"118","DOI":"10.1093\/bfgp\/elp057","article-title":"Exploring plant transcriptomes using ultra high-throughput sequencing","volume":"9","author":"Wang","year":"2010","journal-title":"Brief. Funct. Genomics"},{"key":"2023012710392653500_btt632-B27","doi-asserted-by":"crossref","first-page":"53","DOI":"10.1038\/nrg2484","article-title":"RNA-seq: a revolutionary tool for transcriptomics","volume":"10","author":"Wang","year":"2009","journal-title":"Nat. Rev. Genet."},{"key":"2023012710392653500_btt632-B28","doi-asserted-by":"crossref","first-page":"2493","DOI":"10.1214\/11-AOAS493","article-title":"Classification and clustering of sequencing data using a Poisson model","volume":"5","author":"Witten","year":"2011","journal-title":"Ann. Appl. 
Stat."},{"key":"2023012710392653500_btt632-B29","doi-asserted-by":"crossref","first-page":"49","DOI":"10.1198\/jasa.2010.ap09545","article-title":"Model-based clustering for online crisis identification in distributed computing","volume":"106","author":"Woodard","year":"2011","journal-title":"J. Am. Stat. Assoc."},{"key":"2023012710392653500_btt632-B30","doi-asserted-by":"crossref","first-page":"977","DOI":"10.1093\/bioinformatics\/17.10.977","article-title":"Model-based clustering and data transformations for gene expression data","volume":"17","author":"Yeung","year":"2001","journal-title":"Bioinformatics"},{"key":"2023012710392653500_btt632-B31","doi-asserted-by":"crossref","DOI":"10.2202\/1544-6115.1128","article-title":"General framework for weighted gene co-expression network analysis","volume":"4","author":"Zhang","year":"2005","journal-title":"Stat. Appl. Genet. Mol. Biol."},{"key":"2023012710392653500_btt632-B32","first-page":"1001","article-title":"A unified framework for model-based clustering","volume":"4","author":"Zhong","year":"2003","journal-title":"J. Mach. Learn. 
Res."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/30\/2\/197\/48915785\/bioinformatics_30_2_197.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/30\/2\/197\/48915785\/bioinformatics_30_2_197.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,27]],"date-time":"2023-01-27T10:49:27Z","timestamp":1674816567000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/30\/2\/197\/217752"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,11,4]]},"references-count":32,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2014,1,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btt632","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2014,1,15]]},"published":{"date-parts":[[2013,11,4]]}}}