{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,13]],"date-time":"2026-03-13T03:43:48Z","timestamp":1773373428231,"version":"3.50.1"},"reference-count":21,"publisher":"Oxford University Press (OUP)","issue":"1","license":[{"start":{"date-parts":[[2017,8,2]],"date-time":"2017-08-02T00:00:00Z","timestamp":1501632000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/about_us\/legal\/notices"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Science Foundation of China","doi-asserted-by":"publisher","award":["11401338"],"award-info":[{"award-number":["11401338"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2018,1,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Motivation<\/jats:title><jats:p>Single cell transcriptome sequencing (scRNA-Seq) has become a revolutionary tool to study cellular and molecular processes at single cell resolution. Among existing technologies, the recently developed droplet-based platform enables efficient parallel processing of thousands of single cells with direct counting of transcript copies using Unique Molecular Identifier (UMI). Despite the technology advances, statistical methods and computational tools are still lacking for analyzing droplet-based scRNA-Seq data. Particularly, model-based approaches for clustering large-scale single cell transcriptomic data are still under-explored.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>We developed DIMM-SC, a Dirichlet Mixture Model for clustering droplet-based Single Cell transcriptomic data. This approach explicitly models UMI count data from scRNA-Seq experiments and characterizes variations across different cell clusters via a Dirichlet mixture prior. We performed comprehensive simulations to evaluate DIMM-SC and compared it with existing clustering methods such as K-means, CellTree and Seurat. In addition, we analyzed public scRNA-Seq datasets with known cluster labels and in-house scRNA-Seq datasets from a study of systemic sclerosis with prior biological knowledge to benchmark and validate DIMM-SC. Both simulation studies and real data applications demonstrated that overall, DIMM-SC achieves substantially improved clustering accuracy and much lower clustering variability compared to other existing clustering methods. More importantly, as a model-based approach, DIMM-SC is able to quantify the clustering uncertainty for each single cell, facilitating rigorous statistical inference and biological interpretations, which are typically unavailable from existing clustering methods.<\/jats:p><\/jats:sec><jats:sec><jats:title>Availability and implementation<\/jats:title><jats:p>DIMM-SC has been implemented in a user-friendly R package with a detailed tutorial available on www.pitt.edu\/\u223cwec47\/singlecell.html.<\/jats:p><\/jats:sec><jats:sec><jats:title>Supplementary information<\/jats:title><jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p><\/jats:sec>","DOI":"10.1093\/bioinformatics\/btx490","type":"journal-article","created":{"date-parts":[[2017,8,1]],"date-time":"2017-08-01T11:19:46Z","timestamp":1501586386000},"page":"139-146","source":"Crossref","is-referenced-by-count":79,"title":["DIMM-SC: a Dirichlet mixture model for clustering droplet-based single cell transcriptomic data"],"prefix":"10.1093","volume":"34","author":[{"given":"Zhe","family":"Sun","sequence":"first","affiliation":[{"name":"Department of Biostatistics, University of Pittsburgh Graduate School of Public Health, Pittsburgh, PA, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4712-6658","authenticated-orcid":false,"given":"Ting","family":"Wang","sequence":"additional","affiliation":[{"name":"Division of Pulmonary Medicine, Allergy and Immunology and Department of Pediatrics, Children\u2019s Hospital of Pittsburgh of UPMC, University of Pittsburgh, Pittsburgh, PA, USA"}]},{"given":"Ke","family":"Deng","sequence":"additional","affiliation":[{"name":"Center for Statistical Science, Tsinghua University, Beijing, China"}]},{"given":"Xiao-Feng","family":"Wang","sequence":"additional","affiliation":[{"name":"Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic Foundation, Cleveland, OH, USA"}]},{"given":"Robert","family":"Lafyatis","sequence":"additional","affiliation":[{"name":"Division of Rheumatology and Clinical Immunology, Department of Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA"}]},{"given":"Ying","family":"Ding","sequence":"additional","affiliation":[{"name":"Department of Biostatistics, University of Pittsburgh Graduate School of Public Health, Pittsburgh, PA, USA"}]},{"given":"Ming","family":"Hu","sequence":"additional","affiliation":[{"name":"Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic Foundation, Cleveland, OH, USA"}]},{"given":"Wei","family":"Chen","sequence":"additional","affiliation":[{"name":"Department of Biostatistics, University of Pittsburgh Graduate School of Public Health, Pittsburgh, PA, USA"},{"name":"Division of Pulmonary Medicine, Allergy and Immunology and Department of Pediatrics, Children\u2019s Hospital of Pittsburgh of UPMC, University of Pittsburgh, Pittsburgh, PA, USA"}]}],"member":"286","published-online":{"date-parts":[[2017,8,2]]},"reference":[{"key":"2023020208405721700_btx490-B1","doi-asserted-by":"crossref","first-page":"716","DOI":"10.1109\/TAC.1974.1100705","article-title":"New Look at Statistical-Model Identification","volume":"19","author":"Akaike","year":"1974","journal-title":"IEEE Trans. Automat. Contr"},{"issue":"1","key":"2023020208405721700_btx490-B2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1111\/j.2517-6161.1977.tb01600.x","article-title":"Maximum likelihood from incomplete data via the EM algorithm","volume":"39","author":"Dempster","year":"1977","journal-title":"J. R. Stat. Soc. Ser. B (Methodological)"},{"key":"2023020208405721700_btx490-B3","doi-asserted-by":"crossref","first-page":"363","DOI":"10.1186\/s12859-016-1175-6","article-title":"CellTree: an R\/bioconductor package to infer the hierarchical structure of cell populations from single-cell RNA-seq data","volume":"17","author":"duVerle","year":"2016","journal-title":"BMC Bioinformatics"},{"key":"2023020208405721700_btx490-B4","doi-asserted-by":"crossref","first-page":"175","DOI":"10.1038\/nrg.2015.16","article-title":"Single-cell genome sequencing: current state of the science","volume":"17","author":"Gawad","year":"2016","journal-title":"Nat. Rev. Genet"},{"key":"2023020208405721700_btx490-B5","doi-asserted-by":"crossref","first-page":"e30126.","DOI":"10.1371\/journal.pone.0030126","article-title":"Dirichlet multinomial mixtures: generative models for microbial metagenomics","volume":"7","author":"Holmes","year":"2012","journal-title":"PLoS One"},{"key":"2023020208405721700_btx490-B6","doi-asserted-by":"crossref","first-page":"163","DOI":"10.1038\/nmeth.2772","article-title":"Quantitative single-cell RNA-seq with unique molecular identifiers","volume":"11","author":"Islam","year":"2014","journal-title":"Nat. Methods"},{"key":"2023020208405721700_btx490-B7","doi-asserted-by":"crossref","first-page":"776","DOI":"10.1126\/science.1247651","article-title":"Massively parallel single-cell RNA-seq for marker-free decomposition of tissues into cell types","volume":"343","author":"Jaitin","year":"2014","journal-title":"Science"},{"key":"2023020208405721700_btx490-B8","doi-asserted-by":"crossref","first-page":"72","DOI":"10.1038\/nmeth.1778","article-title":"Counting absolute numbers of molecules using unique molecular identifiers","volume":"9","author":"Kivioja","year":"2012","journal-title":"Nat. Methods"},{"key":"2023020208405721700_btx490-B9","doi-asserted-by":"crossref","first-page":"1202","DOI":"10.1016\/j.cell.2015.05.002","article-title":"Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets","volume":"161","author":"Macosko","year":"2015","journal-title":"Cell"},{"key":"2023020208405721700_btx490-B10","doi-asserted-by":"crossref","first-page":"846","DOI":"10.1080\/01621459.1971.10482356","article-title":"Objective criteria for the evaluation of clustering methods","volume":"66","author":"Rand","year":"1971","journal-title":"J. Am. Stat. Assoc"},{"key":"2023020208405721700_btx490-B11","doi-asserted-by":"crossref","first-page":"1492","DOI":"10.1126\/science.1242072","article-title":"Machine learning. Clustering by fast search and find of density peaks","volume":"344","author":"Rodriguez","year":"2014","journal-title":"Science"},{"key":"2023020208405721700_btx490-B12","doi-asserted-by":"crossref","first-page":"215","DOI":"10.1080\/00949658908811178","article-title":"Maximum-likelihood estimation of dirichlet distributions","volume":"32","author":"Ronning","year":"1989","journal-title":"J. Stat. Comput. Simul"},{"key":"2023020208405721700_btx490-B13","doi-asserted-by":"crossref","first-page":"495","DOI":"10.1038\/nbt.3192","article-title":"Spatial reconstruction of single-cell gene expression data","volume":"33","author":"Satija","year":"2015","journal-title":"Nat. Biotechnol"},{"key":"2023020208405721700_btx490-B14","doi-asserted-by":"crossref","first-page":"461","DOI":"10.1214\/aos\/1176344136","article-title":"Estimating the dimension of a model","volume":"6","author":"Schwarz","year":"1978","journal-title":"Ann. Stat"},{"key":"2023020208405721700_btx490-B15","doi-asserted-by":"crossref","first-page":"133","DOI":"10.1038\/nrg3833","article-title":"Computational and analytical challenges in single-cell transcriptomics","volume":"16","author":"Stegle","year":"2015","journal-title":"Nat. Rev. Genet"},{"key":"2023020208405721700_btx490-B16","doi-asserted-by":"crossref","first-page":"280","DOI":"10.1007\/978-0-387-30164-8_219","volume-title":"Dirichlet process. Encyclopedia of Machine Learning","author":"Teh","year":"2011"},{"key":"2023020208405721700_btx490-B17","first-page":"2579","article-title":"Visualizing data using t-SNE","volume":"9","author":"van der Maaten","year":"2008","journal-title":"J. Mach. Learn. Res"},{"key":"2023020208405721700_btx490-B18","article-title":"Fast clustering using adaptive density peak detection","author":"Wang","year":"2015","journal-title":"Stat. Methods Med. Res"},{"key":"2023020208405721700_btx490-B19","doi-asserted-by":"crossref","first-page":"721","DOI":"10.1146\/annurev.genet.36.050802.093940","article-title":"Estimating F-statistics","volume":"36","author":"Weir","year":"2002","journal-title":"Annu. Rev. Genet"},{"key":"2023020208405721700_btx490-B20","author":"Yamamoto"},{"key":"2023020208405721700_btx490-B21","doi-asserted-by":"crossref","first-page":"14049","DOI":"10.1038\/ncomms14049","article-title":"Massively parallel digital transcriptional profiling of single cells","volume":"8","author":"Zheng","year":"2017","journal-title":"Nat. Commun"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/1\/139\/49043482\/bioinformatics_34_1_139.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/1\/139\/49043482\/bioinformatics_34_1_139.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,6,25]],"date-time":"2024-06-25T22:28:20Z","timestamp":1719354500000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/34\/1\/139\/4060554"}},"subtitle":[],"editor":[{"given":"Cenk","family":"Sahinalp","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2017,8,2]]},"references-count":21,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2018,1,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btx490","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2018,1,1]]},"published":{"date-parts":[[2017,8,2]]}}}