{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,13]],"date-time":"2026-03-13T03:43:50Z","timestamp":1773373430519,"version":"3.50.1"},"reference-count":40,"publisher":"Oxford University Press (OUP)","issue":"6","license":[{"start":{"date-parts":[[2018,8,28]],"date-time":"2018-08-28T00:00:00Z","timestamp":1535414400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/100000001","name":"NSF","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000893","name":"Simons Foundation","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000893","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2019,3,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>With the development of droplet based systems, massive single cell transcriptome data has become available, which enables analysis of cellular and molecular processes at single cell resolution and is instrumental to understanding many biological processes. While state-of-the-art clustering methods have been applied to the data, they face challenges in the following aspects: (i) the clustering quality still needs to be improved; (ii) most models need prior knowledge on number of clusters, which is not always available; (iii) there is a demand for faster computational speed.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We propose to tackle these challenges with Parallelized Split Merge Sampling on Dirichlet Process Mixture Model (the Para-DPMM model). Unlike classic DPMM methods that perform sampling on each single data point, the split merge mechanism samples on the cluster level, which significantly improves convergence and optimality of the result. The model is highly parallelized and can utilize the computing power of high performance computing (HPC) clusters, enabling massive inference on huge datasets. Experiment results show the model outperforms current widely used models in both clustering quality and computational speed.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>Source code is publicly available on https:\/\/github.com\/tiehangd\/Para_DPMM\/tree\/master\/Para_DPMM_package.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/bty702","type":"journal-article","created":{"date-parts":[[2018,8,22]],"date-time":"2018-08-22T23:56:55Z","timestamp":1534982215000},"page":"953-961","source":"Crossref","is-referenced-by-count":17,"title":["Parallel clustering of single cell transcriptomic data with split-merge sampling on Dirichlet process mixtures"],"prefix":"10.1093","volume":"35","author":[{"given":"Tiehang","family":"Duan","sequence":"first","affiliation":[{"name":"Department of Computer Science, University of California, Irvine, CA, USA"}]},{"given":"Jos\u00e9 P","family":"Pinto","sequence":"additional","affiliation":[{"name":"SysBioLab, Centre for Biomedical Research (CBMR), University of Algarve, Faro, Algarve, Portugal"}]},{"given":"Xiaohui","family":"Xie","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of California, Irvine, CA, USA"}]}],"member":"286","published-online":{"date-parts":[[2018,8,28]]},"reference":[{"key":"2023020108345915800_bty702-B1","doi-asserted-by":"crossref","DOI":"10.1038\/nmeth.4463","article-title":"Scenic: single-cell regulatory network inference and clustering","volume":"14","author":"Aibar","year":"2017","journal-title":"Nat. Methods"},{"key":"2023020108345915800_bty702-B2","doi-asserted-by":"crossref","first-page":"2045.","DOI":"10.1038\/s41467-017-02305-6","article-title":"Single-cell rna-sequencing uncovers transcriptional states and fate decisions in haematopoiesis","volume":"8","author":"Athanasiadis","year":"2017","journal-title":"Nat. Commun"},{"key":"2023020108345915800_bty702-B3","first-page":"elx035","article-title":"Experimental design for single-cell RNA sequencing","volume":"17","author":"Baran-Gale","year":"2017","journal-title":"Brief. Funct. Genomics"},{"key":"2023020108345915800_bty702-B4","doi-asserted-by":"crossref","first-page":"121","DOI":"10.1214\/06-BA104","article-title":"Variational inference for Dirichlet process mixtures","volume":"1","author":"Blei","year":"2006","journal-title":"Bayesian Anal"},{"key":"2023020108345915800_bty702-B5","first-page":"2003","article-title":"Latent Dirichlet allocation","volume":"3","author":"Blei","year":"2003","journal-title":"J. Machine Learn. Res"},{"key":"2023020108345915800_bty702-B6","first-page":"620","volume-title":"Proceedings of the 26th International Conference on Neural Information Processing Systems","author":"Chang","year":"2013"},{"key":"2023020108345915800_bty702-B7","doi-asserted-by":"crossref","first-page":"363.","DOI":"10.1186\/s12859-016-1175-6","article-title":"Celltree: an r\/bioconductor package to infer the hierarchical structure of cell populations from single-cell rna-seq data","volume":"17","author":"DuVerle","year":"2016","journal-title":"BMC Bioinformatics"},{"key":"2023020108345915800_bty702-B8","doi-asserted-by":"crossref","first-page":"577","DOI":"10.1080\/01621459.1995.10476550","article-title":"Bayesian density estimation and inference using mixtures","volume":"90","author":"Escobar","year":"1995","journal-title":"J. Am. Stat. Assoc"},{"key":"2023020108345915800_bty702-B9","doi-asserted-by":"crossref","first-page":"335","DOI":"10.1214\/13-STS422","article-title":"Mcmc for normalized random measure mixture models","volume":"28","author":"Favaro","year":"2013","journal-title":"Statist. Sci"},{"key":"2023020108345915800_bty702-B10","doi-asserted-by":"crossref","first-page":"611","DOI":"10.1198\/016214502760047131","article-title":"Model-based clustering, discriminant analysis, and density estimation","volume":"97","author":"Fraley","year":"2002","journal-title":"J. Am. Stat. Assoc"},{"key":"2023020108345915800_bty702-B11","volume-title":"Parallel Gibbs Sampling: From Colored Fields to Thin Junction Trees.","author":"Gonzalez","year":"2011"},{"key":"2023020108345915800_bty702-B12","doi-asserted-by":"crossref","first-page":"653","DOI":"10.1007\/s11390-010-9355-8","article-title":"Dirichlet process gaussian mixture models: choice of the base distribution","volume":"25","author":"G\u00f6r\u00fcr","year":"2010","journal-title":"J. Computer Sci. Technol"},{"key":"2023020108345915800_bty702-B13","doi-asserted-by":"crossref","first-page":"251","DOI":"10.1038\/nature14966","article-title":"Single-cell messenger rna sequencing reveals rare intestinal cell types","volume":"525","author":"Gr\u00fcn","year":"2015","journal-title":"Nature"},{"key":"2023020108345915800_bty702-B14","doi-asserted-by":"crossref","first-page":"e1004575","DOI":"10.1371\/journal.pcbi.1004575","article-title":"Sincera: a pipeline for single-cell rna-seq profiling analysis","volume":"11","author":"Guo","year":"2015","journal-title":"PLOS Comput. Biol"},{"key":"2023020108345915800_bty702-B15","doi-asserted-by":"crossref","first-page":"193","DOI":"10.1007\/BF01908075","article-title":"Comparing partitions","volume":"2","author":"Hubert","year":"1985","journal-title":"J. Classification"},{"key":"2023020108345915800_bty702-B16","doi-asserted-by":"crossref","first-page":"161","DOI":"10.1198\/016214501750332758","article-title":"Gibbs sampling methods for stick-breaking priors","volume":"96","author":"Ishwaran","year":"2001","journal-title":"J. Am. Stat. Assoc"},{"key":"2023020108345915800_bty702-B17","doi-asserted-by":"crossref","first-page":"269","DOI":"10.2307\/3315951","article-title":"Exact and approximate sum representations for the dirichlet process","volume":"30","author":"Ishwaran","year":"2002","journal-title":"Can. J. Stat"},{"key":"2023020108345915800_bty702-B18","doi-asserted-by":"crossref","first-page":"163","DOI":"10.1038\/nmeth.2772","article-title":"Quantitative single-cell rna-seq with unique molecular identifiers","volume":"11","author":"Islam","year":"2014","journal-title":"Nat. Methods"},{"key":"2023020108345915800_bty702-B19","volume-title":"Icml","author":"Ji","year":"2017"},{"key":"2023020108345915800_bty702-B20","doi-asserted-by":"crossref","first-page":"881","DOI":"10.1109\/TPAMI.2002.1017616","article-title":"An efficient k-means clustering algorithm: analysis and implementation","volume":"24","author":"Kanungo","year":"2002","journal-title":"IEEE Trans. Pattern Anal. Machine Intel"},{"key":"2023020108345915800_bty702-B21","doi-asserted-by":"crossref","first-page":"483","DOI":"10.1038\/nmeth.4236","article-title":"Sc3: consensus clustering of single-cell rna-seq data","volume":"14","author":"Kiselev","year":"2017","journal-title":"Nat. Methods"},{"key":"2023020108345915800_bty702-B22","first-page":"2796","volume-title":"Proceedings of the 20th International Joint Conference on Artifical Intelligence, IJCAI\u201907","author":"Kurihara","year":"2007"},{"key":"2023020108345915800_bty702-B23","doi-asserted-by":"crossref","first-page":"59.","DOI":"10.1186\/s13059-017-1188-0","article-title":"CIDR: ultrafast and accurate clustering through imputation for single-cell RNA-seq data","volume":"18","author":"Lin","year":"2017","journal-title":"Genome Biol"},{"key":"2023020108345915800_bty702-B24","volume-title":"ClusterCluster: Parallel Markov Chain Monte Carlo for Dirichlet Process Mixtures","author":"Lovell","year":"2013"},{"key":"2023020108345915800_bty702-B25","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511809071","volume-title":"Introduction to Information Retrieval","author":"Manning","year":"2008"},{"key":"2023020108345915800_bty702-B26","first-page":"197","volume-title":"Bayesian Mixture Modeling","author":"Neal","year":"1992"},{"key":"2023020108345915800_bty702-B27","doi-asserted-by":"crossref","first-page":"249","DOI":"10.1080\/10618600.2000.10474879","article-title":"Markov chain sampling methods for dirichlet process mixture models","volume":"9","author":"Neal","year":"2000","journal-title":"J. Comput. Graph. Stat"},{"key":"2023020108345915800_bty702-B28","first-page":"849","volume-title":"Advances in Neural Information Processing Systems","author":"Ng","year":"2001"},{"key":"2023020108345915800_bty702-B29","doi-asserted-by":"crossref","first-page":"169","DOI":"10.1093\/biomet\/asm086","article-title":"Retrospective Markov chain Monte Carlo methods for Dirichlet process hierarchical models","volume":"95","author":"Papaspiliopoulos","year":"2008","journal-title":"Biometrika"},{"key":"2023020108345915800_bty702-B30","doi-asserted-by":"crossref","first-page":"595.","DOI":"10.12688\/f1000research.11290.1","article-title":"Gene length and detection bias in single cell RNA sequencing protocols","volume":"6","author":"Phipson","year":"2017","journal-title":"F1000Research"},{"key":"2023020108345915800_bty702-B31","doi-asserted-by":"crossref","first-page":"225.","DOI":"10.1038\/icb.2015.106","article-title":"Single-cell technologies are revolutionizing the approach to rare cells","volume":"94","author":"Proserpio","year":"2016","journal-title":"Immunol. Cell Biol"},{"key":"2023020108345915800_bty702-B32","doi-asserted-by":"crossref","first-page":"495","DOI":"10.1038\/nbt.3192","article-title":"Spatial reconstruction of single-cell gene expression data","volume":"33","author":"Satija","year":"2015","journal-title":"Nat. Biotech"},{"key":"2023020108345915800_bty702-B33","volume-title":"Dimm-Sc: A Dirichlet Mixture Model for Clustering Droplet-Based Single Cell Transcriptomic Data","author":"Sun","year":"2017"},{"key":"2023020108345915800_bty702-B34","first-page":"1701","article-title":"Markov chains for exploring posterior distributions","volume":"22","author":"Tierney","year":"1994","journal-title":"Ann. Statist"},{"key":"2023020108345915800_bty702-B35","doi-asserted-by":"crossref","first-page":"414.","DOI":"10.1038\/nmeth.4207","article-title":"Visualization and analysis of single-cell RNA-seq data by Kernel-based similarity learning","volume":"14","author":"Wang","year":"2017","journal-title":"Nat. Methods"},{"key":"2023020108345915800_bty702-B36","first-page":"0962280215609948","article-title":"Fast clustering using adaptive density peak detection","volume":"1","author":"Wang","year":"2015","journal-title":"Stat. Methods Med. Res"},{"key":"2023020108345915800_bty702-B37","author":"Williamson","year":"2013"},{"key":"2023020108345915800_bty702-B38","doi-asserted-by":"crossref","first-page":"1974","DOI":"10.1093\/bioinformatics\/btv088","article-title":"Identification of cell types from single-cell transcriptomes using a novel clustering method","volume":"31","author":"Xu","year":"2015","journal-title":"Bioinformatics"},{"key":"2023020108345915800_bty702-B39","doi-asserted-by":"crossref","first-page":"14049","DOI":"10.1038\/ncomms14049","article-title":"Massively parallel digital transcriptional profiling of single cells","volume":"8","author":"Zheng","year":"2017","journal-title":"Nat. Commun"},{"key":"2023020108345915800_bty702-B40","doi-asserted-by":"crossref","first-page":"140.","DOI":"10.1186\/s12859-016-0984-y","article-title":"Pcareduce: hierarchical clustering of single cell transcriptional profiles","volume":"17","author":"\u017durauskien\u0117","year":"2016","journal-title":"BMC Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/35\/6\/953\/48967850\/bioinformatics_35_6_953.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/35\/6\/953\/48967850\/bioinformatics_35_6_953.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,7,9]],"date-time":"2024-07-09T10:02:29Z","timestamp":1720519349000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/35\/6\/953\/5085373"}},"subtitle":[],"editor":[{"given":"Inanc","family":"Birol","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2018,8,28]]},"references-count":40,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2019,3,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bty702","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/271163","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2019,3,15]]},"published":{"date-parts":[[2018,8,28]]}}}