{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,4]],"date-time":"2026-04-04T18:22:49Z","timestamp":1775326969493,"version":"3.50.1"},"reference-count":14,"publisher":"Oxford University Press (OUP)","issue":"15","license":[{"start":{"date-parts":[[2020,11,9]],"date-time":"2020-11-09T00:00:00Z","timestamp":1604880000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/100000002","name":"NIH","doi-asserted-by":"publisher","award":["1U19MH114830"],"award-info":[{"award-number":["1U19MH114830"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"NIH","doi-asserted-by":"publisher","award":["1U19MH114821"],"award-info":[{"award-number":["1U19MH114821"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000011","name":"Howard Hughes Medical Institute","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000011","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,8,9]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>One major goal of single-cell RNA sequencing (scRNAseq) experiments is to identify novel cell types. With increasingly large scRNAseq datasets, unsupervised clustering methods can now produce detailed catalogues of transcriptionally distinct groups of cells in a sample. However, the interpretation of these clusters is challenging for both technical and biological reasons. 
Popular clustering algorithms are sensitive to parameter choices, and can produce different clustering solutions with even small changes in the number of principal components used, the k nearest neighbor and the resolution parameters, among others.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>Here, we present a set of tools to evaluate cluster stability by subsampling, which can guide parameter choice and aid in biological interpretation. The R package scclusteval and the accompanying Snakemake workflow implement all steps of the pipeline: subsampling the cells, repeating the clustering with Seurat and estimation of cluster stability using the Jaccard similarity index and providing rich visualizations.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>R package scclusteval: https:\/\/github.com\/crazyhottommy\/scclusteval Snakemake workflow: https:\/\/github.com\/crazyhottommy\/pyflow_seuratv3_parameter Tutorial: https:\/\/crazyhottommy.github.io\/EvaluateSingleCellClustering\/.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaa956","type":"journal-article","created":{"date-parts":[[2020,11,2]],"date-time":"2020-11-02T15:13:55Z","timestamp":1604330035000},"page":"2212-2214","source":"Crossref","is-referenced-by-count":98,"title":["Evaluating single-cell cluster stability using the Jaccard similarity index"],"prefix":"10.1093","volume":"37","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9975-3541","authenticated-orcid":false,"given":"Ming","family":"Tang","sequence":"first","affiliation":[{"name":"FAS Informatics Group, Harvard University , Cambridge, MA, USA"},{"name":"Department of Molecular and Cellular Biology, Center for Brain Science, Harvard University , Cambridge, MA, USA"},{"name":"Howard 
Hughes Medical Institute , Cambridge, MA, USA"}]},{"given":"Yasin","family":"Kaymaz","sequence":"additional","affiliation":[{"name":"FAS Informatics Group, Harvard University , Cambridge, MA, USA"}]},{"given":"Brandon L","family":"Logeman","sequence":"additional","affiliation":[{"name":"Department of Molecular and Cellular Biology, Center for Brain Science, Harvard University , Cambridge, MA, USA"},{"name":"Howard Hughes Medical Institute , Cambridge, MA, USA"}]},{"given":"Stephen","family":"Eichhorn","sequence":"additional","affiliation":[{"name":"Department of Chemistry, Harvard University , Cambridge, MA, USA"}]},{"given":"Zhengzheng S","family":"Liang","sequence":"additional","affiliation":[{"name":"Department of Molecular and Cellular Biology, Center for Brain Science, Harvard University , Cambridge, MA, USA"},{"name":"Howard Hughes Medical Institute , Cambridge, MA, USA"}]},{"given":"Catherine","family":"Dulac","sequence":"additional","affiliation":[{"name":"Department of Molecular and Cellular Biology, Center for Brain Science, Harvard University , Cambridge, MA, USA"},{"name":"Howard Hughes Medical Institute , Cambridge, MA, USA"}]},{"given":"Timothy B","family":"Sackton","sequence":"additional","affiliation":[{"name":"FAS Informatics Group, Harvard University , Cambridge, MA, USA"}]}],"member":"286","published-online":{"date-parts":[[2020,11,9]]},"reference":[{"key":"2023061310303699900_btaa956-B1","doi-asserted-by":"crossref","first-page":"63","DOI":"10.12688\/wellcomeopenres.15191.1","article-title":"Raincloud plots: a multi-platform tool for robust data visualization","volume":"4","author":"Allen","year":"2019","journal-title":"Wellcome Open Res"},{"key":"2023061310303699900_btaa956-B2","doi-asserted-by":"crossref","first-page":"1749","DOI":"10.12688\/f1000research.20843.1","article-title":"Creating and sharing reproducible research code the workflowr way [version 1; peer review: 3 
approved]","volume":"8","author":"Blischak","year":"2019","journal-title":"F1000Research"},{"key":"2023061310303699900_btaa956-B78978867","doi-asserted-by":"crossref","first-page":"1141","DOI":"10.12688\/f1000research.15666.2","article-title":"A systematic performance evaluation of clustering methods for single-cell RNA-seq data","volume":"7","year":"2018","journal-title":"F1000Res."},{"key":"2023061310303699900_btaa956-B3","doi-asserted-by":"crossref","first-page":"e1004575","DOI":"10.1371\/journal.pcbi.1004575","article-title":"SINCERA: a pipeline for single-cell RNA-seq profiling analysis","volume":"11","author":"Guo","year":"2015","journal-title":"PLoS Comput. Biol"},{"key":"2023061310303699900_btaa956-B4","doi-asserted-by":"crossref","first-page":"258","DOI":"10.1016\/j.csda.2006.11.025","article-title":"Cluster-wise assessment of cluster stability","volume":"52","author":"Hennig","year":"2007","journal-title":"Comput. Stat. Data Anal"},{"key":"2023061310303699900_btaa956-B5","doi-asserted-by":"crossref","first-page":"2520","DOI":"10.1093\/bioinformatics\/bts480","article-title":"Snakemake\u2014a scalable bioinformatics workflow engine","volume":"28","author":"K\u00f6ster","year":"2012","journal-title":"Bioinformatics"},{"key":"2023061310303699900_btaa956-B6","author":"Lun","year":"2019"},{"key":"2023061310303699900_btaa956-B7","doi-asserted-by":"crossref","first-page":"1202","DOI":"10.1016\/j.cell.2015.05.002","article-title":"Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets","volume":"161","author":"Macosko","year":"2015","journal-title":"Cell"},{"key":"2023061310303699900_btaa956-B8","doi-asserted-by":"crossref","first-page":"dev169748","DOI":"10.1242\/dev.169748","article-title":"The evolving concept of cell identity in the single cell 
era","volume":"146","author":"Morris","year":"2019","journal-title":"Development"},{"key":"2023061310303699900_btaa956-B9","doi-asserted-by":"crossref","first-page":"479","DOI":"10.1038\/s41592-019-0425-8","article-title":"Benchmarking single cell RNA-sequencing analysis pipelines using mixture control experiments","volume":"16","author":"Tian","year":"2019","journal-title":"Nat. Methods"},{"key":"2023061310303699900_btaa956-B10","doi-asserted-by":"crossref","first-page":"15","DOI":"10.1186\/s13059-017-1382-0","article-title":"SCANPY: large-scale single-cell gene expression data analysis","volume":"19","author":"Wolf","year":"2018","journal-title":"Genome Biol"},{"key":"2023061310303699900_btaa956-B11","doi-asserted-by":"crossref","first-page":"dev169854","DOI":"10.1242\/dev.169854","article-title":"A periodic table of cell types","volume":"146","author":"Xia","year":"2019","journal-title":"Development"},{"key":"2023061310303699900_btaa956-B12","doi-asserted-by":"crossref","DOI":"10.1093\/gigascience\/giy083","article-title":"Clustering trees: a visualization for evaluating clusterings at multiple resolutions","volume":"7","author":"Zappia","year":"2018","journal-title":"GigaScience"},{"key":"2023061310303699900_btaa956-B13","volume-title":"Practical Data Science with 
R","author":"Zumel","year":"2014","edition":"1"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaa956\/35037826\/btaa956.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/15\/2212\/50578875\/btaa956.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/15\/2212\/50578875\/btaa956.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,6,13]],"date-time":"2023-06-13T06:33:18Z","timestamp":1686637998000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/37\/15\/2212\/5962080"}},"subtitle":[],"editor":[{"given":"Birol","family":"Inanc","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2020,11,9]]},"references-count":14,"journal-issue":{"issue":"15","published-print":{"date-parts":[[2021,8,9]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaa956","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2020.05.26.116640","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021,8,1]]},"published":{"date-parts":[[2020,11,9]]}}}