{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,20]],"date-time":"2026-03-20T12:11:28Z","timestamp":1774008688336,"version":"3.50.1"},"reference-count":66,"publisher":"Oxford University Press (OUP)","issue":"5","license":[{"start":{"date-parts":[[2021,12,10]],"date-time":"2021-12-10T00:00:00Z","timestamp":1639094400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["R21CA246358"],"award-info":[{"award-number":["R21CA246358"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["R01HG009892"],"award-info":[{"award-number":["R01HG009892"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["1R01HG008164"],"award-info":[{"award-number":["1R01HG008164"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"name":"National Science Foundation CCF","award":["1651236"],"award-info":[{"award-number":["1651236"]}]},{"name":"National Science Foundation CCF","award":["NSF CIF 1703403"],"award-info":[{"award-number":["NSF CIF 1703403"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,2,7]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Single-cell RNA sequencing (scRNA-seq) is widely used for analyzing gene expression in multi-cellular systems and provides unprecedented access to cellular heterogeneity. scRNA-seq experiments aim to identify and quantify all cell types present in a sample. Measured single-cell transcriptomes are grouped by similarity and the resulting clusters are mapped to cell types based on cluster-specific gene expression patterns. While the process of generating clusters has become largely automated, annotation remains a laborious ad hoc effort that requires expert biological knowledge.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>Here, we introduce CellMeSH\u2014a new automated approach to identifying cell types for clusters based on prior literature. CellMeSH combines a database of gene\u2013cell-type associations with a probabilistic method for database querying. The database is constructed by automatically linking gene and cell-type information from millions of publications using existing indexed literature resources. Compared to manually constructed databases, CellMeSH is more comprehensive and is easily updated with new data. The probabilistic query method enables reliable information retrieval even though the gene\u2013cell-type associations extracted from the literature are noisy. CellMeSH is also able to optionally utilize prior knowledge about tissues or cells for further annotation improvement. CellMeSH achieves top-one and top-three accuracies on a number of mouse and human datasets that are consistently better than existing approaches.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>Web server at https:\/\/uncurl.cs.washington.edu\/db_query and API at https:\/\/github.com\/shunfumao\/cellmesh.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btab834","type":"journal-article","created":{"date-parts":[[2021,12,6]],"date-time":"2021-12-06T15:15:44Z","timestamp":1638803744000},"page":"1393-1402","source":"Crossref","is-referenced-by-count":16,"title":["CellMeSH: probabilistic cell-type identification using indexed literature"],"prefix":"10.1093","volume":"38","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8203-0507","authenticated-orcid":false,"given":"Shunfu","family":"Mao","sequence":"first","affiliation":[{"name":"Electrical and Computer Engineering Department, University of Washington , Seattle, WA 98195, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6533-5563","authenticated-orcid":false,"given":"Yue","family":"Zhang","sequence":"additional","affiliation":[{"name":"Paul G. Allen School of Computer Science and Engineering, University of Washington , Seattle, WA 98195, USA"}]},{"given":"Georg","family":"Seelig","sequence":"additional","affiliation":[{"name":"Electrical and Computer Engineering Department, University of Washington , Seattle, WA 98195, USA"},{"name":"Paul G. Allen School of Computer Science and Engineering, University of Washington , Seattle, WA 98195, USA"}]},{"given":"Sreeram","family":"Kannan","sequence":"additional","affiliation":[{"name":"Electrical and Computer Engineering Department, University of Washington , Seattle, WA 98195, USA"}]}],"member":"286","published-online":{"date-parts":[[2021,12,10]]},"reference":[{"key":"2023020108561546600_btab834-B1","doi-asserted-by":"crossref","DOI":"10.1101\/323238","article-title":"scQuery: a web server for comparative analysis of single-cell RNA-seq data","author":"Alavi","year":"2018","journal-title":"."},{"key":"2023020108561546600_btab834-B2","author":"Andrews","year":"2010"},{"key":"2023020108561546600_btab834-B3","doi-asserted-by":"crossref","first-page":"163","DOI":"10.1038\/s41590-018-0276-y","article-title":"Reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic macrophage","volume":"20","author":"Aran","year":"2019","journal-title":"Nat. Immunol"},{"key":"2023020108561546600_btab834-B4","doi-asserted-by":"crossref","first-page":"346","DOI":"10.1016\/j.cels.2016.08.011","article-title":"A single-cell transcriptomic map of the human and mouse pancreas reveals inter- and intra-cell population structure","volume":"3","author":"Baron","year":"2016","journal-title":"Cell Syst"},{"key":"2023020108561546600_btab834-B5","doi-asserted-by":"crossref","first-page":"525","DOI":"10.1038\/nbt.3519","article-title":"Near-optimal probabilistic RNA-seq quantification","volume":"34","author":"Bray","year":"2016","journal-title":"Nat. Biotechnol"},{"key":"2023020108561546600_btab834-B6","doi-asserted-by":"crossref","first-page":"411","DOI":"10.1038\/nbt.4096","article-title":"Integrating single-cell transcriptomic data across different conditions, technologies, and species","volume":"36","author":"Butler","year":"2018","journal-title":"Nat. Biotechnol"},{"key":"2023020108561546600_btab834-B7","doi-asserted-by":"crossref","first-page":"661","DOI":"10.1126\/science.aam8940","article-title":"Comprehensive single-cell transcriptional profiling of a multicellular organism","volume":"357","author":"Cao","year":"2017","journal-title":"Science"},{"key":"2023020108561546600_btab834-B8","doi-asserted-by":"crossref","first-page":"128","DOI":"10.1186\/1471-2105-14-128","article-title":"Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool","volume":"14","author":"Chen","year":"2013","journal-title":"BMC Bioinformatics"},{"key":"2023020108561546600_btab834-B9","doi-asserted-by":"crossref","first-page":"367","DOI":"10.1038\/s41586-018-0590-4","article-title":"Single-cell transcriptomics of 20 mouse organs creates a tabula muris","volume":"562","author":"Consortium","year":"2018","journal-title":"Nature"},{"key":"2023020108561546600_btab834-B10","article-title":"Evaluation of methods to assign cell type labels to cell clusters from single-cell RNA-sequencing data","author":"Diaz-Mejia","year":"2019"},{"key":"2023020108561546600_btab834-B66"},{"key":"2023020108561546600_btab834-B11","doi-asserted-by":"crossref","first-page":"15","DOI":"10.1093\/bioinformatics\/bts635","article-title":"Star: ultrafast universal RNA-seq aligner","volume":"29","author":"Dobin","year":"2013","journal-title":"Bioinformatics"},{"key":"2023020108561546600_btab834-B12","doi-asserted-by":"crossref","first-page":"39","DOI":"10.2307\/2342435","article-title":"The logic of inductive inference","volume":"98","author":"Fisher","year":"1935","journal-title":"J. R. Stat. Soc"},{"key":"2023020108561546600_btab834-B13","doi-asserted-by":"crossref","first-page":"baz046","DOI":"10.1093\/database\/baz046","article-title":"PanglaoDB: a web server for exploration of mouse and human single-cell RNA sequencing data","volume":"2019","author":"Franz\u00e9n","year":"2019","journal-title":"Database"},{"key":"2023020108561546600_btab834-B14","doi-asserted-by":"crossref","first-page":"799","DOI":"10.1016\/j.cell.2015.10.039","article-title":"Design and analysis of single-cell sequencing experiments","volume":"163","author":"Gr\u00fcn","year":"2015","journal-title":"Cell"},{"key":"2023020108561546600_btab834-B15","doi-asserted-by":"crossref","first-page":"1091","DOI":"10.1016\/j.cell.2018.02.001","article-title":"Mapping the mouse cell atlas by microwell-seq","volume":"172","author":"Han","year":"2018","journal-title":"Cell"},{"key":"2023020108561546600_btab834-B16","doi-asserted-by":"crossref","first-page":"7","DOI":"10.1186\/1471-2105-14-7","article-title":"GSVA: gene set variation analysis for microarray and RNA-seq data","volume":"14","author":"H\u00e4nzelmann","year":"2013","journal-title":"BMC Bioinformatics"},{"key":"2023020108561546600_btab834-B17","doi-asserted-by":"crossref","first-page":"4688","DOI":"10.1093\/bioinformatics\/btz292","article-title":"scMatch: a single-cell gene expression profile annotation tool using reference datasets","volume":"35","author":"Hou","year":"2019","journal-title":"Bioinformatics"},{"key":"2023020108561546600_btab834-B18","doi-asserted-by":"crossref","first-page":"776","DOI":"10.1126\/science.1247651","article-title":"Massively parallel single-cell RNA-seq for marker-free decomposition of tissues into cell types","volume":"343","author":"Jaitin","year":"2014","journal-title":"Science"},{"key":"2023020108561546600_btab834-B19","doi-asserted-by":"crossref","first-page":"R36","DOI":"10.1186\/gb-2013-14-4-r36","article-title":"TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions","volume":"14","author":"Kim","year":"2013","journal-title":"Genome Biol"},{"key":"2023020108561546600_btab834-B20","doi-asserted-by":"crossref","first-page":"483","DOI":"10.1038\/nmeth.4236","article-title":"SC3: consensus clustering of single-cell RNA-seq data","volume":"14","author":"Kiselev","year":"2017","journal-title":"Nat. Methods"},{"key":"2023020108561546600_btab834-B21","doi-asserted-by":"crossref","first-page":"359","DOI":"10.1038\/nmeth.4644","article-title":"scmap: projection of single-cell RNA-seq data across data sets","volume":"15","author":"Kiselev","year":"2018","journal-title":"Nat. Methods"},{"key":"2023020108561546600_btab834-B22","doi-asserted-by":"crossref","first-page":"1187","DOI":"10.1016\/j.cell.2015.04.044","article-title":"Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells","volume":"161","author":"Klein","year":"2015","journal-title":"Cell"},{"key":"2023020108561546600_btab834-B23","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1186\/s13059-020-1926-6","article-title":"Eleven grand challenges in single-cell data science","volume":"21","author":"L\u00e4hnemann","year":"2020","journal-title":"Genome Biol"},{"key":"2023020108561546600_btab834-B24","doi-asserted-by":"crossref","first-page":"323","DOI":"10.1186\/1471-2105-12-323","article-title":"Rsem: accurate transcript quantification from rna-seq data with or without a reference genome","volume":"12","author":"Li","year":"2011","journal-title":"BMC Bioinformatics"},{"key":"2023020108561546600_btab834-B25","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1186\/s13059-017-1188-0","article-title":"CIDR: ultrafast and accurate clustering through imputation for single-cell RNA-seq data","volume":"18","author":"Lin","year":"2017","journal-title":"Genome Biol"},{"key":"2023020108561546600_btab834-B26","doi-asserted-by":"crossref","first-page":"533","DOI":"10.1093\/bioinformatics\/btz592","article-title":"ACTINN: automated identification of cell types in single cell RNA sequencing","volume":"36","author":"Ma","year":"2019","journal-title":"Bioinformatics"},{"key":"2023020108561546600_btab834-B27","doi-asserted-by":"crossref","first-page":"D26","DOI":"10.1093\/nar\/gkl993","article-title":"Entrez Gene: gene-centered information at NCBI","volume":"35","author":"Maglott","year":"2007","journal-title":"Nucleic Acids Res"},{"key":"2023020108561546600_btab834-B28","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511809071","volume-title":"Introduction to Information Retrieval","author":"Manning","year":"2008"},{"key":"2023020108561546600_btab834-B29","author":"Mao","year":"2021"},{"key":"2023020108561546600_btab834-B30","author":"Mao","year":"2021"},{"key":"2023020108561546600_btab834-B31","year":"2015"},{"key":"2023020108561546600_btab834-B32","year":"2019"},{"key":"2023020108561546600_btab834-B33","doi-asserted-by":"crossref","first-page":"i124","DOI":"10.1093\/bioinformatics\/bty293","article-title":"Scalable preprocessing for sparse scRNA-seq data exploiting prior knowledge","volume":"34","author":"Mukherjee","year":"2018","journal-title":"Bioinformatics"},{"key":"2023020108561546600_btab834-B34","first-page":"3","article-title":"A survey of named entity recognition and classification","volume":"30","author":"Nadeau","year":"2007","journal-title":"Int. J. Ling. Lang. Resour"},{"key":"2023020108561546600_btab834-B35","author":"Orr Ashenberg","year":"2019"},{"key":"2023020108561546600_btab834-B36","doi-asserted-by":"crossref","first-page":"241","DOI":"10.1186\/s13059-015-0805-z","article-title":"ZIFA: dimensionality reduction for zero-inflated single-cell gene expression analysis","volume":"16","author":"Pierson","year":"2015","journal-title":"Genome Biol"},{"key":"2023020108561546600_btab834-B37","doi-asserted-by":"crossref","DOI":"10.1038\/s41592-019-0535-3","article-title":"Supervised classification enables rapid annotation of cell atlases","author":"Pliner","year":"2019"},{"key":"2023020108561546600_btab834-B38","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9781139058452","volume-title":"Mining of Massive Datasets","author":"Rajaraman","year":"2011"},{"key":"2023020108561546600_btab834-B39","doi-asserted-by":"crossref","first-page":"eaam8999-182","DOI":"10.1126\/science.aam8999","article-title":"Single-cell profiling of the developing mouse brain and spinal cord with split-pool barcoding","volume":"360","author":"Rosenberg","year":"2018","journal-title":"Science"},{"key":"2023020108561546600_btab834-B40","doi-asserted-by":"crossref","first-page":"495","DOI":"10.1038\/nbt.3192","article-title":"Spatial reconstruction of single-cell gene expression data","volume":"33","author":"Satija","year":"2015","journal-title":"Nat. Biotechnol"},{"key":"2023020108561546600_btab834-B41","doi-asserted-by":"crossref","first-page":"637","DOI":"10.1038\/nbt.3569","article-title":"Wishbone identifies bifurcating developmental trajectories from single-cell data","volume":"34","author":"Setty","year":"2016","journal-title":"Nat. Biotechnol"},{"key":"2023020108561546600_btab834-B42","doi-asserted-by":"crossref","first-page":"360","DOI":"10.1016\/j.stem.2015.07.013","article-title":"Single-cell RNA-seq with waterfall reveals molecular cascades underlying adult neurogenesis","volume":"17","author":"Shin","year":"2015","journal-title":"Cell Stem Cell"},{"key":"2023020108561546600_btab834-B43","doi-asserted-by":"crossref","first-page":"255","DOI":"10.1038\/nmeth.4612","article-title":"Bias, robustness and scalability in single-cell differential expression analysis","volume":"15","author":"Soneson","year":"2018","journal-title":"Nat. Methods"},{"key":"2023020108561546600_btab834-B44","doi-asserted-by":"crossref","first-page":"D950","DOI":"10.1093\/nar\/gkt1264","article-title":"CellFinder: a cell data repository","volume":"42","author":"Stachelscheid","year":"2013","journal-title":"Nucleic Acids Res"},{"key":"2023020108561546600_btab834-B45","doi-asserted-by":"crossref","first-page":"6062","DOI":"10.1073\/pnas.0400782101","article-title":"A gene atlas of the mouse and human protein-encoding transcriptomes","volume":"101","author":"Su","year":"2004","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023020108561546600_btab834-B46","doi-asserted-by":"crossref","first-page":"1649","DOI":"10.1038\/s41467-019-09639-3","article-title":"A Bayesian mixture model for clustering droplet-based single-cell transcriptomic data from population studies","volume":"10","author":"Sun","year":"2019","journal-title":"Nat. Commun"},{"key":"2023020108561546600_btab834-B47","doi-asserted-by":"crossref","first-page":"207","DOI":"10.1016\/j.cels.2019.06.004","article-title":"SingleCellNet: a computational tool to classify single cell RNA-seq data across platforms and across species","volume":"9","author":"Tan","year":"2019","journal-title":"Cell Syst"},{"key":"2023020108561546600_btab834-B48","doi-asserted-by":"crossref","first-page":"377","DOI":"10.1038\/nmeth.1315","article-title":"mRNA-seq whole-transcriptome analysis of a single cell","volume":"6","author":"Tang","year":"2009","journal-title":"Nat. Methods"},{"key":"2023020108561546600_btab834-B49","doi-asserted-by":"crossref","first-page":"335","DOI":"10.1038\/nn.4216","article-title":"Adult mouse cortical cell taxonomy revealed by single cell transcriptomics","volume":"19","author":"Tasic","year":"2016","journal-title":"Nat. Neurosci"},{"key":"2023020108561546600_btab834-B50","doi-asserted-by":"crossref","first-page":"381","DOI":"10.1038\/nbt.2859","article-title":"The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells","volume":"32","author":"Trapnell","year":"2014","journal-title":"Nat. Biotechnol"},{"key":"2023020108561546600_btab834-B51","doi-asserted-by":"crossref","first-page":"145","DOI":"10.1038\/nn.3881","article-title":"Unbiased classification of sensory neuron types by large-scale single-cell RNA sequencing","volume":"18","author":"Usoskin","year":"2014","journal-title":"Nat. Neurosci"},{"key":"2023020108561546600_btab834-B52","doi-asserted-by":"crossref","first-page":"414","DOI":"10.1038\/nmeth.4207","article-title":"Visualization and analysis of single-cell RNA-seq data by kernel-based similarity learning","volume":"14","author":"Wang","year":"2017","journal-title":"Nat. Methods"},{"key":"2023020108561546600_btab834-B53","doi-asserted-by":"crossref","first-page":"106","DOI":"10.1186\/s13059-016-0975-3","article-title":"SLICER: inferring branched, nonlinear cellular trajectories from single cell RNA-seq data","volume":"17","author":"Welch","year":"2016","journal-title":"Genome Biol"},{"key":"2023020108561546600_btab834-B54","year":"2020"},{"key":"2023020108561546600_btab834-B55","year":"2020"},{"key":"2023020108561546600_btab834-B56","year":"2020"},{"key":"2023020108561546600_btab834-B57","doi-asserted-by":"crossref","first-page":"15","DOI":"10.1186\/s13059-017-1382-0","article-title":"SCANPY: large-scale single-cell gene expression data analysis","volume":"19","author":"Wolf","year":"2018","journal-title":"Genome Biol"},{"key":"2023020108561546600_btab834-B58","first-page":"2145","author":"Yadav","year":"2019"},{"key":"2023020108561546600_btab834-B59","doi-asserted-by":"crossref","first-page":"1138","DOI":"10.1126\/science.aaa1934","article-title":"Cell types in the mouse cortex and hippocampus revealed by single-cell RNA-seq","volume":"347","author":"Zeisel","year":"2015","journal-title":"Science"},{"key":"2023020108561546600_btab834-B60","doi-asserted-by":"crossref","first-page":"1007","DOI":"10.1038\/s41592-019-0529-1","article-title":"Probabilistic cell-type assignment of single-cell RNA-seq for tumor microenvironment profiling","volume":"16","author":"Zhang","year":"2019","journal-title":"Nat. Methods"},{"key":"2023020108561546600_btab834-B61","doi-asserted-by":"crossref","first-page":"93","DOI":"10.1186\/s12859-018-2092-7","article-title":"An interpretable framework for clustering single-cell RNA-Seq datasets","volume":"19","author":"Zhang","year":"2018","journal-title":"BMC Bioinformatics"},{"key":"2023020108561546600_btab834-B62","doi-asserted-by":"crossref","first-page":"D721","DOI":"10.1093\/nar\/gky900","article-title":"CellMarker: a manually curated resource of cell markers in human and mouse","volume":"47","author":"Zhang","year":"2018","journal-title":"Nucleic Acids Res"},{"key":"2023020108561546600_btab834-B63","author":"Zhang","year":"2021"},{"key":"2023020108561546600_btab834-B64","author":"Zhang","year":"2020"},{"key":"2023020108561546600_btab834-B65","doi-asserted-by":"crossref","first-page":"14049","DOI":"10.1038\/ncomms14049","article-title":"Massively parallel digital transcriptional profiling of single cells","volume":"8","author":"Zheng","year":"2017","journal-title":"Nat. Commun"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btab834\/42088024\/btab834.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/5\/1393\/49008764\/btab834.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/5\/1393\/49008764\/btab834.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,1]],"date-time":"2023-02-01T15:24:45Z","timestamp":1675265085000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/38\/5\/1393\/6459167"}},"subtitle":[],"editor":[{"given":"Inanc","family":"Birol","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2021,12,10]]},"references-count":66,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2022,2,7]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btab834","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2020.05.29.124743","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2022,3,1]]},"published":{"date-parts":[[2021,12,10]]}}}