{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T20:34:01Z","timestamp":1772138041977,"version":"3.50.1"},"reference-count":50,"publisher":"Oxford University Press (OUP)","issue":"12","license":[{"start":{"date-parts":[[2021,1,20]],"date-time":"2021-01-20T00:00:00Z","timestamp":1611100800000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["R01 HG009937"],"award-info":[{"award-number":["R01 HG009937"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["CCF-1750472"],"award-info":[{"award-number":["CCF-1750472"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["CNS-1763680"],"award-info":[{"award-number":["CNS-1763680"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["P30 CA016086"],"award-info":[{"award-number":["P30 CA016086"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["P50 CA058223"],"award-info":[{"award-number":["P50 CA058223"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,7,19]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Quantification estimates of gene expression from single-cell RNA-seq (scRNA-seq) data have inherent uncertainty due to reads that map to multiple genes. Many existing scRNA-seq quantification pipelines ignore multi-mapping reads and therefore underestimate expected read counts for many genes. alevin accounts for multi-mapping reads and allows for the generation of \u2018inferential replicates\u2019, which reflect quantification uncertainty. Previous methods have shown improved performance when incorporating these replicates into statistical analyses, but storage and use of these replicates increases computation time and memory requirements.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We demonstrate that storing only the mean and variance from a set of inferential replicates (\u2018compression\u2019) is sufficient to capture gene-level quantification uncertainty, while reducing disk storage to as low as 9% of original storage, and memory usage when loading data to as low as 6%. Using these values, we generate \u2018pseudo-inferential\u2019 replicates from a negative binomial distribution and propose a general procedure for incorporating these replicates into a proposed statistical testing framework. When applying this procedure to trajectory-based differential expression analyses, we show false positives are reduced by more than a third for genes with high levels of quantification uncertainty. We additionally extend the Swish method to incorporate pseudo-inferential replicates and demonstrate improvements in computation time and memory usage without any loss in performance. Lastly, we show that discarding multi-mapping reads can result in significant underestimation of counts for functionally important genes in a real dataset.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>makeInfReps and splitSwish are implemented in the R\/Bioconductor fishpond package available at https:\/\/bioconductor.org\/packages\/fishpond. Analyses and simulated datasets can be found in the paper\u2019s GitHub repo at https:\/\/github.com\/skvanburen\/scUncertaintyPaperCode.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btab001","type":"journal-article","created":{"date-parts":[[2021,1,4]],"date-time":"2021-01-04T15:35:40Z","timestamp":1609774540000},"page":"1699-1707","source":"Crossref","is-referenced-by-count":7,"title":["Compression of quantification uncertainty for scRNA-seq counts"],"prefix":"10.1093","volume":"37","author":[{"given":"Scott","family":"Van Buren","sequence":"first","affiliation":[{"name":"Department of Biostatistics, University of North Carolina at Chapel Hill , Chapel Hill, NC 27516, USA"}]},{"given":"Hirak","family":"Sarkar","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Maryland , College Park, MD 20742, USA"},{"name":"Center for Bioinformatics and Computational Biology, University of Maryland , College Park, MD 20742, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9798-2079","authenticated-orcid":false,"given":"Avi","family":"Srivastava","sequence":"additional","affiliation":[{"name":"New York Genome Center , New York, NY 10013, USA"},{"name":"Center for Genomics and Systems Biology, New York University , New York, NY 10003, USA"}]},{"given":"Naim U","family":"Rashid","sequence":"additional","affiliation":[{"name":"Department of Biostatistics, University of North Carolina at Chapel Hill , Chapel Hill, NC 27516, USA"},{"name":"Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill , Chapel Hill, NC 27599, USA"}]},{"given":"Rob","family":"Patro","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Maryland , College Park, MD 20742, USA"},{"name":"Center for Bioinformatics and Computational Biology, University of Maryland , College Park, MD 20742, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8401-0545","authenticated-orcid":false,"given":"Michael I","family":"Love","sequence":"additional","affiliation":[{"name":"Department of Biostatistics, University of North Carolina at Chapel Hill , Chapel Hill, NC 27516, USA"},{"name":"Department of Genetics, University of North Carolina at Chapel Hill , Chapel Hill, NC 27514, USA"}]}],"member":"286","published-online":{"date-parts":[[2021,1,20]]},"reference":[{"key":"2023051709554966700_btab001-B1","doi-asserted-by":"crossref","first-page":"30","DOI":"10.1101\/gr.4137606","article-title":"Transcription-mediated gene fusion in the human genome","volume":"16","author":"Akiva","year":"2005","journal-title":"Genome Res"},{"key":"2023051709554966700_btab001-B2","first-page":"S2","volume-title":"BMC Genomics","author":"Al Seesi","year":"2014"},{"key":"2023051709554966700_btab001-B3","doi-asserted-by":"crossref","first-page":"164","DOI":"10.1038\/labinvest.2017.137","article-title":"The ndpk\/nme superfamily: state of the art","volume":"98","author":"Boissan","year":"2018","journal-title":"Lab. Investig"},{"key":"2023051709554966700_btab001-B4","doi-asserted-by":"crossref","first-page":"525","DOI":"10.1038\/nbt.3519","article-title":"Near-optimal probabilistic RNA-seq quantification","volume":"34","author":"Bray","year":"2016","journal-title":"Nat. Biotechnol"},{"key":"2023051709554966700_btab001-B5","doi-asserted-by":"crossref","first-page":"2496","DOI":"10.1002\/eji.201646347","article-title":"Computational methods for trajectory inference from single-cell transcriptomics","volume":"46","author":"Cannoodt","year":"2016","journal-title":"Eur. J. Immunol"},{"key":"2023051709554966700_btab001-B6","doi-asserted-by":"crossref","first-page":"256","DOI":"10.1186\/1471-2148-9-256","article-title":"Nme protein family evolutionary history, a vertebrate perspective","volume":"9","author":"Desvignes","year":"2009","journal-title":"BMC Evol. Biol"},{"key":"2023051709554966700_btab001-B7","first-page":"15","article-title":"Star: ultrafast universal RNA-seq aligner","volume":"29","author":"Dobin","year":"2013","journal-title":"Bioinformatics (Oxford, England)"},{"key":"2023051709554966700_btab001-B8","doi-asserted-by":"crossref","first-page":"D766","DOI":"10.1093\/nar\/gky955","article-title":"GENCODE reference annotation for the human and mouse genomes","volume":"47","author":"Frankish","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2023051709554966700_btab001-B9","doi-asserted-by":"crossref","first-page":"213","DOI":"10.12688\/f1000research.17916.1","article-title":"Relative abundance of transcripts (rats): Identifying differential isoform abundance from RNA-seq [version 1; peer review: 1 approved, 2 approved with reservations","volume":"8","author":"Froussios","year":"2019","journal-title":"F1000Research"},{"key":"2023051709554966700_btab001-B10","doi-asserted-by":"crossref","DOI":"10.1201\/b16018","volume-title":"Bayesian Data Analysis","author":"Gelman","year":"2013","edition":"3rd edn"},{"key":"2023051709554966700_btab001-B11","doi-asserted-by":"crossref","first-page":"1760","DOI":"10.1101\/gr.135350.111","article-title":"Gencode: the reference human genome annotation for the encode project","volume":"22","author":"Harrow","year":"2012","journal-title":"Genome Res"},{"key":"2023051709554966700_btab001-B12","doi-asserted-by":"crossref","first-page":"301","DOI":"10.1023\/A:1005597231776","article-title":"Nm23\/nucleoside diphosphate kinase in human cancers","volume":"32","author":"Hartsough","year":"2000","journal-title":"J. Bioenerg. Biomembranes"},{"key":"2023051709554966700_btab001-B13","first-page":"297","article-title":"Generalized additive models","volume":"1","author":"Hastie","year":"1986","journal-title":"Statist. Sci"},{"key":"2023051709554966700_btab001-B14","doi-asserted-by":"crossref","DOI":"10.1007\/978-0-387-92407-6","volume-title":"A First Course in Bayesian Statistical Methods","author":"Hoff","year":"2009","edition":"1st edn"},{"key":"2023051709554966700_btab001-B15","doi-asserted-by":"crossref","first-page":"96","DOI":"10.1038\/s12276-018-0071-8","article-title":"Single-cell RNA sequencing technologies and bioinformatics pipelines","volume":"50","author":"Hwang","year":"2018","journal-title":"Exp. Mol. Med"},{"key":"2023051709554966700_btab001-B16","doi-asserted-by":"crossref","first-page":"361","DOI":"10.1080\/00031305.1996.10473566","article-title":"Sample quantiles in statistical packages","volume":"50","author":"Hyndman","year":"1996","journal-title":"Am. Stat"},{"key":"2023051709554966700_btab001-B17","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1007\/s10585-012-9495-z","article-title":"Nm23 deficiency promotes metastasis in a UV radiation-induced mouse model of human melanoma","volume":"30","author":"Jarrett","year":"2013","journal-title":"Clin. Exp. Metastasis"},{"key":"2023051709554966700_btab001-B18","doi-asserted-by":"crossref","first-page":"2520","DOI":"10.1093\/bioinformatics\/bts480","article-title":"Snakemake-a scalable bioinformatics workflow engine","volume":"28","author":"K\u00f6ster","year":"2012","journal-title":"Bioinformatics"},{"key":"2023051709554966700_btab001-B19","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1186\/s13059-020-1926-6","article-title":"Eleven grand challenges in single-cell data science","volume":"21","author":"L\u00e4hnemann","year":"2020","journal-title":"Genome Biol"},{"key":"2023051709554966700_btab001-B20","doi-asserted-by":"crossref","first-page":"323","DOI":"10.1186\/1471-2105-12-323","article-title":"Rsem: accurate transcript quantification from RNA-seq data with or without a reference genome","volume":"12","author":"Li","year":"2011","journal-title":"BMC Bioinformatics"},{"key":"2023051709554966700_btab001-B21","doi-asserted-by":"crossref","first-page":"e1007664","DOI":"10.1371\/journal.pcbi.1007664","article-title":"Tximeta: reference sequence checksums for provenance identification in RNA-seq","volume":"16","author":"Love","year":"2020","journal-title":"PLoS Comput. Biol"},{"key":"2023051709554966700_btab001-B22","doi-asserted-by":"crossref","first-page":"1096","DOI":"10.1016\/0959-8049(95)00152-9","article-title":"The potential roles of nm23 in cancer metastasis and cellular differentiation","volume":"31","author":"MacDonald","year":"1995","journal-title":"Eur. J. Cancer"},{"key":"2023051709554966700_btab001-B23","doi-asserted-by":"crossref","first-page":"3302","DOI":"10.1093\/bioinformatics\/btx365","article-title":"Fast bootstrapping-based estimation of confidence intervals of expression levels and differential expression from RNA-Seq data","volume":"33","author":"Mandric","year":"2017","journal-title":"Bioinformatics"},{"key":"2023051709554966700_btab001-B24","author":"Melsted","year":"2019"},{"key":"2023051709554966700_btab001-B25","doi-asserted-by":"crossref","first-page":"394","DOI":"10.1186\/s12859-017-1790-x","article-title":"Bayesian unidimensional scaling for visualizing uncertainty in high dimensional datasets with latent ordering of observations","volume":"18","author":"Nguyen","year":"2017","journal-title":"BMC Bioinformatics"},{"key":"2023051709554966700_btab001-B26","doi-asserted-by":"crossref","first-page":"417","DOI":"10.1038\/nmeth.4197","article-title":"Salmon provides fast and bias-aware quantification of transcript expression","volume":"14","author":"Patro","year":"2017","journal-title":"Nat. Methods"},{"key":"2023051709554966700_btab001-B27","doi-asserted-by":"crossref","first-page":"78","DOI":"10.1186\/s13059-018-1449-6","article-title":"dropest: pipeline for accurate estimation of molecular counts in droplet-based single-cell RNA-seq experiments","volume":"19","author":"Petukhov","year":"2018","journal-title":"Genome Biol"},{"key":"2023051709554966700_btab001-B28","doi-asserted-by":"crossref","first-page":"490","DOI":"10.1038\/s41586-019-0933-9","article-title":"A single-cell molecular map of mouse gastrulation and early organogenesis","volume":"566","author":"Pijuan-Sala","year":"2019","journal-title":"Nature"},{"key":"2023051709554966700_btab001-B29","doi-asserted-by":"crossref","first-page":"687","DOI":"10.1038\/nmeth.4324","article-title":"Differential analysis of RNA-seq incorporating quantification uncertainty","volume":"14","author":"Pimentel","year":"2017","journal-title":"Nat. Methods"},{"key":"2023051709554966700_btab001-B30","doi-asserted-by":"crossref","first-page":"45","DOI":"10.1007\/s11010-009-0110-9","article-title":"Double knockout nme1\/nme2 mouse model suggests a critical role for ndp kinases in erythroid development","volume":"329","author":"Postel","year":"2009","journal-title":"Mol. Cell. Biochem"},{"key":"2023051709554966700_btab001-B31","doi-asserted-by":"crossref","first-page":"e13284","DOI":"10.1371\/journal.pone.0013284","article-title":"Expression of conjoined genes: another mechanism for gene regulation in eukaryotes","volume":"5","author":"Prakash","year":"2010","journal-title":"PLoS One"},{"key":"2023051709554966700_btab001-B32","doi-asserted-by":"crossref","first-page":"1430","DOI":"10.1080\/01621459.2017.1288631","article-title":"Bayesian nonparametric ordination for the analysis of microbial communities","volume":"112","author":"Ren","year":"2017","journal-title":"J. Am. Stat. Assoc"},{"key":"2023051709554966700_btab001-B33","doi-asserted-by":"crossref","first-page":"177","DOI":"10.1186\/s13059-015-0734-x","article-title":"Errors in RNA-Seq quantification affect genes of relevance to human disease","volume":"16","author":"Robert","year":"2015","journal-title":"Genome Biol"},{"key":"2023051709554966700_btab001-B34","doi-asserted-by":"crossref","first-page":"547","DOI":"10.1038\/s41587-019-0071-9","article-title":"A comparison of single-cell trajectory inference methods","volume":"37","author":"Saelens","year":"2019","journal-title":"Nat. Biotechnol"},{"key":"2023051709554966700_btab001-B35","doi-asserted-by":"crossref","first-page":"i136","DOI":"10.1093\/bioinformatics\/btz351","article-title":"Minnow: a principled framework for rapid simulation of dscRNA-seq data at the read level","volume":"35","author":"Sarkar","year":"2019","journal-title":"Bioinformatics"},{"key":"2023051709554966700_btab001-B36","first-page":"i102","author":"Sarkar","year":"2020"},{"key":"2023051709554966700_btab001-B37","doi-asserted-by":"crossref","first-page":"283","DOI":"10.1038\/nmeth.3805","article-title":"icobra: open, reproducible, standardized and live method benchmarking","volume":"13","author":"Soneson","year":"2016","journal-title":"Nat. Methods"},{"key":"2023051709554966700_btab001-B38","doi-asserted-by":"crossref","first-page":"1521; 1521","DOI":"10.12688\/f1000research.7563.2","article-title":"Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences","volume":"4","author":"Soneson","year":"2016","journal-title":"F1000Research"},{"key":"2023051709554966700_btab001-B39","doi-asserted-by":"crossref","first-page":"65","DOI":"10.1186\/s13059-019-1670-y","article-title":"Alevin efficiently estimates accurate gene abundances from dscRNA-seq data","volume":"20","author":"Srivastava","year":"2019","journal-title":"Genome Biol"},{"key":"2023051709554966700_btab001-B40","doi-asserted-by":"crossref","first-page":"479","DOI":"10.1111\/1467-9868.00346","article-title":"A direct approach to false discovery rates","volume":"64","author":"Storey","year":"2002","journal-title":"J. R. Stat. Soc. Ser. B (Statistical Methodology)"},{"key":"2023051709554966700_btab001-B41","doi-asserted-by":"crossref","first-page":"477","DOI":"10.1186\/s12864-018-4772-0","article-title":"Slingshot: cell lineage and pseudotime inference for single-cell transcriptomics","volume":"19","author":"Street","year":"2018","journal-title":"BMC Genomics"},{"key":"2023051709554966700_btab001-B42","doi-asserted-by":"crossref","first-page":"69","DOI":"10.1186\/s13059-020-01967-8","article-title":"Bandits: bayesian differential splicing accounting for sample-to-sample variability and mapping uncertainty","volume":"21","author":"Tiberi","year":"2020","journal-title":"Genome Biol"},{"key":"2023051709554966700_btab001-B43","doi-asserted-by":"crossref","first-page":"R13","DOI":"10.1186\/gb-2011-12-2-r13","article-title":"Haplotype and isoform specific expression estimation using multi-mapping RNA-seq reads","volume":"12","author":"Turro","year":"2011","journal-title":"Genome Biol"},{"key":"2023051709554966700_btab001-B44","doi-asserted-by":"crossref","first-page":"180","DOI":"10.1093\/bioinformatics\/btt624","article-title":"Flexible analysis of RNA-seq data using mixed effects models","volume":"30","author":"Turro","year":"2014","journal-title":"Bioinformatics"},{"key":"2023051709554966700_btab001-B45","author":"Van Buren","year":"2020"},{"key":"2023051709554966700_btab001-B46","doi-asserted-by":"crossref","first-page":"1201","DOI":"10.1038\/s41467-020-14766-3","article-title":"Trajectory-based differential expression analysis for single-cell sequencing data","volume":"11","author":"Van den Berge","year":"2020","journal-title":"Nat. Commun"},{"key":"2023051709554966700_btab001-B47","doi-asserted-by":"crossref","first-page":"175","DOI":"10.1186\/1471-2105-7-175","article-title":"Ls-nmf: a modified non-negative matrix factorization algorithm utilizing uncertainty estimates","volume":"7","author":"Wang","year":"2006","journal-title":"BMC Bioinformatics"},{"key":"2023051709554966700_btab001-B48","doi-asserted-by":"crossref","first-page":"174","DOI":"10.1186\/s13059-017-1305-0","article-title":"Splatter: simulation of single-cell RNA sequencing data","volume":"18","author":"Zappia","year":"2017","journal-title":"Genome Biol"},{"key":"2023051709554966700_btab001-B49","doi-asserted-by":"crossref","first-page":"14049","DOI":"10.1038\/ncomms14049","article-title":"Massively parallel digital transcriptional profiling of single cells","volume":"8","author":"Zheng","year":"2017","journal-title":"Nat. Commun"},{"key":"2023051709554966700_btab001-B50","doi-asserted-by":"crossref","first-page":"e105","DOI":"10.1093\/nar\/gkz622","article-title":"Nonparametric expression analysis using inferential replicate counts","volume":"47","author":"Zhu","year":"2019","journal-title":"Nucleic Acids Res"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btab001\/36158262\/btab001.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/12\/1699\/50361346\/btab001.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/12\/1699\/50361346\/btab001.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,8,20]],"date-time":"2024-08-20T15:30:11Z","timestamp":1724167811000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/37\/12\/1699\/6104828"}},"subtitle":[],"editor":[{"given":"Inanc","family":"Birol","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2021,1,20]]},"references-count":50,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2021,7,19]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btab001","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2020.07.06.189639","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021,6,15]]},"published":{"date-parts":[[2021,1,20]]}}}