{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T20:34:49Z","timestamp":1772138089585,"version":"3.50.1"},"reference-count":42,"publisher":"Oxford University Press (OUP)","issue":"13","license":[{"start":{"date-parts":[[2018,6,27]],"date-time":"2018-06-27T00:00:00Z","timestamp":1530057600000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000001","name":"NSF","doi-asserted-by":"publisher","award":["CCF-1651236"],"award-info":[{"award-number":["CCF-1651236"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"NIH","doi-asserted-by":"publisher","award":["5R01HG008164-02"],"award-info":[{"award-number":["5R01HG008164-02"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"NIH","doi-asserted-by":"publisher","award":["1-R01-CA207029-01A1"],"award-info":[{"award-number":["1-R01-CA207029-01A1"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2018,7,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Single cell RNA-seq (scRNA-seq) data contains a wealth of information which has to be inferred computationally from the observed sequencing reads. As the ability to sequence more cells improves rapidly, existing computational tools suffer from three problems. (i) The decreased reads-per-cell implies a highly sparse sample of the true cellular transcriptome. (ii) Many tools simply cannot handle the size of the resulting datasets. (iii) Prior biological knowledge such as bulk RNA-seq information of certain cell types or qualitative marker information is not taken into account. Here we present UNCURL, a preprocessing framework based on non-negative matrix factorization for scRNA-seq data, that is able to handle varying sampling distributions, scales to very large cell numbers and can incorporate prior knowledge.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We find that preprocessing using UNCURL consistently improves performance of commonly used scRNA-seq tools for clustering, visualization and lineage estimation, both in the absence and presence of prior knowledge. Finally we demonstrate that UNCURL is extremely scalable and parallelizable, and runs faster than other methods on a scRNA-seq dataset containing 1.3 million cells.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>Source code is available at https:\/\/github.com\/yjzhang\/uncurl_python.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/bty293","type":"journal-article","created":{"date-parts":[[2018,4,24]],"date-time":"2018-04-24T15:15:06Z","timestamp":1524582906000},"page":"i124-i132","source":"Crossref","is-referenced-by-count":24,"title":["Scalable preprocessing for sparse scRNA-seq data exploiting prior knowledge"],"prefix":"10.1093","volume":"34","author":[{"given":"Sumit","family":"Mukherjee","sequence":"first","affiliation":[{"name":"Department of Electrical Engineering, University of Washington, Seattle, WA, USA"}]},{"given":"Yue","family":"Zhang","sequence":"additional","affiliation":[{"name":"Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA, USA"}]},{"given":"Joshua","family":"Fan","sequence":"additional","affiliation":[{"name":"Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA, USA"}]},{"given":"Georg","family":"Seelig","sequence":"additional","affiliation":[{"name":"Department of Electrical Engineering, University of Washington, Seattle, WA, USA"},{"name":"Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA, USA"}]},{"given":"Sreeram","family":"Kannan","sequence":"additional","affiliation":[{"name":"Department of Electrical Engineering, University of Washington, Seattle, WA, USA"}]}],"member":"286","published-online":{"date-parts":[[2018,6,27]]},"reference":[{"key":"2023051604235790400_bty293-B1","author":"10XGenomics","year":"2017"},{"key":"2023051604235790400_bty293-B2","doi-asserted-by":"crossref","first-page":"433","DOI":"10.1002\/wics.101","article-title":"Principal component analysis","volume":"2","author":"Abdi","year":"2010","journal-title":"Wiley Interdisc. Rev. Comput. Stat"},{"key":"2023051604235790400_bty293-B3","doi-asserted-by":"crossref","first-page":"R106.","DOI":"10.1186\/gb-2010-11-10-r106","article-title":"Differential expression analysis for sequence count data","volume":"11","author":"Anders","year":"2010","journal-title":"Genome Biol"},{"key":"2023051604235790400_bty293-B4","first-page":"1027","author":"Arthur","year":"2007"},{"key":"2023051604235790400_bty293-B5","doi-asserted-by":"crossref","first-page":"346","DOI":"10.1016\/j.cels.2016.08.011","article-title":"A single-cell transcriptomic map of the human and mouse pancreas reveals inter- and intra-cell population structure","volume":"3","author":"Baron","year":"2016","journal-title":"Cell Syst"},{"key":"2023051604235790400_bty293-B6","doi-asserted-by":"crossref","first-page":"330","DOI":"10.1287\/moor.2016.0817","article-title":"A descent lemma beyond Lipschitz gradient continuity: first-order methods revisited and applications","volume":"42","author":"Bauschke","year":"2017","journal-title":"Math. Operat. Res"},{"key":"2023051604235790400_bty293-B7","doi-asserted-by":"crossref","first-page":"364","DOI":"10.1080\/01621459.1972.10482387","article-title":"On Simpson\u2019s paradox and the sure-thing principle","volume":"67","author":"Blyth","year":"1972","journal-title":"J. Am. Stat. Assoc"},{"key":"2023051604235790400_bty293-B8","doi-asserted-by":"crossref","first-page":"1350","DOI":"10.1016\/j.patcog.2007.09.010","article-title":"SVD based initialization: a head start for nonnegative matrix factorization","volume":"41","author":"Boutsidis","year":"2008","journal-title":"Pattern Recogn"},{"key":"2023051604235790400_bty293-B9","doi-asserted-by":"crossref","first-page":"4164","DOI":"10.1073\/pnas.0308531101","article-title":"Metagenes and molecular pattern discovery using matrix factorization","volume":"101","author":"Brunet","year":"2004","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023051604235790400_bty293-B10","author":"Dijk","year":"2017"},{"key":"2023051604235790400_bty293-B11","first-page":"29","author":"Ding","year":"2004"},{"key":"2023051604235790400_bty293-B12","doi-asserted-by":"crossref","first-page":"799","DOI":"10.1016\/j.cell.2015.10.039","article-title":"Design and Analysis of Single-Cell Sequencing Experiments","volume":"163","author":"Grun","year":"2015","journal-title":"Cell"},{"key":"2023051604235790400_bty293-B13","doi-asserted-by":"crossref","first-page":"637.","DOI":"10.1038\/nmeth.2930","article-title":"Validation of noise models for single-cell transcriptomics","volume":"11","author":"Grun","year":"2014","journal-title":"Nat. Methods"},{"key":"2023051604235790400_bty293-B14","doi-asserted-by":"crossref","first-page":"283","DOI":"10.1002\/bimj.200710403","article-title":"Testing the ratio of two poisson rates","volume":"50","author":"Gu","year":"2008","journal-title":"Biometr. J"},{"key":"2023051604235790400_bty293-B15","first-page":"1251","author":"Hanchate","year":"2015"},{"key":"2023051604235790400_bty293-B16","volume-title":"Algorithms for Clustering Data","author":"Jain","year":"1988"},{"key":"2023051604235790400_bty293-B17","doi-asserted-by":"crossref","first-page":"1187","DOI":"10.1016\/j.cell.2015.04.044","article-title":"Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells","volume":"161","author":"Klein","year":"2015","journal-title":"Cell"},{"key":"2023051604235790400_bty293-B18","first-page":"23","author":"Langville","year":"2006"},{"key":"2023051604235790400_bty293-B19","first-page":"556","volume-title":"Advances in Neural Information Processing Systems 13","author":"Lee","year":"2001"},{"key":"2023051604235790400_bty293-B20","doi-asserted-by":"crossref","first-page":"550.","DOI":"10.1186\/s13059-014-0550-8","article-title":"Moderated estimation of fold change and dispersion for rna-seq data with deseq2","volume":"15","author":"Love","year":"2014","journal-title":"Genome Biol"},{"key":"2023051604235790400_bty293-B21","first-page":"2579","article-title":"Visualizing data using t-SNE","volume":"9","author":"Maaten","year":"2008","journal-title":"J. Mach. Learn. Res"},{"key":"2023051604235790400_bty293-B22","first-page":"2825","article-title":"Scikit-learn: machine learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J. Mach. Learn. Res"},{"key":"2023051604235790400_bty293-B23","first-page":"9072","article-title":"Computing the confidence levels for a root-mean-square test of goodness-of-fit","volume":"217","author":"Perkins","year":"2011","journal-title":"Appl. Math. Comput"},{"key":"2023051604235790400_bty293-B24","doi-asserted-by":"crossref","first-page":"241.","DOI":"10.1186\/s13059-015-0805-z","article-title":"ZIFA: dimensionality reduction for zero-inflated single-cell gene expression analysis","volume":"16","author":"Pierson","year":"2015","journal-title":"Genome Biol"},{"key":"2023051604235790400_bty293-B25","doi-asserted-by":"crossref","first-page":"979.","DOI":"10.1038\/nmeth.4402","article-title":"Reversed graph embedding resolves complex single-cell trajectories","volume":"14","author":"Qiu","year":"2017","journal-title":"Nat. Methods"},{"key":"2023051604235790400_bty293-B26","first-page":"176","article-title":"Single-cell profiling of the developing mouse brain and spinal cord with split-pool barcoding","volume-title":"Science","author":"Rosenberg","year":"2018"},{"key":"2023051604235790400_bty293-B27","doi-asserted-by":"crossref","first-page":"2323","DOI":"10.1126\/science.290.5500.2323","article-title":"Nonlinear dimensionality reduction by locally linear embedding","volume":"290","author":"Roweis","year":"2000","journal-title":"Science"},{"key":"2023051604235790400_bty293-B28","doi-asserted-by":"crossref","first-page":"495.","DOI":"10.1038\/nbt.3192","article-title":"Spatial reconstruction of single-cell gene expression data","volume":"33","author":"Satija","year":"2015","journal-title":"Nat. Biotechnol"},{"key":"2023051604235790400_bty293-B29","doi-asserted-by":"crossref","first-page":"637","DOI":"10.1038\/nbt.3569","article-title":"Wishbone identifies bifurcating developmental trajectories from single-cell data","volume":"34","author":"Setty","year":"2016","journal-title":"Nat. Biotechnol"},{"key":"2023051604235790400_bty293-B30","doi-asserted-by":"crossref","first-page":"235","DOI":"10.1093\/bioinformatics\/btw607","article-title":"Robust classification of single-cell transcriptome data by nonnegative matrix factorization","volume":"33","author":"Shao","year":"2017","journal-title":"Bioinformatics"},{"key":"2023051604235790400_bty293-B31","doi-asserted-by":"crossref","first-page":"360","DOI":"10.1016\/j.stem.2015.07.013","article-title":"Single-cell RNA-Seq with waterfall reveals molecular cascades underlying adult neurogenesis","volume":"17","author":"Shin","year":"2015","journal-title":"Cell Stem Cell"},{"key":"2023051604235790400_bty293-B32","doi-asserted-by":"crossref","first-page":"555","DOI":"10.1093\/bioinformatics\/18.4.555","article-title":"Binary analysis and optimization-based normalization of gene expression data","volume":"18","author":"Shmulevich","year":"2002","journal-title":"Bioinformatics"},{"key":"2023051604235790400_bty293-B33","doi-asserted-by":"crossref","first-page":"335","DOI":"10.1038\/nn.4216","article-title":"Adult mouse cortical cell taxonomy revealed by single cell transcriptomics","volume":"19","author":"Tasic","year":"2016","journal-title":"Nat. Neurosci"},{"key":"2023051604235790400_bty293-B34","doi-asserted-by":"crossref","first-page":"1491","DOI":"10.1101\/gr.190595.115","article-title":"Defining cell types and states with single-cell genomics","volume":"25","author":"Trapnell","year":"2015","journal-title":"Genome Res"},{"key":"2023051604235790400_bty293-B35","doi-asserted-by":"crossref","first-page":"381","DOI":"10.1038\/nbt.2859","article-title":"The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells","volume":"32","author":"Trapnell","year":"2014","journal-title":"Nat. Biotechnol"},{"key":"2023051604235790400_bty293-B36","doi-asserted-by":"crossref","first-page":"145","DOI":"10.1038\/nn.3881","article-title":"Unbiased classification of sensory neuron types by large-scale single-cell RNA sequencing","volume":"18","author":"Usoskin","year":"2015","journal-title":"Nat. Neurosci"},{"key":"2023051604235790400_bty293-B37","doi-asserted-by":"crossref","first-page":"1145","DOI":"10.1038\/nbt.3711","article-title":"Revealing the vectors of cellular identity with single-cell genomics","volume":"34","author":"Wagner","year":"2016","journal-title":"Nat. Biotechnol"},{"key":"2023051604235790400_bty293-B38","doi-asserted-by":"crossref","first-page":"414","DOI":"10.1038\/nmeth.4207","article-title":"Visualization and analysis of single-cell RNA-seq data by kernel-based similarity learning","volume":"14","author":"Wang","year":"2017","journal-title":"Nat. Methods"},{"key":"2023051604235790400_bty293-B39","doi-asserted-by":"crossref","first-page":"106.","DOI":"10.1186\/s13059-016-0975-3","article-title":"SLICER: inferring branched, nonlinear cellular trajectories from single cell RNA-seq data","volume":"17","author":"Welch","year":"2016","journal-title":"Genome Biol"},{"key":"2023051604235790400_bty293-B40","doi-asserted-by":"crossref","first-page":"1138","DOI":"10.1126\/science.aaa1934","article-title":"Cell types in the mouse cortex and hippocampus revealed by single-cell RNA-seq","volume":"347","author":"Zeisel","year":"2015","journal-title":"Science"},{"key":"2023051604235790400_bty293-B151","doi-asserted-by":"crossref","first-page":"11929","DOI":"10.1523\/JNEUROSCI.1860-14.2014","article-title":"An RNA-sequencing transcriptome and splicing database of glia, neurons, and vascular cells of the cerebral cortex","volume":"34","author":"Zhang","year":"2014","journal-title":"J Neurosci."},{"key":"2023051604235790400_bty293-B41","doi-asserted-by":"crossref","first-page":"14049.","DOI":"10.1038\/ncomms14049","article-title":"Massively parallel digital transcriptional profiling of single cells","volume":"8","author":"Zheng","year":"2017","journal-title":"Nat. Commun"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/13\/i124\/50315991\/bioinformatics_34_13_i124.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/13\/i124\/50315991\/bioinformatics_34_13_i124.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,5,16]],"date-time":"2023-05-16T00:26:32Z","timestamp":1684196792000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/34\/13\/i124\/5045758"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,6,27]]},"references-count":42,"journal-issue":{"issue":"13","published-print":{"date-parts":[[2018,7,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bty293","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/142398","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2018,7,1]]},"published":{"date-parts":[[2018,6,27]]}}}