{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,4]],"date-time":"2026-03-04T23:09:32Z","timestamp":1772665772247,"version":"3.50.1"},"reference-count":38,"publisher":"Oxford University Press (OUP)","issue":"14","license":[{"start":{"date-parts":[[2019,7,8]],"date-time":"2019-07-08T00:00:00Z","timestamp":1562544000000},"content-version":"vor","delay-in-days":7,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000001","name":"NSF","doi-asserted-by":"publisher","award":["CCF-1750472"],"award-info":[{"award-number":["CCF-1750472"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"NSF","doi-asserted-by":"publisher","award":["2018-182752"],"award-info":[{"award-number":["2018-182752"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"NSF","doi-asserted-by":"publisher","award":["1531492"],"award-info":[{"award-number":["1531492"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2019,7,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Summary<\/jats:title>\n                  <jats:p>With the advancements of high-throughput single-cell RNA-sequencing protocols, there has been a rapid increase in the tools available to perform an array of analyses on the gene expression data that results from such studies. For example, there exist methods for pseudo-time series analysis, differential cell usage, cell-type detection, RNA-velocity in single cells, etc. 
Most analysis pipelines validate their results using known marker genes (which are not widely available for all types of analysis) and by using simulated data from gene-count-level simulators. Typically, the impact of using different read-alignment or unique molecular identifier (UMI) deduplication methods has not been widely explored. Assessments based on simulation tend to start at the level of assuming a simulated count matrix, ignoring the effect that different approaches for resolving UMI counts from the raw read data may produce. Here, we present minnow, a comprehensive sequence-level droplet-based single-cell RNA-sequencing (dscRNA-seq) experiment simulation framework. Minnow accounts for important sequence-level characteristics of experimental scRNA-seq datasets and models effects such as polymerase chain reaction amplification, cellular barcodes (CB) and UMI selection and sequence fragmentation and sequencing. It also closely matches the gene-level ambiguity characteristics that are observed in real scRNA-seq experiments. 
Using minnow, we explore the performance of some common processing pipelines to produce gene-by-cell count matrices from droplet-based scRNA-seq data, demonstrate the effect that realistic levels of gene-level sequence ambiguity can have on accurate quantification and show a typical use-case of minnow in assessing the output generated by different quantification pipelines on the simulated experiment.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btz351","type":"journal-article","created":{"date-parts":[[2019,5,9]],"date-time":"2019-05-09T19:21:53Z","timestamp":1557429713000},"page":"i136-i144","source":"Crossref","is-referenced-by-count":28,"title":["<i>Minnow<\/i>: a principled framework for rapid simulation of dscRNA-seq data at the read level"],"prefix":"10.1093","volume":"35","author":[{"given":"Hirak","family":"Sarkar","sequence":"first","affiliation":[{"name":"Department of Computer Science, Stony Brook University, Stony Brook, NY, USA"}]},{"given":"Avi","family":"Srivastava","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Stony Brook University, Stony Brook, NY, USA"}]},{"given":"Rob","family":"Patro","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Stony Brook University, Stony Brook, NY, USA"}]}],"member":"286","published-online":{"date-parts":[[2019,7,5]]},"reference":[{"key":"2023062712344868400_btz351-B1","doi-asserted-by":"crossref","first-page":"110.","DOI":"10.1186\/s13059-018-1496-z","article-title":"Single-cell RNAseq for the study of isoforms\u2014how is that possible?","volume":"19","author":"Arzalluz-Luque","year":"2018","journal-title":"Genome 
Biol"},{"key":"2023062712344868400_btz351-B2","doi-asserted-by":"crossref","first-page":"14629","DOI":"10.1038\/srep14629","article-title":"Computational analysis of stochastic heterogeneity in PCR amplification efficiency revealed by single molecule barcoding","volume":"5","author":"Best","year":"2015","journal-title":"Sci. Rep"},{"key":"2023062712344868400_btz351-B3","first-page":"18","article-title":"Improved protocols for illumina sequencing","volume":"79","author":"Bronner","year":"2013","journal-title":"Curr. Protoc. Hum. Genet"},{"key":"2023062712344868400_btz351-B4","doi-asserted-by":"crossref","first-page":"1209","DOI":"10.1016\/j.cell.2012.08.023","article-title":"Single-cell expression analyses during cellular reprogramming reveal an early stochastic and a late hierarchic phase","volume":"150","author":"Buganim","year":"2012","journal-title":"Cell"},{"key":"2023062712344868400_btz351-B5","doi-asserted-by":"crossref","first-page":"411.","DOI":"10.1038\/nbt.4096","article-title":"Integrating single-cell transcriptomic data across different conditions, technologies, and species","volume":"36","author":"Butler","year":"2018","journal-title":"Nat. 
Biotechnol"},{"key":"2023062712344868400_btz351-B6","doi-asserted-by":"crossref","first-page":"15","DOI":"10.1093\/bioinformatics\/bts635","article-title":"STAR: ultrafast universal RNA-seq aligner","volume":"29","author":"Dobin","year":"2013","journal-title":"Bioinformatics"},{"key":"2023062712344868400_btz351-B7","doi-asserted-by":"crossref","first-page":"278.","DOI":"10.1186\/s13059-015-0844-5","article-title":"Mast: a flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in single-cell RNA sequencing data","volume":"16","author":"Finak","year":"2015","journal-title":"Genome Biol"},{"key":"2023062712344868400_btz351-B8","doi-asserted-by":"crossref","first-page":"2778","DOI":"10.1093\/bioinformatics\/btv272","article-title":"Polyester: simulating RNA-seq datasets with differential transcript expression","volume":"31","author":"Frazee","year":"2015","journal-title":"Bioinformatics"},{"key":"2023062712344868400_btz351-B9","doi-asserted-by":"crossref","first-page":"10073","DOI":"10.1093\/nar\/gks666","article-title":"Modelling and simulating generic RNA-seq experiments with the flux simulator","volume":"40","author":"Griebel","year":"2012","journal-title":"Nucleic Acids Res"},{"key":"2023062712344868400_btz351-B10","doi-asserted-by":"crossref","first-page":"251.","DOI":"10.1038\/nature14966","article-title":"Single-cell messenger RNA sequencing reveals rare intestinal cell types","volume":"525","author":"Gr\u00fcn","year":"2015","journal-title":"Nature"},{"key":"2023062712344868400_btz351-B11","doi-asserted-by":"crossref","first-page":"666","DOI":"10.1016\/j.celrep.2012.08.003","article-title":"Cel-seq: single-cell RNA-seq by multiplexed linear amplification","volume":"2","author":"Hashimshony","year":"2012","journal-title":"Cell Rep"},{"key":"2023062712344868400_btz351-B12","doi-asserted-by":"crossref","first-page":"e1005761.","DOI":"10.1371\/journal.pcbi.1005761","article-title":"Stochastic principles governing 
alternative splicing of RNA","volume":"13","author":"Hu","year":"2017","journal-title":"PLoS Comput. Biol"},{"key":"2023062712344868400_btz351-B13","doi-asserted-by":"crossref","first-page":"483.","DOI":"10.1038\/nmeth.4236","article-title":"Sc3: consensus clustering of single-cell RNA-seq data","volume":"14","author":"Kiselev","year":"2017","journal-title":"Nat. Methods"},{"key":"2023062712344868400_btz351-B14","doi-asserted-by":"crossref","first-page":"1187","DOI":"10.1016\/j.cell.2015.04.044","article-title":"Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells","volume":"161","author":"Klein","year":"2015","journal-title":"Cell"},{"key":"2023062712344868400_btz351-B15","doi-asserted-by":"crossref","first-page":"494.","DOI":"10.1038\/s41586-018-0414-6","article-title":"RNA velocity of single cells","volume":"560","author":"La Manno","year":"2018","journal-title":"Nature"},{"key":"2023062712344868400_btz351-B16","doi-asserted-by":"crossref","first-page":"323.","DOI":"10.1186\/1471-2105-12-323","article-title":"Rsem: accurate transcript quantification from RNA-seq data with or without a reference genome","volume":"12","author":"Li","year":"2011","journal-title":"BMC Bioinform"},{"key":"2023062712344868400_btz351-B17","doi-asserted-by":"crossref","first-page":"923","DOI":"10.1093\/bioinformatics\/btt656","article-title":"featureCounts: an efficient general-purpose read summarization program","volume":"30","author":"Liao","year":"2014","journal-title":"Bioinformatics"},{"key":"2023062712344868400_btz351-B18","doi-asserted-by":"crossref","first-page":"1202","DOI":"10.1016\/j.cell.2015.05.002","article-title":"Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets","volume":"161","author":"Macosko","year":"2015","journal-title":"Cell"},{"key":"2023062712344868400_btz351-B19","doi-asserted-by":"crossref","first-page":"873","DOI":"10.1016\/j.jmva.2006.11.013","article-title":"Comparing clusterings? 
An information based distance","volume":"98","author":"Meil\u0103","year":"2007","journal-title":"J. Multivariate Anal"},{"key":"2023062712344868400_btz351-B20","doi-asserted-by":"crossref","first-page":"4024","DOI":"10.1093\/bioinformatics\/btw609","article-title":"TwoPaCo: an efficient algorithm to build the compacted de Bruijn graph from many complete genomes","volume":"33","author":"Minkin","year":"2016","journal-title":"Bioinformatics"},{"key":"2023062712344868400_btz351-B21","article-title":"Alignment-free clustering of UMI tagged DNA molecules","volume":"35","author":"Orabi","year":"2018","journal-title":"Bioinformatics"},{"key":"2023062712344868400_btz351-B22","doi-asserted-by":"crossref","first-page":"417.","DOI":"10.1038\/nmeth.4197","article-title":"Salmon provides fast and bias-aware quantification of transcript expression","volume":"14","author":"Patro","year":"2017","journal-title":"Nat. Methods"},{"key":"2023062712344868400_btz351-B23","doi-asserted-by":"crossref","first-page":"1096.","DOI":"10.1038\/nmeth.2639","article-title":"Smart-seq2 for sensitive full-length transcriptome profiling in single cells","volume":"10","author":"Picelli","year":"2013","journal-title":"Nat. Methods"},{"key":"2023062712344868400_btz351-B24","doi-asserted-by":"crossref","first-page":"241.","DOI":"10.1186\/s13059-015-0805-z","article-title":"ZIFA: dimensionality reduction for zero-inflated single-cell gene expression analysis","volume":"16","author":"Pierson","year":"2015","journal-title":"Genome Biol"},{"key":"2023062712344868400_btz351-B25","doi-asserted-by":"crossref","first-page":"979.","DOI":"10.1038\/nmeth.4402","article-title":"Reversed graph embedding resolves complex single-cell trajectories","volume":"14","author":"Qiu","year":"2017","journal-title":"Nat. 
Methods"},{"key":"2023062712344868400_btz351-B26","volume-title":"Zinb-wave: A General and Flexible Method for Signal Extraction from Single-Cell RNA-seq Data","author":"Risso","year":"2017"},{"key":"2023062712344868400_btz351-B27","doi-asserted-by":"crossref","first-page":"495.","DOI":"10.1038\/nbt.3192","article-title":"Spatial reconstruction of single-cell gene expression data","volume":"33","author":"Satija","year":"2015","journal-title":"Nat. Biotechnol"},{"key":"2023062712344868400_btz351-B28","doi-asserted-by":"crossref","first-page":"491","DOI":"10.1101\/gr.209601.116","article-title":"UMI-tools: modeling sequencing errors in unique molecular identifiers to improve quantification accuracy","volume":"27","author":"Smith","year":"2017","journal-title":"Genome Res"},{"key":"2023062712344868400_btz351-B29","first-page":"65","volume-title":"Alevin Efficiently Estimates Accurate Gene Abundances from Dscrna-Seq Data","author":"Srivastava","year":"2019"},{"key":"2023062712344868400_btz351-B30","doi-asserted-by":"crossref","first-page":"1491","DOI":"10.1101\/gr.190595.115","article-title":"Defining cell types and states with single-cell genomics","volume":"25","author":"Trapnell","year":"2015","journal-title":"Genome Res"},{"key":"2023062712344868400_btz351-B31","doi-asserted-by":"crossref","first-page":"3486","DOI":"10.1093\/bioinformatics\/btx435","article-title":"powsimR: power analysis for bulk and single cell RNA-seq experiments","volume":"33","author":"Vieth","year":"2017","journal-title":"Bioinformatics"},{"key":"2023062712344868400_btz351-B32","first-page":"E6437","article-title":"Gene expression distribution deconvolution in single-cell RNA sequencing","volume":"115","author":"Wang","year":"2018","journal-title":"Proc. Natl. Acad. Sci. 
USA"},{"key":"2023062712344868400_btz351-B33","doi-asserted-by":"crossref","first-page":"997.","DOI":"10.1038\/s41467-018-03405-7","article-title":"An accurate and robust imputation method scImpute for single-cell RNA-seq data","volume":"9","author":"Wei","year":"2018","journal-title":"Nat. Commun"},{"key":"2023062712344868400_btz351-B34","doi-asserted-by":"crossref","first-page":"191.","DOI":"10.1186\/s13059-018-1571-5","article-title":"Simulation-based benchmarking of isoform quantification in single-cell RNA-seq","volume":"19","author":"Westoby","year":"2018","journal-title":"Genome Biol"},{"key":"2023062712344868400_btz351-B35","doi-asserted-by":"crossref","first-page":"174.","DOI":"10.1186\/s13059-017-1305-0","article-title":"Splatter: simulation of single-cell RNA sequencing data","volume":"18","author":"Zappia","year":"2017","journal-title":"Genome Biol"},{"key":"2023062712344868400_btz351-B36","doi-asserted-by":"crossref","first-page":"e1006245.","DOI":"10.1371\/journal.pcbi.1006245","article-title":"Exploring the single-cell RNA-seq analysis landscape with the scRNA-tools database","volume":"14","author":"Zappia","year":"2018","journal-title":"PLoS Comput. Biol"},{"key":"2023062712344868400_btz351-B37","doi-asserted-by":"crossref","first-page":"821","DOI":"10.1101\/gr.074492.107","article-title":"Velvet: algorithms for de novo short read assembly using de Bruijn graphs","volume":"18","author":"Zerbino","year":"2008","journal-title":"Genome Res"},{"key":"2023062712344868400_btz351-B38","doi-asserted-by":"crossref","first-page":"14049.","DOI":"10.1038\/ncomms14049","article-title":"Massively parallel digital transcriptional profiling of single cells","volume":"8","author":"Zheng","year":"2017","journal-title":"Nat. 
Commun"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/35\/14\/i136\/50721031\/bioinformatics_35_14_i136.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/35\/14\/i136\/50721031\/bioinformatics_35_14_i136.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,6,27]],"date-time":"2023-06-27T12:35:55Z","timestamp":1687869355000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/35\/14\/i136\/5529127"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,7]]},"references-count":38,"journal-issue":{"issue":"14","published-print":{"date-parts":[[2019,7,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btz351","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2019,7]]},"published":{"date-parts":[[2019,7]]}}}