{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,13]],"date-time":"2026-04-13T00:01:38Z","timestamp":1776038498523,"version":"3.50.1"},"reference-count":28,"publisher":"Oxford University Press (OUP)","issue":"13","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2015,7,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: RNA sequencing analysis methods are often derived by relying on hypothetical parametric models for read counts that are not likely to be precisely satisfied in practice. Methods are often tested by analyzing data that have been simulated according to the assumed model. This testing strategy can result in an overly optimistic view of the performance of an RNA-seq analysis method.<\/jats:p><jats:p>Results: We develop a data-based simulation algorithm for RNA-seq data. The vector of read counts simulated for a given experimental unit has a joint distribution that closely matches the distribution of a source RNA-seq dataset provided by the user. We conduct simulation experiments based on the negative binomial distribution and our proposed nonparametric simulation algorithm. We compare performance between the two simulation experiments over a small subset of statistical methods for RNA-seq analysis available in the literature. We use as a benchmark the ability of a method to control the false discovery rate. 
Not surprisingly, methods based on parametric modeling assumptions seem to perform better with respect to false discovery rate control when data are simulated from parametric models rather than using our more realistic nonparametric simulation strategy.<\/jats:p><jats:p>Availability and implementation: The nonparametric simulation algorithm developed in this article is implemented in the R package SimSeq, which is freely available under the GNU General Public License (version 2 or later) from the Comprehensive R Archive Network (http:\/\/cran.r-project.org\/).<\/jats:p><jats:p>Contact: sgbenidt@gmail.com<\/jats:p><jats:p>Supplementary information: Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btv124","type":"journal-article","created":{"date-parts":[[2015,2,28]],"date-time":"2015-02-28T01:22:29Z","timestamp":1425086549000},"page":"2131-2140","source":"Crossref","is-referenced-by-count":59,"title":["SimSeq: a nonparametric approach to simulation of RNA-sequence datasets"],"prefix":"10.1093","volume":"31","author":[{"given":"Sam","family":"Benidt","sequence":"first","affiliation":[{"name":"Department of Statistics, Iowa State University, Ames, IA 50011-1210, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Dan","family":"Nettleton","sequence":"additional","affiliation":[{"name":"Department of Statistics, Iowa State University, Ames, IA 50011-1210, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2015,2,26]]},"reference":[{"key":"2023020202133644100_btv124-B1","doi-asserted-by":"crossref","first-page":"R106","DOI":"10.1186\/gb-2010-11-10-r106","article-title":"Differential expression analysis for sequence count data","volume":"11","author":"Anders","year":"2010","journal-title":"Genome Biol."},{"key":"2023020202133644100_btv124-B2","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1038\/75556","article-title":"Gene 
ontology: tool for the unification of biology","volume":"25","author":"Ashburner","year":"2000","journal-title":"Nat. Genet."},{"key":"2023020202133644100_btv124-B3","doi-asserted-by":"crossref","first-page":"289","DOI":"10.1111\/j.2517-6161.1995.tb02031.x","article-title":"Controlling the false discovery rate: a practical and powerful approach to multiple testing","volume":"57","author":"Benjamini","year":"1995","journal-title":"J. R. Stat. Soc. B"},{"key":"2023020202133644100_btv124-B4","doi-asserted-by":"crossref","first-page":"e17820","DOI":"10.1371\/journal.pone.0017820","article-title":"Evaluating gene expression in C57BL\/6J and DBA\/2J mouse striatum using RNA-seq and microarrays","volume":"6","author":"Bottomly","year":"2011","journal-title":"PLoS One"},{"key":"2023020202133644100_btv124-B5","doi-asserted-by":"crossref","first-page":"94","DOI":"10.1186\/1471-2105-11-94","article-title":"Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments","volume":"11","author":"Bullard","year":"2010","journal-title":"BMC Bioinformatics"},{"key":"2023020202133644100_btv124-B6","doi-asserted-by":"crossref","first-page":"e576","DOI":"10.7717\/peerj.576","article-title":"Error estimates for the analysis of differential expression from RNA-seq count data","volume":"2","author":"Burden","year":"2014","journal-title":"PeerJ"},{"key":"2023020202133644100_btv124-B7","doi-asserted-by":"crossref","first-page":"671","DOI":"10.1093\/bib\/bbs046","article-title":"A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis","volume":"14","author":"Dillies","year":"2013","journal-title":"Brief. 
Bioinform"},{"key":"2023020202133644100_btv124-B8","doi-asserted-by":"crossref","first-page":"e1000098","DOI":"10.1371\/journal.pgen.1000098","article-title":"Evaluating statistical methods using plasmode data sets in the age of massive public databases: an illustration using false discovery rates","volume":"4","author":"Gadbury","year":"2008","journal-title":"PLoS Genet."},{"key":"2023020202133644100_btv124-B9","doi-asserted-by":"crossref","first-page":"10073","DOI":"10.1093\/nar\/gks666","article-title":"Modelling and simulating generic RNA-Seq experiments with the flux simulator","volume":"40","author":"Griebel","year":"2012","journal-title":"Nucleic Acids Res."},{"key":"2023020202133644100_btv124-B10","doi-asserted-by":"crossref","first-page":"R29","DOI":"10.1186\/gb-2014-15-2-r29","article-title":"Voom: precision weights unlock linear model analysis tools for RNA-seq read counts","volume":"15","author":"Law","year":"2014","journal-title":"Genome Biol."},{"key":"2023020202133644100_btv124-B11","doi-asserted-by":"crossref","first-page":"323","DOI":"10.1186\/1471-2105-12-323","article-title":"RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome","volume":"12","author":"Li","year":"2011","journal-title":"BMC Bioinformatics"},{"key":"2023020202133644100_btv124-B12","doi-asserted-by":"crossref","first-page":"519","DOI":"10.1177\/0962280211428386","article-title":"Finding consistent patterns: a nonparametric approach for identifying differential expression in RNA-Seq data","volume":"22","author":"Li","year":"2013","journal-title":"Stat. Methods Med. Res."},{"key":"2023020202133644100_btv124-B13","doi-asserted-by":"crossref","first-page":"1444","DOI":"10.1198\/jasa.2010.tm10195","article-title":"A hidden Markov model approach to testing multiple hypotheses on a tree-transformed gene ontology graph","volume":"105","author":"Liang","year":"2010","journal-title":"J. Am. Stat. 
Assoc."},{"key":"2023020202133644100_btv124-B14","doi-asserted-by":"crossref","first-page":"550","DOI":"10.1186\/s13059-014-0550-8","article-title":"Moderated estimation of fold change and dispersion for RNA-Seq data with DESeq2","volume":"15","author":"Love","year":"2014","journal-title":"Genome Biol."},{"key":"2023020202133644100_btv124-B15","doi-asserted-by":"crossref","first-page":"8","DOI":"10.1515\/1544-6115.1826","article-title":"Detecting differential expression in RNA-sequence data using quasi-likelihood with shrunken dispersion estimates","volume":"11","author":"Lund","year":"2012","journal-title":"Stat. Appl. Genet. Mol. Biol."},{"key":"2023020202133644100_btv124-B16","doi-asserted-by":"crossref","first-page":"4288","DOI":"10.1093\/nar\/gks042","article-title":"Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation","volume":"40","author":"McCarthy","year":"2012","journal-title":"Nucleic Acids Res."},{"key":"2023020202133644100_btv124-B17","doi-asserted-by":"crossref","first-page":"192","DOI":"10.1093\/bioinformatics\/btm583","article-title":"Identification of differentially expressed gene categories in microarray studies using nonparametric multivariate analysis","volume":"24","author":"Nettleton","year":"2008","journal-title":"Bioinformatics"},{"key":"2023020202133644100_btv124-B18","doi-asserted-by":"crossref","first-page":"178","DOI":"10.3389\/fgene.2013.00178","article-title":"Evaluating statistical analysis models for RNA sequencing experiments","volume":"4","author":"Reeb","year":"2013","journal-title":"Front. 
Genet."},{"key":"2023020202133644100_btv124-B19","doi-asserted-by":"crossref","first-page":"3424","DOI":"10.1093\/bioinformatics\/btu552","article-title":"subSeq: Determining appropriate sequencing depth through efficient read subsampling","volume":"30","author":"Robinson","year":"2014","journal-title":"Bioinformatics"},{"key":"2023020202133644100_btv124-B20","doi-asserted-by":"crossref","first-page":"R25","DOI":"10.1186\/gb-2010-11-3-r25","article-title":"A scaling normalization method for differential expression analysis of RNA-seq data","volume":"11","author":"Robinson","year":"2010","journal-title":"Genome Biol."},{"key":"2023020202133644100_btv124-B21","doi-asserted-by":"crossref","first-page":"2881","DOI":"10.1093\/bioinformatics\/btm453","article-title":"Moderated statistical tests for assessing differences in tag abundance","volume":"23","author":"Robinson","year":"2007","journal-title":"Bioinformatics"},{"key":"2023020202133644100_btv124-B22","doi-asserted-by":"crossref","first-page":"321","DOI":"10.1093\/biostatistics\/kxm030","article-title":"Small-sample estimation of negative binomial dispersion, with applications to SAGE data","volume":"9","author":"Robinson","year":"2008","journal-title":"Biostatistics"},{"key":"2023020202133644100_btv124-B23","doi-asserted-by":"crossref","first-page":"139","DOI":"10.1093\/bioinformatics\/btp616","article-title":"edgeR: a Bioconductor package for differential expression analysis of digital gene expression data","volume":"26","author":"Robinson","year":"2010","journal-title":"Bioinformatics"},{"key":"2023020202133644100_btv124-B24","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1186\/1471-2105-14-91","article-title":"A comparison of methods for differential expression analysis of RNA-seq data","volume":"14","author":"Soneson","year":"2013","journal-title":"BMC 
Bioinformatics"},{"key":"2023020202133644100_btv124-B25","doi-asserted-by":"crossref","first-page":"479","DOI":"10.1111\/1467-9868.00346","article-title":"A direct approach to false discovery rates","volume":"64","author":"Storey","year":"2002","journal-title":"J. R. Stat. Soc. B"},{"key":"2023020202133644100_btv124-B26","doi-asserted-by":"crossref","first-page":"1461","DOI":"10.1093\/bioinformatics\/btn209","article-title":"fdrtool: a versatile R package for estimating local and tail area-based false discovery rates","volume":"24","author":"Strimmer","year":"2008","journal-title":"Bioinformatics"},{"key":"2023020202133644100_btv124-B27","doi-asserted-by":"crossref","first-page":"303","DOI":"10.1186\/1471-2105-9-303","article-title":"A unified approach to false discovery rate estimation","volume":"9","author":"Strimmer","year":"2008","journal-title":"BMC Bioinformatics"},{"key":"2023020202133644100_btv124-B28","doi-asserted-by":"crossref","first-page":"43","DOI":"10.1038\/nature12222","article-title":"Comprehensive molecular characterization of clear cell renal cell carcinoma","volume":"499","author":"The Cancer Genome Atlas Research 
Network","year":"2013","journal-title":"Nature"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/31\/13\/2131\/49034504\/bioinformatics_31_13_2131.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/31\/13\/2131\/49034504\/bioinformatics_31_13_2131.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,6,7]],"date-time":"2024-06-07T14:07:58Z","timestamp":1717769278000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/31\/13\/2131\/196386"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2015,2,26]]},"references-count":28,"journal-issue":{"issue":"13","published-print":{"date-parts":[[2015,7,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btv124","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2015,7,1]]},"published":{"date-parts":[[2015,2,26]]}}}