{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,12]],"date-time":"2026-03-12T07:59:51Z","timestamp":1773302391726,"version":"3.50.1"},"reference-count":50,"publisher":"Oxford University Press (OUP)","issue":"10","license":[{"start":{"date-parts":[[2016,10,2]],"date-time":"2016-10-02T00:00:00Z","timestamp":1475366400000},"content-version":"vor","delay-in-days":1267,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/3.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2013,5,15]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: RNA-seq experiments produce digital counts of reads that are affected by both biological and technical variation. To distinguish the systematic changes in expression between conditions from noise, the counts are frequently modeled by the Negative Binomial distribution. However, in experiments with small sample size, the per-gene estimates of the dispersion parameter are unreliable.<\/jats:p><jats:p>Method: We propose a simple and effective approach for estimating the dispersions. First, we obtain the initial estimates for each gene using the method of moments. Second, the estimates are regularized, i.e. shrunk towards a common value that minimizes the average squared difference between the initial estimates and the shrinkage estimates. The approach does not require extra modeling assumptions, is easy to compute and is compatible with the exact test of differential expression.<\/jats:p><jats:p>Results: We evaluated the proposed approach using 10 simulated and experimental datasets and compared its performance with that of currently popular packages edgeR, DESeq, baySeq, BBSeq and SAMseq. For these datasets, sSeq performed favorably for experiments with small sample size in sensitivity, specificity and computational time.<\/jats:p><jats:p>Availability: \u00a0http:\/\/www.stat.purdue.edu\/\u223covitek\/Software.html and Bioconductor.<\/jats:p><jats:p>Contact: \u00a0ovitek@purdue.edu<\/jats:p><jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btt143","type":"journal-article","created":{"date-parts":[[2013,4,16]],"date-time":"2013-04-16T00:15:15Z","timestamp":1366071315000},"page":"1275-1282","source":"Crossref","is-referenced-by-count":119,"title":["Shrinkage estimation of dispersion in Negative Binomial models for RNA-seq experiments with small sample size"],"prefix":"10.1093","volume":"29","author":[{"given":"Danni","family":"Yu","sequence":"first","affiliation":[{"name":"1 Genome Biology Unit, European Molecular Biology Laboratory, Mayerhofstra\u00dfe 1, Heidelberg 69117, Germany 2Department of Statistics and 3Department of Computer Science, Purdue University, West Lafayette, IN 47907, USA"},{"name":"1 Genome Biology Unit, European Molecular Biology Laboratory, Mayerhofstra\u00dfe 1, Heidelberg 69117, Germany 2Department of Statistics and 3Department of Computer Science, Purdue University, West Lafayette, IN 47907, USA"}]},{"given":"Wolfgang","family":"Huber","sequence":"additional","affiliation":[{"name":"1 Genome Biology Unit, European Molecular Biology Laboratory, Mayerhofstra\u00dfe 1, Heidelberg 69117, Germany 2Department of Statistics and 3Department of Computer Science, Purdue University, West Lafayette, IN 47907, USA"}]},{"given":"Olga","family":"Vitek","sequence":"additional","affiliation":[{"name":"1 Genome Biology Unit, European Molecular Biology Laboratory, Mayerhofstra\u00dfe 1, Heidelberg 69117, Germany 2Department of Statistics and 3Department of Computer Science, Purdue University, West Lafayette, IN 47907, USA"},{"name":"1 Genome Biology Unit, European Molecular Biology Laboratory, Mayerhofstra\u00dfe 1, Heidelberg 69117, Germany 2Department of Statistics and 3Department of Computer Science, Purdue University, West Lafayette, IN 47907, USA"}]}],"member":"286","published-online":{"date-parts":[[2013,4,14]]},"reference":[{"key":"2023012810351829500_btt143-B1","doi-asserted-by":"crossref","first-page":"R106","DOI":"10.1186\/gb-2010-11-10-r106","article-title":"Differential expression analysis for sequence count data","volume":"11","author":"Anders","year":"2010","journal-title":"Genome Biol."},{"key":"2023012810351829500_btt143-B2","doi-asserted-by":"crossref","first-page":"328","DOI":"10.1186\/1471-2164-9-328","article-title":"Cross-platform comparison of SYBR Green real-time PCR with TaqMan PCR, microarrays and other gene expression measurement technologies evaluated in the MicroArray Quality Control (MAQC) study","volume":"9","author":"Arikawa","year":"2008","journal-title":"BMC Genomics"},{"key":"2023012810351829500_btt143-B3","doi-asserted-by":"crossref","first-page":"1","DOI":"10.2202\/1544-6115.1627","article-title":"A two-stage Poisson model for testing RNA-seq data","volume":"10","author":"Auer","year":"2011","journal-title":"Stat. Appl. Genet. Mol. Biol."},{"key":"2023012810351829500_btt143-B4","doi-asserted-by":"crossref","first-page":"289","DOI":"10.1111\/j.2517-6161.1995.tb02031.x","article-title":"Controlling the false discovery rate: a practical and powerful approach to multiple testing","volume":"57","author":"Benjamini","year":"1995","journal-title":"J. R Stat. Soc. B"},{"key":"2023012810351829500_btt143-B5","doi-asserted-by":"crossref","first-page":"e17820","DOI":"10.1371\/journal.pone.0017820","article-title":"Evaluating Gene Expression in C57BL\/6J and DBA\/2J mouse striatum using RNA-seq and microarrays","volume":"6","author":"Bottomly","year":"2011","journal-title":"PloS One"},{"key":"2023012810351829500_btt143-B6","doi-asserted-by":"crossref","first-page":"249","DOI":"10.2307\/2530767","article-title":"Extended moment series and the parameters of the negative binomial distribution","volume":"40","author":"Bowman","year":"1984","journal-title":"Biometrics"},{"key":"2023012810351829500_btt143-B7","doi-asserted-by":"crossref","first-page":"193","DOI":"10.1101\/gr.108662.110","article-title":"Conservation of an RNA regulatory map between Drosophila and mammals","volume":"21","author":"Brooks","year":"2011","journal-title":"Genome Res."},{"key":"2023012810351829500_btt143-B8","doi-asserted-by":"crossref","first-page":"94","DOI":"10.1186\/1471-2105-11-94","article-title":"Evaluation of statistical methods for normalization and differential expression in mRNA-seq experiments","volume":"11","author":"Bullard","year":"2010","journal-title":"BMC Bioinformatics"},{"key":"2023012810351829500_btt143-B9","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511814365","volume-title":"Regression Analysis of Count Data","author":"Cameron","year":"1998"},{"key":"2023012810351829500_btt143-B10","doi-asserted-by":"crossref","first-page":"309","DOI":"10.2307\/2532055","article-title":"Estimation of the negative binomial parameter \u03ba by maximum quasi-likelihood","volume":"45","author":"Clark","year":"1989","journal-title":"Biometrics"},{"key":"2023012810351829500_btt143-B11","volume-title":"NIST\/SEMATECH e-Handbook of Statistical Methods","author":"Croarkin","year":"2006"},{"key":"2023012810351829500_btt143-B12","doi-asserted-by":"crossref","first-page":"449","DOI":"10.1186\/1471-2105-12-449","article-title":"ReCount: a multi-experiment resource of analysis-ready RNA-seq gene count datasets","volume":"12","author":"Frazee","year":"2011","journal-title":"BMC Bioinformatics"},{"key":"2023012810351829500_btt143-B13","doi-asserted-by":"crossref","first-page":"469","DOI":"10.1038\/nmeth.1613","article-title":"Computational methods for transcriptome annotation and quantification using RNA-seq","volume":"8","author":"Garber","year":"2011","journal-title":"Nat. Methods"},{"key":"2023012810351829500_btt143-B14","doi-asserted-by":"crossref","first-page":"843","DOI":"10.1038\/nmeth.1503","article-title":"Alternative expression analysis by RNA sequencing","volume":"7","author":"Griffith","year":"2010","journal-title":"Nature Methods"},{"key":"2023012810351829500_btt143-B15","doi-asserted-by":"crossref","first-page":"847","DOI":"10.1101\/gr.101204.109","article-title":"mRNA-seq with agnostic splice site discovery for nervous system transcriptomics tested in chronic pain","volume":"20","author":"Hammer","year":"2010","journal-title":"Genome Res."},{"key":"2023012810351829500_btt143-B16","volume-title":"Generalized Shrinkage Estimators","author":"Hansen","year":"2008"},{"key":"2023012810351829500_btt143-B17","doi-asserted-by":"crossref","first-page":"422","DOI":"10.1186\/1471-2105-11-422","article-title":"BaySeq: Empirical Bayesian methods for identifying differential expression in sequence count data","volume":"11","author":"Hardcastle","year":"2010","journal-title":"BMC Bioinformatics"},{"key":"2023012810351829500_btt143-B18","first-page":"361","article-title":"Estimation with quadratic loss","volume-title":"Proceedings of the fourth Berkeley Symposium on Mathematical Statistics and Probability Held at the Statistical Laboratory, University of California, June 20-July 30, 1960","author":"James","year":"1961"},{"key":"2023012810351829500_btt143-B19","volume-title":"Theory of Point Estimation","author":"Lehmann","year":"1998"},{"key":"2023012810351829500_btt143-B20","doi-asserted-by":"crossref","first-page":"523","DOI":"10.1093\/biostatistics\/kxr031","article-title":"Normalization, testing, and false discovery rate estimation for RNA-sequencing data","volume":"13","author":"Li","year":"2011","journal-title":"Biostatistics"},{"key":"2023012810351829500_btt143-B21","article-title":"Finding consistent patterns: a nonparametric approach for identifying differential expression in RNA-seq data","author":"Li","year":"2011","journal-title":"Stat. Methods Med. Res."},{"key":"2023012810351829500_btt143-B22","doi-asserted-by":"crossref","first-page":"e180","DOI":"10.1371\/journal.pone.0000180","article-title":"Maximum likelihood estimation of the negative binomial dispersion parameter for highly overdispersed data, with applications to infectious diseases","volume":"2","author":"Lloyd-Smith","year":"2007","journal-title":"PLoS One"},{"key":"2023012810351829500_btt143-B23","doi-asserted-by":"crossref","first-page":"167","DOI":"10.1038\/nbt1186","article-title":"Statistical practice in high-throughput screening data analysis","volume":"24","author":"Malo","year":"2006","journal-title":"Nat. Biotechnol."},{"key":"2023012810351829500_btt143-B24","doi-asserted-by":"crossref","first-page":"387","DOI":"10.1146\/annurev.genom.9.081307.164359","article-title":"Next-generation DNA sequencing methods","volume":"9","author":"Mardis","year":"2008","journal-title":"Annu. Rev. Genomics Hum. Genet."},{"key":"2023012810351829500_btt143-B25","doi-asserted-by":"crossref","first-page":"1509","DOI":"10.1101\/gr.079558.108","article-title":"RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays","volume":"18","author":"Marioni","year":"2008","journal-title":"Genome Res."},{"key":"2023012810351829500_btt143-B26","doi-asserted-by":"crossref","first-page":"e1000655","DOI":"10.1371\/journal.pcbi.1000655","article-title":"How to understand the cell by breaking it: network analysis of gene perturbation screens","volume":"6","author":"Markowetz","year":"2010","journal-title":"PLoS Comput. Biol."},{"key":"2023012810351829500_btt143-B27","doi-asserted-by":"crossref","first-page":"4288","DOI":"10.1093\/nar\/gks042","article-title":"Differential expression analysis of multifactor RNA-seq experiments with respect to biological variation","volume":"40","author":"McCarthy","year":"2012","journal-title":"Nucleic Acids Res."},{"key":"2023012810351829500_btt143-B28","doi-asserted-by":"crossref","DOI":"10.1007\/978-1-4899-3242-6","volume-title":"Generalized Linear Models","author":"McCullagh","year":"1989"},{"key":"2023012810351829500_btt143-B29","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1038\/nrg2626","article-title":"Sequencing technologies: The next generation","volume":"11","author":"Metzker","year":"2009","journal-title":"Nat. Rev. Genetics"},{"key":"2023012810351829500_btt143-B30","doi-asserted-by":"crossref","first-page":"220","DOI":"10.1186\/gb-2010-11-12-220","article-title":"From RNA-seq reads to differential expression results","volume":"11","author":"Oshlack","year":"2010","journal-title":"Genome Biol."},{"key":"2023012810351829500_btt143-B31","doi-asserted-by":"crossref","first-page":"1140","DOI":"10.1038\/nbt1242","article-title":"Performance comparison of one-color and two-color platforms within the MicroArray Quality Control (MAQC) project","volume":"24","author":"Patterson","year":"2006","journal-title":"Nat. Biotechnol."},{"key":"2023012810351829500_btt143-B32","doi-asserted-by":"crossref","first-page":"S22","DOI":"10.1038\/nmeth.1371","article-title":"Computation for ChIP-seq and RNA-seq studies","volume":"6","author":"Pepke","year":"2009","journal-title":"Nat. Methods"},{"key":"2023012810351829500_btt143-B33","doi-asserted-by":"crossref","first-page":"863","DOI":"10.2307\/2532104","article-title":"Maximum likelihood estimation for the negative binomial dispersion parameter","volume":"46","author":"Piegorsch","year":"1990","journal-title":"Biometrics"},{"key":"2023012810351829500_btt143-B34","article-title":"An Introduction to James-Stein estimation","author":"Richards","year":"1999"},{"key":"2023012810351829500_btt143-B35","doi-asserted-by":"crossref","first-page":"139","DOI":"10.1093\/bioinformatics\/btp616","article-title":"EdgeR: a Bioconductor package for differential expression analysis of digital gene expression data","volume":"26","author":"Robinson","year":"2010","journal-title":"Bioinformatics"},{"key":"2023012810351829500_btt143-B36","doi-asserted-by":"crossref","first-page":"R25","DOI":"10.1186\/gb-2010-11-3-r25","article-title":"A scaling normalization method for differential expression analysis of RNA-seq data","volume":"11","author":"Robinson","year":"2010","journal-title":"Genome Biol."},{"key":"2023012810351829500_btt143-B37","doi-asserted-by":"crossref","first-page":"2881","DOI":"10.1093\/bioinformatics\/btm453","article-title":"Moderated statistical tests for assessing differences in tag abundance","volume":"23","author":"Robinson","year":"2007","journal-title":"Bioinformatics"},{"key":"2023012810351829500_btt143-B38","doi-asserted-by":"crossref","first-page":"1151","DOI":"10.1038\/nbt1239","article-title":"The MicroArray Quality Control (MAQC) project shows inter-and intraplatform reproducibility of gene expression measurements","volume":"24","author":"Shi","year":"2006","journal-title":"Nat. Biotechnol."},{"key":"2023012810351829500_btt143-B39","doi-asserted-by":"crossref","first-page":"3","DOI":"10.2202\/1544-6115.1027","article-title":"Linear models and empirical Bayes methods for assessing differential expression in microarray experiments","volume":"3","author":"Smyth","year":"2004","journal-title":"Stat. Appl. Genet. Mol. Biol."},{"key":"2023012810351829500_btt143-B40","doi-asserted-by":"crossref","first-page":"397","DOI":"10.1007\/0-387-29362-0_23","article-title":"Limma: Linear models for microarray data","volume-title":"Bioinformatics Computational Biology Solutions Using R and Bioconductor","author":"Smyth","year":"2005"},{"key":"2023012810351829500_btt143-B50","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1186\/1471-2105-14-91","article-title":"A comparison of methods for differential expression analysis of RNA-seq data","volume":"14","author":"Soneson","year":"2013","journal-title":"BMC Bioinformatics"},{"key":"2023012810351829500_btt143-B41","first-page":"197","article-title":"Inadmissibility of the usual estimator for the mean of a multivariate Normal distribution","volume-title":"Proceedings of the Third Berkeley symposium on mathematical statistics and probability","author":"Stein","year":"1956"},{"key":"2023012810351829500_btt143-B42","doi-asserted-by":"crossref","first-page":"956","DOI":"10.1126\/science.1160342","article-title":"A global view of gene activity and alternative splicing by deep sequencing of the human transcriptome","volume":"321","author":"Sultan","year":"2008","journal-title":"Science"},{"key":"2023012810351829500_btt143-B43","doi-asserted-by":"crossref","first-page":"90","DOI":"10.1016\/j.fm.2005.01.014","article-title":"The Gamma-Poisson model as a statistical method to determine if micro-organisms are randomly distributed in a food matrix","volume":"23","author":"Toft","year":"2006","journal-title":"Food Microbiol."},{"key":"2023012810351829500_btt143-B44","doi-asserted-by":"crossref","first-page":"e9317","DOI":"10.1371\/journal.pone.0009317","article-title":"Tumor transcriptome sequencing reveals allelic expression imbalances associated with copy number alterations","volume":"5","author":"Tuch","year":"2010","journal-title":"PloS One"},{"key":"2023012810351829500_btt143-B45","doi-asserted-by":"crossref","first-page":"136","DOI":"10.1093\/bioinformatics\/btp612","article-title":"DEGseq: an R package for identifying differentially expressed genes from RNA-seq data","volume":"26","author":"Wang","year":"2010","journal-title":"Bioinformatics"},{"key":"2023012810351829500_btt143-B46","doi-asserted-by":"crossref","first-page":"57","DOI":"10.1038\/nrg2484","article-title":"RNA-seq: a revolutionary tool for transcriptomics","volume":"10","author":"Wang","year":"2009","journal-title":"Nat. Rev. Genet."},{"key":"2023012810351829500_btt143-B47","doi-asserted-by":"crossref","first-page":"109","DOI":"10.2307\/2530749","article-title":"Multistage estimation compared with fixed-sample-size estimation of the negative binomial parameter k","volume":"40","author":"Willson","year":"1984","journal-title":"Biometrics"},{"key":"2023012810351829500_btt143-B48","doi-asserted-by":"crossref","first-page":"S10","DOI":"10.1186\/1471-2105-11-S6-S10","article-title":"Evaluation of gene expression data generated from expired Affymetrix GeneChip microarrays using MAQC reference RNA samples","volume":"11","author":"Zhining","year":"2010","journal-title":"BMC Bioinformatics"},{"key":"2023012810351829500_btt143-B49","doi-asserted-by":"crossref","first-page":"2672","DOI":"10.1093\/bioinformatics\/btr449","article-title":"A powerful and flexible approach to the analysis of RNA sequence count data","volume":"27","author":"Zhou","year":"2011","journal-title":"Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/29\/10\/1275\/48886853\/bioinformatics_29_10_1275.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/29\/10\/1275\/48886853\/bioinformatics_29_10_1275.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,5,8]],"date-time":"2024-05-08T23:22:21Z","timestamp":1715210541000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/29\/10\/1275\/259212"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,4,14]]},"references-count":50,"journal-issue":{"issue":"10","published-print":{"date-parts":[[2013,5,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btt143","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2013,5,15]]},"published":{"date-parts":[[2013,4,14]]}}}