{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,28]],"date-time":"2026-01-28T11:13:38Z","timestamp":1769598818786,"version":"3.49.0"},"reference-count":35,"publisher":"Springer Science and Business Media LLC","issue":"1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2013,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Background<\/jats:title><jats:p>High-throughput RNA sequencing (RNA-seq) offers unprecedented power to capture the real dynamics of gene expression. Experimental designs with extensive biological replication present a unique opportunity to exploit this feature and distinguish expression profiles with higher resolution. RNA-seq data analysis methods so far have been mostly applied to data sets with few replicates and their default settings try to provide the best performance under this constraint. These methods are based on two well-known count data distributions: the Poisson and the negative binomial. The way to properly calibrate them with large RNA-seq data sets is not trivial for the non-expert bioinformatics user.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>Here we show that expression profiles produced by extensively-replicated RNA-seq experiments lead to a rich diversity of count data distributions beyond the Poisson and the negative binomial, such as Poisson-Inverse Gaussian or P\u00f3lya-Aeppli, which can be captured by a more general family of count data distributions called the Poisson-Tweedie. The flexibility of the Poisson-Tweedie family enables a direct fitting of emerging features of large expression profiles, such as heavy-tails or zero-inflation, without the need to alter a single configuration parameter. 
We provide an R software package implementing a new test for differential expression based on the Poisson-Tweedie family. Using simulations on synthetic and real RNA-seq data, we show that this test yields <jats:italic>P<\/jats:italic>-values that are as accurate as, or more accurate than, those of competing methods under different configuration parameters. By surveying the tiny fraction of sex-specific gene expression changes in human lymphoblastoid cell lines, we also show that the method accurately detects differentially expressed genes in a large real RNA-seq data set, with improved performance and reproducibility over the previously compared methodologies. Finally, we compared the results with those obtained from microarrays in order to check for reproducibility.<\/jats:p><\/jats:sec><jats:sec><jats:title>Conclusions<\/jats:title><jats:p>RNA-seq data with many replicates lead to a handful of count data distributions, which can be accurately estimated with the statistical model illustrated in this paper. This method provides a better fit to the underlying biological variability, which may be critical when comparing groups of RNA-seq samples with markedly different count data distributions. 
The package forms part of the Bioconductor project and is available for download at <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" xlink:href=\"http:\/\/www.bioconductor.org\" ext-link-type=\"uri\">http:\/\/www.bioconductor.org<\/jats:ext-link>.<\/jats:p><\/jats:sec>","DOI":"10.1186\/1471-2105-14-254","type":"journal-article","created":{"date-parts":[[2013,8,21]],"date-time":"2013-08-21T10:15:01Z","timestamp":1377080101000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":52,"title":["A flexible count data model to fit the wide diversity of expression profiles arising from extensively replicated RNA-seq experiments"],"prefix":"10.1186","volume":"14","author":[{"given":"Mikel","family":"Esnaola","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Pedro","family":"Puig","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"David","family":"Gonzalez","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Robert","family":"Castelo","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Juan R","family":"Gonzalez","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2013,8,21]]},"reference":[{"key":"6060_CR1","doi-asserted-by":"publisher","first-page":"621","DOI":"10.1038\/nmeth.1226","volume":"5","author":"A Mortazavi","year":"2008","unstructured":"Mortazavi A, Williams B, McCue K, Schaeffer L, Wold B: Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008, 5: 621-628. 
10.1038\/nmeth.1226.","journal-title":"Nat Methods"},{"key":"6060_CR2","doi-asserted-by":"publisher","first-page":"R25","DOI":"10.1186\/gb-2010-11-3-r25","volume":"11","author":"M Robinson","year":"2010","unstructured":"Robinson M, Oshlack A: A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol. 2010, 11: R25-10.1186\/gb-2010-11-3-r25.","journal-title":"Genome Biol"},{"key":"6060_CR3","doi-asserted-by":"publisher","first-page":"480","DOI":"10.1186\/1471-2105-12-480","volume":"12","author":"D Risso","year":"2011","unstructured":"Risso D, Schwartz K, Sherlock G, Dudoit S: GC-content normalization for RNA-Seq data. BMC Bioinformatics. 2011, 12: 480-10.1186\/1471-2105-12-480.","journal-title":"BMC Bioinformatics"},{"issue":"2","key":"6060_CR4","doi-asserted-by":"publisher","first-page":"204","DOI":"10.1093\/biostatistics\/kxr054","volume":"13","author":"KD Hansen","year":"2012","unstructured":"Hansen KD, Irizarry RA, Wu Z: Removing technical variability in RNA-seq data using conditional quantile normalization. Biostatistics. 2012, 13 (2): 204-216. 10.1093\/biostatistics\/kxr054.","journal-title":"Biostatistics"},{"key":"6060_CR5","doi-asserted-by":"publisher","first-page":"1509","DOI":"10.1101\/gr.079558.108","volume":"18","author":"J Marioni","year":"2008","unstructured":"Marioni J, Mason C, Mane S, Stephens M, Gilad Y: RNA-seq: An assessment of technical reproducibility and comparison with gene expression arrays. Genome Res. 2008, 18: 1509-1517. 10.1101\/gr.079558.108.","journal-title":"Genome Res"},{"issue":"2","key":"6060_CR6","doi-asserted-by":"publisher","first-page":"321","DOI":"10.1093\/biostatistics\/kxm030","volume":"9","author":"MD Robinson","year":"2008","unstructured":"Robinson MD, Smyth GK: Small-sample estimation of negative binomial dispersion, with applications to SAGE data. Biostatistics. 
2008, 9 (2): 321-332.","journal-title":"Biostatistics"},{"issue":"10","key":"6060_CR7","doi-asserted-by":"publisher","first-page":"R106","DOI":"10.1186\/gb-2010-11-10-r106","volume":"11","author":"S Anders","year":"2010","unstructured":"Anders S, Huber W: Differential expression analysis for sequence count data. Genome Biol. 2010, 11 (10): R106-10.1186\/gb-2010-11-10-r106.","journal-title":"Genome Biol"},{"key":"6060_CR8","volume-title":"Detecting differential expression in RNA-sequence data using quasi-likelihood with shrunken dispersion estimates","author":"SP Lund","year":"2012","unstructured":"Lund SP, Nettleton D, McCarthy DJ, Smyth GK: Detecting differential expression in RNA-sequence data using quasi-likelihood with shrunken dispersion estimates. Stat Appl Genet Mol Biol. 2012, 11 (5): doi:10.1515\/1544-6115.1826."},{"key":"6060_CR9","doi-asserted-by":"publisher","first-page":"422","DOI":"10.1186\/1471-2105-11-422","volume":"11","author":"TJ Hardcastle","year":"2010","unstructured":"Hardcastle TJ, Kelly KA: baySeq: empirical Bayesian methods for identifying differential expression in sequence count data. BMC Bioinformatics. 2010, 11: 422-10.1186\/1471-2105-11-422.","journal-title":"BMC Bioinformatics"},{"issue":"10","key":"6060_CR10","doi-asserted-by":"publisher","first-page":"4288","DOI":"10.1093\/nar\/gks042","volume":"40","author":"DJ McCarthy","year":"2012","unstructured":"McCarthy DJ, Chen Y, Smyth GK: Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation. Nucleic Acids Res. 2012, 40 (10): 4288-4297. 10.1093\/nar\/gks042.","journal-title":"Nucleic Acids Res"},{"key":"6060_CR11","volume-title":"Biostatistics","author":"H Wu","year":"2012","unstructured":"Wu H, Wang C, Wu Z: A new shrinkage estimator for dispersion improves differential expression detection in RNA-seq data. Biostatistics. 
2012, doi:10.1093\/biostatistics\/kxs033."},{"key":"6060_CR12","doi-asserted-by":"publisher","first-page":"768","DOI":"10.1038\/nature08872","volume":"464","author":"J Pickrell","year":"2010","unstructured":"Pickrell J, Marioni J, Pai A, Degner J, Engelhardt B, Nkadori E, Veyrieras J, Stephens M, Gilad Y, Pritchard J: Understanding mechanisms underlying human gene expression variation with RNA sequencing. Nature. 2010, 464: 768-772. 10.1038\/nature08872.","journal-title":"Nature"},{"issue":"10","key":"6060_CR13","doi-asserted-by":"publisher","first-page":"R80","DOI":"10.1186\/gb-2004-5-10-r80","volume":"5","author":"RC Gentleman","year":"2004","unstructured":"Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J: Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 2004, 5 (10): R80-10.1186\/gb-2004-5-10-r80.","journal-title":"Genome Biol"},{"key":"6060_CR14","volume-title":"Biostatistics","author":"MA Van De Wiel","year":"2012","unstructured":"Van De Wiel MA, Leday GGR, Pardo L, Rue H, Van Der Vaart AW, Van Wieringen WN: Bayesian analysis of RNA sequencing data by estimating multiple shrinkage priors. Biostatistics. 2012, doi:10.1093\/biostatistics\/kxs031."},{"key":"6060_CR15","doi-asserted-by":"publisher","first-page":"572","DOI":"10.1038\/nbt.1910","volume":"29","author":"K Hansen","year":"2011","unstructured":"Hansen K, Wu Z, Irizarry R, Leek J: Sequencing technology does not eliminate biological variability. Nat Biotech. 2011, 29: 572-573. 10.1038\/nbt.1910.","journal-title":"Nat Biotech"},{"key":"6060_CR16","volume-title":"The Theory of Dispersion Models","author":"B Jorgensen","year":"1997","unstructured":"Jorgensen B: The Theory of Dispersion Models. 
1997, New York: Chapman and Hall"},{"issue":"2","key":"6060_CR17","first-page":"201","volume":"28","author":"C Kokonendji","year":"2004","unstructured":"Kokonendji C, Dossou-Gb\u00e9t\u00e9 S, Dem\u00e9trio C: Some discrete exponential dispersion models: Poisson-Tweedie and Hinde-Dem\u00e9trio classes. SORT. 2004, 28 (2): 201-214.","journal-title":"SORT"},{"key":"6060_CR18","doi-asserted-by":"publisher","first-page":"D1011","DOI":"10.1093\/nar\/gkq1259","volume":"39","author":"M McCall","year":"2011","unstructured":"McCall M, Uppal K, Jaffee H, Zilliox MJ, Irizarry R: The Gene Expression Barcode: leveraging public data repositories to begin cataloging the human and murine transcriptomes. Nucleic Acids Res. 2011, 39: D1011-D1015. 10.1093\/nar\/gkq1259.","journal-title":"Nucleic Acids Res"},{"issue":"7","key":"6060_CR19","doi-asserted-by":"publisher","first-page":"362","DOI":"10.1016\/S0168-9525(03)00140-9","volume":"19","author":"E Eisenberg","year":"2003","unstructured":"Eisenberg E, Levanon EY: Human housekeeping genes are compact. Trends Genet. 2003, 19 (7): 362-365. 10.1016\/S0168-9525(03)00140-9.","journal-title":"Trends Genet"},{"issue":"9","key":"6060_CR20","doi-asserted-by":"publisher","first-page":"1724","DOI":"10.1371\/journal.pgen.0030161","volume":"3","author":"JT Leek","year":"2007","unstructured":"Leek JT, Storey JD: Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS Genet. 2007, 3 (9): 1724-1735.","journal-title":"PLoS Genet"},{"issue":"16","key":"6060_CR21","doi-asserted-by":"publisher","first-page":"9440","DOI":"10.1073\/pnas.1530509100","volume":"100","author":"JD Storey","year":"2003","unstructured":"Storey JD, Tibshirani R: Statistical significance for genomewide studies. Proc Natl Acad Sci U S A. 2003, 100 (16): 9440-9445. 
10.1073\/pnas.1530509100.","journal-title":"Proc Natl Acad Sci U S A"},{"key":"6060_CR22","doi-asserted-by":"publisher","first-page":"400","DOI":"10.1038\/nature03479","volume":"434","author":"L Carrel","year":"2005","unstructured":"Carrel L, Willard HF: X-inactivation profile reveals extensive variability in X-linked gene expression in females. Nature. 2005, 434: 400-404. 10.1038\/nature03479.","journal-title":"Nature"},{"key":"6060_CR23","doi-asserted-by":"publisher","first-page":"825","DOI":"10.1038\/nature01722","volume":"423","author":"H Skaletsky","year":"2003","unstructured":"Skaletsky H, Kuroda-Kawaguchi T, Minx P, Cordum H, Hillier L, Brown L, Repping S, Pyntikova T, Ali J, Bieri T, Chinwalla A, Delehaunty A, Delehaunty K, Du H, Fewell G, Fulton L, Fulton R, Graves T, Hou SF, Latrielle P, Leonard S, Mardis E, Maupin R, McPherson J, Miner T, Nash W, Nguyen C, Ozersky P, Pepin K, Rock S, Rohlfing T, Scott K, Schultz B, Strong C, Tin-Wollam A, Yang SP, Waterston R, Wilson R, Rozen S, Page D: The male-specific region of the human Y chromosome is a mosaic of discrete sequence classes. Nature. 2003, 423: 825-837. 10.1038\/nature01722.","journal-title":"Nature"},{"issue":"23","key":"6060_CR24","doi-asserted-by":"publisher","first-page":"9758","DOI":"10.1073\/pnas.0703736104","volume":"104","author":"RS Huang","year":"2007","unstructured":"Huang RS, Duan S, Bleibel WK, Kistner EO, Zhang W, Clark TA, Chen TX, Schweitzer AC, Blume JE, Cox NJ, Dolan ME: A genome-wide approach to identify genetic variants that contribute to etoposide-induced cytotoxicity. Proc Natl Acad Sci U S A. 2007, 104 (23): 9758-9763. 10.1073\/pnas.0703736104.","journal-title":"Proc Natl Acad Sci U S A"},{"key":"6060_CR25","volume-title":"Stat Appl Genet Mol Biol","author":"GK Smyth","year":"2004","unstructured":"Smyth GK: Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol. 
2004, 3: doi:10.2202\/1544-6115.1027."},{"key":"6060_CR26","doi-asserted-by":"publisher","first-page":"47","DOI":"10.1038\/ng1705","volume":"38","author":"DK Nguyen","year":"2006","unstructured":"Nguyen DK, Disteche CM: Dosage compensation of the active X chromosome in mammals. Nat Genet. 2006, 38: 47-53. 10.1038\/ng1705.","journal-title":"Nat Genet"},{"issue":"5","key":"6060_CR27","doi-asserted-by":"publisher","first-page":"614","DOI":"10.1093\/bioinformatics\/btt016","volume":"29","author":"DG Knowles","year":"2013","unstructured":"Knowles DG, R\u00f6der M, Merkel A, Guig\u00f3 R: Grape RNA-Seq analysis pipeline environment. Bioinformatics. 2013, 29 (5): 614-621. 10.1093\/bioinformatics\/btt016.","journal-title":"Bioinformatics"},{"issue":"12","key":"6060_CR28","doi-asserted-by":"publisher","first-page":"1185","DOI":"10.1038\/nmeth.2221","volume":"9","author":"S Marco-Sola","year":"2012","unstructured":"Marco-Sola S, Sammeth M, Guig\u00f3 R, Ribeca P: The GEM mapper: fast, accurate and versatile alignment by filtration. Nat Methods. 2012, 9 (12): 1185-1188. 10.1038\/nmeth.2221.","journal-title":"Nat Methods"},{"issue":"Suppl 1","key":"6060_CR29","doi-asserted-by":"publisher","first-page":"S4.1","DOI":"10.1186\/gb-2006-7-s1-s4","volume":"7","author":"J Harrow","year":"2006","unstructured":"Harrow J, Denoeud F, Frankish A, Reymond A, Chen CK, Chrast J, Lagarde J, Gilbert JGR, Storey R, Swarbreck D, Rossier C, Ucla C, Hubbard T, Antonarakis SE, Guigo R: Genome Biol. 2006, 7 (Suppl 1): S4.1-S4.9.","journal-title":"Genome Biol"},{"key":"6060_CR30","doi-asserted-by":"publisher","first-page":"139","DOI":"10.1093\/bioinformatics\/btp616","volume":"26","author":"MD Robinson","year":"2010","unstructured":"Robinson MD, McCarthy DJ, Smyth GK: edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010, 26: 139-140. 
10.1093\/bioinformatics\/btp616.","journal-title":"Bioinformatics"},{"key":"6060_CR31","doi-asserted-by":"publisher","first-page":"1225","DOI":"10.2307\/2533492","volume":"53","author":"P Hougaard","year":"1997","unstructured":"Hougaard P, Lee ML, Whitmore G: Analysis of overdispersed count data by mixtures of Poisson variables and Poisson processes. Biometrics. 1997, 53: 1225-1238. 10.2307\/2533492.","journal-title":"Biometrics"},{"key":"6060_CR32","doi-asserted-by":"publisher","first-page":"287","DOI":"10.1016\/S0167-9473(02)00301-8","volume":"45","author":"R Gupta","year":"2004","unstructured":"Gupta R, Ong S: A new generalization of the negative binomial distribution. Compu Stat Data An. 2004, 45: 287-300. 10.1016\/S0167-9473(02)00301-8.","journal-title":"Compu Stat Data An"},{"key":"6060_CR33","doi-asserted-by":"publisher","first-page":"332","DOI":"10.1198\/016214505000000718","volume":"101","author":"P Puig","year":"2006","unstructured":"Puig P, Valero J: Count Data Distributions: Some Characterizations With Applications. J Am Stat Assoc. 2006, 101: 332-340. 10.1198\/016214505000000718.","journal-title":"J Am Stat Assoc"},{"key":"6060_CR34","doi-asserted-by":"publisher","first-page":"152","DOI":"10.1002\/env.1036","volume":"22","author":"A El-Shaarawi","year":"2011","unstructured":"El-Shaarawi A, Zhu R, Joe H: Modelling species abundance using the Poisson-Tweedie family. Environmetrics. 2011, 22: 152-164. 10.1002\/env.1036.","journal-title":"Environmetrics"},{"key":"6060_CR35","doi-asserted-by":"crossref","first-page":"289","DOI":"10.1111\/j.2517-6161.1995.tb02031.x","volume":"57","author":"Y Benjamini","year":"1995","unstructured":"Benjamini Y, Hochberg Y: Controlling the false discovery rate: A practical and powerful approach to multiple testing. J R Stat Soc B. 
1995, 57: 289-300.","journal-title":"J R Stat Soc B"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-14-254.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,5,16]],"date-time":"2024-05-16T23:10:31Z","timestamp":1715901031000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-14-254"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,8,21]]},"references-count":35,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2013,12]]}},"alternative-id":["6060"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-14-254","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2013,8,21]]},"assertion":[{"value":"1 June 2013","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"14 August 2013","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"21 August 2013","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"254"}}