{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,21]],"date-time":"2026-03-21T19:35:00Z","timestamp":1774121700701,"version":"3.50.1"},"reference-count":28,"publisher":"Springer Science and Business Media LLC","issue":"S13","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2012,8]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Background<\/jats:title><jats:p>The cost of DNA sequencing has undergone a dramatic reduction in the past decade. As a result, sequencing technologies have been increasingly applied to genomic research. RNA-Seq is becoming a common technique for surveying gene expression based on DNA sequencing. As it is not clear how increased sequencing capacity has affected measurement accuracy of mRNA, we sought to investigate that relationship.<\/jats:p><\/jats:sec><jats:sec><jats:title>Result<\/jats:title><jats:p>We empirically evaluate the accuracy of repeated gene expression measurements using RNA-Seq. We identify library preparation steps prior to DNA sequencing as the main source of error in this process. Studying three datasets, we show that the accuracy indeed improves with the sequencing depth. However, the rate of improvement as a function of sequence reads is generally slower than predicted by the binomial distribution. We therefore used the beta-binomial distribution to model the overdispersion. The overdispersion parameters we introduced depend explicitly on the number of reads so that the resulting statistical uncertainty is consistent with the empirical data that measurement accuracy increases with the sequencing depth. The overdispersion parameters were determined by maximizing the likelihood. 
We showed that our modified beta-binomial model had a lower false discovery rate than the binomial or the pure beta-binomial models.<\/jats:p><\/jats:sec><jats:sec><jats:title>Conclusion<\/jats:title><jats:p>We proposed a novel form of overdispersion guaranteeing that the accuracy improves with sequencing depth. We demonstrated that the new form provides a better fit to the data.<\/jats:p><\/jats:sec>","DOI":"10.1186\/1471-2105-13-s13-s5","type":"journal-article","created":{"date-parts":[[2012,8,24]],"date-time":"2012-08-24T10:14:53Z","timestamp":1345803293000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":26,"title":["Accuracy of RNA-Seq and its dependence on sequencing depth"],"prefix":"10.1186","volume":"13","author":[{"given":"Guoshuai","family":"Cai","sequence":"first","affiliation":[]},{"given":"Hua","family":"Li","sequence":"additional","affiliation":[]},{"given":"Yue","family":"Lu","sequence":"additional","affiliation":[]},{"given":"Xuelin","family":"Huang","sequence":"additional","affiliation":[]},{"given":"Juhee","family":"Lee","sequence":"additional","affiliation":[]},{"given":"Peter","family":"M\u00fcller","sequence":"additional","affiliation":[]},{"given":"Yuan","family":"Ji","sequence":"additional","affiliation":[]},{"given":"Shoudan","family":"Liang","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2012,8,24]]},"reference":[{"key":"5290_CR1","doi-asserted-by":"publisher","first-page":"53","DOI":"10.1038\/nature07517","volume":"456","author":"DR Bentley","year":"2008","unstructured":"Bentley DR, Balasubramanian S, Swerdlow HP, Smith GP, Milton J, Brown CG, Hall KP, Evers DJ, Barnes CL, Bignell HR: Accurate whole human genome sequencing using reversible terminator chemistry. Nature 2008, 456: 53\u201359. 
10.1038\/nature07517","journal-title":"Nature"},{"key":"5290_CR2","doi-asserted-by":"publisher","first-page":"621","DOI":"10.1038\/nmeth.1226","volume":"5","author":"A Mortazavi","year":"2008","unstructured":"Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B: Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 2008, 5: 621\u2013628. 10.1038\/nmeth.1226","journal-title":"Nat Methods"},{"key":"5290_CR3","doi-asserted-by":"publisher","first-page":"1239","DOI":"10.1038\/nature07002","volume":"453","author":"BT Wilhelm","year":"2008","unstructured":"Wilhelm BT, Marguerat S, Watt S, Schubert F, Wood V, Goodhead I, Penkett CJ, Rogers J, Bahler J: Dynamic repertoire of a eukaryotic transcriptome surveyed at single-nucleotide resolution. Nature 2008, 453: 1239\u20131243. 10.1038\/nature07002","journal-title":"Nature"},{"key":"5290_CR4","doi-asserted-by":"publisher","first-page":"1105","DOI":"10.1093\/bioinformatics\/btp120","volume":"25","author":"C Trapnell","year":"2009","unstructured":"Trapnell C, Pachter L, Salzberg SL: TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 2009, 25: 1105\u20131111. 10.1093\/bioinformatics\/btp120","journal-title":"Bioinformatics"},{"key":"5290_CR5","doi-asserted-by":"publisher","first-page":"511","DOI":"10.1038\/nbt.1621","volume":"28","author":"C Trapnell","year":"2010","unstructured":"Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van BM, Salzberg SL, Wold BJ, Pachter L: Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol 2010, 28: 511\u2013515. 10.1038\/nbt.1621","journal-title":"Nat Biotechnol"},{"key":"5290_CR6","doi-asserted-by":"publisher","first-page":"53","DOI":"10.1126\/science.1207018","volume":"333","author":"M Li","year":"2011","unstructured":"Li M, Wang IX, Li Y, Bruzel A, Richards AL, Toung JM, Cheung VG: Widespread RNA and DNA sequence differences in the human transcriptome. 
Science 2011, 333: 53\u201358. 10.1126\/science.1207018","journal-title":"Science"},{"key":"5290_CR7","doi-asserted-by":"publisher","first-page":"470","DOI":"10.1038\/nature07509","volume":"456","author":"ET Wang","year":"2008","unstructured":"Wang ET, Sandberg R, Luo S, Khrebtukova I, Zhang L, Mayr C, Kingsmore SF, Schroth GP, Burge CB: Alternative isoform regulation in human tissue transcriptomes. Nature 2008, 456: 470\u2013476. 10.1038\/nature07509","journal-title":"Nature"},{"key":"5290_CR8","doi-asserted-by":"publisher","first-page":"1009","DOI":"10.1038\/nmeth.1528","volume":"7","author":"Y Katz","year":"2010","unstructured":"Katz Y, Wang ET, Airoldi EM, Burge CB: Analysis and design of RNA sequencing experiments for identifying isoform regulation. Nat Methods 2010, 7: 1009\u20131015. 10.1038\/nmeth.1528","journal-title":"Nat Methods"},{"key":"5290_CR9","first-page":"195","volume":"25","author":"WJ Ansorge","year":"2009","unstructured":"Ansorge WJ: Next-generation DNA sequencing techniques. Nat Biotechnol 2009, 25: 195\u2013203.","journal-title":"Nat Biotechnol"},{"key":"5290_CR10","doi-asserted-by":"publisher","first-page":"141","DOI":"10.1093\/nar\/gkn705","volume":"36","author":"PA \u2018t Hoen","year":"2008","unstructured":"\u2018t Hoen PA, Ariyurek Y, Thygesen HH, Vreugdenhil E, Vossen RH, de Menezes RX, Boer JM, van Ommen GJ, den Dunnen JT: Deep sequencing-based expression analysis shows major advances in robustness, resolution and inter-lab portability over five microarray platforms. N Nucleic Acids Res 2008, 36: 141. 10.1093\/nar\/gkn705","journal-title":"N Nucleic Acids Res"},{"key":"5290_CR11","doi-asserted-by":"publisher","first-page":"244","DOI":"10.1038\/418244a","volume":"418","author":"GJ Hannon","year":"2002","unstructured":"Hannon GJ: RNA interference. Nature 2002, 418: 244\u2013251. 
10.1038\/418244a","journal-title":"Nature"},{"key":"5290_CR12","doi-asserted-by":"publisher","first-page":"381","DOI":"10.1006\/meth.1998.0593","volume":"14","author":"B Sauer","year":"1998","unstructured":"Sauer B: Inducible gene targeting in mice using the Cre\/lox system. Methods 1998, 14: 381\u2013392. 10.1006\/meth.1998.0593","journal-title":"Methods"},{"key":"5290_CR13","doi-asserted-by":"publisher","first-page":"94","DOI":"10.1186\/1471-2105-11-94","volume":"11","author":"JH Bullard","year":"2010","unstructured":"Bullard JH, Purdom E, Hansen KD, Dudoit S: Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments. BMC Bioinformatics 2010, 11: 94. 10.1186\/1471-2105-11-94","journal-title":"BMC Bioinformatics"},{"key":"5290_CR14","first-page":"257","volume":"10","author":"JG Skellam","year":"1948","unstructured":"Skellam JG: A probability distribution derived from the binomial distribution by regarding the probability of success as variable between the sets of trials. Methodol 1948, 10: 257\u2013261.","journal-title":"Methodol"},{"key":"5290_CR15","unstructured":"Lee J, Mueller P, Liang S, Cai G, Ji Y: On Differential Gene Expression Using RNA-Seq Data. Cancer Informatics, in press."},{"key":"5290_CR16","doi-asserted-by":"publisher","first-page":"991","DOI":"10.1101\/gr.116335.110","volume":"21","author":"JM Toung","year":"2011","unstructured":"Toung JM, Morley M, Li M, Cheung VG: RNA-sequence analysis of human B-cells. Genome Res 2011, 21: 991\u2013998. 10.1101\/gr.116335.110","journal-title":"Genome Res"},{"key":"5290_CR17","doi-asserted-by":"publisher","first-page":"136","DOI":"10.1093\/bioinformatics\/btp612","volume":"26","author":"L Wang","year":"2010","unstructured":"Wang L, Feng Z, Wang X, Zhang X: DEGseq: an R package for identifying differentially expressed genes from RNA-seq data. Bioinformatics 2010, 26: 136\u2013138. 
10.1093\/bioinformatics\/btp612","journal-title":"Bioinformatics"},{"key":"5290_CR18","doi-asserted-by":"publisher","first-page":"R50","DOI":"10.1186\/gb-2010-11-5-r50","volume":"11","author":"J Li","year":"2010","unstructured":"Li J, Jiang H, Wong WH: Modeling non-uniformity in short-read rates in RNA-Seq data. Genome Biol 2010, 11: R50. 10.1186\/gb-2010-11-5-r50","journal-title":"Genome Biol"},{"key":"5290_CR19","doi-asserted-by":"publisher","first-page":"e131","DOI":"10.1093\/nar\/gkq224","volume":"38","author":"KD Hansen","year":"2010","unstructured":"Hansen KD, Brenner SE, Dudoit S: Biases in Illumina transcriptome sequencing caused by random hexamer priming. Nucleic Acids Res 2010, 38: e131. 10.1093\/nar\/gkq224","journal-title":"Nucleic Acids Res"},{"key":"5290_CR20","doi-asserted-by":"publisher","first-page":"1477","DOI":"10.1093\/bioinformatics\/btg173","volume":"19","author":"KA Baggerly","year":"2003","unstructured":"Baggerly KA, Deng L, Morris JS, Aldaz CM: Differential expression in SAGE: accounting for normal between-library variation. Bioinformatics 2003, 19: 1477\u20131483. 10.1093\/bioinformatics\/btg173","journal-title":"Bioinformatics"},{"key":"5290_CR21","doi-asserted-by":"publisher","first-page":"363","DOI":"10.1093\/bioinformatics\/btp677","volume":"26","author":"T Pham","year":"2010","unstructured":"Pham T, Piersma SR, Warmoes M, Jimenez CR: On the beta-binomial model for analysis of spectral count data in label-free tandem mass spectrometry-based proteomics. Bioinformatics 2010, 26: 363\u2013369. 10.1093\/bioinformatics\/btp677","journal-title":"Bioinformatics"},{"key":"5290_CR22","doi-asserted-by":"publisher","first-page":"16320","DOI":"10.1073\/pnas.1002176107","volume":"107","author":"PM Chiang","year":"2010","unstructured":"Chiang PM, Ling J, Jeong YH, Price DL, Aja SM, Wong P: Deletion of TDP-43 down-regulates Tbc1d1, a gene linked to obesity, and alters body fat metabolism. Proc Natl Acad Sci U S A 2010, 107: 16320\u201316324. 
10.1073\/pnas.1002176107","journal-title":"Proc Natl Acad Sci U S A"},{"key":"5290_CR23","doi-asserted-by":"crossref","first-page":"289","DOI":"10.1111\/j.2517-6161.1995.tb02031.x","volume":"57","author":"Y Benjamini","year":"1995","unstructured":"Benjamini Y, Hochberg Y: Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Roy Statist Soc B 1995, 57: 289\u2013300.","journal-title":"J Roy Statist Soc B"},{"key":"5290_CR24","doi-asserted-by":"publisher","first-page":"861","DOI":"10.1016\/j.patrec.2005.10.010","volume":"27","author":"T Fawcett","year":"2006","unstructured":"Fawcett T: An introduction to ROC analysis. Pattern Recognition Letters 2006, 27: 861\u2013874. 10.1016\/j.patrec.2005.10.010","journal-title":"Pattern Recognition Letters"},{"key":"5290_CR25","doi-asserted-by":"publisher","first-page":"e105","DOI":"10.1093\/nar\/gkn425","volume":"36","author":"JC Dohm","year":"2008","unstructured":"Dohm JC, Lottaz C, Borodina T, Himmelbauer H: Substantial biases in ultra-short read data sets from high-throughput DNA sequencing. Nucleic Acids Res 2008, 36: e105. 10.1093\/nar\/gkn425","journal-title":"Nucleic Acids Res"},{"key":"5290_CR26","doi-asserted-by":"publisher","first-page":"1115","DOI":"10.1038\/nbt1236","volume":"24","author":"R Canales","year":"2006","unstructured":"Canales R, L Y, Willey J, Austermiller B, Barbacioru C, Boysen C, Hunkapiller K, Jensen R, Knight CR, Lee K, Ma Y, Maqsodi B, Papallo A, Peters E, Poulter K, Ruppel P, Samaha R, Shi L, Yang W, Zhang L, Goodsaid FM: Evaluation of DNA microarray results with quantitative gene expression platforms. Nat Biotechnol 2006, 24: 1115\u20131122. 
10.1038\/nbt1236","journal-title":"Nat Biotechnol"},{"key":"5290_CR27","unstructured":"OOMPA package[http:\/\/genome.ucsc.edu\/cgi-bin\/hgTrackUi?db=hg18%5C&g=wgEncodeCaltechRnaSeq]"},{"key":"5290_CR28","unstructured":"Wold\/Caltech lab[http:\/\/genome.ucsc.edu\/cgi-bin\/hgTrackUi?db=hg18%5C&g=wgEncodeCaltechRnaSeq]"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-13-S13-S5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,4,28]],"date-time":"2024-04-28T17:37:07Z","timestamp":1714325827000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-13-S13-S5"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,8]]},"references-count":28,"journal-issue":{"issue":"S13","published-print":{"date-parts":[[2012,8]]}},"alternative-id":["5290"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-13-s13-s5","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2012,8]]},"assertion":[{"value":"24 August 2012","order":1,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"S5"}}