{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,19]],"date-time":"2026-03-19T00:57:18Z","timestamp":1773881838472,"version":"3.50.1"},"reference-count":37,"publisher":"Springer Science and Business Media LLC","issue":"1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2011,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Background<\/jats:title><jats:p>Transcriptome sequencing (RNA-Seq) has become the assay of choice for high-throughput studies of gene expression. However, as is the case with microarrays, major technology-related artifacts and biases affect the resulting expression measures. Normalization is therefore essential to ensure accurate inference of expression levels and subsequent analyses thereof.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>We focus on biases related to GC-content and demonstrate the existence of strong sample-specific GC-content effects on RNA-Seq read counts, which can substantially bias differential expression analysis. We propose three simple within-lane gene-level GC-content normalization approaches and assess their performance on two different RNA-Seq datasets, involving different species and experimental designs. Our methods are compared to state-of-the-art normalization procedures in terms of bias and mean squared error for expression fold-change estimation and in terms of Type I error and <jats:italic>p<\/jats:italic>-value distributions for tests of differential expression. 
The exploratory data analysis and normalization methods proposed in this article are implemented in the open-source Bioconductor R package EDASeq.<\/jats:p><\/jats:sec><jats:sec><jats:title>Conclusions<\/jats:title><jats:p>Our within-lane normalization procedures, followed by between-lane normalization, reduce GC-content bias and lead to more accurate estimates of expression fold-changes and tests of differential expression. Such results are crucial for the biological interpretation of RNA-Seq experiments, where downstream analyses can be sensitive to the supplied lists of genes.<\/jats:p><\/jats:sec>","DOI":"10.1186\/1471-2105-12-480","type":"journal-article","created":{"date-parts":[[2011,12,17]],"date-time":"2011-12-17T07:26:27Z","timestamp":1324106787000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":769,"title":["GC-Content Normalization for RNA-Seq Data"],"prefix":"10.1186","volume":"12","author":[{"given":"Davide","family":"Risso","sequence":"first","affiliation":[]},{"given":"Katja","family":"Schwartz","sequence":"additional","affiliation":[]},{"given":"Gavin","family":"Sherlock","sequence":"additional","affiliation":[]},{"given":"Sandrine","family":"Dudoit","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2011,12,17]]},"reference":[{"issue":"5881","key":"5121_CR1","doi-asserted-by":"publisher","first-page":"1344","DOI":"10.1126\/science.1158441","volume":"320","author":"U Nagalakshmi","year":"2008","unstructured":"Nagalakshmi U, Wang Z, Waern K, Shou C, Raha D, Gerstein M, Snyder M: The transcriptional landscape of the yeast genome defined by RNA sequencing. Science 2008, 320(5881):1344. 
10.1126\/science.1158441","journal-title":"Science"},{"key":"5121_CR2","doi-asserted-by":"publisher","first-page":"57","DOI":"10.1038\/nrg2484","volume":"10","author":"Z Wang","year":"2009","unstructured":"Wang Z, Gerstein M, Snyder M: RNA-Seq: a revolutionary tool for transcriptomics. Nature Reviews Genetics 2009, 10: 57\u201363. 10.1038\/nrg2484","journal-title":"Nature Reviews Genetics"},{"key":"5121_CR3","doi-asserted-by":"publisher","first-page":"94","DOI":"10.1186\/1471-2105-11-94","volume":"11","author":"J Bullard","year":"2010","unstructured":"Bullard J, Purdom E, Hansen K, Dudoit S: Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments. BMC Bioinformatics 2010, 11: 94. 10.1186\/1471-2105-11-94","journal-title":"BMC Bioinformatics"},{"issue":"9","key":"5121_CR4","doi-asserted-by":"publisher","first-page":"1509","DOI":"10.1101\/gr.079558.108","volume":"18","author":"J Marioni","year":"2008","unstructured":"Marioni J, Mason C, Mane S, Stephens M, Gilad Y: RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Research 2008, 18(9):1509. 10.1101\/gr.079558.108","journal-title":"Genome Research"},{"issue":"7","key":"5121_CR5","doi-asserted-by":"publisher","first-page":"621","DOI":"10.1038\/nmeth.1226","volume":"5","author":"A Mortazavi","year":"2008","unstructured":"Mortazavi A, Williams B, McCue K, Schaeffer L, Wold B: Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature Methods 2008, 5(7):621\u2013628. 10.1038\/nmeth.1226","journal-title":"Nature Methods"},{"key":"5121_CR6","volume-title":"Tech Rep 804","author":"Y Benjamini","year":"2011","unstructured":"Benjamini Y, Speed T: Estimation and correction for GC-content bias in high throughput sequencing. Tech Rep 804 Department of Statistics, University of California, Berkeley; 2011. 
[http:\/\/www.stat.berkeley.edu\/25]"},{"issue":"7218","key":"5121_CR7","doi-asserted-by":"publisher","first-page":"53","DOI":"10.1038\/nature07517","volume":"456","author":"D Bentley","year":"2008","unstructured":"Bentley D, Balasubramanian S, Swerdlow H, Smith G, Milton J, Brown C, Hall K, Evers D, Barnes C, Bignell H, et al.: Accurate whole human genome sequencing using reversible terminator chemistry. Nature 2008, 456(7218):53\u201359. 10.1038\/nature07517","journal-title":"Nature"},{"issue":"2","key":"5121_CR8","doi-asserted-by":"publisher","first-page":"268","DOI":"10.1093\/bioinformatics\/btq635","volume":"27","author":"V Boeva","year":"2011","unstructured":"Boeva V, Zinovyev A, Bleakley K, Vert J, Janoueix-Lerosey I, Delattre O, Barillot E: Control-free calling of copy number alterations in deep-sequencing data using GC-content normalization. Bioinformatics 2011, 27(2):268. 10.1093\/bioinformatics\/btq635","journal-title":"Bioinformatics"},{"issue":"11","key":"5121_CR9","doi-asserted-by":"publisher","first-page":"5058","DOI":"10.1073\/pnas.0912959107","volume":"107","author":"J Bullard","year":"2010","unstructured":"Bullard J, Mostovoy Y, Dudoit S, Brem R: Polygenic and directional regulatory evolution across pathways in Saccharomyces. Proceedings of the National Academy of Sciences 2010, 107(11):5058. 10.1073\/pnas.0912959107","journal-title":"Proceedings of the National Academy of Sciences"},{"issue":"16","key":"5121_CR10","doi-asserted-by":"publisher","first-page":"e105","DOI":"10.1093\/nar\/gkn425","volume":"36","author":"J Dohm","year":"2008","unstructured":"Dohm J, Lottaz C, Borodina T, Himmelbauer H: Substantial biases in ultra-short read data sets from high-throughput DNA sequencing. Nucleic Acids Research 2008, 36(16):e105. 
10.1093\/nar\/gkn425","journal-title":"Nucleic Acids Research"},{"issue":"12","key":"5121_CR11","doi-asserted-by":"publisher","first-page":"e131","DOI":"10.1093\/nar\/gkq224","volume":"38","author":"K Hansen","year":"2010","unstructured":"Hansen K, Brenner S, Dudoit S: Biases in Illumina transcriptome sequencing caused by random hexamer priming. Nucleic Acids Research 2010, 38(12):e131. 10.1093\/nar\/gkq224","journal-title":"Nucleic Acids Research"},{"key":"5121_CR12","volume-title":"Tech Rep 227","author":"K Hansen","year":"2011","unstructured":"Hansen K, Irizarry R, Wu Z: Removing technical variability in RNA-Seq data using conditional quantile normalization. Tech Rep 227 Department of Biostatistics, Johns Hopkins University; 2011. [http:\/\/www.bepress.com\/jhubiostat\/paper227]"},{"issue":"5","key":"5121_CR13","doi-asserted-by":"publisher","first-page":"R50","DOI":"10.1186\/gb-2010-11-5-r50","volume":"11","author":"J Li","year":"2010","unstructured":"Li J, Jiang H, Wong W: Modeling non-uniformity in short-read rates in RNA-Seq data. Genome Biology 2010, 11(5):R50. 10.1186\/gb-2010-11-5-r50","journal-title":"Genome Biology"},{"key":"5121_CR14","doi-asserted-by":"publisher","first-page":"14","DOI":"10.1186\/1745-6150-4-14","volume":"4","author":"A Oshlack","year":"2009","unstructured":"Oshlack A, Wakefield M: Transcript length bias in RNA-seq data confounds systems biology. Biology Direct 2009, 4: 14. 10.1186\/1745-6150-4-14","journal-title":"Biology Direct"},{"issue":"7289","key":"5121_CR15","doi-asserted-by":"publisher","first-page":"768","DOI":"10.1038\/nature08872","volume":"464","author":"J Pickrell","year":"2010","unstructured":"Pickrell J, Marioni J, Pai A, Degner J, Engelhardt B, Nkadori E, Veyrieras J, Stephens M, Gilad Y, Pritchard J: Understanding mechanisms underlying human gene expression variation with RNA sequencing. Nature 2010, 464(7289):768\u2013772. 
10.1038\/nature08872","journal-title":"Nature"},{"issue":"3","key":"5121_CR16","doi-asserted-by":"publisher","first-page":"R22","DOI":"10.1186\/gb-2011-12-3-r22","volume":"12","author":"A Roberts","year":"2011","unstructured":"Roberts A, Trapnell C, Donaghey J, Rinn JL, Pachter L: Improving RNA-Seq expression estimates by correcting for fragment bias. Genome Biology 2011, 12(3):R22. 10.1186\/gb-2011-12-3-r22","journal-title":"Genome Biology"},{"issue":"8","key":"5121_CR17","doi-asserted-by":"publisher","first-page":"e6700","DOI":"10.1371\/journal.pone.0006700","volume":"4","author":"L Teytelman","year":"2009","unstructured":"Teytelman L, \u00d6zayd\u0131n B, Zill O, Lefran\u00e7ois P, Snyder M, Rine J, Eisen M: Impact of chromatin structures on DNA processing for genomic analyses. PLoS One 2009, 4(8):e6700. 10.1371\/journal.pone.0006700","journal-title":"PLoS One"},{"issue":"9","key":"5121_CR18","doi-asserted-by":"publisher","first-page":"1586","DOI":"10.1101\/gr.092981.109","volume":"19","author":"S Yoon","year":"2009","unstructured":"Yoon S, Xuan Z, Makarov V, Ye K, Sebat J: Sensitive and accurate detection of copy number variants using read depth of coverage. Genome Research 2009, 19(9):1586. 10.1101\/gr.092981.109","journal-title":"Genome Research"},{"issue":"2","key":"5121_CR19","doi-asserted-by":"publisher","first-page":"R14","DOI":"10.1186\/gb-2010-11-2-r14","volume":"11","author":"M Young","year":"2010","unstructured":"Young M, Wakefield M, Smyth G, Oshlack A: Gene Ontology analysis for RNA-seq: accounting for selection bias. Genome Biology 2010, 11(2):R14. 10.1186\/gb-2010-11-2-r14","journal-title":"Genome Biology"},{"key":"5121_CR20","doi-asserted-by":"publisher","first-page":"290","DOI":"10.1186\/1471-2105-12-290","volume":"12","author":"W Zheng","year":"2011","unstructured":"Zheng W, Chung L, Zhao H: Bias Detection and Correction in RNA-Sequencing Data. BMC Bioinformatics 2011, 12: 290. 
10.1186\/1471-2105-12-290","journal-title":"BMC Bioinformatics"},{"issue":"10","key":"5121_CR21","doi-asserted-by":"publisher","first-page":"R106","DOI":"10.1186\/gb-2010-11-10-r106","volume":"11","author":"S Anders","year":"2010","unstructured":"Anders S, Huber W: Differential expression analysis for sequence count data. Genome Biology 2010, 11(10):R106. 10.1186\/gb-2010-11-10-r106","journal-title":"Genome Biology"},{"issue":"3","key":"5121_CR22","doi-asserted-by":"publisher","first-page":"R25","DOI":"10.1186\/gb-2010-11-3-r25","volume":"11","author":"M Robinson","year":"2010","unstructured":"Robinson M, Oshlack A: A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biology 2010, 11(3):R25. 10.1186\/gb-2010-11-3-r25","journal-title":"Genome Biology"},{"issue":"9","key":"5121_CR23","doi-asserted-by":"publisher","first-page":"1151","DOI":"10.1038\/nbt1239","volume":"24","author":"MAQC Consortium","year":"2006","unstructured":"MAQC Consortium: The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nature Biotechnology 2006, 24(9):1151\u20131161. 10.1038\/nbt1239","journal-title":"Nature Biotechnology"},{"issue":"10","key":"5121_CR24","doi-asserted-by":"publisher","first-page":"3091","DOI":"10.1093\/nar\/18.10.3091","volume":"18","author":"ME Schmitt","year":"1990","unstructured":"Schmitt ME, Brown TA, Trumpower BL: A rapid and simple method for preparation of RNA from Saccharomyces cerevisiae. Nucleic Acids Research 1990, 18(10):3091\u20133092. 10.1093\/nar\/18.10.3091","journal-title":"Nucleic Acids Research"},{"issue":"6","key":"5121_CR25","doi-asserted-by":"publisher","first-page":"449","DOI":"10.1016\/j.cub.2011.02.019","volume":"21","author":"JM Maniar","year":"2011","unstructured":"Maniar JM, Fire AZ: EGO-1, a C. elegans RdRP, modulates gene expression via production of mRNA-templated short antisense RNAs. Current Biology 2011, 21(6):449\u2013459. 
10.1016\/j.cub.2011.02.019","journal-title":"Current Biology"},{"issue":"18","key":"5121_CR26","doi-asserted-by":"publisher","first-page":"e123","DOI":"10.1093\/nar\/gkp596","volume":"37","author":"D Parkhomchuk","year":"2009","unstructured":"Parkhomchuk D, Borodina T, Amstislavskiy V, Banaru M, Hallen L, Krobitsch S, Lehrach H, Soldatov A: Transcriptome analysis by strand-specific sequencing of complementary DNA. Nucleic Acids Research 2009, 37(18):e123. 10.1093\/nar\/gkp596","journal-title":"Nucleic Acids Research"},{"key":"5121_CR27","doi-asserted-by":"publisher","first-page":"663","DOI":"10.1186\/1471-2164-11-663","volume":"11","author":"J Martin","year":"2010","unstructured":"Martin J, Bruno VM, Fang Z, Meng X, Blow M, Zhang T, Sherlock G, Snyder M, Wang Z: Rnnotator: an automated de novo transcriptome assembly pipeline from stranded RNA-Seq reads. BMC Genomics 2010, 11: 663. 10.1186\/1471-2164-11-663","journal-title":"BMC Genomics"},{"key":"5121_CR28","unstructured":"Saccharomyces Genome Database r64. [http:\/\/www.yeastgenome.org]"},{"issue":"3","key":"5121_CR29","doi-asserted-by":"publisher","first-page":"R25","DOI":"10.1186\/gb-2009-10-3-r25","volume":"10","author":"B Langmead","year":"2009","unstructured":"Langmead B, Trapnell C, Pop M, Salzberg S: Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology 2009, 10(3):R25. 10.1186\/gb-2009-10-3-r25","journal-title":"Genome Biology"},{"issue":"2","key":"5121_CR30","doi-asserted-by":"publisher","first-page":"249","DOI":"10.1093\/biostatistics\/4.2.249","volume":"4","author":"R Irizarry","year":"2003","unstructured":"Irizarry R, Hobbs B, Collin F, Beazer-Barclay Y, Antonellis K, Scherf U, Speed T: Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics 2003, 4(2):249. 
10.1093\/biostatistics\/4.2.249","journal-title":"Biostatistics"},{"key":"5121_CR31","doi-asserted-by":"publisher","first-page":"3","DOI":"10.2202\/1544-6115.1027","volume":"3","author":"G Smyth","year":"2004","unstructured":"Smyth G: Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology 2004, 3: 3.","journal-title":"Statistical Applications in Genetics and Molecular Biology"},{"issue":"403","key":"5121_CR32","doi-asserted-by":"publisher","first-page":"596","DOI":"10.1080\/01621459.1988.10478639","volume":"83","author":"W Cleveland","year":"1988","unstructured":"Cleveland W, Devlin S: Locally weighted regression: an approach to regression analysis by local fitting. Journal of the American Statistical Association 1988, 83(403):596\u2013610. 10.2307\/2289282","journal-title":"Journal of the American Statistical Association"},{"key":"5121_CR33","doi-asserted-by":"publisher","first-page":"139","DOI":"10.1093\/bioinformatics\/btp616","volume":"26","author":"M Robinson","year":"2010","unstructured":"Robinson M, McCarthy D, Smyth G: edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 2010, 26: 139. 10.1093\/bioinformatics\/btp616","journal-title":"Bioinformatics"},{"issue":"419","key":"5121_CR34","doi-asserted-by":"publisher","first-page":"597","DOI":"10.1080\/01621459.1992.10475256","volume":"87","author":"I Good","year":"1992","unstructured":"Good I: The Bayes\/non-Bayes compromise: A brief review. Journal of the American Statistical Association 1992, 87(419):597\u2013606. 
10.2307\/2290192","journal-title":"Journal of the American Statistical Association"},{"issue":"4","key":"5121_CR35","doi-asserted-by":"publisher","first-page":"e15","DOI":"10.1093\/nar\/30.4.e15","volume":"30","author":"YH Yang","year":"2002","unstructured":"Yang YH, Dudoit S, Luu P, Lin DM, Peng V, Ngai J, Speed TP: Normalization for cDNA microarray data: A robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Research 2002, 30(4):e15. 10.1093\/nar\/30.4.e15","journal-title":"Nucleic Acids Research"},{"key":"5121_CR36","volume-title":"Genome Research","author":"L Jiang","year":"2011","unstructured":"Jiang L, Schlesinger F, Davis CA, Zhang Y, Li R, Salit M, Gingeras TR, Oliver B: Synthetic spike-in standards for RNA-seq experiments. Genome Research 2011. [Advance online publication]"},{"key":"5121_CR37","doi-asserted-by":"crossref","first-page":"289","DOI":"10.1111\/j.2517-6161.1995.tb02031.x","volume":"57","author":"Y Benjamini","year":"1995","unstructured":"Benjamini Y, Hochberg Y: Controlling the false discovery rate: A practical and powerful approach to multiple testing. 
Journal of the Royal Statistical Society, Series B 1995, 57: 289\u2013300.","journal-title":"Journal of the Royal Statistical Society, Series B"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-12-480.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,16]],"date-time":"2025-03-16T00:08:21Z","timestamp":1742083701000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-12-480"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2011,12]]},"references-count":37,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2011,12]]}},"alternative-id":["5121"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-12-480","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2011,12]]},"assertion":[{"value":"30 August 2011","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"17 December 2011","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"17 December 2011","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"480"}}