{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T14:54:03Z","timestamp":1774968843947,"version":"3.50.1"},"update-to":[{"DOI":"10.1371\/journal.pcbi.1009939","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2022,4,5]],"date-time":"2022-04-05T00:00:00Z","timestamp":1649116800000}}],"reference-count":27,"publisher":"Public Library of Science (PLoS)","issue":"3","license":[{"start":{"date-parts":[[2022,3,24]],"date-time":"2022-03-24T00:00:00Z","timestamp":1648080000000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["2R01GM097171-09"],"award-info":[{"award-number":["2R01GM097171-09"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Bio-X","award":["Bio-X Bowes Graduate Student Fellowship"],"award-info":[{"award-number":["Bio-X Bowes Graduate Student Fellowship"]}]}],"content-domain":{"domain":["www.ploscompbiol.org"],"crossmark-restriction":false},"short-container-title":["PLoS Comput Biol"],"abstract":"<jats:p>RNA sequencing has been widely used as an essential tool to probe gene expression. While standard practices have been established to analyze RNA-seq data, it is still challenging to interpret and remove artifactual signals. Several biological and technical factors such as sex, age, batches, and sequencing technology have been found to bias these estimates. Probabilistic estimation of expression residuals (PEER), which infers broad variance components in gene expression measurements, has been used to account for some systematic effects, but it has remained challenging to interpret these PEER factors. Here we show that transcriptome diversity\u2013a simple metric based on Shannon entropy\u2013explains a large portion of variability in gene expression and is the strongest known factor encoded in PEER factors. We then show that transcriptome diversity has significant associations with multiple technical and biological variables across diverse organisms and datasets. In sum, transcriptome diversity provides a simple explanation for a major source of variation in both gene expression estimates and PEER covariates.<\/jats:p>","DOI":"10.1371\/journal.pcbi.1009939","type":"journal-article","created":{"date-parts":[[2022,3,24]],"date-time":"2022-03-24T17:59:33Z","timestamp":1648144773000},"page":"e1009939","update-policy":"https:\/\/doi.org\/10.1371\/journal.pcbi.corrections_policy","source":"Crossref","is-referenced-by-count":22,"title":["Transcriptome diversity is a systematic source of variation in RNA-sequencing data"],"prefix":"10.1371","volume":"18","author":[{"given":"Pablo E.","family":"Garc\u00eda-Nieto","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2907-6132","authenticated-orcid":true,"given":"Ban","family":"Wang","sequence":"additional","affiliation":[]},{"given":"Hunter B.","family":"Fraser","sequence":"additional","affiliation":[]}],"member":"340","published-online":{"date-parts":[[2022,3,24]]},"reference":[{"key":"pcbi.1009939.ref001","doi-asserted-by":"crossref","first-page":"467","DOI":"10.1126\/science.270.5235.467","article-title":"Quantitative monitoring of gene expression patterns with a complementary DNA microarray","volume":"270","author":"M Schena","year":"1995","journal-title":"Science (80-)"},{"key":"pcbi.1009939.ref002","doi-asserted-by":"crossref","first-page":"631","DOI":"10.1038\/s41576-019-0150-2","article-title":"RNA sequencing: the teenage years","volume":"20","author":"R Stark","year":"2019","journal-title":"Nat Rev Genet"},{"key":"pcbi.1009939.ref003","first-page":"1","article-title":"A survey of best practices for RNA-seq data analysis","volume":"17","author":"A Conesa","year":"2016","journal-title":"Genome Biol"},{"key":"pcbi.1009939.ref004","doi-asserted-by":"crossref","first-page":"2870","DOI":"10.1093\/bioinformatics\/bty175","article-title":"Understanding sequencing data as compositions: An outlook and review","volume":"34","author":"TP Quinn","year":"2018","journal-title":"Bioinformatics"},{"key":"pcbi.1009939.ref005","doi-asserted-by":"crossref","first-page":"293","DOI":"10.1186\/1471-2164-12-293","article-title":"RNA-seq: Technical variability and sampling","volume":"12","author":"LM McIntyre","year":"2011","journal-title":"BMC Genomics"},{"key":"pcbi.1009939.ref006","doi-asserted-by":"crossref","first-page":"R25","DOI":"10.1186\/gb-2010-11-3-r25","article-title":"A scaling normalization method for differential expression analysis of RNA-seq data","volume":"11","author":"MD Robinson","year":"2010","journal-title":"Genome Biol"},{"key":"pcbi.1009939.ref007","doi-asserted-by":"crossref","first-page":"R106","DOI":"10.1186\/gb-2010-11-10-r106","article-title":"Differential expression analysis for sequence count data","volume":"11","author":"S Anders","year":"2010","journal-title":"Genome Biol"},{"key":"pcbi.1009939.ref008","doi-asserted-by":"crossref","first-page":"1318","DOI":"10.1126\/science.aaz1776","article-title":"The GTEx Consortium atlas of genetic regulatory effects across human tissues","volume":"369","author":"The GTEx Consortium","year":"2020","journal-title":"Science"},{"key":"pcbi.1009939.ref009","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12915-017-0352-z","article-title":"The landscape of sex-differential transcriptome and its consequent selection in human adults","volume":"15","author":"M Gershoni","year":"2017","journal-title":"BMC Biol"},{"key":"pcbi.1009939.ref010","doi-asserted-by":"crossref","first-page":"3263","DOI":"10.1016\/j.celrep.2019.08.043","article-title":"Age-Related Gene Expression Signature in Rats Demonstrate Early, Late, and Linear Transcriptional Changes from Multiple Tissues","volume":"28","author":"T Shavlakadze","year":"2019","journal-title":"Cell Rep"},{"key":"pcbi.1009939.ref011","doi-asserted-by":"crossref","first-page":"4601","DOI":"10.1073\/pnas.1821367117","article-title":"Population-based RNA profiling in Add Health finds social disparities in inflammatory and antiviral gene regulation to emerge by young adulthood","volume":"117","author":"SW Cole","year":"2020","journal-title":"Proc Natl Acad Sci U S A"},{"key":"pcbi.1009939.ref012","doi-asserted-by":"crossref","first-page":"1391","DOI":"10.1534\/genetics.116.193714","article-title":"Detecting sources of transcriptional heterogeneity in large-scale RNA-seq data sets","volume":"204","author":"BC Searle","year":"2016","journal-title":"Genetics"},{"key":"pcbi.1009939.ref013","doi-asserted-by":"crossref","first-page":"500","DOI":"10.1038\/nprot.2011.457","article-title":"Using probabilistic estimation of expression residuals (PEER) to obtain increased power and interpretability of gene expression analyses","volume":"7","author":"O Stegle","year":"2012","journal-title":"Nat Protoc"},{"key":"pcbi.1009939.ref014","doi-asserted-by":"crossref","first-page":"9709","DOI":"10.1073\/pnas.0803479105","article-title":"Defining diversity, specialization, and gene specificity in transcriptomes through information theory","volume":"105","author":"O Mart\u00ednez","year":"2008","journal-title":"Proc Natl Acad Sci U S A"},{"key":"pcbi.1009939.ref015","doi-asserted-by":"crossref","first-page":"379","DOI":"10.1002\/j.1538-7305.1948.tb01338.x","article-title":"A Mathematical Theory of Communication","volume":"27","author":"CE Shannon","year":"1948","journal-title":"Bell Syst Tech J"},{"key":"pcbi.1009939.ref016","doi-asserted-by":"crossref","first-page":"4197","DOI":"10.1534\/g3.116.035444","article-title":"Microenvironmental gene expression plasticity among individual drosophila melanogaster","volume":"6","author":"Y Lin","year":"2016","journal-title":"G3 Genes, Genomes, Genet"},{"key":"pcbi.1009939.ref017","doi-asserted-by":"crossref","DOI":"10.1186\/1471-2164-13-654","article-title":"Population and sex differences in Drosophila melanogaster brain gene expression","volume":"13","author":"A Catal\u00e1n","year":"2012","journal-title":"BMC Genomics"},{"key":"pcbi.1009939.ref018","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41598-019-49889-1","article-title":"Comparative evaluation of RNA-Seq library preparation methods for strand-specificity and low input","volume":"9","author":"D Sarantopoulou","year":"2019","journal-title":"Sci Rep"},{"key":"pcbi.1009939.ref019","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41598-020-59516-z","article-title":"Variability in estimated gene expression among commonly used RNA-seq pipelines","volume":"10","author":"S Arora","year":"2020","journal-title":"Sci Rep"},{"key":"pcbi.1009939.ref020","doi-asserted-by":"crossref","first-page":"580","DOI":"10.1038\/ng.2653","article-title":"The Genotype-Tissue Expression (GTEx) project","author":"J Lonsdale","year":"2013","journal-title":"Nat. Genet"},{"key":"pcbi.1009939.ref021","doi-asserted-by":"crossref","first-page":"314","DOI":"10.1038\/nbt.3772","article-title":"Toil enables reproducible, open source, big biomedical data analyses","author":"J Vivian","year":"2017","journal-title":"Nat. Biotechnol"},{"key":"pcbi.1009939.ref022","first-page":"319","article-title":"Reproducible RNA-seq analysis using recount2","author":"L Collado-Torres","year":"2017","journal-title":"[Internet]Nat. Biotechnol"},{"key":"pcbi.1009939.ref023","doi-asserted-by":"crossref","DOI":"10.1038\/sdata.2018.61","article-title":"Data Descriptor: Unifying cancer and normal RNA sequencing data from different sources","volume":"5","author":"Q Wang","year":"2018","journal-title":"Sci Data"},{"key":"pcbi.1009939.ref024","doi-asserted-by":"crossref","DOI":"10.1371\/journal.pcbi.1002330","article-title":"Joint modelling of confounding factors and prominent genetic regulators provides increased accuracy in genetical genomics studies","volume":"8","author":"N Fusi","year":"2012","journal-title":"PLoS Comput Biol"},{"key":"pcbi.1009939.ref025","doi-asserted-by":"crossref","DOI":"10.1093\/bioinformatics\/btw777","article-title":"Scater: pre-processing, quality control, normalization and visualization of single-cell RNA-seq data in R","volume":"33","author":"DJ McCarthy","year":"2017","journal-title":"Bioinformatics"},{"key":"pcbi.1009939.ref026","doi-asserted-by":"crossref","first-page":"4288","DOI":"10.1093\/nar\/gks042","article-title":"Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation","volume":"40","author":"DJ McCarthy","year":"2012","journal-title":"Nucleic Acids Res"},{"key":"pcbi.1009939.ref027","doi-asserted-by":"crossref","unstructured":"Risso D, Schwartz K, Sherlock G, Dudoit S. GC-Content Normalization for RNA-Seq Data. 2011;","DOI":"10.1186\/1471-2105-12-480"}],"updated-by":[{"DOI":"10.1371\/journal.pcbi.1009939","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2022,4,5]],"date-time":"2022-04-05T00:00:00Z","timestamp":1649116800000}}],"container-title":["PLOS Computational Biology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1009939","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,19]],"date-time":"2023-11-19T01:18:00Z","timestamp":1700356680000},"score":1,"resource":{"primary":{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1009939"}},"subtitle":[],"editor":[{"given":"Chongzhi","family":"Zang","sequence":"first","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2022,3,24]]},"references-count":27,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2022,3,24]]}},"URL":"https:\/\/doi.org\/10.1371\/journal.pcbi.1009939","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2021.04.27.441712","asserted-by":"object"}]},"ISSN":["1553-7358"],"issn-type":[{"value":"1553-7358","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,3,24]]}}}