{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,27]],"date-time":"2026-02-27T06:17:24Z","timestamp":1772173044891,"version":"3.50.1"},"update-to":[{"DOI":"10.1371\/journal.pcbi.1010028","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2022,4,26]],"date-time":"2022-04-26T00:00:00Z","timestamp":1650931200000}}],"reference-count":56,"publisher":"Public Library of Science (PLoS)","issue":"4","license":[{"start":{"date-parts":[[2022,4,14]],"date-time":"2022-04-14T00:00:00Z","timestamp":1649894400000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["www.ploscompbiol.org"],"crossmark-restriction":false},"short-container-title":["PLoS Comput Biol"],"abstract":"<jats:p>Genome wide association studies (GWASs) for complex traits have implicated thousands of genetic loci. Most GWAS-nominated variants lie in noncoding regions, complicating the systematic translation of these findings into functional understanding. Here, we leverage convolutional neural networks to assist in this challenge. Our computational framework, peaBrain, models the transcriptional machinery of a tissue as a two-stage process: first, predicting the mean tissue specific abundance of all genes and second, incorporating the transcriptomic consequences of genotype variation to predict individual abundance on a subject-by-subject basis. We demonstrate that peaBrain accounts for the majority (&gt;50%) of variance observed in mean transcript abundance across most tissues and outperforms regularized linear models in predicting the consequences of individual genotype variation. 
We highlight the validity of the peaBrain model by calculating non-coding impact scores that correlate with nucleotide evolutionary constraint that are also predictive of disease-associated variation and allele-specific transcription factor binding. We further show how these tissue-specific peaBrain scores can be leveraged to pinpoint functional tissues underlying complex traits, outperforming methods that depend on colocalization of eQTL and GWAS signals. We subsequently: (a) derive continuous dense embeddings of genes for downstream applications; (b) highlight the utility of the model in predicting transcriptomic impact of small molecules and shRNA (on par with <jats:italic>in vitro<\/jats:italic> experimental replication of external test sets); (c) explore how peaBrain can be used to model difficult-to-study processes (such as neural induction); and (d) identify putatively functional eQTLs that are missed by high-throughput experimental approaches.<\/jats:p>","DOI":"10.1371\/journal.pcbi.1010028","type":"journal-article","created":{"date-parts":[[2022,4,14]],"date-time":"2022-04-14T13:54:55Z","timestamp":1649944495000},"page":"e1010028","update-policy":"https:\/\/doi.org\/10.1371\/journal.pcbi.corrections_policy","source":"Crossref","is-referenced-by-count":4,"title":["A general framework for predicting the transcriptomic consequences of non-coding variation and small 
molecules"],"prefix":"10.1371","volume":"18","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2481-9753","authenticated-orcid":true,"given":"Moustafa","family":"Abdalla","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2776-6036","authenticated-orcid":true,"given":"Mohamed","family":"Abdalla","sequence":"additional","affiliation":[]}],"member":"340","published-online":{"date-parts":[[2022,4,14]]},"reference":[{"issue":"23","key":"pcbi.1010028.ref001","doi-asserted-by":"crossref","first-page":"9362","DOI":"10.1073\/pnas.0903103106","article-title":"Potential etiologic and functional implications of genome-wide association loci for human diseases and traits","volume":"106","author":"LA Hindorff","year":"2009","journal-title":"Proceedings of the National Academy of Sciences"},{"issue":"7146","key":"pcbi.1010028.ref002","doi-asserted-by":"crossref","first-page":"799","DOI":"10.1038\/nature05874","article-title":"Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project","volume":"447","author":"EP Consortium","year":"2007","journal-title":"nature"},{"issue":"2","key":"pcbi.1010028.ref003","doi-asserted-by":"crossref","first-page":"124","DOI":"10.1038\/ng.2504","article-title":"Chromatin marks identify critical cell types for fine mapping complex trait variants","volume":"45","author":"G Trynka","year":"2013","journal-title":"Nature genetics"},{"issue":"5","key":"pcbi.1010028.ref004","doi-asserted-by":"crossref","first-page":"473","DOI":"10.1038\/nmeth.1937","article-title":"Unsupervised pattern discovery in human chromatin structure through genomic segmentation","volume":"9","author":"MM Hoffman","year":"2012","journal-title":"Nature methods"},{"issue":"4","key":"pcbi.1010028.ref005","doi-asserted-by":"crossref","first-page":"934","DOI":"10.1016\/j.cell.2013.09.053","article-title":"Super-enhancers in the control of cell identity and disease","volume":"155","author":"D 
Hnisz","year":"2013","journal-title":"Cell"},{"issue":"9","key":"pcbi.1010028.ref006","doi-asserted-by":"crossref","first-page":"1091","DOI":"10.1038\/ng.3367","article-title":"A gene-based association method for mapping traits using reference transcriptome data","volume":"47","author":"ER Gamazon","year":"2015","journal-title":"Nature genetics"},{"issue":"6","key":"pcbi.1010028.ref007","doi-asserted-by":"crossref","first-page":"1245","DOI":"10.1016\/j.ajhg.2016.10.003","article-title":"Colocalization of GWAS and eQTL signals detects target genes","volume":"99","author":"F Hormozdiari","year":"2016","journal-title":"The American Journal of Human Genetics"},{"issue":"10","key":"pcbi.1010028.ref008","doi-asserted-by":"crossref","first-page":"931","DOI":"10.1038\/nmeth.3547","article-title":"Predicting effects of noncoding variants with deep learning\u2013based sequence model","volume":"12","author":"J Zhou","year":"2015","journal-title":"Nature methods"},{"issue":"3","key":"pcbi.1010028.ref009","doi-asserted-by":"crossref","first-page":"245","DOI":"10.1038\/ng.3506","article-title":"Integrative approaches for large-scale transcriptome-wide association studies","volume":"48","author":"A Gusev","year":"2016","journal-title":"Nature genetics"},{"issue":"7","key":"pcbi.1010028.ref010","doi-asserted-by":"crossref","first-page":"990","DOI":"10.1101\/gr.200535.115","article-title":"Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks","volume":"26","author":"DR Kelley","year":"2016","journal-title":"Genome research"},{"key":"pcbi.1010028.ref011","first-page":"161851","article-title":"Sequential regulatory activity prediction across chromosomes with convolutional neural networks.","author":"DR Kelley","year":"2018","journal-title":"bioRxiv"},{"issue":"15","key":"pcbi.1010028.ref012","doi-asserted-by":"crossref","first-page":"2112","DOI":"10.1093\/bioinformatics\/btab083","article-title":"DNABERT: pre-trained Bidirectional 
Encoder Representations from Transformers model for DNA-language in genome","volume":"37","author":"Y Ji","year":"2021","journal-title":"Bioinformatics"},{"issue":"3","key":"pcbi.1010028.ref013","doi-asserted-by":"crossref","first-page":"e1006646","DOI":"10.1371\/journal.pgen.1006646","article-title":"Integrating molecular QTL data into genome-wide genetic association analysis: Probabilistic assessment of enrichment and colocalization.","volume":"13","author":"X Wen","year":"2017","journal-title":"PLoS genetics"},{"issue":"8","key":"pcbi.1010028.ref014","doi-asserted-by":"crossref","first-page":"802","DOI":"10.1002\/gepi.21765","article-title":"Statistical testing of shared genetic control for potentially related traits","volume":"37","author":"C. Wallace","year":"2013","journal-title":"Genetic epidemiology"},{"issue":"7","key":"pcbi.1010028.ref015","doi-asserted-by":"crossref","first-page":"e1006933","DOI":"10.1371\/journal.pgen.1006933","article-title":"Systematic tissue-specific functional annotation of the human genome highlights immune-related DNA elements for late-onset Alzheimer\u2019s disease","volume":"13","author":"Q Lu","year":"2017","journal-title":"PLoS genetics"},{"issue":"3","key":"pcbi.1010028.ref016","doi-asserted-by":"crossref","first-page":"310","DOI":"10.1038\/ng.2892","article-title":"A general framework for estimating the relative pathogenicity of human genetic variants","volume":"46","author":"M Kircher","year":"2014","journal-title":"Nature genetics"},{"issue":"2","key":"pcbi.1010028.ref017","doi-asserted-by":"crossref","first-page":"214","DOI":"10.1038\/ng.3477","article-title":"A spectral approach integrating functional genomic annotations for coding and noncoding variants","volume":"48","author":"I Ionita-Laza","year":"2016","journal-title":"Nature genetics"},{"issue":"1","key":"pcbi.1010028.ref018","doi-asserted-by":"crossref","first-page":"14","DOI":"10.1101\/gr.155192.113","article-title":"Characterizing the genetic basis of 
transcriptome diversity through RNA-sequencing of 922 individuals","volume":"24","author":"A Battle","year":"2014","journal-title":"Genome research"},{"issue":"7675","key":"pcbi.1010028.ref019","doi-asserted-by":"crossref","first-page":"239","DOI":"10.1038\/nature24267","article-title":"The impact of rare variation on gene expression across tissues","volume":"550","author":"X Li","year":"2017","journal-title":"Nature"},{"issue":"7","key":"pcbi.1010028.ref020","doi-asserted-by":"crossref","first-page":"e1008050","DOI":"10.1371\/journal.pcbi.1008050","article-title":"Cross-species regulatory sequence activity prediction","volume":"16","author":"DR Kelley","year":"2020","journal-title":"PLoS computational biology"},{"key":"pcbi.1010028.ref021","article-title":"Effective gene expression prediction from sequence by integrating long-range interactions","author":"Z Avsec","year":"2021","journal-title":"bioRxiv"},{"issue":"8","key":"pcbi.1010028.ref022","doi-asserted-by":"crossref","first-page":"1171","DOI":"10.1038\/s41588-018-0160-6","article-title":"Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk","volume":"50","author":"J Zhou","year":"2018","journal-title":"Nature genetics"},{"issue":"7","key":"pcbi.1010028.ref023","doi-asserted-by":"crossref","first-page":"107663","DOI":"10.1016\/j.celrep.2020.107663","article-title":"Predicting mRNA abundance directly from genomic sequence using deep convolutional neural networks","volume":"31","author":"V Agarwal","year":"2020","journal-title":"Cell reports"},{"issue":"6","key":"pcbi.1010028.ref024","doi-asserted-by":"crossref","first-page":"1519","DOI":"10.1016\/j.cell.2016.04.027","article-title":"Direct identification of hundreds of expression-modulating variants using a multiplexed reporter assay","volume":"165","author":"R Tewhey","year":"2016","journal-title":"Cell"},{"key":"pcbi.1010028.ref025","article-title":"High throughput characterization of genetic effects on 
DNA:protein binding and gene transcription.","author":"CA Kalita","year":"2018","journal-title":"bioRxiv"},{"key":"pcbi.1010028.ref026","first-page":"193136","article-title":"High-resolution genome-wide functional dissection of transcriptional regulatory regions in human.","author":"X Wang","year":"2017","journal-title":"bioRxiv"},{"issue":"6","key":"pcbi.1010028.ref027","doi-asserted-by":"crossref","first-page":"996","DOI":"10.1101\/gr.229102","article-title":"The human genome browser at UCSC","volume":"12","author":"WJ Kent","year":"2002","journal-title":"Genome research"},{"issue":"6235","key":"pcbi.1010028.ref028","doi-asserted-by":"crossref","first-page":"648","DOI":"10.1126\/science.1262110","article-title":"The Genotype-Tissue Expression (GTEx) pilot analysis: Multitissue gene regulation in humans","volume":"348","author":"G. Consortium","year":"2015","journal-title":"Science"},{"key":"pcbi.1010028.ref029","first-page":"014241","article-title":"Partitioning heritability by functional category using GWAS summary statistics.","author":"HK Finucane","year":"2015","journal-title":"bioRxiv"},{"issue":"10","key":"pcbi.1010028.ref030","doi-asserted-by":"crossref","first-page":"1045","DOI":"10.1038\/nbt1010-1045","article-title":"The NIH roadmap epigenomics mapping consortium","volume":"28","author":"BE Bernstein","year":"2010","journal-title":"Nature biotechnology"},{"key":"pcbi.1010028.ref031","first-page":"092445","article-title":"Paired CRISPR\/Cas9 guide-RNAs enable high-throughput deletion scanning (ScanDel) of a Mendelian disease locus for functionally critical non-coding elements.","author":"M Gasperini","year":"2016","journal-title":"bioRxiv"},{"issue":"D1","key":"pcbi.1010028.ref032","doi-asserted-by":"crossref","first-page":"D805","DOI":"10.1093\/nar\/gku1075","article-title":"COSMIC: exploring the world\u2019s knowledge of somatic mutations in human cancer","volume":"43","author":"SA Forbes","year":"2014","journal-title":"Nucleic acids 
research"},{"key":"pcbi.1010028.ref033","first-page":"253427","article-title":"Allele-specific transcription factor binding as a benchmark for assessing variant impact predictors.","author":"O Wagih","year":"2018","journal-title":"bioRxiv"},{"issue":"8","key":"pcbi.1010028.ref034","doi-asserted-by":"crossref","first-page":"831","DOI":"10.1038\/nbt.3300","article-title":"Predicting the sequence specificities of DNA-and RNA-binding proteins by deep learning","volume":"33","author":"B Alipanahi","year":"2015","journal-title":"Nature biotechnology"},{"issue":"8","key":"pcbi.1010028.ref035","doi-asserted-by":"crossref","first-page":"955","DOI":"10.1038\/ng.3331","article-title":"A method to predict the impact of regulatory variants from DNA sequence","volume":"47","author":"D Lee","year":"2015","journal-title":"Nature genetics"},{"issue":"4","key":"pcbi.1010028.ref036","doi-asserted-by":"crossref","first-page":"490","DOI":"10.1093\/bioinformatics\/btv565","article-title":"GERV: a statistical method for generative evaluation of regulatory variants for transcription factor binding","volume":"32","author":"H Zeng","year":"2015","journal-title":"Bioinformatics"},{"issue":"11","key":"pcbi.1010028.ref037","doi-asserted-by":"crossref","first-page":"1274","DOI":"10.1038\/ng.2797","article-title":"Discovery and refinement of loci associated with lipid levels","volume":"45","author":"CJ Willer","year":"2013","journal-title":"Nature genetics"},{"issue":"1","key":"pcbi.1010028.ref038","doi-asserted-by":"crossref","first-page":"139","DOI":"10.1016\/j.ajhg.2016.05.013","article-title":"Contrasting the genetic architecture of 30 complex traits from summary association data","volume":"99","author":"H Shi","year":"2016","journal-title":"The American Journal of Human Genetics"},{"issue":"12","key":"pcbi.1010028.ref039","doi-asserted-by":"crossref","first-page":"1676","DOI":"10.1038\/ng.3981","article-title":"Estimating the causal tissues for complex traits and 
diseases","volume":"49","author":"H Ongen","year":"2017","journal-title":"Nature genetics"},{"issue":"10","key":"pcbi.1010028.ref040","doi-asserted-by":"crossref","first-page":"1084","DOI":"10.1038\/ng.2394","article-title":"Mapping cis-and trans-regulatory effects across multiple tissues in twins","volume":"44","author":"E Grundberg","year":"2012","journal-title":"Nature genetics"},{"issue":"1","key":"pcbi.1010028.ref041","doi-asserted-by":"crossref","first-page":"76","DOI":"10.1016\/j.ajhg.2010.11.011","article-title":"GCTA: a tool for genome-wide complex trait analysis","volume":"88","author":"J Yang","year":"2011","journal-title":"The American Journal of Human Genetics"},{"key":"pcbi.1010028.ref042","article-title":"Predicting causal variants affecting expression by using whole-genome sequencing and RNA-seq from multiple human tissues","author":"AA Brown","year":"2017","journal-title":"Nature Genetics"},{"issue":"2","key":"pcbi.1010028.ref043","doi-asserted-by":"crossref","first-page":"e1005875","DOI":"10.1371\/journal.pgen.1005875","article-title":"Which genetics variants in DNase-Seq footprints are more likely to alter binding?","volume":"12","author":"GA Moyerbrailean","year":"2016","journal-title":"PLoS genetics."},{"key":"pcbi.1010028.ref044","first-page":"170506","article-title":"Frequent lack of repressive capacity of promoter DNA methylation identified through genome-wide epigenomic manipulation","author":"EE Ford","year":"2017","journal-title":"bioRxiv"},{"issue":"3","key":"pcbi.1010028.ref045","doi-asserted-by":"crossref","first-page":"271","DOI":"10.1038\/nbt.2137","article-title":"Systematic dissection and optimization of inducible enhancers in human cells using a massively parallel reporter assay","volume":"30","author":"A Melnikov","year":"2012","journal-title":"Nature biotechnology"},{"issue":"9","key":"pcbi.1010028.ref046","doi-asserted-by":"crossref","first-page":"1790","DOI":"10.1101\/gr.137323.112","article-title":"Annotation of functional 
variation in personal genomes using RegulomeDB","volume":"22","author":"AP Boyle","year":"2012","journal-title":"Genome research"},{"issue":"7370","key":"pcbi.1010028.ref047","doi-asserted-by":"crossref","first-page":"476","DOI":"10.1038\/nature10530","article-title":"A high-resolution map of human evolutionary constraint using 29 mammals","volume":"478","author":"K Lindblad-Toh","year":"2011","journal-title":"Nature"},{"issue":"7493","key":"pcbi.1010028.ref048","doi-asserted-by":"crossref","first-page":"455","DOI":"10.1038\/nature12787","article-title":"An atlas of active enhancers across human cell types and tissues","volume":"507","author":"R Andersson","year":"2014","journal-title":"Nature"},{"key":"pcbi.1010028.ref049","article-title":"Fast and accurate deep network learning by exponential linear units (elus).","author":"D-A Clevert","year":"2015","journal-title":"arXiv preprint arXiv:151107289."},{"key":"pcbi.1010028.ref050","article-title":"Improving neural networks by preventing co-adaptation of feature detectors.","author":"GE Hinton","year":"2012","journal-title":"arXiv preprint arXiv:12070580."},{"issue":"1","key":"pcbi.1010028.ref051","first-page":"1929","article-title":"Dropout: a simple way to prevent neural networks from overfitting","volume":"15","author":"N Srivastava","year":"2014","journal-title":"Journal of machine learning research"},{"key":"pcbi.1010028.ref052","article-title":"Adam: A method for stochastic optimization.","author":"D Kingma","year":"2014","journal-title":"arXiv preprint arXiv:14126980."},{"key":"pcbi.1010028.ref053","doi-asserted-by":"crossref","DOI":"10.1038\/ncomms15452","article-title":"A complete tool set for molecular QTL discovery and analysis.","volume":"8","author":"O Delaneau","year":"2017","journal-title":"Nature Communications"},{"issue":"4","key":"pcbi.1010028.ref054","article-title":"glmnet: Lasso and elastic-net regularized generalized linear models.","volume":"1","author":"J 
Friedman","year":"2009"},{"issue":"Oct","key":"pcbi.1010028.ref055","first-page":"2825","article-title":"Scikit-learn: Machine learning in Python.","volume":"12","author":"F Pedregosa","year":"2011","journal-title":"Journal of Machine Learning Research"},{"issue":"43","key":"pcbi.1010028.ref056","doi-asserted-by":"crossref","first-page":"15545","DOI":"10.1073\/pnas.0506580102","article-title":"Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles","volume":"102","author":"A Subramanian","year":"2005","journal-title":"Proceedings of the National Academy of Sciences"}],"updated-by":[{"DOI":"10.1371\/journal.pcbi.1010028","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2022,4,26]],"date-time":"2022-04-26T00:00:00Z","timestamp":1650931200000}}],"container-title":["PLOS Computational Biology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1010028","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,4,26]],"date-time":"2022-04-26T14:03:34Z","timestamp":1650981814000},"score":1,"resource":{"primary":{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1010028"}},"subtitle":[],"editor":[{"given":"Eric","family":"Gamazon","sequence":"first","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2022,4,14]]},"references-count":56,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2022,4,14]]}},"URL":"https:\/\/doi.org\/10.1371\/journal.pcbi.1010028","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/279323","asserted-by":"object"}]},"ISSN":["1553-7358"],"issn-type":[{"value":"1553-7358","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,4,14]]}}}