{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,8,4]],"date-time":"2024-08-04T07:17:10Z","timestamp":1722755830422},"reference-count":40,"publisher":"Oxford University Press (OUP)","issue":"19","license":[{"start":{"date-parts":[[2017,5,31]],"date-time":"2017-05-31T00:00:00Z","timestamp":1496188800000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2017,10,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Motivation<\/jats:title><jats:p>The identification of genetic variants influencing gene expression (known as expression quantitative trait loci or eQTLs) is important in unravelling the genetic basis of complex traits. Detecting multiple eQTLs simultaneously in a population based on paired DNA-seq and RNA-seq assays employs two competing types of models: models which rely on appropriate transformations of RNA-seq data (and are powered by a mature mathematical theory), or count-based models, which represent digital gene expression explicitly, thus rendering such transformations unnecessary. The latter constitutes an immensely popular methodology, which is however plagued by mathematical intractability.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>We develop tractable count-based models, which are amenable to efficient estimation through the introduction of latent variables and the appropriate application of recent statistical theory in a sparse Bayesian modelling framework. Furthermore, we examine several transformation methods for RNA-seq read counts and we introduce arcsin, logit and Laplace smoothing as preprocessing steps for transformation-based models. Using natural and carefully simulated data from the 1000 Genomes and gEUVADIS projects, we benchmark both approaches under a variety of scenarios, including the presence of noise and violation of basic model assumptions. We demonstrate that an arcsin transformation of Laplace-smoothed data is at least as good as state-of-the-art models, particularly at small samples. Furthermore, we show that an over-dispersed Poisson model is comparable to the celebrated Negative Binomial, but much easier to estimate. These results provide strong support for transformation-based versus count-based (particularly Negative-Binomial-based) models for eQTL mapping.<\/jats:p><\/jats:sec><jats:sec><jats:title>Availability and implementation<\/jats:title><jats:p>All methods are implemented in the free software eQTLseq: https:\/\/github.com\/dvav\/eQTLseq<\/jats:p><\/jats:sec><jats:sec><jats:title>Supplementary information<\/jats:title><jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p><\/jats:sec>","DOI":"10.1093\/bioinformatics\/btx355","type":"journal-article","created":{"date-parts":[[2017,5,30]],"date-time":"2017-05-30T11:21:29Z","timestamp":1496143289000},"page":"3058-3064","source":"Crossref","is-referenced-by-count":4,"title":["Hierarchical probabilistic models for multiple gene\/variant associations based on next-generation sequencing data"],"prefix":"10.1093","volume":"33","author":[{"given":"Dimitrios V","family":"Vavoulis","sequence":"first","affiliation":[{"name":"The Nuffield Division of Clinical Laboratory Sciences, University of Oxford, Oxford, UK"},{"name":"The Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK"},{"name":"National Health Service Translational Molecular Diagnostics Centre, Oxford University Hospitals, John Radcliffe Hospital, Oxford, UK"},{"name":"National Institute for Health Research Oxford Biomedical Research Centre, Oxford, UK"}]},{"given":"Jenny C","family":"Taylor","sequence":"additional","affiliation":[{"name":"The Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK"},{"name":"National Institute for Health Research Oxford Biomedical Research Centre, Oxford, UK"}]},{"given":"Anna","family":"Schuh","sequence":"additional","affiliation":[{"name":"National Health Service Translational Molecular Diagnostics Centre, Oxford University Hospitals, John Radcliffe Hospital, Oxford, UK"},{"name":"National Institute for Health Research Oxford Biomedical Research Centre, Oxford, UK"},{"name":"Department of Oncology, University of Oxford, Oxford, UK"}]}],"member":"286","published-online":{"date-parts":[[2017,5,31]]},"reference":[{"key":"2023020206431524900_btx355-B1","doi-asserted-by":"crossref","first-page":"68","DOI":"10.1038\/nature15393","article-title":"A global reference for human genetic variation","volume":"526","author":"1000 Genomes Project Consortium","year":"2015","journal-title":"Nature"},{"key":"2023020206431524900_btx355-B2","doi-asserted-by":"crossref","first-page":"197","DOI":"10.1038\/nrg3891","article-title":"The role of regulatory variation in complex traits and disease","volume":"16","author":"Albert","year":"2015","journal-title":"Nat. Rev. Genet"},{"key":"2023020206431524900_btx355-B3","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1020281327116","article-title":"An introduction to mcmc for machine learning","volume":"50","author":"Andrieu","year":"2003","journal-title":"Mach. Learn"},{"key":"2023020206431524900_btx355-B4","doi-asserted-by":"crossref","first-page":"580","DOI":"10.1007\/s10519-009-9281-0","article-title":"Rank-based inverse normal transformations are increasingly used, but are they merited?","volume":"39","author":"Beasley","year":"2009","journal-title":"Behav. Genet"},{"key":"2023020206431524900_btx355-B5","doi-asserted-by":"crossref","first-page":"1449","DOI":"10.1534\/genetics.111.131425","article-title":"Bayesian detection of expression quantitative trait loci hot spots","volume":"189","author":"Bottolo","year":"2011","journal-title":"Genetics"},{"key":"2023020206431524900_btx355-B6","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1111\/j.2517-6161.1964.tb00553.x","article-title":"An analysis of transformations","volume":"26","author":"Box","year":"1964","journal-title":"J. R. Stat. Soc. Ser. B-Stat. Methodol"},{"key":"2023020206431524900_btx355-B7","doi-asserted-by":"crossref","first-page":"359","DOI":"10.1006\/csla.1999.0128","article-title":"An empirical study of smoothing techniques for language modeling","volume":"13","author":"Chen","year":"1999","journal-title":"Comput. Speech Lang"},{"key":"2023020206431524900_btx355-B8","doi-asserted-by":"crossref","first-page":"i139","DOI":"10.1093\/bioinformatics\/btu293","article-title":"Graph-regularized dual lasso for robust eqtl mapping","volume":"30","author":"Cheng","year":"2014","journal-title":"Bioinformatics"},{"key":"2023020206431524900_btx355-B9","doi-asserted-by":"crossref","first-page":"184","DOI":"10.1038\/nrg2537","article-title":"Mapping complex disease traits with global gene expression","volume":"10","author":"Cookson","year":"2009","journal-title":"Nat. Rev. Genet"},{"key":"2023020206431524900_btx355-B10","first-page":"697","volume-title":"Advances in Neural Information Processing Systems 14, Vols 1 and 2, Volume 14, Five Cambridge Center","author":"Figueiredo","year":"2002"},{"key":"2023020206431524900_btx355-B11","doi-asserted-by":"crossref","first-page":"e1003486.","DOI":"10.1371\/journal.pgen.1003486","article-title":"A statistical framework for joint eqtl analysis in multiple tissues","volume":"9","author":"Flutre","year":"2013","journal-title":"PLoS Genet"},{"key":"2023020206431524900_btx355-B12","doi-asserted-by":"crossref","first-page":"449.","DOI":"10.1186\/1471-2105-12-449","article-title":"Recount: a multi-experiment resource of analysis-ready rna-seq gene count datasets","volume":"12","author":"Frazee","year":"2011","journal-title":"BMC Bioinform"},{"key":"2023020206431524900_btx355-B13","doi-asserted-by":"crossref","first-page":"730","DOI":"10.1214\/009053604000001147","article-title":"Spike and slab variable selection: frequentist and Bayesian strategies","volume":"33","author":"Ishwaran","year":"2005","journal-title":"Ann. Stat"},{"key":"2023020206431524900_btx355-B14","doi-asserted-by":"crossref","first-page":"16.","DOI":"10.1186\/s13059-016-1142-6","article-title":"Integrated genome-wide analysis of expression quantitative trait loci aids interpretation of genomic association studies","volume":"18","author":"Joehanes","year":"2017","journal-title":"Genome Biol"},{"key":"2023020206431524900_btx355-B15","doi-asserted-by":"crossref","first-page":"1534","DOI":"10.1214\/10-AOAS435","article-title":"Nonparametric bayesian sparse factor models with application to gene expression modeling","volume":"5","author":"Knowles","year":"2011","journal-title":"Ann. Appl. Statistics"},{"key":"2023020206431524900_btx355-B16","doi-asserted-by":"crossref","first-page":"206","DOI":"10.1038\/ng.3467","article-title":"Fine-mapping cellular qtls with rasqual and atac-seq","volume":"48","author":"Kumasaka","year":"2016","journal-title":"Nat. Genet"},{"key":"2023020206431524900_btx355-B17","doi-asserted-by":"crossref","first-page":"248","DOI":"10.3732\/ajb.1100340","article-title":"A comparison of statistical methods for detecting differentially expressed genes from rna-seq data","volume":"99","author":"Kvam","year":"2012","journal-title":"Am. J. Bot"},{"key":"2023020206431524900_btx355-B18","doi-asserted-by":"crossref","first-page":"506","DOI":"10.1038\/nature12531","article-title":"Transcriptome and genome sequencing uncovers functional variation in humans","volume":"501","author":"Lappalainen","year":"2013","journal-title":"Nature"},{"key":"2023020206431524900_btx355-B19","doi-asserted-by":"crossref","first-page":"R29","DOI":"10.1186\/gb-2014-15-2-r29","article-title":"voom: precision weights unlock linear model analysis tools for rna-seq read counts","volume":"15","author":"Law","year":"2014","journal-title":"Genome Biol"},{"key":"2023020206431524900_btx355-B20","doi-asserted-by":"crossref","first-page":"255","DOI":"10.2307\/2532051","article-title":"A concordance correlation-coefficient to evaluate reproducibility","volume":"45","author":"Lin","year":"1989","journal-title":"Biometrics"},{"key":"2023020206431524900_btx355-B21","doi-asserted-by":"crossref","first-page":"22","DOI":"10.1186\/s13059-014-0560-6","article-title":"Gateways to the fantom5 promoter level mammalian expression atlas","volume":"16","author":"Lizio","year":"2015","journal-title":"Genome Biol"},{"key":"2023020206431524900_btx355-B22","doi-asserted-by":"crossref","first-page":"550.","DOI":"10.1186\/s13059-014-0550-8","article-title":"Moderated estimation of fold change and dispersion for rna-seq data with deseq2","volume":"15","author":"Love","year":"2014","journal-title":"Genome Biol"},{"key":"2023020206431524900_btx355-B23","doi-asserted-by":"crossref","first-page":"442","DOI":"10.1016\/0005-2795(75)90109-9","article-title":"Comparison of predicted and observed secondary structure of t4 phage lysozyme","volume":"405","author":"Matthews","year":"1975","journal-title":"Biochim. Biophys. Acta"},{"key":"2023020206431524900_btx355-B24","doi-asserted-by":"crossref","first-page":"122","DOI":"10.1186\/s13059-016-0974-4","article-title":"The ensembl variant effect predictor","volume":"17","author":"McLaren","year":"2016","journal-title":"Genome Biol"},{"key":"2023020206431524900_btx355-B25","doi-asserted-by":"crossref","first-page":"773","DOI":"10.1038\/nature08903","article-title":"Transcriptome genetics using second generation sequencing in a Caucasian population","volume":"464","author":"Montgomery","year":"2010","journal-title":"Nature"},{"key":"2023020206431524900_btx355-B26","first-page":"85","article-title":"A review of Bayesian variable selection methods: what, how and which","volume":"4","author":"O\u2019Hara","year":"2009","journal-title":"Bayesian Anal"},{"key":"2023020206431524900_btx355-B27","doi-asserted-by":"crossref","first-page":"681","DOI":"10.1198\/016214508000000337","article-title":"The Bayesian lasso","volume":"103","author":"Park","year":"2008","journal-title":"J. Am. Stat. Assoc"},{"key":"2023020206431524900_btx355-B28","doi-asserted-by":"crossref","first-page":"1339","DOI":"10.1080\/01621459.2013.829001","article-title":"Bayesian inference for logistic models using polya-gamma latent variables","volume":"108","author":"Polson","year":"2013","journal-title":"J. Am. Stat. Assoc"},{"key":"2023020206431524900_btx355-B29","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1093\/bib\/bbt086","article-title":"Comparison of software packages for detecting differential expression in rna-seq studies","volume":"16","author":"Seyednasrollah","year":"2015","journal-title":"Brief Bioinform"},{"key":"2023020206431524900_btx355-B30","doi-asserted-by":"crossref","first-page":"1353","DOI":"10.1093\/bioinformatics\/bts163","article-title":"Matrix eqtl: ultra fast eqtl analysis via large matrix operations","volume":"28","author":"Shabalin","year":"2012","journal-title":"Bioinformatics"},{"key":"2023020206431524900_btx355-B31","doi-asserted-by":"crossref","first-page":"91.","DOI":"10.1186\/1471-2105-14-91","article-title":"A comparison of methods for differential expression analysis of rna-seq data","volume":"14","author":"Soneson","year":"2013","journal-title":"BMC Bioinform"},{"key":"2023020206431524900_btx355-B32","doi-asserted-by":"crossref","first-page":"198","DOI":"10.1007\/s12561-012-9068-3","article-title":"eqtl mapping using rna-seq data","volume":"5","author":"Sun","year":"2013","journal-title":"Stat. Biosci"},{"key":"2023020206431524900_btx355-B33","first-page":"211","article-title":"Sparse Bayesian learning and the relevance vector machine","volume":"1","author":"Tipping","year":"2001","journal-title":"J. Mach. Learn. Res"},{"key":"2023020206431524900_btx355-B34","doi-asserted-by":"crossref","first-page":"39.","DOI":"10.1186\/s13059-015-0604-6","article-title":"Dgeclust: differential expression analysis of clustered count data","volume":"16","author":"Vavoulis","year":"2015","journal-title":"Genome Biol"},{"key":"2023020206431524900_btx355-B35","doi-asserted-by":"crossref","first-page":"57","DOI":"10.1038\/nrg2484","article-title":"Rna-seq: a revolutionary tool for transcriptomics","volume":"10","author":"Wang","year":"2009","journal-title":"Nat. Rev. Genet"},{"key":"2023020206431524900_btx355-B36","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1890\/10-0340.1","article-title":"The arcsine is asinine: the analysis of proportions in ecology","volume":"92","author":"Warton","year":"2011","journal-title":"Ecology"},{"key":"2023020206431524900_btx355-B37","first-page":"733","volume-title":"Bayesian Statistics 7, Walton St","author":"West","year":"2003"},{"key":"2023020206431524900_btx355-B38","doi-asserted-by":"crossref","first-page":"232","DOI":"10.1093\/biostatistics\/kxs033","article-title":"A new shrinkage estimator for dispersion improves differential expression detection in rna-seq data","volume":"14","author":"Wu","year":"2013","journal-title":"Biostatistics"},{"key":"2023020206431524900_btx355-B39","doi-asserted-by":"crossref","first-page":"1045","DOI":"10.1534\/genetics.107.085589","article-title":"Bayesian lasso for quantitative trait loci mapping","volume":"179","author":"Yi","year":"2008","journal-title":"Genetics"},{"key":"2023020206431524900_btx355-B40","doi-asserted-by":"crossref","first-page":"e85150","DOI":"10.1371\/journal.pone.0085150","article-title":"Transforming rna-seq data to improve the performance of prognostic gene signatures","volume":"9","author":"Zwiener","year":"2014","journal-title":"PLoS One"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/33\/19\/3058\/49041009\/bioinformatics_33_19_3058.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/33\/19\/3058\/49041009\/bioinformatics_33_19_3058.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,6,24]],"date-time":"2024-06-24T12:15:59Z","timestamp":1719231359000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/33\/19\/3058\/3859180"}},"subtitle":[],"editor":[{"given":"Ziv","family":"Bar-Joseph","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2017,5,31]]},"references-count":40,"journal-issue":{"issue":"19","published-print":{"date-parts":[[2017,10,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btx355","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2017,10,1]]},"published":{"date-parts":[[2017,5,31]]}}}