{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,22]],"date-time":"2025-02-22T00:39:16Z","timestamp":1740184756019,"version":"3.37.3"},"reference-count":42,"publisher":"Oxford University Press (OUP)","issue":"15","license":[{"start":{"date-parts":[[2021,2,1]],"date-time":"2021-02-01T00:00:00Z","timestamp":1612137600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,8,9]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Motivation<\/jats:title><jats:p>Data generated from high-throughput technologies such as sequencing, microarray and bead-chip technologies are unavoidably affected by batch effects (BEs). Large effort has been put into developing methods for correcting these effects. Often, BE correction and hypothesis testing cannot be done with one single model, but are done successively with separate models in data analysis pipelines. This potentially leads to biased P-values or false discovery rates due to the influence of BE correction on the data.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>We present a novel approach for estimating null distributions of test statistics in data analysis pipelines where BE correction is followed by linear model analysis. The approach is based on generating simulated datasets by random rotation and thereby retains the dependence structure of genes adequately. 
This allows estimating null distributions of dependent test statistics, and thus the calculation of resampling-based P-values and false-discovery rates following BE correction while maintaining the alpha level.<\/jats:p><\/jats:sec><jats:sec><jats:title>Availability<\/jats:title><jats:p>The described methods are implemented as randRotation package on Bioconductor: https:\/\/bioconductor.org\/packages\/randRotation\/<\/jats:p><\/jats:sec><jats:sec><jats:title>Contact<\/jats:title><jats:p>p.hettegger@gmail.com<\/jats:p><\/jats:sec><jats:sec><jats:title>Supplementary information<\/jats:title><jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p><\/jats:sec>","DOI":"10.1093\/bioinformatics\/btab063","type":"journal-article","created":{"date-parts":[[2021,1,27]],"date-time":"2021-01-27T20:11:14Z","timestamp":1611778274000},"page":"2142-2149","source":"Crossref","is-referenced-by-count":0,"title":["Random rotation for identifying differentially expressed genes with linear models following batch effect correction"],"prefix":"10.1093","volume":"37","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8557-588X","authenticated-orcid":false,"given":"Peter","family":"Hettegger","sequence":"first","affiliation":[{"name":"Competence Unit Molecular Diagnostics, Health and Environment Department, Austrian Institute of Technology , Vienna 1220, Austria"}]},{"given":"Klemens","family":"Vierlinger","sequence":"additional","affiliation":[{"name":"Competence Unit Molecular Diagnostics, Health and Environment Department, Austrian Institute of Technology , Vienna 1220, Austria"}]},{"given":"Andreas","family":"Weinhaeusel","sequence":"additional","affiliation":[{"name":"Competence Unit Molecular Diagnostics, Health and Environment Department, Austrian Institute of Technology , Vienna 1220, 
Austria"}]}],"member":"286","published-online":{"date-parts":[[2021,2,1]]},"reference":[{"key":"2023061310295847200_btab063-B1","doi-asserted-by":"crossref","first-page":"626","DOI":"10.1139\/f01-004","article-title":"Permutation tests for univariate or multivariate analysis of variance and regression","volume":"58","author":"Anderson","year":"2001","journal-title":"Canadian J. Fish. Aquat. Sci"},{"key":"2023061310295847200_btab063-B2","doi-asserted-by":"crossref","first-page":"289","DOI":"10.1111\/j.2517-6161.1995.tb02031.x","article-title":"Controlling the false discovery rate: a practical and powerful approach to multiple testing","volume":"57","author":"Benjamini","year":"1995","journal-title":"J. R. Stat. Soc. B"},{"key":"2023061310295847200_btab063-B3","doi-asserted-by":"crossref","first-page":"1165","DOI":"10.1214\/aos\/1013699998","article-title":"The control of the false discovery rate in multiple testing under dependency","volume":"29","author":"Benjamini","year":"2001","journal-title":"Ann. Stat"},{"key":"2023061310295847200_btab063-B4","doi-asserted-by":"crossref","first-page":"1","DOI":"10.2202\/1544-6115.1418","article-title":"Rotation testing in gene set enrichment analysis for small direct comparison experiments","volume":"8","author":"D\u00f8rum","year":"2009","journal-title":"Stat. Appl. Genet. Mol. Biol"},{"key":"2023061310295847200_btab063-B5","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1214\/aos\/1176344552","article-title":"Bootstrap methods: another look at the Jackknife","volume":"7","author":"Efron","year":"1979","journal-title":"Ann. 
Stat"},{"key":"2023061310295847200_btab063-B6","doi-asserted-by":"publisher","first-page":"619","DOI":"10.1198\/016214504000000692","article-title":"The Estimation of Prediction Error: Covariance penalties and cross-validation","volume":"99","author":"Efron","year":"2004","journal-title":"Journal of the American Statistical Association"},{"key":"2023061310295847200_btab063-B7","doi-asserted-by":"crossref","first-page":"2634","DOI":"10.1093\/bioinformatics\/bty117","article-title":"Mitigating the adverse impact of batch effects in sample pattern detection","volume":"34","author":"Fei","year":"2018","journal-title":"Bioinformatics"},{"key":"2023061310295847200_btab063-B8","doi-asserted-by":"crossref","first-page":"e1006102","DOI":"10.1371\/journal.pcbi.1006102","article-title":"Correcting for batch effects in case-control microbiome studies","volume":"14","author":"Gibbons","year":"2018","journal-title":"PLoS Comput. Biol"},{"key":"2023061310295847200_btab063-B9","doi-asserted-by":"crossref","first-page":"498","DOI":"10.1016\/j.tibtech.2017.02.012","article-title":"Why batch effects matter in omics data, and how to avoid them","volume":"35","author":"Goh","year":"2017","journal-title":"Trends Biotechnol"},{"key":"2023061310295847200_btab063-B10","doi-asserted-by":"crossref","first-page":"421","DOI":"10.1038\/nbt.4091","article-title":"Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors","volume":"36","author":"Haghverdi","year":"2018","journal-title":"Nat. Biotechnol"},{"key":"2023061310295847200_btab063-B11","doi-asserted-by":"crossref","DOI":"10.1371\/annotation\/58cf4d21-f9b0-4292-94dd-3177f393a284","article-title":"Differential expression analysis for pathways","author":"Haynes","year":"2013","journal-title":"PLoS Comput. 
Biol"},{"key":"2023061310295847200_btab063-B12","doi-asserted-by":"crossref","DOI":"10.1186\/s12859-015-0870-z","article-title":"Combining location-and-scale batch effect adjustment with data cleaning by latent factor adjustment","author":"Hornung","year":"2016","journal-title":"BMC Bioinformatics"},{"key":"2023061310295847200_btab063-B13","doi-asserted-by":"crossref","first-page":"1182","DOI":"10.1093\/bioinformatics\/bts096","article-title":"R\/DWD: distance-weighted discrimination for classification, visualization and batch adjustment","volume":"28","author":"Huang","year":"2012","journal-title":"Bioinformatics"},{"key":"2023061310295847200_btab063-B14","doi-asserted-by":"crossref","first-page":"115","DOI":"10.1038\/nmeth.3252","article-title":"Orchestrating high-throughput genomic analysis with bioconductor","volume":"12","author":"Huber","year":"2015","journal-title":"Nat. Methods"},{"key":"2023061310295847200_btab063-B15","doi-asserted-by":"crossref","first-page":"118","DOI":"10.1093\/biostatistics\/kxj037","article-title":"Adjusting batch effects in microarray expression data using empirical Bayes methods","volume":"8","author":"Johnson","year":"2007","journal-title":"Biostatistics"},{"key":"2023061310295847200_btab063-B16","doi-asserted-by":"crossref","first-page":"53","DOI":"10.1007\/s11222-005-4789-5","article-title":"Rotation tests","volume":"15","author":"Langsrud","year":"2005","journal-title":"Stat. 
Comput"},{"key":"2023061310295847200_btab063-B17","doi-asserted-by":"crossref","first-page":"R29","DOI":"10.1186\/gb-2014-15-2-r29","article-title":"Voom: precision weights unlock linear model analysis tools for RNA-seq read counts","volume":"15","author":"Law","year":"2014","journal-title":"Genome Biol"},{"key":"2023061310295847200_btab063-B18","doi-asserted-by":"crossref","first-page":"e161","DOI":"10.1371\/journal.pgen.0030161","article-title":"Capturing heterogeneity in gene expression studies by surrogate variable analysis","volume":"3","author":"Leek","year":"2007","journal-title":"PLoS Genet"},{"key":"2023061310295847200_btab063-B19","doi-asserted-by":"crossref","first-page":"733","DOI":"10.1038\/nrg2825","article-title":"Tackling the widespread and critical impact of batch effects in high-throughput data","volume":"11","author":"Leek","year":"2010","journal-title":"Nat. Rev. Genet"},{"year":"2020","author":"Leek","key":"2023061310295847200_btab063-B20"},{"key":"2023061310295847200_btab063-B21","doi-asserted-by":"crossref","first-page":"10849","DOI":"10.1038\/s41598-017-11110-6","article-title":"A novel statistical method to diagnose, quantify and correct batch effects in genomic studies","volume":"7","author":"Nyamundanda","year":"2017","journal-title":"Sci. Rep"},{"key":"2023061310295847200_btab063-B22","doi-asserted-by":"crossref","first-page":"29","DOI":"10.1093\/biostatistics\/kxv027","article-title":"Methods that remove batch effects while retaining group differences may lead to exaggerated confidence in downstream analyses","volume":"17","author":"Nygaard","year":"2015","journal-title":"Biostatistics (Oxford, England)"},{"key":"2023061310295847200_btab063-B23","doi-asserted-by":"crossref","first-page":"356","DOI":"10.1016\/j.spl.2005.04.057","article-title":"Bootstrap hypothesis testing in regression models","volume":"74","author":"Paparoditis","year":"2005","journal-title":"Stat. Probabil. 
Lett"},{"key":"2023061310295847200_btab063-B24","article-title":"Permutation P-values should never be zero: calculating exact P-values when permutations are randomly drawn","volume":"9, Article39","author":"Phipson","year":"2010","journal-title":"Stat. Appl. Genet. Mol. Biol"},{"key":"2023061310295847200_btab063-B25","doi-asserted-by":"crossref","first-page":"83","DOI":"10.3389\/fgene.2018.00083","article-title":"Adjusting for batch effects in DNA methylation microarray data, a lesson learned","volume":"9","author":"Price","year":"2018","journal-title":"Front. Genet"},{"volume-title":"R: A Language and Environment for Statistical Computing","year":"2020","author":"R Core Team","key":"2023061310295847200_btab063-B26"},{"key":"2023061310295847200_btab063-B27","doi-asserted-by":"crossref","first-page":"368","DOI":"10.1093\/bioinformatics\/btf877","article-title":"Identifying differentially expressed genes using false discovery rate controlling procedures","volume":"19","author":"Reiner","year":"2003","journal-title":"Bioinformatics"},{"key":"2023061310295847200_btab063-B28","doi-asserted-by":"crossref","first-page":"e47","DOI":"10.1093\/nar\/gkv007","article-title":"limma powers differential expression analyses for RNA-sequencing and microarray studies","volume":"43","author":"Ritchie","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2023061310295847200_btab063-B29","doi-asserted-by":"crossref","first-page":"i908","DOI":"10.1093\/bioinformatics\/bty553","article-title":"An ontology-based method for assessing batch effect adjustment approaches in heterogeneous datasets","volume":"34","author":"Schmidt","year":"2018","journal-title":"Bioinformatics"},{"key":"2023061310295847200_btab063-B30","doi-asserted-by":"crossref","first-page":"2539","DOI":"10.1093\/bioinformatics\/btx196","article-title":"Removal of batch effects using distribution-matching residual 
networks","volume":"33","author":"Shaham","year":"2017","journal-title":"Bioinformatics"},{"key":"2023061310295847200_btab063-B31","doi-asserted-by":"crossref","first-page":"397","DOI":"10.1007\/0-387-29362-0_23","article-title":"limma: linear models for microarray data","author":"Smyth","year":"2005","journal-title":"Bioinformatics and Computational Biology Solutions Using R and Bioconductor"},{"key":"2023061310295847200_btab063-B32","doi-asserted-by":"crossref","first-page":"1","DOI":"10.2202\/1544-6115.1027","article-title":"Linear models and empirical bayes methods for assessing differential expression in microarray experiments","volume":"3","author":"Smyth","year":"2004","journal-title":"Stat. Appl. Genet. Mol. Biol"},{"key":"2023061310295847200_btab063-B33","doi-asserted-by":"crossref","first-page":"2067","DOI":"10.1093\/bioinformatics\/bti270","article-title":"Use of within-array replicate spots for assessing differential expression in microarray experiments","volume":"21","author":"Smyth","year":"2005","journal-title":"Bioinformatics"},{"key":"2023061310295847200_btab063-B34","doi-asserted-by":"crossref","first-page":"1335","DOI":"10.1214\/aos\/1176350158","article-title":"Discussion: jackknife, bootstrap and other resampling methods in regression analysis","volume":"14","author":"Tibshirani","year":"1986","journal-title":"Ann. 
Stat"},{"key":"2023061310295847200_btab063-B35","doi-asserted-by":"crossref","first-page":"e83757","DOI":"10.1371\/journal.pone.0083757","article-title":"Comparing the biological impact of glatiramer acetate with the biological impact of a generic","volume":"9","author":"Towfic","year":"2014","journal-title":"PLoS ONE"},{"key":"2023061310295847200_btab063-B36","doi-asserted-by":"crossref","first-page":"381","DOI":"10.1016\/j.neuroimage.2014.01.060","article-title":"Permutation inference for the general linear model","volume":"92","author":"Winkler","year":"2014","journal-title":"NeuroImage"},{"key":"2023061310295847200_btab063-B37","first-page":"1343","article-title":"Jackknife, Bootstrap and other resampling methods in regression analysis","volume":"14","author":"Wu","year":"1986","journal-title":"Ann. Stat"},{"key":"2023061310295847200_btab063-B38","doi-asserted-by":"crossref","first-page":"2176","DOI":"10.1093\/bioinformatics\/btq401","article-title":"ROAST: rotation gene set tests for complex microarray experiments","volume":"26","author":"Wu","year":"2010","journal-title":"Bioinformatics"},{"key":"2023061310295847200_btab063-B39","doi-asserted-by":"crossref","first-page":"120","DOI":"10.1080\/01621459.1998.10474094","article-title":"On measuring and correcting the effects of data mining and model selection","volume":"93","author":"Ye","year":"1998","journal-title":"J. Am. Stat. Assoc"},{"key":"2023061310295847200_btab063-B40","doi-asserted-by":"crossref","first-page":"171","DOI":"10.1016\/S0378-3758(99)00041-5","article-title":"Resampling-based false discovery rate controlling multiple test procedures for correlated test statistics","volume":"82","author":"Yekutieli","year":"1999","journal-title":"J. Stat. Plann. 
Infer"},{"key":"2023061310295847200_btab063-B41","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12859-018-2263-6","article-title":"Alternative empirical Bayes models for adjusting for batch effects in genomic studies","volume":"19","author":"Zhang","year":"2018","journal-title":"BMC Bioinformatics"},{"key":"2023061310295847200_btab063-B42","doi-asserted-by":"crossref","first-page":"10","DOI":"10.1109\/MSP.2007.4286560","article-title":"Bootstrap methods and applications","volume":"24","author":"Zoubir","year":"2007","journal-title":"IEEE Signal Process. Mag"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btab063\/38504076\/btab063.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/15\/2142\/50578601\/btab063.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/15\/2142\/50578601\/btab063.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,8,23]],"date-time":"2024-08-23T02:38:43Z","timestamp":1724380723000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/37\/15\/2142\/6125383"}},"subtitle":[],"editor":[{"given":"Anthony","family":"Mathelier","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2021,2,1]]},"references-count":42,"journal-issue":{"issue":"15","published-print":{"date-parts":[[2021,8,9]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btab063","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"type":"print","value":"1367-4803"},{"type":"electronic","value":"1367-4811"}],"subject":[],"published-oth
er":{"date-parts":[[2021,8,1]]},"published":{"date-parts":[[2021,2,1]]}}}