{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,26]],"date-time":"2026-03-26T16:59:31Z","timestamp":1774544371447,"version":"3.50.1"},"reference-count":44,"publisher":"Oxford University Press (OUP)","issue":"7","license":[{"start":{"date-parts":[[2017,10,9]],"date-time":"2017-10-09T00:00:00Z","timestamp":1507507200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/about_us\/legal\/notices"}],"funder":[{"DOI":"10.13039\/501100006606","name":"Natural Science Foundation of Tianjin","doi-asserted-by":"publisher","award":["15JCYBJC18900"],"award-info":[{"award-number":["15JCYBJC18900"]}],"id":[{"id":"10.13039\/501100006606","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["31728013"],"award-info":[{"award-number":["31728013"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["DMS-1263932"],"award-info":[{"award-number":["DMS-1263932"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100004917","name":"Cancer Prevention and Research Institute of Texas","doi-asserted-by":"publisher","award":["RP-170387"],"award-info":[{"award-number":["RP-170387"]}],"id":[{"id":"10.13039\/100004917","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100001174","name":"Houston Endowment","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100001174","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2018,4,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Motivation<\/jats:title><jats:p>Batch effects are one of the major sources of 
technical variation that affect the measurements in high-throughput studies such as RNA sequencing. It has been well established that batch effects can be caused by different experimental platforms, laboratory conditions, different sources of samples and personnel differences. These differences can confound the outcomes of interest and lead to spurious results. A critical input for batch correction algorithms is the knowledge of batch factors, which in many cases are unknown or inaccurate. Hence, the primary motivation of our paper is to detect hidden batch factors that can be used in standard techniques to accurately capture the relationship between gene expression and other modeled variables of interest.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>We introduce a new algorithm based on data-adaptive shrinkage and semi-Non-negative Matrix Factorization for the detection of unknown batch effects. We test our algorithm on three different datasets: (i) Sequencing Quality Control, (ii) Topotecan RNA-Seq and (iii) Single-cell RNA sequencing (scRNA-Seq) on Glioblastoma Multiforme. We demonstrate superior performance in identifying hidden batch effects compared with existing algorithms for batch detection in all three datasets. In the Topotecan study, we were able to identify a new batch factor that was missed by the original study, leading to under-representation of differentially expressed genes. 
For scRNA-Seq, we demonstrated the power of our method in detecting subtle batch effects.<\/jats:p><\/jats:sec><jats:sec><jats:title>Availability and implementation<\/jats:title><jats:p>DASC R package is available via Bioconductor or at https:\/\/github.com\/zhanglabNKU\/DASC.<\/jats:p><\/jats:sec><jats:sec><jats:title>Supplementary information<\/jats:title><jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p><\/jats:sec>","DOI":"10.1093\/bioinformatics\/btx635","type":"journal-article","created":{"date-parts":[[2017,10,6]],"date-time":"2017-10-06T19:12:01Z","timestamp":1507317121000},"page":"1141-1147","source":"Crossref","is-referenced-by-count":20,"title":["Detecting hidden batch factors through data-adaptive adjustment for biological effects"],"prefix":"10.1093","volume":"34","author":[{"given":"Haidong","family":"Yi","sequence":"first","affiliation":[{"name":"College of Computer and Control Engineering, Nankai University, Tianjin, China"}]},{"given":"Ayush T","family":"Raman","sequence":"additional","affiliation":[{"name":"Graduate Program in Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, Houston, TX, USA"},{"name":"Department of Pediatrics, Neurological Research Institute, Baylor College of Medicine, Houston, TX, USA"}]},{"given":"Han","family":"Zhang","sequence":"additional","affiliation":[{"name":"College of Computer and Control Engineering, Nankai University, Tianjin, China"},{"name":"Tianjin Key Laboratory of Intelligent Robotics, Nankai University, Tianjin, China"}]},{"given":"Genevera I","family":"Allen","sequence":"additional","affiliation":[{"name":"Department of Statistics, Rice University, Houston, TX, USA"}]},{"given":"Zhandong","family":"Liu","sequence":"additional","affiliation":[{"name":"Graduate Program in Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, Houston, TX, USA"},{"name":"Department of Pediatrics, Neurological Research 
Institute, Baylor College of Medicine, Houston, TX, USA"}]}],"member":"286","published-online":{"date-parts":[[2017,10,9]]},"reference":[{"key":"2023012712570295900_btx635-B1","doi-asserted-by":"crossref","first-page":"17","DOI":"10.1038\/ng0707-807","article-title":"On the design and analysis of gene expression studies in human populations","volume":"39","author":"Akey","year":"2007","journal-title":"Nat. Genet"},{"key":"2023012712570295900_btx635-B2","doi-asserted-by":"crossref","first-page":"169.","DOI":"10.1186\/s12859-016-1327-8","article-title":"GFS: fuzzy preprocessing for effective gene expression analysis","volume":"17","author":"Belorkar","year":"2016","journal-title":"BMC Bioinformatics"},{"key":"2023012712570295900_btx635-B3","doi-asserted-by":"crossref","first-page":"105","DOI":"10.1093\/bioinformatics\/btg385","article-title":"Adjustment of systematic microarray data bases","volume":"20","author":"Benito","year":"2004","journal-title":"Bioinformatics"},{"key":"2023012712570295900_btx635-B4","doi-asserted-by":"crossref","first-page":"4164","DOI":"10.1073\/pnas.0308531101","article-title":"Metagenes and molecular pattern discovery using matrix factorization","volume":"101","author":"Brunet","year":"2004","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023012712570295900_btx635-B5","first-page":"21","article-title":"Splitting methods for convex clustering","volume":"212","author":"Chi","year":"2013","journal-title":"J. Comput. Graph. Statist"},{"key":"2023012712570295900_btx635-B6","author":"Chung","year":"1997"},{"key":"2023012712570295900_btx635-B7","doi-asserted-by":"crossref","first-page":"319","DOI":"10.1038\/nbt.3838","article-title":"Reproducible RNA-seq analysis using recount2","volume":"35","author":"Collado-Torres","year":"2017","journal-title":"Nat. 
Biotechnol"},{"key":"2023012712570295900_btx635-B8","doi-asserted-by":"crossref","first-page":"e1004226.","DOI":"10.1371\/journal.pgen.1004226","article-title":"The functional consequences of variation in transcription factor binding","volume":"10","author":"Cusanovich","year":"2014","journal-title":"PLoS Genet"},{"key":"2023012712570295900_btx635-B9","author":"Ding","year":"2005"},{"key":"2023012712570295900_btx635-B10","doi-asserted-by":"crossref","first-page":"45","DOI":"10.1109\/TPAMI.2008.277","article-title":"Convex and semi-nonnegative matrix factorizations","volume":"32","author":"Ding","year":"2010","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell"},{"key":"2023012712570295900_btx635-B11","doi-asserted-by":"crossref","first-page":"15","DOI":"10.1093\/bioinformatics\/bts635","article-title":"STAR: ultrafast universal RNA-seq aligner","volume":"29","author":"Dobin","year":"2013","journal-title":"Bioinformatics"},{"key":"2023012712570295900_btx635-B12","doi-asserted-by":"crossref","first-page":"636","DOI":"10.1126\/science.1105136","article-title":"The ENCODE (Encyclopedia of DNA Elements) project","volume":"306","author":"Feingold","year":"2004","journal-title":"Science"},{"key":"2023012712570295900_btx635-B13","doi-asserted-by":"crossref","first-page":"539","DOI":"10.1093\/biostatistics\/kxr034","article-title":"Using control genes to correct for unwanted variation in microarray data","volume":"13","author":"Gagnon-Bartsch","year":"2012","journal-title":"Biostatistics"},{"key":"2023012712570295900_btx635-B14","doi-asserted-by":"crossref","first-page":"1.","DOI":"10.1186\/1471-2105-11-367","article-title":"A flexible R package for nonnegative matrix factorization","volume":"11","author":"Gaujoux","year":"2010","journal-title":"BMC Bioinformatics"},{"key":"2023012712570295900_btx635-B15","doi-asserted-by":"crossref","first-page":"121","DOI":"10.12688\/f1000research.6536.1","article-title":"A reanalysis of mouse encode comparative gene expression 
data","volume":"4","author":"Gilad","year":"2015","journal-title":"F1000Res"},{"key":"2023012712570295900_btx635-B16","volume-title":"The Elements of Statistical Learning","author":"Hastie","year":"2003"},{"key":"2023012712570295900_btx635-B17","author":"Hicks","year":"2017"},{"key":"2023012712570295900_btx635-B18","doi-asserted-by":"crossref","first-page":"55","DOI":"10.1080\/00401706.1970.10488634","article-title":"Ridge regression: biased estimation for nonorthogonal problems","volume":"12","author":"Hoerl","year":"1970","journal-title":"Technometrics"},{"key":"2023012712570295900_btx635-B19","author":"Hornung","year":"2016"},{"key":"2023012712570295900_btx635-B20","doi-asserted-by":"crossref","first-page":"118","DOI":"10.1093\/biostatistics\/kxj037","article-title":"Adjusting batch effects in microarray expression data using empirical Bayes methods","volume":"8","author":"Johnson","year":"2007","journal-title":"Biostatistics"},{"key":"2023012712570295900_btx635-B21","doi-asserted-by":"crossref","first-page":"1495","DOI":"10.1093\/bioinformatics\/btm134","article-title":"Sparse non-negative matrix factorizations via alternating non-negativity-constrained least squares for microarray data analysis","volume":"23","author":"Kim","year":"2007","journal-title":"Bioinformatics"},{"key":"2023012712570295900_btx635-B22","doi-asserted-by":"crossref","first-page":"58","DOI":"10.1038\/nature12504","article-title":"Topoisomerases facilitate transcription of long genes linked to autism","volume":"501","author":"King","year":"2013","journal-title":"Nature"},{"key":"2023012712570295900_btx635-B23","author":"Lazar","year":"2012"},{"key":"2023012712570295900_btx635-B24","first-page":"556","article-title":"Algorithms for non-negative matrix factorization","volume":"13","author":"Lee","year":"2001","journal-title":"Adv. Neural Inform. Process. 
Syst"},{"key":"2023012712570295900_btx635-B25","doi-asserted-by":"crossref","DOI":"10.1038\/44565","article-title":"Learning the parts of objects by non-negative matrix factorization","author":"Lee","year":"1999","journal-title":"Nature"},{"key":"2023012712570295900_btx635-B26","doi-asserted-by":"crossref","first-page":"gku864.","DOI":"10.1093\/nar\/gku864","article-title":"svaseq: removing batch effects and other unwanted noise from sequencing data","volume":"42","author":"Leek","year":"2014","journal-title":"Nucleic Acids Res"},{"key":"2023012712570295900_btx635-B27","doi-asserted-by":"crossref","first-page":"1724","DOI":"10.1371\/journal.pgen.0030161","article-title":"Capturing heterogeneity in gene expression studies by surrogate variable analysis","volume":"3","author":"Leek","year":"2007","journal-title":"PLoS Genet"},{"key":"2023012712570295900_btx635-B28","doi-asserted-by":"crossref","first-page":"733","DOI":"10.1038\/nrg2825","article-title":"Tackling the widespread and critical impact of batch effects in high-throughput data","volume":"11","author":"Leek","year":"2010","journal-title":"Nat. Rev. 
Genet"},{"key":"2023012712570295900_btx635-B29","doi-asserted-by":"crossref","first-page":"882","DOI":"10.1093\/bioinformatics\/bts034","article-title":"The sva package for removing batch effects and other unwanted variation in high-throughput experiments","volume":"28","author":"Leek","year":"2012","journal-title":"Bioinformatics"},{"key":"2023012712570295900_btx635-B30","doi-asserted-by":"crossref","first-page":"1.","DOI":"10.1186\/s13059-014-0550-8","article-title":"Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2","volume":"15","author":"Love","year":"2014","journal-title":"Genome Biol"},{"key":"2023012712570295900_btx635-B31","doi-asserted-by":"crossref","first-page":"75.","DOI":"10.1186\/s13059-016-0947-7","article-title":"Pooling across cells to normalize single-cell RNA sequencing data with many zero counts","volume":"17","author":"Lun","year":"2016","journal-title":"Genome Biol"},{"key":"2023012712570295900_btx635-B32","doi-asserted-by":"crossref","first-page":"278","DOI":"10.1038\/tpj.2010.57","article-title":"A comparison of batch effect removal methods for enhancement of prediction performance using MAQC-II microarray gene expression data","volume":"10","author":"Luo","year":"2010","journal-title":"Pharmacogenomics J"},{"key":"2023012712570295900_btx635-B33","doi-asserted-by":"crossref","first-page":"1179","DOI":"10.1093\/bioinformatics\/btw777","article-title":"Scater: pre-processing, quality control, normalization and visualization of single-cell RNA-seq data in R","volume":"33","author":"McCarthy","year":"2017","journal-title":"Bioinformatics"},{"key":"2023012712570295900_btx635-B34","doi-asserted-by":"crossref","first-page":"e68141.","DOI":"10.1371\/journal.pone.0068141","article-title":"Normalizing RNA-sequencing data by modeling hidden covariates with prior knowledge","volume":"8","author":"Mostafavi","year":"2013","journal-title":"PLoS 
One"},{"key":"2023012712570295900_btx635-B35","doi-asserted-by":"crossref","first-page":"1396","DOI":"10.1126\/science.1254257","article-title":"Single-cell RNA-seq highlights intratumoral heterogeneity in primary glioblastoma","volume":"344","author":"Patel","year":"2014","journal-title":"Science"},{"key":"2023012712570295900_btx635-B36","doi-asserted-by":"crossref","first-page":"2877","DOI":"10.1093\/bioinformatics\/btt480","article-title":"A new statistic for identifying batch effects in high-throughput genomic data that uses guided principal component analysis","volume":"29","author":"Reese","year":"2013","journal-title":"Bioinformatics"},{"key":"2023012712570295900_btx635-B37","doi-asserted-by":"crossref","first-page":"896","DOI":"10.1038\/nbt.2931","article-title":"Normalization of RNA-seq data using factor analysis of control genes or samples","volume":"32","author":"Risso","year":"2014","journal-title":"Nat. Biotechnol"},{"key":"2023012712570295900_btx635-B38","doi-asserted-by":"crossref","DOI":"10.1002\/9780470685983","volume-title":"Batch Effects and Noise in Microarray Experiments: Sources and Solutions","author":"Scherer","year":"2009"},{"key":"2023012712570295900_btx635-B39","doi-asserted-by":"crossref","first-page":"e1000770.","DOI":"10.1371\/journal.pcbi.1000770","article-title":"A Bayesian framework to account for complex non-genetic factors in gene expression levels greatly increases power in eQTL studies","volume":"6","author":"Stegle","year":"2010","journal-title":"PLoS Comput. 
Biol"},{"key":"2023012712570295900_btx635-B40","doi-asserted-by":"crossref","first-page":"1.","DOI":"10.1186\/s12859-015-0478-3","article-title":"Removing batch effects from purified plasma cell gene expression microarrays with modified combat","volume":"16","author":"Stein","year":"2015","journal-title":"BMC Bioinformatics"},{"key":"2023012712570295900_btx635-B41","doi-asserted-by":"crossref","first-page":"903","DOI":"10.1038\/nbt.2957","article-title":"A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the sequencing quality control consortium","volume":"32","author":"Su","year":"2014","journal-title":"Nat. Biotechnol"},{"key":"2023012712570295900_btx635-B42","doi-asserted-by":"crossref","first-page":"1496","DOI":"10.1093\/bioinformatics\/btr171","article-title":"Independent surrogate variable analysis to deconvolve confounding factors in large-scale microarray profiling studies","volume":"27","author":"Teschendorff","year":"2011","journal-title":"Bioinformatics"},{"key":"2023012712570295900_btx635-B43","author":"Tung","year":"2016"},{"key":"2023012712570295900_btx635-B44","doi-asserted-by":"crossref","first-page":"1113","DOI":"10.1038\/ng.2764","article-title":"The Cancer Genome Atlas Pan-Cancer analysis project","volume":"45","author":"Weinstein","year":"2013","journal-title":"Nat. 
Genet"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/7\/1141\/48914430\/bioinformatics_34_7_1141.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/7\/1141\/48914430\/bioinformatics_34_7_1141.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,8,26]],"date-time":"2023-08-26T21:27:58Z","timestamp":1693085278000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/34\/7\/1141\/4386916"}},"subtitle":[],"editor":[{"given":"Ziv","family":"Bar-Joseph","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2017,10,9]]},"references-count":44,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2018,4,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btx635","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2018,4,1]]},"published":{"date-parts":[[2017,10,9]]}}}