{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,24]],"date-time":"2026-01-24T19:54:51Z","timestamp":1769284491088,"version":"3.49.0"},"reference-count":27,"publisher":"Oxford University Press (OUP)","issue":"19","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2014,10,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: Sample source, procurement process and other technical variations introduce batch effects into genomics data. Algorithms to remove these artifacts enhance differences between known biological covariates, but also carry potential concern of removing intragroup biological heterogeneity and thus any personalized genomic signatures. As a result, accurate identification of novel subtypes from batch-corrected genomics data is challenging using standard algorithms designed to remove batch effects for class comparison analyses. Nor can batch effects be corrected reliably in future applications of genomics-based clinical tests, in which the biological groups are by definition unknown a priori.<\/jats:p><jats:p>Results: Therefore, we assess the extent to which various batch correction algorithms remove true biological heterogeneity. We also introduce an algorithm, permuted-SVA (pSVA), using a new statistical model that is blind to biological covariates to correct for technical artifacts while retaining biological heterogeneity in genomic data. This algorithm facilitated accurate subtype identification in head and neck cancer from gene expression data in both formalin-fixed and frozen samples. When applied to predict Human Papillomavirus (HPV) status, pSVA improved cross-study validation even if the sample batches were highly confounded with HPV status in the training set.<\/jats:p><jats:p>Availability and implementation: All analyses were performed using R version 2.15.0. The code and data used to generate the results of this manuscript is available from https:\/\/sourceforge.net\/projects\/psva .<\/jats:p><jats:p>Contact: \u00a0ejfertig@jhmi.edu<\/jats:p><jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btu375","type":"journal-article","created":{"date-parts":[[2014,6,7]],"date-time":"2014-06-07T06:24:55Z","timestamp":1402122295000},"page":"2757-2763","source":"Crossref","is-referenced-by-count":121,"title":["Preserving biological heterogeneity with a permuted surrogate variable analysis for genomics batch correction"],"prefix":"10.1093","volume":"30","author":[{"given":"Hilary S.","family":"Parker","sequence":"first","affiliation":[{"name":"1 Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, 2 Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD 21205, USA, 3 Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow 119333, Russia, 4 Research Institute for Genetics and Selection of Industrial Microorganisms \u201cGosNIIGenetika\u201d, Moscow 117545, Russia, 5 Department of Statistics and Biostatistics, Rutgers University, NJ 08854, USA and 6 Division of Allergy & Clinical Immunology, Department of Medicine, Johns Hopkins University, Baltimore, MD 21224, USA"}]},{"given":"Jeffrey T.","family":"Leek","sequence":"additional","affiliation":[{"name":"1 Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, 2 Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD 21205, USA, 3 Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow 119333, Russia, 4 Research Institute for Genetics and Selection of Industrial Microorganisms \u201cGosNIIGenetika\u201d, Moscow 117545, Russia, 5 Department of Statistics and Biostatistics, Rutgers University, NJ 08854, USA and 6 Division of Allergy & Clinical Immunology, Department of Medicine, Johns Hopkins University, Baltimore, MD 21224, USA"}]},{"given":"Alexander V.","family":"Favorov","sequence":"additional","affiliation":[{"name":"1 Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, 2 Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD 21205, USA, 3 Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow 119333, Russia, 4 Research Institute for Genetics and Selection of Industrial Microorganisms \u201cGosNIIGenetika\u201d, Moscow 117545, Russia, 5 Department of Statistics and Biostatistics, Rutgers University, NJ 08854, USA and 6 Division of Allergy & Clinical Immunology, Department of Medicine, Johns Hopkins University, Baltimore, MD 21224, USA"},{"name":"1 Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, 2 Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD 21205, USA, 3 Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow 119333, Russia, 4 Research Institute for Genetics and Selection of Industrial Microorganisms \u201cGosNIIGenetika\u201d, Moscow 117545, Russia, 5 Department of Statistics and Biostatistics, Rutgers University, NJ 08854, USA and 6 Division of Allergy & Clinical Immunology, Department of Medicine, Johns Hopkins University, Baltimore, MD 21224, USA"},{"name":"1 Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, 2 Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD 21205, USA, 3 Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow 119333, Russia, 4 Research Institute for Genetics and Selection of Industrial Microorganisms \u201cGosNIIGenetika\u201d, Moscow 117545, Russia, 5 Department of Statistics and Biostatistics, Rutgers University, NJ 08854, USA and 6 Division of Allergy & Clinical Immunology, Department of Medicine, Johns Hopkins University, Baltimore, MD 21224, USA"}]},{"given":"Michael","family":"Considine","sequence":"additional","affiliation":[{"name":"1 Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, 2 Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD 21205, USA, 3 Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow 119333, Russia, 4 Research Institute for Genetics and Selection of Industrial Microorganisms \u201cGosNIIGenetika\u201d, Moscow 117545, Russia, 5 Department of Statistics and Biostatistics, Rutgers University, NJ 08854, USA and 6 Division of Allergy & Clinical Immunology, Department of Medicine, Johns Hopkins University, Baltimore, MD 21224, USA"}]},{"given":"Xiaoxin","family":"Xia","sequence":"additional","affiliation":[{"name":"1 Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, 2 Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD 21205, USA, 3 Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow 119333, Russia, 4 Research Institute for Genetics and Selection of Industrial Microorganisms \u201cGosNIIGenetika\u201d, Moscow 117545, Russia, 5 Department of Statistics and Biostatistics, Rutgers University, NJ 08854, USA and 6 Division of Allergy & Clinical Immunology, Department of Medicine, Johns Hopkins University, Baltimore, MD 21224, USA"}]},{"given":"Sameer","family":"Chavan","sequence":"additional","affiliation":[{"name":"1 Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, 2 Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD 21205, USA, 3 Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow 119333, Russia, 4 Research Institute for Genetics and Selection of Industrial Microorganisms \u201cGosNIIGenetika\u201d, Moscow 117545, Russia, 5 Department of Statistics and Biostatistics, Rutgers University, NJ 08854, USA and 6 Division of Allergy & Clinical Immunology, Department of Medicine, Johns Hopkins University, Baltimore, MD 21224, USA"}]},{"given":"Christine H.","family":"Chung","sequence":"additional","affiliation":[{"name":"1 Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, 2 Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD 21205, USA, 3 Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow 119333, Russia, 4 Research Institute for Genetics and Selection of Industrial Microorganisms \u201cGosNIIGenetika\u201d, Moscow 117545, Russia, 5 Department of Statistics and Biostatistics, Rutgers University, NJ 08854, USA and 6 Division of Allergy & Clinical Immunology, Department of Medicine, Johns Hopkins University, Baltimore, MD 21224, USA"}]},{"given":"Elana J.","family":"Fertig","sequence":"additional","affiliation":[{"name":"1 Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, 2 Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD 21205, USA, 3 Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow 119333, Russia, 4 Research Institute for Genetics and Selection of Industrial Microorganisms \u201cGosNIIGenetika\u201d, Moscow 117545, Russia, 5 Department of Statistics and Biostatistics, Rutgers University, NJ 08854, USA and 6 Division of Allergy & Clinical Immunology, Department of Medicine, Johns Hopkins University, Baltimore, MD 21224, USA"}]}],"member":"286","published-online":{"date-parts":[[2014,6,6]]},"reference":[{"key":"2023041303361692000_","doi-asserted-by":"crossref","first-page":"489","DOI":"10.1016\/S1535-6108(04)00112-6","article-title":"Molecular classification of head and neck squamous cell carcinomas using patterns of gene expression","volume":"5","author":"Chung","year":"2004","journal-title":"Cancer Cell"},{"key":"2023041303361692000_","doi-asserted-by":"crossref","first-page":"864","DOI":"10.1093\/annonc\/mdp390","article-title":"Nuclear factor-kappa b pathway and response in a phase ii trial of bortezomib and docetaxel in patients with recurrent and\/or metastatic head and neck squamous cell carcinoma","volume":"21","author":"Chung","year":"2010","journal-title":"Ann. Oncol."},{"key":"2023041303361692000_","doi-asserted-by":"crossref","first-page":"1804","DOI":"10.1002\/hed.21478","article-title":"Insulin-like growth factor-1 receptor inhibitor, amg-479, in cetuximab-refractory head and neck squamous cell carcinoma","volume":"33","author":"Chung","year":"2011","journal-title":"Head Neck"},{"key":"2023041303361692000_","doi-asserted-by":"crossref","first-page":"1127","DOI":"10.1038\/ng.2762","article-title":"Emerging landscape of oncogenic signatures across human cancers","volume":"45","author":"Ciriello","year":"2013","journal-title":"Nat. Genet."},{"key":"2023041303361692000_","doi-asserted-by":"crossref","first-page":"65","DOI":"10.1158\/0008-5472.CAN-08-0377","article-title":"A feed-forward loop involving protein kinase calpha and micrornas regulates tumor cell cycle","volume":"69","author":"Cohen","year":"2009","journal-title":"Cancer Res."},{"key":"2023041303361692000_","doi-asserted-by":"crossref","first-page":"519","DOI":"10.1038\/nature10524","article-title":"Temporal dynamics and genetic control of transcription in the human prefrontal cortex","volume":"478","author":"Colantuoni","year":"2011","journal-title":"Nature"},{"key":"2023041303361692000_","doi-asserted-by":"crossref","first-page":"2792","DOI":"10.1093\/bioinformatics\/btq503","article-title":"Cogaps: an r\/c++ package to identify patterns and biological process activity in transcriptomic data","volume":"26","author":"Fertig","year":"2010","journal-title":"Bioinformatics"},{"key":"2023041303361692000_","doi-asserted-by":"crossref","first-page":"539","DOI":"10.1093\/biostatistics\/kxr034","article-title":"Using control genes to correct for unwanted variation in microarray data","volume":"13","author":"Gagnon-Bartsch","year":"2012","journal-title":"Biostatistics"},{"key":"2023041303361692000_","doi-asserted-by":"crossref","first-page":"1007","DOI":"10.1002\/cncr.26364","article-title":"Phase 2 trial of oxaliplatin and pemetrexed as an induction regimen in locally advanced head and neck cancer","volume":"118","author":"Gilbert","year":"2012","journal-title":"Cancer"},{"key":"2023041303361692000_","doi-asserted-by":"crossref","first-page":"118","DOI":"10.1093\/biostatistics\/kxj037","article-title":"Adjusting batch effects in microarray expression data using empirical Bayes methods","volume":"8","author":"Johnson","year":"2007","journal-title":"Biostatistics"},{"key":"2023041303361692000_","doi-asserted-by":"crossref","first-page":"365s","DOI":"10.1200\/jco.2013.31.15_suppl.6010","article-title":"Genomic profiling of kinase genes in head and neck squamous cell carcinomas to identify potentially targetable genetic aberrations in fgfr1\/2, ddr2, epha2, and pik3ca","volume":"31","author":"Keck","year":"2013","journal-title":"J. Clin. Oncol."},{"key":"2023041303361692000_","doi-asserted-by":"crossref","first-page":"1724","DOI":"10.1371\/journal.pgen.0030161","article-title":"Capturing heterogeneity in gene expression studies by surrogate variable analysis","volume":"3","author":"Leek","year":"2007","journal-title":"PLoS Genet."},{"key":"2023041303361692000_","doi-asserted-by":"crossref","first-page":"733","DOI":"10.1038\/nrg2825","article-title":"Tackling the widespread and critical impact of batch effects in high-throughput data","volume":"11","author":"Leek","year":"2010","journal-title":"Nat. Rev. Genet."},{"key":"2023041303361692000_","doi-asserted-by":"crossref","first-page":"882","DOI":"10.1093\/bioinformatics\/bts034","article-title":"The sva package for removing batch effects and other unwanted variation in high-throughput experiments","volume":"28","author":"Leek","year":"2012","journal-title":"Bioinformatics"},{"key":"2023041303361692000_","doi-asserted-by":"crossref","first-page":"278","DOI":"10.1038\/tpj.2010.57","article-title":"A comparison of batch effect removal methods for enhancement of prediction performance using MAQC-ii microarray gene expression data","volume":"10","author":"Luo","year":"2010","journal-title":"Pharmacogenomics J."},{"key":"2023041303361692000_","doi-asserted-by":"crossref","first-page":"2950","DOI":"10.1093\/bioinformatics\/btl433","article-title":"Copa\u2013cancer outlier profile analysis","volume":"22","author":"MacDonald","year":"2006","journal-title":"Bioinformatics"},{"key":"2023041303361692000_","doi-asserted-by":"crossref","first-page":"242","DOI":"10.1093\/biostatistics\/kxp059","article-title":"Frozen robust multiarray analysis (fRMA)","volume":"11","author":"McCall","year":"2010","journal-title":"Biostatistics"},{"key":"2023041303361692000_","doi-asserted-by":"crossref","first-page":"Article 10","DOI":"10.1515\/1544-6115.1766","article-title":"The practical effect of batch on genomic prediction","volume":"11","author":"Parker","year":"2012","journal-title":"Stat. Appl. Genet. Mol. Biol."},{"key":"2023041303361692000_","article-title":"Removing batch effects for prediction problems with frozen surrogate variable analysis","author":"Parker","year":"2013","journal-title":"arXiv"},{"key":"2023041303361692000_","doi-asserted-by":"crossref","first-page":"4605","DOI":"10.1158\/0008-5472.CAN-06-3619","article-title":"Fundamental differences in cell cycle deregulation in human papillomavirus-positive and human papillomavirus-negative head\/neck and cervical cancers","volume":"67","author":"Pyeon","year":"2007","journal-title":"Cancer Res."},{"key":"2023041303361692000_","doi-asserted-by":"crossref","first-page":"492","DOI":"10.1016\/j.oraloncology.2010.02.013","article-title":"Refining the diagnosis of oropharyngeal squamous cell carcinoma using human papillomavirus testing","volume":"46","author":"Robinson","year":"2010","journal-title":"Oral Oncol."},{"issue":"3 Pt 1","key":"2023041303361692000_","doi-asserted-by":"crossref","first-page":"701","DOI":"10.1158\/1078-0432.CCR-05-2017","article-title":"Gene expression differences associated with human papillomavirus status in head and neck squamous cell carcinoma","volume":"12","author":"Slebos","year":"2006","journal-title":"Clin. Cancer Res."},{"key":"2023041303361692000_","doi-asserted-by":"crossref","first-page":"2465","DOI":"10.1002\/ijc.22980","article-title":"A novel algorithm for reliable detection of human papillomavirus in paraffin embedded head and neck cancer specimen","volume":"121","author":"Smeets","year":"2007","journal-title":"Int. J. Cancer"},{"key":"2023041303361692000_","doi-asserted-by":"crossref","first-page":"84","DOI":"10.1186\/1755-8794-4-84","article-title":"Batch effect correction for genome-wide methylation data with illumina infinium platform","volume":"4","author":"Sun","year":"2011","journal-title":"BMC Med. Genomics"},{"key":"2023041303361692000_","doi-asserted-by":"crossref","first-page":"6567","DOI":"10.1073\/pnas.082099299","article-title":"Diagnosis of multiple cancer types by shrunken centroids of gene expression","volume":"99","author":"Tibshirani","year":"2002","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023041303361692000_","doi-asserted-by":"crossref","first-page":"14","DOI":"10.1186\/1471-2164-14-14","article-title":"Quality assessment and data handling methods for affymetrix gene 1.0 ST arrays with variable RNA integrity","volume":"14","author":"Viljoen","year":"2013","journal-title":"BMC Genomics"},{"key":"2023041303361692000_","doi-asserted-by":"crossref","first-page":"e56823","DOI":"10.1371\/journal.pone.0056823","article-title":"Molecular subtypes in head and neck cancer exhibit distinct patterns of chromosomal gain and loss of canonical cancer genes","volume":"8","author":"Walter","year":"2013","journal-title":"PLoS One"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/30\/19\/2757\/49872400\/bioinformatics_30_19_2757.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/30\/19\/2757\/49872400\/bioinformatics_30_19_2757.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,7,14]],"date-time":"2023-07-14T01:30:45Z","timestamp":1689298245000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/30\/19\/2757\/2422195"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014,6,6]]},"references-count":27,"journal-issue":{"issue":"19","published-print":{"date-parts":[[2014,10,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btu375","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2014,10]]},"published":{"date-parts":[[2014,6,6]]}}}