{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,8]],"date-time":"2026-04-08T01:58:27Z","timestamp":1775613507993,"version":"3.50.1"},"reference-count":42,"publisher":"Oxford University Press (OUP)","issue":"6","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2012,3,15]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: In a typical gene expression profiling study, our prime objective is to identify the genes that are differentially expressed between the samples from two different tissue types. Commonly, standard analysis of variance (ANOVA)\/regression is implemented to identify the relative effects of these genes over the two types of samples from their respective arrays of expression levels. But, this technique becomes fundamentally flawed when there are unaccounted sources of variability in these arrays (latent variables attributable to different biological, environmental or other factors relevant in the context). These factors distort the true picture of differential gene expression between the two tissue types and introduce spurious signals of expression heterogeneity. As a result, many genes which are actually differentially expressed are not detected, whereas many others are falsely identified as positives. Moreover, these distortions can be different for different genes. Thus, it is also not possible to get rid of these variations by simple array normalizations. This both-way error can lead to a serious loss in sensitivity and specificity, thereby causing a severe inefficiency in the underlying multiple testing problem. In this work, we attempt to identify the hidden effects of the underlying latent factors in a gene expression profiling study by partial least squares (PLS) and apply ANCOVA technique with the PLS-identified signatures of these hidden effects as covariates, in order to identify the genes that are truly differentially expressed between the two concerned tissue types.<\/jats:p><jats:p>Results: We compare the performance of our method SVA-PLS with standard ANOVA and a relatively recent technique of surrogate variable analysis (SVA), on a wide variety of simulation settings (incorporating different effects of the hidden variable, under situations with varying signal intensities and gene groupings). In all settings, our method yields the highest sensitivity while maintaining relatively reasonable values for the specificity, false discovery rate and false non-discovery rate. Application of our method to gene expression profiling for acute megakaryoblastic leukemia shows that our method detects an additional six genes, that are missed by both the standard ANOVA method as well as SVA, but may be relevant to this disease, as can be seen from mining the existing literature.<\/jats:p><jats:p>Availability: The R code for our method, SVA-PLS, is freely available on the Supplementary website http:\/\/www.somnathdatta.org\/Supp\/SVPLS\/<\/jats:p><jats:p>Contact: \u00a0s0chak10@louisville.edu; susmita.datta@louisville.edu<\/jats:p><jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/bts022","type":"journal-article","created":{"date-parts":[[2012,1,12]],"date-time":"2012-01-12T03:10:30Z","timestamp":1326337830000},"page":"799-806","source":"Crossref","is-referenced-by-count":76,"title":["Surrogate variable analysis using partial least squares (SVA-PLS) in gene expression studies"],"prefix":"10.1093","volume":"28","author":[{"given":"Sutirtha","family":"Chakraborty","sequence":"first","affiliation":[{"name":"Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, KY 40202, USA"}]},{"given":"Somnath","family":"Datta","sequence":"additional","affiliation":[{"name":"Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, KY 40202, USA"}]},{"given":"Susmita","family":"Datta","sequence":"additional","affiliation":[{"name":"Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, KY 40202, USA"}]}],"member":"286","published-online":{"date-parts":[[2012,1,11]]},"reference":[{"key":"2023012512203092600_B1","article-title":"Partial least squares regression (PLS-regression)","volume-title":"Encyclopedia for Research Methods for the Social Sciences.","author":"Abdi","year":"2003"},{"key":"2023012512203092600_B2","doi-asserted-by":"crossref","first-page":"511","DOI":"10.1038\/sj.leu.2405071","article-title":"Re-expression of DNA methylation-silenced CD44 gene in a resistant NB4 cell line: rescue of CD44-dependent cell death by cAMP","volume":"22","author":"Abecassis","year":"2008","journal-title":"Leukemia"},{"key":"2023012512203092600_B3","doi-asserted-by":"crossref","first-page":"79","DOI":"10.1002\/ajh.21332","article-title":"Clinical significance of Gata-1, Gata-2, EKLF, and c-MPL expression in acute myeloid leukemia","volume":"84","author":"Ayala","year":"2009","journal-title":"Am. J. Hematol."},{"key":"2023012512203092600_B4","doi-asserted-by":"crossref","first-page":"289","DOI":"10.1111\/j.2517-6161.1995.tb02031.x","article-title":"Controlling the false discovery rate: a practical and powerful approach to multiple testing","volume":"57","author":"Benjamini","year":"1995","journal-title":"J. R. Stat. Soc."},{"key":"2023012512203092600_B5","doi-asserted-by":"crossref","first-page":"1018","DOI":"10.1001\/jama.1953.03690110032010","article-title":"Preleukemic acute human leukemia","volume":"152","author":"Block","year":"1953","journal-title":"JAMA"},{"key":"2023012512203092600_B6","doi-asserted-by":"crossref","first-page":"3339","DOI":"10.1073\/pnas.0511150103","article-title":"Identification of distinct molecular phenotypes in acute megakaryoblastic leukemia by gene expression profiling","volume":"103","author":"Bourquin","year":"2006","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012512203092600_B7","doi-asserted-by":"crossref","first-page":"669","DOI":"10.1038\/9518","article-title":"Ligation of the CD44 adhesion molecule reverses blockage of differentiation in human acute myeloid leukemia","volume":"5","author":"Charrad","year":"1999","journal-title":"Nat. Med."},{"key":"2023012512203092600_B8","doi-asserted-by":"crossref","first-page":"579","DOI":"10.1182\/blood-2004-01-0338","article-title":"A decision analysis of allogeneic bone marrow transplantation for the myelodysplastic syndromes: delayed transplantation for low-risk myelodysplasia is associated with improved outcome","volume":"104","author":"Cutler","year":"2004","journal-title":"Blood"},{"key":"2023012512203092600_B9","doi-asserted-by":"crossref","first-page":"736","DOI":"10.1038\/sj.bjc.6602387","article-title":"Increased sensitivity to TRAIL-induced apoptosis occurs during the adenoma to carcinoma transition of colorectal carcinogenesis","volume":"92","author":"Haque","year":"2005","journal-title":"Br. J. Cancer"},{"key":"2023012512203092600_B10","doi-asserted-by":"crossref","first-page":"97","DOI":"10.1016\/S0169-7439(01)00154-X","article-title":"Some theoretical aspects of partial least squares regression","volume":"58","author":"Helland","year":"1999","journal-title":"Chemometr. Intell. Lab. Syst."},{"key":"2023012512203092600_B11","doi-asserted-by":"crossref","first-page":"716","DOI":"10.1109\/TAC.1974.1100705","article-title":"A new look at the statistical model identification","volume":"19","author":"Hirotsugu","year":"1974","journal-title":"IEEE Trans. Automat. Control"},{"key":"2023012512203092600_B12","first-page":"143","article-title":"Likelihood and the Bayes procedure","volume":"166","author":"Hirotsugu","year":"1980","journal-title":"Bayesian Stat."},{"key":"2023012512203092600_B13","doi-asserted-by":"crossref","first-page":"79","DOI":"10.1016\/j.lungcan.2005.08.012","article-title":"Expression of MAGE-D4, a novel MAGE family antigen, is correlated with tumor-cell proliferation of non-small cell lung cancer","volume":"51","author":"Ito","year":"2006","journal-title":"Lung Cancer"},{"key":"2023012512203092600_B14","doi-asserted-by":"crossref","first-page":"1709","DOI":"10.1534\/genetics.107.080101","article-title":"Efficient control of population structure in model organism association mapping","volume":"178","author":"Kang","year":"2008","journal-title":"Genetics"},{"key":"2023012512203092600_B15","doi-asserted-by":"crossref","first-page":"1909","DOI":"10.1534\/genetics.108.094201","article-title":"Accurate discovery of expression quantitative trait loci under confounding from spurious and genuine regulatory hotspots","volume":"180","author":"Kang","year":"2008","journal-title":"Genetics"},{"key":"2023012512203092600_B16","doi-asserted-by":"crossref","first-page":"8961","DOI":"10.1073\/pnas.161273698","article-title":"Bootstrapping cluster analysis: assessing the reliability of conclusions from microarray experiments","volume":"98","author":"Kerr","year":"2001","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012512203092600_B17","first-page":"203","article-title":"Statistical analysis of a gene expression microarray experiment with replication","volume":"12","author":"Kerr","year":"2002","journal-title":"Stat. Sin."},{"key":"2023012512203092600_B18","doi-asserted-by":"crossref","first-page":"819","DOI":"10.1089\/10665270050514954","article-title":"Analysis of variance for gene expression microarray data","volume":"7","author":"Kerr","year":"2000","journal-title":"J. Comput. Biol."},{"key":"2023012512203092600_B19","doi-asserted-by":"crossref","first-page":"761","DOI":"10.1038\/386761a0","article-title":"Gatekeepers and caretakers","volume":"386","author":"Kinzler","year":"1997","journal-title":"Nature"},{"key":"2023012512203092600_B20","doi-asserted-by":"crossref","first-page":"238","DOI":"10.1038\/ng0696-238","article-title":"BRCA2 mutations in primary breast and ovarian cancers","volume":"13","author":"Lancaster","year":"1996","journal-title":"Nat. Genet."},{"key":"2023012512203092600_B21","doi-asserted-by":"crossref","first-page":"e161","DOI":"10.1371\/journal.pgen.0030161","article-title":"Capturing heterogeneity in gene expression studies by surrogate variable analysis","volume":"3","author":"Leek","year":"2007","journal-title":"PLoS Genet."},{"key":"2023012512203092600_B22","doi-asserted-by":"crossref","DOI":"10.1007\/978-1-4757-1923-9","volume-title":"Testing Statistical Hypotheses","author":"Lehmann","year":"1986","edition":"2"},{"key":"2023012512203092600_B23","doi-asserted-by":"crossref","first-page":"16465","DOI":"10.1073\/pnas.1002425107","article-title":"Correction for hidden confounders in the genetic analysis of gene expression","volume":"107","author":"Listgarten","year":"2010","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012512203092600_B24","first-page":"1","article-title":"The pls package: principal component and partial least squares regression in R","volume":"18","author":"Mevik","year":"2007","journal-title":"J. Stat. Softwr."},{"key":"2023012512203092600_B25","first-page":"3789","article-title":"Evaluation of candidate genes MAP2K4, MADH4, ACVR1B, and BRCA2 in familial pancreatic cancer: deleterious BRCA2 mutations in 17","volume":"62","author":"Murphy","year":"2002","journal-title":"Cancer Res."},{"key":"2023012512203092600_B26","doi-asserted-by":"crossref","first-page":"371","DOI":"10.1038\/sj.bjc.6604453","article-title":"Rapid progression of prostate cancer in men with a BRCA2 mutation","volume":"99","author":"Narod","year":"2008","journal-title":"Br. J. Cancer"},{"key":"2023012512203092600_B27","doi-asserted-by":"crossref","first-page":"17","DOI":"10.1038\/ng0597-17","article-title":"Germline BRCA2 6174delT mutations in Ashkenazi Jewish pancreatic cancer patients","volume":"16","author":"\u00d6zcelik","year":"1997","journal-title":"Nat. Genet."},{"key":"2023012512203092600_B28","doi-asserted-by":"crossref","first-page":"2010","DOI":"10.1038\/sj.leu.2404849","article-title":"The multi-functional cellular adhesion molecule CD44 is regulated by the 8;21 chromosomal translocation","volume":"21","author":"Peterson","year":"2007","journal-title":"Leukemia"},{"key":"2023012512203092600_B29","doi-asserted-by":"crossref","first-page":"904","DOI":"10.1038\/ng1847","article-title":"Principal components analysis corrects for stratification in genome-wide association studies","volume":"38","author":"Price","year":"2006","journal-title":"Nat. Genet."},{"key":"2023012512203092600_B30","first-page":"621","article-title":"TRAIL decoy receptors mediate resistance of acute myeloid leukemia cells to TRAIL","volume":"90","author":"Riccioni","year":"2005","journal-title":"Haematologica"},{"key":"2023012512203092600_B31","doi-asserted-by":"crossref","first-page":"34","DOI":"10.1007\/11752790_2","article-title":"Overview and recent advances in partial least squares","volume-title":"Subspace, Latent Structure and Feature Selection Techniques, Lecture Notes in Computer Science.","author":"Rosipal","year":"2006"},{"key":"2023012512203092600_B32","doi-asserted-by":"crossref","first-page":"3214","DOI":"10.1182\/blood-2005-05-2013","article-title":"Characterization of 8p21.3 chromosomal deletions in B-cell lymphoma: TRAIL-R1 and TRAIL-R2 as candidate dosage-dependent tumor suppressor genes","volume":"106","author":"Rubio-Moscardo","year":"2005","journal-title":"Blood"},{"key":"2023012512203092600_B33","doi-asserted-by":"crossref","first-page":"669","DOI":"10.1089\/cmb.2007.R009","article-title":"Compensating for unknown confounders in microarray data analysis using filtered permutations","volume":"14","author":"Scheid","year":"2007","journal-title":"J. Comput. Biol."},{"key":"2023012512203092600_B34","first-page":"411","article-title":"Accounting for non-genetic factors improves the power of eQTL studies","volume-title":"Proceedings of the 12th International Conference on Research in Computational Molecular Biology","author":"Stegle","year":"2008"},{"key":"2023012512203092600_B35","doi-asserted-by":"crossref","first-page":"e1000770","DOI":"10.1371\/journal.pcbi.1000770","article-title":"A Bayesian framework to account for complex non-genetic factors in gene expression levels greatly increases power in eQTL studies","volume":"6","author":"Stegle","year":"2010","journal-title":"PLoS Comput. Biol."},{"key":"2023012512203092600_B36","doi-asserted-by":"crossref","first-page":"284","DOI":"10.1177\/030089160709300310","article-title":"Beta-catenin and CD44 expression in keratoacanthoma and squamous cell carcinoma of the skin","volume":"93","author":"Tataroglu","year":"2007","journal-title":"Tumori."},{"key":"2023012512203092600_B37","doi-asserted-by":"crossref","first-page":"3226","DOI":"10.1182\/blood-2003-09-3138","article-title":"Germline mutations in BRCA2: shared genetic susceptibility to breast cancer, early onset leukemia, and Fanconi anemia","volume":"103","author":"Wagner","year":"2004","journal-title":"Blood"},{"key":"2023012512203092600_B38","doi-asserted-by":"crossref","first-page":"350","DOI":"10.1093\/biomet\/29.3-4.350","article-title":"The significance of the difference between two means when the population variances are unequal","volume":"29","author":"Welch","year":"1938","journal-title":"Biometrika"},{"key":"2023012512203092600_B39","doi-asserted-by":"crossref","first-page":"307","DOI":"10.1016\/B978-0-12-103950-9.50017-4","article-title":"Path models with latent variables: the NIPALS approach","author":"Wold","year":"1975","journal-title":"Quantitative Sociology: International Perspectives on Mathematical and Statistical Model Building"},{"key":"2023012512203092600_B40","first-page":"581","article-title":"Partial least squares","volume-title":"Encyclopedia of the Statistical Sciences","author":"Wold","year":"1985"},{"key":"2023012512203092600_B41","doi-asserted-by":"crossref","first-page":"625","DOI":"10.1089\/106652701753307520","article-title":"Assessing gene significance from cDNA microarray expression data via mixed models","volume":"8","author":"Wolfinger","year":"2001","journal-title":"J. Comput. Biol."},{"key":"2023012512203092600_B42","doi-asserted-by":"crossref","first-page":"203","DOI":"10.1038\/ng1702","article-title":"A unified mixed-model method for association mapping that accounts for multiple levels of relatedness","volume":"38","author":"Yu","year":"2006","journal-title":"Nat. Genet."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/28\/6\/799\/48880733\/bioinformatics_28_6_799.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/28\/6\/799\/48880733\/bioinformatics_28_6_799.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,18]],"date-time":"2025-03-18T18:56:28Z","timestamp":1742324188000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/28\/6\/799\/310815"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,1,11]]},"references-count":42,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2012,3,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bts022","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2012,3,15]]},"published":{"date-parts":[[2012,1,11]]}}}