{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,4]],"date-time":"2026-06-04T22:59:18Z","timestamp":1780613958541,"version":"3.54.1"},"reference-count":53,"publisher":"Oxford University Press (OUP)","issue":"Supplement_2","license":[{"start":{"date-parts":[[2020,12,31]],"date-time":"2020-12-31T00:00:00Z","timestamp":1609372800000},"content-version":"vor","delay-in-days":30,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["R35 GM 128638"],"award-info":[{"award-number":["R35 GM 128638"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["R01 NIA AG 061132"],"award-info":[{"award-number":["R01 NIA AG 061132"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"name":"CAREER","award":["DBI-1552309"],"award-info":[{"award-number":["DBI-1552309"]}]},{"name":"CAREER","award":["DBI-1759487"],"award-info":[{"award-number":["DBI-1759487"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2020,12,30]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Increasing number of gene expression profiles has enabled the use of complex models, such as deep unsupervised neural networks, to extract a latent space from these profiles. 
However, expression profiles, especially when collected in large numbers, inherently contain variations introduced by technical artifacts (e.g. batch effects) and uninteresting biological variables (e.g. age) in addition to the true signals of interest. These sources of variations, called confounders, produce embeddings that fail to transfer to different domains, i.e. an embedding learned from one dataset with a specific confounder distribution does not generalize to different distributions. To remedy this problem, we attempt to disentangle confounders from true signals to generate biologically informative embeddings.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>In this article, we introduce the Adversarial Deconfounding AutoEncoder (AD-AE) approach to deconfounding gene expression latent spaces. The AD-AE model consists of two neural networks: (i) an autoencoder to generate an embedding that can reconstruct original measurements, and (ii) an adversary trained to predict the confounder from that embedding. We jointly train the networks to generate embeddings that can encode as much information as possible without encoding any confounding signal. By applying AD-AE to two distinct gene expression datasets, we show that our model can (i) generate embeddings that do not encode confounder information, (ii) conserve the biological signals present in the original space and (iii) generalize successfully across different confounder domains. 
We demonstrate that AD-AE outperforms standard autoencoder and other deconfounding approaches.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>Our code and data are available at https:\/\/gitlab.cs.washington.edu\/abdincer\/ad-ae.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaa796","type":"journal-article","created":{"date-parts":[[2020,9,11]],"date-time":"2020-09-11T15:12:45Z","timestamp":1599837165000},"page":"i573-i582","source":"Crossref","is-referenced-by-count":61,"title":["Adversarial deconfounding autoencoder for learning robust gene expression embeddings"],"prefix":"10.1093","volume":"36","author":[{"given":"Ayse B","family":"Dincer","sequence":"first","affiliation":[{"name":"Paul G. Allen School of Computer Science & Engineering, University of Washington , Seattle, WA 98195, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Joseph D","family":"Janizek","sequence":"additional","affiliation":[{"name":"Paul G. Allen School of Computer Science & Engineering, University of Washington , Seattle, WA 98195, USA"},{"name":"Medical Scientist Training Program, University of Washington , Seattle, WA 98195, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Su-In","family":"Lee","sequence":"additional","affiliation":[{"name":"Paul G. 
Allen School of Computer Science & Engineering, University of Washington , Seattle, WA 98195, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"286","published-online":{"date-parts":[[2020,12,29]]},"reference":[{"key":"2023062504245469600_btaa796-B1","doi-asserted-by":"crossref","first-page":"1139","DOI":"10.1038\/s41592-019-0576-7","article-title":"Exploring single-cell data with deep multitasking neural networks","volume":"16","author":"Amodio","year":"2019","journal-title":"Nat. Methods"},{"key":"2023062504245469600_btaa796-B2","author":"Amodio","year":"2018"},{"key":"2023062504245469600_btaa796-B3","author":"Arthur","year":"2006"},{"key":"2023062504245469600_btaa796-B4","doi-asserted-by":"crossref","first-page":"105","DOI":"10.1093\/bioinformatics\/btg385","article-title":"Adjustment of systematic microarray data biases","volume":"20","author":"Benito","year":"2004","journal-title":"Bioinformatics"},{"key":"2023062504245469600_btaa796-B5","doi-asserted-by":"crossref","first-page":"e49","DOI":"10.1093\/bioinformatics\/btl242","article-title":"Integrating structured biological data by Kernel Maximum Mean Discrepancy","volume":"22","author":"Borgwardt","year":"2006","journal-title":"Bioinformatics"},{"key":"2023062504245469600_btaa796-B6","doi-asserted-by":"crossref","first-page":"2481","DOI":"10.1056\/NEJMoa1402121","article-title":"Comprehensive, integrative genomic analysis of diffuse lower-grade gliomas","volume":"372","author":"Brat","year":"2015","journal-title":"N. Engl. J. 
Med"},{"key":"2023062504245469600_btaa796-B7","doi-asserted-by":"crossref","first-page":"462","DOI":"10.1016\/j.cell.2013.09.034","article-title":"The somatic genomic landscape of glioblastoma","volume":"155","author":"Brennan","year":"2013","journal-title":"Cell"},{"key":"2023062504245469600_btaa796-B8","doi-asserted-by":"crossref","first-page":"1248","DOI":"10.1158\/1078-0432.CCR-17-0853","article-title":"Deep learning\u2013based multi-omics integration robustly predicts survival in liver cancer","volume":"24","author":"Chaudhary","year":"2018","journal-title":"Clin. Cancer Res"},{"key":"2023062504245469600_btaa796-B9","author":"Dayton","year":"2019"},{"key":"2023062504245469600_btaa796-B10","first-page":"278739","author":"Dincer","year":"2018"},{"key":"2023062504245469600_btaa796-B11","doi-asserted-by":"crossref","DOI":"10.1186\/s12864-018-5370-x","article-title":"Gene2vec: distributed representation of genes based on co-expression","volume":"20","author":"Du","year":"2019","journal-title":"BMC Genomics"},{"key":"2023062504245469600_btaa796-B12","doi-asserted-by":"crossref","first-page":"207","DOI":"10.1093\/nar\/30.1.207","article-title":"Gene expression omnibus: NCBI gene expression and hybridization array data repository","volume":"30","author":"Edgar","year":"2002","journal-title":"Nucleic Acids Res"},{"key":"2023062504245469600_btaa796-B13","author":"Erion","year":"2019"},{"key":"2023062504245469600_btaa796-B14","first-page":"1","article-title":"Domain-adversarial training of neural networks","volume":"17","author":"Ganin","year":"2016","journal-title":"J. Mach. Learn. 
Res"},{"key":"2023062504245469600_btaa796-B15","doi-asserted-by":"crossref","first-page":"R47","DOI":"10.1186\/gb-2014-15-3-r47","article-title":"Clinical drug response can be predicted using baseline gene expression levels and in vitro drug sensitivity in cell lines","volume":"15","author":"Geeleher","year":"2014","journal-title":"Genome Biol"},{"key":"2023062504245469600_btaa796-B16","doi-asserted-by":"crossref","first-page":"531","DOI":"10.1126\/science.286.5439.531","article-title":"Molecular classification of cancer: class discovery and class prediction by gene expression monitoring","volume":"286","author":"Golub","year":"1999","journal-title":"Science"},{"key":"2023062504245469600_btaa796-B17","doi-asserted-by":"crossref","first-page":"725","DOI":"10.1007\/s10549-009-0674-9","article-title":"An online survival analysis tool to rapidly assess the effect of 22,277 genes on breast cancer prognosis using microarray data of 1,809 patients","volume":"123","author":"Gy\u00f6rffy","year":"2010","journal-title":"Breast Cancer Res. 
Treat"},{"key":"2023062504245469600_btaa796-B18","doi-asserted-by":"crossref","first-page":"389","DOI":"10.1038\/nature12831","article-title":"Inconsistency in large pharmacogenomic studies","volume":"504","author":"Haibe-Kains","year":"2013","journal-title":"Nature"},{"key":"2023062504245469600_btaa796-B19","doi-asserted-by":"crossref","first-page":"504","DOI":"10.1126\/science.1127647","article-title":"Reducing the dimensionality of data with neural networks","volume":"313","author":"Hinton","year":"2006","journal-title":"Science"},{"key":"2023062504245469600_btaa796-B35","doi-asserted-by":"crossref","first-page":"4121","DOI":"10.1109\/ICCV.2015.469","article-title":"Unsupervised domain adaptation with imbalanced cross-domain data","author":"Hsu","year":"2015","journal-title":"Proceedings of IEEE International Conference on Computer Vision (ICCV)"},{"key":"2023062504245469600_btaa796-B20","first-page":"69","author":"Janizek","year":"2020"},{"key":"2023062504245469600_btaa796-B21","doi-asserted-by":"crossref","first-page":"118","DOI":"10.1093\/biostatistics\/kxj037","article-title":"Adjusting batch effects in microarray expression data using empirical Bayes methods","volume":"8","author":"Johnson","year":"2007","journal-title":"Biostatistics"},{"key":"2023062504245469600_btaa796-B22","author":"Kingma","year":"2014"},{"key":"2023062504245469600_btaa796-B23","author":"Kingma","year":"2013"},{"key":"2023062504245469600_btaa796-B24","first-page":"4669","article-title":"Estrogen receptor as an independent prognostic factor for early recurrence in breast cancer","volume":"37","author":"Knight","year":"1977","journal-title":"Cancer Res"},{"key":"2023062504245469600_btaa796-B25","doi-asserted-by":"crossref","first-page":"469","DOI":"10.1093\/bib\/bbs037","article-title":"Batch effect removal methods for microarray gene expression data integration: a survey","volume":"14","author":"Lazar","year":"2013","journal-title":"Brief. 
Bioinf"},{"key":"2023062504245469600_btaa796-B26","doi-asserted-by":"crossref","first-page":"e161","DOI":"10.1371\/journal.pgen.0030161","article-title":"Capturing heterogeneity in gene expression studies by surrogate variable analysis","volume":"3","author":"Leek","year":"2007","journal-title":"PLoS Genet"},{"key":"2023062504245469600_btaa796-B27","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1073\/pnas.98.1.31","article-title":"Model-based analysis of oligonucleotide arrays: expression index computation and outlier detection","volume":"98","author":"Li","year":"2001","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023062504245469600_btaa796-B28","doi-asserted-by":"crossref","first-page":"255","DOI":"10.2307\/2532051","article-title":"A concordance correlation coefficient to evaluate reproducibility","volume":"45","author":"Lin","year":"1989","journal-title":"Biometrics"},{"key":"2023062504245469600_btaa796-B29","author":"Louizos","year":"2015"},{"key":"2023062504245469600_btaa796-B30","unstructured":"Louppe,G et al (2017) Learning to pivot with adversarial networks. In Proceedings of the\u00a031st\u00a0International\u00a0Conference\u00a0on\u00a0Advances in Neural Information Processing Systems, pp. 
981\u2013990.\u00a0Curran\u00a0Associates\u00a0Inc.,Red\u00a0Hook, NY, USA."},{"key":"2023062504245469600_btaa796-B31","doi-asserted-by":"crossref","first-page":"278","DOI":"10.1038\/tpj.2010.57","article-title":"A comparison of batch effect removal methods for enhancement of prediction performance using MAQC-II microarray gene expression data","volume":"10","author":"Luo","year":"2010","journal-title":"Pharmacogenomics J"},{"key":"2023062504245469600_btaa796-B32","first-page":"89","author":"Lyu","year":"2018"},{"key":"2023062504245469600_btaa796-B33","author":"McInnes","year":"2018"},{"key":"2023062504245469600_btaa796-B34","doi-asserted-by":"crossref","first-page":"1061","DOI":"10.1038\/nature07385","article-title":"Comprehensive genomic characterization defines human glioblastoma genes and core pathways","volume":"455","author":"McLendon","year":"2008","journal-title":"Nature"},{"key":"2023062504245469600_btaa796-B36","doi-asserted-by":"crossref","first-page":"2757","DOI":"10.1093\/bioinformatics\/btu375","article-title":"Preserving biological heterogeneity with a permuted surrogate variable analysis for genomics batch correction","volume":"30","author":"Parker","year":"2014","journal-title":"Bioinformatics"},{"key":"2023062504245469600_btaa796-B37","doi-asserted-by":"crossref","first-page":"1538","DOI":"10.1093\/bioinformatics\/btx806","article-title":"DeepSynergy: predicting anti-cancer drug synergy with Deep Learning","volume":"34","author":"Preuer","year":"2018","journal-title":"Bioinformatics"},{"key":"2023062504245469600_btaa796-B38","doi-asserted-by":"crossref","first-page":"207","DOI":"10.1186\/bcr2607","article-title":"Breast cancer prognostic classification in the molecular era: the role of histological grade","volume":"12","author":"Rakha","year":"2010","journal-title":"Breast Cancer Res"},{"key":"2023062504245469600_btaa796-B39","first-page":"557","article-title":"Learning module 
networks","volume":"6","author":"Segal","year":"2005","journal-title":"J. Mach. Learn. Res"},{"key":"2023062504245469600_btaa796-B40","author":"Shaham","year":"2018"},{"key":"2023062504245469600_btaa796-B41","doi-asserted-by":"crossref","first-page":"2539","DOI":"10.1093\/bioinformatics\/btx196","article-title":"Removal of batch effects using distribution-matching residual networks","volume":"33","author":"Shaham","year":"2017","journal-title":"Bioinformatics"},{"key":"2023062504245469600_btaa796-B42","doi-asserted-by":"crossref","first-page":"822","DOI":"10.1038\/nm.1790","article-title":"Gene expression-based survival prediction in lung adenocarcinoma: a multi-site, blinded validation study","volume":"14","author":"Shedden","year":"2008","journal-title":"Nat. Med"},{"key":"2023062504245469600_btaa796-B43","article-title":"The removal of multiplicative, systematic bias allows integration of breast cancer gene expression datasets\u2013improving meta-analysis and prediction of prognosis","volume":"1, 42","author":"Sims","year":"2008","journal-title":"BMC Med. 
Genomics"},{"key":"2023062504245469600_btaa796-B44","doi-asserted-by":"crossref","DOI":"10.23915\/distill.00022","article-title":"Visualizing the impact of feature attribution baselines","author":"Sturmfels","year":"2020","journal-title":"Distill"},{"key":"2023062504245469600_btaa796-B45","first-page":"e00025","article-title":"ADAGE-based integration of publicly available Pseudomonas aeruginosa gene expression data with denoising autoencoders illuminates microbe-host interactions","volume":"1","author":"et","year":"2016","journal-title":"mSystems"},{"key":"2023062504245469600_btaa796-B46","first-page":"41","author":"Tang","year":"2001"},{"key":"2023062504245469600_btaa796-B47","doi-asserted-by":"crossref","first-page":"R157","DOI":"10.1186\/gb-2007-8-8-r157","article-title":"An immune response gene expression module identifies a good prognosis subtype in estrogen receptor negative breast cancer","volume":"8","author":"Teschendorff","year":"2007","journal-title":"Genome Biol"},{"key":"2023062504245469600_btaa796-B48","doi-asserted-by":"crossref","first-page":"1496","DOI":"10.1093\/bioinformatics\/btr171","article-title":"Independent surrogate variable analysis to deconvolve confounding factors in large-scale microarray profiling studies","volume":"27","author":"Teschendorff","year":"2011","journal-title":"Bioinformatics"},{"key":"2023062504245469600_btaa796-B49","author":"Upadhyay","year":"2019"},{"key":"2023062504245469600_btaa796-B50","first-page":"1096","author":"Vincent","year":"2008"},{"key":"2023062504245469600_btaa796-B51","doi-asserted-by":"crossref","first-page":"109","DOI":"10.1186\/s13059-020-02021-3","article-title":"Compressing gene expression data using multiple latent space dimensionalities learns complementary biological representations","volume":"21","author":"Way","year":"2020","journal-title":"Genome 
Biol"},{"key":"2023062504245469600_btaa796-B52","doi-asserted-by":"crossref","first-page":"37","DOI":"10.1016\/0169-7439(87)80084-9","article-title":"Principal component analysis","volume":"2","author":"Wold","year":"1987","journal-title":"Chemometr. Intell. Lab. Syst"},{"key":"2023062504245469600_btaa796-B53","first-page":"325","author":"Zemel","year":"2013"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/Supplement_2\/i573\/50693499\/btaa796.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/Supplement_2\/i573\/50693499\/btaa796.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,6,25]],"date-time":"2023-06-25T00:26:36Z","timestamp":1687652796000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/36\/Supplement_2\/i573\/6055930"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,12]]},"references-count":53,"journal-issue":{"issue":"Supplement_2","published-print":{"date-parts":[[2020,12,30]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaa796","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2020.04.28.065052","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2020,12]]},"published":{"date-parts":[[2020,12]]}}}