{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,10]],"date-time":"2026-04-10T20:18:27Z","timestamp":1775852307313,"version":"3.50.1"},"reference-count":37,"publisher":"Oxford University Press (OUP)","issue":"6","license":[{"start":{"date-parts":[[2024,10,16]],"date-time":"2024-10-16T00:00:00Z","timestamp":1729036800000},"content-version":"vor","delay-in-days":23,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Marie Sklodowska-Curie","award":["860895"],"award-info":[{"award-number":["860895"]}]},{"DOI":"10.13039\/501100001826","name":"The Netherlands Organisation for Health Research and Development","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100001826","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Personalized Medicine in Infections: from Systems Biomedicine and Immunometabolism to Precision Diagnosis and Stratification Permitting Individualized Therapies","award":["456008002"],"award-info":[{"award-number":["456008002"]}]},{"name":"PerMed Joint Transnational call JTC 2018"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,9,23]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Unsupervised learning, particularly clustering, plays a pivotal role in disease subtyping and patient stratification, especially with the abundance of large-scale multi-omics data. Deep learning models, such as variational autoencoders (VAEs), can enhance clustering algorithms by leveraging inter-individual heterogeneity. However, the impact of confounders\u2014external factors unrelated to the condition, e.g. batch effect or age\u2014on clustering is often overlooked, introducing bias and spurious biological conclusions. In this work, we introduce four novel VAE-based deconfounding frameworks tailored for clustering multi-omics data. 
These frameworks effectively mitigate confounding effects while preserving genuine biological patterns. The deconfounding strategies employed include (i) removal of latent features correlated with confounders, (ii) a conditional VAE, (iii) adversarial training, and (iv) adding a regularization term to the loss function. Using real-life multi-omics data from The Cancer Genome Atlas, we simulated various confounding effects (linear, nonlinear, categorical, mixed) and assessed model performance across 50 repetitions based on reconstruction error, clustering stability, and deconfounding efficacy. Our results demonstrate that our novel models, particularly the conditional multi-omics VAE (cXVAE), successfully handle simulated confounding effects and recover biologically driven clustering structures. cXVAE accurately identifies patient labels and unveils meaningful pathological associations among cancer types, validating deconfounded representations. Furthermore, our study suggests that some of the proposed strategies, such as adversarial training, prove insufficient in confounder removal. In summary, our study contributes by proposing innovative frameworks for simultaneous multi-omics data integration, dimensionality reduction, and deconfounding in clustering. 
Benchmarking on open-access data offers guidance to end-users, facilitating meaningful patient stratification for optimized precision medicine.<\/jats:p>","DOI":"10.1093\/bib\/bbae512","type":"journal-article","created":{"date-parts":[[2024,10,11]],"date-time":"2024-10-11T15:20:07Z","timestamp":1728660007000},"source":"Crossref","is-referenced-by-count":14,"title":["Novel multi-omics deconfounding variational autoencoders can obtain meaningful disease subtyping"],"prefix":"10.1093","volume":"25","author":[{"given":"Zuqi","family":"Li","sequence":"first","affiliation":[{"name":"BIO3 - Laboratory for Systems Medicine , Department of Human Genetics, KU Leuven, Herestraat 49, 3000 Leuven,","place":["Belgium"]},{"name":"Medical Imaging Research Center , University Hospitals Leuven, Herestraat 49, 3000 Leuven,","place":["Belgium"]},{"name":"BIO3 - Laboratory for Systems Genetics , GIGA Molecular & Computational Biology, University of Li\u00e8ge, Avenue de l'H\u00f4pital 11, 4000 Li\u00e8ge,","place":["Belgium"]}]},{"given":"Sonja","family":"Katz","sequence":"additional","affiliation":[{"name":"Department of Radiology and Nuclear Medicine , Erasmus MC, Dr. 
Molewaterplein 40, 3015 GD Rotterdam,","place":["Netherlands"]},{"name":"Laboratory of Systems and Synthetic Biology , Wageningen University & Research, PO Box 8033, 6700 EJ Wageningen,","place":["Netherlands"]},{"name":"LifeGlimmer GmbH , Markelstra\u00dfe 38, 12163 Berlin,","place":["Germany"]}]},{"given":"Edoardo","family":"Saccenti","sequence":"additional","affiliation":[{"name":"Laboratory of Systems and Synthetic Biology , Wageningen University & Research, PO Box 8033, 6700 EJ Wageningen,","place":["Netherlands"]}]},{"given":"David W","family":"Fardo","sequence":"additional","affiliation":[{"name":"Department of Biostatistics , University of Kentucky, 111 Washington Avenue, Lexington, KY 40536,","place":["United States"]},{"name":"Sanders-Brown Center on Aging , University of Kentucky, 789 S Limestone, Lexington, KY 40536,","place":["United States"]}]},{"given":"Peter","family":"Claes","sequence":"additional","affiliation":[{"name":"Medical Imaging Research Center , University Hospitals Leuven, Herestraat 49, 3000 Leuven,","place":["Belgium"]},{"name":"Department of Human Genetics , KU Leuven, Herestraat 49, 3000 Leuven,","place":["Belgium"]},{"name":"Department of Electrical Engineering , ESAT-PSI, KU Leuven, Kasteelpark Arenberg 10, 3001 Leuven,","place":["Belgium"]}]},{"given":"Vitor A P","family":"Martins dos Santos","sequence":"additional","affiliation":[{"name":"LifeGlimmer GmbH , Markelstra\u00dfe 38, 12163 Berlin,","place":["Germany"]},{"name":"Laboratory of Bioprocess Engineering , Wageningen University & Research, PO Box 16, 6700 AA Wageningen,","place":["Netherlands"]}]},{"given":"Kristel","family":"Van Steen","sequence":"additional","affiliation":[{"name":"BIO3 - Laboratory for Systems Medicine , Department of Human Genetics, KU Leuven, Herestraat 49, 3000 Leuven,","place":["Belgium"]},{"name":"BIO3 - Laboratory for Systems Genetics , GIGA Molecular & Computational Biology, University of Li\u00e8ge, Avenue de l'H\u00f4pital 11, 4000 
Li\u00e8ge,","place":["Belgium"]}]},{"given":"Gennady V","family":"Roshchupkin","sequence":"additional","affiliation":[{"name":"Medical Imaging Research Center , University Hospitals Leuven, Herestraat 49, 3000 Leuven,","place":["Belgium"]},{"name":"Department of Epidemiology , Erasmus MC, Dr. Molewaterplein 40, 3015 GD Rotterdam,","place":["Netherlands"]}]}],"member":"286","published-online":{"date-parts":[[2024,10,16]]},"reference":[{"key":"2024101622180509600_ref1","doi-asserted-by":"publisher","DOI":"10.3389\/fgene.2019.01205","article-title":"Variational autoencoders for cancer data integration: Design principles and computational practice","volume":"10","author":"Simidjievski","journal-title":"Front Genet"},{"key":"2024101622180509600_ref2","doi-asserted-by":"publisher","first-page":"498","DOI":"10.1016\/j.tibtech.2017.02.012","article-title":"Why batch effects matter in omics data, and how to avoid them","volume":"35","author":"Goh","year":"2017","journal-title":"Trends Biotechnol"},{"key":"2024101622180509600_ref3","doi-asserted-by":"publisher","first-page":"373","DOI":"10.1007\/978-1-4939-9744-2_16","article-title":"Review of batch effects prevention, diagnostics, and correction approaches","volume-title":"Mass Spectrometry Data Analysis in Proteomics, Methods in Molecular Biology","author":"\u010cuklina","year":"2020"},{"key":"2024101622180509600_ref4","first-page":"79","article-title":"How to control confounding effects by statistical analysis","volume":"5","author":"Pourhoseingholi","year":"2012","journal-title":"Gastroenterol Hepatol Bed Bench"},{"key":"2024101622180509600_ref5","doi-asserted-by":"publisher","first-page":"2436","DOI":"10.1038\/s41467-023-38125-0","article-title":"Cross-modal autoencoder framework learns holistic representations of cardiovascular state","volume":"14","author":"Radhakrishnan","year":"2023","journal-title":"Nat Commun"},{"key":"2024101622180509600_ref6","first-page":"430","article-title":"Conditional VAEs for confound 
removal and normative modelling of neurodegenerative diseases","volume-title":"Medical Image Computing and Computer Assisted Intervention\u2014MICCAI 2022, Lecture Notes in Computer Science","author":"Lawry Aguila","year":"2022"},{"key":"2024101622180509600_ref7","doi-asserted-by":"publisher","first-page":"i573","DOI":"10.1093\/bioinformatics\/btaa796","article-title":"Adversarial deconfounding autoencoder for learning robust gene expression embeddings","volume":"36","author":"Dincer","year":"2020","journal-title":"Bioinformatics"},{"key":"2024101622180509600_ref8","doi-asserted-by":"publisher","first-page":"1345","DOI":"10.1093\/bioinformatics\/btaa976","article-title":"Deep feature extraction of single-cell transcriptomes by generative adversarial network","volume":"37","author":"Bahrami","year":"2021","journal-title":"Bioinformatics"},{"key":"2024101622180509600_ref9","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-87240-3_78","article-title":"Projection-wise disentangling for fair and interpretable representation learning: Application to 3D facial shape analysis","volume-title":"Medical Image Computing and Computer Assisted Intervention \u2013 MICCAI 2021, Strasbourg, France, 2021. Proceedings, Part V 24. 
Springer International Publishing, New York City, United States.","author":"Liu"},{"key":"2024101622180509600_ref10","doi-asserted-by":"publisher","first-page":"4","DOI":"10.1038\/s41514-022-00085-y","article-title":"A pan-tissue DNA-methylation epigenetic clock based on deep learning","volume":"8","author":"de Lima Camillo","year":"2022","journal-title":"npj Aging"},{"key":"2024101622180509600_ref11","doi-asserted-by":"publisher","DOI":"10.3389\/fgene.2021.772298","article-title":"Evaluation of epigenetic age based on DNA methylation analysis of several CpG sites in Ukrainian population","volume":"12","author":"Kuzub","year":"2022","journal-title":"Front Genet"},{"key":"2024101622180509600_ref12","doi-asserted-by":"publisher","first-page":"k134","DOI":"10.1136\/bmj.k134","article-title":"Cancer risk associated with chronic diseases and disease markers: Prospective cohort study","volume":"360","author":"Tu","year":"2018","journal-title":"BMJ"},{"key":"2024101622180509600_ref13","doi-asserted-by":"publisher","first-page":"817","DOI":"10.1093\/ije\/dyab274","article-title":"Circulating proteins and risk of pancreatic cancer: A case-subcohort study among Chinese adults","volume":"51","author":"Kartsonaki","year":"2022","journal-title":"Int J Epidemiol"},{"key":"2024101622180509600_ref14","doi-asserted-by":"publisher","first-page":"3841","DOI":"10.1002\/cncr.25936","article-title":"Body mass index and risk of colorectal cancer in Chinese Singaporeans: The Singapore Chinese Health Study","volume":"117","author":"Odegaard","year":"2011","journal-title":"Cancer"},{"key":"2024101622180509600_ref15","doi-asserted-by":"publisher","first-page":"1113","DOI":"10.1038\/ng.2764","article-title":"The Cancer Genome Atlas Pan-Cancer analysis project","volume":"45","author":"Weinstein","year":"2013","journal-title":"Nat Genet"},{"key":"2024101622180509600_ref16","doi-asserted-by":"publisher","first-page":"e71","DOI":"10.1093\/nar\/gkv1507","article-title":"TCGAbiolinks: An 
R\/Bioconductor package for integrative analysis of TCGA data","volume":"44","author":"Colaprico","year":"2016","journal-title":"Nucleic Acids Res"},{"key":"2024101622180509600_ref17","doi-asserted-by":"publisher","first-page":"199","DOI":"10.1038\/s41467-017-00289-x","article-title":"Pan-urologic cancer genomic subtypes that transcend tissue of origin","volume":"8","author":"Chen","year":"2017","journal-title":"Nat Commun"},{"key":"2024101622180509600_ref18","doi-asserted-by":"publisher","first-page":"291","DOI":"10.1016\/j.cell.2018.03.022","article-title":"Cell-of-origin patterns dominate the molecular classification of 10,000 tumors from 33 types of cancer","volume":"173","author":"Hoadley","year":"2018","journal-title":"Cell"},{"key":"2024101622180509600_ref19","doi-asserted-by":"publisher","first-page":"401","DOI":"10.1158\/2159-8290.CD-12-0095","article-title":"The cBio cancer genomics portal: An open platform for exploring multidimensional cancer genomics data","volume":"2","author":"Cerami","year":"2012","journal-title":"Cancer Discov"},{"key":"2024101622180509600_ref20","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/s41514-022-00085-y","article-title":"A pan-tissue DNA-methylation epigenetic clock based on deep learning","volume":"8","author":"de Lima Camillo","year":"2022","journal-title":"npj Aging"},{"key":"2024101622180509600_ref21","doi-asserted-by":"publisher","first-page":"603","DOI":"10.1007\/s10654-016-0155-5","article-title":"How much do tumor stage and treatment explain socioeconomic inequalities in breast cancer survival? 
Applying causal mediation analysis to population-based data","volume":"31","author":"Li","year":"2016","journal-title":"Eur J Epidemiol"},{"key":"2024101622180509600_ref22","doi-asserted-by":"publisher","first-page":"525","DOI":"10.1002\/cam4.1277","article-title":"Disparities in cancer outcomes across age, sex, and race\/ethnicity among patients with pancreatic cancer","volume":"7","author":"Nipp","year":"2018","journal-title":"Cancer Med"},{"key":"2024101622180509600_ref23","article-title":"Learning structured output representation using deep conditional generative models","volume-title":"Advances in Neural Information Processing Systems","author":"Sohn","year":"2015"},{"key":"2024101622180509600_ref24","doi-asserted-by":"publisher","first-page":"643","DOI":"10.1016\/j.patcog.2018.12.015","article-title":"Autoencoder node saliency: Selecting relevant latent representations","volume":"88","author":"Fan","year":"2019","journal-title":"Pattern Recognit"},{"key":"2024101622180509600_ref25","doi-asserted-by":"publisher","DOI":"10.1038\/nmeth.4236","article-title":"SC3: Consensus clustering of single-cell RNA-seq data","volume":"14","author":"Kiselev","year":"2017","journal-title":"Nat Methods"},{"key":"2024101622180509600_ref26","article-title":"PyTorch Lightning","author":"Falcon","year":"2019"},{"key":"2024101622180509600_ref27","doi-asserted-by":"publisher","first-page":"2796","DOI":"10.1016\/S0140-6736(16)30512-8","article-title":"Bladder cancer","volume":"388","author":"Kamat","year":"2016","journal-title":"Lancet"},{"key":"2024101622180509600_ref28","first-page":"2785","article-title":"Cancer tissue classification, associated therapeutic implications and PDT as an alternative","volume":"37","author":"Horne","year":"2017","journal-title":"Anticancer Res"},{"key":"2024101622180509600_ref29","first-page":"2021","article-title":"Multi-omics and deep learning provide a multifaceted view of 
cancer","author":"Uyar","year":"2021","journal-title":"bioRxiv"},{"key":"2024101622180509600_ref30","doi-asserted-by":"publisher","first-page":"8341","DOI":"10.1038\/s41598-020-65119-5","article-title":"Multi-omic signatures identify pan-cancer classes of tumors beyond tissue of origin","volume":"10","author":"Gonz\u00e1lez-Reym\u00fandez","year":"2020","journal-title":"Sci Rep"},{"key":"2024101622180509600_ref31","doi-asserted-by":"crossref","first-page":"765","DOI":"10.1109\/BIBM47256.2019.8983228","article-title":"Integrated multi-omics analysis using variational autoencoders: Application to pan-cancer classification","volume-title":"2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)","author":"Zhang","year":"2019"},{"key":"2024101622180509600_ref32","first-page":"1","article-title":"Normative modeling via conditional variational autoencoder and adversarial learning to identify brain dysfunction in Alzheimer\u2019s disease","volume-title":"2023 IEEE 20th International Symposium on Biomedical Imaging (ISBI)","author":"Wang","year":"2023"},{"key":"2024101622180509600_ref33","doi-asserted-by":"publisher","first-page":"e1009826","DOI":"10.1371\/journal.pcbi.1009826","article-title":"AIME: Autoencoder-based integrative multi-omics data embedding that allows for confounder adjustments","volume":"18","author":"Yu","year":"2022","journal-title":"PLoS Comput Biol"},{"key":"2024101622180509600_ref34","first-page":"2513","article-title":"Representation learning with statistical independence to mitigate bias","volume-title":"Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision","author":"Adeli","year":"2021"},{"key":"2024101622180509600_ref35","doi-asserted-by":"publisher","first-page":"giac014","DOI":"10.1093\/gigascience\/giac014","article-title":"How to remove or control confounds in predictive models, with applications to brain 
biomarkers","volume":"11","author":"Chyzhyk","year":"2022","journal-title":"GigaScience"},{"key":"2024101622180509600_ref36","doi-asserted-by":"publisher","first-page":"879","DOI":"10.1038\/s42256-022-00541-0","article-title":"A context-aware deconfounding autoencoder for robust prediction of personalized clinical drug response from cell-line compound screening","volume":"4","author":"He","year":"2022","journal-title":"Nat Mach Intell"},{"key":"2024101622180509600_ref37","doi-asserted-by":"publisher","first-page":"e0210236","DOI":"10.1371\/journal.pone.0210236","article-title":"Clustering algorithms: A comparative approach","volume":"14","author":"Rodriguez","year":"2019","journal-title":"PLoS One"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/25\/6\/bbae512\/59812686\/bbae512.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/25\/6\/bbae512\/59812686\/bbae512.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,10,16]],"date-time":"2024-10-16T18:18:31Z","timestamp":1729102711000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbae512\/7824239"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,9,23]]},"references-count":37,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2024,9,23]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbae512","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2024.02.05.578873","asserted-by":"object"}]},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024,11]]},"published":{"date-parts":[[2024,9,23]]},"article-number":"bbae512"}}