{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,11]],"date-time":"2026-06-11T10:41:39Z","timestamp":1781174499180,"version":"3.54.1"},"reference-count":47,"publisher":"Oxford University Press (OUP)","issue":"6","license":[{"start":{"date-parts":[[2021,8,17]],"date-time":"2021-08-17T00:00:00Z","timestamp":1629158400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"European Union\u2019s Horizon 2020 Research and Innovation Programme","award":["764281"],"award-info":[{"award-number":["764281"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,11,5]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>The lack of explainability is one of the most prominent disadvantages of deep learning applications in omics. This \u2018black box\u2019 problem can undermine the credibility and limit the practical implementation of biomedical deep learning models. Here we present XOmiVAE, a variational autoencoder (VAE)-based interpretable deep learning model for cancer classification using high-dimensional omics data. XOmiVAE is capable of revealing the contribution of each gene and latent dimension for each classification prediction and the correlation between each gene and each latent dimension. It is also demonstrated that XOmiVAE can explain not only the supervised classification but also the unsupervised clustering results from the deep learning network. To the best of our knowledge, XOmiVAE is one of the first activation level-based interpretable deep learning models explaining novel clusters generated by VAE. The explainable results generated by XOmiVAE were validated by both the performance of downstream tasks and the biomedical knowledge. In our experiments, XOmiVAE explanations of deep learning-based cancer classification and clustering aligned with current domain knowledge including biological annotation and academic literature, which shows great potential for novel biomedical knowledge discovery from deep learning models.<\/jats:p>","DOI":"10.1093\/bib\/bbab315","type":"journal-article","created":{"date-parts":[[2021,7,21]],"date-time":"2021-07-21T19:11:20Z","timestamp":1626894680000},"source":"Crossref","is-referenced-by-count":92,"title":["XOmiVAE: an interpretable deep learning model for cancer classification using high-dimensional omics data"],"prefix":"10.1093","volume":"22","author":[{"given":"Eloise","family":"Withnell","sequence":"first","affiliation":[{"name":"Data Science Institute Imperial College London, SW7 2AZ London, UK"},{"name":"Department of Health Informatics University College London, WC1E 6BT London, UK"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Xiaoyu","family":"Zhang","sequence":"additional","affiliation":[{"name":"Data Science Institute Imperial College London, SW7 2AZ London, UK"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Kai","family":"Sun","sequence":"additional","affiliation":[{"name":"Data Science Institute Imperial College London, SW7 2AZ London, UK"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Yike","family":"Guo","sequence":"additional","affiliation":[{"name":"Data Science Institute Imperial College London, SW7 2AZ London, UK"},{"name":"Department of Computer Science Hong Kong Baptist University, Hong Kong China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"286","published-online":{"date-parts":[[2021,8,17]]},"reference":[{"issue":"15","key":"2021110815083969300_ref1","doi-asserted-by":"crossref","first-page":"4291","DOI":"10.1093\/bioinformatics\/btaa198","article-title":"Automatic identification of relevant genes from low-dimensional embeddings of single-cell RNA-seq data","volume":"36","author":"Angerer","year":"2020","journal-title":"Bioinformatics"},{"issue":"1","key":"2021110815083969300_ref2","doi-asserted-by":"crossref","first-page":"16526","DOI":"10.1038\/s41598-019-52937-5","article-title":"DeePathology: deep multi-task learning for inferring molecular pathology from cancer transcriptome","volume":"9","author":"Azarkhalili","year":"2019","journal-title":"Sci Rep"},{"issue":"6","key":"2021110815083969300_ref3","doi-asserted-by":"crossref","first-page":"442","DOI":"10.1016\/j.tig.2020.03.005","article-title":"Opening the black box: interpretable machine learning for geneticists","volume":"36","author":"Azodi","year":"2020","journal-title":"Trends Genet"},{"issue":"1","key":"2021110815083969300_ref4","doi-asserted-by":"crossref","first-page":"9790","DOI":"10.1038\/s41598-020-66166-8","article-title":"Unsupervised generative and graph representation learning for modelling cell differentiation","volume":"10","author":"Bica","year":"2020","journal-title":"Sci Rep"},{"issue":"5","key":"2021110815083969300_ref5","doi-asserted-by":"crossref","first-page":"257","DOI":"10.1038\/nrneph.2016.46","article-title":"Evolving risks of umod variants","volume":"12","author":"Carney","year":"2016","journal-title":"Nat Rev Nephrol"},{"issue":"8","key":"2021110815083969300_ref6","doi-asserted-by":"crossref","first-page":"e71","DOI":"10.1093\/nar\/gkv1507","article-title":"TCGAbiolinks: an R\/Bioconductor package for integrative analysis of TCGA data","volume":"44","author":"Colaprico","year":"2015","journal-title":"Nucleic Acids Res"},{"issue":"90001","key":"2021110815083969300_ref7","doi-asserted-by":"crossref","first-page":"D258","DOI":"10.1093\/nar\/gkh036","article-title":"The Gene Ontology (go) database and informatics resource","volume":"32","author":"Gene Ontology Consortium","year":"2004","journal-title":"Nucleic Acids Res"},{"key":"2021110815083969300_ref8","doi-asserted-by":"crossref","DOI":"10.1101\/278739","article-title":"DeepProfile: deep learning of cancer molecular profiles for precision medicine","author":"Dincer","year":"2018"},{"key":"2021110815083969300_ref9","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1371\/journal.pcbi.1005968","article-title":"Reactome graph database: efficient access to complex pathway data","volume":"14","author":"Fabregat","year":"2018","journal-title":"PLoS Comput Biol"},{"issue":"4","key":"2021110815083969300_ref10","first-page":"46","article-title":"Interpreting neural-network connection weights","volume":"6","author":"Garson","year":"1991","journal-title":"AI Expert"},{"issue":"6","key":"2021110815083969300_ref11","doi-asserted-by":"crossref","first-page":"675","DOI":"10.1038\/s41587-020-0546-8","article-title":"Visualizing and interpreting cancer genomics data via the Xena platform","volume":"38","author":"Goldman","year":"2020","journal-title":"Nat Biotechnol"},{"issue":"12","key":"2021110815083969300_ref12","doi-asserted-by":"crossref","first-page":"1109","DOI":"10.1056\/NEJMp1607591","article-title":"Toward a shared vision for cancer genomic data","volume":"375","author":"Grossman","year":"2016","journal-title":"N Engl J Med"},{"issue":"1","key":"2021110815083969300_ref13","doi-asserted-by":"crossref","first-page":"501","DOI":"10.1186\/s12859-020-03836-4","article-title":"Biological interpretation of deep neural network for phenotype prediction based on gene expression","volume":"21","author":"Hanczar","year":"2020","journal-title":"BMC Bioinformatics"},{"issue":"2","key":"2021110815083969300_ref14","doi-asserted-by":"crossref","first-page":"95","DOI":"10.1016\/j.gene.2015.12.061","article-title":"Epithelial sodium channel (ENaC) family: phylogeny, structure-function, tissue distribution, and associated inherited diseases","volume":"579","author":"Hanukoglu","year":"2016","journal-title":"Gene"},{"issue":"1","key":"2021110815083969300_ref15","doi-asserted-by":"crossref","first-page":"6265","DOI":"10.1038\/s41598-021-85285-4","article-title":"Integrated multi-omics analysis of ovarian cancer using variational autoencoders","volume":"11","author":"Hira","year":"2021","journal-title":"Sci Rep"},{"issue":"1","key":"2021110815083969300_ref16","doi-asserted-by":"crossref","first-page":"27","DOI":"10.1093\/nar\/28.1.27","article-title":"KEGG: Kyoto encyclopedia of genes and genomes","volume":"28","author":"Kanehisa","year":"2000","journal-title":"Nucleic Acids Res"},{"key":"2021110815083969300_ref17","article-title":"Auto-encoding variational Bayes","volume-title":"International Conference on Learning Representations (ICLR)","author":"Kingma","year":"2014"},{"issue":"4","key":"2021110815083969300_ref18","doi-asserted-by":"crossref","first-page":"1033","DOI":"10.1677\/ERC-06-0001","article-title":"Significance, detection and markers of disseminated breast cancer cells","volume":"13","author":"Lacroix","year":"2006","journal-title":"Endocr Relat Cancer"},{"issue":"7553","key":"2021110815083969300_ref19","doi-asserted-by":"crossref","first-page":"436","DOI":"10.1038\/nature14539","article-title":"Deep learning","volume":"521","author":"LeCun","year":"2015","journal-title":"Nature"},{"issue":"1","key":"2021110815083969300_ref20","doi-asserted-by":"crossref","first-page":"146","DOI":"10.1186\/s12859-020-3465-2","article-title":"PathME: pathway based multi-modal sparse autoencoders for clustering of patient-level multi-omics data","volume":"21","author":"Lemsara","year":"2020","journal-title":"BMC Bioinformatics"},{"issue":"12","key":"2021110815083969300_ref21","doi-asserted-by":"crossref","DOI":"10.1186\/s13059-014-0550-8","article-title":"Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2","volume":"15","author":"Love","year":"2014","journal-title":"Genome Biol"},{"key":"2021110815083969300_ref22","first-page":"4768","article-title":"A unified approach to interpreting model predictions","volume-title":"Proceedings of the 31st International Conference on Neural Information Processing Systems (NeurIPS)","author":"Lundberg","year":"2017"},{"key":"2021110815083969300_ref23","article-title":"UMAP: uniform manifold approximation and projection for dimension reduction","author":"McInnes","year":"2020"},{"issue":"4","key":"2021110815083969300_ref24","doi-asserted-by":"crossref","first-page":"628","DOI":"10.1093\/bib\/bbv108","article-title":"Dimension reduction techniques for the integrative analysis of multi-omics data","volume":"17","author":"Meng","year":"2016","journal-title":"Brief Bioinform"},{"key":"2021110815083969300_ref25","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.dsp.2017.10.011","article-title":"Methods for interpreting and understanding deep neural networks","volume":"73","author":"Montavon","year":"2018","journal-title":"Digital Signal Process"},{"key":"2021110815083969300_ref26","article-title":"On the importance of single directions for generalization","volume-title":"International Conference on Learning Representations (ICLR)","author":"Morcos","year":"2018"},{"issue":"10","key":"2021110815083969300_ref27","doi-asserted-by":"crossref","first-page":"1113","DOI":"10.1038\/ng.2764","article-title":"The cancer genome atlas pan-cancer analysis project","volume":"45","author":"The Cancer Genome Atlas Research Network","year":"2013","journal-title":"Nat Genet"},{"issue":"1","key":"2021110815083969300_ref28","doi-asserted-by":"crossref","first-page":"135","DOI":"10.1016\/S0304-3800(02)00064-9","article-title":"Illuminating the \u2018black box\u2019: a randomization approach for understanding variable contributions in artificial neural networks","volume":"154","author":"Olden","year":"2002","journal-title":"Ecol Model"},{"issue":"7","key":"2021110815083969300_ref29","doi-asserted-by":"crossref","first-page":"1617","DOI":"10.1002\/ijc.28497","article-title":"Additive effect of the AZGP1, PIP, S100A8 and UBE2 molecular biomarkers improves outcome prediction in breast carcinoma","volume":"134","author":"Parris","year":"2014","journal-title":"Int J Cancer"},{"key":"2021110815083969300_ref30","first-page":"7762","article-title":"Explaining groups of points in low-dimensional representations","volume-title":"Proceedings of the 37th International Conference on Machine Learning","author":"Plumb","year":"2020"},{"issue":"W1","key":"2021110815083969300_ref31","doi-asserted-by":"crossref","first-page":"W191","DOI":"10.1093\/nar\/gkz369","article-title":"g:Profiler: a web server for functional enrichment analysis and conversions of gene lists (2019 update)","volume":"47","author":"Raudvere","year":"2019","journal-title":"Nucleic Acids Res"},{"issue":"3","key":"2021110815083969300_ref32","doi-asserted-by":"crossref","first-page":"303","DOI":"10.1038\/nbt0308-303","article-title":"What is principal component analysis?","volume":"26","author":"Ringn\u00e9r","year":"2008","journal-title":"Nat Biotechnol"},{"issue":"2","key":"2021110815083969300_ref33","doi-asserted-by":"crossref","first-page":"321","DOI":"10.1016\/j.cell.2018.03.035","article-title":"Oncogenic signaling pathways in the cancer genome atlas","volume":"173","author":"Sanchez-Vega","year":"2018","journal-title":"Cell"},{"issue":"15","key":"2021110815083969300_ref34","doi-asserted-by":"crossref","first-page":"3529","DOI":"10.1158\/1078-0432.CCR-14-2464","article-title":"Glycodelin: a new biomarker with immunomodulatory functions in non-small cell lung cancer","volume":"21","author":"Schneider","year":"2015","journal-title":"Clin Cancer Res"},{"key":"2021110815083969300_ref35","first-page":"3145","article-title":"Learning important features through propagating activation differences","volume-title":"Proceedings of the 34th International Conference on Machine Learning","author":"Shrikumar","year":"2017"},{"key":"2021110815083969300_ref36","article-title":"Deep inside convolutional networks: Visualising image classification models and saliency maps","author":"Simonyan","year":"2014","journal-title":"Workshop at International Conference on Learning Representations (ICLR)"},{"issue":"1","key":"2021110815083969300_ref37","doi-asserted-by":"crossref","first-page":"1.30.1","DOI":"10.1002\/cpbi.5","article-title":"The genecards suite: from gene data mining to disease genome sequence analyses","volume":"54","author":"Stelzer","year":"2016","journal-title":"Curr Protoc Bioinform"},{"issue":"43","key":"2021110815083969300_ref38","doi-asserted-by":"crossref","first-page":"15545","DOI":"10.1073\/pnas.0506580102","article-title":"Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles","volume":"102","author":"Subramanian","year":"2005","journal-title":"Proc Natl Acad Sci"},{"key":"2021110815083969300_ref39","article-title":"Axiomatic attribution for deep networks","author":"Sundararajan","year":"2017","journal-title":"International Conference on Machine Learning (ICML)"},{"issue":"7","key":"2021110815083969300_ref40","doi-asserted-by":"crossref","first-page":"376","DOI":"10.1038\/s42256-020-0201-6","article-title":"Deep learning decodes the principles of differential gene expression","volume":"2","author":"Tasaki","year":"2020","journal-title":"Nat Mach Intell"},{"issue":"86","key":"2021110815083969300_ref41","first-page":"2579","article-title":"Visualizing data using t-SNE","volume":"9","author":"van der Maaten","year":"2008","journal-title":"J Mach Learn Res"},{"issue":"10","key":"2021110815083969300_ref42","doi-asserted-by":"crossref","DOI":"10.3390\/ijms19103028","article-title":"Role of extracellular matrix in development and cancer progression","volume":"19","author":"Walker","year":"2018","journal-title":"Int J Mol Sci"},{"key":"2021110815083969300_ref43","doi-asserted-by":"crossref","first-page":"80","DOI":"10.1142\/9789813235533_0008","article-title":"Greene Extracting a biologically relevant latent space from cancer transcriptomes with variational autoencoders","volume-title":"Biocomputing 2018","author":"Way","year":"2018"},{"issue":"9","key":"2021110815083969300_ref44","doi-asserted-by":"crossref","first-page":"1164","DOI":"10.1016\/j.rmed.2005.02.009","article-title":"Surfactant protein gene expressions for detection of lung carcinoma cells in peripheral blood","volume":"99","author":"Yamamoto","year":"2005","journal-title":"Respir Med"},{"issue":"12","key":"2021110815083969300_ref45","doi-asserted-by":"crossref","DOI":"10.3390\/cancers13123047","article-title":"OmiEmbed: a unified multi-task deep learning framework for multi-omics data","volume":"13","author":"Zhang","year":"2021","journal-title":"Cancers"},{"key":"2021110815083969300_ref46","doi-asserted-by":"crossref","first-page":"765","DOI":"10.1109\/BIBM47256.2019.8983228","article-title":"Integrated multi-omics analysis using variational autoencoders: application to pan-cancer classification","volume-title":"IEEE International Conference on Bioinformatics and Biomedicine (BIBM)","author":"Zhang","year":"2019"},{"issue":"3","key":"2021110815083969300_ref47","doi-asserted-by":"crossref","first-page":"316","DOI":"10.1038\/ng0395-316","article-title":"Methylation of the mouse Xist gene in sperm and eggs correlates with imprinted xist expression and paternal x-inactivation","volume":"9","author":"Zuccotti","year":"1995","journal-title":"Nat Genet"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/22\/6\/bbab315\/41089773\/bbab315.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/22\/6\/bbab315\/41089773\/bbab315.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,11,8]],"date-time":"2021-11-08T15:24:20Z","timestamp":1636385060000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbab315\/6353242"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,8,17]]},"references-count":47,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2021,11,5]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbab315","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021,11]]},"published":{"date-parts":[[2021,8,17]]},"article-number":"bbab315"}}