{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,5]],"date-time":"2026-06-05T04:23:45Z","timestamp":1780633425722,"version":"3.54.1"},"update-to":[{"DOI":"10.1371\/journal.pcbi.1011198","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2024,7,16]],"date-time":"2024-07-16T00:00:00Z","timestamp":1721088000000}}],"reference-count":52,"publisher":"Public Library of Science (PLoS)","issue":"7","license":[{"start":{"date-parts":[[2024,7,3]],"date-time":"2024-07-03T00:00:00Z","timestamp":1719964800000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100002347","name":"Federal Ministry of Education and Research","doi-asserted-by":"crossref","award":["82DZL002A1"],"award-info":[{"award-number":["82DZL002A1"]}],"id":[{"id":"10.13039\/501100002347","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["www.ploscompbiol.org"],"crossmark-restriction":false},"short-container-title":["PLoS Comput Biol"],"abstract":"<jats:p>Interpreting transcriptome data is an important yet challenging aspect of bioinformatic analysis. While gene set enrichment analysis is a standard tool for interpreting regulatory changes, we utilize deep learning techniques, specifically autoencoder architectures, to learn latent variables that drive transcriptome signals. We investigate whether simple, variational autoencoder (VAE), and beta-weighted VAE are capable of learning reduced representations of transcriptomes that retain critical biological information. We propose a novel VAE that utilizes priors from biological data to direct the network to learn a representation of the transcriptome that is based on understandable biological concepts. After benchmarking five different autoencoder architectures, we found that each succeeded in reducing the transcriptomes to 50 latent dimensions, which captured enough variation for accurate reconstruction. The simple, fully connected autoencoder, performs best across the benchmarks, but lacks the characteristic of having directly interpretable latent dimensions. The beta-weighted, prior-informed VAE implementation is able to solve the benchmarking tasks, and provide semantically accurate latent features equating to biological pathways. This study opens a new direction for differential pathway analysis in transcriptomics with increased transparency and interpretability.<\/jats:p>","DOI":"10.1371\/journal.pcbi.1011198","type":"journal-article","created":{"date-parts":[[2024,7,3]],"date-time":"2024-07-03T13:37:12Z","timestamp":1720013832000},"page":"e1011198","update-policy":"https:\/\/doi.org\/10.1371\/journal.pcbi.corrections_policy","source":"Crossref","is-referenced-by-count":2,"title":["A variational autoencoder trained with priors from canonical pathways increases the interpretability of transcriptome data"],"prefix":"10.1371","volume":"20","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4311-6698","authenticated-orcid":true,"given":"Bin","family":"Liu","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Bodo","family":"Rosenhahn","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Thomas","family":"Illig","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0141-9116","authenticated-orcid":true,"given":"David S.","family":"DeLuca","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"340","published-online":{"date-parts":[[2024,7,3]]},"reference":[{"issue":"11","key":"pcbi.1011198.ref001","doi-asserted-by":"crossref","first-page":"e14222","DOI":"10.1111\/and.14222","article-title":"Transcriptome analysis of human Leydig cell tumours reveals potential mechanisms underlying its development","volume":"53","author":"M. Kotula-Balak","year":"2021","journal-title":"Andrologia"},{"issue":"2","key":"pcbi.1011198.ref002","doi-asserted-by":"crossref","first-page":"322","DOI":"10.1111\/srt.13135","article-title":"Minimally invasive skin sampling and transcriptome analysis using microneedles for skin type biomarker research","volume":"28","author":"S. H. Kim","year":"2022","journal-title":"Skin Research and Technology"},{"issue":"10","key":"pcbi.1011198.ref003","doi-asserted-by":"crossref","first-page":"1011","DOI":"10.18632\/oncotarget.27954","article-title":"Transcriptome analyses of urine RNA reveal tumor markers for human bladder cancer: Validated amplicons for RT-qPCR-based detection","volume":"12","author":"J. Dubois","year":"2021","journal-title":"Oncotarget"},{"issue":"6509","key":"pcbi.1011198.ref004","doi-asserted-by":"crossref","first-page":"1318","DOI":"10.1126\/science.aaz1776","article-title":"The GTEx Consortium atlas of genetic regulatory effects across human tissues","volume":"369","author":"Consortium G.","year":"2020","journal-title":"Science"},{"issue":"suppl_1","key":"pcbi.1011198.ref005","doi-asserted-by":"crossref","first-page":"D747","DOI":"10.1093\/nar\/gkl995","article-title":"ArrayExpress\u2014a public database of microarray experiments and gene expression profiles","volume":"35","author":"H. Parkinson","year":"2007","journal-title":"Nucleic acids research"},{"issue":"D1","key":"pcbi.1011198.ref006","doi-asserted-by":"crossref","first-page":"D991","DOI":"10.1093\/nar\/gks1193","article-title":"NCBI GEO: archive for functional genomics data sets\u2014update","volume":"41","author":"T. Barrett","journal-title":"Nucleic Acids Research"},{"issue":"43","key":"pcbi.1011198.ref007","doi-asserted-by":"crossref","first-page":"15545","DOI":"10.1073\/pnas.0506580102","article-title":"Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles","volume":"102","author":"A. Subramanian","year":"2005","journal-title":"Proceedings of the National Academy of Sciences"},{"key":"pcbi.1011198.ref008","doi-asserted-by":"crossref","first-page":"397","DOI":"10.1007\/0-387-29362-0_23","volume-title":"Bioinformatics and computational biology solutions using R and Bioconductor","author":"G. K. Smyth","year":"2005"},{"issue":"7","key":"pcbi.1011198.ref009","doi-asserted-by":"crossref","first-page":"e47","DOI":"10.1093\/nar\/gkv007","article-title":"limma powers differential expression analyses for RNA-sequencing and microarray studies","volume":"43","author":"M. E. Ritchie","year":"2015","journal-title":"Nucleic acids research"},{"issue":"12","key":"pcbi.1011198.ref010","first-page":"1","article-title":"Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2","volume":"15","author":"M. I. Love","year":"2014","journal-title":"Genome biology"},{"issue":"5","key":"pcbi.1011198.ref011","doi-asserted-by":"crossref","first-page":"495","DOI":"10.1038\/nbt.3192","article-title":"Spatial reconstruction of single-cell gene expression data","volume":"33","author":"R. Satija","year":"2015","journal-title":"Nature biotechnology"},{"issue":"1","key":"pcbi.1011198.ref012","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41467-021-20966-2","article-title":"Deep convolutional neural networks to predict cardiovascular risk from computed tomography","volume":"12","author":"R. Zeleznik","year":"2021","journal-title":"Nature communications"},{"key":"pcbi.1011198.ref013","doi-asserted-by":"crossref","unstructured":"Yao, D., Zhi-li, Z., Xiao-feng, Z., Wei, C., Fang, H., Yao-ming, C., and Cai, W.-W. (2022) Deep hybrid: multi-graph neural network collaboration for hyperspectral image classification. Defence Technology,.","DOI":"10.1016\/j.dt.2022.02.007"},{"key":"pcbi.1011198.ref014","first-page":"1","article-title":"Medical image-based detection of COVID-19 using deep convolution neural networks","author":"L. Gaur","year":"2021","journal-title":"Multimedia systems"},{"issue":"1","key":"pcbi.1011198.ref015","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41467-021-23952-w","article-title":"Correlator convolutional neural networks as an interpretable architecture for image-like quantum matter data","volume":"12","author":"C. Miles","year":"2021","journal-title":"Nature Communications"},{"key":"pcbi.1011198.ref016","article-title":"Wavelength-based attributed deep neural network for underwater image restoration","author":"P. K. Sharma","year":"2021","journal-title":"ACM Journal of the ACM (JACM)"},{"issue":"7","key":"pcbi.1011198.ref017","doi-asserted-by":"crossref","first-page":"2524","DOI":"10.1021\/acs.molpharmaceut.6b00248","article-title":"Deep learning applications for predicting pharmacological properties of drugs and drug repurposing using transcriptomic data","volume":"13","author":"A. Aliper","year":"2016","journal-title":"Molecular pharmaceutics"},{"issue":"1","key":"pcbi.1011198.ref018","first-page":"1","article-title":"Novel deep learning-based transcriptome data analysis for drug-drug interaction prediction with an application in diabetes","volume":"22","author":"Q. Luo","year":"2021","journal-title":"BMC bioinformatics"},{"issue":"1","key":"pcbi.1011198.ref019","first-page":"1","article-title":"A deep learning model to classify neoplastic state and tissue origin from transcriptomic data","volume":"12","author":"J. Hong","year":"2022","journal-title":"Scientific reports"},{"issue":"8","key":"pcbi.1011198.ref020","first-page":"45","article-title":"GSAE: an autoencoder with embedded gene-set nodes for genomics functional characterization","volume":"12","author":"H.-I. H. Chen","year":"2018","journal-title":"BMC systems biology"},{"key":"pcbi.1011198.ref021","doi-asserted-by":"crossref","first-page":"84","DOI":"10.1016\/j.neucom.2013.09.055","article-title":"Autoencoder for words","volume":"139","author":"C.-Y. Liou","year":"2014","journal-title":"Neurocomputing"},{"key":"pcbi.1011198.ref022","doi-asserted-by":"crossref","unstructured":"Way, G. P. and Greene, C. S. (2018) Extracting a biologically relevant latent space from cancer transcriptomes with variational autoencoders. In PACIFIC SYMPOSIUM ON BIOCOMPUTING 2018: Proceedings of the Pacific Symposium World Scientific pp. 80\u201391.","DOI":"10.1142\/9789813235533_0008"},{"issue":"12","key":"pcbi.1011198.ref023","doi-asserted-by":"crossref","first-page":"1053","DOI":"10.1038\/s41592-018-0229-2","article-title":"Deep generative modeling for single-cell transcriptomics","volume":"15","author":"R. Lopez","year":"2018","journal-title":"Nature Methods"},{"issue":"1","key":"pcbi.1011198.ref024","doi-asserted-by":"crossref","first-page":"2002","DOI":"10.1038\/s41467-018-04368-5","article-title":"Interpretable dimensionality reduction of single cell transcriptome data with deep generative models","volume":"9","author":"J. Ding","year":"2018","journal-title":"Nature Communications"},{"issue":"1","key":"pcbi.1011198.ref025","doi-asserted-by":"crossref","first-page":"5261","DOI":"10.1038\/s41467-021-25534-2","article-title":"Learning interpretable cellular and gene signature embeddings from single-cell transcriptomic data","volume":"12","author":"Y. Zhao","year":"2021","journal-title":"Nature Communications"},{"issue":"2","key":"pcbi.1011198.ref026","first-page":"337","article-title":"Biologically informed deep learning to query gene programs in single-cell atlases","volume":"25","author":"M. Lotfollahi","year":"2023","journal-title":"Nature Cell Biology"},{"key":"pcbi.1011198.ref027","unstructured":"Doersch, C. (2016) Tutorial on variational autoencoders. arXiv preprint arXiv:1606.05908,."},{"key":"pcbi.1011198.ref028","unstructured":"Higgins, I., Matthey, L., Pal, A., Burgess, C., Glorot, X., Botvinick, M., Mohamed, S., and Lerchner, A. (2016) beta-vae: Learning basic visual concepts with a constrained variational framework."},{"key":"pcbi.1011198.ref029","doi-asserted-by":"crossref","unstructured":"Rumelhart, D. E., Hinton, G. E., and Williams, R. J., Learning internal representations by error propagation. Technical report, California Univ San Diego La Jolla Inst for Cognitive Science (1985).","DOI":"10.21236\/ADA164453"},{"key":"pcbi.1011198.ref030","unstructured":"Torrente, A. A comprehensive human expression map."},{"issue":"2","key":"pcbi.1011198.ref031","doi-asserted-by":"crossref","first-page":"242","DOI":"10.1093\/biostatistics\/kxp059","article-title":"Frozen robust multiarray analysis (fRMA)","volume":"11","author":"M. N. McCall","year":"2010","journal-title":"Biostatistics"},{"issue":"suppl_1","key":"pcbi.1011198.ref032","doi-asserted-by":"crossref","first-page":"D1011","DOI":"10.1093\/nar\/gkq1259","article-title":"The Gene Expression Barcode: leveraging public data repositories to begin cataloging the human and murine transcriptomes","volume":"39","author":"M. N. McCall","year":"2011","journal-title":"Nucleic acids research"},{"key":"pcbi.1011198.ref033","article-title":"Assessing affymetrix GeneChip microarray quality","author":"L. Margus","year":"2011","journal-title":"BMC"},{"issue":"23","key":"pcbi.1011198.ref034","doi-asserted-by":"crossref","first-page":"3153","DOI":"10.1093\/bioinformatics\/bts588","article-title":"fRMA ST: frozen robust multiarray analysis for Affymetrix Exon and Gene ST arrays","volume":"28","author":"M. N. McCall","year":"2012","journal-title":"Bioinformatics"},{"issue":"6","key":"pcbi.1011198.ref035","doi-asserted-by":"crossref","first-page":"417","DOI":"10.1016\/j.cels.2015.12.004","article-title":"The molecular signatures database hallmark gene set collection","volume":"1","author":"A. Liberzon","year":"2015","journal-title":"Cell systems"},{"issue":"1","key":"pcbi.1011198.ref036","doi-asserted-by":"crossref","first-page":"27","DOI":"10.1093\/nar\/28.1.27","article-title":"KEGG: kyoto encyclopedia of genes and genomes","volume":"28","author":"M. Kanehisa","year":"2000","journal-title":"Nucleic acids research"},{"issue":"11","key":"pcbi.1011198.ref037","doi-asserted-by":"crossref","first-page":"1947","DOI":"10.1002\/pro.3715","article-title":"Toward understanding the origin and evolution of cellular organisms","volume":"28","author":"M. Kanehisa","year":"2019","journal-title":"Protein Science"},{"issue":"D1","key":"pcbi.1011198.ref038","doi-asserted-by":"crossref","first-page":"D587","DOI":"10.1093\/nar\/gkac963","article-title":"KEGG for taxonomy-based analysis of pathways and genomes","volume":"51","author":"M. Kanehisa","year":"2023","journal-title":"Nucleic acids research"},{"key":"pcbi.1011198.ref039","first-page":"2825","article-title":"Scikit-learn: Machine Learning in Python","volume":"12","author":"F. Pedregosa","year":"2011","journal-title":"Journal of Machine Learning Research"},{"issue":"1","key":"pcbi.1011198.ref040","doi-asserted-by":"crossref","first-page":"5233","DOI":"10.1038\/s41598-019-41695-z","article-title":"From Louvain to Leiden: guaranteeing well-connected communities","volume":"9","author":"V. A. Traag","year":"2019","journal-title":"Scientific reports"},{"issue":"W1","key":"pcbi.1011198.ref041","doi-asserted-by":"crossref","first-page":"W90","DOI":"10.1093\/nar\/gkw377","article-title":"Enrichr: a comprehensive gene set enrichment analysis web server 2016 update","volume":"44","author":"M. V. Kuleshov","year":"2016","journal-title":"Nucleic acids research"},{"issue":"1","key":"pcbi.1011198.ref042","first-page":"1","article-title":"Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool","volume":"14","author":"E. Y. Chen","year":"2013","journal-title":"BMC bioinformatics"},{"issue":"11","key":"pcbi.1011198.ref043","article-title":"Visualizing data using t-SNE","volume":"9","author":"L. Van der Maaten","year":"2008","journal-title":"Journal of machine learning research"},{"key":"pcbi.1011198.ref044","doi-asserted-by":"crossref","first-page":"1651","DOI":"10.2147\/DDDT.S415521","article-title":"Metastasis Related Epithelial-Mesenchymal Transition Signature Predicts Prognosis and Response to Chemotherapy in Acute Myeloid Leukemia","author":"S. Qu","year":"2023","journal-title":"Drug Design, Development and Therapy"},{"key":"pcbi.1011198.ref045","doi-asserted-by":"crossref","first-page":"151334","DOI":"10.1016\/j.ejcb.2023.151334","article-title":"Bone marrow mesenchymal\/fibroblastic stromal cells induce a distinctive EMT-like phenotype in AML cells","author":"N. Nojszewska","year":"2023","journal-title":"European Journal of Cell Biology"},{"issue":"3","key":"pcbi.1011198.ref046","doi-asserted-by":"crossref","first-page":"210","DOI":"10.1038\/s41419-020-2399-y","article-title":"Life, death, and autophagy in cancer: NF-\u03baB turns up everywhere","volume":"11","author":"D. Verzella","year":"2020","journal-title":"Cell death & disease"},{"key":"pcbi.1011198.ref047","doi-asserted-by":"crossref","first-page":"115459","DOI":"10.1016\/j.biopha.2023.115459","article-title":"Nuclear factor kappa B expression in non-small cell lung cancer","volume":"167","author":"L. Zhang","year":"2023","journal-title":"Biomedicine & Pharmacotherapy"},{"issue":"2","key":"pcbi.1011198.ref048","first-page":"462","article-title":"Research on the coagulation function changes in non small cell lung cancer patients and analysis of their correlation with metastasis and survival","volume":"22","author":"Y. Qi","year":"2017","journal-title":"J buon"},{"key":"pcbi.1011198.ref049","doi-asserted-by":"crossref","first-page":"291","DOI":"10.1016\/j.lungcan.2018.10.010","article-title":"Chemerin as a biomarker at the intersection of inflammation, chemotaxis, coagulation, fibrinolysis and metabolism in resectable non-small cell lung cancer","volume":"125","author":"G. P. Sotiropoulos","year":"2018","journal-title":"Lung Cancer"},{"issue":"7","key":"pcbi.1011198.ref050","doi-asserted-by":"crossref","first-page":"2134","DOI":"10.1002\/1097-0142(19931001)72:7<2134::AID-CNCR2820720712>3.0.CO;2-8","article-title":"Correlation between increased granulocyte elastase release and activation of blood coagulation in patients with lung cancer","volume":"72","author":"E. C. Gabazza","year":"1993","journal-title":"Cancer"},{"issue":"11","key":"pcbi.1011198.ref051","doi-asserted-by":"crossref","first-page":"e0207387","DOI":"10.1371\/journal.pone.0207387","article-title":"Coagulation biomarkers and prediction of venous thromboembolism and survival in small cell lung cancer: A sub-study of RASTEN-A randomized trial with low molecular weight heparin","volume":"13","author":"E. Gezelius","year":"2018","journal-title":"PLoS One"},{"issue":"17","key":"pcbi.1011198.ref052","first-page":"3761","article-title":"Increased expression of cyclooxygenase 2 occurs frequently in human lung cancers, specifically in adenocarcinomas","volume":"58","author":"T. Hida","year":"1998","journal-title":"Cancer research"}],"updated-by":[{"DOI":"10.1371\/journal.pcbi.1011198","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2024,7,16]],"date-time":"2024-07-16T00:00:00Z","timestamp":1721088000000}}],"container-title":["PLOS Computational Biology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1011198","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,7,16]],"date-time":"2024-07-16T13:46:55Z","timestamp":1721137615000},"score":1,"resource":{"primary":{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1011198"}},"subtitle":[],"editor":[{"given":"Mark","family":"Alber","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"editor"}]}],"short-title":[],"issued":{"date-parts":[[2024,7,3]]},"references-count":52,"journal-issue":{"issue":"7","published-online":{"date-parts":[[2024,7,3]]}},"URL":"https:\/\/doi.org\/10.1371\/journal.pcbi.1011198","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2023.05.22.541678","asserted-by":"object"}]},"ISSN":["1553-7358"],"issn-type":[{"value":"1553-7358","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,7,3]]}}}