{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,14]],"date-time":"2026-01-14T00:20:00Z","timestamp":1768350000933,"version":"3.49.0"},"update-to":[{"DOI":"10.1371\/journal.pcbi.1009826","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2022,2,7]],"date-time":"2022-02-07T00:00:00Z","timestamp":1644192000000}}],"reference-count":39,"publisher":"Public Library of Science (PLoS)","issue":"1","license":[{"start":{"date-parts":[[2022,1,26]],"date-time":"2022-01-26T00:00:00Z","timestamp":1643155200000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"The Chinese University of Hong Kong - Shenzhen","award":["UDF0100185"],"award-info":[{"award-number":["UDF0100185"]}]}],"content-domain":{"domain":["www.ploscompbiol.org"],"crossmark-restriction":false},"short-container-title":["PLoS Comput Biol"],"abstract":"<jats:p>In the integrative analyses of omics data, it is often of interest to extract data representation from one data type that best reflect its relations with another data type. This task is traditionally fulfilled by linear methods such as canonical correlation analysis (CCA) and partial least squares (PLS). However, information contained in one data type pertaining to the other data type may be complex and in nonlinear form. Deep learning provides a convenient alternative to extract low-dimensional nonlinear data embedding. In addition, the deep learning setup can naturally incorporate the effects of clinical confounding factors into the integrative analysis. Here we report a deep learning setup, named Autoencoder-based Integrative Multi-omics data Embedding (AIME), to extract data representation for omics data integrative analysis. 
The method can adjust for confounder variables, achieve informative data embedding, rank features in terms of their contributions, and find pairs of features from the two data types that are related to each other through the data embedding. In simulation studies, the method was highly effective in the extraction of major contributing features between data types. Using two real microRNA-gene expression datasets, one with confounder variables and one without, we show that AIME excluded the influence of confounders, and extracted biologically plausible novel information. The R package based on Keras and the TensorFlow backend is available at <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/github.com\/tianwei-yu\/AIME\" xlink:type=\"simple\">https:\/\/github.com\/tianwei-yu\/AIME<\/jats:ext-link>.<\/jats:p>","DOI":"10.1371\/journal.pcbi.1009826","type":"journal-article","created":{"date-parts":[[2022,1,26]],"date-time":"2022-01-26T18:39:16Z","timestamp":1643222356000},"page":"e1009826","update-policy":"https:\/\/doi.org\/10.1371\/journal.pcbi.corrections_policy","source":"Crossref","is-referenced-by-count":29,"title":["AIME: Autoencoder-based integrative multi-omics data embedding that allows for confounder adjustments"],"prefix":"10.1371","volume":"18","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2502-1628","authenticated-orcid":true,"given":"Tianwei","family":"Yu","sequence":"first","affiliation":[]}],"member":"340","published-online":{"date-parts":[[2022,1,26]]},"reference":[{"issue":"3\/4","key":"pcbi.1009826.ref001","doi-asserted-by":"crossref","first-page":"321","DOI":"10.2307\/2333955","article-title":"Relations between two sets of variates","volume":"28","author":"H. 
Hotelling","year":"1936","journal-title":"Biometrika"},{"issue":"1","key":"pcbi.1009826.ref002","doi-asserted-by":"crossref","first-page":"300","DOI":"10.1111\/biom.12715","article-title":"Integrative analysis of transcriptomic and metabolomic data via sparse canonical correlation analysis with incorporation of biological information","volume":"74","author":"SE Safo","year":"2018","journal-title":"Biometrics"},{"key":"pcbi.1009826.ref003","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1002\/cem.1180020306","article-title":"PLS regression methods","volume":"2","author":"A. Hoskuldsson","year":"1988","journal-title":"Journal of chemometrics"},{"issue":"5","key":"pcbi.1009826.ref004","doi-asserted-by":"crossref","first-page":"365","DOI":"10.1142\/S012906570000034X","article-title":"Kernel and nonlinear canonical correlation analysis","volume":"10","author":"PL Lai","year":"2000","journal-title":"Int J Neural Syst"},{"key":"pcbi.1009826.ref005","article-title":"Deep Canonical Correlation Analysis","author":"G Andrew","year":"2013","journal-title":"Proceedings of the 30th International Conference on Machine Learning, PMLR"},{"issue":"22","key":"pcbi.1009826.ref006","doi-asserted-by":"crossref","first-page":"2906","DOI":"10.1093\/bioinformatics\/btp543","article-title":"Integrative clustering of multiple genomic data types using a joint latent variable model with application to breast and lung cancer subtype analysis","volume":"25","author":"R Shen","year":"2009","journal-title":"Bioinformatics"},{"issue":"2","key":"pcbi.1009826.ref007","doi-asserted-by":"crossref","first-page":"1044","DOI":"10.1093\/nar\/gky1226","article-title":"Multi-omic and multi-view clustering algorithms: review and cancer benchmark","volume":"47","author":"N Rappoport","year":"2019","journal-title":"Nucleic Acids Res"},{"issue":"1","key":"pcbi.1009826.ref008","doi-asserted-by":"crossref","first-page":"111","DOI":"10.1186\/s13059-020-02015-1","article-title":"MOFA+: a statistical 
framework for comprehensive integration of multi-modal single-cell data","volume":"21","author":"R Argelaguet","year":"2020","journal-title":"Genome Biol"},{"issue":"6","key":"pcbi.1009826.ref009","doi-asserted-by":"crossref","first-page":"e8124","DOI":"10.15252\/msb.20178124","article-title":"Multi-Omics Factor Analysis-a framework for unsupervised integration of multi-omics data sets","volume":"14","author":"R Argelaguet","year":"2018","journal-title":"Mol Syst Biol"},{"issue":"10","key":"pcbi.1009826.ref010","doi-asserted-by":"crossref","DOI":"10.3390\/cancers11101434","article-title":"Data Fusion Techniques for the Integration of Multi-Domain Genomic Data from Uveal Melanoma","volume":"11","author":"M Pfeffer","year":"2019","journal-title":"Cancers (Basel)"},{"issue":"3","key":"pcbi.1009826.ref011","doi-asserted-by":"crossref","first-page":"333","DOI":"10.1038\/nmeth.2810","article-title":"Similarity network fusion for aggregating data types on a genomic scale","volume":"11","author":"B Wang","year":"2014","journal-title":"Nat Methods"},{"key":"pcbi.1009826.ref012","first-page":"132","article-title":"Unsupervised feature construction and knowledge extraction from genome-wide assays of breast cancer with denoising autoencoders","volume":"2015","author":"J Tan","year":"2015","journal-title":"Pac Symp Biocomput"},{"issue":"1","key":"pcbi.1009826.ref013","doi-asserted-by":"crossref","first-page":"390","DOI":"10.1038\/s41467-018-07931-2","article-title":"Single-cell RNA-seq denoising using a deep count autoencoder","volume":"10","author":"G Eraslan","year":"2019","journal-title":"Nat Commun."},{"issue":"1","key":"pcbi.1009826.ref014","doi-asserted-by":"crossref","first-page":"16329","DOI":"10.1038\/s41598-018-34688-x","article-title":"AutoImpute: Autoencoder based imputation of single-cell RNA-seq data","volume":"8","author":"D Talwar","year":"2018","journal-title":"Sci 
Rep."},{"key":"pcbi.1009826.ref015","doi-asserted-by":"crossref","first-page":"582","DOI":"10.1186\/s12864-016-2931-8","article-title":"IPMiner: hidden ncRNA-protein interaction sequential pattern mining with stacked autoencoder for accurate computational prediction","volume":"17","author":"X Pan","year":"2016","journal-title":"BMC Genomics"},{"issue":"22","key":"pcbi.1009826.ref016","doi-asserted-by":"crossref","first-page":"3873","DOI":"10.1093\/bioinformatics\/bty440","article-title":"deepNF: deep network fusion for protein function prediction","volume":"34","author":"V Gligorijevic","year":"2018","journal-title":"Bioinformatics"},{"key":"pcbi.1009826.ref017","first-page":"219","article-title":"A Deep Learning Approach for Cancer Detection and Relevant Gene Identification","volume":"22","author":"P Danaee","year":"2017","journal-title":"Pac Symp Biocomput"},{"key":"pcbi.1009826.ref018","doi-asserted-by":"crossref","first-page":"226","DOI":"10.3389\/fgene.2019.00226","article-title":"Predicting Parkinson\u2019s Disease Genes Based on Node2vec and Autoencoder","volume":"10","author":"J Peng","year":"2019","journal-title":"Front Genet"},{"issue":"5","key":"pcbi.1009826.ref019","doi-asserted-by":"crossref","first-page":"320","DOI":"10.1016\/j.gpb.2018.08.003","article-title":"VASC: Dimension Reduction and Visualization of Single-cell RNA-seq Data by Deep Variational Autoencoder","volume":"16","author":"D Wang","year":"2018","journal-title":"Genomics Proteomics Bioinformatics"},{"key":"pcbi.1009826.ref020","article-title":"Supervised autoencoders: improving generalization performance with unsupervised regularizers","author":"L Le","year":"2018","journal-title":"The 32nd International Conference on Neural Information Processing Systems"},{"issue":"6","key":"pcbi.1009826.ref021","doi-asserted-by":"crossref","first-page":"1248","DOI":"10.1158\/1078-0432.CCR-17-0853","article-title":"Deep Learning-Based Multi-Omics Integration Robustly Predicts Survival in Liver 
Cancer","volume":"24","author":"K Chaudhary","year":"2018","journal-title":"Clin Cancer Res"},{"issue":"4","key":"pcbi.1009826.ref022","doi-asserted-by":"crossref","first-page":"928","DOI":"10.1109\/TCBB.2014.2377729","article-title":"Integrative Data Analysis of Multi-Platform Cancer Data with a Multimodal Deep Learning Approach","volume":"12","author":"M Liang","year":"2015","journal-title":"IEEE\/ACM Trans Comput Biol Bioinform"},{"key":"pcbi.1009826.ref023","article-title":"Deep-joint-learning analysis model of single cell transcriptome and open chromatin accessibility data","author":"C Zuo","year":"2020","journal-title":"Brief Bioinform"},{"key":"pcbi.1009826.ref024","author":"F. Chollet","year":"2015"},{"key":"pcbi.1009826.ref025","doi-asserted-by":"crossref","first-page":"96","DOI":"10.1198\/016214504000000089","article-title":"Large-scale simultaneous hypothesis testing: the choice of a null hypothesis","volume":"99","author":"B. E.","year":"2004","journal-title":"J Amer Stat Assoc"},{"issue":"3","key":"pcbi.1009826.ref026","doi-asserted-by":"crossref","first-page":"519","DOI":"10.1093\/biomet\/57.3.519","article-title":"Measures of multivariate skewness and kurtosis with applications","volume":"57","author":"KV Mardia","year":"1970","journal-title":"Biometrika"},{"key":"pcbi.1009826.ref027","doi-asserted-by":"crossref","first-page":"745","DOI":"10.1007\/978-0-387-84858-7","volume-title":"The elements of statistical learning: data mining, inference, and prediction","author":"T Hastie","year":"2009","edition":"2"},{"issue":"7757","key":"pcbi.1009826.ref028","doi-asserted-by":"crossref","first-page":"503","DOI":"10.1038\/s41586-019-1186-3","article-title":"Next-generation characterization of the Cancer Cell Line Encyclopedia","volume":"569","author":"M Ghandi","year":"2019","journal-title":"Nature"},{"issue":"7418","key":"pcbi.1009826.ref029","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1038\/nature11412","article-title":"Comprehensive molecular 
portraits of human breast tumours","volume":"490","author":"N. Cancer Genome Atlas","year":"2012","journal-title":"Nature"},{"issue":"11","key":"pcbi.1009826.ref030","doi-asserted-by":"crossref","first-page":"e1005752","DOI":"10.1371\/journal.pcbi.1005752","article-title":"mixOmics: An R package for \u2019omics feature selection and multiple data integration","volume":"13","author":"F Rohart","year":"2017","journal-title":"PLoS Comput Biol"},{"issue":"4","key":"pcbi.1009826.ref031","doi-asserted-by":"crossref","first-page":"e35236","DOI":"10.1371\/journal.pone.0035236","article-title":"Integrative subtype discovery in glioblastoma using iCluster","volume":"7","author":"R Shen","year":"2012","journal-title":"PLoS One"},{"key":"pcbi.1009826.ref032","doi-asserted-by":"crossref","first-page":"105","DOI":"10.1007\/BF02296656","article-title":"Four simultaneous component models for the analysis of multivariate time series from more than one subject to model intraindividual and interindividual differences","volume":"68","author":"ME Timmerman","year":"2003","journal-title":"Psychometrika"},{"issue":"17","key":"pcbi.1009826.ref033","doi-asserted-by":"crossref","first-page":"e133","DOI":"10.1093\/nar\/gku631","article-title":"The multiMiR R package and database: integration of microRNA-target interactions along with their disease and drug associations","volume":"42","author":"Y Ru","year":"2014","journal-title":"Nucleic Acids Res"},{"issue":"4","key":"pcbi.1009826.ref034","doi-asserted-by":"crossref","first-page":"1091","DOI":"10.1158\/1078-0432.CCR-16-0943","article-title":"miR-206 Inhibits Stemness and Metastasis of Breast Cancer by Targeting MKL1\/IL11 Pathway","volume":"23","author":"R Samaeekia","year":"2017","journal-title":"Clin Cancer Res"},{"key":"pcbi.1009826.ref035","doi-asserted-by":"crossref","first-page":"169","DOI":"10.1186\/1476-4598-9-169","article-title":"MicroRNA, hsa-miR-200c, is an independent prognostic factor in pancreatic cancer and its upregulation 
inhibits pancreatic cancer invasion but increases cell proliferation","volume":"9","author":"J Yu","year":"2010","journal-title":"Mol Cancer"},{"issue":"W1","key":"pcbi.1009826.ref036","doi-asserted-by":"crossref","first-page":"W460","DOI":"10.1093\/nar\/gkv403","article-title":"DIANA-miRPath v3.0: deciphering microRNA function with experimental support","volume":"43","author":"IS Vlachos","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"pcbi.1009826.ref037","article-title":"Fast gene set enrichment analysis","author":"G Korotkevich","year":"2019","journal-title":"bioRxiv"},{"issue":"8","key":"pcbi.1009826.ref038","doi-asserted-by":"crossref","first-page":"1160","DOI":"10.1200\/JCO.2008.18.1370","article-title":"Supervised risk predictor of breast cancer based on intrinsic subtypes","volume":"27","author":"JS Parker","year":"2009","journal-title":"J Clin Oncol"},{"issue":"4","key":"pcbi.1009826.ref039","doi-asserted-by":"crossref","first-page":"207","DOI":"10.1186\/bcr3425","article-title":"Platelets, coagulation and fibrinolysis in breast cancer progression","volume":"15","author":"I Lal","year":"2013","journal-title":"Breast Cancer Res"}],"updated-by":[{"DOI":"10.1371\/journal.pcbi.1009826","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2022,2,7]],"date-time":"2022-02-07T00:00:00Z","timestamp":1644192000000}}],"container-title":["PLOS Computational 
Biology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1009826","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,2,7]],"date-time":"2022-02-07T18:39:17Z","timestamp":1644259157000},"score":1,"resource":{"primary":{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1009826"}},"subtitle":[],"editor":[{"given":"Wei","family":"Li","sequence":"first","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2022,1,26]]},"references-count":39,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2022,1,26]]}},"URL":"https:\/\/doi.org\/10.1371\/journal.pcbi.1009826","relation":{"new_version":[{"id-type":"doi","id":"10.1371\/journal.pcbi.1009826","asserted-by":"object"}]},"ISSN":["1553-7358"],"issn-type":[{"value":"1553-7358","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,1,26]]}}}