{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,8]],"date-time":"2026-04-08T22:12:16Z","timestamp":1775686336458,"version":"3.50.1"},"reference-count":52,"publisher":"Oxford University Press (OUP)","issue":"2","license":[{"start":{"date-parts":[[2023,1,13]],"date-time":"2023-01-13T00:00:00Z","timestamp":1673568000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"European Union\u2019s Horizon 2020 research and innovation programme"},{"DOI":"10.13039\/501100001732","name":"Danish National Research Foundation","doi-asserted-by":"publisher","award":["CEH-DNRF143"],"award-info":[{"award-number":["CEH-DNRF143"]}],"id":[{"id":"10.13039\/501100001732","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023,2,3]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Motivation<\/jats:title><jats:p>Machine learning (ML) methods are motivated by the need to automate information extraction from large datasets in order to support human users in data-driven tasks. This is an attractive approach for integrative joint analysis of vast amounts of omics data produced in next generation sequencing and other -omics assays. A systematic assessment of the current literature can help to identify key trends and potential gaps in methodology and applications. We surveyed the literature on ML multi-omic data integration and quantitatively explored the goals, techniques and data involved in this field. We were particularly interested in examining how researchers use ML to deal with the volume and complexity of these datasets.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>Our main finding is that the methods used are those that address the challenges of datasets with few samples and many features. Dimensionality reduction methods are used to reduce the feature count alongside models that can also appropriately handle relatively few samples. Popular techniques include autoencoders, random forests and support vector machines. We also found that the field is heavily influenced by the use of The Cancer Genome Atlas dataset, which is accessible and contains many diverse experiments.<\/jats:p><\/jats:sec><jats:sec><jats:title>Availability and implementation<\/jats:title><jats:p>All data and processing scripts are available at this GitLab repository: https:\/\/gitlab.com\/polavieja_lab\/ml_multi-omics_review\/ or in Zenodo: https:\/\/doi.org\/10.5281\/zenodo.7361807.<\/jats:p><\/jats:sec><jats:sec><jats:title>Supplementary information<\/jats:title><jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p><\/jats:sec>","DOI":"10.1093\/bioinformatics\/btad021","type":"journal-article","created":{"date-parts":[[2023,1,11]],"date-time":"2023-01-11T20:26:58Z","timestamp":1673468818000},"source":"Crossref","is-referenced-by-count":72,"title":["Dealing with dimensionality: the application of machine learning to multi-omics data"],"prefix":"10.1093","volume":"39","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3984-0125","authenticated-orcid":false,"given":"Dylan","family":"Feldner-Busztin","sequence":"first","affiliation":[{"name":"Champalimaud Centre for the Unknown, Champalimaud Foundation , 1400-038 Lisbon, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8748-2978","authenticated-orcid":false,"given":"Panos","family":"Firbas Nisantzis","sequence":"additional","affiliation":[{"name":"Champalimaud Centre for the Unknown, Champalimaud Foundation , 1400-038 Lisbon, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5815-4294","authenticated-orcid":false,"given":"Shelley Jane","family":"Edmunds","sequence":"additional","affiliation":[{"name":"Center for Evolutionary Hologenomics, GLOBE Institute, Faculty of Health and Medical Sciences, University of Copenhagen , 1353 Copenhagen, Denmark"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6453-8254","authenticated-orcid":false,"given":"Gergely","family":"Boza","sequence":"additional","affiliation":[{"name":"Centre for Ecological Research , 1113 Budapest, Hungary"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5025-2607","authenticated-orcid":false,"given":"Fernando","family":"Racimo","sequence":"additional","affiliation":[{"name":"Faculty of Health and Medical Sciences, University of Copenhagen , 2200 Copenhagen, Denmark"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2004-6810","authenticated-orcid":false,"given":"Shyam","family":"Gopalakrishnan","sequence":"additional","affiliation":[{"name":"Center for Evolutionary Hologenomics, GLOBE Institute, Faculty of Health and Medical Sciences, University of Copenhagen , 1353 Copenhagen, Denmark"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7718-6531","authenticated-orcid":false,"given":"Morten T\u00f8nsberg","family":"Limborg","sequence":"additional","affiliation":[{"name":"Center for Evolutionary Hologenomics, GLOBE Institute, Faculty of Health and Medical Sciences, University of Copenhagen , 1353 Copenhagen, Denmark"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5537-637X","authenticated-orcid":false,"given":"Leo","family":"Lahti","sequence":"additional","affiliation":[{"name":"Department of Computing, University of Turku , 20014 Turku, Finland"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5359-3426","authenticated-orcid":false,"given":"Gonzalo G","family":"de Polavieja","sequence":"additional","affiliation":[{"name":"Champalimaud Centre for the Unknown, Champalimaud Foundation , 1400-038 Lisbon, Portugal"}]}],"member":"286","published-online":{"date-parts":[[2023,1,13]]},"reference":[{"key":"2023020815014461500_btad021-B2","doi-asserted-by":"crossref","first-page":"20","DOI":"10.1109\/MCI.2018.2840660","article-title":"Augmentation of physician assessments with multi-omics enhances predictability of drug response: a case study of major depressive disorder","volume":"13","author":"Athreya","year":"2018","journal-title":"IEEE Comput. Intell. Mag"},{"key":"2023020815014461500_btad021-B3","doi-asserted-by":"crossref","first-page":"1196","DOI":"10.1038\/s41592-021-01252-x","article-title":"Effective gene expression prediction from sequence by integrating long-range interactions","volume":"18","author":"Avsec","year":"2021","journal-title":"Nat. Methods"},{"key":"2023020815014461500_btad021-B4","author":"Bahdanau","year":"2014"},{"key":"2023020815014461500_btad021-B5","article-title":"Mechanistic models versus machine learning, a fight worth fighting for the biological community?","author":"Baker","year":"2022","journal-title":"R. Soc. Biol. Lett"},{"key":"2023020815014461500_btad021-B6","doi-asserted-by":"crossref","first-page":"100280","DOI":"10.1016\/j.patter.2021.100280","article-title":"Modeling in systems biology: causal understanding before prediction?","volume":"2","author":"Barsi","year":"2021","journal-title":"Patterns"},{"key":"2023020815014461500_btad021-B8","doi-asserted-by":"crossref","DOI":"10.1515\/9781400874668","volume-title":"Adaptive Control Processes","author":"Bellman","year":"1961"},{"key":"2023020815014461500_btad021-B9","doi-asserted-by":"crossref","first-page":"2526","DOI":"10.1214\/14-AOS1260","article-title":"CAM: causal additive models, high-dimensional order search and penalized regression","volume":"42","author":"B\u00fchlmann","year":"2014","journal-title":"Ann. Statist"},{"key":"2023020815014461500_btad021-B10","doi-asserted-by":"crossref","first-page":"85","DOI":"10.1038\/nrc2981","article-title":"Regulation of cancer cell metabolism","volume":"11","author":"Cairns","year":"2011","journal-title":"Nat. Rev. Cancer"},{"key":"2023020815014461500_btad021-B11","doi-asserted-by":"crossref","first-page":"103798","DOI":"10.1016\/j.isci.2022.103798","article-title":"Machine learning for multi-omics data integration in cancer","volume":"25","author":"Cai","year":"2022","journal-title":"Iscience"},{"key":"2023020815014461500_btad021-B12","doi-asserted-by":"crossref","first-page":"1248","DOI":"10.1158\/1078-0432.CCR-17-0853","article-title":"Deep learning-based multi-omics integration robustly predicts survival in liver cancer","volume":"24","author":"Chaudhary","year":"2018","journal-title":"Clin. Cancer Res"},{"key":"2023020815014461500_btad021-B14","doi-asserted-by":"crossref","first-page":"187","DOI":"10.1111\/j.2517-6161.1972.tb00899.x","article-title":"Regression models and life tables","volume":"34","author":"Cox","year":"1972","journal-title":"J. R. Stat. Soc. Ser. B (Methodological)"},{"key":"2023020815014461500_btad021-B15","first-page":"138","article-title":"On protein synthesis","volume":"12","author":"Crick","year":"1958","journal-title":"Symp. Soc. Exp. Biol"},{"key":"2023020815014461500_btad021-B16","doi-asserted-by":"crossref","first-page":"561","DOI":"10.1038\/227561a0","article-title":"Central dogma of molecular biology","volume":"227","author":"Crick","year":"1970","journal-title":"Nature"},{"key":"2023020815014461500_btad021-B17","doi-asserted-by":"crossref","first-page":"42","DOI":"10.1109\/MM.2021.3113475","article-title":"Evolution of the graphics processing unit (GPU)","volume":"41","author":"Dally","year":"2021","journal-title":"IEEE Micro"},{"key":"2023020815014461500_btad021-B18","author":"Devlin","year":"2018"},{"key":"2023020815014461500_btad021-B19","doi-asserted-by":"crossref","first-page":"e9730","DOI":"10.15252\/msb.20209730","article-title":"Causal integration of multi-omics data with prior knowledge to generate mechanistic hypotheses","volume":"17","author":"Dugourd","year":"2021","journal-title":"Mol. Syst. Biol"},{"key":"2023020815014461500_btad021-B20","doi-asserted-by":"crossref","first-page":"113","DOI":"10.1017\/S0007114511005241","article-title":"Effects of kiwifruit extracts on colonic gene and protein expression levels in IL-10 gene-deficient mice","volume":"108","author":"Edmunds","year":"2012","journal-title":"Br. J. Nutr"},{"key":"2023020815014461500_btad021-B21","doi-asserted-by":"crossref","first-page":"503","DOI":"10.1038\/s41586-019-1186-3","article-title":"Next-generation characterization of the cancer cell line encyclopedia","volume":"569","author":"Ghandi","year":"2019","journal-title":"Nature"},{"key":"2023020815014461500_btad021-B22","doi-asserted-by":"publisher","author":"Holofood","year":"2019","DOI":"10.3030\/817729"},{"key":"2023020815014461500_btad021-B23","doi-asserted-by":"crossref","first-page":"583","DOI":"10.1038\/s41586-021-03819-2","article-title":"Highly accurate protein structure prediction with AlphaFold","volume":"596","author":"Jumper","year":"2021","journal-title":"Nature"},{"key":"2023020815014461500_btad021-B24","article-title":"Convolutional networks for images, speech, and time series","volume":"3361","author":"LeCun","year":"1995","journal-title":"The Handbook of Brain Theory and Neural Networks"},{"key":"2023020815014461500_btad021-B25","doi-asserted-by":"crossref","first-page":"1795","DOI":"10.1093\/bib\/bby051","article-title":"Review of applications of high-throughput sequencing in personalized medicine: barriers and facilitators of future progress in research and clinical application","volume":"20","author":"Lightbody","year":"2019","journal-title":"Brief. Bioinform"},{"key":"2023020815014461500_btad021-B26","doi-asserted-by":"crossref","first-page":"634511","DOI":"10.3389\/fmicb.2021.634511","article-title":"Applications of machine learning in human microbiome studies: a review on feature selection, biomarker identification, disease prediction and treatment","volume":"12","author":"Marcos-Zambrano","year":"2021","journal-title":"Front. Microbiol"},{"key":"2023020815014461500_btad021-B27","doi-asserted-by":"crossref","first-page":"7361","DOI":"10.1073\/pnas.1510493113","article-title":"Methods for causal inference from gene perturbation experiments and validation","volume":"113","author":"Meinshausen","year":"2016","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023020815014461500_btad021-B28","first-page":"1573","author":"Mitchel","year":"2020"},{"key":"2023020815014461500_btad021-B29","doi-asserted-by":"crossref","first-page":"277","DOI":"10.3389\/fmicb.2021.635781","article-title":"Statistical and machine learning techniques in human microbiome studies: contemporary challenges and solutions","volume":"12","author":"Moreno-Indias","year":"2021","journal-title":"Front. Microbiol"},{"key":"2023020815014461500_btad021-B30","doi-asserted-by":"crossref","first-page":"1515","DOI":"10.1093\/bib\/bbaa257","article-title":"Biological network analysis with deep learning","volume":"22","author":"Muzio","year":"2021","journal-title":"Brief. Bioinform"},{"key":"2023020815014461500_btad021-B31","first-page":"387","article-title":"Quantitative proteomics of the cancer cell line encyclopedia, Cell","author":"Nusinow","year":"2020"},{"key":"2023020815014461500_btad021-B34","volume-title":"Probabilistic Reasoning in Intelligent Systems","author":"Pearl","year":"1988"},{"key":"2023020815014461500_btad021-B35","doi-asserted-by":"crossref","first-page":"107739","DOI":"10.1016\/j.biotechadv.2021.107739","article-title":"Using machine learning approaches for multi-omics data analysis: a review","volume":"49","author":"Reel","year":"2021","journal-title":"Biotechnol. Adv"},{"key":"2023020815014461500_btad021-B36","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13073-021-00930-x","article-title":"DeepProg: an ensemble of deep-learning and machine-learning models for prognosis prediction using multi-omics data","volume":"13","author":"Poirion","year":"2021","journal-title":"Genome Med"},{"key":"2023020815014461500_btad021-B37","doi-asserted-by":"crossref","first-page":"1384","DOI":"10.1109\/JBHI.2021.3102186","article-title":"Predicting drug response based on multi-omics fusion and graph convolution","volume":"26","author":"Peng","year":"2022","journal-title":"IEEE J. Biomed. Health Inform"},{"key":"2023020815014461500_btad021-B38","doi-asserted-by":"crossref","first-page":"947","DOI":"10.1111\/rssb.12167","article-title":"Causal inference by using invariant prediction: identification and confidence intervals","volume":"78","author":"Peters","year":"2016","journal-title":"J. R. Stat. Soc. Ser. B (Stat. Methodol.)"},{"key":"2023020815014461500_btad021-B39","doi-asserted-by":"crossref","first-page":"533","DOI":"10.1038\/323533a0","article-title":"Learning representations by back-propagating errors","volume":"323","author":"Rumelhart","year":"1986","journal-title":"Nature"},{"key":"2023020815014461500_btad021-B40","doi-asserted-by":"crossref","first-page":"2833","DOI":"10.1016\/j.ygeno.2020.03.021","article-title":"Estimating gene expression from DNA methylation and copy number variation: a deep learning regression model for multi-omics integration","volume":"112","author":"Seal","year":"2020","journal-title":"Genomics"},{"key":"2023020815014461500_btad021-B41","first-page":"2003","article-title":"A linear non-Gaussian acyclic model for causal discovery","volume":"7","author":"Shohei","year":"2006","journal-title":"J. Mach. Learn. Res"},{"key":"2023020815014461500_btad021-B42","author":"Singha","year":"2020"},{"key":"2023020815014461500_btad021-B43","first-page":"A68","article-title":"The Cancer Genome Atlas (TCGA): an immeasurable source of knowledge","volume":"19","author":"Tomczak","year":"2015","journal-title":"Contemporary Oncol"},{"key":"2023020815014461500_btad021-B44","doi-asserted-by":"crossref","first-page":"467","DOI":"10.7326\/M18-0850","article-title":"PRISMA extension for scoping reviews (PRISMA-ScR): checklist and explanation","volume":"169","author":"Tricco","year":"2018","journal-title":"Ann. Intern. Med"},{"key":"2023020815014461500_btad021-B45","first-page":"5998","article-title":"Attention is all you need","volume":"30","author":"Vaswani","year":"2017","journal-title":"Adv. Neural Informat. Process. Syst"},{"key":"2023020815014461500_btad021-B46","author":"Vincent","year":"2008"},{"key":"2023020815014461500_btad021-B47","first-page":"1","article-title":"MOGONET integrates multi-omics data using graph convolutional networks allowing patient classification and biomarker identification","volume":"12","author":"Wang","year":"2021","journal-title":"Nat. Commun"},{"key":"2023020815014461500_btad021-B49","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/sdata.2016.18","article-title":"The FAIR Guiding Principles for scientific data management and stewardship","volume":"3","author":"Wilkinson","year":"2016","journal-title":"Sci. Data"},{"key":"2023020815014461500_btad021-B50","doi-asserted-by":"crossref","first-page":"2178","DOI":"10.1093\/bioinformatics\/btac088","article-title":"Multi-level attention graph neural network based on co-expression gene modules for disease diagnosis and prognosis","volume":"38","author":"Xing","year":"2022","journal-title":"Bioinformatics"},{"key":"2023020815014461500_btad021-B51","doi-asserted-by":"crossref","first-page":"782","DOI":"10.1109\/TCBB.2018.2866836","article-title":"Integration of multi-omics data for gene regulatory network inference and application to breast cancer","volume":"16","author":"Yuan","year":"2019","journal-title":"IEEE\/ACM Trans. Comput. Biol. Bioinform"},{"key":"2023020815014461500_btad021-B52","first-page":"17283","article-title":"Big bird: transformers for longer sequences","volume":"33","author":"Zaheer","year":"2020","journal-title":"Adv. Neural Inf. Process. Syst"},{"key":"2023020815014461500_btad021-B53","doi-asserted-by":"crossref","first-page":"104048","DOI":"10.1016\/j.isci.2022.104048","article-title":"Multi-omics protein-coding units as massively parallel Bayesian networks: empirical validation of causality structure","volume":"25","author":"Zenere","year":"2022","journal-title":"iScience"},{"key":"2023020815014461500_btad021-B54","author":"Zhang","year":"2022"},{"key":"2023020815014461500_btad021-B55","doi-asserted-by":"crossref","first-page":"477","DOI":"10.3389\/fgene.2018.00477","article-title":"Deep learning-based multi-omics data integration reveals two prognostic subtypes in high-risk neuroblastoma","volume":"9","author":"Zhang","year":"2018","journal-title":"Front. Genet"},{"key":"2023020815014461500_btad021-B56","first-page":"100019","article-title":"AutoGGN: a gene graph network AutoML tool for multi-omics research","volume":"1","author":"Zhang","year":"2021","journal-title":"Artif. Intell. Life Sci"},{"key":"2023020815014461500_btad021-B57","doi-asserted-by":"crossref","first-page":"i457","DOI":"10.1093\/bioinformatics\/bty294","article-title":"Modeling polypharmacy side effects with graph convolutional networks","volume":"34","author":"Zitnik","year":"2018","journal-title":"Bioinformatics"},{"key":"2023020815014461500_btad021-B58","doi-asserted-by":"crossref","first-page":"571","DOI":"10.1038\/s41586-022-04570-y","article-title":"Nonlinear control of transcription through enhancer\u2013promoter interactions","volume":"604","author":"Zuin","year":"2022","journal-title":"Nature"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btad021\/48691591\/btad021.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/2\/btad021\/49124137\/btad021.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/2\/btad021\/49124137\/btad021.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,10,12]],"date-time":"2024-10-12T00:54:13Z","timestamp":1728694453000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btad021\/6986971"}},"subtitle":[],"editor":[{"given":"Jonathan","family":"Wren","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2023,1,13]]},"references-count":52,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2023,2,3]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btad021","relation":{},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2023,2,1]]},"published":{"date-parts":[[2023,1,13]]},"article-number":"btad021"}}