{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,12]],"date-time":"2026-05-12T04:00:01Z","timestamp":1778558401305,"version":"3.51.4"},"reference-count":43,"publisher":"Oxford University Press (OUP)","issue":"17","license":[{"start":{"date-parts":[[2020,5,21]],"date-time":"2020-05-21T00:00:00Z","timestamp":1590019200000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2020,11,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Motivation<\/jats:title><jats:p>Recent developments in technology have enabled researchers to collect multiple OMICS datasets for the same individuals. The conventional approach for understanding the relationships between the collected datasets and the complex trait of interest would be through the analysis of each OMIC dataset separately from the rest, or to test for associations between the OMICS datasets. In this work we show that integrating multiple OMICS datasets together, instead of analysing them separately, improves our understanding of their in-between relationships as well as the predictive accuracy for the tested trait. Several approaches have been proposed for the integration of heterogeneous and high-dimensional (p\u226bn) data, such as OMICS. The sparse variant of canonical correlation analysis (CCA) approach is a promising one that seeks to penalize the canonical variables for producing sparse latent variables while achieving maximal correlation between the datasets. 
Over the last years, a number of approaches for implementing sparse CCA (sCCA) have been proposed, where they differ on their objective functions, iterative algorithm for obtaining the sparse latent variables and make different assumptions about the original datasets.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>Through a comparative study we have explored the performance of the conventional CCA proposed by Parkhomenko et al., penalized matrix decomposition CCA proposed by Witten and Tibshirani and its extension proposed by Suo et al. The aforementioned methods were modified to allow for different penalty functions. Although sCCA is an unsupervised learning approach for understanding of the in-between relationships, we have twisted the problem as a supervised learning one and investigated how the computed latent variables can be used for predicting complex traits. The approaches were extended to allow for multiple (more than two) datasets where the trait was included as one of the input datasets. 
Both ways have shown improvement over conventional predictive models that include one or multiple datasets.<\/jats:p><\/jats:sec><jats:sec><jats:title>Availability and implementation<\/jats:title><jats:p>https:\/\/github.com\/theorod93\/sCCA.<\/jats:p><\/jats:sec><jats:sec><jats:title>Supplementary information<\/jats:title><jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p><\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaa530","type":"journal-article","created":{"date-parts":[[2020,5,16]],"date-time":"2020-05-16T12:59:02Z","timestamp":1589633942000},"page":"4616-4625","source":"Crossref","is-referenced-by-count":61,"title":["Integrating multi-OMICS data through sparse canonical correlation analysis for the prediction of complex traits: a comparison study"],"prefix":"10.1093","volume":"36","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6613-2530","authenticated-orcid":false,"given":"Theodoulos","family":"Rodosthenous","sequence":"first","affiliation":[{"name":"Department of Mathematics, Imperial College London , London SW7 2AZ, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4013-5458","authenticated-orcid":false,"given":"Vahid","family":"Shahrezaei","sequence":"additional","affiliation":[{"name":"Department of Mathematics, Imperial College London , London SW7 2AZ, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Marina","family":"Evangelou","sequence":"additional","affiliation":[{"name":"Department of Mathematics, Imperial College London , London SW7 2AZ, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2020,5,21]]},"reference":[{"key":"2023062213563163200_btaa530-B1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1561\/2200000016","article-title":"Distributed optimization and statistical learning via the alternating direction method of 
multipliers","volume":"3","author":"Boyd","year":"2010","journal-title":"Found. Trends Mach. Learn"},{"key":"2023062213563163200_btaa530-B2","doi-asserted-by":"crossref","first-page":"411","DOI":"10.1038\/nbt.4096","article-title":"Integrating single-cell transcriptomic data across different conditions, technologies, and species","volume":"36","author":"Butler","year":"2018","journal-title":"Nat. Biotechnol"},{"key":"2023062213563163200_btaa530-B3","doi-asserted-by":"crossref","first-page":"245","DOI":"10.1016\/j.csda.2011.07.012","article-title":"Comparison of penalty functions for sparse canonical correlation analysis","volume":"56","author":"Chalise","year":"2012","journal-title":"Computational Statistics and Data Anal"},{"key":"2023062213563163200_btaa530-B4","doi-asserted-by":"crossref","first-page":"3050","DOI":"10.1109\/TPAMI.2013.104","article-title":"Sparse canonical correlation analysis: new formulation and algorithm","volume":"35","author":"Chu","year":"2013","journal-title":"IEEE Trans. Pattern Anal. Mach. 
Intell"},{"key":"2023062213563163200_btaa530-B5","doi-asserted-by":"crossref","first-page":"278","DOI":"10.1093\/bioinformatics\/btx594","article-title":"A novel SCCA approach via truncated 1-norm and truncated group lasso for brain imaging genetics","volume":"34","author":"Du","year":"2018","journal-title":"Bioinformatics"},{"key":"2023062213563163200_btaa530-B6","doi-asserted-by":"crossref","first-page":"i474","DOI":"10.1093\/bioinformatics\/btz320","article-title":"Identifying progressive imaging genetic patterns via multi-task sparse canonical correlation analysis: a longitudinal study of the ADNI cohort","volume":"35","author":"Du","year":"2019","journal-title":"Bioinformatics"},{"key":"2023062213563163200_btaa530-B7","doi-asserted-by":"crossref","first-page":"1348","DOI":"10.1198\/016214501753382273","article-title":"Variable selection via nonconcave penalized likelihood and its oracle properties","volume":"96","author":"Fan","year":"2001","journal-title":"Am. Stat. Assoc"},{"key":"2023062213563163200_btaa530-B8","doi-asserted-by":"crossref","first-page":"3480","DOI":"10.1093\/bioinformatics\/btw485","article-title":"Joint sparse canonical correlation analysis for detecting differential imaging genetics modules","volume":"32","author":"Fang","year":"2016","journal-title":"Bioinformatics"},{"key":"2023062213563163200_btaa530-B9","doi-asserted-by":"crossref","first-page":"20150571","DOI":"10.1098\/rsif.2015.0571","article-title":"Methods for biological data integration: perspectives and Challenges","volume":"12","author":"Gligorijevi\u0107","year":"2015","journal-title":"J. R. Soc. Interface"},{"key":"2023062213563163200_btaa530-B10","doi-asserted-by":"crossref","first-page":"331","DOI":"10.1007\/s10994-010-5222-7","article-title":"Sparse canonical correlation analysis","volume":"83","author":"Hardoon","year":"2011","journal-title":"Mach. 
Learn"},{"key":"2023062213563163200_btaa530-B11","doi-asserted-by":"crossref","first-page":"83","DOI":"10.1186\/s13059-017-1215-1","article-title":"Multi-omics approaches to disease","volume":"18","author":"Hasin","year":"2017","journal-title":"Genome Biol"},{"key":"2023062213563163200_btaa530-B12","doi-asserted-by":"crossref","first-page":"37","DOI":"10.1016\/j.coisb.2017.08.009","article-title":"Designing and interpreting \u2018multi-omic\u2019 experiments that may change our understanding of biology","volume":"6","author":"Hass","year":"2017","journal-title":"Curr. Opin. Syst. Biol"},{"key":"2023062213563163200_btaa530-B13","doi-asserted-by":"crossref","first-page":"321","DOI":"10.1093\/biomet\/28.3-4.321","article-title":"Relations between two sets of variables","volume":"28","author":"Hotelling","year":"1936","journal-title":"Biometrika"},{"key":"2023062213563163200_btaa530-B14","doi-asserted-by":"crossref","first-page":"1460","DOI":"10.1016\/j.jcss.2011.12.025","article-title":"A spectral algorithm for learning hidden Markov models","volume":"78","author":"Hsu","year":"2012","journal-title":"J. Comp. Syst. Sci"},{"key":"2023062213563163200_btaa530-B15","doi-asserted-by":"crossref","first-page":"84","DOI":"10.3389\/fgene.2017.00084","article-title":"More is better: recent progress in multi-omics data integration methods","volume":"8","author":"Huang","year":"2017","journal-title":"Front. Genet"},{"key":"2023062213563163200_btaa530-B16","doi-asserted-by":"crossref","first-page":"144","DOI":"10.1016\/j.ijcard.2018.10.102","article-title":"Multivariate analysis of genome-wide data to identify potential pleiotropic genes for type 2 diabetes, obesity and coronary artery disease using MetaCCA","volume":"283","author":"Jia","year":"2019","journal-title":"Int. J. 
Cardiol"},{"key":"2023062213563163200_btaa530-B17","doi-asserted-by":"crossref","first-page":"223","DOI":"10.1016\/j.ygeno.2016.04.005","article-title":"Integrated analysis of multidimensional omics data on cutaneous melanoma prognosis","volume":"107","author":"Jiang","year":"2016","journal-title":"Genomics"},{"key":"2023062213563163200_btaa530-B18","article-title":"Sparse canonical methods for biological data integration: application to a cross-platform study","volume":"10, 34","author":"L\u00ea Cao","year":"2009","journal-title":"BMC Bioinformatics"},{"key":"2023062213563163200_btaa530-B19","first-page":"325","article-title":"A review on machine learning principles for multi-view biological data integration","volume":"19","author":"Li","year":"2018","journal-title":"Brief. Bioinform"},{"key":"2023062213563163200_btaa530-B20","doi-asserted-by":"crossref","first-page":"523","DOI":"10.1214\/12-AOAS597","article-title":"Joint and individual variation explained (JIVE) for integrated analysis of multiple data types","volume":"7","author":"Lock","year":"2013","journal-title":"Ann. Appl. 
Stat"},{"key":"2023062213563163200_btaa530-B21","doi-asserted-by":"crossref","first-page":"734","DOI":"10.1111\/biom.13043","article-title":"An iterative penalized least squares approach to sparse canonical correlation analysis","volume":"75","author":"Mai","year":"2019","journal-title":"Biometrics"},{"key":"2023062213563163200_btaa530-B22","doi-asserted-by":"crossref","first-page":"1009","DOI":"10.1093\/bioinformatics\/btx682","article-title":"Unsupervised multiple kernel learning for heterogeneous data integration","volume":"34","author":"Mariette","year":"2018","journal-title":"Bioinformatics"},{"key":"2023062213563163200_btaa530-B23","doi-asserted-by":"crossref","first-page":"767","DOI":"10.1002\/hep.21510","article-title":"Novel aspects of PPAR\u03b1-mediated regulation of lipid and xenobiotic metabolism revealed through a nutrigenomic study","volume":"45","author":"Martin","year":"2007","journal-title":"Hepatology"},{"key":"2023062213563163200_btaa530-B24","article-title":"SparseNet: coordinate descent with nonconvex penalties","volume":"106, 1125\u20131138","author":"Mazumder","year":"2011","journal-title":"J. Am. Stat. Assoc"},{"key":"2023062213563163200_btaa530-B25","first-page":"123","article-title":"Proximal algorithms","volume":"1","author":"Parikh","year":"2014","journal-title":"Found. Trends Optim"},{"key":"2023062213563163200_btaa530-B26","doi-asserted-by":"crossref","first-page":"87","DOI":"10.1016\/j.jbi.2018.06.001","article-title":"Patient similarity for precision medicine: a systematic review","volume":"83","author":"Parimbelli","year":"2018","journal-title":"J. Biomed. Inform"},{"key":"2023062213563163200_btaa530-B27","doi-asserted-by":"crossref","first-page":"1","DOI":"10.2202\/1544-6115.1406","article-title":"Sparse canonical correlation analysis with application to genomic data integration","volume":"8","author":"Parkhomenko","year":"2009","journal-title":"Stat. Appl. Genet. Mol. 
Biol"},{"key":"2023062213563163200_btaa530-B28","first-page":"197","article-title":"Deep learning data integration for better risk stratification models of bladder cancer","volume":"2017","author":"Poirion","year":"2018","journal-title":"AMIA Jt Summits Transl. Sci. Proc"},{"key":"2023062213563163200_btaa530-B29","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41524-017-0028-9","article-title":"Data analytics using canonical correlation analysis and Monte Carlo simulation","volume":"3","author":"Rickman","year":"2017","journal-title":"NPJ Comput. Mater"},{"key":"2023062213563163200_btaa530-B30","doi-asserted-by":"publisher","author":"Sathyanarayanan","year":"2019","DOI":"10.1093\/bib\/bbz121"},{"key":"2023062213563163200_btaa530-B31","doi-asserted-by":"crossref","first-page":"37","DOI":"10.1207\/s15327752jpa8401_09","article-title":"Conducting and interpreting canonical correlation analysis in personality research: a user-friendly primer","volume":"84","author":"Sherry","year":"2005","journal-title":"J. Pers. Assess"},{"key":"2023062213563163200_btaa530-B32","doi-asserted-by":"crossref","first-page":"1177932219899051","DOI":"10.1177\/1177932219899051","article-title":"Multi-omics data integration, interpretation, and its application","volume":"14","author":"Subramanian","year":"2020","journal-title":"Bioinform. Biol. 
Insights"},{"key":"2023062213563163200_btaa530-B33","author":"Suo","year":"2017"},{"key":"2023062213563163200_btaa530-B34","first-page":"4886","author":"Swanson","year":"2019"},{"key":"2023062213563163200_btaa530-B35","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1038\/nature11412","article-title":"Comprehensive molecular portraits of human breast tumours","volume":"490","year":"2012","journal-title":"Nature"},{"key":"2023062213563163200_btaa530-B36","doi-asserted-by":"crossref","first-page":"267","DOI":"10.1111\/j.2517-6161.1996.tb02080.x","article-title":"Regression shrinkage and selection via the lasso","volume":"58","author":"Tibshirani","year":"1996","journal-title":"J. R. Stat. Soc. B"},{"key":"2023062213563163200_btaa530-B37","doi-asserted-by":"crossref","first-page":"e40358","DOI":"10.1371\/journal.pone.0040358","article-title":"Integration of clinical and gene expression data has a synergetic effect on predicting breast cancer outcome","volume":"7","author":"Van Vliet","year":"2012","journal-title":"PLoS One"},{"key":"2023062213563163200_btaa530-B38","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.isprsjprs.2014.11.002","article-title":"Canonical information analysis","volume":"101","author":"Vestergaard","year":"2015","journal-title":"ISPRS J. Photogramm. Remote Sens"},{"key":"2023062213563163200_btaa530-B39","doi-asserted-by":"crossref","DOI":"10.2202\/1544-6115.1329","article-title":"Quantifying the association between gene expressions and DNA-markers by penalized canonical correlation analysis","volume":"7","author":"Waaijenborg","year":"2008","journal-title":"Stat. Appl. Genet. Mol. Biol"},{"key":"2023062213563163200_btaa530-B40","doi-asserted-by":"crossref","first-page":"333","DOI":"10.1038\/nmeth.2810","article-title":"Similarity network fusion for aggregating data types on a genomic scale","volume":"11","author":"Wang","year":"2014","journal-title":"Nat. 
Methods"},{"key":"2023062213563163200_btaa530-B41","doi-asserted-by":"crossref","first-page":"1","DOI":"10.2202\/1544-6115.1470","article-title":"Extensions of sparse canonical correlation analysis with applications to genomic data","volume":"8","author":"Witten","year":"2009","journal-title":"Stat. Appl. Genet. Mol. Biol"},{"key":"2023062213563163200_btaa530-B42","doi-asserted-by":"crossref","first-page":"4","DOI":"10.3390\/ht8010004","article-title":"A selective review of multi-level omics data integration using variable selection","volume":"8","author":"Wu","year":"2019","journal-title":"High Throughput"},{"key":"2023062213563163200_btaa530-B43","doi-asserted-by":"crossref","first-page":"291","DOI":"10.1093\/bib\/bbu003","article-title":"Combining multidimensional genomic measurements for predicting cancer prognosis: observations from TCGA","volume":"16","author":"Zhao","year":"2015","journal-title":"Brief. Bioinform"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaa530\/33796509\/btaa530.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/17\/4616\/50677605\/btaa530.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/17\/4616\/50677605\/btaa530.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,8,5]],"date-time":"2024-08-05T22:27:59Z","timestamp":1722896879000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/36\/17\/4616\/5841662"}},"subtitle":[],"editor":[{"given":"Jonathan","family":"Wren","sequence":"additional","affiliation":[],"role":[{"role":"editor","vocabul
ary":"crossref"}]}],"short-title":[],"issued":{"date-parts":[[2020,5,21]]},"references-count":43,"journal-issue":{"issue":"17","published-print":{"date-parts":[[2020,11,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaa530","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2020,9,1]]},"published":{"date-parts":[[2020,5,21]]}}}