{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,31]],"date-time":"2026-01-31T02:29:38Z","timestamp":1769826578471,"version":"3.49.0"},"reference-count":36,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2025,12,24]],"date-time":"2025-12-24T00:00:00Z","timestamp":1766534400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2026,1,30]],"date-time":"2026-01-30T00:00:00Z","timestamp":1769731200000},"content-version":"vor","delay-in-days":37,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"European Union\u2019s Horizon 2020 research and innovation programme","award":["721815"],"award-info":[{"award-number":["721815"]}]},{"name":"ERA-Net E-Rare JTC 2018","award":["40-44000-98-2006 \/ 90030376507"],"award-info":[{"award-number":["40-44000-98-2006 \/ 90030376507"]}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Background<\/jats:title>\n                    <jats:p>In studies that aim to model the relationship between an outcome variable and multiple omics datasets, it is often desirable to reduce the dimensionality of these datasets or to represent one omics dataset in terms of another. Several approaches exist for this purpose, including univariate methods such as polygenic scores, and multivariate methods. Multivariate approaches offer advantages by producing lower-dimensional integrative scores, capturing joint structures across datasets, and filtering out dataset-specific noise. 
In this paper, we describe one univariate and two multivariate methods, and evaluate their performance through simulations involving two correlated multivariate normally distributed omics datasets, as well as a combination of one multivariate normal and one fixed categorical dataset.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We assess method performance using the root mean squared error (RMSE) when modelling the outcome variable as a function of the reduced omics representations. Multivariate methods generally perform well, particularly when a slightly higher number of components is used for integration. They outperform the univariate method in scenarios involving two normally distributed omics datasets and perform comparably in settings with one normal and one categorical dataset. In real data applications, including two metabolomics datasets from TwinsUK and a metabolomics-genetic dataset from ORCADES, all methods show similar performance in modelling body mass index.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Conclusions<\/jats:title>\n                    <jats:p>Multivariate methods provide a valuable framework for summarizing multi-omics datasets into low-dimensional components suitable for outcome modelling. 
Even in the presence of non-normal data, these methods offer a promising alternative to high-dimensional univariate approaches.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1186\/s12859-025-06349-0","type":"journal-article","created":{"date-parts":[[2025,12,24]],"date-time":"2025-12-24T05:43:21Z","timestamp":1766555001000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Statistical modelling of an outcome variable with integrated multi-omics"],"prefix":"10.1186","volume":"27","author":[{"ORCID":"https:\/\/orcid.org\/0009-0000-7288-3330","authenticated-orcid":false,"given":"He","family":"Li","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7675-8000","authenticated-orcid":false,"given":"Zander","family":"Gu","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2279-4337","authenticated-orcid":false,"given":"Said","family":"el Bouhaddani","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4505-7137","authenticated-orcid":false,"given":"Jeanine","family":"Houwing-Duistermaat","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,12,24]]},"reference":[{"key":"6349_CR1","doi-asserted-by":"publisher","first-page":"1451","DOI":"10.1111\/rssc.12583","volume":"71","author":"S Bouhaddani","year":"2022","unstructured":"Bouhaddani S, Uh HW, Jongbloed G, Houwing-Duistermaat J. Statistical integration of heterogeneous omics data: probabilistic two-way partial least squares (po2pls). J R Stat Soc: Ser C: Appl Stat. 2022;71:1451\u201370.","journal-title":"J R Stat Soc: Ser C: Appl Stat"},{"issue":"9","key":"6349_CR2","doi-asserted-by":"publisher","first-page":"2759","DOI":"10.1038\/s41596-020-0353-1","volume":"15","author":"SW Choi","year":"2020","unstructured":"Choi SW, Mak TS-H, O\u2019Reilly PF. Tutorial: a guide to performing polygenic risk score analyses. 
Nat Protoc. 2020;15(9):2759\u201372.","journal-title":"Nat Protoc"},{"issue":"9","key":"6349_CR3","doi-asserted-by":"publisher","first-page":"1091","DOI":"10.1038\/ng.3367","volume":"47","author":"ER Gamazon","year":"2015","unstructured":"Gamazon ER, Wheeler HE, Shah KP, Mozaffari SV, Aquino-Michaels K, Carroll RJ, et al. A gene-based association method for mapping traits using reference transcriptome data. Nat Genet. 2015;47(9):1091\u20138.","journal-title":"Nat Genet"},{"key":"6349_CR4","first-page":"448","volume":"23","author":"B Li","year":"2018","unstructured":"Li B, Verma SS, Veturi YC, Verma A, Bradford Y, Haas DW, et al. Evaluation of predixcan for prioritizing gwas associations and predicting gene expression. Pacific Symposium Biocomput. 2018;23:448.","journal-title":"Pacific Symposium Biocomput"},{"issue":"1","key":"6349_CR5","doi-asserted-by":"publisher","first-page":"55","DOI":"10.1080\/00401706.1970.10488634","volume":"12","author":"AE Hoerl","year":"1970","unstructured":"Hoerl AE, Kennard RW. Ridge regression: biased estimation for nonorthogonal problems. Technometrics. 1970;12(1):55\u201367.","journal-title":"Technometrics"},{"issue":"1","key":"6349_CR6","doi-asserted-by":"publisher","first-page":"267","DOI":"10.1111\/j.2517-6161.1996.tb02080.x","volume":"58","author":"R Tibshirani","year":"1996","unstructured":"Tibshirani R. Regression shrinkage and selection via the lasso. J R Stat Soc Ser B Stat Methodol. 1996;58(1):267\u201388.","journal-title":"J R Stat Soc Ser B Stat Methodol"},{"key":"6349_CR7","doi-asserted-by":"publisher","DOI":"10.1201\/9781003026860","volume-title":"Multivariate Data Integration Using R: Methods and Applications with the mixOmics Package","author":"K-A L\u00ea Cao","year":"2021","unstructured":"L\u00ea Cao K-A, Welham ZM. Multivariate Data Integration Using R: Methods and Applications with the mixOmics Package. 
New York: Chapman and Hall\/CRC; 2021."},{"issue":"2","key":"6349_CR8","doi-asserted-by":"publisher","first-page":"131","DOI":"10.1016\/S0169-7439(01)00156-3","volume":"58","author":"S Wold","year":"2001","unstructured":"Wold S, Trygg J, Berglund A, Antti H. Some recent developments in pls modeling. Chemom Intell Lab Syst. 2001;58(2):131\u201350.","journal-title":"Chemom Intell Lab Syst"},{"key":"6349_CR9","doi-asserted-by":"crossref","unstructured":"L\u00ea Cao K-A, Rossouw D, Robert-Grani\u00e9 C, Besse P. A sparse pls for variable selection when integrating omics data. Statistic Appl Genet Molecul Biol. 2008;7(1).","DOI":"10.2202\/1544-6115.1390"},{"key":"6349_CR10","doi-asserted-by":"publisher","first-page":"331","DOI":"10.1016\/j.jmva.2018.05.009","volume":"167","author":"S Bouhaddani","year":"2018","unstructured":"Bouhaddani S, Uh H-W, Hayward C, Jongbloed G, Houwing-Duistermaat J. Probabilistic partial least squares model: Identifiability, estimation and application. J Multivar Anal. 2018;167:331\u201346.","journal-title":"J Multivar Anal"},{"key":"6349_CR11","doi-asserted-by":"publisher","first-page":"53","DOI":"10.1002\/cem.775","volume":"17","author":"J Trygg","year":"2003","unstructured":"Trygg J, Wold S. O2-pls, a two-block (x-y) latent variable regression (lvr) method with an integral osc filter. J Chemom. 2003;17:53\u201364.","journal-title":"J Chemom"},{"key":"6349_CR12","doi-asserted-by":"publisher","first-page":"11","DOI":"10.1186\/s12859-015-0854-z","volume":"17","author":"S Bouhaddani","year":"2016","unstructured":"Bouhaddani S, Houwing-Duistermaat J, Salo P, Perola M, Jongbloed G, Uh HW. Evaluation of o2pls in omics data integration. BMC Bioinform. 2016;17:11.","journal-title":"BMC Bioinform"},{"issue":"1","key":"6349_CR13","doi-asserted-by":"publisher","first-page":"454","DOI":"10.1093\/bib\/bbab454","volume":"23","author":"M Kang","year":"2022","unstructured":"Kang M, Ko E, Mersha TB. 
A roadmap for multi-omics data integration using deep learning. Brief Bioinform. 2022;23(1):454.","journal-title":"Brief Bioinform"},{"issue":"6","key":"6349_CR14","doi-asserted-by":"publisher","first-page":"1009","DOI":"10.1093\/bioinformatics\/btx682","volume":"34","author":"J Mariette","year":"2018","unstructured":"Mariette J, Villa-Vialaneix N. Unsupervised multiple kernel learning for heterogeneous data integration. Bioinformatics. 2018;34(6):1009\u201315.","journal-title":"Bioinformatics"},{"issue":"6","key":"6349_CR15","doi-asserted-by":"publisher","first-page":"523","DOI":"10.1017\/thg.2019.65","volume":"22","author":"S Verdi","year":"2019","unstructured":"Verdi S, Abbasian G, Bowyer RC, Lachance G, Yarand D, Christofidou P, et al. Twinsuk: the UK adult twin registry update. Twin Res Hum Genet. 2019;22(6):523\u20139.","journal-title":"Twin Res Hum Genet"},{"key":"6349_CR16","doi-asserted-by":"publisher","first-page":"899","DOI":"10.1375\/twin.9.6.899","volume":"9","author":"TD Spector","year":"2006","unstructured":"Spector TD, Williams FMK. The uk adult twin registry (twinsuk). Twin Res Hum Genet. 2006;9:899\u2013906.","journal-title":"Twin Res Hum Genet"},{"key":"6349_CR17","doi-asserted-by":"publisher","first-page":"371","DOI":"10.1186\/s12859-018-2371-3","volume":"19","author":"S Bouhaddani","year":"2018","unstructured":"Bouhaddani S, Uh HW, Jongbloed G, Hayward C, Klari\u0107 L, Kie\u0142basa SM, et al. Integrating omics datasets with the omicspls package. BMC Bioinform. 2018;19:371.","journal-title":"BMC Bioinform"},{"key":"6349_CR18","volume-title":"Predictive Inference","author":"S Geisser","year":"2017","unstructured":"Geisser S. Predictive Inference. New York: Chapman and Hall\/CRC; 2017."},{"key":"6349_CR19","doi-asserted-by":"publisher","first-page":"3427","DOI":"10.1093\/bioinformatics\/btu562","volume":"30","author":"I Shlyakhter","year":"2014","unstructured":"Shlyakhter I, Sabeti PC, Schaffner SF. 
Cosi2: An efficient simulator of exact and approximate coalescent with selection. Bioinformatics. 2014;30:3427\u20139.","journal-title":"Bioinformatics"},{"key":"6349_CR20","doi-asserted-by":"crossref","unstructured":"Gomari DP, Schweickart A, Cerchietti L, Paietta E, Fernandez H, Al-Amin H, et al. Variational autoencoders learn transferrable representations of metabolomics data. Commun Biol. 2022;5.","DOI":"10.1038\/s42003-022-03579-3"},{"issue":"16","key":"6349_CR21","doi-asserted-by":"publisher","first-page":"4252","DOI":"10.1073\/pnas.1603023113","volume":"113","author":"R Chaleckis","year":"2016","unstructured":"Chaleckis R, Murakami I, Takada J, Kondoh H, Yanagida M. Individual variability in human blood metabolites identifies age-related differences. Proc Natl Acad Sci. 2016;113(16):4252\u20139.","journal-title":"Proc Natl Acad Sci"},{"issue":"3","key":"6349_CR22","doi-asserted-by":"publisher","first-page":"359","DOI":"10.1016\/j.ajhg.2008.08.007","volume":"83","author":"R McQuillan","year":"2008","unstructured":"McQuillan R, Leutenegger A-L, Abdel-Rahman R, Franklin CS, Pericic M, Barac-Lauc L, et al. Runs of homozygosity in European populations. Am J Human Genet. 2008;83(3):359\u201372.","journal-title":"Am J Human Genet"},{"issue":"11","key":"6349_CR23","doi-asserted-by":"publisher","first-page":"1555","DOI":"10.1161\/CIRCRESAHA.117.312174","volume":"122","author":"C Menni","year":"2018","unstructured":"Menni C, Gudelj I, Macdonald-Dunlop E, Mangino M, Zierer J, Be\u0161i\u0107 E, et al. Glycosylation profile of immunoglobulin g is cross-sectionally associated with cardiovascular disease risk score and subclinical atherosclerosis in two independent cohorts. Circ Res. 
2018;122(11):1555\u201364.","journal-title":"Circ Res"},{"key":"6349_CR24","doi-asserted-by":"publisher","first-page":"2867","DOI":"10.1093\/bioinformatics\/btq559","volume":"26","author":"A Manichaikul","year":"2010","unstructured":"Manichaikul A, Mychaleckyj JC, Rich SS, Daly K, Sale M, Chen WM. Robust relationship inference in genome-wide association studies. Bioinformatics. 2010;26:2867\u201373.","journal-title":"Bioinformatics"},{"key":"6349_CR25","doi-asserted-by":"publisher","first-page":"559","DOI":"10.1086\/519795","volume":"81","author":"S Purcell","year":"2007","unstructured":"Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, et al. Plink: A tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81:559\u201375.","journal-title":"Am J Hum Genet"},{"key":"6349_CR26","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/ncomms11122","volume":"7","author":"J Kettunen","year":"2016","unstructured":"Kettunen J, Demirkan A, W\u00fcrtz P, Draisma HHM, Haller T, Rawal R, et al. Genome-wide study for circulating metabolites identifies 62 loci and reveals novel systemic effects of lpa. Nat Commun. 2016;7:1\u20139.","journal-title":"Nat Commun"},{"key":"6349_CR27","first-page":"1","volume":"11","author":"FA Hagenbeek","year":"2020","unstructured":"Hagenbeek FA, Pool R, Dongen J, Draisma HHM, Hottenga JJ, Willemsen G, et al. Heritability estimates for 361 blood metabolites across 40 genome-wide association studies. Nat Commun. 2020;11:1\u201311.","journal-title":"Nat Commun"},{"key":"6349_CR28","doi-asserted-by":"publisher","first-page":"211","DOI":"10.1111\/j.2517-6161.1964.tb00553.x","volume":"26","author":"GEP Box","year":"1964","unstructured":"Box GEP, Cox DR. An analysis of transformations. J R Stat Soc Ser B Stat Methodol. 
1964;26:211\u201352.","journal-title":"J R Stat Soc Ser B Stat Methodol"},{"key":"6349_CR29","doi-asserted-by":"publisher","first-page":"623","DOI":"10.18632\/aging.203847","volume":"14","author":"E Macdonald-Dunlop","year":"2022","unstructured":"Macdonald-Dunlop E, Taba N, Klari\u0107 L, Frkatovi\u0107 A, Walker R, Hayward C, et al. A catalogue of omics biological ageing clocks reveals substantial commonality and associations with disease risk. Aging. 2022;14:623\u201359.","journal-title":"Aging"},{"key":"6349_CR30","doi-asserted-by":"publisher","first-page":"1","DOI":"10.18637\/jss.v045.i03","volume":"45","author":"S Buuren","year":"2011","unstructured":"Buuren S, Groothuis-Oudshoorn K. mice: multivariate imputation by chained equations in r. J Stat Softw. 2011;45:1\u201367.","journal-title":"J Stat Softw"},{"issue":"13","key":"6349_CR31","doi-asserted-by":"publisher","first-page":"2627","DOI":"10.1080\/02664763.2024.2313458","volume":"51","author":"Z Gu","year":"2024","unstructured":"Gu Z, Uh H-W, Houwing-Duistermaat J, El Bouhaddani S. Joint modeling of an outcome variable and integrated omics datasets using glm-po2pls. J Appl Stat. 2024;51(13):2627\u201351.","journal-title":"J Appl Stat"},{"issue":"1","key":"6349_CR32","doi-asserted-by":"publisher","first-page":"234","DOI":"10.1186\/s12974-024-03218-0","volume":"21","author":"C Otto","year":"2024","unstructured":"Otto C, Kalantzis R, K\u00fcbler-Weller D, K\u00fchn AA, B\u00f6ld T, Regler A, et al. Comprehensive analysis of the cerebrospinal fluid and serum metabolome in neurological diseases. J Neuroinflammation. 2024;21(1):234.","journal-title":"J Neuroinflammation"},{"issue":"2","key":"6349_CR33","doi-asserted-by":"publisher","first-page":"981","DOI":"10.1002\/cpz1.981","volume":"4","author":"P Evans","year":"2024","unstructured":"Evans P, Nagai T, Konkashbaev A, Zhou D, Knapik EW, Gamazon ER. Transcriptome-wide association studies (twas): Methodologies, applications, and challenges. Current Protocols. 
2024;4(2):981.","journal-title":"Current Protocols"},{"key":"6349_CR34","doi-asserted-by":"publisher","first-page":"3735","DOI":"10.1016\/j.csbj.2021.06.030","volume":"19","author":"M Picard","year":"2021","unstructured":"Picard M, Scott-Boyer M-P, Bodein A, P\u00e9rin O, Droit A. Integration strategies of multi-omics data for machine learning analysis. Comput Struct Biotechnol J. 2021;19:3735\u201346.","journal-title":"Comput Struct Biotechnol J"},{"issue":"1","key":"6349_CR35","doi-asserted-by":"publisher","first-page":"214","DOI":"10.1186\/s12859-025-06245-7","volume":"26","author":"C Wang","year":"2025","unstructured":"Wang C, O\u2019Connell MJ. Autoencoders with shared and specific embeddings for multi-omics data integration. BMC Bioinform. 2025;26(1):214.","journal-title":"BMC Bioinform"},{"issue":"5","key":"6349_CR36","doi-asserted-by":"publisher","first-page":"763","DOI":"10.1111\/rssc.12060","volume":"63","author":"G Nyamundanda","year":"2014","unstructured":"Nyamundanda G, Gormley IC, Brennan L. A dynamic probabilistic principal components model for the analysis of longitudinal metabolomics data. J R Stat Soc: Ser C: Appl Stat. 
2014;63(5):763\u201382.","journal-title":"J R Stat Soc: Ser C: Appl Stat"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s12859-025-06349-0","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-025-06349-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-025-06349-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,1,30]],"date-time":"2026-01-30T14:00:08Z","timestamp":1769781608000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1186\/s12859-025-06349-0"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,12,24]]},"references-count":36,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2026,12]]}},"alternative-id":["6349"],"URL":"https:\/\/doi.org\/10.1186\/s12859-025-06349-0","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,12,24]]},"assertion":[{"value":"13 June 2025","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"4 December 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"24 December 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not 
applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publications"}},{"value":"ZG is currently an employee of Novartis Pharmaceuticals UK, but all work presented in this manuscript was completed while he was an employee of University of Cambridge and UMC Utrecht.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"26"}}