{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,16]],"date-time":"2026-04-16T01:40:17Z","timestamp":1776303617503,"version":"3.50.1"},"reference-count":31,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2021,3,18]],"date-time":"2021-03-18T00:00:00Z","timestamp":1616025600000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"},{"start":{"date-parts":[[2021,3,18]],"date-time":"2021-03-18T00:00:00Z","timestamp":1616025600000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100010665","name":"H2020 Marie Sklodowska-Curie Actions","doi-asserted-by":"publisher","award":["721815"],"award-info":[{"award-number":["721815"]}],"id":[{"id":"10.13039\/100010665","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2021,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Background<\/jats:title>\n                    <jats:p>Nowadays, multiple omics data are measured on the same samples in the belief that these different omics datasets represent various aspects of the underlying biological systems. Integrating these omics datasets will facilitate the understanding of the systems. For this purpose, various methods have been proposed, such as Partial Least Squares (PLS), decomposing two datasets into joint and residual subspaces. Since omics data are heterogeneous, the joint components in PLS will contain variation specific to each dataset. To account for this, Two-way Orthogonal Partial Least Squares (O2PLS) captures the heterogeneity by introducing orthogonal subspaces and better estimates the joint subspaces. However, the latent components spanning the joint subspaces in O2PLS are linear combinations of all variables, while it might be of interest to identify a small subset relevant to the research question. To obtain sparsity, we extend O2PLS to Group Sparse O2PLS (GO2PLS) that utilizes biological information on group structures among variables and performs group selection in the joint subspace.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>The simulation study showed that introducing sparsity improved the feature selection performance. Furthermore, incorporating group structures increased robustness of the feature selection procedure. GO2PLS performed optimally in terms of accuracy of joint score estimation, joint loading estimation, and feature selection. We applied GO2PLS to datasets from two studies: TwinsUK (a population study) and CVON-DOSIS (a small case-control study). In the first, we incorporated biological information on the group structures of the methylation CpG sites when integrating the methylation dataset with the IgG glycomics data. The targeted genes of the selected methylation groups turned out to be relevant to the immune system, in which the IgG glycans play important roles. In the second, we selected regulatory regions and transcripts that explained the covariance between regulomics and transcriptomics data. The corresponding genes of the selected features appeared to be relevant to heart muscle disease.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Conclusions<\/jats:title>\n                    <jats:p>GO2PLS integrates two omics datasets to help understand the underlying system that involves both omics levels. It incorporates external group information and performs group selection, resulting in a small subset of features that best explain the relationship between two omics datasets for better interpretability.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1186\/s12859-021-03958-3","type":"journal-article","created":{"date-parts":[[2021,3,18]],"date-time":"2021-03-18T17:02:48Z","timestamp":1616086968000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":12,"title":["Statistical integration of two omics datasets using GO2PLS"],"prefix":"10.1186","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7675-8000","authenticated-orcid":false,"given":"Zhujie","family":"Gu","sequence":"first","affiliation":[]},{"given":"Said","family":"el Bouhaddani","sequence":"additional","affiliation":[]},{"given":"Jiayi","family":"Pei","sequence":"additional","affiliation":[]},{"given":"Jeanine","family":"Houwing-Duistermaat","sequence":"additional","affiliation":[]},{"given":"Hae-Won","family":"Uh","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2021,3,18]]},"reference":[{"issue":"1","key":"3958_CR1","doi-asserted-by":"publisher","first-page":"32","DOI":"10.1093\/bib\/bbl016","volume":"8","author":"A-LL Boulesteix","year":"2007","unstructured":"Boulesteix A-LL, Strimmer K. Partial least squares: A versatile tool for the analysis of high-dimensional genomic data. Briefings Bioinform. 2007;8(1):32\u201344. https:\/\/doi.org\/10.1093\/bib\/bbl016.","journal-title":"Briefings in Bioinformatics"},{"key":"3958_CR2","doi-asserted-by":"publisher","unstructured":"Wold S, Ruhe A, Wold H, Dunn III WJ. The Collinearity Problem in Linear Regression. The Partial Least Squares (PLS) Approach to Generalized Inverses. SIAM J Sci Stat Comput. 1984;5(3):735\u201343. https:\/\/doi.org\/10.1137\/0905052 arXiv:1308.0863v1","DOI":"10.1137\/0905052"},{"issue":"1","key":"3958_CR3","doi-asserted-by":"publisher","first-page":"53","DOI":"10.1002\/cem.775","volume":"17","author":"J Trygg","year":"2003","unstructured":"Trygg J, Wold S. O2-PLS, a two-block (X-Y) latent variable regression (LVR) method with an integral OSC filter. J Chemom. 2003;17(1):53\u201364. https:\/\/doi.org\/10.1002\/cem.775.","journal-title":"Journal of Chemometrics"},{"issue":"2","key":"3958_CR4","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s12859-015-0854-z","volume":"17","author":"S el Bouhaddani","year":"2016","unstructured":"el Bouhaddani S, Houwing-Duistermaat J, Salo P, Perola M, Jongbloed G, Uh HW. Evaluation of O2PLS in Omics data integration. BMC Bioinform. 2016;17(2):1\u201320. https:\/\/doi.org\/10.1186\/s12859-015-0854-z.","journal-title":"BMC Bioinform."},{"key":"3958_CR5","doi-asserted-by":"publisher","unstructured":"Jolliffe IT, Trendafilov NT, Uddin M. A modified principal component technique based on the LASSO. J Comput Graph Stat. 2003;12(3):531\u201347. https:\/\/doi.org\/10.1198\/1061860032148, arXiv:1205.0121v2","DOI":"10.1198\/1061860032148"},{"key":"3958_CR6","doi-asserted-by":"publisher","unstructured":"Chun, H., Kele\u015f, S.: Sparse partial least squares regression for simultaneous dimension reduction and variable selection. J R Stat Soc Ser B: Stat Methodol 72(1), 3\u201325 (2010). https:\/\/doi.org\/10.1111\/j.1467-9868.2009.00723.x","DOI":"10.1111\/j.1467-9868.2009.00723.x"},{"key":"3958_CR7","doi-asserted-by":"publisher","unstructured":"L\u00ea Cao, K.A., Rossouw, D., Robert-Grani\u00e9, C., Besse, P. A sparse PLS for variable selection when integrating omics data. Statist Appl Genet Mol Biol. 7(1) (2008). https:\/\/doi.org\/10.2202\/1544-6115.1390","DOI":"10.2202\/1544-6115.1390"},{"issue":"10","key":"3958_CR8","doi-asserted-by":"publisher","first-page":"105","DOI":"10.1186\/gb-2011-12-10-r105","volume":"12","author":"S Tyekucheva","year":"2011","unstructured":"Tyekucheva S, Marchionni L, Karchin R, Parmigiani G. Integrating diverse genomic data using gene sets. Genome Biology. 2011;12(10):105. https:\/\/doi.org\/10.1186\/gb-2011-12-10-r105.","journal-title":"Genome Biol"},{"issue":"1","key":"3958_CR9","doi-asserted-by":"publisher","first-page":"49","DOI":"10.1111\/j.1467-9868.2005.00532.x","volume":"68","author":"M Yuan","year":"2006","unstructured":"Yuan M, Lin Y. Model selection and estimation in regression with grouped variables. J R Stat Soc Ser B: Stat Methodol. 2006;68(1):49\u201367. https:\/\/doi.org\/10.1111\/j.1467-9868.2005.00532.x.","journal-title":"Journal of the Royal Statistical Society. Series B: Statistical Methodology"},{"issue":"1","key":"3958_CR10","doi-asserted-by":"publisher","first-page":"35","DOI":"10.1093\/bioinformatics\/btv535","volume":"32","author":"B Liquet","year":"2016","unstructured":"Liquet B, De Micheaux PL, Hejblum BP, Thi\u00e9baut R. Group and sparse group partial least square approaches applied in genomics context. Bioinformatics. 2016;32(1):35\u201342. https:\/\/doi.org\/10.1093\/bioinformatics\/btv535.","journal-title":"Bioinformatics"},{"issue":"6","key":"3958_CR11","doi-asserted-by":"publisher","first-page":"899","DOI":"10.1375\/twin.9.6.899","volume":"9","author":"TD Spector","year":"2006","unstructured":"Spector TD, Williams FMK. The UK Adult Twin Registry (TwinsUK). Twin Res Hum Genet. 2006;9(6):899\u2013906. https:\/\/doi.org\/10.1375\/twin.9.6.899.","journal-title":"Twin Research and Human Genetics"},{"issue":"1","key":"3958_CR12","doi-asserted-by":"publisher","first-page":"144","DOI":"10.1017\/thg.2012.89","volume":"16","author":"A Moayyeri","year":"2013","unstructured":"Moayyeri A, Hammond CJ, Hart DJ, Spector TD. The UK adult twin registry (twinsUK resource). Twin Res Hum Genet. 2013;16(1):144\u20139. https:\/\/doi.org\/10.1017\/thg.2012.89.","journal-title":"Twin Research and Human Genetics"},{"key":"3958_CR13","doi-asserted-by":"publisher","unstructured":"Wahl A, Kasela S, Carnero-Montoro E, van Iterson M, \u0160tambuk J, Sharma S, van\u00a0den Akker E, Klaric L, Benedetti E, Razdorov G, Trbojevi\u0107-Akma\u010di\u0107 I, Vu\u010dkovi\u0107 F, Ugrina I, Beekman M, Deelen J, van Heemst D, Heijmans BT, Consortium BIOS, Wuhrer M, Plomp R, Keser T, \u0160imurina M, Pavi\u0107 T, Gudelj I, Kri\u0161ti\u0107 J, Grallert H, Kunze S, Peters A, Bell JT, Spector TD, Milani L, Slagboom PE, Lauc G, Gieger C. IgG glycosylation and DNA methylation are interconnected with smoking. Biochimica et Biophysica Acta (BBA) - General Subjects 1862(3), 637\u2013648 (2018). https:\/\/doi.org\/10.1016\/J.BBAGEN.2017.10.012","DOI":"10.1016\/J.BBAGEN.2017.10.012"},{"key":"3958_CR14","unstructured":"CVON-DOSIS \u2013 Cardiovascular Research Consortium. http:\/\/cvon-dosis.nl\/. Accessed 18 Nov 2020"},{"issue":"43","key":"3958_CR15","doi-asserted-by":"publisher","first-page":"15545","DOI":"10.1073\/pnas.0506580102","volume":"102","author":"A Subramanian","year":"2005","unstructured":"Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA. 2005;102(43):15545\u201350. https:\/\/doi.org\/10.1073\/pnas.0506580102.","journal-title":"Proceedings of the National Academy of Sciences of the United States of America"},{"issue":"2","key":"3958_CR16","doi-asserted-by":"publisher","first-page":"203","DOI":"10.4161\/epi.23470","volume":"8","author":"Y-aA Chen","year":"2013","unstructured":"Chen Y-aA, Lemire M, Choufani S, Butcher DT, Grafodatskaya D, Zanke BW, Gallinger S, Hudson TJ, Weksberg R. Discovery of cross-reactive probes and polymorphic CpGs in the Illumina Infinium HumanMethylation450 microarray. Epigenetics. 2013;8(2):203\u20139. https:\/\/doi.org\/10.4161\/epi.23470.","journal-title":"Epigenetics"},{"key":"3958_CR17","doi-asserted-by":"publisher","unstructured":"Uh H-W, Klari\u0107 L, Ugrina I, Lauc G, Smilde AK, Houwing-Duistermaat JJ. Choosing proper normalization is essential for discovery of sparse glycan biomarkers. Mol Omics. 2020. https:\/\/doi.org\/10.1039\/c9mo00174c.","DOI":"10.1039\/c9mo00174c"},{"issue":"6","key":"3958_CR18","doi-asserted-by":"publisher","first-page":"996","DOI":"10.1101\/gr.229102","volume":"12","author":"WJ Kent","year":"2002","unstructured":"Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, Haussler aD. The human genome browser at UCSC. Genome Res. 2002;12(6):996\u20131006. https:\/\/doi.org\/10.1101\/gr.229102.","journal-title":"Genome Research"},{"key":"3958_CR19","unstructured":"UCSC Genome Browser Home. https:\/\/genome.ucsc.edu\/. Accessed 19 Nov 2020"},{"key":"3958_CR20","unstructured":"Robinson MD, Oshlack A. A scaling normalization method for differential expression analysis of RNA-seq data. Tech Rep (2010). http:\/\/genomebiology.com\/2010\/11\/3\/R25"},{"key":"3958_CR21","doi-asserted-by":"publisher","unstructured":"Wold H. Nonlinear Iterative Partial Least Squares (NIPALS) Modelling: Some Current Developments. In: Multivariate Analysis\u2013III, pp. 383\u2013407 (1973). https:\/\/doi.org\/10.1016\/b978-0-12-426653-7.50032-6. https:\/\/www.sciencedirect.com\/science\/article\/pii\/B9780124266537500326","DOI":"10.1016\/b978-0-12-426653-7.50032-6"},{"issue":"3","key":"3958_CR22","doi-asserted-by":"publisher","first-page":"515","DOI":"10.1093\/biostatistics\/kxp008","volume":"10","author":"DM Witten","year":"2009","unstructured":"Witten DM, Tibshirani R, Hastie T. A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics. 2009;10(3):515\u201334. https:\/\/doi.org\/10.1093\/biostatistics\/kxp008.","journal-title":"Biostatistics"},{"issue":"1","key":"3958_CR23","doi-asserted-by":"publisher","first-page":"267","DOI":"10.1111\/j.2517-6161.1996.tb02080.x","volume":"58","author":"R Tibshirani","year":"1996","unstructured":"Tibshirani R. Regression Shrinkage and Selection Via the Lasso. J R Stat Soc: Ser B (Methodological). 1996;58(1):267\u201388. https:\/\/doi.org\/10.1111\/j.2517-6161.1996.tb02080.x.","journal-title":"Journal of the Royal Statistical Society: Series B (Methodological)"},{"issue":"1","key":"3958_CR24","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1201\/b18401","volume":"84","author":"T Hastie","year":"2015","unstructured":"Hastie T, Tibshirani R, Wainwright M. Statistical learning with sparsity: the lasso and generalizations. Stat Learn Spars: Lasso General. 2015;84(1):1\u2013337. https:\/\/doi.org\/10.1201\/b18401.","journal-title":"Statistical Learning with Sparsity: The Lasso and Generalizations"},{"key":"3958_CR25","doi-asserted-by":"publisher","unstructured":"Chen J, Bardes EE, Aronow BJ, Jegga AG. ToppGene Suite for gene list enrichment analysis and candidate gene prioritization. Nucleic Acids Res. 2009;37(SUPPL.2). https:\/\/doi.org\/10.1093\/nar\/gkp427.","DOI":"10.1093\/nar\/gkp427"},{"issue":"1","key":"3958_CR26","doi-asserted-by":"publisher","first-page":"289","DOI":"10.1111\/j.2517-6161.1995.tb02031.x","volume":"57","author":"Y Benjamini","year":"1995","unstructured":"Benjamini Y, Hochberg Y. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. J R Stat Soc: Ser B (Methodological). 1995;57(1):289\u2013300. https:\/\/doi.org\/10.1111\/j.2517-6161.1995.tb02031.x.","journal-title":"J R Stat Soc: Ser B (Methodological)"},{"key":"3958_CR27","doi-asserted-by":"publisher","unstructured":"Storey JD. A direct approach to false discovery rates. Technical Report. 2002;3. https:\/\/doi.org\/10.1111\/1467-9868.00346.","DOI":"10.1111\/1467-9868.00346"},{"key":"3958_CR28","doi-asserted-by":"publisher","unstructured":"Gao J, Collyer J, Wang M, Sun F, Xu F. Genetic dissection of hypertrophic cardiomyopathy with myocardial rna-seq. Int J Mol Sci. 2020;21(9). https:\/\/doi.org\/10.3390\/ijms21093040","DOI":"10.3390\/ijms21093040"},{"issue":"14","key":"3958_CR29","doi-asserted-by":"publisher","first-page":"2288","DOI":"10.1002\/sim.7281","volume":"36","author":"R Tissier","year":"2017","unstructured":"Tissier R, Tsonaka R, Mooijaart SP, Slagboom E, Houwing-Duistermaat JJ. Secondary phenotype analysis in ascertained family designs: application to the Leiden longevity study. Stat Med. 2017;36(14):2288\u2013301. https:\/\/doi.org\/10.1002\/sim.7281.","journal-title":"Statistics in Medicine"},{"key":"3958_CR30","doi-asserted-by":"publisher","unstructured":"Bishop CM, Tipping ME. Probabilistic Principal Component Analysis. J R Stat Soc. Ser B 61(iii), 611\u2013622 (1999). https:\/\/doi.org\/10.1111\/1467-9868.00196","DOI":"10.1111\/1467-9868.00196"},{"key":"3958_CR31","doi-asserted-by":"publisher","unstructured":"el Bouhaddani S, Uh HW, Hayward C, Jongbloed G, Houwing-Duistermaat J. Probabilistic partial least squares model: Identifiability, estimation and application. J Multivar Anal. 2018;167:331\u201346. https:\/\/doi.org\/10.1016\/j.jmva.2018.05.009. arXiv:1706.03597","DOI":"10.1016\/j.jmva.2018.05.009"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-021-03958-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1186\/s12859-021-03958-3\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-021-03958-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,3,18]],"date-time":"2021-03-18T17:03:11Z","timestamp":1616086991000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-021-03958-3"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,3,18]]},"references-count":31,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2021,12]]}},"alternative-id":["3958"],"URL":"https:\/\/doi.org\/10.1186\/s12859-021-03958-3","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2020.08.31.274175","asserted-by":"object"}]},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,3,18]]},"assertion":[{"value":"29 July 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"6 January 2021","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"18 March 2021","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"TwinsUK study: Ethical approval was granted by the National Research Ethics Service London-Westminster, the St Thomas\u2019 Hospital Research Ethics Committee (EC04\/015 and 07\/H0802\/84). All research participants have signed informed consent prior to taking part in any research activities. CVON-DOSIS study: the study protocol was approved by the local ethics committee of the Erasmus MC (2010-409), the Biobank Research Ethics Committee of University Medical Center Utrecht (protocol number 12\/387), and the Washington University School of Medicine Ethics Committee (Institutional Review Board). Informed consent was obtained from each patient prior to surgery or was waived by the ethics committee when acquiring informed consent was not possible due to the death of the donor.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare that they have no competing interests.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"131"}}