{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,7]],"date-time":"2026-03-07T06:32:30Z","timestamp":1772865150810,"version":"3.50.1"},"reference-count":62,"publisher":"Oxford University Press (OUP)","issue":"3","license":[{"start":{"date-parts":[[2020,8,21]],"date-time":"2020-08-21T00:00:00Z","timestamp":1597968000000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"name":"German Federal Ministry of Education and Research","award":["01IS18036A"],"award-info":[{"award-number":["01IS18036A"]}]},{"DOI":"10.13039\/501100001659","name":"German Research Foundation","doi-asserted-by":"publisher","award":["BO3139\/4-3"],"award-info":[{"award-number":["BO3139\/4-3"]}],"id":[{"id":"10.13039\/501100001659","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001659","name":"German Research Foundation","doi-asserted-by":"publisher","award":["HO6422\/1-2"],"award-info":[{"award-number":["HO6422\/1-2"]}],"id":[{"id":"10.13039\/501100001659","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,5,20]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Multi-omics data, that is, datasets containing different types of high-dimensional molecular variables, are increasingly often generated for the investigation of various diseases. Nevertheless, questions remain regarding the usefulness of multi-omics data for the prediction of disease outcomes such as survival time. It is also unclear which methods are most appropriate to derive such prediction models. We aim to give some answers to these questions through a large-scale benchmark study using real data. 
Different prediction methods from machine learning and statistics were applied on 18 multi-omics cancer datasets (35 to 1000 observations, up to 100 000 variables) from the database \u2018The Cancer Genome Atlas\u2019 (TCGA). The considered outcome was the (censored) survival time. Eleven methods based on boosting, penalized regression and random forest were compared, comprising both methods that do and that do not take the group structure of the omics variables into account. The Kaplan\u2013Meier estimate and a Cox model using only clinical variables were used as reference methods. The methods were compared using several repetitions of 5-fold cross-validation. Uno\u2019s C-index and the integrated Brier score served as performance metrics. The results indicate that methods taking into account the multi-omics structure have a slightly better prediction performance. Taking this structure into account can protect the predictive information in low-dimensional groups\u2014especially clinical variables\u2014from not being exploited during prediction. Moreover, only the block forest method outperformed the Cox model on average, and only slightly. This indicates, as a by-product of our study, that in the considered TCGA studies the utility of multi-omics data for prediction purposes was limited. Contact: \u00a0moritz.herrmann@stat.uni-muenchen.de, +49 89 2180 3198 Supplementary information: Supplementary data are available at Briefings in Bioinformatics online. 
All analyses are reproducible using R code freely available on Github.<\/jats:p>","DOI":"10.1093\/bib\/bbaa167","type":"journal-article","created":{"date-parts":[[2020,7,9]],"date-time":"2020-07-09T11:09:31Z","timestamp":1594292971000},"source":"Crossref","is-referenced-by-count":91,"title":["Large-scale benchmark study of survival prediction methods using multi-omics data"],"prefix":"10.1093","volume":"22","author":[{"given":"Moritz","family":"Herrmann","sequence":"first","affiliation":[{"name":"Department of Statistics, Ludwig Maximilian University, Munich, 80539, Germany"}]},{"given":"Philipp","family":"Probst","sequence":"additional","affiliation":[{"name":"Institute for Medical Information Processing, Biometry, and Epidemiology, Ludwig Maximilian University, Munich, 81377, Germany"}]},{"given":"Roman","family":"Hornung","sequence":"additional","affiliation":[{"name":"Institute for Medical Information Processing, Biometry, and Epidemiology, Ludwig Maximilian University, Munich, 81377, Germany"}]},{"given":"Vindi","family":"Jurinovic","sequence":"additional","affiliation":[{"name":"Institute for Medical Information Processing, Biometry, and Epidemiology, Ludwig Maximilian University, Munich, 81377, Germany"}]},{"given":"Anne-Laure","family":"Boulesteix","sequence":"additional","affiliation":[{"name":"Institute for Medical Information Processing, Biometry, and Epidemiology, Ludwig Maximilian University, Munich, 81377, Germany"}]}],"member":"286","published-online":{"date-parts":[[2020,8,22]]},"reference":[{"key":"2021061607281771000_ref1","doi-asserted-by":"crossref","first-page":"83","DOI":"10.1186\/s13059-017-1215-1","article-title":"Multi-omics approaches to disease","volume":"18","author":"Hasin","year":"2017","journal-title":"Genome Biol"},{"key":"2021061607281771000_ref2","doi-asserted-by":"crossref","first-page":"215","DOI":"10.1093\/bib\/bbq085","article-title":"Added predictive value of high-throughput molecular data to clinical data and its 
validation","volume":"12","author":"Boulesteix","year":"2011","journal-title":"Brief Bioinform"},{"key":"2021061607281771000_ref3","doi-asserted-by":"crossref","first-page":"322","DOI":"10.1186\/s12859-018-2344-6","article-title":"Priority-lasso: a simple hierarchical approach to the prediction of clinical outcome using multi-omics data","volume":"19","author":"Klau","year":"2018","journal-title":"BMC Bioinformatics"},{"key":"2021061607281771000_ref4","doi-asserted-by":"crossref","first-page":"413","DOI":"10.1186\/1471-2105-10-413","article-title":"Survival prediction from clinico-genomic models\u2014a comparative study","volume":"10","author":"B\u00f8velstad","year":"2009","journal-title":"BMC Bioinformatics"},{"key":"2021061607281771000_ref5","doi-asserted-by":"crossref","first-page":"291","DOI":"10.1093\/bib\/bbu003","article-title":"Combining multidimensional genomic measurements for predicting cancer prognosis: observations from TCGA","volume":"16","author":"Zhao","year":"2014","journal-title":"Brief Bioinform"},{"key":"2021061607281771000_ref6","doi-asserted-by":"crossref","first-page":"62","DOI":"10.1080\/00949655.2014.929131","article-title":"Automatic model selection for high-dimensional survival analysis","volume":"85","author":"Lang","year":"2015","journal-title":"J Stat Comput Simul"},{"key":"2021061607281771000_ref7","article-title":"Combining clinical and molecular data in regression prediction models: insights from a simulation study","author":"De Bin","year":"2019","journal-title":"Brief Bioinform"},{"key":"2021061607281771000_ref8","doi-asserted-by":"crossref","first-page":"1314","DOI":"10.1002\/bimj.201700243","article-title":"Making complex prediction rules applicable for readers: current practice in random forest literature and recommendations","volume":"61","author":"Boulesteix","year":"2019","journal-title":"Biom 
J"},{"key":"2021061607281771000_ref9","doi-asserted-by":"crossref","first-page":"117","DOI":"10.1186\/1471-2288-14-117","article-title":"Added predictive value of omics data: specific issues related to validation illustrated by two case studies","volume":"14","author":"De Bin","year":"2014","journal-title":"BMC Med Res Methodol"},{"key":"2021061607281771000_ref10","doi-asserted-by":"crossref","first-page":"5310","DOI":"10.1002\/sim.6246","article-title":"Investigating the prediction ability of survival models based on both clinical and omics data: two case studies","volume":"33","author":"De Bin","year":"2014","journal-title":"Stat Med"},{"key":"2021061607281771000_ref11","doi-asserted-by":"crossref","first-page":"14","DOI":"10.1186\/1471-2105-9-14","article-title":"Allowing for mandatory covariates in boosting estimation of sparse high-dimensional survival models","volume":"9","author":"Binder","year":"2008","journal-title":"BMC Bioinformatics"},{"key":"2021061607281771000_ref12","doi-asserted-by":"crossref","first-page":"1248","DOI":"10.1158\/1078-0432.CCR-17-0853","article-title":"Deep learning-based multi-omics integration robustly predicts survival in liver cancer","volume":"24","author":"Chaudhary","year":"2018","journal-title":"Clin Cancer Res"},{"key":"2021061607281771000_ref13","article-title":"Integrating multi-omics data with deep learning for predicting cancer prognosis","author":"Chai","year":"2019","journal-title":"bioRxiv"},{"key":"2021061607281771000_ref14","doi-asserted-by":"crossref","first-page":"267","DOI":"10.1111\/j.2517-6161.1996.tb02080.x","article-title":"Regression shrinkage and selection via the lasso","volume":"58","author":"Tibshirani","year":"1996","journal-title":"J R Stat Soc Ser B Stat Methodol"},{"key":"2021061607281771000_ref15","doi-asserted-by":"crossref","first-page":"385","DOI":"10.1002\/(SICI)1097-0258(19970228)16:4<385::AID-SIM380>3.0.CO;2-3","article-title":"The lasso method for variable selection in the Cox 
model","volume":"16","author":"Tibshirani","year":"1997","journal-title":"Stat Med"},{"key":"2021061607281771000_ref16","article-title":"Clinical outcome prediction based on multi-omics data: extension of IPF-LASSO","author":"Schulze","year":"2017"},{"key":"2021061607281771000_ref17","doi-asserted-by":"publisher","DOI":"10.1155\/2017\/7691937","article-title":"IPF-LASSO: integrative L1-penalized regression with penalty factors for prediction based on multi-omics data","author":"Boulesteix","year":"2017","journal-title":"Comput Math Methods Med"},{"key":"2021061607281771000_ref18","doi-asserted-by":"crossref","first-page":"231","DOI":"10.1080\/10618600.2012.681250","article-title":"A sparse-group lasso","volume":"22","author":"Simon","year":"2013","journal-title":"J Comput Graph Stat"},{"key":"2021061607281771000_ref19","doi-asserted-by":"crossref","first-page":"49","DOI":"10.1111\/j.1467-9868.2005.00532.x","article-title":"Model selection and estimation in regression with grouped variables","volume":"68","author":"Yuan","year":"2006","journal-title":"J R Stat Soc Ser B Stat Methodol"},{"key":"2021061607281771000_ref20","doi-asserted-by":"crossref","first-page":"368","DOI":"10.1002\/sim.6732","article-title":"Better prediction by use of co-data: adaptive group-regularized ridge regression","volume":"35","author":"van de Wiel","year":"2016","journal-title":"Stat Med"},{"key":"2021061607281771000_ref21","doi-asserted-by":"crossref","first-page":"1189","DOI":"10.1214\/aos\/1013203451","article-title":"Greedy function approximation: a gradient boosting machine","volume":"29","author":"Friedman","year":"2001","journal-title":"Ann Statist"},{"key":"2021061607281771000_ref22","doi-asserted-by":"crossref","first-page":"2828","DOI":"10.1093\/bioinformatics\/btl462","article-title":"Model-based boosting in high 
dimensions","volume":"22","author":"Hothorn","year":"2006","journal-title":"Bioinformatics"},{"key":"2021061607281771000_ref23","first-page":"477","article-title":"Boosting algorithms: regularization, prediction and model fitting","volume":"22","author":"B\u00fchlmann","year":"2007","journal-title":"Statist Sci"},{"key":"2021061607281771000_ref24","doi-asserted-by":"crossref","first-page":"961","DOI":"10.1111\/j.1541-0420.2006.00578.x","article-title":"Generalized additive modeling with implicit variable selection by likelihood-based boosting","volume":"62","author":"Tutz","year":"2006","journal-title":"Biometrics"},{"key":"2021061607281771000_ref25","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach Learn"},{"key":"2021061607281771000_ref26","doi-asserted-by":"crossref","first-page":"841","DOI":"10.1214\/08-AOAS169","article-title":"Random survival forests","volume":"2","author":"Ishwaran","year":"2008","journal-title":"Ann Appl Stat"},{"key":"2021061607281771000_ref27","doi-asserted-by":"crossref","first-page":"358","DOI":"10.1186\/s12859-019-2942-y","article-title":"Block forests: random forests for blocks of clinical and omics covariate data","volume":"20","author":"Hornung","year":"2019","journal-title":"BMC Bioinformatics"},{"key":"2021061607281771000_ref28","doi-asserted-by":"crossref","first-page":"e61562","DOI":"10.1371\/journal.pone.0061562","article-title":"A plea for neutral comparison studies in computational sciences","volume":"8","author":"Boulesteix","year":"2013","journal-title":"PLoS One"},{"key":"2021061607281771000_ref29","doi-asserted-by":"crossref","first-page":"138","DOI":"10.1186\/s12874-017-0417-2","article-title":"Towards evidence-based computational statistics: lessons from clinical research on the role and design of real-data benchmark 
studies","volume":"17","author":"Boulesteix","year":"2017","journal-title":"BMC Med Res Methodol"},{"key":"2021061607281771000_ref30","doi-asserted-by":"crossref","first-page":"270","DOI":"10.1186\/s12859-018-2264-5","article-title":"Random forest versus logistic regression: a large-scale benchmark experiment","volume":"19","author":"Couronn\u00e9","year":"2018","journal-title":"BMC Bioinformatics"},{"key":"2021061607281771000_ref31","first-page":"1","article-title":"mlr: machine learning in R","volume":"17","author":"Bischl","year":"2016","journal-title":"J Mach Learn Res"},{"key":"2021061607281771000_ref32","volume-title":"R: A Language and Environment for Statistical Computing","author":"R Core Team","year":"2018"},{"key":"2021061607281771000_ref33","doi-asserted-by":"crossref","first-page":"49","DOI":"10.1145\/2641190.2641198","article-title":"OpenML: networked science in machine learning","volume":"15","author":"Vanschoren","year":"2013","journal-title":"SIGKDD Explor"},{"key":"2021061607281771000_ref34","first-page":"1","article-title":"OpenML: an R package to connect to the machine learning platform OpenML","volume":"32","author":"Casalicchio","year":"2017","journal-title":"Comput Statist"},{"key":"2021061607281771000_ref35","volume-title":"Checkpoint: Install Packages from Snapshots on the Checkpoint Server for Reproducibility","author":"Microsoft Corporation","year":"2018"},{"key":"2021061607281771000_ref36","doi-asserted-by":"crossref","first-page":"135","DOI":"10.21105\/joss.00135","article-title":"Batchtools: tools for R to work on batch systems","volume":"2","author":"Lang","year":"2017","journal-title":"J Open Source Softw"},{"key":"2021061607281771000_ref37","doi-asserted-by":"crossref","first-page":"1105","DOI":"10.1002\/sim.4154","article-title":"On the C-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data","volume":"30","author":"Uno","year":"2011","journal-title":"Stat 
Med"},{"key":"2021061607281771000_ref38","doi-asserted-by":"crossref","first-page":"347","DOI":"10.1093\/biostatistics\/kxy006","article-title":"The c-index is not proper for the evaluation of t-year predicted risks","volume":"20","author":"Blanche","year":"2019","journal-title":"Biostatistics"},{"key":"2021061607281771000_ref39","doi-asserted-by":"crossref","first-page":"1","DOI":"10.18637\/jss.v033.i01","article-title":"Regularization paths for generalized linear models via coordinate descent","volume":"33","author":"Friedman","year":"2010","journal-title":"J Stat Softw"},{"key":"2021061607281771000_ref40","doi-asserted-by":"crossref","first-page":"1","DOI":"10.18637\/jss.v039.i05","article-title":"Regularization paths for Cox\u2019s proportional hazards model via coordinate descent","volume":"39","author":"Simon","year":"2011","journal-title":"J Stat Softw"},{"key":"2021061607281771000_ref41","volume-title":"SGL: Fit a GLM (or Cox Model) with a Combination of Lasso and Group Lasso Regularization","author":"Simon","year":"2018"},{"key":"2021061607281771000_ref42","volume-title":"ipflasso: Integrative Lasso with Penalty Factors","author":"Boulesteix","year":"2019"},{"key":"2021061607281771000_ref43","volume-title":"prioritylasso: Analyzing Multiple Omics Data with an Offset Approach","author":"Klau","year":"2017"},{"key":"2021061607281771000_ref44","volume-title":"GRridge: Better Prediction by Use of Co-Data: Adaptive Group-Regularized Ridge Regression","author":"van de Wiel","year":"2018"},{"key":"2021061607281771000_ref45","volume-title":"mboost: Model-Based Boosting","author":"Hothorn","year":"2018"},{"key":"2021061607281771000_ref46","volume-title":"CoxBoost: Cox Models by Likelihood Based Boosting for a Single Survival Endpoint or Competing Risks","author":"Binder","year":"2013"},{"key":"2021061607281771000_ref47","article-title":"randomForestSRC: Random forests for survival, regression, and classification 
(rf-src)","author":"Ishwaran","year":"2007"},{"key":"2021061607281771000_ref48","doi-asserted-by":"crossref","first-page":"1","DOI":"10.18637\/jss.v077.i01","article-title":"ranger: a fast implementation of random forests for high dimensional data in C++ and R","volume":"77","author":"Wright","year":"2017","journal-title":"J Stat Softw"},{"key":"2021061607281771000_ref49","article-title":"blockForest: block forests: random forests for blocks of clinical and omics covariate data","author":"Hornung","year":"2019"},{"key":"2021061607281771000_ref50","volume-title":"survival: A Package for Survival Analysis in S","author":"Therneau","year":"2015"},{"key":"2021061607281771000_ref51","doi-asserted-by":"crossref","first-page":"189","DOI":"10.1093\/bib\/bbq073","article-title":"An empirical assessment of validation practices for molecular classifiers","volume":"12","author":"Castaldi","year":"2011","journal-title":"Brief Bioinform"},{"key":"2021061607281771000_ref52","doi-asserted-by":"crossref","first-page":"i105","DOI":"10.1093\/bioinformatics\/btu279","article-title":"Cross-study validation for the assessment of prediction algorithms","volume":"30","author":"Bernau","year":"2014","journal-title":"Bioinformatics"},{"key":"2021061607281771000_ref53","doi-asserted-by":"crossref","first-page":"2599","DOI":"10.1007\/s00180-013-0420-y","article-title":"Benchmarking local classification methods","volume":"28","author":"Bischl","year":"2013","journal-title":"Comput Statist"},{"key":"2021061607281771000_ref54","doi-asserted-by":"crossref","first-page":"e1301","DOI":"10.1002\/widm.1301","article-title":"Hyperparameters and tuning strategies for random forest","volume":"9","author":"Probst","year":"2019","journal-title":"Data Min Knowl Discov"},{"key":"2021061607281771000_ref55","first-page":"1089","article-title":"No unbiased estimator of the variance of K-fold cross-validation","volume":"5","author":"Bengio","year":"2004","journal-title":"J Mach Learn 
Res"},{"key":"2021061607281771000_ref56","doi-asserted-by":"crossref","first-page":"101","DOI":"10.1016\/j.jspi.2020.03.003","article-title":"On the asymptotic behaviour of the variance estimator of a U-statistic","volume":"209","author":"Fuchs","year":"2020","journal-title":"J Stat Plan Infer"},{"key":"2021061607281771000_ref57","doi-asserted-by":"crossref","first-page":"513","DOI":"10.1007\/s00180-015-0642-2","article-title":"Boosting in Cox regression: a comparison between the likelihood-based and the model-based approaches with focus on the R-packages CoxBoost and mboost","volume":"31","author":"De Bin","year":"2016","journal-title":"Comput Statist"},{"key":"2021061607281771000_ref58","doi-asserted-by":"crossref","first-page":"201","DOI":"10.1080\/00031305.2015.1005128","article-title":"A statistical framework for hypothesis testing in real data comparison studies","volume":"69","author":"Boulesteix","year":"2015","journal-title":"Amer Statist"},{"key":"2021061607281771000_ref59","doi-asserted-by":"crossref","first-page":"85","DOI":"10.1186\/1471-2288-9-85","article-title":"Optimal classifier selection and negative bias in error rate estimation: an empirical study on high-dimensional prediction","volume":"9","author":"Boulesteix","year":"2009","journal-title":"BMC Med Res Methodol"},{"key":"2021061607281771000_ref60","doi-asserted-by":"crossref","first-page":"693","DOI":"10.1111\/biom.12041","article-title":"Correcting the optimal resampling-based error rate by estimating the error rate of wrapper algorithms","volume":"69","author":"Bernau","year":"2013","journal-title":"Biometrics"},{"key":"2021061607281771000_ref61","volume-title":"Shiny: Web Application Framework for R","author":"Chang","year":"2018"},{"key":"2021061607281771000_ref62","doi-asserted-by":"crossref","first-page":"1990","DOI":"10.1093\/bioinformatics\/btq323","article-title":"Over-optimism in bioinformatics: an 
illustration","volume":"26","author":"Jelizarow","year":"2010","journal-title":"Bioinformatics"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bib\/article-pdf\/22\/3\/bbaa167\/38657332\/bbaa167.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"http:\/\/academic.oup.com\/bib\/article-pdf\/22\/3\/bbaa167\/38657332\/bbaa167.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,8,9]],"date-time":"2024-08-09T16:28:01Z","timestamp":1723220881000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbaa167\/5895463"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,8,22]]},"references-count":62,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2021,5,20]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbaa167","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021,5]]},"published":{"date-parts":[[2020,8,22]]},"article-number":"bbaa167"}}