{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,10]],"date-time":"2026-03-10T22:05:22Z","timestamp":1773180322502,"version":"3.50.1"},"reference-count":37,"publisher":"Oxford University Press (OUP)","issue":"6","license":[{"start":{"date-parts":[[2024,9,26]],"date-time":"2024-09-26T00:00:00Z","timestamp":1727308800000},"content-version":"vor","delay-in-days":3,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001711","name":"Swiss National Science Foundation","doi-asserted-by":"publisher","award":["214457"],"award-info":[{"award-number":["214457"]}],"id":[{"id":"10.13039\/501100001711","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100009708","name":"Novo Nordisk Fonden","doi-asserted-by":"publisher","award":["0069071"],"award-info":[{"award-number":["0069071"]}],"id":[{"id":"10.13039\/501100009708","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,9,23]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Valid statistical inference is crucial for decision-making but difficult to obtain in supervised learning with multimodal data, e.g. combinations of clinical features, genomic data, and medical images. 
Multimodal data often warrants the use of black-box algorithms, for instance, random forests or neural networks, which impede the use of traditional variable significance tests.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We address this problem by proposing the use of COvariance MEasure Tests (COMETs), which are calibrated and powerful tests that can be combined with any sufficiently predictive supervised learning algorithm. We apply COMETs to several high-dimensional, multimodal data sets to illustrate (i) variable significance testing for finding relevant mutations modulating drug-activity, (ii) modality selection for predicting survival in liver cancer patients with multiomics data, and (iii) modality selection with clinical features and medical imaging data. In all applications, COMETs yield results consistent with domain knowledge without requiring data-driven pre-processing, which may invalidate type I error control. These novel applications with high-dimensional multimodal data corroborate prior results on the power and robustness of COMETs for significance testing.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>COMETs are implemented in the comets R package available on CRAN and the pycomets Python library available on GitHub. Source code for reproducing all results is available at https:\/\/github.com\/LucasKook\/comets. 
All data sets used in this work are openly available.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bib\/bbae475","type":"journal-article","created":{"date-parts":[[2024,9,26]],"date-time":"2024-09-26T04:26:52Z","timestamp":1727324812000},"source":"Crossref","is-referenced-by-count":3,"title":["Algorithm-agnostic significance testing in supervised learning with multimodal data"],"prefix":"10.1093","volume":"25","author":[{"given":"Lucas","family":"Kook","sequence":"first","affiliation":[{"name":"Institute for Statistics and Mathematics, Vienna University of Economics and Business , Welthandelsplatz 1, AT-1020 Vienna ,","place":["Austria"]}]},{"given":"Anton Rask","family":"Lundborg","sequence":"additional","affiliation":[{"name":"Department of Mathematical Sciences, University of Copenhagen , Universitetsparken 5, DK-2100 Copenhagen ,","place":["Denmark"]}]}],"member":"286","published-online":{"date-parts":[[2024,9,25]]},"reference":[{"key":"2024092604264617000_ref1","doi-asserted-by":"publisher","first-page":"i446","DOI":"10.1093\/bioinformatics\/btz342","article-title":"Deep learning with multimodal representation for pancancer prognosis prediction","volume":"35","author":"Cheerla","year":"2019","journal-title":"Bioinformatics"},{"key":"2024092604264617000_ref2","doi-asserted-by":"publisher","first-page":"179","DOI":"10.1093\/bioinformatics\/btab608","article-title":"Multi-omics data integration by generative adversarial network","volume":"38","author":"Ahmed","year":"2021","journal-title":"Bioinformatics"},{"key":"2024092604264617000_ref3","doi-asserted-by":"publisher","first-page":"bbab569","DOI":"10.1093\/bib\/bbab569","article-title":"Multimodal deep learning for biomedical data fusion: a review","volume":"23","author":"Stahlschmidt","year":"2022","journal-title":"Brief Bioinform"},{"key":"2024092604264617000_ref4","doi-asserted-by":"crossref","DOI":"10.1007\/978-0-387-84858-7","volume-title":"The Elements of Statistical Learning: Data Mining, 
Inference, and Prediction","author":"Hastie","year":"2009"},{"key":"2024092604264617000_ref5","doi-asserted-by":"publisher","first-page":"436","DOI":"10.1038\/nature14539","article-title":"Deep learning","volume":"521","author":"LeCun","year":"2015","journal-title":"Nature"},{"key":"2024092604264617000_ref6","doi-asserted-by":"publisher","first-page":"174","DOI":"10.1515\/jci-2022-0015","article-title":"A note on efficient minimum cost adjustment sets in causal graphical models","volume":"10","author":"Smucler","year":"2022","journal-title":"J Causal Inference"},{"key":"2024092604264617000_ref7","doi-asserted-by":"publisher","first-page":"68","DOI":"10.1214\/22-STS850","article-title":"Double-estimation-friendly inference for high-dimensional misspecified models","volume":"38","author":"Shah","year":"2023","journal-title":"Stat Sci"},{"key":"2024092604264617000_ref8","article-title":"Kernel-based conditional independence test and application in causal discovery","volume-title":"Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence (UAI'11)","author":"Zhang"},{"key":"2024092604264617000_ref9","doi-asserted-by":"publisher","first-page":"20180017","DOI":"10.1515\/jci-2018-0017","article-title":"Approximate kernel-based conditional independence tests for fast non-parametric causal discovery","volume":"7","author":"Strobl","year":"2019","journal-title":"J Causal Inference"},{"key":"2024092604264617000_ref10","doi-asserted-by":"publisher","first-page":"551","DOI":"10.1111\/rssb.12265","article-title":"Panning for gold: \u2018Model-X\u2019 knockoffs for high dimensional controlled variable selection","volume":"80","author":"Cand\u00e8s","year":"2018","journal-title":"J R Stat Soc Series B Stat Methodology"},{"key":"2024092604264617000_ref11","doi-asserted-by":"crossref","first-page":"175","DOI":"10.1111\/rssb.12340","article-title":"The conditional permutation test for independence while controlling for 
confounders","volume":"82","author":"Berrett","year":"2019","journal-title":"J R Stat Soc Series B Stat Methodology"},{"key":"2024092604264617000_ref12","doi-asserted-by":"publisher","first-page":"9","DOI":"10.1111\/biom.13392","article-title":"Nonparametric variable importance assessment using machine learning techniques","volume":"77","author":"Williamson","year":"2021","journal-title":"Biometrics"},{"key":"2024092604264617000_ref13","doi-asserted-by":"publisher","first-page":"1645","DOI":"10.1080\/01621459.2021.2003200","article-title":"A general framework for inference on algorithm-agnostic variable importance","volume":"118","author":"Williamson","year":"2023","journal-title":"J Am Stat Assoc"},{"key":"2024092604264617000_ref14","article-title":"The Projected Covariance Measure for assumption-lean variable significance testing","author":"Lundborg","year":"2022"},{"key":"2024092604264617000_ref15","doi-asserted-by":"crossref","first-page":"1514","DOI":"10.1214\/19-AOS1857","article-title":"The hardness of conditional independence testing and the Generalised Covariance Measure","volume":"48","author":"Shah","year":"2020","journal-title":"Ann Stat"},{"key":"2024092604264617000_ref16","first-page":"12517","article-title":"The weighted Generalised Covariance Measure","volume":"23","author":"Scheidegger","year":"2022","journal-title":"J Mach Learn Res"},{"key":"2024092604264617000_ref17","doi-asserted-by":"crossref","first-page":"3388","DOI":"10.1214\/22-AOS2233","article-title":"Local permutation tests for conditional independence","volume":"50","author":"Kim","year":"2022","journal-title":"Ann Stat"},{"key":"2024092604264617000_ref18","doi-asserted-by":"publisher","article-title":"Rank-transformed subsampling: Inference for multiple data splitting and exchangeable p-values","author":"Guo","DOI":"10.1093\/jrsssb\/qkae091"},{"key":"2024092604264617000_ref19","doi-asserted-by":"publisher","first-page":"603","DOI":"10.1038\/nature11003","article-title":"The cancer 
cell line encyclopedia enables predictive modelling of anticancer drug sensitivity","volume":"483","author":"Barretina","year":"2012","journal-title":"Nature"},{"key":"2024092604264617000_ref20","article-title":"Conditional independence testing using generative adversarial","volume-title":"Advances in Neural Information Processing Systems","author":"Bellot","year":"2019"},{"key":"2024092604264617000_ref21","first-page":"13029","article-title":"Double generative adversarial networks for conditional independence testing","volume":"22","author":"Shi","year":"2021","journal-title":"J Mach Learn Res"},{"key":"2024092604264617000_ref22","doi-asserted-by":"publisher","first-page":"1248","DOI":"10.1158\/1078-0432.CCR-17-0853","article-title":"Deep learning\u2013based multi-omics integration robustly predicts survival in liver cancer","volume":"24","author":"Chaudhary","year":"2018","journal-title":"Clin Cancer Res"},{"key":"2024092604264617000_ref23","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13073-021-00930-x","article-title":"Deepprog: An ensemble of deep-learning and machine-learning models for prognosis prediction using multi-omics data","volume":"13","author":"Poirion","year":"2021","journal-title":"Genome Med"},{"key":"2024092604264617000_ref24","article-title":"MIMIC-CXR-JPG, a large publicly available database of labeled chest radiographs","author":"Johnson","year":"2019"},{"key":"2024092604264617000_ref25","doi-asserted-by":"publisher","first-page":"454","DOI":"10.1148\/radiol.212482","article-title":"Simplified transfer learning for chest radiography models using less data","volume":"305","author":"Sellergren","year":"2022","journal-title":"Radiology"},{"key":"2024092604264617000_ref26","volume-title":"R: A Language and Environment for Statistical Computing","author":"R Core Team","year":"2021"},{"key":"2024092604264617000_ref27","doi-asserted-by":"publisher","DOI":"10.32614\/CRAN.package.comets","volume-title":"COMETs: Covariance Measure 
Tests for Conditional Independence","author":"Kook","year":"2024"},{"key":"2024092604264617000_ref28","doi-asserted-by":"publisher","first-page":"1","DOI":"10.18637\/jss.v077.i01","article-title":"Ranger: a fast implementation of random forests for high dimensional data in C++ and R","volume":"77","author":"Wright","year":"2017","journal-title":"J Stat Softw"},{"key":"2024092604264617000_ref29","doi-asserted-by":"publisher","first-page":"1","DOI":"10.18637\/jss.v106.i01","article-title":"Elastic net regularization paths for all generalized linear models","volume":"106","author":"Tay","year":"2023","journal-title":"J Stat Softw"},{"key":"2024092604264617000_ref30","volume-title":"Pycomets: Covariance Measure Tests for Conditional Independence","author":"Huang","year":"2024"},{"key":"2024092604264617000_ref31","doi-asserted-by":"publisher","first-page":"802","DOI":"10.1214\/12-AOS1077","article-title":"Valid post-selection inference","volume":"41","author":"Berk","year":"2013","journal-title":"Ann Stat"},{"key":"2024092604264617000_ref32","doi-asserted-by":"publisher","first-page":"e230060","DOI":"10.1148\/ryai.230060","article-title":"Risk of bias in chest radiography deep learning foundation models","volume":"5","author":"Glocker","year":"2023","journal-title":"Radiology: Artif Intell"},{"key":"2024092604264617000_ref33","doi-asserted-by":"publisher","first-page":"106","DOI":"10.1080\/00031305.2018.1529625","article-title":"Valid p-values behave exactly as they should: some misleading criticisms of p-values and their resolution with s-values","volume":"73","author":"Greenland","year":"2019","journal-title":"Am Stat"},{"key":"2024092604264617000_ref34","doi-asserted-by":"publisher","first-page":"1821","DOI":"10.1111\/rssb.12544","article-title":"Conditional independence testing in hilbert spaces with applications to functional data analysis","volume":"84","author":"Lundborg","year":"2022","journal-title":"J R Stat Soc Series B Stat 
Methodology"},{"key":"2024092604264617000_ref35","doi-asserted-by":"crossref","first-page":"2116","DOI":"10.1214\/23-AOS2323","article-title":"Nonparametric conditional local independence testing","volume":"51","author":"Christgau","year":"2023","journal-title":"Ann Stat"},{"key":"2024092604264617000_ref36","doi-asserted-by":"publisher","article-title":"Model-based causal feature selection for general response types","author":"Kook","DOI":"10.1080\/01621459.2024.2395588"},{"key":"2024092604264617000_ref37","article-title":"A general framework for the analysis of kernel-based tests","author":"Fern\u00e1ndez"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/25\/6\/bbae475\/59335609\/bbae475.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/25\/6\/bbae475\/59335609\/bbae475.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,9,26]],"date-time":"2024-09-26T04:26:58Z","timestamp":1727324818000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbae475\/7774897"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,9,23]]},"references-count":37,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2024,9,23]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbae475","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024,11]]},"published":{"date-parts":[[2024,9,23]]},"article-number":"bbae475"}}