{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,23]],"date-time":"2026-01-23T10:36:02Z","timestamp":1769164562514,"version":"3.49.0"},"reference-count":41,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2025,9,22]],"date-time":"2025-09-22T00:00:00Z","timestamp":1758499200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Bioinform."],"abstract":"<jats:sec><jats:title>Introduction<\/jats:title><jats:p>The complexity of COVID-19 requires approaches that extend beyond symptom-based descriptors. Multi-omic data, combining clinical, proteomic, and metabolomic information, offer a more detailed view of disease mechanisms and biomarker discovery.<\/jats:p><\/jats:sec><jats:sec><jats:title>Methods<\/jats:title><jats:p>As part of a large-scale Quebec initiative, we collected extensive datasets from COVID-19 positive and negative patient samples. Using a multi-view machine learning framework with ensemble methods, we integrated thousands of features across clinical, proteomic, and metabolomic domains to classify COVID-19 status. We further applied a novel feature relevance methodology to identify condensed signatures.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>Our models achieved a balanced accuracy of 89% \u00b1 5% despite the high-dimensional nature of the data. Feature selection yielded 12- and 50-feature signatures that improved classification accuracy by at least 3% compared to the full feature set. These signatures were both accurate and interpretable.<\/jats:p><\/jats:sec><jats:sec><jats:title>Discussion<\/jats:title><jats:p>This work demonstrates that multi-omic integration, combined with advanced machine learning, enables the extraction of robust COVID-19 signatures from complex datasets. The condensed biomarker sets provide a practical path toward improved diagnosis and precision medicine, representing a significant advancement in COVID-19 biomarker discovery.<\/jats:p><\/jats:sec>","DOI":"10.3389\/fbinf.2025.1645785","type":"journal-article","created":{"date-parts":[[2025,9,22]],"date-time":"2025-09-22T05:28:20Z","timestamp":1758518900000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Extracting a COVID-19 signature from a multi-omic dataset"],"prefix":"10.3389","volume":"5","author":[{"given":"Baptiste","family":"Bauvin","sequence":"first","affiliation":[]},{"given":"Thibaud","family":"Godon","sequence":"additional","affiliation":[]},{"given":"Guillaume","family":"Bachelot","sequence":"additional","affiliation":[]},{"given":"Claudia","family":"Carpentier","sequence":"additional","affiliation":[]},{"given":"Riikka","family":"Huusaari","sequence":"additional","affiliation":[]},{"given":"Maxime","family":"Deraspe","sequence":"additional","affiliation":[]},{"given":"Juho","family":"Rousu","sequence":"additional","affiliation":[]},{"given":"Caroline","family":"Quach","sequence":"additional","affiliation":[]},{"given":"Jacques","family":"Corbeil","sequence":"additional","affiliation":[]}],"member":"1965","published-online":{"date-parts":[[2025,9,22]]},"reference":[{"key":"B1","doi-asserted-by":"publisher","first-page":"82","DOI":"10.1016\/j.inffus.2019.12.012","article-title":"Explainable artificial intelligence (xai): concepts, taxonomies, opportunities and challenges toward responsible ai","volume":"58","author":"Arrieta","year":"2020","journal-title":"Inf. Fusion"},{"key":"B2","first-page":"139","article-title":"Integrating and reporting full multi-view supervised learning experiments using summit","volume-title":"Proceedings of the fourth international workshop on learning with imbalanced domains: theory and applications","author":"Bauvin","year":"2022"},{"key":"B3","first-page":"130","article-title":"Sample boosting Algorithm (SamBA) - an interpretable greedy ensemble classifier based on local expertise for fat data","volume-title":"Proceedings of the thirty-ninth conference on uncertainty in artificial intelligence","author":"Bauvin","year":"2023"},{"key":"B4","doi-asserted-by":"crossref","first-page":"144","DOI":"10.1145\/130385.130401","article-title":"A training algorithm for optimal margin classifiers","volume-title":"Proceedings of the fifth annual workshop on computational learning theory","author":"Boser","year":"1992"},{"key":"B5","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach. Learn."},{"key":"B6","volume-title":"Classification and regression trees","author":"Breiman","year":"1984"},{"key":"B7","first-page":"3121","article-title":"The balanced accuracy and its posterior distribution","author":"Brodersen","year":"2010"},{"key":"B8","doi-asserted-by":"publisher","first-page":"e106267","DOI":"10.15252\/embj.2020106267","article-title":"Syncytia formation by SARS-CoV-2-infected cells","volume":"39","author":"Buchrieser","year":"2020","journal-title":"EMBO J."},{"key":"B9","first-page":"35","article-title":"How evaluation guides ai research: the message still counts more than the medium","volume":"9","author":"Cohen","year":"1988","journal-title":"AI Mag."},{"key":"B10","doi-asserted-by":"publisher","first-page":"2414","DOI":"10.3390\/ijms23052414","article-title":"Covidomics: the proteomic and metabolomic signatures of covid-19","volume":"23","author":"Costanzo","year":"2022","journal-title":"Int. J. Mol. Sci."},{"key":"B11","doi-asserted-by":"publisher","first-page":"754","DOI":"10.1186\/s12864-016-2889-6","article-title":"Predictive computational phenotyping and biomarker discovery using reference-free genome comparisons","volume":"17","author":"Drouin","year":"2016","journal-title":"BMC Genomics"},{"key":"B12","first-page":"177","article-title":"All models are wrong, but many are useful: learning a variable\u2019s importance by studying an entire class of prediction models simultaneously","volume":"20","author":"Fisher","year":"2019","journal-title":"J. Mach. Learn. Res."},{"key":"B13","doi-asserted-by":"publisher","first-page":"238","DOI":"10.2307\/1403797","article-title":"Discriminatory analysis. Nonparametric discrimination: consistency properties","volume":"57","author":"Fix","year":"1989","journal-title":"Int. Stat. Rev."},{"key":"B14","doi-asserted-by":"publisher","first-page":"119","DOI":"10.1006\/jcss.1997.1504","article-title":"A decision-theoretic generalization of on-line learning and an application to boosting","volume":"55","author":"Freund","year":"1997","journal-title":"Jour. Comp. Sys. Sci."},{"key":"B15","doi-asserted-by":"publisher","first-page":"1189","DOI":"10.1214\/aos\/1013203451","article-title":"Greedy function approximation: a gradient boosting machine","volume":"29","author":"Friedman","year":"2001","journal-title":"Ann. Statistics"},{"key":"B16","volume-title":"Randomscm: interpretable ensembles of sparse classifiers tailored for omics data","author":"Godon","year":"2022"},{"key":"B17","volume-title":"Invariant causal set covering machines","author":"Godon","year":"2023"},{"key":"B18","doi-asserted-by":"publisher","first-page":"874455","DOI":"10.3389\/fpubh.2022.874455","article-title":"An explainable ai approach for the rapid diagnosis of covid-19 using ensemble learning algorithms","volume":"10","author":"Gong","year":"2022","journal-title":"Front. Public Health"},{"key":"B19","doi-asserted-by":"publisher","first-page":"1171","DOI":"10.1214\/009053607000000677","article-title":"Kernel methods in machine learning","volume":"36","author":"Hofmann","year":"2008","journal-title":"Ann. Statistics"},{"key":"B20","doi-asserted-by":"publisher","first-page":"497","DOI":"10.1016\/S0140-6736(20)30183-5","article-title":"Clinical features of patients infected with 2019 novel coronavirus in wuhan, China","volume":"395","author":"Huang","year":"2020","journal-title":"Lancet"},{"key":"B21","volume-title":"Learning primal-dual sparse kernel machines","author":"Huusari","year":"2021"},{"key":"B22","doi-asserted-by":"publisher","first-page":"2103","DOI":"10.1111\/jth.14975","article-title":"Coagulopathy in covid-19","volume":"18","author":"Iba","year":"2020","journal-title":"J. Thrombosis Haemostasis"},{"key":"B23","doi-asserted-by":"publisher","first-page":"27","DOI":"10.1093\/nar\/28.1.27","article-title":"KEGG: kyoto encyclopedia of genes and genomes","volume":"28","author":"Kanehisa","year":"2000","journal-title":"Nucleic Acids Res."},{"key":"B24","first-page":"1","article-title":"Imbalanced-learn: a python toolbox to tackle the curse of imbalanced datasets in machine learning","volume":"18","author":"Lema\u00eetre","year":"2017","journal-title":"J. Mach. Learn. Res."},{"key":"B25","doi-asserted-by":"publisher","first-page":"2019","DOI":"10.1038\/s41418-021-00795-y","article-title":"Syncytia formation during SARS-CoV-2 lung infection: a disastrous Unity to eliminate lymphocytes","volume":"28","author":"Lin","year":"2021","journal-title":"Cell Death Differ."},{"key":"B26","doi-asserted-by":"publisher","first-page":"e0267047","DOI":"10.1371\/journal.pone.0267047","article-title":"Multi-omic analysis reveals enriched pathways associated with covid-19 and covid-19 severity","volume":"17","author":"Lipman","year":"2022","journal-title":"PLoS ONE"},{"key":"B27","doi-asserted-by":"publisher","first-page":"7","DOI":"10.1186\/s12859-022-05127-6","article-title":"Machine learning to analyse omic-data for covid-19 diagnosis and prognosis","volume":"24","author":"Liu","year":"2023","journal-title":"BMC Bioinforma."},{"key":"B28","first-page":"723","article-title":"The set covering machine","volume":"3","author":"Marchand","year":"2003","journal-title":"J. Mach. Learn. Res."},{"key":"B29","doi-asserted-by":"publisher","first-page":"1076","DOI":"10.1016\/j.healun.2021.06.006","article-title":"Coagulation and wound repair during COVID-19","volume":"40","author":"Menachery","year":"2021","journal-title":"J. Heart Lung Transpl."},{"key":"B30","volume-title":"Neural networks and deep learning, vol. 25","author":"Nielsen","year":"2015"},{"key":"B31","doi-asserted-by":"publisher","first-page":"23","DOI":"10.1016\/j.cels.2020.10.003","article-title":"Large-scale multi-omic analysis of covid-19 severity","volume":"12","author":"Overmyer","year":"2020","journal-title":"Cell Syst."},{"key":"B32","doi-asserted-by":"publisher","first-page":"167280","DOI":"10.1016\/j.jmb.2021.167280","article-title":"The mechanism and consequences of sars-cov-2 spike-mediated fusion and syncytia formation","volume":"434","author":"Rajah","year":"2021","journal-title":"J. Mol. Biol."},{"key":"B33","doi-asserted-by":"publisher","first-page":"100277","DOI":"10.1016\/j.mcpro.2022.100277","article-title":"Early prediction of covid-19 patient survival by targeted plasma multi-omics and machine learning","volume":"21","author":"Richard","year":"2022","journal-title":"Mol. Cell. Proteomics"},{"key":"B34","article-title":"Diet networks: thin parameters for fat genomics","author":"Romero","year":"2017"},{"key":"B35","doi-asserted-by":"publisher","first-page":"206","DOI":"10.1038\/s42256-019-0048-x","article-title":"Stop explaining Black box machine learning models for high stakes decisions and use interpretable models instead","volume":"1","author":"Rudin","year":"2019","journal-title":"Nat. Mach. Intell."},{"key":"B36","doi-asserted-by":"publisher","first-page":"399","DOI":"10.1145\/1101149.1101236","article-title":"Early versus late fusion in semantic video analysis","volume":"2005","author":"Snoek","year":"2005","journal-title":"Proc. 13th ACM Int. Conf. Multimedia"},{"key":"B37","doi-asserted-by":"publisher","first-page":"267","DOI":"10.1111\/j.2517-6161.1996.tb02080.x","article-title":"Regression shrinkage and selection via the lasso","volume":"58","author":"Tibshirani","year":"1996","journal-title":"J. R. Stat. Soc. Ser. B Methodol."},{"key":"B38","doi-asserted-by":"publisher","first-page":"e0245031","DOI":"10.1371\/journal.pone.0245031","article-title":"The Biobanque qu\u00e9b\u00e9coise de la COVID-19 (BQC19)\u2014A cohort to prospectively study the clinical and biological determinants of COVID-19 clinical trajectories","volume":"16","author":"Tremblay","year":"2021","journal-title":"PLoS One"},{"key":"B39","doi-asserted-by":"publisher","first-page":"44","DOI":"10.1186\/s40537-024-00905-w","article-title":"Feature selection strategies: a comparative analysis of shap-value and importance-based methods","volume":"11","author":"Wang","year":"2024","journal-title":"J. Big Data"},{"key":"B40","doi-asserted-by":"publisher","first-page":"D521","DOI":"10.1093\/nar\/gkl923","article-title":"HMDB: the human metabolome database","volume":"35","author":"Wishart","year":"2007","journal-title":"Nucleic Acids Res."},{"key":"B41","doi-asserted-by":"publisher","first-page":"44","DOI":"10.1186\/s12014-023-09436-7","article-title":"Validation of ANG-1 and P-SEL as biomarkers of post-COVID-19 conditions using data from the Biobanque qu\u00e9b\u00e9coise de la COVID-19 (BQC-19)","volume":"20","author":"Yamga","year":"2023","journal-title":"Clin. Proteomics"}],"container-title":["Frontiers in Bioinformatics"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fbinf.2025.1645785\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,22]],"date-time":"2025-09-22T05:28:23Z","timestamp":1758518903000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fbinf.2025.1645785\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,9,22]]},"references-count":41,"alternative-id":["10.3389\/fbinf.2025.1645785"],"URL":"https:\/\/doi.org\/10.3389\/fbinf.2025.1645785","relation":{},"ISSN":["2673-7647"],"issn-type":[{"value":"2673-7647","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,9,22]]},"article-number":"1645785"}}