{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,30]],"date-time":"2025-07-30T11:44:21Z","timestamp":1753875861865,"version":"3.41.2"},"reference-count":41,"publisher":"Oxford University Press (OUP)","issue":"6","license":[{"start":{"date-parts":[[2025,5,19]],"date-time":"2025-05-19T00:00:00Z","timestamp":1747612800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"French national league against cancer"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,6,2]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Machine learning analyses of molecular omics datasets largely drive the development of precision medicine in oncology, but mathematical challenges still hamper their application in the clinic. In particular, omics-based learning relies on high dimensional data with high degrees of freedom and multicollinearity issues, requiring more tailored algorithms. Here, we have developed a prediction algorithm that relies on the 1-Wasserstein distance to better capture complex relationships between variables, and that is built on a decision rule based on the exact computation of the Kantorovich-Rubinstein optimizer to increase the algorithm precision. We explored dimension reduction and aggregation methods to improve its robustness. The exact method was compared with a neural network-based approximate method, as well as with standard Euclidean distance-based classifiers.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>Experimental results on synthetic datasets with multiple scenarios of redundant\/informative variables revealed that exact and approximate methods based on Wasserstein distance outperformed state-of-the-art algorithms when class information was spread across a large number of variables. When predicting clinical or biological outcomes from transcriptomics datasets, HABiC achieved consistently higher accuracy in most situations.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>Python code for the HABiC classifier is available at https:\/\/github.com\/chiaraco\/HABiC.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaf310","type":"journal-article","created":{"date-parts":[[2025,5,19]],"date-time":"2025-05-19T15:24:47Z","timestamp":1747668287000},"source":"Crossref","is-referenced-by-count":0,"title":["HABiC: an algorithm based on the exact computation of the Kantorovich-Rubinstein optimizer for binary classification in transcriptomics"],"prefix":"10.1093","volume":"41","author":[{"given":"Chiara","family":"Cordier","sequence":"first","affiliation":[{"name":"LAREMA, Univ Angers, CNRS, SFR MATHSTIC , Angers F-49000,","place":["France"]},{"name":"Institut de Canc\u00e9rologie de l\u2019Ouest , Angers F-49000,","place":["France"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6068-6441","authenticated-orcid":false,"given":"Pascal","family":"J\u00e9z\u00e9quel","sequence":"additional","affiliation":[{"name":"Institut de Canc\u00e9rologie de l\u2019Ouest , Angers F-49000,","place":["France"]},{"name":"CRCI2NA, Nantes Universit\u00e9, Univ Angers, INSERM, CNRS , Nantes F-44000,","place":["France"]},{"name":"SIRIC ILIAD , Angers F-49000,","place":["France"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5196-5908","authenticated-orcid":false,"given":"Mario","family":"Campone","sequence":"additional","affiliation":[{"name":"Institut de Canc\u00e9rologie de l\u2019Ouest , Angers F-49000,","place":["France"]},{"name":"SIRIC ILIAD , Angers F-49000,","place":["France"]}]},{"given":"Fabien","family":"Panloup","sequence":"additional","affiliation":[{"name":"LAREMA, Univ Angers, CNRS, SFR MATHSTIC , Angers F-49000,","place":["France"]},{"name":"Institut de Canc\u00e9rologie de l\u2019Ouest , Angers F-49000,","place":["France"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9349-1819","authenticated-orcid":false,"given":"Agnes","family":"Basseville","sequence":"additional","affiliation":[{"name":"Institut de Canc\u00e9rologie de l\u2019Ouest , Angers F-49000,","place":["France"]},{"name":"SIRIC ILIAD , Angers F-49000,","place":["France"]}]}],"member":"286","published-online":{"date-parts":[[2025,5,19]]},"reference":[{"key":"2025070408275472900_btaf310-B1","doi-asserted-by":"publisher","first-page":"420","DOI":"10.1007\/3-540-44503-X27","volume-title":"Database Theory\u2014ICDT 2001","author":"Aggarwal","year":"2001"},{"key":"2025070408275472900_btaf310-B2","doi-asserted-by":"publisher","first-page":"179","DOI":"10.1093\/bioinformatics\/btab608","article-title":"Multi-omics data integration by generative adversarial network","volume":"38","author":"Ahmed","year":"2021","journal-title":"Bioinformatics"},{"year":"2017","author":"Arjovsky","key":"2025070408275472900_btaf310-B3"},{"key":"2025070408275472900_btaf310-B4","doi-asserted-by":"publisher","first-page":"166","DOI":"10.1002\/cem.785","article-title":"Partial least squares for discrimination","volume":"17","author":"Barker","year":"2003","journal-title":"J Chemomet"},{"key":"2025070408275472900_btaf310-B5","doi-asserted-by":"publisher","first-page":"153","DOI":"10.1038\/s41746-021-00521-5","article-title":"Digital medicine and the curse of dimensionality","volume":"4","author":"Berisha","year":"2021","journal-title":"NPJ Digit Med"},{"key":"2025070408275472900_btaf310-B6","doi-asserted-by":"publisher","first-page":"217","DOI":"10.1007\/3-540-49257-715","volume-title":"Database Theory\u2014ICDT\u201999, Lecture Notes in Computer Science","author":"Beyer","year":"1999"},{"key":"2025070408275472900_btaf310-B7","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1051\/proc\/202068001","article-title":"Statistical data analysis in the Wasserstein space","volume":"68","author":"Bigot","year":"2020","journal-title":"ESAIM: ProcS"},{"key":"2025070408275472900_btaf310-B8","doi-asserted-by":"publisher","first-page":"bbae027","DOI":"10.1093\/bib\/bbae027","article-title":"Should we really use graph neural networks for transcriptomic prediction?","volume":"25","author":"Brouard","year":"2024","journal-title":"Brief Bioinform"},{"key":"2025070408275472900_btaf310-B9","doi-asserted-by":"publisher","first-page":"211","DOI":"10.1093\/bioinformatics\/btab594","article-title":"Manifold alignment for heterogeneous single-cell multi-omics data integration using Pamona","volume":"38","author":"Cao","year":"2021","journal-title":"Bioinformatics"},{"key":"2025070408275472900_btaf310-B10","doi-asserted-by":"publisher","first-page":"3458","DOI":"10.1038\/s41467-020-17281-7","article-title":"Searching large-scale scRNA-seq databases via unbiased cell embedding with cell BLAST","volume":"11","author":"Cao","year":"2020","journal-title":"Nat Commun"},{"key":"2025070408275472900_btaf310-B11","doi-asserted-by":"publisher","first-page":"4","DOI":"10.1186\/s13040-023-00322-4","article-title":"The Matthews correlation coefficient (MCC) should replace the ROC AUC as the standard metric for assessing binary classification","volume":"16","author":"Chicco","year":"2023","journal-title":"BioData Min"},{"volume-title":"Linear Programming","year":"1983","author":"Chv\u00e1tal","key":"2025070408275472900_btaf310-B12"},{"key":"2025070408275472900_btaf310-B13","doi-asserted-by":"publisher","first-page":"1853","DOI":"10.1109\/TPAMI.2016.2615921","article-title":"Optimal transport for domain adaptation","volume":"39","author":"Courty","year":"2017","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"2025070408275472900_btaf310-B14","doi-asserted-by":"publisher","first-page":"3","DOI":"10.1089\/cmb.2021.0446","article-title":"SCOT: single-cell multi-omics alignment with optimal transport","volume":"29","author":"Demetci","year":"2022","journal-title":"J Comput Biol"},{"key":"2025070408275472900_btaf310-B15","doi-asserted-by":"publisher","first-page":"1923","DOI":"10.1007\/s10994-018-5717-1","article-title":"Wasserstein discriminant analysis","volume":"107","author":"Flamary","year":"2018","journal-title":"Mach Learn"},{"year":"2015","author":"Frogner","key":"2025070408275472900_btaf310-B16"},{"key":"2025070408275472900_btaf310-B17","doi-asserted-by":"publisher","first-page":"bbac225","DOI":"10.1093\/bib\/bbac225","article-title":"Entropy-based inference of transition states and cellular trajectory for single-cell transcriptomics","volume":"23","author":"Gan","year":"2022","journal-title":"Brief Bioinform"},{"year":"2017","author":"Gulrajani","key":"2025070408275472900_btaf310-B18"},{"key":"2025070408275472900_btaf310-B19","doi-asserted-by":"publisher","first-page":"e1010984","DOI":"10.1371\/journal.pcbi.1010984","article-title":"The effect of non-linear signal in classification problems using gene expression","volume":"19","author":"Heil","year":"2023","journal-title":"PLoS Comput Biol"},{"key":"2025070408275472900_btaf310-B20","doi-asserted-by":"publisher","first-page":"2169","DOI":"10.1093\/bioinformatics\/btac084","article-title":"Optimal transport improves cell\u2013cell similarity inference in single-cell omics data","volume":"38","author":"Huizing","year":"2022","journal-title":"Bioinformatics"},{"key":"2025070408275472900_btaf310-B21","doi-asserted-by":"publisher","first-page":"2481","DOI":"10.3390\/ijms23052481","article-title":"Augmentation of transcriptomic data for improved classification of patients with respiratory diseases of viral origin","volume":"23","author":"Kircher","year":"2022","journal-title":"Int J Mol Sci"},{"key":"2025070408275472900_btaf310-B22","doi-asserted-by":"publisher","first-page":"83","DOI":"10.1002\/nav.3800020109","article-title":"The Hungarian method for the assignment problem","volume":"2","author":"Kuhn","year":"1955","journal-title":"Naval Res Logist"},{"key":"2025070408275472900_btaf310-B23","doi-asserted-by":"publisher","first-page":"166","DOI":"10.1038\/s41467-019-14018-z","article-title":"Realistic in silico generation and augmentation of single-cell RNA-seq data using generative adversarial networks","volume":"11","author":"Marouf","year":"2020","journal-title":"Nat Commun"},{"year":"2025","author":"Molnar","key":"2025070408275472900_btaf310-B24"},{"key":"2025070408275472900_btaf310-B25","doi-asserted-by":"publisher","first-page":"193","DOI":"10.1007\/978-3-030-28954-610","volume-title":"Explainable AI: Interpreting, Explaining and Visualizing Deep Learning","author":"Montavon","year":"2019"},{"key":"2025070408275472900_btaf310-B26","doi-asserted-by":"publisher","first-page":"132","DOI":"10.1038\/s41586-019-1773-3","article-title":"Gene expression cartography","volume":"576","author":"Nitzan","year":"2019","journal-title":"Nature"},{"key":"2025070408275472900_btaf310-B27","doi-asserted-by":"publisher","first-page":"2825","DOI":"10.5555\/1953048.2078195","article-title":"Scikit-learn: machine learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J Mach Learn Res"},{"key":"2025070408275472900_btaf310-B28","doi-asserted-by":"publisher","first-page":"355","DOI":"10.1561\/2200000073","article-title":"Computational optimal transport: with applications to data science","volume":"11","author":"Peyr\u00e9","year":"2019","journal-title":"FNT Mach Learn"},{"key":"2025070408275472900_btaf310-B29","doi-asserted-by":"publisher","first-page":"1472","DOI":"10.1109\/TCBB.2020.3039511","article-title":"aWCluster: a novel integrative network-based clustering of multiomics for subtype analysis of cancer data","volume":"19","author":"Pouryahya","year":"2022","journal-title":"IEEE\/ACM Trans Comput Biol Bioinform"},{"key":"2025070408275472900_btaf310-B30","doi-asserted-by":"publisher","first-page":"258","DOI":"10.1038\/s41587-024-02186-3","article-title":"Gene trajectory inference for single-cell data by optimal transport metrics","volume":"43","author":"Qu","year":"2025","journal-title":"Nat Biotechnol"},{"key":"2025070408275472900_btaf310-B31","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-20828-2","volume-title":"Optimal Transport for Applied Mathematicians: Calculus of Variations, PDEs, and Modeling, Volume 87 of Progress in Nonlinear Differential Equations and Their Applications","author":"Santambrogio","year":"2015"},{"key":"2025070408275472900_btaf310-B32","doi-asserted-by":"publisher","first-page":"1517","DOI":"10.1016\/j.cell.2019.02.026","article-title":"Optimal-transport analysis of single-cell gene expression identifies developmental trajectories in reprogramming","volume":"176","author":"Schiebinger","year":"2019","journal-title":"Cell"},{"key":"2025070408275472900_btaf310-B33","doi-asserted-by":"publisher","first-page":"827","DOI":"10.1038\/nbt.1665","article-title":"The MicroArray quality control (MAQC)-II study of common practices for the development and validation of microarray-based predictive models","volume":"28","author":"Shi","year":"2010","journal-title":"Nat Biotechnol"},{"key":"2025070408275472900_btaf310-B34","doi-asserted-by":"publisher","first-page":"119","DOI":"10.1186\/s12859-020-3427-8","article-title":"Standard machine learning approaches outperform deep representation learning on phenotype prediction from transcriptomics data","volume":"21","author":"Smith","year":"2020","journal-title":"BMC Bioinfo"},{"year":"2018","author":"Snow","key":"2025070408275472900_btaf310-B35"},{"key":"2025070408275472900_btaf310-B36","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-71050-9","volume-title":"Optimal Transport, Volume 338 of Grundlehren Der Mathematischen Wissenschaften","author":"Villani","year":"2009"},{"key":"2025070408275472900_btaf310-B37","doi-asserted-by":"publisher","first-page":"bbae329","DOI":"10.1093\/bib\/bbae329","article-title":"Accurately deciphering spatial domains for spatially resolved transcriptomics with stCluster","volume":"25","author":"Wang","year":"2024","journal-title":"Brief Bioinform"},{"key":"2025070408275472900_btaf310-B38","doi-asserted-by":"publisher","first-page":"104540","DOI":"10.1016\/j.compbiomed.2021.104540","article-title":"Cancer diagnosis using generative adversarial networks based on deep learning from imbalanced data","volume":"135","author":"Xiao","year":"2021","journal-title":"Comput Biol Med"},{"key":"2025070408275472900_btaf310-B39","doi-asserted-by":"publisher","first-page":"107026","DOI":"10.1016\/j.cmpb.2022.107026","article-title":"PregGAN: a prognosis prediction model for breast cancer based on conditional generative adversarial networks","volume":"224","author":"Zhang","year":"2022","journal-title":"Comput Methods Programs Biomed"},{"key":"2025070408275472900_btaf310-B40","doi-asserted-by":"publisher","first-page":"4205","DOI":"10.1007\/s40747-022-00695-9","article-title":"Anomaly detection for high-dimensional space using deep hypersphere fused with probability approach","volume":"8","author":"Zheng","year":"2022","journal-title":"Complex Intell Syst"},{"key":"2025070408275472900_btaf310-B41","doi-asserted-by":"publisher","first-page":"e0265150","DOI":"10.1371\/journal.pone.0265150","article-title":"vWCluster: vector-valued optimal transport for network based clustering using multi-omics data in breast cancer","volume":"17","author":"Zhu","year":"2022","journal-title":"PLoS One"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaf310\/63235811\/btaf310.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/6\/btaf310\/63235811\/btaf310.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/6\/btaf310\/63235811\/btaf310.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,4]],"date-time":"2025-07-04T12:28:04Z","timestamp":1751632084000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btaf310\/8137827"}},"subtitle":[],"editor":[{"given":"Laura","family":"Cantini","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2025,5,19]]},"references-count":41,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2025,6,2]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaf310","relation":{},"ISSN":["1367-4811"],"issn-type":[{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2025,6]]},"published":{"date-parts":[[2025,5,19]]},"article-number":"btaf310"}}