{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T20:34:10Z","timestamp":1772138050135,"version":"3.50.1"},"reference-count":44,"publisher":"Oxford University Press (OUP)","issue":"8","license":[{"start":{"date-parts":[[2022,2,14]],"date-time":"2022-02-14T00:00:00Z","timestamp":1644796800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"name":"Agence Nationale de la Recherche (ANR)\u2014JCJC project scMOmix and Sanofi iTech Awards"},{"name":"HPC resources from GENCI-IDRIS","award":["2021-AD011012285"],"award-info":[{"award-number":["2021-AD011012285"]}]},{"name":"European Research Council (ERC project NORIA"},{"name":"French government under management of Agence Nationale de la Recherche as part of the \u2018Investissements d\u2019avenir\u2019 program, reference","award":["ANR19-P3IA-0001"],"award-info":[{"award-number":["ANR19-P3IA-0001"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,4,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>High-throughput single-cell molecular profiling is revolutionizing biology and medicine by unveiling the diversity of cell types and states contributing to development and disease. The identification and characterization of cellular heterogeneity are typically achieved through unsupervised clustering, which crucially relies on a similarity metric.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We here propose the use of Optimal Transport (OT) as a cell\u2013cell similarity metric for single-cell omics data. OT defines distances to compare high-dimensional data represented as probability distributions. To speed up computations and cope with the high dimensionality of single-cell data, we consider the entropic regularization of the classical OT distance. We then extensively benchmark OT against state-of-the-art metrics over 13 independent datasets, including simulated, scRNA-seq, scATAC-seq and single-cell DNA methylation data. First, we test the ability of the metrics to detect the similarity between cells belonging to the same groups (e.g. cell types, cell lines of origin). Then, we apply unsupervised clustering and test the quality of the resulting clusters. OT is found to improve cell\u2013cell similarity inference and cell clustering in all simulated and real scRNA-seq data, as well as in scATAC-seq and single-cell DNA methylation data.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>All our analyses are reproducible through the OT-scOmics Jupyter notebook available at https:\/\/github.com\/ComputationalSystemsBiology\/OT-scOmics.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btac084","type":"journal-article","created":{"date-parts":[[2022,2,8]],"date-time":"2022-02-08T15:13:27Z","timestamp":1644333207000},"page":"2169-2177","source":"Crossref","is-referenced-by-count":35,"title":["Optimal transport improves cell\u2013cell similarity inference in single-cell omics data"],"prefix":"10.1093","volume":"38","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-5208-7855","authenticated-orcid":false,"given":"Geert-Jan","family":"Huizing","sequence":"first","affiliation":[{"name":"Computational Systems Biology Team, Institut de Biologie de l\u2019Ecole Normale Sup\u00e9rieure, CNRS, INSERM, Ecole Normale Sup\u00e9rieure, Universit\u00e9 PSL , 75005 Paris, France"},{"name":"D\u00e9partement de Math\u00e9matiques et Applications de l\u2019Ecole Normale Sup\u00e9rieure, CNRS, Ecole Normale Sup\u00e9rieure, Universit\u00e9 PSL , 75005 Paris, France"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4477-0387","authenticated-orcid":false,"given":"Gabriel","family":"Peyr\u00e9","sequence":"additional","affiliation":[{"name":"D\u00e9partement de Math\u00e9matiques et Applications de l\u2019Ecole Normale Sup\u00e9rieure, CNRS, Ecole Normale Sup\u00e9rieure, Universit\u00e9 PSL , 75005 Paris, France"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6360-4440","authenticated-orcid":false,"given":"Laura","family":"Cantini","sequence":"additional","affiliation":[{"name":"Computational Systems Biology Team, Institut de Biologie de l\u2019Ecole Normale Sup\u00e9rieure, CNRS, INSERM, Ecole Normale Sup\u00e9rieure, Universit\u00e9 PSL , 75005 Paris, France"}]}],"member":"286","published-online":{"date-parts":[[2022,2,14]]},"reference":[{"key":"2023020109025167000_btac084-B1","author":"Bellazzi","year":"2021"},{"key":"2023020109025167000_btac084-B2","doi-asserted-by":"crossref","first-page":"P10008","DOI":"10.1088\/1742-5468\/2008\/10\/P10008","article-title":"Fast unfolding of communities in large networks","volume":"2008","author":"Blondel","year":"2008","journal-title":"J. Stat. Mech. Theory Exp"},{"key":"2023020109025167000_btac084-B3","first-page":"211","volume-title":"Bioinformatics","author":"Cao","year":"2022"},{"key":"2023020109025167000_btac084-B4","doi-asserted-by":"crossref","first-page":"1103","DOI":"10.1038\/s41587-020-00748-9","article-title":"A multicenter study benchmarking single-cell RNA sequencing technologies using reference samples","volume":"39","author":"Chen","year":"2021","journal-title":"Nat. Biotechnol"},{"key":"2023020109025167000_btac084-B5","doi-asserted-by":"crossref","first-page":"1193","DOI":"10.1038\/ng.3646","article-title":"Lineage-specific and single-cell chromatin accessibility charts human hematopoiesis and leukemia evolution","volume":"48","author":"Corces","year":"2016","journal-title":"Nat. Genet"},{"key":"2023020109025167000_btac084-B6","first-page":"2292","article-title":"Sinkhorn distances: lightspeed computation of optimal transport","volume":"26","author":"Cuturi","year":"2013","journal-title":"Adv. Neural Inf. Process. Syst"},{"key":"2023020109025167000_btac084-B7","author":"Demetci","year":"2020"},{"key":"2023020109025167000_btac084-B8","first-page":"2681","author":"Feydy","year":"2019"},{"key":"2023020109025167000_btac084-B9","first-page":"1574","author":"Genevay","year":"2019"},{"key":"2023020109025167000_btac084-B10","doi-asserted-by":"crossref","first-page":"e1004575","DOI":"10.1371\/journal.pcbi.1004575","article-title":"SINCERA: a pipeline for single-cell RNA-seq profiling analysis","volume":"11","author":"Guo","year":"2015","journal-title":"PLoS Comput. Biol"},{"key":"2023020109025167000_btac084-B11","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13059-019-1874-1","article-title":"Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression","volume":"20","author":"Hafemeister","year":"2019","journal-title":"Genome Biol"},{"key":"2023020109025167000_btac084-B12","doi-asserted-by":"crossref","first-page":"190","DOI":"10.1111\/j.2044-8317.1976.tb00714.x","article-title":"Quadratic assignment as a general data analysis strategy","volume":"29","author":"Hubert","year":"1976","journal-title":"Br. J. Math. Stat. Psychol"},{"key":"2023020109025167000_btac084-B13","article-title":"Unsupervised ground metric learning using wasserstein eigenvector","author":"Huizing","year":"2021","journal-title":"arXiv"},{"key":"2023020109025167000_btac084-B14","first-page":"227","article-title":"On the transfer of masses","volume":"37","author":"Kantorovich","year":"1942","journal-title":"Dokl. Akad. Nauk"},{"key":"2023020109025167000_btac084-B15","doi-asserted-by":"crossref","first-page":"2316","DOI":"10.1093\/bib\/bby076","article-title":"Impact of similarity metrics on single-cell RNA-seq data clustering","volume":"20","author":"Kim","year":"2019","journal-title":"Brief. Bioinform"},{"key":"2023020109025167000_btac084-B16","doi-asserted-by":"crossref","first-page":"273","DOI":"10.1038\/s41576-018-0088-9","article-title":"Challenges in unsupervised clustering of single-cell RNA-seq data","volume":"20","author":"Kiselev","year":"2019","journal-title":"Nat. Rev. Genet"},{"key":"2023020109025167000_btac084-B17","doi-asserted-by":"crossref","first-page":"1428","DOI":"10.1038\/s12276-020-0420-2","article-title":"Single-cell multiomics: technologies and data analysis methods","volume":"52","author":"Lee","year":"2020","journal-title":"Exp. Mol. Med"},{"key":"2023020109025167000_btac084-B18","doi-asserted-by":"crossref","first-page":"708","DOI":"10.1038\/ng.3818","article-title":"Reference component analysis of single-cell transcriptomes elucidates cellular heterogeneity in human colorectal tumors","volume":"49","author":"Li","year":"2017","journal-title":"Nat. Genet"},{"key":"2023020109025167000_btac084-B19","doi-asserted-by":"crossref","first-page":"470","DOI":"10.1038\/s41467-018-08205-7","article-title":"Deconvolution of single-cell multi-omics layers reveals regulatory heterogeneity","volume":"10","author":"Liu","year":"2019","journal-title":"Nat. Commun"},{"key":"2023020109025167000_btac084-B20","doi-asserted-by":"crossref","first-page":"e8746","DOI":"10.15252\/msb.20188746","article-title":"Current best practices in single-cell RNA-seq analysis: a tutorial","volume":"15","author":"Luecken","year":"2019","journal-title":"Mol. Syst. Biol"},{"key":"2023020109025167000_btac084-B21","first-page":"2122","article-title":"A step-by-step workflow for low-level analysis of single-cell RNA-seq data with Bioconductor","volume":"5","author":"Lun","year":"2016","journal-title":"F1000Research"},{"key":"2023020109025167000_btac084-B22","doi-asserted-by":"crossref","first-page":"600","DOI":"10.1126\/science.aan3351","article-title":"Single-cell methylomes identify neuronal subtypes and regulatory elements in mammalian cortex","volume":"357","author":"Luo","year":"2017","journal-title":"Science"},{"key":"2023020109025167000_btac084-B23","doi-asserted-by":"crossref","first-page":"1007","DOI":"10.1016\/j.tibtech.2020.02.013","article-title":"Integrative methods and practical challenges for single-cell multi-omics","volume":"38","author":"Ma","year":"2020","journal-title":"Trends Biotechnol"},{"key":"2023020109025167000_btac084-B24","doi-asserted-by":"crossref","first-page":"1202","DOI":"10.1016\/j.cell.2015.05.002","article-title":"Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets","volume":"161","author":"Macosko","year":"2015","journal-title":"Cell"},{"key":"2023020109025167000_btac084-B25","volume-title":"M\u00e9moire sur la th\u00e9orie des d\u00e9blais et des remblais","author":"Monge","year":"1781"},{"key":"2023020109025167000_btac084-B26","doi-asserted-by":"crossref","first-page":"e1008270","DOI":"10.1371\/journal.pcbi.1008270","article-title":"Epiclomal: probabilistic clustering of sparse single-cell DNA methylation data","volume":"16","author":"P E de Souza","year":"2020","journal-title":"PLoS Comput. Biol"},{"key":"2023020109025167000_btac084-B27","doi-asserted-by":"crossref","first-page":"35","DOI":"10.1038\/nri.2017.76","article-title":"Single-cell RNA sequencing to explore immune cell heterogeneity","volume":"18","author":"Papalexi","year":"2018","journal-title":"Nat. Rev. Immunol"},{"key":"2023020109025167000_btac084-B28","first-page":"2825","article-title":"scikit-learn: machine learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J. Mach. Learn. Res"},{"key":"2023020109025167000_btac084-B29","doi-asserted-by":"crossref","first-page":"355","DOI":"10.1561\/2200000073","article-title":"Computational optimal transport: with applications to data science","volume":"11","author":"Peyr\u00e9","year":"2019","journal-title":"Found. Trends Mach. Learn"},{"key":"2023020109025167000_btac084-B30","doi-asserted-by":"crossref","first-page":"479","DOI":"10.1038\/s41581-018-0021-7","article-title":"Single-cell RNA sequencing for the study of development, physiology and disease","volume":"14","author":"Potter","year":"2018","journal-title":"Nat. Rev. Nephrol"},{"key":"2023020109025167000_btac084-B31","doi-asserted-by":"crossref","first-page":"377","DOI":"10.1038\/s41586-020-2715-9","article-title":"LifeTime and improving European healthcare through cell-based interceptive medicine","volume":"587","author":"Rajewsky","year":"2020","journal-title":"Nature"},{"key":"2023020109025167000_btac084-B32","doi-asserted-by":"crossref","first-page":"53","DOI":"10.1016\/0377-0427(87)90125-7","article-title":"Silhouettes: a graphical aid to the interpretation and validation of cluster analysis","volume":"20","author":"Rousseeuw","year":"1987","journal-title":"J. Comput. Appl. Math"},{"key":"2023020109025167000_btac084-B33","doi-asserted-by":"crossref","first-page":"94","DOI":"10.1007\/978-3-319-20828-2","volume-title":"Optimal Transport for Applied Mathematicians","author":"Santambrogio","year":"2015"},{"key":"2023020109025167000_btac084-B34","doi-asserted-by":"crossref","first-page":"495","DOI":"10.1038\/nbt.3192","article-title":"Spatial reconstruction of single-cell gene expression data","volume":"33","author":"Satija","year":"2015","journal-title":"Nat. Biotechnol"},{"key":"2023020109025167000_btac084-B35","doi-asserted-by":"crossref","first-page":"928","DOI":"10.1016\/j.cell.2019.01.006","article-title":"Optimal-transport analysis of single-cell gene expression identifies developmental trajectories in reprogramming","volume":"176","author":"Schiebinger","year":"2019","journal-title":"Cell"},{"key":"2023020109025167000_btac084-B36","doi-asserted-by":"crossref","first-page":"133","DOI":"10.1038\/nrg3833","article-title":"Computational and analytical challenges in single-cell transcriptomics","volume":"16","author":"Stegle","year":"2015","journal-title":"Nat. Rev. Genet"},{"key":"2023020109025167000_btac084-B37","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41598-019-41695-z","article-title":"From Louvain to Leiden: guaranteeing well-connected communities","volume":"9","author":"Traag","year":"2019","journal-title":"Sci. Rep"},{"key":"2023020109025167000_btac084-B38","doi-asserted-by":"crossref","first-page":"395","DOI":"10.1007\/s11222-007-9033-z","article-title":"A tutorial on spectral clustering","volume":"17","author":"Von Luxburg","year":"2007","journal-title":"Stat. Comput"},{"key":"2023020109025167000_btac084-B39","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13059-017-1382-0","article-title":"SCANPY: large-scale single-cell gene expression data analysis","volume":"19","author":"Wolf","year":"2018","journal-title":"Genome Biol"},{"key":"2023020109025167000_btac084-B40","doi-asserted-by":"crossref","first-page":"4576","DOI":"10.1038\/s41467-019-12630-7","article-title":"SCALE method for single-cell ATAC-seq analysis via latent feature extraction","volume":"10","author":"Xiong","year":"2019","journal-title":"Nat. Commun"},{"key":"2023020109025167000_btac084-B41","doi-asserted-by":"crossref","first-page":"e1007828","DOI":"10.1371\/journal.pcbi.1007828","article-title":"Predicting cell lineages using autoencoders and optimal transport","volume":"16","author":"Yang","year":"2020","journal-title":"PLoS Comput. Biol"},{"key":"2023020109025167000_btac084-B42","doi-asserted-by":"crossref","first-page":"e179","DOI":"10.1093\/nar\/gkx828","article-title":"Linnorm: improved statistical analysis for single cell RNA-seq expression data","volume":"45","author":"Yip","year":"2017","journal-title":"Nucleic Acids Res"},{"key":"2023020109025167000_btac084-B43","doi-asserted-by":"crossref","first-page":"174","DOI":"10.1186\/s13059-017-1305-0","article-title":"Splatter: simulation of single-cell RNA sequencing data","volume":"18","author":"Zappia","year":"2017","journal-title":"Genome Biol"},{"key":"2023020109025167000_btac084-B44","doi-asserted-by":"crossref","first-page":"3642","DOI":"10.1093\/bioinformatics\/btz139","article-title":"SinNLRR: a robust subspace clustering method for cell type detection by non-negative and low-rank representation","volume":"35","author":"Zheng","year":"2019","journal-title":"Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btac084\/43247329\/btac084.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/8\/2169\/49009377\/btac084.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/8\/2169\/49009377\/btac084.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,1]],"date-time":"2023-02-01T15:53:04Z","timestamp":1675266784000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/38\/8\/2169\/6528312"}},"subtitle":[],"editor":[{"given":"Alfonso","family":"Valencia","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2022,2,14]]},"references-count":44,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2022,4,12]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btac084","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2021.03.19.436159","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2022,4,15]]},"published":{"date-parts":[[2022,2,14]]}}}