{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T20:34:34Z","timestamp":1772138074745,"version":"3.50.1"},"reference-count":37,"publisher":"Oxford University Press (OUP)","issue":"6","license":[{"start":{"date-parts":[[2025,6,14]],"date-time":"2025-06-14T00:00:00Z","timestamp":1749859200000},"content-version":"vor","delay-in-days":13,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Canadian NSERC Discovery","award":["RGPIN-03270-2023"],"award-info":[{"award-number":["RGPIN-03270-2023"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,6,2]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Recent advances in single-cell multimodal omics technologies enable the exploration of cellular systems at unprecedented resolution, leading to the rapid generation of multimodal datasets that require sophisticated integration methods. Diagonal integration has emerged as a flexible solution for integrating heterogeneous single-cell data without relying on shared cells or features. However, the absence of anchoring elements introduces the risk of artificial integrations, where cells across modalities are incorrectly aligned due to ambiguous mapping.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>To address this challenge, we propose SONATA (Securing diagOnal iNtegrATion against Ambiguous) mapping, a novel diagnostic method designed to detect potential artificial integrations resulting from ambiguous mappings in diagonal data integration. SONATA identifies ambiguous alignments by quantifying cell\u2013cell ambiguity within the data manifold, ensuring that biologically meaningful integrations are distinguished from spurious ones. It is worth noting that SONATA is not designed to replace any existing pipelines for diagonal data integration; instead, SONATA works simply as an add-on to an existing pipeline for achieving more reliable integration. Through a comprehensive evaluation on both simulated and real multimodal single-cell datasets, we observe that artificial integrations in diagonal data integration are widespread yet surprisingly overlooked, occurring across all mainstream diagonal integration methods. We demonstrate SONATA\u2019s ability to safeguard against misleading integrations and provide actionable insights into potential integration failures across mainstream methods. Our approach offers a robust framework for ensuring the reliability and interpretability of multimodal single-cell data integration.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>The source code is available at (https:\/\/github.com\/batmen-lab\/SONATA).<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaf345","type":"journal-article","created":{"date-parts":[[2025,6,11]],"date-time":"2025-06-11T07:48:28Z","timestamp":1749628108000},"source":"Crossref","is-referenced-by-count":1,"title":["Securing diagonal integration of multimodal single-cell data against ambiguous mapping"],"prefix":"10.1093","volume":"41","author":[{"given":"Han","family":"Zhou","sequence":"first","affiliation":[{"name":"Cheriton School of Computer Science, University of Waterloo , Waterloo, ON, N2L 3G1,","place":["Canada"]}]},{"given":"Kai","family":"Cao","sequence":"additional","affiliation":[{"name":"Eric and Wendy Schmidt Center, Broad Institute of MIT and Harvard , Boston, MA,","place":["02142, United States"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1686-5917","authenticated-orcid":false,"given":"Yang Young","family":"Lu","sequence":"additional","affiliation":[{"name":"Cheriton School of Computer Science, University of Waterloo , Waterloo, ON, N2L 3G1,","place":["Canada"]}]}],"member":"286","published-online":{"date-parts":[[2025,6,14]]},"reference":[{"key":"2025070408281846300_btaf345-B1","doi-asserted-by":"crossref","first-page":"1202","DOI":"10.1038\/s41587-021-00895-7","article-title":"Computational principles and challenges in single-cell data integration","volume":"39","author":"Argelaguet","year":"2021","journal-title":"Nat Biotechnol"},{"key":"2025070408281846300_btaf345-B2","first-page":"333","author":"Basu","year":"2004"},{"key":"2025070408281846300_btaf345-B3","doi-asserted-by":"crossref","first-page":"486","DOI":"10.1038\/nature14590","article-title":"Single-cell chromatin accessibility reveals principles of regulatory variation","volume":"523","author":"Buenrostro","year":"2015","journal-title":"Nature"},{"key":"2025070408281846300_btaf345-B4","doi-asserted-by":"crossref","first-page":"i48","DOI":"10.1093\/bioinformatics\/btaa443","article-title":"Unsupervised topological alignment for single-cell multi-omics integration","volume":"36","author":"Cao","year":"2020","journal-title":"Bioinformatics"},{"key":"2025070408281846300_btaf345-B5","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1093\/bioinformatics\/btab594","article-title":"Manifold alignment for heterogeneous single-cell multi-omics data integration using Pamona","volume":"38","author":"Cao","year":"2021","journal-title":"Bioinformatics"},{"key":"2025070408281846300_btaf345-B6","doi-asserted-by":"crossref","first-page":"e1011288","DOI":"10.1371\/journal.pcbi.1011288","article-title":"The specious art of single-cell genomics","volume":"19","author":"Chari","year":"2023","journal-title":"PLoS Comput Biol"},{"key":"2025070408281846300_btaf345-B7","doi-asserted-by":"crossref","first-page":"1452","DOI":"10.1038\/s41587-019-0290-0","article-title":"High-throughput sequencing of the transcriptome and chromatin accessibility in the same cell","volume":"37","author":"Chen","year":"2019","journal-title":"Nat Biotechnol"},{"key":"2025070408281846300_btaf345-B8","doi-asserted-by":"crossref","first-page":"833","DOI":"10.1038\/nmeth.3961","article-title":"Single-cell multimodal profiling reveals cellular epigenetic heterogeneity","volume":"13","author":"Cheow","year":"2016","journal-title":"Nat Methods"},{"key":"2025070408281846300_btaf345-B9","doi-asserted-by":"crossref","first-page":"781","DOI":"10.1038\/s41467-018-03149-4","article-title":"scnmt-seq enables joint profiling of chromatin accessibility DNA methylation and transcription in single cells","volume":"9","author":"Clark","year":"2018","journal-title":"Nat Commun"},{"key":"2025070408281846300_btaf345-B10","first-page":"iii","author":"Costa","year":"2004"},{"key":"2025070408281846300_btaf345-B11","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1089\/cmb.2021.0446","article-title":"SCOT: single-cell multi-omics alignment with optimal transport","volume":"29","author":"Demetci","year":"2022","journal-title":"J Comput Biol"},{"key":"2025070408281846300_btaf345-B12","first-page":"3","author":"Demetci","year":"2022"},{"key":"2025070408281846300_btaf345-B13","doi-asserted-by":"publisher","author":"Dong","year":"2023","DOI":"10.1101\/2023.10.29.564479"},{"key":"2025070408281846300_btaf345-B14","doi-asserted-by":"crossref","first-page":"397","DOI":"10.1038\/s41592-019-0367-1","article-title":"cisTopic: cis-regulatory topic modelling on single-cell ATAC-seq data","volume":"16","author":"Gonz\u00e1lez-Blas","year":"2019","journal-title":"Nat Methods"},{"key":"2025070408281846300_btaf345-B15","article-title":"Deep imputation bi-stochastic graph regularized matrix factorization for clustering single-cell RNA-sequencing data","author":"Lan","year":"2024","journal-title":"IEEE\/ACM Trans Comput Biol Bioinform"},{"key":"2025070408281846300_btaf345-B16","doi-asserted-by":"crossref","first-page":"1169","DOI":"10.26599\/BDMA.2024.9020034","article-title":"Transformer-based single-cell language model: a survey","volume":"7","author":"Lan","year":"2024","journal-title":"Big Data Min Anal"},{"key":"2025070408281846300_btaf345-B17","doi-asserted-by":"crossref","first-page":"e1012679","DOI":"10.1371\/journal.pcbi.1012679","article-title":"scMoMtF: an interpretable multitask learning framework for single-cell multi-omics data analysis","volume":"20","author":"Lan","year":"2024","journal-title":"PLoS Comput Biol"},{"key":"2025070408281846300_btaf345-B18","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.ymeth.2023.11.019","article-title":"JLONMFSC: clustering scRNA-seq data based on joint learning of non-negative matrix factorization and subspace clustering","volume":"222","author":"Lan","year":"2024","journal-title":"Methods"},{"key":"2025070408281846300_btaf345-B19","doi-asserted-by":"crossref","first-page":"4486","DOI":"10.1109\/JBHI.2025.3530794","article-title":"The large language models on biomedical data analysis: a survey","volume":"29","author":"Lan","year":"2025","journal-title":"IEEE J Biomed Health Inform"},{"key":"2025070408281846300_btaf345-B20","doi-asserted-by":"publisher","author":"Liu","year":"2024","DOI":"10.1101\/2024.09.15.613149"},{"key":"2025070408281846300_btaf345-B21","first-page":"1","author":"Liu","year":"2019"},{"key":"2025070408281846300_btaf345-B22","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1038\/s41592-021-01336-8","article-title":"Benchmarking atlas-level data integration in single-cell genomics","volume":"19","author":"Luecken","year":"2022","journal-title":"Nature Methods"},{"key":"2025070408281846300_btaf345-B23","doi-asserted-by":"crossref","first-page":"861","DOI":"10.21105\/joss.00861","article-title":"UMAP: Uniform manifold approximation and projection for dimension reduction","volume":"3","author":"McInnes","year":"2018","journal-title":"J Open Source Softw"},{"key":"2025070408281846300_btaf345-B24","doi-asserted-by":"crossref","first-page":"417","DOI":"10.1007\/s10208-011-9093-5","article-title":"Gromov\u2013Wasserstein distances and the metric approach to object matching","volume":"11","author":"M\u00e9moli","year":"2011","journal-title":"Found Comput Math"},{"key":"2025070408281846300_btaf345-B25","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1038\/nature12593","article-title":"Single-cell Hi-C reveals cell-to-cell variability in chromosome structure","volume":"502","author":"Nagano","year":"2013","journal-title":"Nature"},{"key":"2025070408281846300_btaf345-B26","doi-asserted-by":"crossref","first-page":"228","DOI":"10.1186\/s13059-021-02438-4","article-title":"CoCoA-diff: counterfactual inference for single-cell gene expression analysis","volume":"22","author":"Park","year":"2021","journal-title":"Genome Biol"},{"key":"2025070408281846300_btaf345-B27","doi-asserted-by":"crossref","first-page":"355","DOI":"10.1561\/2200000073","article-title":"Computational optimal transport: with applications to data science","volume":"11","author":"Peyr\u00e9","year":"2019","journal-title":"FNT in Mach Learn"},{"key":"2025070408281846300_btaf345-B28","doi-asserted-by":"crossref","first-page":"461","DOI":"10.1177\/030098589803500601","article-title":"The cell cycle: a review","volume":"35","author":"Schafer","year":"1998","journal-title":"Vet Pathol"},{"key":"2025070408281846300_btaf345-B29","author":"Serven","year":"2018"},{"key":"2025070408281846300_btaf345-B30","doi-asserted-by":"crossref","first-page":"817","DOI":"10.1038\/nmeth.3035","article-title":"Single-cell genome-wide bisulfite sequencing for assessing epigenetic heterogeneity","volume":"11","author":"Smallwood","year":"2014","journal-title":"Nat Methods"},{"key":"2025070408281846300_btaf345-B31","doi-asserted-by":"crossref","first-page":"i919","DOI":"10.1093\/bioinformatics\/btaa843","article-title":"SCIM: universal single-cell matching with unpaired feature sets","volume":"36","author":"Stark","year":"2020","journal-title":"Bioinformatics"},{"key":"2025070408281846300_btaf345-B32","doi-asserted-by":"crossref","first-page":"377","DOI":"10.1038\/nmeth.1315","article-title":"mRNA-Seq whole-transcriptome analysis of a single cell","volume":"6","author":"Tang","year":"2009","journal-title":"Nat Methods"},{"key":"2025070408281846300_btaf345-B33","doi-asserted-by":"crossref","first-page":"2319","DOI":"10.1126\/science.290.5500.2319","article-title":"A global geometric framework for nonlinear dimensionality reduction","volume":"290","author":"Tenenbaum","year":"2000","journal-title":"Science"},{"key":"2025070408281846300_btaf345-B34","doi-asserted-by":"crossref","first-page":"494","DOI":"10.1038\/s41576-023-00580-2","article-title":"Methods and applications for single-cell and spatial multi-omics","volume":"24","author":"Vandereyken","year":"2023","journal-title":"Nat Rev Genet"},{"key":"2025070408281846300_btaf345-B35","doi-asserted-by":"crossref","first-page":"138","DOI":"10.1186\/s13059-017-1269-0","article-title":"MATCHER: manifold alignment reveals correspondence between single cell transcriptome and epigenome dynamics","volume":"18","author":"Welch","year":"2017","journal-title":"Genome Biol"},{"key":"2025070408281846300_btaf345-B36","doi-asserted-by":"crossref","first-page":"3505","DOI":"10.1038\/s41467-022-31104-x","article-title":"Diagonal integration of multimodal single-cell data: potential pitfalls and paths forward","volume":"13","author":"Xu","year":"2022","journal-title":"Nat Commun"},{"key":"2025070408281846300_btaf345-B37","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1038\/s41467-020-20249-2","article-title":"Multi-domain translation between single-cell imaging and sequencing data using autoencoders","volume":"12","author":"Yang","year":"2021","journal-title":"Nat Commun"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaf345\/63488814\/btaf345.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/6\/btaf345\/63488814\/btaf345.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/6\/btaf345\/63488814\/btaf345.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,4]],"date-time":"2025-07-04T08:28:29Z","timestamp":1751617709000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btaf345\/8162456"}},"subtitle":[],"editor":[{"given":"Macha","family":"Nikolski","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2025,6]]},"references-count":37,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2025,6,2]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaf345","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2023.10.05.561049","asserted-by":"object"}]},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025,6]]},"published":{"date-parts":[[2025,6]]},"article-number":"btaf345"}}