{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,22]],"date-time":"2026-02-22T01:16:07Z","timestamp":1771722967545,"version":"3.50.1"},"reference-count":34,"publisher":"Oxford University Press (OUP)","issue":"6","license":[{"start":{"date-parts":[[2024,10,22]],"date-time":"2024-10-22T00:00:00Z","timestamp":1729555200000},"content-version":"vor","delay-in-days":29,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"name":"Clinical Medicine Plus X\u2014Young Scholars Project, Peking University, the Fundamental Research Funds for the Central Universities","award":["PKU2023LCXQ048"],"award-info":[{"award-number":["PKU2023LCXQ048"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["82373682"],"award-info":[{"award-number":["82373682"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["82173615"],"award-info":[{"award-number":["82173615"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["82204158"],"award-info":[{"award-number":["82204158"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,9,23]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>The integration of data from multiple modalities generated by single-cell omics technologies is crucial for accurately identifying cell states. 
One challenge in comprehending multi-omics data resides in mosaic integration, in which different data modalities are profiled in different subsets of cells, as it requires simultaneous batch effect removal and modality alignment. Here, we develop Multi-omics Mosaic Auto-scaling Attention Variational Inference (mmAAVI), a scalable deep generative model for single-cell mosaic integration. Leveraging auto-scaling self-attention mechanisms, mmAAVI can map arbitrary combinations of omics to the common embedding space. If well-annotated cell states exist, the model can perform semisupervised learning to utilize these existing annotations. We validated the performance of mmAAVI and five other commonly used methods on four benchmark datasets, which vary in cell numbers, omics types, and missing patterns. mmAAVI consistently demonstrated its superiority. We also validated mmAAVI's ability for cell state knowledge transfer, achieving balanced accuracies of 0.82 and 0.97 with less than 1% labeled cells between batches with completely different omics. 
The full package is available at https:\/\/github.com\/luyiyun\/mmAAVI.<\/jats:p>","DOI":"10.1093\/bib\/bbae540","type":"journal-article","created":{"date-parts":[[2024,10,22]],"date-time":"2024-10-22T23:16:26Z","timestamp":1729638986000},"source":"Crossref","is-referenced-by-count":1,"title":["Single-cell mosaic integration and cell state transfer with auto-scaling self-attention mechanism"],"prefix":"10.1093","volume":"25","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0970-8545","authenticated-orcid":false,"given":"Zhiwei","family":"Rong","sequence":"first","affiliation":[{"name":"Department of Biostatistics , School of Public Health, , 38 Xueyuan Rd., Haidian District, Beijing 100191 ,","place":["China"]},{"name":"Peking University , School of Public Health, , 38 Xueyuan Rd., Haidian District, Beijing 100191 ,","place":["China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jiali","family":"Song","sequence":"additional","affiliation":[{"name":"Department of Biostatistics , School of Public Health, , 38 Xueyuan Rd., Haidian District, Beijing 100191 ,","place":["China"]},{"name":"Peking University , School of Public Health, , 38 Xueyuan Rd., Haidian District, Beijing 100191 ,","place":["China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yipei","family":"Yu","sequence":"additional","affiliation":[{"name":"Department of Biostatistics , School of Public Health, , 38 Xueyuan Rd., Haidian District, Beijing 100191 ,","place":["China"]},{"name":"Peking University , School of Public Health, , 38 Xueyuan Rd., Haidian District, Beijing 100191 ,","place":["China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Lan","family":"Mi","sequence":"additional","affiliation":[{"name":"Peking University Cancer Hospital , 52 Fucheng Rd., Haidian District, Beijing 100142 
,","place":["China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"ManTang","family":"Qiu","sequence":"additional","affiliation":[{"name":"Department of Thoracic Surgery, Peking University People\u2019s Hospital , No. 11 Xizhimen South Street, Xicheng District, Beijing 100044 ,","place":["China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yuqin","family":"Song","sequence":"additional","affiliation":[{"name":"Peking University Cancer Hospital , 52 Fucheng Rd., Haidian District, Beijing 100142 ,","place":["China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3658-8147","authenticated-orcid":false,"given":"Yan","family":"Hou","sequence":"additional","affiliation":[{"name":"Department of Biostatistics , School of Public Health, , 38 Xueyuan Rd., Haidian District, Beijing 100191 ,","place":["China"]},{"name":"Peking University , School of Public Health, , 38 Xueyuan Rd., Haidian District, Beijing 100191 ,","place":["China"]},{"name":"Peking University Cancer Hospital , 52 Fucheng Rd., Haidian District, Beijing 100142 ,","place":["China"]},{"name":"Peking University Clinical Research Center, Peking University , 38 Xueyuan Rd., Haidian District, Beijing 100191 ,","place":["China"]}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2024,10,22]]},"reference":[{"key":"2024102223160555400_ref1","doi-asserted-by":"publisher","first-page":"1187","DOI":"10.1016\/j.cell.2015.04.044","article-title":"Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells","volume":"161","author":"Klein","year":"2015","journal-title":"Cell"},{"key":"2024102223160555400_ref2","doi-asserted-by":"publisher","first-page":"1213","DOI":"10.1038\/nmeth.2688","article-title":"Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome 
position","volume":"10","author":"Buenrostro","year":"2013","journal-title":"Nat Methods"},{"key":"2024102223160555400_ref3","doi-asserted-by":"publisher","first-page":"1103","DOI":"10.1016\/j.cell.2020.09.056","article-title":"Chromatin potential identified by shared single-cell profiling of RNA and chromatin","volume":"183","author":"Ma","year":"2020","journal-title":"Cell"},{"key":"2024102223160555400_ref4","doi-asserted-by":"publisher","first-page":"865","DOI":"10.1038\/nmeth.4380","article-title":"Simultaneous epitope and transcriptome measurement in single cells","volume":"14","author":"Stoeckius","year":"2017","journal-title":"Nat Methods"},{"key":"2024102223160555400_ref5","doi-asserted-by":"publisher","first-page":"781","DOI":"10.1038\/s41467-018-03149-4","article-title":"scNMT-seq enables joint profiling of chromatin accessibility DNA methylation and transcription in single cells","volume":"9","author":"Clark","year":"2018","journal-title":"Nat Commun"},{"key":"2024102223160555400_ref6","doi-asserted-by":"publisher","first-page":"607","DOI":"10.1038\/s42256-020-00233-7","article-title":"Iterative transfer learning with neural network for clustering and cell type classification in single-cell RNA-seq analysis","volume":"2","author":"Hu","year":"2020","journal-title":"Nat Mach Intell"},{"key":"2024102223160555400_ref7","doi-asserted-by":"publisher","first-page":"41","DOI":"10.1038\/s41592-021-01336-8","article-title":"Benchmarking atlas-level data integration in single-cell genomics","volume":"19","author":"Luecken","year":"2022","journal-title":"Nat Methods"},{"key":"2024102223160555400_ref8","doi-asserted-by":"publisher","first-page":"2539","DOI":"10.1093\/bioinformatics\/btx196","article-title":"Removal of batch effects using distribution-matching residual 
networks","volume":"33","author":"Shaham","year":"2017","journal-title":"Bioinformatics"},{"key":"2024102223160555400_ref9","doi-asserted-by":"publisher","first-page":"1202","DOI":"10.1038\/s41587-021-00895-7","article-title":"Computational principles and challenges in single-cell data integration","volume":"39","author":"Argelaguet","year":"2021","journal-title":"Nat Biotechnol"},{"key":"2024102223160555400_ref10","doi-asserted-by":"publisher","first-page":"695","DOI":"10.1038\/s41580-023-00615-w","article-title":"The technological landscape and applications of single-cell multi-omics","volume":"24","author":"Baysoy","year":"2023","journal-title":"Nat Rev Mol Cell Biol"},{"key":"2024102223160555400_ref11","doi-asserted-by":"publisher","first-page":"346","DOI":"10.1186\/s13059-021-02565-y","article-title":"MultiMAP: dimensionality reduction and integration of multimodal data","volume":"22","author":"Jain","year":"2021","journal-title":"Genome Biol"},{"key":"2024102223160555400_ref12","doi-asserted-by":"publisher","first-page":"780","DOI":"10.1038\/s41467-022-28431-4","article-title":"UINMF performs mosaic integration of single-cell multi-omic datasets using nonnegative matrix factorization","volume":"13","author":"Kriebel","year":"2022","journal-title":"Nat Commun"},{"key":"2024102223160555400_ref13","doi-asserted-by":"publisher","first-page":"284","DOI":"10.1038\/s41587-023-01766-z","article-title":"Stabilized mosaic single-cell data integration using unshared features","volume":"42","author":"Ghazanfar","year":"2023","journal-title":"Nat Biotechnol"},{"key":"2024102223160555400_ref14","doi-asserted-by":"publisher","first-page":"384","DOI":"10.1038\/s41467-023-36066-2","article-title":"scMoMaT jointly performs single cell mosaic integration and multi-modal bio-marker detection","volume":"14","author":"Zhang","year":"2023","journal-title":"Nat 
Commun"},{"key":"2024102223160555400_ref15","doi-asserted-by":"publisher","DOI":"10.1038\/s41587-023-02040-y","article-title":"Mosaic integration and knowledge transfer of single-cell multimodal data with MIDAS","volume":"1\u201312","author":"He","year":"2024","journal-title":"Nat Biotechnol"},{"key":"2024102223160555400_ref16","doi-asserted-by":"publisher","first-page":"bbaa128","DOI":"10.1093\/bib\/bbaa128","article-title":"Using deep neural networks and biological subwords to detect protein S-sulfenylation sites","volume":"22","author":"Do","year":"2021","journal-title":"Brief Bioinform"},{"key":"2024102223160555400_ref17","doi-asserted-by":"publisher","DOI":"10.1002\/pmic.202100232","article-title":"Potential of deep representative learning features to interpret the sequence information in proteomics","volume":"22","author":"Le","year":"2022","journal-title":"Proteomics"},{"key":"2024102223160555400_ref18","doi-asserted-by":"publisher","first-page":"bbab005","DOI":"10.1093\/bib\/bbab005","article-title":"A transformer architecture based on BERT and 2D convolutional neural network to identify DNA enhancers from sequence information","volume":"22","author":"Le","year":"2021","journal-title":"Brief Bioinform"},{"key":"2024102223160555400_ref19","doi-asserted-by":"publisher","first-page":"181","DOI":"10.1093\/bfgp\/elad031","article-title":"Omics-based deep learning approaches for lung cancer decision-making and therapeutics development","volume":"23","author":"Tran","year":"2024","journal-title":"Brief Funct Genomics"},{"key":"2024102223160555400_ref20","doi-asserted-by":"publisher","first-page":"1053","DOI":"10.1038\/s41592-018-0229-2","article-title":"Deep generative modeling for single-cell transcriptomics","volume":"15","author":"Lopez","year":"2018","journal-title":"Nat Methods"},{"key":"2024102223160555400_ref21","doi-asserted-by":"publisher","DOI":"10.15252\/msb.20209620","article-title":"Probabilistic harmonization and annotation of single-cell 
transcriptomics data with deep generative models","volume":"17","author":"Xu","year":"2021","journal-title":"Mol Syst Biol"},{"key":"2024102223160555400_ref22","doi-asserted-by":"publisher","first-page":"272","DOI":"10.1038\/s41592-020-01050-x","article-title":"Joint probabilistic modeling of single-cell multi-omic data with totalVI","volume":"18","author":"Gayoso","year":"2021","journal-title":"Nat Methods"},{"key":"2024102223160555400_ref23","doi-asserted-by":"publisher","DOI":"10.1038\/s41587-022-01284-4","article-title":"Multi-omics single-cell data integration and regulatory inference with graph-linked embedding","volume":"42","author":"Cao","journal-title":"Nat Biotechnol"},{"key":"2024102223160555400_ref24","doi-asserted-by":"publisher","first-page":"3458","DOI":"10.1038\/s41467-020-17281-7","article-title":"Searching large-scale scRNA-seq databases via unbiased cell embedding with cell BLAST","volume":"11","author":"Cao","year":"2020","journal-title":"Nat Commun"},{"key":"2024102223160555400_ref25","first-page":"2096","article-title":"Domain-adversarial training of neural networks","volume":"17","author":"Ganin","year":"2016","journal-title":"J Mach Learn Res"},{"key":"2024102223160555400_ref26","doi-asserted-by":"publisher","DOI":"10.1038\/s41587-023-01766-z","article-title":"Stabilized mosaic single-cell data integration using unshared features","volume-title":"Nature Biotechnology","author":"Ghazanfar"},{"key":"2024102223160555400_ref27","doi-asserted-by":"publisher","first-page":"600","DOI":"10.1126\/science.aan3351","article-title":"Single-cell methylomes identify neuronal subtypes and regulatory elements in mammalian cortex","volume":"357","author":"Luo","year":"2017","journal-title":"Science"},{"key":"2024102223160555400_ref28","doi-asserted-by":"publisher","first-page":"2190","DOI":"10.1038\/s41467-021-22368-w","article-title":"Single cell transcriptional and chromatin accessibility profiling redefine cellular heterogeneity in the adult human 
kidney","volume":"12","author":"Muto","year":"2021","journal-title":"Nat Commun"},{"key":"2024102223160555400_ref29","doi-asserted-by":"publisher","first-page":"1015","DOI":"10.1016\/j.cell.2018.07.028","article-title":"Molecular diversity and specializations among the cells of the adult mouse brain","volume":"174","author":"Saunders","year":"2018","journal-title":"Cell"},{"key":"2024102223160555400_ref30","doi-asserted-by":"publisher","first-page":"351","DOI":"10.1186\/s13059-021-02556-z","article-title":"Cobolt: integrative analysis of multimodal single-cell sequencing data","volume":"22","author":"Gong","year":"2021","journal-title":"Genome Biol"},{"key":"2024102223160555400_ref31","doi-asserted-by":"publisher","author":"Kingma","DOI":"10.48550\/arXiv.1312.6114"},{"key":"2024102223160555400_ref32","volume-title":"5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24\u201326, 2017, Conference Track Proceedings","author":"Kipf","year":"2017"},{"key":"2024102223160555400_ref33","doi-asserted-by":"publisher","first-page":"389","DOI":"10.1038\/s41576-019-0122-6","article-title":"Deep learning: new computational modelling techniques for genomics","volume":"20","author":"Eraslan","year":"2019","journal-title":"Nat Rev Genet"},{"key":"2024102223160555400_ref34","article-title":"Semi-supervised learning with deep generative models","volume-title":"Proceedings of the 27th International Conference on Neural Information Processing Systems","author":"Kingma"}],"container-title":["Briefings in 
Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/25\/6\/bbae540\/59963457\/bbae540.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/25\/6\/bbae540\/59963457\/bbae540.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,10,22]],"date-time":"2024-10-22T23:16:31Z","timestamp":1729638991000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbae540\/7831260"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,9,23]]},"references-count":34,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2024,9,23]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbae540","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024,11]]},"published":{"date-parts":[[2024,9,23]]},"article-number":"bbae540"}}