{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,2]],"date-time":"2026-04-02T16:08:27Z","timestamp":1775146107278,"version":"3.50.1"},"reference-count":75,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2024,12,27]],"date-time":"2024-12-27T00:00:00Z","timestamp":1735257600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,12,27]],"date-time":"2024-12-27T00:00:00Z","timestamp":1735257600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Nat Mach Intell"],"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Self-supervised learning (SSL) has emerged as a powerful method for extracting meaningful representations from vast, unlabelled datasets, transforming computer vision and natural language processing. In single-cell genomics (SCG), representation learning offers insights into the complex biological data, especially with emerging foundation models. However, identifying scenarios in SCG where SSL outperforms traditional learning methods remains a nuanced challenge. Furthermore, selecting the most effective pretext tasks within the SSL framework for SCG is a critical yet unresolved question. Here we address this gap by adapting and benchmarking SSL methods in SCG, including masked autoencoders with multiple masking strategies and contrastive learning methods. Models trained on over 20 million cells were examined across multiple downstream tasks, including cell-type prediction, gene-expression reconstruction, cross-modality prediction and data integration. 
Our empirical analyses underscore the nuanced role of SSL, namely, in transfer learning scenarios leveraging auxiliary data or analysing unseen datasets. Masked autoencoders excel over contrastive methods in SCG, diverging from computer vision trends. Moreover, our findings reveal the notable capabilities of SSL in zero-shot settings and its potential in cross-modality prediction and data integration. In summary, we study SSL methods in SCG on fully connected networks and benchmark their utility across key representation learning scenarios.<\/jats:p>","DOI":"10.1038\/s42256-024-00934-3","type":"journal-article","created":{"date-parts":[[2024,12,27]],"date-time":"2024-12-27T05:02:17Z","timestamp":1735275737000},"page":"68-78","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":20,"title":["Delineating the effective use of self-supervised learning in single-cell genomics"],"prefix":"10.1038","volume":"7","author":[{"given":"Till","family":"Richter","sequence":"first","affiliation":[]},{"given":"Mojtaba","family":"Bahrami","sequence":"additional","affiliation":[]},{"given":"Yufan","family":"Xia","sequence":"additional","affiliation":[]},{"given":"David S.","family":"Fischer","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2419-1943","authenticated-orcid":false,"given":"Fabian J.","family":"Theis","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,12,27]]},"reference":[{"key":"934_CR1","doi-asserted-by":"publisher","first-page":"85","DOI":"10.1016\/j.coisb.2017.07.004","volume":"4","author":"P Angerer","year":"2017","unstructured":"Angerer, P. et al. Single cells make big data: new challenges and opportunities in transcriptomics. Curr. Opin. Syst. Biol. 4, 85\u201391 (2017).","journal-title":"Curr. Opin. Syst. 
Biol."},{"key":"934_CR2","doi-asserted-by":"publisher","first-page":"121","DOI":"10.1038\/s41587-021-01001-7","volume":"40","author":"M Lotfollahi","year":"2022","unstructured":"Lotfollahi, M. et al. Mapping single-cell data to reference atlases by transfer learning. Nat. Biotechnol. 40, 121\u2013130 (2022).","journal-title":"Nat. Biotechnol."},{"key":"934_CR3","doi-asserted-by":"publisher","first-page":"e27041","DOI":"10.7554\/eLife.27041","volume":"6","author":"A Regev","year":"2017","unstructured":"Regev, A. et al. The human cell atlas. eLife 6, e27041 (2017).","journal-title":"eLife"},{"key":"934_CR4","doi-asserted-by":"publisher","first-page":"1563","DOI":"10.1038\/s41591-023-02327-2","volume":"29","author":"L Sikkema","year":"2023","unstructured":"Sikkema, L. et al. An integrated cell atlas of the lung in health and disease. Nat. Med. 29, 1563\u20131577 (2023).","journal-title":"Nat. Med."},{"key":"934_CR5","doi-asserted-by":"publisher","DOI":"10.1038\/s41467-024-51059-5","volume":"15","author":"F Fischer","year":"2024","unstructured":"Fischer, F. et al. scTab: scaling cross-tissue single-cell annotation models. Nat. Commun. 15, 6611 (2024).","journal-title":"Nat. Commun."},{"key":"934_CR6","unstructured":"Consens, M. E. et al. To transformers and beyond: large language models for the genome. Preprint at https:\/\/arxiv.org\/abs\/2311.07621 (2023)."},{"key":"934_CR7","doi-asserted-by":"publisher","unstructured":"Boiarsky, R., Singh, N., Buendia, A., Getz, G. & Sontag, D. A deep dive into single-cell RNA sequencing foundation models. Preprint at bioRxiv https:\/\/doi.org\/10.1101\/2023.10.19.563100 (2023).","DOI":"10.1101\/2023.10.19.563100"},{"key":"934_CR8","first-page":"26671","volume":"35","author":"R Balestriero","year":"2022","unstructured":"Balestriero, R. et al. Contrastive and non-contrastive self-supervised learning recover global and local spectral embedding methods. Adv. Neural Inf. Process. Syst. 35, 26671\u201326685 (2022).","journal-title":"Adv. 
Neural Inf. Process. Syst."},{"key":"934_CR9","unstructured":"Weng, L. et al. Self-supervised learning: self-prediction and contrastive learning. Adv. Neural Inf. Process. Syst. https:\/\/nips.cc\/media\/neurips-2021\/Slides\/21895.pdf (2021)."},{"key":"934_CR10","unstructured":"Uelwer, T. et al. A survey on self-supervised representation learning. Preprint at https:\/\/arxiv.org\/abs\/2308.11455 (2023)."},{"key":"934_CR11","unstructured":"Bardes, A. et al. VICReg: Variance-Invariance-Covariance regularization for self-supervised learning. Int. Conf. Learn. Represent. https:\/\/openreview.net\/forum?id=xm6YD62D1Ub (2022)."},{"key":"934_CR12","unstructured":"Chen, T., Kornblith, S., Norouzi, M. & Hinton, G. A simple framework for contrastive learning of visual representations. In Proc. 37th International Conference on Machine Learning Vol. 119 (eds Iii, H. D. & Singh, A.) 1597\u20131607 (PMLR, 2020)."},{"key":"934_CR13","unstructured":"Radford, A., Narasimhan, K., Salimans, T. & Sutskever, I. Improving language understanding by generative pre-training (2018); https:\/\/cdn.openai.com\/research-covers\/language-unsupervised\/language_understanding_paper.pdf"},{"key":"934_CR14","unstructured":"Devlin, J. et al. BERT: pre-training of deep bidirectional transformers for language understanding. In Proc. 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) (eds Burstein, J. et al.) 4171\u20134186 (Association for Computational Linguistics, 2019)."},{"key":"934_CR15","unstructured":"Bommasani, R. et al. On the opportunities and risks of foundation models. Preprint at https:\/\/arxiv.org\/abs\/2108.07258 (2021)."},{"key":"934_CR16","doi-asserted-by":"publisher","first-page":"696","DOI":"10.1038\/s42256-022-00518-z","volume":"4","author":"M Yang","year":"2022","unstructured":"Yang, M. et al. 
Contrastive learning enables rapid mapping to multimodal single-cell atlas of multimillion scale. Nat. Mach. Intell. 4, 696\u2013709 (2022).","journal-title":"Nat. Mach. Intell."},{"key":"934_CR17","doi-asserted-by":"publisher","first-page":"btad098","DOI":"10.1093\/bioinformatics\/btad098","volume":"39","author":"Z Xiong","year":"2023","unstructured":"Xiong, Z. et al. scGCL: an imputation method for scRNA-seq data based on graph contrastive learning. Bioinformatics 39, btad098 (2023).","journal-title":"Bioinformatics"},{"key":"934_CR18","doi-asserted-by":"publisher","first-page":"btad099","DOI":"10.1093\/bioinformatics\/btad099","volume":"39","author":"X Yan","year":"2023","unstructured":"Yan, X., Zheng, R., Wu, F. & Li, M. CLAIRE: contrastive learning-based batch correction framework for better balance between batch mixing and preservation of cellular heterogeneity. Bioinformatics 39, btad099 (2023).","journal-title":"Bioinformatics"},{"key":"934_CR19","doi-asserted-by":"publisher","first-page":"792","DOI":"10.3390\/genes11070792","volume":"11","author":"L Chen","year":"2020","unstructured":"Chen, L., Zhai, Y., He, Q., Wang, W. & Deng, M. Integrating deep supervised, self-supervised and unsupervised learning for single-cell RNA-seq clustering and annotation. Genes 11, 792 (2020).","journal-title":"Genes"},{"key":"934_CR20","doi-asserted-by":"publisher","first-page":"1607","DOI":"10.1093\/bioinformatics\/btac007","volume":"38","author":"R Zhang","year":"2022","unstructured":"Zhang, R., Luo, Y., Ma, J., Zhang, M. & Wang, S. scPretrain: multi-task self-supervised learning for cell-type classification. Bioinformatics 38, 1607\u20131614 (2022).","journal-title":"Bioinformatics"},{"key":"934_CR21","doi-asserted-by":"publisher","DOI":"10.1016\/j.isci.2021.103200","volume":"24","author":"H Shen","year":"2021","unstructured":"Shen, H. et al. Miscell: an efficient self-supervised learning approach for dissecting single-cell transcriptome. 
iScience 24, 103200 (2021).","journal-title":"iScience"},{"key":"934_CR22","doi-asserted-by":"publisher","first-page":"1575","DOI":"10.1093\/bioinformatics\/btac011","volume":"38","author":"H Wan","year":"2022","unstructured":"Wan, H., Chen, L. & Deng, M. scNAME: neighborhood contrastive clustering with ancillary mask estimation for scRNA-seq data. Bioinformatics 38, 1575\u20131583 (2022).","journal-title":"Bioinformatics"},{"key":"934_CR23","doi-asserted-by":"publisher","first-page":"280","DOI":"10.1186\/s12859-021-04210-8","volume":"22","author":"M Ciortan","year":"2021","unstructured":"Ciortan, M. & Defrance, M. Contrastive self-supervised clustering of scRNA-seq data. BMC Bioinform. 22, 280 (2021).","journal-title":"BMC Bioinform."},{"key":"934_CR24","doi-asserted-by":"publisher","first-page":"bbac377","DOI":"10.1093\/bib\/bbac377","volume":"23","author":"W Han","year":"2022","unstructured":"Han, W. et al. Self-supervised contrastive learning for integrative single cell RNA-seq data analysis. Brief. Bioinform. 23, bbac377 (2022).","journal-title":"Brief. Bioinform."},{"key":"934_CR25","doi-asserted-by":"publisher","first-page":"2233","DOI":"10.1109\/TCBB.2023.3241129","volume":"20","author":"L Du","year":"2023","unstructured":"Du, L., Han, R., Liu, B., Wang, Y. & Li, J. ScCCL: single-cell data clustering based on self-supervised contrastive learning. IEEE\/ACM Trans. Comput. Biol. Bioinform. 20, 2233\u20132241 (2023).","journal-title":"IEEE\/ACM Trans. Comput. Biol. Bioinform."},{"key":"934_CR26","doi-asserted-by":"publisher","first-page":"3430","DOI":"10.1109\/TNSE.2024.3373652","volume":"11","author":"W Peng","year":"2024","unstructured":"Peng, W. et al. Multi-network graph contrastive learning for cancer driver gene identification. IEEE Trans. Netw. Sci. Eng. 11, 3430\u20133440 (2024).","journal-title":"IEEE Trans. Netw. Sci. 
Eng."},{"key":"934_CR27","doi-asserted-by":"publisher","DOI":"10.1186\/s13059-023-03072-y","volume":"24","author":"W Zhang","year":"2023","unstructured":"Zhang, W., Jiang, R., Chen, S. & Wang, Y. scIBD: a self-supervised iterative-optimizing model for boosting the detection of heterotypic doublets in single-cell chromatin accessibility data. Genome Biol. 24, 225 (2023).","journal-title":"Genome Biol."},{"key":"934_CR28","unstructured":"Vime: extending the success of self-and semi-supervised learning to tabular domain. https:\/\/proceedings.neurips.cc\/paper\/2020\/hash\/7d97667a3e056acab9aaf653807b4a03-Abstract.html"},{"key":"934_CR29","unstructured":"Lee, C. et al. Self-supervision enhanced feature selection with correlated gates. In Proc. 10th International Conference on Learning Representations https:\/\/openreview.net\/forum?id=oDFvtxzPOx (OpenReview.net, 2022)."},{"key":"934_CR30","doi-asserted-by":"publisher","DOI":"10.1038\/s41467-024-45198-y","volume":"15","author":"MJ Geuenich","year":"2024","unstructured":"Geuenich, M. J., Gong, D.-W. & Campbell, K. R. The impacts of active and self-supervised learning on efficient annotation of single-cell expression data. Nat. Commun. 15, 1014 (2024).","journal-title":"Nat. Commun."},{"key":"934_CR31","unstructured":"Richter, T. et al. SpatialSSL: whole-brain spatial transcriptomics in the mouse brain with self-supervised learning. (2023)."},{"key":"934_CR32","doi-asserted-by":"publisher","DOI":"10.1038\/s41467-023-35923-4","volume":"14","author":"J Chen","year":"2023","unstructured":"Chen, J. et al. Transformer for one stop interpretable cell type annotation. Nat. Commun. 14, 223 (2023).","journal-title":"Nat. Commun."},{"key":"934_CR33","doi-asserted-by":"crossref","unstructured":"Tang, W. et al. Single-cell multimodal prediction via transformers. In Proc. 
32nd ACM International Conference on Information and Knowledge Management 2422\u20132431 (CIKM, 2023).","DOI":"10.1145\/3583780.3615061"},{"key":"934_CR34","doi-asserted-by":"publisher","first-page":"616","DOI":"10.1038\/s41586-023-06139-9","volume":"618","author":"CV Theodoris","year":"2023","unstructured":"Theodoris, C. V. et al. Transfer learning enables predictions in network biology. Nature 618, 616\u2013624 (2023).","journal-title":"Nature"},{"key":"934_CR35","doi-asserted-by":"publisher","first-page":"1470","DOI":"10.1038\/s41592-024-02201-0","volume":"21","author":"H Cui","year":"2024","unstructured":"Cui, H. et al. scGPT: toward building a foundation model for single-cell multi-omics using generative AI. Nat. Methods 21, 1470\u20131480 (2024).","journal-title":"Nat. Methods"},{"key":"934_CR36","doi-asserted-by":"publisher","first-page":"852","DOI":"10.1038\/s42256-022-00534-z","volume":"4","author":"F Yang","year":"2022","unstructured":"Yang, F. et al. scBERT as a large-scale pretrained deep language model for cell type annotation of single-cell RNA-seq data. Nat. Mach. Intell. 4, 852\u2013866 (2022).","journal-title":"Nat. Mach. Intell."},{"key":"934_CR37","doi-asserted-by":"publisher","unstructured":"Schaar, A. C. et al. Nicheformer: a foundation model for single-cell and spatial omics. Preprint at bioRxiv https:\/\/doi.org\/10.1101\/2024.04.15.589472 (2024).","DOI":"10.1101\/2024.04.15.589472"},{"key":"934_CR38","doi-asserted-by":"publisher","first-page":"1053","DOI":"10.1038\/s41592-018-0229-2","volume":"15","author":"R Lopez","year":"2018","unstructured":"Lopez, R., Regier, J., Cole, M. B., Jordan, M. I. & Yosef, N. Deep generative modeling for single-cell transcriptomics. Nat. Methods 15, 1053\u20131058 (2018).","journal-title":"Nat. Methods"},{"key":"934_CR39","doi-asserted-by":"publisher","first-page":"e11517","DOI":"10.15252\/msb.202211517","volume":"19","author":"M Lotfollahi","year":"2023","unstructured":"Lotfollahi, M. et al. 
Predicting cellular responses to complex perturbations in high-throughput screens. Mol. Syst. Biol. 19, e11517 (2023).","journal-title":"Mol. Syst. Biol."},{"key":"934_CR40","unstructured":"Goldblum, M. et al. Battle of the backbones: a large-scale comparison of pretrained models across computer vision tasks. In Proc. 37th Conference on Neural Information Processing Systems, Datasets and Benchmarks Track https:\/\/openreview.net\/forum?id=1yOnfDpkVe (NeurIPS, 2023)."},{"key":"934_CR41","unstructured":"Smith, S. L., Brock, A., Berrada, L. & De, S. ConvNets match vision transformers at scale. Preprint at https:\/\/arxiv.org\/abs\/2310.19909 (2023)."},{"key":"934_CR42","unstructured":"Radford, A. et al. Robust speech recognition via large-scale weak supervision. In Proc. 40th International Conference on Machine Learning Vol. 202 (eds Krause, A. et al.) 28492\u201328518 (PMLR, 2023)."},{"key":"934_CR43","doi-asserted-by":"publisher","first-page":"1998","DOI":"10.1038\/s41588-023-01523-7","volume":"55","author":"E Dann","year":"2023","unstructured":"Dann, E. et al. Precise identification of cell states altered in disease using healthy single-cell references. Nat. Genet. 55, 1998\u20132008 (2023).","journal-title":"Nat. Genet."},{"key":"934_CR44","doi-asserted-by":"publisher","unstructured":"CZI Single-Cell Biology Program et al. CZ CELL\u00d7GENE Discover: a single-cell data platform for scalable exploration, analysis and modeling of aggregated data. Preprint at bioRxiv https:\/\/doi.org\/10.1101\/2023.10.30.563174 (2023).","DOI":"10.1101\/2023.10.30.563174"},{"key":"934_CR45","doi-asserted-by":"crossref","unstructured":"He, K. et al. Masked autoencoders are scalable vision learners. In 2022 IEEE\/CVF Conference on Computer Vision and Pattern Recognition 15979\u201315988 (IEEE, 2022).","DOI":"10.1109\/CVPR52688.2022.01553"},{"key":"934_CR46","unstructured":"Grill, J.-B. et al. Bootstrap your own latent\u2014a new approach to self-supervised learning. 
In Advances in Neural Information Processing Systems 21271\u201321284 (Curran Associates, 2020)."},{"key":"934_CR47","unstructured":"Zbontar, J., Jing, L., Misra, I., LeCun, Y. & Deny, S. Barlow twins: self-supervised learning via redundancy reduction. in Proc. 38th International Conference on Machine Learning 12310\u201312320 (PMLR, 2021)."},{"key":"934_CR48","doi-asserted-by":"publisher","first-page":"321","DOI":"10.1038\/s41586-021-04345-x","volume":"602","author":"M Yoshida","year":"2022","unstructured":"Yoshida, M. et al. Local and systemic responses to SARS-CoV-2 infection in children and adults. Nature 602, 321\u2013327 (2022).","journal-title":"Nature"},{"key":"934_CR49","doi-asserted-by":"publisher","first-page":"eabl4896","DOI":"10.1126\/science.abl4896","volume":"376","author":"Tabula Sapiens Consortium","year":"2022","unstructured":"Tabula Sapiens Consortium et al. The Tabula Sapiens: a multiple-organ, single-cell transcriptomic atlas of humans. Science 376, eabl4896 (2022).","journal-title":"Science"},{"key":"934_CR50","doi-asserted-by":"publisher","first-page":"733","DOI":"10.1126\/science.adf6162","volume":"381","author":"JS Fleck","year":"2023","unstructured":"Fleck, J. S., Camp, J. G. & Treutlein, B. What is a cell type? Science 381, 733\u2013734 (2023).","journal-title":"Science"},{"key":"934_CR51","doi-asserted-by":"publisher","unstructured":"Heimberg, G. et al. Scalable querying of human cell atlases via a foundational model reveals commonalities across fibrosis-associated macrophages. Preprint at bioRxiv https:\/\/doi.org\/10.1101\/2023.07.18.549537 (2023).","DOI":"10.1101\/2023.07.18.549537"},{"key":"934_CR52","doi-asserted-by":"publisher","DOI":"10.1126\/science.add7046","volume":"382","author":"K Siletti","year":"2023","unstructured":"Siletti, K. et al. Transcriptomic diversity of cell types across the adult human brain. 
Science 382, eadd7046 (2023).","journal-title":"Science"},{"key":"934_CR53","doi-asserted-by":"publisher","DOI":"10.1126\/science.adf0834","volume":"382","author":"D Velmeshev","year":"2023","unstructured":"Velmeshev, D. et al. Single-cell analysis of prenatal and postnatal human cortical development. Science 382, eadf0834 (2023).","journal-title":"Science"},{"key":"934_CR54","doi-asserted-by":"publisher","first-page":"108572","DOI":"10.1016\/j.isci.2023.108572","volume":"26","author":"E Ivanova","year":"2023","unstructured":"Ivanova, E. et al. mRNA COVID-19 vaccine elicits potent adaptive immune response without the acute inflammation of SARS-CoV-2 infection. iScience 26, 108572 (2023).","journal-title":"iScience"},{"key":"934_CR55","doi-asserted-by":"publisher","DOI":"10.1126\/science.ade9516","volume":"382","author":"NL Jorstad","year":"2023","unstructured":"Jorstad, N. L. et al. Comparative transcriptomics reveals human-specific cortical features. Science 382, eade9516 (2023).","journal-title":"Science"},{"key":"934_CR56","doi-asserted-by":"publisher","first-page":"550","DOI":"10.1038\/s41576-023-00586-w","volume":"24","author":"L Heumos","year":"2023","unstructured":"Heumos, L. et al. Best practices for single-cell analysis across modalities. Nat. Rev. Genet. 24, 550\u2013572 (2023).","journal-title":"Nat. Rev. Genet."},{"key":"934_CR57","unstructured":"Luecken, M. et al. A sandbox for prediction and integration of DNA, RNA, and proteins in single cells. In Proc. 35th Conference on Neural Information Processing Systems Track on Datasets and Benchmarks (eds Vanschoren, J. & Yeung, S.) https:\/\/datasets-benchmarks-proceedings.neurips.cc\/paper_files\/paper\/2021\/file\/158f3069a435b314a80bdcb024f8e422-Paper-round2.pdf (2021)."},{"key":"934_CR58","doi-asserted-by":"publisher","first-page":"865","DOI":"10.1038\/nmeth.4380","volume":"14","author":"M Stoeckius","year":"2017","unstructured":"Stoeckius, M. et al. 
Simultaneous epitope and transcriptome measurement in single cells. Nat. Methods 14, 865\u2013868 (2017).","journal-title":"Nat. Methods"},{"key":"934_CR59","doi-asserted-by":"publisher","first-page":"272","DOI":"10.1038\/s41592-020-01050-x","volume":"18","author":"A Gayoso","year":"2021","unstructured":"Gayoso, A. et al. Joint probabilistic modeling of single-cell multi-omic data with totalVI. Nat. Methods 18, 272\u2013282 (2021).","journal-title":"Nat. Methods"},{"key":"934_CR60","doi-asserted-by":"publisher","first-page":"619","DOI":"10.1038\/s41586-020-2922-4","volume":"587","author":"KJ Travaglini","year":"2020","unstructured":"Travaglini, K. J. et al. A molecular cell atlas of the human lung from single-cell RNA sequencing. Nature 587, 619\u2013625 (2020).","journal-title":"Nature"},{"key":"934_CR61","doi-asserted-by":"publisher","first-page":"e62522","DOI":"10.7554\/eLife.62522","volume":"9","author":"A Wang","year":"2020","unstructured":"Wang, A. et al. Single-cell multiomic profiling of human lungs reveals cell-type-specific and age-dynamic control of SARS-CoV2 host genes. eLife 9, e62522 (2020).","journal-title":"eLife"},{"key":"934_CR62","doi-asserted-by":"publisher","first-page":"114","DOI":"10.1038\/s41586-021-03569-1","volume":"595","author":"JC Melms","year":"2021","unstructured":"Melms, J. C. et al. A molecular single-cell lung atlas of lethal COVID-19. Nature 595, 114\u2013119 (2021).","journal-title":"Nature"},{"key":"934_CR63","doi-asserted-by":"publisher","first-page":"41","DOI":"10.1038\/s41592-021-01336-8","volume":"19","author":"MD Luecken","year":"2022","unstructured":"Luecken, M. D. et al. Benchmarking atlas-level data integration in single-cell genomics. Nat. Methods 19, 41\u201350 (2022).","journal-title":"Nat. Methods"},{"key":"934_CR64","doi-asserted-by":"publisher","DOI":"10.1186\/s13059-017-1382-0","volume":"19","author":"FA Wolf","year":"2018","unstructured":"Wolf, F. A., Angerer, P. & Theis, F. J. 
SCANPY: large-scale single-cell gene expression data analysis. Genome Biol 19, 15 (2018).","journal-title":"Genome Biol"},{"key":"934_CR65","unstructured":"von K\u00fcgelgen, J. et al. Self-supervised learning with data augmentations provably isolates content from style. In Advances in Neural Information Processing Systems 16451\u201316467 (Curran Associates, 2021)."},{"key":"934_CR66","unstructured":"Liu, H., et al. Self-supervised learning is more robust to dataset imbalance. In NeurIPS 2021 Workshop on Distribution Shifts: Connecting Methods and Applications https:\/\/openreview.net\/forum?id=vUz4JPRLpGx (2021)."},{"key":"934_CR67","unstructured":"Cao, S., Xu, P. & Clifton, D. A. How to understand masked autoencoders. Preprint at https:\/\/arxiv.org\/abs\/2202.03670 (2022)."},{"key":"934_CR68","doi-asserted-by":"publisher","first-page":"15545","DOI":"10.1073\/pnas.0506580102","volume":"102","author":"A Subramanian","year":"2005","unstructured":"Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545\u201315550 (2005).","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"934_CR69","doi-asserted-by":"publisher","first-page":"1739","DOI":"10.1093\/bioinformatics\/btr260","volume":"27","author":"A Liberzon","year":"2011","unstructured":"Liberzon, A. et al. Molecular signatures database (MSigDB) 3.0. Bioinformatics 27, 1739\u20131740 (2011).","journal-title":"Bioinformatics"},{"key":"934_CR70","doi-asserted-by":"publisher","first-page":"417","DOI":"10.1016\/j.cels.2015.12.004","volume":"1","author":"A Liberzon","year":"2015","unstructured":"Liberzon, A. et al. The Molecular Signatures Database (MSigDB) hallmark gene set collection. Cell Syst. 
1, 417\u2013425 (2015).","journal-title":"Cell Syst."},{"key":"934_CR71","doi-asserted-by":"publisher","first-page":"D104","DOI":"10.1093\/nar\/gkaa1057","volume":"49","author":"S Kolmykov","year":"2021","unstructured":"Kolmykov, S. et al. GTRD: an integrated view of transcription regulation. Nucleic Acids Res. 49, D104\u2013D111 (2021).","journal-title":"Nucleic Acids Res."},{"key":"934_CR72","doi-asserted-by":"publisher","first-page":"338","DOI":"10.1038\/nature03441","volume":"434","author":"X Xie","year":"2005","unstructured":"Xie, X. et al. Systematic discovery of regulatory motifs in human promoters and 3\u2032 UTRs by comparison of several mammals. Nature 434, 338\u2013345 (2005).","journal-title":"Nature"},{"key":"934_CR73","doi-asserted-by":"publisher","DOI":"10.1186\/s13059-021-02577-8","volume":"23","author":"D Bredikhin","year":"2022","unstructured":"Bredikhin, D., Kats, I. & Stegle, O. MUON: multimodal omics analysis framework. Genome Biol. 23, 42 (2022).","journal-title":"Genome Biol."},{"key":"934_CR74","first-page":"15753","volume":"15745","author":"X Chen","year":"2020","unstructured":"Chen, X. & He, K. Exploring simple Siamese representation learning. Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. 15745, 15753 (2020).","journal-title":"Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit."},{"key":"934_CR75","doi-asserted-by":"publisher","unstructured":"Richter, T. & Bahrami, M. Theislab\/ssl_in_scg: first release. 
Zenodo https:\/\/doi.org\/10.5281\/zenodo.13358873 (2024).","DOI":"10.5281\/zenodo.13358873"}],"container-title":["Nature Machine Intelligence"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.nature.com\/articles\/s42256-024-00934-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s42256-024-00934-3","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s42256-024-00934-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,7]],"date-time":"2025-10-07T18:20:07Z","timestamp":1759861207000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.nature.com\/articles\/s42256-024-00934-3"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,12,27]]},"references-count":75,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2025,1]]}},"alternative-id":["934"],"URL":"https:\/\/doi.org\/10.1038\/s42256-024-00934-3","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2024.02.16.580624","asserted-by":"object"}]},"ISSN":["2522-5839"],"issn-type":[{"value":"2522-5839","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,12,27]]},"assertion":[{"value":"16 February 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"22 October 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"27 December 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"F.J.T. consults for Immunai, CytoReason, Cellarity and Omniscope and has an ownership interest in Dermagnostix GmbH and Cellarity. 
The other authors declare no competing interests.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}]}}