{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,29]],"date-time":"2026-03-29T12:48:12Z","timestamp":1774788492900,"version":"3.50.1"},"reference-count":47,"publisher":"Oxford University Press (OUP)","issue":"3","license":[{"start":{"date-parts":[[2026,2,18]],"date-time":"2026-02-18T00:00:00Z","timestamp":1771372800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Xinjiang Uygur Autonomous Region Key R & D program","award":["2024B03039-1"],"award-info":[{"award-number":["2024B03039-1"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2026,2,28]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Single-cell heterogeneity analysis faces significant challenges due to the high dimensionality, complexity, and noise inherent in scRNA-seq data, especially when aiming for precise cell type classification. Existing analytical methods often exhibit limited generalization ability and adaptability across different biological contexts, leading to biased identification of cell subpopulations and hindering a comprehensive understanding of diseases, therapeutic responses, and biological processes.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>To address these issues, we propose a novel method named scKD, which integrates a hybrid neighbourhood-enhanced contrastive learning model with a self-knowledge distillation strategy. scKD enhances clustering accuracy and is capable of accurately identifying both major cell types and rare cell subtypes. 
Extensive evaluations on multiple real-world datasets demonstrate that scKD achieves superior performance in subpopulation identification, clustering stability, and robustness. These results suggest that scKD is a powerful and reliable tool for analyzing single-cell transcriptomic data, facilitating deeper insights into cellular heterogeneity.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability<\/jats:title>\n                    <jats:p>All datasets used in this study are publicly available. Detailed information about all the single-cell datasets analyzed in this paper is provided in Supplementary Table 1. All datasets can be accessed at https:\/\/zenodo.org\/records\/15412380. The source code is available at https:\/\/github.com\/A-qlh\/sckd.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btag084","type":"journal-article","created":{"date-parts":[[2026,2,17]],"date-time":"2026-02-17T12:28:59Z","timestamp":1771331339000},"source":"Crossref","is-referenced-by-count":0,"title":["A hybrid neighborhood enhanced contrastive learning and self-knowledge distillation method for scRNA-seq data clustering analysis"],"prefix":"10.1093","volume":"42","author":[{"ORCID":"https:\/\/orcid.org\/0009-0000-0744-1608","authenticated-orcid":false,"given":"Lihua","family":"Qi","sequence":"first","affiliation":[{"name":"School of Computer Science and Technology, Xinjiang University , Urumqi, 830046,","place":["China"]}]},{"given":"Peng","family":"Wang","sequence":"additional","affiliation":[{"name":"Department of Dermatology and Venereology, People\u2019s Hospital of Xinjiang Uygur Autonomous Region , Urumqi, Xinjiang 830000,","place":["China"]},{"name":"Xinjiang 
Clinical Research Center for Dermatologic Diseases , Urumqi, Xinjiang 830000,","place":["China"]},{"name":"Xinjiang Key Laboratory of Dermatology Research (XJYS1707) , Urumqi, Xinjiang 830000,","place":["China"]}]},{"given":"Hao","family":"Liu","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Xinjiang University , Urumqi, 830046,","place":["China"]}]},{"given":"Chen","family":"Chen","sequence":"additional","affiliation":[{"name":"School of Software, Xinjiang University , Urumqi, 830046,","place":["China"]},{"name":"Key Laboratory of signal detection and processing, Xinjiang University , Urumqi, 830046,","place":["China"]},{"name":"Xinjiang Cloud Computing Application Laboratory , Karamay, 834099,","place":["China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3968-8036","authenticated-orcid":false,"given":"Jin","family":"Gu","sequence":"additional","affiliation":[{"name":"MOE Key Laboratory of Bioinformatics, BNRIST Bioinformatics Division, Institute for Precision Medicine & Department of Automation, Tsinghua University , Beijing, 100084,","place":["China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6739-1937","authenticated-orcid":false,"given":"Cheng","family":"Chen","sequence":"additional","affiliation":[{"name":"School of Software, Xinjiang University , Urumqi, 830046,","place":["China"]},{"name":"Xinjiang Key Laboratory of Cardiovascular Homeostasis and Regeneration Research , Urumqi, 830000,","place":["China"]}]}],"member":"286","published-online":{"date-parts":[[2026,3,29]]},"reference":[{"key":"2026032908065788300_btag084-B1","doi-asserted-by":"crossref","first-page":"346","DOI":"10.1016\/j.cels.2016.08.011","article-title":"A single-cell transcriptomic map of the human and mouse pancreas reveals inter- and intra-cell population structure","volume":"3","author":"Baron","year":"2016","journal-title":"Cell 
Syst"},{"key":"2026032908065788300_btag084-B2","doi-asserted-by":"crossref","first-page":"775","DOI":"10.1093\/bioinformatics\/btaa908","article-title":"Single-cell RNA-seq data semi-supervised clustering and annotation via structural regularized domain adaptation","volume":"37","author":"Chen","year":"2021","journal-title":"Bioinformatics"},{"key":"2026032908065788300_btag084-B3","doi-asserted-by":"crossref","first-page":"lqaa039","DOI":"10.1093\/nargab\/lqaa039","article-title":"Deep soft K-means clustering with self-training for single-cell RNA sequence data","volume":"2","author":"Chen","year":"2020","journal-title":"NAR Genomics Bioinf"},{"key":"2026032908065788300_btag084-B4","doi-asserted-by":"crossref","first-page":"295","DOI":"10.3389\/fgene.2020.00295","article-title":"Single-cell transcriptome data clustering via multinomial modeling and adaptive fuzzy k-means algorithm","volume":"11","author":"Chen","year":"2020","journal-title":"Front Genet"},{"key":"2026032908065788300_btag084-B5","doi-asserted-by":"crossref","first-page":"792","DOI":"10.3390\/genes11070792","article-title":"Integrating deep supervised, self-supervised and unsupervised learning for single-cell RNA-seq clustering and annotation","volume":"11","author":"Chen","year":"2020","journal-title":"Genes (Basel)"},{"key":"2026032908065788300_btag084-B6","volume-title":"Introduction to Modern Information Retrieval","author":"Chowdhury","year":"2010"},{"key":"2026032908065788300_btag084-B7","doi-asserted-by":"crossref","first-page":"280","DOI":"10.1186\/s12859-021-04210-8","article-title":"Contrastive self-supervised clustering of scRNA-seq data","volume":"22","author":"Ciortan","year":"2021","journal-title":"BMC Bioinform"},{"key":"2026032908065788300_btag084-B8","doi-asserted-by":"crossref","first-page":"012038","DOI":"10.1088\/1757-899X\/1088\/1\/012038","article-title":"Competency test clustering through the application of principal component analysis (PCA) and the K-Means 
algorithm","volume":"1088","author":"Dana","year":"2021","journal-title":"IOP Conf Ser: Mater Sci Eng"},{"key":"2026032908065788300_btag084-B9","doi-asserted-by":"crossref","first-page":"4197","DOI":"10.1038\/s41467-021-24489-8","article-title":"GapClust is a light-weight approach distinguishing rare cells from voluminous single cell expression profiles","volume":"12","author":"Fa","year":"2021","journal-title":"Nat Commun"},{"key":"2026032908065788300_btag084-B10","doi-asserted-by":"crossref","first-page":"bbad500","DOI":"10.1093\/bib\/bbad500","article-title":"stAA: adversarial graph autoencoder for spatial clustering task of spatially resolved transcriptomics","volume":"25","author":"Fang","year":"2024","journal-title":"Brief Bioinform"},{"key":"2026032908065788300_btag084-B11","doi-asserted-by":"crossref","first-page":"533","DOI":"10.1681\/ASN.2018090896","article-title":"Single-cell RNA profiling of glomerular cells shows dynamic changes in experimental diabetic kidney disease","volume":"30","author":"Fu","year":"2019","journal-title":"J Am Soc Nephrol"},{"key":"2026032908065788300_btag084-B12","author":"Gao","year":"2021"},{"key":"2026032908065788300_btag084-B13","doi-asserted-by":"crossref","first-page":"bbac377","DOI":"10.1093\/bib\/bbac377","article-title":"Self-supervised contrastive learning for integrative single cell RNA-seq data analysis","volume":"23","author":"Han","year":"2022","journal-title":"Brief Bioinform"},{"key":"2026032908065788300_btag084-B14","author":"He","year":"2020"},{"key":"2026032908065788300_btag084-B15","doi-asserted-by":"crossref","first-page":"339","DOI":"10.1146\/annurev-biodatasci-012220-100601","article-title":"Computational methods for single-cell RNA sequencing","volume":"3","author":"Hie","year":"2020","journal-title":"Annu Rev Biomed Data Sci"},{"key":"2026032908065788300_btag084-B16","doi-asserted-by":"crossref","first-page":"7509","DOI":"10.1109\/TPAMI.2022.3216454","article-title":"Learning representation for 
clustering via prototype scattering and positive sampling","volume":"45","author":"Huang","year":"2023","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"2026032908065788300_btag084-B17","doi-asserted-by":"crossref","first-page":"723","DOI":"10.1038\/s41592-021-01171-x","article-title":"The triumphs and limitations of computational methods for scRNA-seq","volume":"18","author":"Kharchenko","year":"2021","journal-title":"Nat Methods"},{"key":"2026032908065788300_btag084-B18","author":"Lazarenko","year":"2021"},{"key":"2026032908065788300_btag084-B19","doi-asserted-by":"crossref","first-page":"btad342","DOI":"10.1093\/bioinformatics\/btad342","article-title":"Deep single-cell RNA-seq data clustering with graph prototypical contrastive learning","volume":"39","author":"Lee","year":"2023","journal-title":"Bioinformatics"},{"key":"2026032908065788300_btag084-B20","doi-asserted-by":"crossref","first-page":"bbaf138","DOI":"10.1093\/bib\/bbaf138","article-title":"scMUG: deep clustering analysis of single-cell RNA-seq data on multiple gene functional modules","volume":"26","author":"Liang","year":"2025","journal-title":"Brief Bioinform"},{"key":"2026032908065788300_btag084-B21","doi-asserted-by":"crossref","first-page":"7705","DOI":"10.1038\/s41467-022-35031-9","article-title":"Clustering of single-cell multi-omics data with a multimodal deep learning method","volume":"13","author":"Lin","year":"2022","journal-title":"Nat Commun"},{"key":"2026032908065788300_btag084-B22","doi-asserted-by":"crossref","first-page":"1095","DOI":"10.1016\/j.ccell.2022.09.012","article-title":"Artificial intelligence for multimodal data integration in oncology","volume":"40","author":"Lipkova","year":"2022","journal-title":"Cancer Cell"},{"key":"2026032908065788300_btag084-B23","doi-asserted-by":"crossref","first-page":"bbad475","DOI":"10.1093\/bib\/bbad475","article-title":"CAKE: a flexible self-supervised framework for enhancing cell visualization, clustering and rare cell 
identification","volume":"25","author":"Liu","year":"2024","journal-title":"Brief Bioinform"},{"key":"2026032908065788300_btag084-B24","doi-asserted-by":"crossref","first-page":"MSB188746","DOI":"10.15252\/msb.20188746","article-title":"Current best practices in single-cell RNA-seq analysis: a tutorial","volume":"15","author":"Luecken","year":"2019","journal-title":"Mol Syst Biol"},{"key":"2026032908065788300_btag084-B25","doi-asserted-by":"crossref","first-page":"964","DOI":"10.1038\/s41467-023-36559-0","article-title":"Single-cell biological network inference using a heterogeneous graph transformer","volume":"14","author":"Ma","year":"2023","journal-title":"Nat Commun"},{"key":"2026032908065788300_btag084-B26","doi-asserted-by":"crossref","first-page":"9021","DOI":"10.1038\/s41598-024-59073-9","article-title":"Proof of biased behavior of normalized mutual information","volume":"14","author":"Mahmoudi","year":"2024","journal-title":"Sci Rep"},{"key":"2026032908065788300_btag084-B27","doi-asserted-by":"crossref","first-page":"385","DOI":"10.1016\/j.cels.2016.09.002","article-title":"A single-cell transcriptome atlas of the human pancreas","volume":"3","author":"Muraro","year":"2016","journal-title":"Cell Syst"},{"key":"2026032908065788300_btag084-B28","doi-asserted-by":"crossref","first-page":"1592","DOI":"10.1093\/bib\/bbab016","article-title":"Unsupervised and self-supervised deep learning approaches for biomedical text mining","volume":"22","author":"Nadif","year":"2021","journal-title":"Brief Bioinform"},{"key":"2026032908065788300_btag084-B29","first-page":"327","volume-title":"Advances in Information Communication Technology and Computing: Proceedings of AICTC 2019","author":"Pareek","year":"2020"},{"key":"2026032908065788300_btag084-B30","first-page":"1914","volume-title":"The thirty sixth annual conference on learning 
theory","author":"Parulekar","year":"2023"},{"key":"2026032908065788300_btag084-B31","first-page":"1","author":"Sahay","year":"2019"},{"key":"2026032908065788300_btag084-B32","first-page":"155","author":"Singh","year":"2020"},{"key":"2026032908065788300_btag084-B33","doi-asserted-by":"crossref","first-page":"1161","DOI":"10.3390\/cells8101161","article-title":"An efficient and flexible method for deconvoluting bulk RNA-seq data with single-cell RNA-seq data","volume":"8","author":"Sun","year":"2019","journal-title":"Cells"},{"key":"2026032908065788300_btag084-B34","doi-asserted-by":"crossref","first-page":"327","DOI":"10.1007\/s00180-022-01230-7","article-title":"Adjusting the adjusted rand index: a multinomial story","volume":"38","author":"Sundqvist","year":"2023","journal-title":"Comput Stat"},{"key":"2026032908065788300_btag084-B35","doi-asserted-by":"crossref","first-page":"191","DOI":"10.1038\/s42256-019-0037-0","article-title":"Clustering single-cell RNA-seq data with a model-based deep learning approach","volume":"1","author":"Tian","year":"2019","journal-title":"Nat Mach Intell"},{"key":"2026032908065788300_btag084-B36","doi-asserted-by":"crossref","first-page":"295","DOI":"10.1186\/s13059-019-1861-6","article-title":"Feature selection and dimension reduction for single-cell RNA-Seq based on a multinomial model","volume":"20","author":"Townes","year":"2019","journal-title":"Genome Biol"},{"key":"2026032908065788300_btag084-B37","doi-asserted-by":"crossref","first-page":"1575","DOI":"10.1093\/bioinformatics\/btac011","article-title":"scNAME: neighborhood contrastive clustering with ancillary mask estimation for scRNA-seq data","volume":"38","author":"Wan","year":"2022","journal-title":"Bioinformatics"},{"key":"2026032908065788300_btag084-B38","doi-asserted-by":"crossref","first-page":"589","DOI":"10.1016\/j.csbj.2023.12.043","article-title":"scSID: a lightweight algorithm for identifying rare cell types by capturing differential expression from single-cell 
sequencing data","volume":"23","author":"Wang","year":"2024","journal-title":"Comput Struct Biotechnol J"},{"key":"2026032908065788300_btag084-B39","first-page":"9929","author":"Wang","year":"2020"},{"key":"2026032908065788300_btag084-B40","first-page":"3","article-title":"Graph- and tree-based indexes for high-dimensional vector similarity search: analyses, comparisons, and future directions","volume":"47","author":"Wang","year":"2023","journal-title":"IEEE Data Eng Bull"},{"key":"2026032908065788300_btag084-B41","doi-asserted-by":"crossref","first-page":"15","DOI":"10.1186\/s13059-017-1382-0","article-title":"SCANPY: large-scale single-cell gene expression data analysis","volume":"19","author":"Wolf","year":"2018","journal-title":"Genome Biol"},{"key":"2026032908065788300_btag084-B42","doi-asserted-by":"crossref","first-page":"lqaa082","DOI":"10.1093\/nargab\/lqaa082","article-title":"scAIDE: clustering of large-scale single-cell RNA-seq data reveals putative and rare cell types","volume":"2","author":"Xie","year":"2020","journal-title":"NAR Genomics Bioinf"},{"key":"2026032908065788300_btag084-B43","doi-asserted-by":"crossref","first-page":"e9620","DOI":"10.15252\/msb.20209620","article-title":"Probabilistic harmonization and annotation of single-cell transcriptomics data with deep generative models","volume":"17","author":"Xu","year":"2021","journal-title":"Mol Syst Biol"},{"key":"2026032908065788300_btag084-B44","first-page":"17185","author":"Yang","year":"2023"},{"key":"2026032908065788300_btag084-B45","first-page":"11033","article-title":"VIME: extending the success of self- and semi-supervised learning to tabular domain","volume":"33","author":"Yoon","year":"2020","journal-title":"Adv Neural Inf Process Syst"},{"key":"2026032908065788300_btag084-B46","doi-asserted-by":"crossref","first-page":"bbae112","DOI":"10.1093\/bib\/bbae112","article-title":"scNovel: a scalable deep learning-based network for novel rare cell discovery in single-cell 
transcriptomics","volume":"25","author":"Zheng","year":"2024","journal-title":"Brief Bioinform"},{"key":"2026032908065788300_btag084-B47","doi-asserted-by":"crossref","first-page":"997","DOI":"10.26599\/BDMA.2025.9020009","article-title":"A flexible data-driven framework for correcting coarsely annotated scRNA-seq data","volume":"8","author":"Zheng","year":"2025","journal-title":"Big Data Min Anal"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btag084\/66981734\/btag084.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/42\/3\/btag084\/66981734\/btag084.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/42\/3\/btag084\/66981734\/btag084.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,29]],"date-time":"2026-03-29T12:07:16Z","timestamp":1774786036000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btag084\/8489905"}},"subtitle":[],"editor":[{"given":"Laura","family":"Cantini","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2026,2,28]]},"references-count":47,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2026,2,28]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btag084","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2026,3]]},"published":{"date-parts":[[2026,2,28]]},"article-number":"btag084"}}