{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,3]],"date-time":"2026-03-03T13:33:50Z","timestamp":1772544830103,"version":"3.50.1"},"reference-count":26,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2026,3,3]],"date-time":"2026-03-03T00:00:00Z","timestamp":1772496000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Bioinform."],"abstract":"<jats:sec>\n                    <jats:title>Introduction<\/jats:title>\n                    <jats:p>Single-cell RNA sequencing (scRNA-seq) enables high-throughput analysis of gene expression at single-cell resolution and plays a crucial role in studying cellular heterogeneity, tissue development, and disease mechanisms. However, scRNA-seq data are characterized by high dimensionality, sparsity, technical noise, and prevalent dropout events, which pose substantial challenges to conventional clustering approaches.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Methods<\/jats:title>\n                    <jats:p>To address these challenges, we propose scDMAC, a novel clustering framework for single-cell RNA sequencing data based on denoising and masking learning. The method integrates a zero-inflated negative binomial (ZINB)-based denoising autoencoder with a masking autoencoder. First, the ZINB-based autoencoder models count distribution and dropout events to denoise gene expression data. Subsequently, a tailored masking strategy is applied to the denoised data to learn gene-wise correlations through reconstruction.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>Extensive experiments conducted on multiple benchmark scRNA-seq datasets demonstrate that scDMAC achieves superior clustering accuracy and stability compared with state-of-the-art methods. The proposed framework consistently improves clustering performance across diverse datasets, highlighting its robustness to noise and sparsity.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Discussion<\/jats:title>\n                    <jats:p>By effectively combining probabilistic denoising with masking-based representation learning, scDMAC provides a powerful solution for addressing dropout and sparsity issues in scRNA-seq data. The improved clustering performance suggests that integrating distribution-aware denoising with feature reconstruction enhances the extraction of biologically meaningful representations, making scDMAC a promising tool for single-cell transcriptomic analysis.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.3389\/fbinf.2026.1758257","type":"journal-article","created":{"date-parts":[[2026,3,3]],"date-time":"2026-03-03T09:22:21Z","timestamp":1772529741000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["A clustering method for single-cell RNA sequencing data based on denoising and masking learning"],"prefix":"10.3389","volume":"6","author":[{"given":"Shuang","family":"Xu","sequence":"first","affiliation":[{"name":"Department of Anesthesiology, The Second Hospital of Jilin University","place":["Changchun, China"]}]},{"given":"Wen","family":"Yan","sequence":"additional","affiliation":[{"name":"Department of Anesthesiology, The Second Hospital of Jilin University","place":["Changchun, China"]}]},{"given":"Bin","family":"Zhang","sequence":"additional","affiliation":[{"name":"College of Computer Science and Technology, Jilin University","place":["Changchun, China"]}]},{"given":"Hong","family":"Qi","sequence":"additional","affiliation":[{"name":"College of Computer Science and Technology, Jilin University","place":["Changchun, China"]}]},{"given":"Kai","family":"Wang","sequence":"additional","affiliation":[{"name":"College of Computer Science and Technology, Jilin University","place":["Changchun, China"]}]}],"member":"1965","published-online":{"date-parts":[[2026,3,3]]},"reference":[{"key":"B1","doi-asserted-by":"publisher","first-page":"47","DOI":"10.1016\/j.coisb.2017.12.007","article-title":"Methods and challenges in the analysis of single-cell RNA-Sequencing data","volume":"7","author":"Camara","year":"2018","journal-title":"Curr. Opin. Syst. Biol."},{"key":"B2","doi-asserted-by":"publisher","first-page":"lqaa039","DOI":"10.1093\/nargab\/lqaa039","article-title":"Deep soft K-Means clustering with self-training for single-cell RNA sequence data","volume":"2","author":"Chen","year":"2020","journal-title":"NAR Genomics Bioinformatics"},{"key":"B3","doi-asserted-by":"publisher","first-page":"1037","DOI":"10.1093\/bioinformatics\/btab787","article-title":"GNN-based embedding for clustering scRNA-Seq data","volume":"38","author":"Ciortan","year":"2022","journal-title":"Bioinformatics"},{"key":"B4","doi-asserted-by":"publisher","first-page":"13","DOI":"10.1186\/s13059-016-0881-8","article-title":"A survey of best practices for RNA-Seq data analysis","volume":"17","author":"Conesa","year":"2016","journal-title":"Genome Biology"},{"key":"B5","doi-asserted-by":"publisher","first-page":"390","DOI":"10.1038\/s41467-018-07931-2","article-title":"Single-cell RNA-Seq denoising using a deep count autoencoder","volume":"10","author":"Eraslan","year":"2019","journal-title":"Nat. Communications"},{"key":"B6","doi-asserted-by":"publisher","first-page":"bbab531","DOI":"10.1093\/bib\/bbab531","article-title":"Deep learning tackles single-Cell Analysis\u2014A survey of deep learning for scRNA-Seq analysis","volume":"23","author":"Flores","year":"2022","journal-title":"Briefings Bioinformatics"},{"key":"B7","doi-asserted-by":"publisher","first-page":"1492752","DOI":"10.3389\/fgene.2024.1492752","article-title":"AI-Enabled pipeline for virus detection, validation, and SNP discovery from next-generation sequencing data","volume":"15","author":"Ghorbani","year":"2024","journal-title":"Front. Genet."},{"key":"B8","doi-asserted-by":"publisher","first-page":"220","DOI":"10.1186\/s12859-018-2226-y","article-title":"DrImpute: imputing dropout events in single cell RNA sequencing data","volume":"19","author":"Gong","year":"2018","journal-title":"BMC Bioinformatics"},{"key":"B9","doi-asserted-by":"publisher","first-page":"75","DOI":"10.1186\/s13073-017-0467-4","article-title":"A practical guide to single-cell RNA-sequencing for biomedical research and clinical applications","volume":"9","author":"Haque","year":"2017","journal-title":"Genome Medicine"},{"key":"B10","doi-asserted-by":"publisher","first-page":"100","DOI":"10.2307\/2346830","article-title":"Algorithm as 136: a K-Means clustering Algorithm","volume":"28","author":"Hartigan","year":"1979","journal-title":"J. Royal Statistical Society. Series C Appl. Statistics"},{"key":"B11","doi-asserted-by":"publisher","first-page":"1584334","DOI":"10.3389\/fgene.2025.1584334","article-title":"Identification of three T cell-related genes as diagnostic and prognostic biomarkers for triple-negative breast cancer and exploration of potential mechanisms","volume":"16","author":"He","year":"2025","journal-title":"Front. Genet."},{"key":"B12","doi-asserted-by":"publisher","first-page":"166","DOI":"10.1186\/s13045-020-01005-x","article-title":"RNA sequencing: new technologies and applications in cancer research","volume":"13","author":"Hong","year":"2020","journal-title":"J. Hematology & Oncology"},{"key":"B13","doi-asserted-by":"publisher","first-page":"1208","DOI":"10.1038\/s41588-020-00726-6","article-title":"Pan-Cancer single-cell RNA-Seq identifies recurring programs of cellular heterogeneity","volume":"52","author":"Kinker","year":"2020","journal-title":"Nat. Genetics"},{"key":"B14","doi-asserted-by":"publisher","first-page":"1901","DOI":"10.1038\/s41467-022-29576-y","article-title":"A universal deep neural network for in-Depth cleaning of single-cell RNA-seq data","volume":"13","author":"Li","year":"2022","journal-title":"Nat. Commun."},{"key":"B15","doi-asserted-by":"publisher","first-page":"1053","DOI":"10.1038\/s41592-018-0229-2","article-title":"Deep generative modeling for single-cell transcriptomics","volume":"15","author":"Lopez","year":"2018","journal-title":"Nat. Methods"},{"key":"B16","doi-asserted-by":"publisher","first-page":"1196","DOI":"10.1093\/bib\/bbz062","article-title":"Clustering and classification methods for single-cell RNA-sequencing data","volume":"21","author":"Qi","year":"2020","journal-title":"Briefings Bioinformatics"},{"key":"B17","doi-asserted-by":"publisher","first-page":"186","DOI":"10.1186\/s12859-021-04028-4","article-title":"scConsensus: combining supervised and unsupervised clustering for cell type identification in single-cell RNA sequencing data","volume":"22","author":"Ranjan","year":"2021","journal-title":"BMC Bioinformatics"},{"key":"B18","doi-asserted-by":"publisher","first-page":"133","DOI":"10.1038\/nrg3833","article-title":"Computational and analytical challenges in single-cell transcriptomics","volume":"16","author":"Stegle","year":"2015","journal-title":"Nat. Rev. Genet."},{"key":"B19","doi-asserted-by":"publisher","first-page":"68","DOI":"10.1186\/s40779-022-00434-8","article-title":"Data analysis guidelines for single-cell RNA-Seq in biomedical studies and clinical applications","volume":"9","author":"Su","year":"2022","journal-title":"Mil. Med. Res."},{"key":"B20","doi-asserted-by":"publisher","first-page":"191","DOI":"10.1038\/s42256-019-0037-0","article-title":"Clustering single-cell RNA-Seq data with a model-based deep learning approach","volume":"1","author":"Tian","year":"2019","journal-title":"Nat. Mach. Intell."},{"key":"B21","doi-asserted-by":"publisher","first-page":"e100041","DOI":"10.18547\/gcb.2018.vol4.iss2.e100041","article-title":"Principal components analysis: theory and application to gene expression data analysis","volume":"4","author":"Todorov","year":"2018","journal-title":"Genomics Comput. Biol."},{"key":"B22","doi-asserted-by":"publisher","first-page":"bbac625","DOI":"10.1093\/bib\/bbac625","article-title":"scDCCA: deep contrastive clustering for single-cell RNA-Seq data based on auto-encoder network","volume":"24","author":"Wang","year":"2023","journal-title":"Briefings Bioinforma."},{"key":"B23","doi-asserted-by":"publisher","first-page":"btae130","DOI":"10.1093\/bioinformatics\/btae130","article-title":"CTEC: a cross-tabulation ensemble clustering approach for single-cell RNA sequencing data analysis","volume":"40","author":"Wang","year":"2024","journal-title":"Bioinformatics"},{"key":"B24","doi-asserted-by":"publisher","first-page":"41","DOI":"10.1038\/nmeth.2694","article-title":"Quantitative assessment of single-cell RNA-Sequencing methods","volume":"11","author":"Wu","year":"2014","journal-title":"Nat. Methods"},{"key":"B25","doi-asserted-by":"publisher","first-page":"i368","DOI":"10.1093\/bioinformatics\/btad216","article-title":"CellBRF: a feature selection method for single-cell clustering using cell balance and random Forest","volume":"39","author":"Xu","year":"2023","journal-title":"Bioinformatics"},{"key":"B26","doi-asserted-by":"publisher","first-page":"665843","DOI":"10.3389\/fgene.2021.665843","article-title":"RFCell: a gene selection approach for scRNA-Seq clustering based on permutation and random forest","volume":"12","author":"Zhao","year":"2021","journal-title":"Front. Genetics"}],"container-title":["Frontiers in Bioinformatics"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fbinf.2026.1758257\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,3]],"date-time":"2026-03-03T09:22:21Z","timestamp":1772529741000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fbinf.2026.1758257\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,3,3]]},"references-count":26,"alternative-id":["10.3389\/fbinf.2026.1758257"],"URL":"https:\/\/doi.org\/10.3389\/fbinf.2026.1758257","relation":{},"ISSN":["2673-7647"],"issn-type":[{"value":"2673-7647","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,3,3]]},"article-number":"1758257"}}