{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T21:46:37Z","timestamp":1774993597172,"version":"3.50.1"},"reference-count":56,"publisher":"Oxford University Press (OUP)","issue":"1","license":[{"start":{"date-parts":[[2024,1,17]],"date-time":"2024-01-17T00:00:00Z","timestamp":1705449600000},"content-version":"vor","delay-in-days":16,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62225209"],"award-info":[{"award-number":["62225209"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Hunan Provincial Science and Technology Program"},{"name":"Central Universities of Central South University","award":["CX20220276"],"award-info":[{"award-number":["CX20220276"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,1,2]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Single-cell RNA sequencing has emerged as a powerful technology for studying gene expression at the individual cell level. Clustering individual cells into distinct subpopulations is fundamental in scRNA-seq data analysis, facilitating the identification of cell types and exploration of cellular heterogeneity. Despite the recent development of many deep learning-based single-cell clustering methods, few have effectively exploited the correlations among genes, resulting in suboptimal clustering outcomes.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>Here, we propose a novel masked autoencoder-based method, scMAE, for cell clustering. scMAE perturbs gene expression and employs a masked autoencoder to reconstruct the original data, learning robust and informative cell representations. The masked autoencoder introduces a masking predictor, which captures relationships among genes by predicting whether gene expression values are masked. By integrating this masking mechanism, scMAE effectively captures latent structures and dependencies in the data, enhancing clustering performance. We conducted extensive comparative experiments using various clustering evaluation metrics on 15 scRNA-seq datasets from different sequencing platforms. Experimental results indicate that scMAE outperforms other state-of-the-art methods on these datasets. In addition, scMAE accurately identifies rare cell types, which are challenging to detect due to their low abundance. Furthermore, biological analyses confirm the biological significance of the identified cell subpopulations.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>The source code of scMAE is available at: https:\/\/zenodo.org\/records\/10465991.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btae020","type":"journal-article","created":{"date-parts":[[2024,1,17]],"date-time":"2024-01-17T12:34:06Z","timestamp":1705494846000},"source":"Crossref","is-referenced-by-count":43,"title":["scMAE: a masked autoencoder for single-cell RNA-seq clustering"],"prefix":"10.1093","volume":"40","author":[{"given":"Zhaoyu","family":"Fang","sequence":"first","affiliation":[{"name":"School of Computer Science and Engineering, Central South University, 932 South Lushan Road, Yuelu District , Changsha 410083, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6372-6798","authenticated-orcid":false,"given":"Ruiqing","family":"Zheng","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, Central South University, 932 South Lushan Road, Yuelu District , Changsha 410083, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0188-1394","authenticated-orcid":false,"given":"Min","family":"Li","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, Central South University, 932 South Lushan Road, Yuelu District , Changsha 410083, China"}]}],"member":"286","published-online":{"date-parts":[[2024,1,16]]},"reference":[{"key":"2024020112525828800_btae020-B1","doi-asserted-by":"crossref","first-page":"2128","DOI":"10.1038\/s41467-017-02001-5","article-title":"Differentiation dynamics of mammary epithelial cells revealed by single-cell RNA sequencing","volume":"8","author":"Bach","year":"2017","journal-title":"Nat Commun"},{"key":"2024020112525828800_btae020-B2","doi-asserted-by":"crossref","first-page":"346","DOI":"10.1016\/j.cels.2016.08.011","article-title":"A single-cell transcriptomic map of the human and mouse pancreas reveals inter-and intra-cell population structure","volume":"3","author":"Baron","year":"2016","journal-title":"Cell Syst"},{"key":"2024020112525828800_btae020-B3","doi-asserted-by":"crossref","first-page":"P10008","DOI":"10.1088\/1742-5468\/2008\/10\/P10008","article-title":"Fast unfolding of communities in large networks","volume":"2008","author":"Blondel","year":"2008","journal-title":"J Stat Mech"},{"key":"2024020112525828800_btae020-B4","doi-asserted-by":"crossref","first-page":"e12242","DOI":"10.7554\/eLife.12242","article-title":"Rhodopsin targeted transcriptional silencing by DNA-binding","volume":"5","author":"Botta","year":"2016","journal-title":"eLife"},{"key":"2024020112525828800_btae020-B5","doi-asserted-by":"crossref","first-page":"155","DOI":"10.1038\/nbt.3102","article-title":"Computational analysis of cell-to-cell heterogeneity in single-cell RNA-sequencing data reveals hidden subpopulations of cells","volume":"33","author":"Buettner","year":"2015","journal-title":"Nat Biotechnol"},{"key":"2024020112525828800_btae020-B6","doi-asserted-by":"crossref","first-page":"661","DOI":"10.1126\/science.aam8940","article-title":"Comprehensive single-cell transcriptional profiling of a multicellular organism","volume":"357","author":"Cao","year":"2017","journal-title":"Science"},{"key":"2024020112525828800_btae020-B7","doi-asserted-by":"crossref","first-page":"lqaa039","DOI":"10.1093\/nargab\/lqaa039","article-title":"Deep soft K-means clustering with self-training for single-cell RNA sequence data","volume":"2","author":"Chen","year":"2020","journal-title":"NAR Genom Bioinform"},{"key":"2024020112525828800_btae020-B8","first-page":"1597","author":"Chen","year":"2020"},{"key":"2024020112525828800_btae020-B9","doi-asserted-by":"crossref","first-page":"280","DOI":"10.1186\/s12859-021-04210-8","article-title":"Contrastive self-supervised clustering of scRNA-seq data","volume":"22","author":"Ciortan","year":"2021","journal-title":"BMC Bioinformatics"},{"key":"2024020112525828800_btae020-B10","doi-asserted-by":"crossref","first-page":"1037","DOI":"10.1093\/bioinformatics\/btab787","article-title":"GNN-based embedding for clustering scRNA-seq data","volume":"38","author":"Ciortan","year":"2022","journal-title":"Bioinformatics"},{"key":"2024020112525828800_btae020-B11","doi-asserted-by":"crossref","first-page":"224","DOI":"10.1109\/TPAMI.1979.4766909","article-title":"A cluster separation measure","volume":"1","author":"Davies","year":"1979","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"2024020112525828800_btae020-B12","author":"Devlin"},{"key":"2024020112525828800_btae020-B13","doi-asserted-by":"crossref","first-page":"390","DOI":"10.1038\/s41467-018-07931-2","article-title":"Single-cell RNA-seq denoising using a deep count autoencoder","volume":"10","author":"Eraslan","year":"2019","journal-title":"Nat Commun"},{"key":"2024020112525828800_btae020-B14","doi-asserted-by":"crossref","first-page":"btac757","DOI":"10.1093\/bioinformatics\/btac757","article-title":"GSEApy: a comprehensive package for performing gene set enrichment analysis in python","volume":"39","author":"Fang","year":"2023","journal-title":"Bioinformatics"},{"key":"2024020112525828800_btae020-B15","doi-asserted-by":"crossref","first-page":"1141","DOI":"10.1038\/s41422-018-0099-2","article-title":"The adult human testis transcriptional cell atlas","volume":"28","author":"Guo","year":"2018","journal-title":"Cell Res"},{"key":"2024020112525828800_btae020-B16","doi-asserted-by":"crossref","first-page":"bbac377","DOI":"10.1093\/bib\/bbac377","article-title":"Self-supervised contrastive learning for integrative single cell RNA-seq data analysis","volume":"23","author":"Han","year":"2022","journal-title":"Brief Bioinform"},{"key":"2024020112525828800_btae020-B17","first-page":"100","article-title":"Algorithm as 136: a k-means clustering algorithm","volume":"28","author":"Hartigan","year":"1979","journal-title":"J R Stat Soc Ser C (Appl Stat)"},{"key":"2024020112525828800_btae020-B18","author":"He"},{"key":"2024020112525828800_btae020-B19","doi-asserted-by":"crossref","first-page":"120","DOI":"10.1038\/s41593-017-0029-5","article-title":"Single-cell analysis of experience-dependent transcriptomic states in the mouse visual cortex","volume":"21","author":"Hrvatin","year":"2018","journal-title":"Nat Neurosci"},{"key":"2024020112525828800_btae020-B20","doi-asserted-by":"crossref","first-page":"193","DOI":"10.1007\/BF01908075","article-title":"Comparing partitions","volume":"2","author":"Hubert","year":"1985","journal-title":"J Classif"},{"key":"2024020112525828800_btae020-B21","doi-asserted-by":"crossref","first-page":"483","DOI":"10.1038\/nmeth.4236","article-title":"SC3: consensus clustering of single-cell RNA-seq data","volume":"14","author":"Kiselev","year":"2017","journal-title":"Nat Methods"},{"key":"2024020112525828800_btae020-B22","doi-asserted-by":"crossref","first-page":"273","DOI":"10.1038\/s41576-018-0088-9","article-title":"Challenges in unsupervised clustering of single-cell RNA-seq data","volume":"20","author":"Kiselev","year":"2019","journal-title":"Nat Rev Genet"},{"key":"2024020112525828800_btae020-B23","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1186\/s13059-017-1188-0","article-title":"CIDR: ultrafast and accurate clustering through imputation for single-cell RNA-seq data","volume":"18","author":"Lin","year":"2017","journal-title":"Genome Biol"},{"key":"2024020112525828800_btae020-B24","doi-asserted-by":"crossref","first-page":"1053","DOI":"10.1038\/s41592-018-0229-2","article-title":"Deep generative modeling for single-cell transcriptomics","volume":"15","author":"Lopez","year":"2018","journal-title":"Nat Methods"},{"key":"2024020112525828800_btae020-B25","doi-asserted-by":"crossref","first-page":"121","DOI":"10.1038\/s41587-021-01001-7","article-title":"Mapping single-cell data to reference atlases by transfer learning","volume":"40","author":"Lotfollahi","year":"2022","journal-title":"Nat Biotechnol"},{"key":"2024020112525828800_btae020-B26","doi-asserted-by":"crossref","first-page":"1202","DOI":"10.1016\/j.cell.2015.05.002","article-title":"Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets","volume":"161","author":"Macosko","year":"2015","journal-title":"Cell"},{"key":"2024020112525828800_btae020-B27","doi-asserted-by":"crossref","first-page":"1326","DOI":"10.1126\/science.aaf6463","article-title":"Oligodendrocyte heterogeneity in the mouse juvenile and adult central nervous system","volume":"352","author":"Marques","year":"2016","journal-title":"Science"},{"key":"2024020112525828800_btae020-B28","doi-asserted-by":"crossref","first-page":"3235","DOI":"10.1093\/bioinformatics\/btab276","article-title":"Clustering single-cell RNA-seq data by rank constrained similarity learning","volume":"37","author":"Mei","year":"2021","journal-title":"Bioinformatics"},{"key":"2024020112525828800_btae020-B29","doi-asserted-by":"crossref","first-page":"3157","DOI":"10.1096\/fj.11-186767","article-title":"Defective photoreceptor phagocytosis in a mouse model of enhanced s-cone syndrome causes progressive retinal degeneration","volume":"25","author":"Mustafi","year":"2011","journal-title":"FASEB J"},{"key":"2024020112525828800_btae020-B30","doi-asserted-by":"crossref","first-page":"35","DOI":"10.1038\/nri.2017.76","article-title":"Single-cell RNA sequencing to explore immune cell heterogeneity","volume":"18","author":"Papalexi","year":"2018","journal-title":"Nat Rev Immunol"},{"key":"2024020112525828800_btae020-B31","doi-asserted-by":"crossref","first-page":"1053","DOI":"10.1038\/nbt.2967","article-title":"Low-coverage single-cell mRNA sequencing reveals cellular heterogeneity and activated signaling pathways in developing cerebral cortex","volume":"32","author":"Pollen","year":"2014","journal-title":"Nat Biotechnol"},{"key":"2024020112525828800_btae020-B32","doi-asserted-by":"crossref","first-page":"1196","DOI":"10.1093\/bib\/bbz062","article-title":"Clustering and classification methods for single-cell RNA-sequencing data","volume":"21","author":"Qi","year":"2020","journal-title":"Brief Bioinform"},{"key":"2024020112525828800_btae020-B33","doi-asserted-by":"crossref","first-page":"bbaa216","DOI":"10.1093\/bib\/bbaa216","article-title":"A spectral clustering with self-weighted multiple kernel learning method for single-cell RNA-seq data","volume":"22","author":"Qi","year":"2021","journal-title":"Brief Bioinform"},{"key":"2024020112525828800_btae020-B34","doi-asserted-by":"crossref","first-page":"bbad149","DOI":"10.1093\/bib\/bbad149","article-title":"SSNMDI: a novel joint learning model of semi-supervised non-negative matrix factorization and data imputation for clustering of single-cell RNA-seq data","volume":"24","author":"Qiu","year":"2023","journal-title":"Brief Bioinform"},{"key":"2024020112525828800_btae020-B35","author":"Radford","year":"2018"},{"key":"2024020112525828800_btae020-B36","doi-asserted-by":"crossref","first-page":"53","DOI":"10.1016\/0377-0427(87)90125-7","article-title":"Silhouettes: a graphical aid to the interpretation and validation of cluster analysis","volume":"20","author":"Rousseeuw","year":"1987","journal-title":"J Comput Appl Math"},{"key":"2024020112525828800_btae020-B37","doi-asserted-by":"crossref","first-page":"495","DOI":"10.1038\/nbt.3192","article-title":"Spatial reconstruction of single-cell gene expression data","volume":"33","author":"Satija","year":"2015","journal-title":"Nat Biotechnol"},{"key":"2024020112525828800_btae020-B38","doi-asserted-by":"crossref","first-page":"1308","DOI":"10.1016\/j.cell.2016.07.054","article-title":"Comprehensive classification of retinal bipolar neurons by single-cell transcriptomics","volume":"166","author":"Shekhar","year":"2016","journal-title":"Cell"},{"key":"2024020112525828800_btae020-B39","doi-asserted-by":"crossref","first-page":"3418","DOI":"10.1093\/bioinformatics\/btaa169","article-title":"Interpretable factor models of single-cell RNA-seq via variational autoencoders","volume":"36","author":"Svensson","year":"2020","journal-title":"Bioinformatics"},{"key":"2024020112525828800_btae020-B40","doi-asserted-by":"crossref","first-page":"367","DOI":"10.1038\/s41586-018-0590-4","article-title":"Single-cell transcriptomics of 20 mouse organs creates a Tabula Muris","volume":"562","author":"Tabula Muris Consortium","year":"2018","journal-title":"Nature"},{"key":"2024020112525828800_btae020-B41","doi-asserted-by":"crossref","first-page":"191","DOI":"10.1038\/s42256-019-0037-0","article-title":"Clustering single-cell RNA-seq data with a model-based deep learning approach","volume":"1","author":"Tian","year":"2019","journal-title":"Nat Mach Intell"},{"key":"2024020112525828800_btae020-B42","doi-asserted-by":"crossref","first-page":"189","DOI":"10.1126\/science.aad0501","article-title":"Dissecting the multicellular ecosystem of metastatic melanoma by single-cell RNA-seq","volume":"352","author":"Tirosh","year":"2016","journal-title":"Science"},{"key":"2024020112525828800_btae020-B43","doi-asserted-by":"crossref","first-page":"881","DOI":"10.1126\/science.aar4237","article-title":"Evolution of pallium, hippocampus, and cortical cell types revealed by single-cell transcriptomics in reptiles","volume":"360","author":"Tosches","year":"2018","journal-title":"Science"},{"key":"2024020112525828800_btae020-B44","doi-asserted-by":"crossref","first-page":"5233","DOI":"10.1038\/s41598-019-41695-z","article-title":"From Louvain to Leiden: guaranteeing well-connected communities","volume":"9","author":"Traag","year":"2019","journal-title":"Sci Rep"},{"key":"2024020112525828800_btae020-B45","doi-asserted-by":"crossref","first-page":"12","DOI":"10.1186\/s13059-019-1850-9","article-title":"A benchmark of batch-effect correction methods for single-cell RNA sequencing data","volume":"21","author":"Tran","year":"2020","journal-title":"Genome Biol"},{"key":"2024020112525828800_btae020-B46","author":"Tschannen","year":"2018"},{"key":"2024020112525828800_btae020-B47","doi-asserted-by":"crossref","first-page":"1575","DOI":"10.1093\/bioinformatics\/btac011","article-title":"scNAME: neighborhood contrastive clustering with ancillary mask estimation for scRNA-seq data","volume":"38","author":"Wan","year":"2022","journal-title":"Bioinformatics"},{"key":"2024020112525828800_btae020-B48","doi-asserted-by":"crossref","first-page":"414","DOI":"10.1038\/nmeth.4207","article-title":"Visualization and analysis of single-cell RNA-seq data by kernel-based similarity learning","volume":"14","author":"Wang","year":"2017","journal-title":"Nat Methods"},{"key":"2024020112525828800_btae020-B49","doi-asserted-by":"crossref","first-page":"1882","DOI":"10.1038\/s41467-021-22197-x","article-title":"scGNN is a novel graph neural network framework for single-cell RNA-seq analyses","volume":"12","author":"Wang","year":"2021","journal-title":"Nat Commun"},{"key":"2024020112525828800_btae020-B50","doi-asserted-by":"crossref","first-page":"bbab345","DOI":"10.1093\/bib\/bbab345","article-title":"A comparison of deep learning-based pre-processing and clustering approaches for single-cell RNA sequencing data","volume":"23","author":"Wang","year":"2022","journal-title":"Brief Bioinform"},{"key":"2024020112525828800_btae020-B51","doi-asserted-by":"crossref","first-page":"2407","DOI":"10.1073\/pnas.1719474115","article-title":"Pulmonary alveolar type I cell population consists of two distinct subtypes that differ in cell fate","volume":"115","author":"Wang","year":"2018","journal-title":"Proc Natl Acad Sci USA"},{"key":"2024020112525828800_btae020-B52","doi-asserted-by":"crossref","first-page":"bbac311","DOI":"10.1093\/bib\/bbac311","article-title":"GLOBE: a contrastive learning-based framework for integrating single-cell transcriptome datasets","volume":"23","author":"Yan","year":"2022","journal-title":"Brief Bioinform"},{"key":"2024020112525828800_btae020-B53","doi-asserted-by":"crossref","first-page":"594","DOI":"10.1126\/science.aat1699","article-title":"Single-cell transcriptomes from human kidneys reveal the cellular identity of renal tumors","volume":"361","author":"Young","year":"2018","journal-title":"Science"},{"key":"2024020112525828800_btae020-B54","doi-asserted-by":"crossref","first-page":"173902","DOI":"10.1007\/s11704-022-2011-y","article-title":"AE-TPGG: a novel autoencoder-based approach for single-cell RNA-seq data imputation and dimensionality reduction","volume":"17","author":"Zhao","year":"2023","journal-title":"Front Comput Sci"},{"key":"2024020112525828800_btae020-B55","doi-asserted-by":"crossref","first-page":"181901","DOI":"10.1007\/s11704-022-2111-8","article-title":"cKBET: assessing goodness of batch effect correction for single-cell RNA-seq","volume":"18","author":"Zhao","year":"2024","journal-title":"Front Comput Sci"},{"key":"2024020112525828800_btae020-B56","doi-asserted-by":"crossref","first-page":"140","DOI":"10.1186\/s12859-016-0984-y","article-title":"pcaReduce: hierarchical clustering of single cell transcriptional profiles","volume":"17","author":"\u017durauskien\u0117","year":"2016","journal-title":"BMC Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btae020\/56167979\/btae020.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/1\/btae020\/56530641\/btae020.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/1\/btae020\/56530641\/btae020.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,2,1]],"date-time":"2024-02-01T12:58:53Z","timestamp":1706792333000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btae020\/7564641"}},"subtitle":[],"editor":[{"given":"Anthony","family":"Mathelier","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2024,1,1]]},"references-count":56,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2024,1,2]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btae020","relation":{},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024,1,1]]},"published":{"date-parts":[[2024,1,1]]},"article-number":"btae020"}}