{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,2]],"date-time":"2026-03-02T10:53:51Z","timestamp":1772448831275,"version":"3.50.1"},"reference-count":39,"publisher":"Oxford University Press (OUP)","issue":"2","license":[{"start":{"date-parts":[[2026,3,2]],"date-time":"2026-03-02T00:00:00Z","timestamp":1772409600000},"content-version":"vor","delay-in-days":1,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["R15HG012087"],"award-info":[{"award-number":["R15HG012087"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2026,3,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Single-cell RNA sequencing (scRNA-seq) enables high-resolution analysis of gene expression at the individual cell level, with clustering serving as a critical step for identifying distinct cell populations. Due to the high dimensionality and sparsity of scRNA-seq data, existing approaches typically perform gene selection prior to clustering. However, treating feature selection as a separate preprocessing step can overlook latent clustering structure and often results in suboptimal outcomes, as it does not guarantee that the selected genes are informative for clustering. To address this limitation, we propose FSSC (Feature Selection for scRNA-seq Clustering), a unified framework for joint feature selection and clustering in scRNA-seq analysis. FSSC integrates a zero-inflated negative binomial (ZINB) autoencoder with a group Lasso penalty and a dedicated clustering loss. This joint optimization enables the model to simultaneously learn low-dimensional representations and select a compact set of cluster-discriminatory genes, preserving both the statistical characteristics of scRNA-seq data and its underlying cluster structure. Extensive experiments on both simulated and real scRNA-seq datasets demonstrate that FSSC consistently outperforms state-of-the-art methods in clustering accuracy and effectively identifies a compact, biologically meaningful set of marker genes.<\/jats:p>","DOI":"10.1093\/bib\/bbag082","type":"journal-article","created":{"date-parts":[[2026,2,13]],"date-time":"2026-02-13T12:44:34Z","timestamp":1770986674000},"source":"Crossref","is-referenced-by-count":0,"title":["Integrating feature selection with unsupervised deep embedding for clustering single-cell RNA-seq data"],"prefix":"10.1093","volume":"27","author":[{"given":"Cheng","family":"Zhong","sequence":"first","affiliation":[{"name":"Department of Computer Science, New Jersey Institute of Technology , 323 Dr Martin Luther King Jr Blvd, Newark, NJ 07102 ,","place":["United States"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-7675-663X","authenticated-orcid":false,"given":"Siqi","family":"Jiang","sequence":"additional","affiliation":[{"name":"Department of Computer Science, New Jersey Institute of Technology , 323 Dr Martin Luther King Jr Blvd, Newark, NJ 07102 ,","place":["United States"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6059-4267","authenticated-orcid":false,"given":"Zhi","family":"Wei","sequence":"additional","affiliation":[{"name":"Department of Computer Science, New Jersey Institute of Technology , 323 Dr Martin Luther King Jr Blvd, Newark, NJ 07102 ,","place":["United States"]}]}],"member":"286","published-online":{"date-parts":[[2026,3,2]]},"reference":[{"key":"2026030204572169600_ref1","doi-asserted-by":"publisher","first-page":"36","DOI":"10.1016\/j.mam.2017.07.003","article-title":"Single-cell RNA sequencing: Technical advancements and biological applications","volume":"59","author":"Hedlund","year":"2018","journal-title":"Mol Asp Med"},{"key":"2026030204572169600_ref2","doi-asserted-by":"crossref","first-page":"85","DOI":"10.1016\/j.coisb.2017.07.004","article-title":"Single cells make big data: New challenges and opportunities in transcriptomics","volume":"4","author":"Angerer","year":"2017","journal-title":"Current opinion in systems biology"},{"key":"2026030204572169600_ref3","doi-asserted-by":"publisher","first-page":"93","DOI":"10.1186\/s12859-018-2092-7","article-title":"An interpretable framework for clustering single-cell RNA-Seq datasets","volume":"19","author":"Zhang","year":"2018","journal-title":"BMC bioinformatics"},{"key":"2026030204572169600_ref4","doi-asserted-by":"publisher","first-page":"414","DOI":"10.1038\/nmeth.4207","article-title":"Visualization and analysis of single-cell RNA-seq data by kernel-based similarity learning","volume":"14","author":"Wang","year":"2017","journal-title":"Nat Methods"},{"key":"2026030204572169600_ref5","doi-asserted-by":"publisher","first-page":"2069","DOI":"10.1093\/bioinformatics\/bty050","article-title":"Spectral clustering based on learning similarity matrix","volume":"34","author":"Park","year":"2018","journal-title":"Bioinformatics"},{"key":"2026030204572169600_ref6","doi-asserted-by":"publisher","first-page":"311","DOI":"10.1038\/s41592-019-0353-7","article-title":"Scalable analysis of cell-type composition from single-cell transcriptomics using deep recurrent learning","volume":"16","author":"Deng","year":"2019","journal-title":"Nat Methods"},{"key":"2026030204572169600_ref7","doi-asserted-by":"publisher","first-page":"390","DOI":"10.1038\/s41467-018-07931-2","article-title":"Single-cell RNA-seq denoising using a deep count autoencoder","volume":"10","author":"Eraslan","year":"2019","journal-title":"Nat Commun"},{"key":"2026030204572169600_ref8","doi-asserted-by":"crossref","first-page":"191","DOI":"10.1038\/s42256-019-0037-0","article-title":"Clustering single-cell RNA-seq data with a model-based deep learning approach","volume":"1","author":"Tian","year":"2019","journal-title":"Nature Machine Intelligence"},{"key":"2026030204572169600_ref9","doi-asserted-by":"publisher","first-page":"1053","DOI":"10.1038\/s41592-018-0229-2","article-title":"Deep generative modeling for single-cell transcriptomics","volume":"15","author":"Lopez","year":"2018","journal-title":"Nat Methods"},{"key":"2026030204572169600_ref10","doi-asserted-by":"publisher","first-page":"2002","DOI":"10.1038\/s41467-018-04368-5","article-title":"Interpretable dimensionality reduction of single cell transcriptome data with deep generative models","volume":"9","author":"Ding","year":"2018","journal-title":"Nat Commun"},{"key":"2026030204572169600_ref11","doi-asserted-by":"publisher","first-page":"1888","DOI":"10.1016\/j.cell.2019.05.031","article-title":"Comprehensive integration of single-cell data","volume":"177","author":"Stuart","year":"2019","journal-title":"Cell"},{"key":"2026030204572169600_ref12","doi-asserted-by":"publisher","first-page":"2865","DOI":"10.1093\/bioinformatics\/bty1044","article-title":"M3Drop: Dropout-based feature selection for scRNASeq","volume":"35","author":"Andrews","year":"2019","journal-title":"Bioinformatics"},{"key":"2026030204572169600_ref13","doi-asserted-by":"publisher","first-page":"699","DOI":"10.1038\/s41467-023-43406-9","article-title":"Anti-correlated feature selection prevents false discovery of subpopulations in scRNAseq","volume":"15","author":"Tyler","year":"2024","journal-title":"Nat Commun"},{"key":"2026030204572169600_ref14","article-title":"Penalized model-based clustering with application to variable selection","volume":"8","author":"Pan","year":"2007","journal-title":"J Mach Learn Res"},{"key":"2026030204572169600_ref15","doi-asserted-by":"publisher","first-page":"481","DOI":"10.1214\/20-aoas1407","article-title":"Model-based feature selection and clustering of RNA-seq data for unsupervised subtype discovery","volume":"15","author":"Lim","year":"2021","journal-title":"Ann Appl Stat"},{"key":"2026030204572169600_ref16","doi-asserted-by":"crossref","first-page":"1348","DOI":"10.1198\/016214501753382273","article-title":"Variable selection via nonconcave penalized likelihood and its oracle properties","volume":"96","author":"Fan","year":"2001","journal-title":"J Am Stat Assoc"},{"key":"2026030204572169600_ref17","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1214\/23-AOAS1761","article-title":"RZiMM-scRNA: A regularized zero-inflated mixture model framework for single-cell RNA-seq data","volume":"18","author":"Mi","year":"2024","journal-title":"Ann Appl Stat"},{"key":"2026030204572169600_ref18","doi-asserted-by":"publisher","first-page":"bbad475","DOI":"10.1093\/bib\/bbad475","article-title":"CAKE: A flexible self-supervised framework for enhancing cell visualization, clustering and rare cell identification","volume":"25","author":"Liu","year":"2024","journal-title":"Brief Bioinform"},{"key":"2026030204572169600_ref19","article-title":"Sparse-input neural networks for high-dimensional nonparametric regression and classification","author":"Feng"},{"key":"2026030204572169600_ref20","volume-title":"Proceedings of the 24th International Conference on Artificial Intelligence and Statistics (AISTATS 2021)","author":"Lemhadri","year":"2021"},{"key":"2026030204572169600_ref21","doi-asserted-by":"crossref","first-page":"49","DOI":"10.1111\/j.1467-9868.2005.00532.x","article-title":"Model selection and estimation in regression with grouped variables","volume":"68","author":"Yuan","year":"2006","journal-title":"J R Stat Soc Series B Stat Methodology"},{"key":"2026030204572169600_ref22","article-title":"Concrete autoencoders for differentiable feature selection and reconstruction","author":"Abid","year":"2019"},{"key":"2026030204572169600_ref23","doi-asserted-by":"crossref","first-page":"997","DOI":"10.26599\/BDMA.2025.9020009","article-title":"A flexible data-driven framework for correcting coarsely annotated scRNA-seq data","volume":"8","author":"Zheng","year":"2025","journal-title":"Big Data Mining and Analytics"},{"key":"2026030204572169600_ref24","doi-asserted-by":"publisher","first-page":"174","DOI":"10.1186\/s13059-017-1305-0","article-title":"Splatter: Simulation of single-cell RNA sequencing data","volume":"18","author":"Zappia","year":"2017","journal-title":"Genome Biol"},{"key":"2026030204572169600_ref25","doi-asserted-by":"crossref","first-page":"1","DOI":"10.18637\/jss.v033.i01","article-title":"Regularization paths for generalized linear models via coordinate descent","volume":"33","author":"Friedman","year":"2010","journal-title":"J Stat Softw"},{"key":"2026030204572169600_ref26","doi-asserted-by":"publisher","first-page":"483","DOI":"10.1038\/nmeth.4236","article-title":"SC3: Consensus clustering of single-cell RNA-seq data","volume":"14","author":"Kiselev","year":"2017","journal-title":"Nat Methods"},{"key":"2026030204572169600_ref27","doi-asserted-by":"publisher","first-page":"bbab034","DOI":"10.1093\/bib\/bbab034","article-title":"Accurate feature selection improves single-cell RNA-seq cell clustering","volume":"22","author":"Su","year":"2021","journal-title":"Brief Bioinform"},{"key":"2026030204572169600_ref28","doi-asserted-by":"crossref","first-page":"193","DOI":"10.1007\/BF01908075","article-title":"Comparing partitions","volume":"2","author":"Hubert","year":"1985","journal-title":"J Classif"},{"key":"2026030204572169600_ref29","volume-title":"Proceedings of the 27th International Conference on Machine Learning (ICML-10)","author":"Nair","year":"2010"},{"key":"2026030204572169600_ref30","volume-title":"Proceedings of the 9th International Conference on Information and Knowledge Management (CIKM \u201900)","author":"Nigam","year":"2000"},{"key":"2026030204572169600_ref31","volume-title":"Proceedings of the 33rd International Conference on Machine Learning (ICML 2016)","author":"Xie","year":"2016"},{"key":"2026030204572169600_ref32","volume-title":"In Ijcai","author":"Guo"},{"key":"2026030204572169600_ref33","doi-asserted-by":"crossref","first-page":"231","DOI":"10.1080\/10618600.2012.681250","article-title":"A sparse-group lasso","volume":"22","author":"Simon","year":"2013","journal-title":"J Comput Graph Stat"},{"key":"2026030204572169600_ref34","doi-asserted-by":"publisher","first-page":"3625","DOI":"10.1242\/dev.151142","article-title":"Psychrophilic proteases dramatically reduce single-cell RNA-seq artifacts: A molecular atlas of kidney development","volume":"144","author":"Adam","year":"2017","journal-title":"Development"},{"key":"2026030204572169600_ref35","doi-asserted-by":"publisher","first-page":"1091","DOI":"10.1016\/j.cell.2018.02.001","article-title":"Mapping the mouse cell atlas by microwell-seq","volume":"172","author":"Han","year":"2018","journal-title":"Cell"},{"key":"2026030204572169600_ref36","doi-asserted-by":"publisher","first-page":"1915","DOI":"10.1101\/gad.17446611","article-title":"Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses","volume":"25","author":"Cabili","year":"2011","journal-title":"Genes Dev"},{"key":"2026030204572169600_ref37","doi-asserted-by":"publisher","first-page":"171","DOI":"10.1016\/j.ebiom.2018.07.022","article-title":"The long noncoding RNA landscape in amygdala tissues from schizophrenia patients","volume":"34","author":"Tian","year":"2018","journal-title":"EBioMedicine"},{"key":"2026030204572169600_ref38","doi-asserted-by":"publisher","first-page":"278","DOI":"10.1186\/s13059-015-0844-5","article-title":"MAST: A flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in single-cell RNA sequencing data","volume":"16","author":"Finak","year":"2015","journal-title":"Genome Biol"},{"key":"2026030204572169600_ref39","doi-asserted-by":"crossref","first-page":"301","DOI":"10.1111\/j.1467-9868.2005.00503.x","article-title":"Regularization and variable selection via the elastic net","volume":"67","author":"Zou","year":"2005","journal-title":"J R Stat Soc Series B Stat Methodology"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/27\/2\/bbag082\/67196994\/bbag082.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/27\/2\/bbag082\/67196994\/bbag082.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,2]],"date-time":"2026-03-02T09:57:35Z","timestamp":1772445455000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbag082\/8503314"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,3,1]]},"references-count":39,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2026,3,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbag082","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2026,3]]},"published":{"date-parts":[[2026,3,1]]},"article-number":"bbag082"}}