{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,30]],"date-time":"2025-07-30T15:51:39Z","timestamp":1753890699150,"version":"3.41.2"},"reference-count":57,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2025,6,11]],"date-time":"2025-06-11T00:00:00Z","timestamp":1749600000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Bioinform."],"abstract":"<jats:sec><jats:title>Introduction<\/jats:title><jats:p>The accurate clustering of cell subpopulations is a crucial aspect of single-cell RNA sequencing. The ability to correctly subdivide cell subpopulations hinges on the efficacy of unsupervised clustering. Despite the advancements and numerous adaptations of clustering algorithms, the correct clustering of cells remains a challenging endeavor that is dependent on the data in question and on the parameters selected for the clustering process. In this context, the present study aimed to predict the accuracy of clustering methods when varying different parameters by exploiting the intrinsic goodness metrics.<\/jats:p><\/jats:sec><jats:sec><jats:title>Methods<\/jats:title><jats:p>This study utilized three datasets, each originating from a distinct anatomical district and with a ground truth cell annotation. Moreover, the investigation employed two clustering methods: the Leiden and the Deep Embedding for Single-cell Clustering (DESC) algorithm. Firstly, a robust linear mixed regression model has been implemented in order to analyze the impact of clustering parameters on the accuracy. 
Consequently, fifteen intrinsic measures have been calculated and used to train an ElasticNet regression model in both intra- and cross-dataset approaches to evaluate the possibility of predicting the clustering accuracy.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results and discussion<\/jats:title><jats:p>The first-order interactions demonstrated that the use of the UMAP method for the generation of the neighborhood graph and an increase in resolution has a beneficial impact on accuracy. The impact of the resolution parameter is accentuated by the reduced number of nearest neighbors, resulting in sparser and more locally sensitive graphs, which better preserve fine-grained cellular relationships. Furthermore, it is advisable to test different numbers of principal components, given that this parameter is highly affected by data complexity. This procedure has enabled the effective prediction of clustering accuracy through the utilization of intrinsic metrics. The findings demonstrated that the within-cluster dispersion and the Banfield-Raftery index could be effectively used as proxies for accuracy, for an immediate comparison of different clustering parameter configurations.<\/jats:p><\/jats:sec>","DOI":"10.3389\/fbinf.2025.1562410","type":"journal-article","created":{"date-parts":[[2025,6,11]],"date-time":"2025-06-11T05:14:29Z","timestamp":1749618869000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Optimization of clustering parameters for single-cell RNA analysis using intrinsic goodness 
metrics"],"prefix":"10.3389","volume":"5","author":[{"given":"Nicolina","family":"Sciaraffa","sequence":"first","affiliation":[]},{"given":"Antonino","family":"Gagliano","sequence":"additional","affiliation":[]},{"given":"Luigi","family":"Augugliaro","sequence":"additional","affiliation":[]},{"given":"Claudia","family":"Coronnello","sequence":"additional","affiliation":[]}],"member":"1965","published-online":{"date-parts":[[2025,6,11]]},"reference":[{"key":"B1","doi-asserted-by":"publisher","first-page":"30638","DOI":"10.1109\/ACCESS.2024.3368637","article-title":"Measuring connectivity in linear multivariate processes with penalized regression techniques","volume":"12","author":"Antonacci","year":"2024","journal-title":"IEEE Access"},{"volume-title":"ISODATA: a novel method of data analysis and pattern classification","year":"1965","author":"Ball","key":"B2"},{"key":"B3","doi-asserted-by":"publisher","first-page":"1441","DOI":"10.1109\/TKDE.2008.79","article-title":"A point symmetry-based clustering technique for automatic evolution of clusters","volume":"20","author":"Bandyopadhyay","year":"2008","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"B4","doi-asserted-by":"publisher","first-page":"803","DOI":"10.2307\/2532201","article-title":"Model-based Gaussian and non-Gaussian clustering","volume":"49","author":"Banfield","year":"1993","journal-title":"Biometrics"},{"key":"B5","doi-asserted-by":"publisher","first-page":"375","DOI":"10.1111\/bmsp.12186","article-title":"Combining diversity and dispersion criteria for anticlustering: a bicriterion approach","volume":"73","author":"Brusco","year":"2020","journal-title":"Br. J. Math. Stat. 
Psychol."},{"key":"B6","doi-asserted-by":"publisher","first-page":"e143","DOI":"10.1093\/nar\/gkz826","article-title":"Latent cellular analysis robustly reveals subtle diversity in large-scale single-cell RNA-seq data","volume":"47","author":"Cheng","year":"2019","journal-title":"Nucleic Acids Res."},{"key":"B7","doi-asserted-by":"publisher","first-page":"245","DOI":"10.1038\/S41587-021-01033-Z","article-title":"Differential abundance testing on single-cell data using k-nearest neighbor graphs","volume":"40","author":"Dann","year":"2022","journal-title":"Nat. Biotechnol."},{"key":"B8","doi-asserted-by":"publisher","DOI":"10.1088\/1742-5468\/2005\/09\/P09008","article-title":"Comparing community structure identification","volume":"9","author":"Danon","year":"2005","journal-title":"J. Stat. Mech.: Theory Exp."},{"key":"B9","doi-asserted-by":"publisher","first-page":"19","DOI":"10.1186\/s13395-020-00236-3","article-title":"A reference single-cell transcriptomic atlas of human skeletal muscle tissue reveals bifurcated muscle stem cell populations","volume":"10","author":"De Micheli","year":"2020","journal-title":"Skelet. Muscle"},{"key":"B10","doi-asserted-by":"publisher","first-page":"eabl5197","DOI":"10.1126\/science.abl5197","article-title":"Cross-tissue immune cell analysis reveals tissue-specific features in humans","volume":"1979","author":"Dom\u00ednguez Conde","year":"2022","journal-title":"Science"},{"key":"B11","doi-asserted-by":"publisher","first-page":"95","DOI":"10.1080\/01969727408546059","article-title":"Well separated clusters and optimal fuzzy partitions","volume":"4","author":"Dunn","year":"1974","journal-title":"J. 
Cybern."},{"key":"B12","doi-asserted-by":"publisher","first-page":"1141","DOI":"10.12688\/f1000research.15666.1","article-title":"A systematic performance evaluation of clustering methods for single-cell RNA-seq data","volume":"7","author":"Du\u00f2","year":"2018","journal-title":"F1000Res"},{"key":"B13","doi-asserted-by":"publisher","first-page":"362","DOI":"10.2307\/2528096","article-title":"A method for cluster analysis","volume":"56","author":"Edwards","year":"1965","journal-title":"Biometrika"},{"key":"B14","doi-asserted-by":"publisher","first-page":"251","DOI":"10.1038\/nature14966","article-title":"Single-cell messenger RNA sequencing reveals rare intestinal cell types","volume":"525","author":"Gr\u00fcn","year":"2015","journal-title":"Nature"},{"key":"B15","doi-asserted-by":"publisher","first-page":"2989","DOI":"10.1093\/BIOINFORMATICS\/BTV325","article-title":"Diffusion maps for high-dimensional single-cell analysis of differentiation data","volume":"31","author":"Haghverdi","year":"2015","journal-title":"Bioinformatics"},{"key":"B16","doi-asserted-by":"publisher","first-page":"107","DOI":"10.1023\/a:1012801612483","article-title":"On clustering validation techniques","volume":"17","author":"Halkidi","year":"2001","journal-title":"J. Intell. Inf. Syst."},{"key":"B17","doi-asserted-by":"publisher","first-page":"443","DOI":"10.1016\/B978-0-12-381479-1.00010-1","article-title":"Cluster analysis: basic concepts and methods","author":"Han","year":"2012","journal-title":"Data Min."},{"key":"B18","doi-asserted-by":"publisher","first-page":"193","DOI":"10.1007\/bf01908075","article-title":"Comparing partitions","volume":"2","author":"Hubert","year":"1985","journal-title":"J. Classif."},{"key":"B19","doi-asserted-by":"publisher","first-page":"190","DOI":"10.1111\/j.2044-8317.1976.tb00714.x","article-title":"Quadratic assignment as a general data-analysis strategy","volume":"29","author":"Hubert","year":"1976","journal-title":"Br. J. Math. Stat. 
Psychol."},{"key":"B20","doi-asserted-by":"publisher","first-page":"4719","DOI":"10.1038\/s41467-018-07234-6","article-title":"Discovery of rare cells from voluminous single cell expression data","volume":"9","author":"Jindal","year":"2018","journal-title":"Nat. Commun."},{"key":"B21","doi-asserted-by":"publisher","first-page":"e694","DOI":"10.1002\/ctm2.694","article-title":"Single\u2010cell RNA sequencing technologies and applications: a brief overview","volume":"12","author":"Jovic","year":"2022","journal-title":"Clin. Transl. Med."},{"key":"B22","doi-asserted-by":"publisher","first-page":"273","DOI":"10.1038\/S41576-018-0088-9","article-title":"Challenges in unsupervised clustering of single-cell RNA-seq data","volume":"20","author":"Kiselev","year":"2019","journal-title":"Nat. Rev. Genet."},{"key":"B23","doi-asserted-by":"publisher","first-page":"483","DOI":"10.1038\/nmeth.4236","article-title":"SC3: consensus clustering of single-cell RNA-seq data","volume":"14","author":"Kiselev","year":"2017","journal-title":"Nat. Methods"},{"key":"B24","doi-asserted-by":"publisher","DOI":"10.18637\/JSS.V075.I06","article-title":"Robustlmm: an R package for Robust estimation of linear Mixed-Effects models","volume":"75","author":"Koller","year":"2016","journal-title":"J. Stat. Softw."},{"key":"B25","doi-asserted-by":"publisher","first-page":"2338","DOI":"10.1038\/s41467-020-15851-3","article-title":"Deep learning enables accurate clustering with batch effect removal in single-cell RNA-seq analysis","volume":"11","author":"Li","year":"2020","journal-title":"Nat. Commun."},{"key":"B26","doi-asserted-by":"publisher","first-page":"bbad497","DOI":"10.1093\/bib\/bbad497","article-title":"A critical assessment of clustering algorithms to improve cell clustering and identification in single-cell transcriptome study","volume":"25","author":"Liang","year":"2024","journal-title":"Brief. 
Bioinform"},{"key":"B27","doi-asserted-by":"publisher","first-page":"59","DOI":"10.1186\/s13059-017-1188-0","article-title":"CIDR: ultrafast and accurate clustering through imputation for single-cell RNA-seq data","volume":"18","author":"Lin","year":"2017","journal-title":"Genome Biol."},{"key":"B28","doi-asserted-by":"publisher","first-page":"232","DOI":"10.1186\/s13059-021-02445-5","article-title":"MultiK: an automated tool to determine optimal cluster numbers in single-cell RNA sequencing data","volume":"22","author":"Liu","year":"2021","journal-title":"Genome Biol."},{"key":"B29","doi-asserted-by":"publisher","first-page":"bbae486","DOI":"10.1093\/bib\/bbae486","article-title":"scDFN: enhancing single-cell RNA-seq clustering with deep fusion networks","volume":"25","author":"Liu","year":"2024","journal-title":"Brief. Bioinform"},{"key":"B30","doi-asserted-by":"publisher","first-page":"8746","DOI":"10.15252\/msb.20188746","article-title":"Current best practices in single\u2010cell RNA\u2010seq analysis: a tutorial","volume":"15","author":"Luecken","year":"2019","journal-title":"Mol. Syst. Biol."},{"key":"B31","doi-asserted-by":"publisher","first-page":"1494","DOI":"10.3758\/S13428-016-0809-Y","article-title":"Evaluating significance in linear mixed-effects models in R","volume":"49","author":"Luke","year":"2017","journal-title":"Behav. Res. Methods"},{"key":"B32","doi-asserted-by":"publisher","first-page":"4383","DOI":"10.1038\/s41467-018-06318-7","article-title":"Single cell RNA sequencing of human liver reveals distinct intrahepatic macrophage populations","volume":"9","author":"MacParland","year":"2018","journal-title":"Nat. 
Commun."},{"key":"B33","doi-asserted-by":"publisher","first-page":"487","DOI":"10.1016\/j.patcog.2003.06.005","article-title":"Validity index for crisp and fuzzy clusters","volume":"37","author":"Malay","year":"2004","journal-title":"Pattern Recognit."},{"key":"B34","doi-asserted-by":"publisher","first-page":"501","DOI":"10.2307\/2528592","article-title":"Practical problems in a method of cluster analysis","author":"Marriot","year":"1971","journal-title":"Biometrics"},{"key":"B35","first-page":"456","article-title":"Clustisz: a program to test for the quality of clustering of a set of objects","author":"McClain","year":"1975","journal-title":"J. Mark. Res."},{"key":"B36","unstructured":"UMAP: uniform manifold approximation and projection for dimension reduction. McInnes L., Healy J., Melville J. 2018"},{"key":"B37","doi-asserted-by":"publisher","first-page":"187","DOI":"10.1007\/bf02293899","article-title":"A Monte Carlo study of thirty internal criterion measures for cluster analysis","volume":"46","author":"Milligan","year":"1981","journal-title":"Psychometrika"},{"key":"B38","doi-asserted-by":"publisher","first-page":"311","DOI":"10.1007\/s10456-021-09797-3","article-title":"Endothelial cell plasticity at the single-cell level","volume":"24","author":"Pasut","year":"2021","journal-title":"Angiogenesis"},{"key":"B39","doi-asserted-by":"publisher","first-page":"39","DOI":"10.1186\/s12859-021-03957-4","article-title":"Selecting single cell clustering parameter values using subsampling-based robustness metrics","volume":"22","author":"Patterson-Cross","year":"2021","journal-title":"BMC Bioinforma."},{"key":"B40","doi-asserted-by":"publisher","first-page":"2825","DOI":"10.5555\/1953048.2078195","article-title":"Scikit-learn: machine learning 
in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J. Mach. Learn. Res."},{"key":"B41","doi-asserted-by":"publisher","first-page":"53","DOI":"10.1016\/0377-0427(87)90125-7","article-title":"Silhouettes: a graphical aid to the interpretation and validation of cluster analysis","volume":"20","author":"Peter","year":"1987","journal-title":"J. Comput. Appl. Math."},{"key":"B42","doi-asserted-by":"publisher","first-page":"536","DOI":"10.1186\/s12859-022-05085-z","article-title":"SC3s: efficient scaling of single cell consensus clustering to millions of cells","volume":"23","author":"Quah","year":"2022","journal-title":"BMC Bioinforma."},{"key":"B43","first-page":"137","article-title":"Determination of number of clusters in K-means clustering and application in colour image segmentation","volume-title":"Proceedings of the 4th international conference on advances in pattern recognition and digital techniques","author":"Ray","year":"1999"},{"key":"B44","doi-asserted-by":"publisher","first-page":"102423","DOI":"10.1016\/J.JASREP.2020.102423","article-title":"Modern methods for old data: an overview of some robust methods for outliers detection with applications in osteology","volume":"32","author":"Santos","year":"2020","journal-title":"J. Archaeol. Sci. Rep."},{"key":"B45","doi-asserted-by":"publisher","first-page":"1888","DOI":"10.1016\/J.CELL.2019.05.031","article-title":"Comprehensive integration of single-cell data","volume":"177","author":"Stuart","year":"2019","journal-title":"Cell"},{"key":"B46","doi-asserted-by":"publisher","first-page":"105117","DOI":"10.1016\/j.chemolab.2024.105117","article-title":"Extended multivariate comparison of 68 cluster validity indices. A review","volume":"251","author":"Todeschini","year":"2024","journal-title":"Chemom. 
Intelligent Laboratory Syst."},{"key":"B47","doi-asserted-by":"publisher","first-page":"5233","DOI":"10.1038\/S41598-019-41695-Z","article-title":"From Louvain to Leiden: guaranteeing well-connected communities","volume":"9","author":"Traag","year":"2019","journal-title":"Sci. Rep."},{"key":"B48","doi-asserted-by":"publisher","first-page":"120","DOI":"10.1038\/s41598-021-03613-0","article-title":"Discovering cell types using manifold learning and enhanced visualization of single-cell RNA-Seq data","volume":"12","author":"Vasighizaker","year":"2022","journal-title":"Sci. Rep."},{"key":"B49","doi-asserted-by":"publisher","first-page":"15","DOI":"10.1186\/s13059-017-1382-0","article-title":"SCANPY: large-scale single-cell gene expression data analysis","volume":"19","author":"Wolf","year":"2018","journal-title":"Genome Biol."},{"key":"B50","doi-asserted-by":"publisher","first-page":"59","DOI":"10.1186\/s13059-019-1663-x","article-title":"PAGA: graph abstraction reconciles clustering with trajectory inference through a topology preserving map of single cells","volume":"20","author":"Wolf","year":"2019","journal-title":"Genome Biol."},{"key":"B51","doi-asserted-by":"publisher","first-page":"841","DOI":"10.1109\/34.85677","article-title":"A validity measure for fuzzy clustering","volume":"13","author":"Xie","year":"1991","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"B57","doi-asserted-by":"crossref","first-page":"5876","DOI":"10.1016\/j.cell.2023.11.026","article-title":"Automatic cell-type harmonization and integration across Human Cell Atlas datasets","volume":"18","author":"Xu","year":"2023","journal-title":"Cell"},{"key":"B52","doi-asserted-by":"publisher","first-page":"e2400002121","DOI":"10.1073\/pnas.2400002121","article-title":"Single-cell analysis via manifold fitting: a framework for RNA clustering and beyond","volume":"121","author":"Yao","year":"2024","journal-title":"Proc. Natl. Acad. Sci. U. S. 
A."},{"key":"B53","doi-asserted-by":"publisher","first-page":"49","DOI":"10.1186\/s13059-022-02622-0","article-title":"Benchmarking clustering algorithms on estimating the number of cell types from single-cell RNA-sequencing data","volume":"23","author":"Yu","year":"2022","journal-title":"Genome Biol."},{"key":"B54","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2012.14982","article-title":"Elastic net based feature ranking and selection","author":"Yu","year":"2020"},{"key":"B55","doi-asserted-by":"publisher","first-page":"517","DOI":"10.1261\/rna.078965.121","article-title":"Review of single-cell RNA-seq data clustering for cell-type identification and characterization","volume":"29","author":"Zhang","year":"2023"},{"key":"B56","doi-asserted-by":"publisher","first-page":"301","DOI":"10.1111\/J.1467-9868.2005.00503.X","article-title":"Regularization and variable selection via the elastic net","volume":"67","author":"Zou","year":"2005","journal-title":"J. R. Stat. Soc. Ser. B Stat. Methodol."}],"container-title":["Frontiers in Bioinformatics"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fbinf.2025.1562410\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,11]],"date-time":"2025-06-11T05:14:31Z","timestamp":1749618871000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fbinf.2025.1562410\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,11]]},"references-count":57,"alternative-id":["10.3389\/fbinf.2025.1562410"],"URL":"https:\/\/doi.org\/10.3389\/fbinf.2025.1562410","relation":{},"ISSN":["2673-7647"],"issn-type":[{"type":"electronic","value":"2673-7647"}],"subject":[],"published":{"date-parts":[[2025,6,11]]},"article-number":"1562410"}}