{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,13]],"date-time":"2026-05-13T04:37:44Z","timestamp":1778647064995,"version":"3.51.4"},"reference-count":58,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2025,4,18]],"date-time":"2025-04-18T00:00:00Z","timestamp":1744934400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Deanship of Scientific Research","award":["KFU250055"],"award-info":[{"award-number":["KFU250055"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>Accurate cell type annotation is a critical step in single-cell RNA sequencing (scRNA-seq) analysis, enabling deeper insights into cellular heterogeneity and biological processes. In this study, we conducted a comprehensive comparative evaluation of various machine learning techniques, including support vector machine (SVM), decision tree, random forest, logistic regression, gradient boosting, k-nearest neighbour, transformer, and naive Bayes, to determine their effectiveness for single-cell annotation. These methods were evaluated using four diverse datasets comprising hundreds of cell types across several tissues. Our results revealed that SVM consistently outperformed other techniques, emerging as the top performer in three out of the four datasets, followed closely by logistic regression. Most methods demonstrated robust capabilities in annotating major cell types and identifying rare cell populations, though naive Bayes was the least effective due to its inherent limitations in handling high-dimensional and interdependent data. 
This study provides valuable insights into the relative strengths and weaknesses of machine learning methods for single-cell annotation, offering guidance for selecting appropriate techniques in scRNA-seq analyses.<\/jats:p>","DOI":"10.3390\/a18040232","type":"journal-article","created":{"date-parts":[[2025,4,18]],"date-time":"2025-04-18T03:27:46Z","timestamp":1744946866000},"page":"232","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["A Comparative Study of Machine Learning Techniques for Cell Annotation of scRNA-Seq Data"],"prefix":"10.3390","volume":"18","author":[{"given":"Shahid Ahmad","family":"Wani","sequence":"first","affiliation":[{"name":"Department of Computer Science, Jamia Millia Islamia, New Delhi 110025, India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"SMK","family":"Quadri","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Jamia Millia Islamia, New Delhi 110025, India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-3212-5111","authenticated-orcid":false,"given":"Mohammad Shuaib","family":"Mir","sequence":"additional","affiliation":[{"name":"Department of Management Information Systems, College of Business Administration, King Faisal University, Al-Ahsa 31982, Saudi Arabia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6515-1569","authenticated-orcid":false,"given":"Yonis","family":"Gulzar","sequence":"additional","affiliation":[{"name":"Department of Management Information Systems, College of Business Administration, King Faisal University, Al-Ahsa 31982, Saudi Arabia"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2025,4,18]]},"reference":[{"key":"ref_1","unstructured":"Zimmer, C. (National Geographic Magazine, 2013). 
How Many Cells Are In Your Body?, National Geographic Magazine."},{"key":"ref_2","unstructured":"L\u00fccken, M.D., Burkhardt, D.B., Cannoodt, R., Lance, C., Agrawal, A., Aliee, H., Chen, A.T., Deconinck, L., Detweiler, A.M., and Granados, A.A. (2021, December 6\u201314). A sandbox for prediction and integration of DNA, RNA, and protein data in single cells. Proceedings of the NeurIPS 2021 Track Datasets and Benchmarks, Virtual."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"377","DOI":"10.1038\/nmeth.1315","article-title":"mRNA-Seq whole-transcriptome analysis of a single cell","volume":"6","author":"Tang","year":"2009","journal-title":"Nat. Methods"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"1888","DOI":"10.1016\/j.cell.2019.05.031","article-title":"Comprehensive Integration of Single-Cell Data","volume":"177","author":"Stuart","year":"2019","journal-title":"Cell"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"189","DOI":"10.1126\/science.aad0501","article-title":"Dissecting the multicellular ecosystem of metastatic melanoma by single-cell RNA-seq","volume":"352","author":"Tirosh","year":"2016","journal-title":"Science"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"1491","DOI":"10.1101\/gr.190595.115","article-title":"Defining cell types and states with single-cell genomics","volume":"25","author":"Trapnell","year":"2015","journal-title":"Genome Res."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"251","DOI":"10.1038\/nature14966","article-title":"Single-cell messenger RNA sequencing reveals rare intestinal cell types","volume":"525","author":"Lyubimova","year":"2015","journal-title":"Nature"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"551","DOI":"10.1038\/nbt.3854","article-title":"Single-cell topological RNA-seq analysis reveals insights into cellular differentiation and development","volume":"35","author":"Rizvi","year":"2017","journal-title":"Nat. 
Biotechnol."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"17","DOI":"10.1242\/dev.133058","article-title":"Understanding development and stem cells using single cell-based analyses of gene expression","volume":"144","author":"Kumar","year":"2017","journal-title":"Development"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Lyu, P., Zhai, Y., Li, T., and Qian, J. (2023). CellAnn: A comprehensive, super-fast, and user-friendly single-cell annotation web server. Bioinformatics, 39.","DOI":"10.1093\/bioinformatics\/btad521"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"2749","DOI":"10.1038\/s41596-021-00534-0","article-title":"Tutorial: Guidelines for annotating single-cell transcriptomic maps using automated and manual methods","volume":"16","author":"Clarke","year":"2021","journal-title":"Nat. Protoc."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"852","DOI":"10.1038\/s42256-022-00534-z","article-title":"scBERT as a large-scale pretrained deep language model for cell type annotation of single-cell RNA-seq data","volume":"4","author":"Yang","year":"2022","journal-title":"Nat. Mach. Intell."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Cheng, C., Chen, W., Jin, H., and Chen, X. (2023). A Review of Single-Cell RNA-Seq Annotation, Integration, and Cell\u2013Cell Communication. Cells, 12.","DOI":"10.3390\/cells12151970"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"35","DOI":"10.1038\/nri.2017.76","article-title":"Single-cell RNA sequencing to explore immune cell heterogeneity","volume":"18","author":"Papalexi","year":"2018","journal-title":"Nat. Rev. 
Immunol."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"D721","DOI":"10.1093\/nar\/gky900","article-title":"CellMarker: A manually curated resource of cell markers in human and mouse","volume":"47","author":"Zhang","year":"2019","journal-title":"Nucleic Acids Res."},{"key":"ref_16","first-page":"baz046","article-title":"PanglaoDB: A web server for exploration of mouse and human single-cell RNA sequencing data","volume":"2019","author":"Gan","year":"2019","journal-title":"Database"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"D900","DOI":"10.1093\/nar\/gky939","article-title":"CancerSEA: A cancer single-cell state atlas","volume":"47","author":"Yuan","year":"2019","journal-title":"Nucleic Acids Res."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"100882","DOI":"10.1016\/j.isci.2020.100882","article-title":"scCATCH: Automatic Annotation on Cell Types of Clusters from Single-Cell RNA Sequencing Data","volume":"23","author":"Shao","year":"2020","journal-title":"iScience"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Cao, Y., Wang, X., and Peng, G. (2020). SCSA: A cell type annotation tool for single-cell RNA-seq data. Front. Genet., 11.","DOI":"10.3389\/fgene.2020.00490"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"1007","DOI":"10.1038\/s41592-019-0529-1","article-title":"Probabilistic cell-type assignment of single-cell RNA-seq for tumor microenvironment profiling","volume":"16","author":"Zhang","year":"2019","journal-title":"Nat. Methods"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Zhang, Z., Luo, D., Zhong, X., Choi, J.H., Ma, Y., Wang, S., Mahrt, E., Guo, W., Stawiski, E.W., and Modrusan, Z. (2019). SCINA: A Semi-Supervised Subtyping Algorithm of Single Cells and Bulk Samples. 
Genes, 10.","DOI":"10.3390\/genes10070531"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"501","DOI":"10.1089\/cmb.2020.0439","article-title":"Supervised Adversarial Alignment of Single-Cell RNA-seq Data","volume":"28","author":"Ge","year":"2021","journal-title":"J. Comput. Biol."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"e48","DOI":"10.1093\/nar\/gkz116","article-title":"SuperCT: A supervised-learning framework for enhanced characterization of single-cell transcriptomic profiles","volume":"47","author":"Xie","year":"2019","journal-title":"Nucleic Acids Res."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"411","DOI":"10.1038\/nbt.4096","article-title":"Integrating single-cell transcriptomic data across different conditions, technologies, and species","volume":"36","author":"Butler","year":"2018","journal-title":"Nat. Biotechnol."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"421","DOI":"10.1038\/nbt.4091","article-title":"Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors","volume":"36","author":"Haghverdi","year":"2018","journal-title":"Nat. Biotechnol."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Ji, X., Tsao, D., Bai, K., Tsao, M., Xing, L., and Zhang, X. (2023). scAnnotate: An automated cell-type annotation tool for single-cell RNA-sequencing data. Bioinform. Adv., 3.","DOI":"10.1093\/bioadv\/vbad030"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Lin, Y., Cao, Y., Kim, H.J., Salim, A., Speed, T.P., Lin, D.M., Yang, P., and Yang, J.Y.H. (2020). scClassify: Sample size estimation and multiscale classification of cells using single and multiple reference. Mol. Syst. 
Biol., 16.","DOI":"10.15252\/msb.20199389"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"207","DOI":"10.1016\/j.cels.2019.06.004","article-title":"SingleCellNet: A Computational Tool to Classify Single Cell RNA-Seq Data Across Platforms and Across Species","volume":"9","author":"Tan","year":"2019","journal-title":"Cell Syst."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Wagner, F., and Yanai, I. (2018). Moana: A robust and scalable cell type classification framework for single-cell RNA-Seq data. BioRxiv.","DOI":"10.1101\/456129"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"4696","DOI":"10.1093\/bioinformatics\/btz295","article-title":"LAmbDA: Label ambiguous domain adaptation dataset integration reduces batch effects and improves subtype detection","volume":"35","author":"Johnson","year":"2019","journal-title":"Bioinformatics"},{"key":"ref_31","unstructured":"LeCun, Y., and Bengio, Y. (1995). Convolutional networks for images, speech, and time series. The Handbook of Brain Theory and Neural Networks, MIT Press."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Khan, F., Ayoub, S., Gulzar, Y., Majid, M., Reegu, F.A., Mir, M.S., Soomro, A.B., and Elwasila, O. (2023). MRI-Based Effective Ensemble Frameworks for Predicting Human Brain Tumor. J. Imaging, 9.","DOI":"10.3390\/jimaging9080163"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Bao, S., Li, K., Yan, C., Zhang, Z., Qu, J., and Zhou, M. (2022). Deep learning-based advances and applications for single-cell RNA-sequencing data analysis. Brief. Bioinform., 23.","DOI":"10.1093\/bib\/bbab473"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Wani, S.A., Khan, S.A., and Quadri, S.M.K. (2023). scJVAE: A novel method for integrative analysis of multimodal single-cell data. Comput. Biol. 
Med., 158.","DOI":"10.1016\/j.compbiomed.2023.106865"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"2744","DOI":"10.1016\/j.procs.2023.01.246","article-title":"Evaluation of Computational Methods for Single Cell Multi-Omics Integration","volume":"218","author":"Wani","year":"2022","journal-title":"Procedia Comput. Sci."},{"key":"ref_36","first-page":"421","article-title":"Enhanced Transfer Learning Strategies for Effective Kidney Tumor Classification with CT Imaging","volume":"14","author":"Majid","year":"2023","journal-title":"Int. J. Adv. Comput. Sci. Appl."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Alquicira-Hernandez, J., Sathe, A., Ji, H.P., Nguyen, Q., and Powell, J.E. (2019). ScPred: Accurate supervised method for cell-type classification from single-cell RNA-seq data. Genome Biol., 20.","DOI":"10.1186\/s13059-019-1862-5"},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"e122","DOI":"10.1093\/nar\/gkab775","article-title":"ScDeepSort: A pre-trained cell-type annotation method for single-cell transcriptomics using deep learning with a weighted graph neural network","volume":"49","author":"Shao","year":"2021","journal-title":"Nucleic Acids Res."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"1462","DOI":"10.1038\/s41592-024-02235-4","article-title":"Assessing GPT-4 for cell type annotation in single-cell RNA-seq analysis","volume":"21","author":"Hou","year":"2024","journal-title":"Nat. Methods"},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"1470","DOI":"10.1038\/s41592-024-02201-0","article-title":"scGPT: Toward building a foundation model for single-cell multi-omics using generative AI","volume":"21","author":"Cui","year":"2024","journal-title":"Nat. Methods"},{"key":"ref_41","unstructured":"Devlin, J., Chang, M.W., Lee, K., and Toutanova, K. (2019, June 2\u20137). BERT: Pre-training of deep bidirectional transformers for language understanding. 
Proceedings of the NAACL HLT 2019\u20142019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies\u2014Proceedings of the Conference, Minneapolis, MN, USA."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"e95","DOI":"10.1093\/nar\/gkz543","article-title":"CHETAH: A selective, hierarchical cell type identification method for single-cell RNA sequencing","volume":"47","author":"Lijnzaad","year":"2019","journal-title":"Nucleic Acids Res."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Cao, X., Xing, L., Majd, E., He, H., Gu, J., and Zhang, X. (2022). A Systematic Evaluation of Supervised Machine Learning Algorithms for Cell Phenotype Classification Using Single-Cell RNA Sequencing Data. Front. Genet., 13.","DOI":"10.3389\/fgene.2022.836798"},{"key":"ref_44","unstructured":"Rokach, L., and Maimon, O. (2006). Decision Trees. Data Mining and Knowledge Discovery Handbook, Springer."},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"111","DOI":"10.1016\/j.inffus.2015.06.005","article-title":"Decision forest: Twenty years of research","volume":"27","author":"Rokach","year":"2016","journal-title":"Inf. Fusion"},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"i79","DOI":"10.1093\/bioinformatics\/bty260","article-title":"Random forest based similarity learning for single cell RNA sequencing data","volume":"34","author":"Pouyan","year":"2018","journal-title":"Bioinformatics"},{"key":"ref_47","first-page":"470","article-title":"Using Ensemble Learning and Advanced Data Mining Techniques to Improve the Diagnosis of Chronic Kidney Disease","volume":"14","author":"Majid","year":"2023","journal-title":"Int. J. Adv. Comput. Sci. Appl."},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Natekin, A., and Knoll, A. (2013). Gradient boosting machines, a tutorial. Front. 
Neurorobotics, 7.","DOI":"10.3389\/fnbot.2013.00021"},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Ding, S., Wang, D., Zhou, X., Chen, L., Feng, K., Xu, X., Huang, T., Li, Z., and Cai, Y. (2022). Predicting Heart Cell Types by Using Transcriptome Profiles and a Machine Learning Method. Life, 12.","DOI":"10.3390\/life12020228"},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"245","DOI":"10.1038\/s41587-021-01033-z","article-title":"Differential abundance testing on single-cell data using k-nearest neighbor graphs","volume":"40","author":"Dann","year":"2021","journal-title":"Nat. Biotechnol."},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"1649","DOI":"10.1038\/s41467-019-09639-3","article-title":"A Bayesian mixture model for clustering droplet-based single-cell transcriptomic data from population studies","volume":"10","author":"Sun","year":"2019","journal-title":"Nat. Commun."},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Khan, F., Gulzar, Y., Ayoub, S., Majid, M., Mir, M.S., and Soomro, A.B. (2023). Least square-support vector machine based brain tumor classification system with multi model texture features. Front. Appl. Math. Stat., 9.","DOI":"10.3389\/fams.2023.1324054"},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Saygili, G., and OzgodeYigin, B. (2023). Continual learning approaches for single cell RNA sequencing data. Sci. Rep., 13.","DOI":"10.1038\/s41598-023-42482-7"},{"key":"ref_54","doi-asserted-by":"crossref","unstructured":"Hosmer, D.W., Lemeshow, S., and Sturdivant, R.X. (2013). Applied Logistic Regression, John Wiley & Sons.","DOI":"10.1002\/9781118548387"},{"key":"ref_55","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, \u0141., and Polosukhin, I. (2017, December 4\u20139). Attention is all you need. 
Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA."},{"key":"ref_56","unstructured":"10X Genomics (2022, December 12). PBMC-Multiome. [Online]. Available online: https:\/\/www.10xgenomics.com\/resources\/datasets\/pbmc-from-a-healthy-donor-granulocytes-removed-through-cell-sorting-10-k-1-standard-1-0-0."},{"key":"ref_57","doi-asserted-by":"crossref","first-page":"75","DOI":"10.1038\/s41586-019-1404-z","article-title":"Neuronal vulnerability and multilineage diversity in multiple sclerosis","volume":"573","author":"Schirmer","year":"2019","journal-title":"Nature"},{"key":"ref_58","doi-asserted-by":"crossref","first-page":"14049","DOI":"10.1038\/ncomms14049","article-title":"Massively parallel digital transcriptional profiling of single cells","volume":"8","author":"Zheng","year":"2017","journal-title":"Nat. Commun."}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/18\/4\/232\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T17:17:01Z","timestamp":1760030221000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/18\/4\/232"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,4,18]]},"references-count":58,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2025,4]]}},"alternative-id":["a18040232"],"URL":"https:\/\/doi.org\/10.3390\/a18040232","relation":{},"ISSN":["1999-4893"],"issn-type":[{"value":"1999-4893","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,4,18]]}}}