{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,5]],"date-time":"2026-01-05T15:10:03Z","timestamp":1767625803270,"version":"3.41.2"},"reference-count":44,"publisher":"Oxford University Press (OUP)","issue":"1","license":[{"start":{"date-parts":[[2022,12,31]],"date-time":"2022-12-31T00:00:00Z","timestamp":1672444800000},"content-version":"vor","delay-in-days":1,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100012118","name":"Ontario Institute for Cancer Research","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100012118","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Genome Canada and Ontario Genomics","award":["OGI-167"],"award-info":[{"award-number":["OGI-167"]}]},{"DOI":"10.13039\/501100000038","name":"Natural Sciences and Engineering Research Council of Canada","doi-asserted-by":"publisher","award":["DGECR-2021-00298","R-20-303"],"award-info":[{"award-number":["DGECR-2021-00298","R-20-303"]}],"id":[{"id":"10.13039\/501100000038","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023,1,19]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Single-cell RNA sequencing (scRNA-seq) clustering and labelling methods are used to determine precise cellular composition of tissue samples. Automated labelling methods rely on either unsupervised, cluster-based approaches or supervised, cell-based approaches to identify cell types. The high complexity of cancer poses a unique challenge, as tumor microenvironments are often composed of diverse cell subpopulations with unique functional effects that may lead to disease progression, metastasis and treatment resistance. Here, we assess 17 cell-based and 9 cluster-based scRNA-seq labelling algorithms using 8 cancer datasets, providing a comprehensive large-scale assessment of such methods in a cancer-specific context. Using several performance metrics, we show that cell-based methods generally achieved higher performance and were faster compared to cluster-based methods. Cluster-based methods more successfully labelled non-malignant cell types, likely because of a lack of gene signatures for relevant malignant cell subpopulations. Larger cell numbers present in some cell types in training data positively impacted prediction scores for cell-based methods. Finally, we examined which methods performed favorably when trained and tested on separate patient cohorts in scenarios similar to clinical applications, and which were able to accurately label particularly small or under-represented cell populations in the given datasets. We conclude that scPred and SVM show the best overall performances with cancer-specific data and provide further suggestions for algorithm selection. Our analysis pipeline for assessing the performance of cell type labelling algorithms is available in https:\/\/github.com\/shooshtarilab\/scRNAseq-Automated-Cell-Type-Labelling.<\/jats:p>","DOI":"10.1093\/bib\/bbac561","type":"journal-article","created":{"date-parts":[[2022,12,31]],"date-time":"2022-12-31T06:22:55Z","timestamp":1672467775000},"source":"Crossref","is-referenced-by-count":10,"title":["Evaluation of single-cell RNAseq labelling algorithms using cancer datasets"],"prefix":"10.1093","volume":"24","author":[{"given":"Erik","family":"Christensen","sequence":"first","affiliation":[{"name":"University of Western Ontario Department of Computer Science, , London, ON , Canada"},{"name":"Children\u2019s Health Research Institute, Lawson Research Institute , London, ON , Canada"}]},{"given":"Ping","family":"Luo","sequence":"additional","affiliation":[{"name":"Princess Margaret Cancer Centre, University Health Network , Toronto, ON , Canada"}]},{"given":"Andrei","family":"Turinsky","sequence":"additional","affiliation":[{"name":"Centre for Computational Medicine, The Hospital for Sick Children , Toronto, ON , Canada"}]},{"given":"Mia","family":"Husi\u0107","sequence":"additional","affiliation":[{"name":"Centre for Computational Medicine, The Hospital for Sick Children , Toronto, ON , Canada"}]},{"given":"Alaina","family":"Mahalanabis","sequence":"additional","affiliation":[{"name":"Centre for Computational Medicine, The Hospital for Sick Children , Toronto, ON , Canada"}]},{"given":"Alaine","family":"Naidas","sequence":"additional","affiliation":[{"name":"Children\u2019s Health Research Institute, Lawson Research Institute , London, ON , Canada"},{"name":"University of Western Ontario Department of Pathology and Lab Medicine, , London, ON , Canada"}]},{"given":"Juan Javier","family":"Diaz-Mejia","sequence":"additional","affiliation":[{"name":"Princess Margaret Cancer Centre, University Health Network , Toronto, ON , Canada"}]},{"given":"Michael","family":"Brudno","sequence":"additional","affiliation":[{"name":"University of Toronto Department of Computer Science, , Toronto, ON , Canada"}]},{"given":"Trevor","family":"Pugh","sequence":"additional","affiliation":[{"name":"Princess Margaret Cancer Centre, University Health Network , Toronto, ON , Canada"},{"name":"Ontario Institute for Cancer Research , Toronto, ON , Canada"},{"name":"University of Toronto Department of Medical Biophysics, , Toronto, ON , Canada"}]},{"given":"Arun","family":"Ramani","sequence":"additional","affiliation":[{"name":"Centre for Computational Medicine, The Hospital for Sick Children , Toronto, ON , Canada"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2273-1034","authenticated-orcid":false,"given":"Parisa","family":"Shooshtari","sequence":"additional","affiliation":[{"name":"University of Western Ontario Department of Computer Science, , London, ON , Canada"},{"name":"Children\u2019s Health Research Institute, Lawson Research Institute , London, ON , Canada"},{"name":"University of Western Ontario Department of Pathology and Lab Medicine, , London, ON , Canada"},{"name":"Ontario Institute for Cancer Research , Toronto, ON , Canada"}]}],"member":"286","published-online":{"date-parts":[[2022,12,30]]},"reference":[{"key":"2023011917082432600_ref1","doi-asserted-by":"crossref","first-page":"646","DOI":"10.1016\/j.cell.2011.02.013","article-title":"Hallmarks of cancer: the next generation","volume":"144","author":"Hanahan","year":"2011","journal-title":"Cell"},{"key":"2023011917082432600_ref2","doi-asserted-by":"crossref","first-page":"239","DOI":"10.1038\/nrc2618","article-title":"Microenvironmental regulation of metastasis","volume":"9","author":"Joyce","year":"2009","journal-title":"Nat Rev Cancer"},{"key":"2023011917082432600_ref3","doi-asserted-by":"crossref","first-page":"1349","DOI":"10.1038\/s41556-018-0236-7","article-title":"Tumour heterogeneity and metastasis at single-cell resolution","volume":"20","author":"Lawson","year":"2018","journal-title":"Nat Cell Biol"},{"key":"2023011917082432600_ref4","doi-asserted-by":"crossref","first-page":"189","DOI":"10.1126\/science.aad0501","article-title":"Dissecting the multicellular ecosystem of metastatic melanoma by single-cell RNA-seq","volume":"352","author":"Tirosh","year":"2016","journal-title":"Science"},{"key":"2023011917082432600_ref5","doi-asserted-by":"crossref","first-page":"328","DOI":"10.1038\/nature12624","article-title":"Tumour heterogeneity and cancer cell plasticity","volume":"501","author":"Meacham","year":"2013","journal-title":"Nature"},{"key":"2023011917082432600_ref6","doi-asserted-by":"crossref","first-page":"514","DOI":"10.1016\/j.celrep.2013.12.041","article-title":"Inference of tumor evolution during chemotherapy by computational modeling and in situ analysis of cellular diversity for genetic and phenotypic features","volume":"6","author":"Almendro","year":"2014","journal-title":"Cell Rep"},{"key":"2023011917082432600_ref7","doi-asserted-by":"crossref","first-page":"457","DOI":"10.1038\/s41586-018-0024-3","article-title":"Intra-tumour diversification in colorectal cancer at the single-cell level","volume":"556","author":"Roerink","year":"2018","journal-title":"Nature"},{"key":"2023011917082432600_ref8","doi-asserted-by":"crossref","first-page":"986","DOI":"10.1038\/s41591-018-0078-7","article-title":"Single-cell profiling of breast cancer T cells reveals a tissue-resident memory subset associated with improved prognosis","volume":"24","author":"Savas","year":"2018","journal-title":"Nat Med"},{"key":"2023011917082432600_ref9","doi-asserted-by":"crossref","first-page":"194","DOI":"10.1186\/s13059-019-1795-z","article-title":"A comparison of automatic cell identification methods for single-cell RNA sequencing data","volume":"20","author":"Abdelaal","year":"2019","journal-title":"Genome Biol"},{"key":"2023011917082432600_ref10","doi-asserted-by":"crossref","first-page":"100882","DOI":"10.1016\/j.isci.2020.100882","article-title":"scCATCH: automatic annotation on cell types of clusters from single-cell RNA sequencing data","volume":"23","author":"Shao","year":"2020","journal-title":"iScience"},{"key":"2023011917082432600_ref11","doi-asserted-by":"crossref","first-page":"453","DOI":"10.1038\/nmeth.3337","article-title":"Robust enumeration of cell subsets from tissue expression profiles","volume":"12","author":"Newman","year":"2015","journal-title":"Nat Methods"},{"key":"2023011917082432600_ref12","doi-asserted-by":"crossref","first-page":"7","DOI":"10.1186\/1471-2105-14-7","article-title":"GSVA: gene set variation analysis for microarray and RNA-seq data","volume":"14","author":"H\u00e4nzelmann","year":"2013","journal-title":"BMC Bioinform"},{"key":"2023011917082432600_ref13","doi-asserted-by":"crossref","DOI":"10.12688\/f1000research.18490.1","article-title":"Evaluation of methods to assign cell type labels to cell clusters from single-cell RNA-sequencing data","volume":"8","author":"Diaz-Mejia","year":"2019","journal-title":"F1000 Res"},{"key":"2023011917082432600_ref14","doi-asserted-by":"crossref","first-page":"e95","DOI":"10.1093\/nar\/gkz543","article-title":"CHETAH: a selective, hierarchical cell type identification method for single-cell RNA sequencing","volume":"47","author":"Kanter","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2023011917082432600_ref15","doi-asserted-by":"crossref","first-page":"745","DOI":"10.1038\/s41422-020-0355-0","article-title":"A pan-cancer blueprint of the heterogeneous tumor microenvironment revealed by single-cell profiling","volume":"30","author":"Qian","year":"2020","journal-title":"Cell Res"},{"key":"2023011917082432600_ref16","doi-asserted-by":"crossref","first-page":"bbab035","DOI":"10.1093\/bib\/bbab035","article-title":"Evaluation of machine learning approaches for cell-type identification from single-cell transcriptomics data","volume":"22","author":"Huang","year":"2021","journal-title":"Brief Bioinform"},{"key":"2023011917082432600_ref17","doi-asserted-by":"crossref","first-page":"1581","DOI":"10.1093\/bib\/bbz096","article-title":"Evaluation of single-cell classifiers for single-cell RNA sequencing data sets","volume":"21","author":"Zhao","year":"2020","journal-title":"Brief Bioinform"},{"key":"2023011917082432600_ref18","doi-asserted-by":"crossref","first-page":"W372","DOI":"10.1093\/nar\/gkaa437","article-title":"CReSCENT: cancer single cell expression toolkit","volume":"48","author":"Mohanraj","year":"2020","journal-title":"Nucleic Acids Res"},{"issue":"2","key":"2023011917082432600_ops-bib-reference-albcnxfjcfkjrou0","doi-asserted-by":"crossref","first-page":"533","DOI":"10.1093\/bioinformatics\/btz592","article-title":"ACTINN: automated identification of cell types in single cell RNA sequencing","volume":"36","author":"Ma","year":"2020","journal-title":"Bioinformatics"},{"issue":"10","key":"2023011917082432600_ops-bib-reference-olbcny7vtanhbh3v","doi-asserted-by":"crossref","DOI":"10.1371\/journal.pone.0205499","article-title":"CaSTLe\u2013classification of single cells by transfer learning: harnessing the power of publicly available single cell RNA sequencing experiments to annotate new experiments","volume":"13","author":"Lieberman","year":"2018","journal-title":"PloS one"},{"issue":"1","key":"2023011917082432600_ops-bib-reference-mlbcnyzfw6kbfx2s","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41467-020-17281-7","article-title":"Searching large-scale scRNA-seq databases via unbiased cell embedding with Cell BLAST","volume":"11","author":"Cao","year":"2020","journal-title":"Nat Commun"},{"key":"2023011917082432600_ref19","first-page":"2825","article-title":"Scikit-learn: machine learning in python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J Mach Learn Res"},{"issue":"22","key":"2023011917082432600_ops-bib-reference-clbco0u7a4dg07gb","doi-asserted-by":"crossref","first-page":"4696","DOI":"10.1093\/bioinformatics\/btz295","article-title":"LAmbDA: label ambiguous domain adaptation dataset integration reduces batch effects and improves subtype detection","volume":"35","author":"Johnson","year":"2019","journal-title":"Bioinformatics"},{"key":"2023011917082432600_ops-bib-reference-xlbco77x8co5o6qf","article-title":"scID uses discriminant analysis to identify transcriptionally equivalent cell types across single-cell RNA-seq data with batch effect","volume":"23","author":"Boufea","journal-title":"IScience"},{"issue":"5","key":"2023011917082432600_ops-bib-reference-ilbco7vi0eenryxh","doi-asserted-by":"crossref","first-page":"359","DOI":"10.1038\/nmeth.4644","article-title":"scmap: projection of single-cell RNA-seq data across data sets","volume":"15","author":"Kiselev","year":"2018","journal-title":"Nat Methods"},{"issue":"1","key":"2023011917082432600_ops-bib-reference-plbco8jog9eji8rk","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13059-019-1862-5","article-title":"scPred: accurate supervised method for cell-type classification from single-cell RNA-seq data","volume":"20","author":"Alquicira-Hernandez","year":"2019","journal-title":"Genome Biol"},{"issue":"1","key":"2023011917082432600_ops-bib-reference-qlbco96umijmhvix","doi-asserted-by":"crossref","first-page":"e9620","DOI":"10.15252\/msb.20209620","article-title":"Probabilistic harmonization and annotation of single-cell transcriptomics data with deep generative models","volume":"17","author":"Xu","year":"2021","journal-title":"Mol Syst Biol"},{"issue":"2","key":"2023011917082432600_ops-bib-reference-rlbco9wu25s5jbh9","doi-asserted-by":"crossref","first-page":"207","DOI":"10.1016\/j.cels.2019.06.004","article-title":"SingleCellNet: a computational tool to classify single cell RNA-Seq data across platforms and across species","volume":"9","author":"Tan","year":"2019","journal-title":"Cell Syst"},{"issue":"2","key":"2023011917082432600_ops-bib-reference-ulbcoam3fvm06ywq","doi-asserted-by":"crossref","first-page":"163","DOI":"10.1038\/s41590-018-0276-y","article-title":"Reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic macrophage","volume":"20","author":"Aran","year":"2019","journal-title":"Nat Immunol"},{"issue":"12","key":"2023011917082432600_ops-bib-reference-plbcod92fskh2orm","doi-asserted-by":"crossref","first-page":"3910","DOI":"10.1093\/bioinformatics\/btaa269","article-title":"Bj\u00f6rkegren JL. alona: a web server for single-cell RNA-seq analysis","volume":"36","author":"Franz\u00e9n","year":"2020","journal-title":"Bioinformatics"},{"key":"2023011917082432600_ops-bib-reference-glbcob7hi9ksw14x","first-page":"15545","article-title":"Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles","volume-title":"Proceedings of the National Academy of Sciences","author":"Subramanian","year":"2005"},{"issue":"1","key":"2023011917082432600_ops-bib-reference-jlbcobwfnh9tketd","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41467-018-03282-0","article-title":"Characterizing the replicability of cell types defined by single cell RNA-sequencing data using MetaNeighbor","volume":"9","author":"Crow","year":"2018","journal-title":"Nat Commun"},{"issue":"8","key":"2023011917082432600_ops-bib-reference-albcock7c7wo7reh","doi-asserted-by":"crossref","first-page":"980","DOI":"10.1093\/bioinformatics\/btm051","article-title":"Analyzing gene expression data in terms of gene sets: methodological issues","volume":"23","author":"Goeman","year":"2007","journal-title":"Bioinformatics"},{"issue":"9","key":"2023011917082432600_ref20","doi-asserted-by":"crossref","first-page":"e0272302","DOI":"10.1371\/journal.pone.0272302","article-title":"TMExplorer: a tumour microenvironment single-cell RNAseq database and search tool","volume":"17","author":"Christensen","journal-title":"Plos One"},{"key":"2023011917082432600_ref21","doi-asserted-by":"crossref","first-page":"15081","DOI":"10.1038\/ncomms15081","article-title":"Single-cell RNA-seq enables comprehensive tumour and immune cell profiling in primary breast cancer","volume":"8","author":"Chung","year":"2017","journal-title":"Nat Commun"},{"key":"2023011917082432600_ref22","doi-asserted-by":"crossref","first-page":"708","DOI":"10.1038\/ng.3818","article-title":"Reference component analysis of single-cell transcriptomes elucidates cellular heterogeneity in human colorectal tumors","volume":"49","author":"Li","year":"2017","journal-title":"Nat Genet"},{"key":"2023011917082432600_ref23","doi-asserted-by":"crossref","first-page":"1399","DOI":"10.1016\/j.celrep.2017.10.030","article-title":"Single-cell RNA-seq analysis of infiltrating neoplastic cells at the migrating front of human glioblastoma","volume":"21","author":"Darmanis","year":"2017","journal-title":"Cell Rep"},{"key":"2023011917082432600_ref24","doi-asserted-by":"crossref","first-page":"984","DOI":"10.1016\/j.cell.2018.09.006","article-title":"A cancer cell program promotes T cell exclusion and resistance to checkpoint blockade","volume":"175","author":"Jerby-Arnon","year":"2018","journal-title":"Cell"},{"key":"2023011917082432600_ref25","doi-asserted-by":"crossref","first-page":"1277","DOI":"10.1038\/s41591-018-0096-5","article-title":"Phenotype molding of stromal cells in the lung tumor microenvironment","volume":"24","author":"Lambrechts","year":"2018","journal-title":"Nat Med"},{"key":"2023011917082432600_ref26","doi-asserted-by":"crossref","first-page":"725","DOI":"10.1038\/s41422-019-0195-y","article-title":"Single-cell RNA-seq highlights intra-tumoral heterogeneity and malignant progression in pancreatic ductal adenocarcinoma","volume":"29","author":"Peng","year":"2019","journal-title":"Cell Res"},{"key":"2023011917082432600_ref27","doi-asserted-by":"crossref","first-page":"1265","DOI":"10.1016\/j.cell.2019.01.031","article-title":"Single-cell RNA-seq reveals AML hierarchies relevant to disease progression and immunity","volume":"176","author":"Galen","year":"2019","journal-title":"Cell"},{"key":"2023011917082432600_ref28","doi-asserted-by":"crossref","first-page":"W537","DOI":"10.1093\/nar\/gky379","article-title":"The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2018 update","volume":"46","author":"Afgan","year":"2018","journal-title":"Nucleic Acids Res"},{"key":"2023011917082432600_ref29","doi-asserted-by":"crossref","first-page":"1888","DOI":"10.1016\/j.cell.2019.05.031","article-title":"Comprehensive integration of single-cell data","volume":"177","author":"Stuart","year":"2019","journal-title":"Cell"},{"key":"2023011917082432600_ref30","doi-asserted-by":"crossref","first-page":"R21","DOI":"10.1186\/gb-2005-6-2-r21","article-title":"An ontology for cell types","volume":"6","author":"Bard","year":"2005","journal-title":"Genome Biol"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/24\/1\/bbac561\/48782043\/bbac561.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/24\/1\/bbac561\/48782043\/bbac561.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,3,18]],"date-time":"2023-03-18T07:37:36Z","timestamp":1679125056000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbac561\/6965910"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,12,30]]},"references-count":44,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2023,1,19]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbac561","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"type":"print","value":"1467-5463"},{"type":"electronic","value":"1477-4054"}],"subject":[],"published-other":{"date-parts":[[2023,1]]},"published":{"date-parts":[[2022,12,30]]},"article-number":"bbac561"}}