{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,8]],"date-time":"2026-01-08T17:05:42Z","timestamp":1767891942557,"version":"3.49.0"},"reference-count":39,"publisher":"Oxford University Press (OUP)","issue":"10","license":[{"start":{"date-parts":[[2024,9,23]],"date-time":"2024-09-23T00:00:00Z","timestamp":1727049600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62372483"],"award-info":[{"award-number":["62372483"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,10,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Recent advances in sequencing technology provide opportunities to study biological processes at a higher resolution. Cell type annotation is an important step in scRNA-seq analysis, which often relies on established marker genes. However, most of the previous methods divide the identification of cell types into two stages, clustering and assignment, whose performances are susceptible to the clustering algorithm, and the marker information cannot effectively guide the clustering process. Furthermore, their linear heuristic-based cell assignment process is often insufficient to capture potential dependencies between cells and types.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>Here, we present Interpretable Cell Type Annotation based on self-training (sICTA), a marker-based cell type annotation method that combines the self-training strategy with pseudo-labeling and the nonlinear association capturing capability of Transformer. In addition, we incorporate biological priori knowledge of genes and pathways into the classifier through an attention mechanism to enhance the transparency of the model. A benchmark analysis on 11 publicly available single-cell datasets demonstrates the superiority of sICTA compared to state-of-the-art methods. The robustness of our method is further validated by evaluating the prediction accuracy of the model on different cell types for each single-cell data. Moreover, ablation studies show that self-training and the ability to capture potential dependencies between cells and cell types, both of which are mutually reinforcing, work together to improve model performance. Finally, we apply sICTA to the pancreatic dataset, exemplifying the interpretable attention matrix captured by sICTA.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>The source code of sICTA is available in public at https:\/\/github.com\/nbnbhwyy\/sICTA. The processed datasets can be found at https:\/\/drive.google.com\/drive\/folders\/1jbqSxacL_IDIZ4uPjq220C9Kv024m9eL. The final version of the model will be permanently available at https:\/\/doi.org\/10.5281\/zenodo.13474010<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btae569","type":"journal-article","created":{"date-parts":[[2024,9,23]],"date-time":"2024-09-23T19:07:56Z","timestamp":1727118476000},"source":"Crossref","is-referenced-by-count":1,"title":["A self-training interpretable cell type annotation framework using specific marker gene"],"prefix":"10.1093","volume":"40","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8338-3884","authenticated-orcid":false,"given":"Hegang","family":"Chen","sequence":"first","affiliation":[{"name":"School of Computer Science and Engineering, Sun Yat-sen University , Guangzhou 510006,","place":["China"]}]},{"given":"Yuyin","family":"Lu","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, Sun Yat-sen University , Guangzhou 510006,","place":["China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1610-9599","authenticated-orcid":false,"given":"Yanghui","family":"Rao","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, Sun Yat-sen University , Guangzhou 510006,","place":["China"]}]}],"member":"286","published-online":{"date-parts":[[2024,9,23]]},"reference":[{"key":"2024101822591182600_btae569-B1","doi-asserted-by":"crossref","first-page":"264","DOI":"10.1186\/s13059-019-1862-5","article-title":"scPred: accurate supervised method for cell-type classification from single-cell RNA-seq data","volume":"20","author":"Alquicira-Hernandez","year":"2019","journal-title":"Genome Biol"},{"key":"2024101822591182600_btae569-B2","doi-asserted-by":"crossref","first-page":"163","DOI":"10.1038\/s41590-018-0276-y","article-title":"Reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic macrophage","volume":"20","author":"Aran","year":"2019","journal-title":"Nat Immunol"},{"key":"2024101822591182600_btae569-B3","doi-asserted-by":"crossref","first-page":"P10008","DOI":"10.1088\/1742-5468\/2008\/10\/P10008","article-title":"Fast unfolding of communities in large networks","volume":"2008","author":"Blondel","year":"2008","journal-title":"J Stat Mech"},{"key":"2024101822591182600_btae569-B4","doi-asserted-by":"publisher","author":"Carnevale","year":"2019","DOI":"10.1101\/730960"},{"key":"2024101822591182600_btae569-B5","doi-asserted-by":"crossref","first-page":"223","DOI":"10.1038\/s41467-023-35923-4","article-title":"Transformer for one stop interpretable cell type annotation","volume":"14","author":"Chen","year":"2023","journal-title":"Nat Commun"},{"key":"2024101822591182600_btae569-B6","doi-asserted-by":"crossref","first-page":"1470","DOI":"10.1038\/s41592-024-02201-0","article-title":"scGPT: Toward building a foundation model for single-cell multi-omics using generative AI","volume":"21","author":"Cui","year":"2024","journal-title":"Nat Methods"},{"key":"2024101822591182600_btae569-B7","doi-asserted-by":"crossref","first-page":"eabl5197","DOI":"10.1126\/science.abl5197","article-title":"Cross-tissue immune cell analysis reveals tissue-specific features in humans","volume":"376","author":"Dom\u00ednguez Conde","year":"2022","journal-title":"Science"},{"key":"2024101822591182600_btae569-B8","doi-asserted-by":"crossref","first-page":"82","DOI":"10.1186\/s12864-018-5370-x","article-title":"Gene2vec: distributed representation of genes based on co-expression","volume":"20","author":"Du","year":"2019","journal-title":"BMC Genomics"},{"key":"2024101822591182600_btae569-B9","doi-asserted-by":"crossref","first-page":"810","DOI":"10.1016\/j.cell.2020.12.016","article-title":"Spatiotemporal analysis of human intestinal development at single-cell resolution","volume":"184","author":"Fawkner-Corbett","year":"2021","journal-title":"Cell"},{"key":"2024101822591182600_btae569-B10","doi-asserted-by":"crossref","first-page":"baz046","DOI":"10.1093\/database\/baz046","article-title":"PanglaoDB: a web server for exploration of mouse and human single-cell RNA sequencing data","volume":"2019","author":"Franz\u00e9n","year":"2019","journal-title":"Database"},{"key":"2024101822591182600_btae569-B11","doi-asserted-by":"crossref","first-page":"69","DOI":"10.1186\/s13059-021-02281-7","article-title":"scSorter: assigning cells to known cell types according to marker genes","volume":"22","author":"Guo","year":"2021","journal-title":"Genome Biol"},{"key":"2024101822591182600_btae569-B12","doi-asserted-by":"crossref","first-page":"321","DOI":"10.1016\/B978-0-12-800101-1.00010-7","article-title":"Regulation of pancreatic islet beta-cell mass by growth factor and hormone signaling","volume":"121","author":"Huang","year":"2014","journal-title":"Prog Mol Biol Transl Sci"},{"key":"2024101822591182600_btae569-B13","doi-asserted-by":"crossref","first-page":"193","DOI":"10.1007\/BF01908075","article-title":"Comparing partitions","volume":"2","author":"Hubert","year":"1985","journal-title":"J Classif"},{"key":"2024101822591182600_btae569-B14","doi-asserted-by":"crossref","first-page":"1246","DOI":"10.1038\/s41467-022-28803-w","article-title":"Fully-automated and ultra-fast cell-type identification using specific marker combinations from single-cell transcriptomic data","volume":"13","author":"Ianevski","year":"2022","journal-title":"Nat Commun"},{"key":"2024101822591182600_btae569-B15","doi-asserted-by":"crossref","first-page":"bbad266","DOI":"10.1093\/bib\/bbad266","article-title":"scDeepInsight: a supervised cell-type identification method for scRNA-seq data with deep learning","volume":"24","author":"Jia","year":"2023","journal-title":"Brief Bioinform"},{"key":"2024101822591182600_btae569-B16","doi-asserted-by":"crossref","first-page":"3120","DOI":"10.1016\/j.csbj.2022.06.010","article-title":"MarkerCount: a stable, count-based cell type identifier for single-cell RNA-seq experiments","volume":"20","author":"Kim","year":"2022","journal-title":"Comput Struct Biotechnol J"},{"key":"2024101822591182600_btae569-B17","doi-asserted-by":"crossref","first-page":"359","DOI":"10.1038\/nmeth.4644","article-title":"scmap: projection of single-cell RNA-seq data across data sets","volume":"15","author":"Kiselev","year":"2018","journal-title":"Nat Methods"},{"key":"2024101822591182600_btae569-B18","doi-asserted-by":"crossref","first-page":"e1001143","DOI":"10.1371\/journal.pbio.1001143","article-title":"Pancreatic mesenchyme regulates epithelial organogenesis throughout development","volume":"9","author":"Landsman","year":"2011","journal-title":"PLoS Biol"},{"key":"2024101822591182600_btae569-B19","first-page":"896","author":"Lee","year":"2013"},{"key":"2024101822591182600_btae569-B20","doi-asserted-by":"crossref","first-page":"bbad006","DOI":"10.1093\/bib\/bbad006","article-title":"Hierarchical cell-type identifier accurately distinguishes immune-cell subtypes enabling precise profiling of tissue microenvironment with single-cell RNA-sequencing","volume":"24","author":"Lee","year":"2023","journal-title":"Brief Bioinform"},{"key":"2024101822591182600_btae569-B21","doi-asserted-by":"crossref","first-page":"btad360","DOI":"10.1093\/bioinformatics\/btad360","article-title":"Consensus label propagation with graph convolutional networks for single-cell RNA sequencing cell type annotation","volume":"39","author":"Lewinsohn","year":"2023","journal-title":"Bioinformatics"},{"key":"2024101822591182600_btae569-B22","doi-asserted-by":"crossref","first-page":"627","DOI":"10.1038\/s41586-021-03372-y","article-title":"Modelling human blastocysts by reprogramming fibroblasts into iBlastoids","volume":"591","author":"Liu","year":"2021","journal-title":"Nature"},{"key":"2024101822591182600_btae569-B23","doi-asserted-by":"crossref","first-page":"743","DOI":"10.1038\/nmeth.2069","article-title":"Single-cell systems biology by super-resolution imaging and combinatorial labeling","volume":"9","author":"Lubeck","year":"2012","journal-title":"Nat Methods"},{"key":"2024101822591182600_btae569-B24","doi-asserted-by":"crossref","first-page":"42214","DOI":"10.1038\/srep42214","article-title":"Novel flow cytometry approach to identify bronchial epithelial cells from healthy human airways","volume":"7","author":"Maestre-Batlle","year":"2017","journal-title":"Sci Rep"},{"key":"2024101822591182600_btae569-B25","first-page":"983","author":"Meng","year":"2018"},{"key":"2024101822591182600_btae569-B26","doi-asserted-by":"crossref","first-page":"1142","DOI":"10.1038\/s42003-022-04093-2","article-title":"Multi-level cellular and functional annotation of single-cell transcriptomes using scPipeline","volume":"5","author":"Mikolajewicz","year":"2022","journal-title":"Commun Biol"},{"key":"2024101822591182600_btae569-B27","first-page":"4855","author":"Mrabah","year":"2023"},{"key":"2024101822591182600_btae569-B28","doi-asserted-by":"crossref","first-page":"44","DOI":"10.1186\/s12859-022-04574-5","article-title":"scAnnotatR: framework to accurately classify cell types in single-cell RNA-sequencing data","volume":"23","author":"Nguyen","year":"2022","journal-title":"BMC Bioinformatics"},{"key":"2024101822591182600_btae569-B29","doi-asserted-by":"crossref","first-page":"177","DOI":"10.1016\/j.freeradbiomed.2005.04.030","article-title":"Diabetes, glucose toxicity, and oxidative stress: a case of double jeopardy for the pancreatic islet \u03b2 cell","volume":"41","author":"Paul Robertson","year":"2006","journal-title":"Free Radic Biol Med"},{"key":"2024101822591182600_btae569-B30","first-page":"2825","article-title":"scikit-learn: machine learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J Mach Learn Res"},{"key":"2024101822591182600_btae569-B31","doi-asserted-by":"crossref","first-page":"983","DOI":"10.1038\/s41592-019-0535-3","article-title":"Supervised classification enables rapid annotation of cell atlases","volume":"16","author":"Pliner","year":"2019","journal-title":"Nat Methods"},{"key":"2024101822591182600_btae569-B32","author":"Rosenberg","year":"2005"},{"key":"2024101822591182600_btae569-B33","doi-asserted-by":"crossref","first-page":"100882","DOI":"10.1016\/j.isci.2020.100882","article-title":"scCATCH: automatic annotation on cell types of clusters from single-cell RNA sequencing data","volume":"23","author":"Shao","year":"2020","journal-title":"Iscience"},{"key":"2024101822591182600_btae569-B34","doi-asserted-by":"crossref","first-page":"15545","DOI":"10.1073\/pnas.0506580102","article-title":"Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles","volume":"102","author":"Subramanian","year":"2005","journal-title":"Proc Natl Acad Sci U S A"},{"key":"2024101822591182600_btae569-B35","doi-asserted-by":"crossref","first-page":"616","DOI":"10.1038\/s41586-023-06139-9","article-title":"Transfer learning enables predictions in network biology","volume":"618","author":"Theodoris","year":"2023","journal-title":"Nature"},{"key":"2024101822591182600_btae569-B36","doi-asserted-by":"crossref","first-page":"bbad195","DOI":"10.1093\/bib\/bbad195","article-title":"CiForm as a transformer-based model for cell-type annotation of large-scale single-cell RNA-seq data","volume":"24","author":"Xu","year":"2023","journal-title":"Brief Bioinform"},{"key":"2024101822591182600_btae569-B37","doi-asserted-by":"crossref","first-page":"1138","DOI":"10.1126\/science.aaa1934","article-title":"Cell types in the mouse cortex and hippocampus revealed by single-cell RNA-seq","volume":"347","author":"Zeisel","year":"2015","journal-title":"Science"},{"key":"2024101822591182600_btae569-B38","doi-asserted-by":"crossref","first-page":"D721","DOI":"10.1093\/nar\/gky900","article-title":"CellMarker: a manually curated resource of cell markers in human and mouse","volume":"47","author":"Zhang","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2024101822591182600_btae569-B39","doi-asserted-by":"crossref","first-page":"531","DOI":"10.3390\/genes10070531","article-title":"SCINA: a semi-supervised subtyping algorithm of single cells and bulk samples","volume":"10","author":"Zhang","year":"2019","journal-title":"Genes (Basel)"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btae569\/59253304\/btae569.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/10\/btae569\/59882471\/btae569.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/10\/btae569\/59882471\/btae569.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,10,18]],"date-time":"2024-10-18T22:59:33Z","timestamp":1729292373000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btae569\/7766196"}},"subtitle":[],"editor":[{"given":"Anthony","family":"Mathelier","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2024,9,23]]},"references-count":39,"journal-issue":{"issue":"10","published-print":{"date-parts":[[2024,10,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btae569","relation":{},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024,10]]},"published":{"date-parts":[[2024,9,23]]},"article-number":"btae569"}}