{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,17]],"date-time":"2025-11-17T06:29:52Z","timestamp":1763360992844,"version":"3.45.0"},"reference-count":36,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2025,11,17]],"date-time":"2025-11-17T00:00:00Z","timestamp":1763337600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Artif. Intell."],"abstract":"<jats:sec>\n                    <jats:title>Objective<\/jats:title>\n                    <jats:p>The precise identification of human cell types and their intricate interactions is of fundamental importance in biological research. Confronted with the challenges inherent in manual cell type annotation from the high-dimensional molecular feature data generated by single-cell RNA sequencing (scRNA-seq)\u2014a technology that has otherwise opened new avenues for such explorations\u2014this study aimed to develop and evaluate a robust, large-scale pre-trained model designed for automated cell type classification, with a focus on major cell categories in this initial study.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Methods<\/jats:title>\n                    <jats:p>A novel methodology for cell type classification, named scReformer-BERT, was developed, leveraging a BERT (Bidirectional Encoder Representations from Transformers) architecture that integrates Reformer encoders. This framework was subjected to extensive self-supervised pre-training on substantial scRNA-seq datasets, after which supervised fine-tuning and rigorous five-fold cross-validation was performed to optimize the model for predictive accuracy on targeted first-tier cell type classification tasks. A comprehensive ablation study was also conducted to dissect the contributions of each architectural component, and SHAP (SHapley Additive exPlanations) analysis was used to interpret the model\u2019s decisions.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>The performance of the proposed model was rigorously evaluated through a series of experiments. These evaluations, conducted on scRNA-seq data, consistently revealed the superior efficacy of our approach in accurately classifying major cell categories when compared against several established baseline methods and the inherent difficulties in the field.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Conclusion<\/jats:title>\n                    <jats:p>Considering these outcomes, the developed large-scale pre-trained model, which synergizes Reformer encoders with a BERT architecture, presents a potent, effective and interpretable solution for automated cell type classification derived from scRNA-seq data. Its notable performance suggests considerable utility in improving both the efficiency and precision of cellular identification in high-throughput genomic investigations.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.3389\/frai.2025.1661318","type":"journal-article","created":{"date-parts":[[2025,11,17]],"date-time":"2025-11-17T06:26:09Z","timestamp":1763360769000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Scaling transformers to high-dimensional sparse data: a Reformer-BERT approach for large-scale classification"],"prefix":"10.3389","volume":"8","author":[{"given":"Wanxuan","family":"Li","sequence":"first","affiliation":[]},{"given":"Xinhua","family":"Li","sequence":"additional","affiliation":[]},{"given":"Weihang","family":"Guo","sequence":"additional","affiliation":[]},{"given":"Boyuan","family":"Gu","sequence":"additional","affiliation":[]},{"given":"Jianjun","family":"Du","sequence":"additional","affiliation":[]},{"given":"Ning","family":"Chi","sequence":"additional","affiliation":[]},{"given":"Dan","family":"Shao","sequence":"additional","affiliation":[]},{"given":"Kai","family":"Xiao","sequence":"additional","affiliation":[]},{"given":"Ren","family":"Mo","sequence":"additional","affiliation":[]}],"member":"1965","published-online":{"date-parts":[[2025,11,17]]},"reference":[{"key":"ref1","doi-asserted-by":"publisher","first-page":"264","DOI":"10.1186\/s13059-019-1862-5","article-title":"scPred: accurate supervised method for cell-type classification from single-cell RNA-seq data","volume":"20","author":"Alquicira-Hernandez","year":"2019","journal-title":"Genome Biol."},{"key":"ref2","article-title":"Longformer: the long-document transformer","author":"Beltagy","year":"2020"},{"key":"ref3","article-title":"A human ensemble cell atlas (hECA) enables in data cell sorting","author":"Chen","year":"2021"},{"key":"ref4","doi-asserted-by":"publisher","first-page":"14540","DOI":"10.1109\/ACCESS.2021.3052923","article-title":"Cell subtype classification via representation learning based on a denoising autoencoder for single-cell RNA sequencing","volume":"9","author":"Choi","year":"2021","journal-title":"IEEE Access"},{"key":"ref5","first-page":"1","article-title":"Rethinking attention with performers","author":"Choromanski","year":"2021"},{"key":"ref6","doi-asserted-by":"crossref","DOI":"10.18653\/v1\/P19-1285","article-title":"Transformer-XL: attentive language models beyond a fixed-length context","author":"Dai","year":"2019"},{"key":"ref8","article-title":"BERT: pre-training of deep bidirectional transformers for language understanding","author":"Devlin","year":"2019"},{"key":"ref9","doi-asserted-by":"publisher","first-page":"82","DOI":"10.1186\/s12864-018-5370-x","article-title":"Gene2vec: distributed representation of genes based on co-expression","volume":"20","author":"Du","year":"2019","journal-title":"BMC Genomics"},{"key":"ref10","doi-asserted-by":"publisher","DOI":"10.2174\/0113892037357900250401020207","article-title":"Messenger RNA nanomedicine: innovations and future directions","volume":"26","author":"Dwivedi","year":"2025","journal-title":"Curr. Protein Pept. Sci."},{"key":"ref11","doi-asserted-by":"publisher","first-page":"607","DOI":"10.1186\/s12885-024-12331-5","article-title":"Reference-free inferring of transcriptomic events in cancer cells on single-cell data","volume":"24","author":"Eralp","year":"2024","journal-title":"BMC Cancer"},{"key":"ref12","doi-asserted-by":"publisher","first-page":"251","DOI":"10.1038\/nature14966","article-title":"Single-cell messenger RNA sequencing reveals rare intestinal cell types","volume":"525","author":"Gr\u00fcn","year":"2015","journal-title":"Nature"},{"key":"ref13","volume-title":"Micrographia, or, some physiological descriptions of minute bodies made by magnifying glasses: With observations and inquiries thereupon","author":"Hooke","year":"1665"},{"key":"ref14","doi-asserted-by":"publisher","first-page":"607","DOI":"10.1038\/s42256-020-00233-7","article-title":"Iterative transfer learning with neural network for clustering and cell type classification in single-cell RNA-seq analysis","volume":"2","author":"Hu","year":"2020","journal-title":"Nat Mach Intell"},{"key":"ref15","article-title":"Reformer: the efficient transformer","author":"Kitaev","year":"2020"},{"key":"ref16","doi-asserted-by":"publisher","first-page":"e156","DOI":"10.1093\/nar\/gkx681","article-title":"Using neural networks for reducing the dimensions of single-cell RNA-seq data","volume":"45","author":"Lin","year":"2017","journal-title":"Nucleic Acids Res."},{"key":"ref17","doi-asserted-by":"publisher","first-page":"466","DOI":"10.1038\/s41586-020-2797-4","article-title":"Cells of the adult human heart","volume":"588","author":"Litvi\u0148ukov\u00e1","year":"2020","journal-title":"Nature"},{"key":"ref18","doi-asserted-by":"publisher","first-page":"4765","DOI":"10.48550\/arXiv.1705.07874","article-title":"A unified approach to interpreting model predictions","volume":"30","author":"Lundberg","year":"2017","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref19","article-title":"Cellxgene: a performant, scalable exploration platform for single cell data","author":"Megill","year":"2021"},{"key":"ref20","article-title":"Cosformer: rethinking softmax in attention","author":"Qin","year":"2022"},{"key":"ref21","doi-asserted-by":"publisher","first-page":"451","DOI":"10.1038\/550451a","article-title":"The human cell atlas","volume":"550","author":"Regev","year":"2017","journal-title":"Nature"},{"key":"ref22","doi-asserted-by":"publisher","first-page":"330","DOI":"10.1089\/cmb.2024.0807","article-title":"DRGAT: predicting drug responses via diffusion-based graph attention network","volume":"32","author":"Sefer","year":"2025","journal-title":"J. Comput. Biol."},{"key":"ref23","doi-asserted-by":"publisher","first-page":"100882","DOI":"10.1016\/j.isci.2020.100882","article-title":"ScCATCH: automatic annotation on cell types of clusters from single-cell RNA sequencing data","volume":"23","author":"Shao","year":"2020","journal-title":"iScience"},{"key":"ref24","doi-asserted-by":"publisher","first-page":"367","DOI":"10.1038\/s41586-018-0590-4","article-title":"Single-cell transcriptomics of 20 mouse organs creates a tabula Muris","volume":"562","year":"2018","journal-title":"Nature"},{"key":"ref25","doi-asserted-by":"publisher","first-page":"eabl4896","DOI":"10.1126\/science.abl4896","article-title":"The tabula sapiens: a multiple-organ, single-cell transcriptomic atlas of humans","volume":"376","author":"Jones","year":"2022","journal-title":"Science"},{"key":"ref26","doi-asserted-by":"publisher","first-page":"1684","DOI":"10.1161\/CIRCULATIONAHA.119.045401","article-title":"Transcriptional and Cellular Diversity of the Human Heart","volume":"183","author":"Tucker","year":"2020","journal-title":"Circulation"},{"key":"ref27","doi-asserted-by":"publisher","first-page":"1260419","DOI":"10.1126\/science.1260419","article-title":"Proteomics. Tissue-based map of the human proteome","volume":"347","author":"Uhl\u00e9n","year":"2015","journal-title":"Science"},{"key":"ref7","first-page":"2579","article-title":"Visualizing data using t-SNE","volume":"9","author":"Van der Maaten","year":"2008","journal-title":"J. Mach. Learn. Res."},{"key":"ref28","article-title":"Attention is all you need","author":"Vaswani","year":"2023"},{"key":"ref29","doi-asserted-by":"publisher","first-page":"697","DOI":"10.1038\/nbt825","article-title":"Global protein function prediction from protein-protein interaction networks","volume":"21","author":"Vazquez","year":"2003","journal-title":"Nat. Biotechnol."},{"key":"ref30","doi-asserted-by":"publisher","first-page":"1882","DOI":"10.1038\/s41467-021-22197-x","article-title":"scGNN is a novel graph neural network framework for single-cell RNA-Seq analyses","volume":"12","author":"Wang","year":"2021","journal-title":"Nat. Commun."},{"key":"ref31","doi-asserted-by":"publisher","first-page":"108","DOI":"10.1038\/s41556-019-0446-7","article-title":"Single-cell reconstruction of the adult human heart during heart failure and recovery reveals the cellular landscape of cardiac regeneration","volume":"22","author":"Wang","year":"2020","journal-title":"Nat. Cell Biol."},{"key":"ref32","article-title":"Linformer: self-attention with linear complexity","author":"Wang","year":"2020"},{"key":"ref33","doi-asserted-by":"publisher","first-page":"1974","DOI":"10.1093\/bioinformatics\/btv088","article-title":"Identification of cell types from single-cell transcriptomes using a novel clustering method","volume":"31","author":"Xu","year":"2015","journal-title":"Bioinformatics"},{"key":"ref34","doi-asserted-by":"publisher","first-page":"852","DOI":"10.1038\/s42256-022-00534-z","article-title":"ScBERT as a large-scale pretrained deep language model for cell type annotation of single-cell RNA-seq data","volume":"4","author":"Yang","year":"2022","journal-title":"Nat. Mach. Intell."},{"key":"ref35","doi-asserted-by":"publisher","first-page":"484","DOI":"10.1038\/s41593-020-00758-5","article-title":"Single-cell transcriptomics of the adult mouse brain","volume":"24","author":"Zeng","year":"2021","journal-title":"Nat. Neurosci."},{"key":"ref36","doi-asserted-by":"publisher","first-page":"14049","DOI":"10.1038\/ncomms14049","article-title":"Massively parallel digital transcriptional profiling of single cells","volume":"8","author":"Zheng","year":"2017","journal-title":"Nat. Commun."}],"container-title":["Frontiers in Artificial Intelligence"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frai.2025.1661318\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,11,17]],"date-time":"2025-11-17T06:26:10Z","timestamp":1763360770000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frai.2025.1661318\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,11,17]]},"references-count":36,"alternative-id":["10.3389\/frai.2025.1661318"],"URL":"https:\/\/doi.org\/10.3389\/frai.2025.1661318","relation":{},"ISSN":["2624-8212"],"issn-type":[{"value":"2624-8212","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,11,17]]},"article-number":"1661318"}}