{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,22]],"date-time":"2026-04-22T07:07:53Z","timestamp":1776841673365,"version":"3.51.2"},"reference-count":27,"publisher":"Oxford University Press (OUP)","issue":"4","license":[{"start":{"date-parts":[[2023,5,18]],"date-time":"2023-05-18T00:00:00Z","timestamp":1684368000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/pages\/standard-publication-reuse-rights"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023,7,20]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Accurately predicting the antigen-binding specificity of adaptive immune receptors (AIRs), such as T-cell receptors (TCRs) and B-cell receptors (BCRs), is essential for discovering new immune therapies. However, the diversity of AIR chain sequences limits the accuracy of current prediction methods. This study introduces SC-AIR-BERT, a pre-trained model that learns comprehensive sequence representations of paired AIR chains to improve binding specificity prediction. SC-AIR-BERT first learns the \u2018language\u2019 of AIR sequences through self-supervised pre-training on a large cohort of paired AIR chains from multiple single-cell resources. The model is then fine-tuned with a multilayer perceptron head for binding specificity prediction, employing the K-mer strategy to enhance sequence representation learning. 
Extensive experiments demonstrate the superior AUC performance of SC-AIR-BERT compared with current methods for TCR- and BCR-binding specificity prediction.<\/jats:p>","DOI":"10.1093\/bib\/bbad191","type":"journal-article","created":{"date-parts":[[2023,5,19]],"date-time":"2023-05-19T12:39:45Z","timestamp":1684499985000},"source":"Crossref","is-referenced-by-count":19,"title":["SC-AIR-BERT: a pre-trained single-cell model for predicting the antigen-binding specificity of the adaptive immune receptor"],"prefix":"10.1093","volume":"24","author":[{"given":"Yu","family":"Zhao","sequence":"first","affiliation":[{"name":"AI Lab, Tencent, Viseen Business Park , Gaoxin 9th South Road, 518057 Shenzhen , China"}]},{"given":"Xiaona","family":"Su","sequence":"additional","affiliation":[{"name":"School of Informatics, Xiamen University , South Siming Road 422, 361005 Xiamen , China"},{"name":"AI Lab, Tencent, Viseen Business Park , Gaoxin 9th South Road, 518057 Shenzhen , China"}]},{"given":"Weitong","family":"Zhang","sequence":"additional","affiliation":[{"name":"AI Lab, Tencent, Viseen Business Park , Gaoxin 9th South Road, 518057 Shenzhen , China"},{"name":"Department of Computer Science, City University of Hong Kong , Kowloon Tong , Hong Kong SAR"}]},{"given":"Sijie","family":"Mai","sequence":"additional","affiliation":[{"name":"School of Electronic and Information Technology, Sun Yat-sen University , Xingangxi Road 135, 510275 Guangzhou , China"}]},{"given":"Zhimeng","family":"Xu","sequence":"additional","affiliation":[{"name":"AI Lab, Tencent, Viseen Business Park , Gaoxin 9th South Road, 518057 Shenzhen , China"}]},{"given":"Chenchen","family":"Qin","sequence":"additional","affiliation":[{"name":"AI Lab, Tencent, Viseen Business Park , Gaoxin 9th South Road, 518057 Shenzhen , China"}]},{"given":"Rongshan","family":"Yu","sequence":"additional","affiliation":[{"name":"School of Informatics, Xiamen University , South Siming Road 422, 361005 Xiamen , 
China"}]},{"given":"Bing","family":"He","sequence":"additional","affiliation":[{"name":"AI Lab, Tencent, Viseen Business Park , Gaoxin 9th South Road, 518057 Shenzhen , China"}]},{"given":"Jianhua","family":"Yao","sequence":"additional","affiliation":[{"name":"AI Lab, Tencent, Viseen Business Park , Gaoxin 9th South Road, 518057 Shenzhen , China"}]}],"member":"286","published-online":{"date-parts":[[2023,5,18]]},"reference":[{"issue":"1","key":"2023072020054823400_ref1","doi-asserted-by":"crossref","first-page":"9","DOI":"10.1111\/imr.12667","article-title":"Human adaptive immune receptor repertoire analysis-past, present, and future","volume":"284","author":"Nielsen","year":"2018","journal-title":"Immunol Rev"},{"key":"2023072020054823400_ref2","doi-asserted-by":"crossref","first-page":"17","DOI":"10.1038\/ni1153","article-title":"Innate and adaptive immunity: specificities and signaling hierarchies revisited","volume":"6","author":"Vivier","year":"2005","journal-title":"Nat Immunol"},{"issue":"3","key":"2023072020054823400_ref3","doi-asserted-by":"crossref","first-page":"035005","DOI":"10.1088\/1478-3975\/10\/3\/035005","article-title":"Conserved variation: identifying patterns of stability and variability in BCR and TCR V genes with different diversity and richness metrics","volume":"10","author":"Schwartz","year":"2013","journal-title":"Phys Biol"},{"issue":"12","key":"2023072020054823400_ref4","doi-asserted-by":"crossref","first-page":"581","DOI":"10.1016\/j.it.2014.09.004","article-title":"Characterizing immune repertoires by high throughput sequencing: strategies and applications","volume":"35","author":"Calis","year":"2014","journal-title":"Trends Immunol"},{"issue":"7661","key":"2023072020054823400_ref5","doi-asserted-by":"crossref","first-page":"94","DOI":"10.1038\/nature22976","article-title":"Identifying specificity groups in the T cell receptor 
repertoire","volume":"547","author":"Glanville","year":"2017","journal-title":"Nature"},{"issue":"7661","key":"2023072020054823400_ref6","doi-asserted-by":"crossref","first-page":"89","DOI":"10.1038\/nature22383","article-title":"Quantifiable predictive features define epitope-specific T cell receptor repertoires","volume":"547","author":"Dash","year":"2017","journal-title":"Nature"},{"key":"2023072020054823400_ref7","doi-asserted-by":"crossref","DOI":"10.1101\/2021.11.18.469186","article-title":"TCR-BERT: learning the grammar of T-cell receptors for flexible antigen-binding analyses","volume-title":"bioRxiv","author":"Wu","year":"2021"},{"issue":"8","key":"2023072020054823400_ref8","doi-asserted-by":"crossref","first-page":"602","DOI":"10.2174\/1389202921999200625220812","article-title":"The comparison of two single-cell sequencing platforms: BD rhapsody and 10x genomics chromium","volume":"21","author":"Gao","year":"2020","journal-title":"Curr Genomics"},{"issue":"1","key":"2023072020054823400_ref9","doi-asserted-by":"crossref","first-page":"2309","DOI":"10.1038\/s41467-021-22667-2","article-title":"Author correction: DeepTCR is a deep learning framework for revealing sequence concepts within T-cell repertoires","volume":"12","author":"Sidhom","year":"2021","journal-title":"Nat Commun"},{"issue":"20","key":"2023072020054823400_ref10","doi-asserted-by":"crossref","first-page":"eabf5835","DOI":"10.1126\/sciadv.abf5835","article-title":"A framework for highly multiplexed dextramer mapping and prediction of T cell receptor sequences to antigen specificity","volume":"7","author":"Zhang","year":"2021","journal-title":"Sci Adv"},{"issue":"14","key":"2023072020054823400_ref11","doi-asserted-by":"crossref","first-page":"e2023141118","DOI":"10.1073\/pnas.2023141118","article-title":"Deep generative selection models of T and B cell receptor repertoires with soNNia","volume":"118","author":"Isacchini","year":"2021","journal-title":"Proc Natl Acad Sci 
USA"},{"key":"2023072020054823400_ref12","doi-asserted-by":"crossref","first-page":"225","DOI":"10.1016\/j.aiopen.2021.08.002","article-title":"Pre-trained models: past, present and future","volume":"2","author":"Han","year":"2021","journal-title":"AI Open"},{"key":"2023072020054823400_ref13","doi-asserted-by":"crossref","DOI":"10.1109\/TPAMI.2021.3095381","article-title":"ProtTrans: towards cracking the language of life's code through self-supervised deep learning and high performance computing","volume":"44","author":"Elnaggar","year":"2022","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"2023072020054823400_ref14","doi-asserted-by":"crossref","first-page":"2000","DOI":"10.1016\/j.csbj.2020.07.008","article-title":"Methods for sequence and structural analysis of B and T cell receptor repertoires","volume":"18","author":"Teraguchi","year":"2020","journal-title":"Comput Struct Biotechnol J"},{"key":"2023072020054823400_ref15","doi-asserted-by":"crossref","first-page":"D419","DOI":"10.1093\/nar\/gkx760","article-title":"VDJdb: a curated database of T-cell receptor sequences with known antigen specificity","volume":"46","author":"Shugay","year":"2018","journal-title":"Nucleic Acids Res"},{"key":"2023072020054823400_ref16","doi-asserted-by":"crossref","first-page":"D339","DOI":"10.1093\/nar\/gky1006","article-title":"The Immune Epitope Database (IEDB): 2018 update","volume":"47","author":"Vita","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2023072020054823400_ref17","doi-asserted-by":"crossref","first-page":"D1244","DOI":"10.1093\/nar\/gkab857","article-title":"huARdb: human Antigen Receptor database for interactive clonotype-transcriptome analysis at the single-cell level","volume":"50","author":"Wu","year":"2022","journal-title":"Nucleic Acids Res"},{"issue":"5","key":"2023072020054823400_ref18","doi-asserted-by":"crossref","first-page":"734","DOI":"10.1093\/bioinformatics\/btaa739","article-title":"CoV-AbDab: the coronavirus antibody 
database","volume":"37","author":"Raybould","year":"2021","journal-title":"Bioinformatics"},{"issue":"1","key":"2023072020054823400_ref19","doi-asserted-by":"crossref","first-page":"1540","DOI":"10.1038\/s41467-021-21795-z","article-title":"Comprehensive single-cell sequencing reveals the stromal dynamics and tumor-specific characteristics in the microenvironment of nasopharyngeal carcinoma","volume":"12","author":"Gong","year":"2021","journal-title":"Nat Commun"},{"key":"2023072020054823400_ref20","doi-asserted-by":"crossref","first-page":"eabb4432","DOI":"10.1126\/sciimmunol.abb4432","article-title":"Heterogeneity and clonal relationships of adaptive immune cells in ulcerative colitis revealed by single-cell analyses","volume":"5","author":"Boland","year":"2020","journal-title":"Sci Immunol"},{"issue":"7","key":"2023072020054823400_ref21","doi-asserted-by":"crossref","first-page":"1895","DOI":"10.1016\/j.cell.2021.01.053","article-title":"COVID-19 immune features revealed by a large-scale single-cell transcriptome atlas","volume":"184","author":"Ren","year":"2021","journal-title":"Cell"},{"key":"2023072020054823400_ref22","doi-asserted-by":"crossref","first-page":"904","DOI":"10.1038\/s41591-021-01329-2","article-title":"Single-cell multi-omics analysis of the immune response in COVID-19","volume":"27","author":"Stephenson","year":"2021","journal-title":"Nat Med"},{"issue":"5","key":"2023072020054823400_ref23","doi-asserted-by":"crossref","first-page":"781","DOI":"10.1038\/s41590-022-01184-4","article-title":"SARS-CoV-2 antigen exposure history shapes phenotypes and specificity of memory CD8+ T cells","volume":"23","author":"Minervina","year":"2022","journal-title":"Nat Immunol"},{"key":"2023072020054823400_ref24","article-title":"Attention is all you need","volume-title":"Advances in Neural Information Processing Systems","author":"Vaswani","year":"2017"},{"key":"2023072020054823400_ref25","article-title":"Transformer protein language models are unsupervised 
structure learners","volume-title":"International Conference on Learning Representations","author":"Rao","year":"2021"},{"issue":"1","key":"2023072020054823400_ref26","doi-asserted-by":"crossref","first-page":"169","DOI":"10.1146\/annurev-immunol-032414-112334","article-title":"T cell antigen receptor recognition of antigen-presenting molecules","volume":"33","author":"Rossjohn","year":"2015","journal-title":"Annu Rev Immunol"},{"key":"2023072020054823400_ref27","doi-asserted-by":"crossref","DOI":"10.18653\/v1\/D19-1006","article-title":"How contextual are contextualized word representations? Comparing the geometry of BERT, ELMo, and GPT-2 embeddings","volume-title":"Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)","author":"Ethayarajh","year":"2019"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/24\/4\/bbad191\/50916801\/bbad191.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/24\/4\/bbad191\/50916801\/bbad191.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,7,20]],"date-time":"2023-07-20T20:06:53Z","timestamp":1689883613000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbad191\/7171417"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,5,18]]},"references-count":27,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2023,7,20]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbad191","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2023,7
]]},"published":{"date-parts":[[2023,5,18]]},"article-number":"bbad191"}}