{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,28]],"date-time":"2026-04-28T17:26:18Z","timestamp":1777397178873,"version":"3.51.4"},"reference-count":38,"publisher":"Oxford University Press (OUP)","issue":"1","license":[{"start":{"date-parts":[[2025,1,30]],"date-time":"2025-01-30T00:00:00Z","timestamp":1738195200000},"content-version":"vor","delay-in-days":69,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"name":"RGC Healthy Longevity Catalyst Awards","award":["CityU9080002"],"award-info":[{"award-number":["CityU9080002"]}]},{"name":"RGC Healthy Longevity Catalyst Awards","award":["HLCA\/E-107\/23"],"award-info":[{"award-number":["HLCA\/E-107\/23"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["32300527"],"award-info":[{"award-number":["32300527"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100017596","name":"Natural Science Basic Research Program of Shaanxi Province","doi-asserted-by":"publisher","award":["2022JQ-644"],"award-info":[{"award-number":["2022JQ-644"]}],"id":[{"id":"10.13039\/501100017596","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["32300527"],"award-info":[{"award-number":["32300527"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,11,22]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>The complexity of T cell receptor (TCR) sequences, particularly within the complementarity-determining region 3 (CDR3), requires efficient embedding methods for applying machine learning to immunology. While various TCR CDR3 embedding strategies have been proposed, the absence of their systematic evaluations created perplexity in the community. Here, we extracted CDR3 embedding models from 19 existing methods and benchmarked these models with four curated datasets by accessing their impact on the performance of TCR downstream tasks, including TCR-epitope binding affinity prediction, epitope-specific TCR identification, TCR clustering, and visualization analysis. We assessed these models utilizing eight downstream classifiers and five downstream clustering methods, with the performance measured by a diverse range of metrics for precision, robustness, and usability. Overall, handcrafted embeddings outperformed data-driven ones in modeling TCR-epitope interactions. To further refine our comparative findings, we developed an all-in-one TCR CDR3 embedding package comprising all evaluated embedding models. This package will assist users in easily selecting suitable embedding models for their data.<\/jats:p>","DOI":"10.1093\/bib\/bbaf030","type":"journal-article","created":{"date-parts":[[2025,1,30]],"date-time":"2025-01-30T15:06:01Z","timestamp":1738249561000},"source":"Crossref","is-referenced-by-count":6,"title":["A comprehensive benchmarking for evaluating TCR embeddings in modeling TCR-epitope interactions"],"prefix":"10.1093","volume":"26","author":[{"given":"Xikang","family":"Feng","sequence":"first","affiliation":[{"name":"School of Software, Northwestern Polytechnical University , 127 West Youyi Road, Beilin District, Xi'an Shaanxi, 710072 ,","place":["China"]}]},{"given":"Miaozhe","family":"Huo","sequence":"additional","affiliation":[{"name":"Department of Computer Science, City University of Hong Kong , 83 Tat Chee Avenue, Kowloon Tong, Hong Kong, 999077 ,","place":["China"]}]},{"given":"He","family":"Li","sequence":"additional","affiliation":[{"name":"School of Software, Northwestern Polytechnical University , 127 West Youyi Road, Beilin District, Xi'an Shaanxi, 710072 ,","place":["China"]}]},{"given":"Yongze","family":"Yang","sequence":"additional","affiliation":[{"name":"Department of Computer Science, City University of Hong Kong , 83 Tat Chee Avenue, Kowloon Tong, Hong Kong, 999077 ,","place":["China"]}]},{"given":"Yuepeng","family":"Jiang","sequence":"additional","affiliation":[{"name":"Department of Computer Science, City University of Hong Kong , 83 Tat Chee Avenue, Kowloon Tong, Hong Kong, 999077 ,","place":["China"]}]},{"given":"Liang","family":"He","sequence":"additional","affiliation":[{"name":"School of Software, Northwestern Polytechnical University , 127 West Youyi Road, Beilin District, Xi'an Shaanxi, 710072 ,","place":["China"]}]},{"given":"Shuai","family":"Cheng Li","sequence":"additional","affiliation":[{"name":"Department of Computer Science, City University of Hong Kong , 83 Tat Chee Avenue, Kowloon Tong, Hong Kong, 999077 ,","place":["China"]}]}],"member":"286","published-online":{"date-parts":[[2025,1,30]]},"reference":[{"key":"2025013015053138600_ref1","doi-asserted-by":"publisher","first-page":"395","DOI":"10.1038\/334395a0","article-title":"T-cell antigen receptor genes and T-cell recognition","volume":"334","author":"Davis","year":"1988","journal-title":"Nature"},{"key":"2025013015053138600_ref2","doi-asserted-by":"publisher","first-page":"286","DOI":"10.1016\/j.coi.2009.05.004","article-title":"Functional implications of T cell receptor diversity","volume":"21","author":"Turner","year":"2009","journal-title":"Curr Opin Immunol"},{"key":"2025013015053138600_ref3","doi-asserted-by":"publisher","first-page":"1518","DOI":"10.1073\/pnas.0913939107","article-title":"High throughput sequencing reveals a complex pattern of dynamic interrelationships among human T cell subsets","volume":"107","author":"Wang","year":"2010","journal-title":"Proc Natl Acad Sci"},{"key":"2025013015053138600_ref4","doi-asserted-by":"publisher","first-page":"e38358","DOI":"10.7554\/eLife.38358","article-title":"Human T cell receptor occurrence patterns encode immune history, genetic background, and receptor specificity","volume":"7","author":"DeWitt","year":"2018","journal-title":"Elife"},{"key":"2025013015053138600_ref5","doi-asserted-by":"publisher","first-page":"146","DOI":"10.1158\/2326-6066.CIR-19-0398","article-title":"TCR repertoire diversity of peripheral PD-1+ CD8+ T cells predicts clinical outcomes after immunotherapy in patients with non\u2013small cell lung cancer","volume":"8","author":"Han","year":"2020","journal-title":"Cancer Immunol Res"},{"key":"2025013015053138600_ref6","doi-asserted-by":"publisher","first-page":"2729","DOI":"10.3389\/fimmu.2018.02729","article-title":"TCR repertoire as a novel indicator for immune monitoring and prognosis assessment of patients with cervical cancer","volume":"9","author":"Cui","year":"2018","journal-title":"Front Immunol"},{"key":"2025013015053138600_ref7","doi-asserted-by":"publisher","first-page":"e68605","DOI":"10.7554\/eLife.68605","article-title":"TCR meta-clonotypes for biomarker discovery with tcrdist3 enabled identification of public, HLA-restricted clusters of SARS-CoV-2 TCRs","volume":"10","author":"Mayer-Blackwell","year":"2021","journal-title":"Elife"},{"key":"2025013015053138600_ref8","doi-asserted-by":"publisher","DOI":"10.1126\/scitranslmed.aaz3738","article-title":"De novo prediction of cancer-associated T cell receptors for noninvasive cancer detection","volume":"12","author":"Beshnova","year":"2020","journal-title":"Sci Transl Med"},{"key":"2025013015053138600_ref9","doi-asserted-by":"publisher","DOI":"10.1038\/s41467-021-21879-w","article-title":"Deeptcr is a deep learning framework for revealing sequence concepts within T-cell repertoires","volume":"12","author":"John-William Sidhom","year":"2021","journal-title":"Nat Commun"},{"key":"2025013015053138600_ref10","doi-asserted-by":"publisher","first-page":"eabq5089","DOI":"10.1126\/sciadv.abq5089","article-title":"Deep learning reveals predictive sequence concepts within immune repertoires to immunotherapy","volume":"8","author":"Sidhom","year":"2022","journal-title":"Sci Adv"},{"key":"2025013015053138600_ref11","doi-asserted-by":"publisher","first-page":"bbad086","DOI":"10.1093\/bib\/bbad086","article-title":"TEINet: a deep learning framework for prediction of TCR\u2013epitope binding specificity","volume":"24","author":"Jiang","year":"2023","journal-title":"Brief Bioinform"},{"key":"2025013015053138600_ref12","doi-asserted-by":"publisher","DOI":"10.1093\/bib\/bbad038","article-title":"Deep autoregressive generative models capture the intrinsics embedded in T-cell receptor repertoires","volume":"24","author":"Jiang","year":"2023","journal-title":"Brief Bioinform"},{"key":"2025013015053138600_ref13","doi-asserted-by":"publisher","first-page":"4865","DOI":"10.1093\/bioinformatics\/btab446","article-title":"ClusTCR: a Python interface for rapid clustering of large sets of CDR3 sequences with unknown antigen specificity","volume":"37","author":"Valkiers","year":"2021","journal-title":"Bioinformatics"},{"key":"2025013015053138600_ref14","doi-asserted-by":"publisher","first-page":"4699","DOI":"10.1038\/s41467-021-25006-7","article-title":"GIANA allows computationally-efficient TCR clustering and multi-disease repertoire classification by isometric transformation","volume":"12","author":"Zhang","year":"2021","journal-title":"Nat Commun"},{"key":"2025013015053138600_ref15","doi-asserted-by":"publisher","first-page":"864","DOI":"10.1038\/s42256-021-00383-2","article-title":"Deep learning-based prediction of the T cell receptor\u2013antigen binding specificity","volume":"3","author":"Tianshi","year":"2021","journal-title":"Nat Mach Intell"},{"key":"2025013015053138600_ref16","doi-asserted-by":"publisher","first-page":"511","DOI":"10.1038\/s41577-023-00835-3","article-title":"Can we predict T cell specificity with digital biology and machine learning?","volume":"23","author":"Hudson","year":"2023","journal-title":"Nat Rev Immunol"},{"key":"2025013015053138600_ref17","doi-asserted-by":"publisher","first-page":"654","DOI":"10.1016\/j.cels.2021.05.017","article-title":"Learning the protein language: evolution, structure, and function","volume":"12","author":"Bepler","year":"2021","journal-title":"Cell Syst"},{"key":"2025013015053138600_ref18","doi-asserted-by":"publisher","first-page":"893247","DOI":"10.3389\/fimmu.2022.893247","article-title":"ATM-TCR: TCR-epitope binding affinity prediction using a multi-head self-attention model","volume":"13","author":"Cai","year":"2022","journal-title":"Front Immunol"},{"key":"2025013015053138600_ref19","first-page":"2825","article-title":"Scikit-learn: machine learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J Mach Learn Res"},{"key":"2025013015053138600_ref20","doi-asserted-by":"publisher","first-page":"38","DOI":"10.1038\/nbt.4314","article-title":"Dimensionality reduction for visualizing single-cell data using UMAP","volume":"37","author":"Becht","year":"2019","journal-title":"Nat Biotechnol"},{"key":"2025013015053138600_ref21","doi-asserted-by":"publisher","DOI":"10.1093\/bib\/bbad175","article-title":"Quantitative annotations of T-cell repertoire specificity","volume":"24","author":"Luo","year":"2023","journal-title":"Brief Bioinform"},{"key":"2025013015053138600_ref22","first-page":"18832","article-title":"Modern Hopfield networks and attention for immune repertoire classification","volume":"33","author":"Widrich","year":"2020","journal-title":"Adv Neural Inf Process Syst"},{"key":"2025013015053138600_ref23","doi-asserted-by":"publisher","DOI":"10.1093\/bib\/bbaa318","article-title":"Current challenges for unseen-epitope TCR interaction prediction and a new perspective derived from image classification","volume":"22","author":"Moris","year":"2021","journal-title":"Brief Bioinform"},{"key":"2025013015053138600_ref24","doi-asserted-by":"publisher","DOI":"10.3390\/genes12040572","article-title":"Predicting TCR-epitope binding specificity using deep metric learning and multimodal learning","volume":"12","author":"Luu","year":"2021","journal-title":"Genes"},{"key":"2025013015053138600_ref25","doi-asserted-by":"publisher","first-page":"936","DOI":"10.1038\/s42256-021-00413-z","article-title":"The immuneML ecosystem for machine learning analysis of adaptive immune receptor repertoires","volume":"3","author":"Pavlovi\u0107","year":"2021","journal-title":"Nat Mach Intell"},{"key":"2025013015053138600_ref26","doi-asserted-by":"publisher","first-page":"107281","DOI":"10.1016\/j.compbiolchem.2020.107281","article-title":"SETE: sequence-based ensemble learning approach for TCR epitope binding prediction","volume":"87","author":"Tong","year":"2020","journal-title":"Comput Biol Chem"},{"key":"2025013015053138600_ref27","doi-asserted-by":"publisher","first-page":"e1008814","DOI":"10.1371\/journal.pcbi.1008814","article-title":"Predicting recognition between T cell receptors and epitopes with TCRGP","volume":"17","author":"Jokinen","year":"2021","journal-title":"PLoS Comput Biol"},{"key":"2025013015053138600_ref28","doi-asserted-by":"publisher","first-page":"1060","DOI":"10.1038\/s42003-021-02610-3","article-title":"NetTCR-2.0 enables accurate prediction of TCR-peptide binding by using paired TCR$\\alpha $ and $\\beta $ sequence data","volume":"4","author":"Montemurro","year":"2021","journal-title":"Commun Biol"},{"key":"2025013015053138600_ref29","doi-asserted-by":"publisher","first-page":"1359","DOI":"10.1158\/1078-0432.CCR-19-3249","article-title":"Investigation of antigen-specific T-cell receptor clusters in human cancers","volume":"26","author":"Zhang","year":"2020","journal-title":"Clin Cancer Res"},{"key":"2025013015053138600_ref30","doi-asserted-by":"publisher","first-page":"i237","DOI":"10.1093\/bioinformatics\/btab294","article-title":"Titan: T-cell receptor specificity prediction with bimodal attention networks","volume":"37","author":"Weber","year":"2021","journal-title":"Bioinformatics"},{"key":"2025013015053138600_ref31","doi-asserted-by":"publisher","first-page":"664514","DOI":"10.3389\/fimmu.2021.664514","article-title":"Contribution of T cell receptor alpha and beta CDR3, MHC typing, V and J genes to peptide binding prediction","volume":"12","author":"Springer","year":"2021","journal-title":"Front Immunol"},{"key":"2025013015053138600_ref32","doi-asserted-by":"publisher","article-title":"Context-aware amino acid embedding advances analysis of TCR-epitope interactions","author":"Zhang","DOI":"10.7554\/eLife.88837.2"},{"key":"2025013015053138600_ref33","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.2016239118","article-title":"Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences","volume":"118","author":"Rives","journal-title":"PNAS"},{"key":"2025013015053138600_ref34","doi-asserted-by":"crossref","article-title":"Transformer protein language models are unsupervised structure learners","author":"Rao","DOI":"10.1101\/2020.12.15.422761"},{"key":"2025013015053138600_ref35","doi-asserted-by":"publisher","first-page":"2227","DOI":"10.18653\/v1\/N18-1202","article-title":"Deep contextualized word representations","author":"Peters","year":"2018"},{"key":"2025013015053138600_ref36","doi-asserted-by":"publisher","first-page":"D339","DOI":"10.1093\/nar\/gky1006","article-title":"The immune epitope database (IEDB): 2018 update","volume":"47","author":"Vita","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2025013015053138600_ref37","doi-asserted-by":"publisher","first-page":"1017","DOI":"10.1038\/s41592-022-01578-0","article-title":"VDJdb in the pandemic era: a compendium of T cell receptors specific for SARS-CoV-2","volume":"19","author":"Goncharov","year":"2022","journal-title":"Nat Methods"},{"key":"2025013015053138600_ref38","doi-asserted-by":"publisher","first-page":"2924","DOI":"10.1093\/bioinformatics\/btx286","article-title":"McPAS-TCR: a manually curated catalogue of pathology-associated T cell receptor sequences","volume":"33","author":"Tickotsky","year":"2017","journal-title":"Bioinformatics"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/26\/1\/bbaf030\/61699927\/bbaf030.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/26\/1\/bbaf030\/61699927\/bbaf030.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,1,30]],"date-time":"2025-01-30T15:06:05Z","timestamp":1738249565000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbaf030\/7990508"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,11,22]]},"references-count":38,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2024,11,22]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbaf030","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025,1]]},"published":{"date-parts":[[2024,11,22]]},"article-number":"bbaf030"}}