{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T20:34:24Z","timestamp":1772138064067,"version":"3.50.1"},"reference-count":47,"publisher":"Oxford University Press (OUP)","issue":"8","license":[{"start":{"date-parts":[[2023,8,3]],"date-time":"2023-08-03T00:00:00Z","timestamp":1691020800000},"content-version":"vor","delay-in-days":2,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Polish Ministry of Education and the research"},{"name":"Polish Ministry of Science and Higher Education","award":["7054\/IA\/SP\/2020"],"award-info":[{"award-number":["7054\/IA\/SP\/2020"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023,8,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>The advent of T-cell receptor (TCR) sequencing experiments allowed for a significant increase in the amount of peptide:TCR binding data available and a number of machine-learning models appeared in recent years. High-quality prediction models for a fixed epitope sequence are feasible, provided enough known binding TCR sequences are available. However, their performance drops significantly for previously unseen peptides.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We prepare the dataset of known peptide:TCR binders and augment it with negative decoys created using healthy donors\u2019 T-cell repertoires. We employ deep learning methods commonly applied in Natural Language Processing to train a peptide:TCR binding model with a degree of cross-peptide generalization (0.69 AUROC). 
We demonstrate that BERTrand outperforms the published methods when evaluated on peptide sequences not used during model training.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>The datasets and the code for model training are available at https:\/\/github.com\/SFGLab\/bertrand.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btad468","type":"journal-article","created":{"date-parts":[[2023,8,3]],"date-time":"2023-08-03T13:49:17Z","timestamp":1691070557000},"source":"Crossref","is-referenced-by-count":17,"title":["BERTrand\u2014peptide:TCR binding prediction using Bidirectional Encoder Representations from Transformers augmented with random TCR pairing"],"prefix":"10.1093","volume":"39","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6404-270X","authenticated-orcid":false,"given":"Alexander","family":"Myronov","sequence":"first","affiliation":[{"name":"Faculty of Mathematics and Information Science, Warsaw University of Technology , Warsaw, Poland"},{"name":"Ardigen , Krakow, Poland"}]},{"given":"Giovanni","family":"Mazzocco","sequence":"additional","affiliation":[{"name":"Ardigen , Krakow, Poland"}]},{"given":"Paulina","family":"Kr\u00f3l","sequence":"additional","affiliation":[{"name":"Ardigen , Krakow, Poland"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3840-7610","authenticated-orcid":false,"given":"Dariusz","family":"Plewczynski","sequence":"additional","affiliation":[{"name":"Faculty of Mathematics and Information Science, Warsaw University of Technology , Warsaw, Poland"}]}],"member":"286","published-online":{"date-parts":[[2023,8,3]]},"reference":[{"key":"2023082304364732500_btad468-B1","doi-asserted-by":"crossref","first-page":"e1006191","DOI":"10.1371\/journal.ppat.1006191","article-title":"Selective expansion of high functional avidity memory CD8 T cell clonotypes during hepatitis C virus 
reinfection and clearance","volume":"13","author":"Abdel-Hakeem","year":"2017","journal-title":"PLoS Pathog"},{"key":"2023082304364732500_btad468-B2","doi-asserted-by":"crossref","first-page":"315","DOI":"10.1016\/j.immuni.2017.02.007","article-title":"Mass spectrometry profiling of HLA-associated peptidomes in mono-allelic cells enables more accurate epitope prediction","volume":"46","author":"Abelin","year":"2017","journal-title":"Immunity"},{"key":"2023082304364732500_btad468-B3","doi-asserted-by":"crossref","first-page":"e108658","DOI":"10.1371\/journal.pone.0108658","article-title":"Assessing T cell clonal size distribution: a non-parametric approach","volume":"9","author":"Bolkhovskaya","year":"2014","journal-title":"PLoS One"},{"key":"2023082304364732500_btad468-B4","doi-asserted-by":"crossref","first-page":"131.4","DOI":"10.4049\/jimmunol.202.Supp.131.4","article-title":"Scalable and comprehensive characterization of antigen-specific CD8 T cells using multi-omics single cell analysis","volume":"202","author":"Boutet","year":"2019","journal-title":"J Immunol"},{"key":"2023082304364732500_btad468-B5","doi-asserted-by":"crossref","first-page":"5005","DOI":"10.4049\/jimmunol.1600005","article-title":"Dynamics of individual T cell repertoires: from cord blood to centenarians","volume":"196","author":"Britanova","year":"2016","journal-title":"J Immunol"},{"key":"2023082304364732500_btad468-B6","doi-asserted-by":"crossref","first-page":"89","DOI":"10.1038\/nature22383","article-title":"Quantifiable predictive features define epitope-specific T cell receptor repertoires","volume":"547","author":"Dash","year":"2017","journal-title":"Nature"},{"key":"2023082304364732500_btad468-B7","doi-asserted-by":"crossref","first-page":"123","DOI":"10.1186\/s13073-015-0238-z","article-title":"Annotation of pseudogenic gene segments by massively parallel sequencing of rearranged lymphocyte receptor loci","volume":"7","author":"Dean","year":"2015","journal-title":"Genome 
Med"},{"key":"2023082304364732500_btad468-B8","first-page":"4171","author":"Devlin","year":"2019"},{"key":"2023082304364732500_btad468-B9","doi-asserted-by":"crossref","first-page":"2639","DOI":"10.4049\/jimmunol.1700938","article-title":"Unveiling the peptide motifs of HLA-C and HLA-G from naturally presented peptides and generation of binding prediction matrices","volume":"199","author":"Di Marco","year":"2017","journal-title":"J Immunol"},{"key":"2023082304364732500_btad468-B10","doi-asserted-by":"crossref","first-page":"659","DOI":"10.1038\/ng.3822","article-title":"Immunosequencing identifies signatures of cytomegalovirus exposure history and HLA-mediated effects on the T cell repertoire","volume":"49","author":"Emerson","year":"2017","journal-title":"Nat Genet"},{"key":"2023082304364732500_btad468-B11","doi-asserted-by":"crossref","first-page":"eaar3947","DOI":"10.1126\/sciimmunol.aar3947","article-title":"A subset of HLA-I peptides are not genomically templated: evidence for cis- and trans-spliced peptide ligands","volume":"3","author":"Faridi","year":"2018","journal-title":"Sci Immunol"},{"key":"2023082304364732500_btad468-B12","doi-asserted-by":"crossref","DOI":"10.2307\/j.ctv15r5djw","volume-title":"Immunology and Evolution of Infectious Disease","author":"Frank","year":"2020"},{"key":"2023082304364732500_btad468-B13","doi-asserted-by":"crossref","first-page":"236","DOI":"10.1038\/s42256-023-00619-3","article-title":"Pan-peptide meta learning for T-cell receptor\u2013antigen binding recognition","volume":"5","author":"Gao","year":"2023","journal-title":"Nat Mach Intell"},{"key":"2023082304364732500_btad468-B14","doi-asserted-by":"crossref","first-page":"549","DOI":"10.1016\/j.cell.2017.11.043","article-title":"Antigen identification for orphan T cell receptors expressed on tumor-infiltrating 
lymphocytes","volume":"172","author":"Gee","year":"2018","journal-title":"Cell"},{"key":"2023082304364732500_btad468-B15","doi-asserted-by":"crossref","first-page":"2820","DOI":"10.3389\/fimmu.2019.02820","article-title":"Detection of enriched T cell epitope specificity in full T cell receptor sequence repertoires","volume":"10","author":"Gielis","year":"2019","journal-title":"Front Immunol"},{"key":"2023082304364732500_btad468-B16","doi-asserted-by":"crossref","first-page":"94","DOI":"10.1038\/nature22976","article-title":"Identifying specificity groups in the T cell receptor repertoire","volume":"547","author":"Glanville","year":"2017","journal-title":"Nature"},{"key":"2023082304364732500_btad468-B17","doi-asserted-by":"crossref","first-page":"979","DOI":"10.4049\/jimmunol.1801401","article-title":"Antigen-specific TCR signatures of cytomegalovirus infection","volume":"202","author":"Huth","year":"2019","journal-title":"J Immunol"},{"key":"2023082304364732500_btad468-B18","doi-asserted-by":"crossref","first-page":"2112","DOI":"10.1093\/bioinformatics\/btab083","article-title":"DNABERT: pre-trained bidirectional encoder representations from transformers model for DNA-language in genome","volume":"37","author":"Ji","year":"2021","journal-title":"Bioinformatics"},{"key":"2023082304364732500_btad468-B19","doi-asserted-by":"crossref","first-page":"e1008814","DOI":"10.1371\/journal.pcbi.1008814","article-title":"Predicting recognition between T cell receptors and epitopes with TCRGP","volume":"17","author":"Jokinen","year":"2021","journal-title":"PLoS Comput Biol"},{"key":"2023082304364732500_btad468-B20","doi-asserted-by":"crossref","first-page":"e1008122","DOI":"10.1371\/journal.ppat.1008122","article-title":"CDR3\u03b1 drives selection of the immunodominant Epstein Barr virus (EBV) BRLF1-specific CD8 T cell receptor repertoire in primary infection","volume":"15","author":"Kamga","year":"2019","journal-title":"PLoS 
Pathog"},{"key":"2023082304364732500_btad468-B21","doi-asserted-by":"crossref","first-page":"467","DOI":"10.1038\/s41577-018-0007-5","article-title":"Understanding the drivers of MHC restriction of T cell receptors","volume":"18","author":"La Gruta","year":"2018","journal-title":"Nat Rev Immunol"},{"key":"2023082304364732500_btad468-B22","doi-asserted-by":"crossref","first-page":"864","DOI":"10.1038\/s42256-021-00383-2","article-title":"Deep learning-based prediction of the T cell receptor-antigen binding specificity","volume":"3","author":"Lu","year":"2021","journal-title":"Nat Mach Intell"},{"key":"2023082304364732500_btad468-B23","doi-asserted-by":"crossref","first-page":"214","DOI":"10.1016\/j.jtbi.2015.10.016","article-title":"How many TCR clonotypes does a body maintain?","volume":"389","author":"Lythe","year":"2016","journal-title":"J Theor Biol"},{"key":"2023082304364732500_btad468-B24","doi-asserted-by":"crossref","first-page":"1109","DOI":"10.1172\/JCI123791","article-title":"Neoantigen screening identifies broad TP53 mutant immunogenicity in patients with epithelial cancers","volume":"129","author":"Malekzadeh","year":"2019","journal-title":"J Clin Invest"},{"key":"2023082304364732500_btad468-B25","doi-asserted-by":"crossref","first-page":"395","DOI":"10.1016\/S0167-5699(98)01299-7","article-title":"A very high level of crossreactivity is an essential feature of the T-cell receptor","volume":"19","author":"Mason","year":"1998","journal-title":"Immunol Today"},{"key":"2023082304364732500_btad468-B26","doi-asserted-by":"crossref","first-page":"1521","DOI":"10.1007\/s00018-011-0659-9","article-title":"Insights into MHC class I antigen processing gained from large-scale analysis of class I ligands","volume":"68","author":"Mester","year":"2011","journal-title":"Cell Mol Life Sci"},{"key":"2023082304364732500_btad468-B27","doi-asserted-by":"crossref","first-page":"100024","DOI":"10.1016\/j.immuno.2023.100024","article-title":"Benchmarking solutions to the 
T-cell receptor epitope prediction problem: IMMREP22 workshop report","volume":"9","author":"Meysman","year":"2023","journal-title":"Immunoinformatics"},{"key":"2023082304364732500_btad468-B28","doi-asserted-by":"crossref","first-page":"1060","DOI":"10.1038\/s42003-021-02610-3","article-title":"NetTCR-2.0 enables accurate prediction of TCR-peptide binding by using paired TCR\u03b1 and \u03b2 sequence data","volume":"4","author":"Montemurro","year":"2021","journal-title":"Commun Biol"},{"key":"2023082304364732500_btad468-B29","doi-asserted-by":"crossref","first-page":"1267","DOI":"10.3389\/fimmu.2017.01267","article-title":"Quantitative characterization of the T cell receptor repertoire of na\u00efve and memory subsets using an integrated experimental and computational pipeline which is robust, economical, and versatile","volume":"8","author":"Oakes","year":"2017","journal-title":"Front Immunol"},{"key":"2023082304364732500_btad468-B30","doi-asserted-by":"crossref","first-page":"1022","DOI":"10.1158\/2159-8290.CD-18-1494","article-title":"Unique neoantigens arise from somatic mutations in patients with gastrointestinal cancers","volume":"9","author":"Parkhurst","year":"2019","journal-title":"Cancer Discov"},{"key":"2023082304364732500_btad468-B31","doi-asserted-by":"crossref","first-page":"12704","DOI":"10.1073\/pnas.1809642115","article-title":"Precise tracking of vaccine-responding T cell clones reveals convergent and personalized response in identical twins","volume":"115","author":"Pogorelyy","year":"2018","journal-title":"Proc Natl Acad Sci USA"},{"key":"2023082304364732500_btad468-B32","doi-asserted-by":"crossref","first-page":"13139","DOI":"10.1073\/pnas.1409155111","article-title":"Diversity and clonal selection in the human T-cell repertoire","volume":"111","author":"Qi","year":"2014","journal-title":"Proc Natl Acad Sci USA"},{"key":"2023082304364732500_btad468-B33","first-page":"9689","article-title":"Evaluating protein transfer learning with 
TAPE","volume":"32","author":"Rao","year":"2019","journal-title":"Adv Neural Inf Process Syst"},{"key":"2023082304364732500_btad468-B34","doi-asserted-by":"crossref","first-page":"419","DOI":"10.1146\/annurev.immunol.23.021704.115658","article-title":"How TCRs bind MHCs, peptides, and coreceptors","volume":"24","author":"Rudolph","year":"2006","journal-title":"Annu Rev Immunol"},{"key":"2023082304364732500_btad468-B35","doi-asserted-by":"crossref","first-page":"199","DOI":"10.1038\/s41587-019-0322-9","article-title":"A large peptidome dataset improves HLA class I epitope prediction across most of the human population","volume":"38","author":"Sarkizova","year":"2020","journal-title":"Nat Biotechnol"},{"key":"2023082304364732500_btad468-B36","doi-asserted-by":"crossref","first-page":"D419","DOI":"10.1093\/nar\/gkx760","article-title":"VDJdb: a curated database of T-cell receptor sequences with known antigen specificity","volume":"46","author":"Shugay","year":"2018","journal-title":"Nucleic Acids Res"},{"key":"2023082304364732500_btad468-B37","doi-asserted-by":"crossref","first-page":"1605","DOI":"10.1038\/s41467-021-21879-w","article-title":"DeepTCR is a deep learning framework for revealing sequence concepts within T-cell repertoires","volume":"12","author":"Sidhom","year":"2021","journal-title":"Nat Commun"},{"key":"2023082304364732500_btad468-B38","doi-asserted-by":"crossref","first-page":"673","DOI":"10.1016\/j.jhep.2019.06.005","article-title":"Defining virus-specific CD8+ TCR repertoires for therapeutic regeneration of T cells against chronic hepatitis E","volume":"71","author":"Soon","year":"2019","journal-title":"J Hepatol"},{"key":"2023082304364732500_btad468-B39","doi-asserted-by":"crossref","first-page":"1803","DOI":"10.3389\/fimmu.2020.01803","article-title":"Prediction of specific TCR-peptide binding from large dictionaries of TCR-peptide pairs","volume":"11","author":"Springer","year":"2020","journal-title":"Front 
Immunol"},{"key":"2023082304364732500_btad468-B40","doi-asserted-by":"crossref","first-page":"949","DOI":"10.1007\/s00262-018-2152-x","article-title":"Quantitative T-cell repertoire analysis of peripheral blood mononuclear cells from lung cancer patients following long-term cancer peptide vaccination","volume":"67","author":"Takeda","year":"2018","journal-title":"Cancer Immunol Immunother"},{"key":"2023082304364732500_btad468-B41","doi-asserted-by":"crossref","first-page":"2924","DOI":"10.1093\/bioinformatics\/btx286","article-title":"McPAS-TCR: a manually curated catalogue of pathology-associated T cell receptor sequences","volume":"33","author":"Tickotsky","year":"2017","journal-title":"Bioinformatics"},{"key":"2023082304364732500_btad468-B42","doi-asserted-by":"crossref","first-page":"i237","DOI":"10.1093\/bioinformatics\/btab294","article-title":"TITAN: T-cell receptor specificity prediction with bimodal attention networks","volume":"37","author":"Weber","year":"2021","journal-title":"Bioinformatics"},{"key":"2023082304364732500_btad468-B43","author":"Wolf","year":"2019"},{"key":"2023082304364732500_btad468-B44","doi-asserted-by":"crossref","first-page":"bbab335","DOI":"10.1093\/bib\/bbab335","article-title":"DLpTCR: an ensemble deep learning framework for predicting immunogenic peptide recognized by T cell receptor","volume":"22","author":"Xu","year":"2021","journal-title":"Brief Bioinform"},{"key":"2023082304364732500_btad468-B45","doi-asserted-by":"crossref","first-page":"1156","DOI":"10.1038\/nbt.4282","article-title":"High-throughput determination of the antigen specificities of T cell receptors in single cells","volume":"36","author":"Zhang","year":"2018","journal-title":"Nat Biotechnol"},{"key":"2023082304364732500_btad468-B46","doi-asserted-by":"crossref","first-page":"897","DOI":"10.1093\/bioinformatics\/btz614","article-title":"PIRD: pan immune repertoire 
database","volume":"36","author":"Zhang","year":"2020","journal-title":"Bioinformatics"},{"key":"2023082304364732500_btad468-B47","doi-asserted-by":"crossref","first-page":"eabf5835","DOI":"10.1126\/sciadv.abf5835","article-title":"A framework for highly multiplexed dextramer mapping and prediction of T cell receptor sequences to antigen specificity","volume":"7","author":"Zhang","year":"2021","journal-title":"Sci Adv"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btad468\/51032269\/btad468.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/8\/btad468\/51226953\/btad468.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/8\/btad468\/51226953\/btad468.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,8,23]],"date-time":"2023-08-23T01:25:46Z","timestamp":1692753946000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btad468\/7236502"}},"subtitle":[],"editor":[{"given":"Pier 
Luigi","family":"Martelli","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2023,8,1]]},"references-count":47,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2023,8,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btad468","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2023.06.12.544613","asserted-by":"object"}]},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2023,8,1]]},"published":{"date-parts":[[2023,8,1]]},"article-number":"btad468"}}