{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,2]],"date-time":"2025-08-02T17:05:11Z","timestamp":1754154311992,"version":"3.41.2"},"reference-count":42,"publisher":"Oxford University Press (OUP)","issue":"4","license":[{"start":{"date-parts":[[2025,7,24]],"date-time":"2025-07-24T00:00:00Z","timestamp":1753315200000},"content-version":"vor","delay-in-days":23,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000054","name":"National Cancer Institute","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000054","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["R21CA264381"],"award-info":[{"award-number":["R21CA264381"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,7,2]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Predicting T-cell receptor (TCR) recognizing antigen peptides is crucial for understanding the immune system and developing new treatments for cancer, infectious and autoimmune diseases. As experimental methods for identifying TCR\u2013antigen recognition are expensive and time-consuming, machine-learning approaches are increasingly used. However, existing computational tools often struggle with generalization due to limited data and challenges in acquiring true non-recognition pairs and rarely integrate multiple biological features into unified frameworks. To address these challenges, we propose a two-step framework for predicting TCR\u2013antigen recognition. 
The first step focuses on feature engineering: neural network-based embeddings of letter-based TCR and peptide sequences inspired by language models, and categorical encoding of Human Leukocyte Antigen types and Variable\/Joining genes. In the second step, we built a prediction model to assess the likelihood of TCR\u2013antigen recognition by a Bayesian Feedforward Neural Network. We trained and validated the framework using large public databases. Our results demonstrate that our advanced feature engineering delivers strong predictive performance both internally and externally. We applied the framework to a real-world case for predicting whether specific TCRs can recognize SARS-CoV-2 epitope peptides, demonstrating that our framework can function as a de novo TCR\u2013antigen prediction tool applicable to infectious diseases.<\/jats:p>","DOI":"10.1093\/bib\/bbaf351","type":"journal-article","created":{"date-parts":[[2025,7,3]],"date-time":"2025-07-03T07:50:21Z","timestamp":1751529021000},"source":"Crossref","is-referenced-by-count":0,"title":["PepTCR-Net: prediction of multi-class antigen peptides by T-cell receptor sequences with deep learning"],"prefix":"10.1093","volume":"26","author":[{"given":"Phi","family":"Le","sequence":"first","affiliation":[{"name":"Department of Medicine, University of California San Francisco , 550 16th Street, San Francisco, CA 94158 ,","place":["United States"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Leah","family":"Ung","sequence":"additional","affiliation":[{"name":"Department of Medicine, University of California San Francisco , 550 16th Street, San Francisco, CA 94158 ,","place":["United States"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hai","family":"Yang","sequence":"additional","affiliation":[{"name":"Helen Diller Family Comprehensive Cancer Center, University of California San Francisco , 1450 3rd St. 
San Francisco, CA 94158 ,","place":["United States"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Anwen","family":"Huang","sequence":"additional","affiliation":[{"name":"Department of Medicine, University of California San Francisco , 550 16th Street, San Francisco, CA 94158 ,","place":["United States"]},{"name":"University of California Berkeley , 110 Sproul Hall, Berkeley, CA 94720 ,","place":["United States"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Tao","family":"He","sequence":"additional","affiliation":[{"name":"Department of Mathematics, San Francisco State University , 1600 Holloway Avenue, San Francisco, CA 94132 ,","place":["United States"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Peter","family":"Bruno","sequence":"additional","affiliation":[{"name":"Helen Diller Family Comprehensive Cancer Center, University of California San Francisco , 1450 3rd St. San Francisco, CA 94158 ,","place":["United States"]},{"name":"Department of Urology, University of California San Francisco , 1450 3rd Street, San Francisco, CA 94158 ,","place":["United States"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"David Y","family":"Oh","sequence":"additional","affiliation":[{"name":"Department of Medicine, University of California San Francisco , 550 16th Street, San Francisco, CA 94158 ,","place":["United States"]},{"name":"Helen Diller Family Comprehensive Cancer Center, University of California San Francisco , 1450 3rd St. San Francisco, CA 94158 ,","place":["United States"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Bridget P","family":"Keenan","sequence":"additional","affiliation":[{"name":"Department of Medicine, University of California San Francisco , 550 16th Street, San Francisco, CA 94158 ,","place":["United States"]},{"name":"Helen Diller Family Comprehensive Cancer Center, University of California San Francisco , 1450 3rd St. 
San Francisco, CA 94158 ,","place":["United States"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3617-2627","authenticated-orcid":false,"given":"Li","family":"Zhang","sequence":"additional","affiliation":[{"name":"Department of Medicine, University of California San Francisco , 550 16th Street, San Francisco, CA 94158 ,","place":["United States"]},{"name":"Helen Diller Family Comprehensive Cancer Center, University of California San Francisco , 1450 3rd St. San Francisco, CA 94158 ,","place":["United States"]},{"name":"Department of Epidemiology & Biostatistics, University of California San Francisco , 550 16th Street, San Francisco, CA 94158 ,","place":["United States"]}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2025,7,24]]},"reference":[{"key":"2025072400195450500_ref1","doi-asserted-by":"crossref","first-page":"934","DOI":"10.1038\/s41589-018-0130-4","article-title":"T cell receptor cross-reactivity expanded by dramatic peptide-MHC adaptability","volume":"14","author":"Riley","year":"2018","journal-title":"Nat Chem Biol"},{"volume-title":"Front Immunol","author":"Yang","key":"2025072400195450500_ref2","doi-asserted-by":"publisher","DOI":"10.3389\/fimmu.2023.1181825"},{"key":"2025072400195450500_ref3","doi-asserted-by":"crossref","first-page":"1060","DOI":"10.1038\/s42003-021-02610-3","article-title":"NetTCR-2.0 enables accurate prediction of TCR\u2013peptide binding by using paired TCR\u03b1 and \u03b2 sequence data","volume":"4","author":"Montemurro","year":"2021","journal-title":"Commun Biol"},{"key":"2025072400195450500_ref4","doi-asserted-by":"crossref","first-page":"10915","DOI":"10.1073\/pnas.89.22.10915","article-title":"Amino acid substitution matrices from protein blocks","volume":"89","author":"Henikoff","year":"1992","journal-title":"Proc Natl Acad Sci U S 
A"},{"key":"2025072400195450500_ref5","doi-asserted-by":"crossref","first-page":"bbaa318","DOI":"10.1093\/bib\/bbaa318","article-title":"Current challenges for unseen-epitope TCR interaction prediction and a new perspective derived from image classification","volume":"22","author":"Moris","year":"2021","journal-title":"Brief Bioinform"},{"key":"2025072400195450500_ref6","doi-asserted-by":"crossref","first-page":"92","DOI":"10.1038\/s41592-020-01020-3","article-title":"Mapping the functional landscape of T cell receptor repertoires by single-T cell transcriptomics","volume":"18","author":"Zhang","year":"2021","journal-title":"Nat Methods"},{"key":"2025072400195450500_ref7","doi-asserted-by":"crossref","first-page":"6395","DOI":"10.1073\/pnas.0408677102","article-title":"Solving the protein sequence metric problem","volume":"102","author":"Atchley","year":"2005","journal-title":"Proc Natl Acad Sci U S A"},{"key":"2025072400195450500_ref8","doi-asserted-by":"crossref","first-page":"893247","DOI":"10.3389\/fimmu.2022.893247","article-title":"ATM-TCR: TCR-epitope binding affinity prediction using a multi-head self-attention model","volume":"13","author":"Cai","year":"2022","journal-title":"Front Immunol"},{"key":"2025072400195450500_ref9","doi-asserted-by":"crossref","first-page":"114381","DOI":"10.1109\/ACCESS.2021.3104357","article-title":"A deep-learned embedding technique for categorical features encoding","volume":"9","author":"Dahouda","year":"2021","journal-title":"IEEE Access"},{"volume-title":"Attention is all you need Proceedings of the 31st International Conference on Neural Information Processing Systems","author":"Vaswani","key":"2025072400195450500_ref10"},{"key":"2025072400195450500_ref11","doi-asserted-by":"crossref","first-page":"2394","DOI":"10.1021\/bi102019c","article-title":"Dissecting protein\u2212protein interactions using directed evolution","volume":"50","author":"Bonsor","year":"2011","journal-title":"Biochemistry."},{"article-title":"Sequence 
to sequence learning with neural networks [internet]","volume-title":"Proceedings of the 28th International Conference on Neural Information Processing Systems","author":"Sutskever","key":"2025072400195450500_ref12"},{"key":"2025072400195450500_ref13","doi-asserted-by":"crossref","first-page":"1724","DOI":"10.3115\/v1\/D14-1179","volume-title":"Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) [Internet]","author":"Cho","year":"2014"},{"key":"2025072400195450500_ref14","doi-asserted-by":"crossref","first-page":"18","DOI":"10.1186\/s40537-023-00876-4","article-title":"A review of graph neural networks: concepts, architectures, techniques, challenges, datasets, applications, and future directions","volume":"11","author":"Khemani","year":"2024","journal-title":"J Big Data"},{"key":"2025072400195450500_ref15","doi-asserted-by":"crossref","first-page":"664514","DOI":"10.3389\/fimmu.2021.664514","article-title":"Contribution of T cell receptor alpha and Beta CDR3, MHC typing, V and J genes to peptide binding prediction","volume":"12","author":"Springer","year":"2021","journal-title":"Front Immunol"},{"key":"2025072400195450500_ref16","doi-asserted-by":"crossref","first-page":"236","DOI":"10.1038\/s42256-023-00619-3","article-title":"Pan-peptide meta learning for T-cell receptor\u2013antigen binding recognition","volume":"5","author":"Gao","year":"2023","journal-title":"Nat Mach Intell"},{"key":"2025072400195450500_ref17","doi-asserted-by":"crossref","first-page":"36538","DOI":"10.1109\/ACCESS.2022.3163384","article-title":"A review on Bayesian deep learning in healthcare: applications and challenges","volume":"10","author":"Abdullah","year":"2022","journal-title":"IEEE Access"},{"key":"2025072400195450500_ref18","first-page":"274","volume-title":"Probabilistic Deep Learning: With Python, Keras, and TensorFlow Probability","author":"D\u00fcrr","year":"2020"},{"volume-title":"Entropy 
(Basel)","key":"2025072400195450500_ref19","doi-asserted-by":"publisher","DOI":"10.3390\/e23010117"},{"key":"2025072400195450500_ref20","doi-asserted-by":"crossref","first-page":"D339","DOI":"10.1093\/nar\/gky1006","article-title":"The immune epitope database (IEDB): 2018 update","volume":"47","author":"Vita","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2025072400195450500_ref21","doi-asserted-by":"crossref","first-page":"D419","DOI":"10.1093\/nar\/gkx760","article-title":"VDJdb: a curated database of T-cell receptor sequences with known antigen specificity","volume":"46","author":"Shugay","year":"2018","journal-title":"Nucleic Acids Res"},{"key":"2025072400195450500_ref22","doi-asserted-by":"crossref","first-page":"2924","DOI":"10.1093\/bioinformatics\/btx286","article-title":"McPAS-TCR: a manually curated catalogue of pathology-associated T cell receptor sequences","volume":"33","author":"Tickotsky","year":"2017","journal-title":"Bioinformatics."},{"key":"2025072400195450500_ref23","doi-asserted-by":"crossref","DOI":"10.21203\/rs.3.rs-51964\/v1","article-title":"A large-scale database of T-cell receptor beta (TCR\u03b2) sequences and binding associations from natural and synthetic exposure to SARS-CoV-2","volume-title":"Frontiers in Immunology","author":"Nolan"},{"key":"2025072400195450500_ref24","doi-asserted-by":"crossref","first-page":"94","DOI":"10.1038\/nature22976","article-title":"Identifying specificity groups in the T cell receptor repertoire","volume":"547","author":"Glanville","year":"2017","journal-title":"Nature."},{"key":"2025072400195450500_ref25","doi-asserted-by":"crossref","first-page":"12704","DOI":"10.1073\/pnas.1809642115","article-title":"Precise tracking of vaccine-responding T cell clones reveals convergent and personalized response in identical twins","volume":"115","author":"Pogorelyy","year":"2018","journal-title":"Proc Natl Acad Sci U S 
A"},{"key":"2025072400195450500_ref26","doi-asserted-by":"crossref","first-page":"299","DOI":"10.1038\/nrc2355","article-title":"Adoptive cell transfer: a clinical path to effective cancer immunotherapy","volume":"8","author":"Rosenberg","year":"2008","journal-title":"Nat Rev Cancer"},{"key":"2025072400195450500_ref27","doi-asserted-by":"crossref","first-page":"e004034","DOI":"10.1136\/jitc-2021-004034","article-title":"Checkpoint blockade-induced CD8+ T cell differentiation in head and neck cancer responders","volume":"10","author":"Zhou","year":"2022","journal-title":"J Immunother Cancer"},{"key":"2025072400195450500_ref28","doi-asserted-by":"crossref","first-page":"3360","DOI":"10.4049\/jimmunol.1700893","article-title":"NetMHCpan-4.0: improved peptide\u2013MHC class I interaction predictions integrating eluted ligand and peptide binding affinity data","volume":"199","author":"Jurtz","year":"2017","journal-title":"The Journal of Immunology"},{"article-title":"Inductive representation learning on large graphs, Publisher Curran Associates Inc.","volume-title":"Proceedings of the 31st International Conference on Neural Information Processing Systems","author":"Hamilton","key":"2025072400195450500_ref29"},{"key":"2025072400195450500_ref30","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1016\/j.imavis.2018.04.004","article-title":"Beyond one-hot encoding: lower dimensional target embedding","volume":"75","author":"Rodr\u00edguez","year":"2018","journal-title":"Image and Vision Computing"},{"volume-title":"Pattern Recognition","key":"2025072400195450500_ref31","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2020.107585"},{"key":"2025072400195450500_ref32","doi-asserted-by":"publisher","DOI":"10.21105\/joss.00861","article-title":"UMAP: uniform manifold approximation and projection for dimension reduction","volume-title":"Journal of Open Source 
Software","author":"McInnes"},{"key":"2025072400195450500_ref33","doi-asserted-by":"crossref","first-page":"846","DOI":"10.1080\/01621459.1971.10482356","article-title":"Objective criteria for the evaluation of clustering methods","volume":"66","author":"Rand","year":"1971","journal-title":"J Am Stat Assoc"},{"key":"2025072400195450500_ref34","doi-asserted-by":"publisher","DOI":"10.3389\/fimmu.2024.1488860","volume-title":"Magnitude and Dynamics of the T-Cell Response to SARS-CoV-2 Infection at both Individual and Population Levels","author":"Snyder","year":"2020"},{"key":"2025072400195450500_ref35","doi-asserted-by":"crossref","first-page":"160","DOI":"10.1038\/s41392-024-01876-3","article-title":"Nonconserved epitopes dominate reverse preexisting T cell immunity in COVID-19 convalescents","volume":"9","author":"Wang","year":"2024","journal-title":"Sig Transduct Target Ther"},{"key":"2025072400195450500_ref36","doi-asserted-by":"crossref","first-page":"eabf7550","DOI":"10.1126\/sciimmunol.abf7550","article-title":"SARS-CoV-2 genome-wide T cell epitope mapping reveals immunodominance and substantial CD8+ T cell activation in COVID-19 patients","volume":"6","author":"Saini","year":"2021","journal-title":"Sci Immunol"},{"key":"2025072400195450500_ref37","doi-asserted-by":"crossref","first-page":"442","DOI":"10.1016\/j.immuni.2020.06.024","article-title":"Next-generation sequencing of T and B cell receptor repertoires from COVID-19 patients showed signatures associated with severity of disease","volume":"53","author":"Schulthei\u00df","year":"2020","journal-title":"Immunity."},{"volume-title":"Journal of Machine Learning Research","key":"2025072400195450500_ref38"},{"key":"2025072400195450500_ref39","doi-asserted-by":"crossref","first-page":"381","DOI":"10.21275\/ART20203995","article-title":"Machine learning algorithms\u2014a 
review","volume":"9","author":"Mahesh","year":"2020","journal-title":"IJSR."},{"key":"2025072400195450500_ref40","doi-asserted-by":"crossref","first-page":"e9416","DOI":"10.15252\/msb.20199416","article-title":"Predicting antigen specificity of single T cells based on TCR CDR 3 regions","volume":"16","author":"Fischer","year":"2020","journal-title":"Mol Syst Biol"},{"key":"2025072400195450500_ref41","doi-asserted-by":"crossref","first-page":"4641","DOI":"10.1109\/ACCESS.2018.2789428","article-title":"Class weights random Forest algorithm for processing class imbalanced medical data","volume":"6","author":"Zhu","year":"2018","journal-title":"IEEE Access."},{"key":"2025072400195450500_ref42","doi-asserted-by":"publisher","DOI":"10.1117\/1.JBO.26.10.105001","article-title":"Classification of imbalanced oral cancer image data from high-risk population","volume":"26","author":"Song","year":"2021","journal-title":"J Biomed Opt"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/26\/4\/bbaf351\/63836035\/bbaf351.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/26\/4\/bbaf351\/63836035\/bbaf351.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,24]],"date-time":"2025-07-24T04:20:10Z","timestamp":1753330810000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbaf351\/8211391"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7]]},"references-count":42,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2025,7,2]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbaf351","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"type":"print","value":"1467-5463"},{"type":"electronic","value":"1477-4054"}],"subject":
[],"published-other":{"date-parts":[[2025,7]]},"published":{"date-parts":[[2025,7]]},"article-number":"bbaf351"}}