{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,10]],"date-time":"2026-06-10T18:34:09Z","timestamp":1781116449687,"version":"3.54.1"},"reference-count":42,"publisher":"Oxford University Press (OUP)","issue":"6","license":[{"start":{"date-parts":[[2021,5,3]],"date-time":"2021-05-03T00:00:00Z","timestamp":1620000000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["R01CA226802"],"award-info":[{"award-number":["R01CA226802"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Cincinnati Children\u2019s Hospital Research Foundation"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,11,5]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Cytolytic T-cells play an essential role in the adaptive immune system by seeking out, binding and killing cells that present foreign antigens on their surface. An improved understanding of T-cell immunity will greatly aid in the development of new cancer immunotherapies and vaccines for life-threatening pathogens. Central to the design of such targeted therapies are computational methods to predict non-native peptides to elicit a T-cell response, however, we currently lack accurate immunogenicity inference methods. Another challenge is the ability to accurately simulate immunogenic peptides for specific human leukocyte antigen alleles, for both synthetic biological applications, and to augment real training datasets. Here, we propose a beta-binomial distribution approach to derive peptide immunogenic potential from sequence alone. We conducted systematic benchmarking of five traditional machine learning (ElasticNet, K-nearest neighbors, support vector machine, Random Forest and AdaBoost) and three deep learning models (convolutional neural network (CNN), Residual Net and graph neural network) using three independent prior validated immunogenic peptide collections (dengue virus, cancer neoantigen and SARS-CoV-2). We chose the CNN as the best prediction model, based on its adaptivity for small and large datasets and performance relative to existing methods. In addition to outperforming two highly used immunogenicity prediction algorithms, DeepImmuno-CNN correctly predicts which residues are most important for T-cell antigen recognition and predicts novel impacts of SARS-CoV-2 variants. Our independent generative adversarial network (GAN) approach, DeepImmuno-GAN, was further able to accurately simulate immunogenic peptides with physicochemical properties and immunogenicity predictions similar to that of real antigens. We provide DeepImmuno-CNN as source code and an easy-to-use web interface.<\/jats:p>","DOI":"10.1093\/bib\/bbab160","type":"journal-article","created":{"date-parts":[[2021,4,6]],"date-time":"2021-04-06T15:14:18Z","timestamp":1617722058000},"source":"Crossref","is-referenced-by-count":147,"title":["DeepImmuno: deep learning-empowered prediction and generation of immunogenic peptides for T-cell immunity"],"prefix":"10.1093","volume":"22","author":[{"given":"Guangyuan","family":"Li","sequence":"first","affiliation":[{"name":"University of Cincinnati, 3333 Burnet Ave, MLC7024, Cincinnati, OH 45267, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Balaji","family":"Iyer","sequence":"additional","affiliation":[{"name":"University of Cincinnati, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"V B Surya","family":"Prasath","sequence":"additional","affiliation":[{"name":"Biomedical Informatics, Cincinnati Children\u2019s Hospital Medical Center, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Yizhao","family":"Ni","sequence":"additional","affiliation":[{"name":"Cincinnati Children\u2019s Hospital Medical Center, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Nathan","family":"Salomonis","sequence":"additional","affiliation":[{"name":"Cincinnati Children\u2019s Hospital Medical Center, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"286","published-online":{"date-parts":[[2021,5,3]]},"reference":[{"key":"2021110814302366500_ref1","doi-asserted-by":"crossref","first-page":"74","DOI":"10.1038\/s41590-020-00808-x","article-title":"SARS-CoV-2-derived peptides define heterologous and COVID-19-induced T cell recognition","volume":"22","author":"Nelde","year":"2020","journal-title":"Nat Immunol"},{"key":"2021110814302366500_ref2","first-page":"1","article-title":"Li G. T cell antigen discovery","volume":"7","author":"Joglekar","year":"2020","journal-title":"Nat Methods"},{"key":"2021110814302366500_ref3","doi-asserted-by":"crossref","DOI":"10.1101\/171843","article-title":"neoantigenR: an annotation based pipeline for tumor neoantigen identification from sequencing data","author":"Tang","year":"2017"},{"key":"2021110814302366500_ref4","doi-asserted-by":"crossref","first-page":"1119","DOI":"10.1093\/bib\/bbz051","article-title":"A comprehensive review and performance evaluation of bioinformatics tools for HLA class I peptide-binding prediction","volume":"21","author":"Mei","year":"2020","journal-title":"Brief Bioinform"},{"key":"2021110814302366500_ref5","doi-asserted-by":"crossref","DOI":"10.1371\/journal.pcbi.1003266","article-title":"Properties of MHC class I presented peptides that enhance immunogenicity","volume":"9","author":"Calis","year":"2013","journal-title":"PLoS Comput Biol"},{"key":"2021110814302366500_ref6","doi-asserted-by":"crossref","first-page":"675","DOI":"10.1038\/s41577-019-0195-7","article-title":"Alternative mRNA splicing in cancer immunotherapy","volume":"19","author":"Frankiw","year":"2019","journal-title":"Nat Rev Immunol"},{"key":"2021110814302366500_ref7","doi-asserted-by":"crossref","first-page":"942","DOI":"10.1093\/bioinformatics\/btm061","article-title":"POPI: predicting immunogenicity of MHC class I binding peptides by mining informative physicochemical properties","volume":"23","author":"Tung","year":"2007","journal-title":"Bioinformatics"},{"key":"2021110814302366500_ref8","doi-asserted-by":"crossref","first-page":"446","DOI":"10.1186\/1471-2105-12-446","article-title":"POPISK: T-cell reactivity prediction using support vector machines and string kernels","volume":"12","author":"Tung","year":"2011","journal-title":"BMC Bioinformatics"},{"key":"2021110814302366500_ref9","doi-asserted-by":"crossref","first-page":"293","DOI":"10.1016\/j.jim.2012.09.016","article-title":"PAAQD: predicting immunogenicity of MHC class I binding peptides using amino acid pairwise contact potentials and quantum topological molecular similarity descriptors","volume":"387","author":"Saethang","year":"2013","journal-title":"J Immunol Methods"},{"key":"2021110814302366500_ref10","doi-asserted-by":"crossref","first-page":"1030","DOI":"10.1093\/annonc\/mdy022","article-title":"Neopepsee: accurate genome-level prediction of neoantigens by harnessing sequence and amino acid immunogenicity information","volume":"29","author":"Kim","year":"2018","journal-title":"Ann Oncol"},{"key":"2021110814302366500_ref11","first-page":"2020","article-title":"INeo-Epp: a novel T-cell HLA class-I immunogenicity or neoantigenic epitope prediction method based on sequence-related amino acid features","volume":"5798356","author":"Wang","year":"2020","journal-title":"Biomed Res Int"},{"key":"2021110814302366500_ref12","doi-asserted-by":"crossref","first-page":"2559","DOI":"10.3389\/fimmu.2019.02559","article-title":"DeepHLApan: a deep learning approach for neoantigen prediction considering both HLA-peptide binding and immunogenicity","volume":"10","author":"Wu","year":"2019","journal-title":"Front Immunol"},{"key":"2021110814302366500_ref13","first-page":"3581","article-title":"Semi-supervised learning with deep generative models","volume":"27","author":"Kingma","year":"2014","journal-title":"Adv Neural Inf Proces Syst"},{"key":"2021110814302366500_ref14","doi-asserted-by":"crossref","first-page":"1459107","DOI":"10.1155\/2020\/1459107","article-title":"Generative adversarial network technologies and applications in computer vision","volume":"2020","author":"Jin","year":"2020","journal-title":"Comput Intell Neurosci"},{"key":"2021110814302366500_ref15","doi-asserted-by":"crossref","first-page":"105","DOI":"10.1038\/s42256-019-0017-4","article-title":"Feedback GAN for DNA optimizes protein functions","volume":"1","author":"Gupta","year":"2019","journal-title":"Nat Mach Intell"},{"key":"2021110814302366500_ref16","doi-asserted-by":"crossref","first-page":"354","DOI":"10.1016\/j.patcog.2017.10.013","article-title":"Recent advances in convolutional neural networks","volume":"77","author":"Gu","year":"2018","journal-title":"Pattern Recogn"},{"key":"2021110814302366500_ref17","doi-asserted-by":"crossref","first-page":"D202","DOI":"10.1093\/nar\/gkm998","article-title":"AAindex: amino acid index database, progress report 2008","volume":"36","author":"Kawashima","year":"2007","journal-title":"Nucleic Acids Res"},{"key":"2021110814302366500_ref18","doi-asserted-by":"crossref","first-page":"1477","DOI":"10.1007\/s10994-018-5724-2","article-title":"Similarity encoding for learning with dirty categorical variables","volume":"107","author":"Cerda","year":"2018","journal-title":"Mach Learn"},{"key":"2021110814302366500_ref19","first-page":"214","author":"Arjovsky","year":"2017"},{"key":"2021110814302366500_ref20","doi-asserted-by":"crossref","first-page":"e796","DOI":"10.1371\/journal.pone.0000796","article-title":"NetMHCpan, a method for quantitative predictions of peptide binding to any HLA-A and -B locus protein of known sequence","volume":"2","author":"Nielsen","year":"2007","journal-title":"PLoS One"},{"key":"2021110814302366500_ref21","doi-asserted-by":"crossref","first-page":"E2046","DOI":"10.1073\/pnas.1305227110","article-title":"Comprehensive analysis of dengue virus-specific responses supports an HLA-linked protective role for CD8+ T cells","volume":"110","author":"Weiskopf","year":"2013","journal-title":"Proc Natl Acad Sci U S A"},{"key":"2021110814302366500_ref22","doi-asserted-by":"crossref","first-page":"818","DOI":"10.1016\/j.cell.2020.09.015","article-title":"Key parameters of tumor epitope immunogenicity revealed through a consortium approach improve neoantigen prediction","volume":"183","author":"Wells","year":"2020","journal-title":"Cell"},{"key":"2021110814302366500_ref23","doi-asserted-by":"crossref","DOI":"10.1371\/journal.pone.0118432","article-title":"The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets","volume":"10","author":"Saito","year":"2015","journal-title":"PLoS One"},{"key":"2021110814302366500_ref24","doi-asserted-by":"crossref","first-page":"665","DOI":"10.1038\/s42256-020-00257-z","article-title":"Shortcut learning in deep neural networks","volume":"2","author":"Geirhos","year":"2020","journal-title":"Nat Mach Intell"},{"key":"2021110814302366500_ref25","doi-asserted-by":"crossref","first-page":"1464","DOI":"10.1126\/science.abe8499","article-title":"SARS-CoV-2 D614G variant exhibits efficient replication ex vivo and transmission in vivo","volume":"370","author":"Hou","year":"2020","journal-title":"Science"},{"key":"2021110814302366500_ref26","doi-asserted-by":"crossref","first-page":"1295","DOI":"10.1016\/j.cell.2020.08.012","article-title":"Deep mutational scanning of SARS-CoV-2 receptor binding domain reveals constraints on folding and ACE2 binding","volume":"182","author":"Starr","year":"2020","journal-title":"Cell"},{"key":"2021110814302366500_ref27","doi-asserted-by":"crossref","first-page":"850","DOI":"10.1126\/science.abf9302","article-title":"Prospective mapping of viral mutations that escape antibodies used to treat COVID-19","volume":"371","author":"Starr","year":"2021","journal-title":"Science"},{"key":"2021110814302366500_ref28","doi-asserted-by":"crossref","first-page":"590","DOI":"10.1016\/j.coi.2009.07.008","article-title":"Structural alterations in peptide-MHC recognition by self-reactive T cell receptors","volume":"21","author":"Wucherpfennig","year":"2009","journal-title":"Curr Opin Immunol"},{"key":"2021110814302366500_ref29","doi-asserted-by":"crossref","first-page":"419","DOI":"10.1146\/annurev.immunol.23.021704.115658","article-title":"How TCRs bind MHCs, peptides, and coreceptors","volume":"24","author":"Rudolph","year":"2006","journal-title":"Annu Rev Immunol"},{"key":"2021110814302366500_ref30","doi-asserted-by":"crossref","first-page":"2908","DOI":"10.1038\/s41467-020-16755-y","article-title":"Structural basis for oligoclonal T cell recognition of a shared p53 cancer neoantigen","volume":"11","author":"Wu","year":"2020","journal-title":"Nat Commun"},{"key":"2021110814302366500_ref31","doi-asserted-by":"crossref","first-page":"4946","DOI":"10.1093\/bioinformatics\/btz427","article-title":"ACME: pan-specific peptide-MHC class I binding prediction through attention-based deep neural networks","volume":"35","author":"Hu","year":"2019","journal-title":"Bioinformatics"},{"issue":"3","key":"2021110814302366500_ref32","first-page":"1","article-title":"Use of molecular modeling and site-directed mutagenesis to define the structural basis for the immune response to carbohydrate xenoantigens","volume":"8","author":"Kearns-Jonker","year":"2007","journal-title":"BMC Immunol"},{"key":"2021110814302366500_ref33","article-title":"The Python Language Reference Manual","author":"Van Rossum","year":"2011"},{"key":"2021110814302366500_ref34","doi-asserted-by":"crossref","first-page":"418","DOI":"10.1016\/j.cels.2020.09.001","article-title":"MHCflurry 2.0: improved Pan-allele prediction of MHC class I-presented peptides by incorporating antigen processing","volume":"11","author":"O\u2019Donnell","year":"2020","journal-title":"Cell Syst"},{"key":"2021110814302366500_ref35","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41598-020-77466-4","article-title":"Identification and validation of 174 COVID-19 vaccine candidate epitopes reveals low performance of common epitope prediction tools","volume":"10","author":"Prachar","year":"2020","journal-title":"Sci Rep"},{"key":"2021110814302366500_ref36","doi-asserted-by":"crossref","first-page":"831","DOI":"10.1038\/nbt.3300","article-title":"Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning","volume":"33","author":"Alipanahi","year":"2015","journal-title":"Nat Biotechnol"},{"key":"2021110814302366500_ref37","doi-asserted-by":"crossref","first-page":"3427","DOI":"10.1093\/bioinformatics\/bty364","article-title":"Predicting RNA-protein binding sites and motifs through combining local and global deep convolutional neural networks","volume":"34","author":"Pan","year":"2018","journal-title":"Bioinformatics"},{"key":"2021110814302366500_ref38","doi-asserted-by":"crossref","first-page":"5323","DOI":"10.1093\/bioinformatics\/btz517","article-title":"TCR3d: the T cell receptor structural repertoire database","volume":"35","author":"Gowthaman","year":"2019","journal-title":"Bioinformatics"},{"key":"2021110814302366500_ref39","doi-asserted-by":"crossref","first-page":"D1057","DOI":"10.1093\/nar\/gkz874","article-title":"VDJdb in 2019: database extension, new analysis infrastructure and a T-cell receptor motif compendium","volume":"48","author":"Bagaev","year":"2020","journal-title":"Nucleic Acids Res"},{"key":"2021110814302366500_ref40","doi-asserted-by":"crossref","first-page":"1293","DOI":"10.1016\/j.cell.2018.05.060","article-title":"Single-cell map of diverse immune phenotypes in the breast tumor microenvironment","volume":"174","author":"Azizi","year":"2018","journal-title":"Cell"},{"key":"2021110814302366500_ref41","first-page":"2018","article-title":"Description of CD8 regulatory T lymphocytes and their specific intervention in graft-versus-host and infectious diseases, autoimmunity, and cancer","volume":"3758713","author":"Vieyra-Lobato","year":"2018","journal-title":"J Immunol Res"},{"key":"2021110814302366500_ref42","doi-asserted-by":"crossref","first-page":"375","DOI":"10.1006\/jtbi.1994.1160","article-title":"T cell repertoires and competitive exclusion","volume":"169","author":"De Boer","year":"1994","journal-title":"J Theor Biol"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/22\/6\/bbab160\/41087451\/bbab160.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/22\/6\/bbab160\/41087451\/bbab160.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,11,8]],"date-time":"2021-11-08T09:34:16Z","timestamp":1636364056000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbab160\/6261914"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,5,3]]},"references-count":42,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2021,11,5]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbab160","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2020.12.24.424262","asserted-by":"object"}]},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021,11]]},"published":{"date-parts":[[2021,5,3]]},"article-number":"bbab160"}}