{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,23]],"date-time":"2026-03-23T17:56:01Z","timestamp":1774288561071,"version":"3.50.1"},"reference-count":51,"publisher":"Oxford University Press (OUP)","issue":"17","license":[{"start":{"date-parts":[[2017,4,21]],"date-time":"2017-04-21T00:00:00Z","timestamp":1492732800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/about_us\/legal\/notices"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2017,9,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Many biological processes are governed by protein\u2013ligand interactions. One such example is the recognition of self and non-self cells by the immune system. This immune response process is regulated by the major histocompatibility complex (MHC) protein which is encoded by the human leukocyte antigen (HLA) complex. Understanding the binding potential between MHC and peptides can lead to the design of more potent, peptide-based vaccines and immunotherapies for infectious autoimmune diseases.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We apply machine learning techniques from the natural language processing (NLP) domain to address the task of MHC-peptide binding prediction. More specifically, we introduce a new distributed representation of amino acids, name HLA-Vec, that can be used for a variety of downstream proteomic machine learning tasks. We then propose a deep convolutional neural network architecture, name HLA-CNN, for the task of HLA class I-peptide binding prediction. 
Experimental results show combining the new distributed representation with our HLA-CNN architecture achieves state-of-the-art results in the majority of the latest two Immune Epitope Database (IEDB) weekly automated benchmark datasets. We further apply our model to predict binding on the human genome and identify 15 genes with potential for self binding.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and Implementation<\/jats:title>\n                    <jats:p>Codes to generate the HLA-Vec and HLA-CNN are publicly available at: https:\/\/github.com\/uci-cbcl\/HLA-bind.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btx264","type":"journal-article","created":{"date-parts":[[2017,4,20]],"date-time":"2017-04-20T03:52:13Z","timestamp":1492660333000},"page":"2658-2665","source":"Crossref","is-referenced-by-count":93,"title":["HLA class I binding prediction via convolutional neural networks"],"prefix":"10.1093","volume":"33","author":[{"given":"Yeeleng S","family":"Vang","sequence":"first","affiliation":[{"name":"Department of Computer Science, University of California, Irvine, CA, USA"}]},{"given":"Xiaohui","family":"Xie","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of California, Irvine, CA, USA"},{"name":"Institute for Genomics and Bioinformatics, University of California, Irvine, CA, USA"}]}],"member":"286","published-online":{"date-parts":[[2017,4,21]]},"reference":[{"key":"2023020206270840300_btx264-B1","first-page":"btv639.","article-title":"Gapped sequence alignment using artificial neural networks: application to the MHC class I 
system","author":"Andreatta","year":"2015","journal-title":"Bioinformatics"},{"key":"2023020206270840300_btx264-B2","doi-asserted-by":"crossref","first-page":"e0141287.","DOI":"10.1371\/journal.pone.0141287","article-title":"Continuous distributed representation of biological sequences for deep proteomics and genomics","volume":"10","author":"Asgari","year":"2015","journal-title":"PloS One"},{"key":"2023020206270840300_btx264-B4","doi-asserted-by":"crossref","first-page":"6395","DOI":"10.1073\/pnas.0408677102","article-title":"Solving the protein sequence metric problem","volume":"102","author":"Atchley","year":"2005","journal-title":"Proc Natl Acad Sci USA"},{"key":"2023020206270840300_btx264-B5","first-page":"238","article-title":"Don\u2019t count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors","volume":"1","author":"Baroni","year":"2014","journal-title":"ACL"},{"key":"2023020206270840300_btx264-B3","doi-asserted-by":"crossref","first-page":"436","DOI":"10.1016\/j.ymeth.2004.06.006","article-title":"Computational methods for prediction of T-cell epitopes\u2014a framework for modelling, testing, and applications","volume":"34","author":"Brusic","year":"2004","journal-title":"Methods"},{"key":"2023020206270840300_btx264-B6","doi-asserted-by":"crossref","first-page":"82","DOI":"10.1016\/S1359-6446(03)02953-2","article-title":"Minimizing the immunogenicity of protein therapeutics","volume":"9","author":"Chirino","year":"2004","journal-title":"Drug Discovery Today"},{"key":"2023020206270840300_btx264-B7","doi-asserted-by":"crossref","first-page":"4580","DOI":"10.1073\/pnas.1201586109","article-title":"Promiscuous binding of extracellular peptides to cell surface class I MHC protein","volume":"109","author":"Eisen","year":"2012","journal-title":"Proc Natl Acad Sci"},{"key":"2023020206270840300_btx264-B8","doi-asserted-by":"crossref","first-page":"78","DOI":"10.1034\/j.1399-0039.2002.590202.x","article-title":"HLA Class II 
peptide-binding and autoimmunity","volume":"59","author":"Gebe","year":"2002","journal-title":"Tissue Antigens"},{"key":"2023020206270840300_btx264-B9","first-page":"249","article-title":"Understanding the difficulty of training deep feedforward neural networks","volume":"9","author":"Glorot","year":"2010","journal-title":"In Aistats"},{"key":"2023020206270840300_btx264-B10","doi-asserted-by":"crossref","first-page":"10915","DOI":"10.1073\/pnas.89.22.10915","article-title":"Amino acid substitution matrices from protein blocks","volume":"89","author":"Henikoff","year":"1992","journal-title":"Proc Natl Acad Sci"},{"key":"2023020206270840300_btx264-B11","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1007\/s00251-008-0341-z","article-title":"NetMHCpan, a method for MHC class I binding prediction beyond humans","volume":"61","author":"Hoof","year":"2009","journal-title":"Immunogenet"},{"key":"2023020206270840300_btx264-B12","doi-asserted-by":"crossref","first-page":"554","DOI":"10.1038\/nature11147","article-title":"Immune self-reactivity triggered by drug-modified HLA-peptide repertoire","volume":"486","author":"Illing","year":"2012","journal-title":"Nature"},{"key":"2023020206270840300_btx264-B13","volume-title":"Immunobiology: The Immune System in Health and Disease","author":"Janeway","year":"2001","edition":"5th edn."},{"key":"2023020206270840300_btx264-B14","doi-asserted-by":"crossref","first-page":"8","DOI":"10.1186\/1479-5876-1-8","article-title":"Polymorphism in clinical immunology-from HLA typing to immunogenetic profiling","volume":"1","author":"Jin","year":"2003","journal-title":"J Transl Med"},{"key":"2023020206270840300_btx264-B15","author":"Kalchbrenner","year":"2014"},{"key":"2023020206270840300_btx264-B16","doi-asserted-by":"crossref","first-page":"23","DOI":"10.1007\/BF01025492","article-title":"Statistical analysis of the physical properties of the 20 naturally occurring amino 
acids","volume":"4","author":"Kidera","year":"1985","journal-title":"J Protein Chem"},{"key":"2023020206270840300_btx264-B17","doi-asserted-by":"crossref","first-page":"1.","DOI":"10.1186\/1471-2105-10-394","article-title":"Derivation of an amino acid similarity matrix for peptide: MHC binding and its application as a Bayesian prior","volume":"10","author":"Kim","year":"2009","journal-title":"BMC Bioinformatics"},{"key":"2023020206270840300_btx264-B18","author":"Kim","year":"2014"},{"key":"2023020206270840300_btx264-B19","doi-asserted-by":"crossref","first-page":"e1003088.","DOI":"10.1371\/journal.pcbi.1003088","article-title":"Scrutinizing MHC-I binding peptides and their limits of variation","volume":"9","author":"Koch","year":"2013","journal-title":"PLoS Comput Biol"},{"key":"2023020206270840300_btx264-B20","first-page":"1097","article-title":"Imagenet classification with deep convolutional neural networks","author":"Krizhevsky","year":"2012","journal-title":"In Advances in Neural Information Processing Systems"},{"key":"2023020206270840300_btx264-B21","first-page":"btv371.","article-title":"High-order neural networks and kernel methods for peptide-MHC binding prediction","author":"Kuksa","year":"2015","journal-title":"Bioinformatics"},{"key":"2023020206270840300_btx264-B22","doi-asserted-by":"crossref","first-page":"61.","DOI":"10.1186\/1756-0500-2-61","article-title":"MHCBN 4.0: A database of MHC\/TAP binding peptides and T-cell epitopes","volume":"2","author":"Lata","year":"2009","journal-title":"BMC Res. 
Notes"},{"key":"2023020206270840300_btx264-B23","doi-asserted-by":"crossref","first-page":"541","DOI":"10.1162\/neco.1989.1.4.541","article-title":"Backpropagation applied to handwritten zip code recognition","volume":"1","author":"LeCun","year":"1989","journal-title":"Neural Comput"},{"key":"2023020206270840300_btx264-B24","first-page":"211","article-title":"Improving distributional similarity with lessons learned from word embeddings","volume":"3","author":"Levy","year":"2015","journal-title":"Trans Assoc Comput Ling"},{"key":"2023020206270840300_btx264-B25","doi-asserted-by":"crossref","first-page":"1397","DOI":"10.1093\/bioinformatics\/btn128","article-title":"Accurate approximation method for prediction of class I MHC affinities for peptides of length 8, 10 and 11 using prediction tools trained on 9mers","volume":"24","author":"Lundegaard","year":"2008","journal-title":"Bioinformatics"},{"key":"2023020206270840300_btx264-B26","doi-asserted-by":"crossref","first-page":"S9","DOI":"10.1186\/1471-2105-16-S13-S9","article-title":"Understanding and predicting binding between human leukocyte antigens (HLAs) and peptides by network analysis","volume":"16, (Suppl. 13)","author":"Luo","year":"2015","journal-title":"BMC Bioinformatics"},{"key":"2023020206270840300_btx264-B27","doi-asserted-by":"crossref","DOI":"10.1038\/srep32115","article-title":"sNebula, a network-based algorithm to predict binding between human leukocyte antigens and peptides","volume":"6","author":"Luo","year":"2016","journal-title":"Scientific Reports"},{"key":"2023020206270840300_btx264-B28","article-title":"Rectifier nonlinearities improve neural network acoustic models","volume":"30","author":"Maas","year":"2013","journal-title":"In Proc. ICML"},{"key":"2023020206270840300_btx264-B29","first-page":"2579","article-title":"Visualizing data using t-SNE","volume":"9","author":"Maaten","year":"2008","journal-title":"J. Mach. Learn. 
Res"},{"key":"2023020206270840300_btx264-B30","doi-asserted-by":"crossref","first-page":"291","DOI":"10.1111\/j.1399-0039.2010.01466.x","article-title":"Nomenclature for factors of the HLA system, 2010","volume":"75","author":"Marsh","year":"2010","journal-title":"Tissue Antigens"},{"key":"2023020206270840300_btx264-B31","doi-asserted-by":"crossref","first-page":"2","DOI":"10.6026\/97320630001002","article-title":"Apdbase: Amino acid physico-chemical properties database","volume":"1","author":"Mathura","year":"2005","journal-title":"Bioinformation"},{"key":"2023020206270840300_btx264-B32","first-page":"3111","article-title":"Distributed representations of words and phrases and their compositionality","author":"Mikolov","year":"2013","journal-title":"Adv. Neural Inform. Process. Syst"},{"key":"2023020206270840300_btx264-B33","article-title":"Efficient estimation of word representations in vector space","author":"Mikolov","year":"2013","journal-title":"ICLR Workshop"},{"key":"2023020206270840300_btx264-B34","doi-asserted-by":"crossref","first-page":"1007","DOI":"10.1110\/ps.0239403","article-title":"Reliable prediction of T-cell epitopes using neural networks with novel sequence representations","volume":"12","author":"Nielsen","year":"2003","journal-title":"Protein Sci"},{"key":"2023020206270840300_btx264-B35","doi-asserted-by":"crossref","first-page":"1.","DOI":"10.1186\/s13073-016-0288-x","article-title":"NetMHCpan-3.0; improved prediction of binding to MHC class I molecules integrating information from multiple receptor and peptide length datasets","volume":"8","author":"Nielsen","year":"2016","journal-title":"Genome Med"},{"key":"2023020206270840300_btx264-B36","doi-asserted-by":"crossref","first-page":"5831","DOI":"10.4049\/jimmunol.1302101","article-title":"HLA class I alleles are associated with peptide-binding repertoires of different size, affinity, and immunogenicity","volume":"191","author":"Paul","year":"2013","journal-title":"J. 
Immunol"},{"key":"2023020206270840300_btx264-B37","first-page":"1532","article-title":"Glove: global vectors for word representation","volume":"14","author":"Pennington","year":"2014","journal-title":"EMNLP"},{"key":"2023020206270840300_btx264-B38","doi-asserted-by":"crossref","first-page":"213","DOI":"10.1007\/s002510050595","article-title":"SYFPEITHI: database for MHC ligands and peptide motifs","volume":"50","author":"Rammensee","year":"1999","journal-title":"Immunogenetics"},{"key":"2023020206270840300_btx264-B39","doi-asserted-by":"crossref","first-page":"453","DOI":"10.2174\/138920207783591690","article-title":"The HLA region and autoimmune disease: associations and mechanisms of action","volume":"8","author":"Simmonds","year":"2007","journal-title":"Current Genomics"},{"key":"2023020206270840300_btx264-B40","author":"Simonyan","year":"2014"},{"key":"2023020206270840300_btx264-B41","first-page":"1929","article-title":"Dropout: a simple way to prevent neural networks from overfitting","volume":"15","author":"Srivastava","year":"2014","journal-title":"J. Mach. Learn. 
Res"},{"key":"2023020206270840300_btx264-B42","first-page":"140","volume-title":"European Conference on Computer Vision","author":"Taylor","year":"2010"},{"key":"2023020206270840300_btx264-B43","doi-asserted-by":"crossref","first-page":"4.","DOI":"10.1186\/1745-7580-1-4","article-title":"AntiJen: a quantitative immunology database integrating functional, thermodynamic, kinetic, biophysical, and cellular data","volume":"1","author":"Toseland","year":"2005","journal-title":"Immunome Res"},{"key":"2023020206270840300_btx264-B44","first-page":"btv123.","article-title":"Automated benchmarking of peptide-MHC class I binding predictions","author":"Trolle","year":"2015","journal-title":"Bioinformatics"},{"key":"2023020206270840300_btx264-B45","first-page":"535.","article-title":"HLA class II molecules (HLA-DR,-DP,-DQ) on cells in the human CNS studied in situ and in vitro","volume":"82","author":"Ulvestad","year":"1994","journal-title":"Immunology"},{"key":"2023020206270840300_btx264-B46","doi-asserted-by":"crossref","first-page":"D158","DOI":"10.1093\/nar\/gkw1099","article-title":"UniProt: the universal protein knowledgebase","volume":"45","author":"The UniProt Consortium","year":"2017","journal-title":"Nucleic Acids Res"},{"key":"2023020206270840300_btx264-B47","doi-asserted-by":"crossref","first-page":"916","DOI":"10.1016\/j.addr.2005.11.003","article-title":"Improved peptide vaccine strategies, creating synthetic artificial infections to maximize immune efficacy","volume":"58","author":"van der Burg","year":"2006","journal-title":"Adv Drug Deliv. 
Rev"},{"key":"2023020206270840300_btx264-B48","doi-asserted-by":"crossref","first-page":"D405","DOI":"10.1093\/nar\/gku938","article-title":"The immune epitope database (IEDB) 3.0","volume":"43","author":"Vita","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2023020206270840300_btx264-B49","doi-asserted-by":"crossref","first-page":"75","DOI":"10.2174\/1386207318666150121125746","article-title":"Quantitative prediction of class I MHC\/epitope binding affinity using QSAR modeling derived from amino acid structural information","volume":"18","author":"Wang","year":"2015","journal-title":"Comb. Chem. High Throughput Screen"},{"key":"2023020206270840300_btx264-B50","doi-asserted-by":"crossref","first-page":"98","DOI":"10.1016\/S1525-1578(10)60658-7","article-title":"Human leukocyte antigen gene polymorphism and the histocompatibility laboratory","volume":"3.3","author":"Williams","year":"2001","journal-title":"J. Mol. Diagn"},{"key":"2023020206270840300_btx264-B51","doi-asserted-by":"crossref","first-page":"51","DOI":"10.1146\/annurev.immunol.17.1.51","article-title":"Immunodominance in major histocompatibility complex class I-restricted T lymphocyte responses 1","volume":"17","author":"Yewdell","year":"1999","journal-title":"Annu. Rev. 
Immunol"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/33\/17\/2658\/49040904\/bioinformatics_33_17_2658.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/33\/17\/2658\/49040904\/bioinformatics_33_17_2658.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,2]],"date-time":"2023-02-02T01:29:48Z","timestamp":1675301388000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/33\/17\/2658\/3746909"}},"subtitle":[],"editor":[{"given":"John","family":"Hancock","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2017,4,21]]},"references-count":51,"journal-issue":{"issue":"17","published-print":{"date-parts":[[2017,9,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btx264","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/099358","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2017,9,1]]},"published":{"date-parts":[[2017,4,21]]}}}