{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,13]],"date-time":"2026-05-13T18:13:36Z","timestamp":1778696016121,"version":"3.51.4"},"reference-count":54,"publisher":"Oxford University Press (OUP)","issue":"5","license":[{"start":{"date-parts":[[2022,5,6]],"date-time":"2022-05-06T00:00:00Z","timestamp":1651795200000},"content-version":"vor","delay-in-days":1,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"name":"Senior and Junior Technological Innovation Team","award":["20210509055RQ"],"award-info":[{"award-number":["20210509055RQ"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62072212"],"award-info":[{"award-number":["62072212"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["U19A2061"],"award-info":[{"award-number":["U19A2061"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Jilin Provincial Key Laboratory of Big Data Intelligent Computing","award":["20180622002JC"],"award-info":[{"award-number":["20180622002JC"]}]},{"DOI":"10.13039\/501100012226","name":"Fundamental Research Funds for the Central Universities","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100012226","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,9,20]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Human Leukocyte Antigen (HLA) is a type of molecule residing on the surfaces of most human cells and exerts an essential role in the immune system responding to the invasive items. The T cell antigen receptors may recognize the HLA-peptide complexes on the surfaces of cancer cells and destroy these cancer cells through toxic T lymphocytes. The computational determination of HLA-binding peptides will facilitate the rapid development of cancer immunotherapies. This study hypothesized that the natural language processing-encoded peptide features may be further enriched by another deep neural network. The hypothesis was tested with the Bi-directional Long Short-Term Memory-extracted features from the pretrained Protein Bidirectional Encoder Representations from Transformers-encoded features of the class I HLA (HLA-I)-binding peptides. The experimental data showed that our proposed HLAB feature engineering algorithm outperformed the existing ones in detecting the HLA-I-binding peptides. The extensive evaluation data show that the proposed HLAB algorithm outperforms all the seven existing studies on predicting the peptides binding to the HLA-A*01:01 allele in AUC and achieves the best average AUC values on the six out of the seven k-mers (k=8,9,...,14, respectively represent the prediction task of a polypeptide consisting of k amino acids) except for the 9-mer prediction tasks. The source code and the fine-tuned feature extraction models are available at http:\/\/www.healthinformaticslab.org\/supp\/resources.php.<\/jats:p>","DOI":"10.1093\/bib\/bbac173","type":"journal-article","created":{"date-parts":[[2022,4,19]],"date-time":"2022-04-19T11:19:31Z","timestamp":1650367171000},"source":"Crossref","is-referenced-by-count":49,"title":["HLAB: learning the BiLSTM features from the ProtBert-encoded proteins for the class I HLA-peptide binding prediction"],"prefix":"10.1093","volume":"23","author":[{"given":"Yaqi","family":"Zhang","sequence":"first","affiliation":[{"name":"School of Biology & Engineering, Guizhou Medical University , Guiyang, Guizhou 550004, P.R. China"},{"name":"College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University , Changchun, Jilin 130012, P.R. China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Gancheng","family":"Zhu","sequence":"additional","affiliation":[{"name":"College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University , Changchun, Jilin 130012, P.R. China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kewei","family":"Li","sequence":"additional","affiliation":[{"name":"College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University , Changchun, Jilin 130012, P.R. China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Fei","family":"Li","sequence":"additional","affiliation":[{"name":"College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University , Changchun, Jilin 130012, P.R. China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Lan","family":"Huang","sequence":"additional","affiliation":[{"name":"College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University , Changchun, Jilin 130012, P.R. China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7171-2695","authenticated-orcid":false,"given":"Meiyu","family":"Duan","sequence":"additional","affiliation":[{"name":"College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University , Changchun, Jilin 130012, P.R. China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8108-6007","authenticated-orcid":false,"given":"Fengfeng","family":"Zhou","sequence":"additional","affiliation":[{"name":"College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University , Changchun, Jilin 130012, P.R. China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2022,5,5]]},"reference":[{"key":"2022092013195251800_ref1","first-page":"1","volume-title":"Peptide Hormones","author":"Rudinger","year":"1976"},{"key":"2022092013195251800_ref2","doi-asserted-by":"crossref","first-page":"3343","DOI":"10.1074\/mcp.M113.036194","article-title":"Mechanistic peptidomics: factors that dictate specificity in the formation of endogenous peptides in human milk","volume":"13","author":"Guerrero","year":"2014","journal-title":"Mol Cell Proteomics"},{"key":"2022092013195251800_ref3","doi-asserted-by":"crossref","first-page":"443","DOI":"10.1146\/annurev-immunol-032712-095910","article-title":"Pathways of antigen processing","volume":"31","author":"Blum","year":"2013","journal-title":"Annu Rev Immunol"},{"key":"2022092013195251800_ref4","doi-asserted-by":"crossref","first-page":"71","DOI":"10.1016\/S1074-7613(01)00170-4","article-title":"How much TCR does a T cell need?","volume":"15","author":"Labrecque","year":"2001","journal-title":"Immunity"},{"key":"2022092013195251800_ref5","doi-asserted-by":"crossref","first-page":"75","DOI":"10.2174\/1386207318666150121125746","article-title":"Quantitative prediction of class I MHC\/epitope binding affinity using QSAR modeling derived from amino acid structural information","volume":"18","author":"Wang","year":"2015","journal-title":"Comb Chem High Throughput Screen"},{"key":"2022092013195251800_ref6","doi-asserted-by":"crossref","first-page":"2499","DOI":"10.1093\/bioinformatics\/bty140","article-title":"iFeature: a Python package and web server for features extraction and selection from protein and peptide sequences","volume":"34","author":"Chen","year":"2018","journal-title":"Bioinformatics"},{"key":"2022092013195251800_ref7","doi-asserted-by":"crossref","first-page":"1188","DOI":"10.1101\/gr.849004","article-title":"WebLogo: a sequence logo generator","volume":"14","author":"Crooks","year":"2004","journal-title":"Genome Res"},{"key":"2022092013195251800_ref8","doi-asserted-by":"crossref","first-page":"109","DOI":"10.1016\/j.jim.2014.01.015","article-title":"Improving the prediction of HLA class I-binding peptides using a supertype-based method","volume":"405","author":"Wang","year":"2014","journal-title":"J Immunol Methods"},{"key":"2022092013195251800_ref9","doi-asserted-by":"crossref","DOI":"10.1093\/bib\/bbaa415","article-title":"Anthem: a user customised tool for fast and accurate prediction of binding between peptides and HLA class I molecules","volume":"22","author":"Mei","year":"2021","journal-title":"Brief Bioinform"},{"key":"2022092013195251800_ref10","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1007\/s10994-005-4258-6","article-title":"Not so naive Bayes: aggregating one-dependence estimators","volume":"58","author":"Webb","year":"2005","journal-title":"Mach Learn"},{"key":"2022092013195251800_ref11","doi-asserted-by":"crossref","first-page":"2559","DOI":"10.3389\/fimmu.2019.02559","article-title":"DeepHLApan: a deep learning approach for neoantigen prediction considering both HLA-peptide binding and immunogenicity","volume":"10","author":"Wu","year":"2019","journal-title":"Front Immunol"},{"key":"2022092013195251800_ref12","article-title":"A comprehensive review and performance evaluation of bioinformatics tools for HLA class I peptide-binding prediction","volume":"4","author":"Mei","year":"2020","journal-title":"Brief Bioinform"},{"key":"2022092013195251800_ref13","article-title":"Contextual lstm (clstm) models for large scale nlp tasks","author":"Ghosh"},{"key":"2022092013195251800_ref14","first-page":"81","volume-title":"Biological, Translational, and Clinical Language Processing","author":"Chapman","year":"2007"},{"key":"2022092013195251800_ref15","first-page":"9689","article-title":"Evaluating protein transfer learning with TAPE","volume":"32","author":"Rao","year":"2019","journal-title":"Adv Neural Inf Process Syst"},{"key":"2022092013195251800_ref16","article-title":"BERT: pre-training of deep bidirectional transformers for language understanding","author":"Devlin"},{"key":"2022092013195251800_ref17","article-title":"ProtTrans: towards cracking the language of Life\u2019s code through self-supervised deep learning and high performance computing","author":"Elnaggar","year":"2020"},{"key":"2022092013195251800_ref18","doi-asserted-by":"crossref","first-page":"861","DOI":"10.21105\/joss.00861","article-title":"UMAP: uniform manifold approximation and projection for dimension reduction","volume":"3","author":"Mcinnes","year":"2018","journal-title":"J Open Source Softw"},{"key":"2022092013195251800_ref19","doi-asserted-by":"crossref","first-page":"926","DOI":"10.1093\/bioinformatics\/btu739","article-title":"UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches","volume":"31","author":"Suzek","year":"2015","journal-title":"Bioinformatics"},{"key":"2022092013195251800_ref20","doi-asserted-by":"crossref","first-page":"2542","DOI":"10.1038\/s41467-018-04964-5","article-title":"Clustering huge protein sequence sets in linear time","volume":"9","author":"Martin","year":"2018","journal-title":"Nat Commun"},{"key":"2022092013195251800_ref21","article-title":"UniProt: a worldwide hub of protein knowledge","volume":"47","author":"UniProt, Consortium","year":"2018","journal-title":"Nucleic Acids Res"},{"key":"2022092013195251800_ref22","doi-asserted-by":"crossref","first-page":"bbaa124","DOI":"10.1093\/bib\/bbaa124","article-title":"DeepTorrent: a deep learning-based approach for predicting DNA N4-methylcytosine sites","volume":"22","author":"Liu","year":"2021","journal-title":"Brief Bioinform"},{"key":"2022092013195251800_ref23","doi-asserted-by":"crossref","first-page":"79","DOI":"10.1145\/3267851.3267878","volume-title":"Proceedings of the 18th International Conference on Intelligent Virtual Agents","author":"Hasegawa","year":"2018"},{"key":"2022092013195251800_ref24","article-title":"StaBle-ABPpred: a stacked ensemble predictor based on biLSTM and attention mechanism for accelerated discovery of antibacterial peptides","volume":"24","author":"Singh","year":"2021","journal-title":"Brief Bioinform"},{"key":"2022092013195251800_ref25","article-title":"Deep-AFPpred: identifying novel antifungal peptides using pretrained embeddings from seq2vec with 1DCNN-BiLSTM","volume":"23","author":"Sharma","year":"2021","journal-title":"Brief Bioinform"},{"key":"2022092013195251800_ref26","doi-asserted-by":"crossref","first-page":"1542","DOI":"10.1093\/bioinformatics\/btz763","article-title":"Feature selection may improve deep neural networks for the bioinformatics problems","volume":"36","author":"Chen","year":"2020","journal-title":"Bioinformatics"},{"key":"2022092013195251800_ref27","article-title":"Breast cancer detection from thermal images using a Grunwald-Letnikov-aided dragonfly algorithm-based deep feature selection method","volume":"141","author":"Chatterjee","year":"2021","journal-title":"Comput Biol Med"},{"key":"2022092013195251800_ref28","doi-asserted-by":"crossref","first-page":"417","DOI":"10.1037\/h0071325","article-title":"Analysis of a complex of statistical variables into principal components","volume":"24","author":"Hotellings","year":"1932","journal-title":"Br J Educ Psychol"},{"key":"2022092013195251800_ref29","first-page":"2579","article-title":"Visualizing data using t-SNE","volume":"9","author":"Laurens","year":"2008","journal-title":"J Mach Learn Res"},{"key":"2022092013195251800_ref30","article-title":"Umap: uniform manifold approximation and projection for dimension reduction","author":"McInnes","year":"2018"},{"key":"2022092013195251800_ref31","doi-asserted-by":"crossref","first-page":"104871","DOI":"10.1016\/j.compbiomed.2021.104871","article-title":"Artificial intelligence for quality control of oscillometry measures","volume":"138","author":"Veneroni","year":"2021","journal-title":"Comput Biol Med"},{"key":"2022092013195251800_ref32","article-title":"NetMHCpan-4.1 and NetMHCIIpan-4.0: improved predictions of MHC antigen presentation by concurrent motif deconvolution and integration of MS MHC eluted ligand data","volume":"48","author":"Birkir","year":"2020","journal-title":"Nucleic Acids Res"},{"key":"2022092013195251800_ref33","doi-asserted-by":"crossref","first-page":"e1005725","DOI":"10.1371\/journal.pcbi.1005725","article-title":"Deciphering HLA-I motifs across HLA peptidomes improves neo-antigen predictions and identifies allostery regulating HLA specificity","volume":"13","author":"Bassani-Sternberg","year":"2017","journal-title":"PLoS Comput Biol"},{"key":"2022092013195251800_ref34","doi-asserted-by":"crossref","first-page":"177","DOI":"10.1007\/s00251-011-0579-8","article-title":"NetMHCcons: a consensus method for the major histocompatibility complex class I predictions","volume":"64","author":"Karosiene","year":"2012","journal-title":"Immunogenetics"},{"key":"2022092013195251800_ref35","doi-asserted-by":"crossref","first-page":"1517","DOI":"10.4049\/jimmunol.1600582","article-title":"Pan-specific prediction of peptide-MHC class I complex stability, a correlate of T cell immunogenicity","volume":"197","author":"","year":"2016","journal-title":"J Immunol"},{"key":"2022092013195251800_ref36","article-title":"ACME: pan-specific peptide\u2013MHC class I binding prediction through attention-based deep neural networks","volume":"23","author":"Hu","year":"2019","journal-title":"Bioinformatics"},{"key":"2022092013195251800_ref37","doi-asserted-by":"crossref","DOI":"10.1186\/s12859-019-2892-4","article-title":"MHCSeqNet: a deep neural network model for universal MHC binding prediction","volume":"20","author":"Phloyphisut","year":"2019","journal-title":"BMC Bioinform"},{"key":"2022092013195251800_ref38","article-title":"DeepSeqPan, a novel deep convolutional neural network model for pan-specific class I HLA-peptide binding affinity prediction","volume":"9","author":"Liu","year":"2019","journal-title":"Sci Rep"},{"key":"2022092013195251800_ref39","article-title":"Non-contact screening system based for COVID-19 on XGBoost and logistic regression","volume":"141","author":"Dong","year":"2021","journal-title":"Comput Biol Med"},{"key":"2022092013195251800_ref40","article-title":"A network-based method for brain disease gene prediction by integrating brain connectome and molecular network","volume":"23","author":"Wang","year":"2021","journal-title":"Brief Bioinform"},{"key":"2022092013195251800_ref41","doi-asserted-by":"crossref","first-page":"134","DOI":"10.1016\/j.knosys.2017.10.032","article-title":"An approach to EEG-based gender recognition using entropy measurement methods","volume":"140","author":"Hu","year":"2018","journal-title":"Knowl Based Syst"},{"key":"2022092013195251800_ref42","doi-asserted-by":"crossref","first-page":"104664","DOI":"10.1016\/j.compbiomed.2021.104664","article-title":"Design of intelligent diabetes mellitus detection system using hybrid feature selection based XGBoost classifier","volume":"136","author":"Prabha","year":"2021","journal-title":"Comput Biol Med"},{"key":"2022092013195251800_ref43","doi-asserted-by":"crossref","DOI":"10.1093\/bioinformatics\/btab394","article-title":"Robust and efficient single-cell Hi-C clustering with approximate k-nearest neighbor graphs","volume":"37","author":"Wolff","year":"2021","journal-title":"Bioinformatics"},{"key":"2022092013195251800_ref44","doi-asserted-by":"crossref","first-page":"104089","DOI":"10.1016\/j.compbiomed.2020.104089","article-title":"Application of decision tree-based ensemble learning in the classification of breast cancer","volume":"128","author":"Ghiasi","year":"2021","journal-title":"Comput Biol Med"},{"key":"2022092013195251800_ref45","doi-asserted-by":"crossref","first-page":"330","DOI":"10.1186\/s12859-019-2924-0","article-title":"Enhancing ontology-driven diagnostic reasoning with a symptom-dependency-aware naive Bayes classifier","volume":"20","author":"Shen","year":"2019","journal-title":"BMC Bioinform"},{"key":"2022092013195251800_ref46","article-title":"ALBERT: A Lite BERT for self-supervised learning of language representations","author":"Lan"},{"key":"2022092013195251800_ref47","doi-asserted-by":"crossref","first-page":"648","DOI":"10.1093\/bioinformatics\/btab712","article-title":"BERT-Kcr: prediction of lysine crotonylation sites by a transfer learning method with pre-trained BERT models","volume":"38","author":"Qiao","year":"2021","journal-title":"Bioinformatics"},{"key":"2022092013195251800_ref48","doi-asserted-by":"crossref","first-page":"4857","DOI":"10.1021\/acs.jcim.1c00458","article-title":"Toward guided mutagenesis: Gaussian process regression predicts MHC class II antigen mutant binding","volume":"61","author":"Bell","year":"2021","journal-title":"J Chem Inf Model"},{"key":"2022092013195251800_ref49","doi-asserted-by":"crossref","first-page":"15039","DOI":"10.1021\/acsomega.0c00857","article-title":"Recommender systems in antiviral drug discovery","volume":"5","author":"Sosnina","year":"2020","journal-title":"ACS Omega"},{"key":"2022092013195251800_ref50","article-title":"MHCAttnNet: predicting MHC-peptide bindings for MHC alleles classes I and II using an attention-based deep neural model","volume":"36","author":"Gopalakrishnan","year":"2020","journal-title":"Bioinformatics"},{"key":"2022092013195251800_ref51","doi-asserted-by":"crossref","DOI":"10.1093\/bioinformatics\/btab687","article-title":"CNN-PepPred: an open-source tool to create convolutional NN models for the discovery of patterns in peptide sets\u2014application to peptide\u2013MHC class II binding prediction","volume":"37","author":"Junet","year":"2021","journal-title":"Bioinformatics"},{"key":"2022092013195251800_ref52","article-title":"PRISMOID: a comprehensive 3D structure database for post-translational modifications and mutations with functional impact","volume":"21","author":"Li","year":"2019","journal-title":"Brief Bioinform"},{"key":"2022092013195251800_ref53","article-title":"GMSimpute: a generalized two-step Lasso approach to impute missing values in label-free mass spectrum analysis","volume":"36","author":"Li","year":"2019","journal-title":"Bioinformatics"},{"key":"2022092013195251800_ref54","article-title":"GPS-Uber: a hybrid-learning framework for prediction of general and E3-specific lysine ubiquitination sites","volume":"23","author":"Wang","year":"2022","journal-title":"Brief Bioinform"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/23\/5\/bbac173\/45936212\/bbac173.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/23\/5\/bbac173\/45936212\/bbac173.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,20]],"date-time":"2023-11-20T05:08:01Z","timestamp":1700456881000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbac173\/6581432"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,5,5]]},"references-count":54,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2022,9,20]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbac173","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2022,9]]},"published":{"date-parts":[[2022,5,5]]},"article-number":"bbac173"}}