{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,30]],"date-time":"2025-07-30T11:43:49Z","timestamp":1753875829496,"version":"3.41.2"},"reference-count":49,"publisher":"Oxford University Press (OUP)","issue":"5","license":[{"start":{"date-parts":[[2022,8,23]],"date-time":"2022-08-23T00:00:00Z","timestamp":1661212800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"name":"Agricultural Bioinformatics and Computational Biology"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,9,20]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Linear B-cell epitopes have a prominent role in the development of peptide-based vaccines and disease diagnosis. High variability in the length of these epitopes is a major reason for low accuracy in their prediction. Most of the B-cell epitope prediction methods considered fixed length of epitope sequences and achieved good accuracy. Though a number of tools are available for the prediction of flexible length linear B-cell epitopes with reasonable accuracy, further improvement in the prediction performance is still expected. Thus, here we made an attempt to analyze the performance of machine learning approaches (MLA) with 18 different amino acid encoding schemes in the prediction of flexible length linear B-cell epitopes. We considered B-cell epitope sequences of variable lengths (11\u201356 amino acids) from well-established public resources. The performances of machine learning algorithms with the encoded epitope sequence datasets were evaluated. Besides, the feasible combinations of encoding schemes were also explored and analyzed. The results revealed that amino-acid composition (AC) and distribution component of composition\u2013transition\u2013distribution encoding schemes are suitable for heterogeneous epitope data, whereas amino-acid-anchoring-pair-composition (APC), dipeptide-composition and amino-acids-pair-propensity-scale (APP) are more appropriate for homogeneous data. Further, two combinations of peptide encoding schemes, i.e. APC\u2009+\u2009AC and APC\u2009+\u2009APP with random forest classifier were identified to have improved performance over the state-of-the-art tools for flexible length linear B-cell epitope prediction. The study also revealed better performance of random forest over other considered MLAs in the prediction of flexible length linear B-cell epitopes.<\/jats:p>","DOI":"10.1093\/bib\/bbac356","type":"journal-article","created":{"date-parts":[[2022,8,23]],"date-time":"2022-08-23T23:38:08Z","timestamp":1661297888000},"source":"Crossref","is-referenced-by-count":4,"title":["A comparative analysis of amino acid encoding schemes for the prediction of flexible length linear B-cell epitopes"],"prefix":"10.1093","volume":"23","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0799-0660","authenticated-orcid":false,"given":"Tanmaya Kumar","family":"Sahu","sequence":"first","affiliation":[{"name":"ICAR-Indian Agricultural Statistics Research Institute , New Delhi, India"},{"name":"ICAR-National Bureau of Plant Genetic Resources , New Delhi, India"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7098-8785","authenticated-orcid":false,"given":"Prabina Kumar","family":"Meher","sequence":"additional","affiliation":[{"name":"ICAR-Indian Agricultural Statistics Research Institute , New Delhi, India"}]},{"given":"Nalini Kanta","family":"Choudhury","sequence":"additional","affiliation":[{"name":"ICAR-Indian Agricultural Statistics Research Institute , New Delhi, India"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3914-6248","authenticated-orcid":false,"given":"Atmakuri Ramakrishna","family":"Rao","sequence":"additional","affiliation":[{"name":"ICAR-Indian Agricultural Statistics Research Institute , New Delhi, India"},{"name":"Indian Council of Agricultural Research (ICAR) , New Delhi, India"}]}],"member":"286","published-online":{"date-parts":[[2022,8,23]]},"reference":[{"key":"2022092013241003000_ref1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1155\/2017\/9363750","article-title":"Computer-aided design of an epitope-based vaccine against epstein-barr virus","volume":"2017","author":"Alonso-Padilla","year":"2017","journal-title":"J Immunol Res"},{"key":"2022092013241003000_ref2","doi-asserted-by":"crossref","first-page":"251","DOI":"10.1186\/1471-2105-12-251","article-title":"Determinants of antigenicity and specificity in immune response for protein sequences","volume":"12","author":"Wang","year":"2011","journal-title":"BMC Bioinformatics"},{"key":"2022092013241003000_ref3","first-page":"17","article-title":"Epitope prediction algorithms for peptide-based vaccine design","volume":"2","author":"Florea","year":"2003","journal-title":"Proc IEEE Comput Soc Bioinform Conf"},{"key":"2022092013241003000_ref4","first-page":"197","volume-title":"BcePred: Prediction of Continuous B-cell Epitopes in Antigenic Sequences Using Physico-Chemical Properties","author":"Saha","year":"2004"},{"key":"2022092013241003000_ref5","doi-asserted-by":"crossref","first-page":"2","DOI":"10.1186\/1745-7580-2-2","article-title":"Improved method for predicting linear B-cell epitopes","volume":"2","author":"Larsen","year":"2006","journal-title":"Immunome Res"},{"key":"2022092013241003000_ref6","doi-asserted-by":"crossref","first-page":"121","DOI":"10.1142\/9781848162648_0011","article-title":"Predicting flexible length linear B-cell epitopes","volume":"7","author":"El-Manzalawy","year":"2008","journal-title":"Comput Syst Bioinformatics Conf"},{"issue":"4","key":"2022092013241003000_ref7","doi-asserted-by":"crossref","first-page":"243","DOI":"10.1002\/jmr.893","article-title":"Predicting linear B-cell epitopes using string kernels","volume":"21","author":"EL-Manzalawy","year":"2008","journal-title":"J Mol Recognit"},{"issue":"3","key":"2022092013241003000_ref8","doi-asserted-by":"crossref","first-page":"200","DOI":"10.1002\/jmr.771","article-title":"Machine learning approaches for prediction of linear B-cell epitopes on proteins","volume":"19","author":"S\u00f6llner","year":"2006","journal-title":"J Mol Recognit"},{"issue":"3","key":"2022092013241003000_ref9","doi-asserted-by":"crossref","first-page":"423","DOI":"10.1007\/s00726-006-0485-9","article-title":"Prediction of linear B-cell epitopes using amino acid pair antigenicity scale","volume":"33","author":"Chen","year":"2007","journal-title":"Amino Acids"},{"key":"2022092013241003000_ref10","doi-asserted-by":"crossref","first-page":"14","DOI":"10.1186\/s13040-015-0047-3","article-title":"Predicting linear B-cell epitopes using amino acid anchoring pair composition","volume":"8","author":"Shen","year":"2015","journal-title":"BioData Min"},{"issue":"5","key":"2022092013241003000_ref11","doi-asserted-by":"crossref","first-page":"e62216","DOI":"10.1371\/journal.pone.0062216","article-title":"Improved method for linear B-cell epitope prediction using antigen's primary sequence","volume":"8","author":"Singh","year":"2013","journal-title":"PLoS One"},{"issue":"5","key":"2022092013241003000_ref12","doi-asserted-by":"crossref","first-page":"593","DOI":"10.1016\/j.gpb.2019.04.004","article-title":"iLBE for computational identification of linear B-cell epitopes by integrating sequence and evolutionary features","volume":"18","author":"Hasan","year":"2020","journal-title":"Genom Proteom Bioinf"},{"key":"2022092013241003000_ref13","doi-asserted-by":"crossref","first-page":"1695","DOI":"10.3389\/fimmu.2018.01695","article-title":"iBCE-EL: a new ensemble learning framework for improved linear B-cell epitope prediction","volume":"9","author":"Manavalan","year":"2018","journal-title":"Front Immunol"},{"issue":"23","key":"2022092013241003000_ref14","doi-asserted-by":"crossref","first-page":"4517","DOI":"10.1093\/bioinformatics\/btab467","article-title":"EpitopeVec: linear epitope prediction using deep protein sequence embeddings","volume":"37","author":"Bahai","year":"2021","journal-title":"Bioinformatics"},{"issue":"4","key":"2022092013241003000_ref15","doi-asserted-by":"crossref","first-page":"448","DOI":"10.1093\/bioinformatics\/btaa773","article-title":"EpiDope: a deep neural network for linear B-cell epitope prediction","volume":"37","author":"Collatz","year":"2021","journal-title":"Bioinformatics"},{"key":"2022092013241003000_ref16","volume-title":"Janeway's Immunobiology","author":"Murphy","year":"2012","edition":"8th"},{"key":"2022092013241003000_ref17","volume-title":"B Cells and Antibodies. Molecular Biology of the Cell","author":"Alberts","year":"2002","edition":"4th"},{"issue":"2","key":"2022092013241003000_ref18","doi-asserted-by":"crossref","first-page":"118","DOI":"10.1038\/nri3383","article-title":"Marginal zone B cells: virtues of innate-like antibody-producing lymphocytes","volume":"13","author":"Cerutti","year":"2013","journal-title":"Nat Rev Immunol"},{"issue":"1","key":"2022092013241003000_ref19","doi-asserted-by":"crossref","first-page":"52","DOI":"10.1111\/j.1600-065X.2012.01124.x","article-title":"Germinal center selection and the development of memory B and plasma cells","volume":"247","author":"Shlomchik","year":"2012","journal-title":"Immunol Rev"},{"issue":"1","key":"2022092013241003000_ref20","doi-asserted-by":"crossref","first-page":"443","DOI":"10.1146\/annurev-immunol-032712-095910","article-title":"Pathways of antigen processing","volume":"31","author":"Blum","year":"2013","journal-title":"Annu Rev Immunol"},{"issue":"1","key":"2022092013241003000_ref21","doi-asserted-by":"crossref","first-page":"161","DOI":"10.1146\/annurev.immunol.23.021704.115728","article-title":"Marginal zone B cells","volume":"23","author":"Pillai","year":"2005","journal-title":"Annu Rev Immunol"},{"key":"2022092013241003000_ref22","doi-asserted-by":"crossref","first-page":"298","DOI":"10.3389\/fimmu.2019.00298","article-title":"Antibody specific B-cell epitope predictions: leveraging information from antibody-antigen protein complexes","volume":"10","author":"Jespersen","year":"2019","journal-title":"Front Immunol"},{"key":"2022092013241003000_ref23","doi-asserted-by":"crossref","first-page":"2002","DOI":"10.1016\/j.bbapap.2014.07.006","article-title":"Antibody informatics for drug discovery","volume":"1844","author":"Shirai","year":"2014","journal-title":"Biochim Biophys Acta"},{"issue":"1","key":"2022092013241003000_ref24","doi-asserted-by":"crossref","first-page":"40","DOI":"10.1002\/prot.21078","article-title":"Prediction of continuous B-cell epitopes in an antigen using recurrent neural network","volume":"65","author":"Saha","year":"2006","journal-title":"Proteins"},{"issue":"9","key":"2022092013241003000_ref25","doi-asserted-by":"crossref","first-page":"e45152","DOI":"10.1371\/journal.pone.0045152","article-title":"SVMTriP: a method to predict antigenic epitopes using support vector machine to integrate tri-peptide similarity and propensity","volume":"7","author":"Yao","year":"2012","journal-title":"PLoS One"},{"issue":"13","key":"2022092013241003000_ref26","doi-asserted-by":"crossref","first-page":"1658","DOI":"10.1093\/bioinformatics\/btl158","article-title":"Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences","volume":"22","author":"Li","year":"2006","journal-title":"Bioinformatics"},{"key":"2022092013241003000_ref27","first-page":"2126","volume-title":"Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit","author":"Zhang","year":"2006"},{"issue":"1","key":"2022092013241003000_ref28","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach Learn"},{"article-title":"Manual on setting up, using and understanding random forests V3.1","year":"2002","author":"Breiman","key":"2022092013241003000_ref29"},{"issue":"2","key":"2022092013241003000_ref30","doi-asserted-by":"crossref","first-page":"123","DOI":"10.1007\/BF00058655","article-title":"Bagging predictors","volume":"24","author":"Breiman","year":"1996","journal-title":"Mach Learn"},{"key":"2022092013241003000_ref31","first-page":"278","volume-title":"Proceedings of the 3rd International Conference on Document Analysis and Recognition","author":"Ho","year":"1995"},{"issue":"8","key":"2022092013241003000_ref32","doi-asserted-by":"crossref","first-page":"832","DOI":"10.1109\/34.709601","article-title":"The random subspace method for constructing decision forests","volume":"20","author":"Ho","year":"1998","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"issue":"7","key":"2022092013241003000_ref33","doi-asserted-by":"crossref","first-page":"1545","DOI":"10.1162\/neco.1997.9.7.1545","article-title":"Shape quantization and recognition with randomized trees","volume":"9","author":"Amit","year":"1997","journal-title":"Neural Comput"},{"issue":"3","key":"2022092013241003000_ref34","first-page":"18","article-title":"Classification and regression by random forest","volume":"2","author":"Liaw","year":"2002","journal-title":"R News"},{"key":"2022092013241003000_ref35","volume-title":"The Nature of Statistical Learning Theory","author":"Vapnik","year":"1999","edition":"2nd"},{"author":"Meyer","key":"2022092013241003000_ref36"},{"key":"2022092013241003000_ref37","first-page":"200","article-title":"A thorough review on the current advance of neural network structures","volume":"14","author":"Dupond","year":"2019","journal-title":"Annu Rev Control"},{"key":"2022092013241003000_ref38","doi-asserted-by":"crossref","first-page":"e00938","DOI":"10.1016\/j.heliyon.2018.e00938","article-title":"State-of-the-art in artificial neural network applications: a survey","volume":"4","author":"Abiodun","year":"2018","journal-title":"Heliyon"},{"issue":"2","key":"2022092013241003000_ref39","doi-asserted-by":"crossref","first-page":"334","DOI":"10.1016\/j.fcij.2018.10.003","article-title":"Time series forecasting using artificial neural networks methodologies: a systematic review","volume":"3","author":"Tealab","year":"2018","journal-title":"Future Comput Inf J"},{"issue":"8","key":"2022092013241003000_ref40","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long short-term memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Comput"},{"issue":"5\u20136","key":"2022092013241003000_ref41","doi-asserted-by":"crossref","first-page":"602","DOI":"10.1016\/j.neunet.2005.06.042","article-title":"Framewise phoneme classification with bidirectional LSTM and other neural network architectures","volume":"18","author":"Graves","year":"2005","journal-title":"Neural Netw"},{"key":"2022092013241003000_ref42","doi-asserted-by":"crossref","DOI":"10.3115\/v1\/D14-1179","article-title":"Learning phrase representations using RNN encoder-decoder for statistical machine translation","volume-title":"Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar","author":"Cho"},{"key":"2022092013241003000_ref43","first-page":"785","volume-title":"In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, USA","author":"Chen","year":"2016"},{"key":"2022092013241003000_ref44","doi-asserted-by":"crossref","first-page":"1189","DOI":"10.1214\/aos\/1013203451","article-title":"Greedy function approximation: a gradient boosting machine","volume":"29","author":"Friedman","year":"2001","journal-title":"Ann Stat"},{"issue":"8","key":"2022092013241003000_ref45","doi-asserted-by":"crossref","first-page":"913","DOI":"10.1111\/ecog.02881","article-title":"Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure","volume":"40","author":"Roberts","year":"2017","journal-title":"Ecography"},{"issue":"11","key":"2022092013241003000_ref46","doi-asserted-by":"crossref","first-page":"2284","DOI":"10.3390\/app9112284","article-title":"A novel MOGA-SVM multinomial classification for organ inflammation detection","volume":"9","author":"Chui","year":"2019","journal-title":"Appl Sci"},{"key":"2022092013241003000_ref47","doi-asserted-by":"crossref","first-page":"71","DOI":"10.1016\/j.trivac.2016.04.003","article-title":"B-cell epitope mapping for the design of vaccines and effective diagnostics","volume":"5","author":"Ahmad","year":"2016","journal-title":"Trials Vaccinol"},{"key":"2022092013241003000_ref48","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1155\/2016\/6760830","article-title":"An introduction to B-cell epitope mapping and in silico epitope prediction","volume":"2016","author":"Potocnakova","year":"2016","journal-title":"J Immunol Res"},{"issue":"3","key":"2022092013241003000_ref49","doi-asserted-by":"crossref","first-page":"343","DOI":"10.1016\/j.jclinepi.2011.09.002","article-title":"Tradeoffs between accuracy measures for electronic health care data algorithms","volume":"65","author":"Chubak","year":"2012","journal-title":"J Clin Epidemiol"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/23\/5\/bbac356\/45939659\/bbac356.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/23\/5\/bbac356\/45939659\/bbac356.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,9,20]],"date-time":"2022-09-20T18:17:38Z","timestamp":1663697858000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbac356\/6673853"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,8,23]]},"references-count":49,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2022,9,20]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbac356","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"type":"print","value":"1467-5463"},{"type":"electronic","value":"1477-4054"}],"subject":[],"published-other":{"date-parts":[[2022,9]]},"published":{"date-parts":[[2022,8,23]]},"article-number":"bbac356"}}