{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,7]],"date-time":"2026-05-07T16:23:38Z","timestamp":1778171018685,"version":"3.51.4"},"reference-count":52,"publisher":"Oxford University Press (OUP)","issue":"17","license":[{"start":{"date-parts":[[2021,2,26]],"date-time":"2021-02-26T00:00:00Z","timestamp":1614297600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"name":"TRF Research Grant for New Scholar","award":["MRG6180222"],"award-info":[{"award-number":["MRG6180222"]}]},{"name":"TRF Research Grant for New Scholar","award":["MRG6180226"],"award-info":[{"award-number":["MRG6180226"]}]},{"name":"College of Arts, Media and Technology, Chiang Mai University, and partially supported by Chiang Mai University and the TRF Research Career Development","award":["RSA6280075"],"award-info":[{"award-number":["RSA6280075"]}]},{"name":"Basic Science Research Program through the National Research Foundation of Korea funded by the Ministry of Science and ICT","award":["2018R1D1A1B07049572"],"award-info":[{"award-number":["2018R1D1A1B07049572"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,9,9]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>The identification of bitter peptides through experimental approaches is an expensive and time-consuming endeavor. 
Due to the huge number of newly available peptide sequences in the post-genomic era, the development of automated computational models for the identification of novel bitter peptides is highly desirable.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>In this work, we present BERT4Bitter, a bidirectional encoder representations from transformers (BERT)-based model for predicting bitter peptides directly from their amino acid sequence without using any structural information. To the best of our knowledge, this is the first time a BERT-based model has been employed to identify bitter peptides. Compared to widely used machine learning models, BERT4Bitter achieved the best performance with accuracies of 0.861 and 0.922 for cross-validation and independent tests, respectively. Furthermore, extensive empirical benchmarking experiments on the independent dataset demonstrated that BERT4Bitter clearly outperformed the existing method with improvements of 8.0% in accuracy and 16.0% in Matthews correlation coefficient, highlighting the effectiveness and robustness of BERT4Bitter. 
We believe that the BERT4Bitter method proposed herein will be a useful tool for rapidly screening and identifying novel bitter peptides for drug development and nutritional research.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>The user-friendly web server of the proposed BERT4Bitter is freely accessible at http:\/\/pmlab.pythonanywhere.com\/BERT4Bitter.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btab133","type":"journal-article","created":{"date-parts":[[2021,2,24]],"date-time":"2021-02-24T12:23:03Z","timestamp":1614169383000},"page":"2556-2562","source":"Crossref","is-referenced-by-count":160,"title":["BERT4Bitter: a bidirectional encoder representations from transformers (BERT)-based model for improving the prediction of bitter peptides"],"prefix":"10.1093","volume":"37","author":[{"given":"Phasit","family":"Charoenkwan","sequence":"first","affiliation":[{"name":"Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University , Chiang Mai 50200, Thailand"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chanin","family":"Nantasenamat","sequence":"additional","affiliation":[{"name":"Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University , Bangkok 10700, Thailand"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Md Mehedi","family":"Hasan","sequence":"additional","affiliation":[{"name":"Department of Bioscience and Bioinformatics, Kyushu Institute of Technology , Iizuka, Fukuoka 820-8502, Japan"},{"name":"Tulane Center for Biomedical Informatics and Genomics, Division of 
Biomedical Informatics and Genomics, John W. Deming Department of Medicine, School of Medicine, Tulane University , New Orleans, LA 70112, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Balachandran","family":"Manavalan","sequence":"additional","affiliation":[{"name":"Department of Physiology, Ajou University School of Medicine , Suwon 443380, Republic of Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3394-8709","authenticated-orcid":false,"given":"Watshara","family":"Shoombuatong","sequence":"additional","affiliation":[{"name":"Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University , Bangkok 10700, Thailand"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2021,2,26]]},"reference":[{"key":"2023051609205046100_btab133-B1","doi-asserted-by":"crossref","first-page":"693","DOI":"10.1016\/S0092-8674(00)80705-9","article-title":"A novel family of mammalian taste receptors","volume":"100","author":"Adler","year":"2000","journal-title":"Cell"},{"key":"2023051609205046100_btab133-B2","doi-asserted-by":"crossref","first-page":"349","DOI":"10.1038\/ng.3511","article-title":"An expanded sequence context model broadly explains variability in polymorphism levels across the human genome","volume":"48","author":"Aggarwala","year":"2016","journal-title":"Nat. Genet"},{"key":"2023051609205046100_btab133-B3","doi-asserted-by":"crossref","first-page":"45","DOI":"10.1016\/S0306-4573(02)00021-3","article-title":"An information-theoretic perspective of TF\u2013IDF measures","volume":"39","author":"Aizawa","year":"2003","journal-title":"Inf. Process. 
Manag"},{"key":"2023051609205046100_btab133-B4","doi-asserted-by":"crossref","first-page":"107","DOI":"10.1016\/j.csl.2019.01.005","article-title":"Unsupervised sentence representations as word information series: revisiting TF\u2013IDF","volume":"56","author":"Arroyo-Fern\u00e1ndez","year":"2019","journal-title":"Comput. Speech Language"},{"key":"2023051609205046100_btab133-B5","doi-asserted-by":"crossref","first-page":"e0141287","DOI":"10.1371\/journal.pone.0141287","article-title":"Continuous distributed representation of biological sequences for deep proteomics and genomics","volume":"10","author":"Asgari","year":"2015","journal-title":"PLoS One"},{"key":"2023051609205046100_btab133-B6","doi-asserted-by":"crossref","first-page":"1276","DOI":"10.1002\/med.21658","article-title":"Machine intelligence in peptide therapeutics: a next-generation tool for rapid disease screening","volume":"40","author":"Basith","year":"2020","journal-title":"Med. Res. Rev"},{"key":"2023051609205046100_btab133-B7","first-page":"5","author":"Breiman","year":"2001"},{"key":"2023051609205046100_btab133-B8","doi-asserted-by":"crossref","first-page":"2813","DOI":"10.1016\/j.ygeno.2020.03.019","article-title":"iAMY-SCM: improved prediction and analysis of amyloid proteins using a scoring card method with propensity scores of dipeptides","volume":"112","author":"Charoenkwan","year":"2020","journal-title":"Genomics"},{"key":"2023051609205046100_btab133-B9","doi-asserted-by":"crossref","first-page":"4125","DOI":"10.1021\/acs.jproteome.0c00590","article-title":"iDPPIV-SCM: a sequence-based predictor for identifying and analyzing dipeptidyl peptidase IV (DPP-IV) inhibitory peptides using a scoring card method","volume":"19","author":"Charoenkwan","year":"2020","journal-title":"J. 
Proteome Res"},{"key":"2023051609205046100_btab133-B10","doi-asserted-by":"crossref","first-page":"6666","DOI":"10.1021\/acs.jcim.0c00707","article-title":"iUmami-SCM: a novel sequence-based predictor for prediction and analysis of umami peptides using a scoring card method with propensity scores of dipeptides","volume":"60","author":"Charoenkwan","year":"2020","journal-title":"J. Chem. Inf. Model"},{"key":"2023051609205046100_btab133-B11","doi-asserted-by":"crossref","DOI":"10.1016\/j.ygeno.2020.03.019","article-title":"iBitter-SCM: identification and characterization of bitter peptides using a scoring card method with propensity scores of dipeptides","author":"Charoenkwan","year":"2020"},{"key":"2023051609205046100_btab133-B12","doi-asserted-by":"crossref","first-page":"245","DOI":"10.1016\/j.eswa.2016.09.009","article-title":"Turning from TF-IDF to TF-IGM for term weighting in text classification","volume":"66","author":"Chen","year":"2016","journal-title":"Expert Syst. Appl"},{"key":"2023051609205046100_btab133-B13","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41598-017-12359-7","article-title":"Bitter or not? BitterPredict, a tool for predicting taste from chemical structure","volume":"7","author":"Dagan-Wiener","year":"2017","journal-title":"Sci. Rep"},{"key":"2023051609205046100_btab133-B14","author":"Devlin","year":"2018"},{"key":"2023051609205046100_btab133-B15","doi-asserted-by":"crossref","first-page":"1424","DOI":"10.1093\/ajcn\/72.6.1424","article-title":"Bitter taste, phytonutrients, and the consumer: a review","volume":"72","author":"Drewnowski","year":"2000","journal-title":"Am. J. Clin. Nutr"},{"key":"2023051609205046100_btab133-B16","doi-asserted-by":"crossref","first-page":"654","DOI":"10.1016\/j.ejor.2017.11.054","article-title":"Deep learning with long short-term memory networks for financial market predictions","volume":"270","author":"Fischer","year":"2018","journal-title":"Eur. J. Operat. 
Res"},{"key":"2023051609205046100_btab133-B17","doi-asserted-by":"crossref","first-page":"i37","DOI":"10.1093\/bioinformatics\/btx228","article-title":"Deep learning with word embeddings improves biomedical named entity recognition","volume":"33","author":"Habibi","year":"2017","journal-title":"Bioinformatics"},{"key":"2023051609205046100_btab133-B18","doi-asserted-by":"crossref","first-page":"2009","DOI":"10.1093\/bioinformatics\/bty937","article-title":"Identifying antimicrobial peptides using word embedding with deep recurrent neural networks","volume":"35","author":"Hamid","year":"2019","journal-title":"Bioinformatics"},{"key":"2023051609205046100_btab133-B19","volume-title":"Exploring QSAR: Fundamentals and Applications in Chemistry and Biology","author":"Hansch","year":"1995"},{"key":"2023051609205046100_btab133-B20","doi-asserted-by":"crossref","first-page":"906","DOI":"10.1016\/j.csbj.2020.04.001","article-title":"i4mC-Mouse: improved identification of DNA N4-methylcytosine sites in the mouse genome using multiple encoding schemes","volume":"18","author":"Hasan","year":"2020","journal-title":"Comput. Struct. Biotechnol. J"},{"key":"2023051609205046100_btab133-B21","doi-asserted-by":"crossref","first-page":"225","DOI":"10.1007\/s11103-020-00988-y","article-title":"i6mA-Fuse: improved and robust prediction of DNA 6 mA sites in the Rosaceae genome by fusing multiple feature representation","volume":"103","author":"Hasan","year":"2020","journal-title":"Plant Mol. 
Biol"},{"key":"2023051609205046100_btab133-B22","doi-asserted-by":"crossref","first-page":"3350","DOI":"10.1093\/bioinformatics\/btaa160","article-title":"HLPpred-Fuse: improved and robust prediction of hemolytic peptide and its activity by fusing multiple feature representation","volume":"36","author":"Hasan","year":"2020","journal-title":"Bioinformatics"},{"key":"2023051609205046100_btab133-B23","doi-asserted-by":"crossref","first-page":"165","DOI":"10.1109\/TGRS.2019.2934760","article-title":"HSI-BERT: hyperspectral image classification using the bidirectional encoder representation from transformers","volume":"58","author":"He","year":"2020","journal-title":"IEEE Trans. Geosci. Remote Sens"},{"key":"2023051609205046100_btab133-B24","doi-asserted-by":"crossref","first-page":"1126","DOI":"10.1021\/jm00390a003","article-title":"Peptide quantitative structure-activity relationships, a multivariate approach","volume":"30","author":"Hellberg","year":"1987","journal-title":"J. Med. Chem"},{"key":"2023051609205046100_btab133-B25","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long short-term memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Comput"},{"key":"2023051609205046100_btab133-B26","doi-asserted-by":"crossref","first-page":"23450","DOI":"10.1038\/srep23450","article-title":"BitterX: a tool for understanding bitter taste in humans","volume":"6","author":"Huang","year":"2016","journal-title":"Sci. 
Rep"},{"key":"2023051609205046100_btab133-B27","doi-asserted-by":"crossref","first-page":"1125","DOI":"10.1093\/bioinformatics\/bty752","article-title":"DeepGSR: an optimized deep-learning structure for the recognition of genomic signals and regions","volume":"35","author":"Kalkatawi","year":"2019","journal-title":"Bioinformatics"},{"key":"2023051609205046100_btab133-B28","first-page":"1097","volume-title":"Advances in Neural Information Processing Systems","author":"Krizhevsky","year":"2012"},{"key":"2023051609205046100_btab133-B29","doi-asserted-by":"crossref","first-page":"253","DOI":"10.1109\/ISCAS.2010.5537907","volume-title":"Proceedings of 2010 IEEE International Symposium on Circuits and Systems","author":"LeCun","year":"2010"},{"key":"2023051609205046100_btab133-B30","doi-asserted-by":"crossref","first-page":"1234","DOI":"10.1093\/bioinformatics\/btz682","article-title":"BioBERT: a pre-trained biomedical language representation model for biomedical text mining","volume":"36","author":"Lee","year":"2020","journal-title":"Bioinformatics"},{"key":"2023051609205046100_btab133-B31","doi-asserted-by":"crossref","first-page":"1057","DOI":"10.1093\/bioinformatics\/btz721","article-title":"DeepCleave: a deep learning predictor for caspase and matrix metalloprotease substrates and cleavage sites","volume":"36","author":"Li","year":"2020","journal-title":"Bioinformatics"},{"key":"2023051609205046100_btab133-B32","doi-asserted-by":"crossref","first-page":"1044","DOI":"10.1016\/j.omtn.2020.07.034","article-title":"im6A-TS-CNN: identifying the N6-methyladenine site in multiple tissues by using the convolutional neural network","volume":"21","author":"Liu","year":"2020","journal-title":"Mol. 
Therapy Nucleic Acids"},{"key":"2023051609205046100_btab133-B33","doi-asserted-by":"crossref","first-page":"3336","DOI":"10.1093\/bioinformatics\/btaa155","article-title":"iMRM: a platform for simultaneously identifying multiple kinds of RNA modifications","volume":"36","author":"Liu","year":"2020","journal-title":"Bioinformatics"},{"key":"2023051609205046100_btab133-B34","doi-asserted-by":"crossref","first-page":"187","DOI":"10.1016\/j.trc.2015.03.014","article-title":"Long short-term memory neural network for traffic speed prediction using remote microwave sensor data","volume":"54","author":"Ma","year":"2015","journal-title":"Transport. Res. Part C Emerg. Technol"},{"key":"2023051609205046100_btab133-B35","doi-asserted-by":"crossref","first-page":"1661","DOI":"10.1007\/s00018-009-8755-9","article-title":"Bitter peptides and bitter taste receptors","volume":"66","author":"Maehashi","year":"2009","journal-title":"Cell. Mol. Life Sci"},{"key":"2023051609205046100_btab133-B36","author":"Mikolov","year":"2013"},{"key":"2023051609205046100_btab133-B37","first-page":"2825","article-title":"Scikit-learn: machine learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J. Mach. Learn. 
Res"},{"key":"2023051609205046100_btab133-B38","doi-asserted-by":"crossref","first-page":"880","DOI":"10.1016\/j.foodchem.2006.06.026","article-title":"Modelling relationship between angiotensin-(I)-converting enzyme inhibition and the bitter taste of peptides","volume":"102","author":"Pripp","year":"2007","journal-title":"Food Chem"},{"key":"2023051609205046100_btab133-B39","first-page":"82","volume-title":"International Conference on Artificial Neural Networks","author":"Scherer","year":"2010"},{"key":"2023051609205046100_btab133-B40","doi-asserted-by":"crossref","first-page":"5128","DOI":"10.1093\/bioinformatics\/btz464","article-title":"DNN-Dom: predicting protein domain boundary from sequence alone by deep neural network","volume":"35","author":"Shi","year":"2019","journal-title":"Bioinformatics"},{"key":"2023051609205046100_btab133-B41","first-page":"1441","volume-title":"Proceedings of the 28th ACM International Conference on Information and Knowledge Management, Beijing, China","author":"Sun","year":"2019"},{"key":"2023051609205046100_btab133-B42","doi-asserted-by":"crossref","first-page":"385","DOI":"10.1016\/j.neunet.2020.05.027","article-title":"Prediction of N6-methyladenosine sites using convolution neural network model based on distributed feature representations","volume":"129","author":"Tahir","year":"2020","journal-title":"Neural Netw"},{"key":"2023051609205046100_btab133-B43","doi-asserted-by":"crossref","first-page":"2740","DOI":"10.1093\/bioinformatics\/bty179","article-title":"Deep learning improves antimicrobial peptide recognition","volume":"34","author":"Veltri","year":"2018","journal-title":"Bioinformatics"},{"key":"2023051609205046100_btab133-B44","first-page":"bbaa275","article-title":"Computational prediction and interpretation of cell-specific replication origin sites from multiple eukaryotes by exploiting stacking framework","volume":"2020","author":"Wei","year":"2020","journal-title":"Brief. 
Bioinform"},{"key":"2023051609205046100_btab133-B45","doi-asserted-by":"crossref","first-page":"275","DOI":"10.4155\/fmc-2016-0188","article-title":"HemoPred: a web server for predicting the hemolytic activity of peptides","volume":"9","author":"Win","year":"2017","journal-title":"Future Med. Chem"},{"key":"2023051609205046100_btab133-B46","doi-asserted-by":"crossref","first-page":"1749","DOI":"10.4155\/fmc-2017-0300","article-title":"PAAP: a web server for predicting antihypertensive activity of peptides","volume":"10","author":"Win","year":"2018","journal-title":"Future Med. Chem"},{"key":"2023051609205046100_btab133-B47","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12859-019-3006-z","article-title":"PTPD: predicting therapeutic peptides by deep learning and word2vec","volume":"20","author":"Wu","year":"2019","journal-title":"BMC Bioinformatics"},{"key":"2023051609205046100_btab133-B48","first-page":"bbaa125","article-title":"DeepVF: a deep learning-based hybrid framework for identifying virulence factors using the stacking strategy","volume":"2020","author":"Xie","year":"2020","journal-title":"Brief. Bioinf"},{"key":"2023051609205046100_btab133-B49","first-page":"3","article-title":"Deep4mC: systematic assessment and computational prediction for DNA N4-methylcytosine sites by deep learning","volume":"2020","author":"Xu","year":"2020","journal-title":"Brief. Bioinformatics"},{"key":"2023051609205046100_btab133-B50","doi-asserted-by":"crossref","first-page":"108","DOI":"10.1109\/ICSMC.2008.4811259","volume-title":"2008 IEEE International Conference on Systems, Man and Cybernetics","author":"Zhang","year":"2008"},{"key":"2023051609205046100_btab133-B51","doi-asserted-by":"crossref","first-page":"2758","DOI":"10.1016\/j.eswa.2010.08.066","article-title":"A comparative study of TF IDF, LSI and multi-words for text classification","volume":"38","author":"Zhang","year":"2011","journal-title":"Expert Syst. 
Appl"},{"key":"2023051609205046100_btab133-B52","doi-asserted-by":"crossref","first-page":"895","DOI":"10.3389\/fchem.2019.00895","article-title":"SPVec: a Word2vec-inspired feature representation method for drug\u2013target interaction prediction","volume":"7","author":"Zhang","year":"2019","journal-title":"Front. Chem"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btab133\/36622488\/btab133.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/17\/2556\/50339061\/btab133.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/17\/2556\/50339061\/btab133.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,5,16]],"date-time":"2023-05-16T09:23:58Z","timestamp":1684229038000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/37\/17\/2556\/6151716"}},"subtitle":[],"editor":[{"given":"Jinbo","family":"Xu","sequence":"additional","affiliation":[],"role":[{"role":"editor","vocabulary":"crossref"}]}],"short-title":[],"issued":{"date-parts":[[2021,2,26]]},"references-count":52,"journal-issue":{"issue":"17","published-print":{"date-parts":[[2021,9,9]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btab133","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021,9,1]]},"published":{"date-parts":[[2021,2,26]]}}}