{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,26]],"date-time":"2026-03-26T19:45:17Z","timestamp":1774554317382,"version":"3.50.1"},"reference-count":52,"publisher":"Oxford University Press (OUP)","issue":"24","license":[{"start":{"date-parts":[[2022,11,3]],"date-time":"2022-11-03T00:00:00Z","timestamp":1667433600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Guangdong Province Basic and Applied Basic Research Fund","award":["2021A1515012447"],"award-info":[{"award-number":["2021A1515012447"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["32070659"],"award-info":[{"award-number":["32070659"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100010877","name":"Science, Technology and Innovation Commission of Shenzhen Municipality","doi-asserted-by":"publisher","award":["JCYJ20200109150003938"],"award-info":[{"award-number":["JCYJ20200109150003938"]}],"id":[{"id":"10.13039\/501100010877","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Ganghong Young Scholar Development Fund","award":["2021E007"],"award-info":[{"award-number":["2021E007"]}]},{"name":"Shenzhen-Hong Kong Cooperation Zone for Technology and Innovation","award":["HZQB-KCZYB-2020056"],"award-info":[{"award-number":["HZQB-KCZYB-2020056"]}]},{"name":"Warshel Institute for Computational Biology, School of Life and Health Sciences"},{"DOI":"10.13039\/501100004853","name":"The Chinese University of Hong Kong","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100004853","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,12,13]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n      
            <jats:title>Motivation<\/jats:title>\n                  <jats:p>Antimicrobial peptides (AMPs) have the potential to inhibit multiple types of pathogens and to heal infections. Computational strategies can assist in characterizing novel AMPs from proteomes or collections of synthetic sequences and discovering their functional abilities toward different microbial targets without intensive labor.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>Here, we present a deep learning-based method for computer-aided novel AMP discovery that utilizes the transformer neural network architecture with knowledge from natural language processing to extract peptide sequence information. We implemented the method for two AMP-related tasks: the first is to discriminate AMPs from other peptides, and the second is to identify AMPs' functional activities related to seven different targets (gram-negative bacteria, gram-positive bacteria, fungi, viruses, cancer cells, parasites and mammalian cell inhibition), which is a multi-label problem. In addition, asymmetric loss was adopted to resolve the intrinsic imbalance of the dataset, particularly for the multi-label scenario. 
The evaluation showed that our proposed scheme achieves the best performance for the first task (96.85% balanced accuracy) and has a more unbiased prediction for the second task (79.83% balanced accuracy averaged across all functional activities) when compared with that of strategies without imbalanced learning or deep learning.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>The source code and data of this study are available at https:\/\/github.com\/BiOmicsLab\/TransImbAMP.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btac711","type":"journal-article","created":{"date-parts":[[2022,11,2]],"date-time":"2022-11-02T14:25:04Z","timestamp":1667399104000},"page":"5368-5374","source":"Crossref","is-referenced-by-count":45,"title":["Integrating transformer and imbalanced multi-label learning to identify antimicrobial peptides and their functional activities"],"prefix":"10.1093","volume":"38","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6778-7034","authenticated-orcid":false,"given":"Yuxuan","family":"Pang","sequence":"first","affiliation":[{"name":"School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen , Shenzhen 518172, China"},{"name":"Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen , Shenzhen 518172, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4554-6827","authenticated-orcid":false,"given":"Lantian","family":"Yao","sequence":"additional","affiliation":[{"name":"School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen , Shenzhen 518172, China"},{"name":"Warshel Institute for Computational Biology, 
The Chinese University of Hong Kong, Shenzhen , Shenzhen 518172, China"}]},{"given":"Jingyi","family":"Xu","sequence":"additional","affiliation":[{"name":"School of Life and Health Sciences, School of Medicine, The Chinese University of Hong Kong, Shenzhen , Shenzhen 518172, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7076-8432","authenticated-orcid":false,"given":"Zhuo","family":"Wang","sequence":"additional","affiliation":[{"name":"Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen , Shenzhen 518172, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8475-7868","authenticated-orcid":false,"given":"Tzong-Yi","family":"Lee","sequence":"additional","affiliation":[{"name":"Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen , Shenzhen 518172, China"},{"name":"School of Life and Health Sciences, School of Medicine, The Chinese University of Hong Kong, Shenzhen , Shenzhen 518172, China"}]}],"member":"286","published-online":{"date-parts":[[2022,11,3]]},"reference":[{"key":"2022121418402266900_btac711-B1","doi-asserted-by":"crossref","first-page":"323","DOI":"10.3389\/fmicb.2018.00323","article-title":"In silico approach for prediction of antifungal peptides","volume":"9","author":"Agrawal","year":"2018","journal-title":"Front. Microbiol"},{"key":"2022121418402266900_btac711-B2","doi-asserted-by":"crossref","first-page":"bbaa153","DOI":"10.1093\/bib\/bbaa153","article-title":"AntiCP 2.0: an updated model for predicting anticancer peptides","volume":"22","author":"Agrawal","year":"2020","journal-title":"Brief. Bioinform"},{"key":"2022121418402266900_btac711-B3","article-title":"Evaluation measures for models assessment over imbalanced data sets","volume":"3","author":"Bekkar","year":"2013","journal-title":"J. Inf. Eng. 
Appl"},{"key":"2022121418402266900_btac711-B4","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach. Learn"},{"key":"2022121418402266900_btac711-B5","doi-asserted-by":"crossref","first-page":"250","DOI":"10.1111\/j.1541-0420.2007.00781_1.x","article-title":"The skill plot: a graphical technique for evaluating continuous diagnostic tests","volume":"64","author":"Briggs","year":"2008","journal-title":"Biometrics"},{"key":"2022121418402266900_btac711-B6","doi-asserted-by":"crossref","first-page":"e1002079","DOI":"10.1371\/journal.pcbi.1002079","article-title":"Generative embedding for model-based classification of fMRI data","volume":"7","author":"Brodersen","year":"2011","journal-title":"PLoS Comput. Biol"},{"key":"2022121418402266900_btac711-B7","doi-asserted-by":"crossref","first-page":"1375","DOI":"10.1080\/14740338.2021.1928633","article-title":"Antimicrobial resistance and the post antibiotic era: better late than never effort","volume":"20","author":"Chandra","year":"2021","journal-title":"Expert Opin. Drug Saf"},{"key":"2022121418402266900_btac711-B8","doi-asserted-by":"crossref","first-page":"875","DOI":"10.1007\/978-0-387-09823-4_45","volume-title":"Data Mining and Knowledge Discovery Handbook","author":"Chawla","year":"2009"},{"key":"2022121418402266900_btac711-B9","doi-asserted-by":"crossref","first-page":"321","DOI":"10.1613\/jair.953","article-title":"Smote: synthetic minority over-sampling technique","volume":"16","author":"Chawla","year":"2002","journal-title":"J. Artif. Intell. 
Res"},{"key":"2022121418402266900_btac711-B10","doi-asserted-by":"crossref","first-page":"246","DOI":"10.1002\/prot.1035","article-title":"Prediction of protein cellular attributes using pseudo-amino acid composition","volume":"43","author":"Chou","year":"2001","journal-title":"Proteins"},{"key":"2022121418402266900_btac711-B11","first-page":"4171","author":"Devlin","year":"2019"},{"key":"2022121418402266900_btac711-B12","first-page":"343","author":"Dong","year":"2011"},{"key":"2022121418402266900_btac711-B13","doi-asserted-by":"crossref","first-page":"D444","DOI":"10.1093\/nar\/gkt1008","article-title":"Hemolytik: a database of experimentally determined hemolytic and non-hemolytic peptides","volume":"42","author":"Gautam","year":"2014","journal-title":"Nucleic Acids Res"},{"key":"2022121418402266900_btac711-B14","first-page":"4918","author":"He","year":"2019"},{"key":"2022121418402266900_btac711-B15","doi-asserted-by":"crossref","first-page":"D460","DOI":"10.1093\/nar\/gkab1080","article-title":"dbAMP 2.0: updated resource for antimicrobial peptides with an enhanced scanning method for genomic and proteomic data","volume":"50","author":"Jhong","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2022121418402266900_btac711-B16","doi-asserted-by":"crossref","first-page":"1535","DOI":"10.1109\/TCBB.2012.89","article-title":"Classamp: a prediction tool for classification of antimicrobial peptides","volume":"9","author":"Joseph","year":"2012","journal-title":"IEEE\/ACM Trans. Comput. Biol. Bioinform"},{"key":"2022121418402266900_btac711-B17","author":"Kingma","year":"2015"},{"key":"2022121418402266900_btac711-B18","doi-asserted-by":"crossref","first-page":"10957","DOI":"10.1007\/s11033-012-1997-x","article-title":"Cathelicidins: family of antimicrobial peptides. A review","volume":"39","author":"Ko\u015bciuczuk","year":"2012","journal-title":"Mol. Biol. 
Rep"},{"key":"2022121418402266900_btac711-B19","doi-asserted-by":"crossref","first-page":"471","DOI":"10.1126\/science.abl5997","article-title":"The post-antibiotic era is here","volume":"373","author":"Kwon","year":"2021","journal-title":"Science"},{"key":"2022121418402266900_btac711-B20","doi-asserted-by":"crossref","first-page":"1658","DOI":"10.1093\/bioinformatics\/btl158","article-title":"Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences","volume":"22","author":"Li","year":"2006","journal-title":"Bioinformatics"},{"key":"2022121418402266900_btac711-B21","first-page":"2980","author":"Lin","year":"2017"},{"key":"2022121418402266900_btac711-B22","doi-asserted-by":"crossref","first-page":"3745","DOI":"10.1093\/bioinformatics\/btw560","article-title":"Imbalanced multi-label learning for identifying antimicrobial peptides and their functional types","volume":"32","author":"Lin","year":"2016","journal-title":"Bioinformatics"},{"key":"2022121418402266900_btac711-B23","doi-asserted-by":"crossref","first-page":"1909","DOI":"10.1007\/s00500-010-0625-8","article-title":"Addressing data complexity for imbalanced data sets: analysis of smote-based oversampling and evolutionary undersampling","volume":"15","author":"Luengo","year":"2011","journal-title":"Soft. Comput"},{"key":"2022121418402266900_btac711-B24","first-page":"3","author":"Maas","year":"2013"},{"key":"2022121418402266900_btac711-B25","doi-asserted-by":"crossref","first-page":"194","DOI":"10.3389\/fcimb.2016.00194","article-title":"Antimicrobial peptides: an emerging category of therapeutic agents","volume":"6","author":"Mahlapuu","year":"2016","journal-title":"Front. Cell. Infect. 
Microbiol"},{"key":"2022121418402266900_btac711-B26","author":"McInnes","year":"2018"},{"key":"2022121418402266900_btac711-B27","doi-asserted-by":"crossref","first-page":"42362","DOI":"10.1038\/srep42362","article-title":"Predicting antimicrobial peptides with improved accuracy by incorporating the compositional, physico-chemical and structural features into Chou\u2019s general pseaac","volume":"7","author":"Meher","year":"2017","journal-title":"Sci. Rep"},{"key":"2022121418402266900_btac711-B28","doi-asserted-by":"crossref","first-page":"D412","DOI":"10.1093\/nar\/gkaa913","article-title":"Pfam: the protein families database in 2021","volume":"49","author":"Mistry","year":"2020","journal-title":"Nucleic Acids Res"},{"key":"2022121418402266900_btac711-B29","doi-asserted-by":"crossref","first-page":"bbab263","DOI":"10.1093\/bib\/bbab263","article-title":"AVPIden: a new scheme for identification and functional prediction of antiviral peptides based on machine learning approaches","volume":"22","author":"Pang","year":"2021","journal-title":"Brief. Bioinform"},{"key":"2022121418402266900_btac711-B30","article-title":"Pytorch: an imperative style, high-performance deep learning library","volume":"32","author":"Paszke","year":"2019","journal-title":"Adv. Neural Inf. Process. 
Syst"},{"key":"2022121418402266900_btac711-B31","doi-asserted-by":"crossref","first-page":"801","DOI":"10.3390\/antibiotics9110801","article-title":"A review on the use of antimicrobial peptides to combat porcine viruses","volume":"9","author":"Pen","year":"2020","journal-title":"Antibiotics"},{"key":"2022121418402266900_btac711-B32","doi-asserted-by":"crossref","first-page":"D288","DOI":"10.1093\/nar\/gkaa991","article-title":"DBAASP v3: database of antimicrobial\/cytotoxic activity and structure of peptides as a resource for development of new therapeutics","volume":"49","author":"Pirtskhalava","year":"2020","journal-title":"Nucleic Acids Res"},{"key":"2022121418402266900_btac711-B33","doi-asserted-by":"crossref","first-page":"D1147","DOI":"10.1093\/nar\/gkt1191","article-title":"AVPdb: a database of experimentally validated antiviral peptides targeting medically important viruses","volume":"42","author":"Qureshi","year":"2013","journal-title":"Nucleic Acids Res"},{"key":"2022121418402266900_btac711-B34","first-page":"9689","author":"Rao","year":"2019"},{"key":"2022121418402266900_btac711-B35","first-page":"82","author":"Ridnik","year":"2021"},{"key":"2022121418402266900_btac711-B36","doi-asserted-by":"crossref","first-page":"1095","DOI":"10.3390\/antibiotics10091095","article-title":"Antimicrobial peptides: a potent alternative to antibiotics","volume":"10","author":"Rima","year":"2021","journal-title":"Antibiotics"},{"key":"2022121418402266900_btac711-B37","doi-asserted-by":"crossref","first-page":"1320","DOI":"10.3389\/fimmu.2017.01320","article-title":"Antimicrobial peptides as biologic and immunotherapeutic agents against cancer: a comprehensive overview","volume":"8","author":"Roudi","year":"2017","journal-title":"Front. 
Immunol"},{"key":"2022121418402266900_btac711-B38","first-page":"145","article-title":"On the stratification of multi-label data","author":"Sechidis","year":"2011","journal-title":"Machine Learning and Knowledge Discovery in Databases"},{"key":"2022121418402266900_btac711-B39","doi-asserted-by":"crossref","first-page":"243","DOI":"10.1016\/j.actbio.2018.01.009","article-title":"Central \u03b2-turn increases the cell selectivity of imperfectly amphipathic \u03b1-helical peptides","volume":"69","author":"Shao","year":"2018","journal-title":"Acta Biomater"},{"key":"2022121418402266900_btac711-B40","doi-asserted-by":"crossref","first-page":"D488","DOI":"10.1093\/nar\/gkab651","article-title":"DRAMP 3.0: an enhanced comprehensive data repository of antimicrobial peptides","volume":"50","author":"Shi","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2022121418402266900_btac711-B41","first-page":"1929","article-title":"Dropout: a simple way to prevent neural networks from overfitting","volume":"15","author":"Srivastava","year":"2014","journal-title":"J. Mach. Learn. 
Res"},{"key":"2022121418402266900_btac711-B42","doi-asserted-by":"crossref","first-page":"107965","DOI":"10.1016\/j.patcog.2021.107965","article-title":"A review of methods for imbalanced multi-label classification","volume":"118","author":"Tarekegn","year":"2021","journal-title":"Pattern Recognit"},{"key":"2022121418402266900_btac711-B43","doi-asserted-by":"crossref","first-page":"D837","DOI":"10.1093\/nar\/gku892","article-title":"Cancerppd: a database of anticancer peptides and proteins","volume":"43","author":"Tyagi","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2022121418402266900_btac711-B44","doi-asserted-by":"crossref","first-page":"D480","DOI":"10.1093\/nar\/gkaa1100","article-title":"UniProt: the universal protein knowledgebase in 2021","volume":"49","author":"UniProt Consortium","year":"2020","journal-title":"Nucleic Acids Res"},{"key":"2022121418402266900_btac711-B45","article-title":"Attention is all you need","volume":"30","author":"Vaswani","year":"2017","journal-title":"Adv. Neural Inf. Process. Syst"},{"key":"2022121418402266900_btac711-B46","doi-asserted-by":"crossref","first-page":"2740","DOI":"10.1093\/bioinformatics\/bty179","article-title":"Deep learning improves antimicrobial peptide recognition","volume":"34","author":"Veltri","year":"2018","journal-title":"Bioinformatics"},{"key":"2022121418402266900_btac711-B47","doi-asserted-by":"crossref","first-page":"2221","DOI":"10.1080\/07391102.2014.998710","article-title":"iDrug-target: predicting the interactions between drug compounds and target proteins in cellular networking via benchmark dataset optimization approach","volume":"33","author":"Xiao","year":"2015","journal-title":"J. Biomol. Struct. Dyn"},{"key":"2022121418402266900_btac711-B48","doi-asserted-by":"crossref","first-page":"687","DOI":"10.1038\/s41592-019-0496-6","article-title":"Machine-learning-guided directed evolution for protein engineering","volume":"16","author":"Yang","year":"2019","journal-title":"Nat. 
Methods"},{"key":"2022121418402266900_btac711-B49","doi-asserted-by":"crossref","first-page":"baaa061","DOI":"10.1093\/database\/baaa061","article-title":"Lamp2: a major update of the database linking antimicrobial peptides","volume":"2020","author":"Ye","year":"2020","journal-title":"Database"},{"key":"2022121418402266900_btac711-B51","doi-asserted-by":"crossref","first-page":"2135","DOI":"10.1093\/bioinformatics\/btac106","article-title":"PreRBP-TL: prediction of species-specific RNA-binding proteins based on transfer learning","volume":"38","author":"Zhang","year":"2022","journal-title":"Bioinformatics"},{"key":"2022121418402266900_btac711-B52","first-page":"48","article-title":"Antimicrobial peptides: mechanism of action, activity and clinical potential","volume":"8","author":"Zhang","year":"2021","journal-title":"Mil. Med. Res"},{"key":"2022121418402266900_btac711-B53","doi-asserted-by":"crossref","first-page":"bbab200","DOI":"10.1093\/bib\/bbab200","article-title":"A novel antibacterial peptide recognition algorithm based on BERT","volume":"22","author":"Zhang","year":"2021","journal-title":"Brief. 
Bioinform"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btac711\/47013946\/btac711.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/24\/5368\/47886933\/btac711.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/24\/5368\/47886933\/btac711.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,14]],"date-time":"2022-12-14T18:40:59Z","timestamp":1671043259000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/38\/24\/5368\/6795008"}},"subtitle":[],"editor":[{"given":"Pier Luigi","family":"Martelli","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2022,11,3]]},"references-count":52,"journal-issue":{"issue":"24","published-online":{"date-parts":[[2022,11,3]]},"published-print":{"date-parts":[[2022,12,13]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btac711","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2022,12,15]]},"published":{"date-parts":[[2022,11,3]]}}}