{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T16:06:11Z","timestamp":1772121971550,"version":"3.50.1"},"reference-count":55,"publisher":"Oxford University Press (OUP)","issue":"2","license":[{"start":{"date-parts":[[2021,1,26]],"date-time":"2021-01-26T00:00:00Z","timestamp":1611619200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["32070659"],"award-info":[{"award-number":["32070659"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Warshel Institute of Computational Biology"},{"DOI":"10.13039\/501100004853","name":"Chinese University of Hong Kong","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100004853","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,3,22]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>As the current worldwide outbreaks of the SARS-CoV-2, it is urgently needed to develop effective therapeutic agents for inhibiting the pathogens or treating the related diseases. Antimicrobial peptides (AMP) with functional activity against coronavirus could be a considerable solution, yet there is no research for identifying anti-coronavirus (anti-CoV) peptides with the computational approach. In this study, we first investigated the physiochemical and compositional properties of the collected anti-CoV peptides by comparing against three other negative sets: antivirus peptides without anti-CoV function (antivirus), regular AMP without antivirus functions (non-AVP) and peptides without antimicrobial functions (non-AMP). Then, we established classifiers for identifying anti-CoV peptides between different negative sets based on random forest. Imbalanced learning strategies were adopted due to the severe class-imbalance within the datasets. The geometric mean of the sensitivity and specificity (GMean) under the identification from antivirus, non-AVP and non-AMP reaches 83.07%, 85.51% and 98.82%, respectively. Then, to pursue identifying anti-CoV peptides from broad-spectrum peptides, we designed a double-stages classifier based on the collected datasets. In the first stage, the classifier characterizes AMPs from regular peptides. It achieves an area under the receiver operating curve (AUCROC) value of 97.31%. The second stage is to identify the anti-CoV peptides between the combined negatives of other AMPs. Here, the GMean of evaluation on the independent test set is 79.42%. The proposed approach is considered as an applicable scheme for assisting the development of novel anti-CoV peptides. The datasets and source codes used in this study are available at https:\/\/github.com\/poncey\/PreAntiCoV.<\/jats:p>","DOI":"10.1093\/bib\/bbaa423","type":"journal-article","created":{"date-parts":[[2020,12,24]],"date-time":"2020-12-24T12:29:53Z","timestamp":1608812993000},"page":"1085-1095","source":"Crossref","is-referenced-by-count":54,"title":["Identifying anti-coronavirus peptides by incorporating different negative datasets and imbalanced learning strategies"],"prefix":"10.1093","volume":"22","author":[{"given":"Yuxuan","family":"Pang","sequence":"first","affiliation":[{"name":"Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Shenzhen, 518172, P.R. China"},{"name":"School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, Shenzhen, 518172, P.R. China"}]},{"given":"Zhuo","family":"Wang","sequence":"additional","affiliation":[{"name":"Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Shenzhen, 518172, P.R. China"}]},{"given":"Jhih-Hua","family":"Jhong","sequence":"additional","affiliation":[{"name":"Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Shenzhen, 518172, P.R. China"},{"name":"Department of Computer Science and Engineering, Yuan Ze University, Taoyuan 320, Taiwan"}]},{"given":"Tzong-Yi","family":"Lee","sequence":"additional","affiliation":[{"name":"Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Shenzhen, 518172, P.R. China"},{"name":"School of Life and Health Sciences, The Chinese University of Hong Kong, Shenzhen, Shenzhen, 518172, P.R. China"}]}],"member":"286","published-online":{"date-parts":[[2021,1,26]]},"reference":[{"key":"2021032314373209300_ref1","doi-asserted-by":"crossref","first-page":"323","DOI":"10.3389\/fmicb.2018.00323","article-title":"In silico approach for prediction of antifungal peptides","volume":"9","author":"Agrawal","year":"2019","journal-title":"Front Microbiol"},{"key":"2021032314373209300_ref2","doi-asserted-by":"crossref","DOI":"10.1093\/bib\/bbaa153","article-title":"AntiCP 2.0: an updated model for predicting anticancer peptides","author":"Agrawal","year":"2020","journal-title":"Brief Bioinform"},{"issue":"suppl_2","key":"2021032314373209300_ref3","doi-asserted-by":"crossref","first-page":"W202","DOI":"10.1093\/nar\/gkp335","article-title":"MEME suite: tools for motif discovery and searching","volume":"37","author":"Bailey","year":"2009","journal-title":"Nucleic Acids Res"},{"issue":"6","key":"2021032314373209300_ref4","doi-asserted-by":"crossref","first-page":"351","DOI":"10.1080\/10408363.2019.1631249","article-title":"Designing and optimizing new antimicrobial peptides: all targets are not the same","volume":"56","author":"Barreto-Santamar\u00eda","year":"2019","journal-title":"Crit Rev Clin Lab Sci"},{"issue":"1","key":"2021032314373209300_ref5","doi-asserted-by":"crossref","first-page":"1023","DOI":"10.1002\/elps.11501401163","article-title":"The focusing positions of polypeptides in immobilized pH gradients can be predicted from their amino acid sequences","volume":"14","author":"Bjellqvist","year":"1993","journal-title":"Electrophoresis"},{"issue":"1","key":"2021032314373209300_ref6","doi-asserted-by":"crossref","first-page":"529","DOI":"10.1002\/elps.1150150171","article-title":"Reference points for comparisons of two-dimensional maps of proteins from different human cell types defined in a pH scale where isoelectric points correlate with polypeptide compositions","volume":"15","author":"Bjellqvist","year":"1994","journal-title":"Electrophoresis"},{"issue":"1","key":"2021032314373209300_ref7","doi-asserted-by":"crossref","first-page":"103","DOI":"10.1016\/0014-5793(89)81505-4","article-title":"Antibacterial and antimalarial properties of peptides that are cecropin-melittin hybrids","volume":"259","author":"Boman","year":"1989","journal-title":"FEBS Lett"},{"issue":"4","key":"2021032314373209300_ref8","doi-asserted-by":"crossref","first-page":"640","DOI":"10.1093\/bib\/bbu031","article-title":"Towards more accurate prediction of ubiquitination sites: a comprehensive review of current methods, tools and features","volume":"16","author":"Chen","year":"2015","journal-title":"Brief Bioinform"},{"issue":"14","key":"2021032314373209300_ref9","doi-asserted-by":"crossref","first-page":"2499","DOI":"10.1093\/bioinformatics\/bty140","article-title":"iFeature: a python package and web server for features extraction and selection from protein and peptide sequences","volume":"34","author":"Chen","year":"2018","journal-title":"Bioinformatics"},{"issue":"3","key":"2021032314373209300_ref10","doi-asserted-by":"crossref","first-page":"246","DOI":"10.1002\/prot.1035","article-title":"Prediction of protein cellular attributes using pseudo-amino acid composition","volume":"43","author":"Chou","year":"2001","journal-title":"Proteins: Struct, Funct, Bioinformat"},{"issue":"3","key":"2021032314373209300_ref11","doi-asserted-by":"crossref","first-page":"1098","DOI":"10.1093\/bib\/bbz043","article-title":"Characterization and identification of antimicrobial peptides with different functional activities","volume":"21","author":"Chung","year":"2019","journal-title":"Brief Bioinform"},{"issue":"3","key":"2021032314373209300_ref12","doi-asserted-by":"crossref","first-page":"986","DOI":"10.3390\/ijms21030986","article-title":"Characterization and identification of natural antimicrobial peptides on different organisms","volume":"21","author":"Chung","year":"2020","journal-title":"Int J Mol Sci"},{"issue":"11","key":"2021032314373209300_ref13","doi-asserted-by":"crossref","first-page":"1422","DOI":"10.1093\/bioinformatics\/btp163","article-title":"Biopython: freely available python tools for computational molecular biology and bioinformatics","volume":"25","author":"Cock","year":"2009","journal-title":"Bioinformatics"},{"key":"2021032314373209300_ref14","first-page":"D1158","article-title":"Uniprot: the universal protein knowledgebase","volume":"45","author":"Consortium","year":"2017","journal-title":"Nucleic Acids Res"},{"key":"2021032314373209300_ref15","doi-asserted-by":"crossref","first-page":"1887","DOI":"10.1016\/j.patrec.2008.06.007","article-title":"Using Chou\u2019s pseudo amino acid composition to predict subcellular localization of apoptosis proteins: an approach with immune genetic algorithm-based ensemble classifier","volume":"29","author":"Ding","year":"2008","journal-title":"Pattern Recognit Lett"},{"issue":"5881","key":"2021032314373209300_ref16","doi-asserted-by":"crossref","first-page":"371","DOI":"10.1038\/299371a0","article-title":"The helical hydrophobic moment: a measure of the amphiphilicity of a helix","volume":"299","author":"Eisenberg","year":"1982","journal-title":"Nature"},{"issue":"1","key":"2021032314373209300_ref17","doi-asserted-by":"crossref","first-page":"29","DOI":"10.1148\/radiology.143.1.7063747","article-title":"The meaning and use of the area under a receiver operating characteristic (roc) curve","volume":"143","author":"Hanley","year":"1982","journal-title":"Radiology"},{"key":"2021032314373209300_ref18","first-page":"2354","article-title":"Wilcoxon rank sum test","author":"Haynes","year":"2013","journal-title":"Ency Syst Biol"},{"issue":"9","key":"2021032314373209300_ref19","doi-asserted-by":"crossref","first-page":"1263","DOI":"10.1109\/TKDE.2008.239","article-title":"Learning from imbalanced data","volume":"21","author":"He","year":"2009","journal-title":"IEEE Trans Knowl Data Eng"},{"issue":"6","key":"2021032314373209300_ref20","doi-asserted-by":"crossref","first-page":"3824","DOI":"10.1073\/pnas.78.6.3824","article-title":"Prediction of protein antigenic determinants from amino acid sequences","volume":"78","author":"Hopp","year":"1981","journal-title":"Proc Natl Acad Sci"},{"issue":"6","key":"2021032314373209300_ref21","first-page":"1895","article-title":"Thermostability and aliphatic index of globular proteins","volume":"88","author":"Ikai","year":"1980","journal-title":"J Biochem"},{"key":"2021032314373209300_ref22","first-page":"D1285","article-title":"dbAMP: an integrated resource for exploring antimicrobial peptides with functional activities and physicochemical properties on transcriptome and proteome data","volume":"47","author":"Jhong","year":"2018","journal-title":"Nucleic Acids Res"},{"issue":"1","key":"2021032314373209300_ref23","doi-asserted-by":"crossref","first-page":"148","DOI":"10.1038\/s41597-019-0154-y","article-title":"Dramp 2.0, an updated data repository of antimicrobial peptides","volume":"6","author":"Kang","year":"2019","journal-title":"Scientific Data"},{"issue":"4","key":"2021032314373209300_ref24","doi-asserted-by":"crossref","first-page":"262","DOI":"10.2174\/157016409789973707","article-title":"Pseudo amino acid composition and its applications in bioinformatics, proteomics and system biology","volume":"6","author":"Kuo-Chen","year":"2009","journal-title":"Curr Proteomics"},{"issue":"1","key":"2021032314373209300_ref25","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random forests","volume":"45","author":"Leo","year":"2001","journal-title":"Mach Learn"},{"issue":"1","key":"2021032314373209300_ref26","first-page":"559","article-title":"Imbalanced-learn: a python toolbox to tackle the curse of imbalanced datasets in machine learning","volume":"18","author":"Lema","year":"2017","journal-title":"J Mach Learn Res"},{"issue":"20","key":"2021032314373209300_ref27","doi-asserted-by":"crossref","first-page":"4277","DOI":"10.1021\/bi00613a026","article-title":"Conformational preferences of amino acids in globular proteins","volume":"17","author":"Levitt","year":"1978","journal-title":"Biochemistry"},{"issue":"7","key":"2021032314373209300_ref28","doi-asserted-by":"crossref","first-page":"1518","DOI":"10.1016\/j.peptides.2011.05.015","article-title":"Virucidal activity of a scorpion venom peptide variant mucroporin-m1 against measles, SARS-COV and influenza H5N1 viruses","volume":"32","author":"Li","year":"2011","journal-title":"Peptides"},{"issue":"13","key":"2021032314373209300_ref29","doi-asserted-by":"crossref","first-page":"1658","DOI":"10.1093\/bioinformatics\/btl158","article-title":"Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences","volume":"22","author":"Li","year":"2006","journal-title":"Bioinformatics"},{"key":"2021032314373209300_ref30","volume-title":"CRC Handbook of Chemistry and Physics","author":"Lide","year":"1991"},{"issue":"1","key":"2021032314373209300_ref31","doi-asserted-by":"crossref","first-page":"3067","DOI":"10.1038\/ncomms4067","article-title":"Structure-based discovery of Middle East respiratory syndrome coronavirus fusion inhibitor","volume":"5","author":"Lu","year":"2014","journal-title":"Nat Commun"},{"issue":"1","key":"2021032314373209300_ref32","doi-asserted-by":"crossref","first-page":"2522","DOI":"10.1038\/s42256-019-0138-9","article-title":"From local explanations to global understanding with explainable AI for trees","volume":"2","author":"Lundberg","year":"2020","journal-title":"Nat Mach Intell"},{"key":"2021032314373209300_ref33","doi-asserted-by":"crossref","first-page":"194","DOI":"10.3389\/fcimb.2016.00194","article-title":"Antimicrobial peptides: an emerging category of therapeutic agents","volume":"6","author":"Mahlapuu","year":"2016","journal-title":"Front Cell Infect Microbiol"},{"key":"2021032314373209300_ref34","article-title":"KNN approach to unbalanced data distributions: a case study involving information extraction","volume-title":"Proceedings of International Conference on Machine Learning (ICML\u2019 2003) workshop on learning from imbalanced datasets","author":"Mani","year":"2003"},{"issue":"1","key":"2021032314373209300_ref35","doi-asserted-by":"crossref","first-page":"42362","DOI":"10.1038\/srep42362","article-title":"Predicting antimicrobial peptides with improved accuracy by incorporating the compositional, physico-chemical and structural features into Chou\u2019s general pseAAC","volume":"7","author":"Meher","year":"2017","journal-title":"Sci Rep"},{"key":"2021032314373209300_ref36","first-page":"2020","article-title":"Therapeutic and prophylactic potential of antimicrobial peptides against coronaviruses","author":"Memariani","year":"1971","journal-title":"Ir J Med Sci"},{"issue":"17","key":"2021032314373209300_ref37","doi-asserted-by":"crossref","first-page":"2753","DOI":"10.1093\/bioinformatics\/btx285","article-title":"modlAMP: python for antimicrobial peptides","volume":"33","author":"M\u00fcller","year":"2017","journal-title":"Bioinformatics"},{"key":"2021032314373209300_ref38","first-page":"1","article-title":"Peptide-protein interaction studies of antimicrobial peptides targeting middle east respiratory syndrome coronavirus spike protein: an in silico approach","volume":"2019","author":"Mustafa","year":"2019","journal-title":"Adv Bioinformat"},{"key":"2021032314373209300_ref39","first-page":"2825","article-title":"Scikit-learn: machine learning in python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J Mach Learn Res"},{"issue":"D1","key":"2021032314373209300_ref40","doi-asserted-by":"crossref","first-page":"D1147","DOI":"10.1093\/nar\/gkt1191","article-title":"AVPdb: a database of experimentally validated antiviral peptides targeting medically important viruses","volume":"42","author":"Qureshi","year":"2014","journal-title":"Nucleic Acids Res"},{"issue":"1","key":"2021032314373209300_ref41","first-page":"11","article-title":"Impact of sample size and variability on the power and type I error rates of equivalence tests: a simulation study","volume":"19","author":"Rusticus","year":"2014","journal-title":"Pract Assess Res Eval"},{"issue":"22","key":"2021032314373209300_ref42","doi-asserted-by":"crossref","first-page":"5743","DOI":"10.3390\/ijms20225743","article-title":"Meta-iAVP: a sequence-based meta-predictor for improving the prediction of antiviral peptides using effective feature representation","volume":"20","author":"Schaduangrat","year":"2019","journal-title":"Int J Mol Sci"},{"issue":"W1","key":"2021032314373209300_ref43","doi-asserted-by":"crossref","first-page":"W199","DOI":"10.1093\/nar\/gks450","article-title":"AVPpred: collection and prediction of highly effective antiviral peptides","volume":"40","author":"Thakur","year":"2012","journal-title":"Nucleic Acids Res"},{"issue":"D1","key":"2021032314373209300_ref44","doi-asserted-by":"crossref","first-page":"D837","DOI":"10.1093\/nar\/gku892","article-title":"CancerPPD: a database of anticancer peptides and proteins","volume":"43","author":"Tyagi","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2021032314373209300_ref45","first-page":"38","article-title":"AFP-CKSAAP: prediction of antifreeze proteins using composition of k-spaced amino acid pairs with deep neural network","volume-title":"2019 IEEE 19th International Conference on Bioinformatics and Bioengineering","author":"Usman","year":"2019"},{"issue":"86","key":"2021032314373209300_ref46","first-page":"2579","article-title":"Visualizing data using t-SNE","volume":"9","author":"van der Maaten","year":"2008","journal-title":"J Mach Learn Res"},{"issue":"16","key":"2021032314373209300_ref47","doi-asserted-by":"crossref","first-page":"2740","DOI":"10.1093\/bioinformatics\/bty179","article-title":"Deep learning improves antimicrobial peptide recognition","volume":"34","author":"Veltri","year":"2018","journal-title":"Bioinformatics"},{"issue":"D1","key":"2021032314373209300_ref48","doi-asserted-by":"crossref","first-page":"1087","DOI":"10.1093\/nar\/gkv1278","article-title":"Apd3: the antimicrobial peptide database as a tool for research and education","volume":"44","author":"Wang","year":"2016","journal-title":"Nucleic Acids Res"},{"key":"2021032314373209300_ref49","article-title":"A large-scale investigation and identification of methicillin-resistant Staphylococcus aureus based on peaks binning of matrix-assisted laser desorption ionization-time of flight MS spectra","author":"Wang","year":"2020","journal-title":"Brief Bioinform"},{"issue":"4","key":"2021032314373209300_ref50","doi-asserted-by":"crossref","first-page":"18476","DOI":"10.1371\/journal.pone.0018476","article-title":"Prediction of antimicrobial peptides based on sequence alignment and feature selection methods","volume":"6","author":"Wang","year":"2011","journal-title":"PLoS One"},{"issue":"21","key":"2021032314373209300_ref51","doi-asserted-by":"crossref","first-page":"11385","DOI":"10.1128\/JVI.01363-09","article-title":"Rhesus theta-defensin prevents death in a mouse model of severe acute respiratory syndrome coronavirus pulmonary disease","volume":"83","author":"Wohlford-Lenane","year":"2009","journal-title":"J Virol"},{"key":"2021032314373209300_ref52","doi-asserted-by":"crossref","first-page":"44","DOI":"10.1016\/j.ijid.2020.03.004","article-title":"The SARS-CoV-2 outbreak: what we know","volume":"94","author":"Wu","year":"2020","journal-title":"Int J Infect Dis"},{"issue":"2","key":"2021032314373209300_ref53","doi-asserted-by":"crossref","first-page":"168","DOI":"10.1016\/j.ab.2013.01.019","article-title":"iamp-2l: a two-level multi-label classifier for identifying antimicrobial peptides and their functional types","volume":"436","author":"Xiao","year":"2013","journal-title":"Anal Biochem"},{"issue":"8","key":"2021032314373209300_ref54","doi-asserted-by":"crossref","first-page":"1987","DOI":"10.1110\/ps.062286306","article-title":"An amino acid \u201ctransmembrane tendency\u201d scale that approaches the theoretical limit to accuracy for prediction of transmembrane helices: relationship to biological hydrophobicity","volume":"15","author":"Zhao","year":"2006","journal-title":"Protein Sci"},{"issue":"1","key":"2021032314373209300_ref55","doi-asserted-by":"crossref","first-page":"22008","DOI":"10.1038\/srep22008","article-title":"A novel peptide with potent and broad-spectrum anti-viral activities against multiple respiratory viruses","volume":"6","author":"Zhao","year":"2016","journal-title":"Sci Rep"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bib\/article-pdf\/22\/2\/1085\/36655068\/bbaa423.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"http:\/\/academic.oup.com\/bib\/article-pdf\/22\/2\/1085\/36655068\/bbaa423.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,3,23]],"date-time":"2021-03-23T15:10:49Z","timestamp":1616512249000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/22\/2\/1085\/6120286"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,1,26]]},"references-count":55,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2021,1,26]]},"published-print":{"date-parts":[[2021,3,22]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbaa423","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021,3]]},"published":{"date-parts":[[2021,1,26]]}}}