{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,12]],"date-time":"2026-02-12T09:23:09Z","timestamp":1770888189640,"version":"3.50.1"},"reference-count":29,"publisher":"Oxford University Press (OUP)","issue":"7","license":[{"start":{"date-parts":[[2019,12,3]],"date-time":"2019-12-03T00:00:00Z","timestamp":1575331200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"NSF","doi-asserted-by":"publisher","award":["DBI-1933521"],"award-info":[{"award-number":["DBI-1933521"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100008114","name":"University of Nebraska-Lincoln","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100008114","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000199","name":"United States Department of Agriculture","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000199","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000199","name":"USDA","doi-asserted-by":"publisher","award":["58-8042-9-089"],"award-info":[{"award-number":["58-8042-9-089"]}],"id":[{"id":"10.13039\/100000199","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["31728013"],"award-info":[{"award-number":["31728013"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61973174"],"award-info":[{"award-number":["61973174"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2020,4,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Carbohydrate-active enzymes (CAZymes) are extremely important to bioenergy, human gut microbiome, and plant pathogen researches and industries. Here we developed a new amino acid k-mer-based CAZyme classification, motif identification and genome annotation tool using a bipartite network algorithm. Using this tool, we classified 390 CAZyme families into thousands of subfamilies each with distinguishing k-mer peptides. These k-mers represented the characteristic motifs (in the form of a collection of conserved short peptides) of each subfamily, and thus were further used to annotate new genomes for CAZymes. This idea was also generalized to extract characteristic k-mer peptides for all the Swiss-Prot enzymes classified by the EC (enzyme commission) numbers and applied to enzyme EC prediction.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>This new tool was implemented as a Python package named eCAMI. Benchmark analysis of eCAMI against the state-of-the-art tools on CAZyme and enzyme EC datasets found that: (i) eCAMI has the best performance in terms of accuracy and memory use for CAZyme and enzyme EC classification and annotation; (ii) the k-mer-based tools (including PPR-Hotpep, CUPP and eCAMI) perform better than homology-based tools and deep-learning tools in enzyme EC prediction. Lastly, we confirmed that the k-mer-based tools have the unique ability to identify the characteristic k-mer peptides in the predicted enzymes.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>https:\/\/github.com\/yinlabniu\/eCAMI and https:\/\/github.com\/zhanglabNKU\/eCAMI.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btz908","type":"journal-article","created":{"date-parts":[[2019,12,2]],"date-time":"2019-12-02T16:58:29Z","timestamp":1575305909000},"page":"2068-2075","source":"Crossref","is-referenced-by-count":39,"title":["eCAMI: simultaneous classification and motif identification for enzyme annotation"],"prefix":"10.1093","volume":"36","author":[{"given":"Jing","family":"Xu","sequence":"first","affiliation":[{"name":"College of Artificial Intelligence, Nankai University , Tianjin 300071, China"},{"name":"College of Computer Science, Nankai University , Tianjin 300071, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8498-3451","authenticated-orcid":false,"given":"Han","family":"Zhang","sequence":"additional","affiliation":[{"name":"College of Artificial Intelligence, Nankai University , Tianjin 300071, China"}]},{"given":"Jinfang","family":"Zheng","sequence":"additional","affiliation":[{"name":"Department of Food Science and Technology, Nebraska Food for Health Center, University of Nebraska , Lincoln, NE 68588, USA"}]},{"given":"Philippe","family":"Dovoedo","sequence":"additional","affiliation":[{"name":"Department of Mathematical Sciences, Northern Illinois University , DeKalb, IL 60115, USA"}]},{"given":"Yanbin","family":"Yin","sequence":"additional","affiliation":[{"name":"Department of Food Science and Technology, Nebraska Food for Health Center, University of Nebraska , Lincoln, NE 68588, USA"}]}],"member":"286","published-online":{"date-parts":[[2019,12,3]]},"reference":[{"key":"2023062312013810700_btz908-B1","doi-asserted-by":"crossref","first-page":"186","DOI":"10.1186\/1471-2148-12-186","article-title":"Evolution, substrate specificity and subfamily classification of glycoside hydrolase family 5 (GH5)","volume":"12","author":"Aspeborg","year":"2012","journal-title":"BMC Evol. Biol"},{"key":"2023062312013810700_btz908-B2","doi-asserted-by":"crossref","first-page":"304","DOI":"10.1093\/nar\/28.1.304","article-title":"The ENZYME database in 2000","volume":"28","author":"Bairoch","year":"2000","journal-title":"Nucleic Acids Res"},{"key":"2023062312013810700_btz908-B3","doi-asserted-by":"crossref","first-page":"102","DOI":"10.1186\/s13068-019-1436-5","article-title":"Peptide-based functional annotation of carbohydrate-active enzymes by conserved unique peptide patterns (CUPP)","volume":"12","author":"Barrett","year":"2019","journal-title":"Biotechnol. Biofuels"},{"key":"2023062312013810700_btz908-B4","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1038\/nmeth.3176","article-title":"Fast and sensitive protein alignment using DIAMOND","volume":"12","author":"Buchfink","year":"2015","journal-title":"Nat Methods"},{"key":"2023062312013810700_btz908-B5","first-page":"181917","article-title":"Peptide Pattern Recognition for high-throughput protein sequence analysis and clustering","author":"Busk","year":"2018","journal-title":"bioRxiv"},{"key":"2023062312013810700_btz908-B6","doi-asserted-by":"crossref","first-page":"3380","DOI":"10.1128\/AEM.03803-12","article-title":"Function-based classification of carbohydrate-active enzymes by recognition of short, conserved peptide motifs","volume":"79","author":"Busk","year":"2013","journal-title":"Appl. Environ. Microbiol"},{"key":"2023062312013810700_btz908-B7","doi-asserted-by":"crossref","first-page":"e114138","DOI":"10.1371\/journal.pone.0114138","article-title":"Several genes encoding enzymes with the same activity are necessary for aerobic fungal degradation of cellulose in nature","volume":"9","author":"Busk","year":"2014","journal-title":"PLoS One"},{"key":"2023062312013810700_btz908-B8","doi-asserted-by":"crossref","first-page":"214","DOI":"10.1186\/s12859-017-1625-9","article-title":"Homology to peptide pattern for annotation of carbohydrate-active enzymes and prediction of function","volume":"18","author":"Busk","year":"2017","journal-title":"BMC Bioinformatics"},{"key":"2023062312013810700_btz908-B9","doi-asserted-by":"crossref","first-page":"3692","DOI":"10.1093\/nar\/gkg600","article-title":"SVM-Prot: web-based support vector machine software for functional classification of a protein from its primary sequence","volume":"31","author":"Cai","year":"2003","journal-title":"Nucleic Acids Res"},{"key":"2023062312013810700_btz908-B10","doi-asserted-by":"crossref","first-page":"510","DOI":"10.1126\/science.1252076","article-title":"Genomic signatures of specialized metabolism in plants","volume":"344","author":"Chae","year":"2014","journal-title":"Science"},{"key":"2023062312013810700_btz908-B11","doi-asserted-by":"crossref","first-page":"183","DOI":"10.1021\/pr0255710","article-title":"Prediction of enzyme family classes","volume":"2","author":"Chou","year":"2003","journal-title":"J. Proteome Res"},{"key":"2023062312013810700_btz908-B12","doi-asserted-by":"crossref","first-page":"6633","DOI":"10.1093\/nar\/gkg847","article-title":"Enzyme-specific profiles for genome annotation: PRIAM","volume":"31","author":"Claudel-Renard","year":"2003","journal-title":"Nucleic Acids Res"},{"key":"2023062312013810700_btz908-B13","doi-asserted-by":"crossref","first-page":"732","DOI":"10.1016\/j.jmb.2018.12.017","article-title":"N-glycan utilization by bifidobacterium gut symbionts involves a specialist beta-mannosidase","volume":"431","author":"Cordeiro","year":"2019","journal-title":"J. Mol. Biol"},{"key":"2023062312013810700_btz908-B14","doi-asserted-by":"crossref","first-page":"187","DOI":"10.1016\/j.jmb.2004.10.024","article-title":"Predicting enzyme class from protein structure without alignments","volume":"345","author":"Dobson","year":"2005","journal-title":"J. Mol. Biol"},{"key":"2023062312013810700_btz908-B15","doi-asserted-by":"crossref","first-page":"W29","DOI":"10.1093\/nar\/gkr367","article-title":"HMMER web server: interactive sequence similarity searching","volume":"39","author":"Finn","year":"2011","journal-title":"Nucleic Acids Res"},{"key":"2023062312013810700_btz908-B16","doi-asserted-by":"crossref","first-page":"27","DOI":"10.1186\/s13068-018-1027-x","article-title":"SACCHARIS: an automated pipeline to streamline discovery of carbohydrate active enzyme activities within polyspecific families and de novo sequence datasets","volume":"11","author":"Jones","year":"2018","journal-title":"Biotechnol. Biofuels"},{"key":"2023062312013810700_btz908-B17","doi-asserted-by":"crossref","first-page":"760","DOI":"10.1093\/bioinformatics\/btx680","article-title":"DEEPre: sequence-based enzyme EC number prediction by deep learning","volume":"34","author":"Li","year":"2018","journal-title":"Bioinformatics"},{"key":"2023062312013810700_btz908-B18","doi-asserted-by":"crossref","first-page":"D490","DOI":"10.1093\/nar\/gkt1178","article-title":"The carbohydrate-active enzymes database (CAZy) in 2013","volume":"42","author":"Lombard","year":"2014","journal-title":"Nucleic Acids Res"},{"key":"2023062312013810700_btz908-B19","doi-asserted-by":"crossref","first-page":"1686","DOI":"10.1128\/AEM.03453-15","article-title":"Dividing the large glycoside hydrolase family 43 into subfamilies: a motivation for detailed enzyme characterization","volume":"82","author":"Mewis","year":"2016","journal-title":"Appl. Environ. Microbiol"},{"key":"2023062312013810700_btz908-B20","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1093\/gigascience\/giy014","article-title":"Bipartite graphs in systems biology and medicine: a survey of methods and applications","volume":"7","author":"Pavlopoulos","year":"2018","journal-title":"Gigascience"},{"key":"2023062312013810700_btz908-B21","doi-asserted-by":"crossref","first-page":"13996","DOI":"10.1073\/pnas.1821905116","article-title":"Deep learning enables high-quality and high-throughput prediction of enzyme commission numbers","volume":"116","author":"Ryu","year":"2019","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023062312013810700_btz908-B22","doi-asserted-by":"crossref","first-page":"2041","DOI":"10.1104\/pp.16.01942","article-title":"Genome-wide prediction of metabolic enzymes, pathways, and gene clusters in plants","volume":"173","author":"Schlapfer","year":"2017","journal-title":"Plant Physiol"},{"key":"2023062312013810700_btz908-B23","doi-asserted-by":"crossref","first-page":"53","DOI":"10.1016\/j.bbrc.2007.09.098","article-title":"EzyPred: a top-down approach for predicting enzyme functional classes and subclasses","volume":"364","author":"Shen","year":"2007","journal-title":"Biochem. Biophys. Res. Commun"},{"key":"2023062312013810700_btz908-B24","doi-asserted-by":"crossref","first-page":"555","DOI":"10.1093\/protein\/gzl044","article-title":"Dividing the large glycoside hydrolase family 13 into subfamilies: towards improved functional annotations of alpha-amylase-related proteins","volume":"19","author":"Stam","year":"2006","journal-title":"Protein Eng. Des. Sel"},{"key":"2023062312013810700_btz908-B25","doi-asserted-by":"crossref","first-page":"6226","DOI":"10.1093\/nar\/gkh956","article-title":"EFICAz: a comprehensive approach for accurate genome-scale enzyme function inference","volume":"32","author":"Tian","year":"2004","journal-title":"Nucleic Acids Res"},{"key":"2023062312013810700_btz908-B26","doi-asserted-by":"crossref","first-page":"1141","DOI":"10.1093\/bioinformatics\/btx635","article-title":"Detecting hidden batch factors through data-adaptive adjustment for biological effects","volume":"34","author":"Yi","year":"2018","journal-title":"Bioinformatics"},{"key":"2023062312013810700_btz908-B27","doi-asserted-by":"crossref","first-page":"W445","DOI":"10.1093\/nar\/gks479","article-title":"dbCAN: a web resource for automated carbohydrate-active enzyme annotation","volume":"40","author":"Yin","year":"2012","journal-title":"Nucleic Acids Res"},{"key":"2023062312013810700_btz908-B28","doi-asserted-by":"crossref","first-page":"449","DOI":"10.1002\/prot.22167","article-title":"Genome-wide enzyme annotation with precision control: catalytic families (CatFam) databases","volume":"74","author":"Yu","year":"2009","journal-title":"Proteins"},{"key":"2023062312013810700_btz908-B29","doi-asserted-by":"crossref","first-page":"W95","DOI":"10.1093\/nar\/gky418","article-title":"dbCAN2: a meta server for automated carbohydrate-active enzyme annotation","volume":"46","author":"Zhang","year":"2018","journal-title":"Nucleic Acids Res"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btz908\/31588767\/btz908.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/7\/2068\/50670128\/bioinformatics_36_7_2068.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/7\/2068\/50670128\/bioinformatics_36_7_2068.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,6,24]],"date-time":"2023-06-24T21:02:44Z","timestamp":1687640564000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/36\/7\/2068\/5651014"}},"subtitle":[],"editor":[{"given":"Jinbo","family":"Xu","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2019,12,3]]},"references-count":29,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2020,4,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btz908","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2020,4,1]]},"published":{"date-parts":[[2019,12,3]]}}}