{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,7]],"date-time":"2026-04-07T14:50:47Z","timestamp":1775573447065,"version":"3.50.1"},"reference-count":47,"publisher":"Oxford University Press (OUP)","issue":"14","license":[{"start":{"date-parts":[[2021,1,30]],"date-time":"2021-01-30T00:00:00Z","timestamp":1611964800000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000001","name":"NSF","doi-asserted-by":"publisher","award":["CCF-1909536"],"award-info":[{"award-number":["CCF-1909536"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000057","name":"NIGMS","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000057","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health, Award","doi-asserted-by":"crossref","award":["R01GM132391"],"award-info":[{"award-number":["R01GM132391"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,8,4]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>As experimental efforts are costly and time consuming, computational characterization of enzyme capabilities is an attractive alternative. We present and evaluate several machine-learning models to predict which of 983 distinct enzymes, as defined via the Enzyme Commission (EC) numbers, are likely to interact with a given query molecule. Our data consists of enzyme-substrate interactions from the BRENDA database. Some interactions are attributed to natural selection and involve the enzyme\u2019s natural substrates. The majority of the interactions however involve non-natural substrates, thus reflecting promiscuous enzymatic activities.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We frame this \u2018enzyme promiscuity prediction\u2019 problem as a multi-label classification task. We maximally utilize inhibitor and unlabeled data to train prediction models that can take advantage of known hierarchical relationships between enzyme classes. We report that a hierarchical multi-label neural network, EPP-HMCNF, is the best model for solving this problem, outperforming k-nearest neighbors similarity-based and other machine-learning models. We show that inhibitor information during training consistently improves predictive power, particularly for EPP-HMCNF. We also show that all promiscuity prediction models perform worse under a realistic data split when compared to a random data split, and when evaluating performance on non-natural substrates compared to natural substrates.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>We provide Python code and data for EPP-HMCNF and other models in a repository termed EPP (Enzyme Promiscuity Prediction) at https:\/\/github.com\/hassounlab\/EPP.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btab054","type":"journal-article","created":{"date-parts":[[2021,1,26]],"date-time":"2021-01-26T12:12:09Z","timestamp":1611663129000},"page":"2017-2024","source":"Crossref","is-referenced-by-count":32,"title":["Enzyme promiscuity prediction using hierarchy-informed multi-label classification"],"prefix":"10.1093","volume":"37","author":[{"given":"Gian Marco","family":"Visani","sequence":"first","affiliation":[{"name":"Department of Computer Science, Tufts University , Medford, MA 02155, USA"}]},{"given":"Michael C","family":"Hughes","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Tufts University , Medford, MA 02155, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9477-2199","authenticated-orcid":false,"given":"Soha","family":"Hassoun","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Tufts University , Medford, MA 02155, USA"},{"name":"Department of Chemical and Biological Engineering, Tufts University , Medford, MA 02155, USA"}]}],"member":"286","published-online":{"date-parts":[[2021,1,30]]},"reference":[{"key":"2023061310310793400_btab054-B1","volume-title":"Molecular Similarity and Xenobiotic Metabolism","author":"Adams","year":"2010"},{"key":"2023061310310793400_btab054-B2","doi-asserted-by":"crossref","first-page":"109","DOI":"10.1186\/s12934-019-1156-3","article-title":"Towards creating an extended metabolic model (EMM) for E. coli using enzyme promiscuity prediction and metabolomics data","volume":"18","author":"Amin","year":"2019","journal-title":"Microb. Cell Factories"},{"key":"2023061310310793400_btab054-B3","doi-asserted-by":"crossref","first-page":"1405","DOI":"10.1002\/bit.26959","article-title":"Establishing synthesis pathway-host compatibility via enzyme solubility","volume":"116","author":"Amin","year":"2019","journal-title":"Biotechnol. Bioeng"},{"key":"2023061310310793400_btab054-B4","doi-asserted-by":"crossref","first-page":"20","DOI":"10.1186\/s13321-015-0069-3","article-title":"Why is Tanimoto index an appropriate choice for fingerprint-based similarity calculations?","volume":"7","author":"Bajusz","year":"2015","journal-title":"J. Cheminf"},{"key":"2023061310310793400_btab054-B5","first-page":"719","article-title":"Learning from positive and unlabeled data: a survey","volume":"109","author":"Bekker","year":"2020"},{"key":"2023061310310793400_btab054-B6","first-page":"281","article-title":"Random search for hyper-parameter optimization","volume":"13","author":"Bergstra","year":"2012","journal-title":"J. Mach. Learn. Res"},{"key":"2023061310310793400_btab054-B7","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach. Learn"},{"key":"2023061310310793400_btab054-B8","doi-asserted-by":"crossref","first-page":"2012","DOI":"10.1093\/bioinformatics\/btq317","article-title":"Molecular signatures-based prediction of enzyme promiscuity","volume":"26","author":"Carbonell","year":"2010","journal-title":"Bioinformatics"},{"key":"2023061310310793400_btab054-B9","doi-asserted-by":"crossref","first-page":"43994","DOI":"10.1074\/jbc.M111.274050","article-title":"Origins of specificity and promiscuity in metabolic networks","volume":"286","author":"Carbonell","year":"2011","journal-title":"J. Biol. Chem"},{"key":"2023061310310793400_btab054-B10","doi-asserted-by":"crossref","first-page":"W389","DOI":"10.1093\/nar\/gku362","article-title":"XTMS: pathway design in an eXTended metabolic space","volume":"42","author":"Carbonell","year":"2014","journal-title":"Nucleic Acids Res"},{"key":"2023061310310793400_btab054-B11","doi-asserted-by":"crossref","first-page":"2153","DOI":"10.1093\/bioinformatics\/bty065","article-title":"Selenzyme: enzyme selection tool for pathway design","volume":"34","author":"Carbonell","year":"2018","journal-title":"Bioinformatics"},{"key":"2023061310310793400_btab054-B12","doi-asserted-by":"crossref","first-page":"203","DOI":"10.1038\/s41929-019-0385-5","article-title":"Engineering new catalytic activities in enzymes","volume":"3","author":"Chen","year":"2020","journal-title":"Nat. Catal"},{"key":"2023061310310793400_btab054-B13","doi-asserted-by":"crossref","first-page":"2208","DOI":"10.3390\/molecules23092208","article-title":"Machine learning for drug\u2013target interaction prediction","volume":"23","author":"Chen","year":"2018","journal-title":"Molecules"},{"key":"2023061310310793400_btab054-B14","doi-asserted-by":"crossref","first-page":"5389","DOI":"10.3390\/ijms20215389","article-title":"Alignment-free method to predict enzyme classes and subclasses","volume":"20","author":"Concu","year":"2019","journal-title":"Int. J. Mol. Sci"},{"key":"2023061310310793400_btab054-B15","doi-asserted-by":"crossref","first-page":"181","DOI":"10.1002\/(SICI)1521-1878(199802)20:2<181::AID-BIES10>3.0.CO;2-0","article-title":"Underground metabolism","volume":"20","author":"D'Ari","year":"1998","journal-title":"Bioessays"},{"key":"2023061310310793400_btab054-B16","doi-asserted-by":"crossref","first-page":"334","DOI":"10.1186\/s12859-018-2368-y","article-title":"ECPred: a tool for the prediction of the enzymatic functions of protein sequences based on the EC nomenclature","volume":"19","author":"Dalkiran","year":"2018","journal-title":"BMC Bioinformatics"},{"key":"2023061310310793400_btab054-B17","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13321-018-0324-5","article-title":"BioTransformer: a comprehensive computational tool for small molecule metabolism prediction and metabolite identification","volume":"11","author":"Djoumbou-Feunang","year":"2019","journal-title":"J. Cheminf"},{"key":"2023061310310793400_btab054-B18","doi-asserted-by":"crossref","first-page":"1613","DOI":"10.1080\/13102818.2018.1521302","article-title":"A hierarchical multi-label classification method based on neural networks for gene function prediction","volume":"32","author":"Feng","year":"2018","journal-title":"Biotechnol. Biotechnol. Equipment"},{"key":"2023061310310793400_btab054-B19","doi-asserted-by":"crossref","first-page":"160","DOI":"10.3390\/metabo10040160","article-title":"Biological filtering and substrate promiscuity prediction for annotating untargeted metabolomics","volume":"10","author":"Hassanpour","year":"2020","journal-title":"Metabolites"},{"key":"2023061310310793400_btab054-B20","doi-asserted-by":"crossref","first-page":"44","DOI":"10.1186\/s13321-015-0087-1","article-title":"MINEs: open access databases of computationally predicted enzyme promiscuity products for untargeted metabolomics","volume":"7","author":"Jeffryes","year":"2015","journal-title":"J. Cheminf"},{"key":"2023061310310793400_btab054-B21","first-page":"btaa881","article-title":"Learning graph representations of biochemical networks and its application to enzymatic link prediction","volume":"2020","author":"Jiang","year":"2020","journal-title":"Bioinformatics"},{"key":"2023061310310793400_btab054-B22","first-page":"2323","article-title":"Junction tree variational autoencoder for molecular graph generation","author":"Jin","year":"2018"},{"key":"2023061310310793400_btab054-B23","doi-asserted-by":"crossref","first-page":"498","DOI":"10.1016\/j.cbpa.2006.08.011","article-title":"Enzyme promiscuity: evolutionary and mechanistic aspects","volume":"10","author":"Khersonsky","year":"2006","journal-title":"Current Opinion in Chemical Biology"},{"key":"2023061310310793400_btab054-B24","doi-asserted-by":"crossref","first-page":"471","DOI":"10.1146\/annurev-biochem-030409-143718","article-title":"Enzyme promiscuity: a mechanistic and evolutionary perspective","volume":"79","author":"Khersonsky","year":"2010","journal-title":"Annu. Rev. Biochem"},{"key":"2023061310310793400_btab054-B25","doi-asserted-by":"crossref","first-page":"660","DOI":"10.1093\/bioinformatics\/btx624","article-title":"DeepGO: predicting protein functions from sequence and interactions using a deep ontology-aware classifier","volume":"34","author":"Kulmanov","year":"2018","journal-title":"Bioinformatics"},{"key":"2023061310310793400_btab054-B26","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/1687-4153-2012-1","article-title":"A top-down approach to classify enzyme functional classes and sub-classes using random forest","volume":"2012","author":"Kumar","year":"2012","journal-title":"EURASIP J. Bioinf. Syst. Biol"},{"key":"2023061310310793400_btab054-B27","doi-asserted-by":"crossref","first-page":"760","DOI":"10.1093\/bioinformatics\/btx680","article-title":"DEEPre: sequence-based enzyme EC number prediction by deep learning","volume":"34","author":"Li","year":"2018","journal-title":"Bioinformatics"},{"key":"2023061310310793400_btab054-B28","first-page":"179","author":"Liu","year":"2003"},{"key":"2023061310310793400_btab054-B29","first-page":"151","author":"Manning","year":"2009"},{"key":"2023061310310793400_btab054-B30","doi-asserted-by":"crossref","first-page":"2077","DOI":"10.1021\/acs.jcim.7b00166","article-title":"Profile-QSAR 2.0: kinase virtual screening accuracy comparable to four-concentration IC50s for realistically novel compounds","volume":"57","author":"Martin","year":"2017","journal-title":"J. Chem. Inf. Model"},{"key":"2023061310310793400_btab054-B31","doi-asserted-by":"crossref","first-page":"518","DOI":"10.1021\/acssynbio.5b00294","article-title":"Semisupervised Gaussian process for automated enzyme search","volume":"5","author":"Mellor","year":"2016","journal-title":"ACS Synth. Biol"},{"key":"2023061310310793400_btab054-B32","author":"Moura","year":"2013"},{"key":"2023061310310793400_btab054-B33","doi-asserted-by":"crossref","first-page":"157","DOI":"10.1038\/nbt1519","article-title":"Protein promiscuity and its implications for biotechnology","volume":"27","author":"Nobeli","year":"2009","journal-title":"Nat. Biotechnol"},{"key":"2023061310310793400_btab054-B34","first-page":"2825","article-title":"Scikit-learn: machine learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J. Mach. Learn. Res"},{"key":"2023061310310793400_btab054-B35","doi-asserted-by":"crossref","first-page":"171","DOI":"10.1016\/j.ymben.2017.09.016","article-title":"Predicting novel substrates for enzymes with minimal experimental effort with active learning","volume":"44","author":"Pertusi","year":"2017","journal-title":"Metab. Eng"},{"key":"2023061310310793400_btab054-B36","doi-asserted-by":"crossref","first-page":"1016","DOI":"10.1093\/bioinformatics\/btu760","article-title":"Efficient searching and annotation of metabolic networks using chemical similarity","volume":"31","author":"Pertusi","year":"2015","journal-title":"Bioinformatics"},{"key":"2023061310310793400_btab054-B37","first-page":"3","author":"Radenovi\u0107","year":"2016"},{"key":"2023061310310793400_btab054-B38","doi-asserted-by":"crossref","first-page":"171","DOI":"10.1038\/nmeth.2803","article-title":"EC-BLAST: a tool to automatically search and compare enzyme reactions","volume":"11","author":"Rahman","year":"2014","journal-title":"Nat. Methods"},{"key":"2023061310310793400_btab054-B39","doi-asserted-by":"crossref","first-page":"742","DOI":"10.1021\/ci100050t","article-title":"Extended-connectivity fingerprints","volume":"50","author":"Rogers","year":"2010","journal-title":"J. Chem. Inf. Model"},{"key":"2023061310310793400_btab054-B40","doi-asserted-by":"crossref","first-page":"W471","DOI":"10.1093\/nar\/gks372","article-title":"COFACTOR: an accurate comparative algorithm for structure-based protein function annotation","volume":"40","author":"Roy","year":"2012","journal-title":"Nucleic Acids Res"},{"key":"2023061310310793400_btab054-B41","doi-asserted-by":"crossref","first-page":"13996","DOI":"10.1073\/pnas.1821905116","article-title":"Deep learning enables high-quality and high-throughput prediction of enzyme commission numbers","volume":"116","author":"Ryu","year":"2019","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023061310310793400_btab054-B42","doi-asserted-by":"crossref","first-page":"194","DOI":"10.1016\/j.jbiotec.2017.04.020","article-title":"The BRENDA enzyme information system\u2014from a database to an expert system","volume":"261","author":"Schomburg","year":"2017","journal-title":"J. Biotechnol"},{"key":"2023061310310793400_btab054-B43","first-page":"1409","volume-title":"A statistical method for evaluating systematic relationships","author":"Sokal","year":"1958"},{"key":"2023061310310793400_btab054-B44","doi-asserted-by":"crossref","first-page":"1","DOI":"10.4018\/jdwm.2007070101","article-title":"Multi-label classification: an overview","volume":"3","author":"Tsoumakas","year":"2007","journal-title":"Int. J. Data Warehousing Mining"},{"key":"2023061310310793400_btab054-B45","first-page":"5075","author":"Wehrmann","year":"2018"},{"key":"2023061310310793400_btab054-B46","doi-asserted-by":"crossref","first-page":"94","DOI":"10.1186\/s12918-015-0241-4","article-title":"PROXIMAL: a method for prediction of xenobiotic metabolism","volume":"9","author":"Yousofshahi","year":"2015","journal-title":"BMC Syst. Biol"},{"key":"2023061310310793400_btab054-B47","first-page":"650","author":"Zhang","year":"2008"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btab054\/38865493\/btab054.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/14\/2017\/50579099\/btab054.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/14\/2017\/50579099\/btab054.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,6,13]],"date-time":"2023-06-13T10:34:31Z","timestamp":1686652471000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/37\/14\/2017\/6124277"}},"subtitle":[],"editor":[{"given":"Pier Luigi","family":"Martelli","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2021,1,30]]},"references-count":47,"journal-issue":{"issue":"14","published-print":{"date-parts":[[2021,8,4]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btab054","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021,7,15]]},"published":{"date-parts":[[2021,1,30]]}}}