{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T20:34:37Z","timestamp":1772138077604,"version":"3.50.1"},"reference-count":31,"publisher":"Oxford University Press (OUP)","issue":"23","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2007,12,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Motivation: An important area of research in biochemistry and molecular biology focuses on characterization of enzyme mutants. However, synthesis and analysis of experimental mutants is time-consuming and expensive. We describe a machine-learning approach for inferring the activity levels of all unexplored single point mutants of an enzyme, based on a training set of such mutants with experimentally measured activity.<\/jats:p>\n                  <jats:p>Results: Based on a Delaunay tessellation-derived four-body statistical potential function, a perturbation vector measuring environmental changes relative to wild type (wt) at every residue position uniquely characterizes each enzyme mutant for model development and prediction. First, a measure of model performance utilizing area (AUC) under the receiver operating characteristic (ROC) curve surpasses 0.83 and 0.77 for data sets of experimental HIV-1 protease and T4 lysozyme mutants, respectively. Second, a novel method is introduced for evaluating statistical significance associated with the number of correct test set predictions obtained from a trained model. Third, 100 stratified random splits of the protease and T4 lysozyme mutant data sets into training and test sets achieve 77.0% and 80.8% mean accuracy, respectively. 
Next, protease and T4 lysozyme models trained with experimental mutants are used to predict activity levels for all remaining mutants; a subsequent search for publications reporting on dozens of these test mutants reveals that experimental results are matched by 79% and 86% of predictions, respectively. Finally, learning curves for each mutant enzyme system indicate the influence of training set size on model performance.<\/jats:p>\n                  <jats:p>Availability: Prediction databases at http:\/\/proteins.gmu.edu\/automute\/<\/jats:p>\n                  <jats:p>Contact: ivaisman@gmu.edu<\/jats:p>\n                  <jats:p>Supplementary information: Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btm509","type":"journal-article","created":{"date-parts":[[2007,10,31]],"date-time":"2007-10-31T20:33:58Z","timestamp":1193862838000},"page":"3155-3161","source":"Crossref","is-referenced-by-count":47,"title":["Accurate prediction of enzyme mutant activity based on a multibody statistical potential"],"prefix":"10.1093","volume":"23","author":[{"given":"Majid","family":"Masso","sequence":"first","affiliation":[{"name":"Laboratory for Structural Bioinformatics, School of Computational Sciences, George Mason University, 10900 University Boulevard, MSN 5B3, Manassas, VA 20110, USA"}]},{"given":"Iosif I.","family":"Vaisman","sequence":"additional","affiliation":[{"name":"Laboratory for Structural Bioinformatics, School of Computational Sciences, George Mason University, 10900 University Boulevard, MSN 5B3, Manassas, VA 20110, USA"}]}],"member":"286","published-online":{"date-parts":[[2007,10,31]]},"reference":[{"key":"2023041107510990800_","doi-asserted-by":"crossref","first-page":"469","DOI":"10.1145\/235815.235821","article-title":"The quickhull algorithm for convex hulls","volume":"22","author":"Barber","year":"1996","journal-title":"ACM Trans. Math. 
Softw"},{"key":"2023041107510990800_","doi-asserted-by":"crossref","first-page":"D120","DOI":"10.1093\/nar\/gkh082","article-title":"ProTherm, version 4.0: thermodynamic database for proteins and mutants","volume":"32","author":"Bava","year":"2004","journal-title":"Nucleic Acids Res"},{"key":"2023041107510990800_","doi-asserted-by":"crossref","first-page":"235","DOI":"10.1093\/nar\/28.1.235","article-title":"The Protein Data Bank","volume":"28","author":"Berman","year":"2000","journal-title":"Nucleic Acids Res"},{"key":"2023041107510990800_","doi-asserted-by":"crossref","first-page":"2246","DOI":"10.1126\/science.1103330","article-title":"Use of logic relationships to decipher protein network organization","volume":"306","author":"Bowers","year":"2004","journal-title":"Science"},{"key":"2023041107510990800_","doi-asserted-by":"crossref","first-page":"683","DOI":"10.1006\/jmbi.2001.4510","article-title":"Predicting the functional consequences of non-synonymous single nucleotide polymorphisms","volume":"307","author":"Chasman","year":"2001","journal-title":"J. Mol. Biol"},{"key":"2023041107510990800_","doi-asserted-by":"crossref","first-page":"187","DOI":"10.1016\/j.jmb.2004.10.024","article-title":"Predicting enzyme class from protein structure without alignments","volume":"345","author":"Dobson","year":"2005","journal-title":"J. Mol. 
Biol"},{"key":"2023041107510990800_","article-title":"ROC graphs: notes and practical considerations for researchers","volume-title":"Technical report HPL-2003-4","author":"Fawcett","year":"2003"},{"key":"2023041107510990800_","doi-asserted-by":"crossref","first-page":"2479","DOI":"10.1093\/bioinformatics\/bth261","article-title":"Data mining in bioinformatics using Weka","volume":"20","author":"Frank","year":"2004","journal-title":"Bioinformatics"},{"key":"2023041107510990800_","doi-asserted-by":"crossref","first-page":"136","DOI":"10.1016\/j.virol.2004.10.020","article-title":"Prediction of functional class of novel viral proteins by a statistical learning method irrespective of sequence similarity","volume":"331","author":"Han","year":"2005","journal-title":"Virology"},{"key":"2023041107510990800_","doi-asserted-by":"crossref","first-page":"171","DOI":"10.1023\/A:1010920819831","article-title":"A simple generalization of the area under the ROC curve to multiple class classification problems","volume":"45","author":"Hand","year":"2001","journal-title":"Mach. Learn"},{"key":"2023041107510990800_","doi-asserted-by":"crossref","first-page":"29","DOI":"10.1148\/radiology.143.1.7063747","article-title":"The meaning and use of the area under a receiver operating characteristic (ROC) curve","volume":"143","author":"Hanley","year":"1982","journal-title":"Radiology"},{"key":"2023041107510990800_","doi-asserted-by":"crossref","first-page":"1503","DOI":"10.1097\/01.aids.0000131358.29586.6b","article-title":"Evolution of resistance to drugs in HIV-1-infected patients failing antiretroviral therapy","volume":"18","author":"Kantor","year":"2004","journal-title":"AIDS"},{"key":"2023041107510990800_","first-page":"397","article-title":"Improving functional annotation of non-synonomous SNPs with information theory","author":"Karchin","year":"2005","journal-title":"Pac. Symp. 
Biocomput"},{"key":"2023041107510990800_","doi-asserted-by":"crossref","first-page":"2199","DOI":"10.1093\/bioinformatics\/btg297","article-title":"A comparative study of machine-learning methods to predict the effects of single nucleotide polymorphisms on protein function","volume":"19","author":"Krishnan","year":"2003","journal-title":"Bioinformatics"},{"key":"2023041107510990800_","doi-asserted-by":"crossref","first-page":"397","DOI":"10.1038\/340397a0","article-title":"Complete mutagenesis of the HIV-1 protease","volume":"340","author":"Loeb","year":"1989","journal-title":"Nature"},{"key":"2023041107510990800_","doi-asserted-by":"crossref","first-page":"322","DOI":"10.1016\/S0006-291X(03)00760-5","article-title":"Comprehensive mutagenesis of HIV-1 protease: a computational geometry approach","volume":"305","author":"Masso","year":"2003","journal-title":"Biochem. Biophys. Res. Commun"},{"key":"2023041107510990800_","doi-asserted-by":"crossref","first-page":"234","DOI":"10.1002\/prot.20968","article-title":"Computational mutagenesis studies of protein structure-function correlations","volume":"64","author":"Masso","year":"2006","journal-title":"Proteins"},{"key":"2023041107510990800_","doi-asserted-by":"crossref","first-page":"863","DOI":"10.1101\/gr.176601","article-title":"Predicting deleterious amino acid substitutions","volume":"11","author":"Ng","year":"2001","journal-title":"Genome Res"},{"key":"2023041107510990800_","doi-asserted-by":"crossref","first-page":"14754","DOI":"10.1073\/pnas.0404569101","article-title":"Automated prediction of protein function and detection of functional sites from structure","volume":"101","author":"Pazos","year":"2004","journal-title":"Proc. Natl Acad. Sci. 
USA"},{"key":"2023041107510990800_","article-title":"Well-trained PETs: improving probability estimation trees","volume-title":"CeDER Technical report IS-00-04","author":"Provost","year":"2001"},{"key":"2023041107510990800_","volume-title":"C4.5: Programs for Machine Learning","author":"Quinlan","year":"1993"},{"key":"2023041107510990800_","doi-asserted-by":"crossref","first-page":"3894","DOI":"10.1093\/nar\/gkf493","article-title":"Human non-synonymous SNPs: server and survey","volume":"30","author":"Ramensky","year":"2002","journal-title":"Nucleic Acids Res"},{"key":"2023041107510990800_","doi-asserted-by":"crossref","first-page":"67","DOI":"10.1016\/0022-2836(91)90738-R","article-title":"Systematic mutation of bacteriophage T4 lysozyme","volume":"222","author":"Rennell","year":"1991","journal-title":"J. Mol. Biol"},{"key":"2023041107510990800_","doi-asserted-by":"crossref","first-page":"891","DOI":"10.1016\/S0022-2836(02)00813-6","article-title":"Evaluation of structural and evolutionary contributions to deleterious mutation prediction","volume":"322","author":"Saunders","year":"2002","journal-title":"J. Mol. Biol"},{"key":"2023041107510990800_","doi-asserted-by":"crossref","first-page":"213","DOI":"10.1089\/cmb.1996.3.213","article-title":"Delaunay tessellation of proteins: four body nearest-neighbor propensities of amino acid residues","volume":"3","author":"Singh","year":"1996","journal-title":"J. Comput. Biol"},{"key":"2023041107510990800_","doi-asserted-by":"crossref","first-page":"170","DOI":"10.1093\/bioinformatics\/bth021","article-title":"Phylogenomic inference of protein molecular function: advances and challenges","volume":"20","author":"Sjolander","year":"2004","journal-title":"Bioinformatics"},{"key":"2023041107510990800_","doi-asserted-by":"crossref","first-page":"591","DOI":"10.1093\/hmg\/10.6.591","article-title":"Prediction of deleterious human alleles","volume":"10","author":"Sunyaev","year":"2001","journal-title":"Hum. Mol. 
Genet"},{"key":"2023041107510990800_","doi-asserted-by":"crossref","first-page":"6226","DOI":"10.1093\/nar\/gkh956","article-title":"EFICAz: a comprehensive approach for accurate genome-scale enzyme function inference","volume":"32","author":"Tian","year":"2004","journal-title":"Nucleic Acids Res"},{"key":"2023041107510990800_","doi-asserted-by":"crossref","first-page":"163","DOI":"10.1109\/IJSIS.1998.685437","article-title":"Compositional preferences in quadruplets of nearest neighbor residues in protein structures: statistical geometry analysis","author":"Vaisman","year":"1998","journal-title":"Proc. IEEE Symp. Intell. Syst"},{"key":"2023041107510990800_","doi-asserted-by":"crossref","first-page":"263","DOI":"10.1002\/humu.22","article-title":"SNPs, protein structure, and disease","volume":"17","author":"Wang","year":"2001","journal-title":"Hum. Mutat"},{"key":"2023041107510990800_","volume-title":"Data Mining","author":"Witten","year":"2000"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/23\/23\/3155\/49822989\/bioinformatics_23_23_3155.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/23\/23\/3155\/49822989\/bioinformatics_23_23_3155.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,5,14]],"date-time":"2023-05-14T11:51:18Z","timestamp":1684065078000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/23\/23\/3155\/291404"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2007,10,31]]},"references-count":31,"journal-issue":{"issue":"23","published-print":{"date-parts":[[2007,12,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btm509","relation":{"has-review":[{"id-type":"doi","id":"10.3410\/f.1098693.554814
","asserted-by":"object"}]},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2007,12,1]]},"published":{"date-parts":[[2007,10,31]]}}}