{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,20]],"date-time":"2026-01-20T16:09:52Z","timestamp":1768925392719,"version":"3.49.0"},"reference-count":40,"publisher":"Oxford University Press (OUP)","issue":"13","license":[{"start":{"date-parts":[[2016,10,2]],"date-time":"2016-10-02T00:00:00Z","timestamp":1475366400000},"content-version":"vor","delay-in-days":3076,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/2.0\/uk\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2008,7,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Within a homologous protein family, proteins may be grouped into subtypes that share specific functions that are not common to the entire family. Often, the amino acids present in a small number of sequence positions determine each protein's particular functional specificity. Knowledge of these specificity determining positions (SDPs) aids in protein function prediction, drug design and experimental analysis. A number of sequence-based computational methods have been introduced for identifying SDPs; however, their further development and evaluation have been hindered by the limited number of known experimentally determined SDPs.<\/jats:p>\n               <jats:p>Results: We combine several bioinformatics resources to automate a process, typically undertaken manually, to build a dataset of SDPs. The resulting large dataset, which consists of SDPs in enzymes, enables us to characterize SDPs in terms of their physicochemical and evolutionary properties. It also facilitates the large-scale evaluation of sequence-based SDP prediction methods. We present a simple sequence-based SDP prediction method, GroupSim, and show that, surprisingly, it is competitive with a representative set of current methods. 
We also describe ConsWin, a heuristic that considers sequence conservation of neighboring amino acids, and demonstrate that it improves the performance of all methods tested on our large dataset of enzyme SDPs.<\/jats:p>\n               <jats:p>Availability: Datasets and GroupSim code are available online at http:\/\/compbio.cs.princeton.edu\/specificity\/<\/jats:p>\n               <jats:p>Contact: \u00a0msingh@cs.princeton.edu<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btn214","type":"journal-article","created":{"date-parts":[[2008,5,4]],"date-time":"2008-05-04T04:47:30Z","timestamp":1209876450000},"page":"1473-1480","source":"Crossref","is-referenced-by-count":104,"title":["Characterization and prediction of residues determining protein functional specificity"],"prefix":"10.1093","volume":"24","author":[{"given":"John A.","family":"Capra","sequence":"first","affiliation":[{"name":"Department of Computer Science, Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08540, USA"}]},{"given":"Mona","family":"Singh","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08540, USA"}]}],"member":"286","published-online":{"date-parts":[[2008,5,1]]},"reference":[{"key":"2023020210370345500_B1","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","article-title":"Basic local alignment search tool","volume":"215","author":"Altschul","year":"1990","journal-title":"J. Mol. 
Biol."},{"key":"2023020210370345500_B2","doi-asserted-by":"crossref","first-page":"304","DOI":"10.1093\/nar\/28.1.304","article-title":"The ENZYME database in 2000","volume":"28","author":"Bairoch","year":"2000","journal-title":"Nucleic Acids Res."},{"key":"2023020210370345500_B3","doi-asserted-by":"crossref","first-page":"D154","DOI":"10.1093\/nar\/gki070","article-title":"The universal protein resource (UniProt)","volume":"33","author":"Bairoch","year":"2005","journal-title":"Nucleic Acids Res."},{"key":"2023020210370345500_B4","doi-asserted-by":"crossref","first-page":"105","DOI":"10.1016\/S0022-2836(02)01036-7","article-title":"Analysis of catalytic residues in enzyme active sites","volume":"324","author":"Bartlett","year":"2002","journal-title":"J. Mol. Biol."},{"key":"2023020210370345500_B5","doi-asserted-by":"crossref","first-page":"235","DOI":"10.1093\/nar\/28.1.235","article-title":"The Protein Data Bank","volume":"28","author":"Berman","year":"2000","journal-title":"Nucleic Acids Res."},{"key":"2023020210370345500_B6","doi-asserted-by":"crossref","first-page":"e160","DOI":"10.1371\/journal.pcbi.0030160","article-title":"Automated protein subfamily identification and classification","volume":"3","author":"Brown","year":"2007","journal-title":"PLoS Comput. 
Biol."},{"key":"2023020210370345500_B7","doi-asserted-by":"crossref","first-page":"R8","DOI":"10.1186\/gb-2006-7-1-r8","article-title":"A gold standard set of mechanistically diverse enzyme superfamilies","volume":"7","author":"Brown","year":"2006","journal-title":"Genome Biol."},{"key":"2023020210370345500_B8","doi-asserted-by":"crossref","first-page":"1875","DOI":"10.1093\/bioinformatics\/btm270","article-title":"Predicting functionally important residues from sequence conservation","volume":"23","author":"Capra","year":"2007","journal-title":"Bioinformatics"},{"key":"2023020210370345500_B9","doi-asserted-by":"crossref","first-page":"171","DOI":"10.1038\/nsb0295-171","article-title":"A method to predict functional residues in proteins","volume":"2","author":"Casari","year":"1995","journal-title":"Nat. Struct. Biol."},{"key":"2023020210370345500_B10","doi-asserted-by":"crossref","first-page":"801","DOI":"10.1016\/j.jmb.2007.08.036","article-title":"Functional specificity lies within the properties and evolutionary changes of amino acids","volume":"373","author":"Chakrabarti","year":"2007","journal-title":"J. Mol. Biol."},{"key":"2023020210370345500_B11","doi-asserted-by":"crossref","first-page":"233","DOI":"10.1145\/1143844.1143874","article-title":"The relationship between precision-recall and ROC curves","volume":"23","author":"Davis","year":"2006","journal-title":"Proceedings of 23rd International Conference on Machine Learning"},{"key":"2023020210370345500_B12","doi-asserted-by":"crossref","first-page":"1289","DOI":"10.1016\/S0022-2836(02)01451-1","article-title":"Automatic methods for predicting functionally important residues","volume":"326","author":"del Sol Mesa","year":"2003","journal-title":"J. Mol. 
Biol."},{"key":"2023020210370345500_B13","doi-asserted-by":"crossref","first-page":"330","DOI":"10.1101\/gr.2821705","article-title":"Probabilistic consistency-based multiple sequence alignment","volume":"15","author":"Do","year":"2005","journal-title":"Genome Res."},{"key":"2023020210370345500_B14","doi-asserted-by":"crossref","first-page":"2629","DOI":"10.1093\/bioinformatics\/bti396","article-title":"Determining functional specificity from protein sequences","volume":"21","author":"Donald","year":"2005","journal-title":"Bioinformatics"},{"key":"2023020210370345500_B15","doi-asserted-by":"crossref","first-page":"D247","DOI":"10.1093\/nar\/gkj149","article-title":"Pfam: clans, web tools, and services","volume":"34","author":"Finn","year":"2006","journal-title":"Nucleic Acids Res."},{"key":"2023020210370345500_B16","doi-asserted-by":"crossref","first-page":"613","DOI":"10.1093\/bioinformatics\/btm626","article-title":"Prediction of protein functional residues from sequence by probability density estimation","volume":"24","author":"Fischer","year":"2008","journal-title":"Bioinformatics"},{"key":"2023020210370345500_B17","doi-asserted-by":"crossref","first-page":"12299","DOI":"10.1073\/pnas.0504833102","article-title":"Effective function annotation through catalytic residue conservation","volume":"102","author":"George","year":"2005","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023020210370345500_B18","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1006\/jmbi.2000.4036","article-title":"Analysis and prediction of functional sub-types from protein sequence alignments","volume":"303","author":"Hannenhalli","year":"2000","journal-title":"J. Mol. Biol."},{"key":"2023020210370345500_B19","doi-asserted-by":"crossref","first-page":"10915","DOI":"10.1073\/pnas.89.22.10915","article-title":"Amino acid substitution matrices from protein blocks","volume":"89","author":"Henikoff","year":"1992","journal-title":"Proc. Natl Acad. Sci. 
USA"},{"key":"2023020210370345500_B20","doi-asserted-by":"crossref","first-page":"2577","DOI":"10.1002\/bip.360221211","article-title":"Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features","volume":"22","author":"Kabsch","year":"1983","journal-title":"Biopolymers"},{"key":"2023020210370345500_B21","doi-asserted-by":"crossref","first-page":"443","DOI":"10.1110\/ps.03191704","article-title":"Automated selection of positions determining functional specificity of proteins by comparative analysis of orthologous groups in protein families","volume":"13","author":"Kalinina","year":"2003","journal-title":"Prot. Sci."},{"key":"2023020210370345500_B22","doi-asserted-by":"crossref","first-page":"17","DOI":"10.1186\/1471-2105-9-17","article-title":"Prediction of enzyme function based on 3D templates of evolutionarily important amino acids","volume":"9","author":"Kristensen","year":"2008","journal-title":"BMC Bioinformatics"},{"key":"2023020210370345500_B23","doi-asserted-by":"crossref","first-page":"105","DOI":"10.1016\/0022-2836(82)90515-0","article-title":"A simple method for displaying the hydropathic character of a protein","volume":"157","author":"Kyte","year":"1982","journal-title":"J. Mol. Biol."},{"key":"2023020210370345500_B24","doi-asserted-by":"crossref","first-page":"D266","DOI":"10.1093\/nar\/gki001","article-title":"PDBsum more: new summaries and analyses of the known 3D structures of proteins and nucleic acids","volume":"33","author":"Laskowski","year":"2005","journal-title":"Nucleic Acids Res."},{"key":"2023020210370345500_B25","doi-asserted-by":"crossref","first-page":"342","DOI":"10.1006\/jmbi.1996.0167","article-title":"An evolutionary trace method defines binding surfaces common to protein families","volume":"257","author":"Lichtarge","year":"1996","journal-title":"J. Mol. 
Biol."},{"key":"2023020210370345500_B26","first-page":"745","article-title":"Protein sequence alignments: a strategy for the hierarchical analysis of residue conservation","volume":"9","author":"Livingstone","year":"1993","journal-title":"Comput. Appl. Biosci"},{"key":"2023020210370345500_B27","doi-asserted-by":"crossref","first-page":"51","DOI":"10.1186\/1471-2105-9-51","article-title":"The contrasting properties of conservation and correlated phylogeny in protein functional residue prediction","volume":"9","author":"Manning","year":"2008","journal-title":"BMC Bioinformatics"},{"key":"2023020210370345500_B28","doi-asserted-by":"crossref","first-page":"2466","DOI":"10.1093\/bioinformatics\/btl411","article-title":"Bayesian search of functionally divergent protein subgroups and their function specific residues","volume":"22","author":"Marttinen","year":"2006","journal-title":"Bioinformatics"},{"key":"2023020210370345500_B29","doi-asserted-by":"crossref","first-page":"284","DOI":"10.1186\/1471-2105-6-284","article-title":"Linking enzyme sequence to function using conserved property difference locator to identify and annotate positions likely to control specific functionality","volume":"6","author":"Mayer","year":"2005","journal-title":"BMC Bioinformatics"},{"key":"2023020210370345500_B30","doi-asserted-by":"crossref","first-page":"1265","DOI":"10.1016\/j.jmb.2003.12.078","article-title":"A family of evolution-entropy methods for ranking protein residues by importance","volume":"336","author":"Mihalek","year":"2004","journal-title":"J. Mol. Biol."},{"key":"2023020210370345500_B31","doi-asserted-by":"crossref","first-page":"7","DOI":"10.1016\/S0022-2836(02)00587-9","article-title":"Using orthologous and paralogous proteins to identify specificity-determining residues in bacterial transcription factors","volume":"321","author":"Mirny","year":"2002","journal-title":"J. Mol. 
Biol."},{"key":"2023020210370345500_B32","doi-asserted-by":"crossref","first-page":"1440","DOI":"10.1093\/bioinformatics\/btl104","article-title":"Phylogeny-independent detection of functional residues","volume":"22","author":"Pazos","year":"2006","journal-title":"Bioinformatics"},{"key":"2023020210370345500_B33","doi-asserted-by":"crossref","first-page":"164","DOI":"10.1093\/bioinformatics\/bti766","article-title":"Prediction of functional specificity determinants from protein sequences using log-likelihood ratios","volume":"22","author":"Pei","year":"2006","journal-title":"Bioinformatics"},{"key":"2023020210370345500_B34","doi-asserted-by":"crossref","first-page":"6540","DOI":"10.1093\/nar\/gkl901","article-title":"Sequence comparison by sequence harmony identifies subtype-specific functional sites","volume":"34","author":"Pirovano","year":"2006","journal-title":"Nucleic Acids Res."},{"key":"2023020210370345500_B35","doi-asserted-by":"crossref","first-page":"D129","DOI":"10.1093\/nar\/gkh028","article-title":"The Catalytic Site Atlas: a resource of catalytic sites and residues identified in enzymes using structural data","volume":"32","author":"Porter","year":"2003","journal-title":"Nucleic Acids Res."},{"key":"2023020210370345500_B36","doi-asserted-by":"crossref","first-page":"R232","DOI":"10.1186\/gb-2007-8-11-r232","article-title":"Determinants of protein function revealed by combinatorial entropy optimization","volume":"8","author":"Reva","year":"2007","journal-title":"Genome Biol."},{"key":"2023020210370345500_B37","doi-asserted-by":"crossref","first-page":"227","DOI":"10.1002\/prot.10146","article-title":"Scoring residue conservation","volume":"48","author":"Valdar","year":"2002","journal-title":"Proteins"},{"key":"2023020210370345500_B38","doi-asserted-by":"crossref","first-page":"135","DOI":"10.1186\/1471-2105-8-135","article-title":"Supervised multivariate analysis of sequence groups to identify specificity determining 
residues","volume":"8","author":"Wallace","year":"2007","journal-title":"BMC Bioinformatics"},{"key":"2023020210370345500_B39","doi-asserted-by":"crossref","first-page":"1018","DOI":"10.1002\/prot.20899","article-title":"A two-entropies analysis to identify functional positions in the transmembrane region of class A G protein-coupled receptors","volume":"63","author":"Ye","year":"2006","journal-title":"Prot. Struct. Funct. Bioinfo."},{"key":"2023020210370345500_B40","doi-asserted-by":"crossref","first-page":"1105","DOI":"10.1016\/j.jmb.2005.08.008","article-title":"In silico discovery of enzyme-substrate specificity-determining residue clusters","volume":"352","author":"Yu","year":"2005","journal-title":"J. Mol. Biol."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/24\/13\/1473\/49052221\/bioinformatics_24_13_1473.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/24\/13\/1473\/49052221\/bioinformatics_24_13_1473.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,2]],"date-time":"2023-02-02T12:15:28Z","timestamp":1675340128000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/24\/13\/1473\/237512"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2008,5,1]]},"references-count":40,"journal-issue":{"issue":"13","published-print":{"date-parts":[[2008,7,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btn214","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2008,7,1]]},"published":{"date-parts":[[2008,5,1]]}}}