{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,17]],"date-time":"2026-02-17T18:42:33Z","timestamp":1771353753103,"version":"3.50.1"},"reference-count":48,"publisher":"Oxford University Press (OUP)","issue":"24","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2010,12,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Current computational approaches to function prediction are mostly based on protein sequence classification and transfer of annotation from known proteins to their closest homologous sequences relying on the orthology concept of function conservation. This approach suffers a major weakness: annotation reliability depends on global sequence similarity to known proteins and is poorly efficient for enzyme superfamilies that catalyze different reactions. Structural biology offers a different strategy to overcome the problem of annotation by adding information about protein 3D structures. This information can be used to identify amino acids located in active sites, focusing on detection of functional polymorphisms residues in an enzyme superfamily. Structural genomics programs are providing more and more novel protein structures at a high-throughput rate. However, there is still a huge gap between the number of sequences and available structures. Computational methods, such as homology modeling provides reliable approaches to bridge this gap and could be a new precise tool to annotate protein functions.<\/jats:p>\n               <jats:p>Results: Here, we present Active Sites Modeling and Clustering (ASMC) method, a novel unsupervised method to classify sequences using structural information of protein pockets. ASMC combines homology modeling of family members, structural alignment of modeled active sites and a subsequent hierarchical conceptual classification. 
Comparison of profiles obtained from computed clusters allows the identification of residues correlated to subfamily function divergence, called specificity determining positions. ASMC method has been validated on a benchmark of 42 Pfam families for which previous resolved holo-structures were available. ASMC was also applied to several families containing known protein structures and comprehensive functional annotations. We will discuss how ASMC improves annotation and understanding of protein families functions by giving some specific illustrative examples on nucleotidyl cyclases, protein kinases and serine proteases.<\/jats:p>\n               <jats:p>Availability: \u00a0http:\/\/www.genoscope.fr\/ASMC\/.<\/jats:p>\n               <jats:p>Contact: \u00a0raquelcm@dcc.ufmg.br; kbastard@genoscope.cns.fr; artigue@genoscope.cns.fr<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btq595","type":"journal-article","created":{"date-parts":[[2010,10,28]],"date-time":"2010-10-28T00:19:07Z","timestamp":1288225147000},"page":"3075-3082","source":"Crossref","is-referenced-by-count":38,"title":["Identification of subfamily-specific sites based on active sites modeling and clustering"],"prefix":"10.1093","volume":"26","author":[{"given":"Raquel C.","family":"de Melo-Minardi","sequence":"first","affiliation":[{"name":"1 Department of Computer Science, Federal University of Minas Gerais, Belo Horizonte, Brazil, 2Genoscope, Institut de G\u00e9nomique, Comissariat \u00e0 l'\u00e9nergie atomique et aux \u00e9nergies alternatives, Evry cedex, 3UMR 8030, Cenre National de la Recherche Scientifique, Evry cedex and 4Universit\u00e9 Evry Val d'Essonne, Evry F-91057, France"}]},{"given":"Karine","family":"Bastard","sequence":"additional","affiliation":[{"name":"1 Department of Computer Science, Federal University of Minas Gerais, Belo Horizonte, Brazil, 2Genoscope, 
Institut de G\u00e9nomique, Comissariat \u00e0 l'\u00e9nergie atomique et aux \u00e9nergies alternatives, Evry cedex, 3UMR 8030, Cenre National de la Recherche Scientifique, Evry cedex and 4Universit\u00e9 Evry Val d'Essonne, Evry F-91057, France"},{"name":"1 Department of Computer Science, Federal University of Minas Gerais, Belo Horizonte, Brazil, 2Genoscope, Institut de G\u00e9nomique, Comissariat \u00e0 l'\u00e9nergie atomique et aux \u00e9nergies alternatives, Evry cedex, 3UMR 8030, Cenre National de la Recherche Scientifique, Evry cedex and 4Universit\u00e9 Evry Val d'Essonne, Evry F-91057, France"},{"name":"1 Department of Computer Science, Federal University of Minas Gerais, Belo Horizonte, Brazil, 2Genoscope, Institut de G\u00e9nomique, Comissariat \u00e0 l'\u00e9nergie atomique et aux \u00e9nergies alternatives, Evry cedex, 3UMR 8030, Cenre National de la Recherche Scientifique, Evry cedex and 4Universit\u00e9 Evry Val d'Essonne, Evry F-91057, France"}]},{"given":"Fran\u00e7ois","family":"Artiguenave","sequence":"additional","affiliation":[{"name":"1 Department of Computer Science, Federal University of Minas Gerais, Belo Horizonte, Brazil, 2Genoscope, Institut de G\u00e9nomique, Comissariat \u00e0 l'\u00e9nergie atomique et aux \u00e9nergies alternatives, Evry cedex, 3UMR 8030, Cenre National de la Recherche Scientifique, Evry cedex and 4Universit\u00e9 Evry Val d'Essonne, Evry F-91057, France"},{"name":"1 Department of Computer Science, Federal University of Minas Gerais, Belo Horizonte, Brazil, 2Genoscope, Institut de G\u00e9nomique, Comissariat \u00e0 l'\u00e9nergie atomique et aux \u00e9nergies alternatives, Evry cedex, 3UMR 8030, Cenre National de la Recherche Scientifique, Evry cedex and 4Universit\u00e9 Evry Val d'Essonne, Evry F-91057, France"},{"name":"1 Department of Computer Science, Federal University of Minas Gerais, Belo Horizonte, Brazil, 2Genoscope, Institut de G\u00e9nomique, Comissariat \u00e0 l'\u00e9nergie atomique et aux 
\u00e9nergies alternatives, Evry cedex, 3UMR 8030, Cenre National de la Recherche Scientifique, Evry cedex and 4Universit\u00e9 Evry Val d'Essonne, Evry F-91057, France"}]}],"member":"286","published-online":{"date-parts":[[2010,10,26]]},"reference":[{"key":"2023012508031588400_B1","doi-asserted-by":"crossref","first-page":"1473","DOI":"10.1093\/bioinformatics\/btn214","article-title":"Characterization and prediction of residues determining protein functional specificity","volume":"24","author":"Capra","year":"2008","journal-title":"Bioinformatics"},{"key":"2023012508031588400_B2","doi-asserted-by":"crossref","first-page":"231","DOI":"10.1002\/prot.22239","article-title":"Coevolution in defining the functional specificity","volume":"75","author":"Chakrabarti","year":"2009","journal-title":"Proteins"},{"key":"2023012508031588400_B3","doi-asserted-by":"crossref","first-page":"801","DOI":"10.1016\/j.jmb.2007.08.036","article-title":"Functional specificity lies within the properties and evolutionary changes of amino acids","volume":"373","author":"Chakrabarti","year":"2007","journal-title":"J. Mol. Biol."},{"key":"2023012508031588400_B4","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1186\/1472-6807-8-31","article-title":"Systematic analysis of the effect of multiple templates on the accuracy of comparative models of protein structure","volume":"8","author":"Chakravarty","year":"2008","journal-title":"BMC Struct. 
Biol."},{"key":"2023012508031588400_B5","doi-asserted-by":"crossref","first-page":"1188","DOI":"10.1101\/gr.849004","article-title":"WebLogo: a sequence logo generator","volume":"14","author":"Crooks","year":"2004","journal-title":"Genome Res."},{"key":"2023012508031588400_B6","doi-asserted-by":"crossref","first-page":"2629","DOI":"10.1093\/bioinformatics\/bti396","article-title":"Determining functional specificity from protein sequences","volume":"21","author":"Donald","year":"2005","journal-title":"Bioinformatics"},{"key":"2023012508031588400_B7","doi-asserted-by":"crossref","first-page":"D191","DOI":"10.1093\/nar\/gkn716","article-title":"SDR: a database of predicted specificity-determining residues in proteins","volume":"37","author":"Donald","year":"2009","journal-title":"Nucleic Acids Res."},{"key":"2023012508031588400_B8","doi-asserted-by":"crossref","DOI":"10.1002\/0471250953.bi0506s15","article-title":"Comparative protein structure modeling using modeller","author":"Eswar","year":"2006","journal-title":"Curr. Protoc. Bioinformatics"},{"key":"2023012508031588400_B9","doi-asserted-by":"crossref","first-page":"145","DOI":"10.1007\/978-1-60327-058-8_8","article-title":"Protein structure modelling with Modeller","volume":"426","author":"Eswar","year":"2008","journal-title":"Methods Mol. Biol."},{"key":"2023012508031588400_B10","doi-asserted-by":"crossref","first-page":"D281","DOI":"10.1093\/nar\/gkm960","article-title":"The Pfam protein families database","volume":"36","author":"Finn","year":"2008","journal-title":"Nucleic Acids Res."},{"key":"2023012508031588400_B11","doi-asserted-by":"crossref","first-page":"139","DOI":"10.1007\/BF00114265","article-title":"Knowledge acquisition via incremental conceptual clustering","volume":"2","author":"Fisher","year":"1987","journal-title":"Mach. 
Learn."},{"key":"2023012508031588400_B12","doi-asserted-by":"crossref","first-page":"D323","DOI":"10.1093\/nar\/gkn822","article-title":"The ConSurf-DB: pre-calculated evolutionary conservation profiles of protein structures","volume":"37","author":"Goldenberg","year":"2009","journal-title":"Nucleic Acids Res."},{"key":"2023012508031588400_B13","doi-asserted-by":"crossref","first-page":"e1000179","DOI":"10.1371\/journal.pcbi.1000179","article-title":"Discarding functional residues from the substitution table improves prediction of active sites within three-dimensional structures","volume":"4","author":"Gong","year":"2008","journal-title":"PLoS Comput. Biol."},{"key":"2023012508031588400_B14","doi-asserted-by":"crossref","first-page":"774","DOI":"10.1016\/j.cell.2009.07.038","article-title":"Protein sectors: evolutionary units of three-dimensional structure","volume":"138","author":"Halabi","year":"2009","journal-title":"Cell"},{"key":"2023012508031588400_B15","doi-asserted-by":"crossref","first-page":"42","DOI":"10.1126\/science.3291115","article-title":"The protein kinase family: conserved features and deduced phylogeny of the catalytic domains","volume":"241","author":"Hanks","year":"1988","journal-title":"Science"},{"key":"2023012508031588400_B16","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1006\/jmbi.2000.4036","article-title":"Analysis and prediction of functional sub-types from protein sequence alignments","volume":"303","author":"Hannenhalli","year":"2000","journal-title":"J. Mol. Biol."},{"key":"2023012508031588400_B17","doi-asserted-by":"crossref","first-page":"8757","DOI":"10.1021\/bi00195a017","article-title":"Converting trypsin to chymotrypsin: residue 172 is a substrate specificity determinant","volume":"33","author":"Hedstrom","year":"1994","journal-title":"Biochemistry"},{"issue":"Suppl. 
4","key":"2023012508031588400_B18","doi-asserted-by":"crossref","first-page":"S5","DOI":"10.1186\/1471-2105-8-S4-S5","article-title":"Using structural motif descriptors for sequence-based binding site prediction","volume":"8","author":"Henschel","year":"2007","journal-title":"BMC Bioinformatics"},{"key":"2023012508031588400_B19","first-page":"357","article-title":"Weka: a machine learning workbench","volume-title":"Proceedings of the Second Australia and New Zealand Conference on Intelligent Information Systems.","author":"Holmes","year":"1994"},{"key":"2023012508031588400_B20","doi-asserted-by":"crossref","first-page":"W424","DOI":"10.1093\/nar\/gkh391","article-title":"SDPred: a tool for prediction of amino acid residues that determine differences in functional specificity of homologous proteins","volume":"32","author":"Kalinina","year":"2004","journal-title":"Nucleic Acids Res."},{"key":"2023012508031588400_B21","doi-asserted-by":"crossref","first-page":"174","DOI":"10.1186\/1471-2105-10-174","article-title":"Combining specificity determining and conserved residues improves functional site prediction","volume":"10","author":"Kalinina","year":"2009","journal-title":"BMC Bioinformatics"},{"key":"2023012508031588400_B22","doi-asserted-by":"crossref","first-page":"511","DOI":"10.1093\/nar\/gki198","article-title":"Mafft version 5: improvement in accuracy of multiple sequence alignment","volume":"33","author":"Katoh","year":"2005","journal-title":"Nucleic Acids Res."},{"key":"2023012508031588400_B23","doi-asserted-by":"crossref","first-page":"471","DOI":"10.1146\/annurev-biochem-030409-143718","article-title":"Enzyme promiscuity: a mechanistic and evolutionary perspective","volume":"79","author":"Khersonsky","year":"2010","journal-title":"Annu. Rev. 
Biochem."},{"key":"2023012508031588400_B24","doi-asserted-by":"crossref","first-page":"17","DOI":"10.1186\/1471-2105-9-17","article-title":"Prediction of enzyme function based on 3D templates of evolutionarily important amino acids","volume":"9","author":"Kristensen","year":"2008","journal-title":"BMC Bioinformatics"},{"key":"2023012508031588400_B25","doi-asserted-by":"crossref","first-page":"1487","DOI":"10.1006\/jmbi.2001.4540","article-title":"Three-dimensional cluster analysis identifies interfaces and functional residue clusters in proteins","volume":"307","author":"Landgraf","year":"2001","journal-title":"J. Mol. Biol."},{"key":"2023012508031588400_B26","doi-asserted-by":"crossref","first-page":"168","DOI":"10.1186\/1471-2105-10-168","article-title":"Fpocket: an open source platform for ligand pocket detection","volume":"10","author":"Le Guilloux","year":"2009","journal-title":"BMC Bioinformatics"},{"key":"2023012508031588400_B27","doi-asserted-by":"crossref","first-page":"342","DOI":"10.1006\/jmbi.1996.0167","article-title":"An evolutionary trace method defines binding surfaces common to protein families","volume":"257","author":"Lichtarge","year":"1996","journal-title":"J. Mol. Biol."},{"key":"2023012508031588400_B28","doi-asserted-by":"crossref","first-page":"139","DOI":"10.1006\/jmbi.2001.5327","article-title":"Structural clusters of evolutionary trace residues are statistically significant and common in proteins","volume":"316","author":"Madabushi","year":"2002","journal-title":"J. Mol. Biol."},{"key":"2023012508031588400_B29","doi-asserted-by":"crossref","first-page":"569","DOI":"10.1093\/protein\/gzp040","article-title":"Alignment of multiple protein structures based on sequence and structure features","volume":"22","author":"Madhusudhan","year":"2009","journal-title":"Protein Eng. Des. 
Sel."},{"key":"2023012508031588400_B30","doi-asserted-by":"crossref","first-page":"285","DOI":"10.1016\/j.sbi.2005.05.011","article-title":"A decade of CASP: progress, bottlenecks and prognosis in protein structure prediction","volume":"15","author":"Moult","year":"2005","journal-title":"Curr. Opin. Struct. Biol."},{"key":"2023012508031588400_B31","doi-asserted-by":"crossref","first-page":"2369","DOI":"10.1002\/prot.22750","article-title":"Relationship between functional subclasses and information contained in active-site and ligand-binding residues in diverse superfamilies","volume":"78","author":"Nagao","year":"2010","journal-title":"Proteins"},{"key":"2023012508031588400_B32","doi-asserted-by":"crossref","first-page":"i105","DOI":"10.1093\/bioinformatics\/btn263","article-title":"Detection of 3d atomic similarities and their use in the discrimination of small molecule protein-binding sites","volume":"26","author":"Najmanovich","year":"2008","journal-title":"Bioinformatics"},{"key":"2023012508031588400_B33","doi-asserted-by":"crossref","first-page":"1093","DOI":"10.1016\/S0969-2126(97)00260-8","article-title":"CATH: a hierarchic database of protein domain structures","volume":"5","author":"Orengo","year":"1997","journal-title":"Structure"},{"key":"2023012508031588400_B34","doi-asserted-by":"crossref","first-page":"1440","DOI":"10.1093\/bioinformatics\/btl104","article-title":"Phylogeny-independent detection of functional residues","volume":"22","author":"Pazos","year":"2006","journal-title":"Bioinformatics"},{"key":"2023012508031588400_B35","doi-asserted-by":"crossref","first-page":"164","DOI":"10.1093\/bioinformatics\/bti766","article-title":"Prediction of functional specificity determinants from protein sequences using log-likelihood ratios","volume":"22","author":"Pei","year":"2006","journal-title":"Bioinformatics"},{"issue":"Suppl. 
1","key":"2023012508031588400_B36","doi-asserted-by":"crossref","first-page":"S71","DOI":"10.1093\/bioinformatics\/18.suppl_1.S71","article-title":"Rate4Site: an algorithmic tool for the identification of functional regions in proteins by surface mapping of evolutionary determinants with their homologues","volume":"18","author":"Pupko","year":"2002","journal-title":"Bioinformatics"},{"key":"2023012508031588400_B37","doi-asserted-by":"crossref","first-page":"1995","DOI":"10.1073\/pnas.0908044107","article-title":"Protein interactions and ligand binding: from protein subfamilies to functional specificity","volume":"107","author":"Rausell","year":"2010","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012508031588400_B38","doi-asserted-by":"crossref","first-page":"e1000485","DOI":"10.1371\/journal.pcbi.1000485","article-title":"FLORA: a novel method to predict protein function from structure diverse superfamilies","volume":"5","author":"Redfern","year":"2009","journal-title":"PLoS Comput. Biol."},{"key":"2023012508031588400_B39","doi-asserted-by":"crossref","first-page":"e1000636","DOI":"10.1371\/journal.pcbi.1000636","article-title":"Combining structure and sequence information allows automated prediction of substrate specificities within enzyme families","volume":"6","author":"Rottig","year":"2010","journal-title":"PLoS Comput. Biol."},{"key":"2023012508031588400_B40","doi-asserted-by":"crossref","first-page":"143","DOI":"10.1002\/prot.10628","article-title":"A method for simultaneous alignment of multiple protein structures","volume":"56","author":"Shatsky","year":"2004","journal-title":"Proteins"},{"key":"2023012508031588400_B41","doi-asserted-by":"crossref","first-page":"1289","DOI":"10.1016\/S0022-2836(02)01451-1","article-title":"Automatic methods for predicting functionally important residues","volume":"326","author":"Sol","year":"2003","journal-title":"J. Mol. 
Biol."},{"key":"2023012508031588400_B42","doi-asserted-by":"crossref","first-page":"405","DOI":"10.1002\/(SICI)1097-0134(199707)28:3<405::AID-PROT10>3.0.CO;2-L","article-title":"Pfam: a comprehensive database of protein families based on seed alignments","volume":"28","author":"Sonnhammer","year":"1997","journal-title":"Proteins"},{"issue":"Suppl. 6","key":"2023012508031588400_B43","first-page":"652","article-title":"Assessment of homology-based predictions in CASP5","volume":"53","author":"Tramontano","year":"2003","journal-title":"Proteins"},{"key":"2023012508031588400_B44","doi-asserted-by":"crossref","first-page":"451","DOI":"10.1016\/j.jmb.2008.12.072","article-title":"Predicting protein function and binding profile via matching of local evolutionary and geometric surface patterns","volume":"387","author":"Tseng","year":"2009","journal-title":"J. Mol. Biol."},{"key":"2023012508031588400_B45","doi-asserted-by":"crossref","first-page":"5993","DOI":"10.1073\/pnas.95.11.5993","article-title":"Two amino acid substitutions convert a guanylyl cyclase, RetGC-1 into an adenylyl cyclase","volume":"95","author":"Tucker","year":"1998","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012508031588400_B46","doi-asserted-by":"crossref","first-page":"1426","DOI":"10.1093\/bioinformatics\/btp160","article-title":"Evolutionary trace annotation server: automated enzyme function prediction in protein structures with 3D templates","volume":"25","author":"Ward","year":"2009","journal-title":"Bioinformatics"},{"key":"2023012508031588400_B47","doi-asserted-by":"crossref","first-page":"691","DOI":"10.1093\/oxfordjournals.molbev.a003851","article-title":"A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach","volume":"18","author":"Whelan","year":"2001","journal-title":"Mol. Biol. 
Evol."},{"key":"2023012508031588400_B48","doi-asserted-by":"crossref","first-page":"1105","DOI":"10.1016\/j.jmb.2005.08.008","article-title":"In silico discovery of enzyme-substrate specificity-determining residue clusters","volume":"352","author":"Yu","year":"2005","journal-title":"J. Mol. Biol."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/26\/24\/3075\/48854685\/bioinformatics_26_24_3075.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/26\/24\/3075\/48854685\/bioinformatics_26_24_3075.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,25]],"date-time":"2023-01-25T08:05:45Z","timestamp":1674633945000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/26\/24\/3075\/289453"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010,10,26]]},"references-count":48,"journal-issue":{"issue":"24","published-print":{"date-parts":[[2010,12,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btq595","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2010,12,15]]},"published":{"date-parts":[[2010,10,26]]}}}