{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,2,1]],"date-time":"2024-02-01T19:00:50Z","timestamp":1706814050396},"reference-count":42,"publisher":"Springer Science and Business Media LLC","issue":"1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2009,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>The number of protein family members defined by DNA sequencing is usually much larger than those characterised experimentally. This paper describes a method to divide protein families into subtypes purely on sequence criteria. Comparison with experimental data allows an independent test of the quality of the clustering.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>An evolutionary split statistic is calculated for each column in a protein multiple sequence alignment; the statistic has a larger value when a column is better described by an evolutionary model that assumes clustering around two or more amino acids rather than a single amino acid. The user selects columns (typically the top ranked columns) to construct a motif. The motif is used to divide the family into subtypes using a stochastic optimization procedure related to the deterministic annealing EM algorithm (DAEM), which yields a specificity score showing how well each family member is assigned to a subtype. The clustering obtained is not strongly dependent on the number of amino acids chosen for the motif. The robustness of this method was demonstrated using six well characterized protein families: nucleotidyl cyclase, protein kinase, dehydrogenase, two polyketide synthase domains and small heat shock proteins. 
Phylogenetic trees did not allow accurate clustering for three of the six families.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusion<\/jats:title>\n            <jats:p>The method clustered the families into functional subtypes with an accuracy of 90 to 100%. False assignments usually had a low specificity score.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1471-2105-10-335","type":"journal-article","created":{"date-parts":[[2009,10,15]],"date-time":"2009-10-15T18:14:56Z","timestamp":1255630496000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["Clustering of protein domains for functional and evolutionary studies"],"prefix":"10.1186","volume":"10","author":[{"given":"Pavle","family":"Goldstein","sequence":"first","affiliation":[]},{"given":"Jurica","family":"Zucko","sequence":"additional","affiliation":[]},{"given":"Du\u0161ica","family":"Vujaklija","sequence":"additional","affiliation":[]},{"given":"Anita","family":"Kri\u0161ko","sequence":"additional","affiliation":[]},{"given":"Daslav","family":"Hranueli","sequence":"additional","affiliation":[]},{"given":"Paul F","family":"Long","sequence":"additional","affiliation":[]},{"given":"Catherine","family":"Etchebest","sequence":"additional","affiliation":[]},{"given":"Bojan","family":"Basrak","sequence":"additional","affiliation":[]},{"given":"John","family":"Cullum","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2009,10,15]]},"reference":[{"key":"3065_CR1","doi-asserted-by":"publisher","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","volume":"215","author":"SF Altschul","year":"1990","unstructured":"Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. 
J Mol Biol 1990, 215: 403\u2013410.","journal-title":"J Mol Biol"},{"key":"3065_CR2","doi-asserted-by":"publisher","first-page":"755","DOI":"10.1093\/bioinformatics\/14.9.755","volume":"14","author":"SR Eddy","year":"1998","unstructured":"Eddy SR: Profile hidden Markov models. Bioinformatics 1998, 14: 755\u2013763. 10.1093\/bioinformatics\/14.9.755","journal-title":"Bioinformatics"},{"key":"3065_CR3","doi-asserted-by":"publisher","first-page":"276","DOI":"10.1093\/nar\/30.1.276","volume":"30","author":"A Bateman","year":"2002","unstructured":"Bateman A, Birney E, Cerruti L, Durbin R, Etwiller L, Eddy SR, Griffiths-Jones S, Howe KL, Marshall M, Sonnhammer EL: The Pfam protein families database. Nucleic Acids Res 2002, 30: 276\u2013280. 10.1093\/nar\/30.1.276","journal-title":"Nucleic Acids Res"},{"key":"3065_CR4","doi-asserted-by":"publisher","first-page":"1697","DOI":"10.2174\/0929867054367176","volume":"12","author":"D Hranueli","year":"2005","unstructured":"Hranueli D, Cullum J, Basrak B, Goldstein P, Long PF: Plasticity of the Streptomyces genome - evolution and engineering of new antibiotics. Curr Med Chem 2005, 12: 1697\u20131704. 10.2174\/0929867054367176","journal-title":"Curr Med Chem"},{"key":"3065_CR5","doi-asserted-by":"publisher","first-page":"90","DOI":"10.1039\/B801658P","volume":"26","author":"YA Chan","year":"2009","unstructured":"Chan YA, Podevels AM, Kevany BM, Thomas MG: Biosynthesis of polyketide synthase extender units. Nat Prod Rep 2009, 26: 90\u2013114. 10.1039\/b801658p","journal-title":"Nat Prod Rep"},{"key":"3065_CR6","doi-asserted-by":"publisher","first-page":"6882","DOI":"10.1093\/nar\/gkn685","volume":"36","author":"A Starcevic","year":"2008","unstructured":"Starcevic A, Zucko J, Simunkovic J, Long PF, Cullum J, Hranueli D: ClustScan : An integrated program package for the semi-automatic annotation of modular biosynthetic gene clusters and in silico prediction of novel chemical structures. Nucleic Acids Res 2008, 36: 6882\u20136892. 
10.1093\/nar\/gkn685","journal-title":"Nucleic Acids Res"},{"key":"3065_CR7","doi-asserted-by":"publisher","first-page":"4673","DOI":"10.1093\/nar\/22.22.4673","volume":"22","author":"JD Thompson","year":"1994","unstructured":"Thompson JD, Higgins DG, Gibson TJ: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 1994, 22: 4673\u20134680. 10.1093\/nar\/22.22.4673","journal-title":"Nucleic Acids Res"},{"key":"3065_CR8","doi-asserted-by":"publisher","first-page":"654","DOI":"10.1002\/cbic.200300581","volume":"4","author":"P Caffrey","year":"2003","unstructured":"Caffrey P: Conserved amino acid residues correlating with ketoreductase stereospecificity in modular polyketide synthases. Chem Bio Chem 2003, 4: 654\u2013657.","journal-title":"Chem Bio Chem"},{"key":"3065_CR9","doi-asserted-by":"publisher","first-page":"335","DOI":"10.1016\/S0022-2836(03)00232-8","volume":"328","author":"G Yadav","year":"2003","unstructured":"Yadav G, Gokhale RS, Mohanty D: Computational approach for prediction of domain organization and substrate specificity of modular polyketide synthases. J Mol Biol 2003, 328: 335\u2013363. 10.1016\/S0022-2836(03)00232-8","journal-title":"J Mol Biol"},{"key":"3065_CR10","doi-asserted-by":"publisher","first-page":"61","DOI":"10.1006\/jmbi.2000.4036","volume":"303","author":"SS Hannenhalli","year":"2000","unstructured":"Hannenhalli SS, Russell RB: Analysis and prediction of functional sub-types from protein sequence alignments. J Mol Biol 2000, 303: 61\u201376. 10.1006\/jmbi.2000.4036","journal-title":"J Mol Biol"},{"key":"3065_CR11","doi-asserted-by":"publisher","first-page":"6540","DOI":"10.1093\/nar\/gkl901","volume":"34","author":"W Pirovano","year":"2006","unstructured":"Pirovano W, Feenstra KA, Heringa J: Sequence comparison by sequence harmony identifies subtype-specific functional sites. 
Nucleic Acids Res 2006, 34: 6540\u20136548. 10.1093\/nar\/gkl901","journal-title":"Nucleic Acids Res"},{"key":"3065_CR12","doi-asserted-by":"publisher","first-page":"1440","DOI":"10.1093\/bioinformatics\/btl104","volume":"22","author":"F Pazos","year":"2006","unstructured":"Pazos F, Rausell A, Valencia A: Phylogeny-independent detection of functional residues. Bioinformatics 2006, 22: 1440\u20131448. 10.1093\/bioinformatics\/btl104","journal-title":"Bioinformatics"},{"key":"3065_CR13","doi-asserted-by":"publisher","first-page":"135","DOI":"10.1186\/1471-2105-8-135","volume":"8","author":"IM Wallace","year":"2007","unstructured":"Wallace IM, Higgins DG: Supervised multivariate analysis of sequence groups to identify specificity determining residues. BMC Bioinformatics 2007, 8: 135. 10.1186\/1471-2105-8-135","journal-title":"BMC Bioinformatics"},{"key":"3065_CR14","doi-asserted-by":"publisher","first-page":"18","DOI":"10.1093\/bioinformatics\/btm537","volume":"24","author":"KK Ye","year":"2008","unstructured":"Ye KK, Feenstra A, Heringa J, IJzerman AP, Marchiori E: Multi-RELIEF: a method to recognize specificity determining residues from multiple sequence alignments using a machine-learning approach for feature weighting. Bioinformatics 2008, 24: 18\u201325. 10.1093\/bioinformatics\/btm537","journal-title":"Bioinformatics"},{"key":"3065_CR15","doi-asserted-by":"publisher","first-page":"4876","DOI":"10.1093\/nar\/25.24.4876","volume":"25","author":"JD Thompson","year":"1997","unstructured":"Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG: The CLUSTAL X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res 1997, 25: 4876\u20134882. 
10.1093\/nar\/25.24.4876","journal-title":"Nucleic Acids Res"},{"key":"3065_CR16","doi-asserted-by":"publisher","first-page":"10915","DOI":"10.1073\/pnas.89.22.10915","volume":"89","author":"S Henikoff","year":"1992","unstructured":"Henikoff S, Henikoff JG: Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci USA 1992, 89: 10915\u201310919. 10.1073\/pnas.89.22.10915","journal-title":"Proc Natl Acad Sci USA"},{"key":"3065_CR17","doi-asserted-by":"publisher","first-page":"D169","DOI":"10.1093\/nar\/gkn664","volume":"37","author":"The UniProt Consortium","year":"2009","unstructured":"The UniProt Consortium: The Universal Protein Resource (UniProt) 2009. Nucleic Acids Res 2009, 37: D169-D174. 10.1093\/nar\/gkn664","journal-title":"Nucleic Acids Res"},{"key":"3065_CR18","doi-asserted-by":"publisher","first-page":"444","DOI":"10.1016\/S0968-0004(97)01131-6","volume":"22","author":"CM Smith","year":"1997","unstructured":"Smith CM, Shindyalov IN, Veretnik S, Gribskov M, Taylor SS, Ten Eyck LF, Bourne PE: The protein kinase resource. Trends Biochem Sci 1997, 22: 444\u2013446. 10.1016\/S0968-0004(97)01131-6","journal-title":"Trends Biochem Sci"},{"key":"3065_CR19","doi-asserted-by":"publisher","first-page":"1541","DOI":"10.1126\/science.3201242","volume":"242","author":"HM Wilks","year":"1988","unstructured":"Wilks HM, Hart KW, Feeney R, Dunn CR, Muirhead H, Chia WN, Barstow DA, Atkinson T, Clarke AR, Holbrook JJ: A specific, highly active malate dehydrogenase by redesign of a lactate dehydrogenase framework. Science 1988, 242: 1541\u20131544. 
10.1126\/science.3201242","journal-title":"Science"},{"key":"3065_CR20","doi-asserted-by":"publisher","first-page":"246","DOI":"10.1016\/0014-5793(95)01119-Y","volume":"374","author":"SF Haydock","year":"1995","unstructured":"Haydock SF, Aparicio JF, Moln\u00e1r I, Schwecke T, Khaw LE, K\u00f6nig A, Marsden AF, Galloway IS, Staunton J, Leadlay PF: Divergent sequence motifs correlated with the substrate specificity of (methyl)malonyl-CoA:acyl carrier protein transacylase domains in modular polyketide synthases. FEBS Lett 1995, 374: 246\u2013248. 10.1016\/0014-5793(95)01119-Y","journal-title":"FEBS Lett"},{"key":"3065_CR21","doi-asserted-by":"publisher","first-page":"1643","DOI":"10.1021\/bi9820311","volume":"38","author":"J Lau","year":"1999","unstructured":"Lau J, Fu H, Cane DE, Khosla C: Dissecting the role of acyltransferase domains of modular polyketide synthases in the choice and stereochemical fate of extender units. Biochemistry 1999, 38: 1643\u20131651. 10.1021\/bi9820311","journal-title":"Biochemistry"},{"issue":"51","key":"3065_CR22","doi-asserted-by":"publisher","first-page":"15464","DOI":"10.1021\/bi015864r","volume":"40","author":"CD Reeves","year":"2001","unstructured":"Reeves CD, Murli S, Ashley GW, Piagentini M, Hutchinson CR, McDaniel R: Alteration of the substrate specificity of a modular polyketide synthase acyltransferase domain through site-specific mutations. Biochemistry 2001, 40(51):15464\u201315470. 10.1021\/bi015864r","journal-title":"Biochemistry"},{"key":"3065_CR23","doi-asserted-by":"publisher","first-page":"489","DOI":"10.1007\/s10295-003-0062-0","volume":"30","author":"F Del Vecchio","year":"2003","unstructured":"Del Vecchio F, Petkovic H, Kendrew SG, Low L, Wilkinson B, Lill R, Cort\u00e9s J, Rudd BA, Staunton J, Leadlay PF: Active-site residue, domain and module swaps in modular polyketide synthases. 
J Ind Microbiol Biotechnol 2003, 30: 489\u2013494.","journal-title":"J Ind Microbiol Biotechnol"},{"key":"3065_CR24","doi-asserted-by":"publisher","first-page":"12961","DOI":"10.1074\/jbc.270.22.12961","volume":"270","author":"L Serre","year":"1995","unstructured":"Serre L, Verbree EC, Dauter Z, Stuitje AR, Derewenda ZS: The Escherichia coli malonyl-CoA:acyl carrier protein transacylase at 1.5A resolution. Crystal structure of a FAS component. J Biol Chem 1995, 270: 12961\u201312964. 10.1074\/jbc.270.22.12961","journal-title":"J Biol Chem"},{"key":"3065_CR25","doi-asserted-by":"publisher","first-page":"13758","DOI":"10.1021\/ja0753290","volume":"129","author":"R Castonguay","year":"2007","unstructured":"Castonguay R, He W, Chen AY, Khosla C, Cane DE: Stereospecificity of ketoreductase domains of the 6-deoxyerythronolide B synthase. J Am Chem Soc 2007, 129: 13758\u201313769. 10.1021\/ja0753290","journal-title":"J Am Chem Soc"},{"key":"3065_CR26","doi-asserted-by":"publisher","first-page":"325","DOI":"10.1093\/jxb\/47.3.325","volume":"47","author":"ER Waters","year":"1996","unstructured":"Waters ER, Lee GJ, Vierling E: Evolution, structure and function of the small heat shock proteins in plants. J Exp Bot 1996, 47: 325\u2013338. 10.1093\/jxb\/47.3.325","journal-title":"J Exp Bot"},{"key":"3065_CR27","doi-asserted-by":"publisher","first-page":"1025","DOI":"10.1038\/nsb722","volume":"8","author":"RL van Montfort","year":"2001","unstructured":"van Montfort RL, Basha E, Friedrich KL, Slingsby C, Vierling E: Crystal structure and assembly of a eukaryotic small heat shock protein. Nat Struct Biol 2001, 8: 1025\u20131030. 10.1038\/nsb722","journal-title":"Nat Struct Biol"},{"key":"3065_CR28","doi-asserted-by":"publisher","first-page":"595","DOI":"10.1038\/29106","volume":"394","author":"KK Kim","year":"1998","unstructured":"Kim KK, Kim R, Kim SH: Crystal structure of a small heat-shock protein. Nature 1998, 394: 595\u2013599. 
10.1038\/29106","journal-title":"Nature"},{"key":"3065_CR29","doi-asserted-by":"publisher","first-page":"28","DOI":"10.1002\/cbic.200600399","volume":"8","author":"A Starcevic","year":"2007","unstructured":"Starcevic A, Jaspars M, Cullum J, Hranueli D, Long PF: Predicting the nature and timing of epimerisation on a modular polyketide synthase. Chem Bio Chem 2007, 8: 28\u201331.","journal-title":"Chem Bio Chem"},{"key":"3065_CR30","doi-asserted-by":"publisher","first-page":"898","DOI":"10.1016\/j.chembiol.2007.07.009","volume":"14","author":"AT Keatinge-Clay","year":"2007","unstructured":"Keatinge-Clay AT: A tylosin ketoreductase reveals how chirality is determined in polyketides. Chemistry & Biology 2007, 14: 898\u2013908. 10.1016\/j.chembiol.2007.07.009","journal-title":"Chemistry & Biology"},{"key":"3065_CR31","doi-asserted-by":"publisher","first-page":"997","DOI":"10.1089\/106652703322756195","volume":"10","author":"S Veerassamy","year":"2003","unstructured":"Veerassamy S, Smith A, Tillier ERM: A transition probability model for amino acid substitutions from blocks. J Comput Biol 2003, 10: 997\u20131010. 10.1089\/106652703322756195","journal-title":"J Comput Biol"},{"key":"3065_CR32","first-page":"275","volume":"8","author":"DT Jones","year":"1992","unstructured":"Jones DT, Taylor WR, Thornton JM: The rapid generation of mutation data matrices from protein sequences. 
Comput Appl Biosci 1992, 8: 275\u2013282.","journal-title":"Comput Appl Biosci"},{"key":"3065_CR33","unstructured":"ExPASy Proteomics Server[http:\/\/expasy.org\/]"},{"key":"3065_CR34","unstructured":"NRPS_PKS: A knowledge based resource for analysis of Non-ribosomal Peptide Synthetases and Polyketide Synthases[http:\/\/www.nii.res.in\/nrps-pks.html]"},{"issue":"Web server issu","key":"3065_CR35","doi-asserted-by":"publisher","first-page":"W405","DOI":"10.1093\/nar\/gkh359","volume":"32","author":"MZ Ansari","year":"2004","unstructured":"Ansari MZ, Yadav G, Gokhale RS, Mohanty D: NRPS-PKS: a knowledge-based resource for analysis of NRPS\/PKS megasynthases. Nucleic Acids Res 2004, 32(Web server issue):W405-W413. 10.1093\/nar\/gkh359","journal-title":"Nucleic Acids Res"},{"key":"3065_CR36","doi-asserted-by":"publisher","first-page":"140","DOI":"10.4014\/jmb.0809.554","volume":"19","author":"H Tae","year":"2009","unstructured":"Tae H, Jae KS, Park K: Development of an analysis program of Type I polyketide synthase gene clusters using homology search and profile hidden Markov model. J Microbiol Biotechnol 2009, 19: 140\u2013146. 10.4014\/jmb.0809.554","journal-title":"J Microbiol Biotechnol"},{"key":"3065_CR37","unstructured":"European Bioinformatics Institute[http:\/\/www.ebi.ac.uk]"},{"key":"3065_CR38","first-page":"164","volume":"5","author":"J Felsenstein","year":"1989","unstructured":"Felsenstein J: PHYLIP - Phylogeny Inference Package (Version 3.2). Cladistics 1989, 5: 164\u2013166.","journal-title":"Cladistics"},{"key":"3065_CR39","first-page":"345","volume":"5","author":"MO Dayhoff","year":"1978","unstructured":"Dayhoff MO, Schwartz RM, Orcutt BC: A model of evolutionary change in proteins. Atlas of Protein Sequence and Structure 1978, 5: 345\u2013352.","journal-title":"Atlas of Protein Sequence and Structure"},{"key":"3065_CR40","volume-title":"Inferring Phylogenies","author":"J Felsenstein","year":"2004","unstructured":"Felsenstein J: Inferring Phylogenies. 
Sunderland, MA: Sinauer Associates; 2004."},{"key":"3065_CR41","doi-asserted-by":"publisher","first-page":"574","DOI":"10.1016\/0022-2836(94)90032-9","volume":"243","author":"S Henikoff","year":"1994","unstructured":"Henikoff S, Henikoff JG: Position-based sequence weights. J Mol Biol 1994, 243: 574\u2013578. 10.1016\/0022-2836(94)90032-9","journal-title":"J Mol Biol"},{"key":"3065_CR42","doi-asserted-by":"publisher","first-page":"271","DOI":"10.1016\/S0893-6080(97)00133-0","volume":"2","author":"N Ueda","year":"1998","unstructured":"Ueda N, Nakano R: Deterministic Annealing EM Algorithm. Neural Networks 1998, 2: 271\u2013282. 10.1016\/S0893-6080(97)00133-0","journal-title":"Neural Networks"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-10-335.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,8,31]],"date-time":"2021-08-31T21:36:29Z","timestamp":1630445789000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-10-335"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2009,10,15]]},"references-count":42,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2009,12]]}},"alternative-id":["3065"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-10-335","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2009,10,15]]},"assertion":[{"value":"14 February 2009","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"15 October 2009","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"15 October 2009","order":3,"name":"first_online","label":"First 
Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"335"}}