{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,10,8]],"date-time":"2024-10-08T04:15:35Z","timestamp":1728360935973},"reference-count":41,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2005,1,10]],"date-time":"2005-01-10T00:00:00Z","timestamp":1105315200000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/2.0\/"},{"start":{"date-parts":[[2005,1,10]],"date-time":"2005-01-10T00:00:00Z","timestamp":1105315200000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/2.0\/"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"abstract":"<jats:title>Abstract<\/jats:title><jats:sec>\n                        <jats:title>Background<\/jats:title>\n                        <jats:p>Structure-dependent substitution matrices increase the accuracy of sequence alignments when the 3D structure of one sequence is known, and are successful e.g. in fold recognition. We propose a new automated method, EvDTree, based on a decision tree algorithm, for automatic derivation of amino acid substitution probabilities from a set of sequence-structure alignments. The main advantage over other approaches is an unbiased automatic selection of the most informative structural descriptors and associated values or thresholds. This feature allows automatic derivation of structure-dependent substitution scores for any specific set of structures, without the need to empirically determine best descriptors and parameters.<\/jats:p>\n                     <\/jats:sec><jats:sec>\n                        <jats:title>Results<\/jats:title>\n                        <jats:p>Decision trees for residue substitutions were constructed for each residue type from sequence-structure alignments extracted from the HOMSTRAD database. 
For each tree cluster, environment-dependent substitution profiles were derived. The resulting structure-dependent substitution scores were assessed using a criterion based on the mean ranking of observed substitutions among all possible substitutions and in sequence-structure alignments. The automatically built EvDTree substitution scores provide significantly better results than conventional matrices and similar or slightly better results than other structure-dependent matrices. EvDTree has been applied to small disulfide-rich proteins as a test case to automatically derive specific substitution scores that provide better results than non-specific substitution scores. Analyses of the decision tree classifications provide useful information on the relative importance of different structural descriptors.<\/jats:p>\n                     <\/jats:sec><jats:sec>\n                        <jats:title>Conclusions<\/jats:title>\n                        <jats:p>We propose a fully automatic method for the classification of structural environments and inference of structure-dependent substitution profiles. We show that this approach is more accurate than existing methods for various applications. 
The easy adaptation of EvDTree to any specific data set opens the way for class-specific structure-dependent substitution scores which can be used in threading-based remote homology searches.<\/jats:p>\n                     <\/jats:sec>","DOI":"10.1186\/1471-2105-6-4","type":"journal-article","created":{"date-parts":[[2005,1,12]],"date-time":"2005-01-12T22:25:51Z","timestamp":1105568751000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":11,"title":["EvDTree: structure-dependent substitution profiles based on decision tree classification of 3D environments"],"prefix":"10.1186","volume":"6","author":[{"given":"Jean-Christophe","family":"Gelly","sequence":"first","affiliation":[]},{"given":"Laurent","family":"Chiche","sequence":"additional","affiliation":[]},{"given":"J\u00e9r\u00f4me","family":"Gracy","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2005,1,10]]},"reference":[{"key":"329_CR1","doi-asserted-by":"publisher","first-page":"164","DOI":"10.1126\/science.1853201","volume":"253","author":"JU Bowie","year":"1991","unstructured":"Bowie JU, Luthy R, Eisenberg D: A method to identify protein sequences that fold into a known three-dimensional structure. Science 1991, 253: 164\u2013170.","journal-title":"Science"},{"key":"329_CR2","doi-asserted-by":"crossref","first-page":"126","DOI":"10.1096\/fasebj.10.1.8566533","volume":"10","author":"D Fischer","year":"1996","unstructured":"Fischer D, Rice D, Bowie JU, Eisenberg D: Assigning amino acid sequences to 3-dimensional protein folds. Faseb J 1996, 10: 126\u2013136.","journal-title":"Faseb J"},{"key":"329_CR3","doi-asserted-by":"publisher","first-page":"687","DOI":"10.1002\/pro.5560030416","volume":"3","author":"KY Zhang","year":"1994","unstructured":"Zhang KY, Eisenberg D: The three-dimensional profile method using residue preference as a continuous function of residue environment. 
Protein Sci 1994, 3: 687\u2013695.","journal-title":"Protein Sci"},{"key":"329_CR4","first-page":"25","volume-title":"Faraday Discuss","author":"D Eisenberg","year":"1992","unstructured":"Eisenberg D, Bowie JU, Luthy R, Choe S: Three-dimensional profiles for analysing protein sequence-structure relationships. Faraday Discuss 1992, 25\u201334. 10.1039\/fd9929300025"},{"key":"329_CR5","doi-asserted-by":"publisher","first-page":"83","DOI":"10.1038\/356083a0","volume":"356","author":"R Luthy","year":"1992","unstructured":"Luthy R, Bowie JU, Eisenberg D: Assessment of protein models with three-dimensional profiles. Nature 1992, 356: 83\u201385. 10.1038\/356083a0","journal-title":"Nature"},{"key":"329_CR6","doi-asserted-by":"publisher","first-page":"396","DOI":"10.1016\/S0076-6879(97)77022-8","volume":"277","author":"D Eisenberg","year":"1997","unstructured":"Eisenberg D, Luthy R, Bowie JU: VERIFY3D: assessment of protein models with three-dimensional profiles. Methods Enzymol 1997, 277: 396\u2013404.","journal-title":"Methods Enzymol"},{"key":"329_CR7","doi-asserted-by":"publisher","first-page":"821","DOI":"10.1093\/protein\/6.8.821","volume":"6","author":"J Gracy","year":"1993","unstructured":"Gracy J, Chiche L, Sallantin J: Improved alignment of weakly homologous protein sequences using structural information. Protein Eng 1993, 6: 821\u2013829.","journal-title":"Protein Eng"},{"key":"329_CR8","doi-asserted-by":"publisher","first-page":"229","DOI":"10.1016\/0959-440X(95)80081-6","volume":"5","author":"MJ Sippl","year":"1995","unstructured":"Sippl MJ: Knowledge-based potentials for proteins. Curr Opin Struct Biol 1995, 5: 229\u2013235. 10.1016\/0959-440X(95)80081-6","journal-title":"Curr Opin Struct Biol"},{"key":"329_CR9","doi-asserted-by":"publisher","first-page":"473","DOI":"10.1007\/BF02337562","volume":"7","author":"MJ Sippl","year":"1993","unstructured":"Sippl MJ: Boltzmann's principle, knowledge-based mean fields and protein folding. 
An approach to the computational determination of protein structures. J Comput Aided Mol Des 1993, 7: 473\u2013501.","journal-title":"J Comput Aided Mol Des"},{"key":"329_CR10","doi-asserted-by":"publisher","first-page":"797","DOI":"10.1006\/jmbi.1999.2583","volume":"287","author":"DT Jones","year":"1999","unstructured":"Jones DT: GenTHREADER: an efficient and reliable protein fold recognition method for genomic sequences. J Mol Biol 1999, 287: 797\u2013815. 10.1006\/jmbi.1999.2583","journal-title":"J Mol Biol"},{"key":"329_CR11","doi-asserted-by":"publisher","first-page":"387","DOI":"10.1002\/prot.340230312","volume":"23","author":"DT Jones","year":"1995","unstructured":"Jones DT, Miller RT, Thornton JM: Successful protein fold recognition by optimal sequence threading validated by rigorous blind testing. Proteins 1995, 23: 387\u2013397.","journal-title":"Proteins"},{"key":"329_CR12","doi-asserted-by":"publisher","first-page":"1598","DOI":"10.1006\/jmbi.1994.1109","volume":"235","author":"JP Kocher","year":"1994","unstructured":"Kocher JP, Rooman MJ, Wodak SJ: Factors influencing the ability of knowledge-based potentials to identify native sequence-structure matches. J Mol Biol 1994, 235: 1598\u20131613. 10.1006\/jmbi.1994.1109","journal-title":"J Mol Biol"},{"key":"329_CR13","doi-asserted-by":"publisher","first-page":"135","DOI":"10.1046\/j.1432-1327.1998.2540135.x","volume":"254","author":"M Rooman","year":"1998","unstructured":"Rooman M, Gilis D: Different derivations of knowledge-based potentials and analysis of their robustness and context-dependent predictive power. Eur J Biochem 1998, 254: 135\u2013143. 10.1046\/j.1432-1327.1998.2540135.x","journal-title":"Eur J Biochem"},{"key":"329_CR14","doi-asserted-by":"publisher","first-page":"337","DOI":"10.1002\/prot.340230308","volume":"23","author":"CM Lemer","year":"1995","unstructured":"Lemer CM, Rooman MJ, Wodak SJ: Protein structure prediction by threading methods: evaluation of current techniques. 
Proteins 1995, 23: 337\u2013355.","journal-title":"Proteins"},{"key":"329_CR15","doi-asserted-by":"publisher","first-page":"243","DOI":"10.1006\/jmbi.2001.4762","volume":"310","author":"J Shi","year":"2001","unstructured":"Shi J, Blundell TL, Mizuguchi K: FUGUE: sequence-structure homology recognition using environment-specific substitution tables and structure-dependent gap penalties. J Mol Biol 2001, 310: 243\u2013257. 10.1006\/jmbi.2001.4762","journal-title":"J Mol Biol"},{"key":"329_CR16","doi-asserted-by":"publisher","first-page":"16041","DOI":"10.1073\/pnas.252626399","volume":"99","author":"P Mallick","year":"2002","unstructured":"Mallick P, Weiss R, Eisenberg D: The directional atomic solvation energy: an atom-based potential for the assignment of protein sequences to known folds. Proc Natl Acad Sci U S A 2002, 99: 16041\u201316046. 10.1073\/pnas.252626399","journal-title":"Proc Natl Acad Sci U S A"},{"key":"329_CR17","doi-asserted-by":"publisher","first-page":"1026","DOI":"10.1006\/jmbi.1997.0924","volume":"267","author":"DW Rice","year":"1997","unstructured":"Rice DW, Eisenberg D: A 3D-1D substitution matrix for protein fold recognition that includes predicted secondary structure of the sequence. J Mol Biol 1997, 267: 1026\u20131038. 10.1006\/jmbi.1997.0924","journal-title":"J Mol Biol"},{"key":"329_CR18","doi-asserted-by":"publisher","first-page":"7","DOI":"10.1093\/protein\/10.1.7","volume":"10","author":"CM Topham","year":"1997","unstructured":"Topham CM, Srinivasan N, Blundell TL: Prediction of the stability of protein mutants based on structural environment-dependent amino acid substitution and propensity tables. Protein Eng 1997, 10: 7\u201321. 
10.1093\/protein\/10.1.7","journal-title":"Protein Eng"},{"key":"329_CR19","doi-asserted-by":"publisher","first-page":"194","DOI":"10.1006\/jmbi.1993.1018","volume":"229","author":"CM Topham","year":"1993","unstructured":"Topham CM, McLeod A, Eisenmenger F, Overington JP, Johnson MS, Blundell TL: Fragment ranking in modelling of protein structure. Conformationally constrained environmental amino acid substitution tables. J Mol Biol 1993, 229: 194\u2013220. 10.1006\/jmbi.1993.1018","journal-title":"J Mol Biol"},{"key":"329_CR20","doi-asserted-by":"publisher","first-page":"1443","DOI":"10.1126\/science.1604319","volume":"256","author":"GH Gonnet","year":"1992","unstructured":"Gonnet GH, Cohen MA, Benner SA: Exhaustive matching of the entire protein sequence database. Science 1992, 256: 1443\u20131445.","journal-title":"Science"},{"key":"329_CR21","doi-asserted-by":"publisher","first-page":"10915","DOI":"10.1073\/pnas.89.22.10915","volume":"89","author":"S Henikoff","year":"1992","unstructured":"Henikoff S, Henikoff JG: Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci U S A 1992, 89: 10915\u201310919.","journal-title":"Proc Natl Acad Sci U S A"},{"key":"329_CR22","first-page":"345","volume-title":"Atlas of Protein Sequence and Structure","author":"MO Dayhoff","year":"1978","unstructured":"Dayhoff MO, Schwartz RM, Orcutt BC: A model of evolutionary change in proteins. Matrices for detecting distant relationships. In Atlas of Protein Sequence and Structure. Volume 5. Edited by: Dayhoff MO. Washington DC, National Biomedical Research Foundation; 1978:345\u2013358 suppl. 3."},{"key":"329_CR23","doi-asserted-by":"publisher","first-page":"2469","DOI":"10.1002\/pro.5560071126","volume":"7","author":"K Mizuguchi","year":"1998","unstructured":"Mizuguchi K, Deane CM, Blundell TL, Overington JP: HOMSTRAD: a database of protein structure alignments for homologous families. 
Protein Sci 1998, 7: 2469\u20132471.","journal-title":"Protein Sci"},{"key":"329_CR24","doi-asserted-by":"publisher","first-page":"745","DOI":"10.1093\/protein\/13.11.745","volume":"13","author":"P Lackner","year":"2000","unstructured":"Lackner P, Koppensteiner WA, Sippl MJ, Domingues FS: ProSup: a refined tool for protein structure alignment. Protein Eng 2000, 13: 745\u2013752. 10.1093\/protein\/13.11.745","journal-title":"Protein Eng"},{"key":"329_CR25","first-page":"3600","volume":"22","author":"L Holm","year":"1994","unstructured":"Holm L, Sander C: The FSSP database of structurally aligned protein fold families. Nucleic Acids Res 1994, 22: 3600\u20133609.","journal-title":"Nucleic Acids Res"},{"key":"329_CR26","doi-asserted-by":"publisher","first-page":"566","DOI":"10.1002\/prot.340230412","volume":"23","author":"D Frishman","year":"1995","unstructured":"Frishman D, Argos P: Knowledge-based protein secondary structure assignment. Proteins 1995, 23: 566\u2013579.","journal-title":"Proteins"},{"key":"329_CR27","doi-asserted-by":"publisher","first-page":"617","DOI":"10.1093\/bioinformatics\/14.7.617","volume":"14","author":"K Mizuguchi","year":"1998","unstructured":"Mizuguchi K, Deane CM, Blundell TL, Johnson MS, Overington JP: JOY: protein sequence-structure representation and analysis. Bioinformatics 1998, 14: 617\u2013623. 10.1093\/bioinformatics\/14.7.617","journal-title":"Bioinformatics"},{"key":"329_CR28","volume-title":"Classification and regression trees","author":"L Breiman","year":"1984","unstructured":"Breiman L, Freidman J, Olshen R, Stone C: Classification and regression trees. Belmont, CA, Wadsworth International Group; 1984."},{"key":"329_CR29","doi-asserted-by":"publisher","first-page":"379","DOI":"10.1002\/j.1538-7305.1948.tb01338.x","volume":"27","author":"CE Shannon","year":"1948","unstructured":"Shannon CE: A mathematical theory of communication. 
Bell System Technical Journal 1948, 27: 379\u2013423 and 623\u2013656.","journal-title":"Bell System Technical Journal"},{"key":"329_CR30","doi-asserted-by":"publisher","first-page":"476","DOI":"10.1109\/34.589207","volume":"19","author":"F Esposito","year":"1997","unstructured":"Esposito F, Malerba D, Semeraro G, Kay J: A comparative analysis of methods for pruning decision trees. Pattern Analysis and Machine Intelligence IEEE Transactions 1997, 19: 476\u2013491. 10.1109\/34.589207","journal-title":"Pattern Analysis and Machine Intelligence IEEE Transactions"},{"key":"329_CR31","doi-asserted-by":"publisher","first-page":"339","DOI":"10.1093\/protein\/10.4.339","volume":"10","author":"M Ota","year":"1997","unstructured":"Ota M, Nishikawa K: Assessment of pseudo-energy potentials by the best-five test: a new use of the three-dimensional profiles of proteins. Protein Eng 1997, 10: 339\u2013351. 10.1093\/protein\/10.4.339","journal-title":"Protein Eng"},{"key":"329_CR32","doi-asserted-by":"publisher","first-page":"276","DOI":"10.1006\/jmbi.1997.1237","volume":"272","author":"D Gilis","year":"1997","unstructured":"Gilis D, Rooman M: Predicting protein stability changes upon mutation using database-derived potentials: solvent accessibility determines the importance of local versus non-local interactions along the sequence. J Mol Biol 1997, 272: 276\u2013290. 10.1006\/jmbi.1997.1237","journal-title":"J Mol Biol"},{"key":"329_CR33","doi-asserted-by":"publisher","first-page":"132","DOI":"10.1098\/rspb.1990.0077","volume":"241","author":"J Overington","year":"1990","unstructured":"Overington J, Johnson MS, Sali A, Blundell TL: Tertiary structural constraints on protein evolutionary diversity: templates, key residues and structure prediction. 
Proc R Soc Lond B Biol Sci 1990, 241: 132\u2013145.","journal-title":"Proc R Soc Lond B Biol Sci"},{"key":"329_CR34","doi-asserted-by":"publisher","first-page":"493","DOI":"10.1002\/prot.10231","volume":"49","author":"A Marin","year":"2002","unstructured":"Marin A, Pothier J, Zimmermann K, Gibrat JF: FROST: a filter-based fold recognition method. Proteins 2002, 49: 493\u2013509. 10.1002\/prot.10231","journal-title":"Proteins"},{"key":"329_CR35","doi-asserted-by":"publisher","first-page":"545","DOI":"10.1093\/protein\/13.8.545","volume":"13","author":"A Prlic","year":"2000","unstructured":"Prlic A, Domingues FS, Sippl MJ: Structure-derived substitution matrices for alignment of distantly related sequences. Protein Eng 2000, 13: 545\u2013550. 10.1093\/protein\/13.8.545","journal-title":"Protein Eng"},{"key":"329_CR36","doi-asserted-by":"publisher","first-page":"482","DOI":"10.1002\/1097-0134(20000815)40:3<482::AID-PROT150>3.0.CO;2-5","volume":"40","author":"MA Marti-Renom","year":"2000","unstructured":"Marti-Renom MA, Stote RH, Querol E, Aviles FX, Karplus M: Structures of scrambled disulfide forms of the potato carboxypeptidase inhibitor predicted by molecular dynamics simulations with constraints. Proteins 2000, 40: 482\u2013493.","journal-title":"Proteins"},{"key":"329_CR37","doi-asserted-by":"publisher","first-page":"D156","DOI":"10.1093\/nar\/gkh015","volume":"32 Database iss","author":"JC Gelly","year":"2004","unstructured":"Gelly JC, Gracy J, Kaas Q, Le-Nguyen D, Heitz A, Chiche L: The KNOTTIN website and database: a new information system dedicated to the knottin scaffold. Nucleic Acids Res 2004, 32 Database issue: D156\u20139. 10.1093\/nar\/gkh015","journal-title":"Nucleic Acids Res"},{"key":"329_CR38","doi-asserted-by":"publisher","first-page":"8606","DOI":"10.1074\/jbc.M211147200","volume":"278","author":"KJ Rosengren","year":"2003","unstructured":"Rosengren KJ, Daly NL, Plan MR, Waine C, Craik DJ: Twists, knots, and rings in proteins. 
Structural definition of the cyclotide framework. J Biol Chem 2003, 278: 8606\u20138616. 10.1074\/jbc.M211147200","journal-title":"J Biol Chem"},{"key":"329_CR39","doi-asserted-by":"crossref","first-page":"431","DOI":"10.18388\/abp.1996_4475","volume":"43","author":"J Otlewski","year":"1996","unstructured":"Otlewski J, Krowarsch D: Squash inhibitor family of serine proteinases. Acta Biochim Pol 1996, 43: 431\u2013444.","journal-title":"Acta Biochim Pol"},{"key":"329_CR40","doi-asserted-by":"publisher","first-page":"847","DOI":"10.1093\/bioinformatics\/btg492","volume":"20","author":"RB Vilim","year":"2004","unstructured":"Vilim RB, Cunningham RM, Lu B, Kheradpour P, Stevens FJ: Fold-specific substitution matrices for protein classification. Bioinformatics 2004, 20: 847\u2013853. 10.1093\/bioinformatics\/btg492","journal-title":"Bioinformatics"},{"key":"329_CR41","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/BF00871744","volume":"2","author":"SK Murthy","year":"1994","unstructured":"Murthy SK, Kasif S, Salzberg S: A System for Induction of Oblique Decision Trees. Journal of Artificial Intelligence Research 1994, 2: 1\u201332. 
10.1007\/BF00871744","journal-title":"Journal of Artificial Intelligence Research"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-6-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/1471-2105-6-4\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-6-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,10,7]],"date-time":"2024-10-07T12:27:52Z","timestamp":1728304072000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-6-4"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2005,1,10]]},"references-count":41,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2005,12]]}},"alternative-id":["329"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-6-4","relation":{},"ISSN":["1471-2105"],"issn-type":[{"type":"electronic","value":"1471-2105"}],"subject":[],"published":{"date-parts":[[2005,1,10]]},"assertion":[{"value":"22 October 2004","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"10 January 2005","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"10 January 2005","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"4"}}