{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,10]],"date-time":"2026-01-10T03:01:11Z","timestamp":1768014071239,"version":"3.49.0"},"reference-count":38,"publisher":"Springer Science and Business Media LLC","issue":"1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2006,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>Understanding the molecular details of protein-DNA interactions is critical for deciphering the mechanisms of gene regulation. We present a machine learning approach for the identification of amino acid residues involved in protein-DNA interactions.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>We start with a Na\u00efve Bayes classifier trained to predict whether a given amino acid residue is a DNA-binding residue based on its identity and the identities of its sequence neighbors. The input to the classifier consists of the identities of the target residue and 4 sequence neighbors on each side of the target residue. The classifier is trained and evaluated (using leave-one-out cross-validation) on a non-redundant set of 171 proteins. Our results indicate the feasibility of identifying interface residues based on local sequence information. The classifier achieves 71% overall accuracy with a correlation coefficient of 0.24, 35% specificity and 53% sensitivity in identifying interface residues as evaluated by leave-one-out cross-validation. We show that the performance of the classifier is improved by using sequence entropy of the target residue (the entropy of the corresponding column in multiple alignment obtained by aligning the target sequence with its sequence homologs) as additional input. The classifier achieves 78% overall accuracy with a correlation coefficient of 0.28, 44% specificity and 41% sensitivity in identifying interface residues. Examination of the predictions in the context of 3-dimensional structures of proteins demonstrates the effectiveness of this method in identifying DNA-binding sites from sequence information. In 33% (56 out of 171) of the proteins, the classifier identifies the interaction sites by correctly recognizing at least half of the interface residues. In 87% (149 out of 171) of the proteins, the classifier correctly identifies at least 20% of the interface residues. This suggests the possibility of using such classifiers to identify potential DNA-binding motifs and to gain potentially useful insights into sequence correlates of protein-DNA interactions.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusion<\/jats:title>\n            <jats:p>Na\u00efve Bayes classifiers trained to identify DNA-binding residues using sequence information offer a computationally efficient approach to identifying putative DNA-binding sites in DNA-binding proteins and recognizing potential DNA-binding motifs.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1471-2105-7-262","type":"journal-article","created":{"date-parts":[[2006,5,20]],"date-time":"2006-05-20T07:33:00Z","timestamp":1148110380000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":123,"title":["Predicting DNA-binding sites of proteins from amino acid sequence"],"prefix":"10.1186","volume":"7","author":[{"given":"Changhui","family":"Yan","sequence":"first","affiliation":[]},{"given":"Michael","family":"Terribilini","sequence":"additional","affiliation":[]},{"given":"Feihong","family":"Wu","sequence":"additional","affiliation":[]},{"given":"Robert L","family":"Jernigan","sequence":"additional","affiliation":[]},{"given":"Drena","family":"Dobbs","sequence":"additional","affiliation":[]},{"given":"Vasant","family":"Honavar","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2006,5,19]]},"reference":[{"key":"1001_CR1","doi-asserted-by":"publisher","first-page":"691","DOI":"10.2174\/0929867053202197","volume":"12","author":"D Ghosh","year":"2005","unstructured":"Ghosh D, Papavassiliou AG: Transcription factor therapeutics: long-shot or lodestone. Curr Med Chem 2005, 12: 691\u2013701.","journal-title":"Curr Med Chem"},{"key":"1001_CR2","doi-asserted-by":"publisher","first-page":"1361","DOI":"10.1124\/mol.104.002758","volume":"66","author":"P Blancafort","year":"2004","unstructured":"Blancafort P, Segal DJ, Barbas CFIII: Designing transcription factor architectures for drug discovery. Mol Pharmacol 2004, 66: 1361\u20131371. 10.1124\/mol.104.002758","journal-title":"Mol Pharmacol"},{"key":"1001_CR3","doi-asserted-by":"publisher","first-page":"1053","DOI":"10.1146\/annurev.bi.61.070192.005201","volume":"61","author":"CO Pabo","year":"1992","unstructured":"Pabo CO, Sauer RT: Transcription factors: structural families and principles of DNA recognition. Annu Rev Biochem 1992, 61: 1053\u20131095. 10.1146\/annurev.bi.61.070192.005201","journal-title":"Annu Rev Biochem"},{"key":"1001_CR4","doi-asserted-by":"publisher","first-page":"39","DOI":"10.1016\/S0959-440X(00)00167-6","volume":"11","author":"JH Laity","year":"2001","unstructured":"Laity JH, Lee BM, Wright PE: Zinc finger proteins: new insights into structural and functional diversity. Current Opinion in Structural Biology 2001, 11: 39\u201346. 10.1016\/S0959-440X(00)00167-6","journal-title":"Current Opinion in Structural Biology"},{"key":"1001_CR5","doi-asserted-by":"publisher","first-page":"10","DOI":"10.1016\/j.sbi.2004.01.012","volume":"14","author":"CL Lawson","year":"2004","unstructured":"Lawson CL, Swigon D, Murakami KS, Darst SA, Berman HM, Ebright RH: Catabolite activator protein: DNA binding and transcription activation. Current Opinion in Structural Biology 2004, 14: 10\u201320. 10.1016\/j.sbi.2004.01.012","journal-title":"Current Opinion in Structural Biology"},{"key":"1001_CR6","doi-asserted-by":"publisher","first-page":"26","DOI":"10.1016\/S0959-440X(00)00163-9","volume":"11","author":"CW Muller","year":"2001","unstructured":"Muller CW: Transcription factors: global and detailed views. Current Opinion in Structural Biology 2001, 11: 26\u201332. 10.1016\/S0959-440X(00)00163-9","journal-title":"Current Opinion in Structural Biology"},{"key":"1001_CR7","doi-asserted-by":"publisher","first-page":"263","DOI":"10.1002\/prot.20297","volume":"58","author":"M Radlinska","year":"2005","unstructured":"Radlinska M, Kondrzycka-Dada A, Piekarowicz A, Bujnicki JM: Identification of amino acids important for target recognition by the DNA:m5C methyltransferase M.NgoPII by alanine-scanning mutagenesis of residues at the protein-DNA interface. Proteins 2005, 58: 263\u2013270. 10.1002\/prot.20297","journal-title":"Proteins"},{"key":"1001_CR8","doi-asserted-by":"publisher","first-page":"237","DOI":"10.1016\/S0022-2836(02)00782-9","volume":"322","author":"KL Griffith","year":"2002","unstructured":"Griffith KL, Wolf JRE: A comprehensive alanine scanning mutagenesis of the Escherichia coli transcriptional activator SoxS: identifying amino acids important for DNA binding and transcription activation. Journal of Molecular Biology 2002, 322: 237\u2013257. 10.1016\/S0022-2836(02)00782-9","journal-title":"Journal of Molecular Biology"},{"key":"1001_CR9","doi-asserted-by":"publisher","first-page":"e132","DOI":"10.1093\/nar\/gnh131","volume":"32","author":"H Geyer","year":"2004","unstructured":"Geyer H, Geyer R, Pingoud V: A novel strategy for the identification of protein-DNA contacts by photocrosslinking and mass spectrometry. Nucleic Acids Res 2004, 32: e132. 10.1093\/nar\/gnh131","journal-title":"Nucleic Acids Res"},{"key":"1001_CR10","doi-asserted-by":"publisher","first-page":"7189","DOI":"10.1093\/nar\/gkg922","volume":"31","author":"S Jones","year":"2003","unstructured":"Jones S, Shanahan HP, Berman HM, Thornton JM: Using electrostatic potentials to predict DNA-binding sites on DNA-binding proteins. Nucl Acids Res 2003, 31: 7189\u20137198. 10.1093\/nar\/gkg922","journal-title":"Nucl Acids Res"},{"key":"1001_CR11","doi-asserted-by":"publisher","first-page":"4732","DOI":"10.1093\/nar\/gkh803","volume":"32","author":"HP Shanahan","year":"2004","unstructured":"Shanahan HP, Garcia MA, Jones S, Thornton JM: Identifying DNA-binding proteins using structural motifs and the electrostatic potential. Nucl Acids Res 2004, 32: 4732\u20134741. 10.1093\/nar\/gkh803","journal-title":"Nucl Acids Res"},{"key":"1001_CR12","doi-asserted-by":"publisher","first-page":"885","DOI":"10.1002\/prot.20111","volume":"55","author":"Y Tsuchiya","year":"2004","unstructured":"Tsuchiya Y, Kinoshita K, Nakamura H: Structure-based prediction of DNA-binding sites on proteins using the empirical preference of electrostatic potential and the shape of molecular surfaces. Proteins 2004, 55: 885\u2013894. 10.1002\/prot.20111","journal-title":"Proteins"},{"key":"1001_CR13","doi-asserted-by":"publisher","first-page":"779","DOI":"10.1002\/jcc.10361","volume":"25","author":"M Keil","year":"2004","unstructured":"Keil M, Exner TE, Brickmann J: Pattern recognition strategies for molecular surfaces: III. Binding site prediction with a neural network. J Comput Chem 2004, 25: 779\u2013789. 10.1002\/jcc.10361","journal-title":"J Comput Chem"},{"key":"1001_CR14","doi-asserted-by":"publisher","first-page":"477","DOI":"10.1093\/bioinformatics\/btg432","volume":"20","author":"S Ahmad","year":"2004","unstructured":"Ahmad S, Gromiha MM, Sarai A: Analysis and prediction of DNA-binding proteins and their binding residues based on composition, sequence and structural information. Bioinformatics 2004, 20: 477\u2013486. 10.1093\/bioinformatics\/btg432","journal-title":"Bioinformatics"},{"key":"1001_CR15","doi-asserted-by":"publisher","first-page":"33","DOI":"10.1186\/1471-2105-6-33","volume":"6","author":"S Ahmad","year":"2005","unstructured":"Ahmad S, Sarai A: PSSM-based prediction of DNA binding sites in proteins. BMC Bioinformatics 2005, 6: 33. 10.1186\/1471-2105-6-33","journal-title":"BMC Bioinformatics"},{"key":"1001_CR16","unstructured":"Prediction of DNA-binding residues by PSSM and sequence homology\n                  http:\/\/wwwnetasaorg\/dbs-pssm\/"},{"key":"1001_CR17","doi-asserted-by":"publisher","first-page":"3248","DOI":"10.1073\/pnas.0409851102","volume":"102","author":"JS Kim","year":"2005","unstructured":"Kim JS, DeGiovanni A, Jancarik J, Adams PD, Yokota H, Kim R, Kim SH: Crystal structure of DNA sequence specificity subunit of a type I restriction-modification enzyme and its functional implications. PNAS 2005, 102: 3248\u20133253. 10.1073\/pnas.0409851102","journal-title":"PNAS"},{"key":"1001_CR18","doi-asserted-by":"publisher","first-page":"133","DOI":"10.1006\/jmbi.1997.1233","volume":"272","author":"S Jones","year":"1997","unstructured":"Jones S, Thornton JM: Prediction of protein-protein interaction sites using patch analysis. J Mol Biol 1997, 272: 133\u2013143. 10.1006\/jmbi.1997.1233","journal-title":"J Mol Biol"},{"key":"1001_CR19","doi-asserted-by":"publisher","first-page":"205","DOI":"10.1186\/1471-2105-5-205","volume":"5","author":"TZ Sen","year":"2005","unstructured":"Sen TZ, Kloczkowski A, Jernigan RL, Yan C, Honavar V, Ho KM, Wang CZ, Ihm Y, Cao H, Gu X, Dobbs D: Predicting binding sites of hydrolase-inhibitor complexes by combining several methods. BMC Bioinformatics 2005, 5: 205. 10.1186\/1471-2105-5-205","journal-title":"BMC Bioinformatics"},{"key":"1001_CR20","doi-asserted-by":"publisher","first-page":"i371","DOI":"10.1093\/bioinformatics\/bth920","volume":"20","author":"C Yan","year":"2004","unstructured":"Yan C, Dobbs D, Honavar V: A two-stage classifier for identification of protein-protein interface residues. Bioinformatics 2004, 20: i371-i378. 10.1093\/bioinformatics\/bth920","journal-title":"Bioinformatics"},{"key":"1001_CR21","doi-asserted-by":"publisher","first-page":"123","DOI":"10.1007\/s00521-004-0414-3","volume":"13","author":"C Yan","year":"2004","unstructured":"Yan C, Honavar V, Dobbs D: Identification of interface residues in protease-inhibitor and antigen-antibody complexes: a support vector machine approach. Neural Computing & Applications 2004, 13: 123\u2013129.","journal-title":"Neural Computing & Applications"},{"key":"1001_CR22","unstructured":"Terribilini M, Lee JH, Yan C, Jernigan RL, Honavar V, Dobbs D: Prediction of RNA-binding sites in proteins based on amino acid sequence. Submitted Submitted"},{"key":"1001_CR23","doi-asserted-by":"publisher","first-page":"235","DOI":"10.1093\/nar\/28.1.235","volume":"28","author":"HM Berman","year":"2000","unstructured":"Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE: The Protein Data Bank. Nucleic Acids Research 2000, 28: 235\u2013242. 10.1093\/nar\/28.1.235","journal-title":"Nucleic Acids Research"},{"key":"1001_CR24","doi-asserted-by":"publisher","first-page":"1589","DOI":"10.1093\/bioinformatics\/btg224","volume":"19","author":"G Wang","year":"2003","unstructured":"Wang G, Dunbrack RLJ: PISCES: a protein sequence culling server. Bioinformatics 2003, 19: 1589\u20131591. 10.1093\/bioinformatics\/btg224","journal-title":"Bioinformatics"},{"key":"1001_CR25","unstructured":"PDB derived data\n                  ftp:\/\/ftprcsborg\/pub\/pdb\/derived_data\/"},{"key":"1001_CR26","unstructured":"Gene ontology annotation\n                  http:\/\/wwwebiacuk\/GOA\/"},{"key":"1001_CR27","volume-title":"NACCESS","author":"SJ Hubbard","year":"1993","unstructured":"Hubbard SJ: NACCESS. Department of Biochemistry and Molecular Biology, University College, London.; 1993."},{"key":"1001_CR28","volume-title":"Data mining: practical machine learning tools and techniques with Java implements","author":"IH Witten","year":"1999","unstructured":"Witten IH, Frank E: Data mining: practical machine learning tools and techniques with Java implements. San Mateo, CA, Morgan Kaufmann; 1999."},{"key":"1001_CR29","unstructured":"Weka 3: Data mining software in Java\n                  http:\/\/wwwcswaikatoacnz\/~ml\/weka\/"},{"key":"1001_CR30","first-page":"52","volume-title":"Theory refinement on Bayesian networks","author":"W Buntine","year":"1991","unstructured":"Buntine W: Theory refinement on Bayesian networks: ; Los Angeles, CA. ; 1991:52\u201360."},{"key":"1001_CR31","doi-asserted-by":"publisher","first-page":"56","DOI":"10.1002\/prot.340090107","volume":"9","author":"C Sander","year":"1991","unstructured":"Sander C, Schneider R: Database of homology derived protein structures and the structural meaning of sequence alignment. Proteins 1991, 9: 56\u201368. 10.1002\/prot.340090107","journal-title":"Proteins"},{"key":"1001_CR32","doi-asserted-by":"publisher","first-page":"6507","DOI":"10.1021\/jp010454y","volume":"B 105","author":"W Rocchia","year":"2001","unstructured":"Rocchia W, Alexov E, Honig B: Extending the applicability of the nonlinear Poisson-Boltzmann equation: multiple dielectric constants and multivalent ions. Journal of Physical Chemistry 2001, B 105: 6507\u20136514.","journal-title":"Journal of Physical Chemistry"},{"key":"1001_CR33","doi-asserted-by":"publisher","first-page":"128","DOI":"10.1002\/jcc.1161","volume":"23","author":"W Rocchia","year":"2002","unstructured":"Rocchia W, Sridharan S, Nicholls A, Alexov E, Chiabrera A, Honig B: Rapid grid-based construction of the molecular surface for both molecules and geometric objects: applications to the finite difference Poisson-Boltzmann method. Journal of Computational Chemistry 2002, 23: 128\u2013137. 10.1002\/jcc.1161","journal-title":"Journal of Computational Chemistry"},{"key":"1001_CR34","volume-title":"Proc Natl Acad Sci USA","author":"D Eisenberg","year":"1984","unstructured":"Eisenberg D, Weiss RM, Terwilliger TC: The hydrophobicity moment detects periodicity in protein hydrophobicity. Proc Natl Acad Sci USA 1984., 81:"},{"key":"1001_CR35","doi-asserted-by":"publisher","first-page":"412","DOI":"10.1093\/bioinformatics\/16.5.412","volume":"16","author":"P Baldi","year":"2000","unstructured":"Baldi P, Brunak S, Chauvin Y, Andersen CAF: Assessing the accuracy of prediction algorithms for classification: an overview. Bioinformatics 2000, 16: 412\u2013424. 10.1093\/bioinformatics\/16.5.412","journal-title":"Bioinformatics"},{"key":"1001_CR36","doi-asserted-by":"publisher","first-page":"D227","DOI":"10.1093\/nar\/gkj063","volume":"34","author":"N Hulo","year":"2006","unstructured":"Hulo N, Bairoch A, Bulliard V, Cerutti L, De Castro E, Langendijk-Genevaux PS, Pagni M, Sigrist CJA: The PROSITE database. Nucl Acids Res 2006, 34: D227\u2013230. 10.1093\/nar\/gkj063","journal-title":"Nucl Acids Res"},{"key":"1001_CR37","unstructured":"ps_scan program\n                  ftp:\/\/caexpasyorg\/databases\/prosite\/tools\/ps_scan\/"},{"key":"1001_CR38","doi-asserted-by":"publisher","first-page":"107","DOI":"10.1016\/S0968-0004(01)02008-4","volume":"27","author":"E Martz","year":"2002","unstructured":"Martz E: Protein Explorer: easy yet powerful macromolecular visualization. Trends Biochem Sci 2002, 27: 107\u2013109. 10.1016\/S0968-0004(01)02008-4","journal-title":"Trends Biochem Sci"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-7-262.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T11:04:30Z","timestamp":1630494270000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-7-262"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2006,5,19]]},"references-count":38,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2006,12]]}},"alternative-id":["1001"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-7-262","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2006,5,19]]},"assertion":[{"value":"28 November 2005","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"19 May 2006","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"19 May 2006","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"262"}}