{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T04:21:49Z","timestamp":1760242909382,"version":"build-2065373602"},"reference-count":48,"publisher":"MDPI AG","issue":"10","license":[{"start":{"date-parts":[[2016,10,24]],"date-time":"2016-10-24T00:00:00Z","timestamp":1477267200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>The knowledge of protein-DNA interactions is essential to fully understand the molecular activities of life. Many research groups have developed various tools which are either structure- or sequence-based approaches to predict the DNA-binding residues in proteins. The structure-based methods usually achieve good results, but require the knowledge of the 3D structure of protein; while sequence-based methods can be applied to high-throughput of proteins, but require good features. In this study, we present a new information theoretic feature derived from Jensen\u2013Shannon Divergence (JSD) between amino acid distribution of a site and the background distribution of non-binding sites. Our new feature indicates the difference of a certain site from a non-binding site, thus it is informative for detecting binding sites in proteins. We conduct the study with a five-fold cross validation of 263 proteins utilizing the Random Forest classifier. We evaluate the functionality of our new features by combining them with other popular existing features such as position-specific scoring matrix (PSSM), orthogonal binary vector (OBV), and secondary structure (SS). We notice that by adding our features, we can significantly boost the performance of Random Forest classifier, with a clear increment of sensitivity and Matthews correlation coefficient (MCC).<\/jats:p>","DOI":"10.3390\/e18100379","type":"journal-article","created":{"date-parts":[[2016,10,24]],"date-time":"2016-10-24T10:46:39Z","timestamp":1477305999000},"page":"379","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["A Novel Sequence-Based Feature for the Identification of DNA-Binding Sites in Proteins Using Jensen\u2013Shannon Divergence"],"prefix":"10.3390","volume":"18","author":[{"given":"Truong","family":"Dang","sequence":"first","affiliation":[{"name":"Institute of Computer Science, University of G\u00f6ttingen, G\u00f6ttingen 37077, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Cornelia","family":"Meckbach","sequence":"additional","affiliation":[{"name":"Institute of Bioinformatics, University Medical Center G\u00f6ttingen, G\u00f6ttingen 37077, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Rebecca","family":"Tacke","sequence":"additional","affiliation":[{"name":"Institute of Bioinformatics, University Medical Center G\u00f6ttingen, G\u00f6ttingen 37077, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Stephan","family":"Waack","sequence":"additional","affiliation":[{"name":"Institute of Computer Science, University of G\u00f6ttingen, G\u00f6ttingen 37077, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3297-3192","authenticated-orcid":false,"given":"Mehmet","family":"G\u00fcltas","sequence":"additional","affiliation":[{"name":"Institute of Computer Science, University of G\u00f6ttingen, G\u00f6ttingen 37077, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2016,10,24]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"15479","DOI":"10.1038\/srep15479","article-title":"DNA binding protein identification by combining pseudo amino acid composition and profile-based protein representation","volume":"5","author":"Liu","year":"2015","journal-title":"Sci. Rep."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"5340","DOI":"10.1093\/nar\/gkv446","article-title":"Prediction of nucleic acid binding probability in proteins: A neighboring residue network based score","volume":"43","author":"Miao","year":"2015","journal-title":"Nucleic Acids Res."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Si, J., Zhang, Z., Lin, B., Schroeder, M., and Huang, B. (2011). MetaDBSite: A meta approach to improve protein DNA-binding sites prediction. BMC Syst. Biol., 5.","DOI":"10.1186\/1752-0509-5-S1-S7"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"1766","DOI":"10.1109\/TCBB.2012.106","article-title":"Sequence-based prediction of DNA-binding residues in proteins with conservation and correlation information","volume":"9","author":"Ma","year":"2012","journal-title":"IEEE\/ACM Trans. Comput. Biol. Bioinform."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"30","DOI":"10.1093\/bioinformatics\/btn583","article-title":"Prediction of DNA-binding residues in proteins from amino acid sequences using a random forest model with a hybrid feature","volume":"25","author":"Wu","year":"2009","journal-title":"Bioinformatics"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"8","DOI":"10.1002\/minf.201400025","article-title":"PseDNA-Pro: DNA-Binding Protein Identification by Combining Chou\u2019s PseAAC and Physicochemical Distance Transformation","volume":"34","author":"Liu","year":"2015","journal-title":"Mol. Inform."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Xu, R., Zhou, J., Wang, H., He, Y., Wang, X., and Liu, B. (2015). Identifying DNA-binding proteins by combining support vector machine and PSSM distance transformation. BMC Syst. Biol., 9.","DOI":"10.1186\/1752-0509-9-S1-S10"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Dong, Q., Wang, S., Wang, K., Liu, X., and Liu, B. (2015, January 9\u201312). Identification of DNA-binding proteins by auto-cross covariance transformation. Proceedings of the 2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Washington, DC, USA.","DOI":"10.1109\/BIBM.2015.7359730"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Wei, L., Tang, J., and Zou, Q. (2016). Local-DPP: An improved DNA-binding protein prediction method by exploring local evolutionary information. Inf. Sci., in press.","DOI":"10.1016\/j.ins.2016.06.026"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"154","DOI":"10.1016\/j.neucom.2016.03.025","article-title":"Identification of DNA binding proteins using evolutionary profiles position specific scoring matrix","volume":"199","author":"Waris","year":"2016","journal-title":"Neurocomputing"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"27653","DOI":"10.1038\/srep27653","article-title":"PDNAsite: Identification of DNA-binding Site from Protein Sequence by Incorporating Spatial and Sequence Context","volume":"6","author":"Zhou","year":"2016","journal-title":"Sci. Rep."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"7189","DOI":"10.1093\/nar\/gkg922","article-title":"Using electrostatic potentials to predict DNA-binding sites on DNA-binding proteins","volume":"31","author":"Jones","year":"2003","journal-title":"Nucleic Acids Res."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"477","DOI":"10.1093\/bioinformatics\/btg432","article-title":"Analysis and prediction of DNA-binding proteins and their binding residues based on composition, sequence and structural information","volume":"20","author":"Ahmad","year":"2004","journal-title":"Bioinformatics"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Bhardwaj, N., Langlois, R.E., Zhao, G., and Lu, H. (2005, January 1\u20134). Structure based prediction of binding residues on DNA-binding proteins. Proceedings of the IEEE 27th Annual International Conference of the Engineering in Medicine and Biology Society (IEEE-EMBS 2005), Shanghai, China.","DOI":"10.1109\/IEMBS.2005.1617004"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Ahmad, S., and Sarai, A. (2005). PSSM-based prediction of DNA binding sites in proteins. BMC Bioinform., 6.","DOI":"10.1186\/1471-2105-6-33"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"19","DOI":"10.1002\/prot.20977","article-title":"Using evolutionary and structural information to predict DNA-binding sites on DNA-binding proteins","volume":"64","author":"Kuznetsov","year":"2006","journal-title":"Proteins"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"1141","DOI":"10.1142\/S0219720006002387","article-title":"Prediction of DNA-binding residues from sequence features","volume":"4","author":"Wang","year":"2006","journal-title":"J. Bioinform. Comput. Biol."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"W243","DOI":"10.1093\/nar\/gkl298","article-title":"BindN: A web-based tool for efficient prediction of DNA and RNA binding sites in amino acid sequences","volume":"34","author":"Wang","year":"2006","journal-title":"Nucleic Acids Res."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"i347","DOI":"10.1093\/bioinformatics\/btm174","article-title":"Prediction of DNA-binding residues from sequence","volume":"23","author":"Ofran","year":"2007","journal-title":"Bioinformatics"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"1085","DOI":"10.1093\/nar\/gkl1155","article-title":"Structure-based prediction of C2H2 zinc-finger binding specificity: Sensitivity to docking geometry","volume":"35","author":"Siggers","year":"2007","journal-title":"Nucleic Acids Res."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"1465","DOI":"10.1093\/nar\/gkm008","article-title":"DISPLAR: An accurate method for predicting DNA-binding sites on protein surfaces","volume":"35","author":"Tjong","year":"2007","journal-title":"Nucleic Acids Res."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"692","DOI":"10.1093\/bioinformatics\/btq019","article-title":"iDBPs: A web server for the identification of DNA binding proteins","volume":"26","author":"Nimrod","year":"2010","journal-title":"Bioinformatics"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Wang, L., Huang, C., Yang, M.Q., and Yang, J.Y. (2010). BindN+ for accurate prediction of DNA and RNA-binding residues from protein sequence features. BMC Syst. Biol., 4.","DOI":"10.1186\/1752-0509-4-S1-S3"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Miao, Z., and Westhof, E. (2015). A Large-Scale Assessment of Nucleic Acids Binding Site Prediction Programs. PLoS Comput. Biol., 11.","DOI":"10.1371\/journal.pcbi.1004639"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"88","DOI":"10.1093\/bib\/bbv023","article-title":"A comprehensive comparative review of sequence-based predictors of DNA-and RNA-binding residues","volume":"17","author":"Yan","year":"2015","journal-title":"Brief. Bioinform."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Yan, C., Terribilini, M., Wu, F., Jernigan, R.L., Dobbs, D., and Honavar, V. (2006). Predicting DNA-binding sites of proteins from amino acid sequence. BMC Bioinform., 7.","DOI":"10.1186\/1471-2105-7-262"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"634","DOI":"10.1093\/bioinformatics\/btl672","article-title":"DP-Bind: A web server for sequence-based prediction of DNA-binding residues in DNA-binding proteins","volume":"23","author":"Hwang","year":"2007","journal-title":"Bioinformatics"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Huang, Y.F., Huang, C.C., Liu, Y.C., Oyang, Y.J., and Huang, C.K. (2009). DNA-binding residues and binding mode prediction with binding-mechanism concerned models. BMC Genom., 10.","DOI":"10.1186\/1471-2164-10-S3-S23"},{"key":"ref_29","first-page":"10180","article-title":"Computational learning on specificity-determining residue-nucleotide interactions","volume":"43","author":"Wong","year":"2015","journal-title":"Nucleic Acids Res."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Wang, L., Yang, M.Q., and Yang, J.Y. (2009). Prediction of DNA-binding residues from protein sequence information using random forests. BMC Genom., 10.","DOI":"10.1186\/1471-2164-10-S1-S1"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Eggeling, R., Roos, T., Myllym\u00e4ki, P., and Grosse, I. (2015). Inferring intra-motif dependencies of DNA binding sites from ChIP-seq data. BMC Bioinform., 16.","DOI":"10.1186\/s12859-015-0797-4"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"G\u00fcltas, M., D\u00fczg\u00fcn, G., Herzog, S., J\u00e4ger, S.J., Meckbach, C., Wingender, E., and Waack, S. (2014). Quantum coupled mutation finder: Predicting functionally or structurally important sites in proteins using quantum Jensen\u2013Shannon divergence and CUDA programming. BMC Bioinform., 15.","DOI":"10.1186\/1471-2105-15-96"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"613","DOI":"10.1093\/bioinformatics\/btm626","article-title":"Prediction of protein functional residues from sequence by probability density estimation","volume":"24","author":"Fischer","year":"2008","journal-title":"Bioinformatics"},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"1875","DOI":"10.1093\/bioinformatics\/btm270","article-title":"Predicting functionally important residues from sequence conservation","volume":"23","author":"Capra","year":"2007","journal-title":"Bioinformatics"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"041905","DOI":"10.1103\/PhysRevE.65.041905","article-title":"Analysis of symbolic sequences using the Jensen\u2013Shannon divergence","volume":"65","author":"Grosse","year":"2002","journal-title":"Phys. Rev. E"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"G\u00fcltas, M., Haubrock, M., T\u00fcys\u00fcz, N., and Waack, S. (2012). Coupled mutation finder: A new entropy-based method quantifying phylogenetic noise for the detection of compensatory mutations. BMC Bioinform., 13.","DOI":"10.1186\/1471-2105-13-225"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"678","DOI":"10.1093\/bioinformatics\/btt029","article-title":"PreDNA: Accurate prediction of DNA-binding sites in proteins by integrating sequence and geometric structure information","volume":"29","author":"Li","year":"2013","journal-title":"Bioinformatics"},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"2253","DOI":"10.1002\/prot.24592","article-title":"A simple contact mapping algorithm for identifying potential peptide mimetics in protein\u2013protein interaction partners","volume":"82","author":"Krall","year":"2014","journal-title":"Proteins"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"193","DOI":"10.1016\/S0092-8674(02)01284-9","article-title":"X-ray structures of Myc-Max and Mad-Max recognizing DNA: Molecular bases of regulation by proto-oncogenic transcription factors","volume":"112","author":"Nair","year":"2003","journal-title":"Cell"},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"235","DOI":"10.1093\/nar\/28.1.235","article-title":"The protein data bank","volume":"28","author":"Berman","year":"2000","journal-title":"Nucleic Acids Res."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"173","DOI":"10.1038\/nmeth.1818","article-title":"HHblits: Lightning-fast iterative protein sequence searching by HMM-HMM alignment","volume":"9","author":"Remmert","year":"2012","journal-title":"Nat. Methods"},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"36","DOI":"10.1088\/1751-8113\/42\/36\/365209","article-title":"Random bistochastic matrices","volume":"42","author":"Cappellini","year":"2009","journal-title":"J. Phys. A Math. Theor."},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"105","DOI":"10.1016\/S0022-2836(02)01036-7","article-title":"Analysis of catalytic residues in enzyme active sites","volume":"324","author":"Bartlett","year":"2002","journal-title":"J. Mol. Biol."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"884","DOI":"10.1110\/ps.03465504","article-title":"Prediction of functional sites by analysis of sequence and structure conservation","volume":"13","author":"Panchenko","year":"2004","journal-title":"Protein Sci."},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Janda, J.O., Busch, M., K\u00fcck, F., Porfenenko, M., and Merkl, R. (2012). CLIPS-1D: Analysis of multiple sequence alignments to deduce for residue-positions a role in catalysis, ligand-binding, or protein structure. BMC Bioinform., 13.","DOI":"10.1186\/1471-2105-13-55"},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach. Learn."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"10","DOI":"10.1145\/1656274.1656278","article-title":"The WEKA data mining software: An update","volume":"11","author":"Hall","year":"2009","journal-title":"ACM SIGKDD Explor. Newsl."},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"123","DOI":"10.1007\/BF00058655","article-title":"Bagging predictors","volume":"24","author":"Breiman","year":"1996","journal-title":"Mach. Learn."}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/18\/10\/379\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T19:33:49Z","timestamp":1760211229000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/18\/10\/379"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,10,24]]},"references-count":48,"journal-issue":{"issue":"10","published-online":{"date-parts":[[2016,10]]}},"alternative-id":["e18100379"],"URL":"https:\/\/doi.org\/10.3390\/e18100379","relation":{},"ISSN":["1099-4300"],"issn-type":[{"type":"electronic","value":"1099-4300"}],"subject":[],"published":{"date-parts":[[2016,10,24]]}}}