{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,30]],"date-time":"2025-12-30T15:40:42Z","timestamp":1767109242436,"version":"3.37.3"},"reference-count":59,"publisher":"Oxford University Press (OUP)","issue":"3","license":[{"start":{"date-parts":[[2019,4,8]],"date-time":"2019-04-08T00:00:00Z","timestamp":1554681600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"name":"Anhui Provincial Education Department","award":["KJ2017ZD01"],"award-info":[{"award-number":["KJ2017ZD01"]}]},{"name":"Anhui Provincial Outstanding Young Talent Support Plan","award":["gxyqZD2017005"],"award-info":[{"award-number":["gxyqZD2017005"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61873001","11835014","31301101","61672037"],"award-info":[{"award-number":["61873001","11835014","31301101","61672037"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2020,5,21]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>DNA-binding hot spot residues of proteins are dominant and fundamental interface residues that contribute most of the binding free energy of protein\u2013DNA interfaces. As experimental methods for identifying hot spots are expensive and time consuming, computational approaches are urgently required in predicting hot spots on a large scale. In this work, we systematically assessed a wide variety of 114 features from a combination of the protein sequence, structure, network and solvent accessible information and their combinations along with various feature selection strategies for hot spot prediction. 
We then trained and compared four commonly used machine learning models, namely, support vector machine (SVM), random forest, Na\u00efve Bayes and k-nearest neighbor, for the identification of hot spots using 10-fold cross-validation and an independent test set. Our results show that (1) features based on the solvent accessible surface area have a significant effect on hot spot prediction; (2) different but complementary features generally enhance the prediction performance; and (3) SVM outperforms the other machine learning methods on both the training and independent test sets. In an effort to improve predictive performance, we developed a feature-based method, namely, PrPDH (Prediction of Protein\u2013DNA binding Hot spots), for the prediction of hot spots in protein\u2013DNA binding interfaces using SVM based on 10 selected optimal features. Comparative results on benchmark data sets indicate that our predictor generally achieves better performance in predicting hot spots than state-of-the-art predictors. 
A user-friendly web server for PrPDH is well established and is freely available at http:\/\/bioinfo.ahu.edu.cn:8080\/PrPDH.<\/jats:p>","DOI":"10.1093\/bib\/bbz037","type":"journal-article","created":{"date-parts":[[2019,3,8]],"date-time":"2019-03-08T12:21:33Z","timestamp":1552047693000},"page":"1038-1046","source":"Crossref","is-referenced-by-count":36,"title":["A feature-based approach to predict hot spots in protein\u2013DNA binding interfaces"],"prefix":"10.1093","volume":"21","author":[{"given":"Sijia","family":"Zhang","sequence":"first","affiliation":[{"name":"Institutes of Physical Science and Information Technology, School of Computer Science and Technology, Anhui University, Hefei, Anhui, China"}]},{"given":"Le","family":"Zhao","sequence":"first","affiliation":[{"name":"Institutes of Physical Science and Information Technology, School of Computer Science and Technology, Anhui University, Hefei, Anhui, China"}]},{"given":"Chun-Hou","family":"Zheng","sequence":"first","affiliation":[{"name":"Institutes of Physical Science and Information Technology, School of Computer Science and Technology, Anhui University, Hefei, Anhui, China"}]},{"given":"Junfeng","family":"Xia","sequence":"first","affiliation":[{"name":"Institutes of Physical Science and Information Technology, School of Computer Science and Technology, Anhui University, Hefei, Anhui, China"}]}],"member":"286","published-online":{"date-parts":[[2019,4,8]]},"reference":[{"key":"2020051819281340300_ref1","doi-asserted-by":"crossref","first-page":"390","DOI":"10.1016\/0076-6879(91)02020-A","article-title":"Systematic mutational analyses of protein-protein interfaces","volume":"202","author":"Wells","year":"1991","journal-title":"Methods Enzymol"},{"key":"2020051819281340300_ref2","doi-asserted-by":"crossref","first-page":"803","DOI":"10.1002\/prot.21396","article-title":"Hot spots\u2014a review of the protein\u2013protein interface determinant amino-acid 
residues","volume":"68","author":"Moreira","year":"2007","journal-title":"Proteins"},{"key":"2020051819281340300_ref3","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1006\/jmbi.1998.1843","article-title":"Anatomy of hot spots in protein interfaces","volume":"280","author":"Bogan","year":"1998","journal-title":"J Mol Biol"},{"key":"2020051819281340300_ref4","doi-asserted-by":"crossref","first-page":"841","DOI":"10.1006\/jmbi.2000.3888","article-title":"Major groove recognition by three-stranded \u03b2-sheets: affinity determinants and conserved structural features","volume":"300","author":"Connolly","year":"2000","journal-title":"J Mol Biol"},{"key":"2020051819281340300_ref5","doi-asserted-by":"crossref","first-page":"19281","DOI":"10.1074\/jbc.274.27.19281","article-title":"Binding studies with mutants of Zif268 contribution of individual side chains to binding affinity and specificity in the Zif268 zinc finger-DNA complex","volume":"274","author":"Elrod-Erickson","year":"1999","journal-title":"J Biol Chem"},{"key":"2020051819281340300_ref6","doi-asserted-by":"crossref","first-page":"223","DOI":"10.1007\/978-1-4939-7717-8_13","article-title":"Survey of computational approaches for prediction of DNA-binding residues on protein surfaces","volume":"1754","author":"Xiong","year":"2018","journal-title":"Methods Mol Biol"},{"key":"2020051819281340300_ref7","doi-asserted-by":"crossref","DOI":"10.1371\/journal.pcbi.1000567","article-title":"A threading-based method for the prediction of DNA-binding proteins with application to the human genome","volume":"5","author":"Gao","year":"2009","journal-title":"PLoS Comput Biol"},{"key":"2020051819281340300_ref8","doi-asserted-by":"crossref","first-page":"44","DOI":"10.1007\/s00894-003-0168-3","article-title":"Structure-based method for analyzing protein\u2013protein interfaces","volume":"10","author":"Gao","year":"2004","journal-title":"J Mol 
Model"},{"key":"2020051819281340300_ref9","doi-asserted-by":"crossref","first-page":"2811","DOI":"10.1093\/nar\/gkg386","article-title":"Using structural motif templates to identify proteins with DNA binding function","volume":"31","author":"Jones","year":"2003","journal-title":"Nucleic Acids Res"},{"key":"2020051819281340300_ref10","doi-asserted-by":"crossref","first-page":"1857","DOI":"10.1093\/bioinformatics\/btq295","article-title":"Structure-based prediction of DNA-binding proteins by structural alignment and a volume-fraction corrected DFIRE-based energy function","volume":"26","author":"Zhao","year":"2010","journal-title":"Bioinformatics"},{"key":"2020051819281340300_ref11","doi-asserted-by":"crossref","first-page":"1040","DOI":"10.1016\/j.jmb.2009.02.023","article-title":"Identification of DNA-binding proteins using structural, electrostatic and evolutionary features","volume":"387","author":"Nimrod","year":"2009","journal-title":"J Mol Biol"},{"key":"2020051819281340300_ref12","doi-asserted-by":"crossref","first-page":"65","DOI":"10.1016\/j.jmb.2004.05.058","article-title":"Moment-based prediction of DNA-binding proteins","volume":"341","author":"Ahmad","year":"2004","journal-title":"J Mol Biol"},{"key":"2020051819281340300_ref13","doi-asserted-by":"crossref","first-page":"779","DOI":"10.1093\/bioinformatics\/btx698","article-title":"Predicting protein\u2013DNA binding free energy change upon missense mutations using modified MM\/PBSA approach: SAMPDI webserver","volume":"34","author":"Peng","year":"2017","journal-title":"Bioinformatics"},{"key":"2020051819281340300_ref14","doi-asserted-by":"crossref","DOI":"10.1371\/journal.pcbi.1006615","article-title":"PremPDI estimates and interprets the effects of missense mutations on protein\u2013DNA interactions","volume":"14","author":"Zhang","year":"2018","journal-title":"PLoS Comput 
Biol"},{"key":"2020051819281340300_ref15","doi-asserted-by":"crossref","first-page":"W241","DOI":"10.1093\/nar\/gkx236","article-title":"mCSM-NA: predicting the effects of mutations on protein-nucleic acids interactions","volume":"45","author":"Pires","year":"2017","journal-title":"Nucleic Acids Res"},{"key":"2020051819281340300_ref16","doi-asserted-by":"crossref","DOI":"10.1093\/database\/bay034","article-title":"dbAMEPNI: a database of alanine mutagenic effects for protein\u2013nucleic acid interactions","volume":"2018","author":"Liu","year":"2018","journal-title":"Database"},{"key":"2020051819281340300_ref17","doi-asserted-by":"crossref","first-page":"19","DOI":"10.32614\/RJ-2015-018","article-title":"VSURF: an R package for variable selection using random forests","volume":"7","author":"Genuer","year":"2015","journal-title":"R J"},{"key":"2020051819281340300_ref18","doi-asserted-by":"crossref","first-page":"18","DOI":"10.1109\/5254.708428","article-title":"Support vector machines","volume":"13","author":"Hearst","year":"1998","journal-title":"IEEE Intell Syst"},{"key":"2020051819281340300_ref19","first-page":"1658","volume-title":"Bioinformatics","author":"Li","year":"2006"},{"key":"2020051819281340300_ref20","doi-asserted-by":"crossref","first-page":"298","DOI":"10.1186\/1471-2105-15-298","article-title":"nDNA-prot: identification of DNA-binding proteins based on unbalanced classification","volume":"15","author":"Song","year":"2014","journal-title":"BMC Bioinformatics"},{"key":"2020051819281340300_ref21","doi-asserted-by":"crossref","first-page":"e160","DOI":"10.1371\/journal.pcbi.0030160","article-title":"Automated protein subfamily identification and classification","volume":"3","author":"Brown","year":"2007","journal-title":"PLoS Comput Biol"},{"key":"2020051819281340300_ref22","doi-asserted-by":"crossref","first-page":"1473","DOI":"10.1093\/bioinformatics\/btx822","article-title":"Computational identification of binding energy hot spots in protein\u2013RNA 
complexes using an ensemble approach","volume":"34","author":"Pan","year":"2018","journal-title":"Bioinformatics"},{"key":"2020051819281340300_ref23","doi-asserted-by":"crossref","first-page":"1513","DOI":"10.1093\/bioinformatics\/btp240","article-title":"Identification of computational hot spots in protein interfaces: combining solvent accessibility and inter-residue potentials improves the accuracy","volume":"25","author":"Tuncbag","year":"2009","journal-title":"Bioinformatics"},{"key":"2020051819281340300_ref24","doi-asserted-by":"crossref","first-page":"174","DOI":"10.1186\/1471-2105-11-174","article-title":"APIS: accurate prediction of hot spots in protein interfaces by combining protrusion index with solvent accessibility","volume":"11","author":"Xia","year":"2010","journal-title":"BMC Bioinformatics"},{"key":"2020051819281340300_ref25","doi-asserted-by":"crossref","first-page":"2671","DOI":"10.1002\/prot.23094","article-title":"KFC2: a knowledge-based hot spot prediction method based on interface solvation, atomic density, and plasticity features","volume":"79","author":"Zhu","year":"2011","journal-title":"Proteins"},{"volume-title":"`NACCESS: Program for Calculating Accessibilities'","author":"Hubbard","key":"2020051819281340300_ref26"},{"key":"2020051819281340300_ref27","doi-asserted-by":"crossref","first-page":"51","DOI":"10.1186\/1472-6807-9-51","article-title":"A generic method for assignment of reliability scores applied to solvent accessibility predictions","volume":"9","author":"Petersen","year":"2009","journal-title":"BMC Struct Biol"},{"key":"2020051819281340300_ref28","doi-asserted-by":"crossref","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","article-title":"Gapped BLAST and PSI-BLAST: a new generation of protein database search programs","volume":"25","author":"Altschul","year":"1997","journal-title":"Nucleic Acids Res"},{"key":"2020051819281340300_ref29","doi-asserted-by":"crossref","DOI":"10.1038\/srep11476","article-title":"Improving 
prediction of secondary structure, local backbone angles, and solvent accessible surface area of proteins by iterative deep learning","volume":"5","author":"Heffernan","year":"2015","journal-title":"Sci Rep"},{"key":"2020051819281340300_ref30","doi-asserted-by":"crossref","first-page":"827","DOI":"10.1016\/j.jmb.2005.01.071","article-title":"The pairwise energy content estimated from amino acid composition discriminates between folded and intrinsically unstructured proteins","volume":"347","author":"Dosztanyi","year":"2005","journal-title":"J Mol Biol"},{"key":"2020051819281340300_ref31","doi-asserted-by":"crossref","DOI":"10.1371\/journal.pcbi.1000376","article-title":"Prediction of protein binding regions in disordered proteins","volume":"5","author":"M\u00e9sz\u00e1ros","year":"2009","journal-title":"PLoS Comput Biol"},{"key":"2020051819281340300_ref32","doi-asserted-by":"crossref","first-page":"857","DOI":"10.1093\/bioinformatics\/btu744","article-title":"DISOPRED3: precise disordered region predictions with annotated protein-binding activity","volume":"31","author":"Jones","year":"2014","journal-title":"Bioinformatics"},{"key":"2020051819281340300_ref33","doi-asserted-by":"crossref","first-page":"1453","DOI":"10.1016\/j.str.2003.10.002","article-title":"Protein disorder prediction: implications for structural proteomics","volume":"11","author":"Linding","year":"2003","journal-title":"Structure"},{"key":"2020051819281340300_ref34","doi-asserted-by":"crossref","first-page":"433","DOI":"10.1093\/bioinformatics\/bty653","article-title":"StackDPPred: a stacking based prediction of DNA-binding protein from sequence","volume":"35","author":"Mishra","year":"2019","journal-title":"Bioinformatics"},{"key":"2020051819281340300_ref35","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1186\/1472-6807-8-21","article-title":"PSAIA\u2013protein structure and interaction analyzer","volume":"8","author":"Mihel","year":"2008","journal-title":"BMC Struct 
Biol"},{"key":"2020051819281340300_ref36","doi-asserted-by":"crossref","first-page":"2577","DOI":"10.1002\/bip.360221211","article-title":"Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features","volume":"22","author":"Kabsch","year":"1983","journal-title":"Biopolymers"},{"key":"2020051819281340300_ref37","doi-asserted-by":"crossref","first-page":"1419","DOI":"10.1007\/s00726-014-1710-6","article-title":"The construction of an amino acid network for understanding protein structure and function","volume":"46","author":"Yan","year":"2014","journal-title":"Amino Acids"},{"key":"2020051819281340300_ref38","doi-asserted-by":"crossref","first-page":"W375","DOI":"10.1093\/nar\/gkw383","article-title":"NAPS: network analysis of protein structures","volume":"44","author":"Chakrabarty","year":"2016","journal-title":"Nucleic Acids Res"},{"issue":"3","key":"2020051819281340300_ref39","doi-asserted-by":"crossref","first-page":"27","DOI":"10.1145\/1961189.1961199","article-title":"LIBSVM: a library for support vector machines","volume":"2","author":"Chang","year":"2011","journal-title":"ACM Trans Intell Syst Technol"},{"key":"2020051819281340300_ref40","article-title":"Large-scale comparative assessment of computational predictors for lysine post-translational modification sites","author":"Chen","year":"2018","journal-title":"Brief Bioinform"},{"key":"2020051819281340300_ref41","doi-asserted-by":"crossref","first-page":"4223","DOI":"10.1093\/bioinformatics\/bty522","article-title":"Quokka: a comprehensive tool for rapid and accurate prediction of kinase family-specific phosphorylation sites in the human proteome","volume":"34","author":"Li","year":"2018","journal-title":"Bioinformatics"},{"key":"2020051819281340300_ref42","article-title":"Twenty years of bioinformatics research for protease-specific substrate and cleavage site prediction: a comprehensive revisit and benchmarking of existing 
methods","author":"Li","year":"2018","journal-title":"Brief Bioinform"},{"key":"2020051819281340300_ref43","article-title":"iProt-Sub: a comprehensive package for accurately mapping and predicting protease-specific substrates and cleavage sites","author":"Song","year":"2018","journal-title":"Brief Bioinform"},{"key":"2020051819281340300_ref44","doi-asserted-by":"crossref","first-page":"684","DOI":"10.1093\/bioinformatics\/btx670","article-title":"PROSPERous: high-throughput prediction of substrate cleavage sites for 90 proteases with improved accuracy","volume":"34","author":"Song","year":"2018","journal-title":"Bioinformatics"},{"key":"2020051819281340300_ref45","doi-asserted-by":"crossref","DOI":"10.1371\/journal.pone.0086703","article-title":"Sequence based prediction of DNA-binding proteins based on hybrid feature selection using random forest and Gaussian naive Bayes","volume":"9","author":"Lou","year":"2014","journal-title":"PLoS One"},{"key":"2020051819281340300_ref46","doi-asserted-by":"crossref","first-page":"1226","DOI":"10.1109\/TPAMI.2005.159","article-title":"Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy","volume":"27","author":"Peng","year":"2005","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"2020051819281340300_ref47","doi-asserted-by":"crossref","first-page":"389","DOI":"10.1023\/A:1012487302797","article-title":"Gene selection for cancer classification using support vector machines","volume":"46","author":"Guyon","year":"2002","journal-title":"Mach Learn"},{"key":"2020051819281340300_ref48","article-title":"GlycoMine struct: a new bioinformatics tool for highly accurate mapping of the human N-linked and O-linked glycoproteomes by incorporating structural features","volume":"6","author":"Li","year":"2016","journal-title":"Sci 
Rep"},{"key":"2020051819281340300_ref49","doi-asserted-by":"crossref","first-page":"1411","DOI":"10.1093\/bioinformatics\/btu852","article-title":"GlycoMine: a machine learning-based approach for predicting N-, C-and O-linked glycosylation in the human proteome","volume":"31","author":"Li","year":"2015","journal-title":"Bioinformatics"},{"key":"2020051819281340300_ref50","doi-asserted-by":"crossref","first-page":"5765","DOI":"10.1038\/srep05765","article-title":"Accurate in silico identification of species-specific acetylation sites by integrating protein sequence-derived and functional features","volume":"4","author":"Li","year":"2014","journal-title":"Sci Rep"},{"key":"2020051819281340300_ref51","doi-asserted-by":"crossref","first-page":"71","DOI":"10.1093\/bioinformatics\/btt603","article-title":"Cascleave 2.0, a new approach for predicting caspase and granzyme cleavage targets","volume":"30","author":"Wang","year":"2013","journal-title":"Bioinformatics"},{"key":"2020051819281340300_ref52","doi-asserted-by":"crossref","DOI":"10.1371\/journal.pone.0080635","article-title":"Maximum allowed solvent accessibilites of residues in proteins","volume":"8","author":"Tien","year":"2013","journal-title":"PloS One"},{"key":"2020051819281340300_ref53","doi-asserted-by":"crossref","first-page":"1773","DOI":"10.1007\/s00726-017-2474-6","article-title":"Protein binding hot spots prediction from sequence only by a new ensemble learning method","volume":"49","author":"Hu","year":"2017","journal-title":"Amino Acids"},{"key":"2020051819281340300_ref54","doi-asserted-by":"crossref","first-page":"125","DOI":"10.1016\/j.jtbi.2018.01.023","article-title":"PREvaIL, an integrative approach for inferring catalytic residues using sequence, structural, and network features in a machine-learning framework","volume":"443","author":"Song","year":"2018","journal-title":"J Theor 
Biol"},{"key":"2020051819281340300_ref55","doi-asserted-by":"crossref","first-page":"2853","DOI":"10.1093\/bioinformatics\/btw315","article-title":"DBSI server: DNA binding site identifier","volume":"32","author":"Sukumar","year":"2016","journal-title":"Bioinformatics"},{"key":"2020051819281340300_ref56","doi-asserted-by":"crossref","first-page":"1885","DOI":"10.1002\/prot.24330","article-title":"DNABind: a hybrid algorithm for structure-based prediction of DNA-binding residues by combining machine learning-and template-based approaches","volume":"81","author":"Liu","year":"2013","journal-title":"Proteins"},{"key":"2020051819281340300_ref57","doi-asserted-by":"crossref","first-page":"634","DOI":"10.1093\/bioinformatics\/btl672","article-title":"DP-bind: a web server for sequence-based prediction of DNA-binding residues in DNA-binding proteins","volume":"23","author":"Hwang","year":"2007","journal-title":"Bioinformatics"},{"key":"2020051819281340300_ref58","doi-asserted-by":"crossref","first-page":"366","DOI":"10.1038\/7603","article-title":"NMR structure of the Tn916 integrase\u2013DNA complex","volume":"6","author":"Wojciak","year":"1999","journal-title":"Nat Struct Mol Biol"},{"key":"2020051819281340300_ref59","doi-asserted-by":"crossref","first-page":"779","DOI":"10.1093\/bioinformatics\/btx698","article-title":"Predicting protein-DNA binding free energy change upon missense mutations using modified MM\/PBSA approach: SAMPDI webserver","volume":"34","author":"Peng","year":"2018","journal-title":"Bioinformatics"}],"container-title":["Briefings in 
Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bib\/article-pdf\/21\/3\/1038\/33227300\/bbz037.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"http:\/\/academic.oup.com\/bib\/article-pdf\/21\/3\/1038\/33227300\/bbz037.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2020,5,19]],"date-time":"2020-05-19T01:09:12Z","timestamp":1589850552000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/21\/3\/1038\/5424984"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,4,8]]},"references-count":59,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2019,4,8]]},"published-print":{"date-parts":[[2020,5,21]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbz037","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"type":"print","value":"1467-5463"},{"type":"electronic","value":"1477-4054"}],"subject":[],"published-other":{"date-parts":[[2020,5]]},"published":{"date-parts":[[2019,4,8]]}}}