{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,9]],"date-time":"2025-11-09T03:46:10Z","timestamp":1762659970301},"reference-count":31,"publisher":"Springer Science and Business Media LLC","issue":"S13","license":[{"start":{"date-parts":[[2020,9,1]],"date-time":"2020-09-01T00:00:00Z","timestamp":1598918400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2020,9,17]],"date-time":"2020-09-17T00:00:00Z","timestamp":1600300800000},"content-version":"vor","delay-in-days":16,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2020,9]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec>\n                <jats:title>Background<\/jats:title>\n                <jats:p>Identification of hot spots in protein-DNA interfaces provides crucial information for the research on protein-DNA interaction and drug design. As experimental methods for determining hot spots are time-consuming, labor-intensive and expensive, there is a need for developing reliable computational method to predict hot spots on a large scale.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Results<\/jats:title>\n                <jats:p>Here, we proposed a new method named sxPDH based on supervised isometric feature mapping (S-ISOMAP) and extreme gradient boosting (XGBoost) to predict hot spots in protein-DNA complexes. We obtained 114 features from a combination of the protein sequence, structure, network and solvent accessible information, and systematically assessed various feature selection methods and feature dimensionality reduction methods based on manifold learning. 
The results show that the S-ISOMAP method is superior to other feature selection or manifold learning methods. XGBoost was then used to develop hot spots prediction model sxPDH based on the three dimensionality-reduced features obtained from S-ISOMAP.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Conclusion<\/jats:title>\n                <jats:p>Our method sxPDH boosts prediction performance using S-ISOMAP and XGBoost. The AUC of the model is 0.773, and the F1 score is 0.713. Experimental results on benchmark dataset indicate that sxPDH can achieve generally better performance in predicting hot spots compared to the state-of-the-art methods.<\/jats:p>\n              <\/jats:sec>","DOI":"10.1186\/s12859-020-03683-3","type":"journal-article","created":{"date-parts":[[2020,9,17]],"date-time":"2020-09-17T00:03:04Z","timestamp":1600300984000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":18,"title":["Prediction of hot spots in protein\u2013DNA binding interfaces based on supervised isometric feature mapping and extreme gradient 
boosting"],"prefix":"10.1186","volume":"21","author":[{"given":"Ke","family":"Li","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sijia","family":"Zhang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Di","family":"Yan","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yannan","family":"Bin","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Junfeng","family":"Xia","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2020,9,17]]},"reference":[{"issue":"2","key":"3683_CR1","doi-asserted-by":"publisher","first-page":"396","DOI":"10.1109\/TCBB.2017.2701379","volume":"16","author":"J Zhang","year":"2017","unstructured":"Zhang J, Zhang Z, Chen Z, Deng L. Integrating multiple heterogeneous networks for novel lncRNA-disease association inference. IEEE\/ACM Trans Comput Biol Bioinform. 2017;16(2):396\u2013406.","journal-title":"IEEE\/ACM Trans Comput Biol Bioinform"},{"issue":"2","key":"3683_CR2","doi-asserted-by":"publisher","first-page":"77","DOI":"10.1038\/nrg3141","volume":"13","author":"J K\u00f6nig","year":"2012","unstructured":"K\u00f6nig J, Zarnack K, Luscombe NM, Ule J. Protein\u2013RNA interactions: new genomic technologies and perspectives. Nat Rev Genet. 2012;13(2):77\u201383.","journal-title":"Nat Rev Genet"},{"issue":"5196","key":"3683_CR3","doi-asserted-by":"publisher","first-page":"383","DOI":"10.1126\/science.7529940","volume":"267","author":"T Clackson","year":"1995","unstructured":"Clackson T, Wells JA. A hot spot of binding energy in a hormone-receptor interface. Science. 
1995;267(5196):383\u20136.","journal-title":"Science"},{"issue":"4","key":"3683_CR4","doi-asserted-by":"publisher","first-page":"803","DOI":"10.1002\/prot.21396","volume":"68","author":"IS Moreira","year":"2007","unstructured":"Moreira IS, Fernandes PA, Ramos MJ. Hot spots\u2014a review of the protein\u2013protein interface determinant amino-acid residues. Proteins. 2007;68(4):803\u201312.","journal-title":"Proteins"},{"issue":"14","key":"3683_CR5","doi-asserted-by":"publisher","first-page":"18065","DOI":"10.18632\/oncotarget.7695","volume":"7","author":"J Xia","year":"2016","unstructured":"Xia J, Yue Z, Di Y, Zhu X, Zheng C-H. Predicting hot spots in protein interfaces based on protrusion index, pseudo hydrophobicity and electron-ion interaction pseudopotential features. Oncotarget. 2016;7(14):18065\u201375.","journal-title":"Oncotarget"},{"issue":"9","key":"3683_CR6","doi-asserted-by":"publisher","first-page":"1473","DOI":"10.1093\/bioinformatics\/btx822","volume":"34","author":"Y Pan","year":"2017","unstructured":"Pan Y, Wang Z, Zhan W, Deng L. Computational identification of binding energy hot spots in protein\u2013RNA complexes using an ensemble approach. Bioinformatics. 2017;34(9):1473\u201380.","journal-title":"Bioinformatics"},{"issue":"1","key":"3683_CR7","doi-asserted-by":"publisher","first-page":"14","DOI":"10.1186\/s12859-018-2009-5","volume":"19","author":"Y Qiao","year":"2018","unstructured":"Qiao Y, Xiong Y, Gao H, Zhu X, Chen P. Protein-protein interface hot spots prediction based on a hybrid feature selection strategy. BMC Bioinformatics. 2018;19(1):14. https:\/\/doi.org\/10.1186\/s12859-018-2009-5.","journal-title":"BMC Bioinformatics"},{"issue":"3","key":"3683_CR8","doi-asserted-by":"publisher","first-page":"242","DOI":"10.3390\/genes10030242","volume":"10","author":"L Deng","year":"2019","unstructured":"Deng L, Sui Y, Zhang J. XGBPRH: prediction of binding hot spots at protein\u2013RNA interfaces utilizing extreme gradient boosting. Genes. 
2019;10(3):242. https:\/\/doi.org\/10.3390\/genes10030242.","journal-title":"Genes"},{"issue":"3","key":"3683_CR9","doi-asserted-by":"publisher","first-page":"119","DOI":"10.1093\/protein\/gzr066","volume":"25","author":"L Wang","year":"2012","unstructured":"Wang L, Liu Z-P, Zhang X-S, Chen L. Prediction of hot spots in protein interfaces using a random forest model with hybrid features. Protein Eng Des Sel. 2012;25(3):119\u201326.","journal-title":"Protein Eng Des Sel"},{"key":"3683_CR10","doi-asserted-by":"publisher","first-page":"223","DOI":"10.1007\/978-1-4939-7717-8_13","volume":"1754","author":"Y Xiong","year":"2018","unstructured":"Xiong Y, Zhu X, Dai H, Wei DQ. Survey of computational approaches for prediction of DNA-binding residues on protein surfaces. Methods Mol Biol. 2018;1754:223\u201334.","journal-title":"Methods Mol Biol"},{"key":"3683_CR11","doi-asserted-by":"publisher","unstructured":"Zhang S, Zhao L, Zheng C-H, Xia J. A feature-based approach to predict hot spots in protein\u2013DNA binding interfaces. Brief Bioinform. 2019. https:\/\/doi.org\/10.1093\/bib\/bbz037.","DOI":"10.1093\/bib\/bbz037"},{"issue":"6","key":"3683_CR12","doi-asserted-by":"publisher","first-page":"94","DOI":"10.1145\/3136625","volume":"50","author":"J Li","year":"2018","unstructured":"Li J, Cheng K, Wang S, Morstatter F, Trevino RP, Tang J, Liu H. Feature selection: a data perspective. ACM Comput Surv. 2018;50(6):94. https:\/\/doi.org\/10.1145\/3136625.","journal-title":"ACM Comput Surv"},{"key":"3683_CR13","doi-asserted-by":"publisher","first-page":"70","DOI":"10.1016\/j.neucom.2017.11.077","volume":"300","author":"J Cai","year":"2018","unstructured":"Cai J, Luo J, Wang S, Yang S. Feature selection in machine learning: a new perspective. Neurocomputing. 
2018;300:70\u20139.","journal-title":"Neurocomputing"},{"issue":"5500","key":"3683_CR14","doi-asserted-by":"publisher","first-page":"2319","DOI":"10.1126\/science.290.5500.2319","volume":"290","author":"JB Tenenbaum","year":"2000","unstructured":"Tenenbaum JB, De Silva V, Langford JC. A global geometric framework for nonlinear dimensionality reduction. Science. 2000;290(5500):2319\u201323.","journal-title":"Science"},{"issue":"5500","key":"3683_CR15","doi-asserted-by":"publisher","first-page":"2323","DOI":"10.1126\/science.290.5500.2323","volume":"290","author":"ST Roweis","year":"2000","unstructured":"Roweis ST, Saul LK. Nonlinear dimensionality reduction by locally linear embedding. Science. 2000;290(5500):2323\u20136.","journal-title":"Science"},{"issue":"6","key":"3683_CR16","doi-asserted-by":"publisher","first-page":"1098","DOI":"10.1109\/TSMCB.2005.850151","volume":"35","author":"X Geng","year":"2005","unstructured":"Geng X, Zhan D-C, Zhou Z-H. Supervised nonlinear dimensionality reduction for visualization and classification. IEEE Trans Syst Man Cybern B Cybern. 2005;35(6):1098\u2013107.","journal-title":"IEEE Trans Syst Man Cybern B Cybern"},{"key":"3683_CR17","doi-asserted-by":"publisher","first-page":"785","DOI":"10.1145\/2939672.2939785","volume-title":"Proceedings of the 22nd ACM Sigkdd International Conference on Knowledge Discovery and Data Mining","author":"T Chen","year":"2016","unstructured":"Chen T, Guestrin C. Xgboost: A scalable tree boosting system. In:  Proceedings of the 22nd ACM Sigkdd International Conference on Knowledge Discovery and Data Mining; 2016. p. 785\u201394."},{"issue":"3","key":"3683_CR18","doi-asserted-by":"publisher","first-page":"277","DOI":"10.1111\/j.1745-3984.2003.tb01108.x","volume":"40","author":"I Borg","year":"2003","unstructured":"Borg I, Groenen P. Modern multidimensional scaling: theory and applications. J Educ Meas. 
2003;40(3):277\u201380.","journal-title":"J Educ Meas"},{"key":"3683_CR19","doi-asserted-by":"publisher","unstructured":"Chen Z, Liu X, Li F, et al. Large-scale comparative assessment of computational predictors for lysine post-translational modification sites. Brief Bioinform. 2018. https:\/\/doi.org\/10.1093\/bib\/bby089.","DOI":"10.1093\/bib\/bby089"},{"issue":"24","key":"3683_CR20","doi-asserted-by":"publisher","first-page":"4223","DOI":"10.1093\/bioinformatics\/bty522","volume":"34","author":"F Li","year":"2018","unstructured":"Li F, Li C, Marquez-Lago TT, et al. Quokka: a comprehensive tool for rapid and accurate prediction of kinase family-specific phosphorylation sites in the human proteome. Bioinformatics. 2018;34(24):4223\u201331.","journal-title":"Bioinformatics"},{"key":"3683_CR21","doi-asserted-by":"publisher","unstructured":"Li F, Wang Y, Li C, et al. Twenty years of bioinformatics research for protease-specific substrate and cleavage site prediction: a comprehensive revisit and benchmarking of existing methods. Brief Bioinform. 2018. https:\/\/doi.org\/10.1093\/bib\/bby077.","DOI":"10.1093\/bib\/bby077"},{"issue":"2","key":"3683_CR22","doi-asserted-by":"publisher","first-page":"638","DOI":"10.1093\/bib\/bby028","volume":"20","author":"J Song","year":"2018","unstructured":"Song J, Wang Y, Li F, et al. iProt-sub: a comprehensive package for accurately mapping and predicting protease-specific substrates and cleavage sites. Brief Bioinform. 2018;20(2):638\u201358.","journal-title":"Brief Bioinform"},{"issue":"4","key":"3683_CR23","doi-asserted-by":"publisher","first-page":"684","DOI":"10.1093\/bioinformatics\/btx670","volume":"34","author":"J Song","year":"2017","unstructured":"Song J, Li F, Leier A, et al. PROSPERous: high-throughput prediction of substrate cleavage sites for 90 proteases with improved accuracy. Bioinformatics. 
2017;34(4):684\u20137.","journal-title":"Bioinformatics"},{"key":"3683_CR24","doi-asserted-by":"crossref","unstructured":"De Ridder D, Kouropteva O, Okun O, et al. Supervised locally linear embedding. In:  Artificial Neural Networks and Neural Information Processing\u2014ICANN\/ICONIP: Springer; 2003. p. 333\u201341.","DOI":"10.1007\/3-540-44989-2_40"},{"issue":"1","key":"3683_CR25","doi-asserted-by":"publisher","first-page":"e86703","DOI":"10.1371\/journal.pone.0086703","volume":"9","author":"W Lou","year":"2014","unstructured":"Lou W, Wang X, Chen F, Chen Y, Jiang B, Zhang H. Sequence based prediction of DNA-binding proteins based on hybrid feature selection using random forest and Gaussian naive Bayes. PLoS One. 2014;9(1):e86703.","journal-title":"PLoS One"},{"issue":"8","key":"3683_CR26","doi-asserted-by":"publisher","first-page":"1226","DOI":"10.1109\/TPAMI.2005.159","volume":"27","author":"H Peng","year":"2005","unstructured":"Peng H, Long F, Ding C. Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell. 2005;27(8):1226\u201338.","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"issue":"1\u20133","key":"3683_CR27","doi-asserted-by":"publisher","first-page":"389","DOI":"10.1023\/A:1012487302797","volume":"46","author":"I Guyon","year":"2002","unstructured":"Guyon I, Weston J, Barnhill S, Vapnik V. Gene selection for cancer classification using support vector machines. Mach Learn. 2002;46(1\u20133):389\u2013422.","journal-title":"Mach Learn"},{"key":"3683_CR28","first-page":"19","volume-title":"VSURF: an R package for variable selection using random forests","author":"R Genuer","year":"2015","unstructured":"Genuer R, Poggi J-M, Tuleau-Malot C. VSURF: an R package for variable selection using random forests, vol. 7; 2015. p. 
19\u201333."},{"issue":"5","key":"3683_CR29","doi-asserted-by":"publisher","first-page":"779","DOI":"10.1093\/bioinformatics\/btx698","volume":"34","author":"Y Peng","year":"2017","unstructured":"Peng Y, Sun L, Jia Z, Li L, Alexov E. Predicting protein\u2013DNA binding free energy change upon missense mutations using modified MM\/PBSA approach: SAMPDI webserver. Bioinformatics. 2017;34(5):779\u201386.","journal-title":"Bioinformatics"},{"key":"3683_CR30","doi-asserted-by":"publisher","first-page":"e1006615","DOI":"10.1371\/journal.pcbi.1006615","volume":"14","author":"N Zhang","year":"2018","unstructured":"Zhang N, Chen Y, Zhao F, et al. PremPDI estimates and interprets the effects of missense mutations on protein\u2013DNA interactions. PLoS Comput Biol. 2018;14:e1006615.","journal-title":"PLoS Comput Biol"},{"key":"3683_CR31","doi-asserted-by":"publisher","first-page":"W241","DOI":"10.1093\/nar\/gkx236","volume":"45","author":"DEV Pires","year":"2017","unstructured":"Pires DEV, Ascher DB. mCSM-NA: predicting the effects of mutations on protein-nucleic acids interactions. Nucleic Acids Res. 
2017;45:W241\u20136.","journal-title":"Nucleic Acids Res"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-020-03683-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s12859-020-03683-3\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-020-03683-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,16]],"date-time":"2021-09-16T23:11:17Z","timestamp":1631833877000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-020-03683-3"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,9]]},"references-count":31,"journal-issue":{"issue":"S13","published-print":{"date-parts":[[2020,9]]}},"alternative-id":["3683"],"URL":"https:\/\/doi.org\/10.1186\/s12859-020-03683-3","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,9]]},"assertion":[{"value":"17 September 2020","order":1,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"Not applicable.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare that they have no competing interests.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"381"}}