{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,9]],"date-time":"2025-11-09T03:44:15Z","timestamp":1762659855142},"reference-count":33,"publisher":"Springer Science and Business Media LLC","issue":"S25","license":[{"start":{"date-parts":[[2019,12,1]],"date-time":"2019-12-01T00:00:00Z","timestamp":1575158400000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"},{"start":{"date-parts":[[2019,12,24]],"date-time":"2019-12-24T00:00:00Z","timestamp":1577145600000},"content-version":"vor","delay-in-days":23,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2019,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec>\n                <jats:title>Background<\/jats:title>\n                <jats:p>In ab initio protein-structure prediction, a large set of structural decoys is often generated, from which the best five or three candidates must be selected. The clustered central structures with the largest number of neighbors are frequently regarded as the near-native protein structures with the lowest free energy; however, limitations in clustering methods and three-dimensional structural-distance assessments make it difficult to identify the exact order of the best five or three near-native candidate structures.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Results<\/jats:title>\n                <jats:p>To address this issue, we propose a method that re-ranks the candidate structures via random forest classification using intra- and inter-cluster features from the results of the clustering. 
Comparative analysis indicated that our method was better able to identify the order of the candidate structures than the current methods SPICKER, Calibur, and Durandal. The results confirmed that the first model identified was closer to the native structure in 12 of 43 cases versus four for SPICKER, and the same as the native structure in up to 27 of 43 cases versus 14 for Calibur and up to eight of 43 cases versus two for Durandal.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Conclusions<\/jats:title>\n                <jats:p>In this study, we presented an improved method based on random forest classification that transforms the problem of re-ranking the candidate structures into a binary classification problem. Our results indicate that this method is effective for this problem and performs better than the compared methods.<\/jats:p>\n              <\/jats:sec>","DOI":"10.1186\/s12859-019-3257-8","type":"journal-article","created":{"date-parts":[[2019,12,24]],"date-time":"2019-12-24T09:02:35Z","timestamp":1577178155000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["Ranking near-native candidate protein structures via random forest 
classification"],"prefix":"10.1186","volume":"20","author":[{"given":"Hongjie","family":"Wu","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hongmei","family":"Huang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Weizhong","family":"Lu","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Qiming","family":"Fu","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yijie","family":"Ding","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jing","family":"Qiu","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Haiou","family":"Li","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2019,12,24]]},"reference":[{"issue":"2","key":"3257_CR1","doi-asserted-by":"publisher","first-page":"221","DOI":"10.1002\/pmic.201200334","volume":"13","author":"J Zhang","year":"2013","unstructured":"Zhang J, Xu D. Fast algorithm for population-based protein structural model analysis. PROTEOMICS. 2013;13(2):221\u20139.","journal-title":"PROTEOMICS"},{"issue":"7","key":"3257_CR2","doi-asserted-by":"publisher","first-page":"e38799","DOI":"10.1371\/journal.pone.0038799","volume":"7","author":"D Simoncini","year":"2012","unstructured":"Simoncini D, Berenger F, Shrestha R, et al. A probabilistic fragment-based protein structure prediction algorithm. PLoS One. 2012;7(7):e38799.","journal-title":"PLoS One"},{"key":"3257_CR3","doi-asserted-by":"publisher","first-page":"291","DOI":"10.1146\/annurev.biophys.29.1.291","volume":"29","author":"MA Marti-Renom","year":"2000","unstructured":"Marti-Renom MA, Stuart A, Fiser A, et al. Comparative protein structure modeling of genes and genomes [J]. 
Annu Rev Biophys Biomol Struct. 2000;29:291\u2013325.","journal-title":"Annu Rev Biophys Biomol Struct"},{"issue":"7620","key":"3257_CR4","doi-asserted-by":"publisher","first-page":"320","DOI":"10.1038\/nature19946","volume":"537","author":"PS Huang","year":"2016","unstructured":"Huang PS, Boyken SE, Baker D. The coming of age of de novo protein design. Nature. 2016;537(7620):320\u20137.","journal-title":"Nature"},{"issue":"6","key":"3257_CR5","doi-asserted-by":"publisher","first-page":"865","DOI":"10.1002\/jcc.20011","volume":"25","author":"Y Zhang","year":"2004","unstructured":"Zhang Y, Skolnick J. SPICKER: a clustering approach to identify near-native protein folds. J Comput Chem. 2004;25(6):865\u201371.","journal-title":"J Comput Chem"},{"issue":"1","key":"3257_CR6","doi-asserted-by":"publisher","first-page":"25","DOI":"10.1186\/1471-2105-11-25","volume":"11","author":"SC Li","year":"2010","unstructured":"Li SC, Ng YK. Calibur: a tool for clustering large numbers of protein decoys. BMC Bioinformatics. 2010;11(1):25\u20130.","journal-title":"BMC Bioinformatics"},{"issue":"7","key":"3257_CR7","doi-asserted-by":"publisher","first-page":"939","DOI":"10.1093\/bioinformatics\/btr072","volume":"27","author":"F Berenger","year":"2011","unstructured":"Berenger F. Zhou, et al. entropy-accelerated exact clustering of protein decoys. Bioinformatics. 2011;27(7):939\u201345.","journal-title":"Bioinformatics"},{"issue":"1","key":"3257_CR8","first-page":"24","volume":"37","author":"X Huang","year":"2011","unstructured":"Huang X, Lu Q, Qian P. Evaluation of protein structure prediction clustering algorithm. Comput Eng. 2011;37(1):24\u20137.","journal-title":"Comput Eng"},{"issue":"3","key":"3257_CR9","doi-asserted-by":"publisher","first-page":"765","DOI":"10.1109\/TCBB.2011.142","volume":"9","author":"SC Li","year":"2012","unstructured":"Li SC, Bu D, Li M. Clustering 100,000 protein structure decoys in minutes. IEEE\/ACM Transac Comput Biol Bioinformatics. 
2012;9(3):765\u201373.","journal-title":"IEEE\/ACM Transac Comput Biol Bioinformatics"},{"issue":"7","key":"3257_CR10","doi-asserted-by":"publisher","first-page":"2302","DOI":"10.1093\/nar\/gki524","volume":"33","author":"Y Zhang","year":"2005","unstructured":"Zhang Y, Skolnick J. TM-align: a protein structure alignment algorithm based on the TM-score [J]. Nucleic Acids Res. 2005;33(7):2302\u20139.","journal-title":"Nucleic Acids Res"},{"key":"3257_CR11","doi-asserted-by":"crossref","unstructured":"Liu H, Mo Y, Wang J, et al. A new feature selection method based on clustering[C], Eighth International Conference on Fuzzy Systems & Knowledge Discovery. Shanghai: IEEE; 2011.","DOI":"10.1109\/FSKD.2011.6019687"},{"issue":"2","key":"3257_CR12","doi-asserted-by":"publisher","first-page":"457","DOI":"10.1109\/TCBB.2013.10","volume":"10","author":"DS Huang","year":"2013","unstructured":"Huang DS, Yu HJ. Normalized feature vectors: a novel alignment-free sequence comparison method based on the numbers of adjacent amino acids. IEEE\/ACM Transac Comput Biol Bioinformatics. 2013;10(2):457\u201367.","journal-title":"IEEE\/ACM Transac Comput Biol Bioinformatics"},{"issue":"2","key":"3257_CR13","doi-asserted-by":"publisher","first-page":"833","DOI":"10.1109\/TCE.2011.5955230","volume":"57","author":"FU Siddiqui","year":"2011","unstructured":"Siddiqui FU, Mat Isa NA. Enhanced moving K-means (EMKM) algorithm for image segmentation [J]. IEEE Trans Consum Electron. 2011;57(2):833\u201341.","journal-title":"IEEE Trans Consum Electron"},{"issue":"22","key":"3257_CR14","doi-asserted-by":"publisher","first-page":"3835","DOI":"10.1093\/bioinformatics\/bty458","volume":"34","author":"B Liu","year":"2018","unstructured":"Liu B, Weng F, et al. iEnhancer-EL: Identifying enhancers and their strength with ensemble learning approach. Bioinformatics. 
2018;34(22):3835\u201342.","journal-title":"Bioinformatics"},{"issue":"1","key":"3257_CR15","doi-asserted-by":"publisher","first-page":"28","DOI":"10.1080\/10361146.2012.755670","volume":"48","author":"R Hoffman","year":"2013","unstructured":"Hoffman R, Lazaridis D. The limits of compulsion: demographic influences on voter turnout in Australian state elections. Aust J Polit Sci. 2013;48(1):28\u201343.","journal-title":"Aust J Polit Sci"},{"issue":"6","key":"3257_CR16","doi-asserted-by":"publisher","first-page":"553","DOI":"10.2174\/1389203715666140724084019","volume":"15","author":"DS Huang","year":"2014","unstructured":"Huang DS, Zhang L, et al. Prediction of protein-protein interactions based on protein-protein correlation using least squares regression. Curr Protein Pept Sci. 2014;15(6):553\u201360.","journal-title":"Curr Protein Pept Sci"},{"key":"3257_CR17","first-page":"88","volume":"1","author":"Q Liu","year":"2014","unstructured":"Liu Q, Lu J, Chen S. Design and analysis of traffic incident detection method based on random forest. J Southeast Univ (English Edition). 2014;1:88\u201395.","journal-title":"J Southeast Univ (English Edition)"},{"issue":"1","key":"3257_CR18","doi-asserted-by":"publisher","first-page":"33","DOI":"10.1093\/bioinformatics\/btx579","volume":"34","author":"B Liu","year":"2018","unstructured":"Liu B, Yang F, et al. iPromoter-2L: a two-layer predictor for identifying promoters and their types by multi-window-based PseKNC. Bioinformatics. 2018;34(1):33\u201340. https:\/\/doi.org\/10.1093\/bioinformatics\/btx579.","journal-title":"Bioinformatics"},{"issue":"2","key":"3257_CR19","first-page":"226","volume":"48","author":"J Dang","year":"2017","unstructured":"Dang J, Jia R, Luo X, et al. Research on wear properties assessment of tubular turbine guide bearing based on H-K clustering-logistic regression model. Shuili Xuebao\/J Hydraulic Eng. 
2017;48(2):226\u201333.","journal-title":"Shuili Xuebao\/J Hydraulic Eng"},{"issue":"5","key":"3257_CR20","doi-asserted-by":"publisher","first-page":"1154","DOI":"10.1109\/TCBB.2016.2609420","volume":"14","author":"L Yuan","year":"2017","unstructured":"Yuan L, Zhu L, et al. Nonconvex penalty based low-rank representation and sparse regression for eQTL mapping. IEEE\/ACM Transac Comput Biol Bioinformatics. 2017;14(5):1154\u201364.","journal-title":"IEEE\/ACM Transac Comput Biol Bioinformatics"},{"issue":"1","key":"3257_CR21","doi-asserted-by":"publisher","first-page":"62","DOI":"10.1186\/1471-2105-14-62","volume":"14","author":"M Jamroz","year":"2013","unstructured":"Jamroz M, Kolinski A. ClusCo: clustering and comparison of protein models. Bmc Bioinformatics. 2013;14(1):62.","journal-title":"Bmc Bioinformatics"},{"key":"3257_CR22","unstructured":"Wang A, Wan G, Cheng Z, et al. An incremental extremely random forest classifier for online learning and tracking[C]. IEEE International Conference on Image Processing. Hong Kong: IEEE; 2010."},{"issue":"C","key":"3257_CR23","doi-asserted-by":"crossref","first-page":"112","DOI":"10.1016\/j.ins.2014.03.043","volume":"285","author":"S del R\u00edo","year":"2014","unstructured":"del R\u00edo S. L\u00f3pez, Victoria, Ben\u00edtez, Jos\u00e9 Manuel, et al. on the use of MapReduce for imbalanced big data using random forest. Inform Sci Int J. 2014;285(C):112\u201337.","journal-title":"Inform Sci Int J"},{"issue":"6","key":"3257_CR24","doi-asserted-by":"publisher","first-page":"859","DOI":"10.1093\/bioinformatics\/btv684","volume":"32","author":"P Pudlo","year":"2015","unstructured":"Pudlo P, Marin JM, Estoup A, et al. Reliable ABC model choice via random forests. Bioinformatics. 2015;32(6):859\u201366.","journal-title":"Bioinformatics"},{"issue":"18","key":"3257_CR25","first-page":"1","volume":"2017","author":"H Wu","year":"2017","unstructured":"Wu H, Li H, Min J, et al. 
Identify high-quality protein structural models by enhanced K-means [J]. Biomed Res Int. 2017;2017(18):1\u20139.","journal-title":"Biomed Res Int"},{"issue":"14","key":"3257_CR26","doi-asserted-by":"crossref","first-page":"i243","DOI":"10.1093\/bioinformatics\/btx255","volume":"33","author":"L Zhu","year":"2017","unstructured":"Zhu L, Zhang HB, et al. Direct AUC optimization of regulatory motifs. Bioinformatics. 2017;33(14):i243\u201351.","journal-title":"Bioinformatics"},{"issue":"21","key":"3257_CR27","doi-asserted-by":"publisher","first-page":"2744","DOI":"10.1093\/bioinformatics\/btq510","volume":"26","author":"ZH You","year":"2010","unstructured":"You ZH, Lei YK, et al. Using manifold embedding for assessing and predicting protein interactions from high-throughput experimental data. Bioinformatics. 2010;26(21):2744\u201351.","journal-title":"Bioinformatics"},{"key":"3257_CR28","doi-asserted-by":"crossref","unstructured":"Yu H, Zhang C, Wang G. A tree-based incremental overlapping clustering method using the three-way decision theory. Knowl-Based Syst. 2016;91:189\u2013203.","DOI":"10.1016\/j.knosys.2015.05.028"},{"issue":"S1","key":"3257_CR29","doi-asserted-by":"publisher","first-page":"233","DOI":"10.1002\/prot.24918","volume":"84","author":"J Yang","year":"2016","unstructured":"Yang J, Zhang W, He B, et al. Template-based protein structure prediction in CASP11 and retrospect of I-TASSER in the last decade. Proteins Struct Funct Bioinformatics. 2016;84(S1):233\u201346.","journal-title":"Proteins Struct Funct Bioinformatics"},{"issue":"2","key":"3257_CR30","doi-asserted-by":"publisher","first-page":"0","DOI":"10.1006\/jmbi.2000.4170","volume":"304","author":"E Katoh","year":"2000","unstructured":"Katoh E, Hatta T, Shindo H, et al. High precision NMR structure of YhhP, a novel Escherichia coli protein implicated in cell division. J Mol Biol. 
2000;304(2):0\u2013229.","journal-title":"J Mol Biol"},{"issue":"1","key":"3257_CR31","doi-asserted-by":"publisher","first-page":"80","DOI":"10.1186\/s13059-018-1459-4","volume":"19","author":"GH Chuai","year":"2018","unstructured":"Chuai GH, Ma H, Yan JF, et al. DeepCRISPR: optimized CRISPR guide RNA design by deep learning. Genome Biol. 2018;19(1):80.","journal-title":"Genome Biol"},{"issue":"2","key":"3257_CR32","doi-asserted-by":"publisher","first-page":"345","DOI":"10.1109\/TCBB.2015.2407393","volume":"14","author":"L Zhu","year":"2017","unstructured":"Zhu L, Deng SP, et al. Identifying spurious interactions in the protein-protein interaction networks using local similarity preserving embedding. IEEE\/ACM Transac Comput Biol Bioinformatics. 2017;14(2):345\u201352.","journal-title":"IEEE\/ACM Transac Comput Biol Bioinformatics"},{"issue":"2","key":"3257_CR33","doi-asserted-by":"publisher","first-page":"261","DOI":"10.1002\/(SICI)1097-0134(199706)28:2<261::AID-PROT13>3.0.CO;2-G","volume":"28","author":"X Zhang","year":"2015","unstructured":"Zhang X, Boyar W, Toth MJ, et al. Structural definition of the C5a C terminus by two-dimensional nuclear magnetic resonance spectroscopy. Proteins Struct Func Bioinformatics. 
2015;28(2):261\u20137.","journal-title":"Proteins Struct Func Bioinformatics"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-019-3257-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1186\/s12859-019-3257-8\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-019-3257-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2020,12,23]],"date-time":"2020-12-23T00:13:48Z","timestamp":1608682428000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-019-3257-8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,12]]},"references-count":33,"journal-issue":{"issue":"S25","published-print":{"date-parts":[[2019,12]]}},"alternative-id":["3257"],"URL":"https:\/\/doi.org\/10.1186\/s12859-019-3257-8","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,12]]},"assertion":[{"value":"24 December 2019","order":1,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"Not applicable.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare that they have no competing interests.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"683"}}