{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,31]],"date-time":"2025-10-31T14:03:32Z","timestamp":1761919412053},"reference-count":26,"publisher":"Oxford University Press (OUP)","issue":"7","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2011,4,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Clustering is commonly used to identify the best decoy among many generated in protein structure prediction when using energy alone is insufficient. Calculation of the pairwise distance matrix for a large decoy set is computationally expensive. Typically, only a reduced set of decoys using energy filtering is subjected to clustering analysis. A fast clustering method for a large decoy set would be beneficial to protein structure prediction and this still poses a challenge.<\/jats:p>\n               <jats:p>Results: We propose a method using propagation of geometric constraints to accelerate exact clustering, without compromising the distance measure. Our method can be used with any metric distance. Metrics that are expensive to compute and have known cheap lower and upper bounds will benefit most from the method. We compared our method's accuracy against published results from the SPICKER clustering software on 40 large decoy sets from the I-TASSER protein folding engine. We also performed some additional speed comparisons on six targets from the \u2018semfold\u2019 decoy set. In our tests, our method chose a better decoy than the energy criterion in 25 out of 40 cases versus 20 for SPICKER. Our method also was shown to be consistently faster than another fast software performing exact clustering named Calibur. In some cases, our approach can even outperform the speed of an approximate method.<\/jats:p>\n               <jats:p>Availability: Our C++ software is released under the GNU General Public License. It can be downloaded from http:\/\/www.riken.jp\/zhangiru\/software\/durandal_released.tgz.<\/jats:p>\n               <jats:p>Contact: \u00a0kamzhang@riken.jp<\/jats:p>","DOI":"10.1093\/bioinformatics\/btr072","type":"journal-article","created":{"date-parts":[[2011,2,11]],"date-time":"2011-02-11T01:35:13Z","timestamp":1297388113000},"page":"939-945","source":"Crossref","is-referenced-by-count":23,"title":["Entropy-accelerated exact clustering of protein decoys"],"prefix":"10.1093","volume":"27","author":[{"given":"Francois","family":"Berenger","sequence":"first","affiliation":[{"name":"Zhang Initiative Research Unit, Advanced Science Institute, RIKEN, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan"}]},{"given":"Yong","family":"Zhou","sequence":"additional","affiliation":[{"name":"Zhang Initiative Research Unit, Advanced Science Institute, RIKEN, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan"}]},{"given":"Rojan","family":"Shrestha","sequence":"additional","affiliation":[{"name":"Zhang Initiative Research Unit, Advanced Science Institute, RIKEN, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan"}]},{"given":"Kam Y. J.","family":"Zhang","sequence":"additional","affiliation":[{"name":"Zhang Initiative Research Unit, Advanced Science Institute, RIKEN, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan"}]}],"member":"286","published-online":{"date-parts":[[2011,2,9]]},"reference":[{"key":"2023012512180218300_B1","doi-asserted-by":"crossref","first-page":"223","DOI":"10.1126\/science.181.4096.223","article-title":"Principles that govern the folding of protein chains","volume":"181","author":"Anfinsen","year":"1973","journal-title":"Science"},{"key":"2023012512180218300_B2","doi-asserted-by":"crossref","first-page":"2918","DOI":"10.1093\/bioinformatics\/btq542","article-title":"PAR: a PARallel and distributed job crusher","volume":"26","author":"Berenger","year":"2010","journal-title":"Bioinformatics"},{"key":"2023012512180218300_B3","doi-asserted-by":"crossref","first-page":"339","DOI":"10.1002\/1096-987X(200102)22:3<339::AID-JCC1006>3.0.CO;2-R","article-title":"Finding the needle in a haystack: educing native folds from ambiguous ab initio protein structure predictions","volume":"22","author":"Betancourt","year":"2001","journal-title":"J. Comput. Chem."},{"key":"2023012512180218300_B4","doi-asserted-by":"crossref","first-page":"119","DOI":"10.1002\/prot.1170","article-title":"Rosetta in casp4: Progress in ab initio protein structure prediction","volume":"45","author":"Bonneau","year":"2001","journal-title":"Proteins Struct. Funct. Bioinformatics"},{"key":"2023012512180218300_B5","doi-asserted-by":"crossref","first-page":"363","DOI":"10.1146\/annurev.biochem.77.062906.171838","article-title":"Macromolecular modeling with rosetta","volume":"77","author":"Das","year":"2008","journal-title":"Annu. Rev. Biochem."},{"key":"2023012512180218300_B6","doi-asserted-by":"crossref","first-page":"3179","DOI":"10.1093\/bioinformatics\/bti450","article-title":"Hcpm\u2013program for hierarchical clustering of protein models","volume":"21","author":"Gront","year":"2005","journal-title":"Bioinformatics"},{"key":"2023012512180218300_B7","doi-asserted-by":"crossref","first-page":"828","DOI":"10.1002\/qua.20741","article-title":"Exploring protein energy landscapes with hierarchical clustering","volume":"105","author":"Gront","year":"2005","journal-title":"Int. J. Quant. Chem."},{"key":"2023012512180218300_B8","doi-asserted-by":"crossref","DOI":"10.1007\/978-0-387-84858-7","volume-title":"The Elements of Statistical Learning: Data Mining, Inference, and Prediction","author":"Hastie","year":"2009","edition":"Second"},{"key":"2023012512180218300_B9","first-page":"228","article-title":"Development of an ab initio protein structure prediction system able","volume":"14","author":"Ishida","year":"2003","journal-title":"Genome Inform."},{"key":"2023012512180218300_B10","doi-asserted-by":"crossref","first-page":"264","DOI":"10.1145\/331499.331504","article-title":"Data clustering: A review","volume":"31","author":"Jain","year":"1999","journal-title":"ACM Comput. Surv."},{"key":"2023012512180218300_B11","doi-asserted-by":"crossref","first-page":"554","DOI":"10.1002\/(SICI)1097-0134(19991201)37:4<554::AID-PROT6>3.0.CO;2-1","article-title":"Unit-vector rms (urms) as a tool to analyze molecular dynamics trajectories","volume":"37","author":"Kedem","year":"1999","journal-title":"Proteins"},{"key":"2023012512180218300_B12","doi-asserted-by":"crossref","first-page":"726","DOI":"10.1529\/biophysj.107.116095","article-title":"Folding pathway of the b1 domain of protein g explored by multiscale modeling","volume":"94","author":"Kmiecik","year":"2008","journal-title":"Biophys. J."},{"key":"2023012512180218300_B13","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/S0925-2312(98)00030-7","article-title":"The self-organizing map","volume":"21","author":"Kohonen","year":"1998","journal-title":"Neurocomputing"},{"key":"2023012512180218300_B14","doi-asserted-by":"crossref","first-page":"1189","DOI":"10.1002\/jcc.20251","article-title":"Scud: Fast structure clustering of decoys using reference state to remove overall rotation","volume":"26","author":"Li","year":"2005","journal-title":"J. Comput. Chem."},{"key":"2023012512180218300_B15","doi-asserted-by":"crossref","first-page":"985","DOI":"10.1002\/prot.21084","article-title":"A model of local-minima distribution on conformational space and its application to protein structure prediction","volume":"64","author":"Li","year":"2006","journal-title":"Proteins Struct. Funct. Bioinformatics"},{"key":"2023012512180218300_B16","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1186\/1471-2105-11-25","article-title":"Calibur: a tool for clustering large numbers of protein decoys","volume":"11","author":"Li","year":"2010","journal-title":"BMC Bioinformatics"},{"key":"2023012512180218300_B17","doi-asserted-by":"crossref","first-page":"89","DOI":"10.1002\/prot.22540","article-title":"Structure prediction for casp8 with all-atom refinement using rosetta","volume":"77","author":"Raman","year":"2009","journal-title":"Proteins Struct. Funct. Bioinformatics"},{"key":"2023012512180218300_B18","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1186\/1472-6807-2-3","article-title":"A comprehensive analysis of 40 blind protein structure predictions","volume":"2","author":"Samudrala","year":"2002","journal-title":"BMC Struct. Biol."},{"key":"2023012512180218300_B19","doi-asserted-by":"crossref","first-page":"339","DOI":"10.1145\/359581.359599","article-title":"The choice of reference points in best-match file searching","volume":"20","author":"Shapiro","year":"1977","journal-title":"Commun. ACM"},{"key":"2023012512180218300_B20","doi-asserted-by":"crossref","first-page":"11158","DOI":"10.1073\/pnas.95.19.11158","article-title":"Clustering of low-energy conformations near the native structures of small proteins","volume":"95","author":"Shortle","year":"1998","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012512180218300_B21","doi-asserted-by":"crossref","first-page":"166","DOI":"10.1016\/j.sbi.2006.02.004","article-title":"In quest of an empirical potential for protein structure prediction","volume":"16","author":"Skolnick","year":"2006","journal-title":"Curr. Opin. Struct. Biol."},{"key":"2023012512180218300_B22","doi-asserted-by":"crossref","first-page":"506","DOI":"10.1107\/S0108767302011637","article-title":"A revised proof of the metric properties of optimally superimposed vector sets","volume":"58","author":"Steipe","year":"2002","journal-title":"Acta Crystallogr. Sect. A"},{"key":"2023012512180218300_B23","doi-asserted-by":"crossref","first-page":"17","DOI":"10.1186\/1741-7007-5-17","article-title":"Ab initio modeling of small proteins by iterative tasser simulations","volume":"5","author":"Wu","year":"2007","journal-title":"BMC Biol."},{"key":"2023012512180218300_B24","doi-asserted-by":"crossref","first-page":"865","DOI":"10.1002\/jcc.20011","article-title":"Spicker: a clustering approach to identify near-native protein folds","volume":"25","author":"Zhang","year":"2004","journal-title":"J. Comput. Chem."},{"key":"2023012512180218300_B25","doi-asserted-by":"crossref","first-page":"2647","DOI":"10.1529\/biophysj.104.045385","article-title":"Tertiary structure predictions on a comprehensive benchmark of medium to large size proteins","volume":"87","author":"Zhang","year":"2004","journal-title":"Biophys. J."},{"key":"2023012512180218300_B26","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1002\/prot.20724","article-title":"Tasser: an automated method for the prediction of protein tertiary structures in casp6","volume":"61","author":"Zhang","year":"2005","journal-title":"Proteins Struct. Funct. Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/27\/7\/939\/48869494\/bioinformatics_27_7_939.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/27\/7\/939\/48869494\/bioinformatics_27_7_939.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,25]],"date-time":"2023-01-25T14:55:04Z","timestamp":1674658504000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/27\/7\/939\/233344"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2011,2,9]]},"references-count":26,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2011,4,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btr072","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2011,4,1]]},"published":{"date-parts":[[2011,2,9]]}}}