{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,31]],"date-time":"2025-10-31T14:05:07Z","timestamp":1761919507759},"reference-count":30,"publisher":"Oxford University Press (OUP)","issue":"4","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2012,2,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Clustering protein structures is an important task in structural bioinformatics. De novo structure prediction, for example, often involves a clustering step for finding the best prediction. Other applications include assigning proteins to fold families and analyzing molecular dynamics trajectories.<\/jats:p>\n               <jats:p>Results: We present Pleiades, a novel approach to clustering protein structures with a rigorous mathematical underpinning. The method approximates clustering based on the root mean square deviation by first mapping structures to Gauss integral vectors\u2014which were introduced by R\u00f8gen and co-workers\u2014and subsequently performing K-means clustering.<\/jats:p>\n               <jats:p>Conclusions: Compared to current methods, Pleiades dramatically improves on the time needed to perform clustering, and can cluster a significantly larger number of structures, while providing state-of-the-art results. The number of low energy structures generated in a typical folding study, which is in the order of 50 000 structures, can be clustered within seconds to minutes.<\/jats:p>\n               <jats:p>Contact: \u00a0thamelry@binf.ku.dk; harder@binf.ku.dk<\/jats:p>\n               <jats:p>Supplementary Information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btr692","type":"journal-article","created":{"date-parts":[[2011,12,24]],"date-time":"2011-12-24T02:37:52Z","timestamp":1324694272000},"page":"510-515","source":"Crossref","is-referenced-by-count":31,"title":["Fast large-scale clustering of protein structures using Gauss integrals"],"prefix":"10.1093","volume":"28","author":[{"given":"Tim","family":"Harder","sequence":"first","affiliation":[{"name":"1 The Bioinformatics Section, Department of Biology, University of Copenhagen, Copenhagen, 2Department of Electrical Engineering and 3Department of Mathematics, Technical University of Denmark, Lyngby, Denmark"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mikael","family":"Borg","sequence":"additional","affiliation":[{"name":"1 The Bioinformatics Section, Department of Biology, University of Copenhagen, Copenhagen, 2Department of Electrical Engineering and 3Department of Mathematics, Technical University of Denmark, Lyngby, Denmark"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Wouter","family":"Boomsma","sequence":"additional","affiliation":[{"name":"1 The Bioinformatics Section, Department of Biology, University of Copenhagen, Copenhagen, 2Department of Electrical Engineering and 3Department of Mathematics, Technical University of Denmark, Lyngby, Denmark"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Peter","family":"R\u00f8gen","sequence":"additional","affiliation":[{"name":"1 The Bioinformatics Section, Department of Biology, University of Copenhagen, Copenhagen, 2Department of Electrical Engineering and 3Department of Mathematics, Technical University of Denmark, Lyngby, Denmark"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Thomas","family":"Hamelryck","sequence":"additional","affiliation":[{"name":"1 The Bioinformatics Section, Department of Biology, University of Copenhagen, Copenhagen, 2Department of Electrical Engineering and 3Department of Mathematics, Technical University of Denmark, Lyngby, Denmark"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2011,12,22]]},"reference":[{"key":"2023012512182800100_B1","first-page":"1027","article-title":"k-means++: the advantages of careful seeding","volume-title":"Proceedings of the 18th Annual ACM-SIAM Symposium","author":"Arthur","year":"2007"},{"key":"2023012512182800100_B2","doi-asserted-by":"crossref","first-page":"939","DOI":"10.1093\/bioinformatics\/btr072","article-title":"Entropy-accelerated exact clustering of protein decoys","volume":"27","author":"Berenger","year":"2011","journal-title":"Bioinformatics"},{"key":"2023012512182800100_B3","doi-asserted-by":"crossref","first-page":"235","DOI":"10.1093\/nar\/28.1.235","article-title":"The protein data bank","volume":"28","author":"Berman","year":"2000","journal-title":"Nucleic Acids Res."},{"key":"2023012512182800100_B4","volume-title":"Pattern Recognition and Machine Learning.","author":"Bishop","year":"2006"},{"key":"2023012512182800100_B5","doi-asserted-by":"crossref","first-page":"8932","DOI":"10.1073\/pnas.0801715105","article-title":"A generative, probabilistic model of local protein structure","volume":"105","author":"Boomsma","year":"2008","journal-title":"Proc Natl Acad Sci USA"},{"key":"2023012512182800100_B6","first-page":"65","article-title":"A probabilistic approach to protein structure prediction: PHAISTOS in CASP9","volume-title":"LASR","author":"Borg","year":"2009"},{"key":"2023012512182800100_B7","doi-asserted-by":"crossref","first-page":"2559","DOI":"10.1093\/bioinformatics\/btp474","article-title":"Efficient SCOP-fold classification and retrieval using index-based protein substructure alignments","volume":"25","author":"Chi","year":"2009","journal-title":"Bioinformatics"},{"key":"2023012512182800100_B8","doi-asserted-by":"crossref","first-page":"306","DOI":"10.1186\/1471-2105-11-306","article-title":"Beyond rotamers: a generative, probabilistic model of side chains in proteins","volume":"11","author":"Harder","year":"2010","journal-title":"BMC Bioinformatics"},{"key":"2023012512182800100_B9","doi-asserted-by":"crossref","first-page":"193","DOI":"10.1007\/BF01908075","article-title":"Comparing partitions","volume":"2","author":"Hubert","year":"1985","journal-title":"J. Class."},{"key":"2023012512182800100_B10","doi-asserted-by":"crossref","first-page":"651","DOI":"10.1016\/j.patrec.2009.09.011","article-title":"Data clustering: 50 years beyond K-means","volume":"31","author":"Jain","year":"2010","journal-title":"Pattern Recognit. Lett."},{"key":"2023012512182800100_B11","doi-asserted-by":"crossref","first-page":"922","DOI":"10.1107\/S0567739476001873","article-title":"A solution for the best rotation to relate two sets of vectors","volume":"32","author":"Kabsch","year":"1976","journal-title":"Acta Crystallogr. A"},{"key":"2023012512182800100_B12","doi-asserted-by":"crossref","first-page":"827","DOI":"10.1107\/S0567739478001680","article-title":"A discussion of the solution for the best rotation to relate two sets of vectors","volume":"34","author":"Kabsch","year":"1978","journal-title":"Acta Crystallogr. A"},{"key":"2023012512182800100_B13","doi-asserted-by":"crossref","first-page":"13","DOI":"10.1016\/j.tibs.2004.11.008","article-title":"Protein folding and the organization of the protein topology universe","volume":"30","author":"Lindorff-Larsen","year":"2005","journal-title":"Trends Biochem. Sci."},{"key":"2023012512182800100_B14","doi-asserted-by":"crossref","first-page":"1189","DOI":"10.1002\/jcc.20251","article-title":"SCUD: fast structure clustering of decoys using reference state to remove overall rotation","volume":"26","author":"Li","year":"2005","journal-title":"J. Comput. Chem."},{"key":"2023012512182800100_B15","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1186\/1471-2105-11-25","article-title":"Calibur: a tool for clustering large numbers of protein decoys","volume":"11","author":"Li","year":"2010","journal-title":"BMC Bioinformatics"},{"key":"2023012512182800100_B16","doi-asserted-by":"crossref","first-page":"129","DOI":"10.1109\/TIT.1982.1056489","article-title":"Least squares quantization in PCM","volume":"28","author":"Lloyd","year":"1982","journal-title":"IEEE Trans. Inf. Theory"},{"key":"2023012512182800100_B17","doi-asserted-by":"crossref","first-page":"147","DOI":"10.1016\/S0969-2126(96)00018-4","article-title":"Adenylate kinase motions during catalysis: an energetic counterweight balancing substrate binding","volume":"4","author":"M\u00fcller","year":"1996","journal-title":"Structure"},{"key":"2023012512182800100_B18","doi-asserted-by":"crossref","first-page":"536","DOI":"10.1016\/S0022-2836(05)80134-2","article-title":"SCOP: a structural classification of proteins database for the investigation of sequences and structures","volume":"247","author":"Murzin","year":"1995","journal-title":"J. Mol. Biol."},{"key":"2023012512182800100_B19","doi-asserted-by":"crossref","first-page":"1093","DOI":"10.1016\/S0969-2126(97)00260-8","article-title":"CATH: a hierarchic classification of protein domain structures","volume":"5","author":"Orengo","year":"1997","journal-title":"Structure"},{"key":"2023012512182800100_B20","doi-asserted-by":"crossref","first-page":"846","DOI":"10.1080\/01621459.1971.10482356","article-title":"Objective criteria for the evaluation of clustering methods","volume":"66","author":"Rand","year":"1971","journal-title":"J. Am. Stat. Assoc."},{"key":"2023012512182800100_B21","doi-asserted-by":"crossref","first-page":"167","DOI":"10.1016\/S0025-5564(02)00216-X","article-title":"A new family of global protein shape descriptors","volume":"182","author":"R\u00f8gen","year":"2003","journal-title":"Math. Biosci."},{"key":"2023012512182800100_B22","doi-asserted-by":"crossref","first-page":"119","DOI":"10.1073\/pnas.2636460100","article-title":"Automatic classification of protein structure by using Gauss integrals","volume":"100","author":"R\u00f8gen","year":"2003","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012512182800100_B23","doi-asserted-by":"crossref","first-page":"1523","DOI":"10.1088\/0953-8984\/17\/18\/010","article-title":"Evaluating protein structure descriptors and tuning Gauss integral based descriptors","volume":"17","author":"R\u00f8gen","year":"2005","journal-title":"J. Phys. Condens. Matter"},{"key":"2023012512182800100_B24","doi-asserted-by":"crossref","first-page":"11158","DOI":"10.1073\/pnas.95.19.11158","article-title":"Clustering of low-energy conformations near the native structures of small proteins","volume":"95","author":"Shortle","year":"1998","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012512182800100_B25","doi-asserted-by":"crossref","first-page":"209","DOI":"10.1006\/jmbi.1997.0959","article-title":"Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and bayesian scoring functions","volume":"268","author":"Simons","year":"1997","journal-title":"J. Mol. Biol."},{"key":"2023012512182800100_B26","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1348\/000711005X48266","article-title":"K-means clustering: a half-century synthesis","volume":"59","author":"Steinley","year":"2006","journal-title":"Br. J. Math. Stat. Psychol."},{"key":"2023012512182800100_B27","doi-asserted-by":"crossref","first-page":"2171","DOI":"10.1093\/bioinformatics\/btl332","article-title":"THESEUS: maximum likelihood superpositioning and analysis of macromolecular structures","volume":"22","author":"Theobald","year":"2006","journal-title":"Bioinformatics"},{"key":"2023012512182800100_B28","doi-asserted-by":"crossref","first-page":"293","DOI":"10.1016\/S0969-2126(00)00031-9","article-title":"The sequence, crystal structure determination and refinement of two crystal forms of lipase B from Candida antarctica","volume":"2","author":"Uppenberg","year":"1994","journal-title":"Structure"},{"key":"2023012512182800100_B29","doi-asserted-by":"crossref","first-page":"17","DOI":"10.1186\/1741-7007-5-17","article-title":"Ab initio modeling of small proteins by iterative TASSER simulations","volume":"5","author":"Wu","year":"2007","journal-title":"BMC Biol."},{"key":"2023012512182800100_B30","doi-asserted-by":"crossref","first-page":"865","DOI":"10.1002\/jcc.20011","article-title":"SPICKER: a clustering approach to identify near-native protein folds","volume":"25","author":"Zhang","year":"2004","journal-title":"J. Comput. Chem."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/28\/4\/510\/48876466\/bioinformatics_28_4_510.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/28\/4\/510\/48876466\/bioinformatics_28_4_510.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,25]],"date-time":"2023-01-25T15:00:08Z","timestamp":1674658808000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/28\/4\/510\/212290"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2011,12,22]]},"references-count":30,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2012,2,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btr692","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2012,2,15]]},"published":{"date-parts":[[2011,12,22]]}}}