{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2023,8,19]],"date-time":"2023-08-19T21:03:49Z","timestamp":1692479029450},"reference-count":13,"publisher":"Springer Science and Business Media LLC","issue":"S13","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2012,8]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>Numerous types of clustering like single linkage and K-means have been widely studied and applied to a variety of scientific problems. However, the existing methods are not readily applicable for the problems that demand high stringency.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Methods<\/jats:title>\n            <jats:p>Our method, self consistency grouping, i.e. <jats:italic>SCG<\/jats:italic>, yields clusters whose members are closer in rank to each other than to any member outside the cluster. We do not define a distance metric; we use the best known distance metric and presume that it measures the correct distance. SCG does not impose any restriction on the size or the number of the clusters that it finds. The boundaries of clusters are determined by the inconsistencies in the ranks. In addition to the direct implementation that finds the complete structure of the (sub)clusters we implemented two faster versions. The fastest version is guaranteed to find only the clusters that are not subclusters of any other clusters and the other version yields the same output as the direct implementation but does so more efficiently.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>Our tests have demonstrated that SCG yields very few false positives. This was accomplished by introducing errors in the distance measurement. Clustering of protein domain representatives by structural similarity showed that SCG could recover homologous groups with high precision.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusions<\/jats:title>\n            <jats:p>SCG has potential for finding biological relationships under stringent conditions.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1471-2105-13-s13-s3","type":"journal-article","created":{"date-parts":[[2012,12,5]],"date-time":"2012-12-05T22:16:16Z","timestamp":1354745776000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Self consistency grouping: a stringent clustering method"],"prefix":"10.1186","volume":"13","author":[{"given":"Bong-Hyun","family":"Kim","sequence":"first","affiliation":[]},{"given":"Bhadrachalam","family":"Chitturi","sequence":"additional","affiliation":[]},{"given":"Nick V","family":"Grishin","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2012,8,24]]},"reference":[{"key":"5288_CR1","volume-title":"Cluster Analysis","author":"BS Everitt","year":"2001","unstructured":"Everitt BS, Landau S, Leese M: Cluster Analysis. 4th edition. Arnold; 2001.","edition":"4"},{"issue":"15","key":"5288_CR2","doi-asserted-by":"publisher","first-page":"3201","DOI":"10.1093\/bioinformatics\/bti517","volume":"21","author":"J Handl","year":"2005","unstructured":"Handl J, Knowles J, Kell DB: Computational cluster validation in post-genomic data analysis. Bioinformatics 2005, 21(15):3201\u20133212. 10.1093\/bioinformatics\/bti517","journal-title":"Bioinformatics"},{"key":"5288_CR3","doi-asserted-by":"publisher","first-page":"81","DOI":"10.1146\/annurev.bb.02.060173.000501","volume":"2","author":"JA Hartigan","year":"1973","unstructured":"Hartigan JA: Clustering. Annu Rev Biophys Bioeng 1973, 2: 81\u2013101. 10.1146\/annurev.bb.02.060173.000501","journal-title":"Annu Rev Biophys Bioeng"},{"issue":"7","key":"5288_CR4","doi-asserted-by":"publisher","first-page":"508","DOI":"10.1038\/nrg1113","volume":"4","author":"CA Ouzounis","year":"2003","unstructured":"Ouzounis CA, Coulson RM, Enright AJ, Kunin V, Pereira-Leal JB: Classification schemes for protein structure and function. Nat Rev Genet 2003, 4(7):508\u2013519.","journal-title":"Nat Rev Genet"},{"issue":"5338","key":"5288_CR5","doi-asserted-by":"publisher","first-page":"631","DOI":"10.1126\/science.278.5338.631","volume":"278","author":"RL Tatusov","year":"1997","unstructured":"Tatusov RL, Koonin EV, Lipman DJ: A genomic perspective on protein families. Science 1997, 278(5338):631\u2013637. 10.1126\/science.278.5338.631","journal-title":"Science"},{"issue":"11","key":"5288_CR6","doi-asserted-by":"publisher","first-page":"1025","DOI":"10.1109\/T-C.1973.223640","volume":"C-22","author":"RA Jarvis","year":"1973","unstructured":"Jarvis RA, Patrick EA: Clustering Using a Similarity Measure Based on Shared near Neighbors. Ieee Transactions on Computers 1973, C-22(11):1025\u20131034.","journal-title":"Ieee Transactions on Computers"},{"key":"5288_CR7","volume-title":"Bmc Bioinformatics","author":"C Huttenhower","year":"2007","unstructured":"Huttenhower C, Flamholz AI, Landis JN, Sahi S, Myers CL, Olszewski KL, Hibbs MA, Siemers NO, Troyanskaya OG, Coller HA: Nearest Neighbor Networks: clustering expression data based on gene neighborhoods. Bmc Bioinformatics 2007., 8:"},{"key":"5288_CR8","doi-asserted-by":"publisher","first-page":"234","DOI":"10.1007\/11612704_24","volume":"3852","author":"DS Guru","year":"2006","unstructured":"Guru DS, Nagendraswamy HS: Clustering of interval-valued symbolic patterns based on mutual similarity value and the concept of k-mutual nearest neighborhood. Computer Vision - Accv 2006, Pt Ii 2006, 3852: 234\u2013243. 10.1007\/11612704_24","journal-title":"Computer Vision - Accv 2006, Pt Ii"},{"issue":"6","key":"5288_CR9","doi-asserted-by":"publisher","first-page":"567","DOI":"10.1016\/0031-3203(91)90022-W","volume":"24","author":"KC Gowda","year":"1991","unstructured":"Gowda KC, Diday E: Symbolic Clustering Using a New Dissimilarity Measure. Pattern Recognition 1991, 24(6):567\u2013578. 10.1016\/0031-3203(91)90022-W","journal-title":"Pattern Recognition"},{"issue":"9","key":"5288_CR10","doi-asserted-by":"publisher","first-page":"1453","DOI":"10.1093\/bioinformatics\/bth078","volume":"20","author":"MJ de Hoon","year":"2004","unstructured":"de Hoon MJ, Imoto S, Nolan J, Miyano S: Open source clustering software. Bioinformatics 2004, 20(9):1453\u20131454. 10.1093\/bioinformatics\/bth078","journal-title":"Bioinformatics"},{"issue":"4","key":"5288_CR11","first-page":"536","volume":"247","author":"AG Murzin","year":"1995","unstructured":"Murzin AG, Brenner SE, Hubbard T, Chothia C: SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol 1995, 247(4):536\u2013540.","journal-title":"J Mol Biol"},{"issue":"6","key":"5288_CR12","doi-asserted-by":"publisher","first-page":"566","DOI":"10.1093\/bioinformatics\/16.6.566","volume":"16","author":"L Holm","year":"2000","unstructured":"Holm L, Park J: DaliLite workbench for protein structure comparison. Bioinformatics 2000, 16(6):566\u2013567. 10.1093\/bioinformatics\/16.6.566","journal-title":"Bioinformatics"},{"key":"5288_CR13","volume-title":"Information Retrieval","author":"CJV Rijsbergen","year":"1979","unstructured":"Rijsbergen CJV: Information Retrieval. 2nd edition. London, England: Butterworths; 1979.","edition":"2"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-13-S13-S3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T19:39:25Z","timestamp":1630525165000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-13-S13-S3"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,8]]},"references-count":13,"journal-issue":{"issue":"S13","published-print":{"date-parts":[[2012,8]]}},"alternative-id":["5288"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-13-s13-s3","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2012,8]]},"assertion":[{"value":"24 August 2012","order":1,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"S3"}}