{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,3]],"date-time":"2025-10-03T13:01:55Z","timestamp":1759496515057},"reference-count":35,"publisher":"Springer Science and Business Media LLC","issue":"1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Algorithms Mol Biol"],"published-print":{"date-parts":[[2009,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>Hierarchical clustering methods like Ward's method have been used since decades to understand biological and chemical data sets. In order to get a partition of the data set, it is necessary to choose an optimal level of the hierarchy by a so-called level selection algorithm. In 2005, a new kind of hierarchical clustering method was introduced by Palla et al. that differs in two ways from Ward's method: it can be used on data on which no full similarity matrix is defined and it can produce overlapping clusters, i.e., allow for multiple membership of items in clusters. These features are optimal for biological and chemical data sets but until now no level selection algorithm has been published for this method.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>In this article we provide a general selection scheme, the <jats:italic>level independent clustering selection method<\/jats:italic>, called LInCS. With it, clusters can be selected from any level in quadratic time with respect to the number of clusters. Since hierarchically clustered data is not necessarily associated with a similarity measure, the selection is based on a graph theoretic notion of <jats:italic>cohesive clusters<\/jats:italic>. We present results of our method on two data sets, a set of drug like molecules and set of protein-protein interaction (PPI) data. In both cases the method provides a clustering with very good sensitivity and specificity values according to a given reference clustering. Moreover, we can show for the PPI data set that our graph theoretic cohesiveness measure indeed chooses biologically homogeneous clusters and disregards inhomogeneous ones in most cases. We finally discuss how the method can be generalized to other hierarchical clustering methods to allow for a level independent cluster selection.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusion<\/jats:title>\n            <jats:p>Using our new cluster selection method together with the method by Palla et al. provides a new interesting clustering mechanism that allows to compute overlapping clusters, which is especially valuable for biological and chemical data sets.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1748-7188-4-12","type":"journal-article","created":{"date-parts":[[2009,10,20]],"date-time":"2009-10-20T06:19:06Z","timestamp":1256019546000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":14,"title":["Breaking the hierarchy - a new cluster selection mechanism for hierarchical clustering methods"],"prefix":"10.1186","volume":"4","author":[{"given":"L\u00e1szl\u00f3 A","family":"Zahor\u00e1nszky","sequence":"first","affiliation":[]},{"given":"Gyula Y","family":"Katona","sequence":"additional","affiliation":[]},{"given":"P\u00e9ter","family":"H\u00e1ri","sequence":"additional","affiliation":[]},{"given":"Andr\u00e1s","family":"M\u00e1ln\u00e1si-Csizmadia","sequence":"additional","affiliation":[]},{"given":"Katharina A","family":"Zweig","sequence":"additional","affiliation":[]},{"given":"Gergely","family":"Zahor\u00e1nszky-K\u00f6halmi","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2009,10,19]]},"reference":[{"key":"69_CR1","doi-asserted-by":"publisher","first-page":"1094","DOI":"10.1021\/ci00021a011","volume":"34","author":"GM Downs","year":"1994","unstructured":"Downs GM, Willett P: Similarity searching and clustering of chemical-structure databases using molecular property data. J Chem Inf Comput Sci. 1994, 34: 1094-1102.","journal-title":"J Chem Inf Comput Sci"},{"key":"69_CR2","doi-asserted-by":"publisher","first-page":"983","DOI":"10.1021\/ci9800211","volume":"38","author":"P Willett","year":"1998","unstructured":"Willett P: Chemical similarity searching. J Chem Inf Comput Sci. 1998, 38: 983-996.","journal-title":"J Chem Inf Comput Sci"},{"key":"69_CR3","doi-asserted-by":"publisher","first-page":"155","DOI":"10.1021\/ci990086j","volume":"40","author":"DJ Wild","year":"2000","unstructured":"Wild DJ, Blankley CJ: Comparison of 2D fingerprint types and hierarchy level selection methods fo structural grouping using Ward's clustering. J Chem Inf Comput Sci. 2000, 40: 155-162.","journal-title":"J Chem Inf Comput Sci"},{"key":"69_CR4","doi-asserted-by":"publisher","first-page":"572","DOI":"10.1021\/ci9501047","volume":"36","author":"RD Brown","year":"1996","unstructured":"Brown RD, Martin YC: Use of structure-activity data to compare structure-based clustering methods and descriptors for use in compound selection. J Chem Inf Comput Sci. 1996, 36: 572-584.","journal-title":"J Chem Inf Comput Sci"},{"key":"69_CR5","doi-asserted-by":"publisher","first-page":"236","DOI":"10.1080\/01621459.1963.10500845","volume":"58","author":"JH Ward","year":"1963","unstructured":"Ward JH: Hierarchical grouping to optimize an objective function. J Amer Statist Assoc. 1963, 58: 236-244. 10.2307\/2282967.","journal-title":"J Amer Statist Assoc"},{"key":"69_CR6","volume-title":"Molecular modeling, principles and applications","author":"AR Leach","year":"1997","unstructured":"Leach AR: Molecular modeling, principles and applications. 1997, Addison-Wesley Publishing Company"},{"key":"69_CR7","doi-asserted-by":"publisher","first-page":"1063","DOI":"10.1093\/protein\/9.11.1063","volume":"9","author":"LA Kelley","year":"1996","unstructured":"Kelley LA, Gardner SP, Sutcliffe MG: An automated approach for clustering an ensemble for NMR-derived protein structures into conformationally related subfamilies. Protein Eng. 1996, 9: 1063-1065.","journal-title":"Protein Eng"},{"key":"69_CR8","doi-asserted-by":"publisher","first-page":"100","DOI":"10.2307\/2346830","volume":"28","author":"JA Hartigan","year":"1979","unstructured":"Hartigan JA, Wong MA: A K-means clustering algorithm. Applied Statistics. 1979, 28: 100-108. 10.2307\/2346830.","journal-title":"Applied Statistics"},{"key":"69_CR9","doi-asserted-by":"publisher","first-page":"1025","DOI":"10.1109\/T-C.1973.223640","volume":"C22","author":"RA Jarvis","year":"1973","unstructured":"Jarvis RA, Patrick EA: Clustering using a similarity measure based on shared near neighbors. IEEE Trans Comput. 1973, C22: 1025-1034. 10.1109\/T-C.1973.223640.","journal-title":"IEEE Trans Comput"},{"key":"69_CR10","doi-asserted-by":"publisher","first-page":"7821","DOI":"10.1073\/pnas.122653799","volume":"99","author":"M Girvan","year":"2002","unstructured":"Girvan M, Newman MEJ: Community structure in social and biological networks. Proceedings of the National Academy of Sciences. 2002, 99: 7821-7826. 10.1073\/pnas.122653799.","journal-title":"Proceedings of the National Academy of Sciences"},{"key":"69_CR11","doi-asserted-by":"publisher","first-page":"814","DOI":"10.1038\/nature03607","volume":"435","author":"G Palla","year":"2005","unstructured":"Palla G, Der\u00e9nyi I, Farkas I, Vicsek T: Uncovering the overlapping community structure of complex networks in nature and society. Nature. 2005, 435: 814-818.","journal-title":"Nature"},{"key":"69_CR12","doi-asserted-by":"publisher","first-page":"177","DOI":"10.1093\/comjnl\/11.2.177","volume":"11","author":"N Jardine","year":"1968","unstructured":"Jardine N, Sibson R: The construction of hierarchic and non-hierarchic classifications. Comp J. 1968, 11: 177-","journal-title":"Comp J"},{"key":"69_CR13","doi-asserted-by":"publisher","first-page":"156","DOI":"10.1093\/comjnl\/13.2.156","volume":"13","author":"AJ Cole","year":"1970","unstructured":"Cole AJ, Wishar D: An improved algorithm for the Jardine-Sibson method of generating overlapping clusters. Comp J. 1970, 13: 156-163. 10.1093\/comjnl\/13.2.156.","journal-title":"Comp J"},{"issue":"6","key":"69_CR14","doi-asserted-by":"publisher","first-page":"445","DOI":"10.1016\/j.compbiolchem.2006.10.001","volume":"30","author":"S Zhang","year":"2006","unstructured":"Zhang S, Ning X, Zhang XS: Identification of functional modules in a PPI network by clique percolation clustering. Computational Biology and Chemistry. 2006, 30 (6): 445-451.","journal-title":"Computational Biology and Chemistry"},{"key":"69_CR15","volume-title":"Functional and transcriptional coherency of modules in the human protein interaction network","author":"ME Futschik","year":"2007","unstructured":"Futschik ME, Chaurasia G, Tschaut A, Russ J, Babu MM, Herzel H: Functional and transcriptional coherency of modules in the human protein interaction network. Journal of Integrative Bioinformatics. 2007, 4 (3): doi:10.2390\/biecoll-jib-2007-76"},{"key":"69_CR16","doi-asserted-by":"crossref","unstructured":"Gaertler M: Network analysis: Methodological foundations. 178-215. Springer-Verlag 2005 chap. Clustering","DOI":"10.1007\/978-3-540-31955-9_8"},{"key":"69_CR17","doi-asserted-by":"publisher","first-page":"1021","DOI":"10.1093\/bioinformatics\/btl039","volume":"22","author":"B Adamcsek","year":"2006","unstructured":"Adamcsek B, Palla G, Farkas IJ, Der\u00e9nyi I, Vicsek T: CFinder: Locating cliques and overlapping modules in biological networks. Bioinformatics. 2006, 22: 1021-1023.","journal-title":"Bioinformatics"},{"key":"69_CR18","doi-asserted-by":"publisher","first-page":"160202","DOI":"10.1103\/PhysRevLett.94.160202","volume":"94","author":"I Der\u00e9nyi","year":"2005","unstructured":"Der\u00e9nyi I, Palla G, Vicsek T: Clique percolation in random networks. Phys Rev Lett. 2005, 94: 160202-","journal-title":"Phys Rev Lett"},{"key":"69_CR19","volume-title":"Computers and intractability - a guide to the theory of NP-completeness","author":"MR Garey","year":"1979","unstructured":"Garey MR, Johnson DS: Computers and intractability - a guide to the theory of NP-completeness. 1979, W. H. Freeman and Company, New York"},{"key":"69_CR20","unstructured":"CFinder. http:\/\/cfinder.org"},{"key":"69_CR21","unstructured":"Personal communication with Gergely Palla."},{"key":"69_CR22","first-page":"19","volume-title":"Handbook of combinatorial optimization","author":"I Bonze","year":"1999","unstructured":"Bonze I, Budinich M, Pardalos P, Pelillo M: Handbook of combinatorial optimization. 1999, 4: 19-21. Kulwer Academic Publishers, chap The maximum clique problem"},{"issue":"3","key":"69_CR23","doi-asserted-by":"publisher","first-page":"505","DOI":"10.1137\/0206036","volume":"6","author":"S Tsukiyama","year":"1977","unstructured":"Tsukiyama S, Ide H, Ariyoshi H, Shirakawa I: A new algorithm for generating all the maximal independent sets. SIAM J Comput. 1977, 6 (3): 505-517. 10.1137\/0206036.","journal-title":"SIAM J Comput"},{"key":"69_CR24","doi-asserted-by":"publisher","first-page":"177","DOI":"10.1021\/ci049714+","volume":"45","author":"JJ Irwin","year":"2005","unstructured":"Irwin JJ, Shoiche BK: ZINC - a free database of commercially available compounds for virtual screening. J Chem Inf Model. 2005, 45: 177-182.","journal-title":"J Chem Inf Model"},{"key":"69_CR25","unstructured":"Albany Molecular Research Inc. http:\/\/www.amriglobal.com\/"},{"issue":"1","key":"69_CR26","doi-asserted-by":"publisher","first-page":"39","DOI":"10.1007\/s11030-006-8697-1","volume":"10","author":"AG Maldonado","year":"2006","unstructured":"Maldonado AG, Doucet JP, Petitjean M, Fan BT: Molecular similarity and diversity in chemoinformatics: from theory to applications. Molecular Diversity. 2006, 10 (1): 39-79.","journal-title":"Molecular Diversity"},{"key":"69_CR27","unstructured":"Daylight Chemical Information Systems Inc. http:\/\/www.daylight.com\/dayhtml\/doc\/theory\/theory.finger.html"},{"key":"69_CR28","unstructured":"ChemAxon Ltd., Chemical hashed fingerprints. http:\/\/www.chemaxon.com\/jchem\/doc\/user\/fingerprint.html"},{"key":"69_CR29","volume-title":"Tech. rep., IBM Internal Report","author":"TT Tanimoto","year":"1957","unstructured":"Tanimoto TT: Tech. rep., IBM Internal Report. 1957"},{"key":"69_CR30","doi-asserted-by":"publisher","first-page":"440","DOI":"10.1038\/30918","volume":"393","author":"DJ Watts","year":"1998","unstructured":"Watts DJ, Strogatz SH: Collective dynamics of 'small-world' networks. Nature. 1998, 393: 440-442.","journal-title":"Nature"},{"key":"69_CR31","doi-asserted-by":"publisher","first-page":"1257","DOI":"10.1038\/82360","volume":"18","author":"B Schwikowski","year":"2000","unstructured":"Schwikowski B, Uetz P, Fields S: A network of protein-protein interactions in yeast. Nature Biotechnology. 2000, 18: 1257-1261.","journal-title":"Nature Biotechnology"},{"issue":"5","key":"69_CR32","doi-asserted-by":"publisher","first-page":"349","DOI":"10.1074\/mcp.M100037-MCP200","volume":"1","author":"C Deane","year":"2002","unstructured":"Deane C, Salwi\u00f1ski L, Xenarios I, Eisenberg D: Protein interactions: two methods for assessment of the reliability of high throughput observations. Mol Cell Proteomics. 2002, 1 (5): 349-356.","journal-title":"Mol Cell Proteomics"},{"key":"69_CR33","doi-asserted-by":"publisher","first-page":"1552","DOI":"10.1136\/bmj.308.6943.1552","volume":"308","author":"DG Altman","year":"1994","unstructured":"Altman DG, Bland JM: Diagnostic tests 1: Sensitivity and specificity. BMJ. 1994, 308: 1552-","journal-title":"BMJ"},{"issue":"16","key":"69_CR34","doi-asserted-by":"publisher","first-page":"3448","DOI":"10.1093\/bioinformatics\/bti551","volume":"21","author":"S Maere","year":"2005","unstructured":"Maere S, Heymans K, Kuiper M: BiNGO: a Cytoscape plugin to assess overrepresentation of Gene Ontology categories in Biological Networks. Bioinformatics. 2005, 21 (16): 3448-3449.","journal-title":"Bioinformatics"},{"issue":"11","key":"69_CR35","doi-asserted-by":"publisher","first-page":"2498","DOI":"10.1101\/gr.1239303","volume":"13","author":"P Shannon","year":"2003","unstructured":"Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T: Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003, 13 (11): 2498-2504.","journal-title":"Genome Res"}],"container-title":["Algorithms for Molecular Biology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1748-7188-4-12.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T01:08:14Z","timestamp":1630458494000},"score":1,"resource":{"primary":{"URL":"https:\/\/almob.biomedcentral.com\/articles\/10.1186\/1748-7188-4-12"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2009,10,19]]},"references-count":35,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2009,12]]}},"alternative-id":["69"],"URL":"https:\/\/doi.org\/10.1186\/1748-7188-4-12","relation":{},"ISSN":["1748-7188"],"issn-type":[{"value":"1748-7188","type":"electronic"}],"subject":[],"published":{"date-parts":[[2009,10,19]]},"assertion":[{"value":"1 April 2009","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"19 October 2009","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"19 October 2009","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"12"}}