{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,20]],"date-time":"2025-10-20T10:05:33Z","timestamp":1760954733836},"reference-count":28,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2004,12,14]],"date-time":"2004-12-14T00:00:00Z","timestamp":1102982400000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/2.0\/"},{"start":{"date-parts":[[2004,12,14]],"date-time":"2004-12-14T00:00:00Z","timestamp":1102982400000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/2.0\/"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"abstract":"<jats:title>Abstract<\/jats:title><jats:sec>\n                        <jats:title>Background<\/jats:title>\n                        <jats:p>It is a major challenge of computational biology to provide a comprehensive functional classification of all known proteins. Most existing methods seek recurrent patterns in known proteins based on manually-validated alignments of known protein families. Such methods can achieve high sensitivity, but are limited by the necessary manual labor. This makes our current view of the protein world incomplete and biased. This paper concerns ProtoNet, an automatic unsupervised global clustering system that generates a hierarchical tree of over 1,000,000 proteins, based solely on sequence similarity.<\/jats:p>\n                     <\/jats:sec><jats:sec>\n                        <jats:title>Results<\/jats:title>\n                        <jats:p>In this paper we show that ProtoNet correctly captures functional and structural aspects of the protein world. Furthermore, a novel feature is an automatic procedure that reduces the tree to 12% of its original size. This procedure utilizes only parameters intrinsic to the clustering process. 
Despite the substantial reduction in size, the system's predictive power concerning biological functions is hardly affected. We then carry out an automatic comparison with existing functional protein annotations. Consequently, 78% of the clusters in the compressed tree (5,300 clusters) are assigned a biological function with high confidence. The clustering and compression processes are unsupervised and robust.<\/jats:p>\n                     <\/jats:sec><jats:sec>\n                        <jats:title>Conclusions<\/jats:title>\n                        <jats:p>We present an automatically generated unbiased method that provides a hierarchical classification of all currently known proteins.<\/jats:p>\n                     <\/jats:sec>","DOI":"10.1186\/1471-2105-5-196","type":"journal-article","created":{"date-parts":[[2005,1,12]],"date-time":"2005-01-12T16:25:14Z","timestamp":1105547114000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":27,"title":["A functional hierarchical organization of the protein sequence space"],"prefix":"10.1186","volume":"5","author":[{"given":"Noam","family":"Kaplan","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Moriah","family":"Friedlich","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Menachem","family":"Fromer","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Michal","family":"Linial","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2004,12,14]]},"reference":[{"key":"312_CR1","doi-asserted-by":"publisher","first-page":"197","DOI":"10.1110\/ps.9.1.197","volume":"9","author":"SE Brenner","year":"2000","unstructured":"Brenner SE, Levitt M: Expectations from structural genomics.\n                           Protein Sci 
2000, 9: 197\u2013200.","journal-title":"Protein Sci"},{"key":"312_CR2","doi-asserted-by":"publisher","first-page":"609","DOI":"10.1038\/76443","volume":"18","author":"MY Galperin","year":"2000","unstructured":"Galperin MY, Koonin EV: Who's your neighbor? New computational approaches for functional genomics.\n                           Nat Biotechnol 2000, 18: 609\u2013613. 10.1038\/76443","journal-title":"Nat Biotechnol"},{"key":"312_CR3","doi-asserted-by":"publisher","first-page":"922","DOI":"10.1093\/bioinformatics\/18.7.922","volume":"18","author":"J Liu","year":"2002","unstructured":"Liu J, Rost B: Target space for structural genomics revisited.\n                           Bioinformatics 2002, 18: 922\u2013933. 10.1093\/bioinformatics\/18.7.922","journal-title":"Bioinformatics"},{"key":"312_CR4","doi-asserted-by":"publisher","first-page":"28","DOI":"10.1016\/S1367-5931(02)00015-7","volume":"7","author":"C Zhang","year":"2003","unstructured":"Zhang C, Kim SH: Overview of structural genomics: from structure to function.\n                           Curr Opin Chem Biol 2003, 7: 28\u201332. 10.1016\/S1367-5931(02)00015-7","journal-title":"Curr Opin Chem Biol"},{"key":"312_CR5","doi-asserted-by":"publisher","first-page":"559","DOI":"10.1038\/88640","volume":"8","author":"D Vitkup","year":"2001","unstructured":"Vitkup D, Melamud E, Moult J, Sander C: Completeness in structural genomics.\n                           Nature Structural Biology 2001, 8: 559\u2013566. 10.1038\/88640","journal-title":"Nature Structural Biology"},{"key":"312_CR6","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1016\/S1367-5931(02)00003-0","volume":"7","author":"J Liu","year":"2003","unstructured":"Liu J, Rost B: Domains, motifs and clusters in the protein universe.\n                           Curr Opin Chem Biol 2003, 7: 5\u201311. 
10.1016\/S1367-5931(02)00003-0","journal-title":"Curr Opin Chem Biol"},{"key":"312_CR7","doi-asserted-by":"publisher","first-page":"401","DOI":"10.1186\/gb-2003-4-2-401","volume":"4","author":"V Kunin","year":"2003","unstructured":"Kunin V, Cases I, Enright AJ, De Lorenzo V, Ouzonis CA: Myriads of protein families, and still counting.\n                           Genome Biol 2003, 4: 401. 10.1186\/gb-2003-4-2-401","journal-title":"Genome Biol"},{"key":"312_CR8","doi-asserted-by":"publisher","first-page":"491","DOI":"10.1002\/prot.10514","volume":"54","author":"X Liu","year":"2004","unstructured":"Liu X, Fan K, Wang W: The number of protein folds and their distribution over families in nature.\n                           Proteins 2004, 54: 491\u2013499. 10.1002\/prot.10514","journal-title":"Proteins"},{"key":"312_CR9","doi-asserted-by":"publisher","first-page":"953","DOI":"10.1038\/nsb1101-953","volume":"8","author":"S Dietmann","year":"2001","unstructured":"Dietmann S, Holm L: Identification of homology in protein structure classification.\n                           Nature Structural Biology 2001, 8: 953\u20137. 10.1038\/nsb1101-953","journal-title":"Nature Structural Biology"},{"key":"312_CR10","doi-asserted-by":"publisher","first-page":"209","DOI":"10.1093\/protein\/14.4.209","volume":"14","author":"AC May","year":"2001","unstructured":"May AC: Optimal classification of protein sequences and selection of representative sets from multiple alignments: application to homologous families and lessons for structural genomics.\n                           Protein Eng 2001, 14: 209\u2013217. 
10.1093\/protein\/14.4.209","journal-title":"Protein Eng"},{"key":"312_CR11","doi-asserted-by":"publisher","first-page":"536","DOI":"10.1006\/jmbi.1995.0159","volume":"247","author":"AG Murzin","year":"1995","unstructured":"Murzin AG, Brenner SE, Hubbard T, Chothia C: SCOP: a structural classification of proteins database for the investigation of sequences and structures.\n                           J Mol Biol 1995, 247: 536\u201340. 10.1006\/jmbi.1995.0159","journal-title":"J Mol Biol"},{"key":"312_CR12","doi-asserted-by":"publisher","first-page":"233","DOI":"10.1110\/ps.16802","volume":"11","author":"FM Pearl","year":"2002","unstructured":"Pearl FM, Lee D, Bray JE, Buchan DW, Shepherd AJ, et al.: The CATH extended protein-family database: providing structural annotations for genome sequences.\n                           Protein Science 2002, 11: 233\u2013244. 10.1110\/ps.16802","journal-title":"Protein Science"},{"key":"312_CR13","doi-asserted-by":"publisher","first-page":"244","DOI":"10.1093\/nar\/27.1.244","volume":"27","author":"L Holm","year":"1999","unstructured":"Holm L, Sander C: Protein folds and families: sequence and structure alignments.\n                           Nucleic Acids Res 1999, 27: 244\u2013247. 10.1093\/nar\/27.1.244","journal-title":"Nucleic Acids Res"},{"key":"312_CR14","doi-asserted-by":"publisher","first-page":"899","DOI":"10.1093\/bioinformatics\/18.7.899","volume":"18","author":"E Portugaly","year":"2002","unstructured":"Portugaly E, Kifer I, Linial M: Selecting targets for structural determination by navigating in a graph of protein families.\n                           Bioinformatics 2002, 18: 899\u2013907. 
10.1093\/bioinformatics\/18.7.899","journal-title":"Bioinformatics"},{"key":"312_CR15","doi-asserted-by":"publisher","first-page":"49","DOI":"10.1093\/nar\/28.1.49","volume":"28","author":"G Yona","year":"2000","unstructured":"Yona G, Linial N, Linial M: ProtoMap: automatic classification of protein sequences and hierarchy of protein families.\n                           Nucleic Acids Res 2000, 28: 49\u201355. 10.1093\/nar\/28.1.49","journal-title":"Nucleic Acids Res"},{"key":"312_CR16","doi-asserted-by":"publisher","first-page":"272","DOI":"10.1093\/bioinformatics\/17.3.272","volume":"17","author":"A Hedger","year":"2001","unstructured":"Hedger A, Holm L: Picasso: generating a covering set of protein family profiles.\n                           Bioinformatics 2001, 17: 272\u2013279. 10.1093\/bioinformatics\/17.3.272","journal-title":"Bioinformatics"},{"key":"312_CR17","doi-asserted-by":"publisher","first-page":"270","DOI":"10.1093\/nar\/28.1.270","volume":"28","author":"A Krause","year":"2000","unstructured":"Krause A, Stove J, Vingron M: The SYSTERS protein sequence cluster set.\n                           Nucleic Acids Res 2000, 28: 270\u2013272. 10.1093\/nar\/28.1.270","journal-title":"Nucleic Acids Res"},{"key":"312_CR18","doi-asserted-by":"publisher","first-page":"52","DOI":"10.1093\/nar\/29.1.52","volume":"29","author":"CH Wu","year":"2001","unstructured":"Wu CH, Xiao C, Hou Z, Huang H, Barker WC: iProClass: an integrated, comprehensive and annotated protein classification database.\n                           Nucleic Acids Res 2001, 29: 52\u201354. 
10.1093\/nar\/29.1.52","journal-title":"Nucleic Acids Res"},{"key":"312_CR19","doi-asserted-by":"publisher","first-page":"S14","DOI":"10.1093\/bioinformatics\/18.suppl_1.S14","volume":"18","author":"O Sasson","year":"2002","unstructured":"Sasson O, Linial N, Linial M: The metric space of proteins-comparative study of clustering algorithms.\n                           Bioinformatics 2002, 18: S14\u201321.","journal-title":"Bioinformatics"},{"key":"312_CR20","doi-asserted-by":"publisher","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","volume":"25","author":"SF Altschul","year":"1997","unstructured":"Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, et al.: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.\n                           Nucleic Acids Res 1997, 25: 3389\u20133402. 10.1093\/nar\/25.17.3389","journal-title":"Nucleic Acids Res"},{"key":"312_CR21","doi-asserted-by":"publisher","first-page":"216","DOI":"10.1093\/nar\/gki007","volume":"33","author":"N Kaplan","year":"2005","unstructured":"Kaplan N, Sasson O, Inbar U, Friedlich M, Fromer M, et al.: ProtoNet 4.0: A hierarchical classification of one million protein sequences.\n                           Nucleic Acids Res 2005, 33: 216\u2013218. 10.1093\/nar\/gki007","journal-title":"Nucleic Acids Res"},{"key":"312_CR22","doi-asserted-by":"publisher","first-page":"365","DOI":"10.1093\/nar\/gkg095","volume":"31","author":"B Boeckmann","year":"2003","unstructured":"Boeckmann B, Bairoch A, Apweiler R, Blatter MC, Estreicher A, et al.: The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003.\n                           Nucleic Acids Res 2003, 31: 365\u2013370. 
10.1093\/nar\/gkg095","journal-title":"Nucleic Acids Res"},{"key":"312_CR23","doi-asserted-by":"publisher","first-page":"225","DOI":"10.1093\/bib\/3.3.225","volume":"3","author":"NJ Mulder","year":"2002","unstructured":"Mulder NJ, Apweiler R, Attwood TK, Bairoch A, Bateman A, et al.: InterPro: an integrated documentation resource for protein families, domains and functional sites.\n                           Brief Bioinform 2002, 3: 225\u2013235.","journal-title":"Brief Bioinform"},{"key":"312_CR24","doi-asserted-by":"publisher","first-page":"662","DOI":"10.1101\/gr.461403","volume":"13","author":"E Camon","year":"2003","unstructured":"Camon E, Magrane M, Barrell D, Binns D, Fleischmann W, et al.: The Gene Ontology Annotation (GOA) project: implementation of GO in SWISS-PROT, TrEMBL, and InterPro.\n                           Genome Res 2003, 13: 662\u2013672. 10.1101\/gr.461403","journal-title":"Genome Res"},{"key":"312_CR25","doi-asserted-by":"publisher","first-page":"304","DOI":"10.1093\/nar\/28.1.304","volume":"28","author":"A Bairoch","year":"2000","unstructured":"Bairoch A: The ENZYME database in 2000.\n                           Nucleic Acids Res 2000, 28: 304\u2013305. 10.1093\/nar\/28.1.304","journal-title":"Nucleic Acids Res"},{"key":"312_CR26","doi-asserted-by":"publisher","first-page":"107","DOI":"10.1186\/gb-2004-5-5-107","volume":"5","author":"A Grant","year":"2004","unstructured":"Grant A, Lee D, Orengo C: Progress towards mapping the universe of protein folds.\n                           Genome Biol 2004, 5: 107. 10.1186\/gb-2004-5-5-107","journal-title":"Genome Biol"},{"key":"312_CR27","doi-asserted-by":"publisher","first-page":"531","DOI":"10.1002\/prot.20235","volume":"57","author":"O Shachar","year":"2004","unstructured":"Shachar O, Linial M: A robust method to detect structural and functional remote homologues.\n                           Proteins 2004, 57: 531\u2013538. 
10.1002\/prot.20235","journal-title":"Proteins"},{"key":"312_CR28","first-page":"477","volume":"6","author":"M Wolters","year":"1999","unstructured":"Wolters M, Madeja M, Farrel AM, Pongs O: Bacillus stearothermophilus lctB gene gives rise to functional K+ channels in Escherichia coli and in Xenopus oocytes.\n                           Receptors Channels 1999, 6: 477\u2013491.","journal-title":"Receptors Channels"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-5-196.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/1471-2105-5-196\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-5-196.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,10,7]],"date-time":"2024-10-07T12:22:03Z","timestamp":1728303723000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-5-196"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2004,12,14]]},"references-count":28,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2004,12]]}},"alternative-id":["312"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-5-196","relation":{},"ISSN":["1471-2105"],"issn-type":[{"type":"electronic","value":"1471-2105"}],"subject":[],"published":{"date-parts":[[2004,12,14]]},"assertion":[{"value":"10 September 2004","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"14 December 2004","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"14 December 
2004","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"196"}}