{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2023,7,20]],"date-time":"2023-07-20T10:30:26Z","timestamp":1689849026209},"reference-count":24,"publisher":"Oxford University Press (OUP)","issue":"16","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2008,8,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Mapping of remote evolutionary links is a classic computational problem of much interest. Relating protein families allows for functional and structural inference on uncharacterized families. Since sequences have diverged beyond reliable alignment, these are too remote to identify by conventional methods.<\/jats:p>\n               <jats:p>Approach: We present a method to systematically identify remote evolutionary relations between protein families, leveraging a novel evolutionary-driven tree of all protein sequences and families. A global approach which considers the entire volume of similarities while clustering sequences, leads to a robust tree that allows tracing of very faint evolutionary links. The method systematically scans the tree for clusters which partition exceptionally well into extant protein families, thus suggesting an evolutionary breakpoint in a putative ancient superfamily. Our method does not require family profiles (or HMMs), or multiple alignment.<\/jats:p>\n               <jats:p>Results: Considering the entire Pfam database, we are able to suggest 710 links between protein families, 125 of which are confirmed by existence of Pfam clans. The quality of our predictions is also validated by structural assignments. We further provide an intrinsic characterization of the validity of our results and provide examples for new biological findings, from our systematic scan. 
For example, we are able to relate several bacterial pore-forming toxin families, and then link them with a novel family of eukaryotic toxins expressed in plants, fish venom and notably also uncharacterized proteins from human pathogens.<\/jats:p>\n               <jats:p>Availability: A detailed list of putative homologous superfamilies, including 210 families of unknown function, has been made available online: http:\/\/www.protonet.cs.huji.ac.il\/dots<\/jats:p>\n               <jats:p>Contact: \u00a0lonshy@cs.huji.ac.il<\/jats:p>","DOI":"10.1093\/bioinformatics\/btn301","type":"journal-article","created":{"date-parts":[[2008,8,9]],"date-time":"2008-08-09T13:08:02Z","timestamp":1218287282000},"page":"i193-i199","source":"Crossref","is-referenced-by-count":6,"title":["Connect the dots: exposing hidden protein family connections from the entire sequence tree"],"prefix":"10.1093","volume":"24","author":[{"given":"Yaniv","family":"Loewenstein","sequence":"first","affiliation":[{"name":"1 School of Computer Science and Engineering and 2Department of Biological Chemistry, Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem 91904, Israel"}]},{"given":"Michal","family":"Linial","sequence":"additional","affiliation":[{"name":"1 School of Computer Science and Engineering and 2Department of Biological Chemistry, Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem 91904, Israel"}]}],"member":"286","published-online":{"date-parts":[[2008,8,9]]},"reference":[{"key":"2023020210501129600_B1","doi-asserted-by":"crossref","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","article-title":"Gapped BLAST and PSI-BLAST: a new generation of protein database search programs","volume":"25","author":"Altschul","year":"1997","journal-title":"Nucleic Acids Res"},{"key":"2023020210501129600_B2","doi-asserted-by":"crossref","first-page":"304","DOI":"10.1093\/nar\/28.1.304","article-title":"The ENZYME database in 
2000","volume":"28","author":"Bairoch","year":"2000","journal-title":"Nucleic Acids Res"},{"key":"2023020210501129600_B3","doi-asserted-by":"crossref","first-page":"15","DOI":"10.1016\/0968-0004(94)90167-8","article-title":"Convergent evolution: the need to be explicit","volume":"19","author":"Doolittle","year":"1994","journal-title":"Trends Biochem. Sci"},{"key":"2023020210501129600_B4","doi-asserted-by":"crossref","first-page":"D247","DOI":"10.1093\/nar\/gkj149","article-title":"Pfam: clans, web tools and services","volume":"34","author":"Finn","year":"2006","journal-title":"Nucleic Acids Res"},{"key":"2023020210501129600_B5","doi-asserted-by":"crossref","first-page":"D281","DOI":"10.1093\/nar\/gkm960","article-title":"The Pfam protein families database","volume":"36","author":"Finn","year":"2008","journal-title":"Nucleic Acids Res"},{"key":"2023020210501129600_B6","doi-asserted-by":"crossref","first-page":"347","DOI":"10.1016\/j.sbi.2007.06.003","article-title":"Structural genomics: keeping up with expanding knowledge of the protein universe","volume":"17","author":"Grabowski","year":"2007","journal-title":"Curr. Opin. Struct. Biol"},{"key":"2023020210501129600_B7","doi-asserted-by":"crossref","first-page":"196","DOI":"10.1186\/1471-2105-5-196","article-title":"A functional hierarchical organization of the protein sequence space","volume":"5","author":"Kaplan","year":"2004","journal-title":"BMC Bioinformatics"},{"key":"2023020210501129600_B8","doi-asserted-by":"crossref","first-page":"1020","DOI":"10.1093\/bioinformatics\/bti135","article-title":"Predicting fold novelty based on ProtoNet hierarchical classification","volume":"21","author":"Kifer","year":"2005","journal-title":"Bioinformatics"},{"key":"2023020210501129600_B9","first-page":"i41","article-title":"Efficient algorithms for exact hierarchical clustering of huge datasets: tackling the entire protein space","volume":"24","author":"Loewenstein","year":"2008","journal-title":"Proc. Int. Conf. Intell. 
Syst. Mol. Biol"},{"key":"2023020210501129600_B10","doi-asserted-by":"crossref","first-page":"4321","DOI":"10.1093\/nar\/gkf544","article-title":"A comparison of profile hidden Markov model procedures for remote homology detection","volume":"30","author":"Madera","year":"2002","journal-title":"Nucleic Acids Res"},{"key":"2023020210501129600_B11","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1007\/978-1-59745-515-2_5","article-title":"InterPro and InterProScan: tools for protein sequence classification and comparison","volume":"396","author":"Mulder","year":"2007","journal-title":"Methods Mol. Biol"},{"key":"2023020210501129600_B12","doi-asserted-by":"crossref","first-page":"536","DOI":"10.1016\/S0022-2836(05)80134-2","article-title":"SCOP: a structural classification of proteins database for the investigation of sequences and structures","volume":"247","author":"Murzin","year":"1995","journal-title":"J. Mol. Biol"},{"key":"2023020210501129600_B13","doi-asserted-by":"crossref","first-page":"289","DOI":"10.1093\/nar\/30.1.289","article-title":"SUPFAM\u2013a database of potential protein superfamily relationships derived by comparing sequence-based and structure-based families: implications for structural genomics and function annotation in genomes","volume":"30","author":"Pandit","year":"2002","journal-title":"Nucleic Acids Res"},{"key":"2023020210501129600_B14","doi-asserted-by":"crossref","first-page":"254","DOI":"10.1016\/j.sbi.2005.05.005","article-title":"The limits of protein sequence comparison?","volume":"15","author":"Pearson","year":"2005","journal-title":"Curr Opin. Struct. 
Biol"},{"key":"2023020210501129600_B15","doi-asserted-by":"crossref","first-page":"277","DOI":"10.1186\/1471-2105-7-277","article-title":"EVEREST: automatic identification and classification of protein domains in all protein sequences","volume":"7","author":"Portugaly","year":"2006","journal-title":"BMC Bioinformatics"},{"key":"2023020210501129600_B16","doi-asserted-by":"crossref","first-page":"85","DOI":"10.1093\/protein\/12.2.85","article-title":"Twilight zone of protein sequence alignments","volume":"12","author":"Rost","year":"1999","journal-title":"Protein Eng"},{"key":"2023020210501129600_B17","doi-asserted-by":"crossref","first-page":"818","DOI":"10.1093\/bioinformatics\/btg485","article-title":"Quality of alignment comparison by COMPASS improves with inclusion of diverse confident homologs","volume":"20","author":"Sadreyev","year":"2004","journal-title":"Bioinformatics"},{"key":"2023020210501129600_B18","doi-asserted-by":"crossref","first-page":"72","DOI":"10.1016\/j.tips.2005.12.004","article-title":"Is GAS1 a co-receptor for the GDNF family of ligands?","volume":"27","author":"Schueler-Furman","year":"2006","journal-title":"Trends Pharmacol. 
Sci"},{"key":"2023020210501129600_B19","doi-asserted-by":"crossref","first-page":"951","DOI":"10.1093\/bioinformatics\/bti125","article-title":"Protein homology detection by HMM-HMM comparison","volume":"21","author":"Soding","year":"2005","journal-title":"Bioinformatics"},{"key":"2023020210501129600_B20","first-page":"1409","article-title":"A statistical method for evaluating systematic relationships University of Kansas Science Bulletin 38","author":"Sokal","year":"1958"},{"key":"2023020210501129600_B21","doi-asserted-by":"crossref","first-page":"308","DOI":"10.1093\/nar\/gkl910","article-title":"The SUPERFAMILY database in 2007: families and functions","volume":"35","author":"Wilson","year":"2007","journal-title":"Nucleic Acids Res"},{"key":"2023020210501129600_B22","doi-asserted-by":"crossref","first-page":"473","DOI":"10.1126\/science.1151532","article-title":"Alignment uncertainty and genomic analysis","volume":"319","author":"Wong","year":"2008","journal-title":"Science"},{"key":"2023020210501129600_B23","doi-asserted-by":"crossref","first-page":"229","DOI":"10.1186\/1479-7364-1-3-229","article-title":"Update on genome completion and annotations: protein information resource","volume":"1","author":"Wu","year":"2004","journal-title":"Hum. 
Genomics"},{"key":"2023020210501129600_B24","doi-asserted-by":"crossref","first-page":"D187","DOI":"10.1093\/nar\/gkj161","article-title":"The Universal Protein Resource (UniProt): an expanding universe of protein information","volume":"34","author":"Wu","year":"2006","journal-title":"Nucleic Acids Res"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/24\/16\/i193\/49052297\/bioinformatics_24_16_i193.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/24\/16\/i193\/49052297\/bioinformatics_24_16_i193.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,2]],"date-time":"2023-02-02T12:47:55Z","timestamp":1675342075000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/24\/16\/i193\/201881"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2008,8,9]]},"references-count":24,"journal-issue":{"issue":"16","published-print":{"date-parts":[[2008,8,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btn301","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2008,8,15]]},"published":{"date-parts":[[2008,8,9]]}}}