{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,5]],"date-time":"2026-01-05T15:17:10Z","timestamp":1767626230794},"reference-count":29,"publisher":"Oxford University Press (OUP)","issue":"5","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2009,3,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: \u00a0Trans-kingdom protein clustering remained difficult because of large sequence divergence between eukaryotes and prokaryotes and the presence of a transit sequence in organellar proteins. A large-scale protein clustering including such divergent organisms needs a heuristic to efficiently select similar proteins by setting a proper threshold for homologs of each protein. Here a method is described using two similarity measures and organism count.<\/jats:p>\n               <jats:p>Results: The Gclust software constructs minimal homolog groups using all-against-all BLASTP results by single-linkage clustering. Major points include (i) estimation of domain structure of proteins; (ii) exclusion of multi-domain proteins; (iii) explicit consideration of transit peptides; and (iv) heuristic estimation of a similarity threshold for homologs of each protein by entropy-optimized organism count method. The resultant clusters were evaluated in the light of power law. 
The software was used to construct protein clusters for up to 95 organisms.<\/jats:p>\n               <jats:p>Availability: Software and data are available at http:\/\/gclust.c.u-tokyo.ac.jp\/Gclust_Download.html.<\/jats:p>\n               <jats:p>Contact: \u00a0naokisat@bio.c.u-tokyo.ac.jp<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btp047","type":"journal-article","created":{"date-parts":[[2009,1,22]],"date-time":"2009-01-22T01:44:17Z","timestamp":1232588657000},"page":"599-605","source":"Crossref","is-referenced-by-count":40,"title":["Gclust: <i>trans<\/i>-kingdom classification of proteins using automatic individual threshold setting"],"prefix":"10.1093","volume":"25","author":[{"given":"Naoki","family":"Sato","sequence":"first","affiliation":[{"name":"Department of Life Sciences, Graduate School of Arts and Sciences, University of Tokyo, Komaba, Meguro-ku, Tokyo, 153-8902, Japan"}]}],"member":"286","published-online":{"date-parts":[[2009,1,21]]},"reference":[{"key":"2023013110111312400_B1","doi-asserted-by":"crossref","first-page":"4947","DOI":"10.1242\/jcs.02714","article-title":"Scale-free networks in cell biology","volume":"118","author":"Albert","year":"2005","journal-title":"J. 
Cell Sci."},{"key":"2023013110111312400_B2","doi-asserted-by":"crossref","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","article-title":"Gapped BLAST and PSI-BLAST: a new generation of protein database search programs","volume":"25","author":"Altschul","year":"1997","journal-title":"Nucleic Acids Res."},{"key":"2023013110111312400_B3","doi-asserted-by":"crossref","first-page":"1120","DOI":"10.1104\/pp.106.082859","article-title":"Comparative genomic analysis revealed a gene for monoglucosyldiacylglycerol synthase, an enzyme for photosynthetic membrane lipid synthesis in cyanobacteria","volume":"141","author":"Awai","year":"2006","journal-title":"Plant Physiol."},{"key":"2023013110111312400_B4","doi-asserted-by":"crossref","first-page":"509","DOI":"10.1126\/science.286.5439.509","article-title":"Emergence of scaling in random networks","volume":"286","author":"Barab\u00e1si","year":"1999","journal-title":"Science"},{"key":"2023013110111312400_B5","doi-asserted-by":"crossref","first-page":"60","DOI":"10.1016\/S0168-9525(02)00030-6","article-title":"Searching for nuclear-mitochondrial genes","volume":"19","author":"Chinnery","year":"2003","journal-title":"Trends Genet"},{"key":"2023013110111312400_B6","doi-asserted-by":"crossref","first-page":"699","DOI":"10.1093\/bioinformatics\/btk040","article-title":"OrthologID: automation of genome-scale ortholog identification within a parsimony framework","volume":"22","author":"Chiu","year":"2006","journal-title":"Bioinformatics"},{"key":"2023013110111312400_B7","doi-asserted-by":"crossref","first-page":"563","DOI":"10.1016\/j.tim.2007.10.008","article-title":"Finding novel metabolic genes through plant-prokaryote phylogenomics","volume":"15","author":"De Cr\u00e9cy-Legard","year":"2007","journal-title":"Trends Microbiol."},{"key":"2023013110111312400_B8","doi-asserted-by":"crossref","first-page":"e45","DOI":"10.1371\/journal.pcbi.0010045","article-title":"Protein molecular function prediction by Bayesian 
phylogenomics","volume":"1","author":"Engelhardt","year":"2005","journal-title":"PLoS Comput. Biol."},{"key":"2023013110111312400_B9","doi-asserted-by":"crossref","first-page":"74","DOI":"10.1093\/molbev\/msj009","article-title":"Genome phylogenies indicate a meaningful \u03b1-proteobacterial phylogeny and support a grouping of the mitochondria with the Rickettsiales","volume":"23","author":"Fitzpatrick","year":"2005","journal-title":"Mol. Biol. Evol."},{"key":"2023013110111312400_B10","doi-asserted-by":"crossref","first-page":"4029","DOI":"10.1093\/nar\/28.20.4029","article-title":"Automatic detection of conserved gene clusters in multiple genomes by graph comparison and p-quasi grouping","volume":"28","author":"Fujibushi","year":"2000","journal-title":"Nucleic Acids Res."},{"key":"2023013110111312400_B11","doi-asserted-by":"crossref","first-page":"5452","DOI":"10.1093\/nar\/gkh885","article-title":"\u2018Conserved hypothetical\u2019 proteins: prioritization of targets for experimental study","volume":"32","author":"Galperin","year":"2004","journal-title":"Nucleic Acids Res."},{"key":"2023013110111312400_B12","doi-asserted-by":"crossref","first-page":"539","DOI":"10.1007\/s00239-001-0054-5","article-title":"Using homolog groups to create a whole-genomic tree of free-living organisms: an update","volume":"54","author":"House","year":"2002","journal-title":"J. Mol. Evol."},{"key":"2023013110111312400_B13","first-page":"D476","article-title":"Inparanoid: a comprehensive database of eukaryotic orthologs","volume":"33","author":"Kersey","year":"2005","journal-title":"Nucleic Acids Res."},{"key":"2023013110111312400_B14","doi-asserted-by":"crossref","first-page":"351","DOI":"10.1146\/annurev.genet.33.1.351","article-title":"Mitochondrial genome evolution and the origin of eukaryotes","volume":"33","author":"Lang","year":"1999","journal-title":"Annu. Rev. 
Genet."},{"key":"2023013110111312400_B15","doi-asserted-by":"crossref","first-page":"12246","DOI":"10.1073\/pnas.182432999","article-title":"Evolutionary analysis of arabidopsis, cyanobacterial, and chloroplast genomes reveals plastid phylogeny and thousands of cyanobacterial genes in the nucleus","volume":"99","author":"Martin","year":"2002","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023013110111312400_B16","doi-asserted-by":"crossref","first-page":"653","DOI":"10.1038\/nature02398","article-title":"Genome sequence of the ultrasmall unicellular red alga Cyanidioschyzon merolae 10D","volume":"428","author":"Matsuzaki","year":"2004","journal-title":"Nature"},{"key":"2023013110111312400_B17","doi-asserted-by":"crossref","first-page":"4285","DOI":"10.1073\/pnas.96.8.4285","article-title":"Assigning protein functions by comparative genome analysis: protein phylogenetic profiles","volume":"96","author":"Pellegrini","year":"1999","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023013110111312400_B18","doi-asserted-by":"crossref","first-page":"120","DOI":"10.1186\/1471-2105-8-120","article-title":"BranchClust: a phylogenetic algorithm for selecting gene families","volume":"8","author":"Poptsova","year":"2007","journal-title":"BMC Bioinformatics"},{"key":"2023013110111312400_B19","doi-asserted-by":"crossref","first-page":"1551","DOI":"10.1126\/science.1073374","article-title":"Hierarchical organization of modularity in metabolic networks","volume":"297","author":"Ravasz","year":"2002","journal-title":"Science"},{"key":"2023013110111312400_B20","doi-asserted-by":"crossref","first-page":"1041","DOI":"10.1006\/jmbi.2000.5197","article-title":"Automatic clustering of orthologs and in-paralogs from pairwise species comparisons","volume":"314","author":"Remm","year":"2001","journal-title":"J. Mol. 
Biol."},{"key":"2023013110111312400_B21","doi-asserted-by":"crossref","first-page":"1361","DOI":"10.1104\/pp.107.106781","article-title":"Digalactosyldiacylglycerol is required for stabilization of the oxygen-evolving complex in photosystem II","volume":"145","author":"Sakurai","year":"2007","journal-title":"Plant Physiol."},{"key":"2023013110111312400_B22","doi-asserted-by":"crossref","first-page":"180","DOI":"10.1093\/bioinformatics\/16.2.180","article-title":"SISEQ: manipulation of multiple sequence and large database files for common platforms","volume":"16","author":"Sato","year":"2000","journal-title":"Bioinformatics"},{"key":"2023013110111312400_B23","first-page":"173","article-title":"Comparative analysis of the genomes of cyanobacteria and plants","volume":"13","author":"Sato","year":"2002","journal-title":"Genome Inform."},{"key":"2023013110111312400_B24","doi-asserted-by":"crossref","first-page":"75","DOI":"10.1007\/978-1-4020-4061-0_4","article-title":"Origin and evolution of plastids: genomic view on the unification and diversity of plastids","volume-title":"The Structure and Function of Plastids.","author":"Sato","year":"2006"},{"key":"2023013110111312400_B25","first-page":"56","article-title":"Mass identification of chloroplast proteins of endosymbiont origin by phylogenetic profiling based on organism-optimized homologous protein groups","volume":"16","author":"Sato","year":"2005","journal-title":"Genome Inform."},{"key":"2023013110111312400_B26","doi-asserted-by":"crossref","first-page":"170","DOI":"10.1093\/bioinformatics\/bth021","article-title":"Phylogenomic inference of protein molecular function: advances and challenges","volume":"20","author":"Sj\u00f6lander","year":"2004","journal-title":"Bioinformatics"},{"key":"2023013110111312400_B27","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1186\/1471-2105-4-41","article-title":"The COG database: an updated version includes 
eukaryotes","volume":"4","author":"Tatusov","year":"2003","journal-title":"BMC Bioinformatics"},{"key":"2023013110111312400_B28","doi-asserted-by":"crossref","first-page":"17","DOI":"10.1109\/TCBB.2007.1004","article-title":"Ortholog clustering on a multipartite graph","volume":"4","author":"Vashist","year":"2007","journal-title":"IEEE\/ACM Trans. Comput. Biol. Bioinform."},{"key":"2023013110111312400_B29","doi-asserted-by":"crossref","first-page":"58","DOI":"10.1093\/nar\/gkg109","article-title":"MBGD: microbial genome database for comparative analysis","volume":"31","author":"Uchiyama","year":"2003","journal-title":"Nucleic Acids Res."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/25\/5\/599\/48983956\/bioinformatics_25_5_599.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/25\/5\/599\/48983956\/bioinformatics_25_5_599.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,31]],"date-time":"2023-01-31T19:48:14Z","timestamp":1675194494000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/25\/5\/599\/183904"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2009,1,21]]},"references-count":29,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2009,3,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btp047","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2009,3,1]]},"published":{"date-parts":[[2009,1,21]]}}}