{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,12,12]],"date-time":"2024-12-12T05:54:36Z","timestamp":1733982876071,"version":"3.30.2"},"reference-count":32,"publisher":"Oxford University Press (OUP)","issue":"16","license":[{"start":{"date-parts":[[2016,10,2]],"date-time":"2016-10-02T00:00:00Z","timestamp":1475366400000},"content-version":"vor","delay-in-days":3412,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/2.0\/uk\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2007,8,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Many classifications of protein function such as Gene Ontology (GO) are organized in directed acyclic graph (DAG) structures. In these classifications, the proteins are terminal leaf nodes; the categories \u2018above\u2019 them are functional annotations at various levels of specialization and the computation of a numerical measure of relatedness between two arbitrary proteins is an important proteomics problem. Moreover, analogous problems are important in other contexts in large-scale information organization\u2014e.g. the Wikipedia online encyclopedia and the Yahoo and DMOZ web page classification schemes.<\/jats:p>\n               <jats:p>Results: Here we develop a simple probabilistic approach for computing this relatedness quantity, which we call the total ancestry method. Our measure is based on counting the number of leaf nodes that share exactly the same set of \u2018higher up\u2019 category nodes in comparison to the total number of classified pairs (i.e. the chance for the same total ancestry). We show such a measure is associated with a power-law distribution, allowing for the quick assessment of the statistical significance of shared functional annotations. 
We formally compare it with other quantitative functional similarity measures (such as, shortest path within a DAG, lowest common ancestor shared and Azuaje's information-theoretic similarity) and provide concrete metrics to assess differences. Finally, we provide a practical implementation for our total ancestry measure for GO and the MIPS functional catalog and give two applications of it in specific functional genomics contexts.<\/jats:p>\n               <jats:p>Availability: The implementations and results are available through our supplementary website at: http:\/\/gersteinlab.org\/proj\/funcsim<\/jats:p>\n               <jats:p>Contact: \u00a0mark.gerstein@yale.edu<\/jats:p>\n               <jats:p>Supplementary information: Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btm291","type":"journal-article","created":{"date-parts":[[2007,6,1]],"date-time":"2007-06-01T00:14:26Z","timestamp":1180656866000},"page":"2163-2173","source":"Crossref","is-referenced-by-count":40,"title":["Total ancestry measure: quantifying the similarity in tree-like classification, with genomic applications"],"prefix":"10.1093","volume":"23","author":[{"given":"Haiyuan","family":"Yu","sequence":"first","affiliation":[{"name":"1 Department of Molecular Biophysics & Biochemistry, 2Department of Computer Science, 3Program in Computational Biology and Bioinformatics, 266 Whitney Avenue, Yale University, PO Box 208114, New Haven, CT 06520, 4Department of Genetics, Harvard University, 5Department of Cancer Biology, Dana-Farber Cancer Institute, 1 Jimmy Fund Way, Boston, MA 02115 and 6IBM Computational Biology Center, T.J. 
Watson Research Center, PO Box 704, Yorktown Heights, NY 10598"},{"name":"1 Department of Molecular Biophysics & Biochemistry, 2Department of Computer Science, 3Program in Computational Biology and Bioinformatics, 266 Whitney Avenue, Yale University, PO Box 208114, New Haven, CT 06520, 4Department of Genetics, Harvard University, 5Department of Cancer Biology, Dana-Farber Cancer Institute, 1 Jimmy Fund Way, Boston, MA 02115 and 6IBM Computational Biology Center, T.J. Watson Research Center, PO Box 704, Yorktown Heights, NY 10598"},{"name":"1 Department of Molecular Biophysics & Biochemistry, 2Department of Computer Science, 3Program in Computational Biology and Bioinformatics, 266 Whitney Avenue, Yale University, PO Box 208114, New Haven, CT 06520, 4Department of Genetics, Harvard University, 5Department of Cancer Biology, Dana-Farber Cancer Institute, 1 Jimmy Fund Way, Boston, MA 02115 and 6IBM Computational Biology Center, T.J. Watson Research Center, PO Box 704, Yorktown Heights, NY 10598"}]},{"given":"Ronald","family":"Jansen","sequence":"additional","affiliation":[{"name":"1 Department of Molecular Biophysics & Biochemistry, 2Department of Computer Science, 3Program in Computational Biology and Bioinformatics, 266 Whitney Avenue, Yale University, PO Box 208114, New Haven, CT 06520, 4Department of Genetics, Harvard University, 5Department of Cancer Biology, Dana-Farber Cancer Institute, 1 Jimmy Fund Way, Boston, MA 02115 and 6IBM Computational Biology Center, T.J. 
Watson Research Center, PO Box 704, Yorktown Heights, NY 10598"}]},{"given":"Gustavo","family":"Stolovitzky","sequence":"additional","affiliation":[{"name":"1 Department of Molecular Biophysics & Biochemistry, 2Department of Computer Science, 3Program in Computational Biology and Bioinformatics, 266 Whitney Avenue, Yale University, PO Box 208114, New Haven, CT 06520, 4Department of Genetics, Harvard University, 5Department of Cancer Biology, Dana-Farber Cancer Institute, 1 Jimmy Fund Way, Boston, MA 02115 and 6IBM Computational Biology Center, T.J. Watson Research Center, PO Box 704, Yorktown Heights, NY 10598"}]},{"given":"Mark","family":"Gerstein","sequence":"additional","affiliation":[{"name":"1 Department of Molecular Biophysics & Biochemistry, 2Department of Computer Science, 3Program in Computational Biology and Bioinformatics, 266 Whitney Avenue, Yale University, PO Box 208114, New Haven, CT 06520, 4Department of Genetics, Harvard University, 5Department of Cancer Biology, Dana-Farber Cancer Institute, 1 Jimmy Fund Way, Boston, MA 02115 and 6IBM Computational Biology Center, T.J. Watson Research Center, PO Box 704, Yorktown Heights, NY 10598"},{"name":"1 Department of Molecular Biophysics & Biochemistry, 2Department of Computer Science, 3Program in Computational Biology and Bioinformatics, 266 Whitney Avenue, Yale University, PO Box 208114, New Haven, CT 06520, 4Department of Genetics, Harvard University, 5Department of Cancer Biology, Dana-Farber Cancer Institute, 1 Jimmy Fund Way, Boston, MA 02115 and 6IBM Computational Biology Center, T.J. 
Watson Research Center, PO Box 704, Yorktown Heights, NY 10598"},{"name":"1 Department of Molecular Biophysics & Biochemistry, 2Department of Computer Science, 3Program in Computational Biology and Bioinformatics, 266 Whitney Avenue, Yale University, PO Box 208114, New Haven, CT 06520, 4Department of Genetics, Harvard University, 5Department of Cancer Biology, Dana-Farber Cancer Institute, 1 Jimmy Fund Way, Boston, MA 02115 and 6IBM Computational Biology Center, T.J. Watson Research Center, PO Box 704, Yorktown Heights, NY 10598"}]}],"member":"286","published-online":{"date-parts":[[2007,5,31]]},"reference":[{"volume-title":"Design and Analysis of Computer Algorithms","year":"1974","author":"Aho","key":"2024121118021348700_B1"},{"key":"2024121118021348700_B2","doi-asserted-by":"crossref","DOI":"10.1109\/ICDMW.2006.130","article-title":"Predictive integration of Gene Ontology-driven similarity and functional interactions","author":"Azuaje","year":"2006"},{"key":"2024121118021348700_B3","doi-asserted-by":"crossref","first-page":"262","DOI":"10.1073\/pnas.97.1.262","article-title":"Knowledge-based analysis of microarray gene expression data by using support vector machines","volume":"97","author":"Brown","year":"2000","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2024121118021348700_B4","doi-asserted-by":"crossref","first-page":"391","DOI":"10.1016\/S0092-8674(00)81360-4","article-title":"A novel mechanism for regulating activity of a transcription factor that controls the unfolded protein response","volume":"87","author":"Cox","year":"1996","journal-title":"Cell"},{"key":"2024121118021348700_B5","doi-asserted-by":"crossref","first-page":"14863","DOI":"10.1073\/pnas.95.25.14863","article-title":"Cluster analysis and display of genome-wide expression patterns","volume":"95","author":"Eisen","year":"1998","journal-title":"Proc. Natl Acad. Sci. 
USA"},{"key":"2024121118021348700_B6","doi-asserted-by":"crossref","first-page":"967","DOI":"10.1093\/bioinformatics\/btl042","article-title":"Assessing semantic similarity measures for the characterization of human regulatory pathways","volume":"22","author":"Guo","year":"2006","journal-title":"Bioinformatics"},{"key":"2024121118021348700_B7","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511574931","volume-title":"Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology","author":"Gusfield","year":"1997"},{"key":"2024121118021348700_B8","doi-asserted-by":"crossref","first-page":"D258","DOI":"10.1093\/nar\/gkh036","article-title":"The Gene Ontology (GO) database and informatics resource","volume":"32","author":"Harris","year":"2004","journal-title":"Nucleic Acids Res"},{"key":"2024121118021348700_B9","doi-asserted-by":"crossref","first-page":"1632","DOI":"10.1101\/gr.183801","article-title":"Annotation transfer for genomics: measuring functional divergence in multi-domain proteins","volume":"11","author":"Hegyi","year":"2001","journal-title":"Genome Res"},{"key":"2024121118021348700_B10","doi-asserted-by":"crossref","first-page":"7923","DOI":"10.1128\/MCB.21.23.7923-7932.2001","article-title":"The Hsp70-Ydj1 molecular chaperone represses the activity of the heme activator protein Hap1 in the absence of heme","volume":"21","author":"Hon","year":"2001","journal-title":"Mol. Cell. 
Biol"},{"key":"2024121118021348700_B11","doi-asserted-by":"crossref","first-page":"3017","DOI":"10.1101\/gad.1039602","article-title":"Complex transcriptional circuitry at the G1\/S transition in Saccharomyces cerevisiae","volume":"16","author":"Horak","year":"2002","journal-title":"Genes Dev"},{"key":"2024121118021348700_B12","doi-asserted-by":"crossref","DOI":"10.14209\/its.2002.603","article-title":"Distance Metrics in the Internet","author":"Huffaker","year":"2002"},{"key":"2024121118021348700_B13","doi-asserted-by":"crossref","first-page":"449","DOI":"10.1126\/science.1087361","article-title":"A Bayesian networks approach for predicting protein-protein interactions from genomic data","volume":"302","author":"Jansen","year":"2003","journal-title":"Science"},{"key":"2024121118021348700_B14","doi-asserted-by":"crossref","first-page":"32","DOI":"10.1093\/nar\/24.1.32","article-title":"EcoCyc: an encyclopedia of Escherichia coli genes and metabolism","volume":"24","author":"Karp","year":"1996","journal-title":"Nucleic Acids Res"},{"key":"2024121118021348700_B15","doi-asserted-by":"crossref","first-page":"12860","DOI":"10.1073\/pnas.95.22.12860","article-title":"Folding in vivo of a newly translated yeast cytosolic enzyme is mediated by the SSA class of cytosolic yeast Hsp70 proteins","volume":"95","author":"Kim","year":"1998","journal-title":"Proc. Natl Acad. Sci. 
USA"},{"key":"2024121118021348700_B16","doi-asserted-by":"crossref","first-page":"1362","DOI":"10.1126\/science.7761857","article-title":"Role of the protein chaperone YDJ1 in establishing Hsp90-mediated signal transduction pathways","volume":"268","author":"Kimura","year":"1995","journal-title":"Science"},{"key":"2024121118021348700_B17","doi-asserted-by":"crossref","first-page":"1848","DOI":"10.1109\/JPROC.2002.805302","article-title":"Toward a systematic definition of protein function that scales to the genome level: defining function in terms of interactions","volume":"90","author":"Lan","year":"2002","journal-title":"Proc. IEEE"},{"key":"2024121118021348700_B18","doi-asserted-by":"crossref","first-page":"799","DOI":"10.1126\/science.1075090","article-title":"Transcriptional regulatory networks in Saccharomyces cerevisiae","volume":"298","author":"Lee","year":"2002","journal-title":"Science"},{"key":"2024121118021348700_B19","doi-asserted-by":"crossref","first-page":"491","DOI":"10.1186\/1471-2105-7-491","article-title":"Assessing protein similarity with Gene Ontology and its use in subnuclear localization prediction","volume":"7","author":"Lei","year":"2006","journal-title":"BMC Bioinformatics"},{"key":"2024121118021348700_B20","first-page":"296","article-title":"An information-theoretic definition of similarity","author":"Lin","year":"1998"},{"key":"2024121118021348700_B21","doi-asserted-by":"crossref","first-page":"1703","DOI":"10.1101\/gr.192502","article-title":"Systematic learning of gene functional classes from DNA array expression data by using multilayer perceptrons","volume":"12","author":"Mateos","year":"2002","journal-title":"Genome Res"},{"key":"2024121118021348700_B22","doi-asserted-by":"crossref","first-page":"D169","DOI":"10.1093\/nar\/gkj148","article-title":"MIPS: analysis and annotation of proteins from whole genomes in 2005","volume":"34","author":"Mewes","year":"2006","journal-title":"Nucleic Acids 
Res"},{"key":"2024121118021348700_B23","doi-asserted-by":"crossref","first-page":"275","DOI":"10.1093\/nar\/27.1.275","article-title":"The CATH Database provides insights into protein structure\/function relationships","volume":"27","author":"Orengo","year":"1999","journal-title":"Nucleic Acids Res"},{"key":"2024121118021348700_B24","first-page":"448","article-title":"Using information content to evaluate semantic similarity in a taxonomy","author":"Resnik","year":"1995"},{"key":"2024121118021348700_B25","doi-asserted-by":"crossref","first-page":"40","DOI":"10.1093\/nar\/24.1.40","article-title":"Genes and proteins of Escherichia coli (GenProtEc)","volume":"24","author":"Riley","year":"1996","journal-title":"Nucleic Acids Res"},{"key":"2024121118021348700_B26","doi-asserted-by":"crossref","first-page":"5539","DOI":"10.1093\/nar\/gkh894","article-title":"The FunCat, a functional annotation scheme for systematic classification of proteins from whole genomes","volume":"32","author":"Ruepp","year":"2004","journal-title":"Nucleic Acids Res"},{"key":"2024121118021348700_B27","doi-asserted-by":"crossref","first-page":"3273","DOI":"10.1091\/mbc.9.12.3273","article-title":"Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization","volume":"9","author":"Spellman","year":"1998","journal-title":"Mol. Biol. 
Cell"},{"volume-title":"Graphs, Networks and Algorithms","year":"1981","author":"Swamy","key":"2024121118021348700_B28"},{"key":"2024121118021348700_B29","first-page":"25","article-title":"Gene expression correlation and gene ontology-based similarity: an assessment of quantitative relationships","author":"Wang","year":"2004"},{"key":"2024121118021348700_B30","doi-asserted-by":"crossref","first-page":"233","DOI":"10.1006\/jmbi.2000.3550","article-title":"Assessing annotation transfer for genomics: quantifying the relations between protein sequence, structure and function through traditional and probabilistic scores","volume":"297","author":"Wilson","year":"2000","journal-title":"J. Mol. Biol"},{"key":"2024121118021348700_B31","doi-asserted-by":"crossref","first-page":"255","DOI":"10.1038\/ng906","article-title":"Large-scale prediction of Saccharomyces cerevisiae gene function using overlapping transcriptional clusters","volume":"31","author":"Wu","year":"2002","journal-title":"Nat. Genet"},{"key":"2024121118021348700_B32","doi-asserted-by":"crossref","first-page":"2137","DOI":"10.1093\/nar\/gkl219","article-title":"Prediction of yeast protein-protein interaction network: insights from the Gene Ontology and annotations","volume":"34","author":"Wu","year":"2006","journal-title":"Nucleic Acids 
Res"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/23\/16\/2163\/61052008\/bioinformatics_23_16_2163.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/23\/16\/2163\/61052008\/bioinformatics_23_16_2163.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,12,11]],"date-time":"2024-12-11T22:24:10Z","timestamp":1733955850000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/23\/16\/2163\/197865"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2007,5,31]]},"references-count":32,"journal-issue":{"issue":"16","published-print":{"date-parts":[[2007,8,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btm291","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"type":"electronic","value":"1367-4811"},{"type":"print","value":"1367-4803"}],"subject":[],"published-other":{"date-parts":[[2007,8,15]]},"published":{"date-parts":[[2007,5,31]]}}}