{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,12]],"date-time":"2026-04-12T19:36:41Z","timestamp":1776022601927,"version":"3.50.1"},"reference-count":27,"publisher":"Oxford University Press (OUP)","issue":"18","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2009,9,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Rapidly expanding repositories of highly informative genomic data have generated increasing interest in methods for protein function prediction and inference of biological networks. The successful application of supervised machine learning to these tasks requires a gold standard for protein function: a trusted set of correct examples, which can be used to assess performance through cross-validation or other statistical approaches. Since gene annotation is incomplete for even the best studied model organisms, the biological reliability of such evaluations may be called into question.<\/jats:p>\n               <jats:p>Results: We address this concern by constructing and analyzing an experimentally based gold standard through comprehensive validation of protein function predictions for mitochondrion biogenesis in Saccharomyces cerevisiae. Specifically, we determine that (i) current machine learning approaches are able to generalize and predict novel biology from an incomplete gold standard and (ii) incomplete functional annotations adversely affect the evaluation of machine learning performance. While computational approaches performed better than predicted in the face of incomplete data, relative comparison of competing approaches\u2014even those employing the same training data\u2014is problematic with a sparse gold standard. Incomplete knowledge causes individual methods' performances to be differentially underestimated, resulting in misleading performance evaluations. We provide a benchmark gold standard for yeast mitochondria to complement current databases and an analysis of our experimental results in the hopes of mitigating these effects in future comparative evaluations.<\/jats:p>\n               <jats:p>Availability: The mitochondrial benchmark gold standard, as well as experimental results and additional data, is available at http:\/\/function.princeton.edu\/mitochondria<\/jats:p>\n               <jats:p>Contact: \u00a0ogt@cs.princeton.edu<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btp397","type":"journal-article","created":{"date-parts":[[2009,6,27]],"date-time":"2009-06-27T00:24:20Z","timestamp":1246062260000},"page":"2404-2410","source":"Crossref","is-referenced-by-count":34,"title":["The impact of incomplete knowledge on evaluation: an experimental benchmark for protein function prediction"],"prefix":"10.1093","volume":"25","author":[{"given":"Curtis","family":"Huttenhower","sequence":"first","affiliation":[{"name":"1 Department of Computer Science, Princeton University, 35 Olden Street, Princeton, NJ 08540-5233, 2Lewis-Sigler Institute for Integrative Genomics, Carl Icahn Laboratory, Princeton University, Princeton, NJ 08544, 3Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609 and 4Department of Computer Science, University of Minnesota, 200 Union Street SE, Minneapolis, MN 55455, USA"},{"name":"1 Department of Computer Science, Princeton University, 35 Olden Street, Princeton, NJ 08540-5233, 2Lewis-Sigler Institute for Integrative Genomics, Carl Icahn Laboratory, Princeton University, Princeton, NJ 08544, 3Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609 and 4Department of Computer Science, University of Minnesota, 200 Union Street SE, Minneapolis, MN 55455, USA"}]},{"given":"Matthew A.","family":"Hibbs","sequence":"additional","affiliation":[{"name":"1 Department of Computer Science, Princeton University, 35 Olden Street, Princeton, NJ 08540-5233, 2Lewis-Sigler Institute for Integrative Genomics, Carl Icahn Laboratory, Princeton University, Princeton, NJ 08544, 3Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609 and 4Department of Computer Science, University of Minnesota, 200 Union Street SE, Minneapolis, MN 55455, USA"}]},{"given":"Chad L.","family":"Myers","sequence":"additional","affiliation":[{"name":"1 Department of Computer Science, Princeton University, 35 Olden Street, Princeton, NJ 08540-5233, 2Lewis-Sigler Institute for Integrative Genomics, Carl Icahn Laboratory, Princeton University, Princeton, NJ 08544, 3Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609 and 4Department of Computer Science, University of Minnesota, 200 Union Street SE, Minneapolis, MN 55455, USA"}]},{"given":"Amy A.","family":"Caudy","sequence":"additional","affiliation":[{"name":"1 Department of Computer Science, Princeton University, 35 Olden Street, Princeton, NJ 08540-5233, 2Lewis-Sigler Institute for Integrative Genomics, Carl Icahn Laboratory, Princeton University, Princeton, NJ 08544, 3Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609 and 4Department of Computer Science, University of Minnesota, 200 Union Street SE, Minneapolis, MN 55455, USA"}]},{"given":"David C.","family":"Hess","sequence":"additional","affiliation":[{"name":"1 Department of Computer Science, Princeton University, 35 Olden Street, Princeton, NJ 08540-5233, 2Lewis-Sigler Institute for Integrative Genomics, Carl Icahn Laboratory, Princeton University, Princeton, NJ 08544, 3Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609 and 4Department of Computer Science, University of Minnesota, 200 Union Street SE, Minneapolis, MN 55455, USA"}]},{"given":"Olga G.","family":"Troyanskaya","sequence":"additional","affiliation":[{"name":"1 Department of Computer Science, Princeton University, 35 Olden Street, Princeton, NJ 08540-5233, 2Lewis-Sigler Institute for Integrative Genomics, Carl Icahn Laboratory, Princeton University, Princeton, NJ 08544, 3Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609 and 4Department of Computer Science, University of Minnesota, 200 Union Street SE, Minneapolis, MN 55455, USA"},{"name":"1 Department of Computer Science, Princeton University, 35 Olden Street, Princeton, NJ 08540-5233, 2Lewis-Sigler Institute for Integrative Genomics, Carl Icahn Laboratory, Princeton University, Princeton, NJ 08544, 3Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609 and 4Department of Computer Science, University of Minnesota, 200 Union Street SE, Minneapolis, MN 55455, USA"}]}],"member":"286","published-online":{"date-parts":[[2009,6,26]]},"reference":[{"key":"2023013112121110500_B1","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1038\/75556","article-title":"Gene ontology: tool for the unification of biology. The Gene Ontology Consortium","volume":"25","author":"Ashburner","year":"2000","journal-title":"Nat. Genet."},{"key":"2023013112121110500_B2","doi-asserted-by":"crossref","first-page":"D760","DOI":"10.1093\/nar\/gkl887","article-title":"NCBI GEO: mining tens of millions of expression profiles\u2013database and tools update","volume":"35","author":"Barrett","year":"2007","journal-title":"Nucleic Acids Res."},{"key":"2023013112121110500_B3","doi-asserted-by":"crossref","first-page":"830","DOI":"10.1093\/bioinformatics\/btk048","article-title":"Hierarchical multi-label prediction of gene function","volume":"22","author":"Barutcuoglu","year":"2006","journal-title":"Bioinformatics"},{"key":"2023013112121110500_B4","doi-asserted-by":"crossref","first-page":"37","DOI":"10.1016\/j.cell.2006.06.023","article-title":"Alternative splicing: new insights from global analyses","volume":"126","author":"Blencowe","year":"2006","journal-title":"Cell"},{"key":"2023013112121110500_B5","doi-asserted-by":"crossref","first-page":"D766","DOI":"10.1093\/nar\/gkl1019","article-title":"The Stanford Microarray Database: implementation of new analysis tools and open source release of software","volume":"35","author":"Demeter","year":"2007","journal-title":"Nucleic Acids Res."},{"key":"2023013112121110500_B6","doi-asserted-by":"crossref","first-page":"e1000407","DOI":"10.1371\/journal.pgen.1000407","article-title":"Computationally driven, quantitative experiments discover genes required for mitochondrial biogenesis","volume":"5","author":"Hess","year":"2009","journal-title":"PLoS Genet."},{"key":"2023013112121110500_B7","doi-asserted-by":"crossref","first-page":"2692","DOI":"10.1093\/bioinformatics\/btm403","article-title":"Exploring the functional landscape of gene expression: directed search of large microarray compendia","volume":"23","author":"Hibbs","year":"2007","journal-title":"Bioinformatics"},{"key":"2023013112121110500_B8","doi-asserted-by":"crossref","first-page":"e1000322","DOI":"10.1371\/journal.pcbi.1000322","article-title":"Directing experimental biology: a case study in mitochondrial biogenesis","volume":"5","author":"Hibbs","year":"2009","journal-title":"PLoS Comput. Biol."},{"key":"2023013112121110500_B9","doi-asserted-by":"crossref","first-page":"D577","DOI":"10.1093\/nar\/gkm909","article-title":"Gene Ontology annotations at SGD: new data sources and annotation methods","volume":"36","author":"Hong","year":"2008","journal-title":"Nucleic Acids Res."},{"key":"2023013112121110500_B10","doi-asserted-by":"crossref","first-page":"2890","DOI":"10.1093\/bioinformatics\/btl492","article-title":"A scalable method for integration and functional analysis of multiple microarray datasets","volume":"22","author":"Huttenhower","year":"2006","journal-title":"Bioinformatics"},{"key":"2023013112121110500_B11","doi-asserted-by":"crossref","first-page":"449","DOI":"10.1126\/science.1087361","article-title":"A Bayesian networks approach for predicting protein-protein interactions from genomic data","volume":"302","author":"Jansen","year":"2003","journal-title":"Science"},{"key":"2023013112121110500_B12","doi-asserted-by":"crossref","first-page":"D480","DOI":"10.1093\/nar\/gkm882","article-title":"KEGG for linking genomes to life and the environment","volume":"36","author":"Kanehisa","year":"2008","journal-title":"Nucleic Acids Res."},{"key":"2023013112121110500_B13","doi-asserted-by":"crossref","first-page":"2888","DOI":"10.1073\/pnas.0307326101","article-title":"Whole-genome annotation by using evidence integration in functional-linkage networks","volume":"101","author":"Karaoz","year":"2004","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023013112121110500_B14","doi-asserted-by":"crossref","first-page":"2626","DOI":"10.1093\/bioinformatics\/bth294","article-title":"A statistical framework for genomic data fusion","volume":"20","author":"Lanckriet","year":"2004","journal-title":"Bioinformatics"},{"key":"2023013112121110500_B15","doi-asserted-by":"crossref","first-page":"1555","DOI":"10.1126\/science.1099511","article-title":"A probabilistic functional network of yeast genes","volume":"306","author":"Lee","year":"2004","journal-title":"Science"},{"key":"2023013112121110500_B16","doi-asserted-by":"crossref","first-page":"605","DOI":"10.1128\/MMBR.00013-06","article-title":"The yeast actin cytoskeleton: from cellular function to biochemical mechanism","volume":"70","author":"Moseley","year":"2006","journal-title":"Microbiol. Mol. Biol. Rev."},{"key":"2023013112121110500_B17","doi-asserted-by":"crossref","first-page":"2322","DOI":"10.1093\/bioinformatics\/btm332","article-title":"Context-sensitive data integration and prediction of biological networks","volume":"23","author":"Myers","year":"2007","journal-title":"Bioinformatics"},{"key":"2023013112121110500_B18","doi-asserted-by":"crossref","first-page":"187","DOI":"10.1186\/1471-2164-7-187","article-title":"Finding function: evaluation methods for functional genomic data","volume":"7","author":"Myers","year":"2006","journal-title":"BMC Genomics"},{"key":"2023013112121110500_B19","doi-asserted-by":"crossref","first-page":"R114","DOI":"10.1186\/gb-2005-6-13-r114","article-title":"Discovery of biological networks from diverse functional genomic data","volume":"6","author":"Myers","year":"2005","journal-title":"Genome Biol."},{"issue":"Suppl. 1","key":"2023013112121110500_B20","doi-asserted-by":"crossref","first-page":"i302","DOI":"10.1093\/bioinformatics\/bti1054","article-title":"Whole-proteome prediction of protein function via graph-theoretic analysis of interaction maps","volume":"21","author":"Nabieva","year":"2005","journal-title":"Bioinformatics"},{"key":"2023013112121110500_B21","doi-asserted-by":"crossref","first-page":"928","DOI":"10.1126\/science.125.3254.928","article-title":"Tetrazolium overlay technique for population studies of respiration deficiency in yeast","volume":"125","author":"Ogur","year":"1957","journal-title":"Science"},{"key":"2023013112121110500_B22","doi-asserted-by":"crossref","first-page":"D747","DOI":"10.1093\/nar\/gkl995","article-title":"ArrayExpress\u2014a public database of microarray experiments and gene expression profiles","volume":"35","author":"Parkinson","year":"2007","journal-title":"Nucleic Acids Res."},{"key":"2023013112121110500_B23","doi-asserted-by":"crossref","first-page":"5539","DOI":"10.1093\/nar\/gkh894","article-title":"The FunCat, a functional annotation scheme for systematic classification of proteins from whole genomes","volume":"32","author":"Ruepp","year":"2004","journal-title":"Nucleic Acids Res."},{"key":"2023013112121110500_B24","volume-title":"Artificial Intelligence: A Modern Approach.","author":"Russell","year":"2003"},{"key":"2023013112121110500_B25","doi-asserted-by":"crossref","first-page":"523","DOI":"10.1126\/science.1105809","article-title":"Causal protein-signaling networks derived from multiparameter single-cell data","volume":"308","author":"Sachs","year":"2005","journal-title":"Science"},{"key":"2023013112121110500_B26","first-page":"171","article-title":"Synthetic genetic array analysis in Saccharomyces cerevisiae","volume":"313","author":"Tong","year":"2006","journal-title":"Methods Mol. Biol."},{"key":"2023013112121110500_B27","doi-asserted-by":"crossref","first-page":"8348","DOI":"10.1073\/pnas.0832373100","article-title":"A Bayesian framework for combining heterogeneous data sources for gene function prediction (in Saccharomyces cerevisiae)","volume":"100","author":"Troyanskaya","year":"2003","journal-title":"Proc. Natl Acad. Sci. USA"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/25\/18\/2404\/48995059\/bioinformatics_25_18_2404.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/25\/18\/2404\/48995059\/bioinformatics_25_18_2404.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,31]],"date-time":"2023-01-31T21:35:46Z","timestamp":1675200946000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/25\/18\/2404\/196123"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2009,6,26]]},"references-count":27,"journal-issue":{"issue":"18","published-print":{"date-parts":[[2009,9,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btp397","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2009,9,15]]},"published":{"date-parts":[[2009,6,26]]}}}