{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,14]],"date-time":"2026-05-14T22:26:38Z","timestamp":1778797598599,"version":"3.51.4"},"reference-count":14,"publisher":"Oxford University Press (OUP)","issue":"17","license":[{"start":{"date-parts":[[2016,10,2]],"date-time":"2016-10-02T00:00:00Z","timestamp":1475366400000},"content-version":"vor","delay-in-days":772,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/3.0"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2014,9,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: The automated functional annotation of biological macromolecules is a problem of computational assignment of biological concepts or ontological terms to genes and gene products. A number of methods have been developed to computationally annotate genes using standardized nomenclature such as Gene Ontology (GO). However, questions remain about the possibility for development of accurate methods that can integrate disparate molecular data as well as about an unbiased evaluation of these methods. One important concern is that experimental annotations of proteins are incomplete. This raises questions as to whether and to what degree currently available data can be reliably used to train computational models and estimate their performance accuracy.<\/jats:p>\n               <jats:p>Results: We study the effect of incomplete experimental annotations on the reliability of performance evaluation in protein function prediction. Using the structured-output learning framework, we provide theoretical analyses and carry out simulations to characterize the effect of growing experimental annotations on the correctness and stability of performance estimates corresponding to different types of methods. 
We then analyze real biological data by simulating the prediction, evaluation and subsequent re-evaluation (after additional experimental annotations become available) of GO term predictions. Our results agree with previous observations that incomplete and accumulating experimental annotations have the potential to significantly impact accuracy assessments. We find that their influence reflects a complex interplay between the prediction algorithm, performance metric and underlying ontology. However, using the available experimental data and under realistic assumptions, our results also suggest that current large-scale evaluations are meaningful and almost surprisingly reliable.<\/jats:p>\n               <jats:p>Contact: \u00a0predrag@indiana.edu<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btu472","type":"journal-article","created":{"date-parts":[[2014,8,26]],"date-time":"2014-08-26T11:23:57Z","timestamp":1409052237000},"page":"i609-i616","source":"Crossref","is-referenced-by-count":42,"title":["The impact of incomplete knowledge on the evaluation of protein function prediction: a structured-output learning perspective"],"prefix":"10.1093","volume":"30","author":[{"given":"Yuxiang","family":"Jiang","sequence":"first","affiliation":[{"name":"1 Department of Computer Science and Informatics, Indiana University, Bloomington, IN, USA, 2Department of Microbiology and 3Department of Computer Science and Software Engineering, Miami University, Oxford, OH, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Wyatt T.","family":"Clark","sequence":"additional","affiliation":[{"name":"1 Department of Computer Science and Informatics, Indiana University, Bloomington, IN, USA, 2Department of Microbiology and 3Department of Computer Science and Software Engineering, Miami University, Oxford, OH, 
USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Iddo","family":"Friedberg","sequence":"additional","affiliation":[{"name":"1 Department of Computer Science and Informatics, Indiana University, Bloomington, IN, USA, 2Department of Microbiology and 3Department of Computer Science and Software Engineering, Miami University, Oxford, OH, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Predrag","family":"Radivojac","sequence":"additional","affiliation":[{"name":"1 Department of Computer Science and Informatics, Indiana University, Bloomington, IN, USA, 2Department of Microbiology and 3Department of Computer Science and Software Engineering, Miami University, Oxford, OH, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2014,8,22]]},"reference":[{"key":"2023012711545887700_btu472-B1","doi-asserted-by":"crossref","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","article-title":"Gapped BLAST and PSI-BLAST: a new generation of protein database search programs","volume":"25","author":"Altschul","year":"1997","journal-title":"Nucleic Acids Res."},{"key":"2023012711545887700_btu472-B2","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1038\/75556","article-title":"Gene ontology: tool for the unification of biology. The Gene Ontology Consortium","volume":"25","author":"Ashburner","year":"2000","journal-title":"Nat. 
Genet."},{"key":"2023012711545887700_btu472-B3","doi-asserted-by":"crossref","first-page":"i53","DOI":"10.1093\/bioinformatics\/btt228","article-title":"Information-theoretic evaluation of predicted ontological annotations","volume":"29","author":"Clark","year":"2013","journal-title":"Bioinformatics"},{"key":"2023012711545887700_btu472-B4","doi-asserted-by":"crossref","first-page":"609","DOI":"10.1016\/j.tig.2013.09.005","article-title":"CAFA and the open world of protein function predictions","volume":"29","author":"Dessimoz","year":"2013","journal-title":"Trends Genet."},{"key":"2023012711545887700_btu472-B5","doi-asserted-by":"crossref","first-page":"213","DOI":"10.1145\/1401890.1401920","article-title":"Learning classifiers from only positive and unlabeled data","volume-title":"Proceeding of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","author":"Elkan","year":"2008"},{"key":"2023012711545887700_btu472-B6","doi-asserted-by":"crossref","first-page":"225","DOI":"10.1093\/bib\/bbl004","article-title":"Automated protein function prediction\u2013the genomic challenge","volume":"7","author":"Friedberg","year":"2006","journal-title":"Brief. Bioinform."},{"key":"2023012711545887700_btu472-B7","doi-asserted-by":"crossref","first-page":"2404","DOI":"10.1093\/bioinformatics\/btp397","article-title":"The impact of incomplete knowledge on evaluation: an experimental benchmark for protein function prediction","volume":"25","author":"Huttenhower","year":"2009","journal-title":"Bioinformatics"},{"key":"2023012711545887700_btu472-B8","doi-asserted-by":"crossref","first-page":"178","DOI":"10.1186\/1471-2105-5-178","article-title":"GOtcha: a new method for prediction of protein function assessed by the annotation of seven genomes","volume":"5","author":"Martin","year":"2004","journal-title":"BMC Bioinformatics"},{"issue":"Suppl. 
1","key":"2023012711545887700_btu472-B9","doi-asserted-by":"crossref","first-page":"S2","DOI":"10.1186\/gb-2008-9-s1-s2","article-title":"A critical assessment of mus musculus gene function prediction using integrated genomic evidence","volume":"9","author":"Pena-Castillo","year":"2008","journal-title":"Genome Biol."},{"key":"2023012711545887700_btu472-B10","doi-asserted-by":"crossref","first-page":"e1000160","DOI":"10.1371\/journal.pcbi.1000160","article-title":"The rough guide to in silico function prediction, or how to use sequence and structure information to predict protein function","volume":"4","author":"Punta","year":"2008","journal-title":"PLoS Comput. Biol."},{"key":"2023012711545887700_btu472-B11","doi-asserted-by":"crossref","first-page":"221","DOI":"10.1038\/nmeth.2340","article-title":"A large-scale evaluation of computational protein function prediction","volume":"10","author":"Radivojac","year":"2013","journal-title":"Nat. Methods"},{"key":"2023012711545887700_btu472-B12","doi-asserted-by":"crossref","first-page":"210","DOI":"10.1016\/j.tibtech.2009.01.002","article-title":"Protein function prediction\u2013the power of multiplicity","volume":"27","author":"Rentzsch","year":"2009","journal-title":"Trends Biotechnol."},{"key":"2023012711545887700_btu472-B13","first-page":"380","article-title":"Classifier evaluation with missing negative class labels","volume-title":"Proceedings of the 12th International Symposium on Intelligent Data Analysis (IDA 2013)","author":"Rider","year":"2013"},{"key":"2023012711545887700_btu472-B14","doi-asserted-by":"crossref","first-page":"88","DOI":"10.1038\/msb4100129","article-title":"Network-based prediction of protein function","volume":"3","author":"Sharan","year":"2007","journal-title":"Mol. Syst. 
Biol."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/30\/17\/i609\/48927905\/bioinformatics_30_17_i609.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/30\/17\/i609\/48927905\/bioinformatics_30_17_i609.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,27]],"date-time":"2023-01-27T12:33:57Z","timestamp":1674822837000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/30\/17\/i609\/201287"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014,8,22]]},"references-count":14,"journal-issue":{"issue":"17","published-print":{"date-parts":[[2014,9,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btu472","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2014,9,1]]},"published":{"date-parts":[[2014,8,22]]}}}