{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,21]],"date-time":"2025-10-21T15:01:46Z","timestamp":1761058906495},"reference-count":26,"publisher":"Oxford University Press (OUP)","issue":"9","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2009,5,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: The biological community's reliance on computational annotations of protein function makes correct assessment of function prediction methods an issue of great importance. The fact that a large fraction of the annotations in current biological databases are based on computational methods can lead to bias in estimating the accuracy of function prediction methods. This can happen since predicting an annotation that was derived computationally in the first place is likely easier than predicting annotations that were derived experimentally, leading to over-optimistic classifier performance estimates.<\/jats:p>\n               <jats:p>Results: We illustrate this phenomenon in a set of controlled experiments using a nearest neighbor classifier that uses PSI-BLAST similarity scores. Our results demonstrate that the source of Gene Ontology (GO) annotations used to assess a protein function predictor can have a highly significant influence on classifier accuracy: the average accuracy over four species and over GO terms in the biological process namespace increased from 0.72 to 0.87 when the classifier was given access to annotations that are assigned evidence codes that indicate a possible computational source, instead of experimentally determined annotations. Slightly smaller increases were observed in the other namespaces. 
In these comparisons the total number of annotations and their distribution across GO terms were kept the same.<\/jats:p>\n               <jats:p>Conclusion: Taking into account GO evidence codes is required for reporting accuracy statistics that do not overestimate a model's performance, and is of particular importance for a fair comparison of classifiers that rely on different information sources.<\/jats:p>\n               <jats:p>Contact: \u00a0rogersma@cs.colostate.edu<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btp122","type":"journal-article","created":{"date-parts":[[2009,3,3]],"date-time":"2009-03-03T01:24:53Z","timestamp":1236043493000},"page":"1173-1177","source":"Crossref","is-referenced-by-count":44,"title":["The use of gene ontology evidence codes in preventing classifier assessment bias"],"prefix":"10.1093","volume":"25","author":[{"given":"Mark F.","family":"Rogers","sequence":"first","affiliation":[{"name":"1 Computer Science Department and 2 Statistics Department, Colorado State University, Ft. Collins, CO, USA"}]},{"given":"Asa","family":"Ben-Hur","sequence":"additional","affiliation":[{"name":"1 Computer Science Department and 2 Statistics Department, Colorado State University, Ft. Collins, CO, USA"}]}],"member":"286","published-online":{"date-parts":[[2009,3,2]]},"reference":[{"key":"2023013110275750700_B1","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","article-title":"Basic local alignment search tool","volume":"215","author":"Altschul","year":"1990","journal-title":"J. Mol. 
Biol."},{"key":"2023013110275750700_B2","doi-asserted-by":"crossref","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","article-title":"Gapped BLAST and PSI\u2013BLAST: a new generation of protein database search programs","volume":"25","author":"Altschul","year":"1997","journal-title":"Nucleic Acids Res."},{"key":"2023013110275750700_B3","doi-asserted-by":"crossref","first-page":"1425","DOI":"10.1101\/gr.180801","article-title":"Creating the gene ontology resource: design and implementation","volume":"11","author":"Ashburner","year":"2001","journal-title":"Genome Res"},{"issue":"Suppl. 1","key":"2023013110275750700_B4","doi-asserted-by":"crossref","first-page":"i38","DOI":"10.1093\/bioinformatics\/bti1016","article-title":"Kernel methods for predicting protein-protein interactions","volume":"21","author":"Ben-Hur","year":"2005","journal-title":"Bioinformatics"},{"key":"2023013110275750700_B5","doi-asserted-by":"crossref","first-page":"425","DOI":"10.1016\/0168-9525(96)60040-7","article-title":"Go hunting in sequence databases but watch out for the traps","volume":"12","author":"Bork","year":"1996","journal-title":"Trends Genet."},{"key":"2023013110275750700_B6","doi-asserted-by":"crossref","first-page":"132","DOI":"10.1016\/S0168-9525(99)01706-0","article-title":"Errors in genome annotation","volume":"15","author":"Brenner","year":"1999","journal-title":"Trends Genet."},{"key":"2023013110275750700_B7","doi-asserted-by":"crossref","first-page":"e12","DOI":"10.1093\/nar\/gkm1167","article-title":"Gene Ontology annotation quality analysis in model eukaryotes","volume":"36","author":"Buza","year":"2008","journal-title":"Nucleic Acids Res"},{"key":"2023013110275750700_B8","doi-asserted-by":"crossref","first-page":"95","DOI":"10.1145\/640075.640087","article-title":"An integrated probabilistic model for functional prediction of proteins","volume-title":"RECOMB '03: Proceedings of the seventh annual international conference on Research in computational molecular 
biology.","author":"Deng","year":"2003"},{"key":"2023013110275750700_B9","doi-asserted-by":"crossref","first-page":"1641","DOI":"10.1093\/bioinformatics\/18.12.1641","article-title":"Modeling the percolation of annotation errors in a database of protein sequences","volume":"18","author":"Gilks","year":"2002","journal-title":"Bioinformatics"},{"key":"2023013110275750700_B10"},{"key":"2023013110275750700_B11","doi-asserted-by":"crossref","first-page":"237","DOI":"10.1007\/978-3-540-35488-8_10","article-title":"Design and analysis of the NIPS2003 challenge","volume-title":"Feature extraction, foundations and applications.","author":"Guyon","year":"2006"},{"key":"2023013110275750700_B12","doi-asserted-by":"crossref","first-page":"170","DOI":"10.1186\/1471-2105-8-170","article-title":"Estimating the annotation error rate of curated GO database sequence annotations","volume":"8","author":"Jones","year":"2007","journal-title":"BMC Bioinformatics"},{"key":"2023013110275750700_B13","doi-asserted-by":"crossref","first-page":"753","DOI":"10.1093\/bioinformatics\/14.9.753","article-title":"What we do not know about sequence analysis and sequence databases","volume":"14","author":"Karp","year":"1998","journal-title":"Bioinformatics"},{"key":"2023013110275750700_B14","doi-asserted-by":"crossref","first-page":"995","DOI":"10.1038\/nrm2281","article-title":"Predicting protein function from sequence and structure","volume":"8","author":"Lee","year":"2007","journal-title":"Nat. Rev. Mol. Cell Biol."},{"issue":"Suppl. 
1","key":"2023013110275750700_B15","doi-asserted-by":"crossref","first-page":"i197","DOI":"10.1093\/bioinformatics\/btg1026","article-title":"Predicting protein function from protein\/protein interaction data: a probabilistic approach","volume":"19","author":"Letovsky","year":"2003","journal-title":"Bioinformatics"},{"key":"2023013110275750700_B16","first-page":"93","article-title":"Predicting functional linkages from gene fusions with confidence","volume":"1","author":"Marcotte","year":"2002","journal-title":"Appl. Bioinform."},{"issue":"Suppl. 1","key":"2023013110275750700_B17","doi-asserted-by":"crossref","first-page":"S4","DOI":"10.1186\/gb-2008-9-s1-s4","article-title":"GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function","volume":"9","author":"Mostafavi","year":"2008","journal-title":"Genome Biol."},{"key":"2023013110275750700_B18","doi-asserted-by":"crossref","first-page":"121","DOI":"10.1016\/j.str.2004.10.015","article-title":"Inference of protein function from protein structure","volume":"13","author":"Pal","year":"2005","journal-title":"Structure"},{"issue":"Suppl. 
1","key":"2023013110275750700_B19","doi-asserted-by":"crossref","first-page":"S2","DOI":"10.1186\/gb-2008-9-s1-s2","article-title":"A critical assessment of Mus musculus gene function prediction using integrated genomic evidence","volume":"9","author":"Pena-Castillo","year":"2008","journal-title":"Genome Biology"},{"key":"2023013110275750700_B20","doi-asserted-by":"crossref","first-page":"1090","DOI":"10.1093\/bioinformatics\/btl642","article-title":"A structural alignment kernel for protein structures","volume":"23","author":"Qiu","year":"2007","journal-title":"Bioinformatics"},{"key":"2023013110275750700_B21","doi-asserted-by":"crossref","first-page":"863","DOI":"10.1016\/j.jmb.2003.08.057","article-title":"How well is enzyme function conserved as a function of pairwise sequence identity?","volume":"333","author":"Tian","year":"2003","journal-title":"J. Mol. Biol."},{"issue":"Suppl. 1","key":"2023013110275750700_B22","doi-asserted-by":"crossref","first-page":"S7","DOI":"10.1186\/gb-2008-9-s1-s7","article-title":"Combining guilt-by-association and guilt-by-profiling to predict Saccharomyces cerevisiae gene function","volume":"9","author":"Tian","year":"2008","journal-title":"Genome Biol."},{"key":"2023013110275750700_B23","doi-asserted-by":"crossref","first-page":"267","DOI":"10.1016\/j.sbi.2005.05.010","article-title":"Automatic annotation of protein function","volume":"15","author":"Valencia","year":"2005","journal-title":"Curr. Opin. Struct. 
Biol."},{"key":"2023013110275750700_B24","doi-asserted-by":"crossref","first-page":"116","DOI":"10.1186\/1471-2105-5-116","article-title":"Applying support vector machines for Gene ontology based gene function prediction","volume":"5","author":"Vinayagam","year":"2004","journal-title":"BMC Bioinformatics"},{"key":"2023013110275750700_B25","volume-title":"Probability and Statistics for Engineers and Scientists.","author":"Walpole","year":"1978","edition":"2"},{"key":"2023013110275750700_B26","doi-asserted-by":"crossref","first-page":"1268","DOI":"10.1093\/bioinformatics\/18.9.1268","article-title":"UniBLAST: a system to filter, cluster, and display BLAST results and assign unique gene annotation","volume":"18","author":"Zhou","year":"2002","journal-title":"Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/25\/9\/1173\/48983992\/bioinformatics_25_9_1173.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/25\/9\/1173\/48983992\/bioinformatics_25_9_1173.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,31]],"date-time":"2023-01-31T20:33:24Z","timestamp":1675197204000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/25\/9\/1173\/204083"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2009,3,2]]},"references-count":26,"journal-issue":{"issue":"9","published-print":{"date-parts":[[2009,5,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btp122","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2009,5,1]]},"published":{"date-parts":[[2009,3,2]]}}}