{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,29]],"date-time":"2025-12-29T13:50:25Z","timestamp":1767016225535},"reference-count":31,"publisher":"Springer Science and Business Media LLC","issue":"S1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2005,5]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>The Gene Ontology Annotation (GOA) database <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" xlink:href=\"http:\/\/www.ebi.ac.uk\/GOA\" ext-link-type=\"uri\">http:\/\/www.ebi.ac.uk\/GOA<\/jats:ext-link> aims to provide high-quality supplementary GO annotation to proteins in the UniProt Knowledgebase. Like many other biological databases, GOA gathers much of its content from the careful manual curation of literature. However, as both the volume of literature and of proteins requiring characterization increases, the manual processing capability can become overloaded.<\/jats:p>\n            <jats:p>Consequently, semi-automated aids are often employed to expedite the curation process. Traditionally, electronic techniques in GOA depend largely on exploiting the knowledge in existing resources such as InterPro. However, in recent years, text mining has been hailed as a potentially useful tool to aid the curation process.<\/jats:p>\n            <jats:p>To encourage the development of such tools, the GOA team at EBI agreed to take part in the functional annotation task of the BioCreAtIvE (Critical Assessment of Information Extraction systems in Biology) challenge.<\/jats:p>\n            <jats:p>BioCreAtIvE task 2 was an experiment to test if automatically derived classification using information retrieval and extraction could assist expert biologists in the annotation of the GO vocabulary to the proteins in the UniProt Knowledgebase.<\/jats:p>\n            <jats:p>GOA provided the training corpus of over 9000 manual GO annotations extracted from the literature. For the test set, we provided a corpus of 200 new <jats:italic>Journal of Biological Chemistry<\/jats:italic> articles used to annotate 286 human proteins with GO terms. A team of experts manually evaluated the results of 9 participating groups, each of which provided highlighted sentences to support their GO and protein annotation predictions. Here, we give a biological perspective on the evaluation, explain how we annotate GO using literature and offer some suggestions to improve the precision of future text-retrieval and extraction techniques. Finally, we provide the results of the first inter-annotator agreement study for manual GO curation, as well as an assessment of our current electronic GO annotation strategies.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>The GOA database currently extracts GO annotation from the literature with 91 to 100% precision, and at least 72% recall. This creates a particularly high threshold for text mining systems which in BioCreAtIvE task 2 (GO annotation extraction and retrieval) initial results precisely predicted GO terms only 10 to 20% of the time.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusion<\/jats:title>\n            <jats:p>Improvements in the performance and accuracy of text mining for GO terms should be expected in the next BioCreAtIvE challenge. In the meantime the manual and electronic GO annotation strategies already employed by GOA will provide high quality annotations.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1471-2105-6-s1-s17","type":"journal-article","created":{"date-parts":[[2005,5,24]],"date-time":"2005-05-24T18:13:44Z","timestamp":1116958424000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":104,"title":["An evaluation of GO annotation retrieval for BioCreAtIvE and GOA"],"prefix":"10.1186","volume":"6","author":[{"given":"Evelyn B","family":"Camon","sequence":"first","affiliation":[]},{"given":"Daniel G","family":"Barrell","sequence":"additional","affiliation":[]},{"given":"Emily C","family":"Dimmer","sequence":"additional","affiliation":[]},{"given":"Vivian","family":"Lee","sequence":"additional","affiliation":[]},{"given":"Michele","family":"Magrane","sequence":"additional","affiliation":[]},{"given":"John","family":"Maslen","sequence":"additional","affiliation":[]},{"given":"David","family":"Binns","sequence":"additional","affiliation":[]},{"given":"Rolf","family":"Apweiler","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2005,5,24]]},"reference":[{"issue":"Database","key":"652_CR1","doi-asserted-by":"publisher","first-page":"D115","DOI":"10.1093\/nar\/gkh131","volume":"32","author":"R Apweiler","year":"2004","unstructured":"Apweiler R, Bairoch A, Wu CH, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, Magrane M, Martin MJ, Natale DA, O'Donovan C, Redaschi N, Yeh LS: UniProt: the Universal Protein knowledgebase. Nucleic Acids Res 2004, 32(Database):D115\u2013119. 10.1093\/nar\/gkh131","journal-title":"Nucleic Acids Res"},{"issue":"Database","key":"652_CR2","doi-asserted-by":"publisher","first-page":"D262","DOI":"10.1093\/nar\/gkh021","volume":"32","author":"E Camon","year":"2004","unstructured":"Camon E, Magrane M, Barrell D, Lee V, Dimmer E, Maslen J, Binns D, Harte N, Lopez R, Apweiler R: The Gene Ontology Annotation (GOA) Database: sharing knowledge in Uniprot with Gene Ontology. Nucleic Acids Res 2004, 32(Database):D262\u2013266. 10.1093\/nar\/gkh021","journal-title":"Nucleic Acids Res"},{"issue":"Database","key":"652_CR3","doi-asserted-by":"publisher","first-page":"D258","DOI":"10.1093\/nar\/gkh036","volume":"32","author":"Gene Ontology Consortium","year":"2004","unstructured":"Gene Ontology Consortium: The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res 2004, 32(Database):D258\u2013261. 10.1093\/nar\/gkh036","journal-title":"Nucleic Acids Res"},{"key":"652_CR4","unstructured":"GO Consortium home page[http:\/\/www.geneontology.org]"},{"key":"652_CR5","unstructured":"GOA home page[http:\/\/\/www.ebi.ac.uk\/GOA]"},{"key":"652_CR6","first-page":"7158","volume":"63","author":"HE Cunliffe","year":"2003","unstructured":"Cunliffe HE, Ringner M, Bilke S, Walker RL, Cheung JM, Chen Y, Meltzer PS: The gene expression response of breast cancer to growth regulators: patterns and correlation with tumor expression profiles. Cancer Res 2003, 63: 7158\u201366.","journal-title":"Cancer Res"},{"key":"652_CR7","doi-asserted-by":"publisher","first-page":"197","DOI":"10.1038\/ng1291","volume":"3","author":"SA McCarroll","year":"2004","unstructured":"McCarroll SA, Murphy CT, Zou S, Pletcher SD, Chin CS, Jan YN, Kenyon C, Bargmann CI, Li H: Comparing genomic expression patterns across species identifies shared transcriptional profile in aging. Nat Genet 2004, 3: 197\u2013204. 10.1038\/ng1291","journal-title":"Nat Genet"},{"key":"652_CR8","doi-asserted-by":"publisher","first-page":"96","DOI":"10.1074\/mcp.M200074-MCP200","volume":"2","author":"T Kislinger","year":"2003","unstructured":"Kislinger T, Rahman K, Radulovic D, Cox B, Rossant J, Emili A: PRISM, a Generic Large Scale Proteomic Investigation Strategy for Mammals. Mol Cell Proteomics 2003, 2: 96\u2013106. 10.1074\/mcp.M200074-MCP200","journal-title":"Mol Cell Proteomics"},{"issue":"6","key":"652_CR9","doi-asserted-by":"publisher","first-page":"895","DOI":"10.1093\/bioinformatics\/btg500","volume":"20","author":"M Deng","year":"2004","unstructured":"Deng M, Tu Z, Sun F, Chen T: Mapping Gene Ontology to proteins based on protein-protein interaction data. Bioinformatics 2004, 20(6):895\u2013902. 10.1093\/bioinformatics\/btg500","journal-title":"Bioinformatics"},{"issue":"5","key":"652_CR10","doi-asserted-by":"publisher","first-page":"635","DOI":"10.1093\/bioinformatics\/btg036","volume":"19","author":"LJ Jensen","year":"2003","unstructured":"Jensen LJ, Gupta R, Staerfeldt HH, Brunak S: Prediction of human protein function according to Gene Ontology categories. Bioinformatics 2003, 19(5):635\u2013642. 10.1093\/bioinformatics\/btg036","journal-title":"Bioinformatics"},{"issue":"Database","key":"652_CR11","first-page":"D262","volume":"32","author":"D Groth","year":"2004","unstructured":"Groth D, Lehrach H, Hennig S: GOblet: a platform for Gene Ontology annotation of anonymous sequence data. Nucleic Acids Res 2004, 32(Database):D262\u2013266.","journal-title":"Nucleic Acids Res"},{"issue":"10","key":"652_CR12","doi-asserted-by":"publisher","first-page":"1275","DOI":"10.1093\/bioinformatics\/btg153","volume":"19","author":"PW Lord","year":"2003","unstructured":"Lord PW, Stevens RD, Brass A, Goble CA: Investigating semantic similarity measures across the Gene Ontology: the relationship between sequence and annotation. Bioinformatics 2003, 19(10):1275\u20131283. 10.1093\/bioinformatics\/btg153","journal-title":"Bioinformatics"},{"issue":"12","key":"652_CR13","doi-asserted-by":"publisher","first-page":"1553","DOI":"10.1093\/bioinformatics\/18.12.1553","volume":"18","author":"L Hirschman","year":"2002","unstructured":"Hirschman L, Park JC, Tsujii J, Wong L, Wu CH: Accomplishments and challenges in literature data mining for biology. Bioinformatics 2002, 18(12):1553\u20131561. 10.1093\/bioinformatics\/18.12.1553","journal-title":"Bioinformatics"},{"issue":"2","key":"652_CR14","doi-asserted-by":"publisher","first-page":"144","DOI":"10.1371\/journal.pbio.0000048","volume":"1","author":"S Dickman","year":"2003","unstructured":"Dickman S: Tough Mining, The challenges of searching the scientific literature. Plos Biology 2003, 1(2):144\u2013147. 10.1371\/journal.pbio.0000048","journal-title":"Plos Biology"},{"key":"652_CR15","unstructured":"Textpresso[http:\/\/www.textpresso.org\/]"},{"issue":"Database","key":"652_CR16","doi-asserted-by":"publisher","first-page":"D315","DOI":"10.1093\/nar\/gkg046","volume":"31","author":"NJ Mulder","year":"2003","unstructured":"Mulder NJ, Apweiler R, Attwood TK, Bairoch A, Barrell D, Bateman A, Binns D, Biswas M, Bradley P, Bork P, Bucher P, Copley RR, Courcelle E, Das U, Durbin R, Falquet L, Fleischmann W, Griffiths-Jones S, Haft D, Harte N, Hulo N, Kahn D, Kanapin A, Krestyaninova M, Lopez R, Letunic I, Lonsdale D, Silventoinen V, Orchard SE, Pagni M, Peyruc D, Ponting CP, Selengut JD, Servant F, Sigrist CJ, Vaughan R, Zdobnov EM: The InterPro Database, 2003 brings increased coverage and new features. Nucleic Acids Res 2003, 31(Database):D315\u2013318. 10.1093\/nar\/gkg046","journal-title":"Nucleic Acids Res"},{"issue":"1","key":"652_CR17","doi-asserted-by":"publisher","first-page":"49","DOI":"10.1016\/S1476-9271(02)00094-4","volume":"27","author":"A Gattiker","year":"2003","unstructured":"Gattiker A, Michoud K, Rivoire C, Auchincloss AH, Coudert E, Lima T, Kersey P, Pagni M, Sigrist CJ, Lachaize C, Veuthey AL, Gasteiger E, Bairoch A: Automated annotation of microbial proteomes in SWISS-PROT. Comput Biol Chem 2003, 27(1):49\u201358. 10.1016\/S1476-9271(02)00094-4","journal-title":"Comput Biol Chem"},{"issue":"Database","key":"652_CR18","doi-asserted-by":"publisher","first-page":"D568","DOI":"10.1093\/nar\/gkh069","volume":"32","author":"DP Hill","year":"2004","unstructured":"Hill DP, Begley DA, Finger JH, Hayamizu TF, McCright IJ, Smith CM, Beal JS, Corbani LE, Blake JA, Eppig JT, Kadin JA, Richardson JE, Ringwald M: The mouse Gene Expression Database (GXD): updates and enhancements. Nucleic Acids Res 2004, 32(Database):D568\u2013571. 10.1093\/nar\/gkh069","journal-title":"Nucleic Acids Res"},{"issue":"1","key":"652_CR19","doi-asserted-by":"publisher","first-page":"69","DOI":"10.1093\/nar\/30.1.69","volume":"30","author":"SS Dwight","year":"2002","unstructured":"Dwight SS, Harris MA, Dolinski K, Ball CA, Binkley G, Christie KR, Fisk DG, Issel-Tarver L, Schroeder M, Sherlock G, Sethuraman A, Weng S, Botstein D, Cherry JM: Saccharomyces Genome Database (SGD) provides secondary gene annotation using the Gene Ontology (GO). Nucleic Acids Res 2002, 30(1):69\u201372. 10.1093\/nar\/30.1.69","journal-title":"Nucleic Acids Res"},{"issue":"Database","key":"652_CR20","doi-asserted-by":"publisher","first-page":"D27","DOI":"10.1093\/nar\/gkh120","volume":"32","author":"T Kulikova","year":"2004","unstructured":"Kulikova T, Aldebert P, Althorpe N, Baker W, Bates K, Browne P, van den Broek A, Cochrane G, Duggan K, Eberhardt R, Faruque N, Garcia-Pastor M, Harte N, Kanz C, Leinonen R, Lin Q, Lombard V, Lopez R, Mancuso R, McHale M, Nardone F, Silventoinen V, Stoehr P, Stoesser G, Tuli MA, Tzouvara K, Vaughan R, Wu D, Zhu W, Apweiler R: The EMBL Nucleotide Sequence Database. Nucleic Acids Res 2004, 32(Database):D27\u201330. 10.1093\/nar\/gkh120","journal-title":"Nucleic Acids Res"},{"issue":"1","key":"652_CR21","doi-asserted-by":"publisher","first-page":"10","DOI":"10.1093\/nar\/28.1.10","volume":"28","author":"DL Wheeler","year":"2000","unstructured":"Wheeler DL, Chappey C, Lash AE, Leipe DD, Madden TL, Schuler GD, Tatusova TA, Rapp BA: Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 2000, 28(1):10\u201314. 10.1093\/nar\/28.1.10","journal-title":"Nucleic Acids Res"},{"issue":"4","key":"652_CR22","doi-asserted-by":"publisher","first-page":"464","DOI":"10.1006\/geno.2002.6748","volume":"79","author":"HM Wain","year":"2002","unstructured":"Wain HM, Bruford EA, Lovering RC, Lush MJ, Wright MW, Povey S: Guidelines for human gene nomenclature. Genomics 2002, 79(4):464\u2013470. 10.1006\/geno.2002.6748","journal-title":"Genomics"},{"key":"652_CR23","unstructured":"QuickGO[http:\/\/www.ebi.ac.uk\/ego]"},{"key":"652_CR24","unstructured":"Obsolete GO terms[http:\/\/www.geneontology.org\/GO.usage.html#obsoleteTerms]"},{"key":"652_CR25","unstructured":"GO in SourceForge[http:\/\/sourceforge.net\/projects\/geneontology\/]"},{"key":"652_CR26","unstructured":"Using sensu for species-specific GO terms[http:\/\/www.geneontology.org\/GO.usage.html#sensu]"},{"key":"652_CR27","unstructured":"GO evidence codes[http:\/\/geneontology.org\/doc\/GO.evidence.html]"},{"key":"652_CR28","unstructured":"BioCreAtIvE task 2 document[http:\/\/www.pdg.cnb.uam.es\/BioLink\/BioCreative_task2.html]"},{"issue":"1","key":"652_CR29","doi-asserted-by":"publisher","first-page":"172","DOI":"10.1093\/nar\/gkg094","volume":"31","author":"FlyBase Consortium","year":"2003","unstructured":"FlyBase Consortium: The FlyBase database of the Drosophila genome projects and community literature. Nucleic Acids Res 2003, 31(1):172\u2013175. 10.1093\/nar\/gkg094","journal-title":"Nucleic Acids Res"},{"key":"652_CR30","unstructured":"BioCreAtIvE data resources[http:\/\/www.pdg.cnb.uam.es\/BioLINK\/workshop_BioCreative_04\/results]"},{"key":"652_CR31","doi-asserted-by":"publisher","first-page":"227","DOI":"10.1101\/sqb.2003.68.227","volume":"68","author":"M Ashburner","year":"2004","unstructured":"Ashburner M, Mungall CJ, Lewis SE: Ontologies for Biologists: A Community Model for the Annotation of Genomic Data. Cold Spring Harbor Symposia on Quantitative Biology 2004, 68: 227\u2013235. 10.1101\/sqb.2003.68.227","journal-title":"Cold Spring Harbor Symposia on Quantitative Biology"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-6-S1-S17.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T01:32:49Z","timestamp":1630459969000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-6-S1-S17"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2005,5]]},"references-count":31,"journal-issue":{"issue":"S1","published-print":{"date-parts":[[2005,5]]}},"alternative-id":["652"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-6-s1-s17","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2005,5]]},"assertion":[{"value":"24 May 2005","order":1,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"S17"}}