{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,1]],"date-time":"2025-11-01T21:28:50Z","timestamp":1762032530836},"reference-count":26,"publisher":"Springer Science and Business Media LLC","issue":"S1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2005,5]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>Molecular Biology accumulated substantial amounts of data concerning functions of genes and proteins. Information relating to functional descriptions is generally extracted manually from textual data and stored in biological databases to build up annotations for large collections of gene products. Those annotation databases are crucial for the interpretation of large scale analysis approaches using bioinformatics or experimental techniques. Due to the growing accumulation of functional descriptions in biomedical literature the need for text mining tools to facilitate the extraction of such annotations is urgent. In order to make text mining tools useable in real world scenarios, for instance to assist database curators during annotation of protein function, comparisons and evaluations of different approaches on full text articles are needed.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>The Critical Assessment for Information Extraction in Biology (BioCreAtIvE) contest consists of a community wide competition aiming to evaluate different strategies for text mining tools, as applied to biomedical literature. We report on task two which addressed the automatic extraction and assignment of Gene Ontology (GO) annotations of human proteins, using full text articles. 
The predictions of task 2 are based on triplets of <jats:italic>protein \u2013 GO term \u2013 article passage<\/jats:italic>. The annotation-relevant text passages were returned by the participants and evaluated by expert curators of the GO annotation (GOA) team at the European Institute of Bioinformatics (EBI). Each participant could submit up to three results for each sub-task comprising task 2. In total more than 15,000 individual results were provided by the participants. The curators evaluated in addition to the annotation itself, whether the protein and the GO term were correctly predicted and traceable through the submitted text fragment.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusion<\/jats:title>\n            <jats:p>Concepts provided by GO are currently the most extended set of terms used for annotating gene products, thus they were explored to assess how effectively text mining tools are able to extract those annotations automatically. Although the obtained results are promising, they are still far from reaching the required performance demanded by real world applications. Among the principal difficulties encountered to address the proposed task, were the complex nature of the GO terms and protein names (the large range of variants which are used to express proteins and especially GO terms in free text), and the lack of a standard training set. A range of very different strategies were used to tackle this task. 
The dataset generated in line with the BioCreative challenge is publicly available and will allow new possibilities for training information extraction methods in the domain of molecular biology.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1471-2105-6-s1-s16","type":"journal-article","created":{"date-parts":[[2005,5,24]],"date-time":"2005-05-24T18:13:44Z","timestamp":1116958424000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":73,"title":["Evaluation of BioCreAtIvE assessment of task 2"],"prefix":"10.1186","volume":"6","author":[{"given":"Christian","family":"Blaschke","sequence":"first","affiliation":[]},{"given":"Eduardo Andres","family":"Leon","sequence":"additional","affiliation":[]},{"given":"Martin","family":"Krallinger","sequence":"additional","affiliation":[]},{"given":"Alfonso","family":"Valencia","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2005,5,24]]},"reference":[{"key":"651_CR1","doi-asserted-by":"publisher","first-page":"857","DOI":"10.1093\/bioinformatics\/17.9.857","volume":"17","author":"K Johnson","year":"2001","unstructured":"Johnson K, Lin S: Critical assessment of microarray data analysis: the 2001 challenge. Bioinformatics 2001, 17: 857\u2013858. [http:\/\/www.fruitfly.org\/GASP1\/] 10.1093\/bioinformatics\/17.9.857","journal-title":"Bioinformatics"},{"key":"651_CR2","doi-asserted-by":"publisher","first-page":"242","DOI":"10.1016\/j.sbi.2004.02.003","volume":"14","author":"S Wodak","year":"2004","unstructured":"Wodak S, Mendez R: Prediction of protein-protein interactions: the CAPRI experiment, its evaluation and implications. Curr Opin Struct Biol 2004, 14: 242\u2013249. 
[http:\/\/capri.ebi.ac.uk\/] 10.1016\/j.sbi.2004.02.003","journal-title":"Curr Opin Struct Biol"},{"key":"651_CR3","doi-asserted-by":"publisher","first-page":"483","DOI":"10.1101\/gr.10.4.483","volume":"10","author":"M Reese","year":"2000","unstructured":"Reese M, Hartzell G, Harris N, Ohler U, Abril J, Lewis S: Genome annotation assessment in Drosophila melanogaster. Genome Res 2000, 10: 483\u2013501. 10.1101\/gr.10.4.483","journal-title":"Genome Res"},{"key":"651_CR4","doi-asserted-by":"publisher","first-page":"S1","DOI":"10.1186\/1471-2156-4-S1-S1","volume":"4","author":"L Almasy","year":"2003","unstructured":"Almasy L, Amos C, Bailey-Wilson J, Cantor R, Jaquish C, Martinez M, Neuman R, Olson J, Palmer L, Rich S, Spence M, MacCluer JW: Genetic Analysis Workshop 13: Analysis of Longitudinal Family Data for Complex Diseases and Related Risk Factors. BMC Genetics 2003, 4: S1. [http:\/\/www.gaworkshop.org\/] 10.1186\/1471-2156-4-S1-S1","journal-title":"BMC Genetics"},{"key":"651_CR5","doi-asserted-by":"publisher","first-page":"1179","DOI":"10.1093\/bioinformatics\/btg084","volume":"19","author":"C Helma","year":"2003","unstructured":"Helma C, Kramer S: A survey of the Predictive Toxicology Challenge 2000\u20132001. Bioinformatics 2003, 19: 1179\u20131182. [http:\/\/www.predictive-toxicology.org\/ptc\/] 10.1093\/bioinformatics\/btg084","journal-title":"Bioinformatics"},{"key":"651_CR6","doi-asserted-by":"publisher","first-page":"281","DOI":"10.1006\/csla.1998.0102","volume":"12","author":"L Hirschman","year":"1998","unstructured":"Hirschman L: The evolution of evaluation: lessons from the message understanding conferences. Computer Speech and Language 1998, 12: 281\u2013305. 
[http:\/\/www.itl.nist.gov\/iaui\/894.02\/related_projects\/muc] 10.1006\/csla.1998.0102","journal-title":"Computer Speech and Language"},{"key":"651_CR7","doi-asserted-by":"publisher","first-page":"331","DOI":"10.1093\/bioinformatics\/btg1046","volume":"19","author":"A Yeh","year":"2003","unstructured":"Yeh A, Hirschman L, Morgan A: Evaluation of text data mining for database curation: lessons learned from the KDD Challenge Cup. Bioinformatics 2003, 19: 331\u2013339. 10.1093\/bioinformatics\/btg1046","journal-title":"Bioinformatics"},{"key":"651_CR8","doi-asserted-by":"publisher","first-page":"172","DOI":"10.1093\/nar\/gkg094","volume":"31","author":"F Consortium","year":"2003","unstructured":"Consortium F: The FlyBase database of the Drosophila genome projects and community literature. Nucleic Acids Res 2003, 31: 172\u2013175. [http:\/\/flybase.org] 10.1093\/nar\/gkg094","journal-title":"Nucleic Acids Res"},{"key":"651_CR9","first-page":"14","volume-title":"Proc Twelfth Text Retrieval Conference (TREC 2003)","author":"W Hersh","year":"2003","unstructured":"Hersh W, Bhupatiraju R: TREC GENOMICS Track Overview. Proc Twelfth Text Retrieval Conference (TREC 2003) 2003, 14\u201324. [http:\/\/ir.ohsu.edu\/genomics\/]"},{"issue":"Suppl 1","key":"651_CR10","doi-asserted-by":"publisher","first-page":"S2","DOI":"10.1186\/1471-2105-6-S1-S2","volume":"6","author":"A Yeh","year":"2005","unstructured":"Yeh A, Hirschman L, Morgan A, Colosimo M: BioCreAtIvE task 1A: gene mention finding evaluation. BMC Bioinformatics 2005, 6(Suppl 1):S2. 10.1186\/1471-2105-6-S1-S2","journal-title":"BMC Bioinformatics"},{"issue":"Suppl 1","key":"651_CR11","doi-asserted-by":"publisher","first-page":"S11","DOI":"10.1186\/1471-2105-6-S1-S11","volume":"6","author":"L Hirschman","year":"2005","unstructured":"Hirschman L, Colosimo M, Morgan A, Yeh A: Overview of BioCreAtIvE task 1B: Normalized Gene Lists. BMC Bioinformatics 2005, 6(Suppl 1):S11. 
10.1186\/1471-2105-6-S1-S11","journal-title":"BMC Bioinformatics"},{"key":"651_CR12","doi-asserted-by":"publisher","first-page":"D258","DOI":"10.1093\/nar\/gkh036","volume":"32","author":"TGO Consortium","year":"2004","unstructured":"Consortium TGO: The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res 2004, 32: D258-D261. [http:\/\/www.geneontology.org] 10.1093\/nar\/gkh036","journal-title":"Nucleic Acids Res"},{"key":"651_CR13","doi-asserted-by":"publisher","first-page":"262","DOI":"10.1093\/nar\/gkh021","volume":"32","author":"E Camon","year":"2004","unstructured":"Camon E, Magrane M, Barrell D, Lee V, Dimmer E, Maslen J, Binns D, Harte N, Lopez R, Apweiler R: The Gene Ontology Annotation (GOA) Database: sharing knowledge in Uniprot with Gene Ontology. Nucleic Acids Res 2004, 32: 262\u2013266. 10.1093\/nar\/gkh021","journal-title":"Nucleic Acids Res"},{"issue":"Suppl 1","key":"651_CR14","doi-asserted-by":"publisher","first-page":"S17","DOI":"10.1186\/1471-2105-6-S1-S17","volume":"6","author":"E Camon","year":"2005","unstructured":"Camon E, Barrell D, Dimmer E, Lee V, Magrane M, Maslen J, Binns D, Apweiler R: Evaluation of GO annotation retrieval for BioCreative, Task 2: Lessons to be learned and comparison with existing annotation techniques in GOA. BMC Bioinformatics 2005, 6(Suppl 1):S17. 10.1186\/1471-2105-6-S1-S17","journal-title":"BMC Bioinformatics"},{"key":"651_CR15","unstructured":"TREC 2004 contest homepage[http:\/\/ir.ohsu.edu\/genomics\/2004protocol.html]"},{"key":"651_CR16","doi-asserted-by":"publisher","first-page":"28","DOI":"10.1093\/nar\/gkg033","volume":"31","author":"D Wheeler","year":"2003","unstructured":"Wheeler D, Church D, Federhen S, Lash A, Madden T, Pontius J, Schuler G, Schriml L, Sequeira E, Tatusova T, Wagner L: Database resources of the National Center for Biotechnology. Nucleic Acids Res 2003, 31: 28\u201333. 
[http:\/\/www.ncbi.nlm.nih.gov\/] 10.1093\/nar\/gkg033","journal-title":"Nucleic Acids Res"},{"key":"651_CR17","doi-asserted-by":"publisher","first-page":"D255","DOI":"10.1093\/nar\/gkh072","volume":"32","author":"H Wain","year":"2004","unstructured":"Wain H, Lush M, Ducluzeau F, Khodiyar V, Povey S: Genew: the Human Gene Nomenclature Database, 2004 updates. Nucleic Acids Res 2004, 32: D255-D257. [http:\/\/www.geneontology.org] 10.1093\/nar\/gkh072","journal-title":"Nucleic Acids Res"},{"key":"651_CR18","unstructured":"BioCreAtIvE contest homepage[http:\/\/www.pdg.cnb.uam.es\/BioLINK\/workshop_BioCreative_04\/results\/]"},{"issue":"Suppl 1","key":"651_CR19","doi-asserted-by":"publisher","first-page":"S21","DOI":"10.1186\/1471-2105-6-S1-S21","volume":"6","author":"F Couto","year":"2005","unstructured":"Couto F, Silva M, Coutinho P: Finding Genomic Ontology Terms in Unstructured Text. BMC Bioinformatics 2005, 6(Suppl 1):S21. 10.1186\/1471-2105-6-S1-S21","journal-title":"BMC Bioinformatics"},{"issue":"Suppl 1","key":"651_CR20","doi-asserted-by":"publisher","first-page":"S23","DOI":"10.1186\/1471-2105-6-S1-S23","volume":"6","author":"F Ehrler","year":"2005","unstructured":"Ehrler F, Jimeno A, Ruch P: Data-poor categorization and passage retrieval for Gene Ontology annotation in Swiss-Prot. BMC bioinformatics 2005, 6(Suppl 1):S23. 10.1186\/1471-2105-6-S1-S23","journal-title":"BMC bioinformatics"},{"issue":"Suppl 1","key":"651_CR21","doi-asserted-by":"publisher","first-page":"S20","DOI":"10.1186\/1471-2105-6-S1-S20","volume":"6","author":"K Verspoor","year":"2005","unstructured":"Verspoor K, Cohn J, Joslyn C, Mniszewski S, Rechtsteiner A, Rocha L, Simas T: Protein Annotation as Term Categorization in the Gene Ontology using Word Proximity Networks. BMC bioinformatics 2005, 6(Suppl 1):S20. 
10.1186\/1471-2105-6-S1-S20","journal-title":"BMC bioinformatics"},{"issue":"Suppl 1","key":"651_CR22","doi-asserted-by":"publisher","first-page":"S19","DOI":"10.1186\/1471-2105-6-S1-S19","volume":"6","author":"M Krallinger","year":"2005","unstructured":"Krallinger M, Padron M, Valencia A: A sentence sliding window approach to extract protein annotations from biomedical articles. BMC Bioinformatics 2005, 6(Suppl 1):S19. 10.1186\/1471-2105-6-S1-S19","journal-title":"BMC Bioinformatics"},{"issue":"Suppl 1","key":"651_CR23","doi-asserted-by":"publisher","first-page":"S22","DOI":"10.1186\/1471-2105-6-S1-S22","volume":"6","author":"S Rice","year":"2005","unstructured":"Rice S, Nenadic G, Stapley B: Mining protein functions from text using term-based support vector machines. BMC bioinformatics 2005, 6(Suppl 1):S22. 10.1186\/1471-2105-6-S1-S22","journal-title":"BMC bioinformatics"},{"issue":"Suppl 1","key":"651_CR24","doi-asserted-by":"publisher","first-page":"S18","DOI":"10.1186\/1471-2105-6-S1-S18","volume":"6","author":"S Ray","year":"2005","unstructured":"Ray S, Craven M: Learning Statistical Models for Annotating Proteins with Function Information using Biomedical Text. BMC bioinformatics 2005, 6(Suppl 1):S18. 10.1186\/1471-2105-6-S1-S18","journal-title":"BMC bioinformatics"},{"key":"651_CR25","volume-title":"Proc BioCreAtIvE Challenge Evaluation Workshop","author":"J Chiang","year":"2004","unstructured":"Chiang J, Yu H: Extracting Functional Annotations of Proteins Based on Hybrid Text Mining Approaches. Proc BioCreAtIvE Challenge Evaluation Workshop 2004."},{"key":"651_CR26","volume-title":"Proc BioCreAtIvE Challenge Evaluation Workshop","author":"Y Krymolowski","year":"2004","unstructured":"Krymolowski Y, Alex B, Leidner J: BioCreative Task 2.1: The Edinburgh\/Stanford system. 
Proc BioCreAtIvE Challenge Evaluation Workshop 2004."}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-6-S1-S16.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T01:37:48Z","timestamp":1630460268000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-6-S1-S16"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2005,5]]},"references-count":26,"journal-issue":{"issue":"S1","published-print":{"date-parts":[[2005,5]]}},"alternative-id":["651"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-6-s1-s16","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2005,5]]},"assertion":[{"value":"24 May 2005","order":1,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"S16"}}