{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,18]],"date-time":"2026-06-18T13:03:53Z","timestamp":1781787833213,"version":"3.54.5"},"reference-count":26,"publisher":"Springer Science and Business Media LLC","issue":"S1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2005,5]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>The biological research literature is a major repository of knowledge. As the amount of literature increases, it will get harder to find the information of interest on a particular topic. There has been an increasing amount of work on text mining this literature, but comparing this work is hard because of a lack of standards for making comparisons. To address this, we worked with colleagues at the Protein Design Group, CNB-CSIC, Madrid to develop BioCreAtIvE (Critical Assessment for Information Extraction in Biology), an open common evaluation of systems on a number of biological text mining tasks. We report here on task 1A, which deals with finding mentions of genes and related entities in text. \"Finding mentions\" is a basic task, which can be used as a building block for other text mining tasks. The task makes use of data and evaluation software provided by the (US) National Center for Biotechnology Information (NCBI).<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>15 teams took part in task 1A. A number of teams achieved scores over 80% F-measure (balanced precision and recall). The teams that tried to use their task 1A systems to help on other BioCreAtIvE tasks reported mixed results.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusion<\/jats:title>\n            <jats:p>The 80% plus F-measure results are good, but still somewhat lag the best scores achieved in some other domains such as newswire, due in part to the complexity and length of gene names, compared to person or organization names in newswire.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1471-2105-6-s1-s2","type":"journal-article","created":{"date-parts":[[2005,5,24]],"date-time":"2005-05-24T18:13:44Z","timestamp":1116958424000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":107,"title":["BioCreAtIvE Task 1A: gene mention finding evaluation"],"prefix":"10.1186","volume":"6","author":[{"given":"Alexander","family":"Yeh","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Alexander","family":"Morgan","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Marc","family":"Colosimo","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Lynette","family":"Hirschman","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2005,5,24]]},"reference":[{"key":"637_CR1","doi-asserted-by":"publisher","first-page":"1553","DOI":"10.1093\/bioinformatics\/18.12.1553","volume":"18","author":"L Hirschman","year":"2002","unstructured":"Hirschman L, Park JC, Tsujii J, Wong L, Wu CH: Accomplishments and challenges in literature data mining for biology. Bioinformatics 2002, 18: 1553\u20131561. 10.1093\/bioinformatics\/18.12.1553","journal-title":"Bioinformatics"},{"key":"637_CR2","unstructured":"Critical Assessment of Techniques for Protein Structure Prediction[http:\/\/predictioncenter.llnl.gov\/]"},{"key":"637_CR3","doi-asserted-by":"publisher","first-page":"281","DOI":"10.1006\/csla.1998.0102","volume":"12","author":"L Hirschman","year":"1998","unstructured":"Hirschman L: The evolution of evaluation: lessons from the message understanding conferences. Computer Speech and Language 1998, 12: 281\u2013305. 10.1006\/csla.1998.0102","journal-title":"Computer Speech and Language"},{"key":"637_CR4","unstructured":"Text REtrieval Conference[http:\/\/trec.nist.gov\/]"},{"key":"637_CR5","volume-title":"J. The Eleventh Text Retrieval Conference (TREC 2002): NIST Special Publication 500-XXX, Gaithersburg, Maryland","year":"2002","unstructured":"Voorhees EM, Buckland LP, Ed:J. The Eleventh Text Retrieval Conference (TREC 2002): NIST Special Publication 500-XXX, Gaithersburg, Maryland. 2002. [http:\/\/trec.nist.gov\/pubs\/trec11\/t11_proceedings.html]"},{"key":"637_CR6","doi-asserted-by":"publisher","first-page":"i331","DOI":"10.1093\/bioinformatics\/btg1046","volume":"19","author":"AS Yeh","year":"2003","unstructured":"Yeh AS, Hirschman L, Morgan AA: The Evaluation of text data mining for database curation: lessons learned from the KDD challenge cup. Bioinformatics 2003, 19: i331-i339. 10.1093\/bioinformatics\/btg1046","journal-title":"Bioinformatics"},{"key":"637_CR7","unstructured":"BioCreAtIvE Workshop Handouts, Granada, Spain. 2004. [http:\/\/www.pdg.cnb.uam.es\/BioLINK\/workshop_BioCreative_04\/handout\/index.html]"},{"issue":"Suppl 1","key":"637_CR8","doi-asserted-by":"publisher","first-page":"S16","DOI":"10.1186\/1471-2105-6-S1-S16","volume":"6","author":"C Blaschke","year":"2005","unstructured":"Blaschke C, Leon EA, Krallinger M, Valencia A: Evaluation of BioCreAtIvE assessment of task 2. BMC Bioinformatics 2005, 6(Suppl 1):S16. 10.1186\/1471-2105-6-S1-S16","journal-title":"BMC Bioinformatics"},{"key":"637_CR9","unstructured":"Medline[http:\/\/www.ncbi.nlm.nih.gov\/PubMed\/]"},{"issue":"Suppl 1","key":"637_CR10","doi-asserted-by":"publisher","first-page":"S3","DOI":"10.1186\/1471-2105-6-S1-S3","volume":"6","author":"L Tanabe","year":"2005","unstructured":"Tanabe L, Xie N, Thom LH, Matten W, Wilbur WJ: GENETAG: A Tagged Corpus for Gene\/Protein Named Entity Recognition. BMC Bioinformatics 2005, 6(Suppl 1):S3. 10.1186\/1471-2105-6-S1-S3","journal-title":"BMC Bioinformatics"},{"key":"637_CR11","doi-asserted-by":"publisher","first-page":"947","DOI":"10.3115\/992730.992783","volume-title":"Proceedings of the 18th International Conference on Computational Linguistics (COLING 2000): Saarbrueken","author":"A Yeh","year":"2000","unstructured":"Yeh A: More accurate tests for the statistical significance of result differences. Proceedings of the 18th International Conference on Computational Linguistics (COLING 2000): Saarbrueken 2000, 947\u2013953. 31 July \u2013 4 August 2000"},{"key":"637_CR12","volume-title":"Computer-intensive methods for testing hypotheses: an introduction","author":"E Noreen","year":"1989","unstructured":"Noreen E: Computer-intensive methods for testing hypotheses: an introduction. John Wiley and Sons, Inc; 1989."},{"key":"637_CR13","volume-title":"BioCreAtIvE Workshop Handouts, Granada, Spain","author":"J Tamames","year":"2004","unstructured":"Tamames J: Text Detective: BioAlma's gene annotation tool. BioCreAtIvE Workshop Handouts, Granada, Spain 2004."},{"key":"637_CR14","volume-title":"BioCreAtIvE Workshop Handouts, Granada, Spain","author":"S Dingare","year":"2004","unstructured":"Dingare S, Finkel J, Manning C, Nissim M, Alex B: Exploring the Boundaries: Gene and Protein Identification in Biomedical Text. BioCreAtIvE Workshop Handouts, Granada, Spain 2004."},{"key":"637_CR15","volume-title":"BioCreAtIvE Workshop Handouts, Granada, Spain","author":"S Kinoshita","year":"2004","unstructured":"Kinoshita S, Ogren P, Cohen KB, Hunter L: Entity identification in the molecular biology domain with a stochastic POS tagger: the BioCreative task. BioCreAtIvE Workshop Handouts, Granada, Spain 2004."},{"key":"637_CR16","volume-title":"BioCreAtIvE Workshop Handouts, Granada, Spain","author":"GD Zhou","year":"2004","unstructured":"Zhou GD, Shen D, Zhang J, Su J, Tan SH, Tan CL: Recognition of Protein\/Gene Names from Text using an Ensemble of Classifiers and Effective Abbreviation Resolution. BioCreAtIvE Workshop Handouts, Granada, Spain 2004."},{"key":"637_CR17","volume-title":"BioCreAtIvE Workshop Handouts, Granada, Spain","author":"R McDonald","year":"2004","unstructured":"McDonald R, Pereira F: Identifying Gene and Protein Mentions in Text Using Conditional Random Fields. BioCreAtIvE Workshop Handouts, Granada, Spain 2004."},{"key":"637_CR18","doi-asserted-by":"publisher","first-page":"1146","DOI":"10.3115\/992730.992822","volume-title":"Proceedings of the 18th International Conference on Computational Linguistics (COLING 2000): Saarbrueken","author":"A Yeh","year":"2000","unstructured":"Yeh A: Comparing two trainable grammatical relations finders. Proceedings of the 18th International Conference on Computational Linguistics (COLING 2000): Saarbrueken 2000, 1146\u20131150. 31 July \u2013 4 August 2000"},{"key":"637_CR19","volume-title":"BioCreAtIvE Workshop Handouts, Granada, Spain","author":"J Crim","year":"2004","unstructured":"Crim J, McDonald R, Pereira F: Automatically Annotating documents with Normalized Gene Lists. BioCreAtIvE Workshop Handouts, Granada, Spain 2004."},{"key":"637_CR20","volume-title":"BioCreAtIvE Workshop Handouts, Granada, Spain","author":"B Hachey","year":"2004","unstructured":"Hachey B, Nguyen H, Nissim M, Alex B, Grover C: Grounding Gene Mentions with Respect to Gene Database Identifiers. BioCreAtIvE Workshop Handouts, Granada, Spain 2004."},{"key":"637_CR21","volume-title":"BioCreAtIvE Workshop Handouts, Granada, Spain","author":"Y Krymolowski","year":"2004","unstructured":"Krymolowski Y, Alex B, Leidner JL: BioCreative Task 2.1: The Edinburgh-Stanford system. BioCreAtIvE Workshop Handouts, Granada, Spain 2004."},{"key":"637_CR22","volume-title":"Proceedings of the 16th International Conference on Machine Learning (ICML-99)","author":"T Joachims","year":"1999","unstructured":"Joachims T: Transductive Inference for Text Classification using Support Vector Machines. Proceedings of the 16th International Conference on Machine Learning (ICML-99) 1999."},{"key":"637_CR23","doi-asserted-by":"publisher","first-page":"247","DOI":"10.1016\/S1532-0464(03)00014-5","volume":"35","author":"L Hirschman","year":"2002","unstructured":"Hirschman L, Morgan A, Yeh A: Rutabaga by any other name: extracting biological names. J of Biomedical Informatics 2002, 35: 247\u2013259. 10.1016\/S1532-0464(03)00014-5","journal-title":"J of Biomedical Informatics"},{"key":"637_CR24","unstructured":"Linguistic Data Consortium[http:\/\/ldc.upenn.edu]"},{"key":"637_CR25","unstructured":"Marsh E, Perzanowski D: MUC-7 Evaluation of IE Technology: Overview of Results.[http:\/\/www.itl.nist.gov\/iaui\/894.02\/related_projects\/muc\/]"},{"issue":"Suppl 1","key":"637_CR26","doi-asserted-by":"publisher","first-page":"S5","DOI":"10.1186\/1471-2105-6-S1-S5","volume":"6","author":"S Dingare","year":"2005","unstructured":"Dingare S, Finkel J, Manning C, Nissim M, Alex B, Grover C: Exploring the boundaries: Gene and Protein Identification in Biomedical Text. BMC Bioinformatics 2005, 6(Suppl 1):S5. 10.1186\/1471-2105-6-S1-S5","journal-title":"BMC Bioinformatics"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-6-S1-S2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T01:40:09Z","timestamp":1630460409000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-6-S1-S2"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2005,5]]},"references-count":26,"journal-issue":{"issue":"S1","published-print":{"date-parts":[[2005,5]]}},"alternative-id":["637"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-6-s1-s2","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2005,5]]},"assertion":[{"value":"24 May 2005","order":1,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"S2"}}