{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,21]],"date-time":"2025-11-21T22:35:53Z","timestamp":1763764553978},"reference-count":31,"publisher":"Oxford University Press (OUP)","issue":"5","license":[{"start":{"date-parts":[[2016,10,2]],"date-time":"2016-10-02T00:00:00Z","timestamp":1475366400000},"content-version":"vor","delay-in-days":2461,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/2.0\/uk\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2010,3,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Text mining technologies have been shown to reduce the laborious work involved in organizing the vast amount of information hidden in the literature. One challenge in text mining is linking ambiguous word forms to unambiguous biological concepts. This article reports on a comprehensive study on resolving the ambiguity in mentions of biomedical named entities with respect to model organisms and presents an array of approaches, with focus on methods utilizing natural language parsers.<\/jats:p>\n               <jats:p>Results: We build a corpus for organism disambiguation where every occurrence of protein\/gene entity is manually tagged with a species ID, and evaluate a number of methods on it. Promising results are obtained by training a machine learning model on syntactic parse trees, which is then used to decide whether an entity belongs to the model organism denoted by a neighbouring species-indicating word (e.g. yeast). The parser-based approaches are also compared with a supervised classification method and results indicate that the former are a more favorable choice when domain portability is of concern. 
The best overall performance is obtained by combining the strengths of syntactic features and supervised classification.<\/jats:p>\n               <jats:p>Availability: The corpus and demo are available at http:\/\/www.nactem.ac.uk\/deca_details\/start.cgi, and the software is freely available as U-Compare components (Kano et al., 2009): NaCTeM Species Word Detector and NaCTeM Species Disambiguator. U-Compare is available at http:\/\/u-compare.org\/<\/jats:p>\n               <jats:p>Contact: xinglong.wang@manchester.ac.uk<\/jats:p>","DOI":"10.1093\/bioinformatics\/btq002","type":"journal-article","created":{"date-parts":[[2010,1,7]],"date-time":"2010-01-07T02:54:39Z","timestamp":1262832879000},"page":"661-667","source":"Crossref","is-referenced-by-count":46,"title":["Disambiguating the species of biomedical named entities using natural language parsers"],"prefix":"10.1093","volume":"26","author":[{"given":"Xinglong","family":"Wang","sequence":"first","affiliation":[{"name":"1 National Centre for Text Mining, 2 School of Computer Science, University of Manchester, Manchester, UK and 3 Department of Computer Science, University of Tokyo, Tokyo, Japan"}]},{"given":"Jun'ichi","family":"Tsujii","sequence":"additional","affiliation":[{"name":"1 National Centre for Text Mining, 2 School of Computer Science, University of Manchester, Manchester, UK and 3 Department of Computer Science, University of Tokyo, 
Tokyo, Japan"}]},{"given":"Sophia","family":"Ananiadou","sequence":"additional","affiliation":[{"name":"1 National Centre for Text Mining, 2 School of Computer Science, University of Manchester, Manchester, UK and 3 Department of Computer Science, University of Tokyo, Tokyo, Japan"}]}],"member":"286","published-online":{"date-parts":[[2010,1,6]]},"reference":[{"key":"2023012510594494000_B1","doi-asserted-by":"crossref","DOI":"10.3115\/1572306.1572308","article-title":"A graph kernel for protein-protein interaction extraction","volume-title":"Proceedings of BioNLP","author":"Airola","year":"2008"},{"key":"2023012510594494000_B2","first-page":"556","article-title":"Assisted curation: does text mining really help?","volume":"13","author":"Alex","year":"2008","journal-title":"Pac. Symp. Biocomput."},{"key":"2023012510594494000_B3","doi-asserted-by":"crossref","first-page":"571","DOI":"10.1016\/j.tibtech.2006.10.002","article-title":"Text mining and its potential applications in systems biology","volume":"24","author":"Ananiadou","year":"2006","journal-title":"Trends Biotechnol."},{"key":"2023012510594494000_B4","first-page":"77","article-title":"The second release of the RASP system","volume-title":"Proceedings of the Joint Conference of the International Committee on Computational Linguistics and the Association for Computational Linguistics, Interactive Presentation Sessions","author":"Briscoe","year":"2006"},{"key":"2023012510594494000_B5","doi-asserted-by":"crossref","first-page":"248","DOI":"10.1093\/bioinformatics\/bth496","article-title":"Gene name ambiguity of eukaryotic 
nomenclatures","volume":"21","author":"Chen","year":"2005","journal-title":"Bioinformatics"},{"key":"2023012510594494000_B6","doi-asserted-by":"crossref","first-page":"493","DOI":"10.1162\/coli.2007.33.4.493","article-title":"Wide-coverage efficient statistical parsing with CCG and log-linear models","volume":"33","author":"Clark","year":"2007","journal-title":"Comput. Linguist."},{"issue":"Suppl. 1","key":"2023012510594494000_B7","first-page":"S11","article-title":"Data preparation and interannotator agreement: BioCreAtIvE task 1B","volume":"6","author":"Colosimo","year":"2005","journal-title":"BMC Bioinformatics"},{"key":"2023012510594494000_B8","article-title":"Generating typed dependency parses from phrase structure","volume-title":"Proceedings of the 5th International Conference on Language Resources and Evaluation","author":"de Marneffe","year":"2006"},{"key":"2023012510594494000_B9","first-page":"228","article-title":"Semi-supervised classification for extracting protein interaction sentences using dependency parsing","volume-title":"Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning","author":"Erkan","year":"2007"},{"key":"2023012510594494000_B10","doi-asserted-by":"crossref","first-page":"i126","DOI":"10.1093\/bioinformatics\/btn299","article-title":"Inter-species normalization of gene mentions with GNAT","volume":"24","author":"Hakenberg","year":"2008","journal-title":"Bioinformatics"},{"key":"2023012510594494000_B11","first-page":"11","article-title":"Evaluating impact of re-training a lexical disambiguation model on domain adaptation of an HPSG parser","volume-title":"Proceedings of the 10th International Conference on Parsing Technology","author":"Hara","year":"2007"},{"issue":"Suppl. 
1","key":"2023012510594494000_B12","doi-asserted-by":"crossref","first-page":"S11","DOI":"10.1186\/1471-2105-6-S1-S11","article-title":"Overview of BioCreAtIvE task 1B: normalised gene lists","volume":"6","author":"Hirschman","year":"2005","journal-title":"BMC Bioinformatics"},{"key":"2023012510594494000_B13","doi-asserted-by":"crossref","first-page":"589","DOI":"10.1016\/j.molcel.2006.02.012","article-title":"Biomedical language processing: what's beyond PubMed","volume":"21","author":"Hunter","year":"2006","journal-title":"Mol. Cell"},{"key":"2023012510594494000_B14","article-title":"Learning from imbalanced data sets: a comparison of various strategies","volume-title":"Proceedings of the AAAI Workshop on Learning from Imbalanced Data Sets","author":"Japkowicz","year":"2000"},{"key":"2023012510594494000_B15","doi-asserted-by":"crossref","first-page":"1997","DOI":"10.1093\/bioinformatics\/btp289","article-title":"U-Compare: share and compare text mining tools with UIMA","volume":"25","author":"Kano","year":"2009","journal-title":"Bioinformatics"},{"key":"2023012510594494000_B16","doi-asserted-by":"crossref","DOI":"10.3115\/1572364.1572375","article-title":"TX task: automatic detection of focus organisms in biomedical publications","volume-title":"Proceedings of BioNLP","author":"Kappeler","year":"2009"},{"key":"2023012510594494000_B17","first-page":"423","article-title":"Accurate unlexicalized parsing","volume-title":"Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics","author":"Klein","year":"2003"},{"issue":"Suppl. 
2","key":"2023012510594494000_B18","doi-asserted-by":"crossref","first-page":"S1","DOI":"10.1186\/gb-2008-9-s2-s1","article-title":"Evaluation of text-mining systems for biology: overview of the Second BioCreAtIvE community challenge","volume":"9","author":"Krallinger","year":"2008","journal-title":"Genome Biol."},{"key":"2023012510594494000_B19","article-title":"Dependency-based evaluation of Minipar","volume-title":"Proceedings of Workshop on the Evaluation of Parsing Systems","author":"Lin","year":"1998"},{"key":"2023012510594494000_B20","first-page":"101","article-title":"Combining multiple layers of syntactic information for protein-protein interaction extraction","volume-title":"Proceedings of the 3rd International Symposium on Semantic Mining in Biomedicine","author":"Miwa","year":"2008"},{"key":"2023012510594494000_B21","doi-asserted-by":"crossref","first-page":"35","DOI":"10.1162\/coli.2008.34.1.35","article-title":"Feature forest models for probabilistic HPSG parsing","volume":"34","author":"Miyao","year":"2008","journal-title":"Comput. 
Linguist."},{"key":"2023012510594494000_B22","doi-asserted-by":"crossref","first-page":"394","DOI":"10.1093\/bioinformatics\/btn631","article-title":"Evaluating contributions of natural language parsers to protein-protein interaction extraction","volume":"25","author":"Miyao","year":"2009","journal-title":"Bioinformatics"},{"key":"2023012510594494000_B23","article-title":"Overview of BioCreAtIvE II gene normalisation","volume-title":"Proceedings of the BioCreAtIvE II Workshop","author":"Morgan","year":"2007"},{"key":"2023012510594494000_B24","first-page":"113","article-title":"Making tree kernels practical for natural language learning","volume-title":"Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics","author":"Moschitti","year":"2006"},{"key":"2023012510594494000_B25","volume-title":"Computer Intensive Methods for Testing Hypothesis - An Introduction.","author":"Noreen","year":"1989"},{"key":"2023012510594494000_B26","article-title":"Syntactic features for protein-protein interaction extraction","volume-title":"Proceedings of the 2nd International Symposium on Languages in Biology and Medicine","author":"S\u00e6tre","year":"2007"},{"key":"2023012510594494000_B27","first-page":"451","article-title":"A simple algorithm for identifying abbreviation definitions in biomedical text","volume":"8","author":"Schwartz","year":"2003","journal-title":"Pac. Symp. 
Biocomput."},{"key":"2023012510594494000_B28","doi-asserted-by":"crossref","first-page":"3191","DOI":"10.1093\/bioinformatics\/bti475","article-title":"ABNER: an open source tool for automatically tagging genes, proteins, and other entity names in text","volume":"21","author":"Settles","year":"2005","journal-title":"Bioinformatics"},{"key":"2023012510594494000_B29","first-page":"220","article-title":"Syntax annotation for the GENIA corpus","volume-title":"Proceedings of the 2nd International Joint Conference on Natural Language Processing","author":"Tateisi","year":"2005"},{"key":"2023012510594494000_B30","article-title":"Learning the species of biomedical named entities from annotated corpora","volume-title":"Proceedings of the International Conference on Language Resources and Evaluation","author":"Wang","year":"2008"},{"issue":"Suppl. 11","key":"2023012510594494000_B31","doi-asserted-by":"crossref","first-page":"S6","DOI":"10.1186\/1471-2105-9-S11-S6","article-title":"Distinguishing the species of biomedical named entities for term identification","volume":"9","author":"Wang","year":"2008","journal-title":"BMC 
Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/26\/5\/661\/48860351\/bioinformatics_26_5_661.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/26\/5\/661\/48860351\/bioinformatics_26_5_661.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,25]],"date-time":"2023-01-25T11:00:21Z","timestamp":1674644421000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/26\/5\/661\/212170"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010,1,6]]},"references-count":31,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2010,3,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btq002","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2010,3,1]]},"published":{"date-parts":[[2010,1,6]]}}}