{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,3,19]],"date-time":"2025-03-19T14:38:50Z","timestamp":1742395130710},"reference-count":32,"publisher":"Oxford University Press (OUP)","issue":"8","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2007,4,15]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: The ambiguity of biomedical entities, particularly of gene symbols, is a big challenge for text-mining systems in the biomedical domain. Existing knowledge sources, such as Entrez Gene and the MEDLINE database, contain information concerning the characteristics of a particular gene that could be used to disambiguate gene symbols.<\/jats:p><jats:p>Results: For each gene, we create a profile with different types of information automatically extracted from related MEDLINE abstracts and readily available annotated knowledge sources. We apply the gene profiles to the disambiguation task via an information retrieval method, which ranks the similarity scores between the context where the ambiguous gene is mentioned, and candidate gene profiles. The gene profile with the highest similarity score is then chosen as the correct sense. We evaluated the method on three automatically generated testing sets of mouse, fly and yeast organisms, respectively. The method achieved the highest precision of 93.9% for the mouse, 77.8% for the fly and 89.5% for the yeast.<\/jats:p><jats:p>Availability: The testing data sets and disambiguation programs are available at http:\/\/www.dbmi.columbia.edu\/~hux7002\/gsd2006<\/jats:p><jats:p>Contact: \u00a0friedman@dbmi.columbia.edu<\/jats:p>","DOI":"10.1093\/bioinformatics\/btm056","type":"journal-article","created":{"date-parts":[[2007,2,22]],"date-time":"2007-02-22T01:17:04Z","timestamp":1172107024000},"page":"1015-1022","source":"Crossref","is-referenced-by-count":35,"title":["Gene symbol disambiguation using knowledge-based profiles"],"prefix":"10.1093","volume":"23","author":[{"given":"Hua","family":"Xu","sequence":"first","affiliation":[{"name":"1 Department of Biomedical Informatics, Columbia University, 622 168th St and 2Department of Biostatistics, Columbia University, 722 168th St, New York City, New York, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jung-Wei","family":"Fan","sequence":"additional","affiliation":[{"name":"1 Department of Biomedical Informatics, Columbia University, 622 168th St and 2Department of Biostatistics, Columbia University, 722 168th St, New York City, New York, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"George","family":"Hripcsak","sequence":"additional","affiliation":[{"name":"1 Department of Biomedical Informatics, Columbia University, 622 168th St and 2Department of Biostatistics, Columbia University, 722 168th St, New York City, New York, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Eneida A.","family":"Mendon\u00e7a","sequence":"additional","affiliation":[{"name":"1 Department of Biomedical Informatics, Columbia University, 622 168th St and 2Department of Biostatistics, Columbia University, 722 168th St, New York City, New York, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Marianthi","family":"Markatou","sequence":"additional","affiliation":[{"name":"1 Department of Biomedical Informatics, Columbia University, 622 168th St and 2Department of Biostatistics, Columbia University, 722 168th St, New York City, New York, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Carol","family":"Friedman","sequence":"additional","affiliation":[{"name":"1 Department of Biomedical Informatics, Columbia University, 622 168th St and 2Department of Biostatistics, Columbia University, 722 168th St, New York City, New York, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2007,2,21]]},"reference":[{"key":"2023041107550934500_","first-page":"17","article-title":"Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program","author":"Aronson","year":"2001"},{"key":"2023041107550934500_","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1038\/75556","article-title":"Gene ontology: tool for the unification of biology. The Gene Ontology Consortium","volume":"25","author":"Ashburner","year":"2000","journal-title":"Nat. Genet."},{"key":"2023041107550934500_","first-page":"139","article-title":"Word sense disambiguation using decomposable models","author":"Bruce","year":"1994"},{"key":"2023041107550934500_","doi-asserted-by":"crossref","first-page":"248","DOI":"10.1093\/bioinformatics\/bth496","article-title":"Gene name ambiguity of eukaryotic nomenclatures","volume":"21","author":"Chen","year":"2005","journal-title":"Bioinformatics"},{"key":"2023041107550934500_","doi-asserted-by":"crossref","first-page":"241","DOI":"10.1080\/00401706.1964.10490181","article-title":"Multiple comparisons using rank sums","volume":"6","author":"Dunn","year":"1964","journal-title":"Technometrics"},{"key":"2023041107550934500_","doi-asserted-by":"crossref","first-page":"315","DOI":"10.1016\/j.drudis.2006.02.011","article-title":"Status of text-mining techniques applied to biomedical text","volume":"11","author":"Erhardt","year":"2006","journal-title":"Drug Discov. Today"},{"key":"2023041107550934500_","doi-asserted-by":"crossref","first-page":"675","DOI":"10.1080\/01621459.1937.10503522","article-title":"The use of ranks to avoid the assumption of normality implicit in the analysis of variance","volume":"32","author":"Friedman","year":"1937","journal-title":"J. Am. Stat. Assoc."},{"key":"2023041107550934500_","doi-asserted-by":"crossref","first-page":"372","DOI":"10.1186\/1471-2105-7-372","article-title":"Gene and protein nomenclature in public databases","volume":"7","author":"Fundel","year":"2006","journal-title":"BMC Bioinformatics"},{"key":"2023041107550934500_","first-page":"74","article-title":"Sense tagging in action: combining different tests with additive weightings","author":"Harley","year":"1997"},{"key":"2023041107550934500_","doi-asserted-by":"crossref","first-page":"S11","DOI":"10.1186\/1471-2105-6-S1-S11","article-title":"Overview of BioCreAtIvE task 1B: normalized gene lists","volume":"6","author":"Hirschman","year":"2005","journal-title":"BMC Bioinformatics"},{"key":"2023041107550934500_","doi-asserted-by":"crossref","first-page":"96","DOI":"10.1002\/asi.20257","article-title":"Word sense disambiguation by selecting the best semantic type based on Journal Descriptor Indexing: preliminary experiment","volume":"57","author":"Humphrey","year":"2006","journal-title":"J. Am. Soc. Inform. Sci. Tech."},{"key":"2023041107550934500_","doi-asserted-by":"crossref","first-page":"119","DOI":"10.1038\/nrg1768","article-title":"Literature mining for the biologist: from information retrieval to biological discovery","volume":"7","author":"Jensen","year":"2006","journal-title":"Nat. Rev. Genet."},{"key":"2023041107550934500_","doi-asserted-by":"crossref","first-page":"224","DOI":"10.1186\/gb-2005-6-7-224","article-title":"Text-mining and information-retrieval services for molecular biology","volume":"6","author":"Krallinger","year":"2005","journal-title":"Genome Biol."},{"key":"2023041107550934500_","first-page":"41","article-title":"An empirical evaluation of knowledge sources and learning algorithms for word sense disambiguation","author":"Lee","year":"2002"},{"key":"2023041107550934500_","first-page":"24","article-title":"Automatic sense disambiguation using machine-readable dictionaries: how to tell a pine cone from an ice cream cone","author":"Lesk","year":"1986"},{"key":"2023041107550934500_","doi-asserted-by":"crossref","first-page":"621","DOI":"10.1197\/jamia.M1101","article-title":"Automatic resolution of ambiguous terms based on machine learning and conceptual relations in the UMLS","volume":"9","author":"Liu","year":"2002","journal-title":"J. Am. Med. Inform. Assoc."},{"key":"2023041107550934500_","first-page":"64","article-title":"PhenoGO: assigning phenotypic context to gene ontology annotations with natural language processing","volume":"11","author":"Lussier","year":"2006","journal-title":"Pac. Symp. Biocomput."},{"key":"2023041107550934500_","first-page":"D54","article-title":"Entrez Gene: gene-centered information at NCBI","volume":"3","author":"Maglott","year":"2005","journal-title":"Nucleic Acids Res."},{"key":"2023041107550934500_","first-page":"25","article-title":"Combining lexical and syntactic features for supervised word sense disambiguation","author":"Mohammad","year":"2004"},{"key":"2023041107550934500_","volume-title":"UMLS Knowledge Sources.","author":"NLM","year":"2000","edition":"11th edn"},{"key":"2023041107550934500_","first-page":"415","article-title":"AZuRE, a scalable system for automated term disambiguation of gene and protein names","author":"Podowski","year":"2004"},{"key":"2023041107550934500_","doi-asserted-by":"crossref","first-page":"130","DOI":"10.1108\/eb046814","article-title":"An algorithm for suffix stripping","volume":"14","author":"Porter","year":"1980","journal-title":"Program"},{"key":"2023041107550934500_","first-page":"371","article-title":"Automatic extraction of acronym-meaning pairs from MEDLINE databases","volume":"10","author":"Pustejovsky","year":"2001","journal-title":"Medinfo."},{"key":"2023041107550934500_","doi-asserted-by":"crossref","first-page":"513","DOI":"10.1016\/0306-4573(88)90021-0","article-title":"Term-weighting approaches in automatic text retrieval","volume":"24","author":"Salton","year":"1988","journal-title":"Information Processing & Management"},{"key":"2023041107550934500_","doi-asserted-by":"crossref","first-page":"149","DOI":"10.1186\/1471-2105-6-149","article-title":"Thesaurus-based disambiguation of gene symbols","volume":"6","author":"Schijvenaars","year":"2005","journal-title":"BMC Bioinformatics"},{"key":"2023041107550934500_","doi-asserted-by":"crossref","first-page":"2597","DOI":"10.1093\/bioinformatics\/bth291","article-title":"Distribution of information in biomedical abstracts and full-text publications","volume":"20","author":"Schuemie","year":"2004","journal-title":"Bioinformatics"},{"key":"2023041107550934500_","article-title":"Gene terms and English words: an ambiguous mix","volume-title":"SIGIR '04 Workshop on Search and Discovery in BioInformatics.","author":"Sehgal","year":"2004"},{"key":"2023041107550934500_","doi-asserted-by":"crossref","first-page":"402","DOI":"10.1186\/gb-2006-7-5-402","article-title":"The success (or not) of HUGO nomenclature","volume":"7","author":"Tamames","year":"2006","journal-title":"Genome Biol."},{"key":"2023041107550934500_","first-page":"9","article-title":"Unsupervised monolingual and bilingual word-sense disambiguation of medical documents using UMLS","author":"Widdows","year":"2003"},{"key":"2023041107550934500_","doi-asserted-by":"crossref","DOI":"10.1007\/BF00393758","volume-title":"Providing Machine Tractable Dictionary Tools.","author":"Wilks","year":"1990"},{"key":"2023041107550934500_","doi-asserted-by":"crossref","first-page":"334","DOI":"10.1186\/1471-2105-7-334","article-title":"Machine learning and word sense disambiguation in the biomedical domain: design and evaluation issues","volume":"7","author":"Xu","year":"2006","journal-title":"BMC Bioinformatics"},{"key":"2023041107550934500_","first-page":"208","article-title":"Syntax and the problem of multiple meaning","volume-title":"Machine Translation of Languages.","author":"Yngve","year":"1955"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/23\/8\/1015\/49823692\/bioinformatics_23_8_1015.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/23\/8\/1015\/49823692\/bioinformatics_23_8_1015.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,2,11]],"date-time":"2024-02-11T04:22:32Z","timestamp":1707625352000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/23\/8\/1015\/199007"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2007,2,21]]},"references-count":32,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2007,4,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btm056","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2007,4,15]]},"published":{"date-parts":[[2007,2,21]]}}}