{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,24]],"date-time":"2025-10-24T07:50:58Z","timestamp":1761292258869},"reference-count":41,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2005,6,16]],"date-time":"2005-06-16T00:00:00Z","timestamp":1118880000000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/2.0\/"},{"start":{"date-parts":[[2005,6,16]],"date-time":"2005-06-16T00:00:00Z","timestamp":1118880000000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/2.0\/"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"abstract":"<jats:title>Abstract<\/jats:title><jats:sec>\n                        <jats:title>Background<\/jats:title>\n                        <jats:p>Massive text mining of the biological literature holds great promise of relating disparate information and discovering new knowledge. However, disambiguation of gene symbols is a major bottleneck.<\/jats:p>\n                     <\/jats:sec><jats:sec>\n                        <jats:title>Results<\/jats:title>\n                        <jats:p>We developed a simple thesaurus-based disambiguation algorithm that can operate with very little training data. The thesaurus comprises the information from five human genetic databases and MeSH. The extent of the homonym problem for human gene symbols is shown to be substantial (33% of the genes in our combined thesaurus had one or more ambiguous symbols), not only because one symbol can refer to multiple genes, but also because a gene symbol can have many non-gene meanings. A test set of 52,529 Medline abstracts, containing 690 ambiguous human gene symbols taken from OMIM, was automatically generated. 
Overall accuracy of the disambiguation algorithm was up to 92.7% on the test set.<\/jats:p>\n                     <\/jats:sec><jats:sec>\n                        <jats:title>Conclusion<\/jats:title>\n                        <jats:p>The ambiguity of human gene symbols is substantial, not only because one symbol may denote multiple genes but particularly because many symbols have other, non-gene meanings. The proposed disambiguation approach resolves most ambiguities in our test set with high accuracy, including the important gene\/not a gene decisions. The algorithm is fast and scalable, enabling gene-symbol disambiguation in massive text mining applications.<\/jats:p>\n                     <\/jats:sec>","DOI":"10.1186\/1471-2105-6-149","type":"journal-article","created":{"date-parts":[[2005,6,16]],"date-time":"2005-06-16T18:13:38Z","timestamp":1118945618000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":25,"title":["Thesaurus-based disambiguation of gene symbols"],"prefix":"10.1186","volume":"6","author":[{"given":"Bob JA","family":"Schijvenaars","sequence":"first","affiliation":[]},{"given":"Barend","family":"Mons","sequence":"additional","affiliation":[]},{"given":"Marc","family":"Weeber","sequence":"additional","affiliation":[]},{"given":"Martijn J","family":"Schuemie","sequence":"additional","affiliation":[]},{"given":"Erik M","family":"van Mulligen","sequence":"additional","affiliation":[]},{"given":"Hester M","family":"Wain","sequence":"additional","affiliation":[]},{"given":"Jan A","family":"Kors","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2005,6,16]]},"reference":[{"key":"474_CR1","doi-asserted-by":"publisher","first-page":"21","DOI":"10.1038\/88213","volume":"28","author":"TK Jenssen","year":"2001","unstructured":"Jenssen TK, Laegreid A, Komorowski J, Hovig E: A literature network of human genes for high-throughput analysis of gene 
expression. Nat Genet 2001, 28: 21\u201328. 10.1038\/88213","journal-title":"Nat Genet"},{"key":"474_CR2","doi-asserted-by":"publisher","first-page":"319","DOI":"10.1093\/bioinformatics\/17.4.319","volume":"17","author":"DR Masys","year":"2001","unstructured":"Masys DR, Welsh JB, Lynn Fink J, Gribskov M, Klacansky I, Corbeil J: Use of keyword hierarchies to interpret gene expression patterns. Bioinformatics 2001, 17: 319\u2013326. 10.1093\/bioinformatics\/17.4.319","journal-title":"Bioinformatics"},{"key":"474_CR3","first-page":"317","volume":"8","author":"H Shatkay","year":"2000","unstructured":"Shatkay H, Edwards S, Wilbur WJ, Boguski M: Genes, themes and microarrays: using information retrieval for large-scale gene analysis. Proc Int Conf Intell Syst Mol Biol 2000, 8: 317\u2013328.","journal-title":"Proc Int Conf Intell Syst Mol Biol"},{"key":"474_CR4","doi-asserted-by":"publisher","first-page":"S74","DOI":"10.1093\/bioinformatics\/17.suppl_1.S74","volume":"17","author":"C Friedman","year":"2001","unstructured":"Friedman C, Kra P, Yu H, Krauthammer M, Rzhetsky A: GENIES: a natural-language processing system for the extraction of molecular pathways from journal articles. Bioinformatics 2001, 17: S74\u201382.","journal-title":"Bioinformatics"},{"key":"474_CR5","first-page":"123","volume":"12","author":"C Blaschke","year":"2001","unstructured":"Blaschke C, Valencia A: The potential use of SUISEKI as a protein interaction discovery tool. Genome Inform Ser Workshop Genome Inform 2001, 12: 123\u2013134.","journal-title":"Genome Inform Ser Workshop Genome Inform"},{"key":"474_CR6","doi-asserted-by":"publisher","first-page":"12","DOI":"10.1016\/S0014-5793(00)01661-6","volume":"476","author":"MA Andrade","year":"2000","unstructured":"Andrade MA, Bork P: Automated extraction of information in molecular biology. FEBS Lett 2000, 476: 12\u201317. 
10.1016\/S0014-5793(00)01661-6","journal-title":"FEBS Lett"},{"key":"474_CR7","doi-asserted-by":"publisher","first-page":"821","DOI":"10.1089\/106652703322756104","volume":"10","author":"H Shatkay","year":"2003","unstructured":"Shatkay H, Feldman R: Mining the biomedical literature in the genomic era: an overview. J Comput Biol 2003, 10: 821\u2013855. 10.1089\/106652703322756104","journal-title":"J Comput Biol"},{"key":"474_CR8","doi-asserted-by":"publisher","first-page":"389","DOI":"10.1093\/bioinformatics\/btg421","volume":"20","author":"JD Wren","year":"2004","unstructured":"Wren JD, Bekeredjian R, Stewart JA, Shohet RV, Garner HR: Knowledge discovery by automated identification and ranking of implicit relationships. Bioinformatics 2004, 20: 389\u2013398. 10.1093\/bioinformatics\/btg421","journal-title":"Bioinformatics"},{"key":"474_CR9","doi-asserted-by":"publisher","first-page":"664","DOI":"10.1038\/ng0704-664","volume":"36","author":"R Hoffmann","year":"2004","unstructured":"Hoffmann R, Valencia A: A gene network for navigating the literature. Nat Genet 2004, 36: 664. 10.1038\/ng0704-664","journal-title":"Nat Genet"},{"key":"474_CR10","doi-asserted-by":"publisher","first-page":"373","DOI":"10.1038\/416373a","volume":"416","author":"MV Blagosklonny","year":"2002","unstructured":"Blagosklonny MV, Pardee AB: Conceptual biology: unearthing the gems. Nature 2002, 416: 373. 10.1038\/416373a","journal-title":"Nature"},{"key":"474_CR11","doi-asserted-by":"publisher","first-page":"9","DOI":"10.1038\/88324","volume":"28","author":"DR Masys","year":"2001","unstructured":"Masys DR: Linking microarray data to the literature. Nat Genet 2001, 28: 9\u201310. 
10.1038\/88324","journal-title":"Nat Genet"},{"key":"474_CR12","doi-asserted-by":"crossref","unstructured":"Obstacles of nomenclature Nature 1997, 389: 1.","DOI":"10.1038\/37816"},{"key":"474_CR13","doi-asserted-by":"publisher","first-page":"162","DOI":"10.1159\/000015372","volume":"86","author":"H Wain","year":"1999","unstructured":"Wain H, White J, Povey S: The changing challenges of nomenclature. Cytogenet Cell Genet 1999, 86: 162\u2013164. 10.1159\/000015372","journal-title":"Cytogenet Cell Genet"},{"key":"474_CR14","first-page":"704","volume-title":"Proc AMIA Symp","author":"M Weeber","year":"2003","unstructured":"Weeber M, Schijvenaars BJ, Van Mulligen EM, Mons B, Jelier R, Van der Eijk CC, Kors JA: Ambiguity of human gene symbols in LocusLink and MEDLINE: creating an inventory and a disambiguation test collection. Proc AMIA Symp 2003, 704\u2013708."},{"key":"474_CR15","first-page":"238","volume-title":"Pac Symp Biocomput","author":"O Tuason","year":"2004","unstructured":"Tuason O, Chen L, Liu H, Blake JA, Friedman C: Biological nomenclatures: a source of lexical knowledge and ambiguity. Pac Symp Biocomput 2004, 238\u2013249."},{"key":"474_CR16","doi-asserted-by":"publisher","first-page":"248","DOI":"10.1093\/bioinformatics\/bth496","volume":"21","author":"L Chen","year":"2005","unstructured":"Chen L, Liu H, Friedman C: Gene name ambiguity of eukaryotic nomenclatures. Bioinformatics 2005, 21: 248\u2013256. 10.1093\/bioinformatics\/bth496","journal-title":"Bioinformatics"},{"key":"474_CR17","doi-asserted-by":"publisher","first-page":"249","DOI":"10.1006\/jbin.2001.1023","volume":"34","author":"H Liu","year":"2001","unstructured":"Liu H, Lussier YA, Friedman C: Disambiguating ambiguous biomedical terms in biomedical narrative text: an unsupervised method. J Biomed Inform 2001, 34: 249\u2013261. 
10.1006\/jbin.2001.1023","journal-title":"J Biomed Inform"},{"key":"474_CR18","doi-asserted-by":"publisher","first-page":"14","DOI":"10.3115\/1118149.1118152","volume-title":"Proceedings of the Workshop on Natural Language Processing in the Biomedical Domain","author":"KB Cohen","year":"2002","unstructured":"Cohen KB, Dolbey AE, Acquaah-Mensah GK, Hunter L: Contrast and variability in gene names. In Proceedings of the Workshop on Natural Language Processing in the Biomedical Domain. Philadelphia; 2002:14\u201320."},{"key":"474_CR19","volume-title":"Proceedings of the Computational Systems Bioinformatics Conference","author":"RM Podowski","year":"2004","unstructured":"Podowski RM, Cleary JG, Goncharoff NT, Amoutzias G, Hayes WS: AZuRe, a scalable system for automated term disambiguation of gene and protein names. In Proceedings of the Computational Systems Bioinformatics Conference. Stanford; 2004."},{"key":"474_CR20","first-page":"1","volume":"24","author":"N Ide","year":"1998","unstructured":"Ide N, V\u00e9ronis J: Introduction to the special issue on word sense disambiguation: the state of the art. Computational Linguistics 1998, 24: 1\u201340.","journal-title":"Computational Linguistics"},{"key":"474_CR21","doi-asserted-by":"publisher","first-page":"321","DOI":"10.1162\/089120101317066104","volume":"27","author":"M Stevenson","year":"2001","unstructured":"Stevenson M, Wilks Y: The interaction of knowledge sources in word sense disambiguation. Computational Linguistics 2001, 27: 321\u2013349. 10.1162\/089120101317066104","journal-title":"Computational Linguistics"},{"key":"474_CR22","doi-asserted-by":"publisher","first-page":"S97","DOI":"10.1093\/bioinformatics\/17.suppl_1.S97","volume":"17","author":"V Hatzivassiloglou","year":"2001","unstructured":"Hatzivassiloglou V, Duboue PA, Rzhetsky A: Disambiguating proteins, genes, and RNA in text: a machine learning approach. 
Bioinformatics 2001, 17: S97\u2013106.","journal-title":"Bioinformatics"},{"key":"474_CR23","first-page":"605","volume":"5","author":"F Ginter","year":"2004","unstructured":"Ginter F, Boberg J, J\u00e4rvinen J, Salakoski T: New techniques for disambiguation in natural language and their application to biological text. J Machine Learning Res 2004, 5: 605\u2013621.","journal-title":"J Machine Learning Res"},{"key":"474_CR24","doi-asserted-by":"publisher","first-page":"621","DOI":"10.1197\/jamia.M1101","volume":"9","author":"H Liu","year":"2002","unstructured":"Liu H, Johnson SB, Friedman C: Automatic resolution of ambiguous terms based on machine learning and conceptual relations in the UMLS. J Am Med Inform Assoc 2002, 9: 621\u2013636. 10.1197\/jamia.M1101","journal-title":"J Am Med Inform Assoc"},{"key":"474_CR25","doi-asserted-by":"publisher","first-page":"320","DOI":"10.1197\/jamia.M1533","volume":"11","author":"H Liu","year":"2004","unstructured":"Liu H, Teller V, Friedman C: A multi-aspect comparison study of supervised word sense disambiguation. J Am Med Inform Assoc 2004, 11: 320\u2013331. 10.1197\/jamia.M1533","journal-title":"J Am Med Inform Assoc"},{"key":"474_CR26","doi-asserted-by":"publisher","first-page":"9","DOI":"10.3115\/1118958.1118960","volume-title":"Natural Language Processing in Biomedicine, ACL 2003 Workshop","author":"D Widdows","year":"2003","unstructured":"Widdows D, Peters S, Cederberg S, Chan C, Steffen D, Buitelaar P: Unsupervised monolingual and bilingual word-sense disambiguation of medical documents using UMLS. In Natural Language Processing in Biomedicine, ACL 2003 Workshop. Sapporo; 2003:9\u201316."},{"key":"474_CR27","doi-asserted-by":"publisher","first-page":"1103","DOI":"10.1001\/jama.1994.03510380059038","volume":"271","author":"HJ Lowe","year":"1994","unstructured":"Lowe HJ, Barnett GO: Understanding and using the medical subject headings (MeSH) vocabulary to perform literature searches. Jama 1994, 271: 1103\u20131108. 
10.1001\/jama.271.14.1103","journal-title":"Jama"},{"key":"474_CR28","doi-asserted-by":"publisher","first-page":"137","DOI":"10.1093\/nar\/29.1.137","volume":"29","author":"KD Pruitt","year":"2001","unstructured":"Pruitt KD, Maglott DR: RefSeq and LocusLink: NCBI gene-centered resources. Nucleic Acids Res 2001, 29: 137\u2013140. 10.1093\/nar\/29.1.137","journal-title":"Nucleic Acids Res"},{"issue":"Database","key":"474_CR29","doi-asserted-by":"publisher","first-page":"D3","DOI":"10.1093\/nar\/gkh143","volume":"32","author":"MY Galperin","year":"2004","unstructured":"Galperin MY: The Molecular Biology Database Collection: 2004 update. Nucleic Acids Res 2004, 32(Database):D3\u201322. 10.1093\/nar\/gkh143","journal-title":"Nucleic Acids Res"},{"key":"474_CR30","first-page":"451","volume-title":"Pac Symp Biocomput","author":"AS Schwartz","year":"2003","unstructured":"Schwartz AS, Hearst MA: A simple algorithm for identifying abbreviation definitions in biomedical text. Pac Symp Biocomput 2003, 451\u2013462."},{"key":"474_CR31","first-page":"868","volume-title":"Proc AMIA Symp","author":"EM van Mulligen","year":"2000","unstructured":"van Mulligen EM, Diwersy M, Schmidt M, Buurman H, Mons B: Facilitating networks of information. Proc AMIA Symp 2000, 868\u2013872."},{"key":"474_CR32","volume-title":"Introduction to modern information retrieval","author":"G Salton","year":"1983","unstructured":"Salton G: Introduction to modern information retrieval. New York: McGraw-Hill; 1983."},{"key":"474_CR33","doi-asserted-by":"publisher","first-page":"2597","DOI":"10.1093\/bioinformatics\/bth291","volume":"20","author":"MJ Schuemie","year":"2004","unstructured":"Schuemie MJ, Weeber M, Schijvenaars BJ, van Mulligen EM, van der Eijk CC, Jelier R, Mons B, Kors JA: Distribution of information in biomedical abstracts and full-text publications. Bioinformatics 2004, 20: 2597\u20132604. 
10.1093\/bioinformatics\/bth291","journal-title":"Bioinformatics"},{"key":"474_CR34","doi-asserted-by":"publisher","first-page":"512","DOI":"10.1016\/j.jbi.2004.08.004","volume":"37","author":"M Krauthammer","year":"2004","unstructured":"Krauthammer M, Nenadic G: Term identification in the biomedical literature. J Biomed Inform 2004, 37: 512\u2013526. 10.1016\/j.jbi.2004.08.004","journal-title":"J Biomed Inform"},{"key":"474_CR35","unstructured":"Genew download[http:\/\/www.gene.ucl.ac.uk\/public-files\/nomen\/nomeids.txt]"},{"key":"474_CR36","unstructured":"GDB download[http:\/\/gdbwww.gdb.org\/gdbreports\/GeneByAlpha.tab]"},{"key":"474_CR37","unstructured":"LocusLink download[ftp:\/\/ftp.ncbi.nih.gov\/refseq\/LocusLink\/ARCHIVE\/LL_tmpl.gz]"},{"key":"474_CR38","unstructured":"OMIM download[ftp:\/\/ftp.ncbi.nih.gov\/repository\/OMIM\/genemap]"},{"key":"474_CR39","unstructured":"Swiss-Prot download[ftp:\/\/us.expasy.org\/databases\/swiss-prot\/special_selections\/human.seq.gz]"},{"key":"474_CR40","unstructured":"UMLS lexical tools[http:\/\/umlslex.nlm.nih.gov\/lvg\/current\/]"},{"key":"474_CR41","volume-title":"Information retrieval: a health and biomedical perspective","author":"WR Hersh","year":"2003","unstructured":"Hersh WR: Information retrieval: a health and biomedical perspective. 
New York: Springer-Verlag; 2003."}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-6-149.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/1471-2105-6-149\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-6-149.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,10,7]],"date-time":"2024-10-07T12:13:22Z","timestamp":1728303202000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-6-149"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2005,6,16]]},"references-count":41,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2005,12]]}},"alternative-id":["474"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-6-149","relation":{},"ISSN":["1471-2105"],"issn-type":[{"type":"electronic","value":"1471-2105"}],"subject":[],"published":{"date-parts":[[2005,6,16]]},"assertion":[{"value":"22 November 2004","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"16 June 2005","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"16 June 2005","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"149"}}