{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,12]],"date-time":"2026-05-12T19:23:12Z","timestamp":1778613792348,"version":"3.51.4"},"reference-count":41,"publisher":"Oxford University Press (OUP)","issue":"22","license":[{"start":{"date-parts":[[2016,10,2]],"date-time":"2016-10-02T00:00:00Z","timestamp":1475366400000},"content-version":"vor","delay-in-days":1138,"URL":"http:\/\/creativecommons.org\/licenses\/by\/3.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2013,11,15]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: Despite the central role of diseases in biomedical research, there have been much fewer attempts to automatically determine which diseases are mentioned in a text\u2014the task of disease name normalization (DNorm)\u2014compared with other normalization tasks in biomedical text mining research.<\/jats:p><jats:p>Methods: In this article we introduce the first machine learning approach for DNorm, using the NCBI disease corpus and the MEDIC vocabulary, which combines MeSH\u00ae and OMIM. Our method is a high-performing and mathematically principled framework for learning similarities between mentions and concept names directly from training data. The technique is based on pairwise learning to rank, which has not previously been applied to the normalization task but has proven successful in large optimization problems for information retrieval.<\/jats:p><jats:p>Results: We compare our method with several techniques based on lexical normalization and matching, MetaMap and Lucene. 
Our algorithm achieves 0.782 micro-averaged F-measure and 0.809 macro-averaged F-measure, an increase over the highest performing baseline method of 0.121 and 0.098, respectively.<\/jats:p><jats:p>Availability: The source code for DNorm is available at http:\/\/www.ncbi.nlm.nih.gov\/CBBresearch\/Lu\/Demo\/DNorm, along with a web-based demonstration and links to the NCBI disease corpus. Results on PubMed abstracts are available in PubTator: http:\/\/www.ncbi.nlm.nih.gov\/CBBresearch\/Lu\/Demo\/PubTator<\/jats:p><jats:p>Contact: \u00a0zhiyong.lu@nih.gov<\/jats:p>","DOI":"10.1093\/bioinformatics\/btt474","type":"journal-article","created":{"date-parts":[[2013,8,23]],"date-time":"2013-08-23T00:58:43Z","timestamp":1377219523000},"page":"2909-2917","source":"Crossref","is-referenced-by-count":370,"title":["DNorm: disease name normalization with pairwise learning to rank"],"prefix":"10.1093","volume":"29","author":[{"given":"Robert","family":"Leaman","sequence":"first","affiliation":[{"name":"1 National Center for Biotechnology Information, 8600 Rockville Pike, Bethesda, MD 20894, USA and 2Department of Biomedical Informatics, Arizona State University, 13212 East Shea Blvd, Scottsdale, AZ 85259, USA"},{"name":"1 National Center for Biotechnology Information, 8600 Rockville Pike, Bethesda, MD 20894, USA and 2Department of Biomedical Informatics, Arizona State University, 13212 East Shea Blvd, Scottsdale, AZ 85259, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Rezarta","family":"Islamaj Do\u011fan","sequence":"additional","affiliation":[{"name":"1 National Center for Biotechnology Information, 8600 Rockville Pike, Bethesda, MD 20894, USA and 2Department of Biomedical Informatics, Arizona State University, 13212 East Shea Blvd, Scottsdale, AZ 85259, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zhiyong","family":"Lu","sequence":"additional","affiliation":[{"name":"1 National Center for Biotechnology Information, 8600 Rockville 
Pike, Bethesda, MD 20894, USA and 2Department of Biomedical Informatics, Arizona State University, 13212 East Shea Blvd, Scottsdale, AZ 85259, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2013,8,21]]},"reference":[{"key":"2023012810471943800_btt474-B1","first-page":"17","article-title":"Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program","volume-title":"Proceedings of the AMIA Symposium","author":"Aronson","year":"2001"},{"key":"2023012810471943800_btt474-B2","doi-asserted-by":"crossref","first-page":"291","DOI":"10.1007\/s10791-009-9117-9","article-title":"Learning to rank with (a lot of) word features","volume":"13","author":"Bai","year":"2010","journal-title":"Inf. Retr."},{"key":"2023012810471943800_btt474-B3","doi-asserted-by":"crossref","first-page":"320","DOI":"10.1111\/j.1399-0004.2005.00509.x","article-title":"Mapping phenotypes to language: a proposal to organize and standardize the clinical descriptions of malformations","volume":"68","author":"Biesecker","year":"2005","journal-title":"Clin. 
Genet."},{"key":"2023012810471943800_btt474-B4","doi-asserted-by":"crossref","first-page":"89","DOI":"10.1145\/1102351.1102363","article-title":"Learning to rank using gradient descent","volume-title":"Proceedings of the 22nd International Conference on Machine learning","author":"Burges","year":"2005"},{"key":"2023012810471943800_btt474-B5","first-page":"163","article-title":"Resolution of coordination ellipses in biological named entities using conditional random fields","volume-title":"Proceedings of the 10th Conference of the Pacific Association for Computational Linguistics","author":"Buyko","year":"2007"},{"key":"2023012810471943800_btt474-B6","first-page":"263","article-title":"New ranking algorithms for parsing and tagging: kernels over discrete structures, and the voted perceptron","volume-title":"Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL)","author":"Collins","year":"2002"},{"key":"2023012810471943800_btt474-B7","doi-asserted-by":"crossref","first-page":"bar065","DOI":"10.1093\/database\/bar065","article-title":"MEDIC: a practical disease vocabulary used at the comparative toxicogenomics database","volume":"2012","author":"Davis","year":"2012","journal-title":"Database"},{"key":"2023012810471943800_btt474-B8","doi-asserted-by":"crossref","first-page":"842","DOI":"10.1016\/j.jbi.2012.04.006","article-title":"A SNPshot of PubMed to associate genetic variants with drugs, diseases, and adverse reactions","volume":"45","author":"Hakenberg","year":"2012","journal-title":"J. Biomed. Inform."},{"key":"2023012810471943800_btt474-B9","doi-asserted-by":"crossref","first-page":"115","DOI":"10.7551\/mitpress\/1113.003.0010","article-title":"Large margin rank boundaries for ordinal regression","volume-title":"Smola,A.J., et al. 
(eds.), Advances in Large Margin Classifiers","author":"Herbrich","year":"2000"},{"key":"2023012810471943800_btt474-B10","doi-asserted-by":"crossref","first-page":"S11","DOI":"10.1186\/1471-2105-6-S1-S11","article-title":"Overview of BioCreAtIvE task 1B: normalized gene lists","volume":"6","author":"Hirschman","year":"2005","journal-title":"BMC Bioinformatics"},{"key":"2023012810471943800_btt474-B11","doi-asserted-by":"crossref","first-page":"S1","DOI":"10.1186\/1471-2105-6-S1-S1","article-title":"Overview of BioCreAtIvE: critical assessment of information extraction for biology","volume":"6","author":"Hirschman","year":"2005","journal-title":"BMC Bioinformatics"},{"key":"2023012810471943800_btt474-B12","doi-asserted-by":"crossref","first-page":"1032","DOI":"10.1093\/bioinformatics\/btr042","article-title":"GeneTUKit: a software for document-level gene normalization","volume":"27","author":"Huang","year":"2011","journal-title":"Bioinformatics"},{"key":"2023012810471943800_btt474-B13","doi-asserted-by":"crossref","first-page":"660","DOI":"10.1136\/amiajnl-2010-000055","article-title":"Recommending MeSH terms for annotating biomedical articles","volume":"18","author":"Huang","year":"2011","journal-title":"J. Am. Med. Inform. 
Assoc."},{"key":"2023012810471943800_btt474-B14","doi-asserted-by":"crossref","DOI":"10.7551\/mitpress\/9780262013055.001.0001","volume-title":"The Processes of Life: An Introduction to Molecular Biology","author":"Hunter","year":"2009"},{"key":"2023012810471943800_btt474-B15","first-page":"91","article-title":"An improved corpus of disease mentions in PubMed citations","volume-title":"Proceedings of the 2012 Workshop on Biomedical Natural Language Processing","author":"Islamaj Do\u011fan","year":"2012"},{"key":"2023012810471943800_btt474-B16","first-page":"8","article-title":"An Inference Method for Disease Name Normalization","volume-title":"Proceedings of the AAAI 2012 Fall Symposium on Information Retrieval and Knowledge Discovery in Biomedical Text","author":"Islamaj Do\u011fan","year":"2012"},{"key":"2023012810471943800_btt474-B17","doi-asserted-by":"crossref","first-page":"S3","DOI":"10.1186\/1471-2105-9-S3-S3","article-title":"Assessment of disease named entity recognition on a corpus of annotated sentences","volume":"9","author":"Jimeno","year":"2008","journal-title":"BMC Bioinformatics"},{"key":"2023012810471943800_btt474-B18","doi-asserted-by":"crossref","first-page":"876","DOI":"10.1136\/amiajnl-2012-001173","article-title":"Using rule-based natural language processing to improve disease normalization in biomedical text","volume":"20","author":"Kang","year":"2012","journal-title":"J. Am. Med. Inform. 
Assoc."},{"key":"2023012810471943800_btt474-B19","first-page":"1","article-title":"Overview of BioNLP'09 shared task on event extraction","volume-title":"Proceedings of the NAACL-HLT 2009 Workshop on BioNLP","author":"Kim","year":"2009"},{"key":"2023012810471943800_btt474-B20","doi-asserted-by":"crossref","first-page":"bas042","DOI":"10.1093\/database\/bas042","article-title":"Prioritizing PubMed articles for the Comparative Toxicogenomic Database utilizing semantic information","volume":"2012","author":"Kim","year":"2012","journal-title":"Database"},{"key":"2023012810471943800_btt474-B21","first-page":"282","article-title":"Conditional random fields: probabilistic models for segmenting and labeling sequence data","volume-title":"Proceedings of the Eighteenth International Conference on Machine Learning","author":"Lafferty","year":"2001"},{"key":"2023012810471943800_btt474-B22","first-page":"652","article-title":"BANNER: an executable survey of advances in biomedical named entity recognition","volume":"13","author":"Leaman","year":"2008","journal-title":"Pac. Symp. 
Biocomput."},{"key":"2023012810471943800_btt474-B23","first-page":"82","article-title":"Enabling recognition of diseases in biomedical text with machine learning: corpus and benchmark","volume-title":"Proceedings of the 2009 Symposium on Languages in Biology and Medicine","author":"Leaman","year":"2009"},{"key":"2023012810471943800_btt474-B24","article-title":"NCBI at 2013 ShARe\/CLEF eHealth Shared Task: Disorder Normalization in Clinical Notes with DNorm","volume-title":"Proceedings of the Conference and Labs of the Evaluation Forum","author":"Leaman","year":"2013"},{"key":"2023012810471943800_btt474-B25","doi-asserted-by":"crossref","first-page":"baq036","DOI":"10.1093\/database\/baq036","article-title":"PubMed and beyond: a survey of web tools for searching biomedical literature","volume":"2011","author":"Lu","year":"2011","journal-title":"Database"},{"key":"2023012810471943800_btt474-B26","doi-asserted-by":"crossref","first-page":"S2","DOI":"10.1186\/1471-2105-12-S8-S2","article-title":"The gene normalization task in BioCreative III","volume":"12","author":"Lu","year":"2011","journal-title":"BMC Bioinformatics"},{"key":"2023012810471943800_btt474-B27","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511809071","volume-title":"Introduction to Information Retrieval","author":"Manning","year":"2008"},{"key":"2023012810471943800_btt474-B28","doi-asserted-by":"crossref","first-page":"S3","DOI":"10.1186\/gb-2008-9-s2-s3","article-title":"Overview of BioCreative II gene normalization","volume":"9","author":"Morgan","year":"2008","journal-title":"Genome Biol."},{"key":"2023012810471943800_btt474-B29","doi-asserted-by":"crossref","first-page":"767","DOI":"10.1145\/2110363.2110455","article-title":"Linking multiple disease-related resources through UMLS","volume-title":"Proceedings of the 2nd ACM SIGHIT International Health Informatics 
Symposium","author":"N\u00e9v\u00e9ol","year":"2012"},{"key":"2023012810471943800_btt474-B30","doi-asserted-by":"crossref","first-page":"D940","DOI":"10.1093\/nar\/gkr972","article-title":"Disease Ontology: a backbone for disease semantic integration","volume":"40","author":"Schriml","year":"2012","journal-title":"Nucleic Acids Res."},{"key":"2023012810471943800_btt474-B31","doi-asserted-by":"crossref","first-page":"650","DOI":"10.1038\/sj.embor.7400195","article-title":"What is a disease?","volume":"5","author":"Scully","year":"2004","journal-title":"EMBO Rep."},{"key":"2023012810471943800_btt474-B32","doi-asserted-by":"crossref","first-page":"402","DOI":"10.1186\/1471-2105-9-402","article-title":"Abbreviation definition identification based on automatic precision estimates","volume":"9","author":"Sohn","year":"2008","journal-title":"BMC Bioinformatics"},{"key":"2023012810471943800_btt474-B33","first-page":"662","article-title":"SNOMED clinical terms: overview of the development process and project status","volume-title":"Proceedings of the AMIA Symposium","author":"Stearns","year":"2001"},{"key":"2023012810471943800_btt474-B34","article-title":"Three shared tasks on clinical natural language processing","volume-title":"Proceedings of the Conference and Labs of the Evaluation Forum","author":"Suominen","year":"2013"},{"key":"2023012810471943800_btt474-B35","doi-asserted-by":"crossref","first-page":"2768","DOI":"10.1093\/bioinformatics\/btm393","article-title":"Learning string similarity measures for gene\/protein name dictionary look-up using logistic regression","volume":"23","author":"Tsuruoka","year":"2007","journal-title":"Bioinformatics"},{"key":"2023012810471943800_btt474-B36","doi-asserted-by":"crossref","first-page":"552","DOI":"10.1136\/amiajnl-2011-000203","article-title":"2010 i2b2\/VA challenge on concepts, assertions, and relations in clinical text","volume":"18","author":"Uzuner","year":"2011","journal-title":"J. Am. Med. Inform. 
Assoc."},{"key":"2023012810471943800_btt474-B37","article-title":"Overview of the TREC 2011 medical records track","volume-title":"The tenth Text REtrieval Conference","author":"Voorhees","year":"2011"},{"key":"2023012810471943800_btt474-B38","doi-asserted-by":"crossref","first-page":"bas041","DOI":"10.1093\/database\/bas041","article-title":"Accelerating literature curation with text-mining tools: a case study of using PubTator to curate genes in PubMed abstracts","volume":"2012","author":"Wei","year":"2012","journal-title":"Database"},{"issue":"Web server","key":"2023012810471943800_btt474-B39","doi-asserted-by":"crossref","first-page":"W518","DOI":"10.1093\/nar\/gkt441","article-title":"PubTator: a web-based text mining tool for assisting biocuration","volume":"41","author":"Wei","year":"2013","journal-title":"Nucleic Acids Res."},{"key":"2023012810471943800_btt474-B40","doi-asserted-by":"crossref","first-page":"815","DOI":"10.1093\/bioinformatics\/btp071","article-title":"High-performance gene name normalization with GeNo","volume":"25","author":"Wermter","year":"2009","journal-title":"Bioinformatics"},{"key":"2023012810471943800_btt474-B41","doi-asserted-by":"crossref","DOI":"10.1093\/database\/bas037","article-title":"Collaborative biocuration\u2013text-mining development task for document prioritization for 
curation","author":"Wiegers","year":"2012","journal-title":"Database"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/29\/22\/2909\/48891951\/bioinformatics_29_22_2909.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/29\/22\/2909\/48891951\/bioinformatics_29_22_2909.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,5,17]],"date-time":"2024-05-17T01:36:16Z","timestamp":1715909776000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/29\/22\/2909\/312804"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,8,21]]},"references-count":41,"journal-issue":{"issue":"22","published-print":{"date-parts":[[2013,11,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btt474","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2013,11,15]]},"published":{"date-parts":[[2013,8,21]]}}}