{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T16:05:03Z","timestamp":1772121903542,"version":"3.50.1"},"reference-count":48,"publisher":"Oxford University Press (OUP)","issue":"18","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2016,9,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Text mining is increasingly used to manage the accelerating pace of the biomedical literature. Many text mining applications depend on accurate named entity recognition (NER) and normalization (grounding). While high performing machine learning methods trainable for many entity types exist for NER, normalization methods are usually specialized to a single entity type. NER and normalization systems are also typically used in a serial pipeline, causing cascading errors and limiting the ability of the NER system to directly exploit the lexical information provided by the normalization.<\/jats:p>\n               <jats:p>Methods: We propose the first machine learning model for joint NER and normalization during both training and prediction. The model is trainable for arbitrary entity types and consists of a semi-Markov structured linear classifier, with a rich feature approach for NER and supervised semantic indexing for normalization. We also introduce TaggerOne, a Java implementation of our model as a general toolkit for joint NER and normalization. TaggerOne is not specific to any entity type, requiring only annotated training data and a corresponding lexicon, and has been optimized for high throughput.<\/jats:p>\n               <jats:p>Results: We validated TaggerOne with multiple gold-standard corpora containing both mention- and concept-level annotations. Benchmarking results show that TaggerOne achieves high performance on diseases (NCBI Disease corpus, NER f-score: 0.829, normalization f-score: 0.807) and chemicals (BioCreative 5 CDR corpus, NER f-score: 0.914, normalization f-score 0.895). These results compare favorably to the previous state of the art, notwithstanding the greater flexibility of the model. We conclude that jointly modeling NER and normalization greatly improves performance.<\/jats:p>\n               <jats:p>Availability and Implementation: The TaggerOne source code and an online demonstration are available at: http:\/\/www.ncbi.nlm.nih.gov\/bionlp\/taggerone<\/jats:p>\n               <jats:p>Contact: \u00a0zhiyong.lu@nih.gov<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btw343","type":"journal-article","created":{"date-parts":[[2016,6,10]],"date-time":"2016-06-10T00:56:16Z","timestamp":1465520176000},"page":"2839-2846","source":"Crossref","is-referenced-by-count":227,"title":["TaggerOne: joint named entity recognition and normalization with semi-Markov Models"],"prefix":"10.1093","volume":"32","author":[{"given":"Robert","family":"Leaman","sequence":"first","affiliation":[{"name":"National Center for Biotechnology Information, 8600 Rockville Pike, Bethesda, MD 20894, USA"}]},{"given":"Zhiyong","family":"Lu","sequence":"additional","affiliation":[{"name":"National Center for Biotechnology Information, 8600 Rockville Pike, Bethesda, MD 20894, USA"}]}],"member":"286","published-online":{"date-parts":[[2016,6,9]]},"reference":[{"key":"2023020113391774600_btw343-B1","volume-title":"Predicting Structured Data","author":"Altun","year":"2007"},{"key":"2023020113391774600_btw343-B2","doi-asserted-by":"crossref","first-page":"291","DOI":"10.1007\/s10791-009-9117-9","article-title":"Learning to rank with (a lot of) word features","volume":"13","author":"Bai","year":"2010","journal-title":"Inf. Retrieval"},{"key":"2023020113391774600_btw343-B3","doi-asserted-by":"crossref","first-page":"e1003799","DOI":"10.1371\/journal.pcbi.1003799","article-title":"Quantifying the impact and extent of undocumented biomedical synonymy","volume":"10","author":"Blair","year":"2014","journal-title":"PLoS Comput. Biol"},{"key":"2023020113391774600_btw343-B4","doi-asserted-by":"crossref","first-page":"281","DOI":"10.1186\/1471-2105-14-281","article-title":"A modular framework for biomedical concept recognition","volume":"14","author":"Campos","year":"2013","journal-title":"BMC Bioinformatics"},{"key":"2023020113391774600_btw343-B5","author":"Chowdhury","year":"2010"},{"key":"2023020113391774600_btw343-B6","first-page":"89","volume-title":"Exploiting Dictionaries in Named Entity Extraction: Combining Semi-Markov Extractions Processes and Data Integration Methods. 10th ACM SIGKDD Int Conf on Knowledge Discovery and Data Mining. ACM","author":"Cohen","year":"2004"},{"key":"2023020113391774600_btw343-B7","first-page":"265","article-title":"On the algorithmic implementation of multiclass kernel-based vector machines","volume":"2","author":"Crammer","year":"2001","journal-title":"J. Mach. Learn. Res."},{"key":"2023020113391774600_btw343-B8","first-page":"951","article-title":"Ultraconservative online algorithms for multiclass problems","volume":"3","author":"Crammer","year":"2003","journal-title":"J. Mach. Learn. Res"},{"key":"2023020113391774600_btw343-B9","first-page":"297","author":"D'Souza","year":"2015"},{"key":"2023020113391774600_btw343-B10","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.jbi.2013.12.006","article-title":"NCBI disease corpus: A resource for disease name recognition and concept normalization","volume":"47","author":"Do\u011fan","year":"2014","journal-title":"J. Biomed. Inf"},{"key":"2023020113391774600_btw343-B11","doi-asserted-by":"crossref","first-page":"477","DOI":"10.1162\/tacl_a_00197","article-title":"A joint model for entity analysis: coreference, typing and linking","volume":"2","author":"Durrett","year":"2014","journal-title":"Trans. Assoc. Comput. Linguist"},{"key":"2023020113391774600_btw343-B12","doi-asserted-by":"crossref","first-page":"17","DOI":"10.1186\/1758-2946-6-17","article-title":"Chemical named entities recognition: a review on approaches and applications","volume":"6","author":"Eltyeb","year":"2014","journal-title":"J. Cheminf"},{"key":"2023020113391774600_btw343-B13","first-page":"326","volume-title":"Joint Parsing and Named Entity Recognition. NAACL\/HLT","author":"Finkel","year":"2009"},{"key":"2023020113391774600_btw343-B14","first-page":"720","volume-title":"Hierarchical Joint Learning: Improving Joint Parsing and Named Entity Recognition with Non-Jointly Labeled Data. 48th ACL","author":"Finkel","year":"2010"},{"key":"2023020113391774600_btw343-B15","doi-asserted-by":"crossref","first-page":"7","DOI":"10.1002\/(SICI)1097-4571(199101)42:1<7::AID-ASI2>3.0.CO;2-P","article-title":"How effective is suffixing?","volume":"42","author":"Hartman","year":"1991","journal-title":"J. Am. Soc. Inf. Sci. Technol"},{"key":"2023020113391774600_btw343-B16","doi-asserted-by":"crossref","first-page":"S1","DOI":"10.1186\/1471-2105-6-S1-S1","article-title":"Overview of BioCreAtIvE: critical assessment of information extraction for biology","volume":"6","author":"Hirschman","year":"2005","journal-title":"BMC Bioinformatics"},{"key":"2023020113391774600_btw343-B17","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1186\/1758-2946-3-41","article-title":"OSCAR4: a flexible architecture for chemical text-mining","volume":"3","author":"Jessop","year":"2011","journal-title":"J. Cheminf"},{"key":"2023020113391774600_btw343-B18","doi-asserted-by":"crossref","first-page":"S3","DOI":"10.1186\/1471-2105-9-S3-S3","article-title":"Assessment of disease named entity recognition on a corpus of annotated sentences","volume":"9","author":"Jimeno","year":"2008","journal-title":"BMC Bioinformatics"},{"key":"2023020113391774600_btw343-B19","doi-asserted-by":"crossref","first-page":"876","DOI":"10.1136\/amiajnl-2012-001173","article-title":"Using rule-based natural language processing to improve disease normalization in biomedical text","volume":"20","author":"Kang","year":"2012","journal-title":"J. Am. Med. Inf. Assoc"},{"key":"2023020113391774600_btw343-B20","first-page":"1","article-title":"Overview of BioNLP'09 shared task on event extraction","author":"Kim","year":"2009","journal-title":"BioNLP Workshop"},{"key":"2023020113391774600_btw343-B21","doi-asserted-by":"crossref","first-page":"i268","DOI":"10.1093\/bioinformatics\/btn181","article-title":"Detection of IUPAC and IUPAC-like chemical names","volume":"24","author":"Klinger","year":"2008","journal-title":"Bioinformatics"},{"key":"2023020113391774600_btw343-B22","article-title":"Chemical names: terminological resources and corpora annotation","author":"Kolarik","year":"2008","journal-title":"LREC Workshop on Building and Evaluating Resources for Bbiomedical Text Mining"},{"key":"2023020113391774600_btw343-B23","doi-asserted-by":"crossref","first-page":"S1","DOI":"10.1186\/1758-2946-7-S1-S1","article-title":"CHEMDNER: The drugs and chemical names extraction challenge","volume":"7","author":"Krallinger","year":"2015","journal-title":"J. Cheminf"},{"key":"2023020113391774600_btw343-B24","first-page":"63","volume-title":"Overview of the CHEMDNER Patents Task. Fifth BioCreative Challenge Evaluation Workshop","author":"Krallinger","year":"2015"},{"key":"2023020113391774600_btw343-B25","first-page":"208","volume-title":"The UET-CAM System in the BioCreAtIvE V CDR Task. BioCreative Workshop","author":"Le","year":"2015"},{"key":"2023020113391774600_btw343-B26","doi-asserted-by":"crossref","first-page":"2909","DOI":"10.1093\/bioinformatics\/btt474","article-title":"DNorm: Disease name normalization with pairwise learning-to-rank","volume":"29","author":"Leaman","year":"2013","journal-title":"Bioinformatics"},{"key":"2023020113391774600_btw343-B27","first-page":"652","article-title":"BANNER: an executable survey of advances in biomedical named entity recognition","author":"Leaman","year":"2008","journal-title":"Pac. Symp. Biocomput"},{"key":"2023020113391774600_btw343-B28","first-page":"82","article-title":"Enabling recognition of diseases in biomedical text with machine learning: corpus and benchmark","volume":"13","author":"Leaman","year":"2009","journal-title":"Proc Symp on Languages in Biology and Medicine"},{"key":"2023020113391774600_btw343-B29","doi-asserted-by":"crossref","first-page":"28","DOI":"10.1016\/j.jbi.2015.07.010","article-title":"Challenges in clinical natural language processing for automated disorder normalization","volume":"57","author":"Leaman","year":"2015","journal-title":"J. Biomed. Inf"},{"key":"2023020113391774600_btw343-B30","doi-asserted-by":"crossref","first-page":"S3","DOI":"10.1186\/1758-2946-7-S1-S3","article-title":"tmChem: a high performance approach for chemical named entity recognition and normalization","volume":"7","author":"Leaman","year":"2015","journal-title":"J. Cheminf"},{"key":"2023020113391774600_btw343-B31","first-page":"226","volume-title":"An Enhanced CRF-Based System for Disease Name Entity Recognition and Normalization on BioCreative V DNER Task. Proc BioCreative Workshop","author":"Lee","year":"2015"},{"key":"2023020113391774600_btw343-B32","author":"Li","year":"2015"},{"key":"2023020113391774600_btw343-B33","doi-asserted-by":"crossref","first-page":"S3","DOI":"10.1186\/gb-2008-9-s2-s3","article-title":"Overview of BioCreative II gene normalization","volume":"9","author":"Morgan","year":"2008","journal-title":"Genome Biol"},{"key":"2023020113391774600_btw343-B34","first-page":"465","volume-title":"Improving the scalability of semi-markov conditional random fields for named entity recognition. 21st Int Conf on Comp Ling and 44th ACL. Association for Computational Linguistics","author":"Okanohara","year":"2006"},{"key":"2023020113391774600_btw343-B35","doi-asserted-by":"crossref","first-page":"130","DOI":"10.1108\/eb046814","article-title":"An algorithm for suffix stripping","volume":"14","author":"Porter","year":"1980","journal-title":"Program"},{"key":"2023020113391774600_btw343-B36","doi-asserted-by":"crossref","first-page":"143","DOI":"10.1136\/amiajnl-2013-002544","article-title":"Evaluating the state of the art in disorder recognition and normalization of the clinical narrative","volume":"22","author":"Pradhan","year":"2015","journal-title":"J. Am. Med. Inf. Assoc"},{"key":"2023020113391774600_btw343-B37","doi-asserted-by":"crossref","first-page":"868","DOI":"10.1093\/bioinformatics\/btt580","article-title":"Anatomical entity mention recognition at literature scale","volume":"30","author":"Pyysalo","year":"2014","journal-title":"Bioinformatics"},{"key":"2023020113391774600_btw343-B38","doi-asserted-by":"crossref","first-page":"163","DOI":"10.1142\/S0219720010004562","article-title":"CALBC silver standard corpus","volume":"8","author":"Rebholz-Schuhmann","year":"2010","journal-title":"J. Bioinf. Comput. Biol"},{"key":"2023020113391774600_btw343-B39","doi-asserted-by":"crossref","first-page":"1633","DOI":"10.1093\/bioinformatics\/bts183","article-title":"ChemSpot: a hybrid system for chemical named entity recognition","volume":"28","author":"Rocktaschel","year":"2012","journal-title":"Bioinformatics"},{"key":"2023020113391774600_btw343-B40","doi-asserted-by":"crossref","first-page":"402","DOI":"10.1186\/1471-2105-9-402","article-title":"Abbreviation definition identification based on automatic precision estimates","volume":"9","author":"Sohn","year":"2008","journal-title":"BMC Bioinformatics"},{"key":"2023020113391774600_btw343-B41","doi-asserted-by":"crossref","first-page":"320","DOI":"10.1016\/j.jbi.2015.08.008","article-title":"PKDE4J: Entity and relation extraction for public knowledge discovery","volume":"57","author":"Song","year":"2015","journal-title":"J. Biomed. Inf"},{"key":"2023020113391774600_btw343-B42","volume-title":"Adv Neural Inf Process Syst","author":"Taskar","year":"2004"},{"key":"2023020113391774600_btw343-B43","doi-asserted-by":"crossref","first-page":"2768","DOI":"10.1093\/bioinformatics\/btm393","article-title":"Learning string similarity measures for gene\/protein name dictionary look-up using logistic regression","volume":"23","author":"Tsuruoka","year":"2007","journal-title":"Bioinformatics"},{"key":"2023020113391774600_btw343-B44","author":"Usami","year":"2011"},{"key":"2023020113391774600_btw343-B45","doi-asserted-by":"crossref","first-page":"506","DOI":"10.1002\/minf.201100005","article-title":"Text mining for drugs and chemical compounds: methods, tools and applications","volume":"30","author":"Vazquez","year":"2011","journal-title":"Mol. Inf"},{"key":"2023020113391774600_btw343-B46","doi-asserted-by":"crossref","first-page":"7","DOI":"10.1155\/2015\/918710","article-title":"GNormPlus: an integrative approach for tagging genes, gene families, and protein domains","volume":"2015","author":"Wei","year":"2015","journal-title":"BioMed. Res. Int"},{"key":"2023020113391774600_btw343-B47","doi-asserted-by":"crossref","first-page":"1385","DOI":"10.1109\/JBHI.2015.2422651","article-title":"SimConcept: a hybrid approach for simplifying composite named entities in biomedical text","volume":"19","author":"Wei","year":"2015","journal-title":"IEEE J. Biomed. Health Inf"},{"key":"2023020113391774600_btw343-B48","author":"Wei","year":"2015"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/32\/18\/2839\/49020913\/bioinformatics_32_18_2839.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/32\/18\/2839\/49020913\/bioinformatics_32_18_2839.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,1]],"date-time":"2023-02-01T23:44:29Z","timestamp":1675295069000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/32\/18\/2839\/1744190"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,6,9]]},"references-count":48,"journal-issue":{"issue":"18","published-print":{"date-parts":[[2016,9,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btw343","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2016,9,15]]},"published":{"date-parts":[[2016,6,9]]}}}