{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,7]],"date-time":"2026-03-07T18:01:44Z","timestamp":1772906504822,"version":"3.50.1"},"reference-count":117,"publisher":"Oxford University Press (OUP)","issue":"6","license":[{"start":{"date-parts":[[2020,6,30]],"date-time":"2020-06-30T00:00:00Z","timestamp":1593475200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"name":"Ministry of Education, Taiwan"},{"DOI":"10.13039\/501100004663","name":"Ministry of Science and Technology, Taiwan","doi-asserted-by":"publisher","award":["108-2319-B-400-001"],"award-info":[{"award-number":["108-2319-B-400-001"]}],"id":[{"id":"10.13039\/501100004663","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2020,12,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Natural language processing (NLP) is widely applied in biological domains to retrieve information from publications. Systems to address numerous applications exist, such as biomedical named entity recognition (BNER), named entity normalization (NEN) and protein\u2013protein interaction extraction (PPIE). High-quality datasets can assist the development of robust and reliable systems; however, due to the endless applications and evolving techniques, the annotations of benchmark datasets may become outdated and inappropriate. In this study, we first review commonly used BNER datasets and their potential annotation problems such as inconsistency and low portability. 
Then, we introduce a revised version of the JNLPBA dataset that solves potential problems in the original and use state-of-the-art named entity recognition systems to evaluate its portability to different kinds of biomedical literature, including protein\u2013protein interaction and biology events. Lastly, we introduce an ensembled biomedical entity dataset (EBED) by extending the revised JNLPBA dataset with PubMed Central full-text paragraphs, figure captions and patent abstracts. This EBED is a multi-task dataset that covers annotations including gene, disease and chemical entities. In total, it contains 85000 entity mentions, 25000 entity mentions with database identifiers and 5000 attribute tags. To demonstrate the usage of the EBED, we review the BNER track from the AI CUP Biomedical Paper Analysis challenge. Availability: The revised JNLPBA dataset is available at https:\/\/iasl-btm.iis.sinica.edu.tw\/BNER\/Content\/Revised_JNLPBA.zip. The EBED dataset is available at https:\/\/iasl-btm.iis.sinica.edu.tw\/BNER\/Content\/AICUP_EBED_dataset.rar. Contact: Email: thtsai@g.ncu.edu.tw, Tel. 886-3-4227151 ext. 35203, Fax: 886-3-422-2681 Email: hsu@iis.sinica.edu.tw, Tel. 886-2-2788-3799 ext. 
2211, Fax: 886-2-2782-4814 Supplementary information: Supplementary data are available at Briefings in Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bib\/bbaa054","type":"journal-article","created":{"date-parts":[[2020,6,29]],"date-time":"2020-06-29T19:15:14Z","timestamp":1593458114000},"page":"2219-2238","source":"Crossref","is-referenced-by-count":48,"title":["Biomedical named entity recognition and linking datasets: survey and our recent development"],"prefix":"10.1093","volume":"21","author":[{"given":"Ming-Siang","family":"Huang","sequence":"first","affiliation":[{"name":"Bioinformatics Program, Taiwan International Graduate Program, Institute of Information Science, Academia Sinica, Taipei, Taiwan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Po-Ting","family":"Lai","sequence":"additional","affiliation":[{"name":"Institute of Biomedical Informatics, National Yang Ming University, Taipei, Taiwan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Pei-Yen","family":"Lin","sequence":"additional","affiliation":[{"name":"Department of Computer Science, National Tsing-Hua University, Hsinchu, Taiwan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yu-Ting","family":"You","sequence":"additional","affiliation":[{"name":"Intelligent Agent Systems Laboratory, Institute of Information Science, Academia Sinica, Taipei, Taiwan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Richard Tzong-Han","family":"Tsai","sequence":"additional","affiliation":[{"name":"Intelligent Information Service Research Laboratory, Department of Computer Science and Information Engineering, National Central University, Taoyuan, Taiwan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Wen-Lian","family":"Hsu","sequence":"additional","affiliation":[{"name":"Intelligent Agent Systems Laboratory, Institute of Information Science, Academia Sinica, Taipei, 
Taiwan"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2020,6,30]]},"reference":[{"issue":"3","key":"2020120303471761000_ref1","doi-asserted-by":"crossref","first-page":"575","DOI":"10.1007\/s11192-010-0202-z","article-title":"The rate of growth in scientific publication and the decline in coverage provided by science citation index","volume":"84","author":"Larsen","year":"2010","journal-title":"Scientometrics"},{"issue":"11","key":"2020120303471761000_ref2","doi-asserted-by":"crossref","first-page":"2215","DOI":"10.1002\/asi.23329","article-title":"Growth rates of modern science: a bibliometric analysis based on the number of publications and cited references","volume":"66","author":"Bornmann","year":"2015","journal-title":"J Assoc Inf Sci Technol"},{"key":"2020120303471761000_ref3","first-page":"119","article-title":"Literature mining for the biologist: from information retrieval to biological discovery","volume-title":"Nat Rev Genet","author":"Jensen","year":"2006"},{"key":"2020120303471761000_ref4","first-page":"705","article-title":"The impact of named entity normalization on information retrieval for question answering","volume-title":"European Conference on Information Retrieval","author":"Khalid","year":"2008"},{"key":"2020120303471761000_ref5","first-page":"1","article-title":"Overview of BioNLP shared task 2013","author":"N\u00e9dellec","journal-title":"Proceedings of the BioNLP Shared Task 2013 Workshop"},{"key":"2020120303471761000_ref6","first-page":"138","article-title":"An overview of the BioASQ large-scale biomedical semantic indexing and question answering competition","volume-title":"BMC Bioinform","author":"Tsatsaronis","year":"2015"},{"key":"2020120303471761000_ref7","first-page":"70","article-title":"Introduction to the bio-entity recognition task at JNLPBA","volume-title":"Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and Its 
Applications","author":"Kim","year":"2004"},{"key":"2020120303471761000_ref8","first-page":"e79517","article-title":"Collective instance-level gene normalization on the IGN corpus","volume-title":"PLoS One","author":"Dai","year":"2013"},{"key":"2020120303471761000_ref9","first-page":"1","article-title":"Overview of BioNLP \u201909 shared task on event extraction","volume-title":"Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing: Shared Task","author":"Kim","year":"2009"},{"key":"2020120303471761000_ref10","first-page":"1","article-title":"Overview of BioNLP shared task 2011","author":"Kim","year":"2011","journal-title":"Proceedings of the BioNLP Shared Task 2011 Workshop, BioNLP Shared Task \u201911"},{"issue":"2","key":"2020120303471761000_ref11","doi-asserted-by":"crossref","first-page":"139","DOI":"10.1016\/j.artmed.2004.07.016","article-title":"Comparative experiments on learning information extractors for proteins and their interactions","volume":"33","author":"Bunescu","year":"2005","journal-title":"Artif Intell Med"},{"key":"2020120303471761000_ref12","first-page":"50","article-title":"BioInfer: a corpus for information extraction in the biomedical domain","volume-title":"BMC Bioinform","author":"Pyysalo","year":"2007"},{"key":"2020120303471761000_ref13","first-page":"S3","article-title":"Genetag: a tagged corpus for gene\/protein named entity recognition","volume-title":"BMC Bioinform","author":"Tanabe","year":"2005"},{"issue":"1","key":"2020120303471761000_ref14","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41598-017-05778-z","article-title":"Learning a health knowledge graph from electronic medical records","volume":"7","author":"Rotmensch","year":"2017","journal-title":"Sci Rep"},{"key":"2020120303471761000_ref15","first-page":"92","article-title":"Various criteria in the evaluation of biomedical named entity recognition","volume-title":"BMC Bioinform","author":"Tzong-Han 
Tsai","year":"2006"},{"key":"2020120303471761000_ref16","doi-asserted-by":"crossref","first-page":"101767","DOI":"10.1016\/j.artmed.2019.101767","article-title":"SemBioNLQA: a semantic biomedical question answering system for retrieving exact and ideal answers to natural language questions","volume":"102","author":"Sarrouti","year":"2020","journal-title":"Artif Intell Med"},{"key":"2020120303471761000_ref17","first-page":"i180","article-title":"Genia corpus\u2014a semantically annotated corpus for bio-textmining","volume-title":"Bioinformatics","author":"Kim","year":"2003"},{"key":"2020120303471761000_ref18","article-title":"Genia ontology","volume-title":"Report TR-NLP-UT-2006-2","author":"Teteisi","year":"2006"},{"key":"2020120303471761000_ref19","first-page":"96","article-title":"Exploring deep knowledge resources in biomedical name recognition","volume-title":"Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications","author":"GuoDong","year":"2004"},{"key":"2020120303471761000_ref20","first-page":"S11","article-title":"NERBio: using selected word conjunctions, term normalization, and global patterns to improve biomedical named entity recognition","volume-title":"BMC Bioinform","author":"Tzong-Han Tsai","year":"2006"},{"issue":"9","key":"2020120303471761000_ref21","doi-asserted-by":"crossref","first-page":"1547","DOI":"10.1093\/bioinformatics\/btx815","article-title":"Gram-CNN: a deep learning approach with local context for named entity recognition in biomedical text","volume":"34","author":"Zhu","year":"2017","journal-title":"Bioinformatics"},{"key":"2020120303471761000_ref22","first-page":"326","volume-title":"Mining MEDLINE: abstracts, sentences, or phrases","author":"Ding","year":"2001"},{"key":"2020120303471761000_ref23","first-page":"S11","article-title":"Overview of biocreative task 1b: normalized gene lists","volume-title":"BMC 
Bioinform","author":"Hirschman","year":"2005"},{"key":"2020120303471761000_ref24","first-page":"1","article-title":"Learning language in logic\u2013genic interaction extraction challenge","volume-title":"Proceedings of the 4th Learning Language in Logic Workshop (LLL05)","author":"N\u00e9dellec","year":"2005"},{"issue":"3","key":"2020120303471761000_ref25","doi-asserted-by":"crossref","first-page":"365","DOI":"10.1093\/bioinformatics\/btl616","article-title":"Relex-relation extraction using dependency parse trees","volume":"23","author":"Fundel","year":"2006","journal-title":"Bioinformatics"},{"key":"2020120303471761000_ref26","first-page":"S4","article-title":"Cascaded classifiers for confidence-based chemical named entity recognition","volume-title":"BMC Bioinform","author":"Corbett","year":"2008"},{"key":"2020120303471761000_ref27","article-title":"Chemical names: terminological resources and corpora annotation","volume-title":"Workshop on Building and Evaluating Resources for Biomedical Text Mining (6th Edition of the Language Resources and Evaluation Conference)","author":"Kol\u00e1rik","year":"2008"},{"issue":"2","key":"2020120303471761000_ref28","doi-asserted-by":"crossref","first-page":"S3","DOI":"10.1186\/gb-2008-9-s2-s3","article-title":"Overview of BioCreative II gene normalization","volume":"9","author":"Morgan","year":"2008","journal-title":"Genome Biol"},{"key":"2020120303471761000_ref29","first-page":"84","article-title":"Osirisv1. 
2: a named entity recognition system for sequence variants of genes in biomedical literature","volume-title":"BMC Bioinform","author":"Furlong","year":"2008"},{"key":"2020120303471761000_ref30","first-page":"82","article-title":"Enabling recognition of diseases in biomedical text with machine learning: corpus and benchmark","volume-title":"Proceedings of the 2009 Symposium on Languages in Biology and Medicine","author":"Leaman","year":"2009"},{"issue":"9","key":"2020120303471761000_ref31","doi-asserted-by":"crossref","first-page":"S12","DOI":"10.1186\/1471-2105-10-S9-S12","article-title":"Developing a manually annotated clinical document corpus to identify phenotypic information for inflammatory bowel disease","volume":"10","author":"South","year":"2009","journal-title":"BMC Bioinform"},{"issue":"3","key":"2020120303471761000_ref32","doi-asserted-by":"crossref","first-page":"385","DOI":"10.1109\/TCBB.2010.61","article-title":"An overview of BioCreative II. 5","volume":"7","author":"Leitner","year":"2010","journal-title":"IEEE\/ACM Trans Comput Biol Bioinform"},{"key":"2020120303471761000_ref33","first-page":"85","article-title":"Linnaeus: a species name identification system for biomedical literature","volume-title":"BMC Bioinform","author":"Gerner","year":"2010"},{"issue":"4","key":"2020120303471761000_ref34","doi-asserted-by":"crossref","first-page":"S4","DOI":"10.1186\/1471-2105-12-S4-S4","article-title":"Challenges in the association of human single nucleotide polymorphism mentions with unique database identifiers","volume":"12","author":"Thomas","year":"2011","journal-title":"BMC Bioinformatics"},{"issue":"S8","key":"2020120303471761000_ref35","doi-asserted-by":"crossref","first-page":"S2","DOI":"10.1186\/1471-2105-12-S8-S2","article-title":"The gene normalization task in BioCreative III","volume":"12","author":"Lu","year":"2011","journal-title":"BMC Bioinform"},{"key":"2020120303471761000_ref36","first-page":"16","article-title":"Annotating and evaluating 
text for stem cell research","volume-title":"Proceedings of the Third Workshop on Building and Evaluation Resources for Biomedical Text Mining (BioTxtM 2012) at Language Resources and Evaluation (LREC). Istanbul, Turkey","author":"Neves","year":"2012"},{"key":"2020120303471761000_ref37","first-page":"27","article-title":"Open-domain anatomical entity mention detection","volume-title":"Proceedings of the Workshop on Detecting Structure in Scholarly Discourse","author":"Ohta","year":"2012"},{"key":"2020120303471761000_ref38","first-page":"161","article-title":"Concept annotation in the craft corpus","volume-title":"BMC Bioinformatics","author":"Bada","year":"2012"},{"key":"2020120303471761000_ref39","first-page":"879","article-title":"The EU-ADR corpus: annotated drugs, diseases, targets, and their relationships","volume-title":"J Biomed Inform","author":"Van Mulligen","year":"2012"},{"issue":"5","key":"2020120303471761000_ref40","doi-asserted-by":"crossref","first-page":"914","DOI":"10.1016\/j.jbi.2013.07.011","article-title":"The DDI corpus: an annotated corpus with pharmacological substances and drug\u2013drug interactions","volume":"46","author":"Herrero-Zazo","year":"2013","journal-title":"J Biomed Inform"},{"key":"2020120303471761000_ref41","first-page":"e65390","article-title":"The species and organisms resources for fast and accurate identification of taxonomic names in text","volume-title":"PLoS One","author":"Pafilis","year":"2013"},{"key":"2020120303471761000_ref42","first-page":"1","article-title":"NCBI disease corpus: a resource for disease name recognition and concept normalization","volume-title":"J Biomed Inform","author":"Do\u011fan","year":"2014"},{"key":"2020120303471761000_ref43","doi-asserted-by":"crossref","DOI":"10.1093\/database\/bau086","article-title":"Overview of the gene ontology task at BioCreative 
IV","volume":"2014","author":"Mao","year":"2014","journal-title":"Database"},{"key":"2020120303471761000_ref44","first-page":"S1","article-title":"Chemdner: the drugs and chemical names extraction challenge","volume-title":"J Cheminform","author":"Krallinger","year":"2015"},{"key":"2020120303471761000_ref45","doi-asserted-by":"crossref","first-page":"S6","DOI":"10.1016\/j.jbi.2015.09.018","article-title":"Creation of a new longitudinal corpus of clinical narratives","volume":"58","author":"Kumar","year":"2015","journal-title":"J Biomed Inform"},{"key":"2020120303471761000_ref46","first-page":"S2","article-title":"Overview of the cancer genetics and pathway curation tasks of BioNLP shared task 2013","volume-title":"BMC Bioinform","author":"Pyysalo","year":"2015"},{"key":"2020120303471761000_ref47","article-title":"BioCreative V CDR task corpus: a resource for chemical disease relation extraction","volume":"2016","author":"Li","year":"2016","journal-title":"Database"},{"key":"2020120303471761000_ref48","article-title":"Evaluation of chemical and gene\/protein entity recognition systems at BioCreative V. 
5: the CEMP and GPRO patents tracks","volume-title":"Proceedings of the BioCreative V.5 Challenge Evaluation Workshop","author":"P\u00e9rez-P\u00e9rez","year":"2017"},{"key":"2020120303471761000_ref49","first-page":"451","article-title":"A method for named entity normalization in biomedical articles: application to diseases and plants","volume-title":"BMC Bioinform","author":"Cho","year":"2017"},{"key":"2020120303471761000_ref50","doi-asserted-by":"publisher","first-page":"2033","DOI":"10.18653\/v1\/D18-1228","article-title":"Annotation of a large clinical entity corpus","volume-title":"EMNLP","author":"Patel","year":"2018"},{"key":"2020120303471761000_ref51","article-title":"Medmentions: a large biomedical corpus annotated with UMLS concepts","author":"Mohan","year":"2019"},{"issue":"138\u201363","key":"2020120303471761000_ref52","first-page":"8","article-title":"On protein synthesis","volume":"12","author":"Crick","year":"1958","journal-title":"Symp Soc Exp Biol"},{"key":"2020120303471761000_ref53","first-page":"561","article-title":"Central dogma of molecular biology","volume-title":"Nature","author":"Crick","year":"1970"},{"key":"2020120303471761000_ref54","first-page":"S2","article-title":"Overview of BioCreative II gene mention recognition","volume-title":"Genome Biol","author":"Smith","year":"2008"},{"key":"2020120303471761000_ref55","article-title":"Evaluation, corpora and analysis of chemical and gene\/protein name recognition in patents: the CHEMDNER patents text mining task at BioCreative V","volume-title":"Database","author":"Louren\u00e7o","year":"2016"},{"key":"2020120303471761000_ref56","first-page":"i268","article-title":"Detection of IUPAC and IUPAC-like chemical names","volume-title":"Bioinformatics","author":"Klinger","year":"2008"},{"key":"2020120303471761000_ref57","first-page":"341","article-title":"SemEval-2013 task 9: extraction of drug\u2013drug interactions from biomedical texts (DDIExtraction 2013)","volume-title":"Second Joint Conference 
on Lexical and Computational Semantics ($^{\\ast }$SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)","author":"Segura-Bedmar","year":"2013"},{"issue":"12","key":"2020120303471761000_ref58","doi-asserted-by":"crossref","first-page":"1633","DOI":"10.1093\/bioinformatics\/bts183","article-title":"Chemspot: a hybrid system for chemical named entity recognition","volume":"28","author":"Rockt\u00e4schel","year":"2012","journal-title":"Bioinformatics"},{"key":"2020120303471761000_ref59","first-page":"2","article-title":"Overview of the chemical compound and drug name recognition (CHEMDNER) task","volume-title":"BioCreative Challenge Evaluation Workshop","author":"Krallinger","year":"2013"},{"key":"2020120303471761000_ref60","first-page":"63","article-title":"Overview of the CHEMDNER patents task","volume-title":"Proceedings of the Fifth BioCreative Challenge Evaluation Workshop","author":"Krallinger","year":"2015"},{"issue":"1","key":"2020120303471761000_ref61","doi-asserted-by":"crossref","first-page":"352","DOI":"10.1093\/nar\/28.1.352","article-title":"dbSNP: a database of single nucleotide polymorphisms","volume":"28","author":"Smigielski","year":"2000","journal-title":"Nucleic Acids Res"},{"key":"2020120303471761000_ref62","first-page":"D54","article-title":"Entrez gene: gene-centered information at NCBI","volume-title":"Nucleic Acids Res","author":"Maglott","year":"2005"},{"issue":"3","key":"2020120303471761000_ref63","first-page":"265","article-title":"Medical subject headings (mesh)","volume":"88","author":"Lipscomb","year":"2000","journal-title":"Bull Med Library Assoc"},{"key":"2020120303471761000_ref64","first-page":"D514","article-title":"Online mendelian inheritance in man (omim), a knowledgebase of human genes and genetic disorders","volume-title":"Nucleic Acids 
Res","author":"Hamosh","year":"2005"},{"issue":"22","key":"2020120303471761000_ref65","doi-asserted-by":"crossref","first-page":"2909","DOI":"10.1093\/bioinformatics\/btt474","article-title":"Dnorm: disease name normalization with pairwise learning to rank","volume":"29","author":"Leaman","year":"2013","journal-title":"Bioinformatics"},{"issue":"18","key":"2020120303471761000_ref66","doi-asserted-by":"crossref","first-page":"2839","DOI":"10.1093\/bioinformatics\/btw343","article-title":"Taggerone: joint named entity recognition and normalization with semi-markov models","volume":"32","author":"Leaman","year":"2016","journal-title":"Bioinformatics"},{"issue":"15","key":"2020120303471761000_ref67","doi-asserted-by":"crossref","first-page":"2363","DOI":"10.1093\/bioinformatics\/btx172","article-title":"A transition-based joint model for disease named entity recognition and normalization","volume":"33","author":"Lou","year":"2017","journal-title":"Bioinformatics"},{"key":"2020120303471761000_ref68","first-page":"S6","article-title":"Comparative analysis of five protein\u2013protein interaction corpora","volume-title":"BMC Bioinform","author":"Pyysalo","year":"2008"},{"issue":"D1","key":"2020120303471761000_ref69","doi-asserted-by":"crossref","first-page":"D1018","DOI":"10.1093\/nar\/gky1105","article-title":"Expansion of the human phenotype ontology (HPO) knowledge base and resources","volume":"47","author":"K\u00f6hler","year":"2019","journal-title":"Nucleic Acids Res"},{"issue":"4","key":"2020120303471761000_ref70","doi-asserted-by":"publisher","first-page":"464","DOI":"10.1006\/geno.2002.6748","article-title":"Guidelines for human gene nomenclature","volume":"79","author":"Wain","year":"2002","journal-title":"Genomics"},{"key":"2020120303471761000_ref71","volume-title":"A Guide to IUPAC Nomenclature of Organic Compounds","author":"Panico","year":"1993"},{"key":"2020120303471761000_ref72","article-title":"Nomenclature of inorganic chemistry: IUPAC recommendations 
2005","volume-title":"Chemistry International","author":"Ture","year":"2005"},{"issue":"4","key":"2020120303471761000_ref73","doi-asserted-by":"crossref","first-page":"287","DOI":"10.1016\/j.compbiolchem.2008.03.008","article-title":"Exploiting the performance of dictionary-based bio-entity name recognition in biomedical literature","volume":"32","author":"Yang","year":"2008","journal-title":"Comput Biol Chem"},{"key":"2020120303471761000_ref74","first-page":"17","article-title":"Effective mapping of biomedical text to the UMLS metathesaurus: the metamap program","volume-title":"Proceedings of the AMIA Symposium","author":"Aronson","year":"2001"},{"issue":"3","key":"2020120303471761000_ref75","doi-asserted-by":"crossref","first-page":"229","DOI":"10.1136\/jamia.2009.002733","article-title":"An overview of metamap: historical perspective and recent advances","volume":"17","author":"Aronson","year":"2010","journal-title":"J Am Med Inform Assoc"},{"key":"2020120303471761000_ref76","first-page":"1425","article-title":"A machine learning approach for phenotype name recognition","volume-title":"Proceedings of COLING 2012","author":"Khordad","year":"2012"},{"key":"2020120303471761000_ref77","first-page":"74","article-title":"Comparison of metamap and ctakes for entity extraction in clinical notes","volume-title":"BMC Medi Inform Decis Mak","author":"Re\u00e1tegui","year":"2018"},{"key":"2020120303471761000_ref78","first-page":"876","article-title":"Using rule-based natural language processing to improve disease normalization in biomedical text","volume-title":"J Am Med Inform Assoc","author":"Kang","year":"2013"},{"key":"2020120303471761000_ref79","first-page":"707","article-title":"Toward information extraction: identifying protein names from biological papers","volume-title":"Pac Symp Biocomput","author":"Fukuda","year":"1998"},{"key":"2020120303471761000_ref80","first-page":"S10","article-title":"Text detective: a rule-based system for gene annotation in biomedical 
texts","volume-title":"BMC Bioinform","author":"Tamames","year":"2005"},{"key":"2020120303471761000_ref81","first-page":"S10","article-title":"A critical assessment of text mining methods in molecular biology","volume-title":"BMC Bioinform","author":"Hirschman","year":"2005"},{"key":"2020120303471761000_ref82","first-page":"S2","article-title":"BioCreative task 1a: gene mention finding evaluation","volume-title":"BMC Bioinform","author":"Yeh","year":"2005"},{"key":"2020120303471761000_ref83","first-page":"S14","article-title":"Enhancing of chemical compound and drug name recognition using representative tag scheme and fine-grained tokenization","volume-title":"J Cheminform","author":"Dai","year":"2015"},{"key":"2020120303471761000_ref84","doi-asserted-by":"crossref","first-page":"1","DOI":"10.3115\/1118149.1118150","article-title":"Tuning support vector machines for biomedical named entity recognition","volume-title":"Proceedings of the ACL-02 Workshop on Natural Language Processing in the Biomedical Domain-Volume 3","author":"Kazama","year":"2002"},{"issue":"7","key":"2020120303471761000_ref85","doi-asserted-by":"crossref","first-page":"1178","DOI":"10.1093\/bioinformatics\/bth060","article-title":"Recognizing names in biomedical texts: a machine learning approach","volume":"20","author":"Zhou","year":"2004","journal-title":"Bioinformatics"},{"key":"2020120303471761000_ref86","first-page":"S6","article-title":"Identifying gene and protein mentions in text using conditional random fields","volume-title":"BMC Bioinform","author":"McDonald","year":"2005"},{"key":"2020120303471761000_ref87","first-page":"403","article-title":"Efficiently inducing features of conditional random fields","volume-title":"Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence","author":"McCallum","year":"2002"},{"key":"2020120303471761000_ref88","first-page":"64","article-title":"Statistical principle-based approach for gene and protein related object 
recognition","volume-title":"J Cheminform","author":"Lai","year":"2018"},{"issue":"10","key":"2020120303471761000_ref89","doi-asserted-by":"crossref","first-page":"1671","DOI":"10.1109\/LSP.2015.2420092","article-title":"Deep neural network approaches to speaker and language recognition","volume":"22","author":"Richardson","year":"2015","journal-title":"IEEE Signal Proc Lett"},{"key":"2020120303471761000_ref90","volume-title":"Nvidia Demos a Car Computer Trained With Deep Learning","author":"Nelson"},{"key":"2020120303471761000_ref91","doi-asserted-by":"crossref","first-page":"160","DOI":"10.1145\/1390156.1390177","article-title":"A unified architecture for natural language processing: deep neural networks with multitask learning","volume-title":"Proceedings of the 25th International Conference on Machine Learning","author":"Collobert","year":"2008"},{"key":"2020120303471761000_ref92","doi-asserted-by":"crossref","DOI":"10.1093\/database\/baw140","article-title":"Disease named entity recognition by combining conditional random fields and bidirectional recurrent neural networks","volume":"2016","author":"Wei","year":"2016","journal-title":"Database"},{"key":"2020120303471761000_ref93","first-page":"1812","article-title":"Clinical named entity recognition using deep learning models","volume":"2017","author":"Wu","year":"2017","journal-title":"AMIA Annual Symposium Proceedings"},{"key":"2020120303471761000_ref94","doi-asserted-by":"crossref","first-page":"166","DOI":"10.18653\/v1\/W16-2922","article-title":"How to train good word embeddings for biomedical NLP","volume-title":"Proceedings of the 15th Workshop on Biomedical Natural Language Processing","author":"Chiu","year":"2016"},{"key":"2020120303471761000_ref95","doi-asserted-by":"crossref","first-page":"52","DOI":"10.1007\/978-981-15-0118-0_5","article-title":"Biowordvec, improving biomedical word embeddings with subword information and mesh","volume-title":"Sci 
Data","author":"Zhang","year":"2019"},{"key":"2020120303471761000_ref96","first-page":"1234","article-title":"Biobert: pre-trained biomedical language representation model for biomedical text mining","author":"Lee","year":"2020"},{"key":"2020120303471761000_ref97","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N19-1423","article-title":"Bert: pre-training of deep bidirectional transformers for language understanding","volume-title":"Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Devlin"},{"key":"2020120303471761000_ref98","first-page":"552","article-title":"2010 i2b2\/va challenge on concepts, assertions, and relations in clinical text","volume-title":"J Am Med Inform Assoc","author":"Uzuner","year":"2011"},{"key":"2020120303471761000_ref99","first-page":"55","article-title":"Extraction of relations between genes and diseases from text and large-scale data analysis: implications for translational research","volume-title":"BMC Bioinform","author":"Bravo","year":"2015"},{"key":"2020120303471761000_ref100","first-page":"141","article-title":"Overview of the BioCreative VI chemical\u2013protein interaction track","volume-title":"Proceedings of the Sixth BioCreative Challenge Evaluation Workshop","author":"Krallinger","year":"2017"},{"key":"2020120303471761000_ref101","first-page":"102","article-title":"Brat: a web-based tool for NLP-assisted text annotation","volume-title":"Proceedings of the Demonstrations at the 13th Conference of the European Chapter of the Association for Computational Linguistics","author":"Stenetorp","year":"2012"},{"issue":"2","key":"2020120303471761000_ref102","doi-asserted-by":"publisher","first-page":"381","DOI":"10.1073\/pnas.98.2.381","volume":"98","author":"Roberts","year":"2001","journal-title":"Pubmed Central: The Genbank of the Published Literature. 
Proceedings of the National Academy of Sciences"},{"key":"2020120303471761000_ref103","first-page":"20","article-title":"Pubtator: a PubMed-like interactive curation system for document triage and literature curation","author":"Wei","journal-title":"Proceedings of the BioCreative 2012 Workshop, Washington, DC"},{"issue":"W1","key":"2020120303471761000_ref104","doi-asserted-by":"crossref","first-page":"W518","DOI":"10.1093\/nar\/gkt441","article-title":"Pubtator: a web-based text mining tool for assisting biocuration","volume":"41","author":"Wei","year":"2013","journal-title":"Nucleic Acids Res"},{"key":"2020120303471761000_ref105","first-page":"D344","article-title":"ChEBI: a database and ontology for chemical entities of biological interest","volume-title":"Nucleic Acids Res","author":"Degtyarenko","year":"2007"},{"key":"2020120303471761000_ref106","first-page":"652","article-title":"Banner: an executable survey of advances in biomedical named entity recognition","volume-title":"Biocomputing 2008","author":"Leaman","year":"2008"},{"key":"2020120303471761000_ref107","first-page":"54","article-title":"Gimli: open source and high-performance biomedical name recognition","volume-title":"BMC Bioinform","author":"Campos","year":"2013"},{"key":"2020120303471761000_ref108","volume-title":"Nersuite: a named entity recognition toolkit","author":"Cho","year":"2010"},{"key":"2020120303471761000_ref109","doi-asserted-by":"crossref","DOI":"10.1201\/9780429258589","volume-title":"Practical Statistics for Medical Research","author":"Altman","year":"1990"},{"key":"2020120303471761000_ref110","article-title":"Genia tagger: part-of-speech tagging, shallow parsing, and named entity recognition for biomedical text","author":"Tsuruoka","year":"2006"},{"key":"2020120303471761000_ref111","doi-asserted-by":"crossref","DOI":"10.3115\/1118108.1118117","article-title":"NLTK: the natural language toolkit","volume-title":"Proceedings of the ACL Interactive Poster and Demonstration 
Sessions","author":"Loper"},{"key":"2020120303471761000_ref112","doi-asserted-by":"crossref","first-page":"1532","DOI":"10.3115\/v1\/D14-1162","article-title":"Glove: global vectors for word representation","volume-title":"Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"Pennington","year":"2014"},{"key":"2020120303471761000_ref113","doi-asserted-by":"crossref","first-page":"249","DOI":"10.3115\/981967.981999","article-title":"Estimating upper and lower bounds on the performance of word-sense disambiguation programs","volume-title":"Proceedings of the 30th Annual Meeting on Association for Computational Linguistics","author":"Gale","year":"1992"},{"key":"2020120303471761000_ref114","doi-asserted-by":"crossref","first-page":"39","DOI":"10.1145\/1295074.1295082","article-title":"Toward a text classification system for the quality assessment of software requirements written in natural language","volume-title":"Fourth International Workshop on Software Quality Assurance: In Conjunction With the 6th ESEC\/FSE Joint Meeting","author":"Ormandjieva","year":"2007"},{"key":"2020120303471761000_ref115","doi-asserted-by":"crossref","DOI":"10.1002\/9781444324044.ch11","article-title":"11 evaluation of NLP systems","volume":"57","author":"Resnik","year":"2010","journal-title":"The Handbook of Computational Linguistics and Natural Language Processing"},{"key":"2020120303471761000_ref116","first-page":"298","article-title":"Inter-annotator agreement and the upper limit on machine performance: evidence from biomedical natural language processing","volume-title":"Stud Health Technol Inform","author":"Boguslav","year":"2017"},{"key":"2020120303471761000_ref117","first-page":"D786","article-title":"Comparative toxicogenomics database: a knowledgebase and discovery tool for chemical\u2013gene\u2013disease networks","volume-title":"Nucleic Acids Res","author":"Davis","year":"2008"}],"container-title":["Briefings in 
Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bib\/article-pdf\/21\/6\/2219\/34671873\/bbaa054.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"http:\/\/academic.oup.com\/bib\/article-pdf\/21\/6\/2219\/34671873\/bbaa054.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,3,22]],"date-time":"2021-03-22T05:42:28Z","timestamp":1616391748000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/21\/6\/2219\/5850239"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,6,30]]},"references-count":117,"journal-issue":{"issue":"6","published-online":{"date-parts":[[2020,6,30]]},"published-print":{"date-parts":[[2020,12,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbaa054","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2020,11]]},"published":{"date-parts":[[2020,6,30]]}}}