{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,13]],"date-time":"2026-02-13T05:34:46Z","timestamp":1770960886503,"version":"3.50.1"},"reference-count":42,"publisher":"Oxford University Press (OUP)","issue":"11","license":[{"start":{"date-parts":[[2023,11,17]],"date-time":"2023-11-17T00:00:00Z","timestamp":1700179200000},"content-version":"vor","delay-in-days":16,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001659","name":"Deutsche Forschungsgemeinschaft","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100001659","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023,11,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Biomedical entity linking (BEL) is the task of grounding entity mentions to a knowledge base (KB). It plays a vital role in information extraction pipelines for the life sciences literature. We review recent work in the field and find that, as the task is absent from existing benchmarks for biomedical text mining, different studies adopt different experimental setups making comparisons based on published numbers problematic. Furthermore, neural systems are tested primarily on instances linked to the broad coverage KB UMLS, leaving their performance to more specialized ones, e.g. genes or variants, understudied.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We therefore developed BELB, a biomedical entity linking benchmark, providing access in a unified format to 11 corpora linked to 7 KBs and spanning six entity types: gene, disease, chemical, species, cell line, and variant. 
BELB greatly reduces preprocessing overhead in testing BEL systems on multiple corpora offering a standardized testbed for reproducible experiments. Using BELB, we perform an extensive evaluation of six rule-based entity-specific systems and three recent neural approaches leveraging pre-trained language models. Our results reveal a mixed picture showing that neural approaches fail to perform consistently across entity types, highlighting the need of further studies towards entity-agnostic models.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>The source code of BELB is available at: https:\/\/github.com\/sg-wbi\/belb. The code to reproduce our experiments can be found at: https:\/\/github.com\/sg-wbi\/belb-exp.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btad698","type":"journal-article","created":{"date-parts":[[2023,11,17]],"date-time":"2023-11-17T18:24:15Z","timestamp":1700245455000},"source":"Crossref","is-referenced-by-count":5,"title":["BELB: a biomedical entity linking benchmark"],"prefix":"10.1093","volume":"39","author":[{"ORCID":"https:\/\/orcid.org\/0009-0002-8234-8299","authenticated-orcid":false,"given":"Samuele","family":"Garda","sequence":"first","affiliation":[{"name":"Computer Science Department, Humboldt-Universit\u00e4t zu Berlin , Berlin 10099, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2499-472X","authenticated-orcid":false,"given":"Leon","family":"Weber-Genzel","sequence":"additional","affiliation":[{"name":"Center for Information and Language Processing, Ludwig-Maximilians-Universit\u00e4t M\u00fcnchen , M\u00fcnchen 80539, Germany"}]},{"given":"Robert","family":"Martin","sequence":"additional","affiliation":[{"name":"Computer Science Department, Humboldt-Universit\u00e4t zu Berlin , Berlin 10099, 
Germany"}]},{"given":"Ulf","family":"Leser","sequence":"additional","affiliation":[{"name":"Computer Science Department, Humboldt-Universit\u00e4t zu Berlin , Berlin 10099, Germany"}]}],"member":"286","published-online":{"date-parts":[[2023,11,17]]},"reference":[{"key":"2023112802465564500_btad698-B1","author":"Agarwal","year":"2022"},{"key":"2023112802465564500_btad698-B2","doi-asserted-by":"crossref","DOI":"10.1093\/database\/baac047","article-title":"Chemical identification and indexing in PubMed full-text articles using deep learning and heuristics","volume":"2022","author":"Almeida","year":"2022","journal-title":"Database (Oxford)"},{"key":"2023112802465564500_btad698-B3","first-page":"376","author":"Arighi","year":"2017"},{"key":"2023112802465564500_btad698-B4","doi-asserted-by":"crossref","first-page":"25","DOI":"10.7171\/jbt.18-2902-002","article-title":"The cellosaurus, a cell-line knowledge resource","volume":"29","author":"Bairoch","year":"2018","journal-title":"J Biomol Tech"},{"key":"2023112802465564500_btad698-B5","doi-asserted-by":"crossref","first-page":"1798","DOI":"10.1109\/TPAMI.2013.50","article-title":"Representation learning: a review and new perspectives","volume":"35","author":"Bengio","year":"2013","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"2023112802465564500_btad698-B6","doi-asserted-by":"crossref","first-page":"D267","DOI":"10.1093\/nar\/gkh061","article-title":"The unified medical language system (UMLS): integrating biomedical terminology","volume":"32","author":"Bodenreider","year":"2004","journal-title":"Nucleic Acids Res"},{"key":"2023112802465564500_btad698-B7","doi-asserted-by":"crossref","first-page":"D36","DOI":"10.1093\/nar\/gku1055","article-title":"Gene: a gene-centered information resource at NCBI","volume":"43","author":"Brown","year":"2015","journal-title":"Nucleic Acids 
Res"},{"key":"2023112802465564500_btad698-B8","doi-asserted-by":"crossref","first-page":"D1257","DOI":"10.1093\/nar\/gkac833","article-title":"Comparative toxicogenomics database (CTD): update 2023","volume":"51","author":"Davis","year":"2023","journal-title":"Nucleic Acids Res"},{"key":"2023112802465564500_btad698-B9","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.jbi.2013.12.006","article-title":"NCBI disease corpus: a resource for disease name recognition and concept normalization","volume":"47","author":"Do\u011fan","year":"2014","journal-title":"J Biomed Inform"},{"key":"2023112802465564500_btad698-B10","doi-asserted-by":"crossref","first-page":"84","DOI":"10.1186\/1471-2105-9-84","article-title":"OSIRISv1.2: a named entity recognition system for sequence variants of genes in biomedical literature","volume":"9","author":"Furlong","year":"2008","journal-title":"BMC Bioinformatics"},{"key":"2023112802465564500_btad698-B11","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3458754","article-title":"Domain-specific language model pretraining for biomedical natural language processing","volume":"3","author":"Gu","year":"2021","journal-title":"ACM Trans Comput Healthcare"},{"key":"2023112802465564500_btad698-B12","author":"Hou","year":"2023"},{"key":"2023112802465564500_btad698-B13","doi-asserted-by":"crossref","first-page":"103779","DOI":"10.1016\/j.jbi.2021.103779","article-title":"NLM-Gene, a richly annotated gold standard dataset for gene entities that addresses ambiguity and multi-species gene recognition","volume":"118","author":"Islamaj","year":"2021","journal-title":"J Biomed Inform"},{"key":"2023112802465564500_btad698-B14","doi-asserted-by":"crossref","DOI":"10.1093\/database\/baac102","article-title":"NLM-Chem-BC7: manually annotated full-text resources for chemical entity annotation and indexing in biomedical articles","volume":"2022","author":"Islamaj","year":"2022","journal-title":"Database 
(Oxford)"},{"key":"2023112802465564500_btad698-B15","doi-asserted-by":"crossref","first-page":"ooab025","DOI":"10.1093\/jamiaopen\/ooab025","article-title":"Annotation and initial evaluation of a large annotated German oncological corpus","volume":"4","author":"Kittner","year":"2021","journal-title":"JAMIA Open"},{"key":"2023112802465564500_btad698-B16","doi-asserted-by":"crossref","first-page":"2839","DOI":"10.1093\/bioinformatics\/btw343","article-title":"TaggerOne: joint named entity recognition and normalization with semi-Markov models","volume":"32","author":"Leaman","year":"2016","journal-title":"Bioinformatics"},{"key":"2023112802465564500_btad698-B17","doi-asserted-by":"crossref","first-page":"e0126283","DOI":"10.1371\/journal.pone.0126283","article-title":"Assembly of a comprehensive regulatory network for the mammalian circadian clock: a bioinformatics approach","volume":"10","author":"Lehmann","year":"2015","journal-title":"PLoS One"},{"key":"2023112802465564500_btad698-B18","article-title":"BioCreative V CDR task corpus: a resource for chemical disease relation extraction","volume":"2016","author":"Li","year":"2016","journal-title":"Database (Oxford)"},{"key":"2023112802465564500_btad698-B19","first-page":"4228","author":"Liu","year":"2021"},{"key":"2023112802465564500_btad698-B20","doi-asserted-by":"crossref","first-page":"1529\u2013e1","DOI":"10.1093\/jamia\/ocaa106","article-title":"The 2019 n2c2\/UMass Lowell shared task on clinical concept normalization","volume":"27","author":"Luo","year":"2020","journal-title":"J Am Med Inform Assoc"},{"key":"2023112802465564500_btad698-B21","first-page":"1","article-title":"M. 
LINNAEUS: a species name identification system for biomedical literature","volume":"11","author":"Martin","year":"2010","journal-title":"BMC Bioinformatics"},{"key":"2023112802465564500_btad698-B22","author":"Miranda-Escalada","year":"2022"},{"key":"2023112802465564500_btad698-B23","author":"Mohan","year":"2019"},{"key":"2023112802465564500_btad698-B24","first-page":"1","author":"Mork","year":"2013"},{"key":"2023112802465564500_btad698-B25","doi-asserted-by":"crossref","first-page":"4837","DOI":"10.1093\/bioinformatics\/btac598","article-title":"BERN2: an advanced neural biomedical named entity recognition and normalization tool","volume":"38","author":"Mujeen","year":"2022","journal-title":"Bioinformatics"},{"key":"2023112802465564500_btad698-B26","author":"Neumann","year":"2019"},{"key":"2023112802465564500_btad698-B27","doi-asserted-by":"crossref","first-page":"e65390","DOI":"10.1371\/journal.pone.0065390","article-title":"The SPECIES and ORGANISMS resources for fast and accurate identification of taxonomic names in text","volume":"8","author":"Pafilis","year":"2013","journal-title":"PLoS One"},{"key":"2023112802465564500_btad698-B28","first-page":"58","author":"Peng","year":"2019"},{"key":"2023112802465564500_btad698-B29","doi-asserted-by":"crossref","first-page":"605","DOI":"10.3233\/SW-170286","article-title":"GERBIL \u2013 benchmarking named entity recognition and linking consistently","volume":"9","author":"R\u00f6der","year":"2018","journal-title":"Semantic Web"},{"key":"2023112802465564500_btad698-B30","doi-asserted-by":"crossref","first-page":"D136","DOI":"10.1093\/nar\/gkr1178","article-title":"The NCBI taxonomy database","volume":"40","author":"Scott","year":"2012","journal-title":"Nucleic Acids Res"},{"key":"2023112802465564500_btad698-B31","doi-asserted-by":"crossref","first-page":"308","DOI":"10.1093\/nar\/29.1.308","article-title":"dbSNP: the NCBI database of genetic variation","volume":"29","author":"Sherry","year":"2001","journal-title":"Nucleic 
Acids Res"},{"key":"2023112802465564500_btad698-B32","doi-asserted-by":"crossref","first-page":"402","DOI":"10.1186\/1471-2105-9-402","article-title":"Abbreviation definition identification based on automatic precision estimates","volume":"9","author":"Sohn","year":"2008","journal-title":"BMC Bioinformatics"},{"key":"2023112802465564500_btad698-B33","first-page":"3641","author":"Sung","year":"2020"},{"key":"2023112802465564500_btad698-B34","doi-asserted-by":"crossref","first-page":"S4","DOI":"10.1186\/1471-2105-12-S4-S4","article-title":"Challenges in the association of human single nucleotide polymorphism mentions with unique database identifiers","volume":"12","author":"Thomas","year":"2011","journal-title":"BMC Bioinformatics"},{"key":"2023112802465564500_btad698-B35","first-page":"6710","author":"Tutubalina","year":"2020"},{"key":"2023112802465564500_btad698-B36","doi-asserted-by":"crossref","first-page":"e38460","DOI":"10.1371\/journal.pone.0038460","article-title":"SR4GN: a species recognition software tool for gene normalization","volume":"7","author":"Wei","year":"2012","journal-title":"PLoS One"},{"key":"2023112802465564500_btad698-B37","doi-asserted-by":"crossref","first-page":"918710","DOI":"10.1155\/2015\/918710","article-title":"GNormPlus: an integrative approach for tagging genes, gene families, and protein domains","volume":"2015","author":"Wei","year":"2015","journal-title":"Biomed Res Int"},{"key":"2023112802465564500_btad698-B38","doi-asserted-by":"crossref","first-page":"W587","DOI":"10.1093\/nar\/gkz389","article-title":"PubTator Central: automated concept annotation for biomedical full text articles","volume":"47","author":"Wei","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2023112802465564500_btad698-B39","doi-asserted-by":"crossref","first-page":"4449","DOI":"10.1093\/bioinformatics\/btac537","article-title":"tmVar 3.0: an improved variant concept recognition and normalization 
tool","volume":"38","author":"Wei","year":"2022","journal-title":"Bioinformatics"},{"key":"2023112802465564500_btad698-B40","first-page":"6397","author":"Wu","year":"2020"},{"key":"2023112802465564500_btad698-B41","author":"Yuan","year":"2022"},{"key":"2023112802465564500_btad698-B42","first-page":"868","author":"Zhang","year":"2022"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btad698\/53483107\/btad698.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/11\/btad698\/53863040\/btad698.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/11\/btad698\/53863040\/btad698.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,28]],"date-time":"2023-11-28T02:47:23Z","timestamp":1701139643000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btad698\/7425450"}},"subtitle":[],"editor":[{"given":"Macha","family":"Nikolski","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2023,11,1]]},"references-count":42,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2023,11,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btad698","relation":{},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2023,11,1]]},"published":{"date-parts":[[2023,11,1]]},"article-number":"btad698"}}