{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,7]],"date-time":"2026-02-07T18:45:33Z","timestamp":1770489933291,"version":"3.49.0"},"reference-count":46,"publisher":"Oxford University Press (OUP)","issue":"1","license":[{"start":{"date-parts":[[2019,6,27]],"date-time":"2019-06-27T00:00:00Z","timestamp":1561593600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"name":"Helmholtz Society"},{"DOI":"10.13039\/501100001659","name":"Deutsche Forschungsgemeinschaft","doi-asserted-by":"publisher","award":["LE 1428\/7-1"],"award-info":[{"award-number":["LE 1428\/7-1"]}],"id":[{"id":"10.13039\/501100001659","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001659","name":"Deutsche Forschungsgemeinschaft","doi-asserted-by":"publisher","award":["GRK 1651\/2"],"award-info":[{"award-number":["GRK 1651\/2"]}],"id":[{"id":"10.13039\/501100001659","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100002347","name":"Bundesministerium f\u00fcr Bildung und Forschung","doi-asserted-by":"crossref","award":["031L0023A"],"award-info":[{"award-number":["031L0023A"]}],"id":[{"id":"10.13039\/501100002347","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2020,1,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Motivation<\/jats:title><jats:p>Several recent studies showed that the application of deep neural networks advanced the state-of-the-art in named entity recognition (NER), including biomedical NER. However, the impact on performance and the robustness of improvements crucially depends on the availability of sufficiently large training corpora, which is a problem in the biomedical domain with its often rather small gold standard corpora.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>We evaluate different methods for alleviating the data sparsity problem by pretraining a deep neural network (LSTM-CRF), followed by a rather short fine-tuning phase focusing on a particular corpus. Experiments were performed using 34 different corpora covering five different biomedical entity types, yielding an average increase in F1-score of \u223c2 pp compared to learning without pretraining. We experimented both with supervised and semi-supervised pretraining, leading to interesting insights into the precision\/recall trade-off. Based on our results, we created the stand-alone NER tool HUNER incorporating fully trained models for five entity types. On the independent CRAFT corpus, which was not used for creating HUNER, it outperforms the state-of-the-art tools GNormPlus and tmChem by 5\u201313 pp on the entity types chemicals, species and genes.<\/jats:p><\/jats:sec><jats:sec><jats:title>Availability and implementation<\/jats:title><jats:p>HUNER is freely available at https:\/\/hu-ner.github.io. HUNER comes in containers, making it easy to install and use, and it can be applied off-the-shelf to arbitrary texts. We also provide an integrated tool for obtaining and converting all 34 corpora used in our evaluation, including fixed training, development and test splits to enable fair comparisons in the future.<\/jats:p><\/jats:sec><jats:sec><jats:title>Supplementary information<\/jats:title><jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p><\/jats:sec>","DOI":"10.1093\/bioinformatics\/btz528","type":"journal-article","created":{"date-parts":[[2019,6,25]],"date-time":"2019-06-25T18:40:38Z","timestamp":1561488038000},"page":"295-302","source":"Crossref","is-referenced-by-count":48,"title":["HUNER: improving biomedical NER with pretraining"],"prefix":"10.1093","volume":"36","author":[{"given":"Leon","family":"Weber","sequence":"first","affiliation":[{"name":"Computer Science Department, Humboldt-Universit\u00e4t zu Berlin , Berlin 10099, Germany"}]},{"given":"Jannes","family":"M\u00fcnchmeyer","sequence":"additional","affiliation":[{"name":"Computer Science Department, Humboldt-Universit\u00e4t zu Berlin , Berlin 10099, Germany"},{"name":"Seismology Section, Helmholtzzentrum Potsdam, Deutsches GeoForschungsZentrum GFZ , Potsdam 14473, Germany"}]},{"given":"Tim","family":"Rockt\u00e4schel","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University College London , London WC1E 6BT, UK"}]},{"given":"Maryam","family":"Habibi","sequence":"additional","affiliation":[{"name":"Computer Science Department, Humboldt-Universit\u00e4t zu Berlin , Berlin 10099, Germany"}]},{"given":"Ulf","family":"Leser","sequence":"additional","affiliation":[{"name":"Computer Science Department, Humboldt-Universit\u00e4t zu Berlin , Berlin 10099, Germany"}]}],"member":"286","published-online":{"date-parts":[[2019,6,27]]},"reference":[{"key":"2023013109500781100_btz528-B1","doi-asserted-by":"crossref","first-page":"e107477.","DOI":"10.1371\/journal.pone.0107477","article-title":"Annotated chemical patent corpus: a gold standard for text mining","volume":"9","author":"Akhondi","year":"2014","journal-title":"PLoS One"},{"key":"2023013109500781100_btz528-B2","doi-asserted-by":"crossref","first-page":"161.","DOI":"10.1186\/1471-2105-13-161","article-title":"Concept annotation in the craft corpus","volume":"13","author":"Bada","year":"2012","journal-title":"BMC Bioinformatics"},{"key":"2023013109500781100_btz528-B3","doi-asserted-by":"crossref","first-page":"205","DOI":"10.12688\/f1000research.4591.2","article-title":"Detecting miRNA mentions and relations in biomedical literature","volume":"3","author":"Bagewadi","year":"2014","journal-title":"F1000Research"},{"key":"2023013109500781100_btz528-B4","first-page":"3079","author":"Dai","year":"2015"},{"key":"2023013109500781100_btz528-B5","author":"Devlin","year":"2019"},{"key":"2023013109500781100_btz528-B6","first-page":"326","author":"Ding","year":"2001"},{"key":"2023013109500781100_btz528-B7","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.jbi.2013.12.006","article-title":"NCBI disease corpus: a resource for disease name recognition and concept normalization","volume":"47","author":"Do\u011fan","year":"2014","journal-title":"J. Biomed. Inf"},{"key":"2023013109500781100_btz528-B8","doi-asserted-by":"crossref","first-page":"84","DOI":"10.1186\/1471-2105-9-84","article-title":"Osirisv1.2: a named entity recognition system for sequence variants of genes in biomedical literature","volume":"9","author":"Furlong","year":"2008","journal-title":"BMC Bioinformatics"},{"key":"2023013109500781100_btz528-B9","doi-asserted-by":"crossref","first-page":"85.","DOI":"10.1186\/1471-2105-11-85","article-title":"Linnaeus: a species name identification system for biomedical literature","volume":"11","author":"Gerner","year":"2010","journal-title":"BMC Bioinformatics"},{"key":"2023013109500781100_btz528-B10","doi-asserted-by":"crossref","first-page":"4087","DOI":"10.1093\/bioinformatics\/bty449","article-title":"Transfer learning for biomedical named entity recognition with neural networks","volume":"34","author":"Giorgi","year":"2018","journal-title":"Bioinformatics"},{"key":"2023013109500781100_btz528-B11","first-page":"A4.","author":"Goldberg","year":"2015"},{"key":"2023013109500781100_btz528-B12","author":"Gurulingappa","year":"2010"},{"key":"2023013109500781100_btz528-B13","doi-asserted-by":"crossref","first-page":"i37","DOI":"10.1093\/bioinformatics\/btx228","article-title":"Deep learning with word embeddings improves biomedical named entity recognition","volume":"33","author":"Habibi","year":"2017","journal-title":"Bioinformatics"},{"key":"2023013109500781100_btz528-B14","first-page":"235","author":"Hahn","year":"2010"},{"key":"2023013109500781100_btz528-B15","first-page":"102","author":"Hakala","year":"2016"},{"key":"2023013109500781100_btz528-B16","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long short-term memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Comput"},{"key":"2023013109500781100_btz528-B17","first-page":"328","author":"Howard","year":"2018"},{"key":"2023013109500781100_btz528-B18","author":"Huang","year":"2015"},{"key":"2023013109500781100_btz528-B19","doi-asserted-by":"crossref","first-page":"276","DOI":"10.1093\/bioinformatics\/btv570","article-title":"Cell line name recognition in support of the identification of synthetic lethality in cancer from text","volume":"32","author":"Kaewphan","year":"2016","journal-title":"Bioinformatics"},{"key":"2023013109500781100_btz528-B20","first-page":"2923","author":"Kafkas","year":"2012"},{"key":"2023013109500781100_btz528-B21","first-page":"70","author":"Kim","year":"2004"},{"key":"2023013109500781100_btz528-B22","author":"Kol\u00e1rik","year":"2008"},{"key":"2023013109500781100_btz528-B23","doi-asserted-by":"crossref","first-page":"S1.","DOI":"10.1186\/1758-2946-7-S1-S1","article-title":"CHEMDNER: the drugs and chemical names extraction challenge","volume":"7","author":"Krallinger","year":"2015","journal-title":"J. Cheminf"},{"key":"2023013109500781100_btz528-B24","first-page":"63","author":"Krallinger","year":"2015"},{"key":"2023013109500781100_btz528-B25","first-page":"282","author":"Lafferty","year":"2001"},{"key":"2023013109500781100_btz528-B26","author":"Lample","year":"2016"},{"key":"2023013109500781100_btz528-B27","author":"Leaman","year":"2009"},{"key":"2023013109500781100_btz528-B28","doi-asserted-by":"crossref","first-page":"S3.","DOI":"10.1186\/1758-2946-7-S1-S3","article-title":"tmChem: a high performance approach for chemical named entity recognition and normalization","volume":"7","author":"Leaman","year":"2015","journal-title":"J. Cheminf"},{"key":"2023013109500781100_btz528-B29","first-page":"2016","article-title":"BioCreative V CDR task corpus: a resource for chemical disease relation extraction","author":"Li","year":"2016","journal-title":"Database"},{"key":"2023013109500781100_btz528-B30","first-page":"3111","author":"Mikolov","year":"2013"},{"key":"2023013109500781100_btz528-B31","author":"Min","year":"2017"},{"key":"2023013109500781100_btz528-B32","first-page":"16","author":"Neves","year":"2012"},{"key":"2023013109500781100_btz528-B33","doi-asserted-by":"crossref","first-page":"e65390","DOI":"10.1371\/journal.pone.0065390","article-title":"The species and organisms resources for fast and accurate identification of taxonomic names in text","volume":"8","author":"Pafilis","year":"2013","journal-title":"PLoS One"},{"key":"2023013109500781100_btz528-B34","doi-asserted-by":"crossref","first-page":"1345","DOI":"10.1109\/TKDE.2009.191","article-title":"A survey on transfer learning","volume":"22","author":"Pan","year":"2010","journal-title":"IEEE Trans. Knowledge Data Eng"},{"key":"2023013109500781100_btz528-B35","author":"Peters","year":"2018"},{"key":"2023013109500781100_btz528-B36","doi-asserted-by":"crossref","first-page":"50.","DOI":"10.1186\/1471-2105-8-50","article-title":"Bioinfer: a corpus for information extraction in the biomedical domain","volume":"8","author":"Pyysalo","year":"2007","journal-title":"BMC Bioinformatics"},{"key":"2023013109500781100_btz528-B37","first-page":"39","author":"Pyysalo","year":"2013"},{"key":"2023013109500781100_btz528-B38","author":"Ramachandran","year":"2017"},{"key":"2023013109500781100_btz528-B39","doi-asserted-by":"crossref","first-page":"S2.","DOI":"10.1186\/gb-2008-9-s2-s2","article-title":"Overview of BioCreative II gene mention recognition","volume":"9","author":"Smith","year":"2008","journal-title":"Genome Biol"},{"key":"2023013109500781100_btz528-B40","doi-asserted-by":"crossref","first-page":"W585","DOI":"10.1093\/nar\/gks563","article-title":"Geneview: a comprehensive semantic search engine for pubmed","volume":"40","author":"Thomas","year":"2012","journal-title":"Nucleic Acids Res"},{"key":"2023013109500781100_btz528-B41","doi-asserted-by":"crossref","first-page":"e1000837.","DOI":"10.1371\/journal.pcbi.1000837","article-title":"A comprehensive benchmark of kernel methods to extract protein\u2013protein interactions from literature","volume":"6","author":"Tikk","year":"2010","journal-title":"PLoS Comput. Biol"},{"key":"2023013109500781100_btz528-B42","first-page":"142","volume-title":"Proceedings of CoNLL-2003","author":"Tjong Kim Sang","year":"2003"},{"key":"2023013109500781100_btz528-B43","doi-asserted-by":"crossref","DOI":"10.1093\/database\/bat019","article-title":"Annotating the biomedical literature for the human variome","volume":"2013","author":"Verspoor","year":"2013","journal-title":"Database"},{"key":"2023013109500781100_btz528-B44","doi-asserted-by":"crossref","first-page":"661","DOI":"10.1093\/bioinformatics\/btq002","article-title":"Disambiguating the species of biomedical named entities using natural language parsers","volume":"26","author":"Wang","year":"2010","journal-title":"Bioinformatics"},{"key":"2023013109500781100_btz528-B45","doi-asserted-by":"crossref","first-page":"W518","DOI":"10.1093\/nar\/gkt441","article-title":"PubTator: a web-based text mining tool for assisting biocuration","volume":"41","author":"Wei","year":"2013","journal-title":"Nucleic Acids Res"},{"key":"2023013109500781100_btz528-B46","first-page":"1","article-title":"GNormPlus: an integrative approach for tagging genes, gene families, and protein domains","volume":"2015","author":"Wei","year":"2015","journal-title":"BioMed Res. Int"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btz528\/29192377\/btz528.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/1\/295\/48981302\/bioinformatics_36_1_295.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/1\/295\/48981302\/bioinformatics_36_1_295.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,9,17]],"date-time":"2023-09-17T19:14:48Z","timestamp":1694978088000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/36\/1\/295\/5523847"}},"subtitle":[],"editor":[{"given":"Jonathan","family":"Wren","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2019,6,27]]},"references-count":46,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2020,1,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btz528","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2020,1,1]]},"published":{"date-parts":[[2019,6,27]]}}}