{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,24]],"date-time":"2026-03-24T04:44:00Z","timestamp":1774327440251,"version":"3.50.1"},"reference-count":66,"publisher":"Oxford University Press (OUP)","license":[{"start":{"date-parts":[[2024,8,28]],"date-time":"2024-08-28T00:00:00Z","timestamp":1724803200000},"content-version":"vor","delay-in-days":240,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,8,28]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Biomedical relation extraction from scientific publications is a key task in biomedical natural language processing (NLP) and can facilitate the creation of large knowledge bases, enable more efficient knowledge discovery, and accelerate evidence synthesis. In this paper, building upon our previous effort in the BioCreative VIII BioRED Track, we propose an enhanced end-to-end pipeline approach for biomedical relation extraction (RE) and novelty detection (ND) that effectively leverages existing datasets and integrates state-of-the-art deep learning methods. Our pipeline consists of four tasks performed sequentially: named entity recognition (NER), entity linking (EL), RE, and ND. We trained models using the BioRED benchmark corpus that was the basis of the shared task. We explored several methods for each task and combinations thereof: for NER, we compared a BERT-based sequence labeling model that uses the BIO scheme with a span classification model. For EL, we trained a convolutional neural network model for diseases and chemicals and used an existing tool, PubTator 3.0, for mapping other entity types. For RE and ND, we adapted the BERT-based, sentence-bound PURE model to bidirectional and document-level extraction. We also performed extensive hyperparameter tuning to improve model performance. We obtained our best performance using BERT-based models for NER, RE, and ND, and the hybrid approach for EL. Our enhanced and optimized pipeline showed substantial improvement compared to our shared task submission, NER: 93.53 (+3.09), EL: 83.87 (+9.73), RE: 46.18 (+15.67), and ND: 38.86 (+14.9). While the performances of the NER and EL models are reasonably high, RE and ND tasks remain challenging at the document level. Further enhancements to the dataset could enable more accurate and useful models for practical use. We provide our models and code at https:\/\/github.com\/janinaj\/e2eBioMedRE\/.<\/jats:p>\n               <jats:p>Database URL: https:\/\/github.com\/janinaj\/e2eBioMedRE\/<\/jats:p>","DOI":"10.1093\/database\/baae079","type":"journal-article","created":{"date-parts":[[2024,8,28]],"date-time":"2024-08-28T20:19:20Z","timestamp":1724876360000},"source":"Crossref","is-referenced-by-count":4,"title":["Integrating deep learning architectures for enhanced biomedical relation extraction: a pipeline approach"],"prefix":"10.1093","volume":"2024","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0991-4429","authenticated-orcid":false,"given":"M Janina","family":"Sarol","sequence":"first","affiliation":[{"name":"Informatics Programs, University of Illinois Urbana-Champaign , 614 E Daniel Street, Champaign, IL 61820, United States"}]},{"given":"Gibong","family":"Hong","sequence":"additional","affiliation":[{"name":"School of Information Sciences, University of Illinois Urbana-Champaign , 501 E Daniel Street, Champaign, IL 61820, United States"}]},{"given":"Evan","family":"Guerra","sequence":"additional","affiliation":[{"name":"School of Information Sciences, University of Illinois Urbana-Champaign , 501 E Daniel Street, Champaign, IL 61820, United States"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3987-9393","authenticated-orcid":false,"given":"Halil","family":"Kilicoglu","sequence":"additional","affiliation":[{"name":"School of Information Sciences, University of Illinois Urbana-Champaign , 501 E Daniel Street, Champaign, IL 61820, United States"}]}],"member":"286","published-online":{"date-parts":[[2024,8,28]]},"reference":[{"key":"2024082908253601200_R1","doi-asserted-by":"crossref","first-page":"119","DOI":"10.1038\/nrg1768","article-title":"Literature mining for the biologist: from information retrieval to biological discovery","volume":"7","author":"Jensen","year":"2006","journal-title":"Nat Rev Genet"},{"key":"2024082908253601200_R2","doi-asserted-by":"crossref","DOI":"10.1093\/bib\/bbaa057","article-title":"Recent advances in biomedical literature mining","volume":"22","author":"Zhao","year":"2021","journal-title":"Briefings Bioinf"},{"key":"2024082908253601200_R3","doi-asserted-by":"crossref","DOI":"10.1186\/s13326-022-00280-6","article-title":"We are not ready yet: limitations of state-of-the-art disease named entity recognizers","volume":"13","author":"K\u00fchnel","year":"2022","journal-title":"J Biomed Semant"},{"key":"2024082908253601200_R4","doi-asserted-by":"crossref","DOI":"10.1186\/1471-2105-6-S1-S1","article-title":"Overview of BioCreAtIvE: critical assessment of information extraction for biology","volume":"6","author":"Hirschman","year":"2005","journal-title":"BMC Bioinf"},{"key":"2024082908253601200_R5","doi-asserted-by":"crossref","DOI":"10.1093\/database\/baac069","article-title":"Multi-label classification for biomedical literature: an overview of the BioCreative VII LitCovid Track for COVID-19 literature topic annotations","volume":"2022","author":"Chen","year":"2022","journal-title":"Database"},{"key":"2024082908253601200_R6","first-page":"1","article-title":"Overview of BioNLP\u201909 shared task on event extraction","author":"Kim","year":"2009"},{"key":"2024082908253601200_R7","first-page":"1","article-title":"Overview of BioNLP shared task 2013","author":"N\u00e9dellec","year":"2013"},{"key":"2024082908253601200_R8","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/gb-2008-9-s2-s2","article-title":"Overview of BioCreative II gene mention recognition","volume":"9","author":"Smith","year":"2008","journal-title":"Genome Biol"},{"key":"2024082908253601200_R9","doi-asserted-by":"crossref","DOI":"10.1093\/database\/baw032","article-title":"Assessing the state of the art in biomedical relation extraction: overview of the BioCreative V chemical-disease relation (CDR) task","volume":"2016","author":"Wei","year":"2016","journal-title":"Database"},{"key":"2024082908253601200_R10","doi-asserted-by":"crossref","DOI":"10.1093\/bib\/bbac282","article-title":"BioRED: a rich biomedical relation extraction dataset","volume":"23","author":"Luo","year":"2022","journal-title":"Briefings Bioinf"},{"key":"2024082908253601200_R11","doi-asserted-by":"crossref","first-page":"W518","DOI":"10.1093\/nar\/gkt441","article-title":"PubTator: a web-based text mining tool for assisting biocuration","volume":"41","author":"Wei","year":"2013","journal-title":"Nucleic Acids Res"},{"key":"2024082908253601200_R12","doi-asserted-by":"crossref","first-page":"777","DOI":"10.1007\/s40264-014-0218-z","article-title":"Text mining for adverse drug events: the promise, challenges, and state of the art","volume":"37","author":"Harpaz","year":"2014","journal-title":"Drug Safety"},{"key":"2024082908253601200_R13","doi-asserted-by":"crossref","first-page":"20","DOI":"10.1016\/j.jbi.2017.08.011","article-title":"Literature based discovery: models, methods, and trends","volume":"74","author":"Henry","year":"2017","journal-title":"J Biomed Informat"},{"key":"2024082908253601200_R14","article-title":"UIUC-BioNLP @ BioCreative VIII BioRED Track","author":"Sarol","year":"2023"},{"key":"2024082908253601200_R15","doi-asserted-by":"crossref","first-page":"1631","DOI":"10.18653\/v1\/2021.findings-emnlp.140","article-title":"BERT might be overkill: a tiny but effective biomedical entity linker based on residual convolutional neural networks","volume-title":"Findings of the Association for Computational Linguistics: EMNLP 2021","author":"Lai","year":"2021"},{"key":"2024082908253601200_R16","doi-asserted-by":"crossref","first-page":"W540","DOI":"10.1093\/nar\/gkae235","article-title":"PubTator 3.0: an AI-powered literature resource for unlocking biomedical knowledge","volume":"52","author":"Wei","year":"2024","journal-title":"Nucleic Acids Res"},{"key":"2024082908253601200_R17","first-page":"50","article-title":"A frustratingly easy approach for entity and relation extraction","author":"Zhong","year":"2021"},{"key":"2024082908253601200_R18","doi-asserted-by":"crossref","first-page":"D267","DOI":"10.1093\/nar\/gkh061","article-title":"The unified medical language system (UMLS): integrating biomedical terminology","volume":"32","author":"Bodenreider","year":"2004","journal-title":"Nucleic Acids Res"},{"key":"2024082908253601200_R19","doi-asserted-by":"crossref","first-page":"229","DOI":"10.1136\/jamia.2009.002733","article-title":"An overview of MetaMap: historical perspective and recent advances","volume":"17","author":"Aronson","year":"2010","journal-title":"J Am Med Inf Assoc"},{"key":"2024082908253601200_R20","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12859-020-3517-7","article-title":"Broad-coverage biomedical relation extraction with SemRep","volume":"21","author":"Kilicoglu","year":"2020","journal-title":"BMC Bioinf"},{"key":"2024082908253601200_R21","doi-asserted-by":"crossref","first-page":"1234","DOI":"10.1093\/bioinformatics\/btz682","article-title":"BioBERT: a pre-trained biomedical language representation model for biomedical text mining","volume":"36","author":"Lee","year":"2020","journal-title":"Bioinformatics"},{"key":"2024082908253601200_R22","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3458754","article-title":"Domain-specific language model pretraining for biomedical natural language processing","volume":"3","author":"Gu","year":"2021","journal-title":"ACM Trans Comput Healthc"},{"key":"2024082908253601200_R23","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.jbi.2013.12.006","article-title":"NCBI disease corpus: a resource for disease name recognition and concept normalization","volume":"47","author":"Do\u011fan","year":"2014","journal-title":"J Biomed Informat"},{"key":"2024082908253601200_R24","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/1758-2946-7-S1-S1","article-title":"CHEMDNER: the drugs and chemical names extraction challenge","volume":"7","author":"Krallinger","year":"2015","journal-title":"J Cheminf"},{"key":"2024082908253601200_R25","article-title":"BioCreative V CDR task corpus: a resource for chemical disease relation extraction","volume":"2016","author":"Li","year":"2016","journal-title":"Database"},{"key":"2024082908253601200_R26","doi-asserted-by":"crossref","first-page":"914","DOI":"10.1016\/j.jbi.2013.07.011","article-title":"The DDI corpus: an annotated corpus with pharmacological substances and drug\u2013drug interactions","volume":"46","author":"Herrero-Zazo","year":"2013","journal-title":"J Biomed Informat"},{"key":"2024082908253601200_R27","doi-asserted-by":"crossref","first-page":"4497","DOI":"10.18653\/v1\/2022.findings-emnlp.329","article-title":"Thinking about GPT-3 in-context learning for biomedical IE? Think again","volume-title":"Findings of the Association for Computational Linguistics: EMNLP 2022","author":"Jimenez Gutierrez","year":"2022"},{"key":"2024082908253601200_R28","article-title":"Large language models in biomedical natural language processing: benchmarks, baselines, and recommendations","author":"Chen","year":"2023","journal-title":"arXiv"},{"key":"2024082908253601200_R29","first-page":"15566","article-title":"Revisiting relation extraction in the era of large language models","author":"Wadhwa","year":"2023"},{"key":"2024082908253601200_R30","first-page":"5784","article-title":"Entity, relation, and event extraction with contextualized span representations","author":"Wadden","year":"2019"},{"key":"2024082908253601200_R31","first-page":"8003","article-title":"LinkBERT: pretraining language models with document links","author":"Yasunaga","year":"2022"},{"key":"2024082908253601200_R32","doi-asserted-by":"crossref","DOI":"10.1093\/bioinformatics\/btad310","article-title":"AIONER: all-in-one scheme-based biomedical named entity recognition using deep learning","volume":"39","author":"Luo","year":"2023","journal-title":"Bioinformatics"},{"key":"2024082908253601200_R33","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12859-017-1776-8","article-title":"A neural network multi-task learning approach to biomedical named entity recognition","volume":"18","author":"Crichton","year":"2017","journal-title":"BMC Bioinf"},{"key":"2024082908253601200_R34","first-page":"141","article-title":"Overview of the BioCreative VI chemical-protein interaction track","author":"Krallinger","year":"2017"},{"key":"2024082908253601200_R35","doi-asserted-by":"crossref","DOI":"10.1016\/j.jbi.2023.104487","article-title":"BioREx: improving biomedical relation extraction by leveraging heterogeneous datasets","volume":"146","author":"Lai","year":"2023","journal-title":"J Biomed Informat"},{"key":"2024082908253601200_R36","first-page":"2006","article-title":"Span-based joint entity and relation extraction with transformer pre-training","author":"Eberts","year":"2020"},{"key":"2024082908253601200_R37","doi-asserted-by":"crossref","DOI":"10.1016\/j.jbi.2021.103968","article-title":"An attentive joint model with transformer-based weighted graph convolutional network for extracting adverse drug event relation","volume":"125","author":"El-Allaly","year":"2022","journal-title":"J Biomed Informat"},{"key":"2024082908253601200_R38","doi-asserted-by":"crossref","DOI":"10.1016\/j.jbi.2022.104252","article-title":"An overview of biomedical entity linking throughout the years","volume":"137","author":"French","year":"2023","journal-title":"J Biomed Informat"},{"key":"2024082908253601200_R39","first-page":"297","article-title":"Sieve-based entity linking for the biomedical domain","author":"D\u2019Souza","year":"2015"},{"key":"2024082908253601200_R40","first-page":"568","article-title":"Towards a semantic lexicon for clinical natural language processing","volume":"2012","author":"Liu","year":"2012","journal-title":"AMIA Annu Symp Proc"},{"key":"2024082908253601200_R41","doi-asserted-by":"crossref","first-page":"2909","DOI":"10.1093\/bioinformatics\/btt474","article-title":"DNorm: disease name normalization with pairwise learning to rank","volume":"29","author":"Leaman","year":"2013","journal-title":"Bioinformatics"},{"key":"2024082908253601200_R42","doi-asserted-by":"crossref","first-page":"79","DOI":"10.1186\/s12859-017-1805-7","article-title":"CNN-based ranking for biomedical entity normalization","volume":"18","author":"Li","year":"2017","journal-title":"BMC Bioinf"},{"key":"2024082908253601200_R43","first-page":"3275","article-title":"Robust representation learning of biomedical names","author":"Phan","year":"2019"},{"key":"2024082908253601200_R44","first-page":"3641","article-title":"Biomedical entity representations with synonym marginalization","author":"Sung","year":"2020"},{"key":"2024082908253601200_R45","first-page":"4228","article-title":"Self-alignment pretraining for biomedical entity representations","author":"Liu","year":"2021"},{"key":"2024082908253601200_R46","doi-asserted-by":"crossref","first-page":"W587","DOI":"10.1093\/nar\/gkz389","article-title":"PubTator central: automated concept annotation for biomedical full text articles","volume":"47","author":"Wei","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2024082908253601200_R47","article-title":"The biomedical relationship corpus of the BioRED track at the BioCreative VIII challenge and workshop","volume-title":"Zenodo","author":"Islamaj","year":"2023"},{"key":"2024082908253601200_R48","doi-asserted-by":"crossref","DOI":"10.1155\/2015\/918710","article-title":"GNormPlus: an integrative approach for tagging genes, gene families, and protein domains","volume":"2015","author":"Wei","year":"2015","journal-title":"Biomed Res Int"},{"key":"2024082908253601200_R49","doi-asserted-by":"crossref","DOI":"10.1038\/s41597-021-00875-1","article-title":"NLM-Chem, a new resource for chemical entity recognition in PubMed full text literature","volume":"8","author":"Islamaj","year":"2021","journal-title":"Sci Data"},{"key":"2024082908253601200_R50","doi-asserted-by":"crossref","DOI":"10.1371\/journal.pone.0065390","article-title":"The species and organisms resources for fast and accurate identification of taxonomic names in text","volume":"8","author":"Pafilis","year":"2013","journal-title":"PLoS One"},{"key":"2024082908253601200_R51","article-title":"Bio-ID track overview","author":"Arighi","year":"2017"},{"key":"2024082908253601200_R52","doi-asserted-by":"crossref","first-page":"4449","DOI":"10.1093\/bioinformatics\/btac537","article-title":"tmVar 3.0: an improved variant concept recognition and normalization tool","volume":"38","author":"Wei","year":"2022","journal-title":"Bioinformatics"},{"key":"2024082908253601200_R53","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/1471-2105-11-85","article-title":"Linnaeus: a species name identification system for biomedical literature","volume":"11","author":"Gerner","year":"2010","journal-title":"BMC Bioinf"},{"key":"2024082908253601200_R54","doi-asserted-by":"crossref","first-page":"4837","DOI":"10.1093\/bioinformatics\/btac598","article-title":"BERN2: an advanced neural biomedical named entity recognition and normalization tool","volume":"38","author":"Sung","year":"2022","journal-title":"Bioinformatics"},{"key":"2024082908253601200_R55","first-page":"1746","article-title":"Convolutional neural networks for sentence classification","author":"Kim","year":"2014"},{"key":"2024082908253601200_R56","doi-asserted-by":"crossref","DOI":"10.1093\/database\/bar065","article-title":"MEDIC: a practical disease vocabulary used at the comparative toxicogenomics database","volume":"2012","author":"Davis","year":"2012","journal-title":"Database"},{"key":"2024082908253601200_R57","doi-asserted-by":"crossref","first-page":"D1257","DOI":"10.1093\/nar\/gkac833","article-title":"Comparative toxicogenomics database (CTD): update 2023","volume":"51","author":"Davis","year":"2023","journal-title":"Nucleic Acids Res"},{"key":"2024082908253601200_R58","doi-asserted-by":"crossref","DOI":"10.1093\/bioinformatics\/btad599","article-title":"GNorm2: an improved gene name recognition and normalization system","volume":"39","author":"Wei","year":"2023","journal-title":"Bioinformatics"},{"key":"2024082908253601200_R59","doi-asserted-by":"crossref","first-page":"2839","DOI":"10.1093\/bioinformatics\/btw343","article-title":"TaggerOne: joint named entity recognition and normalization with semi-Markov models","volume":"32","author":"Leaman","year":"2016","journal-title":"Bioinformatics"},{"key":"2024082908253601200_R60","first-page":"2","article-title":"Towards deep learning models resistant to adversarial attacks","author":"Madry","year":"2018"},{"key":"2024082908253601200_R61","first-page":"3693","article-title":"Document-level n-ary relation extraction with multiscale representation learning","author":"Jia","year":"2019"},{"key":"2024082908253601200_R62","doi-asserted-by":"crossref","first-page":"D36","DOI":"10.1093\/nar\/gku1055","article-title":"Gene: a gene-centered information resource at NCBI","volume":"43","author":"Brown","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2024082908253601200_R63","article-title":"Medical subject headings (MeSH)","volume":"88","author":"Lipscomb","year":"2000","journal-title":"Bulletin Med Libr Assoc"},{"key":"2024082908253601200_R64","doi-asserted-by":"crossref","first-page":"D136","DOI":"10.1093\/nar\/gkr1178","article-title":"The NCBI taxonomy database","volume":"40","author":"Federhen","year":"2012","journal-title":"Nucleic Acids Res"},{"key":"2024082908253601200_R65","doi-asserted-by":"crossref","first-page":"308","DOI":"10.1093\/nar\/29.1.308","article-title":"dbSNP: the NCBI database of genetic variation","volume":"29","author":"Sherry","year":"2001","journal-title":"Nucleic Acids Res"},{"key":"2024082908253601200_R66","doi-asserted-by":"crossref","DOI":"10.7171\/jbt.18-2902-002","article-title":"The Cellosaurus, a cell-line knowledge resource","volume":"29","author":"Bairoch","year":"2018","journal-title":"J Biomol Techniques: JBT"}],"container-title":["Database"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/database\/article-pdf\/doi\/10.1093\/database\/baae079\/58961353\/baae079.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/database\/article-pdf\/doi\/10.1093\/database\/baae079\/58961353\/baae079.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,8,29]],"date-time":"2024-08-29T12:10:54Z","timestamp":1724933454000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/database\/article\/doi\/10.1093\/database\/baae079\/7743272"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024]]},"references-count":66,"URL":"https:\/\/doi.org\/10.1093\/database\/baae079","relation":{},"ISSN":["1758-0463"],"issn-type":[{"value":"1758-0463","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024]]},"published":{"date-parts":[[2024]]},"article-number":"baae079"}}