{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,26]],"date-time":"2025-09-26T00:19:40Z","timestamp":1758845980376,"version":"3.44.0"},"reference-count":21,"publisher":"Oxford University Press (OUP)","license":[{"start":{"date-parts":[[2025,5,22]],"date-time":"2025-05-22T00:00:00Z","timestamp":1747872000000},"content-version":"vor","delay-in-days":141,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100004663","name":"Ministry of Science and Technology, Taiwan","doi-asserted-by":"publisher","award":["112-2221-E-008-062-MY3"],"award-info":[{"award-number":["112-2221-E-008-062-MY3"]}],"id":[{"id":"10.13039\/501100004663","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,5,22]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>The paper describes our biomedical relation extraction system, which is designed to participate in the BioCreative VIII challenge Track 1: BioRED Track, which emphasizes the relation extraction from biomedical literature. Our system employs an ensemble learning method, leveraging the PubTator API in conjunction with multiple pretrained bidirectional encoder representations from transformer (BERT) models. Various preprocessing inputs are incorporated, encompassing prompt questions, entity ID pairs, and co-occurrence contexts. To enhance model comprehension, special tokens and boundary tags are incorporated. 
Specifically, we utilize PubMedBERT alongside the Max Rule ensemble learning mechanism to amalgamate outputs from diverse classifiers. Our findings surpass the established benchmark score, thereby providing a robust benchmark for evaluating performance in this task. Moreover, our study introduces and demonstrates the effectiveness of a data-centric approach, emphasizing the significance of prioritizing high-quality data instances in enhancing model performance and robustness.<\/jats:p>","DOI":"10.1093\/database\/baae127","type":"journal-article","created":{"date-parts":[[2025,5,22]],"date-time":"2025-05-22T11:35:38Z","timestamp":1747913738000},"source":"Crossref","is-referenced-by-count":0,"title":["Enhancing biomedical relation extraction through data-centric and preprocessing-robust ensemble learning approach"],"prefix":"10.1093","volume":"2025","author":[{"given":"Wilailack","family":"Meesawad","sequence":"first","affiliation":[{"name":"Department of Computer Science and Information Engineering, National Central University , No. 300, Zhongda Rd., Zhongli District, Taoyuan 320,","place":["Taiwan"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jen-Chieh","family":"Han","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Information Engineering, National Central University , No. 300, Zhongda Rd., Zhongli District, Taoyuan 320,","place":["Taiwan"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chun-Yu","family":"Hsueh","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Information Engineering, National Central University , No. 
300, Zhongda Rd., Zhongli District, Taoyuan 320,","place":["Taiwan"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4934-5873","authenticated-orcid":false,"given":"Yu","family":"Zhang","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Information Engineering, National Central University , No. 300, Zhongda Rd., Zhongli District, Taoyuan 320,","place":["Taiwan"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hsi-Chuan","family":"Hung","sequence":"additional","affiliation":[{"name":"Department of Medical Research, Cathay General Hospital , No. 280, Sec. 4, Ren\u2019ai Rd., Da\u2019an Dist., Taipei 106,","place":["Taiwan"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0513-107X","authenticated-orcid":false,"given":"Richard Tzong-Han","family":"Tsai","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Information Engineering, National Central University , No. 300, Zhongda Rd., Zhongli District, Taoyuan 320,","place":["Taiwan"]},{"name":"Center for GIS, Research Center for Humanities and Social Sciences, Academia Sinica , 128 Academia Rd., Sec. 2, Nangang District, Taipei 115,","place":["Taiwan"]},{"name":"Department of Medical Research, Cathay General Hospital , No. 280, Sec. 
4, Ren\u2019ai Rd., Da\u2019an Dist., Taipei 106,","place":["Taiwan"]}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2025,5,22]]},"reference":[{"article-title":"BioCreative - BioCreative VIII Challenge and Workshop","year":"2023","author":"Arighi","key":"2025092510514297000_R1"},{"key":"2025092510514297000_R2","doi-asserted-by":"publisher","DOI":"10.1093\/bib\/bbac282","article-title":"BioRED: a rich biomedical relation extraction dataset","volume":"23","author":"Luo","year":"2022","journal-title":"Brief Bioinform"},{"key":"2025092510514297000_R3","article-title":"Improving language understanding by generative pre-training","volume-title":"OpenAI","author":"Radford","year":"2018"},{"article-title":"Bert: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers)","year":"2019","author":"Devlin","key":"2025092510514297000_R4"},{"key":"2025092510514297000_R5","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3458754","article-title":"Domain-specific language model pretraining for biomedical natural language processing","volume":"3","author":"Gu","year":"2022","journal-title":"ACM Trans Comput Healthcare"},{"key":"2025092510514297000_R6","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.eacl-main.20","article-title":"Exploiting cloze-questions for few-shot text classification and natural language inference","author":"Schick","year":"2021"},{"key":"2025092510514297000_R7","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.findings-emnlp.449","article-title":"Comparing prompt-based and standard fine-tuning for Urdu text classification","author":"Ullah","year":"2023"},{"key":"2025092510514297000_R8","first-page":"4523","article-title":"Exploring data augmentation strategies for hate speech 
detection in Roman Urdu","author":"Azam","year":"2022"},{"key":"2025092510514297000_R9","doi-asserted-by":"publisher","DOI":"10.1016\/j.jbi.2023.104487","article-title":"BioREx: improving biomedical relation extraction by leveraging heterogeneous datasets","volume":"146","author":"Lai","year":"2023","journal-title":"J Biomed Informat"},{"key":"2025092510514297000_R10","article-title":"A review of ensemble learning and data augmentation models for class imbalanced problems: combination, implementation and evaluation","volume":"244","author":"Khan","year":"2023","journal-title":"Expert Syst Appl"},{"key":"2025092510514297000_R11","doi-asserted-by":"publisher","first-page":"139","DOI":"10.1016\/j.artmed.2004.07.016","article-title":"Comparative experiments on learning information extractors for proteins and their interactions","volume":"33","author":"Bunescu","year":"2005","journal-title":"Artif Intell Med"},{"key":"2025092510514297000_R12","first-page":"11","article-title":"Overview of DrugProt BioCreative VII track: quality evaluation and large scale text mining of drug-gene\/protein relations","author":"Miranda","year":"2021"},{"key":"2025092510514297000_R13","doi-asserted-by":"publisher","first-page":"914","DOI":"10.1016\/j.jbi.2013.07.011","article-title":"The DDI corpus: an annotated corpus with pharmacological substances and drug\u2013drug interactions","volume":"46","author":"Herrero-Zazo","year":"2013","journal-title":"J Biomed Informat"},{"key":"2025092510514297000_R14","doi-asserted-by":"publisher","first-page":"365","DOI":"10.1093\/bioinformatics\/btl616","article-title":"RelEx\u2014relation extraction using dependency parse trees","volume":"23","author":"Fundel","year":"2007","journal-title":"Bioinformatics"},{"key":"2025092510514297000_R15","doi-asserted-by":"publisher","DOI":"10.1093\/database\/baw032","article-title":"Assessing the state of the art in biomedical relation extraction: overview of the BioCreative V chemical-disease relation (CDR) 
task","volume":"2016","author":"Wei","year":"2016","journal-title":"Database"},{"key":"2025092510514297000_R16","doi-asserted-by":"publisher","first-page":"408","DOI":"10.1093\/bioinformatics\/btq667","article-title":"Toward an automatic method for extracting cancer- and other disease-related point mutations from the biomedical literature","volume":"27","author":"Doughty","year":"2011","journal-title":"Bioinformatics"},{"key":"2025092510514297000_R17","doi-asserted-by":"crossref","first-page":"311","DOI":"10.1007\/978-1-62703-435-7_20","article-title":"PharmGKB: the pharmacogenomics knowledge base","volume":"1015","author":"Thorn","journal-title":"Pharmacogenomics: Methods and Protocols"},{"key":"2025092510514297000_R18","first-page":"D845","article-title":"The DisGeNET knowledge platform for disease genomics: 2019 update","volume":"48","author":"Pi\u00f1ero","year":"2020","journal-title":"J Nucleic Acids Res"},{"article-title":"Pubtator Central API - NCBI - NLM - NIH","year":"..","author":"U.S. 
National Library of Medicine","key":"2025092510514297000_R19"},{"article-title":"Proceedings of the BioCreative VIII challenge and workshop: curation and evaluation in the era of generative models","year":"2023","author":"Islamaj","key":"2025092510514297000_R20"},{"key":"2025092510514297000_R21","doi-asserted-by":"publisher","DOI":"10.1093\/database\/baae069","article-title":"The overview of the BioRED (Biomedical Relation Extraction Dataset) track at BioCreative VIII","volume":"2024","author":"Islamaj","year":"2024","journal-title":"Database"}],"container-title":["Database"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/database\/article-pdf\/doi\/10.1093\/database\/baae127\/63295037\/baae127.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/database\/article-pdf\/doi\/10.1093\/database\/baae127\/63295037\/baae127.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,25]],"date-time":"2025-09-25T14:51:51Z","timestamp":1758811911000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/database\/article\/doi\/10.1093\/database\/baae127\/8140696"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025]]},"references-count":21,"URL":"https:\/\/doi.org\/10.1093\/database\/baae127","relation":{},"ISSN":["1758-0463"],"issn-type":[{"type":"electronic","value":"1758-0463"}],"subject":[],"published-other":{"date-parts":[[2025]]},"published":{"date-parts":[[2025]]},"article-number":"baae127"}}