{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,13]],"date-time":"2026-02-13T08:47:43Z","timestamp":1770972463955,"version":"3.50.1"},"reference-count":43,"publisher":"Oxford University Press (OUP)","license":[{"start":{"date-parts":[[2022,7,15]],"date-time":"2022-07-15T00:00:00Z","timestamp":1657843200000},"content-version":"vor","delay-in-days":195,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100004663","name":"Ministry of Science and Technology, Taiwan","doi-asserted-by":"publisher","award":["MOST 109-2410-H-038-012-MY2"],"award-info":[{"award-number":["MOST 109-2410-H-038-012-MY2"]}],"id":[{"id":"10.13039\/501100004663","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,7,15]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>In this research, we explore various state-of-the-art biomedical-specific pre-trained Bidirectional Encoder Representations from Transformers (BERT) models for the National Library of Medicine - Chemistry (NLM-CHEM) and LitCovid tracks in the BioCreative VII Challenge, and propose a BERT-based ensemble learning approach to integrate the advantages of various models to improve the system\u2019s performance. The experimental results of the NLM-CHEM track demonstrate that our method can achieve remarkable performance, with F1-scores of 85% and 91.8% in strict and approximate evaluations, respectively. Moreover, the proposed Medical Subject Headings identifier (MeSH ID) normalization algorithm is effective in entity normalization, achieving an F1-score of about 80% in both strict and approximate evaluations. 
For the LitCovid track, the proposed method is also effective in detecting topics in the Coronavirus disease 2019 (COVID-19) literature, outperforming the compared methods and achieving state-of-the-art performance on the LitCovid corpus.<\/jats:p><jats:p>Database URL: https:\/\/www.ncbi.nlm.nih.gov\/research\/coronavirus\/.<\/jats:p>","DOI":"10.1093\/database\/baac056","type":"journal-article","created":{"date-parts":[[2022,7,18]],"date-time":"2022-07-18T14:08:58Z","timestamp":1658153338000},"source":"Crossref","is-referenced-by-count":8,"title":["A BERT-based ensemble learning approach for the BioCreative VII challenges: full-text chemical identification and multi-label classification in PubMed articles"],"prefix":"10.1093","volume":"2022","author":[{"given":"Sheng-Jie","family":"Lin","sequence":"first","affiliation":[{"name":"Graduate Institute of Data Science, Taipei Medical University, No. 172-1, Section 2, Keelung Rd, D\u00e1an District , Taipei City 106, Taiwan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5227-0120","authenticated-orcid":false,"given":"Wen-Chao","family":"Yeh","sequence":"additional","affiliation":[{"name":"Institute of Information Systems and Applications, National Tsing Hua University, No. 101, Section 2, Guangfu Rd, East District , Hsinchu City 300, Taiwan"}]},{"given":"Yu-Wen","family":"Chiu","sequence":"additional","affiliation":[{"name":"Graduate Institute of Data Science, Taipei Medical University, No. 172-1, Section 2, Keelung Rd, D\u00e1an District , Taipei City 106, Taiwan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9634-8380","authenticated-orcid":false,"given":"Yung-Chun","family":"Chang","sequence":"additional","affiliation":[{"name":"Graduate Institute of Data Science, Taipei Medical University, No. 172-1, Section 2, Keelung Rd, D\u00e1an District , Taipei City 106, Taiwan"},{"name":"Clinical Big Data Research Center, Taipei Medical University Hospital, No. 
172-1, Section 2, Keelung Rd, D\u00e1an District , Taipei City 106, Taiwan"},{"name":"Pervasive AI Research Labs, Ministry of Science and Technology, No. 1001, Daxue Rd, East District , Hsinchu City 300, Taiwan"}]},{"given":"Min-Huei","family":"Hsu","sequence":"additional","affiliation":[{"name":"Graduate Institute of Data Science, Taipei Medical University, No. 172-1, Section 2, Keelung Rd, D\u00e1an District , Taipei City 106, Taiwan"}]},{"given":"Yi-Shin","family":"Chen","sequence":"additional","affiliation":[{"name":"Institute of Information Systems and Applications, National Tsing Hua University, No. 101, Section 2, Guangfu Rd, East District , Hsinchu City 300, Taiwan"}]},{"given":"Wen-Lian","family":"Hsu","sequence":"additional","affiliation":[{"name":"Pervasive AI Research Labs, Ministry of Science and Technology, No. 1001, Daxue Rd, East District , Hsinchu City 300, Taiwan"},{"name":"Department of Computer Science and Information Engineering, Asia University, No. 500, Liufeng Rd, Wufeng District , Taichung City 413, Taiwan"}]}],"member":"286","published-online":{"date-parts":[[2022,7,15]]},"reference":[{"key":"2022071814082855900_R1","article-title":"The ai index 2021 annual report","volume-title":"arXiv preprint arXiv:2103.06312","author":"Zhang","year":"2021"},{"key":"2022071814082855900_R2","doi-asserted-by":"crossref","first-page":"385","DOI":"10.1007\/978-1-4614-3223-4_12","volume-title":"Mining Text Data","author":"Hu","year":"2012"},{"key":"2022071814082855900_R3","first-page":"65","article-title":"Text mining: the state of the art and the challenges","author":"Tan","year":"1999"},{"key":"2022071814082855900_R4","volume-title":"Foundations of Statistical Natural Language Processing","author":"Manning","year":"1999"},{"key":"2022071814082855900_R5","article-title":"Natural language processing advancements by deep learning: a 
survey","author":"Torfi","year":"2020"},{"key":"2022071814082855900_R6","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3434237","article-title":"A comprehensive survey on word representation models: from classical to state-of-the-art word representation language models","volume":"20","author":"Naseem","year":"2021","journal-title":"Transactions on Asian and Low-Resource Language Information Processing"},{"key":"2022071814082855900_R7","doi-asserted-by":"crossref","DOI":"10.7554\/eLife.28801","article-title":"Cutting edge: towards PubMed 2.0","volume":"6","author":"Fiorini","year":"2017","journal-title":"Elife"},{"key":"2022071814082855900_R8","first-page":"76","article-title":"A comparison between named entity recognition models in the biomedical domain","author":"Cariello","year":"2021"},{"key":"2022071814082855900_R9","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13321-018-0313-8","article-title":"Chemlistem: chemical named entity recognition using recurrent neural networks","volume":"10","author":"Corbett","year":"2018","journal-title":"J. 
Cheminform."},{"key":"2022071814082855900_R10","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12859-020-3393-1","article-title":"DTranNER: biomedical named entity recognition with deep learning-based label-label transition model","volume":"21","author":"Hong","year":"2020","journal-title":"BMC Bioinform."},{"key":"2022071814082855900_R11","doi-asserted-by":"crossref","DOI":"10.1093\/database\/baw101","article-title":"PIPE: a protein\u2013protein interaction passage extraction module for BioCreative challenge","volume":"2016","author":"Chang","year":"2016","journal-title":"Database"},{"key":"2022071814082855900_R12","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12859-019-2884-4","article-title":"Chemical-induced disease relation extraction via attention-based distant supervision","volume":"20","author":"Gu","year":"2019","journal-title":"BMC Bioinform."},{"key":"2022071814082855900_R13","doi-asserted-by":"crossref","DOI":"10.1093\/database\/baw032","article-title":"Assessing the state of the art in biomedical relation extraction: overview of the BioCreative V chemical-disease relation (CDR) task","volume":"2016","author":"Wei","year":"2016","journal-title":"Database"},{"key":"2022071814082855900_R14","article-title":"BioCreative V CDR task corpus: a resource for chemical disease relation extraction","volume":"2016","author":"Li","year":"2016","journal-title":"Database"},{"key":"2022071814082855900_R15","doi-asserted-by":"crossref","DOI":"10.1093\/database\/baw048","article-title":"Exploiting syntactic and semantics information for chemical\u2013disease relation extraction","volume":"2016","author":"Zhou","year":"2016","journal-title":"Database"},{"key":"2022071814082855900_R16","doi-asserted-by":"crossref","DOI":"10.1093\/database\/bax024","article-title":"Chemical-induced disease relation extraction via convolutional neural 
network","volume":"2017","author":"Gu","year":"2017","journal-title":"Database"},{"key":"2022071814082855900_R17","first-page":"221","article-title":"BioM-transformers: building large biomedical language models with BERT, ALBERT and ELECTRA","author":"Alrowili","year":"2021"},{"key":"2022071814082855900_R18","article-title":"Electra: pre-training text encoders as discriminators rather than generators","author":"Clark","year":"2020"},{"key":"2022071814082855900_R19","doi-asserted-by":"crossref","DOI":"10.2196\/19276","article-title":"Mining physicians\u2019 opinions on social media to obtain insights into COVID-19: mixed methods analysis","volume":"6","author":"Wahbeh","year":"2020","journal-title":"JMIR Public Health Surveillance"},{"key":"2022071814082855900_R20","article-title":"Modeling spatiotemporal pattern of depressive symptoms caused by COVID-19 using social media data mining","volume":"17","author":"Li","year":"2020","journal-title":"Int. J. Environ. Res. Public Health"},{"key":"2022071814082855900_R21","doi-asserted-by":"crossref","first-page":"D1534","DOI":"10.1093\/nar\/gkaa952","article-title":"LitCovid: an open database of COVID-19 literature","volume":"49","author":"Chen","year":"2021","journal-title":"Nucleic Acids Res."},{"key":"2022071814082855900_R22","article-title":"Google\u2019s neural machine translation system: bridging the gap between human and machine translation","author":"Wu","year":"2016"},{"key":"2022071814082855900_R23","first-page":"6000","article-title":"Attention is all you need","volume":"30","author":"Vaswani","year":"2017","journal-title":"Adv. Neural Inf. 
Process Syst."},{"key":"2022071814082855900_R24","article-title":"Guaranteed bounds on the Kullback-Leibler divergence of univariate mixtures using piecewise log-sum-exp inequalities","volume":"18","author":"Nielsen","year":"2016","journal-title":"CoRR"},{"key":"2022071814082855900_R25","first-page":"766","article-title":"Evaluating pretrained transformer-based models for COVID-19 fake news detection","author":"Hande","year":"2021"},{"key":"2022071814082855900_R26","first-page":"3265","article-title":"Improving Tuberculosis (TB) Prediction using Synthetically Generated Computed Tomography (CT) Images","author":"Lewis","year":"2021"},{"key":"2022071814082855900_R27","first-page":"1034","article-title":"Dgc-net: Dense geometric correspondence network","author":"Melekhov","year":"2019"},{"key":"2022071814082855900_R28","doi-asserted-by":"crossref","first-page":"1234","DOI":"10.1093\/bioinformatics\/btz682","article-title":"BioBERT: a pre-trained biomedical language representation model for biomedical text mining","volume":"36","author":"Lee","year":"2020","journal-title":"Bioinformatics"},{"key":"2022071814082855900_R29","first-page":"1","article-title":"Domain-specific language model pretraining for biomedical natural language processing","volume":"3","author":"Gu","year":"2021","journal-title":"ACM Trans. Comput. 
Healthcare (HEALTH)"},{"key":"2022071814082855900_R30","first-page":"4171","article-title":"BERT: pre-training of deep bidirectional transformers for language understanding","author":"Devlin","year":"2019"},{"key":"2022071814082855900_R31","article-title":"Albert: A lite bert for self-supervised learning of language representations","author":"Lan","year":"2019"},{"key":"2022071814082855900_R32","article-title":"Overview of the BioCreative VII LitCovid track: multi-label topic classification for COVID-19 literature annotation","author":"Chen","year":"2021"},{"key":"2022071814082855900_R33","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/gb-2008-9-s2-s2","article-title":"Overview of BioCreative II gene mention recognition","volume":"9","author":"Smith","year":"2008","journal-title":"Genome Biol."},{"key":"2022071814082855900_R34","first-page":"707","article-title":"Binary codes capable of correcting deletions, insertions, and reversals","volume":"10","author":"Levenshtein","year":"1966","journal-title":"Soviet Physics Doklady"},{"key":"2022071814082855900_R35","article-title":"The chemical corpus of the NLM-Chem BioCreative VII track","author":"Islamaj"},{"key":"2022071814082855900_R36","doi-asserted-by":"crossref","first-page":"1819","DOI":"10.1109\/TKDE.2013.39","article-title":"A review on multi-label learning algorithms","volume":"26","author":"Zhang","year":"2013","journal-title":"IEEE Trans. Knowl. 
Data Eng."},{"key":"2022071814082855900_R37","article-title":"Decoupled weight decay regularization","author":"Loshchilov","year":"2017"},{"key":"2022071814082855900_R38","doi-asserted-by":"crossref","DOI":"10.18653\/v1\/W19-5006","article-title":"Transfer learning in biomedical natural language processing: an evaluation of BERT and ELMo on ten benchmarking datasets","author":"Peng","year":"2019"},{"key":"2022071814082855900_R39","doi-asserted-by":"crossref","first-page":"1279","DOI":"10.1093\/jamia\/ocz085","article-title":"ML-Net: multi-label classification of biomedical texts with deep neural networks","volume":"26","author":"Du","year":"2019","journal-title":"J. Am. Med. Inform. Assoc."},{"key":"2022071814082855900_R40","article-title":"Improving tagging consistency and entity coverage for chemical identification in full-text articles","author":"Kim","year":"2021"},{"key":"2022071814082855900_R41","article-title":"Team bioformer at BioCreative VII LitCovid track: multic-label topic classification for COVID-19 literature with a compact BERT model","author":"Fang"},{"key":"2022071814082855900_R42","doi-asserted-by":"crossref","first-page":"313","DOI":"10.1146\/annurev-biodatasci-021821-061045","article-title":"Artificial intelligence in action: addressing the COVID-19 pandemic with natural language processing","volume":"4","author":"Chen","year":"2021","journal-title":"Annu. Rev. Biomed. 
Data Sci."},{"key":"2022071814082855900_R43","doi-asserted-by":"crossref","first-page":"137","DOI":"10.1093\/oxfordjournals.pan.a004868","article-title":"Logistic regression in rare events data","volume":"9","author":"King","year":"2001","journal-title":"Political Anal."}],"container-title":["Database"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/database\/article-pdf\/doi\/10.1093\/database\/baac056\/44932579\/baac056.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/database\/article-pdf\/doi\/10.1093\/database\/baac056\/44932579\/baac056.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,11]],"date-time":"2023-02-11T21:46:59Z","timestamp":1676152019000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/database\/article\/doi\/10.1093\/database\/baac056\/6645124"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,1,1]]},"references-count":43,"URL":"https:\/\/doi.org\/10.1093\/database\/baac056","relation":{},"ISSN":["1758-0463"],"issn-type":[{"value":"1758-0463","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2022,1,1]]},"published":{"date-parts":[[2022,1,1]]},"article-number":"baac056"}}