{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,27]],"date-time":"2026-02-27T04:11:03Z","timestamp":1772165463134,"version":"3.50.1"},"reference-count":37,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2022,4,4]],"date-time":"2022-04-04T00:00:00Z","timestamp":1649030400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,4,4]],"date-time":"2022-04-04T00:00:00Z","timestamp":1649030400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2022,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Background<\/jats:title>\n                    <jats:p>Recently, automatically extracting biomedical relations has become a significant subject in biomedical research due to the rapid growth of biomedical literature. Since their adaptation to the biomedical domain, transformer-based BERT models have produced leading results on many biomedical natural language processing tasks. In this work, we explore approaches to improve the BERT model for relation extraction tasks in both the pre-training and fine-tuning stages. In the pre-training stage, we add another level of BERT adaptation on sub-domain data to bridge the gap between domain knowledge and task-specific knowledge. 
Also, we propose methods to incorporate the ignored knowledge in the last layer of BERT to improve its fine-tuning.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>The experiment results demonstrate that our approaches for pre-training and fine-tuning can improve the BERT model performance. After combining the two proposed techniques, our approach outperforms the original BERT models with averaged F1 score improvement of 2.1% on relation extraction tasks. Moreover, our approach achieves state-of-the-art performance on three relation extraction benchmark datasets.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Conclusions<\/jats:title>\n                    <jats:p>The extra pre-training step on sub-domain data can help the BERT model generalization on specific tasks, and our proposed fine-tuning mechanism could utilize the knowledge in the last layer of BERT to boost the model performance. 
Furthermore, the combination of these two approaches further improves the performance of BERT model on the relation extraction tasks.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1186\/s12859-022-04642-w","type":"journal-article","created":{"date-parts":[[2022,4,4]],"date-time":"2022-04-04T04:05:54Z","timestamp":1649045154000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":48,"title":["Investigation of improving the pre-training and fine-tuning of BERT model for biomedical relation extraction"],"prefix":"10.1186","volume":"23","author":[{"given":"Peng","family":"Su","sequence":"first","affiliation":[]},{"given":"K.","family":"Vijay-Shanker","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,4,4]]},"reference":[{"issue":"2","key":"4642_CR1","doi-asserted-by":"publisher","first-page":"4","DOI":"10.1186\/gb-2008-9-s2-s4","volume":"9","author":"M Krallinger","year":"2008","unstructured":"Krallinger M, Leitner F, Rodriguez-Penagos C, Valencia A. Overview of the protein\u2013protein interaction annotation extraction task of biocreative II. Genome Biol. 2008;9(2):4.","journal-title":"Genome Biol"},{"issue":"5","key":"4642_CR2","doi-asserted-by":"publisher","first-page":"914","DOI":"10.1016\/j.jbi.2013.07.011","volume":"46","author":"M Herrero-Zazo","year":"2013","unstructured":"Herrero-Zazo M, Segura-Bedmar I, Mart\u00ednez P, Declerck T. The DDI corpus: an annotated corpus with pharmacological substances and drug\u2013drug interactions. J Biomed Inform. 2013;46(5):914\u201320.","journal-title":"J Biomed Inform"},{"key":"4642_CR3","unstructured":"Krallinger M, Rabal O, Akhondi SA, et al. Overview of the biocreative vi chemical\u2013protein interaction track. In: Proceedings of the sixth biocreative challenge evaluation workshop, vol. 1. 2017, pp. 
141\u20136."},{"key":"4642_CR4","volume-title":"Handbook of knowledge representation","author":"F Van Harmelen","year":"2008","unstructured":"Van Harmelen F, Lifschitz V, Porter B. Handbook of knowledge representation. Amsterdam: Elsevier; 2008."},{"key":"4642_CR5","doi-asserted-by":"crossref","unstructured":"Macherey K, Och FJ, Ney H. Natural language understanding using statistical machine translation. In: Seventh European conference on speech communication and technology. 2001.","DOI":"10.21437\/Eurospeech.2001-520"},{"issue":"4","key":"4642_CR6","doi-asserted-by":"publisher","first-page":"275","DOI":"10.1017\/S1351324901002807","volume":"7","author":"L Hirschman","year":"2001","unstructured":"Hirschman L, Gaizauskas R. Natural language question answering: the view from here. Nat Lang Eng. 2001;7(4):275.","journal-title":"Nat Lang Eng"},{"key":"4642_CR7","doi-asserted-by":"crossref","unstructured":"Culotta A, Sorensen J. Dependency tree kernels for relation extraction. In: Proceedings of the 42nd annual meeting of the association for computational linguistics (ACL-04). 2004, pp. 423\u20139.","DOI":"10.3115\/1218955.1219009"},{"key":"4642_CR8","doi-asserted-by":"crossref","unstructured":"Sierra G, Alarc\u00f3n R, Aguilar C, Bach C. Definitional verbal patterns for semantic relation extraction. Terminology. Int J Theor Appl Issues Spec Commun. 2008;14(1):74\u201398.","DOI":"10.1075\/term.14.1.05sie"},{"key":"4642_CR9","doi-asserted-by":"publisher","first-page":"15","DOI":"10.1016\/j.jbi.2018.08.005","volume":"86","author":"SK Sahu","year":"2018","unstructured":"Sahu SK, Anand A. Drug-drug interaction extraction from biomedical texts using long short-term memory network. J Biomed Inform. 
2018;86:15\u201324.","journal-title":"J Biomed Inform"},{"key":"4642_CR10","doi-asserted-by":"publisher","first-page":"89354","DOI":"10.1109\/ACCESS.2019.2927253","volume":"7","author":"H Zhang","year":"2019","unstructured":"Zhang H, Guan R, Zhou F, Liang Y, Zhan Z-H, Huang L, Feng X. Deep residual convolutional neural network for protein\u2013protein interaction extraction. IEEE Access. 2019;7:89354\u201365.","journal-title":"IEEE Access"},{"key":"4642_CR11","first-page":"626226","volume":"626226","author":"P Su","year":"2019","unstructured":"Su P, Li G, Wu C, Vijay-Shanker K. Using distant supervision to augment manually annotated data for relation extraction. BioRxiv. 2019;626226:626226.","journal-title":"BioRxiv"},{"key":"4642_CR12","unstructured":"Dai AM, Le QV. Semi-supervised sequence learning. In: Advances in neural information processing systems. 2015, pp. 3079\u201387."},{"key":"4642_CR13","doi-asserted-by":"crossref","unstructured":"Peters ME, Neumann M, Iyyer M, Gardner M, Clark C, Lee K, Zettlemoyer L. Deep contextualized word representations. arXiv:1802.05365. 2018.","DOI":"10.18653\/v1\/N18-1202"},{"key":"4642_CR14","unstructured":"Devlin J, Chang M-W, Lee K, Toutanova K. Bert: pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805. 2018."},{"issue":"8","key":"4642_CR15","first-page":"9","volume":"1","author":"A Radford","year":"2019","unstructured":"Radford A, Wu J, Child R, Luan D, Amodei D, Sutskever I. Language models are unsupervised multitask learners. OpenAI Blog. 2019;1(8):9.","journal-title":"OpenAI Blog"},{"key":"4642_CR16","unstructured":"Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser \u0141, Polosukhin I. Attention is all you need. In: Advances in neural information processing systems. 2017, pp. 
5998\u20136008."},{"issue":"4","key":"4642_CR17","doi-asserted-by":"crossref","first-page":"1234","DOI":"10.1093\/bioinformatics\/btz682","volume":"36","author":"J Lee","year":"2020","unstructured":"Lee J, Yoon W, Kim S, Kim D, Kim S, So CH, Kang J. Biobert: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics. 2020;36(4):1234\u201340.","journal-title":"Bioinformatics"},{"key":"4642_CR18","doi-asserted-by":"crossref","unstructured":"Peng Y, Yan S, Lu Z. Transfer learning in biomedical natural language processing: an evaluation of bert and elmo on ten benchmarking datasets. arXiv:1906.05474. 2019.","DOI":"10.18653\/v1\/W19-5006"},{"key":"4642_CR19","unstructured":"Beltagy I, Cohan A, Lo K. Scibert: pretrained contextualized embeddings for scientific text. arXiv:1903.10676. 2019."},{"key":"4642_CR20","doi-asserted-by":"crossref","unstructured":"Gu Y, Tinn R, Cheng H, Lucas M, Usuyama N, Liu X, Naumann T, Gao J, Poon H. Domain-specific language model pretraining for biomedical natural language processing. arXiv:2007.15779. 2020.","DOI":"10.1145\/3458754"},{"key":"4642_CR21","doi-asserted-by":"publisher","first-page":"160035","DOI":"10.1038\/sdata.2016.35","volume":"3","author":"AE Johnson","year":"2016","unstructured":"Johnson AE, Pollard TJ, Shen L, Li-wei HL, Feng M, Ghassemi M, Moody B, Szolovits P, Celi LA, Mark RG. MIMIC-III, a freely accessible critical care database. Sci Data. 2016;3:160035.","journal-title":"Sci Data"},{"key":"4642_CR22","doi-asserted-by":"crossref","unstructured":"Ammar W, Groeneveld D, Bhagavatula C, Beltagy I, Crawford M, Downey D, Dunkelberger J, Elgohary A, Feldman S, Ha V, et al. Construction of the literature graph in semantic scholar. arXiv:1805.02262. 2018.","DOI":"10.18653\/v1\/N18-3011"},{"key":"4642_CR23","doi-asserted-by":"crossref","unstructured":"Gururangan S, Marasovi\u0107 A, Swayamdipta S, Lo K, Beltagy I, Downey D, Smith NA. 
Don\u2019t stop pretraining: adapt language models to domains and tasks. arXiv:2004.10964. 2020.","DOI":"10.18653\/v1\/2020.acl-main.740"},{"key":"4642_CR24","unstructured":"Phang J, F\u00e9vry T, Bowman SR. Sentence encoders on stilts: supplementary training on intermediate labeled-data tasks. arXiv:1811.01088. 2018."},{"key":"4642_CR25","doi-asserted-by":"crossref","unstructured":"Reimers N, Gurevych I. Sentence-bert: sentence embeddings using siamese bert-networks. arXiv:1908.10084. 2019.","DOI":"10.18653\/v1\/D19-1410"},{"key":"4642_CR26","doi-asserted-by":"crossref","unstructured":"Brandes N, Ofer D, Peleg Y, Rappoport N, Linial M. Proteinbert: a universal deep-learning model of protein sequence and function. bioRxiv. 2021.","DOI":"10.1101\/2021.05.24.445464"},{"key":"4642_CR27","doi-asserted-by":"crossref","unstructured":"Tenney I, Das D, Pavlick E. Bert rediscovers the classical NLP pipeline. arXiv:1905.05950. 2019.","DOI":"10.18653\/v1\/P19-1452"},{"key":"4642_CR28","unstructured":"Tenney I, Xia P, Chen B, Wang A, Poliak A, McCoy RT, Kim N, Van\u00a0Durme B, Bowman SR, Das D, et al. What do you learn from context? Probing for sentence structure in contextualized word representations. arXiv:1905.06316. 2019."},{"key":"4642_CR29","unstructured":"Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly learning to align and translate. arXiv:1409.0473. 2014."},{"issue":"2","key":"4642_CR30","doi-asserted-by":"publisher","first-page":"139","DOI":"10.1016\/j.artmed.2004.07.016","volume":"33","author":"R Bunescu","year":"2005","unstructured":"Bunescu R, Ge R, Kate RJ, Marcotte EM, Mooney RJ, Ramani AK, Wong YW. Comparative experiments on learning information extractors for proteins and their interactions. Artif Intell Med. 2005;33(2):139\u201355.","journal-title":"Artif Intell Med"},{"key":"4642_CR31","unstructured":"Song Y, Wang J, Liang Z, Liu Z, Jiang T. 
Utilizing bert intermediate layers for aspect based sentiment analysis and natural language inference. arXiv:2002.04815. 2020."},{"issue":"3","key":"4642_CR32","doi-asserted-by":"publisher","first-page":"130","DOI":"10.1108\/eb046814","volume":"14","author":"MF Porter","year":"1980","unstructured":"Porter MF, et al. An algorithm for suffix stripping. Program. 1980;14(3):130\u20137.","journal-title":"Program"},{"issue":"W1","key":"4642_CR33","doi-asserted-by":"publisher","first-page":"518","DOI":"10.1093\/nar\/gkt441","volume":"41","author":"C-H Wei","year":"2013","unstructured":"Wei C-H, Kao H-Y, Lu Z. Pubtator: a web-based text mining tool for assisting biocuration. Nucleic Acids Res. 2013;41(W1):518\u201322.","journal-title":"Nucleic Acids Res"},{"issue":"8","key":"4642_CR34","doi-asserted-by":"publisher","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","volume":"9","author":"S Hochreiter","year":"1997","unstructured":"Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. 1997;9(8):1735\u201380.","journal-title":"Neural Comput"},{"key":"4642_CR35","doi-asserted-by":"crossref","unstructured":"Graves A, Fern\u00e1ndez S, Schmidhuber J. Bidirectional lstm networks for improved phoneme classification and recognition. In: International conference on artificial neural networks. Springer. 2005, pp. 799\u2013804.","DOI":"10.1007\/11550907_126"},{"key":"4642_CR36","unstructured":"Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, Devin M, Ghemawat S, Irving G, Isard M, et al. Tensorflow: a system for large-scale machine learning. In: 12th USENIX symposium on operating systems design and implementation (OSDI 16). 2016, pp. 265\u201383."},{"key":"4642_CR37","doi-asserted-by":"crossref","unstructured":"Su P, Vijay-Shanker K. Investigation of bert model on biomedical relation extraction based on revised fine-tuning mechanism. In: 2020 IEEE international conference on bioinformatics and biomedicine (BIBM). IEEE. 2020, pp. 
2522\u20139.","DOI":"10.1109\/BIBM49941.2020.9313160"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-022-04642-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s12859-022-04642-w\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-022-04642-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,31]],"date-time":"2023-01-31T09:26:42Z","timestamp":1675157202000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-022-04642-w"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,4,4]]},"references-count":37,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2022,12]]}},"alternative-id":["4642"],"URL":"https:\/\/doi.org\/10.1186\/s12859-022-04642-w","relation":{"has-preprint":[{"id-type":"doi","id":"10.21203\/rs.3.rs-640112\/v1","asserted-by":"object"}]},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,4,4]]},"assertion":[{"value":"19 June 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"11 March 2022","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"4 April 2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not 
applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare that they have no competing interests.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"120"}}