{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,16]],"date-time":"2026-02-16T17:07:15Z","timestamp":1771261635523,"version":"3.50.1"},"reference-count":31,"publisher":"Oxford University Press (OUP)","issue":"9","license":[{"start":{"date-parts":[[2024,3,23]],"date-time":"2024-03-23T00:00:00Z","timestamp":1711152000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["1UL1TR003167"],"award-info":[{"award-number":["1UL1TR003167"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["1R01AG066749"],"award-info":[{"award-number":["1R01AG066749"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["1U24MH130988-01"],"award-info":[{"award-number":["1U24MH130988-01"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["R01LM011934"],"award-info":[{"award-number":["R01LM011934"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["R21EB029575"],"award-info":[{"award-number":["R21EB029575"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of 
Health","doi-asserted-by":"publisher","award":["R01AG078154"],"award-info":[{"award-number":["R01AG078154"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000005","name":"Department of Defense","doi-asserted-by":"publisher","award":["W81XWH-22-1-0164"],"award-info":[{"award-number":["W81XWH-22-1-0164"]}],"id":[{"id":"10.13039\/100000005","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100004917","name":"Cancer Prevention and Research Institute of Texas","doi-asserted-by":"publisher","award":["RP170668"],"award-info":[{"award-number":["RP170668"]}],"id":[{"id":"10.13039\/100004917","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,9,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Objectives<\/jats:title>\n                  <jats:p>The rapid expansion of biomedical literature necessitates automated techniques to discern relationships between biomedical concepts from extensive free text. Such techniques facilitate the development of detailed knowledge bases and highlight research deficiencies. The LitCoin Natural Language Processing (NLP) challenge, organized by the National Center for Advancing Translational Science, aims to evaluate such potential and provides a manually annotated corpus for methodology development and benchmarking.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Materials and Methods<\/jats:title>\n                  <jats:p>For the named entity recognition (NER) task, we utilized ensemble learning to merge predictions from three domain-specific models, namely BioBERT, PubMedBERT, and BioM-ELECTRA, devised a rule-driven detection method for cell line and taxonomy names and annotated 70 more abstracts as additional corpus. 
We further finetuned the T0pp model, with 11 billion parameters, to boost the performance on relation extraction and leveraged entities\u2019 location information (eg, title, background) to enhance novelty prediction performance in relation extraction (RE).<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>Our pioneering NLP system designed for this challenge secured first place in Phase I\u2014NER and second place in Phase II\u2014relation extraction and novelty prediction, outpacing over 200 teams. We tested OpenAI ChatGPT 3.5 and ChatGPT 4 in a Zero-Shot setting using the same test set, revealing that our finetuned model considerably surpasses these broad-spectrum large language models.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Discussion and Conclusion<\/jats:title>\n                  <jats:p>Our outcomes depict a robust NLP system excelling in NER and RE across various biomedical entities, emphasizing that task-specific models remain superior to generic large ones. 
Such insights are valuable for endeavors like knowledge graph development and hypothesis formulation in biomedical research.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/jamia\/ocae061","type":"journal-article","created":{"date-parts":[[2024,3,23]],"date-time":"2024-03-23T19:21:36Z","timestamp":1711221696000},"page":"1904-1911","source":"Crossref","is-referenced-by-count":13,"title":["Ensemble pretrained language models to extract biomedical knowledge from literature"],"prefix":"10.1093","volume":"31","author":[{"given":"Zhao","family":"Li","sequence":"first","affiliation":[{"name":"McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston , Houston, TX 77030, United States"}]},{"given":"Qiang","family":"Wei","sequence":"additional","affiliation":[{"name":"McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston , Houston, TX 77030, United States"}]},{"given":"Liang-Chin","family":"Huang","sequence":"additional","affiliation":[{"name":"McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston , Houston, TX 77030, United States"}]},{"given":"Jianfu","family":"Li","sequence":"additional","affiliation":[{"name":"McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston , Houston, TX 77030, United States"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-2413-5918","authenticated-orcid":false,"given":"Yan","family":"Hu","sequence":"additional","affiliation":[{"name":"McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston , Houston, TX 77030, United States"}]},{"given":"Yao-Shun","family":"Chuang","sequence":"additional","affiliation":[{"name":"McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston , Houston, TX 77030, United 
States"}]},{"given":"Jianping","family":"He","sequence":"additional","affiliation":[{"name":"McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston , Houston, TX 77030, United States"}]},{"given":"Avisha","family":"Das","sequence":"additional","affiliation":[{"name":"McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston , Houston, TX 77030, United States"}]},{"given":"Vipina Kuttichi","family":"Keloth","sequence":"additional","affiliation":[{"name":"Section of Biomedical Informatics and Data Science, School of Medicine, Yale University , New Haven, CT 06510, United States"}]},{"given":"Yuntao","family":"Yang","sequence":"additional","affiliation":[{"name":"McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston , Houston, TX 77030, United States"}]},{"given":"Chiamaka S","family":"Diala","sequence":"additional","affiliation":[{"name":"McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston , Houston, TX 77030, United States"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6525-5213","authenticated-orcid":false,"given":"Kirk E","family":"Roberts","sequence":"additional","affiliation":[{"name":"McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston , Houston, TX 77030, United States"}]},{"given":"Cui","family":"Tao","sequence":"additional","affiliation":[{"name":"McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston , Houston, TX 77030, United States"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9933-2205","authenticated-orcid":false,"given":"Xiaoqian","family":"Jiang","sequence":"additional","affiliation":[{"name":"McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston , Houston, TX 77030, United 
States"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7411-6047","authenticated-orcid":false,"given":"W Jim","family":"Zheng","sequence":"additional","affiliation":[{"name":"McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston , Houston, TX 77030, United States"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5274-4672","authenticated-orcid":false,"given":"Hua","family":"Xu","sequence":"additional","affiliation":[{"name":"Section of Biomedical Informatics and Data Science, School of Medicine, Yale University , New Haven, CT 06510, United States"}]}],"member":"286","published-online":{"date-parts":[[2024,3,23]]},"reference":[{"issue":"5","key":"2024082207510121900_ocae061-B1","doi-asserted-by":"crossref","first-page":"bbac282","DOI":"10.1093\/bib\/bbac282","article-title":"BioRED: a rich biomedical relation extraction dataset","volume":"23","author":"Luo","year":"2022","journal-title":"Brief Bioinform"},{"key":"2024082207510121900_ocae061-B2","doi-asserted-by":"crossref","first-page":"S10","DOI":"10.1186\/1758-2946-7-S1-S10","article-title":"Recognition of chemical entities: combining dictionary-based and grammar-based approaches","volume":"7","author":"Akhondi","year":"2015","journal-title":"J Cheminform"},{"issue":"4","key":"2024082207510121900_ocae061-B3","doi-asserted-by":"crossref","first-page":"357","DOI":"10.1093\/bib\/6.4.357","article-title":"What makes a gene name? 
Named entity recognition in the biomedical literature","volume":"6","author":"Leser","year":"2005","journal-title":"Brief Bioinform"},{"issue":"6","key":"2024082207510121900_ocae061-B4","doi-asserted-by":"crossref","first-page":"bbab282","DOI":"10.1093\/bib\/bbab282","article-title":"Deep learning methods for biomedical named entity recognition: a survey and qualitative comparison","volume":"22","author":"Song","year":"2021","journal-title":"Brief Bioinform"},{"key":"2024082207510121900_ocae061-B5","author":"Huang","year":"2015"},{"issue":"6","key":"2024082207510121900_ocae061-B6","doi-asserted-by":"crossref","first-page":"283","DOI":"10.3390\/e19060283","article-title":"LSTM-CRF for drug-named entity recognition","volume":"19","author":"Zeng","year":"2017","journal-title":"Entropy"},{"issue":"9","key":"2024082207510121900_ocae061-B7","doi-asserted-by":"crossref","first-page":"1547","DOI":"10.1093\/bioinformatics\/btx815","article-title":"GRAM-CNN: a deep learning approach with local context for named entity recognition in biomedical text","volume":"34","author":"Zhu","year":"2018","journal-title":"Bioinformatics"},{"key":"2024082207510121900_ocae061-B8","first-page":"5998","author":"Vaswani","year":"2017"},{"issue":"140","key":"2024082207510121900_ocae061-B9","first-page":"1","article-title":"Exploring the limits of transfer learning with a unified text-to-text transformer","volume":"21","author":"Raffel","year":"2020","journal-title":"J Mach Learn Res"},{"key":"2024082207510121900_ocae061-B10","first-page":"27381","article-title":"Flow network based generative models for non-iterative diverse candidate generation","volume":"34","author":"Bengio","year":"2021","journal-title":"Adv Neural Inf Process Syst"},{"issue":"3","key":"2024082207510121900_ocae061-B11","doi-asserted-by":"crossref","first-page":"355","DOI":"10.1038\/s41397-019-0122-0","article-title":"Drug\u2013drug\u2013gene interactions and adverse drug 
reactions","volume":"20","author":"Malki","year":"2020","journal-title":"Pharmacogenomics J"},{"key":"2024082207510121900_ocae061-B12","doi-asserted-by":"crossref","first-page":"184","DOI":"10.1016\/j.coph.2021.02.007","article-title":"Opportunities and challenges for nonaddictive interventions in chronic pain","volume":"57","author":"Malafoglia","year":"2021","journal-title":"Curr Opin Pharmacol"},{"key":"2024082207510121900_ocae061-B13","doi-asserted-by":"crossref","first-page":"103779","DOI":"10.1016\/j.jbi.2021.103779","article-title":"NLM-Gene, a richly annotated gold standard dataset for gene entities that addresses ambiguity and multi-species gene recognition","volume":"118","author":"Islamaj","year":"2021","journal-title":"J Biomed Inform"},{"key":"2024082207510121900_ocae061-B14","doi-asserted-by":"crossref","first-page":"918710","DOI":"10.1155\/2015\/918710","article-title":"GNormPlus: an integrative approach for tagging genes, gene families, and protein domains","volume":"2015","author":"Wei","year":"2015","journal-title":"Biomed Res Int"},{"key":"2024082207510121900_ocae061-B15","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.jbi.2013.12.006","article-title":"NCBI disease corpus: a resource for disease name recognition and concept normalization","volume":"47","author":"Do\u011fan","year":"2014","journal-title":"J Biomed Inform"},{"issue":"1","key":"2024082207510121900_ocae061-B16","doi-asserted-by":"crossref","first-page":"161","DOI":"10.1186\/1471-2105-13-161","article-title":"Concept annotation in the CRAFT corpus","volume":"13","author":"Bada","year":"2012","journal-title":"BMC Bioinformatics"},{"key":"2024082207510121900_ocae061-B17","author":"Sanh","year":"2021"},{"issue":"4","key":"2024082207510121900_ocae061-B18","doi-asserted-by":"crossref","first-page":"1234","DOI":"10.1093\/bioinformatics\/btz682","article-title":"BioBERT: a pre-trained biomedical language representation model for biomedical text 
mining","volume":"36","author":"Lee","year":"2020","journal-title":"Bioinformatics"},{"issue":"1","key":"2024082207510121900_ocae061-B19","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3458754","article-title":"Domain-specific language model pretraining for biomedical natural language processing","volume":"3","author":"Gu","year":"2021","journal-title":"ACM Trans Comput Healthc"},{"key":"2024082207510121900_ocae061-B20","first-page":"221","author":"Alrowili","year":"2021"},{"key":"2024082207510121900_ocae061-B21","author":"Qi","year":"2020"},{"key":"2024082207510121900_ocae061-B22","author":"Luoma","year":"2020"},{"key":"2024082207510121900_ocae061-B23","author":"Hu"},{"key":"2024082207510121900_ocae061-B24","author":"Hoffmann","year":"2022"},{"key":"2024082207510121900_ocae061-B25","author":"Kaplan","year":"2020"},{"key":"2024082207510121900_ocae061-B26","author":"Arora","year":"2022"},{"key":"2024082207510121900_ocae061-B27","author":"Bach","year":"2022"},{"key":"2024082207510121900_ocae061-B28","author":"Ding","year":"2021"},{"key":"2024082207510121900_ocae061-B29","first-page":"1180","author":"Sarkar"},{"key":"2024082207510121900_ocae061-B30","author":"Xia","year":"2012"},{"issue":"12","key":"2024082207510121900_ocae061-B31","doi-asserted-by":"crossref","first-page":"5586","DOI":"10.1109\/TKDE.2021.3070203","article-title":"A survey on multi-task learning","volume":"34","author":"Zhang","year":"2022","journal-title":"IEEE Trans Knowl Data Eng"}],"container-title":["Journal of the American Medical Informatics 
Association"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/jamia\/article-pdf\/31\/9\/1904\/58868023\/ocae061.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/jamia\/article-pdf\/31\/9\/1904\/58868023\/ocae061.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,8,22]],"date-time":"2024-08-22T10:18:49Z","timestamp":1724321929000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/jamia\/article\/31\/9\/1904\/7634192"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,3,23]]},"references-count":31,"journal-issue":{"issue":"9","published-online":{"date-parts":[[2024,3,23]]},"published-print":{"date-parts":[[2024,9,1]]}},"URL":"https:\/\/doi.org\/10.1093\/jamia\/ocae061","relation":{},"ISSN":["1067-5027","1527-974X"],"issn-type":[{"value":"1067-5027","type":"print"},{"value":"1527-974X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024,9]]},"published":{"date-parts":[[2024,3,23]]}}}