{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T08:48:15Z","timestamp":1774946895255,"version":"3.50.1"},"reference-count":61,"publisher":"Oxford University Press (OUP)","issue":"9","license":[{"start":{"date-parts":[[2024,2,27]],"date-time":"2024-02-27T00:00:00Z","timestamp":1708992000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"ADAM"},{"name":"VLAIO O&O","award":["HBC.2020.3234"],"award-info":[{"award-number":["HBC.2020.3234"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,9,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Objective<\/jats:title>\n                  <jats:p>In this study, we investigate the potential of large language models (LLMs) to complement biomedical knowledge graphs in the training of semantic models for the biomedical and clinical domains.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Materials and Methods<\/jats:title>\n                  <jats:p>Drawing on the wealth of the Unified Medical Language System knowledge graph and harnessing cutting-edge LLMs, we propose a new state-of-the-art approach for obtaining high-fidelity representations of biomedical concepts and sentences, consisting of 3 steps: an improved contrastive learning phase, a novel self-distillation phase, and a weight averaging phase.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>Through rigorous evaluations of diverse downstream tasks, we demonstrate consistent and substantial improvements over the previous state of the art for semantic textual similarity (STS), biomedical concept representation (BCR), and clinically named entity linking, across 15+ datasets. Besides our new state-of-the-art biomedical model for English, we also distill and release a multilingual model compatible with 50+ languages and finetuned on 7 European languages.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Discussion<\/jats:title>\n                  <jats:p>Many clinical pipelines can benefit from our latest models. Our new multilingual model enables a range of languages to benefit from our advancements in biomedical semantic representation learning, opening a new avenue for bioinformatics researchers around the world. As a result, we hope to see BioLORD-2023 becoming a precious tool for future biomedical applications.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Conclusion<\/jats:title>\n                  <jats:p>In this article, we introduced BioLORD-2023, a state-of-the-art model for STS and BCR designed for the clinical domain.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/jamia\/ocae029","type":"journal-article","created":{"date-parts":[[2024,2,27]],"date-time":"2024-02-27T20:21:34Z","timestamp":1709065294000},"page":"1844-1855","source":"Crossref","is-referenced-by-count":45,"title":["BioLORD-2023: semantic textual representations fusing large language models and clinical knowledge graph insights"],"prefix":"10.1093","volume":"31","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4775-6975","authenticated-orcid":false,"given":"Fran\u00e7ois","family":"Remy","sequence":"first","affiliation":[{"name":"Internet and Data Science Lab, imec, Ghent University , Ghent, Belgium"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8525-7160","authenticated-orcid":false,"given":"Kris","family":"Demuynck","sequence":"additional","affiliation":[{"name":"Internet and Data Science Lab, imec, Ghent University , Ghent, Belgium"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9901-5768","authenticated-orcid":false,"given":"Thomas","family":"Demeester","sequence":"additional","affiliation":[{"name":"Internet and Data Science Lab, imec, Ghent University , Ghent, Belgium"}]}],"member":"286","published-online":{"date-parts":[[2024,2,27]]},"reference":[{"key":"2024082207515508100_ocae029-B1","doi-asserted-by":"crossref","first-page":"140628","DOI":"10.1109\/ACCESS.2021.3119621","article-title":"Machine learning techniques for biomedical natural language processing: a comprehensive review","volume":"9","author":"Houssein","year":"2021","journal-title":"IEEE Access"},{"issue":"5","key":"2024082207515508100_ocae029-B2","doi-asserted-by":"crossref","first-page":"2593","DOI":"10.1007\/s11280-023-01144-4","article-title":"Knowledge-graph-enabled biomedical entity linking: a survey","volume":"26","author":"Shi","year":"2023","journal-title":"World Wide Web"},{"key":"2024082207515508100_ocae029-B3","author":"Pan","year":"2023"},{"key":"2024082207515508100_ocae029-B4","first-page":"54","author":"Satvik"},{"issue":"4","key":"2024082207515508100_ocae029-B5","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1093\/bib\/bbad235","article-title":"Comprehensive evaluation of deep and graph learning on drug-drug interactions prediction","volume":"24","author":"Lin","year":"2023","journal-title":"Brief Bioinform"},{"issue":"3","key":"2024082207515508100_ocae029-B6","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3611651","article-title":"Pre-trained language models in biomedical domain: a systematic survey","volume":"56","author":"Wang","year":"2023","journal-title":"ACM Comput Surv"},{"key":"2024082207515508100_ocae029-B7","article-title":"A survey of knowledge enhanced pre-trained language models","author":"Hu","year":"2023","journal-title":"IEEE Tran Knowl Data Eng"},{"key":"2024082207515508100_ocae029-B8","author":"Feng","year":"2023"},{"key":"2024082207515508100_ocae029-B9","first-page":"3641","author":"Sung","year":"2020"},{"key":"2024082207515508100_ocae029-B10","first-page":"4228","author":"Liu","year":"2021"},{"key":"2024082207515508100_ocae029-B11","first-page":"1454","author":"Remy","year":"2022"},{"key":"2024082207515508100_ocae029-B12","first-page":"4171","author":"Devlin","year":"2019"},{"key":"2024082207515508100_ocae029-B13","first-page":"9459","author":"Lewis","year":"2020"},{"key":"2024082207515508100_ocae029-B14","first-page":"2284","author":"Kim","year":"2020"},{"key":"2024082207515508100_ocae029-B15","first-page":"27730","article-title":"Training language models to follow instructions with human feedback","volume":"35","author":"Ouyang","year":"2022","journal-title":"Adv Neural Inform Proc Syst"},{"issue":"Database issue","key":"2024082207515508100_ocae029-B16","doi-asserted-by":"crossref","first-page":"D267","DOI":"10.1093\/nar\/gkh061","article-title":"The Unified Medical Language System (UMLS): integrating biomedical terminology","volume":"32","author":"Bodenreider","year":"2004","journal-title":"Nucleic Acids Res"},{"issue":"8","key":"2024082207515508100_ocae029-B17","doi-asserted-by":"crossref","first-page":"756","DOI":"10.1001\/jama.1980.03300340032015","article-title":"Progress in medical information management. Systematized nomenclature of medicine (SNOMED)","volume":"243","author":"Cot\u00e9","year":"1980","journal-title":"JAMA"},{"key":"2024082207515508100_ocae029-B18","first-page":"265","author":"Remy","year":"2023"},{"key":"2024082207515508100_ocae029-B19","first-page":"4512","author":"Reimers","year":"2020"},{"key":"2024082207515508100_ocae029-B20","first-page":"878","author":"Feng","year":"2022"},{"key":"2024082207515508100_ocae029-B21","first-page":"565","author":"Liu","year":"2021"},{"key":"2024082207515508100_ocae029-B22","author":"Cui","year":"2023"},{"issue":"1","key":"2024082207515508100_ocae029-B23","doi-asserted-by":"crossref","first-page":"194","DOI":"10.1038\/s41746-022-00742-2","article-title":"A large language model for electronic health records","volume":"5","author":"Yang","year":"2022","journal-title":"NPJ Digit Med"},{"issue":"7972","key":"2024082207515508100_ocae029-B24","doi-asserted-by":"crossref","first-page":"172","DOI":"10.1038\/s41586-023-06291-2","article-title":"Large language models encode clinical knowledge","volume":"620","author":"Singhal","year":"2023","journal-title":"Nature"},{"issue":"13","key":"2024082207515508100_ocae029-B25","doi-asserted-by":"crossref","first-page":"1233","DOI":"10.1056\/NEJMsr2214184","article-title":"Benefits, limits, and risks of GPT-4 as an AI chatbot for medicine","volume":"388","author":"Lee","year":"2023","journal-title":"N Engl J Med"},{"issue":"1","key":"2024082207515508100_ocae029-B26","doi-asserted-by":"crossref","first-page":"16","DOI":"10.1038\/s41746-023-00768-0","article-title":"Automating the overburdened clinical coding system: challenges and next steps","volume":"36","author":"Venkatesh","year":"2023","journal-title":"NPJ Digit Med"},{"key":"2024082207515508100_ocae029-B27","author":"Wu","year":"2023"},{"key":"2024082207515508100_ocae029-B28","author":"Yan","year":"2023"},{"key":"2024082207515508100_ocae029-B29","author":"Jin","year":"2023"},{"key":"2024082207515508100_ocae029-B30","author":"Taylor","year":"2022"},{"key":"2024082207515508100_ocae029-B31","author":"Wang","year":"2023"},{"key":"2024082207515508100_ocae029-B32","author":"Bolton","year":"2022"},{"issue":"6","key":"2024082207515508100_ocae029-B33","doi-asserted-by":"crossref","first-page":"bbac409","DOI":"10.1093\/bib\/bbac409","article-title":"BioGPT: generative pre-trained transformer for biomedical text generation and mining","volume":"23","author":"Luo","year":"2022","journal-title":"Brief Bioinform"},{"key":"2024082207515508100_ocae029-B34","author":"OpenAI","year":"2023"},{"key":"2024082207515508100_ocae029-B35","author":"Oord","year":"2018"},{"key":"2024082207515508100_ocae029-B36","first-page":"47","author":"Remy","year":"2023"},{"key":"2024082207515508100_ocae029-B37","first-page":"3982","author":"Reimers","year":"2019"},{"key":"2024082207515508100_ocae029-B38","doi-asserted-by":"crossref","first-page":"163","DOI":"10.29007\/g7bg","article-title":"Multi-task learning and catastrophic forgetting in continual reinforcement learning","volume":"65","author":"Ribeiro","year":"2019","journal-title":"EPiC Ser Comput"},{"key":"2024082207515508100_ocae029-B39","first-page":"1121","author":"He","year":"2021"},{"key":"2024082207515508100_ocae029-B40","first-page":"6894","author":"Gao","year":"2021"},{"key":"2024082207515508100_ocae029-B41","first-page":"9119","author":"Li","year":"2020"},{"key":"2024082207515508100_ocae029-B42","first-page":"55","author":"Ethayarajh","year":"2019"},{"key":"2024082207515508100_ocae029-B43","first-page":"23965","author":"Wortsman","year":"2022"},{"key":"2024082207515508100_ocae029-B44","first-page":"1","author":"Remy","year":"2022"},{"issue":"1","key":"2024082207515508100_ocae029-B45","doi-asserted-by":"crossref","first-page":"57","DOI":"10.1007\/s10579-018-9431-1","article-title":"MedSTS: a resource for clinical semantic textual similarity","volume":"54","author":"Wang","year":"2020","journal-title":"Lang Resourc Eval"},{"key":"2024082207515508100_ocae029-B46","first-page":"1586","author":"Romanov","year":"2018"},{"issue":"14","key":"2024082207515508100_ocae029-B47","doi-asserted-by":"crossref","first-page":"i49","DOI":"10.1093\/bioinformatics\/btx238","article-title":"BIOSSES: a semantic sentence similarity estimation system for the biomedical domain","volume":"33","author":"Sogancioglu","year":"2017","journal-title":"Bioinformatics"},{"key":"2024082207515508100_ocae029-B48","first-page":"216","author":"Marelli","year":"2014"},{"key":"2024082207515508100_ocae029-B49","first-page":"1","author":"Cer","year":"2017"},{"key":"2024082207515508100_ocae029-B50","author":"Ofer","year":"2023"},{"key":"2024082207515508100_ocae029-B51","author":"Kalyan","year":"2021"},{"key":"2024082207515508100_ocae029-B52","first-page":"6565","author":"Schulz","year":"2020"},{"key":"2024082207515508100_ocae029-B53","first-page":"572","article-title":"Semantic similarity and relatedness between clinical terms: an experimental study","volume":"2010","author":"Pakhomov","year":"2010","journal-title":"AMIA Annu Symp. Proc"},{"issue":"2","key":"2024082207515508100_ocae029-B54","doi-asserted-by":"crossref","first-page":"251","DOI":"10.1016\/j.jbi.2010.10.004","article-title":"Towards a framework for developing semantic relatedness reference standards","volume":"44","author":"Pakhomov","year":"2011","journal-title":"J Biomed Inform"},{"key":"2024082207515508100_ocae029-B55","first-page":"8580","author":"Portelli","year":"2022"},{"issue":"2","key":"2024082207515508100_ocae029-B56","doi-asserted-by":"crossref","first-page":"e24","DOI":"10.2196\/publichealth.6396","article-title":"TwiMed: twitter and PubMed comparable corpus of drugs, diseases, symptoms, and their relations","volume":"3","author":"Alvaro","year":"2017","journal-title":"JMIR Public Health Surveill"},{"key":"2024082207515508100_ocae029-B57","first-page":"27","author":"Gonzalez-Hernandez","year":"2020"},{"key":"2024082207515508100_ocae029-B58","doi-asserted-by":"crossref","first-page":"103838","DOI":"10.1016\/j.dib.2019.103838","article-title":"The PsyTAR dataset: From patients generated narratives to a corpus of adverse drug events and effectiveness of psychiatric medications","volume":"24","author":"Zolnoori","year":"2019","journal-title":"Data Brief"},{"key":"2024082207515508100_ocae029-B59","doi-asserted-by":"crossref","first-page":"73","DOI":"10.1016\/j.jbi.2015.03.010","article-title":"Cadec: a corpus of adverse drug event annotations","volume":"55","author":"Karimi","year":"2015","journal-title":"J Biomed Inform"},{"issue":"11","key":"2024082207515508100_ocae029-B60","doi-asserted-by":"crossref","first-page":"btad651","DOI":"10.1093\/bioinformatics\/btad651","article-title":"MedCPT: contrastive pre-trained transformers with large-scale PubMed search logs for zero-shot biomedical information retrieval","volume":"39","author":"Jin","year":"2023","journal-title":"Bioinformatics"},{"issue":"10","key":"2024082207515508100_ocae029-B61","doi-asserted-by":"crossref","first-page":"1538","DOI":"10.1093\/jamia\/ocaa136","article-title":"Use of word and graph embedding to measure semantic relatedness between Unified Medical Language System concepts","volume":"27","author":"Mao","year":"2020","journal-title":"J Am Med Inform Assoc"}],"container-title":["Journal of the American Medical Informatics Association"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/jamia\/article-pdf\/31\/9\/1844\/58868179\/ocae029.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/jamia\/article-pdf\/31\/9\/1844\/58868179\/ocae029.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,8,22]],"date-time":"2024-08-22T11:48:43Z","timestamp":1724327323000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/jamia\/article\/31\/9\/1844\/7614965"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,2,27]]},"references-count":61,"journal-issue":{"issue":"9","published-online":{"date-parts":[[2024,2,27]]},"published-print":{"date-parts":[[2024,9,1]]}},"URL":"https:\/\/doi.org\/10.1093\/jamia\/ocae029","relation":{},"ISSN":["1067-5027","1527-974X"],"issn-type":[{"value":"1067-5027","type":"print"},{"value":"1527-974X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024,9]]},"published":{"date-parts":[[2024,2,27]]}}}