{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,4]],"date-time":"2026-04-04T01:04:50Z","timestamp":1775264690088,"version":"3.50.1"},"reference-count":34,"publisher":"Oxford University Press (OUP)","issue":"9","license":[{"start":{"date-parts":[[2024,5,23]],"date-time":"2024-05-23T00:00:00Z","timestamp":1716422400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/pages\/standard-publication-reuse-rights"}],"funder":[{"DOI":"10.13039\/501100004826","name":"Natural Science Foundation of Beijing Municipality","doi-asserted-by":"publisher","award":["Z190024"],"award-info":[{"award-number":["Z190024"]}],"id":[{"id":"10.13039\/501100004826","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"Natural Science Foundation of China","doi-asserted-by":"publisher","award":["12171270"],"award-info":[{"award-number":["12171270"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,9,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Objectives<\/jats:title>\n                  <jats:p>Biomedical Knowledge Graphs play a pivotal role in various biomedical research domains. Concurrently, term clustering emerges as a crucial step in constructing these knowledge graphs, aiming to identify synonymous terms. Due to a lack of knowledge, previous contrastive learning models trained with Unified Medical Language System (UMLS) synonyms struggle at clustering difficult terms and do not generalize well beyond UMLS terms. In this work, we leverage the world knowledge from large language models (LLMs) and propose Contrastive Learning for Representing Terms via Explanations (CoRTEx) to enhance term representation and significantly improves term clustering.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Materials and Methods<\/jats:title>\n                  <jats:p>The model training involves generating explanations for a cleaned subset of UMLS terms using ChatGPT. We employ contrastive learning, considering term and explanation embeddings simultaneously, and progressively introduce hard negative samples. Additionally, a ChatGPT-assisted BIRCH algorithm is designed for efficient clustering of a new ontology.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We established a clustering test set and a hard negative test set, where our model consistently achieves the highest F1 score. With CoRTEx embeddings and the modified BIRCH algorithm, we grouped 35\u2009580\u2009932 terms from the Biomedical Informatics Ontology System (BIOS) into 22\u2009104\u2009559 clusters with O(N) queries to ChatGPT. Case studies highlight the model\u2019s efficacy in handling challenging samples, aided by information from explanations.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Conclusion<\/jats:title>\n                  <jats:p>By aligning terms to their explanations, CoRTEx demonstrates superior accuracy over benchmark models and robustness beyond its training set, and it is suitable for clustering terms for large-scale biomedical ontologies.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/jamia\/ocae115","type":"journal-article","created":{"date-parts":[[2024,5,23]],"date-time":"2024-05-23T02:41:25Z","timestamp":1716432085000},"page":"1912-1920","source":"Crossref","is-referenced-by-count":5,"title":["CoRTEx: contrastive learning for representing terms via explanations with applications on constructing biomedical knowledge graphs"],"prefix":"10.1093","volume":"31","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7959-2093","authenticated-orcid":false,"given":"Huaiyuan","family":"Ying","sequence":"first","affiliation":[{"name":"Center for Statistical Science, Department of Industrial Engineering, Tsinghua University , Beijing, 100084, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9425-7752","authenticated-orcid":false,"given":"Zhengyun","family":"Zhao","sequence":"additional","affiliation":[{"name":"Center for Statistical Science, Department of Industrial Engineering, Tsinghua University , Beijing, 100084, China"}]},{"given":"Yang","family":"Zhao","sequence":"additional","affiliation":[{"name":"Weiyang College, Tsinghua University , Beijing, 100084, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-2921-829X","authenticated-orcid":false,"given":"Sihang","family":"Zeng","sequence":"additional","affiliation":[{"name":"Department of Biomedical Informatics and Medical Education, University of Washington , Seattle, WA 98195, United States"}]},{"given":"Sheng","family":"Yu","sequence":"additional","affiliation":[{"name":"Center for Statistical Science, Department of Industrial Engineering, Tsinghua University , Beijing, 100084, China"}]}],"member":"286","published-online":{"date-parts":[[2024,5,23]]},"reference":[{"issue":"Database issue","key":"2024082207521424100_ocae115-B1","doi-asserted-by":"crossref","first-page":"D267","DOI":"10.1093\/nar\/gkh061","article-title":"The Unified Medical Language System (UMLS): integrating biomedical terminology","volume":"32","author":"Bodenreider","year":"2004","journal-title":"Nucleic Acids Res"},{"key":"2024082207521424100_ocae115-B2","author":"Yu","year":"2022"},{"issue":"1","key":"2024082207521424100_ocae115-B3","doi-asserted-by":"crossref","first-page":"114","DOI":"10.1186\/s12911-022-01850-5","article-title":"Improving medical term embeddings using UMLS Metathesaurus","volume":"22","author":"Chanda","year":"2022","journal-title":"BMC Med Inform Decis Mak"},{"key":"2024082207521424100_ocae115-B4","doi-asserted-by":"crossref","first-page":"103983","DOI":"10.1016\/j.jbi.2021.103983","article-title":"CODER: knowledge-infused cross-lingual medical term embedding for term normalization","volume":"126","author":"Yuan","year":"2020","journal-title":"J Biomed Inform"},{"key":"2024082207521424100_ocae115-B5","first-page":"4228","author":"Liu"},{"key":"2024082207521424100_ocae115-B6","first-page":"91","author":"Zeng"},{"key":"2024082207521424100_ocae115-B7","first-page":"517","author":"Su"},{"issue":"5","key":"2024082207521424100_ocae115-B8","doi-asserted-by":"crossref","DOI":"10.1093\/bib\/bbac363","article-title":"A biomedical knowledge graph-based method for drug-drug interactions prediction through combining local and global features with deep neural networks","volume":"23","author":"Ren","year":"2022","journal-title":"Brief Bioinform"},{"issue":"5","key":"2024082207521424100_ocae115-B9","doi-asserted-by":"crossref","DOI":"10.1111\/exsy.13181","article-title":"Biomedical knowledge graph embeddings for personalized medicine: Predicting disease-gene associations","volume":"40","author":"Vilela","year":"2022","journal-title":"Expert Syst"},{"key":"2024082207521424100_ocae115-B10","doi-asserted-by":"crossref","DOI":"10.1038\/s41597-023-01960-3","article-title":"Building a knowledge graph to enable precision medicine","volume":"10","author":"Chandak","year":"2023","journal-title":"Scientif Data"},{"issue":"12","key":"2024082207521424100_ocae115-B11","doi-asserted-by":"crossref","first-page":"3191","DOI":"10.1109\/TKDE.2016.2605687","article-title":"Diagnosis code assignment using sparsity-based disease correlation embedding","volume":"28","author":"Wang","year":"2016","journal-title":"IEEE Trans Knowl Data Eng"},{"key":"2024082207521424100_ocae115-B12"},{"issue":"3","key":"2024082207521424100_ocae115-B13","doi-asserted-by":"crossref","first-page":"424","DOI":"10.1093\/jamia\/ocab270","article-title":"Embedding electronic health records onto a knowledge network recognizes prodromal features of multiple sclerosis and predicts diagnosis","volume":"29","author":"Nelson","year":"2022","journal-title":"J Am Med Inform Assoc"},{"key":"2024082207521424100_ocae115-B14","doi-asserted-by":"crossref","first-page":"1414","DOI":"10.1016\/j.csbj.2020.05.017","article-title":"Constructing knowledge graphs and their biomedical applications","volume":"18","author":"Nicholson","year":"2020","journal-title":"Comput Struct Biotechnol J"},{"key":"2024082207521424100_ocae115-B15"},{"key":"2024082207521424100_ocae115-B16","first-page":"28","author":"Bhowmik","year":"2021"},{"key":"2024082207521424100_ocae115-B17","author":"Agarwal","year":"2021"},{"key":"2024082207521424100_ocae115-B18","author":"Vaswani","year":"2017"},{"key":"2024082207521424100_ocae115-B19","author":"Devlin"},{"key":"2024082207521424100_ocae115-B20","doi-asserted-by":"crossref","first-page":"73","DOI":"10.1007\/978-1-0716-0826-5_3","article-title":"Siamese neural networks: an overview","volume":"2190","author":"Chicco","year":"2021","journal-title":"Methods Mol Biol"},{"key":"2024082207521424100_ocae115-B21","doi-asserted-by":"crossref","first-page":"1166120","DOI":"10.3389\/fpubh.2023.1166120","article-title":"ChatGPT and the rise of large language models: the new AI-driven infodemic threat in public health","volume":"11","author":"De Angelis","year":"2023","journal-title":"Front Public Health"},{"issue":"2","key":"2024082207521424100_ocae115-B22","doi-asserted-by":"crossref","first-page":"100022","DOI":"10.1016\/j.metrad.2023.100022","article-title":"A comprehensive survey of ChatGPT: advancements, applications, prospects, and challenges","volume":"1","author":"Nazir","year":"2023","journal-title":"Meta Radiol"},{"key":"2024082207521424100_ocae115-B23","first-page":"14918","author":"Sun"},{"key":"2024082207521424100_ocae115-B24","author":"Gu"},{"key":"2024082207521424100_ocae115-B25","first-page":"7059","author":"Shridhar"},{"key":"2024082207521424100_ocae115-B26","author":"Gu"},{"issue":"9","key":"2024082207521424100_ocae115-B27","doi-asserted-by":"crossref","first-page":"1074","DOI":"10.1109\/43.159993","article-title":"New spectral methods for ratio cut partitioning and clustering","volume":"11","author":"Hagen","year":"1992","journal-title":"IEEE Trans Comput-Aided Des Integr Circuits Syst"},{"key":"2024082207521424100_ocae115-B28","first-page":"103","author":"Zhang"},{"key":"2024082207521424100_ocae115-B29","first-page":"1102","author":"Su"},{"key":"2024082207521424100_ocae115-B30","first-page":"5022","author":"Wang"},{"issue":"3","key":"2024082207521424100_ocae115-B31","doi-asserted-by":"crossref","first-page":"535","DOI":"10.1109\/TBDATA.2019.2921572","article-title":"Billion-scale similarity search with GPUs","volume":"7","author":"Johnson","year":"2021","journal-title":"IEEE Trans Big Data"},{"issue":"140","key":"2024082207521424100_ocae115-B32","first-page":"1","article-title":"Exploring the limits of transfer learning with a unified text-to-text transformer","volume":"21","author":"Raffel","year":"2020","journal-title":"J Mach Learn Res"},{"issue":"11","key":"2024082207521424100_ocae115-B33","doi-asserted-by":"crossref","DOI":"10.1093\/bioinformatics\/btad651","article-title":"MedCPT: contrastive pre-trained transformers with large-scale PubMed search logs for zero-shot biomedical information retrieval","volume":"39","author":"Jin","year":"2023","journal-title":"Bioinformatics"},{"key":"2024082207521424100_ocae115-B34","author":"Xiao"}],"container-title":["Journal of the American Medical Informatics Association"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/jamia\/article-pdf\/31\/9\/1912\/58868273\/ocae115.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/jamia\/article-pdf\/31\/9\/1912\/58868273\/ocae115.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,8,22]],"date-time":"2024-08-22T12:13:07Z","timestamp":1724328787000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/jamia\/article\/31\/9\/1912\/7680017"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,5,23]]},"references-count":34,"journal-issue":{"issue":"9","published-online":{"date-parts":[[2024,5,23]]},"published-print":{"date-parts":[[2024,9,1]]}},"URL":"https:\/\/doi.org\/10.1093\/jamia\/ocae115","relation":{},"ISSN":["1067-5027","1527-974X"],"issn-type":[{"value":"1067-5027","type":"print"},{"value":"1527-974X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024,9]]},"published":{"date-parts":[[2024,5,23]]}}}