{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,3]],"date-time":"2026-06-03T22:55:29Z","timestamp":1780527329012,"version":"3.54.1"},"reference-count":45,"publisher":"Oxford University Press (OUP)","issue":"1","funder":[{"name":"Intramural Research Program"},{"DOI":"10.13039\/100006108","name":"National Center for Advancing Translational Sciences","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100006108","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["ZIC TR000410-03"],"award-info":[{"award-number":["ZIC TR000410-03"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023,12,22]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Objective<\/jats:title>\n                  <jats:p>Identifying sets of rare diseases with shared aspects of etiology and pathophysiology may enable drug repurposing. Toward that aim, we utilized an integrative knowledge graph to construct clusters of rare diseases.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Materials and Methods<\/jats:title>\n                  <jats:p>Data on 3242 rare diseases were extracted from the National Center for Advancing Translational Sciences Genetic and Rare Diseases Information Center internal data resources. The rare disease data were enriched with additional biomedical data, including gene and phenotype ontologies, biological pathway data, and small molecule-target activity data, to create a knowledge graph (KG). Node embeddings were trained and clustered. 
We validated the disease clusters through semantic similarity and feature enrichment analysis.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>Thirty-seven disease clusters were created with a mean size of 87 diseases. We validated the clusters quantitatively via semantic similarity based on the Orphanet Rare Disease Ontology. In addition, the clusters were analyzed for enrichment of associated genes, revealing that the enriched genes within clusters are highly related.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Discussion<\/jats:title>\n                  <jats:p>We demonstrate that node embeddings are an effective method for clustering diseases within a heterogeneous KG. Semantically similar diseases and relevant enriched genes have been uncovered within the clusters. Connections between disease clusters and drugs are enumerated for follow-up efforts.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Conclusion<\/jats:title>\n                  <jats:p>We lay out a method for clustering rare diseases using graph node embeddings. We develop an easy-to-maintain pipeline that can be updated when new data on rare diseases emerge. 
The embeddings themselves can be paired with other representation learning methods for other data types, such as drugs, to address other predictive modeling problems.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/jamia\/ocad186","type":"journal-article","created":{"date-parts":[[2023,9,28]],"date-time":"2023-09-28T03:35:48Z","timestamp":1695872148000},"page":"154-164","source":"Crossref","is-referenced-by-count":15,"title":["Clustering rare diseases within an ontology-enriched knowledge graph"],"prefix":"10.1093","volume":"31","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3218-5500","authenticated-orcid":false,"given":"Jaleal","family":"Sanjak","sequence":"first","affiliation":[{"name":"Division of Pre-Clinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH) , Rockville, MD, United States"},{"name":"Chief Technology Office, Booz Allen Hamilton , Bethesda, MD, United States"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jessica","family":"Binder","sequence":"additional","affiliation":[{"name":"Division of Pre-Clinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH) , Rockville, MD, United States"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6853-7443","authenticated-orcid":false,"given":"Arjun Singh","family":"Yadaw","sequence":"additional","affiliation":[{"name":"Division of Pre-Clinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH) , Rockville, MD, United States"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Qian","family":"Zhu","sequence":"additional","affiliation":[{"name":"Division of Pre-Clinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH) , Rockville, MD, United 
States"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Ewy A","family":"Math\u00e9","sequence":"additional","affiliation":[{"name":"Division of Pre-Clinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH) , Rockville, MD, United States"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"286","published-online":{"date-parts":[[2023,9,27]]},"reference":[{"key":"2023122220314167200_ocad186-B1","volume-title":"Rare Diseases and Orphan Products: Accelerating Research and Development, in Rare Diseases and Orphan Products: Accelerating Research and Development","author":"Field","year":"2010"},{"issue":"2","key":"2023122220314167200_ocad186-B2","doi-asserted-by":"crossref","first-page":"165","DOI":"10.1038\/s41431-019-0508-0","article-title":"Estimating cumulative point prevalence of rare diseases: analysis of the Orphanet database","volume":"28","author":"Nguengang Wakap","year":"2020","journal-title":"Eur J Hum Genet"},{"issue":"1","key":"2023122220314167200_ocad186-B3","doi-asserted-by":"crossref","first-page":"429","DOI":"10.1186\/s13023-021-02061-3","article-title":"The IDeaS initiative: pilot study to assess the impact of rare diseases on patients and healthcare systems","volume":"16","author":"Tisdale","year":"2021","journal-title":"Orphanet J Rare Dis"},{"key":"2023122220314167200_ocad186-B4","author":"U.S. 
Government Accountability Office","year":"2021"},{"issue":"2","key":"2023122220314167200_ocad186-B5","doi-asserted-by":"crossref","first-page":"77","DOI":"10.1038\/d41573-019-00180-y","article-title":"How many rare diseases are there?","volume":"19","author":"Haendel","year":"2020","journal-title":"Nat Rev Drug Discov"},{"issue":"1","key":"2023122220314167200_ocad186-B6","doi-asserted-by":"crossref","first-page":"7","DOI":"10.1089\/hum.2016.29018.pjb","article-title":"Gene therapy: the view from NCATS","volume":"27","author":"Brooks","year":"2016","journal-title":"Hum Gene Ther"},{"issue":"1","key":"2023122220314167200_ocad186-B7","doi-asserted-by":"crossref","first-page":"572","DOI":"10.1186\/s13063-019-3664-1","article-title":"Systematic review of basket trials, umbrella trials, and platform trials: a landscape analysis of master protocols","volume":"20","author":"Park","year":"2019","journal-title":"Trials"},{"issue":"1","key":"2023122220314167200_ocad186-B8","doi-asserted-by":"crossref","first-page":"46","DOI":"10.1186\/s13321-020-00450-7","article-title":"A review of computational drug repositioning: strategies, approaches, opportunities, challenges, and directions","volume":"12","author":"Jarada","year":"2020","journal-title":"J Cheminform"},{"key":"2023122220314167200_ocad186-B9","doi-asserted-by":"crossref","DOI":"10.7554\/eLife.26726","article-title":"Systematic integration of biomedical knowledge prioritizes drugs for repurposing","volume":"6","author":"Himmelstein","year":"2017","journal-title":"Elife"},{"issue":"D1","key":"2023122220314167200_ocad186-B10","doi-asserted-by":"crossref","first-page":"D937","DOI":"10.1093\/nar\/gkx1062","article-title":"eRAM: encyclopedia of rare disease annotations for precision medicine","volume":"46","author":"Jia","year":"2018","journal-title":"Nucleic Acids 
Res"},{"issue":"1","key":"2023122220314167200_ocad186-B11","doi-asserted-by":"crossref","first-page":"101","DOI":"10.1186\/s13023-021-01741-4","article-title":"RDmap: a map for exploring rare diseases","volume":"16","author":"Yang","year":"2021","journal-title":"Orphanet J Rare Dis"},{"key":"2023122220314167200_ocad186-B12","author":"Orphanet: an online rare disease and orphan drug database","year":"1999"},{"issue":"10","key":"2023122220314167200_ocad186-B13","doi-asserted-by":"crossref","first-page":"e18395","DOI":"10.2196\/18395","article-title":"Phenotypically similar rare disease identification from an integrative knowledge graph for data harmonization: preliminary study","volume":"8","author":"Zhu","year":"2020","journal-title":"JMIR Med Inform"},{"key":"2023122220314167200_ocad186-B14","first-page":"701","author":"Perozzi","year":"2014"},{"key":"2023122220314167200_ocad186-B15","author":"Grover","year":"2016"},{"key":"2023122220314167200_ocad186-B16","author":"Mikolov","year":"2013"},{"issue":"12","key":"2023122220314167200_ocad186-B17","doi-asserted-by":"crossref","first-page":"2133","DOI":"10.1093\/bioinformatics\/bty933","article-title":"OPA2Vec: Combining formal and informal content of biomedical ontologies to improve similarity-based prediction","volume":"35","author":"Smaili","year":"2019","journal-title":"Bioinformatics"},{"issue":"6","key":"2023122220314167200_ocad186-B18","first-page":"853","article-title":"Predicting candidate genes from phenotypes, functions and anatomical site of expression","volume":"37","author":"Chen","year":"2021","journal-title":"Bioinformatics (Oxford, Engl)"},{"issue":"18","key":"2023122220314167200_ocad186-B19","doi-asserted-by":"crossref","first-page":"4380","DOI":"10.1093\/bioinformatics\/btac520","article-title":"CoGO: a contrastive learning framework to predict disease similarity based on gene network and ontology 
structure","volume":"38","author":"Chen","year":"2022","journal-title":"Bioinformatics"},{"issue":"1","key":"2023122220314167200_ocad186-B20","doi-asserted-by":"crossref","first-page":"13","DOI":"10.1186\/s13326-020-00232-y","article-title":"An integrative knowledge graph for rare diseases, derived from the Genetic and Rare Diseases Information Center (GARD)","volume":"11","author":"Zhu","year":"2020","journal-title":"J Biomed Semantics"},{"issue":"1","key":"2023122220314167200_ocad186-B21","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1038\/75556","article-title":"Gene Ontology: tool for the unification of biology","volume":"25","author":"Ashburner","year":"2000","journal-title":"Nat Genet"},{"issue":"D1","key":"2023122220314167200_ocad186-B22","first-page":"D325-D3","article-title":"The Gene Ontology resource: enriching a GOld mine","volume":"49","author":"Carbon","year":"2021","journal-title":"Nucleic Acids Res"},{"issue":"D1","key":"2023122220314167200_ocad186-B23","doi-asserted-by":"crossref","first-page":"D1207","DOI":"10.1093\/nar\/gkaa1043","article-title":"The human phenotype ontology in 2021","volume":"49","author":"K\u00f6hler","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2023122220314167200_ocad186-B24","doi-asserted-by":"crossref","first-page":"baab069","DOI":"10.1093\/database\/baab069","article-title":"OBO Foundry in 2021: operationalizing open data principles to evaluate ontologies","volume":"2021","author":"Jackson","year":"2021","journal-title":"Database (Oxford)"},{"issue":"D1","key":"2023122220314167200_ocad186-B25","first-page":"D489","article-title":"Pathway commons 2019 update: integration, analysis and exploration of pathway data","volume":"48","author":"Rodchenkov","year":"2019","journal-title":"Nucleic Acids Res."},{"issue":"D1","key":"2023122220314167200_ocad186-B26","doi-asserted-by":"crossref","first-page":"D1334","DOI":"10.1093\/nar\/gkaa993","article-title":"TCRD and Pharos 2021: mining the human proteome for 
disease biology","volume":"49","author":"Sheils","year":"2021","journal-title":"Nucleic Acids Res"},{"issue":"D1","key":"2023122220314167200_ocad186-B27","doi-asserted-by":"crossref","first-page":"D1405","DOI":"10.1093\/nar\/gkac1033","article-title":"Pharos 2023: an integrated resource for the understudied human proteome","volume":"51","author":"Kelleher","year":"2023","journal-title":"Nucleic Acids Res"},{"issue":"1","key":"2023122220314167200_ocad186-B28","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1007\/s10817-013-9296-3","article-title":"The incredible ELK","volume":"53","author":"Kazakov","year":"2014","journal-title":"J Autom Reason"},{"issue":"1","key":"2023122220314167200_ocad186-B29","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1186\/1758-2946-5-3","article-title":"UniChem: a unified chemical structure cross-referencing and identifier tracking system","volume":"5","author":"Chambers","year":"2013","journal-title":"J Cheminform"},{"key":"2023122220314167200_ocad186-B30","first-page":"2825","article-title":"Scikit-learn: machine learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J Mach Learn Res"},{"key":"2023122220314167200_ocad186-B31","first-page":"166","author":"Satopaa","year":"2011"},{"key":"2023122220314167200_ocad186-B32","first-page":"986","author":"Sammut","year":"2010"},{"issue":"D1","key":"2023122220314167200_ocad186-B33","doi-asserted-by":"crossref","first-page":"D605","DOI":"10.1093\/nar\/gkaa1074","article-title":"The STRING database in 2021: customizable protein\u2013protein networks, and functional characterization of user-uploaded gene\/measurement sets","volume":"49","author":"Szklarczyk","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2023122220314167200_ocad186-B34","author":"Vasant","year":"2014"},{"issue":"2","key":"2023122220314167200_ocad186-B35","doi-asserted-by":"crossref","first-page":"297","DOI":"10.1016\/j.knosys.2010.10.001","article-title":"Ontology-based 
information content computation","volume":"24","author":"S\u00e1nchez","year":"2011","journal-title":"Knowl Based Syst"},{"issue":"D1","key":"2023122220314167200_ocad186-B36","doi-asserted-by":"crossref","first-page":"D1307","DOI":"10.1093\/nar\/gkab918","article-title":"NCATS Inxight Drugs: a comprehensive and curated portal for translational research","volume":"50","author":"Siramshetty","year":"2022","journal-title":"Nucleic Acids Res"},{"issue":"3","key":"2023122220314167200_ocad186-B37","doi-asserted-by":"crossref","first-page":"e90","DOI":"10.1002\/cpz1.90","article-title":"Gene set knowledge discovery with Enrichr","volume":"1","author":"Xie","year":"2021","journal-title":"Curr Protoc"},{"issue":"2","key":"2023122220314167200_ocad186-B38","doi-asserted-by":"crossref","first-page":"170","DOI":"10.1038\/ng.512","article-title":"Mutations in TRPV4 cause Charcot-Marie-Tooth disease type 2C","volume":"42","author":"Landour\u00e9","year":"2010","journal-title":"Nat Genet"},{"issue":"3","key":"2023122220314167200_ocad186-B39","doi-asserted-by":"crossref","first-page":"307","DOI":"10.1016\/j.ajhg.2009.01.021","article-title":"Mutations in the gene encoding the calcium-permeable ion channel TRPV4 produce spondylometaphyseal dysplasia, Kozlowski type and metatropic dysplasia","volume":"84","author":"Krakow","year":"2009","journal-title":"Am J Hum Genet"},{"issue":"8","key":"2023122220314167200_ocad186-B40","doi-asserted-by":"crossref","first-page":"999","DOI":"10.1038\/ng.166","article-title":"Gain-of-function mutations in TRPV4 cause autosomal dominant brachyolmia","volume":"40","author":"Rock","year":"2008","journal-title":"Nat Genet"},{"issue":"6","key":"2023122220314167200_ocad186-B41","doi-asserted-by":"crossref","first-page":"1443","DOI":"10.1002\/ajmg.a.33414","article-title":"Spondylo-epiphyseal dysplasia, Maroteaux type (pseudo-Morquio syndrome type 2), and parastremmatic dysplasia are caused by TRPV4 
mutations","volume":"152A","author":"Nishimura","year":"2010","journal-title":"Am J Med Genet A"},{"issue":"1","key":"2023122220314167200_ocad186-B42","doi-asserted-by":"crossref","first-page":"4","DOI":"10.1002\/acn3.51477","article-title":"Genetic defects are common in myopathies with tubular aggregates","volume":"9","author":"Gang","year":"2022","journal-title":"Ann Clin Transl Neurol"},{"key":"2023122220314167200_ocad186-B43","doi-asserted-by":"publisher","author":"Sanjak","year":"2023","DOI":"10.6084\/m9.figshare.23748060.v1"},{"issue":"6","key":"2023122220314167200_ocad186-B44","doi-asserted-by":"crossref","first-page":"485","DOI":"10.18632\/oncotarget.281","article-title":"Myeloproliferative neoplasms: from JAK2 mutations discovery to JAK2 inhibitor therapies","volume":"2","author":"Passamonti","year":"2011","journal-title":"Oncotarget"},{"issue":"5","key":"2023122220314167200_ocad186-B45","doi-asserted-by":"crossref","first-page":"417","DOI":"10.2174\/1566524020666201015144702","article-title":"JAK2-mediated Intracellular Signaling","volume":"21","author":"Sopjani","year":"2021","journal-title":"Curr Mol Med"}],"container-title":["Journal of the American Medical Informatics 
Association"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/jamia\/article-pdf\/31\/1\/154\/54762298\/ocad186.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/jamia\/article-pdf\/31\/1\/154\/54762298\/ocad186.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,12,22]],"date-time":"2023-12-22T20:32:53Z","timestamp":1703277173000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/jamia\/article\/31\/1\/154\/7284356"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,9,27]]},"references-count":45,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2023,9,27]]},"published-print":{"date-parts":[[2023,12,22]]}},"URL":"https:\/\/doi.org\/10.1093\/jamia\/ocad186","relation":{},"ISSN":["1067-5027","1527-974X"],"issn-type":[{"value":"1067-5027","type":"print"},{"value":"1527-974X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024,1,1]]},"published":{"date-parts":[[2023,9,27]]}}}