{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,9]],"date-time":"2026-05-09T14:30:40Z","timestamp":1778337040759,"version":"3.51.4"},"reference-count":35,"publisher":"Oxford University Press (OUP)","issue":"7","license":[{"start":{"date-parts":[[2025,5,7]],"date-time":"2025-05-07T00:00:00Z","timestamp":1746576000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc-nd\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000002","name":"NIH","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000092","name":"National Library of Medicine","doi-asserted-by":"publisher","award":["U24LM013755"],"award-info":[{"award-number":["U24LM013755"]}],"id":[{"id":"10.13039\/100000092","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,7,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Objective<\/jats:title>\n                  <jats:p>Common Data Elements (CDEs) standardize data collection and sharing across studies, enhancing data interoperability and improving research reproducibility. However, implementing CDEs presents challenges due to the broad range and variety of data elements. This study aims to develop a CDE mapping tool to bridge the gap between local data elements and National Institutes of Health (NIH) CDEs.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Methods<\/jats:title>\n                  <jats:p>We propose CDEMapper, a large language model (LLM)-powered mapping tool designed to assist in mapping local data elements to NIH CDEs. CDEMapper has 3 core modules: (1) CDE indexing and embeddings. NIH CDEs were indexed and embedded to support semantic search; (2) CDE recommendations. The tool combines Elasticsearch (BM25 methods) with GPT services to recommend candidate CDEs and their permissible values; and (3) Human review. Users review and select the best match for their data elements and value sets. We evaluate the tool\u2019s recommendation accuracy and usability against manual annotations and testing.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>CDEMapper offers a publicly available, LLM-powered, and intuitive user interface that consolidates essential and advanced mapping services into a streamlined pipeline. The evaluation results demonstrated that the augmented BM25 with GPT embeddings and a GPT ranker achieved the overall best performance. The usability test also highlighted the effectiveness and efficiency of our tool.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Discussions and conclusions<\/jats:title>\n                  <jats:p>This work opens up the potential of using LLMs to assist with CDE mapping when aligning local data elements with NIH CDEs. Additionally, this effort helps researchers better understand the gaps between their data elements and NIH CDEs while promoting CDE reusability.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/jamia\/ocaf064","type":"journal-article","created":{"date-parts":[[2025,4,15]],"date-time":"2025-04-15T11:28:12Z","timestamp":1744716492000},"page":"1130-1139","source":"Crossref","is-referenced-by-count":3,"title":["CDEMapper: enhancing National Institutes of Health common data element use with large language models"],"prefix":"10.1093","volume":"32","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1036-9365","authenticated-orcid":false,"given":"Yan","family":"Wang","sequence":"first","affiliation":[{"name":"Department of Biomedical Informatics and Data Science, School of Medicine, Yale University , New Haven, CT 06510,","place":["United States"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jimin","family":"Huang","sequence":"additional","affiliation":[{"name":"Department of Biomedical Informatics and Data Science, School of Medicine, Yale University , New Haven, CT 06510,","place":["United States"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Huan","family":"He","sequence":"additional","affiliation":[{"name":"Department of Biomedical Informatics and Data Science, School of Medicine, Yale University , New Haven, CT 06510,","place":["United States"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Vincent","family":"Zhang","sequence":"additional","affiliation":[{"name":"Department of Biomedical Informatics and Data Science, School of Medicine, Yale University , New Haven, CT 06510,","place":["United States"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yujia","family":"Zhou","sequence":"additional","affiliation":[{"name":"Department of Biomedical Informatics and Data Science, School of Medicine, Yale University , New Haven, CT 06510,","place":["United States"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xubing","family":"Hao","sequence":"additional","affiliation":[{"name":"McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston , Houston, TX 77030,","place":["United States"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Pritham","family":"Ram","sequence":"additional","affiliation":[{"name":"Department of Biomedical Informatics and Data Science, School of Medicine, Yale University , New Haven, CT 06510,","place":["United States"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Lingfei","family":"Qian","sequence":"additional","affiliation":[{"name":"Department of Biomedical Informatics and Data Science, School of Medicine, Yale University , New Haven, CT 06510,","place":["United States"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Qianqian","family":"Xie","sequence":"additional","affiliation":[{"name":"Department of Biomedical Informatics and Data Science, School of Medicine, Yale University , New Haven, CT 06510,","place":["United States"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ruey-Ling","family":"Weng","sequence":"additional","affiliation":[{"name":"Department of Biomedical Informatics and Data Science, School of Medicine, Yale University , New Haven, CT 06510,","place":["United States"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Fongci","family":"Lin","sequence":"additional","affiliation":[{"name":"Department of Biomedical Informatics and Data Science, School of Medicine, Yale University , New Haven, CT 06510,","place":["United States"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-2413-5918","authenticated-orcid":false,"given":"Yan","family":"Hu","sequence":"additional","affiliation":[{"name":"McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston , Houston, TX 77030,","place":["United States"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5549-8780","authenticated-orcid":false,"given":"Licong","family":"Cui","sequence":"additional","affiliation":[{"name":"McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston , Houston, TX 77030,","place":["United States"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9933-2205","authenticated-orcid":false,"given":"Xiaoqian","family":"Jiang","sequence":"additional","affiliation":[{"name":"McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston , Houston, TX 77030,","place":["United States"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5274-4672","authenticated-orcid":false,"given":"Hua","family":"Xu","sequence":"additional","affiliation":[{"name":"Department of Biomedical Informatics and Data Science, School of Medicine, Yale University , New Haven, CT 06510,","place":["United States"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Na","family":"Hong","sequence":"additional","affiliation":[{"name":"Department of Biomedical Informatics and Data Science, School of Medicine, Yale University , New Haven, CT 06510,","place":["United States"]}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2025,5,7]]},"reference":[{"key":"2025062620293073600_ocaf064-B1","doi-asserted-by":"crossref","first-page":"127","DOI":"10.1016\/j.conctc.2018.07.004","article-title":"Harmonization, data management, and statistical issues related to prospective multicenter studies in Ankylosing spondylitis (AS): experience from the Prospective Study Of Ankylosing Spondylitis (PSOAS) cohort","volume":"11","author":"Rahbar","year":"2018","journal-title":"Contemp Clin Trials Commun"},{"key":"2025062620293073600_ocaf064-B2","doi-asserted-by":"crossref","first-page":"433","DOI":"10.1136\/jech-2020-214259","article-title":"Overview of retrospective data harmonisation in the MINDMAP project: process and results","volume":"75","author":"Wey","year":"2021","journal-title":"J Epidemiol Community Health"},{"key":"2025062620293073600_ocaf064-B3","doi-asserted-by":"crossref","first-page":"2019","DOI":"10.3390\/ijerph16112019","article-title":"The potential for fetal alcohol spectrum disorder prevention of a harmonized approach to data collection about alcohol use in pregnancy cohort studies","volume":"16","author":"Poole","year":"2019","journal-title":"Int J Environ Res Public Health"},{"key":"2025062620293073600_ocaf064-B4","doi-asserted-by":"crossref","first-page":"e023848","DOI":"10.1161\/JAHA.121.023848","article-title":"Practice patterns and outcomes of transcatheter aortic valve replacement in the United States and Japan: a report from joint data harmonization initiative of STS\/ACC TVT and J-TVT","volume":"11","author":"Kaneko","year":"2022","journal-title":"J Am Heart Assoc"},{"key":"2025062620293073600_ocaf064-B5","author":"Common Data Elements: Standardizing Data Collection\u2014FAIR Data: Data Collection and Sharing","year":"2024"},{"key":"2025062620293073600_ocaf064-B6","author":"Common Data Elements: Standardizing Data Collection","year":"2024"},{"key":"2025062620293073600_ocaf064-B7","doi-asserted-by":"crossref","first-page":"993","DOI":"10.1200\/CCI.20.00059","article-title":"Improving cancer data interoperability: the promise of the Minimal Common Oncology Data Elements (mCODE) initiative","volume":"4","author":"Osterman","year":"2020","journal-title":"JCO Clin Cancer Inform"},{"key":"2025062620293073600_ocaf064-B8","doi-asserted-by":"crossref","first-page":"376","DOI":"10.1136\/amiajnl-2010-000061","article-title":"Mapping clinical phenotype data elements to standardized metadata repositories and controlled terminologies: the eMERGE Network experience","volume":"18","author":"Pathak","year":"2011","journal-title":"J Am Med Inform Assoc"},{"key":"2025062620293073600_ocaf064-B9","doi-asserted-by":"crossref","first-page":"967","DOI":"10.1161\/STROKEAHA.111.634352","article-title":"Standardizing the structure of stroke clinical and epidemiologic research data: the National Institute of Neurological Disorders and Stroke (NINDS) Stroke Common Data Element (CDE) project","volume":"43","author":"Saver","year":"2012","journal-title":"Stroke"},{"key":"2025062620293073600_ocaf064-B10","doi-asserted-by":"crossref","first-page":"578","DOI":"10.1007\/s12028-023-01795-1","article-title":"Common data elements for disorders of consciousness: recommendations from the electrophysiology working group","volume":"39","author":"Carroll","year":"2023","journal-title":"Neurocrit Care"},{"key":"2025062620293073600_ocaf064-B11","doi-asserted-by":"crossref","first-page":"598","DOI":"10.1016\/j.jamda.2019.01.123","article-title":"Toward common data elements for international research in long-term care homes: advancing person-centered care","volume":"20","author":"Corazzini","year":"2019","journal-title":"J Am Med Dir Assoc"},{"key":"2025062620293073600_ocaf064-B12","first-page":"681","author":"O\u2019Connor"},{"key":"2025062620293073600_ocaf064-B13","doi-asserted-by":"crossref","first-page":"1725","DOI":"10.1089\/neu.2014.3861","article-title":"Pre-clinical traumatic brain injury common data elements: toward a common language across laboratories","volume":"32","author":"Smith","year":"2015","journal-title":"J Neurotrauma"},{"key":"2025062620293073600_ocaf064-B14","doi-asserted-by":"crossref","first-page":"103421","DOI":"10.1016\/j.jbi.2020.103421","article-title":"FAIR data sharing: the roles of common data elements and harmonization","volume":"107","author":"Kush","year":"2020","journal-title":"J Biomed Inform"},{"key":"2025062620293073600_ocaf064-B15","doi-asserted-by":"crossref","first-page":"160018","DOI":"10.1038\/sdata.2016.18","article-title":"The FAIR guiding principles for scientific data management and stewardship","volume":"3","author":"Wilkinson","year":"2016","journal-title":"Sci Data"},{"key":"2025062620293073600_ocaf064-B16","doi-asserted-by":"crossref","first-page":"182","DOI":"10.1080\/02763869.2024.2323896","article-title":"Common data elements repository","volume":"43","author":"Villere","year":"2024","journal-title":"Med Ref Serv Q"},{"key":"2025062620293073600_ocaf064-B17","author":"Martone"},{"key":"2025062620293073600_ocaf064-B18","author":"NIH CDE Repository","year":"2024"},{"key":"2025062620293073600_ocaf064-B19","author":"Affairs","year":"2019"},{"key":"2025062620293073600_ocaf064-B20","first-page":"465","author":"Tao"},{"key":"2025062620293073600_ocaf064-B21","doi-asserted-by":"crossref","first-page":"30","DOI":"10.3389\/fninf.2015.00030","article-title":"The GAAIN entity mapper: an active-learning system for medical data mapping","volume":"9","author":"Ashish","year":"2016","journal-title":"Frontiers Neuroinform"},{"key":"2025062620293073600_ocaf064-B22","author":"Electronic Medical Records and Genomics (eMERGE) Network","year":"2024"},{"key":"2025062620293073600_ocaf064-B23","doi-asserted-by":"crossref","first-page":"e10205","DOI":"10.2196\/10205","article-title":"The D2Refine platform for the standardization of clinical research study data dictionaries: usability study","volume":"5","author":"Sharma","year":"2018","journal-title":"JMIR Human Factors"},{"key":"2025062620293073600_ocaf064-B24","doi-asserted-by":"crossref","first-page":"103","DOI":"10.1186\/s12911-024-02500-8","article-title":"Mapping of Alzheimer\u2019s disease related data elements and the NIH Common Data Elements","volume":"24","author":"Hao","year":"2024","journal-title":"BMC Med Inform Decis Mak"},{"key":"2025062620293073600_ocaf064-B25","author":"Ram"},{"key":"2025062620293073600_ocaf064-B26","author":"Matentzoglu","year":"2023"},{"key":"2025062620293073600_ocaf064-B27","doi-asserted-by":"crossref","first-page":"2076","DOI":"10.1093\/jamia\/ocae133","article-title":"Fine-tuning large language models for rare disease concept normalization","volume":"31","author":"Wang","year":"2024","journal-title":"J Am Med Inform Assoc"},{"key":"2025062620293073600_ocaf064-B28","author":"OpenAI GPT-4o","year":"2025"},{"key":"2025062620293073600_ocaf064-B29","author":"OpenAI","year":"2024"},{"key":"2025062620293073600_ocaf064-B30","doi-asserted-by":"crossref","first-page":"394","DOI":"10.1097\/ICU.0000000000000869","article-title":"The American Academy of Ophthalmology IRIS Registry (Intelligent Research In Sight): current and future state of big data analytics","volume":"33","author":"Pershing","year":"2022","journal-title":"Curr Opin Ophthalmol"},{"key":"2025062620293073600_ocaf064-B31","doi-asserted-by":"crossref","first-page":"1943","DOI":"10.1161\/STROKEAHA.122.041394","article-title":"Consensus recommendations for standardized data elements, scales, and time segmentations in studies of human circadian\/diurnal biology and stroke","volume":"54","author":"Saver","year":"2023","journal-title":"Stroke"},{"key":"2025062620293073600_ocaf064-B32","author":"The National Alzheimer\u2019s Coordinating Center, Uniform Data Set (UDS)","year":"2024"},{"key":"2025062620293073600_ocaf064-B33","author":"The NHLBI-CONNECTS COVID-19 Therapeutic Trial Common Data Elements","year":"2024"},{"key":"2025062620293073600_ocaf064-B34","author":"CDE Mapping Tool\u2014Evaluation Data","year":"2025"},{"key":"2025062620293073600_ocaf064-B35","author":"NIH CDE Collections","year":"2025"}],"container-title":["Journal of the American Medical Informatics Association"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/jamia\/article-pdf\/32\/7\/1130\/63100954\/ocaf064.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/jamia\/article-pdf\/32\/7\/1130\/63100954\/ocaf064.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,27]],"date-time":"2025-06-27T00:29:40Z","timestamp":1750984180000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/jamia\/article\/32\/7\/1130\/8126535"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,5,7]]},"references-count":35,"journal-issue":{"issue":"7","published-online":{"date-parts":[[2025,5,7]]},"published-print":{"date-parts":[[2025,7,1]]}},"URL":"https:\/\/doi.org\/10.1093\/jamia\/ocaf064","relation":{},"ISSN":["1067-5027","1527-974X"],"issn-type":[{"value":"1067-5027","type":"print"},{"value":"1527-974X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025,7]]},"published":{"date-parts":[[2025,5,7]]}}}