{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,22]],"date-time":"2026-01-22T19:15:02Z","timestamp":1769109302343,"version":"3.49.0"},"reference-count":26,"publisher":"Oxford University Press (OUP)","issue":"12","license":[{"start":{"date-parts":[[2023,8,9]],"date-time":"2023-08-09T00:00:00Z","timestamp":1691539200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100006108","name":"National Center for Advancing Translational Sciences","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100006108","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["U24 TR002306"],"award-info":[{"award-number":["U24 TR002306"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["U01 TR002062"],"award-info":[{"award-number":["U01 TR002062"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000865","name":"Bill & Melinda Gates Foundation","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000865","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023,11,17]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Despite recent methodology advancements in clinical natural language processing (NLP), the adoption of clinical NLP models within the translational research community remains hindered by process heterogeneity and human factor variations. Concurrently, these factors also dramatically increase the difficulty in developing NLP models in multi-site settings, which is necessary for algorithm robustness and generalizability. Here, we reported on our experience developing an NLP solution for Coronavirus Disease 2019 (COVID-19) signs and symptom extraction in an open NLP framework from a subset of sites participating in the National COVID Cohort (N3C). We then empirically highlight the benefits of multi-site data for both symbolic and statistical methods, as well as highlight the need for federated annotation and evaluation to resolve several pitfalls encountered in the course of these efforts.<\/jats:p>","DOI":"10.1093\/jamia\/ocad134","type":"journal-article","created":{"date-parts":[[2023,8,8]],"date-time":"2023-08-08T20:08:53Z","timestamp":1691525333000},"page":"2036-2040","source":"Crossref","is-referenced-by-count":15,"title":["An open natural language processing (NLP) framework for EHR-based clinical research: a case demonstration using the National COVID Cohort Collaborative (N3C)"],"prefix":"10.1093","volume":"30","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9763-1164","authenticated-orcid":false,"given":"Sijia","family":"Liu","sequence":"first","affiliation":[{"name":"Department of Artificial Intelligence and Informatics, Mayo Clinic , Rochester, Minnesota, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9090-8028","authenticated-orcid":false,"given":"Andrew","family":"Wen","sequence":"additional","affiliation":[{"name":"Department of Artificial Intelligence and Informatics, Mayo Clinic , Rochester, Minnesota, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9970-8604","authenticated-orcid":false,"given":"Liwei","family":"Wang","sequence":"additional","affiliation":[{"name":"Department of Artificial Intelligence and Informatics, Mayo Clinic , Rochester, Minnesota, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1312-4195","authenticated-orcid":false,"given":"Huan","family":"He","sequence":"additional","affiliation":[{"name":"Department of Artificial Intelligence and Informatics, Mayo Clinic , Rochester, Minnesota, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1691-5179","authenticated-orcid":false,"given":"Sunyang","family":"Fu","sequence":"additional","affiliation":[{"name":"Department of Artificial Intelligence and Informatics, Mayo Clinic , Rochester, Minnesota, USA"}]},{"given":"Robert","family":"Miller","sequence":"additional","affiliation":[{"name":"Tufts Clinical and Translational Science Institute, Tufts Medical Center , Boston, Massachusetts, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0692-412X","authenticated-orcid":false,"given":"Andrew","family":"Williams","sequence":"additional","affiliation":[{"name":"Tufts Clinical and Translational Science Institute, Tufts Medical Center , Boston, Massachusetts, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9139-3433","authenticated-orcid":false,"given":"Daniel","family":"Harris","sequence":"additional","affiliation":[{"name":"Department of Internal Medicine, University of Kentucky , Lexington, Kentucky, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1238-9378","authenticated-orcid":false,"given":"Ramakanth","family":"Kavuluru","sequence":"additional","affiliation":[{"name":"Department of Internal Medicine, University of Kentucky , Lexington, Kentucky, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8036-2110","authenticated-orcid":false,"given":"Mei","family":"Liu","sequence":"additional","affiliation":[{"name":"Department of Internal Medicine, University of Kansas Medical Center , Kansas City, Kansas, USA"}]},{"given":"Noor","family":"Abu-el-Rub","sequence":"additional","affiliation":[{"name":"Department of Internal Medicine, University of Kansas Medical Center , Kansas City, Kansas, USA"}]},{"given":"Dalton","family":"Schutte","sequence":"additional","affiliation":[{"name":"Department of Pharmaceutical Care & Health Systems, University of Minnesota at Twin Cities , Minneapolis, Minnesota, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8258-3585","authenticated-orcid":false,"given":"Rui","family":"Zhang","sequence":"additional","affiliation":[{"name":"Department of Pharmaceutical Care & Health Systems, University of Minnesota at Twin Cities , Minneapolis, Minnesota, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9006-6112","authenticated-orcid":false,"given":"Masoud","family":"Rouhizadeh","sequence":"additional","affiliation":[{"name":"Department of Pharmaceutical Outcomes & Policy, University of Florida, Gainesville , Florida, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0851-1150","authenticated-orcid":false,"given":"John D","family":"Osborne","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Alabama at Birmingham , Birmingham, Alabama, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9189-9661","authenticated-orcid":false,"given":"Yongqun","family":"He","sequence":"additional","affiliation":[{"name":"Department of Computational Medicine and Bioinformatics, University of Michigan Medical School , Ann Arbor, Michigan, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3241-8773","authenticated-orcid":false,"given":"Umit","family":"Topaloglu","sequence":"additional","affiliation":[{"name":"Department of Cancer Biology, Wake Forest School of Medicine , Winston-Salem, North Carolina, USA"}]},{"given":"Stephanie S","family":"Hong","sequence":"additional","affiliation":[{"name":"Department of Medicine, Johns Hopkins University , Baltimore, Maryland, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3451-2165","authenticated-orcid":false,"given":"Joel H","family":"Saltz","sequence":"additional","affiliation":[{"name":"Department of Biomedical Informatics, Stony Brook University , Stony Brook, New York, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8242-9462","authenticated-orcid":false,"given":"Thomas","family":"Schaffter","sequence":"additional","affiliation":[{"name":"Sage Bionetwork , Seattle, Washington, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6840-9756","authenticated-orcid":false,"given":"Emily","family":"Pfaff","sequence":"additional","affiliation":[{"name":"Department of Medicine, University of North Carolina Chapel Hill , Chapel Hill, North Carolina, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5437-2545","authenticated-orcid":false,"given":"Christopher G","family":"Chute","sequence":"additional","affiliation":[{"name":"Department of Medicine, Johns Hopkins University , Baltimore, Maryland, USA"}]},{"given":"Tim","family":"Duong","sequence":"additional","affiliation":[{"name":"Department of Radiology, Albert Einstein College of Medicine , Bronx, New York, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9114-8737","authenticated-orcid":false,"given":"Melissa A","family":"Haendel","sequence":"additional","affiliation":[{"name":"Center for Health AI, University of Colorado Anschutz Medical Campus , Denver, Colorado, USA"}]},{"given":"Rafael","family":"Fuentes","sequence":"additional","affiliation":[{"name":"Alex Informatics , North Bethesda, Maryland, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8411-6403","authenticated-orcid":false,"given":"Peter","family":"Szolovits","sequence":"additional","affiliation":[{"name":"Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology , Cambridge, Massachusetts, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5274-4672","authenticated-orcid":false,"given":"Hua","family":"Xu","sequence":"additional","affiliation":[{"name":"School of Biomedical Informatics, University of Texas Health Science Center at Houston , Houston, Texas, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2570-3741","authenticated-orcid":false,"given":"Hongfang","family":"Liu","sequence":"additional","affiliation":[{"name":"Department of Artificial Intelligence and Informatics, Mayo Clinic , Rochester, Minnesota, USA"},{"name":"School of Biomedical Informatics, University of Texas Health Science Center at Houston , Houston, Texas, USA"}]}],"member":"286","published-online":{"date-parts":[[2023,8,9]]},"reference":[{"issue":"2","key":"2023111709543729300_ocad134-B1","doi-asserted-by":"crossref","first-page":"181","DOI":"10.1136\/jamia.2010.007237","article-title":"Data from clinical notes: a perspective on the tension between structure and flexible documentation","volume":"18","author":"Rosenbloom","year":"2011","journal-title":"J Am Med Inform Assoc"},{"issue":"3","key":"2023111709543729300_ocad134-B2","doi-asserted-by":"crossref","first-page":"e12802","DOI":"10.2196\/12802","article-title":"Artificial intelligence and the future of primary care: exploratory qualitative study of UK General Practitioners' Views","volume":"21","author":"Blease","year":"2019","journal-title":"J Med Internet Res"},{"key":"2023111709543729300_ocad134-B3","doi-asserted-by":"crossref","first-page":"103526","DOI":"10.1016\/j.jbi.2020.103526","article-title":"Clinical concept extraction: a methodology review","volume":"109","author":"Fu","year":"2020","journal-title":"J Biomed Inform"},{"issue":"25","key":"2023111709543729300_ocad134-B4","doi-asserted-by":"crossref","first-page":"2409","DOI":"10.1056\/NEJMp1605378","article-title":"From patient to patient\u2013sharing the data from clinical trials","volume":"374","author":"Haug","year":"2016","journal-title":"N Engl J Med"},{"issue":"13","key":"2023111709543729300_ocad134-B5","doi-asserted-by":"crossref","first-page":"e1313","DOI":"10.1212\/WNL.0000000000012602","article-title":"Association of silent cerebrovascular disease identified using natural language processing and future ischemic stroke","volume":"97","author":"Kent","year":"2021","journal-title":"Neurology"},{"key":"2023111709543729300_ocad134-B6","doi-asserted-by":"crossref","first-page":"100608","DOI":"10.1016\/j.conctc.2020.100608","article-title":"Site engagement for multi-site clinical trials","volume":"19","author":"Goodlett","year":"2020","journal-title":"Contemp Clin Trials Commun"},{"issue":"1","key":"2023111709543729300_ocad134-B7","first-page":"1041","article-title":"eGEMs: pathways to success for multisite clinical data research","volume":"1","author":"McGraw Jd","year":"2013","journal-title":"EGEMS (Wash DC)"},{"key":"2023111709543729300_ocad134-B8","first-page":"577","article-title":"A study of transportability of an existing smoking status detection module across institutions","volume":"2012","author":"Liu","year":"2012","journal-title":"AMIA Annu Symp Proc"},{"issue":"3","key":"2023111709543729300_ocad134-B9","doi-asserted-by":"crossref","first-page":"353","DOI":"10.1093\/jamia\/ocx138","article-title":"Clinical documentation variations and NLP system portability: a case study in asthma birth cohorts across institutions","volume":"25","author":"Sohn","year":"2018","journal-title":"J Am Med Inform Assoc"},{"issue":"8","key":"2023111709543729300_ocad134-B10","doi-asserted-by":"crossref","first-page":"e38155","DOI":"10.2196\/38155","article-title":"Multicenter validation of natural language processing algorithms for the detection of common data elements in operative notes for total hip arthroplasty: algorithm development and validation","volume":"10","author":"Han","year":"2022","journal-title":"JMIR Med Inform"},{"key":"2023111709543729300_ocad134-B11","first-page":"604","article-title":"Identification of patients with family history of pancreatic cancer\u2013investigation of an NLP system portability","volume":"216","author":"Mehrabi","year":"2015","journal-title":"Stud Health Technol Inform"},{"issue":"4","key":"2023111709543729300_ocad134-B12","doi-asserted-by":"crossref","first-page":"554","DOI":"10.1093\/jamiaopen\/ooz035","article-title":"Heterogeneity introduced by EHR system implementation in a de-identified data resource from 100 non-affiliated organizations","volume":"2","author":"Glynn","year":"2019","journal-title":"JAMIA Open"},{"issue":"1","key":"2023111709543729300_ocad134-B13","doi-asserted-by":"crossref","first-page":"60","DOI":"10.1186\/s12911-020-1072-9","article-title":"Assessment of the impact of EHR heterogeneity for clinical research through a case study of silent brain infarction","volume":"20","author":"Fu","year":"2020","journal-title":"BMC Med Inform Decis Mak"},{"key":"2023111709543729300_ocad134-B14","first-page":"149","article-title":"An information extraction framework for cohort identification using electronic health records","volume":"2013","author":"Liu","year":"2013","journal-title":"AMIA Jt Summits Transl Sci Proc"},{"key":"2023111709543729300_ocad134-B15","doi-asserted-by":"crossref","first-page":"130","DOI":"10.1038\/s41746-019-0208-8","article-title":"Desiderata for delivering NLP to accelerate healthcare AI advancement and a Mayo Clinic NLP-as-a-service implementation","volume":"2","author":"Wen","year":"2019","journal-title":"NPJ Digit Med"},{"key":"2023111709543729300_ocad134-B16","doi-asserted-by":"publisher","author":"Rando","year":"2021","DOI":"10.1101\/2021.03.20.21253896"},{"issue":"20","key":"2023111709543729300_ocad134-B17","doi-asserted-by":"crossref","first-page":"2232","DOI":"10.1200\/JCO.21.01074","article-title":"Outcomes of COVID-19 in Patients With Cancer: Report From the National COVID Cohort Collaborative (N3C)","volume":"39","author":"Sharafeldin","year":"2021","journal-title":"JCO"},{"issue":"3","key":"2023111709543729300_ocad134-B18","doi-asserted-by":"crossref","first-page":"427","DOI":"10.1093\/jamia\/ocaa196","article-title":"The National COVID Cohort Collaborative (N3C): Rationale, design, infrastructure, and deployment","volume":"28","author":"Haendel","year":"2021","journal-title":"J Am Med Inform Assoc"},{"issue":"1","key":"2023111709543729300_ocad134-B19","doi-asserted-by":"crossref","first-page":"181","DOI":"10.1038\/s41597-020-0523-6","article-title":"CIDO, a community-based ontology for coronavirus disease knowledge and data integration, sharing, and analysis","volume":"7","author":"He","year":"2020","journal-title":"Sci Data"},{"issue":"D1","key":"2023111709543729300_ocad134-B20","doi-asserted-by":"crossref","first-page":"D1207","DOI":"10.1093\/nar\/gkaa1043","article-title":"The Human Phenotype Ontology in 2021","volume":"49","author":"K\u00f6hler","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2023111709543729300_ocad134-B21","volume-title":"Publicly Available Clinical BERT Embeddings","author":"Alsentzer","year":"2019"},{"key":"2023111709543729300_ocad134-B22"},{"key":"2023111709543729300_ocad134-B23","doi-asserted-by":"crossref","first-page":"101139","DOI":"10.1016\/j.imu.2022.101139","article-title":"Comparison of BERT implementations for natural language processing of narrative medical documents","volume":"36","author":"Turchin","year":"2023","journal-title":"Inform Med Unlocked"},{"key":"2023111709543729300_ocad134-B24","author":"Zhang","year":"2020"},{"issue":"3","key":"2023111709543729300_ocad134-B25","first-page":"189","article-title":"Bootstrap confidence intervals","volume":"11","author":"Thomas","year":"1996","journal-title":"Stat Sci"},{"issue":"6","key":"2023111709543729300_ocad134-B26","doi-asserted-by":"crossref","first-page":"e2200006","DOI":"10.1200\/CCI.22.00006","article-title":"Assessment of electronic health record for cancer research and patient care through a scoping review of cancer natural language processing","volume":"6","author":"Wang","year":"2022","journal-title":"JCO Clin Cancer Inform"}],"container-title":["Journal of the American Medical Informatics Association"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/jamia\/advance-article-pdf\/doi\/10.1093\/jamia\/ocad134\/51075834\/ocad134.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/jamia\/article-pdf\/30\/12\/2036\/53477548\/ocad134.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/jamia\/article-pdf\/30\/12\/2036\/53477548\/ocad134.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,17]],"date-time":"2023-11-17T13:27:34Z","timestamp":1700227654000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/jamia\/article\/30\/12\/2036\/7239870"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,8,9]]},"references-count":26,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2023,8,9]]},"published-print":{"date-parts":[[2023,11,17]]}},"URL":"https:\/\/doi.org\/10.1093\/jamia\/ocad134","relation":{},"ISSN":["1067-5027","1527-974X"],"issn-type":[{"value":"1067-5027","type":"print"},{"value":"1527-974X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2023,12,1]]},"published":{"date-parts":[[2023,8,9]]}}}