{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,20]],"date-time":"2026-05-20T08:41:15Z","timestamp":1779266475224,"version":"3.51.4"},"reference-count":19,"publisher":"Oxford University Press (OUP)","issue":"5","license":[{"start":{"date-parts":[[2025,3,13]],"date-time":"2025-03-13T00:00:00Z","timestamp":1741824000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/pages\/standard-publication-reuse-rights"}],"funder":[{"name":"Ontario Health"},{"name":"Ontario General Medicine Quality Improvement Network"},{"name":"Unity Health Toronto"},{"name":"AI Research and Education in Medicine"},{"DOI":"10.13039\/501100001804","name":"Canada Research Chairs Program","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100001804","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,5,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Objectives<\/jats:title>\n                  <jats:p>Electronic health records (EHRs) data are increasingly used for research and analysis, but there is little empirical evidence to inform how automated and manual assessments can be combined to efficiently assess data quality in large EHR repositories.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Materials and Methods<\/jats:title>\n                  <jats:p>The GEMINI database collected data from 462\u00a0226 patient admissions across 32 hospitals from 2021 to 2023. We report data quality issues identified through semi-automated and manual data quality assessments completed during the data collection phase. We conducted a simulation experiment to evaluate the relationship between the number of records reviewed manually, the detection of true data errors (true positives) and the number of manual chart abstraction errors (false positives) that required unnecessary investigation.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>The semi-automated data quality assessments identified 79 data quality issues requiring correction, of which 14 had a large impact, affecting at least 50% of records in the data. After resolving issues identified through semi-automated assessments, manual validation of 2676 patient encounters at 19 hospitals identified 4 new meaningful data errors (3 in transfusion data and 1 in physician identifiers), distributed across 4 hospitals. There were 365 manual chart abstraction errors, which required investigation by data analysts to identify as \u201cfalse positives.\u201d These errors increased linearly with the number of charts reviewed manually. Simulation results demonstrate that all 3 transfusion data errors were identified with 95% sensitivity after manual review of 5 records, whereas 18 records were needed for the physician\u2019s table.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Discussion and Conclusion<\/jats:title>\n                  <jats:p>The GEMINI approach represents a scalable framework for data quality assessment and improvement in multisite EHR research databases. Manual data review is important but can be minimized to optimize the trade-off between true and false identification of data quality errors.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/jamia\/ocaf042","type":"journal-article","created":{"date-parts":[[2025,3,13]],"date-time":"2025-03-13T14:44:19Z","timestamp":1741877059000},"page":"835-844","source":"Crossref","is-referenced-by-count":9,"title":["Optimizing the efficiency and effectiveness of data quality assurance in a multicenter clinical dataset"],"prefix":"10.1093","volume":"32","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3886-1853","authenticated-orcid":false,"given":"Anne","family":"Fu","sequence":"first","affiliation":[{"name":"Li Ka Shing Knowledge Institute, St. Michael\u2019s Hospital , Toronto, ON M5C 3G7,","place":["Canada"]},{"name":"Department of Medicine, Temerty Faculty of Medicine, University of Toronto , Toronto, ON M5S 1A8,","place":["Canada"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Trong","family":"Shen","sequence":"additional","affiliation":[{"name":"Li Ka Shing Knowledge Institute, St. Michael\u2019s Hospital , Toronto, ON M5C 3G7,","place":["Canada"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8792-2240","authenticated-orcid":false,"given":"Surain B","family":"Roberts","sequence":"additional","affiliation":[{"name":"Li Ka Shing Knowledge Institute, St. Michael\u2019s Hospital , Toronto, ON M5C 3G7,","place":["Canada"]},{"name":"Institute of Health Policy, Management and Evaluation, University of Toronto , Toronto, ON M5T 3M6,","place":["Canada"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Weihan","family":"Liu","sequence":"additional","affiliation":[{"name":"Li Ka Shing Knowledge Institute, St. Michael\u2019s Hospital , Toronto, ON M5C 3G7,","place":["Canada"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shruthi","family":"Vaidyanathan","sequence":"additional","affiliation":[{"name":"Li Ka Shing Knowledge Institute, St. Michael\u2019s Hospital , Toronto, ON M5C 3G7,","place":["Canada"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kayley-Jasmin","family":"Marchena-Romero","sequence":"additional","affiliation":[{"name":"Li Ka Shing Knowledge Institute, St. Michael\u2019s Hospital , Toronto, ON M5C 3G7,","place":["Canada"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yuen Yu Phyllis","family":"Lam","sequence":"additional","affiliation":[{"name":"Li Ka Shing Knowledge Institute, St. Michael\u2019s Hospital , Toronto, ON M5C 3G7,","place":["Canada"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kieran","family":"Shah","sequence":"additional","affiliation":[{"name":"Li Ka Shing Knowledge Institute, St. Michael\u2019s Hospital , Toronto, ON M5C 3G7,","place":["Canada"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Denise Y F","family":"Mak","sequence":"additional","affiliation":[{"name":"Li Ka Shing Knowledge Institute, St. Michael\u2019s Hospital , Toronto, ON M5C 3G7,","place":["Canada"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"name":"GEMINI Investigators","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Stephen","family":"Chin","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Seth J","family":"Stern","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Radha","family":"Koppula","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Lisa F","family":"Joyce","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Nicholas","family":"Pellegrino","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Nancy","family":"Harris","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Vivian","family":"Ng","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Siddhartha","family":"Srivastava","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Nathaniel","family":"Manikan","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Amelia","family":"Wilkinson","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jenny","family":"Gastmeier","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jason C","family":"Kwan","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hapiloe","family":"Byaruhanga","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Linia","family":"Shaji","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Siji","family":"George","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Stephanie","family":"Handsor","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Reshma Anna","family":"Roy","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chong Sung","family":"Kim","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Selam","family":"Mequanint","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Fahad","family":"Razak","sequence":"additional","affiliation":[{"name":"Li Ka Shing Knowledge Institute, St. Michael\u2019s Hospital , Toronto, ON M5C 3G7,","place":["Canada"]},{"name":"Department of Medicine, Temerty Faculty of Medicine, University of Toronto , Toronto, ON M5S 1A8,","place":["Canada"]},{"name":"Institute of Health Policy, Management and Evaluation, University of Toronto , Toronto, ON M5T 3M6,","place":["Canada"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6857-931X","authenticated-orcid":false,"given":"Amol A","family":"Verma","sequence":"additional","affiliation":[{"name":"Li Ka Shing Knowledge Institute, St. Michael\u2019s Hospital , Toronto, ON M5C 3G7,","place":["Canada"]},{"name":"Department of Medicine, Temerty Faculty of Medicine, University of Toronto , Toronto, ON M5S 1A8,","place":["Canada"]},{"name":"Institute of Health Policy, Management and Evaluation, University of Toronto , Toronto, ON M5T 3M6,","place":["Canada"]}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2025,3,13]]},"reference":[{"key":"2025042203524036200_ocaf042-B1","doi-asserted-by":"crossref","first-page":"e110","DOI":"10.5888\/pcd20.230120","article-title":"Advancing chronic disease practice through the CDC Data Modernization Initiative","volume":"20","author":"Carney","year":"2023","journal-title":"Prev Chronic Dis"},{"key":"2025042203524036200_ocaf042-B2","doi-asserted-by":"crossref","first-page":"162","DOI":"10.1097\/PHH.0000000000001693","article-title":"Leveraging electronic health record data for timely chronic disease surveillance: the Multi-State EHR-based Network for Disease Surveillance","volume":"29","author":"Hohman","year":"2023","journal-title":"J Public Health Manag Pract"},{"key":"2025042203524036200_ocaf042-B3","doi-asserted-by":"crossref","first-page":"1406","DOI":"10.2105\/AJPH.2017.303874","article-title":"State and local chronic disease surveillance using electronic health record systems","volume":"107","author":"Klompas","year":"2017","journal-title":"Am J Public Health"},{"key":"2025042203524036200_ocaf042-B4","doi-asserted-by":"crossref","first-page":"S21","DOI":"10.1097\/MLR.0b013e318257dd67","article-title":"A pragmatic framework for single-site and multisite data quality assessment in electronic health record-based clinical research","volume":"50","author":"Kahn","year":"2012","journal-title":"Med Care"},{"key":"2025042203524036200_ocaf042-B5","doi-asserted-by":"crossref","first-page":"144","DOI":"10.1136\/amiajnl-2011-000681","article-title":"Methods and dimensions of electronic health record data quality assessment: enabling reuse for clinical research","volume":"20","author":"Weiskopf","year":"2013","journal-title":"J Am Med Inform Assoc"},{"key":"2025042203524036200_ocaf042-B6","doi-asserted-by":"crossref","first-page":"578","DOI":"10.1093\/jamia\/ocaa225","article-title":"Assessing the quality of clinical and administrative data extracted from hospitals: the General Medicine Inpatient Initiative (GEMINI) experience","volume":"28","author":"Verma","year":"2021","journal-title":"J Am Med Inform Assoc"},{"key":"2025042203524036200_ocaf042-B7","doi-asserted-by":"crossref","first-page":"18","DOI":"10.13063\/2327-9214.1244","article-title":"A harmonized data quality assessment terminology and framework for the secondary use of electronic health record data","volume":"4","author":"Kahn","year":"2016","journal-title":"EGEMs Gener Evid Meth Improve Patient Outcomes"},{"key":"2025042203524036200_ocaf042-B8","doi-asserted-by":"crossref","first-page":"E842","DOI":"10.9778\/cmajo.20170097","article-title":"Patient characteristics, resource use and outcomes associated with general internal medicine hospital care: the General Medicine Inpatient Initiative (GEMINI) retrospective cohort study","volume":"5","author":"Verma","year":"2017","journal-title":"CMAJ Open"},{"key":"2025042203524036200_ocaf042-B9","volume-title":"Secure Hash Standard","author":"Dang","year":"2015"},{"key":"2025042203524036200_ocaf042-B10","doi-asserted-by":"crossref","first-page":"ooae152","DOI":"10.1093\/jamiaopen\/ooae152","article-title":"pyDeid: an improved, fast, flexible, and generalizable rule-based approach for deidentification of free-text medical records","volume":"8","author":"Sundrelingam","year":"2025","journal-title":"JAMIA Open"},{"key":"2025042203524036200_ocaf042-B11","doi-asserted-by":"crossref","DOI":"10.1161\/01.CIR.101.23.e215","article-title":"PhysioBank, PhysioToolkit, and PhysioNet","volume":"101","author":"Goldberger","year":"2000","journal-title":"Circulation"},{"key":"2025042203524036200_ocaf042-B12","doi-asserted-by":"crossref","first-page":"600","DOI":"10.7326\/0003-4819-153-9-201011020-00010","article-title":"Advancing the science for active surveillance: rationale and design for the Observational Medical Outcomes Partnership","volume":"153","author":"Stang","year":"2010","journal-title":"Ann Intern Med"},{"key":"2025042203524036200_ocaf042-B13","author":"HQO","year":"2019"},{"key":"2025042203524036200_ocaf042-B14","year":"2019"},{"key":"2025042203524036200_ocaf042-B15","doi-asserted-by":"crossref","first-page":"203","DOI":"10.1038\/s41746-024-01196-4","article-title":"The METRIC-framework for assessing data quality for trustworthy AI in medicine: a systematic review","volume":"7","author":"Schwabe","year":"2024","journal-title":"NPJ Digit Med"},{"key":"2025042203524036200_ocaf042-B16","doi-asserted-by":"crossref","first-page":"1730","DOI":"10.1093\/jamia\/ocad120","article-title":"Electronic health record data quality assessment and tools: a systematic review","volume":"30","author":"Lewis","year":"2023","journal-title":"J Am Med Inform Assoc"},{"key":"2025042203524036200_ocaf042-B17","doi-asserted-by":"crossref","first-page":"23","DOI":"10.1007\/s10916-022-01892-2","article-title":"Automating electronic health record data quality assessment","volume":"47","author":"Ozonze","year":"2023","journal-title":"J Med Syst"},{"key":"2025042203524036200_ocaf042-B18","doi-asserted-by":"crossref","first-page":"e43847","DOI":"10.2196\/43847","article-title":"A Standardized Clinical Data Harmonization Pipeline for Scalable AI Application Deployment (FHIR-DHP): validation and usability study","volume":"11","author":"Williams","year":"2023","journal-title":"JMIR Med Inform"},{"key":"2025042203524036200_ocaf042-B19","doi-asserted-by":"crossref","first-page":"e43","DOI":"10.5888\/pcd21.230409","article-title":"Validation of Multi-State EHR-Based Network for Disease Surveillance (MENDS) data and implications for improving data quality and representativeness","volume":"21","author":"Hohman","year":"2024","journal-title":"Prev Chronic Dis"}],"container-title":["Journal of the American Medical Informatics Association"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/jamia\/article-pdf\/32\/5\/835\/62400680\/ocaf042.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/jamia\/article-pdf\/32\/5\/835\/62400680\/ocaf042.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,4,22]],"date-time":"2025-04-22T07:52:54Z","timestamp":1745308374000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/jamia\/article\/32\/5\/835\/8074958"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,3,13]]},"references-count":19,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2025,3,13]]},"published-print":{"date-parts":[[2025,5,1]]}},"URL":"https:\/\/doi.org\/10.1093\/jamia\/ocaf042","relation":{},"ISSN":["1067-5027","1527-974X"],"issn-type":[{"value":"1067-5027","type":"print"},{"value":"1527-974X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025,5]]},"published":{"date-parts":[[2025,3,13]]}}}