{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,20]],"date-time":"2026-05-20T08:41:15Z","timestamp":1779266475219,"version":"3.51.4"},"reference-count":32,"publisher":"Oxford University Press (OUP)","issue":"3","license":[{"start":{"date-parts":[[2020,11,4]],"date-time":"2020-11-04T00:00:00Z","timestamp":1604448000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"name":"Green Shield Canada Foundation"},{"name":"University of Toronto Division of General Internal Medicine"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,3,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Objective<\/jats:title>\n                    <jats:p>Large clinical databases are increasingly used for research and quality improvement. We describe an approach to data quality assessment from the General Medicine Inpatient Initiative (GEMINI), which collects and standardizes administrative and clinical data from hospitals.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Methods<\/jats:title>\n                    <jats:p>The GEMINI database contained 245\u00a0559 patient admissions at 7 hospitals in Ontario, Canada from 2010 to 2017. We performed 7 computational data quality checks and iteratively re-extracted data from hospitals to correct problems. Thereafter, GEMINI data were compared to data that were manually abstracted from the hospital\u2019s electronic medical record for 23\u00a0419 selected data points on a sample of 7488 patients.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>Computational checks flagged 103 potential data quality issues, which were either corrected or documented to inform future analysis. For example, we identified the inclusion of canceled radiology tests, a time shift of transfusion data, and mistakenly processing the chemical symbol for sodium (\u201cNa\u201d) as a missing value. Manual validation identified 1 important data quality issue that was not detected by computational checks: transfusion dates and times at 1 site were unreliable. Apart from that single issue, across all data tables, GEMINI data had high overall accuracy (ranging from 98%\u2013100%), sensitivity (95%\u2013100%), specificity (99%\u2013100%), positive predictive value (93%\u2013100%), and negative predictive value (99%\u2013100%) compared to the gold standard.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Discussion and Conclusion<\/jats:title>\n                    <jats:p>Computational data quality checks with iterative re-extraction facilitated reliable data collection from hospitals but missed 1 critical quality issue. Combining computational and manual approaches may be optimal for assessing the quality of large multisite clinical databases.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/jamia\/ocaa225","type":"journal-article","created":{"date-parts":[[2020,9,14]],"date-time":"2020-09-14T15:13:40Z","timestamp":1600096420000},"page":"578-587","source":"Crossref","is-referenced-by-count":77,"title":["Assessing the quality of clinical and administrative data extracted from hospitals: the General Medicine Inpatient Initiative (GEMINI) experience"],"prefix":"10.1093","volume":"28","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6857-931X","authenticated-orcid":false,"given":"Amol A","family":"Verma","sequence":"first","affiliation":[{"name":"Li Ka Shing Knowledge Institute, St. Michael\u2019s Hospital, Toronto, Ontario, Canada"},{"name":"Department of Medicine, University of Toronto, Toronto, Ontario, Canada"},{"name":"Institute for Health Policy, Management and Evaluation, University of Toronto, Toronto, Ontario, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sachin V","family":"Pasricha","sequence":"additional","affiliation":[{"name":"Li Ka Shing Knowledge Institute, St. Michael\u2019s Hospital, Toronto, Ontario, Canada"},{"name":"School of Medicine, Faculty of Health Sciences, Queen\u2019s University, Kingston, Ontario, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hae Young","family":"Jung","sequence":"additional","affiliation":[{"name":"Li Ka Shing Knowledge Institute, St. Michael\u2019s Hospital, Toronto, Ontario, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Vladyslav","family":"Kushnir","sequence":"additional","affiliation":[{"name":"Li Ka Shing Knowledge Institute, St. Michael\u2019s Hospital, Toronto, Ontario, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Denise Y F","family":"Mak","sequence":"additional","affiliation":[{"name":"Li Ka Shing Knowledge Institute, St. Michael\u2019s Hospital, Toronto, Ontario, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Radha","family":"Koppula","sequence":"additional","affiliation":[{"name":"Li Ka Shing Knowledge Institute, St. Michael\u2019s Hospital, Toronto, Ontario, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yishan","family":"Guo","sequence":"additional","affiliation":[{"name":"Li Ka Shing Knowledge Institute, St. Michael\u2019s Hospital, Toronto, Ontario, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Janice L","family":"Kwan","sequence":"additional","affiliation":[{"name":"Department of Medicine, University of Toronto, Toronto, Ontario, Canada"},{"name":"Department of Medicine, Mount Sinai Hospital, Toronto, Ontario, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Lauren","family":"Lapointe-Shaw","sequence":"additional","affiliation":[{"name":"Department of Medicine, University of Toronto, Toronto, Ontario, Canada"},{"name":"Division of General Internal Medicine, University Health Network, Toronto, Ontario, Canada"},{"name":"Institute for Clinical and Evaluative Sciences, Toronto, Ontario, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shail","family":"Rawal","sequence":"additional","affiliation":[{"name":"Department of Medicine, University of Toronto, Toronto, Ontario, Canada"},{"name":"Division of General Internal Medicine, University Health Network, Toronto, Ontario, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Terence","family":"Tang","sequence":"additional","affiliation":[{"name":"Department of Medicine, University of Toronto, Toronto, Ontario, Canada"},{"name":"Institute for Better Health, Trillium Health Partners, Toronto, Ontario, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Adina","family":"Weinerman","sequence":"additional","affiliation":[{"name":"Department of Medicine, University of Toronto, Toronto, Ontario, Canada"},{"name":"Sunnybrook Health Sciences Centre, Toronto, Ontario, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Fahad","family":"Razak","sequence":"additional","affiliation":[{"name":"Li Ka Shing Knowledge Institute, St. Michael\u2019s Hospital, Toronto, Ontario, Canada"},{"name":"Department of Medicine, University of Toronto, Toronto, Ontario, Canada"},{"name":"Institute for Health Policy, Management and Evaluation, University of Toronto, Toronto, Ontario, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2020,11,4]]},"reference":[{"issue":"5","key":"2021030612311888100_ocaa225-B1","doi-asserted-by":"crossref","first-page":"359","DOI":"10.7326\/0003-4819-151-5-200909010-00141","article-title":"Toward reuse of clinical data for research and quality improvement: the end of the beginning?","volume":"151","author":"Weiner","year":"2009","journal-title":"Ann Intern Med"},{"issue":"3","key":"2021030612311888100_ocaa225-B2","doi-asserted-by":"crossref","first-page":"827","DOI":"10.1093\/ije\/dyv098","article-title":"Data resource profile: clinical practice research datalink (CPRD)","volume":"44","author":"Herrett","year":"2015","journal-title":"Int J Epidemiol"},{"issue":"10","key":"2021030612311888100_ocaa225-B3","first-page":"851","article-title":"Update from CPCSSN","volume":"62","author":"Birtwhistle","year":"2016","journal-title":"Can Fam Physician"},{"issue":"5","key":"2021030612311888100_ocaa225-B4","first-page":"199","article-title":"The American College of Surgeons National Surgical Quality Improvement Program: achieving better and safer surgery","volume":"41","author":"Ko","year":"2015","journal-title":"Jt Comm J Qual Patient Saf"},{"issue":"33","key":"2021030612311888100_ocaa225-B5","doi-asserted-by":"crossref","first-page":"E1054","DOI":"10.1503\/cmaj.170807","article-title":"Routinely collected data: the importance of high-quality diagnostic coding to research","volume":"189","author":"Nicholls","year":"2017","journal-title":"CMAJ"},{"issue":"2","key":"2021030612311888100_ocaa225-B6","doi-asserted-by":"crossref","first-page":"e93","DOI":"10.1002\/bjs.9723","article-title":"The rise of big clinical databases","volume":"102","author":"Cook","year":"2015","journal-title":"Br J Surg"},{"key":"2021030612311888100_ocaa225-B7","doi-asserted-by":"crossref","first-page":"S21","DOI":"10.1097\/MLR.0b013e318257dd67","article-title":"A pragmatic framework for single-site and multisite data quality assessment in electronic health record-based clinical research","volume":"50","author":"Kahn","year":"2012","journal-title":"Med Care"},{"issue":"1","key":"2021030612311888100_ocaa225-B8","doi-asserted-by":"crossref","first-page":"18","DOI":"10.13063\/2327-9214.1244","article-title":"A harmonized data quality assessment terminology and framework for the secondary use of electronic health record data","volume":"4","author":"Kahn","year":"2016","journal-title":"EGEMS"},{"issue":"3","key":"2021030612311888100_ocaa225-B9","doi-asserted-by":"crossref","first-page":"224","DOI":"10.1093\/jamia\/ocx078","article-title":"Assessing the quality of administrative data for research: a framework from the Manitoba Centre for Health Policy","volume":"25","author":"Smith","year":"2018","journal-title":"J Am Med Inform Assoc"},{"issue":"1","key":"2021030612311888100_ocaa225-B10","doi-asserted-by":"crossref","first-page":"144","DOI":"10.1136\/amiajnl-2011-000681","article-title":"Methods and dimensions of electronic health record data quality assessment: enabling reuse for clinical research","volume":"20","author":"Weiskopf","year":"2013","journal-title":"J Am Med Inform Assoc"},{"issue":"3","key":"2021030612311888100_ocaa225-B11","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/1541880.1541883","article-title":"Methodologies for data quality assessment and improvement","volume":"41","author":"Batini","year":"2009","journal-title":"ACM Comput Surv"},{"issue":"4","key":"2021030612311888100_ocaa225-B12","doi-asserted-by":"crossref","first-page":"367","DOI":"10.1370\/afm.1644","article-title":"Validating the 8 CPCSSN case definitions for chronic disease surveillance in a primary care database of electronic health records","volume":"12","author":"Williamson","year":"2014","journal-title":"Ann Fam Med"},{"issue":"1","key":"2021030612311888100_ocaa225-B13","doi-asserted-by":"crossref","first-page":"64","DOI":"10.1183\/20734735.0344-2018","article-title":"How to validate a diagnosis recorded in electronic health records","volume":"15","author":"Nissen","year":"2019","journal-title":"Breathe"},{"key":"2021030612311888100_ocaa225-B14","first-page":"1080","article-title":"A framework for data quality assessment in clinical research datasets","volume":"2017","author":"Lee","year":"2017","journal-title":"AMIA Annu Symp Proc"},{"issue":"0","key":"2021030612311888100_ocaa225-B15","doi-asserted-by":"crossref","first-page":"2","DOI":"10.5334\/dsj-2015-002","article-title":"The challenges of data quality and data quality assessment in the big data era","volume":"14","author":"Cai","year":"2015","journal-title":"Data Sci J"},{"issue":"10","key":"2021030612311888100_ocaa225-B16","doi-asserted-by":"crossref","first-page":"e267","DOI":"10.1371\/journal.pmed.0020267","article-title":"Data cleaning: detecting, diagnosing, and editing data abnormalities","volume":"2","author":"Van den Broeck","year":"2005","journal-title":"PLoS Med"},{"issue":"1","key":"2021030612311888100_ocaa225-B17","doi-asserted-by":"crossref","first-page":"3","DOI":"10.5334\/egems.199","article-title":"Evaluating foundational data quality in the national patient-centered clinical research network (PCORnet(R","volume":"6","author":"Qualls","year":"2018","journal-title":"EGEMS"},{"issue":"1","key":"2021030612311888100_ocaa225-B18","doi-asserted-by":"crossref","first-page":"102","DOI":"10.1186\/s12874-019-0737-5","article-title":"The utility of multivariate outlier detection techniques for data quality evaluation in large studies: an application within the ONDRI project","volume":"19","author":"Sunderland","year":"2019","journal-title":"BMC Med Res Methodol"},{"issue":"4","key":"2021030612311888100_ocaa225-B19","doi-asserted-by":"crossref","first-page":"E842","DOI":"10.9778\/cmajo.20170097","article-title":"Patient characteristics, resource use and outcomes associated with general internal medicine hospital care: the General Medicine Inpatient Initiative (GEMINI) retrospective cohort study","volume":"5","author":"Verma","year":"2017","journal-title":"CMAJ Open"},{"key":"2021030612311888100_ocaa225-B20","year":"2019"},{"key":"2021030612311888100_ocaa225-B21","year":"2019"},{"key":"2021030612311888100_ocaa225-B22","doi-asserted-by":"crossref","DOI":"10.1201\/b14764","volume-title":"Guide to the De-identification of Personal Health Information","author":"El Emam","year":"2013"},{"issue":"3","key":"2021030612311888100_ocaa225-B23","doi-asserted-by":"crossref","first-page":"190","DOI":"10.1016\/j.ijmedinf.2010.10.016","article-title":"A methodology for the pseudonymization of medical data","volume":"80","author":"Neubauer","year":"2011","journal-title":"Int J Med Inform"},{"key":"2021030612311888100_ocaa225-B24","volume-title":"Federal Information Processing Standards Publication","author":"Dang","year":"2015"},{"issue":"1","key":"2021030612311888100_ocaa225-B25","doi-asserted-by":"crossref","first-page":"32","DOI":"10.1186\/1472-6947-14-32","article-title":"Influence of data quality on computed Dutch hospital quality indicators: a case study in colorectal cancer surgery","volume":"14","author":"Dentler","year":"2014","journal-title":"BMC Med Inform Decis Mak"},{"key":"2021030612311888100_ocaa225-B26","year":"2009"},{"issue":"4","key":"2021030612311888100_ocaa225-B27","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1080\/07421222.1996.11518099","article-title":"Beyond accuracy: what data quality means to data consumers","volume":"12","author":"Wang","year":"1996","journal-title":"J Manag Inform Syst"},{"issue":"1","key":"2021030612311888100_ocaa225-B28","doi-asserted-by":"crossref","first-page":"107","DOI":"10.1186\/s12911-017-0504-7","article-title":"Validation of multisource electronic health record data: an application to blood transfusion data","volume":"17","author":"Hoeven","year":"2017","journal-title":"BMC Med Inform Decis Mak"},{"issue":"1","key":"2021030612311888100_ocaa225-B29","doi-asserted-by":"crossref","first-page":"30","DOI":"10.1186\/s12911-019-0740-0","article-title":"A basic model for assessing primary health care electronic medical record data quality","volume":"19","author":"Terry","year":"2019","journal-title":"BMC Med Inform Decis Mak"},{"issue":"18","key":"2021030612311888100_ocaa225-B30","doi-asserted-by":"crossref","first-page":"847","DOI":"10.1212\/WNL.0000000000007404","article-title":"Axon Registry(R) data validation: accuracy assessment of data extraction and measure specification","volume":"92","author":"Baca","year":"2019","journal-title":"Neurology"},{"issue":"7","key":"2021030612311888100_ocaa225-B31","doi-asserted-by":"crossref","first-page":"862","DOI":"10.1093\/jamia\/ocy041","article-title":"Using statistical anomaly detection models to find clinical decision support malfunctions","volume":"25","author":"Ray","year":"2018","journal-title":"J Am Med Inform Assoc"},{"issue":"1","key":"2021030612311888100_ocaa225-B32","doi-asserted-by":"crossref","first-page":"140","DOI":"10.1093\/jamia\/ocw067","article-title":"Comparison of accuracy of physical examination findings in initial progress notes between paper charts and a newly implemented electronic health record","volume":"24","author":"Yadav","year":"2017","journal-title":"J Am Med Inform Assoc"}],"container-title":["Journal of the American Medical Informatics Association"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/jamia\/article-pdf\/28\/3\/578\/36428672\/ocaa225.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"http:\/\/academic.oup.com\/jamia\/article-pdf\/28\/3\/578\/36428672\/ocaa225.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,11,18]],"date-time":"2022-11-18T01:49:47Z","timestamp":1668736187000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/jamia\/article\/28\/3\/578\/5961438"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,11,4]]},"references-count":32,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2020,11,4]]},"published-print":{"date-parts":[[2021,3,1]]}},"URL":"https:\/\/doi.org\/10.1093\/jamia\/ocaa225","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2020.03.16.20036962","asserted-by":"object"}]},"ISSN":["1527-974X"],"issn-type":[{"value":"1527-974X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021,3,1]]},"published":{"date-parts":[[2020,11,4]]}}}