{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T11:42:31Z","timestamp":1774957351957,"version":"3.50.1"},"reference-count":64,"publisher":"Oxford University Press (OUP)","issue":"3","license":[{"start":{"date-parts":[[2019,12,16]],"date-time":"2019-12-16T00:00:00Z","timestamp":1576454400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"name":"State Department of Health","award":["DG_2016_0601_001"],"award-info":[{"award-number":["DG_2016_0601_001"]}]},{"name":"State Department of Health","award":["DG_2016_0601_002"],"award-info":[{"award-number":["DG_2016_0601_002"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2020,3,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Objective<\/jats:title><jats:p>Development of systematic approaches for understanding and assessing data quality is becoming increasingly important as the volume and utilization of health data steadily increases. In this study, a taxonomy of data defects was developed and utilized when automatically detecting defects to assess Medicaid data quality maintained by one of the states in the United States.<\/jats:p><\/jats:sec><jats:sec><jats:title>Materials and Methods<\/jats:title><jats:p>There were more than 2.23 million rows and 32 million cells in the Medicaid data examined. The taxonomy was developed through document review, descriptive data analysis, and literature review. A software program was created to automatically detect defects by using a set of constraints whose development was facilitated by the taxonomy.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>Five major categories and seventeen subcategories of defects were identified. The major categories are missingness, incorrectness, syntax violation, semantic violation, and duplicity. More than 3 million defects were detected indicating substantial problems with data quality. Defect density exceeded 10% in five tables. The majority of the data defects belonged to format mismatch, invalid code, dependency-contract violation, and implausible value types. Such contextual knowledge can support prioritized quality improvement initiatives for the Medicaid data studied.<\/jats:p><\/jats:sec><jats:sec><jats:title>Conclusions<\/jats:title><jats:p>This research took the initial steps to understand the types of data defects and detect defects in large healthcare datasets. The results generally suggest that healthcare organizations can potentially benefit from focusing on data quality improvement. For those purposes, the taxonomy developed and the approach followed in this study can be adopted.<\/jats:p><\/jats:sec>","DOI":"10.1093\/jamia\/ocz201","type":"journal-article","created":{"date-parts":[[2019,11,26]],"date-time":"2019-11-26T12:11:28Z","timestamp":1574770288000},"page":"386-395","source":"Crossref","is-referenced-by-count":23,"title":["Understanding and detecting defects in healthcare administration data: Toward higher data quality to better support healthcare operations and decisions"],"prefix":"10.1093","volume":"27","author":[{"given":"Yili","family":"Zhang","sequence":"first","affiliation":[{"name":"Department of Information Systems, University of Maryland, Baltimore County, Baltimore, Maryland, USA"},{"name":"Postdoctoral Fellow the Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA"}]},{"given":"G\u00fcne\u015f","family":"Koru","sequence":"additional","affiliation":[{"name":"Department of Information Systems, University of Maryland, Baltimore County, Baltimore, Maryland, USA"}]}],"member":"286","published-online":{"date-parts":[[2019,12,16]]},"reference":[{"issue":"5p2","key":"2020110613091641600_ocz201-B1","doi-asserted-by":"crossref","first-page":"1442","DOI":"10.1111\/j.1475-6773.2010.01140.x","article-title":"Data governance and stewardship: designing data stewardship entities and advancing data access","volume":"45","author":"Rosenbaum","year":"2010","journal-title":"Health Serv Res"},{"issue":"6","key":"2020110613091641600_ocz201-B2","doi-asserted-by":"crossref","first-page":"569","DOI":"10.1136\/jamia.2000.0070569","article-title":"Impact of a computer-based patient record system on data collection, knowledge organization, and reasoning","volume":"7","author":"Patel","year":"2000","journal-title":"J Am Med Inform Assoc"},{"key":"2020110613091641600_ocz201-B3","first-page":"522\u20139","author":"Dunkel","year":"1999"},{"issue":"9","key":"2020110613091641600_ocz201-B4","doi-asserted-by":"crossref","first-page":"862","DOI":"10.1002\/(SICI)1097-4571(199709)48:9<862::AID-ASI12>3.0.CO;2-T","article-title":"Data mining with neural networks: solving business problems from application development to decision support","volume":"48","author":"Schroeder","year":"1997","journal-title":"J Am Soc Inf Sci"},{"issue":"2","key":"2020110613091641600_ocz201-B5","first-page":"311","article-title":"The perfect neuroimaging-genetics-computation storm: collision of petabytes of data, millions of hardware devices and thousands of software tools","volume":"8","author":"Dinov","year":"2014","journal-title":"Brain Imaging Behav"},{"issue":"1","key":"2020110613091641600_ocz201-B6","doi-asserted-by":"crossref","first-page":"3.","DOI":"10.7243\/2053-7662-4-3","article-title":"Volume and value of big healthcare data","volume":"4","author":"Dinov","year":"2016","journal-title":"J Med Stat Inform"},{"issue":"6","key":"2020110613091641600_ocz201-B7","doi-asserted-by":"crossref","first-page":"1085","DOI":"10.1093\/jamia\/ocw010","article-title":"Applying probabilistic temporal and multisite data quality control methods to a public health mortality registry in Spain: a systematic approach to quality control of repositories","volume":"23","author":"S\u00e1ez","year":"2016","journal-title":"J Am Med Inform Assoc"},{"issue":"6","key":"2020110613091641600_ocz201-B8","doi-asserted-by":"crossref","first-page":"1107","DOI":"10.1093\/jamia\/ocw013","article-title":"Data quality of electronic medical records in Manitoba: do problem lists accurately reflect chronic disease billing diagnoses?","volume":"23","author":"Singer","year":"2016","journal-title":"J Am Med Inform Assoc"},{"issue":"3","key":"2020110613091641600_ocz201-B9","doi-asserted-by":"crossref","first-page":"627","DOI":"10.1093\/jamia\/ocv156","article-title":"Assessing race and ethnicity data quality across cancer registries and EMRs in two hospitals","volume":"23","author":"Lee","year":"2016","journal-title":"J Am Med Inform Assoc"},{"issue":"5","key":"2020110613091641600_ocz201-B10","doi-asserted-by":"crossref","first-page":"103","DOI":"10.1145\/253769.253804","article-title":"Data quality in context","volume":"40","author":"Strong","year":"1997","journal-title":"Commun ACM"},{"issue":"1","key":"2020110613091641600_ocz201-B11","doi-asserted-by":"crossref","first-page":"1328185.","DOI":"10.1080\/16549716.2017.1328185","article-title":"Child anthropometry data quality from Demographic and Health Surveys, Multiple Indicator Cluster Surveys, and National Nutrition Surveys in the West Central Africa region: are we comparing apples and oranges?","volume":"10","author":"Corsi","year":"2017","journal-title":"Glob Health Action"},{"issue":"2","key":"2020110613091641600_ocz201-B12","doi-asserted-by":"crossref","first-page":"e15.","DOI":"10.2196\/medinform.6226","article-title":"Applying STOPP guidelines in primary care through electronic medical record decision support: randomized control trial highlighting the importance of data quality","volume":"5","author":"Price","year":"2017","journal-title":"JMIR Med Inform"},{"issue":"1","key":"2020110613091641600_ocz201-B13","doi-asserted-by":"crossref","first-page":"106","DOI":"10.1136\/jamia.2000.0070106","article-title":"Assessing data quality from concordance, through correctness and completeness, to valid manipulatable representations","volume":"7","author":"Brennan","year":"2000","journal-title":"J Am Med Inform Assoc"},{"key":"2020110613091641600_ocz201-B14","volume-title":"Preventing Death and Injury from Medical Errors Requires Dramatic, Systemwide Changes. Press Release","author":"Tickner","year":"1999"},{"issue":"1","key":"2020110613091641600_ocz201-B15","doi-asserted-by":"crossref","first-page":"144","DOI":"10.1136\/amiajnl-2011-000681","article-title":"Methods and dimensions of electronic health record data quality assessment: enabling reuse for clinical research","volume":"20","author":"Weiskopf","year":"2013","journal-title":"J Am Med Inform Assoc"},{"key":"2020110613091641600_ocz201-B16","article-title":"Poor data management costs healthcare providers","author":"Lewis","year":"2012","journal-title":"Inf Week Healthc"},{"issue":"8","key":"2020110613091641600_ocz201-B17","doi-asserted-by":"crossref","first-page":"466","DOI":"10.1016\/j.annepidem.2017.07.001","article-title":"Fetal death certificate data quality: a tale of two US counties","volume":"27","author":"Christiansen-Lindquist","year":"2017","journal-title":"Ann Epidemiol"},{"issue":"1","key":"2020110613091641600_ocz201-B18","doi-asserted-by":"crossref","first-page":"3\u201311.","DOI":"10.23876\/j.krcp.2017.36.1.3","article-title":"Medical big data: promise and challenges","volume":"36","author":"Lee","year":"2017","journal-title":"Kidney Res Clin Pract"},{"issue":"5","key":"2020110613091641600_ocz201-B19","doi-asserted-by":"crossref","first-page":"279","DOI":"10.14778\/1952376.1952378","article-title":"Guided data repair","volume":"4","author":"Yakout","year":"2011","journal-title":"Proc VLDB Endow"},{"key":"2020110613091641600_ocz201-B20","first-page":"1\u20135.","article-title":"Secondary use of EHR: data quality issues and informatics opportunities","volume":"2010","author":"Botsis","year":"2010","journal-title":"Summit Transl Bioinforma"},{"issue":"4","key":"2020110613091641600_ocz201-B21","first-page":"189\u201399.","article-title":"Agreement between physicians\u2019 office records and Medicare part B claims data","volume":"16","author":"Fowles","year":"1995","journal-title":"Health Care Financ Rev"},{"issue":"1","key":"2020110613091641600_ocz201-B22","doi-asserted-by":"crossref","first-page":"81","DOI":"10.1093\/jamia\/ocw054","article-title":"Improving the quality of EHR recording in primary care: a data quality feedback tool","volume":"24","author":"Van Der Bij","year":"2017","journal-title":"J Am Med Inform Assoc"},{"issue":"1","key":"2020110613091641600_ocz201-B23","doi-asserted-by":"crossref","first-page":"78","DOI":"10.1197\/jamia.M1362","article-title":"Data quality of general practice electronic health records: the impact of a program of assessments, feedback, and training","volume":"11","author":"Porcheret","year":"2004","journal-title":"J Am Med Inform Assoc"},{"issue":"2","key":"2020110613091641600_ocz201-B24","doi-asserted-by":"crossref","first-page":"104","DOI":"10.1197\/jamia.M1471","article-title":"Some unintended consequences of information technology in health care: the nature of patient care information system-related errors","volume":"11","author":"Ash","year":"2003","journal-title":"J Am Med Inform Assoc"},{"issue":"9","key":"2020110613091641600_ocz201-B25","doi-asserted-by":"crossref","first-page":"1060","DOI":"10.1109\/PROC.1980.11805","article-title":"Programs, life cycles, and laws of software evolution","volume":"68","author":"Lehman","year":"1980","journal-title":"Proc IEEE"},{"key":"2020110613091641600_ocz201-B26","volume-title":"Program Evolution: Processes of Software Change","author":"Lehman","year":"1985"},{"key":"2020110613091641600_ocz201-B27","first-page":"20","author":"Lehman","year":"1997"},{"key":"2020110613091641600_ocz201-B28","doi-asserted-by":"crossref","first-page":"174","DOI":"10.1007\/978-3-642-54092-9_13","volume-title":"ENASE 2013: Evaluation of Novel Approaches to Software Engineering","author":"Drouin","year":"2013"},{"issue":"11","key":"2020110613091641600_ocz201-B29","doi-asserted-by":"crossref","first-page":"81","DOI":"10.1145\/163359.163375","article-title":"Software complexity and maintenance costs","volume":"36","author":"Banker","year":"1993","journal-title":"Commun ACM"},{"issue":"1","key":"2020110613091641600_ocz201-B30","doi-asserted-by":"crossref","first-page":"304.","DOI":"10.1186\/s12913-017-2247-7","article-title":"The quality of Medicaid and Medicare data obtained from CMS and its contractors: implications for pharmacoepidemiology","volume":"17","author":"Leonard","year":"2017","journal-title":"BMC Health Serv Res"},{"key":"2020110613091641600_ocz201-B31","first-page":"1","author":"Rabia","year":"2018"},{"issue":"1","key":"2020110613091641600_ocz201-B32","doi-asserted-by":"crossref","first-page":"225","DOI":"10.1109\/TASE.2016.2603420","article-title":"Data-defect inspection with kernel-neighbor-density-change outlier factor","volume":"15","author":"Cao","year":"2018","journal-title":"IEEE Trans Automat Sci Eng"},{"key":"2020110613091641600_ocz201-B33","first-page":"60","article-title":"Automated tools for clinical research data quality control using NCI common data elements","volume":"2014","author":"Hudson","year":"2014","journal-title":"AMIA Jt Summits Transl Sci Proc"},{"issue":"3","key":"2020110613091641600_ocz201-B34","doi-asserted-by":"crossref","first-page":"192","DOI":"10.1097\/PEP.0000000000000425","article-title":"Therapy use for children with developmental conditions: analysis of Colorado Medicaid data","volume":"29","author":"McManus","year":"2017","journal-title":"Pediatr Phys Ther"},{"issue":"6","key":"2020110613091641600_ocz201-B35","doi-asserted-by":"crossref","first-page":"646","DOI":"10.1002\/pds.3627","article-title":"Validity of maternal and infant outcomes within nationwide Medicaid data","volume":"23","author":"Palmsten","year":"2014","journal-title":"Pharmacoepidemiol Drug Saf"},{"issue":"1","key":"2020110613091641600_ocz201-B36","doi-asserted-by":"crossref","first-page":"60.","DOI":"10.1186\/1472-6947-10-60","article-title":"A knowledge-based taxonomy of critical factors for adopting electronic health record systems by physicians: a systematic literature review","volume":"10","author":"Castillo","year":"2010","journal-title":"BMC Med Inform Decis Mak"},{"issue":"12","key":"2020110613091641600_ocz201-B37","doi-asserted-by":"crossref","first-page":"1216","DOI":"10.1097\/MLR.0b013e318148435a","article-title":"Quality of medicaid and medicare data obtained through Centers for Medicare and Medicaid Services (CMS)","volume":"45","author":"Hennessy","year":"2007","journal-title":"Med Care"},{"issue":"8_Part_2","key":"2020110613091641600_ocz201-B38","doi-asserted-by":"crossref","first-page":"666","DOI":"10.7326\/0003-4819-127-8_Part_2-199710151-00048","article-title":"Assessing quality using administrative data","volume":"127","author":"Iezzoni","year":"1997","journal-title":"Ann Intern Med"},{"issue":"2","key":"2020110613091641600_ocz201-B39","doi-asserted-by":"crossref","first-page":"166","DOI":"10.1097\/00005650-197602000-00006","article-title":"Medicaid records as a valid data source: the Tennessee experience","volume":"14","author":"Federspiel","year":"1976","journal-title":"Med Care"},{"key":"2020110613091641600_ocz201-B40","first-page":"178","author":"Mehta","year":"2000"},{"issue":"3","key":"2020110613091641600_ocz201-B41","doi-asserted-by":"crossref","first-page":"255","DOI":"10.1177\/1473095214542632","article-title":"As planning is everything, it is good for something!: Coasian economic taxonomy of modes of planning","volume":"15","author":"Lai","year":"2016","journal-title":"Planning Theory"},{"issue":"1","key":"2020110613091641600_ocz201-B42","doi-asserted-by":"crossref","first-page":"59","DOI":"10.3122\/jabfm.17.1.59","article-title":"Strength of recommendation taxonomy (SORT): a patient-centered approach to grading evidence in the medical literature","volume":"17","author":"Ebell","year":"2004","journal-title":"J Am Board Fam Pract"},{"issue":"12","key":"2020110613091641600_ocz201-B43","doi-asserted-by":"crossref","first-page":"1295","DOI":"10.1002\/hec.1148","article-title":"A taxonomy of model structures for economic evaluation of health technologies","volume":"15","author":"Brennan","year":"2006","journal-title":"Health Econ"},{"issue":"2","key":"2020110613091641600_ocz201-B44","doi-asserted-by":"crossref","DOI":"10.5600\/mmrr.003.02.sa03","article-title":"The impact of electronic health records on ambulatory costs among Medicaid beneficiaries","volume":"3","author":"Adler-Milstein","year":"2013","journal-title":"Medicare Medicaid Res Rev"},{"issue":"4","key":"2020110613091641600_ocz201-B45","doi-asserted-by":"crossref","first-page":"1758","DOI":"10.1111\/j.1475-6773.2006.00684.x","article-title":"Qualitative data analysis for health services research: developing taxonomy, themes, and theory","volume":"42","author":"Bradley","year":"2007","journal-title":"Health Serv Res"},{"issue":"5 Pt 2","key":"2020110613091641600_ocz201-B46","first-page":"1101\u201318.","article-title":"Qualitative methods: what are they and why use them?","volume":"34","author":"Sofaer","year":"1999","journal-title":"Health Serv Res"},{"issue":"3","key":"2020110613091641600_ocz201-B47","doi-asserted-by":"crossref","first-page":"193","DOI":"10.1016\/j.jbi.2004.04.004","article-title":"A cognitive taxonomy of medical errors","volume":"37","author":"Zhang","year":"2004","journal-title":"J Biomed Inform"},{"issue":"6522","key":"2020110613091641600_ocz201-B48","doi-asserted-by":"crossref","first-page":"746","DOI":"10.1136\/bmj.292.6522.746","article-title":"Confidence intervals rather than P values: estimation rather than hypothesis testing","volume":"292","author":"Gardner","year":"1986","journal-title":"BMJ"},{"key":"2020110613091641600_ocz201-B49","author":"Ousterhout","year":"2009"},{"key":"2020110613091641600_ocz201-B50","first-page":"286","author":"Scott","year":"1996"},{"key":"2020110613091641600_ocz201-B51","volume-title":"SQLite","author":"Owens","year":"2010"},{"key":"2020110613091641600_ocz201-B52","doi-asserted-by":"crossref","DOI":"10.1007\/978-1-4302-0172-4","volume-title":"The Definitive Guide to SQLite","author":"Owens","year":"2006"},{"key":"2020110613091641600_ocz201-B53","doi-asserted-by":"crossref","first-page":"58","DOI":"10.1007\/978-3-642-32498-7_5","volume-title":"CD-ARES 2012: Multidisciplinary Research and Practice for Information Systems","author":"Gschwandtner","year":"2012"},{"key":"2020110613091641600_ocz201-B54","author":"Oliveira"},{"key":"2020110613091641600_ocz201-B55","first-page":"751","author":"Lee","year":"1999"},{"issue":"15\u201321","key":"2020110613091641600_ocz201-B56","first-page":"48","article-title":"A survey of data quality tools","volume":"14","author":"Barateiro","year":"2005","journal-title":"Datenbank-Spektrum"},{"key":"2020110613091641600_ocz201-B57","author":"M\u00fcller","year":"2005"},{"issue":"4","key":"2020110613091641600_ocz201-B58","first-page":"3","article-title":"Data cleaning: problems and current approaches","volume":"23","author":"Rahm","year":"2000","journal-title":"IEEE Data Eng Bull"},{"issue":"1","key":"2020110613091641600_ocz201-B59","doi-asserted-by":"crossref","first-page":"81","DOI":"10.1023\/A:1021564703268","article-title":"A taxonomy of dirty data","volume":"7","author":"Kim","year":"2003","journal-title":"Data Min Knowl Discov"},{"issue":"2","key":"2020110613091641600_ocz201-B60","article-title":"A rule based taxonomy of dirty data","volume":"1","author":"Li","year":"2018","journal-title":"J Comput"},{"key":"2020110613091641600_ocz201-B61","first-page":"1","author":"Wei","year":"2007"},{"issue":"4","key":"2020110613091641600_ocz201-B62","doi-asserted-by":"crossref","first-page":"40","DOI":"10.1145\/2590989.2590995","article-title":"Data profiling revisited","volume":"42","author":"Naumann","year":"2014","journal-title":"Sigmod Rec"},{"key":"2020110613091641600_ocz201-B63","first-page":"78","volume-title":"ACM SIGPLAN Notices: Proceedings of the OOPSLA \u201903 Conference","author":"Demsky","year":"2003"},{"issue":"1","key":"2020110613091641600_ocz201-B64","doi-asserted-by":"crossref","first-page":"9","DOI":"10.1023\/A:1009761603038","article-title":"Real-world data is dirty: data cleansing and the merge\/purge problem","volume":"2","author":"Hern\u00e1ndez","year":"1998","journal-title":"Data Min Knowl Discov"}],"container-title":["Journal of the American Medical Informatics Association"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/jamia\/article-pdf\/27\/3\/386\/34152394\/ocz201.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"http:\/\/academic.oup.com\/jamia\/article-pdf\/27\/3\/386\/34152394\/ocz201.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,9,23]],"date-time":"2023-09-23T10:19:48Z","timestamp":1695464388000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/jamia\/article\/27\/3\/386\/5678773"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,12,16]]},"references-count":64,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2019,12,16]]},"published-print":{"date-parts":[[2020,3,1]]}},"URL":"https:\/\/doi.org\/10.1093\/jamia\/ocz201","relation":{},"ISSN":["1527-974X"],"issn-type":[{"value":"1527-974X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2020,3]]},"published":{"date-parts":[[2019,12,16]]}}}