{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,30]],"date-time":"2026-04-30T19:01:48Z","timestamp":1777575708432,"version":"3.51.4"},"reference-count":64,"publisher":"Association for Computing Machinery (ACM)","issue":"2","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["SIGMOD Rec."],"published-print":{"date-parts":[[2025,7,14]]},"abstract":"<jats:p>Data-oriented applications, their users, and even the law require data of high quality. Research has divided the rather vague notion of data quality into various dimensions, such as accuracy, consistency, and reputation. To achieve the goal of high data quality, many tools and techniques exist to clean and otherwise improve data. Yet, systematic research on actually assessing data quality in its dimensions is largely absent, and with it, the ability to gauge the success of any data cleaning effort.<\/jats:p>\n          <jats:p>We propose five facets as ingredients to assess data quality: data, source, system, task, and human. Tapping each facet for data quality assessment poses its own challenges. We show how overcoming these challenges helps data quality assessment for those data quality dimensions mentioned in Europe's AI Act. 
Our work concludes with a proposal for a comprehensive data quality assessment framework.<\/jats:p>","DOI":"10.1145\/3749116.3749120","type":"journal-article","created":{"date-parts":[[2025,7,14]],"date-time":"2025-07-14T23:58:41Z","timestamp":1752537521000},"page":"18-27","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":8,"title":["The Five Facets of Data Quality Assessment"],"prefix":"10.1145","volume":"54","author":[{"given":"Sedir","family":"Mohammed","sequence":"first","affiliation":[{"name":"Hasso Plattner Institute, University of Potsdam, Germany"}]},{"given":"Lisa","family":"Ehrlinger","sequence":"additional","affiliation":[{"name":"Hasso Plattner Institute, University of Potsdam, Germany"}]},{"given":"Hazar","family":"Harmouch","sequence":"additional","affiliation":[{"name":"University of Amsterdam, Netherlands"}]},{"given":"Felix","family":"Naumann","sequence":"additional","affiliation":[{"name":"Hasso Plattner Institute, University of Potsdam, Germany"}]},{"given":"Divesh","family":"Srivastava","sequence":"additional","affiliation":[{"name":"AT&T Chief Data Office, USA"}]}],"member":"320","published-online":{"date-parts":[[2025,7,14]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"first regulation on artificial intelligence","author":"Act EU AI","year":"2023","unstructured":"EU AI Act: first regulation on artificial intelligence, 2023. URL https: \/\/www.europarl.europa.eu\/topics\/en\/ article\/20230601STO93804\/eu-aiact- first-regulation-on-artificialintelligence. (Last accessed: 2024-07-25)."},{"key":"e_1_2_1_2_1","unstructured":"HIPAA privacy rule to support reproductive health care privacy 2024. URL https:\/\/www.federalregister.gov\/ documents\/2024\/04\/26\/2024-08503\/hipaaprivacy- rule-to-support-reproductivehealth- care-privacy. 
(Last accessed: 2024-07-25)."},{"key":"e_1_2_1_3_1","volume-title":"Your machine learning and data science community","author":"Kaggle","year":"2024","unstructured":"Kaggle: Your machine learning and data science community, 2024. URL https:\/\/www.kaggle.com\/. (Last accessed: 2024-07-15)."},{"key":"e_1_2_1_4_1","volume-title":"the free encyclopedia","author":"Wikipedia","year":"2024","unstructured":"Wikipedia, the free encyclopedia, 2024. URL https:\/\/www.wikipedia.org\/. (Last accessed: 2024-07-15)."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1007\/S00778-015-0389-Y"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/3214303"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/3650203.3663326"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2006.83"},{"key":"e_1_2_1_9_1","volume-title":"Data quality: concepts, methodologies and techniques. Data-centric systems and applications","author":"Batini Carlo","year":"2006","unstructured":"Carlo Batini and Monica Scannapieco. Data quality: concepts, methodologies and techniques. Data-centric systems and applications. Springer, 2006. 
ISBN 978-3-540-33172-8 978-3-642-06970-3."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-24106-7"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/27633.27634"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/1541880.1541883"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/3468791.3468841"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1007\/3--540--48005-"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.3390\/E13071229"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2019.2899751"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2203.04706"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/2463676.2465327"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1016\/C2011-0-06130--6"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1007\/978--3-030--87101--7"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-39689-2_1"},{"key":"e_1_2_1_22_1","volume-title":"Artificial intelligence act","author":"Parliament European","year":"2024","unstructured":"European Parliament. Artificial intelligence act. 2024. URL https:\/\/eur-lex.europa.eu\/legal-content\/EN\/TXT\/?uri=CELEX:32024R1689. Version from 2024-06-13."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.14778\/3611479.3611527"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE51399.2021.00176"},{"key":"e_1_2_1_25_1","volume-title":"2024-02-13)","author":"GDPR.","year":"2016","unstructured":"GDPR. General data protection regulation (last accessed: 2024-02-13), 2016. 
URL https:\/\/eur-lex.europa.eu\/legal-content\/EN\/TXT\/PDF\/?uri=CELEX:02016R0679-20160504."},{"key":"e_1_2_1_26_1","first-page":"227","volume-title":"Proceedings of the Conference Datenbanksysteme in Business, Technologie und Web Technik (BTW)","volume":"103","author":"Glavic Boris","year":"2007","unstructured":"Boris Glavic and Klaus R. Dittrich. Data provenance: A categorization of existing approaches. In Proceedings of the Conference Datenbanksysteme in Business, Technologie und Web Technik (BTW), volume P-103 of LNI, pages 227--241. GI, 2007. URL https:\/\/dl.gi.de\/handle\/20.500.12116\/31801."},{"key":"e_1_2_1_27_1","volume-title":"Grossman and Ophir Frieder. Information retrieval: algorithms and heuristics. Number 15","author":"David","year":"2004","unstructured":"David A. Grossman and Ophir Frieder. Information retrieval: algorithms and heuristics. Number 15. Springer, 2nd edition, 2004. ISBN 978-1-4020-3004-8 978-1-4020-3003-1.","edition":"2"},{"key":"e_1_2_1_28_1","first-page":"16","volume-title":"Proceedings of the International Conference on Information Quality","author":"Haegemans Tom","year":"2016","unstructured":"Tom Haegemans, Monique Snoeck, and Wilfried Lemahieu. Towards a precise definition of data accuracy and a justification for its measure. In Proceedings of the International Conference on Information Quality, pages 16--16. MIT Information Quality (MITIQ) Program, 2016."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1007\/S00778-017-0486--1"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.5555\/1534235"},{"key":"e_1_2_1_31_1","volume-title":"Executive order on the safe, secure, and trustworthy development and use of artificial intelligence","author":"House The White","year":"2023","unstructured":"The White House. Executive order on the safe, secure, and trustworthy development and use of artificial intelligence, 2023. 
URL https:\/\/www.whitehouse.gov\/briefingroom\/ presidential-actions\/2023\/10\/30\/ executive-order-on-the-safe-secureand- trustworthy-development-and-useof- artificial-intelligence\/."},{"key":"e_1_2_1_32_1","first-page":"165","volume-title":"Australasian Conference on Information Systems (ACIS)","author":"Jayawardene Vimukthi","year":"2013","unstructured":"Vimukthi Jayawardene, Shazia W. Sadiq, and Marta Indulska. The curse of dimensionality in data quality. In Australasian Conference on Information Systems (ACIS), page 165, 2013. URL https:\/\/aisel.aisnet.org\/acis2013\/165."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.2307\/1402647"},{"key":"e_1_2_1_34_1","volume-title":"Jinjuan Heidi Feng, and Harry Hochheiser. Research Methods in Human Computer Interaction","author":"Lazar Jonathan","year":"2017","unstructured":"Jonathan Lazar, Jinjuan Heidi Feng, and Harry Hochheiser. Research Methods in Human Computer Interaction. Elsevier, second edition, 2017. ISBN 978-0-12-805390-4."},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.3233\/SW-140134"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE51399.2021.00009"},{"key":"e_1_2_1_37_1","series-title":"Data quality for practitioners series","volume-title":"Data quality assessment","author":"Maydanchik Arkady","year":"2007","unstructured":"Arkady Maydanchik. Data quality assessment. Data quality for practitioners series. Technics Publications, 2007. ISBN 978-0-9771400-2-2."},{"key":"e_1_2_1_38_1","first-page":"122","volume-title":"Proceedings of the International Conference on Very Large Databases (VLDB)","author":"Milo Tova","year":"1998","unstructured":"Tova Milo and Sagit Zohar. Using schema matching to simplify heterogeneous data translation. In Proceedings of the International Conference on Very Large Databases (VLDB), pages 122--133, 1998. 
URL http:\/\/www.vldb.org\/conf\/1998\/p122.pdf."},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2403.00526"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1016\/J.IS.2025.102549"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.bushor.2020.01.006"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/2590989.2590995"},{"key":"e_1_2_1_43_1","first-page":"148","volume-title":"Fifth Conference on Information Quality (IQ 2000","author":"Naumann Felix","year":"2000","unstructured":"Felix Naumann and Claudia Rolker. Assessment methods for information quality criteria. In Fifth Conference on Information Quality (IQ 2000), pages 148--162. MIT, 2000."},{"issue":"1","key":"e_1_2_1_44_1","first-page":"24","article-title":"From cleaning before ML to cleaning for ML","volume":"44","author":"Neutatz Felix","year":"2021","unstructured":"Felix Neutatz, Binger Chen, Ziawasch Abedjan, and Eugene Wu. From cleaning before ML to cleaning for ML. IEEE Data Engineering Bulletin, 44(1):24--41, 2021. URL http:\/\/sites.computer.org\/debull\/A21mar\/p24.pdf.","journal-title":"IEEE Data Engineering Bulletin"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1007\/s13222-022-"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/505248.506010"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/3531146.3533231"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/3219819.3220109"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1007\/S007780100057"},{"key":"e_1_2_1_50_1","volume-title":"Data quality: the field guide","author":"Redman Thomas C","year":"2001","unstructured":"Thomas C Redman. Data quality: the field guide. 
Digital Press, 2001."},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.14778\/3137628.3137631"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00146-020-"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1007\/978--3-"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1145\/3186549.3186559"},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1145\/584792.584868"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1515\/9781400881970-018"},{"key":"e_1_2_1_57_1","first-page":"9391","volume-title":"Annual Conference on Neural Information Processing Systems (NeurIPS)","author":"Slack Dylan","year":"2021","unstructured":"Dylan Slack, Anna Hilgard, Sameer Singh, and Himabindu Lakkaraju. Reliable post hoc explanations: Modeling uncertainty in explainability. In Annual Conference on Neural Information Processing Systems (NeurIPS), pages 9391--9404, 2021. URL https:\/\/proceedings.neurips.cc\/paper\/2021\/hash\/4e246a381baf2ce038b3b0f82c7d6fb4-Abstract.html."},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1145\/1247480.1247526"},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1145\/3360646"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1002\/ASI.20652"},{"key":"e_1_2_1_61_1","first-page":"9269","volume-title":"Proceedings of the International Conference on Machine Learning (ICML)","volume":"119","author":"Sundararajan Mukund","year":"2020","unstructured":"Mukund Sundararajan and Amir Najmi. The many Shapley values for model explanation. In Proceedings of the International Conference on Machine Learning (ICML), volume 119, pages 9269--9278. PMLR, 2020. 
URL http:\/\/proceedings.mlr.press\/v119\/sundararajan20b.html."},{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.1080\/07421222.1996.11518099"},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1007\/S00778-022-"},{"key":"e_1_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2303.10158"}],"container-title":["ACM SIGMOD Record"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3749116.3749120","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,15]],"date-time":"2025-07-15T16:22:01Z","timestamp":1752596521000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3749116.3749120"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7,14]]},"references-count":64,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2025,7,14]]}},"alternative-id":["10.1145\/3749116.3749120"],"URL":"https:\/\/doi.org\/10.1145\/3749116.3749120","relation":{},"ISSN":["0163-5808"],"issn-type":[{"value":"0163-5808","type":"print"}],"subject":[],"published":{"date-parts":[[2025,7,14]]},"assertion":[{"value":"2025-07-14","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}