{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,30]],"date-time":"2025-09-30T04:29:15Z","timestamp":1759206555203,"version":"3.41.0"},"reference-count":54,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2022,5,23]],"date-time":"2022-05-23T00:00:00Z","timestamp":1653264000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["J. Data and Information Quality"],"published-print":{"date-parts":[[2022,9,30]]},"abstract":"<jats:p>Functional Dependencies define attribute relationships based on syntactic equality, and when used in data cleaning, they erroneously label syntactically different but semantically equivalent values as errors. We explore dependency-based data cleaning with Ontology Functional Dependencies (OFDs), which express semantic attribute relationships such as synonyms defined by an ontology. We study the theoretical foundations of OFDs, including sound and complete axioms and a linear-time inference procedure. We then propose an algorithm for discovering OFDs (exact ones and ones that hold with some exceptions) from data that uses the axioms to prune the search space. Toward enabling OFDs as data quality rules in practice, we study the problem of finding minimal repairs to a relation and ontology with respect to a set of OFDs. 
We demonstrate the effectiveness of our techniques on real datasets and show that OFDs can significantly reduce the number of false positive errors in data cleaning techniques that rely on traditional Functional Dependencies.<\/jats:p>","DOI":"10.1145\/3524303","type":"journal-article","created":{"date-parts":[[2022,4,21]],"date-time":"2022-04-21T12:44:03Z","timestamp":1650545043000},"page":"1-26","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["Contextual Data Cleaning with Ontology Functional Dependencies"],"prefix":"10.1145","volume":"14","author":[{"given":"Zheng","family":"Zheng","sequence":"first","affiliation":[{"name":"McMaster University, Ontario, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Longtao","family":"Zheng","sequence":"additional","affiliation":[{"name":"University of Science and Technology of China, Hefei, Anhui, P.R.China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Morteza","family":"Alipourlangouri","sequence":"additional","affiliation":[{"name":"McMaster University, Ontario, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4128-8074","authenticated-orcid":false,"given":"Fei","family":"Chiang","sequence":"additional","affiliation":[{"name":"McMaster University, Ontario, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Lukasz","family":"Golab","sequence":"additional","affiliation":[{"name":"University of Waterloo, Waterloo, ON, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jaroslaw","family":"Szlichta","sequence":"additional","affiliation":[{"name":"Ontario Tech University, Oshawa, ON, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sridevi","family":"Baskaran","sequence":"additional","affiliation":[{"name":"McMaster University, Ontario, 
Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2022,5,23]]},"reference":[{"key":"e_1_3_2_2_2","unstructured":"Ontobee. n.d. The Drug Ontology. Retrieved April 28 2022 from http:\/\/www.ontobee.org\/ontology\/DRON."},{"key":"e_1_3_2_3_2","unstructured":"Kaggle. n.d. Data Science for Good: Kiva Crowdfunding. Retrieved April 28 2022 from https:\/\/www.kaggle.com\/kiva\/data-science-for-good-kiva-crowdfunding?select=loan_themes_by_region.csv."},{"key":"e_1_3_2_4_2","unstructured":"NIH. 2016. Medical Ontology Research. Retrieved April 28 2022 from https:\/\/mor.nlm.nih.gov."},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1145\/2661829.2661884"},{"key":"e_1_3_2_6_2","unstructured":"R. Agrawal H. Mannila R. Srikant H. Toivonen and A. Verkamo. 1996. Fast discovery of association rules. In Advances in Knowledge Discovery and Data Mining U. M. Fayyad G. Piatetsky-Shapiro P. Smyth and R. Uthurusamy (Eds.). AAAI Press Menlo Park CA 307\u2013328."},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.1145\/3132847.3132879"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1145\/1938551.1938585"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2013.6544854"},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.tcs.2021.11.020"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1145\/1066157.1066175"},{"key":"e_1_3_2_12_2","article-title":"InfoClean: Protecting sensitive information in data cleaning","volume":"9","author":"Chiang F.","year":"2018","unstructured":"F. Chiang and D. Gairola. 2018. InfoClean: Protecting sensitive information in data cleaning. ACM J. Data. Inf. Qual. 9, 4 (2018), Article 22.","journal-title":"ACM J. Data. Inf. 
Qual."},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2011.5767833"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.14778\/2536258.2536262"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2013.6544847"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2749431"},{"key":"e_1_3_2_17_2","first-page":"315","volume-title":"Proceedings of VLDB","author":"Cong G.","year":"2007","unstructured":"G. Cong, W. Fan, F. Geerts, X. Jia, and S. Ma. 2007. Improving data quality: Consistency and accuracy. In Proceedings of VLDB. 315\u2013326."},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1145\/1989323.1989373"},{"issue":"3","key":"e_1_3_2_19_2","first-page":"294","article-title":"Who solved the secretary problem?","volume":"4","author":"Ferguson T.","year":"1989","unstructured":"T. Ferguson. 1989. Who solved the secretary problem? Statist. Sci. 4, 3 (1989), 294\u2013296.","journal-title":"Statist. Sci."},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.5555\/1216155.1216159"},{"key":"e_1_3_2_21_2","unstructured":"M. R. Garey and D. S. Johnson. 1979. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman."},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.14778\/2536360.2536363"},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2016.2637928"},{"key":"e_1_3_2_24_2","volume-title":"LinkedCT: A Linked Data Space for Clinical Trials","author":"Hassanzadeh O.","year":"2009","unstructured":"O. Hassanzadeh, A. Kementsietsidis, L. Lim, R. J. Miller, and M. Wang. 2009. LinkedCT: A Linked Data Space for Clinical Trials. Technical Report CSRG-596. 
University of Toronto."},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/BigData.2018.8622249"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.5555\/645483.656220"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1145\/3184558.3191616"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1145\/1514894.1514901"},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2009.219"},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1016\/S0304-3975(98)80029-7"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1145\/322307.322311"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1145\/3360904"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-46439-5_24"},{"key":"e_1_3_2_34_2","article-title":"The HARPY Speech Recognition System","author":"Lowerre B.","year":"1976","unstructured":"B. Lowerre. 1976. The HARPY Speech Recognition System. Ph.D. Dissertation, Carnegie-Mellon University.","journal-title":"Ph.D. Dissertation, Carnegie-Mellon University."},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.14778\/3339490.3339501"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.14778\/3407790.3407801"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.14778\/3407790.3407809"},{"key":"e_1_3_2_38_2","volume-title":"Proceedings of OWLED","author":"Motik B.","year":"2007","unstructured":"B. Motik, I. Horrocks, and U. Sattler. 2007. Adding integrity constraints to OWL. In Proceedings of OWLED."},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1145\/502030.502033"},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-44503-X_13"},{"key":"e_1_3_2_41_2","doi-asserted-by":"crossref","unstructured":"S. Ortona V. Meduri and P. Papotti. 2018. Robust discovery of positive and negative rules in knowledge bases. In Proceedings of ICDE. 
1168\u20131179.","DOI":"10.1109\/ICDE.2018.00108"},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.14778\/2824032.2824086"},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.14778\/2752939.2752946"},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.14778\/2856318.2856325"},{"issue":"40","key":"e_1_3_2_45_2","first-page":"99","article-title":"The earth mover\u2019s distance as a metric for image retrieval","volume":"2","year":"2000","unstructured":"R. Yossi, T. Carlo, and J. Leonidas. 2000. The earth mover\u2019s distance as a metric for image retrieval. Int. J. Comput. Vis. 2, 40 (2000), 99\u2013121.","journal-title":"Int. J. Comput. Vis."},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.14778\/3137628.3137631"},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","DOI":"10.1080\/01621459.1993.10476408"},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1023\/A:1026543900054"},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.14778\/3067421.3067422"},{"key":"e_1_3_2_50_2","doi-asserted-by":"publisher","DOI":"10.14778\/2350229.2350241"},{"key":"e_1_3_2_51_2","article-title":"National Drug Code Directory","author":"Administration U.S. Food and Drug","year":"2018","unstructured":"U.S. Food and Drug Administration. 2018. National Drug Code Directory. Retrieved April 28, 2022 from https:\/\/www.fda.gov.","journal-title":"https:\/\/www.fda.gov"},{"key":"e_1_3_2_52_2","first-page":"101","volume-title":"Proceedings of DaWaK","author":"Wyss C.","year":"2001","unstructured":"C. Wyss, C. Giannella, and E. Robertson. 2001. FastFDs: A heuristic-driven, depth-first algorithm for mining FDs from relations. In Proceedings of DaWaK. 101\u2013110."},{"key":"e_1_3_2_53_2","doi-asserted-by":"publisher","DOI":"10.1145\/2463676.2463706"},{"key":"e_1_3_2_54_2","doi-asserted-by":"publisher","DOI":"10.5555\/1341681.1341690"},{"key":"e_1_3_2_55_2","doi-asserted-by":"crossref","unstructured":"Z. Zheng L. Zheng M. 
Alipour Langouri F. Chiang L. Golab and J. Szlichta. 2022. Discovery and contextual data cleaning with ontology functional dependencies. arXiv:2105.08105 (2022).","DOI":"10.1145\/3524303"}],"container-title":["Journal of Data and Information Quality"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3524303","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3524303","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:30:57Z","timestamp":1750188657000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3524303"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,5,23]]},"references-count":54,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2022,9,30]]}},"alternative-id":["10.1145\/3524303"],"URL":"https:\/\/doi.org\/10.1145\/3524303","relation":{},"ISSN":["1936-1955","1936-1963"],"issn-type":[{"type":"print","value":"1936-1955"},{"type":"electronic","value":"1936-1963"}],"subject":[],"published":{"date-parts":[[2022,5,23]]},"assertion":[{"value":"2021-06-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-03-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-05-23","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}