{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,23]],"date-time":"2026-01-23T22:45:54Z","timestamp":1769208354953,"version":"3.49.0"},"reference-count":53,"publisher":"Association for Computing Machinery (ACM)","issue":"2","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2022,10]]},"abstract":"<jats:p>We investigate the problem of discovering approximate denial constraints (DCs), for finding DCs that hold with some exceptions to avoid overfitting real-life dirty data and facilitate data cleaning tasks. Different methods have been proposed to address the problem, by following the same framework consisting of two phases. In the first phase a structure called<jats:italic>evidence set<\/jats:italic>is built on the given instance, and in the second phase approximate DCs are found by leveraging the evidence set. In this paper, we present novel and more efficient techniques under the same framework. (1) We optimize the evidence set construction by first building a condensed structure called<jats:italic>clue set<\/jats:italic>and then transforming the clue set to the evidence set. The clue set is more memory-efficient than the evidence set and facilitates more efficient bit operations and better cache utilization, and the transformation cost is usually trivial. We further study parallel clue set construction with multiple threads. (2) Our solution to approximate DC discovery from the evidence set is a highly non-trivial extension of the<jats:italic>evidence inversion<\/jats:italic>method for exact DC discovery. (3) Using a host of datasets, we experimentally verify our approximate DC discovery approach is on average 8.2 and 7.5 times faster than the two state-of-the-art ones that also leverage parallelism, respectively, and our methods for the two phases are up to an order of magnitude and two orders of magnitude faster than the state-of-the-art methods, respectively.<\/jats:p>","DOI":"10.14778\/3565816.3565828","type":"journal-article","created":{"date-parts":[[2022,11,24]],"date-time":"2022-11-24T00:35:16Z","timestamp":1669250116000},"page":"269-281","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":15,"title":["Fast approximate denial constraint discovery"],"prefix":"10.14778","volume":"16","author":[{"given":"Renjie","family":"Xiao","sequence":"first","affiliation":[{"name":"Fudan University"}]},{"given":"Zijing","family":"Tan","sequence":"additional","affiliation":[{"name":"Fudan University"}]},{"given":"Haojin","family":"Wang","sequence":"additional","affiliation":[{"name":"Fudan University"}]},{"given":"Shuai","family":"Ma","sequence":"additional","affiliation":[{"name":"Beihang University"}]}],"member":"320","published-online":{"date-parts":[[2022,11,23]]},"reference":[{"key":"e_1_2_1_1_1","first-page":"1747","article-title":"Data Profiling","volume":"2017","author":"Abedjan Ziawasch","year":"2017","unstructured":"Ziawasch Abedjan , Lukasz Golab , and Felix Naumann . 2017 . Data Profiling : A Tutorial. In SIGMOD 2017. 1747 -- 1751 . Ziawasch Abedjan, Lukasz Golab, and Felix Naumann. 2017. Data Profiling: A Tutorial. In SIGMOD 2017. 1747--1751.","journal-title":"A Tutorial. In SIGMOD"},{"key":"e_1_2_1_2_1","volume-title":"Data Profiling","author":"Abedjan Ziawasch","unstructured":"Ziawasch Abedjan , Lukasz Golab , Felix Naumann , and Thorsten Papenbrock . 2018. Data Profiling . Morgan & Claypool Publishers . Ziawasch Abedjan, Lukasz Golab, Felix Naumann, and Thorsten Papenbrock. 2018. Data Profiling. Morgan & Claypool Publishers."},{"key":"e_1_2_1_3_1","volume-title":"Foundations of Databases","author":"Abiteboul Serge","unstructured":"Serge Abiteboul , Richard Hull , and Victor Vianu . 1995. Foundations of Databases . Addison-Wesley . Serge Abiteboul, Richard Hull, and Victor Vianu. 1995. Foundations of Databases. Addison-Wesley."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.14778\/3407790.3407824"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.14778\/3157794.3157800"},{"key":"e_1_2_1_6_1","unstructured":"Qi Cheng Jarek Gryz Fred Koo T. Y. Cliff Leung Linqi Liu Xiaoyan Qian and K. Bernhard Schiefer. 1999. Implementation of Two Semantic Query Optimization Techniques in DB2 Universal Database. In VLDB. 687--698. Qi Cheng Jarek Gryz Fred Koo T. Y. Cliff Leung Linqi Liu Xiaoyan Qian and K. Bernhard Schiefer. 1999. Implementation of Two Semantic Query Optimization Techniques in DB2 Universal Database. In VLDB. 687--698."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.14778\/2536258.2536262"},{"key":"e_1_2_1_8_1","doi-asserted-by":"crossref","unstructured":"Xu Chu Ihab F. Ilyas and Paolo Papotti. 2013. Holistic data cleaning: Putting violations into context. In ICDE. 458--469. Xu Chu Ihab F. Ilyas and Paolo Papotti. 2013. Holistic data cleaning: Putting violations into context. In ICDE. 458--469.","DOI":"10.1109\/ICDE.2013.6544847"},{"key":"e_1_2_1_9_1","unstructured":"Cristian Consonni Paolo Sottovia Alberto Montresor and Yannis Velegrakis. 2019. Discovering Order Dependencies through Order Compatibility. In EDBT. 409--420. Cristian Consonni Paolo Sottovia Alberto Montresor and Yannis Velegrakis. 2019. Discovering Order Dependencies through Order Compatibility. In EDBT. 409--420."},{"key":"e_1_2_1_10_1","doi-asserted-by":"crossref","unstructured":"Michele Dallachiesa Amr Ebaid Ahmed Eldawy Ahmed K. Elmagarmid Ihab F. Ilyas Mourad Ouzzani and Nan Tang. 2013. NADEEF: a commodity data cleaning system. In SIGMOD. 541--552. Michele Dallachiesa Amr Ebaid Ahmed Eldawy Ahmed K. Elmagarmid Ihab F. Ilyas Mourad Ouzzani and Nan Tang. 2013. NADEEF: a commodity data cleaning system. In SIGMOD. 541--552.","DOI":"10.1145\/2463676.2465327"},{"key":"e_1_2_1_11_1","volume-title":"Foundations of Data Quality Management","author":"Fan Wenfei","unstructured":"Wenfei Fan and Floris Geerts . 2012. Foundations of Data Quality Management . Morgan & Claypool Publishers . Wenfei Fan and Floris Geerts. 2012. Foundations of Data Quality Management. Morgan & Claypool Publishers."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1137\/15M1055024"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.14778\/3364324.3364332"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-019-00586-5"},{"key":"e_1_2_1_15_1","doi-asserted-by":"crossref","unstructured":"Stella Giannakopoulou Manos Karpathiotakis and Anastasia Ailamaki. 2020. Cleaning Denial Constraint Violations through Relaxation. In SIGMOD. 805--815. Stella Giannakopoulou Manos Karpathiotakis and Anastasia Ailamaki. 2020. Cleaning Denial Constraint Violations through Relaxation. In SIGMOD. 805--815.","DOI":"10.1145\/3318464.3389775"},{"key":"e_1_2_1_16_1","doi-asserted-by":"crossref","unstructured":"Amir Gilad Daniel Deutch and Sudeepa Roy. 2020. On Multiple Semantics for Declarative Database Repairs. In SIGMOD. 817--831. Amir Gilad Daniel Deutch and Sudeepa Roy. 2020. On Multiple Semantics for Declarative Database Repairs. In SIGMOD. 817--831.","DOI":"10.1145\/3318464.3389721"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.14778\/2732240.2732248"},{"key":"e_1_2_1_18_1","volume-title":"Ilyas and Xu Chu","author":"Ihab","year":"2019","unstructured":"Ihab F. Ilyas and Xu Chu . 2019 . Data Cleaning. ACM. Ihab F. Ilyas and Xu Chu. 2019. Data Cleaning. ACM."},{"key":"e_1_2_1_19_1","volume-title":"An Introduction to Parallel Algorithms","author":"J\u00e1J\u00e1 Joseph F.","unstructured":"Joseph F. J\u00e1J\u00e1 . 1992. An Introduction to Parallel Algorithms . Addison-Wesley . Joseph F. J\u00e1J\u00e1. 1992. An Introduction to Parallel Algorithms. Addison-Wesley."},{"key":"e_1_2_1_20_1","unstructured":"Yifeng Jin Zijing Tan Weijun Zeng and Shuai Ma. 2021. Approximate Order Dependency Discovery. In ICDE. 25--36. Yifeng Jin Zijing Tan Weijun Zeng and Shuai Ma. 2021. Approximate Order Dependency Discovery. In ICDE. 25--36."},{"key":"e_1_2_1_21_1","unstructured":"Yifeng Jin Lin Zhu and Zijing Tan. 2020. Efficient Bidirectional Order Dependency Discovery. In ICDE. 61--72. Yifeng Jin Lin Zhu and Zijing Tan. 2020. Efficient Bidirectional Order Dependency Discovery. In ICDE. 61--72."},{"key":"e_1_2_1_22_1","unstructured":"Reza Karegar Parke Godfrey Lukasz Golab Mehdi Kargar Divesh Srivastava and Jaroslaw Szlichta. 2021. Efficient Discovery of Approximate Order Dependencies. In EDBT. 427--432. Reza Karegar Parke Godfrey Lukasz Golab Mehdi Kargar Divesh Srivastava and Jaroslaw Szlichta. 2021. Efficient Discovery of Approximate Order Dependencies. In EDBT. 427--432."},{"key":"e_1_2_1_23_1","doi-asserted-by":"crossref","unstructured":"Zuhair Khayyat Ihab F. Ilyas Alekh Jindal Samuel Madden Mourad Ouzzani Paolo Papotti Jorge-Arnulfo Quian\u00e9-Ruiz Nan Tang and Si Yin. 2015. BigDansing: A System for Big Data Cleansing. In SIGMOD. 1215--1230. Zuhair Khayyat Ihab F. Ilyas Alekh Jindal Samuel Madden Mourad Ouzzani Paolo Papotti Jorge-Arnulfo Quian\u00e9-Ruiz Nan Tang and Si Yin. 2015. BigDansing: A System for Big Data Cleansing. In SIGMOD. 1215--1230.","DOI":"10.1145\/2723372.2747646"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.14778\/2831360.2831362"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-016-0441-6"},{"key":"e_1_2_1_26_1","doi-asserted-by":"crossref","unstructured":"Jyrki Kivinen and Heikki Mannila. 1992. Approximate Dependency Inference from Relations. In ICDT. 86--98. Jyrki Kivinen and Heikki Mannila. 1992. Approximate Dependency Inference from Relations. In ICDT. 86--98.","DOI":"10.1007\/3-540-56039-4_34"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-021-00676-3"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.14778\/3192965.3192968"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-015-0412-3"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0020-0190(02)00506-9"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.14778\/3401960.3401966"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.dam.2014.01.012"},{"key":"e_1_2_1_33_1","doi-asserted-by":"crossref","unstructured":"Thorsten Papenbrock and Felix Naumann. 2016. A Hybrid Approach to Functional Dependency Discovery. In SIGMOD. 821--833. Thorsten Papenbrock and Felix Naumann. 2016. A Hybrid Approach to Functional Dependency Discovery. In SIGMOD. 821--833.","DOI":"10.1145\/2882903.2915203"},{"key":"e_1_2_1_34_1","volume-title":"Pena and Eduardo Cunha de Almeida","author":"Eduardo H.","year":"2018","unstructured":"Eduardo H. M. Pena and Eduardo Cunha de Almeida . 2018 . BFASTDC : A Bitwise Algorithm for Mining Denial Constraints. In DEXA. 53--68. Eduardo H. M. Pena and Eduardo Cunha de Almeida. 2018. BFASTDC: A Bitwise Algorithm for Mining Denial Constraints. In DEXA. 53--68."},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.14778\/3368289.3368293"},{"key":"e_1_2_1_36_1","first-page":"859","article-title":"Fast Detection of Denial Constraint Violations","volume":"15","author":"Pena Eduardo H. M.","year":"2022","unstructured":"Eduardo H. M. Pena , Eduardo Cunha de Almeida , and Felix Naumann . 2022 . Fast Detection of Denial Constraint Violations . PVLDB 15 , 4 (2022), 859 -- 871 . Eduardo H. M. Pena, Eduardo Cunha de Almeida, and Felix Naumann. 2022. Fast Detection of Denial Constraint Violations. PVLDB 15, 4 (2022), 859--871.","journal-title":"PVLDB"},{"key":"e_1_2_1_37_1","volume-title":"Eduardo Cunha de Almeida, and Felix Naumann.","author":"Pena Eduardo H. M.","year":"2020","unstructured":"Eduardo H. M. Pena , Edson Ramiro Lucas Filho , Eduardo Cunha de Almeida, and Felix Naumann. 2020 . Efficient Detection of Data Dependency Violations. In CIKM. 1235--1244. Eduardo H. M. Pena, Edson Ramiro Lucas Filho, Eduardo Cunha de Almeida, and Felix Naumann. 2020. Efficient Detection of Data Dependency Violations. In CIKM. 1235--1244."},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.14778\/3137628.3137631"},{"key":"e_1_2_1_39_1","volume-title":"Ilyas","author":"Saxena Hemant","year":"2019","unstructured":"Hemant Saxena , Lukasz Golab , and Ihab F . Ilyas . 2019 . Distributed Discovery of Functional Dependencies. In ICDE. 1590--1593. Hemant Saxena, Lukasz Golab, and Ihab F. Ilyas. 2019. Distributed Discovery of Functional Dependencies. In ICDE. 1590--1593."},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.14778\/3342263.3342638"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/3392778"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-021-00683-4"},{"key":"e_1_2_1_43_1","doi-asserted-by":"crossref","unstructured":"David E. Simmen Eugene J. Shekita and Timothy Malkemus. 1996. Fundamental Techniques for Order Optimization. In SIGMOD. 57--67. David E. Simmen Eugene J. Shekita and Timothy Malkemus. 1996. Fundamental Techniques for Order Optimization. In SIGMOD. 57--67.","DOI":"10.1145\/235968.233320"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.14778\/3067421.3067422"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-018-0510-0"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.14778\/2350229.2350241"},{"key":"e_1_2_1_47_1","unstructured":"Jaroslaw Szlichta Parke Godfrey Jarek Gryz Wenbin Ma Weinan Qiu and Calisto Zuzarte. 2014. Business-Intelligence Queries with Order Dependencies in DB2. In EDBT. 750--761. Jaroslaw Szlichta Parke Godfrey Jarek Gryz Wenbin Ma Weinan Qiu and Calisto Zuzarte. 2014. Business-Intelligence Queries with Order Dependencies in DB2. In EDBT. 750--761."},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.14778\/2556549.2556568"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.14778\/3401960.3401965"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-021-00684-3"},{"key":"e_1_2_1_51_1","doi-asserted-by":"crossref","unstructured":"Ziheng Wei and Sebastian Link. 2018. DataProf: Semantic Profiling for Iterative Data Cleansing and Business Rule Acquisition. In SIGMOD. 1793--1796. Ziheng Wei and Sebastian Link. 2018. DataProf: Semantic Profiling for Iterative Data Cleansing and Business Rule Acquisition. In SIGMOD. 1793--1796.","DOI":"10.1145\/3183713.3193544"},{"key":"e_1_2_1_52_1","unstructured":"Ziheng Wei and Sebastian Link. 2019. Discovery and Ranking of Functional Dependencies. In ICDE. 1526--1537. Ziheng Wei and Sebastian Link. 2019. Discovery and Ranking of Functional Dependencies. In ICDE. 1526--1537."},{"key":"e_1_2_1_53_1","volume-title":"Introduction to parallel algorithms","author":"Xavier C.","unstructured":"C. Xavier and S. Sitharama Iyengar . 1998. Introduction to parallel algorithms . Wiley . C. Xavier and S. Sitharama Iyengar. 1998. Introduction to parallel algorithms. Wiley."}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3565816.3565828","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,10,9]],"date-time":"2024-10-09T12:40:54Z","timestamp":1728477654000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3565816.3565828"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,10]]},"references-count":53,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2022,10]]}},"alternative-id":["10.14778\/3565816.3565828"],"URL":"https:\/\/doi.org\/10.14778\/3565816.3565828","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2022,10]]},"assertion":[{"value":"2022-11-23","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}