{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,13]],"date-time":"2026-06-13T15:25:55Z","timestamp":1781364355038,"version":"3.54.1"},"reference-count":61,"publisher":"Association for Computing Machinery (ACM)","issue":"7","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2024,3]]},"abstract":"<jats:p>Differential dependencies (DDs) are proposed to specify constraints on the<jats:italic>differences<\/jats:italic>between values, where the semantics of<jats:italic>difference<\/jats:italic>can be \"similar\", \"dissimilar\" and beyond. DDs subsume functional dependencies (FDs), and find valuable applications in tasks such as violation detection, duplicate identification, and quantitative data cleaning, among others. In this paper we present an efficient DD discovery method for finding hidden DDs from data. We encode differences between values in a novel structure called the \"diff-set\", and present a set of techniques for constructing the diff-set, discovering valid DDs with set cover enumeration of the diff-set, and eliminating non-minimal DDs. Our extensive experimental evaluation verifies that our method outperforms the existing DD discovery method up to orders of magnitude. Furthermore, our method is adapted to discover an important subclass of DDs, known as<jats:italic>relaxed<\/jats:italic>FDs (RFDs), and is also up to orders of magnitude faster than the state-of-the-art RFD discovery method.<\/jats:p>","DOI":"10.14778\/3654621.3654624","type":"journal-article","created":{"date-parts":[[2024,5,30]],"date-time":"2024-05-30T22:21:08Z","timestamp":1717107668000},"page":"1552-1564","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["Efficient Differential Dependency Discovery"],"prefix":"10.14778","volume":"17","author":[{"given":"Shulei","family":"Kuang","sequence":"first","affiliation":[{"name":"School of Computer Science, Fudan University, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Honghui","family":"Yang","sequence":"additional","affiliation":[{"name":"School of Computer Science, Fudan University, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Zijing","family":"Tan","sequence":"additional","affiliation":[{"name":"School of Computer Science, Fudan University, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Shuai","family":"Ma","sequence":"additional","affiliation":[{"name":"SKLSDE Lab, Beihang University, China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2024,5,30]]},"reference":[{"key":"e_1_2_1_1_1","first-page":"1747","article-title":"Data Profiling","volume":"2017","author":"Abedjan Ziawasch","year":"2017","unstructured":"Ziawasch Abedjan, Lukasz Golab, and Felix Naumann. 2017. Data Profiling: A Tutorial. In SIGMOD 2017. 1747--1751.","journal-title":"A Tutorial. In SIGMOD"},{"key":"e_1_2_1_2_1","volume-title":"Data Profiling","author":"Abedjan Ziawasch","unstructured":"Ziawasch Abedjan, Lukasz Golab, Felix Naumann, and Thorsten Papenbrock. 2018. Data Profiling. Morgan & Claypool Publishers."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.14778\/3157794.3157800"},{"key":"e_1_2_1_4_1","doi-asserted-by":"crossref","unstructured":"Philip Bohannon Michael Flaster Wenfei Fan and Rajeev Rastogi. 2005. A Cost-Based Model and Effective Heuristic for Repairing Constraints by Value Modification. In SIGMOD. 143--154.","DOI":"10.1145\/1066157.1066175"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2020.2967722"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2015.2472010"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10618-019-00667-7"},{"key":"e_1_2_1_8_1","volume-title":"Jensen","author":"Chen Lu","year":"2023","unstructured":"Lu Chen, Yunjun Gao, Xuan Song, Zheng Li, Yifan Zhu, Xiaoye Miao, and Christian S. Jensen. 2023. Indexing Metric Spaces for Exact Similarity Search. ACM Comput. Surv. 55, 6 (2023), 128:1--128:39."},{"key":"e_1_2_1_9_1","doi-asserted-by":"crossref","unstructured":"Xu Chu Ihab F. Ilyas and Paolo Papotti. 2013. Holistic data cleaning: Putting violations into context. In ICDE. 458--469.","DOI":"10.1109\/ICDE.2013.6544847"},{"key":"e_1_2_1_10_1","doi-asserted-by":"crossref","unstructured":"Michele Dallachiesa Amr Ebaid Ahmed Eldawy Ahmed K. Elmagarmid Ihab F. Ilyas Mourad Ouzzani and Nan Tang. 2013. NADEEF: a commodity data cleaning system. In SIGMOD. 541--552.","DOI":"10.1145\/2463676.2465327"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2007.250581"},{"key":"e_1_2_1_12_1","doi-asserted-by":"crossref","unstructured":"Wenfei Fan. 2008. Dependencies revisited for improving data quality. In PODS. 159--170.","DOI":"10.1145\/1376916.1376940"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-010-0206-6"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/3588924"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1137\/15M1055024"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-019-00586-5"},{"key":"e_1_2_1_17_1","doi-asserted-by":"crossref","unstructured":"Stella Giannakopoulou Manos Karpathiotakis and Anastasia Ailamaki. 2020. Cleaning Denial Constraint Violations through Relaxation. In SIGMOD. 805--815.","DOI":"10.1145\/3318464.3389775"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1016\/0304-3975(83)90084-1"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/5925.5929"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687627.1687693"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1093\/comjnl\/42.2.100"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2021.3130227"},{"key":"e_1_2_1_23_1","doi-asserted-by":"crossref","unstructured":"Yifeng Jin Lin Zhu and Zijing Tan. 2020. Efficient Bidirectional Order Dependency Discovery. In ICDE. 61--72.","DOI":"10.1109\/ICDE48307.2020.00013"},{"key":"e_1_2_1_24_1","first-page":"1","volume-title":"Proc. ACM Manag. Data 1","author":"Kaminsky Youri","year":"2023","unstructured":"Youri Kaminsky, Eduardo H. M. Pena, and Felix Naumann. 2023. Discovering Similarity Inclusion Dependencies. Proc. ACM Manag. Data 1, 1 (2023), 75:1--75:24."},{"key":"e_1_2_1_25_1","doi-asserted-by":"crossref","unstructured":"Nick Koudas Avishek Saha Divesh Srivastava and Suresh Venkatasubramanian. 2009. Metric Functional Dependencies. In ICDE. 1275--1278.","DOI":"10.1109\/ICDE.2009.219"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.14778\/3377369.3377379"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.14778\/3192965.3192968"},{"key":"e_1_2_1_28_1","doi-asserted-by":"crossref","first-page":"50","DOI":"10.1007\/978-3-319-08608-8_5","article-title":"Mining Differential Dependencies: A Subspace Clustering Approach","volume":"8506","author":"Kwashie Selasi","year":"2014","unstructured":"Selasi Kwashie, Jixue Liu, Jiuyong Li, and Feiyue Ye. 2014. Mining Differential Dependencies: A Subspace Clustering Approach. In ADC (Lecture Notes in Computer Science), Vol. 8506. 50--61.","journal-title":"ADC (Lecture Notes in Computer Science)"},{"key":"e_1_2_1_29_1","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1007\/978-3-319-19548-3_1","article-title":"Efficient Discovery of Differential Dependencies Through Association Rules Mining","volume":"9093","author":"Kwashie Selasi","year":"2015","unstructured":"Selasi Kwashie, Jixue Liu, Jiuyong Li, and Feiyue Ye. 2015. Efficient Discovery of Differential Dependencies Through Association Rules Mining. In ADC (Lecture Notes in Computer Science), Vol. 9093. 3--15.","journal-title":"ADC (Lecture Notes in Computer Science)"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0020-0190(02)00506-9"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.14778\/3401960.3401966"},{"key":"e_1_2_1_32_1","doi-asserted-by":"crossref","unstructured":"St\u00e9phane Lopes Jean-Marc Petit and Lotfi Lakhal. 2000. Efficient Discovery of Functional Dependencies and Armstrong Relations. In EDBT. 350--364.","DOI":"10.1007\/3-540-46439-5_24"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1016\/0169-023X(94)90023-X"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.14778\/2794367.2794377"},{"key":"e_1_2_1_35_1","doi-asserted-by":"crossref","unstructured":"Thorsten Papenbrock and Felix Naumann. 2016. A Hybrid Approach to Functional Dependency Discovery. In SIGMOD. 821--833.","DOI":"10.1145\/2882903.2915203"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.14778\/3368289.3368293"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.14778\/3574245.3574254"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.14778\/2856318.2856325"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.14778\/3377369.3377377"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.14778\/3137628.3137631"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.14778\/3342263.3342638"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3392778"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10619-018-7233-5"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/2000824.2000826"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2013.84"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2020.3046443"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.14778\/3067421.3067422"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-018-0510-0"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.14778\/2350229.2350241"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.14778\/2556549.2556568"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.14778\/3401960.3401965"},{"key":"e_1_2_1_52_1","doi-asserted-by":"crossref","unstructured":"Saravanan Thirumuruganathan Laure Berti-\u00c9quille Mourad Ouzzani Jorge-Arnulfo Quian\u00e9-Ruiz and Nan Tang. 2017. UGuide: User-Guided Discovery of FD-Detectable Errors. In SIGMOD. 1385--1397.","DOI":"10.1145\/3035918.3064024"},{"key":"e_1_2_1_53_1","article-title":"Detecting Inclusion Dependencies on Very Many Tables","volume":"42","author":"Tschirschnitz Fabian","year":"2017","unstructured":"Fabian Tschirschnitz, Thorsten Papenbrock, and Felix Naumann. 2017. Detecting Inclusion Dependencies on Very Many Tables. ACM Trans. Database Syst. 42, 3 (2017), 18:1--18:29.","journal-title":"ACM Trans. Database Syst."},{"key":"e_1_2_1_54_1","volume-title":"Jeffrey Xu Yu, and Hong Cheng","author":"Wang Yihan","year":"2017","unstructured":"Yihan Wang, Shaoxu Song, Lei Chen, Jeffrey Xu Yu, and Hong Cheng. 2017. Discovering Conditional Matching Rules. TKDD 11, 4 (2017), 46:1--46:38."},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-021-00684-3"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.14778\/3358701.3358703"},{"key":"e_1_2_1_57_1","doi-asserted-by":"crossref","unstructured":"Ziheng Wei and Sebastian Link. 2018. DataProf: Semantic Profiling for Iterative Data Cleansing and Business Rule Acquisition. In SIGMOD. 1793--1796.","DOI":"10.1145\/3183713.3193544"},{"key":"e_1_2_1_58_1","doi-asserted-by":"crossref","unstructured":"Ziheng Wei and Sebastian Link. 2019. Discovery and Ranking of Functional Dependencies. In ICDE. 1526--1537.","DOI":"10.1109\/ICDE.2019.00137"},{"key":"e_1_2_1_59_1","volume-title":"Robertson","author":"Wyss Catharine M.","year":"2001","unstructured":"Catharine M. Wyss, Chris Giannella, and Edward L. Robertson. 2001. FastFDs: A Heuristic-Driven, Depth-First Algorithm for Mining Functional Dependencies from Relation Instances. In DaWaK."},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.14778\/3565816.3565828"},{"key":"e_1_2_1_61_1","doi-asserted-by":"crossref","unstructured":"Renjie Xiao Yong'an Yuan Zijing Tan Shuai Ma and Wei Wang. 2022. Dynamic Functional Dependency Discovery with Dynamic Hitting Set Enumeration. In ICDE. 286--298.","DOI":"10.1109\/ICDE53745.2022.00026"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3654621.3654624","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,11,20]],"date-time":"2024-11-20T19:57:35Z","timestamp":1732132655000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3654621.3654624"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,3]]},"references-count":61,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2024,3]]}},"alternative-id":["10.14778\/3654621.3654624"],"URL":"https:\/\/doi.org\/10.14778\/3654621.3654624","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2024,3]]},"assertion":[{"value":"2024-05-30","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}