{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,29]],"date-time":"2025-08-29T16:40:19Z","timestamp":1756485619808,"version":"3.44.0"},"reference-count":61,"publisher":"Association for Computing Machinery (ACM)","issue":"7","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2025,3]]},"abstract":"<jats:p>\n            This paper studies the discovery of relaxed functional dependencies (RFDs). We consider RFDs that relax restrictions in both value equality and constraint satisfaction: treating values as equal if their distance is less than a given similarity threshold, and considering RFDs with violations below a given error threshold as valid. As a highly non-trivial extension of the row-based approach to functional dependency (FD) discovery, we present the first algorithm capable of discovering all valid and minimal RFDs. We extend the structure called\n            <jats:italic toggle=\"yes\">\"difference-set\"<\/jats:italic>\n            for\n            <jats:italic toggle=\"yes\">predicates<\/jats:italic>\n            that are combinations of attributes and similarity thresholds. We present an efficient method for difference-set construction, incorporating optimizations for both time and space complexity. When inferring RFDs from difference-sets, we enumerate RFDs based on the subsumption relationship of their right-hand-side predicates to share computations. An extensive experimental evaluation verifies that the proposed discovery algorithm is faster than baseline methods up to orders of magnitude and effective in finding hidden FDs from dirty data.\n          <\/jats:p>","DOI":"10.14778\/3734839.3734843","type":"journal-article","created":{"date-parts":[[2025,8,29]],"date-time":"2025-08-29T16:01:06Z","timestamp":1756483266000},"page":"2044-2056","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Efficient Discovery of Relaxed Functional Dependencies"],"prefix":"10.14778","volume":"18","author":[{"given":"Mengran","family":"Li","sequence":"first","affiliation":[{"name":"School of Computer Science, Fudan University, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zijing","family":"Tan","sequence":"additional","affiliation":[{"name":"School of Computer Science, Fudan University, China, Shanghai Key Laboratory of Data Science"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Honghui","family":"Yang","sequence":"additional","affiliation":[{"name":"School of Computer Science, Fudan University, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shuai","family":"Ma","sequence":"additional","affiliation":[{"name":"SKLSDE Lab, Beihang University, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,8,29]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Data Profiling: A Tutorial. In SIGMOD. 1747\u20131751.","author":"Abedjan Ziawasch","year":"2017","unstructured":"Ziawasch Abedjan, Lukasz Golab, and Felix Naumann. 2017. Data Profiling: A Tutorial. In SIGMOD. 1747\u20131751."},{"key":"e_1_2_1_2_1","volume-title":"Data Profiling","author":"Abedjan Ziawasch","unstructured":"Ziawasch Abedjan, Lukasz Golab, Felix Naumann, and Thorsten Papenbrock. 2018. Data Profiling. Morgan & Claypool Publishers, San Rafael."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/2661829.2661884"},{"key":"e_1_2_1_4_1","doi-asserted-by":"crossref","unstructured":"Tobias Bleifu\u00df Susanne Bulow Johannes Frohnhofen Julian Risch Georg Wiese Sebastian Kruse Thorsten Papenbrock and Felix Naumann. 2016. Approximate Discovery of Functional Dependencies for Large Datasets. In CIKM. 1803\u20131812.","DOI":"10.1145\/2983323.2983781"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/3639298"},{"key":"e_1_2_1_6_1","doi-asserted-by":"crossref","unstructured":"Bernardo Breve Loredana Caruccio Stefano Cirillo Vincenzo Deufemia and Giuseppe Polese. 2023. IndiBits: Incremental Discovery of Relaxed Functional Dependencies using Bitwise Similarity. In ICDE. 1393\u20131405.","DOI":"10.1109\/ICDE55515.2023.00111"},{"key":"e_1_2_1_7_1","article-title":"Incremental Discovery of Imprecise Functional Dependencies","volume":"12","author":"Caruccio Loredana","year":"2020","unstructured":"Loredana Caruccio and Stefano Cirillo. 2020. Incremental Discovery of Imprecise Functional Dependencies. ACM J. Data Inf. Qual. 12, 4 (2020), 19:1\u201319:25.","journal-title":"ACM J. Data Inf. Qual."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2020.2967722"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2015.2472010"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10618-019-00667-7"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2023.3248780"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.14778\/2536258.2536262"},{"key":"e_1_2_1_13_1","doi-asserted-by":"crossref","unstructured":"Xu Chu Ihab F. Ilyas and Paolo Papotti. 2013. Holistic data cleaning: Putting violations into context. In ICDE. 458\u2013469.","DOI":"10.1109\/ICDE.2013.6544847"},{"key":"e_1_2_1_14_1","doi-asserted-by":"crossref","unstructured":"Xiaoou Ding Yida Liu Hongzhi Wang Chen Wang Yichen Song Donghua Yang and Jianmin Wang. 2024. Efficient Relaxed Functional Dependency Discovery with Minimal Set Cover. In ICDE. 3519\u20133531.","DOI":"10.1109\/ICDE60146.2024.00271"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.14778\/3681954.3682015"},{"key":"e_1_2_1_16_1","volume-title":"Foundations of Data Quality Management","author":"Fan Wenfei","unstructured":"Wenfei Fan and Floris Geerts. 2012. Foundations of Data Quality Management. Morgan & Claypool Publishers, San Rafael."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2010.154"},{"key":"e_1_2_1_18_1","first-page":"139","article-title":"Database Dependency Discovery","volume":"12","author":"Flach Peter A.","year":"1999","unstructured":"Peter A. Flach and Iztok Savnik. 1999. Database Dependency Discovery: A Machine Learning Approach. AI Commun. 12, 3 (1999), 139\u2013160.","journal-title":"A Machine Learning Approach. AI Commun."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.14778\/1453856.1453900"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1093\/comjnl\/42.2.100"},{"key":"e_1_2_1_21_1","volume-title":"Ilyas and Xu Chu","author":"Ihab","year":"2019","unstructured":"Ihab F. Ilyas and Xu Chu. 2019. Data Cleaning. ACM, New York City."},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/1007568.1007641"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2021.3130227"},{"key":"e_1_2_1_24_1","doi-asserted-by":"crossref","unstructured":"Yifeng Jin Lin Zhu and Zijing Tan. 2020. Efficient Bidirectional Order Dependency Discovery. In ICDE. 61\u201372.","DOI":"10.1109\/ICDE48307.2020.00013"},{"key":"e_1_2_1_25_1","doi-asserted-by":"crossref","unstructured":"Jyrki Kivinen and Heikki Mannila. 1992. Approximate Dependency Inference from Relations. In ICDT. 86\u201398.","DOI":"10.1007\/3-540-56039-4_34"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.14778\/3192965.3192968"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.14778\/3654621.3654624"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-015-0412-3"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1002\/spe.2402"},{"key":"e_1_2_1_30_1","unstructured":"Mengran Li Zijing Tan Honghui Yang and Shuai Ma. 2024. https:\/\/github.com\/YukinoMR\/FastRFD (last accessed 2025\/4\/10)."},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-021-00696-z"},{"key":"e_1_2_1_32_1","doi-asserted-by":"crossref","unstructured":"Qiongqiong Lin Yunfan Gu Jingyan Sai Jinfei Liu Kui Ren Li Xiong Tianzhen Wang Yanbei Pang Sheng Wang and Feifei Li. 2023. EulerFD: An Efficient Double-Cycle Approximation of Functional Dependencies. In ICDE. 2878\u20132891.","DOI":"10.1109\/ICDE55515.2023.00220"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.14778\/3401960.3401966"},{"key":"e_1_2_1_34_1","doi-asserted-by":"crossref","unstructured":"Panagiotis Mandros Mario Boley and Jilles Vreeken. 2017. Discovering Reliable Approximate Functional Dependencies. In SIGKDD. 355\u2013363.","DOI":"10.1145\/3097983.3098062"},{"key":"e_1_2_1_35_1","doi-asserted-by":"crossref","unstructured":"Panagiotis Mandros Mario Boley and Jilles Vreeken. 2018. Discovering Reliable Dependencies from Data: Hardness and Improved Algorithms. In ICDM. 317\u2013326.","DOI":"10.1109\/ICDM.2018.00047"},{"key":"e_1_2_1_36_1","doi-asserted-by":"crossref","unstructured":"Panagiotis Mandros David Kaltenpoth Mario Boley and Jilles Vreeken. 2020. Discovering Functional Dependencies from Mixed-Type Data. In SIGKDD. 1404\u20131414.","DOI":"10.1145\/3394486.3403193"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1016\/0169-023X(94)90023-X"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.14778\/2794367.2794377"},{"key":"e_1_2_1_39_1","doi-asserted-by":"crossref","unstructured":"Thorsten Papenbrock and Felix Naumann. 2016. A Hybrid Approach to Functional Dependency Discovery. In SIGMOD. 821\u2013833.","DOI":"10.1145\/2882903.2915203"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.14778\/3368289.3368293"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.14778\/3574245.3574254"},{"key":"e_1_2_1_42_1","doi-asserted-by":"crossref","unstructured":"Fr\u00e9d\u00e9ric Pennerath Panagiotis Mandros and Jilles Vreeken. 2020. Discovering Approximate Functional Dependencies using Smoothed Mutual Information. In SIGKDD. 1254\u20131264.","DOI":"10.1145\/3394486.3403178"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-023-00788-y"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.14778\/3236187.3236193"},{"key":"e_1_2_1_45_1","volume-title":"ECML PKDD","author":"Rammelaere Joeri","year":"2018","unstructured":"Joeri Rammelaere and Floris Geerts. 2018. Revisiting Conditional Functional Dependency Discovery: Splitting the \"C\" from the \"FD\". In ECML PKDD 2018. 552\u2013568."},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.14778\/3342263.3342638"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/3392778"},{"key":"e_1_2_1_48_1","unstructured":"Philipp Schirmer Thorsten Papenbrock Sebastian Kruse Felix Naumann Dennis Hempfing Torben Mayer and Daniel Neusch\u00e4fer-Rube. 2019. DynFD: Functional Dependency Discovery in Dynamic Datasets. In EDBT. 253\u2013264."},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-021-00683-4"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/2000824.2000826"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2013.84"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2020.3046443"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.14778\/3067421.3067422"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-018-0510-0"},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.14778\/2556549.2556568"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-021-00684-3"},{"key":"e_1_2_1_57_1","doi-asserted-by":"crossref","unstructured":"Ziheng Wei and Sebastian Link. 2019. Discovery and Ranking of Functional Dependencies. In ICDE. 1526\u20131537.","DOI":"10.1109\/ICDE.2019.00137"},{"key":"e_1_2_1_58_1","volume-title":"Robertson","author":"Wyss Catharine M.","year":"2001","unstructured":"Catharine M. Wyss, Chris Giannella, and Edward L. Robertson. 2001. FastFDs: A Heuristic-Driven, Depth-First Algorithm for Mining Functional Dependencies from Relation Instances. In DaWaK. 101\u2013110."},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.14778\/3565816.3565828"},{"key":"e_1_2_1_60_1","doi-asserted-by":"crossref","unstructured":"Renjie Xiao Yong'an Yuan Zijing Tan Shuai Ma and Wei Wang. 2022. Dynamic Functional Dependency Discovery with Dynamic Hitting Set Enumeration. In ICDE. 286\u2013298.","DOI":"10.1109\/ICDE53745.2022.00026"},{"key":"e_1_2_1_61_1","doi-asserted-by":"crossref","unstructured":"Yunjia Zhang Zhihan Guo and Theodoros Rekatsinas. 2020. A Statistical Perspective on Discovering Functional Dependencies in Noisy Data. In SIGMOD. 861\u2013876.","DOI":"10.1145\/3318464.3389749"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3734839.3734843","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,29]],"date-time":"2025-08-29T16:02:08Z","timestamp":1756483328000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3734839.3734843"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,3]]},"references-count":61,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2025,3]]}},"alternative-id":["10.14778\/3734839.3734843"],"URL":"https:\/\/doi.org\/10.14778\/3734839.3734843","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2025,3]]},"assertion":[{"value":"2025-08-29","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}