{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,1,1]],"date-time":"2025-01-01T05:07:24Z","timestamp":1735708044580,"version":"3.32.0"},"reference-count":17,"publisher":"Association for Computing Machinery (ACM)","issue":"12","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2024,8]]},"abstract":"<jats:p>We demonstrate Rock, a system for cleaning relational data. Rock highlights the following unique features: (1) it extends logic rules by embedding machine learning models as predicates, to benefit from both ML and logic deduction; (2) it supports entity resolution, conflict resolution, timeliness deduction and missing data imputation in a unified process; and (3) it provides parallelly scalable algorithms for rule discovery, error detection and error correction, in batch and incremental modes. We will demonstrate Rock for its (a) easy-to-use interface, (b) scalability when cleaning large datasets, (c) accuracy for detecting and correcting errors across multiple tables, and (d) applications at banks and HR departments.<\/jats:p>","DOI":"10.14778\/3685800.3685878","type":"journal-article","created":{"date-parts":[[2024,11,8]],"date-time":"2024-11-08T17:25:21Z","timestamp":1731086721000},"page":"4373-4376","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Rock: Cleaning Data with both ML and Logic Rules"],"prefix":"10.14778","volume":"17","author":[{"given":"Zian","family":"Bao","sequence":"first","affiliation":[{"name":"Shenzhen Institute of Computing Sciences, China"}]},{"given":"Binbin","family":"Bie","sequence":"additional","affiliation":[{"name":"Shenzhen Institute of Computing Sciences, China"}]},{"given":"Wenfei","family":"Fan","sequence":"additional","affiliation":[{"name":"Shenzhen Institute of Computing Sciences, China and University of Edinburgh, United Kingdom and Beihang University, China"}]},{"given":"Daji","family":"Li","sequence":"additional","affiliation":[{"name":"Shenzhen Institute of Computing Sciences, China"}]},{"given":"Mengyun","family":"Li","sequence":"additional","affiliation":[{"name":"Shenzhen Institute of Computing Sciences, China"}]},{"given":"Kaiwen","family":"Lin","sequence":"additional","affiliation":[{"name":"Shenzhen Institute of Computing Sciences, China"}]},{"given":"Wei","family":"Lin","sequence":"additional","affiliation":[{"name":"Shenzhen Institute of Computing Sciences, China"}]},{"given":"Peijie","family":"Liu","sequence":"additional","affiliation":[{"name":"Shenzhen Institute of Computing Sciences, China"}]},{"given":"Peng","family":"Liu","sequence":"additional","affiliation":[{"name":"Shenzhen Institute of Computing Sciences, China"}]},{"given":"Zhicong","family":"Lv","sequence":"additional","affiliation":[{"name":"Shenzhen Institute of Computing Sciences, China"}]},{"given":"Mingliang","family":"Ouyang","sequence":"additional","affiliation":[{"name":"Shenzhen Institute of Computing Sciences, China"}]},{"given":"Chenyang","family":"Sun","sequence":"additional","affiliation":[{"name":"Shenzhen Institute of Computing Sciences, China"}]},{"given":"Shuai","family":"Tang","sequence":"additional","affiliation":[{"name":"Shenzhen Institute of Computing Sciences, China"}]},{"given":"Yaoshu","family":"Wang","sequence":"additional","affiliation":[{"name":"Shenzhen Institute of Computing Sciences, China"}]},{"given":"Qiyuan","family":"Wei","sequence":"additional","affiliation":[{"name":"Shenzhen Institute of Computing Sciences, China"}]},{"given":"Xiangqian","family":"Wu","sequence":"additional","affiliation":[{"name":"Shenzhen Institute of Computing Sciences, China"}]},{"given":"Min","family":"Xie","sequence":"additional","affiliation":[{"name":"Shenzhen Institute of Computing Sciences, China"}]},{"given":"Jing","family":"Zhang","sequence":"additional","affiliation":[{"name":"Shenzhen Institute of Computing Sciences, China"}]},{"given":"Runxiao","family":"Zhao","sequence":"additional","affiliation":[{"name":"Shenzhen Institute of Computing Sciences, China"}]},{"given":"Jie","family":"Zhu","sequence":"additional","affiliation":[{"name":"Shenzhen Institute of Computing Sciences, China"}]},{"given":"Yilin","family":"Zhu","sequence":"additional","affiliation":[{"name":"Shenzhen Institute of Computing Sciences, China"}]}],"member":"320","published-online":{"date-parts":[[2024,11,8]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"2023. Rock. http:\/\/www.grandhoo.com\/en."},{"key":"e_1_2_1_2_1","doi-asserted-by":"crossref","unstructured":"Marcelo Arenas Leopoldo Bertossi and Jan Chomicki. 1999. Consistent Query Answers in Inconsistent Databases. In PODS. 68--79.","DOI":"10.1145\/303976.303983"},{"key":"e_1_2_1_3_1","volume-title":"Rock: Cleaning Data by Embedding ML in Logic Rules. In SIGMOD (industrial paper). ACM.","author":"Bao","year":"2024","unstructured":"Bao et. al. 2024. Rock: Cleaning Data by Embedding ML in Logic Rules. In SIGMOD (industrial paper). ACM."},{"key":"e_1_2_1_4_1","unstructured":"Exasol. 2020. Exasol Research Finds 58% of Organizations Make Decisions Based on Outdated Data. https:\/\/www.exasol.com\/news-exasol-research-finds-organizations-make-decisions-based-on-outdated-data\/."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-010-0206-6"},{"volume-title":"Linking Entities across Relations and Graphs","author":"Fan Wenfei","key":"e_1_2_1_6_1","unstructured":"Wenfei Fan, Ling Ge, Ruochun Jin, Ping Lu, and Wenyuan Yu. 2022. Linking Entities across Relations and Graphs. In ICDE. IEEE, 634--647."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/1366102.1366103"},{"key":"e_1_2_1_8_1","volume-title":"Proc. ACM Manag. Data","author":"Fan Wenfei","year":"2024","unstructured":"Wenfei Fan, Ziyan Han, Weilong Ren, Ding Wang Yaoshu Wang, Min Xie, and Mengyi Yan. 2024. Splitting Tuples of Mismatched Entities. Proc. ACM Manag. Data (2024)."},{"key":"e_1_2_1_9_1","doi-asserted-by":"crossref","unstructured":"Wenfei Fan Ziyan Han Yaoshu Wang and Min Xie. 2022. Parallel Rule Discovery from Large Datasets by Sampling. In SIGMOD. ACM 384--398.","DOI":"10.1145\/3514221.3526165"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/3588924"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.14778\/3457390.3457400"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.14778\/3594512.3594524"},{"key":"e_1_2_1_13_1","volume-title":"Jonathan AC Sterne, and Kate Tilling","author":"Hughes Rachael A","year":"2019","unstructured":"Rachael A Hughes, Jon Heron, Jonathan AC Sterne, and Kate Tilling. 2019. Accounting for missing data in statistical analyses: Multiple imputation is not always the answer. International journal of epidemiology 48, 4 (2019), 1294--1304."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1016\/0304-3975(90)90192-K"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.14778\/3407790.3407801"},{"key":"e_1_2_1_16_1","volume-title":"Samuel Madden, Mourad Ouzzani, Michael Stonebraker, and Nan Tang.","author":"Mahdavi Mohammad","year":"2019","unstructured":"Mohammad Mahdavi, Ziawasch Abedjan, Raul Castro Fernandez, Samuel Madden, Mourad Ouzzani, Michael Stonebraker, and Nan Tang. 2019. Raha: A Configuration-Free Error Detection System. In SIGMOD. 865--882."},{"key":"e_1_2_1_17_1","volume-title":"Holo-Clean: Holistic Data Repairs with Probabilistic Inference. PVLDB","author":"Rekatsinas Theodoros","year":"2017","unstructured":"Theodoros Rekatsinas, Xu Chu, Ihab F. Ilyas, and Christopher R\u00e9. 2017. Holo-Clean: Holistic Data Repairs with Probabilistic Inference. PVLDB (2017)."}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3685800.3685878","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,12,31]],"date-time":"2024-12-31T05:26:05Z","timestamp":1735622765000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3685800.3685878"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,8]]},"references-count":17,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2024,8]]}},"alternative-id":["10.14778\/3685800.3685878"],"URL":"https:\/\/doi.org\/10.14778\/3685800.3685878","relation":{},"ISSN":["2150-8097"],"issn-type":[{"type":"print","value":"2150-8097"}],"subject":[],"published":{"date-parts":[[2024,8]]},"assertion":[{"value":"2024-11-08","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}