{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,2]],"date-time":"2026-06-02T23:31:13Z","timestamp":1780443073208,"version":"3.54.1"},"reference-count":52,"publisher":"Association for Computing Machinery (ACM)","issue":"3","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2022,11]]},"abstract":"<jats:p>\n            We study the problem of self-supervised and interpretable data cleaning, which automatically extracts interpretable data repair rules from dirty data. In this paper, we propose a novel framework, namely Garf, based on sequence generative adversarial networks (SeqGAN). One key piece of information that Garf tries to capture is data repair rules (for example, if the city is \"Dothan\", then the county should be \"Houston\"). Garf employs a SeqGAN consisting of a generator\n            <jats:italic>G<\/jats:italic>\n            and a discriminator\n            <jats:italic>D<\/jats:italic>\n            that trains\n            <jats:italic>G<\/jats:italic>\n            to learn the dependency relationships (\n            <jats:italic>e.g.<\/jats:italic>\n            , given a city value \"Dothan\" as input, the county can be determined as \"Houston\"). After training, the generator\n            <jats:italic>G<\/jats:italic>\n            can be used to generate data repair rules, which may include both trusted and untrusted rules, especially when learning from dirty data. To mitigate this problem, Garf further updates the learned relationships with another discriminator\n            <jats:italic>D'<\/jats:italic>\n            to iteratively improve the quality of both rules and data. Garf takes advantage of both logical and learning-based methods, which allows it to clean dirty data with high interpretability without requiring prior knowledge or training data. 
Extensive experiments on real-world and synthetic datasets demonstrate the effectiveness of Garf. Garf achieves new state-of-the-art data cleaning results with high accuracy, through learning from dirty datasets without human supervision.\n          <\/jats:p>","DOI":"10.14778\/3570690.3570694","type":"journal-article","created":{"date-parts":[[2023,1,23]],"date-time":"2023-01-23T17:29:55Z","timestamp":1674494995000},"page":"433-446","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":25,"title":["Self-Supervised and Interpretable Data Cleaning with Sequence Generative Adversarial Networks"],"prefix":"10.14778","volume":"16","author":[{"given":"Jinfeng","family":"Peng","sequence":"first","affiliation":[{"name":"Northeastern Univ., China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Derong","family":"Shen","sequence":"additional","affiliation":[{"name":"Northeastern Univ., China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Nan","family":"Tang","sequence":"additional","affiliation":[{"name":"HBKU, Qatar"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Tieying","family":"Liu","sequence":"additional","affiliation":[{"name":"Northeastern Univ., China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Yue","family":"Kou","sequence":"additional","affiliation":[{"name":"Northeastern Univ., China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Tiezheng","family":"Nie","sequence":"additional","affiliation":[{"name":"Northeastern Univ., China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Hang","family":"Cui","sequence":"additional","affiliation":[{"name":"UIUC"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Ge","family":"Yu","sequence":"additional","affiliation":[{"name":"Northeastern Univ., 
China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2023,1,23]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.14778\/2994509.2994518"},{"key":"e_1_2_1_2_1","doi-asserted-by":"crossref","first-page":"1433","DOI":"10.1007\/s00778-020-00617-6","article-title":"Automatic weighted matching rectifying rule discovery for data repairing","volume":"29","author":"Ahmad Hiba Abu","year":"2020","unstructured":"Hiba Abu Ahmad and Hongzhi Wang . 2020 . Automatic weighted matching rectifying rule discovery for data repairing . VLDB J. 29 , 6 (2020), 1433 -- 1447 . Hiba Abu Ahmad and Hongzhi Wang. 2020. Automatic weighted matching rectifying rule discovery for data repairing. VLDB J. 29, 6 (2020), 1433--1447.","journal-title":"VLDB J."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/1066157.1066175"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2007.367920"},{"key":"e_1_2_1_5_1","volume-title":"Proceedings of the ACM SIGMOD International Conference on Management of Data. 143--154","author":"Bohannon Philip","year":"2005","unstructured":"Philip Bohannon , Michael Flaster , Wenfei Fan , and Rajeev Rastogi . 2005 . A Cost-Based Model and Effective Heuristic for Repairing Constraints by Value Modification . In Proceedings of the ACM SIGMOD International Conference on Management of Data. 143--154 . Philip Bohannon, Michael Flaster, Wenfei Fan, and Rajeev Rastogi. 2005. A Cost-Based Model and Effective Heuristic for Repairing Constraints by Value Modification. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 143--154."},{"key":"e_1_2_1_6_1","volume-title":"2008 IEEE 24th International Conference on Data Engineering. IEEE, 516--525","author":"Bravo Loreto","year":"2008","unstructured":"Loreto Bravo , Wenfei Fan , Floris Geerts , and Shuai Ma . 2008 . 
Increasing the expressivity of conditional functional dependencies without extra complexity . In 2008 IEEE 24th International Conference on Data Engineering. IEEE, 516--525 . Loreto Bravo, Wenfei Fan, Floris Geerts, and Shuai Ma. 2008. Increasing the expressivity of conditional functional dependencies without extra complexity. In 2008 IEEE 24th International Conference on Data Engineering. IEEE, 516--525."},{"key":"e_1_2_1_7_1","volume-title":"VLDB","volume":"7","author":"Bravo Loreto","year":"2007","unstructured":"Loreto Bravo , Wenfei Fan , and Shuai Ma . 2007 . Extending Dependencies with Conditions .. In VLDB , Vol. 7 . Citeseer, 243--254. Loreto Bravo, Wenfei Fan, and Shuai Ma. 2007. Extending Dependencies with Conditions.. In VLDB, Vol. 7. Citeseer, 243--254."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10618-019-00667-7"},{"key":"e_1_2_1_9_1","volume-title":"Proceedings of the 27th International Conference on Data Engineering, ICDE 2011, April 11--16","author":"Chiang Fei","year":"2011","unstructured":"Fei Chiang and Ren\u00e9e J. Miller . 2011. A unified model for data and constraint repair . In Proceedings of the 27th International Conference on Data Engineering, ICDE 2011, April 11--16 , 2011 , Hannover, Germany, Serge Abiteboul, Klemens B\u00f6hm, Christoph Koch, and Kian-Lee Tan (Eds.). IEEE Computer Society, 446--457. Fei Chiang and Ren\u00e9e J. Miller. 2011. A unified model for data and constraint repair. In Proceedings of the 27th International Conference on Data Engineering, ICDE 2011, April 11--16, 2011, Hannover, Germany, Serge Abiteboul, Klemens B\u00f6hm, Christoph Koch, and Kian-Lee Tan (Eds.). IEEE Computer Society, 446--457."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.14778\/2536258.2536262"},{"key":"e_1_2_1_11_1","volume-title":"29th IEEE International Conference on Data Engineering, ICDE. 458--469","author":"Chu Xu","year":"2013","unstructured":"Xu Chu , Ihab F. Ilyas , and Paolo Papotti . 
2013 . Holistic data cleaning: Putting violations into context . In 29th IEEE International Conference on Data Engineering, ICDE. 458--469 . Xu Chu, Ihab F. Ilyas, and Paolo Papotti. 2013. Holistic data cleaning: Putting violations into context. In 29th IEEE International Conference on Data Engineering, ICDE. 458--469."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.14778\/2824032.2824109"},{"key":"e_1_2_1_13_1","first-page":"315","article-title":"Improving Data Quality: Consistency and Accuracy","volume":"7","author":"Cong Gao","year":"2007","unstructured":"Gao Cong , Wenfei Fan , Floris Geerts , Xibei Jia , and Shuai Ma . 2007 . Improving Data Quality: Consistency and Accuracy .. In VLDB , Vol. 7. 315 -- 326 . Gao Cong, Wenfei Fan, Floris Geerts, Xibei Jia, and Shuai Ma. 2007. Improving Data Quality: Consistency and Accuracy.. In VLDB, Vol. 7. 315--326.","journal-title":"VLDB"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/2463676.2465327"},{"key":"e_1_2_1_15_1","volume-title":"Ziawasch Abedjan, Sibo Wang, Michael Stonebraker, Ahmed K. Elmagarmid, Ihab F. Ilyas, Samuel Madden, Mourad Ouzzani, and Nan Tang.","author":"Deng Dong","year":"2017","unstructured":"Dong Deng , Raul Castro Fernandez , Ziawasch Abedjan, Sibo Wang, Michael Stonebraker, Ahmed K. Elmagarmid, Ihab F. Ilyas, Samuel Madden, Mourad Ouzzani, and Nan Tang. 2017 . The Data Civilizer System. In 8th Biennial Conference on Innovative Data Systems Research, CIDR 2017, Chaminade, CA, USA, January 8--11, 2017, Online Proceedings . www.cidrdb.org. http:\/\/cidrdb.org\/cidr2017\/papers\/p44-deng-cidr17.pdf Dong Deng, Raul Castro Fernandez, Ziawasch Abedjan, Sibo Wang, Michael Stonebraker, Ahmed K. Elmagarmid, Ihab F. Ilyas, Samuel Madden, Mourad Ouzzani, and Nan Tang. 2017. The Data Civilizer System. In 8th Biennial Conference on Innovative Data Systems Research, CIDR 2017, Chaminade, CA, USA, January 8--11, 2017, Online Proceedings. www.cidrdb.org. 
http:\/\/cidrdb.org\/cidr2017\/papers\/p44-deng-cidr17.pdf"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.5555\/3430915.3442430"},{"key":"e_1_2_1_17_1","volume-title":"Deep generative image models using a laplacian pyramid of adversarial networks. arXiv preprint arXiv:1506.05751","author":"Denton Emily","year":"2015","unstructured":"Emily Denton , Soumith Chintala , Arthur Szlam , and Rob Fergus . 2015. Deep generative image models using a laplacian pyramid of adversarial networks. arXiv preprint arXiv:1506.05751 ( 2015 ). Emily Denton, Soumith Chintala, Arthur Szlam, and Rob Fergus. 2015. Deep generative image models using a laplacian pyramid of adversarial networks. arXiv preprint arXiv:1506.05751 (2015)."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.14778\/2536274.2536280"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-010-0206-6"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/1366102.1366103"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2010.154"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-011-0253-7"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2012.82"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/2567657"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.14778\/1920841.1921060"},{"key":"e_1_2_1_26_1","volume-title":"GAN Ensemble for Anomaly Detection. In Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Thirty-Third Conference on Innovative Applications of Artificial Intelligence. 4090--4097","author":"Han Xu","year":"2021","unstructured":"Xu Han , Xiaohui Chen , and Li-Ping Liu . 2021 . GAN Ensemble for Anomaly Detection. In Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Thirty-Third Conference on Innovative Applications of Artificial Intelligence. 4090--4097 . Xu Han, Xiaohui Chen, and Li-Ping Liu. 2021. 
GAN Ensemble for Anomaly Detection. In Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Thirty-Third Conference on Innovative Applications of Artificial Intelligence. 4090--4097."},{"key":"e_1_2_1_27_1","volume-title":"33rd IEEE International Conference on Data Engineering, ICDE. 49--50","author":"Hao Shuang","year":"2017","unstructured":"Shuang Hao , Nan Tang , Guoliang Li , Jian He , Na Ta , and Jianhua Feng . 2017 . A Novel Cost-Based Model for Data Repairing . In 33rd IEEE International Conference on Data Engineering, ICDE. 49--50 . Shuang Hao, Nan Tang, Guoliang Li, Jian He, Na Ta, and Jianhua Feng. 2017. A Novel Cost-Based Model for Data Repairing. In 33rd IEEE International Conference on Data Engineering, ICDE. 49--50."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2017.141"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-018-0506-9"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1093\/comjnl\/42.2.100"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2015.7113269"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.14778\/3407790.3407801"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/3299869.3324956"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.14778\/2794367.2794377"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.14778\/2856318.2856325"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.14778\/3377369.3377377"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.14778\/3236187.3236193"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.14778\/3137628.3137631"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.14778\/3476249.3476301"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.14778\/2732967.2732974"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.14778\/3457390.3457391"},{"key":"e_1_2_1_42_1","doi-
asserted-by":"publisher","DOI":"10.14778\/3457390.3457391"},{"key":"e_1_2_1_43_1","volume-title":"International Conference on Management of Data, SIGMOD. 457--468","author":"Wang Jiannan","year":"2014","unstructured":"Jiannan Wang and Nan Tang . 2014 . Towards dependable data repairing with fixing rules . In International Conference on Management of Data, SIGMOD. 457--468 . Jiannan Wang and Nan Tang. 2014. Towards dependable data repairing with fixing rules. In International Conference on Management of Data, SIGMOD. 457--468."},{"key":"e_1_2_1_44_1","first-page":"3","article-title":"Dependable Data Repairing with Fixing Rules","volume":"8","author":"Wang Jiannan","year":"2017","unstructured":"Jiannan Wang and Nan Tang . 2017 . Dependable Data Repairing with Fixing Rules . ACM J. Data Inf. Qual. 8 , 3 -- 4 (2017), 16:1--16:34. Jiannan Wang and Nan Tang. 2017. Dependable Data Repairing with Fixing Rules. ACM J. Data Inf. Qual. 8, 3--4 (2017), 16:1--16:34.","journal-title":"ACM J. Data Inf. Qual."},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/1093382.1093385"},{"key":"e_1_2_1_46_1","volume-title":"Third International Conference, DaWaK 2001, Munich, Germany, September 5--7, 2001, Proceedings. 101--110","author":"Wyss Catharine M.","unstructured":"Catharine M. Wyss , Chris Giannella , and Edward L. Robertson . 2001. FastFDs: A Heuristic-Driven, Depth-First Algorithm for Mining Functional Dependencies from Relation Instances - Extended Abstract. In Data Warehousing and Knowledge Discovery , Third International Conference, DaWaK 2001, Munich, Germany, September 5--7, 2001, Proceedings. 101--110 . Catharine M. Wyss, Chris Giannella, and Edward L. Robertson. 2001. FastFDs: A Heuristic-Driven, Depth-First Algorithm for Mining Functional Dependencies from Relation Instances - Extended Abstract. In Data Warehousing and Knowledge Discovery, Third International Conference, DaWaK 2001, Munich, Germany, September 5--7, 2001, Proceedings. 
101--110."},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/2463676.2463706"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.14778\/1952376.1952378"},{"key":"e_1_2_1_49_1","volume-title":"Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM 2002","author":"Yao Hong","year":"2002","unstructured":"Hong Yao , Howard J. Hamilton , and Cory J. Butz . 2002. FD_Mine: Discovering Functional Dependencies in a Database Using Equivalences . In Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM 2002 ), 9--12 December 2002 , Maebashi City, Japan. 729--732. Hong Yao, Howard J. Hamilton, and Cory J. Butz. 2002. FD_Mine: Discovering Functional Dependencies in a Database Using Equivalences. In Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM 2002), 9--12 December 2002, Maebashi City, Japan. 729--732."},{"key":"e_1_2_1_50_1","volume-title":"Proceedings of the 35th International Conference on Machine Learning, ICML. 5675--5684","author":"Yoon Jinsung","unstructured":"Jinsung Yoon , James Jordon , and Mihaela van der Schaar. 2018. GAIN: Missing Data Imputation using Generative Adversarial Nets . In Proceedings of the 35th International Conference on Machine Learning, ICML. 5675--5684 . Jinsung Yoon, James Jordon, and Mihaela van der Schaar. 2018. GAIN: Missing Data Imputation using Generative Adversarial Nets. In Proceedings of the 35th International Conference on Machine Learning, ICML. 5675--5684."},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00457"},{"key":"e_1_2_1_52_1","volume-title":"Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence. 2852--2858","author":"Yu Lantao","year":"2017","unstructured":"Lantao Yu , Weinan Zhang , Jun Wang , and Yong Yu . 2017 . SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient . In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence. 2852--2858 . 
Lantao Yu, Weinan Zhang, Jun Wang, and Yong Yu. 2017. SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence. 2852--2858."}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3570690.3570694","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,23]],"date-time":"2023-01-23T17:36:51Z","timestamp":1674495411000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3570690.3570694"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,11]]},"references-count":52,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2022,11]]}},"alternative-id":["10.14778\/3570690.3570694"],"URL":"https:\/\/doi.org\/10.14778\/3570690.3570694","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2022,11]]},"assertion":[{"value":"2023-01-23","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}