{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,10]],"date-time":"2026-03-10T14:46:37Z","timestamp":1773153997135,"version":"3.50.1"},"reference-count":24,"publisher":"Association for Computing Machinery (ACM)","issue":"2","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2015,10]]},"abstract":"<jats:p>We study the problem of introducing errors into clean databases for the purpose of benchmarking data-cleaning algorithms. Our goal is to provide users with the highest possible level of control over the error-generation process, and at the same time develop solutions that scale to large databases. We show in the paper that the error-generation problem is surprisingly challenging, and in fact, NP-complete. To provide a scalable solution, we develop a correct and efficient greedy algorithm that sacrifices completeness, but succeeds under very reasonable assumptions. To scale to millions of tuples, the algorithm relies on several non-trivial optimizations, including a new symmetry property of data quality constraints. The trade-off between control and scalability is the main technical contribution of the paper.<\/jats:p>","DOI":"10.14778\/2850578.2850579","type":"journal-article","created":{"date-parts":[[2016,2,1]],"date-time":"2016-02-01T14:10:31Z","timestamp":1454335831000},"page":"36-47","source":"Crossref","is-referenced-by-count":61,"title":["Messing up with BART"],"prefix":"10.14778","volume":"9","author":[{"given":"Patricia C.","family":"Arocena","sequence":"first","affiliation":[{"name":"University of Toronto, Canada"}]},{"given":"Boris","family":"Glavic","sequence":"additional","affiliation":[{"name":"Illinois Inst. 
of Technology"}]},{"given":"Giansalvatore","family":"Mecca","sequence":"additional","affiliation":[{"name":"University of Basilicata, Italy"}]},{"given":"Ren\u00e9e J.","family":"Miller","sequence":"additional","affiliation":[{"name":"University of Toronto, Canada"}]},{"given":"Paolo","family":"Papotti","sequence":"additional","affiliation":[{"name":"QCRI Doha, Qatar"}]},{"given":"Donatello","family":"Santoro","sequence":"additional","affiliation":[{"name":"University of Basilicata, Italy"}]}],"member":"320","published-online":{"date-parts":[[2015,10]]},"reference":[{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/319628.319634"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.14778\/1920841.1920870"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2007.367896"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2007.367920"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/1066157.1066175"},{"key":"e_1_2_1_7_1","first-page":"243","volume-title":"VLDB","author":"Bravo L.","year":"2007","unstructured":"L. Bravo, W. Fan, and S. Ma. Extending Dependencies with Conditions. In VLDB, pages 243--254, 2007."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2013.6544847"},{"key":"e_1_2_1_9_1","first-page":"315","volume-title":"VLDB","author":"Cong G.","year":"2007","unstructured":"G. Cong, W. Fan, F. Geerts, X. Jia, and S. Ma. Improving Data Quality: Consistency and Accuracy. 
In VLDB, pages 315--326, 2007."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2011.27"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/2463676.2465327"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.5555\/2371176"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/1366102.1366103"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.14778\/1920841.1920867"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/1066157.1066176"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2014.6816654"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.14778\/2733004.2733031"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.14778\/1920841.1920869"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.14778\/1453856.1453936"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/1514894.1514901"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2007.367867"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/2213836.2213875"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1008726123320"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/2588555.2610494"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/2463676.2463706"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/2850578.2850579","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T10:17:12Z","timestamp":1672222632000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/2850578.2850579"}},"subtitle":["error generation for evaluating data-cleaning 
algorithms"],"short-title":[],"issued":{"date-parts":[[2015,10]]},"references-count":24,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2015,10]]}},"alternative-id":["10.14778\/2850578.2850579"],"URL":"https:\/\/doi.org\/10.14778\/2850578.2850579","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2015,10]]}}}