{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,1]],"date-time":"2026-04-01T07:26:54Z","timestamp":1775028414157,"version":"3.50.1"},"reference-count":23,"publisher":"Association for Computing Machinery (ACM)","issue":"5","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2011,2]]},"abstract":"<jats:p>In this paper we present GDR, a Guided Data Repair framework that incorporates user feedback in the cleaning process to enhance and accelerate existing automatic repair techniques while minimizing user involvement. GDR consults the user on the updates that are most likely to be beneficial in improving data quality. GDR also uses machine learning methods to identify and apply the correct updates directly to the database without the actual involvement of the user on these specific updates. To rank potential updates for consultation by the user, we first group these repairs and quantify the utility of each group using the decision-theory concept of value of information (VOI). We then apply active learning to order updates within a group based on their ability to improve the learned model. User feedback is used to repair the database and to adaptively refine the training set for the model. We empirically evaluate GDR on a real-world dataset and show significant improvement in data quality using our user-guided repairing process. 
We also assess the trade-off between user effort and the resulting data quality.<\/jats:p>","DOI":"10.14778\/1952376.1952378","type":"journal-article","created":{"date-parts":[[2014,6,24]],"date-time":"2014-06-24T08:17:57Z","timestamp":1403597877000},"page":"279-289","source":"Crossref","is-referenced-by-count":148,"title":["Guided data repair"],"prefix":"10.14778","volume":"4","author":[{"given":"Mohamed","family":"Yakout","sequence":"first","affiliation":[{"name":"Purdue University and Qatar Computing Research Institute, Qatar Foundation"}]},{"given":"Ahmed K.","family":"Elmagarmid","sequence":"additional","affiliation":[{"name":"Qatar Computing Research Institute, Qatar Foundation"}]},{"given":"Jennifer","family":"Neville","sequence":"additional","affiliation":[{"name":"Purdue University"}]},{"given":"Mourad","family":"Ouzzani","sequence":"additional","affiliation":[{"name":"Purdue University"}]},{"given":"Ihab F.","family":"Ilyas","sequence":"additional","affiliation":[]}],"member":"320","published-online":{"date-parts":[[2011,2]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Data Quality: Concepts, Methodologies and Techniques","author":"Batini C.","year":"2006","unstructured":"C. Batini and M. Scannapieco. Data Quality: Concepts, Methodologies and Techniques. Addison-Wesley, 2006."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/1066157.1066175"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2007.367920"},{"key":"e_1_2_1_4_1","first-page":"243","volume-title":"VLDB","author":"Bravo L.","year":"2007","unstructured":"L. Bravo, W. Fan, and S. Ma. Extending dependencies with conditions. 
In VLDB, pages 243--254, 2007."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1010933404324"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ic.2004.04.007"},{"key":"e_1_2_1_7_1","first-page":"315","volume-title":"VLDB","author":"Cong G.","year":"2007","unstructured":"G. Cong, W. Fan, F. Geerts, X. Jia, and S. Ma. Improving data quality: consistency and accuracy. In VLDB, pages 315--326, 2007."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/1376916.1376940"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2009.208"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687627.1687674"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.14778\/1920841.1920867"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/342009.336568"},{"key":"e_1_2_1_13_1","first-page":"376","volume-title":"VLDB","author":"Golab L.","year":"2008","unstructured":"L. Golab, H. Karloff, F. Korn, D. Srivastava, and B. Yu. On generating near-optimal tableaux for conditional functional dependencies. In VLDB, pages 376--390, 2008."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/1376616.1376701"},{"key":"e_1_2_1_15_1","first-page":"877","volume-title":"IJCAI","author":"Kapoor A.","year":"2007","unstructured":"A. Kapoor, E. Horvitz, and S. Basu. Selective supervision: Guiding supervised learning with decision-theoretic active learning. 
In IJCAI, pages 877--882, 2007."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2007.367867"},{"key":"e_1_2_1_17_1","first-page":"381","volume-title":"VLDB","author":"Raman V.","year":"2001","unstructured":"V. Raman and J. M. Hellerstein. Potter's wheel: An interactive data cleaning system. In VLDB, pages 381--390, 2001."},{"key":"e_1_2_1_18_1","volume-title":"Artificial Intelligence: A Modern Approach","author":"Russell S.","year":"2003","unstructured":"S. Russell and P. Norvig. Artificial Intelligence: A Modern Approach. Addison-Wesley, 2003."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/775047.775087"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1162\/153244302760185243"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDEW.2010.5452767"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/1807167.1807325"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/502512.502540"}],"container-title":["Proceedings of the VLDB 
Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/1952376.1952378","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T05:55:38Z","timestamp":1672206938000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/1952376.1952378"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2011,2]]},"references-count":23,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2011,2]]}},"alternative-id":["10.14778\/1952376.1952378"],"URL":"https:\/\/doi.org\/10.14778\/1952376.1952378","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2011,2]]}}}