{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,13]],"date-time":"2026-04-13T21:52:57Z","timestamp":1776117177733,"version":"3.50.1"},"reference-count":12,"publisher":"Association for Computing Machinery (ACM)","issue":"12","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2015,8]]},"abstract":"<jats:p>Web scraping (or wrapping) is a popular means for acquiring data from the web. Recent advancements have made scalable wrapper-generation possible and enabled data acquisition processes involving thousands of sources. This makes wrapper analysis and maintenance both needed and challenging, as no scalable tools exist that support these tasks.<\/jats:p>\n          <jats:p>We demonstrate WADaR, a scalable and highly automated tool for joint wrapper and data repair. WADaR uses off-the-shelf entity recognisers to locate target entities in wrapper-generated data. 
Markov chains are used to determine structural repairs, which are then encoded into suitable repairs for both the data and the corresponding wrappers.<\/jats:p>\n          <jats:p>We show that WADaR is able to increase the quality of wrapper-generated relations by between 15% and 60%, and to fully repair the corresponding wrapper without any knowledge of the original website in more than 50% of the cases.<\/jats:p>","DOI":"10.14778\/2824032.2824120","type":"journal-article","created":{"date-parts":[[2015,9,16]],"date-time":"2015-09-16T12:18:17Z","timestamp":1442405897000},"page":"1996-1999","source":"Crossref","is-referenced-by-count":16,"title":["WADaR"],"prefix":"10.14778","volume":"8","author":[{"given":"Stefano","family":"Ortona","sequence":"first","affiliation":[{"name":"Oxford University, United Kingdom"}]},{"given":"Giorgio","family":"Orsi","sequence":"additional","affiliation":[{"name":"Oxford University, United Kingdom"}]},{"given":"Marcello","family":"Buoncristiano","sequence":"additional","affiliation":[{"name":"Universit\u00e0 della Basilicata, Italy"}]},{"given":"Tim","family":"Furche","sequence":"additional","affiliation":[{"name":"Oxford University, United Kingdom"}]}],"member":"320","published-online":{"date-parts":[[2015,8]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.14778\/2536206.2536209"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.14778\/2536258.2536261"},{"key":"e_1_2_1_3_1","first-page":"1713","volume-title":"SIGMOD","author":"Chu X.","year":"2015","unstructured":"X. Chu, Y. He, K. Chakrabarti, and K. Ganjam. Tegra: Table extraction by global record alignment. In SIGMOD, pages 1713--1728. ACM, 2015. 
10.1145\/2723372.2723725"},{"key":"e_1_2_1_4_1","first-page":"699","volume-title":"PVLDB","author":"Chuang S.-L.","year":"2007","unstructured":"S.-L. Chuang, K. C.-C. Chang, and C. Zhai. Context-aware wrapping: synchronized data extraction. In PVLDB, pages 699--710, 2007."},{"key":"e_1_2_1_5_1","volume-title":"VLDB","volume":"1","author":"Crescenzi V.","year":"2001","unstructured":"V. Crescenzi, G. Mecca, P. Merialdo, et al. Roadrunner: Towards automatic data extraction from large web sites. In VLDB, volume 1, 2001."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.14778\/2733085.2733091"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-012-0286-6"},{"key":"e_1_2_1_8_1","first-page":"775","volume-title":"SIGIR","author":"Hao Q.","year":"2011","unstructured":"Q. Hao, R. Cai, Y. Pang, and L. Zhang. From one tree to a forest: a unified solution for structured web data extraction. In SIGIR, pages 775--784. ACM, 2011. 10.1145\/2009916.2010020"},{"key":"e_1_2_1_9_1","first-page":"29","volume-title":"ICDE","author":"Mansuri I. R.","year":"2006","unstructured":"I. R. Mansuri and S. Sarawagi. Integrating unstructured data into relational databases. In ICDE, pages 29--29. IEEE, 2006. 10.1109\/ICDE.2006.83"},
{"key":"e_1_2_1_10_1","first-page":"517","volume-title":"SIGMOD","author":"Wang D. Z.","year":"2011","unstructured":"D. Z. Wang, M. J. Franklin, M. Garofalakis, J. M. Hellerstein, and M. L. Wick. Hybrid in-database inference for declarative information extraction. In SIGMOD, pages 517--528. ACM, 2011. 10.1145\/1989323.1989378"},{"key":"e_1_2_1_11_1","first-page":"76","volume-title":"WWW","author":"Zhai Y.","year":"2005","unstructured":"Y. Zhai and B. Liu. Web data extraction based on partial tree alignment. In WWW, pages 76--85, 2005. 10.1145\/1060745.1060761"},{"key":"e_1_2_1_12_1","first-page":"66","volume-title":"WWW","author":"Zhao H.","year":"2005","unstructured":"H. Zhao, W. Meng, Z. Wu, V. Raghavan, and C. Yu. Fully automatic wrapper generation for search engines. In WWW, pages 66--75, 2005. 
10.1145\/1060745.1060760"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/2824032.2824120","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T10:05:15Z","timestamp":1672221915000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/2824032.2824120"}},"subtitle":["joint wrapper and data repair"],"short-title":[],"issued":{"date-parts":[[2015,8]]},"references-count":12,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2015,8]]}},"alternative-id":["10.14778\/2824032.2824120"],"URL":"https:\/\/doi.org\/10.14778\/2824032.2824120","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2015,8]]}}}