{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,13]],"date-time":"2026-04-13T21:52:55Z","timestamp":1776117175793,"version":"3.50.1"},"reference-count":21,"publisher":"Association for Computing Machinery (ACM)","issue":"4","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2011,1]]},"abstract":"<jats:p>We present a generic framework to make wrapper induction algorithms tolerant to noise in the training data. This enables us to learn wrappers in a completely unsupervised manner from automatically and cheaply obtained noisy training data, e.g., using dictionaries and regular expressions. By removing the site-level supervision that wrapper-based techniques require, we are able to perform information extraction at web-scale, with accuracy unattained with existing unsupervised extraction techniques. Our system is used in production at Yahoo! and powers live applications.<\/jats:p>","DOI":"10.14778\/1938545.1938547","type":"journal-article","created":{"date-parts":[[2014,6,24]],"date-time":"2014-06-24T12:17:57Z","timestamp":1403612277000},"page":"219-230","source":"Crossref","is-referenced-by-count":89,"title":["Automatic wrappers for large scale web extraction"],"prefix":"10.14778","volume":"4","author":[{"given":"Nilesh","family":"Dalvi","sequence":"first","affiliation":[{"name":"Yahoo! Research, Santa Clara, CA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ravi","family":"Kumar","sequence":"additional","affiliation":[{"name":"Yahoo! Research, Santa Clara, CA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mohamed","family":"Soliman","sequence":"additional","affiliation":[{"name":"U. of Waterloo, Ontario, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2011,1]]},"reference":[{"key":"e_1_2_1_1_1","first-page":"126","volume-title":"LWA","author":"Anton T.","year":"2005"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/872757.872799"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1018054314350"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.14778\/1453856.1453916"},{"key":"e_1_2_1_5_1","first-page":"109","volume-title":"VLDB","author":"Crescenzi V.","year":"2001"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/1559845.1559882"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687627.1687749"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/988672.988687"},{"key":"e_1_2_1_9_1","first-page":"161","volume-title":"ICML","author":"Freitag D.","year":"1998"},{"key":"e_1_2_1_10_1","first-page":"577","volume-title":"AAAI\/IAAI","author":"Freitag D.","year":"2000"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/603867.603873"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/1081870.1081920"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.5555\/306766.306775"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0004-3702(99)00100-9"},{"key":"e_1_2_1_15_1","first-page":"729","volume-title":"IJCAI","author":"Kushmerick N.","year":"1997"},{"key":"e_1_2_1_16_1","volume-title":"CIDR","author":"Madhavan J.","year":"2009"},{"key":"e_1_2_1_17_1","volume-title":"AAAI: Workshop on AI and Information Integration","author":"Muslea I.","year":"1998"},{"key":"e_1_2_1_18_1","volume-title":"IBM Research Report RJ","author":"Myllymaki J.","year":"2002"},{"key":"e_1_2_1_19_1","first-page":"738","volume-title":"VLDB","author":"Sahuguet A.","year":"1999"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/1458502.1458505"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/956863.956907"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/1938545.1938547","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T10:45:52Z","timestamp":1672224352000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/1938545.1938547"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2011,1]]},"references-count":21,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2011,1]]}},"alternative-id":["10.14778\/1938545.1938547"],"URL":"https:\/\/doi.org\/10.14778\/1938545.1938547","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2011,1]]}}}