{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,26]],"date-time":"2025-10-26T14:36:34Z","timestamp":1761489394058},"reference-count":23,"publisher":"Association for Computing Machinery (ACM)","issue":"1","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2009,8]]},"abstract":"<jats:p>We present the design of a system for assembling a table from a few example rows by harnessing the huge corpus of information-rich but unstructured lists on the web. We developed a totally unsupervised end to end approach which given the sample query rows --- (a) retrieves HTML lists relevant to the query from a pre-indexed crawl of web lists, (b) segments the list records and maps the segments to the query schema using a statistical model, (c) consolidates the results from multiple lists into a unified merged table, (d) and presents to the user the consolidated records ranked by their estimated membership in the target relation.<\/jats:p>\n          <jats:p>The key challenges in this task include construction of new rows from very few examples, and an abundance of noisy and irrelevant lists that swamp the consolidation and ranking of rows. We propose modifications to statistical record segmentation models, and present novel consolidation and ranking techniques that can process input tables of arbitrary schema without requiring any human supervision.<\/jats:p>\n          <jats:p>Experiments with Wikipedia target tables and 16 million unstructured lists show that even with just three sample rows, our system is very effective at recreating Wikipedia tables, with a mean runtime of around 20s.<\/jats:p>","DOI":"10.14778\/1687627.1687661","type":"journal-article","created":{"date-parts":[[2014,6,24]],"date-time":"2014-06-24T12:17:57Z","timestamp":1403612277000},"page":"289-300","source":"Crossref","is-referenced-by-count":68,"title":["Answering table augmentation queries from unstructured lists on the web"],"prefix":"10.14778","volume":"2","author":[{"given":"Rahul","family":"Gupta","sequence":"first","affiliation":[{"name":"IIT Bombay, India"}]},{"given":"Sunita","family":"Sarawagi","sequence":"additional","affiliation":[{"name":"IIT Bombay, India"}]}],"member":"320","published-online":{"date-parts":[[2009,8]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/1014052.1014058"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.datak.2007.10.002"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/MIS.2003.1234765"},{"key":"e_1_2_1_4_1","volume-title":"WebDB","author":"Cafarella M.","year":"2008","unstructured":"M. Cafarella , N. Khoussainova , D. Wang , E. Wu , Y. Zhang , and A. Halevy . Uncovering the relational web . In WebDB , 2008 . M. Cafarella, N. Khoussainova, D. Wang, E. Wu, Y. Zhang, and A. Halevy. Uncovering the relational web. In WebDB, 2008."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2006.55"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/371920.372182"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/872757.872796"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/511446.511477"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687627.1687749"},{"key":"e_1_2_1_10_1","volume-title":"NIPS","author":"Ghahramani Z.","year":"2005","unstructured":"Z. Ghahramani and K. A. Heller . Bayesian sets . In NIPS , 2005 . Z. Ghahramani and K. A. Heller. Bayesian sets. In NIPS, 2005."},{"key":"e_1_2_1_11_1","volume-title":"VLDB","author":"Gupta R.","year":"2006","unstructured":"R. Gupta and S. Sarawagi . Curating probabilistic databases from information extraction models . In VLDB , 2006 . R. Gupta and S. Sarawagi. Curating probabilistic databases from information extraction models. In VLDB, 2006."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.5555\/262228"},{"key":"e_1_2_1_13_1","volume-title":"Workshop on Advances in Text Extraction and Mining (ATEM)","author":"Lerman K.","year":"2001","unstructured":"K. Lerman , C. Knoblock , and S. Minton . Automatic data extraction from lists and tables in web sources . In Workshop on Advances in Text Extraction and Mining (ATEM) , 2001 . K. Lerman, C. Knoblock, and S. Minton. Automatic data extraction from lists and tables in web sources. In Workshop on Advances in Text Extraction and Mining (ATEM), 2001."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.5555\/1622655.1622671"},{"key":"e_1_2_1_15_1","volume-title":"UAI","author":"Ravikumar P.","year":"2004","unstructured":"P. Ravikumar and W. W. Cohen . A hierarchical graphical model for record linkage . In UAI , 2004 . P. Ravikumar and W. W. Cohen. A hierarchical graphical model for record linkage. In UAI, 2004."},{"key":"e_1_2_1_16_1","volume-title":"AAAI","author":"Riloff E.","year":"1999","unstructured":"E. Riloff and R. Jones . Learning dictionaries for information extraction by multi-level bootstrapping . In AAAI , 1999 . E. Riloff and R. Jones. Learning dictionaries for information extraction by multi-level bootstrapping. In AAAI, 1999."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1561\/1900000003"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/775047.775087"},{"key":"e_1_2_1_19_1","volume-title":"NIPS","author":"Sarawagi S.","year":"2004","unstructured":"S. Sarawagi and W. W. Cohen . Semi-markov conditional random fields for information extraction . In NIPS , 2004 . S. Sarawagi and W. W. Cohen. Semi-markov conditional random fields for information extraction. In NIPS, 2004."},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2006.65"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.5555\/1613715.1613787"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2007.104"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/1060745.1060761"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/1687627.1687661","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T11:26:37Z","timestamp":1672226797000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/1687627.1687661"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2009,8]]},"references-count":23,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2009,8]]}},"alternative-id":["10.14778\/1687627.1687661"],"URL":"https:\/\/doi.org\/10.14778\/1687627.1687661","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2009,8]]}}}