{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,17]],"date-time":"2026-06-17T16:39:04Z","timestamp":1781714344760,"version":"3.54.5"},"reference-count":40,"publisher":"Association for Computing Machinery (ACM)","issue":"7","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2018,3]]},"abstract":"<jats:p>\n            We define the table union search problem and present a probabilistic solution for finding tables that are unionable with a query table within massive repositories. Two tables are\n            <jats:italic>unionable<\/jats:italic>\n            if they share attributes from the same domain. Our solution formalizes three statistical models that describe how unionable attributes are generated from set domains, semantic domains with values from an ontology, and natural language domains. We propose a data-driven approach that automatically determines the best model to use for each pair of attributes. Through a distribution-aware algorithm, we are able to find the optimal number of attributes in two tables that can be unioned. To evaluate accuracy, we created and open-sourced a benchmark of Open Data tables. 
We show that our table union search outperforms in speed and accuracy existing algorithms for finding related tables and scales to provide efficient search over Open Data repositories containing more than one million attributes.\n          <\/jats:p>","DOI":"10.14778\/3192965.3192973","type":"journal-article","created":{"date-parts":[[2018,5,22]],"date-time":"2018-05-22T19:56:10Z","timestamp":1527018970000},"page":"813-825","source":"Crossref","is-referenced-by-count":166,"title":["Table union search on open data"],"prefix":"10.14778","volume":"11","author":[{"given":"Fatemeh","family":"Nargesian","sequence":"first","affiliation":[{"name":"University of Toronto"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Erkang","family":"Zhu","sequence":"additional","affiliation":[{"name":"University of Toronto"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Ken Q.","family":"Pu","sequence":"additional","affiliation":[{"name":"UOIT"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Ren\u00e9e J.","family":"Miller","sequence":"additional","affiliation":[{"name":"University of Toronto"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2018,3]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/1060745.1060840"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.5555\/944919.944966"},{"key":"e_1_2_1_3_1","first-page":"21","volume-title":"Proceedings of the Compression and Complexity of Sequences","author":"Broder A.","year":"1997","unstructured":"A. Broder. On the resemblance and containment of documents. 
In Proceedings of the Compression and Complexity of Sequences, pages 21--30, 1997."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.14778\/1453856.1453916"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687627.1687750"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/509907.509965"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1108\/eb051463"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.5555\/1953048.2078186"},{"key":"e_1_2_1_9_1","volume-title":"2017 data scientist report. https:\/\/visit.crowdflower.com\/WC-2017-Data-Science-Report_LP.html","year":"2017","unstructured":"CrowdFlower. 2017 data scientist report. https:\/\/visit.crowdflower.com\/WC-2017-Data-Science-Report_LP.html, 2017."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.14778\/2536336.2536345"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/872757.872784"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1214\/aoms\/1177732979"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0306-4573(00)00015-7"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/E17-2068"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/872757.872783"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/2983323.2983876"},{"key":"e_1_2_1_17_1","volume-title":"JMP for Basic Univariate and Multivariate Statistics: A Step-by-step Guide","author":"Lehman A.","year":"2005","unstructured":"A. Lehman. JMP for Basic Univariate and Multivariate Statistics: A Step-by-step Guide. 
SAS Institute, 2005."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.3233\/SW-140134"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.14778\/3137628.3137657"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.5555\/2787930"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.14778\/1920841.1921005"},{"key":"e_1_2_1_22_1","first-page":"2677","volume-title":"IJCAI","author":"Ling X.","year":"2013","unstructured":"X. Ling, A. Y. Halevy, F. Wu, and C. Yu. Synthesizing union tables from the web. In IJCAI, pages 2677--2683, 2013."},{"issue":"11","key":"e_1_2_1_23_1","first-page":"2579","article-title":"Visualizing data using t-SNE","volume":"9","author":"Maaten L.","year":"2008","unstructured":"L. Maaten and G. Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9(11):2579--2605, 2008.","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.5555\/1394399"},{"key":"e_1_2_1_25_1","first-page":"3111","volume-title":"NIPS","author":"Mikolov T.","year":"2013","unstructured":"T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. 
In NIPS, pages 3111--3119, 2013."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687627.1687649"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.14778\/2336664.2336665"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-16518-4_1"},{"key":"e_1_2_1_29_1","volume-title":"Mathematical Statistics and Data Analysis","author":"Rice J. A.","year":"2006","unstructured":"J. A. Rice. Mathematical Statistics and Data Analysis. 2006."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/2797115.2797118"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/2213836.2213962"},{"key":"e_1_2_1_32_1","first-page":"129","volume-title":"ICML","author":"Socher R.","year":"2011","unstructured":"R. Socher, C. C. Lin, A. Y. Ng, and C. D. Manning. Parsing natural scenes and natural language with recursive neural networks. 
In ICML, pages 129--136, 2011."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1007\/11687238_8"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/1242572.1242667"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.14778\/2002938.2002939"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/2213836.2213848"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/2970398.2970403"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.artmed.2006.12.002"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.5555\/1757898.1758065"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.14778\/2994509.2994534"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3192965.3192973","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T11:13:53Z","timestamp":1672226033000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3192965.3192973"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,3]]},"references-count":40,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2018,3]]}},"alternative-id":["10.14778\/3192965.3192973"],"URL":"https:\/\/doi.org\/10.14778\/3192965.3192973","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2018,3]]}}}