{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,8]],"date-time":"2026-04-08T08:55:22Z","timestamp":1775638522001,"version":"3.50.1"},"reference-count":49,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2023,5,26]],"date-time":"2023-05-26T00:00:00Z","timestamp":1685059200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000001","name":"NSF","doi-asserted-by":"publisher","award":["IIS-1956096, IIS-2107248 and IIS-1762268"],"award-info":[{"award-number":["IIS-1956096, IIS-2107248 and IIS-1762268"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Manag. Data"],"published-print":{"date-parts":[[2023,5,26]]},"abstract":"<jats:p>Existing techniques for unionable table search define unionability using metadata (tables must have the same or similar schemas) or column-based metrics (for example, the values in a table should be drawn from the same domain). In this work, we introduce the use of semantic relationships between pairs of columns in a table to improve the accuracy of the union search. Consequently, we introduce a new notion of unionability that considers relationships between columns, together with the semantics of columns, in a principled way. To do so, we present two new methods to discover the semantic relationships between pairs of columns. The first uses an existing knowledge base (KB), and the second (which we call a \"synthesized KB\") uses knowledge from the data lake itself. We adopt an existing Table Union Search benchmark and present new (open) benchmarks that represent small and large real data lakes. 
We show that our new unionability search algorithm, called SANTOS, outperforms a state-of-the-art union search that uses a wide variety of column-based semantics, including word embeddings and regular expressions. We show empirically that our synthesized KB improves the accuracy of union search by representing relationship semantics that may not be contained in an available KB. This result hints at a promising future of creating synthesized KBs from data lakes with limited KB coverage and using them for union search.<\/jats:p>","DOI":"10.1145\/3588689","type":"journal-article","created":{"date-parts":[[2023,5,30]],"date-time":"2023-05-30T17:42:05Z","timestamp":1685468525000},"page":"1-25","source":"Crossref","is-referenced-by-count":52,"title":["SANTOS: Relationship-based Semantic Table Union Search"],"prefix":"10.1145","volume":"1","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-5720-1207","authenticated-orcid":false,"given":"Aamod","family":"Khatiwada","sequence":"first","affiliation":[{"name":"Northeastern University, Boston, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9020-3642","authenticated-orcid":false,"given":"Grace","family":"Fan","sequence":"additional","affiliation":[{"name":"Northeastern University, Boston, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8803-8481","authenticated-orcid":false,"given":"Roee","family":"Shraga","sequence":"additional","affiliation":[{"name":"Northeastern University, Boston, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2872-1865","authenticated-orcid":false,"given":"Zixuan","family":"Chen","sequence":"additional","affiliation":[{"name":"Northeastern University, Boston, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9614-0504","authenticated-orcid":false,"given":"Wolfgang","family":"Gatterbauer","sequence":"additional","affiliation":[{"name":"Northeastern University, Boston, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1484-4787","authenticated-orcid":false,"given":"Ren\u00e9e 
J.","family":"Miller","sequence":"additional","affiliation":[{"name":"Northeastern University, Boston, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6102-7472","authenticated-orcid":false,"given":"Mirek","family":"Riedewald","sequence":"additional","affiliation":[{"name":"Northeastern University, Boston, USA"}]}],"member":"320","published-online":{"date-parts":[[2023,5,30]]},"reference":[{"key":"e_1_2_2_1_1","doi-asserted-by":"publisher","DOI":"10.14778\/2536336.2536343"},{"key":"e_1_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0306-4573(02)00021-3"},{"key":"e_1_2_2_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE48307.2020.00067"},{"key":"e_1_2_2_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/3308558.3313685"},{"key":"e_1_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687627.1687750"},{"key":"e_1_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3389742"},{"key":"e_1_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/2213836.2213962"},{"key":"e_1_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.14778\/3430915.3430921"},{"key":"e_1_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1"},{"key":"e_1_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/2882903.2899391"},{"key":"e_1_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2018.00093"},{"key":"e_1_2_2_12_1","first-page":"139","article-title":"Database Dependency Discovery","volume":"12","author":"Flach Peter A.","year":"1999","unstructured":"Peter A. Flach and Iztok Savnik. 1999. Database Dependency Discovery: A Machine Learning Approach. AI Commun., Vol. 12, 3 (1999), 139--160. http:\/\/content.iospress.com\/articles\/ai-communications\/aic182","journal-title":"A Machine Learning Approach. 
AI Commun."},{"key":"e_1_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10791-005-6994-4"},{"key":"e_1_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/3340531.3417426"},{"key":"e_1_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2016.11.003"},{"key":"e_1_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-30793-6_14"},{"key":"e_1_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3452763"},{"key":"e_1_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/3447772"},{"key":"e_1_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/3292500.3330993"},{"key":"e_1_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.14778\/3137628.3137657"},{"key":"e_1_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/2872518.2889386"},{"key":"e_1_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.5441\/002"},{"key":"e_1_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.14778\/1920841.1921005"},{"key":"e_1_2_2_24_1","volume-title":"IJCAI","author":"Ling Xiao","year":"2013","unstructured":"Xiao Ling, Alon Y. Halevy, Fei Wu, and Cong Yu. 2013. Synthesizing Union Tables from the Web. In IJCAI 2013. IJCAI\/AAAI, 2677--2683. http:\/\/www.aaai.org\/ocs\/index.php\/IJCAI\/IJCAI13\/paper\/view\/6758"},{"key":"e_1_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.1147\/rd.14.0309"},{"key":"e_1_2_2_26_1","volume-title":"Information theory, inference and learning algorithms","author":"MacKay David JC","unstructured":"David JC MacKay. 2003. Information theory, inference and learning algorithms. Cambridge university press. https:\/\/books.google.com\/books?id=AKuMj4PN_EMC"},{"key":"e_1_2_2_27_1","unstructured":"C.D. Manning P. Raghavan and H. Sch\u00fctze. 2008. Introduction to Information Retrieval. Cambridge University Press. https:\/\/books.google.com\/books?id=t1PoSh4uwVcC"},{"key":"e_1_2_2_28_1","volume-title":"Visualizing Semantic Table Annotations with TableMiner. 
In ISWC","volume":"1690","author":"Mazumdar Suvodeep","year":"2016","unstructured":"Suvodeep Mazumdar and Ziqi Zhang. 2016. Visualizing Semantic Table Annotations with TableMiner. In ISWC 2016, Vol. 1690. CEUR-WS.org. http:\/\/ceur-ws.org\/Vol-1690\/paper88.pdf"},{"key":"e_1_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.14778\/3229863.3240491"},{"key":"e_1_2_2_30_1","first-page":"59","article-title":"Making Open Data Transparent: Data Discovery on Open Data","volume":"41","author":"Miller Ren\u00e9e","year":"2018","unstructured":"Ren\u00e9e J. Miller, Fatemeh Nargesian, Erkang Zhu, Christina Christodoulakis, Ken Q. Pu, and Periklis Andritsos. 2018. Making Open Data Transparent: Data Discovery on Open Data. IEEE Data Eng. Bull., Vol. 41, 2 (2018), 59--70. http:\/\/sites.computer.org\/debull\/A18june\/p59.pdf","journal-title":"IEEE Data Eng. Bull."},{"key":"e_1_2_2_31_1","volume-title":"Proceedings of the First International Workshop on Consuming Linked Data","volume":"665","author":"Mulwad Varish","year":"2010","unstructured":"Varish Mulwad, Tim Finin, Zareen Syed, and Anupam Joshi. 2010. Using linked data to interpret tables. In Proceedings of the First International Workshop on Consuming Linked Data, Vol. 665. CEUR-WS.org. http:\/\/ceur-ws.org\/Vol-665\/MulwadEtAl_COLD2010.pdf"},{"key":"e_1_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.14778\/3352063.3352116"},{"key":"e_1_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.14778\/3192965.3192973"},{"key":"e_1_2_2_34_1","doi-asserted-by":"publisher","DOI":"10.14778\/3384345.3384346"},{"key":"e_1_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.1006\/jcss.2000.1711"},{"key":"e_1_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-49461-2_34"},{"key":"e_1_2_2_37_1","volume-title":"Proceedings of the first instructional conference on machine learning","volume":"242","author":"Juan","unstructured":"Juan Ramos et al. 2003. Using tf-idf to determine word relevance in document queries. 
In Proceedings of the first instructional conference on machine learning, Vol. 242. Citeseer, 29--48."},{"key":"e_1_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/2797115.2797118"},{"key":"e_1_2_2_39_1","volume-title":"A Statistical Interpretation of Term Specificity and Its Application in Retrieval","author":"Jones Karen Sparck","unstructured":"Karen Sparck Jones. 1988. A Statistical Interpretation of Term Specificity and Its Application in Retrieval. Taylor Graham Publishing, GBR, 132--142."},{"key":"e_1_2_2_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/3514221.3517906"},{"key":"e_1_2_2_41_1","volume-title":"Proceedings of the Second Web Science Conference. ACM. https:\/\/ebiquity.umbc.edu\/paper\/html\/id\/474","author":"Syed Zareen","year":"2010","unstructured":"Zareen Syed, Tim Finin, Varish Mulwad, and Anupam Joshi. 2010. Exploiting a Web of Semantic Data for Interpreting Tables. In Proceedings of the Second Web Science Conference. ACM. https:\/\/ebiquity.umbc.edu\/paper\/html\/id\/474"},{"key":"e_1_2_2_42_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.3301281"},{"key":"e_1_2_2_43_1","doi-asserted-by":"publisher","DOI":"10.14778\/2002938.2002939"},{"key":"e_1_2_2_44_1","doi-asserted-by":"publisher","DOI":"10.3390\/app11115110"},{"key":"e_1_2_2_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/3447548.3470825"},{"key":"e_1_2_2_46_1","doi-asserted-by":"publisher","DOI":"10.14778\/3407790.3407793"},{"key":"e_1_2_2_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3389726"},{"key":"e_1_2_2_48_1","doi-asserted-by":"publisher","DOI":"10.3233\/SW-160242"},{"key":"e_1_2_2_49_1","doi-asserted-by":"publisher","DOI":"10.14778\/2994509.2994534"}],"container-title":["Proceedings of the ACM on Management of 
Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3588689","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3588689","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3588689","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:47:13Z","timestamp":1750178833000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3588689"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,5,26]]},"references-count":49,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2023,5,26]]}},"alternative-id":["10.1145\/3588689"],"URL":"https:\/\/doi.org\/10.1145\/3588689","relation":{},"ISSN":["2836-6573"],"issn-type":[{"value":"2836-6573","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,5,26]]}}}