{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,10]],"date-time":"2026-06-10T01:25:04Z","timestamp":1781054704787,"version":"3.54.1"},"publisher-location":"New York, NY, USA","reference-count":34,"publisher":"ACM","license":[{"start":{"date-parts":[[2019,6,25]],"date-time":"2019-06-25T00:00:00Z","timestamp":1561420800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Natural Sciences and Engineering Research Council of Canada","award":["453911"],"award-info":[{"award-number":["453911"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2019,6,25]]},"DOI":"10.1145\/3299869.3300065","type":"proceedings-article","created":{"date-parts":[[2019,6,18]],"date-time":"2019-06-18T17:41:43Z","timestamp":1560879703000},"page":"847-864","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":105,"title":["JOSIE"],"prefix":"10.1145","author":[{"given":"Erkang","family":"Zhu","sequence":"first","affiliation":[{"name":"University of Toronto, Toronto, ON, Canada"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Dong","family":"Deng","sequence":"additional","affiliation":[{"name":"Rutgers University &amp; Inception Institute of Artificial Intelligence, Abu Dhabi, UAE"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Fatemeh","family":"Nargesian","sequence":"additional","affiliation":[{"name":"University of Toronto, Toronto, ON, Canada"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Ren\u00e9e J.","family":"Miller","sequence":"additional","affiliation":[{"name":"Northeastern University, Boston, MA, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2019,6,25]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"Arvind Arasu Venkatesh Ganti and Raghav Kaushik. 2006. Efficient Exact Set-Similarity Joins. PVLDB 918--929.  Arvind Arasu Venkatesh Ganti and Raghav Kaushik. 2006. Efficient Exact Set-Similarity Joins. PVLDB 918--929."},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/1242572.1242591"},{"key":"e_1_3_2_1_3_1","volume-title":"Carey","author":"Behm Alexander","year":"2011","unstructured":"Alexander Behm , Chen Li , and Michael J . Carey . 2011 . Answering approximate string queries on large data sets using external memory. In ICDE. 888--899. Alexander Behm, Chen Li, and Michael J. Carey. 2011. Answering approximate string queries on large data sets using external memory. In ICDE. 888--899."},{"key":"e_1_3_2_1_4_1","unstructured":"A. Broder. 1997. On the Resemblance and Containment of Documents. In SEQUENCES. 21--.   A. Broder. 1997. On the Resemblance and Containment of Documents. In SEQUENCES. 21--."},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.14778\/1453856.1453916"},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2006.9"},{"key":"e_1_3_2_1_7_1","volume-title":"Ziawasch Abedjan, Sibo Wang, Michael Stonebraker, Ahmed K. Elmagarmid, Ihab F. Ilyas, Samuel Madden, Mourad Ouzzani, and Nan Tang.","author":"Deng Dong","year":"2017","unstructured":"Dong Deng , Raul Castro Fernandez , Ziawasch Abedjan, Sibo Wang, Michael Stonebraker, Ahmed K. Elmagarmid, Ihab F. Ilyas, Samuel Madden, Mourad Ouzzani, and Nan Tang. 2017 . The Data Civilizer System. In CIDR. Dong Deng, Raul Castro Fernandez, Ziawasch Abedjan, Sibo Wang, Michael Stonebraker, Ahmed K. Elmagarmid, Ihab F. Ilyas, Samuel Madden, Mourad Ouzzani, and Nan Tang. 2017. The Data Civilizer System. In CIDR."},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.14778\/3115404.3115413"},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2013.6544886"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"crossref","unstructured":"Dong Deng Guoliang Li Shuang Hao Jiannan Wang and Jianhua Feng. 2014. MassJoin: A mapreduce-based method for scalable string similarity joins. In ICDE. 340--351.  Dong Deng Guoliang Li Shuang Hao Jiannan Wang and Jianhua Feng. 2014. MassJoin: A mapreduce-based method for scalable string similarity joins. In ICDE. 340--351.","DOI":"10.1109\/ICDE.2014.6816663"},{"key":"e_1_3_2_1_11_1","volume-title":"Aurum: A Data Discovery System (To Appear). In ICDE.","author":"Fernandez Raul Castro","year":"2018","unstructured":"Raul Castro Fernandez , Ziawasch Abedjan , Famien Koko , Gina Yuan , Samuel Madden , and Michael Stonebraker . 2018 . Aurum: A Data Discovery System (To Appear). In ICDE. Raul Castro Fernandez, Ziawasch Abedjan, Famien Koko, Gina Yuan, Samuel Madden, and Michael Stonebraker. 2018. Aurum: A Data Discovery System (To Appear). In ICDE."},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.14778\/3231751.3231760"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.14778\/3137628.3137657"},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/2872518.2889386"},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.websem.2015.05.001"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2008.4497434"},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.14778\/2078331.2078340"},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.14778\/2947618.2947620"},{"key":"e_1_3_2_1_19_1","volume-title":"Introduction to Information Retrieval","author":"Manning Christopher D.","unstructured":"Christopher D. Manning , Prabhakar Raghavan , and Hinrich Sch\u00fctze . 2008. Introduction to Information Retrieval . Cambridge University Press . Christopher D. Manning, Prabhakar Raghavan, and Hinrich Sch\u00fctze. 2008. Introduction to Information Retrieval. Cambridge University Press."},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.14778\/2212351.2212353"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.14778\/3192965.3192973"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"crossref","unstructured":"Chuitian Rong Chunbin Lin Yasin N. Silva Jianguo Wang Wei Lu and Xiaoyong Du. 2017. Fast and Scalable Distributed Set Similarity Joins for Big Data Analytics. In ICDE. 1059--1070.  Chuitian Rong Chunbin Lin Yasin N. Silva Jianguo Wang Wei Lu and Xiaoyong Du. 2017. Fast and Scalable Distributed Set Similarity Joins for Big Data Analytics. In ICDE. 1059--1070.","DOI":"10.1109\/ICDE.2017.151"},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/2213836.2213962"},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.14778\/2732977.2732981"},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/2736277.2741285"},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/2213836.2213935"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/1807167.1807222"},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"crossref","unstructured":"Jin Wang Guoliang Li Dong Deng Yong Zhang and Jianhua Feng. 2015. Two birds with one stone: An efficient hierarchical framework for top-k and threshold-based string similarity search. In ICDE. 519--530.  Jin Wang Guoliang Li Dong Deng Yong Zhang and Jianhua Feng. 2015. Two birds with one stone: An efficient hierarchical framework for top-k and threshold-based string similarity search. In ICDE. 519--530.","DOI":"10.1109\/ICDE.2015.7113311"},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/2213836.2213847"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.14778\/3099622.3099624"},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2009.111"},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"crossref","unstructured":"Chuan Xiao WeiWang Xuemin Lin and Jeffrey Xu Yu. 2008. Efficient similarity joins for near duplicate detection. In WWW. 131--140.  Chuan Xiao WeiWang Xuemin Lin and Jeffrey Xu Yu. 2008. Efficient similarity joins for near duplicate detection. In WWW. 131--140.","DOI":"10.1145\/1367497.1367516"},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11704-015-5900-5"},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.14778\/2994509.2994534"}],"event":{"name":"SIGMOD\/PODS '19: International Conference on Management of Data","location":"Amsterdam Netherlands","acronym":"SIGMOD\/PODS '19","sponsor":["SIGMOD ACM Special Interest Group on Management of Data"]},"container-title":["Proceedings of the 2019 International Conference on Management of Data"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3299869.3300065","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3299869.3300065","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T01:39:15Z","timestamp":1750210755000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3299869.3300065"}},"subtitle":["Overlap Set Similarity Search for Finding Joinable Tables in Data Lakes"],"short-title":[],"issued":{"date-parts":[[2019,6,25]]},"references-count":34,"alternative-id":["10.1145\/3299869.3300065","10.1145\/3299869"],"URL":"https:\/\/doi.org\/10.1145\/3299869.3300065","relation":{},"subject":[],"published":{"date-parts":[[2019,6,25]]},"assertion":[{"value":"2019-06-25","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}