{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,5,31]],"date-time":"2024-05-31T05:56:20Z","timestamp":1717134980367},"reference-count":32,"publisher":"Association for Computing Machinery (ACM)","issue":"11","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2023,7]]},"abstract":"<jats:p>\n            We study the\n            <jats:italic>set similarity join problem<\/jats:italic>\n            , which retrieves all pairs of similar sets from two collections of sets for a given distance function. Existing exact solutions employ a signature-based filter-verification framework: If two sets are similar, they must have at least one signature in common, otherwise they can be pruned safely. We observe that the choice of the signature scheme has a significant impact on the performance. Unfortunately, choosing a good signature scheme is hard because the performance heavily depends on the characteristics of the underlying dataset.\n          <\/jats:p>\n          <jats:p>\n            To address this problem, we propose a\n            <jats:italic>hybrid signature composition<\/jats:italic>\n            that leverages the most selective portion of each signature scheme. Sets with an unselective primary signature are detected, and the signatures are replaced with a more selective secondary signature. We propose a generic framework called\n            <jats:italic>TwoL<\/jats:italic>\n            and a cost model to balance the computational overhead and the selectivity of the signature schemes. We implement our framework with two complementary signature schemes for Jaccard similarity and Hamming distance, resulting in effective two-level\n            <jats:italic>hybrid indexes<\/jats:italic>\n            that join datasets with diverse characteristics efficiently. TwoL consistently outperforms state-of-the-art set similarity joins on a benchmark with 13 datasets that cover a wide range of data characteristics.\n          <\/jats:p>","DOI":"10.14778\/3611479.3611480","type":"journal-article","created":{"date-parts":[[2023,8,25]],"date-time":"2023-08-25T02:08:08Z","timestamp":1692929288000},"page":"2686-2698","update-policy":"http:\/\/dx.doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["A Two-Level Signature Scheme for Stable Set Similarity Joins"],"prefix":"10.14778","volume":"16","author":[{"given":"Daniel","family":"Schmitt","sequence":"first","affiliation":[{"name":"University of Salzburg, Austria"}]},{"given":"Daniel","family":"Kocher","sequence":"additional","affiliation":[{"name":"University of Salzburg, Austria"}]},{"given":"Nikolaus","family":"Augsten","sequence":"additional","affiliation":[{"name":"University of Salzburg, Austria"}]},{"given":"Willi","family":"Mann","sequence":"additional","affiliation":[{"name":"Celonis SE, Germany"}]},{"given":"Alexander","family":"Miller","sequence":"additional","affiliation":[{"name":"University of Salzburg, Austria"}]}],"member":"320","published-online":{"date-parts":[[2023,8,24]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.5555\/1182635.1164206"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/1242572.1242591"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.14778\/2428536.2428537"},{"key":"e_1_2_1_4_1","volume-title":"A bayesian perspective on locality sensitive hashing with extensions for kernel methods. ACM Transactions on Knowledge Discovery from Data (TKDD) 10, 2","author":"Chakrabarti Aniket","year":"2015","unstructured":"Aniket Chakrabarti , Venu Satuluri , Atreya Srivathsan , and Srinivasan Parthasarathy . 2015. A bayesian perspective on locality sensitive hashing with extensions for kernel methods. ACM Transactions on Knowledge Discovery from Data (TKDD) 10, 2 ( 2015 ), 1--32. Aniket Chakrabarti, Venu Satuluri, Atreya Srivathsan, and Srinivasan Parthasarathy. 2015. A bayesian perspective on locality sensitive hashing with extensions for kernel methods. ACM Transactions on Knowledge Discovery from Data (TKDD) 10, 2 (2015), 1--32."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2006.9"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/3055399.3055443"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.14778\/2856318.2856330"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/3183713.3183748"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/276698.276876"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.5441\/002\/edbt.2021.11"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2008.4497434"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3514221.3517880"},{"key":"e_1_2_1_13_1","volume-title":"PEL: Position-Enhanced Length Filter for Set Similarity Joins. In Grundlagen von Datenbanken.","author":"Mann Willi","year":"2014","unstructured":"Willi Mann and Nikolaus Augsten . 2014 . PEL: Position-Enhanced Length Filter for Set Similarity Joins. In Grundlagen von Datenbanken. Vol. 1313 . 89--94. Willi Mann and Nikolaus Augsten. 2014. PEL: Position-Enhanced Length Filter for Set Similarity Joins. In Grundlagen von Datenbanken. Vol. 1313. 89--94."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.14778\/2947618.2947620"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/3196959.3196985"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2012.6248043"},{"key":"e_1_2_1_17_1","volume-title":"Fast exact search in hamming space with multi-index hashing","author":"Norouzi Mohammad","year":"2013","unstructured":"Mohammad Norouzi , Ali Punjani , and David J Fleet . 2013. Fast exact search in hamming space with multi-index hashing . IEEE transactions on pattern analysis and machine intelligence 36, 6 ( 2013 ), 1107--1119. Mohammad Norouzi, Ali Punjani, and David J Fleet. 2013. Fast exact search in hamming space with multi-index hashing. IEEE transactions on pattern analysis and machine intelligence 36, 6 (2013), 1107--1119."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.5555\/2884435.2884436"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/2816815"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/2983323.2983742"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.14778\/3275536.3275539"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2019.2899597"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2012.195"},{"key":"e_1_2_1_24_1","volume-title":"Arnet-Miner: Extraction and Mining of Academic Social Networks. In KDD'08","author":"Tang Jie","year":"2008","unstructured":"Jie Tang , Jing Zhang , Limin Yao , Juanzi Li , Li Zhang , and Zhong Su . 2008 . Arnet-Miner: Extraction and Mining of Academic Social Networks. In KDD'08 . 990--998. Jie Tang, Jing Zhang, Limin Yao, Juanzi Li, Li Zhang, and Zhong Su. 2008. Arnet-Miner: Extraction and Mining of Academic Social Networks. In KDD'08. 990--998."},{"key":"e_1_2_1_25_1","first-page":"436","article-title":"Eine Extremalaufgabe aus der Graphentheorie","volume":"48","author":"Tur\u00e1n Paul","year":"1941","unstructured":"Paul Tur\u00e1n . 1941 . Eine Extremalaufgabe aus der Graphentheorie . Mat. Fiz. Lapok 48 , 436 -- 452 (1941), 61. Paul Tur\u00e1n. 1941. Eine Extremalaufgabe aus der Graphentheorie. Mat. Fiz. Lapok 48, 436--452 (1941), 61.","journal-title":"Mat. Fiz. Lapok"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/2213836.2213847"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/2882903.2915211"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-018-0529-2"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2009.111"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/2000824.2000825"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/2484838.2484842"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/3299869.3300065"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3611479.3611480","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,9,23]],"date-time":"2023-09-23T22:19:47Z","timestamp":1695507587000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3611479.3611480"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,7]]},"references-count":32,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2023,7]]}},"alternative-id":["10.14778\/3611479.3611480"],"URL":"https:\/\/doi.org\/10.14778\/3611479.3611480","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2023,7]]},"assertion":[{"value":"2023-08-24","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}