{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,5]],"date-time":"2025-10-05T17:02:35Z","timestamp":1759683755786},"reference-count":24,"publisher":"Association for Computing Machinery (ACM)","issue":"4","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2015,12]]},"abstract":"<jats:p>\n            We study the exact set similarity join problem, which, given two collections of sets, finds out all the similar set pairs from the collections. Existing methods generally utilize the prefix filter based framework. They generate a prefix for each set and prune all the pairs whose prefixes are disjoint. However the pruning power is limited, because if two dissimilar sets share a common element in their prefixes, they cannot be pruned. To address this problem, we propose a partition-based framework. We design a partition scheme to partition the sets into several subsets and guarantee that two sets are similar only if they share a common subset. To improve the pruning power, we propose a mixture of the subsets and their 1-deletion neighborhoods (the subset of a set by eliminating one element). As there are multiple allocation strategies to generate the mixture, we evaluate different allocations and design a dynamic-programming algorithm to select the optimal one. However the time complexity of generating the optimal one is\n            <jats:italic>O<\/jats:italic>\n            (\n            <jats:italic>s<\/jats:italic>\n            <jats:sup>3<\/jats:sup>\n            ) for a set with size\n            <jats:italic>s.<\/jats:italic>\n            To speed up the allocation selection, we develop a greedy algorithm with an approximation ratio of 2. To further reduce the complexity, we design an adaptive grouping mechanism, and the two techniques can reduce the complexity to\n            <jats:italic>O<\/jats:italic>\n            (\n            <jats:italic>s<\/jats:italic>\n            log\n            <jats:italic>s<\/jats:italic>\n            ). Experimental results on three real-world datasets show our method achieves high performance and outperforms state-of-the-art methods by 2-5 times.\n          <\/jats:p>","DOI":"10.14778\/2856318.2856330","type":"journal-article","created":{"date-parts":[[2016,2,1]],"date-time":"2016-02-01T14:10:31Z","timestamp":1454335831000},"page":"360-371","source":"Crossref","is-referenced-by-count":64,"title":["An efficient partition based method for exact set similarity joins"],"prefix":"10.14778","volume":"9","author":[{"given":"Dong","family":"Deng","sequence":"first","affiliation":[{"name":"Tsinghua University, Beijing, China"}]},{"given":"Guoliang","family":"Li","sequence":"additional","affiliation":[{"name":"Tsinghua University, Beijing, China"}]},{"given":"He","family":"Wen","sequence":"additional","affiliation":[{"name":"Tsinghua University, Beijing, China"}]},{"given":"Jianhua","family":"Feng","sequence":"additional","affiliation":[{"name":"Tsinghua University, Beijing, China"}]}],"member":"320","published-online":{"date-parts":[[2015,12]]},"reference":[{"key":"e_1_2_1_1_1","first-page":"918","volume-title":"VLDB","author":"Arasu A.","year":"2006","unstructured":"A. Arasu , V. Ganti , and R. Kaushik . Efficient exact set-similarity joins . In VLDB , pages 918 -- 929 , 2006 . A. Arasu, V. Ganti, and R. Kaushik. Efficient exact set-similarity joins. In VLDB, pages 918--929, 2006."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/1242572.1242591"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/276698.276781"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2006.9"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/276304.276323"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/1242572.1242610"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2014.6816663"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/276698.276876"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.14778\/2732296.2732299"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2008.4497434"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.14778\/2078331.2078340"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/1007568.1007652"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/371920.372071"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.14778\/2140436.2140440"},{"key":"e_1_2_1_15_1","first-page":"2321","volume-title":"NIPS","author":"Shrivastava A.","year":"2014","unstructured":"A. Shrivastava and P. Li . Asymmetric LSH (ALSH) for sublinear time maximum inner product search (MIPS) . In NIPS , pages 2321 -- 2329 , 2014 . A. Shrivastava and P. Li. Asymmetric LSH (ALSH) for sublinear time maximum inner product search (MIPS). In NIPS, pages 2321--2329, 2014."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/1081870.1081956"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/1807167.1807222"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.14778\/2350229.2350263"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2011.5767865"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/2213836.2213847"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2009.111"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/1367497.1367516"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/1989323.1989428"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.5555\/1717872"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/2856318.2856330","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T10:23:35Z","timestamp":1672223015000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/2856318.2856330"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2015,12]]},"references-count":24,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2015,12]]}},"alternative-id":["10.14778\/2856318.2856330"],"URL":"https:\/\/doi.org\/10.14778\/2856318.2856330","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2015,12]]}}}