{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,3,19]],"date-time":"2025-03-19T15:49:41Z","timestamp":1742399381123},"reference-count":30,"publisher":"Association for Computing Machinery (ACM)","issue":"1","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2009,8]]},"abstract":"<jats:p>We propose a novel technique for estimating the size of set similarity join. The proposed technique relies on a succinct representation of sets using Min-Hash signatures. We exploit frequent patterns in the signatures for the Set Similarity Join (SSJoin) size estimation by counting their support. However, there are overlaps among the counts of signature patterns and we need to use the set Inclusion-Exclusion (IE) principle. We develop a novel lattice-based counting method for efficiently evaluating the IE principle. The proposed counting technique is linear in the lattice size. To make the mining process very light-weight, we exploit a recently discovered Power-law relationship of pattern count and frequency. Extensive experimental evaluations show the proposed technique is capable of accurate and efficient estimation.<\/jats:p>","DOI":"10.14778\/1687627.1687702","type":"journal-article","created":{"date-parts":[[2014,6,24]],"date-time":"2014-06-24T12:17:57Z","timestamp":1403612277000},"page":"658-669","source":"Crossref","is-referenced-by-count":15,"title":["Power-law based estimation of set similarity join size"],"prefix":"10.14778","volume":"2","author":[{"given":"Hongrae","family":"Lee","sequence":"first","affiliation":[{"name":"University of British Columbia"}]},{"given":"Raymond T.","family":"Ng","sequence":"additional","affiliation":[{"name":"University of British Columbia"}]},{"given":"Kyuseok","family":"Shim","sequence":"additional","affiliation":[{"name":"Seoul National University"}]}],"member":"320","published-online":{"date-parts":[[2009,8]]},"reference":[{"key":"e_1_2_1_1_1","first-page":"487","volume-title":"Proc. VLDB","author":"Agrawal R.","year":"1994","unstructured":"R. Agrawal and R. Srikant . Fast Algorithms for Mining Association Rules . In Proc. VLDB , pages 487 -- 499 , 1994 . R. Agrawal and R. Srikant. Fast Algorithms for Mining Association Rules. In Proc. VLDB, pages 487--499, 1994."},{"key":"e_1_2_1_2_1","first-page":"918","volume-title":"Proc. VLDB","author":"Arasu A.","year":"2006","unstructured":"A. Arasu , V. Ganti , and R. Kaushik . Efficient Exact Set-Similarity Joins . In Proc. VLDB , pages 918 -- 929 , 2006 . A. Arasu, V. Ganti, and R. Kaushik. Efficient Exact Set-Similarity Joins. In Proc. VLDB, pages 918--929, 2006."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/1242572.1242591"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2008.85"},{"key":"e_1_2_1_5_1","first-page":"21","volume-title":"Proc. SEQUENCES","author":"Broder A. Z.","year":"1997","unstructured":"A. Z. Broder . On the Resemblance and Containment of Documents . In Proc. SEQUENCES , pages 21 -- 29 , 1997 . A. Z. Broder. On the Resemblance and Containment of Documents. In Proc. SEQUENCES, pages 21--29, 1997."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/1132956.1132958"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2006.9"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/335168.335225"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-007-0054-1"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1137\/070710111"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1006\/jcss.1997.1534"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/564691.564719"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/342009.335412"},{"key":"e_1_2_1_14_1","unstructured":"Ferenc Bodon. Apriori implementation of ferenc bodon. http:\/\/www.cs.bme.hu\/bodon\/en\/apriori\/.  Ferenc Bodon. Apriori implementation of ferenc bodon. http:\/\/www.cs.bme.hu\/bodon\/en\/apriori\/."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/1066157.1066176"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2008.4497435"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.14778\/1453856.1453883"},{"key":"e_1_2_1_18_1","unstructured":"Hans D. Mittelmann. Decision tree for optimization software. http:\/\/plato.asu.edu\/sub\/nonlsq.html.  Hans D. Mittelmann. Decision tree for optimization software. http:\/\/plato.asu.edu\/sub\/nonlsq.html."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/223784.223807"},{"key":"e_1_2_1_20_1","unstructured":"IBM. Quest synthetic data generation code. http:\/\/www.almaden.ibm.com\/cs\/disciplines\/iis\/.  IBM. Quest synthetic data generation code. http:\/\/www.almaden.ibm.com\/cs\/disciplines\/iis\/."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/1142473.1142504"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/1516360.1516420"},{"key":"e_1_2_1_23_1","volume-title":"Solving Least Squares Problems","author":"Lawson C. L.","year":"1987","unstructured":"C. L. Lawson and R. J. Hanson . Solving Least Squares Problems . Society for Industrial Mathematics , 1987 . C. L. Lawson and R. J. Hanson. Solving Least Squares Problems. Society for Industrial Mathematics, 1987."},{"key":"e_1_2_1_24_1","first-page":"195","volume-title":"Proc. VLDB","author":"Lee H.","year":"2007","unstructured":"H. Lee , R. T. Ng , and K. Shim . Extending Q-Grams to Estimate Selectivity of String Matching with Edit Distance . In Proc. VLDB , pages 195 -- 206 , 2007 . H. Lee, R. T. Ng, and K. Shim. Extending Q-Grams to Estimate Selectivity of String Matching with Edit Distance. In Proc. VLDB, pages 195--206, 2007."},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/1516360.1516455"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/1242572.1242606"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2008.4497485"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/1135777.1135834"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/1007568.1007652"},{"key":"e_1_2_1_30_1","volume-title":"Human Behavior and the Principle of Least Effort","author":"Zipf G. K.","year":"1947","unstructured":"G. K. Zipf . Human Behavior and the Principle of Least Effort . Addison-Wesley , 1947 . G. K. Zipf. Human Behavior and the Principle of Least Effort. Addison-Wesley, 1947."}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/1687627.1687702","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T11:30:43Z","timestamp":1672227043000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/1687627.1687702"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2009,8]]},"references-count":30,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2009,8]]}},"alternative-id":["10.14778\/1687627.1687702"],"URL":"https:\/\/doi.org\/10.14778\/1687627.1687702","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2009,8]]}}}