{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,1]],"date-time":"2026-04-01T03:06:23Z","timestamp":1775012783730,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":30,"publisher":"ACM","license":[{"start":{"date-parts":[[2010,6,6]],"date-time":"2010-06-06T00:00:00Z","timestamp":1275782400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2010,6,6]]},"DOI":"10.1145\/1807167.1807222","type":"proceedings-article","created":{"date-parts":[[2010,6,8]],"date-time":"2010-06-08T12:37:34Z","timestamp":1276000654000},"page":"495-506","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":313,"title":["Efficient parallel set-similarity joins using MapReduce"],"prefix":"10.1145","author":[{"given":"Rares","family":"Vernica","sequence":"first","affiliation":[{"name":"University of California, Irvine, Irvine, CA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Michael J.","family":"Carey","sequence":"additional","affiliation":[{"name":"University of California, Irvine, Irvine, CA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chen","family":"Li","sequence":"additional","affiliation":[{"name":"University of California, Irvine, Irvine, CA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2010,6,6]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"Apache Hadoop. http:\/\/hadoop.apache.org.  Apache Hadoop. http:\/\/hadoop.apache.org."},{"key":"e_1_3_2_1_2_1","unstructured":"Apache Hive. http:\/\/hadoop.apache.org\/hive.  Apache Hive. http:\/\/hadoop.apache.org\/hive."},{"key":"e_1_3_2_1_3_1","first-page":"918","volume-title":"VLDB","author":"Arasu A.","year":"2006","unstructured":"A. Arasu , V. Ganti , and R. Kaushik . Efficient exact set-similarity joins . In VLDB , pages 918 -- 929 , 2006 . A. Arasu, V. Ganti, and R. Kaushik. Efficient exact set-similarity joins. In VLDB, pages 918--929, 2006."},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/1242572.1242591"},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0169-7552(97)00031-7"},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2006.9"},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/1327452.1327492"},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/129888.129894"},{"key":"e_1_3_2_1_9_1","first-page":"443","volume-title":"VLDB","author":"DeWitt D. J.","year":"1991","unstructured":"D. J. DeWitt , J. F. Naughton , and D. A. Schneider . An evaluation of non-equijoin algorithms . In VLDB , pages 443 -- 452 , 1991 . D. J. DeWitt, J. F. Naughton, and D. A. Schneider. An evaluation of non-equijoin algorithms. In VLDB, pages 443--452, 1991."},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687553.1687568"},{"key":"e_1_3_2_1_11_1","unstructured":"Genbank. http:\/\/www.ncbi.nlm.nih.gov\/Genbank.  Genbank. http:\/\/www.ncbi.nlm.nih.gov\/Genbank."},{"key":"e_1_3_2_1_12_1","first-page":"518","volume-title":"VLDB","author":"Gionis A.","year":"1999","unstructured":"A. Gionis , P. Indyk , and R. Motwani . Similarity search in high dimensions via hashing . In VLDB , pages 518 -- 529 , 1999 . A. Gionis, P. Indyk, and R. Motwani. Similarity search in high dimensions via hashing. In VLDB, pages 518--529, 1999."},{"key":"e_1_3_2_1_13_1","first-page":"491","volume-title":"VLDB","author":"Gravano L.","year":"2001","unstructured":"L. Gravano , P. G. Ipeirotis , H. V. Jagadish , N. Koudas , S. Muthukrishnan , and D. Srivastava . Approximate string joins in a database (almost) for free . In VLDB , pages 491 -- 500 , 2001 . L. Gravano, P. G. Ipeirotis, H. V. Jagadish, N. Koudas, S. Muthukrishnan, and D. Srivastava. Approximate string joins in a database (almost) for free. In VLDB, pages 491--500, 2001."},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/1148170.1148222"},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1002\/asi.10170"},{"key":"e_1_3_2_1_16_1","unstructured":"Jaql. http:\/\/www.jaql.org.  Jaql. http:\/\/www.jaql.org."},{"key":"e_1_3_2_1_17_1","unstructured":"Jaql - Fuzzy join tutorial. http:\/\/code.google.com\/p\/jaql\/wiki\/fuzzyJoinTutorial.  Jaql - Fuzzy join tutorial. http:\/\/code.google.com\/p\/jaql\/wiki\/fuzzyJoinTutorial."},{"key":"e_1_3_2_1_18_1","first-page":"210","volume-title":"VLDB","author":"Kitsuregawa M.","year":"1990","unstructured":"M. Kitsuregawa and Y. Ogawa . Bucket spreading parallel hash: A new, robust, parallel hash join method for data skew in the super database computer (sdc) . In VLDB , pages 210 -- 221 , 1990 . M. Kitsuregawa and Y. Ogawa. Bucket spreading parallel hash: A new, robust, parallel hash join method for data skew in the super database computer (sdc). In VLDB, pages 210--221, 1990."},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF03037022"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/1242572.1242606"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/1559845.1559865"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/1135777.1135834"},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/1007568.1007652"},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/67544.66937"},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/1081870.1081956"},{"key":"e_1_3_2_1_26_1","volume-title":"UC Irvine","author":"Vernica R.","year":"2010","unstructured":"R. Vernica , M. Carey , and C. Li . Efficient parallel set-similarity joins using MapReduce. Technical report, Department of Computer Science , UC Irvine , March 2010 . http:\/\/asterix.ics.uci.edu. R. Vernica, M. Carey, and C. Li. Efficient parallel set-similarity joins using MapReduce. Technical report, Department of Computer Science, UC Irvine, March 2010. http:\/\/asterix.ics.uci.edu."},{"key":"e_1_3_2_1_27_1","volume-title":"http:\/\/www.ldc.upenn.edu\/Catalog\/CatalogEntry.jsp?catalogId=LDC2006T13","author":"Web","unstructured":"Web 1t 5-gram version 1. http:\/\/www.ldc.upenn.edu\/Catalog\/CatalogEntry.jsp?catalogId=LDC2006T13 . Web 1t 5-gram version 1. http:\/\/www.ldc.upenn.edu\/Catalog\/CatalogEntry.jsp?catalogId=LDC2006T13."},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.14778\/1453856.1453957"},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/1367497.1367516"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/1247480.1247602"}],"event":{"name":"SIGMOD\/PODS '10: International Conference on Management of Data","location":"Indianapolis Indiana USA","acronym":"SIGMOD\/PODS '10","sponsor":["SIGMOD ACM Special Interest Group on Management of Data"]},"container-title":["Proceedings of the 2010 ACM SIGMOD International Conference on Management of data"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1807167.1807222","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1807167.1807222","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T12:17:35Z","timestamp":1750249055000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1807167.1807222"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010,6,6]]},"references-count":30,"alternative-id":["10.1145\/1807167.1807222","10.1145\/1807167"],"URL":"https:\/\/doi.org\/10.1145\/1807167.1807222","relation":{},"subject":[],"published":{"date-parts":[[2010,6,6]]},"assertion":[{"value":"2010-06-06","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}