{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,6,16]],"date-time":"2024-06-16T20:31:23Z","timestamp":1718569883201},"reference-count":29,"publisher":"Association for Computing Machinery (ACM)","issue":"7","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2014,3]]},"abstract":"<jats:p>\n            Rank (i.e., top-\n            <jats:italic>k<\/jats:italic>\n            ) join queries play a key role in modern analytics tasks. However, despite their importance and unlike centralized settings, they have been completely overlooked in cloud NoSQL settings. We attempt to fill this gap: We contribute a suite of solutions and study their performance comprehensively. Baseline solutions are offered using SQL-like languages (like Hive and Pig), based on MapReduce jobs. We first provide solutions that are based on specialized indices, which may themselves be accessed using either MapReduce or coordinator-based strategies. The first index-based solution is based on inverted indices, which are accessed with MapReduce jobs. The second index-based solution adapts a popular centralized rank-join algorithm. We further contribute a novel statistical structure comprising histograms and Bloom filters, which forms the basis for the third index-based solution. We provide (i) MapReduce algorithms showing how to build these indices and statistical structures, (ii) algorithms to allow for online updates to these indices, and (iii) query processing algorithms utilizing them. We implemented all algorithms in Hadoop (HDFS) and HBase and tested them on TPC-H datasets of various scales, utilizing different queries on tables of various sizes and different score-attribute distributions. We ported our implementations to Amazon EC2 and \"in-house\" lab clusters of various scales. We provide performance results for three metrics: query execution time, network bandwidth consumption, and dollar-cost for query execution.\n          <\/jats:p>","DOI":"10.14778\/2732286.2732287","type":"journal-article","created":{"date-parts":[[2015,5,12]],"date-time":"2015-05-12T15:37:52Z","timestamp":1431445072000},"page":"493-504","source":"Crossref","is-referenced-by-count":16,"title":["Rank join queries in NoSQL databases"],"prefix":"10.14778","volume":"7","author":[{"given":"Nikos","family":"Ntarmos","sequence":"first","affiliation":[{"name":"School of Computing Science University of Glasgow, UK"}]},{"given":"Ioannis","family":"Patlakas","sequence":"additional","affiliation":[{"name":"Max-Planck-Institut f\u00fcr Informatik, Germany"}]},{"given":"Peter","family":"Triantafillou","sequence":"additional","affiliation":[{"name":"School of Computing Science University of Glasgow, UK"}]}],"member":"320","published-online":{"date-parts":[[2014,3]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687627.1687731"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/1739041.1739056"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/362686.362692"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.5555\/645484.656383"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/1011767.1011798"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/872757.872787"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.14778\/1920841.1920908"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2012.108"},{"key":"e_1_2_1_9_1","unstructured":"DynamoDB pricing scheme: http:\/\/aws.amazon.com\/dynamodb\/#pricing."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/375551.375567"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIT.1966.1053907"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.5555\/1287369.1287453"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.5555\/1315451.1315516"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/1391729.1391730"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/1989323.1989424"},{"key":"e_1_2_1_16_1","volume-title":"Proc. VLDB","author":"Michel S.","year":"2005","unstructured":"S. Michel, P. Triantafillou, and G. Weikum. KLEE: A framework for distributed top-k query algorithms. In Proc. VLDB, 2005."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNET.2002.803864"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1016\/0306-4379(93)90037-2"},{"key":"e_1_2_1_19_1","volume-title":"Proc. VLDB","author":"Natsev A.","year":"2001","unstructured":"A. Natsev, Y.-C. Chang, J. Smith, C.-S. Li, and J. Vitter. Supporting incremental join queries on ranked inputs. In Proc. VLDB, 2001."},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/1989323.1989423"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/1376616.1376726"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/1376916.1376924"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/1629175.1629197"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687553.1687609"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.14778\/1920841.1920951"},{"key":"e_1_2_1_26_1","volume-title":"Proc. VLDB","author":"Xia C.","year":"2004","unstructured":"C. Xia, H. Lu, B. C. Ooi, and J. Hu. Gorder: An efficient method for kNN join processing. In Proc. VLDB, 2004."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2009.111"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/1080885.1080896"},{"key":"e_1_2_1_29_1","volume-title":"Proc. DEXA","author":"Zhao K.","year":"2005","unstructured":"K. Zhao, S. Zhou, K.-L. Tan, and A. Zhou. Supporting ranked join in peer-to-peer networks. In Proc. DEXA, 2005."}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/2732286.2732287","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T11:00:27Z","timestamp":1672225227000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/2732286.2732287"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014,3]]},"references-count":29,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2014,3]]}},"alternative-id":["10.14778\/2732286.2732287"],"URL":"https:\/\/doi.org\/10.14778\/2732286.2732287","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2014,3]]}}}