{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T21:38:27Z","timestamp":1740173907162,"version":"3.37.3"},"reference-count":39,"publisher":"Association for Computing Machinery (ACM)","issue":"10","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2016,6]]},"abstract":"<jats:p>While services such as Amazon AWS make computing power abundantly available, adding more computing nodes can incur high costs in, for instance, pay-as-you-go plans while not always significantly improving the net running time (aka wall-clock time) of queries. In this work, we provide algorithms for parallel evaluation of SGF queries in MapReduce that optimize total time, while retaining low net time. Not only can SGF queries specify all semi-join reducers, but also more expressive queries involving disjunction and negation. Since SGF queries can be seen as Boolean combinations of (potentially nested) semi-joins, we introduce a novel multi-semi-join (MSJ) MapReduce operator that enables the evaluation of a set of semi-joins in one job. We use this operator to obtain parallel query plans for SGF queries that outvalue sequential plans w.r.t. net time and provide additional optimizations aimed at minimizing total time without severely affecting net time. Even though the latter optimizations are NP-hard, we present effective greedy algorithms. Our experiments, conducted using our own implementation Gumbo on top of Hadoop, confirm the usefulness of parallel query plans, and the effectiveness and scalability of our optimizations, all with a significant improvement over Pig and Hive.<\/jats:p>","DOI":"10.14778\/2977797.2977800","type":"journal-article","created":{"date-parts":[[2016,8,5]],"date-time":"2016-08-05T12:34:02Z","timestamp":1470400442000},"page":"732-743","source":"Crossref","is-referenced-by-count":1,"title":["Parallel evaluation of multi-semi-joins"],"prefix":"10.14778","volume":"9","author":[{"given":"Jonny","family":"Daenen","sequence":"first","affiliation":[{"name":"Hasselt University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Frank","family":"Neven","sequence":"additional","affiliation":[{"name":"Hasselt University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Tony","family":"Tan","sequence":"additional","affiliation":[{"name":"National Taiwan University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Stijn","family":"Vansummeren","sequence":"additional","affiliation":[{"name":"Universit\u00e9 Libre de Bruxelles"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2016,6]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.5555\/551350"},{"key":"e_1_2_1_2_1","volume-title":"GYM: A multiround join algorithm in mapreduce. CoRR, abs\/1410.4156","author":"Afrati F.","year":"2014","unstructured":"F. Afrati , M. Joglekar , C. R\u00e9 , S. Salihoglu , and J. Ullman . GYM: A multiround join algorithm in mapreduce. CoRR, abs\/1410.4156 , 2014 . F. Afrati, M. Joglekar, C. R\u00e9, S. Salihoglu, and J. Ullman. GYM: A multiround join algorithm in mapreduce. CoRR, abs\/1410.4156, 2014."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.14778\/2535570.2488334"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2011.47"},{"key":"e_1_2_1_5_1","volume-title":"Handling skew in multiway joins in parallel processing. CoRR, abs\/1504.03247","author":"Afrati F.","year":"2015","unstructured":"F. Afrati , J. Ullman , and A. Vasilakopoulos . Handling skew in multiway joins in parallel processing. CoRR, abs\/1504.03247 , 2015 . F. Afrati, J. Ullman, and A. Vasilakopoulos. Handling skew in multiway joins in parallel processing. CoRR, abs\/1504.03247, 2015."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1004275029985"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/2463664.2465224"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/2594538.2594558"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/322234.322238"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1137\/0210059"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1016\/0306-4379(81)90002-8"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/1807167.1807273"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/275487.275492"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2750545"},{"key":"e_1_2_1_15_1","first-page":"521","volume-title":"EDBT","author":"Daenen J.","year":"2015","unstructured":"J. Daenen , F. Neven , and T. Tan . Gumbo: Guarded fragment queries over big data . In EDBT , pages 521 -- 524 , 2015 . J. Daenen, F. Neven, and T. Tan. Gumbo: Guarded fragment queries over big data. In EDBT, pages 521--524, 2015."},{"key":"e_1_2_1_16_1","volume-title":"Parallel evaluation of multi-semi-joins. CoRR, abs\/1605.05219","author":"Daenen J.","year":"2016","unstructured":"J. Daenen , F. Neven , T. Tan , and S. Vansummeren . Parallel evaluation of multi-semi-joins. CoRR, abs\/1605.05219 , 2016 . J. Daenen, F. Neven, T. Tan, and S. Vansummeren. Parallel evaluation of multi-semi-joins. CoRR, abs\/1605.05219, 2016."},{"key":"e_1_2_1_17_1","first-page":"4","volume":"0","author":"Daenen J.","year":"2016","unstructured":"J. Daenen and T. Tan . Gumbo v 0 . 4 , May 2016 . http:\/\/dx.doi.org\/10.5281\/zenodo.51517. 10.5281\/zenodo.51517 J. Daenen and T. Tan. Gumbo v0.4, May 2016. http:\/\/dx.doi.org\/10.5281\/zenodo.51517.","journal-title":"Gumbo"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/1327452.1327492"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.14778\/2732279.2732281"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/602220.602222"},{"key":"e_1_2_1_21_1","volume-title":"Computers and Intractability: A Guide to the Theory of NP-Completeness","author":"Garey M. R.","year":"1979","unstructured":"M. R. Garey and D. S. Johnson . Computers and Intractability: A Guide to the Theory of NP-Completeness . W. H. Freeman & Co. , New York, NY, USA , 1979 . M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman & Co., New York, NY, USA, 1979."},{"issue":"1","key":"e_1_2_1_22_1","first-page":"34","article-title":"Apache pig's optimizer","volume":"36","author":"Gates A.","year":"2013","unstructured":"A. Gates , J. Dai , and T. Nair . Apache pig's optimizer . IEEE Data Engineering Bulletin , 36 ( 1 ): 34 -- 45 , 2013 . A. Gates, J. Dai, and T. Nair. Apache pig's optimizer. IEEE Data Engineering Bulletin, 36(1):34--45, 2013.","journal-title":"IEEE Data Engineering Bulletin"},{"key":"e_1_2_1_23_1","volume-title":"Description Logics","author":"Gr\u00e4del E.","year":"1998","unstructured":"E. Gr\u00e4del . Description logics and guarded fragments of first order logic . In Description Logics , 1998 . E. Gr\u00e4del. Description logics and guarded fragments of first order logic. In Description Logics, 1998."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/234313.234367"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/1989284.1989310"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10849-005-5789-8"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.14778\/1920841.1920906"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/1989323.1989423"},{"key":"e_1_2_1_29_1","first-page":"267","volume-title":"USENIX Annual Technical Conference","author":"Olston C.","year":"2008","unstructured":"C. Olston , B. Reed , A. Silberstein , and U. Srivastava . Automatic optimization of parallel dataflow programs . In USENIX Annual Technical Conference , pages 267 -- 273 , 2008 . C. Olston, B. Reed, A. Silberstein, and U. Srivastava. Automatic optimization of parallel dataflow programs. In USENIX Annual Technical Conference, pages 267--273, 2008."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/1376616.1376726"},{"key":"e_1_2_1_31_1","first-page":"245","volume-title":"ICDT","author":"Picalausa F.","year":"2014","unstructured":"F. Picalausa , G. Fletcher , J. Hidders , and S. Vansummeren . Principles of guarded structural indexing . In ICDT , pages 245 -- 256 , 2014 . F. Picalausa, G. Fletcher, J. Hidders, and S. Vansummeren. Principles of guarded structural indexing. In ICDT, pages 245--256, 2014."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/2391229.2391245"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/2463676.2463719"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2010.5447738"},{"key":"e_1_2_1_35_1","first-page":"149","volume-title":"DIMACS Workshop on Descriptive Complexity and Finite Models","author":"Vardi M.","year":"1996","unstructured":"M. Vardi . Why is modal logic so robustly decidable ? In DIMACS Workshop on Descriptive Complexity and Finite Models , pages 149 -- 184 , 1996 . M. Vardi. Why is modal logic so robustly decidable? In DIMACS Workshop on Descriptive Complexity and Finite Models, pages 149--184, 1996."},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.14778\/2732232.2732234"},{"key":"e_1_2_1_37_1","volume-title":"Hadoop - The Definitive Guide: Storage and Analysis at Internet Scale. O'Reilly","author":"White T.","year":"2015","unstructured":"T. White . Hadoop - The Definitive Guide: Storage and Analysis at Internet Scale. O'Reilly , 2015 . T. White. Hadoop - The Definitive Guide: Storage and Analysis at Internet Scale. O'Reilly, 2015."},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/2463676.2465288"},{"key":"e_1_2_1_39_1","first-page":"82","volume-title":"VLDB","author":"Yannakakis M.","year":"1981","unstructured":"M. Yannakakis . Algorithms for acyclic database schemes . In VLDB , pages 82 -- 94 , 1981 . M. Yannakakis. Algorithms for acyclic database schemes. In VLDB, pages 82--94, 1981."}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/2977797.2977800","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T09:53:42Z","timestamp":1672221222000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/2977797.2977800"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,6]]},"references-count":39,"journal-issue":{"issue":"10","published-print":{"date-parts":[[2016,6]]}},"alternative-id":["10.14778\/2977797.2977800"],"URL":"https:\/\/doi.org\/10.14778\/2977797.2977800","relation":{},"ISSN":["2150-8097"],"issn-type":[{"type":"print","value":"2150-8097"}],"subject":[],"published":{"date-parts":[[2016,6]]}}}