{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T13:33:08Z","timestamp":1760016788948},"reference-count":15,"publisher":"Association for Computing Machinery (ACM)","issue":"12","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2015,8]]},"abstract":"<jats:p>Big Data analytics often include complex queries with similar or identical expressions, usually referred to as Common Table Expressions (CTEs). CTEs may be explicitly defined by users to simplify query formulations, or implicitly included in queries generated by business intelligence tools, financial applications and decision support systems. In Massively Parallel Processing (MPP) database systems, CTEs pose new challenges due to the distributed nature of query processing, the overwhelming volume of underlying data and the scalability criteria that systems are required to meet. In these settings, the effective optimization and efficient execution of CTEs are crucial for the timely processing of analytical queries over Big Data. In this paper, we present a comprehensive framework for the representation, optimization and execution of CTEs in the context of <jats:italic>Orca<\/jats:italic> -- Pivotal's query optimizer for Big Data. 
We demonstrate experimentally the benefits of our techniques using an industry-standard decision support benchmark.<\/jats:p>","DOI":"10.14778\/2824032.2824068","type":"journal-article","created":{"date-parts":[[2015,9,16]],"date-time":"2015-09-16T12:18:17Z","timestamp":1442405897000},"page":"1704-1715","source":"Crossref","is-referenced-by-count":9,"title":["Optimization of common table expressions in MPP database systems"],"prefix":"10.14778","volume":"8","author":[{"given":"Amr","family":"El-Helw","sequence":"first","affiliation":[{"name":"Pivotal Inc., Palo Alto, CA"}]},{"given":"Venkatesh","family":"Raghavan","sequence":"additional","affiliation":[{"name":"Pivotal Inc., Palo Alto, CA"}]},{"given":"Mohamed A.","family":"Soliman","sequence":"additional","affiliation":[{"name":"Pivotal Inc., Palo Alto, CA"}]},{"given":"George","family":"Caragea","sequence":"additional","affiliation":[{"name":"Pivotal Inc., Palo Alto, CA"}]},{"given":"Zhongxian","family":"Gu","sequence":"additional","affiliation":[{"name":"Datometry Inc., San Francisco, CA"}]},{"given":"Michalis","family":"Petropoulos","sequence":"additional","affiliation":[{"name":"Amazon Web Services, Palo Alto, CA"}]}],"member":"320","published-online":{"date-parts":[[2015,8]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"PostgreSQL. http:\/\/www.postgresql.org."},{"key":"e_1_2_1_2_1","first-page":"373","volume-title":"SIGMOD","author":"Antova L.","year":"2014","unstructured":"L. Antova, A. El-Helw, M. A. Soliman, Z. Gu, M. Petropoulos, and F. Waas. Optimizing Queries over Partitioned Tables in MPP Systems. In SIGMOD, pages 373--384, 2014. 10.1145\/2588555.2595640"},{"key":"e_1_2_1_3_1","volume-title":"MBDS","author":"Bear C.","year":"2012","unstructured":"C. Bear, A. 
Lamb, and N. Tran. The Vertica Database: SQL RDBMS for Managing Big Data. In MBDS, 2012. 10.1145\/2378356.2378367"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687553.1687563"},{"key":"e_1_2_1_5_1","first-page":"1223","volume-title":"SIGMOD","author":"Chang L.","year":"2014","unstructured":"L. Chang, Z. Wang, T. Ma, L. Jian, L. Ma, A. Goldshuv, L. Lonergan, J. Cohen, C. Welton, G. Sherry, and M. Bhandarkar. HAWQ: A Massively Parallel Processing SQL Engine in Hadoop. In SIGMOD, pages 1223--1234, 2014. 10.1145\/2588555.2595636"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/375663.375706"},{"key":"e_1_2_1_7_1","volume-title":"The Cascades Framework for Query Optimization","author":"Graefe G.","year":"1995","unstructured":"G. Graefe. The Cascades Framework for Query Optimization. IEEE Data Eng. Bull., 18(3), 1995."},{"key":"e_1_2_1_8_1","doi-asserted-by":"crossref","first-page":"520","DOI":"10.1109\/ICDE.2014.6816678","volume-title":"Data Engineering (ICDE), 2014 IEEE 30th International Conference on","author":"Perez L. L.","year":"2014","unstructured":"L. L. Perez and C. M. Jermaine. History-aware query optimization with materialized intermediate views. In Data Engineering (ICDE), 2014 IEEE 30th International Conference on, pages 520--531. 
IEEE, 2014."},{"key":"e_1_2_1_9_1","volume-title":"http:\/\/www.pivotal.io\/big-data\/pivotal-greenplum-database","author":"Database Greenplum","year":"2013","unstructured":"Pivotal. Greenplum Database. http:\/\/www.pivotal.io\/big-data\/pivotal-greenplum-database, 2013."},{"key":"e_1_2_1_10_1","first-page":"767","volume-title":"SIGMOD","author":"Shankar S.","year":"2012","unstructured":"S. Shankar, R. Nehme, J. Aguilar-Saborit, A. Chung, M. Elhemali, A. Halverson, E. Robinson, M. S. Subramanian, D. DeWitt, and C. Galindo-Legaria. Query Optimization in Microsoft SQL Server PDW. In SIGMOD, pages 767--776, 2012. 10.1145\/2213836.2213953"},{"key":"e_1_2_1_11_1","first-page":"1337","volume-title":"ICDE","author":"Silva Y. N.","year":"2012","unstructured":"Y. N. Silva, P. Larson, and J. Zhou. Exploiting Common Subexpressions for Cloud Query Processing. In ICDE, pages 1337--1348, 2012. 10.1109\/ICDE.2012.106"},{"key":"e_1_2_1_12_1","first-page":"337","volume-title":"SIGMOD","author":"Soliman M. A.","year":"2014","unstructured":"M. A. Soliman, L. Antova, V. Raghavan, A. El-Helw, Z. Gu, E. Shen, G. C. Caragea, C. Garcia-Alvarado, F. 
Rahman, M. Petropoulos, F. Waas, S. Narayanan, K. Krikellas, and R. Baldwin. Orca: A Modular Query Optimizer Architecture for Big Data. In SIGMOD, pages 337--348, 2014. 10.1145\/2588555.2595637"},{"key":"e_1_2_1_13_1","unstructured":"TPC. TPC-DS Benchmark. http:\/\/www.tpc.org\/tpcds."},{"key":"e_1_2_1_14_1","doi-asserted-by":"crossref","first-page":"533","DOI":"10.1145\/1247480.1247540","volume-title":"SIGMOD","author":"Zhou J.","year":"2007","unstructured":"J. Zhou, P. Larson, J. C. Freytag, and W. Lehner. Efficient Exploitation of Similar Subexpressions for Query Processing. In SIGMOD, pages 533--544, 2007. 10.1145\/1247480.1247540"},{"key":"e_1_2_1_15_1","first-page":"1060","volume-title":"ICDE","author":"Zhou J.","year":"2010","unstructured":"J. Zhou, P.-\u00c5. Larson, and R. Chaiken. Incorporating Partitioning and Parallel Plans into the SCOPE Optimizer. 
In ICDE, pages 1060--1071, 2010."}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/2824032.2824068","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,8,14]],"date-time":"2023-08-14T05:50:01Z","timestamp":1691992201000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/2824032.2824068"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2015,8]]},"references-count":15,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2015,8]]}},"alternative-id":["10.14778\/2824032.2824068"],"URL":"https:\/\/doi.org\/10.14778\/2824032.2824068","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2015,8]]}}}