{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,1]],"date-time":"2025-10-01T16:30:36Z","timestamp":1759336236545,"version":"3.32.0"},"reference-count":37,"publisher":"Association for Computing Machinery (ACM)","issue":"12","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2024,8]]},"abstract":"<jats:p>An important feature of modern query optimizers is the ability to produce a query plan that is optimal for the underlying data set. This requires the ability to estimate cardinalities and computational costs of intermediate query plan nodes, which is highly dependent on both the query shape and the underlying data distribution. Traditional methods include collecting statistics on base tables and implementing cardinality and computational cost derivation inside the optimizer, which is error-prone for complex query shapes. This paper presents Presto's novel history-based optimization framework (HBO), which collects execution histories and uses them to optimize similar queries in the future. The framework produces accurate estimates for complex query shapes in a lightweight, automated manner, and adapts automatically to changes in underlying data distributions. We present the design and implementation of the HBO framework and provide details on its use in various optimization rules, as well as details on implementing the statistics store on top of a Redis key-value store. 
We also present the results of running HBO in production in two large data infrastructure organizations (Meta and Uber).<\/jats:p>","DOI":"10.14778\/3685800.3685828","type":"journal-article","created":{"date-parts":[[2024,11,8]],"date-time":"2024-11-08T17:25:21Z","timestamp":1731086721000},"page":"4077-4089","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Presto's History-Based Query Optimizer"],"prefix":"10.14778","volume":"17","author":[{"given":"Pranjal","family":"Shankhdhar","sequence":"first","affiliation":[{"name":"Meta Platforms, Menlo Park, CA, USA"}]},{"given":"Feilong","family":"Liu","sequence":"additional","affiliation":[{"name":"Meta Platforms, Menlo Park, CA, USA"}]},{"given":"Jay","family":"Narale","sequence":"additional","affiliation":[{"name":"Uber Technologies, San Francisco, CA, USA"}]},{"given":"James","family":"Sun","sequence":"additional","affiliation":[{"name":"Meta Platforms, Menlo Park, CA, USA"}]},{"given":"Rebecca","family":"Schlussel","sequence":"additional","affiliation":[{"name":"Meta Platforms, Menlo Park, CA, USA"}]},{"given":"Lyublena","family":"Antova","sequence":"additional","affiliation":[{"name":"Meta Platforms, Menlo Park, CA, USA"}]}],"member":"320","published-online":{"date-parts":[[2024,11,8]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"2019. Improving the Presto planner for better push down and data federation. https:\/\/prestodb.io\/blog\/2019\/12\/23\/improve-presto-planner\/"},{"key":"e_1_2_1_2_1","unstructured":"2024. Oracle query hints. https:\/\/docs.oracle.com\/en\/database\/oracle\/oracle-database\/21\/sqlrf\/Comments.html#GUID-D316D545-89E2-4D54-977F-FC97815CD62E__BABEAFGF"},{"key":"e_1_2_1_3_1","unstructured":"2024. Presto: Distributed SQL query engine for big data. https:\/\/github.com\/prestodb"},{"key":"e_1_2_1_4_1","unstructured":"2024. Redis in-memory key-value store. 
https:\/\/www.redis.io"},{"key":"e_1_2_1_5_1","unstructured":"2024. SQL server query hints. https:\/\/learn.microsoft.com\/en-us\/sql\/t-sql\/queries\/hints-transact-sql-query?view=sql-server-ver16"},{"key":"e_1_2_1_6_1","unstructured":"2024. Teradata query optimizers. https:\/\/docs.teradata.com\/r\/Enterprise_IntelliFlex_VMware\/SQL-Request-and-Transaction-Processing\/Query-Rewrite-Statistics-and-Optimization\/Query-Optimizers"},{"key":"e_1_2_1_7_1","first-page":"1026","article-title":"Cost-based query transformation in Oracle","volume":"6","author":"Ahmed Rafi","year":"2006","unstructured":"Rafi Ahmed, Allison Lee, Andrew Witkowski, Dinesh Das, Hong Su, Mohamed Zait, and Thierry Cruanes. 2006. Cost-based query transformation in Oracle. In VLDB, Vol. 6. 1026--1036.","journal-title":"VLDB"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/3514221.3526050"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.14778\/3611540.3611544"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/320455.320457"},{"key":"e_1_2_1_11_1","unstructured":"Kassem Awada Mohamed Y Eltabakh Conrad Tang Mohammed Al-Kateb Sanjay Nair and Grace Au. 2020. Cost Estimation Across Heterogeneous SQL-Based Big Data Infrastructures in Teradata IntelliSphere. In EDBT. 534--545."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/375663.375686"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687553.1687556"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1016\/B978-012722442-8\/50089-6"},{"key":"e_1_2_1_15_1","volume-title":"The Cascades Framework for Query Optimization","author":"Graefe Goetz","year":"1995","unstructured":"Goetz Graefe. 1995. The Cascades Framework for Query Optimization. IEEE Data(base) Engineering Bulletin 18 (1995), 19--29. 
https:\/\/api.semanticscholar.org\/CorpusID:260706023"},{"key":"e_1_2_1_16_1","volume-title":"Kai Zeng, Gao Cong, Yanzhao Qin, Andreas Pfadler, et al.","author":"Han Yuxing","year":"2021","unstructured":"Yuxing Han, Ziniu Wu, Peizhi Wu, Rong Zhu, Jingyi Yang, Liang Wei Tan, Kai Zeng, Gao Cong, Yanzhao Qin, Andreas Pfadler, et al. 2021. Cardinality estimation in DBMS: A comprehensive benchmark evaluation. arXiv preprint arXiv:2109.05877 (2021)."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2749438"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.14778\/3551793.3551799"},{"key":"e_1_2_1_19_1","volume-title":"Jyoti Leeka, Alekh Jindal, Jishen Zhao, Jesus Camacho-Rodriguez, and Yuanyuan Tian.","author":"Huang Hanxian","year":"2024","unstructured":"Hanxian Huang, Tarique Siddiqui, Rana Alotaibi, Carlo Curino, Jyoti Leeka, Alekh Jindal, Jishen Zhao, Jesus Camacho-Rodriguez, and Yuanyuan Tian. 2024. Sibyl: Forecasting Time-Evolving Query Workloads. arXiv preprint arXiv:2401.03723 (2024)."},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1016\/B978-012722442-8\/50011-2"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/3514221.3526154"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.14778\/2850583.2850594"},{"key":"e_1_2_1_23_1","volume-title":"Cardinality Estimation Done Right: Index-Based Join Sampling. In Conference on Innovative Data Systems Research. https:\/\/api.semanticscholar.org\/CorpusID:8154743","author":"Leis Viktor","year":"2017","unstructured":"Viktor Leis, Bernhard Radke, Andrey Gubichev, Alfons Kemper, and Thomas Neumann. 2017. Cardinality Estimation Done Right: Index-Based Join Sampling. In Conference on Innovative Data Systems Research. https:\/\/api.semanticscholar.org\/CorpusID:8154743"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/3542700.3542703"},{"key":"e_1_2_1_25_1","volume-title":"Neo: A learned query optimizer. 
arXiv preprint arXiv:1904.03711","author":"Marcus Ryan","year":"2019","unstructured":"Ryan Marcus, Parimarjan Negi, Hongzi Mao, Chi Zhang, Mohammad Alizadeh, Tim Kraska, Olga Papaemmanouil, and Nesime Tatbul. 2019. Neo: A learned query optimizer. arXiv preprint arXiv:1904.03711 (2019)."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1147\/sj.421.0098"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/50202.50205"},{"key":"e_1_2_1_28_1","unstructured":"Mary Tork Roth Laura M Haas and Fatma Ozcan. 1999. Cost models do matter: Providing cost information for diverse data sources in a federated system. IBM Thomas J. Watson Research Division."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2019.00196"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/2213836.2213953"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/2588555.2595637"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.14778\/3485450.3485459"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/3589769"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687553.1687609"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.14778\/3461535.3461552"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/3588721"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.14778\/1454159.1454175"}],"container-title":["Proceedings of the VLDB 
Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3685800.3685828","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,12,31]],"date-time":"2024-12-31T05:32:58Z","timestamp":1735623178000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3685800.3685828"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,8]]},"references-count":37,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2024,8]]}},"alternative-id":["10.14778\/3685800.3685828"],"URL":"https:\/\/doi.org\/10.14778\/3685800.3685828","relation":{},"ISSN":["2150-8097"],"issn-type":[{"type":"print","value":"2150-8097"}],"subject":[],"published":{"date-parts":[[2024,8]]},"assertion":[{"value":"2024-11-08","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}