{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,7]],"date-time":"2026-05-07T15:29:15Z","timestamp":1778167755956,"version":"3.51.4"},"reference-count":47,"publisher":"Association for Computing Machinery (ACM)","issue":"12","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2024,8]]},"abstract":"<jats:p>Many organizations have embraced the \"Lakehouse\" data management paradigm, which involves constructing structured data warehouses on top of open, unstructured data lakes. This approach stands in stark contrast to traditional, closed, relational databases and introduces challenges for performance and stability of distributed query processors. Firstly, in large-scale, open Lakehouses with uncurated data, high ingestion rates, external tables, or deeply nested schemas, it is often costly or wasteful to maintain perfect and up-to-date table and column statistics. Secondly, inherently imperfect cardinality estimates with conjunctive predicates, joins and user-defined functions can lead to bad query plans. Thirdly, for the sheer magnitude of data involved, strictly relying on static query plan decisions can result in performance and stability issues such as excessive data movement, substantial disk spillage, or high memory pressure. To address these challenges, this paper presents our design, implementation, evaluation and practice of the Adaptive Query Execution (AQE) framework, which exploits natural execution pipeline breakers in query plans to collect accurate statistics and re-optimize them at runtime for both performance and robustness. In the TPC-DS benchmark, the technique demonstrates up to 25\u00d7 per query speedup. At Databricks, AQE has been successfully deployed in production for multiple years. 
It powers billions of queries and ETL jobs to process exabytes of data per day, through key enterprise products such as Databricks Runtime, Databricks SQL, and Delta Live Tables.<\/jats:p>","DOI":"10.14778\/3685800.3685818","type":"journal-article","created":{"date-parts":[[2024,11,8]],"date-time":"2024-11-08T17:25:21Z","timestamp":1731086721000},"page":"3947-3959","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":11,"title":["Adaptive and Robust Query Execution for Lakehouses at Scale"],"prefix":"10.14778","volume":"17","author":[{"given":"Maryann","family":"Xue","sequence":"first","affiliation":[{"name":"Databricks Inc., USA"}]},{"given":"Yingyi","family":"Bu","sequence":"additional","affiliation":[{"name":"Databricks Inc., USA"}]},{"given":"Abhishek","family":"Somani","sequence":"additional","affiliation":[{"name":"Databricks Inc., USA"}]},{"given":"Wenchen","family":"Fan","sequence":"additional","affiliation":[{"name":"Databricks Inc., USA"}]},{"given":"Ziqi","family":"Liu","sequence":"additional","affiliation":[{"name":"Databricks Inc., USA"}]},{"given":"Steven","family":"Chen","sequence":"additional","affiliation":[{"name":"Databricks Inc., USA"}]},{"given":"Herman","family":"van Hovell","sequence":"additional","affiliation":[{"name":"Databricks Inc., USA"}]},{"given":"Bart","family":"Samwel","sequence":"additional","affiliation":[{"name":"Databricks Inc., USA"}]},{"given":"Mostafa","family":"Mokhtar","sequence":"additional","affiliation":[{"name":"Databricks Inc., USA"}]},{"given":"RK","family":"Korlapati","sequence":"additional","affiliation":[{"name":"Databricks Inc., USA"}]},{"given":"Andy","family":"Lam","sequence":"additional","affiliation":[{"name":"Databricks Inc., USA"}]},{"given":"Yunxiao","family":"Ma","sequence":"additional","affiliation":[{"name":"Databricks Inc., USA"}]},{"given":"Vuk","family":"Ercegovac","sequence":"additional","affiliation":[{"name":"Databricks Inc., 
USA"}]},{"given":"Jiexing","family":"Li","sequence":"additional","affiliation":[{"name":"Databricks Inc., USA"}]},{"given":"Alexander","family":"Behm","sequence":"additional","affiliation":[{"name":"Databricks Inc., USA"}]},{"given":"Yuanjian","family":"Li","sequence":"additional","affiliation":[{"name":"Databricks Inc., USA"}]},{"given":"Xiao","family":"Li","sequence":"additional","affiliation":[{"name":"Databricks Inc., USA"}]},{"given":"Sriram","family":"Krishnamurthy","sequence":"additional","affiliation":[{"name":"Databricks Inc., USA"}]},{"given":"Amit","family":"Shukla","sequence":"additional","affiliation":[{"name":"Databricks Inc., USA"}]},{"given":"Michalis","family":"Petropoulos","sequence":"additional","affiliation":[{"name":"Databricks Inc., USA"}]},{"given":"Sameer","family":"Paranjpye","sequence":"additional","affiliation":[{"name":"Databricks Inc., USA"}]},{"given":"Reynold","family":"Xin","sequence":"additional","affiliation":[{"name":"Databricks Inc., USA"}]},{"given":"Matei","family":"Zaharia","sequence":"additional","affiliation":[{"name":"Databricks Inc., USA"}]}],"member":"320","published-online":{"date-parts":[[2024,11,8]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/2465351.2465355"},{"key":"e_1_2_1_2_1","volume-title":"In-memory query execution in Google BigQuery. https:\/\/cloud.google.com\/blog\/products\/bigquery\/in-memory-query-execution-in-google-bigquery. Last accessed","author":"Ahmadi Hossein","year":"2024","unstructured":"Hossein Ahmadi. 2016. In-memory query execution in Google BigQuery. https:\/\/cloud.google.com\/blog\/products\/bigquery\/in-memory-query-execution-in-google-bigquery. Last accessed: July 18, 2024."},{"key":"e_1_2_1_3_1","volume-title":"https:\/\/hadoop.apache.org\/. Last accessed","author":"Hadoop Apache","year":"2024","unstructured":"Apache Hadoop. 2008. https:\/\/hadoop.apache.org\/. 
Last accessed: July 18, 2024."},{"key":"e_1_2_1_4_1","volume-title":"https:\/\/iceberg.apache.org\/. Last accessed","author":"Iceberg Apache","year":"2024","unstructured":"Apache Iceberg. 2020. https:\/\/iceberg.apache.org\/. Last accessed: July 18, 2024."},{"key":"e_1_2_1_5_1","volume-title":"https:\/\/spark.apache.org\/. Last accessed","author":"Spark Apache","year":"2024","unstructured":"Apache Spark. 2010. https:\/\/spark.apache.org\/. Last accessed: July 18, 2024."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.14778\/3415478.3415560"},{"key":"e_1_2_1_7_1","volume-title":"Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics. In CIDR.","author":"Armbrust Michael","year":"2021","unstructured":"Michael Armbrust, Ali Ghodsi, Reynold Xin, and Matei Zaharia. 2021. Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics. In CIDR."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2742797"},{"key":"e_1_2_1_9_1","volume-title":"Hellerstein","author":"Avnur Ron","year":"2000","unstructured":"Ron Avnur and Joseph M. Hellerstein. 2000. Eddies: Continuously Adaptive Query Processing. In Proc. ACM SIGMOD. 261--272."},{"key":"e_1_2_1_10_1","volume-title":"DeWitt","author":"Babu Shivnath","year":"2005","unstructured":"Shivnath Babu, Pedro Bizarro, and David J. DeWitt. 2005. Proactive re-optimization with Rio. In Proc. ACM SIGMOD. 
936--938."},{"key":"e_1_2_1_11_1","volume-title":"Paul Leventis, Ala Luszczak, Prashanth Menon, Mostafa Mokhtar, Gene Pang, Sameer Paranjpye, Greg Rahn, Bart Samwel, Tom van Bussel, Herman Van Hovell, Maryann Xue, Reynold Xin, and Matei Zaharia.","author":"Behm Alexander","year":"2022","unstructured":"Alexander Behm, Shoumik Palkar, Utkarsh Agarwal, Timothy Armstrong, David Cashman, Ankur Dave, Todd Greenstein, Shant Hovsepian, Ryan Johnson, Arvind Sai Krishnan, Paul Leventis, Ala Luszczak, Prashanth Menon, Mostafa Mokhtar, Gene Pang, Sameer Paranjpye, Greg Rahn, Bart Samwel, Tom van Bussel, Herman Van Hovell, Maryann Xue, Reynold Xin, and Matei Zaharia. 2022. Photon: A Fast Query Engine for Lakehouse Systems. In Proc. ACM SIGMOD. 2326--2339."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687553.1687563"},{"key":"e_1_2_1_13_1","volume-title":"https:\/\/cloud.google.com\/bigquery\/docs. Last accessed","year":"2024","unstructured":"BigQuery. 2015. https:\/\/cloud.google.com\/bigquery\/docs. Last accessed: July 18, 2024."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/362686.362692"},{"key":"e_1_2_1_15_1","first-page":"82","article-title":"Integrated Querying of SQL database data and S3 data in Amazon Redshift","volume":"41","author":"Cai Mengchu","year":"2018","unstructured":"Mengchu Cai, Martin Grund, Anurag Gupta, Fabian Nagel, Ippokratis Pandis, Yannis Papakonstantinou, and Michalis Petropoulos. 2018. Integrated Querying of SQL database data and S3 data in Amazon Redshift. IEEE Data Eng. Bull. 41, 2, 82--90.","journal-title":"IEEE Data Eng. Bull."},{"key":"e_1_2_1_16_1","volume-title":"Proc. VLDB Endow. 679--682","author":"Catozzi John","year":"2001","unstructured":"John Catozzi and Sorana Rabinovici. 2001. Operating System Extensions for the Teradata Parallel VLDB. In Proc. VLDB Endow. 
679--682."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/1007568.1007602"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/3035918.3056097"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/2882903.2903741"},{"key":"e_1_2_1_20_1","volume-title":"Proc. USENIX OSDI. 137--150","author":"Dean Jeffrey","year":"2004","unstructured":"Jeffrey Dean and Sanjay Ghemawat. 2004. MapReduce: Simplified Data Processing on Large Clusters. In Proc. USENIX OSDI. 137--150."},{"key":"e_1_2_1_21_1","volume-title":"https:\/\/www.databricks.com\/blog\/delta-uniform-universal-format-lakehouse-interoperability. Last accessed","author":"UniForm Delta","year":"2024","unstructured":"Delta UniForm. 2023. https:\/\/www.databricks.com\/blog\/delta-uniform-universal-format-lakehouse-interoperability. Last accessed: July 18, 2024."},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/69.50905"},{"key":"e_1_2_1_23_1","volume-title":"Faster SQL Queries on Delta Lake with Dynamic File Pruning. https:\/\/www.databricks.com\/blog\/2020\/04\/30\/faster-sql-queries-on-delta-lake-with-dynamic-file-pruning.html. Last accessed","author":"DFP.","year":"2024","unstructured":"DFP. 2020. Faster SQL Queries on Delta Lake with Dynamic File Pruning. https:\/\/www.databricks.com\/blog\/2020\/04\/30\/faster-sql-queries-on-delta-lake-with-dynamic-file-pruning.html. Last accessed: July 18, 2024."},{"key":"e_1_2_1_24_1","volume-title":"Proc. VLDB Endow. 209--219","author":"Fushimi Shinya","year":"1986","unstructured":"Shinya Fushimi, Masaru Kitsuregawa, and Hidehiko Tanaka. 1986. An Overview of The System Software of A Parallel Relational Database Machine GRACE. In Proc. VLDB Endow. 209--219."},{"key":"e_1_2_1_25_1","first-page":"19","article-title":"The Cascades Framework for Query Optimization","volume":"18","author":"Graefe Goetz","year":"1995","unstructured":"Goetz Graefe. 1995. The Cascades Framework for Query Optimization. IEEE Data Eng. 
Bull. 18, 3 (1995), 19--29.","journal-title":"IEEE Data Eng. Bull."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/67544.66960"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2742795"},{"key":"e_1_2_1_28_1","volume-title":"DeWitt","author":"Kabra Navin","year":"1998","unstructured":"Navin Kabra and David J. DeWitt. 1998. Efficient Mid-Query Re-Optimization of Sub-Optimal Query Execution Plans. In Proc. ACM SIGMOD. 106--117."},{"key":"e_1_2_1_29_1","volume-title":"Proc. CIDR.","author":"Kornacker Marcel","year":"2015","unstructured":"Marcel Kornacker, Alexander Behm, Victor Bittorf, Taras Bobrovytsky, Casey Ching, Alan Choi, Justin Erickson, Martin Grund, Daniel Hecht, Matthew Jacobs, Ishaan Joshi, Lenni Kuff, Dileep Kumar, Alex Leblang, Nong Li, Ippokratis Pandis, Henry Robinson, David Rorke, Silvius Rus, John Russell, Dimitris Tsirogiannis, Skye Wanderman-Milne, and Michael Yoder. 2015. Impala: A Modern, Open-Source SQL Engine for Hadoop. In Proc. CIDR."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.14778\/2367502.2367518"},{"key":"e_1_2_1_31_1","volume-title":"https:\/\/delta.io. Last accessed","author":"Delta Lake Linux Foundation","year":"2024","unstructured":"Linux Foundation Delta Lake. 2020. https:\/\/delta.io. Last accessed: July 18, 2024."},{"key":"e_1_2_1_32_1","unstructured":"Guy Lohman. 2014. Is Query Optimization a \"Solved\" Problem?. In ACM SIGMOD Blog."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/1007568.1007642"},{"key":"e_1_2_1_34_1","volume-title":"Inside the SQL Server Query Optimizer","author":"Nevarez Benjamin","unstructured":"Benjamin Nevarez. 2011. Inside the SQL Server Query Optimizer. Simple Talk Publishing."},{"key":"e_1_2_1_35_1","volume-title":"Adaptive Plans in Oracle Database 12c. https:\/\/oracle-base.com\/articles\/12c\/adaptive-plans-12cr1. Last accessed","author":"Plan Oracle Adaptive","year":"2024","unstructured":"Oracle Adaptive Plan. 
2013. Adaptive Plans in Oracle Database 12c. https:\/\/oracle-base.com\/articles\/12c\/adaptive-plans-12cr1. Last accessed: July 18, 2024."},{"key":"e_1_2_1_36_1","first-page":"1","article-title":"Revisiting Runtime Dynamic Optimization for Join Queries in Big Data Management Systems","volume":"1","author":"Pavlopoulou Christina","year":"2022","unstructured":"Christina Pavlopoulou, Michael J. Carey, and Vassilis J. Tsotras. 2022. Revisiting Runtime Dynamic Optimization for Join Queries in Big Data Management Systems. In Proc. EDBT. 1:1--1:12.","journal-title":"Proc. EDBT."},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.14778\/3229863.3229871"},{"key":"e_1_2_1_38_1","volume-title":"Price","author":"Selinger Patricia G.","year":"1979","unstructured":"Patricia G. Selinger, Morton M. Astrahan, Donald D. Chamberlin, Raymond A. Lorie, and Thomas G. Price. 1979. Access Path Selection in a Relational Database Management System. In Proc. ACM SIGMOD. 23--34."},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/6314.6315"},{"key":"e_1_2_1_40_1","volume-title":"Proc. VLDB Endow. 19--28","author":"Stillger Michael","year":"2001","unstructured":"Michael Stillger, Guy M. Lohman, Volker Markl, and Mokhtar Kandil. 2001. LEO - DB2's LEarning Optimizer. In Proc. VLDB Endow. 19--28."},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/3589769"},{"key":"e_1_2_1_42_1","unstructured":"Teradata IPE. 2024. Incremental Planning and Execution. https:\/\/docs.teradata.com\/r\/Enterprise_IntelliFlex_VMware\/SQL-Request-and-Transaction-Processing\/Query-Rewrite-Statistics-and-Optimization\/Teradata-Optimizer-Processes. Last accessed: July 18, 2024."},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/320473.320479"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/2463676.2465288"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/1376616.1376720"},{"key":"e_1_2_1_46_1","volume-title":"Proc. 
USENIX NSDI. 15--28","author":"Zaharia Matei","year":"2012","unstructured":"Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J. Franklin, Scott Shenker, and Ion Stoica. 2012. Resilient Distributed Datasets: A Fault-tolerant Abstraction for In-memory Cluster Computing. In Proc. USENIX NSDI. 15--28."},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2010.5447802"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3685800.3685818","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,12,31]],"date-time":"2024-12-31T05:31:49Z","timestamp":1735623109000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3685800.3685818"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,8]]},"references-count":47,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2024,8]]}},"alternative-id":["10.14778\/3685800.3685818"],"URL":"https:\/\/doi.org\/10.14778\/3685800.3685818","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2024,8]]},"assertion":[{"value":"2024-11-08","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}