{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,19]],"date-time":"2026-05-19T07:18:21Z","timestamp":1779175101064,"version":"3.51.4"},"reference-count":37,"publisher":"Association for Computing Machinery (ACM)","issue":"11","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2024,7]]},"abstract":"<jats:p>Database research and development is heavily influenced by benchmarks, such as the industry-standard TPC-H and TPC-DS for analytical systems. However, these twenty-year-old benchmarks neither capture how databases are deployed nor what workloads modern cloud data warehouse systems face these days.<\/jats:p>\n          <jats:p>In this paper, we summarize well-known, confirm suspected, and unearth novel discrepancies between TPC-H\/DS and actual workloads using empirical data. We base our analysis on telemetrics from Amazon Redshift - one of the largest cloud data warehouse deployments. Among others, we show how write-heavy data pipelines are prominent, workloads vary over time (in both load and type), queries are repetitive, and how most properties of queries or workloads experience very long tailed distributions. We conclude that data warehouse benchmarks, just like database systems, need to become more holistic and stop focusing solely on query engine performance. Finally, we publish a dataset containing query statistics of 200 randomly selected Redshift serverless and provisioned instances (each) over a three-month period, as a basis for building more realistic benchmarks.<\/jats:p>","DOI":"10.14778\/3681954.3682031","type":"journal-article","created":{"date-parts":[[2024,8,30]],"date-time":"2024-08-30T16:23:36Z","timestamp":1725035016000},"page":"3694-3706","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":33,"title":["Why TPC is Not Enough: An Analysis of the Amazon Redshift Fleet"],"prefix":"10.14778","volume":"17","author":[{"given":"Alexander","family":"van Renen","sequence":"first","affiliation":[{"name":"UTN"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Dominik","family":"Horn","sequence":"additional","affiliation":[{"name":"Amazon Web Services"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Pascal","family":"Pfeil","sequence":"additional","affiliation":[{"name":"Amazon Web Services"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kapil","family":"Vaidya","sequence":"additional","affiliation":[{"name":"Amazon Web Services"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Wenjian","family":"Dong","sequence":"additional","affiliation":[{"name":"Amazon Web Services"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Murali","family":"Narayanaswamy","sequence":"additional","affiliation":[{"name":"Amazon Web Services"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zhengchun","family":"Liu","sequence":"additional","affiliation":[{"name":"Amazon Web Services"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Gaurav","family":"Saxena","sequence":"additional","affiliation":[{"name":"Amazon Web Services"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Andreas","family":"Kipf","sequence":"additional","affiliation":[{"name":"UTN"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Tim","family":"Kraska","sequence":"additional","affiliation":[{"name":"Amazon Web Services"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2024,8,30]]},"reference":[{"key":"e_1_2_1_1_1","first-page":"11","volume":"202","author":"Amazon ANALYZE","unstructured":"[n.d.]. ANALYZE - Amazon Redshift --- docs.aws.amazon.com. https:\/\/docs.aws.amazon.com\/redshift\/latest\/dg\/r_ANALYZE.html. [Accessed 2023-11-27].","journal-title":"Accessed"},{"key":"e_1_2_1_2_1","first-page":"12","volume":"202","author":"Amazon Redshift","unstructured":"Amazon. [n.d.]. Amazon Redshift continues its price-performance leadership. https:\/\/aws.amazon.com\/blogs\/big-data\/amazon-redshift-continues-its-price-performance-leadership\/. [Accessed 2023-12-14].","journal-title":"Accessed"},{"key":"e_1_2_1_3_1","first-page":"12","volume":"202","author":"Amazon Redshift SQL.","unstructured":"Amazon. [n.d.]. Amazon Redshift SQL. https:\/\/docs.aws.amazon.com\/redshift\/latest\/dg\/c_redshift-sql.html. [Accessed 2023-12-18].","journal-title":"Accessed"},{"key":"e_1_2_1_4_1","first-page":"1","volume":"202","author":"What","unstructured":"Amazon. [n.d.]. What is Zero ETL? https:\/\/aws.amazon.com\/what-is\/zero-etl\/. [Accessed 2024-1-26].","journal-title":"Accessed"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/3514221.3526045"},{"key":"e_1_2_1_6_1","doi-asserted-by":"crossref","unstructured":"Carsten Binnig Donald Kossmann Tim Kraska and Simon Loesing. 2009. How is the weather tomorrow?: towards a benchmark for the cloud. In DBTest.","DOI":"10.1145\/1594156.1594168"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.14778\/3407790.3407851"},{"key":"e_1_2_1_8_1","doi-asserted-by":"crossref","unstructured":"Graham Cormode and Ke Yi. 2020. Small summaries for big data.","DOI":"10.1017\/9781108769938"},{"key":"e_1_2_1_9_1","first-page":"12","volume":"202","author":"Official Data Warehousing Performance Databricks Sets","unstructured":"Databricks. [n.d.]. Databricks Sets Official Data Warehousing Performance Record. https:\/\/www.databricks.com\/blog\/2021\/11\/02\/databricks-sets-official-data-warehousing-performance-record.html. [Accessed 2023-12-14].","journal-title":"Accessed"},{"key":"e_1_2_1_10_1","volume-title":"Yanzhu Ji, Davide Pagano, Gopal Paliwal, Panos Parchas, Pascal Pfeil, Orestis Polychroniou, Gaurav Saxena, Aamer Shah, Amina Voloder, Sherry Xiao, Davis Zhang, and Tim Kraska.","author":"Ding Jialin","year":"2024","unstructured":"Jialin Ding, Matt Abrams, Sanghita Bandyopadhyay, Luciano Di Palma, Yanzhu Ji, Davide Pagano, Gopal Paliwal, Panos Parchas, Pascal Pfeil, Orestis Polychroniou, Gaurav Saxena, Aamer Shah, Amina Voloder, Sherry Xiao, Davis Zhang, and Tim Kraska. 2024. Automated multidimensional data layouts in Amazon Redshift. In SIGMOD\/PODS 2024. https:\/\/www.amazon.science\/publications\/automated-multidimensional-data-layouts-in-amazon-redshift"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.14778\/3565838.3565857"},{"key":"e_1_2_1_12_1","volume-title":"Tsunami: A Learned Multi-dimensional Index for Correlated Data and Skewed Workloads. VLDB","author":"Ding Jialin","year":"2020","unstructured":"Jialin Ding, Vikram Nathan, Mohammad Alizadeh, and Tim Kraska. 2020. Tsunami: A Learned Multi-dimensional Index for Correlated Data and Skewed Workloads. VLDB (2020)."},{"key":"e_1_2_1_13_1","volume-title":"Stavros Volos, Mohammad Alisafaee, Djordje Jevdjic, Cansu Kaynak, Adrian Daniel Popescu, Anastasia Ailamaki, and Babak Falsafi.","author":"Ferdman Michael","year":"2012","unstructured":"Michael Ferdman, Almutaz Adileh, Yusuf Onur Ko\u00e7berber, Stavros Volos, Mohammad Alisafaee, Djordje Jevdjic, Cansu Kaynak, Adrian Daniel Popescu, Anastasia Ailamaki, and Babak Falsafi. 2012. Clearing the clouds: a study of emerging scale-out workloads on modern hardware. In ASPLOS."},{"key":"e_1_2_1_14_1","volume-title":"International Database Engineering & Applications Symposium.","author":"Fernandes S\u00e9rgio","year":"2015","unstructured":"S\u00e9rgio Fernandes and Jorge Bernardino. 2015. What is BigQuery?. In International Database Engineering & Applications Symposium."},{"key":"e_1_2_1_15_1","first-page":"12","volume":"202","author":"Warehouse Benchmark Cloud Data","unstructured":"Fivetran. [n.d.]. Cloud Data Warehouse Benchmark. https:\/\/www.fivetran.com\/blog\/warehouse-benchmark. [Accessed 2023-12-14].","journal-title":"Accessed"},{"key":"e_1_2_1_16_1","unstructured":"Chang Ge Yinan Li Eric Eilebrecht Badrish Chandramouli and Donald Kossmann. 2019. Speculative Distributed CSV Data Parsing for Big Data Analytics. In SIGMOD."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2015.2398438"},{"key":"e_1_2_1_18_1","volume-title":"Lazowska","author":"Jain Shrainik","year":"2016","unstructured":"Shrainik Jain, Dominik Moritz, Daniel Halperin, Bill Howe, and Ed Lazowska. 2016. SQLShare: Results from a Multi-Year SQL-as-a-Service Experiment. In SIGMOD."},{"key":"e_1_2_1_19_1","volume-title":"Towards Cost-Optimal Query Processing in the Cloud. VLDB","author":"Leis Viktor","year":"2021","unstructured":"Viktor Leis and Maximilian Kuschewski. 2021. Towards Cost-Optimal Query Processing in the Cloud. VLDB (2021)."},{"key":"e_1_2_1_20_1","unstructured":"Zheng Li Liam O'Brien He Zhang and Rainbow Cai. 2012. On a Catalogue of Metrics for Evaluating Commercial Cloud Services. In GRID."},{"key":"e_1_2_1_21_1","volume-title":"Preventing bad plans by bounding the impact of cardinality estimation errors. VLDB","author":"Moerkotte Guido","year":"2009","unstructured":"Guido Moerkotte, Thomas Neumann, and Gabriele Steidl. 2009. Preventing bad plans by bounding the impact of cardinality estimation errors. VLDB (2009)."},{"key":"e_1_2_1_22_1","unstructured":"Ingo M\u00fcller Cornelius Ratsch and Franz F\u00e4rber. 2014. Adaptive String Dictionary Compression in In-Memory Column-Store Database Systems. In EDBT."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/3626246.3653394"},{"key":"e_1_2_1_24_1","volume-title":"Thomas Fenech, Gonzalo G\u00f3mez, Davide Brini, Alejandro Montero, David Carrera, Umar Farooq Minhas, Jos\u00e9 A. Blakeley, Donald Kossmann, Raghu Ramakrishnan, and Clemens A. Szyperski.","author":"Poggi Nicol\u00e1s","year":"2019","unstructured":"Nicol\u00e1s Poggi, V\u00edctor Cuevas-Vicentt\u00edn, Josep Lluis Berral, Thomas Fenech, Gonzalo G\u00f3mez, Davide Brini, Alejandro Montero, David Carrera, Umar Farooq Minhas, Jos\u00e9 A. Blakeley, Donald Kossmann, Raghu Ramakrishnan, and Clemens A. Szyperski. 2019. Benchmarking Elastic Cloud Big Data Services Under SLA Constraints. In TPCTC."},{"key":"e_1_2_1_25_1","unstructured":"Alice Rey Michael Freitag and Thomas Neumann. 2023. Seamless Integration of Parquet Files into Data Processing. In BTW."},{"key":"e_1_2_1_26_1","doi-asserted-by":"crossref","unstructured":"Wolf R\u00f6diger Sam Idicula Alfons Kemper and Thomas Neumann. 2016. Flow-Join: Adaptive skew handling for distributed joins over high-speed networks. In ICDE.","DOI":"10.1109\/ICDE.2016.7498324"},{"key":"e_1_2_1_27_1","volume-title":"Naresh Chainani, Chunbin Lin, George Caragea, Fahim Chowdhury, Ryan Marcus, Tim Kraska, Ippokratis Pandis, and Balakrishnan (Murali) Narayanaswamy.","author":"Saxena Gaurav","year":"2023","unstructured":"Gaurav Saxena, Mohammad Arifur Rahman, Naresh Chainani, Chunbin Lin, George Caragea, Fahim Chowdhury, Ryan Marcus, Tim Kraska, Ippokratis Pandis, and Balakrishnan (Murali) Narayanaswamy. 2023. Auto-WLM: Machine learning enhanced workload management in Amazon Redshift. In SIGMOD\/PODS 2023. https:\/\/www.amazon.science\/publications\/auto-wlm-machine-learning-enhanced-workload-management-in-amazon-redshift"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/3626246.3653395"},{"key":"e_1_2_1_29_1","first-page":"12","volume":"202","author":"Benchmarks Industry","unstructured":"Snowflake. [n.d.]. Industry Benchmarks and Competing with Integrity. https:\/\/www.snowflake.com\/blog\/industry-benchmarks-and-competing-with-integrity. [Accessed 2023-12-14].","journal-title":"Accessed"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.14778\/3352063.3352133"},{"key":"e_1_2_1_31_1","first-page":"11","volume":"202","author":"Transaction Processing Performance Council (TPC). 2021. TPC BENCHMARK\u2122 DS Standard Specification Version 3.2.0. https:\/\/www.tpc.org\/TPC_Documents_Current_Versions\/pdf\/TPC-DS_v3.2.0.pdf. [","unstructured":"Transaction Processing Performance Council (TPC). 2021. TPC BENCHMARK\u2122 DS Standard Specification Version 3.2.0. https:\/\/www.tpc.org\/TPC_Documents_Current_Versions\/pdf\/TPC-DS_v3.2.0.pdf. [Accessed 2023-11-28].","journal-title":"Accessed"},{"key":"e_1_2_1_32_1","first-page":"11","volume":"202","author":"Transaction Processing Performance Council (TPC). 2022. TPC BENCHMARK\u2122 H Standard Specification Revision 3.0.1. https:\/\/www.tpc.org\/TPC_Documents_Current_Versions\/pdf\/TPC-H_v3.0.1.pdf. [","unstructured":"Transaction Processing Performance Council (TPC). 2022. TPC BENCHMARK\u2122 H Standard Specification Revision 3.0.1. https:\/\/www.tpc.org\/TPC_Documents_Current_Versions\/pdf\/TPC-H_v3.0.1.pdf. [Accessed 2023-11-28].","journal-title":"Accessed"},{"key":"e_1_2_1_33_1","doi-asserted-by":"crossref","unstructured":"Wei-Tek Tsai Yu Huang and Qihong Shao. 2011. Testing the scalability of SaaS applications. In SOCA.","DOI":"10.1109\/SOCA.2011.6166245"},{"key":"e_1_2_1_34_1","volume-title":"Cloud Analytics Benchmark. VLDB","author":"van Renen Alexander","year":"2023","unstructured":"Alexander van Renen and Viktor Leis. 2023. Cloud Analytics Benchmark. VLDB (2023)."},{"key":"e_1_2_1_35_1","volume-title":"Get Real: How Benchmarks Fail to Represent the Real World. In DBTest@SIGMOD.","author":"Vogelsgesang Adrian","year":"2018","unstructured":"Adrian Vogelsgesang, Michael Haubenschild, Jan Finis, Alfons Kemper, Viktor Leis, Tobias M\u00fchlbauer, Thomas Neumann, and Manuel Then. 2018. Get Real: How Benchmarks Fail to Represent the Real World. In DBTest@SIGMOD."},{"key":"e_1_2_1_36_1","first-page":"04","volume":"202","author":"Vuppalapati Midhul","unstructured":"Midhul Vuppalapati. [n.d.]. Snowflake dataset containing statistics for 70 million queries over 14 day period. https:\/\/github.com\/resource-disaggregation\/snowset. [Accessed 2022-04-15].","journal-title":"Accessed"},{"key":"e_1_2_1_37_1","unstructured":"Midhul Vuppalapati Justin Miron Rachit Agarwal Dan Truong Ashish Motivala and Thierry Cruanes. 2020. Building An Elastic Query Engine on Disaggregated Storage. In USENIX NSDI."}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3681954.3682031","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,9,4]],"date-time":"2024-09-04T18:45:01Z","timestamp":1725475501000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3681954.3682031"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,7]]},"references-count":37,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2024,7]]}},"alternative-id":["10.14778\/3681954.3682031"],"URL":"https:\/\/doi.org\/10.14778\/3681954.3682031","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2024,7]]},"assertion":[{"value":"2024-08-30","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}