{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,14]],"date-time":"2025-10-14T20:21:01Z","timestamp":1760473261761,"version":"3.40.1"},"reference-count":45,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2025,3,1]],"date-time":"2025-03-01T00:00:00Z","timestamp":1740787200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/deed.de"},{"start":{"date-parts":[[2025,3,5]],"date-time":"2025-03-05T00:00:00Z","timestamp":1741132800000},"content-version":"vor","delay-in-days":4,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/deed.de"}],"funder":[{"name":"IT University"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Datenbank Spektrum"],"published-print":{"date-parts":[[2025,3]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>The paper discusses the lessons learned from building Snowflake, a\u00a0data management system for the cloud. Given the need for systems that can scale to handle large data volumes, provide expressive programming interfaces, and leverage the benefits of cloud computing, it describes the architecture of a\u00a0cloud-based data management system and optimization techniques specific to the cloud. Key techniques include pruning large file sets at both compile time and query runtime, optimizing data layouts in the background, and, more generally, the importance of performing maintenance tasks in the background, which is enabled by cloud resources. The paper also explains the need for using immutable files and the implications for data modification queries. Finally, it highlights the operational aspects of building and maintaining a\u00a0data management system that functions as an online cloud service. The paper concludes by outlining future directions for cloud-based data management systems.<\/jats:p>","DOI":"10.1007\/s13222-025-00494-9","type":"journal-article","created":{"date-parts":[[2025,3,5]],"date-time":"2025-03-05T11:19:52Z","timestamp":1741173592000},"page":"17-28","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Building a\u00a0Data Management System for the Cloud: Lessons Learned and Future Directions"],"prefix":"10.1007","volume":"25","author":[{"ORCID":"https:\/\/orcid.org\/0009-0004-8528-4168","authenticated-orcid":false,"given":"Martin","family":"Hentschel","sequence":"first","affiliation":[]},{"given":"Jonathan","family":"Dees","sequence":"additional","affiliation":[]},{"given":"Florian","family":"Funke","sequence":"additional","affiliation":[]},{"given":"Max","family":"Heimel","sequence":"additional","affiliation":[]},{"given":"Ismail","family":"Oukid","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,3,5]]},"reference":[{"key":"494_CR1","series-title":"Technical Report US44413318","volume-title":"White paper: The digitization of the world from edge to core","author":"D Reinsel","year":"2018","unstructured":"Reinsel\u00a0D, Gantz\u00a0J, Rydning\u00a0J (2018) White paper: The digitization of the world from edge to core. IDC. Technical Report US44413318 (https:\/\/www.seagate.com\/files\/www-content\/our-story\/trends\/files\/idc-seagate-dataage-whitepaper.pdf)"},{"key":"494_CR2","volume-title":"MapReduce: Simplified data processing on large clusters","author":"J Dean","year":"2004","unstructured":"Dean\u00a0J, Ghemawat\u00a0S (2004) MapReduce: Simplified data processing on large clusters. OSDI."},{"key":"494_CR3","unstructured":"Apache Hadoop. https:\/\/hadoop.apache.org"},{"key":"494_CR4","doi-asserted-by":"publisher","DOI":"10.1145\/2882903.2903741","volume-title":"The Snowflake elastic data warehouse","author":"B Dageville","year":"2016","unstructured":"Dageville\u00a0B, Cruanes\u00a0T, Zukowski\u00a0M, Antonov\u00a0V, Avanes\u00a0A, Bock\u00a0J, Claybaugh\u00a0J, Engovatov\u00a0D, Hentschel\u00a0M, Huang\u00a0J et\u00a0al (2016) The Snowflake elastic data warehouse. ACM SIGMOD."},{"key":"494_CR5","unstructured":"Amazon S3. https:\/\/aws.amazon.com\/s3"},{"key":"494_CR6","unstructured":"Azure Blob Storage https:\/\/azure.microsoft.com\/en-us\/products\/storage\/blobs"},{"key":"494_CR7","unstructured":"Google Cloud Storage. https:\/\/cloud.google.com\/storage"},{"key":"494_CR8","unstructured":"Amazon EC2. https:\/\/aws.amazon.com\/ec2"},{"key":"494_CR9","unstructured":"Azure Virtual Machines. https:\/\/azure.microsoft.com\/en-us\/products\/virtual-machines"},{"key":"494_CR10","unstructured":"Google Compute Engine. https:\/\/cloud.google.com\/products\/compute"},{"key":"494_CR11","unstructured":"Barr J Amazon S3 Update \u2013 Strong Read-After-Write Consistency. https:\/\/aws.amazon.com\/blogs\/aws\/amazon-s3-update-strong-read-after-write-consistency. Accessed 2024-10-02"},{"key":"494_CR12","doi-asserted-by":"publisher","DOI":"10.1145\/1807167.1807227","volume-title":"Positional update handling in column stores","author":"S H\u00e9man","year":"2010","unstructured":"H\u00e9man\u00a0S, Zukowski\u00a0M, Nes\u00a0NJ, Sidirourgos\u00a0L, Boncz\u00a0P (2010) Positional update handling in column stores. ACM SIGMOD."},{"key":"494_CR13","unstructured":"Karpov N Delta lake deletion vectors. https:\/\/delta.io\/blog\/2023-07-05-deletion-vectors. Accessed 2024-10-02"},{"key":"494_CR14","unstructured":"Stokely M Efficiency at snowflake: free pool management. https:\/\/medium.com\/snowflake\/efficiency-at-snowflake-free-pool-management-9dd7a0bd34d1. Accessed 2024-10-02"},{"key":"494_CR15","volume-title":"FoundationDB: a\u00a0distributed unbundled transactional key value store","author":"J Zhou","year":"2021","unstructured":"Zhou\u00a0J, Xu\u00a0M, Shraer\u00a0A, Namasivayam\u00a0B, Miller\u00a0A, Tschannen\u00a0E, Atherton\u00a0S, Beamon\u00a0AJ, Sears\u00a0R, Leach\u00a0J, Rosenthal\u00a0D, Dong\u00a0X, Wilson\u00a0W, Collins\u00a0B, Scherer\u00a0D, Grieser\u00a0A, Liu\u00a0Y, Moore\u00a0A, Muppana\u00a0B, Su\u00a0X, Yadav\u00a0V (2021) FoundationDB: a\u00a0distributed unbundled transactional key value store. ACM SIGMOD."},{"key":"494_CR16","doi-asserted-by":"publisher","DOI":"10.14778\/3583140.3583156","volume-title":"Cloud analytics benchmark","author":"A Renen","year":"2023","unstructured":"Renen\u00a0A, Leis\u00a0V (2023) Cloud analytics benchmark. VLDB."},{"key":"494_CR17","volume-title":"Small materialized aggregates: a\u00a0light weight index structure for data warehousing","author":"G Moerkotte","year":"1998","unstructured":"Moerkotte\u00a0G (1998) Small materialized aggregates: a\u00a0light weight index structure for data warehousing. VLDB."},{"key":"494_CR18","doi-asserted-by":"publisher","DOI":"10.14778\/3476311.3476385","volume-title":"Big metadata: when metadata is big data","author":"P Edara","year":"2021","unstructured":"Edara\u00a0P, Pasumansky\u00a0M (2021) Big metadata: when metadata is big data. VLDB."},{"key":"494_CR19","unstructured":"Chilukuri Y, Waack J, Zimmerer A Say hello to superfast top\u2011K queries in snowflake. https:\/\/www.snowflake.com\/en\/blog\/super-fast-top-k-queries. Accessed 2024-10-02"},{"key":"494_CR20","doi-asserted-by":"crossref","unstructured":"Bernstein PA, Chiu D\u2011MW (1981) Using semi-joins to solve relational queries. J\u00a0ACM 28(1)","DOI":"10.1145\/322234.322238"},{"key":"494_CR21","volume-title":"R* optimizer validation and performance evaluation for local queries","author":"LF Mackert","year":"1986","unstructured":"Mackert\u00a0LF, Lohman\u00a0GM (1986) R* optimizer validation and performance evaluation for local queries. ACM SIGMOD."},{"key":"494_CR22","doi-asserted-by":"publisher","DOI":"10.1109\/32.52778","volume-title":"Optimal semijoins for distributed database systems","author":"JK Mullin","year":"1990","unstructured":"Mullin\u00a0JK (1990) Optimal semijoins for distributed database systems. IEEE Transactions on Software Engineering."},{"key":"494_CR23","unstructured":"Snowflake documentation: clustering keys & clustered tables. https:\/\/docs.snowflake.com\/en\/user-guide\/tables-clustering-keys"},{"key":"494_CR24","unstructured":"Snowflake documentation: micro-partitions & data clustering. https:\/\/docs.snowflake.com\/en\/user-guide\/tables-clustering-micropartitions"},{"key":"494_CR25","unstructured":"Amazon S3 API Reference: DeleteObjects. https:\/\/docs.aws.amazon.com\/AmazonS3\/latest\/API\/API_DeleteObjects.html. Accessed 2024-10-02"},{"key":"494_CR26","doi-asserted-by":"publisher","DOI":"10.1145\/3209950.3209958","volume-title":"Snowtrail: testing with production queries on a\u00a0cloud database","author":"J Yan","year":"2018","unstructured":"Yan\u00a0J, Jin\u00a0Q, Jain\u00a0S, Viglas\u00a0SD, Lee\u00a0A (2018) Snowtrail: testing with production queries on a\u00a0cloud database. Workshop on Testing Database Systems."},{"key":"494_CR27","unstructured":"Dreseler M (2023) How building an industry DBMS differs from building a\u00a0research one. Sponsor Talk, CIDR. https:\/\/www.cidrdb.org\/cidr2023\/slides\/sponsor-talk-snowflake-slides.pdf"},{"key":"494_CR28","unstructured":"Aboulnaga A, Salem K, Soror AA, Minhas UF, Kokosielis P, Kamath S (2009) Deploying database appliances in the cloud. IEEE Data Eng Bull 32(1)"},{"key":"494_CR29","unstructured":"Abadi DJ (2009) Data management in the cloud: Limitations and opportunities. IEEE Data Eng Bull 32(1)"},{"key":"494_CR30","doi-asserted-by":"crossref","unstructured":"Narasayya VR, Chaudhuri S (2021) Cloud data services: Workloads, architectures and multi-tenancy. Found Trends Databases 10(1)","DOI":"10.1561\/1900000060"},{"key":"494_CR31","doi-asserted-by":"publisher","DOI":"10.14778\/1920841.1920886","volume-title":"Dremel: interactive analysis of web-scale datasets","author":"S Melnik","year":"2010","unstructured":"Melnik\u00a0S, Gubarev\u00a0A, Long\u00a0JJ, Romer\u00a0G, Shivakumar\u00a0S, Tolton\u00a0M, Vassilakis\u00a0T (2010) Dremel: interactive analysis of web-scale datasets. VLDB."},{"key":"494_CR32","volume-title":"The cosmos big data platform at Microsoft: over a\u00a0decade of progress and a\u00a0decade to look forward","author":"L Katahanas","year":"2021","unstructured":"Katahanas\u00a0L, Talapady\u00a0CB, Rowe\u00a0J, Zhang\u00a0F, Draves\u00a0R, Friedman\u00a0M, Filho\u00a0SMI, Kumar\u00a0A (2021) The cosmos big data platform at Microsoft: over a\u00a0decade of progress and a\u00a0decade to look forward. VLDB."},{"key":"494_CR33","doi-asserted-by":"publisher","DOI":"10.1145\/1376616.1376645","volume-title":"Building a\u00a0database on S3","author":"M Brantner","year":"2008","unstructured":"Brantner\u00a0M, Florescu\u00a0D, Graf\u00a0D, Kossmann\u00a0D, Kraska\u00a0T (2008) Building a\u00a0database on S3. ACM SIGMOD."},{"key":"494_CR34","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2742795","volume-title":"Amazon Redshift and the case for simpler data warehouses","author":"A Gupta","year":"2015","unstructured":"Gupta\u00a0A, Agarwal\u00a0D, Tan\u00a0D, Kulesza\u00a0J, Pathak\u00a0R, Stefani\u00a0S, Srinivasan\u00a0V (2015) Amazon Redshift and the case for simpler data warehouses. ACM SIGMOD."},{"key":"494_CR35","doi-asserted-by":"publisher","DOI":"10.1145\/3514221.3526045","volume-title":"Amazon Redshift re-invented","author":"N Armenatzoglou","year":"2022","unstructured":"Armenatzoglou\u00a0N, Basu\u00a0S, Bhanoori\u00a0N, Cai\u00a0M, Chainani\u00a0N, Chinta\u00a0K, Govindaraju\u00a0V, Green\u00a0TJ, Gupta\u00a0M, Hillig\u00a0S et\u00a0al (2022) Amazon Redshift re-invented. ACM SIGMOD."},{"key":"494_CR36","volume-title":"POLARIS: The distributed SQL engine in Azure Synapse","author":"J Aguilar-Saborit","year":"2020","unstructured":"Aguilar-Saborit\u00a0J, Ramakrishnan\u00a0R, Srinivasan\u00a0K, Bocksrocker\u00a0K, Alagiannis\u00a0I, Sankara\u00a0M, Shafiei\u00a0M, Blakeley\u00a0J, Dasarathy\u00a0G, Dash\u00a0S et\u00a0al (2020) POLARIS: The distributed SQL engine in Azure Synapse. VLDB."},{"key":"494_CR37","unstructured":"Google BigQuery. https:\/\/cloud.google.com\/bigquery"},{"key":"494_CR38","volume-title":"Lakehouse: a\u00a0new generation of open platforms that unify data warehousing and advanced analytics","author":"M Armbrust","year":"2021","unstructured":"Armbrust\u00a0M, Ghodsi\u00a0A, Xin\u00a0R, Zaharia\u00a0M (2021) Lakehouse: a\u00a0new generation of open platforms that unify data warehousing and advanced analytics. CIDR."},{"key":"494_CR39","unstructured":"Databricks SQL. https:\/\/www.databricks.com\/product\/databricks-sql"},{"key":"494_CR40","unstructured":"Oracle Autonomous Data Warehouse. https:\/\/www.oracle.com\/autonomous-database\/autonomous-data-warehouse"},{"key":"494_CR41","unstructured":"IBM Db2. https:\/\/www.ibm.com\/db2"},{"key":"494_CR42","unstructured":"Teradata. https:\/\/www.teradata.com"},{"key":"494_CR43","volume-title":"Lambada: interactive data analytics on cold data using serverless cloud infrastructure","author":"I M\u00fcller","year":"2020","unstructured":"M\u00fcller\u00a0I, Marroqu\u00edn\u00a0R, Alonso\u00a0G (2020) Lambada: interactive data analytics on cold data using serverless cloud infrastructure. ACM SIGMOD."},{"key":"494_CR44","unstructured":"Apache Parquet. https:\/\/parquet.apache.org"},{"key":"494_CR45","unstructured":"Apache Iceberg. https:\/\/iceberg.apache.org"}],"container-title":["Datenbank-Spektrum"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s13222-025-00494-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s13222-025-00494-9\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s13222-025-00494-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,20]],"date-time":"2025-03-20T03:28:13Z","timestamp":1742441293000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s13222-025-00494-9"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,3]]},"references-count":45,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2025,3]]}},"alternative-id":["494"],"URL":"https:\/\/doi.org\/10.1007\/s13222-025-00494-9","relation":{},"ISSN":["1618-2162","1610-1995"],"issn-type":[{"type":"print","value":"1618-2162"},{"type":"electronic","value":"1610-1995"}],"subject":[],"published":{"date-parts":[[2025,3]]},"assertion":[{"value":"14 October 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"8 January 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"5 March 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}