{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T04:49:06Z","timestamp":1774932546085,"version":"3.50.1"},"reference-count":73,"publisher":"Association for Computing Machinery (ACM)","issue":"12","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2025,8]]},"abstract":"<jats:p>Modern enterprise applications and data warehouse systems move data into data lakes for economical and scalability reasons. Data is then stored in popular columnar file formats like Parquet which are optimized for writing using open table formats like Iceberg or Delta. This presents new challenges for existing database systems and their execution engines because excellent performance and scalability when accessing this data in complex analytical queries is expected while data is located in a remote data lake.<\/jats:p>\n          <jats:p>\n            In this work, we present how we adapted the HANA Cloud Database Engine for efficient processing of files in data lakes, which we call\n            <jats:italic toggle=\"yes\">SQL-on-Files<\/jats:italic>\n            (SoF). We motivate this evolution by its relevance for Business Data Cloud, SAP's Lakehouse, we discuss the viability of general architecture choices like\n            <jats:italic toggle=\"yes\">pushdown<\/jats:italic>\n            and\n            <jats:italic toggle=\"yes\">direct access<\/jats:italic>\n            architectures, and give insights into our SoF design decisions towards scalable, analytical query processing around execution engine, optimizer and caching. 
Our evaluation of SoF shows benefits of direct access over pushdown architectures for a new warehouse benchmark with complex, analytical workloads.\n          <\/jats:p>","DOI":"10.14778\/3750601.3750608","type":"journal-article","created":{"date-parts":[[2025,9,16]],"date-time":"2025-09-16T13:38:05Z","timestamp":1758029885000},"page":"4831-4845","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["The HANA Native Query Engine for Lakehouse Systems"],"prefix":"10.14778","volume":"18","author":[{"given":"Daniel","family":"Ritter","sequence":"first","affiliation":[{"name":"SAP"}]},{"given":"Mihnea","family":"Andrei","sequence":"additional","affiliation":[{"name":"SAP"}]},{"given":"Sukhyeun","family":"Cho","sequence":"additional","affiliation":[{"name":"SAP"}]},{"given":"Maik","family":"G\u00f6rgens","sequence":"additional","affiliation":[{"name":"SAP"}]},{"given":"Taehyung","family":"Lee","sequence":"additional","affiliation":[{"name":"SAP"}]},{"given":"Norman","family":"May","sequence":"additional","affiliation":[{"name":"SAP"}]},{"given":"Amit","family":"Pathak","sequence":"additional","affiliation":[{"name":"SAP"}]},{"given":"Paul 
R.","family":"Willems","sequence":"additional","affiliation":[{"name":"SAP"}]}],"member":"320","published-online":{"date-parts":[[2025,9,16]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/3524284"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.14778\/3476311.3476377"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.14778\/3415478.3415560"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/3514221.3526045"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/BIGDATA52589.2021.9671534"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/3514221.3526054"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2501.07771"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/3626246.3653369"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/2882903.2903741"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.14778\/3476249.3476292"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.14778\/3611479.3611486"},{"key":"e_1_2_1_12_1","unstructured":"Kira Duwe Angelos-Christos Anadiotis Andrew Lamb Lucas Lersch Boaz Leskes Daniel Ritter and Pinar Tozun. 2025. The Five-Minute Rule for the Cloud: Caching in Analytics Systems. In CIDR. CIDR www.cidrdb.org."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.14778\/3476311.3476385"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1016\/J.JPDC.2023.02.007"},{"key":"e_1_2_1_15_1","unstructured":"The Apache Software Foundation. 2018. Apache Spark - Unified Engine for large-scale data analytics. https:\/\/spark.apache.org\/. Accessed: 2025-03-17."},{"key":"e_1_2_1_16_1","unstructured":"The Apache Software Foundation. 2021. Hello from Apache Hudi | Apache Hudi. https:\/\/hudi.apache.org\/. Accessed: 2025-03-17."},{"key":"e_1_2_1_17_1","unstructured":"The Apache Software Foundation. 2024. Apache Iceberg - Apache Iceberg. 
https:\/\/iceberg.apache.org\/. Accessed: 2024-04-23."},{"key":"e_1_2_1_18_1","unstructured":"The Apache Software Foundation. 2024. Apache ORC - High-Performance Columnar Storage for Hadoop. https:\/\/orc.apache.org\/. Accessed: 2025-03-17."},{"key":"e_1_2_1_19_1","unstructured":"Trino Software Foundation. 2024. Trino | Distributed SQL query engine for big data. https:\/\/trino.io\/. Accessed: 2025-03-17."},{"key":"e_1_2_1_20_1","first-page":"28","article-title":"The SAP HANA Database - An Architecture Overview","volume":"35","author":"F\u00e4rber Franz","year":"2012","unstructured":"Franz F\u00e4rber, Norman May, Wolfgang Lehner, Philipp Gro\u00dfe, Ingo M\u00fcller, Hannes Rauhe, and Jonathan Dees. 2012. The SAP HANA Database - An Architecture Overview. IEEE Data Eng. Bull. 35, 1 (2012), 28\u201333.","journal-title":"IEEE Data Eng. Bull."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/38713.38742"},{"key":"e_1_2_1_22_1","volume-title":"LakeVilla: Multi-Table Transactions for Lakehouses. arXiv preprint arXiv:2504.20768","author":"G\u00f6tz Tobias","year":"2025","unstructured":"Tobias G\u00f6tz, Daniel Ritter, and Jana Giceva. 2025. LakeVilla: Multi-Table Transactions for Lakehouses. arXiv preprint arXiv:2504.20768 (2025)."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/38713.38755"},{"key":"e_1_2_1_24_1","volume-title":"Elastic Compute in SAP HANA Cloud by Example of SAP Integrated Business Planning. Datenbank-Spektrum","author":"Gruschko Boris","year":"2025","unstructured":"Boris Gruschko, Kihong Kim, Hyunjun Kim, Taehyung Lee, Michael Mueller, and Daniel Ritter. 2025. Elastic Compute in SAP HANA Cloud by Example of SAP Integrated Business Planning. 
Datenbank-Spektrum (2025), 1\u201312."},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.14778\/3598581.3598584"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/5236.5242"},{"key":"e_1_2_1_27_1","volume-title":"Deep Lake: A Lakehouse for Deep Learning. In 13th Conference on Innovative Data Systems Research, CIDR 2023","author":"Hambardzumyan Sasun","year":"2023","unstructured":"Sasun Hambardzumyan. 2023. Deep Lake: A Lakehouse for Deep Learning. In 13th Conference on Innovative Data Systems Research, CIDR 2023, Amsterdam, The Netherlands, January 8\u201311, 2023. www.cidrdb.org. https:\/\/www.cidrdb.org\/cidr2023\/papers\/p69-buniatyan.pdf"},{"key":"e_1_2_1_28_1","volume-title":"Gartner Says Cloud Will Become a Business Necessity by","author":"Gartner Inc. 2023.","year":"2028","unstructured":"Gartner Inc. 2023. Gartner Says Cloud Will Become a Business Necessity by 2028. https:\/\/www.gartner.com\/en\/newsroom\/press-releases\/2023-11-29-gartner-says-cloud-will-become-a-business-necessity-by-2028. Accessed: 2024-07-17."},{"key":"e_1_2_1_29_1","unstructured":"Oracle inc. 2025. Oracle Exadata. https:\/\/www.oracle.com\/de\/engineered-systems\/exadata\/. Accessed: 2025-03-17."},{"key":"e_1_2_1_30_1","unstructured":"Prakhar Jain. 2024. [Protocol Change Request] Delta Coordinated Commits #2598. https:\/\/github.com\/delta-io\/delta\/issues\/2598. Accessed: 2024-11-25."},{"key":"e_1_2_1_31_1","volume-title":"Analyzing and Comparing Lakehouse Storage Systems. In 13th Conference on Innovative Data Systems Research, CIDR 2023","author":"Jain Paras","year":"2023","unstructured":"Paras Jain, Peter Kraft, Conor Power, Tathagata Das, Ion Stoica, and Matei Zaharia. 2023. Analyzing and Comparing Lakehouse Storage Systems. In 13th Conference on Innovative Data Systems Research, CIDR 2023, Amsterdam, The Netherlands, January 8\u201311, 2023. www.cidrdb.org. 
https:\/\/www.cidrdb.org\/cidr2023\/papers\/p92-jain.pdf"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/3722212.3724436"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.14778\/3611479.3611484"},{"key":"e_1_2_1_34_1","volume-title":"Fast Updates on Read-Optimized Databases Using Multi-Core CPUs. CoRR abs\/1109.6885","author":"Kr\u00fcger Jens","year":"2011","unstructured":"Jens Kr\u00fcger, Changkyu Kim, Martin Grund, Nadathur Satish, David Schwalb, Jatin Chhugani, Hasso Plattner, Pradeep Dubey, and Alexander Zeier. 2011. Fast Updates on Read-Optimized Databases Using Multi-Core CPUs. CoRR abs\/1109.6885 (2011). arXiv:1109.6885 http:\/\/arxiv.org\/abs\/1109.6885"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/3589263"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/3626246.3653368"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/2588555.2610507"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/3626246.3653388"},{"key":"e_1_2_1_39_1","volume-title":"SAP HANA Cloud: Data Management for Modern Enterprise Applications. In Companion of the 2025 International Conference on Management of Data, SIGMOD\/PODS 2025","author":"May Norman","year":"2025","unstructured":"Norman May, Alexander B\u00f6hm, Daniel Ritter, Frank Renkes, Mihnea Andrei, and Wolfgang Lehner. 2025. SAP HANA Cloud: Data Management for Modern Enterprise Applications. In Companion of the 2025 International Conference on Management of Data, SIGMOD\/PODS 2025, Berlin, Germany, June 22\u201327, 2025. ACM, to appear."},{"key":"e_1_2_1_40_1","volume-title":"International Conference on Extending Database Technology.","author":"May Norman","unstructured":"Norman May, Wolfgang Lehner, P. ShahulHameed, Nitesh Maheshwari, Carsten M\u00fcller, Sudipto Chowdhuri, and Anil K. Goel. 2015. SAP HANA - From Relational OLAP Database to Big Data Infrastructure. 
In International Conference on Extending Database Technology."},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.14778\/3415478.3415568"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.14778\/2002938.2002940"},{"key":"e_1_2_1_43_1","volume-title":"Freitag","author":"Neumann Thomas","year":"2020","unstructured":"Thomas Neumann and Michael J. Freitag. 2020. Umbra: A Disk-Based System with In-Memory Performance. In 10th Conference on Innovative Data Systems Research, CIDR 2020, Amsterdam, The Netherlands, January 12\u201315, 2020, Online Proceedings. www.cidrdb.org. http:\/\/cidrdb.org\/cidr2020\/papers\/p29-neumann-cidr20.pdf"},{"key":"e_1_2_1_44_1","first-page":"10","article-title":"From the Application to the CPU: Holistic Resource Management for Modern Database Management Systems","volume":"42","author":"Noll Stefan","year":"2019","unstructured":"Stefan Noll, Norman May, Alexander B\u00f6hm, Jan M\u00fchlig, and Jens Teubner. 2019. From the Application to the CPU: Holistic Resource Management for Modern Database Management Systems. IEEE Data Eng. Bull. 42, 1 (2019), 10\u201321. http:\/\/sites.computer.org\/debull\/A19mar\/p10.pdf","journal-title":"IEEE Data Eng. Bull."},{"key":"e_1_2_1_45_1","unstructured":"Apache Parquet. 2024. Parquet. https:\/\/parquet.apache.org\/. Accessed: 2025-03-17."},{"key":"e_1_2_1_46_1","volume-title":"Dexin Zhu, Lucky Katahanas, Chakrapani Bhat Talapady, Joshua Rowe, Fan Zhang, Rich Draves, Marc Friedman, Ivan Santa Maria Filho, and Amrish Kumar.","author":"Power Conor","year":"2021","unstructured":"Conor Power, Hiren Patel, Alekh Jindal, Jyoti Leeka, Bob Jenkins, Michael Rys, Ed Triou, Dexin Zhu, Lucky Katahanas, Chakrapani Bhat Talapady, Joshua Rowe, Fan Zhang, Rich Draves, Marc Friedman, Ivan Santa Maria Filho, and Amrish Kumar. 2021. The Cosmos Big Data Platform at Microsoft: Over a Decade of Progress and a Decade to Look Forward. In VLDB 2021."},{"key":"e_1_2_1_47_1","unstructured":"Delta Lake Project. 
2023. Delta Sharing. https:\/\/delta.io\/sharing\/ Accessed: 2025-6-12."},{"key":"e_1_2_1_48_1","unstructured":"The Linux Foundation Projects. 2024. Home | Delta Lake. https:\/\/delta.io\/. Accessed: 2024-04-23."},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.18420\/BTW2023-12"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.14778\/2856318.2856319"},{"key":"e_1_2_1_51_1","unstructured":"SAP. 2025. SAP HANA Cloud - Virtual Tables. https:\/\/help.sap.com\/docs\/hana-cloud-database\/sap-hana-cloud-sap-hana-database-data-access-guide\/managing-virtual-tables. Accessed: 2025-03-17."},{"key":"e_1_2_1_52_1","unstructured":"SAP. 2025. SAP HANA Performance Guide for Developers. https:\/\/help.sap.com\/docs\/SAP_HANA_PLATFORM\/9de0171a6027400bb3b9bee385222eff\/4951c0a07a324da58d5bca9685415b0a.html. Accessed: 2024-12-07."},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.14778\/3611540.3611547"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1145\/3626246.3653395"},{"key":"e_1_2_1_55_1","unstructured":"Amazon Web Services. 2020. Amazon Redshift Spectrum adds support for querying open source Apache Hudi and Delta Lake. https:\/\/aws.amazon.com\/about-aws\/whats-new\/2020\/09\/amazon-redshift-spectrum-adds-support-for-querying-open-source-apache-hudi-and-delta-lake\/. Accessed: 2025-03-17."},{"key":"e_1_2_1_56_1","unstructured":"Amazon Web Services. 2025. Amazon S3 Select. https:\/\/docs.aws.amazon.com\/AmazonS3\/latest\/userguide\/selecting-content-from-objects.html. Accessed: 2025-03-17."},{"key":"e_1_2_1_57_1","unstructured":"Amazon Web Services. 2025. Amazon VPC Gateway Endpoints. https:\/\/docs.aws.amazon.com\/vpc\/latest\/privatelink\/gateway-endpoints.html. Accessed: 2025-03-17."},{"key":"e_1_2_1_58_1","unstructured":"Amazon Web Services. 2025. AWS Aqua. https:\/\/aws.amazon.com\/blogs\/aws\/new-aqua-advanced-query-accelerator-for-amazon-redshift\/. 
Accessed: 2025-03-17."},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2019.00196"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.14778\/3352063.3352123"},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.5555\/2093889.2093965"},{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.5220\/0007918704830490"},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2005.1"},{"key":"e_1_2_1_64_1","volume-title":"Wenjian Dong, Murali Narayanaswamy, Zhengchun Liu, Gaurav Saxena, Andreas Kipf, and Tim Kraska.","author":"van Renen Alexander","year":"2024","unstructured":"Alexander van Renen, Dominik Horn, Pascal Pfeil, Kapil Eknath Vaidya, Wenjian Dong, Murali Narayanaswamy, Zhengchun Liu, Gaurav Saxena, Andreas Kipf, and Tim Kraska. 2024. Why TPC is not enough: An analysis of the Amazon Redshift fleet. In VLDB 2024. https:\/\/www.amazon.science\/publications\/why-tpc-is-not-enough-an-analysis-of-the-amazon-redshift-fleet"},{"key":"e_1_2_1_65_1","volume-title":"International Workshop on Accelerating Analytics and Data Management Systems Using Modern Processor and Storage Architectures, ADMS@VLDB 2022","author":"von Merzljak Leonard","year":"2022","unstructured":"Leonard von Merzljak, Philipp Fent, Thomas Neumann, and Jana Giceva. 2022. What Are You Waiting For? Use Coroutines for Asynchronous I\/O to Hide I\/O Latencies and Maximize the Read Bandwidth!. In International Workshop on Accelerating Analytics and Data Management Systems Using Modern Processor and Storage Architectures, ADMS@VLDB 2022, Sydney, Australia, September 5, 2022, Rajesh Bordawekar and Tirthankar Lahiri (Eds.). 36\u201346. http:\/\/www.adms-conf.org\/2022-camera-ready\/ADMS22_merzljak.pdf"},{"key":"e_1_2_1_66_1","volume-title":"Building An Elastic Query Engine on Disaggregated Storage. 
In 17th USENIX Symposium on Networked Systems Design and Implementation (NSDI 20)","author":"Vuppalapati Midhul","year":"2020","unstructured":"Midhul Vuppalapati, Justin Miron, Rachit Agarwal, Dan Truong, Ashish Motivala, and Thierry Cruanes. 2020. Building An Elastic Query Engine on Disaggregated Storage. In 17th USENIX Symposium on Networked Systems Design and Implementation (NSDI 20). USENIX Association, Santa Clara, CA, 449\u2013462. https:\/\/www.usenix.org\/conference\/nsdi20\/presentation\/vuppalapati"},{"key":"e_1_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.14778\/3685800.3685818"},{"key":"e_1_2_1_68_1","doi-asserted-by":"publisher","DOI":"10.1007\/S00778-024-00867-8"},{"key":"e_1_2_1_69_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE48307.2020.00174"},{"key":"e_1_2_1_70_1","volume-title":"Spark: Cluster Computing with Working Sets. In 2nd USENIX Workshop on Hot Topics in Cloud Computing, HotCloud'10","author":"Zaharia Matei","year":"2010","unstructured":"Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, and Ion Stoica. 2010. Spark: Cluster Computing with Working Sets. In 2nd USENIX Workshop on Hot Topics in Cloud Computing, HotCloud'10, Boston, MA, USA, June 22, 2010, Erich M. Nahum and Dongyan Xu (Eds.). USENIX Association. https:\/\/www.usenix.org\/conference\/hotcloud-10\/spark-cluster-computing-working-sets"},{"key":"e_1_2_1_71_1","volume-title":"11th Conference on Innovative Data Systems Research, CIDR 2021, Virtual Event, January 11\u201315, 2021, Online Proceedings. www.cidrdb.org. http:\/\/cidrdb.org\/cidr2021\/papers\/cidr2021_paper17","author":"Zaharia Matei","year":"2021","unstructured":"Matei Zaharia, Ali Ghodsi, Reynold Xin, and Michael Armbrust. 2021. Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics. In 11th Conference on Innovative Data Systems Research, CIDR 2021, Virtual Event, January 11\u201315, 2021, Online Proceedings. www.cidrdb.org. 
http:\/\/cidrdb.org\/cidr2021\/papers\/cidr2021_paper17.pdf"},{"key":"e_1_2_1_72_1","doi-asserted-by":"publisher","DOI":"10.14778\/3626292.3626298"},{"key":"e_1_2_1_73_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3457559"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3750601.3750608","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,16]],"date-time":"2025-09-16T13:39:17Z","timestamp":1758029957000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3750601.3750608"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,8]]},"references-count":73,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2025,8]]}},"alternative-id":["10.14778\/3750601.3750608"],"URL":"https:\/\/doi.org\/10.14778\/3750601.3750608","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2025,8]]},"assertion":[{"value":"2025-09-16","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}