{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,8]],"date-time":"2026-04-08T08:55:13Z","timestamp":1775638513049,"version":"3.50.1"},"reference-count":23,"publisher":"Association for Computing Machinery (ACM)","issue":"12","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2024,8]]},"abstract":"<jats:p>The trend of decomposing monolithic data management systems into a stack of reusable components has quickly gained momentum across the industry. Although a series of open-source projects have emerged targeting different layers of the stack, execution engines are of special importance due to the complexity they encapsulate, and the demand to optimize price-performance. In this tutorial, we will survey the space of composability in data management, focusing on the execution layer. We will discuss the main APIs, integration with existing and novel data management systems, and how specialized behavior can be accommodated by using extensibility APIs. With an emphasis on analytics, we will take a deeper dive into performance, discussing modern aspects of vectorization, compressed (encoding-aware) execution, and adaptivity. While the presentation is contextualized using real-world examples and experience while developing the Velox open-source execution engine and integrations with existing systems like Presto (Prestissimo) and Spark (Gluten), the concepts and techniques discussed are generally applicable to other execution engines. Finally, we will discuss future trends and ongoing work regarding novel file formats, compressed execution opportunities, and nascent hardware acceleration efforts, highlighting current challenges and open questions. 
With a survey of the state-of-the-art in this space, we hope this tutorial will help motivate individuals and organizations to embrace composability and promote collaborations across related projects.<\/jats:p>","DOI":"10.14778\/3685800.3685847","type":"journal-article","created":{"date-parts":[[2024,11,8]],"date-time":"2024-11-08T17:25:21Z","timestamp":1731086721000},"page":"4249-4252","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Composable Data Management: An Execution Overview"],"prefix":"10.14778","volume":"17","author":[{"given":"Pedro","family":"Pedreira","sequence":"first","affiliation":[{"name":"Meta Platforms Inc., Menlo Park, CA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Deepak","family":"Majeti","sequence":"additional","affiliation":[{"name":"IBM, Pittsburgh, PA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Orri","family":"Erling","sequence":"additional","affiliation":[{"name":"Meta Platforms Inc., Menlo Park, CA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2024,11,8]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687553.1687625"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.14778\/3598581.3598587"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/3626717"},{"key":"e_1_2_1_4_1","unstructured":"Apache Arrow. [n.d.]. A cross-language development platform for in-memory analytics. https:\/\/arrow.apache.org\/. Accessed: 2024-03-21."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/3514221.3526054"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.14778\/3407790.3407851"},{"key":"e_1_2_1_7_1","volume-title":"Conference on Innovative Data Systems Research (CIDR).","author":"Boncz Peter A.","year":"2005","unstructured":"Peter A. Boncz, Marcin Zukowski, and Niels Nes. 2005. 
MonetDB\/X100: Hyper-Pipelining Query Execution. In Conference on Innovative Data Systems Research (CIDR)."},{"key":"e_1_2_1_8_1","volume-title":"Conference on Innovative Data Systems Research - CIDR.","author":"Chattopadhyay Biswapesh","year":"2023","unstructured":"Biswapesh Chattopadhyay, Pedro Pedreira, Sameer Agarwal, Suketu Vakharia, Peng Li, Weiran Liu, and Sundaram Narayanan. 2023. Shared Foundations: Modernizing Meta's Data Lakehouse. In Conference on Innovative Data Systems Research - CIDR."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/376284.375692"},{"key":"e_1_2_1_10_1","unstructured":"Voltron Data. [n.d.]. Theseus The GPU query engine for petabyte SQL. https:\/\/voltrondata.com\/theseus.html. Accessed: 2024-03-21."},{"key":"e_1_2_1_11_1","unstructured":"Apache Hadoop. [n.d.]. Apache Hadoop. https:\/\/hadoop.apache.org\/docs\/stable\/. Accessed: 2024-03-21."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.14778\/3275366.3284966"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.14778\/3476311.3476410"},{"key":"e_1_2_1_14_1","volume-title":"1st International Workshop on Composable Data Management Systems, CDMS@VLDB 2022","author":"Pasumansky Mosha","year":"2022","unstructured":"Mosha Pasumansky and Benjamin Wagner. 2022. Assembling a Query Engine From Spare Parts. In 1st International Workshop on Composable Data Management Systems, CDMS@VLDB 2022, Sydney, Australia, September 9, 2022."},{"key":"e_1_2_1_15_1","unstructured":"Pedro Pedreira, Masha Basmanova, and Orri Erling. [n.d.]. Introducing Velox: An open source unified execution engine. https:\/\/engineering.fb.com\/2023\/03\/09\/open-source\/velox-open-source-execution-engine\/. 
Accessed: 2024-03-21."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.14778\/3554821.3554829"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.14778\/3603581.3603604"},{"key":"e_1_2_1_18_1","volume-title":"Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data","author":"Polychroniou Orestis","unstructured":"Orestis Polychroniou, Arun Raghavan, and Kenneth A. Ross. 2015. Rethinking SIMD Vectorization for In-Memory Databases. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (Melbourne, Victoria, Australia) (SIGMOD '15). Association for Computing Machinery, New York, NY, USA, 1493--1508."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/3299869.3320212"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.14778\/3685800.3685836"},{"key":"e_1_2_1_21_1","unstructured":"Apache Spark. [n.d.]. Apache Spark - Unified Engine for large-scale data analytics. https:\/\/spark.apache.org\/. Accessed: 2024-03-21."},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/3589769"},{"key":"e_1_2_1_23_1","unstructured":"Velox. [n.d.]. Velox Documentation. https:\/\/facebookincubator.github.io\/velox\/. 
Accessed: 2024-03-21."}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3685800.3685847","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,12,31]],"date-time":"2024-12-31T05:29:55Z","timestamp":1735622995000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3685800.3685847"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,8]]},"references-count":23,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2024,8]]}},"alternative-id":["10.14778\/3685800.3685847"],"URL":"https:\/\/doi.org\/10.14778\/3685800.3685847","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2024,8]]},"assertion":[{"value":"2024-11-08","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}