{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,12,31]],"date-time":"2024-12-31T06:40:16Z","timestamp":1735627216952,"version":"3.32.0"},"reference-count":34,"publisher":"Association for Computing Machinery (ACM)","issue":"12","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2024,8]]},"abstract":"<jats:p>Vectorized execution engines process large datasets by decomposing computations into concise (tight) loops, which can be more efficiently executed by modern hardware. Providing loops that are optimal for execution usually adds burden to the software development process, as developers are required to understand details of vectorized execution, columnar data layout, data encodings, and the code compilation process itself, presenting a steep learning curve and challenges to organizations building and scaling large engineering teams. Due to their large quantity, scalar function authoring accentuates this problem. In our experience building the Velox open source execution engine, we have observed that exposing a large number of developers to the complexity inherent to vectorization resulted in a disproportionate amount of bugs and performance inefficiencies. In this paper, we describe the simple function interface (SFI) created to address this issue. SFI highly simplifies scalar function authoring by encapsulating the vectorization complexity required to generate tight loops, and presenting developers with a simpler, conciser, and more natural row-based interface - without sacrificing performance. SFI also hides columnar layout details, while providing developers the flexibility to efficiently implement advanced features such as functions with nested and recursive parameter types, type variables, variadic parameters, and generic types. Today, more than a thousand functions have been added to Velox using the SFI, implementing popular open source SQL dialects and internal domain-specific use cases at Meta, and are in active production use. While this paper presents implementation details, performance pitfalls, experimental results, and our overall experience developing the state-of-the-art Velox vectorized execution engine, we believe the concepts and trade-offs to be fundamentally equivalent and generally applicable to other vectorized engines.<\/jats:p>","DOI":"10.14778\/3685800.3685836","type":"journal-article","created":{"date-parts":[[2024,11,8]],"date-time":"2024-11-08T17:25:21Z","timestamp":1731086721000},"page":"4187-4199","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Simple (yet Efficient) Function Authoring for Vectorized Engines"],"prefix":"10.14778","volume":"17","author":[{"given":"Laith","family":"Sakka","sequence":"first","affiliation":[{"name":"Meta Platforms Inc."}]},{"given":"Pedro","family":"Pedreira","sequence":"additional","affiliation":[{"name":"Meta Platforms Inc."}]},{"given":"Orri","family":"Erling","sequence":"additional","affiliation":[{"name":"Meta Platforms Inc."}]},{"given":"Masha","family":"Basmanova","sequence":"additional","affiliation":[{"name":"Meta Platforms Inc."}]},{"given":"Kevin","family":"Wilfong","sequence":"additional","affiliation":[{"name":"Meta Platforms Inc."}]},{"given":"Wei","family":"He","sequence":"additional","affiliation":[{"name":"Meta Platforms Inc."}]},{"given":"Xiaoxuan","family":"Meng","sequence":"additional","affiliation":[{"name":"Meta Platforms Inc."}]},{"given":"Krishna","family":"Pai","sequence":"additional","affiliation":[{"name":"Meta Platforms Inc."}]},{"given":"Bikramjeet","family":"Vig","sequence":"additional","affiliation":[{"name":"Meta Platforms Inc."}]}],"member":"320","published-online":{"date-parts":[[2024,11,8]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/3626717"},{"key":"e_1_2_1_2_1","unstructured":"Apache Arrow. [n.d.]. A cross-language development platform for in-memory analytics. https:\/\/arrow.apache.org\/. Accessed: 2024-03-21."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/3514221.3526054"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.14778\/3407790.3407851"},{"key":"e_1_2_1_5_1","volume-title":"Conference on Innovative Data Systems Research (CIDR).","author":"Boncz Peter A.","year":"2005","unstructured":"Peter A. Boncz, Marcin Zukowski, and Niels Nes. 2005. MonetDB\/X100: Hyper-Pipelining Query Execution. In Conference on Innovative Data Systems Research (CIDR)."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.14778\/3352063.3352121"},{"key":"e_1_2_1_7_1","volume-title":"Conference on Innovative Data Systems Research - CIDR.","author":"Chattopadhyay Biswapesh","year":"2023","unstructured":"Biswapesh Chattopadhyay, Pedro Pedreira, Sameer Agarwal, Suketu Vakharia, Peng Li, Weiran Liu, and Sundaram Narayanan. 2023. Shared Foundations: Modernizing Meta's Data Lakehouse. In Conference on Innovative Data Systems Research - CIDR."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/376284.375692"},{"key":"e_1_2_1_9_1","unstructured":"Apache DataFusion. [n.d.]. Apache Arrow DataFusion Documentation. https:\/\/arrow.apache.org\/datafusion\/. Accessed: 2024-03-21."},{"key":"e_1_2_1_10_1","unstructured":"The Presto Foundation. [n.d.]. Presto Documentation: Functions and Operators. https:\/\/prestodb.io\/docs\/current\/functions.html. Accessed: 2024-03-21."},{"key":"e_1_2_1_11_1","unstructured":"The Presto Foundation. [n.d.]. Presto: Free Open-Source SQL Query Engine for any Data. https:\/\/prestodb.io\/. Accessed: 2024-03-21."},{"volume-title":"Proceedings 14th International Conference on Data Engineering. 370-379","author":"Goldstein J.","key":"e_1_2_1_12_1","unstructured":"J. Goldstein, R. Ramakrishnan, and U. Shaft. 1998. Compressing relations and indexes. In Proceedings 14th International Conference on Data Engineering. 370-379."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.14778\/3275366.3284966"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.14778\/3551793.3551852"},{"key":"e_1_2_1_15_1","volume-title":"Freitag","author":"Neumann Thomas","year":"2020","unstructured":"Thomas Neumann and Michael J. Freitag. 2020. Umbra: A Disk-Based System with In-Memory Performance. In 10th Conference on Innovative Data Systems Research, CIDR 2020, Amsterdam, The Netherlands, January 12--15, 2020, Online Proceedings. www.cidrdb.org."},{"key":"e_1_2_1_16_1","unstructured":"Pedro Pedreira. [n.d.]. Aligning Velox and Apache Arrow: Towards composable data management. https:\/\/engineering.fb.com\/2024\/02\/20\/developer-tools\/velox-apache-arrow-15-composable-data-management\/. Accessed: 2024-03-21."},{"key":"e_1_2_1_17_1","unstructured":"Pedro Pedreira Masha Basmanova and Orri Erling. [n.d.]. Introducing Velox: An Open Source Unified Execution Engine. https:\/\/engineering.fb.com\/2023\/03\/09\/open-source\/velox-open-source-execution-engine\/. Accessed: 2024-03-21."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.14778\/3554821.3554829"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.14778\/3603581.3603604"},{"key":"e_1_2_1_20_1","first-page":"1","article-title":"Composable Data Management","volume":"14","author":"Pedreira Pedro","year":"2024","unstructured":"Pedro Pedreira, Deepak Majeti, and Orri Erling. 2024. Composable Data Management: An Execution Overview. Proc. VLDB Endow. 14, 1 (aug 2024).","journal-title":"An Execution Overview. Proc. VLDB Endow."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.14778\/2824032.2824078"},{"volume-title":"Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data","author":"Polychroniou Orestis","key":"e_1_2_1_22_1","unstructured":"Orestis Polychroniou, Arun Raghavan, and Kenneth A. Ross. 2015. Rethinking SIMD Vectorization for In-Memory Databases. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (Melbourne, Victoria, Australia) (SIGMOD '15). Association for Computing Machinery, New York, NY, USA, 1493--1508."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/3299869.3320212"},{"key":"e_1_2_1_24_1","unstructured":"Laith Sakka. [n.d.]. Velox Blog: Simple Functions: Efficient Complex Types. https:\/\/velox-lib.io\/blog\/simple-functions-2\/. Accessed: 2024-07-17."},{"key":"e_1_2_1_25_1","volume-title":"Presto: SQL on Everything. In 2019 IEEE 35th International Conference on Data Engineering (ICDE). 1802--1813","author":"Sethi Raghav","year":"2019","unstructured":"Raghav Sethi, Martin Traverso, Dain Sundstrom, David Phillips, Wenlei Xie, Yutian Sun, Nezih Yegitbasi, Haozhun Jin, Eric Hwang, Nileema Shingte, and Christopher Berner. 2019. Presto: SQL on Everything. In 2019 IEEE 35th International Conference on Data Engineering (ICDE). 1802--1813."},{"key":"e_1_2_1_26_1","unstructured":"Apache Spark. [n.d.]. Apache Spark - Unified Engine for large-scale data analytics. https:\/\/spark.apache.org\/. Accessed: 2024-03-21."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/3589769"},{"key":"e_1_2_1_28_1","volume-title":"Engineering A Compiler","author":"Torczon Linda","unstructured":"Linda Torczon and Keith Cooper. 2007. Engineering A Compiler (2nd ed.). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.","edition":"2"},{"key":"e_1_2_1_29_1","volume-title":"The Complete Guide","author":"Vandevoorde David","unstructured":"David Vandevoorde, Nicolai M. Josuttis, and Douglas Gregor. 2017. C++ Templates: The Complete Guide (2nd Edition) (2nd ed.). Addison-Wesley Professional.","edition":"2"},{"key":"e_1_2_1_30_1","unstructured":"Velox. [n.d.]. Bugs fixed in mapFromEntries function. https:\/\/github.com\/facebookincubator\/velox\/issues?q=label:map_from_entries_bugs. Accessed: 2024-03-21."},{"key":"e_1_2_1_31_1","unstructured":"Velox. [n.d.]. MultiMapFromEntries function source code. https:\/\/github.com\/facebookincubator\/velox\/blob\/main\/velox\/functions\/prestosql\/MultimapFromEntries.h. Accessed: 2024-03-21."},{"key":"e_1_2_1_32_1","unstructured":"Velox. [n.d.]. SimpleFunctionAdapter source code. https:\/\/github.com\/facebookincubator\/velox\/blob\/main\/velox\/expression\/SimpleFunctionAdapter.h. Accessed: 2024-03-21."},{"key":"e_1_2_1_33_1","unstructured":"Velox. [n.d.]. Velox: A C++ vectorized database acceleration library. https:\/\/github.com\/facebookincubator\/velox. Accessed: 2024-03-21."},{"key":"e_1_2_1_34_1","unstructured":"Velox. [n.d.]. Velox Documentation. https:\/\/facebookincubator.github.io\/velox\/. Accessed: 2024-03-21."}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3685800.3685836","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,12,31]],"date-time":"2024-12-31T05:31:40Z","timestamp":1735623100000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3685800.3685836"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,8]]},"references-count":34,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2024,8]]}},"alternative-id":["10.14778\/3685800.3685836"],"URL":"https:\/\/doi.org\/10.14778\/3685800.3685836","relation":{},"ISSN":["2150-8097"],"issn-type":[{"type":"print","value":"2150-8097"}],"subject":[],"published":{"date-parts":[[2024,8]]},"assertion":[{"value":"2024-11-08","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}