{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,24]],"date-time":"2025-08-24T01:22:51Z","timestamp":1755998571555,"version":"3.37.0"},"reference-count":54,"publisher":"Association for Computing Machinery (ACM)","issue":"1","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2024,9]]},"abstract":"<jats:p>Although user-defined functions (UDFs) are a popular way to augment SQL's declarative approach with procedural code, the mismatch between programming paradigms creates a fundamental optimization challenge. UDF inlining automatically removes all UDF calls by replacing them with equivalent SQL subqueries. Although inlining leaves queries entirely in SQL (resulting in large performance gains), we observe that inlining the entire UDF often leads to sub-optimal performance. A better approach is to analyze the UDF, deconstruct it into smaller pieces, and inline only the pieces that help query optimization. To achieve this, we propose UDF outlining, a technique to intentionally hide pieces of a UDF from the optimizer, resulting in simpler UDFs and significantly faster query plans. Our implementation (PRISM) demonstrates that UDF outlining improves performance over conventional inlining (on average 1.29\u00d7 speedup for DuckDB and 298.73\u00d7 for SQL Server) through a combination of more effective unnesting, improved data skipping, and by avoiding unnecessary joins.<\/jats:p>","DOI":"10.14778\/3696435.3696436","type":"journal-article","created":{"date-parts":[[2025,2,11]],"date-time":"2025-02-11T20:41:46Z","timestamp":1739306506000},"page":"1-13","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["The Key to Effective UDF Optimization: Before Inlining, First Perform Outlining"],"prefix":"10.14778","volume":"18","author":[{"given":"Samuel","family":"Arch","sequence":"first","affiliation":[{"name":"Carnegie Mellon University"}]},{"given":"Yuchen","family":"Liu","sequence":"additional","affiliation":[{"name":"Carnegie Mellon University"}]},{"given":"Todd C.","family":"Mowry","sequence":"additional","affiliation":[{"name":"Carnegie Mellon University"}]},{"given":"Jignesh M.","family":"Patel","sequence":"additional","affiliation":[{"name":"Carnegie Mellon University"}]},{"given":"Andrew","family":"Pavlo","sequence":"additional","affiliation":[{"name":"Carnegie Mellon University"}]}],"member":"320","published-online":{"date-parts":[[2025,2,11]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"V Aho Alfred S Lam Monica and D Ullman Jeffrey. 2007. Compilers Principles Techniques & Tools. pearson Education."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/390013.808479"},{"key":"e_1_2_1_3_1","unstructured":"Samuel Arch Arham Chopra Mayank Baranwal. 2023. Add support for nested laterals #7528. https:\/\/github.com\/duckdb\/duckdb\/pull\/7528."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/2499370.2462180"},{"key":"e_1_2_1_5_1","volume-title":"Proceedings of the 16th ACM SIGPLAN-SIGACT POPL symposium. 25--35","author":"R. Cytron","year":"1989","unstructured":"R. Cytron et al. 1989. An efficient method of computing static single assignment form. In Proceedings of the 16th ACM SIGPLAN-SIGACT POPL symposium. 25--35."},{"key":"e_1_2_1_6_1","unstructured":"E. Darling. 2022. When Does Scalar UDF Inlining Work In SQL Server? https:\/\/erikdarlingdata.com\/when-does-udf-inlining-kick-in\/."},{"key":"e_1_2_1_7_1","unstructured":"Inc. (pganalyze) Duboce Labs. 2024. C library for accessing the PostgreSQL parser outside of the server environment. https:\/\/github.com\/pganalyze\/libpg_query."},{"key":"e_1_2_1_8_1","unstructured":"DuckDB. 2023. Group by never completes #9718. https:\/\/github.com\/duckdb\/duckdb\/issues\/9718."},{"key":"e_1_2_1_9_1","unstructured":"DuckDB. 2024. Overview of DuckDB Internals. https:\/\/duckdb.org\/docs\/internals\/overview.html."},{"key":"e_1_2_1_10_1","volume-title":"Proceedings of the 2007 ACM SIGMOD Conference. 993--1004","author":"M","unstructured":"M Elhemali et al. 2007. Execution strategies for SQL subqueries. In Proceedings of the 2007 ACM SIGMOD Conference. 993--1004."},{"key":"e_1_2_1_11_1","volume-title":"Proceedings of the 2017 ACM Conference. 1663--1666","author":"Emani K Venkatesh","year":"2017","unstructured":"K Venkatesh Emani, Tejas Deshpande, Karthik Ramachandra, and S Sudarshan. 2017. Dbridge: Translating imperative code to sql. In Proceedings of the 2017 ACM Conference. 1663--1666."},{"key":"e_1_2_1_12_1","volume-title":"Proceedings of the 2016 International Conference on Management of Data. ACM, San Francisco California USA, 1781--1796","author":"Emani K. Venkatesh","unstructured":"K. Venkatesh Emani, Karthik Ramachandra, Subhro Bhattacharya, and S. Sudarshan. 2016. Extracting Equivalent SQL from Imperative Code in Database Applications. In Proceedings of the 2016 International Conference on Management of Data. ACM, San Francisco California USA, 1781--1796."},{"key":"e_1_2_1_13_1","unstructured":"Venkatesh Emani Avrilia Floratou and Carlo Curino. 2024. PyFroid: Scaling Data Analysis on a Commodity Workstation. (2024)."},{"key":"e_1_2_1_14_1","unstructured":"Steven Feuerstein. 2017. Speed up execution of your functions inside SQL statements with UDF pragma. https:\/\/stevenfeuersteinonplsql.blogspot.com\/2017\/03\/speed-up-execution-of-your-functions.html."},{"key":"e_1_2_1_15_1","first-page":"12","article-title":"YeSQL: rich user-defined functions without the overhead","volume":"15","author":"Foufoulas Y","year":"2022","unstructured":"Y Foufoulas, A Simitsis, and Y Ioannidis. 2022. YeSQL: rich user-defined functions without the overhead. Proc. VLDB Endow. 15, 12 (Aug. 2022), 3730--3733.","journal-title":"Proc. VLDB Endow."},{"key":"e_1_2_1_16_1","first-page":"10","article-title":"YeSQL: \"you extend SQL\" with rich and highly performant user-defined functions in relational databases","volume":"15","author":"Foufoulas Yannis","year":"2022","unstructured":"Yannis Foufoulas, Alkis Simitsis, Lefteris Stamatogiannakis, and Yannis Ioannidis. 2022. YeSQL: \"you extend SQL\" with rich and highly performant user-defined functions in relational databases. Proc. VLDB Endow. 15, 10 (June 2022).","journal-title":"Proc. VLDB Endow."},{"key":"e_1_2_1_17_1","volume-title":"SQL. In CIDR 2024, Conference on Innovative Data Systems Research.","author":"Franz Kai","year":"2024","unstructured":"Kai Franz, Samuel I Arch, Denis Hirn, Torsten Grust, Todd Mowry, and Pavlo. 2024. Dear User-Defined Functions, Inlining isn't working out so great for us. Let's try batching to make our relationship work. Sincerely, SQL. In CIDR 2024, Conference on Innovative Data Systems Research."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/376284.375748"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.14778\/3489496.3489501"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3389736"},{"key":"e_1_2_1_21_1","doi-asserted-by":"crossref","first-page":"8","DOI":"10.14778\/3457390.3457402","article-title":"Procedural extensions of SQL: understanding their usage in the wild","volume":"14","author":"Gupta S","year":"2021","unstructured":"S Gupta and K Ramachandra. 2021. Procedural extensions of SQL: understanding their usage in the wild. Proc. VLDB Endow. 14, 8 (April 2021), 1378--1391.","journal-title":"Proc. VLDB Endow."},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.14778\/1453856.1453975"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/800152.804919"},{"key":"e_1_2_1_24_1","unstructured":"Denis Hirn. 2023. Decorrelation and parallelization of recursive and materialized CTEs #10357. https:\/\/github.com\/duckdb\/duckdb\/pull\/10357."},{"key":"e_1_2_1_25_1","unstructured":"Denis Hirn. 2024. Apfel. https:\/\/apfel-db.cs.uni-tuebingen.de\/."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3457272"},{"key":"e_1_2_1_27_1","volume-title":"Magpie: Python at Speed and Scale using Cloud Backends.","author":"A Jindal","year":"2021","unstructured":"A Jindal et al. 2021. Magpie: Python at Speed and Scale using Cloud Backends. (2021)."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/178243.178258"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.14778\/3611479.3611539"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.14778\/3551793.3551801"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/319732.319745"},{"key":"e_1_2_1_32_1","volume-title":"CIDR Conference.","author":"Kl\u00e4be Steffen","year":"2022","unstructured":"Steffen Kl\u00e4be, Bobby DeSantis, Stefan Hagedorn, and Kai-Uwe Sattler. 2022. Accelerating python udfs in vectorized query execution. CIDR Conference."},{"key":"e_1_2_1_33_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/1369396.1370017","article-title":"Design of the Java HotSpot\u2122 client compiler for Java 6","volume":"5","author":"T Kotzmann","year":"2008","unstructured":"T Kotzmann et al. 2008. Design of the Java HotSpot\u2122 client compiler for Java 6. ACM TACO 5, 1 (2008), 1--32.","journal-title":"ACM TACO"},{"key":"e_1_2_1_34_1","volume-title":"Proceedings of the 2011 ACM SIGMOD Conference. 1177--1184","unstructured":"P-\u00c5 Larson et al. 2011. SQL server column store indexes. In Proceedings of the 2011 ACM SIGMOD Conference. 1177--1184."},{"key":"e_1_2_1_35_1","volume-title":"MLIR: Scaling compiler infrastructure for domain specific computation. In 2021 IEEE\/ACM CGO. 2--14.","author":"C Lattner","year":"2021","unstructured":"C Lattner et al. 2021. MLIR: Scaling compiler infrastructure for domain specific computation. In 2021 IEEE\/ACM CGO. 2--14."},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2004.1281665"},{"key":"e_1_2_1_37_1","unstructured":"Viktor Leis. 2023. PerfEvent. https:\/\/github.com\/viktorleis\/perfevent."},{"key":"e_1_2_1_38_1","volume-title":"Proceedings of the 2014 ACM SIGMOD Conference. 743--754","author":"Leis Viktor","year":"2014","unstructured":"Viktor Leis, Peter Boncz, Alfons Kemper, and Thomas Neumann. 2014. Morsel-driven parallelism: a NUMA-aware query evaluation framework for the many-core age. In Proceedings of the 2014 ACM SIGMOD Conference. 743--754."},{"key":"e_1_2_1_39_1","unstructured":"X Liu V Emani A Floratou J Cahoon P Seamark and C Curino. 2023. PolySem: Efficient Polyglot Analytics on Semantic Data. (2023)."},{"key":"e_1_2_1_40_1","unstructured":"Microsoft. 2016. Scalar User-Defined Functions for In-Memory OLTP. https:\/\/learn.microsoft.com\/en-us\/sql\/relational-databases\/in-memory-oltp\/scalar-user-defined-functions-for-in-memory-oltp\/."},{"key":"e_1_2_1_41_1","unstructured":"Hannes M\u00fchleisen. 2023. 0.9.0 Preview Release \"Undulata\". https:\/\/github.com\/duckdb\/duckdb\/releases\/tag\/v0.9.0."},{"key":"e_1_2_1_42_1","volume-title":"Technologie und Web (BTW 2015)","author":"Neumann Thomas","year":"2015","unstructured":"Thomas Neumann and Alfons Kemper. 2015. Unnesting arbitrary queries. Datenbanksysteme f\u00fcr Business, Technologie und Web (BTW 2015) (2015)."},{"key":"e_1_2_1_43_1","unstructured":"Oracle. 2001. Oracle9i Database New Features. https:\/\/docs.oracle.com\/cd\/A91202_01\/901_doc\/server.901\/a90120\/ch2_feat.htm."},{"key":"e_1_2_1_44_1","unstructured":"Mark Raasveldt. 2023. Correlated Subqueries in SQL. https:\/\/duckdb.org\/2023\/05\/26\/correlated-subqueries-in-sql.html."},{"key":"e_1_2_1_45_1","volume-title":"Proceedings of the 2019 SIGMOD Conference. 1981--1984","author":"Raasveldt Mark","year":"2019","unstructured":"Mark Raasveldt and Hannes M\u00fchleisen. 2019. Duckdb: an embeddable analytical database. In Proceedings of the 2019 SIGMOD Conference. 1981--1984."},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/3186728.3164140"},{"key":"e_1_2_1_47_1","volume-title":"Proceedings of the 15th ACM SIGPLAN-SIGACT POPL.","author":"Rosen B K","year":"1988","unstructured":"B K Rosen, M N Wegman, and F K Zadeck. 1988. Global value numbers and redundant computations. In Proceedings of the 15th ACM SIGPLAN-SIGACT POPL."},{"key":"e_1_2_1_48_1","volume-title":"Proceedings of the Twelfth ICDE. 450--458","author":"Seshadri P","year":"1996","unstructured":"P Seshadri, H Pirahesh, and TY C Leung. 1996. Complex query decorrelation. In Proceedings of the Twelfth ICDE. 450--458."},{"key":"e_1_2_1_49_1","doi-asserted-by":"crossref","unstructured":"V Simhadri K Ramachandra A Chaitanya R Guravannavar and S. Sudarshan. 2014. Decorrelation of user defined function invocations in queries. In 2014 IEEE 30th ICDE. IEEE Chicago IL USA 532--543.","DOI":"10.1109\/ICDE.2014.6816679"},{"key":"e_1_2_1_50_1","unstructured":"SingleStore. 2021. SingleStoreDB Cloud Release Notes. https:\/\/docs.singlestore.com\/cloud\/release-notes\/singlestoredb-cloud-release-notes\/."},{"key":"e_1_2_1_51_1","volume-title":"Proceedings of the 2021 SIGMOD Conference. ACM, 1718--1731","author":"L","unstructured":"L Spiegelberg et al. 2021. Tuplex: Data Science in Python at Native Code Speed. In Proceedings of the 2021 SIGMOD Conference. ACM, 1718--1731."},{"key":"e_1_2_1_52_1","doi-asserted-by":"crossref","first-page":"12","DOI":"10.14778\/3352063.3352109","article-title":"Tuplex: robust, efficient analytics when Python rules","volume":"12","author":"Spiegelberg L F.","year":"2019","unstructured":"L F. Spiegelberg and T Kraska. 2019. Tuplex: robust, efficient analytics when Python rules. Proc. VLDB Endow. 12, 12 (Aug. 2019), 1958--1961.","journal-title":"Proc. VLDB Endow."},{"key":"e_1_2_1_53_1","volume-title":"Proceedings of the 2013 ACM international symposium on New ideas, new paradigms, and reflections on programming & software. 187--204","author":"T","unstructured":"T W\u00fcrthinger et al. 2013. One VM to rule them all. In Proceedings of the 2013 ACM international symposium on New ideas, new paradigms, and reflections on programming & software. 187--204."},{"key":"e_1_2_1_54_1","volume-title":"Proc. ACM Program. Lang. 5, OOPSLA (Oct.","author":"Zhang G","year":"2021","unstructured":"G Zhang, Y Xu, X Shen, and I Dillig. 2021. UDF to SQL translation through compositional lazy inductive synthesis. Proc. ACM Program. Lang. 5, OOPSLA (Oct. 2021), 1--26."}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3696435.3696436","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,2,11]],"date-time":"2025-02-11T20:42:10Z","timestamp":1739306530000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3696435.3696436"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,9]]},"references-count":54,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2024,9]]}},"alternative-id":["10.14778\/3696435.3696436"],"URL":"https:\/\/doi.org\/10.14778\/3696435.3696436","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2024,9]]},"assertion":[{"value":"2025-02-11","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}