{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,24]],"date-time":"2026-02-24T18:12:39Z","timestamp":1771956759390,"version":"3.50.1"},"reference-count":48,"publisher":"Springer Science and Business Media LLC","issue":"5","license":[{"start":{"date-parts":[[2021,6,2]],"date-time":"2021-06-02T00:00:00Z","timestamp":1622592000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,6,2]],"date-time":"2021-06-02T00:00:00Z","timestamp":1622592000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100010663","name":"H2020 European Research Council","doi-asserted-by":"publisher","award":["725286"],"award-info":[{"award-number":["725286"]}],"id":[{"id":"10.13039\/100010663","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["The VLDB Journal"],"published-print":{"date-parts":[[2021,9]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Although compiling queries to efficient machine code has become a common approach for query execution, a number of newly created database system projects still refrain from using compilation. It is sometimes claimed that the intricacies of code generation make compilation-based engines too complex. Also, a major barrier for adoption, especially for interactive ad hoc queries, is long compilation time. In this paper, we examine all stages of compiling query execution engines and show how to reduce compilation overhead. We incorporate the lessons learned from a decade of generating code in HyPer into a design that manages complexity and yields high speed. First, we introduce a code generation framework that establishes abstractions to manage complexity, yet generates code in a single fast pass. 
Second, we present a program representation whose data structures are tuned to support fast code generation and compilation. Third, we introduce a new compiler backend that is optimized for minimal compile time, and simultaneously, yields superior execution performance to competing approaches, e.g., Volcano-style or bytecode interpretation. We implemented these optimizations in our database system Umbra to show that it is possible to unite fast compilation and fast execution. Indeed, Umbra achieves unprecedentedly low query latencies. On small data sets, it is even faster than interpreter engines like DuckDB and PostgreSQL. At the same time, on large data sets, its throughput is on par with the state-of-the-art compiling system HyPer.<\/jats:p>","DOI":"10.1007\/s00778-020-00643-4","type":"journal-article","created":{"date-parts":[[2021,6,2]],"date-time":"2021-06-02T03:41:50Z","timestamp":1622605310000},"page":"883-905","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":27,"title":["Tidy Tuples and Flying Start: fast compilation and fast execution of relational queries in Umbra"],"prefix":"10.1007","volume":"30","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0004-9686","authenticated-orcid":false,"given":"Timo","family":"Kersten","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Viktor","family":"Leis","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Thomas","family":"Neumann","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2021,6,2]]},"reference":[{"key":"643_CR1","unstructured":"Agarwal, S., Liu, D., Xin, R.: Apache Spark as a compiler: joining a billion rows per second on a laptop. 
(2016) https:\/\/databricks.com\/blog\/2016\/05\/23\/apache-spark-as-a-compiler-joining-a-billion-rows-per-second-on-a-laptop.html"},{"key":"643_CR2","unstructured":"Boncz, P., Zukowski, M., Nes, N.: MonetDB\/X100: hyper-pipelining query execution. In: CIDR (2005)"},{"issue":"12","key":"643_CR3","doi-asserted-by":"publisher","first-page":"77","DOI":"10.1145\/1409360.1409380","volume":"51","author":"PA Boncz","year":"2008","unstructured":"Boncz, P.A., Kersten, M.L., Manegold, S.: Breaking the memory wall in MonetDB. Commun. ACM 51(12), 77\u201385 (2008)","journal-title":"Commun. ACM"},{"key":"643_CR4","doi-asserted-by":"crossref","unstructured":"Click, C.: Global code motion\/global value numbering. In: SIGPLAN, pp. 246\u2013257 (1995)","DOI":"10.1145\/223428.207154"},{"issue":"12","key":"643_CR5","first-page":"1466","volume":"8","author":"A Crotty","year":"2015","unstructured":"Crotty, A., Galakatos, A., Dursun, K., Kraska, T., Binnig, C., \u00c7etintemel, U., Zdonik, S.: An architecture for compiling UDF-centric workflows. PVLDB 8(12), 1466\u20131477 (2015)","journal-title":"PVLDB"},{"key":"643_CR6","unstructured":"Crotty, A., Galakatos, A., Dursun, K., Kraska, T., \u00c7etintemel, U., Zdonik, S.B.: Tupleware: \u201cbig\u201d data, big analytics, small clusters. In: CIDR (2015)"},{"key":"643_CR7","doi-asserted-by":"crossref","unstructured":"Diaconu, C., Freedman, C., Ismert, E., Larson, P., Mittal, P., Stonecipher, R., Verma, N., Zwilling, M.: Hekaton: SQL Server\u2019s memory-optimized OLTP engine. In: SIGMOD, pp. 1243\u20131254 (2013)","DOI":"10.1145\/2463676.2463710"},{"key":"643_CR8","unstructured":"Dybvig, R.K., Hieb, R., Butler, T.: Destination-Driven Code Generation. Technical reports, Indiana University Computer Science Department (1990)"},{"key":"643_CR9","doi-asserted-by":"crossref","unstructured":"Funke, H., M\u00fchlig, J., Teubner, J.: Efficient generation of machine code for query compilers. In: DaMoN, pp. 
6:1\u20136:7 (2020)","DOI":"10.1145\/3399666.3399925"},{"key":"643_CR10","doi-asserted-by":"crossref","unstructured":"Gupta, A., Agarwal, D., Tan, D., Kulesza, J., Pathak, R., Stefani, S., Srinivasan, V.: Amazon Redshift and the case for simpler data warehouses. In: SIGMOD, pp. 1917\u20131923 (2015)","DOI":"10.1145\/2723372.2742795"},{"key":"643_CR11","unstructured":"Haas, G., Haubenschild, M., Leis, V.: Exploiting directly-attached NVMe arrays in DBMS. In: CIDR (2020)"},{"key":"643_CR12","unstructured":"Hammacher, C.: (2018). https:\/\/v8.dev\/blog\/liftoff"},{"key":"643_CR13","unstructured":"Karpathiotakis, M., Alagiannis, I., Heinis, T., Branco, M., Ailamaki, A.: Just-in-time data virtualization: lightweight data management with ViDa. In: CIDR (2015)"},{"issue":"13","key":"643_CR14","first-page":"2209","volume":"11","author":"T Kersten","year":"2018","unstructured":"Kersten, T., Leis, V., Kemper, A., Neumann, T., Pavlo, A., Boncz, P.A.: Everything you always wanted to know about compiled and vectorized queries but were afraid to ask. PVLDB 11(13), 2209\u20132222 (2018)","journal-title":"PVLDB"},{"issue":"10","key":"643_CR15","first-page":"853","volume":"7","author":"Y Klonatos","year":"2014","unstructured":"Klonatos, Y., Koch, C., Rompf, T., Chafi, H.: Building efficient query engines in a high-level language. PVLDB 7(10), 853\u2013864 (2014)","journal-title":"PVLDB"},{"key":"643_CR16","unstructured":"Kobalicek, P.: (2014). https:\/\/github.com\/asmjit\/asmjit"},{"issue":"2","key":"643_CR17","doi-asserted-by":"publisher","first-page":"253","DOI":"10.1007\/s00778-013-0348-4","volume":"23","author":"C Koch","year":"2014","unstructured":"Koch, C., Ahmad, Y., Kennedy, O., Nikolic, M., N\u00f6tzli, A., Lupei, D., Shaikhha, A.: DBToaster: higher-order delta processing for dynamic, frequently fresh views. VLDB J. 
23(2), 253\u2013278 (2014)","journal-title":"VLDB J."},{"key":"643_CR18","doi-asserted-by":"crossref","unstructured":"Kohn, A., Leis, V., Neumann, T.: Adaptive execution of compiled queries. In: ICDE (2018)","DOI":"10.1109\/ICDE.2018.00027"},{"issue":"12","key":"643_CR19","first-page":"2150","volume":"11","author":"T Kraska","year":"2018","unstructured":"Kraska, T.: Northstar: an interactive data science system. PVLDB 11(12), 2150\u20132164 (2018)","journal-title":"PVLDB"},{"key":"643_CR20","doi-asserted-by":"crossref","unstructured":"Krikellas, K., Viglas, S., Cintra, M.: Generating code for holistic query evaluation. In: ICDE, pp. 613\u2013624 (2010)","DOI":"10.1109\/ICDE.2010.5447892"},{"key":"643_CR21","unstructured":"Lattner, C., Adve, V.: LLVM: A compilation framework for lifelong program analysis and transformation. In: CGO, pp. 75\u201388 (2004)"},{"key":"643_CR22","doi-asserted-by":"crossref","unstructured":"Leis, V., Boncz, P., Kemper, A., Neumann, T.: Morsel-driven parallelism: a NUMA-aware query evaluation framework for the many-core age. In: SIGMOD, pp. 743\u2013754 (2014)","DOI":"10.1145\/2588555.2610507"},{"issue":"1","key":"643_CR23","first-page":"1","volume":"11","author":"P Menon","year":"2017","unstructured":"Menon, P., Mowry, T.C., Pavlo, A.: Relaxed operator fusion for in-memory databases: making compilation, vectorization, and prefetching work together at last. PVLDB 11(1), 1\u201313 (2017)","journal-title":"PVLDB"},{"key":"643_CR24","unstructured":"Moerkotte, G.: Building query compilers. http:\/\/pi3.informatik.uni-mannheim.de\/~moer\/querycompiler.pdf"},{"issue":"9","key":"643_CR25","first-page":"539","volume":"4","author":"T Neumann","year":"2011","unstructured":"Neumann, T.: Efficiently compiling efficient query plans for modern hardware. PVLDB 4(9), 539\u2013550 (2011)","journal-title":"PVLDB"},{"key":"643_CR26","unstructured":"Neumann, T.: Linear time liveness analysis (2020). 
http:\/\/databasearchitects.blogspot.com\/2020\/04\/linear-time-liveness-analysis.html"},{"key":"643_CR27","unstructured":"Neumann, T., Freitag, M.J.: Umbra: a disk-based system with in-memory performance. In: CIDR (2020)"},{"issue":"1","key":"643_CR28","first-page":"3","volume":"37","author":"T Neumann","year":"2014","unstructured":"Neumann, T., Leis, V.: Compiling database queries into machine code. IEEE Data Eng. Bull. 37(1), 3\u201311 (2014)","journal-title":"IEEE Data Eng. Bull."},{"key":"643_CR29","unstructured":"Neumann, T., Leis, V., Kemper, A.: The complete story of joins (in HyPer). In: BTW, pp. 31\u201350 (2017)"},{"issue":"9","key":"643_CR30","first-page":"1002","volume":"11","author":"S Palkar","year":"2018","unstructured":"Palkar, S., Thomas, J.J., Narayanan, D., Thaker, P., Palamuttam, R., Negi, P., Shanbhag, A., Schwarzkopf, M., Pirk, H., Amarasinghe, S.P., Madden, S., Zaharia, M.: Evaluating end-to-end optimization for data analytics applications in Weld. PVLDB 11(9), 1002\u20131015 (2018)","journal-title":"PVLDB"},{"key":"643_CR31","unstructured":"Palkar, S., Thomas, J.J., Shanbhag, A., Schwarzkopf, M., Amarasinghe, S.P., Zaharia, M.: A common runtime for high performance data analysis. In: CIDR (2017)"},{"key":"643_CR32","unstructured":"Pall, M.: (2012). http:\/\/wiki.luajit.org\/Optimizations"},{"key":"643_CR33","unstructured":"Pall, M.: (2013). http:\/\/wiki.luajit.org\/SSA-IR-2.0"},{"key":"643_CR34","unstructured":"Paroski, D.: Code generation: the inner sanctum of database performance (2016) http:\/\/highscalability.com\/blog\/2016\/9\/7\/code-generation-the-inner-sanctum-of-database-performance.html"},{"key":"643_CR35","unstructured":"Pirk, H., Giceva, J., Pietzuch, P.R.: Thriving in the no man\u2019s land between compilers and databases. 
In: CIDR (2019)"},{"issue":"14","key":"643_CR36","first-page":"1707","volume":"9","author":"H Pirk","year":"2016","unstructured":"Pirk, H., Moll, O., Zaharia, M., Madden, S.: Voodoo\u2014a vector algebra for portable database performance on modern hardware. PVLDB 9(14), 1707\u20131718 (2016)","journal-title":"PVLDB"},{"key":"643_CR37","doi-asserted-by":"crossref","unstructured":"Poletto, M., Engler, D.R., Kaashoek, M.F.: tcc: a system for fast, flexible, and high-level dynamic code generation. In: SIGPLAN, pp. 109\u2013121 (1997)","DOI":"10.1145\/258916.258926"},{"key":"643_CR38","doi-asserted-by":"publisher","first-page":"895","DOI":"10.1145\/330249.330250","volume":"21","author":"M Poletto","year":"1999","unstructured":"Poletto, M., Sarkar, V.: Linear scan register allocation. ACM Trans. Program. Lang. Syst. 21, 895\u2013913 (1999)","journal-title":"ACM Trans. Program. Lang. Syst."},{"key":"643_CR39","doi-asserted-by":"crossref","unstructured":"Raasveldt, M., M\u00fchleisen, H.: DuckDB: an embeddable analytical database. In: SIGMOD, pp. 1981\u20131984 (2019)","DOI":"10.1145\/3299869.3320212"},{"key":"643_CR40","doi-asserted-by":"publisher","first-page":"e10","DOI":"10.1017\/S0956796818000102","volume":"28","author":"A Shaikhha","year":"2018","unstructured":"Shaikhha, A., Dashti, M., Koch, C.: Push versus pull-based loop fusion in query engines. J. Funct. Program. 28, e10 (2018)","journal-title":"J. Funct. Program."},{"issue":"1","key":"643_CR41","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3183653","volume":"43","author":"A Shaikhha","year":"2018","unstructured":"Shaikhha, A., Klonatos, Y., Koch, C.: Building efficient query engines in a high-level language. ACM Trans. Database Syst. 43(1), 1\u201345 (2018)","journal-title":"ACM Trans. Database Syst."},{"key":"643_CR42","doi-asserted-by":"crossref","unstructured":"Shaikhha, A., Klonatos, Y., Parreaux, L., Brown, L., Dashti, M., Koch, C.: How to architect a query compiler. In: SIGMOD, pp. 
1907\u20131922 (2016)","DOI":"10.1145\/2882903.2915244"},{"key":"643_CR43","doi-asserted-by":"crossref","unstructured":"Tahboub, R.Y., Essertel, G.M., Rompf, T.: How to architect a query compiler, revisited. In: SIGMOD, pp. 307\u2013322 (2018)","DOI":"10.1145\/3183713.3196893"},{"key":"643_CR44","doi-asserted-by":"crossref","unstructured":"van Renen, A., Leis, V., Kemper, A., Neumann, T., Hashida, T., Oe, K., Doi, Y., Harada, L., Sato, M.: Managing non-volatile memory in database systems. In: SIGMOD, pp. 1541\u20131555 (2018)","DOI":"10.1145\/3183713.3196897"},{"key":"643_CR45","doi-asserted-by":"crossref","unstructured":"Vogelsgesang, A., Haubenschild, M., Finis, J., Kemper, A., Leis, V., M\u00fchlbauer, T., Neumann, T., Then, M.: Get real: how benchmarks fail to represent the real world. In: DBTest (2018)","DOI":"10.1145\/3209950.3209952"},{"issue":"1","key":"643_CR46","first-page":"31","volume":"37","author":"S Wanderman-Milne","year":"2014","unstructured":"Wanderman-Milne, S., Li, N.: Runtime code generation in Cloudera Impala. IEEE Data Eng. Bull. 37(1), 31\u201337 (2014)","journal-title":"IEEE Data Eng. Bull."},{"key":"643_CR47","doi-asserted-by":"crossref","unstructured":"Zhang, R., Debray, S., Snodgrass, R.T.: Micro-specialization: dynamic code specialization of database management systems. In: CGO, pp. 63\u201373 (2012)","DOI":"10.1145\/2259016.2259025"},{"key":"643_CR48","doi-asserted-by":"crossref","unstructured":"Zhang, R., Snodgrass, R.T., Debray, S.: Micro-specialization in DBMSes. In: ICDE, pp. 
690\u2013701 (2012)","DOI":"10.1109\/ICDE.2012.110"}],"container-title":["The VLDB Journal"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00778-020-00643-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s00778-020-00643-4\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00778-020-00643-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,8,10]],"date-time":"2021-08-10T10:32:00Z","timestamp":1628591520000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s00778-020-00643-4"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,6,2]]},"references-count":48,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2021,9]]}},"alternative-id":["643"],"URL":"https:\/\/doi.org\/10.1007\/s00778-020-00643-4","relation":{},"ISSN":["1066-8888","0949-877X"],"issn-type":[{"value":"1066-8888","type":"print"},{"value":"0949-877X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,6,2]]},"assertion":[{"value":"12 March 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"13 August 2020","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"27 September 2020","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"2 June 2021","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}