{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,19]],"date-time":"2026-05-19T07:15:39Z","timestamp":1779174939862,"version":"3.51.4"},"reference-count":78,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2023,7,19]],"date-time":"2023-07-19T00:00:00Z","timestamp":1689724800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2023,9,30]]},"abstract":"<jats:p>GPU database systems are an effective solution to query optimization, particularly with compilation and data caching. They fall short, however, in end-to-end workloads, as existing compiler toolchains are too expensive for use with short-running queries. In this work, we define and evaluate a runtime-suitable query compilation pipeline for NVIDIA GPUs that extracts high performance with only minimal optimization. In particular, our balanced approach successfully trades minor slowdowns in execution for major speedups in compilation, even as data sizes increase. 
We demonstrate performance benefits compared to both CPU and GPU database systems using interpreters and compilers, extending query compilation for GPUs beyond cached use cases.<\/jats:p>","DOI":"10.1145\/3603503","type":"journal-article","created":{"date-parts":[[2023,6,9]],"date-time":"2023-06-09T11:59:24Z","timestamp":1686311964000},"page":"1-25","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["rNdN: Fast Query Compilation for NVIDIA GPUs"],"prefix":"10.1145","volume":"20","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4841-5259","authenticated-orcid":false,"given":"Alexander","family":"Krolik","sequence":"first","affiliation":[{"name":"McGill University, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0663-7347","authenticated-orcid":false,"given":"Clark","family":"Verbrugge","sequence":"additional","affiliation":[{"name":"McGill University, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6755-9632","authenticated-orcid":false,"given":"Laurie","family":"Hendren","sequence":"additional","affiliation":[{"name":"McGill University, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2023,7,19]]},"reference":[{"key":"e_1_3_2_2_1","unstructured":"AMD. 2011. AMD Intermediate Language (IL). Retrieved from http:\/\/developer.amd.com\/wordpress\/media\/2012\/10\/AMD_Intermediate_Language_(IL)_Specification_v2.pdf."},{"key":"e_1_3_2_3_1","unstructured":"AMD. 2021. GCN Native ISA LLVM Code Generator\u2013ROCm Documentation 1.0.0 documentation. Retrieved from https:\/\/rocmdocs.amd.com\/en\/latest\/ROCm_Compiler_SDK\/ROCm-Native-ISA.html."},{"key":"e_1_3_2_4_1","unstructured":"AMD. 2022. Let\u2019s Build Everything\u2013GPUOpen. 
Retrieved from https:\/\/gpuopen.com\/."},{"key":"e_1_3_2_5_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-54807-9_8"},{"key":"e_1_3_2_6_1","doi-asserted-by":"crossref","first-page":"570","DOI":"10.1007\/978-3-319-32149-3_53","volume-title":"Parallel Processing and Applied Mathematics","author":"Bialas Piotr","year":"2016","unstructured":"Piotr Bialas and Adam Strzelecki. 2016. Benchmarking the cost of thread divergence in CUDA. In Parallel Processing and Applied Mathematics, Roman Wyrzykowski, Ewa Deelman, Jack Dongarra, Konrad Karczewski, Jacek Kitowski, and Kazimierz Wiatr (Eds.). Springer International, Cham, 570\u2013579."},{"key":"e_1_3_2_7_1","unstructured":"BlazingSQL Inc. 2021. BlazingSQL\u2014High Performance SQL Engine on RAPIDS AI. Retrieved from https:\/\/blazingsql.com\/."},{"key":"e_1_3_2_8_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-018-0512-y"},{"key":"e_1_3_2_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/177492.177575"},{"key":"e_1_3_2_10_1","volume-title":"Hot Chips Symposium in 2019","author":"Burgess John","year":"2019","unstructured":"John Burgess. 2019. RTX on\u2014The NVIDIA Turing GPU. Hot Chips Symposium in 2019. Retrieved from https:\/\/old.hotchips.org\/hc31\/HC31_2.12_NVIDIA_final.pdf."},{"key":"e_1_3_2_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2020.2971677"},{"key":"e_1_3_2_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/800230.806984"},{"key":"e_1_3_2_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2019.8661189"},{"key":"e_1_3_2_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/3276945.3276951"},{"key":"e_1_3_2_15_1","doi-asserted-by":"publisher","DOI":"10.5441\/002\/edbt.2021.35"},{"key":"e_1_3_2_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/3377555.3377892"},{"key":"e_1_3_2_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3168806"},{"key":"e_1_3_2_18_1","unstructured":"cloudcores. 2021. GitHub\u2014Cloudcores\/CuAssembler. 
Retrieved from https:\/\/github.com\/cloudcores\/CuAssembler\/."},{"key":"e_1_3_2_19_1","unstructured":"Brett W. Coon John Erik Lindholm Peter C. Mills and John R. Nickolls. 2010. Processing an Indirect Branch Instruction in a SIMD Architecture. Retrieved from https:\/\/patents.google.com\/patent\/US7761697B1\/en."},{"key":"e_1_3_2_20_1","unstructured":"Brett W. Coon John R. Nickolls Lars Nyland Peter C. Mills and John Erik Lindholm. 2012. Indirect Function Call Instructions in a Synchronous Parallel Thread Processor. Retrieved from https:\/\/patents.google.com\/patent\/US8312254B2\/en."},{"key":"e_1_3_2_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPSW.2013.247"},{"key":"e_1_3_2_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/ARITH.2016.21"},{"key":"e_1_3_2_23_1","unstructured":"freedesktop.org. 2022. Mesa\u2013GitLab. Retrieved from https:\/\/gitlab.freedesktop.org\/mesa\/mesa."},{"key":"e_1_3_2_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2007.12"},{"key":"e_1_3_2_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/3399666.3399925"},{"key":"e_1_3_2_26_1","doi-asserted-by":"publisher","DOI":"10.14778\/3476311.3476321"},{"key":"e_1_3_2_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/12276.13312"},{"key":"e_1_3_2_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2017.7863727"},{"key":"e_1_3_2_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/2591635.2667158"},{"key":"e_1_3_2_30_1","unstructured":"Scott Gray. 2016. GitHub\u2014NervanaSystems\/maxas. Retrieved from https:\/\/github.com\/NervanaSystems\/maxas."},{"key":"e_1_3_2_31_1","first-page":"229","volume-title":"Proceedings of the IEEE\/ACM International Symposium on Code Generation and Optimization (CGO\u201919)","author":"Hayes Ari B.","year":"2019","unstructured":"Ari B. Hayes, Fei Hua, Jin Huang, Yanhao Chen, and Eddy Z. Zhang. 2019. Decoding CUDA binary. In Proceedings of the IEEE\/ACM International Symposium on Code Generation and Optimization (CGO\u201919). 
IEEE Press, 229\u2013241."},{"key":"e_1_3_2_32_1","doi-asserted-by":"publisher","DOI":"10.14778\/3551793.3551833"},{"key":"e_1_3_2_33_1","unstructured":"HEAVY.AI. 2023. GitHub\u2014heavyai\/heavydb: HeavyDB (formerly OmniSciDB). Retrieved from https:\/\/github.com\/heavyai\/heavydb."},{"key":"e_1_3_2_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/3514221.3517869"},{"issue":"1","key":"e_1_3_2_35_1","first-page":"40","article-title":"MonetDB: Two decades of research in column-oriented database architectures","volume":"35","author":"Idreos Stratos","year":"2012","unstructured":"Stratos Idreos, Fabian Groffen, Niels Nes, Stefan Manegold, Sjoerd Mullender, and Martin Kersten. 2012. MonetDB: Two decades of research in column-oriented database architectures. IEEE Data Eng. Bull. 35, 1 (2012), 40\u201345.","journal-title":"IEEE Data Eng. Bull."},{"key":"e_1_3_2_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/2628071.2628101"},{"key":"e_1_3_2_37_1","unstructured":"Zhe Jia Marco Maggioni Benjamin Staiger and Daniele Paolo Scarpazza. 2018. Dissecting the NVIDIA Volta GPU architecture via microbenchmarking. arXiv:1804.06826. Retrieved from http:\/\/arxiv.org\/abs\/1804.06826."},{"key":"e_1_3_2_38_1","unstructured":"Zhe Jia Marco Maggioni Jeffrey Smith and Daniele Scarpazza. 2019. Dissecting the NVidia Turing T4 GPU via Microbenchmarking. arXiv:1903.07486. Retrieved from https:\/\/arxiv.org\/abs\/1903.07486."},{"key":"e_1_3_2_39_1","volume-title":"Design and Evaluation of Register Allocation on GPUs","author":"Kalra Charu","year":"2015","unstructured":"Charu Kalra. 2015. Design and Evaluation of Register Allocation on GPUs. Master\u2019s thesis. 
Northeastern University."},{"key":"e_1_3_2_40_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-020-00643-4"},{"key":"e_1_3_2_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2018.00027"},{"key":"e_1_3_2_42_1","volume-title":"Understanding the ISA Impact on GPU Architecture","author":"Kothiya Mayank","year":"2014","unstructured":"Mayank Kothiya. 2014. Understanding the ISA Impact on GPU Architecture. Master\u2019s thesis. North Carolina State University."},{"key":"e_1_3_2_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/CGO51591.2021.9370323"},{"key":"e_1_3_2_44_1","unstructured":"Serge Lamikhov-Center. 2020. GitHub\u2014serge1\/ELFIO. Retrieved from https:\/\/github.com\/serge1\/ELFIO."},{"key":"e_1_3_2_45_1","doi-asserted-by":"publisher","DOI":"10.14778\/3476311.3476378"},{"key":"e_1_3_2_46_1","unstructured":"Martin Leitner-Ankerl. 2021. martinus\/robin-hood-hashing: Fast & Memory Efficient Hashtable Based on Robin Hood Hashing for C++11\/14\/17\/20. Retrieved from https:\/\/github.com\/martinus\/robin-hood-hashing."},{"key":"e_1_3_2_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2008.31"},{"key":"e_1_3_2_48_1","unstructured":"Mark Harris Luke Durant Olivier Giroux and Nick Stam. 2017. Inside Volta: The World\u2019s Most Advanced Data Center GPU\u2014NVIDIA Developer Blog. Retrieved from https:\/\/developer.nvidia.com\/blog\/inside-volta\/."},{"key":"e_1_3_2_49_1","unstructured":"Marcello Maggioni and Charu Chandrasekaran. 2017. Apple LLVM GPU Compiler: Embedded Dragons. US LLVM Developers\u2019 Meeting."},{"key":"e_1_3_2_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2015.7095794"},{"key":"e_1_3_2_51_1","unstructured":"Todd Mostak. 2014. An Overview of MapD (Massively Parallel Database). Retrieved from http:\/\/www.smallake.kr\/wp-content\/uploads\/2014\/09\/mapd_overview.pdf."},{"key":"e_1_3_2_52_1","doi-asserted-by":"publisher","DOI":"10.14778\/2002938.2002940"},{"key":"e_1_3_2_53_1","unstructured":"Thomas Neumann. 2020. 
Database Architects: Linear Time Liveness Analysis. Retrieved from http:\/\/databasearchitects.blogspot.com\/2020\/04\/linear-time-liveness-analysis.html."},{"key":"e_1_3_2_54_1","unstructured":"John R. Nickolls Richard Craig Johnson Robert Steven Glanville and Guillermo Juan Rozas. 2011. Unanimous Branch Instructions in a Parallel Thread Processor. Retrieved from https:\/\/patents.google.com\/patent\/US20110072248\/en."},{"key":"e_1_3_2_55_1","unstructured":"NVIDIA. 2016. GeForce GTX 1080 Whitepaper. Retrieved from http:\/\/international.download.nvidia.com\/geforce-com\/international\/pdfs\/GeForce_GTX_1080_Whitepaper_FINAL.pdf."},{"key":"e_1_3_2_56_1","unstructured":"NVIDIA. 2020. NVIDIA Ampere GA102 GPU Architecture. Retrieved from https:\/\/images.nvidia.com\/aem-dam\/en-zz\/Solutions\/geforce\/ampere\/pdf\/NVIDIA-ampere-GA102-GPU-Architecture-Whitepaper-V1.pdf."},{"key":"e_1_3_2_57_1","unstructured":"NVIDIA. 2021a. CUDA Binary Utilities :: CUDA Toolkit Documentation. Retrieved from https:\/\/docs.nvidia.com\/cuda\/cuda-binary-utilities\/index.html."},{"key":"e_1_3_2_58_1","unstructured":"NVIDIA. 2021b. CUDA Occupancy Calculator :: CUDA Toolkit Documentation. Retrieved from https:\/\/docs.nvidia.com\/cuda\/cuda-occupancy-calculator\/index.html."},{"key":"e_1_3_2_59_1","unstructured":"NVIDIA. 2021c. NVCC :: CUDA Toolkit Documentation. Retrieved from https:\/\/docs.nvidia.com\/cuda\/cuda-compiler-driver-nvcc\/index.html."},{"key":"e_1_3_2_60_1","unstructured":"NVIDIA. 2021d. Programming Guide :: CUDA Toolkit Documentation. Retrieved from https:\/\/docs.nvidia.com\/cuda\/cuda-c-programming-guide\/index.html."},{"key":"e_1_3_2_61_1","unstructured":"NVIDIA. 2021e. PTX ISA :: CUDA Toolkit Documentation. Retrieved from https:\/\/docs.nvidia.com\/cuda\/parallel-thread-execution\/index.html."},{"key":"e_1_3_2_62_1","unstructured":"NVIDIA. 2021f. Volta Tuning Guide :: CUDA Toolkit Documentation. 
Retrieved from https:\/\/docs.nvidia.com\/cuda\/volta-tuning-guide\/index.html."},{"key":"e_1_3_2_63_1","unstructured":"Robert Ohannessian Jr Michael Alan Fetterman Olivier Giroux Jack H. Choquette Xiaogang Qiu Shirish Gadre and Meenaradchagan Vishnu. 2015. System Method and Computer Program Product for Implementing Software-based Scoreboarding. Retrieved from https:\/\/patents.google.com\/patent\/US20150220341A1\/en."},{"key":"e_1_3_2_64_1","doi-asserted-by":"publisher","DOI":"10.14778\/3425879.3425890"},{"key":"e_1_3_2_65_1","doi-asserted-by":"publisher","DOI":"10.1145\/330249.330250"},{"key":"e_1_3_2_66_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCSim.2016.7568315"},{"key":"e_1_3_2_67_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3380595"},{"key":"e_1_3_2_68_1","doi-asserted-by":"publisher","DOI":"10.1145\/3497776.3517771"},{"key":"e_1_3_2_69_1","doi-asserted-by":"publisher","DOI":"10.1145\/3368826.3377918"},{"key":"e_1_3_2_70_1","unstructured":"TPC Council. 2017. TPC Benchmark H."},{"key":"e_1_3_2_71_1","unstructured":"Wladimir J. van der Laan. 2010. GitHub\u2014laanwj\/decuda. Retrieved from https:\/\/github.com\/laanwj\/decuda."},{"key":"e_1_3_2_72_1","unstructured":"Fabian Wahlster. 2019. Implementing SPMD control flow in LLVM using reconverging CFGs. European LLVM Developers\u2019 Meeting."},{"key":"e_1_3_2_73_1","volume-title":"Proceedings of the 1st International Workshop on Characterizing Applications for Heterogeneous Exascale Systems","author":"Wu Haicheng","year":"2011","unstructured":"Haicheng Wu, Gregory Diamos, Si Li, and Sudhakar Yalamanchili. 2011. Characterization and transformation of unstructured control flow in GPU applications. 
In Proceedings of the 1st International Workshop on Characterizing Applications for Heterogeneous Exascale Systems."},{"key":"e_1_3_2_74_1","doi-asserted-by":"publisher","DOI":"10.1145\/3332466.3374520"},{"key":"e_1_3_2_75_1","doi-asserted-by":"publisher","DOI":"10.1145\/3503221.3508428"},{"key":"e_1_3_2_76_1","doi-asserted-by":"publisher","DOI":"10.1109\/CASES.2015.7324550"},{"key":"e_1_3_2_77_1","unstructured":"Hou Yunqing. 2015. GitHub\u2014hyqneuron\/asfermi. Retrieved from https:\/\/github.com\/hyqneuron\/asfermi\/."},{"key":"e_1_3_2_78_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2004.1274043"},{"key":"e_1_3_2_79_1","doi-asserted-by":"publisher","DOI":"10.1145\/3018743.3018755"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3603503","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3603503","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:46:26Z","timestamp":1750178786000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3603503"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,7,19]]},"references-count":78,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2023,9,30]]}},"alternative-id":["10.1145\/3603503"],"URL":"https:\/\/doi.org\/10.1145\/3603503","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"value":"1544-3566","type":"print"},{"value":"1544-3973","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,7,19]]},"assertion":[{"value":"2022-09-02","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2023-05-16","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-07-19","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}