{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,27]],"date-time":"2026-03-27T05:53:33Z","timestamp":1774590813772,"version":"3.50.1"},"reference-count":71,"publisher":"Association for Computing Machinery (ACM)","issue":"3","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2023,11]]},"abstract":"<jats:p>GPUs offer massive parallelism and high-bandwidth memory access, making them an attractive option for accelerating data analytics in database systems. However, while modern GPUs possess more resources than ever before (e.g., higher DRAM bandwidth), efficient system implementations and judicious resource allocations for query processing are still necessary for optimal performance. Database systems can save GPU runtime costs through just-enough resource allocation or improve query throughput with concurrent query processing by leveraging new GPU resource-allocation capabilities, such as Multi-Instance GPU (MIG).<\/jats:p><jats:p>In this paper, we do a cross-stack performance and resource-utilization analysis of four GPU database systems, including Crystal (the state-of-the-art GPU database, performance-wise) and TQP (the latest entry in the GPU database space). We evaluate the bottlenecks of each system through an in-depth microarchitectural study and identify resource underutilization by leveraging the classic roofline model. 
Based on the insights gained from our investigation, we propose optimizations for both system implementation and resource allocation, using which we are able to achieve 1.9x lower latency for single-query execution and up to 6.5x throughput improvement for concurrent query execution.<\/jats:p>","DOI":"10.14778\/3632093.3632107","type":"journal-article","created":{"date-parts":[[2024,1,20]],"date-time":"2024-01-20T11:26:31Z","timestamp":1705749991000},"page":"441-454","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":28,"title":["GPU Database Systems Characterization and Optimization"],"prefix":"10.14778","volume":"17","author":[{"given":"Jiashen","family":"Cao","sequence":"first","affiliation":[{"name":"Georgia Tech"}]},{"given":"Rathijit","family":"Sen","sequence":"additional","affiliation":[{"name":"Microsoft GSL"}]},{"given":"Matteo","family":"Interlandi","sequence":"additional","affiliation":[{"name":"Microsoft GSL"}]},{"given":"Joy","family":"Arulraj","sequence":"additional","affiliation":[{"name":"Georgia Tech"}]},{"given":"Hyesoon","family":"Kim","sequence":"additional","affiliation":[{"name":"Georgia Tech"}]}],"member":"320","published-online":{"date-parts":[[2024,1,20]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"2017. PCIe 4.0 specification finally out with 16 GT\/s on tap. [Online] Available from: https:\/\/techreport.com\/news\/32064\/pcie-4-0-specification-finally-out-with-16-gts-on-tap\/."},{"key":"e_1_2_1_2_1","unstructured":"2019. PCI-SIG Achieves 32GT\/s with New PCI Express 5.0 Specification. [Online] Available from: https:\/\/www.businesswire.com\/news\/home\/20190529005766\/en\/PCI-SIG%C2%AE-Achieves-32GTs-with-New-PCI-Express%C2%AE-5.0-Specification."},{"key":"e_1_2_1_3_1","unstructured":"2022. PCI-SIG Announces PCI Express 7.0 Specification to Reach 128 GT\/s. 
[Online] Available from: https:\/\/www.businesswire.com\/news\/home\/20220621005137\/en."},{"key":"e_1_2_1_4_1","unstructured":"2022. PCI-SIG Releases PCIe 6.0 Specification Delivering Record Performance to Power Big Data Applications. [Online] Available from: https:\/\/www.businesswire.com\/news\/home\/20220111005011\/en\/PCI-SIG%C2%AE-Releases-PCIe%C2%AE-6.0-Specification-Delivering-Record-Performance-to-Power-Big-Data-Applications."},{"key":"e_1_2_1_5_1","volume-title":"DBMSs On A Modern Processor: Where Does Time Go? PVLDB","author":"Ailamaki Anastassia","year":"1999","unstructured":"Anastassia Ailamaki, David J DeWitt, Mark D Hill, and David A Wood. 1999. DBMSs On A Modern Processor: Where Does Time Go? PVLDB (1999)."},{"key":"e_1_2_1_6_1","doi-asserted-by":"crossref","unstructured":"Michael Armbrust Reynold S. Xin Cheng Lian Yin Huai Davies Liu Joseph K. Bradley Xiangrui Meng Tomer Kaftan Michael J. Franklin Ali Ghodsi and Matei Zaharia. 2015. Spark SQL: Relational Data Processing in Spark. In SIGMOD. 1383--1394.","DOI":"10.1145\/2723372.2742797"},{"key":"e_1_2_1_7_1","unstructured":"Rob Armstrong Arthy Sundaram and Fred Oh. 2021. Revealing New Features in the CUDA 11.5 Toolkit. [Online] Available from: https:\/\/developer.nvidia.com\/blog\/revealing-new-features-in-the-cuda-11-5-toolkit\/."},{"key":"e_1_2_1_8_1","volume-title":"Share the tensor tea: how databases can leverage the machine learning ecosystem. PVLDB","author":"Asada Yuki","year":"2022","unstructured":"Yuki Asada, Victor Fu, Apurva Gandhi, Advitya Gemawat, Lihao Zhang, Dong He, Vivek Gupta, Ehi Nosakhare, Dalitso Banda, Rathijit Sen, and Matteo Interlandi. 2022. Share the tensor tea: how databases can leverage the machine learning ecosystem. PVLDB (2022), 3598--3601."},{"key":"e_1_2_1_9_1","doi-asserted-by":"crossref","unstructured":"Sara S Baghsorkhi Matthieu Delahaye Sanjay J Patel William D Gropp and Wen-mei W Hwu. 2010. An Adaptive Performance Modeling Tool for GPU Architectures. In PPoPP. 
10.","DOI":"10.1145\/1693453.1693470"},{"key":"e_1_2_1_10_1","unstructured":"Peter Bakkum and Srimat Chakradhar. 2010. Efficient Data Management for GPU Databases. https:\/\/github.com\/bakks\/virginian\/."},{"key":"e_1_2_1_11_1","unstructured":"BlazingSQL. 2021. BlazingSQL. https:\/\/github.com\/BlazingDB\/blazingsql."},{"key":"e_1_2_1_12_1","doi-asserted-by":"crossref","unstructured":"Nils Boeschen and Carsten Binnig. 2022. GaccO - A GPU-accelerated OLTP DBMS. In SIGMOD. 1003--1016.","DOI":"10.1145\/3514221.3517876"},{"key":"e_1_2_1_13_1","volume-title":"HetExchange: Encapsulating Heterogeneous CPU-GPU Parallelism in JIT Compiled Engines. PVLDB","author":"Chrysogelos Periklis","year":"2019","unstructured":"Periklis Chrysogelos, Manos Karpathiotakis, Raja Appuswamy, and Anastasia Ailamaki. 2019. HetExchange: Encapsulating Heterogeneous CPU-GPU Parallelism in JIT Compiled Engines. PVLDB (2019), 544--556."},{"key":"e_1_2_1_14_1","doi-asserted-by":"crossref","unstructured":"Nan Ding and Samuel Williams. 2019. An Instruction Roofline Model for GPUs. In PMBS. 7--18.","DOI":"10.1109\/PMBS49563.2019.00007"},{"key":"e_1_2_1_15_1","doi-asserted-by":"crossref","unstructured":"Harish Doraiswamy and Juliana Freire. 2020. A GPU-friendly Geometric Data Model and Algebra for Spatial Queries. In SIGMOD. 1875--1885.","DOI":"10.1145\/3318464.3389774"},{"key":"e_1_2_1_16_1","doi-asserted-by":"crossref","unstructured":"Sofoklis Floratos Mengbai Xiao Hao Wang Chengxin Guo Yuan Yuan Rubao Lee and Xiaodong Zhang. 2021. NestGPU: Nested Query Processing on GPU. In ICDE. 1008--1019.","DOI":"10.1109\/ICDE51399.2021.00092"},{"key":"e_1_2_1_17_1","doi-asserted-by":"crossref","unstructured":"Henning Funke Sebastian Bre\u00df Stefan Noll Volker Markl and Jens Teubner. 2018. Pipelined Query Processing in Coprocessor Environments. In SIGMOD. 1603--1618.","DOI":"10.1145\/3183713.3183734"},{"key":"e_1_2_1_18_1","volume-title":"Data-Parallel Query Processing on Non-Uniform Data. 
PVLDB","author":"Funke Henning","year":"2020","unstructured":"Henning Funke and Jens Teubner. 2020. Data-Parallel Query Processing on Non-Uniform Data. PVLDB (2020), 884--897."},{"key":"e_1_2_1_19_1","doi-asserted-by":"crossref","unstructured":"Emily Furst Mark Oskin and Bill Howe. 2017. Profiling a GPU database implementation: a holistic view of GPU resource utilization on TPC-H queries. In DaMON. 1--6.","DOI":"10.1145\/3076113.3076119"},{"key":"e_1_2_1_20_1","volume-title":"Jes\u00fas Camacho-Rodr\u00edguez, and Matteo Interlandi.","author":"Gandhi Apurva","year":"2022","unstructured":"Apurva Gandhi, Yuki Asada, Victor Fu, Advitya Gemawat, Lihao Zhang, Rathijit Sen, Carlo Curino, Jes\u00fas Camacho-Rodr\u00edguez, and Matteo Interlandi. 2022. The Tensor Data Platform: Towards an AI-centric Database System. In CIDR."},{"key":"e_1_2_1_21_1","volume-title":"Jes\u00fas Camacho-Rodr\u00edguez, Konstantinos Karanasos, and Matteo Interlandi.","author":"He Dong","year":"2022","unstructured":"Dong He, Supun C Nakandala, Dalitso Banda, Rathijit Sen, Karla Saur, Kwanghyun Park, Carlo Curino, Jes\u00fas Camacho-Rodr\u00edguez, Konstantinos Karanasos, and Matteo Interlandi. 2022. Query Processing on Tensor Computation Runtimes. PVLDB (2022), 2811--2825."},{"key":"e_1_2_1_22_1","unstructured":"HeavyDB. 2022. HeavyDB. https:\/\/github.com\/heavyai\/heavydb."},{"key":"e_1_2_1_23_1","volume-title":"Hardware-Oblivious Parallelism for in-Memory Column-Stores. PVLDB","author":"Heimel Max","year":"2013","unstructured":"Max Heimel, Michael Saecker, Holger Pirk, Stefan Manegold, and Volker Markl. 2013. Hardware-Oblivious Parallelism for in-Memory Column-Stores. PVLDB (2013), 709--720."},{"key":"e_1_2_1_24_1","volume-title":"Gables: A Roofline Model for Mobile SoCs. In HPCA. 317--330.","author":"Hill Mark","year":"2019","unstructured":"Mark Hill and Vijay Janapa Reddi. 2019. Gables: A Roofline Model for Mobile SoCs. In HPCA. 
317--330."},{"key":"e_1_2_1_25_1","doi-asserted-by":"crossref","unstructured":"Sunpyo Hong and Hyesoon Kim. 2009. An Analytical Model for a GPU Architecture with Memory-Level and Thread-Level Parallelism Awareness. In ISCA.","DOI":"10.1145\/1555754.1555775"},{"key":"e_1_2_1_26_1","volume-title":"Cache-Aware Roofline Model: Upgrading the Loft","author":"Ilic Aleksandar","year":"2014","unstructured":"Aleksandar Ilic, Frederico Pratas, and Leonel Sousa. 2014. Cache-Aware Roofline Model: Upgrading the Loft. IEEE CAL (2014), 21--24."},{"key":"e_1_2_1_27_1","volume-title":"Ties Robroek, and P\u0131nar T\u00f6z\u00fcn.","author":"Kaas Anders Friis","year":"2022","unstructured":"Anders Friis Kaas, Stilyan Petrov Paleykov, Ties Robroek, and P\u0131nar T\u00f6z\u00fcn. 2022. Deep Learning Training on Multi-Instance GPUs. arXiv:2209.06018 [cs.LG]"},{"key":"e_1_2_1_28_1","unstructured":"KaiGai Kohei. 2022. PG-Strom. https:\/\/github.com\/heterodb\/pg-strom."},{"key":"e_1_2_1_29_1","doi-asserted-by":"crossref","unstructured":"Alexander Krolik Clark Verbrugge and Laurie Hendren. 2021. R3d3: Optimized Query Compilation on GPUs. In CGO. 277--288.","DOI":"10.1109\/CGO51591.2021.9370323"},{"key":"e_1_2_1_30_1","volume-title":"LLVM: A Compilation Framework for Lifelong Program Analysis and Transformation. In CGO. 75--88.","author":"Lattner Chris","year":"2004","unstructured":"Chris Lattner and Vikram Adve. 2004. LLVM: A Compilation Framework for Lifelong Program Analysis and Transformation. In CGO. 75--88."},{"key":"e_1_2_1_31_1","volume-title":"Balancing I\/O and GPU Bandwidth in Big Data Analytics. PVLDB","author":"Li Jing","year":"2016","unstructured":"Jing Li, Hung-Wei Tseng, Chunbin Lin, Yannis Papakonstantinou, and Steven Swanson. 2016. HippogriffDB: Balancing I\/O and GPU Bandwidth in Big Data Analytics. PVLDB (2016), 1647--1658."},{"key":"e_1_2_1_32_1","doi-asserted-by":"crossref","unstructured":"Andr\u00e9 Lopes Frederico Pratas Leonel Sousa and Aleksandar Ilic. 2017. 
Exploring GPU Performance Power and Energy-Efficiency Bounds with Cache-aware Roofline Modeling. In ISPASS. 259--268.","DOI":"10.1109\/ISPASS.2017.7975297"},{"key":"e_1_2_1_33_1","doi-asserted-by":"crossref","unstructured":"Clemens Lutz Sebastian Bre\u00df Steffen Zeuch Tilmann Rabl and Volker Markl. 2020. Pump Up the Volume: Processing Large Data on GPUs with Fast Interconnects. In SIGMOD. 1633--1649.","DOI":"10.1145\/3318464.3389705"},{"key":"e_1_2_1_34_1","volume-title":"Triton Join: Efficiently Scaling to a Large Join State on GPUs with Fast Interconnects. In SIGMOD. 1017--1032.","author":"Lutz Clemens","year":"2022","unstructured":"Clemens Lutz, Sebastian Bre\u00df, Steffen Zeuch, Tilmann Rabl, and Volker Markl. 2022. Triton Join: Efficiently Scaling to a Large Join State on GPUs with Fast Interconnects. In SIGMOD. 1017--1032."},{"key":"e_1_2_1_35_1","doi-asserted-by":"crossref","unstructured":"Tobias Maltenberger Ivan Ilic Ilin Tolovski and Tilmann Rabl. 2022. Evaluating Multi-GPU Sorting with Modern Interconnects. In SIGMOD. 1795--1809.","DOI":"10.1145\/3514221.3517842"},{"key":"e_1_2_1_36_1","unstructured":"Lei Mao. 2021. Math-Bound VS Memory-Bound Operations. https:\/\/leimao.github.io\/blog\/Math-Bound-VS-Memory-Bound-Operations\/."},{"key":"e_1_2_1_37_1","volume-title":"Efficiently Compiling Efficient Query Plans for Modern Hardware. PVLDB","author":"Neumann Thomas","year":"2011","unstructured":"Thomas Neumann. 2011. Efficiently Compiling Efficient Query Plans for Modern Hardware. PVLDB (2011), 539--550."},{"key":"e_1_2_1_38_1","unstructured":"NVIDIA. 2016. nvidia-smi Documentation. [Online] Available from: https:\/\/developer.download.nvidia.com\/compute\/DCGM\/docs\/nvidia-smi-367.38.pdf."},{"key":"e_1_2_1_39_1","unstructured":"NVIDIA. 2020. NVIDIA A100 TENSOR CORE GPU Unprecedented Acceleration at Every Scale. 
[Online] Available from: https:\/\/www.nvidia.com\/content\/dam\/en-zz\/Solutions\/Data-Center\/a100\/pdf\/nvidia-a100-datasheet-nvidia-us-2188504-web.pdf."},{"key":"e_1_2_1_40_1","unstructured":"NVIDIA. 2021. NVIDIA Multi-Process Service Introduction. [Online] Available from: https:\/\/docs.nvidia.com\/deploy\/mps\/index.html."},{"key":"e_1_2_1_41_1","unstructured":"NVIDIA. 2022. NVIDIA Multi-Instance GPU. [Online] Available from: https:\/\/www.nvidia.com\/en-us\/technologies\/multi-instance-gpu\/."},{"key":"e_1_2_1_42_1","unstructured":"NVIDIA. 2022. NVIDIA Multi-Instance GPU User Guide. [Online] Available from: https:\/\/docs.nvidia.com\/datacenter\/tesla\/mig-user-guide\/."},{"key":"e_1_2_1_43_1","unstructured":"NVIDIA. 2022. NVIDIA NSight Systems User Guide. [Online] Available from: https:\/\/docs.nvidia.com\/nsight-systems\/UserGuide\/index.html."},{"key":"e_1_2_1_44_1","unstructured":"NVIDIA. 2022. Parallel Thread Execution ISA Version 7.8. [Online] Available from: https:\/\/docs.nvidia.com\/cuda\/parallel-thread-execution\/index.html."},{"key":"e_1_2_1_45_1","unstructured":"NVIDIA. 2022. Thrust. [Online] Available from: https:\/\/docs.nvidia.com\/cuda\/thrust\/index.html."},{"key":"e_1_2_1_46_1","doi-asserted-by":"crossref","unstructured":"Georg Ofenbeck Ruedi Steinmann Victoria Caparros Daniele G. Spampinato and Markus P\u00fcschel. 2014. Applying the Roofline Model. In ISPASS. 76--85.","DOI":"10.1109\/ISPASS.2014.6844463"},{"key":"e_1_2_1_47_1","doi-asserted-by":"crossref","unstructured":"Patrick O'Neil Elizabeth O'Neil Xuedong Chen and Stephen Revilak. 2009. The Star Schema Benchmark and Augmented Fact Table Indexing. In TPCTC. 
237--252.","DOI":"10.1007\/978-3-642-10424-4_17"},{"key":"e_1_2_1_48_1","volume-title":"PyTorch: An Imperative Style","author":"Paszke Adam","unstructured":"Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. 2019. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In NeurIPS. 8024--8035."},{"key":"e_1_2_1_49_1","volume-title":"Improving Execution Efficiency of Just-in-Time Compilation Based Query Processing on GPUs. PVLDB","author":"Paul Johns","year":"2020","unstructured":"Johns Paul, Bingsheng He, Shengliang Lu, and Chiew Tong Lau. 2020. Improving Execution Efficiency of Just-in-Time Compilation Based Query Processing on GPUs. PVLDB (2020), 202--214."},{"key":"e_1_2_1_50_1","volume-title":"GPL: A GPU-based Pipelined Query Processing Engine. In SIGMOD. 1935--1950.","author":"Paul Johns","year":"2016","unstructured":"Johns Paul, Jiong He, and Bingsheng He. 2016. GPL: A GPU-based Pipelined Query Processing Engine. In SIGMOD. 1935--1950."},{"key":"e_1_2_1_51_1","doi-asserted-by":"crossref","unstructured":"Johns Paul Shengliang Lu Bingsheng He and Chiew Tong Lau. 2021. MG-Join: A Scalable Join for Massively Parallel Multi-GPU Architectures. In SIGMOD. 1413--1425.","DOI":"10.1145\/3448016.3457254"},{"key":"e_1_2_1_52_1","doi-asserted-by":"crossref","unstructured":"Tilmann Rabl Meikel Poess Hans-Arno Jacobsen Patrick O'Neil and Elizabeth O'Neil. 2013. Variations of the Star Schema Benchmark to Test the Effects of Data Skew on Query Performance. In ICPE. 361.","DOI":"10.1145\/2479871.2479927"},{"key":"e_1_2_1_53_1","volume-title":"Query Processing on Heterogeneous CPU\/GPU Systems. Comput. 
Surveys","author":"Rosenfeld Viktor","year":"2023","unstructured":"Viktor Rosenfeld, Sebastian Bre\u00df, and Volker Markl. 2023. Query Processing on Heterogeneous CPU\/GPU Systems. Comput. Surveys (2023), 1--38."},{"key":"e_1_2_1_54_1","doi-asserted-by":"crossref","unstructured":"Rathijit Sen and Karthik Ramachandra. 2018. Characterizing resource sensitivity of database workloads. In HPCA. 657--669.","DOI":"10.1109\/HPCA.2018.00062"},{"key":"e_1_2_1_55_1","doi-asserted-by":"crossref","unstructured":"Rathijit Sen and Yuanyuan Tian. 2023. Microarchitectural Analysis of Graph BI Queries on RDBMS. In DaMoN. 102--106.","DOI":"10.1145\/3592980.3595321"},{"key":"e_1_2_1_56_1","unstructured":"Anil Shanbhag. 2020. Crystal GPU Library. https:\/\/github.com\/anilshanbhag\/crystal."},{"key":"e_1_2_1_57_1","doi-asserted-by":"crossref","unstructured":"Anil Shanbhag Samuel Madden and Xiangyao Yu. 2020. A Study of the Fundamental Performance Characteristics of GPUs and CPUs for Database Analytics. In SIGMOD. 1617--1632.","DOI":"10.1145\/3318464.3380595"},{"key":"e_1_2_1_58_1","doi-asserted-by":"crossref","unstructured":"Anil Shanbhag Bobbi W. Yogatama Xiangyao Yu and Samuel Madden. 2022. Tile-Based Lightweight Integer Compression in GPU. In SIGMOD. 1390--1403.","DOI":"10.1145\/3514221.3526132"},{"key":"e_1_2_1_59_1","unstructured":"Jian Shen Ze Wang David Wang Jeremy Shi and Steven Chen. 2019. AresDB. https:\/\/github.com\/uber\/aresdb."},{"key":"e_1_2_1_60_1","doi-asserted-by":"crossref","unstructured":"P. Sioulas P. Chrysogelos M. Karpathiotakis R. Appuswamy and A. Ailamaki. 2019. Hardware-Conscious Hash-Joins on GPUs. In ICDE. 698--709.","DOI":"10.1109\/ICDE.2019.00068"},{"key":"e_1_2_1_61_1","volume-title":"Micro-Architectural Analysis of OLAP: Limitations and Opportunities. PVLDB","author":"Sirin Utku","year":"2020","unstructured":"Utku Sirin and Anastasia Ailamaki. 2020. Micro-Architectural Analysis of OLAP: Limitations and Opportunities. 
PVLDB (2020), 840--853."},{"key":"e_1_2_1_62_1","volume-title":"A Comprehensive Empirical Study of Query Performance Across GPU DBMSes. SIGMETRICS","author":"Suh Young-Kyoon","year":"2022","unstructured":"Young-Kyoon Suh, Junyoung An, Byungchul Tak, and Gap-Joo Na. 2022. A Comprehensive Empirical Study of Query Performance Across GPU DBMSes. SIGMETRICS (2022), 1--29."},{"key":"e_1_2_1_63_1","unstructured":"Cheng Tan Zhichao Li Jian Zhang Yu Cao Sikai Qi Zherui Liu Yibo Zhu and Chuanxiong Guo. 2021. Serving DNN Models with Multi-Instance GPUs: A Case of the Reconfigurable Machine Scheduling Problem. arXiv:2109.11067 [cs.DC]"},{"key":"e_1_2_1_64_1","volume-title":"RAPIDS: Collection of Libraries for End to End GPU Data Science. https:\/\/rapids.ai","author":"Development Team RAPIDS","year":"2018","unstructured":"RAPIDS Development Team. 2018. RAPIDS: Collection of Libraries for End to End GPU Data Science. https:\/\/rapids.ai"},{"key":"e_1_2_1_65_1","volume-title":"Roofline: An Insightful Visual Performance Model for Multicore Architectures. Commun. ACM","author":"Williams Samuel","year":"2009","unstructured":"Samuel Williams, Andrew Waterman, and David Patterson. 2009. Roofline: An Insightful Visual Performance Model for Multicore Architectures. Commun. ACM (2009), 65--76."},{"key":"e_1_2_1_66_1","unstructured":"Gene Wu Joseph L. Greathouse Alexander Lyashevsky Nuwan Jayasena and Derek Chiou. 2015. GPGPU Performance and Power Estimation Using Machine Learning. In HPCA. 564--576."},{"key":"e_1_2_1_67_1","volume-title":"Red Fox: An Execution Environment for Relational Query Processing on GPUs. In CGO. 44--54.","author":"Wu Haicheng","year":"2014","unstructured":"Haicheng Wu, Gregory Diamos, Tim Sheard, Molham Aref, Sean Baxter, Michael Garland, and Sudhakar Yalamanchili. 2014. Red Fox: An Execution Environment for Relational Query Processing on GPUs. In CGO. 
44--54."},{"key":"e_1_2_1_68_1","volume-title":"Orchestrating Data Placement and Query Execution in Heterogeneous CPU-GPU DBMS. PVLDB","author":"Yogatama Bobbi W.","year":"2022","unstructured":"Bobbi W. Yogatama, Weiwei Gong, and Xiangyao Yu. 2022. Orchestrating Data Placement and Query Execution in Heterogeneous CPU-GPU DBMS. PVLDB (2022), 2491--2503."},{"key":"e_1_2_1_69_1","unstructured":"Fuxun Yu Di Wang Longfei Shangguan Minjia Zhang Chenchen Liu and Xiang Chen. 2022. A Survey of Multi-Tenant Deep Learning Inference on GPU. arXiv:2203.09040 [cs.DC]"},{"key":"e_1_2_1_70_1","volume-title":"The Yin and Yang of Processing Data Warehousing Queries on GPU Devices. PVLDB","author":"Yuan Yuan","year":"2013","unstructured":"Yuan Yuan, Rubao Lee, and Xiaodong Zhang. 2013. The Yin and Yang of Processing Data Warehousing Queries on GPU Devices. PVLDB (2013), 817--828."},{"key":"e_1_2_1_71_1","volume-title":"Owens","author":"Zhang Yao","year":"2011","unstructured":"Yao Zhang and John D. Owens. 2011. A Quantitative Performance Analysis Model for GPU Architectures. In HPCA. 
382--393."}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3632093.3632107","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,11,8]],"date-time":"2024-11-08T17:59:21Z","timestamp":1731088761000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3632093.3632107"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,11]]},"references-count":71,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2023,11]]}},"alternative-id":["10.14778\/3632093.3632107"],"URL":"https:\/\/doi.org\/10.14778\/3632093.3632107","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2023,11]]},"assertion":[{"value":"2024-01-20","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}