{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,1]],"date-time":"2026-06-01T20:34:56Z","timestamp":1780346096770,"version":"3.54.1"},"reference-count":66,"publisher":"Association for Computing Machinery (ACM)","issue":"11","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2025,7]]},"abstract":"<jats:p>There has been considerable interest in leveraging GPUs' computational power and high memory bandwidth for analytical database workloads. However, their limited memory capacity remains a fundamental limitation for databases whose sizes far exceed the GPU memory size. This challenge is exacerbated by the slow PCIe data transfer speed, which creates a bottleneck in overall system performance. In this work, we introduce a hybrid CPU-GPU query processing strategy that leverages the distinct strengths of CPU and GPU to alleviate the data transfer bottleneck. Our approach performs highly efficient data filtering on the CPU, which substantially reduces the volume of data transferred to the GPU via PCIe, and offloads compute-intensive operators such as joins to the GPU for further processing. Our evaluation on the TPC-H benchmark at scale factors up to 1000 (1TB), using a single A100 GPU with 80GB memory, demonstrates that our approach can effectively handle datasets significantly larger than the GPU memory size. 
Moreover, it substantially outperforms a state-of-the-art CPU-only database system in both performance and cost-effectiveness.<\/jats:p>","DOI":"10.14778\/3749646.3749710","type":"journal-article","created":{"date-parts":[[2025,9,4]],"date-time":"2025-09-04T17:55:06Z","timestamp":1757008506000},"page":"4518-4531","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["Scaling GPU-Accelerated Databases Beyond GPU Memory Size"],"prefix":"10.14778","volume":"18","author":[{"given":"Yinan","family":"Li","sequence":"first","affiliation":[{"name":"Microsoft"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Bailu","family":"Ding","sequence":"additional","affiliation":[{"name":"Microsoft"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Ziyun","family":"Wei","sequence":"additional","affiliation":[{"name":"Cornell University"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Lukas M.","family":"Maas","sequence":"additional","affiliation":[{"name":"Microsoft"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Momin","family":"Al-Ghosien","sequence":"additional","affiliation":[{"name":"Microsoft"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Spyros","family":"Blanas","sequence":"additional","affiliation":[{"name":"The Ohio State 
University"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Nicolas","family":"Bruno","sequence":"additional","affiliation":[{"name":"Microsoft"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Carlo","family":"Curino","sequence":"additional","affiliation":[{"name":"Microsoft"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Matteo","family":"Interlandi","sequence":"additional","affiliation":[{"name":"Microsoft"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Craig","family":"Peeper","sequence":"additional","affiliation":[{"name":"Microsoft"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Kaushik","family":"Rajan","sequence":"additional","affiliation":[{"name":"Microsoft"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Surajit","family":"Chaudhuri","sequence":"additional","affiliation":[{"name":"Microsoft"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Johannes","family":"Gehrke","sequence":"additional","affiliation":[{"name":"Microsoft"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2025,9,4]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"[n.d.]. TPC Benchmark H Standard Specification. Revision 2.17.1. https:\/\/www.tpc.org"},{"key":"e_1_2_1_2_1","volume-title":"Arm A64 Instruction Set Architecture: SVE Instructions. https:\/\/developer.arm.com\/documentation\/ddi0602\/2025-03\/SVE-Instructions. [Online","year":"2025","unstructured":"2025. Arm A64 Instruction Set Architecture: SVE Instructions. https:\/\/developer.arm.com\/documentation\/ddi0602\/2025-03\/SVE-Instructions. [Online; accessed July-2025]."},{"key":"e_1_2_1_3_1","volume-title":"Azure Virtual Machines Pricing. https:\/\/azure.microsoft.com\/en-us\/pricing\/details\/virtual-machines\/. [Online","year":"2025","unstructured":"2025. Azure Virtual Machines Pricing. 
https:\/\/azure.microsoft.com\/en-us\/pricing\/details\/virtual-machines\/. [Online; accessed Feb-2025]."},{"key":"e_1_2_1_4_1","volume-title":"https:\/\/github.com\/BlazingDB\/blazingsql. [Online","author":"SQL.","year":"2025","unstructured":"2025. BlazingSQL. https:\/\/github.com\/BlazingDB\/blazingsql. [Online; accessed Feb-2025]."},{"key":"e_1_2_1_5_1","volume-title":"https:\/\/www.heavy.ai\/product\/heavydb. [Online","author":"DB.","year":"2025","unstructured":"2025. HeavyDB. https:\/\/www.heavy.ai\/product\/heavydb. [Online; accessed Feb-2025]."},{"key":"e_1_2_1_6_1","volume-title":"Microsoft SQL Server. https:\/\/www.microsoft.com\/en-us\/sql-server. [Online","year":"2025","unstructured":"2025. Microsoft SQL Server. https:\/\/www.microsoft.com\/en-us\/sql-server. [Online; accessed Feb-2025]."},{"key":"e_1_2_1_7_1","volume-title":"NVIDIA A100 Tensor Core GPU. https:\/\/www.nvidia.com\/en-us\/data-center\/a100\/. [Online","year":"2025","unstructured":"2025. NVIDIA A100 Tensor Core GPU. https:\/\/www.nvidia.com\/en-us\/data-center\/a100\/. [Online; accessed Feb-2025]."},{"key":"e_1_2_1_8_1","volume-title":"NVIDIA H100 Tensor Core GPU. https:\/\/www.nvidia.com\/en-us\/data-center\/h100\/. [Online","year":"2025","unstructured":"2025. NVIDIA H100 Tensor Core GPU. https:\/\/www.nvidia.com\/en-us\/data-center\/h100\/. [Online; accessed Feb-2025]."},{"key":"e_1_2_1_9_1","volume-title":"RAPIDS Accelerator For Apache Spark. https:\/\/github.com\/NVIDIA\/spark-rapids. [Online","year":"2025","unstructured":"2025. RAPIDS Accelerator For Apache Spark. https:\/\/github.com\/NVIDIA\/spark-rapids. 
[Online; accessed Feb-2025]."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/1142473.1142548"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.14778\/3598581.3598587"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3662010.3663450"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.14778\/3554821.3554853"},{"key":"e_1_2_1_14_1","volume-title":"14th Conference on Innovative Data Systems Research, CIDR 2024","author":"Atwal R. J.","year":"2024","unstructured":"R. J. Atwal, Peter A. Boncz, Ryan Boyd, Antony Courtney, Till D\u00f6hmen, Florian Gerlinghoff, Jeff Huang, Joseph Hwang, Raphael Hyde, Elena Felder, Jacob Lacouture, Yves Le Maout, Boaz Leskes, Yao Liu, Alex Monahan, Dan Perkins, Tino Tereshko, Jordan Tigani, Nick Ursa, Stephanie Wang, and Yannick Welsch. 2024. MotherDuck: DuckDB in the cloud and in the client. In 14th Conference on Innovative Data Systems Research, CIDR 2024, Chaminade, HI, USA, January 14\u201317, 2024. www.cidrdb.org. https:\/\/www.cidrdb.org\/cidr2024\/papers\/p46-atwal.pdf"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/28659.28689"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1137\/0210059"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/362686.362692"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1007\/s13222-014-0164-z"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.14778\/3632093.3632107"},{"key":"e_1_2_1_20_1","volume-title":"Proceedings of the 19th International Conference on Very Large Data Bases (VLDB '93)","author":"Chen Ming-Syan","unstructured":"Ming-Syan Chen, Hui-I Hsiao, and Philip S. Yu. 1993. Applying Hash Filters to Improving the Execution of Bushy Trees. In Proceedings of the 19th International Conference on Very Large Data Bases (VLDB '93). 
Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 505\u2013516."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.14778\/3303753.3303760"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/3592980.3595313"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3389769"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/2674005.2674994"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.14778\/1920841.1920927"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2747642"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/3183713.3183734"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-12426-6_15"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/1620585.1620588"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/1376616.1376670"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.14778\/3551793.3551833"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.14778\/2735496.2735497"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.14778\/2536360.2536370"},{"key":"e_1_2_1_34_1","unstructured":"Intel Corporation. 2025. Intel\u00ae 64 and IA-32 Architectures Software Developer's Manual Volume 2: Instruction Set Reference. Available at https:\/\/www.intel.com\/content\/www\/us\/en\/developer\/articles\/technical\/intel-sdm.html."},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.14778\/1453856.1453925"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/2236584.2236592"},{"key":"e_1_2_1_37_1","volume-title":"The Art of Computer Programming","author":"Knuth Donald E.","unstructured":"Donald E. Knuth. 2009. The Art of Computer Programming, Volume 4, Fascicle 1: Bitwise Tricks & Techniques; Binary Decision Diagrams (12th ed.). 
Addison-Wesley Professional."},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/3472456.3472511"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/3589323"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/2463676.2465322"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/2882903.2903735"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.14778\/3503585.3503601"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.14778\/3425879.3425890"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3457254"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.14778\/3007328.3007336"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2747645"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/3485126"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.14778\/3436905.3436927"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/67544.66937"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3380595"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/3514221.3526132"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2019.00068"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/3035918.3064043"},{"key":"e_1_2_1_54_1","volume-title":"Zdonik","author":"Stonebraker Michael","year":"2005","unstructured":"Michael Stonebraker, Daniel J. Abadi, Adam Batkin, Xuedong Chen, Mitch Cherniack, Miguel Ferreira, Edmond Lau, Amerson Lin, Samuel Madden, Elizabeth J. O'Neil, Patrick E. O'Neil, Alex Rasin, Nga Tran, and Stanley B. Zdonik. 2005. C-Store: A Column-oriented DBMS. In VLDB. ACM, 553\u2013564. 
http:\/\/www.vldb.org\/archives\/website\/2005\/program\/paper\/thu\/p553-stonebraker.pdf"},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1145\/3685980.3685984"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1145\/3588709"},{"key":"e_1_2_1_57_1","volume-title":"Vectorizing Database Column Scans with Complex Predicates. In International Workshop on Accelerating Data Management Systems Using Modern Processor and Storage Architectures - ADMS 2013","author":"Willhalm Thomas","year":"2013","unstructured":"Thomas Willhalm, Ismail Oukid, Ingo M\u00fcller, and Franz Faerber. 2013. Vectorizing Database Column Scans with Complex Predicates. In International Workshop on Accelerating Data Management Systems Using Modern Processor and Storage Architectures - ADMS 2013, Riva del Garda, Trento, Italy, August 26, 2013. 1\u201312. http:\/\/www.adms-conf.org\/2013\/muller_adms13.pdf"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687627.1687671"},{"key":"e_1_2_1_59_1","volume-title":"Predicate Transfer: Efficient Pre-Filtering on Multi-Join Queries. In 14th Conference on Innovative Data Systems Research, CIDR 2024","author":"Yang Yifei","year":"2024","unstructured":"Yifei Yang, Hangdong Zhao, Xiangyao Yu, and Paraschos Koutris. 2024. Predicate Transfer: Efficient Pre-Filtering on Multi-Join Queries. In 14th Conference on Innovative Data Systems Research, CIDR 2024, Chaminade, HI, USA, January 14\u201317, 2024. www.cidrdb.org. https:\/\/www.cidrdb.org\/cidr2024\/papers\/p22-yang.pdf"},{"key":"e_1_2_1_60_1","volume-title":"Proceedings of the Seventh International Conference on Very Large Data Bases -","volume":"7","author":"Yannakakis Mihalis","year":"1981","unstructured":"Mihalis Yannakakis. 1981. Algorithms for acyclic database schemes. In Proceedings of the Seventh International Conference on Very Large Data Bases - Volume 7 (Cannes, France) (VLDB '81). 
VLDB Endowment, 82\u201394."},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.14778\/3704965.3704977"},{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.14778\/3551793.3551809"},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.14778\/2536206.2536210"},{"key":"e_1_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1145\/564691.564709"},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.14778\/3090163.3090167"},{"key":"e_1_2_1_66_1","volume-title":"Not Harder. Proc. ACM Manag. Data 3","author":"Zimmerer Andreas","year":"2025","unstructured":"Andreas Zimmerer, Damien Dam, Jan Kossmann, Juliane Waack, Ismail Oukid, and Andreas Kipf. 2025. Pruning in Snowflake: Working Smarter, Not Harder. Proc. ACM Manag. Data 3 (2025)."}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3749646.3749710","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,5]],"date-time":"2025-09-05T03:36:36Z","timestamp":1757043396000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3749646.3749710"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7]]},"references-count":66,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2025,7]]}},"alternative-id":["10.14778\/3749646.3749710"],"URL":"https:\/\/doi.org\/10.14778\/3749646.3749710","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2025,7]]},"assertion":[{"value":"2025-09-04","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}