{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,8]],"date-time":"2026-04-08T08:54:53Z","timestamp":1775638493607,"version":"3.50.1"},"reference-count":38,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2023,5,26]],"date-time":"2023-05-26T00:00:00Z","timestamp":1685059200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001659","name":"Deutsche Forschungsgemeinschaft","doi-asserted-by":"publisher","award":["BI2011\/1,BI2011\/2,SFB1053"],"award-info":[{"award-number":["BI2011\/1,BI2011\/2,SFB1053"]}],"id":[{"id":"10.13039\/501100001659","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Manag. Data"],"published-print":{"date-parts":[[2023,5,26]]},"abstract":"<jats:p>In this paper, we present a novel pipelined GPU join that accelerates the performance of distributed DBMSs by leveraging GPU resources on fast networks. A key insight is that we enable pipelined join execution by overlapping the network shuffling with the build and probe phases, thereby significantly reducing the GPU idle time. To demonstrate this, we propose novel algorithms for distributed pipelined GPU joins with RDMA and GPUDirect for both arbitrarily large probe- and build-side tables. In our evaluation, we show our pipelined distributed GPU join can reduce the overall runtime of a full query by up to 6\u00d7 against a state-of-the-art CPU-only join.<\/jats:p>","DOI":"10.1145\/3588709","type":"journal-article","created":{"date-parts":[[2023,5,30]],"date-time":"2023-05-30T17:42:05Z","timestamp":1685468525000},"page":"1-26","source":"Crossref","is-referenced-by-count":15,"title":["Distributed GPU Joins on Fast RDMA-capable Networks"],"prefix":"10.1145","volume":"1","author":[{"ORCID":"https:\/\/orcid.org\/0009-0006-7243-5507","authenticated-orcid":false,"given":"Lasse","family":"Thostrup","sequence":"first","affiliation":[{"name":"Technical University of Darmstadt, Darmstadt, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-2193-2245","authenticated-orcid":false,"given":"Gloria","family":"Doci","sequence":"additional","affiliation":[{"name":"Snowflake, Berlin, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8654-5738","authenticated-orcid":false,"given":"Nils","family":"Boeschen","sequence":"additional","affiliation":[{"name":"Technical University of Darmstadt, Darmstadt, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3788-6664","authenticated-orcid":false,"given":"Manisha","family":"Luthra","sequence":"additional","affiliation":[{"name":"Technical University of Darmstadt &amp; DFKI, Darmstadt, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2744-7836","authenticated-orcid":false,"given":"Carsten","family":"Binnig","sequence":"additional","affiliation":[{"name":"Technical University of Darmstadt &amp; DFKI, Darmstadt, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2023,5,30]]},"reference":[{"key":"e_1_2_2_1_1","doi-asserted-by":"publisher","DOI":"10.14778\/2732219.2732227"},{"key":"e_1_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2013.6544839"},{"key":"e_1_2_2_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2750547"},{"key":"e_1_2_2_4_1","doi-asserted-by":"publisher","DOI":"10.14778\/3055540.3055545"},{"key":"e_1_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.14778\/2904483.2904485"},{"key":"e_1_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/2882903.2882936"},{"key":"e_1_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.14778\/2536274.2536325"},{"key":"e_1_2_2_8_1","volume-title":"USENIX NSDI (Seattle, WA) (NSDI'14)","author":"Dragojevi\u0107 Aleksandar","unstructured":"Aleksandar Dragojevi\u0107, Dushyanth Narayanan, Orion Hodson, and Miguel Castro. 2014. FaRM: Fast Remote Memory. In USENIX NSDI (Seattle, WA) (NSDI'14). USENIX Association, USA, 401--414."},{"key":"e_1_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.14778\/3389133.3389138"},{"key":"e_1_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE48307.2020.00131"},{"key":"e_1_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/1565694.1565701"},{"key":"e_1_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDCS.2010.23"},{"key":"e_1_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/3183713.3183734"},{"key":"e_1_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/3337821.3337862"},{"key":"e_1_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/CLUSTER.2015.21"},{"key":"e_1_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/1620585.1620588"},{"key":"e_1_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/1376616.1376670"},{"key":"e_1_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.14778\/2536206.2536216"},{"key":"e_1_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/2236584.2236592"},{"key":"e_1_2_2_20_1","volume-title":"Andersen","author":"Kalia Anuj","year":"2016","unstructured":"Anuj Kalia, Michael Kaminsky, and David G. Andersen. 2016. FaSST: Fast, Scalable and Simple Distributed Transactions with Two-Sided (RDMA) Datagram RPCs. In USENIX OSDI (Savannah, GA, USA) (OSDI'16). USENIX Association, USA, 185--201."},{"key":"e_1_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3389705"},{"key":"e_1_2_2_22_1","volume-title":"Triton Join: Efficiently Scaling the Operator State on GPUs with Fast Interconnects. In ACM SIGMOD.","author":"Lutz Clemens","year":"2022","unstructured":"Clemens Lutz, Sebastian Bre\u00df, Steffen Zeuch, Tilmann Rabl, and Volker Markl. 2022. Triton Join: Efficiently Scaling the Operator State on GPUs with Fast Interconnects. In ACM SIGMOD."},{"key":"e_1_2_2_23_1","unstructured":"NVIDIA. 2021a. GPUDirect RDMA. NVIDIA. https:\/\/developer.nvidia.com\/gpudirect"},{"key":"e_1_2_2_24_1","unstructured":"NVIDIA. 2021b. GPUDirect RDMA Design Considerations - Synchronization and Memory Ordering. NVIDIA. https:\/\/docs.nvidia.com\/cuda\/gpudirect-rdma\/index.html#sync-behavior"},{"key":"e_1_2_2_25_1","unstructured":"NVIDIA. 2021c. Mellanox OFED GPUDirect RDMA. NVIDIA. https:\/\/www.mellanox.com\/products\/GPUDirect-RDMA"},{"key":"e_1_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3457254"},{"key":"e_1_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/BigData.2015.7364051"},{"key":"e_1_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.14778\/3436905.3436927"},{"key":"e_1_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2016.7498324"},{"key":"e_1_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/1807167.1807207"},{"key":"e_1_2_2_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3380595"},{"key":"e_1_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2019.00068"},{"key":"e_1_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3452816"},{"key":"e_1_2_2_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/2581122.2544166"},{"key":"e_1_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.14778\/2536206.2536210"},{"key":"e_1_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.14778\/3055330.3055335"},{"key":"e_1_2_2_37_1","doi-asserted-by":"publisher","DOI":"10.1007\/s13222-020-00355-7"},{"key":"e_1_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/3299869.3300081"}],"container-title":["Proceedings of the ACM on Management of Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3588709","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3588709","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:47:14Z","timestamp":1750178834000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3588709"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,5,26]]},"references-count":38,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2023,5,26]]}},"alternative-id":["10.1145\/3588709"],"URL":"https:\/\/doi.org\/10.1145\/3588709","relation":{},"ISSN":["2836-6573"],"issn-type":[{"value":"2836-6573","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,5,26]]}}}