{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T18:45:20Z","timestamp":1774982720523,"version":"3.50.1"},"reference-count":71,"publisher":"Association for Computing Machinery (ACM)","issue":"6","license":[{"start":{"date-parts":[[2024,12,18]],"date-time":"2024-12-18T00:00:00Z","timestamp":1734480000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Manag. Data"],"published-print":{"date-parts":[[2024,12,18]]},"abstract":"<jats:p>Recent advances in Dual In-line Memory Modules (DIMMs) allow DIMMs to support Processing-In-DIMM (PID) by placing In-DIMM Processors (IDPs) near their memory banks. Prior studies have shown that in-memory joins can benefit from PID by offloading their operations onto the IDPs and exploiting the high internal memory bandwidth of DIMMs. Aimed at evenly balancing the computational loads between the IDPs, the existing algorithms perform IDP-wise global partitioning on input tables and then make each IDP process a partition of the input tables. Unfortunately, we find that the existing PID join algorithms achieve low performance and scalability with skewed input tables. With skewed input tables, the IDP-wise global partitioning incurs imbalanced loads between the IDPs, making the IDPs remain idle until the heaviest-load IDP completes processing its partition. To fully exploit the IDPs for accelerating in-memory joins involving skewed input tables, therefore, we need a new PID join algorithm which achieves high skew resistance by mitigating the imbalanced inter-IDP loads. In this paper, we present SPID-Join, a skew-resistant PID join algorithm which exploits two parallelisms inherent in DIMM architectures, namely bank- and rank-level parallelisms. 
By replicating join keys across the banks within a rank and across ranks, SPID-Join significantly increases the internal memory bandwidth and computational throughput allocated to each join key, improving the load balance between the IDPs and accelerating join executions. SPID-Join exploits the bank- and the rank-level parallelisms to minimize join key replication overheads and support a wider range of join key replication ratios. Despite achieving high skew resistance, SPID-Join exhibits a trade-off between the join key replication ratio and the join execution latency, making the best-performing join key replication ratio depend on join and PID system configurations. We, therefore, augment SPID-Join with a cost model which identifies the best-performing join key replication ratio for given join and PID system configurations. By accurately modeling and scaling the IDPs' throughput and the inter-IDP communication bandwidth, the cost model accurately captures the impact of the join key replication ratio on SPID-Join. 
Our experimental results using eight UPMEM DIMMs, which collectively provide a total of 1,024 IDPs, show that SPID-Join achieves up to 10.38x faster join executions over PID-Join, the state-of-the-art PID join algorithm, with highly skewed input tables.<\/jats:p>","DOI":"10.1145\/3698827","type":"journal-article","created":{"date-parts":[[2024,12,20]],"date-time":"2024-12-20T16:40:35Z","timestamp":1734712835000},"page":"1-27","source":"Crossref","is-referenced-by-count":5,"title":["SPID-Join: A Skew-resistant Processing-in-DIMM Join Algorithm Exploiting the Bank- and Rank-level Parallelisms of DIMMs"],"prefix":"10.1145","volume":"2","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0501-794X","authenticated-orcid":false,"given":"Suhyun","family":"Lee","sequence":"first","affiliation":[{"name":"Yonsei University, Seoul, South Korea"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2665-6273","authenticated-orcid":false,"given":"Chaemin","family":"Lim","sequence":"additional","affiliation":[{"name":"Yonsei University, Seoul, South Korea"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2363-501X","authenticated-orcid":false,"given":"Jinwoo","family":"Choi","sequence":"additional","affiliation":[{"name":"Yonsei University, Seoul, South Korea"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1885-0578","authenticated-orcid":false,"given":"Heelim","family":"Choi","sequence":"additional","affiliation":[{"name":"Yonsei University, Seoul, South Korea"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-9254-922X","authenticated-orcid":false,"given":"Chan","family":"Lee","sequence":"additional","affiliation":[{"name":"Yonsei University, Seoul, South Korea"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3725-0380","authenticated-orcid":false,"given":"Yongjun","family":"Park","sequence":"additional","affiliation":[{"name":"Yonsei University, Seoul, South 
Korea"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0757-2725","authenticated-orcid":false,"given":"Kwanghyun","family":"Park","sequence":"additional","affiliation":[{"name":"Yonsei University, Seoul, South Korea"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0762-7901","authenticated-orcid":false,"given":"Hanjun","family":"Kim","sequence":"additional","affiliation":[{"name":"Yonsei University, Seoul, South Korea"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1015-9969","authenticated-orcid":false,"given":"Youngsok","family":"Kim","sequence":"additional","affiliation":[{"name":"Yonsei University, Seoul, South Korea"}]}],"member":"320","published-online":{"date-parts":[[2024,12,20]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/3085572"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.14778\/2732219.2732227"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2013.6544839"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3452831"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.14778\/2735496.2735499"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1007\/s13222-023-00456-z"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/3592980.3595323"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/3592980.3595312"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/2523616.2523626"},{"key":"e_1_2_1_10_1","volume-title":"Breaking the Memory Wall in MonetDB. Communications of the ACM (CACM)","author":"Boncz Peter A.","year":"2008","unstructured":"Peter A. Boncz, Martin L. Kersten, and Stefan Manegold. 2008. Breaking the Memory Wall in MonetDB. Communications of the ACM (CACM) (2008)."},{"key":"e_1_2_1_11_1","unstructured":"Peter A Boncz Stefan Manegold Martin L Kersten et al. 1999. Database architecture optimized for the new bottleneck: Memory access. 
In VLDB."},{"key":"e_1_2_1_12_1","volume-title":"The design and implementation of CoGaDB: A column-oriented GPU-accelerated DBMS. Datenbank-Spektrum","author":"Bre\u00df Sebastian","year":"2014","unstructured":"Sebastian Bre\u00df. 2014. The design and implementation of CoGaDB: A column-oriented GPU-accelerated DBMS. Datenbank-Spektrum (2014)."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2018.00163"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/DAC56929.2023.10247915"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/3035918.3035921"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/CCGrid.2014.35"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/2661829.2661888"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/SOCC56010.2022.9908126"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/HOTCHIPS.2019.8875680"},{"key":"e_1_2_1_20_1","volume-title":"Proc. 18th International Conference on Very Large Data Bases (VLDB).","author":"DeWitt David J.","unstructured":"David J. DeWitt, Jeffrey F. Naughton, Donovan A. Schneider, and S. Seshadri. 1992. Practical Skew Handling in Parallel Joins. In Proc. 18th International Conference on Very Large Data Bases (VLDB)."},{"key":"e_1_2_1_21_1","volume-title":"Bioinformatics","author":"Diab Safaa","year":"2023","unstructured":"Safaa Diab, Amir Nassereldine, Mohammed Alser, Juan G\u00f3mez Luna, Onur Mutlu, and Izzat El Hajj. 2023. A framework for high-throughput sequence alignment using real processing-in-memory systems. Bioinformatics (2023)."},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/PDP55904.2022.00028"},{"key":"e_1_2_1_23_1","unstructured":"Hao Gao and Nikolai Sakharnykh. 2021. Scaling Joins to a Thousand GPUs. 
In ADMS@ VLDB."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISVLSI54635.2022.00064"},{"key":"e_1_2_1_25_1","volume-title":"Proc. IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).","author":"G\u00f3mez-Luna Juan","year":"2023","unstructured":"Juan G\u00f3mez-Luna, Yuxin Guo, Sylvan Brocard, Julien Legriel, Remy Cimadomo, Geraldo F. Oliveira, Gagandeep Singh, and Onur Mutlu. 2023. Evaluating Machine Learning Workloads on Memory-Centric Computing Systems. In Proc. IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)."},{"key":"e_1_2_1_26_1","volume-title":"IEEE Access","author":"G\u00f3mez-Luna Juan","year":"2022","unstructured":"Juan G\u00f3mez-Luna, Izzat El Hajj, Ivan Fernandez, Christina Giannoula, Geraldo F. Oliveira, and Onur Mutlu. 2022. Benchmarking a New Paradigm: Experimental Analysis and Characterization of a Real Processing-in-Memory System. IEEE Access (2022)."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/IISWC59245.2023.00030"},{"key":"e_1_2_1_28_1","volume-title":"Proc. 7th Biennial Conference on Innovative Data Systems Research (CIDR).","author":"Halstead Robert J","year":"2015","unstructured":"Robert J Halstead, Ildar Absalyamov, Walid A Najjar, and Vassilis J Tsotras. 2015. FPGA-based Multithreading for In-Memory Hash Joins. In Proc. 7th Biennial Conference on Innovative Data Systems Research (CIDR)."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/BigData.Congress.2013.37"},{"key":"e_1_2_1_30_1","volume-title":"Proc. 17th International Conference on Very Large Data Bases (VLDB).","author":"Hua Kien A.","unstructured":"Kien A. Hua and Chiang Lee. 1991. Handling Data Skew in Multiprocessor Database Computers Using Partition Tuning. In Proc. 17th International Conference on Very Large Data Bases (VLDB)."},{"key":"e_1_2_1_31_1","unstructured":"Intel. 2019. 
Intel\u00ae Xeon\u00ae Gold 5222 Processor. https:\/\/www.intel.com\/content\/www\/us\/en\/products\/sku\/192445\/intel-xeon-gold-5222-processor-16--5m-cache-3--80-ghz\/specifications.html"},{"key":"e_1_2_1_32_1","unstructured":"JEDEC Solid State Technology Association. 2012. DDR4 SDRAM STANDARD. https:\/\/xdevs.com\/doc\/Standards\/DDR4\/JESD79--4%20DDR4%20SDRAM.pdf."},{"key":"e_1_2_1_33_1","volume-title":"PIM-tree: A Skew-resistant Index for Processing-in-Memory. Proc. VLDB Endowment (PVLDB)","author":"Kang Hongbo","year":"2022","unstructured":"Hongbo Kang, Yiwei Zhao, Guy E. Blelloch, Laxman Dhulipala, Yan Gu, Charles McGuffey, and Phillip B. Gibbons. 2022. PIM-tree: A Skew-resistant Index for Processing-in-Memory. Proc. VLDB Endowment (PVLDB) (2022)."},{"key":"e_1_2_1_34_1","volume-title":"IEEE Micro","author":"Ke Liu","year":"2022","unstructured":"Liu Ke, Xuan Zhang, Jinin So, Jong-Geon Lee, Shin-Haeng Kang, Sukhan Lee, Songyi Han, YeonGon Cho, Jin Hyun Kim, Yongsuk Kwon, KyungSoo Kim, Jin Jung, Ilkwon Yun, Sung Joo Park, Hyunsun Park, Joonho Song, Jeonghyeon Cho, Kyomin Sohn, Nam Sung Kim, and Hsien-Hsin S. Lee. 2022. Near-Memory Processing in Action: Accelerating Personalized Recommendation With AxDIMM. IEEE Micro (2022)."},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687553.1687564"},{"key":"e_1_2_1_36_1","volume-title":"IEEE Micro","author":"Kim Jin Hyun","year":"2022","unstructured":"Jin Hyun Kim, Shin-Haeng Kang, Sukhan Lee, Hyeonsu Kim, Yuhwan Ro, Seungwon Lee, David Wang, Jihyun Choi, Jinin So, YeonGon Cho, JoonHo Song, Jeonghyeon Cho, Kyomin Sohn, and Nam Sung Kim. 2022. Aquabolt-XL HBM2-PIM, LPDDR5-PIM With In-Memory Processing, and AXDIMM With Acceleration Buffer. 
IEEE Micro (2022)."},{"key":"e_1_2_1_37_1","volume-title":"Proc. 16th International Conference on Very Large Data Bases (VLDB).","author":"Kitsuregawa Masaru","year":"1990","unstructured":"Masaru Kitsuregawa and Yasushi Ogawa. 1990. Bucket Spreading Parallel Hash: A New, Robust, Parallel Hash Join Method for Data Skew in the Super Database Computer (SDC). In Proc. 16th International Conference on Very Large Data Bases (VLDB)."},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/HCS55958.2022.9895629"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/BIBM49941.2020.9313351"},{"key":"e_1_2_1_40_1","volume-title":"Analysis of Data Transfer Bottlenecks in Commercial PIM Systems: A Study with UPMEM-PIM","author":"Lee Dongjae","year":"2024","unstructured":"Dongjae Lee, Bongjoon Hyun, Taehun Kim, and Minsoo Rhu. 2024. Analysis of Data Transfer Bottlenecks in Commercial PIM Systems: A Study with UPMEM-PIM. IEEE Computer Architecture Letters (2024)."},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/3533737.3535093"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA52012.2021.00013"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.14778\/2850583.2850594"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/3589258"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/3514221.3517911"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/1645953.1646083"},{"key":"e_1_2_1_47_1","volume-title":"Many-query join: efficient shared execution of relational joins on modern hardware. The VLDB Journal","author":"Makreshanski Darko","year":"2018","unstructured":"Darko Makreshanski, Georgios Giannikis, Gustavo Alonso, and Donald Kossmann. 2018. Many-query join: efficient shared execution of relational joins on modern hardware. The VLDB Journal (2018)."},{"key":"e_1_2_1_48_1","doi-asserted-by":"crossref","unstructured":"S. Manegold P. Boncz and M. 
Kersten. 2002. Optimizing Main-Memory Join on Modern Hardware. IEEE Transactions on Knowledge and Data Engineering (TKDE) (2002).","DOI":"10.1109\/TKDE.2002.1019210"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/1114252.1114263"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/1141277.1141396"},{"key":"e_1_2_1_51_1","unstructured":"Raghunath Othayoth Nambiar and Meikel Poess. 2006. The Making of TPC-DS. In VLDB."},{"key":"e_1_2_1_52_1","volume-title":"Proc. 2021 USENIX Annual Technical Conference (USENIX ATC).","author":"Nider Joel","year":"2021","unstructured":"Joel Nider, Craig Mustard, Andrada Zoltan, John Ramsden, Larry Liu, Jacob Grossbard, Mohammad Dashti, Romaric Jodin, Alexandre Ghiti, Jordi Chauzi, and Alexandra Fedorova. 2021. A Case Study of Processing-in-Memory in off-the-Shelf Systems. In Proc. 2021 USENIX Annual Technical Conference (USENIX ATC)."},{"key":"e_1_2_1_53_1","unstructured":"Oracle. 2024. MySQL 8.0 Reference Manual - The INFORMATION_SCHEMA STATISTICS Table. https:\/\/dev.mysql.com\/doc\/refman\/8.0\/en\/information-schema-statistics-table.html."},{"key":"e_1_2_1_54_1","volume-title":"Revisiting hash join on graphics processors: A decade later. Distributed and Parallel Databases","author":"Paul Johns","year":"2020","unstructured":"Johns Paul, Bingsheng He, Shengliang Lu, and Chiew Tong Lau. 2020. Revisiting hash join on graphics processors: A decade later. 
Distributed and Parallel Databases (2020)."},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3457254"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2016.7498316"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2016.7498324"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1109\/BigData.2015.7364051"},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1145\/3085504.3085521"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.14778\/3587136.3587148"},{"key":"e_1_2_1_61_1","unstructured":"Roee Shlomo, Julien Legriel, Aph\u00e9lie Moisson, and Sylvan Brocard. 2023. UPMEM-PIM Evaluation for SQL Query Acceleration. https:\/\/github.com\/upmem\/dpu_olap."},{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2019.00068"},{"key":"e_1_2_1_63_1","volume-title":"The case for shared nothing","author":"Stonebraker Michael","year":"1986","unstructured":"Michael Stonebraker. 1986. The case for shared nothing. IEEE Database Eng. Bull. (1986)."},{"key":"e_1_2_1_64_1","volume-title":"ADMS@ VLDB","author":"Tom\u00e9 Diego G","year":"2018","unstructured":"Diego G Tom\u00e9, Tiago Rodrigo Kepe, Marco AZ Alves, and Eduardo C de Almeida. 2018. Near-Data Filters: Taking Another Brick from the Memory Wall. In ADMS@ VLDB."},{"key":"e_1_2_1_65_1","unstructured":"Transaction Processing Performance Council (TPC). 2022. TPC Benchmark H. https:\/\/www.tpc.org\/TPC_Documents_Current_Versions\/pdf\/TPC-H_v3.0.1.pdf."},{"key":"e_1_2_1_66_1","unstructured":"UPMEM SAS. 2021. UPMEM SDK. 
https:\/\/sdk.upmem.com\/2021.3.0\/index.html"},{"key":"e_1_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.1145\/1376616.1376720"},{"key":"e_1_2_1_68_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2017.2677451"},{"key":"e_1_2_1_69_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2020.3006446"},{"key":"e_1_2_1_70_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2019.00111"},{"key":"e_1_2_1_71_1","doi-asserted-by":"publisher","DOI":"10.1145\/3626739"}],"container-title":["Proceedings of the ACM on Management of Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3698827","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3698827","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T17:46:24Z","timestamp":1774979184000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3698827"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,12,18]]},"references-count":71,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2024,12,18]]}},"alternative-id":["10.1145\/3698827"],"URL":"https:\/\/doi.org\/10.1145\/3698827","relation":{},"ISSN":["2836-6573"],"issn-type":[{"value":"2836-6573","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,12,18]]}}}