{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,20]],"date-time":"2026-03-20T20:33:45Z","timestamp":1774038825905,"version":"3.50.1"},"reference-count":69,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2023,6,13]],"date-time":"2023-06-13T00:00:00Z","timestamp":1686614400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Manag. Data"],"published-print":{"date-parts":[[2023,6,13]]},"abstract":"<jats:p>Modern dual in-line memory modules (DIMMs) support processing-in-memory (PIM) by implementing in-DIMM processors (IDPs) located near memory banks. PIM can greatly accelerate in-memory join, whose performance is frequently bounded by main-memory accesses, by offloading the operations of join from host central processing units (CPUs) to the IDPs. As real PIM hardware has not been available until very recently, the prior PIM-assisted join algorithms have relied on PIM hardware simulators which assume fast shared memory between the IDPs and fast inter-IDP communication; however, on commodity PIM-enabled DIMMs, the IDPs do not share memory and demand the CPUs to mediate inter-IDP communication. Such discrepancies in the architectural characteristics make the prior studies incompatible with the DIMMs. Thus, to exploit the high potential of PIM on commodity PIM-enabled DIMMs, we need a new join algorithm designed and optimized for the DIMMs and their architectural characteristics.<\/jats:p>\n          <jats:p>In this paper, we design and analyze Processing-In-DIMM Join (PID-Join), a fast in-memory join algorithm which exploits UPMEM DIMMs, currently the only publicly-available PIM-enabled DIMMs. 
The DIMMs impose several key challenges on efficient acceleration of join including the shared-nothing nature and limited compute capabilities of the IDPs, the lack of hardware support for fast inter-IDP communication, and the slow IDP-wise data transfers between the IDPs and the main memory. PID-Join overcomes the challenges by prototyping and evaluating hash, sort-merge, and nested-loop algorithms optimized for the IDPs, enabling fast inter-IDP communication using host CPU cache streaming and vector instructions, and facilitating fast rank-wise data transfers between the IDPs and the main memory. Our evaluation using a real system equipped with eight UPMEM DIMMs and 1,024 IDPs shows that PID-Join greatly improves the performance of in-memory join over various CPU-based in-memory join algorithms.<\/jats:p>","DOI":"10.1145\/3589258","type":"journal-article","created":{"date-parts":[[2023,6,20]],"date-time":"2023-06-20T20:26:45Z","timestamp":1687292805000},"page":"1-27","source":"Crossref","is-referenced-by-count":42,"title":["Design and Analysis of a Processing-in-DIMM Join Algorithm: A Case Study with UPMEM DIMMs"],"prefix":"10.1145","volume":"1","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2665-6273","authenticated-orcid":false,"given":"Chaemin","family":"Lim","sequence":"first","affiliation":[{"name":"Yonsei University, Seoul, Republic of Korea"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0501-794X","authenticated-orcid":false,"given":"Suhyun","family":"Lee","sequence":"additional","affiliation":[{"name":"Yonsei University, Seoul, Republic of Korea"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2363-501X","authenticated-orcid":false,"given":"Jinwoo","family":"Choi","sequence":"additional","affiliation":[{"name":"Yonsei University, Seoul, Republic of Korea"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0463-7717","authenticated-orcid":false,"given":"Jounghoo","family":"Lee","sequence":"additional","affiliation":[{"name":"Yonsei University, Seoul, Republic of 
Korea"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-3480-6626","authenticated-orcid":false,"given":"Seongyeon","family":"Park","sequence":"additional","affiliation":[{"name":"Seoul National University, Seoul, Republic of Korea"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0762-7901","authenticated-orcid":false,"given":"Hanjun","family":"Kim","sequence":"additional","affiliation":[{"name":"Yonsei University, Seoul, Republic of Korea"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4010-6611","authenticated-orcid":false,"given":"Jinho","family":"Lee","sequence":"additional","affiliation":[{"name":"Seoul National University, Seoul, Republic of Korea"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1015-9969","authenticated-orcid":false,"given":"Youngsok","family":"Kim","sequence":"additional","affiliation":[{"name":"Yonsei University, Seoul, Republic of Korea"}]}],"member":"320","published-online":{"date-parts":[[2023,6,20]]},"reference":[{"key":"e_1_2_2_1_1","doi-asserted-by":"publisher","DOI":"10.14778\/2336664.2336678"},{"key":"e_1_2_2_2_1","volume-title":"Proc. 17th International Conference on High Performance Computing and Communications (HPCC).","author":"Zanata Alves Marco Antonio","year":"2015","unstructured":"Marco Antonio Zanata Alves, Carlos Villavieja, Matthias Diener, Francis Birck Moreira, and Philippe Olivier Alexandre Navaux. 2015. SiNUCA: A Validated Micro-Architecture Simulator. In Proc. 17th International Conference on High Performance Computing and Communications (HPCC)."},{"key":"e_1_2_2_3_1","unstructured":"Austin Appleby. 2011. MurmurHash. https:\/\/sites.google.com\/site\/murmurhash\/"},{"key":"e_1_2_2_4_1","unstructured":"JEDEC Solid State Technology Association. 2012. DDR4 SDRAM STANDARD. 
https:\/\/xdevs.com\/doc\/Standards\/DDR4\/JESD79--4%20DDR4%20SDRAM.pdf"},{"key":"e_1_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.14778\/2732219.2732227"},{"key":"e_1_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2013.6544839"},{"key":"e_1_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3452831"},{"key":"e_1_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/2213836.2213851"},{"key":"e_1_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/1409360.1409380"},{"key":"e_1_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE53745.2022.00270"},{"key":"e_1_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/1272743.1272747"},{"key":"e_1_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/HOTCHIPS.2019.8875680"},{"key":"e_1_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/3470496.3527431"},{"key":"e_1_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1006\/jagm.1997.0873"},{"key":"e_1_2_2_15_1","unstructured":"E Knuth Donald et al. 1999. The art of computer programming. Sorting and searching Vol. 3 (1999)."},{"key":"e_1_2_2_16_1","volume-title":"Proc. 30th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP).","author":"dos Santos Sairo R.","unstructured":"Sairo R. dos Santos, Francis B. Moreira, Tiago R. Kepe, and Marco A. Z. Alves. 2022. Advancing Database System Operators with Near-Data Processing. In Proc. 30th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP)."},{"key":"e_1_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3079856.3080233"},{"key":"e_1_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2015.7056040"},{"key":"e_1_2_2_19_1","volume-title":"Proc. 12th International Workshop on Accelerating Analytics and Data Management Systems Using Modern Processor and Storage Architectures (ADMS).","author":"Gao Hao","year":"2021","unstructured":"Hao Gao and Nikolai Sakharnykh. 2021. 
Scaling Joins to a Thousand GPUs. In Proc. 12th International Workshop on Accelerating Analytics and Data Management Systems Using Modern Processor and Storage Architectures (ADMS)."},{"key":"e_1_2_2_20_1","volume-title":"Ivan Fernandez, Christina Giannoula, Geraldo F Oliveira, and Onur Mutlu.","author":"G\u00f3mez-Luna Juan","year":"2021","unstructured":"Juan G\u00f3mez-Luna, Izzat El Hajj, Ivan Fernandez, Christina Giannoula, Geraldo F Oliveira, and Onur Mutlu. 2021. Benchmarking a new paradigm: An experimental analysis of a real processing-in-memory architecture. arXiv preprint arXiv:2105.03814 (2021)."},{"key":"e_1_2_2_21_1","volume-title":"Ivan Fernandez, Christina Giannoula, Geraldo F. Oliveira, and Onur Mutlu.","author":"G\u00f3mez-Luna Juan","year":"2022","unstructured":"Juan G\u00f3mez-Luna, Izzat El Hajj, Ivan Fernandez, Christina Giannoula, Geraldo F. Oliveira, and Onur Mutlu. 2022. Benchmarking a New Paradigm: Experimental Analysis and Characterization of a Real Processing-in-Memory System. IEEE Access, Vol. 10 (2022)."},{"key":"e_1_2_2_22_1","volume-title":"Algorithm design and applications","author":"Goodrich Michael T","unstructured":"Michael T Goodrich and Roberto Tamassia. 2015. Algorithm design and applications. Vol. 363. Wiley Hoboken."},{"key":"e_1_2_2_23_1","unstructured":"The PostgreSQL Global Development Group. 2022. Documentation: 7.2: Performance Tips - PostgreSQL. https:\/\/www.postgresql.org\/docs\/7.2\/performance-tips.html"},{"key":"e_1_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2018.00058"},{"key":"e_1_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2014.6844457"},{"key":"e_1_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/3445814.3446749"},{"key":"e_1_2_2_27_1","volume-title":"Proc. 
53rd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO).","author":"He Mingxuan","unstructured":"Mingxuan He, Choungki Song, Ilkon Kim, Chunseok Jeong, Seho Kim, Il Park, Mithuna Thottethodi, and T. N. Vijaykumar. 2020. Newton: A DRAM-maker's Accelerator-in-Memory (AiM) Architecture for Machine Learning. In Proc. 53rd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO)."},{"key":"e_1_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.14778\/2824032.2824036"},{"key":"e_1_2_2_29_1","unstructured":"Richard D Hipp. 2022. SQLite. https:\/\/www.sqlite.org\/index.html"},{"key":"e_1_2_2_30_1","unstructured":"Intel. 2022a. APP Metrics for Intel Microprocessors - Intel Xeon Processor. https:\/\/www.intel.com\/content\/dam\/support\/us\/en\/documents\/processors\/APP-for-Intel-Xeon-Processors.pdf"},{"key":"e_1_2_2_31_1","unstructured":"Intel. 2022b. Intel\u00ae 64 and IA-32 Architectures Developer's Manual: Vol. 3A. https:\/\/www.intel.com\/content\/www\/us\/en\/architecture-and-technology\/64-ia-32-architectures-software-developer-vol-3a-part-1-manual.html"},{"key":"e_1_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/VLSIT.2012.6242474"},{"key":"e_1_2_2_33_1","unstructured":"Bob Jenkins. 1997. A Hash Function for Hash Table Lookup. https:\/\/burtleburtle.net\/bob\/hash\/doobs.html"},{"key":"e_1_2_2_34_1","volume-title":"Yongsuk Kwon, et al.","author":"Ke Liu","year":"2021","unstructured":"Liu Ke, Xuan Zhang, Jinin So, Jong-Geon Lee, Shin-Haeng Kang, Sukhan Lee, Songyi Han, YeonGon Cho, Jin Hyun Kim, Yongsuk Kwon, et al. 2021. Near-memory processing in action: Accelerating personalized recommendation with AxDIMM. IEEE Micro, Vol. 42 (2021)."},{"key":"e_1_2_2_35_1","volume-title":"Alves","author":"Kepe Tiago R.","year":"2019","unstructured":"Tiago R. Kepe, Eduardo C. de Almeida, and Marco A. Z. Alves. 2019. Database Processing-in-Memory: An Experimental Study. Proceedings of the VLDB Endowment, Vol. 
13 (2019)."},{"key":"e_1_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.14778\/3151106.3151112"},{"key":"e_1_2_2_37_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687553.1687564"},{"key":"e_1_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICPP.1994.108"},{"key":"e_1_2_2_39_1","doi-asserted-by":"publisher","DOI":"10.3850\/9783981537079_0512"},{"key":"e_1_2_2_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA52012.2021.00013"},{"key":"e_1_2_2_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/3514221.3517911"},{"key":"e_1_2_2_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/356643.356645"},{"key":"e_1_2_2_43_1","volume-title":"Proc. 5th Workshop on Architectures and Systems for Big Data (ASBD).","author":"Mirzadeh Nooshin S.","year":"2015","unstructured":"Nooshin S. Mirzadeh, Onur Kocberber, Babak Falsafi, and Boris Grot. 2015. Sort vs. Hash Join Revisited for Near-Memory Execution. In Proc. 5th Workshop on Architectures and Systems for Big Data (ASBD)."},{"key":"e_1_2_2_44_1","volume-title":"Proc. USENIX Annual Technical Conference (USENIX ATC).","author":"Nider Joel","year":"2021","unstructured":"Joel Nider, Craig Mustard, Andrada Zoltan, John Ramsden, Larry Liu, Jacob Grossbard, Mohammad Dashti, Romaric Jodin, Alexandre Ghiti, Jordi Chauzi, and Alexandra (Sasha) Fedorova. 2021. A Case Study of Processing-in-Memory in off-the-Shelf Systems. In Proc. USENIX Annual Technical Conference (USENIX ATC)."},{"key":"e_1_2_2_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2015.74"},{"key":"e_1_2_2_46_1","unstructured":"Oracle. 2022. MySQL 8.0 Reference Manual: Nested-Loop Join Algorithms. https:\/\/dev.mysql.com\/doc\/refman\/8.0\/en\/nested-loop-joins.html"},{"key":"e_1_2_2_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/40.592312"},{"key":"e_1_2_2_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3457254"},{"key":"e_1_2_2_49_1","volume-title":"PIMDB: Understanding Bulk-Bitwise Processing In-Memory Through Database Analytics. 
arXiv preprint arXiv:2203.10486","author":"Perach Ben","year":"2022","unstructured":"Ben Perach, Ronny Ronen, Benny Kimelfeld, and Shahar Kvatinsky. 2022. PIMDB: Understanding Bulk-Bitwise Processing In-Memory Through Database Analytics. arXiv preprint arXiv:2203.10486 (2022)."},{"key":"e_1_2_2_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/23005.42225"},{"key":"e_1_2_2_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/2588555.2610521"},{"key":"e_1_2_2_52_1","volume-title":"ACM","volume":"59","author":"P\u0103tra\u0219cu Mihai","year":"2012","unstructured":"Mihai P\u0103tra\u0219cu and Mikkel Thorup. 2012. The Power of Simple Tabulation Hashing. J. ACM, Vol. 59 (2012)."},{"key":"e_1_2_2_53_1","doi-asserted-by":"publisher","DOI":"10.14778\/3229863.3229874"},{"key":"e_1_2_2_54_1","doi-asserted-by":"publisher","DOI":"10.14778\/3551793.3551849"},{"key":"e_1_2_2_55_1","doi-asserted-by":"publisher","DOI":"10.14778\/2850583.2850585"},{"key":"e_1_2_2_56_1","volume-title":"DRAMSim2: A Cycle Accurate Memory System Simulator","author":"Rosenfeld Paul","year":"2011","unstructured":"Paul Rosenfeld, Elliott Cooper-Balis, and Bruce Jacob. 2011. DRAMSim2: A Cycle Accurate Memory System Simulator. IEEE Computer Architecture Letters, Vol. 10 (2011)."},{"key":"e_1_2_2_57_1","doi-asserted-by":"publisher","DOI":"10.1109\/BigData.2015.7364051"},{"key":"e_1_2_2_58_1","doi-asserted-by":"publisher","DOI":"10.14778\/3436905.3436927"},{"key":"e_1_2_2_59_1","doi-asserted-by":"publisher","DOI":"10.14778\/3137628.3137636"},{"key":"e_1_2_2_60_1","doi-asserted-by":"publisher","DOI":"10.14778\/2777598.2777602"},{"key":"e_1_2_2_61_1","unstructured":"Ambuj Shatdal Chander Kant and Jeffrey F Naughton. 1994. 
Cache conscious algorithms for relational query processing."},{"key":"e_1_2_2_62_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2018.2857044"},{"key":"e_1_2_2_63_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2019.00068"},{"key":"e_1_2_2_64_1","volume-title":"Cyclic Redundancy Check","author":"Sobolewski John S.","unstructured":"John S. Sobolewski. 2003. Cyclic Redundancy Check. John Wiley and Sons Ltd."},{"key":"e_1_2_2_65_1","unstructured":"UPMEM SAS. 2021. UPMEM SDK. https:\/\/sdk.upmem.com\/2021.3.0\/index.html"},{"key":"e_1_2_2_66_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2017.2677451"},{"key":"e_1_2_2_67_1","doi-asserted-by":"publisher","DOI":"10.1145\/2934664"},{"key":"e_1_2_2_68_1","unstructured":"Zuyu Zhang Harshad Deshmukh and Jignesh M Patel. 2019. Data Partitioning for In-Memory Systems: Myths Challenges and Opportunities. In CIDR."},{"key":"e_1_2_2_69_1","doi-asserted-by":"publisher","DOI":"10.14778\/3115404.3115409"}],"container-title":["Proceedings of the ACM on Management of Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3589258","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3589258","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T17:48:54Z","timestamp":1750182534000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3589258"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,6,13]]},"references-count":69,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2023,6,13]]}},"alternative-id":["10.1145\/3589258"],"URL":"https:\/\/doi.org\/10.1145\/3589258","relation":{},"ISSN":["2836-6573"],"issn-type":[{"value":"2836-6573","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,6,13]]}}}