{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,4,12]],"date-time":"2025-04-12T04:30:13Z","timestamp":1744432213168,"version":"3.37.3"},"reference-count":25,"publisher":"Springer Science and Business Media LLC","issue":"10","license":[{"start":{"date-parts":[[2023,5,26]],"date-time":"2023-05-26T00:00:00Z","timestamp":1685059200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,5,26]],"date-time":"2023-05-26T00:00:00Z","timestamp":1685059200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100004168","name":"Universit\u00e4t zu L\u00fcbeck","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100004168","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Knowl Inf Syst"],"published-print":{"date-parts":[[2023,10]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Modern computer systems can use different types of hardware acceleration to achieve massive performance improvements. Some accelerators like FPGA and dedicated GPU (dGPU) need optimized data structures for the best performance and often use dedicated memory. In contrast, APUs, which are a combination of a CPU and an integrated GPU (iGPU), support shared memory and allow the iGPU to work together with the CPU on pointer-based data structures. First, we develop an approach for dGPU to accelerate queries in libcuckoo and robin-map and when looking at accelerating insert, updates and erase operations in the original libcuckoo using OneAPI on an APU. We evaluate the dGPU against the CPU variants and our dGPU approach adapted for the CPU and also in a hybrid context by using longer keys on the CPU and shorter keys on the dGPU. In comparison with the original libcuckoo algorithm, our dGPU approach achieves a speed-up of 2.1, and in comparison with the robin-map a speed-up of 1.5. For hybrid workloads, our approach is efficient if long keys are processed on the CPU and short keys are processed on the dGPU. By processing a mixture of 20% long keys on the CPU and 80% short keys on dGPU, our hybrid approach has a 40% higher throughput than the CPU only approach. In addition, we develop a hybrid APU approach for insert, update and erase operations in the original libcuckoo structure focusing on shared memory with iGPU accelerated look-ups of the positions for insert, update and erase operations.<\/jats:p>","DOI":"10.1007\/s10115-023-01891-w","type":"journal-article","created":{"date-parts":[[2023,5,26]],"date-time":"2023-05-26T11:02:10Z","timestamp":1685098930000},"page":"4359-4377","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["Hybrid CPU\/GPU\/APU accelerated query, insert, update and erase operations in hash tables with string keys"],"prefix":"10.1007","volume":"65","author":[{"given":"Tobias","family":"Groth","sequence":"first","affiliation":[]},{"given":"Sven","family":"Groppe","sequence":"additional","affiliation":[]},{"given":"Thilo","family":"Pionteck","sequence":"additional","affiliation":[]},{"given":"Franz","family":"Valdiek","sequence":"additional","affiliation":[]},{"given":"Martin","family":"Koppehel","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,5,26]]},"reference":[{"issue":"1","key":"1891_CR1","doi-asserted-by":"publisher","first-page":"237","DOI":"10.1109\/TPDS.2019.2929768","volume":"31","author":"B Lessley","year":"2020","unstructured":"Lessley B, Childs H (2020) Data-parallel hashing techniques for GPU architectures. IEEE Trans Parallel Distrib Syst 31(1):237\u2013250. https:\/\/doi.org\/10.1109\/TPDS.2019.2929768","journal-title":"IEEE Trans Parallel Distrib Syst"},{"key":"1891_CR2","doi-asserted-by":"publisher","unstructured":"Groth T, Groppe,S, Pionteck T, Koppehel M, Valdiek F (2022) Accelerated parallel hybrid GPU\/CPU hash table queries with string keys. In: The 33rd international conference on database and expert systems applications (DEXA), Vienna, Austria .This paper received the Norman Revell Best Paper Award. https:\/\/doi.org\/10.1007\/978-3-031-12426-6_15","DOI":"10.1007\/978-3-031-12426-6_15"},{"key":"1891_CR3","doi-asserted-by":"publisher","unstructured":"Mittal S, Vetter JS (2015) A survey of CPU-GPU heterogeneous computing techniques. ACM Comput Surv. https:\/\/doi.org\/10.1145\/2788396","DOI":"10.1145\/2788396"},{"key":"1891_CR4","unstructured":"Behrens T, Rosenfeld V, Traub J, Bre\u00df S, Markl V (2018) Efficient simd vectorization for hashing in opencl. In: EDBT, pp 489\u2013492"},{"key":"1891_CR5","doi-asserted-by":"publisher","unstructured":"J\u00fcnger D, Kobus R, M\u00fcller A, Hundt C, Xu K, Liu W, Schmidt B (2020) Warpcore: a library for fast hash tables on GPUs. In: 2020 IEEE 27th international conference on high performance computing, data, and analytics (HiPC), pp 11\u201320. https:\/\/doi.org\/10.1109\/HiPC50609.2020.00015","DOI":"10.1109\/HiPC50609.2020.00015"},{"key":"1891_CR6","doi-asserted-by":"publisher","unstructured":"Merrill DG, Grimshaw AS (2010) Revisiting sorting for GPGPU stream architectures. In: Proceedings of the 19th international conference on parallel architectures and compilation techniques. PACT \u201910. Association for Computing Machinery, New York, pp 545\u2013546. https:\/\/doi.org\/10.1145\/1854273.1854344","DOI":"10.1145\/1854273.1854344"},{"key":"1891_CR7","doi-asserted-by":"publisher","unstructured":"Ashkiani S, Farach-Colton M, Owens JD (2018) A dynamic hash table for the GPU. In: 2018 IEEE international parallel and distributed processing symposium (IPDPS), pp 419\u2013429. https:\/\/doi.org\/10.1109\/IPDPS.2018.00052","DOI":"10.1109\/IPDPS.2018.00052"},{"key":"1891_CR8","doi-asserted-by":"publisher","unstructured":"Green O (2021) Hashgraph-scalable hash tables using a sparse graph data structure. ACM Trans Parallel Comput. https:\/\/doi.org\/10.1145\/3460872","DOI":"10.1145\/3460872"},{"key":"1891_CR9","doi-asserted-by":"publisher","unstructured":"Li Y, Zhu Q, Lyu Z, Huang Z, Sun J (2021) Dycuckoo: Dynamic hash tables on GPUs. In: 2021 IEEE 37th international conference on data engineering (ICDE), pp 744\u2013755. https:\/\/doi.org\/10.1109\/ICDE51399.2021.00070","DOI":"10.1109\/ICDE51399.2021.00070"},{"key":"1891_CR10","doi-asserted-by":"publisher","unstructured":"Lupescu G, Tapus N (2021) Design of hashtable for heterogeneous architectures. In: 2021 23rd international conference on control systems and computer science (CSCS), pp 172\u2013177. https:\/\/doi.org\/10.1109\/CSCS52396.2021.00035","DOI":"10.1109\/CSCS52396.2021.00035"},{"key":"1891_CR11","doi-asserted-by":"publisher","unstructured":"Ashkiani S, Farach-Colton M, Owens JD (2018) A dynamic hash table for the GPU. In: 2018 IEEE international parallel and distributed processing symposium (IPDPS), pp 419\u2013429. https:\/\/doi.org\/10.1109\/IPDPS.2018.00052","DOI":"10.1109\/IPDPS.2018.00052"},{"key":"1891_CR12","doi-asserted-by":"publisher","unstructured":"Daga M, Nutter M (2012) Exploiting coarse-grained parallelism in b+ tree searches on an APU. In: 2012 SC companion: high performance computing, networking storage and analysis, pp 240\u2013247. https:\/\/doi.org\/10.1109\/SC.Companion.2012.40","DOI":"10.1109\/SC.Companion.2012.40"},{"key":"1891_CR13","doi-asserted-by":"publisher","first-page":"980","DOI":"10.1007\/978-3-030-89698-0_100","volume-title":"Advances in natural computation, fuzzy systems and knowledge discovery","author":"H Luan","year":"2022","unstructured":"Luan H, Fu Y (2022) Accelerating group-by and aggregation on heterogeneous CPU\u2013GPU platforms. In: Xie Q, Zhao L, Li K, Yadav A, Wang L (eds) Advances in natural computation, fuzzy systems and knowledge discovery. Springer, Cham, pp 980\u2013990"},{"issue":"8","key":"1891_CR14","doi-asserted-by":"publisher","first-page":"4177","DOI":"10.1007\/s11227-018-2347-0","volume":"75","author":"A Villegas","year":"2019","unstructured":"Villegas A, Navarro A, Asenjo R, Plata O (2019) Toward a software transactional memory for heterogeneous CPU\u2013GPU processors. J Supercomput 75(8):4177\u20134192","journal-title":"J Supercomput"},{"key":"1891_CR15","doi-asserted-by":"publisher","unstructured":"Kulikov D, Nikolskaia D, Kurapov P (2021) Efficient hardware-agnostic DBMs operator implementation using SYCL. In: 2021 international conference engineering and telecommunication (En &T), pp 1\u20135. https:\/\/doi.org\/10.1109\/EnT50460.2021.9681747","DOI":"10.1109\/EnT50460.2021.9681747"},{"key":"1891_CR16","doi-asserted-by":"publisher","unstructured":"Breyer M, Dai\u00df G, Pfl\u00fcger D (2021) Performance-portable distributed k-nearest neighbors using locality-sensitive hashing and SYCL. In: IWOCL\u201921. Association for Computing Machinery, New York. NY, USA https:\/\/doi.org\/10.1145\/3456669.3456692","DOI":"10.1145\/3456669.3456692"},{"key":"1891_CR17","doi-asserted-by":"crossref","unstructured":"Knuth DE (1974) The art of computer programming, vol. 3: sorting and searching. Addison\u2013Wesley","DOI":"10.1145\/1283920.1283929"},{"issue":"2","key":"1891_CR18","doi-asserted-by":"publisher","first-page":"226","DOI":"10.1016\/0022-0000(78)90046-6","volume":"16","author":"LJ Guibas","year":"1978","unstructured":"Guibas LJ, Szemeredi E (1978) The analysis of double hashing. J Comput Syst Sci 16(2):226\u2013274. https:\/\/doi.org\/10.1016\/0022-0000(78)90046-6","journal-title":"J Comput Syst Sci"},{"key":"1891_CR19","doi-asserted-by":"crossref","unstructured":"Pagh R, Rodler FF (2001) Cuckoo hashing. In: auf\u00a0der Heide, F.M. (ed.) Algorithms\u2014ESA 2001. Springer, Berlin, Heidelberg, pp 121\u2013133","DOI":"10.1007\/3-540-44676-1_10"},{"key":"1891_CR20","doi-asserted-by":"publisher","unstructured":"Li X, Andersen DG, Kaminsky M, Freedman MJ (2014) Algorithmic improvements for fast concurrent cuckoo hashing. In: Proceedings of the 9th European conference on computer systems. EuroSys \u201914. Association for Computing Machinery, New York. https:\/\/doi.org\/10.1145\/2592798.2592820","DOI":"10.1145\/2592798.2592820"},{"key":"1891_CR21","doi-asserted-by":"publisher","unstructured":"Celis P, Larson P-A, Munro JI (1985) Robin hood hashing. In: 26th annual symposium on foundations of computer science (sfcs 1985), pp 281\u2013288. https:\/\/doi.org\/10.1109\/SFCS.1985.48","DOI":"10.1109\/SFCS.1985.48"},{"key":"1891_CR22","unstructured":"Fan B, Andersen DG, Kaminsky M (2013) Memc3: Compact and concurrent memcache with dumber caching and smarter hashing. In: 10th USENIX symposium on networked systems design and implementation (NSDI 13). USENIX Association, Lombard, pp 371\u2013384. https:\/\/www.usenix.org\/conference\/nsdi13\/technical-sessions\/presentation\/fan"},{"issue":"2","key":"1891_CR23","doi-asserted-by":"publisher","first-page":"249","DOI":"10.1147\/rd.312.0249","volume":"31","author":"RM Karp","year":"1987","unstructured":"Karp RM, Rabin MO (1987) Efficient randomized pattern-matching algorithms. IBM J Res Dev 31(2):249\u2013260. https:\/\/doi.org\/10.1147\/rd.312.0249","journal-title":"IBM J Res Dev"},{"key":"1891_CR24","doi-asserted-by":"publisher","unstructured":"Dayarathne N, Ragel R (2014) Accelerating Rabin Karp on a graphics processing unit (GPU) using compute unified device architecture (CUDA). In: 7th international conference on information and automation for sustainability, pp 1\u20136. https:\/\/doi.org\/10.1109\/ICIAFS.2014.7069589","DOI":"10.1109\/ICIAFS.2014.7069589"},{"key":"1891_CR25","doi-asserted-by":"publisher","unstructured":"Daga M, Tschirhart ZS, Freitag C (2015) Exploring parallel programming models for heterogeneous computing systems. In: 2015 IEEE international symposium on workload characterization, pp 98\u2013107. https:\/\/doi.org\/10.1109\/IISWC.2015.16","DOI":"10.1109\/IISWC.2015.16"}],"container-title":["Knowledge and Information Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10115-023-01891-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10115-023-01891-w\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10115-023-01891-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,8,22]],"date-time":"2023-08-22T17:10:37Z","timestamp":1692724237000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10115-023-01891-w"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,5,26]]},"references-count":25,"journal-issue":{"issue":"10","published-print":{"date-parts":[[2023,10]]}},"alternative-id":["1891"],"URL":"https:\/\/doi.org\/10.1007\/s10115-023-01891-w","relation":{},"ISSN":["0219-1377","0219-3116"],"issn-type":[{"type":"print","value":"0219-1377"},{"type":"electronic","value":"0219-3116"}],"subject":[],"published":{"date-parts":[[2023,5,26]]},"assertion":[{"value":"18 January 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"14 April 2023","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"22 April 2023","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"26 May 2023","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}