{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:20:57Z","timestamp":1750220457322,"version":"3.41.0"},"reference-count":62,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2021,6,8]],"date-time":"2021-06-08T00:00:00Z","timestamp":1623110400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000001","name":"NSF","doi-asserted-by":"publisher","award":["SHF-1422516"],"award-info":[{"award-number":["SHF-1422516"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2021,9,30]]},"abstract":"<jats:p>Lookup operations for in-memory databases are heavily memory bound, because they often rely on pointer-chasing linked data structure traversals. They also have many branches that are hard-to-predict due to random key lookups. In this study, we show that although cache misses are the primary bottleneck for these applications, without a method for eliminating the branch mispredictions only a small fraction of the performance benefit is achieved through prefetching alone. We propose the Node Tracker (NT), a novel programmable prefetcher\/pre-execution unit that is highly effective in exploiting inter key-lookup parallelism to improve single-thread performance. We extend NT with branch outcome streaming (BOS) to reduce branch mispredictions and show that this achieves an extra 3\u00d7 speedup. Finally, we evaluate the NT as a pre-execution unit and demonstrate that we can further improve the performance in both single- and multi-threaded execution modes. 
Our results show that, on average, NT improves single-thread performance by 4.1\u00d7 when used as a prefetcher; 11.9\u00d7 as a prefetcher with BOS; 14.9\u00d7 as a pre-execution unit and 18.8\u00d7 as a pre-execution unit with BOS. Finally, with 24 cores of the latter version, we achieve a speedup of 203\u00d7 and 11\u00d7 over the single-core and 24-core baselines, respectively.<\/jats:p>","DOI":"10.1145\/3452099","type":"journal-article","created":{"date-parts":[[2021,6,8]],"date-time":"2021-06-08T16:21:19Z","timestamp":1623169279000},"page":"1-26","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Fast Key-Value Lookups with Node Tracker"],"prefix":"10.1145","volume":"18","author":[{"given":"Mustafa","family":"Cavus","sequence":"first","affiliation":[{"name":"Dept. of Elect., Comp. and Biomed. Eng., Univ. of Rhode Island, Kingston, RI, USA"}]},{"given":"Mohammed","family":"Shatnawi","sequence":"additional","affiliation":[{"name":"Dept. of Elect., Comp. and Biomed. Eng., Univ. of Rhode Island, Kingston, RI, USA"}]},{"given":"Resit","family":"Sendag","sequence":"additional","affiliation":[{"name":"Dept. of Elect., Comp. and Biomed. Eng., Univ. of Rhode Island, Kingston, RI, USA"}]},{"given":"Augustus K.","family":"Uht","sequence":"additional","affiliation":[{"name":"Dept. of Elect., Comp. and Biomed. Eng., Univ. of Rhode Island, Kingston, RI, USA"}]}],"member":"320","published-online":{"date-parts":[[2021,6,8]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.14778\/2856318.2856321"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/1272743.1272747"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/1807167.1807206"},{"volume-title":"Proceedings of the 20th IEEE International Parallel & Distributed Processing Symposium.","author":"Jung Changhee","key":"e_1_2_1_4_1","unstructured":"Changhee Jung , Daeseob Lim , Jaejin Lee , and Y. 
Solihin . 2006. Helper thread prefetching for loosely-coupled multiprocessor systems . In Proceedings of the 20th IEEE International Parallel & Distributed Processing Symposium. Changhee Jung, Daeseob Lim, Jaejin Lee, and Y. Solihin. 2006. Helper thread prefetching for loosely-coupled multiprocessor systems. In Proceedings of the 20th IEEE International Parallel & Distributed Processing Symposium."},{"key":"e_1_2_1_5_1","volume-title":"An event-triggered programmable prefetcher for irregular workloads. ACM SIGPLAN Not. 53, 2 (Mar","author":"Ainsworth S.","year":"2018","unstructured":"S. Ainsworth and T. M. Jones . An event-triggered programmable prefetcher for irregular workloads. ACM SIGPLAN Not. 53, 2 (Mar . 2018 ), 578--592. DOI:https:\/\/doi.org\/10.1145\/3296957.3173189 10.1145\/3296957.3173189 S. Ainsworth and T. M. Jones. An event-triggered programmable prefetcher for irregular workloads. ACM SIGPLAN Not. 53, 2 (Mar. 2018), 578--592. DOI:https:\/\/doi.org\/10.1145\/3296957.3173189"},{"volume-title":"Proceedings of the 2015 ACM\/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA\u201915)","author":"Ahn J.","key":"e_1_2_1_6_1","unstructured":"J. Ahn , S. Hong , S. Yoo , O. Mutlu , and K. Choi . 2015. A scalable processing-in-memory accelerator for parallel graph processing . In Proceedings of the 2015 ACM\/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA\u201915) , 105\u2013117. J. Ahn, S. Hong, S. Yoo, O. Mutlu, and K. Choi. 2015. A scalable processing-in-memory accelerator for parallel graph processing. In Proceedings of the 2015 ACM\/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA\u201915), 105\u2013117."},{"key":"e_1_2_1_7_1","unstructured":"AMD. 2017. Software Optimization Guide for AMD Family 17h Processors 2.8.1.6. https:\/\/developer.amd.com\/wordpress\/media\/2013\/12\/55723_SOG_Fam_17h_Processors_3.00.pdf.  AMD. 2017. Software Optimization Guide for AMD Family 17h Processors 2.8.1.6. 
https:\/\/developer.amd.com\/wordpress\/media\/2013\/12\/55723_SOG_Fam_17h_Processors_3.00.pdf."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/2024716.2024718"},{"key":"e_1_2_1_9_1","unstructured":"WikiChip Fuse. 2019. Intel Sunny Cove Core to Deliver a Major Improvement in Single-Thread Performance Bigger Improvements to Follow. Retrieved from https:\/\/fuse.wikichip.org\/news\/2371\/intel-sunny-cove-core-to-deliver-a-major-improvement-in-single-thread-performance-bigger-improvements-to-follow\/.  WikiChip Fuse. 2019. Intel Sunny Cove Core to Deliver a Major Improvement in Single-Thread Performance Bigger Improvements to Follow. Retrieved from https:\/\/fuse.wikichip.org\/news\/2371\/intel-sunny-cove-core-to-deliver-a-major-improvement-in-single-thread-performance-bigger-improvements-to-follow\/."},{"key":"e_1_2_1_10_1","unstructured":"AMD. 2019. Retrieved from https:\/\/www.amd.com\/en\/products\/epyc-server.  AMD. 2019. Retrieved from https:\/\/www.amd.com\/en\/products\/epyc-server."},{"volume-title":"Proceedings of the IEEE International Conference on Computer Design (ICCD\u201918)","author":"Cavus Mustafa","key":"e_1_2_1_11_1","unstructured":"Mustafa Cavus , Resit Sendag , and Joshua J. Yi . 2018. Array tracking prefetcher for indirect accesses . In Proceedings of the IEEE International Conference on Computer Design (ICCD\u201918) . 132--139. DOI:10.1109\/ICCD.2018.00028 10.1109\/ICCD.2018.00028 Mustafa Cavus, Resit Sendag, and Joshua J. Yi. 2018. Array tracking prefetcher for indirect accesses. In Proceedings of the IEEE International Conference on Computer Design (ICCD\u201918). 132--139. DOI:10.1109\/ICCD.2018.00028"},{"volume-title":"Proceedings of the 2016 49th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO). 1\u201312","author":"Hashemi M.","key":"e_1_2_1_12_1","unstructured":"M. Hashemi , O. Mutlu , and Y. N. Patt . 2016. Continuous runahead: Transparent hardware acceleration for memory intensive workloads . 
In Proceedings of the 2016 49th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO). 1\u201312 . DOI: 10.1109\/MICRO.2016.7783764. 10.1109\/MICRO.2016.7783764 M. Hashemi, O. Mutlu, and Y. N. Patt. 2016. Continuous runahead: Transparent hardware acceleration for memory intensive workloads. In Proceedings of the 2016 49th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO). 1\u201312. DOI: 10.1109\/MICRO.2016.7783764."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/2304576.2304593"},{"volume-title":"Proceedings of the 2018 51st Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO). 1\u201314","author":"Mukkara A.","key":"e_1_2_1_14_1","unstructured":"A. Mukkara , N. Beckmann , M. Abeydeera , X. Ma , and D. Sanchez . 2018. Exploiting locality in graph analytics through hardware-accelerated traversal scheduling . In Proceedings of the 2018 51st Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO). 1\u201314 . A. Mukkara, N. Beckmann, M. Abeydeera, X. Ma, and D. Sanchez. 2018. Exploiting locality in graph analytics through hardware-accelerated traversal scheduling. In Proceedings of the 2018 51st Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO). 1\u201314."},{"key":"e_1_2_1_15_1","volume-title":"Proceedings of the 5th Championship on Branch Prediction.","author":"Seznec Andre","year":"2016","unstructured":"Andre Seznec . 2016 . TAGE-SC-L Branch Predictors Again . In Proceedings of the 5th Championship on Branch Prediction. Andre Seznec. 2016. TAGE-SC-L Branch Predictors Again. In Proceedings of the 5th Championship on Branch Prediction."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2016.7446087"},{"volume-title":"Proceedings of the 49th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO), 1\u201312","author":"Kim J.","key":"e_1_2_1_17_1","unstructured":"J. Kim , S. H. Pugsley , P. V. Gratz , A. L. N. Reddy , C. 
Wilkerson , and Z. Chishti . 2016. Path confidence based lookahead prefetching . In Proceedings of the 49th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO), 1\u201312 . J. Kim, S. H. Pugsley, P. V. Gratz, A. L. N. Reddy, C. Wilkerson, and Z. Chishti. 2016. Path confidence based lookahead prefetching. In Proceedings of the 49th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO), 1\u201312."},{"volume-title":"Proceedings of the Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO). 247--259","author":"Jain A.","key":"e_1_2_1_18_1","unstructured":"A. Jain and C. Lin . 2013. Linearizing irregular memory accesses for improved correlated prefetching . Proceedings of the Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO). 247--259 . A. Jain and C. Lin. 2013. Linearizing irregular memory accesses for improved correlated prefetching. Proceedings of the Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO). 247--259."},{"volume-title":"Proceedings of the 2009 42nd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO), 469\u2013480","author":"Li S.","key":"e_1_2_1_20_1","unstructured":"S. Li , J. H. Ahn , R. D. Strong , J. B. Brockman , D. M. Tullsen , and N. P. Jouppi . 2009. McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures . In Proceedings of the 2009 42nd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO), 469\u2013480 . S. Li, J. H. Ahn, R. D. Strong, J. B. Brockman, D. M. Tullsen, and N. P. Jouppi. 2009. McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures. In Proceedings of the 2009 42nd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO), 469\u2013480."},{"key":"e_1_2_1_21_1","volume-title":"Article 35 (Aug.","author":"Mittal Sparsh","year":"2016","unstructured":"Sparsh Mittal . 2016. 
A survey of recent prefetching techniques for processor caches. ACM Comput. Surv. 49, 2 , Article 35 (Aug. 2016 ), 35 pages. DOI: https:\/\/doi.org\/10.1145\/2907071 10.1145\/2907071 Sparsh Mittal. 2016. A survey of recent prefetching techniques for processor caches. ACM Comput. Surv. 49, 2, Article 35 (Aug. 2016), 35 pages. DOI: https:\/\/doi.org\/10.1145\/2907071"},{"volume-title":"Proceedings of the International Conference on Supercomputing (ICS\u201909)","author":"Ishii Y.","key":"e_1_2_1_22_1","unstructured":"Y. Ishii , M. Inaba , and K. Hiraki . 2009. Access map pattern matching for data cache prefetch . In Proceedings of the International Conference on Supercomputing (ICS\u201909) . 499\u2013500. Y. Ishii, M. Inaba, and K. Hiraki. 2009. Access map pattern matching for data cache prefetch. In Proceedings of the International Conference on Supercomputing (ICS\u201909). 499\u2013500."},{"volume-title":"Proceedings of the International Symposium on Microarchitecture (MICRO). 141\u2013152","author":"Shevgoor M.","key":"e_1_2_1_23_1","unstructured":"M. Shevgoor , S. Koladiya , R. Balasubramonian , C. Wilkerson , S. H. Pugsley , and Z. Chishti . 2015. Efficiently prefetching complex address patterns . In Proceedings of the International Symposium on Microarchitecture (MICRO). 141\u2013152 . M. Shevgoor, S. Koladiya, R. Balasubramonian, C. Wilkerson, S. H. Pugsley, and Z. Chishti. 2015. Efficiently prefetching complex address patterns. In Proceedings of the International Symposium on Microarchitecture (MICRO). 141\u2013152."},{"volume-title":"Proceedings of the 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA\u201919)","author":"Bakhshalipour M.","key":"e_1_2_1_24_1","unstructured":"M. Bakhshalipour , M. Shakerinava , P. Lotfi-Kamran , and H. Sarbazi-Azad . 2019. Bingo spatial data prefetcher . In Proceedings of the 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA\u201919) , 399\u2013411. M. 
Bakhshalipour, M. Shakerinava, P. Lotfi-Kamran, and H. Sarbazi-Azad. 2019. Bingo spatial data prefetcher. In Proceedings of the 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA\u201919), 399\u2013411."},{"key":"e_1_2_1_25_1","volume-title":"InThird Data Prefetching Competition. ISCA 2019","author":"Bakalapati S.","year":"2020","unstructured":"S. Bakalapati and B. Panda . 2019. Bouquet of instruction pointers: Instruction pointer classifier based hardware prefetching . InThird Data Prefetching Competition. ISCA 2019 . IEEE Press, 118--131. DOI:https:\/\/doi.org\/10.1109\/ISCA45697. 2020 .00021 10.1109\/ISCA45697.2020.00021 S. Bakalapati and B. Panda. 2019. Bouquet of instruction pointers: Instruction pointer classifier based hardware prefetching. InThird Data Prefetching Competition. ISCA 2019. IEEE Press, 118--131. DOI:https:\/\/doi.org\/10.1109\/ISCA45697.2020.00021"},{"key":"e_1_2_1_26_1","unstructured":"Retrieved from https:\/\/dpc3.compas.cs.stonybrook.edu.  Retrieved from https:\/\/dpc3.compas.cs.stonybrook.edu."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2014.29"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/2133382.2133384"},{"key":"e_1_2_1_29_1","volume-title":"SPAID: Software prefetching in pointer- and call-intensive environments. In Proceedings of the 28th International Symposium on Microarchitecture","author":"Lipasti M.","year":"1995","unstructured":"M. Lipasti , W. Schmidt , S. Kunkel , and R. Roediger . 1995 . SPAID: Software prefetching in pointer- and call-intensive environments. In Proceedings of the 28th International Symposium on Microarchitecture . IEEE Computer Society Press , Los Alamitos, CA , 232\u2013236. M. Lipasti, W. Schmidt, S. Kunkel, and R. Roediger. 1995. SPAID: Software prefetching in pointer- and call-intensive environments. In Proceedings of the 28th International Symposium on Microarchitecture. 
IEEE Computer Society Press, Los Alamitos, CA, 232\u2013236."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/248208.237190"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/106972.106979"},{"key":"e_1_2_1_32_1","unstructured":"Mowry. 1994.\n  Tolerating Latency through Software-controlled Data Prefetching\n  . Ph.D. Dissertation Stanford University 1994\n  .  Mowry. 1994. Tolerating Latency through Software-controlled Data Prefetching. Ph.D. Dissertation Stanford University 1994."},{"volume-title":"Compiler Construction","author":"Wu Youfeng","key":"e_1_2_1_33_1","unstructured":"Youfeng Wu , Mauricio Serrano , Rakesh Krishnaiyer , Wei Li , and Jesse Fang1. 2002. Value profile guided stride prefetching for irregular code . In Compiler Construction . Springer , 307\u2013324. Youfeng Wu, Mauricio Serrano, Rakesh Krishnaiyer, Wei Li, and Jesse Fang1. 2002. Value profile guided stride prefetching for irregular code. In Compiler Construction. Springer, 307\u2013324."},{"key":"e_1_2_1_34_1","volume-title":"Proceedings of the International Symposium on Code Generation and Optimization (CGO\u201917)","author":"Jones Ainsworth","year":"2017","unstructured":"Ainsworth and Jones . 2017 . Software prefetching for indirect memory accesses . In Proceedings of the International Symposium on Code Generation and Optimization (CGO\u201917) . IEEE Press, 305--317. Ainsworth and Jones. 2017. Software prefetching for indirect memory accesses. In Proceedings of the International Symposium on Code Generation and Optimization (CGO\u201917). IEEE Press, 305--317."},{"volume-title":"Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS\u201902)","author":"Yeung Kim","key":"e_1_2_1_35_1","unstructured":"Kim and D. Yeung . 2002. Design and evaluation of compiler algorithms for pre-execution . 
Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS\u201902) . 159--170. DOI:https:\/\/doi.org\/10.1145\/635508.605415 10.1145\/635508.605415 Kim and D. Yeung. 2002. Design and evaluation of compiler algorithms for pre-execution. Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS\u201902). 159--170. DOI:https:\/\/doi.org\/10.1145\/635508.605415"},{"volume-title":"Proceedings of the USENIX Conference on Hot Topics in Parallelism (HotPar\u201911)","author":"Lau E.","key":"e_1_2_1_36_1","unstructured":"E. Lau , J. E. Miller , I. Choi , D. Yeung , S. Amarasinghe , and A. Agarwal . 2011. Multicore performance optimization using partner cores . Proceedings of the USENIX Conference on Hot Topics in Parallelism (HotPar\u201911) . E. Lau, J. E. Miller, I. Choi, D. Yeung, S. Amarasinghe, and A. Agarwal. 2011. Multicore performance optimization using partner cores. Proceedings of the USENIX Conference on Hot Topics in Parallelism (HotPar\u201911)."},{"volume-title":"Proceedings of the Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201915)","author":"Ham T. J.","key":"e_1_2_1_37_1","unstructured":"T. J. Ham , J. L. Arag\u00f3n , and M. Martonosi . 2015. DeSC: Decoupled supply-compute communication management for heterogeneous architectures . Proceedings of the Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201915) . 191--203. DOI:10.1145\/2830772.2830800 10.1145\/2830772.2830800 T. J. Ham, J. L. Arag\u00f3n, and M. Martonosi. 2015. DeSC: Decoupled supply-compute communication management for heterogeneous architectures. Proceedings of the Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201915). 191--203. 
DOI:10.1145\/2830772.2830800"},{"volume-title":"Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT\u201906)","author":"Ganusov I.","key":"e_1_2_1_38_1","unstructured":"I. Ganusov and M. Burtscher . 2006. Efficient emulation of hardware prefetchers via event-driven helper threading . Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT\u201906) . I. Ganusov and M. Burtscher. 2006. Efficient emulation of hardware prefetchers via event-driven helper threading. Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT\u201906)."},{"volume-title":"Proceedings of the International Symposium on Computer Architecture (ISCA\u201915)","author":"Ho C.-H.","key":"e_1_2_1_39_1","unstructured":"C.-H. Ho , S. J. Kim , and K. Sankaralingam . 2015. Efficient execution of memory access phases using dataflow specialization . In Proceedings of the International Symposium on Computer Architecture (ISCA\u201915) . 118--130. DOI:10.1145\/2749469.2750390 10.1145\/2749469.2750390 C.-H. Ho, S. J. Kim, and K. Sankaralingam. 2015. Efficient execution of memory access phases using dataflow specialization. In Proceedings of the International Symposium on Computer Architecture (ISCA\u201915). 118--130. DOI:10.1145\/2749469.2750390"},{"volume-title":"Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT\u201914)","author":"Kumar S.","key":"e_1_2_1_40_1","unstructured":"S. Kumar , A. Shriraman , V. Srinivasan , D. Lin , and J. Phillips . 2014. Sqrl: Hardware accelerator for collecting software data structures . In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT\u201914) . 475--476. DOI:10.1145\/2628071.2628118 10.1145\/2628071.2628118 S. Kumar, A. Shriraman, V. Srinivasan, D. Lin, and J. Phillips. 2014. 
Sqrl: Hardware accelerator for collecting software data structures. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT\u201914). 475--476. DOI:10.1145\/2628071.2628118"},{"volume-title":"Proceedings of the International Conference on Supercomputing (ICS\u201915)","author":"Kumar S.","key":"e_1_2_1_41_1","unstructured":"S. Kumar , N. Vedula , A. Shriraman , and V. Srinivasan . 2015. Dasx: Hardware accelerator for software data structures . In Proceedings of the International Conference on Supercomputing (ICS\u201915) . 361--372. DOI:https:\/\/doi.org\/10.1145\/2751205.2751231 10.1145\/2751205.2751231 S. Kumar, N. Vedula, A. Shriraman, and V. Srinivasan. 2015. Dasx: Hardware accelerator for software data structures. In Proceedings of the International Conference on Supercomputing (ICS\u201915). 361--372. DOI:https:\/\/doi.org\/10.1145\/2751205.2751231"},{"volume-title":"Meet the walkers: Accelerating index traversals for in-memory databases. In Proceedings of the IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201913)","author":"Kocberber O.","key":"e_1_2_1_42_1","unstructured":"O. Kocberber , B. Grot , J. Picorel , B. Falsafi , K. Lim , and P. Ranganathan . 2013 . Meet the walkers: Accelerating index traversals for in-memory databases. In Proceedings of the IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201913) . O. Kocberber, B. Grot, J. Picorel, B. Falsafi, K. Lim, and P. Ranganathan. 2013. Meet the walkers: Accelerating index traversals for in-memory databases. In Proceedings of the IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201913)."},{"volume-title":"Proceedings of the 15th International Symposium on High Performance Computing Architecture (HPCA\u201909)","author":"Ebrahimi E.","key":"e_1_2_1_43_1","unstructured":"E. Ebrahimi , O. Mutlu , and Y. N. Patt . 2009. 
Techniques for bandwidth-efficient prefetching of linked data structures in hybrid prefetching systems . In Proceedings of the 15th International Symposium on High Performance Computing Architecture (HPCA\u201909) , 7\u201317. E. Ebrahimi, O. Mutlu, and Y. N. Patt. 2009. Techniques for bandwidth-efficient prefetching of linked data structures in hybrid prefetching systems. In Proceedings of the 15th International Symposium on High Performance Computing Architecture (HPCA\u201909), 7\u201317."},{"volume-title":"Proceedings of the 8th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS\u201998)","author":"Roth A.","key":"e_1_2_1_44_1","unstructured":"A. Roth , A. Moshovos , and G. S. Sohi . 1998. Dependence based prefetching for linked data structures . In Proceedings of the 8th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS\u201998) , 115\u2013126. A. Roth, A. Moshovos, and G. S. Sohi. 1998. Dependence based prefetching for linked data structures. In Proceedings of the 8th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS\u201998), 115\u2013126."},{"volume-title":"Proceedings of the 14th International Conference on Supercomputing (ICS\u201900)","author":"Yang C.-L.","key":"e_1_2_1_45_1","unstructured":"C.-L. Yang and A. R. Lebeck . 2000. Push vs. pull: Data movement for linked data structures . In Proceedings of the 14th International Conference on Supercomputing (ICS\u201900) . 176\u2013186. C.-L. Yang and A. R. Lebeck. 2000. Push vs. pull: Data movement for linked data structures. In Proceedings of the 14th International Conference on Supercomputing (ICS\u201900). 176\u2013186."},{"volume-title":"Proceedings of the 35th International Symposium on Microarchitecture. IEEE Computer Society Press","author":"Collins J.","key":"e_1_2_1_46_1","unstructured":"J. Collins , S. Sair , B. Calder , and D. 
Tullsen . 2002. Pointer cache assisted prefetching . In Proceedings of the 35th International Symposium on Microarchitecture. IEEE Computer Society Press , Los Alamitos, CA, 62\u201373. J. Collins, S. Sair, B. Calder, and D. Tullsen. 2002. Pointer cache assisted prefetching. In Proceedings of the 35th International Symposium on Microarchitecture. IEEE Computer Society Press, Los Alamitos, CA, 62\u201373."},{"volume-title":"Proceedings of the 26th Annual International Symposium on Computer Architecture (ISCA\u201999)","author":"Roth Amir","key":"e_1_2_1_47_1","unstructured":"Amir Roth and Gurindar S. Sohi . 1999. Effective jump-pointer prefetching for linked data structures . In Proceedings of the 26th Annual International Symposium on Computer Architecture (ISCA\u201999) . IEEE Computer Society, Los Alamitos, CA, 111\u2013121. Amir Roth and Gurindar S. Sohi. 1999. Effective jump-pointer prefetching for linked data structures. In Proceedings of the 26th Annual International Symposium on Computer Architecture (ISCA\u201999). IEEE Computer Society, Los Alamitos, CA, 111\u2013121."},{"volume-title":"Proceedings of the 30th Annual International Symposium on Computer Architecture (ISCA\u201903)","author":"Wang Zhenlin","key":"e_1_2_1_48_1","unstructured":"Zhenlin Wang , Doug Burger , Kathryn S. McKinley , Steven K. Reinhardt , and Charles C. Weems . 2003. Guided region prefetching: A cooperative hardware\/software approach . In Proceedings of the 30th Annual International Symposium on Computer Architecture (ISCA\u201903) . ACM, New York, NY, 388\u2013398. Zhenlin Wang, Doug Burger, Kathryn S. McKinley, Steven K. Reinhardt, and Charles C. Weems. 2003. Guided region prefetching: A cooperative hardware\/software approach. In Proceedings of the 30th Annual International Symposium on Computer Architecture (ISCA\u201903). 
ACM, New York, NY, 388\u2013398."},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/2830772.2830807"},{"key":"e_1_2_1_50_1","volume-title":"Multiperspective perceptron predictor. Championship Branch Prediction Competition","author":"Jimenez Daniel","year":"2016","unstructured":"Daniel Jimenez . 2016. Multiperspective perceptron predictor. Championship Branch Prediction Competition ( 2016 ). Daniel Jimenez. 2016. Multiperspective perceptron predictor. Championship Branch Prediction Competition (2016)."},{"key":"e_1_2_1_51_1","volume-title":"Proceedings of the International Symposium on High Performance Computer Architecture (HPCA-14)","author":"\u00a0al H. Gao","year":"2008","unstructured":"H. Gao et \u00a0al . 2008 . Address-branch correlation: A novel locality for long-latency hard-to-predict branches . Proceedings of the International Symposium on High Performance Computer Architecture (HPCA-14) . 74--85. DOI:10.1109\/HPCA.2008.4658629 10.1109\/HPCA.2008.4658629 H. Gao et\u00a0al. 2008. Address-branch correlation: A novel locality for long-latency hard-to-predict branches. Proceedings of the International Symposium on High Performance Computer Architecture (HPCA-14). 74--85. DOI:10.1109\/HPCA.2008.4658629"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/1787275.1787321"},{"volume-title":"Proceedings of the 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA\u201913)","author":"Farooq M. U.","key":"e_1_2_1_53_1","unstructured":"M. U. Farooq , Khubaib, and L. K. John . 2013. Store-load-branch (SLB) predictor: A compiler assisted branch prediction for data dependent branches . In Proceedings of the 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA\u201913) , 59\u201370. M. U. Farooq, Khubaib, and L. K. John. 2013. Store-load-branch (SLB) predictor: A compiler assisted branch prediction for data dependent branches. 
In Proceedings of the 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA\u201913), 59\u201370."},{"volume-title":"Proceedings of the International Symposium on Memory Systems (MEMSYS\u201917)","author":"G.","key":"e_1_2_1_54_1","unstructured":"Lloyd, G. Scott and Maya Gokhale. 2017. Near memory key\/value lookup acceleration . Proceedings of the International Symposium on Memory Systems (MEMSYS\u201917) . 26--33. DOI:https:\/\/doi.org\/10.1145\/3132402.3132434 10.1145\/3132402.3132434 Lloyd, G. Scott and Maya Gokhale. 2017. Near memory key\/value lookup acceleration. Proceedings of the International Symposium on Memory Systems (MEMSYS\u201917). 26--33. DOI:https:\/\/doi.org\/10.1145\/3132402.3132434"},{"key":"e_1_2_1_55_1","first-page":"1","article-title":"Main-memory hash joins on modern processor architectures","volume":"27","author":"Balkesen Cagri","year":"2014","unstructured":"Cagri Balkesen , Jens Teubner , Gustavo Alonso , and M. Tamer \u00d6zsu . 2014 . Main-memory hash joins on modern processor architectures . IEEE Trans. Knowl. Data Eng. 27 (2014), 1 \u2013 1 . 10.1109\/TKDE.2014.2313874. Cagri Balkesen, Jens Teubner, Gustavo Alonso, and M. Tamer \u00d6zsu. 2014. Main-memory hash joins on modern processor architectures. IEEE Trans. Knowl. Data Eng. 27 (2014), 1\u20131. 10.1109\/TKDE.2014.2313874.","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2013.6544839"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/356989.357013"},{"volume-title":"Proceedings of the 2012 45th Annual IEEE\/ACM International Symposium on Microarchitecture. 329\u2013340","author":"Sheikh R.","key":"e_1_2_1_58_1","unstructured":"R. Sheikh , J. Tuck , and E. Rotenberg . 2012. Control-flow decoupling . In Proceedings of the 2012 45th Annual IEEE\/ACM International Symposium on Microarchitecture. 329\u2013340 . 
DOI: 10.1109\/MICRO.2012.38 10.1109\/MICRO.2012.38 R. Sheikh, J. Tuck, and E. Rotenberg. 2012. Control-flow decoupling. In Proceedings of the 2012 45th Annual IEEE\/ACM International Symposium on Microarchitecture. 329\u2013340. DOI: 10.1109\/MICRO.2012.38"},{"key":"e_1_2_1_60_1","volume-title":"International Symposium on Code Generation and Optimization 2004","author":"Lattner C.","year":"2004","unstructured":"C. Lattner and V. Adve . 2004. LLVM: a compilation framework for lifelong program analysis & transformation . International Symposium on Code Generation and Optimization 2004 ( 2004 ), 75--86. DOI:10.1109\/CGO.2004.1281665 10.1109\/CGO.2004.1281665 C. Lattner and V. Adve. 2004. LLVM: a compilation framework for lifelong program analysis & transformation. International Symposium on Code Generation and Optimization 2004 (2004), 75--86. DOI:10.1109\/CGO.2004.1281665"},{"volume-title":"Proceedings of the International Conference on Computer Design VLSI in Computers and Processors. 593\u2013601","author":"Chen Cheng K.","key":"e_1_2_1_61_1","unstructured":"Cheng K. Chen , Chih-Chieh Lee , and T. N. Mudge . 1997. Instruction prefetching using branch prediction information . In Proceedings of the International Conference on Computer Design VLSI in Computers and Processors. 593\u2013601 . DOI: 10.1109\/ICCD.1997.628926. 10.1109\/ICCD.1997.628926 Cheng K. Chen, Chih-Chieh Lee, and T. N. Mudge. 1997. Instruction prefetching using branch prediction information. In Proceedings of the International Conference on Computer Design VLSI in Computers and Processors. 593\u2013601. 
DOI: 10.1109\/ICCD.1997.628926."},{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.1145\/2155620.2155638"},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1145\/3007787.3001184"},{"key":"e_1_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1145\/3374216"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3452099","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3452099","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3452099","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:48:07Z","timestamp":1750193287000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3452099"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,6,8]]},"references-count":62,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2021,9,30]]}},"alternative-id":["10.1145\/3452099"],"URL":"https:\/\/doi.org\/10.1145\/3452099","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2021,6,8]]},"assertion":[{"value":"2020-08-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-02-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-06-08","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}