{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,3]],"date-time":"2026-04-03T22:06:46Z","timestamp":1775254006483,"version":"3.50.1"},"reference-count":97,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2024,3,23]],"date-time":"2024-03-23T00:00:00Z","timestamp":1711152000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"National Key Research and Development Program of China","award":["2022YFB3105103"],"award-info":[{"award-number":["2022YFB3105103"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2024,6,30]]},"abstract":"<jats:p>\n            Indirect memory accesses (IMAs, i.e.,\n            <jats:italic>A<\/jats:italic>\n            [\n            <jats:italic>f<\/jats:italic>\n            (\n            <jats:italic>B<\/jats:italic>\n            [\n            <jats:italic>i<\/jats:italic>\n            ])]) are typical memory access patterns in applications such as graph analysis, machine learning, and database. IMAs are composed of producer-consumer pairs, where the consumers\u2019 memory addresses are derived from the producers\u2019 memory data. Due to the built-in value-dependent feature, IMAs exhibit poor locality, making prefetching ineffective. 
Hindered by the challenges of recording the potentially complex graphs of instruction dependencies among IMA producers and consumers, current state-of-the-art hardware prefetchers either (a) exhibit inadequate IMA identification abilities or (b)\u00a0rely on the run-ahead mechanism to prefetch IMAs intermittently and insufficiently.\n          <\/jats:p>\n          <jats:p>\n            To solve this problem, we propose Tyche,\n            <jats:xref ref-type=\"fn\">\n              <jats:sup>1<\/jats:sup>\n            <\/jats:xref>\n            an efficient and general hardware prefetcher to enhance IMA performance. Tyche adopts a bilateral propagation mechanism to precisely excavate the instruction dependencies in simple chains with moderate length (rather than complex graphs). Based on the exact instruction dependencies, Tyche can accurately identify various IMA patterns, including nonlinear ones, and generate accurate prefetching requests continuously. Evaluated on broad benchmarks, Tyche achieves an average performance speedup of 16.2% over the state-of-the-art spatial prefetcher Berti. 
More importantly, Tyche outperforms the state-of-the-art IMA prefetchers IMP, Gretch, and Vector Runahead, by 15.9%, 12.8%, and 10.7%, respectively, with a lower storage overhead of only 0.57 KB.\n          <\/jats:p>","DOI":"10.1145\/3641853","type":"journal-article","created":{"date-parts":[[2024,1,22]],"date-time":"2024-01-22T11:52:12Z","timestamp":1705924332000},"page":"1-26","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":8,"title":["Tyche: An Efficient and General Prefetcher for Indirect Memory Accesses"],"prefix":"10.1145","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0009-0005-3118-8477","authenticated-orcid":false,"given":"Feng","family":"Xue","sequence":"first","affiliation":[{"name":"State Key Lab of Processors, Institute of Computing Technology, CAS, Beijing, China and University of Chinese Academy of Sciences, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-1247-9644","authenticated-orcid":false,"given":"Chenji","family":"Han","sequence":"additional","affiliation":[{"name":"State Key Lab of Processors, Institute of Computing Technology, CAS, Beijing, China and University of Chinese Academy of Sciences, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2640-8173","authenticated-orcid":false,"given":"Xinyu","family":"Li","sequence":"additional","affiliation":[{"name":"State Key Lab of Processors, Institute of Computing Technology, CAS, Beijing, China and University of Chinese Academy of Sciences, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-6118-3541","authenticated-orcid":false,"given":"Junliang","family":"Wu","sequence":"additional","affiliation":[{"name":"State Key Lab of Processors, Institute of Computing Technology, CAS, Beijing, China and University of Chinese Academy of Sciences, Beijing, 
China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1724-4904","authenticated-orcid":false,"given":"Tingting","family":"Zhang","sequence":"additional","affiliation":[{"name":"Loongson Technology Co., Ltd., Beijing, China and Institute of Computing Technology, CAS, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5341-1343","authenticated-orcid":false,"given":"Tianyi","family":"Liu","sequence":"additional","affiliation":[{"name":"The University of Texas at San Antonio, San Antonio, TX, USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-9823-2573","authenticated-orcid":false,"given":"Yifan","family":"Hao","sequence":"additional","affiliation":[{"name":"State Key Lab of Processors, Institute of Computing Technology, CAS, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7603-4210","authenticated-orcid":false,"given":"Zidong","family":"Du","sequence":"additional","affiliation":[{"name":"State Key Lab of Processors, Institute of Computing Technology, CAS, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2530-5874","authenticated-orcid":false,"given":"Qi","family":"Guo","sequence":"additional","affiliation":[{"name":"State Key Lab of Processors, Institute of Computing Technology, CAS, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0430-3669","authenticated-orcid":false,"given":"Fuxin","family":"Zhang","sequence":"additional","affiliation":[{"name":"State Key Lab of Processors, Institute of Computing Technology, CAS, Beijing, China"}]}],"member":"320","published-online":{"date-parts":[[2024,3,23]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"crossref","first-page":"44","DOI":"10.1109\/IISWC.2015.11","volume-title":"Proceedings of the 2015 IEEE International Symposium on Workload Characterization","author":"Ahmad Masab","year":"2015","unstructured":"Masab Ahmad, Farrukh Hijaz, Qingchuan Shi, and Omer Khan. 2015. CRONO: A Benchmark Suite for Multithreaded Graph Algorithms Executing on Futuristic Multicores. 
In Proceedings of the 2015 IEEE International Symposium on Workload Characterization. 44\u201355."},{"issue":"2","key":"e_1_3_2_3_2","doi-asserted-by":"crossref","first-page":"578","DOI":"10.1145\/3296957.3173189","article-title":"An Event-Triggered Programmable Prefetcher for Irregular Workloads","volume":"53","author":"Ainsworth Sam","year":"2018","unstructured":"Sam Ainsworth and Timothy M. Jones. 2018. An Event-Triggered Programmable Prefetcher for Irregular Workloads. SIGPLAN Not. 53, 2 (Mar.2018), 578\u2013592.","journal-title":"SIGPLAN Not."},{"issue":"3","key":"e_1_3_2_4_2","first-page":"8","article-title":"Software Prefetching for Indirect Memory Accesses: A Microarchitectural Perspective","volume":"36","author":"Ainsworth Sam","year":"2019","unstructured":"Sam Ainsworth and Timothy M. Jones. 2019. Software Prefetching for Indirect Memory Accesses: A Microarchitectural Perspective. ACM Trans. Comput. Syst. 36, 3, Article 8 (Jun.2019), 34 pages.","journal-title":"ACM Trans. Comput. Syst."},{"key":"e_1_3_2_5_2","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1109\/PACT.2003.1238005","volume-title":"Proceedings of the 2003 12th International Conference on Parallel Architectures and Compilation Techniques","author":"Al-Sukhni Hassan","year":"2003","unstructured":"Hassan Al-Sukhni, Ian Bratt, and Daniel A. Connors. 2003. Compiler-Directed Content-Aware Prefetching for Dynamic Data Structures. In Proceedings of the 2003 12th International Conference on Parallel Architectures and Compilation Techniques. IEEE, 91\u2013100."},{"key":"e_1_3_2_6_2","doi-asserted-by":"crossref","first-page":"52","DOI":"10.1145\/379240.379251","volume-title":"Proceedings of the 28th Annual International Symposium on Computer Architecture (ISCA\u201901)","author":"Annavaram Murali","year":"2001","unstructured":"Murali Annavaram, Jignesh M. Patel, and Edward S. Davidson. 2001. Data Prefetching by Dependence Graph Precomputation. 
In Proceedings of the 28th Annual International Symposium on Computer Architecture (ISCA\u201901). ACM, New York, 52\u201361."},{"key":"e_1_3_2_7_2","first-page":"513","volume-title":"Proceedings of the 25th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS\u201920)","author":"Ayers Grant","year":"2020","unstructured":"Grant Ayers, Heiner Litz, Christos Kozyrakis, and Parthasarathy Ranganathan. 2020. Classifying Memory Access Patterns for Prefetching. In Proceedings of the 25th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS\u201920). ACM, New York, 513\u2013526."},{"key":"e_1_3_2_8_2","first-page":"131","volume-title":"Proceedings of the IEEE International Symposium on High Performance Computer Architecture (HPCA\u201918)","author":"Bakhshalipour Mohammad","year":"2018","unstructured":"Mohammad Bakhshalipour, Pejman Lotfi-Kamran, and Hamid Sarbazi-Azad. 2018. Domino Temporal Data Prefetcher. In Proceedings of the IEEE International Symposium on High Performance Computer Architecture (HPCA\u201918). IEEE, Los Alamitos, CA, 131\u2013142."},{"key":"e_1_3_2_9_2","doi-asserted-by":"crossref","first-page":"399","DOI":"10.1109\/HPCA.2019.00053","volume-title":"Proceedings of the 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA\u201919)","author":"Bakhshalipour Mohammad","year":"2019","unstructured":"Mohammad Bakhshalipour, Mehran Shakerinava, Pejman Lotfi-Kamran, and Hamid Sarbazi-Azad. 2019. Bingo Spatial Data Prefetcher. In Proceedings of the 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA\u201919). 
IEEE, Los Alamitos, CA, 399\u2013411."},{"issue":"2","key":"e_1_3_2_10_2","first-page":"14","article-title":"CACTI 7: New Tools for Interconnect Exploration in Innovative Off-Chip Memories","volume":"14","author":"Balasubramonian Rajeev","year":"2017","unstructured":"Rajeev Balasubramonian, Andrew B. Kahng, Naveen Muralimanohar, Ali Shafiee, and Vaishnav Srinivas. 2017. CACTI 7: New Tools for Interconnect Exploration in Innovative Off-Chip Memories. ACM Trans. Archit. Code Optim. 14, 2, Article 14 (Jun.2017), 25 pages.","journal-title":"ACM Trans. Archit. Code Optim."},{"key":"e_1_3_2_11_2","first-page":"362","volume-title":"Proceedings of the 29th IEEE International Conference on Data Engineering (ICDE\u201913)","author":"Balkesen Cagri","year":"2013","unstructured":"Cagri Balkesen, Jens Teubner, Gustavo Alonso, and M. Tamer \u00d6zsu. 2013. Main-Memory Hash Joins on Multi-Core CPUs: Tuning to the Underlying Hardware. In Proceedings of the 29th IEEE International Conference on Data Engineering (ICDE\u201913). IEEE, Los Alamitos, CA, 362\u2013373."},{"key":"e_1_3_2_12_2","first-page":"373","volume-title":"Proceedings of the IEEE International Symposium on High Performance Computer Architecture (HPCA\u201919)","author":"Basak Abanti","year":"2019","unstructured":"Abanti Basak, Shuangchen Li, Xing Hu, Sang Min Oh, Xinfeng Xie, Li Zhao, Xiaowei Jiang, and Yuan Xie. 2019. Analysis and optimization of the memory hierarchy for graph processing workloads. In Proceedings of the IEEE International Symposium on High Performance Computer Architecture (HPCA\u201919). IEEE, Los Alamitos, CA, 373\u2013386."},{"key":"e_1_3_2_13_2","first-page":"41","volume-title":"Proceedings of the Annual Conference on USENIX Annual Technical Conference (ATEC\u201905)","author":"Bellard Fabrice","year":"2005","unstructured":"Fabrice Bellard. 2005. QEMU, a Fast and Portable Dynamic Translator. In Proceedings of the Annual Conference on USENIX Annual Technical Conference (ATEC\u201905). 
USENIX Association, 41."},{"key":"e_1_3_2_14_2","doi-asserted-by":"crossref","first-page":"531","DOI":"10.1145\/3352460.3358325","volume-title":"Proceedings of the 52nd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201919)","author":"Bera Rahul","year":"2019","unstructured":"Rahul Bera, Anant V. Nori, Onur Mutlu, and Sreenivas Subramoney. 2019. DSPatch: Dual Spatial Pattern Prefetcher. In Proceedings of the 52nd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201919). ACM, New York, 531\u2013544."},{"issue":"1","key":"e_1_3_2_15_2","first-page":"4","article-title":"Informed Prefetching for Indirect Memory Accesses","volume":"17","author":"Cavus Mustafa","year":"2020","unstructured":"Mustafa Cavus, Resit Sendag, and Joshua J. Yi. 2020. Informed Prefetching for Indirect Memory Accesses. ACM Trans. Archit. Code Optim. 17, 1, Article 4 (Mar.2020), 29 pages.","journal-title":"ACM Trans. Archit. Code Optim."},{"issue":"2","key":"e_1_3_2_16_2","doi-asserted-by":"crossref","first-page":"186","DOI":"10.1145\/307338.300995","article-title":"Simultaneous Subordinate Microthreading (SSMT)","volume":"27","author":"Chappell Robert S.","year":"1999","unstructured":"Robert S. Chappell, Jared Stark, Sangwook P. Kim, Steven K. Reinhardt, and Yale N. Patt. 1999. Simultaneous Subordinate Microthreading (SSMT). SIGARCH Comput. Archit. News 27, 2 (May1999), 186\u2013195.","journal-title":"SIGARCH Comput. Archit. News"},{"key":"e_1_3_2_17_2","article-title":"Improving Hash Join Performance through Prefetching","author":"Chen Shimin","year":"2007","unstructured":"Shimin Chen, Anastassia Ailamaki, Phillip B. Gibbons, and Todd C. Mowry. 2007. Improving Hash Join Performance through Prefetching. ACM Trans. Database Syst. 32, 3 (2007), 36 pages.","journal-title":"ACM Trans. 
Database Syst."},{"issue":"05","key":"e_1_3_2_18_2","doi-asserted-by":"crossref","first-page":"609","DOI":"10.1109\/12.381947","article-title":"Effective Hardware-Based Data Prefetching for High-Performance Processors","volume":"44","author":"Chen Tien-Fu","year":"1995","unstructured":"Tien-Fu Chen and Jean-Loup Baer. 1995. Effective Hardware-Based Data Prefetching for High-Performance Processors. IEEE Trans. Comput. 44, 05 (May1995), 609\u2013623.","journal-title":"IEEE Trans. Comput."},{"issue":"2","key":"e_1_3_2_19_2","doi-asserted-by":"crossref","first-page":"214","DOI":"10.1145\/986533.986536","article-title":"A General Framework for Prefetch Scheduling in Linked Data Structures and Its Application to Multi-Chain Prefetching","volume":"22","author":"Choi Seungryul","year":"2004","unstructured":"Seungryul Choi, Nicholas Kohout, Sumit Pamnani, Dongkeun Kim, and Donald Yeung. 2004. A General Framework for Prefetch Scheduling in Linked Data Structures and Its Application to Multi-Chain Prefetching. ACM Trans. Comput. Syst. 22, 2 (May2004), 214\u2013280.","journal-title":"ACM Trans. Comput. Syst."},{"key":"e_1_3_2_20_2","first-page":"62","volume-title":"Proceedings of the 35th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201902)","author":"Collins Jamison","year":"2002","unstructured":"Jamison Collins, Suleyman Sair, Brad Calder, and Dean M. Tullsen. 2002. Pointer Cache Assisted Prefetching. In Proceedings of the 35th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201902). IEEE, Los Alamitos, CA, 62\u201373."},{"key":"e_1_3_2_21_2","doi-asserted-by":"crossref","first-page":"306","DOI":"10.1109\/MICRO.2001.991128","volume-title":"Proceedings of the 34th ACM\/IEEE International Symposium on Microarchitecture (MICRO\u201901)","author":"Collins Jamison D.","year":"2001","unstructured":"Jamison D. Collins, Dean M. Tullsen, Hong Wang, and John Paul Shen. 2001. Dynamic speculative precomputation. 
In Proceedings of the 34th ACM\/IEEE International Symposium on Microarchitecture (MICRO\u201901). IEEE, Los Alamitos, CA, 306\u2013317."},{"issue":"2","key":"e_1_3_2_22_2","doi-asserted-by":"crossref","first-page":"14","DOI":"10.1145\/384285.379248","article-title":"Speculative Precomputation: Long-Range Prefetching of Delinquent Loads","volume":"29","author":"Collins Jamison D.","year":"2001","unstructured":"Jamison D. Collins, Hong Wang, Dean M. Tullsen, Christopher Hughes, Yong-Fong Lee, Dan Lavery, and John P. Shen. 2001. Speculative Precomputation: Long-Range Prefetching of Delinquent Loads. SIGARCH Comput. Archit. News 29, 2 (May2001), 14\u201325.","journal-title":"SIGARCH Comput. Archit. News"},{"issue":"10","key":"e_1_3_2_23_2","doi-asserted-by":"crossref","first-page":"279","DOI":"10.1145\/605432.605427","article-title":"A Stateless, Content-Directed Data Prefetching Mechanism","volume":"37","author":"Cooksey Robert","year":"2002","unstructured":"Robert Cooksey, Stephan Jourdan, and Dirk Grunwald. 2002. A Stateless, Content-Directed Data Prefetching Mechanism. SIGPLAN Not. 37, 10 (Oct.2002), 279\u2013290.","journal-title":"SIGPLAN Not."},{"issue":"1","key":"e_1_3_2_24_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/2049662.2049663","article-title":"The University of Florida Sparse Matrix Collection","volume":"38","author":"Davis Timothy A.","year":"2011","unstructured":"Timothy A. Davis and Yifan Hu. 2011. The University of Florida Sparse Matrix Collection. ACM Trans. Math. Softw. 38, 1, Article 1 (Dec.2011), 25 pages.","journal-title":"ACM Trans. Math. Softw."},{"key":"e_1_3_2_25_2","first-page":"1","volume-title":"2006 IEEE Hot Chips 18 Symposium (HCS)","author":"Doweck Jack","year":"2006","unstructured":"Jack Doweck. 2006. Inside Intel\u00aeCore microarchitecture. In 2006 IEEE Hot Chips 18 Symposium (HCS). 
IEEE, Los Alamitos, CA, USA, 1\u201335."},{"key":"e_1_3_2_26_2","first-page":"68","volume-title":"Proceedings of the 11th International Conference on Supercomputing (ICS\u201997)","author":"Dundas James","year":"1997","unstructured":"James Dundas and Trevor Mudge. 1997. Improving Data Cache Performance by Pre-Executing Instructions under a Cache Miss. In Proceedings of the 11th International Conference on Supercomputing (ICS\u201997). ACM, New York, NY, USA, 68\u201375."},{"key":"e_1_3_2_27_2","first-page":"7","volume-title":"Proceedings of the 2009 IEEE 15th International Symposium on High Performance Computer Architecture","author":"Ebrahimi Eiman","year":"2009","unstructured":"Eiman Ebrahimi, Onur Mutlu, and Yale N. Patt. 2009. Techniques for bandwidth-efficient prefetching of linked data structures in hybrid prefetching systems. In Proceedings of the 2009 IEEE 15th International Symposium on High Performance Computer Architecture. IEEE, Los Alamitos, CA, 7\u201317."},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","unstructured":"Nathan Gober, Gino Chacon, Lei Wang, Paul V. Gratz, Daniel A. Jimenez, Elvira Teran, Seth Pugsley, and Jinchun Kim. 2022. The Championship Simulator: Architectural Simulation for Education and Competition. (2022). DOI:10.48550\/arXiv.2210.14324","DOI":"10.48550\/arXiv.2210.14324"},{"key":"e_1_3_2_29_2","first-page":"599","volume-title":"Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI\u201914)","author":"Gonzalez Joseph E.","year":"2014","unstructured":"Joseph E. Gonzalez, Reynold S. Xin, Ankur Dave, Daniel Crankshaw, Michael J. Franklin, and Ion Stoica. 2014. GraphX: Graph processing in a distributed dataflow framework. In Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI\u201914). 
USENIX Association, Broomfield, CO, 599\u2013613."},{"key":"e_1_3_2_30_2","first-page":"40","volume-title":"Proceedings of the ACM\/IEEE 47th Annual International Symposium on Computer Architecture (ISCA\u201920)","author":"Grayson Brian","year":"2020","unstructured":"Brian Grayson, Jeff Rupley, Gerald Zuraski, Eric Quinnell, Daniel A. Jim\u00e9nez, Tarun Nakra, Paul Kitchin, Ryan Hensley, Edward Brekelbaum, Vikas Sinha, and Ankit Ghiya. 2020. Evolution of the Samsung Exynos CPU Microarchitecture. In Proceedings of the ACM\/IEEE 47th Annual International Symposium on Computer Architecture (ISCA\u201920). IEEE, Los Alamitos, CA, 40\u201351."},{"key":"e_1_3_2_31_2","first-page":"1","volume-title":"Proceedings of the 49th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201916)","author":"Ham Tae Jun","year":"2016","unstructured":"Tae Jun Ham, Lisa Wu, Narayanan Sundaram, Nadathur Satish, and Margaret Martonosi. 2016. Graphicionado: A High-Performance and Energy-Efficient Accelerator for Graph Analytics. In Proceedings of the 49th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201916). IEEE, 1\u201313."},{"key":"e_1_3_2_32_2","first-page":"1","volume-title":"Proceedings of the 49th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201916)","author":"Hashemi Milad","year":"2016","unstructured":"Milad Hashemi, Onur Mutlu, and Yale N. Patt. 2016. Continuous Runahead: Transparent Hardware Acceleration for Memory Intensive Workloads. In Proceedings of the 49th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201916). IEEE, Los Alamitos, CA, 1\u201312."},{"key":"e_1_3_2_33_2","doi-asserted-by":"crossref","first-page":"358","DOI":"10.1145\/2830772.2830812","volume-title":"Proceedings of the 48th International Symposium on Microarchitecture (MICRO\u201915)","author":"Hashemi Milad","year":"2015","unstructured":"Milad Hashemi and Yale N. Patt. 2015. 
Filtered Runahead Execution with a Runahead Buffer. In Proceedings of the 48th International Symposium on Microarchitecture (MICRO\u201915). ACM, New York, 358\u2013369."},{"key":"e_1_3_2_34_2","first-page":"317","volume-title":"Proceedings of the 19th International Symposium on High Performance Computer Architecture (HPCA\u201903)","author":"Hu Zhigang","year":"2003","unstructured":"Zhigang Hu, Margaret Martonosi, and Stefanos Kaxiras. 2003. TCP: Tag Correlating Prefetchers. In Proceedings of the 19th International Symposium on High Performance Computer Architecture (HPCA\u201903). IEEE, Los Alamitos, CA, 317\u2013326."},{"key":"e_1_3_2_35_2","first-page":"604","volume-title":"Proceedings of the 30th Annual ACM Symposium on Theory of Computing (STOC\u201998)","author":"Indyk Piotr","year":"1998","unstructured":"Piotr Indyk and Rajeev Motwani. 1998. Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality. In Proceedings of the 30th Annual ACM Symposium on Theory of Computing (STOC\u201998). ACM, New York, 604\u2013613."},{"key":"e_1_3_2_36_2","doi-asserted-by":"crossref","first-page":"247","DOI":"10.1145\/2540708.2540730","volume-title":"Proceedings of the 46th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201913)","author":"Jain Akanksha","year":"2013","unstructured":"Akanksha Jain and Calvin Lin. 2013. Linearizing Irregular Memory Accesses for Improved Correlated Prefetching. In Proceedings of the 46th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201913). ACM, New York, 247\u2013259."},{"key":"e_1_3_2_37_2","first-page":"747","volume-title":"Proceedings of the 17th European Conference on Computer Systems (EuroSys\u201922)","author":"Jamilan Saba","year":"2022","unstructured":"Saba Jamilan, Tanvir Ahmed Khan, Grant Ayers, Baris Kasikci, and Heiner Litz. 2022. APT-GET: Profile-Guided Timely Software Prefetching. In Proceedings of the 17th European Conference on Computer Systems (EuroSys\u201922). 
ACM, New York, 747\u2013764."},{"key":"e_1_3_2_38_2","first-page":"1012","volume-title":"Proceedings of the 55th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201922)","author":"Jiang Shizhi","year":"2022","unstructured":"Shizhi Jiang, Qiusong Yang, and Yiwei Ci. 2022. Merging Similar Patterns for Hardware Prefetching. In Proceedings of the 55th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201922). IEEE, Los Alamitos, CA, 1012\u20131026."},{"key":"e_1_3_2_39_2","doi-asserted-by":"crossref","first-page":"252","DOI":"10.1145\/264107.264207","volume-title":"Proceedings of the 24th Annual International Symposium on Computer Architecture (ISCA\u201997)","author":"Joseph Doug","year":"1997","unstructured":"Doug Joseph and Dirk Grunwald. 1997. Prefetching Using Markov Predictors. In Proceedings of the 24th Annual International Symposium on Computer Architecture (ISCA\u201997). ACM, New York, 252\u2013263."},{"key":"e_1_3_2_40_2","first-page":"623","volume-title":"Proceedings of the 47th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201914)","author":"Kadjo David","year":"2014","unstructured":"David Kadjo, Jinchun Kim, Prabal Sharma, Reena Panda, Paul Gratz, and Daniel Jimenez. 2014. B-Fetch: Branch Prediction Directed Prefetching for Chip-Multiprocessors. In Proceedings of the 47th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201914). IEEE, 623\u2013634."},{"issue":"1","key":"e_1_3_2_41_2","doi-asserted-by":"crossref","first-page":"393","DOI":"10.1145\/1961295.1950411","article-title":"Inter-Core Prefetching for Multicore Processors Using Migrating Helper Threads","volume":"39","author":"Kamruzzaman Md","year":"2011","unstructured":"Md Kamruzzaman, Steven Swanson, and Dean M. Tullsen. 2011. Inter-Core Prefetching for Multicore Processors Using Migrating Helper Threads. SIGARCH Comput. Archit. News 39, 1 (Mar.2011), 393\u2013404.","journal-title":"SIGARCH Comput. Archit. 
News"},{"key":"e_1_3_2_42_2","first-page":"243","volume-title":"Proceedings of the 27th International Conference on Data Engineering","author":"Kang U.","year":"2011","unstructured":"U. Kang, Duen Horng Chau, and Christos Faloutsos. 2011. Mining large graphs: Algorithms, inference, and discoveries. In Proceedings of the 27th International Conference on Data Engineering. IEEE, Los Alamitos, CA, 243\u2013254."},{"issue":"2","key":"e_1_3_2_43_2","first-page":"18","article-title":"Gretch: A Hardware Prefetcher for Graph Analytics","volume":"18","author":"Kaushik Anirudh Mohan","year":"2021","unstructured":"Anirudh Mohan Kaushik, Gennady Pekhimenko, and Hiren Patel. 2021. Gretch: A Hardware Prefetcher for Graph Analytics. ACM Trans. Archit. Code Optim. 18, 2, Article 18 (Feb.2021), 25 pages.","journal-title":"ACM Trans. Archit. Code Optim."},{"key":"e_1_3_2_44_2","volume-title":"Proceedings of the 49th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201916)","author":"Kim Jinchun","year":"2016","unstructured":"Jinchun Kim, Seth H. Pugsley, Paul V. Gratz, A. L. Narasimha Reddy, Chris Wilkerson, and Zeshan Chishti. 2016. Path Confidence Based Lookahead Prefetching. In Proceedings of the 49th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201916). IEEE, Los Alamitos, CA, Article 60, 12 pages."},{"key":"e_1_3_2_45_2","first-page":"468","volume-title":"Proceedings of the 46th International Symposium on Microarchitecture (MICRO\u201913)","author":"Kocberber Onur","year":"2013","unstructured":"Onur Kocberber, Boris Grot, Javier Picorel, Babak Falsafi, Kevin Lim, and Parthasarathy Ranganathan. 2013. Meet the Walkers: Accelerating Index Traversals for in-Memory Databases. In Proceedings of the 46th International Symposium on Microarchitecture (MICRO\u201913). 
ACM, 468\u2013479."},{"key":"e_1_3_2_46_2","first-page":"83","volume-title":"Proceedings of the 45th Annual International Symposium on Computer Architecture (ISCA\u201918)","author":"Kondguli Sushant","year":"2018","unstructured":"Sushant Kondguli and Michael Huang. 2018. Division of Labor: A More Effective Approach to Prefetching. In Proceedings of the 45th Annual International Symposium on Computer Architecture (ISCA\u201918). IEEE, Los Alamitos, CA, 83\u201395."},{"issue":"09","key":"e_1_3_2_47_2","doi-asserted-by":"crossref","first-page":"1309","DOI":"10.1109\/TPDS.2008.224","article-title":"Prefetching with Helper Threads for Loosely Coupled Multiprocessor Systems","volume":"20","author":"Lee Jaejin","year":"2009","unstructured":"Jaejin Lee, Changhee Jung, Daeseob Lim, and Yan Solihin. 2009. Prefetching with Helper Threads for Loosely Coupled Multiprocessor Systems. IEEE Transactions on Parallel and Distributed Systems 20, 09 (Sep.2009), 1309\u20131324.","journal-title":"IEEE Transactions on Parallel and Distributed Systems"},{"key":"e_1_3_2_48_2","unstructured":"Jure Leskovec and Andrej Krevl. 2014. SNAP Datasets: Stanford Large Network Dataset Collection. Retrieved June 6, 2023 from http:\/\/snap.stanford.edu\/data"},{"key":"e_1_3_2_49_2","first-page":"685","volume-title":"Proceedings of the 29th 2008 IEEE International Conference on Computer Design","author":"Lim Chun Leng","year":"2008","unstructured":"Chun Leng Lim and Gregory T. Byrd. 2008. Exploiting producer patterns and L2 cache for timely dependence-based prefetching. In Proceedings of the 29th 2008 IEEE International Conference on Computer Design. IEEE, Los Alamitos, CA, 685\u2013692."},{"key":"e_1_3_2_50_2","first-page":"300","volume-title":"Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS\u201922)","author":"Litz Heiner","year":"2022","unstructured":"Heiner Litz, Grant Ayers, and Parthasarathy Ranganathan. 2022. 
CRISP: Critical Slice Prefetching. In Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS\u201922). ACM, New York, 300\u2013313."},{"key":"e_1_3_2_51_2","first-page":"339","volume-title":"Proceedings of the 29th ACM on International Conference on Supercomputing (ICS\u201915)","author":"Liu Weifeng","year":"2015","unstructured":"Weifeng Liu and Brian Vinter. 2015. CSR5: An Efficient Storage Format for Cross-Platform Sparse Matrix-Vector Multiplication. In Proceedings of the 29th ACM on International Conference on Supercomputing (ICS\u201915). ACM, New York, 339\u2013350."},{"key":"e_1_3_2_52_2","first-page":"417","volume-title":"Proceedings of the 25th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS\u201920)","author":"Lockerman Elliot","year":"2020","unstructured":"Elliot Lockerman, Axel Feldmann, Mohammad Bakhshalipour, Alexandru Stanescu, Shashwat Gupta, Daniel Sanchez, and Nathan Beckmann. 2020. Livia: Data-Centric Computing Throughout the Memory Hierarchy. In Proceedings of the 25th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS\u201920). ACM, New York, 417\u2013433."},{"key":"e_1_3_2_53_2","doi-asserted-by":"crossref","first-page":"40","DOI":"10.1109\/ISCA.2001.937430","volume-title":"Proceedings of the 28th Annual International Symposium on Computer Architecture (ISCA\u201901)","author":"Luk Chi-Keung","year":"2001","unstructured":"Chi-Keung Luk. 2001. Tolerating Memory Latency through Software-Controlled Pre-Execution in Simultaneous Multithreading Processors. In Proceedings of the 28th Annual International Symposium on Computer Architecture (ISCA\u201901). 
ACM, New York, 40\u201351."},{"key":"e_1_3_2_54_2","first-page":"222","volume-title":"Proceedings of the 7th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS\u201996)","author":"Luk Chi-Keung","year":"1996","unstructured":"Chi-Keung Luk and Todd C. Mowry. 1996. Compiler-Based Prefetching for Recursive Data Structures. In Proceedings of the 7th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS\u201996). ACM, New York, 222\u2013233."},{"key":"e_1_3_2_55_2","volume-title":"Proceedings of the 2006 ACM\/IEEE Conference on Supercomputing (SC\u201906)","author":"Luszczek Piotr R.","year":"2006","unstructured":"Piotr R. Luszczek, David H. Bailey, Jack J. Dongarra, Jeremy Kepner, Robert F. Lucas, Rolf Rabenseifner, and Daisuke Takahashi. 2006. The HPC Challenge (HPCC) Benchmark Suite. In Proceedings of the 2006 ACM\/IEEE Conference on Supercomputing (SC\u201906). ACM, New York, Article 213."},{"issue":"01","key":"e_1_3_2_56_2","doi-asserted-by":"crossref","first-page":"291","DOI":"10.1109\/TC.2022.3157525","article-title":"Graphfire: Synergizing Fetch, Insertion, and Replacement Policies for Graph Analytics","volume":"72","author":"Manocha Aninda","year":"2023","unstructured":"Aninda Manocha, Juan L. Arag\u00f3n, and Margaret Martonosi. 2023. Graphfire: Synergizing Fetch, Insertion, and Replacement Policies for Graph Analytics. IEEE Trans. Comput. 72, 01 (Jan.2023), 291\u2013304.","journal-title":"IEEE Trans. Comput."},{"key":"e_1_3_2_57_2","first-page":"469","volume-title":"Proceedings of the IEEE International Symposium on High Performance Computer Architecture (HPCA\u201916)","author":"Michaud Pierre","year":"2016","unstructured":"Pierre Michaud. 2016. Best-Offset Hardware Prefetching. In Proceedings of the IEEE International Symposium on High Performance Computer Architecture (HPCA\u201916). 
IEEE, Los Alamitos, CA, 469\u2013480."},{"key":"e_1_3_2_58_2","article-title":"Introducing the Graph 500","author":"Murphy Richard C.","year":"2010","unstructured":"Richard C. Murphy, Kyle B. Wheeler, Brian W. Barrett, and James A. Ang. 2010. Introducing the Graph 500. Cray Users Group (CUG) (2010).","journal-title":"Cray Users Group (CUG)"},{"key":"e_1_3_2_59_2","first-page":"370","volume-title":"Proceedings of the 32nd Annual International Symposium on Computer Architecture (ISCA\u201905)","author":"Mutlu Onur","year":"2005","unstructured":"Onur Mutlu, Hyesoon Kim, and Yale N. Patt. 2005. Techniques for Efficient Processing in Runahead Execution Engines. In Proceedings of the 32nd Annual International Symposium on Computer Architecture (ISCA\u201905). IEEE, Los Alamitos, CA, 370\u2013381."},{"issue":"01","key":"e_1_3_2_60_2","doi-asserted-by":"crossref","first-page":"10","DOI":"10.1109\/MM.2006.10","article-title":"Efficient Runahead Execution: Power-Efficient Memory Latency Tolerance","volume":"26","author":"Mutlu Onur","year":"2006","unstructured":"Onur Mutlu, Hyesoon Kim, and Yale N. Patt. 2006. Efficient Runahead Execution: Power-Efficient Memory Latency Tolerance. IEEE Micro 26, 01 (Jan.2006), 10\u201320.","journal-title":"IEEE Micro"},{"key":"e_1_3_2_61_2","first-page":"129","volume-title":"Proceedings of the IEEE 19th International Symposium on High Performance Computer Architecture (HPCA\u201903)","author":"Mutlu Onur","year":"2003","unstructured":"Onur Mutlu, Jared Stark, Chris Wilkerson, and Yale N. Patt. 2003. Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-Order Processors. In Proceedings of the IEEE 19th International Symposium on High Performance Computer Architecture (HPCA\u201903). 
IEEE, Los Alamitos, CA, 129."},{"issue":"06","key":"e_1_3_2_62_2","doi-asserted-by":"crossref","first-page":"20","DOI":"10.1109\/MM.2003.1261383","article-title":"Runahead Execution: An Effective Alternative to Large Instruction Windows","volume":"23","author":"Mutlu Onur","year":"2003","unstructured":"Onur Mutlu, Jared Stark, Chris Wilkerson, and Yale N. Patt. 2003. Runahead Execution: An Effective Alternative to Large Instruction Windows. IEEE Micro 23, 06 (Nov.2003), 20\u201325.","journal-title":"IEEE Micro"},{"key":"e_1_3_2_63_2","first-page":"195","volume-title":"Proceedings of the 48th Annual International Symposium on Computer Architecture (ISCA\u201921)","author":"Naithani Ajeya","year":"2021","unstructured":"Ajeya Naithani, Sam Ainsworth, Timothy M. Jones, and Lieven Eeckhout. 2021. Vector Runahead. In Proceedings of the 48th Annual International Symposium on Computer Architecture (ISCA\u201921). IEEE, Los Alamitos, CA, USA, 195\u2013208."},{"key":"e_1_3_2_64_2","first-page":"397","volume-title":"Proceedings of the IEEE International Symposium on High Performance Computer Architecture (HPCA\u201920)","author":"Naithani Ajeya","year":"2020","unstructured":"Ajeya Naithani, Josu\u00e9 Feliu, Almutaz Adileh, and Lieven Eeckhout. 2020. Precise Runahead Execution. In Proceedings of the IEEE International Symposium on High Performance Computer Architecture (HPCA\u201920). IEEE, Los Alamitos, CA, USA, 397\u2013410."},{"key":"e_1_3_2_65_2","doi-asserted-by":"crossref","first-page":"17","DOI":"10.1145\/3613424.3614255","volume-title":"Proceedings of the 56th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201923)","author":"Naithani Ajeya","year":"2023","unstructured":"Ajeya Naithani, Jaime Roelandts, Sam Ainsworth, Timothy M. Jones, and Lieven Eeckhout. 2023. Decoupled Vector Runahead. In Proceedings of the 56th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201923). 
ACM, New York, 17\u201331."},{"key":"e_1_3_2_66_2","first-page":"975","volume-title":"Proceedings of the 55th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201922)","author":"Navarro-Torres Agust\u00edn","year":"2023","unstructured":"Agust\u00edn Navarro-Torres, Biswabandan Panda, Jes\u00fas Alastruey-Bened\u00e9, Pablo Ib\u00e1\u00f1ez, V\u00edctor Vi\u00f1als Y\u00fafera, and Alberto Ros. 2023. Berti: An Accurate Local-Delta Data Prefetcher. In Proceedings of the 55th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201922). IEEE, 975\u2013991."},{"key":"e_1_3_2_67_2","first-page":"96","volume-title":"Proceedings of the IEEE 19th International Symposium on High Performance Computer Architecture (HPCA\u201913)","author":"Nesbit Kyle J.","year":"2004","unstructured":"Kyle J. Nesbit and James E. Smith. 2004. Data Cache Prefetching Using a Global History Buffer. In Proceedings of the IEEE 19th International Symposium on High Performance Computer Architecture (HPCA\u201913). IEEE, Los Alamitos, CA, USA, 96."},{"issue":"03","key":"e_1_3_2_68_2","doi-asserted-by":"crossref","first-page":"483","DOI":"10.1109\/TC.2020.2990302","article-title":"Compiler-Assisted Data Streaming for Regular Code Structures","volume":"70","author":"Neves Nuno","year":"2021","unstructured":"Nuno Neves, Pedro Tom\u00e1s, and Nuno Roma. 2021. Compiler-Assisted Data Streaming for Regular Code Structures. IEEE Trans. Comput. 70, 03 (Mar.2021), 483\u2013494.","journal-title":"IEEE Trans. Comput."},{"key":"e_1_3_2_69_2","first-page":"596","volume-title":"Proceedings of the 53rd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201920)","author":"Nguyen Quan M.","year":"2020","unstructured":"Quan M. Nguyen and Daniel Sanchez. 2020. Pipette: Improving Core Utilization on Irregular Applications through Intra-Core Pipeline Parallelism. In Proceedings of the 53rd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201920). 
IEEE, Los Alamitos, CA, 596\u2013608."},{"key":"e_1_3_2_70_2","first-page":"118","volume-title":"Proceedings of the ACM\/IEEE 47th Annual International Symposium on Computer Architecture (ISCA\u201920)","author":"Pakalapati Samuel","year":"2020","unstructured":"Samuel Pakalapati and Biswabandan Panda. 2020. Bouquet of Instruction Pointers: Instruction Pointer Classifier-Based Spatial Hardware Prefetching. In Proceedings of the ACM\/IEEE 47th Annual International Symposium on Computer Architecture (ISCA\u201920). IEEE, Los Alamitos, CA, 118\u2013131."},{"key":"e_1_3_2_71_2","first-page":"1","volume-title":"2021 IEEE Hot Chips 33 Symposium","author":"Pellegrini Andrea","year":"2021","unstructured":"Andrea Pellegrini. 2021. Arm Neoverse N2: Arm\u2019s 2nd generation high performance infrastructure CPUs and system IPs. In 2021 IEEE Hot Chips 33 Symposium. IEEE, Los Alamitos, CA, 1\u201327."},{"key":"e_1_3_2_72_2","first-page":"626","volume-title":"Proceedings of the IEEE 20th International Symposium on High Performance Computer Architecture (HPCA\u201914)","author":"Pugsley Seth H.","year":"2014","unstructured":"Seth H. Pugsley, Zeshan Chishti, Chris Wilkerson, Peng-fei Chuang, Robert L. Scott, Aamer Jaleel, Shih-Lien Lu, Kingsum Chow, and Rajeev Balasubramonian. 2014. Sandbox Prefetching: Safe run-time evaluation of aggressive prefetchers. In Proceedings of the IEEE 20th International Symposium on High Performance Computer Architecture (HPCA\u201914). IEEE, Los Alamitos, CA, 626\u2013637."},{"issue":"5","key":"e_1_3_2_73_2","doi-asserted-by":"crossref","first-page":"189","DOI":"10.1145\/1037947.1024416","article-title":"Compiler Orchestrated Prefetching via Speculation and Predication","volume":"32","author":"Rabbah Rodric M.","year":"2004","unstructured":"Rodric M. Rabbah, Hariharan Sandanagobalane, Mongkol Ekpanyapong, and Weng-Fai Wong. 2004. Compiler Orchestrated Prefetching via Speculation and Predication. SIGARCH Comput. Archit. 
News 32, 5 (Oct.2004), 189\u2013198.","journal-title":"SIGARCH Comput. Archit. News"},{"key":"e_1_3_2_74_2","first-page":"115","volume-title":"Proceedings of the 8th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS\u201998)","author":"Roth Amir","year":"1998","unstructured":"Amir Roth, Andreas Moshovos, and Gurindar S. Sohi. 1998. Dependence Based Prefetching for Linked Data Structures. In Proceedings of the 8th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS\u201998). ACM, 115\u2013126."},{"key":"e_1_3_2_75_2","first-page":"111","volume-title":"Proceedings of the 26th Annual International Symposium on Computer Architecture (ISCA\u201999)","author":"Roth Amir","year":"1999","unstructured":"Amir Roth and Gurindar S. Sohi. 1999. Effective Jump-Pointer Prefetching for Linked Data Structures. In Proceedings of the 26th Annual International Symposium on Computer Architecture (ISCA\u201999). IEEE, Los Alamitos, CA, 111\u2013121."},{"key":"e_1_3_2_76_2","volume-title":"Proceedings of the 5th JILP Workshop on Computer Architecture Competitions (JWAC-5): Championship Branch Prediction (CBP-5)","author":"Seznec Andr\u00e9","year":"2016","unstructured":"Andr\u00e9 Seznec. 2016. Tage-sc-l branch predictors again. In Proceedings of the 5th JILP Workshop on Computer Architecture Competitions (JWAC-5): Championship Branch Prediction (CBP-5)."},{"issue":"10","key":"e_1_3_2_77_2","doi-asserted-by":"crossref","first-page":"45","DOI":"10.1145\/605432.605403","article-title":"Automatically Characterizing Large Scale Program Behavior","volume":"37","author":"Sherwood Timothy","year":"2002","unstructured":"Timothy Sherwood, Erez Perelman, Greg Hamerly, and Brad Calder. 2002. Automatically Characterizing Large Scale Program Behavior. SIGPLAN Not. 
37, 10 (Oct.2002), 45\u201357.","journal-title":"SIGPLAN Not."},{"key":"e_1_3_2_78_2","doi-asserted-by":"crossref","first-page":"141","DOI":"10.1145\/2830772.2830793","volume-title":"Proceedings of the 48th International Symposium on Microarchitecture (MICRO\u201915)","author":"Shevgoor Manjunath","year":"2015","unstructured":"Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian, Chris Wilkerson, Seth H. Pugsley, and Zeshan Chishti. 2015. Efficiently Prefetching Complex Address Patterns. In Proceedings of the 48th International Symposium on Microarchitecture (MICRO\u201915). ACM, New York, 141\u2013152."},{"key":"e_1_3_2_79_2","first-page":"135","volume-title":"Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP\u201913)","author":"Shun Julian","year":"2013","unstructured":"Julian Shun and Guy E. Blelloch. 2013. Ligra: A Lightweight Graph Processing Framework for Shared Memory. In Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP\u201913). ACM, New York, 135\u2013146."},{"issue":"1","key":"e_1_3_2_80_2","doi-asserted-by":"crossref","first-page":"2:1\u20132:21","DOI":"10.1147\/JRD.2014.2376112","article-title":"IBM POWER8 processor core microarchitecture","volume":"59","author":"Sinharoy B.","year":"2015","unstructured":"B. Sinharoy, J. A. Van Norstrand, R. J. Eickemeyer, H. Q. Le, J. Leenstra, D. Q. Nguyen, B. Konigsburg, K. Ward, M. D. Brown, J. E. Moreira, D. Levitan, S. Tung, D. Hrusecky, J. W. Bishop, M. Gschwind, M. Boersma, M. Kroener, M. Kaltenbach, T. Karkhanis, and K. M. Fernsler. 2015. IBM POWER8 processor core microarchitecture. 
IBM Journal of Research and Development 59, 1 (2015), 2:1\u20132:21.","journal-title":"IBM Journal of Research and Development"},{"key":"e_1_3_2_81_2","doi-asserted-by":"crossref","first-page":"69","DOI":"10.1145\/1555754.1555766","volume-title":"Proceedings of the 36th Annual International Symposium on Computer Architecture (ISCA\u201909)","author":"Somogyi Stephen","year":"2009","unstructured":"Stephen Somogyi, Thomas F. Wenisch, Anastasia Ailamaki, and Babak Falsafi. 2009. Spatio-Temporal Memory Streaming. In Proceedings of the 36th Annual International Symposium on Computer Architecture (ISCA\u201909). ACM, New York, 69\u201380."},{"key":"e_1_3_2_82_2","first-page":"252","volume-title":"Proceedings of the 33rd Annual International Symposium on Computer Architecture (ISCA\u201906)","author":"Somogyi Stephen","year":"2006","unstructured":"Stephen Somogyi, Thomas F. Wenisch, Anastassia Ailamaki, Babak Falsafi, and Andreas Moshovos. 2006. Spatial Memory Streaming. In Proceedings of the 33rd Annual International Symposium on Computer Architecture (ISCA\u201906). IEEE, Los Alamitos, CA, 252\u2013263."},{"key":"e_1_3_2_83_2","first-page":"654","volume-title":"Proceedings of the IEEE International Symposium on High-Performance Computer Architecture (HPCA\u201921)","author":"Talati Nishil","year":"2021","unstructured":"Nishil Talati, Kyle May, Armand Behroozi, Yichen Yang, Kuba Kaszyk, Christos Vasiladiotis, Tarunesh Verma, Lu Li, Brandon Nguyen, Jiawen Sun, John Magnus Morton, Agreen Ahmadi, Todd Austin, Michael O\u2019Boyle, Scott Mahlke, Trevor Mudge, and Ronald Dreslinski. 2021. Prodigy: Improving the Memory Latency of Data-Indirect Irregular Workloads Using Hardware-Software Co-Design. In Proceedings of the IEEE International Symposium on High-Performance Computer Architecture (HPCA\u201921). IEEE, Los Alamitos, CA, 654\u2013667."},{"key":"e_1_3_2_84_2","unstructured":"Loongson Technology. 2022. LoongArch Documentation. 
Retrieved December 12 2023 from https:\/\/loongson.github.io\/LoongArch-Documentation\/"},{"key":"e_1_3_2_85_2","unstructured":"Loongson Technology. 2022. Typical Instruction Formats in LoongArch. Retrieved December 12 2023 from https:\/\/loongson.github.io\/LoongArch-Documentation\/LoongArch-Vol1-EN.html"},{"key":"e_1_3_2_86_2","first-page":"956","volume-title":"Proceedings of the 55th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201922)","author":"Vavouliotis Georgios","year":"2023","unstructured":"Georgios Vavouliotis, Gino Chacon, Lluc Alvarez, Paul V. Gratz, Daniel A. Jim\u00e9nez, and Marc Casas. 2023. Page Size Aware Cache Prefetching. In Proceedings of the 55th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201922). IEEE, Los Alamitos, CA, 956\u2013974."},{"key":"e_1_3_2_87_2","first-page":"197","volume-title":"Proceedings of the 8th ACM European Conference on Computer Systems (EuroSys\u201913)","author":"Venkataraman Shivaram","year":"2013","unstructured":"Shivaram Venkataraman, Erik Bodzsar, Indrajit Roy, Alvin AuYoung, and Robert S. Schreiber. 2013. Presto: Distributed Machine Learning and Graph Processing with Sparse Matrices. In Proceedings of the 8th ACM European Conference on Computer Systems (EuroSys\u201913). ACM, New York, 197\u2013210."},{"key":"e_1_3_2_88_2","volume-title":"Proceedings of the IEEE 15th International Symposium on High Performance Computer Architecture (HPCA\u201909)","author":"Wenisch Thomas F.","year":"2009","unstructured":"Thomas F. Wenisch, Michael Ferdman, Anastasia Ailamaki, Babak Falsafi, and Andreas Moshovos. 2009. Practical off-chip meta-data for temporal memory streaming. In Proceedings of the IEEE 15th International Symposium on High Performance Computer Architecture (HPCA\u201909). 
IEEE, Los Alamitos, CA."},{"key":"e_1_3_2_89_2","first-page":"222","volume-title":"Proceedings of the 32nd Annual International Symposium on Computer Architecture (ISCA\u201905)","author":"Wenisch Thomas F.","year":"2005","unstructured":"Thomas F. Wenisch, Stephen Somogyi, Nikolaos Hardavellas, Jangwoo Kim, Anastassia Ailamaki, and Babak Falsafi. 2005. Temporal Streaming of Shared Memory. In Proceedings of the 32nd Annual International Symposium on Computer Architecture (ISCA\u201905). IEEE, Los Alamitos, CA, 222\u2013233."},{"key":"e_1_3_2_90_2","first-page":"996","volume-title":"Proceedings of the 52nd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201952)","author":"Wu Hao","year":"2019","unstructured":"Hao Wu, Krishnendra Nathella, Joseph Pusdesris, Dam Sunwoo, Akanksha Jain, and Calvin Lin. 2019. Temporal Prefetching Without the Off-Chip Metadata. In Proceedings of the 52nd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201952). ACM, New York, 996\u20131008."},{"key":"e_1_3_2_91_2","doi-asserted-by":"crossref","first-page":"449","DOI":"10.1145\/3307650.3322225","volume-title":"Proceedings of the 46th International Symposium on Computer Architecture (ISCA\u201919)","author":"Wu Hao","year":"2019","unstructured":"Hao Wu, Krishnendra Nathella, Dam Sunwoo, Akanksha Jain, and Calvin Lin. 2019. Efficient Metadata Management for Irregular Data Prefetching. In Proceedings of the 46th International Symposium on Computer Architecture (ISCA\u201919). ACM, New York, 449\u2013461."},{"key":"e_1_3_2_92_2","first-page":"615","volume-title":"Proceedings of the 52nd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201919)","author":"Yan Mingyu","year":"2019","unstructured":"Mingyu Yan, Xing Hu, Shuangchen Li, Abanti Basak, Han Li, Xin Ma, Itir Akgun, Yujing Feng, Peng Gu, Lei Deng, Xiaochun Ye, Zhimin Zhang, Dongrui Fan, and Yuan Xie. 2019. 
Alleviating Irregularity in Graph Analytics Acceleration: A Hardware\/Software Co-Design Approach. In Proceedings of the 52nd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201919). ACM, New York, 615\u2013628."},{"key":"e_1_3_2_93_2","first-page":"178","volume-title":"Proceedings of the 48th International Symposium on Microarchitecture (MICRO\u201915)","author":"Yu Xiangyao","year":"2015","unstructured":"Xiangyao Yu, Christopher J. Hughes, Nadathur Satish, and Srinivas Devadas. 2015. IMP: Indirect Memory Prefetcher. In Proceedings of the 48th International Symposium on Microarchitecture (MICRO\u201915). ACM, New York, 178\u2013190."},{"key":"e_1_3_2_94_2","first-page":"609","volume-title":"Proceedings of the 53rd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201920)","author":"Zhang Chao","year":"2020","unstructured":"Chao Zhang, Yuan Zeng, John Shalf, and Xiaochen Guo. 2020. RnR: A Software-Assisted Record-and-Replay Hardware Prefetcher. In Proceedings of the 53rd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201920). IEEE, Los Alamitos, CA, 609\u2013621."},{"key":"e_1_3_2_95_2","first-page":"593","volume-title":"Proceedings of the 23rd International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS\u201918)","author":"Zhang Dan","year":"2018","unstructured":"Dan Zhang, Xiaoyu Ma, Michael Thomson, and Derek Chiou. 2018. Minnow: Lightweight Offload Engines for Worklist Management and Worklist-Directed Prefetching. In Proceedings of the 23rd International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS\u201918). ACM, New York, 593\u2013607."},{"key":"e_1_3_2_96_2","first-page":"85","volume-title":"Proceedings of the IEEE 13th International Symposium on High Performance Computer Architecture (HPCA\u201907)","author":"Zhang Weifeng","year":"2007","unstructured":"Weifeng Zhang, Dean M. Tullsen, and Brad Calder. 
2007. Accelerating and Adapting Precomputation Threads for Efficient Prefetching. In Proceedings of the IEEE 13th International Symposium on High Performance Computer Architecture (HPCA\u201907). IEEE, Los Alamitos, CA, 85\u201395."},{"key":"e_1_3_2_97_2","doi-asserted-by":"crossref","first-page":"116","DOI":"10.1145\/3470496.3527409","volume-title":"Proceedings of the 49th Annual International Symposium on Computer Architecture (ISCA\u201922)","author":"Zhao Jin","year":"2022","unstructured":"Jin Zhao, Yun Yang, Yu Zhang, Xiaofei Liao, Lin Gu, Ligang He, Bingsheng He, Hai Jin, Haikun Liu, Xinyu Jiang, and Hui Yu. 2022. TDGraph: A Topology-Driven Accelerator for High-Performance Streaming Graph Processing. In Proceedings of the 49th Annual International Symposium on Computer Architecture (ISCA\u201922). ACM, New York, 116\u2013129."},{"key":"e_1_3_2_98_2","doi-asserted-by":"crossref","first-page":"2","DOI":"10.1145\/379240.379246","volume-title":"Proceedings of the 28th Annual International Symposium on Computer Architecture (ISCA\u201901)","author":"Zilles Craig","year":"2001","unstructured":"Craig Zilles and Gurindar Sohi. 2001. Execution-Based Prediction Using Speculative Slices. In Proceedings of the 28th Annual International Symposium on Computer Architecture (ISCA\u201901). 
ACM, New York, 2\u201313."}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3641853","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3641853","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T00:04:03Z","timestamp":1750291443000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3641853"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,3,23]]},"references-count":97,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2024,6,30]]}},"alternative-id":["10.1145\/3641853"],"URL":"https:\/\/doi.org\/10.1145\/3641853","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"value":"1544-3566","type":"print"},{"value":"1544-3973","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,3,23]]},"assertion":[{"value":"2023-11-02","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-01-11","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-03-23","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}