{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,5]],"date-time":"2026-03-05T15:38:54Z","timestamp":1772725134663,"version":"3.50.1"},"reference-count":73,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2022,12,16]],"date-time":"2022-12-16T00:00:00Z","timestamp":1671148800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2023,3,31]]},"abstract":"<jats:p>Over the years, processor throughput has steadily increased. However, the memory throughput has not increased at the same rate, which has led to the memory wall problem in turn increasing the gap between effective and theoretical peak processor performance. To cope with this, there has been an abundance of work in the area of data\/instruction prefetcher designs. Broadly, prefetchers predict future data\/instruction address accesses and proactively fetch data\/instructions in the memory hierarchy with the goal of lowering data\/instruction access latency. To this end, one or more prefetchers are deployed at each level of the memory hierarchy, but typically, each prefetcher gets designed in isolation without comprehensively accounting for other prefetchers in the system. As a result, individual prefetchers do not always complement each other, and that leads to lower average performance gains and\/or many negative outliers. In this work, we propose Puppeteer, which is a hardware prefetcher manager that uses a suite of random forest regressors to determine at runtime which prefetcher should be ON at each level in the memory hierarchy, such that the prefetchers complement each other and we reduce the data\/instruction access latency. 
Compared to a design with no prefetchers, using Puppeteer, we improve IPC by 46.0% in one-core, 25.8% in four-core, and 11.9% in eight-core processors on average across traces generated from SPEC2017, SPEC2006, and Cloud suites with ~11-KB overhead. Moreover, we also reduce the number of negative outliers by more than 89%, and the performance loss of the worst-case negative outlier from 25% to only 5% compared to the state of the art.<\/jats:p>","DOI":"10.1145\/3570304","type":"journal-article","created":{"date-parts":[[2022,10,31]],"date-time":"2022-10-31T12:50:58Z","timestamp":1667220658000},"page":"1-25","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["Puppeteer: A Random Forest Based Manager for Hardware Prefetchers Across the Memory Hierarchy"],"prefix":"10.1145","volume":"20","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0349-6959","authenticated-orcid":false,"given":"Furkan","family":"Eris","sequence":"first","affiliation":[{"name":"Boston University, Boston, MA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7749-5396","authenticated-orcid":false,"given":"Marcia","family":"Louis","sequence":"additional","affiliation":[{"name":"Boston University, Boston, MA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4325-7060","authenticated-orcid":false,"given":"Kubra","family":"Eris","sequence":"additional","affiliation":[{"name":"Boston University, Boston, MA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3550-720X","authenticated-orcid":false,"given":"Jos\u00e9","family":"Abell\u00e1n","sequence":"additional","affiliation":[{"name":"Universidad Cat\u00f3lica de Murcia, Guadalupe, Murcia, 
Spain"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3256-9942","authenticated-orcid":false,"given":"Ajay","family":"Joshi","sequence":"additional","affiliation":[{"name":"Boston University, Boston, MA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2022,12,16]]},"reference":[{"key":"e_1_3_2_2_2","article-title":"AMD Ryzen Processor","year":"2017","unstructured":"AMD. 2017. AMD Ryzen Processor. Retrieved November 9, 2022 from https:\/\/www.amd.com\/en\/ryzen.","journal-title":"https:\/\/www.amd.com\/en\/ryzen"},{"key":"e_1_3_2_3_2","unstructured":"AMD. 2020. Software Optimization Guide for AMD EPYC TM 7001 Processors. Retrieved November 9 2022 from https:\/\/developer.amd.com\/wordpress\/media\/2013\/12\/55723_SOG_Fam_17h_Processors_3.00.pdf."},{"key":"e_1_3_2_4_2","article-title":"GDC 2022 - AMD RyzenTM Processor Software Optimization","year":"2022","unstructured":"AMD. 2022. GDC 2022 - AMD RyzenTM Processor Software Optimization. Retrieved November 9, 2022 from https:\/\/youtu.be\/helEx02HN_I.","journal-title":"https:\/\/youtu.be\/helEx02HN_I"},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2019.00053"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1145\/3466752.3480114"},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.1145\/3307650.3322207"},{"key":"e_1_3_2_8_2","unstructured":"Advanced Micro Devices Bios.2010. Kernel Developer Guide (BKDG) for AMD Family 10h Models 00h-0fh Processors. Available at https:\/\/www.amd.com."},{"key":"e_1_3_2_9_2","volume-title":"Proceedings of the International Workshop on AI-Assisted Design for Architecture (AIDArc) Held in Conjunction with ISCA","author":"Braun Peter","year":"2019","unstructured":"Peter Braun and Heiner Litz. 2019. Understanding memory access patterns for prefetching. 
In Proceedings of the International Workshop on AI-Assisted Design for Architecture (AIDArc) Held in Conjunction with ISCA."},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1145\/1168918.1168892"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/IEDM.2016.7838029"},{"key":"e_1_3_2_12_2","article-title":"SPEC CPU\u00ae2006","author":"Corporation Standard Performance Evaluation","year":"2006","unstructured":"Standard Performance Evaluation Corporation. 2006. SPEC CPU\u00ae2006. Retrieved November 9, 2022 from https:\/\/www.spec.org\/cpu2006\/","journal-title":"https:\/\/www.spec.org\/cpu2006\/"},{"key":"e_1_3_2_13_2","article-title":"SPEC CPU\u00ae2017","author":"Corporation Standard Performance Evaluation","year":"2017","unstructured":"Standard Performance Evaluation Corporation. 2017. SPEC CPU\u00ae2017. Retrieved November 9, 2022 from https:\/\/www.spec.org\/cpu2017\/","journal-title":"https:\/\/www.spec.org\/cpu2017\/"},{"key":"e_1_3_2_14_2","unstructured":"Koby Crammer Ofer Dekel Joseph Keshet Shai Shalev-Shwartz and Yoram Singer. 2006. Online passive aggressive algorithms. Journal of Machine Learning Research 7 (2006) 551\u2013585."},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1145\/1735971.1736058"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1145\/2024723.2000081"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1145\/1669112.1669154"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2009.4798232"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.5555\/2643033"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1145\/3308897.3308958"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/PACT.2005.23"},{"key":"e_1_3_2_22_2","unstructured":"Intel. 2011. Intel\u00ae 64 and IA-32 Architectures Software Developer\u2018s Manual. 
Available at https:\/\/www.intel.com."},{"key":"e_1_3_2_23_2","article-title":"Learning memory access patterns","author":"Hashemi Milad","year":"2018","unstructured":"Milad Hashemi, Kevin Swersky, Jamie A. Smith, Grant Ayers, Heiner Litz, Jichuan Chang, Christos Kozyrakis, and Parthasarathy Ranganathan. 2018. Learning memory access patterns. arXiv preprint arXiv:1803.02329.","journal-title":"arXiv preprint arXiv:1803.02329."},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1145\/3243176.3243181"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1145\/3337821.3337854"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2006.32"},{"key":"e_1_3_2_27_2","article-title":"Intel i9","year":"2017","unstructured":"Intel. 2017. Intel i9. Retrieved November 9, 2022 from https:\/\/www.intel.com\/content\/www\/us\/en\/products\/details\/processors\/core\/i9.html","journal-title":"https:\/\/www.intel.com\/content\/www\/us\/en\/products\/details\/processors\/core\/i9.html"},{"key":"e_1_3_2_28_2","article-title":"Tuning Intel Xeon","year":"2017","unstructured":"Intel. 2017. Tuning Intel Xeon. Retrieved November 9, 2022 from https:\/\/community.intel.com\/t5\/Software-Tuning-Performance\/How-to-control-the-four-hardware-prefetchers-in-L1-and-L2-more\/td-p\/1104586.","journal-title":"https:\/\/community.intel.com\/t5\/Software-Tuning-Performance\/How-to-control-the-four-hardware-prefetchers-in-L1-and-L2-more\/td-p\/1104586"},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/LCA.2022.3210397"},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1145\/2370816.2370837"},{"key":"e_1_3_2_31_2","first-page":"78","volume-title":"Proceedings of the 23rd Annual International Symposium on Computer Architecture (ISCA\u201996)","author":"Kagi A.","year":"1996","unstructured":"A. Kagi, James R. Goodman, and Doug Burger. 1996. Memory bandwidth limitations of future microprocessors. 
In Proceedings of the 23rd Annual International Symposium on Computer Architecture (ISCA\u201996). IEEE, Los Alamitos, CA, 78\u201378."},{"key":"e_1_3_2_32_2","first-page":"357","volume-title":"ACM SIGPLAN Notices","author":"Kang Hui","year":"2013","unstructured":"Hui Kang and Jennifer L. Wong. 2013. To hardware prefetch or not to prefetch? A virtualized environment study and core binding approach. ACM SIGPLAN Notices 48 (2013), 357\u2013368."},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2016.7783763"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1145\/3093336.3037701"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2018.00018"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1186\/1471-2105-15-8"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.1145\/2133382.2133384"},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.1145\/1654059.1654116"},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1145\/2890505"},{"key":"e_1_3_2_40_2","volume-title":"Adaptive Cache Prefetching Using Machine Learning and Monitoring Hardware Performance Counters","author":"Maldikar Pranita","year":"2014","unstructured":"Pranita Maldikar. 2014. Adaptive Cache Prefetching Using Machine Learning and Monitoring Hardware Performance Counters. Ph.D. Dissertation. University of Minnesota."},{"key":"e_1_3_2_41_2","article-title":"Boosting algorithms as gradient descent","author":"Mason Llew","year":"1999","unstructured":"Llew Mason, Jonathan Baxter, Peter Bartlett, and Marcus Frean. 1999. Boosting algorithms as gradient descent. In Proceedings of the 12th International Conference on Neural Information Processing Systems (NIPS\u201999). 
512\u2013518.","journal-title":"Proceedings of the 12th International Conference on Neural Information Processing Systems (NIPS\u201999)."},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1145\/3075564.3075578"},{"key":"e_1_3_2_43_2","unstructured":"Tomoki Nakamura Toru Koizumi Yuya Degawa Hidetsugu Irie Shuichi Sakai and Ryota Shioya. 2020. D-JOLT: Distant Jolt Prefetcher. Retrieved November 9 2022 from https:\/\/research.ece.ncsu.edu\/wp-content\/uploads\/sites\/19\/2020\/05\/D-JOLT.pdf."},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.1145\/3229543.3229555"},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-31537-4_13"},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA45697.2020.00021"},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","DOI":"10.1145\/2749469.2749473"},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1145\/3345000"},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.1109\/PACT.2003.1238020"},{"key":"e_1_3_2_50_2","article-title":"DPC3","author":"Pugsley Seth","year":"2019","unstructured":"Seth Pugsley et\u00a0al. 2019. DPC3. Retrieved November 9, 2022 from https:\/\/dpc3.compas.cs.stonybrook.edu\/.","journal-title":"https:\/\/dpc3.compas.cs.stonybrook.edu\/"},{"key":"e_1_3_2_51_2","article-title":"ChampSim","author":"Pugsley Seth","year":"2020","unstructured":"Seth Pugsley et\u00a0al. 2020. ChampSim. Retrieved November 9, 2022 from https:\/\/github.com\/ChampSim\/ChampSim.","journal-title":"https:\/\/github.com\/ChampSim\/ChampSim"},{"key":"e_1_3_2_52_2","article-title":"IPC1","author":"Pugsley Seth","year":"2020","unstructured":"Seth Pugsley et\u00a0al. 2020. IPC1. 
Retrieved November 9, 2022 from https:\/\/research.ece.ncsu.edu\/ipc\/.","journal-title":"https:\/\/research.ece.ncsu.edu\/ipc\/"},{"key":"e_1_3_2_53_2","first-page":"383","volume-title":"Proceedings of the 2015 IEEE 17th International Conference on High Performance Computing and Communications, the 2015 IEEE 7th International Symposium on Cyberspace Safety and Security, and the 2015 IEEE 12th International Conference on Embedded Software and Systems","author":"Rahman Saami","year":"2015","unstructured":"Saami Rahman, Martin Burtscher, Ziliang Zong, and Apan Qasem. 2015. Maximizing hardware prefetch effectiveness with machine learning. In Proceedings of the 2015 IEEE 17th International Conference on High Performance Computing and Communications, the 2015 IEEE 7th International Symposium on Cyberspace Safety and Security, and the 2015 IEEE 12th International Conference on Embedded Software and Systems. IEEE, Los Alamitos, CA, 383\u2013389."},{"key":"e_1_3_2_54_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-30115-8_34"},{"key":"e_1_3_2_55_2","unstructured":"Joseph Rogers. 2019. Effects of an LSTM Composite Prefetcher. Retrieved November 9 2022 from https:\/\/www.diva-portal.org\/smash\/get\/diva2:1369282\/FULLTEXT01.pdf."},{"key":"e_1_3_2_56_2","doi-asserted-by":"publisher","DOI":"10.1109\/LCA.2020.3002947"},{"key":"e_1_3_2_57_2","unstructured":"Andr\u00e9 Seznec. 2020. The FNL+MMA Instruction Cache Prefetcher. Retrieved November 9 2022 from https:\/\/research.ece.ncsu.edu\/wp-content\/uploads\/sites\/19\/2020\/05\/FNLMMA-final.pdf."},{"key":"e_1_3_2_58_2","unstructured":"Mehran Shakerinava Mohammad Bakhshalipour Pejman Lotfi-Kamran and Hamid Sarbazi-Azad. 2019. Multi-lookahead offset prefetching. In the Third Data Prefetching Championship (DPC3) in Conjunction with the International Symposium on Computer Architecture (ISCA\u201919) . 
1\u20134."},{"key":"e_1_3_2_59_2","doi-asserted-by":"publisher","DOI":"10.1145\/2830772.2830793"},{"key":"e_1_3_2_60_2","unstructured":"Zhan Shi Akanksha Jain Kevin Swersky Milad Hashemi Parthasarathy Ranganathan and Calvin Lin. 2019. A Neural Hierarchical Sequence Model for Irregular Data Prefetching. Retrieved November 9 2022 from https:\/\/www.cs.utexas.edu\/lin\/papers\/mlsys19.pdf."},{"key":"e_1_3_2_61_2","doi-asserted-by":"publisher","DOI":"10.1145\/3445814.3446752"},{"key":"e_1_3_2_62_2","doi-asserted-by":"publisher","DOI":"10.1145\/1150019.1136508"},{"key":"e_1_3_2_63_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2007.346185"},{"key":"e_1_3_2_64_2","doi-asserted-by":"publisher","DOI":"10.1145\/3357526.3357549"},{"key":"e_1_3_2_65_2","first-page":"193","volume-title":"The Top Ten Algorithms in Data Mining","author":"Steinberg Dan","year":"2009","unstructured":"Dan Steinberg. 2009. CART: Classification and regression trees. In The Top Ten Algorithms in Data Mining. Chapman & Hall\/CRC, 193\u2013216."},{"key":"e_1_3_2_66_2","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2019.00103"},{"key":"e_1_3_2_67_2","doi-asserted-by":"publisher","DOI":"10.1145\/859618.859663"},{"key":"e_1_3_2_68_2","doi-asserted-by":"publisher","DOI":"10.1002\/0471704091"},{"key":"e_1_3_2_69_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2009.4798239"},{"key":"e_1_3_2_70_2","doi-asserted-by":"publisher","DOI":"10.1145\/2155620.2155671"},{"key":"e_1_3_2_71_2","doi-asserted-by":"publisher","DOI":"10.1145\/216585.216588"},{"key":"e_1_3_2_72_2","doi-asserted-by":"publisher","DOI":"10.1145\/3132402.3132405"},{"key":"e_1_3_2_73_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCW.2018.8403712"},{"key":"e_1_3_2_74_2","doi-asserted-by":"publisher","DOI":"10.1145\/3422575.3422807"}],"container-title":["ACM Transactions on Architecture and Code 
Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3570304","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3570304","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T17:49:38Z","timestamp":1750182578000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3570304"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,12,16]]},"references-count":73,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2023,3,31]]}},"alternative-id":["10.1145\/3570304"],"URL":"https:\/\/doi.org\/10.1145\/3570304","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"value":"1544-3566","type":"print"},{"value":"1544-3973","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,12,16]]},"assertion":[{"value":"2022-01-29","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-10-17","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-12-16","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}