{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,8]],"date-time":"2026-01-08T05:26:28Z","timestamp":1767849988377,"version":"3.49.0"},"reference-count":40,"publisher":"Association for Computing Machinery (ACM)","issue":"3-4","license":[{"start":{"date-parts":[[2020,11,30]],"date-time":"2020-11-30T00:00:00Z","timestamp":1606694400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100005416","name":"Research Council of Norway","doi-asserted-by":"crossref","award":["302279"],"award-info":[{"award-number":["302279"]}],"id":[{"id":"10.13039\/501100005416","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Comput. Syst."],"published-print":{"date-parts":[[2020,11,30]]},
"abstract":"<jats:p>The front-end bottleneck is a well-established problem in server workloads owing to their deep software stacks and large instruction footprints. Despite years of research into effective L1-I and BTB prefetching, state-of-the-art techniques force a trade-off between metadata storage cost and performance. Temporal Stream prefetchers deliver high performance but require a prohibitive amount of metadata to accommodate the temporal history. Meanwhile, BTB-directed prefetchers incur low cost by using the existing in-core branch prediction structures but fall short on performance due to BTB\u2019s inability to capture the massive control flow working set of server applications. This work overcomes the fundamental limitation of BTB-directed prefetchers, which is capturing a large control flow working set within an affordable BTB storage budget. We re-envision the BTB organization to maximize its control flow coverage by observing that an application\u2019s instruction footprint can be mapped as a combination of its unconditional branch working set and, for each unconditional branch, a spatial encoding of the cache blocks around the branch target. Effectively capturing a map of the application\u2019s instruction footprint in the BTB enables highly effective BTB-directed prefetching that outperforms the state-of-the-art prefetchers by up to 10% for equivalent storage budget.<\/jats:p>","DOI":"10.1145\/3484492","type":"journal-article","created":{"date-parts":[[2022,1,4]],"date-time":"2022-01-04T11:27:14Z","timestamp":1641295634000},"page":"1-30","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["Shooting Down the Server Front-End Bottleneck"],"prefix":"10.1145","volume":"38","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6306-304X","authenticated-orcid":false,"given":"Rakesh","family":"Kumar","sequence":"first","affiliation":[{"name":"Norwegian University of Science and Technology (NTNU), Trondheim, Norway"}]},{"given":"Boris","family":"Grot","sequence":"additional","affiliation":[{"name":"University of Edinburgh, Edinburgh, United Kingdom"}]}],"member":"320","published-online":{"date-parts":[[2022,1,4]]},
"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1145\/3296957.3173178"},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.5555\/645925.671662"},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1145\/279361.279364"},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1145\/384265.291067"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1145\/2872887.2750392"},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.5555\/846213.846576"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2008.4771774"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1145\/2540708.2540731"},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.5555\/320080.320085"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2005.13"},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2013.6522308"},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1145\/2528521.1508281"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1145\/2540708.2540732"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1145\/2830772.2830785"},{"key":"e_1_3_2_16_2","doi-asserted-by":"crossref","first-page":"493","DOI":"10.1109\/HPCA.2017.53","volume-title":"2017 IEEE International Symposium on High Performance Computer Architecture (HPCA\u201917)","author":"Kumar Rakesh","year":"2017","unstructured":"Rakesh Kumar, Cheng-Chieh Huang, Boris Grot, and Vijay Nagarajan. 2017. Boomerang: A metadata-free architecture for control flow delivery. In 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA\u201917). 493\u2013504. http:\/\/dx.doi.org\/10.1109\/HPCA.2017.53"},
{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1145\/2155620.2155638"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1145\/144965.145016"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2006.79"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1145\/871656.859629"},{"key":"e_1_3_2_21_2","article-title":"A case for (partially) TAgged GEometric history length branch prediction","volume":"8","author":"Seznec Andr\u00e9","year":"2006","unstructured":"Andr\u00e9 Seznec and Pierre Michaud. 2006. A case for (partially) TAgged GEometric history length branch prediction. Journal of Instruction-Level Parallelism 8 (2006). https:\/\/jilp.org\/vol8\/index.html.","journal-title":"Journal of Instruction-Level Parallelism"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/C-M.1978.218016"},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.5555\/846213.846576"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.5555\/580550.876449"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1142\/S0129053399000065"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1145\/514191.514220"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/LCA.2020.3002947"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.5555\/3195638.3195642"},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.1145\/2854038.2854044"},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1145\/1772954.1772964"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.5555\/3049832.3049858"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.5555\/977395.977666"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.5555\/3314872.3314876"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1145\/3289600.3290979"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.5555\/580550.876448"},
{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2020.2986212"},{"key":"e_1_3_2_37_2","doi-asserted-by":"crossref","first-page":"146","DOI":"10.1109\/MICRO50266.2020.00024","volume-title":"2020 53rd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201920)","author":"Khan Tanvir Ahmed","year":"2020","unstructured":"Tanvir Ahmed Khan, Akshitha Sriraman, Joseph Devietti, Gilles Pokam, Heiner Litz, and Baris Kasikci. 2020. I-SPY: Context-driven conditional instruction prefetching with coalescing. In 2020 53rd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201920). 146\u2013159. http:\/\/dx.doi.org\/10.1109\/MICRO50266.2020.00024"},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/LCA.2021.3109945"},{"key":"e_1_3_2_39_2","unstructured":"AMD Software Optimization Guide. Section 2.8.1.2. ([n. d.]). https:\/\/www.amd.com\/system\/files\/TechDocs\/56665.zip."},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA45697.2020.00017"},{"key":"e_1_3_2_41_2","doi-asserted-by":"crossref","unstructured":"Tanvir Ahmed Khan Nathan Brown Akshitha Sriraman Niranjan K. Soundararajan Rakesh Kumar Joseph Devietti Sreenivas Subramoney Gilles A. Pokam Heiner Litz and Baris Kasikci. 2021. Twig: Profile-guided BTB prefetching for data center applications. In 54th Annual IEEE\/ACM International Symposium on Microarchitecture Virtual Event Greece October 18-22 2021 . ACM 816\u2013829. https:\/\/doi.org\/10.1145\/3466752.3480124","DOI":"10.1145\/3466752.3480124"}],
"container-title":["ACM Transactions on Computer Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3484492","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3484492","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:17:14Z","timestamp":1750191434000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3484492"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,11,30]]},"references-count":40,"journal-issue":{"issue":"3-4","published-print":{"date-parts":[[2020,11,30]]}},"alternative-id":["10.1145\/3484492"],"URL":"https:\/\/doi.org\/10.1145\/3484492","relation":{},"ISSN":["0734-2071","1557-7333"],"issn-type":[{"value":"0734-2071","type":"print"},{"value":"1557-7333","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,11,30]]},"assertion":[{"value":"2020-10-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-08-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-01-04","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}