{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:51:26Z","timestamp":1750308686377,"version":"3.41.0"},"reference-count":40,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2015,3,9]],"date-time":"2015-03-09T00:00:00Z","timestamp":1425859200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"National Science Foundation","award":["CCF-0916583 and CPS-0931931"],"award-info":[{"award-number":["CCF-0916583 and CPS-0931931"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2015,4,16]]},"abstract":"<jats:p>Heterogeneous multicore processors that integrate CPU cores and data-parallel accelerators such as graphic processing unit (GPU) cores onto the same die raise several new issues for sharing various on-chip resources. The shared last-level cache (LLC) is one of the most important shared resources due to its impact on performance. Accesses to the shared LLC in heterogeneous multicore processors can be dominated by the GPU due to the significantly higher number of concurrent threads supported by the architecture. Under current cache management policies, the CPU applications\u2019 share of the LLC can be significantly reduced in the presence of competing GPU applications. For many CPU applications, a reduced share of the LLC could lead to significant performance degradation. On the contrary, GPU applications can tolerate increase in memory access latency when there is sufficient thread-level parallelism (TLP). 
In addition to the performance challenge, introduction of diverse cores onto the same die changes the energy consumption profile and, in turn, affects the energy efficiency of the processor.<\/jats:p>\n          <jats:p>In this work, we propose heterogeneous LLC management (HeLM), a novel shared LLC management policy that takes advantage of the GPU\u2019s tolerance for memory access latency. HeLM is able to throttle GPU LLC accesses and yield LLC space to cache-sensitive CPU applications. This throttling is achieved by allowing GPU accesses to bypass the LLC when an increase in memory access latency can be tolerated. The latency tolerance of a GPU application is determined by the availability of TLP, which is measured at runtime as the average number of threads that are available for issuing. For a baseline configuration with two CPU cores and four GPU cores, modeled after existing heterogeneous processor designs, HeLM outperforms least recently used (LRU) policy by 10.4%. Additionally, HeLM also outperforms competing policies. Our evaluations show that HeLM is able to sustain performance with varying core mix.<\/jats:p>\n          <jats:p>\n            In addition to the performance benefit, bypassing also reduces total accesses to the LLC, leading to a reduction in the energy consumption of the LLC module. However, LLC bypassing has the potential to increase off-chip bandwidth utilization and DRAM energy consumption. 
Our experiments show that HeLM exhibits better energy efficiency by reducing the ED\n            <jats:sup>2<\/jats:sup>\n            by 18% over LRU while impacting only a 7% increase in off-chip bandwidth utilization.\n          <\/jats:p>","DOI":"10.1145\/2710019","type":"journal-article","created":{"date-parts":[[2015,3,9]],"date-time":"2015-03-09T19:03:01Z","timestamp":1425927781000},"page":"1-29","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["Performance-Energy Considerations for Shared Cache Management in a Heterogeneous Multicore Processor"],"prefix":"10.1145","volume":"12","author":[{"given":"Anup","family":"Holey","sequence":"first","affiliation":[{"name":"Intel Corporation, Folsom, CA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Vineeth","family":"Mekkat","sequence":"additional","affiliation":[{"name":"Intel Corporation, Santa Clara, CA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Pen-Chung","family":"Yew","sequence":"additional","affiliation":[{"name":"University of Minnesota-Twin Cities, Minneapolis, MN"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Antonia","family":"Zhai","sequence":"additional","affiliation":[{"name":"University of Minnesota-Twin Cities, Minneapolis, MN"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2015,3,9]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Advanced Micro Devices Incorporated. 2007. ATI Stream Computing Programming Guide. Available at http:\/\/www.amd.com."},{"key":"e_1_2_1_2_1","unstructured":"Advanced Micro Devices Incorporated. 2009. Evergreen Family Instruction Set Architecture. 
Available at http:\/\/www.amd.com."},{"key":"e_1_2_1_3_1","unstructured":"Advanced Micro Devices Incorporated. 2011. AMD Accelerated Parallel Processing (APP) Software Development Kit (SDK). Available at http:\/\/developer.amd.com\/sdks\/amdappsdk\/."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/40.888701"},{"key":"e_1_2_1_5_1","unstructured":"Nathan Brookwood. 2010. AMD Fusion Family of APUs: Enabling a Superior Immersive PC Experience. Advanced Micro Devices (AMD) White Paper."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/1274971.1275005"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/1669112.1669164"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/2370816.2370860"},{"key":"e_1_2_1_9_1","first-page":"1","article-title":"Simpoint 3.0: Faster and more flexible program analysis","volume":"7","author":"Hamerly Greg","year":"2005","unstructured":"Greg Hamerly, Erez Perelman, Jeremy Lau, and Brad Calder. 2005. Simpoint 3.0: Faster and more flexible program analysis. Journal of Instruction Level Parallelism 7, 4, 1--28.","journal-title":"Journal of Instruction Level Parallelism"},{"key":"e_1_2_1_10_1","unstructured":"Intel Corporation. 2009. Intel Sandy Bridge Microarchitecture. 
Available at http:\/\/www.intel.com."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/1454115.1454145"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/1815961.1815971"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/12.817393"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2010.24"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2007.70816"},{"key":"e_1_2_1_16_1","volume-title":"Retrieved","author":"Khronos Group","year":"2009","unstructured":"Khronos Group. 2009. OpenCL: The Open Standard for Parallel Programming of Heterogeneous Systems. Retrieved January 19, 2015, from http:\/\/www.khronos.org\/opencl\/."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.5555\/1025127.1026001"},{"key":"e_1_2_1_18_1","volume-title":"Proceedings of the 36th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO 36)","author":"Kumar Rakesh","year":"2003","unstructured":"Rakesh Kumar, Keith I. Farkas, Norman P. Jouppi, Parthasarathy Ranganathan, and Dean M. Tullsen. 2003. Single-ISA heterogeneous multi-core architectures: The potential for processor power reduction. In Proceedings of the 36th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO 36). 81--92. 
DOI:http:\/\/dx.doi.org\/10.1109\/MICRO.2003.1253185"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/379240.379259"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2012.6168947"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/2485922.2485964"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/1669112.1669172"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2008.4771793"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.5555\/2337159.2337208"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.5555\/2523721.2523753"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.5555\/1786054.1786086"},{"key":"e_1_2_1_27_1","unstructured":"NVIDIA Corporation. 2007. NVIDIA CUDA C Programming Guide. Available at http:\/\/www.nvidia.com.  NVIDIA Corporation. 2007. NVIDIA CUDA C Programming Guide. Available at http:\/\/www.nvidia.com."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/505306.505330"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/1250662.1250709"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2006.5"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2006.49"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/L-CA.2011.4"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/1062261.1062262"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2008.4658623"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/1241601.1241625"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/12.165388"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1023\/B:SUPE.0000014800.27383.8f"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/2370816.2370865"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/1555754.15
55778"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-11515-8_20"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2710019","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2710019","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T20:01:09Z","timestamp":1750276869000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2710019"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2015,3,9]]},"references-count":40,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2015,4,16]]}},"alternative-id":["10.1145\/2710019"],"URL":"https:\/\/doi.org\/10.1145\/2710019","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2015,3,9]]},"assertion":[{"value":"2014-05-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2014-12-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2015-03-09","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}