{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,4]],"date-time":"2025-10-04T00:39:08Z","timestamp":1759538348450,"version":"build-2065373602"},"reference-count":59,"publisher":"Association for Computing Machinery (ACM)","issue":"5s","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Embed. Comput. Syst."],"published-print":{"date-parts":[[2025,11,30]]},"abstract":"<jats:p>In modern multi-processor systems-on-chips (MPSoCs), writebacks from the private caches to the shared cache can introduce significant performance bottlenecks, especially because multiple threads from different co-executing programs contend for the shared cache resources. Intelligent cache bypass decisions for writebacks help mitigate such contention and enhance the utilization of the shared cache. Most prior cache bypass strategies account for contention for shared cache capacity by focusing primarily on data reuse, with only recent research beginning to consider bandwidth contention also in dynamic bypass decisions. However, data sharing, a crucial characteristic of modern multithreaded workloads, remains largely overlooked by state-of-the-art cache bypass decisions. Bypassing highly shared cache lines can increase the volume of main memory accesses, potentially resulting in performance bottlenecks. We introduce SHARP, a novel cache bypass policy that incorporates three key factors: data sharing, contention, and data reuse, into its dynamic bypass decisions for cache writebacks. In addition to prioritizing the caching of data with high reuse, we prioritize the caching of data shared across multiple threads to enhance cache utilization. We dynamically modulate our bypass decisions, employing aggressive bypass for writebacks when shared cache contention is high, while employing conservative bypass when contention is low. Experiments across a diverse set of PARSEC workloads demonstrate that SHARP improves overall system throughput by 12% and 8% compared to the no-bypass baseline and the state-of-the-art bypass baseline, respectively. SHARP also reduces the overall cache energy consumption by 14% over the no-bypass baseline.<\/jats:p>","DOI":"10.1145\/3760746","type":"journal-article","created":{"date-parts":[[2025,8,16]],"date-time":"2025-08-16T11:07:12Z","timestamp":1755342432000},"page":"1-25","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["SHARP: SHARing-Aware Cache Writeback byPass"],"prefix":"10.1145","volume":"24","author":[{"ORCID":"https:\/\/orcid.org\/0009-0008-2328-9237","authenticated-orcid":false,"given":"Dinesh","family":"Joshi","sequence":"first","affiliation":[{"name":"Indian Institute of Technology Delhi","place":["New Delhi, India"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-4885-5055","authenticated-orcid":false,"given":"Aritra","family":"Bagchi","sequence":"additional","affiliation":[{"name":"Indian Institute of Technology Delhi","place":["New Delhi, India"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2508-7531","authenticated-orcid":false,"given":"Preeti Ranjan","family":"Panda","sequence":"additional","affiliation":[{"name":"Indian Institute of Technology Delhi","place":["New Delhi, India"]}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,9,26]]},"reference":[{"unstructured":"AMD. 2023. XA Zynq\u2122UltraScale+\u2122MPSoC. (May2023). Retrieved March 11 2024 from https:\/\/docs.xilinx.com\/v\/u\/en-US\/ds894-zynq-ultrascale-plus-overview","key":"e_1_3_1_2_2"},{"unstructured":"Apple. 2023. Apple introduces M2 Ultra. (2023). Retrieved August 3 2024 from https:\/\/www.apple.com\/in\/newsroom\/2023\/06\/apple-introduces-m2-ultra\/","key":"e_1_3_1_3_2"},{"unstructured":"NXP Semiconductors. 2023. Layerscape\u00aeLX2160A Processor. (September2023). Retrieved March 11 2024 from https:\/\/www.nxp.com\/products\/processors-and-microcontrollers\/arm-processors\/layerscape-processors\/layerscape-lx2160a-lx2120a-lx2080a-processors:LX2160A","key":"e_1_3_1_4_2"},{"unstructured":"Intel. 2024. Intel\u00aeXeon\u00aeE-2488 Processor. (2024). Retrieved August 3 2024 from https:\/\/ark.intel.com\/content\/www\/us\/en\/ark\/products\/236182\/intel-xeon-e-2488-processor-24m-cache-3-20-ghz.html","key":"e_1_3_1_5_2"},{"doi-asserted-by":"publisher","key":"e_1_3_1_6_2","DOI":"10.1145\/216585.216588"},{"doi-asserted-by":"publisher","key":"e_1_3_1_7_2","DOI":"10.1145\/3173162.3173177"},{"doi-asserted-by":"publisher","key":"e_1_3_1_8_2","DOI":"10.1109\/MM.2024.3373763"},{"doi-asserted-by":"publisher","key":"e_1_3_1_9_2","DOI":"10.1109\/HPCA.2018.00019"},{"doi-asserted-by":"publisher","key":"e_1_3_1_10_2","DOI":"10.1109\/PACT.2017.19"},{"doi-asserted-by":"publisher","key":"e_1_3_1_11_2","DOI":"10.1145\/3190508.3190511"},{"doi-asserted-by":"publisher","key":"e_1_3_1_12_2","DOI":"10.1145\/3302424.3303963"},{"doi-asserted-by":"publisher","key":"e_1_3_1_13_2","DOI":"10.1109\/MICRO.2006.49"},{"doi-asserted-by":"publisher","key":"e_1_3_1_14_2","DOI":"10.1145\/3632748"},{"doi-asserted-by":"publisher","key":"e_1_3_1_15_2","DOI":"10.1145\/3608096"},{"doi-asserted-by":"publisher","key":"e_1_3_1_16_2","DOI":"10.1145\/3362100"},{"doi-asserted-by":"publisher","key":"e_1_3_1_17_2","DOI":"10.1145\/3653452"},{"key":"e_1_3_1_18_2","article-title":"Arm\u00aeNeoverse\u2122V1 Reference Design","year":"2020","unstructured":"Arm. 2020. Arm\u00aeNeoverse\u2122V1 Reference Design. Retrieved 14 Apr. 2025 from https:\/\/developer.arm.com\/Tools%20and%20Software\/Neoverse%20V1%20Reference%20Design","journal-title":"https:\/\/developer.arm.com\/Tools%20and%20Software\/Neoverse%20V1%20Reference%20Design"},{"key":"e_1_3_1_19_2","article-title":"STMicroelectronics STM32MP1 Series Reference Manual","year":"2019","unstructured":"STMicroelectronics. 2019. STMicroelectronics STM32MP1 Series Reference Manual. Retrieved 06 Apr. 2025 from https:\/\/www.st.com\/resource\/en\/reference_manual\/rm0436-stm32mp157-advanced-armbased-32bit-mpus-stmicroelectronics.pdf","journal-title":"https:\/\/www.st.com\/resource\/en\/reference_manual\/rm0436-stm32mp157-advanced-armbased-32bit-mpus-stmicroelectronics.pdf"},{"key":"e_1_3_1_20_2","article-title":"Arm\u00aeCoreLink\u2122CI-700 Coherent Interconnect Technical Reference Manual","year":"2022","unstructured":"Arm. 2022. Arm\u00aeCoreLink\u2122CI-700 Coherent Interconnect Technical Reference Manual. Retrieved 16 Jan. 2025 from https:\/\/developer.arm.com\/documentation\/101569\/0300\/?lang=en. (April2022).","journal-title":"https:\/\/developer.arm.com\/documentation\/101569\/0300\/?lang=en"},{"key":"e_1_3_1_21_2","article-title":"Arm\u00aeNeoverse\u2122CMN-700 Coherent Mesh Network Technical Reference Manual","year":"2020","unstructured":"Arm. 2020. Arm\u00aeNeoverse\u2122CMN-700 Coherent Mesh Network Technical Reference Manual. Retrieved 16 Jan. 2025 from https:\/\/developer.arm.com\/documentation\/102308\/0302\/?lang=en. (December2020).","journal-title":"https:\/\/developer.arm.com\/documentation\/102308\/0302\/?lang=en"},{"key":"e_1_3_1_22_2","article-title":"Arm\u00aeCoreLink\u2122NIC-400 Interconnect Technical Reference Manual","year":"2016","unstructured":"Arm. 2016. Arm\u00aeCoreLink\u2122NIC-400 Interconnect Technical Reference Manual. Retrieved 16 Jan. 2025 from https:\/\/developer.arm.com\/documentation\/100459\/0000\/nic-450-components\/nic-400-network-interconnect?lang=en","journal-title":"https:\/\/developer.arm.com\/documentation\/100459\/0000\/nic-450-components\/nic-400-network-interconnect?lang=en"},{"doi-asserted-by":"publisher","key":"e_1_3_1_23_2","DOI":"10.1109\/MM.2021.3114903"},{"key":"e_1_3_1_24_2","article-title":"MESI Two Level Protocol Overview","author":"Lowe-Power Jason","year":"2025","unstructured":"Jason Lowe-Power. 2025. MESI Two Level Protocol Overview. Retrieved 02 Jun. 2025 from https:\/\/www.gem5.org\/documentation\/general_docs\/ruby\/MESI_Two_Level\/","journal-title":"https:\/\/www.gem5.org\/documentation\/general_docs\/ruby\/MESI_Two_Level\/"},{"unstructured":"Intel. 2025. Article ID: 000099741, Cache-Coherence Protocol Directory Placed in Intel\u00aeXeon\u00aeProcessors. Retrieved 25 Jun 2025 from https:\/\/www.intel.com\/content\/www\/us\/en\/support\/articles\/000099741\/processors\/intel-xeon-processors.html","journal-title":"https:\/\/www.intel.com\/content\/www\/us\/en\/support\/articles\/000099741\/processors\/intel-xeon-processors.html","article-title":"Article ID: 000099741, Cache-Coherence Protocol Directory Placed in Intel\u00aeXeon\u00aeProcessors","key":"e_1_3_1_25_2"},{"doi-asserted-by":"publisher","key":"e_1_3_1_26_2","DOI":"10.1145\/1787275.1787314"},{"doi-asserted-by":"publisher","key":"e_1_3_1_27_2","DOI":"10.1109\/PACT.2015.23"},{"doi-asserted-by":"publisher","key":"e_1_3_1_28_2","DOI":"10.7873\/DATE.2015.0438"},{"doi-asserted-by":"publisher","key":"e_1_3_1_29_2","DOI":"10.1145\/2370816.2370862"},{"doi-asserted-by":"publisher","key":"e_1_3_1_30_2","DOI":"10.1109\/SAMOS.2016.7818332"},{"doi-asserted-by":"publisher","key":"e_1_3_1_31_2","DOI":"10.1145\/2155620.2155671"},{"doi-asserted-by":"publisher","key":"e_1_3_1_32_2","DOI":"10.1109\/AEECT.2013.6716445"},{"doi-asserted-by":"publisher","key":"e_1_3_1_33_2","DOI":"10.1145\/3357526.3357569"},{"doi-asserted-by":"publisher","key":"e_1_3_1_34_2","DOI":"10.1109\/ISCA45697.2020.00045"},{"doi-asserted-by":"publisher","key":"e_1_3_1_35_2","DOI":"10.1145\/3146347.3146356"},{"doi-asserted-by":"publisher","key":"e_1_3_1_36_2","DOI":"10.1109\/IPDPS.2013.16"},{"unstructured":"Jason Lowe-Power Abdul Ahmad Ayaz Akram Mohammad Alian Rico Amslinger Matteo Andreozzi Adri\u00e0 Armejach Nils Asmussen Brad Beckmann et\u00a0al. 2020. The gem5 simulator: Version 20.0+. arXiv:2007.03152. Retrieved 30 Sep. 2020 from https:\/\/arxiv.org\/abs\/2007.03152","key":"e_1_3_1_37_2"},{"doi-asserted-by":"publisher","key":"e_1_3_1_38_2","DOI":"10.1109\/MICRO.2010.24"},{"doi-asserted-by":"publisher","key":"e_1_3_1_39_2","DOI":"10.1145\/2000064.2000075"},{"doi-asserted-by":"publisher","key":"e_1_3_1_40_2","DOI":"10.1145\/2716282.2716283"},{"doi-asserted-by":"publisher","key":"e_1_3_1_41_2","DOI":"10.5555\/2523721.2523753"},{"doi-asserted-by":"publisher","key":"e_1_3_1_42_2","DOI":"10.1145\/2897937.2898036"},{"doi-asserted-by":"publisher","key":"e_1_3_1_43_2","DOI":"10.1145\/1273440.1250709"},{"issue":"4","key":"e_1_3_1_44_2","first-page":"1","article-title":"Reuse distance-based probabilistic cache replacement","volume":"12","author":"Das Subhasis","year":"2015","unstructured":"Subhasis Das, Tor M. Aamodt, and William J. Dally. 2015. Reuse distance-based probabilistic cache replacement. ACM Transactions on Architecture and Code Optimization (TACO) 12, 4 (2015), 1\u201322.","journal-title":"ACM Transactions on Architecture and Code Optimization (TACO)"},{"doi-asserted-by":"publisher","key":"e_1_3_1_45_2","DOI":"10.1145\/1816038.1815971"},{"doi-asserted-by":"publisher","key":"e_1_3_1_46_2","DOI":"10.1109\/HPCA51647.2021.00033"},{"doi-asserted-by":"publisher","key":"e_1_3_1_47_2","DOI":"10.1109\/IPDPS.2016.30"},{"doi-asserted-by":"publisher","key":"e_1_3_1_48_2","DOI":"10.1145\/3007787.3001146"},{"doi-asserted-by":"publisher","key":"e_1_3_1_49_2","DOI":"10.1145\/1454115.1454128"},{"doi-asserted-by":"publisher","key":"e_1_3_1_50_2","DOI":"10.1109\/ISPASS.2012.6189219"},{"doi-asserted-by":"publisher","key":"e_1_3_1_51_2","DOI":"10.1109\/TCAD.2024.3446720"},{"doi-asserted-by":"publisher","key":"e_1_3_1_52_2","DOI":"10.1145\/3605573.3605616"},{"doi-asserted-by":"publisher","key":"e_1_3_1_53_2","DOI":"10.1109\/HPCA.2011.5749726"},{"doi-asserted-by":"publisher","key":"e_1_3_1_54_2","DOI":"10.1145\/1669112.1669166"},{"doi-asserted-by":"publisher","key":"e_1_3_1_55_2","DOI":"10.1145\/1840845.1840929"},{"doi-asserted-by":"publisher","key":"e_1_3_1_56_2","DOI":"10.1145\/1454115.1454128"},{"doi-asserted-by":"publisher","key":"e_1_3_1_57_2","DOI":"10.1109\/HPCA.2014.6835944"},{"doi-asserted-by":"publisher","key":"e_1_3_1_58_2","DOI":"10.1145\/2627369.2627611"},{"key":"e_1_3_1_59_2","article-title":"CACTI 6.0: A tool to model large caches","volume":"27","author":"Muralimanohar Naveen","year":"2009","unstructured":"Naveen Muralimanohar, Rajeev Balasubramonian, and Norman P. Jouppi. 2009. CACTI 6.0: A tool to model large caches. HP Laboratories 27 (2009), 1\u201328.","journal-title":"HP Laboratories"},{"doi-asserted-by":"publisher","key":"e_1_3_1_60_2","DOI":"10.1145\/3123939.3123942"}],"container-title":["ACM Transactions on Embedded Computing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3760746","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,3]],"date-time":"2025-10-03T14:05:24Z","timestamp":1759500324000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3760746"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,9,26]]},"references-count":59,"journal-issue":{"issue":"5s","published-print":{"date-parts":[[2025,11,30]]}},"alternative-id":["10.1145\/3760746"],"URL":"https:\/\/doi.org\/10.1145\/3760746","relation":{},"ISSN":["1539-9087","1558-3465"],"issn-type":[{"type":"print","value":"1539-9087"},{"type":"electronic","value":"1558-3465"}],"subject":[],"published":{"date-parts":[[2025,9,26]]},"assertion":[{"value":"2025-08-11","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-08-11","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-09-26","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}