{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T05:04:08Z","timestamp":1750309448088,"version":"3.41.0"},"reference-count":54,"publisher":"Association for Computing Machinery (ACM)","issue":"6","license":[{"start":{"date-parts":[[2024,9,11]],"date-time":"2024-09-11T00:00:00Z","timestamp":1726012800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Embed. Comput. Syst."],"published-print":{"date-parts":[[2024,11,30]]},"abstract":"<jats:p>We propose ZeroCost-LLC (ZCLLC), a novel shared inclusive last-level cache (LLC) design for timing predictable multi-core platforms that offers lower worst-case latency (WCL) when compared with a traditional shared inclusive LLC design. ZCLLC achieves low WCL by eliminating certain memory operations in the form of cache line invalidations across the cache hierarchy that are a consequence of a core\u2019s memory request that misses in the cache hierarchy and when there is no vacant entry in the LLC to accommodate the fetched data for this request. In addition to low WCL, ZCLLC offers performance benefits in the form of additional caching capacity and unlike state-of-the-art approaches, ZCLLC does not impose any constraints on its usage across multiple cores. In this work, we describe the impact of LLC cache line invalidations on the WCL and systematically build solutions to eliminate these invalidations resulting in ZCLLC. We also present ZCLLC-OPT, an optimized variant of ZCLLC that offers lower WCL and improved average-case performance over ZCLLC. We apply optimizations to the shared bus arbitration mechanism and extend the micro-architecture of ZCLLC to allow for overlapping memory requests to the main memory. Our analysis reveals that the analytical WCL of a memory request under ZCLLC-OPT is 87.0%, 93.8%, and 97.1% lower than that under state-of-the-art LLC partition sharing techniques for 2, 4, and 8 cores, respectively. ZCLLC-OPT shows average-case performance speedups of 1.89\u00d7, 3.36\u00d7, and 6.24\u00d7 compared with the state-of-the-art LLC partition sharing techniques for 2, 4, and 8 cores, respectively. When compared with the original ZCLLC that does not have any optimizations, ZCLLC-OPT shows lower analytical WCLs that are 76.5%, 82.6%, and 86.2% lower compared with ZCLLC-NORMAL for 2, 4, and 8 cores, respectively.<\/jats:p>","DOI":"10.1145\/3687308","type":"journal-article","created":{"date-parts":[[2024,8,8]],"date-time":"2024-08-08T11:14:22Z","timestamp":1723115662000},"page":"1-30","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["High Performance and Predictable Shared Last-level Cache for Safety-Critical Systems"],"prefix":"10.1145","volume":"23","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3272-062X","authenticated-orcid":false,"given":"Zhuanhao","family":"Wu","sequence":"first","affiliation":[{"name":"University of Waterloo, Waterloo, Canada"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8347-0109","authenticated-orcid":false,"given":"Anirudh","family":"Kaushik","sequence":"additional","affiliation":[{"name":"Intel Corporation, Toronto, Canada"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2750-4471","authenticated-orcid":false,"given":"Hiren","family":"Patel","sequence":"additional","affiliation":[{"name":"Electrical and Computer Engineering, University of Waterloo, Waterloo, Canada"}]}],"member":"320","published-online":{"date-parts":[[2024,9,11]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11241-021-09376-1"},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/ECRTS.2014.11"},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1145\/3085572"},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/MECO49872.2020.9134262"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","unstructured":"Brian N. Bershad Dennis Lee Theodore H. Romer and J. Bradley Chen. 1994. Avoiding conflict misses dynamically in large direct-mapped caches. ACM 158\u2013170. DOI:10.1145\/195473.195527","DOI":"10.1145\/195473.195527"},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","unstructured":"Nathan Binkert Bradford Beckmann Gabriel Black Steven K. Reinhardt Ali Saidi Arkaprava Basu Joel Hestness Derek R. Hower Tushar Krishna Somayeh Sardashti Rathijit Sen Korey Sewell Muhammad Shoaib Nilay Vaish Mark D. Hill and David A. Wood. 2011. The Gem5 simulator. 39 2 (aug2011) 1\u20137. DOI:10.1145\/2024716.2024718","DOI":"10.1145\/2024716.2024718"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1145\/3398665"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA51647.2021.00032"},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA52012.2021.00015"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/RTSS.2016.015"},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/3579371.3589098"},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/DATE.2009.5090750"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2018.00019"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1016\/S0167-6423(99)00010-6"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1145\/2830555"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/rtss.2008.10"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.sysarc.2010.08.007"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11241-019-09336-w"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11241-019-09336-w"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/RTSS.2018.00059"},{"key":"e_1_3_2_22_2","volume-title":"Computer Architecture, Sixth Edition: A Quantitative Approach (6th ed.)","author":"Hennessy John L.","year":"2017","unstructured":"John L. Hennessy and David A. Patterson. 2017. Computer Architecture, Sixth Edition: A Quantitative Approach (6th ed.). Morgan Kaufmann Publishers Inc., San Francisco."},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1145\/3556975"},{"key":"e_1_3_2_24_2","article-title":"Improving real-time performance by utilizing cache allocation technology","year":"2015","unstructured":"Intel. 2015. Improving real-time performance by utilizing cache allocation technology. Intel Corporation (2015). https:\/\/www.intel.com\/content\/dam\/www\/public\/us\/en\/documents\/white-papers\/cache-allocation-technology-white-paper.pdf","journal-title":"Intel Corporation"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2020.3037747"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/RTAS52030.2021.00017"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2021.3123056"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/RTSS46320.2019.00044"},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.1145\/3092946"},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/RTAS.2016.7461323"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1145\/2834848.2834851"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1145\/339647.339669"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2000.898054"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.4230\/OASIcs.WCET.2009.2283"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11241-015-9235-y"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.2200\/s00962ed2v01y201910cac049"},{"key":"e_1_3_2_37_2","volume-title":"Ultra-Reliable MPC574XB\/c\/G Mcus for Automotive and Industrial Control and Gateway","author":"Semiconductors NXP","year":"2022","unstructured":"NXP Semiconductors. 2022. Ultra-Reliable MPC574XB\/c\/G Mcus for Automotive and Industrial Control and Gateway. Retrieved from https:\/\/www.nxp.com\/"},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.1145\/3292009"},{"volume-title":"RH850\/C1M-AX","year":"2022","key":"e_1_3_2_39_2","unstructured":"Renesas. 2022. RH850\/C1M-AX. Retrieved from https:\/\/www.renesas.com\/"},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/RTAS48715.2020.00006"},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2016.7482078"},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1145\/2593069.2593235"},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1145\/1353534.1346299"},{"key":"e_1_3_2_44_2","unstructured":"Sriram Srinivasan and William L Walker. 2018. Shadow tag memory to monitor state of cachelines at different cache level. US Patent 10 073 776."},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.1145\/1391469.1391545"},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/RTAS.2016.7461361"},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","DOI":"10.1109\/ECRTS.2013.26"},{"key":"e_1_3_2_48_2","volume-title":"ZCLLC","author":"Wu Zhuahao","year":"2023","unstructured":"Zhuahao Wu, Anirudh Kaushik, and Hiren Patel. 2023. ZCLLC. Retrieved from https:\/\/github.com\/zhuanhao-wu\/gem5-zcllc"},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.1109\/RTAS58335.2023.00027"},{"key":"e_1_3_2_50_2","doi-asserted-by":"publisher","DOI":"10.1145\/3489517.3530614"},{"key":"e_1_3_2_51_2","doi-asserted-by":"publisher","DOI":"10.1109\/RTSS.2013.44"},{"key":"e_1_3_2_52_2","doi-asserted-by":"publisher","DOI":"10.1145\/2628071.2628104"},{"key":"e_1_3_2_53_2","doi-asserted-by":"publisher","DOI":"10.1145\/3489517.3530613"},{"key":"e_1_3_2_54_2","doi-asserted-by":"publisher","DOI":"10.1145\/3139258.3139269"},{"key":"e_1_3_2_55_2","doi-asserted-by":"publisher","DOI":"10.1145\/1787275.1787314"}],"container-title":["ACM Transactions on Embedded Computing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3687308","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3687308","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:10:01Z","timestamp":1750295401000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3687308"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,9,11]]},"references-count":54,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2024,11,30]]}},"alternative-id":["10.1145\/3687308"],"URL":"https:\/\/doi.org\/10.1145\/3687308","relation":{},"ISSN":["1539-9087","1558-3465"],"issn-type":[{"type":"print","value":"1539-9087"},{"type":"electronic","value":"1558-3465"}],"subject":[],"published":{"date-parts":[[2024,9,11]]},"assertion":[{"value":"2023-12-31","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-07-12","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-09-11","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}