{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,20]],"date-time":"2025-12-20T22:17:00Z","timestamp":1766269020132,"version":"3.41.0"},"reference-count":56,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2024,1,10]],"date-time":"2024-01-10T00:00:00Z","timestamp":1704844800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Embed. Comput. Syst."],"published-print":{"date-parts":[[2024,1,31]]},"abstract":"<jats:p>In modern multi-processor systems-on-chip (MPSoCs), requests from different processor cores, accelerators, and their responses from the lower-level memory contend for the shared cache bandwidth, making it a critical performance bottleneck. Prior research on shared cache management has considered requests from cores but has ignored crucial contributions from their responses. Prior cache bypass techniques focused on data reuse and neglected the system-level implications of shared cache contention. We propose COBRRA, a novel shared cache controller policy that mitigates the contention by aggressively bypassing selected responses from the lower-level memory and scheduling the remaining requests and responses to the cache efficiently. COBRRA is able to improve the average performance of a set of 15 SPEC workloads by 49% and 33% compared to the no-bypass baseline and the best-performing state-of-the-art bypass solution, respectively. Furthermore, COBRRA reduces the overall cache energy consumption by 38% and 31% compared to the no-bypass baseline and the most energy-efficient state-of-the-art bypass solution, respectively.<\/jats:p>","DOI":"10.1145\/3632748","type":"journal-article","created":{"date-parts":[[2023,11,17]],"date-time":"2023-11-17T12:13:03Z","timestamp":1700223183000},"page":"1-30","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["COBRRA: COntention-aware cache Bypass with Request-Response Arbitration"],"prefix":"10.1145","volume":"23","author":[{"ORCID":"https:\/\/orcid.org\/0009-0008-4885-5055","authenticated-orcid":false,"given":"Aritra","family":"Bagchi","sequence":"first","affiliation":[{"name":"Indian Institute of Technology Delhi, India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-2328-9237","authenticated-orcid":false,"given":"Dinesh","family":"Joshi","sequence":"additional","affiliation":[{"name":"Indian Institute of Technology Delhi, India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2508-7531","authenticated-orcid":false,"given":"Preeti Ranjan","family":"Panda","sequence":"additional","affiliation":[{"name":"Indian Institute of Technology Delhi, India"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2024,1,10]]},"reference":[{"key":"e_1_3_1_2_2","unstructured":"Intel. 2016. Memory Performance in a Nutshell. https:\/\/www.intel.com\/content\/www\/us\/en\/developer\/articles\/technical\/memory-performance-in-a-nutshell.html"},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1145\/3297858.3304062"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1109\/DATE.2003.1253701"},{"key":"e_1_3_1_5_2","unstructured":"ARM Ltd. AMBA 5 CHI Architecture Specification. Version F 2022. https:\/\/developer.arm.com\/documentation\/ihi0050\/f\/?lang=en"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1145\/3357526.3357547"},{"key":"e_1_3_1_7_2","unstructured":"Andriy Berestovskyy. 2019. Applied C++: Memory Latency. https:\/\/medium.com\/applied\/applied-c-memory-latency-d05a42fe354e"},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1145\/2024716.2024718"},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.1145\/2613908.2613909"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/PACT.2019.00016"},{"key":"e_1_3_1_11_2","volume-title":"Intel Atom Processor P5362","author":"Corporation Intel","year":"2021","unstructured":"Intel Corporation. 2021. Intel Atom Processor P5362. Retrieved February 24, 2022, from https:\/\/www.intel.in\/content\/www\/in\/en\/products\/sku\/134793\/intel-atom-processor-p5362-27m-cache-2-2ghz\/specifications.html"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/2897937.2897966"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.micpro.2017.09.007"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jpdc.2018.09.001"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2012.43"},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.23919\/DATE51398.2021.9474096"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1145\/3337801.3337820"},{"key":"e_1_3_1_18_2","doi-asserted-by":"crossref","first-page":"104","DOI":"10.1109\/HPCA.2018.00019","volume-title":"2018 IEEE International Symposium on High Performance Computer Architecture (HPCA\u201918)","author":"El-Sayed Nosayba","year":"2018","unstructured":"Nosayba El-Sayed, Anurag Mukkara, Po-An Tsai, Harshad Kasture, Xiaosong Ma, and Daniel Sanchez. 2018. KPart: A hybrid cache partitioning-sharing technique for commodity multicores. In 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA\u201918). IEEE, 104\u2013117."},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2015.2428694"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2016.2620977"},{"key":"e_1_3_1_21_2","doi-asserted-by":"crossref","first-page":"13","DOI":"10.1109\/HPCA.2017.46","volume-title":"2017 IEEE International Symposium on High Performance Computer Architecture (HPCA\u201917)","author":"Gaur Jayesh","year":"2017","unstructured":"Jayesh Gaur, Mainak Chaudhuri, Pradeep Ramachandran, and Sreenivas Subramoney. 2017. Near-optimal access partitioning for memory hierarchies with multiple heterogeneous bandwidth sources. In 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA\u201917). IEEE, 13\u201324."},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/PACT52795.2021.00023"},{"key":"e_1_3_1_23_2","volume-title":"AM62x Processors","author":"Instruments Texas","year":"2022","unstructured":"Texas Instruments. 2022. AM62x Processors. Retrieved August 17, 2022, from https:\/\/www.ti.com\/lit\/wp\/sprad41\/sprad41.pdf"},{"key":"e_1_3_1_24_2","doi-asserted-by":"crossref","first-page":"648","DOI":"10.1109\/HPCA53966.2022.00054","volume-title":"2022 IEEE International Symposium on High Performance Computer Architecture (HPCA\u201922)","author":"Jalili Majid","year":"2022","unstructured":"Majid Jalili and Mattan Erez. 2022. Reducing load latency with cache level prediction. In 2022 IEEE International Symposium on High Performance Computer Architecture (HPCA\u201922). IEEE, 648\u2013661."},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2010.24"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/AEECT.2013.6716445"},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2007.70816"},{"key":"e_1_3_1_28_2","doi-asserted-by":"crossref","first-page":"315","DOI":"10.1109\/ISCA.2018.00035","volume-title":"2018 ACM\/IEEE 45th Annual International Symposium on Computer Architecture (ISCA\u201918)","author":"Korgaonkar Kunal","year":"2018","unstructured":"Kunal Korgaonkar, Ishwar Bhati, Huichu Liu, Jayesh Gaur, Sasikanth Manipatruni, Sreenivas Subramoney, Tanay Karnik, Steven Swanson, Ian Young, and Hong Wang. 2018. Density tradeoffs of non-volatile memory as a replacement for SRAM based last level cache. In 2018 ACM\/IEEE 45th Annual International Symposium on Computer Architecture (ISCA\u201918). IEEE, 315\u2013327."},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1145\/2370816.2370862"},{"key":"e_1_3_1_30_2","volume-title":"IEEE International Symposium on Performance Analysis of Systems and Software","author":"Limaye Ankur","year":"2018","unstructured":"Ankur Limaye and Tosiron Adegbija. 2018. A workload characterization of the SPEC CPU2017 benchmark suite. In IEEE International Symposium on Performance Analysis of Systems and Software."},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2001.990695"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/L-CA.2007.3"},{"key":"e_1_3_1_33_2","first-page":"28","article-title":"CACTI 6.0: A tool to model large caches","volume":"27","author":"Muralimanohar Naveen","year":"2009","unstructured":"Naveen Muralimanohar, Rajeev Balasubramonian, and Norman P. Jouppi. 2009. CACTI 6.0: A tool to model large caches. HP laboratories 27 (2009), 28.","journal-title":"HP laboratories"},{"key":"e_1_3_1_34_2","doi-asserted-by":"crossref","first-page":"146","DOI":"10.1109\/MICRO.2007.21","volume-title":"40th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201907)","author":"Mutlu Onur","year":"2007","unstructured":"Onur Mutlu and Thomas Moscibroda. 2007. Stall-time fair memory access scheduling for chip multiprocessors. In 40th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201907). IEEE, 146\u2013160."},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1145\/3302424.3303963"},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/SAMOS.2016.7818332"},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2006.49"},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.1145\/342001.339668"},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1145\/2554850.2554992"},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/PACT.2017.19"},{"key":"e_1_3_1_41_2","volume-title":"NXP Layerscape Processors","author":"Semiconductors NXP","year":"2017","unstructured":"NXP Semiconductors. 2017. NXP Layerscape Processors. Retrieved January 9, 2023, from https:\/\/www.nxp.com\/products\/processors-and-microcontrollers\/arm-processors\/layerscape-processors\/layerscape-lx2160a-lx2120a-lx2080a-processors:LX2160A"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2016.2526003"},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","DOI":"10.1145\/3362100"},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.1145\/3362100"},{"key":"e_1_3_1_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/L-CA.2011.1"},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.7873\/DATE.2013.179"},{"key":"e_1_3_1_47_2","doi-asserted-by":"publisher","DOI":"10.1145\/2897937.2898036"},{"key":"e_1_3_1_48_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2014.46"},{"key":"e_1_3_1_49_2","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2013.85"},{"key":"e_1_3_1_50_2","doi-asserted-by":"publisher","DOI":"10.1145\/2155620.2155671"},{"key":"e_1_3_1_51_2","doi-asserted-by":"publisher","DOI":"10.1145\/3190508.3190511"},{"key":"e_1_3_1_52_2","doi-asserted-by":"publisher","DOI":"10.1145\/3337821.3337863"},{"key":"e_1_3_1_53_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCAD.2013.6691165"},{"key":"e_1_3_1_54_2","first-page":"76","volume-title":"2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA\u201915)","author":"Xie Xiaolong","year":"2015","unstructured":"Xiaolong Xie, Yun Liang, Yu Wang, Guangyu Sun, and Tao Wang. 2015. Coordinated static and dynamic cache bypassing for GPUs. In 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA\u201915). IEEE, 76\u201388."},{"key":"e_1_3_1_55_2","doi-asserted-by":"publisher","DOI":"10.1109\/DASC.2011.65"},{"key":"e_1_3_1_56_2","doi-asserted-by":"publisher","DOI":"10.1145\/2627369.2627611"},{"key":"e_1_3_1_57_2","doi-asserted-by":"publisher","DOI":"10.1145\/1519065.1519076"}],"container-title":["ACM Transactions on Embedded Computing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3632748","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3632748","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T22:51:04Z","timestamp":1750287064000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3632748"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,1,10]]},"references-count":56,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2024,1,31]]}},"alternative-id":["10.1145\/3632748"],"URL":"https:\/\/doi.org\/10.1145\/3632748","relation":{},"ISSN":["1539-9087","1558-3465"],"issn-type":[{"type":"print","value":"1539-9087"},{"type":"electronic","value":"1558-3465"}],"subject":[],"published":{"date-parts":[[2024,1,10]]},"assertion":[{"value":"2023-04-10","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-11-05","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-01-10","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}