{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,15]],"date-time":"2025-11-15T10:33:21Z","timestamp":1763202801259,"version":"3.41.0"},"reference-count":37,"publisher":"Association for Computing Machinery (ACM)","issue":"5s","license":[{"start":{"date-parts":[[2023,9,9]],"date-time":"2023-09-09T00:00:00Z","timestamp":1694217600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000028","name":"Semiconductor Research Corporation","doi-asserted-by":"crossref","award":["2020-IR-2979"],"award-info":[{"award-number":["2020-IR-2979"]}],"id":[{"id":"10.13039\/100000028","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Embed. Comput. Syst."],"published-print":{"date-parts":[[2023,10,31]]},"abstract":"<jats:p>\n            Modern multi-processor systems-on-chip (MPSoCs) are characterized by caches shared by multiple cores. These shared caches receive\n            <jats:italic>requests<\/jats:italic>\n            issued by the processor cores. Requests that are subject to cache misses may result in the generation of\n            <jats:italic>responses<\/jats:italic>\n            . These responses are received from the lower level of the memory hierarchy and written to the cache. The outstanding requests and responses contend for the shared cache bandwidth. To mitigate the impact of the cache bandwidth contention on the overall system performance, an efficient request and response arbitration policy is needed.\n          <\/jats:p>\n          <jats:p>\n            Research on shared cache management has neglected the additional cache contention caused by responses, which are written to the cache. We propose\n            <jats:italic>CABARRE<\/jats:italic>\n            , a novel request and response arbitration policy at shared caches, so as to improve the overall system performance.\n            <jats:italic>CABARRE<\/jats:italic>\n            shows a performance improvement of 23% on average across a set of SPEC workloads compared to straightforward adaptations of state-of-the-art solutions.\n          <\/jats:p>","DOI":"10.1145\/3608096","type":"journal-article","created":{"date-parts":[[2023,9,9]],"date-time":"2023-09-09T13:33:18Z","timestamp":1694266398000},"page":"1-24","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":8,"title":["CABARRE: Request Response Arbitration for Shared Cache Management"],"prefix":"10.1145","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3944-9636","authenticated-orcid":false,"given":"Garima","family":"Modi","sequence":"first","affiliation":[{"name":"Indian Institute of Technology Delhi, India"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-4885-5055","authenticated-orcid":false,"given":"Aritra","family":"Bagchi","sequence":"additional","affiliation":[{"name":"Indian Institute of Technology Delhi, India"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8398-476X","authenticated-orcid":false,"given":"Neetu","family":"Jindal","sequence":"additional","affiliation":[{"name":"Intel Architecture Group, Intel, India"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-8862-9941","authenticated-orcid":false,"given":"Ayan","family":"Mandal","sequence":"additional","affiliation":[{"name":"Intel Architecture Group, Intel, India"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2508-7531","authenticated-orcid":false,"given":"Preeti Ranjan","family":"Panda","sequence":"additional","affiliation":[{"name":"Indian Institute of Technology Delhi, India"}]}],"member":"320","published-online":{"date-parts":[[2023,9,9]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"crossref","first-page":"296","DOI":"10.1109\/ISVLSI.2017.59","volume-title":"2017 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","author":"Adegbija Tosiron","year":"2017","unstructured":"Tosiron Adegbija and Ravi Tandon. 2017. Coding for efficient caching in multicore embedded systems. In 2017 IEEE Computer Society Annual Symposium on VLSI (ISVLSI). IEEE, 296\u2013301."},{"key":"e_1_3_1_3_2","doi-asserted-by":"crossref","first-page":"109","DOI":"10.1109\/ICCD46524.2019.00022","volume-title":"2019 IEEE 37th International Conference on Computer Design (ICCD)","author":"Chaudhuri Mainak","year":"2019","unstructured":"Mainak Chaudhuri, Jayesh Gaur, and Sreenivas Subramoney. 2019. Bandwidth-aware last-level caching: Efficiently coordinating off-chip read and write bandwidth. In 2019 IEEE 37th International Conference on Computer Design (ICCD). IEEE, 109\u2013118."},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2022.3156692"},{"key":"e_1_3_1_5_2","doi-asserted-by":"crossref","first-page":"97","DOI":"10.1109\/PACT.2019.00016","volume-title":"2019 28th International Conference on Parallel Architectures and Compilation Techniques (PACT)","author":"Chung Jongwook","year":"2019","unstructured":"Jongwook Chung, Yuhwan Ro, Joonsung Kim, Jaehyung Ahn, Jangwoo Kim, John Kim, Jae W. Lee, and Jung Ho Ahn. 2019. Enforcing last-level cache partitioning through memory virtual channels. In 2019 28th International Conference on Parallel Architectures and Compilation Techniques (PACT). IEEE, 97\u2013109."},{"key":"e_1_3_1_6_2","doi-asserted-by":"crossref","first-page":"695","DOI":"10.23919\/DATE51398.2021.9474096","volume-title":"2021 Design, Automation & Test in Europe Conference & Exhibition (DATE)","author":"Dutta Kousik Kumar","year":"2021","unstructured":"Kousik Kumar Dutta, Prathamesh Nitin Tanksale, and Shirshendu Das. 2021. A fairness conscious cache replacement policy for last level cache. In 2021 Design, Automation & Test in Europe Conference & Exhibition (DATE). IEEE, 695\u2013700."},{"key":"e_1_3_1_7_2","doi-asserted-by":"crossref","first-page":"104","DOI":"10.1109\/HPCA.2018.00019","volume-title":"2018 IEEE International Symposium on High Performance Computer Architecture (HPCA)","author":"El-Sayed Nosayba","year":"2018","unstructured":"Nosayba El-Sayed, Anurag Mukkara, Po-An Tsai, Harshad Kasture, Xiaosong Ma, and Daniel Sanchez. 2018. KPart: A hybrid cache partitioning-sharing technique for commodity multicores. In 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 104\u2013117."},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1145\/3532213.3532250"},{"issue":"2","key":"e_1_3_1_9_2","doi-asserted-by":"crossref","first-page":"422","DOI":"10.1109\/TC.2015.2428694","article-title":"Bandwidth-aware on-line scheduling in SMT multicores","volume":"65","author":"Feliu Josue","year":"2015","unstructured":"Josue Feliu, Julio Sahuquillo, Salvador Petit, and Jose Duato. 2015. Bandwidth-aware on-line scheduling in SMT multicores. IEEE Trans. Comput. 65, 2 (2015), 422\u2013434.","journal-title":"IEEE Trans. Comput."},{"issue":"5","key":"e_1_3_1_10_2","doi-asserted-by":"crossref","first-page":"905","DOI":"10.1109\/TC.2016.2620977","article-title":"Perf&Fair: A progress-aware scheduler to enhance performance and fairness in SMT multicores","volume":"66","author":"Feliu Josue","year":"2016","unstructured":"Josue Feliu, Julio Sahuquillo, Salvador Petit, and Jose Duato. 2016. Perf&Fair: A progress-aware scheduler to enhance performance and fairness in SMT multicores. IEEE Trans. Comput. 66, 5 (2016), 905\u2013911.","journal-title":"IEEE Trans. Comput."},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1145\/2485922.2485930"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.5555\/2555729.2555740"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1145\/1186736.1186737"},{"key":"e_1_3_1_14_2","doi-asserted-by":"crossref","first-page":"213","DOI":"10.1109\/PACT52795.2021.00023","volume-title":"2021 30th International Conference on Parallel Architectures and Compilation Techniques (PACT)","author":"Holtryd Nadja Ramh\u00f6j","year":"2021","unstructured":"Nadja Ramh\u00f6j Holtryd, Madhavan Manivannan, Per Stenstr\u00f6m, and Miquel Peric\u00e0s. 2021. CBP: Coordinated management of cache partitioning, bandwidth partitioning and prefetch throttling. In 2021 30th International Conference on Parallel Architectures and Compilation Techniques (PACT). IEEE, 213\u2013225."},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1145\/1394608.1382172"},{"key":"e_1_3_1_16_2","doi-asserted-by":"crossref","first-page":"800","DOI":"10.23919\/DATE.2017.7927098","volume-title":"Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017","author":"Jain Rahul","year":"2017","unstructured":"Rahul Jain, Preeti Ranjan Panda, and Sreenivas Subramoney. 2017. A coordinated multi-agent reinforcement learning approach to multi-level cache co-partitioning. In Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017. IEEE, 800\u2013805."},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1145\/1816038.1815971"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/LCA.2023.3242178"},{"key":"e_1_3_1_19_2","first-page":"1","volume-title":"2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)","author":"K\u0119dzierski Kamil","year":"2010","unstructured":"Kamil K\u0119dzierski, Miquel Moreto, Francisco J. Cazorla, and Mateo Valero. 2010. Adapting cache partitioning algorithms to pseudo-lru replacement policies. In 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS). IEEE, 1\u201312."},{"key":"e_1_3_1_20_2","first-page":"65","volume-title":"2010 43rd Annual IEEE\/ACM International Symposium on Microarchitecture","author":"Kim Yoongu","year":"2010","unstructured":"Yoongu Kim, Michael Papamichael, Onur Mutlu, and Mor Harchol-Balter. 2010. Thread cluster memory scheduling: Exploiting differences in memory access behavior. In 2010 43rd Annual IEEE\/ACM International Symposium on Microarchitecture. IEEE, 65\u201376."},{"key":"e_1_3_1_21_2","first-page":"79","volume-title":"2018 Design, Automation & Test in Europe Conference & Exhibition (DATE)","author":"Li Zhaoying","year":"2018","unstructured":"Zhaoying Li, Lei Ju, Hongjun Dai, Xin Li, Mengying Zhao, and Zhiping Jia. 2018. Set variation-aware shared LLC management for CPU-GPU heterogeneous architecture. In 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE). IEEE, 79\u201384."},{"key":"e_1_3_1_22_2","article-title":"The gem5 simulator: Version 20.0+","author":"Lowe-Power Jason","year":"2020","unstructured":"Jason Lowe-Power, Abdul Mutaal Ahmad, Ayaz Akram, Mohammad Alian, Rico Amslinger, Matteo Andreozzi, Adri\u00e0 Armejach, Nils Asmussen, Brad Beckmann, Srikant Bharadwaj, et\u00a0al. 2020. The gem5 simulator: Version 20.0+. arXiv preprint arXiv:2007.03152 (2020).","journal-title":"arXiv preprint arXiv:2007.03152"},{"key":"e_1_3_1_23_2","first-page":"28","article-title":"CACTI 6.0: A tool to model large caches","volume":"27","author":"Muralimanohar Naveen","year":"2009","unstructured":"Naveen Muralimanohar, Rajeev Balasubramonian, and Norman P. Jouppi. 2009. CACTI 6.0: A tool to model large caches. HP Laboratories 27 (2009), 28.","journal-title":"HP Laboratories"},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2020.2968066"},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2006.49"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1145\/342001.339668"},{"key":"e_1_3_1_27_2","article-title":"Cads: Core-aware dynamic scheduler for multicore memory controllers","author":"Sanchez Eduardo Olmedo","year":"2019","unstructured":"Eduardo Olmedo Sanchez and Xian-He Sun. 2019. Cads: Core-aware dynamic scheduler for multicore memory controllers. arXiv preprint arXiv:1907.07776 (2019).","journal-title":"arXiv preprint arXiv:1907.07776"},{"key":"e_1_3_1_28_2","doi-asserted-by":"crossref","first-page":"779","DOI":"10.23919\/DATE.2018.8342112","volume-title":"2018 Design, Automation & Test in Europe Conference & Exhibition (DATE)","author":"Song Yang","year":"2018","unstructured":"Yang Song, Olivier Alavoine, and Bill Lin. 2018. Row-buffer hit harvesting in orchestrated last-level cache and DRAM scheduling for heterogeneous multicore systems. In 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE). IEEE, 779\u2013784."},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1145\/1816038.1815972"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2016.2526003"},{"key":"e_1_3_1_31_2","doi-asserted-by":"crossref","first-page":"62","DOI":"10.1145\/2830772.2830803","volume-title":"Proceedings of the 48th International Symposium on Microarchitecture","author":"Subramanian Lavanya","year":"2015","unstructured":"Lavanya Subramanian, Vivek Seshadri, Arnab Ghosh, Samira Khan, and Onur Mutlu. 2015. The application slowdown model: Quantifying and controlling the impact of inter-application interference at shared caches and main memory. In Proceedings of the 48th International Symposium on Microarchitecture. 62\u201375."},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1145\/3362100"},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1145\/3126535"},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/L-CA.2011.1"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1145\/3202663"},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.1145\/2897937.2898036"},{"key":"e_1_3_1_37_2","first-page":"1","volume-title":"Proceedings of the 48th International Conference on Parallel Processing","author":"Xiang Yaocheng","year":"2019","unstructured":"Yaocheng Xiang, Chencheng Ye, Xiaolin Wang, Yingwei Luo, and Zhenlin Wang. 2019. EMBA: Efficient memory bandwidth allocation to improve performance on intel commodity processor. In Proceedings of the 48th International Conference on Parallel Processing. 1\u201312."},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.1145\/1854273.1854306"}],"container-title":["ACM Transactions on Embedded Computing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3608096","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3608096","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:46:32Z","timestamp":1750178792000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3608096"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,9,9]]},"references-count":37,"journal-issue":{"issue":"5s","published-print":{"date-parts":[[2023,10,31]]}},"alternative-id":["10.1145\/3608096"],"URL":"https:\/\/doi.org\/10.1145\/3608096","relation":{},"ISSN":["1539-9087","1558-3465"],"issn-type":[{"type":"print","value":"1539-9087"},{"type":"electronic","value":"1558-3465"}],"subject":[],"published":{"date-parts":[[2023,9,9]]},"assertion":[{"value":"2023-03-23","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-07-13","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-09-09","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}