{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,27]],"date-time":"2025-08-27T16:25:46Z","timestamp":1756311946473,"version":"3.41.0"},"reference-count":32,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2023,3,1]],"date-time":"2023-03-01T00:00:00Z","timestamp":1677628800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"National Science Foundation"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2023,6,30]]},"abstract":"<jats:p>\n            <jats:bold>Chip multiprocessors (CMP)<\/jats:bold>\n            with more cores have more traffic to the\n            <jats:bold>last-level cache (LLC)<\/jats:bold>\n            . Without a corresponding increase in LLC bandwidth, such traffic cannot be sustained, resulting in performance degradation. Previous research focused on data placement techniques to improve access latency in\n            <jats:bold>Non-Uniform Cache Architectures (NUCA)<\/jats:bold>\n            . Placing data closer to the referring core reduces traffic in cache interconnect. However, earlier data placement work did not account for the frequency with which specific memory references are accessed. 
The difficulty of tracking access frequency for all memory references is one of the main reasons why it was not considered in NUCA data placement.\n          <\/jats:p>\n          <jats:p>\n            In this research, we present a hardware-assisted solution called\n            <jats:bold>\n              ACTION (\n              <jats:underline>A<\/jats:underline>\n              daptive\n              <jats:underline>C<\/jats:underline>\n              ache Block Migra\n              <jats:underline>tion<\/jats:underline>\n              )\n            <\/jats:bold>\n            to track the access frequency of individual memory references and prioritize placement of frequently referenced data closer to the affine core. The ACTION mechanism implements cache block migration when there is a detectable change in access frequencies due to a shift in the program phase. ACTION counts access references in the LLC stream using a simple and approximate method and uses a straightforward placement and migration solution to keep the hardware overhead low. We evaluate ACTION on a 4-core CMP with a 5x5 mesh LLC network implementing a partitioned D-NUCA against workloads exhibiting distinct asymmetry in cache block access frequency. 
Our simulation results indicate that ACTION can improve CMP performance by up to 7.5% over\n            <jats:bold>state-of-the-art (SOTA)<\/jats:bold>\n            D-NUCA solutions.\n          <\/jats:p>","DOI":"10.1145\/3572911","type":"journal-article","created":{"date-parts":[[2022,11,29]],"date-time":"2022-11-29T12:05:35Z","timestamp":1669723535000},"page":"1-19","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["ACTION: Adaptive Cache Block Migration in Distributed Cache Architectures"],"prefix":"10.1145","volume":"20","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3600-9432","authenticated-orcid":false,"given":"Chandra Sekhar","family":"Mummidi","sequence":"first","affiliation":[{"name":"University of Massachusetts Amherst, Amherst, Massachusetts, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8221-3824","authenticated-orcid":false,"given":"Sandip","family":"Kundu","sequence":"additional","affiliation":[{"name":"University of Massachusetts Amherst, Amherst, Massachusetts, USA"}]}],"member":"320","published-online":{"date-parts":[[2023,3]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2015.35"},{"key":"e_1_3_1_3_2","first-page":"180","volume-title":"International Workshop on Power-Aware Computer Systems","author":"Balasubramonian Rajeev","year":"2003","unstructured":"Rajeev Balasubramonian, Viji Srinivasan, Sandhya Dwarkadas, and Alper Buyuktosunoglu. 2003. Hot-and-cold: Using criticality in the design of energy-efficient caches. In International Workshop on Power-Aware Computer Systems. Springer, 180\u2013195."},{"key":"e_1_3_1_4_2","doi-asserted-by":"crossref","first-page":"319","DOI":"10.1109\/MICRO.2004.21","volume-title":"37th International Symposium on Microarchitecture (MICRO-37\u201904)","author":"Beckmann Bradford M.","year":"2004","unstructured":"Bradford M. Beckmann and David A. Wood. 2004. 
Managing wire delay in large chip-multiprocessor caches. In 37th International Symposium on Microarchitecture (MICRO-37\u201904). IEEE, 319\u2013330."},{"key":"e_1_3_1_5_2","first-page":"213","volume-title":"Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques","author":"Beckmann Nathan","year":"2013","unstructured":"Nathan Beckmann and Daniel Sanchez. 2013. Jigsaw: Scalable software-defined caches. In Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques. IEEE, 213\u2013224."},{"key":"e_1_3_1_6_2","doi-asserted-by":"crossref","first-page":"47","DOI":"10.1109\/IISWC.2008.4636090","volume-title":"2008 IEEE International Symposium on Workload Characterization","author":"Bienia Christian","year":"2008","unstructured":"Christian Bienia, Sanjeev Kumar, and Kai Li. 2008. Parsec vs. splash-2: A quantitative comparison of two multithreaded benchmark suites on chip-multiprocessors. In 2008 IEEE International Symposium on Workload Characterization. IEEE, 47\u201356."},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1145\/337292.337523"},{"key":"e_1_3_1_8_2","doi-asserted-by":"crossref","first-page":"357","DOI":"10.1109\/ISCA.2005.39","volume-title":"32nd International Symposium on Computer Architecture (ISCA\u201905)","author":"Chishti Zeshan","year":"2005","unstructured":"Zeshan Chishti, Michael D. Powell, and T. N. Vijaykumar. 2005. Optimizing replication, communication, and capacity allocation in CMPs. In 32nd International Symposium on Computer Architecture (ISCA\u201905). IEEE, 357\u2013368."},{"key":"e_1_3_1_9_2","first-page":"455","volume-title":"2006 39th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201906)","author":"Cho Sangyeun","year":"2006","unstructured":"Sangyeun Cho and Lei Jin. 2006. Managing distributed, shared L2 caches through OS-level page allocation. 
In 2006 39th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201906). IEEE, 455\u2013468."},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1145\/3149371"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2014.10"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/1555754.1555779"},{"key":"e_1_3_1_13_2","volume-title":"Proceedings of the Biennial Conference on Innovative Data Systems Research","author":"Hardavellas Nikos","year":"2007","unstructured":"Nikos Hardavellas, Ippokratis Pandis, Ryan Johnson, Naju Mancheril, Anastassia Ailamaki, and Babak Falsafi. 2007. Database servers on chip multiprocessors: Limitations and opportunities. In Proceedings of the Biennial Conference on Innovative Data Systems Research."},{"key":"e_1_3_1_14_2","doi-asserted-by":"crossref","first-page":"88","DOI":"10.1109\/HPCA.2006.1598115","volume-title":"The Twelfth International Symposium on High-Performance Computer Architecture, 2006.","author":"Jaleel Aamer","year":"2006","unstructured":"Aamer Jaleel, Matthew Mattina, and Bruce Jacob. 2006. Last level cache (llc) performance of data mining workloads on a CMP-a case study of parallel bioinformatics workloads. In The Twelfth International Symposium on High-Performance Computer Architecture, 2006. 
IEEE, 88\u201398."},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1145\/2508148.2485957"},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1145\/605397.605420"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1145\/3079079.3079089"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1145\/1394608.1382159"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1145\/2980024.2872363"},{"key":"e_1_3_1_20_2","doi-asserted-by":"crossref","first-page":"45","DOI":"10.1109\/HPCA.2009.4798236","volume-title":"2009 IEEE 15th International Symposium on High Performance Computer Architecture","author":"Qureshi Moinuddin K.","year":"2009","unstructured":"Moinuddin K. Qureshi. 2009. Adaptive spill-receive for robust high-performance caching in CMPs. In 2009 IEEE 15th International Symposium on High Performance Computer Architecture. IEEE, 45\u201354."},{"key":"e_1_3_1_21_2","doi-asserted-by":"crossref","first-page":"423","DOI":"10.1109\/MICRO.2006.49","volume-title":"2006 39th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201906)","author":"Qureshi Moinuddin K.","year":"2006","unstructured":"Moinuddin K. Qureshi and Yale N. Patt. 2006. Utility-based cache partitioning: A low-overhead, high-performance, runtime mechanism to partition shared caches. In 2006 39th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201906). IEEE, 423\u2013432."},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1145\/342001.339685"},{"key":"e_1_3_1_23_2","article-title":"Years of microprocessor trend data","author":"Rupp Karl","unstructured":"Karl Rupp, M. Horovitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond, and C. Batten. 42. Years of microprocessor trend data. by Karlrupp. Net.[Online (42).","journal-title":"by Karlrupp. 
Net.[Online"},{"key":"e_1_3_1_24_2","doi-asserted-by":"crossref","first-page":"187","DOI":"10.1109\/MICRO.2010.20","volume-title":"2010 43rd Annual IEEE\/ACM International Symposium on Microarchitecture","author":"Sanchez Daniel","year":"2010","unstructured":"Daniel Sanchez and Christos Kozyrakis. 2010. The ZCache: Decoupling ways and associativity. In 2010 43rd Annual IEEE\/ACM International Symposium on Microarchitecture. IEEE, 187\u2013198."},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1145\/2000064.2000073"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1145\/2508148.2485963"},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2016.25"},{"key":"e_1_3_1_28_2","first-page":"1","volume-title":"Hot Chips Symposium","author":"Stuecheli Jeff","year":"2013","unstructured":"Jeff Stuecheli. 2013. POWER8. In Hot Chips Symposium. 1\u201320."},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2002.997877"},{"key":"e_1_3_1_30_2","doi-asserted-by":"crossref","first-page":"652","DOI":"10.1145\/3079856.3080214","volume-title":"Proceedings of the 44th Annual International Symposium on Computer Architecture","author":"Tsai Po-An","year":"2017","unstructured":"Po-An Tsai, Nathan Beckmann, and Daniel Sanchez. 2017. Jenga: Software-defined cache hierarchies. In Proceedings of the 44th Annual International Symposium on Computer Architecture. 652\u2013665."},{"key":"e_1_3_1_31_2","doi-asserted-by":"crossref","first-page":"433","DOI":"10.1109\/MICRO.2006.38","volume-title":"2006 39th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201906)","author":"Varadarajan Keshavan","year":"2006","unstructured":"Keshavan Varadarajan, S. K. Nandy, Vishal Sharda, Amrutur Bharadwaj, Ravi Iyer, Srihari Makineni, and Donald Newell. 2006. Molecular caches: A caching structure for dynamic creation of application-specific heterogeneous cache regions. 
In 2006 39th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201906). IEEE, 433\u2013442."},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1145\/3296957.3173197"},{"key":"e_1_3_1_33_2","doi-asserted-by":"crossref","first-page":"336","DOI":"10.1109\/ISCA.2005.53","volume-title":"32nd International Symposium on Computer Architecture (ISCA\u201905)","author":"Zhang Michael","year":"2005","unstructured":"Michael Zhang and Krste Asanovic. 2005. Victim replication: Maximizing capacity while hiding wire delay in tiled chip multiprocessors. In 32nd International Symposium on Computer Architecture (ISCA\u201905). IEEE, 336\u2013345."}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3572911","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3572911","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T17:51:38Z","timestamp":1750182698000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3572911"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,3]]},"references-count":32,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2023,6,30]]}},"alternative-id":["10.1145\/3572911"],"URL":"https:\/\/doi.org\/10.1145\/3572911","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2023,3]]},"assertion":[{"value":"2022-03-25","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2022-11-10","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-03-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}