{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,12]],"date-time":"2026-06-12T16:57:16Z","timestamp":1781283436267,"version":"3.54.1"},"reference-count":137,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2024,9,14]],"date-time":"2024-09-14T00:00:00Z","timestamp":1726272000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2024,9,30]]},"abstract":"<jats:p>\n            Modern computing systems access data in main memory at coarse granularity (e.g., at 512-bit cache block granularity). Coarse-grained access leads to wasted energy because the system does not use all individually accessed small portions (e.g.,\n            <jats:italic>words<\/jats:italic>\n            , each of which typically is 64 bits) of a cache block. In modern DRAM-based computing systems, two key coarse-grained access mechanisms lead to wasted energy: large and fixed-size (i) data transfers between DRAM and the memory controller and (ii) DRAM row activations.\n          <\/jats:p>\n          <jats:p>We propose Sectored DRAM, a new, low-overhead DRAM substrate that reduces wasted energy by enabling fine-grained DRAM data transfer and DRAM row activation. To retrieve only useful data from DRAM, Sectored DRAM exploits the observation that many cache blocks are not fully utilized in many workloads due to poor spatial locality. Sectored DRAM predicts the words in a cache block that will likely be accessed during the cache block\u2019s residency in cache and (i) transfers only the predicted words on the memory channel by dynamically tailoring the DRAM data transfer size for the workload and (ii) activates a smaller set of cells that contain the predicted words by carefully operating physically isolated portions of DRAM rows (i.e., mats). Activating a smaller set of cells on each access relaxes DRAM power delivery constraints and allows the memory controller to schedule DRAM accesses faster.<\/jats:p>\n          <jats:p>We evaluate Sectored DRAM using 41 workloads from widely used benchmark suites. Compared to a system with coarse-grained DRAM, Sectored DRAM reduces the DRAM energy consumption of highly memory intensive workloads by up to (on average) 33% (20%) while improving their performance by up to (on average) 36% (17%). Sectored DRAM\u2019s DRAM energy savings, combined with its system performance improvement, allows system-wide energy savings of up to 23%. Sectored DRAM\u2019s DRAM chip area overhead is 1.7% of the area of a modern DDR4 chip. Compared to state-of-the-art fine-grained DRAM architectures, Sectored DRAM greatly reduces DRAM energy consumption, does not reduce DRAM bandwidth, and can be implemented with low hardware cost. Sectored DRAM provides 89% of the performance benefits of, consumes 12% less DRAM energy than, and takes up 34% less DRAM chip area than a high-performance state-of-the-art fine-grained DRAM architecture (Half-DRAM). It is our hope and belief that Sectored DRAM\u2019s ideas and results will help to enable more efficient and high-performance memory systems. To this end, we open source Sectored DRAM at https:\/\/github.com\/CMU-SAFARI\/Sectored-DRAM.<\/jats:p>","DOI":"10.1145\/3673653","type":"journal-article","created":{"date-parts":[[2024,6,14]],"date-time":"2024-06-14T11:29:10Z","timestamp":1718364550000},"page":"1-29","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":11,"title":["Sectored DRAM: A Practical Energy-Efficient and High-Performance Fine-Grained DRAM Architecture"],"prefix":"10.1145","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-5333-5726","authenticated-orcid":false,"given":"Ataberk","family":"Olgun","sequence":"first","affiliation":[{"name":"ETH Zurich, Zurich, Switzerland"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2718-0297","authenticated-orcid":false,"given":"F. Nisa","family":"Bostanci","sequence":"additional","affiliation":[{"name":"ETH Zurich, Zurich, Switzerland"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1557-4819","authenticated-orcid":false,"given":"Geraldo","family":"Francisco de Oliveira Junior","sequence":"additional","affiliation":[{"name":"ETH Zurich, Zurich Switzerland"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-9291-3626","authenticated-orcid":false,"given":"Yahya Can","family":"Tugrul","sequence":"additional","affiliation":[{"name":"ETH Zurich, Zurich Switzerland"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1129-1889","authenticated-orcid":false,"given":"Rahul","family":"Bera","sequence":"additional","affiliation":[{"name":"ETH Zurich, Zurich Switzerland"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9333-6077","authenticated-orcid":false,"given":"Abdullah Giray","family":"Yaglikci","sequence":"additional","affiliation":[{"name":"ETH Zurich, Zurich Switzerland"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9766-007X","authenticated-orcid":false,"given":"Hasan","family":"Hassan","sequence":"additional","affiliation":[{"name":"ETH Zurich, Zurich, Switzerland"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2701-3787","authenticated-orcid":false,"given":"Oguz","family":"Ergin","sequence":"additional","affiliation":[{"name":"TOBB University of Economics and Technology, Ankara Turkey"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0075-2312","authenticated-orcid":false,"given":"Onur","family":"Mutlu","sequence":"additional","affiliation":[{"name":"ETH Zurich, Zurich, Switzerland"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2024,9,14]]},"reference":[{"key":"e_1_3_3_2_2","volume-title":"SC","year":"2009","unstructured":"Jung Ho Ahn et\u00a0al. 2009. Future scaling of processor-memory interfaces. In SC."},{"key":"e_1_3_3_3_2","doi-asserted-by":"crossref","unstructured":"Jung Ho Ahn et\u00a0al. 2009. Multicore DIMM: An energy efficient memory module with independently controlled DRAMs. IEEE Computer Architecture Letters 8 1 (2009) 5\u20138.","DOI":"10.1109\/L-CA.2008.13"},{"key":"e_1_3_3_4_2","volume-title":"CF","year":"2021","unstructured":"Tareq Alawneh, Raimund Kirner, and Catherine Menon. 2021. Dynamic row activation mechanism for multi-core systems. In CF."},{"key":"e_1_3_3_5_2","doi-asserted-by":"crossref","unstructured":"D. B. Alpert et\u00a0al. 1988. Performance trade-offs for microprocessor cache memories. IEEE Micro 8 4 (1988) 44\u201354.","DOI":"10.1109\/40.7771"},{"key":"e_1_3_3_6_2","unstructured":"AMD. 2013. BIOS and Kernel Developer\u2019s Guide (BKDG) for AMD Family 15h Models 00h-0Fh Processors. AMD."},{"key":"e_1_3_3_7_2","unstructured":"AMD. 2022. uProf User Guide. Retrieved June 17 2024 from https:\/\/www.amd.com\/content\/dam\/amd\/en\/documents\/developer\/version-4-1-documents\/uprof\/uprof-ug-rev-4.1.pdf"},{"key":"e_1_3_3_8_2","volume-title":"HPCA","year":"1995","unstructured":"C. Anderson et\u00a0al. 1995. Two techniques for improving performance on bus-based multiprocessors. In HPCA."},{"key":"e_1_3_3_9_2","doi-asserted-by":"crossref","unstructured":"Rajeev Balasubramonian et\u00a0al. 2017. CACTI 7: New tools for interconnect exploration in innovative off-chip memories. ACM TACO 14 2 (2017) Article 14 25 pages.","DOI":"10.1145\/3085572"},{"key":"e_1_3_3_10_2","volume-title":"MICRO","year":"2022","unstructured":"Rahul Bera et\u00a0al. 2022. Hermes: Accelerating long-latency load requests via perceptron-based off-chip load prediction. In MICRO."},{"key":"e_1_3_3_11_2","volume-title":"MICRO","year":"2021","unstructured":"Rahul Bera et\u00a0al. 2021. Pythia: A customizable hardware prefetching framework using online reinforcement learning. In MICRO."},{"key":"e_1_3_3_12_2","volume-title":"ISCA","year":"2015","unstructured":"Ishwar Bhati et\u00a0al. 2015. Flexible auto-refresh: Enabling scalable and energy-efficient DRAM refresh reductions. In ISCA."},{"key":"e_1_3_3_13_2","doi-asserted-by":"crossref","unstructured":"Tony M. Brewer. 2010. Instruction set innovations for the Convey HC-1 computer. IEEE Micro 30 2 (2010) 70\u201379.","DOI":"10.1109\/MM.2010.36"},{"key":"e_1_3_3_14_2","unstructured":"Karthik Chandrasekar et\u00a0al. 2012. DRAMPower: Open-Source DRAM Power & Energy Estimation Tool. Retrieved June 17 2024 from http:\/\/www.drampower.info"},{"key":"e_1_3_3_15_2","volume-title":"SIGMETRICS","year":"2016","unstructured":"Kevin K. Chang et\u00a0al. 2016. Understanding latency variation in modern DRAM chips: Experimental characterization, analysis, and optimization. In SIGMETRICS."},{"key":"e_1_3_3_16_2","volume-title":"SIGMETRICS","year":"2017","unstructured":"Kevin K. Chang et\u00a0al. 2017. Understanding reduced-voltage operation in modern DRAM devices: Experimental characterization, analysis, and mechanisms. In SIGMETRICS."},{"key":"e_1_3_3_17_2","volume-title":"HPCA","year":"2017","unstructured":"Niladrish Chatterjee et\u00a0al. 2017. Architecting an energy-efficient DRAM system for GPUs. In HPCA."},{"key":"e_1_3_3_18_2","volume-title":"HPCA","year":"2004","unstructured":"C. F. Chen et\u00a0al. 2004. Accurate and complexity-effective spatial pattern prediction. In HPCA."},{"key":"e_1_3_3_19_2","volume-title":"FTCS","year":"1996","unstructured":"C. L. Chen. 1996. Symbol error correcting codes for memory applications. In FTCS."},{"key":"e_1_3_3_20_2","doi-asserted-by":"crossref","unstructured":"Elliott Cooper-Balis et\u00a0al. 2010. Fine-grained activation for power reduction in DRAM. IEEE Micro 30 3 (2010) 34\u201347.","DOI":"10.1109\/MM.2010.43"},{"key":"e_1_3_3_21_2","volume-title":"WSEAS","year":"2006","unstructured":"Ahmed Dalalah et\u00a0al. 2006. New hardware architecture for bit-counting. In WSEAS."},{"key":"e_1_3_3_22_2","volume-title":"MICRO","year":"2009","unstructured":"Reetuparna Das et\u00a0al. 2009. Application-aware prioritization mechanisms for on-chip networks. In MICRO."},{"key":"e_1_3_3_23_2","volume-title":"ICAC","year":"2011","unstructured":"Howard David et\u00a0al. 2011. Memory power management via dynamic voltage\/frequency scaling. In ICAC."},{"key":"e_1_3_3_24_2","unstructured":"Robert H. Dennard. 1968. Field-effect transistor memory. (July 1967). Patent No. 3387286A Filed July 14th. 1967; Issued June 4th. 1968."},{"key":"e_1_3_3_25_2","volume-title":"ISCA","year":"2011","unstructured":"Eiman Ebrahimi et\u00a0al. 2011. Prefetch-aware shared resource management for multi-core systems. In ISCA."},{"key":"e_1_3_3_26_2","volume-title":"MICRO","year":"2009","unstructured":"Eiman Ebrahimi et\u00a0al. 2009. Coordinated control of multiple prefetchers in multi-core systems. In MICRO."},{"key":"e_1_3_3_27_2","volume-title":"HPCA","year":"2009","unstructured":"Eiman Ebrahimi et\u00a0al. 2009. Techniques for bandwidth-efficient prefetching of linked data structures in hybrid prefetching systems. In HPCA."},{"key":"e_1_3_3_28_2","doi-asserted-by":"crossref","unstructured":"Stijn Eyerman et\u00a0al. 2008. System-level performance metrics for multiprogram workloads. IEEE Micro 28 3 (2008) 42\u201353.","DOI":"10.1109\/MM.2008.44"},{"key":"e_1_3_3_29_2","volume-title":"MOCAST","year":"2018","unstructured":"Luca Frontini et\u00a0al. 2018. A very compact population count circuit for associative memories. In MOCAST."},{"key":"e_1_3_3_30_2","volume-title":"ISCA","year":"2019","unstructured":"Elba Garza et\u00a0al. 2019. Bit-level perceptron prediction for indirect branches. In ISCA."},{"key":"e_1_3_3_31_2","volume-title":"SIGMETRICS","year":"2019","unstructured":"Saugata Ghose et\u00a0al. 2019. Demystifying complex workload-DRAM interactions: An experimental study. In SIGMETRICS."},{"key":"e_1_3_3_32_2","volume-title":"MICRO","year":"2016","unstructured":"Heonjae Ha et\u00a0al. 2016. Improving energy efficiency of DRAM by exploiting half page row access. In MICRO."},{"key":"e_1_3_3_33_2","unstructured":"Greg Hamerly et\u00a0al. 2005. SimPoint 3.0: Faster and more flexible program phase analysis. Journal of Instruction Level Parallelism 7 (2005) 1\u201328."},{"key":"e_1_3_3_34_2","doi-asserted-by":"crossref","unstructured":"Per Hammarlund et\u00a0al. 2014. Haswell: The fourth-generation Intel core processor. IEEE Micro 34 2 (2014) 6\u201320.","DOI":"10.1109\/MM.2014.10"},{"key":"e_1_3_3_35_2","volume-title":"ISCA","year":"2016","unstructured":"Milad Hashemi et\u00a0al. 2016. Accelerating dependent cache misses with an enhanced memory controller. In ISCA."},{"key":"e_1_3_3_36_2","volume-title":"ISCA","year":"2019","unstructured":"Hasan Hassan et\u00a0al. 2019. CROW: A low-cost substrate for improving DRAM performance, energy efficiency, and reliability. In ISCA."},{"key":"e_1_3_3_37_2","volume-title":"HPCA","year":"2016","unstructured":"Hasan Hassan et\u00a0al. 2016. ChargeCache: Reducing DRAM latency by exploiting row access locality. In HPCA."},{"key":"e_1_3_3_38_2","volume-title":"ISCA","year":"1984","unstructured":"Mark D. Hill et\u00a0al. 1984. Experimental evaluation of on-chip microprocessor cache memories. In ISCA."},{"key":"e_1_3_3_39_2","volume-title":"ICS","year":"2004","unstructured":"Sorin Iacobovici et\u00a0al. 2004. Effective stream-based and execution-based data prefetching. In ICS."},{"key":"e_1_3_3_40_2","volume-title":"HPCA","year":"1999","unstructured":"K. Inoue et\u00a0al. 1999. Dynamically variable line-size cache exploiting high on-chip memory bandwidth of merged DRAM\/logic LSIs. In HPCA."},{"key":"e_1_3_3_41_2","unstructured":"Intel. 2022. Intel Alder Lake Events. Retrieved June 17 2024 from https:\/\/perfmon-events.intel.com\/"},{"key":"e_1_3_3_42_2","unstructured":"Intel. 2022. Intel Performance Counter Monitor\u2014A Better Way to Measure CPU Utilization. Retrieved June 17 2024 from https:\/\/intel.ly\/3xLo80Y"},{"key":"e_1_3_3_43_2","volume-title":"ISCA","year":"2008","unstructured":"E. Ipek et\u00a0al. 2008. Self-optimizing memory controllers: A reinforcement learning approach. In ISCA."},{"key":"e_1_3_3_44_2","volume-title":"VLSI Memory Chip Design","year":"2013","unstructured":"Kiyoo Itoh. 2013. VLSI Memory Chip Design. Springer Science & Business Media."},{"key":"e_1_3_3_45_2","volume-title":"JESD79-3: DDR3 SDRAM Standard","year":"2007","unstructured":"JEDEC. 2007. JESD79-3: DDR3 SDRAM Standard. JEDEC."},{"key":"e_1_3_3_46_2","volume-title":"JESD79-4C: DDR4 SDRAM Standard","year":"2020","unstructured":"JEDEC. 2020. JESD79-4C: DDR4 SDRAM Standard. JEDEC."},{"key":"e_1_3_3_47_2","volume-title":"JESD79-5: DDR5 SDRAM Standard","year":"2020","unstructured":"JEDEC. 2020. JESD79-5: DDR5 SDRAM Standard. JEDEC."},{"key":"e_1_3_3_48_2","volume-title":"MICRO","year":"2003","unstructured":"Daniel A. Jim\u00e9nez. 2003. Fast path-based neural branch prediction. In MICRO."},{"key":"e_1_3_3_49_2","volume-title":"HPCA","year":"2001","unstructured":"Daniel A. Jim\u00e9nez et\u00a0al. 2001. Dynamic branch prediction with perceptrons. In HPCA."},{"key":"e_1_3_3_50_2","doi-asserted-by":"crossref","unstructured":"Daniel A. Jim\u00e9nez et\u00a0al. 2002. Neural methods for dynamic branch prediction. ACM Transactions on Computer Systems 20 4 (2002) 369\u2013397.","DOI":"10.1145\/571637.571639"},{"key":"e_1_3_3_51_2","volume-title":"MICRO","year":"2017","unstructured":"Daniel A. Jim\u00e9nez et\u00a0al. 2017. Multiperspective reuse prediction. In MICRO."},{"key":"e_1_3_3_52_2","volume-title":"ICCD","year":"1995","unstructured":"M. Kadiyala et\u00a0al. 1995. A dynamic cache sub-block design to reduce false sharing. In ICCD."},{"key":"e_1_3_3_53_2","volume-title":"MICRO","year":"2011","unstructured":"Dimitris Kaseridis et\u00a0al. 2011. Minimalist open-page: A DRAM page-mode scheduling policy for the many-core era. In MICRO."},{"key":"e_1_3_3_54_2","volume-title":"DRAM Circuit Design: Fundamental and High-Speed Topics","year":"2008","unstructured":"B. Keeth et\u00a0al. 2008. DRAM Circuit Design: Fundamental and High-Speed Topics (2nd ed.). Wiley-IEEE Press."},{"key":"e_1_3_3_55_2","volume-title":"HPCA","year":"2018","unstructured":"Jeremie S. Kim et\u00a0al. 2018. The DRAM latency PUF: Quickly evaluating physical unclonable functions by exploiting the latency-reliability tradeoff in modern commodity DRAM devices. In HPCA."},{"key":"e_1_3_3_56_2","volume-title":"HPCA","year":"2019","unstructured":"Jeremie S. Kim et\u00a0al. 2019. D-RaNGe: Using commodity DRAM devices to generate true random numbers with low latency and high throughput. In HPCA."},{"key":"e_1_3_3_57_2","volume-title":"ISCA","year":"2020","unstructured":"Jeremie S. Kim et\u00a0al. 2020. Revisiting RowHammer: An experimental analysis of modern DRAM devices and mitigation techniques. In ISCA."},{"key":"e_1_3_3_58_2","volume-title":"HPCA","year":"2010","unstructured":"Yoongu Kim et\u00a0al. 2010. ATLAS: A scalable and high-performance scheduling algorithm for multiple memory controllers. In HPCA."},{"key":"e_1_3_3_59_2","volume-title":"MICRO","year":"2010","unstructured":"Yoongu Kim et\u00a0al. 2010. Thread cluster memory scheduling: Exploiting differences in memory access behavior. In MICRO."},{"key":"e_1_3_3_60_2","volume-title":"ISCA","year":"2012","unstructured":"Yoongu Kim et\u00a0al. 2012. A case for exploiting subarray-level parallelism (SALP) in DRAM. In ISCA."},{"key":"e_1_3_3_61_2","doi-asserted-by":"crossref","unstructured":"Yoongu Kim et\u00a0al. 2016. Ramulator: A fast and extensible DRAM simulator. IEEE Computer Architecture Letters 15 1 (2016) 45\u201349.","DOI":"10.1109\/LCA.2015.2414456"},{"key":"e_1_3_3_62_2","volume-title":"ISSCC","year":"2012","unstructured":"Kibong Koo et\u00a0al. 2012. A 1.2V 38nm 2.4Gb\/s\/pin 2Gb DDR4 SDRAM with bank group and \u00d74 half-page architecture. In ISSCC."},{"key":"e_1_3_3_63_2","volume-title":"ISCA","year":"1998","unstructured":"Sanjeev Kumar et\u00a0al. 1998. Exploiting spatial locality in data caches using spatial footprints. In ISCA."},{"key":"e_1_3_3_64_2","volume-title":"MICRO","year":"2012","unstructured":"Snehasish Kumar et\u00a0al. 2012. Amoeba-Cache: Adaptive blocks for eliminating waste in the memory hierarchy. In MICRO."},{"key":"e_1_3_3_65_2","volume-title":"MICRO","year":"2009","unstructured":"Chang Joo Lee et\u00a0al. 2009. Improving memory bank-level parallelism in the presence of prefetching. In MICRO."},{"key":"e_1_3_3_66_2","volume-title":"SIGMETRICS","year":"2017","unstructured":"Donghyuk Lee et\u00a0al. 2017. Design-induced latency variation in modern DRAM chips: Characterization, analysis, and latency reduction mechanisms. In SIGMETRICS."},{"key":"e_1_3_3_67_2","volume-title":"HPCA","year":"2015","unstructured":"Donghyuk Lee et\u00a0al. 2015. Adaptive-latency DRAM: Optimizing DRAM timing for the common-case. In HPCA."},{"key":"e_1_3_3_68_2","volume-title":"HPCA","year":"2013","unstructured":"Donghyuk Lee et\u00a0al. 2013. Tiered-latency DRAM: A low latency and low cost DRAM architecture. In HPCA."},{"key":"e_1_3_3_69_2","volume-title":"PACT","year":"2015","unstructured":"Donghyuk Lee et\u00a0al. 2015. Decoupled direct memory access: Isolating CPU and IO traffic by leveraging a dual-data-port DRAM. In PACT."},{"key":"e_1_3_3_70_2","volume-title":"HPCA","year":"2017","unstructured":"Yebin Lee et\u00a0al. 2017. Partial row activation for low-power DRAM system. In HPCA."},{"key":"e_1_3_3_71_2","doi-asserted-by":"crossref","unstructured":"C. Lefurgy et\u00a0al. 2003. Energy management for commercial servers. Computer 36 12 (2003) 39\u201348.","DOI":"10.1109\/MC.2003.1250880"},{"key":"e_1_3_3_72_2","doi-asserted-by":"crossref","unstructured":"Sheng Li et\u00a0al. 2013. The McPAT framework for multicore and manycore architectures simultaneously modeling power area and timing. ACM Transactions on Architecture and Code Optimization 10 1 (2013) Article 5 29 pages.","DOI":"10.1145\/2445572.2445577"},{"key":"e_1_3_3_73_2","volume-title":"MICRO","year":"2017","unstructured":"S. Li et\u00a0al. 2017. DRISA: A DRAM-based reconfigurable in-situ accelerator. In MICRO."},{"key":"e_1_3_3_74_2","doi-asserted-by":"crossref","unstructured":"J. S. Liptay. 1968. Structural aspects of the System\/360 Model 85 II: The cache. IBM Systems Journal 7 1 (1968) 15\u201321.","DOI":"10.1147\/sj.71.0015"},{"key":"e_1_3_3_75_2","volume-title":"ICPADS","year":"1997","unstructured":"Kuang-Chih Liu et\u00a0al. 1997. On the effectiveness of sectored caches in reducing false sharing misses. In ICPADS."},{"key":"e_1_3_3_76_2","unstructured":"Haocong Luo et\u00a0al. 2023. Ramulator 2.0: A modern modular and extensible DRAM simulator. arXiv:2308.11030 [cs.AR] (2023)."},{"key":"e_1_3_3_77_2","volume-title":"ISCA","year":"2023","unstructured":"Haocong Luo et\u00a0al. 2023. RowPress: Amplifying read disturbance in modern DRAM chips. In ISCA."},{"key":"e_1_3_3_78_2","doi-asserted-by":"crossref","unstructured":"Jack A. Mandelman et\u00a0al. 2002. Challenges and future directions for the scaling of dynamic random-access memory (DRAM). IBM Journal of Research and Development 46 2.3 (2002) 187\u2013212.","DOI":"10.1147\/rd.462.0187"},{"key":"e_1_3_3_79_2","volume-title":"ISSCC","year":"1983","unstructured":"T. Mano et\u00a0al. 1983. Submicron VLSI memory circuits. In ISSCC."},{"key":"e_1_3_3_80_2","unstructured":"Deepak Molly Mathew et\u00a0al. 2020. Using runtime reverse engineering to optimize DRAM refresh. (April 2020). Patent No. 10622054B2 Filed Sept. 5th. 2018 Issued April 14th. 2020."},{"key":"e_1_3_3_81_2","volume-title":"MEMSYS","year":"2017","unstructured":"Deepak M. Mathew et\u00a0al. 2017. Using run-time reverse-engineering to optimize DRAM refresh. In MEMSYS."},{"key":"e_1_3_3_82_2","volume-title":"SDRAM, 4Gb: x4, x8, x16 DDR4 SDRAM Features","year":"2014","unstructured":"Micron Technology. 2014. SDRAM, 4Gb: x4, x8, x16 DDR4 SDRAM Features. Micron Technology."},{"key":"e_1_3_3_83_2","doi-asserted-by":"crossref","unstructured":"W. R. Moore. 1986. A review of fault-tolerant techniques for the enhancement of integrated circuit yield. Proceedings of the IEEE 74 5 (1986) 684\u2013698.","DOI":"10.1109\/PROC.1986.13531"},{"key":"e_1_3_3_84_2","volume-title":"USENIX Security","year":"2007","unstructured":"Thomas Moscibroda et\u00a0al. 2007. Memory performance attacks: Denial of memory service in multi-core systems. In USENIX Security."},{"key":"e_1_3_3_85_2","volume-title":"Architecture Design for Soft Errors","year":"2008","unstructured":"Shubu Mukherjee. 2008. Architecture Design for Soft Errors. Morgan Kaufmann Publishers."},{"key":"e_1_3_3_86_2","volume-title":"IMW","year":"2013","unstructured":"Onur Mutlu. 2013. Memory scaling: A systems architecture perspective. In IMW."},{"key":"e_1_3_3_87_2","volume-title":"MICRO","year":"2007","unstructured":"Onur Mutlu et\u00a0al. 2007. Stall-time fair memory access scheduling for chip multiprocessors. In MICRO."},{"key":"e_1_3_3_88_2","volume-title":"ISCA","year":"2008","unstructured":"Onur Mutlu et\u00a0al. 2008. Parallelism-aware batch scheduling: Enhancing both performance and fairness of shared DRAM systems. In ISCA."},{"key":"e_1_3_3_89_2","volume-title":"MICRO","year":"2006","unstructured":"Kyle J. Nesbit et\u00a0al. 2006. Fair queuing memory systems. In MICRO."},{"key":"e_1_3_3_90_2","volume-title":"ISCA","year":"2021","unstructured":"Ataberk Olgun et\u00a0al. 2021. QUAC-TRNG: High-throughput true random number generation using quadruple row activation in commodity DRAMs. In ISCA."},{"key":"e_1_3_3_91_2","volume-title":"HPCA","year":"2024","unstructured":"Geraldo F. Oliveira et\u00a0al. 2024. MIMDRAM: An end-to-end processing-using-DRAM system for high-throughput, energy-efficient and programmer-transparent multiple-instruction multiple-data computing. In HPCA."},{"key":"e_1_3_3_92_2","doi-asserted-by":"crossref","unstructured":"Geraldo F. Oliveira et\u00a0al. 2021. DAMOV: A new methodology and benchmark suite for evaluating data movement bottlenecks. IEEE Access 9 (2021) 1\u201346.","DOI":"10.1109\/ACCESS.2021.3110993"},{"key":"e_1_3_3_93_2","volume-title":"MICRO","year":"2017","unstructured":"Mike O\u2019Connor et\u00a0al. 2017. Fine-grained DRAM: Energy-efficient DRAM for extreme bandwidth systems. In MICRO."},{"key":"e_1_3_3_94_2","volume-title":"Enabling Effective Error Mitigation in Memory Chips That Use On-Die Error-Correcting Codes","year":"2022","unstructured":"Minesh Patel. 2022. Enabling Effective Error Mitigation in Memory Chips That Use On-Die Error-Correcting Codes. Ph. D. Dissertation. ETH Z\u00fcrich."},{"key":"e_1_3_3_95_2","volume-title":"DSN","year":"2019","unstructured":"Minesh Patel et\u00a0al. 2019. Understanding and modeling on-die error correction in modern DRAM: An experimental study using real devices. In DSN."},{"key":"e_1_3_3_96_2","volume-title":"MICRO","year":"2020","unstructured":"Minesh Patel et\u00a0al. 2020. Bit-exact ECC recovery (BEER): Determining DRAM on-die ECC functions by exploiting DRAM data retention characteristics. In MICRO."},{"key":"e_1_3_3_97_2","volume-title":"ISCA","year":"2015","unstructured":"Indrani Paul et\u00a0al. 2015. Harmonia: Balancing compute and memory power in high-performance GPUs. In ISCA."},{"key":"e_1_3_3_98_2","volume-title":"HPCA","year":"2006","unstructured":"Prateek Pujara et\u00a0al. 2006. Increasing the cache efficiency by eliminating noise. In HPCA."},{"key":"e_1_3_3_99_2","volume-title":"HPCA","year":"2007","unstructured":"Moinuddin K. Qureshi et\u00a0al. 2007. Line distillation: Increasing cache capacity by filtering unused words in cache lines. In HPCA."},{"key":"e_1_3_3_100_2","unstructured":"Rambus. 2014. DRAM Power Model. Retrieved June 17 2024 from https:\/\/www.rambus.com\/energy\/"},{"key":"e_1_3_3_101_2","unstructured":"Rambus. 2017. TN-40-07: Calculating Memory Power for DDR4 SDRAM. Retrieved June 17 2024 from https:\/\/www.micron.com\/-\/media\/client\/global\/documents\/products\/technical-note\/dram\/tn4007_ddr4_power_calculation.pdf"},{"key":"e_1_3_3_102_2","volume-title":"MASCOTS","year":"2000","unstructured":"Jeffrey B. Rothman et\u00a0al. 2000. Sector cache design and performance. In MASCOTS."},{"key":"e_1_3_3_103_2","volume-title":"ICS","year":"1999","unstructured":"Jeffrey B. Rothman et\u00a0al. 1999. The pool of subsectors cache design. In ICS."},{"key":"e_1_3_3_104_2","volume-title":"ISHPC","year":"2002","unstructured":"Jeffrey B. Rothman et\u00a0al. 2002. Minerva: An adaptive subblock coherence protocol for improved SMP performance. In ISHPC."},{"key":"e_1_3_3_105_2","unstructured":"SAFARI Research Group. 2023. Ramulator\u2014GitHub Page. Retrieved June 17 2024 from https:\/\/github.com\/CMU-SAFARI\/ramulator"},{"key":"e_1_3_3_106_2","unstructured":"SAFARI Research Group. 2023. Ramulator 2.0\u2014GitHub Repository. Retrieved June 17 2024 from https:\/\/github.com\/CMU-SAFARI\/ramulator2"},{"key":"e_1_3_3_107_2","unstructured":"SAFARI Research Group. 2024. DAMOV Benchmark Suite and Simulation Framework. Retrieved June 17 2024 from https:\/\/github.com\/CMU-SAFARI\/DAMOV"},{"key":"e_1_3_3_108_2","unstructured":"SAFARI Research Group. 2024. Sectored DRAM\u2014GitHub Page. Retrieved June 17 2024 from https:\/\/github.com\/CMU-SAFARI\/Sectored-DRAM"},{"key":"e_1_3_3_109_2","volume-title":"MICRO","year":"2013","unstructured":"Vivek Seshadri et\u00a0al. 2013. RowClone: Fast and energy-efficient in-DRAM bulk data copy and initialization. In MICRO."},{"key":"e_1_3_3_110_2","volume-title":"MICRO","year":"2017","unstructured":"Vivek Seshadri et\u00a0al. 2017. Ambit: In-memory accelerator for bulk bitwise operations using commodity DRAM technology. In MICRO."},{"key":"e_1_3_3_111_2","unstructured":"Vivek Seshadri et\u00a0al. 2019. In-DRAM bulk bitwise execution engine. arXiv:1905.09822 [cs.AR] (2019)."},{"key":"e_1_3_3_112_2","volume-title":"ISCA","year":"1994","unstructured":"A. Seznec. 1994. Decoupled sectored caches: Conciliating low tag implementation cost. In ISCA."},{"key":"e_1_3_3_113_2","doi-asserted-by":"crossref","unstructured":"Alan Jay Smith. 1987. Line (block) size choice for CPU cache memories. IEEE Transactions on Computers C-36 9 (1987) 1063\u20131075.","DOI":"10.1109\/TC.1987.5009537"},{"key":"e_1_3_3_114_2","volume-title":"ASPLOS","year":"1987","unstructured":"J. E. Smith et\u00a0al. 1987. The ZS-1 central processor. In ASPLOS."},{"key":"e_1_3_3_115_2","doi-asserted-by":"crossref","unstructured":"J. E. Smith et\u00a0al. 1995. The microarchitecture of superscalar processors. Proceedings of the IEEE 83 12 (1995) 1609\u20131624.","DOI":"10.1109\/5.476078"},{"key":"e_1_3_3_116_2","volume-title":"ASPLOS","year":"2000","unstructured":"Allan Snavely et\u00a0al. 2000. Symbiotic jobscheduling for a simultaneous multithreaded processor. In ASPLOS."},{"key":"e_1_3_3_117_2","volume-title":"SC","year":"2014","unstructured":"Young Hoon Son et\u00a0al. 2014. Microbank: Architecting through-silicon interposer-based main memory systems. In SC."},{"key":"e_1_3_3_118_2","volume-title":"HPCA","year":"2007","unstructured":"Santhosh Srinath et\u00a0al. 2007. Feedback directed prefetching: Improving the performance and bandwidth-efficiency of hardware prefetchers. In HPCA."},{"key":"e_1_3_3_119_2","unstructured":"Standard Performance Evaluation Corp. 2006. SPEC CPU\u00ae 2006. Retrieved June 17 2024 from http:\/\/www.spec.org\/cpu2006"},{"key":"e_1_3_3_120_2","unstructured":"Standard Performance Evaluation Corp. 2017. SPEC CPU\u00ae 2017. Retrieved June 17 2024 from http:\/\/www.spec.org\/cpu2017"},{"key":"e_1_3_3_121_2","doi-asserted-by":"crossref","unstructured":"Lavanya Subramanian et\u00a0al. 2016. BLISS: Balancing performance fairness and complexity in memory access scheduling. IEEE Transactions on Parallel and Distributed Systems 27 10 (2016) 3071\u20133087.","DOI":"10.1109\/TPDS.2016.2526003"},{"key":"e_1_3_3_122_2","volume-title":"ISCA","year":"2010","unstructured":"Kshitij Sudan et\u00a0al. 2010. Micro-pages: Increasing DRAM efficiency with locality-aware data placement. In ISCA."},{"key":"e_1_3_3_123_2","doi-asserted-by":"crossref","unstructured":"J. M. Tendler et\u00a0al. 2002. POWER4 system microarchitecture. IBM Journal of Research and Development 46 1 (2002) 5\u201325.","DOI":"10.1147\/rd.461.0005"},{"key":"e_1_3_3_124_2","volume-title":"MICRO","year":"2016","unstructured":"Elvira Teran et\u00a0al. 2016. Perceptron learning for reuse prediction. In MICRO."},{"key":"e_1_3_3_125_2","volume-title":"ISCA","year":"2010","unstructured":"Aniruddha N. Udipi et\u00a0al. 2010. Rethinking DRAM design and organization for energy-constrained multi-cores. In ISCA."},{"key":"e_1_3_3_126_2","volume-title":"ISCA","year":"2010","unstructured":"Thomas Vogelsang. 2010. Understanding the energy consumption of dynamic random access memories. In ISCA."},{"key":"e_1_3_3_127_2","volume-title":"ICCD","year":"2006","unstructured":"Frederick A. Ware et\u00a0al. 2006. Improving power and data efficiency with threaded memory modules. In ICCD."},{"key":"e_1_3_3_128_2","volume-title":"HPCA","year":"2010","unstructured":"Malcolm Ware et\u00a0al. 2010. Architecting for power management: The IBM\u00ae POWER7\u2122 approach. In HPCA."},{"key":"e_1_3_3_129_2","volume-title":"MICRO","year":"2022","unstructured":"A. Giray Yaglikci et\u00a0al. 2022. HiRA: Hidden row activation for reducing refresh latency of off-the-shelf DRAM chips. In MICRO."},{"key":"e_1_3_3_130_2","doi-asserted-by":"crossref","unstructured":"K. C. Yeager. 1996. The Mips R10000 superscalar microprocessor. IEEE Micro 16 2 (1996) 28\u201341.","DOI":"10.1109\/40.491460"},{"key":"e_1_3_3_131_2","doi-asserted-by":"crossref","unstructured":"Ravikiran Yeleswarapu et\u00a0al. 2020. Addressing multiple bit\/symbol errors in DRAM subsystem. arXiv:1908.01806 (2020).","DOI":"10.7717\/peerj-cs.359"},{"key":"e_1_3_3_132_2","volume-title":"ISCA","year":"2011","unstructured":"Doe Hyun Yoon et\u00a0al. 2011. Adaptive granularity memory systems: A tradeoff between storage efficiency and throughput. In ISCA."},{"key":"e_1_3_3_133_2","volume-title":"ISCA","year":"2012","unstructured":"Doe Hyun Yoon et\u00a0al. 2012. The dynamic granularity memory system. In ISCA."},{"key":"e_1_3_3_134_2","volume-title":"MICRO","year":"2009","unstructured":"George L. Yuan et\u00a0al. 2009. Complexity effective memory access scheduling for many-core accelerator architectures. In MICRO."},{"key":"e_1_3_3_135_2","volume-title":"HPCA","year":"2024","unstructured":"Ismail Emir Yuksel et\u00a0al. 2024. Functionally-complete Boolean logic in real DRAM chips: Experimental characterization and analysis. In HPCA."},{"key":"e_1_3_3_136_2","volume-title":"ISLPED","year":"2017","unstructured":"Chao Zhang and Xiaochen Guo. 2017. Enabling efficient fine-grained DRAM activations with interleaved I\/O. In ISLPED."},{"key":"e_1_3_3_137_2","volume-title":"ISCA","year":"2014","unstructured":"Tao Zhang et\u00a0al. 2014. Half-DRAM: A high-bandwidth and low-power DRAM architecture from the rethinking of fine-grained activation. In ISCA."},{"key":"e_1_3_3_138_2","volume-title":"MICRO","year":"2008","unstructured":"Hongzhong Zheng et\u00a0al. 2008. Mini-rank: Adaptive DRAM architecture for improving memory power efficiency. In MICRO."}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3673653","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3673653","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T00:06:07Z","timestamp":1750291567000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3673653"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,9,14]]},"references-count":137,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2024,9,30]]}},"alternative-id":["10.1145\/3673653"],"URL":"https:\/\/doi.org\/10.1145\/3673653","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"value":"1544-3566","type":"print"},{"value":"1544-3973","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,9,14]]},"assertion":[{"value":"2023-06-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-02-06","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-09-14","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}