{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:23:38Z","timestamp":1750220618671,"version":"3.41.0"},"reference-count":118,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2020,12,30]],"date-time":"2020-12-30T00:00:00Z","timestamp":1609286400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100002418","name":"Intel Corporation","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100002418","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100006785","name":"Google","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100006785","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000028","name":"Semiconductor Research Corporation","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000028","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100016682","name":"VMware","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100016682","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Microsoft"},{"name":"Alibaba"},{"name":"Facebook"},{"DOI":"10.13039\/501100003816","name":"Huawei Technologies","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100003816","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2021,3,31]]},"abstract":"<jats:p>\n            To employ a Convolutional Neural Network (CNN) in an energy-constrained embedded system, it is critical for the CNN implementation to be highly energy efficient. 
Many recent studies propose CNN accelerator architectures with custom computation units that try to improve the energy efficiency and performance of CNNs by minimizing data transfers from DRAM-based main memory. However, in these architectures, DRAM is still responsible for half of the overall energy consumption of the system, on average. A key factor of the high energy consumption of DRAM is the\n            <jats:italic>refresh overhead<\/jats:italic>\n            , which is estimated to consume 40% of the total DRAM energy.\n          <\/jats:p>\n          <jats:p>\n            In this article, we propose a new mechanism,\n            <jats:italic>Refresh Triggered Computation (RTC)<\/jats:italic>\n            , that exploits the memory access patterns of CNN applications to reduce the number of\n            <jats:italic>refresh operations<\/jats:italic>\n            . RTC uses two major techniques to mitigate the refresh overhead. First,\n            <jats:italic>Refresh Triggered Transfer (RTT)<\/jats:italic>\n            is based on our\n            <jats:italic>new<\/jats:italic>\n            observation that a CNN application accesses a large portion of the DRAM in a predictable and recurring manner. Thus, the read\/write accesses of the application inherently refresh the DRAM, and therefore a significant fraction of refresh operations can be skipped. Second,\n            <jats:italic>Partial Array Auto-Refresh (PAAR)<\/jats:italic>\n            eliminates the refresh operations to DRAM regions that do not store any data.\n          <\/jats:p>\n          <jats:p>\n            We propose three RTC designs (min-RTC, mid-RTC, and full-RTC), each of which requires a different level of aggressiveness in terms of customization to the DRAM subsystem. All of our designs have small overhead. 
Even the most aggressive RTC design (i.e., full-RTC) imposes an area overhead of only 0.18% in a 16\n            <jats:italic>Gb<\/jats:italic>\n            DRAM chip and can have less overhead for denser chips. Our experimental evaluation on six well-known CNNs shows that RTC reduces average DRAM energy consumption by 24.4% and 61.3% for the least aggressive and the most aggressive RTC implementations, respectively. Besides CNNs, we also evaluate our RTC mechanism on three workloads from other domains. We show that RTC saves 31.9% and 16.9% DRAM energy for\n            <jats:italic>Face Recognition<\/jats:italic>\n            and\n            <jats:italic>Bayesian Confidence Propagation Neural Network (BCPNN)<\/jats:italic>\n            , respectively. We believe RTC can be applied to other applications whose memory access patterns remain predictable for a sufficiently long time.\n          <\/jats:p>","DOI":"10.1145\/3417708","type":"journal-article","created":{"date-parts":[[2020,12,30]],"date-time":"2020-12-30T12:30:51Z","timestamp":1609331451000},"page":"1-29","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["Refresh Triggered Computation"],"prefix":"10.1145","volume":"18","author":[{"given":"Syed M. A. 
H.","family":"Jafri","sequence":"first","affiliation":[{"name":"KTH Royal Institute of Technology, Kista, Sweden"}]},{"given":"Hasan","family":"Hassan","sequence":"additional","affiliation":[{"name":"ETH Z\u00fcrich, Z\u00fcrich, Switzerland"}]},{"given":"Ahmed","family":"Hemani","sequence":"additional","affiliation":[{"name":"KTH Royal Institute of Technology, Kista, Sweden"}]},{"given":"Onur","family":"Mutlu","sequence":"additional","affiliation":[{"name":"ETH Z\u00fcrich, Z\u00fcrich, Switzerland"}]}],"member":"320","published-online":{"date-parts":[[2020,12,30]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2013.6522336"},{"key":"e_1_2_1_2_1","unstructured":"AMBA Specification. 1999. Rev. 2.0. ARM. Retrieved from http:\/\/www.arm.com.  AMBA Specification. 1999. Rev. 2.0. ARM. Retrieved from http:\/\/www.arm.com."},{"key":"e_1_2_1_3_1","doi-asserted-by":"crossref","unstructured":"S. Baek S. Cho and R. Melhem. 2013. Refresh now and then. IEEE Transactions on Computers (TC'13) 63 12 (2013) 3114--3126.  S. Baek S. Cho and R. Melhem. 2013. Refresh now and then. IEEE Transactions on Computers (TC'13) 63 12 (2013) 3114--3126.","DOI":"10.1109\/TC.2013.164"},{"volume-title":"Proceedings of the 42nd Annual International Symposium on Computer Architecture (ISCA'15)","author":"Bhati I.","key":"e_1_2_1_4_1","unstructured":"I. Bhati , Z. Chishti , Shih-Lien Lu , and B. Jacob . 2015. Flexible auto-refresh: Enabling scalable and energy-efficient DRAM refresh reductions . In Proceedings of the 42nd Annual International Symposium on Computer Architecture (ISCA'15) . 235--246. I. Bhati, Z. Chishti, Shih-Lien Lu, and B. Jacob. 2015. Flexible auto-refresh: Enabling scalable and energy-efficient DRAM refresh reductions. In Proceedings of the 42nd Annual International Symposium on Computer Architecture (ISCA'15). 235--246."},{"key":"e_1_2_1_5_1","doi-asserted-by":"crossref","unstructured":"J. M. Chabloz and A. Hemani. 2014. 
Low-latency maximal-throughput communication interfaces for rationally related clock domains. IEEE Transactions on Very Large Scale Integration (VLSI) Systems (TVLSI'13) 22 3 (2013) 641--654.  J. M. Chabloz and A. Hemani. 2014. Low-latency maximal-throughput communication interfaces for rationally related clock domains. IEEE Transactions on Very Large Scale Integration (VLSI) Systems (TVLSI'13) 22 3 (2013) 641--654.","DOI":"10.1109\/TVLSI.2013.2252030"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/2896377.2901453"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2016.7446095"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/3078505.3078590"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2014.6835946"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/2541940.2541967"},{"volume-title":"2016 ACM\/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA'16)","author":"Chen Y.","key":"e_1_2_1_11_1","unstructured":"Y. Chen , J. Emer , and V. Sze . 2016. Eyeriss: A spatial architecture for energy-efficient dataflow for convolutional neural networks . In 2016 ACM\/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA'16) . 367--379. Y. Chen, J. Emer, and V. Sze. 2016. Eyeriss: A spatial architecture for energy-efficient dataflow for convolutional neural networks. In 2016 ACM\/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA'16). 367--379."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.micpro.2019.102942"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/DSN.2015.33"},{"key":"e_1_2_1_14_1","unstructured":"Cobham. 2017. GRLIB IP Library User\u2019s Manual. Retrieved from http:\/\/www.gaisler.com\/products\/grlib\/grlib.pdf.  Cobham. 2017. GRLIB IP Library User\u2019s Manual. 
Retrieved from http:\/\/www.gaisler.com\/products\/grlib\/grlib.pdf."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/2597652.2597663"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/DAC.2018.8465769"},{"volume-title":"Proceedings of the 42nd Annual International Symposium on Computer Architecture (ISCA'15)","author":"Du Z.","key":"e_1_2_1_17_1","unstructured":"Z. Du , R. Fasthuber , T. Chen , P. Ienne , L. Li , T. Luo , X. Feng , Y. Chen , and O. Temam . 2015. ShiDianNao: Shifting vision processing closer to the sensor . In Proceedings of the 42nd Annual International Symposium on Computer Architecture (ISCA'15) . 92--104. Z. Du, R. Fasthuber, T. Chen, P. Ienne, L. Li, T. Luo, X. Feng, Y. Chen, and O. Temam. 2015. ShiDianNao: Shifting vision processing closer to the sensor. In Proceedings of the 42nd Annual International Symposium on Computer Architecture (ISCA'15). 92--104."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2008.93"},{"volume-title":"2012 45th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO'12)","author":"Esmaeilzadeh H.","key":"e_1_2_1_19_1","unstructured":"H. Esmaeilzadeh , A. Sampson , L. Ceze , and D. Burger . 2012. Neural acceleration for general-purpose approximate programs . In 2012 45th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO'12) . IEEE, 449--460. H. Esmaeilzadeh, A. Sampson, L. Ceze, and D. Burger. 2012. Neural acceleration for general-purpose approximate programs. In 2012 45th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO'12). IEEE, 449--460."},{"volume-title":"2014 19th Asia and South Pacific Design Automation Conference (ASP-DAC'14)","author":"Farahini N.","key":"e_1_2_1_20_1","unstructured":"N. Farahini , A. Hemani , A. Lansner , F. Clermidy , and C. Svensson . 2014. A scalable custom simulation machine for the bayesian confidence propagation neural network model of the brain . 
In 2014 19th Asia and South Pacific Design Automation Conference (ASP-DAC'14) . IEEE, 578--585. N. Farahini, A. Hemani, A. Lansner, F. Clermidy, and C. Svensson. 2014. A scalable custom simulation machine for the bayesian confidence propagation neural network model of the brain. In 2014 19th Asia and South Pacific Design Automation Conference (ASP-DAC'14). IEEE, 578--585."},{"key":"e_1_2_1_21_1","volume-title":"Muhammad Adeel Tajammul, and Kolin Paul","author":"Farahini Nasim","year":"2014","unstructured":"Nasim Farahini , Ahmed Hemani , Hassan Sohofi , Syed M. A. H. Jafri , Muhammad Adeel Tajammul, and Kolin Paul . 2014 . Parallel distributed scalable runtime address generation scheme for a coarse grain reconfigurable computation and storage fabric. Microprocessors and Microsystems ( 2014). Nasim Farahini, Ahmed Hemani, Hassan Sohofi, Syed M. A. H. Jafri, Muhammad Adeel Tajammul, and Kolin Paul. 2014. Parallel distributed scalable runtime address generation scheme for a coarse grain reconfigurable computation and storage fabric. Microprocessors and Microsystems (2014)."},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/3309697.3331482"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/3219617.3219661"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2007.13"},{"key":"e_1_2_1_25_1","unstructured":"Ian Goodfellow Jean Pouget-Abadie Mehdi Mirza Bing Xu David Warde-Farley Sherjil Ozair Aaron Courville and Yoshua Bengio. 2014. Generative adversarial nets. In Advances in Neural Information Processing Systems (NIPS'14). 2672--2680.  Ian Goodfellow Jean Pouget-Abadie Mehdi Mirza Bing Xu David Warde-Farley Sherjil Ozair Aaron Courville and Yoshua Bengio. 2014. Generative adversarial nets. In Advances in Neural Information Processing Systems (NIPS'14). 
2672--2680."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/3307650.3322231"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2016.7446096"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2017.62"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/3130218.3132339"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0007767"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/TELSKS.2011.6143177"},{"key":"e_1_2_1_33_1","volume-title":"TMS320C55x DSP mnemonic instruction set reference guide. Literature Number: SPRU374G October","author":"Instruments Texas","year":"2002","unstructured":"Texas Instruments . 2002. TMS320C55x DSP mnemonic instruction set reference guide. Literature Number: SPRU374G October ( 2002 ). Texas Instruments. 2002. TMS320C55x DSP mnemonic instruction set reference guide. Literature Number: SPRU374G October (2002)."},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2008.21"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/1669112.1669156"},{"volume-title":"VLSI Memory Chip Design","author":"Itoh Kiyoo","key":"e_1_2_1_36_1","unstructured":"Kiyoo Itoh . 2013. VLSI Memory Chip Design . Vol. 5 . Springer Science & Business Media. Kiyoo Itoh. 2013. VLSI Memory Chip Design. Vol. 5. Springer Science & Business Media."},{"key":"e_1_2_1_37_1","unstructured":"ITRS. 2011. International Technology Roadmap for Semiconductors 2011 Edition: Executive Summary. Retrieved from http:\/\/www.itrs.net\/Links\/2011ITRS\/2011Chapters\/2011ExecSum.pdf.  ITRS. 2011. International Technology Roadmap for Semiconductors 2011 Edition: Executive Summary. Retrieved from http:\/\/www.itrs.net\/Links\/2011ITRS\/2011Chapters\/2011ExecSum.pdf."},{"key":"e_1_2_1_38_1","volume-title":"Memory Systems: Cache, DRAM, Disk. 
Morgan Kaufmann.","author":"Jacob Bruce","year":"2010","unstructured":"Bruce Jacob , Spencer Ng , and David Wang . 2010 . Memory Systems: Cache, DRAM, Disk. Morgan Kaufmann. Bruce Jacob, Spencer Ng, and David Wang. 2010. Memory Systems: Cache, DRAM, Disk. Morgan Kaufmann."},{"key":"e_1_2_1_39_1","unstructured":"Syed M. A. H. Jafri Hasan Hassan Ahmed Hemani and Onur Mutlu. 2019. Refresh triggered computation: Improving the energy efficiency of convolutional neural network accelerators. In arXiv preprint arXiv:1910.06672.  Syed M. A. H. Jafri Hasan Hassan Ahmed Hemani and Onur Mutlu. 2019. Refresh triggered computation: Improving the energy efficiency of convolutional neural network accelerators. In arXiv preprint arXiv:1910.06672."},{"volume-title":"2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS'17)","author":"Jafri Syed M. A. H.","key":"e_1_2_1_40_1","unstructured":"Syed M. A. H. Jafri , A. Hemani , K. Paul , and N. Abbas . 2017. MOCHA: Morphable locality and compression aware architecture for convolutional neural networks . In 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS'17) . IEEE, 276--286. Syed M. A. H. Jafri, A. Hemani, K. Paul, and N. Abbas. 2017. MOCHA: Morphable locality and compression aware architecture for convolutional neural networks. In 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS'17). IEEE, 276--286."},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISQED.2013.6523597"},{"key":"e_1_2_1_42_1","unstructured":"JEDEC. 2007. DDR3 SDRAM standard. JESD79-3.  JEDEC. 2007. DDR3 SDRAM standard. JESD79-3."},{"key":"e_1_2_1_43_1","unstructured":"JEDEC. 2014. Low Power Double Data Rate 4 (LPDDR4). Standard No. JESD209-4.  JEDEC. 2014. Low Power Double Data Rate 4 (LPDDR4). Standard No. 
JESD209-4."},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2012.6168944"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/2818950.2818964"},{"key":"e_1_2_1_46_1","unstructured":"U. Kang Hak-Soo Yu Churoo Park Hongzhong Zheng John Halbert Kuljit Bains SeongJin Jang and Joosun Choi. 2014. Co-architecting controllers and DRAM to enhance DRAM process scaling. In The Memory Forum.  U. Kang Hak-Soo Yu Churoo Park Hongzhong Zheng John Halbert Kuljit Bains SeongJin Jang and Joosun Choi. 2014. Co-architecting controllers and DRAM to enhance DRAM process scaling. In The Memory Forum."},{"volume-title":"DRAM Circuit Design: Fundamental and High-Speed Topics","author":"Keeth Brent","key":"e_1_2_1_47_1","unstructured":"Brent Keeth , R. Jacob Baker , Brian Johnson , and Feng Lin . 2007. DRAM Circuit Design: Fundamental and High-Speed Topics . John Wiley & Sons. Brent Keeth, R. Jacob Baker, Brian Johnson, and Feng Lin. 2007. DRAM Circuit Design: Fundamental and High-Speed Topics. John Wiley & Sons."},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/2591971.2592000"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/DSN.2016.30"},{"key":"e_1_2_1_50_1","volume-title":"A case for memory content-based detection and mitigation of data-dependent failures in DRAM","author":"Khan Samira","year":"2016","unstructured":"Samira Khan , Chris Wilkerson , Donghyuk Lee , Alaa R. Alameldeen , and Onur Mutlu . 2016. A case for memory content-based detection and mitigation of data-dependent failures in DRAM . IEEE Computer Architecture Letters (CAL '16) 16, 2 ( 2016 ), 88--93. Samira Khan, Chris Wilkerson, Donghyuk Lee, Alaa R. Alameldeen, and Onur Mutlu. 2016. A case for memory content-based detection and mitigation of data-dependent failures in DRAM. 
IEEE Computer Architecture Letters (CAL'16) 16, 2 (2016), 88--93."},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/3123939.3123945"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1109\/MWSCAS48704.2020.9184512"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCD.2018.00051"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2018.00026"},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2019.00011"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA45697.2020.00059"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA47549.2020.00060"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2014.6853210"},{"volume-title":"2012 39th Annual International Symposium on Computer Architecture (ISCA'12)","author":"Kim Y.","key":"e_1_2_1_59_1","unstructured":"Y. Kim , V. Seshadri , D. Lee , J. Liu , and O. Mutlu . 2012. A case for exploiting subarray-level parallelism (SALP) in DRAM . In 2012 39th Annual International Symposium on Computer Architecture (ISCA'12) . IEEE, 368--379. Y. Kim, V. Seshadri, D. Lee, J. Liu, and O. Mutlu. 2012. A case for exploiting subarray-level parallelism (SALP) in DRAM. In 2012 39th Annual International Symposium on Computer Architecture (ISCA'12). IEEE, 368--379."},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1109\/34.41390"},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1145\/3352460.3358280"},{"volume-title":"Proceedings of the 25th International Conference on Neural Information Processing Systems (NIPS'12)","author":"Krizhevsky Alex","key":"e_1_2_1_62_1","unstructured":"Alex Krizhevsky , Ilya Sutskever , and Geoffrey E. Hinton . 2012. Imagenet classification with deep convolutional neural networks . In Proceedings of the 25th International Conference on Neural Information Processing Systems (NIPS'12) . 1097--1105. 
Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. Imagenet classification with deep convolutional neural networks. In Proceedings of the 25th International Conference on Neural Information Processing Systems (NIPS'12). 1097--1105."},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.435"},{"key":"e_1_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1109\/5.726791"},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1145\/2832911"},{"key":"e_1_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.1145\/3078505.3078533"},{"volume-title":"2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA'15)","author":"Lee D.","key":"e_1_2_1_67_1","unstructured":"D. Lee , Yoongu Kim , G. Pekhimenko , S. Khan , V. Seshadri , K. Chang , and O. Mutlu . 2015. Adaptive-latency DRAM: Optimizing DRAM timing for the common-case . In 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA'15) . IEEE, 489--501. D. Lee, Yoongu Kim, G. Pekhimenko, S. Khan, V. Seshadri, K. Chang, and O. Mutlu. 2015. Adaptive-latency DRAM: Optimizing DRAM timing for the common-case. In 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA'15). IEEE, 489--501."},{"key":"e_1_2_1_68_1","volume-title":"2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA'13)","author":"Lee D.","year":"2013","unstructured":"D. Lee , Yoongu Kim , Vivek Seshadri , Jamie Liu , Lavanya Subramanian , and Onur Mutlu . 2013 . Tiered-latency DRAM: A low latency and low cost DRAM architecture . In 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA'13) . IEEE, 615--626. D. Lee, Yoongu Kim, Vivek Seshadri, Jamie Liu, Lavanya Subramanian, and Onur Mutlu. 2013. Tiered-latency DRAM: A low latency and low cost DRAM architecture. In 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA'13). 
IEEE, 615--626."},{"key":"e_1_2_1_69_1","doi-asserted-by":"publisher","DOI":"10.1109\/PACT.2015.51"},{"key":"e_1_2_1_70_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCAD.1996.569409"},{"volume-title":"Embedded DSP Processor Design: Application Specific Instruction Set Processors","author":"Liu Dake","key":"e_1_2_1_71_1","unstructured":"Dake Liu . 2008. Embedded DSP Processor Design: Application Specific Instruction Set Processors . Elsevier . Dake Liu. 2008. Embedded DSP Processor Design: Application Specific Instruction Set Processors. Elsevier."},{"key":"e_1_2_1_72_1","doi-asserted-by":"publisher","DOI":"10.1145\/2366231.2337161"},{"key":"e_1_2_1_73_1","doi-asserted-by":"publisher","DOI":"10.1145\/2485922.2485928"},{"key":"e_1_2_1_74_1","doi-asserted-by":"publisher","DOI":"10.1145\/2370816.2370869"},{"volume-title":"Proceedings of the Sixteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'11)","author":"Liu Song","key":"e_1_2_1_75_1","unstructured":"Song Liu , Karthik Pattabiraman , Thomas Moscibroda , and Benjamin G. Zorn . 2011. Flikker: Saving DRAM refresh-power through critical data partitioning . In Proceedings of the Sixteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'11) . 213--224. Song Liu, Karthik Pattabiraman, Thomas Moscibroda, and Benjamin G. Zorn. 2011. Flikker: Saving DRAM refresh-power through critical data partitioning. In Proceedings of the Sixteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'11). 
213--224."},{"key":"e_1_2_1_76_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA45697.2020.00061"},{"key":"e_1_2_1_77_1","doi-asserted-by":"publisher","DOI":"10.1109\/DSN.2014.50"},{"key":"e_1_2_1_78_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2012.21"},{"key":"e_1_2_1_79_1","doi-asserted-by":"publisher","DOI":"10.1145\/1016720.1016726"},{"key":"e_1_2_1_80_1","unstructured":"Micron. 2014. Mobile LPDDR3 SDRAM. Retrieved from https:\/\/www.micron.com\/-\/media\/client\/global\/documents\/products\/data-sheet\/dram\/mobile-dram\/low-power-dram\/lpddr3\/253b_12-5x12-5_2ch_8-16gb_2c0f_mobile_lpddr3.pdf?rev=1b66d5710434460eb13dc3be8faa6d77.  Micron. 2014. Mobile LPDDR3 SDRAM. Retrieved from https:\/\/www.micron.com\/-\/media\/client\/global\/documents\/products\/data-sheet\/dram\/mobile-dram\/low-power-dram\/lpddr3\/253b_12-5x12-5_2ch_8-16gb_2c0f_mobile_lpddr3.pdf?rev=1b66d5710434460eb13dc3be8faa6d77."},{"volume-title":"Proceedings of the 40th Annual International Symposium on Computer Architecture (ISCA'13)","author":"Mukundan Janani","key":"e_1_2_1_81_1","unstructured":"Janani Mukundan , Hillery Hunter , Kyu-hyoun Kim, Jeffrey Stuecheli , and Jos\u00e9 F. Mart\u00ednez . 2013. Understanding and mitigating refresh overheads in high-density DDR4 DRAM systems . In Proceedings of the 40th Annual International Symposium on Computer Architecture (ISCA'13) . 48--59. Janani Mukundan, Hillery Hunter, Kyu-hyoun Kim, Jeffrey Stuecheli, and Jos\u00e9 F. Mart\u00ednez. 2013. Understanding and mitigating refresh overheads in high-density DDR4 DRAM systems. In Proceedings of the 40th Annual International Symposium on Computer Architecture (ISCA'13). 
48--59."},{"key":"e_1_2_1_82_1","doi-asserted-by":"publisher","DOI":"10.1145\/2155620.2155664"},{"key":"e_1_2_1_83_1","doi-asserted-by":"publisher","DOI":"10.1109\/IMW.2013.6582088"},{"key":"e_1_2_1_84_1","volume-title":"Automation and Test in Europe Conference and Exhibition (DATE'17)","author":"Mutlu Onur","year":"2017","unstructured":"Onur Mutlu . 2017 . The RowHammer problem and other issues we may face as memory becomes denser. In Design , Automation and Test in Europe Conference and Exhibition (DATE'17) . IEEE, 1116--1121. Onur Mutlu. 2017. The RowHammer problem and other issues we may face as memory becomes denser. In Design, Automation and Test in Europe Conference and Exhibition (DATE'17). IEEE, 1116--1121."},{"key":"e_1_2_1_85_1","volume-title":"RowHammer: A retrospective","author":"Mutlu Onur","year":"2019","unstructured":"Onur Mutlu and Jeremie Kim . 2019. RowHammer: A retrospective . IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD '19) ( 2019 ). Onur Mutlu and Jeremie Kim. 2019. RowHammer: A retrospective. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD'19) (2019)."},{"key":"e_1_2_1_86_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2008.7"},{"key":"e_1_2_1_87_1","volume-title":"Research problems and opportunities in memory systems. Supercomputing Frontiers and Innovations (SUPERFRI'15) 1, 3","author":"Mutlu Onur","year":"2015","unstructured":"Onur Mutlu and Lavanya Subramanian . 2015. Research problems and opportunities in memory systems. Supercomputing Frontiers and Innovations (SUPERFRI'15) 1, 3 ( 2015 ), 19--55. Onur Mutlu and Lavanya Subramanian. 2015. Research problems and opportunities in memory systems. 
Supercomputing Frontiers and Innovations (SUPERFRI'15) 1, 3 (2015), 19--55."},{"volume-title":"2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA'13)","author":"Nair Prashant","key":"e_1_2_1_88_1","unstructured":"Prashant Nair , Chia-Chen Chou , and Moinuddin K. Qureshi . 2013. A case for refresh pausing in DRAM memory systems . In 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA'13) . IEEE, 627--638. Prashant Nair, Chia-Chen Chou, and Moinuddin K. Qureshi. 2013. A case for refresh pausing in DRAM memory systems. In 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA'13). IEEE, 627--638."},{"key":"e_1_2_1_89_1","volume-title":"Qureshi","author":"Nair Prashant J.","year":"2014","unstructured":"Prashant J. Nair , Chia-Chen Chou , and Moinuddin K . Qureshi . 2014 . Refresh pausing in DRAM memory systems. ACM Transactions on Architecture and Code Optimization (TACO' 14) 11, 1 (2014), 1--26. Prashant J. Nair, Chia-Chen Chou, and Moinuddin K. Qureshi. 2014. Refresh pausing in DRAM memory systems. ACM Transactions on Architecture and Code Optimization (TACO'14) 11, 1 (2014), 1--26."},{"volume-title":"Proceedings of the 40th Annual International Symposium on Computer Architecture (ISCA'13)","author":"Nair Prashant J.","key":"e_1_2_1_90_1","unstructured":"Prashant J. Nair , Dae-Hyun Kim , and Moinuddin K. Qureshi . 2013. ArchShield: Architectural framework for assisting DRAM scaling by tolerating high error rates . In Proceedings of the 40th Annual International Symposium on Computer Architecture (ISCA'13) . 72--83. Prashant J. Nair, Dae-Hyun Kim, and Moinuddin K. Qureshi. 2013. ArchShield: Architectural framework for assisting DRAM scaling by tolerating high error rates. In Proceedings of the 40th Annual International Symposium on Computer Architecture (ISCA'13). 
72--83."},{"volume-title":"2016 ACM\/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA'16)","author":"Nair Prashant J.","key":"e_1_2_1_91_1","unstructured":"Prashant J. Nair , Vilas Sridharan , and Moinuddin K. Qureshi . 2016. XED: Exposing on-die error detection information for strong memory reliability . In 2016 ACM\/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA'16) . IEEE, 341--353. Prashant J. Nair, Vilas Sridharan, and Moinuddin K. Qureshi. 2016. XED: Exposing on-die error detection information for strong memory reliability. In 2016 ACM\/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA'16). IEEE, 341--353."},{"key":"e_1_2_1_92_1","doi-asserted-by":"publisher","DOI":"10.1109\/DSN.2019.00017"},{"volume-title":"Proceedings of the 44th Annual International Symposium on Computer Architecture (ISCA'17)","author":"Patel M.","key":"e_1_2_1_93_1","unstructured":"M. Patel , J. S. Kim , and O. Mutlu . 2017. The reach profiler (REAPER): Enabling the mitigation of DRAM retention failures via profiling at aggressive conditions . In Proceedings of the 44th Annual International Symposium on Computer Architecture (ISCA'17) . 255--268. M. Patel, J. S. Kim, and O. Mutlu. 2017. The reach profiler (REAPER): Enabling the mitigation of DRAM retention failures via profiling at aggressive conditions. In Proceedings of the 44th Annual International Symposium on Computer Architecture (ISCA'17). 255--268."},{"key":"e_1_2_1_94_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO50266.2020.00034"},{"key":"e_1_2_1_95_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2016.58"},{"volume-title":"Proceedings of the 40th Annual International Symposium on Computer Architecture (ISCA'13)","author":"Qadeer Wajahat","key":"e_1_2_1_96_1","unstructured":"Wajahat Qadeer , Rehan Hameed , Ofer Shacham , Preethi Venkatesan , Christos Kozyrakis , and Mark A. Horowitz . 2013. 
Convolution engine: Balancing efficiency & flexibility in specialized computing . In Proceedings of the 40th Annual International Symposium on Computer Architecture (ISCA'13) . 24--35. Wajahat Qadeer, Rehan Hameed, Ofer Shacham, Preethi Venkatesan, Christos Kozyrakis, and Mark A. Horowitz. 2013. Convolution engine: Balancing efficiency & flexibility in specialized computing. In Proceedings of the 40th Annual International Symposium on Computer Architecture (ISCA'13). 24--35."},{"volume-title":"2015 45th Annual IEEE\/IFIP International Conference on Dependable Systems and Networks (DSN'15)","author":"Qureshi M. K.","key":"e_1_2_1_97_1","unstructured":"M. K. Qureshi , Dae-Hyun Kim , S. Khan , P. J. Nair , and O. Mutlu . 2015. AVATAR: A variable-retention-time (VRT) aware refresh for DRAM systems . In 2015 45th Annual IEEE\/IFIP International Conference on Dependable Systems and Networks (DSN'15) . IEEE, 427--437. M. K. Qureshi, Dae-Hyun Kim, S. Khan, P. J. Nair, and O. Mutlu. 2015. AVATAR: A variable-retention-time (VRT) aware refresh for DRAM systems. In 2015 45th Annual IEEE\/IFIP International Conference on Dependable Systems and Networks (DSN'15). IEEE, 427--437."},{"volume-title":"Proceedings of the 46th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO'13)","author":"Seshadri Vivek","key":"e_1_2_1_99_1","unstructured":"Vivek Seshadri , Yoongu Kim , Chris Fallin , Donghyuk Lee , Rachata Ausavarungnirun , Gennady Pekhimenko , Yixin Luo , Onur Mutlu , Phillip B. Gibbons , Michael A. Kozuch , and Todd C. Mowry . 2013. RowClone: Fast and energy-efficient In-DRAM bulk data copy and initialization . In Proceedings of the 46th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO'13) . 185--197. Vivek Seshadri, Yoongu Kim, Chris Fallin, Donghyuk Lee, Rachata Ausavarungnirun, Gennady Pekhimenko, Yixin Luo, Onur Mutlu, Phillip B. Gibbons, Michael A. Kozuch, and Todd C. Mowry. 2013. 
RowClone: Fast and energy-efficient In-DRAM bulk data copy and initialization. In Proceedings of the 46th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO'13). 185--197."},{"volume-title":"2017 50th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO'17)","author":"Seshadri Vivek","key":"e_1_2_1_100_1","unstructured":"Vivek Seshadri, Donghyuk Lee, Thomas Mullins, Hasan Hassan, Amirali Boroumand, Jeremie Kim, Michael A. Kozuch, Onur Mutlu, Phillip B. Gibbons, and Todd C. Mowry. 2017. Ambit: In-memory accelerator for bulk bitwise operations using commodity DRAM technology. In 2017 50th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO'17). IEEE, 273--287."},{"key":"e_1_2_1_101_1","volume-title":"In-DRAM bulk bitwise execution engine. arXiv preprint arXiv:1905.09822","author":"Seshadri Vivek","year":"2019","unstructured":"Vivek Seshadri and Onur Mutlu. 2019. In-DRAM bulk bitwise execution engine. arXiv preprint arXiv:1905.09822 (2019)."},{"key":"e_1_2_1_102_1","doi-asserted-by":"publisher","DOI":"10.1109\/DSD.2015.70"},{"key":"e_1_2_1_103_1","doi-asserted-by":"publisher","DOI":"10.1016\/0022-2836(81)90087-5"},{"volume-title":"2016 ACM\/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA'16)","author":"Song H.","key":"e_1_2_1_104_1","unstructured":"H. Song , L. Xingyu , M. Huizi , P. Jing , P. Ardavan , A. H. Mark , and J. D. William . 2016. EIE: Efficient inference engine on compressed deep neural network . 
In 2016 ACM\/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA'16) . IEEE, 243--254. H. Song, L. Xingyu, M. Huizi, P. Jing, P. Ardavan, A. H. Mark, and J. D. William. 2016. EIE: Efficient inference engine on compressed deep neural network. In 2016 ACM\/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA'16). IEEE, 243--254."},{"volume-title":"2010 43rd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO'10)","author":"Stuecheli J.","key":"e_1_2_1_105_1","unstructured":"J. Stuecheli, D. Kaseridis, H. C. Hunter, and L. K. John. 2010. Elastic refresh: Techniques to mitigate refresh penalties in high density memory. In 2010 43rd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO'10). IEEE, 375--384."},{"key":"e_1_2_1_106_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"e_1_2_1_107_1","doi-asserted-by":"publisher","DOI":"10.5555\/1509633.1509742"},{"key":"e_1_2_1_108_1","doi-asserted-by":"publisher","DOI":"10.1587\/transfun.E92.A.1161"},{"key":"e_1_2_1_109_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2012.6237031"},{"key":"e_1_2_1_111_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2018.00037"},{"key":"e_1_2_1_112_1","doi-asserted-by":"publisher","DOI":"10.1145\/378239.378521"},{"key":"e_1_2_1_114_1","volume-title":"Proceedings of the Twelfth International Symposium on High-Performance Computer Architecture (HPCA'06)","author":"Venkatesan Ravi K.","year":"2006","unstructured":"Ravi K. Venkatesan , Stephen Herr , and Eric Rotenberg . 2006 . Retention-Aware Placement in DRAM (RAPID): Software Methods for Quasi-Non-Volatile DRAM . 
In Proceedings of the Twelfth International Symposium on High-Performance Computer Architecture (HPCA'06) . IEEE, 155--165. Ravi K. Venkatesan, Stephen Herr, and Eric Rotenberg. 2006. Retention-Aware Placement in DRAM (RAPID): Software Methods for Quasi-Non-Volatile DRAM. In Proceedings of the Twelfth International Symposium on High-Performance Computer Architecture (HPCA'06). IEEE, 155--165."},{"key":"e_1_2_1_115_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2010.42"},{"key":"e_1_2_1_116_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO50266.2020.00036"},{"key":"e_1_2_1_117_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2018.00032"},{"key":"e_1_2_1_118_1","volume-title":"IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'05)","volume":"5","author":"Xue Chun","year":"2005","unstructured":"Chun Xue, Zili Shao, Ying Chen, and E. H.-M. Sha. 2005. Optimizing DSP scheduling via address assignment with array and loop transformation. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'05), Vol. 5. 
IEEE, 85--88."},{"key":"e_1_2_1_119_1","doi-asserted-by":"publisher","DOI":"10.1145\/3229631.3229650"},{"key":"e_1_2_1_120_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2014.6853217"},{"key":"e_1_2_1_121_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCAS45731.2020.9180873"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3417708","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3417708","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:01:14Z","timestamp":1750197674000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3417708"}},"subtitle":["Improving the Energy Efficiency of Convolutional Neural Network Accelerators"],"short-title":[],"issued":{"date-parts":[[2020,12,30]]},"references-count":118,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2021,3,31]]}},"alternative-id":["10.1145\/3417708"],"URL":"https:\/\/doi.org\/10.1145\/3417708","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2020,12,30]]},"assertion":[{"value":"2019-11-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-08-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-12-30","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}