{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:46:36Z","timestamp":1750308396152,"version":"3.41.0"},"reference-count":47,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2022,1,31]],"date-time":"2022-01-31T00:00:00Z","timestamp":1643587200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2022,6,30]]},"abstract":"<jats:p>Traditional processor architectures utilize an external DRAM for data storage, while they also operate under worst-case timing constraints. Such designs are heavily constrained by the delay costs of the data transfer between the core pipeline and the DRAM, and they are incapable of exploiting the timing variations of their pipeline stages. In this work, we focus on a near-data processing methodology combined with a novel timing analysis technique that enables the adaptive frequency scaling of the core clock and boosts the performance of low-power designs. We propose a near-data processing and better-than-worst-case co-design methodology to efficiently move the instruction execution to the DRAM side and, at the same time, to allow the pipeline to operate at higher clock frequencies compared to the worst-case approach. To this end, we develop a timing analysis technique, which evaluates the timing requirements of individual instructions and we dynamically scale the clock frequency, according to the instructions types that currently occupy the pipeline. We evaluate the proposed methodology on six different RISC-V post-layout implementations using an HMC DRAM to enable the processing-in-memory (PIM) process. Results indicate an average speedup factor of 1.96\u00d7 with a 1.6\u00d7 reduction in energy consumption compared to a standard RISC-V PIM baseline implementation.<\/jats:p>","DOI":"10.1145\/3504005","type":"journal-article","created":{"date-parts":[[2022,1,31]],"date-time":"2022-01-31T13:43:02Z","timestamp":1643636582000},"page":"1-26","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Low-power Near-data Instruction Execution Leveraging Opcode-based Timing Analysis"],"prefix":"10.1145","volume":"19","author":[{"given":"Tziouvaras","family":"Athanasios","sequence":"first","affiliation":[{"name":"University of Thessaly, Department of Computer Engineering, Volos, Greece"}]},{"given":"Dimitriou","family":"Georgios","sequence":"additional","affiliation":[{"name":"University of Thessaly, Department of Computer Science, Lamia, Greece"}]},{"given":"Stamoulis","family":"Georgios","sequence":"additional","affiliation":[{"name":"University of Thessaly, Department of Computer Engineering, Volos, Greece"}]}],"member":"320","published-online":{"date-parts":[[2022,1,31]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1145\/2872887.2750386"},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2016.8"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1145\/1816038.1815967"},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2014.55"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1109\/JLT.2013.2283277"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSSC.2015.2473655"},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1145\/3185768.3185771"},{"key":"e_1_3_1_9_2","first-page":"1","article-title":"Neural-PIM: Efficient processing-in-memory with neural approximation of peripherals","author":"Cao Weidong","year":"2021","unstructured":"Weidong Cao, Yilong Zhao, Adith Boloor, Yinhe Han, Xuan Zhang, and Li Jiang. 2021. Neural-PIM: Efficient processing-in-memory with neural approximation of peripherals. IEEE Trans. Comput. (2021), 1\u20131. https:\/\/doi.org\/10.1109\/TC.2021.3122905","journal-title":"IEEE Trans. Comput."},{"key":"e_1_3_1_10_2","volume-title":"BOOM v2: An Open-source Out-of-order RISC-V Core","author":"Celio Christopher","year":"2017","unstructured":"Christopher Celio, Pi-Feng Chiu, Borivoje Nikolic, David A. Patterson, and Krste Asanovi\u0107. 2017. BOOM v2: An Open-source Out-of-order RISC-V Core. Technical Report UCB\/EECS-2017-157. EECS Department, University of California, Berkeley."},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1145\/3007787.3001208"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.5555\/956417.956571"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1145\/3307650.3322268"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2015.7056040"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSI.2008.918077"},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1145\/2714566"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/PACT.2015.22"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2016.7446059"},{"key":"e_1_3_1_19_2","unstructured":"Saugata Ghose Kevin Hsieh Amirali Boroumand Rachata Ausavarungnirun and Onur Mutlu. 2018. Enabling the adoption of processing-in-memory: Challenges mechanisms future research directions. arXiv:cs.AR\/1802.00320. Retrieved from https:\/\/arxiv.org\/abs\/1802.00320."},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISSCC.2014.6757358"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSSC.2017.2777101"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2021.3063378"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDEW.2010.5452747"},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS47924.2020.00076"},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2021.3094043"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISSCC.2019.8662389"},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2010.5416652"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1145\/309847.310010"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSSC.2019.2913098"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1145\/2717764.2717783"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE48307.2020.00073"},{"key":"e_1_3_1_32_2","volume-title":"Hybrid Memory Cube Specification 2.1","author":"consortium Hybrid memory cube","year":"2018","unstructured":"Hybrid memory cube consortium. 2018. Hybrid Memory Cube Specification 2.1. Technical Report. Retrieved from https:\/\/www.nuvation.com\/sites\/default\/files\/Nuvation-Engineering-Images\/Articles\/FPGAs-and-HMC\/HMC-30G-VSR_HMCC_Specification.pdf."},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/LCA.2020.3039498"},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2014.6844483"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2013.72"},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2019.00065"},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1145\/3466752.3480934"},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2018.2876312"},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.sysarc.2016.08.001"},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1002\/cpe.4257"},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/DSD.2018.00106"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1145\/3394885.3431522"},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.308"},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISSCC.2007.373409"},{"key":"e_1_3_1_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/PACET48583.2019.8956266"},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2014.6835958"},{"key":"e_1_3_1_47_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA51647.2021.00055"},{"key":"e_1_3_1_48_2","doi-asserted-by":"publisher","DOI":"10.5555\/2755753.2755890"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3504005","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3504005","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T17:45:05Z","timestamp":1750268705000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3504005"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,1,31]]},"references-count":47,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2022,6,30]]}},"alternative-id":["10.1145\/3504005"],"URL":"https:\/\/doi.org\/10.1145\/3504005","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2022,1,31]]},"assertion":[{"value":"2021-06-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-12-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-01-31","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}