{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,29]],"date-time":"2025-09-29T11:52:35Z","timestamp":1759146755322,"version":"3.41.0"},"reference-count":49,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2018,3,22]],"date-time":"2018-03-22T00:00:00Z","timestamp":1521676800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100002551","name":"Seoul National University","doi-asserted-by":"crossref","award":["Promising-Pioneering Researcher Program 2015"],"award-info":[{"award-number":["Promising-Pioneering Researcher Program 2015"]}],"id":[{"id":"10.13039\/501100002551","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100003725","name":"National Research Foundation of Korea","doi-asserted-by":"crossref","award":["21A20151113068, NRF-2015K1A3A1A14021288"],"award-info":[{"award-number":["21A20151113068, NRF-2015K1A3A1A14021288"]}],"id":[{"id":"10.13039\/501100003725","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2018,3,31]]},"abstract":"<jats:p>Modulo-scheduled coarse-grain reconfigurable array (CGRA) processors excel at exploiting loop-level parallelism at a high performance per watt ratio. The frequent reconfiguration of the array, however, causes between 25% and 45% of the consumed chip energy to be spent on the instruction memory and fetches therefrom. This article presents a hardware\/software codesign methodology for such architectures that is able to reduce both the size required to store the modulo-scheduled loops and the energy consumed by the instruction decode logic. 
The hardware modifications improve the spatial organization of a CGRA\u2019s execution plan by reorganizing the configuration memory into separate partitions based on a statistical analysis of code. A compiler technique optimizes the generated code in the temporal dimension by minimizing the number of signal changes. The optimizations achieve, on average, a reduction in code size of more than 63% and in energy consumed by the instruction decode logic by 70% for a wide variety of application domains. Decompression of the compressed loops can be performed in hardware with no additional latency, rendering the presented method ideal for low-power CGRAs running at high frequencies. The presented technique is orthogonal to dictionary-based compression schemes and can be combined to achieve a further reduction in code size.<\/jats:p>","DOI":"10.1145\/3162018","type":"journal-article","created":{"date-parts":[[2018,3,23]],"date-time":"2018-03-23T12:29:49Z","timestamp":1521808189000},"page":"1-26","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["Improving Energy Efficiency of Coarse-Grain Reconfigurable Arrays Through Modulo Schedule Compression\/Decompression"],"prefix":"10.1145","volume":"15","author":[{"given":"Hochan","family":"Lee","sequence":"first","affiliation":[{"name":"Seoul National University, Seoul, Republic of Korea"}]},{"given":"Mansureh S.","family":"Moghaddam","sequence":"additional","affiliation":[{"name":"Seoul National University, Seoul, Republic of Korea"}]},{"given":"Dongkwan","family":"Suh","sequence":"additional","affiliation":[{"name":"Samsung Electronics, Seoul, Republic of Korea"}]},{"given":"Bernhard","family":"Egger","sequence":"additional","affiliation":[{"name":"Seoul National University, Seoul, Republic of 
Korea"}]}],"member":"320","published-online":{"date-parts":[[2018,3,22]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/TVLSI.2008.2001562"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2007.370392"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2007.22"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1024499601571"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2008.49"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.5555\/1116164.1116486"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCD.2012.6378687"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/FPT.2013.6718396"},{"volume-title":"Proceedings of the 29th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO-29)","author":"Conte Thomas M.","key":"e_1_2_1_9_1"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.5555\/3049832.3049854"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.micpro.2014.05.009"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/1216919.1216935"},{"volume-title":"Proceedings of the International Conference on Engineering of Reconfigurable Systems and Algorithms (ERSA\u201903)","year":"2003","author":"Heysters Paul M.","key":"e_1_2_1_13_1"},{"volume-title":"Proceedings of the Workshop on Synthesis and System Integration of Mixed Technologies. 
105--109","year":"1997","author":"Ishiura Nagisa","key":"e_1_2_1_14_1"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2011.166"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.5555\/846216.846937"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/TVLSI.2007.912133"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/FPT.2012.6412157"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/TVLSI.2008.2006039"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/1165573.1165646"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/53990.54022"},{"volume-title":"Proceedings of the Workshop on Application Specific Processors.","author":"Lambrechts A.","key":"e_1_2_1_22_1"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/FPT.2009.5377609"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/FPT.2013.6718352"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/2343045.2343048"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/2492045.2492057"},{"volume-title":"Proceedings of the 2013 IEEE International Conference on Consumer Electronics (ICCE\u201913)","year":"2013","author":"Lee Won-Jong","key":"e_1_2_1_27_1"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/513918.513929"},{"volume-title":"Proceedings of the 9th IEEE\/ACM\/IFIP International Conference on Hardware\/Software Codesign and System Synthesis (CODES+ISSS\u201913)","year":"2013","author":"Li Shuo","key":"e_1_2_1_29_1"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/296399.296435"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-45234-8_7"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1049\/ip-cdt:20030833"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/1531542.1531616"},{"volume-title":"Proceedings of the 2008 
International Conference on Field Programmable Logic and Applications. 305--310","author":"Nishimura T.","key":"e_1_2_1_35_1"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/1542452.1542456"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/ASQED.2011.6111753"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/1629395.1629433"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/3079856.3080256"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.5555\/968880.969243"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/192724.192731"},{"volume-title":"Retrieved","year":"2011","key":"e_1_2_1_42_1"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/SBAC-PAD.2010.12"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCAS.2014.6865597"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/12.859540"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/FPT.2012.6412114"},{"key":"e_1_2_1_47_1","unstructured":"Synopsys. 2010. Synopsys Design Compiler 2010. 
Available at http:\/\/www.synopsys.com\/."},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/2893475"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/FPT.2005.1568536"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.5555\/144953.145003"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3162018","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3162018","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T02:26:54Z","timestamp":1750213614000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3162018"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,3,22]]},"references-count":49,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2018,3,31]]}},"alternative-id":["10.1145\/3162018"],"URL":"https:\/\/doi.org\/10.1145\/3162018","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2018,3,22]]},"assertion":[{"value":"2017-06-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2017-11-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2018-03-22","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}