{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T07:23:59Z","timestamp":1740122639330,"version":"3.37.3"},"reference-count":46,"publisher":"Springer Science and Business Media LLC","issue":"3-4","license":[{"start":{"date-parts":[[2024,11,16]],"date-time":"2024-11-16T00:00:00Z","timestamp":1731715200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,11,16]],"date-time":"2024-11-16T00:00:00Z","timestamp":1731715200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100010661","name":"Horizon 2020 Framework Programme","doi-asserted-by":"publisher","award":["71738","71738","71738"],"award-info":[{"award-number":["71738","71738","71738"]}],"id":[{"id":"10.13039\/100010661","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100002341","name":"Research Council of Finland","doi-asserted-by":"publisher","award":["31344","31344","31344"],"award-info":[{"award-number":["31344","31344","31344"]}],"id":[{"id":"10.13039\/501100002341","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Tampere University"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Des Autom Embed Syst"],"published-print":{"date-parts":[[2024,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>To improve the energy efficiency of computation, accelerators trade off performance and energy consumption for flexibility. Fixed-function accelerators reach high energy efficiency, but are inflexible. Adding programmability via an instruction set architecture\u00a0(ISA) incurs an energy consumption overhead, as instructions are fetched and decoded. 
To reduce it, hardware-controlled instruction caches and software-controlled components, such as loop buffers and (programmable) dictionaries, improve the energy efficiency of instruction streams in embedded processors. Reducing the instruction overhead with code compression is well established, and dictionary compression has been an effective approach due to its simplicity. Compared to static dictionaries, adding programmability improves the effectiveness. However, run-time-programmable dictionary compression and its effect on energy consumption have not been thoroughly studied. We describe a scheme to target energy efficiency by using fine-grained programmable dictionaries in embedded compute devices. Guided by compile-time analysis, the dictionary contents are changed during execution. On the CHStone and Embench suites, our method reduces energy consumption on average by 11.4% and 3.8%, respectively, with negligible run-time overhead. The addition of a loop buffer further reduces the energy consumption by 19.8% and 4.5% in the two suites. 
Our results indicate that programmable dictionary compression allows further energy reductions over an already highly tuned instruction stream.<\/jats:p>","DOI":"10.1007\/s10617-024-09290-2","type":"journal-article","created":{"date-parts":[[2024,11,16]],"date-time":"2024-11-16T08:17:42Z","timestamp":1731745062000},"page":"245-274","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Energy-efficient instruction compression with programmable dictionaries"],"prefix":"10.1007","volume":"28","author":[{"given":"Joonas","family":"Multanen","sequence":"first","affiliation":[]},{"given":"Barry","family":"de Bruin","sequence":"additional","affiliation":[]},{"given":"Henk","family":"Corporaal","sequence":"additional","affiliation":[]},{"given":"Pekka","family":"J\u00e4\u00e4skel\u00e4inen","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,11,16]]},"reference":[{"doi-asserted-by":"crossref","unstructured":"Silven O, Jyrkk\u00e4 K (2007) Observations on power-efficiency trends in mobile communication devices. EURASIP J Embedded Syst 2007(1)","key":"9290_CR1","DOI":"10.1186\/1687-3963-2007-056976"},{"doi-asserted-by":"crossref","unstructured":"Collin M, Brorsson M (2003) Low power instruction fetch using profiled variable length instructions. In: Proceedings of the international systems-on-chip conference (SoC), pp 183\u2013188","key":"9290_CR2","DOI":"10.1109\/SOC.2003.1241489"},{"doi-asserted-by":"crossref","unstructured":"Lee Y, Avizienis R, Bishara A, Xia R, Lockhart D, Batten C, Asanovi\u0107 K (2011) Exploring the tradeoffs between programmability and efficiency in data-parallel accelerators. 
In: Proceedings of the international symposium on computer architecture (ISCA), pp 129\u2013140","key":"9290_CR3","DOI":"10.1145\/2000064.2000080"},{"unstructured":"Schiavone D, Conti F, Rossi D, Gautschi M, Pullini A, Flamand E, Benini L (2017) Slow and steady wins the race? a comparison of ultra-low-power RISC-V cores for internet-of-things applications. In: Proceedings of international symposium on power and timing modeling, optimization and simulation (PATMOS)","key":"9290_CR4"},{"doi-asserted-by":"crossref","unstructured":"Molendijk M, Putter F, Gomony M, J\u00e4\u00e4skel\u00e4inen P, Corporaal H (2023) BrainTTA: A 28.6 TOPS\/W compiler programmable transport-triggered NN SoC. In: Proceedings of the international conference on computer design (ICCD), pp 78\u201385","key":"9290_CR5","DOI":"10.1109\/ICCD58817.2023.00022"},{"unstructured":"Roman Kaplan, Michael Stearns: Intel\u00ae\u00a0gaudi\u00ae\u00a03 AI accelerator. White paper, Intel Corporation (April 2024). https:\/\/www.intel.com\/content\/www\/us\/en\/content-details\/817486\/intel-gaudi-3-ai-accelerator-white-paper.html","key":"9290_CR6"},{"unstructured":"AMD Inc.: AI engines and their applications. White paper (September 2022). https:\/\/www.xilinx.com\/content\/dam\/xilinx\/support\/documents\/white_papers\/wp506-ai-engine.pdf","key":"9290_CR7"},{"doi-asserted-by":"crossref","unstructured":"Kin J, Munish Gupta Mangione-Smith WH (1997) The filter cache: an energy efficient memory structure. In: Proceedings of the international symposium on microarchitecture (MICRO), pp 184\u2013193","key":"9290_CR8","DOI":"10.1109\/MICRO.1997.645809"},{"doi-asserted-by":"crossref","unstructured":"Hiraki M, Bajwa RS, Kojima H, Gorny DJ, Nitta K, Shri A (1996) Stage-skip pipeline: a low power processor architecture using a decoded instruction buffer. 
In: Proceedings of 1996 international symposium on low power electronics and design, pp 353\u2013358","key":"9290_CR9","DOI":"10.1109\/LPE.1996.547538"},{"doi-asserted-by":"crossref","unstructured":"Lee L, Moyer B, Arends J (1999) Instruction fetch energy reduction using loop caches for embedded applications with small tight loops. In: Proceedings of the international symposium on low power electronics and design (ISLPED), pp 267\u2013269","key":"9290_CR10","DOI":"10.1145\/313817.313944"},{"issue":"7","key":"9290_CR11","doi-asserted-by":"publisher","first-page":"10","DOI":"10.1145\/315253.314419","volume":"34","author":"G-R Uh","year":"1999","unstructured":"Uh G-R, Wang Y, Whalley D, Jinturkar S, Burns C, Cao V (1999) Effective exploitation of a zero overhead loop buffer. SIGPLAN Notices 34(7):10\u201319","journal-title":"SIGPLAN Notices"},{"doi-asserted-by":"crossref","unstructured":"Wolfe A, Chanin A (1992) Executing compressed programs on an embedded RISC architecture. In: Proceedings of the international symposium on microarchitecture (MICRO), pp. 81\u201391","key":"9290_CR12","DOI":"10.1109\/MICRO.1992.697002"},{"doi-asserted-by":"crossref","unstructured":"Lefurgy C, Bird P, Chen I-C Mudge (1997) Improving code density using compression techniques. In: Proceedings of the international symposium on microarchitecture (MICRO), pp 194\u2013203","key":"9290_CR13","DOI":"10.1109\/MICRO.1997.645810"},{"issue":"5","key":"9290_CR14","doi-asserted-by":"publisher","first-page":"525","DOI":"10.1109\/TVLSI.2006.876105","volume":"14","author":"Y Xie","year":"2006","unstructured":"Xie Y, Wolf W, Lekatsas H (2006) Code compression for embedded VLIW processors using variable-to-fixed coding. 
IEEE Trans Very Large Scale Integr Syst 14(5):525\u2013536","journal-title":"IEEE Trans Very Large Scale Integr Syst"},{"issue":"4","key":"9290_CR15","doi-asserted-by":"publisher","first-page":"673","DOI":"10.1109\/TCAD.2008.917563","volume":"27","author":"S Seong","year":"2008","unstructured":"Seong S, Mishra P (2008) Bitmask-based code compression for embedded systems. IEEE Trans Comput Aided Des Integr Circuits Syst 27(4):673\u2013685","journal-title":"IEEE Trans Comput Aided Des Integr Circuits Syst"},{"issue":"4","key":"9290_CR16","doi-asserted-by":"publisher","first-page":"31","DOI":"10.1145\/1835420.1835424","volume":"15","author":"T Bonny","year":"2010","unstructured":"Bonny T, Henkel J (2010) Huffman-based code compression techniques for embedded processors. ACM Trans Design Autom Electron Syst 15(4):31\u201337","journal-title":"ACM Trans Design Autom Electron Syst"},{"doi-asserted-by":"crossref","unstructured":"Lin CH, Xie Y, Wolf W (2007) Code compression for VLIW embedded systems using a self-generating table. Trans Very Large Scale Integr Syst 15(10)","key":"9290_CR17","DOI":"10.1109\/TVLSI.2007.904097"},{"doi-asserted-by":"crossref","unstructured":"Lekatsas H, Henkel J, Jakkula V (2002) Design of an one-cycle decompression hardware for performance increase in embedded systems. In: Proceedings of the design automation conference (DAC), pp. 34\u201339","key":"9290_CR18","DOI":"10.1145\/513918.513929"},{"issue":"4","key":"9290_CR19","doi-asserted-by":"publisher","first-page":"467","DOI":"10.1109\/TC.2004.1268405","volume":"53","author":"L Benini","year":"2004","unstructured":"Benini L, Menichelli F, Olivieri M (2004) A class of code compression schemes for reducing power consumption in embedded microprocessor systems. IEEE Trans Comput 53(4):467\u2013482","journal-title":"IEEE Trans Comput"},{"doi-asserted-by":"crossref","unstructured":"Netto EW, Azevedo R, Centoducatte P, Araujo G (2004) Multi-profile instruction based compression. 
In: Proceedings of the symposium on computer architecture and high performance computing (SBAC-PAD), pp. 23\u201329","key":"9290_CR20","DOI":"10.1109\/SBAC-PAD.2004.26"},{"doi-asserted-by":"crossref","unstructured":"Das D, Kumar R, Chakrabarti PP (2005) Dictionary based code compression for variable length instruction encodings. In: Proceedings of the international conference on VLSI design held jointly with international conference on embedded systems design, pp 545\u2013550","key":"9290_CR21","DOI":"10.1109\/ICVD.2005.81"},{"doi-asserted-by":"crossref","unstructured":"Shrivastava K, Mishra P (2011) Dual code compression for embedded systems. In: Proceedings of the international conference on VLSI design (VLSID), pp. 177\u2013182","key":"9290_CR22","DOI":"10.1109\/VLSID.2011.13"},{"doi-asserted-by":"crossref","unstructured":"Benini L, Macii A, Macii E, Poncino M (1999) Selective instruction compression for memory energy reduction in embedded systems. In: Proceedings of the international symposium on low power electronics and design (ISLPED), pp 206\u2013211","key":"9290_CR23","DOI":"10.1145\/313817.313927"},{"doi-asserted-by":"crossref","unstructured":"Brorsson M, Collin M (2006) Adaptive and flexible dictionary code compression for embedded applications. In: Proceedings of the international conference on compilers, architecture and synthesis for embedded systems","key":"9290_CR24","DOI":"10.1145\/1176760.1176776"},{"doi-asserted-by":"crossref","unstructured":"Thuresson M, Sj\u00e4lander M, Stenstrom P (2009) A flexible code compression scheme using partitioned look-up tables. In: Proceedings of the international conference on high performance embedded architectures and compilers (HiPEAC), pp 95\u2013109","key":"9290_CR25","DOI":"10.1007\/978-3-540-92990-1_9"},{"doi-asserted-by":"crossref","unstructured":"Yoshida Y, Song B-Y, Okuhata H, Onoye T, Shirakawa I (1997) An object code compression approach to embedded processors. 
In: Proceedings of the international symposium on low power electronics and design (ISLPED), pp 265\u2013268","key":"9290_CR26","DOI":"10.1145\/263272.263349"},{"doi-asserted-by":"crossref","unstructured":"Multanen J, Hepola K, J\u00e4\u00e4skel\u00e4inen P (2020) Programmable dictionary code compression for instruction stream energy efficiency. In: Proceedings of the international conference on computer design (ICCD), pp 356\u2013363","key":"9290_CR27","DOI":"10.1109\/ICCD50377.2020.00066"},{"doi-asserted-by":"crossref","unstructured":"Conte T, Banerjia S, Larin S, Menezes K, Sathaye S (1996) Instruction fetch mechanisms for VLIW architectures with compressed encodings. In: Proceedings of the international symposium on microarchitecture. (MICRO), pp 201\u2013211","key":"9290_CR28","DOI":"10.1109\/MICRO.1996.566462"},{"doi-asserted-by":"crossref","unstructured":"Lefurgy C, Piccininni E, Mudge T (1999) Evaluation of a high performance code compression method. In: Proceedings of the international symposium on microarchitecture (MICRO), pp 93\u2013102","key":"9290_CR29","DOI":"10.1109\/MICRO.1999.809447"},{"unstructured":"Ishiura N, Yamaguchi M (1997) Instruction code compression for application specific VLIW processors based on automatic field partitioning. In: Proceedings of international workshop on synthesis and system integration of mixed technologies (SASIMI)","key":"9290_CR30"},{"doi-asserted-by":"crossref","unstructured":"Lekatsas H, Wolf W (1998) Code compression for embedded systems. In: Proceedings of the design automation conference (DAC), pp 516\u2013521","key":"9290_CR31","DOI":"10.1145\/277044.277185"},{"issue":"4","key":"9290_CR32","doi-asserted-by":"publisher","first-page":"470","DOI":"10.1145\/944027.944032","volume":"8","author":"L Li","year":"2003","unstructured":"Li L, Chakrabarty K, Touba N (2003) Test data compression using dictionaries with selective entries and fixed-length indices. 
Trans Design Autom Electron Syst 8(4):470\u2013490","journal-title":"Trans Design Autom Electron Syst"},{"doi-asserted-by":"crossref","unstructured":"Hwu W-M, Mahlke S, Chen W, Chang P, Warter N, Bringmann R, Ouellette R, Hank R, Kiyohara T, Haab G, Holm J, Lavery D (1993) The superblock: an effective technique for VLIW and superscalar compilation. J Supercomput 7","key":"9290_CR33","DOI":"10.1007\/978-1-4615-3200-2_7"},{"doi-asserted-by":"crossref","unstructured":"Rawlins M, Gordon-Ross A (2011) On the interplay of loop caching, code compression, and cache configuration. In: Proceedings of the Asia and South pacific design automation conference (ASP-DAC), pp 243\u2013248","key":"9290_CR34","DOI":"10.1109\/ASPDAC.2011.5722191"},{"doi-asserted-by":"crossref","unstructured":"Benes R, Nowick S, Wolfe A (1998) A fast asynchronous huffman decoder for compressed-code embedded processors. In: Proceedings of the fourth international symposium on advanced research in asynchronous circuits and systems (ASYNC), pp 43\u201356","key":"9290_CR35","DOI":"10.1109\/ASYNC.1998.666493"},{"doi-asserted-by":"crossref","unstructured":"Pan H, Asanovi\u0107 K (2001) Heads and tails: a variable-length instruction format supporting parallel fetch and decode. In: Proceedings of the international conference on compilers, architecture, and synthesis for embedded systems (CASES), pp 168\u2013175","key":"9290_CR36","DOI":"10.1145\/502217.502244"},{"doi-asserted-by":"crossref","unstructured":"Lefurgy C, Piccininni E, Mudge T (2000) Reducing code size with run-time decompression. 
In: Proceedings of the international symposium on high-performance computer architecture (HPCA), pp 218\u2013228","key":"9290_CR37","DOI":"10.1109\/HPCA.2000.824352"},{"key":"9290_CR38","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/2512464","volume":"13","author":"J Gu","year":"2013","unstructured":"Gu J, Guo H, Ishihara T (2013) Dlic: Decoded loop instructions caching for energy-aware embedded processors. ACM Trans Embedded Comput Syst 13:1\u201323","journal-title":"ACM Trans Embedded Comput Syst"},{"key":"9290_CR39","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/2435227.2435251","volume":"12","author":"M Rawlins","year":"2013","unstructured":"Rawlins M, Gordon-Ross A (2013) Adaptive loop caching using lightweight runtime control flow analysis. ACM Trans Embedded Comput Syst 12:1\u201326","journal-title":"ACM Trans Embedded Comput Syst"},{"key":"9290_CR40","volume-title":"Microprocessor architectures: from VLIW to TTA","author":"H Corporaal","year":"1998","unstructured":"Corporaal H (1998) Microprocessor architectures: from VLIW to TTA. Wiley, Chichester, England"},{"key":"9290_CR41","first-page":"242","volume":"17","author":"Y Hara","year":"2009","unstructured":"Hara Y, Tomiyama H, Honda S, Takada H (2009) Proposal and quantitative analysis of the CHStone benchmark program suite for practical C-based high-level synthesis. J Inf Process 17:242\u2013254","journal-title":"J Inf Process"},{"unstructured":"Bennett J, Dabbelt P, Garlati C, Madhusudan G, Mudge T, Patterson D (2019) Embench: An evolving benchmark suite for embedded IoT computers from an academic-industrial cooperative","key":"9290_CR42"},{"doi-asserted-by":"crossref","unstructured":"J\u00e4\u00e4skel\u00e4inen P, Viitanen T, Takala J, Berg H (2017) HW\/SW co-design toolset for customization of exposed datapath processors. 
Comput Platforms Softw Defined Radio","key":"9290_CR43","DOI":"10.1007\/978-3-319-49679-5_8"},{"doi-asserted-by":"crossref","unstructured":"Li S, Chen K, Ahn JH, Brockman J, Jouppi N (2011) CACTI-P: architecture-level modeling for sram-based structures with advanced leakage reduction techniques. In: Proceedings of the international conference on computer-aided design, pp 694\u2013701","key":"9290_CR44","DOI":"10.1109\/ICCAD.2011.6105405"},{"issue":"2","key":"9290_CR45","doi-asserted-by":"publisher","first-page":"34","DOI":"10.1109\/MM.2014.12","volume":"34","author":"L Codrescu","year":"2014","unstructured":"Codrescu L, Anderson W, Venkumanhanti S, Zeng M, Plondke E, Koob C, Ingle A, Tabony C, Maule R (2014) Hexagon DSP: an architecture optimized for mobile multimedia and communications. IEEE Micro 34(2):34\u201343","journal-title":"IEEE Micro"},{"doi-asserted-by":"crossref","unstructured":"Chen C, Xiang X, Liu C, Shang Y, Guo R, Liu D, Lu Y, Hao Z, Luo J, Chen Z, Li C, Pu Y, Meng J, Yan X, Xie Y, Qi X (2020) Xuantie-910: a commercial multi-core 12-stage pipeline out-of-order 64-bit high performance RISC-V processor with vector extension : industrial product. 
In: Proceedings of the international symposium on computer architecture (ISCA), pp 52\u201364","key":"9290_CR46","DOI":"10.1109\/ISCA45697.2020.00016"}],"container-title":["Design Automation for Embedded Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10617-024-09290-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10617-024-09290-2\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10617-024-09290-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,12,4]],"date-time":"2024-12-04T05:04:00Z","timestamp":1733288640000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10617-024-09290-2"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,11,16]]},"references-count":46,"journal-issue":{"issue":"3-4","published-print":{"date-parts":[[2024,12]]}},"alternative-id":["9290"],"URL":"https:\/\/doi.org\/10.1007\/s10617-024-09290-2","relation":{},"ISSN":["0929-5585","1572-8080"],"issn-type":[{"type":"print","value":"0929-5585"},{"type":"electronic","value":"1572-8080"}],"subject":[],"published":{"date-parts":[[2024,11,16]]},"assertion":[{"value":"9 May 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"1 November 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"16 November 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors have no Conflict 
of interest to declare that are relevant to the content of this article.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}