{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,20]],"date-time":"2025-12-20T22:22:35Z","timestamp":1766269355648,"version":"3.41.0"},"reference-count":37,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2009,10,1]],"date-time":"2009-10-01T00:00:00Z","timestamp":1254355200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100002809","name":"Generalitat de Catalunya","doi-asserted-by":"publisher","award":["2009 SGR 1250"],"award-info":[{"award-number":["2009 SGR 1250"]}],"id":[{"id":"10.13039\/501100002809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100004837","name":"Ministerio de Ciencia e Innovaci\u00f3n","doi-asserted-by":"publisher","award":["TIN2007-61763"],"award-info":[{"award-number":["TIN2007-61763"]}],"id":[{"id":"10.13039\/501100004837","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2009,10]]},"abstract":"<jats:p>The register file is a critical component in a modern superscalar processor. It must be large enough to accommodate the results of all in-flight instructions. It must also have enough ports to allow simultaneous issue and writeback of many values each cycle. However, this makes it one of the most energy-consuming structures within the processor with a high access latency. As technology scales, there comes a point where register accesses are the bottleneck to performance and so must be pipelined over several cycles. This increases the pipeline depth, lowering performance.<\/jats:p>\n          <jats:p>To overcome these challenges, we propose a novel use of compiler analysis to aid register caching. Adding a register cache allows us to preserve single-cycle register accesses, maintaining performance and reducing energy consumption. We do this by passing information to the processor using free bits in a real ISA, allowing us to cache only the most important registers. Evaluating the register cache over a variety of sizes and associativities and varying the read ports into the cache, our best scheme achieves an energy-delay-squared (EDD) product of 0.81, with a performance increase of 11%. Another configuration saves 13% of register system energy. Using four register cache read ports brings both performance gains and energy savings, consistently outperforming two state-of-the-art hardware approaches.<\/jats:p>","DOI":"10.1145\/1596510.1596511","type":"journal-article","created":{"date-parts":[[2009,10,27]],"date-time":"2009-10-27T13:28:14Z","timestamp":1256650094000},"page":"1-23","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":17,"title":["Energy-efficient register caching with compiler assistance"],"prefix":"10.1145","volume":"6","author":[{"given":"Timothy M.","family":"Jones","sequence":"first","affiliation":[{"name":"University of Edinburgh, Edinburgh, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Michael F. P.","family":"O'Boyle","sequence":"additional","affiliation":[{"name":"University of Edinburgh, Edinburgh, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jaume","family":"Abella","sequence":"additional","affiliation":[{"name":"Intel Labs Barcelona\u2014UPC"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Antonio","family":"Gonz\u00e1lez","sequence":"additional","affiliation":[{"name":"Intel Labs Barcelona\u2014UPC"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"O\u011fuz","family":"Ergin","sequence":"additional","affiliation":[{"name":"TOBB University of Economics and Technology"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2009,10,29]]},"reference":[{"volume-title":"Proceedings of the 21st International Conference on Computer Design (ICCD'03)","author":"Aggarwal A.","key":"e_1_2_1_1_1","unstructured":"Aggarwal , A. and Franklin , M . 2003. Energy efficient asymmetrically ported register files . In Proceedings of the 21st International Conference on Computer Design (ICCD'03) . IEEE, Los Alamitos, CA. Aggarwal, A. and Franklin, M. 2003. Energy efficient asymmetrically ported register files. In Proceedings of the 21st International Conference on Computer Design (ICCD'03). IEEE, Los Alamitos, CA."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/IEDM.2005.1609436"},{"volume-title":"Proceedings of the 34th International Symposium on Microarchitecture (MICRO'01)","author":"Balasubramonian R.","key":"e_1_2_1_3_1","unstructured":"Balasubramonian , R. , Dwarkadas , S. , and Albonesi , D. H . 2001. Reducing the complexity of the register file in dynamic superscalar processors . In Proceedings of the 34th International Symposium on Microarchitecture (MICRO'01) . IEEE, Los Alamitos, CA. Balasubramonian, R., Dwarkadas, S., and Albonesi, D. H. 2001. Reducing the complexity of the register file in dynamic superscalar processors. In Proceedings of the 34th International Symposium on Microarchitecture (MICRO'01). IEEE, Los Alamitos, CA."},{"volume-title":"Proceedings of the 8th International Symposium on High-Performance Computer Architecture (HPCA). IEEE","author":"Borch E.","key":"e_1_2_1_4_1","unstructured":"Borch , E. , Manne , S. , Emer , J. , and Tune , E . 2002. Loose loops sink chips . In Proceedings of the 8th International Symposium on High-Performance Computer Architecture (HPCA). IEEE , Los Alamitos, CA. Borch, E., Manne, S., Emer, J., and Tune, E. 2002. Loose loops sink chips. In Proceedings of the 8th International Symposium on High-Performance Computer Architecture (HPCA). IEEE, Los Alamitos, CA."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/339647.339657"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/40.888701"},{"key":"e_1_2_1_7_1","doi-asserted-by":"crossref","unstructured":"Burger D. and Austin T. 1997. The simplescalar tool set version 2.0. Tech. rep. TR1342 University of Wisconsin-Madison.  Burger D. and Austin T. 1997. The simplescalar tool set version 2.0. Tech. rep. TR1342 University of Wisconsin-Madison.","DOI":"10.1145\/268806.268810"},{"volume-title":"Proceedings of the 31st International Symposium on Computer Architecture (ISCA'04)","author":"Butts J. A.","key":"e_1_2_1_8_1","unstructured":"Butts , J. A. and Sohi , G. S . 2004. Use-based register caching with decoupled indexing . In Proceedings of the 31st International Symposium on Computer Architecture (ISCA'04) . ACM, New York. Butts, J. A. and Sohi, G. S. 2004. Use-based register caching with decoupled indexing. In Proceedings of the 31st International Symposium on Computer Architecture (ISCA'04). ACM, New York."},{"key":"e_1_2_1_9_1","unstructured":"Compaq 1998. Alpha Architecture Handbook. http:\/\/www.compaq.com\/cpq-alphaserver\/technology\/literature\/alphaahb.pdf  Compaq 1998. Alpha Architecture Handbook. http:\/\/www.compaq.com\/cpq-alphaserver\/technology\/literature\/alphaahb.pdf"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/339647.339708"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2004.29"},{"volume-title":"Proceedings of the 22nd International Conference on Computer Design (ICCD'04)","author":"Ergin O.","key":"e_1_2_1_12_1","unstructured":"Ergin , O. , Balkan , D. , Ponomarev , D. , and Ghose , K . 2004. Increasing processor performance through early register release . In Proceedings of the 22nd International Conference on Computer Design (ICCD'04) . IEEE, Los Alamitos, CA. Ergin, O., Balkan, D., Ponomarev, D., and Ghose, K. 2004. Increasing processor performance through early register release. In Proceedings of the 22nd International Conference on Computer Design (ICCD'04). IEEE, Los Alamitos, CA."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2006.145"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/379240.379266"},{"volume-title":"Proceedings of the 4th International Symposium on High-Performance Computer Architecture (HPCA'98)","author":"Gonz\u00e1lez A.","key":"e_1_2_1_15_1","unstructured":"Gonz\u00e1lez , A. , Gonz\u00e1lez , J. , and Valero , M . 1998. Virtual-physical registers . In Proceedings of the 4th International Symposium on High-Performance Computer Architecture (HPCA'98) . IEEE, Los Alamitos, CA. Gonz\u00e1lez, A., Gonz\u00e1lez, J., and Valero, M. 1998. Virtual-physical registers. In Proceedings of the 4th International Symposium on High-Performance Computer Architecture (HPCA'98). IEEE, Los Alamitos, CA."},{"volume-title":"Proceedings of the 31st International Symposium on Computer Architecture (ISCA'01)","author":"Gonz\u00e1lez R.","key":"e_1_2_1_16_1","unstructured":"Gonz\u00e1lez , R. , Cristal , A. , Ortega , D. , Veidenbaum , A. , and Valero , M . 2004. A content aware integer register file organization . In Proceedings of the 31st International Symposium on Computer Architecture (ISCA'01) . ACM, New York. Gonz\u00e1lez, R., Cristal, A., Ortega, D., Veidenbaum, A., and Valero, M. 2004. A content aware integer register file organization. In Proceedings of the 31st International Symposium on Computer Architecture (ISCA'01). ACM, New York."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2005.32"},{"volume-title":"Proceedings of the Workshop on Complexity Effective Design (WCED) in Conjunction with the 27th International Symposium on Computer Architecture (ISCA'00)","author":"Hu Z.","key":"e_1_2_1_18_1","unstructured":"Hu , Z. and Martonosi , M . 2000. Reducing register file power consumption by exploiting value lifetime . In Proceedings of the Workshop on Complexity Effective Design (WCED) in Conjunction with the 27th International Symposium on Computer Architecture (ISCA'00) . ACM, New York. Hu, Z. and Martonosi, M. 2000. Reducing register file power consumption by exploiting value lifetime. In Proceedings of the Workshop on Complexity Effective Design (WCED) in Conjunction with the 27th International Symposium on Computer Architecture (ISCA'00). ACM, New York."},{"key":"e_1_2_1_19_1","unstructured":"Intel. 2007. Intel Core Microarchitecture. http:\/\/www.intel.com\/technology\/architecture-silicon\/core\/index.htm  Intel. 2007. Intel Core Microarchitecture. http:\/\/www.intel.com\/technology\/architecture-silicon\/core\/index.htm"},{"key":"e_1_2_1_20_1","unstructured":"Intel Corp. 1998. Embedded Pentium Processor Family Developer's Manual. http:\/\/developer.intel.com\/design\/intarch\/MANUALS\/241428.htm  Intel Corp. 1998. Embedded Pentium Processor Family Developer's Manual. http:\/\/developer.intel.com\/design\/intarch\/MANUALS\/241428.htm"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/PACT.2005.14"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/871506.871602"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2005.3"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/514191.514202"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/71.798316"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/859618.859621"},{"volume-title":"Proceedings of the 30th International Symposium on Microarchitecture (MICRO). IEEE","author":"Martin M. M.","key":"e_1_2_1_27_1","unstructured":"Martin , M. M. , Roth , A. , and Fischer , C. N . 1997. Exploiting dead value information . In Proceedings of the 30th International Symposium on Microarchitecture (MICRO). IEEE , Los Alamitos, CA. Martin, M. M., Roth, A., and Fischer, C. N. 1997. Exploiting dead value information. In Proceedings of the 30th International Symposium on Microarchitecture (MICRO). IEEE, Los Alamitos, CA."},{"volume-title":"Proceedings of the International Conference on Parallel Processing (ICPP'02)","author":"Monreal T.","key":"e_1_2_1_28_1","unstructured":"Monreal , T. , Vi\u00f1als , V. , Gonz\u00e1lez , A. , and Valero , M . 2002. Hardware schemes for early register release . In Proceedings of the International Conference on Parallel Processing (ICPP'02) . IEEE, Los Alamitos, CA. Monreal, T., Vi\u00f1als, V., Gonz\u00e1lez, A., and Valero, M. 2002. Hardware schemes for early register release. In Proceedings of the International Conference on Parallel Processing (ICPP'02). IEEE, Los Alamitos, CA."},{"volume-title":"Proceedings of the 35th International Symposium on Microarchitecture (MICRO'02)","author":"Park I.","key":"e_1_2_1_29_1","unstructured":"Park , I. , Powell , M. D. , and Vijaykumar , T. N . 2002. Reducing register ports for higher speed and lower energy . In Proceedings of the 35th International Symposium on Microarchitecture (MICRO'02) . IEEE, Los Alamitos, CA. Park, I., Powell, M. D., and Vijaykumar, T. N. 2002. Reducing register ports for higher speed and lower energy. In Proceedings of the 35th International Symposium on Microarchitecture (MICRO'02). IEEE, Los Alamitos, CA."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/377792.377859"},{"key":"e_1_2_1_31_1","unstructured":"Sandpile. 2007. Ia-32 implementation. Intel Core. http:\/\/www.sandpile.org\/impl\/core.htm.  Sandpile. 2007. Ia-32 implementation. Intel Core. http:\/\/www.sandpile.org\/impl\/core.htm."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/605397.605403"},{"key":"e_1_2_1_33_1","unstructured":"Smith M. D. and Holloway G. 2000. The Machine-SUIF documentation set. http:\/\/www.eecs.harvard.edu\/machsuif\/software\/software.html.  Smith M. D. and Holloway G. 2000. The Machine-SUIF documentation set. http:\/\/www.eecs.harvard.edu\/machsuif\/software\/software.html."},{"key":"e_1_2_1_34_1","unstructured":"Tarjan D. Thoziyoor S. and Jouppi N. P. 2006. Cacti 4.0. Tech. rep. HPL-2006--86 HP Laboratories Palo Alto.  Tarjan D. Thoziyoor S. and Jouppi N. P. 2006. Cacti 4.0. Tech. rep. HPL-2006--86 HP Laboratories Palo Alto."},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/859618.859627"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/232973.232993"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/1165573.1165633"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1596510.1596511","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1596510.1596511","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T12:23:32Z","timestamp":1750249412000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1596510.1596511"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2009,10]]},"references-count":37,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2009,10]]}},"alternative-id":["10.1145\/1596510.1596511"],"URL":"https:\/\/doi.org\/10.1145\/1596510.1596511","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2009,10]]},"assertion":[{"value":"2008-12-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2009-04-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2009-10-29","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}