{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:15:49Z","timestamp":1750220149045,"version":"3.41.0"},"reference-count":76,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2022,9,16]],"date-time":"2022-09-16T00:00:00Z","timestamp":1663286400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Intel research"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2022,12,31]]},"abstract":"<jats:p>This article proposes a new microarchitectural scheme for reducing the hardware complexity of the integer register file of a superscalar processor. The register file is split into two banks holding even-numbered and odd-numbered physical registers, respectively. Each bank provides one read port to each two-input integer execution unit. This way, each bank has half the total number of read ports, and the register file area is roughly halved, which reduces the energy dissipated per register access and the register access time. However, a bank conflict occurs when both inputs of a two-input micro-operation lie in the same bank. 
Bank conflicts hurt performance, and we propose a simple solution to remove most bank conflicts, thus recovering most of the lost performance.<\/jats:p>","DOI":"10.1145\/3544838","type":"journal-article","created":{"date-parts":[[2022,6,21]],"date-time":"2022-06-21T13:19:48Z","timestamp":1655817588000},"page":"1-25","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["HAIR: Halving the Area of the Integer Register File with Odd\/Even Banking"],"prefix":"10.1145","volume":"19","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7037-4014","authenticated-orcid":false,"given":"Pierre","family":"Michaud","sequence":"first","affiliation":[{"name":"Inria, Univ Rennes, CNRS, IRISA, Rennes, France"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0012-4578","authenticated-orcid":false,"given":"Anis","family":"Peysieux","sequence":"additional","affiliation":[{"name":"Inria, Univ Rennes, CNRS, IRISA, Rennes, France"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2022,9,16]]},"reference":[{"doi-asserted-by":"publisher","key":"e_1_3_4_2_2","DOI":"10.1145\/3524059.3532396"},{"unstructured":"AMD. 2020. Software Optimization Guide for AMD EPYC 7003 Processors. 
Retrieved from https:\/\/developer.amd.com\/resources\/developer-guides-manuals\/.","key":"e_1_3_4_3_2"},{"doi-asserted-by":"publisher","key":"e_1_3_4_4_2","DOI":"10.1109\/MICRO.2003.1253201"},{"doi-asserted-by":"publisher","key":"e_1_3_4_5_2","DOI":"10.1109\/MICRO.2001.991122"},{"doi-asserted-by":"publisher","key":"e_1_3_4_6_2","DOI":"10.1145\/1815961.1815970"},{"doi-asserted-by":"publisher","key":"e_1_3_4_7_2","DOI":"10.1016\/j.spl.2013.01.023"},{"doi-asserted-by":"publisher","key":"e_1_3_4_8_2","DOI":"10.1109\/HPCA.2002.995719"},{"issue":"6","key":"e_1_3_4_9_2","article-title":"Power-aware microarchitecture: Design and modeling challenges for next-generation microprocessors","volume":"20","author":"Brooks D. M.","year":"2000","unstructured":"D. M. Brooks, P. Bose, S. E. Schuster, H. Jacobson, P. N. Kudva, A. Buyuktosunoglu, J.-D. Wellman, V. Zyuban, M. Gupta, and P. W. Cook. 2000. Power-aware microarchitecture: Design and modeling challenges for next-generation microprocessors. IEEE Micro 20, 6 (Nov. 2000), 26\u201344.","journal-title":"IEEE Micro"},{"doi-asserted-by":"publisher","key":"e_1_3_4_10_2","DOI":"10.1147\/rd.491.0167"},{"doi-asserted-by":"publisher","key":"e_1_3_4_11_2","DOI":"10.1109\/ISCA.1998.694770"},{"unstructured":"Standard Performance Evaluation Corporation. 2017. SPEC CPU 2017. Retrieved from https:\/\/www.spec.org\/cpu2017\/.","key":"e_1_3_4_12_2"},{"doi-asserted-by":"publisher","key":"e_1_3_4_13_2","DOI":"10.1145\/339647.339708"},{"doi-asserted-by":"publisher","key":"e_1_3_4_14_2","DOI":"10.1109\/ISSCC.2011.5746308"},{"key":"e_1_3_4_15_2","volume-title":"Proceedings of the Workshop on Temperature-Aware Computer Systems (TACS\u201905)","author":"Donald J.","year":"2005","unstructured":"J. Donald and M. Martonosi. 2005. Leveraging simultaneous multithreading for adaptive thermal control. 
In Proceedings of the Workshop on Temperature-Aware Computer Systems (TACS\u201905)."},{"doi-asserted-by":"publisher","key":"e_1_3_4_16_2","DOI":"10.1016\/j.physd.2006.09.027"},{"doi-asserted-by":"publisher","key":"e_1_3_4_17_2","DOI":"10.1103\/PhysRevLett.96.040601"},{"doi-asserted-by":"publisher","key":"e_1_3_4_18_2","DOI":"10.1287\/ijoc.2017.0798"},{"key":"e_1_3_4_19_2","article-title":"On the evolution of random graphs","volume":"5","author":"Erd\u0151s P.","year":"1960","unstructured":"P. Erd\u0151s and A. R\u00e9nyi. 1960. On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci. 5 (1960), 17\u201361.","journal-title":"Publ. Math. Inst. Hung. Acad. Sci."},{"doi-asserted-by":"publisher","key":"e_1_3_4_20_2","DOI":"10.1109\/MICRO.2004.29"},{"doi-asserted-by":"publisher","key":"e_1_3_4_21_2","DOI":"10.1109\/ICCD.2004.1347965"},{"doi-asserted-by":"publisher","key":"e_1_3_4_22_2","DOI":"10.1109\/4.668985"},{"doi-asserted-by":"publisher","key":"e_1_3_4_23_2","DOI":"10.1017\/CBO9781316339831"},{"issue":"2","key":"e_1_3_4_24_2","article-title":"The Intel Pentium M processor: Microarchitecture and performance","volume":"7","author":"Gochman S.","year":"2003","unstructured":"S. Gochman, R. Ronen, I. Anati, A. Berkovits, T. Kurts, A. Naveh, A. Saeed, Z. Sperber, and R. C. Valentine. 2003. The Intel Pentium M processor: Microarchitecture and performance. Intel Technol. J. 7, 2 (May 2003), 21\u201336.","journal-title":"Intel Technol. J."},{"key":"e_1_3_4_25_2","doi-asserted-by":"crossref","first-page":"54","DOI":"10.1007\/978-3-031-01729-2","volume-title":"Processor Microarchitecture: An Implementation Perspective","author":"Gonz\u00e1lez A.","year":"2011","unstructured":"A. Gonz\u00e1lez, F. Latorre, and G. Magklis. 2011. Processor Microarchitecture: An Implementation Perspective. 
Morgan & Claypool, Chapter 6, 54."},{"doi-asserted-by":"publisher","key":"e_1_3_4_26_2","DOI":"10.1109\/MC.2008.209"},{"key":"e_1_3_4_27_2","volume-title":"Advanced Microarchitecture and Circuit Design Techniques for On-chip Memories in CMOS Technology","author":"Hsu S. K.","year":"2006","unstructured":"S. K. Hsu. 2006. Advanced Microarchitecture and Circuit Design Techniques for On-chip Memories in CMOS Technology. Ph.D. Dissertation. Oregon State University."},{"doi-asserted-by":"publisher","key":"e_1_3_4_28_2","DOI":"10.1109\/MM.2011.42"},{"unstructured":"Intel. 2021. Intel 64 and IA-32 architectures optimization reference manual. Retrieved from https:\/\/www.intel.com.","key":"e_1_3_4_29_2"},{"doi-asserted-by":"publisher","key":"e_1_3_4_30_2","DOI":"10.5555\/290940.290985"},{"doi-asserted-by":"publisher","key":"e_1_3_4_31_2","DOI":"10.1145\/859618.859623"},{"doi-asserted-by":"publisher","key":"e_1_3_4_32_2","DOI":"10.1109\/HPCA.2005.3"},{"doi-asserted-by":"publisher","key":"e_1_3_4_33_2","DOI":"10.1109\/MICRO.2003.1253185"},{"doi-asserted-by":"publisher","key":"e_1_3_4_34_2","DOI":"10.1109\/CMPCON.1997.584667"},{"issue":"1","key":"e_1_3_4_35_2","article-title":"The McPAT framework for multicore and manycore architectures: Simultaneously modeling power, area and timing","volume":"10","author":"Li S.","year":"2013","unstructured":"S. Li, J. H. Ahn, R. D. Strong, J. B. Brockman, D. M. Tullsen, and N. P. Jouppi. 2013. The McPAT framework for multicore and manycore architectures: Simultaneously modeling power, area and timing. ACM Trans. Arch. Code Optim. 10, 1 (April 2013).","journal-title":"ACM Trans. Arch. Code Optim."},{"doi-asserted-by":"publisher","key":"e_1_3_4_36_2","DOI":"10.1109\/ICCAD.2011.6105405"},{"doi-asserted-by":"publisher","key":"e_1_3_4_37_2","DOI":"10.5555\/998680.1006728"},{"doi-asserted-by":"publisher","key":"e_1_3_4_38_2","DOI":"10.1145\/1065010.1065034"},{"key":"e_1_3_4_39_2","volume-title":"Power Aware Computing","author":"Martin A. 
J.","year":"2002","unstructured":"A. J. Martin, M. Nystr\u00f6m, and P. I. P\u00e9nzes. 2002. \\( ET^2 \\) : A metric for time and energy efficiency of computation. In Power Aware Computing, R. Graybill and R. Melhem (Eds.). Springer."},{"doi-asserted-by":"publisher","key":"e_1_3_4_40_2","DOI":"10.1109\/JSSC.2011.2167823"},{"key":"e_1_3_4_41_2","article-title":"Scheduling operations using a dependency matrix","author":"Merchant A. A.","year":"1998","unstructured":"A. A. Merchant and D. S. Sager. 1998. Scheduling operations using a dependency matrix. US patent 6334182. (Aug. 1998).","journal-title":"US patent 6334182"},{"doi-asserted-by":"publisher","key":"e_1_3_4_42_2","DOI":"10.1109\/ISPASS.2003.1190247"},{"doi-asserted-by":"publisher","key":"e_1_3_4_43_2","DOI":"10.1109\/HPCA.2016.7446087"},{"doi-asserted-by":"publisher","key":"e_1_3_4_44_2","DOI":"10.1002\/cpe.4666"},{"doi-asserted-by":"publisher","key":"e_1_3_4_45_2","DOI":"10.1109\/MICRO.1999.809456"},{"doi-asserted-by":"publisher","key":"e_1_3_4_46_2","DOI":"10.1109\/ICPP.2002.1040854"},{"key":"e_1_3_4_47_2","volume-title":"Networks (2nd ed.)","author":"Newman M.","year":"2018","unstructured":"M. Newman. 2018. Networks (2nd ed.). Oxford University Press."},{"issue":"12","key":"e_1_3_4_48_2","article-title":"A 7-nm 6R6W register file with double-pumped read and write operations for high-bandwidth memory in machine learning and CPU processors","volume":"1","author":"Nguyen H.","year":"2018","unstructured":"H. Nguyen, J. Jeong, F. Atallah, D. Yingling, and K. Bowman. 2018. A 7-nm 6R6W register file with double-pumped read and write operations for high-bandwidth memory in machine learning and CPU processors. IEEE Solid-State Circ. Lett. 1, 12 (December 2018), 225\u2013228.","journal-title":"IEEE Solid-State Circ. 
Lett."},{"doi-asserted-by":"publisher","key":"e_1_3_4_49_2","DOI":"10.1109\/MICRO.2002.1176248"},{"doi-asserted-by":"publisher","key":"e_1_3_4_50_2","DOI":"10.1109\/HPCA.2016.7446105"},{"doi-asserted-by":"publisher","key":"e_1_3_4_51_2","DOI":"10.1145\/2749469.2749470"},{"key":"e_1_3_4_52_2","volume-title":"Proceedings of the International Solid-State Circuits Conference (ISSCC\u201902)","author":"Preston R. P.","year":"2002","unstructured":"R. P. Preston, R. W. Badeau, D. W. Bailey, S. L. Bell, L. L. Biro, W. J. Bowhill, D. E. Dever, S. Felix, R. Gammack, V. Germini, M. K. Gowan, P. Gronowski, D. B. Jackson, S. Mehta, S. V. Morton, J. D. Pickholtz, M. H. Reilly, and M. J. Smith. 2002. Design of an 8-wide superscalar RISC microprocessor with simultaneous multithreading. In Proceedings of the International Solid-State Circuits Conference (ISSCC\u201902)."},{"doi-asserted-by":"publisher","key":"e_1_3_4_53_2","DOI":"10.1145\/1250662.1250709"},{"doi-asserted-by":"publisher","key":"e_1_3_4_54_2","DOI":"10.1109\/MICRO.1999.809439"},{"doi-asserted-by":"publisher","key":"e_1_3_4_55_2","DOI":"10.1145\/339647.339668"},{"doi-asserted-by":"publisher","key":"e_1_3_4_56_2","DOI":"10.1109\/MM.2022.3164338"},{"doi-asserted-by":"publisher","key":"e_1_3_4_57_2","DOI":"10.1145\/1250662.1250704"},{"key":"e_1_3_4_58_2","article-title":"A case for (partially) tagged geometric history length branch prediction","volume":"8","author":"Seznec A.","year":"2006","unstructured":"A. Seznec and P. Michaud. 2006. A case for (partially) tagged geometric history length branch prediction. J. Instr. Level Parall. 8 (February 2006).","journal-title":"J. Instr. 
Level Parall."},{"doi-asserted-by":"publisher","key":"e_1_3_4_59_2","DOI":"10.1145\/2830772.2830831"},{"doi-asserted-by":"publisher","key":"e_1_3_4_60_2","DOI":"10.1109\/MICRO.2002.1176265"},{"doi-asserted-by":"publisher","key":"e_1_3_4_61_2","DOI":"10.1109\/MICRO.2010.43"},{"doi-asserted-by":"publisher","key":"e_1_3_4_62_2","DOI":"10.1147\/JRD.2014.2376112"},{"doi-asserted-by":"publisher","key":"e_1_3_4_63_2","DOI":"10.1145\/859618.859620"},{"doi-asserted-by":"publisher","key":"e_1_3_4_64_2","DOI":"10.1109\/HPCA.2018.00031"},{"key":"e_1_3_4_65_2","volume-title":"Proceedings of the International Symposium on Performance Analysis of Systems and Software (ISPASS\u201904)","author":"Tram L.","year":"2004","unstructured":"L. Tram, N. Nelson, F. Ngai, S. Dropsho, and M. Huang. 2004. Dynamically reducing pressure on the physical register file through simple register sharing. In Proceedings of the International Symposium on Performance Analysis of Systems and Software (ISPASS\u201904)."},{"doi-asserted-by":"publisher","key":"e_1_3_4_66_2","DOI":"10.1145\/859618.859627"},{"doi-asserted-by":"publisher","key":"e_1_3_4_67_2","DOI":"10.1109\/PACT.1996.552666"},{"key":"e_1_3_4_68_2","volume-title":"CMOS VLSI Design: A Circuits and Systems Perspective (4th ed.)","author":"Weste N. H. E.","year":"2010","unstructured":"N. H. E. Weste and D. M. Harris. 2010. CMOS VLSI Design: A Circuits and Systems Perspective (4th ed.). Addison-Wesley."},{"unstructured":"Wikipedia. 2022. Bipartite Graph. Retrieved from https:\/\/en.wikipedia.org\/wiki\/Bipartite_graph.","key":"e_1_3_4_69_2"},{"unstructured":"Wikipedia. 2022. Birthday Problem. Retrieved from https:\/\/en.wikipedia.org\/wiki\/Birthday_problem.","key":"e_1_3_4_70_2"},{"unstructured":"Wikipedia. 2022. Maximum Cut. Retrieved from https:\/\/en.wikipedia.org\/wiki\/Maximum_cut.","key":"e_1_3_4_71_2"},{"unstructured":"Wikipedia. 2022. Multigraph. 
Retrieved from https:\/\/en.wikipedia.org\/wiki\/Multigraph.","key":"e_1_3_4_72_2"},{"doi-asserted-by":"publisher","key":"e_1_3_4_73_2","DOI":"10.1109\/40.491460"},{"doi-asserted-by":"publisher","key":"e_1_3_4_74_2","DOI":"10.1109\/MM.2007.37"},{"doi-asserted-by":"publisher","key":"e_1_3_4_75_2","DOI":"10.1109\/ICCD.1995.528826"},{"key":"e_1_3_4_76_2","article-title":"Method and apparatus for scheduling instructions in waves","author":"Zaidi N.","year":"1997","unstructured":"N. Zaidi, G. Hammond, and K. Shoemaker. 1997. Method and apparatus for scheduling instructions in waves. US patent 6016540. (January 1997).","journal-title":"US patent 6016540"},{"doi-asserted-by":"publisher","key":"e_1_3_4_77_2","DOI":"10.1145\/505306.505313"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3544838","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3544838","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:00:06Z","timestamp":1750186806000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3544838"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,9,16]]},"references-count":76,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2022,12,31]]}},"alternative-id":["10.1145\/3544838"],"URL":"https:\/\/doi.org\/10.1145\/3544838","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2022,9,16]]},"assertion":[{"value":"2022-03-10","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2022-06-10","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-09-16","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}