{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,21]],"date-time":"2025-09-21T17:03:33Z","timestamp":1758474213460,"version":"3.41.0"},"reference-count":47,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2015,8,31]],"date-time":"2015-08-31T00:00:00Z","timestamp":1440979200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100000781","name":"European Research Council","doi-asserted-by":"publisher","award":["267175"],"award-info":[{"award-number":["267175"]}],"id":[{"id":"10.13039\/501100000781","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2015,10,6]]},"abstract":"<jats:p>During the past 10 years, the clock frequency of high-end superscalar processors has not increased. Performance keeps growing mainly by integrating more cores on the same chip and by introducing new instruction set extensions. However, this benefits only some applications and requires rewriting and\/or recompiling these applications. A more general way to accelerate applications is to increase the IPC, the number of instructions executed per cycle. Although the focus of academic microarchitecture research moved away from IPC techniques, the IPC of commercial processors was continuously improved during these years.<\/jats:p>\n          <jats:p>We argue that some of the benefits of technology scaling should be used to raise the IPC of future superscalar cores further. Starting from microarchitecture parameters similar to recent commercial high-end cores, we show that an effective way to increase the IPC is to allow the out-of-order engine to issue more micro-ops per cycle. But this must be done without impacting the clock cycle. We propose combining two techniques: clustering and register write specialization. Past research on clustered microarchitectures focused on narrow issue clusters, as the emphasis at that time was on allowing high clock frequencies.<\/jats:p>\n          <jats:p>Instead, in this study, we consider wide issue clusters, with the goal of increasing the IPC under a constant clock frequency. We show that on a wide issue dual cluster, a very simple steering policy that sends 64 consecutive instructions to the same cluster, the next 64 instructions to the other cluster, and so forth, permits tolerating an intercluster delay of three cycles. We also propose a method for decreasing the energy cost of sending results from one cluster to the other cluster.<\/jats:p>","DOI":"10.1145\/2800787","type":"journal-article","created":{"date-parts":[[2015,9,1]],"date-time":"2015-09-01T13:41:09Z","timestamp":1441114869000},"page":"1-22","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["Revisiting Clustered Microarchitecture for Future Superscalar Cores"],"prefix":"10.1145","volume":"12","author":[{"given":"Pierre","family":"Michaud","sequence":"first","affiliation":[{"name":"IRISA\/Inria, Rennes Cedex, France"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Andrea","family":"Mondelli","sequence":"additional","affiliation":[{"name":"IRISA\/Inria, Rennes Cedex, France"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Andr\u00e9","family":"Seznec","sequence":"additional","affiliation":[{"name":"IRISA\/Inria, Rennes Cedex, France"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2015,8,31]]},"reference":[{"key":"e_1_2_2_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/360128.360165"},{"key":"e_1_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.1147\/rd.502.0287"},{"key":"e_1_2_2_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/1815961.1816000"},{"key":"e_1_2_2_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/1880043.1880046"},{"volume-title":"Proceedings of the International Symposium on Parallel and Distributed Processing (IPDPS\u201908)","author":"Cai Q.","key":"e_1_2_2_5_1","unstructured":"Q. Cai , J. M. Codina , J. Gonz\u00e1lez , and A. Gonz\u00e1lez . 2008. A software-hardware hybrid steering mechanism for clustered microarchitectures . In Proceedings of the International Symposium on Parallel and Distributed Processing (IPDPS\u201908) . Q. Cai, J. M. Codina, J. Gonz\u00e1lez, and A. Gonz\u00e1lez. 2008. A software-hardware hybrid steering mechanism for clustered microarchitectures. In Proceedings of the International Symposium on Parallel and Distributed Processing (IPDPS\u201908)."},{"volume-title":"Proceedings of the International Symposium on Computer Architecture (ISCA\u201904)","author":"Cain H. W.","key":"e_1_2_2_6_1","unstructured":"H. W. Cain and M. H. Lipasti . 2004. Memory ordering: A value-based approach . In Proceedings of the International Symposium on Computer Architecture (ISCA\u201904) . H. W. Cain and M. H. Lipasti. 2004. Memory ordering: A value-based approach. In Proceedings of the International Symposium on Computer Architecture (ISCA\u201904)."},{"volume-title":"Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT\u201999)","author":"Canal R.","key":"e_1_2_2_7_1","unstructured":"R. Canal , J.-M. Parcerisa , and A. Gonz\u00e1lez . 1999. A cost-effective clustered architecture . In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT\u201999) . R. Canal, J.-M. Parcerisa, and A. Gonz\u00e1lez. 1999. A cost-effective clustered architecture. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT\u201999)."},{"volume-title":"Proceedings of the International Symposium on High Performance Computer Architecture (HPCA\u201900)","author":"Canal R.","key":"e_1_2_2_8_1","unstructured":"R. Canal , J. M. Parcerisa , and A. Gonz\u00e1lez . 2000. Dynamic cluster assignment mechanisms . In Proceedings of the International Symposium on High Performance Computer Architecture (HPCA\u201900) . R. Canal, J. M. Parcerisa, and A. Gonz\u00e1lez. 2000. Dynamic cluster assignment mechanisms. In Proceedings of the International Symposium on High Performance Computer Architecture (HPCA\u201900)."},{"key":"e_1_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/279358.279378"},{"key":"e_1_2_2_10_1","unstructured":"S. Curtis R. J. Murray and H. Opie. 1999. Multiported bypass cache in a bypass network. U.S. Patent 6000016.  S. Curtis R. J. Murray and H. Opie. 1999. Multiported bypass cache in a bypass network. U.S. Patent 6000016."},{"volume-title":"Proceedings of the International Symposium on Computer Architecture (ISCA\u201914)","author":"Czechowski K.","key":"e_1_2_2_11_1","unstructured":"K. Czechowski , V. W. Lee , E. Grochowski , and R. Ronnen . 2014. Improving the energy efficiency of big cores . In Proceedings of the International Symposium on Computer Architecture (ISCA\u201914) . K. Czechowski, V. W. Lee, E. Grochowski, and R. Ronnen. 2014. Improving the energy efficiency of big cores. In Proceedings of the International Symposium on Computer Architecture (ISCA\u201914)."},{"key":"e_1_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/JSSC.1974.1050511"},{"key":"e_1_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/2000064.2000108"},{"key":"e_1_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/2541940.2541954"},{"volume-title":"Proceedings of the International Symposium on Microarchitecture (MICRO\u201997)","author":"Farkas K. I.","key":"e_1_2_2_15_1","unstructured":"K. I. Farkas , P. Chow , N. P. Jouppi , and Z. Vranesic . 1997. The multicluster architecture: Reducing cycle time through partitioning . In Proceedings of the International Symposium on Microarchitecture (MICRO\u201997) . K. I. Farkas, P. Chow, N. P. Jouppi, and Z. Vranesic. 1997. The multicluster architecture: Reducing cycle time through partitioning. In Proceedings of the International Symposium on Microarchitecture (MICRO\u201997)."},{"key":"e_1_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/4.668985"},{"key":"e_1_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/379240.379253"},{"volume-title":"IEEE International Solid-State Circuits Conference (ISSCC\u201911)","author":"Golden M.","key":"e_1_2_2_18_1","unstructured":"M. Golden , S. Arekapudi , and J. Vinh . 2011. 40-entry unified out-of-order scheduler and integer execution unit for AMD Bulldozer x86-64 core . In IEEE International Solid-State Circuits Conference (ISSCC\u201911) . M. Golden, S. Arekapudi, and J. Vinh. 2011. 40-entry unified out-of-order scheduler and integer execution unit for AMD Bulldozer x86-64 core. In IEEE International Solid-State Circuits Conference (ISSCC\u201911)."},{"key":"e_1_2_2_19_1","doi-asserted-by":"crossref","unstructured":"A. Gonz\u00e1lez F. Latorre and G. Magklis. 2011. Execute. Processor Microarchitecture. Morgan and Claypool 78--90.  A. Gonz\u00e1lez F. Latorre and G. Magklis. 2011. Execute. Processor Microarchitecture. Morgan and Claypool 78--90.","DOI":"10.1007\/978-3-031-01729-2_7"},{"key":"e_1_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/1054943.1054950"},{"volume-title":"Proceedings of the International Symposium on Microarchitecture (MICRO\u201901)","author":"Goshima M.","key":"e_1_2_2_21_1","unstructured":"M. Goshima , K. Nishino , Y. Nakashima , S. I. Mori , T. Kitamura , and S. Tomita . 2001. A high-speed dynamic instruction scheduling scheme for superscalar processors . In Proceedings of the International Symposium on Microarchitecture (MICRO\u201901) . M. Goshima, K. Nishino, Y. Nakashima, S. I. Mori, T. Kitamura, and S. Tomita. 2001. A high-speed dynamic instruction scheduling scheme for superscalar processors. In Proceedings of the International Symposium on Microarchitecture (MICRO\u201901)."},{"key":"e_1_2_2_22_1","unstructured":"W. Herrick. 2000. Design Challenges in Multi-GHz Microprocessors. Keynote address at the Asia and South Pacific Design Automation Conference (ASP-DAC\u201900).  W. Herrick. 2000. Design Challenges in Multi-GHz Microprocessors. Keynote address at the Asia and South Pacific Design Automation Conference (ASP-DAC\u201900)."},{"volume-title":"Intel 64 and IA-32 Architectures Optimization Reference Manual","key":"e_1_2_2_23_1","unstructured":"Intel. 2014. Intel 64 and IA-32 Architectures Optimization Reference Manual . Intel Corp . Intel. 2014. Intel 64 and IA-32 Architectures Optimization Reference Manual. Intel Corp."},{"key":"e_1_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/1250662.1250686"},{"key":"e_1_2_2_25_1","volume-title":"Retrieved","author":"ITRS.","year":"2013","unstructured":"ITRS. 2013 . International Technology Roadmap for Semiconductors\u2014Process Integration, Devices, and Structures . Retrieved July 30, 2015, from http:\/\/www.itrs.net\/. ITRS. 2013. International Technology Roadmap for Semiconductors\u2014Process Integration, Devices, and Structures. Retrieved July 30, 2015, from http:\/\/www.itrs.net\/."},{"volume-title":"Proceedings of the International Symposium on Computer Architecture (ISCA\u201904)","author":"Karkhanis T. S.","key":"e_1_2_2_26_1","unstructured":"T. S. Karkhanis and J. E. Smith . 2004. A first-order superscalar processor model . In Proceedings of the International Symposium on Computer Architecture (ISCA\u201904) . T. S. Karkhanis and J. E. Smith. 2004. A first-order superscalar processor model. In Proceedings of the International Symposium on Computer Architecture (ISCA\u201904)."},{"key":"e_1_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/40.755465"},{"key":"e_1_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF01205182"},{"key":"e_1_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/1065010.1065034"},{"key":"e_1_2_2_30_1","volume-title":"Technical Report RR-5744. Inria.","author":"Michaud P.","year":"2005","unstructured":"P. Michaud , Y. Sazeides , A. Seznec , T. Constantinou , and D. Fetis . 2005 . An Analytical Model of Temperature in Microprocessors . Technical Report RR-5744. Inria. P. Michaud, Y. Sazeides, A. Seznec, T. Constantinou, and D. Fetis. 2005. An Analytical Model of Temperature in Microprocessors. Technical Report RR-5744. Inria."},{"key":"e_1_2_2_31_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1026431920605"},{"key":"e_1_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/264107.264201"},{"volume-title":"Proceedings of the IEEE International Solid-State Circuits Conference (ISSCC\u201902)","author":"Preston R. P.","key":"e_1_2_2_33_1","unstructured":"R. P. Preston , R. W. Badeau , D. W. Bailey , S. L. Bell , L. L. Biro , W. J. Bowhill , D. E. Dever , S. Felix , R. Gammack , V. Germini , M. K. Gowan , P. Gronowski , D. B. Jackson , S. Mehta , S. V. Morton , J. D. Pickholtz , M. H. Reilly , and M. J. Smith . 2002. Design of an 8-wide superscalar RISC microprocessor with simultaneous multithreading . In Proceedings of the IEEE International Solid-State Circuits Conference (ISSCC\u201902) . R. P. Preston, R. W. Badeau, D. W. Bailey, S. L. Bell, L. L. Biro, W. J. Bowhill, D. E. Dever, S. Felix, R. Gammack, V. Germini, M. K. Gowan, P. Gronowski, D. B. Jackson, S. Mehta, S. V. Morton, J. D. Pickholtz, M. H. Reilly, and M. J. Smith. 2002. Design of an 8-wide superscalar RISC microprocessor with simultaneous multithreading. In Proceedings of the IEEE International Solid-State Circuits Conference (ISSCC\u201902)."},{"key":"e_1_2_2_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/T-C.1972.223514"},{"volume-title":"Proceedings of the International Symposium on Microarchitecture (MICRO\u201996)","author":"Rotenberg E.","key":"e_1_2_2_36_1","unstructured":"E. Rotenberg , S. Bennett , and J. E. Smith . 1996. Trace cache: A low latency approach to high bandwidth instruction fetching . In Proceedings of the International Symposium on Microarchitecture (MICRO\u201996) . E. Rotenberg, S. Bennett, and J. E. Smith. 1996. Trace cache: A low latency approach to high bandwidth instruction fetching. In Proceedings of the International Symposium on Microarchitecture (MICRO\u201996)."},{"volume-title":"Proceedings of the International Symposium on Microarchitecture (MICRO\u201997)","author":"Rotenberg E.","key":"e_1_2_2_37_1","unstructured":"E. Rotenberg , Q. Jacobson , Y. Sazeides , and J. E. Smith . 1997. Trace processors . In Proceedings of the International Symposium on Microarchitecture (MICRO\u201997) . E. Rotenberg, Q. Jacobson, Y. Sazeides, and J. E. Smith. 1997. Trace processors. In Proceedings of the International Symposium on Microarchitecture (MICRO\u201997)."},{"key":"e_1_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2005.6"},{"volume-title":"Proceedings of the International Symposium on Computer Architecture (ISCA\u201902)","author":"Seznec A.","key":"e_1_2_2_39_1","unstructured":"A. Seznec , S. Felix , V. Krishnan , and Y. Sazeides . 2002a. Design tradeoffs for the alpha EV8 conditional branch predictor . In Proceedings of the International Symposium on Computer Architecture (ISCA\u201902) . A. Seznec, S. Felix, V. Krishnan, and Y. Sazeides. 2002a. Design tradeoffs for the alpha EV8 conditional branch predictor. In Proceedings of the International Symposium on Computer Architecture (ISCA\u201902)."},{"volume-title":"Proceedings of the International Symposium on Microarchitecture (MICRO\u201902)","author":"Seznec A.","key":"e_1_2_2_40_1","unstructured":"A. Seznec , E. Toullec , and O. Rochecouste . 2002b. Register write specialization register read specialization: A path to complexity-effective wide-issue superscalar processors . In Proceedings of the International Symposium on Microarchitecture (MICRO\u201902) . A. Seznec, E. Toullec, and O. Rochecouste. 2002b. Register write specialization register read specialization: A path to complexity-effective wide-issue superscalar processors. In Proceedings of the International Symposium on Microarchitecture (MICRO\u201902)."},{"key":"e_1_2_2_41_1","first-page":"2006","article-title":"A case for (partially) tagged geometric history length branch prediction","volume":"8","author":"Seznec A.","year":"2006","unstructured":"A. Seznec and P. Michaud . 2006 . A case for (partially) tagged geometric history length branch prediction . Journal of Instruction-Level Parallelism , vol. 8 , February 2006 . A. Seznec and P. Michaud. 2006. A case for (partially) tagged geometric history length branch prediction. Journal of Instruction-Level Parallelism, vol. 8, February 2006.","journal-title":"Journal of Instruction-Level Parallelism"},{"key":"e_1_2_2_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2005.29"},{"key":"e_1_2_2_43_1","doi-asserted-by":"publisher","DOI":"10.1147\/JRD.2011.2127330"},{"key":"e_1_2_2_44_1","doi-asserted-by":"publisher","DOI":"10.1147\/JRD.2014.2376112"},{"key":"e_1_2_2_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2006.26"},{"key":"e_1_2_2_46_1","doi-asserted-by":"publisher","DOI":"10.1093\/qjmam\/10.4.482"},{"key":"e_1_2_2_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/264107.264119"},{"key":"e_1_2_2_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/12.910816"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2800787","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2800787","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T05:42:44Z","timestamp":1750225364000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2800787"}},"subtitle":["A Case for Wide Issue Clusters"],"short-title":[],"issued":{"date-parts":[[2015,8,31]]},"references-count":47,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2015,10,6]]}},"alternative-id":["10.1145\/2800787"],"URL":"https:\/\/doi.org\/10.1145\/2800787","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2015,8,31]]},"assertion":[{"value":"2015-04-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2015-06-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2015-08-31","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}