{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:20:35Z","timestamp":1750306835589,"version":"3.41.0"},"reference-count":44,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2013,12,1]],"date-time":"2013-12-01T00:00:00Z","timestamp":1385856000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000144","name":"Division of Computer and Network Systems","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000144","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100004316","name":"International Business Machines Corporation","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100004316","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000028","name":"Semiconductor Research Corporation","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000028","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000143","name":"Division of Computing and Communication Foundations","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000143","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2013,12]]},"abstract":"<jats:p>With the emergence of multicore processors, various aggressive execution models have been proposed to exploit fine-grained thread-level parallelism, taking advantage of the fast on-chip interconnection communication. 
However, the aggressive nature of these execution models often leads to excessive energy consumption incommensurate to execution time reduction. In the context of Thread-Level Speculation, we demonstrated that on a same-ISA heterogeneous multicore system, by dynamically deciding how on-chip resources are utilized, speculative threads can achieve performance gain in an energy-efficient way.<\/jats:p>\n          <jats:p>Through a systematic design space exploration, we built a multicore architecture that integrates heterogeneous components of processing cores and first-level caches. To cope with processor reconfiguration overheads, we introduced runtime mechanisms to mitigate their impacts. To match program execution with the most energy-efficient processor configuration, the system was equipped with a dynamic resource allocation scheme that characterizes program behaviors using novel processor counters.<\/jats:p>\n          <jats:p>We evaluated the proposed heterogeneous system with a diverse set of benchmark programs from SPEC CPU2000 and CPU2006 suites. Compared to the most efficient homogeneous TLS implementation, we achieved similar performance but consumed 18% less energy. 
Compared to the most efficient homogeneous uniprocessor running sequential programs, we improved performance by 29% and reduced energy consumption by 3.6%, which is a 42% improvement in energy-delay-squared product.<\/jats:p>","DOI":"10.1145\/2541228.2541233","type":"journal-article","created":{"date-parts":[[2014,1,2]],"date-time":"2014-01-02T13:09:43Z","timestamp":1388668183000},"page":"1-29","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["The design and implementation of heterogeneous multicore systems for energy-efficient speculative thread execution"],"prefix":"10.1145","volume":"10","author":[{"given":"Yangchun","family":"Luo","sequence":"first","affiliation":[{"name":"Advanced Micro Devices Inc., Sunnyvale, CA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Wei-Chung","family":"Hsu","sequence":"additional","affiliation":[{"name":"National Chiao Tung University, Taiwan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Antonia","family":"Zhai","sequence":"additional","affiliation":[{"name":"University of Minnesota, Twin Cities"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2013,12]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/1128022.1128029"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/1555754.1555792"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/339647.339657"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/1168857.1168893"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2012.23"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/2000064.2000067"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/1508244.1508260"},{"volume-title":"Proceedings of the 36th Annual IEEE\/ACM International Symposium on Microarchitecture 
(MICRO\u201903)","author":"Isci C.","key":"e_1_2_1_8_1","unstructured":"Isci, C. and Martonosi, M. 2003. Runtime power monitoring in high-end processors: Methodology and empirical data. In Proceedings of the 36th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201903). IEEE Computer Society, Washington, DC, 93."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/996841.996851"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/1229428.1229474"},
{"volume-title":"Proceedings of the 36th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201903)","author":"Kumar R.","key":"e_1_2_1_11_1","unstructured":"Kumar, R., Farkas, K. I., Jouppi, N. P., Ranganathan, P., and Tullsen, D. M. 2003. Single-ISA heterogeneous multi-core architectures: The potential for processor power reduction. In Proceedings of the 36th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201903). IEEE Computer Society, Washington, DC, 81."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/1152154.1152162"},
{"volume-title":"Proceedings of the 31st Annual International Symposium on Computer Architecture (ISCA\u201904)","author":"Kumar R.","key":"e_1_2_1_13_1","unstructured":"Kumar, R., Tullsen, D. M., Ranganathan, P., Jouppi, N. P., and Farkas, K. I. 2004. Single-ISA heterogeneous multi-core architectures for multithreaded workload performance. In Proceedings of the 31st Annual International Symposium on Computer Architecture (ISCA\u201904). IEEE Computer Society, Washington, DC, 64."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/1122971.1122997"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/263326.263382"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/1065010.1065034"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/1854273.1854329"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/1555754.1555812"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/2355585.2355586"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/PACT.2009.44"},{"key":"e_1_2_1_21_1","unstructured":"Open64 Developers. 2001. Open64 compiler and tools. http:\/\/www.open64.net."},
{"volume-title":"Proceedings of the International Conference on Computer Design.","author":"Packirisamy V.","key":"e_1_2_1_23_1","unstructured":"Packirisamy, V., Luo, Y., Hung, W.-L., Zhai, A., Yew, P.-C., and Ngai, T.-F. 2008. Efficiency of thread-level speculation in SMT and CMP architectures\u2014performance, power and thermal perspective. In Proceedings of the International Conference on Computer Design."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1007\/11945918_19"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/2086696.2086707"},
{"volume-title":"Proceedings of the 20th International Conference on Parallel and Distributed Processing (IPDPS\u201906)","author":"Perelman E.","key":"e_1_2_1_26_1","unstructured":"Perelman, E., Polito, M., Bouguet, J.-Y., Sampson, J., Calder, B., and Dulong, C. 2006. Detecting phases in parallel applications on shared memory architectures. In Proceedings of the 20th International Conference on Parallel and Distributed Processing (IPDPS\u201906). IEEE Computer Society, Washington, DC, 88."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/1088149.1088178"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2005.28"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/L-CA.2005.2"},{"key":"e_1_2_1_30_1","unstructured":"Shivakumar P. and Jouppi N. 2001. CACTI 3.0: An integrated cache timing power and area model. Tech. rep. Compaq Computer Corporation."},{"key":"e_1_2_1_31_1","unstructured":"SimpleScalar LLC. 2004. The SimpleScalar tool set. http:\/\/www.simplescalar.com\/."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/223982.224451"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/1082469.1082471"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/339647.339650"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/1508244.1508274"},
{"volume-title":"Computer Design, 2007. ICCD 2007. 25th International Conference on. 409--416","author":"Tuck J.","key":"e_1_2_1_36_1","unstructured":"Tuck, J., Liu, W., and Torrellas, J. 2007. Cap: Criticality analysis for power-efficient speculative multithreading. In Computer Design, 2007. ICCD 2007. 25th International Conference on. 409--416."},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/232973.232993"},
{"volume-title":"Proceedings of the 31st Annual ACM\/IEEE International Symposium on Microarchitecture (MICRO\u201998)","author":"Vijaykumar T. N.","key":"e_1_2_1_38_1","unstructured":"Vijaykumar, T. N. and Sohi, G. S. 1998. Task selection for a multiscalar processor. In Proceedings of the 31st Annual ACM\/IEEE International Symposium on Microarchitecture (MICRO\u201998). IEEE, Los Alamitos, CA, 81--92."},
{"volume-title":"Proceedings of the 35th Annual ACM\/IEEE International Symposium on Microarchitecture (MICRO\u201902)","author":"Wang H.-S.","key":"e_1_2_1_39_1","unstructured":"Wang, H.-S., Zhu, X., Peh, L.-S., and Malik, S. 2002. Orion: A power-performance simulator for interconnection networks. In Proceedings of the 35th Annual ACM\/IEEE International Symposium on Microarchitecture (MICRO\u201902). IEEE, Los Alamitos, CA, 294--305."},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-69330-7_20"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2005.43"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2005.7"},
{"volume-title":"Proceedings of the 8th International Symposium on High-Performance Computer Architecture (HPCA\u201902)","author":"Yang S.-H.","key":"e_1_2_1_44_1","unstructured":"Yang, S.-H., Falsafi, B., Powell, M. D., and Vijaykumar, T. N. 2002. Exploiting choice in resizable cache design to optimize deep-submicron processor energy-delay. In Proceedings of the 8th International Symposium on High-Performance Computer Architecture (HPCA\u201902). IEEE Computer Society, Washington, DC, 151."},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/605397.605416"},
{"volume-title":"Proceedings of the International Symposium on Code Generation and Optimization (CGO\u201904)","author":"Zhai A.","key":"e_1_2_1_46_1","unstructured":"Zhai, A., Colohan, C. B., Steffan, J. G., and Mowry, T. C. 2004. Compiler optimization of memory-resident value communication between speculative threads. In Proceedings of the International Symposium on Code Generation and Optimization (CGO\u201904). IEEE Computer Society, Washington, DC, 39."}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2541228.2541233","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2541228.2541233","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T08:09:55Z","timestamp":1750234195000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2541228.2541233"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,12]]},"references-count":44,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2013,12]]}},"alternative-id":["10.1145\/2541228.2541233"],"URL":"https:\/\/doi.org\/10.1145\/2541228.2541233","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2013,12]]},"assertion":[{"value":"2012-06-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2013-08-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2013-12-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}