{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:24:46Z","timestamp":1750307086694,"version":"3.41.0"},"reference-count":49,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2012,9,1]],"date-time":"2012-09-01T00:00:00Z","timestamp":1346457600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000144","name":"Division of Computer and Network Systems","doi-asserted-by":"publisher","award":["CNS-0834599, EIA-0220021"],"award-info":[{"award-number":["CNS-0834599, EIA-0220021"]}],"id":[{"id":"10.13039\/100000144","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["CNS-0834599, EIA-0220021"],"award-info":[{"award-number":["CNS-0834599, EIA-0220021"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000028","name":"Semiconductor Research Corporation","doi-asserted-by":"publisher","award":["SRC-2008-TJ-1819"],"award-info":[{"award-number":["SRC-2008-TJ-1819"]}],"id":[{"id":"10.13039\/100000028","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2012,9]]},"abstract":"<jats:p>Efficiently utilizing multicore processors to improve their performance potentials demands extracting thread-level parallelism from the applications. Various novel and sophisticated execution models have been proposed to extract thread-level parallelism from sequential programs. 
One such execution model, Thread-Level Speculation (TLS), allows potentially dependent threads to execute speculatively in parallel.<\/jats:p>\n<jats:p>However, TLS execution is inherently unpredictable, and consequently incorrect speculation could degrade performance for the multicore systems. Existing approaches have focused on using the compilers to select sequential program regions to apply TLS. Our research shows that even the state-of-the-art compiler makes suboptimal decisions, due to the unpredictability of TLS execution. Thus, we propose to <jats:italic>dynamically<\/jats:italic> optimize TLS performance.<\/jats:p>\n<jats:p>This article describes the design, implementation, and evaluation of a runtime thread dispatching mechanism that adjusts the behaviors of speculative threads based on their efficiency. In the proposed system, speculative threads are monitored by hardware-based performance counters and their performance impact is evaluated with a novel methodology that takes into account various unique TLS characteristics. Thread dispatching policies are devised to adjust the behaviors of speculative threads accordingly.<\/jats:p>\n<jats:p>With the help of the runtime evaluation, where and how to create speculative threads is better determined. Evaluated with all the SPEC CPU2000 benchmark programs written in C, the dynamic dispatching system outperforms the state-of-the-art compiler-based thread management techniques by 9.4% on average. 
Comparing to sequential execution, we achieve 1.37X performance improvement on a four-core CMP-based system.<\/jats:p>","DOI":"10.1145\/2355585.2355586","type":"journal-article","created":{"date-parts":[[2012,10,2]],"date-time":"2012-10-02T13:50:00Z","timestamp":1349185800000},"page":"1-31","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["Dynamically dispatching speculative threads to improve sequential execution"],"prefix":"10.1145","volume":"9","author":[{"given":"Yangchun","family":"Luo","sequence":"first","affiliation":[{"name":"Advanced Micro Devices, Sunnyvale, CA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Antonia","family":"Zhai","sequence":"additional","affiliation":[{"name":"University of Minnesota, Minneapolis, MN"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2012,10,5]]},"reference":[{"volume-title":"Proceedings of the 31st Annual ACM\/IEEE International Symposium on Microarchitecture. IEEE, 226--236","author":"Akkary H.","key":"e_1_2_1_1_1"},{"volume-title":"Proceedings of the International Symposium on Code Generation and Optimization: Feedback-Directed And Runtime Optimization (CGO '03)","author":"Bruening D.","key":"e_1_2_1_2_1"},{"volume-title":"Proceedings of the 8th International Symposium on High-Performance Computer Architecture (HPCA '02)","author":"Cintra M.","key":"e_1_2_1_4_1"},{"volume-title":"Proceedings of the 34th Annual ACM\/IEEE International Symposium on Microarchitecture. IEEE, 306--317","author":"Collins J. D.","key":"e_1_2_1_5_1"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/99.660313"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/996841.996852"},{"volume-title":"Proceedings of the IFIP WG10","author":"Dubey P. 
K.","key":"e_1_2_1_8_1"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/1168857.1168880"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/139669.139703"},{"volume-title":"Proceedings of the ACM\/IEEE Conference on Supercomputing (Supercomputing '98)","author":"Gupta M.","key":"e_1_2_1_11_1"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/291069.291020"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/996841.996851"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/1229428.1229474"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/605397.605415"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2010.19"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/319838.319854"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2008.31"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/1122971.1122997"},{"volume-title":"Proceedings of the 36th Annual IEEE\/ACM International Symposium on Microarchitecture. IEEE, 180","author":"Lu J.","key":"e_1_2_1_20_1"},{"key":"e_1_2_1_21_1","unstructured":"Lu J. Chen H. Yew P.-C. and Chung Hsu W. 2004. Design and implementation of a lightweight dynamic optimization system. J. Instruct.-Level Parallel. 
6."},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2005.18"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/379240.379250"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/1065010.1065034"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/305138.305214"},{"volume-title":"Proceedings of the 8th International Symposium on High-Performance Computer Architecture (HPCA '02)","author":"Marcuello P.","key":"e_1_2_1_26_1"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/277830.277850"},{"key":"e_1_2_1_28_1","doi-asserted-by":"crossref","unstructured":"Mericas A. 2006. Performance monitoring on the POWER5 microprocessor. In Performance Evaluation and Benchmarking L. K. John and L. Eeckhout Eds. CRC Press 247--266.","DOI":"10.1201\/9781420037425.ch12"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2006.26"},{"key":"e_1_2_1_30_1","unstructured":"Open64 Developers. 2001. Open64 compiler and tools. http:\/\/www.open64.net."},{"volume-title":"Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT '99)","author":"Oplinger J. 
T.","key":"e_1_2_1_31_1"},{"volume-title":"Proceedings of the 20th International Conference on Parallel and Distributed Processing (IPDPS'06)","author":"Perelman E.","key":"e_1_2_1_32_1"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/360128.360155"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/1065010.1065043"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/1088149.1088178"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/1088149.1088173"},{"key":"e_1_2_1_37_1","unstructured":"SimpleScalar LLC. 2004. The SimpleScalar tool set. http:\/\/www.simplescalar.com\/."},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/223982.224451"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/1082469.1082471"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/339647.339650"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/1346281.1346317"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/858570.858576"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/378993.379247"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/12.795219"},{"volume-title":"Proceedings of the 4th International Symposium on High-Performance Computer Architecture. IEEE, 14","author":"Tubella J.","key":"e_1_2_1_45_1"},{"volume-title":"Proceedings of the 31st Annual ACM\/IEEE International Symposium on Microarchitecture. IEEE, 81--92","author":"Vijaykumar T. 
N.","key":"e_1_2_1_46_1"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-69330-7_20"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/605397.605416"},{"volume-title":"Proceedings of the International Symposium on Code Generation and Optimization: Feedback-Directed and Runtime Optimization (CGO '04)","author":"Zhai A.","key":"e_1_2_1_50_1"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1109\/PACT.2005.7"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2355585.2355586","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2355585.2355586","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T09:34:24Z","timestamp":1750239264000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2355585.2355586"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,9]]},"references-count":49,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2012,9]]}},"alternative-id":["10.1145\/2355585.2355586"],"URL":"https:\/\/doi.org\/10.1145\/2355585.2355586","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2012,9]]},"assertion":[{"value":"2010-08-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2012-05-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2012-10-05","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}