{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:38:43Z","timestamp":1750307923481,"version":"3.41.0"},"reference-count":50,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2008,5,1]],"date-time":"2008-05-01T00:00:00Z","timestamp":1209600000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2008,5]]},"abstract":"<jats:p>Thread-level speculation (TLS) allows us to automatically parallelize general-purpose programs by supporting parallel execution of threads that might not actually be independent. In this article, we focus on one important limitation of program performance under TLS, which stalls as a result of synchronizing and forwarding scalar values between speculative threads that would otherwise cause frequent data dependences and, hence, failed speculation. Using SPECint benchmarks that have been automatically transformed by our compiler to exploit TLS, we present, evaluate in detail, and compare both compiler and hardware techniques for improving the communication of scalar values. We find that through our dataflow algorithms for three increasingly aggressive instruction scheduling techniques, the compiler can drastically reduce the <jats:italic>critical forwarding path<\/jats:italic> introduced by the synchronization and forwarding of scalar values. 
We also show that hardware techniques for reducing synchronization can be complementary to compiler scheduling, but that the additional performance benefits are minimal and are generally not worth the cost.<\/jats:p>","DOI":"10.1145\/1369396.1369399","type":"journal-article","created":{"date-parts":[[2008,6,3]],"date-time":"2008-06-03T15:11:43Z","timestamp":1212505903000},"page":"1-33","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":17,"title":["Compiler and hardware support for reducing the synchronization of speculative threads"],"prefix":"10.1145","volume":"5","author":[{"given":"Antonia","family":"Zhai","sequence":"first","affiliation":[{"name":"University of Minnesota, Minneapolis, MN"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"J. Gregory","family":"Steffan","sequence":"additional","affiliation":[{"name":"University of Toronto, Toronto, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Christopher B.","family":"Colohan","sequence":"additional","affiliation":[{"name":"Google, Ann Arbor, Michigan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Todd C.","family":"Mowry","sequence":"additional","affiliation":[{"name":"Carnegie Mellon University, Pittsburgh, Pennsylvania"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2008,5,29]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Akkary H. and Driscoll M. 1998. A Dynamic Multithreading Processor. In MICRO-31. Akkary H. and Driscoll M. 1998. A Dynamic Multithreading Processor. In MICRO-31."},{"volume-title":"Leading the industry: Multi-core technology &amp","author":"AMD Corporation","key":"e_1_2_1_2_1","unstructured":"AMD Corporation . 2005. Leading the industry: Multi-core technology &amp ; dual-core processors from AMD. http:\/\/multicore.amd.com\/en\/Technology\/. AMD Corporation. 2005. 
Leading the industry: Multi-core technology &amp; dual-core processors from AMD. http:\/\/multicore.amd.com\/en\/Technology\/."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/277650.277665"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/2.546612"},{"key":"e_1_2_1_5_1","unstructured":"Chang P. P. Warter N. J. Mahlke S. A. Chen W. Y. and Hwu W. W. 1991. Three superblock scheduling models for superscalar and superpipelined processors. Tech. Rept. CRHC-91-29 Center for Reliable and High-Performance Computing University of Illinois. Chang P. P. Warter N. J. Mahlke S. A. Chen W. Y. and Hwu W. W. 1991. Three superblock scheduling models for superscalar and superpipelined processors. Tech. Rept. CRHC-91-29 Center for Reliable and High-Performance Computing University of Illinois."},{"volume-title":"8th International Symposium on High-Performance Computer Architecture (HPCA-8).","author":"Cintra M.","key":"e_1_2_1_6_1","unstructured":"Cintra , M. and Torrellas , J . 2002. Learning cross-thread violations in speculative parallelization for multiprocessors . In 8th International Symposium on High-Performance Computer Architecture (HPCA-8). Cintra, M. and Torrellas, J. 2002. Learning cross-thread violations in speculative parallelization for multiprocessors. In 8th International Symposium on High-Performance Computer Architecture (HPCA-8)."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/339647.363382"},{"key":"e_1_2_1_8_1","volume-title":"31st International Conference on Very Large Data Bases (VLDB","author":"Colohan C. B.","year":"2005","unstructured":"Colohan , C. B. , Ailamaki , A. , Steffan , J. G. , and Mowry , T. C . 2005. Optimistic intra-transaction parallelism on chip multiprocessors . In 31st International Conference on Very Large Data Bases (VLDB 2005 ). Colohan, C. B., Ailamaki, A., Steffan, J. G., and Mowry, T. C. 2005. Optimistic intra-transaction parallelism on chip multiprocessors. 
In 31st International Conference on Very Large Data Bases (VLDB 2005)."},{"volume-title":"33rd Annual International Symposium on Computer Architecture (ISCA '06)","author":"Colohan C. B.","key":"e_1_2_1_9_1","unstructured":"Colohan , C. B. , Ailamaki , A. , Steffan , J. G. , and Mowry , T. C . 2006. Hardware support for large speculative threads . In 33rd Annual International Symposium on Computer Architecture (ISCA '06) . Colohan, C. B., Ailamaki, A., Steffan, J. G., and Mowry, T. C. 2006. Hardware support for large speculative threads. In 33rd Annual International Symposium on Computer Architecture (ISCA '06)."},{"key":"e_1_2_1_10_1","volume-title":"International Conference on Parallel Processing.","author":"Cytron R.","year":"1986","unstructured":"Cytron , R. 1986 . Doacross: Beyond vectorization for multiprocessors . In International Conference on Parallel Processing. Cytron, R. 1986. Doacross: Beyond vectorization for multiprocessors. In International Conference on Parallel Processing."},{"key":"e_1_2_1_11_1","volume-title":"International Conference on Parallel Architectures and Compilation Techniques (PACT","author":"Dubey P.","year":"1995","unstructured":"Dubey , P. , O'Brien , K. , O'Brien , K. , and Barton , C . 1995. Single-program speculative multithreading (spsm) architecture: Compiler-assisted fine-grained multithreading . In International Conference on Parallel Architectures and Compilation Techniques (PACT 1995 ). Dubey, P., O'Brien, K., O'Brien, K., and Barton, C. 1995. Single-program speculative multithreading (spsm) architecture: Compiler-assisted fine-grained multithreading. 
In International Conference on Parallel Architectures and Compilation Techniques (PACT 1995)."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/379240.379253"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.1981.1675827"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/139669.139703"},{"key":"e_1_2_1_15_1","unstructured":"Gabbay F. and Mendelson A. 1996. Speculative execution based on value prediction. Tech. Rept. EE Department TR &num;1080 Technion--Israel Institute of Technology. Gabbay F. and Mendelson A. 1996. Speculative execution based on value prediction. Tech. Rept. EE Department TR &num;1080 Technion--Israel Institute of Technology."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/195473.195534"},{"volume-title":"4th International Symposium on High-Performance Computer Architecture (HPCA-4).","author":"Gopal S.","key":"e_1_2_1_17_1","unstructured":"Gopal , S. , Vijaykumar , T. , Smith , J. , and Sohi , G . 1998. Speculative versioning cache . In 4th International Symposium on High-Performance Computer Architecture (HPCA-4). Gopal, S., Vijaykumar, T., Smith, J., and Sohi, G. 1998. Speculative versioning cache. In 4th International Symposium on High-Performance Computer Architecture (HPCA-4)."},{"volume-title":"Supercomputing '98","author":"Gupta M.","key":"e_1_2_1_18_1","unstructured":"Gupta , M. and Nim , R . 1998. Techniques for speculative run-time parallelization of loops . In Supercomputing '98 . Gupta, M. and Nim, R. 1998. Techniques for speculative run-time parallelization of loops. In Supercomputing '98."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/291069.291020"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/169627.169747"},{"key":"e_1_2_1_21_1","doi-asserted-by":"crossref","unstructured":"IBM Corporation. 2007. IBM unleashes world's fastest chip in powerful new computer. 
http:\/\/www-03.ibm.com\/press\/us\/en\/pressrelease\/21580.wss. IBM Corporation. 2007. IBM unleashes world's fastest chip in powerful new computer. http:\/\/www-03.ibm.com\/press\/us\/en\/pressrelease\/21580.wss.","DOI":"10.1063\/pt.5.020977"},{"key":"e_1_2_1_22_1","unstructured":"Intel Corporation. 2005. Intel's Dual-Core Processor for Desktop PCs. http:\/\/www.intel.com\/personal\/desktopcomputer\/dual_core\/index.htm. Intel Corporation. 2005. Intel's Dual-Core Processor for Desktop PCs. http:\/\/www.intel.com\/personal\/desktopcomputer\/dual_core\/index.htm."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/319838.319854"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/143095.143136"},{"volume-title":"International Conference on Parallel Architectures and Compilation Techniques (PACT).","author":"Krishnan V.","key":"e_1_2_1_25_1","unstructured":"Krishnan , V. and Torrellas , J . 1999. The need for fast communication in hardware-based speculative chip multiprocessors . In International Conference on Parallel Architectures and Compilation Techniques (PACT). Krishnan, V. and Torrellas, J. 1999. The need for fast communication in hardware-based speculative chip multiprocessors. In International Conference on Parallel Architectures and Compilation Techniques (PACT)."},{"volume-title":"International Symposium on Microarchitecture.","author":"Lipasti M. H.","key":"e_1_2_1_26_1","unstructured":"Lipasti , M. H. and Shen , J. P . 1996. Exceeding the data-flow limit via value prediction . In International Symposium on Microarchitecture. Lipasti, M. H. and Shen, J. P. 1996. Exceeding the data-flow limit via value prediction. In International Symposium on Microarchitecture."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/305138.305214"},{"volume-title":"International Symposium on Microarchitecture.","author":"Marcuello P.","key":"e_1_2_1_28_1","unstructured":"Marcuello , P. , Tubella , J. , and Gonzalez , A . 
1999a. Value prediction for speculative multithreaded architectures . In International Symposium on Microarchitecture. Marcuello, P., Tubella, J., and Gonzalez, A. 1999a. Value prediction for speculative multithreaded architectures. In International Symposium on Microarchitecture."},{"key":"e_1_2_1_29_1","doi-asserted-by":"crossref","unstructured":"Marcuello P. Tubella J. and Gonzalez A. 1999b. Value prediction for speculative multithreaded architectures. In Micro 32 Haifa Israel. Marcuello P. Tubella J. and Gonzalez A. 1999b. Value prediction for speculative multithreaded architectures. In Micro 32 Haifa Israel.","DOI":"10.1145\/277830.277850"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/264107.264189"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/12.24269"},{"volume-title":"Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques (PACT'99)","author":"Oplinger J.","key":"e_1_2_1_32_1","unstructured":"Oplinger , J. , Heine , D. , and Lam , M. S . 1999. In search of speculative thread-level parallelism . In Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques (PACT'99) . Oplinger, J., Heine, D., and Lam, M. S. 1999. In search of speculative thread-level parallelism. In Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques (PACT'99)."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.1980.1675676"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/781498.781500"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/1065944.1065964"},{"volume-title":"Proceedings of Micro 30","author":"Rotenberg E.","key":"e_1_2_1_36_1","unstructured":"Rotenberg , E. , Jacobson , Q. , Sazeides , Y. , and Smith , J . 1997. Trace processors . In Proceedings of Micro 30 . Rotenberg, E., Jacobson, Q., Sazeides, Y., and Smith, J. 1997. 
Trace processors. In Proceedings of Micro 30."},{"key":"e_1_2_1_37_1","first-page":"248","article-title":"The predictability of data values","volume":"13","author":"Sazeides Y.","year":"1997","unstructured":"Sazeides , Y. and Smith , J. E. 1997 . The predictability of data values . Proceedings of Micro 13 , 248 -- 258 . Sazeides, Y. and Smith, J. E. 1997. The predictability of data values. Proceedings of Micro 13, 248--258.","journal-title":"Proceedings of Micro"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/223982.224451"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/339647.339650"},{"volume-title":"8th International Symposium on High-Performance Computer Architecture (HPCA-8).","author":"Steffan J. G.","key":"e_1_2_1_40_1","unstructured":"Steffan , J. G. , Colohan , C. B. , Zhai , A. , and Mowry , T. C . 2002. Improving value communication for thread-level speculation . In 8th International Symposium on High-Performance Computer Architecture (HPCA-8). Steffan, J. G., Colohan, C. B., Zhai, A., and Mowry, T. C. 2002. Improving value communication for thread-level speculation. In 8th International Symposium on High-Performance Computer Architecture (HPCA-8)."},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/1082469.1082471"},{"key":"e_1_2_1_42_1","unstructured":"Sun Corporation. 2005. Throughput computing\u2014Niagara. http:\/\/www.sun.com\/processors\/throughput\/. Sun Corporation. 2005. Throughput computing\u2014Niagara. http:\/\/www.sun.com\/processors\/throughput\/."},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/12.795219"},{"volume-title":"30th Annual IEEE\/ACM International Symposium on Microarchitecture (Micro-30)","author":"Wang K.","key":"e_1_2_1_45_1","unstructured":"Wang , K. and Franklin , M . 1997. Highly accurate data value prediction using hybrid predictors . In 30th Annual IEEE\/ACM International Symposium on Microarchitecture (Micro-30) . 
Research Triangle Park, NC. Wang, K. and Franklin, M. 1997. Highly accurate data value prediction using hybrid predictors. In 30th Annual IEEE\/ACM International Symposium on Microarchitecture (Micro-30). Research Triangle Park, NC."},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/193209.193217"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/40.491460"},{"key":"e_1_2_1_49_1","doi-asserted-by":"crossref","unstructured":"Zhai A. Colohan C. B. Steffan J. G. and Mowry T. C. 2002. Compiler optimizations to accelerate scalar value communication between speculative threads. Tech. Rept. CMU-CS-02-162 School of Computer Science Carnegie Mellon University. August. Zhai A. Colohan C. B. Steffan J. G. and Mowry T. C. 2002. Compiler optimizations to accelerate scalar value communication between speculative threads. Tech. Rept. CMU-CS-02-162 School of Computer Science Carnegie Mellon University. August.","DOI":"10.1145\/605397.605416"},{"volume-title":"The 2004 International Symposium on Code Generation and Optimization","author":"Zhai A.","key":"e_1_2_1_50_1","unstructured":"Zhai , A. , Colohan , C. B. , Steffan , J. G. , and Mowry , T. C . 2004. Compiler optimization of memory-resident value communication between speculative threads . In The 2004 International Symposium on Code Generation and Optimization , Palo Alto, CA. Zhai, A., Colohan, C. B., Steffan, J. G., and Mowry, T. C. 2004. Compiler optimization of memory-resident value communication between speculative threads. In The 2004 International Symposium on Code Generation and Optimization, Palo Alto, CA."},{"volume-title":"thesis","author":"Zilles C.","key":"e_1_2_1_51_1","unstructured":"Zilles , C. 2002. Master \/slave speculative parallelization and approximate code. Ph.D. thesis , University of Wisconsin\u2014Madison. Zilles, C. 2002. Master\/slave speculative parallelization and approximate code. Ph.D. 
thesis, University of Wisconsin\u2014Madison."},{"volume-title":"35th Annual IEEE\/ACM International Symposium on Microarchitecture (Micro-35)","author":"Zilles C.","key":"e_1_2_1_52_1","unstructured":"Zilles , C. and Sohi , G . 2002. Master\/slave speculative parallelization . In 35th Annual IEEE\/ACM International Symposium on Microarchitecture (Micro-35) . Zilles, C. and Sohi, G. 2002. Master\/slave speculative parallelization. In 35th Annual IEEE\/ACM International Symposium on Microarchitecture (Micro-35)."}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1369396.1369399","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1369396.1369399","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T14:51:43Z","timestamp":1750258303000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1369396.1369399"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2008,5]]},"references-count":50,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2008,5]]}},"alternative-id":["10.1145\/1369396.1369399"],"URL":"https:\/\/doi.org\/10.1145\/1369396.1369399","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2008,5]]},"assertion":[{"value":"2006-03-24","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2007-11-30","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2008-05-29","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}