{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,8]],"date-time":"2026-01-08T06:16:01Z","timestamp":1767852961898,"version":"3.49.0"},"reference-count":52,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2005,8,1]],"date-time":"2005-08-01T00:00:00Z","timestamp":1122854400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Comput. Syst."],"published-print":{"date-parts":[[2005,8]]},"abstract":"<jats:p>\n            Multithreaded processor architectures are becoming increasingly commonplace: many current and upcoming designs support chip multiprocessing, simultaneous multithreading, or both. While it is relatively straightforward to use these architectures to improve the throughput of a multithreaded or multiprogrammed workload, the real challenge is how to easily create\n            <jats:italic>parallel software<\/jats:italic>\n            to allow single programs to effectively exploit all of this raw performance potential. One promising technique for overcoming this problem is\n            <jats:italic>Thread-Level Speculation (TLS)<\/jats:italic>\n            , which enables the compiler to optimistically create parallel threads despite uncertainty as to whether those threads are actually independent. In this article, we propose and evaluate a design for supporting TLS that seamlessly scales both within a chip and beyond because it is a straightforward extension of write-back invalidation-based cache coherence (which itself scales both up and down). Our experimental results demonstrate that our scheme performs well on single-chip multiprocessors where the first level caches are either private or shared. 
For our private-cache design, the program performance of two of 13 general purpose applications studied improves by 86% and 56%, four others by more than 8%, and an average across all applications of 16%---confirming that TLS is a promising way to exploit the naturally-multithreaded processing resources of future computer systems.\n          <\/jats:p>","DOI":"10.1145\/1082469.1082471","type":"journal-article","created":{"date-parts":[[2005,11,7]],"date-time":"2005-11-07T16:00:45Z","timestamp":1131379245000},"page":"253-300","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":131,"title":["The STAMPede approach to thread-level speculation"],"prefix":"10.1145","volume":"23","author":[{"given":"J. Gregory","family":"Steffan","sequence":"first","affiliation":[{"name":"University of Toronto, Ontario, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Christopher","family":"Colohan","sequence":"additional","affiliation":[{"name":"Carnegie Mellon University, Pittsburgh, PA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Antonia","family":"Zhai","sequence":"additional","affiliation":[{"name":"University of Minnesota, Minneapolis, MN"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Todd C.","family":"Mowry","sequence":"additional","affiliation":[{"name":"Carnegie Mellon University, Pittsburgh, PA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2005,8]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Proceedings of ISCA 27","author":"Agarwal W.","unstructured":"Agarwal , W. , Hrishikesh , M. , Keckler , S. , and Burger , D . 2000. Clock rate versus IPC: the end of the road for conventional microarchitectures . In Proceedings of ISCA 27 . 10.1145\/339647.339691 Agarwal, W., Hrishikesh, M., Keckler, S., and Burger, D. 2000. Clock rate versus IPC: the end of the road for conventional microarchitectures. 
In Proceedings of ISCA 27. 10.1145\/339647.339691"},{"key":"e_1_2_1_2_1","volume-title":"Compilers: Principles, Techniques and Tools","author":"Aho A. V.","year":"1986","unstructured":"Aho , A. V. , Sethi , R. , and Ullman , J. D . 1986 . Compilers: Principles, Techniques and Tools . Addison Wesley . Aho, A. V., Sethi, R., and Ullman, J. D. 1986. Compilers: Principles, Techniques and Tools. Addison Wesley."},{"key":"e_1_2_1_3_1","unstructured":"Akkary H. and Driscoll M. 1998. A dynamic multithreading processor. In MICRO-31.   Akkary H. and Driscoll M. 1998. A dynamic multithreading processor. In MICRO-31."},{"key":"e_1_2_1_4_1","volume-title":"Tech. Rep. CS-TR-1997-1344, Computer Sciences Department","author":"Breach S. E.","year":"1996","unstructured":"Breach , S. E. , Vijaykumar , T. N. , Gopal , S. , Smith , J. E. , and Sohi , G. S . 1996 . Data memory alternatives for multiscalar processors. Tech. Rep. CS-TR-1997-1344, Computer Sciences Department , University of Wisconsin-Madison. Breach, S. E., Vijaykumar, T. N., Gopal, S., Smith, J. E., and Sohi, G. S. 1996. Data memory alternatives for multiscalar processors. Tech. Rep. CS-TR-1997-1344, Computer Sciences Department, University of Wisconsin-Madison."},{"key":"e_1_2_1_5_1","volume-title":"Proceedings of the 27th Annual International Symposium on Microarchitecture. 181--190","author":"Breach S. E.","year":"1994","unstructured":"Breach , S. E. , Vijaykumar , T. N. , and Sohi , G. S . 1994. The anatomy of the register file in a multiscalar processor . In Proceedings of the 27th Annual International Symposium on Microarchitecture. 181--190 . 10.1145\/192724.192750 Breach, S. E., Vijaykumar, T. N., and Sohi, G. S. 1994. The anatomy of the register file in a multiscalar processor. In Proceedings of the 27th Annual International Symposium on Microarchitecture. 181--190. 
10.1145\/192724.192750"},{"key":"e_1_2_1_6_1","volume-title":"Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming. ACM Press, 13--24","author":"Cintra M.","unstructured":"Cintra , M. and Llanos , D. R . 2003. Toward efficient and robust software speculative parallelization on multiprocessors . In Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming. ACM Press, 13--24 . 10.1145\/781498.781501 Cintra, M. and Llanos, D. R. 2003. Toward efficient and robust software speculative parallelization on multiprocessors. In Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming. ACM Press, 13--24. 10.1145\/781498.781501"},{"key":"e_1_2_1_7_1","volume-title":"Proceedings of ISCA 27","author":"Cintra M.","unstructured":"Cintra , M. , Mart\u00ednez , J. F. , and Torrellas , J . 2000. Architectural support for scalable speculative parallelization in shared-memory multiprocessors . In Proceedings of ISCA 27 . 10.1145\/339647.363382 Cintra, M., Mart\u00ednez, J. F., and Torrellas, J. 2000. Architectural support for scalable speculative parallelization in shared-memory multiprocessors. In Proceedings of ISCA 27. 10.1145\/339647.363382"},{"key":"e_1_2_1_8_1","volume-title":"Proceedings of the 8th HPCA.","author":"Cintra M.","unstructured":"Cintra , M. and Torrellas , J . 2002. Learning cross-thread violations in speculative parallelization for multiprocessors . In Proceedings of the 8th HPCA. Cintra, M. and Torrellas, J. 2002. Learning cross-thread violations in speculative parallelization for multiprocessors. In Proceedings of the 8th HPCA."},{"key":"e_1_2_1_9_1","volume-title":"International Conference on Parallel Architectures and Compilation Techniques.","author":"Emer J.","year":"2001","unstructured":"Emer , J. 2001 . Ev8: The post-ultimate alpha (keynote address) . In International Conference on Parallel Architectures and Compilation Techniques. 
Emer, J. 2001. Ev8: The post-ultimate alpha (keynote address). In International Conference on Parallel Architectures and Compilation Techniques."},{"key":"e_1_2_1_10_1","first-page":"338","volume-title":"Proceedings of ISCA 21","author":"Farrens M.","year":"1994","unstructured":"Farrens , M. , Tyson , G. , and Pleszkun , A . 1994. A study of single-chip processor\/cache organizations for large number of transistors . In Proceedings of ISCA 21 . pp. 338 -- 347 . 10.1145\/191995.192066 Farrens, M., Tyson, G., and Pleszkun, A. 1994. A study of single-chip processor\/cache organizations for large number of transistors. In Proceedings of ISCA 21. pp. 338--347. 10.1145\/191995.192066"},{"key":"e_1_2_1_11_1","volume-title":"Suds: Primitive mechanisms for memory dependence speculation. Tech. Rep. MIT\/LCS Technical Memo LCS-TM-591. January.","author":"Frank M.","year":"1999","unstructured":"Frank , M. , Moritz , C. , Greenwald , B. , Amarasinghe , S. , and Agarwal , A . 1999 . Suds: Primitive mechanisms for memory dependence speculation. Tech. Rep. MIT\/LCS Technical Memo LCS-TM-591. January. Frank, M., Moritz, C., Greenwald, B., Amarasinghe, S., and Agarwal, A. 1999. Suds: Primitive mechanisms for memory dependence speculation. Tech. Rep. MIT\/LCS Technical Memo LCS-TM-591. January."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/12.509907"},{"key":"e_1_2_1_13_1","volume-title":"Proceedings of the Ninth International Symposium on High-Performance Computer Architecture (HPCA).","author":"Garzaran M. J.","unstructured":"Garzaran , M. J. , Prvulovic , M. , Llaberia , J. M. , Vinals , V. , Rauchwerger , L. , and Torrellas , J . 2003. Tradeoffs in buffering memory state for thread-level speculation in multiprocessors . In Proceedings of the Ninth International Symposium on High-Performance Computer Architecture (HPCA). Garzaran, M. J., Prvulovic, M., Llaberia, J. M., Vinals, V., Rauchwerger, L., and Torrellas, J. 2003. 
Tradeoffs in buffering memory state for thread-level speculation in multiprocessors. In Proceedings of the Ninth International Symposium on High-Performance Computer Architecture (HPCA)."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1006\/jpdc.1996.0104"},{"key":"e_1_2_1_15_1","volume-title":"Proceedings of the Fourth International Symposium on High-Performance Computer Architecture.","author":"Gopal S.","unstructured":"Gopal , S. , Vijaykumar , T. , Smith , J. , and Sohi , G . 1998. Speculative versioning cache . In Proceedings of the Fourth International Symposium on High-Performance Computer Architecture. Gopal, S., Vijaykumar, T., Smith, J., and Sohi, G. 1998. Speculative versioning cache. In Proceedings of the Fourth International Symposium on High-Performance Computer Architecture."},{"key":"e_1_2_1_16_1","volume-title":"Proceedings of Supercomputing","author":"Gupta M.","year":"1998","unstructured":"Gupta , M. and Nim , R . 1998. Techniques for speculative run-time parallelization of loops . In Proceedings of Supercomputing 1998 . Gupta, M. and Nim, R. 1998. Techniques for speculative run-time parallelization of loops. In Proceedings of Supercomputing 1998."},{"key":"e_1_2_1_17_1","volume-title":"Proceedings of ASPLOS-VIII. 10","author":"Hammond L.","unstructured":"Hammond , L. , Willey , M. , and Olukotun , K . 1998. Data speculation support for a chip multiprocessor . In Proceedings of ASPLOS-VIII. 10 .1145\/291069.291020 Hammond, L., Willey, M., and Olukotun, K. 1998. Data speculation support for a chip multiprocessor. In Proceedings of ASPLOS-VIII. 10.1145\/291069.291020"},{"key":"e_1_2_1_18_1","volume-title":"Microprocessor Forum '99","author":"Kahle J.","year":"1999","unstructured":"Kahle , J. 1999 . Power4: A Dual-CPU processor chip . Microprocessor Forum '99 . Kahle, J. 1999. Power4: A Dual-CPU processor chip. 
Microprocessor Forum '99."},{"key":"e_1_2_1_19_1","volume-title":"Proceedings of the ACM SIGPLAN 92 Conference on Programming Language Design and Implementation. 10","author":"Knoop J.","unstructured":"Knoop , J. and Ruthing , O . 1992. Lazy code motion . In Proceedings of the ACM SIGPLAN 92 Conference on Programming Language Design and Implementation. 10 .1145\/143095.143136 Knoop, J. and Ruthing, O. 1992. Lazy code motion. In Proceedings of the ACM SIGPLAN 92 Conference on Programming Language Design and Implementation. 10.1145\/143095.143136"},{"key":"e_1_2_1_20_1","unstructured":"Krishnan V. and Torrellas J. 1999a. A chip multiprocessor architecture with speculative multithreading. IEEE Trans. Comput. Special Issue on Multithreaded Architecture. 10.1109\/12.795218   Krishnan V. and Torrellas J. 1999a. A chip multiprocessor architecture with speculative multithreading. IEEE Trans. Comput. Special Issue on Multithreaded Architecture. 10.1109\/12.795218"},{"key":"e_1_2_1_21_1","volume-title":"The Need for Fast Communication in Hardware-Based Speculative Chip Multiprocessors. In International Conference on Parallel Architectures and Compilation Techniques (PACT).","author":"Krishnan V.","unstructured":"Krishnan , V. and Torrellas , J . 1999b . The Need for Fast Communication in Hardware-Based Speculative Chip Multiprocessors. In International Conference on Parallel Architectures and Compilation Techniques (PACT). Krishnan, V. and Torrellas, J. 1999b. The Need for Fast Communication in Hardware-Based Speculative Chip Multiprocessors. In International Conference on Parallel Architectures and Compilation Techniques (PACT)."},{"key":"e_1_2_1_22_1","volume-title":"Proceedings of the 24th ISCA. 241--251","author":"Laudon J.","unstructured":"Laudon , J. and Lenoski , D . 1997. The SGI Origin: A ccNUMA highly scalable server . In Proceedings of the 24th ISCA. 241--251 . 10.1145\/264107.264206 Laudon, J. and Lenoski, D. 1997. The SGI Origin: A ccNUMA highly scalable server. 
In Proceedings of the 24th ISCA. 241--251. 10.1145\/264107.264206"},{"key":"e_1_2_1_23_1","volume-title":"Proceedings of the 8th HPCA.","author":"Marcuello P.","unstructured":"Marcuello , P. and Gonz\u00e1lez , A . 2002. Thread-spawning scheme for speculative multithreading . In Proceedings of the 8th HPCA. Marcuello, P. and Gonz\u00e1lez, A. 2002. Thread-spawning scheme for speculative multithreading. In Proceedings of the 8th HPCA."},{"key":"e_1_2_1_24_1","volume-title":"Proceedings of the ACM International Conference on Supercomputing. 10","author":"Marcuello P.","unstructured":"Marcuello , P. and Gonz\u00e1lez , A . 1999. Clustered speculative multithreaded processors . In Proceedings of the ACM International Conference on Supercomputing. 10 .1145\/305138.305214 Marcuello, P. and Gonz\u00e1lez, A. 1999. Clustered speculative multithreaded processors. In Proceedings of the ACM International Conference on Supercomputing. 10.1145\/305138.305214"},{"key":"e_1_2_1_25_1","volume-title":"Proceedings of the 24th ISCA. 10","author":"Moshovos A. I.","unstructured":"Moshovos , A. I. , Breach , S. E. , Vijaykumar , T. , and Sohi , G. S . 1997. Dynamic speculation and synchronization of data dependences . In Proceedings of the 24th ISCA. 10 .1145\/264107.264189 Moshovos, A. I., Breach, S. E., Vijaykumar, T., and Sohi, G. S. 1997. Dynamic speculation and synchronization of data dependences. In Proceedings of the 24th ISCA. 10.1145\/264107.264189"},{"key":"e_1_2_1_26_1","volume-title":"Proceedings of ASPLOS-VII. 10","author":"Olukotun K.","unstructured":"Olukotun , K. , Nayfeh , B. A. , Hammond , L. , Wilson , K. , and Chang , K . 1996. The Case for a Single-Chip Multiprocessor . In Proceedings of ASPLOS-VII. 10 .1145\/237090.237140 Olukotun, K., Nayfeh, B. A., Hammond, L., Wilson, K., and Chang, K. 1996. The Case for a Single-Chip Multiprocessor. In Proceedings of ASPLOS-VII. 
10.1145\/237090.237140"},{"key":"e_1_2_1_27_1","volume-title":"Proceedings of the International Conference on Supercomputing. 10","author":"Ooi C. L.","unstructured":"Ooi , C. L. , Kim , S. W. , Park , I. , Eigenmann , R. , Falsafi , B. , and Vijaykumar , T. N . 2001. Multiplex: Unifying conventional and speculative thread-level parallelism on a chip multiprocessor . In Proceedings of the International Conference on Supercomputing. 10 .1145\/377792.377863 Ooi, C. L., Kim, S. W., Park, I., Eigenmann, R., Falsafi, B., and Vijaykumar, T. N. 2001. Multiplex: Unifying conventional and speculative thread-level parallelism on a chip multiprocessor. In Proceedings of the International Conference on Supercomputing. 10.1145\/377792.377863"},{"key":"e_1_2_1_28_1","volume-title":"Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques (PACT'99)","author":"Oplinger J.","unstructured":"Oplinger , J. , Heine , D. , and Lam , M. S . 1999. In search of speculative thread-level parallelism . In Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques (PACT'99) . Oplinger, J., Heine, D., and Lam, M. S. 1999. In search of speculative thread-level parallelism. In Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques (PACT'99)."},{"key":"e_1_2_1_29_1","volume-title":"Tech. Rep. CS-TR-1996-1328","author":"Palacharla S.","year":"1996","unstructured":"Palacharla , S. , Jouppi , N. P. , and Smith , J. E . 1996 . Quantifying the complexity of superscalar processors. Tech. Rep. CS-TR-1996-1328 , University of Wisconsin-Madison. Palacharla, S., Jouppi, N. P., and Smith, J. E. 1996. Quantifying the complexity of superscalar processors. Tech. Rep. CS-TR-1996-1328, University of Wisconsin-Madison."},{"key":"e_1_2_1_30_1","volume-title":"Proceedings of the 30th annual international symposium on Computer architecture. 
10","author":"Park I.","unstructured":"Park , I. , Falsafi , B. , and Vijaykumar , T. N . 2003. Implicitly-multithreaded processors . In Proceedings of the 30th annual international symposium on Computer architecture. 10 .1145\/859618.859624 Park, I., Falsafi, B., and Vijaykumar, T. N. 2003. Implicitly-multithreaded processors. In Proceedings of the 30th annual international symposium on Computer architecture. 10.1145\/859618.859624"},{"key":"e_1_2_1_31_1","doi-asserted-by":"crossref","unstructured":"Prabhu M. and Olukotun K. 2003. Using thread-level speculation to simplify manual parallelization. In Principles and Practices of Parallel Programming. 10.1145\/781498.781500   Prabhu M. and Olukotun K. 2003. Using thread-level speculation to simplify manual parallelization. In Principles and Practices of Parallel Programming. 10.1145\/781498.781500","DOI":"10.1145\/781498.781500"},{"key":"e_1_2_1_32_1","volume-title":"proceedings of the 28th Annual International Symposium on Computer Architecture. 10","author":"Prvulovic M.","unstructured":"Prvulovic , M. , Garzaran , M. , Rauchwerger , L. , and Torrellas , J . 2001. Removing architectural bottlenecks to the scalability of speculative parallelization . In proceedings of the 28th Annual International Symposium on Computer Architecture. 10 .1145\/379240.379264 Prvulovic, M., Garzaran, M., Rauchwerger, L., and Torrellas, J. 2001. Removing architectural bottlenecks to the scalability of speculative parallelization. In proceedings of the 28th Annual International Symposium on Computer Architecture. 10.1145\/379240.379264"},{"key":"e_1_2_1_33_1","volume-title":"Proceedings of PLDI '95","author":"Rauchwerger L.","year":"1995","unstructured":"Rauchwerger , L. and Padua , D . 1995. The LRPD Test: Speculative run-time parallelization of loops with privatization and reduction parallelization . In Proceedings of PLDI '95 . 218--232. 10.1145\/207110.207148 Rauchwerger, L. and Padua, D. 1995. 
The LRPD Test: Speculative run-time parallelization of loops with privatization and reduction parallelization. In Proceedings of PLDI '95. 218--232. 10.1145\/207110.207148"},{"key":"e_1_2_1_34_1","volume-title":"Proceedings of Micro 30","author":"Rotenberg E.","unstructured":"Rotenberg , E. , Jacobson , Q. , Sazeides , Y. , and Smith , J . 1997. Trace processors . In Proceedings of Micro 30 . Rotenberg, E., Jacobson, Q., Sazeides, Y., and Smith, J. 1997. Trace processors. In Proceedings of Micro 30."},{"key":"e_1_2_1_35_1","volume-title":"7th International Symposium on High Performance Computer Architecture (HPCA-7). 20--24","author":"Roth A.","unstructured":"Roth , A. and Sohi , G . 2001. Speculative data-driven multithreading . In 7th International Symposium on High Performance Computer Architecture (HPCA-7). 20--24 . Roth, A. and Sohi, G. 2001. Speculative data-driven multithreading. In 7th International Symposium on High Performance Computer Architecture (HPCA-7). 20--24."},{"key":"e_1_2_1_36_1","volume-title":"Fourth Workshop on Multithreaded Execution, Architecture and Compilation.","author":"Rundberg P.","unstructured":"Rundberg , P. and Stenstrom , P . 2000. Low-cost thread-level data dependence speculation on multiprocessors . In Fourth Workshop on Multithreaded Execution, Architecture and Compilation. Rundberg, P. and Stenstrom, P. 2000. Low-cost thread-level data dependence speculation on multiprocessors. In Fourth Workshop on Multithreaded Execution, Architecture and Compilation."},{"key":"e_1_2_1_37_1","volume-title":"Proceedings of ISCA 22","author":"Sohi G. S.","unstructured":"Sohi , G. S. , Breach , S. , and Vijaykumar , T. N . 1995. Multiscalar processors . In Proceedings of ISCA 22 . 414--425. 10.1145\/223982.224451 Sohi, G. S., Breach, S., and Vijaykumar, T. N. 1995. Multiscalar processors. In Proceedings of ISCA 22. 414--425. 10.1145\/223982.224451"},{"key":"e_1_2_1_38_1","volume-title":"The SPEC Benchmark Suite. Tech. rep","unstructured":"SPEC. 
2000. The SPEC Benchmark Suite. Tech. rep ., Standard Performance Evaluation Corporation . http:\/\/www.spechbench.org. SPEC. 2000. The SPEC Benchmark Suite. Tech. rep., Standard Performance Evaluation Corporation. http:\/\/www.spechbench.org."},{"key":"e_1_2_1_40_1","volume-title":"Tech. Rep. CMU-CS-97-188, School of Computer Science","author":"Steffan J. G.","year":"1997","unstructured":"Steffan , J. G. , Colohan , C. B. , and Mowry , T. C . 1997 . Architectural Support for Thread-Level Data Speculation . Tech. Rep. CMU-CS-97-188, School of Computer Science , Carnegie Mellon University . November. Steffan, J. G., Colohan, C. B., and Mowry, T. C. 1997. Architectural Support for Thread-Level Data Speculation. Tech. Rep. CMU-CS-97-188, School of Computer Science, Carnegie Mellon University. November."},{"key":"e_1_2_1_41_1","volume-title":"Proceedings of the 8th HPCA.","author":"Steffan J. G.","unstructured":"Steffan , J. G. , Colohan , C. B. , Zhai , A. , and Mowry , T. C . 2002. Improving value communication for thread-level speculation . In Proceedings of the 8th HPCA. Steffan, J. G., Colohan, C. B., Zhai, A., and Mowry, T. C. 2002. Improving value communication for thread-level speculation. In Proceedings of the 8th HPCA."},{"key":"e_1_2_1_42_1","volume-title":"Proceedings of ISCA 27","author":"Steffan J. G.","unstructured":"Steffan , J. G. , Colohan , C. B. , Zhai , A. , and Mowry , T. C . 2000. A scalable approach to thread-level speculation . In Proceedings of ISCA 27 . 10.1145\/339647.339650 Steffan, J. G., Colohan, C. B., Zhai, A., and Mowry, T. C. 2000. A scalable approach to thread-level speculation. In Proceedings of ISCA 27. 10.1145\/339647.339650"},{"key":"e_1_2_1_43_1","unstructured":"Tjiang S. Wolf M. Lam M. Pieper K. and Hennessy J. 1992. Languages and Compilers for Parallel Computing. Springer-Verlag Berlin Germany 137--151.  Tjiang S. Wolf M. Lam M. Pieper K. and Hennessy J. 1992. Languages and Compilers for Parallel Computing. 
Springer-Verlag Berlin Germany 137--151."},{"key":"e_1_2_1_44_1","volume-title":"MAJC: Microprocessor Architecture for Java Computing. HotChips '99","author":"Tremblay M.","year":"1999","unstructured":"Tremblay , M. 1999 . MAJC: Microprocessor Architecture for Java Computing. HotChips '99 . Tremblay, M. 1999. MAJC: Microprocessor Architecture for Java Computing. HotChips '99."},{"key":"e_1_2_1_45_1","volume-title":"Proceedings of ISCA 22","author":"Tullsen D. M.","unstructured":"Tullsen , D. M. , Eggers , S. J. , and Levy , H. M . 1995. Simultaneous multithreading: Maximizing on-chip parallelism . In Proceedings of ISCA 22 . 392--403. 10.1145\/223982.224449 Tullsen, D. M., Eggers, S. J., and Levy, H. M. 1995. Simultaneous multithreading: Maximizing on-chip parallelism. In Proceedings of ISCA 22. 392--403. 10.1145\/223982.224449"},{"key":"e_1_2_1_46_1","unstructured":"Veenstra J. 2000. MINT+ mips emulator. Personal communication.  Veenstra J. 2000. MINT+ mips emulator. Personal communication."},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/40.491460"},{"key":"e_1_2_1_49_1","volume-title":"Proceedings of ASPLOS-X. 10","author":"Zhai A.","unstructured":"Zhai , A. , Colohan , C. B. , Steffan , J. G. , and Mowry , T. C . 2002. Compiler optimization of scalar value communication between speculative threads . In Proceedings of ASPLOS-X. 10 .1145\/605397.605416 Zhai, A., Colohan, C. B., Steffan, J. G., and Mowry, T. C. 2002. Compiler optimization of scalar value communication between speculative threads. In Proceedings of ASPLOS-X. 10.1145\/605397.605416"},{"key":"e_1_2_1_50_1","volume-title":"Proceedings of the International Symposium on Code Generation and Optimization.","author":"Zhai A.","unstructured":"Zhai , A. , Colohan , C. B. , Steffan , J. G. , and Mowry , T. C . 2004. Compiler optimization of memory-resident value communication between speculative threads . 
In Proceedings of the International Symposium on Code Generation and Optimization. Zhai, A., Colohan, C. B., Steffan, J. G., and Mowry, T. C. 2004. Compiler optimization of memory-resident value communication between speculative threads. In Proceedings of the International Symposium on Code Generation and Optimization."},{"key":"e_1_2_1_51_1","volume-title":"Proceedings of the Fourth International Symposium on High-Performance Computer Architecture.","author":"Zhang Y.","unstructured":"Zhang , Y. , Rauchwerger , L. , and Torrellas , J . 1998. Hardware for speculative run-time parallelization in distributed shared-memory multiprocessors . In Proceedings of the Fourth International Symposium on High-Performance Computer Architecture. Zhang, Y., Rauchwerger, L., and Torrellas, J. 1998. Hardware for speculative run-time parallelization in distributed shared-memory multiprocessors. In Proceedings of the Fourth International Symposium on High-Performance Computer Architecture."},{"key":"e_1_2_1_52_1","volume-title":"Fifth International Symposium on High-Performance Computer Architecture (HPCA). 135--141","author":"Zhang Y.","unstructured":"Zhang , Y. , Rauchwerger , L. , and Torrellas , J . 1999. Hardware for speculative parallelization of partially-parallel loops in DSM multiprocessors . In Fifth International Symposium on High-Performance Computer Architecture (HPCA). 135--141 . Zhang, Y., Rauchwerger, L., and Torrellas, J. 1999. Hardware for speculative parallelization of partially-parallel loops in DSM multiprocessors. In Fifth International Symposium on High-Performance Computer Architecture (HPCA). 135--141."},{"key":"e_1_2_1_53_1","volume-title":"Proceedings of the 22nd Annual International Symposium on Computer Architecture. 188--200","author":"Zhang Z.","unstructured":"Zhang , Z. and Torrellas , J . 1995. Speeding up irregular applications in shared-memory multiprocessors: Memory binding and group prefetching . 
In Proceedings of the 22nd Annual International Symposium on Computer Architecture. 188--200 . 10.1145\/223982.224423 Zhang, Z. and Torrellas, J. 1995. Speeding up irregular applications in shared-memory multiprocessors: Memory binding and group prefetching. In Proceedings of the 22nd Annual International Symposium on Computer Architecture. 188--200. 10.1145\/223982.224423"},{"key":"e_1_2_1_54_1","volume-title":"35th International Symposium on Microarchitecture (MICRO-35)","author":"Zilles C.","unstructured":"Zilles , C. and Sohi , G . 2002. Master\/slave speculative parallelization . In 35th International Symposium on Microarchitecture (MICRO-35) . 18--22. Zilles, C. and Sohi, G. 2002. Master\/slave speculative parallelization. In 35th International Symposium on Microarchitecture (MICRO-35). 18--22."}],"container-title":["ACM Transactions on Computer Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1082469.1082471","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1082469.1082471","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T16:18:46Z","timestamp":1750263526000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1082469.1082471"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2005,8]]},"references-count":52,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2005,8]]}},"alternative-id":["10.1145\/1082469.1082471"],"URL":"https:\/\/doi.org\/10.1145\/1082469.1082471","relation":{},"ISSN":["0734-2071","1557-7333"],"issn-type":[{"value":"0734-2071","type":"print"},{"value":"1557-7333","type":"electronic"}],"subject":[],"published":{"date-parts":[[2005,8]]},"assertion":[{"value":"2005-08-01","order":2,"name":"published","label":"Published","group":{"name":"publication_his
tory","label":"Publication History"}}]}}