{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:51:28Z","timestamp":1750308688722,"version":"3.41.0"},"reference-count":37,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2012,9,1]],"date-time":"2012-09-01T00:00:00Z","timestamp":1346457600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2012,9]]},"abstract":"<jats:p>The current trend toward multicore architectures has placed great pressure on programmers and compilers to generate thread-parallel programs. Improved execution performance can no longer be obtained via traditional single-thread instruction-level parallelism (ILP) but instead via multithreaded execution. One notable technique that facilitates the extraction of parallel threads from sequential applications is thread-level speculation (TLS). This technique allows programmers\/compilers to generate threads without checking for inter-thread data and control dependences, which are then transparently enforced by the hardware. Most prior work on TLS has concentrated on thread selection and mechanisms to efficiently support the main TLS operations, such as squashes, data versioning, and commits.<\/jats:p>\n          <jats:p>This article seeks to enhance TLS functionality by combining it with other speculative multithreaded execution models. The main idea is that TLS already requires extensive hardware support, which, when slightly augmented, can accommodate other speculative multithreaded techniques. 
Because application bottlenecks may differ across applications, or even across program phases, it is reasonable to assume that the more versatile a system is, the more efficiently it will execute a given program.<\/jats:p>\n          <jats:p>To this end, we first show that mixed execution models that combine TLS with Helper Threads (HT), RunAhead execution (RA), and MultiPath execution (MP) perform better than any of the models alone. Using a simple model that we propose, we show that the benefits come from extracting additional ILP without harming the thread-level parallelism (TLP) extracted by TLS. We then show that by combining all these speculative multithreaded models into a single unified execution model, ILP can be further enhanced with only minimal additional hardware cost.<\/jats:p>","DOI":"10.1145\/2355585.2355591","type":"journal-article","created":{"date-parts":[[2012,10,2]],"date-time":"2012-10-02T13:50:00Z","timestamp":1349185800000},"page":"1-26","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["Mixed speculative multithreaded execution models"],"prefix":"10.1145","volume":"9","author":[{"given":"Polychronis","family":"Xekalakis","sequence":"first","affiliation":[{"name":"University of Edinburgh, Spain"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Nikolas","family":"Ioannou","sequence":"additional","affiliation":[{"name":"University of Edinburgh, Switzerland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Marcelo","family":"Cintra","sequence":"additional","affiliation":[{"name":"University of Edinburgh, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2012,10,5]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/277830.277854"},{"volume-title":"Proceedings of the International Conference on High Performance 
Computing, 213--224","author":"Aragon J. L.","key":"e_1_2_1_2_1","unstructured":"Aragon, J. L., Gonz\u00e1lez, J., Garc\u00eda, J. M., and Gonz\u00e1lez, A. 2001. Confidence estimation for branch prediction reversal. In Proceedings of the International Conference on High Performance Computing, 213--224."},{"volume-title":"Proceedings of the International Symposium on Microarchitecture. 387--398","author":"Barnes R.","key":"e_1_2_1_3_1","unstructured":"Barnes, R., Nystrom, E., Sias, J., Patel, S., Navarro, N., and Hwu, W. M. 2003. Beating in-order stalls with \u201cflea-flicker\u201d two-pass pipelining. In Proceedings of the International Symposium on Microarchitecture. 387--398."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/1138035.1138038"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/300979.300995"},{"volume-title":"Proceedings of the International Symposium on Computer Architecture. 307--317","author":"Chappell R. S.","key":"e_1_2_1_6_1","unstructured":"Chappell, R. S., Tseng, F., Patt, Y. N., and Yoaz, A. 2002. Difficult-path branch prediction using subordinate microthreads. In Proceedings of the International Symposium on Computer Architecture. 307--317."},
{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/1555754.1555814"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1016\/S1383-7621(03)00042-0"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/379240.379248"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/263580.263597"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/PACT.2011.72"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/279358.279376"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/291069.291020"},{"key":"e_1_2_1_14_1","unstructured":"Heil, T. and Smith, J. E. 1996. Selective dual path execution. Tech. Rep., Department of Electrical and Computer Engineering, University of Wisconsin-Madison."},{"key":"e_1_2_1_15_1","unstructured":"Intel Corp. Intel Turbo Boost Technology in Intel Core microarchitecture (Nehalem) based processors. http:\/\/download.intel.com\/design\/processor\/applnots\/320354.pdf."},
{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/2155620.2155654"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2006.20"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2005.9"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/279358.279393"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/277830.277852"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/1122971.1122997"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/305138.305214"},{"volume-title":"Proceedings of the International Symposium on High-Performance Computer Architecture. 129--140","author":"Mutlu O.","key":"e_1_2_1_23_1","unstructured":"Mutlu, O., Stark, J., Wilkerson, C., and Patt, Y. N. 2003. Runahead execution: An alternative to very large instruction windows. In Proceedings of the International Symposium on High-Performance Computer Architecture. 129--140."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICPP.2009.60"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/PACT.2009.37"},{"key":"e_1_2_1_26_1","unstructured":"Renau, J. SESC simulator. http:\/\/sesc.sourceforge.net."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/1088149.1088173"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2005.13"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/223982.224451"},{"volume-title":"Proceedings of the International Symposium on High-Performance Computer Architecture. 
2--13","author":"Steffan J. G.","key":"e_1_2_1_30_1","unstructured":"Steffan, J. G. and Mowry, T. C. 1998. The potential for using thread-level data speculation to facilitate automatic parallelization. In Proceedings of the International Symposium on High-Performance Computer Architecture. 2--13."},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/378993.379247"},{"key":"e_1_2_1_32_1","unstructured":"Tarjan, D., Thoziyoor, S., and Jouppi, N. P. 2006. CACTI 4.0. Tech. Rep., Compaq Western Research Lab."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/232973.232993"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/1062261.1062310"},{"volume-title":"Proceedings of the International Symposium on High-Performance Computer Architecture. 367--378","author":"Xekalakis P.","key":"e_1_2_1_35_1","unstructured":"Xekalakis, P. and Cintra, M. 2010. Handling branches in TLS systems with multi-path execution. In Proceedings of the International Symposium on High-Performance Computer Architecture. 367--378."},
{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/1542275.1542333"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/379240.379246"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2355585.2355591","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2355585.2355591","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T20:01:15Z","timestamp":1750276875000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2355585.2355591"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,9]]},"references-count":37,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2012,9]]}},"alternative-id":["10.1145\/2355585.2355591"],"URL":"https:\/\/doi.org\/10.1145\/2355585.2355591","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2012,9]]},"assertion":[{"value":"2010-12-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2012-03-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2012-10-05","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}