{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,1]],"date-time":"2025-11-01T04:29:32Z","timestamp":1761971372450,"version":"build-2065373602"},"reference-count":63,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2012,9,1]],"date-time":"2012-09-01T00:00:00Z","timestamp":1346457600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2012,9]]},"abstract":"<jats:p>High-performance superscalar architectures used to exploit instruction level parallelism in single-thread applications have become too complex and power hungry for the multicore processors era. We propose a new architecture that uses multiple small latency-tolerant out-of-order cores to improve single-thread performance. Improving single-thread performance with multiple small out-of-order cores allows designers to place more of these cores on the same die. Consequently, emerging highly parallel applications can take full advantage of the multicore parallel hardware without sacrificing performance of inherently serial and hard to parallelize applications. Our architecture combines speculative multithreading (SpMT) with checkpoint recovery and continual flow pipeline architectures. It splits single-thread program execution into disjoint control and data threads that execute concurrently on multiple cooperating small and latency-tolerant out-of-order cores. Hence we call this style of execution Disjoint Out-of-Order Execution (DOE). DOE uses latency tolerance to overcome performance issues of SpMT caused by interthread data dependences. 
To evaluate this architecture, we have developed a microarchitecture performance model of DOE based on PTLSim, a simulation infrastructure of the x86 instruction set architecture. We evaluate the potential performance of DOE processor architecture using a simple heuristic to fork control independent threads in hardware at the target addresses of future procedure return instructions. Using applications from SpecInt 2000, we study DOE under ideal as well as realistic architectural constraints. We discuss the performance impact of key DOE architecture and application variables such as number of cores, interthread data dependences, intercore data communication delay, buffers capacity, and branch mispredictions. Without any DOE specific compiler optimizations, our results show that DOE outperforms conventional SpMT architectures by 15%, on average. We also show that DOE with four small cores can perform on average equally well to a large superscalar core, consuming about the same power. Most importantly, DOE improves throughput performance by a significant amount over a large superscalar core, up to 2.5 times, when running multitasking applications.<\/jats:p>","DOI":"10.1145\/2355585.2355592","type":"journal-article","created":{"date-parts":[[2012,10,2]],"date-time":"2012-10-02T13:50:00Z","timestamp":1349185800000},"page":"1-32","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":12,"title":["Disjoint out-of-order execution processor"],"prefix":"10.1145","volume":"9","author":[{"given":"Mageda","family":"Sharafeddine","sequence":"first","affiliation":[{"name":"American University of Beirut, Lebanon"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Komal","family":"Jothi","sequence":"additional","affiliation":[{"name":"American University of Beirut, 
Lebanon"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Haitham","family":"Akkary","sequence":"additional","affiliation":[{"name":"American University of Beirut, Lebanon"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2012,10,5]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/339647.339691"},{"key":"e_1_2_1_2_1","volume-title":"Compilers: Principles, Techniques and Tools","author":"Aho A. V.","year":"2006","unstructured":"Aho , A. V. , Sethi , R. , Ullman , J. D. , and Lam , M. S . 2006 . Compilers: Principles, Techniques and Tools , 2 nd Edition. Pearson Education Inc . Aho, A. V., Sethi, R., Ullman, J. D., and Lam, M. S. 2006. Compilers: Principles, Techniques and Tools, 2nd Edition. Pearson Education Inc.","edition":"2"},{"volume-title":"Proceedings of the 31st Annual ACM\/IEEE International Symposium on Microarchitecture. ACM, 226--236","author":"Akkary H.","key":"e_1_2_1_3_1","unstructured":"Akkary , H. and Driscoll , M . 1998. A dynamic multithreading processor . In Proceedings of the 31st Annual ACM\/IEEE International Symposium on Microarchitecture. ACM, 226--236 . Akkary, H. and Driscoll, M. 1998. A dynamic multithreading processor. In Proceedings of the 31st Annual ACM\/IEEE International Symposium on Microarchitecture. ACM, 226--236."},{"volume-title":"Proceedings of the 36th Annual ACM\/IEEE International Symposium on Microarchitecture. ACM, 423--434","author":"Akkary H.","key":"e_1_2_1_4_1","unstructured":"Akkary , H. , Rajwar , R. , and Srinivasan , S. T . 2003. Checkpoint processing and recovery: towards scalable large instruction window processors . In Proceedings of the 36th Annual ACM\/IEEE International Symposium on Microarchitecture. ACM, 423--434 . Akkary, H., Rajwar, R., and Srinivasan, S. T. 2003. Checkpoint processing and recovery: towards scalable large instruction window processors. 
In Proceedings of the 36th Annual ACM\/IEEE International Symposium on Microarchitecture. ACM, 423--434."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/1250662.1250717"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/1346281.1346298"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/264107.264209"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/1555754.1555814"},{"volume-title":"Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT '98)","author":"Chen M. K.","key":"e_1_2_1_9_1","unstructured":"Chen , M. K. and Olukotun , K . 1998. Exploiting method-level parallelism in single-threaded Java programs . In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT '98) . IEEE, 176--184. Chen, M. K. and Olukotun, K. 1998. Exploiting method-level parallelism in single-threaded Java programs. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT '98). IEEE, 176--184."},{"volume-title":"Proceedings of the 34th Annual ACM\/IEEE International Symposium on Microarchitecture. ACM, 4--15","author":"Cher C.","key":"e_1_2_1_10_1","unstructured":"Cher , C. and Vijaykumar , T. N . 2001. Skipper: A microarchitecture for exploiting control flow independence . In Proceedings of the 34th Annual ACM\/IEEE International Symposium on Microarchitecture. ACM, 4--15 . Cher, C. and Vijaykumar, T. N. 2001. Skipper: A microarchitecture for exploiting control flow independence. In Proceedings of the 34th Annual ACM\/IEEE International Symposium on Microarchitecture. ACM, 4--15."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/339647.363382"},{"volume-title":"Proceedings of the 8th International Symposium on High-Performance Computer Architecture. IEEE, 43--54","author":"Cintra M.","key":"e_1_2_1_12_1","unstructured":"Cintra , M. and Torrellas , J . 
2002. Eliminating squashes through learning cross-thread violations in speculative parallelization for multiprocessors . In Proceedings of the 8th International Symposium on High-Performance Computer Architecture. IEEE, 43--54 . Cintra, M. and Torrellas, J. 2002. Eliminating squashes through learning cross-thread violations in speculative parallelization for multiprocessors. In Proceedings of the 8th International Symposium on High-Performance Computer Architecture. IEEE, 43--54."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/279358.279378"},{"volume-title":"Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT '99)","author":"Codrescu L.","key":"e_1_2_1_14_1","unstructured":"Codrescu , L. and Wills , D. S . 1999. On dynamic speculative thread partitioning and the mem-slicing algorithm . In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT '99) . IEEE, 40--47. Codrescu, L. and Wills, D. S. 1999. On dynamic speculative thread partitioning and the mem-slicing algorithm. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT '99). IEEE, 40--47."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/12.902753"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2004.13"},{"key":"e_1_2_1_17_1","volume-title":"Tech. Rep. UPC-DAC-2002-39, Department of Computer Science, Barcelona, Spain.","author":"Cristal A.","year":"2002","unstructured":"Cristal , A. , Valero , M. , Llosa , J. , and Gonzalez , A . 2002 . Large Virtual ROBs by Processor Checkpointing . Tech. Rep. UPC-DAC-2002-39, Department of Computer Science, Barcelona, Spain. Cristal, A., Valero, M., Llosa, J., and Gonzalez, A. 2002. Large Virtual ROBs by Processor Checkpointing. Tech. Rep. 
UPC-DAC-2002-39, Department of Computer Science, Barcelona, Spain."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2005.53"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2004.10008"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/75277.75280"},{"volume-title":"Proceedings of the 1995 International Conference on Parallel Architectures and Compilation Techniques.","author":"Dubey P. K.","key":"e_1_2_1_21_1","unstructured":"Dubey , P. K. , O'brien K. , O'Brien , K. M. , and Barton , C . 1995. Single-program speculative multithreading (spmp) architecture: compiler-assisted fine-grained multithreading . In Proceedings of the 1995 International Conference on Parallel Architectures and Compilation Techniques. Dubey, P. K., O'brien K., O'Brien, K. M., and Barton, C. 1995. Single-program speculative multithreading (spmp) architecture: compiler-assisted fine-grained multithreading. In Proceedings of the 1995 International Conference on Parallel Architectures and Compilation Techniques."},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/139669.139703"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2005.46"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2004.10004"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/1815961.1815966"},{"volume-title":"Proceedings of the 4th International Symposium on High-Performance Computer Architecture. IEEE.","author":"Gopal S.","key":"e_1_2_1_27_1","unstructured":"Gopal , S. , Vijaykumar , T. N. , Smith , J. E. , and Sohi , G . 1998. Speculative versioning cache . In Proceedings of the 4th International Symposium on High-Performance Computer Architecture. IEEE. Gopal, S., Vijaykumar, T. N., Smith, J. E., and Sohi, G. 1998. Speculative versioning cache. In Proceedings of the 4th International Symposium on High-Performance Computer Architecture. 
IEEE."},{"volume-title":"Proceedings of the Symposium of VLSI Circuits.","author":"Hsu S.","key":"e_1_2_1_28_1","unstructured":"Hsu , S. , Chatterjee , B. , Sachdev , M. , Alvandpour , A. , Krishnamurthy , R. , and Borkar , S . 2003. A 90nm 6.5 GHz 256x64b dual supply register file with split decoder scheme . In Proceedings of the Symposium of VLSI Circuits. Hsu, S., Chatterjee, B., Sachdev, M., Alvandpour, A., Krishnamurthy, R., and Borkar, S. 2003. A 90nm 6.5 GHz 256x64b dual supply register file with split decoder scheme. In Proceedings of the Symposium of VLSI Circuits."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/1250662.1250686"},{"volume-title":"Proceedings of the 3rd International Symposium on High-Performance Computer Architecture. IEEE, 218--229","author":"Jacobson Q.","key":"e_1_2_1_30_1","unstructured":"Jacobson , Q. , Bennett , S. , Sharma , N. , and Smith , J. E . 1997. Control flow speculation in multiscalar processors . In Proceedings of the 3rd International Symposium on High-Performance Computer Architecture. IEEE, 218--229 . Jacobson, Q., Bennett, S., Sharma, N., and Smith, J. E. 1997. Control flow speculation in multiscalar processors. In Proceedings of the 3rd International Symposium on High-Performance Computer Architecture. IEEE, 218--229."},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/115952.115957"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.5555\/800052.801868"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/139669.139702"},{"volume-title":"Proceedings of the 29th Annual International Symposium on Computer Architecture. ACM, 59--70","author":"Lebeck A. R.","key":"e_1_2_1_34_1","unstructured":"Lebeck , A. R. , Koppanalil , J. , Li , T. , Patwardhan , J. , and Rotenberg , E . 2002. A large, fast instruction window for tolerating cache misses . In Proceedings of the 29th Annual International Symposium on Computer Architecture. ACM, 59--70 . Lebeck, A. 
R., Koppanalil, J., Li, T., Patwardhan, J., and Rotenberg, E. 2002. A large, fast instruction window for tolerating cache misses. In Proceedings of the 29th Annual International Symposium on Computer Architecture. ACM, 59--70."},{"volume-title":"Proceedings of the 8th International Symposium on High-Performance Computer Architecture. IEEE, 55--64","author":"Marcuello P.","key":"e_1_2_1_36_1","unstructured":"Marcuello , P. and Gonzalez , A . 2002. Thread-spawning schemes for speculative multithreading . In Proceedings of the 8th International Symposium on High-Performance Computer Architecture. IEEE, 55--64 . Marcuello, P. and Gonzalez, A. 2002. Thread-spawning schemes for speculative multithreading. In Proceedings of the 8th International Symposium on High-Performance Computer Architecture. IEEE, 55--64."},{"volume-title":"Proceedings of the 14th International Symposium on Parallel and Distributed Processing. IEEE, 595--601","author":"Marcuello P.","key":"e_1_2_1_37_1","unstructured":"Marcuello , P. and Gonzalez , A . 2000. A quantitative assessment of thread-level speculation techniques . In Proceedings of the 14th International Symposium on Parallel and Distributed Processing. IEEE, 595--601 . Marcuello, P. and Gonzalez, A. 2000. A quantitative assessment of thread-level speculation techniques. In Proceedings of the 14th International Symposium on Parallel and Distributed Processing. IEEE, 595--601."},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/277830.277850"},{"volume-title":"Proceedings of the 32nd Annual ACM\/IEEE International Symposium on Microarchitecture. ACM, 230--236","author":"Marcuello P.","key":"e_1_2_1_39_1","unstructured":"Marcuello , P. , Tubella , J. , and Gonzalez , A . 1999. Value prediction for speculative multithreaded architectures . In Proceedings of the 32nd Annual ACM\/IEEE International Symposium on Microarchitecture. ACM, 230--236 . Marcuello, P., Tubella, J., and Gonzalez, A. 1999. 
Value prediction for speculative multithreaded architectures. In Proceedings of the 32nd Annual ACM\/IEEE International Symposium on Microarchitecture. ACM, 230--236."},{"volume-title":"Proceedings of the 35th Annual ACM\/IEEE International Symposium on Microarchitecture. ACM, 3--14","author":"Martinez J. F.","key":"e_1_2_1_40_1","unstructured":"Martinez , J. F. , Renau , J. , Huang , M. C. , Prvulovic , M. , and Torrellas , J . 2002. Cherry: Checkpointed early resource recycling in out-of-order microprocessors . In Proceedings of the 35th Annual ACM\/IEEE International Symposium on Microarchitecture. ACM, 3--14 . Martinez, J. F., Renau, J., Huang, M. C., Prvulovic, M., and Torrellas, J. 2002. Cherry: Checkpointed early resource recycling in out-of-order microprocessors. In Proceedings of the 35th Annual ACM\/IEEE International Symposium on Microarchitecture. ACM, 3--14."},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/264107.264189"},{"volume-title":"Proceedings of the 9th International Symposium on High-Performance Computer Architecture. IEEE.","author":"Mutlu O.","key":"e_1_2_1_43_1","unstructured":"Mutlu , O. , Stark , J. , Wilkerson , C. , and Patt , Y. N . 2003. Runahead execution: An alternative to very large instruction windows for out-of-order processors . In Proceedings of the 9th International Symposium on High-Performance Computer Architecture. IEEE. Mutlu, O., Stark, J., Wilkerson, C., and Patt, Y. N. 2003. Runahead execution: An alternative to very large instruction windows for out-of-order processors. In Proceedings of the 9th International Symposium on High-Performance Computer Architecture. IEEE."},{"volume-title":"Proceedings of the IEEE 26th International Conference on Computer Design. IEEE, 384--389","author":"Nekkalapu S.","key":"e_1_2_1_44_1","unstructured":"Nekkalapu , S. , Akkary , H. , Jothi , K. , Retnamma , R. , and Song , X . 2008. A simple latency tolerant processor . 
In Proceedings of the IEEE 26th International Conference on Computer Design. IEEE, 384--389 . Nekkalapu, S., Akkary, H., Jothi, K., Retnamma, R., and Song, X. 2008. A simple latency tolerant processor. In Proceedings of the IEEE 26th International Conference on Computer Design. IEEE, 384--389."},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.1984.1676371"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/305138.305155"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/237090.237140"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/40.491458"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/309758.309771"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/1065944.1065964"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/1065010.1065043"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1109\/T-C.1972.223514"},{"volume-title":"Proceedings of the 30th Annual ACM\/IEEE International Symposium on Microarchitecture. ACM, 138--148","author":"Rotenberg E.","key":"e_1_2_1_53_1","unstructured":"Rotenberg , E. , Jacobson , Q. , Sazeides , Y. , and Smith , J. E . 1997. Trace processors . In Proceedings of the 30th Annual ACM\/IEEE International Symposium on Microarchitecture. ACM, 138--148 . Rotenberg, E., Jacobson, Q., Sazeides, Y., and Smith, J. E. 1997. Trace processors. In Proceedings of the 30th Annual ACM\/IEEE International Symposium on Microarchitecture. ACM, 138--148."},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1145\/223982.224451"},{"volume-title":"Proceedings of the IEEE 22nd International Conference on Computer Design. IEEE, 360--367","author":"Srinivasan S. T.","key":"e_1_2_1_55_1","unstructured":"Srinivasan , S. T. , Akkary , H. , Holman , T. , and Lai , K . 2004. A minimal dual-core speculative multithreading architecture . 
In Proceedings of the IEEE 22nd International Conference on Computer Design. IEEE, 360--367 . Srinivasan, S. T., Akkary, H., Holman, T., and Lai, K. 2004. A minimal dual-core speculative multithreading architecture. In Proceedings of the IEEE 22nd International Conference on Computer Design. IEEE, 360--367."},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1145\/1024393.1024407"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/339647.339650"},{"volume-title":"Proceedings of the 8th International Symposium on High-Performance Computer Architecture. IEEE, 65--74","author":"Steffan J.","key":"e_1_2_1_58_1","unstructured":"Steffan , J. , Colohan , C. B. , Zhai , A. , and Mowry , T. C . 2002. Improving value communication for thread-level speculation . In Proceedings of the 8th International Symposium on High-Performance Computer Architecture. IEEE, 65--74 . Steffan, J., Colohan, C. B., Zhai, A., and Mowry, T. C. 2002. Improving value communication for thread-level speculation. In Proceedings of the 8th International Symposium on High-Performance Computer Architecture. IEEE, 65--74."},{"volume-title":"Proceedings of the 4th International Symposium on High-Performance Computer Architecture. IEEE, 2--13","author":"Steffan J.","key":"e_1_2_1_59_1","unstructured":"Steffan J. and Mowry , T. C . 1998. The potential for using thread-level data speculation to facilitate automatic parallelization . In Proceedings of the 4th International Symposium on High-Performance Computer Architecture. IEEE, 2--13 . Steffan J. and Mowry, T. C. 1998. The potential for using thread-level data speculation to facilitate automatic parallelization. In Proceedings of the 4th International Symposium on High-Performance Computer Architecture. IEEE, 2--13."},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1109\/12.795219"},{"volume-title":"Proceedings of the 4th International Symposium on High-Performance Computer Architecture. 
IEEE, 24--35","author":"Tsai J.-Y.","key":"e_1_2_1_61_1","unstructured":"Tsai , J.-Y. , Jiang , Z. , Ness , E. , and Yew , P . -C. 1998. Performance Study of a concurrent multithreaded processor . In Proceedings of the 4th International Symposium on High-Performance Computer Architecture. IEEE, 24--35 . Tsai, J.-Y., Jiang, Z., Ness, E., and Yew, P.-C. 1998. Performance Study of a concurrent multithreaded processor. In Proceedings of the 4th International Symposium on High-Performance Computer Architecture. IEEE, 24--35."},{"volume-title":"Proceedings of the 31st Annual ACM\/IEEE International Symposium on Microarchitecture. ACM, 81--92","author":"Vijaykumar T. N.","key":"e_1_2_1_62_1","unstructured":"Vijaykumar , T. N. , and Sohi , G . 1998. Task selection for a multiscalar processor . In Proceedings of the 31st Annual ACM\/IEEE International Symposium on Microarchitecture. ACM, 81--92 . Vijaykumar, T. N., and Sohi, G. 1998. Task selection for a multiscalar processor. In Proceedings of the 31st Annual ACM\/IEEE International Symposium on Microarchitecture. ACM, 81--92."},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1145\/1815961.1815965"},{"key":"e_1_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2007.363733"},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1145\/605397.605416"},{"volume-title":"Proceedings of the 35th Annual ACM\/IEEE International Symposium on Microarchitecture.","author":"Zilles C.","key":"e_1_2_1_66_1","unstructured":"Zilles , C. and Sohi , G . 2002. Master\/slave speculative parallelization . In Proceedings of the 35th Annual ACM\/IEEE International Symposium on Microarchitecture. Zilles, C. and Sohi, G. 2002. Master\/slave speculative parallelization. 
In Proceedings of the 35th Annual ACM\/IEEE International Symposium on Microarchitecture."}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2355585.2355592","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2355585.2355592","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T20:01:15Z","timestamp":1750276875000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2355585.2355592"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,9]]},"references-count":63,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2012,9]]}},"alternative-id":["10.1145\/2355585.2355592"],"URL":"https:\/\/doi.org\/10.1145\/2355585.2355592","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2012,9]]},"assertion":[{"value":"2010-07-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2012-05-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2012-10-05","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}