{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,13]],"date-time":"2026-02-13T13:04:58Z","timestamp":1770987898803,"version":"3.50.1"},"reference-count":44,"publisher":"Association for Computing Machinery (ACM)","issue":"5","license":[{"start":{"date-parts":[[2009,6,1]],"date-time":"2009-06-01T00:00:00Z","timestamp":1243814400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Program. Lang. Syst."],"published-print":{"date-parts":[[2009,6]]},"abstract":"<jats:p>We have built a runtime compilation system that takes unmodified sequential binaries and improves their performance on off-the-shelf multiprocessors using dynamic vectorization and loop-level parallelization techniques. Our system, Azure, is purely software based and requires no specific hardware support for speculative thread execution, yet it is able to break even in most cases; that is, the achieved speedup exceeds the cost of runtime monitoring and compilation, often by significant amounts.<\/jats:p>\n          <jats:p>\n            Key to this remarkable performance is an offline preprocessing step that extracts a\n            <jats:italic>mostly correct<\/jats:italic>\n            control flow graph (CFG) from the binary program ahead of time. This statically obtained CFG is incomplete in that it may be missing some edges corresponding to computed branches. We describe how such additional control flow edges are discovered and handled at runtime, so that an incomplete static analysis never leads to an incorrect optimization result.\n          <\/jats:p>\n          <jats:p>\n            The availability of a\n            <jats:italic>mostly correct<\/jats:italic>\n            CFG enables us to statically partition a binary executable into single-entry multiple-exit regions and to identify potential parallelization candidates ahead of execution. Program regions that are not candidates for parallelization can thereby be excluded completely from runtime monitoring and dynamic recompilation. Azure's extremely low overhead is a direct consequence of this design.\n          <\/jats:p>","DOI":"10.1145\/1538917.1538918","type":"journal-article","created":{"date-parts":[[2009,6,30]],"date-time":"2009-06-30T13:10:17Z","timestamp":1246367417000},"page":"1-46","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["Mostly static program partitioning of binary executables"],"prefix":"10.1145","volume":"31","author":[{"given":"Efe","family":"Yardimci","sequence":"first","affiliation":[{"name":"University of California, Irvine"}]},{"given":"Michael","family":"Franz","sequence":"additional","affiliation":[{"name":"University of California, Irvine"}]}],"member":"320","published-online":{"date-parts":[[2009,7,3]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Proceedings of the 31st Annual International Symposium on Microarchitecture. ACM Press, 226--236","author":"Akkary H.","unstructured":"]] Akkary , H. and Driscoll , M. A . 1998. A dynamic multithreading processor . In Proceedings of the 31st Annual International Symposium on Microarchitecture. ACM Press, 226--236 . ]]Akkary, H. and Driscoll, M. A. 1998. A dynamic multithreading processor. In Proceedings of the 31st Annual International Symposium on Microarchitecture. ACM Press, 226--236."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/349299.349303"},{"key":"e_1_2_1_3_1","volume-title":"Proceedings of the Conference on Compiler Construction. Lecture Notes in Computer Science","volume":"2985","author":"Balakrishnan G.","unstructured":"]] Balakrishnan , G. and Reps , T . 2004. Analyzing memory accesses in x86 executables . In Proceedings of the Conference on Compiler Construction. Lecture Notes in Computer Science , vol. 2985 , Springer Verlag, 5--23. ]]Balakrishnan, G. and Reps, T. 2004. Analyzing memory accesses in x86 executables. In Proceedings of the Conference on Compiler Construction. Lecture Notes in Computer Science, vol. 2985, Springer Verlag, 5--23."},{"key":"e_1_2_1_4_1","volume-title":"Proceedings of the 36th International Symposium on Micro-architecture. IEEE, 191--201","author":"Baraz L.","unstructured":"]] Baraz , L. , Devor , T. , Etzion , O. , Goldenberg , S. , Skaletsky , A. , Wang , Y. , and Zemach , Y . 2003. IA-32 Execution Layer: A two-phase dynamic translator designed to support IA-32 applications on Itanium-based systems . In Proceedings of the 36th International Symposium on Micro-architecture. IEEE, 191--201 . ]]Baraz, L., Devor, T., Etzion, O., Goldenberg, S., Skaletsky, A., Wang, Y., and Zemach, Y. 2003. IA-32 Execution Layer: A two-phase dynamic translator designed to support IA-32 applications on Itanium-based systems. In Proceedings of the 36th International Symposium on Micro-architecture. IEEE, 191--201."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1177\/109434200001400404"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/6.402166"},{"key":"e_1_2_1_7_1","volume-title":"Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing. Springer-Verlag, 1--20","author":"Carlisle M. C.","unstructured":"]] Carlisle , M. C. , Rogers , A. , Reppy , J. H. , and Hendren , L. J . 1994. Early experiences with Olden . In Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing. Springer-Verlag, 1--20 . ]]Carlisle, M. C., Rogers, A., Reppy, J. H., and Hendren, L. J. 1994. Early experiences with Olden. In Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing. Springer-Verlag, 1--20."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/503032.503045"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/40.671403"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1002\/spe.4380250706"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/781498.781501"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.5555\/1128020.1128573"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/264107.264126"},{"key":"e_1_2_1_14_1","volume-title":"Proceedings of the International Conference on Computer Design.","author":"Ebcio\u011flu K.","unstructured":"]] Ebcio\u011flu , K. , Fritts , J. , Kosonocky , S. , Gschwind , M. , Altman , E. , Kailas , K. , and Bright , T . 1998. An eight-issue tree VLIW processor for dynamic binary translation . In Proceedings of the International Conference on Computer Design. ]]Ebcio\u011flu, K., Fritts, J., Kosonocky, S., Gschwind, M., Altman, E., Kailas, K., and Bright, T. 1998. An eight-issue tree VLIW processor for dynamic binary translation. In Proceedings of the International Conference on Computer Design."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.1981.1675827"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/301618.301683"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/40.848474"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF01205185"},{"key":"e_1_2_1_19_1","unstructured":"]]Kagan M. Gochman S. Orenstien D. and Lin D. 1997. MMX micro-architecture of Pentium processors with MMX technology and Pentium II micro-processors. Intel Techn. J. 8.  ]]Kagan M. Gochman S. Orenstien D. and Lin D. 1997. MMX micro-architecture of Pentium processors with MMX technology and Pentium II micro-processors. Intel Techn. J. 8."},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/12.931893"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/778559.778562"},{"key":"e_1_2_1_22_1","volume-title":"The technology behind Crusoe processors. White Paper","author":"Klaiber A.","unstructured":"]] Klaiber , A. 2000. The technology behind Crusoe processors. White Paper , Transmeta Corp . ]]Klaiber, A. 2000. The technology behind Crusoe processors. White Paper, Transmeta Corp."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/277830.277852"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/12.795218"},{"key":"e_1_2_1_25_1","unstructured":"]]Krishnan V. S. 1998. Speculative multithreading architectures. Tech. rep. UIUCDCS-R-98-2048 UIUC.   ]]Krishnan V. S. 1998. Speculative multithreading architectures. Tech. rep. UIUCDCS-R-98-2048 UIUC."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1002\/spe.4380240204"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/301618.301667"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/305138.305150"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/40.755466"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/237090.237140"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/1065010.1065043"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/71.752782"},{"key":"e_1_2_1_34_1","volume-title":"Proceedings of the International Symposium on Microarchitecture. 138--148","author":"Rotenberg E.","unstructured":"]] Rotenberg , E. , Jacobson , Q. , Sazeides , Y. , and Smith , J . 1997. Trace processors . In Proceedings of the International Symposium on Microarchitecture. 138--148 . ]]Rotenberg, E., Jacobson, Q., Sazeides, Y., and Smith, J. 1997. Trace processors. In Proceedings of the International Symposium on Microarchitecture. 138--148."},{"key":"e_1_2_1_35_1","volume-title":"Proceedings of the SSBA Symposium on Image Analysis.","author":"Skoglund J.","unstructured":"]] Skoglund , J. and Felsberg , M . 2005. Fast image processing using SSE2 . In Proceedings of the SSBA Symposium on Image Analysis. ]]Skoglund, J. and Felsberg, M. 2005. Fast image processing using SSE2. In Proceedings of the SSBA Symposium on Image Analysis."},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/285930.286010"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/989393.989446"},{"key":"e_1_2_1_38_1","unstructured":"]]Thakkar S. T. and Huff T. 1999. The Internet Streaming SIMD Extensions. Intel Tech. J. 8.  ]]Thakkar S. T. and Huff T. 1999. The Internet Streaming SIMD Extensions. Intel Tech. J. 8."},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/12.795219"},{"key":"e_1_2_1_40_1","volume-title":"-C","author":"Tsai J.-Y.","year":"1996","unstructured":"]] Tsai , J.-Y. and Yew , P . -C . 1996 . The superthreaded architecture: Thread pipelining with run time data dependence checking and control speculation. In Proceedings of Parallel Architectures and Compilation Techniques . 35--46. ]]Tsai, J.-Y. and Yew, P.-C. 1996. The superthreaded architecture: Thread pipelining with run time data dependence checking and control speculation. In Proceedings of Parallel Architectures and Compilation Techniques. 35--46."},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/223982.224449"},{"key":"e_1_2_1_42_1","volume-title":"Proceedings of the International Conference on Parallel Processing. IEEE Computer Society, Washington, 163","author":"Voss M. J.","unstructured":"]] Voss , M. J. and Eigenmann , R . 2000. Adapt: Automated de-coupled adaptive program transformation . In Proceedings of the International Conference on Parallel Processing. IEEE Computer Society, Washington, 163 . ]]Voss, M. J. and Eigenmann, R. 2000. Adapt: Automated de-coupled adaptive program transformation. In Proceedings of the International Conference on Parallel Processing. IEEE Computer Society, Washington, 163."},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/568014.379583"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/1128022.1128040"},{"key":"e_1_2_1_46_1","volume-title":"Proceedings of the 7th International Symposium on High-Performance Computer Architecture.","author":"Zilles C.","unstructured":"]] Zilles , C. and Sohi , G . 2001. A programmable co-processor for profiling . In Proceedings of the 7th International Symposium on High-Performance Computer Architecture. ]]Zilles, C. and Sohi, G. 2001. A programmable co-processor for profiling. In Proceedings of the 7th International Symposium on High-Performance Computer Architecture."}],"container-title":["ACM Transactions on Programming Languages and Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1538917.1538918","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1538917.1538918","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T20:26:54Z","timestamp":1750278414000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1538917.1538918"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2009,6]]},"references-count":44,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2009,6]]}},"alternative-id":["10.1145\/1538917.1538918"],"URL":"https:\/\/doi.org\/10.1145\/1538917.1538918","relation":{},"ISSN":["0164-0925","1558-4593"],"issn-type":[{"value":"0164-0925","type":"print"},{"value":"1558-4593","type":"electronic"}],"subject":[],"published":{"date-parts":[[2009,6]]},"assertion":[{"value":"2006-08-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2008-11-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2009-07-03","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}