{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:15:20Z","timestamp":1750306520377,"version":"3.41.0"},"reference-count":30,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2014,12,8]],"date-time":"2014-12-08T00:00:00Z","timestamp":1417996800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100000781","name":"European Research Council","doi-asserted-by":"publisher","award":["DAL No 267175"],"award-info":[{"award-number":["DAL No 267175"]}],"id":[{"id":"10.13039\/501100000781","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2015,1,9]]},"abstract":"<jats:p>ARM ISA-based processors are no longer low-cost, low-power processors. Nowadays, ARM ISA-based processor manufacturers are striving to implement medium-end to high-end processor cores, which implies implementing a state-of-the-art out-of-order execution engine. Unfortunately, providing efficient out-of-order execution on legacy ARM codes may be quite challenging due to guarded instructions.<\/jats:p>\n          <jats:p>Predicting the guarded instructions addresses the main serialization impact associated with guarded instructions execution and the multiple definition problem. Moreover, guard prediction allows one to use a global branch-and-guard history predictor to predict both branches and guards, often improving branch prediction accuracy. Unfortunately, such a global branch-and-guard history predictor requires the systematic use of guard predictions. 
In that case, poor guard prediction accuracy would lead to poor overall performance on some applications.<\/jats:p>\n          <jats:p>Building on top of recent advances in branch prediction and confidence estimation, we propose a hybrid branch-and-guard predictor, combining a global branch history component and global branch-and-guard history component. The potential gain or loss due to the systematic use of guard prediction is dynamically evaluated at runtime. Two computing modes are enabled: systematic guard prediction use and high-confidence-only guard prediction use.<\/jats:p>\n          <jats:p>Our experiments show that on most applications, an overwhelming majority of guarded instructions are predicted. Therefore, a simple but relatively inefficient hardware solution can be used to execute the few unpredicted guarded instructions. Significant performance benefits are observed on most applications, while applications with poorly predictable guards do not suffer from performance loss.<\/jats:p>","DOI":"10.1145\/2677037","type":"journal-article","created":{"date-parts":[[2014,12,8]],"date-time":"2014-12-08T16:17:14Z","timestamp":1418055434000},"page":"1-21","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Efficient Out-of-Order Execution of Guarded ISAs"],"prefix":"10.1145","volume":"11","author":[{"given":"Nathanael","family":"Pr\u00e9millieu","sequence":"first","affiliation":[{"name":"ARM Ltd., Cambridge, England"}]},{"given":"Andr\u00e9","family":"Seznec","sequence":"additional","affiliation":[{"name":"INRIA\/IRISA, Rennes, France"}]}],"member":"320","published-online":{"date-parts":[[2014,12,8]]},"reference":[{"doi-asserted-by":"publisher","key":"e_1_2_1_1_1","DOI":"10.1145\/567067.567085"},{"volume-title":"Alpha 21264 Microprocessor Hardware Reference Manual","unstructured":"Alpha. 1999. Alpha 21264 Microprocessor Hardware Reference Manual . Compaq Computer Corporation . Alpha. 1999. 
Alpha 21264 Microprocessor Hardware Reference Manual. Compaq Computer Corporation.","key":"e_1_2_1_2_1"},{"unstructured":"ARM. 2014. ARM Architecture Reference Manual. ARM v7-A and ARM v7-R edition.","key":"e_1_2_1_3_1"},{"unstructured":"Fabrice Bellard. 2012. QEMU. Retrieved from http:\/\/wiki.qemu.org\/Main_Page.","key":"e_1_2_1_4_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_5_1","DOI":"10.1145\/2024716.2024718"},{"doi-asserted-by":"publisher","key":"e_1_2_1_6_1","DOI":"10.1007\/BF03356749"},{"doi-asserted-by":"publisher","key":"e_1_2_1_7_1","DOI":"10.1145\/279358.279378"},{"doi-asserted-by":"publisher","key":"e_1_2_1_8_1","DOI":"10.1145\/782814.782840"},{"doi-asserted-by":"publisher","key":"e_1_2_1_9_1","DOI":"10.1145\/115372.115320"},{"key":"e_1_2_1_10_1","volume-title":"Adaptive information processing: An effective way to improve perceptron predictors. J. Instruction-Level Parallelism 7","author":"Gao Hongliang","year":"2005","unstructured":"Hongliang Gao and Huiyang Zhou. 2005. Adaptive information processing: An effective way to improve perceptron predictors. J. Instruction-Level Parallelism 7 (2005)."},{"key":"e_1_2_1_11_1","volume-title":"Faster and more flexible program phase analysis. J. Instruction Level Parallelism 7 (Sept","author":"Hamerly Greg","year":"2005","unstructured":"Greg Hamerly, Erez Perelman, Jeremy Lau, and Brad Calder. 2005. SimPoint 3.0: Faster and more flexible program phase analysis. J. Instruction Level Parallelism 7 (Sept. 
2005)."},{"unstructured":"Intel Corp. 2002. Intel Itanium Architecture Software Developers Manual. Volume 3: Instruction Set Reference. (2002).","key":"e_1_2_1_12_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_13_1","DOI":"10.1145\/571637.571639"},{"doi-asserted-by":"publisher","key":"e_1_2_1_14_1","DOI":"10.1109\/MICRO.2006.20"},{"doi-asserted-by":"publisher","key":"e_1_2_1_15_1","DOI":"10.1109\/MICRO.2005.38"},{"doi-asserted-by":"publisher","key":"e_1_2_1_16_1","DOI":"10.1145\/191995.192022"},{"doi-asserted-by":"publisher","key":"e_1_2_1_17_1","DOI":"10.1109\/HPCA.2007.346186"},{"doi-asserted-by":"publisher","key":"e_1_2_1_18_1","DOI":"10.1145\/1183401.1183410"},{"doi-asserted-by":"publisher","key":"e_1_2_1_19_1","DOI":"10.1109\/ISCA.2005.13"},{"unstructured":"Andr\u00e9 Seznec. 2007. The L-TAGE branch predictor. In J. Instruction Level Parallelism.","key":"e_1_2_1_20_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_21_1","DOI":"10.1145\/2155620.2155635"},{"doi-asserted-by":"publisher","key":"e_1_2_1_22_1","DOI":"10.5555\/2014698.2014879"},{"key":"e_1_2_1_23_1","volume-title":"A case for (partially) tagged Geometric History Length Branch Prediction. J. Instruction Level Parallelism (Feb","author":"Seznec Andr\u00e9","year":"2006","unstructured":"Andr\u00e9 Seznec and Pierre Michaud. 2006. A case for (partially) tagged Geometric History Length Branch Prediction. J. Instruction Level Parallelism (Feb. 
2006)."},{"doi-asserted-by":"publisher","key":"e_1_2_1_24_1","DOI":"10.5555\/822080.822828"},{"doi-asserted-by":"publisher","key":"e_1_2_1_25_1","DOI":"10.5555\/800052.801871"},{"key":"e_1_2_1_26_1","volume-title":"SPEC CPU2006","author":"SPEC.","year":"2006","unstructured":"SPEC. 2006 . SPEC CPU2006 . Retrieved from http:\/\/www.spec.org\/cpu 2006\/. SPEC. 2006. SPEC CPU2006. Retrieved from http:\/\/www.spec.org\/cpu2006\/."},{"doi-asserted-by":"publisher","key":"e_1_2_1_27_1","DOI":"10.1109\/MICRO.2008.4771812"},{"doi-asserted-by":"publisher","key":"e_1_2_1_28_1","DOI":"10.1145\/1089008.1089011"},{"doi-asserted-by":"publisher","key":"e_1_2_1_29_1","DOI":"10.1145\/192724.192753"},{"doi-asserted-by":"publisher","key":"e_1_2_1_30_1","DOI":"10.5555\/580550.876426"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2677037","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2677037","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T06:11:56Z","timestamp":1750227116000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2677037"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014,12,8]]},"references-count":30,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2015,1,9]]}},"alternative-id":["10.1145\/2677037"],"URL":"https:\/\/doi.org\/10.1145\/2677037","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2014,12,8]]},"assertion":[{"value":"2014-05-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2014-10-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2014-12-08","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}