{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:55:29Z","timestamp":1750308929207,"version":"3.41.0"},"reference-count":26,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2004,12,1]],"date-time":"2004-12-01T00:00:00Z","timestamp":1101859200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2004,12]]},"abstract":"<jats:p>Large instruction window processors achieve high performance by exposing large amounts of instruction level parallelism. However, accessing large hardware structures typically required to buffer and process such instruction window sizes significantly degrades the cycle time. This paper proposes a novel checkpoint processing and recovery (CPR) microarchitecture, and shows how to implement a large instruction window processor without requiring large structures, thus permitting a high clock frequency. We focus on four critical aspects of a microarchitecture: (1) scheduling instructions, (2) recovering from branch mispredicts, (3) buffering a large number of stores and forwarding data from stores to any dependent load, and (4) reclaiming physical registers. While scheduling window size is important, we show the performance of large instruction windows to be more sensitive to the other three design issues. Our CPR proposal incorporates novel microarchitectural schemes for addressing these design issues: a selective checkpoint mechanism for recovering from mispredicts, a hierarchical store queue organization for fast store-load forwarding, and an effective algorithm for aggressive physical register reclamation. 
Our proposals allow a processor to realize performance gains due to instruction windows of thousands of instructions without requiring large cycle-critical hardware structures.<\/jats:p>","DOI":"10.1145\/1044823.1044826","type":"journal-article","created":{"date-parts":[[2005,8,1]],"date-time":"2005-08-01T17:31:42Z","timestamp":1122917502000},"page":"418-444","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":19,"title":["An analysis of a resource efficient checkpoint architecture"],"prefix":"10.1145","volume":"1","author":[{"given":"Haitham","family":"Akkary","sequence":"first","affiliation":[{"name":"Intel Corporation, Hillsboro, OR"}]},{"given":"Ravi","family":"Rajwar","sequence":"additional","affiliation":[{"name":"Intel Corporation, Hillsboro, OR"}]},{"given":"Srikanth T.","family":"Srinivasan","sequence":"additional","affiliation":[{"name":"Intel Corporation, Hillsboro, OR"}]}],"member":"320","published-online":{"date-parts":[[2004,12]]},"reference":[{"volume-title":"Proceedings of the 36th International Symposium on Microarchitecture.","author":"Akkary H.","key":"e_1_2_1_1_1","unstructured":"Akkary , H. , Rajwar , R. , and Srinivasan , S. T . 2003. Checkpoint processing and recovery: Towards scalable large instruction window processors . In Proceedings of the 36th International Symposium on Microarchitecture. Akkary, H., Rajwar, R., and Srinivasan, S. T. 2003. Checkpoint processing and recovery: Towards scalable large instruction window processors. In Proceedings of the 36th International Symposium on Microarchitecture."},{"volume-title":"Proceedings of the 34th International Symposium on Microarchitecture. 237--249","author":"Balasubramonian R.","key":"e_1_2_1_2_1","unstructured":"Balasubramonian , R. , Dwarkadas , S. , and Albonesi , D . 2001. Reducing the complexity of the register file in dynamic superscalar processors . In Proceedings of the 34th International Symposium on Microarchitecture. 
237--249 . Balasubramonian, R., Dwarkadas, S., and Albonesi, D. 2001. Reducing the complexity of the register file in dynamic superscalar processors. In Proceedings of the 34th International Symposium on Microarchitecture. 237--249."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/362686.362692"},{"volume-title":"Proceedings of the 34th International Symposium on Microarchitecture. 204--213","author":"Brown M. D.","key":"e_1_2_1_4_1","unstructured":"Brown , M. D. , Stark , J. , and Patt , Y. N . 2001. Select-free instruction scheduling logic . In Proceedings of the 34th International Symposium on Microarchitecture. 204--213 . Brown, M. D., Stark, J., and Patt, Y. N. 2001. Select-free instruction scheduling logic. In Proceedings of the 34th International Symposium on Microarchitecture. 204--213."},{"volume-title":"Proceedings of the 25th International Symposium on Microarchitecture. 292--300","author":"Capitanio A.","key":"e_1_2_1_5_1","unstructured":"Capitanio , A. , Dutt , N. , and Nicolau , A . 1992. Partitioned register files for VLIWs: A preliminary analysis of tradeoffs . In Proceedings of the 25th International Symposium on Microarchitecture. 292--300 . Capitanio, A., Dutt, N., and Nicolau, A. 1992. Partitioned register files for VLIWs: A preliminary analysis of tradeoffs. In Proceedings of the 25th International Symposium on Microarchitecture. 292--300."},{"key":"e_1_2_1_6_1","doi-asserted-by":"crossref","unstructured":"Cristal A. Martinez J. F. Llosa J. and Valero M. 2003. A case for resource-conscious out-of-order processors. In Computer Architecture Letters.  Cristal A. Martinez J. F. Llosa J. and Valero M. 2003. A case for resource-conscious out-of-order processors. In Computer Architecture Letters.","DOI":"10.1145\/1152923.1024296"},{"key":"e_1_2_1_7_1","volume-title":"Proceedings of the 10th International Symposium on High-Performance Computer Architecture. 48--59","author":"Cristal A.","year":"2004","unstructured":"Cristal , A. 
, Ortega , D. , Llosa , J. , and Valero , M . 2004. Out-of-order commit processors . In Proceedings of the 10th International Symposium on High-Performance Computer Architecture. 48--59 . 10.1109\/HPCA. 2004 .10008 Cristal, A., Ortega, D., Llosa, J., and Valero, M. 2004. Out-of-order commit processors. In Proceedings of the 10th International Symposium on High-Performance Computer Architecture. 48--59. 10.1109\/HPCA.2004.10008"},{"key":"e_1_2_1_8_1","volume-title":"Tech. Rep. UPC-DAC-2002-39, Department of Computer Science, Barcelona, Spain. July.","author":"Cristal A.","year":"2002","unstructured":"Cristal , A. , Valero , M. , Llosa , J.-L. , and Gonzalez , A . 2002 . Large Virtual ROBs by Processor Checkpointing . Tech. Rep. UPC-DAC-2002-39, Department of Computer Science, Barcelona, Spain. July. Cristal, A., Valero, M., Llosa, J.-L., and Gonzalez, A. 2002. Large Virtual ROBs by Processor Checkpointing. Tech. Rep. UPC-DAC-2002-39, Department of Computer Science, Barcelona, Spain. July."},{"volume-title":"Proceedings of the 27th Annual International Symposium on Computer Architecture. ACM Press. 316--325","author":"Cruz J.-L.","key":"e_1_2_1_9_1","unstructured":"Cruz , J.-L. , Gonzalez , A. , Valero , M. , and Topham , N. P . 2000. Multiple-banked register file architectures . In Proceedings of the 27th Annual International Symposium on Computer Architecture. ACM Press. 316--325 . 10.1145\/339647.339708 Cruz, J.-L., Gonzalez, A., Valero, M., and Topham, N. P. 2000. Multiple-banked register file architectures. In Proceedings of the 27th Annual International Symposium on Computer Architecture. ACM Press. 316--325. 10.1145\/339647.339708"},{"volume-title":"Proceedings of the 1997 International Conference on Supercomputing. 68--75","author":"Dundas J.","key":"e_1_2_1_10_1","unstructured":"Dundas , J. and Mudge , T . 1997. Improving data cache performance by pre-executing instructions under a cache miss . 
In Proceedings of the 1997 International Conference on Supercomputing. 68--75 . 10.1145\/263580.263597 Dundas, J. and Mudge, T. 1997. Improving data cache performance by pre-executing instructions under a cache miss. In Proceedings of the 1997 International Conference on Supercomputing. 68--75. 10.1145\/263580.263597"},{"key":"e_1_2_1_11_1","unstructured":"Hinton G. Sager D. Upton M. Boggs D. Carmean D. Kyker A. and Roussel P. 2001. The microarchitecture of the Pentium 4 processor. Intel Technology Journal.  Hinton G. Sager D. Upton M. Boggs D. Carmean D. Kyker A. and Roussel P. 2001. The microarchitecture of the Pentium 4 processor. Intel Technology Journal."},{"volume-title":"Proceedings of the 14th Annual International Symposium on Computer Architecture. 18--26","author":"Hwu W. W.","key":"e_1_2_1_12_1","unstructured":"Hwu , W. W. and Patt , Y. N . 1987. Checkpoint repair for out-of-order execution machines . In Proceedings of the 14th Annual International Symposium on Computer Architecture. 18--26 . 10.1145\/30350.30353 Hwu, W. W. and Patt, Y. N. 1987. Checkpoint repair for out-of-order execution machines. In Proceedings of the 14th Annual International Symposium on Computer Architecture. 18--26. 10.1145\/30350.30353"},{"volume-title":"Proceedings of the 29th International Symposium on Microarchitecture. 142--152","author":"Jacobsen E.","key":"e_1_2_1_13_1","unstructured":"Jacobsen , E. , Rotenberg , E. , and Smith , J. E . 1996. Assigning confidence to conditional branch predictions . In Proceedings of the 29th International Symposium on Microarchitecture. 142--152 . Jacobsen, E., Rotenberg, E., and Smith, J. E. 1996. Assigning confidence to conditional branch predictions. In Proceedings of the 29th International Symposium on Microarchitecture. 142--152."},{"volume-title":"Workshop on Memory Performance Issues.","author":"Karkhanis T.","key":"e_1_2_1_14_1","unstructured":"Karkhanis , T. and Smith , J. E . 2002. A day in the life of a data cache miss . 
In Workshop on Memory Performance Issues. Karkhanis, T. and Smith, J. E. 2002. A day in the life of a data cache miss. In Workshop on Memory Performance Issues."},{"volume-title":"Proceedings of the 29th Annual International Symposium on Computer Architecture. 59--70","author":"Lebeck A. R.","key":"e_1_2_1_15_1","unstructured":"Lebeck , A. R. , Koppanalil , J. , Li , T. , Patwardhan , J. , and Rotenberg , E . 2002. A large, fast instruction window for tolerating cache misses . In Proceedings of the 29th Annual International Symposium on Computer Architecture. 59--70 . Lebeck, A. R., Koppanalil, J., Li, T., Patwardhan, J., and Rotenberg, E. 2002. A large, fast instruction window for tolerating cache misses. In Proceedings of the 29th Annual International Symposium on Computer Architecture. 59--70."},{"volume-title":"Proceedings of the 42nd IEEE Computer Society International Conference (COMPCON). 28--36","author":"Leibholz D.","key":"e_1_2_1_16_1","unstructured":"Leibholz , D. and Razdan , R . 1997. The Alpha 21264: A 500 MHz out-of-order execution microprocessor . In Proceedings of the 42nd IEEE Computer Society International Conference (COMPCON). 28--36 . Leibholz, D. and Razdan, R. 1997. The Alpha 21264: A 500 MHz out-of-order execution microprocessor. In Proceedings of the 42nd IEEE Computer Society International Conference (COMPCON). 28--36."},{"volume-title":"Proceedings of the 35th International Symposium on Microarchitecture.","author":"Mart\u00ednez J. F.","key":"e_1_2_1_17_1","unstructured":"Mart\u00ednez , J. F. , Renau , J. , Huang , M. C. , Prvulovic , M. , and Torrellas , J . 2002. Cherry: Checkpointed early resource recycling in out-of-order microprocessors . In Proceedings of the 35th International Symposium on Microarchitecture. Mart\u00ednez, J. F., Renau, J., Huang, M. C., Prvulovic, M., and Torrellas, J. 2002. Cherry: Checkpointed early resource recycling in out-of-order microprocessors. 
In Proceedings of the 35th International Symposium on Microarchitecture."},{"key":"e_1_2_1_18_1","volume-title":"Proceedings of the 2003 International Symposium on Low Power Electronics and Design. ACM Press","author":"Moshovos A.","year":"2003","unstructured":"Moshovos , A. 2003 . Checkpointing alternatives for high performance, power-aware processors . In Proceedings of the 2003 International Symposium on Low Power Electronics and Design. ACM Press , New York, 318--321. 10.1145\/871506.871585 Moshovos, A. 2003. Checkpointing alternatives for high performance, power-aware processors. In Proceedings of the 2003 International Symposium on Low Power Electronics and Design. ACM Press, New York, 318--321. 10.1145\/871506.871585"},{"volume-title":"Proceedings of the 30th International Symposium on Microarchitecture. 235--245","author":"Moshovos A.","key":"e_1_2_1_19_1","unstructured":"Moshovos , A. and Sohi , G. S . 1997. Streamlining inter-operation memory communication via data dependence prediction . In Proceedings of the 30th International Symposium on Microarchitecture. 235--245 . Moshovos, A. and Sohi, G. S. 1997. Streamlining inter-operation memory communication via data dependence prediction. In Proceedings of the 30th International Symposium on Microarchitecture. 235--245."},{"volume-title":"Proceedings of the 26th International Symposium on Microarchitecture. 202--213","author":"Moudgill M.","key":"e_1_2_1_20_1","unstructured":"Moudgill , M. , Pingali , K. , and Vassiliadis , S . 1993. Register renaming and dynamic speculation: an alternative approach . In Proceedings of the 26th International Symposium on Microarchitecture. 202--213 . Moudgill, M., Pingali, K., and Vassiliadis, S. 1993. Register renaming and dynamic speculation: an alternative approach. In Proceedings of the 26th International Symposium on Microarchitecture. 202--213."},{"volume-title":"Proceedings of the 24th Annual International Symposium on Computer Architecture. 
206--218","author":"Palacharla S.","key":"e_1_2_1_21_1","unstructured":"Palacharla , S. , Jouppi , N. P. , and Smith , J. E . 1997. Complexity-effective superscalar processors . In Proceedings of the 24th Annual International Symposium on Computer Architecture. 206--218 . 10.1145\/264107.264201 Palacharla, S., Jouppi, N. P., and Smith, J. E. 1997. Complexity-effective superscalar processors. In Proceedings of the 24th Annual International Symposium on Computer Architecture. 206--218. 10.1145\/264107.264201"},{"volume-title":"Proceedings of the 9th Annual ACM Symposium on Parallel Algorithms and Architectures. 199--210","author":"Ranganathan P.","key":"e_1_2_1_22_1","unstructured":"Ranganathan , P. , Pai , V. S. , and Adve , S. V . 1997. Using speculative retirement and larger instruction windows to narrow the performance gap between memory consistency models . In Proceedings of the 9th Annual ACM Symposium on Parallel Algorithms and Architectures. 199--210 . 10.1145\/258492.258512 Ranganathan, P., Pai, V. S., and Adve, S. V. 1997. Using speculative retirement and larger instruction windows to narrow the performance gap between memory consistency models. In Proceedings of the 9th Annual ACM Symposium on Parallel Algorithms and Architectures. 199--210. 10.1145\/258492.258512"},{"volume-title":"Proceedings of the 12th Annual International Symposium on Computer Architecture. 36--44","author":"Smith J. E.","key":"e_1_2_1_23_1","unstructured":"Smith , J. E. and Pleszkun , A. R . 1985. Implementation of precise interrupts in pipelined processors . In Proceedings of the 12th Annual International Symposium on Computer Architecture. 36--44 . Smith, J. E. and Pleszkun, A. R. 1985. Implementation of precise interrupts in pipelined processors. In Proceedings of the 12th Annual International Symposium on Computer Architecture. 36--44."},{"volume-title":"Proceedings of the 29th Annual International Symposium on Computer Architecture. 
25--34","author":"Sprangle E.","key":"e_1_2_1_24_1","unstructured":"Sprangle , E. and Carmean , D . 2002. Increasing processor performance by implementing deeper pipelines . In Proceedings of the 29th Annual International Symposium on Computer Architecture. 25--34 . Sprangle, E. and Carmean, D. 2002. Increasing processor performance by implementing deeper pipelines. In Proceedings of the 29th Annual International Symposium on Computer Architecture. 25--34."},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1147\/rd.461.0005"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/40.491460"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1044823.1044826","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1044823.1044826","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T21:36:48Z","timestamp":1750282608000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1044823.1044826"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2004,12]]},"references-count":26,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2004,12]]}},"alternative-id":["10.1145\/1044823.1044826"],"URL":"https:\/\/doi.org\/10.1145\/1044823.1044826","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2004,12]]},"assertion":[{"value":"2004-12-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}