{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T16:43:08Z","timestamp":1760028188072,"version":"3.41.0"},"reference-count":35,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2011,10,1]],"date-time":"2011-10-01T00:00:00Z","timestamp":1317427200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2011,10]]},"abstract":"<jats:p>This article proposes techniques to predict the performance impact of pending cache hits, hardware prefetching, and miss status holding register resources on superscalar microprocessors using hybrid analytical models. The proposed models focus on timeliness of pending hits and prefetches and account for a limited number of MSHRs. They improve modeling accuracy of pending hits by 3.9\u00d7 and when modeling data prefetching, a limited number of MSHRs, or both, these techniques result in average errors of 9.5% to 17.8%. The impact of non-uniform DRAM memory latency is shown to be approximated well by using a moving average of memory access latency.<\/jats:p>","DOI":"10.1145\/2019608.2019609","type":"journal-article","created":{"date-parts":[[2011,10,18]],"date-time":"2011-10-18T13:01:58Z","timestamp":1318942918000},"page":"1-28","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":14,"title":["Hybrid analytical modeling of pending cache hits, data prefetching, and MSHRs"],"prefix":"10.1145","volume":"8","author":[{"given":"Xi E.","family":"Chen","sequence":"first","affiliation":[{"name":"University of British Columbia, Vancouver, BC, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Tor M.","family":"Aamodt","sequence":"additional","affiliation":[{"name":"University of British Columbia, Vancouver, BC, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2011,10,18]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/63404.63407"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/125826.125932"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/381718.381727"},{"key":"e_1_2_1_4_1","first-page":"1","article-title":"The microarchitecture of the Intel Pentium 4 processor on 90mm technology","volume":"8","author":"Boggs D.","year":"2004","unstructured":"Boggs , D. , Baktha , A. , Hawkins , J. , Marr , D. T. , Miller , J. A. , Roussel , P. , Singhal , R. , Toll , B. , and Venkatraman , K. 2004 . The microarchitecture of the Intel Pentium 4 processor on 90mm technology . Intel Technol. J. 8 , 1 . Boggs, D., Baktha, A., Hawkins, J., Marr, D. T., Miller, J. A., Roussel, P., Singhal, R., Toll, B., and Venkatraman, K. 2004. The microarchitecture of the Intel Pentium 4 processor on 90mm technology. Intel Technol. J. 8, 1.","journal-title":"Intel Technol. J."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/268806.268810"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2008.4771779"},{"volume-title":"Proceedings of the 4th Workshop on Modeling, Benchmarking and Simulation 7--16","author":"Chen X. E.","key":"e_1_2_1_8_1","unstructured":"Chen , X. E. and Aamodt , T. M . 2008b. An improved analytical superscalar microprocessor memory model . In Proceedings of the 4th Workshop on Modeling, Benchmarking and Simulation 7--16 . Chen, X. E. and Aamodt, T. M. 2008b. An improved analytical superscalar microprocessor memory model. In Proceedings of the 4th Workshop on Modeling, Benchmarking and Simulation 7--16."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1147\/rd.183.0194"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/1044823.1044825"},{"key":"e_1_2_1_11_1","unstructured":"Eeckhout L. 2008. Personal communication.  Eeckhout L. 2008. Personal communication."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/2.982918"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/1168857.1168880"},{"volume-title":"Proceedings of the 1st International Symposium on High-Performance Computer Architecture. 78","author":"Farkas K. I.","key":"e_1_2_1_15_1","unstructured":"Farkas , K. I. , Jouppi , N. P. , and Chow , P . 1995. How useful are non-blocking loads, stream buffers and speculative execution in multiple issue processors? In Proceedings of the 1st International Symposium on High-Performance Computer Architecture. 78 . Farkas, K. I., Jouppi, N. P., and Chow, P. 1995. How useful are non-blocking loads, stream buffers and speculative execution in multiple issue processors? In Proceedings of the 1st International Symposium on High-Performance Computer Architecture. 78."},{"key":"e_1_2_1_16_1","first-page":"696","article-title":"Buffer block prefetching method","volume":"20","author":"Gindele J. D.","year":"1977","unstructured":"Gindele , J. D. 1977 . Buffer block prefetching method . IBM Techn. Disclo. Bull. 20 , 2, 696 -- 697 . Gindele, J. D. 1977. Buffer block prefetching method. IBM Techn. Disclo. Bull. 20, 2, 696--697.","journal-title":"IBM Techn. Disclo. Bull."},{"key":"e_1_2_1_17_1","first-page":"1","article-title":"The microarchitecture of the Pentium 4 processor","volume":"5","author":"Hinton G.","year":"2001","unstructured":"Hinton , G. , Sager , D. , Upton , M. , Boggs , D. , Carmean , D. , Kyker , A. , and Roussel , P. 2001 . The microarchitecture of the Pentium 4 processor . Intel Techn. J. 5 , 1 . Hinton, G., Sager, D., Upton, M., Boggs, D., Carmean, D., Kyker, A., and Roussel, P. 2001. The microarchitecture of the Pentium 4 processor. Intel Techn. J. 5, 1.","journal-title":"Intel Techn. J."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/12.543711"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/325164.325162"},{"volume-title":"Proceedings of the 31st Annual International Symposium on Computer Architecture. 338--349","author":"Karkhanis T. S.","key":"e_1_2_1_21_1","unstructured":"Karkhanis , T. S. and Smith , J. E . 2004. A first-order superscalar processor model . In Proceedings of the 31st Annual International Symposium on Computer Architecture. 338--349 . Karkhanis, T. S. and Smith, J. E. 2004. A first-order superscalar processor model. In Proceedings of the 31st Annual International Symposium on Computer Architecture. 338--349."},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/1250662.1250712"},{"key":"e_1_2_1_23_1","unstructured":"Karkhanis T. S. 2008. Personal Communication.  Karkhanis T. S. 2008. Personal Communication."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.5555\/800052.801868"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1147\/rd.192.0133"},{"volume-title":"Proceedings of the International Conference on Parallel Architectures and Compilation Techniques. 2--10","author":"Michaud P.","key":"e_1_2_1_26_1","unstructured":"Michaud , P. , Seznec , A. , and Jourdan , S . 1999. Exploring instruction-fetch bandwidth requirement in wide-issue superscalar processors . In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques. 2--10 . Michaud, P., Seznec, A., and Jourdan, S. 1999. Exploring instruction-fetch bandwidth requirement in wide-issue superscalar processors. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques. 2--10."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1026431920605"},{"key":"e_1_2_1_28_1","unstructured":"Micron Technology Inc. 2Gb DDR2 SDRAM Component : MT47H128MI6HG-25. http:\/\/download.micron. com\/pdf\/datasheets\/dram\/ddr2\/2gbddr2.pdf.  Micron Technology Inc. 2Gb DDR2 SDRAM Component : MT47H128MI6HG-25. http:\/\/download.micron. com\/pdf\/datasheets\/dram\/ddr2\/2gbddr2.pdf."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/192724.192730"},{"volume-title":"Proceedings of the 3rd International Symposium on High-Performance Computer Architecture. 298--309","author":"Noonburg D. B.","key":"e_1_2_1_30_1","unstructured":"Noonburg , D. B. and Shen , J. P . 1997. A framework for statistical modeling of superscalar processor performance . In Proceedings of the 3rd International Symposium on High-Performance Computer Architecture. 298--309 . Noonburg, D. B. and Shen, J. P. 1997. A framework for statistical modeling of superscalar processor performance. In Proceedings of the 3rd International Symposium on High-Performance Computer Architecture. 298--309."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/605397.605403"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/356887.356892"},{"volume-title":"Performance Evaluation Corporation. SPEC CPU2000 benchmarks. http:\/\/www.spec.org.","author":"Standard","key":"e_1_2_1_34_1","unstructured":"Standard Performance Evaluation Corporation. SPEC CPU2000 benchmarks. http:\/\/www.spec.org. Standard Performance Evaluation Corporation. SPEC CPU2000 benchmarks. http:\/\/www.spec.org."},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2006.44"},{"volume-title":"Proceedings of the 35th Annual ACM\/IEEE International Symposium on Microarchitecture. 271--282","author":"Vachharajani M.","key":"e_1_2_1_36_1","unstructured":"Vachharajani , M. , Vachharajani , N. , Penry , D. A. , Blome , J. A. , and August , D. I . 2002. Microarchitectural exploration with liberty . In Proceedings of the 35th Annual ACM\/IEEE International Symposium on Microarchitecture. 271--282 . Vachharajani, M., Vachharajani, N., Penry, D. A., Blome, J. A., and August, D. I. 2002. Microarchitectural exploration with liberty. In Proceedings of the 35th Annual ACM\/IEEE International Symposium on Microarchitecture. 271--282."},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/859618.859629"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/MC.2006.404"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/503205.503206"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2019608.2019609","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2019608.2019609","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T19:07:42Z","timestamp":1750273662000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2019608.2019609"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2011,10]]},"references-count":35,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2011,10]]}},"alternative-id":["10.1145\/2019608.2019609"],"URL":"https:\/\/doi.org\/10.1145\/2019608.2019609","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2011,10]]},"assertion":[{"value":"2010-09-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2011-05-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2011-10-18","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}