{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,27]],"date-time":"2026-03-27T08:27:02Z","timestamp":1774600022001,"version":"3.50.1"},"reference-count":48,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2012,3,1]],"date-time":"2012-03-01T00:00:00Z","timestamp":1330560000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["9.53E+12"],"award-info":[{"award-number":["9.53E+12"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000143","name":"Division of Computing and Communication Foundations","doi-asserted-by":"publisher","award":["CCF-0903447"],"award-info":[{"award-number":["CCF-0903447"]}],"id":[{"id":"10.13039\/100000143","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000015","name":"U.S. Department of Energy","doi-asserted-by":"publisher","award":["DESC0004915"],"award-info":[{"award-number":["DESC0004915"]}],"id":[{"id":"10.13039\/100000015","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2012,3]]},"abstract":"<jats:p>In emerging and future high-end processor systems, tolerating increasing cache miss latency and properly managing memory bandwidth will be critical to achieving high performance. 
Prefetching, in both hardware and software, is among our most important available techniques for doing so; yet, we claim that prefetching is perhaps also the least well-understood.<\/jats:p>\n          <jats:p>Thus, the goal of this study is to develop a novel, foundational understanding of both the benefits and limitations of hardware and software prefetching. Our study includes: source code-level analysis, to help in understanding the practical strengths and weaknesses of compiler- and software-based prefetching; a study of the synergistic and antagonistic effects between software and hardware prefetching; and an evaluation of hardware prefetching training policies in the presence of software prefetching requests. We use both simulation and measurement on real systems. We find, for instance, that although there are many opportunities for compilers to prefetch much more aggressively than they currently do, there is also a tangible risk of interference with training existing hardware prefetching mechanisms. 
Taken together, our observations suggest new research directions for cooperative hardware\/software prefetching.<\/jats:p>","DOI":"10.1145\/2133382.2133384","type":"journal-article","created":{"date-parts":[[2012,4,3]],"date-time":"2012-04-03T14:56:22Z","timestamp":1333464982000},"page":"1-29","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":105,"title":["When Prefetching Works, When It Doesn\u2019t, and Why"],"prefix":"10.1145","volume":"9","author":[{"given":"Jaekyu","family":"Lee","sequence":"first","affiliation":[{"name":"Georgia Institute of Technology"}]},{"given":"Hyesoon","family":"Kim","sequence":"additional","affiliation":[{"name":"Georgia Institute of Technology"}]},{"given":"Richard","family":"Vuduc","sequence":"additional","affiliation":[{"name":"Georgia Institute of Technology"}]}],"member":"320","published-online":{"date-parts":[[2012,3]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Proceedings of the 12th International Conference on Parallel Architecture and Compilation Technology. IEEE","author":"Al-Sukhni H.","unstructured":"Al-Sukhni, H., Bratt, I., and Connors, D. A. 2003. Compiler-directed content-aware prefetching for dynamic data structures. In Proceedings of the 12th International Conference on Parallel Architecture and Compilation Technology. IEEE, Los Alamitos, CA, 91--100."},{"key":"e_1_2_1_2_1","unstructured":"AMD. AMD Phenom II Processors. 
http:\/\/www.amd.com\/us\/products\/desktop\/processors\/phenom-ii\/Pages\/phenom-ii.aspx."},{"key":"e_1_2_1_3_1","volume-title":"-W","author":"Badawy A.-H. A.","year":"2004","unstructured":"Badawy, A.-H. A., Aggarwal, A., Yeung, D., and Tseng, C.-W. 2004. The efficacy of software prefetching and locality optimizations on future memory systems. J. Instruct.-Level Parallelism 6."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/125826.125932"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/106972.106979"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/1272743.1272747"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/191995.192030"},{"key":"e_1_2_1_8_1","volume-title":"Proceedings of the 34th International Symposium on Microarchitecture. IEEE Computer Society","author":"Collins J. D.","unstructured":"Collins, J. D., Tullsen, D. M., Wang, H., and Shen, J. P. 2001. Dynamic speculative precomputation. In Proceedings of the 34th International Symposium on Microarchitecture. IEEE Computer Society, Los Alamitos, CA, 306--317."},{"key":"e_1_2_1_9_1","volume-title":"Proceedings of the 35th International Symposium on Microarchitecture. IEEE Computer Society Press","author":"Collins J. D.","unstructured":"Collins, J. 
D., Sair, S., Calder, B., and Tullsen, D. M. 2002. Pointer cache assisted prefetching. In Proceedings of the 35th International Symposium on Microarchitecture. IEEE Computer Society Press, Los Alamitos, CA, 62--73."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/605397.605427"},{"key":"e_1_2_1_11_1","volume-title":"Proceedings of the 15th International Symposium on High Performance Computer Architecture. IEEE Computer Society","author":"Ebrahimi E.","unstructured":"Ebrahimi, E., Mutlu, O., and Patt, Y. N. 2009. Techniques for bandwidth-efficient prefetching of linked data structures in hybrid prefetching systems. In Proceedings of the 15th International Symposium on High Performance Computer Architecture. IEEE Computer Society, Los Alamitos, CA, 7--17."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1147\/rd.491.0127"},{"key":"e_1_2_1_13_1","unstructured":"GCC-4.0. GNU compiler collection. http:\/\/gcc.gnu.org\/."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2006.32"},{"key":"e_1_2_1_15_1","volume-title":"Proceedings of the 15th International Symposium on High Performance Computer Architecture. IEEE Computer Society","author":"Hur I.","unstructured":"Hur, I. and Lin, C. 2009. Feedback mechanisms for improving probabilistic memory prefetching. In Proceedings of the 15th International Symposium on High Performance Computer Architecture. 
IEEE Computer Society, Los Alamitos, CA, 443--454."},{"key":"e_1_2_1_16_1","unstructured":"ICC. Intel C++ compiler. http:\/\/www.intel.comlcd\/software\/products\/asmo-na\/eng\/compilers\/clin\/277618.htm."},{"key":"e_1_2_1_17_1","unstructured":"Intel. 2004. Intel Pentium M Processor. http:\/\/www.intel.com\/design\/intarch\/pentiumm\/pentiumm.htm."},{"key":"e_1_2_1_18_1","unstructured":"Intel. 2007. Intel core microarchitecture. http:\/\/www.intel.com\/technology\/45nm\/index.htm?iid=tech_micro+45nm."},{"key":"e_1_2_1_19_1","unstructured":"Intel. 2008. Intel AVX. http:\/\/software.intel.com\/en-us\/avx."},{"key":"e_1_2_1_20_1","unstructured":"Intel. 2009. Intel Nehalem microarchitecture. http:\/\/www.intel.com\/technology\/architecture-silicon\/next-gen\/index.htm?iid=tech_micro+nehalem."},{"key":"e_1_2_1_21_1","unstructured":"Intel. 2011. Intel 64 and IA-32 Architectures Software Developer\u2019s Manual. http:\/\/www3.intel.com\/Assets\/PDF\/manual\/253667.pdf."},{"key":"e_1_2_1_22_1","volume-title":"Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software. IEEE Computer Society","author":"Jerger N.","unstructured":"Jerger, N., Hill, E., and Lipasti, M. 2006. Friendly fire: Understanding the effects of multiprocessor prefetches. 
In Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software. IEEE Computer Society, Los Alamitos, CA, 177--188."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/264107.264207"},{"key":"e_1_2_1_24_1","volume-title":"Proceedings of the 12th International Symposium on Computer Architecture. ACM","author":"Jouppi N. P.","year":"1990","unstructured":"Jouppi, N. P. 1990. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers. In Proceedings of the 12th International Symposium on Computer Architecture. ACM, New York, NY, 388--397."},{"key":"e_1_2_1_25_1","volume-title":"Proceedings of the 3rd International Symposium on Computer Architecture. IEEE Computer Society Press","author":"Kroft D.","year":"1981","unstructured":"Kroft, D. 1981. Lockup-free instruction fetch\/prefetch cache organization. In Proceedings of the 3rd International Symposium on Computer Architecture. IEEE Computer Society Press, Los Alamitos, CA, 81--87."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/379240.379259"},{"key":"e_1_2_1_27_1","volume-title":"SPAID: Software prefetching in pointer- and call-intensive environments. 
In Proceedings of the 28th International Symposium on Microarchitecture","author":"Lipasti M. H.","year":"1995","unstructured":"Lipasti, M. H., Schmidt, W. J., Kunkel, S. R., and Roediger, R. R. 1995. SPAID: Software prefetching in pointer- and call-intensive environments. In Proceedings of the 28th International Symposium on Microarchitecture. IEEE Computer Society Press, Los Alamitos, CA, 232--236."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/379240.379250"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/237090.237190"},{"key":"e_1_2_1_30_1","volume-title":"Proceedings of the 31st International Symposium on Microarchitecture. IEEE Computer Society Press","author":"Luk C.-K.","unstructured":"Luk, C.-K. and Mowry, T. C. 1998. Cooperative prefetching: Compiler and hardware support for effective instruction prefetching in modern processors. In Proceedings of the 31st International Symposium on Microarchitecture. IEEE Computer Society Press, Los Alamitos, CA, 182--194."},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/143365.143488"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.5555\/1025127.1026003"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2004.10030"},{"key":"e_1_2_1_34_1","volume-title":"Proceedings of the 10th International Conference on Parallel Architecture and Compilation Technology. 
IEEE Computer Society","author":"Pai V. S.","unstructured":"Pai, V. S. and Adve, S. V. 2001. Comparing and combining read miss clustering and software prefetching. In Proceedings of the 10th International Conference on Parallel Architecture and Compilation Technology. IEEE Computer Society, Los Alamitos, CA, 292--303."},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2004.28"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2004.25"},{"key":"e_1_2_1_37_1","unstructured":"Pin. A binary instrumentation tool. http:\/\/www.pintool.org."},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/300979.300989"},{"key":"e_1_2_1_39_1","volume-title":"Proceedings of the 5th International Conference on Parallel Architecture and Compilation Technology. IEEE Computer Society","author":"Saavedra R. H.","unstructured":"Saavedra, R. H. and Park, D. 1996. Improving the effectiveness of software prefetching with adaptive execution. In Proceedings of the 5th International Conference on Parallel Architecture and Compilation Technology. 
IEEE Computer Society, Los Alamitos, CA, 68--78."},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/1504176.1504208"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2007.346185"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1147\/rd.461.0005"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/358923.358939"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/859618.859663"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/512529.512555"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/1044823.1044827"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2006.4"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/379240.379246"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2133382.2133384","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2133382.2133384","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T10:06:05Z","timestamp":1750241165000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2133382.2133384"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,3]]},"references-count":48,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2012,3]]}},"alternative-id":["10.1145\/2133382.2133384"],"URL":"https:\/\/doi.org\/10.1145\/2133382.2133384","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"value":"1544-3566","type":"print"},{"value":"1544-3973","type":"electronic"}],"subject":[],"published":{"date-parts":[[2012,3]]},"assertion":[{"value":"2010-11-01","order":0,"name":"received","label":"Received","group":{"name":"publ
ication_history","label":"Publication History"}},{"value":"2011-08-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2012-03-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}