{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:41:52Z","timestamp":1750308112529,"version":"3.41.0"},"reference-count":39,"publisher":"Association for Computing Machinery (ACM)","issue":"5","license":[{"start":{"date-parts":[[2005,9,1]],"date-time":"2005-09-01T00:00:00Z","timestamp":1125532800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Program. Lang. Syst."],"published-print":{"date-parts":[[2005,9]]},"abstract":"<jats:p>Caches have become increasingly important with the widening gap between main memory and processor speeds. Small and fast cache memories are designed to bridge this discrepancy. However, they are only effective when programs exhibit sufficient data locality.The performance of the memory hierarchy can be improved by means of data and loop transformations. Tiling is a loop transformation that aims at reducing capacity misses by shortening the reuse distance. Padding is a data layout transformation targeted to reduce conflict misses.This article presents an accurate cost model that describes misses across different hierarchy levels and considers the effects of other hardware components such as branch predictors. The cost model drives the application of tiling and padding transformations. We combine the cost model with a genetic algorithm to compute the tile and pad factors that enhance the program performance.To validate our strategy, we ran experiments for a set of benchmarks on a large set of modern architectures. Our results show that this scheme is useful to optimize programs' performance. When compared to previous approaches, we observe that with a reasonable compile-time overhead, our approach gives significant performance improvements for all studied kernels on all architectures.<\/jats:p>","DOI":"10.1145\/1086642.1086646","type":"journal-article","created":{"date-parts":[[2005,11,7]],"date-time":"2005-11-07T16:00:45Z","timestamp":1131379245000},"page":"946-987","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["An accurate cost model for guiding data locality transformations"],"prefix":"10.1145","volume":"27","author":[{"given":"Xavier","family":"Vera","sequence":"first","affiliation":[{"name":"M\u00e4lardalens H\u00f6gskola, V\u00e4ster\u00e5s, Sweden"}]},{"given":"Jaume","family":"Abella","sequence":"additional","affiliation":[{"name":"Universitat Polit\u00e8cnica de Catalunya-Barcelona, Barcelona, Spain"}]},{"given":"Josep","family":"Llosa","sequence":"additional","affiliation":[{"name":"Universitat Polit\u00e8cnica de Catalunya-Barcelona, Barcelona, Spain"}]},{"given":"Antonio","family":"Gonz\u00e1lez","sequence":"additional","affiliation":[{"name":"Universitat Polit\u00e8cnica de Catalunya-Barcelona, Barcelona, Spain"}]}],"member":"320","published-online":{"date-parts":[[2005,9]]},"reference":[{"volume-title":"Dependence Analysis for Supercomputing","author":"Banerjee U.","key":"e_1_2_1_1_1","unstructured":"Banerjee , U. 1988. Dependence Analysis for Supercomputing . Kluwer Academic Publishers .]] Banerjee, U. 1988. Dependence Analysis for Supercomputing. Kluwer Academic Publishers.]]"},{"key":"e_1_2_1_2_1","doi-asserted-by":"crossref","unstructured":"Belady L. A. 1966. A study of replacement algorithms for a virtual-storage computer. IBM Syst. J.]]  Belady L. A. 1966. A study of replacement algorithms for a virtual-storage computer. IBM Syst. J.]]","DOI":"10.1147\/sj.52.0078"},{"volume-title":"Proceedings of IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS'00)","author":"Bermudo N.","key":"e_1_2_1_3_1","unstructured":"Bermudo , N. , Vera , X. , Gonz\u00e1lez , A. , and Llosa , J . 2000. An efficient solver for cache miss equations . In Proceedings of IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS'00) . IEEE Computer Society Press, Los Alamitos, CA.]] Bermudo, N., Vera, X., Gonz\u00e1lez, A., and Llosa, J. 2000. An efficient solver for cache miss equations. In Proceedings of IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS'00). IEEE Computer Society Press, Los Alamitos, CA.]]"},{"volume-title":"Proceedings of the 18th International Symposium on Computer Architecture (ISCA'91)","author":"Butler M.","key":"e_1_2_1_4_1","unstructured":"Butler , M. , Yeh , T.-Y. , Patt , Y. , Alsup , M. , Sales , H. , and Shebanow , M . 1991. Instruction level parallelism is greater than two . In Proceedings of the 18th International Symposium on Computer Architecture (ISCA'91) . 276--286.]] 10.1145\/115952.115980 Butler, M., Yeh, T.-Y., Patt, Y., Alsup, M., Sales, H., and Shebanow, M. 1991. Instruction level parallelism is greater than two. In Proceedings of the 18th International Symposium on Computer Architecture (ISCA'91). 276--286.]] 10.1145\/115952.115980"},{"volume-title":"Proceedings of Supercomputing (SC'92)","author":"Carr S.","key":"e_1_2_1_5_1","unstructured":"Carr , S. and Kennedy , K . 1992. Compiler blockability of numerical algorithms . In Proceedings of Supercomputing (SC'92) . 114--124.]] Carr, S. and Kennedy, K. 1992. Compiler blockability of numerical algorithms. In Proceedings of Supercomputing (SC'92). 114--124.]]"},{"key":"e_1_2_1_6_1","volume-title":"Proceedings of VI International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'94)","author":"Carr S.","year":"1954","unstructured":"Carr , S. , McKinley , K. , and Tseng , C . -W. 1994. Compiler optimizations for improving data locality . In Proceedings of VI International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'94) . 252--262.]] 10.1145\/ 1954 73.195557 Carr, S., McKinley, K., and Tseng, C.-W. 1994. Compiler optimizations for improving data locality. In Proceedings of VI International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'94). 252--262.]] 10.1145\/195473.195557"},{"volume-title":"Proceedings of ACM International Conference on Supercomputing (ICS'99)","author":"Chatterjee S.","key":"e_1_2_1_7_1","unstructured":"Chatterjee , S. , Jain , V. V. , Lebeck , A. R. , Mundhra , S. , and Thottethodi , M . 1999. Nonlinear array layout for hierarchical memory systems . In Proceedings of ACM International Conference on Supercomputing (ICS'99) (Rhodes, Greece). ACM, New York, 444--453.]] 10.1145\/305138.305231 Chatterjee, S., Jain, V. V., Lebeck, A. R., Mundhra, S., and Thottethodi, M. 1999. Nonlinear array layout for hierarchical memory systems. In Proceedings of ACM International Conference on Supercomputing (ICS'99) (Rhodes, Greece). ACM, New York, 444--453.]] 10.1145\/305138.305231"},{"key":"e_1_2_1_8_1","volume-title":"Proceedings of ACM International Conference on Supercomputing (ICS'96)","author":"Clauss P.","year":"1996","unstructured":"Clauss , P. 1996 . Counting solutions to linear and non-linear constraints through Ehrhart polynomials . In Proceedings of ACM International Conference on Supercomputing (ICS'96) (Philadelphia, PA). ACM, New York, 278--285.]] 10.1145\/237578.237617 Clauss, P. 1996. Counting solutions to linear and non-linear constraints through Ehrhart polynomials. In Proceedings of ACM International Conference on Supercomputing (ICS'96) (Philadelphia, PA). ACM, New York, 278--285.]] 10.1145\/237578.237617"},{"key":"e_1_2_1_9_1","volume-title":"Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'95)","author":"Coleman S.","year":"1995","unstructured":"Coleman , S. and McKinley , K. S. 1995 . Tile size selection using cache organization and data layout . In Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'95) . ACM, New York, 279--290.]] 10.1145\/ 207110.207162 Coleman, S. and McKinley, K. S. 1995. Tile size selection using cache organization and data layout. In Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'95). ACM, New York, 279--290.]] 10.1145\/207110.207162"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1137\/0217060"},{"key":"e_1_2_1_11_1","volume-title":"-B","author":"Ermoliev Y.","year":"1988","unstructured":"Ermoliev , Y. and Wets , R. J . -B . 1988 . Numerical Techniques for Stochastic Optimization. Springer-Verlag , New York.]] Ermoliev, Y. and Wets, R. J.-B. 1988. Numerical Techniques for Stochastic Optimization. Springer-Verlag, New York.]]"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1016\/0743-7315(88)90014-7"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/325478.325479"},{"key":"e_1_2_1_15_1","unstructured":"Gill P. E. Murray W. and Wright M. H. 1981. Practical optimization. Academic Press Orlando FL.]]  Gill P. E. Murray W. and Wright M. H. 1981. Practical optimization. Academic Press Orlando FL.]]"},{"key":"e_1_2_1_16_1","doi-asserted-by":"crossref","unstructured":"Glover F. and Laguna M. 1997. Tabu search. Kluwer.]]   Glover F. and Laguna M. 1997. Tabu search. Kluwer.]]","DOI":"10.1007\/978-1-4615-6089-0"},{"volume-title":"Genetic algorithms in search, optimizations and machine learning","author":"Goldberg D. E.","key":"e_1_2_1_17_1","unstructured":"Goldberg , D. E. 1989. Genetic algorithms in search, optimizations and machine learning . Addison-Wesley , Reading, MA .]] Goldberg, D. E. 1989. Genetic algorithms in search, optimizations and machine learning. Addison-Wesley, Reading, MA.]]"},{"key":"e_1_2_1_18_1","unstructured":"Hansen P. Jaumard B. and Mathon V. 1995. Constrained nonlinear 0-1 programming. ORSA J. Comput.]]  Hansen P. Jaumard B. and Mathon V. 1995. Constrained nonlinear 0-1 programming. ORSA J. Comput.]]"},{"volume-title":"Adaptation in natural and artificial systems","author":"Holland J.","key":"e_1_2_1_19_1","unstructured":"Holland , J. 1975. Adaptation in natural and artificial systems . The University of Michigan Press , Ann Arbor, MI .]] Holland, J. 1975. Adaptation in natural and artificial systems. The University of Michigan Press, Ann Arbor, MI.]]"},{"key":"e_1_2_1_20_1","doi-asserted-by":"crossref","unstructured":"Horst R. Pardalos P. M. and Thoai N. V. 1995. Introduction to Global Optimization. Kluwer Academic Publishers.]]   Horst R. Pardalos P. M. and Thoai N. V. 1995. Introduction to Global Optimization. Kluwer Academic Publishers.]]","DOI":"10.1007\/978-1-4615-2025-2"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/71.752779"},{"key":"e_1_2_1_22_1","doi-asserted-by":"crossref","unstructured":"Kirkpatrick S. Gelatt C. D. and Vecchi M. P. 1983. Optimization by simulated annealing. Science 220.]]  Kirkpatrick S. Gelatt C. D. and Vecchi M. P. 1983. Optimization by simulated annealing. Science 220.]]","DOI":"10.1126\/science.220.4598.671"},{"volume-title":"Proceedings of IV International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'91)","author":"Lam M.","key":"e_1_2_1_23_1","unstructured":"Lam , M. , Rothberg , E. E. , and Wolf , M. E . 1991. The cache performance of blocked algorithms . In Proceedings of IV International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'91) .]] 10.1145\/106972.106981 Lam, M., Rothberg, E. E., and Wolf, M. E. 1991. The cache performance of blocked algorithms. In Proceedings of IV International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'91).]] 10.1145\/106972.106981"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/233561.233564"},{"volume-title":"Proceedings of VII International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'96)","author":"McKinley K. S.","key":"e_1_2_1_26_1","unstructured":"McKinley , K. S. and Temam , O . 1996. A quantitative analysis of loop nest locality . In Proceedings of VII International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'96) .]] 10.1145\/237090.237161 McKinley, K. S. and Temam, O. 1996. A quantitative analysis of loop nest locality. In Proceedings of VII International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'96).]] 10.1145\/237090.237161"},{"volume-title":"Genetic algorithms&plus;Data structures&equals;Evolution Programs","author":"Michalewicz Z.","key":"e_1_2_1_27_1","unstructured":"Michalewicz , Z. 1994. Genetic algorithms&plus;Data structures&equals;Evolution Programs . Springer-Verlag , New York .]] Michalewicz, Z. 1994. Genetic algorithms&plus;Data structures&equals;Evolution Programs. Springer-Verlag, New York.]]"},{"volume-title":"Proceedings of V International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'92)","author":"Mowry T.","key":"e_1_2_1_28_1","unstructured":"Mowry , T. , Lam , M. , and Gupta , A . 1992. Design and evaluation of a compiler algorithm for prefetching . In Proceedings of V International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'92) . 62--73.]] 10.1145\/143365.143488 Mowry, T., Lam, M., and Gupta, A. 1992. Design and evaluation of a compiler algorithm for prefetching. In Proceedings of V International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'92). 62--73.]] 10.1145\/143365.143488"},{"volume-title":"Proceedings of International Conference on Principles of Programming Languages (POPL'02)","author":"Petrank E.","key":"e_1_2_1_29_1","unstructured":"Petrank , E. and Rawitz , D . 2002. Hardness of cache conscious data placement . In Proceedings of International Conference on Principles of Programming Languages (POPL'02) .]] 10.1145\/503272.503283 Petrank, E. and Rawitz, D. 2002. Hardness of cache conscious data placement. In Proceedings of International Conference on Principles of Programming Languages (POPL'02).]] 10.1145\/503272.503283"},{"volume-title":"Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'98)","author":"Rivera G.","key":"e_1_2_1_30_1","unstructured":"Rivera , G. and Tseng , C . -W. 1998a. Data transformations for eliminating conflict misses . In Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'98) . ACM, New York, 38--49.]] 10.1145\/277650.277661 Rivera, G. and Tseng, C.-W. 1998a. Data transformations for eliminating conflict misses. In Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'98). ACM, New York, 38--49.]] 10.1145\/277650.277661"},{"volume-title":"Proceedings of ACM International Conference on Supercomputing (ICS'98)","author":"Rivera G.","key":"e_1_2_1_31_1","unstructured":"Rivera , G. and Tseng , C . -W. 1998b. Eliminating conflict misses for high-performance architectures . In Proceedings of ACM International Conference on Supercomputing (ICS'98) . ACM, New York.]] 10.1145\/277830.277917 Rivera, G. and Tseng, C.-W. 1998b. Eliminating conflict misses for high-performance architectures. In Proceedings of ACM International Conference on Supercomputing (ICS'98). ACM, New York.]] 10.1145\/277830.277917"},{"volume-title":"Proceedings of the 8th International Conference on Compiler Construction (CC'99)","author":"Rivera G.","key":"e_1_2_1_32_1","unstructured":"Rivera , G. and Tseng , C . -W. 1999a. A comparison of compiler tiling algorithms . In Proceedings of the 8th International Conference on Compiler Construction (CC'99) .]] Rivera, G. and Tseng, C.-W. 1999a. A comparison of compiler tiling algorithms. In Proceedings of the 8th International Conference on Compiler Construction (CC'99).]]"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/331532.331534"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/40.877948"},{"volume-title":"Proceedings of Supercomputing (SC'93)","author":"Temam O.","key":"e_1_2_1_35_1","unstructured":"Temam , O. , Granston , E. , and Jalby , W . 1993. To copy or not to copy: A compile-time technique for accessing when data copying should be used to eliminate cache conflicts . In Proceedings of Supercomputing (SC'93) . 410--419.]] 10.1145\/169627.169762 Temam, O., Granston, E., and Jalby, W. 1993. To copy or not to copy: A compile-time technique for accessing when data copying should be used to eliminate cache conflicts. In Proceedings of Supercomputing (SC'93). 410--419.]] 10.1145\/169627.169762"},{"key":"e_1_2_1_36_1","unstructured":"Torn A. and Zilinskas A. 1989. Global optimization. Springer-Verlag New York.]]   Torn A. and Zilinskas A. 1989. Global optimization. Springer-Verlag New York.]]"},{"volume-title":"Complexity Issues","author":"Vavasis S. A.","key":"e_1_2_1_37_1","unstructured":"Vavasis , S. A. 1991. Nonlinear Optimization , Complexity Issues . Oxford University Press .]] Vavasis, S. A. 1991. Nonlinear Optimization, Complexity Issues. Oxford University Press.]]"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/973097.973099"},{"volume-title":"Proceedings of European Conference on Parallel Computing (Europar'00)","author":"Vera X.","key":"e_1_2_1_39_1","unstructured":"Vera , X. , Llosa , J. , Gonz\u00e1lez , A. , and Bermudo , N . 2000. A fast and accurate approach to analyze cache memory behavior . In Proceedings of European Conference on Parallel Computing (Europar'00) .]] Vera, X., Llosa, J., Gonz\u00e1lez, A., and Bermudo, N. 2000. A fast and accurate approach to analyze cache memory behavior. In Proceedings of European Conference on Parallel Computing (Europar'00).]]"},{"volume-title":"Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'91)","author":"Wolf M.","key":"e_1_2_1_40_1","unstructured":"Wolf , M. and Lam , M . 1991. A data locality optimizing algorithm . In Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'91) . ACM, New York, 30--44.]] 10.1145\/113445.113449 Wolf, M. and Lam, M. 1991. A data locality optimizing algorithm. In Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'91). ACM, New York, 30--44.]] 10.1145\/113445.113449"},{"key":"e_1_2_1_41_1","volume-title":"Proceedings of International Conference on Parallel Processing (ICPP'96)","author":"Wolfe M.","year":"1996","unstructured":"Wolfe , M. 1996 . Advanced loop interchanging . In Proceedings of International Conference on Parallel Processing (ICPP'96) .]] Wolfe, M. 1996. Advanced loop interchanging. In Proceedings of International Conference on Parallel Processing (ICPP'96).]]"}],"container-title":["ACM Transactions on Programming Languages and Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1086642.1086646","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1086642.1086646","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T16:08:12Z","timestamp":1750262892000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1086642.1086646"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2005,9]]},"references-count":39,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2005,9]]}},"alternative-id":["10.1145\/1086642.1086646"],"URL":"https:\/\/doi.org\/10.1145\/1086642.1086646","relation":{},"ISSN":["0164-0925","1558-4593"],"issn-type":[{"type":"print","value":"0164-0925"},{"type":"electronic","value":"1558-4593"}],"subject":[],"published":{"date-parts":[[2005,9]]},"assertion":[{"value":"2005-09-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}