{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,18]],"date-time":"2025-11-18T12:07:37Z","timestamp":1763467657097,"version":"3.41.0"},"reference-count":57,"publisher":"Association for Computing Machinery (ACM)","issue":"6","license":[{"start":{"date-parts":[[2004,11,1]],"date-time":"2004-11-01T00:00:00Z","timestamp":1099267200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Program. Lang. Syst."],"published-print":{"date-parts":[[2004,11]]},"abstract":"<jats:p>Iterative stencil loops are used in scientific programs to implement relaxation methods for numerical simulation and signal processing. Such loops iteratively modify the same array elements over different time steps, which presents opportunities for the compiler to improve the temporal data locality through loop tiling. This article presents a compiler framework for automatic tiling of iterative stencil loops, with the objective of improving the cache performance. The article first presents a technique which allows loop tiling to satisfy data dependences in spite of the difficulty created by imperfectly nested inner loops. It does so by skewing the inner loops over the time steps and by applying a uniform skew factor to all loops at the same nesting level. Based on a memory cost analysis, the article shows that the skew factor must be minimized at every loop level in order to minimize cache misses. A graph-theoretical algorithm, which takes polynomial time, is presented to determine the minimum skew factor. Furthermore, the memory-cost analysis derives the tile size which minimizes capacity misses. Given the tile size, an efficient and general &lt;i&gt;array-padding&lt;\/i&gt; scheme is applied to remove conflict misses. Experiments were conducted on 16 test programs and preliminary results showed an average speedup of 1.58 and a maximum speedup of 5.06 across those test programs.<\/jats:p>","DOI":"10.1145\/1034774.1034777","type":"journal-article","created":{"date-parts":[[2005,1,26]],"date-time":"2005-01-26T16:35:53Z","timestamp":1106757353000},"page":"975-1028","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":52,"title":["Automatic tiling of iterative stencil loops"],"prefix":"10.1145","volume":"26","author":[{"given":"Zhiyuan","family":"Li","sequence":"first","affiliation":[{"name":"Purdue University, West Lafayette, IN"}]},{"given":"Yonghong","family":"Song","sequence":"additional","affiliation":[{"name":"Purdue University, West Lafayette, IN"}]}],"member":"320","published-online":{"date-parts":[[2004,11]]},"reference":[{"volume-title":"MUDPACK: Multigrid Software for Elliptic Partial Differential Equations. Available on line at http:\/\/www.scd.ucar.edu\/css\/software\/mudpack\/.]]","year":"1999","author":"Admas J. C.","key":"e_1_2_1_1_1"},{"volume-title":"Proceedings of the 2000 International Conference on Supercomputing (Santa FE, NM). 141--152","author":"Ahmed N.","key":"e_1_2_1_2_1"},{"volume-title":"Network Flows: Theory, Algorithms, and Applications","year":"1993","author":"Ahuja R.","key":"e_1_2_1_3_1"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/212094.212131"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/29873.29875"},{"volume-title":"Proceedings of the SIAM Conference on Parallel Processing for Scientific Computing","author":"Andersen B. S.","key":"e_1_2_1_6_1"},{"volume-title":"Proceedings of the Fifth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming","year":"2099","author":"Anderson J. M.","key":"e_1_2_1_7_1"},{"volume-title":"Proceedings of CASCON'94","author":"Bacon D.","key":"e_1_2_1_8_1"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/71.737695"},{"key":"e_1_2_1_10_1","doi-asserted-by":"crossref","first-page":"547","DOI":"10.1016\/S0167-8191(99)00012-5","article-title":"Static tiling for heterogeneous computing platforms","volume":"25","author":"Boulet P.","year":"1999","journal-title":"Parall. Comput."},{"volume-title":"Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation. 275--384","author":"Briggs P.","key":"e_1_2_1_11_1"},{"volume-title":"Proceedings of the 23rd International Symposium on Computer Architecture","year":"1996","author":"Burger D. C.","key":"e_1_2_1_12_1"},{"volume-title":"Proceedings of the Thirteenth ACM International Conference on Supercomputing","author":"Chame J.","key":"e_1_2_1_13_1"},{"volume-title":"Proceedings of the Thirteenth ACM International Conference on Supercomputing","author":"Chatterjee S.","key":"e_1_2_1_14_1"},{"volume-title":"Proceedings of the 11th ACM Symposium on Parallel Algorithms and Architectures","author":"Chatterjee S.","key":"e_1_2_1_15_1"},{"volume-title":"Proceedings of the 15th ACM International Conference on Supercomputing","author":"Cociorva D.","key":"e_1_2_1_16_1"},{"volume-title":"Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation","year":"1995","author":"Coleman S.","key":"e_1_2_1_17_1"},{"volume-title":"Proceedings of the Scalable High Performance Computing Conference","year":"1994","author":"Collard J.-F.","key":"e_1_2_1_18_1"},{"key":"e_1_2_1_19_1","unstructured":"Cormen T. Leiserson C. and Rivest R. 1990. Introduction to Algorithms. MIT Press Cambridge MA and McGraw-Hill Book Company New York NY.]]   Cormen T. Leiserson C. and Rivest R. 1990. Introduction to Algorithms. MIT Press Cambridge MA and McGraw-Hill Book Company New York NY.]]"},{"volume-title":"Proceedings of the International Parallel and Distributed Processing Symposium.]]","author":"Ding C.","key":"e_1_2_1_20_1"},{"key":"e_1_2_1_21_1","volume-title":"Proceedings of the Fourth International Workshop on Languages and Compilers for Parallel Computing. Lecture Notes in Computer Science","volume":"1863","author":"Ferrante J.","year":"1991"},{"key":"e_1_2_1_22_1","unstructured":"Gary M. R. and Johnson D. S. 1979. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman and Company New York NY.]]   Gary M. R. and Johnson D. S. 1979. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman and Company New York NY.]]"},{"volume-title":"Proceedings of the Eighth ACM Conference on Architectural Support for Programming Languages and Operating Systems","author":"Ghosh S.","key":"e_1_2_1_23_1"},{"volume-title":"Proceedings of the Sixth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming","author":"Gu J.","key":"e_1_2_1_24_1"},{"key":"e_1_2_1_25_1","unstructured":"Haghighat M. R. 1990. Symbolic dependence analysis for high performance parallelizing compilers. Ph.D. dissertation. Department of Computer Science University of Illinois at Urbana-Champaign Urbana IL.]]  Haghighat M. R. 1990. Symbolic dependence analysis for high performance parallelizing compilers. Ph.D. dissertation. Department of Computer Science University of Illinois at Urbana-Champaign Urbana IL.]]"},{"volume-title":"Computer Architecture: A Quantitative Approach","year":"1996","author":"Hennessy J.","key":"e_1_2_1_26_1"},{"volume-title":"Proceedings of IEEE\/ACM SC 2001","year":"2034","author":"Jin G.","key":"e_1_2_1_27_1"},{"volume-title":"Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT'98","author":"Kandemir M.","key":"e_1_2_1_28_1"},{"volume-title":"Proceedings of the 2000 International Conference on Supercomputing","year":"2000","author":"Kennedy K.","key":"e_1_2_1_29_1"},{"key":"e_1_2_1_30_1","volume-title":"Proceedings of the Sixth Workhsop on Languages and Compilers for Parallel Computing","volume":"768","author":"Kennedy K.","year":"1993"},{"volume-title":"Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation","author":"Kodukula I.","key":"e_1_2_1_31_1"},{"volume-title":"Proceedings of Supercomputing.]] 10","author":"Kodukula I.","key":"e_1_2_1_32_1"},{"volume-title":"Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems","author":"Lam M. S.","key":"e_1_2_1_33_1"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/71.577265"},{"key":"e_1_2_1_35_1","unstructured":"Matula D. and Beck L. 1981. Smallest-last ordering and clustering and graph coloring algorithms. Tech. rep. TR CSE 8104. Department of Computer Science and Engineering Southern Methodist University Dallas TX.]]  Matula D. and Beck L. 1981. Smallest-last ordering and clustering and graph coloring algorithms. Tech. rep. TR CSE 8104. Department of Computer Science and Engineering Southern Methodist University Dallas TX.]]"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1018782528453"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0167-8191(98)00022-2"},{"key":"e_1_2_1_38_1","unstructured":"Object-Oriented Scientific Computing. 2001. Blitz++. Object-Oriented Scientific Computing Available online at http:\/\/www.oonumerics.org\/blitz\/benchmarks\/.]]  Object-Oriented Scientific Computing. 2001. Blitz++. Object-Oriented Scientific Computing Available online at http:\/\/www.oonumerics.org\/blitz\/benchmarks\/.]]"},{"volume-title":"Proceedings of the ACM International Conference on Supercomputing","author":"O'Boyle M.","key":"e_1_2_1_39_1"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/12.752655"},{"volume-title":"Proceedings of the International Conference on Parallel Processing","author":"Park N.","key":"e_1_2_1_41_1"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/135226.135233"},{"volume-title":"Proceedings of the Twelfth International Workshop on Languages and Compilers for Parallel Computing","author":"Pugh W.","key":"e_1_2_1_43_1"},{"key":"e_1_2_1_44_1","doi-asserted-by":"crossref","unstructured":"Pugh W. Rosser E. and Shpeisman T. 1996. Exploiting monotone convergence functions in parallel programs. Tech. rep. CS-TR-3636. University of Maryland College Park MD.]]   Pugh W. Rosser E. and Shpeisman T. 1996. Exploiting monotone convergence functions in parallel programs. Tech. rep. CS-TR-3636. University of Maryland College Park MD.]]","DOI":"10.1007\/BFb0017246"},{"volume-title":"Proceedings of the Eighth International Conference on Compiler Construction","author":"Rivera G.","key":"e_1_2_1_45_1"},{"volume-title":"Proceedings of the IEEE\/ACM SC","year":"2000","author":"Rivera G.","key":"e_1_2_1_46_1"},{"key":"e_1_2_1_47_1","unstructured":"Rosser E. 1998. Fine-grained analysis of array computations. Ph.D. dissertation. Department of Computer Science University of Maryland at College Park MD.]]   Rosser E. 1998. Fine-grained analysis of array computations. Ph.D. dissertation. Department of Computer Science University of Maryland at College Park MD.]]"},{"volume-title":"Proceedings of the Fourth Workshop on Languages, Compilers, and Run-time Systems for Scalable Computers","year":"1998","author":"Sarkar V.","key":"e_1_2_1_48_1"},{"volume-title":"Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation","author":"Song Y.","key":"e_1_2_1_49_1"},{"volume-title":"Proceedings of the 15th ACM International Conference on Supercomputing","author":"Song Y.","key":"e_1_2_1_50_1"},{"volume-title":"Proceedings of the Eighth International Conference on Architectural Support for Programming Languages and Operating Systems","author":"Strout M.","key":"e_1_2_1_51_1"},{"volume-title":"Proceedings of the ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems","year":"1830","author":"Temam O.","key":"e_1_2_1_52_1"},{"volume-title":"Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming","author":"Wise D. S.","key":"e_1_2_1_53_1"},{"key":"e_1_2_1_54_1","unstructured":"Wolf M. 1992. Improving locality and parallelism in nested loops. Ph.D. dissertation. Department of Computer Science Stanford University Stanford CA.]]   Wolf M. 1992. Improving locality and parallelism in nested loops. Ph.D. dissertation. Department of Computer Science Stanford University Stanford CA.]]"},{"key":"e_1_2_1_55_1","unstructured":"Wolfe M. 1995. High Performance Compilers for Parallel Computing. Addison-Wesley Publishing Company Reading MA.]]   Wolfe M. 1995. High Performance Compilers for Parallel Computing. Addison-Wesley Publishing Company Reading MA.]]"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1015460304860"},{"key":"e_1_2_1_57_1","doi-asserted-by":"crossref","unstructured":"Xue J. 2000. Loop Tiling for Parallelism. Kluwer Academic Publishers Dordrecht The Netherlands.]]   Xue J. 2000. Loop Tiling for Parallelism. Kluwer Academic Publishers Dordrecht The Netherlands.]]","DOI":"10.1007\/978-1-4615-4337-4"}],"container-title":["ACM Transactions on Programming Languages and Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1034774.1034777","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1034774.1034777","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T17:23:58Z","timestamp":1750267438000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1034774.1034777"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2004,11]]},"references-count":57,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2004,11]]}},"alternative-id":["10.1145\/1034774.1034777"],"URL":"https:\/\/doi.org\/10.1145\/1034774.1034777","relation":{},"ISSN":["0164-0925","1558-4593"],"issn-type":[{"type":"print","value":"0164-0925"},{"type":"electronic","value":"1558-4593"}],"subject":[],"published":{"date-parts":[[2004,11]]},"assertion":[{"value":"2004-11-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}