{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,13]],"date-time":"2026-04-13T23:14:45Z","timestamp":1776122085056,"version":"3.50.1"},"reference-count":37,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2013,12,1]],"date-time":"2013-12-01T00:00:00Z","timestamp":1385856000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2013,12]]},"abstract":"<jats:p>Loop tiling is a widely used loop transformation to enhance data locality and allow data reuse. In the tiled code, however, tiles of different sizes can lead to significant variation in performance. Thus, selection of an optimal tile size is critical to performance of tiled codes.<\/jats:p>\n          <jats:p>In the past, tile size selection has been attempted using both static analytical and dynamic empirical (auto-tuning) models. Past work using static models assumed a direct-mapped cache for the purpose of analysis and thus proved to be less robust. On the other hand, the auto-tuning models involve an exhaustive search in a large space of tiled codes. In this article, we propose a new analytical model for tile size selection that leverages the high set associativity in modern caches to minimize conflict misses. Our tile size selection model targets data reuse in multiple levels of cache. In addition, it considers the interaction of tiling with the SIMD unit in modern processors in estimating the optimal tile size. We find that these factors, not considered in previous models, are critical in developing a robust model for tile size selection. 
We implement our tile size selection model in a polyhedral compiler and test it on 12 benchmark kernels using two different problem sizes. Our model outperforms the previous analytical models that are based on reusing data in a single level of cache and achieves an average performance improvement of 9.7% and 20.4%, respectively, over the best square (cubic) tiles for the two problem sizes. In addition, the tile size chosen by our tile size selection algorithm is similar to the best performing size obtained through an extensive search, validating the analytical model underlying the algorithm.<\/jats:p>","DOI":"10.1145\/2541228.2555292","type":"journal-article","created":{"date-parts":[[2014,1,14]],"date-time":"2014-01-14T13:39:57Z","timestamp":1389706797000},"page":"1-27","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":25,"title":["Tile size selection revisited"],"prefix":"10.1145","volume":"10","author":[{"given":"Sanyam","family":"Mehta","sequence":"first","affiliation":[{"name":"University of Minnesota"}]},{"given":"Gautham","family":"Beeraka","sequence":"additional","affiliation":[{"name":"University of Minnesota"}]},{"given":"Pen-Chung","family":"Yew","sequence":"additional","affiliation":[{"name":"University of 
Minnesota"}]}],"member":"320","published-online":{"date-parts":[[2013,12]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/1772954.1772983"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.5555\/1025127.1025992"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/263580.263662"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/1375581.1375595"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/305138.305245"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/378795.378859"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2005.10"},{"key":"e_1_2_1_8_1","unstructured":"Chen C. Chame J. and Hall M. 2008. CHiLL: A Framework for Composing High-Level Loop Transformations. Technical Report (2008) 08--897. University of Southern California."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/207110.207162"},{"key":"e_1_2_1_10_1","unstructured":"Cooper K. and Sandoval J. 2011. Portable Techniques to Find Effective Memory Hierarchy Parameters. Technical Report."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/77626.79170"},{"key":"e_1_2_1_14_1","volume-title":"Proceedings of the International Workshop on Languages and Compilers for Parallel Computing (LCPC\u201992)","author":"Ferrante J.","unstructured":"Ferrante, J., Sarkar, V., and Thrash, W. 1992. 
On estimating and enhancing cache effectiveness. In Proceedings of the International Workshop on Languages and Compilers for Parallel Computing (LCPC\u201992). Springer, Berlin, 328--343."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/301618.301661"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/263580.263657"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/1377603.1377607"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1023\/B:SUPE.0000011388.54204.8e"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1020989410030"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/106972.106981"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/568014.379586"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/12.752655"},{"key":"e_1_2_1_23_1","unstructured":"Pouchet L.-N. 2013. Polybench Benchmark Suite. (2013). Available at http:\/\/www\/-roc.inria.fr\/~pouchet\/software\/polybench\/."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1016\/0743-7315(92)90027-K"},{"key":"e_1_2_1_25_1","volume-title":"-W","author":"Rivera G.","year":"1999","unstructured":"Rivera, G. and Tseng, C.-W. 1999. A comparison of compiler tiling algorithms. In Compiler Construction. Springer, 168--182."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1147\/rd.413.0233"},{"key":"e_1_2_1_27_1","volume-title":"Proceedings of the 2000 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS\u201900)","author":"Sarkar V.","unstructured":"Sarkar, V. and Megiddo, N. 2000. An analytical model for loop tiling and its solution. 
In Proceedings of the 2000 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS\u201900). 146--153."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-28652-0_6"},{"key":"e_1_2_1_29_1","volume-title":"Proceedings of the ACM\/IEEE Conference on Supercomputing (SC\u201902)","author":"\u0162\u0103pu\u015f C.","unstructured":"\u0162\u0103pu\u015f, C., Chung, I.-H., and Hollingsworth, J. K. 2002. Active harmony: Towards automated performance tuning. In Proceedings of the ACM\/IEEE Conference on Supercomputing (SC\u201902). 
IEEE Computer Society Press, Los Alamitos, CA, 1--11."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/HiPC.2011.6152742"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/169627.169762"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2009.5161054"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0167-8191(00)00087-9"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/113445.113449"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/76263.76337"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1015460304860"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/378795.378860"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2004.840444"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/1772954.1772982"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2541228.2555292","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2541228.2555292","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T07:35:00Z","timestamp":1750232100000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2541228.2555292"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,12]]},"references-count":37,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2013,12]]}},"alternative-id":["10.1145\/2541228.2555292"],"URL":"https:\/\/doi.org\/10.1145\/2541228.2555292","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"value":"1544-3566","type":"print"},{"value":"1544-3973","type":"electronic"}],"subject":[],"published":{"date-parts":[[2
013,12]]},"assertion":[{"value":"2013-06-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2013-10-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2013-12-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}