{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:15:31Z","timestamp":1750306531402,"version":"3.41.0"},"reference-count":31,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2015,1,9]],"date-time":"2015-01-09T00:00:00Z","timestamp":1420761600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2015,1,9]]},"abstract":"<jats:p>Loop tiling is an effective optimization to improve performance of multiply nested loops, which are the most time-consuming parts in many programs. Most massively parallel systems today are organized hierarchically, and different levels of the hierarchy differ in the organization of parallelism and the memory models they adopt. To make better use of these machines, it is clear that loop nests should be tiled hierarchically to fit the hierarchical organization of the machine; however, it is not so clear what should be the exact form of these hierarchical tiles. In particular, tile shape selection is of critical importance to expose parallelism of the tiled loop nests. Although loop tiling is a well-known optimization, not much is known about tile shape selection.<\/jats:p>\n          <jats:p>In this article, we study tile shape selection when the shapes are any type of parallelograms and introduce a model to relate the tile shape of the hierarchy to the execution time. Using this model, we implement a system that automatically finds the tile shapes that minimize the execution time in a hierarchical system. Our experimental results show that in several cases, the tiles automatically selected by our system outperform the most intuitive tiling schemes usually adopted by programmers because of their simplicity.<\/jats:p>","DOI":"10.1145\/2687414","type":"journal-article","created":{"date-parts":[[2015,1,12]],"date-time":"2015-01-12T20:02:10Z","timestamp":1421092930000},"page":"1-23","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["Optimal Parallelogram Selection for Hierarchical Tiling"],"prefix":"10.1145","volume":"11","author":[{"given":"Xing","family":"Zhou","sequence":"first","affiliation":[{"name":"University of Illinois at Urbana-Champaign"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mar\u00eda J.","family":"Garzar\u00e1n","sequence":"additional","affiliation":[{"name":"University of Illinois at Urbana-Champaign"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"David A.","family":"Padua","sequence":"additional","affiliation":[{"name":"University of Illinois at Urbana-Champaign"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2015,1,9]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1137\/080716980"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.1981.1675792"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.5555\/370049.370401"},{"key":"e_1_2_1_4_1","volume-title":"Numerical Methods for Partial Differential Equations","author":"Ames William F.","unstructured":"William F. Ames . 1977. Numerical Methods for Partial Differential Equations ( 2 nd ed.). Academic , San Diego, CA . William F. Ames. 1977. Numerical Methods for Partial Differential Equations (2nd ed.). Academic, San Diego, CA.","edition":"2"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/378580.378619"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/1504176.1504209"},{"key":"e_1_2_1_7_1","volume-title":"Retrieved","author":"Berkelaar Michel","year":"2014","unstructured":"Michel Berkelaar , Kjell Eikland , and Peter Notebaert . n.d. Introduction to lp_solve 5.5.2.0 . Retrieved November 17, 2014 , from http:\/\/lpsolve.sourceforge.net\/5.5\/. Michel Berkelaar, Kjell Eikland, and Peter Notebaert. n.d. Introduction to lp_solve 5.5.2.0. Retrieved November 17, 2014, from http:\/\/lpsolve.sourceforge.net\/5.5\/."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/1122971.1122981"},{"key":"e_1_2_1_9_1","series-title":"Lecture Notes in Computer Science","volume-title":"Automatic transformations for communication-minimized parallelization and locality optimization in the polyhedral model","author":"Bondhugula Uday","unstructured":"Uday Bondhugula , Muthu Baskaran , Sriram Krishnamoorthy , Jagannathan Ramanujam , Atanas Rountev , and Ponnuswamy Sadayappan . 2008. Automatic transformations for communication-minimized parallelization and locality optimization in the polyhedral model . In Compiler Construction, Laurie Hendren (Ed.). Lecture Notes in Computer Science , Vol. 4959 . Springer , 132--146. Uday Bondhugula, Muthu Baskaran, Sriram Krishnamoorthy, Jagannathan Ramanujam, Atanas Rountev, and Ponnuswamy Sadayappan. 2008. Automatic transformations for communication-minimized parallelization and locality optimization in the polyhedral model. In Compiler Construction, Laurie Hendren (Ed.). Lecture Notes in Computer Science, Vol. 4959. Springer, 132--146."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.5555\/645605.662909"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/207110.207162"},{"key":"e_1_2_1_13_1","volume-title":"Proceedings of the 20th International Symposium on Parallel and Distributed Processing (IPDPS\u201906)","author":"Guo Jia","year":"2006","unstructured":"Jia Guo , Ganesh Bikshandi , Daniel Hoeflinger , Gheorge Almasi , Basilio Fraguela , Maria J. Garzaran , David Padua , and Christoph von Praun . 2006 . Hierarchically tiled arrays for parallelism and locality . In Proceedings of the 20th International Symposium on Parallel and Distributed Processing (IPDPS\u201906) . DOI: http:\/\/dx.doi.org\/10.1109\/IPDPS.2006.1639573 10.1109\/IPDPS.2006.1639573 Jia Guo, Ganesh Bikshandi, Daniel Hoeflinger, Gheorge Almasi, Basilio Fraguela, Maria J. Garzaran, David Padua, and Christoph von Praun. 2006. Hierarchically tiled arrays for parallelism and locality. In Proceedings of the 20th International Symposium on Parallel and Distributed Processing (IPDPS\u201906). DOI: http:\/\/dx.doi.org\/10.1109\/IPDPS.2006.1639573"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/1542275.1542301"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/305619.305641"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/996566.996800"},{"issue":"0","key":"e_1_2_1_18_1","first-page":"29","article-title":"The OpenCL Specification","volume":"1","author":"Khronos OpenCL Working Group","year":"2008","unstructured":"Khronos OpenCL Working Group . 2008 . The OpenCL Specification , Version 1 . 0 . 29 . Retrieved November 17, 2014, from http:\/\/khronos.org\/registry\/cl\/specs\/opencl-1.0.29.pdf. Khronos OpenCL Working Group. 2008. The OpenCL Specification, Version 1.0.29. Retrieved November 17, 2014, from http:\/\/khronos.org\/registry\/cl\/specs\/opencl-1.0.29.pdf.","journal-title":"Version"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/1916461.1916468"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/1735688.1735698"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/305138.305197"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.5555\/2190025.2190063"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/224538.224571"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/125826.125893"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2004.3"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/301618.301668"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/71.97902"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/113445.113449"},{"key":"e_1_2_1_29_1","volume-title":"Proceedings of the 3rd SIAM Conference on Parallel Processing for Scientific Computing. Society for Industrial and Applied Mathematics","author":"Wolfe Michael","year":"1989","unstructured":"Michael Wolfe . 1989 a. Iteration space tiling for memory hierarchies . In Proceedings of the 3rd SIAM Conference on Parallel Processing for Scientific Computing. Society for Industrial and Applied Mathematics , Philadelphia, PA, 357--361. http:\/\/dl.acm.org\/citation.cfm&quest;id=645818.669220 Michael Wolfe. 1989a. Iteration space tiling for memory hierarchies. In Proceedings of the 3rd SIAM Conference on Parallel Processing for Scientific Computing. Society for Industrial and Applied Mathematics, Philadelphia, PA, 357--361. http:\/\/dl.acm.org\/citation.cfm&quest;id=645818.669220"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/76263.76337"},{"key":"e_1_2_1_31_1","series-title":"Lecture Notes in Computer Science","volume-title":"Languages and Compilers for Parallel Computing","author":"Wonnacott David","unstructured":"David Wonnacott . 2000. Time skewing for parallel computers . In Languages and Compilers for Parallel Computing . Lecture Notes in Computer Science , Vol. 1863 . Springer , 477--480. http:\/\/dx.doi.org\/10.1007\/3-540-44905-1_35. 10.1007\/3-540-44905-1_35 David Wonnacott. 2000. Time skewing for parallel computers. In Languages and Compilers for Parallel Computing. Lecture Notes in Computer Science, Vol. 1863. Springer, 477--480. http:\/\/dx.doi.org\/10.1007\/3-540-44905-1_35."},{"key":"e_1_2_1_32_1","series-title":"Lecture Notes in Computer Science","volume-title":"Languages and Compilers for Parallel Computing","author":"Xue Jingling","unstructured":"Jingling Xue . 1997. Communication-minimal tiling of uniform dependence loops . In Languages and Compilers for Parallel Computing . Lecture Notes in Computer Science , Vol. 1239 . Springer , 330--349. http:\/\/dx.doi.org\/10.1007\/BFb0017262. 10.1007\/BFb0017262 Jingling Xue. 1997. Communication-minimal tiling of uniform dependence loops. In Languages and Compilers for Parallel Computing. Lecture Notes in Computer Science, Vol. 1239. Springer, 330--349. http:\/\/dx.doi.org\/10.1007\/BFb0017262."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/2259016.2259044"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2687414","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2687414","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T06:12:14Z","timestamp":1750227134000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2687414"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2015,1,9]]},"references-count":31,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2015,1,9]]}},"alternative-id":["10.1145\/2687414"],"URL":"https:\/\/doi.org\/10.1145\/2687414","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2015,1,9]]},"assertion":[{"value":"2014-06-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2014-10-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2015-01-09","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}