{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,18]],"date-time":"2025-11-18T12:14:31Z","timestamp":1763468071137,"version":"3.41.0"},"reference-count":41,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2012,4,1]],"date-time":"2012-04-01T00:00:00Z","timestamp":1333238400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Program. Lang. Syst."],"published-print":{"date-parts":[[2012,4]]},"abstract":"<jats:p>Loop tiling is a widely used program optimization that improves data locality and enables coarse-grained parallelism. Parameterized tiled loops, where the tile sizes remain symbolic parameters until runtime, are quite useful for iterative compilers and autotuners that produce highly optimized libraries and codes. Although it is easy to generate such loops for (hyper-) rectangular iteration spaces tiled with (hyper-) rectangular tiles, many important computations do not fall into this restricted domain. In the past, parameterized tiled code generation for the general case of convex iteration spaces being tiled by (hyper-) rectangular tiles has been solved with bounding box approaches or with sophisticated and expensive machinery.<\/jats:p>\n          <jats:p>\n            We present a novel formulation of the parameterized tiled loop generation problem using a polyhedral set called the\n            <jats:italic>outset<\/jats:italic>\n            . By reducing the problem of parameterized tiled code generation to that of generating standard loops and simple postprocessing of these loops, the outset method achieves a code generation efficiency that is comparable to existing code generation techniques, including those for fixed tile sizes. We compare the performance of our technique with several other tiled loop generation methods on kernels from BLAS3 and scientific computations. The simplicity of our solution makes it well suited for use in production compilers\u2014in particular, the IBM XL compiler uses the inset-based technique introduced in this article for register tiling. We also provide a complete coverage of parameterized tiling of perfect loop nests by describing three related techniques: (i) a scheme for separating full and partial tiles; (ii) a scheme for generating tiled loops directly from the abstract syntax tree representation of loops; (iii) a formal characterization of parameterized loop tiling using bilinear forms and a Symbolic Fourier-Motzkin Elimination (SFME)-based parameterized tiled loop generation method.\n          <\/jats:p>","DOI":"10.1145\/2160910.2160912","type":"journal-article","created":{"date-parts":[[2012,5,1]],"date-time":"2012-05-01T13:43:38Z","timestamp":1335879818000},"page":"1-41","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":24,"title":["Parameterized loop tiling"],"prefix":"10.1145","volume":"34","author":[{"given":"Lakshminarayanan","family":"Renganarayanan","sequence":"first","affiliation":[{"name":"IBM T.J. Watson Research Center, Yorktown Heights, NY"}]},{"given":"Daegon","family":"Kim","sequence":"additional","affiliation":[{"name":"CORESPEQ, Inc., Milpitas, CA"}]},{"given":"Michelle Mills","family":"Strout","sequence":"additional","affiliation":[{"name":"Colorado State University, Fort Collins, CO"}]},{"given":"Sanjay","family":"Rajopadhye","sequence":"additional","affiliation":[{"name":"Colorado State University, Fort Collins, CO"}]}],"member":"320","published-online":{"date-parts":[[2012,5,4]]},"reference":[{"doi-asserted-by":"publisher","key":"e_1_2_1_2_1","DOI":"10.1145\/155090.155102"},{"doi-asserted-by":"publisher","key":"e_1_2_1_3_1","DOI":"10.1145\/109625.109631"},{"doi-asserted-by":"publisher","key":"e_1_2_1_4_1","DOI":"10.1006\/jpdc.1997.1371"},{"doi-asserted-by":"publisher","key":"e_1_2_1_5_1","DOI":"10.5555\/1025127.1025992"},{"doi-asserted-by":"publisher","key":"e_1_2_1_6_1","DOI":"10.1016\/0167-9260(94)90019-1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_7_1","DOI":"10.1145\/207110.207162"},{"doi-asserted-by":"publisher","key":"e_1_2_1_8_1","DOI":"10.2307\/1967869"},{"doi-asserted-by":"publisher","key":"e_1_2_1_9_1","DOI":"10.1287\/moor.14.3.502"},{"doi-asserted-by":"publisher","key":"e_1_2_1_10_1","DOI":"10.1007\/BF01585709"},{"key":"e_1_2_1_11_1","volume-title":"Reported in Analyse des travaux de l'Academie Royale des Sciences, pendant l'annee","author":"Fourier L.","year":"1824","unstructured":"Fourier , L. 1827. Reported in Analyse des travaux de l'Academie Royale des Sciences, pendant l'annee 1824 , Partie mathematique. Histoire de l'Acacdemie Royale des Sciences de l'Institut de France 7, xlvii--lv. Fourier, L. 1827. Reported in Analyse des travaux de l'Academie Royale des Sciences, pendant l'annee 1824, Partie mathematique. Histoire de l'Acacdemie Royale des Sciences de l'Institut de France 7, xlvii--lv."},{"doi-asserted-by":"publisher","key":"e_1_2_1_12_1","DOI":"10.1109\/TPDS.2003.1239870"},{"volume-title":"Proceedings of the 11th Workshop on Compilers for Parallel Computers (CPC'04)","author":"Gr\u00f6sslinger A.","unstructured":"Gr\u00f6sslinger , A. , Griebl , M. , and Lengauer , C . 2004. Introducing nonlinear parameters to the polyhedron model . In Proceedings of the 11th Workshop on Compilers for Parallel Computers (CPC'04) . M. Gerndt and E. Kereku, Eds., Research Report Series, LRR-TUM, Technische Universit\u00e4t M\u00fcnchen, Seeon, Germany, 1--12. Gr\u00f6sslinger, A., Griebl, M., and Lengauer, C. 2004. Introducing nonlinear parameters to the polyhedron model. In Proceedings of the 11th Workshop on Compilers for Parallel Computers (CPC'04). M. Gerndt and E. Kereku, Eds., Research Report Series, LRR-TUM, Technische Universit\u00e4t M\u00fcnchen, Seeon, Germany, 1--12.","key":"e_1_2_1_13_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_14_1","DOI":"10.1145\/1542275.1542301"},{"unstructured":"HiTLoG 2007. HiTLoG: Hierarchical Tiled Loop Generator. http:\/\/www.cs.colostate.edu\/MMAlpha\/HiTLoG\/.  HiTLoG 2007. HiTLoG: Hierarchical Tiled Loop Generator. http:\/\/www.cs.colostate.edu\/MMAlpha\/HiTLoG\/.","key":"e_1_2_1_15_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_16_1","DOI":"10.1145\/263699.263716"},{"doi-asserted-by":"publisher","key":"e_1_2_1_17_1","DOI":"10.1145\/73560.73588"},{"doi-asserted-by":"publisher","key":"e_1_2_1_18_1","DOI":"10.1145\/567097.567101"},{"doi-asserted-by":"publisher","key":"e_1_2_1_19_1","DOI":"10.1145\/1178597.1178605"},{"volume-title":"Proceedings of the 5th Symposium on the Frontiers of Massively Parallel Computation (Frontiers'95)","author":"Kelly W.","unstructured":"Kelly , W. , Pugh , W. , and Rosser , E . 1995. Code generation for multiple mappings . In Proceedings of the 5th Symposium on the Frontiers of Massively Parallel Computation (Frontiers'95) . IEEE Computer Society, Los Alamitos, CA, 332--341. Kelly, W., Pugh, W., and Rosser, E. 1995. Code generation for multiple mappings. In Proceedings of the 5th Symposium on the Frontiers of Massively Parallel Computation (Frontiers'95). IEEE Computer Society, Los Alamitos, CA, 332--341.","key":"e_1_2_1_20_1"},{"unstructured":"Kim D. and Rajopadhye S. 2009. Parameterized tiling for imperfectly nested loops. Tech. rep. CS-09-101 Colorado State University.  Kim D. and Rajopadhye S. 2009. Parameterized tiling for imperfectly nested loops. Tech. rep. CS-09-101 Colorado State University.","key":"e_1_2_1_21_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_22_1","DOI":"10.1145\/1362622.1362691"},{"doi-asserted-by":"publisher","key":"e_1_2_1_23_1","DOI":"10.5555\/517554.825767"},{"volume-title":"Embedded Processor Design Challenges: Systems, Architectures, Modeling, and Simulation (SAMOS)","author":"Knijnenburg P. M. W.","unstructured":"Knijnenburg , P. M. W. , Kisuki , T. , and O'Boyle , M. F. P. 2002. Iterative compilation . In Embedded Processor Design Challenges: Systems, Architectures, Modeling, and Simulation (SAMOS) . Springer-Verlag, Berlin , ACM Press , New York, NY, 171--187. Knijnenburg, P. M. W., Kisuki, T., and O'Boyle, M. F. P. 2002. Iterative compilation. In Embedded Processor Design Challenges: Systems, Architectures, Modeling, and Simulation (SAMOS). Springer-Verlag, Berlin, ACM Press, New York, NY, 171--187.","key":"e_1_2_1_24_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_25_1","DOI":"10.1002\/(SICI)1096-9128(199607)8:6<445::AID-CPE253>3.0.CO;2-G"},{"unstructured":"Le Verge H. Van Dongen V. and Wilde D. 1994a. La synth\u00e8se de nids de boucles avec la biblioth\u00e8que poly\u00e9drique. In RenPar'6. IRISA Lyon France. (English version: Loop nest synthesis using the polyhedral library. In IRISA TR 830.)  Le Verge H. Van Dongen V. and Wilde D. 1994a. La synth\u00e8se de nids de boucles avec la biblioth\u00e8que poly\u00e9drique. In RenPar'6. IRISA Lyon France. (English version: Loop nest synthesis using the polyhedral library. In IRISA TR 830.)","key":"e_1_2_1_26_1"},{"unstructured":"Le Verge H. Van Dongen V. and Wilde D. 1994b. Loop nest synthesis using the polyhedral library. Tech. rep. PI 830 IRISA Rennes France. (Also published as INRIA Research Report 2288.)  Le Verge H. Van Dongen V. and Wilde D. 1994b. Loop nest synthesis using the polyhedral library. Tech. rep. PI 830 IRISA Rennes France. (Also published as INRIA Research Report 2288.)","key":"e_1_2_1_27_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_28_1","DOI":"10.1023\/A:1007577115980"},{"doi-asserted-by":"publisher","key":"e_1_2_1_30_1","DOI":"10.1504\/IJHPCN.2004.009265"},{"unstructured":"OMEGA&plus; 2011. OMEGA&plus;: Pressburger engine and code generator. http:\/\/chunchen.info\/omega\/.  OMEGA&plus; 2011. OMEGA&plus;: Pressburger engine and code generator. http:\/\/chunchen.info\/omega\/.","key":"e_1_2_1_31_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_32_1","DOI":"10.1145\/135226.135233"},{"doi-asserted-by":"publisher","key":"e_1_2_1_33_1","DOI":"10.1109\/JPROC.2004.840306"},{"doi-asserted-by":"publisher","key":"e_1_2_1_34_1","DOI":"10.1023\/A:1007554627716"},{"doi-asserted-by":"publisher","key":"e_1_2_1_35_1","DOI":"10.1016\/0743-7315(92)90027-K"},{"doi-asserted-by":"publisher","key":"e_1_2_1_36_1","DOI":"10.1145\/1654059.1654105"},{"unstructured":"Schreiber R. and Dongarra J. 1990. Automatic blocking of nested loops. Tech. rep. 90.38 RIACS NASA Ames Research Center.   Schreiber R. and Dongarra J. 1990. Automatic blocking of nested loops. Tech. rep. 90.38 RIACS NASA Ames Research Center.","key":"e_1_2_1_37_1"},{"unstructured":"Weispfenning V. 1994. Parametric linear and quadratic optimization by elimination. Tech. rep. MIP-9404 Fakult\u00e4t f\u00fcr Mathematik und Informatik Universit\u00e4t Passau.  Weispfenning V. 1994. Parametric linear and quadratic optimization by elimination. Tech. rep. MIP-9404 Fakult\u00e4t f\u00fcr Mathematik und Informatik Universit\u00e4t Passau.","key":"e_1_2_1_38_1"},{"volume-title":"Proceedings of the ACM\/IEEE Conference on Supercomputing. IEEE Computer Society","author":"Whaley R. C.","unstructured":"Whaley , R. C. and Dongarra , J. J . 1998. Automatically tuned linear algebra software . In Proceedings of the ACM\/IEEE Conference on Supercomputing. IEEE Computer Society , Los Alamitos, CA, 1--27. Whaley, R. C. and Dongarra, J. J. 1998. Automatically tuned linear algebra software. In Proceedings of the ACM\/IEEE Conference on Supercomputing. IEEE Computer Society, Los Alamitos, CA, 1--27.","key":"e_1_2_1_39_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_40_1","DOI":"10.1145\/193209.193217"},{"doi-asserted-by":"publisher","key":"e_1_2_1_41_1","DOI":"10.1145\/113445.113449"},{"key":"e_1_2_1_42_1","volume-title":"Proceedings of the 3rd SIAM Conference on Parallel Processing for Scientific Computing. SIAM","author":"Wolfe M.","year":"1989","unstructured":"Wolfe , M. 1989 . Iteration space tiling for memory hierarchies . In Proceedings of the 3rd SIAM Conference on Parallel Processing for Scientific Computing. SIAM , Philadelphia, PA, 357--361. Wolfe, M. 1989. Iteration space tiling for memory hierarchies. In Proceedings of the 3rd SIAM Conference on Parallel Processing for Scientific Computing. SIAM, Philadelphia, PA, 357--361."},{"volume-title":"Loop Tiling for Parallelism","author":"Xue J.","unstructured":"Xue , J. 2000. Loop Tiling for Parallelism . Kluwer Academic Publishers, Norwell , MA. Xue, J. 2000. Loop Tiling for Parallelism. Kluwer Academic Publishers, Norwell, MA.","key":"e_1_2_1_43_1"}],"container-title":["ACM Transactions on Programming Languages and Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2160910.2160912","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2160910.2160912","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T10:05:48Z","timestamp":1750241148000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2160910.2160912"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,4]]},"references-count":41,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2012,4]]}},"alternative-id":["10.1145\/2160910.2160912"],"URL":"https:\/\/doi.org\/10.1145\/2160910.2160912","relation":{},"ISSN":["0164-0925","1558-4593"],"issn-type":[{"type":"print","value":"0164-0925"},{"type":"electronic","value":"1558-4593"}],"subject":[],"published":{"date-parts":[[2012,4]]},"assertion":[{"value":"2009-04-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2012-01-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2012-05-04","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}