{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,22]],"date-time":"2026-03-22T22:43:09Z","timestamp":1774219389374,"version":"3.50.1"},"reference-count":38,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2019,12,17]],"date-time":"2019-12-17T00:00:00Z","timestamp":1576540800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["61702546"],"award-info":[{"award-number":["61702546"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"European Commission through the MNEMOSENE project","award":["780215"],"award-info":[{"award-number":["780215"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2019,12,31]]},"abstract":"<jats:p>Loop tiling to exploit data locality and parallelism plays an essential role in a variety of general-purpose and domain-specific compilers. Affine transformations in polyhedral frameworks implement classical forms of rectangular and parallelogram tiling, but these lead to pipelined start with rather inefficient wavefront parallelism. Multiple extensions to polyhedral compilers evaluated sophisticated shapes such as trapezoid or diamond tiles, enabling concurrent start along the axes of the iteration space; yet these resort to custom schedulers and code generators insufficiently integrated within the general framework. One of these modified shapes referred to as overlapped tiling also lacks a unifying framework to reason about its composition with affine transformations; this prevents its application in general-purpose loop-nest optimizers and the fair comparison with other techniques. We revisit overlapped tiling, recasting it as an affine transformation on schedule trees composable with any affine scheduling algorithm. We demonstrate how to derive tighter tile shapes with less redundant computations. Our method models the traditional \u201cscalene trapezoid\u201d shapes and novel \u201cright-rectangle\u201d variants. It goes beyond the state of the art by avoiding the restriction to a domain-specific language or introducing post-pass rescheduling and custom code generation. We conduct experiments on the PolyMage benchmarks and iterated stencils, validating the effectiveness and applicability of our technique on both general-purpose multicores and GPU accelerators.<\/jats:p>","DOI":"10.1145\/3369382","type":"journal-article","created":{"date-parts":[[2019,12,18]],"date-time":"2019-12-18T13:21:11Z","timestamp":1576675271000},"page":"1-25","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":13,"title":["Flextended Tiles"],"prefix":"10.1145","volume":"16","author":[{"given":"Jie","family":"Zhao","sequence":"first","affiliation":[{"name":"INRIA 8 DI, \u00c9cole Normale Sup\u00e9rieure, Paris, France"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8866-5343","authenticated-orcid":false,"given":"Albert","family":"Cohen","sequence":"additional","affiliation":[{"name":"Google, Paris, France"}]}],"member":"320","published-online":{"date-parts":[[2019,12,17]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/2628071.2628092"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/2629645"},{"key":"e_1_2_1_3_1","volume-title":"Sharp","author":"Bacon David F.","year":"1994","unstructured":"David F. Bacon , Susan L. Graham , and Oliver J . Sharp . 1994 . Compiler transformations for high-performance computing. ACM Comput.ing Surveys 26, 4 (Dec. 1994), 345--420. DOI:https:\/\/doi.org\/10.1145\/197405.197406 10.1145\/197405.197406 David F. Bacon, Susan L. Graham, and Oliver J. Sharp. 1994. Compiler transformations for high-performance computing. ACM Comput.ing Surveys 26, 4 (Dec. 1994), 345--420. DOI:https:\/\/doi.org\/10.1145\/197405.197406"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.5555\/2388996.2389051"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/2628071.2628106"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2016.2615094"},{"key":"e_1_2_1_7_1","volume-title":"Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI \u201908)","author":"Bondhugula U.","unstructured":"U. Bondhugula , A. Hartono , J. Ramanujam , and P. Sadayappan . 2008. A practical automatic polyhedral parallelizer and locality optimizer . In Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI \u201908) . ACM, New York, NY, 101--113. DOI:https:\/\/doi.org\/10.1145\/1375581.1375595 10.1145\/1375581.1375595 U. Bondhugula, A. Hartono, J. Ramanujam, and P. Sadayappan. 2008. A practical automatic polyhedral parallelizer and locality optimizer. In Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI \u201908). ACM, New York, NY, 101--113. DOI:https:\/\/doi.org\/10.1145\/1375581.1375595"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/245.247"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/1275808.1276506"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/3168832"},{"key":"e_1_2_1_11_1","volume-title":"Proceedings of the 1990 5th Distributed Memory Computing Conference. 568--572","author":"Eissfeller H.","unstructured":"H. Eissfeller and S. M. Muller . 1990. The triangle method for saving startup time in parallel computers . In Proceedings of the 1990 5th Distributed Memory Computing Conference. 568--572 . H. Eissfeller and S. M. Muller. 1990. The triangle method for saving startup time in parallel computers. In Proceedings of the 1990 5th Distributed Memory Computing Conference. 568--572."},{"key":"e_1_2_1_12_1","first-page":"6","article-title":"Some efficient solutions to the affine scheduling problem. Part II. Multidimensional time","volume":"21","author":"Feautrier Paul","year":"1992","unstructured":"Paul Feautrier . 1992 . Some efficient solutions to the affine scheduling problem. Part II. Multidimensional time . International Journal of Parallel Programming 21 , 6 (Dec. 1992), 389--420. DOI:https:\/\/doi.org\/10.1007\/BF01379404 10.1007\/BF01379404 Paul Feautrier. 1992. Some efficient solutions to the affine scheduling problem. Part II. Multidimensional time. International Journal of Parallel Programming 21, 6 (Dec. 1992), 389--420. DOI:https:\/\/doi.org\/10.1007\/BF01379404","journal-title":"International Journal of Parallel Programming"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1137\/130935781"},{"key":"e_1_2_1_14_1","volume-title":"Proceedings of the Annual IEEE\/ACM International Symposium on Code Generation and Optimization (CGO\u201914)","author":"Grosser T.","unstructured":"T. Grosser , A. Cohen , J. Holewinski , P. Sadayappan , and S. Verdoolaege . 2014. Hybrid hexagonal\/classical tiling for GPUs . In Proceedings of the Annual IEEE\/ACM International Symposium on Code Generation and Optimization (CGO\u201914) . ACM, New York, NY, Article 66, 10 pages. DOI:https:\/\/doi.org\/10.1145\/2544137.2544160 10.1145\/2544137.2544160 T. Grosser, A. Cohen, J. Holewinski, P. Sadayappan, and S. Verdoolaege. 2014. Hybrid hexagonal\/classical tiling for GPUs. In Proceedings of the Annual IEEE\/ACM International Symposium on Code Generation and Optimization (CGO\u201914). ACM, New York, NY, Article 66, 10 pages. DOI:https:\/\/doi.org\/10.1145\/2544137.2544160"},{"key":"e_1_2_1_15_1","volume-title":"Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units (GPGPU-6). ACM","author":"Grosser T.","unstructured":"T. Grosser , A. Cohen , P. H. J. Kelly , J. Ramanujam , P. Sadayappan , and S. Verdoolaege . 2013. Split tiling for GPUs: Automatic parallelization using trapezoidal tiles . In Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units (GPGPU-6). ACM , New York, NY, 24--31. DOI:https:\/\/doi.org\/10.1145\/2458523.2458526 10.1145\/2458523.2458526 T. Grosser, A. Cohen, P. H. J. Kelly, J. Ramanujam, P. Sadayappan, and S. Verdoolaege. 2013. Split tiling for GPUs: Automatic parallelization using trapezoidal tiles. In Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units (GPGPU-6). ACM, New York, NY, 24--31. DOI:https:\/\/doi.org\/10.1145\/2458523.2458526"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/2743016"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1142\/S0129626414410023"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.5244\/C.2.23"},{"key":"e_1_2_1_19_1","volume-title":"Proceedings of the 26th ACM International Conference on Supercomputing (ICS\u201912)","author":"Holewinski J.","unstructured":"J. Holewinski , L.-N. Pouchet , and P. Sadayappan . 2012. High-performance code generation for stencil computations on GPU architectures . In Proceedings of the 26th ACM International Conference on Supercomputing (ICS\u201912) . ACM, New York, NY, 311--320. DOI:https:\/\/doi.org\/10.1145\/2304576.2304619 10.1145\/2304576.2304619 J. Holewinski, L.-N. Pouchet, and P. Sadayappan. 2012. High-performance code generation for stencil computations on GPU architectures. In Proceedings of the 26th ACM International Conference on Supercomputing (ICS\u201912). ACM, New York, NY, 311--320. DOI:https:\/\/doi.org\/10.1145\/2304576.2304619"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/1362622.1362691"},{"key":"e_1_2_1_21_1","volume-title":"Proceedings of the 28th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI \u201907)","author":"Krishnamoorthy S.","unstructured":"S. Krishnamoorthy , M. Baskaran , U. Bondhugula , J. Ramanujam , A. Rountev , and P. Sadayappan . 2007. Effective automatic parallelization of stencil computations . In Proceedings of the 28th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI \u201907) . ACM, New York, NY, 235--244. DOI:https:\/\/doi.org\/10.1145\/1250734.1250761 10.1145\/1250734.1250761 S. Krishnamoorthy, M. Baskaran, U. Bondhugula, J. Ramanujam, A. Rountev, and P. Sadayappan. 2007. Effective automatic parallelization of stencil computations. In Proceedings of the 28th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI \u201907). ACM, New York, NY, 235--244. DOI:https:\/\/doi.org\/10.1145\/1250734.1250761"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/3155290"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/2897824.2925952"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/2694344.2694364"},{"key":"e_1_2_1_25_1","volume-title":"Proceedings of the International Conference on Parallel Processing. 309--316","author":"Daniel","unstructured":"Daniel A. Orozco and Guang R. Gao. 2009. Mapping the FDTD application to many-core chip architectures . In Proceedings of the International Conference on Parallel Processing. 309--316 . Daniel A. Orozco and Guang R. Gao. 2009. Mapping the FDTD application to many-core chip architectures. In Proceedings of the International Conference on Parallel Processing. 309--316."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/2739047"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/2723694"},{"key":"e_1_2_1_28_1","volume-title":"Bilateral filtering: Theory and applications. Foundations and Trends\u00ae in Computer Graphics and Vision 4, 1","author":"Paris Sylvain","year":"2009","unstructured":"Sylvain Paris , Pierre Kornprobst , Jack Tumblin , and Fr\u00e9do Durand . 2009. Bilateral filtering: Theory and applications. Foundations and Trends\u00ae in Computer Graphics and Vision 4, 1 ( 2009 ), 1--73. Sylvain Paris, Pierre Kornprobst, Jack Tumblin, and Fr\u00e9do Durand. 2009. Bilateral filtering: Theory and applications. Foundations and Trends\u00ae in Computer Graphics and Vision 4, 1 (2009), 1--73."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/183432.183525"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/2185520.2185528"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/2491956.2462176"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2002.1003856"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.5555\/2738600.2738620"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICPP.2011.47"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/2400682.2400713"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/2687414"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/2259016.2259044"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/3178372.3179507"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3369382","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3369382","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T23:23:30Z","timestamp":1750202610000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3369382"}},"subtitle":["A Flexible Extension of Overlapped Tiles for Polyhedral Compilation"],"short-title":[],"issued":{"date-parts":[[2019,12,17]]},"references-count":38,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2019,12,31]]}},"alternative-id":["10.1145\/3369382"],"URL":"https:\/\/doi.org\/10.1145\/3369382","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"value":"1544-3566","type":"print"},{"value":"1544-3973","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,12,17]]},"assertion":[{"value":"2019-03-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-10-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-12-17","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}