{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,1]],"date-time":"2025-11-01T17:35:22Z","timestamp":1762018522548,"version":"build-2065373602"},"reference-count":57,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2019,12,26]],"date-time":"2019-12-26T00:00:00Z","timestamp":1577318400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Google Faculty grant"},{"name":"EPSRC Centre for Doctoral Training in Pervasive Parallelism"},{"DOI":"10.13039\/501100000266","name":"UK Engineering and Physical Sciences Research Council","doi-asserted-by":"crossref","award":["EP\/L01503X\/1"],"award-info":[{"award-number":["EP\/L01503X\/1"]}],"id":[{"id":"10.13039\/501100000266","id-type":"DOI","asserted-by":"crossref"}]},{"name":"University of Edinburgh, HiPEAC collaboration"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2019,12,31]]},"abstract":"<jats:p>Stencil computations are a widely used type of algorithm, found in applications from physical simulations to machine learning. Stencils are embarrassingly parallel, therefore fit on modern hardware such as Graphic Processing Units perfectly. Although stencil computations have been extensively studied, optimizing them for increasingly diverse hardware remains challenging. Domain-specific Languages (DSLs) have raised the programming abstraction and offer good performance; however, this method places the burden on DSL implementers to write almost full-fledged parallelizing compilers and optimizers.<\/jats:p>\n          <jats:p>\n            <jats:sc>Lift<\/jats:sc>\n            has recently emerged as a promising approach to achieve\n            <jats:italic>performance portability<\/jats:italic>\n            by using a small set of reusable parallel primitives that DSL or library writers utilize. L\n            <jats:sc>ift<\/jats:sc>\n            \u2019s key novelty is in its encoding of optimizations as a system of extensible rewrite rules which are used to explore the optimization space.\n          <\/jats:p>\n          <jats:p>\n            This article demonstrates how complex multi-dimensional stencil code and optimizations are expressed using compositions of simple 1D L\n            <jats:sc>ift<\/jats:sc>\n            primitives and rewrite rules. We introduce two optimizations that provide high performance for stencils in particular: classical overlapped tiling for multi-dimensional stencils and 2.5D tiling specifically for 3D stencils. We provide an in-depth analysis on how the tiling optimizations affects stencils of different shapes and sizes across different applications. Our experimental results show that our approach outperforms existing compiler approaches and hand-tuned codes.\n          <\/jats:p>","DOI":"10.1145\/3368858","type":"journal-article","created":{"date-parts":[[2019,12,26]],"date-time":"2019-12-26T21:05:46Z","timestamp":1577394346000},"page":"1-25","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":7,"title":["Tiling Optimizations for Stencil Computations Using Rewrite Rules in L\n            <scp>ift<\/scp>"],"prefix":"10.1145","volume":"16","author":[{"given":"Larisa","family":"Stoltzfus","sequence":"first","affiliation":[{"name":"The University of Edinburgh, United Kingdom, Edinburgh, UK"}]},{"given":"Bastian","family":"Hagedorn","sequence":"additional","affiliation":[{"name":"University of M\u00fcnster, M\u00fcnster, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5048-0741","authenticated-orcid":false,"given":"Michel","family":"Steuwer","sequence":"additional","affiliation":[{"name":"University of Glasgow, Lilybank Gardens, Glasgow, UK, United Kingdom"}]},{"given":"Sergei","family":"Gorlatch","sequence":"additional","affiliation":[{"name":"University of M\u00fcnster, M\u00fcnster, Germany"}]},{"given":"Christophe","family":"Dubach","sequence":"additional","affiliation":[{"name":"The University of Edinburgh, United Kingdom, Edinburgh, UK"}]}],"member":"320","published-online":{"date-parts":[[2019,12,26]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/2628071.2628092"},{"volume-title":"Joseph James Gebis, Parry Husbands, Kurt Keutzer, David A. Patterson, William Lester Plishker, John Shalf, Samuel Webb Williams, et al.","year":"2006","author":"Asanovic Krste","key":"e_1_2_1_2_1"},{"volume-title":"Proceedings of the SYCL Workshop at ACM SIGPLAN PPoPP.","year":"2016","author":"Aumage Olivier","key":"e_1_2_1_3_1"},{"volume-title":"Proceedings of the SST\u201906","year":"2006","author":"Bastian Peter","key":"e_1_2_1_4_1"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/CIT.2010.214"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/IISWC.2009.5306797"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2011.70"},{"volume-title":"Parallel Processing and Applied Mathematics","author":"Ciznicki Milosz","key":"e_1_2_1_8_1"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1115\/1.1497470"},{"key":"e_1_2_1_10_1","unstructured":"Murray I. Cole. 1988. Algorithmic Skeletons: A Structured Approach to the Management of Parallel Computation. Ph.D. Dissertation. University of Edinburgh.  Murray I. Cole. 1988. Algorithmic Skeletons: A Structured Approach to the Management of Parallel Computation. Ph.D. Dissertation. University of Edinburgh."},{"volume-title":"Proceedings of the GPGPU\u201910","author":"Danalis Anthony","key":"e_1_2_1_11_1"},{"volume-title":"Proceedings of the MULTIPROG Workshop at HiPEAC\u201912","year":"2012","author":"Dastgeer Usman","key":"e_1_2_1_12_1"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1137\/070693199"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/2686745.2686751"},{"volume-title":"Kessler","year":"2010","author":"Enmyren Johan","key":"e_1_2_1_15_1"},{"volume-title":"Elster","year":"2016","author":"Falch Thomas L.","key":"e_1_2_1_16_1"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/1088149.1088197"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/2458523.2458526"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1142\/S0129626414410023"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.5555\/1464463.1464469"},{"volume-title":"Proceedings of the CGO. ACM, 100--112","year":"2018","author":"Hagedorn Bastian","key":"e_1_2_1_22_1"},{"volume-title":"Proceedings of the Conference on DAFx\u201915","year":"2015","author":"Hamilton Brian","key":"e_1_2_1_23_1"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/2464996.2467268"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2010.5470421"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/2145816.2145865"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/1362622.1362691"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-45706-2_86"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/1708046.1708052"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/3155290"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/CLUSTER.2012.46"},{"volume-title":"Proceedings of the HiStencils\u201914","year":"2014","author":"Maruyama Naoya","key":"e_1_2_1_32_1"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/2500365.2500595"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC.Companion.2012.136"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/2694344.2694364"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2010.2"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1002\/num.20129"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/2499370.2462176"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCC-SmartCity-DSS.2017.9"},{"volume-title":"Proceedings of the PACT\u201916","author":"Rawat Prashant Singh","key":"e_1_2_1_40_1"},{"volume-title":"Proceedings of the GPGPU\u201916","author":"Rawat Prashant Singh","key":"e_1_2_1_41_1"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/2884045.2884046"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2007.370291"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/2784731.2784754"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1142\/S0129626414410059"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2011.269"},{"key":"e_1_2_1_47_1","first-page":"1","article-title":"Matrix multiplication beyond auto-tuning: Rewrite-based GPU code generation","volume":"15","author":"Steuwer Michel","year":"2016","journal-title":"Proceedings of the CASES. ACM"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2017.7863730"},{"volume-title":"Proceedings of the DAFx\u201917","year":"2017","author":"Stoltzfus Larisa","key":"e_1_2_1_49_1"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICPP.2011.47"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/2584665"},{"volume-title":"Bradley C. Kuszmaul, Chi-Keung Luk, and Charles E. Leiserson.","year":"2011","author":"Tang Yuan","key":"e_1_2_1_52_1"},{"volume-title":"Thazhuthaveetil","year":"2009","author":"Udupa Abhishek","key":"e_1_2_1_53_1"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1145\/2400682.2400713"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1109\/COMPSAC.2009.82"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/1498765.1498785"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPSW.2010.5470813"},{"volume-title":"Proceedings of the HPEC. IEEE Computer Society.","year":"2016","author":"Zhang Guangwei","key":"e_1_2_1_59_1"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3368858","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3368858","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T23:44:50Z","timestamp":1750203890000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3368858"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,12,26]]},"references-count":57,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2019,12,31]]}},"alternative-id":["10.1145\/3368858"],"URL":"https:\/\/doi.org\/10.1145\/3368858","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2019,12,26]]},"assertion":[{"value":"2019-04-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-10-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-12-26","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}