{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,2]],"date-time":"2026-07-02T23:41:46Z","timestamp":1783035706936,"version":"3.54.6"},"reference-count":60,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2021,9,3]],"date-time":"2021-09-03T00:00:00Z","timestamp":1630627200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001711","name":"Schweizerischer Nationalfonds zur F\u00f6rderung der Wissenschaftlichen Forschung","doi-asserted-by":"publisher","award":["PZ00P2168016"],"award-info":[{"award-number":["PZ00P2168016"]}],"id":[{"id":"10.13039\/501100001711","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100000781","name":"European Research Council","doi-asserted-by":"publisher","award":["678880"],"award-info":[{"award-number":["678880"]}],"id":[{"id":"10.13039\/501100000781","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2021,12,31]]},"abstract":"<jats:p>Most compilers have a single core intermediate representation (IR) (e.g., LLVM) sometimes complemented with vaguely defined IR-like data structures. This IR is commonly low-level and close to machine instructions. As a result, optimizations relying on domain-specific information are either not possible or require complex analysis to recover the missing information. In contrast, multi-level rewriting instantiates a hierarchy of dialects (IRs), lowers programs level-by-level, and performs code transformations at the most suitable level. We demonstrate the effectiveness of this approach for the weather and climate domain. 
In particular, we develop a prototype compiler and design stencil- and GPU-specific dialects based on a set of newly introduced design principles. We find that two domain-specific optimizations (500 lines of code) realized on top of LLVM\u2019s extensible MLIR compiler infrastructure suffice to outperform state-of-the-art solutions. In essence, multi-level rewriting promises to herald the age of specialized compilers composed from domain- and target-specific dialects implemented on top of a shared infrastructure.<\/jats:p>","DOI":"10.1145\/3469030","type":"journal-article","created":{"date-parts":[[2021,9,3]],"date-time":"2021-09-03T16:12:01Z","timestamp":1630685521000},"page":"1-23","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":52,"title":["Domain-Specific Multi-Level IR Rewriting for GPU"],"prefix":"10.1145","volume":"18","author":[{"given":"Tobias","family":"Gysi","sequence":"first","affiliation":[{"name":"ETH Zurich, Switzerland"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Christoph","family":"M\u00fcller","sequence":"additional","affiliation":[{"name":"ETH Zurich, Switzerland"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1978-0222","authenticated-orcid":false,"given":"Oleksandr","family":"Zinenko","sequence":"additional","affiliation":[{"name":"Google, France"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Stephan","family":"Herhut","sequence":"additional","affiliation":[{"name":"Google, Germany"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Eddie","family":"Davis","sequence":"additional","affiliation":[{"name":"Vulcan Inc, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Tobias","family":"Wicky","sequence":"additional","affiliation":[{"name":"Vulcan Inc, 
USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Oliver","family":"Fuhrer","sequence":"additional","affiliation":[{"name":"Vulcan Inc, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Torsten","family":"Hoefler","sequence":"additional","affiliation":[{"name":"ETH Zurich, Switzerland"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Tobias","family":"Grosser","sequence":"additional","affiliation":[{"name":"University of Edinburgh, UK"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2021,9,3]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"2020. CLIMA. Retrieved from https:\/\/github.com\/climate-machine\/CLIMA\/."},{"key":"e_1_2_1_2_1","unstructured":"2020. Consortium for Small-scale Modeling. Retrieved from http:\/\/www.cosmo-model.org\/."},{"key":"e_1_2_1_3_1","unstructured":"2020. FV3: Finite-Volume Cubed-Sphere Dynamical Core. Retrieved from https:\/\/www.gfdl.noaa.gov\/fv3\/."},{"key":"e_1_2_1_4_1","unstructured":"2020. GridTools. Retrieved from https:\/\/github.com\/GridTools\/gridtools."},{"key":"e_1_2_1_5_1","unstructured":"2020. GT4Py. Retrieved from https:\/\/github.com\/gridtools\/gt4py."},{"key":"e_1_2_1_6_1","unstructured":"2020. RAJA. 
Retrieved from https:\/\/github.com\/LLNL\/RAJA."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/3306346.3322967"},{"key":"e_1_2_1_8_1","volume-title":"LFRic: Meeting the challenges of scalability and performance portability in weather and climate models. J. Parallel Distrib. Comput. 132 (2019","author":"Adams S. V.","year":"2019","unstructured":"S. V. Adams, R. W. Ford, M. Hambley, J. M. Hobson, I. Kavc\u0306ic\u0306, C. M. Maynard, T. Melvin, E. H. M\u00fcller, S. Mullerworth, A. R. Porter, M. Rezny, B. J. Shipway, and R. Wong. 2019. LFRic: Meeting the challenges of scalability and performance portability in weather and climate models. J. Parallel Distrib. Comput. 132 (2019), 383\u2013396. DOI:https:\/\/doi.org\/10.1016\/j.jpdc.2019.02.007"},{"key":"e_1_2_1_9_1","volume-title":"Proceedings of the International Conference on Parallel Architecture and Compilation (PACT\u201915)","author":"Baghdadi R.","unstructured":"R. Baghdadi, U. Beaugnon, A. Cohen, T. Grosser, M. Kruse, C. Reddy, S. Verdoolaege, A. Betts, A. F. Donaldson, J. Ketema, J. Absar, S. v. Haastregt, A. Kravets, A. Lokhmotov, R. David, and E. 
Hajiyev. 2015. PENCIL: A platform-neutral compute intermediate language for accelerator programming. In Proceedings of the International Conference on Parallel Architecture and Compilation (PACT\u201915). 138\u2013149."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1175\/MWR-D-10-05013.1"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/3033019.3033023"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3295500.3356173"},{"key":"e_1_2_1_13_1","volume-title":"Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI\u201918)","author":"Chen Tianqi","year":"2018","unstructured":"Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Haichen Shen, Meghan Cowan, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, and Arvind Krishnamurthy. 2018. TVM: An automated end-to-end optimizing compiler for deep learning. In Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI\u201918). USENIX Association, 578\u2013594. Retrieved from https:\/\/www.usenix.org\/conference\/osdi18\/presentation\/chen."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/3218176.3218226"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/2491956.2462166"},{"key":"e_1_2_1_16_1","volume-title":"Proceedings of the Extreme Scaling Workshop (XSW\u201913)","author":"Edwards H. C.","unstructured":"H. C. Edwards and C. R. Trott. 2013. 
Kokkos: Enabling performance portability across manycore architectures. In Proceedings of the Extreme Scaling Workshop (XSW\u201913). 18\u201324."},{"key":"e_1_2_1_17_1","volume-title":"Towards a performance portable, architecture agnostic implementation strategy for weather and climate models. Supercomput. Front. Innov. 1, 1","author":"Fuhrer Oliver","year":"2014","unstructured":"Oliver Fuhrer, Carlos Osuna, Xavier Lapillonne, Tobias Gysi, Ben Cumming, Mauro Bianco, Andrea Arteaga, and Thomas Schulthess. 2014. Towards a performance portable, architecture agnostic implementation strategy for weather and climate models. Supercomput. Front. Innov. 1, 1 (2014)."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/2581122.2544160"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/2458523.2458526"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1142\/S0129626412500107"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/2925426.2926286"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/2751205.2751223"},{"key":"e_1_2_1_23_1","volume-title":"Proceedings of the 28th International Conference on Parallel Architectures and Compilation Techniques (PACT\u201919)","author":"Gysi T.","unstructured":"T. Gysi, T. Grosser, and T. Hoefler. 2019. Absinthe: Learning an analytical performance model to fuse and tile stencil codes in one shot. 
In Proceedings of the 28th International Conference on Parallel Architectures and Compilation Techniques (PACT\u201919). 370\u2013382."},{"key":"e_1_2_1_24_1","volume-title":"Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC\u201915)","author":"Gysi Tobias","unstructured":"Tobias Gysi, Carlos Osuna, Oliver Fuhrer, Mauro Bianco, and Thomas C. Schulthess. 2015. STELLA: A domain-specific tool for structured grid methods in weather and climate models. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC\u201915). Association for Computing Machinery, New York, NY. DOI:https:\/\/doi.org\/10.1145\/2807591.2807627"},{"key":"e_1_2_1_25_1","volume-title":"Proceedings of the International Symposium on Code Generation and Optimization (CGO\u201918)","author":"Hagedorn Bastian","year":"2018","unstructured":"Bastian Hagedorn, Larisa Stoltzfus, Michel Steuwer, Sergei Gorlatch, and Christophe Dubach. 2018. High performance stencil code generation with lift. In Proceedings of the International Symposium on Code Generation and Optimization (CGO\u201918). Association for Computing Machinery, New York, NY, 100\u2013112. 
DOI:https:\/\/doi.org\/10.1145\/3168824"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1175\/MWR-D-11-00201.1"},{"key":"e_1_2_1_27_1","volume-title":"Proceedings of the 26th ACM International Conference on Supercomputing (ICS\u201912)","author":"Holewinski Justin","unstructured":"Justin Holewinski, Louis-No\u00ebl Pouchet, and P. Sadayappan. 2012. High-performance code generation for stencil computations on GPU architectures. In Proceedings of the 26th ACM International Conference on Supercomputing (ICS\u201912). Association for Computing Machinery, New York, NY, 311\u2013320. DOI:https:\/\/doi.org\/10.1145\/2304576.2304619"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/LLVM-HPC.2018.8639402"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.5555\/977395.977673"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/CGO51591.2021.9370308"},{"key":"e_1_2_1_31_1","volume-title":"Proceedings of the TensorFlow Dev Summit.","author":"Leary Chris","year":"2017","unstructured":"Chris Leary and Todd Wang. 2017. XLA: TensorFlow, compiled. 
In Proceedings of the TensorFlow Dev Summit."},{"key":"e_1_2_1_32_1","volume-title":"Proc. ACM Program. Lang. 2, OOPSLA (Oct.","author":"Lei\u00dfa Roland","year":"2018","unstructured":"Roland Lei\u00dfa, Klaas Boesche, Sebastian Hack, Ars\u00e8ne P\u00e9rard-Gayot, Richard Membarth, Philipp Slusallek, Andr\u00e9 M\u00fcller, and Bertil Schmidt. 2018. AnyDSL: A partial evaluation framework for programming high-performance libraries. Proc. ACM Program. Lang. 2, OOPSLA (Oct. 2018). DOI:https:\/\/doi.org\/10.1145\/3276489"},{"key":"e_1_2_1_33_1","volume-title":"Proceedings of the 1st International Workshop on High-performance Stencil Computations. 89\u201395","author":"Maruyama Naoya","year":"2014","unstructured":"Naoya Maruyama and Takayuki Aoki. 2014. Optimizing stencil computations for NVIDIA Kepler GPUs. In Proceedings of the 1st International Workshop on High-performance Stencil Computations. 89\u201395."},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/3368826.3377904"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/364995.365000"},{"key":"e_1_2_1_36_1","volume-title":"Migration by extrapolation of time-dependent boundary VALUES. Geophys. Prospect. 31 (June","author":"McMechan G. A.","year":"1983","unstructured":"G. A. McMechan. 1983. Migration by extrapolation of time-dependent boundary VALUES. Geophys. Prospect. 31 (June 1983), 413\u2013420. 
DOI:https:\/\/doi.org\/10.1111\/j.1365-2478.1983.tb01060.x"},{"key":"e_1_2_1_37_1","volume-title":"Advanced Compiler Design and Implementation","author":"Muchnick Steven S.","unstructured":"Steven S. Muchnick. 1998. Advanced Compiler Design and Implementation. Morgan Kaufmann Publishers Inc., San Francisco, CA."},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/2694344.2694364"},{"key":"e_1_2_1_39_1","volume-title":"Hybrid Fortran: High productivity GPU porting framework applied to Japanese weather prediction model","author":"M\u00fcller Michel","year":"2018","unstructured":"Michel M\u00fcller and Takayuki Aoki. 2018. Hybrid Fortran: High productivity GPU porting framework applied to Japanese weather prediction model. In Accelerator Programming Using Directives, Sunita Chandrasekaran and Guido Juckeland (Eds.). Springer International Publishing, Cham, 20\u201341."},{"key":"e_1_2_1_40_1","volume-title":"Proceedings of the ACM\/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis. 1\u201313","author":"Nguyen A.","unstructured":"A. Nguyen, N. Satish, J. Chhugani, C. Kim, and P. Dubey. 2010. 3.5-D Blocking optimization for stencil computations on modern CPUs and GPUs. In Proceedings of the ACM\/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis. 
1\u201313."},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/1365490.1365500"},{"key":"e_1_2_1_42_1","volume-title":"Dawn: A high-level domain-specific language compiler toolchain for weather and climate applications. Supercomput. Front. Innov. 7, 2","author":"Osuna Carlos","year":"2020","unstructured":"Carlos Osuna, Tobias Wicky, Fabian Thuering, Torsten Hoefler, and Oliver Fuhrer. 2020. Dawn: A high-level domain-specific language compiler toolchain for weather and climate applications. Supercomput. Front. Innov. 7, 2 (2020)."},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1142\/S0129626400000214"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/2491956.2462176"},{"key":"e_1_2_1_45_1","volume-title":"Proceedings of the 5th International Workshop on Domain-specific Languages and High-level Frameworks for High-performance Computing (WOLFHPC\u201915)","author":"Rawat Prashant","unstructured":"Prashant Rawat, Martin Kong, Tom Henretty, Justin Holewinski, Kevin Stock, Louis-No\u00ebl Pouchet, J. Ramanujam, Atanas Rountev, and P. Sadayappan. 2015. SDSLc: A multi-target domain-specific compiler for stencil computations. In Proceedings of the 5th International Workshop on Domain-specific Languages and High-level Frameworks for High-performance Computing (WOLFHPC\u201915). Association for Computing Machinery, New York, NY. 
DOI:https:\/\/doi.org\/10.1145\/2830018.2830025"},{"key":"e_1_2_1_46_1","volume-title":"Proceedings of the 9th Workshop on General Purpose Processing Using Graphics Processing Unit (GPGPU\u201916)","author":"Rawat Prashant Singh","unstructured":"Prashant Singh Rawat, Changwan Hong, Mahesh Ravishankar, Vinod Grover, Louis-No\u00ebl Pouchet, and P. Sadayappan. 2016. Effective resource management for enhancing performance of 2D and 3D stencils on GPUs. In Proceedings of the 9th Workshop on General Purpose Processing Using Graphics Processing Unit (GPGPU\u201916). Association for Computing Machinery, New York, NY, 92\u2013102. DOI:https:\/\/doi.org\/10.1145\/2884045.2884047"},{"key":"e_1_2_1_47_1","volume-title":"Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP\u201918)","author":"Rawat Prashant Singh","unstructured":"Prashant Singh Rawat, Fabrice Rastello, Aravind Sukumaran-Rajam, Louis-No\u00ebl Pouchet, Atanas Rountev, and P. Sadayappan. 2018. Register optimizations for stencils on GPUs. 
In Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP\u201918). Association for Computing Machinery, New York, NY, USA, 168\u2013182. DOI:https:\/\/doi.org\/10.1145\/3178487.3178500"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2018.2862896"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/1868294.1868314"},{"key":"e_1_2_1_50_1","volume-title":"Proceedings of the 15th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. 12\u201327","author":"Rosen Barry K.","unstructured":"Barry K. Rosen, Mark N. Wegman, and F. Kenneth Zadeck. 1988. Global value numbers and redundant computations. In Proceedings of the 15th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. 12\u201327."},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10766-016-0454-1"},{"key":"e_1_2_1_52_1","volume-title":"Proceedings of the IEEE\/ACM International Symposium on Code Generation and Optimization (CGO\u201917)","author":"Steuwer M.","unstructured":"M. Steuwer, T. Remmelg, and C. 
Dubach. 2017. LIFT: A functional data-parallel IR for high-performance GPU code generation. In Proceedings of the IEEE\/ACM International Symposium on Code Generation and Optimization (CGO\u201917). 74\u201385."},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/2584665"},{"key":"e_1_2_1_54_1","volume-title":"Proceedings of the 23rd ACM Symposium on Parallelism in Algorithms and Architectures (SPAA\u201911)","author":"Tang Yuan","year":"2011","unstructured":"Yuan Tang, Rezaul Alam Chowdhury, Bradley C. Kuszmaul, Chi-Keung Luk, and Charles E. Leiserson. 2011. The Pochoir stencil compiler. In Proceedings of the 23rd ACM Symposium on Parallelism in Algorithms and Architectures (SPAA\u201911). Association for Computing Machinery, New York, NY, 117\u2013128. DOI:https:\/\/doi.org\/10.1145\/1989493.1989508"},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1145\/1183401.1183448"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1145\/2400682.2400713"},{"key":"e_1_2_1_57_1","volume-title":"Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. 191\u2013202","author":"Wahib M.","unstructured":"M. Wahib and N. Maruyama. 2014. Scalable kernel fusion for memory-bound GPU applications. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. 
191\u2013202."},{"key":"e_1_2_1_58_1","volume-title":"Proceedings of the 6th International Workshop on Domain-specific Languages and High-level Frameworks for High-performance Computing (WOLFHPC\u201916)","author":"Yount C.","unstructured":"C. Yount, J. Tobin, A. Breuer, and A. Duran. 2016. YASK\u2014Yet Another Stencil Kernel: A framework for HPC stencil code-generation and tuning. In Proceedings of the 6th International Workshop on Domain-specific Languages and High-level Frameworks for High-performance Computing (WOLFHPC\u201916). 30\u201339."},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1145\/3295500.3356210"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1145\/3178372.3179507"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3469030","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3469030","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T21:25:03Z","timestamp":1750195503000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3469030"}},"subtitle":["The Open Earth Compiler for GPU-accelerated Climate 
Simulation"],"short-title":[],"issued":{"date-parts":[[2021,9,3]]},"references-count":60,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2021,12,31]]}},"alternative-id":["10.1145\/3469030"],"URL":"https:\/\/doi.org\/10.1145\/3469030","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"value":"1544-3566","type":"print"},{"value":"1544-3973","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,9,3]]},"assertion":[{"value":"2020-12-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-05-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-09-03","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}