{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,20]],"date-time":"2025-12-20T22:21:20Z","timestamp":1766269280332,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":40,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,8,9]],"date-time":"2021-08-09T00:00:00Z","timestamp":1628467200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100011347","name":"State Key Laboratory of Software Development Environment","doi-asserted-by":"publisher","award":["SKLSDE-2021ZX-06"],"award-info":[{"award-number":["SKLSDE-2021ZX-06"]}],"id":[{"id":"10.13039\/501100011347","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100012166","name":"National Key Research and Development Program of China","doi-asserted-by":"publisher","award":["2020YFB1506703"],"award-info":[{"award-number":["2020YFB1506703"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62072018"],"award-info":[{"award-number":["62072018"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,8,9]]},"DOI":"10.1145\/3472456.3473517","type":"proceedings-article","created":{"date-parts":[[2021,10,5]],"date-time":"2021-10-05T18:46:04Z","timestamp":1633459564000},"page":"1-12","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":16,"title":["Automatic Code Generation and Optimization of Large-scale Stencil Computation on Many-core Processors"],"prefix":"10.1145","author":[{"given":"Mingzhen","family":"Li","sequence":"first","affiliation":[{"name":"Beihang University, China"}]},{"given":"Yi","family":"Liu","sequence":"additional","affiliation":[{"name":"Beihang University, China"}]},{"given":"Hailong","family":"Yang","sequence":"additional","affiliation":[{"name":"Beihang University, China"}]},{"given":"Yongmin","family":"Hu","sequence":"additional","affiliation":[{"name":"Beihang University, China"}]},{"given":"Qingxiao","family":"Sun","sequence":"additional","affiliation":[{"name":"Beihang University, China"}]},{"given":"Bangduo","family":"Chen","sequence":"additional","affiliation":[{"name":"Beihang University, China"}]},{"given":"Xin","family":"You","sequence":"additional","affiliation":[{"name":"Beihang University, China"}]},{"given":"Xiaoyan","family":"Liu","sequence":"additional","affiliation":[{"name":"Beihang University, China"}]},{"given":"Zhongzhi","family":"Luan","sequence":"additional","affiliation":[{"name":"Beihang University, China"}]},{"given":"Depei","family":"Qian","sequence":"additional","affiliation":[{"name":"Beihang University, China"}]}],"member":"320","published-online":{"date-parts":[[2021,10,5]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/2628071.2628092"},{"volume-title":"26 PFLOPS Stencil Computations for Atmospheric Modeling on Sunway TaihuLight. In 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS). 535\u2013544","author":"Ao Y.","key":"e_1_3_2_1_2_1","unstructured":"Y. Ao , C. Yang , X. Wang , W. Xue , H. Fu , F. Liu , L. Gan , P. Xu , and W. Ma . 2017 . 26 PFLOPS Stencil Computations for Atmospheric Modeling on Sunway TaihuLight. In 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS). 535\u2013544 . Y. Ao, C. Yang, X. Wang, W. Xue, H. Fu, F. Liu, L. Gan, P. Xu, and W. Ma. 2017. 26 PFLOPS Stencil Computations for Atmospheric Modeling on Sunway TaihuLight. In 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS). 535\u2013544."},{"key":"e_1_3_2_1_3_1","volume-title":"Tiramisu: A Polyhedral Compiler for Expressing Fast and Portable Code. In 2019 IEEE\/ACM International Symposium on Code Generation and Optimization (CGO). 193\u2013205","author":"Baghdadi Riyadh","year":"2019","unstructured":"Riyadh Baghdadi , Jessica Ray , Malek\u00a0Ben Romdhane , Emanuele\u00a0Del Sozzo , Abdurrahman Akkas , Yunming Zhang , Patricia Suriana , Shoaib Kamil , and Saman Amarasinghe . 2019 . Tiramisu: A Polyhedral Compiler for Expressing Fast and Portable Code. In 2019 IEEE\/ACM International Symposium on Code Generation and Optimization (CGO). 193\u2013205 . Riyadh Baghdadi, Jessica Ray, Malek\u00a0Ben Romdhane, Emanuele\u00a0Del Sozzo, Abdurrahman Akkas, Yunming Zhang, Patricia Suriana, Shoaib Kamil, and Saman Amarasinghe. 2019. Tiramisu: A Polyhedral Compiler for Expressing Fast and Portable Code. In 2019 IEEE\/ACM International Symposium on Code Generation and Optimization (CGO). 193\u2013205."},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/2751205.2751226"},{"volume-title":"Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation(PLDI \u201908)","author":"Bondhugula Uday","key":"e_1_3_2_1_5_1","unstructured":"Uday Bondhugula , Albert Hartono , J. Ramanujam , and P. Sadayappan . 2008. A Practical Automatic Polyhedral Parallelizer and Locality Optimizer . In Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation(PLDI \u201908) . Association for Computing Machinery, New York, NY, USA, 101\u2013113. Uday Bondhugula, Albert Hartono, J. Ramanujam, and P. Sadayappan. 2008. A Practical Automatic Polyhedral Parallelizer and Locality Optimizer. In Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation(PLDI \u201908). Association for Computing Machinery, New York, NY, USA, 101\u2013113."},{"key":"e_1_3_2_1_6_1","volume-title":"ParSy: Inspection and Transformation of Sparse Matrix Computations for Parallelism. In SC18: International Conference for High Performance Computing, Networking, Storage and Analysis. 779\u2013793","author":"Cheshmi Kazem","year":"2018","unstructured":"Kazem Cheshmi , Shoaib Kamil , Michelle\u00a0Mills Strout , and Maryam\u00a0Mehri Dehnavi . 2018 . ParSy: Inspection and Transformation of Sparse Matrix Computations for Parallelism. In SC18: International Conference for High Performance Computing, Networking, Storage and Analysis. 779\u2013793 . Kazem Cheshmi, Shoaib Kamil, Michelle\u00a0Mills Strout, and Maryam\u00a0Mehri Dehnavi. 2018. ParSy: Inspection and Transformation of Sparse Matrix Computations for Parallelism. In SC18: International Conference for High Performance Computing, Networking, Storage and Analysis. 779\u2013793."},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2011.70"},{"key":"e_1_3_2_1_8_1","volume-title":"Distributed Halide. In Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming","author":"Denniston Tyler","year":"2016","unstructured":"Tyler Denniston , Shoaib Kamil , and Saman Amarasinghe . 2016 . Distributed Halide. In Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming ( Barcelona, Spain) (PPoPP \u201916). Association for Computing Machinery, New York, NY, USA, Article 5, 12\u00a0pages. Tyler Denniston, Shoaib Kamil, and Saman Amarasinghe. 2016. Distributed Halide. In Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (Barcelona, Spain) (PPoPP \u201916). Association for Computing Machinery, New York, NY, USA, Article 5, 12\u00a0pages."},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/DoD.HPCMP.UGC.2008.12"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2017.20"},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/1088149.1088197"},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3126908.3126910"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11432-016-5588-7"},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/2544137.2544160"},{"key":"e_1_3_2_1_15_1","volume-title":"Proceedings of the First International Workshop on Polyhedral Compilation Techniques (IMPACT), Vol.\u00a02011","author":"Grosser Tobias","year":"2011","unstructured":"Tobias Grosser , Hongbin Zheng , Raghesh Aloor , Andreas Simb\u00fcrger , Armin Gr\u00f6\u00dflinger , and Louis-No\u00ebl Pouchet . 2011 . Polly-Polyhedral optimization in LLVM . In Proceedings of the First International Workshop on Polyhedral Compilation Techniques (IMPACT), Vol.\u00a02011 . 1. Tobias Grosser, Hongbin Zheng, Raghesh Aloor, Andreas Simb\u00fcrger, Armin Gr\u00f6\u00dflinger, and Louis-No\u00ebl Pouchet. 2011. Polly-Polyhedral optimization in LLVM. In Proceedings of the First International Workshop on Polyhedral Compilation Techniques (IMPACT), Vol.\u00a02011. 1."},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.5555\/1464463.1464469"},{"key":"e_1_3_2_1_17_1","unstructured":"Tobias Gysi Christoph M\u00fcller Oleksandr Zinenko Stephan Herhut Eddie Davis Tobias Wicky Oliver Fuhrer Torsten Hoefler and Tobias Grosser. 2020. Domain-specific Multi-Level IR rewriting for GPU. arXiv preprint arXiv:2005.13014(2020).  Tobias Gysi Christoph M\u00fcller Oleksandr Zinenko Stephan Herhut Eddie Davis Tobias Wicky Oliver Fuhrer Torsten Hoefler and Tobias Grosser. 2020. Domain-specific Multi-Level IR rewriting for GPU. arXiv preprint arXiv:2005.13014(2020)."},{"key":"e_1_3_2_1_18_1","volume-title":"STELLA: A Domain-Specific Tool for Structured Grid Methods in Weather and Climate Models(SC \u201915)","author":"Gysi Tobias","year":"2015","unstructured":"Tobias Gysi , Carlos Osuna , Oliver Fuhrer , Mauro Bianco , and Thomas\u00a0 C. Schulthess . 2015 . STELLA: A Domain-Specific Tool for Structured Grid Methods in Weather and Climate Models(SC \u201915) . Association for Computing Machinery , New York, NY, USA , Article 41, 12\u00a0pages. Tobias Gysi, Carlos Osuna, Oliver Fuhrer, Mauro Bianco, and Thomas\u00a0C. Schulthess. 2015. STELLA: A Domain-Specific Tool for Structured Grid Methods in Weather and Climate Models(SC \u201915). Association for Computing Machinery, New York, NY, USA, Article 41, 12\u00a0pages."},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/3168824"},{"volume-title":"Proceedings of the 27th International ACM Conference on International Conference on Supercomputing","author":"Henretty Tom","key":"e_1_3_2_1_20_1","unstructured":"Tom Henretty , Richard Veras , Franz Franchetti , Louis-No\u00ebl Pouchet , J. Ramanujam , and P. Sadayappan . 2013. A Stencil Compiler for Short-Vector SIMD Architectures . In Proceedings of the 27th International ACM Conference on International Conference on Supercomputing ( Eugene, Oregon, USA) (ICS \u201913). Association for Computing Machinery, New York, NY, USA, 13\u201324. Tom Henretty, Richard Veras, Franz Franchetti, Louis-No\u00ebl Pouchet, J. Ramanujam, and P. Sadayappan. 2013. A Stencil Compiler for Short-Vector SIMD Architectures. In Proceedings of the 27th International ACM Conference on International Conference on Supercomputing(Eugene, Oregon, USA) (ICS \u201913). Association for Computing Machinery, New York, NY, USA, 13\u201324."},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/2304576.2304619"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2019.2962395"},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/2627373.2627387"},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2019.2953852"},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/3374916"},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1137\/140991133"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/2063384.2063398"},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/3368826.3377904"},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2010.2"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/2499370.2462176"},{"volume-title":"On Optimizing Complex Stencils on GPUs. In 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS). 641\u2013652","author":"Rawat Prashant\u00a0Singh","key":"e_1_3_2_1_32_1","unstructured":"Prashant\u00a0Singh Rawat , Miheer Vaidya , Aravind Sukumaran-Rajam , Atanas Rountev , Louis-No\u00ebl Pouchet , and P. Sadayappan . 2019 . On Optimizing Complex Stencils on GPUs. In 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS). 641\u2013652 . Prashant\u00a0Singh Rawat, Miheer Vaidya, Aravind Sukumaran-Rajam, Atanas Rountev, Louis-No\u00ebl Pouchet, and P. Sadayappan. 2019. On Optimizing Complex Stencils on GPUs. In 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS). 641\u2013652."},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2017.2778161"},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.5555\/370049.370403"},{"volume-title":"Computational electrodynamics: the finite-difference time-domain method","author":"Taflove Allen","key":"e_1_3_2_1_35_1","unstructured":"Allen Taflove and Susan\u00a0 C Hagness . 2005. Computational electrodynamics: the finite-difference time-domain method . Artech house. Allen Taflove and Susan\u00a0C Hagness. 2005. Computational electrodynamics: the finite-difference time-domain method. Artech house."},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/1989493.1989508"},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.1996.566468"},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.5555\/3014904.3014912"},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-18645-6_6"},{"key":"e_1_3_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/WOLFHPC.2016.08"},{"key":"e_1_3_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.5194\/gmd-13-4809-2020"}],"event":{"name":"ICPP 2021: 50th International Conference on Parallel Processing","acronym":"ICPP 2021","location":"Lemont IL USA"},"container-title":["50th International Conference on Parallel Processing"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3472456.3473517","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3472456.3473517","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:17:23Z","timestamp":1750191443000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3472456.3473517"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,8,9]]},"references-count":40,"alternative-id":["10.1145\/3472456.3473517","10.1145\/3472456"],"URL":"https:\/\/doi.org\/10.1145\/3472456.3473517","relation":{},"subject":[],"published":{"date-parts":[[2021,8,9]]},"assertion":[{"value":"2021-10-05","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}