{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,24]],"date-time":"2026-02-24T16:52:17Z","timestamp":1771951937111,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":60,"publisher":"ACM","license":[{"start":{"date-parts":[[2023,6,21]],"date-time":"2023-06-21T00:00:00Z","timestamp":1687305600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"JSPS KAKENHI","award":["JP22H03600"],"award-info":[{"award-number":["JP22H03600"]}]},{"name":"JSPS KAKENHI","award":["JP21K17750"],"award-info":[{"award-number":["JP21K17750"]}]},{"name":"JST, PRESTO","award":["JPMJPR20MA"],"award-info":[{"award-number":["JPMJPR20MA"]}]},{"name":"New Energy and Industrial Technology Development Organization (NEDO)","award":["JPNP20006"],"award-info":[{"award-number":["JPNP20006"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2023,6,21]]},"DOI":"10.1145\/3577193.3593716","type":"proceedings-article","created":{"date-parts":[[2023,6,20]],"date-time":"2023-06-20T18:47:05Z","timestamp":1687286825000},"page":"251-263","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":12,"title":["Revisiting Temporal Blocking Stencil Optimizations"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2452-1551","authenticated-orcid":false,"given":"Lingqi","family":"Zhang","sequence":"first","affiliation":[{"name":"Tokyo Institute of Technology, Tokyo, Japan"},{"name":"National Institute of Advanced Industrial Science and Technology, Tokyo, Japan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7165-2095","authenticated-orcid":false,"given":"Mohamed","family":"Wahib","sequence":"additional","affiliation":[{"name":"RIKEN Center for Computational Science, Tokyo, 
Japan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1244-3151","authenticated-orcid":false,"given":"Peng","family":"Chen","sequence":"additional","affiliation":[{"name":"National Institute of Advanced Industrial Science and Technology, Tokyo, Japan"},{"name":"RIKEN Center for Computational Science, Tokyo, Japan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6208-4102","authenticated-orcid":false,"given":"Jintao","family":"Meng","sequence":"additional","affiliation":[{"name":"Shenzhen Institutes of Advanced Technology, Shenzhen, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6545-1943","authenticated-orcid":false,"given":"Xiao","family":"Wang","sequence":"additional","affiliation":[{"name":"Oak Ridge National Laboratory, Knoxville, United States of America"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7297-6211","authenticated-orcid":false,"given":"Toshio","family":"Endo","sequence":"additional","affiliation":[{"name":"Tokyo Institute of Technology, Tokyo, Japan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1910-8532","authenticated-orcid":false,"given":"Satoshi","family":"Matsuoka","sequence":"additional","affiliation":[{"name":"RIKEN Center for Computational Science, Kobe, Japan"},{"name":"Tokyo Institute of Technology, Tokyo, Japan"}]}],"member":"320","published-online":{"date-parts":[[2023,6,21]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"https:\/\/www.top500.org\/lists\/top500\/2022\/06\/highs\/ [Online","year":"2023","unstructured":"2023. TOP500. 
https:\/\/www.top500.org\/lists\/top500\/2022\/06\/highs\/ [Online; accessed 19-Jan-2023]."},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1177\/1094342020923027"},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1016\/0021-9991(84)90073-1"},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2016.2615094"},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/CLUSTER.2018.00064"},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/3295500.3356162"},{"key":"e_1_3_2_1_7_1","volume-title":"CUDA C Programming Guide. https:\/\/docs.nvidia.com\/cuda\/cuda-c-programming-guide\/ [Online","author":"Nvidia CUDA.","year":"2022","unstructured":"Nvidia CUDA. 2022. CUDA C Programming Guide. https:\/\/docs.nvidia.com\/cuda\/cuda-c-programming-guide\/ [Online; accessed 31-Dec-2022]."},{"key":"e_1_3_2_1_8_1","series-title":"SIAM review 51, 1","volume-title":"Optimization and performance modeling of stencil computations on modern microprocessors","author":"Datta Kaushik","year":"2009","unstructured":"Kaushik Datta, Shoaib Kamil, Samuel Williams, Leonid Oliker, John Shalf, and Katherine Yelick. 2009. Optimization and performance modeling of stencil computations on modern microprocessors. SIAM review 51, 1 (2009), 129--159."},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2008.5222004"},{"key":"e_1_3_2_1_10_1","volume-title":"2018 IEEE 7th Non-Volatile Memory Systems and Applications Symposium (NVMSA). IEEE, 19--24","author":"Endo Toshio","year":"2018","unstructured":"Toshio Endo. 2018. 
Applying recursive temporal blocking for stencil computations to deeper memory hierarchy. In 2018 IEEE 7th Non-Volatile Memory Systems and Applications Symposium (NVMSA). IEEE, 19--24."},{"key":"e_1_3_2_1_11_1","volume-title":"Proceedings of Annual IEEE\/ACM International Symposium on Code Generation and Optimization. 66--75","author":"Grosser Tobias","year":"2014","unstructured":"Tobias Grosser, Albert Cohen, Justin Holewinski, Ponuswamy Sadayappan, and Sven Verdoolaege. 2014. Hybrid hexagonal\/classical tiling for GPUs. In Proceedings of Annual IEEE\/ACM International Symposium on Code Generation and Optimization. 66--75."},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/2458523.2458526"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1142\/S0129626414410023"},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/3168824"},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.5555\/1987237.1987255"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/2304576.2304619"},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/73560.73588"},{"key":"e_1_3_2_1_18_1","volume-title":"DRSTENCIL codebase. https:\/\/github.com\/simple86\/DRStencil [Online","author":"Jiang Zhonghui","year":"2023","unstructured":"Zhonghui Jiang. 2023. DRSTENCIL codebase. 
https:\/\/github.com\/simple86\/DRStencil [Online; accessed 22-Jan-2023]."},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.cpc.2011.01.025"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/3524059.3532392"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1137\/140991133"},{"key":"e_1_3_2_1_22_1","volume-title":"Proceedings of the 1st International Workshop on High-Performance Stencil Computations, Armin Gr\u00f6\u00dflinger and Harald K\u00f6stler (Eds.)","author":"Maruyama Naoya","year":"2014","unstructured":"Naoya Maruyama and Takayuki Aoki. 2014. Optimizing Stencil Computations for NVIDIA Kepler GPUs. In Proceedings of the 1st International Workshop on High-Performance Stencil Computations, Armin Gr\u00f6\u00dflinger and Harald K\u00f6stler (Eds.). Vienna, Austria, 89--95."},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/2063384.2063398"},{"key":"e_1_3_2_1_24_1","volume-title":"https:\/\/github.com\/khaki3\/AN5D-Artifact [Online","author":"Matsumura Kazuaki","year":"2023","unstructured":"Kazuaki Matsumura. 2023. AN5D AD\/AE. 
https:\/\/github.com\/khaki3\/AN5D-Artifact [Online; accessed 22-Jan-2023]."},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/3368826.3377904"},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2016.2549523"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/1542275.1542313"},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.procs.2015.05.315"},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2010.2"},{"key":"e_1_3_2_1_30_1","unstructured":"Nvidia. 2022. Inside Kepler. https:\/\/www.nvidia.com\/content\/dam\/en-zz\/Solutions\/Data-Center\/tesla-product-literature\/NVIDIA-Kepler-GK110-GK210-Architecture-Whitepaper.pdf"},{"key":"e_1_3_2_1_31_1","unstructured":"Nvidia. 2022. NVIDIA A100 Tensor Core GPU Architecture. https:\/\/www.nvidia.com\/content\/dam\/en-zz\/Solutions\/Data-Center\/nvidia-ampere-architecture-whitepaper.pdf"},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2014.6844463"},{"key":"e_1_3_2_1_33_1","volume-title":"2020 IEEE 31st International Conference on Application-specific Systems, Architectures and Processors (ASAP). IEEE, 1--8.","author":"Podobas Artur","year":"2020","unstructured":"Artur Podobas, Kentaro Sano, and Satoshi Matsuoka. 2020. 
A template-based framework for exploring coarse-grained reconfigurable architectures. In 2020 IEEE 31st International Conference on Application-specific Systems, Architectures and Processors (ASAP). IEEE, 1--8."},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/2491956.2462176"},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/2830018.2830025"},{"key":"e_1_3_2_1_36_1","volume-title":"ARTEMIS codebase. https:\/\/github.com\/pssrawat\/artemis [Online","author":"Rawat Prashant Singh","year":"2023","unstructured":"Prashant Singh Rawat. 2023. ARTEMIS codebase. https:\/\/github.com\/pssrawat\/artemis [Online; accessed 22-Jan-2023]."},{"key":"e_1_3_2_1_37_1","volume-title":"https:\/\/github.com\/pssrawat\/IEEE2017 [Online","author":"Rawat Prashant Singh","year":"2023","unstructured":"Prashant Singh Rawat. 2023. STENCILGEN AD\/AE. 
https:\/\/github.com\/pssrawat\/IEEE2017 [Online; accessed 22-Jan-2023]."},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/2884045.2884047"},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/3200691.3178500"},{"key":"e_1_3_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2018.2862896"},{"key":"e_1_3_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2019.00073"},{"key":"e_1_3_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1177\/1094342006064482"},{"key":"e_1_3_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/2751205.2751240"},{"key":"e_1_3_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/3468044.3468053"},{"key":"e_1_3_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/2688500.2688514"},{"key":"e_1_3_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.future.2018.08.004"},{"key":"e_1_3_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/2400682.2400713"},{"key":"e_1_3_2_1_48_1","volume-title":"Proceedings of the GPU technology conference, GTC","volume":"10","author":"Volkov Vasily","year":"2010","unstructured":"Vasily Volkov. 2010. Better performance at lower occupancy. In Proceedings of the GPU technology conference, GTC, Vol. 10. 
San Jose, CA, 16."},{"key":"e_1_3_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/2749246.2749255"},{"key":"e_1_3_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2016.2614981"},{"key":"e_1_3_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1109\/COMPSAC.2009.82"},{"key":"e_1_3_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/76263.76337"},{"key":"e_1_3_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2010.5452013"},{"key":"e_1_3_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCC-DSS-SmartCity-DependSys53884.2021.00036"},{"key":"e_1_3_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1145\/3458817.3476149"},{"key":"e_1_3_2_1_56_1","volume-title":"PERKS: a Locality-Optimized Execution Model for Iterative Memory-bound GPU Applications. arXiv preprint arXiv:2204.02064","author":"Zhang Lingqi","year":"2023","unstructured":"Lingqi Zhang, Mohamed Wahib, Peng Chen, Jintao Meng, Xiao Wang, Toshio Endo, and Satoshi Matsuoka. 2023. PERKS: a Locality-Optimized Execution Model for Iterative Memory-bound GPU Applications. 
arXiv preprint arXiv:2204.02064 (2023)."},{"key":"e_1_3_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS47924.2020.00057"},{"key":"e_1_3_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1145\/3295500.3356210"},{"key":"e_1_3_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1109\/P3HPC.2018.00009"},{"key":"e_1_3_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1145\/3174243.3174248"}],"event":{"name":"ICS '23: 37th International Conference on Supercomputing","location":"Orlando FL USA","acronym":"ICS '23","sponsor":["SIGARCH ACM Special Interest Group on Computer Architecture"]},"container-title":["Proceedings of the 37th International Conference on Supercomputing"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3577193.3593716","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:47:31Z","timestamp":1750178851000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3577193.3593716"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,6,21]]},"references-count":60,"alternative-id":["10.1145\/3577193.3593716","10.1145\/3577193"],"URL":"https:\/\/doi.org\/10.1145\/3577193.3593716","relation":{},"subject":[],"published":{"date-parts":[[2023,6,21]]},"assertion":[{"value":"2023-06-21","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}