{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,31]],"date-time":"2026-01-31T06:59:07Z","timestamp":1769842747954,"version":"3.49.0"},"publisher-location":"New York, NY, USA","reference-count":48,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,6,28]],"date-time":"2022-06-28T00:00:00Z","timestamp":1656374400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100012166","name":"National Key Research and Development Program of China","doi-asserted-by":"publisher","award":["2020YFB1506703"],"award-info":[{"award-number":["2020YFB1506703"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62072018"],"award-info":[{"award-number":["62072018"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,6,28]]},"DOI":"10.1145\/3524059.3532392","type":"proceedings-article","created":{"date-parts":[[2022,6,16]],"date-time":"2022-06-16T16:13:11Z","timestamp":1655395991000},"page":"1-12","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":35,"title":["Toward accelerated stencil computation by adapting tensor core unit on GPU"],"prefix":"10.1145","author":[{"given":"Xiaoyan","family":"Liu","sequence":"first","affiliation":[{"name":"Beihang University, Beijing, China"}]},{"given":"Yi","family":"Liu","sequence":"additional","affiliation":[{"name":"Beihang University, Beijing, China"}]},{"given":"Hailong","family":"Yang","sequence":"additional","affiliation":[{"name":"Beihang University, Beijing, China"}]},{"given":"Jianjin","family":"Liao","sequence":"additional","affiliation":[{"name":"Beihang University, Beijing, China"}]},{"given":"Mingzhen","family":"Li","sequence":"additional","affiliation":[{"name":"Beihang University, Beijing, China"}]},{"given":"Zhongzhi","family":"Luan","sequence":"additional","affiliation":[{"name":"Beihang University, Beijing, China"}]},{"given":"Depei","family":"Qian","sequence":"additional","affiliation":[{"name":"Beihang University, Beijing, China"}]}],"member":"320","published-online":{"date-parts":[[2022,6,28]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"The ARTEMIS code generator. online. (2019-9-27). https:\/\/github.com\/pssrawat\/artemis Accessed","year":"2021","unstructured":"2019-9-27. The ARTEMIS code generator. online. (2019-9-27). https:\/\/github.com\/pssrawat\/artemis Accessed September 29, 2021 . 2019-9-27. The ARTEMIS code generator. online. (2019-9-27). https:\/\/github.com\/pssrawat\/artemis Accessed September 29, 2021."},{"key":"e_1_3_2_1_2_1","volume-title":"The user manual for NVIDIA Nsight Compute. online. (2020-7-30). https:\/\/docs.nvidia.com\/nsight-compute\/NsightCompute\/index.html#nvvp-sessions Accessed","year":"2021","unstructured":"2020-7-30. The user manual for NVIDIA Nsight Compute. online. (2020-7-30). https:\/\/docs.nvidia.com\/nsight-compute\/NsightCompute\/index.html#nvvp-sessions Accessed September 29, 2021 . 2020-7-30. The user manual for NVIDIA Nsight Compute. online. (2020-7-30). https:\/\/docs.nvidia.com\/nsight-compute\/NsightCompute\/index.html#nvvp-sessions Accessed September 29, 2021."},{"key":"e_1_3_2_1_3_1","volume-title":"Joseph James Gebis, Parry Husbands, Kurt Keutzer, David A Patterson, William Lester Plishker, John Shalf, Samuel Webb Williams, et al.","author":"Asanovic Krste","year":"2006","unstructured":"Krste Asanovic , Ras Bodik , Bryan Christopher Catanzaro , Joseph James Gebis, Parry Husbands, Kurt Keutzer, David A Patterson, William Lester Plishker, John Shalf, Samuel Webb Williams, et al. 2006 . The landscape of parallel computing research: A view from berkeley. (2006). Krste Asanovic, Ras Bodik, Bryan Christopher Catanzaro, Joseph James Gebis, Parry Husbands, Kurt Keutzer, David A Patterson, William Lester Plishker, John Shalf, Samuel Webb Williams, et al. 2006. The landscape of parallel computing research: A view from berkeley. (2006)."},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/3458817.3476182"},{"key":"e_1_3_2_1_5_1","volume-title":"Algorithm Design for Tensor Units. In European Conference on Parallel Processing. Springer, 353--367","author":"Chowdhury Rezaul","year":"2021","unstructured":"Rezaul Chowdhury , Francesco Silvestri , and Flavio Vella . 2021 . Algorithm Design for Tensor Units. In European Conference on Parallel Processing. Springer, 353--367 . Rezaul Chowdhury, Francesco Silvestri, and Flavio Vella. 2021. Algorithm Design for Tensor Units. In European Conference on Parallel Processing. Springer, 353--367."},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2011.70"},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/3330345.3331057"},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/2591006"},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1190\/geo2018-0760.1"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/3469030"},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2018.00050"},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3441830"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2018.00052"},{"key":"e_1_3_2_1_14_1","volume-title":"Motivation for and evaluation of the first tensor processing unit. ieee Micro 38, 3","author":"Jouppi Norman","year":"2018","unstructured":"Norman Jouppi , Cliff Young , Nishant Patil , and David Patterson . 2018. Motivation for and evaluation of the first tensor processing unit. ieee Micro 38, 3 ( 2018 ), 10--19. Norman Jouppi, Cliff Young, Nishant Patil, and David Patterson. 2018. Motivation for and evaluation of the first tensor processing unit. ieee Micro 38, 3 (2018), 10--19."},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISSCC.2019.8662426"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"crossref","unstructured":"Randall J LeVeque. 2007. Finite difference methods for ordinary and partial differential equations: steady-state and time-dependent problems. SIAM.  Randall J LeVeque. 2007. Finite difference methods for ordinary and partial differential equations: steady-state and time-dependent problems. SIAM.","DOI":"10.1137\/1.9780898717839"},{"key":"e_1_3_2_1_17_1","volume-title":"Unleashing the Low-Precision Computation Potential of Tensor Cores on GPUs. In 2021 IEEE\/ACM International Symposium on Code Generation and Optimization (CGO). IEEE, 90--102","author":"Li Guangli","year":"2021","unstructured":"Guangli Li , Jingling Xue , Lei Liu , Xueying Wang , Xiu Ma , Xiao Dong , Jiansong Li , and Xiaobing Feng . 2021 . Unleashing the Low-Precision Computation Potential of Tensor Cores on GPUs. In 2021 IEEE\/ACM International Symposium on Code Generation and Optimization (CGO). IEEE, 90--102 . Guangli Li, Jingling Xue, Lei Liu, Xueying Wang, Xiu Ma, Xiao Dong, Jiansong Li, and Xiaobing Feng. 2021. Unleashing the Low-Precision Computation Potential of Tensor Cores on GPUs. In 2021 IEEE\/ACM International Symposium on Code Generation and Optimization (CGO). IEEE, 90--102."},{"key":"e_1_3_2_1_18_1","volume-title":"Automatic Code Generation and Optimization of Large-scale Stencil Computation on Many-core Processors. In 50th International Conference on Parallel Processing. 1--12","author":"Li Mingzhen","year":"2021","unstructured":"Mingzhen Li , Yi Liu , Hailong Yang , Yongmin Hu , Qingxiao Sun , Bangduo Chen , Xin You , Xiaoyan Liu , Zhongzhi Luan , and Depei Qian . 2021 . Automatic Code Generation and Optimization of Large-scale Stencil Computation on Many-core Processors. In 50th International Conference on Parallel Processing. 1--12 . Mingzhen Li, Yi Liu, Hailong Yang, Yongmin Hu, Qingxiao Sun, Bangduo Chen, Xin You, Xiaoyan Liu, Zhongzhi Luan, and Depei Qian. 2021. Automatic Code Generation and Optimization of Large-scale Stencil Computation on Many-core Processors. In 50th International Conference on Parallel Processing. 1--12."},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1137\/120883153"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/3374916"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/2063384.2063398"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/3368826.3377904"},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1137\/20M1348571"},{"key":"e_1_3_2_1_24_1","volume-title":"NVIDIA V100 TENSOR CORE GPU. online. (2020-11-10). https:\/\/www.nvidia.com\/en-us\/data-center\/v100\/ Accessed","author":"NVIDIA.","year":"2021","unstructured":"NVIDIA. 2020-11-10. NVIDIA V100 TENSOR CORE GPU. online. (2020-11-10). https:\/\/www.nvidia.com\/en-us\/data-center\/v100\/ Accessed March 30, 2021 . NVIDIA. 2020-11-10. NVIDIA V100 TENSOR CORE GPU. online. (2020-11-10). https:\/\/www.nvidia.com\/en-us\/data-center\/v100\/ Accessed March 30, 2021."},{"key":"e_1_3_2_1_25_1","volume-title":"NVIDIA NVIDIA Ampere Architecture Whitepaper. online. (2021-1-30). https:\/\/www.nvidia.com\/en-us\/data-center\/a100\/ Accessed","author":"NVIDIA.","year":"2021","unstructured":"NVIDIA. 2021-1-30. NVIDIA NVIDIA Ampere Architecture Whitepaper. online. (2021-1-30). https:\/\/www.nvidia.com\/en-us\/data-center\/a100\/ Accessed March 30, 2021 . NVIDIA. 2021-1-30. NVIDIA NVIDIA Ampere Architecture Whitepaper. online. (2021-1-30). https:\/\/www.nvidia.com\/en-us\/data-center\/a100\/ Accessed March 30, 2021."},{"key":"e_1_3_2_1_26_1","volume-title":"NVIDIA H100 Tensor Core GPU Architecture. online. (2022-4-26). https:\/\/resources.nvidia.com\/en-us-tensor-core Accessed","author":"NVIDIA.","year":"2022","unstructured":"NVIDIA. 2022-4-26. NVIDIA H100 Tensor Core GPU Architecture. online. (2022-4-26). https:\/\/resources.nvidia.com\/en-us-tensor-core Accessed April 26, 2022 . NVIDIA. 2022-4-26. NVIDIA H100 Tensor Core GPU Architecture. online. (2022-4-26). https:\/\/resources.nvidia.com\/en-us-tensor-core Accessed April 26, 2022."},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/3410463.3414656"},{"key":"e_1_3_2_1_28_1","volume-title":"Squeeze: Efficient Compact Fractals for Tensor Core GPUs. arXiv preprint arXiv:2201.00613","author":"Quezada Felipe A","year":"2022","unstructured":"Felipe A Quezada , Crist\u00f3bal A Navarro , Nancy Hitschfeld , and Benjamin Bustos . 2022 . Squeeze: Efficient Compact Fractals for Tensor Core GPUs. arXiv preprint arXiv:2201.00613 (2022). Felipe A Quezada, Crist\u00f3bal A Navarro, Nancy Hitschfeld, and Benjamin Bustos. 2022. Squeeze: Efficient Compact Fractals for Tensor Core GPUs. arXiv preprint arXiv:2201.00613 (2022)."},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/2499370.2462176"},{"key":"e_1_3_2_1_30_1","volume-title":"CUDA 11 Features Revealed. online. (2020-5-14). https:\/\/developer.nvidia.com\/blog\/cuda-11-features-revealed\/ Accessed","author":"Ramarao Pramod","year":"2021","unstructured":"Pramod Ramarao . 2020-5-14. CUDA 11 Features Revealed. online. (2020-5-14). https:\/\/developer.nvidia.com\/blog\/cuda-11-features-revealed\/ Accessed April 5, 2021 . Pramod Ramarao. 2020-5-14. CUDA 11 Features Revealed. online. (2020-5-14). https:\/\/developer.nvidia.com\/blog\/cuda-11-features-revealed\/ Accessed April 5, 2021."},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/2884045.2884047"},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/3178487.3178500"},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2018.2862896"},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2019.00073"},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.5555\/3433701.3433778"},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-27562-4_29"},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/Cluster48925.2021.00037"},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS53621.2022.00090"},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/2400682.2400713"},{"key":"e_1_3_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC41405.2020.00089"},{"key":"e_1_3_2_1_41_1","volume-title":"Computer modeling and simulation of solid-state sintering: A phase field approach. Acta materialia 54, 4","author":"Wang Yu U","year":"2006","unstructured":"Yu U Wang . 2006. Computer modeling and simulation of solid-state sintering: A phase field approach. Acta materialia 54, 4 ( 2006 ), 953--961. Yu U Wang. 2006. Computer modeling and simulation of solid-state sintering: A phase field approach. Acta materialia 54, 4 (2006), 953--961."},{"key":"e_1_3_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1147\/JRD.2019.2944146"},{"key":"e_1_3_2_1_43_1","volume-title":"Vector folding: Improving stencil performance via multidimensional simd-vector representation. In 2015 IEEE 17th International Conference on High Performance Computing and Communications","author":"Yount Charles","year":"2015","unstructured":"Charles Yount . 2015. Vector folding: Improving stencil performance via multidimensional simd-vector representation. In 2015 IEEE 17th International Conference on High Performance Computing and Communications , 2015 IEEE 7th International Symposium on Cyberspace Safety and Security, and 2015 IEEE 12th International Conference on Embedded Software and Systems. IEEE , 865--870. Charles Yount. 2015. Vector folding: Improving stencil performance via multidimensional simd-vector representation. In 2015 IEEE 17th International Conference on High Performance Computing and Communications, 2015 IEEE 7th International Symposium on Cyberspace Safety and Security, and 2015 IEEE 12th International Conference on Embedded Software and Systems. IEEE, 865--870."},{"key":"e_1_3_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/WOLFHPC.2016.08"},{"key":"e_1_3_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/3458817.3476149"},{"key":"e_1_3_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/2259016.2259037"},{"key":"e_1_3_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCD53106.2021.00054"},{"key":"e_1_3_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/3295500.3356210"}],"event":{"name":"ICS '22: 2022 International Conference on Supercomputing","location":"Virtual Event","acronym":"ICS '22","sponsor":["SIGARCH ACM Special Interest Group on Computer Architecture"]},"container-title":["Proceedings of the 36th ACM International Conference on Supercomputing"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3524059.3532392","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3524059.3532392","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:30:38Z","timestamp":1750188638000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3524059.3532392"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,6,28]]},"references-count":48,"alternative-id":["10.1145\/3524059.3532392","10.1145\/3524059"],"URL":"https:\/\/doi.org\/10.1145\/3524059.3532392","relation":{},"subject":[],"published":{"date-parts":[[2022,6,28]]},"assertion":[{"value":"2022-06-28","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}