{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,10]],"date-time":"2026-01-10T07:34:34Z","timestamp":1768030474715,"version":"3.49.0"},"publisher-location":"New York, NY, USA","reference-count":55,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,11,13]],"date-time":"2021-11-13T00:00:00Z","timestamp":1636761600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,11,14]]},"DOI":"10.1145\/3458817.3476149","type":"proceedings-article","created":{"date-parts":[[2021,11,24]],"date-time":"2021-11-24T14:42:14Z","timestamp":1637764934000},"page":"1-13","source":"Crossref","is-referenced-by-count":10,"title":["Temporal vectorization for stencils"],"prefix":"10.1145","author":[{"given":"Liang","family":"Yuan","sequence":"first","affiliation":[{"name":"Chinese Academy of Sciences, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hang","family":"Cao","sequence":"additional","affiliation":[{"name":"University of Chinese Academy of Sciences, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yunquan","family":"Zhang","sequence":"additional","affiliation":[{"name":"Chinese Academy of Sciences, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kun","family":"Li","sequence":"additional","affiliation":[{"name":"University of Chinese Academy of Sciences, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Pengqi","family":"Lu","sequence":"additional","affiliation":[{"name":"University of Chinese Academy of Sciences, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yue","family":"Yue","sequence":"additional","affiliation":[{"name":"University of Chinese Academy of Sciences, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2021,11,13]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/29873.29875"},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"crossref","unstructured":"V. Bandishti I. Pananilath and U. Bondhugula. 2012. Tiling stencil computations to maximize parallelism (SC '12). 1--11.  V. Bandishti I. Pananilath and U. Bondhugula. 2012. Tiling stencil computations to maximize parallelism (SC '12) . 1--11.","DOI":"10.1109\/SC.2012.107"},{"key":"e_1_3_2_1_3_1","volume-title":"Leonid Oliker, and Phillip Colella.","author":"Basu Protonu","year":"2015"},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2017.04.002"},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1137\/110838844"},{"key":"e_1_3_2_1_6_1","first-page":"2","article-title":"Automatic Intra-Register Vectorization for the Intel","volume":"30","author":"Bik Aart J. C.","year":"2002","journal-title":"Architecture. Int. J. Parallel Program."},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"crossref","unstructured":"Uday Bondhugula Albert Hartono J. Ramanujam and P. Sadayappan. 2008. A Practical Automatic Polyhedral Parallelizer and Locality Optimizer (PLDI '08). 101--113.  Uday Bondhugula Albert Hartono J. Ramanujam and P. Sadayappan. 2008. A Practical Automatic Polyhedral Parallelizer and Locality Optimizer (PLDI '08) . 101--113.","DOI":"10.1145\/1375581.1375595"},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/2751205.2751224"},{"key":"e_1_3_2_1_9_1","article-title":"Algorithm 942","volume":"40","author":"de la Cruz Ra\u00fal","year":"2014","journal-title":"Semi-Stencil. ACM Trans. Math. Softw."},{"key":"e_1_3_2_1_10_1","volume-title":"Eliminating Redundancies in Sum-of-product Array Computations (ICS '01)","author":"Deitz Steven J.","year":"2001"},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"crossref","unstructured":"Chris Ding and Yun He. 2001. A Ghost Cell Expansion Method for Reducing Communications in Solving PDE Problems (SC '01). 50--50.  Chris Ding and Yun He. 2001. A Ghost Cell Expansion Method for Reducing Communications in Solving PDE Problems (SC '01) . 50--50.","DOI":"10.1145\/582034.582084"},{"key":"e_1_3_2_1_12_1","first-page":"21","article-title":"Cache optimization for structured and unstructured grid multigrid","volume":"10","author":"Douglas Craig C.","year":"2000","journal-title":"ETNA. Electronic Transactions on Numerical Analysis"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/996893.996853"},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"crossref","unstructured":"Matteo Frigo and Volker Strumpen. 2005. Cache oblivious stencil computations (ICS '05). 361--366.  Matteo Frigo and Volker Strumpen. 2005. Cache oblivious stencil computations (ICS '05) . 361--366.","DOI":"10.1145\/1088149.1088197"},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/1356058.1356085"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"crossref","unstructured":"Tom Henretty Kevin Stock Louis-No\u00ebl Pouchet Franz Franchetti J. Ramanujam and P. Sadayappan. 2011. Data Layout Transformation for Stencil Computations on Short-vector SIMD Architectures (CC'11\/ETAPS'11). 225--245.  Tom Henretty Kevin Stock Louis-No\u00ebl Pouchet Franz Franchetti J. Ramanujam and P. Sadayappan. 2011. Data Layout Transformation for Stencil Computations on Short-vector SIMD Architectures (CC'11\/ETAPS'11) . 225--245.","DOI":"10.1007\/978-3-642-19861-8_13"},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"crossref","unstructured":"Tom Henretty Richard Veras Franz Franchetti Louis-No\u00ebl Pouchet J. Ramanujam and P. Sadayappan. 2013. A Stencil Compiler for Short-vector SIMD Architectures (ICS '13). 13--24.  Tom Henretty Richard Veras Franz Franchetti Louis-No\u00ebl Pouchet J. Ramanujam and P. Sadayappan. 2013. A Stencil Compiler for Short-vector SIMD Architectures (ICS '13) . 13--24.","DOI":"10.1145\/2464996.2467268"},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"crossref","unstructured":"F. Irigoin and R. Triolet. 1988. Supernode Partitioning (POPL '88). 319--329.  F. Irigoin and R. Triolet. 1988. Supernode Partitioning (POPL '88) . 319--329.","DOI":"10.1145\/73560.73588"},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/2903150.2903158"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/1360612.1360620"},{"key":"e_1_3_2_1_21_1","volume-title":"Allen","author":"Kennedy Ken","year":"2001"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1007\/s006070070032"},{"key":"e_1_3_2_1_23_1","volume-title":"Effective Automatic Parallelization of Stencil Computations (PLDI '07)","author":"Krishnamoorthy Sriram","year":"2007"},{"key":"e_1_3_2_1_24_1","volume-title":"Wolf","author":"Lam Monica D.","year":"1991"},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/349299.349320"},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/PACT.2002.1105970"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11390-016-1696-5"},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/PACT.2011.68"},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"crossref","unstructured":"Jiayuan Meng and Kevin Skadron. 2009. Performance modeling and automatic ghost zone optimization for iterative stencil loops on GPUs (ICS '09). 256--265.  Jiayuan Meng and Kevin Skadron. 2009. Performance modeling and automatic ghost zone optimization for iterative stencil loops on GPUs (ICS '09) . 256--265.","DOI":"10.1145\/1542275.1542313"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"crossref","unstructured":"A. Nguyen N. Satish J. Chhugani C. Kim and P. Dubey. 2010. 3.5-D Blocking Optimization for Stencil Computations on Modern CPUs and GPUs (SC '10). 1--13.  A. Nguyen N. Satish J. Chhugani C. Kim and P. Dubey. 2010. 3.5-D Blocking Optimization for Stencil Computations on Modern CPUs and GPUs (SC '10) . 1--13.","DOI":"10.1109\/SC.2010.2"},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/1133981.1133997"},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/1454115.1454119"},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2014.82"},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"crossref","unstructured":"Fabrice Rastello and Thierry Dauxois. 2002. Efficient Tiling for an ODE Discrete Integration Program: Redundant Tasks Instead of Trapezoidal Shaped-Tiles (IPDPS '02). 138--.  Fabrice Rastello and Thierry Dauxois. 2002. Efficient Tiling for an ODE Discrete Integration Program: Redundant Tasks Instead of Trapezoidal Shaped-Tiles (IPDPS '02) . 138--.","DOI":"10.1109\/IPDPS.2002.1016667"},{"key":"e_1_3_2_1_35_1","volume-title":"Register Optimizations for Stencils on GPUs (PPoPP '18)","author":"Rawat Prashant Singh"},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/1133981.1133996"},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"crossref","unstructured":"Gabriel Rivera and Chau-Wen Tseng. 2000. Tiling Optimizations for 3D Scientific Computations (SC '00). Article 32.  Gabriel Rivera and Chau-Wen Tseng. 2000. Tiling Optimizations for 3D Scientific Computations (SC '00) . Article 32.","DOI":"10.1109\/SC.2000.10015"},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"crossref","unstructured":"Nadathur Satish Changkyu Kim Jatin Chhugani Hideki Saito Rakesh Krishnaiyer Mikhail Smelyanskiy Milind Girkar and Pradeep Dubey. 2012. Can traditional programming bridge the Ninja performance gap for parallel computing applications?. In ISCA. 440--451.  Nadathur Satish Changkyu Kim Jatin Chhugani Hideki Saito Rakesh Krishnaiyer Mikhail Smelyanskiy Milind Girkar and Pradeep Dubey. 2012. Can traditional programming bridge the Ninja performance gap for parallel computing applications?. In ISCA. 440--451.","DOI":"10.1145\/2366231.2337210"},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.5555\/645455.654213"},{"key":"e_1_3_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1007559022013"},{"key":"e_1_3_2_1_41_1","doi-asserted-by":"crossref","unstructured":"Kevin Stock Martin Kong Tobias Grosser Louis-No\u00ebl Pouchet Fabrice Rastello Jagannathan Ramanujam and Ponnuswamy Sadayappan. 2014. A framework for enhancing data reuse via associative reordering. In PLDI. 65--76.  Kevin Stock Martin Kong Tobias Grosser Louis-No\u00ebl Pouchet Fabrice Rastello Jagannathan Ramanujam and Ponnuswamy Sadayappan. 2014. A framework for enhancing data reuse via associative reordering. In PLDI. 65--76.","DOI":"10.1145\/2666356.2594342"},{"key":"e_1_3_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/2751205.2751246"},{"key":"e_1_3_2_1_43_1","volume-title":"Bradley C. Kuszmaul, Chi-Keung Luk, and Charles E. Leiserson.","author":"Tang Yuan","year":"2011"},{"key":"e_1_3_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2012.85"},{"key":"e_1_3_2_1_45_1","volume-title":"Lam","author":"Wolf Michael E.","year":"1991"},{"key":"e_1_3_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF01407876"},{"key":"e_1_3_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1015460304860"},{"key":"e_1_3_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2005.18"},{"key":"e_1_3_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.5555\/353939"},{"key":"e_1_3_2_1_50_1","volume-title":"Vector Folding: Improving Stencil Performance via Multi-Dimensional SIMD-Vector Representation (HPCC-CSS-ICESS '15)","author":"Yount Charles","year":"2015"},{"key":"e_1_3_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1109\/WOLFHPC.2016.08"},{"key":"e_1_3_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/3126908.3126920"},{"key":"e_1_3_2_1_53_1","doi-asserted-by":"crossref","unstructured":"Tuowen Zhao Protonu Basu Samuel Williams Mary Hall and Hans Johansen. 2019. Exploiting reuse and vectorization in blocked stencil computations on CPUs and GPUs. In SC. 1--44.  Tuowen Zhao Protonu Basu Samuel Williams Mary Hall and Hans Johansen. 2019. Exploiting reuse and vectorization in blocked stencil computations on CPUs and GPUs. In SC. 1--44.","DOI":"10.1145\/3295500.3356210"},{"key":"e_1_3_2_1_54_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/2886101","article-title":"A compiler approach for exploiting partial SIMD parallelism","volume":"13","author":"Zhou Hao","year":"2016","journal-title":"ACM Transactions on Architecture and Code Optimization (TACO)"},{"key":"e_1_3_2_1_55_1","doi-asserted-by":"crossref","volume-title":"Exploiting mixed SIMD parallelism by reducing data reorganization overhead","author":"Zhou Hao","DOI":"10.1145\/2854038.2854054"}],"event":{"name":"SC '21: The International Conference for High Performance Computing, Networking, Storage and Analysis","location":"St. Louis Missouri","acronym":"SC '21","sponsor":["SIGHPC ACM Special Interest Group on High Performance Computing, Special Interest Group on High Performance Computing","IEEE CS"]},"container-title":["Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3458817.3476149","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3458817.3476149","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T17:49:06Z","timestamp":1750268946000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3458817.3476149"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,11,13]]},"references-count":55,"alternative-id":["10.1145\/3458817.3476149","10.1145\/3458817"],"URL":"https:\/\/doi.org\/10.1145\/3458817.3476149","relation":{},"subject":[],"published":{"date-parts":[[2021,11,13]]}}}