{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,24]],"date-time":"2026-02-24T16:50:00Z","timestamp":1771951800894,"version":"3.50.1"},"reference-count":80,"publisher":"SAGE Publications","issue":"6","license":[{"start":{"date-parts":[[2024,8,19]],"date-time":"2024-08-19T00:00:00Z","timestamp":1724025600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"funder":[{"name":"U.S. Department of Energy Office of Science and the National Nuclear Security Administration","award":["Exascale Computing Project (17-SC-20-SC)"],"award-info":[{"award-number":["Exascale Computing Project (17-SC-20-SC)"]}]},{"name":"Office of Science of the U.S. Department of Energy","award":["DE-AC02-05CH11231, DE-AC05-00OR22725, DE-AC02-06CH11357"],"award-info":[{"award-number":["DE-AC02-05CH11231, DE-AC05-00OR22725, DE-AC02-06CH11357"]}]}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["The International Journal of High Performance Computing Applications"],"published-print":{"date-parts":[[2024,11]]},"abstract":"<jats:p> From partial differential equations to the convolutional neural networks in deep learning, to matrix operations in dense linear algebra, computations on structured grids dominate high-performance computing and machine learning. The performance of such computations is key to effective utilization of the billions of US dollar\u2019s worth of GPU-accelerated systems such computations are run on. Concurrently, the end of Moore\u2019s law and Dennard scaling are driving the specialization of compute and memory architectures. 
This specialization often makes performance brittle (small changes in function can have severe ramifications on performance), non-portable (vendors are increasingly motivated to develop their programming models tailored for their specialized architectures), and not performance portable (even a given computation may perform very differently from one architecture to the next). The mismatch between computations that reference data that is logically neighboring in N-dimensional space but physically distant in memory motivated the creation of Bricks \u2014 a novel data-structure transformation for multi-dimensional structured grids that reorders data into small, fixed-sized bricks of contiguously-packed data. Whereas a cache-line naturally captures spatial locality in only one dimension of a structured grid, Bricks can capture spatial locality in three or more dimensions. When coupled with a Python interface, a code-generator, and autotuning, the resultant BrickLib software provides not only raw performance, but also performance portability across multiple CPUs and GPUs, scalability in distributed memory, user productivity, and generality across computational domains. In this paper, we provide an overview of BrickLib and provide a series of vignettes on how it delivers on the aforementioned metrics. 
<\/jats:p>","DOI":"10.1177\/10943420241268288","type":"journal-article","created":{"date-parts":[[2024,8,19]],"date-time":"2024-08-19T09:20:21Z","timestamp":1724059221000},"page":"549-567","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":3,"title":["Bricks: A high-performance portability layer for computations on block-structured grids"],"prefix":"10.1177","volume":"38","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6813-0536","authenticated-orcid":false,"given":"Mahesh","family":"Lakshminarasimhan","sequence":"first","affiliation":[{"name":"Kahlert School of Computing, University of Utah, Salt Lake City, UT, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4596-0289","authenticated-orcid":false,"given":"Oscar","family":"Antepara","sequence":"additional","affiliation":[{"name":"Lawrence Berkeley National Laboratory, Berkeley, CA, USA"}]},{"given":"Tuowen","family":"Zhao","sequence":"additional","affiliation":[{"name":"Kahlert School of Computing, University of Utah, Salt Lake City, UT, USA"}]},{"given":"Benjamin","family":"Sepanski","sequence":"additional","affiliation":[{"name":"University of Texas at Austin, Austin, TX, USA"}]},{"given":"Protonu","family":"Basu","sequence":"additional","affiliation":[{"name":"Lawrence Berkeley National Laboratory, Berkeley, CA, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9644-9982","authenticated-orcid":false,"given":"Hans","family":"Johansen","sequence":"additional","affiliation":[{"name":"Lawrence Berkeley National Laboratory, Berkeley, CA, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3058-7573","authenticated-orcid":false,"given":"Mary","family":"Hall","sequence":"additional","affiliation":[{"name":"Kahlert School of Computing, University of Utah, Salt Lake City, UT, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8327-5717","authenticated-orcid":false,"given":"Samuel","family":"Williams","sequence":"additional","affiliation":[{"name":"Lawrence 
Berkeley National Laboratory, Berkeley, CA, USA"}]}],"member":"179","published-online":{"date-parts":[[2024,8,19]]},"reference":[{"key":"bibr1-10943420241268288","doi-asserted-by":"crossref","unstructured":"Antepara O, Williams S, Johansen H, et al. (2023) Performance portability evaluation of blocked stencil computations on gpus Proc. Of the SC \u201923 P3HPC Workshop. Denver, CO, USA, 13 November 2023.","DOI":"10.1145\/3624062.3624177"},{"key":"bibr2-10943420241268288","doi-asserted-by":"publisher","DOI":"10.1155\/2009\/382638"},{"key":"bibr3-10943420241268288","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-87475-1_9"},{"key":"bibr4-10943420241268288","doi-asserted-by":"crossref","unstructured":"Basu P, Venkat A, Hall M, et al. (2013) Compiler generation and autotuning of communication-avoiding operators for geometric multigrid 20th Annual International Conference on High Performance Computing, Bengaluru, India, 18-21 December 2013.","DOI":"10.1109\/HiPC.2013.6799131"},{"key":"bibr5-10943420241268288","doi-asserted-by":"crossref","unstructured":"Basu P, Hall M, Williams S, et al. (2015) Compiler-directed transformation for higher-order stencils IEEE IPDPS\u201915, Hyderabad, India, 25-29 May 2015.","DOI":"10.1109\/IPDPS.2015.103"},{"key":"bibr6-10943420241268288","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2017.08.006"},{"key":"bibr7-10943420241268288","unstructured":"Chen T, Moreau T, Jiang Z, et al. (2018) TVM: an automated end-to-end optimizing compiler for deep learning. 
Munich, Germany, 8-14 September 2018."},{"key":"bibr8-10943420241268288","doi-asserted-by":"crossref","unstructured":"Christen M, Schenk O, Burkhart H (2011) Patus: a code generation and autotuning framework for parallel iterative stencil computations on modern microarchitectures 2011 IEEE International Parallel & Distributed Processing Symposium, Anchorage, AK, USA, 16-20 May 2011.","DOI":"10.1109\/IPDPS.2011.70"},{"key":"bibr9-10943420241268288","unstructured":"Colella P, Graves DT, Ligocki T, et al. (2009) Chombo software package for amr applications design document. Available at: https:\/\/www.seesar.lbl.gov\/ANAG\/chombo\/."},{"key":"bibr10-10943420241268288","volume-title":"EECS Department","author":"Datta K","year":"2009"},{"key":"bibr11-10943420241268288","doi-asserted-by":"crossref","unstructured":"Datta K, Murphy M, Volkov V, et al. (2008) Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures SC 08: Proceedings of the 2008 ACM\/IEEE Conference on Supercomputing, Austin, TX, USA, 15-21 November 2008.","DOI":"10.1109\/SC.2008.5222004"},{"key":"bibr12-10943420241268288","doi-asserted-by":"publisher","DOI":"10.1137\/070693199"},{"key":"bibr13-10943420241268288","doi-asserted-by":"crossref","unstructured":"De La Cruz R, Araya-Polo M, Cela JM (2010) Introducing the semi-stencil algorithm International Conference on Parallel Processing and Applied Mathematics, Wroclaw, Poland, September 13-16, 2009.","DOI":"10.1007\/978-3-642-14390-8_52"},{"key":"bibr14-10943420241268288","doi-asserted-by":"crossref","unstructured":"Deakin T, McIntosh-Smith S, Price J, et al. 
(2019) Performance portability across diverse computer architectures 2019 IEEE\/ACM International Workshop on Performance, Portability and Productivity in HPC (P3HPC), Denver, CO, USA, 22-22 November 2019.","DOI":"10.1109\/P3HPC49587.2019.00006"},{"key":"bibr15-10943420241268288","doi-asserted-by":"crossref","unstructured":"Deitz SJ, Chamberlain BL, Snyder L (2001) Eliminating redundancies in sum-of-product array computations Proc. Of the 15th International Conference on Supercomputing. Sorrento, Italy, 16-21 June, 2001.","DOI":"10.1145\/377792.377807"},{"key":"bibr16-10943420241268288","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2008.4536305"},{"key":"bibr17-10943420241268288","doi-asserted-by":"publisher","DOI":"10.1145\/582034.582084"},{"key":"bibr18-10943420241268288","doi-asserted-by":"publisher","DOI":"10.1145\/3575693.3575702"},{"key":"bibr19-10943420241268288","first-page":"21","volume":"10","author":"Douglas CC","year":"2000","journal-title":"Electronic Transactions on Numerical Analysis"},{"key":"bibr20-10943420241268288","doi-asserted-by":"crossref","unstructured":"Dufek AS, Gayatri R, Mehta N, et al. (2021) Case study of using Kokkos and SYCL as performance-portable frameworks for Milc-Dslash benchmark on NVIDIA, AMD and Intel GPUs P3HPC Workshop\u201921, St. Louis, MO, USA, 14-14 November 2021.","DOI":"10.1109\/P3HPC54578.2021.00009"},{"key":"bibr21-10943420241268288","doi-asserted-by":"publisher","DOI":"10.1080\/13647830.2014.919410"},{"key":"bibr22-10943420241268288","unstructured":"Extreme-scale Scientific Software Stack (E4S) (2020) Extreme-scale scientific software stack (E4S). 
https:\/\/e4s-project.github.io\/."},{"key":"bibr23-10943420241268288","unstructured":"Frigo M, Strumpen V (2005) Evaluation of cache-based superscalar and cacheless vector architectures for scientific computations SC '03: Proceedings of the 2003 ACM\/IEEE Conference on Supercomputing, Phoenix, AZ, USA, 15-21 November 2003."},{"key":"bibr24-10943420241268288","doi-asserted-by":"publisher","DOI":"10.1063\/5.0046327"},{"key":"bibr25-10943420241268288","doi-asserted-by":"publisher","DOI":"10.1016\/j.jcp.2011.05.034"},{"key":"bibr26-10943420241268288","unstructured":"Hagedorn B, Fan B, Chen H, et al. (2023) Graphene: an ir for optimized tensor computations on gpus ASPLOS\u201923. Vancouver, BC, Canada, 25-29 March 2023."},{"key":"bibr27-10943420241268288","doi-asserted-by":"crossref","unstructured":"Hashmi JM, Chakraborty S, Bayatpour M, et al. (2019) Falcon: efficient designs for zero-copy mpi datatype processing on emerging architectures IPDPS\u201919, Rio de Janeiro, Brazil, 20-24 May 2019.","DOI":"10.1109\/IPDPS.2019.00045"},{"key":"bibr28-10943420241268288","doi-asserted-by":"crossref","unstructured":"Holewinski J, Pouchet LN, Sadayappan P (2012) High-performance code generation for stencil computations on gpu architectures ICS\u201912. San Servolo Island, Venice, Italy, 25-28 June 2012.","DOI":"10.1145\/2304576.2304619"},{"key":"bibr29-10943420241268288","doi-asserted-by":"crossref","unstructured":"Ibrahim KZ, Yang C, Maris P (2022) Performance portability of sparse block diagonal matrix multiple vector multiplications on gpus P3HPC Workshop SC\u201922. Dallas, TX, USA, 13 November 2022.","DOI":"10.1109\/P3HPC56579.2022.00011"},{"key":"bibr30-10943420241268288","volume-title":"A Strategy for High Performance in Computational Fluid Dynamics","author":"Jayaraj J","year":"2013"},{"key":"bibr31-10943420241268288","doi-asserted-by":"publisher","DOI":"10.1063\/1.874014"},{"key":"bibr32-10943420241268288","unstructured":"Kokkos (2024) Kokkos view multidimensional arrays. 
https:\/\/kokkos.org\/kokkos-core-wiki\/ProgrammingGuide\/View.html."},{"key":"bibr33-10943420241268288","unstructured":"Kowarschik M, Wei\u00df C (2001) Dimepack - a cache-optimized multigrid library International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA). Las Vegas, NV, USA, 25-28 June 2001."},{"key":"bibr34-10943420241268288","unstructured":"Krishnamoorthy S, Baskaran M, Bondhugula U, et al. (2007) Effective automatic parallelization of stencil computations PLDI\u201907. San Diego, CA, USA, 10-13 June 2007."},{"key":"bibr35-10943420241268288","volume":"18","author":"Kurth T","year":"2018","journal-title":"Exascale deep learning for climate analytics"},{"key":"bibr36-10943420241268288","doi-asserted-by":"crossref","unstructured":"Kwack J, Tramm J, Bertoni C, et al. (2021) Evaluation of performance portability of applications and mini-apps across amd, intel and nvidia gpus 2021 International Workshop on Performance, Portability and Productivity in HPC (P3HPC), St. Louis, MO, USA, 14-14 November 2021.","DOI":"10.1109\/P3HPC54578.2021.00008"},{"key":"bibr37-10943420241268288","doi-asserted-by":"crossref","unstructured":"Lakshminarasimhan M, Hall M, Williams S, et al. (2024) BrickDL: graph-level optimizations for DNNs with fine-grained blocking on GPUs Proceedings of the 53rd International Conference on Parallel Processing (ICPP). 
Gotland, Sweden, 12-15 August 2024.","DOI":"10.1145\/3673038.3673046"},{"key":"bibr38-10943420241268288","doi-asserted-by":"publisher","DOI":"10.1016\/j.jpdc.2019.02.005"},{"key":"bibr39-10943420241268288","doi-asserted-by":"publisher","DOI":"10.1145\/3374916"},{"key":"bibr40-10943420241268288","volume":"1","author":"Mathuriya A","year":"2018","journal-title":"CosmoFlow: using deep learning to learn the universe at scale SC\u201918"},{"key":"bibr41-10943420241268288","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-74224-9_1"},{"key":"bibr42-10943420241268288","doi-asserted-by":"crossref","unstructured":"Micikevicius P (2009) 3d finite difference computation on gpus using cuda Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units. Washington, D.C., USA, 8 March 2009.","DOI":"10.1145\/1513895.1513905"},{"key":"bibr43-10943420241268288","doi-asserted-by":"crossref","unstructured":"Mohiyuddin M, Hoemmen M, Demmel J, et al. (2009) Minimizing communication in sparse matrix solvers SC\u201909. Portland, Oregon, USA, 14-20 November 2009.","DOI":"10.1145\/1654059.1654096"},{"key":"bibr44-10943420241268288","doi-asserted-by":"crossref","unstructured":"Nguyen A, Satish N, Chhugani J, et al. (2010) 3.5-D Blocking Optimization for Stencil Computations on Modern CPUs and GPUs. SC '10: Proceedings of the 2010 ACM\/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, New Orleans, LA, USA, 13-19 November 2010.","DOI":"10.1109\/SC.2010.2"},{"key":"bibr45-10943420241268288","doi-asserted-by":"crossref","unstructured":"Ozturk ME, Asudeh O, Sabin G, et al. (2023) A performance portability study using tensor contraction benchmarks AsHES 2023: The Thirteenth International Workshop on Accelerators and Hybrid Exascale Systems. St. 
Petersburg, FL, USA, 15-19 May 2023.","DOI":"10.1109\/IPDPSW59300.2023.00102"},{"key":"bibr46-10943420241268288","doi-asserted-by":"publisher","DOI":"10.1016\/j.future.2017.08.007"},{"key":"bibr47-10943420241268288","author":"Rawat PS","year":"2018","journal-title":"Austria, 24-28"},{"key":"bibr48-10943420241268288","doi-asserted-by":"crossref","unstructured":"Rawat PS, Sukumaran-Rajam A, Rountev A, et al. (2018b) Associative instruction reordering to alleviate register pressure. SC18: International Conference for High Performance Computing, Networking, Storage and Analysis. Dallas, TX, USA, 11-16 November 2018.","DOI":"10.1109\/SC.2018.00049"},{"key":"bibr49-10943420241268288","doi-asserted-by":"crossref","unstructured":"Rivera G, Tseng C (2000) Tiling optimizations for 3D scientific computations Supercomputing (SC\u201900), Dallas, TX, USA, 04-10 November 2000.","DOI":"10.1109\/SC.2000.10015"},{"key":"bibr50-10943420241268288","doi-asserted-by":"publisher","DOI":"10.1177\/1094342004041295"},{"key":"bibr51-10943420241268288","doi-asserted-by":"crossref","unstructured":"Sepanski B, Zhao T, Johansen H, et al. (2022) Maximizing performance through memory hierarchy-driven data layout transformations 2022 IEEE\/ACM Workshop on Memory Centric High Performance Computing (MCHPC), Dallas, TX, USA, 04-10 November 2000.","DOI":"10.1109\/MCHPC56545.2022.00006"},{"key":"bibr52-10943420241268288","first-page":"208","volume":"3","author":"Shen H","year":"2021","journal-title":"Proceedings of Machine Learning and Systems"},{"key":"bibr53-10943420241268288","doi-asserted-by":"crossref","unstructured":"Song Y, Li Z (1999) New tiling techniques to improve cache temporal locality Proc. ACM PLDI\u2019998. Atlanta, GA, USA, 1-4 May 1999.","DOI":"10.1145\/301618.301668"},{"key":"bibr54-10943420241268288","doi-asserted-by":"crossref","unstructured":"Stock K, Kong M, Grosser T, et al. (2014) A framework for enhancing data reuse via associative reordering PLDI \u201914. 
Edinburgh, United Kingdom, 9-11 June 2014.","DOI":"10.1145\/2594291.2594342"},{"key":"bibr55-10943420241268288","doi-asserted-by":"crossref","unstructured":"Tang Y, Chowdhury RA, Kuszmaul BC, et al. (2011) The pochoir stencil compiler ACM Symposium on Parallelism in Algorithms and Architectures. San Jose, CA, USA, 4-6 June 2011.","DOI":"10.1145\/1989493.1989508"},{"key":"bibr56-10943420241268288","doi-asserted-by":"publisher","DOI":"10.1145\/3519939.3523448"},{"key":"bibr57-10943420241268288","unstructured":"The Top500 List, June 2024. (2024) The Top500 list, June 2024. https:\/\/www.top500.org\/lists\/top500\/2024\/06\/."},{"key":"bibr58-10943420241268288","doi-asserted-by":"publisher","DOI":"10.1145\/3315508.3329973"},{"key":"bibr59-10943420241268288","doi-asserted-by":"publisher","DOI":"10.1201\/b21930-12"},{"key":"bibr60-10943420241268288","volume-title":"TiDA: High-Level Programming Abstractions for Data Locality Management","author":"Unat D","year":"2016"},{"key":"bibr61-10943420241268288","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2017.2703149"},{"key":"bibr62-10943420241268288","doi-asserted-by":"crossref","unstructured":"Wellein G, Hager G, Zeiser T, et al. (2009) Efficient temporal blocking for stencil computations by multicore-aware wavefront parallelization International Computer Software and Applications Conference, Seattle, WA, USA, 20-24 July 2009.","DOI":"10.1109\/COMPSAC.2009.82"},{"key":"bibr63-10943420241268288","unstructured":"Williams S, Shalf J, Oliker L, et al. (2006) The potential of the Cell processor for scientific computing Proc. Conference on Computing Frontiers. Ischia, Italy, 2-5 May, 2006."},{"key":"bibr64-10943420241268288","doi-asserted-by":"crossref","unstructured":"Williams S, Carter J, Oliker L, et al. 
(2008) Lattice Boltzmann simulation optimization on leading multicore platforms 2008 IEEE International Symposium on Parallel and Distributed Processing, Miami, FL, USA, 14-18 April 2008.","DOI":"10.1109\/IPDPS.2008.4536295"},{"key":"bibr65-10943420241268288","doi-asserted-by":"publisher","DOI":"10.1145\/1498765.1498785"},{"key":"bibr66-10943420241268288","doi-asserted-by":"crossref","unstructured":"Williams S, Oliker L, Carter J, et al. (2011) Extracting ultra-scale lattice Boltzmann performance via hierarchical and distributed auto-tuning. SC '11: Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, Dallas, TX, USA, 13-18 November 2022.","DOI":"10.1145\/2063384.2063458"},{"key":"bibr67-10943420241268288","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2012.85"},{"key":"bibr68-10943420241268288","volume":"2001","author":"Wissink A","year":"2001","journal-title":"SC01 Proceedings"},{"key":"bibr69-10943420241268288","unstructured":"Wonnacott D (2000) Using time skewing to eliminate idle time due to memory bandwidth and network limitations Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000, Cancun, Mexico, 01-05 May 2000."},{"key":"bibr70-10943420241268288","doi-asserted-by":"crossref","unstructured":"Yount C, Tobin J, Breuer A, et al. (2016) Yask\u2014yet another stencil kernel: a framework for hpc stencil code-generation and tuning 2016 Sixth International Workshop on Domain-specific Languages and High-Level Frameworks for High Performance Computing (WOLFHPC), Salt Lake City, UT, USA, 14-14 November 2016.","DOI":"10.1109\/WOLFHPC.2016.08"},{"key":"bibr71-10943420241268288","doi-asserted-by":"publisher","DOI":"10.1504\/PCFD.2008.018088"},{"key":"bibr72-10943420241268288","doi-asserted-by":"crossref","unstructured":"Zhang Y, Mueller F (2012) Auto-generation and auto-tuning of 3d stencil codes on gpu clusters CGO\u201912. 
San Jose, CA, USA, 31 March - 4 April, 2012.","DOI":"10.1145\/2259016.2259037"},{"key":"bibr73-10943420241268288","doi-asserted-by":"crossref","unstructured":"Zhang N, Driscoll M, Markley C, et al. (2017) Snowflake: a lightweight portable stencil dsl 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Lake Buena Vista, FL, USA, 29 May 2017 - 02 June 2017.","DOI":"10.1109\/IPDPSW.2017.89"},{"key":"bibr74-10943420241268288","doi-asserted-by":"publisher","DOI":"10.21105\/joss.01370"},{"key":"bibr75-10943420241268288","doi-asserted-by":"crossref","unstructured":"Zhao T, Williams S, Hall M, et al. (2018) Delivering performance-portable stencil computations on cpus and gpus using bricks 2018 IEEE\/ACM International Workshop on Performance, Portability and Productivity in HPC (P3HPC), Dallas, TX, USA, 16-16 November 2018.","DOI":"10.1109\/P3HPC.2018.00009"},{"key":"bibr76-10943420241268288","doi-asserted-by":"crossref","unstructured":"Zhao T, Basu P, Williams S, et al. (2019) Exploiting reuse and vectorization in blocked stencil computations on cpus and gpus Proceedings of SC\u201919. Denver, CO, USA, 17-22 November 2019.","DOI":"10.1145\/3295500.3356210"},{"key":"bibr77-10943420241268288","doi-asserted-by":"crossref","unstructured":"Zhao T, Hall M, Johansen H, et al. (2021) Improving communication by optimizing on-node data movement with data layout ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. [virtual], 27 February - 3 March 2021.","DOI":"10.1145\/3437801.3441598"},{"key":"bibr78-10943420241268288","doi-asserted-by":"publisher","DOI":"10.1145\/3373376.3378508"},{"key":"bibr79-10943420241268288","doi-asserted-by":"crossref","unstructured":"Zhou X, Giacalone JP, Garzar\u00e1n MJ, et al. (2012) Hierarchical overlapped tiling Proc. International Symposium on Code Generation and Optimization (CGO). 
San Jose, CA, USA, 31 March - 4 April 2012.","DOI":"10.1145\/2259016.2259044"},{"key":"bibr80-10943420241268288","unstructured":"Zhu H, Wu R, Diao Y, et al. (2022) ROLLER: fast and efficient tensor compilation for deep learning OSDI\u201922. Carlsbad, CA, USA, 11-13 July, 2022."}],"container-title":["The International Journal of High Performance Computing Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/10943420241268288","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/10943420241268288","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/10943420241268288","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,1]],"date-time":"2025-03-01T22:04:04Z","timestamp":1740866644000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/10943420241268288"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,8,19]]},"references-count":80,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2024,11]]}},"alternative-id":["10.1177\/10943420241268288"],"URL":"https:\/\/doi.org\/10.1177\/10943420241268288","relation":{},"ISSN":["1094-3420","1741-2846"],"issn-type":[{"value":"1094-3420","type":"print"},{"value":"1741-2846","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,8,19]]}}}