{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:31:43Z","timestamp":1750221103755,"version":"3.41.0"},"reference-count":49,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2019,5,3]],"date-time":"2019-05-03T00:00:00Z","timestamp":1556841600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100000266","name":"Engineering and Physical Sciences Research Council","doi-asserted-by":"publisher","award":["EP\/I00677X\/1,EP\/L000407\/1,EP\/I012036\/1"],"award-info":[{"award-number":["EP\/I00677X\/1,EP\/L000407\/1,EP\/I012036\/1"]}],"id":[{"id":"10.13039\/501100000266","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Math. Softw."],"published-print":{"date-parts":[[2019,6,30]]},"abstract":"<jats:p>Sparse tiling is a technique to fuse loops that access common data, thus increasing data locality. Unlike traditional loop fusion or blocking, the loops may have different iteration spaces and access shared datasets through indirect memory accesses, such as A[map[i]]\u2014hence the name \u201csparse.\u201d One notable example of such loops arises in discontinuous-Galerkin finite element methods, because of the computation of numerical integrals over different domains (e.g., cells, facets). The major challenge with sparse tiling is implementation\u2014not only is it cumbersome to understand and synthesize, but it is also onerous to maintain and generalize, as it requires a complete rewrite of the bulk of the numerical computation. In this article, we propose an approach to extend the applicability of sparse tiling based on raising the level of abstraction. Through a sequence of compiler passes, the mathematical specification of a problem is progressively lowered, and eventually sparse-tiled C for-loops are generated. Besides automation, we advance the state-of-the-art by introducing a revisited, more efficient sparse tiling algorithm; support for distributed-memory parallelism; a range of fine-grained optimizations for increased runtime performance; implementation in a publicly available library, SLOPE; and an in-depth study of the performance impact in Seigen, a real-world elastic wave equation solver for seismological problems, which shows speed-ups up to 1.28\u00d7 on a platform consisting of 896 Intel Broadwell cores.<\/jats:p>","DOI":"10.1145\/3302256","type":"journal-article","created":{"date-parts":[[2019,5,6]],"date-time":"2019-05-06T17:24:38Z","timestamp":1557163478000},"page":"1-30","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["Automated Tiling of Unstructured Mesh Computations with Application to Seismological Modeling"],"prefix":"10.1145","volume":"45","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7161-2942","authenticated-orcid":false,"given":"Fabio","family":"Luporini","sequence":"first","affiliation":[{"name":"Imperial College London"}]},{"given":"Michael","family":"Lange","sequence":"additional","affiliation":[{"name":"Imperial College London"}]},{"given":"Christian T.","family":"Jacobs","sequence":"additional","affiliation":[{"name":"University of Southampton"}]},{"given":"Gerard J.","family":"Gorman","sequence":"additional","affiliation":[{"name":"Imperial College London"}]},{"given":"J.","family":"Ramanujam","sequence":"additional","affiliation":[{"name":"Louisiana State University"}]},{"given":"Paul H. J.","family":"Kelly","sequence":"additional","affiliation":[{"name":"Imperial College London"}]}],"member":"320","published-online":{"date-parts":[[2019,5,3]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/331532.331559"},{"key":"e_1_2_1_2_1","unstructured":"Utkarsh Ayachit. 2015. The ParaView Guide (Full Color Version): A Parallel Visualization Application (paraview 4.3 ed.). Kitware Incorporated. http:\/\/www.amazon.com\/exec\/obidos\/redirect?tag&equals;citeulike07-20&path&equals;&equals;&equals;ASIN\/1930934300.   Utkarsh Ayachit. 2015. The ParaView Guide (Full Color Version): A Parallel Visualization Application (paraview 4.3 ed.). Kitware Incorporated. http:\/\/www.amazon.com\/exec\/obidos\/redirect?tag&equals;citeulike07-20&path&equals;&equals;&equals;ASIN\/1930934300."},{"volume-title":"Proceedings of the 2nd International Symposium on Computing in Object-Oriented Parallel Environments (ISCOPE\u201998)","author":"Bassetti Federico","key":"e_1_2_1_3_1","unstructured":"Federico Bassetti , Kei Davis , and Daniel J. Quinlan . 1998. Optimizing transformations of stencil operations for parallel object-oriented scientific frameworks on cache-based architectures . In Proceedings of the 2nd International Symposium on Computing in Object-Oriented Parallel Environments (ISCOPE\u201998) . Springer-Verlag, London, UK, 107--118. http:\/\/dl.acm.org\/citation.cfm?id&equals;646894.709707. Federico Bassetti, Kei Davis, and Daniel J. Quinlan. 1998. Optimizing transformations of stencil operations for parallel object-oriented scientific frameworks on cache-based architectures. In Proceedings of the 2nd International Symposium on Computing in Object-Oriented Parallel Environments (ISCOPE\u201998). Springer-Verlag, London, UK, 107--118. http:\/\/dl.acm.org\/citation.cfm?id&equals;646894.709707."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/2628071.2628106"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/1375581.1375595"},{"key":"e_1_2_1_6_1","volume-title":"Proceedings of the 5th International Conference on Algorithms and Architectures for Parallel Processing","author":"Chen Li","year":"2002","unstructured":"Li Chen , Zhao-Qing Zhang , and Xiao-Bing Feng . 2002 . Redundant computation partition on distributed-memory systems . In Proceedings of the 5th International Conference on Algorithms and Architectures for Parallel Processing , 2002. 252--260. Li Chen, Zhao-Qing Zhang, and Xiao-Bing Feng. 2002. Redundant computation partition on distributed-memory systems. In Proceedings of the 5th International Conference on Algorithms and Architectures for Parallel Processing, 2002. 252--260."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1137\/0728088"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1051\/proc\/2009020"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2008.4536305"},{"key":"e_1_2_1_10_1","volume-title":"Cache optimization for structured and unstructured grid multigrid. Electronic Transaction on Numerical Analysis (February","author":"Douglas Craig C.","year":"2000","unstructured":"Craig C. Douglas , Jonathan Hu , Markus Kowarschik , Ulrich R\u00fcde , and Christian Wei\u00df . 2000. Cache optimization for structured and unstructured grid multigrid. Electronic Transaction on Numerical Analysis (February 2000 ), 21--40. Craig C. Douglas, Jonathan Hu, Markus Kowarschik, Ulrich R\u00fcde, and Christian Wei\u00df. 2000. Cache optimization for structured and unstructured grid multigrid. Electronic Transaction on Numerical Analysis (February 2000), 21--40."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.euromechflu.2011.05.005"},{"key":"e_1_2_1_12_1","series-title":"Series A 234, 1199","volume-title":"Exact transient solution of the buried line source problem. Proceedings of the Royal Society of London","author":"Garvin W. W.","year":"1956","unstructured":"W. W. Garvin . 1956. Exact transient solution of the buried line source problem. Proceedings of the Royal Society of London , Series A 234, 1199 ( 1956 ). W. W. Garvin. 1956. Exact transient solution of the buried line source problem. Proceedings of the Royal Society of London, Series A 234, 1199 (1956)."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC.Companion.2012.68"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/1964218.1964221"},{"key":"e_1_2_1_15_1","volume-title":"Mike Warner, Fabio Luporini, and Navjot Kukreja.","author":"Gorman Gerard","year":"2015","unstructured":"Gerard Gorman , Marcos de Aguiar , David Ham , Felix Herrmann , Christian Jacobs , Paul Kelly , Michael Lange , Chris Pain , Matthew Piggott , Spencer Sherwin , Felippe Vieira Zacarias , Mike Warner, Fabio Luporini, and Navjot Kukreja. 2015 . OPESCI project web page. http:\/\/www.opesci.org. Gerard Gorman, Marcos de Aguiar, David Ham, Felix Herrmann, Christian Jacobs, Paul Kelly, Michael Lange, Chris Pain, Matthew Piggott, Spencer Sherwin, Felippe Vieira Zacarias, Mike Warner, Fabio Luporini, and Navjot Kukreja. 2015. OPESCI project web page. http:\/\/www.opesci.org."},{"key":"e_1_2_1_16_1","unstructured":"Intel Corporation. 2016. Intel VTune Performance Analyzer. software.intel.com\/en-us\/intel-vtune-amplifier-xe.  Intel Corporation. 2016. Intel VTune Performance Analyzer. software.intel.com\/en-us\/intel-vtune-amplifier-xe."},{"key":"e_1_2_1_17_1","volume-title":"Gorman","author":"Jacobs Christian T.","year":"2017","unstructured":"Christian T. Jacobs , Michael Lange , Fabio Luporini , and Gerard J . Gorman . 2017 . Application of code generation to high-order seismic modelling with the discontinuous Galerkin finite element method. In preparation. Christian T. Jacobs, Michael Lange, Fabio Luporini, and Gerard J. Gorman. 2017. Application of code generation to high-order seismic modelling with the discontinuous Galerkin finite element method. In preparation."},{"key":"e_1_2_1_18_1","unstructured":"George Karypis and Vipin Kumar. 2011. MeTis: Unstructured Graph Partitioning and Sparse Matrix Ordering System Version 5.0. http:\/\/www.cs.umn.edu\/&sim;metis.  George Karypis and Vipin Kumar. 2011. MeTis: Unstructured Graph Partitioning and Sparse Matrix Ordering System Version 5.0. http:\/\/www.cs.umn.edu\/&sim;metis."},{"key":"e_1_2_1_19_1","unstructured":"Matthew G. Knepley Michael Lange and Gerard Gorman. 2015. Unstructured overlapping mesh distribution in parallel. Submitted to ACM Transactions on Mathematical Software (TOMS'15) arxiv:1506.06194 http:\/\/arxiv.org\/abs\/1506.06194  Matthew G. Knepley Michael Lange and Gerard Gorman. 2015. Unstructured overlapping mesh distribution in parallel. Submitted to ACM Transactions on Mathematical Software (TOMS'15) arxiv:1506.06194 http:\/\/arxiv.org\/abs\/1506.06194"},{"volume-title":"Proceedings of the 2nd Workshop on Irregular Applications: Architectures and Algorithms (IA<sup>3<\/sup>) Held in Conjunction with SC12","author":"Christopher","key":"e_1_2_1_20_1","unstructured":"Christopher D. Krieger and Michelle Mills Strout. 2012. Executing optimized irregular applications using task graphs within existing parallel models . In Proceedings of the 2nd Workshop on Irregular Applications: Architectures and Algorithms (IA<sup>3<\/sup>) Held in Conjunction with SC12 . Christopher D. Krieger and Michelle Mills Strout. 2012. Executing optimized irregular applications using task graphs within existing parallel models. In Proceedings of the 2nd Workshop on Irregular Applications: Architectures and Algorithms (IA<sup>3<\/sup>) Held in Conjunction with SC12."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPSW.2013.68"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/1273442.1250761"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1137\/15M1026092"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/2687415"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/1542275.1542313"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/1654059.1654096"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/2998441"},{"volume-title":"Proceedings of the International Confrencedel.e on High Performance Computing, Networking, Storage and Analysis. Article 72","author":"Ravishankar Mahesh","key":"e_1_2_1_29_1","unstructured":"Mahesh Ravishankar , John Eisenlohr , Louis-No\u00ebl Pouchet , J. Ramanujam , Atanas Rountev , and P. Sadayappan . 2012. Code generation for parallel execution of a class of irregular loops on distributed memory systems . In Proceedings of the International Confrencedel.e on High Performance Computing, Networking, Storage and Analysis. Article 72 , 11 pages. Mahesh Ravishankar, John Eisenlohr, Louis-No\u00ebl Pouchet, J. Ramanujam, Atanas Rountev, and P. Sadayappan. 2012. Code generation for parallel execution of a class of irregular loops on distributed memory systems. In Proceedings of the International Confrencedel.e on High Performance Computing, Networking, Storage and Analysis. Article 72, 11 pages."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2015.2453972"},{"key":"e_1_2_1_31_1","volume-title":"Giles","author":"Reguly Istv\u00e1n Z.","year":"2017","unstructured":"Istv\u00e1n Z. Reguly , Gihan R. Mudalige , and Mike B . Giles . 2017 . Loop tiling in large-scale stencil codes at runtime with OPS. arXiv preprint arXiv:1704.00693. Istv\u00e1n Z. Reguly, Gihan R. Mudalige, and Mike B. Giles. 2017. Loop tiling in large-scale stencil codes at runtime with OPS. arXiv preprint arXiv:1704.00693."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/12.88484"},{"key":"e_1_2_1_33_1","unstructured":"Imperial College High Performance Computing Service. 2017. The cx2\/Helen Cluster.  Imperial College High Performance Computing Service. 2017. The cx2\/Helen Cluster."},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2014.118"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/781131.781142"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1007\/11596110_7"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1177\/1094342004041294"},{"key":"e_1_2_1_38_1","unstructured":"TOP500. 2016. cx2\/Helen Cluster Specification in the TOP500 Ranking. https:\/\/www.top500.org\/system\/178845.  TOP500. 2016. cx2\/Helen Cluster Specification in the TOP500 Ranking. https:\/\/www.top500.org\/system\/178845."},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1190\/1.1442147"},{"key":"e_1_2_1_40_1","unstructured":"Zenodo\/COFFEE. 2017. coneoproject\/COFFEE: A Compiler for Fast Expression Evaluation.  Zenodo\/COFFEE. 2017. coneoproject\/COFFEE: A Compiler for Fast Expression Evaluation."},{"key":"e_1_2_1_41_1","unstructured":"Zenodo\/FIAT. 2017. firedrakeproject\/fiat: The Finite Element Automated Tabulator.  Zenodo\/FIAT. 2017. firedrakeproject\/fiat: The Finite Element Automated Tabulator."},{"key":"e_1_2_1_42_1","unstructured":"Zenodo\/Firedrake. 2017. firedrakeproject\/firedrake: An Automated Finite Element System.  Zenodo\/Firedrake. 2017. firedrakeproject\/firedrake: An Automated Finite Element System."},{"key":"e_1_2_1_43_1","unstructured":"Zenodo\/PETSc. 2017. firedrakeproject\/petsc: Portable Extensible Toolkit for Scientific Computation.  Zenodo\/PETSc. 2017. firedrakeproject\/petsc: Portable Extensible Toolkit for Scientific Computation."},{"key":"e_1_2_1_44_1","unstructured":"Zenodo\/PETSc4Py. 2017. firedrakeproject\/petsc4py: The Python Interface to PETSc.  Zenodo\/PETSc4Py. 2017. firedrakeproject\/petsc4py: The Python Interface to PETSc."},{"key":"e_1_2_1_45_1","unstructured":"Zenodo\/PyOP2. 2017. OP2\/PyOP2: Framework for Performance-Portable Parallel Computations on Unstructured Meshes.  Zenodo\/PyOP2. 2017. OP2\/PyOP2: Framework for Performance-Portable Parallel Computations on Unstructured Meshes."},{"key":"e_1_2_1_46_1","unstructured":"Zenodo\/Seigen. 2017. OPESCI\/Seigen: An Elastic Wave Equation Solver for Seismological Problems Based on the Finite Element Method.  Zenodo\/Seigen. 2017. OPESCI\/Seigen: An Elastic Wave Equation Solver for Seismological Problems Based on the Finite Element Method."},{"key":"e_1_2_1_47_1","unstructured":"Zenodo\/SLOPE. 2017. coneoproject\/SLOPE: A Run-Time System to Tile Irregular Loops.  Zenodo\/SLOPE. 2017. coneoproject\/SLOPE: A Run-Time System to Tile Irregular Loops."},{"key":"e_1_2_1_48_1","unstructured":"Zenodo\/TSFC. 2017. firedrakeproject\/tsfc: The Two Stage Form Compiler firedrakeproject\/tsfc: The Two Stage Form Compiler.  Zenodo\/TSFC. 2017. firedrakeproject\/tsfc: The Two Stage Form Compiler firedrakeproject\/tsfc: The Two Stage Form Compiler."},{"key":"e_1_2_1_49_1","unstructured":"Zenodo\/UFL. 2017. firedrakeproject\/ufl: The Unified Form Language.  Zenodo\/UFL. 2017. firedrakeproject\/ufl: The Unified Form Language."},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/2259016.2259044"}],"container-title":["ACM Transactions on Mathematical Software"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3302256","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3302256","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T01:01:48Z","timestamp":1750208508000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3302256"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,5,3]]},"references-count":49,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2019,6,30]]}},"alternative-id":["10.1145\/3302256"],"URL":"https:\/\/doi.org\/10.1145\/3302256","relation":{},"ISSN":["0098-3500","1557-7295"],"issn-type":[{"type":"print","value":"0098-3500"},{"type":"electronic","value":"1557-7295"}],"subject":[],"published":{"date-parts":[[2019,5,3]]},"assertion":[{"value":"2017-08-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2018-12-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-05-03","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}