{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,29]],"date-time":"2025-09-29T17:10:07Z","timestamp":1759165807768,"version":"3.44.0"},"reference-count":50,"publisher":"Association for Computing Machinery (ACM)","issue":"3","funder":[{"name":"Chinese Government","award":["202106380059"],"award-info":[{"award-number":["202106380059"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Math. Softw."],"published-print":{"date-parts":[[2025,9,30]]},"abstract":"<jats:p>We present a matrix-free multigrid method for high-order Discontinuous Galerkin (DG) finite element methods with GPU acceleration. A performance analysis is conducted, comparing various data and compute layouts. Smoother implementations are optimized through localization and fast diagonalization techniques. Leveraging conflict-free access patterns in shared memory, arithmetic throughput of up to 40% of the peak performance on NVIDIA A100 GPUs is achieved. Experimental results affirm the effectiveness of mixed-precision approaches and Message Passing Interface (MPI) parallelization in accelerating algorithms. 
Furthermore, an assessment of solver efficiency and robustness is provided across both two and three dimensions, with applications to Poisson problems.<\/jats:p>","DOI":"10.1145\/3765616","type":"journal-article","created":{"date-parts":[[2025,9,2]],"date-time":"2025-09-02T13:17:48Z","timestamp":1756819068000},"page":"1-27","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Multilevel Interior Penalty Methods on GPUs"],"prefix":"10.1145","volume":"51","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0341-4447","authenticated-orcid":false,"given":"Cu","family":"Cui","sequence":"first","affiliation":[{"name":"Interdisciplinary Center for Scientific Computing (IWR), Heidelberg University, Heidelberg, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1687-7328","authenticated-orcid":false,"given":"Guido","family":"Kanschat","sequence":"additional","affiliation":[{"name":"Interdisciplinary Center for Scientific Computing (IWR), Heidelberg University, Heidelberg, Germany"}]}],"member":"320","published-online":{"date-parts":[[2025,9,29]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"crossref","unstructured":"Ahmad Abdelfattah Valeria Barra Natalie Beams Ryan Bleile Jed Brown Jean-Sylvain Camier Robert Carson Noel Chalmers Veselin Dobrev Yohann Dudouit et al. 2021. GPU algorithms for efficient exascale discretizations. Parallel Computing 108 (2021) 102841.","DOI":"10.1016\/j.parco.2021.102841"},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.camwa.2020.06.009"},{"issue":"5","key":"e_1_3_2_4_2","first-page":"447","article-title":"High-performance finite elements with MFEM","volume":"38","author":"Andrej Julian","year":"2024","unstructured":"Julian Andrej, Nabil Atallah, Jan-Phillip B\u00e4cker, Jean-Sylvain Camier, Dylan Copeland, Veselin Dobrev, Yohann Dudouit, Tobias Duswald, Brendan Keith, Dohyun Kim, et al. 2024. High-performance finite elements with MFEM. 
The International Journal of High Performance Computing Applications 38, 5 (2024), 447\u2013467.","journal-title":"The International Journal of High Performance Computing Applications"},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1515\/jnma-2023-0089"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1137\/0719052"},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.1090\/S0025-5718-97-00826-0"},{"key":"e_1_3_2_8_2","doi-asserted-by":"crossref","unstructured":"Douglas N. Arnold Richard S. Falk and Ragnar Winther. 2000. Multigrid in \\(H({\\rm div})\\) and \\(H({\\rm curl})\\) . Numerische Mathematik 85 2 (2000) 197\u2013217.","DOI":"10.1007\/PL00005386"},{"key":"e_1_3_2_9_2","doi-asserted-by":"crossref","unstructured":"Wolfgang Bangerth Carsten Burstedde Timo Heister and Martin Kronbichler. 2011. Algorithms and data structures for massively parallel generic adaptive finite element codes. ACM Transactions on Mathematical Software 38 Article 14 (2011) 1\u201328.","DOI":"10.1145\/2049673.2049678"},{"key":"e_1_3_2_10_2","unstructured":"Jed Brown A. Abdelfata J. S. Camier Veselin Dobrev J. Dongarra Paul Fischer A. Fisher Y. Dudouit A. Haidar K. Kamran et al. 2017. CEED ECP Milestone Report: Identify Initial Kernels Bake-Off Problems (Benchmarks) and Miniapps. Technical report US Department of Energy USA."},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1137\/100791634"},{"key":"e_1_3_2_12_2","doi-asserted-by":"crossref","unstructured":"Cu Cui. 2024. Acceleration of tensor-product operations with tensor cores. ACM Transactions on Parallel Computing 11 4 Article 15 (Nov. 
2024) 24 pages.","DOI":"10.1145\/3695466"},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1137\/24M1642706"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.compfluid.2025.106703"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.compfluid.2020.104541"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1080\/17445760601122076"},{"key":"e_1_3_2_17_2","doi-asserted-by":"crossref","unstructured":"Jay Gopalakrishnan and Guido Kanschat. 2003. A multilevel discontinuous Galerkin method. Numerische Mathematik 95 (2003) 527\u2013550.","DOI":"10.1007\/s002110200392"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.5555\/370049.370405"},{"key":"e_1_3_2_19_2","unstructured":"David A. Ham Paul H. J. Kelly Lawrence Mitchell Colin J. Cotter Robert C. Kirby Koki Sagiyama Nacime Bouziani Sophia Vorderwuelbecke Thomas J. Gregory Jack Betteridge et al. 2023. Firedrake User Manual (1st ed.). Imperial College London and University of Oxford and Baylor University and University of Washington."},{"key":"e_1_3_2_20_2","unstructured":"Mikl\u00f3s Homolya Robert C. Kirby and David A. Ham. 2017. Exposing and exploiting structure: Optimal code generation for high-order finite element methods. arXiv:1711.02473. Retrieved from https:\/\/arxiv.org\/abs\/1711.02473"},{"key":"e_1_3_2_21_2","doi-asserted-by":"crossref","unstructured":"Immo Huismann J\u00f6rg Stiller and Jochen Fr\u00f6hlich. 2019. Scaling to the stars\u2014A linearly scaling elliptic solver for p-multigrid. Journal of Computational Physics 398 (2019) 108868.","DOI":"10.1016\/j.jcp.2019.108868"},{"key":"e_1_3_2_22_2","unstructured":"Immo Huismann J\u00f6rg Stiller and Jochen Fr\u00f6hlich. 2020. Linearizing the hybridizable discontinuous Galerkin method: A linearly scaling operator. arXiv:2007.11891. Retrieved from https:\/\/arxiv.org\/abs\/2007.11891"},{"key":"e_1_3_2_23_2","unstructured":"G. Kanschat. 2003. 
Discontinuous Galerkin Finite Element Methods for Advection-Diffusion Problems. Ph.D. Dissertation. Habilitationsschrift Universit\u00e4t Heidelberg."},{"key":"e_1_3_2_24_2","doi-asserted-by":"crossref","unstructured":"Guido Kanschat. 2008. Robust smoothers for high-order discontinuous Galerkin discretizations of advection\u2013diffusion problems. Journal of Computational and Applied Mathematics 218 1 (2008) 53\u201360.","DOI":"10.1016\/j.cam.2007.04.032"},{"key":"e_1_3_2_25_2","doi-asserted-by":"crossref","unstructured":"Guido Kanschat Raytcho Lazarov and Youli Mao. 2017. Geometric multigrid for Darcy and Brinkman models of flows in highly heterogeneous porous media: A numerical study. Journal of Computational and Applied Mathematics 310 (2017) 174\u2013185.","DOI":"10.1016\/j.cam.2016.05.016"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1515\/jnma-2015-0005"},{"key":"e_1_3_2_27_2","doi-asserted-by":"crossref","unstructured":"Andreas Kl\u00f6ckner Tim Warburton Jeff Bridge and Jan S. Hesthaven. 2009. Nodal discontinuous Galerkin methods on graphics processors. Journal of Computational Physics 228 21 (2009) 7863\u20137882.","DOI":"10.1016\/j.jcp.2009.06.041"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1177\/10943420211020803"},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jcp.2010.06.024"},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.compfluid.2012.04.012"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1145\/3325864"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1145\/3322813"},{"key":"e_1_3_2_33_2","unstructured":"Karl Ljungkvist. 2017. Matrix-free finite-element computations on graphics processors with adaptively refined unstructured meshes. In Proceedings of the 25th High Performance Computing Symposium (Virginia Beach Virginia) (HPC \u201917). 
Society for Computer Simulation International San Diego CA USA Article 1 12 pages."},{"key":"e_1_3_2_34_2","doi-asserted-by":"crossref","unstructured":"Robert E. Lynch John R. Rice and Donald H. Thomas. 1964. Direct solution of partial difference equations by tensor product methods. Numerische Mathematik 6 (1964) 185\u2013199.","DOI":"10.1007\/BF01386067"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.cageo.2016.03.008"},{"key":"e_1_3_2_36_2","unstructured":"NVIDIA Corporation. 2023. CUDA C++ Programming Guide. Retrieved from https:\/\/docs.nvidia.com\/cuda\/cuda-c-programming-guide\/index.html"},{"key":"e_1_3_2_37_2","unstructured":"NVIDIA Corporation. 2023. Nsight Compute. Retrieved from https:\/\/docs.nvidia.com\/nsight-compute\/index.html"},{"key":"e_1_3_2_38_2","doi-asserted-by":"crossref","unstructured":"Steven A. Orszag. 1980. Spectral methods for problems in complex geometries. Journal of Computational Physics 37 1 (1980) 70\u201392.","DOI":"10.1016\/0021-9991(80)90005-4"},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1016\/0021-9991(84)90128-1"},{"key":"e_1_3_2_40_2","doi-asserted-by":"crossref","unstructured":"Luca F. Pavarino. 1994. Additive Schwarz methods for the \\( p \\) -version finite element method. Numerische Mathematik 66 1 (1994) 493\u2013515.","DOI":"10.1007\/BF01385709"},{"key":"e_1_3_2_41_2","doi-asserted-by":"crossref","unstructured":"Luca F. Pavarino. 1994. Schwarz methods with local refinement for the \\( p \\) -version finite element method. Numerische Mathematik 69 2 (1994) 185\u2013211.","DOI":"10.1007\/s002110050087"},{"key":"e_1_3_2_42_2","doi-asserted-by":"crossref","unstructured":"Will Pazner and Per-Olof Persson. 2018. Approximate tensor-product preconditioners for very high order discontinuous Galerkin methods. Journal of Computational Physics 354 (2018) 344\u2013369.","DOI":"10.1016\/j.jcp.2017.10.030"},{"key":"e_1_3_2_43_2","doi-asserted-by":"crossref","unstructured":"J.-F. 
Remacle Rajesh Gandham and Tim Warburton. 2016. GPU accelerated spectral finite elements on all-hex meshes. Journal of Computational Physics 324 (2016) 246\u2013257.","DOI":"10.1016\/j.jcp.2016.08.005"},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2022.3217824"},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.1177\/1094342018816368"},{"key":"e_1_3_2_46_2","doi-asserted-by":"crossref","unstructured":"Brian C. Vermeire Freddie D. Witherden and Peter E. Vincent. 2017. On the utility of GPU accelerated high-order methods for unsteady flow simulations: A comparison with industry-standard tools. Journal of Computational Physics 334 (2017) 497\u2013521.","DOI":"10.1016\/j.jcp.2016.12.049"},{"key":"e_1_3_2_47_2","doi-asserted-by":"crossref","unstructured":"Peter E. J. Vos Spencer J. Sherwin and Robert M. Kirby. 2010. From h to p efficiently: Implementing finite and spectral\/hp element methods to achieve optimal performance for low- and high-order discretisations. 
Journal of Computational Physics 229 13 (2010) 5161\u20135181.","DOI":"10.1016\/j.jcp.2010.03.031"},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1137\/23M1625962"},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.1145\/1498765.1498785"},{"key":"e_1_3_2_50_2","doi-asserted-by":"publisher","DOI":"10.1515\/cmam-2020-0078"},{"key":"e_1_3_2_51_2","doi-asserted-by":"publisher","DOI":"10.1515\/cmam-2024-0192"}],"container-title":["ACM Transactions on Mathematical Software"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3765616","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,29]],"date-time":"2025-09-29T16:29:07Z","timestamp":1759163347000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3765616"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,9,29]]},"references-count":50,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2025,9,30]]}},"alternative-id":["10.1145\/3765616"],"URL":"https:\/\/doi.org\/10.1145\/3765616","relation":{},"ISSN":["0098-3500","1557-7295"],"issn-type":[{"type":"print","value":"0098-3500"},{"type":"electronic","value":"1557-7295"}],"subject":[],"published":{"date-parts":[[2025,9,29]]},"assertion":[{"value":"2024-04-03","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-08-25","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-09-29","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}