{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,10]],"date-time":"2026-02-10T20:04:04Z","timestamp":1770753844525,"version":"3.50.0"},"publisher-location":"Cham","reference-count":22,"publisher":"Springer Nature Switzerland","isbn-type":[{"value":"9783031320408","type":"print"},{"value":"9783031320415","type":"electronic"}],"license":[{"start":{"date-parts":[[2023,1,1]],"date-time":"2023-01-01T00:00:00Z","timestamp":1672531200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,5,10]],"date-time":"2023-05-10T00:00:00Z","timestamp":1683676800000},"content-version":"vor","delay-in-days":129,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>We identify and show how to overcome an OpenMP bottleneck in the administration of GPU memory. It arises for a wave equation solver on dynamically adaptive block-structured Cartesian meshes, which keeps all CPU threads busy and allows all of them to offload sets of patches to the GPU. Our studies show that multithreaded, concurrent, non-deterministic access to the GPU leads to performance breakdowns, since the GPU memory bookkeeping as offered through OpenMP\u2019s  clause, i.e.,\u00a0the allocation and freeing, becomes another runtime challenge besides expensive data transfer and actual computation. We, therefore, propose to retain the memory management responsibility on the host: A caching mechanism acquires memory on the accelerator for all CPU threads, keeps hold of this memory and hands it out to the offloading threads upon demand. We show that this user-managed, CPU-based memory administration helps us to overcome the GPU memory bookkeeping bottleneck and speeds up the time-to-solution of Finite Volume kernels by more than an order of magnitude.<\/jats:p>","DOI":"10.1007\/978-3-031-32041-5_4","type":"book-chapter","created":{"date-parts":[[2023,5,10]],"date-time":"2023-05-10T21:56:29Z","timestamp":1683755789000},"page":"65-85","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["Efficient GPU Offloading with\u00a0OpenMP for\u00a0a\u00a0Hyperbolic Finite Volume Solver on\u00a0Dynamically Adaptive Meshes"],"prefix":"10.1007","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2567-643X","authenticated-orcid":false,"given":"Mario","family":"Wille","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6208-1841","authenticated-orcid":false,"given":"Tobias","family":"Weinzierl","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1138-3679","authenticated-orcid":false,"given":"Gonzalo","family":"Brito Gadeschi","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-4334-1938","authenticated-orcid":false,"given":"Michael","family":"Bader","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2023,5,10]]},"reference":[{"issue":"6","key":"4_CR1","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevD.85.064040","volume":"85","author":"D Alic","year":"2012","unstructured":"Alic, D., Bona-Casas, C., Bona, C., Rezzolla, L., Palenzuela, C.: Conformal and covariant formulation of the Z4 system with constraint-violation damping. Phys. Rev. D 85(6), 064040 (2012)","journal-title":"Phys. Rev. D"},{"key":"4_CR2","series-title":"Texts in Computational Science and Engineering","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-31046-1","volume-title":"Space-Filling Curves\u2013An Introduction with Applications in Scientific Computing","author":"M Bader","year":"2013","unstructured":"Bader, M.: Space-Filling Curves\u2013An Introduction with Applications in Scientific Computing. Texts in Computational Science and Engineering, vol. 9. Springer, Heidelberg (2013). https:\/\/doi.org\/10.1007\/978-3-642-31046-1"},{"key":"4_CR3","doi-asserted-by":"publisher","first-page":"39","DOI":"10.1086\/191028","volume":"58","author":"E Bertschinger","year":"1985","unstructured":"Bertschinger, E.: Self-similar secondary infall and accretion in an Einstein-de Sitter universe. Astrophys. J. Suppl. Ser. 58, 39\u201365 (1985)","journal-title":"Astrophys. J. Suppl. Ser."},{"issue":"3","key":"4_CR4","doi-asserted-by":"publisher","first-page":"C69","DOI":"10.1137\/19M1276194","volume":"42","author":"D Charrier","year":"2020","unstructured":"Charrier, D., Hazelwood, B., Weinzierl, T.: Enclave tasking for DG methods on dynamically adaptive meshes. SIAM J. Sci. Comput. 42(3), C69\u2013C96 (2020)","journal-title":"SIAM J. Sci. Comput."},{"issue":"2","key":"4_CR5","doi-asserted-by":"publisher","first-page":"25","DOI":"10.3847\/1538-4365\/ac157b","volume":"257","author":"B Daszuta","year":"2021","unstructured":"Daszuta, B., Zappa, F., Cook, W., Radice, D., Bernuzzi, S., Morozova, V.: GR-Athena++: puncture evolutions on vertex-centered oct-tree adaptive mesh refinement. Astrophys. J. Suppl. Ser. 257(2), 25 (2021)","journal-title":"Astrophys. J. Suppl. Ser."},{"issue":"05","key":"4_CR6","doi-asserted-by":"publisher","first-page":"62","DOI":"10.1109\/MCSE.2021.3099603","volume":"23","author":"A Dubey","year":"2021","unstructured":"Dubey, A., Berzins, M., Burstedde, C., Norman, M.L., Unat, D., Wahib, M.: Structured adaptive mesh refinement adaptations to retain performance portability with increasing heterogeneity. Comput. Sci. Eng. 23(05), 62\u201366 (2021)","journal-title":"Comput. Sci. Eng."},{"issue":"3","key":"4_CR7","doi-asserted-by":"publisher","first-page":"63","DOI":"10.3390\/axioms7030063","volume":"7","author":"M Dumbser","year":"2018","unstructured":"Dumbser, M., Fambri, F., Tavelli, M., Bader, M., Weinzierl, T.: Efficient implementation of ADER discontinuous Galerkin schemes for a scalable hyperbolic PDE engine. Axioms 7(3), 63 (2018)","journal-title":"Axioms"},{"key":"4_CR8","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevD.97.084053","volume":"97","author":"M Dumbser","year":"2018","unstructured":"Dumbser, M., Guercilena, F., K\u00f6ppel, S., Rezzolla, L., Zanotti, O.: Conformal and covariant Z4 formulation of the Einstein equations: strongly hyperbolic first-order reduction and solution with discontinuous Galerkin schemes. Phys. Rev. D 97, 084053 (2018)","journal-title":"Phys. Rev. D"},{"key":"4_CR9","doi-asserted-by":"crossref","unstructured":"Fernando, M., et al.: A GPU-accelerated AMR solver for gravitational wave propagation. In: 2022 SC22: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1078\u20131092. IEEE Computer Society (2022)","DOI":"10.1109\/SC41404.2022.00080"},{"key":"4_CR10","doi-asserted-by":"crossref","unstructured":"Huber, J., et al.: Efficient execution of OpenMP on GPUs. In: 2022 IEEE\/ACM International Symposium on Code Generation and Optimization (CGO), pp. 41\u201352 (2022)","DOI":"10.1109\/CGO53902.2022.9741290"},{"key":"4_CR11","doi-asserted-by":"publisher","first-page":"84","DOI":"10.1016\/j.jcp.2016.12.059","volume":"335","author":"L Kidder","year":"2017","unstructured":"Kidder, L., et al.: SpECTRE: a task-based discontinuous Galerkin code for relativistic astrophysics. J. Comput. Phys. 335, 84\u2013114 (2017)","journal-title":"J. Comput. Phys."},{"key":"4_CR12","series-title":"Lecture Notes in Computer Science","doi-asserted-by":"publisher","first-page":"153","DOI":"10.1007\/978-3-031-07312-0_8","volume-title":"High Performance Computing","author":"B Li","year":"2022","unstructured":"Li, B., Schulz, H., Weinzierl, T., Zhang, H.: Dynamic task fusion for a block-structured finite volume solver over a dynamically adaptive mesh with local time stepping. In: Varbanescu, A.L., Bhatele, A., Luszczek, P., Marc, B. (eds.) ISC High Performance 2022. LNCS, vol. 13289, pp. 153\u2013173. Springer, Cham (2022). https:\/\/doi.org\/10.1007\/978-3-031-07312-0_8"},{"issue":"5\u20136","key":"4_CR13","doi-asserted-by":"publisher","first-page":"1086","DOI":"10.1007\/s10766-018-0619-1","volume":"47","author":"B Peterson","year":"2018","unstructured":"Peterson, B., et al.: Automatic halo management for the Uintah GPU-heterogeneous asynchronous many-task runtime. Int. J. Parallel Programm. 47(5\u20136), 1086\u20131116 (2018). https:\/\/doi.org\/10.1007\/s10766-018-0619-1","journal-title":"Int. J. Parallel Programm."},{"issue":"8","key":"4_CR14","doi-asserted-by":"publisher","first-page":"2606","DOI":"10.1029\/2019MS001635","volume":"11","author":"X Qin","year":"2019","unstructured":"Qin, X., LeVeque, R., Motley, M.: Accelerating an adaptive mesh refinement code for depth-averaged flows using GPUs. J. Adv. Model. Earth Syst. 11(8), 2606\u20132628 (2019)","journal-title":"J. Adv. Model. Earth Syst."},{"key":"4_CR15","doi-asserted-by":"publisher","DOI":"10.1016\/j.cpc.2020.107251","volume":"254","author":"A Reinarz","year":"2020","unstructured":"Reinarz, A., et al.: ExaHyPE: an engine for parallel dynamically adaptive simulations of wave problems. Comput. Phys. Commun. 254, 107251 (2020)","journal-title":"Comput. Phys. Commun."},{"key":"4_CR16","series-title":"Lecture Notes in Computer Science","doi-asserted-by":"publisher","first-page":"111","DOI":"10.1007\/978-3-030-85262-7_8","volume-title":"OpenMP: Enabling Massive Node-Level Parallelism","author":"H Schulz","year":"2021","unstructured":"Schulz, H., Gadeschi, G.B., Rudyy, O., Weinzierl, T.: Task inefficiency patterns for a wave equation solver. In: McIntosh-Smith, S., de Supinski, B.R., Klinkenberg, J. (eds.) IWOMP 2021. LNCS, vol. 12870, pp. 111\u2013124. Springer, Cham (2021). https:\/\/doi.org\/10.1007\/978-3-030-85262-7_8"},{"key":"4_CR17","doi-asserted-by":"crossref","unstructured":"Sundar, H., Ghattas, O.: A nested partitioning algorithm for adaptive meshes on heterogeneous clusters. In: Proceedings of the 29th ACM on International Conference on Supercomputing, ICS 2015, pp. 319\u2013328 (2015)","DOI":"10.1145\/2751205.2751246"},{"key":"4_CR18","series-title":"Lecture Notes in Computer Science","doi-asserted-by":"publisher","first-page":"159","DOI":"10.1007\/978-3-030-85262-7_11","volume-title":"OpenMP: Enabling Massive Node-Level Parallelism","author":"S Tian","year":"2021","unstructured":"Tian, S., Chesterfield, J., Doerfert, J., Chapman, B.: Experience report: writing a portable GPU runtime with OpenMP 5.1. In: McIntosh-Smith, S., de Supinski, B.R., Klinkenberg, J. (eds.) IWOMP 2021. LNCS, vol. 12870, pp. 159\u2013169. Springer, Cham (2021). https:\/\/doi.org\/10.1007\/978-3-030-85262-7_11"},{"key":"4_CR19","doi-asserted-by":"crossref","unstructured":"Wahib, M., Maruyama, N., Aoki, T.: Daino: a high-level framework for parallel and efficient AMR on GPUs. In: SC 2016: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 621\u2013632 (2016)","DOI":"10.1109\/SC.2016.52"},{"issue":"2","key":"4_CR20","doi-asserted-by":"publisher","first-page":"14","DOI":"10.1145\/3319797","volume":"45","author":"T Weinzierl","year":"2019","unstructured":"Weinzierl, T.: The Peano software\u2013parallel, automaton-based, dynamically adaptive grid traversals. ACM Trans. Math. Softw. 45(2), 14 (2019)","journal-title":"ACM Trans. Math. Softw."},{"key":"4_CR21","doi-asserted-by":"publisher","first-page":"204","DOI":"10.1016\/j.compfluid.2015.06.020","volume":"118","author":"O Zanotti","year":"2015","unstructured":"Zanotti, O., Fambri, F., Dumbser, M., Hidalgo, A.: Space-time adaptive ADER discontinuous Galerkin finite element schemes with a posteriori sub-cell finite volume limiting. Comput. Fluids 118, 204\u2013224 (2015)","journal-title":"Comput. Fluids"},{"issue":"2","key":"4_CR22","doi-asserted-by":"publisher","first-page":"2464","DOI":"10.1093\/mnras\/stac1991","volume":"515","author":"H Zhang","year":"2022","unstructured":"Zhang, H., Weinzierl, T., Schulz, H., Li, B.: Spherical accretion of collisional gas in modified gravity I: self-similar solutions and a new cosmological hydrodynamical code. Mon. Not. Roy. Astron. Soc. 515(2), 2464\u20132482 (2022)","journal-title":"Mon. Not. Roy. Astron. Soc."}],"container-title":["Lecture Notes in Computer Science","High Performance Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/978-3-031-32041-5_4","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,5,10]],"date-time":"2023-05-10T22:03:10Z","timestamp":1683756190000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/978-3-031-32041-5_4"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023]]},"ISBN":["9783031320408","9783031320415"],"references-count":22,"URL":"https:\/\/doi.org\/10.1007\/978-3-031-32041-5_4","relation":{},"ISSN":["0302-9743","1611-3349"],"issn-type":[{"value":"0302-9743","type":"print"},{"value":"1611-3349","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023]]},"assertion":[{"value":"10 May 2023","order":1,"name":"first_online","label":"First Online","group":{"name":"ChapterHistory","label":"Chapter History"}},{"value":"ISC High Performance","order":1,"name":"conference_acronym","label":"Conference Acronym","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"International Conference on High Performance Computing","order":2,"name":"conference_name","label":"Conference Name","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Hamburg","order":3,"name":"conference_city","label":"Conference City","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Germany","order":4,"name":"conference_country","label":"Conference Country","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"2023","order":5,"name":"conference_year","label":"Conference Year","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"21 May 2023","order":7,"name":"conference_start_date","label":"Conference Start Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"25 May 2023","order":8,"name":"conference_end_date","label":"Conference End Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"38","order":9,"name":"conference_number","label":"Conference Number","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"supercomputing2023","order":10,"name":"conference_id","label":"Conference ID","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"https:\/\/www.isc-hpc.com\/","order":11,"name":"conference_url","label":"Conference URL","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Double-blind","order":1,"name":"type","label":"Type","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"Linklings","order":2,"name":"conference_management_system","label":"Conference Management System","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"78","order":3,"name":"number_of_submissions_sent_for_review","label":"Number of Submissions Sent for Review","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"21","order":4,"name":"number_of_full_papers_accepted","label":"Number of Full Papers Accepted","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"0","order":5,"name":"number_of_short_papers_accepted","label":"Number of Short Papers Accepted","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"27% - The value is computed by the equation \"Number of Full Papers Accepted \/ Number of Submissions Sent for Review * 100\" and then rounded to a whole number.","order":6,"name":"acceptance_rate_of_full_papers","label":"Acceptance Rate of Full Papers","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"3.74","order":7,"name":"average_number_of_reviews_per_paper","label":"Average Number of Reviews per Paper","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"4.49","order":8,"name":"average_number_of_papers_per_reviewer","label":"Average Number of Papers per Reviewer","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"No","order":9,"name":"external_reviewers_involved","label":"External Reviewers Involved","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}}]}}