{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,7]],"date-time":"2025-07-07T00:26:05Z","timestamp":1751847965706,"version":"3.40.3"},"publisher-location":"Cham","reference-count":41,"publisher":"Springer International Publishing","isbn-type":[{"type":"print","value":"9783031104183"},{"type":"electronic","value":"9783031104190"}],"license":[{"start":{"date-parts":[[2022,1,1]],"date-time":"2022-01-01T00:00:00Z","timestamp":1640995200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,7,1]],"date-time":"2022-07-01T00:00:00Z","timestamp":1656633600000},"content-version":"vor","delay-in-days":181,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>It is common in the HPC community that the achieved performance with just CPUs is limited for many computational cases. The EuroHPC pre-exascale and the coming exascale systems are mainly focused on accelerators, and some of the largest upcoming supercomputers such as LUMI and Frontier will be powered by AMD Instinct\u2122 accelerators. However, these new systems create many challenges for developers who are not familiar with the new ecosystem or with the required programming models that can be used to program for heterogeneous architectures. In this paper, we present some of the more well-known programming models to program for current and future GPU systems. We then measure the performance of each approach using a benchmark and a mini-app, test with various compilers, and tune the codes where necessary. 
Finally, we compare the performance, where possible, between the NVIDIA Volta (V100), Ampere (A100) GPUs, and the AMD MI100 GPU.<\/jats:p>","DOI":"10.1007\/978-3-031-10419-0_6","type":"book-chapter","created":{"date-parts":[[2022,6,30]],"date-time":"2022-06-30T17:07:51Z","timestamp":1656608871000},"page":"79-101","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["Evaluating GPU Programming Models for\u00a0the\u00a0LUMI Supercomputer"],"prefix":"10.1007","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5571-4823","authenticated-orcid":false,"given":"George S.","family":"Markomanolis","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Aksel","family":"Alpay","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9841-4057","authenticated-orcid":false,"given":"Jeffrey","family":"Young","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8634-4634","authenticated-orcid":false,"given":"Michael","family":"Klemm","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6259-7453","authenticated-orcid":false,"given":"Nicholas","family":"Malaya","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1597-0811","authenticated-orcid":false,"given":"Aniello","family":"Esposito","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3396-6154","authenticated-orcid":false,"given":"Jussi","family":"Heikonen","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sergei","family":"Bastrakov","sequence":"additional","affil
iation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3844-3697","authenticated-orcid":false,"given":"Alexander","family":"Debus","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4861-5584","authenticated-orcid":false,"given":"Thomas","family":"Kluge","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8965-1149","authenticated-orcid":false,"given":"Klaus","family":"Steiniger","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7839-4386","authenticated-orcid":false,"given":"Jan","family":"Stephan","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1642-0459","authenticated-orcid":false,"given":"Rene","family":"Widera","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8258-3881","authenticated-orcid":false,"given":"Michael","family":"Bussmann","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2022,7,1]]},"reference":[{"key":"6_CR1","unstructured":"CSC LUMI supercomputer. https:\/\/www.lumi-supercomputer.eu\/lumis-full-system-architecture-revealed\/"},{"key":"6_CR2","unstructured":"Frontier web page. https:\/\/www.olcf.ornl.gov\/frontier\/"},{"key":"6_CR3","unstructured":"NVIDIA. CUDA. https:\/\/developer.nvidia.com\/about-cuda"},{"key":"6_CR4","doi-asserted-by":"publisher","unstructured":"Stone, J.E., Gohara, D., Shi, G.: OpenCL: a parallel programming standard for heterogeneous computing systems. In: Computing in Science & Engineering, vol. 12, no. 3, pp. 66\u201373, May-June 2010. 
https:\/\/doi.org\/10.1109\/MCSE.2010.69","DOI":"10.1109\/MCSE.2010.69"},{"key":"6_CR5","unstructured":"OpenMP Architecture Review Board. OpenMP Application Programming Interface, version 4.0. https:\/\/openmp.org\/40pdf"},{"key":"6_CR6","unstructured":"OpenACC Specification 3.0. https:\/\/www.openacc.org\/sites\/default\/files\/inline-images\/Specification\/OpenACC.3.0.pdf"},{"key":"6_CR7","series-title":"Lecture Notes in Computer Science","doi-asserted-by":"publisher","first-page":"25","DOI":"10.1007\/978-3-030-74224-9_2","volume-title":"Accelerator Programming Using Directives","author":"JH Davis","year":"2021","unstructured":"Davis, J.H., Daley, C., Pophale, S., Huber, T., Chandrasekaran, S., Wright, N.J.: Performance assessment of OpenMP compilers targeting NVIDIA V100 GPUs. In: Bhalachandra, S., Wienke, S., Chandrasekaran, S., Juckeland, G. (eds.) WACCPD 2020. LNCS, vol. 12655, pp. 25\u201344. Springer, Cham (2021). https:\/\/doi.org\/10.1007\/978-3-030-74224-9_2"},{"key":"6_CR8","doi-asserted-by":"crossref","unstructured":"Poenaru, A., Lin, W.-C., McIntosh-Smith, S.: A performance analysis of modern parallel programming models using a compute-bound application. In: 36th International Conference, ISC High Performance 2021, Frankfurt, Germany (2021)","DOI":"10.1007\/978-3-030-78713-4_18"},{"key":"6_CR9","doi-asserted-by":"crossref","unstructured":"Khalilov, M., Timoveev, A.: Performance analysis of CUDA, OpenACC and OpenMP programming models on TESLA V100 GPU. In: Journal of Physics: Conference Series, vol. 1740 (2021)","DOI":"10.1088\/1742-6596\/1740\/1\/012056"},{"key":"6_CR10","doi-asserted-by":"crossref","unstructured":"Deakin, T., McIntosh-Smith, S.: Evaluating the performance of HPC-style SYCL applications. In: Proceedings of the International Workshop on OpenCL (2020)","DOI":"10.1145\/3388333.3388643"},{"key":"6_CR11","doi-asserted-by":"publisher","unstructured":"Deakin, T., et al.: Performance portability across diverse computer architectures. 
In: 2019 IEEE\/ACM International Workshop on Performance, Portability and Productivity in HPC (P3HPC), pp. 1\u201313 (2019). https:\/\/doi.org\/10.1109\/P3HPC49587.2019.00006","DOI":"10.1109\/P3HPC49587.2019.00006"},{"key":"6_CR12","unstructured":"AMD. ROCm Platform. https:\/\/github.com\/RadeonOpenCompute\/ROCm"},{"key":"6_CR13","unstructured":"AMD. ROCm Documentation. https:\/\/rocmdocs.amd.com\/en\/latest\/"},{"key":"6_CR14","unstructured":"AMD. HIP. https:\/\/github.com\/ROCm-Developer-Tools\/HIP"},{"key":"6_CR15","unstructured":"AMD. HIPify Tools. https:\/\/github.com\/ROCm-Developer-Tools\/HIPIFY"},{"key":"6_CR16","unstructured":"AMD. HIP Porting Guide. https:\/\/github.com\/RadeonOpenCompute\/ROCm_Documentation\/blob\/master\/Programming_Guides\/HIP-porting-guide.rst"},{"key":"6_CR17","unstructured":"CSC. Porting GPU Codes to HIP. https:\/\/github.com\/csc-training\/hip"},{"key":"6_CR18","doi-asserted-by":"crossref","unstructured":"de Supinski, B.R., et al.: The ongoing evolution of OpenMP. In: Proceedings of the IEEE, vol. 106, no. 11, pp. 2004\u20132019, November 2018","DOI":"10.1109\/JPROC.2018.2853600"},{"key":"6_CR19","unstructured":"Khronos Group. SYCL 2020 Specification. https:\/\/www.khronos.org\/registry\/SYCL\/specs\/sycl-2020\/pdf\/sycl-2020.pdf"},{"key":"6_CR20","unstructured":"Codeplay Software. ComputeCpp. https:\/\/www.codeplay.com\/solutions\/ecosystem\/"},{"key":"6_CR21","unstructured":"Intel Corporation. SYCL* Compiler and Runtimes. https:\/\/github.com\/intel\/llvm"},{"key":"6_CR22","doi-asserted-by":"crossref","unstructured":"Alpay, A., Heuveline, V.: SYCL beyond OpenCL: the architecture, current state and future direction of hipSYCL. In: Proceedings of the International Workshop on OpenCL (IWOCL 2020), Association for Computing Machinery, New York, Article vol. 8, no. 1 (2020). https:\/\/github.com\/illuhad\/hipSYCL","DOI":"10.1145\/3388333.3388658"},{"key":"6_CR23","unstructured":"triSYCL. 
https:\/\/github.com\/trisycl\/trisycl"},{"key":"6_CR24","unstructured":"ORNL and Mentor Graphics. https:\/\/www.olcf.ornl.gov\/2020\/09\/03\/oak-ridge-leadership-computing-facility-fosters-gcc-compiler-development-with-mentor-contract\/"},{"key":"6_CR25","doi-asserted-by":"crossref","unstructured":"Denny, J.E., Lee, S., Vetter, J.S.: Clacc: translating OpenACC to OpenMP in Clang. In: 2018 IEEE\/ACM 5th Workshop on the LLVM Compiler Infrastructure in HPC (LLVM-HPC), Dallas, TX, USA (2018)","DOI":"10.1109\/LLVM-HPC.2018.8639349"},{"key":"6_CR26","unstructured":"Clacc. https:\/\/github.com\/llvm-doe-org\/llvm-project\/tree\/clacc\/main"},{"key":"6_CR27","doi-asserted-by":"crossref","unstructured":"Zenker, E., et al.: Alpaka - an abstraction library for parallel kernel acceleration. In: 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 631\u2013640, May 2016","DOI":"10.1109\/IPDPSW.2016.50"},{"key":"6_CR28","doi-asserted-by":"crossref","unstructured":"Bussmann, M., et al.: Radiative signature of the relativistic Kelvin-Helmholtz instability. In: SC 2013: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, pp. 1\u201312 (2013)","DOI":"10.1145\/2503210.2504564"},{"key":"6_CR29","unstructured":"Widera, R., Bastrakov, S., Ehrig, S., Kelling, J., Stephan, J.: Cupla - C++ User interface for the Platform Independent Library alpaka. https:\/\/rodare.hzdr.de\/record\/1103"},{"key":"6_CR30","doi-asserted-by":"publisher","unstructured":"Edwards, H.C., Trott, C.R., Sunderland, D.: Kokkos: enabling manycore performance portability through polymorphic memory access patterns. J. Parall. Distrib. Comput. 74, 3202\u20133216 (2014). https:\/\/doi.org\/10.1016\/j.jpdc.2014.07.003","DOI":"10.1016\/j.jpdc.2014.07.003"},{"key":"6_CR31","unstructured":"AMD. hipfort. 
https:\/\/github.com\/ROCmSoftwarePlatform\/hipfort"},{"key":"6_CR32","doi-asserted-by":"publisher","unstructured":"Deakin, T., Price, J., Martineau, M., McIntosh-Smith, S.: GPU-STREAM v2.0: benchmarking the achievable memory bandwidth of many-core processors across diverse parallel programming models. In: P^3MA Workshop at ISC High Performance, Frankfurt, Germany (2016). https:\/\/doi.org\/10.1007\/978-3-319-46079-6_34","DOI":"10.1007\/978-3-319-46079-6_34"},{"key":"6_CR33","unstructured":"Deakin, T., McIntosh-Smith, S.: BabelStream. https:\/\/github.com\/UoB-HPC\/BabelStream"},{"key":"6_CR34","unstructured":"miniBUDE. https:\/\/github.com\/UoB-HPC\/miniBUDE\/"},{"key":"6_CR35","unstructured":"CSC. Puhti Supercomputer. https:\/\/docs.csc.fi\/computing\/systems-puhti\/"},{"key":"6_CR36","unstructured":"CSC. Mahti Supercomputer. https:\/\/docs.csc.fi\/computing\/systems-mahti\/"},{"key":"6_CR37","unstructured":"CUPLA BabelStream Fork, v3.4-alpaka release. https:\/\/github.com\/jyoung3131\/BabelStream\/releases\/tag\/v3.4-alpaka"},{"key":"6_CR38","doi-asserted-by":"publisher","first-page":"37","DOI":"10.1016\/j.jpdc.2017.04.002","volume":"107","author":"E Konstantinidis","year":"2017","unstructured":"Konstantinidis, E., Cotronis, Y.: A quantitative roofline model for GPU kernel performance estimation using micro-benchmarks and hardware metric profiling. J. Parall. Distrib. Comput. 107, 37\u201356 (2017)","journal-title":"J. Parall. Distrib. Comput."},{"key":"6_CR39","unstructured":"Mixbench. https:\/\/github.com\/ekondis\/mixbench"},{"key":"6_CR40","unstructured":"Reproduce the results of the paper Evaluating GPU Programming Models for the LUMI Supercomputer. https:\/\/zenodo.org\/record\/6307447"},{"key":"6_CR41","unstructured":"Elbencho. 
https:\/\/github.com\/breuner\/elbencho"}],"container-title":["Lecture Notes in Computer Science","Supercomputing Frontiers"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/978-3-031-10419-0_6","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,6,30]],"date-time":"2022-06-30T17:13:50Z","timestamp":1656609230000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/978-3-031-10419-0_6"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022]]},"ISBN":["9783031104183","9783031104190"],"references-count":41,"URL":"https:\/\/doi.org\/10.1007\/978-3-031-10419-0_6","relation":{},"ISSN":["0302-9743","1611-3349"],"issn-type":[{"type":"print","value":"0302-9743"},{"type":"electronic","value":"1611-3349"}],"subject":[],"published":{"date-parts":[[2022]]},"assertion":[{"value":"1 July 2022","order":1,"name":"first_online","label":"First Online","group":{"name":"ChapterHistory","label":"Chapter History"}},{"value":"SCFA","order":1,"name":"conference_acronym","label":"Conference Acronym","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Asian Conference on Supercomputing Frontiers","order":2,"name":"conference_name","label":"Conference Name","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Singapore","order":3,"name":"conference_city","label":"Conference City","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Singapore","order":4,"name":"conference_country","label":"Conference Country","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"2022","order":5,"name":"conference_year","label":"Conference Year","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"1 March 2022","order":7,"name":"conference_start_date","label":"Conference Start 
Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"3 March 2022","order":8,"name":"conference_end_date","label":"Conference End Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"7","order":9,"name":"conference_number","label":"Conference Number","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"scfa2022","order":10,"name":"conference_id","label":"Conference ID","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Single-blind","order":1,"name":"type","label":"Type","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"EasyChair","order":2,"name":"conference_management_system","label":"Conference Management System","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"21","order":3,"name":"number_of_submissions_sent_for_review","label":"Number of Submissions Sent for Review","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"8","order":4,"name":"number_of_full_papers_accepted","label":"Number of Full Papers Accepted","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"0","order":5,"name":"number_of_short_papers_accepted","label":"Number of Short Papers Accepted","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"38% - The value is computed by the equation \"Number of Full Papers Accepted \/ Number of Submissions Sent for Review * 100\" and then rounded to a whole number.","order":6,"name":"acceptance_rate_of_full_papers","label":"Acceptance Rate of Full Papers","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information 
(provided by the conference organizers)"}},{"value":"3.8","order":7,"name":"average_number_of_reviews_per_paper","label":"Average Number of Reviews per Paper","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"3.5","order":8,"name":"average_number_of_papers_per_reviewer","label":"Average Number of Papers per Reviewer","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"No","order":9,"name":"external_reviewers_involved","label":"External Reviewers Involved","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}}]}}