{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,10]],"date-time":"2026-01-10T08:02:20Z","timestamp":1768032140343,"version":"3.49.0"},"reference-count":37,"publisher":"SAGE Publications","issue":"4","license":[{"start":{"date-parts":[[2025,5,20]],"date-time":"2025-05-20T00:00:00Z","timestamp":1747699200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["The International Journal of High Performance Computing Applications"],"published-print":{"date-parts":[[2025,7]]},"abstract":"<jats:p>\n            Looking for high performance hydrocode simulations on heterogeneous architectures, we detail a performance portable implementation of a second-order accurate 2-D Cartesian explicit CFD solver using Julia\u2019s Just-in-Time (JIT) compilation. In this work, a custom abstraction layer is used targeting two Julia packages, Polyester.jl for efficient shared memory multithreading on CPUs and KernelAbstractions.jl for appropriate backends on GPUs. Using very same optimizations and data structures than those used with Julia, comparisons to static C++ Kokkos compilation are then provided, including speedups and energy consumptions on high-end CPUs and GPUs available mid-2022. Using a single 64-core CPU with a few million cells to benefit from cache effects in multithread mode, the Julia code (\u22480.5 \u00d7 10\n            <jats:sup>9<\/jats:sup>\n            cell-cycles\/s) is superior to its C++ Kokkos counterpart, with a very same lower limit (\u22480.16 \u00d7 10\n            <jats:sup>9<\/jats:sup>\n            cell-cycles\/s) for higher numbers of cells. 
Using one GPU, the C++ Kokkos implementation is slightly superior, the Julia implementation tending to the same upper limit (\u22481.5 \u00d7 10\n            <jats:sup>9<\/jats:sup>\n            cell-cycles\/s) when the GPU memory (40 GiB) is entirely used. With a small number of floating-point operations per cell and time step, Cartesian solvers are singular in the CFD landscape, such solvers being essentially memory bandwidth bound on both CPUs and GPUs. In this context, at the compute node level, the compute capability of the CPU(s) cannot be underestimated, with (much) more memory available per cell for multi-physics variables and - year over year - improved memory bandwidths, larger caches and higher floating-point capabilities. Indeed, for high performance computing (HPC) simulations involving many MPI processes, communications between compute nodes become significant and best efforts are requested to overlap communications with computations. The performance portable Julia implementation of the CFD solver presented here combines domain decomposition and directional splitting using a static scheduling approach. Benefits from asynchronous communications appear with 16 GPUs on 4 nodes. At best, on this small-size configuration, the GPU mode of the Julia performance portable code brings at full GPUs\u2019 memory capacity a factor of 14\u00d7 in performance and a factor of 8\u00d7 in device energy efficiency compared to the CPU mode. 
Such a work, among others, confirms the potential of the Julia programming language and its emerging HPC software stack, offering (i) the power of a scripting language, (ii) the performances of a compiled language, and perhaps even more importantly (iii) an access to a compilation toolchain with new opportunities for developers to tackle heterogeneous computing architectures.\n          <\/jats:p>","DOI":"10.1177\/10943420251341179","type":"journal-article","created":{"date-parts":[[2025,5,20]],"date-time":"2025-05-20T12:04:12Z","timestamp":1747742652000},"page":"481-501","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":1,"title":["Julia versus C++ Kokkos for performance portable Cartesian CFD solvers on heterogeneous architectures"],"prefix":"10.1177","volume":"39","author":[{"ORCID":"https:\/\/orcid.org\/0009-0008-6261-6472","authenticated-orcid":false,"given":"Luc","family":"Briand","sequence":"first","affiliation":[{"name":"Laboratoire en Informatique Haute Performance pour le Calcul et La Simulation, Universit\u00e9 Paris-Saclay, CEA DAM DIF, Arpajon, France"},{"name":"CEA, DAM, DIF, DSSI, Arpajon, France"}]},{"given":"Herv\u00e9","family":"Jourdren","sequence":"additional","affiliation":[{"name":"Laboratoire en Informatique Haute Performance pour le Calcul et La Simulation, Universit\u00e9 Paris-Saclay, CEA DAM DIF, Arpajon, France"},{"name":"CEA, DAM, DIF, DSSI, Arpajon, France"}]},{"given":"Marc","family":"P\u00e9rache","sequence":"additional","affiliation":[{"name":"Laboratoire en Informatique Haute Performance pour le Calcul et La Simulation, Universit\u00e9 Paris-Saclay, CEA DAM DIF, Arpajon, France"},{"name":"CEA, DAM, DIF, DSSI, Arpajon, France"}]}],"member":"179","published-online":{"date-parts":[[2025,5,20]]},"reference":[{"key":"e_1_3_5_2_1","unstructured":"Bauer C (2023a) LIKWID.jl. Original-date:2020-10-20T14:37:42Z. 
https:\/\/github.com\/JuliaPerf\/LIKWID.jl"},{"key":"e_1_3_5_3_1","unstructured":"Bauer C (2023b) ThreadPinning.jl. Original-date: 2021-10-13T09:30:43Z. https:\/\/github.com\/carstenbauer\/ThreadPinning.jl"},{"key":"e_1_3_5_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/P3HPC49587.2019.00012"},{"key":"e_1_3_5_5_1","unstructured":"Besard T (2022) oneAPI.jl. https:\/\/github.com\/JuliaGPU\/oneAPI.jl"},{"key":"e_1_3_5_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2018.2872064"},{"key":"e_1_3_5_7_1","doi-asserted-by":"publisher","DOI":"10.1137\/141000671"},{"key":"e_1_3_5_8_1","doi-asserted-by":"publisher","DOI":"10.21105\/jcon.00068"},{"key":"e_1_3_5_9_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jpdc.2014.07.003"},{"key":"e_1_3_5_10_1","unstructured":"Churavy V (2023) KernelAbstractions.jl. https:\/\/github.com\/JuliaGPU\/KernelAbstractions.jl"},{"key":"e_1_3_5_11_1","unstructured":"Churavy V Godoy WF Bauer C et al. (2022) Bridging HPC communities through the Julia programming language. Submitted for review. https:\/\/arxiv.org\/abs\/2211.02740"},{"key":"e_1_3_5_12_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.crma.2015.11.008"},{"key":"e_1_3_5_13_1","unstructured":"Danial A (2021) cloc: v1.92. https:\/\/doi.org\/10.5281\/zenodo.5760077"},{"key":"e_1_3_5_14_1","unstructured":"Djoudi L Barthou D Carribault P et al. (2005) Maqao : modular assembler quality analyzer and optimizer for itanium 2. https:\/\/www.labri.fr\/perso\/barthou\/ps\/maqao.pdf"},{"key":"e_1_3_5_15_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.crma.2009.12.008"},{"key":"e_1_3_5_16_1","unstructured":"Elrod C (2023a) LoopVectorization.jl. Original-date: 2019-01-14T05:55:52Z. https:\/\/github.com\/JuliaSIMD\/LoopVectorization.jl"},{"key":"e_1_3_5_17_1","unstructured":"Elrod C (2023b) Polyester.jl. Original-date: 2021-02-20T01:45:49Z. 
https:\/\/github.com\/JuliaSIMD\/Polyester.jl"},{"key":"e_1_3_5_18_1","doi-asserted-by":"publisher","DOI":"10.1177\/10943420211028940"},{"key":"e_1_3_5_19_1","first-page":"1","article-title":"The Spack package manager: bringing order to HPC software chaos","author":"Gamblin T","year":"2015","unstructured":"Gamblin T, LeGendre M, Collette MR, et al. (2015) The Spack package manager: bringing order to HPC software chaos. IEEE Computer Society: 1\u201312. https:\/\/www.computer.org\/csdl\/proceedings-article\/sc\/2015\/2807623\/12OmNBf94Xq.ISSN:2167-4337","journal-title":"IEEE Computer Society"},{"key":"e_1_3_5_20_1","doi-asserted-by":"crossref","unstructured":"Godoy WF Valero-Lara P Anderson C et al. (2023) Julia as a unifying end-to-end workflow language on the Frontier exascale system. In: Proceedings of the SC \u201923 workshops of the international conference on high performance computing network storage and analysis SC-W \u201923. New York NY USA: Association for Computing Machinery 1989\u20131999.","DOI":"10.1145\/3624062.3624278"},{"key":"e_1_3_5_21_1","volume-title":"The International Journal of High Performance Computing Applications","author":"Grete P","year":"2022","unstructured":"Grete P, Dolence JC, Miller JM, et al. (2022) Parthenon\u2014a performance portable block-structured adaptive mesh refinement framework. The International Journal of High Performance Computing Applications. Publisher: Sage Publications Ltd STM, 10943420221143775."},{"key":"e_1_3_5_22_1","unstructured":"Gruber T Eitzinger J Hager G et al. (2023) Likwid. https:\/\/zenodo.org\/records\/10105559"},{"key":"e_1_3_5_23_1","unstructured":"Haber T (2023) PAPI.jl. Original-date: 2020-10-28T15:46:18Z. https:\/\/github.com\/JuliaPerf\/PAPI.jl"},{"key":"e_1_3_5_24_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jcp.2008.10.005"},{"key":"e_1_3_5_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/PMBS51919.2020.00008"},{"key":"e_1_3_5_26_1","unstructured":"Innes JM (2023) MacroTools.jl. 
Original-date: 2015-07-09T14:20:08Z. https:\/\/github.com\/FluxML\/MacroTools.jl"},{"key":"e_1_3_5_27_1","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-27039-6_19"},{"key":"e_1_3_5_28_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jpdc.2017.04.002"},{"key":"e_1_3_5_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/PMBS54543.2021.00016"},{"key":"e_1_3_5_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCSim.2011.5999834"},{"key":"e_1_3_5_31_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-85451-7_9"},{"key":"e_1_3_5_32_1","doi-asserted-by":"crossref","unstructured":"Ramadhan A Wagner GL Hill C et al. (2020) Oceananigans.jl: fast and friendly geophysical fluid dynamics on GPUs. Issue: 53 Pages: 2018 Publication Title: Journal of Open Source Software Volume: 5 original-date: 2018-10-13T14:15:44Z. https:\/\/github.com\/CliMA\/Oceananigans.jl","DOI":"10.21105\/joss.02018"},{"key":"e_1_3_5_33_1","unstructured":"Samaroo J Smirnov A Churavy V et al. (2023) AMDGPU.jl. https:\/\/github.com\/JuliaGPU\/AMDGPU.jl. Original-date: 2020-07-02T16:16:24Z"},{"key":"e_1_3_5_34_1","unstructured":"Schanen M Maldonado A Pacaud F et al. (2020) ExaPF.jl: a power flow solver for GPUs. In: Proceedings of JuliaCon 2020. Groupe d\u2019\u00e9tudes et de recherche en analyse des d\u00e9cisions (GERAD). https:\/\/www.gerad.ca\/fr\/papers\/G-2020-74"},{"key":"e_1_3_5_35_1","unstructured":"Schlottke-Lakemper M Gassner GJ Ranocha H et al. (2021) Trixi.jl: adaptive high-order numerical simulations of hyperbolic PDEs in Julia. 
https:\/\/github.com\/trixi-framework\/Trixi.jl"},{"key":"e_1_3_5_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2021.3097283"},{"key":"e_1_3_5_37_1","doi-asserted-by":"publisher","DOI":"10.1016\/0021-9991(84)90142-6"},{"key":"e_1_3_5_38_1","doi-asserted-by":"publisher","DOI":"10.21105\/joss.01370"}],"container-title":["The International Journal of High Performance Computing Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/10943420251341179","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/10943420251341179","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/10943420251341179","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,8]],"date-time":"2025-07-08T17:30:57Z","timestamp":1751995857000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/10943420251341179"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,5,20]]},"references-count":37,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2025,7]]}},"alternative-id":["10.1177\/10943420251341179"],"URL":"https:\/\/doi.org\/10.1177\/10943420251341179","relation":{},"ISSN":["1094-3420","1741-2846"],"issn-type":[{"value":"1094-3420","type":"print"},{"value":"1741-2846","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,5,20]]}}}