{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,30]],"date-time":"2025-07-30T13:29:20Z","timestamp":1753882160412,"version":"3.41.2"},"reference-count":29,"publisher":"SAGE Publications","issue":"4","license":[{"start":{"date-parts":[[2025,4,18]],"date-time":"2025-04-18T00:00:00Z","timestamp":1744934400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"funder":[{"name":"PNRR National Centre for High Performance Simulations, Computing and Data Analysis","award":["(Spoke 1, CUP: I53C22000690001)"],"award-info":[{"award-number":["(Spoke 1, CUP: I53C22000690001)"]}]},{"DOI":"10.13039\/501100001711","name":"Schweizerischer Nationalfonds zur F\u00f6rderung der Wissenschaftlichen Forschung","doi-asserted-by":"publisher","award":["205602"],"award-info":[{"award-number":["205602"]}],"id":[{"id":"10.13039\/501100001711","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["The International Journal of High Performance Computing Applications"],"published-print":{"date-parts":[[2025,7]]},"abstract":"<jats:p>\n            Speed and efficiency of codes for atomistic simulations can be improved through refactoring and tailoring for GPU architectures. This activity, however, comes with associated, often overlooked, costs, namely a reduced readability and flexibility upon optimization and a non-negligible development time. The first element becomes particularly cogent when who carries out the code GPU porting task is not the creator of the algorithm. In this manuscript we investigate these issues by developing and comparing a CUDA (Compute Unified Device Architecture) and an OpenACC version of the MaZe simulative engine, a recently proposed tool for\n            <jats:italic>first principles<\/jats:italic>\n            molecular dynamics with interactions computed at the Orbital Free Density Functional level. We developed in approximately the same amount of time the two code bases. Given that this code bears several computational bottlenecks, and given the development time restraints, we ultimately found that OpenACC leads to a code that is not only simpler to maintain, but also faster, as in the OpenACC code base more routines were optimized compared to CUDA.\n          <\/jats:p>","DOI":"10.1177\/10943420251331673","type":"journal-article","created":{"date-parts":[[2025,4,19]],"date-time":"2025-04-19T11:44:34Z","timestamp":1745063074000},"page":"502-518","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":0,"title":["A tale of two codes: CUDA vs OpenACC for mass-zero constrained dynamics"],"prefix":"10.1177","volume":"39","author":[{"given":"Alessia","family":"Vignolo","sequence":"first","affiliation":[{"name":"Liguria Digitale, Genova, Italy"},{"name":"Computational and Chemical Biology, Fondazione Istituto Italiano di Tecnologia, Genova, Italy"}]},{"given":"Taylor James","family":"Baird","sequence":"additional","affiliation":[{"name":"Centre Europ\u00e9en de Calcul Atomique et Mol\u00e9culaire (CECAM), Ecole Polytechnique F\u00e9d\u00e9rale de Lausanne, Lausanne, Switzerland"}]},{"given":"Filippo","family":"Spiga","sequence":"additional","affiliation":[{"name":"NVIDIA Corporation, Cambridge, UK"}]},{"given":"Claudia","family":"Canevari","sequence":"additional","affiliation":[{"name":"Liguria Digitale, Genova, Italy"},{"name":"Computational and Chemical Biology, Fondazione Istituto Italiano di Tecnologia, Genova, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7131-3210","authenticated-orcid":false,"given":"Alessandro","family":"Coretti","sequence":"additional","affiliation":[{"name":"Faculty of Physics, University of Vienna, Vienna, Austria"}]},{"given":"Rodolphe","family":"Vuilleumier","sequence":"additional","affiliation":[{"name":"PASTEUR, D\u00e9partement de chimie, \u00c9cole Normale Sup\u00e9rieure, PSL University, Sorbonne Universit\u00e9, CNRS, Paris, France"}]},{"given":"Andrea","family":"Cavalli","sequence":"additional","affiliation":[{"name":"Centre Europ\u00e9en de Calcul Atomique et Mol\u00e9culaire (CECAM), Ecole Polytechnique F\u00e9d\u00e9rale de Lausanne, Lausanne, Switzerland"}]},{"given":"Sara","family":"Bonella","sequence":"additional","affiliation":[{"name":"Centre Europ\u00e9en de Calcul Atomique et Mol\u00e9culaire (CECAM), Ecole Polytechnique F\u00e9d\u00e9rale de Lausanne, Lausanne, Switzerland"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8371-2270","authenticated-orcid":false,"given":"Sergio","family":"Decherchi","sequence":"additional","affiliation":[{"name":"Data Science and Computation Facility, Fondazione, Istituto Italiano di Tecnologia, Genova, Italy"}]}],"member":"179","published-online":{"date-parts":[[2025,4,18]]},"reference":[{"key":"e_1_3_4_2_1","unstructured":"AMD (2023) HIP: heterogeneous-compute interface for portability documentation. URL: https:\/\/rocm.docs.amd.com\/en\/latest\/Programming_Guides\/HIP-GUIDE.html (Accessed November 2023)."},{"key":"e_1_3_4_3_1","doi-asserted-by":"publisher","DOI":"10.1039\/D0CP00163E"},{"key":"e_1_3_4_4_1","doi-asserted-by":"publisher","DOI":"10.1063\/1.5055704"},{"key":"e_1_3_4_5_1","doi-asserted-by":"publisher","DOI":"10.1063\/5.0007192"},{"key":"e_1_3_4_6_1","doi-asserted-by":"publisher","DOI":"10.1063\/5.0130117"},{"key":"e_1_3_4_7_1","doi-asserted-by":"publisher","DOI":"10.1140\/epjb\/s10051-021-00165-0"},{"key":"e_1_3_4_8_1","unstructured":"Group K (2020) SYCL specification. Version 2020 Provisional. URL: https:\/\/www.khronos.org\/registry\/SYCL\/specs\/sycl-2020\/html\/sycl-2020.html."},{"key":"e_1_3_4_9_1","doi-asserted-by":"publisher","unstructured":"Hoshino T Maruyama N Matsuoka S et al. (2013) Cuda vs openacc: performance case studies with kernel benchmarks and a memory-bound cfd application. In 2013 13th IEEE\/ACM international symposium on cluster cloud and grid computing Delft Netherlands 13\u201316 May 2013. pp. 136\u2013143. DOI: 10.1109\/CCGrid.2013.12.","DOI":"10.1109\/CCGrid.2013.12"},{"key":"e_1_3_4_10_1","doi-asserted-by":"publisher","DOI":"10.21105\/joss.02352"},{"key":"e_1_3_4_11_1","doi-asserted-by":"publisher","DOI":"10.1088\/1742-6596\/1740\/1\/012056"},{"key":"e_1_3_4_12_1","doi-asserted-by":"publisher","DOI":"10.1201\/9781003176664-14"},{"key":"e_1_3_4_13_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.softx.2017.11.002"},{"key":"e_1_3_4_14_1","doi-asserted-by":"publisher","unstructured":"Li X Shih PC (2018) Performance comparison of cuda and openacc based on optimizations. In: Proceedings of the 2018 2nd high performance computing and cluster technologies conference Beijing China 22\u201324 June 2018. ACM. DOI: 10.1145\/3234664.3234681.","DOI":"10.1145\/3234664.3234681"},{"key":"e_1_3_4_15_1","doi-asserted-by":"publisher","DOI":"10.21105\/joss.02373"},{"key":"e_1_3_4_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2004.840301"},{"key":"e_1_3_4_17_1","doi-asserted-by":"publisher","unstructured":"Memeti S Li L Pllana S et al. (2017) Benchmarking opencl openacc openmp and cuda: programming productivity performance and energy consumption. In Proceedings of the 2017 workshop on adaptive resource management and scheduling for cloud computing. PODC\u2019 17 ACM Washington DC 28 July 2017 1\u20136. DOI:10.1145\/3110355.3110356.","DOI":"10.1145\/3110355.3110356"},{"key":"e_1_3_4_18_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jocs.2015.04.022"},{"key":"e_1_3_4_19_1","unstructured":"NVIDIA (2024) CUDA toolkit documentation v12.5."},{"key":"e_1_3_4_20_1","unstructured":"OpenACC-Standard.org (2020) The OpenACC application programming interface. Ver. 3.1."},{"key":"e_1_3_4_21_1","unstructured":"OpenMP Architecture Review Board (2015) OpenMP application programming interface. Ver. 4.5."},{"key":"e_1_3_4_22_1","unstructured":"Oyarzun G Mira D Houzeaux G (2021) Performance assessment of cuda and openacc in large scale combustion simulations. ArXiv Preprint arXiv:2107.11541."},{"key":"e_1_3_4_23_1","unstructured":"RISC-V (2017) The RISC-V instruction set manual. URL: https:\/\/riscv.org\/wp-content\/uploads\/2017\/05\/riscv-spec-v2.2.pdf."},{"key":"e_1_3_4_24_1","unstructured":"Rozi\u00e8re B Gehring J Gloeckle F et al. (2024) Code llama: open foundation models for code. ArXiv Preprint arXiv:2308.12950."},{"key":"e_1_3_4_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/MCSE.2010.69"},{"key":"e_1_3_4_26_1","doi-asserted-by":"publisher","DOI":"10.1017\/S0305004100011683"},{"key":"e_1_3_4_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2021.3097283"},{"key":"e_1_3_4_28_1","doi-asserted-by":"publisher","unstructured":"Valero-Lara P Lee S Gonzalez-Tallada M et al. (2022) Kokkacc: enhancing kokkos with openacc. In 2022 workshop on accelerator programming using directives (WACCPD) Dallas TX 13\u201318 November 2022 pp. 32\u201342. DOI: 10.1109\/WACCPD56842.2022.00009.","DOI":"10.1109\/WACCPD56842.2022.00009"},{"key":"e_1_3_4_29_1","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevB.60.16350"},{"key":"e_1_3_4_30_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF01337700"}],"container-title":["The International Journal of High Performance Computing Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/10943420251331673","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/10943420251331673","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/10943420251331673","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,8]],"date-time":"2025-07-08T17:31:03Z","timestamp":1751995863000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/10943420251331673"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,4,18]]},"references-count":29,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2025,7]]}},"alternative-id":["10.1177\/10943420251331673"],"URL":"https:\/\/doi.org\/10.1177\/10943420251331673","relation":{},"ISSN":["1094-3420","1741-2846"],"issn-type":[{"type":"print","value":"1094-3420"},{"type":"electronic","value":"1741-2846"}],"subject":[],"published":{"date-parts":[[2025,4,18]]}}}