{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,17]],"date-time":"2026-04-17T20:28:15Z","timestamp":1776457695938,"version":"3.51.2"},"reference-count":54,"publisher":"Cambridge University Press (CUP)","issue":"1","license":[{"start":{"date-parts":[[2025,1,7]],"date-time":"2025-01-07T00:00:00Z","timestamp":1736208000000},"content-version":"unspecified","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["2108970"],"award-info":[{"award-number":["2108970"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["ACI-1339893"],"award-info":[{"award-number":["ACI-1339893"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100006168","name":"National Nuclear Security Administration","doi-asserted-by":"publisher","award":["DE- NA0004144"],"award-info":[{"award-number":["DE- NA0004144"]}],"id":[{"id":"10.13039\/100006168","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100006168","name":"National Nuclear Security Administration","doi-asserted-by":"publisher","award":["DE-NA0003842"],"award-info":[{"award-number":["DE-NA0003842"]}],"id":[{"id":"10.13039\/100006168","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100006168","name":"National Nuclear Security Administration","doi-asserted-by":"publisher","award":["DE-NA0004131"],"award-info":[{"award-number":["DE-NA0004131"]}],"id":[{"id":"10.13039\/100006168","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100006168","name":"National Nuclear Security Administration","doi-asserted-by":"publisher","award":["DE-NA0004147"],"award-info":[{"award-number":["DE-NA0004147"]}],"id":[{"id":"10.13039\/100006168","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100000781","name":"European Research Council","doi-asserted-by":"publisher","award":["695008"],"award-info":[{"award-number":["695008"]}],"id":[{"id":"10.13039\/501100000781","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100000781","name":"European Research Council","doi-asserted-by":"publisher","award":["ERC-2015-AdG"],"award-info":[{"award-number":["ERC-2015-AdG"]}],"id":[{"id":"10.13039\/501100000781","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["cambridge.org"],"crossmark-restriction":true},"short-container-title":["J. Plasma Phys."],"published-print":{"date-parts":[[2025,2]]},"abstract":"<jats:p>Fully relativistic particle-in-cell (PIC) simulations are crucial for advancing our knowledge of plasma physics. Modern supercomputers based on graphics processing units (GPUs) offer the potential to perform PIC simulations of unprecedented scale, but require robust and feature-rich codes that can fully leverage their computational resources. In this work, this demand is addressed by adding GPU acceleration to the PIC code <jats:sc>Osiris<\/jats:sc>. An overview of the algorithm, which features a CUDA extension to the underlying Fortran architecture, is given. Detailed performance benchmarks for thermal plasmas are presented, which demonstrate excellent weak scaling on NERSC's Perlmutter supercomputer and high levels of absolute performance. The robustness of the code to model a variety of physical systems is demonstrated via simulations of Weibel filamentation and laser-wakefield acceleration run with dynamic load balancing. Finally, measurements and analysis of energy consumption are provided that indicate that the GPU algorithm is up to <jats:inline-formula>\n\t      <jats:alternatives>\n\t\t<jats:tex-math>${\\sim }$<\/jats:tex-math>\n\t\t<jats:inline-graphic xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" mime-subtype=\"png\" xlink:href=\"S0022377824001569_inline1.png\"\/>\n\t      <\/jats:alternatives>\n\t    <\/jats:inline-formula>14 times faster and <jats:inline-formula>\n\t      <jats:alternatives>\n\t\t<jats:tex-math>$\\sim$<\/jats:tex-math>\n\t\t<jats:inline-graphic xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" mime-subtype=\"png\" xlink:href=\"S0022377824001569_inline2.png\"\/>\n\t      <\/jats:alternatives>\n\t    <\/jats:inline-formula>7 times more energy efficient than the optimized CPU algorithm on a node-to-node basis. The described development addresses the PIC simulation community's computational demands both by contributing a robust and performant GPU-accelerated PIC code and by providing insight into efficient use of GPU hardware.<\/jats:p>","DOI":"10.1017\/s0022377824001569","type":"journal-article","created":{"date-parts":[[2025,1,7]],"date-time":"2025-01-07T07:29:29Z","timestamp":1736234969000},"update-policy":"https:\/\/doi.org\/10.1017\/policypage","source":"Crossref","is-referenced-by-count":3,"title":["Acceleration of the particle-in-cell code <scp>Osiris<\/scp> with graphics processing units"],"prefix":"10.1017","volume":"91","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6482-1735","authenticated-orcid":false,"given":"Roman P.","family":"Lee","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1450-873X","authenticated-orcid":false,"given":"Jacob R.","family":"Pierce","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4826-9001","authenticated-orcid":false,"given":"Kyle G.","family":"Miller","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0009-0007-7572-5072","authenticated-orcid":false,"given":"Maria","family":"Almanza","sequence":"additional","affiliation":[]},{"given":"Adam","family":"Tableman","sequence":"additional","affiliation":[]},{"given":"Viktor K.","family":"Decyk","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6342-6226","authenticated-orcid":false,"given":"Ricardo A.","family":"Fonseca","sequence":"additional","affiliation":[]},{"given":"E. Paulo","family":"Alves","sequence":"additional","affiliation":[]},{"given":"Warren B.","family":"Mori","sequence":"additional","affiliation":[]}],"member":"56","published-online":{"date-parts":[[2025,1,7]]},"reference":[{"key":"S0022377824001569_ref29","doi-asserted-by":"publisher","DOI":"10.1016\/j.cpc.2020.107580"},{"key":"S0022377824001569_ref31","first-page":"061301","article-title":"Generating multi-GeV electron bunches using single stage laser wakefield acceleration in a 3D nonlinear regime","volume":"10","author":"Lu","year":"2007","journal-title":"Phys. Rev. Spec. Top"},{"key":"S0022377824001569_ref33","doi-asserted-by":"publisher","DOI":"10.1016\/j.cpc.2020.107633"},{"key":"S0022377824001569_ref16","doi-asserted-by":"publisher","DOI":"10.1016\/j.cpc.2022.108421"},{"key":"S0022377824001569_ref13","doi-asserted-by":"publisher","DOI":"10.1016\/j.cpc.2013.10.013"},{"key":"S0022377824001569_ref34","doi-asserted-by":"publisher","DOI":"10.1063\/5.0065232"},{"key":"S0022377824001569_ref38","unstructured":"NVIDIA 2024 CUDA C++ programming guide, release 12.6. https:\/\/docs.nvidia.com\/cuda\/cuda-c-programming-guide\/, accessed: 2024-11-06."},{"key":"S0022377824001569_ref18","doi-asserted-by":"publisher","DOI":"10.1063\/1.1556605"},{"key":"S0022377824001569_ref35","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2021.102833"},{"key":"S0022377824001569_ref39","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2008.917757"},{"key":"S0022377824001569_ref19","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-47789-6_36"},{"key":"S0022377824001569_ref22","doi-asserted-by":"publisher","DOI":"10.1201\/9780367806934"},{"key":"S0022377824001569_ref3","doi-asserted-by":"publisher","DOI":"10.1016\/j.jocs.2012.08.012"},{"key":"S0022377824001569_ref20","doi-asserted-by":"publisher","DOI":"10.1088\/0741-3335\/55\/12\/124011"},{"key":"S0022377824001569_ref27","doi-asserted-by":"publisher","DOI":"10.1016\/j.jcp.2010.11.032"},{"key":"S0022377824001569_ref4","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2021.3084795"},{"key":"S0022377824001569_ref1","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevE.103.013306"},{"key":"S0022377824001569_ref30","doi-asserted-by":"publisher","DOI":"10.1016\/j.cpc.2017.01.001"},{"key":"S0022377824001569_ref36","doi-asserted-by":"publisher","DOI":"10.1006\/jcph.1998.6049"},{"key":"S0022377824001569_ref24","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-32149-3_5"},{"key":"S0022377824001569_ref11","doi-asserted-by":"publisher","DOI":"10.1109\/MCSE.2014.131"},{"key":"S0022377824001569_ref42","doi-asserted-by":"publisher","DOI":"10.1086\/379156"},{"key":"S0022377824001569_ref46","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-77964-1_35"},{"key":"S0022377824001569_ref23","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2009.5160980"},{"key":"S0022377824001569_ref51","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevLett.2.83"},{"key":"S0022377824001569_ref8","doi-asserted-by":"crossref","first-page":"2831","DOI":"10.1109\/TPS.2010.2064310","article-title":"PIConGPU: a fully relativistic particle-in-cell code for a GPU cluster","volume":"38","author":"Burau","year":"2010","journal-title":"IEEE Trans. Plasma Sci"},{"key":"S0022377824001569_ref7","doi-asserted-by":"crossref","first-page":"4","DOI":"10.1016\/j.jpdc.2012.04.003","article-title":"Graphics processing unit (GPU) programming strategies and trends in GPU computing","volume":"73","author":"Brodtkorb","year":"2013","journal-title":"J. Parallel Distrib. Comput"},{"key":"S0022377824001569_ref25","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2011.46"},{"key":"S0022377824001569_ref54","doi-asserted-by":"publisher","DOI":"10.1145\/3624062.3624200"},{"key":"S0022377824001569_ref52","doi-asserted-by":"publisher","DOI":"10.1016\/j.jcp.2020.109451"},{"key":"S0022377824001569_ref2","doi-asserted-by":"publisher","DOI":"10.1088\/0741-3335\/57\/11\/113001"},{"key":"S0022377824001569_ref43","doi-asserted-by":"publisher","DOI":"10.1016\/j.jpdc.2008.05.009"},{"key":"S0022377824001569_ref37","doi-asserted-by":"publisher","DOI":"10.1007\/s41115-021-00012-0"},{"key":"S0022377824001569_ref32","doi-asserted-by":"publisher","DOI":"10.1016\/j.cpc.2011.05.012"},{"key":"S0022377824001569_ref17","doi-asserted-by":"publisher","DOI":"10.1109\/SC41404.2022.00008"},{"key":"S0022377824001569_ref28","doi-asserted-by":"publisher","DOI":"10.1016\/j.jcp.2021.110367"},{"key":"S0022377824001569_ref47","unstructured":"TOP500 2024 Top500 June 2024 list. https:\/\/top500.org\/lists\/top500\/2024\/06\/, accessed: 2024-06-17."},{"key":"S0022377824001569_ref40","doi-asserted-by":"publisher","DOI":"10.1016\/j.cpc.2022.108634"},{"key":"S0022377824001569_ref6","unstructured":"Blelloch, G.E. 1990 Prefix sums and their applications. Tech. Rep. CMU-CS-90-190. School of Computer Science, Carnegie Mellon University Pittsburgh, PA, USA."},{"key":"S0022377824001569_ref14","doi-asserted-by":"publisher","DOI":"10.1063\/1.1524875"},{"key":"S0022377824001569_ref45","doi-asserted-by":"publisher","DOI":"10.1016\/0021-9991(77)90099-7"},{"key":"S0022377824001569_ref21","doi-asserted-by":"publisher","DOI":"10.1016\/j.cpc.2016.05.008"},{"key":"S0022377824001569_ref41","doi-asserted-by":"publisher","DOI":"10.1063\/1.4773692"},{"key":"S0022377824001569_ref9","doi-asserted-by":"crossref","first-page":"5374","DOI":"10.1016\/j.jcp.2012.04.040","article-title":"An efficient mixed-precision, hybrid CPU\u2013GPU implementation of a nonlinearly implicit one-dimensional particle-in-cell algorithm","volume":"231","author":"Chen","year":"2012","journal-title":"J. Comput. Phys"},{"key":"S0022377824001569_ref49","doi-asserted-by":"publisher","DOI":"10.1063\/5.0028512"},{"key":"S0022377824001569_ref50","doi-asserted-by":"publisher","DOI":"10.1016\/j.cpc.2015.01.020"},{"key":"S0022377824001569_ref26","unstructured":"Kong, X. , Huang, M.C. & Ren, C. 2009 Preliminary results on GPU acceleration of the PIC simulation code OSIRIS using CUDA. In APS Division of Plasma Physics Meeting Abstracts, vol. 51, pp. JP8\u2013138."},{"key":"S0022377824001569_ref5","volume-title":"Plasma Physics Via Computer Simulation","author":"Birdsall","year":"2004"},{"key":"S0022377824001569_ref15","doi-asserted-by":"crossref","first-page":"351","DOI":"10.1016\/j.cpc.2017.09.024","article-title":"Smilei: a collaborative, open-source, multi-purpose particle-in-cell code for plasma simulation","volume":"222","author":"Derouillat","year":"2018","journal-title":"Comput. Phys. Commun"},{"key":"S0022377824001569_ref48","doi-asserted-by":"crossref","first-page":"190301","DOI":"10.1088\/0022-3727\/42\/19\/190301","article-title":"Plasma modelling and numerical simulation","volume":"42","author":"Van Dijk","year":"2009","journal-title":"J. Phys. D: Appl. Phys"},{"key":"S0022377824001569_ref44","volume-title":"Kinetic Plasma Simulation: Meeting the Demands of Increased Complexity","author":"Tableman","year":"2019"},{"key":"S0022377824001569_ref12","doi-asserted-by":"crossref","first-page":"641","DOI":"10.1016\/j.cpc.2010.11.009","article-title":"Adaptable particle-in-cell algorithms for graphical processing units","volume":"182","author":"Decyk","year":"2011","journal-title":"Comput. Phys. Commun"},{"key":"S0022377824001569_ref10","doi-asserted-by":"publisher","DOI":"10.1016\/j.jcp.2014.10.064"},{"key":"S0022377824001569_ref53","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46079-6_21"}],"container-title":["Journal of Plasma Physics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.cambridge.org\/core\/services\/aop-cambridge-core\/content\/view\/S0022377824001569","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,1,7]],"date-time":"2025-01-07T07:29:41Z","timestamp":1736234981000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.cambridge.org\/core\/product\/identifier\/S0022377824001569\/type\/journal_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,1,7]]},"references-count":54,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2025,2]]}},"alternative-id":["S0022377824001569"],"URL":"https:\/\/doi.org\/10.1017\/s0022377824001569","relation":{},"ISSN":["0022-3778","1469-7807"],"issn-type":[{"value":"0022-3778","type":"print"},{"value":"1469-7807","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,1,7]]},"assertion":[{"value":"Copyright \u00a9 The Author(s), 2025. Published by Cambridge University Press","name":"copyright","label":"Copyright","group":{"name":"copyright_and_licensing","label":"Copyright and Licensing"}},{"value":"This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http:\/\/creativecommons.org\/licenses\/by\/4.0\/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.","name":"license","label":"License","group":{"name":"copyright_and_licensing","label":"Copyright and Licensing"}},{"value":"This content has been made available to all.","name":"free","label":"Free to read"}],"article-number":"E8"}}