{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,21]],"date-time":"2026-04-21T17:15:40Z","timestamp":1776791740424,"version":"3.51.2"},"publisher-location":"New York, NY, USA","reference-count":42,"publisher":"ACM","license":[{"start":{"date-parts":[[2023,11,12]],"date-time":"2023-11-12T00:00:00Z","timestamp":1699747200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"US Department of Energy Office of Science","award":["DE-AC02-05CH11231"],"award-info":[{"award-number":["DE-AC02-05CH11231"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2023,11,12]]},"DOI":"10.1145\/3624062.3624186","type":"proceedings-article","created":{"date-parts":[[2023,11,10]],"date-time":"2023-11-10T13:53:39Z","timestamp":1699624419000},"page":"1105-1113","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["High-level GPU code: a case study examining JAX and OpenMP."],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0530-6530","authenticated-orcid":false,"given":"Nestor","family":"Demeure","sequence":"first","affiliation":[{"name":"Lawrence Berkeley National Laboratory, United States"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3510-7134","authenticated-orcid":false,"given":"Theodore","family":"Kisner","sequence":"additional","affiliation":[{"name":"Lawrence Berkeley National Laboratory, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5748-5182","authenticated-orcid":false,"given":"Reijo","family":"Keskitalo","sequence":"additional","affiliation":[{"name":"Lawrence Berkeley National Laboratory, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2834-4257","authenticated-orcid":false,"given":"Rollin","family":"Thomas","sequence":"additional","affiliation":[{"name":"Lawrence Berkeley National Laboratory, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5104-7122","authenticated-orcid":false,"given":"Julian","family":"Borrill","sequence":"additional","affiliation":[{"name":"Lawrence Berkeley National Laboratory, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6213-8617","authenticated-orcid":false,"given":"Wahid","family":"Bhimji","sequence":"additional","affiliation":[{"name":"Lawrence Berkeley National Laboratory, United States of America"}]}],"member":"320","published-online":{"date-parts":[[2023,11,12]]},"reference":[{"key":"e_1_3_2_2_1_1","volume-title":"Carlo Baccigalupi","author":"Abazajian Kevork","year":"2019","unstructured":"Kevork Abazajian, Graeme Addison, Peter Adshead, Zeeshan Ahmed, Steven\u00a0W Allen, David Alonso, Marcelo Alvarez, Adam Anderson, Kam\u00a0S Arnold, Carlo Baccigalupi, 2019. CMB-S4 science case, reference design, and project plan. arXiv preprint arXiv:1907.04473 (2019)."},{"key":"e_1_3_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.1088\/1475-7516\/2019\/02\/056"},{"key":"e_1_3_2_2_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/LLVM-HPC.2016.006"},{"key":"e_1_3_2_2_4_1","volume-title":"RAJA: Portable performance for large-scale scientific applications. In 2019 ieee\/acm international workshop on performance, portability and productivity in hpc (p3hpc)","author":"Beckingsale A","year":"2019","unstructured":"David\u00a0A Beckingsale, Jason Burmark, Rich Hornung, Holger Jones, William Killian, Adam\u00a0J Kunen, Olga Pearce, Peter Robinson, Brian\u00a0S Ryujin, and Thomas\u00a0RW Scogland. 2019. RAJA: Portable performance for large-scale scientific applications. In 2019 ieee\/acm international workshop on performance, portability and productivity in hpc (p3hpc). IEEE, 71\u201381."},{"key":"e_1_3_2_2_5_1","volume-title":"JAX-FLUIDS: A fully-differentiable high-order computational fluid dynamics solver for compressible two-phase flows. arXiv preprint arXiv:2203.13760","author":"Bezgin A","year":"2022","unstructured":"Deniz\u00a0A Bezgin, Aaron\u00a0B Buhendwa, and Nikolaus\u00a0A Adams. 2022. JAX-FLUIDS: A fully-differentiable high-order computational fluid dynamics solver for compressible two-phase flows. arXiv preprint arXiv:2203.13760 (2022)."},{"key":"e_1_3_2_2_6_1","unstructured":"James Bradbury Roy Frostig Peter Hawkins Matthew\u00a0James Johnson Chris Leary Dougal Maclaurin George Necula Adam Paszke Jake VanderPlas Skye Wanderman-Milne and Qiao Zhang. 2018. JAX: composable transformations of Python+NumPy programs. http:\/\/github.com\/google\/jax"},{"key":"e_1_3_2_2_7_1","unstructured":"NVIDIA Corporation. 2023. Multi-Process Service: GPU Deployment and Management. https:\/\/docs.nvidia.com\/deploy\/mps\/index.html"},{"key":"e_1_3_2_2_8_1","doi-asserted-by":"crossref","unstructured":"Leonardo Dagum and Ramesh Menon. 1998. OpenMP: an industry standard API for shared-memory programming. IEEE computational science and engineering 5 1 (1998) 46\u201355.","DOI":"10.1109\/99.660313"},{"key":"e_1_3_2_2_9_1","volume-title":"A Case Study of Porting HPGMG from CUDA to OpenMP Target Offload","author":"Daley Christopher","unstructured":"Christopher Daley, Hadia Ahmed, Samuel Williams, and Nicholas Wright. 2020. A Case Study of Porting HPGMG from CUDA to OpenMP Target Offload. In OpenMP: Portable Multi-Level Parallelism on Modern Systems, Kent Milfeld, Bronis\u00a0R. de\u00a0Supinski, Lars Koesterke, and Jannis Klinkenberg (Eds.). Springer International Publishing, Cham, 37\u201351."},{"key":"e_1_3_2_2_10_1","doi-asserted-by":"publisher","unstructured":"Albert Danial. 2021. cloc: v1.92. https:\/\/doi.org\/10.5281\/zenodo.5760077","DOI":"10.5281\/zenodo.5760077"},{"key":"e_1_3_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.25080\/majora-1b6fd038-004"},{"key":"e_1_3_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-74224-9_2"},{"key":"e_1_3_2_2_13_1","unstructured":"TensorFlow Developers. 2022. TensorFlow. Zenodo (2022)."},{"key":"e_1_3_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2019.102546"},{"key":"e_1_3_2_2_15_1","doi-asserted-by":"crossref","unstructured":"Rob Farber. 2016. Parallel programming with OpenACC. Newnes.","DOI":"10.1016\/B978-0-12-410397-9.00001-9"},{"key":"e_1_3_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1086\/427976"},{"key":"e_1_3_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPEC43674.2020.9286224"},{"key":"e_1_3_2_2_18_1","first-page":"1","article-title":"Taichi: a language for high-performance computation on spatially sparse data structures","volume":"38","author":"Hu Yuanming","year":"2019","unstructured":"Yuanming Hu, Tzu-Mao Li, Luke Anderson, Jonathan Ragan-Kelley, and Fr\u00e9do Durand. 2019. Taichi: a language for high-performance computation on spatially sparse data structures. ACM Transactions on Graphics (TOG) 38, 6 (2019), 1\u201316.","journal-title":"ACM Transactions on Graphics (TOG)"},{"key":"e_1_3_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.5194\/gmd-11-3299-2018"},{"key":"e_1_3_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.21105\/joss.03419"},{"key":"e_1_3_2_2_21_1","volume-title":"GPU Technology Conference (GTC), Vol.\u00a02.","author":"Jeaugey Sylvain","year":"2017","unstructured":"Sylvain Jeaugey. 2017. Nccl 2.0. In GPU Technology Conference (GTC), Vol.\u00a02."},{"key":"e_1_3_2_2_22_1","doi-asserted-by":"publisher","unstructured":"Theodore Kisner Reijo Keskitalo Andrea Zonca Jonathan\u00a0R. Madsen Jean Savarit Maurizio Tomasi Kolen Cheung Giuseppe Puglisi David Liu and Matthew Hasselfield. 2021. hpc4cmb\/toast: Update Pybind11. https:\/\/doi.org\/10.5281\/zenodo.5559597","DOI":"10.5281\/zenodo.5559597"},{"key":"e_1_3_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2011.09.001"},{"key":"e_1_3_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.2101784118"},{"key":"e_1_3_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/2833157.2833162"},{"key":"e_1_3_2_2_26_1","volume-title":"The BSD conference, Vol.\u00a05. 1\u201320","author":"Lattner Chris","year":"2008","unstructured":"Chris Lattner. 2008. LLVM and Clang: Next generation compiler technology. In The BSD conference, Vol.\u00a05. 1\u201320."},{"key":"e_1_3_2_2_27_1","volume-title":"MPI: A Message-Passing Interface Standard Version 4.0. https:\/\/www.mpi-forum.org\/docs\/mpi-4.0\/mpi40-report.pdf","author":"Interface Forum Message Passing","year":"2021","unstructured":"Message Passing Interface Forum. 2021. MPI: A Message-Passing Interface Standard Version 4.0. https:\/\/www.mpi-forum.org\/docs\/mpi-4.0\/mpi40-report.pdf"},{"key":"e_1_3_2_2_28_1","volume-title":"31st conference on neural information processing systems 151","author":"Nishino ROYUD","year":"2017","unstructured":"ROYUD Nishino and Shohei Hido\u00a0Crissman Loomis. 2017. Cupy: A numpy-compatible library for nvidia gpu calculations. 31st conference on neural information processing systems 151, 7 (2017)."},{"key":"e_1_3_2_2_29_1","unstructured":"NVIDIA P\u00e9ter Vingelmann and Frank\u00a0H.P. Fitzek. 2020. CUDA release: 10.2.89. https:\/\/developer.nvidia.com\/cuda-toolkit"},{"key":"e_1_3_2_2_30_1","unstructured":"OpenMP Architecture Review Board. 2013. OpenMP Application Program Interface Version 4.0. https:\/\/www.openmp.org\/wp-content\/uploads\/OpenMP4.0.0.pdf"},{"key":"e_1_3_2_2_31_1","volume-title":"Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems 32","author":"Paszke Adam","year":"2019","unstructured":"Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, 2019. Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems 32 (2019)."},{"key":"e_1_3_2_2_32_1","volume-title":"XLA: Compiling Machine Learning for Peak Performance.","author":"Sabne Amit","year":"2020","unstructured":"Amit Sabne. 2020. XLA: Compiling Machine Learning for Peak Performance."},{"key":"e_1_3_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPEC.2019.8916312"},{"key":"e_1_3_2_2_34_1","volume-title":"Advances in Neural Information Processing Systems, Vol.\u00a033. Curran Associates","author":"Schoenholz S.","year":"2020","unstructured":"Samuel\u00a0S. Schoenholz and Ekin\u00a0D. Cubuk. 2020. JAX M.D. A Framework for Differentiable Physics. In Advances in Neural Information Processing Systems, Vol.\u00a033. Curran Associates, Inc.https:\/\/papers.nips.cc\/paper\/2020\/file\/83d3d4b6c9579515e1679aca8cbc8033-Paper.pdf"},{"key":"e_1_3_2_2_35_1","volume-title":"Using and porting the GNU compiler collection. Vol.\u00a086","author":"M Stallman","unstructured":"Richard\u00a0M Stallman 1999. Using and porting the GNU compiler collection. Vol.\u00a086. Free Software Foundation."},{"key":"e_1_3_2_2_36_1","volume-title":"OpenCL: A parallel programming standard for heterogeneous computing systems. Computing in science & engineering 12, 3","author":"Stone E","year":"2010","unstructured":"John\u00a0E Stone, David Gohara, and Guochun Shi. 2010. OpenCL: A parallel programming standard for heterogeneous computing systems. Computing in science & engineering 12, 3 (2010), 66."},{"key":"e_1_3_2_2_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/3315508.3329973"},{"key":"e_1_3_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2021.3097283"},{"key":"e_1_3_2_2_39_1","volume-title":"The NumPy array: a structure for efficient numerical computation. Computing in science & engineering 13, 2","author":"Der\u00a0Walt Stefan Van","year":"2011","unstructured":"Stefan Van Der\u00a0Walt, S\u00a0Chris Colbert, and Gael Varoquaux. 2011. The NumPy array: a structure for efficient numerical computation. Computing in science & engineering 13, 2 (2011), 22\u201330."},{"key":"e_1_3_2_2_40_1","volume-title":"fundamental algorithms for scientific computing in Python. Nature methods 17, 3","author":"Virtanen Pauli","year":"2020","unstructured":"Pauli Virtanen, Ralf Gommers, Travis\u00a0E Oliphant, Matt Haberland, Tyler Reddy, David Cournapeau, Evgeni Burovski, Pearu Peterson, Warren Weckesser, Jonathan Bright, 2020. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nature methods 17, 3 (2020), 261\u2013272."},{"key":"e_1_3_2_2_41_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.cpc.2014.07.011"},{"key":"e_1_3_2_2_42_1","volume-title":"Accelerate Science on Perlmutter with NERSC. Bulletin of the American Physical Society 65","author":"Yang Charlene","year":"2020","unstructured":"Charlene Yang and Jack Deslippe. 2020. Accelerate Science on Perlmutter with NERSC. Bulletin of the American Physical Society 65 (2020)."}],"event":{"name":"SC-W 2023: Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis","location":"Denver CO USA","acronym":"SC-W 2023"},"container-title":["Proceedings of the SC '23 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3624062.3624186","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3624062.3624186","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,21]],"date-time":"2025-08-21T03:05:17Z","timestamp":1755745517000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3624062.3624186"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,11,12]]},"references-count":42,"alternative-id":["10.1145\/3624062.3624186","10.1145\/3624062"],"URL":"https:\/\/doi.org\/10.1145\/3624062.3624186","relation":{},"subject":[],"published":{"date-parts":[[2023,11,12]]},"assertion":[{"value":"2023-11-12","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}