{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,19]],"date-time":"2026-06-19T17:15:52Z","timestamp":1781889352779,"version":"3.54.5"},"reference-count":49,"publisher":"MDPI AG","issue":"11","license":[{"start":{"date-parts":[[2024,10,28]],"date-time":"2024-10-28T00:00:00Z","timestamp":1730073600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100006168","name":"Los Alamos National Laboratory (LANL)","doi-asserted-by":"publisher","award":["89233218CNA000001"],"award-info":[{"award-number":["89233218CNA000001"]}],"id":[{"id":"10.13039\/100006168","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>This paper presents software advances to easily exploit computer architectures consisting of a multi-core CPU and CPU+GPU to accelerate diverse types of high-performance computing (HPC) applications using a single code implementation. The paper describes and demonstrates the performance of the open-source C++ matrix and array (MATAR) library that uniquely offers: (1) a straightforward syntax for programming productivity, (2) usable data structures for data-oriented programming (DOP) for performance, and (3) a simple interface to the open-source C++ Kokkos library for portability and memory management across CPUs and GPUs. The portability across architectures with a single code implementation is achieved by automatically switching between diverse fine-grained parallelism backends (e.g., CUDA, HIP, OpenMP, pthreads, etc.) at compile time. The MATAR library solves many longstanding challenges associated with easily writing software that can run in parallel on any computer architecture. 
This work benefits projects seeking to write new C++ codes while also addressing the challenges of quickly making existing Fortran codes performant and portable over modern computer architectures with minimal syntactical changes from Fortran to C++. We demonstrate the feasibility of readily writing new C++ codes and modernizing existing codes with MATAR to be performant, parallel, and portable across diverse computer architectures.<\/jats:p>","DOI":"10.3390\/info15110673","type":"journal-article","created":{"date-parts":[[2024,10,28]],"date-time":"2024-10-28T08:39:07Z","timestamp":1730104747000},"page":"673","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["On a Simplified Approach to Achieve Parallel Performance and Portability Across CPU and GPU Architectures"],"prefix":"10.3390","volume":"15","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7611-8449","authenticated-orcid":false,"given":"Nathaniel","family":"Morgan","sequence":"first","affiliation":[{"name":"Engineering Technology & Design Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4926-6569","authenticated-orcid":false,"given":"Caleb","family":"Yenusah","sequence":"additional","affiliation":[{"name":"Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4695-8105","authenticated-orcid":false,"given":"Adrian","family":"Diaz","sequence":"additional","affiliation":[{"name":"Computational Physics Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8190-0075","authenticated-orcid":false,"given":"Daniel","family":"Dunning","sequence":"additional","affiliation":[{"name":"Engineering Technology 
& Design Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3744-5615","authenticated-orcid":false,"given":"Jacob","family":"Moore","sequence":"additional","affiliation":[{"name":"Computational Physics Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8215-1379","authenticated-orcid":false,"given":"Erin","family":"Heilman","sequence":"additional","affiliation":[{"name":"Computational Physics Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Calvin","family":"Roth","sequence":"additional","affiliation":[{"name":"Computational Physics Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Evan","family":"Lieberman","sequence":"additional","affiliation":[{"name":"Computational Physics Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Steven","family":"Walton","sequence":"additional","affiliation":[{"name":"Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Sarah","family":"Brown","sequence":"additional","affiliation":[{"name":"Engineering Technology & Design Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0673-9741","authenticated-orcid":false,"given":"Daniel","family":"Holladay","sequence":"additional","affiliation":[{"name":"Computer, Computational & Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, NM 87545, 
USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Marko","family":"Knezevic","sequence":"additional","affiliation":[{"name":"Department of Mechanical Engineering, University of New Hampshire, Durham, NH 03824, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Gavin","family":"Whetstone","sequence":"additional","affiliation":[{"name":"Computational Physics Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Zachary","family":"Baker","sequence":"additional","affiliation":[{"name":"Engineering Technology & Design Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Robert","family":"Robey","sequence":"additional","affiliation":[{"name":"Computational Physics Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2024,10,28]]},"reference":[{"key":"ref_1","unstructured":"Sicard, E., and Trojman, L. (2022). Introducing 2-nm\/20 \u00c5 Nano-Sheet FET Technology with Buried Power Rails and Nano Through-Silicon-Vias in Microwind. [Ph.D. Thesis, INSA Toulouse]."},{"key":"ref_2","unstructured":"Chen, T., Moreau, T., Jiang, Z., Zheng, L., Yan, E., Shen, H., Cowan, M., Wang, L., Hu, Y., and Ceze, L. (2018, January 8\u201310). TVM: An automated End-to-End optimizing compiler for deep learning. Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18), Carlsbad, CA, USA."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Haidl, M., and Gorlatch, S. (2014, January 17). PACXX: Towards a unified programming model for programming accelerators using C++ 14. 
Proceedings of the 2014 LLVM Compiler Infrastructure in HPC, New Orleans, LA, USA.","DOI":"10.1109\/LLVM-HPC.2014.9"},{"key":"ref_4","unstructured":"Zheng, L., Jia, C., Sun, M., Wu, Z., Yu, C.H., Haj-Ali, A., Wang, Y., Yang, J., Zhuo, D., and Sen, K. (2020, January 4\u20136). Ansor: Generating High-Performance tensor programs for deep learning. Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20), Virtual Event."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3427093","article-title":"Efficient auto-tuning of parallel programs with interdependent tuning parameters via auto-tuning framework (ATF)","volume":"18","author":"Rasch","year":"2021","journal-title":"ACM Trans. Archit. Code Optim. (TACO)"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"3202","DOI":"10.1016\/j.jpdc.2014.07.003","article-title":"Kokkos","volume":"74","author":"Edwards","year":"2014","journal-title":"J. Parallel Distrib. Comput."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Beckingsale, D.A., Burmark, J., Hornung, R., Jones, H., Killian, W., Kunen, A.J., Pearce, O., Robinson, P., Ryujin, B.S., and Scogland, T.R. (2019, January 22). RAJA: Portable performance for large-scale scientific applications. Proceedings of the 2019 IEEE\/ACM International Workshop on Performance, Portability and Productivity in HPC (P3HPC), Denver, CO, USA.","DOI":"10.1109\/P3HPC49587.2019.00012"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Arndt, D., Lebrun-Grandie, D., and Trott, C. (2024, January 8\u201311). Experiences with implementing Kokkos\u2019 SYCL backend. Proceedings of the 12th International Workshop on OpenCL and SYCL, Chicago, IL, USA.","DOI":"10.1145\/3648115.3648118"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Steuwer, M., Remmelg, T., and Dubach, C. (2017, January 4\u20138). Lift: A functional data-parallel IR for high-performance GPU code generation. 
Proceedings of the 2017 IEEE\/ACM International Symposium on Code Generation and Optimization (CGO), Austin, TX, USA.","DOI":"10.1109\/CGO.2017.7863730"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"86","DOI":"10.1016\/j.jpdc.2021.03.016","article-title":"MATAR: A Performance Portability and Productivity Implementation of Data-Oriented Design with Kokkos","volume":"157","author":"Dunning","year":"2021","journal-title":"J. Parallel Distrib. Comput."},{"key":"ref_11","unstructured":"Rajamanickam, S., Acer, S., Berger-Vergiat, L., Dang, V., Ellingwood, N., Harvey, E., Kelley, B., Trott, C.R., Wilke, J., and Yamazaki, I. (2021). Kokkos kernels: Performance portable sparse\/dense linear algebra and graph kernels. arXiv."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Yenusah, C., Morgan, N., Robey, R., Stone, T., Liu, Y., and Chen, L. (2022, January 14\u201317). Incorporating performance portability and data-oriented design in phase-field modeling. Proceedings of the ASME 2022 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference IDETC\/CIE2022, St. Louis, MO, USA.","DOI":"10.1115\/DETC2022-89513"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"109190","DOI":"10.1016\/j.cpc.2024.109190","article-title":"A parallel and performance portable implementation of a full-field crystal plasticity model","volume":"300","author":"Yenusah","year":"2024","journal-title":"Comput. Phys. Commun."},{"key":"ref_14","unstructured":"Morgan, N., Moore, J., Brown, S., Chiravalle, V., Diaz, A., Dunning, D., Lieberman, E., Walton, S., Welsh, K., and Yenusah, C. (2024, October 05). Fierro. Available online: https:\/\/github.com\/LANL\/Fierro."},{"key":"ref_15","unstructured":"Diaz, A., Morgan, N., and Bernardin, J. (2022, January 14\u201317). A parallel multi-constraint topology optimization solver. 
Proceedings of the ASME 2022 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference IDETC\/CIE2022, St. Louis, MO, USA."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"1531","DOI":"10.1007\/s11081-023-09852-6","article-title":"Parallel 3D topology optimization with multiple constraints and objectives","volume":"25","author":"Diaz","year":"2023","journal-title":"Optim. Eng."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"642","DOI":"10.1002\/fld.4284","article-title":"A 3D finite element ALE method using an approximate Riemann solution","volume":"83","author":"Chiravalle","year":"2016","journal-title":"Int. J. Numer. Methods Fluids"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"33","DOI":"10.1016\/j.compfluid.2012.09.008","article-title":"A Cell Centered Lagrangian Godunov-like method of solid dynamics","volume":"83","author":"Burton","year":"2013","journal-title":"Comput. Fluids"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"110","DOI":"10.1016\/j.jcp.2019.02.008","article-title":"A high-order Lagrangian discontinuous Galerkin hydrodynamic method for quadratic cells using a subcell mesh stabilization scheme","volume":"386","author":"Liu","year":"2019","journal-title":"J. Comput. Phys."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"113890","DOI":"10.1016\/j.cam.2021.113890","article-title":"A fourth-order Lagrangian discontinuous Galerkin method using a hierarchical orthogonal basis on curvilinear grids","volume":"404","author":"Liu","year":"2022","journal-title":"J. Comput. Appl. Math."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"467","DOI":"10.1016\/j.cma.2019.05.006","article-title":"A higher-order Lagrangian discontinuous Galerkin hydrodynamic method for solid dynamics","volume":"353","author":"Lieberman","year":"2019","journal-title":"Comput. Methods Appl. Mech. 
Eng."},{"key":"ref_22","first-page":"100022","article-title":"A multiphase Lagrangian discontinuous Galerkin hydrodynamic method for high-explosive detonation physics","volume":"4","author":"Lieberman","year":"2020","journal-title":"Appl. Eng. Sci."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"A343","DOI":"10.1137\/18M1223939","article-title":"Multidimensional staggered grid residual distribution scheme for Lagrangian hydrodynamics","volume":"42","author":"Abgrall","year":"2020","journal-title":"SIAM J. Sci. Comput."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"100257","DOI":"10.1016\/j.softx.2019.100257","article-title":"ELEMENTS: A high-order finite element library in C++","volume":"10","author":"Moore","year":"2019","journal-title":"SoftwareX"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Morgan, N., Moore, J., Kiviaho, J., and Diaz, A. (2022, January 14\u201317). A 3D arbitrary-order element mesh library to support diverse numerical methods. Proceedings of the ASME 2022 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference IDETC\/CIE2022, St. Louis, MO, USA.","DOI":"10.1115\/DETC2022-89562"},{"key":"ref_26","first-page":"100040","article-title":"Viscoplastic self-consistent formulation as generalized material model for solid mechanics applications","volume":"6","author":"Zecevic","year":"2021","journal-title":"Appl. Eng. Sci."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"104208","DOI":"10.1016\/j.mechmat.2021.104208","article-title":"New large-strain FFT-based formulation and its application to model strain localization in nano-metallic laminates and other strongly anisotropic crystalline materials","volume":"166","author":"Zecevic","year":"2022","journal-title":"Mech. 
Mater."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"440","DOI":"10.1038\/30918","article-title":"Collective dynamics of \u2018small-world\u2019 networks","volume":"393","author":"Watts","year":"1998","journal-title":"Nature"},{"key":"ref_29","first-page":"17","article-title":"On the evolution of random graphs","volume":"5","author":"Erdos","year":"1960","journal-title":"Publ. Math. Inst. Hung. Acad. Sci."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"345","DOI":"10.1145\/367766.368168","article-title":"Algorithm 97: Shortest path","volume":"5","author":"Floyd","year":"1962","journal-title":"Commun. ACM"},{"key":"ref_31","unstructured":"Varoquaux, G., Vaught, T., and Millman, J. (2008, January 19\u201324). Exploring Network Structure, Dynamics, and Function using NetworkX. Proceedings of the 7th Python in Science Conference, Pasadena, CA, USA."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"115","DOI":"10.1007\/BF02478259","article-title":"A logical calculus of the ideas immanent in nervous activity","volume":"5","author":"McCulloch","year":"1943","journal-title":"Bull. Math. Biophys."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Drori, I. (2022). The Science of Deep Learning, Cambridge University Press. Available online: http:\/\/www.dlbook.org.","DOI":"10.1017\/9781108891530"},{"key":"ref_34","unstructured":"Chollet, F. (2024, October 05). And Others Keras. 
Available online: https:\/\/keras.io."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"123","DOI":"10.1038\/359123a0","article-title":"A model for the global variation in oceanic depth and heat flow with lithospheric age","volume":"359","author":"Stein","year":"1992","journal-title":"Nature"},{"key":"ref_36","first-page":"375","article-title":"Finite half space model of oceanic lithosphere","volume":"Volume 11","author":"Veress","year":"2011","journal-title":"Horizons in Earth Science Research"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"1055","DOI":"10.1002\/andp.19293950803","article-title":"Zur kinetischen Theorie der warmeleitung in kristallen","volume":"395","author":"Peierls","year":"1929","journal-title":"Ann. Phys."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"301","DOI":"10.1098\/rspa.1966.0013","article-title":"Nonlinear interactions of random waves in a dispersive medium","volume":"289","author":"Benney","year":"1966","journal-title":"Proc. R. Soc. Lond. A"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"481","DOI":"10.1017\/S0022112062000373","article-title":"On the non-linear energy transfer in a gravity-wave spectrum Part 1. General theory","volume":"12","author":"Hasselmann","year":"1962","journal-title":"J. Fluid Mech."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"29","DOI":"10.1002\/sapm196948129","article-title":"Random wave closures","volume":"48","author":"Benney","year":"1969","journal-title":"Stud. Appl. Math."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"37","DOI":"10.1007\/BF00915178","article-title":"Weak turbulence of capillary waves","volume":"8","author":"Zakharov","year":"1967","journal-title":"J. Appl. Mech. Tech. 
Phys."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"107","DOI":"10.1007\/BF00232479","article-title":"On the spectral dissipation of ocean waves due to white capping","volume":"6","author":"Hasselmann","year":"1974","journal-title":"Bound.-Layer Meteorol."},{"key":"ref_43","first-page":"xvi+279","article-title":"Wave Turbulence","volume":"Volume 825","author":"Nazarenko","year":"2011","journal-title":"Lecture Notes in Physics"},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1146\/annurev-fluid-021021-102043","article-title":"Experiments in Surface Gravity\u2013Capillary Wave Turbulence","volume":"54","author":"Falcon","year":"2022","journal-title":"Annu. Rev. Fluid Mech."},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"L063101","DOI":"10.1103\/PhysRevE.105.L063101","article-title":"Three-dimensional direct numerical simulation of free-surface magnetohydrodynamic wave turbulence","volume":"105","author":"Kochurin","year":"2022","journal-title":"Phys. Rev. E"},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"2229","DOI":"10.1007\/s00220-019-03651-w","article-title":"On the energy cascade of 3-wave kinetic equations: Beyond Kolmogorov\u2013Zakharov solutions","volume":"376","author":"Soffer","year":"2020","journal-title":"Commun. Math. Phys."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"B467","DOI":"10.1137\/22M1492210","article-title":"A numerical scheme for wave turbulence: 3-wave kinetic equations","volume":"45","author":"Walton","year":"2023","journal-title":"SIAM J. Sci. Comput."},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Galtier, S. (2022). 
Physics of Wave Turbulence, Cambridge University Press.","DOI":"10.1017\/9781009275880"},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"213","DOI":"10.1016\/j.apnum.2022.12.010","article-title":"A deep learning approximation of non-stationary solutions to wave kinetic equations","volume":"199","author":"Walton","year":"2022","journal-title":"Appl. Numer. Math."}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/15\/11\/673\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T16:21:50Z","timestamp":1760113310000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/15\/11\/673"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,10,28]]},"references-count":49,"journal-issue":{"issue":"11","published-online":{"date-parts":[[2024,11]]}},"alternative-id":["info15110673"],"URL":"https:\/\/doi.org\/10.3390\/info15110673","relation":{},"ISSN":["2078-2489"],"issn-type":[{"value":"2078-2489","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,10,28]]}}}