{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,6]],"date-time":"2026-03-06T20:42:07Z","timestamp":1772829727886,"version":"3.50.1"},"reference-count":39,"publisher":"SAGE Publications","issue":"2","license":[{"start":{"date-parts":[[2018,2,5]],"date-time":"2018-02-05T00:00:00Z","timestamp":1517788800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["The International Journal of High Performance Computing Applications"],"published-print":{"date-parts":[[2019,3]]},"abstract":"<jats:p> Performance portability on heterogeneous high-performance computing (HPC) systems is a major challenge faced today by code developers: parallel code needs to be executed correctly as well as with high performance on machines with different architectures, operating systems, and software libraries. The finite element method (FEM) is a popular and flexible method for discretizing partial differential equations arising in a wide variety of scientific, engineering, and industrial applications that require HPC. This article presents some preliminary results pertaining to our development of a performance portable implementation of the FEM-based Albany code. Performance portability is achieved using the Kokkos library. We present performance results for the Aeras global atmosphere dynamical core module in Albany. Numerical experiments show that our single code implementation gives reasonable performance across three multicore\/many-core architectures: NVIDIA General Processing Units (GPU\u2019s), Intel Xeon Phis, and multicore CPUs. 
<\/jats:p>","DOI":"10.1177\/1094342017749957","type":"journal-article","created":{"date-parts":[[2018,2,5]],"date-time":"2018-02-05T11:57:34Z","timestamp":1517831854000},"page":"332-352","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":18,"title":["Toward performance portability of the Albany finite element analysis code using the Kokkos library"],"prefix":"10.1177","volume":"33","author":[{"given":"Irina","family":"Demeshko","sequence":"first","affiliation":[{"name":"Los Alamos National Laboratory, Los Alamos, NM, USA"}]},{"given":"Jerry","family":"Watkins","sequence":"additional","affiliation":[{"name":"Sandia National Laboratories, Livermore, CA, USA"}]},{"given":"Irina K","family":"Tezaur","sequence":"additional","affiliation":[{"name":"Sandia National Laboratories, Livermore, CA, USA"}]},{"given":"Oksana","family":"Guba","sequence":"additional","affiliation":[{"name":"Sandia National Laboratories, Albuquerque, NM, USA"}]},{"given":"William F","family":"Spotz","sequence":"additional","affiliation":[{"name":"Sandia National Laboratories, Albuquerque, NM, USA"}]},{"given":"Andrew G","family":"Salinger","sequence":"additional","affiliation":[{"name":"Sandia National Laboratories, Albuquerque, NM, USA"}]},{"given":"Roger P","family":"Pawlowski","sequence":"additional","affiliation":[{"name":"Sandia National Laboratories, Albuquerque, NM, USA"}]},{"given":"Michael A","family":"Heroux","sequence":"additional","affiliation":[{"name":"Sandia National Laboratories, Albuquerque, NM, USA"}]}],"member":"179","published-online":{"date-parts":[[2018,2,5]]},"reference":[{"issue":"100","key":"bibr1-1094342017749957","first-page":"9","volume":"3","author":"Aln\u00e6s MS","year":"2015","journal-title":"Archive of Numerical Software"},{"key":"bibr2-1094342017749957","doi-asserted-by":"crossref","unstructured":"Aln\u00e6s MS, Logg A, \u00d8lgaard KB, (2014) Unified form language: a domain-specific language for 
weak formulations of partial differential equations. ACM Transactions on Mathematical Software (TOMS) 40(2): 9. Available at: https:\/\/doi.org\/10.1145\/2566630.","DOI":"10.1145\/2566630"},{"key":"bibr3-1094342017749957","doi-asserted-by":"publisher","DOI":"10.1175\/MWR3360.1"},{"key":"bibr4-1094342017749957","doi-asserted-by":"publisher","DOI":"10.1155\/2012\/403902"},{"key":"bibr5-1094342017749957","doi-asserted-by":"publisher","DOI":"10.1177\/1094342011428142"},{"key":"bibr6-1094342017749957","doi-asserted-by":"publisher","DOI":"10.1016\/j.jpdc.2014.07.003"},{"key":"bibr7-1094342017749957","volume-title":"SIERRA Toolkit Computational Mesh Conceptual Model: Technical Report","author":"Edwards HC","year":"2010"},{"key":"bibr8-1094342017749957","doi-asserted-by":"publisher","DOI":"10.1175\/2010MWR3288.1"},{"key":"bibr9-1094342017749957","doi-asserted-by":"publisher","DOI":"10.1063\/1.4825209"},{"key":"bibr10-1094342017749957","doi-asserted-by":"publisher","DOI":"10.5194\/gmdd-7-4081-2014"},{"key":"bibr11-1094342017749957","unstructured":"Harris M (2015) Developing portable CUDA C\/C++ code with Hemi. Available at: http:\/\/devblogs.nvidia.com\/parallelforall\/developing-portable-cuda-cc-code-hemi\/ (accessed 11 January 2018)."},{"key":"bibr12-1094342017749957","volume-title":"An Overview of Trilinos: Technical Report SAND2003-2927","author":"Heroux MA","year":"2003"},{"key":"bibr13-1094342017749957","unstructured":"Heuveline V (2010) Hiflow3: a flexible and hardware-aware parallel finite element package. In: Proceedings of the 9th workshop on parallel\/high-performance object-oriented scientific computing. 
Available at: http:\/\/journals.ub.uni-heidelberg.de\/index.php\/emcl-pp\/article\/view\/11675 (accessed January 12, 2018)."},{"key":"bibr14-1094342017749957","doi-asserted-by":"publisher","DOI":"10.2172\/1169830"},{"key":"bibr15-1094342017749957","doi-asserted-by":"publisher","DOI":"10.1256\/qj.06.12"},{"key":"bibr16-1094342017749957","doi-asserted-by":"publisher","DOI":"10.1109\/ICPPW.2009.14"},{"key":"bibr17-1094342017749957","first-page":"421","volume-title":"Petascale Computing: Algorithms and Applications","author":"Kale LV","year":"2008"},{"key":"bibr18-1094342017749957","volume-title":"Multithreaded Programming With Pthreads","author":"Lewis B","year":"1998"},{"key":"bibr19-1094342017749957","unstructured":"Medina DD, St-Cyr A, Warburton T (2014) OCCA: a unified approach to multi-threading languages. SIAM Journal on Scientific Computing (SISC). Available at: http:\/\/arxiv.org\/abs\/1403.0968."},{"key":"bibr20-1094342017749957","first-page":"1","author":"Mudalige GR","year":"2012","journal-title":"Innovative Parallel Computing (InPar)"},{"key":"bibr21-1094342017749957","doi-asserted-by":"publisher","DOI":"10.1109\/HOTCHIPS.2009.7478342"},{"key":"bibr23-1094342017749957","unstructured":"Nvp (2014) Nvidia visual profiler (nvprof) users\u2019 guide. Available: http:\/\/docs.nvidia.com\/cuda\/profiler-users-guide (accessed 11 January 2018)."},{"key":"bibr24-1094342017749957","volume-title":"Rythmos: solution and analysis package for differential-algebraic and ordinary-differential equations. Technical report","author":"Ober CC","year":"2013"},{"key":"bibr25-1094342017749957","unstructured":"OpenACC (2013) The OpenACC Application Programming Interface: Technical Report. 
OpenACC-Standard.org Available at: https:\/\/www.openacc.org (accessed 11 January 2018)."},{"key":"bibr26-1094342017749957","volume-title":"OpenMP Application Program Interface: Technical Report","author":"OpenMP","year":"2013"},{"key":"bibr27-1094342017749957","doi-asserted-by":"publisher","DOI":"10.1155\/2012\/202071"},{"key":"bibr28-1094342017749957","doi-asserted-by":"crossref","unstructured":"Phipps E, Pawlowski R (2012) Efficient expression templates for operator overloading-based automatic differentiation. In: Proceedings of the 6th international conference on automatic differentiation, July 2012. Available at: https:\/\/doi.org\/10.1007\/978-3-642-30023-3_28.","DOI":"10.1007\/978-3-642-30023-3_28"},{"key":"bibr29-1094342017749957","doi-asserted-by":"publisher","DOI":"10.1145\/2998441"},{"key":"bibr30-1094342017749957","doi-asserted-by":"publisher","DOI":"10.1109\/SC.Companion.2012.134"},{"key":"bibr31-1094342017749957","doi-asserted-by":"publisher","DOI":"10.1615\/IntJMultCompEng.2016017040"},{"key":"bibr33-1094342017749957","doi-asserted-by":"publisher","DOI":"10.1016\/j.procs.2015.05.478"},{"key":"bibr34-1094342017749957","doi-asserted-by":"publisher","DOI":"10.1002\/nag.2161"},{"key":"bibr35-1094342017749957","first-page":"80","volume-title":"Numerical Techniques for Global Atmospheric Models. Lecture Notes in Computational Science and Engineering","author":"Taylor MA","year":"2012"},{"key":"bibr36-1094342017749957","first-page":"78","volume":"012074","author":"Taylor MA","year":"2007","journal-title":"Journal of Physics: Conference Series"},{"key":"bibr37-1094342017749957","doi-asserted-by":"publisher","DOI":"10.5194\/gmd-8-1197-2015"},{"key":"bibr38-1094342017749957","unstructured":"Ullrich PA, Jablonowski C, Kent J, (2012) Dynamical Core Model Intercomparison Project (DCMIP) Test Case Document: Technical Report. National Center for Atmospheric Research. 
Available at: https:\/\/earthsystemcog.org\/projects\/dcmip-2012\/test_cases (accessed 11 January 2018)."},{"key":"bibr39-1094342017749957","unstructured":"Unat D, Chan C, Zhang W, (2013) Tiling as a durable abstraction for parallelism and data locality. In: WOLFHPC: Workshop on domain-specific languages and high-level frameworks for HPC."},{"key":"bibr40-1094342017749957","first-page":"58","volume":"99","author":"Weber R","year":"2010","journal-title":"IEEE Transactions on Parallel and Distributed Systems"},{"key":"bibr41-1094342017749957","doi-asserted-by":"publisher","DOI":"10.1016\/S0021-9991(05)80016-6"}],"container-title":["The International Journal of High Performance Computing Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342017749957","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/1094342017749957","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342017749957","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,2]],"date-time":"2025-03-02T00:16:34Z","timestamp":1740874594000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/1094342017749957"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,2,5]]},"references-count":39,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2019,3]]}},"alternative-id":["10.1177\/1094342017749957"],"URL":"https:\/\/doi.org\/10.1177\/1094342017749957","relation":{},"ISSN":["1094-3420","1741-2846"],"issn-type":[{"value":"1094-3420","type":"print"},{"value":"1741-2846","type":"electronic"}],"subject":[],"published":{"date-parts":[[2018,2,5]]}}}