{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,24]],"date-time":"2026-03-24T04:08:16Z","timestamp":1774325296603,"version":"3.50.1"},"reference-count":12,"publisher":"SAGE Publications","issue":"3","license":[{"start":{"date-parts":[[2016,7,27]],"date-time":"2016-07-27T00:00:00Z","timestamp":1469577600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["The International Journal of High Performance Computing Applications"],"published-print":{"date-parts":[[2016,8]]},"abstract":"<jats:p> We present performance results and an analysis of a message passing interface (MPI)\/OpenACC implementation of an electromagnetic solver based on a spectral-element discontinuous Galerkin discretization of the time-dependent Maxwell equations. The OpenACC implementation covers all solution routines, including a highly tuned element-by-element operator evaluation and a GPUDirect gather\u2013scatter kernel to effect nearest neighbor flux exchanges. Modifications are designed to make effective use of vectorization, streaming, and data management. Performance results using up to 16,384 graphics processing units of the Cray XK7 supercomputer Titan show more than 2.5\u00d7 speedup over central processing unit-only performance on the same number of nodes (262,144 MPI ranks) for problem sizes of up to 6.9 billion grid points. We discuss performance-enhancement strategies and the overall potential of GPU-based computing for this class of problems. 
<\/jats:p>","DOI":"10.1177\/1094342015626584","type":"journal-article","created":{"date-parts":[[2016,2,3]],"date-time":"2016-02-03T02:03:02Z","timestamp":1454464982000},"page":"320-334","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":30,"title":["An MPI\/OpenACC implementation of a high-order electromagnetics solver with GPUDirect communication"],"prefix":"10.1177","volume":"30","author":[{"given":"Matthew","family":"Otten","sequence":"first","affiliation":[{"name":"Department of Physics, Cornell University, Ithaca, NY, USA"},{"name":"Mathematics and Computer Science, Argonne National Laboratory, Lemont, IL, USA"}]},{"given":"Jing","family":"Gong","sequence":"additional","affiliation":[{"name":"KTH Royal Institute of Technology, Stockholm, Sweden"}]},{"given":"Azamat","family":"Mametjanov","sequence":"additional","affiliation":[{"name":"Mathematics and Computer Science, Argonne National Laboratory, Lemont, IL, USA"}]},{"given":"Aaron","family":"Vose","sequence":"additional","affiliation":[{"name":"Cray\u2019s Supercomputing Center of Excellence, Oak Ridge National Laboratory, Oak Ridge, TN, USA"}]},{"given":"John","family":"Levesque","sequence":"additional","affiliation":[{"name":"Cray\u2019s Supercomputing Center of Excellence, Oak Ridge National Laboratory, Oak Ridge, TN, USA"}]},{"given":"Paul","family":"Fischer","sequence":"additional","affiliation":[{"name":"Mathematics and Computer Science, Argonne National Laboratory, Lemont, IL, USA"},{"name":"Department of Computer Science, University of Illinois at Urbana\u2013Champaign, Champaign, IL, USA"},{"name":"Department of Mechanical Engineering, University of Illinois at Urbana\u2013Champaign, Champaign, IL, USA"}]},{"given":"Misun","family":"Min","sequence":"additional","affiliation":[{"name":"Mathematics and Computer Science, Argonne National Laboratory, Lemont, IL, 
USA"}]}],"member":"179","published-online":{"date-parts":[[2016,7,27]]},"reference":[{"key":"bibr1-1094342015626584","volume-title":"NASA Report TM 109112","author":"Carpenter M","year":"1994"},{"key":"bibr2-1094342015626584","unstructured":"Cray Inc (2012) Cray Fortran Reference Manual. Cray Inc. Available at: http:\/\/docs.cray.com\/books\/S-3901-60\/\/S-3901-60.pdf"},{"key":"bibr3-1094342015626584","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511546792"},{"key":"bibr4-1094342015626584","volume-title":"Nodal Discontinuous Galerkin Methods, Algorithms, Analysis, and Applications","author":"Hesthaven J","year":"2008"},{"key":"bibr5-1094342015626584","volume-title":"Spectral Methods for Time-dependent Problems, Volume 21 of Cambridge Monographs on Applied and Computational Mathematics","author":"Hesthaven J","year":"2007"},{"key":"bibr6-1094342015626584","doi-asserted-by":"publisher","DOI":"10.1016\/j.jcp.2009.06.041"},{"key":"bibr7-1094342015626584","doi-asserted-by":"publisher","DOI":"10.1177\/1094342015576846"},{"key":"bibr8-1094342015626584","unstructured":"Medina DS, St-Cyr A, Warburton T (2014) OCCA: a unified approach to multi-threading languages. Available at: http:\/\/arxiv.org\/abs\/1403.0968"},{"key":"bibr9-1094342015626584","doi-asserted-by":"publisher","DOI":"10.1007\/s10915-013-9718-8"},{"key":"bibr10-1094342015626584","unstructured":"Nvidia (2012) Developing a Linux kernel module using RDMA for GPUDirect. Nvidia Corporation. Available at: http:\/\/developer.download.nvidia.com\/compute\/cuda\/5_0\/rc\/docs\/GPUDirect_RDMA.pdf"},{"key":"bibr11-1094342015626584","unstructured":"Openaccorg (2011) The OpenACC\u2122 Application Programming Interface. Openacc Inc. 
Available at: http:\/\/www.openacc.org\/sites\/default\/files\/OpenACC.1.0_0.pdf"},{"key":"bibr12-1094342015626584","volume-title":"Computational Electrodynamics, The Finite Difference Time Domain Method","author":"Taflove A","year":"2000"}],"container-title":["The International Journal of High Performance Computing Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342015626584","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/1094342015626584","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342015626584","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,1]],"date-time":"2025-03-01T04:04:57Z","timestamp":1740801897000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/1094342015626584"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,7,27]]},"references-count":12,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2016,8]]}},"alternative-id":["10.1177\/1094342015626584"],"URL":"https:\/\/doi.org\/10.1177\/1094342015626584","relation":{},"ISSN":["1094-3420","1741-2846"],"issn-type":[{"value":"1094-3420","type":"print"},{"value":"1741-2846","type":"electronic"}],"subject":[],"published":{"date-parts":[[2016,7,27]]}}}