{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,1]],"date-time":"2026-05-01T23:01:48Z","timestamp":1777676508298,"version":"3.51.4"},"reference-count":19,"publisher":"SAGE Publications","issue":"3","license":[{"start":{"date-parts":[[2012,11,16]],"date-time":"2012-11-16T00:00:00Z","timestamp":1353024000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["The International Journal of High Performance Computing Applications"],"published-print":{"date-parts":[[2013,8]]},"abstract":"<jats:p>The suitability of a spectral element based dynamical core (HOMME) within the Community Atmospheric Model (CAM) for GPU-based architectures is examined and initial performance results are reported. This work was done within a project to enable CAM to run at high resolution on next-generation, multi-petaflop systems. The dynamical core is the present focus because it dominates the performance profile of our target problem. HOMME enjoys good scalability due to its underlying cubed-sphere mesh with full two-dimensional decomposition and the localization of all computational work within each element. The thread blocking and code changes that allow HOMME to effectively use GPUs are described along with a rewritten vertical remapping scheme, which improves performance on both CPUs and GPUs. Validation of results in the full HOMME model is also described. We demonstrate that the most expensive kernel in the model executes more than three times faster on the GPU than the CPU. These improvements are expected to provide improved efficiency when incorporated into the full model that has been configured for the target problem. Remaining issues affecting performance include optimizing the boundary exchanges for the case of multiple spectral elements being computed on the GPU.<\/jats:p>","DOI":"10.1177\/1094342012462751","type":"journal-article","created":{"date-parts":[[2012,11,17]],"date-time":"2012-11-17T20:33:28Z","timestamp":1353184408000},"page":"335-347","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":28,"title":["Progress towards accelerating HOMME on hybrid multi-core systems"],"prefix":"10.1177","volume":"27","author":[{"given":"I.","family":"Carpenter","sequence":"first","affiliation":[{"name":"National Renewable Energy Laboratory, Golden, CO, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"R.K.","family":"Archibald","sequence":"additional","affiliation":[{"name":"Oak Ridge National Laboratory, Oak Ridge, TN, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"K.J.","family":"Evans","sequence":"additional","affiliation":[{"name":"Oak Ridge National Laboratory, Oak Ridge, TN, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"J.","family":"Larkin","sequence":"additional","affiliation":[{"name":"Cray Inc., Oak Ridge, TN, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"P.","family":"Micikevicius","sequence":"additional","affiliation":[{"name":"NVIDIA, Santa Clara, CA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"M.","family":"Norman","sequence":"additional","affiliation":[{"name":"Oak Ridge National Laboratory, Oak Ridge, TN, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"J.","family":"Rosinski","sequence":"additional","affiliation":[{"name":"National Oceanic and Atmospheric Administration, Boulder, CO, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"J.","family":"Schwarzmeier","sequence":"additional","affiliation":[{"name":"Cray Inc., Chippewa Falls, WI, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"M.A.","family":"Taylor","sequence":"additional","affiliation":[{"name":"Sandia National Laboratories, Albuquerque, NM, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"179","published-online":{"date-parts":[[2012,11,16]]},"reference":[{"key":"bibr1-1094342012462751","author":"Dennis J","year":"2011","journal-title":"Int J High Perf Comput Appl"},{"key":"bibr2-1094342012462751","doi-asserted-by":"publisher","DOI":"10.1177\/1094342005056108"},{"key":"bibr3-1094342012462751","doi-asserted-by":"publisher","DOI":"10.1175\/2011JCLI4083.1"},{"key":"bibr4-1094342012462751","doi-asserted-by":"publisher","DOI":"10.1109\/CCGRID.2010.106"},{"key":"bibr5-1094342012462751","doi-asserted-by":"publisher","DOI":"10.1175\/1520-0477(1994)075<1825:APFTIO>2.0.CO;2"},{"key":"bibr6-1094342012462751","doi-asserted-by":"publisher","DOI":"10.1109\/MCSE.2010.26"},{"key":"bibr7-1094342012462751","doi-asserted-by":"publisher","DOI":"10.1175\/1520-0493(2004)132<2293:AVLFDC>2.0.CO;2"},{"key":"bibr8-1094342012462751","doi-asserted-by":"publisher","DOI":"10.1145\/1654059.1654067"},{"key":"bibr9-1094342012462751","doi-asserted-by":"publisher","DOI":"10.1142\/S0129626408003557"},{"key":"bibr10-1094342012462751","unstructured":"Micikevicius P (2010) Fundamental Optimizations. High Performance Computing with CUDA, Tutorial S03, Supercomputing 2010, New Orleans. Available at: http:\/\/www.nvidia.com\/content\/PDF\/sc_2010\/CUDA_Tutorial\/SC10_Fundamental_Optimizations.pdf."},{"key":"bibr11-1094342012462751","unstructured":"Neale RB et al. (2010) Description of the NCAR Community Atmosphere Model CAM 4.0. NCAR Technical Note. Available at: http:\/\/www.cesm.ucar.edu\/models\/cesm1.0\/cam\/docs\/description\/cam5_desc.pdf"},{"key":"bibr12-1094342012462751","doi-asserted-by":"publisher","DOI":"10.1145\/1365490.1365500"},{"key":"bibr13-1094342012462751","unstructured":"NVIDIA (2009) NVIDIA\u2019s Next Generation CUDA Compute Architecture: Fermi. Available at: http:\/\/www.nvidia.com\/content\/PDF\/fermi_white_papers\/NVIDIA_Fermi_Compute_Architecture_Whitepaper.pdf."},{"key":"bibr14-1094342012462751","unstructured":"NVIDIA (2010) CUDA Best Practices Guide. Available at: http:\/\/developer.download.nvidia.com\/compute\/cuda\/3_2_prod\/toolkit\/docs\/CUDA_C_Best_Practices_Guide.pdf."},{"key":"bibr15-1094342012462751","doi-asserted-by":"publisher","DOI":"10.1137\/S1064827594275534"},{"key":"bibr16-1094342012462751","doi-asserted-by":"publisher","DOI":"10.1175\/1520-0493(1981)109<0758:AEAAMC>2.0.CO;2"},{"key":"bibr17-1094342012462751","doi-asserted-by":"publisher","DOI":"10.1088\/1742-6596\/125\/1\/012023"},{"key":"bibr18-1094342012462751","doi-asserted-by":"publisher","DOI":"10.1256\/qj.04.97"},{"key":"bibr19-1094342012462751","doi-asserted-by":"publisher","DOI":"10.1002\/fld.1154"}],"container-title":["The International Journal of High Performance Computing Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342012462751","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/1094342012462751","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342012462751","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T08:19:12Z","timestamp":1777450752000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/1094342012462751"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,11,16]]},"references-count":19,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2013,8]]}},"alternative-id":["10.1177\/1094342012462751"],"URL":"https:\/\/doi.org\/10.1177\/1094342012462751","relation":{},"ISSN":["1094-3420","1741-2846"],"issn-type":[{"value":"1094-3420","type":"print"},{"value":"1741-2846","type":"electronic"}],"subject":[],"published":{"date-parts":[[2012,11,16]]}}}