{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,3,3]],"date-time":"2025-03-03T05:37:22Z","timestamp":1740980242741,"version":"3.38.0"},"reference-count":37,"publisher":"SAGE Publications","issue":"5","license":[{"start":{"date-parts":[[2020,5,30]],"date-time":"2020-05-30T00:00:00Z","timestamp":1590796800000},"content-version":"vor","delay-in-days":366,"URL":"http:\/\/www.sagepub.com\/licence-information-for-chorus"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["NSF01"],"award-info":[{"award-number":["NSF01"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["The International Journal of High Performance Computing Applications"],"published-print":{"date-parts":[[2019,9]]},"abstract":"<jats:p> The approach of the next-generation computing platforms offers a tremendous opportunity to advance the state-of-the-art in global atmospheric dynamical models. We detail our incremental approach to utilize this emerging technology by enhancing concurrency within the High-Order Method Modeling Environment (HOMME) atmospheric dynamical model developed at the National Center for Atmospheric Research (NCAR). The study focused on improvements to the performance of HOMME, which is a Fortran 90 code with a hybrid (MPI\/OpenMP) programming model. The article describes the changes made to the use of message passing interface (MPI) and OpenMP as well as single-core optimizations to achieve significant improvements in concurrency and overall code performance. 
For our optimization studies, we utilize the \u201cCori\u201d system with an Intel Xeon Phi Knights Landing processor deployed at the National Energy Research Supercomputing Center and the \u201cCheyenne\u201d system with an Intel Xeon Broadwell processor installed at the NCAR. The results from the studies, using \u201cworkhorse\u201d configurations performed at NCAR, show that these changes have a transformative impact on the computational performance of HOMME. Our improvements have shown that we can effectively increase potential concurrency by efficiently threading the vertical dimension. Further, we have seen a factor of two overall improvement in the computational performance of the code resulting from the single-core optimizations. Most notably, our incremental approach allows for high-impact changes without disrupting existing scientific productivity in the HOMME community. <\/jats:p>","DOI":"10.1177\/1094342019849618","type":"journal-article","created":{"date-parts":[[2019,5,31]],"date-time":"2019-05-31T03:50:33Z","timestamp":1559274633000},"page":"1030-1045","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":2,"title":["Optimizing the HOMME dynamical core for multicore platforms"],"prefix":"10.1177","volume":"33","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2119-8242","authenticated-orcid":false,"given":"John M","family":"Dennis","sequence":"first","affiliation":[{"name":"National Center for Atmospheric Research, Computational Information Systems Laboratory, Boulder, CO, USA"}]},{"given":"Brian","family":"Dobbins","sequence":"additional","affiliation":[{"name":"National Center for Atmospheric Research, Computational Information Systems Laboratory, Boulder, CO, USA"}]},{"given":"Christopher","family":"Kerr","sequence":"additional","affiliation":[{"name":"Kerr Computing Associates, Wakefield, RI, 
USA"}]},{"given":"Youngsung","family":"Kim","sequence":"additional","affiliation":[{"name":"National Center for Atmospheric Research, Computational Information Systems Laboratory, Boulder, CO, USA"}]}],"member":"179","published-online":{"date-parts":[[2019,5,30]]},"reference":[{"key":"bibr1-1094342019849618","doi-asserted-by":"publisher","DOI":"10.2172\/1081802"},{"key":"bibr2-1094342019849618","first-page":"1","author":"Ashby S","year":"2010","journal-title":"Summary Report of the Advanced Scientific Computing Advisory Committee (ASCAC) Subcommittee"},{"key":"bibr3-1094342019849618","doi-asserted-by":"publisher","DOI":"10.1177\/1094342012462751"},{"key":"bibr4-1094342019849618","unstructured":"Cheyenne (2017) Cheyenne overview. Available at: https:\/\/www2.cisl.ucar.edu\/resources\/computational-systems\/cheyenne (accessed 18 May 2019)."},{"key":"bibr5-1094342019849618","unstructured":"Cori (2017) Cori overview. Available at: http:\/\/www.nersc.gov\/users\/computational-systems\/cori (accessed 18 May 2019)."},{"key":"bibr6-1094342019849618","doi-asserted-by":"publisher","DOI":"10.1109\/JSSC.1974.1050511"},{"key":"bibr7-1094342019849618","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2003.1213486"},{"key":"bibr8-1094342019849618","doi-asserted-by":"publisher","DOI":"10.1177\/1094342011428142"},{"key":"bibr9-1094342019849618","doi-asserted-by":"publisher","DOI":"10.1145\/2488551.2488553"},{"key":"bibr10-1094342019849618","doi-asserted-by":"publisher","DOI":"10.1177\/1094342010391989"},{"key":"bibr11-1094342019849618","unstructured":"Extrae (2017) Extrae instrumentation package. 
Available at: https:\/\/tools.bsc.es\/extrae (accessed 18 May 2019)."},{"key":"bibr12-1094342019849618","doi-asserted-by":"publisher","DOI":"10.1145\/3126908.3126909"},{"key":"bibr13-1094342019849618","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2016.82"},{"key":"bibr14-1094342019849618","first-page":"45","volume":"1","author":"Fuhrer O","year":"2014","journal-title":"Supercomputing Frontiers and Innovations: an International Journal"},{"key":"bibr15-1094342019849618","doi-asserted-by":"publisher","DOI":"10.1109\/WACCPD.2014.9"},{"key":"bibr16-1094342019849618","doi-asserted-by":"publisher","DOI":"10.1175\/BAMS-D-15-00278.1"},{"key":"bibr17-1094342019849618","unstructured":"Green RW (2014) Intel-vectorization and optimization reports. Available at: https:\/\/software.intel.com\/en-us\/articles\/vectorization-and-optimization-reports (accessed 18 May 2019)."},{"key":"bibr18-1094342019849618","doi-asserted-by":"publisher","DOI":"10.1016\/B978-0-12-803819-2.00016-1"},{"key":"bibr19-1094342019849618","doi-asserted-by":"publisher","DOI":"10.1016\/j.procs.2016.05.466"},{"key":"bibr20-1094342019849618","doi-asserted-by":"publisher","DOI":"10.1142\/S0129626414500030"},{"key":"bibr21-1094342019849618","doi-asserted-by":"publisher","DOI":"10.5194\/gmd-11-1799-2018"},{"key":"bibr22-1094342019849618","doi-asserted-by":"publisher","DOI":"10.1175\/1520-0493(2004)132<2293:AVLFDC>2.0.CO;2"},{"key":"bibr23-1094342019849618","doi-asserted-by":"publisher","DOI":"10.1145\/582034.582052"},{"key":"bibr24-1094342019849618","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2008.4536351"},{"key":"bibr25-1094342019849618","doi-asserted-by":"crossref","first-page":"411","DOI":"10.1177\/1094342018763966","volume":"33","author":"M\u00fcller A","year":"2016","journal-title":"The International Journal of High Performance Computing Applications"},{"key":"bibr26-1094342019849618","unstructured":"Paraview (2017) Parallel visualization application. 
Available at: https:\/\/www.paraview.org\/overview\/ (accessed 18 May 2019)."},{"key":"bibr27-1094342019849618","unstructured":"Psyclone (2015) PSyclone - a compiler for finite element\/volume\/difference DSLs in Fortran. Available at: https:\/\/github.com\/stfc\/psyclone (accessed 18 May 2019)."},{"key":"bibr28-1094342019849618","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-662-48096-0_45"},{"key":"bibr29-1094342019849618","unstructured":"Rosinski J (2017) GPTL \u2013 general purpose timing library. Available at: http:\/\/jmrosinski.github.io\/GPTL\/ (accessed 18 May 2019)."},{"key":"bibr30-1094342019849618","first-page":"105","volume-title":"Tools for High Performance Computing","author":"Servat H","year":"2011"},{"key":"bibr31-1094342019849618","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-19328-6_1"},{"key":"bibr32-1094342019849618","doi-asserted-by":"publisher","DOI":"10.1175\/1520-0493(1981)109<0758:AEAAMC>2.0.CO;2"},{"key":"bibr33-1094342019849618","doi-asserted-by":"publisher","DOI":"10.1002\/2014MS000363"},{"key":"bibr34-1094342019849618","doi-asserted-by":"publisher","DOI":"10.1006\/jcph.1996.5554"},{"key":"bibr35-1094342019849618","unstructured":"Walkup B (2017) Personal Communication."},{"key":"bibr36-1094342019849618","doi-asserted-by":"publisher","DOI":"10.1098\/rsta.2008.0219"},{"key":"bibr37-1094342019849618","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2016.5"}],"container-title":["The International Journal of High Performance Computing 
Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342019849618","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/1094342019849618","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342019849618","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342019849618","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,2]],"date-time":"2025-03-02T11:24:27Z","timestamp":1740914667000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/1094342019849618"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,5,30]]},"references-count":37,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2019,9]]}},"alternative-id":["10.1177\/1094342019849618"],"URL":"https:\/\/doi.org\/10.1177\/1094342019849618","relation":{},"ISSN":["1094-3420","1741-2846"],"issn-type":[{"type":"print","value":"1094-3420"},{"type":"electronic","value":"1741-2846"}],"subject":[],"published":{"date-parts":[[2019,5,30]]}}}