{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:29:33Z","timestamp":1750220973429,"version":"3.41.0"},"reference-count":45,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2018,6,30]],"date-time":"2018-06-30T00:00:00Z","timestamp":1530316800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Advanced Computation and I\/O Methods for Earth-System Simulations"},{"name":"Joint Usage\/Research Center for Interdisciplinary Large-scale Information Infrastructures"},{"name":"Japan Science and Technology Agency (JST) Core Research of Evolutional Science and Technology"},{"name":"Scientific Researc","award":["26220002"],"award-info":[{"award-number":["26220002"]}]},{"name":"Highly Productive, High Performance Application Frameworks for Post Peta-scale Computing"},{"name":"High Performance Computing Infrastructure"},{"DOI":"10.13039\/501100001691","name":"KAKENHI","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100001691","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100001700","name":"Ministry of Education, Culture, Sports, Science and Technology (MEXT) of Japan","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100001700","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Software for Exascale Computing"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Parallel Comput."],"published-print":{"date-parts":[[2018,6,30]]},"abstract":"<jats:p>We introduce \u201cHybrid Fortran,\u201d a new approach that allows a high-performance GPGPU port for structured grid Fortran codes. This technique only requires minimal changes for a CPU targeted codebase, which is a significant advancement in terms of productivity. 
It has been successfully applied to both dynamical core and physical processes of ASUCA, a Japanese mesoscale weather prediction model with more than 150k lines of code. By means of a minimal weather application that resembles ASUCA\u2019s code structure, Hybrid Fortran is compared to both a performance model as well as today\u2019s commonly used method, OpenACC. As a result, the Hybrid Fortran implementation is shown to deliver the same or better performance than OpenACC, and its performance agrees with the model both on CPU and GPU. In a full-scale production run, using an ASUCA grid with 1581 \u00d7 1301 \u00d7 58 cells and real-world weather data in 2km resolution, 24 NVIDIA Tesla P100 running the Hybrid Fortran\u2013based GPU port are shown to replace more than fifty 18-core Intel Xeon Broadwell E5-2695 v4 running the reference implementation\u2014an achievement comparable to more invasive GPGPU rewrites of other weather models.<\/jats:p>","DOI":"10.1145\/3291523","type":"journal-article","created":{"date-parts":[[2018,12,19]],"date-time":"2018-12-19T13:07:08Z","timestamp":1545224828000},"page":"1-42","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["New High Performance GPGPU Code Transformation Framework Applied to Large Production Weather Prediction Code"],"prefix":"10.1145","volume":"5","author":[{"given":"Michel","family":"M\u00fcller","sequence":"first","affiliation":[{"name":"Tokyo Institute of Technology, Meguro-ku, Tokyo"}]},{"given":"Takayuki","family":"Aoki","sequence":"additional","affiliation":[{"name":"Tokyo Institute of Technology, Meguro-ku, Tokyo"}]}],"member":"320","published-online":{"date-parts":[[2018,12,19]]},"reference":[{"volume-title":"Retrieved","year":"2014","author":"Bokhanko Andrey","key":"e_1_2_1_1_1"},{"key":"e_1_2_1_2_1","first-page":"1","article-title":"A review of the challenges and results of refactoring the community climate code COSMO for hybrid Cray HPC 
systems","volume":"2013","author":"Cumming Ben","year":"2013","journal-title":"Proceedings of Cray User Group"},{"key":"e_1_2_1_3_1","first-page":"21","article-title":"Cache optimization for structured and unstructured grid multigrid","volume":"10","author":"Douglas Craig C.","year":"2000","journal-title":"Electr. Trans. Numer. Anal."},{"volume-title":"Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA\u201909)","year":"2009","author":"Dursun Hikmet","key":"e_1_2_1_4_1"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jpdc.2014.07.003"},{"volume-title":"Retrieved","year":"2014","author":"Fuhrer Oliver","key":"e_1_2_1_6_1"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.14529\/jsfi140103"},{"volume-title":"Retrieved","year":"2012","author":"Govett Mark","key":"e_1_2_1_8_1"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/CCGRID.2010.106"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/WACCPD.2014.9"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1175\/BAMS-D-15-00278.1"},{"volume-title":"Retrieved","year":"2012","author":"The Portland Group","key":"e_1_2_1_12_1"},{"key":"e_1_2_1_13_1","doi-asserted-by":"crossref","unstructured":"Mark Harris. 2007. Optimizing CUDA. SC07: High Performance Computing with CUDA (2007).","DOI":"10.1145\/1281500.1281650"},{"volume-title":"Retrieved","year":"2010","key":"e_1_2_1_14_1"},{"volume-title":"Retrieved","year":"2012","key":"e_1_2_1_15_1"},{"volume-title":"Retrieved","year":"2016","key":"e_1_2_1_16_1"},{"key":"e_1_2_1_17_1","first-page":"0511","article-title":"Development of a new nonhydrostatic model ASUCA at JMA","volume":"40","author":"Ishida Junichi","year":"2010","journal-title":"CAS\/JSC WGNE Res. Activ. Atmos. 
Oceanic Model."},{"volume-title":"Retrieved","year":"2016","author":"NVIDIA Inc. James Beyer.","key":"e_1_2_1_18_1"},{"volume-title":"Retrieved","year":"2013","author":"Cray Inc. James C. Beyer.","key":"e_1_2_1_19_1"},{"volume-title":"International Conference on Parallel Processing and Applied Mathematics. Springer, 145--153","year":"2001","author":"Kwiatkowski Jan","key":"e_1_2_1_20_1"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1142\/S0129626414500030"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/1188455.1188677"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/2063384.2063398"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2008.4536351"},{"volume-title":"SPIE Sensing Technology+ Applications","author":"Mielikainen Jarno","key":"e_1_2_1_25_1"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/JSTARS.2012.2186119"},{"volume-title":"Hybrid Fortran: High productivity GPU porting framework applied to Japanese weather prediction model","year":"2018","author":"M\u00fcller Michel","key":"e_1_2_1_27_1"},{"volume-title":"Retrieved","year":"2015","author":"ACC.","key":"e_1_2_1_29_1"},{"volume-title":"Retrieved","year":"2012","author":"Preshing Jeff","key":"e_1_2_1_30_1"},{"volume-title":"December 22, 2017 from http:\/\/www.theregister.co.uk\/2012\/11\/12\/nvidia_tesla_k20_k20x_gpu_coprocessors\/?page=2.","year":"2012","author":"Morgan Timothy Prickett","key":"e_1_2_1_31_1"},{"volume-title":"Proceedings of the 9th International Workshop on OpenMP (IWOMP'13)","author":"Rendell Alistair P.","key":"e_1_2_1_32_1"},{"volume-title":"Proceedings of the Many-Core and Reconfigurable Supercomputing Conference.","year":"2010","author":"Ruetsch Greg","key":"e_1_2_1_33_1"},{"key":"e_1_2_1_34_1","unstructured":"M. Sakamoto J. Ishida K Kawano K. Matsubayashi K. Aranami T. Hara H. Kusabiraki C. Muroi and Y. Kitamura. 2014. 
Development of Yin-Yang Grid Global Model Using a New Dynamical Core ASUCA. (2014)."},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.procs.2011.04.166"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2010.9"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2014.26"},{"key":"e_1_2_1_38_1","unstructured":"Herb Sutter. 2005. The free lunch is over: A fundamental turn toward concurrency in software. Dr. Dobb\u2019s J. 30 3 (2005) 202--210."},{"key":"e_1_2_1_39_1","unstructured":"Irina Tezaur Jerry Watkins and Irina Demeshko. {n.d.}. 
Towards performance-portability of the albany\/FELIX land-ice solver to new and emerging architectures using Kokkos (unpublished)."},{"volume-title":"Retrieved","year":"2013","author":"Architecture Review Board The","key":"e_1_2_1_40_1"},{"volume-title":"Retrieved","year":"2016","key":"e_1_2_1_41_1"},{"volume-title":"Retrieved","year":"2017","author":"Tokyo University","key":"e_1_2_1_43_1"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCSim.2013.6641457"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1175\/1520-0493(2002)130<2088:TSMFEM>2.0.CO;2"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/1498765.1498785"},{"volume-title":"Retrieved","year":"2017","author":"Xu Rengan","key":"e_1_2_1_47_1"}],"container-title":["ACM Transactions on Parallel Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3291523","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3291523","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T23:54:33Z","timestamp":1750204473000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3291523"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,6,30]]},"references-count":45,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2018,6,30]]}},"alternative-id":["10.1145\/3291523"],"URL":"https:\/\/doi.org\/10.1145\/3291523","relation":{},"ISSN":["2329-4949","2329-4957"],"issn-type":[{"type":"print","value":"2329-4949"},{"type":"electronic","value":"2329-4957"}],"subject":[],"published":{"date-parts":[[2018,6,30]]},"assertion":[{"value":"2017-07-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2018-02-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2018-12-19","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}