{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,27]],"date-time":"2025-07-27T07:19:57Z","timestamp":1753600797601,"version":"3.38.0"},"reference-count":28,"publisher":"SAGE Publications","issue":"4","license":[{"start":{"date-parts":[[2013,7,18]],"date-time":"2013-07-18T00:00:00Z","timestamp":1374105600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["The International Journal of High Performance Computing Applications"],"published-print":{"date-parts":[[2013,11]]},"abstract":"<jats:p> The Gyrokinetic Toroidal Code (GTC) uses the particle-in-cell method to efficiently simulate plasma microturbulence. This work presents novel analysis and optimization techniques to enhance the performance of GTC on large-scale machines. We introduce cell access analysis to better manage locality vs. synchronization tradeoffs on CPU and GPU-based architectures. Our optimized hybrid parallel implementation of GTC uses MPI, OpenMP, and NVIDIA CUDA, achieves up to a 2\u00d7 speedup over the reference Fortran version on multiple parallel systems, and scales efficiently to tens of thousands of cores. <\/jats:p>","DOI":"10.1177\/1094342013492446","type":"journal-article","created":{"date-parts":[[2013,7,19]],"date-time":"2013-07-19T04:32:06Z","timestamp":1374208326000},"page":"454-473","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":12,"title":["Analysis and optimization of gyrokinetic toroidal simulations on homogenous and heterogenous platforms"],"prefix":"10.1177","volume":"27","author":[{"given":"Khaled Z","family":"Ibrahim","sequence":"first","affiliation":[{"name":"Lawrence Berkeley National Laboratory, USA"}]},{"given":"Kamesh","family":"Madduri","sequence":"additional","affiliation":[{"name":"The Pennsylvania State University, University Park, USA"}]},{"given":"Samuel","family":"Williams","sequence":"additional","affiliation":[{"name":"Lawrence Berkeley National Laboratory, USA"}]},{"given":"Bei","family":"Wang","sequence":"additional","affiliation":[{"name":"Princeton Institute of Computational Science and Engineering, Princeton University, USA"}]},{"given":"Stephane","family":"Ethier","sequence":"additional","affiliation":[{"name":"Princeton Plasma Physics Laboratory, USA"}]},{"given":"Leonid","family":"Oliker","sequence":"additional","affiliation":[{"name":"Lawrence Berkeley National Laboratory, USA"}]}],"member":"179","published-online":{"date-parts":[[2013,7,18]]},"reference":[{"key":"bibr1-1094342013492446","first-page":"012001","volume":"78","author":"Adams M","year":"2007","journal-title":"Journal of Physics"},{"key":"bibr2-1094342013492446","doi-asserted-by":"publisher","DOI":"10.1145\/369028.369108"},{"key":"bibr3-1094342013492446","doi-asserted-by":"publisher","DOI":"10.1063\/1.4822978"},{"key":"bibr4-1094342013492446","doi-asserted-by":"publisher","DOI":"10.1006\/jcph.2001.6851"},{"key":"bibr5-1094342013492446","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2008.5222734"},{"key":"bibr6-1094342013492446","first-page":"180","volume-title":"recent advances in parallel virtual machine and message passing interface (Euro PVM\/MPI)","author":"Briguglio S","year":"1996"},{"key":"bibr7-1094342013492446","doi-asserted-by":"publisher","DOI":"10.1002\/(SICI)1096-9128(199712)9:12<1377::AID-CPE284>3.0.CO;2-Q"},{"key":"bibr8-1094342013492446","doi-asserted-by":"publisher","DOI":"10.1016\/j.cpc.2007.02.092"},{"key":"bibr9-1094342013492446","doi-asserted-by":"publisher","DOI":"10.1016\/j.cpc.2010.11.009"},{"key":"bibr10-1094342013492446","first-page":"1","volume":"16","author":"Ethier S","year":"2005","journal-title":"Journal of Physics"},{"key":"bibr11-1094342013492446","doi-asserted-by":"publisher","DOI":"10.1147\/rd.521.0105"},{"first-page":"342","volume-title":"international conference on computational science (ICCS \u201802)","year":"2002","key":"bibr12-1094342013492446"},{"key":"bibr13-1094342013492446","doi-asserted-by":"publisher","DOI":"10.1887\/0852743920"},{"key":"bibr14-1094342013492446","doi-asserted-by":"publisher","DOI":"10.1016\/j.jcp.2006.01.039"},{"key":"bibr15-1094342013492446","unstructured":"Koniges A, Preissl R, Kim J, et al. (2009) Application acceleration on current and future Cray platforms. In: Cray user group meeting, Edinburgh, Scotland, 24\u201327 May 2010."},{"key":"bibr16-1094342013492446","doi-asserted-by":"publisher","DOI":"10.1016\/0021-9991(87)90080-5"},{"key":"bibr17-1094342013492446","doi-asserted-by":"publisher","DOI":"10.1126\/science.281.5384.1835"},{"key":"bibr18-1094342013492446","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevE.52.5646"},{"key":"bibr19-1094342013492446","doi-asserted-by":"crossref","unstructured":"Madduri K, Ibrahim KZ, Williams S, et al. (2011a) Gyrokinetic toroidal simulations on leading multi- and manycore HPC systems. In:  Proceedings Conference on High Performance Computing Networking, Storage and Analysis (SC 2011), Seattle, WA, USA, 12-18 November 2011, article number 23. New York, USA: ACM.","DOI":"10.1145\/2063384.2063415"},{"key":"bibr19a-1094342013492446","doi-asserted-by":"crossref","first-page":"501","DOI":"10.1016\/j.parco.2011.02.001","volume":"37","author":"Madduri K","year":"2011","journal-title":"Parallel Computing"},{"key":"bibr20-1094342013492446","doi-asserted-by":"publisher","DOI":"10.1145\/1654059.1654108"},{"key":"bibr21-1094342013492446","first-page":"012087","volume":"125","author":"Marin G","year":"2008","journal-title":"Journal of Physics"},{"key":"bibr22-1094342013492446","doi-asserted-by":"publisher","DOI":"10.1145\/1542275.1542293"},{"key":"bibr23-1094342013492446","doi-asserted-by":"publisher","DOI":"10.1016\/j.jcp.2003.11.004"},{"key":"bibr24-1094342013492446","doi-asserted-by":"publisher","DOI":"10.1016\/j.jcp.2005.10.011"},{"key":"bibr25-1094342013492446","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2004.54"},{"key":"bibr26-1094342013492446","doi-asserted-by":"publisher","DOI":"10.1016\/S0010-4655(01)00417-9"},{"key":"bibr27-1094342013492446","doi-asserted-by":"publisher","DOI":"10.1016\/j.jpdc.2008.05.009"}],"container-title":["The International Journal of High Performance Computing Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342013492446","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/1094342013492446","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342013492446","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,2]],"date-time":"2025-03-02T17:43:43Z","timestamp":1740937423000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/1094342013492446"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,7,18]]},"references-count":28,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2013,11]]}},"alternative-id":["10.1177\/1094342013492446"],"URL":"https:\/\/doi.org\/10.1177\/1094342013492446","relation":{},"ISSN":["1094-3420","1741-2846"],"issn-type":[{"type":"print","value":"1094-3420"},{"type":"electronic","value":"1741-2846"}],"subject":[],"published":{"date-parts":[[2013,7,18]]}}}