{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,22]],"date-time":"2025-08-22T04:55:32Z","timestamp":1755838532404,"version":"3.38.0"},"reference-count":30,"publisher":"SAGE Publications","issue":"3","license":[{"start":{"date-parts":[[2016,8,24]],"date-time":"2016-08-24T00:00:00Z","timestamp":1471996800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"funder":[{"name":"European Community\u2019s Seventh Framework programme ICT","award":["287703"],"award-info":[{"award-number":["287703"]}]}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["The International Journal of High Performance Computing Applications"],"published-print":{"date-parts":[[2017,5]]},"abstract":"<jats:p> The lattice Boltzmann method is a well-established numerical approach for complex fluid flow simulations. Recently, general-purpose graphics processing units (GPUs) have become available as high-performance computing resources at large scale. We report on designing and implementing a lattice Boltzmann solver for multi-GPU systems that achieves 1.79 PFLOPS performance on 16,384 GPUs. To achieve this performance, we introduce a GPU compatible version of the so-called bundle data layout and eliminate the halo sites in order to improve data access alignment. Furthermore, we make use of the possibility to overlap data transfer between the host central processing unit and the device GPU with computing on the GPU. As a benchmark case, we simulate flow in porous media and measure both strong and weak scaling performance with the emphasis being on large-scale simulations using realistic input data. <\/jats:p>","DOI":"10.1177\/1094342016658109","type":"journal-article","created":{"date-parts":[[2016,8,25]],"date-time":"2016-08-25T00:58:18Z","timestamp":1472086698000},"page":"246-255","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":4,"title":["Designing a graphics processing unit accelerated petaflop capable lattice Boltzmann solver: Read aligned data layouts and asynchronous communication"],"prefix":"10.1177","volume":"31","author":[{"given":"Fredrik","family":"Roberts\u00e9n","sequence":"first","affiliation":[{"name":"\u00c5bo Akademi University, Faculty of Science and Engineering, \u00c5bo, Finland"}]},{"given":"Jan","family":"Westerholm","sequence":"additional","affiliation":[{"name":"\u00c5bo Akademi University, Faculty of Science and Engineering, \u00c5bo, Finland"}]},{"given":"Keijo","family":"Mattila","sequence":"additional","affiliation":[{"name":"Department of Physics and Nanoscience Center, University of Jyv\u00e4skyl\u00e4, Jyv\u00e4skyl\u00e4, Finland"},{"name":"Department of Physics, Tampere University of Technology, Tampere, Finland"}]}],"member":"179","published-online":{"date-parts":[[2016,8,24]]},"reference":[{"key":"bibr1-1094342016658109","doi-asserted-by":"publisher","DOI":"10.1146\/annurev-fluid-121108-145519"},{"key":"bibr2-1094342016658109","unstructured":"AMD. Amd opteronTM 6200 series processors. Available at: http:\/\/www.amd.com\/en-us\/products\/server\/opteron\/6000\/6200 (accessed 14 September 2015)"},{"key":"bibr3-1094342016658109","doi-asserted-by":"publisher","DOI":"10.1109\/ICPP.2009.38"},{"key":"bibr4-1094342016658109","doi-asserted-by":"publisher","DOI":"10.1016\/0370-1573(92)90090-M"},{"volume-title":"Proceedings of cray user group conference (CUG 2012)","year":"2012","author":"Bland A","key":"bibr5-1094342016658109"},{"key":"bibr6-1094342016658109","doi-asserted-by":"publisher","DOI":"10.1016\/0167-2789(91)90295-K"},{"volume-title":"Cray XK7","year":"2011","key":"bibr7-1094342016658109"},{"issue":"2","key":"bibr8-1094342016658109","first-page":"427","volume":"3","author":"Ginzburg I","year":"2008","journal-title":"Communications in Computational Physics"},{"key":"bibr9-1094342016658109","doi-asserted-by":"publisher","DOI":"10.1145\/2503210.2503273"},{"key":"bibr10-1094342016658109","first-page":"167","volume-title":"Applications, tools and techniques on the road to exascale computing","author":"Gray A","year":"2011"},{"key":"bibr11-1094342016658109","doi-asserted-by":"publisher","DOI":"10.1177\/1094342015576848"},{"key":"bibr12-1094342016658109","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevE.84.062301"},{"key":"bibr13-1094342016658109","unstructured":"Lustre File SystemTM Operations Manual for Lustre. Oracle 2.0, January2011."},{"key":"bibr14-1094342016658109","doi-asserted-by":"publisher","DOI":"10.1016\/j.camwa.2007.08.001"},{"key":"bibr15-1094342016658109","doi-asserted-by":"publisher","DOI":"10.1016\/j.jocs.2015.11.013"},{"key":"bibr16-1094342016658109","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2014.67"},{"key":"bibr17-1094342016658109","unstructured":"MPI (2012) A Message-Passing Interface Standard Version 3.0. Message Passing Interface Forum, University of Tennessee, Knoxville, TN, USA."},{"key":"bibr18-1094342016658109","unstructured":"NVIDIA\u00ae TESLA\u00ae GPU ACCELERATORS. (2013) NVIDIA, Clara, CA."},{"key":"bibr19-1094342016658109","unstructured":"Oak Ridge Leadership Computing Facility (2015a) Lustre basics. Available at: https:\/\/www.olcf.ornl.gov\/kb_articles\/lustre-basics\/ (accessed 14 September 2015)"},{"key":"bibr20-1094342016658109","unstructured":"Oak Ridge Leadership Computing Facility (2015b) Atlas updates. Available at: https:\/\/www.olcf.ornl.gov\/kb_articles\/atlas-updates\/ (accessed 14 September 2015)"},{"key":"bibr21-1094342016658109","unstructured":"Oak Ridge National Laboratory (2015) Introducing titan \u2013 the world\u2019s #1 open science supercomputer. Available at: https:\/\/www.olcf.ornl.gov\/titan\/ (accessed 14 September 2015)"},{"key":"bibr22-1094342016658109","unstructured":"PCI-SIG (2009) PCI Express\u00ae Base Specification Revision 2.1. PCI-SIG. Available at: http:\/\/www.pcisig.com\/specifications\/pciexpress\/base. (accessed 14 September 2015)."},{"key":"bibr23-1094342016658109","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2004.37"},{"volume-title":"Profiler Users\u2019s Guide 6.5","year":"2014","key":"bibr24-1094342016658109"},{"key":"bibr25-1094342016658109","doi-asserted-by":"publisher","DOI":"10.1209\/0295-5075\/17\/6\/001"},{"key":"bibr26-1094342016658109","doi-asserted-by":"publisher","DOI":"10.1109\/PDP.2015.71"},{"key":"bibr27-1094342016658109","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-55919-8_13"},{"volume-title":"The Gemini Network","year":"2010","key":"bibr28-1094342016658109"},{"key":"bibr29-1094342016658109","doi-asserted-by":"publisher","DOI":"10.1016\/j.compfluid.2005.02.008"},{"key":"bibr30-1094342016658109","doi-asserted-by":"publisher","DOI":"10.1016\/j.camwa.2012.05.002"}],"container-title":["The International Journal of High Performance Computing Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342016658109","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/1094342016658109","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342016658109","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,2]],"date-time":"2025-03-02T23:16:06Z","timestamp":1740957366000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/1094342016658109"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,8,24]]},"references-count":30,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2017,5]]}},"alternative-id":["10.1177\/1094342016658109"],"URL":"https:\/\/doi.org\/10.1177\/1094342016658109","relation":{},"ISSN":["1094-3420","1741-2846"],"issn-type":[{"type":"print","value":"1094-3420"},{"type":"electronic","value":"1741-2846"}],"subject":[],"published":{"date-parts":[[2016,8,24]]}}}