{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,31]],"date-time":"2026-01-31T05:02:07Z","timestamp":1769835727639,"version":"3.49.0"},"publisher-location":"Cham","reference-count":48,"publisher":"Springer International Publishing","isbn-type":[{"value":"9783319075174","type":"print"},{"value":"9783319075181","type":"electronic"}],"license":[{"start":{"date-parts":[[2014,1,1]],"date-time":"2014-01-01T00:00:00Z","timestamp":1388534400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/www.springer.com\/tdm"},{"start":{"date-parts":[[2014,1,1]],"date-time":"2014-01-01T00:00:00Z","timestamp":1388534400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.springer.com\/tdm"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2014]]},"DOI":"10.1007\/978-3-319-07518-1_4","type":"book-chapter","created":{"date-parts":[[2014,6,3]],"date-time":"2014-06-03T06:54:08Z","timestamp":1401778448000},"page":"53-75","source":"Crossref","is-referenced-by-count":29,"title":["On the Performance Portability of Structured Grid Codes on Many-Core Computer Architectures"],"prefix":"10.1007","author":[{"given":"Simon","family":"McIntosh-Smith","sequence":"first","affiliation":[]},{"given":"Michael","family":"Boulton","sequence":"additional","affiliation":[]},{"given":"Dan","family":"Curran","sequence":"additional","affiliation":[]},{"given":"James","family":"Price","sequence":"additional","affiliation":[]}],"member":"297","reference":[{"key":"4_CR1","unstructured":"Moore, G.: Cramming more components onto integrated circuits. Electronics Magazine, 114\u2013117 (April 1965)"},{"key":"4_CR2","unstructured":"Demmel, J., Dongarra, J., Parlett, B., Kahan, W., Gu, M., Bindel, D., Hida, Y., Li, X., Marques, O., Riedy, E.J., et al.: Prospectus for a dense linear algebra software library (April 2006)"},{"key":"4_CR3","doi-asserted-by":"crossref","unstructured":"Munshi, A. (ed.): The Khronous OpenCL Working Group: The OpenCL specification (2008)","DOI":"10.1109\/HOTCHIPS.2009.7478342"},{"key":"4_CR4","volume-title":"AMBER 2012","author":"D. Case","year":"2012","unstructured":"Case, D., Darden, T., Cheatham III, T., Simmerling, C., Wang, J., Duke, R., Luo, R., Walker, R., Zhang, W., Merz, K., et al.: AMBER 2012. University of California, San Francisco (2012)"},{"key":"4_CR5","doi-asserted-by":"crossref","unstructured":"G\u00f6tz, A.W., Williamson, M.J., Xu, D., Poole, D., Le Grand, S., Walker, R.C.: Routine microsecond molecular dynamics simulations with AMBER on GPUs. 1. Generalized Born. Journal of Chemical Theory and Computation\u00a08(5), 1542\u20131555 (2012)","DOI":"10.1021\/ct200909j"},{"issue":"9","key":"4_CR6","doi-asserted-by":"publisher","first-page":"3878","DOI":"10.1021\/ct400314y","volume":"9","author":"R. Salomon-Ferrer","year":"2013","unstructured":"Salomon-Ferrer, R., G\u00f6tz, A.W., Poole, D., Le Grand, S., Walker, R.C.: Routine microsecond molecular dynamics simulations with AMBER on GPUs. 2. Explicit Solvent Particle Mesh Ewald. Journal of Chemical Theory and Computation\u00a09(9), 3878\u20133888 (2013)","journal-title":"Journal of Chemical Theory and Computation"},{"issue":"2","key":"4_CR7","doi-asserted-by":"publisher","first-page":"374","DOI":"10.1016\/j.cpc.2012.09.022","volume":"184","author":"S.L. Grand","year":"2013","unstructured":"Grand, S.L., G\u00f6tz, A.W., Walker, R.C.: SPFP: Speed without compromise\u2014a mixed precision model for GPU accelerated molecular dynamics simulations. Computer Physics Communications\u00a0184(2), 374\u2013380 (2013)","journal-title":"Computer Physics Communications"},{"key":"4_CR8","series-title":"Lecture Notes in Computer Science","doi-asserted-by":"publisher","first-page":"110","DOI":"10.1007\/978-3-642-28145-7_11","volume-title":"Applied Parallel and Scientific Computing","author":"A. Davidson","year":"2012","unstructured":"Davidson, A., Owens, J.: Toward techniques for auto-tuning gpu algorithms. In: J\u00f3nasson, K. (ed.) PARA 2010, Part II. LNCS, vol.\u00a07134, pp. 110\u2013119. Springer, Heidelberg (2012)"},{"key":"4_CR9","series-title":"Lecture Notes in Computer Science","doi-asserted-by":"publisher","first-page":"136","DOI":"10.1007\/978-3-642-38750-0_11","volume-title":"Supercomputing","author":"Y. Zhang","year":"2013","unstructured":"Zhang, Y., Sinclair II, M., Chien, A.A.: Improving performance portability in OpenCL programs. In: Kunkel, J.M., Ludwig, T., Meuer, H.W. (eds.) ISC 2013. LNCS, vol.\u00a07905, pp. 136\u2013150. Springer, Heidelberg (2013)"},{"key":"4_CR10","doi-asserted-by":"crossref","unstructured":"McIntosh-Smith, S., Price, J., Sessions, R.B., Ibarra, A.A.: High performance in silico virtual drug screening on many-core processors. International Journal of High Performance Computing Applications (IJHPCA) (April 2014)","DOI":"10.1177\/1094342014528252"},{"key":"4_CR11","unstructured":"McIntosh-Smith, S., Sessions, R.B.: An accelerated, computer assisted molecular modeling method for drug design. In: International Supercomputing (June 2008)"},{"key":"4_CR12","unstructured":"Asanovic, K., Bodik, R., Catanzaro, B.C., Gebis, J.J., Husbands, P., Keutzer, K., Patterson, D.A., Plishker, W.L., Shalf, J., Williams, S.W., Yelick, K.A.: The landscape of parallel computing research: A view from Berkeley. Technical Report UCB\/EECS-2006-183, EECS Department, University of California, Berkeley (2006)"},{"key":"4_CR13","unstructured":"Colella, P.: Defining software requirements for scientific computing (2004)"},{"key":"4_CR14","first-page":"275","volume":"66","author":"L. Boltzmann","year":"1872","unstructured":"Boltzmann, L.: Weitere studien \u00fcber das W\u00e4rmegleichgewicht unter gasmolek\u00fclen (further studies on the heat equilibrium of gas molecules). Wiener Berichte\u00a066, 275\u2013370 (1872)","journal-title":"Wiener Berichte"},{"issue":"6","key":"4_CR15","doi-asserted-by":"publisher","first-page":"479","DOI":"10.1209\/0295-5075\/17\/6\/001","volume":"17","author":"Y.H. Qian","year":"1992","unstructured":"Qian, Y.H., D\u2019Humi\u00e8res, D., Lallemand, P.: Lattice BGK models for Navier-Stokes equation. EPL (Europhysics Letters)\u00a017(6), 479 (1992)","journal-title":"EPL (Europhysics Letters)"},{"key":"4_CR16","doi-asserted-by":"crossref","unstructured":"Succi, S.: The Lattice Boltzmann Equation: For Fluid Dynamics and Beyond. Numerical Mathematics and Scientific Computation. Clarendon Press (2001)","DOI":"10.1093\/oso\/9780198503989.001.0001"},{"issue":"5","key":"4_CR17","doi-asserted-by":"publisher","first-page":"266","DOI":"10.1016\/j.advengsoft.2010.10.007","volume":"42","author":"J. Habich","year":"2011","unstructured":"Habich, J., Zeiser, T., Hager, G., Wellein, G.: Performance analysis and optimization strategies for a D3Q19 lattice Boltzmann kernel on nVIDIA GPUs using CUDA. Advances in Engineering Software\u00a042(5), 266\u2013272 (2011)","journal-title":"Advances in Engineering Software"},{"key":"4_CR18","doi-asserted-by":"crossref","unstructured":"Mawson, M., Revell, A.: Memory transfer optimization for a lattice Boltzmann solver on Kepler architecture nVidia GPUs. arXiv preprint arXiv:1309.1983 (2013)","DOI":"10.1016\/j.cpc.2014.06.003"},{"key":"4_CR19","doi-asserted-by":"crossref","unstructured":"Januszewski, M., Kostur, M.: Sailfish: a flexible multi-GPU implementation of the lattice Boltzmann method. ArXiv e-prints (November 2013)","DOI":"10.1016\/j.cpc.2014.04.018"},{"issue":"9","key":"4_CR20","doi-asserted-by":"publisher","first-page":"973","DOI":"10.1002\/fld.711","volume":"45","author":"C.B. Allen","year":"2004","unstructured":"Allen, C.B.: An unsteady multiblock multigrid scheme for lifting forward flight rotor simulation. International Journal for Numerical Methods in Fluids\u00a045(9), 973\u2013984 (2004)","journal-title":"International Journal for Numerical Methods in Fluids"},{"issue":"10","key":"4_CR21","doi-asserted-by":"publisher","first-page":"2126","DOI":"10.1002\/nme.1846","volume":"69","author":"C.B. Allen","year":"2007","unstructured":"Allen, C.B.: Parallel universal approach to mesh motion and application to rotors in forward flight. International Journal for Numerical Methods in Engineering\u00a069(10), 2126\u20132149 (2007)","journal-title":"International Journal for Numerical Methods in Engineering"},{"issue":"6","key":"4_CR22","doi-asserted-by":"publisher","first-page":"632","DOI":"10.1002\/nme.1723","volume":"68","author":"C.B. Allen","year":"2006","unstructured":"Allen, C.B.: Parallel simulation of unsteady hovering rotor wakes. International Journal for Numerical Methods in Engineering\u00a068(6), 632\u2013649 (2006)","journal-title":"International Journal for Numerical Methods in Engineering"},{"issue":"10","key":"4_CR23","doi-asserted-by":"publisher","first-page":"1519","DOI":"10.1002\/nme.2219","volume":"74","author":"T.C.S. Rendall","year":"2008","unstructured":"Rendall, T.C.S., Allen, C.B.: Unified fluid\u2013structure interpolation and mesh motion using radial basis functions. International Journal for Numerical Methods in Engineering\u00a074(10), 1519\u20131559 (2008)","journal-title":"International Journal for Numerical Methods in Engineering"},{"issue":"1","key":"4_CR24","doi-asserted-by":"publisher","first-page":"97","DOI":"10.1007\/s11081-011-9179-6","volume":"14","author":"C.B. Allen","year":"2013","unstructured":"Allen, C.B., Rendall, T.C.: CFD-based optimization of hovering rotors using radial basis functions for shape parameterization and mesh deformation. Optimization and Engineering\u00a014(1), 97\u2013118 (2013)","journal-title":"Optimization and Engineering"},{"key":"4_CR25","doi-asserted-by":"crossref","unstructured":"Herdman, J., Gaudin, W., McIntosh-Smith, S., Boulton, M., Beckingsale, D., Mallinson, A., Jarvis, S.: Accelerating hydrocodes with OpenACC, OpenCL and CUDA. In: High Performance Computing, Networking, Storage and Analysis (SCC), 2012 SC Companion, pp. 465\u2013471 (November 2012)","DOI":"10.1109\/SC.Companion.2012.66"},{"key":"4_CR26","unstructured":"Heroux, M.A., Doerfler, D.W., Crozier, P.S., Willenbring, J.M., Edwards, H.C., Williams, A., Rajan, M., Keiter, E.R., Thornquist, H.K., Numrich, R.W.: Improving performance via mini-applications. Sandia National Laboratories. Tech. Rep. (2009)"},{"key":"4_CR27","unstructured":"Sandia National Laboratory: The Mantevo project home page (February 2014), http:\/\/mantevo.org"},{"key":"4_CR28","unstructured":"Mallinson, A.C., Beckingsale, D.A., Gaudin, W.P., Herdman, J.A., Jarvis, S.A.: Towards portable performance for explicit hydrodynamics codes. In: Proceedings of the 1st International Workshop on OpenCL (IWOCL 2013). ACM (May 2013)"},{"key":"4_CR29","doi-asserted-by":"crossref","unstructured":"Saad, Y.: Iterative methods for sparse linear systems. SIAM (2003)","DOI":"10.1137\/1.9780898718003"},{"key":"4_CR30","series-title":"Lecture Notes in Computer Science","doi-asserted-by":"publisher","first-page":"414","DOI":"10.1007\/978-3-642-36949-0_47","volume-title":"Euro-Par 2012: Parallel Processing Workshops","author":"H. Servat","year":"2013","unstructured":"Servat, H., Teruel, X., Llort, G., Duran, A., Gim\u00e9nez, J., Martorell, X., Ayguad\u00e9, E., Labarta, J.: On the instrumentation of OpenMP and OmpSs tasking constructs. In: Caragiannis, I., et al. (eds.) Euro-Par Workshops 2012. LNCS, vol.\u00a07640, pp. 414\u2013428. Springer, Heidelberg (2013)"},{"key":"4_CR31","unstructured":"Komatsu, K., Sato, K., Arai, Y., Koyama, K., Takizawa, H., Kobayashi, H.: Evaluating performance and portability of OpenCL programs. In: The Fifth International Workshop on Automatic Performance Tuning (2010)"},{"key":"4_CR32","unstructured":"Rul, S., Vandierendonck, H., D\u2019Haene, J., De Bosschere, K.: An experimental study on performance portability of OpenCL kernels. In: 2010 Symposium on Application Accelerators in High Performance Computing (2010) (papers)"},{"key":"4_CR33","doi-asserted-by":"crossref","unstructured":"Seo, S., Jo, G., Lee, J.: Performance characterization of the NAS parallel benchmarks in OpenCL. In: 2011 IEEE International Symposium on Workload Characterization (IISWC), pp. 137\u2013148. IEEE (2011)","DOI":"10.1109\/IISWC.2011.6114174"},{"issue":"11","key":"4_CR34","doi-asserted-by":"publisher","first-page":"1439","DOI":"10.1016\/j.jpdc.2012.07.005","volume":"73","author":"S. Pennycook","year":"2013","unstructured":"Pennycook, S., Hammond, S., Wright, S., Herdman, J., Miller, I., Jarvis, S.: An investigation of the performance portability of OpenCL. Journal of Parallel and Distributed Computing\u00a073(11), 1439\u20131450 (2013)","journal-title":"Journal of Parallel and Distributed Computing"},{"issue":"8","key":"4_CR35","doi-asserted-by":"publisher","first-page":"391","DOI":"10.1016\/j.parco.2011.10.002","volume":"38","author":"P. Du","year":"2012","unstructured":"Du, P., Weber, R., Luszczek, P., Tomov, S., Peterson, G., Dongarra, J.: From CUDA to OpenCL: Towards a performance-portable solution for multi-platform GPU programming. Parallel Computing\u00a038(8), 391\u2013407 (2012)","journal-title":"Parallel Computing"},{"key":"4_CR36","unstructured":"Cao, C., Dongarra, J., Du, P., Gates, M., Luszczek, P., Tomov, S.: clMAGMA: High performance dense linear algebra with OpenCL. Technical report (lawn 275), ut-cs-13-706, University of Tennessee Computer Science (March 2013)"},{"key":"4_CR37","unstructured":"Habich, J., Feichtinger, C., Kostler, H., Hager, G., Wellein, G.: Performance engineering for the lattice Boltzmann method on GPGPUs: Architectural requirements and performance results. ArXiv e-prints (December 2011)"},{"key":"4_CR38","unstructured":"Gray, A., Stratford, K.: Ludwig: multiple GPUs for a complex fluid lattice Boltzmann application. In: Couturier, R. (ed.) Designing Scientific Applications on GPUs. Chapman & Hall\/CRC Numerical Analysis and Scientific Computing Series, Taylor & Francis (2013)"},{"key":"4_CR39","unstructured":"Gray, A., Hart, A., Henrich, O., Stratford, K.: Scaling soft matter physics to thousands of GPUs in parallel (2013)"},{"issue":"7","key":"4_CR40","doi-asserted-by":"publisher","first-page":"707","DOI":"10.1007\/s11434-011-4908-y","volume":"57","author":"Q. Xiong","year":"2012","unstructured":"Xiong, Q., Li, B., Xu, J., Fang, X., Wang, X., Wang, L., He, X., Ge, W.: Efficient parallel implementation of the lattice Boltzmann method on large clusters of graphic processing units. Chinese Science Bulletin\u00a057(7), 707\u2013715 (2012)","journal-title":"Chinese Science Bulletin"},{"issue":"2","key":"4_CR41","doi-asserted-by":"publisher","first-page":"113","DOI":"10.1016\/j.jocs.2011.01.008","volume":"2","author":"M. Geveler","year":"2011","unstructured":"Geveler, M., Ribbrock, D., Mallach, S., Goddeke, D.: A simulation suite for Lattice-Boltzmann based real-time CFD applications exploiting multi-level parallelism on modern multi- and many-core architectures. Journal of Computational Science\u00a02(2), 113\u2013123 (2011)","journal-title":"Journal of Computational Science"},{"key":"4_CR42","doi-asserted-by":"crossref","unstructured":"Brandvik, T., Pullan, G.: Acceleration of a 3D Euler solver using commodity graphics hardware. In: 46th AIAA Aerospace Sciences Meeting and Exhibit, January 2008, pp. 607\u2013661 (2008)","DOI":"10.2514\/6.2008-607"},{"issue":"24","key":"4_CR43","doi-asserted-by":"publisher","first-page":"10148","DOI":"10.1016\/j.jcp.2008.08.023","volume":"227","author":"E. Elsen","year":"2008","unstructured":"Elsen, E., LeGresley, P., Darve, E.: Large calculation of the flow over a hypersonic vehicle using a GPU. Journal of Computational Physics\u00a0227(24), 10148\u201310161 (2008)","journal-title":"Journal of Computational Physics"},{"key":"4_CR44","unstructured":"Cohen, J., Molemaker, M.J.: A fast double precision CFD code using CUDA. In: Parallel Computational Fluid Dynamics: Recent Advances and Future Directions, pp. 414\u2013429 (2009)"},{"key":"4_CR45","doi-asserted-by":"crossref","unstructured":"G\u00f6ddeke, D., Buijssen, S., Wobker, H., Turek, S.: GPU acceleration of an unmodified parallel finite element Navier-Stokes solver. In: International Conference on High Performance Computing Simulation, HPCS 2009, pp. 12\u201321 (June 2009)","DOI":"10.1109\/HPCSIM.2009.5191718"},{"key":"4_CR46","doi-asserted-by":"crossref","unstructured":"Phillips, E.H., Zhang, Y., Davis, R.L., Owens, J.D.: Rapid aerodynamic performance prediction on a cluster of graphics processing units. In: Proceedings of the 47th AIAA Aerospace Sciences Meeting, pp. 1\u201311 (2009)","DOI":"10.2514\/6.2009-565"},{"key":"4_CR47","doi-asserted-by":"crossref","unstructured":"Barnette, D.W., Barrett, R.F., Hammond, S.D., Jayaraj, J., Laros III, J.H.: Using miniapplications in a Mantevo framework for optimizing Sandia\u2019s SPARC CFD code on multi-core, many-core, and GPU-accelerated compute platforms. Technical report, Sandia National Laboratories (2012)","DOI":"10.2514\/6.2013-1126"},{"key":"4_CR48","unstructured":"Mallinson, A., Beckingsale, D., Gaudin, W., Herdman, J., Levesque, J., Jarvis, S.: CloverLeaf: Preparing hydrodynamics codes for Exascale. Cray User Group (CUG), Napa Valley (2013)"}],"container-title":["Lecture Notes in Computer Science","Supercomputing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/978-3-319-07518-1_4","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,5,27]],"date-time":"2024-05-27T07:54:33Z","timestamp":1716796473000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/978-3-319-07518-1_4"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014]]},"ISBN":["9783319075174","9783319075181"],"references-count":48,"URL":"https:\/\/doi.org\/10.1007\/978-3-319-07518-1_4","relation":{},"ISSN":["0302-9743","1611-3349"],"issn-type":[{"value":"0302-9743","type":"print"},{"value":"1611-3349","type":"electronic"}],"subject":[],"published":{"date-parts":[[2014]]}}}