{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,29]],"date-time":"2025-09-29T08:06:43Z","timestamp":1759133203537,"version":"3.38.0"},"reference-count":32,"publisher":"SAGE Publications","issue":"2","license":[{"start":{"date-parts":[[2016,12,21]],"date-time":"2016-12-21T00:00:00Z","timestamp":1482278400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["The International Journal of High Performance Computing Applications"],"published-print":{"date-parts":[[2018,3]]},"abstract":"<jats:p> Leading high performance computing systems achieve their status through use of highly parallel devices such as NVIDIA graphics processing units or Intel Xeon Phi many-core CPUs. The concept of performance portability across such architectures, as well as traditional CPUs, is vital for the application programmer. In this paper we describe targetDP, a lightweight abstraction layer which allows grid-based applications to target data parallel hardware in a platform agnostic manner. We demonstrate the effectiveness of our pragmatic approach by presenting performance results for a complex fluid application (with which the model was co-designed), plus separate lattice quantum chromodynamics particle physics code. For each application, a single source code base is seen to achieve portable performance, as assessed within the context of the Roofline model. TargetDP can be combined with Message Passing Interface (MPI) to allow use on systems containing multiple nodes: we demonstrate this through provision of scaling results on traditional and graphics processing unit-accelerated large scale supercomputers. <\/jats:p>","DOI":"10.1177\/1094342016682071","type":"journal-article","created":{"date-parts":[[2017,12,31]],"date-time":"2017-12-31T15:33:09Z","timestamp":1514734389000},"page":"288-301","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":1,"title":["A lightweight approach to performance portability with targetDP"],"prefix":"10.1177","volume":"32","author":[{"given":"Alan","family":"Gray","sequence":"first","affiliation":[{"name":"EPCC, University of Edinburgh, UK"}]},{"given":"Kevin","family":"Stratford","sequence":"additional","affiliation":[{"name":"EPCC, University of Edinburgh, UK"}]}],"member":"179","published-online":{"date-parts":[[2016,12,21]]},"reference":[{"key":"bibr1-1094342016682071","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2012.71"},{"key":"bibr2-1094342016682071","doi-asserted-by":"publisher","DOI":"10.1093\/oso\/9780195076943.001.0001"},{"key":"bibr3-1094342016682071","unstructured":"Bull M (2013) PRACE 2IP D7.4, Unified European Application Benchmark Suite. Available at: http:\/\/www.prace-ri.eu\/ueabs\/ (accessed 28 November 2016)."},{"key":"bibr4-1094342016682071","unstructured":"Cepeda S (2012) Optimization and performance tuning for intel xeon phi coprocessors, part 2. Available at: https:\/\/software.intel.com\/en-us\/articles (accessed 28 November 2016)."},{"key":"bibr5-1094342016682071","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevLett.92.022001"},{"key":"bibr6-1094342016682071","doi-asserted-by":"publisher","DOI":"10.1063\/1.2808028"},{"key":"bibr7-1094342016682071","doi-asserted-by":"publisher","DOI":"10.1016\/S0010-4655(00)00205-8"},{"key":"bibr8-1094342016682071","doi-asserted-by":"publisher","DOI":"10.1155\/2012\/917630"},{"key":"bibr9-1094342016682071","doi-asserted-by":"publisher","DOI":"10.1177\/1094342015576848"},{"key":"bibr10-1094342016682071","unstructured":"Gray A, Hart A, Richardson A, (2012) Lattice Boltzmann for Large-Scale GPU Systems ( Advances in Parallel Computing). Amsterdam, Netherlands: IOS Press, pp. 167\u2013174."},{"volume-title":"Ludwig: Multiple GPUs for a Complex Fluid Lattice Boltzmann Application","year":"2013","author":"Gray A","key":"bibr11-1094342016682071"},{"key":"bibr12-1094342016682071","doi-asserted-by":"publisher","DOI":"10.1109\/HPCC.2014.212"},{"key":"bibr13-1094342016682071","unstructured":"Gray A, Stratford K (2016) targetDP Web page. Available at: http:\/\/ludwig.epcc.ed.ac.uk\/targetdp. (accessed 28 November 2016)."},{"key":"bibr14-1094342016682071","unstructured":"Harris M (2016) HEMI Web Page. Available at: Github.com\/harrism\/hemi. (accessed 28 November 2016)."},{"key":"bibr15-1094342016682071","doi-asserted-by":"publisher","DOI":"10.1039\/c3sm50228g"},{"key":"bibr16-1094342016682071","doi-asserted-by":"publisher","DOI":"10.2172\/1169830"},{"key":"bibr17-1094342016682071","unstructured":"Khronos OpenCL Working Group (2015) OpenCL Specification. Version 2.1. Available at: https:\/\/www.khronos.org\/opencl. (accessed 28 November 2016)."},{"key":"bibr18-1094342016682071","unstructured":"Khronos OpenCL Working Group - SYCL Subgroup (2015) SYCL Specification. Version 1.2. Available at: https:\/\/www.khronos.org\/sycl. (accessed 28 November 2016)."},{"key":"bibr19-1094342016682071","first-page":"19","author":"McCalpin JD","year":"1995","journal-title":"IEEE Computer Society Technical Committee on Computer Architecture (TCCA) Newsletter"},{"key":"bibr20-1094342016682071","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-07518-1_4"},{"key":"bibr21-1094342016682071","unstructured":"Medina DS, St-Cyr A, Warburton T (2014) Occa: A unified approach to multi-threading languages. arXiv preprint arXiv:1403.0968."},{"key":"bibr22-1094342016682071","unstructured":"NVIDIA Whitepaper (2014) NVIDIA NVLink high-speed interconnect: Application performance. Available at: http:\/\/www.nvidia.com\/object\/nvlink.html. (accessed 28 November 2016)."},{"key":"bibr23-1094342016682071","unstructured":"OpenMP Architecture Review Board (2015) OpenMP application program interface version 4.5. Available at: http:\/\/www.openmp.org\/specifications\/. (accessed 28 November 2016)."},{"key":"bibr24-1094342016682071","doi-asserted-by":"publisher","DOI":"10.1145\/2807591.2807629"},{"key":"bibr25-1094342016682071","doi-asserted-by":"publisher","DOI":"10.1145\/2784731.2784754"},{"key":"bibr26-1094342016682071","doi-asserted-by":"publisher","DOI":"10.1007\/s10955-015-1411-x"},{"key":"bibr27-1094342016682071","doi-asserted-by":"crossref","DOI":"10.1093\/oso\/9780198503989.001.0001","volume-title":"The Lattice Boltzmann Equation: For Fluid Dynamics and Beyond","author":"Succi S","year":"2001"},{"key":"bibr28-1094342016682071","unstructured":"The MILC Collaboration (2014) The MILC Code manual. Available at: http:\/\/www.physics.utah.edu\/~detar\/milc. (accessed 28 November 2016)."},{"key":"bibr29-1094342016682071","unstructured":"The OpenACC Standard Committee (2015) The OpenACC application programming interface version 2.5. Available at: http:\/\/www.openacc.org\/. (accessed 28 November 2016)."},{"key":"bibr30-1094342016682071","doi-asserted-by":"publisher","DOI":"10.1039\/c4sm00042k"},{"key":"bibr31-1094342016682071","unstructured":"Volkov V (2010) Better performance at lower occupancy. In: Proceedings of the GPU technology conference, GTC, San Jose, CA, 20\u201323 September 2010, volume 10. Available at: www.nvidia.com\/content\/gtc-2010\/pdfs\/2238_gtc2010.pdf (accessed 28 November 2016)."},{"key":"bibr32-1094342016682071","doi-asserted-by":"publisher","DOI":"10.1145\/1498765.1498785"}],"container-title":["The International Journal of High Performance Computing Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342016682071","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/1094342016682071","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342016682071","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,2,28]],"date-time":"2025-02-28T16:40:21Z","timestamp":1740760821000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/1094342016682071"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,12,21]]},"references-count":32,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2018,3]]}},"alternative-id":["10.1177\/1094342016682071"],"URL":"https:\/\/doi.org\/10.1177\/1094342016682071","relation":{},"ISSN":["1094-3420","1741-2846"],"issn-type":[{"type":"print","value":"1094-3420"},{"type":"electronic","value":"1741-2846"}],"subject":[],"published":{"date-parts":[[2016,12,21]]}}}