{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,1]],"date-time":"2026-04-01T18:54:57Z","timestamp":1775069697564,"version":"3.50.1"},"reference-count":51,"publisher":"SAGE Publications","issue":"3","license":[{"start":{"date-parts":[[2019,2,24]],"date-time":"2019-02-24T00:00:00Z","timestamp":1550966400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"funder":[{"DOI":"10.13039\/501100004281","name":"Narodowe Centrum Nauki","doi-asserted-by":"publisher","award":["UMO-2017\/26\/D\/ST6\/00687"],"award-info":[{"award-number":["UMO-2017\/26\/D\/ST6\/00687"]}],"id":[{"id":"10.13039\/501100004281","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["The International Journal of High Performance Computing Applications"],"published-print":{"date-parts":[[2019,5]]},"abstract":"<jats:p> In this work, we take up the challenge of performance-portable programming of heterogeneous stencil computations across a wide range of modern shared-memory systems. An important example of such computations is the Multidimensional Positive Definite Advection Transport Algorithm (MPDATA), the second major part of the dynamic core of the EULAG geophysical model. To this end, we develop a set of parametric optimization techniques and a four-step procedure for customizing the MPDATA code. These techniques include the islands-of-cores strategy, (3+1)D decomposition, exploitation of data parallelism and simultaneous multithreading, data-flow synchronization, and vectorization. The proposed adaptation methodology helps us develop an automatic transformation of the MPDATA code that achieves high, scalable sustained performance on all tested ccNUMA platforms with recent generations of Intel processors. 
This means that, for a given platform, the sustained performance of the new code remains at a similar level regardless of the problem size. The highest performance utilization rate, about 41\u201346% of the theoretical peak across all benchmarks, is achieved on each of the two-socket servers based on the Skylake-SP (SKL-SP), Broadwell, and Haswell CPU architectures. At the same time, the four-socket server with SKL-SP processors achieves the highest sustained performance, around 1.0\u20131.1 Tflop\/s, which corresponds to about 33% of the peak. <\/jats:p>","DOI":"10.1177\/1094342019828153","type":"journal-article","created":{"date-parts":[[2019,2,25]],"date-time":"2019-02-25T03:14:46Z","timestamp":1551064486000},"page":"534-553","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":10,"title":["Performance portable parallel programming of heterogeneous stencils across shared-memory platforms with modern Intel processors"],"prefix":"10.1177","volume":"33","author":[{"given":"Lukasz","family":"Szustak","sequence":"first","affiliation":[{"name":"Institute of Computer and Information Science, Faculty of Mechanical Engineering and Computer Science, Czestochowa University of Technology, Czestochowa, Poland"}]},{"given":"Pawel","family":"Bratek","sequence":"additional","affiliation":[{"name":"Institute of Computer and Information Science, Faculty of Mechanical Engineering and Computer Science, Czestochowa University of Technology, Czestochowa, Poland"}]}],"member":"179","published-online":{"date-parts":[[2019,2,24]]},"reference":[{"key":"bibr1-1094342019828153","doi-asserted-by":"publisher","DOI":"10.1109\/Co-HPC.2014.4"},{"key":"bibr2-1094342019828153","first-page":"1","volume-title":"SC\u201912 proceedings of the international conference on high performance computing, networking, storage and analysis","author":"Bandishti 
V","year":"2013"},{"key":"bibr3-1094342019828153","doi-asserted-by":"publisher","DOI":"10.1145\/2751205.2751226"},{"key":"bibr4-1094342019828153","first-page":"258","volume-title":"International multi-conference on advanced computer systems (ACS 2016)","author":"Bobulski J","year":"2016"},{"key":"bibr5-1094342019828153","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2017.102"},{"key":"bibr6-1094342019828153","doi-asserted-by":"publisher","DOI":"10.1007\/s10766-016-0455-0"},{"key":"bibr7-1094342019828153","doi-asserted-by":"publisher","DOI":"10.1201\/b22395"},{"key":"bibr8-1094342019828153","doi-asserted-by":"publisher","DOI":"10.1007\/s11227-017-2159-7"},{"key":"bibr9-1094342019828153","doi-asserted-by":"publisher","DOI":"10.1137\/070693199"},{"key":"bibr10-1094342019828153","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-28145-7_11"},{"key":"bibr11-1094342019828153","volume-title":"Capabilities of Intel AVX-512 in Intel Xeon Scalable Processors (Skylake)","author":"Eltablawy A","year":"2015"},{"key":"bibr12-1094342019828153","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPSW.2015.85"},{"key":"bibr13-1094342019828153","volume-title":"Introduction to High Performance Computing for Science and Engineers","author":"Hager G","year":"2011"},{"key":"bibr14-1094342019828153","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-19861-8_13"},{"key":"bibr15-1094342019828153","unstructured":"Intel (2018) Intel 64 and IA-32 architectures optimization reference manual. Available at: https:\/\/software.intel.com\/sites\/default\/files\/managed\/9e\/bc\/64-ia-32-architectures-optimization-manual.pdf (accessed 1 July 2018)."},{"key":"bibr16-1094342019828153","unstructured":"Intel Xeon Processor (2017) Intel Xeon Processor E7-8800\/4800 v4 Product Family Specification. 
Available at: https:\/\/www.intel.com\/content\/dam\/www\/public\/us\/en\/documents\/specification-updates\/xeon-e7-v4-spec-update.pdf (accessed 1 July 2018)."},{"key":"bibr17-1094342019828153","unstructured":"Intel Xeon Processor (2018) Intel Xeon Processor Scalable Family Specification. Available at: https:\/\/www.intel.com\/content\/dam\/www\/public\/us\/en\/documents\/specification-updates\/xeon-scalable-spec-update.pdf (accessed 1 July 2018)."},{"key":"bibr18-1094342019828153","volume-title":"Intel Xeon Phi Processor High Performance Programming: Knights Landing Edition","author":"Jeffers J","year":"2016"},{"key":"bibr19-1094342019828153","doi-asserted-by":"publisher","DOI":"10.1109\/MCSE.2013.95"},{"key":"bibr20-1094342019828153","doi-asserted-by":"publisher","DOI":"10.3847\/0004-637X\/830\/2\/80"},{"key":"bibr21-1094342019828153","doi-asserted-by":"publisher","DOI":"10.1007\/s10766-015-0398-x"},{"key":"bibr22-1094342019828153","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2016.2599527"},{"key":"bibr23-1094342019828153","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2009.21"},{"key":"bibr24-1094342019828153","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2016.87"},{"key":"bibr25-1094342019828153","doi-asserted-by":"publisher","DOI":"10.1002\/cpe.3609"},{"key":"bibr26-1094342019828153","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-49956-7_3"},{"key":"bibr27-1094342019828153","unstructured":"Naono K, Teranishi K, Cavazos J, et al. (eds) (2011) Software Automatic Tuning: From Concepts to State-of-the-Art Results, Granada, Spain, 14\u201316 December 2016, pp. 30\u201342. Berlin: Springer."},{"key":"bibr28-1094342019828153","unstructured":"OpenMP (2015) OpenMP application programming interface version 4.5. 
Available at: https:\/\/www.openmp.org\/wp-content\/uploads\/openmp-4.5.pdf (accessed 1 July 2018)."},{"key":"bibr29-1094342019828153","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2017.2715809"},{"key":"bibr30-1094342019828153","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-31500-8_40"},{"key":"bibr31-1094342019828153","doi-asserted-by":"publisher","DOI":"10.7763\/IJMO.2015.V5.456"},{"key":"bibr32-1094342019828153","doi-asserted-by":"publisher","DOI":"10.1002\/fld.1071"},{"key":"bibr33-1094342019828153","doi-asserted-by":"publisher","DOI":"10.1016\/j.jcp.2012.11.008"},{"key":"bibr34-1094342019828153","doi-asserted-by":"publisher","DOI":"10.1006\/jcph.1998.5901"},{"key":"bibr35-1094342019828153","doi-asserted-by":"publisher","DOI":"10.1016\/j.jcp.2016.06.048"},{"key":"bibr36-1094342019828153","doi-asserted-by":"publisher","DOI":"10.1016\/j.asr.2016.05.043"},{"key":"bibr37-1094342019828153","doi-asserted-by":"publisher","DOI":"10.1109\/ICPP.2011.47"},{"key":"bibr38-1094342019828153","doi-asserted-by":"publisher","DOI":"10.1007\/s11227-018-2239-3"},{"key":"bibr39-1094342019828153","doi-asserted-by":"publisher","DOI":"10.1177\/1094342016677740"},{"key":"bibr40-1094342019828153","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-49956-7_30"},{"key":"bibr41-1094342019828153","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-32149-3_39"},{"key":"bibr42-1094342019828153","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-55224-3_54"},{"key":"bibr43-1094342019828153","doi-asserted-by":"publisher","DOI":"10.1155\/2015\/642705"},{"key":"bibr44-1094342019828153","first-page":"51","volume-title":"Proceedings of 1st international workshop on high-performance stencil computations, HiStencils 2014, in conjunction with HiPEAC 20\u201322 January 2014","author":"Szustak L","year":"2014"},{"key":"bibr45-1094342019828153","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-62932-2_34"},{"key":"bibr46-1094342019828153","unstructured":"Unat D and others 
(eds) (2014) Programming abstractions for data locality, Report no. 01083080, v1, November 2014. Available at: https:\/\/hal.inria.fr\/hal-01083080\/file\/PADAL-report.pdf (accessed 1 June 2018)."},{"key":"bibr47-1094342019828153","volume-title":"Parallel Programming and Optimization with Intel Xeon Phi Coprocessors","author":"Vladimirov A","year":"2015","edition":"2"},{"key":"bibr48-1094342019828153","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2011.08.006"},{"key":"bibr49-1094342019828153","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-29843-1_77"},{"key":"bibr50-1094342019828153","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2014.04.009"},{"key":"bibr51-1094342019828153","doi-asserted-by":"publisher","DOI":"10.1145\/2259016.2259044"}],"container-title":["The International Journal of High Performance Computing Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342019828153","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/1094342019828153","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342019828153","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,1]],"date-time":"2025-03-01T13:57:43Z","timestamp":1740837463000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/1094342019828153"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,2,24]]},"references-count":51,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2019,5]]}},"alternative-id":["10.1177\/1094342019828153"],"URL":"https:\/\/doi.org\/10.1177\/1094342019828153","relation":{},"ISSN":["1094-3420","1741-2846"],"issn-type":[{"value":"1094-3420","type":"print"},{"value":"1741-2846","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,2,24]]}}}