{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,18]],"date-time":"2025-10-18T20:55:50Z","timestamp":1760820950526,"version":"3.38.0"},"reference-count":41,"publisher":"SAGE Publications","issue":"4","license":[{"start":{"date-parts":[[2016,12,23]],"date-time":"2016-12-23T00:00:00Z","timestamp":1482451200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["The International Journal of High Performance Computing Applications"],"published-print":{"date-parts":[[2018,7]]},"abstract":"<jats:p> Modern heterogeneous computing platforms have become powerful HPC solutions, which could be applied to a wide range of real-life applications. In particular, the hybrid platforms equipped with Intel Xeon Phi coprocessors offer the advantages of massively parallel computing, while supporting practically the same parallel programming model as conventional homogeneous solutions. However, there is still an open issue as to how scientific applications can efficiently utilize hybrid platforms with Intel MIC coprocessors. In this article, we propose an approach for porting a real-life scientific application to such hybrid platforms, assuming no significant modifications of the application code. It allows us to take advantage of all the computing components, including two CPUs and two coprocessors, for the parallel execution of computational workloads. In this study, we focus on the parallel implementation of a numerical model of the dendritic solidification process in isothermal conditions. We develop a sequence of steps that are necessary for the porting and optimization of the solidification application to hybrid platforms with Intel coprocessors. The main challenges include not only overlapping data movements with computations, but also ensuring adequate utilization of cores\/threads and vector units of processors, as well as coprocessors. To reach this aim, we propose an efficient and flexible method for the workload distribution between heterogeneous computing components. For implementing the potential benefits of the proposed approach, we choose a heterogeneous programming model based on a combination of the offload mode for Intel MIC and OpenMP programming standard. The developed approach allows us to execute the whole application up to 9.33\u00d7 faster than the original parallel version that uses two CPUs. Furthermore, the CPU\u2013MIC hybrid platforms enable achieving the speedup of about 1.9\u00d7 that of the CPU platform with 24 cores based on the Ivy Bridge architecture, and about 1.5\u00d7 that of the Haswell-based CPU platform with 36 cores. <\/jats:p>","DOI":"10.1177\/1094342016677740","type":"journal-article","created":{"date-parts":[[2017,12,31]],"date-time":"2017-12-31T15:33:09Z","timestamp":1514734389000},"page":"523-539","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":10,"title":["Porting and optimization of solidification application for CPU\u2013MIC hybrid platforms"],"prefix":"10.1177","volume":"32","author":[{"given":"Lukasz","family":"Szustak","sequence":"first","affiliation":[{"name":"Insitute of Computer and Information Science, Czestochowa University of Technology, Poland"}]},{"given":"Kamil","family":"Halbiniak","sequence":"additional","affiliation":[{"name":"Insitute of Computer and Information Science, Czestochowa University of Technology, Poland"}]},{"given":"Lukasz","family":"Kuczynski","sequence":"additional","affiliation":[{"name":"Insitute of Computer and Information Science, Czestochowa University of Technology, Poland"}]},{"given":"Joanna","family":"Wrobel","sequence":"additional","affiliation":[{"name":"Insitute of Computer and Information Science, Czestochowa University of Technology, Poland"}]},{"given":"Adam","family":"Kulawik","sequence":"additional","affiliation":[{"name":"Insitute of Computer and Information Science, Czestochowa University of Technology, Poland"}]}],"member":"179","published-online":{"date-parts":[[2016,12,23]]},"reference":[{"issue":"2","key":"bibr1-1094342016677740","first-page":"89","volume":"40","author":"Adrian H","year":"2009","journal-title":"Archives of Materials Science and Engineering"},{"key":"bibr2-1094342016677740","first-page":"251","volume-title":"Leading-Edge Applied Mathematical Modeling Research","author":"Benito J","year":"2008"},{"key":"bibr3-1094342016677740","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2015.88"},{"key":"bibr4-1094342016677740","doi-asserted-by":"publisher","DOI":"10.1016\/j.commatsci.2011.12.019"},{"key":"bibr5-1094342016677740","unstructured":"Colfax International (2015) Colfax Servers based on Intel\u00ae Xeon Phi\u2122 Coprocessors. Available at: http:\/\/www.colfax-intl.com\/nd\/xeonphi\/servers.aspx (accessed 20 November 2016)."},{"key":"bibr6-1094342016677740","unstructured":"Corden M (2013) Differences in floating-point arithmetic between Intel Xeon Processors and the Intel Xeon Phi Coprocessor. Intel Corporation. Available at: https:\/\/software.intel.com\/en-us\/articles\/differences-in-floating-point-arithmeticbetween-intel-xeon-processors-and-the-intel-xeon (accessed 28 March 2013)."},{"key":"bibr7-1094342016677740","unstructured":"Corden M, Kreitzer D (2015) Consistency of floating-point results using the Intel Compiler. Software Solutions Group, Intel Corporation. Available at: https:\/\/software.intel.com\/en-us\/articles\/consistency-of-floating-point-resultsusing-the-intel-compiler (accessed 2 August 2012)."},{"key":"bibr8-1094342016677740","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevE.60.1734"},{"volume-title":"Introduction to High Performance Computing for Science and Engineers","year":"2011","author":"Hager G","key":"bibr9-1094342016677740"},{"key":"bibr10-1094342016677740","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2014.6"},{"journal-title":"Intel Xeon Phi Coprocessor System Software Developers Guide","year":"2013","author":"Intel Corporation","key":"bibr11-1094342016677740"},{"key":"bibr12-1094342016677740","unstructured":"Intel Corporation (2015) Intel Product Specifications. Available at: http:\/\/ark.intel.com\/ (accessed 20 November 2016)."},{"key":"bibr13-1094342016677740","unstructured":"IT4Innovations (2015) National Supercomputing Center IT4Innovations. Available at: http:\/\/www.it4i.cz."},{"volume-title":"Intel Xeon Phi Coprocessor High-Performance Programming","year":"2014","author":"Jeffers J","key":"bibr14-1094342016677740"},{"key":"bibr15-1094342016677740","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevLett.87.045501"},{"key":"bibr16-1094342016677740","volume-title":"The Modeling of the Phenomena of the Heat Treatment of the Medium Carbon Steel","volume":"281","author":"Kulawik A","year":"2013"},{"key":"bibr17-1094342016677740","doi-asserted-by":"crossref","unstructured":"Kurzak J, Bader D, Dongarra J (eds.) (2011) Scientific Computing with Multicore and Accelerators. Boca Raton, FL: CRC Press.","DOI":"10.1201\/b10376"},{"key":"bibr18-1094342016677740","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-27161-3_77"},{"key":"bibr19-1094342016677740","doi-asserted-by":"publisher","DOI":"10.1109\/PADSW.2014.7097852"},{"key":"bibr20-1094342016677740","unstructured":"Liviero B (2015) Intel Xeon Phi: Application and solutions catalogue. Available at: https:\/\/software.intel.com\/en-us\/xeonphionlinecatalog (accessed 20 November 2016)."},{"key":"bibr21-1094342016677740","doi-asserted-by":"publisher","DOI":"10.1016\/S1359-6454(00)00360-8"},{"key":"bibr22-1094342016677740","unstructured":"MICLAB (2015) Pilot Laboratory of Massively Parallel Systems (MICLAB). Available at: http:\/\/miclab.pl (accessed 20 November 2016)."},{"key":"bibr23-1094342016677740","unstructured":"OpenMP (2015) OpenMP Application Programming Interface. Available at: http:\/\/www.openmp.org\/"},{"journal-title":"Parallel Programming and Optimization with Intel Xeon Phi Coprocessors","year":"2013","author":"Colfax International","key":"bibr24-1094342016677740"},{"key":"bibr25-1094342016677740","doi-asserted-by":"publisher","DOI":"10.1002\/9783527631520"},{"key":"bibr26-1094342016677740","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4302-5927-5"},{"key":"bibr27-1094342016677740","doi-asserted-by":"publisher","DOI":"10.1145\/2063384.2063388"},{"key":"bibr28-1094342016677740","doi-asserted-by":"publisher","DOI":"10.1088\/0965-0393\/17\/7\/073001"},{"key":"bibr29-1094342016677740","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-32149-3_39"},{"key":"bibr30-1094342016677740","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-55224-3_54"},{"key":"bibr31-1094342016677740","doi-asserted-by":"publisher","DOI":"10.1155\/2015\/642705"},{"key":"bibr32-1094342016677740","first-page":"51","volume-title":"Proceedings of the 1st international workshop on high-performance stencil computations (HiStencils\u201914)","author":"Szustak L","year":"2014"},{"key":"bibr33-1094342016677740","doi-asserted-by":"publisher","DOI":"10.2355\/isijinternational.54.437"},{"key":"bibr34-1094342016677740","unstructured":"Vladimirov A (2015) Performance to power and performance to cost ratios with Intel Xeon Phi Coprocessors (And why 1\u00d7 acceleration may be enough). 27 January 2015, 8 pages. Sunnyvale, CA: Colfax International. Available at: https:\/\/colfaxresearch.com\/performance-to-power-and-performance-to-cost-ratioswith-intel-xeon-phi-coprocessors-and-why-1x-acceleration-may-be-enough\/ (accessed 20 November 2016)."},{"key":"bibr35-1094342016677740","doi-asserted-by":"publisher","DOI":"10.1016\/0956-7151(94)00285-P"},{"key":"bibr36-1094342016677740","first-page":"49","volume-title":"Proceedings of the fourth workshop on irregular applications: Architectures and algorithms","author":"Wolfe N","year":"2014"},{"key":"bibr37-1094342016677740","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2011.08.006"},{"key":"bibr38-1094342016677740","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2014.04.009"},{"key":"bibr39-1094342016677740","first-page":"434","volume":"8353","author":"Wyrzykowski R","year":"2014","journal-title":"Lecture Notes in Computer Science"},{"key":"bibr40-1094342016677740","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2014.2366754"},{"key":"bibr41-1094342016677740","doi-asserted-by":"publisher","DOI":"10.1016\/j.apm.2012.08.005"}],"container-title":["The International Journal of High Performance Computing Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342016677740","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/1094342016677740","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342016677740","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,2,27]],"date-time":"2025-02-27T18:45:56Z","timestamp":1740681956000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/1094342016677740"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,12,23]]},"references-count":41,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2018,7]]}},"alternative-id":["10.1177\/1094342016677740"],"URL":"https:\/\/doi.org\/10.1177\/1094342016677740","relation":{},"ISSN":["1094-3420","1741-2846"],"issn-type":[{"type":"print","value":"1094-3420"},{"type":"electronic","value":"1741-2846"}],"subject":[],"published":{"date-parts":[[2016,12,23]]}}}