{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,1]],"date-time":"2026-05-01T22:53:05Z","timestamp":1777675985764,"version":"3.51.4"},"reference-count":40,"publisher":"SAGE Publications","issue":"3","license":[{"start":{"date-parts":[[2022,3,21]],"date-time":"2022-03-21T00:00:00Z","timestamp":1647820800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"funder":[{"DOI":"10.13039\/501100008530","name":"European Regional Development Fund","doi-asserted-by":"publisher","award":["RTI2018-098156-B-C53"],"award-info":[{"award-number":["RTI2018-098156-B-C53"]}],"id":[{"id":"10.13039\/501100008530","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["The International Journal of High Performance Computing Applications"],"published-print":{"date-parts":[[2022,5]]},"abstract":"<jats:p>This work covers the use of the PHAST Library, a hardware-agnostic programming library, in a real-world application: the Caffe framework. The original implementation of Caffe consists of two different versions of the source code: one to run on CPUs and another to run on GPUs. With PHAST, we aim to develop a single-source implementation capable of running efficiently on both CPUs and GPUs. In this paper, we start by carrying out a performance analysis of a basic Caffe implementation using PHAST. Then, we detail possible performance improvements. We find that the overall performance is dominated by a few \u2018heavy\u2019 layers. In refining the inefficient parts of this version, we identify two approaches, improvements to the Caffe source code and improvements to the PHAST Library itself, both of which ultimately translate into improved performance in the PHAST version of Caffe. 
We demonstrate that our PHAST implementation achieves performance portability on CPUs and GPUs. With a single source, the PHAST version of Caffe delivers performance equal to or better than that of the original version of Caffe, which is built from two different codebases. For the MNIST database, the PHAST implementation takes the same amount of time as native code on both CPU and GPU. Furthermore, with the CIFAR-10 database, PHAST achieves speedups of 51% and 49% over native code on CPU and GPU, respectively. These results open a new horizon for software development in the upcoming heterogeneous computing era.<\/jats:p>","DOI":"10.1177\/10943420221077107","type":"journal-article","created":{"date-parts":[[2022,3,21]],"date-time":"2022-03-21T08:53:32Z","timestamp":1647852812000},"page":"419-439","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":2,"title":["Performance portability in a real world application: PHAST applied to Caffe"],"prefix":"10.1177","volume":"36","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4391-2451","authenticated-orcid":false,"given":"Pablo Antonio","family":"Mart\u00ednez","sequence":"first","affiliation":[{"name":"Computer Engineering Department, University of Murcia, Murcia, Spain"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4998-0092","authenticated-orcid":false,"given":"Biagio","family":"Peccerillo","sequence":"additional","affiliation":[{"name":"Department of Information Engineering and Mathematics, University of Siena, Siena, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sandro","family":"Bartolini","sequence":"additional","affiliation":[{"name":"Department of Information Engineering and Mathematics, University of Siena, Siena, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6388-2835","authenticated-orcid":false,"given":"Jos\u00e9 M","family":"Garc\u00eda","sequence":"additional","affiliation":[{"name":"Computer Engineering Department, University of Murcia, Murcia, Spain"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7265-3508","authenticated-orcid":false,"given":"Gregorio","family":"Bernab\u00e9","sequence":"additional","affiliation":[{"name":"Computer Engineering Department, University of Murcia, Murcia, Spain"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"179","published-online":{"date-parts":[[2022,3,21]]},"reference":[{"key":"bibr1-10943420221077107","first-page":"265","volume-title":"12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16)","author":"Abadi M","year":"2016"},{"key":"bibr2-10943420221077107","unstructured":"Adve S, Bodik R (2019) I-USHER: Interfaces to Unlock the Specialized Hardware Revolution. Information Science and Technology (ISAT), p. 27. URL http:\/\/rsim.cs.illinois.edu\/Talks\/I-USHER.pdf."},{"key":"bibr3-10943420221077107","unstructured":"Aksel Alpay (2019) hipSYCL - an implementation of SYCL over NVIDIA CUDA\/AMD HIP. URL https:\/\/github.com\/illuhad\/hipSYCL"},{"key":"bibr4-10943420221077107","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2019.102584"},{"key":"bibr5-10943420221077107","doi-asserted-by":"publisher","DOI":"10.1145\/567806.567807"},{"key":"bibr6-10943420221077107","doi-asserted-by":"publisher","DOI":"10.1145\/3295500.3356162"},{"key":"bibr7-10943420221077107","unstructured":"CodePlay (2019) ComputeCpp - Accelerate Complex C++ Applications on Heterogeneous Compute Systems using Open Standards. 
URL https:\/\/www.codeplay.com\/products\/computesuite\/computecpp."},{"key":"bibr8-10943420221077107","doi-asserted-by":"publisher","DOI":"10.1007\/s11227-020-03257-3"},{"key":"bibr9-10943420221077107","doi-asserted-by":"publisher","DOI":"10.1023\/A:1008373903657"},{"key":"bibr10-10943420221077107","first-page":"10","volume-title":"Efficient Deep Learning for Computer Vision (ECV) Workshop","author":"Dukhan M","year":"2019"},{"key":"bibr11-10943420221077107","doi-asserted-by":"crossref","unstructured":"Edwards HC, Trott CR (2013) Kokkos: enabling performance portability across manycore architectures. In: 2013 Extreme Scaling Workshop (XSW 2013), Boulder, CO, USA, 15\u201316 August 2013, pp. 18\u201324.","DOI":"10.1109\/XSW.2013.7"},{"key":"bibr12-10943420221077107","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2018.00069"},{"key":"bibr13-10943420221077107","first-page":"11","volume-title":"13th International Workshop on Programmability and Architectures for Heterogeneous Multicores","author":"G\u00f3mez-Hern\u00e1ndez EJ","year":"2020"},{"key":"bibr14-10943420221077107","doi-asserted-by":"publisher","DOI":"10.1109\/P3HPC51967.2020.00008"},{"key":"bibr15-10943420221077107","volume-title":"Neural Network Accelerator Comparison","author":"Guo K","year":"2021"},{"key":"bibr16-10943420221077107","doi-asserted-by":"publisher","DOI":"10.1145\/3282307"},{"key":"bibr17-10943420221077107","unstructured":"Hill MD, Reddi VJ (2020) Accelerator-level Parallelism. arXiv. URL https:\/\/arxiv.org\/abs\/1907.02064"},{"key":"bibr18-10943420221077107","unstructured":"Intel (2020) oneAPI Specification. 
URL https:\/\/spec.oneapi.com\/versions\/latest\/oneAPI-spec.pdf"},{"key":"bibr19-10943420221077107","doi-asserted-by":"publisher","DOI":"10.1145\/2647868.2654889"},{"key":"bibr20-10943420221077107","doi-asserted-by":"publisher","DOI":"10.1145\/3079856.3080246"},{"key":"bibr21-10943420221077107","volume-title":"SYCL Provisional Specification","author":"Khronos OpenCL Working Group","year":"2019"},{"key":"bibr22-10943420221077107","first-page":"15","volume-title":"Conference Track Proceedings. 3rd International Conference on Learning Representations, ICLR 2015","author":"Kingma DP","year":"2015"},{"key":"bibr23-10943420221077107","doi-asserted-by":"publisher","DOI":"10.1145\/3200691.3178493"},{"key":"bibr24-10943420221077107","unstructured":"Lattner C, Amini M, Bondhugula U, et al. (2020) MLIR: A Compiler Infrastructure for the End of Moore\u2019s Law, p. 21. URL https:\/\/arxiv.org\/abs\/2002.11054."},{"key":"bibr25-10943420221077107","doi-asserted-by":"publisher","DOI":"10.5281\/zenodo.5549433"},{"key":"bibr26-10943420221077107","doi-asserted-by":"publisher","DOI":"10.1109\/N-SSC.2006.4785860"},{"key":"bibr27-10943420221077107","doi-asserted-by":"publisher","DOI":"10.2172\/1332474"},{"key":"bibr28-10943420221077107","unstructured":"NVIDIA (2021) CUDA C Programming Guide. 
URL docs.nvidia.com\/cuda\/pdf\/CUDA_C_Programming_Guide.pdf"},{"key":"bibr29-10943420221077107","first-page":"8024","volume-title":"Proceedings of the 33rd International Conference on Neural Information Processing Systems","author":"Paszke A","year":"2019"},{"key":"bibr30-10943420221077107","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2018.2855182"},{"key":"bibr31-10943420221077107","doi-asserted-by":"publisher","DOI":"10.1145\/3303084.3309496"},{"key":"bibr32-10943420221077107","doi-asserted-by":"publisher","DOI":"10.1002\/cpe.5842"},{"key":"bibr33-10943420221077107","doi-asserted-by":"publisher","DOI":"10.5281\/zenodo.5557540"},{"key":"bibr34-10943420221077107","doi-asserted-by":"publisher","DOI":"10.1016\/j.future.2017.08.007"},{"key":"bibr35-10943420221077107","doi-asserted-by":"publisher","DOI":"10.1109\/MCSE.2021.3097276"},{"key":"bibr36-10943420221077107","doi-asserted-by":"publisher","DOI":"10.1109\/hpec43674.2020.9286149"},{"key":"bibr37-10943420221077107","unstructured":"Rotem N, Fix J, Abdulrasool S, et al. (2019) Glow: Graph lowering compiler techniques for neural networks. arXiv. 
URL https:\/\/arxiv.org\/abs\/1805.00907."},{"key":"bibr38-10943420221077107","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2017.2761740"},{"key":"bibr39-10943420221077107","first-page":"19","volume":"30","author":"Yamada Y","year":"2018","journal-title":"Proceedings of A Symposium on High Performance Chips"},{"key":"bibr40-10943420221077107","first-page":"5776","volume-title":"Proceedings of the 35th International Conference on Machine Learning, Proceedings of Machine Learning Research","volume":"80","author":"Zhang J","year":"2018"}],"container-title":["The International Journal of High Performance Computing Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/10943420221077107","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/10943420221077107","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/10943420221077107","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T08:17:19Z","timestamp":1777450639000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/10943420221077107"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,3,21]]},"references-count":40,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2022,5]]}},"alternative-id":["10.1177\/10943420221077107"],"URL":"https:\/\/doi.org\/10.1177\/10943420221077107","relation":{},"ISSN":["1094-3420","1741-2846"],"issn-type":[{"value":"1094-3420","type":"print"},{"value":"1741-2846","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,3,21]]}}}