{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:20:27Z","timestamp":1750220427333,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":31,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,4,27]],"date-time":"2021-04-27T00:00:00Z","timestamp":1619481600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,4,27]]},"DOI":"10.1145\/3456669.3456694","type":"proceedings-article","created":{"date-parts":[[2021,4,27]],"date-time":"2021-04-27T15:22:31Z","timestamp":1619536951000},"page":"1-10","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Toward Performance Portability of Highly Parametrizable TRSM Algorithm Using SYCL"],"prefix":"10.1145","author":[{"given":"Thales","family":"Sabino","sequence":"first","affiliation":[{"name":"Codeplay Software Ltd., UK"}]},{"given":"Mehdi","family":"Goli","sequence":"additional","affiliation":[{"name":"Codeplay Software Ltd., UK"}]}],"member":"320","published-online":{"date-parts":[[2021,4,27]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"[n.d.]. The ARM Computer Vision and Machine Learning library. https:\/\/github.com\/ARM-software\/ComputeLibrary\/"},{"key":"e_1_3_2_1_2_1","unstructured":"[n.d.]. BLAS (Basic Linear Algebra Subprograms). https:\/\/www.netlib.org\/blas\/"},{"key":"e_1_3_2_1_3_1","unstructured":"[n.d.]. clBLAS. 
https:\/\/rocmdocs.amd.com\/en\/latest\/ROCm_Tools\/clBLA.html"},{"key":"e_1_3_2_1_4_1","unstructured":"[n.d.]. The HiKey 960 development platform. https:\/\/www.96boards.org\/product\/hikey960"},{"key":"e_1_3_2_1_5_1","unstructured":"[n.d.]. OpenBLAS - An optimized BLAS library. http:\/\/www.openblas.net\/"},{"key":"e_1_3_2_1_6_1","unstructured":"[n.d.]. SYCL-BLAS: An implementation of BLAS using the SYCL open standard. https:\/\/github.com\/CodeplaySoftware\/SYCL-BLAS. Accessed: 2019-04-09."},{"key":"e_1_3_2_1_7_1","unstructured":"[n.d.]. SYCL Specification. https:\/\/www.khronos.org\/registry\/SYCL\/"},{"key":"e_1_3_2_1_8_1","unstructured":"2020. Intel\u00ae oneAPI Math Kernel Library. https:\/\/software.intel.com\/content\/www\/us\/en\/develop\/documentation\/oneapi-mkl-dpcpp-developer-reference\/top.html"},{"key":"e_1_3_2_1_9_1","unstructured":"2020. The oneAPI Specification. https:\/\/www.oneapi.com\/"},{"key":"e_1_3_2_1_10_1","unstructured":"M. Abadi P. Barham J. Chen Z. Chen Andy Davis J. Dean M. Devin Sanjay Ghemawat Geoffrey Irving M. 
Isard M. Kudlur Josh Levenberg Rajat Monga Sherry Moore D. Murray B. Steiner P. Tucker V. Vasudevan Pete Warden Martin Wicke Y. Yu and Xiaoqiang Zhang. 2016. TensorFlow: A system for large-scale machine learning. In OSDI."},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/3078155.3078189"},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.5555\/898871"},{"volume-title":"Professional CUDA c programming","author":"Cheng John","key":"e_1_3_2_1_14_1","unstructured":"John Cheng, Max Grossman, and Ty McKercher. 2014. Professional CUDA c programming. John Wiley & Sons."},{"key":"e_1_3_2_1_15_1","volume-title":"Torch: A Modular Machine Learning Software Library. (11","author":"Collobert Ronan","year":"2002","unstructured":"Ronan Collobert, Samy Bengio, and Johnny Marithoz. 2002. Torch: A Modular Machine Learning Software Library. (11 2002)."},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"crossref","unstructured":"Jack Dongarra Mark Gates Azzam Haidar Jakub Kurzak Piotr Luszczek Stanimire Tomov and Ichitaro Yamazaki. 2014. Accelerating Numerical Dense Linear Algebra Calculations with GPUs. Numerical Computations with GPUs (2014) 1\u201326.","DOI":"10.1007\/978-3-319-06548-9_1"},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3204919.3204926"},{"key":"e_1_3_2_1_18_1","volume-title":"Accessed 22","author":"Guennebaud Gael","year":"2014","unstructured":"Gael Guennebaud, Benoit Jacob, 2014. Eigen: a c++ linear algebra library. URL http:\/\/eigen.tuxfamily.org, Accessed 22 (2014)."},
{"key":"e_1_3_2_1_19_1","volume-title":"5th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA 14)","author":"Haidar Azzam","year":"2014","unstructured":"Azzam Haidar, Chongxiao Cao, Ichitaro Yamazaki, Jack Dongarra, Mark Gates, Piotr Luszczek, and Stanimire Tomov. 2014. Performance and Portability with OpenCL for Throughput-Oriented HPC Workloads Across Accelerators, Coprocessors, and Multicore Processors. In 5th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA 14). IEEE, New Orleans, LA. https:\/\/doi.org\/10.1109\/ScalA.2014.8"},{"key":"e_1_3_2_1_20_1","volume-title":"Caffe: Convolutional Architecture for Fast Feature Embedding. arxiv:1408.5093\u00a0[cs.CV]","author":"Jia Yangqing","year":"2014","unstructured":"Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. 2014. Caffe: Convolutional Architecture for Fast Feature Embedding. 
arxiv:1408.5093\u00a0[cs.CV]"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/292395.292412"},{"key":"e_1_3_2_1_22_1","first-page":"3","article-title":"Basic Linear Algebra Subprograms for Fortran Usage","volume":"5","author":"Lawson L.","year":"1979","unstructured":"C.\u00a0L. Lawson, R.\u00a0J. Hanson, D.\u00a0R. Kincaid, and F.\u00a0T. Krogh. 1979. Basic Linear Algebra Subprograms for Fortran Usage. ACM Trans. Math. Softw. 5, 3 (Sept. 1979), 308\u2013323. https:\/\/doi.org\/10.1145\/355841.355847","journal-title":"ACM Trans. Math. Softw."},{"key":"e_1_3_2_1_23_1","unstructured":"John\u00a0W. Lawson Mehdi Goli Duncan McBain Daniel Soutar and Louis Sugy. 2019. Cross-Platform Performance Portability Using Highly Parametrized SYCL Kernels. CoRR abs\/1904.05347 (2019). arxiv:1904.05347 http:\/\/arxiv.org\/abs\/1904.05347"},{"key":"e_1_3_2_1_24_1","unstructured":"Codeplay\u00a0Software Ltd. 2021. ComputeCpp CE 2.3.0. https:\/\/developer.codeplay.com\/products\/computecpp\/ce\/home"},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/3204919.3204924"},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.catena.2013.09.006"},{"key":"e_1_3_2_1_27_1","volume-title":"SYCL: Single-source C++ accelerator programming.. In PARCO. 
673\u2013682.","author":"Reyes Ruyman","year":"2015","unstructured":"Ruyman Reyes and Victor Lom\u00fcller. 2015. SYCL: Single-source C++ accelerator programming. In PARCO. 673\u2013682."},{"key":"e_1_3_2_1_28_1","volume-title":"A parallel implementation of the ordinary kriging algorithm for heterogeneous computing environments. (08","author":"Sabino Thales\u00a0Luis","year":"2017","unstructured":"Thales\u00a0Luis Sabino, Gisele Tavares, Leonardo Goliatt, Marcelo Lobosco, Filipe Chaves, and Rodrigo Santos. 2017. A parallel implementation of the ordinary kriging algorithm for heterogeneous computing environments. (08 2017)."},{"key":"e_1_3_2_1_29_1","volume-title":"OpenCL: A parallel programming standard for heterogeneous computing systems. Computing in science & engineering 12, 3","author":"Stone E","year":"2010","unstructured":"John\u00a0E Stone, David Gohara, and Guochun Shi. 2010. OpenCL: A parallel programming standard for heterogeneous computing systems. Computing in science & engineering 12, 3 (2010), 66."},{"key":"e_1_3_2_1_30_1","unstructured":"Cuda Toolkit. [n.d.]. CUBLAS Library."},
{"volume-title":"Intel Math Kernel Library","author":"Wang Endong","key":"e_1_3_2_1_31_1","unstructured":"Endong Wang, Qing Zhang, Bo Shen, Guangyong Zhang, Xiaowei Lu, Qing Wu, and Yajuan Wang. 2014. Intel Math Kernel Library. Springer International Publishing, Cham, 167\u2013188. https:\/\/doi.org\/10.1007\/978-3-319-06486-4_7"},{"key":"e_1_3_2_1_32_1","volume-title":"Enrique\u00a0S Quintana-Orti, and Gregorio Quintana-Ort\u00ed.","author":"Zee Field","year":"2009","unstructured":"Field Zee, Ernie Chan, Robert van\u00a0de Geijn, Enrique\u00a0S Quintana-Orti, and Gregorio Quintana-Ort\u00ed. 2009. Introducing: the LIBFLAME library for dense matrix computations. Computing in Science and Engineering 11 (11 2009), 56\u201363. https:\/\/doi.org\/10.1109\/MCSE.2009.207"}],"event":{"name":"IWOCL'21: International Workshop on OpenCL","acronym":"IWOCL'21","location":"Munich Germany"},"container-title":["International Workshop on 
OpenCL"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3456669.3456694","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3456669.3456694","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:46:55Z","timestamp":1750193215000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3456669.3456694"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,4,27]]},"references-count":31,"alternative-id":["10.1145\/3456669.3456694","10.1145\/3456669"],"URL":"https:\/\/doi.org\/10.1145\/3456669.3456694","relation":{},"subject":[],"published":{"date-parts":[[2021,4,27]]},"assertion":[{"value":"2021-04-27","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}