{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:46:50Z","timestamp":1750308410726,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":39,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,8,9]],"date-time":"2021-08-09T00:00:00Z","timestamp":1628467200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"U.S Department of Energy","award":["DE-AC05-00OR22725"],"award-info":[{"award-number":["DE-AC05-00OR22725"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,8,9]]},"DOI":"10.1145\/3458744.3473360","type":"proceedings-article","created":{"date-parts":[[2021,9,23]],"date-time":"2021-09-23T16:38:30Z","timestamp":1632415110000},"page":"1-8","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Evaluating the Performance of Integer Sum Reduction in SYCL on GPUs"],"prefix":"10.1145","author":[{"given":"Zheming","family":"Jin","sequence":"first","affiliation":[{"name":"Oak Ridge National Laboratory, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jeffrey","family":"Vetter","sequence":"additional","affiliation":[{"name":"Oak Ridge National Laboratory, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2021,9,23]]},"reference":[{"key":"e_1_3_2_1_1_1","first-page":"271","article-title":"Khronos OpenCL Working Group. The OpenCL Specification","volume":"1","author":"Munshi A.","year":"2007","unstructured":"Munshi , A. , Jacobs , I.S. , Bean , C.P. , Rado , G.T. and Suhl , H. , 2007 . Khronos OpenCL Working Group. The OpenCL Specification , Version 1 , pp. 271 - 350 . Munshi, A., Jacobs, I.S., Bean, C.P., Rado, G.T. and Suhl, H., 2007. Khronos OpenCL Working Group. The OpenCL Specification, Version 1, pp.271-350.","journal-title":"Version"},{"volume-title":"22nd international conference on field programmable logic and applications (pp. 531-534)","author":"Czajkowski T.S.","key":"e_1_3_2_1_2_1","unstructured":"Czajkowski , T.S. , Aydonat , U. , Denisenko , D. , Freeman , J. , Kinsner , M. , Neto , D. , Wong , J. , Yiannacouras , P. and Singh , D.P ., 2012, August. From OpenCL to high-performance hardware on FPGAs . In 22nd international conference on field programmable logic and applications (pp. 531-534) . IEEE. Czajkowski, T.S., Aydonat, U., Denisenko, D., Freeman, J., Kinsner, M., Neto, D., Wong, J., Yiannacouras, P. and Singh, D.P., 2012, August. From OpenCL to high-performance hardware on FPGAs. In 22nd international conference on field programmable logic and applications (pp. 531-534). IEEE."},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"crossref","unstructured":"Stone J.E. Gohara D. and Shi G. 2010. OpenCL: A parallel programming standard for heterogeneous computing systems. Computing in science & engineering 12(3) p.66.  Stone J.E. Gohara D. and Shi G. 2010. OpenCL: A parallel programming standard for heterogeneous computing systems. Computing in science & engineering 12(3) p.66.","DOI":"10.1109\/MCSE.2010.69"},{"key":"e_1_3_2_1_4_1","unstructured":"https:\/\/www.khronos.org\/registry\/SYCL\/specs\/sycl-1.2.1.pdf  https:\/\/www.khronos.org\/registry\/SYCL\/specs\/sycl-1.2.1.pdf"},{"volume-title":"The International Conference on High Performance Computing in Asia-Pacific Region (pp. 50-57)","author":"Ke Y.","key":"e_1_3_2_1_5_1","unstructured":"Ke , Y. , Agung , M. and Takizawa , H ., 2021, January. neoSYCL: a SYCL implementation for SX-Aurora TSUBASA . In The International Conference on High Performance Computing in Asia-Pacific Region (pp. 50-57) . Ke, Y., Agung, M. and Takizawa, H., 2021, January. neoSYCL: a SYCL implementation for SX-Aurora TSUBASA. In The International Conference on High Performance Computing in Asia-Pacific Region (pp. 50-57)."},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/3388333.3388643"},{"volume-title":"Proceedings of the International Workshop on OpenCL (pp. 1-7).","author":"Homerding B.","key":"e_1_3_2_1_7_1","unstructured":"Homerding , B. and Tramm , J ., 2020, April. Evaluating the Performance of the hipSYCL Toolchain for HPC Kernels on NVIDIA V100 GPUs . In Proceedings of the International Workshop on OpenCL (pp. 1-7). Homerding, B. and Tramm, J., 2020, April. Evaluating the Performance of the hipSYCL Toolchain for HPC Kernels on NVIDIA V100 GPUs. In Proceedings of the International Workshop on OpenCL (pp. 1-7)."},{"volume-title":"2020 IEEE International Parallel and Distributed Processing Symposium Workshops (pp. 359-367)","author":"Christgau S.","key":"e_1_3_2_1_8_1","unstructured":"Christgau , S. and Steinke , T ., 2020, May. Porting a Legacy CUDA Stencil Code to oneAPI . In 2020 IEEE International Parallel and Distributed Processing Symposium Workshops (pp. 359-367) . IEEE Christgau, S. and Steinke, T., 2020, May. Porting a Legacy CUDA Stencil Code to oneAPI. In 2020 IEEE International Parallel and Distributed Processing Symposium Workshops (pp. 359-367). IEEE"},{"key":"e_1_3_2_1_9_1","first-page":"1","article-title":"Efficiency and productivity for decision making on low-power heterogeneous CPU+ GPU SoCs","author":"Constantinescu D.A.","year":"2020","unstructured":"Constantinescu , D.A. , Navarro , A. , Corbera , F. , Fern\u00e1ndez-Madrigal , J.A. and Asenjo , R. , 2020 . Efficiency and productivity for decision making on low-power heterogeneous CPU+ GPU SoCs . The Journal of Supercomputing , pp. 1 - 22 . Constantinescu, D.A., Navarro, A., Corbera, F., Fern\u00e1ndez-Madrigal, J.A. and Asenjo, R., 2020. Efficiency and productivity for decision making on low-power heterogeneous CPU+ GPU SoCs. The Journal of Supercomputing, pp.1-22.","journal-title":"The Journal of Supercomputing"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"crossref","unstructured":"Johnston B. Vetter J.S. and Milthorpe J. 2020 November. Evaluating the Performance and Portability of Contemporary SYCL Implementations. In 2020 IEEE\/ACM International Workshop on Performance Portability and Productivity in HPC (P3HPC) (pp. 45-56). IEEE.  Johnston B. Vetter J.S. and Milthorpe J. 2020 November. Evaluating the Performance and Portability of Contemporary SYCL Implementations. In 2020 IEEE\/ACM International Workshop on Performance Portability and Productivity in HPC (P3HPC) (pp. 45-56). IEEE.","DOI":"10.1109\/P3HPC51967.2020.00010"},{"volume-title":"2019 IEEE\/ACM International Workshop on Performance, Portability and Productivity in HPC (pp. 14-25)","author":"Jo\u00f3 B.","key":"e_1_3_2_1_11_1","unstructured":"Jo\u00f3 , B. , Kurth , T. , Clark , M.A. , Kim , J. , Trott , C.R. , Ibanez , D. , Sunderland , D. and Deslippe , J ., 2019, November. Performance portability of a Wilson Dslash stencil operator mini-app using Kokkos and SYCL . In 2019 IEEE\/ACM International Workshop on Performance, Portability and Productivity in HPC (pp. 14-25) . IEEE. Jo\u00f3, B., Kurth, T., Clark, M.A., Kim, J., Trott, C.R., Ibanez, D., Sunderland, D. and Deslippe, J., 2019, November. Performance portability of a Wilson Dslash stencil operator mini-app using Kokkos and SYCL. In 2019 IEEE\/ACM International Workshop on Performance, Portability and Productivity in HPC (pp. 14-25). IEEE."},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/P3HPC49587.2019.00008"},{"volume-title":"2019 IEEE International Conference on Bioinformatics and Biomedicine (pp. 2259-2264)","author":"Jin Z.","key":"e_1_3_2_1_13_1","unstructured":"Jin , Z. and Finkel , H ., 2019, November. Evaluation of Medical Imaging Applications using SYCL . In 2019 IEEE International Conference on Bioinformatics and Biomedicine (pp. 2259-2264) . IEEE. Jin, Z. and Finkel, H., 2019, November. Evaluation of Medical Imaging Applications using SYCL. In 2019 IEEE International Conference on Bioinformatics and Biomedicine (pp. 2259-2264). IEEE."},{"volume-title":"2018 IEEE 29th International Conference on Application-specific Systems, Architectures and Processors (ASAP) (pp. 1-8). IEEE.","author":"Afzal A.","key":"e_1_3_2_1_14_1","unstructured":"Afzal , A. , Schmitt , C. , Alhaddad , S. , Grynko , Y. , Teich , J. , Forstner , J. and Hannig , F ., 2018, July. Solving Maxwell's Equations with Modern C++ and SYCL: A Case Study . In 2018 IEEE 29th International Conference on Application-specific Systems, Architectures and Processors (ASAP) (pp. 1-8). IEEE. Afzal, A., Schmitt, C., Alhaddad, S., Grynko, Y., Teich, J., Forstner, J. and Hannig, F., 2018, July. Solving Maxwell's Equations with Modern C++ and SYCL: A Case Study. In 2018 IEEE 29th International Conference on Application-specific Systems, Architectures and Processors (ASAP) (pp. 1-8). IEEE."},{"volume-title":"2016 International Symposium on Computer Architecture and High-Performance Computing Workshops (SBAC-PADW) (pp. 61-66)","author":"Da Silva H.C.","key":"e_1_3_2_1_15_1","unstructured":"Da Silva , H.C. , Pisani , F. and Borin , E ., 2016, October. A comparative study of SYCL, OpenCL, and OpenMP . In 2016 International Symposium on Computer Architecture and High-Performance Computing Workshops (SBAC-PADW) (pp. 61-66) . IEEE. Da Silva, H.C., Pisani, F. and Borin, E., 2016, October. A comparative study of SYCL, OpenCL, and OpenMP. In 2016 International Symposium on Computer Architecture and High-Performance Computing Workshops (SBAC-PADW) (pp. 61-66). IEEE."},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"crossref","unstructured":"Garland M. Le Grand S. Nickolls J. Anderson J. Hardwick J. Morton S. Phillips E. Zhang Y. and Volkov V. 2008. Parallel computing experiences with CUDA. IEEE micro 28(4) pp.13-27.  Garland M. Le Grand S. Nickolls J. Anderson J. Hardwick J. Morton S. Phillips E. Zhang Y. and Volkov V. 2008. Parallel computing experiences with CUDA. IEEE micro 28(4) pp.13-27.","DOI":"10.1109\/MM.2008.57"},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.cpc.2010.12.052"},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/3078155.3078156"},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/3293320.3293338"},{"volume-title":"Proceedings of the International Workshop on OpenCL (pp. 1-3).","author":"Babej M.","key":"e_1_3_2_1_20_1","unstructured":"Babej , M. and J\u00e4\u00e4skel\u00e4inen , P ., 2020, April. HIPCL: Tool for Porting CUDA Applications to Advanced OpenCL Platforms Through HIP . In Proceedings of the International Workshop on OpenCL (pp. 1-3). Babej, M. and J\u00e4\u00e4skel\u00e4inen, P., 2020, April. HIPCL: Tool for Porting CUDA Applications to Advanced OpenCL Platforms Through HIP. In Proceedings of the International Workshop on OpenCL (pp. 1-3)."},{"volume-title":"International Symposium on Code Generation and Optimization, 2004. CGO 2004. (pp. 75-86)","author":"Lattner C.","key":"e_1_3_2_1_21_1","unstructured":"Lattner , C. and Adve , V ., 2004, March. LLVM: A compilation framework for lifelong program analysis & transformation . In International Symposium on Code Generation and Optimization, 2004. CGO 2004. (pp. 75-86) . IEEE. Lattner, C. and Adve, V., 2004, March. LLVM: A compilation framework for lifelong program analysis & transformation. In International Symposium on Code Generation and Optimization, 2004. CGO 2004. (pp. 75-86). IEEE."},{"key":"e_1_3_2_1_22_1","unstructured":"https:\/\/github.com\/intel\/llvm  https:\/\/github.com\/intel\/llvm"},{"key":"e_1_3_2_1_23_1","unstructured":"https:\/\/github.com\/intel\/llvm\/blob\/sycl\/sycl\/doc\/CompilerAndRuntimeDesign.md  https:\/\/github.com\/intel\/llvm\/blob\/sycl\/sycl\/doc\/CompilerAndRuntimeDesign.md"},{"key":"e_1_3_2_1_24_1","volume-title":"NVIDIA Compute PTX: Parallel Thread Execution","author":"NVIDIA","year":"2008","unstructured":"NVIDIA , NVIDIA Compute PTX: Parallel Thread Execution , 1 st ed., NVIDIA Corporation , Santa Clara , California, October 2008 NVIDIA, NVIDIA Compute PTX: Parallel Thread Execution, 1st ed., NVIDIA Corporation, Santa Clara, California, October 2008","edition":"1"},{"volume-title":"2018 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) (pp. 139-148)","author":"Gera P.","key":"e_1_3_2_1_25_1","unstructured":"Gera , P. , Kim , H. , Kim , H. , Hong , S. , George , V. and Luk , C.K ., 2018, April. Performance characterisation and simulation of intel's integrated GPU architecture . In 2018 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) (pp. 139-148) . IEEE. Gera, P., Kim, H., Kim, H., Hong, S., George, V. and Luk, C.K., 2018, April. Performance characterisation and simulation of intel's integrated GPU architecture. In 2018 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) (pp. 139-148). IEEE."},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/3204919.3204933"},{"volume-title":"GPU performance analysis and optimisation","author":"Bradley T.","key":"e_1_3_2_1_27_1","unstructured":"Bradley , T. , 2012. GPU performance analysis and optimisation . NVIDIA Corporation . Bradley, T., 2012. GPU performance analysis and optimisation. NVIDIA Corporation."},{"key":"e_1_3_2_1_28_1","unstructured":"https:\/\/github.com\/ekondis\/cl2-reduce-bench\/  https:\/\/github.com\/ekondis\/cl2-reduce-bench\/"},{"key":"e_1_3_2_1_29_1","unstructured":"https:\/\/github.com\/intel\/llvm\/tree\/sycl\/sycl\/doc\/extensions\/Reduction  https:\/\/github.com\/intel\/llvm\/tree\/sycl\/sycl\/doc\/extensions\/Reduction"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1016\/B978-0-12-385963-1.00026-5"},{"key":"e_1_3_2_1_31_1","volume-title":"CUB: A pattern of \u201ccollective","author":"Merrill D.","year":"2015","unstructured":"Merrill , D. , 2015 . CUB: A pattern of \u201ccollective \u201d software design, abstraction, and reuse for kernel-level programming. Nvidia Research . Merrill, D., 2015. CUB: A pattern of \u201ccollective\u201d software design, abstraction, and reuse for kernel-level programming. Nvidia Research."},{"key":"e_1_3_2_1_32_1","unstructured":"https:\/\/nvlabs.github.io\/cub\/  https:\/\/nvlabs.github.io\/cub\/"},{"key":"e_1_3_2_1_33_1","unstructured":"Mark H. 2008. Optimizing parallel reduction in CUDA. NVIDIA CUDA SDK.  Mark H. 2008. Optimizing parallel reduction in CUDA. NVIDIA CUDA SDK."},{"volume-title":"High Performance Computing and Simulation (HPCS), 2012 International Conference on (pp. 511-519)","author":"Mart\u00edn P.J.","key":"e_1_3_2_1_34_1","unstructured":"Mart\u00edn , P.J. , Ayuso , L.F. , Torres , R. and Gavilanes , A ., 2012, July. Algorithmic strategies for optimizing the parallel reduction primitive in CUDA . In High Performance Computing and Simulation (HPCS), 2012 International Conference on (pp. 511-519) . IEEE. Mart\u00edn, P.J., Ayuso, L.F., Torres, R. and Gavilanes, A., 2012, July. Algorithmic strategies for optimizing the parallel reduction primitive in CUDA. In High Performance Computing and Simulation (HPCS), 2012 International Conference on (pp. 511-519). IEEE."},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/2775049.2602993"},{"key":"e_1_3_2_1_36_1","unstructured":"OpenCL Developer Guide for Intel\u00ae Processor Graphics. 2019 Update 4.  OpenCL Developer Guide for Intel\u00ae Processor Graphics. 2019 Update 4."},{"volume-title":"European Conference on Parallel Processing (pp. 438-452)","author":"Thoman P.","key":"e_1_3_2_1_37_1","unstructured":"Thoman , P. , Kofler , K. , Studt , H. , Thomson , J. and Fahringer , T ., 2011, August. Automatic OpenCL device characterization: Guiding optimized kernel design . In European Conference on Parallel Processing (pp. 438-452) . Springer, Berlin, Heidelberg. Thoman, P., Kofler, K., Studt, H., Thomson, J. and Fahringer, T., 2011, August. Automatic OpenCL device characterization: Guiding optimized kernel design. In European Conference on Parallel Processing (pp. 438-452). Springer, Berlin, Heidelberg."},{"volume-title":"Proceedings of the 2016 ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays (pp. 48-53)","author":"Ramanathan N.","key":"e_1_3_2_1_38_1","unstructured":"Ramanathan , N. , Wickerson , J. , Winterstein , F. and Constantinides , G.A ., 2016, February. A case for work-stealing on FPGAs with OpenCL atomics . In Proceedings of the 2016 ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays (pp. 48-53) . ACM Ramanathan, N., Wickerson, J., Winterstein, F. and Constantinides, G.A., 2016, February. A case for work-stealing on FPGAs with OpenCL atomics. In Proceedings of the 2016 ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays (pp. 48-53). ACM"},{"volume-title":"Proceedings of the International Workshop on OpenCL (pp. 1-9).","author":"Jin Z.","key":"e_1_3_2_1_39_1","unstructured":"Jin , Z. and Finkel , H ., 2018, May. Nuclear Reactor Simulation on OpenCL FPGA: a Case Study of RSBench . In Proceedings of the International Workshop on OpenCL (pp. 1-9). Jin, Z. and Finkel, H., 2018, May. Nuclear Reactor Simulation on OpenCL FPGA: a Case Study of RSBench. In Proceedings of the International Workshop on OpenCL (pp. 1-9)."}],"event":{"name":"ICPP 2021: 50th International Conference on Parallel Processing","acronym":"ICPP 2021","location":"Lemont IL USA"},"container-title":["50th International Conference on Parallel Processing Workshop"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3458744.3473360","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3458744.3473360","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T17:49:06Z","timestamp":1750268946000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3458744.3473360"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,8,9]]},"references-count":39,"alternative-id":["10.1145\/3458744.3473360","10.1145\/3458744"],"URL":"https:\/\/doi.org\/10.1145\/3458744.3473360","relation":{},"subject":[],"published":{"date-parts":[[2021,8,9]]},"assertion":[{"value":"2021-09-23","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}