{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,15]],"date-time":"2025-12-15T19:51:05Z","timestamp":1765828265977,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":26,"publisher":"ACM","license":[{"start":{"date-parts":[[2023,4,18]],"date-time":"2023-04-18T00:00:00Z","timestamp":1681776000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2023,4,18]]},"DOI":"10.1145\/3585341.3585342","type":"proceedings-article","created":{"date-parts":[[2023,4,6]],"date-time":"2023-04-06T14:05:24Z","timestamp":1680789924000},"page":"1-12","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["Implementation Techniques for SPMD Kernels on CPUs"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2656-9863","authenticated-orcid":false,"given":"Joachim","family":"Meyer","sequence":"first","affiliation":[{"name":"Compiler Design Lab, Saarland Informatics Campus, Saarland University, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1976-6375","authenticated-orcid":false,"given":"Aksel","family":"Alpay","sequence":"additional","affiliation":[{"name":"Engineering Mathematics and Computing Lab, Interdisciplinary Center for Scientific Computing, Heidelberg University, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3387-2134","authenticated-orcid":false,"given":"Sebastian","family":"Hack","sequence":"additional","affiliation":[{"name":"Compiler Design Lab, Saarland Informatics Campus, Saarland University, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9562-0680","authenticated-orcid":false,"given":"Holger","family":"Fr\u00f6ning","sequence":"additional","affiliation":[{"name":"Computing Systems Group, Institute of 
Computer Engineering, Heidelberg University, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2217-7558","authenticated-orcid":false,"given":"Vincent","family":"Heuveline","sequence":"additional","affiliation":[{"name":"Engineering Mathematics and Computing Lab, Interdisciplinary Center for Scientific Computing, Heidelberg University, Germany"}]}],"member":"320","published-online":{"date-parts":[[2023,4,18]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/3388333.3388658"},{"key":"e_1_3_2_1_2_1","volume-title":"Unique Features and SYCL 2020. In International Workshop on OpenCL","author":"Alpay Aksel","year":"2021","unstructured":"Aksel Alpay and Vincent Heuveline . 2021 . HipSYCL in 2021: Peculiarities , Unique Features and SYCL 2020. In International Workshop on OpenCL ( Munich, Germany) (IWOCL\u201921). Association for Computing Machinery, New York, NY, USA, Article 18, 1\u00a0pages. https:\/\/doi.org\/10.1145\/3456669.3456691 10.1145\/3456669.3456691 Aksel Alpay and Vincent Heuveline. 2021. HipSYCL in 2021: Peculiarities, Unique Features and SYCL 2020. In International Workshop on OpenCL (Munich, Germany) (IWOCL\u201921). Association for Computing Machinery, New York, NY, USA, Article 18, 1\u00a0pages. https:\/\/doi.org\/10.1145\/3456669.3456691"},{"key":"e_1_3_2_1_3_1","volume-title":"International Workshop on OpenCL","author":"Alpay Aksel","year":"2023","unstructured":"Aksel Alpay and Vincent Heuveline . 2023 . One pass to bind them: The first single-pass SYCL compiler . In International Workshop on OpenCL ( Cambridge, United Kingdom) (IWOCL\u201923). Association for Computing Machinery, New York, NY, USA. https:\/\/doi.org\/10.1145\/ Aksel Alpay and Vincent Heuveline. 2023. One pass to bind them: The first single-pass SYCL compiler. In International Workshop on OpenCL (Cambridge, United Kingdom) (IWOCL\u201923). Association for Computing Machinery, New York, NY, USA. 
https:\/\/doi.org\/10.1145\/"},{"key":"e_1_3_2_1_4_1","volume-title":"OpenMP Application Program Interface. OpenMP Architecture Review Board. Version 5.0","author":"Architecture\u00a0Review Board MP","year":"2018","unstructured":"Open MP Architecture\u00a0Review Board . 2018. OpenMP Application Program Interface. OpenMP Architecture Review Board. Version 5.0 November 2018 . OpenMP Architecture\u00a0Review Board. 2018. OpenMP Application Program Interface. OpenMP Architecture Review Board. Version 5.0 November 2018."},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/1735688.1735702"},{"key":"e_1_3_2_1_6_1","volume-title":"Benchmarking and Extending SYCL Hierarchical Parallelism. In Workshop on Hierarchical Parallelism for Exascale Computing. IEEE Computer Society, United States, 10\u201319","author":"Deakin Tom","year":"2021","unstructured":"Tom Deakin , Simon N McIntosh-Smith , Aksel Alpay , and Vincent Heuveline . 2021 . Benchmarking and Extending SYCL Hierarchical Parallelism. In Workshop on Hierarchical Parallelism for Exascale Computing. IEEE Computer Society, United States, 10\u201319 . Tom Deakin, Simon N McIntosh-Smith, Aksel Alpay, and Vincent Heuveline. 2021. Benchmarking and Extending SYCL Hierarchical Parallelism. In Workshop on Hierarchical Parallelism for Exascale Computing. IEEE Computer Society, United States, 10\u201319."},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/3528425.3529099"},{"volume-title":"The OpenCL\u2122\u00a0 Specification","author":"The Khronos\u00ae OpenCL\u00a0Working Group","key":"e_1_3_2_1_8_1","unstructured":"The Khronos\u00ae OpenCL\u00a0Working Group . 2021. The OpenCL\u2122\u00a0 Specification . Khronos\u00ae Group . Version 3.0.8, retrieved on 19.08.21 from https:\/\/www.khronos.org\/registry\/OpenCL\/specs\/3.0-unified\/pdf\/OpenCL_API.pdf. The Khronos\u00ae OpenCL\u00a0Working Group. 2021. The OpenCL\u2122\u00a0 Specification. Khronos\u00ae Group. 
Version 3.0.8, retrieved on 19.08.21 from https:\/\/www.khronos.org\/registry\/OpenCL\/specs\/3.0-unified\/pdf\/OpenCL_API.pdf."},{"volume-title":"The OpenCL\u2122\u00a0C Specification","author":"The Khronos\u00ae OpenCL\u00a0Working Group","key":"e_1_3_2_1_9_1","unstructured":"The Khronos\u00ae OpenCL\u00a0Working Group . 2022. The OpenCL\u2122\u00a0C Specification . Khronos\u00ae Group . Version 3.0.12, retrieved on 22.09.22 from https:\/\/registry.khronos.org\/OpenCL\/specs\/3.0-unified\/html\/OpenCL_C.html. The Khronos\u00ae OpenCL\u00a0Working Group. 2022. The OpenCL\u2122\u00a0C Specification. Khronos\u00ae Group. Version 3.0.12, retrieved on 22.09.22 from https:\/\/registry.khronos.org\/OpenCL\/specs\/3.0-unified\/html\/OpenCL_C.html."},{"key":"e_1_3_2_1_10_1","volume-title":"COX: Exposing CUDA Warp-Level Functions to CPUs. ACM Trans. Archit. Code Optim. (jul","author":"Han Ruobing","year":"2022","unstructured":"Ruobing Han , Jaewon Lee , Jaewoong Sim , and Hyesoon Kim . 2022 . COX: Exposing CUDA Warp-Level Functions to CPUs. ACM Trans. Archit. Code Optim. (jul 2022). https:\/\/doi.org\/10.1145\/3554736 Just Accepted . 10.1145\/3554736 Ruobing Han, Jaewon Lee, Jaewoong Sim, and Hyesoon Kim. 2022. COX: Exposing CUDA Warp-Level Functions to CPUs. ACM Trans. Archit. Code Optim. (jul 2022). https:\/\/doi.org\/10.1145\/3554736 Just Accepted."},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10766-014-0320-y"},{"key":"e_1_3_2_1_12_1","unstructured":"Zheming Jin. 2021. HeCBench. Version ba8310c1 https:\/\/github.com\/zjin-lcf\/HeCBench last accessed on 07.11.21.  Zheming Jin. 2021. HeCBench. Version ba8310c1 https:\/\/github.com\/zjin-lcf\/HeCBench last accessed on 07.11.21."},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICSAMOS.2010.5642061"},{"key":"#cr-split#-e_1_3_2_1_14_1.1","doi-asserted-by":"crossref","unstructured":"David Kaeli Perhaad Mistry Dana Schaa and Dong\u00a0Ping Zhang. 2015. 
Chapter 8 - Dissecting OpenCL on a heterogeneous system. In Heterogeneous Computing with OpenCL 2.0 David Kaeli Perhaad Mistry Dana Schaa and Dong\u00a0Ping Zhang (Eds.). Morgan Kaufmann Boston 187-212. https:\/\/doi.org\/10.1016\/B978-0-12-801414-1.00008-9 10.1016\/B978-0-12-801414-1.00008-9","DOI":"10.1016\/B978-0-12-801414-1.00008-9"},{"key":"#cr-split#-e_1_3_2_1_14_1.2","doi-asserted-by":"crossref","unstructured":"David Kaeli Perhaad Mistry Dana Schaa and Dong\u00a0Ping Zhang. 2015. Chapter 8 - Dissecting OpenCL on a heterogeneous system. In Heterogeneous Computing with OpenCL 2.0 David Kaeli Perhaad Mistry Dana Schaa and Dong\u00a0Ping Zhang (Eds.). Morgan Kaufmann Boston 187-212. https:\/\/doi.org\/10.1016\/B978-0-12-801414-1.00008-9","DOI":"10.1016\/B978-0-12-801414-1.00008-9"},{"volume-title":"Compiler Construction, Michael O\u2019Boyle (Ed.)","author":"Karrenberg Ralf","key":"e_1_3_2_1_15_1","unstructured":"Ralf Karrenberg and Sebastian Hack . 2012. Improving Performance of OpenCL on CPUs . In Compiler Construction, Michael O\u2019Boyle (Ed.) . Springer Berlin Heidelberg, Berlin , Heidelberg , 1\u201320. Ralf Karrenberg and Sebastian Hack. 2012. Improving Performance of OpenCL on CPUs. In Compiler Construction, Michael O\u2019Boyle (Ed.). Springer Berlin Heidelberg, Berlin, Heidelberg, 1\u201320."},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/3388333.3388669"},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.5555\/977395.977673"},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/3456669.3456701"},{"key":"e_1_3_2_1_19_1","volume-title":"Compiler-Aided Nd-Range Parallel-for Implementations on CPU in HipSYCL. In International Workshop on OpenCL","author":"Meyer Joachim","year":"2022","unstructured":"Joachim Meyer , Aksel Alpay , Holger Fr\u00f6ning , and Vincent Heuveline . 2022 . Compiler-Aided Nd-Range Parallel-for Implementations on CPU in HipSYCL. 
In International Workshop on OpenCL ( Bristol, United Kingdom, United Kingdom) (IWOCL\u201922). Association for Computing Machinery, New York, NY, USA, Article 28, 3\u00a0pages. https:\/\/doi.org\/10.1145\/3529538.3530216 10.1145\/3529538.3530216 Joachim Meyer, Aksel Alpay, Holger Fr\u00f6ning, and Vincent Heuveline. 2022. Compiler-Aided Nd-Range Parallel-for Implementations on CPU in HipSYCL. In International Workshop on OpenCL (Bristol, United Kingdom, United Kingdom) (IWOCL\u201922). Association for Computing Machinery, New York, NY, USA, Article 28, 3\u00a0pages. https:\/\/doi.org\/10.1145\/3529538.3530216"},{"key":"#cr-split#-e_1_3_2_1_20_1.1","unstructured":"William\u00a0S. Moses Ivan\u00a0R. Ivanov Jens Domke Toshio Endo Johannes Doerfert and Oleksandr Zinenko. 2022. High-Performance GPU-to-CPU Transpilation and Optimization via High-Level Parallel Constructs. https:\/\/doi.org\/10.48550\/ARXIV.2207.00257 10.48550\/ARXIV.2207.00257"},{"key":"#cr-split#-e_1_3_2_1_20_1.2","doi-asserted-by":"crossref","unstructured":"William\u00a0S. Moses Ivan\u00a0R. Ivanov Jens Domke Toshio Endo Johannes Doerfert and Oleksandr Zinenko. 2022. High-Performance GPU-to-CPU Transpilation and Optimization via High-Level Parallel Constructs. https:\/\/doi.org\/10.48550\/ARXIV.2207.00257","DOI":"10.1145\/3572848.3577475"},{"volume-title":"A Virtual GPU as Developer-Friendly OpenMP Offload Target","author":"Patel Atmn","key":"e_1_3_2_1_21_1","unstructured":"Atmn Patel , Shilei Tian , Johannes Doerfert , and Barbara Chapman . 2021. A Virtual GPU as Developer-Friendly OpenMP Offload Target . Association for Computing Machinery , New York, NY, USA , 1\u20137. https:\/\/doi.org\/10.1145\/3458744.3473356 10.1145\/3458744.3473356 Atmn Patel, Shilei Tian, Johannes Doerfert, and Barbara Chapman. 2021. A Virtual GPU as Developer-Friendly OpenMP Offload Target. Association for Computing Machinery, New York, NY, USA, 1\u20137. 
https:\/\/doi.org\/10.1145\/3458744.3473356"},{"key":"e_1_3_2_1_22_1","first-page":"4","article-title":"Intel\u00ae Threading Building Blocks","volume":"23","author":"Pheatt Chuck","year":"2008","unstructured":"Chuck Pheatt . 2008 . Intel\u00ae Threading Building Blocks . J. Comput. Sci. Coll. 23 , 4 (April 2008), 298. Chuck Pheatt. 2008. Intel\u00ae Threading Building Blocks. J. Comput. Sci. Coll. 23, 4 (April 2008), 298.","journal-title":"J. Comput. Sci. Coll."},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/3434312"},{"key":"e_1_3_2_1_24_1","volume-title":"MCUDA: An Efficient Implementation of CUDA Kernels for Multi-core CPUs. In Languages and Compilers for Parallel Computing","author":"Stratton A.","year":"2008","unstructured":"John\u00a0 A. Stratton , Sam\u00a0 S. Stone , and Wen-mei\u00a0 W. Hwu . 2008 . MCUDA: An Efficient Implementation of CUDA Kernels for Multi-core CPUs. In Languages and Compilers for Parallel Computing , Jos\u00e9\u00a0Nelson Amaral (Ed.). Springer Berlin Heidelberg , Berlin, Heidelberg, 16\u201330. John\u00a0A. Stratton, Sam\u00a0S. Stone, and Wen-mei\u00a0W. Hwu. 2008. MCUDA: An Efficient Implementation of CUDA Kernels for Multi-core CPUs. In Languages and Compilers for Parallel Computing, Jos\u00e9\u00a0Nelson Amaral (Ed.). 
Springer Berlin Heidelberg, Berlin, Heidelberg, 16\u201330."}],"event":{"name":"IWOCL '23: International Workshop on OpenCL","acronym":"IWOCL '23","location":"Cambridge United Kingdom"},"container-title":["International Workshop on OpenCL"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3585341.3585342","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:37:56Z","timestamp":1750178276000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3585341.3585342"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,4,18]]},"references-count":26,"alternative-id":["10.1145\/3585341.3585342","10.1145\/3585341"],"URL":"https:\/\/doi.org\/10.1145\/3585341.3585342","relation":{},"subject":[],"published":{"date-parts":[[2023,4,18]]},"assertion":[{"value":"2023-04-18","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}