{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,1]],"date-time":"2025-10-01T15:32:01Z","timestamp":1759332721176,"version":"3.41.0"},"reference-count":44,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2021,10,15]],"date-time":"2021-10-15T00:00:00Z","timestamp":1634256000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100004164","name":"MediaTek","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100004164","id-type":"DOI","asserted-by":"crossref"}]},{"name":"MOST of Taiwan"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Parallel Comput."],"published-print":{"date-parts":[[2021,12,31]]},"abstract":"<jats:p>A modern GPU is designed with many large thread groups to achieve a high throughput and performance. Within these groups, the threads are grouped into fixed-size SIMD batches in which the same instruction is applied to vectors of data in a lockstep. This GPU architecture is suitable for applications with a high degree of data parallelism, but its performance degrades seriously when divergence occurs. Many optimizations for divergence have been proposed, and they vary with the divergence information about variables and branches. A previous analysis scheme viewed pointers and return values from functions as divergence directly, and only focused on OpenCL 1.x. In this article, we present a novel scheme that reports the divergence information for pointer-intensive OpenCL programs. The approach is based on extended static single assignment (SSA) and adds some special functions and annotations from memory SSA and gated SSA. The proposed scheme first constructs extended SSA, which is then used to build a divergence relation graph that includes all of the possible points-to relationships of the pointers and initialized divergence states. The divergence state of the pointers can be determined by propagating the divergence state of the divergence relation graph. The scheme is further extended for interprocedural cases by considering function-related statements. The proposed scheme was implemented in an LLVM compiler and can be applied to OpenCL programs. We analyzed 10 programs with 24 kernels, with a total analyzed program size of 1,306 instructions in an LLVM intermediate representation, with 885 variables, 108 branches, and 313 pointer-related statements. The total number of divergent pointers detected was 146 for the proposed scheme, 200 for the scheme in which the pointer was always divergent, and 155 for the current LLVM default scheme; the total numbers of divergent variables detected were 458, 519, and 482, respectively, with 31, 34, and 32 divergent branches. These experimental results indicate that the proposed scheme is more precise than both a scheme in which a pointer is always divergent and the current LLVM default scheme.<\/jats:p>","DOI":"10.1145\/3470644","type":"journal-article","created":{"date-parts":[[2021,10,17]],"date-time":"2021-10-17T01:38:50Z","timestamp":1634434730000},"page":"1-23","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":10,"title":["Pointer-Based Divergence Analysis for OpenCL 2.0 Programs"],"prefix":"10.1145","volume":"8","author":[{"given":"Shao-Chung","family":"Wang","sequence":"first","affiliation":[{"name":"National Tsing Hua University, Hsinchu, Taiwan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Lin-Ya","family":"Yu","sequence":"additional","affiliation":[{"name":"National Tsing Hua University, Hsinchu, Taiwan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Li-An","family":"Her","sequence":"additional","affiliation":[{"name":"National Tsing Hua University, Hsinchu, Taiwan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yuan-Shin","family":"Hwang","sequence":"additional","affiliation":[{"name":"National Taiwan University of Science and Technology, Taipei, Taiwan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jenq-Kuen","family":"Lee","sequence":"additional","affiliation":[{"name":"National Tsing Hua University, Hsinchu, Taiwan"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2021,10,15]]},"reference":[{"doi-asserted-by":"publisher","key":"e_1_3_1_2_2","DOI":"10.1007\/978-3-642-54807-9_8"},{"doi-asserted-by":"publisher","key":"e_1_3_1_3_2","DOI":"10.1145\/1531743.1531766"},{"doi-asserted-by":"publisher","key":"e_1_3_1_4_2","DOI":"10.1109\/IISWC.2009.5306797"},{"doi-asserted-by":"publisher","key":"e_1_3_1_5_2","DOI":"10.1145\/158511.158639"},{"doi-asserted-by":"publisher","key":"e_1_3_1_6_2","DOI":"10.5555\/647473.760381"},{"doi-asserted-by":"publisher","key":"e_1_3_1_7_2","DOI":"10.1109\/PACT.2011.63"},{"doi-asserted-by":"publisher","key":"e_1_3_1_8_2","DOI":"10.1145\/115372.115320"},{"doi-asserted-by":"publisher","key":"e_1_3_1_9_2","DOI":"10.1109\/SBAC-PAD.2015.16"},{"unstructured":"International Organization for Standardization. ISO\/IEC 14882:2017 Programming Languages\u2014C++. Retrieved August 3 2020 from https:\/\/www.iso.org\/standard\/68564.html.","key":"e_1_3_1_10_2"},{"doi-asserted-by":"publisher","key":"e_1_3_1_11_2","DOI":"10.5555\/2014698.2014893"},{"doi-asserted-by":"publisher","key":"e_1_3_1_12_2","DOI":"10.1109\/MICRO.2007.12"},{"unstructured":"Khronos\u00ae OpenCL Working Group. The OpenCL\u2122 C 3.0 Specification. Retrieved August 3 2020 from https:\/\/www.khronos.org\/registry\/OpenCL\/specs\/3.0-unified\/html\/OpenCL_C.html.","key":"e_1_3_1_13_2"},{"unstructured":"Vinod Grover Bastiaan Joannes Matheus Aarts and Michael Murphy. 2009. Variance analysis for translating CUDA code for execution by a general purpose processor. US Patent 8 984 498.","key":"e_1_3_1_14_2"},{"doi-asserted-by":"publisher","key":"e_1_3_1_15_2","DOI":"10.1145\/1964179.1964184"},{"doi-asserted-by":"publisher","key":"e_1_3_1_16_2","DOI":"10.1145\/2458523.2458525"},{"doi-asserted-by":"publisher","key":"e_1_3_1_17_2","DOI":"10.1109\/TPDS.2012.73"},{"unstructured":"Adel Johar and Anton Gorenko. n.d. GEGL-OpenCL: OpenCL in GIMP. Retrieved June 18 2021 from https:\/\/opencl.org\/projects\/gegl-opencl-in-gimp\/.","key":"e_1_3_1_18_2"},{"doi-asserted-by":"publisher","key":"e_1_3_1_19_2","DOI":"10.5555\/2190025.2190061"},{"doi-asserted-by":"publisher","key":"e_1_3_1_20_2","DOI":"10.1007\/978-3-642-28652-0_1"},{"doi-asserted-by":"publisher","key":"e_1_3_1_21_2","DOI":"10.1145\/2259016.2259020"},{"unstructured":"Shorin Kyo. 2012. Selecting broadcast SIMD instruction or cached MIMD instruction stored in local memory of one of plurality of processing elements for all elements in each unit. US Patent 8 112 613.","key":"e_1_3_1_22_2"},{"doi-asserted-by":"publisher","key":"e_1_3_1_23_2","DOI":"10.5555\/977395.977673"},{"doi-asserted-by":"publisher","key":"e_1_3_1_24_2","DOI":"10.1145\/1594835.1504194"},{"doi-asserted-by":"publisher","key":"e_1_3_1_25_2","DOI":"10.1109\/CGO.2013.6494995"},{"doi-asserted-by":"publisher","key":"e_1_3_1_26_2","DOI":"10.1145\/1816038.1815992"},{"doi-asserted-by":"publisher","key":"e_1_3_1_27_2","DOI":"10.1145\/3296979.3192413"},{"issue":"4","key":"e_1_3_1_28_2","first-page":"28","article-title":"C++\u2014A code-based introduction to C++ AMP","volume":"27","author":"Moth Daniel","year":"2012","unstructured":"Daniel Moth. 2012. C++\u2014A code-based introduction to C++ AMP. MSDN Magazine-Louisville 27, 4 (April 2012), 28.","journal-title":"MSDN Magazine-Louisville"},{"doi-asserted-by":"publisher","key":"e_1_3_1_29_2","DOI":"10.1145\/1365490.1365500"},{"doi-asserted-by":"publisher","key":"e_1_3_1_30_2","DOI":"10.1109\/ASSCC.2011.6123653"},{"doi-asserted-by":"publisher","key":"e_1_3_1_31_2","DOI":"10.1145\/93548.93578"},{"doi-asserted-by":"publisher","key":"e_1_3_1_32_2","DOI":"10.1109\/HPCSim.2016.7568315"},{"doi-asserted-by":"publisher","key":"e_1_3_1_33_2","DOI":"10.1145\/73560.73562"},{"doi-asserted-by":"publisher","key":"e_1_3_1_34_2","DOI":"10.1145\/1345206.1345220"},{"doi-asserted-by":"publisher","key":"e_1_3_1_35_2","DOI":"10.1109\/SBAC-PAD.2012.22"},{"doi-asserted-by":"publisher","key":"e_1_3_1_36_2","DOI":"10.1007\/978-3-642-33182-4_3"},{"unstructured":"ISO Standard. 2014. Programming Languages\u2014Technical Specification for C++ Extensions for Parallelism . Standard ISO\/IEC TS. ISO http:\/\/www.open-std.org\/jtc1\/sc22\/wg21\/docs\/papers\/2015\/n4354.pdf.","key":"e_1_3_1_37_2"},{"doi-asserted-by":"publisher","key":"e_1_3_1_38_2","DOI":"10.5555\/2220077.2220227"},{"doi-asserted-by":"publisher","key":"e_1_3_1_39_2","DOI":"10.1145\/1772954.1772971"},{"doi-asserted-by":"publisher","key":"e_1_3_1_40_2","DOI":"10.1145\/223428.207115"},{"doi-asserted-by":"publisher","key":"e_1_3_1_41_2","DOI":"10.1145\/3133218"},{"doi-asserted-by":"publisher","key":"e_1_3_1_42_2","DOI":"10.1007\/978-3-642-32820-6_85"},{"doi-asserted-by":"publisher","key":"e_1_3_1_43_2","DOI":"10.1177\/1094342011434814"},{"doi-asserted-by":"publisher","key":"e_1_3_1_44_2","DOI":"10.1145\/1810085.1810104"},{"doi-asserted-by":"publisher","key":"e_1_3_1_45_2","DOI":"10.1145\/1950365.1950408"}],"container-title":["ACM Transactions on Parallel Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3470644","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3470644","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:18:55Z","timestamp":1750191535000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3470644"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,10,15]]},"references-count":44,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2021,12,31]]}},"alternative-id":["10.1145\/3470644"],"URL":"https:\/\/doi.org\/10.1145\/3470644","relation":{},"ISSN":["2329-4949","2329-4957"],"issn-type":[{"type":"print","value":"2329-4949"},{"type":"electronic","value":"2329-4957"}],"subject":[],"published":{"date-parts":[[2021,10,15]]},"assertion":[{"value":"2020-03-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-04-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-10-15","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}