{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,19]],"date-time":"2026-01-19T14:12:55Z","timestamp":1768831975154,"version":"3.49.0"},"reference-count":34,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2019,7,18]],"date-time":"2019-07-18T00:00:00Z","timestamp":1563408000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100000038","name":"Natural Sciences and Engineering Research Council of Canada","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100000038","id-type":"DOI","asserted-by":"publisher"}]},{"name":"IBM Centre for Advanced Studies"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2019,9,30]]},"abstract":"<jats:p>Iteration Point Difference Analysis is a new static analysis framework that can be used to determine the memory coalescing characteristics of parallel loops that target GPU offloading and to ascertain safety and profitability of loop transformations with the goal of improving their memory access characteristics. This analysis can propagate definitions through control flow, works for non-affine expressions, and is capable of analyzing expressions that reference conditionally defined values. This analysis framework enables safe and profitable loop transformations. Experimental results demonstrate potential for dramatic performance improvements. GPU kernel execution time across the Polybench suite is improved by up to 25.5\u00d7 on an Nvidia P100 with benchmark overall improvement of up to 3.2\u00d7. An opportunity detected in a SPEC ACCEL benchmark yields kernel speedup of 86.5\u00d7 with a benchmark improvement of 3.3\u00d7. 
This work also demonstrates how architecture-aware compilers improve code portability and reduce programmer effort.<\/jats:p>","DOI":"10.1145\/3333060","type":"journal-article","created":{"date-parts":[[2019,7,19]],"date-time":"2019-07-19T13:17:14Z","timestamp":1563542234000},"page":"1-26","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Memory-access-aware Safety and Profitability Analysis for Transformation of Accelerator-bound OpenMP Loops"],"prefix":"10.1145","volume":"16","author":[{"given":"Artem","family":"Chikin","sequence":"first","affiliation":[{"name":"Intel Corporation, Toronto, ON, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Taylor","family":"Lloyd","sequence":"additional","affiliation":[{"name":"Amazon, Seattle, WA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jos\u00e9 Nelson","family":"Amaral","sequence":"additional","affiliation":[{"name":"University of Alberta, Edmonton, AB, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ettore","family":"Tiotto","sequence":"additional","affiliation":[{"name":"IBM Canada, Markham, ON, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Muhammad","family":"Usman","sequence":"additional","affiliation":[{"name":"University of Alberta, Edmonton, AB, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2019,7,18]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"ACF-Coalescing-LLVM 2019. ACF static analysis framework source-code. 
Retrieved from: https:\/\/github.com\/uasys\/ACF-Coalescing-LLVM."},{"key":"e_1_2_1_2_1","volume-title":"Proceedings of the International Symposium on Performance Analysis of Systems and Software (ISPASS\u201909)","author":"Bakhoda A."},{"key":"e_1_2_1_3_1","doi-asserted-by":"crossref","unstructured":"Utpal K. Banerjee. 1988. Dependence Analysis for Supercomputing. Kluwer Academic Publishers Norwell MA.","DOI":"10.1007\/978-1-4684-6894-6"},{"key":"e_1_2_1_4_1","volume-title":"Proceedings of the Conference on Supercomputing. 528--537","author":"Blume W."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/1375581.1375595"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/800205.806327"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/512950.512973"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.5555\/646348.690401"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/200994.201003"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/113445.113448"},{"key":"e_1_2_1_11_1","volume-title":"Proceedings of the Workshop on Languages and Compilers and Parallel Computing (LCPC\u201993)","author":"Haghighat M."},{"key":"e_1_2_1_12_1","volume-title":"Retrieved on","year":"2011"},{"key":"e_1_2_1_13_1","volume-title":"SPEC ACCEL: A standard application suite for measuring hardware accelerator performance. In High Performance Computing Systems. Performance Modeling, Benchmarking, and Simulation (LNCS)","author":"Juckeland Guido","year":"2015"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/360827.360844"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/1504176.1504194"},{"key":"e_1_2_1_16_1","unstructured":"T. Lloyd K. Ali and J. N. Amaral. 2019. GPUCheck: Detecting CUDA Performance Problems with Static Analysis. Technical Report. 
University of Alberta Edmonton AB Canada."},{"key":"e_1_2_1_17_1","volume-title":"Proceedings of the Workshop on Applications for Multi-Core Architectures (WAMCA\u201918)","author":"Lloyd T."},{"key":"e_1_2_1_18_1","volume-title":"Proceedings of the International Symposium on Microarchitecture (MICRO\u201992)","author":"Mahlke Scott A."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/1542275.1542313"},{"key":"e_1_2_1_20_1","volume-title":"Proceedings of the Conference of Cray User Group (CUG\u201914)","author":"Miles D."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/277830.277874"},{"key":"e_1_2_1_22_1","unstructured":"Nvidia. {n.d.}. Nvidia Tesla V100 GPU architecture\u2014The world\u2019s most advanced data center GPU. 
Retrieved from: http:\/\/images.nvidia.com\/content\/volta-architecture\/pdf\/volta-architecture-whitepaper.pdf."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/2254064.2254124"},{"key":"e_1_2_1_24_1","volume-title":"Proceedings of the International Symposium on Code Generation and Optimization (CGO\u201915)","author":"Cosmin"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2013.6494993"},{"key":"e_1_2_1_26_1","volume-title":"Retrieved on","author":"OpenMP Language Committee","year":"2013"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/509705.509708"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICPC.2010.36"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.5555\/1025127.1026013"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.5555\/563998.564022"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/1006209.1006226"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/1504176.1504189"},{"key":"e_1_2_1_33_1","unstructured":"Michael Joseph Wolfe. 1982. Optimizing Supercompilers for Supercomputers. Ph.D. Dissertation. 
University of Illinois at Urbana-Champaign Champaign IL."},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMEE.2011.6199223"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3333060","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3333060","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T00:57:57Z","timestamp":1750208277000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3333060"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,7,18]]},"references-count":34,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2019,9,30]]}},"alternative-id":["10.1145\/3333060"],"URL":"https:\/\/doi.org\/10.1145\/3333060","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"value":"1544-3566","type":"print"},{"value":"1544-3973","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,7,18]]},"assertion":[{"value":"2018-08-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-05-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-07-18","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}