{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,14]],"date-time":"2026-05-14T07:17:09Z","timestamp":1778743029861,"version":"3.51.4"},"reference-count":61,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2014,12,8]],"date-time":"2014-12-08T00:00:00Z","timestamp":1417996800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2015,1,9]]},"abstract":"<jats:p>General-purpose GPU-based systems are highly attractive, as they give potentially massive performance at little cost. Realizing such potential is challenging due to the complexity of programming. This article presents a compiler-based approach to automatically generate optimized OpenCL code from data parallel OpenMP programs for GPUs. A key feature of our scheme is that it leverages existing transformations, especially data transformations, to improve performance on GPU architectures and uses automatic machine learning to build a predictive model to determine if it is worthwhile running the OpenCL code on the GPU or OpenMP code on the multicore host. We applied our approach to the entire NAS parallel benchmark suite and evaluated it on distinct GPU-based systems. We achieved average (up to) speedups of 4.51\u00d7 and 4.20\u00d7 (143\u00d7 and 67\u00d7) on Core i7\/NVIDIA GeForce GTX580 and Core i7\/AMD Radeon 7970 platforms, respectively, over a sequential baseline. Our approach achieves, on average, greater than 10\u00d7 speedups over two state-of-the-art automatic GPU code generators.<\/jats:p>","DOI":"10.1145\/2677036","type":"journal-article","created":{"date-parts":[[2014,12,8]],"date-time":"2014-12-08T16:17:14Z","timestamp":1418055434000},"page":"1-26","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":31,"title":["Automatic and Portable Mapping of Data Parallel Programs to OpenCL for GPU-Based Heterogeneous Systems"],"prefix":"10.1145","volume":"11","author":[{"given":"Zheng","family":"Wang","sequence":"first","affiliation":[{"name":"Lancaster University"}]},{"given":"Dominik","family":"Grewe","sequence":"additional","affiliation":[{"name":"University of Edinburgh"}]},{"given":"Michael F. P.","family":"O\u2019boyle","sequence":"additional","affiliation":[{"name":"University of Edinburgh"}]}],"member":"320","published-online":{"date-parts":[[2014,12,8]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Retrieved","author":"AMD.","year":"2013"},{"key":"e_1_2_1_2_1","volume-title":"Profiling & Analysis. Retrieved","author":"AMD.","year":"2014"},{"key":"e_1_2_1_3_1","volume-title":"Proceedings of the 2nd International Workshop on Polyhedral Compilation Techniques (IMPACT\u201912)","author":"Amini Mehdi","year":"2012"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1002\/cpe.1631"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-11970-5_14"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/1854273.1854340"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.procs.2012.04.209"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/2063384.2063401"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/HiPC.2013.6799098"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/314403.314414"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/1735688.1735702"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.5555\/1413370.1413375"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/996841.996853"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2013.6495010"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.5555\/1987237.1987259"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2013.6494993"},{"key":"e_1_2_1_18_1","volume-title":"O\u2019Boyle","author":"Grewe Dominik","year":"2013"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/1944862.1944881"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/2168773.2168775"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/1950365.1950409"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/1504176.1504219"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/2259016.2259038"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/1993498.1993516"},{"key":"e_1_2_1_26_1","volume-title":"Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques (PACT\u201913)","author":"Kayiran Onur","year":"2013"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/1941553.1941591"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2010.44"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2010.36"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/1504176.1504194"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/1815961.1816021"},{"key":"e_1_2_1_32_1","volume-title":"Retrieved","author":"LLVM.","year":"2013"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/258915.258943"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/1669112.1669121"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/2581122.2544156"},{"key":"e_1_2_1_36_1","volume-title":"Retrieved","author":"NVIDIA Corp.","year":"2013"},{"key":"e_1_2_1_37_1","volume-title":"Proceedings of the Workshop on Languages and Compilers for Parallel Computing (LCPC\u201914)","author":"Ogilvie William","year":"2014"},{"key":"e_1_2_1_38_1","volume-title":"Retrieved","author":"Project Omini Compiler","year":"2009"},{"key":"e_1_2_1_39_1","volume-title":"Retrieved","author":"ACC.","year":"2013"},{"key":"e_1_2_1_40_1","volume-title":"Retrieved","author":"PathScale Inc.","year":"2013"},{"key":"e_1_2_1_41_1","volume-title":"White Paper. Retrieved","author":"Portland Group","year":"2010"},{"key":"e_1_2_1_42_1","volume-title":"Programs for Machine Learning. Morgan Kaufmann","author":"Quinlan J. Ross"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/76263.76335"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/1345206.1345220"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/IISWC.2011.6114174"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/2145816.2145819"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/237721.237727"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/1772954.1772971"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/2555243.2555266"},{"key":"e_1_2_1_50_1","volume-title":"Proceedings of Innovative Parallel Computing (InPar). 1--11","author":"Sung Jui"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/1854273.1854336"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/1542476.1542496"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-89740-8_1"},{"key":"e_1_2_1_54_1","volume-title":"Retrieved","author":"University of Illinois at Urbana-Champaign (UIUC).","year":"2013"},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1145\/2400682.2400713"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1145\/1504176.1504189"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/1854273.1854313"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1145\/2512436"},{"key":"e_1_2_1_59_1","volume-title":"O\u2019Boyle","author":"Wang Zheng","year":"2014"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1145\/2579561"},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1109\/HiPC.2014.7116910"},{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.1145\/1735688.1735697"},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1145\/1806596.1806606"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2677036","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2677036","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T06:11:56Z","timestamp":1750227116000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2677036"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014,12,8]]},"references-count":61,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2015,1,9]]}},"alternative-id":["10.1145\/2677036"],"URL":"https:\/\/doi.org\/10.1145\/2677036","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"value":"1544-3566","type":"print"},{"value":"1544-3973","type":"electronic"}],"subject":[],"published":{"date-parts":[[2014,12,8]]},"assertion":[{"value":"2013-12-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2014-10-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2014-12-08","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}