{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:13:13Z","timestamp":1750306393847,"version":"3.41.0"},"reference-count":32,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2016,2,12]],"date-time":"2016-02-12T00:00:00Z","timestamp":1455235200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100000038","name":"Natural Sciences and Engineering Research Council of Canada","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100000038","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["ACM Trans. Model. Perform. Eval. Comput. Syst."],"published-print":{"date-parts":[[2016,3,31]]},"abstract":"<jats:p>The OpenACC programming model has been developed to simplify accelerator programming and improve development productivity. In this article, we investigate the main limitations faced by OpenACC in harnessing all capabilities of GPU-like accelerators. We build on our findings and discuss the opportunity to exploit a software-managed cache as (i) a fast communication medium and (ii) a cache for data reuse. To this end, we propose a new directive and communication model for OpenACC. Investigating several benchmarks, we show that the proposed directive can improve performance by up to 2.54\u00d7, at the cost of minor programming effort.<\/jats:p>",
"DOI":"10.1145\/2798724","type":"journal-article","created":{"date-parts":[[2016,5,21]],"date-time":"2016-05-21T22:27:38Z","timestamp":1463869658000},"page":"1-34","source":"Crossref","is-referenced-by-count":6,"title":["Employing Software-Managed Caches in OpenACC"],"prefix":"10.1145","volume":"1","author":[{"given":"Ahmad","family":"Lashgar","sequence":"first","affiliation":[{"name":"University of Victoria, Victoria, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Amirali","family":"Baniasadi","sequence":"additional","affiliation":[{"name":"University of Victoria, Victoria, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2016,2,12]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Retrieved","author":"Incorporated AMD","year":"2007","unstructured":"AMD, Incorporated. 2007. AMD\u2019s close-to-the-metal. Retrieved June 19, 2015 from http:\/\/sourceforge.net\/projects\/amdctm\/."},{"volume-title":"Matters Computational","author":"Arndt Jorg","key":"e_1_2_1_2_1","unstructured":"Jorg Arndt. 2011. Matters Computational. Springer, Chap. 23."},{"key":"e_1_2_1_3_1","volume-title":"Retrieved","author":"Buck Ian","year":"2004","unstructured":"Ian Buck. 2004. BrookGPU. Retrieved June 19, 2015 from http:\/\/graphics.stanford.edu\/projects\/brookgpu\/."},
{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/IISWC.2009.5306797"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2008.57"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/WACCPD.2014.12"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/CCGrid.2013.12"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/MC.2003.1220582"},{"key":"e_1_2_1_9_1","volume-title":"IPMACC: Open source OpenACC to CUDA\/OpenCL translator. arXiv:1412.1127v1 {cs.PL}.","author":"Lashgar Ahmad","year":"2014","unstructured":"Ahmad Lashgar, Alireza Majidi, and Amirali Baniasadi. 2014. IPMACC: Open source OpenACC to CUDA\/OpenCL translator. arXiv:1412.1127v1 {cs.PL}."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.5555\/2691158.2691159"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2014.6844487"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1002\/cpe.v19:18"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2008.31"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/882262.882362"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/1542275.1542313"},{"key":"e_1_2_1_16_1","volume-title":"Proceedings of the Supercomputing Conference Poster (SC\u201914 Poster Session)","author":"Murai Hitoshi","year":"2014","unstructured":"Hitoshi Murai, Masahiro Nakao, Takenori Shimosaka, Akihiro Tabuchi, Taisuke Boku, and Mitsuhisa Sato. 2014. XcalableACC\u2014A directive-based language extension for accelerated parallel computing. In Proceedings of the Supercomputing Conference Poster (SC\u201914 Poster Session). Piscataway, NJ, 2."},
{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/WACCPD.2014.6"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/1365490.1365500"},{"key":"e_1_2_1_19_1","volume-title":"Retrieved","author":"NVIDIA Corp.","year":"2015","unstructured":"NVIDIA Corp. 2015a. CUDA Toolkit 6.0. Retrieved June 19, 2015 from https:\/\/developer.nvidia.com\/cuda-downloads."},{"key":"e_1_2_1_20_1","volume-title":"Retrieved","author":"NVIDIA Corp.","year":"2015","unstructured":"NVIDIA Corp. 2015b. NVIDIA CUDA Occupancy Calculator. Retrieved June 19, 2015 from http:\/\/developer.download.nvidia.com\/compute\/cuda\/CUDA_Occupancy_calculator.xls."},{"key":"e_1_2_1_21_1","volume-title":"Retrieved","author":"NVIDIA Corp.","year":"2015","unstructured":"NVIDIA Corp. 2015c. Profiler\u2019s User Guide. Retrieved June 19, 2015 from http:\/\/docs.nvidia.com\/cuda\/profiler-users-guide\/."},{"volume-title":"GPU Gems 3","author":"Nyland Lars","key":"e_1_2_1_22_1","unstructured":"Lars Nyland, Mark Harris, and Jan Prins. 2007. GPU Gems 3: Chapter 31 (1st ed.). Addison-Wesley Professional."},
{"key":"e_1_2_1_23_1","volume-title":"Retrieved","author":"OpenACC","year":"2015","unstructured":"OpenACC. 2015. The OpenACC Application Programming Interface. Retrieved June 19, 2015 from http:\/\/www.openacc-standard.org\/."},{"volume-title":"Retrieved","year":"2013","key":"e_1_2_1_24_1","unstructured":"PathScale. 2013. Modified Rodinia Benchmark Suite. Retrieved June 19, 2015 from https:\/\/github.com\/pathscale\/rodinia."},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-32820-6_86"},{"key":"e_1_2_1_26_1","series-title":"Lecture Notes in Computer Science","volume-title":"Euro-Par 2013: Parallel Processing Workshops","author":"Tabuchi Akihiro","unstructured":"Akihiro Tabuchi, Masahiro Nakao, and Mitsuhisa Sato. 2014. A source-to-source OpenACC compiler for CUDA. In Euro-Par 2013: Parallel Processing Workshops. Lecture Notes in Computer Science, Vol. 8374. Springer, Berlin, 178--187. DOI:http:\/\/dx.doi.org\/10.1007\/978-3-642-54420-0_18"},{"key":"e_1_2_1_27_1","volume-title":"Retrieved","author":"The Khronos Group","year":"2015","unstructured":"The Khronos Group. 2015. OpenCL: The open standard for parallel programming of heterogeneous systems. Retrieved June 19, 2015 from https:\/\/www.khronos.org\/opencl\/."},
{"volume-title":"Proceedings of the 11th International Conference on Compiler Construction (CC\u201902)","author":"Thies William","key":"e_1_2_1_28_1","unstructured":"William Thies, Michal Karczmarek, and Saman P. Amarasinghe. 2002. StreamIt: A language for streaming applications. In Proceedings of the 11th International Conference on Compiler Construction (CC\u201902). Springer-Verlag, London, 179--196."},{"key":"e_1_2_1_29_1","volume-title":"Proceedings of the 26th International Workshop on Languages and Compilers for High Performance Computing (LCPC\u201913)","author":"Tian Xiaonan","year":"2013","unstructured":"Xiaonan Tian, Rengan Xu, Yonghong Yan, Zhifeng Yun, Sunita Chandrasekaran, and Barbara Chapman. 2013. Compiling a high-level directive-based programming model for GPGPUs. In Proceedings of the 26th International Workshop on Languages and Compilers for High Performance Computing (LCPC\u201913)."},{"key":"e_1_2_1_30_1","volume-title":"Retrieved","author":"Vanderbruggen Tristan","year":"2015","unstructured":"Tristan Vanderbruggen. 2015. RoseACC. Retrieved June 19, 2015 from http:\/\/roseacc.org\/."},
{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-32820-6_85"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2010.5452013"}],"container-title":["ACM Transactions on Modeling and Performance Evaluation of Computing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2798724","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2798724","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T05:07:18Z","timestamp":1750223238000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2798724"}},"subtitle":["Opportunities and Benefits"],"short-title":[],"issued":{"date-parts":[[2016,2,12]]},"references-count":32,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2016,3,31]]}},"alternative-id":["10.1145\/2798724"],"URL":"https:\/\/doi.org\/10.1145\/2798724","relation":{},"ISSN":["2376-3639","2376-3647"],"issn-type":[{"type":"print","value":"2376-3639"},{"type":"electronic","value":"2376-3647"}],"subject":[],"published":{"date-parts":[[2016,2,12]]}}}