{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2023,9,3]],"date-time":"2023-09-03T05:41:27Z","timestamp":1693719687440},"reference-count":23,"publisher":"Wiley","issue":"2","license":[{"start":{"date-parts":[[2015,8,18]],"date-time":"2015-08-18T00:00:00Z","timestamp":1439856000000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/onlinelibrary.wiley.com\/termsAndConditions#vor"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Concurrency and Computation"],"published-print":{"date-parts":[[2016,2]]},"abstract":"<jats:title>Summary<\/jats:title><jats:p>Manycore accelerators have the potential to significantly improve the performance of scientific applications when computationally intensive program portions are offloaded to them. Directive\u2010based high\u2010level programming models, such as OpenACC and OpenMP, are used to create applications for accelerators by annotating the regions of code meant for offloading. OpenACC is an emerging directive\u2010based programming model for accelerators that typically enables inexperienced programmers to achieve portable and productive performance within applications. In this paper, we present the challenges and solutions involved in creating an open\u2010source OpenACC compiler in an industrial framework (OpenUH, a branch of Open64). We then discuss in detail the techniques we developed for loop scheduling and reduction operations on general purpose GPUs. The compiler is evaluated with benchmarks from the NAS Parallel Benchmarks suite and self\u2010written micro\u2010benchmarks for reduction operations. This implementation has been designed to serve as a compiler infrastructure for researchers to explore advanced compiler techniques, extend OpenACC to other programming models, and build performance tools used in conjunction with OpenACC programs.
Copyright \u00a9 2015 John Wiley &amp; Sons, Ltd.<\/jats:p>","DOI":"10.1002\/cpe.3648","type":"journal-article","created":{"date-parts":[[2015,8,19]],"date-time":"2015-08-19T00:37:33Z","timestamp":1439944653000},"page":"537-556","source":"Crossref","is-referenced-by-count":8,"title":["Compiler transformation of nested loops for general purpose GPUs"],"prefix":"10.1002","volume":"28","author":[{"given":"Xiaonan","family":"Tian","sequence":"first","affiliation":[{"name":"Department of Computer Science, University of Houston, Houston, TX 77204, USA"}]},{"given":"Rengan","family":"Xu","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Houston, Houston, TX 77204, USA"}]},{"given":"Yonghong","family":"Yan","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Houston, Houston, TX 77204, USA"}]},{"given":"Sunita","family":"Chandrasekaran","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Houston, Houston, TX 77204, USA"}]},{"given":"Deepak","family":"Eachempati","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Houston, Houston, TX 77204, USA"}]},{"given":"Barbara","family":"Chapman","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Houston, Houston, TX 77204, USA"}]}],"member":"311","published-online":{"date-parts":[[2015,8,18]]},"reference":[{"key":"e_1_2_11_2_1","unstructured":"CUDA. 2013. (Available from: http:\/\/www.nvidia.com\/object\/cuda_home_new.html.) [Accessed on 2 April 2014]."},{"key":"e_1_2_11_3_1","unstructured":"OpenCL Standard. 2013. (Available from: http:\/\/www.khronos.org\/opencl.) [Accessed on 2 November 2013]."},{"key":"e_1_2_11_4_1","unstructured":"Dolbeau R, Bihan S, Bodin F. HMPP: a hybrid multi\u2010core parallel programming environment.
In Workshop on General Purpose Processing on Graphics Processing Units (GPGPU'07): Boston, MA, 2007."},{"key":"e_1_2_11_5_1","unstructured":"OpenACC. 2013. (Available from: http:\/\/www.openacc-standard.org.) [Accessed on 17 March 2014]."},{"key":"e_1_2_11_6_1","unstructured":"OpenMP. 2013. (Available from: http:\/\/www.openmp.org.) [Accessed on 20 November 2013]."},{"key":"e_1_2_11_7_1","doi-asserted-by":"publisher","DOI":"10.1002\/cpe.1174"},{"key":"e_1_2_11_8_1","doi-asserted-by":"crossref","unstructured":"Tian X, Xu R, Yan Y, Yun Z, Chandrasekaran S, Chapman B. Compiling a high\u2010level directive\u2010based programming model for accelerators. In LCPC 2013: The 26th International Workshop on Languages and Compilers for Parallel Computing: San Jose, CA, USA, 2013; 105\u2013120.","DOI":"10.1007\/978-3-319-09967-5_6"},{"key":"e_1_2_11_9_1","doi-asserted-by":"publisher","DOI":"10.1177\/109434209100500306"},{"key":"e_1_2_11_10_1","unstructured":"NPB CUDA benchmarks 2013. (Available from: http:\/\/www.tu-chemnitz.de\/informatik\/PI\/forschung\/download\/npb-gpu\/index.php.en.) [Accessed on 14 February 2014]."},{"key":"e_1_2_11_11_1","article-title":"The PGI Fortran and C99 OpenACC compilers","author":"Leback B","year":"2012","journal-title":"Cray User Group"},{"key":"e_1_2_11_12_1","doi-asserted-by":"crossref","unstructured":"CUDA C Programming Guide 2013. (Available from: http:\/\/docs.nvidia.com\/cuda\/pdf\/CUDA_C_Programming_Guide.pdf.) [Accessed on 19 August 2013].","DOI":"10.1016\/S1353-4858(13)70015-1"},{"key":"e_1_2_11_13_1","unstructured":"CAPS OpenACC Parallelism Mapping 2013. (Available from: http:\/\/kb.caps-entreprise.com\/what-gang-workers-and-threads-correspond-to-on-a-cuda-card.)
[Accessed on 16 June 2013]."},{"key":"e_1_2_11_14_1","unstructured":"Cray C. C++ reference manual 2003."},{"key":"e_1_2_11_15_1","doi-asserted-by":"crossref","unstructured":"Xu R, Tian X, Yan Y, Chandrasekaran S, Chapman B. Reduction operations in parallel loops for GPGPUs. In Proceedings of Programming Models and Applications on Multicores and Manycores. ACM: Orlando, Florida, 2014; 10.","DOI":"10.1145\/2578948.2560692"},{"key":"e_1_2_11_16_1","article-title":"Optimizing parallel reduction in CUDA","volume":"6","author":"Harris M","year":"2007","journal-title":"NVIDIA Developer Technology"},{"key":"e_1_2_11_17_1","article-title":"Precision & performance: floating point and IEEE 754 compliance for NVIDIA GPUs","author":"Whitehead N","year":"2011","journal-title":"NVIDIA Technical White Paper"},{"key":"e_1_2_11_18_1","unstructured":"SPEC ACCEL 2014. (Available from: http:\/\/www.spec.org\/auto\/accel\/Docs.) [Accessed on 2 April 2014]."},{"key":"e_1_2_11_19_1","unstructured":"CAPS Enterprise OpenACC Compiler Reference Manual 2013. (Available from: http:\/\/www.openacc.org\/sites\/default\/files\/HMPPOpenACC-3.2_ReferenceManual.pdf.) [Accessed on 21 October 2013]."},{"key":"e_1_2_11_20_1","unstructured":"PGI Accelerator Compilers 2013. (Available from: http:\/\/www.pgroup.com\/resources\/accel.htm.) [Accessed on 22 October 2013]."},{"key":"e_1_2_11_21_1","unstructured":"OpenARC: Open Accelerator Research Compiler 2013. (Available from: http:\/\/ft.ornl.gov\/research\/openarc.) [Accessed on 2 April 2014]."},{"key":"e_1_2_11_22_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-40698-0_7"},{"key":"e_1_2_11_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2010.62"},{"key":"e_1_2_11_24_1","doi-asserted-by":"crossref","unstructured":"Wolfe M. Implementing the PGI accelerator model.
In Proceedings of the 3rd Workshop on General\u2010Purpose Computation on Graphics Processing Units (GPGPU \u201910). ACM: New York, NY, USA, 2010; 43\u201350.","DOI":"10.1145\/1735688.1735697"}],"container-title":["Concurrency and Computation: Practice and Experience"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/api.wiley.com\/onlinelibrary\/tdm\/v1\/articles\/10.1002%2Fcpe.3648","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/pdf\/10.1002\/cpe.3648","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,9,2]],"date-time":"2023-09-02T13:25:42Z","timestamp":1693661142000},"score":1,"resource":{"primary":{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/10.1002\/cpe.3648"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2015,8,18]]},"references-count":23,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2016,2]]}},"alternative-id":["10.1002\/cpe.3648"],"URL":"http:\/\/dx.doi.org\/10.1002\/cpe.3648","archive":["Portico"],"relation":{},"ISSN":["1532-0626","1532-0634"],"issn-type":[{"value":"1532-0626","type":"print"},{"value":"1532-0634","type":"electronic"}],"subject":[],"published":{"date-parts":[[2015,8,18]]}}}