{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2022,6,2]],"date-time":"2022-06-02T05:11:09Z","timestamp":1654146669402},"reference-count":13,"publisher":"IGI Global","issue":"3","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2012,7,1]]},"abstract":"<p>Heterogeneous platforms that consist of a CPU and add-on streaming processors are widely used in modern computer systems. These add-on processors provide substantially more computation capability and memory bandwidth than conventional multi-core platforms. General-purpose computations can also be offloaded to these add-on processors. However, programming these streaming processors to exploit their potential performance is challenging because of their diverse underlying architectural characteristics. Several optimization techniques are applied on OpenCL-compatible heterogeneous platforms to achieve thread-level, data-level, and instruction-level parallelism. The architectural implications of these techniques and optimization principles are discussed. Finally, a case study of the MRI-Q benchmark illustrates the capabilities of these optimization techniques. 
The experimental results reveal that the speedup from the non-optimized to the optimized kernel varies from 8 to 63 on different target platforms.<\/p>","DOI":"10.4018\/jghpc.2012070103","type":"journal-article","created":{"date-parts":[[2012,8,15]],"date-time":"2012-08-15T20:10:21Z","timestamp":1345061421000},"page":"48-62","source":"Crossref","is-referenced-by-count":0,"title":["Optimizing Techniques for OpenCL Programs on Heterogeneous Platforms"],"prefix":"10.4018","volume":"4","author":[{"given":"Slo-Li","family":"Chu","sequence":"first","affiliation":[{"name":"Chung Yuan Christian University, Taiwan"}]},{"given":"Chih-Chieh","family":"Hsiao","sequence":"additional","affiliation":[{"name":"Chung Yuan Christian University, Taiwan"}]}],"member":"2432","reference":[{"key":"jghpc.2012070103-0","unstructured":"ATi. (2010). ATi stream computing OpenCL\u2122 programming guide. Markham, ON, Canada: ATi."},{"key":"jghpc.2012070103-1","doi-asserted-by":"crossref","unstructured":"Breitbart, J., & Fohry, C. (2010). OpenCL - An effective programming model for data parallel computations at the cell broadband engine. In Proceedings of the IEEE International Symposium on Parallel & Distributed Processing, Workshops and Ph.D. Forum.","DOI":"10.1109\/IPDPSW.2010.5470823"},{"issue":"10","key":"jghpc.2012070103-2","article-title":"A performance study of general-purpose applications on graphics processors using CUDA.","volume":"68","author":"S.Che","year":"2008","journal-title":"Journal of Parallel and Distributed Computing"},{"key":"jghpc.2012070103-3","doi-asserted-by":"crossref","unstructured":"Gummaraju, J., Morichetti, L., Houston, M., Sander, B., Gaster, B., & Zheng, B. (2010). Twin Peaks: A software platform for heterogeneous computing on general-purpose and graphics processors. In Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques (pp. 
205-215).","DOI":"10.1145\/1854273.1854302"},{"key":"jghpc.2012070103-4","unstructured":"IMPACT Research Group. (2007). Parboil benchmark suite. Retrieved from http:\/\/impact.crhc.illinois.edu\/parboil.php"},{"key":"jghpc.2012070103-5","unstructured":"Khronos. (2009). The OpenCL specification 1.0 rev.48. Beaverton, OR: Khronos OpenCL Working Group."},{"key":"jghpc.2012070103-6","unstructured":"Komatsu, K., Sato, K., Arai, Y., Koyama, K., Takizawa, H., & Kobayashi, H. (2010). Evaluating performance and portability of OpenCL programs. In Proceedings of the 9th International Meeting on High Performance for Computational Science."},{"key":"jghpc.2012070103-7","doi-asserted-by":"crossref","unstructured":"Lee, J., Kim, J., Seo, S., Kim, S., Park, J., & Kim, H. \u2026Choi, J. (2010). An OpenCL framework for heterogeneous multicores with local memory. In Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques (pp. 193-204).","DOI":"10.1145\/1854273.1854301"},{"key":"jghpc.2012070103-8","unstructured":"nVidia. (2009). OpenCL programming for the CUDA architecture. Santa Clara, CA: nVidia."},{"key":"jghpc.2012070103-9","doi-asserted-by":"crossref","unstructured":"Ryoo, S., Rodrigues, C., Baghsorkhi, S., Stone, S., Kirk, D., & Hwu, W. (2008). Optimization principles and application performance evaluation of a multithreaded GPU using CUDA. In Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (pp. 73-82).","DOI":"10.1145\/1345206.1345220"},{"key":"jghpc.2012070103-10","doi-asserted-by":"crossref","unstructured":"Sharma, B., & Vydyanathan, N. (2010). Parallel discrete wavelet transform using the open computing language: A performance and portability study. In Proceedings of the IEEE International Symposium on Parallel & Distributed Processing, Workshops and Ph.D. Forum (pp. 
1-8).","DOI":"10.1109\/IPDPSW.2010.5470830"},{"key":"jghpc.2012070103-11","unstructured":"Stone, S., Yi, H., Haldar, J., Hwu, W., Sutton, B., & Liang, Z. (2007). How GPUs can improve the quality of magnetic resonance imaging. In Proceedings of the First Workshop on General Purpose Processing on Graphics Processing Units."},{"issue":"1","key":"jghpc.2012070103-12","doi-asserted-by":"crossref","first-page":"58","DOI":"10.1109\/TPDS.2010.125","article-title":"Comparing hardware accelerators in scientific applications: A case study.","volume":"22","author":"R.Weber","year":"2010","journal-title":"IEEE Transactions on Parallel and Distributed Systems"}],"container-title":["International Journal of Grid and High Performance Computing"],"original-title":[],"language":"ng","link":[{"URL":"https:\/\/www.igi-global.com\/viewtitle.aspx?TitleId=69805","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,6,2]],"date-time":"2022-06-02T04:52:23Z","timestamp":1654145543000},"score":1,"resource":{"primary":{"URL":"https:\/\/services.igi-global.com\/resolvedoi\/resolve.aspx?doi=10.4018\/jghpc.2012070103"}},"subtitle":[""],"short-title":[],"issued":{"date-parts":[[2012,7,1]]},"references-count":13,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2012,7]]}},"URL":"https:\/\/doi.org\/10.4018\/jghpc.2012070103","relation":{},"ISSN":["1938-0259","1938-0267"],"issn-type":[{"value":"1938-0259","type":"print"},{"value":"1938-0267","type":"electronic"}],"subject":[],"published":{"date-parts":[[2012,7,1]]}}}