{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2022,4,2]],"date-time":"2022-04-02T10:18:39Z","timestamp":1648894719819},"reference-count":22,"publisher":"Institute of Electronics, Information and Communications Engineers (IEICE)","issue":"4","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["IEICE Trans. Inf. & Syst."],"published-print":{"date-parts":[[2015]]},"DOI":"10.1587\/transinf.2014edp7244","type":"journal-article","created":{"date-parts":[[2015,4,1]],"date-time":"2015-04-01T11:11:58Z","timestamp":1427886718000},"page":"812-823","source":"Crossref","is-referenced-by-count":0,"title":["Enabling a Uniform OpenCL Device View for Heterogeneous Platforms"],"prefix":"10.1587","volume":"E98.D","author":[{"given":"Dafei","family":"HUANG","sequence":"first","affiliation":[{"name":"School of Computer, National University of Defense Technology"}]},{"given":"Changqing","family":"XUN","sequence":"additional","affiliation":[{"name":"School of Computer, National University of Defense Technology"}]},{"given":"Nan","family":"WU","sequence":"additional","affiliation":[{"name":"School of Computer, National University of Defense Technology"}]},{"given":"Mei","family":"WEN","sequence":"additional","affiliation":[{"name":"School of Computer, National University of Defense Technology"}]},{"given":"Chunyuan","family":"ZHANG","sequence":"additional","affiliation":[{"name":"School of Computer, National University of Defense Technology"}]},{"given":"Xing","family":"CAI","sequence":"additional","affiliation":[{"name":"Department of Informatics, University of Oslo"}]},{"given":"Qianming","family":"YANG","sequence":"additional","affiliation":[{"name":"School of Computer, National University of Defense Technology"}]}],"member":"532","reference":[{"key":"1","unstructured":"[1] R. Brochard and N. 
Nikolaev, \u201cMulti-platform implementation of OpenCL 1.2 targeting CPUs,\u201d http:\/\/code.google.com\/p\/freeocl\/, accessed June 5, 2013."},{"key":"2","doi-asserted-by":"crossref","unstructured":"[2] C.K. Luk, S. Hong, and H. Kim, \u201cQilin: exploiting parallelism on heterogeneous multiprocessors with adaptive mapping,\u201d Proc. 42nd Annual IEEE\/ACM International Symposium on Microarchitecture, MICRO 42, pp.45-55, New York, NY, USA, 2009.","DOI":"10.1145\/1669112.1669121"},{"key":"3","unstructured":"[3] D. Grewe and M.F.P. O'Boyle, \u201cA static task partitioning approach for heterogeneous systems using OpenCL,\u201d Compiler Construction, ed. J. Knoop, Lecture Notes in Computer Science, vol.6601, pp.286-305, Springer, 2011."},{"key":"4","unstructured":"[4] M. Boyer, S. Che, K. Skadron, J. Gummaraju, and N. Jayasena, \u201cAutomatic intra-application load balancing for heterogeneous systems,\u201d AMD Fusion Developer Summit, 2011."},{"key":"5","unstructured":"[5] C. Lattner, \u201cLLVM and Clang: Advancing compiler technology,\u201d FOSDEM '11: Free and Open Source Developers' European Meeting, Brussels, Belgium, 2011."},{"key":"6","doi-asserted-by":"crossref","unstructured":"[6] A. Danalis, G. Marin, C. McCurdy, J.S. Meredith, P.C. Roth, K. Spafford, V. Tipparaju, and J.S. Vetter, \u201cThe scalable heterogeneous computing (SHOC) benchmark suite,\u201d Proc. 3rd Workshop on General-Purpose Computation on Graphics Processing Units, GPGPU '10, pp.63-74, New York, NY, USA, 2010.","DOI":"10.1145\/1735688.1735702"},{"key":"7","doi-asserted-by":"crossref","unstructured":"[7] S. Rixner, W. Dally, U. Kapasi, P. Mattson, and J. Owens, \u201cMemory access scheduling,\u201d Proc. 27th Annual International Symposium on Computer Architecture, ISCA '00, pp.128-138, Vancouver, BC, Canada, June 2000.","DOI":"10.1145\/339647.339668"},{"key":"8","doi-asserted-by":"crossref","unstructured":"[8] N. Wu, M. Wen, J. Ren, Y. He, C. Xun, W. 
Wu, and C. Zhang, \u201cCache streamization for high performance stream processor,\u201d International Conference on High Performance Computing (HiPC), pp.140-149, Kochi, India, Dec. 2009.","DOI":"10.1109\/HIPC.2009.5433214"},{"key":"9","unstructured":"[9] J.A. Stratton, C. Rodrigues, I.J. Sung, N. Obeid, L.W. Chang, N. Anssari, G.D. Liu, and W.W. Hwu, \u201cParboil: A revised benchmark suite for scientific and commercial throughput computing,\u201d Tech. Rep. IMPACT-12-01, Center for Reliable and High-Performance Computing, 2012."},{"key":"10","doi-asserted-by":"crossref","unstructured":"[10] S. Seo, G. Jo, and J. Lee, \u201cPerformance characterization of the NAS parallel benchmarks in OpenCL,\u201d IEEE International Symposium on Workload Characterization (IISWC), pp.137-148, Austin, TX, USA, 2011.","DOI":"10.1109\/IISWC.2011.6114174"},{"key":"11","doi-asserted-by":"crossref","unstructured":"[11] J. Kim, H. Kim, J.H. Lee, and J. Lee, \u201cAchieving a single compute device image in OpenCL for multiple GPUs,\u201d Proc. 16th ACM Symposium on Principles and Practice of Parallel Programming, PPoPP '11, pp.277-288, New York, NY, USA, 2011.","DOI":"10.1145\/1941553.1941591"},{"key":"12","doi-asserted-by":"crossref","unstructured":"[12] P. Pandit and R. Govindarajan, \u201cFluidic kernels: Cooperative execution of OpenCL programs on multiple heterogeneous devices,\u201d Proc. Annual IEEE\/ACM International Symposium on Code Generation and Optimization, CGO '14, pp.273-283, Orlando, FL, USA, 2014.","DOI":"10.1145\/2581122.2544163"},{"key":"13","doi-asserted-by":"crossref","unstructured":"[13] E. Sun, D. Schaa, R. Bagley, N. Rubin, and D. Kaeli, \u201cEnabling task-level scheduling on heterogeneous platforms,\u201d Proc. 5th Annual Workshop on General Purpose Processing with Graphics Processing Units, GPGPU-5, pp.84-93, New York, NY, USA, 2012.","DOI":"10.1145\/2159430.2159440"},{"key":"14","doi-asserted-by":"crossref","unstructured":"[14] J. Kim, S. Seo, J. 
Lee, J. Nah, G. Jo, and J. Lee, \u201cOpenCL as a unified programming model for heterogeneous CPU\/GPU clusters,\u201d Proc. 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '12, pp.299-300, New Orleans, LA, USA, 2012.","DOI":"10.1145\/2145816.2145863"},{"key":"15","doi-asserted-by":"crossref","unstructured":"[15] H.P. Huynh, A. Hagiescu, W.F. Wong, and R.S.M. Goh, \u201cScalable framework for mapping streaming applications onto multi-GPU systems,\u201d Proc. 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '12, pp.1-10, New Orleans, LA, USA, Feb. 2012.","DOI":"10.1145\/2145816.2145818"},{"key":"16","unstructured":"[16] D. Kunzman and L. Kale, \u201cProgramming heterogeneous systems,\u201d IEEE International Symposium on Parallel and Distributed Processing Workshops and PhD Forum (IPDPSW), pp.2061-2064, Anchorage, AK, USA, 2011."},{"key":"17","doi-asserted-by":"crossref","unstructured":"[17] D. Unat, X. Cai, and S.B. Baden, \u201cMint: realizing CUDA performance in 3D stencil methods with annotated C,\u201d Proc. International Conference on Supercomputing, ICS '11, pp.214-224, New York, NY, USA, 2011.","DOI":"10.1145\/1995896.1995932"},{"key":"18","unstructured":"[18] N. Maruyama, T. Nomura, K. Sato, and S. Matsuoka, \u201cPhysis: An implicitly parallel programming model for stencil computations on large-scale GPU-accelerated supercomputers,\u201d Proc. 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11, pp.11:1-11:12, Seattle, WA, USA, 2011."},{"key":"19","unstructured":"[19] G. Quintana-Ort\u00ed, E.S. Quintana-Ort\u00ed, A. Rem\u00f3n, and R.A. van de Geijn, \u201cAn algorithm-by-blocks for supermatrix band Cholesky factorization,\u201d in High Performance Computing for Computational Science-VECPAR 2008, ed. J.M.L.M. Palma, P.R. Amestoy, M. Dayd\u00e9, M. Mattoso, and J.C. 
Lopes, Lecture Notes in Computer Science, vol.5336, pp.228-239, Springer-Verlag, 2008."},{"key":"20","unstructured":"[20] S. Pennycook, S. Hammond, S. Wright, J. Herdman, I. Miller, and S.A. Jarvis, \u201cAn investigation of the performance portability of OpenCL,\u201d Journal of Parallel and Distributed Computing, vol.73, pp.1439-1450, Elsevier, Nov. 2013."},{"key":"21","unstructured":"[21] H. Dong, D. Ghosh, F. Zafar, and S. Zhou, \u201cCross-platform OpenCL code and performance portability for CPU and GPU architectures investigated with a climate and weather physics model,\u201d Proc. 2012 41st International Conference on Parallel Processing Workshops, pp.126-134, Pittsburgh, PA, USA, Sept. 2012."},{"key":"22","doi-asserted-by":"crossref","unstructured":"[22] D. Huang, M. Wen, C. Xun, D. Chen, X. Cai, Y. Qiao, N. Wu, and C. Zhang, \u201cAutomated transformation of GPU-specific OpenCL kernels targeting performance portability on multi-Core\/Many-core CPU,\u201d Proc. 20th International European Conference on Parallel and Distributed Computing, pp.210-211, Porto, Portugal, Aug. 
2014.","DOI":"10.1007\/978-3-319-09873-9_18"}],"container-title":["IEICE Transactions on Information and Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.jstage.jst.go.jp\/article\/transinf\/E98.D\/4\/E98.D_2014EDP7244\/_pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2019,8,22]],"date-time":"2019-08-22T14:44:36Z","timestamp":1566485076000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.jstage.jst.go.jp\/article\/transinf\/E98.D\/4\/E98.D_2014EDP7244\/_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2015]]},"references-count":22,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2015]]}},"URL":"https:\/\/doi.org\/10.1587\/transinf.2014edp7244","relation":{},"ISSN":["0916-8532","1745-1361"],"issn-type":[{"value":"0916-8532","type":"print"},{"value":"1745-1361","type":"electronic"}],"subject":[],"published":{"date-parts":[[2015]]}}}