{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T04:34:08Z","timestamp":1760243648312,"version":"build-2065373602"},"reference-count":35,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2013,11,19]],"date-time":"2013-11-19T00:00:00Z","timestamp":1384819200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/3.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Computers"],"abstract":"<jats:p>The increasing incorporation of Graphics Processing Units (GPUs) as accelerators has been one of the forefront High Performance Computing (HPC) trends and provides unprecedented performance; however, the prevalent adoption of the Single-Program Multiple-Data (SPMD) programming model brings with it challenges of resource underutilization. In other words, under SPMD, every CPU needs GPU capability available to it. However, since CPUs generally outnumber GPUs, the asymmetric resource distribution gives rise to overall computing resource underutilization. In this paper, we propose to efficiently share the GPU under SPMD and formally define a series of GPU sharing scenarios. We provide performance-modeling analysis for each sharing scenario with accurate experimentation validation. With the modeling basis, we further conduct experimental studies to explore potential GPU sharing efficiency improvements from multiple perspectives. Both further theoretical and experimental GPU sharing performance analysis and results are presented. Our results not only demonstrate the significant performance gain for SPMD programs with the proposed efficient GPU sharing, but also the further improved sharing efficiency with the optimization techniques based on our accurate modeling.<\/jats:p>","DOI":"10.3390\/computers2040176","type":"journal-article","created":{"date-parts":[[2013,11,19]],"date-time":"2013-11-19T12:33:03Z","timestamp":1384864383000},"page":"176-214","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["Exploring Graphics Processing Unit (GPU) Resource Sharing Efficiency for High Performance Computing"],"prefix":"10.3390","volume":"2","author":[{"given":"Teng","family":"Li","sequence":"first","affiliation":[{"name":"NSF Center for High-Performance Reconfigurable Computing (CHREC), Department of Electrical and Computer Engineering, The George Washington University, 801 22nd Street NW, Washington, DC, 20052, USA"}]},{"given":"Vikram","family":"Narayana","sequence":"additional","affiliation":[{"name":"NSF Center for High-Performance Reconfigurable Computing (CHREC), Department of Electrical and Computer Engineering, The George Washington University, 801 22nd Street NW, Washington, DC, 20052, USA"}]},{"given":"Tarek","family":"El-Ghazawi","sequence":"additional","affiliation":[{"name":"NSF Center for High-Performance Reconfigurable Computing (CHREC), Department of Electrical and Computer Engineering, The George Washington University, 801 22nd Street NW, Washington, DC, 20052, USA"}]}],"member":"1968","published-online":{"date-parts":[[2013,11,19]]},"reference":[{"key":"ref_1","unstructured":"GPGPU Webpage. http:\/\/www.gpgpu.org."},{"key":"ref_2","unstructured":"NVIDIA Corp NVIDIA Tesla GPU Computing: Revolutionizing High Performance Computing. Available online: http:\/\/www.nvidia.com\/docs\/IO\/43399\/tesla-brochure-12-lr.pdf."},{"key":"ref_3","unstructured":"Advanced Micro Devices, Inc. AMD Firestream 9350 Datasheet. Available online: http:\/\/www.amd.com\/us\/Documents\/FireStream_9350_Datasheet.pdf."},{"key":"ref_4","unstructured":"Fan, Z., Qiu, F., Kaufman, A., and Yoakum-Stover, S. GPU Cluster for High Performance Computing. Proceedings of the 2004 ACM\/IEEE Conference on Supercomputing."},{"key":"ref_5","unstructured":"Kindratenko, V., Enos, J., Shi, G., Showerman, M., Arnold, G., Stone, J., Phillips, J., and Hwu, W. (2009). Proceedings of the IEEE International Conference on Cluster Computing and Workshops, IEEE."},{"key":"ref_6","unstructured":"Cray Inc Cray XK7 Brochure. Available online: http:\/\/www.cray.com\/Assets\/PDF\/products\/xk\/CrayXK7Brochure.pdf."},{"key":"ref_7","unstructured":"Cray Inc. Cray XK6 Brochure. Available online: http:\/\/www.cray.com\/Assets\/PDF\/products\/xk\/CrayXK6Brochure.pdf."},{"key":"ref_8","unstructured":"SGI Corp SGI GPU Compute Solutions. Available online: http:\/\/www.sgi.com\/pdfs\/4235.pdf."},{"key":"ref_9","unstructured":"Top 500 Supercomputer Sites Webpage. http:\/\/www.top500.org."},{"key":"ref_10","unstructured":"Titan Webpage in the Oak Ridge National Lab, http:\/\/www.olcf.ornl.gov\/titan."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Cotronis, Y., and Dongarra, J. (2001). Recent Advances in Parallel Virtual Machine and Message Passing Interface, Springer. Chapter 1.","DOI":"10.1007\/3-540-45417-9"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Gropp, W.D., Lusk, E.L., and Skjellum, A. (1999). Using MPI: Portable Parallel Programming with the Message-Passing Interface, MIT Press.","DOI":"10.7551\/mitpress\/7056.001.0001"},{"key":"ref_13","unstructured":"China National Supercomputer Center in Tianjin Webpage, http:\/\/nscc-tj.gov.cn."},{"key":"ref_14","unstructured":"Nebulae Specification in Sugon Webpage. http:\/\/www.sugon.com\/en\/."},{"key":"ref_15","unstructured":"GSIC, Tokyo Institute of Technology Tsubame Hardware Software Specifications. Available online: http:\/\/www.gsic.titech.ac.jp\/sites\/default\/files\/TSUBAME_SPECIFICATIONS_en_0.pdf."},{"key":"ref_16","unstructured":"NVIDIA Corp NVIDIA\u2019s Next Generation CUDA Compute Architecture: Fermi. Available online: http:\/\/www.nvidia.com\/content\/PDF\/fermi_white_papers\/NVIDIA_Fermi_Compute_Architecture_Whitepaper.pdf."},{"key":"ref_17","unstructured":"NVIDIA Corp NVIDIA CUDA C-Programming Guide Ver. 5.5. Available online: http:\/\/docs.nvidia.com\/cuda\/pdf\/CUDA_C_Programming_Guide.pdf."},{"key":"ref_18","unstructured":"Li, T., Narayana, V.K., El-Araby, E., and El-Ghazawi, T. GPU Resource Sharing and Virtualization on High Performance Computing Systems. Proceedings of the 40th International Conference on Parallel Processing."},{"key":"ref_19","unstructured":"NVIDIA Corp NVIDIA\u2019s Next Generation CUDA Compute Architecture: Kepler GK110. Available online: http:\/\/www.nvidia.com\/content\/PDF\/kepler\/NVIDIA-Kepler-GK110-Architecture-Whitepaper.pdf."},{"key":"ref_20","unstructured":"Guevara, M., Gregg, C., Hazelwood, K., and Skadron, K. Enabling Task Parallelism in the CUDA Scheduler. Proceedings of the Workshop on Programming Models for Emerging Architectures (PMEA)."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Mangharam, R., and Saba, A. (-, January 29). Anytime Algorithms for GPU Architectures. Proceedings of the IEEE Real-Time Systems Symposium (IEEE RTSS) 2011, Article No. 31.","DOI":"10.1109\/RTSS.2011.41"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Peters, H., Koper, M., and Luttenberger, N. (,  2010). Efficiently Using a CUDA-enabled GPU as Shared Resource. Proceedings of the IEEE 10th International Conference on Computer and Information Technology (CIT).","DOI":"10.1109\/CIT.2010.204"},{"key":"ref_23","unstructured":"S_GPU Project Webpage. http:\/\/sgpu.ligforge.imag.fr."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Li, T., Narayana, V.K., and El-Ghazawi, T. Accelerated High-Performance Computing Through Efficient Multi-Process GPU Resource Sharing. Proceedings of the 2012 ACM International Conference on Computing Frontiers (CF\u201912).","DOI":"10.1145\/2212908.2212950"},{"key":"ref_25","unstructured":"Ravi, V., Becchi, M., Agrawal, G., and Chakradhar, S. Supporting GPU Sharing in Cloud Environments with a Transparent Runtime Consolidation Framework. Proceedings of the 20th International Symposium on High-Performance Parallel and Distributed Computing."},{"key":"ref_26","unstructured":"Gupta, V., Gavrilovska, A., Schwan, K., Kharche, H., Tolia, N., Talwar, V., and Ranganathan, P. GViM: GPU-accelerated Virtual Machines. Proceedings of the 3rd ACM Workshop on System-Level Virtualization for High Performance Computing."},{"key":"ref_27","unstructured":"Shi, L., Chen, H., and Sun, J. vCUDA: GPU Accelerated High Performance Computing in Virtual Machines. Proceedings of the IEEE International Symposium on Parallel Distributed Processing (IPDPS)."},{"key":"ref_28","unstructured":"DAmbra, P., Guarracino, M., and Talia, D. (-, January 31). A GPGPU Transparent Virtualization Component for High Performance Computing Clouds. Proceedings of the 16th International Euro-Par Conference, Ischia, Italy."},{"key":"ref_29","unstructured":"Khronos OpenCL Working Group The OpenCL Specification Ver. 2.0. Available online: http:\/\/www.khronos.org\/registry\/cl\/specs\/opencl-2.0-openclc.pdf."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"46","DOI":"10.1109\/99.660313","article-title":"OpenMP: An Industry Standard API for Shared-Memory Programming","volume":"5","author":"Dagum","year":"1998","journal-title":"IEEE Comput. Sci. Eng."},{"key":"ref_31","unstructured":"Gummaraju, J., Coburn, J., Turner, Y., and Rosenblum, M. Streamware: Programming General-Purpose Multicore Processors Using Streams. Proceedings of the 13th International Conference on Architectural Support for Programming Languages and Operating Systems."},{"key":"ref_32","first-page":"63","article-title":"The NAS Parallel Benchmarks","volume":"5","author":"Bailey","year":"1991","journal-title":"Int. J. High Perform. Comput. Appl."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"179","DOI":"10.1002\/cpe.1860","article-title":"Productivity of GPUs Under Different Programming Paradigms","volume":"24","author":"Malik","year":"2012","journal-title":"Concurr. Comput. Pract. Exp."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"637","DOI":"10.1086\/260062","article-title":"The Pricing of Options and Corporate Liabilities","volume":"81","author":"Black","year":"1973","journal-title":"J. Polit. Econ."},{"key":"ref_35","unstructured":"Visual Molecular Dynamics Program Webpage. http:\/\/www.ks.uiuc.edu\/Research\/vmd."}],"container-title":["Computers"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2073-431X\/2\/4\/176\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T21:50:44Z","timestamp":1760219444000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2073-431X\/2\/4\/176"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,11,19]]},"references-count":35,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2013,12]]}},"alternative-id":["computers2040176"],"URL":"https:\/\/doi.org\/10.3390\/computers2040176","relation":{},"ISSN":["2073-431X"],"issn-type":[{"type":"electronic","value":"2073-431X"}],"subject":[],"published":{"date-parts":[[2013,11,19]]}}}