{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,16]],"date-time":"2026-01-16T07:20:06Z","timestamp":1768548006217,"version":"3.49.0"},"reference-count":51,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2023,12,14]],"date-time":"2023-12-14T00:00:00Z","timestamp":1702512000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100012166","name":"National Key R&D Program of China","doi-asserted-by":"crossref","award":["2022YFE0113200"],"award-info":[{"award-number":["2022YFE0113200"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["U21A20464"],"award-info":[{"award-number":["U21A20464"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Alibaba Group through Alibaba Research Intern Program"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2023,12,31]]},"abstract":"<jats:p>The rapid adoption of AI and the convenience offered by cloud services have resulted in growing demand for GPUs in the cloud. Generally, GPUs are physically attached to host servers as PCIe devices. However, the fixed assembly combination of host servers and GPUs is extremely inefficient in resource utilization, upgrade, and maintenance. Due to these issues, the GPU disaggregation technique has been proposed to decouple GPUs from host servers. It aggregates GPUs into a pool and allocates GPU node(s) according to user demands. 
However, existing GPU disaggregation systems have flaws in software-hardware compatibility, disaggregation scope, and capacity.<\/jats:p>\n          <jats:p>\n            In this article, we present a new implementation of\n            <jats:italic>datacenter-scale<\/jats:italic>\n            GPU disaggregation, named DxPU. DxPU efficiently solves the above problems and can flexibly allocate as many GPU node(s) as users demand. To understand the performance overhead incurred by DxPU, we build a performance model for AI-specific workloads. Guided by the modeling results, we develop a prototype system, which has been deployed in the datacenter of a leading cloud provider for a test run. We also conduct detailed experiments to evaluate the performance overhead caused by our system. The results show that the overhead of DxPU is less than 10%, compared with native GPU servers, in most user scenarios.\n          <\/jats:p>","DOI":"10.1145\/3617995","type":"journal-article","created":{"date-parts":[[2023,10,5]],"date-time":"2023-10-05T15:41:35Z","timestamp":1696520495000},"page":"1-23","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":8,"title":["DxPU: Large-scale Disaggregated GPU Pools in the Datacenter"],"prefix":"10.1145","volume":"20","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7794-2520","authenticated-orcid":false,"given":"Bowen","family":"He","sequence":"first","affiliation":[{"name":"Zhejiang University and Alibaba Group, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3191-5636","authenticated-orcid":false,"given":"Xiao","family":"Zheng","sequence":"additional","affiliation":[{"name":"Alibaba Group, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-2321-4910","authenticated-orcid":false,"given":"Yuan","family":"Chen","sequence":"additional","affiliation":[{"name":"Zhejiang University and Alibaba Group, 
China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-6923-5312","authenticated-orcid":false,"given":"Weinan","family":"Li","sequence":"additional","affiliation":[{"name":"Alibaba Group, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7610-4736","authenticated-orcid":false,"given":"Yajin","family":"Zhou","sequence":"additional","affiliation":[{"name":"Zhejiang University, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-4528-0993","authenticated-orcid":false,"given":"Xin","family":"Long","sequence":"additional","affiliation":[{"name":"Alibaba Group, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-6740-227X","authenticated-orcid":false,"given":"Pengcheng","family":"Zhang","sequence":"additional","affiliation":[{"name":"Alibaba Group, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-9230-3589","authenticated-orcid":false,"given":"Xiaowei","family":"Lu","sequence":"additional","affiliation":[{"name":"Alibaba Group, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-4624-5174","authenticated-orcid":false,"given":"Linquan","family":"Jiang","sequence":"additional","affiliation":[{"name":"Alibaba Group, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-5792-322X","authenticated-orcid":false,"given":"Qiang","family":"Liu","sequence":"additional","affiliation":[{"name":"Alibaba Group, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-7272-8143","authenticated-orcid":false,"given":"Dennis","family":"Cai","sequence":"additional","affiliation":[{"name":"Alibaba Group, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-3065-7646","authenticated-orcid":false,"given":"Xiantao","family":"Zhang","sequence":"additional","affiliation":[{"name":"Alibaba Group, China"}]}],"member":"320","published-online":{"date-parts":[[2023,12,14]]},"reference":[{"key":"e_1_3_2_2_2","unstructured":"2014. PCI Express\u00ae Electrical Basics. Retrieved from https:\/\/pcisig.com\/sites\/default\/files\/files\/PCI_Express_Electrical_Basics.pdf"},{"key":"e_1_3_2_3_2","unstructured":"2018. 
Intel Rack Scale Design. Retrieved from https:\/\/www.kernel.org\/doc\/Documentation\/ntb.txt"},{"key":"e_1_3_2_4_2","unstructured":"2018. Intel Rack Scale Design Architecture. Retrieved from https:\/\/www.intel.com\/content\/dam\/www\/public\/us\/en\/documents\/white-papers\/rack-scale-design-architecture-white-paper.pdf"},{"key":"e_1_3_2_5_2","unstructured":"2019. The Impact of Bit Errors in PCI Express\u00ae Links. Retrieved from https:\/\/www.asteralabs.com\/insights\/impact-of-bit-errors-in-pci-express-links\/"},{"key":"e_1_3_2_6_2","unstructured":"2019. PCI Express\u00ae 5.0 Architecture Channel Insertion Loss Budget. Retrieved from https:\/\/pcisig.com\/pci-express%C2%AE-50-architecture-channel-insertion-loss-budget-0"},{"key":"e_1_3_2_7_2","unstructured":"2021. AMD I\/O Virtualization Technology (IOMMU) Specification. Retrieved from https:\/\/www.amd.com\/system\/files\/TechDocs\/48882_IOMMU.pdf"},{"key":"e_1_3_2_8_2","unstructured":"2021. cGPU. Retrieved from https:\/\/partners-intl.aliyun.com\/help\/zh\/elastic-gpu-service\/latest\/what-is-the-cgpu-service"},{"key":"e_1_3_2_9_2","unstructured":"2021. NVM Express. Retrieved from https:\/\/nvmexpress.org\/wp-content\/uploads\/NVM-Express-Base-Specification-2.0b-2021.12.18-Ratified.pdf"},{"key":"e_1_3_2_10_2","unstructured":"2021. NVM Express over Fabrics. Retrieved from https:\/\/nvmexpress.org\/wp-content\/uploads\/NVMe-over-Fabrics-1.1a-2021.07.12-Ratified.pdf"},{"key":"e_1_3_2_11_2","unstructured":"2022. Cisco Catalyst 9300 Series Switches. Retrieved from https:\/\/www.cisco.com\/c\/en\/us\/products\/switches\/catalyst-9300-series-switches\/index.html"},{"key":"e_1_3_2_12_2","unstructured":"2022. DeepLearningExamples. Retrieved from https:\/\/github.com\/NVIDIA\/DeepLearningExamples"},{"key":"e_1_3_2_13_2","unstructured":"2022. Ethernet Cables Explained. Retrieved from https:\/\/www.tripplite.com\/products\/ethernet-cable-types"},{"key":"e_1_3_2_14_2","unstructured":"2022. glmark2. 
Retrieved from https:\/\/github.com\/glmark2\/glmark2"},{"key":"e_1_3_2_15_2","unstructured":"2022. heaven. Retrieved from https:\/\/benchmark.unigine.com\/heaven"},{"key":"e_1_3_2_16_2","unstructured":"2022. Huawei CloudEngine Data Center Storage Network Switches. Retrieved from https:\/\/e.huawei.com\/en\/products\/enterprise-networking\/switches\/data-center-switches\/data-center-storage-network"},{"key":"e_1_3_2_17_2","unstructured":"2022. Intel Rack Scale Design. Retrieved from https:\/\/www.intel.co.uk\/content\/www\/uk\/en\/architecture-and-technology\/rack-scale-design-overview.html"},{"key":"e_1_3_2_18_2","unstructured":"2022. Liqid Powered GPU Composable AI Platform. Retrieved from https:\/\/www.liqid.com\/solutions\/ai\/liqid-composable-ai-platform"},{"key":"e_1_3_2_19_2","unstructured":"2022. NVIDIA CUDA Samples. Retrieved from https:\/\/github.com\/NVIDIA\/cuda-samples"},{"key":"e_1_3_2_20_2","unstructured":"2022. NVIDIA DGX Systems. Retrieved from https:\/\/www.nvidia.com\/en-us\/data-center\/dgx-systems\/"},{"key":"e_1_3_2_21_2","unstructured":"2022. NVIDIA GRID. Retrieved from https:\/\/www.nvidia.cn\/design-visualization\/technologies\/grid-technology\/"},{"key":"e_1_3_2_22_2","unstructured":"2022. NVIDIA Nsight System. Retrieved from https:\/\/developer.nvidia.com\/nsight-systems"},{"key":"e_1_3_2_23_2","unstructured":"2022. PCI Express Base Specification Revision 6.0 Version 1.0. Retrieved from https:\/\/members.pcisig.com\/wg\/PCI-SIG\/document\/16609"},{"key":"e_1_3_2_24_2","unstructured":"2022. SGXLock: Towards efficiently establishing mutual distrust between host application and enclave for SGX. In 31st USENIX Security Symposium (USENIX Security\u201922) . USENIX Association. Retrieved from https:\/\/www.usenix.org\/conference\/usenixsecurity22\/presentation\/chen-yuan"},{"key":"e_1_3_2_25_2","unstructured":"2022. Tensorflow NGC Container. 
Retrieved from https:\/\/catalog.ngc.nvidia.com\/orgs\/nvidia\/containers\/tensorflow"},{"key":"e_1_3_2_26_2","unstructured":"2022. tf_cnn_benchmarks. Retrieved from https:\/\/github.com\/tensorflow\/benchmarks"},{"key":"e_1_3_2_27_2","unstructured":"2022. valley. Retrieved from https:\/\/benchmark.unigine.com\/valley"},{"key":"e_1_3_2_28_2","unstructured":"2022. VMware vSphere Bitfusion. Retrieved from https:\/\/docs.vmware.com\/en\/VMware-vSphere-Bitfusion\/index.html"},{"issue":"3","key":"e_1_3_2_29_2","article-title":"Intel virtualization technology for directed I\/O.","volume":"10","author":"Abramson Darren","year":"2006","unstructured":"Darren Abramson, Jeff Jackson, Sridhar Muthrasanallur, Gil Neiger, Greg Regnier, Rajesh Sankaran, Ioannis Schoinas, Rich Uhlig, Balaji Vembu, and John Wiegert. 2006. Intel virtualization technology for directed I\/O. Intel Technol. J. 10, 3 (2006).","journal-title":"Intel Technol. J."},{"issue":"3","key":"e_1_3_2_30_2","first-page":"6","article-title":"The next generation of Intel IXP network processors","volume":"6","author":"Adiletta Matthew","year":"2002","unstructured":"Matthew Adiletta, Mark Rosenbluth, Debra Bernstein, Gilbert Wolrich, and Hugh Wilkinson. 2002. The next generation of Intel IXP network processors. INTEL Technol. J. 6, 3 (2002), 6\u201318.","journal-title":"INTEL Technol. J."},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/MC.2007.443"},{"key":"e_1_3_2_32_2","doi-asserted-by":"crossref","unstructured":"Maciej Bielski Ilias Syrigos Kostas Katrinis Dimitris Syrivelis Andrea Reale Dimitris Theodoropoulos Nikolaos Alachiotis Dionisios Pnevmatikatos E. H. Pap George Zervas and others. 2018. dReDBox: Materializing a full-stack rack-scale system prototype of a next-generation disaggregated datacenter. 
In 2018 Design Automation & Test in Europe Conference & Exhibition (DATE) IEEE 1093\u20131098.","DOI":"10.23919\/DATE.2018.8342174"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1145\/2654822.2541967"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1145\/3149457.3149466"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCS.2010.5547126"},{"key":"e_1_3_2_36_2","first-page":"739","volume-title":"IEEE International Parallel and Distributed Processing Symposium (IPDPS\u201922)","author":"Fingler Henrique","year":"2022","unstructured":"Henrique Fingler, Zhiting Zhu, Esther Yoon, Zhipeng Jia, Emmett Witchel, and Christopher J. Rossbach. 2022. DGSF: Disaggregated GPUs for serverless functions. In IEEE International Parallel and Distributed Processing Symposium (IPDPS\u201922). IEEE, 739\u2013750."},{"key":"e_1_3_2_37_2","first-page":"249","volume-title":"12th USENIX Symposium on Operating Systems Design and Implementation (OSDI\u201916)","author":"Gao Peter X.","year":"2016","unstructured":"Peter X. Gao, Akshay Narayan, Sagar Karandikar, Joao Carreira, Sangjin Han, Rachit Agarwal, Sylvia Ratnasamy, and Scott Shenker. 2016. Network requirements for resource disaggregation. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI\u201916). 249\u2013264."},{"key":"e_1_3_2_38_2","first-page":"1721","volume-title":"IEEE 10th International Conference on High Performance Computing and Communications and the IEEE International Conference on Embedded and Ubiquitous Computing","author":"Gottschlag Mathias","year":"2013","unstructured":"Mathias Gottschlag, Marius Hillenbrand, Jens Kehne, Jan Stoess, and Frank Bellosa. 2013. LoGV: Low-overhead GPGPU virtualization. In IEEE 10th International Conference on High Performance Computing and Communications and the IEEE International Conference on Embedded and Ubiquitous Computing. 
IEEE, 1721\u20131726."},{"key":"e_1_3_2_39_2","first-page":"1","volume-title":"IEEE International Conference on Cloud Computing in Emerging Markets (CCEM\u201919)","author":"Guleria Anubhav","year":"2019","unstructured":"Anubhav Guleria, J. Lakshmi, and Chakri Padala. 2019. EMF: Disaggregated GPUs in datacenters for efficiency, modularity and flexibility. In IEEE International Conference on Cloud Computing in Emerging Markets (CCEM\u201919). IEEE, 1\u20138."},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1145\/3068281"},{"key":"e_1_3_2_41_2","first-page":"629","volume-title":"14th USENIX Symposium on Networked Systems Design and Implementation (NSDI\u201917)","author":"Hsieh Kevin","year":"2017","unstructured":"Kevin Hsieh, Aaron Harlap, Nandita Vijaykumar, Dimitris Konomis, Gregory R. Ganger, Phillip B. Gibbons, and Onur Mutlu. 2017. Gaia: Geo-distributed machine learning approaching LAN speeds. In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI\u201917). USENIX Association, 629\u2013647. Retrieved from https:\/\/www.usenix.org\/conference\/nsdi17\/technical-sessions\/presentation\/hsieh"},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIT.2017.2736066"},{"issue":"1","key":"e_1_3_2_43_2","first-page":"1","article-title":"SmartIO: Zero-overhead device sharing through PCIe networking","volume":"38","author":"Markussen Jonas","year":"2021","unstructured":"Jonas Markussen, Lars Bj\u00f8rlykke Kristiansen, P\u00e5l Halvorsen, Halvor Kielland-Gyrud, H\u00e5kon Kvale Stensland, and Carsten Griwodz. 2021. SmartIO: Zero-overhead device sharing through PCIe networking. ACM Trans. Comput. Syst. 38, 1-2 (2021), 1\u201378.","journal-title":"ACM Trans. Comput. 
Syst."},{"key":"e_1_3_2_44_2","first-page":"327","volume-title":"Conference of the ACM Special Interest Group on Data Communication","author":"Neugebauer Rolf","year":"2018","unstructured":"Rolf Neugebauer, Gianni Antichi, Jos\u00e9 Fernando Zazo, Yury Audzevich, Sergio L\u00f3pez-Buedo, and Andrew W. Moore. 2018. Understanding PCIe performance for end host networking. In Conference of the ACM Special Interest Group on Data Communication. 327\u2013341."},{"key":"e_1_3_2_45_2","first-page":"1207","volume-title":"Conference on High Performance Computing, Networking Storage and Analysis","author":"Oikawa Minoru","year":"2012","unstructured":"Minoru Oikawa, Atsushi Kawai, Kentaro Nomura, Kenji Yasuoka, Kazuyuki Yoshikawa, and Tetsu Narumi. 2012. DS-CUDA: A middleware to use many GPUs in the cloud environment. In Conference on High Performance Computing, Networking Storage and Analysis. IEEE, 1207\u20131214."},{"issue":"6","key":"e_1_3_2_46_2","first-page":"804","article-title":"vCUDA: GPU-accelerated high-performance computing in virtual machines","volume":"61","author":"Shi Lin","year":"2011","unstructured":"Lin Shi, Hao Chen, Jianhua Sun, and Kenli Li. 2011. vCUDA: GPU-accelerated high-performance computing in virtual machines. IEEE Trans. Comput. 61, 6 (2011), 804\u2013816.","journal-title":"IEEE Trans. Comput."},{"key":"e_1_3_2_47_2","unstructured":"Yusuke Suzuki Shinpei Kato Hiroshi Yamada and Kenji Kono. 2014. GPUvm: Why Not Virtualizing GPUs at the Hypervisor?109\u2013120. Retrieved from https:\/\/www.usenix.org\/conference\/atc14\/technical-sessions\/presentation\/suzuki"},{"key":"e_1_3_2_48_2","doi-asserted-by":"crossref","first-page":"45","DOI":"10.1145\/1134760.1134762","volume-title":"ACM\/Usenix International Conference on Virtual Execution Environments","volume":"14","author":"Doorn Leendert Van","year":"2006","unstructured":"Leendert Van Doorn. 2006. Hardware virtualization trends. 
In ACM\/Usenix International Conference on Virtual Execution Environments, Vol. 14. 45\u201345."},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.1145\/3377454"},{"key":"e_1_3_2_50_2","first-page":"1","volume-title":"Innovative Parallel Computing (InPar\u201912)","author":"Xiao Shucai","year":"2012","unstructured":"Shucai Xiao, Pavan Balaji, Qian Zhu, Rajeev Thakur, Susan Coghlan, Heshan Lin, Gaojin Wen, Jue Hong, and Wu-chun Feng. 2012. VOCL: An optimized environment for transparent virtualization of graphics processing units. In Innovative Parallel Computing (InPar\u201912). IEEE, 1\u201312."},{"key":"e_1_3_2_51_2","first-page":"579","volume-title":"USENIX Annual Technical Conference (USENIX ATC\u201916)","author":"Xue Mochi","year":"2016","unstructured":"Mochi Xue, Kun Tian, Yaozu Dong, Jiacheng Ma, Jiajun Wang, Zhengwei Qi, Bingsheng He, and Haibing Guan. 2016. gScale: Scaling up GPU virtualization with dynamic sharing of graphics memory space. In USENIX Annual Technical Conference (USENIX ATC\u201916). USENIX Association, 579\u2013590. Retrieved from https:\/\/www.usenix.org\/conference\/atc16\/technical-sessions\/presentation\/xue"},{"key":"e_1_3_2_52_2","first-page":"430","volume-title":"European Symposium on Research in Computer Security","author":"Zhang Bingsheng","year":"2021","unstructured":"Bingsheng Zhang, Yuan Chen, Jiaqi Li, Yajin Zhou, Phuc Thai, Hong-Sheng Zhou, and Kui Ren. 2021. Succinct scriptable NIZK via trusted hardware. In European Symposium on Research in Computer Security. 
Springer, 430\u2013451."}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3617995","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3617995","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:37:58Z","timestamp":1750178278000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3617995"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,12,14]]},"references-count":51,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2023,12,31]]}},"alternative-id":["10.1145\/3617995"],"URL":"https:\/\/doi.org\/10.1145\/3617995","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"value":"1544-3566","type":"print"},{"value":"1544-3973","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,12,14]]},"assertion":[{"value":"2023-01-31","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-08-16","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-12-14","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}