{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,21]],"date-time":"2025-09-21T07:13:43Z","timestamp":1758438823438,"version":"3.44.0"},"reference-count":56,"publisher":"Association for Computing Machinery (ACM)","issue":"3","funder":[{"name":"National Key Research and Development Program of China","award":["2024YFB4505703"],"award-info":[{"award-number":["2024YFB4505703"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62302302, 62232011"],"award-info":[{"award-number":["62302302, 62232011"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Natural Science Foundation of Shanghai Municipality","award":["24ZR1430500"],"award-info":[{"award-number":["24ZR1430500"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2025,9,30]]},"abstract":"<jats:p>Integrating GPUs into serverless computing platforms is crucial for improving efficiency. Many GPU functions, such as DNN inferences and scientific services, benefit from GPU usage, which requires only tens to hundreds of milliseconds for pure computation. Under these circumstances, fast data loading is imperative for function performance. However, existing GPU serverless systems face significant data stall issues, leading to extremely low GPU efficiency.<\/jats:p>\n          <jats:p>Faced with the above problems, we observe opportunities to optimize data loading, such as data preloading and deduplicated data loading. However, these optimizations are impossible in existing GPU serverless systems due to the lack of insights into data information, such as data sizes and read-write attributes of function inputs. To address this, we propose a novel GPU serverless system, EDAS. 
EDAS first enhances user request specifications, allowing users to annotate data retrieved by GPU functions from the database with additional attributes. Based on this, EDAS takes over data loading from GPU functions and proposes two innovative data loading management schemes: a parallelized data loading scheme and a multi-stage resource exit scheme. Our experimental results show that EDAS reduces function duration by 16.2\u00d7 and improves system throughput by 1.91\u00d7 compared with the state-of-the-art serverless platform.<\/jats:p>","DOI":"10.1145\/3743137","type":"journal-article","created":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T07:12:59Z","timestamp":1750144379000},"page":"1-23","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["EDAS: Enabling Fast Data Loading for GPU Serverless Computing"],"prefix":"10.1145","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1561-5329","authenticated-orcid":false,"given":"Han","family":"Zhao","sequence":"first","affiliation":[{"name":"Shanghai Jiao Tong University","place":["Shanghai, China"]},{"name":"Alibaba Cloud Computing","place":["Shanghai, China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6646-5260","authenticated-orcid":false,"given":"Weihao","family":"Cui","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University","place":["Shanghai, China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5832-0347","authenticated-orcid":false,"given":"Quan","family":"Chen","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Shanghai Jiao Tong University","place":["Shanghai, 
China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4706-8451","authenticated-orcid":false,"given":"Zijun","family":"Li","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University","place":["Shanghai, China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2880-7100","authenticated-orcid":false,"given":"Zhenhua","family":"Han","sequence":"additional","affiliation":[{"name":"Unaffiliated","place":["Shanghai, China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-8276-1868","authenticated-orcid":false,"given":"Nan","family":"Wang","sequence":"additional","affiliation":[{"name":"Alibaba Cloud Computing","place":["Shanghai, China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2192-5737","authenticated-orcid":false,"given":"Yu","family":"Feng","sequence":"additional","affiliation":[{"name":"John Hopcroft Center, Shanghai Jiao Tong University","place":["Shanghai, China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8211-2812","authenticated-orcid":false,"given":"Jieru","family":"Zhao","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University","place":["Shanghai, China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9480-5632","authenticated-orcid":false,"given":"Chen","family":"Chen","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University","place":["Shanghai, China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5660-5493","authenticated-orcid":false,"given":"Jingwen","family":"Leng","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Shanghai Jiao Tong University","place":["Shanghai, 
China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0034-2302","authenticated-orcid":false,"given":"Minyi","family":"Guo","sequence":"additional","affiliation":[{"name":"Computer Science, Shanghai Jiao Tong University","place":["Shanghai, China"]}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,9,19]]},"reference":[{"key":"e_1_3_1_2_2","unstructured":"Apache OpenWhisk is a serverless open source cloud platform. Retrieved from https:\/\/openwhisk.apache.org\/. OpenWhisk. Accessed: January 23 2025."},{"key":"e_1_3_1_3_2","unstructured":"AWS Lambda. AWS. Retrieved from https:\/\/aws.amazon.com\/lambda\/. Accessed: January 23 2025."},{"key":"e_1_3_1_4_2","unstructured":"Best practices for GPU-accelerated instances. Alibaba. Retrieved from https:\/\/www.alibabacloud.com\/help\/en\/functioncompute\/fc-3-0\/product-overview\/instance-types-and-usage-modes. Accessed: January 23 2025."},{"key":"e_1_3_1_5_2","unstructured":"CUDA Interprocess Communication. Nvidia. Retrieved from https:\/\/docs.nvidia.com\/cuda\/cuda-c-programmingguide\/index.html?highlight=interprocess#interprocess-communication. Accessed: January 23 2025."},{"key":"e_1_3_1_6_2","unstructured":"GPU-Enabled Docker Image to Host a Python PyTorch Azure Function. Microsoft. Retrieved from https:\/\/github.com\/puthurr\/python-azure-function-gpu. Accessed: January 23 2025."},{"key":"e_1_3_1_7_2","unstructured":"MinIO. Retrieved from https:\/\/min.io\/. Accessed: January 23 2025."},{"key":"e_1_3_1_8_2","unstructured":"NVIDIA Container Toolkit. Nvidia. Retrieved from https:\/\/github.com\/NVIDIA\/nvidia-container-toolkit. Accessed: January 23 2025."},{"key":"e_1_3_1_9_2","unstructured":"NVIDIA Multi-Process Service. Nvidia. Retrieved from https:\/\/docs.nvidia.com\/deploy\/mps\/index.html. Accessed: January 23 2025."},{"key":"e_1_3_1_10_2","unstructured":"qGPU Overview. Tencent. 
Retrieved from https:\/\/www.tencentcloud.com\/document\/product\/457\/42973. Accessed: January 23 2025."},{"key":"e_1_3_1_11_2","unstructured":"Remote Procedure Call (RPC) framework. Google. Retrieved from https:\/\/grpc.io\/. Accessed: January 23 2025."},{"key":"e_1_3_1_12_2","unstructured":"What is the cGPU service. Alibaba. Retrieved from https:\/\/www.alibabacloud.com\/help\/en\/elastic-gpu-service\/latest\/what-isthe-cgpu-service. Accessed: January 23 2025."},{"key":"e_1_3_1_13_2","article-title":"Docker","author":"Year of the Docker release or last update","unstructured":"Year of the Docker release or last update. Docker. Retrieved from https:\/\/www.docker.com\/. (Year of the Docker release or last update). Accessed: January 23, 2025.","journal-title":"https:\/\/www.docker.com\/"},{"key":"e_1_3_1_14_2","first-page":"419","volume-title":"17th USENIX Symposium on Networked Systems Design and Implementation (NSDI 20)","author":"Agache Alexandru","year":"2020","unstructured":"Alexandru Agache, Marc Brooker, Alexandra Iordache, Anthony Liguori, Rolf Neugebauer, Phil Piwonka, and Diana-Maria Popa. 2020. Firecracker: Lightweight virtualization for serverless applications. In 17th USENIX Symposium on Networked Systems Design and Implementation (NSDI 20). 419\u2013434."},{"key":"e_1_3_1_15_2","first-page":"923","volume-title":"2018 Usenix Annual Technical Conference (USENIX ATC 18)","author":"Akkus Istemi Ekin","year":"2018","unstructured":"Istemi Ekin Akkus, Ruichuan Chen, Ivica Rimac, Manuel Stein, Klaus Satzke, Andre Beck, Paarijaat Aditya, and Volker Hilt. 2018. SAND: Towards high-performance serverless computing. In 2018 Usenix Annual Technical Conference (USENIX ATC 18). 
923\u2013935."},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/SC41405.2020.00073"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.14778\/3547305.3547313"},{"key":"e_1_3_1_18_2","first-page":"499","volume-title":"Proceedings of the 14th USENIX Conference on Operating Systems Design and Implementation","author":"Bai Zhihao","year":"2020","unstructured":"Zhihao Bai, Zhen Zhang, Yibo Zhu, and Xin Jin. 2020. Pipeswitch: Fast pipelined context switching for deep learning applications. In Proceedings of the 14th USENIX Conference on Operating Systems Design and Implementation. 499\u2013514."},{"key":"e_1_3_1_19_2","unstructured":"Jacob Devlin Ming-Wei Chang Kenton Lee and Kristina Toutanova. 2019. Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies Volume 1 (Long and Short Papers). 4171\u20134186."},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1145\/3503222.3507732"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1145\/3373376.3378512"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/MS.2020.3023302"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS53621.2022.00077"},{"key":"e_1_3_1_24_2","article-title":"ggerganov\/llama.cpp: Inference of Meta\u2019s LLaMA model (and others) in pure C\/C++","author":"Gerganov Georgi","unstructured":"Georgi Gerganov. ggerganov\/llama.cpp: Inference of Meta\u2019s LLaMA model (and others) in pure C\/C++. Retrieved from https:\/\/github.com\/ggerganov\/llama.cpp. (n.d.). 
Accessed: January 23, 2025.","journal-title":"https:\/\/github.com\/ggerganov\/llama.cpp"},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPSW.2016.94"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1145\/3605573.3605638"},{"key":"e_1_3_1_27_2","unstructured":"Awni Hannun Carl Case Jared Casper Bryan Catanzaro Greg Diamos Erich Elsen Ryan Prenger Sanjeev Satheesh Shubho Sengupta Adam Coates et\u00a0al. 2014. Deep speech: Scaling up end-to-end speech recognition. arXiv:1412.5567. Retrieved from https:\/\/arxiv.org\/abs\/1412.5567"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/IC2E.2018.00052"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA53966.2022.00019"},{"key":"e_1_3_1_32_2","unstructured":"Teven Le Scao Angela Fan Christopher Akiki Ellie Pavlick Suzana Ili\u0107 Daniel Hesslow et\u00a0al. 2022. Bloom: A 176b-parameter open-access multilingual language model. arXiv preprint arXiv:2211.05100 (2022)."},{"key":"e_1_3_1_33_2","volume-title":"2022 USENIX Annual Technical Conference (USENIX ATC 22)","author":"Li Jie","year":"2022","unstructured":"Jie Li, Laiping Zhao, Yanan Yang, Kunlin Zhan, and Keqiu Li. 2022. Tetris: Memory-efficient serverless inference through tensor sharing. In 2022 USENIX Annual Technical Conference (USENIX ATC 22)."},{"key":"e_1_3_1_34_2","first-page":"53","volume-title":"2022 USENIX Annual Technical Conference, USENIX ATC 2022, Carlsbad, CA, USA, July 11-13, 2022","author":"Li Zijun","year":"2022","unstructured":"Zijun Li, Jiagan Cheng, Quan Chen, Eryu Guan, Zizheng Bian, Yi Tao, Bin Zha, Qiang Wang, Weidong Han, and Minyi Guo. 2022. RunD: A lightweight secure container runtime for high-density deployment and high-concurrency startup in serverless computing. 
In 2022 USENIX Annual Technical Conference, USENIX ATC 2022, Carlsbad, CA, USA, July 11-13, 2022. USENIX Association, 53\u201368. Retrieved from https:\/\/www.usenix.org\/conference\/atc22\/presentation\/li-zijun-rund"},{"key":"e_1_3_1_35_2","first-page":"69","volume-title":"2022 USENIX Annual Technical Conference (USENIX ATC 22)","author":"Li Zijun","year":"2022","unstructured":"Zijun Li, Linsong Guo, Quan Chen, Jiagan Cheng, Chuhao Xu, Deze Zeng, Zhuo Song, Tao Ma, Yong Yang, Chao Li, et\u00a0al. 2022. Help rather than recycle: Alleviating cold startup in serverless computing through Inter-Function container sharing. In 2022 USENIX Annual Technical Conference (USENIX ATC 22). 69\u201384."},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/CloudCom.2019.00025"},{"key":"e_1_3_1_37_2","first-page":"881","volume-title":"Proceedings of the 14th USENIX Conference on Operating Systems Design and Implementation","author":"Ma Lingxiao","year":"2020","unstructured":"Lingxiao Ma, Zhiqiang Xie, Zhi Yang, Jilong Xue, Youshan Miao, Wei Cui, Wenxiang Hu, Fan Yang, Lintao Zhang, and Lidong Zhou. 2020. Rammer: Enabling holistic deep learning compiler optimizations with rtasks. In Proceedings of the 14th USENIX Conference on Operating Systems Design and Implementation. 881\u2013897."},{"issue":"10","key":"e_1_3_1_38_2","first-page":"3357034","article-title":"Agile cold starts for scalable serverless.","volume":"2019","author":"Mohan Anup","year":"2019","unstructured":"Anup Mohan, Harshad S Sane, Kshitij Doshi, Saikrishna Edupuganti, Naren Nayak, and Vadim Sukhomlinov. 2019. Agile cold starts for scalable serverless. 
HotCloud 2019, 10.5555 (2019), 3357034\u20133357060.","journal-title":"HotCloud"},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jpdc.2020.01.004"},{"key":"e_1_3_1_40_2","first-page":"57","volume-title":"2018 USENIX Annual Technical Conference (USENIX ATC 18)","author":"Oakes Edward","year":"2018","unstructured":"Edward Oakes, Leon Yang, Dennis Zhou, Kevin Houck, Tyler Harter, Andrea Arpaci-Dusseau, and Remzi Arpaci-Dusseau. 2018. SOCK: Rapid task provisioning with serverless-optimized containers. In 2018 USENIX Annual Technical Conference (USENIX ATC 18). 57\u201370."},{"key":"e_1_3_1_41_2","first-page":"8024","volume-title":"Advances in Neural Information Processing Systems","author":"Paszke Adam","year":"2019","unstructured":"Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et\u00a0al. 2019. PyTorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems. 8024\u20138035."},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1145\/2499370.2462176"},{"key":"e_1_3_1_43_2","unstructured":"Mohammad Shahrad Rodrigo Fonseca Inigo Goiri Gohar Chaudhry Paul Batum Jason Cooke Eduardo Laureano Colby Tresness Mark Russinovich and Ricardo Bianchini. 2020. Serverless in the wild: Characterizing and optimizing the serverless workload at a large cloud provider. In 2020 USENIX Annual Technical Conference (USENIX ATC 20). 205\u2013218."},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2011.112"},{"key":"e_1_3_1_45_2","unstructured":"K. Simonyan and A. Zisserman. 2015. Very deep convolutional networks for large-scale image recognition. 
In 3rd International Conference on Learning Representations (ICLR 2015). Computational and Biological Learning Society."},{"key":"e_1_3_1_46_2","first-page":"27","article-title":"Parboil: A revised benchmark suite for scientific and commercial throughput computing","volume":"127","author":"Stratton John A","year":"2012","unstructured":"John A Stratton, Christopher Rodrigues, I-Jui Sung, Nady Obeid, Li-Wen Chang, Nasser Anssari, Geng Daniel Liu, and Wen-mei W Hwu. 2012. Parboil: A revised benchmark suite for scientific and commercial throughput computing. Center for Reliable and High-Performance Computing 127, 7.2 (2012), 27.","journal-title":"Center for Reliable and High-Performance Computing"},{"key":"e_1_3_1_47_2","unstructured":"Ilya Sutskever Oriol Vinyals and Quoc V. Le. 2014. Sequence to sequence learning with neural networks. In Proceedings of the 28th International Conference on Neural Information Processing Systems 2 (2014) 3104\u20133112."},{"key":"e_1_3_1_48_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"e_1_3_1_49_2","doi-asserted-by":"publisher","DOI":"10.1088\/1361-6420\/ace9d4"},{"key":"e_1_3_1_50_2","unstructured":"Hugo Touvron Louis Martin Kevin Stone Peter Albert Amjad Almahairi Yasmine Babaei Nikolay Bashlykov Soumya Batra Prajjwal Bhargava Shruti Bhosale et\u00a0al. 2023. Llama 2: Open foundation and fine-tuned chat models. arXiv:2307.09288. Retrieved from https:\/\/arxiv.org\/abs\/2307.09288"},{"key":"e_1_3_1_51_2","doi-asserted-by":"publisher","DOI":"10.1145\/3445814.3446714"},{"key":"e_1_3_1_52_2","first-page":"1","volume-title":"Proceedings of the 2nd International Workshop on Serverless Computing","author":"Eyk Erwin Van","year":"2017","unstructured":"Erwin Van Eyk, Alexandru Iosup, Simon Seif, and Markus Th\u00f6mmes. 2017. The SPEC cloud group\u2019s research vision on FaaS and serverless architectures. In Proceedings of the 2nd International Workshop on Serverless Computing. 
1\u20134."},{"key":"e_1_3_1_53_2","doi-asserted-by":"publisher","DOI":"10.1145\/1095810.1095825"},{"key":"e_1_3_1_54_2","doi-asserted-by":"publisher","DOI":"10.1145\/3302424.3303978"},{"key":"e_1_3_1_55_2","first-page":"533","volume-title":"OSDI","author":"Xiao Wencong","year":"2020","unstructured":"Wencong Xiao, Shiru Ren, Yong Li, Yang Zhang, Pengyang Hou, Zhi Li, Yihui Feng, Wei Lin, and Yangqing Jia. 2020. AntMan: Dynamic scaling on GPU clusters for deep learning. In OSDI. 533\u2013548."},{"key":"e_1_3_1_56_2","first-page":"98","article-title":"Fine-grained GPU sharing primitives for deep learning applications","volume":"2","author":"Yu Peifeng","year":"2020","unstructured":"Peifeng Yu and Mosharaf Chowdhury. 2020. Fine-grained GPU sharing primitives for deep learning applications. Proceedings of Machine Learning and Systems 2 (2020), 98\u2013111.","journal-title":"Proceedings of Machine Learning and Systems"},{"key":"e_1_3_1_57_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00907"}],"container-title":["ACM Transactions on Architecture and Code 
Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3743137","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,20]],"date-time":"2025-09-20T00:48:57Z","timestamp":1758329337000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3743137"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,9,19]]},"references-count":56,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2025,9,30]]}},"alternative-id":["10.1145\/3743137"],"URL":"https:\/\/doi.org\/10.1145\/3743137","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2025,9,19]]},"assertion":[{"value":"2025-01-23","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-05-07","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-09-19","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}