{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,22]],"date-time":"2025-08-22T22:10:09Z","timestamp":1755900609600,"version":"3.44.0"},"reference-count":52,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2025,5,27]],"date-time":"2025-05-27T00:00:00Z","timestamp":1748304000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Alibaba Group through Alibaba Innovative Research Program"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Meas. Anal. Comput. Syst."],"published-print":{"date-parts":[[2025,5,27]]},"abstract":"<jats:p>The fusion of serverless computing and deep learning (DL) has led to serverless inference, offering a promising approach for developing and deploying scalable and cost-efficient deep learning inference services (DLISs). However, the challenge of cold start presents a significant obstacle for DLISs, where DL model size greatly impacts latency. Existing studies mitigate cold starts by extending keep-alive times, which unfortunately leads to decreased resource utilization efficiency. To address this issue, we introduce PipeCo, a system designed to alleviate DLIS cold start. The core concept of PipeCo is to achieve the miniaturization and pipelining of DLIS cold start. Firstly, PipeCo utilizes a vertical partitioning approach to divide each DLIS into multiple slices, prewarming slices in a sequential and overlapping manner to decrease the overall cold-start latency. Secondly, PipeCo employs an attention-based prediction mechanism to estimate periodic patterns in requests and idle containers for scheduling slices. Thirdly, PipeCo incorporates a similarity-based container matcher for the reuse of idle containers. We implemented a prototype of PipeCo on the OpenFaaS platform and conducted extensive experiments using three real-world DLIS repositories. The results demonstrate that PipeCo effectively decreases end-to-end (E2E) latency by up to 62.67% on CPU and 58.81% on GPU clusters and reduces the overall resource usage by 65.31% compared to five state-of-the-art baselines.<\/jats:p>","DOI":"10.1145\/3727125","type":"journal-article","created":{"date-parts":[[2025,6,4]],"date-time":"2025-06-04T09:43:35Z","timestamp":1749030215000},"page":"1-23","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["PipeCo: Pipelining Cold Start of Deep Learning Inference Services on Serverless Platforms"],"prefix":"10.1145","volume":"9","author":[{"ORCID":"https:\/\/orcid.org\/0009-0006-0629-4865","authenticated-orcid":false,"given":"Jiaang","family":"Duan","sequence":"first","affiliation":[{"name":"Shanghai Jiao Tong University, Shanghai, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7775-1740","authenticated-orcid":false,"given":"Shiyou","family":"Qian","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, Shanghai, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6439-5169","authenticated-orcid":false,"given":"Hanwen","family":"Hu","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, Shanghai, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8156-3926","authenticated-orcid":false,"given":"Dingyu","family":"Yang","sequence":"additional","affiliation":[{"name":"The State Key Laboratory of Blockchain and Data Security, Zhejiang University, Shanghai, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0036-9436","authenticated-orcid":false,"given":"Jian","family":"Cao","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, Shanghai, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1617-3593","authenticated-orcid":false,"given":"Guangtao","family":"Xue","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, Shanghai, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,6,3]]},"reference":[{"key":"e_1_2_1_1_1","first-page":"209","volume-title":"International Conference on Internet of Things: Systems, Management and Security, IOTSMS 2019","author":"Kata","year":"2019","unstructured":"Kata containers: An emerging architecture for enabling MEC services in fast and secure way. In Mohammad A. Alsmirat and Yaser Jararweh, editors, International Conference on Internet of Things: Systems, Management and Security, IOTSMS 2019, pages 209--214, 2019."},{"key":"e_1_2_1_2_1","volume-title":"Queueing theory","author":"Adan Ivo","year":"2002","unstructured":"Ivo Adan and Jacques Resing. Queueing theory. Eindhoven University of Technology, 180, 2002."},{"key":"e_1_2_1_3_1","first-page":"419","volume-title":"17th USENIX symposium on networked systems design and implementation (NSDI 20)","author":"Agache Alexandru","year":"2020","unstructured":"Alexandru Agache, Marc Brooker, Alexandra Iordache, Anthony Liguori, Rolf Neugebauer, Phil Piwonka, and Diana-Maria Popa. Firecracker: Lightweight virtualization for serverless applications. In 17th USENIX symposium on networked systems design and implementation (NSDI 20), pages 419--434, 2020."},{"key":"e_1_2_1_4_1","first-page":"923","volume-title":"2018 Usenix Annual Technical Conference (USENIX ATC 18)","author":"Akkus Istemi Ekin","year":"2018","unstructured":"Istemi Ekin Akkus, Ruichuan Chen, Ivica Rimac, Manuel Stein, Klaus Satzke, Andre Beck, Paarijaat Aditya, and Volker Hilt. {SAND}: Towards {High-Performance} serverless computing. In 2018 Usenix Annual Technical Conference (USENIX ATC 18), pages 923--935, 2018."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC41405.2020.00073"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.14778\/3547305.3547313"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/3552326.3567506"},{"key":"e_1_2_1_8_1","volume-title":"The frobenius norm and the commutator. Linear algebra and its applications, 429(8--9):1864--1885","author":"B\u00f6ttcher Albrecht","year":"2008","unstructured":"Albrecht B\u00f6ttcher and David Wenzel. The frobenius norm and the commutator. Linear algebra and its applications, 429(8--9):1864--1885, 2008."},{"key":"e_1_2_1_9_1","volume-title":"Microsoft Azure Functions, and Google Cloud Functions","author":"Chowhan Kuldeep","year":"2018","unstructured":"Kuldeep Chowhan. Hands-on Serverless Computing: Build, Run and Orchestrate Serverless Applications Using AWS Lambda, Microsoft Azure Functions, and Google Cloud Functions. Packt Publishing Ltd, 2018."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/3620678.3624790"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-00296-0"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPWRS.2002.804943"},{"key":"e_1_2_1_13_1","volume-title":"Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805","author":"Devlin Jacob","year":"2018","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805, 2018."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/3373376.3378512"},{"key":"e_1_2_1_15_1","volume-title":"The llama 3 herd of models. arXiv preprint arXiv:2407.21783","author":"Dubey Abhimanyu","year":"2024","unstructured":"Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, et al. The llama 3 herd of models. arXiv preprint arXiv:2407.21783, 2024."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/3445814.3446757"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3575693.3575721"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/3627703.3629567"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/3620678.3624783"},{"key":"e_1_2_1_20_1","first-page":"805","volume-title":"2021 USENIX Annual Technical Conference (USENIX ATC 21)","author":"Kotni Swaroop","year":"2021","unstructured":"Swaroop Kotni, Ajay Nayak, Vinod Ganapathy, and Arkaprava Basu. Faastlane: Accelerating {Function-as-a-Service} workflows. In 2021 USENIX Annual Technical Conference (USENIX ATC 21), pages 805--820, 2021."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/3209978.3210006"},{"key":"e_1_2_1_22_1","first-page":"287","volume-title":"Implementation and Security","author":"Le Dac-Nhuong","year":"2022","unstructured":"Dac-Nhuong Le, Souvik Pal, and Prasant Kumar Pattnaik. Openfaas. Cloud Computing Solutions: Architecture, Data Storage, Implementation and Security, pages 287--303, 2022."},{"key":"e_1_2_1_23_1","volume-title":"2022 USENIX Annual Technical Conference (USENIX ATC 22)","author":"Li Jie","year":"2022","unstructured":"Jie Li, Laiping Zhao, Yanan Yang, Kunlin Zhan, and Keqiu Li. Tetris: Memory-efficient serverless inference through tensor sharing. In 2022 USENIX Annual Technical Conference (USENIX ATC 22), 2022."},{"key":"e_1_2_1_24_1","first-page":"69","volume-title":"2022 USENIX Annual Technical Conference (USENIX ATC 22)","author":"Li Zijun","year":"2022","unstructured":"Zijun Li, Linsong Guo, Quan Chen, Jiagan Cheng, Chuhao Xu, Deze Zeng, Zhuo Song, Tao Ma, Yong Yang, Chao Li, et al. Help rather than recycle: Alleviating cold startup in serverless computing through {Inter-Function} container sharing. In 2022 USENIX Annual Technical Conference (USENIX ATC 22), pages 69--84, 2022."},{"key":"e_1_2_1_25_1","volume-title":"The serverless computing survey: A technical primer for design architecture. ACM Computing Surveys (CSUR), 54(10s):1--34","author":"Li Zijun","year":"2022","unstructured":"Zijun Li, Linsong Guo, Jiagan Cheng, Quan Chen, BingSheng He, and Minyi Guo. The serverless computing survey: A technical primer for design architecture. ACM Computing Surveys (CSUR), 54(10s):1--34, 2022."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3386126"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/3585007"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1177\/0961000618759414"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/3458817.3476209"},{"key":"e_1_2_1_30_1","volume-title":"Aws lambda case study","author":"Netflix AWS","year":"2020","unstructured":"AWS Netflix. Aws lambda case study, 2020."},{"key":"e_1_2_1_31_1","volume-title":"Noah Constant, Ji Ma, Keith B Hall, Daniel Cer, and Yinfei Yang. Sentence-t5: Scalable sentence encoders from pre-trained text-to-text models. arXiv preprint arXiv:2108.08877","author":"Ni Jianmo","year":"2021","unstructured":"Jianmo Ni, Gustavo Hernandez Abrego, Noah Constant, Ji Ma, Keith B Hall, Daniel Cer, and Yinfei Yang. Sentence-t5: Scalable sentence encoders from pre-trained text-to-text models. arXiv preprint arXiv:2108.08877, 2021."},{"key":"e_1_2_1_32_1","volume-title":"A time series is worth 64 words: Long-term forecasting with transformers. arXiv preprint arXiv:2211.14730","author":"Nie Yuqi","year":"2022","unstructured":"Yuqi Nie, Nam H Nguyen, Phanwadee Sinthong, and Jayant Kalagnanam. A time series is worth 64 words: Long-term forecasting with transformers. arXiv preprint arXiv:2211.14730, 2022."},{"key":"e_1_2_1_33_1","first-page":"57","volume-title":"Rapid task provisioning with {Serverless-Optimized} containers. In 2018 USENIX annual technical conference (USENIX ATC 18)","author":"Oakes Edward","year":"2018","unstructured":"Edward Oakes, Leon Yang, Dennis Zhou, Kevin Houck, Tyler Harter, Andrea Arpaci-Dusseau, and Remzi Arpaci-Dusseau. {SOCK}: Rapid task provisioning with {Serverless-Optimized} containers. In 2018 USENIX annual technical conference (USENIX ATC 18), pages 57--70, 2018."},{"issue":"1","key":"e_1_2_1_34_1","first-page":"430","article-title":"Deflate compression algorithm","volume":"4","author":"Oswal Savan","year":"2016","unstructured":"Savan Oswal, Anjali Singh, and Kirthi Kumari. Deflate compression algorithm. International Journal of Engineering Research and General Science, 4(1):430--436, 2016.","journal-title":"International Journal of Engineering Research and General Science"},{"key":"e_1_2_1_35_1","first-page":"8024","volume-title":"Advances in neural information processing systems","author":"Paszke Adam","year":"2019","unstructured":"Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas K\u00f6pf, Edward Z. Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. Pytorch: An imperative style, high-performance deep learning library. In Advances in neural information processing systems, pages 8024--8035, 2019."},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/3503222.3507750"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/3406011"},{"key":"e_1_2_1_38_1","first-page":"205","volume-title":"2020 USENIX Annual Technical Conference, USENIX ATC","author":"Shahrad Mohammad","year":"2020","unstructured":"Mohammad Shahrad, Rodrigo Fonseca, I\u00f1igo Goiri, Gohar Chaudhry, Paul Batum, Jason Cooke, Eduardo Laureano, Colby Tresness, Mark Russinovich, and Ricardo Bianchini. Serverless in the wild: Characterizing and optimizing the serverless workload at a large cloud provider. In 2020 USENIX Annual Technical Conference, USENIX ATC, pages 205--218, 2020."},{"key":"e_1_2_1_39_1","first-page":"205","volume-title":"Serverless in the wild: Characterizing and optimizing the serverless workload at a large cloud provider. In 2020 USENIX annual technical conference (USENIX ATC 20)","author":"Shahrad Mohammad","year":"2020","unstructured":"Mohammad Shahrad, Rodrigo Fonseca, Inigo Goiri, Gohar Chaudhry, Paul Batum, Jason Cooke, Eduardo Laureano, Colby Tresness, Mark Russinovich, and Ricardo Bianchini. Serverless in the wild: Characterizing and optimizing the serverless workload at a large cloud provider. In 2020 USENIX annual technical conference (USENIX ATC 20), pages 205--218, 2020."},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.compedu.2020.103862"},{"key":"e_1_2_1_41_1","volume-title":"Sandbox for training deep learning networks","author":"S\u00e9mery Oleg","year":"2021","unstructured":"Oleg S\u00e9mery. Sandbox for training deep learning networks., 2021. Retrieved from https:\/\/github.com\/osmr\/imgclsmob."},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-21395-3_8"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/3445814.3446714"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4842-2199-0_7"},{"key":"e_1_2_1_45_1","first-page":"443","volume-title":"2021 USENIX Annual Technical Conference (USENIX ATC 21)","author":"Wang Ao","year":"2021","unstructured":"Ao Wang, Shuai Chang, Huangshi Tian, Hongqi Wang, Haoran Yang, Huiba Li, Rui Du, and Yue Cheng. {FaaSNet}: Scalable and fast provisioning of custom serverless container runtimes at alibaba cloud function compute. In 2021 USENIX Annual Technical Conference (USENIX ATC 21), pages 443--457, 2021."},{"key":"e_1_2_1_46_1","volume-title":"Xstore: Fast rdma-based ordered key-value store using remote learned cache. ACM Transactions on Storage (TOS), 17(3):1--32","author":"Wei Xingda","year":"2021","unstructured":"Xingda Wei, Rong Chen, Haibo Chen, and Binyu Zang. Xstore: Fast rdma-based ordered key-value store using remote learned cache. ACM Transactions on Storage (TOS), 17(3):1--32, 2021."},{"key":"e_1_2_1_47_1","first-page":"497","volume-title":"17th USENIX Symposium on Operating Systems Design and Implementation (OSDI 23)","author":"Wei Xingda","year":"2023","unstructured":"Xingda Wei, Fangming Lu, Tianxia Wang, Jinyu Gu, Yuhan Yang, Rong Chen, and Haibo Chen. No provisioned concurrency: Fast {RDMA-codesigned} remote fork for serverless computing. In 17th USENIX Symposium on Operating Systems Design and Implementation (OSDI 23), pages 497--517, 2023."},{"key":"e_1_2_1_48_1","volume-title":"Autoformer: Decomposition transformers with autocorrelation for long-term series forecasting. Advances in neural information processing systems, 34:22419--22430","author":"Wu Haixu","year":"2021","unstructured":"Haixu Wu, Jiehui Xu, Jianmin Wang, and Mingsheng Long. Autoformer: Decomposition transformers with autocorrelation for long-term series forecasting. Advances in neural information processing systems, 34:22419--22430, 2021."},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/3503222.3507709"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i12.17325"},{"key":"e_1_2_1_51_1","first-page":"27268","volume-title":"International conference on machine learning","author":"Zhou Tian","year":"2022","unstructured":"Tian Zhou, Ziqing Ma, Qingsong Wen, Xue Wang, Liang Sun, and Rong Jin. Fedformer: Frequency enhanced decomposed transformer for long-term series forecasting. In International conference on machine learning, pages 27268--27286, 2022."},{"key":"e_1_2_1_52_1","first-page":"1","volume-title":"Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems","volume":"1","author":"Zhou Zhuangzhuang","year":"2023","unstructured":"Zhuangzhuang Zhou, Yanqi Zhang, and Christina Delimitrou. Aquatope: Qos-and-uncertainty-aware resource management for multi-stage serverless workflows. In Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1, pages 1--14, 2023."}],"container-title":["Proceedings of the ACM on Measurement and Analysis of Computing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3727125","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3727125","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,22]],"date-time":"2025-08-22T21:30:32Z","timestamp":1755898232000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3727125"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,5,27]]},"references-count":52,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2025,5,27]]}},"alternative-id":["10.1145\/3727125"],"URL":"https:\/\/doi.org\/10.1145\/3727125","relation":{},"ISSN":["2476-1249"],"issn-type":[{"type":"electronic","value":"2476-1249"}],"subject":[],"published":{"date-parts":[[2025,5,27]]},"assertion":[{"value":"2025-06-03","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}