{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,21]],"date-time":"2026-01-21T17:54:52Z","timestamp":1769018092578,"version":"3.49.0"},"publisher-location":"New York, NY, USA","reference-count":38,"publisher":"ACM","license":[{"start":{"date-parts":[[2024,4,22]],"date-time":"2024-04-22T00:00:00Z","timestamp":1713744000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2024,4,22]]},"DOI":"10.1145\/3642970.3655833","type":"proceedings-article","created":{"date-parts":[[2024,4,19]],"date-time":"2024-04-19T10:46:57Z","timestamp":1713523617000},"page":"184-191","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["Sponge"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3232-5657","authenticated-orcid":false,"given":"Kamran","family":"Razavi","sequence":"first","affiliation":[{"name":"Technical University of Darmstadt"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3799-5702","authenticated-orcid":false,"given":"Saeid","family":"Ghafouri","sequence":"additional","affiliation":[{"name":"Queen Mary University of London"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4713-5327","authenticated-orcid":false,"given":"Max","family":"M\u00fchlh\u00e4user","sequence":"additional","affiliation":[{"name":"Technical University of Darmstadt"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9342-0703","authenticated-orcid":false,"given":"Pooyan","family":"Jamshidi","sequence":"additional","affiliation":[{"name":"University of South Carolina"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7181-6128","authenticated-orcid":false,"given":"Lin","family":"Wang","sequence":"additional","affiliation":[{"name":"Paderborn University"}]}],"member":"320","published-online":{"date-parts":[[2024,4,22]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"USENIX Symposium on Networked Systems Design and Implementation (NSDI). 1063--1081","author":"Ahmad Fawad","year":"2020","unstructured":"Fawad Ahmad, Hang Qiu, Ray Eells, Fan Bai, and Ramesh Govindan. 2020. CarMap: Fast 3D Feature Map Updates for Automobiles. In USENIX Symposium on Networked Systems Design and Implementation (NSDI). 1063--1081."},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/1465482.1465560"},{"key":"e_1_3_2_1_3_1","unstructured":"The Kubernetes Authors. 2023. In-place Resource Resize for Kubernetes Pods. https:\/\/kubernetes.io\/blog\/2023\/05\/12\/in-place-pod-resize-alpha\/. (2023). Accessed on 30.01.2024."},{"key":"e_1_3_2_1_4_1","unstructured":"The Kubernetes Authors. 2024. Kubernetes Horizontal Pod Autoscaling. https:\/\/kubernetes.io\/docs\/tasks\/run-application\/horizontal-pod-autoscale\/. (2024). Accessed on 30.01.2024."},{"key":"e_1_3_2_1_5_1","unstructured":"The Kubernetes Authors. 2024. Kubernetes Vertical Pod Autoscaling. https:\/\/cloud.google.com\/kubernetes-engine\/docs\/concepts\/verticalpodautoscaler\/. (2024). Accessed on 30.01.2024."},{"key":"e_1_3_2_1_6_1","unstructured":"The Kubernetes Authors. 2024. Minikube. https:\/\/minikube.sigs.k8s.io\/.(2024). Accessed on 30.01.2024."},{"key":"e_1_3_2_1_7_1","unstructured":"The Prometheus Authors. 2024. Prometheus monitoring and alerting toolkit. https:\/\/prometheus.io\/. (2024). Accessed on 30.01.2024."},{"key":"e_1_3_2_1_8_1","volume-title":"BigMEC: Scalable Service Migration for Mobile Edge Computing. In 2022 IEEE\/ACM 7th Symposium on Edge Computing (SEC). IEEE, 136--148","author":"Brandherm Florian","year":"2022","unstructured":"Florian Brandherm, Julien Gedeon, Osama Abboud, and Max M\u00fchlh\u00e4user. 2022. BigMEC: Scalable Service Migration for Mobile Edge Computing. In 2022 IEEE\/ACM 7th Symposium on Edge Computing (SEC). IEEE, 136--148."},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA51647.2021.00049"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/3419111.3421285"},{"key":"e_1_3_2_1_11_1","volume-title":"USENIX Symposium on Networked Systems Design and Implementation (NSDI). 613--627","author":"Crankshaw Daniel","year":"2017","unstructured":"Daniel Crankshaw, Xin Wang, Guilio Zhou, Michael J Franklin, Joseph E Gonzalez, and Ion Stoica. 2017. Clipper: A {Low-Latency} Online Prediction Serving System. In USENIX Symposium on Networked Systems Design and Implementation (NSDI). 613--627."},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3419111.3421284"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/358669.358692"},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/UCC56403.2022.00019"},{"key":"e_1_3_2_1_15_1","volume-title":"IPA: Inference Pipeline Adaptation to Achieve High Accuracy and Cost-Efficiency.","author":"Ghafouri Saeid","year":"2024","unstructured":"Saeid Ghafouri, Kamran Razavi, Mehran Salmani, Alireza Sanaee, Tania Lorido-Botran, Lin Wang, Joseph Doyle, and Pooyan Jamshidi. 2024. IPA: Inference Pipeline Adaptation to Achieve High Accuracy and Cost-Efficiency. (2024). arXiv:cs.DC\/2308.12871"},{"key":"e_1_3_2_1_16_1","unstructured":"grpc [n. d.]. gRPC. https:\/\/grpc.io. ([n. d.]). Accessed on 29.10.2021."},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3135974.3135993"},{"key":"e_1_3_2_1_18_1","volume-title":"USENIX Symposium on Operating Systems Design and Implementation (OSDI). 443--462","author":"Gujarati Arpan","year":"2020","unstructured":"Arpan Gujarati, Reza Karimi, Safya Alzayat, Wei Hao, Antoine Kaufmann, Ymir Vigfusson, and Jonathan Mace. 2020. Serving DNNs like Clockwork: Performance Predictability from the Bottom Up. In USENIX Symposium on Operating Systems Design and Implementation (OSDI). 443--462."},{"key":"e_1_3_2_1_19_1","volume-title":"Prashanth Thinakaran, Bikash Sharma, Mahmut Taylan Kandemir, and Chita R Das.","author":"Gunasekaran Jashwant Raj","year":"2022","unstructured":"Jashwant Raj Gunasekaran, Cyan Subhra Mishra, Prashanth Thinakaran, Bikash Sharma, Mahmut Taylan Kandemir, and Chita R Das. 2022. Cocktail: A multidimensional optimization for model serving in cloud. In USENIX NSDI. 1041--1057."},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/3472883.3486993"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/3450268.3453521"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/2307636.2307658"},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/3302424.3303958"},{"key":"e_1_3_2_1_24_1","volume-title":"Proceedings of the 2020 USENIX Annual Technical Conference (USENIX ATC '20). USENIX Association.","author":"Keahey Kate","year":"2020","unstructured":"Kate Keahey, Jason Anderson, Zhuo Zhen, Pierre Riteau, Paul Ruth, Dan Stanzione, Mert Cevik, Jacob Colleran, Haryadi S. Gunawi, Cody Hammock, Joe Mambretti, Alexander Barnes, Fran\u00e7ois Halbach, Alex Rocha, and Joe Stubbs. 2020. Lessons learned from the Chameleon testbed. In Proceedings of the 2020 USENIX Annual Technical Conference (USENIX ATC '20). USENIX Association."},{"key":"e_1_3_2_1_25_1","volume-title":"Next-generation of virtual personal assistants (microsoft cortana, apple siri, amazon alexa and google home). In 2018 IEEE 8th annual computing and communication workshop and conference (CCWC)","author":"Kepuska Veton","unstructured":"Veton Kepuska and Gamal Bohouta. 2018. Next-generation of virtual personal assistants (microsoft cortana, apple siri, amazon alexa and google home). In 2018 IEEE 8th annual computing and communication workshop and conference (CCWC). IEEE, 99--103."},{"key":"e_1_3_2_1_26_1","volume-title":"The 25th annual international conference on mobile computing and networking. 1--16.","author":"Liu Luyang","unstructured":"Luyang Liu, Hongyu Li, and Marco Gruteser. 2019. Edge assisted realtime object detection for mobile augmented reality. In The 25th annual international conference on mobile computing and networking. 1--16."},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/RTSS55097.2022.00032"},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/RTAS54340.2022.00020"},{"key":"e_1_3_2_1_29_1","volume-title":"INFaaS: Automated Model-less Inference Serving. In USENIX Annual Technical Conference (ATC). 397--411","author":"Romero Francisco","year":"2021","unstructured":"Francisco Romero, Qian Li, Neeraja J Yadwadkar, and Christos Kozyrakis. 2021. INFaaS: Automated Model-less Inference Serving. In USENIX Annual Technical Conference (ATC). 397--411."},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/3472883.3486972"},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/3578356.3592578"},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/3341301.3359658"},{"key":"e_1_3_2_1_33_1","unstructured":"ultralytics. 2024. YOLOv5. https:\/\/github.com\/ultralytics\/yolov5. (2024). Accessed on 30.01.2024."},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/LCOMM.2016.2601087"},{"key":"e_1_3_2_1_35_1","volume-title":"Right-Sizing Server Capacity Headroom for Global Online Services. In IEEE International Conference on Distributed Computing Systems (ICDCS). 645--659","author":"Verbowski Chad","year":"2018","unstructured":"Chad Verbowski, Ed Thayer, Paolo Costa, Hugh Leather, and Bj\u00f6rn Franke. 2018. Right-Sizing Server Capacity Headroom for Global Online Services. In IEEE International Conference on Distributed Computing Systems (ICDCS). 645--659."},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/3387514.3405882"},{"key":"e_1_3_2_1_37_1","volume-title":"Mark: Exploiting cloud services for cost-effective, SLO-aware machine learning inference serving. In 2019 {USENIX} Annual Technical Conference ({USENIX} {ATC} 19). 1049--1062.","author":"Zhang Chengliang","year":"2019","unstructured":"Chengliang Zhang, Minchen Yu, Wei Wang, and Feng Yan. 2019. Mark: Exploiting cloud services for cost-effective, SLO-aware machine learning inference serving. In 2019 {USENIX} Annual Technical Conference ({USENIX} {ATC} 19). 1049--1062."},{"key":"e_1_3_2_1_38_1","volume-title":"Model-switching: Dealing with fluctuating workloads in machine-learning-as-a-service systems. In 12th {USENIX} Workshop on Hot Topics in Cloud Computing (HotCloud 20).","author":"Zhang Jeff","year":"2020","unstructured":"Jeff Zhang, Sameh Elnikety, Shuayb Zarar, Atul Gupta, and Siddharth Garg. 2020. Model-switching: Dealing with fluctuating workloads in machine-learning-as-a-service systems. In 12th {USENIX} Workshop on Hot Topics in Cloud Computing (HotCloud 20)."}],"event":{"name":"EuroSys '24: Nineteenth European Conference on Computer Systems","location":"Athens Greece","acronym":"EuroSys '24","sponsor":["SIGOPS ACM Special Interest Group on Operating Systems"]},"container-title":["Proceedings of the 4th Workshop on Machine Learning and Systems"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3642970.3655833","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3642970.3655833","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,23]],"date-time":"2025-08-23T00:15:23Z","timestamp":1755908123000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3642970.3655833"}},"subtitle":["Inference Serving with Dynamic SLOs Using In-Place Vertical Scaling"],"short-title":[],"issued":{"date-parts":[[2024,4,22]]},"references-count":38,"alternative-id":["10.1145\/3642970.3655833","10.1145\/3642970"],"URL":"https:\/\/doi.org\/10.1145\/3642970.3655833","relation":{},"subject":[],"published":{"date-parts":[[2024,4,22]]},"assertion":[{"value":"2024-04-22","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}