{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,28]],"date-time":"2026-03-28T20:59:07Z","timestamp":1774731547550,"version":"3.50.1"},"reference-count":77,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2024,2,6]],"date-time":"2024-02-06T00:00:00Z","timestamp":1707177600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,2,6]],"date-time":"2024-02-06T00:00:00Z","timestamp":1707177600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100003958","name":"Stichting voor de Technische Wetenschappen","doi-asserted-by":"publisher","award":["P16-25"],"award-info":[{"award-number":["P16-25"]}],"id":[{"id":"10.13039\/501100003958","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Real-Time Syst"],"published-print":{"date-parts":[[2024,6]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>While high accuracy is of paramount importance for deep learning (DL) inference, serving inference requests on time is equally critical but has not been carefully studied especially when the request has to be served over a dynamic wireless network at the edge. In this paper, we propose Jellyfish\u2014a novel edge DL inference serving system that achieves soft guarantees for end-to-end inference latency service-level objectives (SLO). Jellyfish handles the network variability by utilizing both data and deep neural network (DNN) adaptation to conduct tradeoffs between accuracy and latency. Jellyfish features a new design that enables collective adaptation policies where the decisions for data and DNN adaptations are aligned and coordinated among multiple users with varying network conditions. We propose efficient algorithms to continuously map users and adapt DNNs at runtime, so that we fulfill latency SLOs while maximizing the overall inference accuracy. We further investigate <jats:italic>dynamic<\/jats:italic> DNNs, i.e., DNNs that encompass multiple architecture variants, and demonstrate their potential benefit through preliminary experiments. Our experiments based on a prototype implementation and real-world WiFi and LTE network traces show that Jellyfish can meet latency SLOs at around the 99th percentile while maintaining high accuracy.\n<\/jats:p>","DOI":"10.1007\/s11241-024-09418-4","type":"journal-article","created":{"date-parts":[[2024,2,6]],"date-time":"2024-02-06T19:02:06Z","timestamp":1707246126000},"page":"239-290","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["Inference serving with end-to-end latency SLOs over dynamic edge networks"],"prefix":"10.1007","volume":"60","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9020-555X","authenticated-orcid":false,"given":"Vinod","family":"Nigade","sequence":"first","affiliation":[]},{"given":"Pablo","family":"Bauszat","sequence":"additional","affiliation":[]},{"given":"Henri","family":"Bal","sequence":"additional","affiliation":[]},{"given":"Lin","family":"Wang","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,2,6]]},"reference":[{"key":"9418_CR1","volume-title":"Simulated annealing and Boltzmann machines\u2014a stochastic approach to combinatorial optimization and neural computing","author":"EHL Aarts","year":"1990","unstructured":"Aarts EHL, Korst JHM (1990) Simulated annealing and Boltzmann machines\u2014a stochastic approach to combinatorial optimization and neural computing. Wiley, New Jersey"},{"key":"9418_CR2","first-page":"1063","volume-title":"Carmap: fast 3d feature map updates for automobiles","author":"F Ahmad","year":"2020","unstructured":"Ahmad F, Qiu H, Eells R et al (2020) Carmap: fast 3d feature map updates for automobiles. USENIX NSDI, Santa Clara, pp 1063\u20131081"},{"key":"9418_CR3","first-page":"325","volume-title":"Edge-slam: edge-assisted visual simultaneous localization and mapping","author":"AJB Ali","year":"2020","unstructured":"Ali AJB, Hashemifar ZS, Dantu K (2020) Edge-slam: edge-assisted visual simultaneous localization and mapping. ACM MobiSys, New York, pp 325\u2013337"},{"key":"9418_CR4","doi-asserted-by":"publisher","first-page":"58","DOI":"10.1109\/MC.2017.3641638","volume":"50","author":"G Ananthanarayanan","year":"2017","unstructured":"Ananthanarayanan G, Bahl P, Bod\u00edk P et al (2017) Real-time video analytics: the killer app for edge computing. Computer 50:58\u201367","journal-title":"Computer"},{"key":"9418_CR5","first-page":"119","volume-title":"Ekya: continuous learning of video analytics models on edge compute servers","author":"R Bhardwaj","year":"2022","unstructured":"Bhardwaj R, Xia Z, Ananthanarayanan G et al (2022) Ekya: continuous learning of video analytics models on edge compute servers. USENIX NSDI, Santa Clara, pp 119\u2013135"},{"key":"9418_CR6","unstructured":"Bochkovskiy A, Wang C, Liao HM (2020) Yolov4: optimal speed and accuracy of object detection. arXiv arXiv:2004.10934"},{"key":"9418_CR7","first-page":"40","volume-title":"At your service: designing voice assistant personalities to improve automotive user interfaces","author":"M Braun","year":"2019","unstructured":"Braun M, Mainz A, Chadowitz R et al (2019) At your service: designing voice assistant personalities to improve automotive user interfaces. ACM CHI, Boston, p 40"},{"key":"9418_CR8","volume-title":"Proxylessnas: direct neural architecture search on target task and hardware","author":"H Cai","year":"2019","unstructured":"Cai H, Zhu L, Han S (2019) Proxylessnas: direct neural architecture search on target task and hardware. ICLR, Vienna"},{"key":"9418_CR9","volume-title":"Once-for-All: train one network and specialize it for efficient deployment","author":"H Cai","year":"2020","unstructured":"Cai H, Gan C, Wang T et al (2020) Once-for-All: train one network and specialize it for efficient deployment. ICLR, Vienna"},{"key":"9418_CR10","first-page":"213","volume-title":"End-to-end object detection with transformers","author":"N Carion","year":"2020","unstructured":"Carion N, Massa F, Synnaeve G et al (2020) End-to-end object detection with transformers. ECCV, Glasgow, pp 213\u2013229"},{"key":"9418_CR11","first-page":"14:1","volume-title":"An empirical study of latency in an emerging class of edge computing applications for wearable cognitive assistance","author":"Z Chen","year":"2017","unstructured":"Chen Z, Hu W, Wang J et al (2017) An empirical study of latency in an emerging class of edge computing applications for wearable cognitive assistance. ACM\/IEEE SEC, Wilmington, p 14:1-14:14"},{"key":"9418_CR12","doi-asserted-by":"publisher","first-page":"126","DOI":"10.1109\/MSP.2017.2765695","volume":"35","author":"Y Cheng","year":"2018","unstructured":"Cheng Y, Wang D, Zhou P et al (2018) Model compression and acceleration for deep neural networks: the principles, progress, and challenges. IEEE Signal Process Mag 35:126\u2013136","journal-title":"IEEE Signal Process Mag"},{"key":"9418_CR13","first-page":"613","volume-title":"Clipper: a low-latency online prediction serving system","author":"D Crankshaw","year":"2017","unstructured":"Crankshaw D, Wang X, Zhou G et al (2017) Clipper: a low-latency online prediction serving system. USENIX NSDI, Santa Clara, pp 613\u2013627"},{"key":"9418_CR14","first-page":"477","volume-title":"InferLine: latency-aware provisioning and scaling for prediction serving pipelines","author":"D Crankshaw","year":"2020","unstructured":"Crankshaw D, Sela G, Mo X et al (2020) InferLine: latency-aware provisioning and scaling for prediction serving pipelines. ACM SoCC, Seattle, pp 477\u2013491"},{"key":"9418_CR15","first-page":"183","volume-title":"DVABatch: diversity-aware multi-entry multi-exit batching for efficient processing of DNN services on gpus","author":"W Cui","year":"2022","unstructured":"Cui W, Zhao H, Chen Q et al (2022) DVABatch: diversity-aware multi-entry multi-exit batching for efficient processing of DNN services on gpus. USENIX ATC, Boston, pp 183\u2013198"},{"key":"9418_CR16","first-page":"1","volume-title":"Is image super-resolution helpful for other vision tasks?","author":"D Dai","year":"2016","unstructured":"Dai D, Wang Y, Chen Y et al (2016) Is image super-resolution helpful for other vision tasks? IEEE WACV, Snowmass, pp 1\u20139"},{"key":"9418_CR17","first-page":"557","volume-title":"Server-driven video streaming for deep learning inference","author":"K Du","year":"2020","unstructured":"Du K, Pervaiz A, Yuan X et al (2020) Server-driven video streaming for deep learning inference. ACM SIGCOMM, New York, pp 557\u2013570"},{"key":"9418_CR18","unstructured":"G\u00f6rmez A, Koyuncu E (2022) Pruning early exit networks. CoRR arXiv:2207.03644"},{"key":"9418_CR19","first-page":"443","volume-title":"Serving dnns like clockwork: performance predictability from the bottom up","author":"A Gujarati","year":"2020","unstructured":"Gujarati A, Karimi R, Alzayat S et al (2020) Serving dnns like clockwork: performance predictability from the bottom up. USENIX OSDI, Berkeley, pp 443\u2013462"},{"key":"9418_CR20","first-page":"123","volume-title":"MCDNN: an approximation-based execution framework for deep stream processing under resource constraints","author":"S Han","year":"2016","unstructured":"Han S, Shen H, Philipose M et al (2016) MCDNN: an approximation-based execution framework for deep stream processing under resource constraints. ACM MobiSys, New York, pp 123\u2013136"},{"key":"9418_CR21","doi-asserted-by":"publisher","first-page":"7436","DOI":"10.1109\/TPAMI.2021.3117837","volume":"44","author":"Y Han","year":"2022","unstructured":"Han Y, Huang G, Song S et al (2022) Dynamic neural networks: a survey. IEEE Trans Pattern Anal Mach Intell 44:7436\u20137456","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"9418_CR22","first-page":"770","volume-title":"Deep residual learning for image recognition","author":"K He","year":"2016","unstructured":"He K, Zhang X, Ren S et al (2016) Deep residual learning for image recognition. IEEE CVPR, Seattle, pp 770\u2013778"},{"key":"9418_CR23","first-page":"174","volume-title":"Real-time object detection system with multi-path neural networks","author":"S Heo","year":"2020","unstructured":"Heo S, Cho S, Kim Y et al (2020) Real-time object detection system with multi-path neural networks. IEEE RTAS, San Antonio, pp 174\u2013187"},{"key":"9418_CR24","unstructured":"Hinton GE, Vinyals O, Dean J (2015) Distilling the knowledge in a neural network. CoRR arXiv:1503.02531"},{"key":"9418_CR25","first-page":"225","volume-title":"A close examination of performance and power characteristics of 4g LTE networks","author":"J Huang","year":"2012","unstructured":"Huang J, Qian F, Gerber A et al (2012) A close examination of performance and power characteristics of 4g LTE networks. ACM MobiSys, New York, pp 225\u2013238"},{"key":"9418_CR26","first-page":"253","volume-title":"Chameleon: scalable adaptation of video analytics","author":"J Jiang","year":"2018","unstructured":"Jiang J, Ananthanarayanan G, Bod\u00edk P et al (2018) Chameleon: scalable adaptation of video analytics. ACM SIGCOMM, New York, pp 253\u2013266"},{"key":"9418_CR27","first-page":"279","volume-title":"Joint model and data adaptation for cloud inference serving","author":"J Jiang","year":"2021","unstructured":"Jiang J, Luo Z, Hu C et al (2021) Joint model and data adaptation for cloud inference serving. IEEE RTSS, Houston, pp 279\u2013289"},{"key":"9418_CR28","first-page":"143","volume-title":"Budget rnns: multi-capacity neural networks to improve in-sensor inference under energy budgets","author":"T Kannan","year":"2021","unstructured":"Kannan T, Hoffmann H (2021) Budget rnns: multi-capacity neural networks to improve in-sensor inference under energy budgets. IEEE RTAS, San Antonio, pp 143\u2013156"},{"key":"9418_CR29","first-page":"34:1","volume-title":"GrandSLAm: guaranteeing slas for jobs in microservices execution frameworks","author":"RS Kannan","year":"2019","unstructured":"Kannan RS, Subramanian L, Raju A et al (2019) GrandSLAm: guaranteeing slas for jobs in microservices execution frameworks. ACM EuroSys, New York, p 34:1-34:16"},{"key":"9418_CR30","unstructured":"Kouris A, Venieris SI, Laskaridis S, et\u00a0al (2022) Fluid batching: Exit-aware preemptive serving of early-exit neural networks on edge npus. CoRR arXiv:2209.13443"},{"key":"9418_CR31","unstructured":"Krizhevsky A, Hinton G, et\u00a0al (2009) Learning multiple layers of features from tiny images. CoRR"},{"key":"9418_CR32","first-page":"37:1","volume-title":"SPINN: synergistic progressive inference of neural networks over device and cloud","author":"S Laskaridis","year":"2020","unstructured":"Laskaridis S, Venieris SI, Almeida M et al (2020) SPINN: synergistic progressive inference of neural networks over device and cloud. ACM MobiCom, Los Cabos, p 37:1-37:15"},{"key":"9418_CR33","first-page":"1","volume-title":"HAPI: hardware-aware progressive inference","author":"S Laskaridis","year":"2020","unstructured":"Laskaridis S, Venieris SI, Kim H et al (2020) HAPI: hardware-aware progressive inference. IEEE\/ACM ICCAD, NewYork, pp 1\u20139"},{"key":"9418_CR34","first-page":"1","volume-title":"Adaptive inference through early-exit networks: design, challenges and directions","author":"S Laskaridis","year":"2021","unstructured":"Laskaridis S, Kouris A, Lane ND (2021) Adaptive inference through early-exit networks: design, challenges and directions. ACM EMDLMobiSys, New York, pp 1\u20136"},{"key":"9418_CR35","first-page":"15","volume-title":"SubFlow: a dynamic induced-subgraph strategy toward real-time DNN inference and training","author":"S Lee","year":"2020","unstructured":"Lee S, Nirjon S (2020) SubFlow: a dynamic induced-subgraph strategy toward real-time DNN inference and training. IEEE RTAS, San Antonio, pp 15\u201329"},{"key":"9418_CR36","first-page":"3193","volume-title":"Not all pixels are equal: difficulty-aware semantic segmentation via deep layer cascade","author":"X Li","year":"2017","unstructured":"Li X, Liu Z, Luo P et al (2017) Not all pixels are equal: difficulty-aware semantic segmentation via deep layer cascade. IEEE CVPR, Seattle, pp 3193\u20133202"},{"key":"9418_CR37","first-page":"740","volume-title":"Microsoft COCO: common objects in context","author":"T Lin","year":"2014","unstructured":"Lin T, Maire M, Belongie SJ et al (2014) Microsoft COCO: common objects in context. ECCV, Zurich, pp 740\u2013755"},{"key":"9418_CR38","first-page":"25:1","volume-title":"Edge assisted real-time object detection for mobile augmented reality","author":"L Liu","year":"2019","unstructured":"Liu L, Li H, Gruteser M (2019) Edge assisted real-time object detection for mobile augmented reality. ACM MobiCom, Los Cabos, p 25:1-25:16"},{"key":"9418_CR39","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3527155","volume":"55","author":"Y Matsubara","year":"2023","unstructured":"Matsubara Y, Levorato M, Restuccia F (2023) Split computing and early exiting for deep learning applications: survey and research challenges. ACM Comput Surv 55:1\u201330","journal-title":"ACM Comput Surv"},{"key":"9418_CR40","first-page":"426","volume-title":"Better never than late: timely edge video analytics over the air","author":"V Nigade","year":"2021","unstructured":"Nigade V, Winder R, Bal HE et al (2021) Better never than late: timely edge video analytics over the air. ACM SenSys, New York, pp 426\u2013432"},{"key":"9418_CR41","first-page":"277","volume-title":"Jellyfish: timely inference serving for dynamic edge networks","author":"V Nigade","year":"2022","unstructured":"Nigade V, Bauszat P, Bal H et al (2022) Jellyfish: timely inference serving for dynamic edge networks. IEEE RTSS, Houston, pp 277\u2013290"},{"key":"9418_CR42","unstructured":"NVIDIA (2021) NVIDIA Triton Inference Server. https:\/\/developer.nvidia.com\/nvidia-triton-inference-server"},{"key":"9418_CR43","unstructured":"Olston C, Fiedel N, Gorovoy K, et\u00a0al (2017) Tensorflow-serving: flexible, high-performance ML serving. arXiv arXiv:1712.06139"},{"key":"9418_CR44","unstructured":"Pytorch (2021) TorchServe. https:\/\/pytorch.org\/serve\/"},{"key":"9418_CR45","unstructured":"PyTorch (2022) Reproducibility. https:\/\/pytorch.org\/docs\/stable\/notes\/randomness.html"},{"key":"9418_CR46","unstructured":"Qu Z, Sarwar SS, Dong X, et\u00a0al (2022) DRESS: dynamic real-time sparse subnets. CoRR arXiv:2207.00670"},{"key":"9418_CR47","first-page":"1421","volume-title":"DeepDecision: a mobile deep learning framework for edge video analytics","author":"X Ran","year":"2018","unstructured":"Ran X, Chen H, Zhu X et al (2018) DeepDecision: a mobile deep learning framework for edge video analytics. IEEE INFOCOM, New Jersey, pp 1421\u20131429"},{"key":"9418_CR48","first-page":"146","volume-title":"FA2: fast, accurate autoscaling for serving deep learning inference with SLA guarantees","author":"K Razavi","year":"2022","unstructured":"Razavi K, Luthra M, Koldehofe B et al (2022) FA2: fast, accurate autoscaling for serving deep learning inference with SLA guarantees. IEEE RTAS, San Antonio, pp 146\u2013159"},{"key":"9418_CR49","first-page":"1137","volume-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence","author":"S Ren","year":"2017","unstructured":"Ren S, He K, Girshick RB et al (2017) Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence. IEEE, New Jersey, pp 1137\u20131149"},{"key":"9418_CR50","first-page":"397","volume-title":"INFaaS: automated model-less inference serving","author":"F Romero","year":"2021","unstructured":"Romero F, Li Q, Yadwadkar NJ et al (2021) INFaaS: automated model-less inference serving. USENIX ATC, Boston, pp 397\u2013411"},{"key":"9418_CR51","doi-asserted-by":"crossref","unstructured":"Romero F, Zhao M, Yadwadkar NJ, et\u00a0al (2021b) Llama: A heterogeneous & serverless framework for auto-tuning video analytics pipelines. arXiv arXiv:2102.01887","DOI":"10.1145\/3472883.3486972"},{"key":"9418_CR52","first-page":"326","volume-title":"Memory-driven mixed low precision quantization for enabling deep network inference on microcontrollers","author":"M Rusci","year":"2020","unstructured":"Rusci M, Capotondi A, Benini L (2020) Memory-driven mixed low precision quantization for enabling deep network inference on microcontrollers. MLSys, Austin, pp 326\u2013335"},{"key":"9418_CR53","first-page":"898","volume-title":"Towards transformer-based real-time object detection at the edge: a benchmarking study","author":"C Samplawski","year":"2021","unstructured":"Samplawski C, Marlin BM (2021) Towards transformer-based real-time object detection at the edge: a benchmarking study. IEEE MILCOM, Boston, pp 898\u2013903"},{"key":"9418_CR54","first-page":"6640","volume-title":"The right tool for the job: matching model and instance complexities","author":"R Schwartz","year":"2020","unstructured":"Schwartz R, Stanovsky G, Swayamdipta S et al (2020) The right tool for the job: matching model and instance complexities. ACL, Dublin, pp 6640\u20136651"},{"key":"9418_CR55","first-page":"322","volume-title":"Nexus: a GPU cluster engine for accelerating DNN-based video analysis","author":"H Shen","year":"2019","unstructured":"Shen H, Chen L, Jin Y et al (2019) Nexus: a GPU cluster engine for accelerating DNN-based video analysis. ACM SOSP, New York, pp 322\u2013337"},{"key":"9418_CR56","unstructured":"Sreedhar K, Clemons J, Venkatesan R, et\u00a0al (2022) Enabling and accelerating dynamic vision transformer inference for real-time applications. CoRR arXiv:2212.02687"},{"key":"9418_CR57","first-page":"54","volume-title":"EuroSys\u201922","author":"F Svoboda","year":"2022","unstructured":"Svoboda F, Fern\u00e1ndez-Marqu\u00e9s J, Liberis E et al (2022) Deep learning on microcontrollers: a study on deployment costs and challenges. In: Yoneki E, Nardi L (eds) EuroSys\u201922. ACM EuroMLSys, New York, pp 54\u201363"},{"key":"9418_CR58","first-page":"4278","volume-title":"Inception-v4, inception-ResNet and the impact of residual connections on learning","author":"C Szegedy","year":"2017","unstructured":"Szegedy C, Ioffe S, Vanhoucke V et al (2017) Inception-v4, inception-ResNet and the impact of residual connections on learning. AAAI, Washington, pp 4278\u20134284"},{"key":"9418_CR59","first-page":"2820","volume-title":"Mnasnet: platform-aware neural architecture search for mobile","author":"M Tan","year":"2019","unstructured":"Tan M, Chen B, Pang R et al (2019) Mnasnet: platform-aware neural architecture search for mobile. IEEE CVPR, Long Beach, pp 2820\u20132828"},{"key":"9418_CR60","first-page":"2464","volume-title":"Branchynet: fast inference via early exiting from deep neural networks","author":"S Teerapittayanon","year":"2016","unstructured":"Teerapittayanon S, McDanel B, Kung HT (2016) Branchynet: fast inference via early exiting from deep neural networks. ICPR, New York, pp 2464\u20132469"},{"key":"9418_CR61","unstructured":"Tianxiaomo (2020) Pytorch-yolov4. https:\/\/github.com\/Tianxiaomo\/pytorch-YOLOv4"},{"key":"9418_CR62","doi-asserted-by":"publisher","first-page":"2177","DOI":"10.1109\/LCOMM.2016.2601087","volume":"20","author":"J van der Hooft","year":"2016","unstructured":"van der Hooft J, Petrangeli S, Wauters T et al (2016) Http\/2-based adaptive streaming of HEVC video over 4g\/lte networks. IEEE Commun Lett 20:2177\u20132180","journal-title":"IEEE Commun Lett"},{"key":"9418_CR63","first-page":"353","volume-title":"ALERT: accurate learning for energy and timeliness","author":"C Wan","year":"2020","unstructured":"Wan C, Santriaji MH, Rogers E et al (2020) ALERT: accurate learning for energy and timeliness. USENIX ATC, Boston, pp 353\u2013369"},{"key":"9418_CR64","first-page":"40:1","volume":"52","author":"E Wang","year":"2019","unstructured":"Wang E, Davis JJ, Zhao R et al (2019) Deep neural network approximation for custom hardware: where we\u2019ve been, where we\u2019re going. ACM Comput Surv 52:40:1-40:39","journal-title":"ACM Comput Surv"},{"key":"9418_CR65","first-page":"257","volume-title":"Joint configuration adaptation and bandwidth allocation for edge-based real-time video analytics","author":"C Wang","year":"2020","unstructured":"Wang C, Zhang S, Chen Y et al (2020) Joint configuration adaptation and bandwidth allocation for edge-based real-time video analytics. IEEE INFOCOM, New York, pp 257\u2013266"},{"key":"9418_CR66","first-page":"479","volume-title":"Understanding operational 5G: a first measurement study on its coverage, performance and energy consumption","author":"D Xu","year":"2020","unstructured":"Xu D, Zhou A, Zhang X et al (2020) Understanding operational 5G: a first measurement study on its coverage, performance and energy consumption. ACM SIGCOMM, New York, pp 479\u2013494"},{"key":"9418_CR67","first-page":"325","volume-title":"A control-theoretic approach for dynamic adaptive video streaming over HTTP","author":"X Yin","year":"2015","unstructured":"Yin X, Jindal A, Sekar V et al (2015) A control-theoretic approach for dynamic adaptive video streaming over HTTP. ACM SIGCOMM, New York, pp 325\u2013338"},{"key":"9418_CR68","unstructured":"Yu F, Wang D, Shangguan L, et\u00a0al (2022) A survey of multi-tenant deep learning inference on GPU. arXiv:2203.09040"},{"key":"9418_CR69","first-page":"409","volume-title":"Distream: scaling live video analytics with workload-adaptive distributed edge intelligence","author":"X Zeng","year":"2020","unstructured":"Zeng X, Fang B, Shen H et al (2020) Distream: scaling live video analytics with workload-adaptive distributed edge intelligence. ACM SenSys, New York, pp 409\u2013421"},{"key":"9418_CR70","first-page":"377","volume-title":"Live video analytics at scale with approximation and delay-tolerance","author":"H Zhang","year":"2017","unstructured":"Zhang H, Ananthanarayanan G, Bod\u00edk P et al (2017) Live video analytics at scale with approximation and delay-tolerance. USENIX NSDI, Berkeley, pp 377\u2013392"},{"key":"9418_CR71","first-page":"236","volume-title":"AWStream: adaptive wide-area streaming analytics","author":"B Zhang","year":"2018","unstructured":"Zhang B, Jin X, Ratnasamy S et al (2018) AWStream: adaptive wide-area streaming analytics. ACM SIGCOMM, New York, pp 236\u2013252"},{"key":"9418_CR72","first-page":"191","volume-title":"A systematic DNN weight pruning framework using alternating direction method of multipliers","author":"T Zhang","year":"2018","unstructured":"Zhang T, Ye S, Zhang K et al (2018) A systematic DNN weight pruning framework using alternating direction method of multipliers. ECCV, Munich, pp 191\u2013207"},{"key":"9418_CR73","volume-title":"Model-switching: dealing with fluctuating workloads in machine-learning-as-a-service systems","author":"J Zhang","year":"2020","unstructured":"Zhang J, Elnikety S, Zarar S et al (2020) Model-switching: dealing with fluctuating workloads in machine-learning-as-a-service systems. USENIX HotCloud, Berkeley"},{"key":"9418_CR74","first-page":"216","volume-title":"SkyNet: a hardware-efficient method for object detection and tracking on embedded systems","author":"X Zhang","year":"2020","unstructured":"Zhang X, Lu H, Hao C et al (2020) SkyNet: a hardware-efficient method for object detection and tracking on embedded systems. MLSys, Austin, pp 216\u2013229"},{"key":"9418_CR75","first-page":"1","volume-title":"PAME: precision-aware multi-exit DNN serving for reducing latencies of batched inferences","author":"S Zhang","year":"2022","unstructured":"Zhang S, Cui W, Chen Q et al (2022) PAME: precision-aware multi-exit DNN serving for reducing latencies of batched inferences. ACM ICS, New York, pp 1\u201312"},{"key":"9418_CR76","first-page":"4596","volume-title":"Adaptive quantization for deep neural network","author":"Y Zhou","year":"2018","unstructured":"Zhou Y, Moosavi-Dezfooli S, Cheung N et al (2018) Adaptive quantization for deep neural network. AAAI, Washington, pp 4596\u20134604"},{"key":"9418_CR77","volume-title":"Deformable DETR: deformable transformers for end-to-end object detection","author":"X Zhu","year":"2021","unstructured":"Zhu X, Su W, Lu L et al (2021) Deformable DETR: deformable transformers for end-to-end object detection. ICLR, Vienna"}],"container-title":["Real-Time Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11241-024-09418-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11241-024-09418-4\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11241-024-09418-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,7,13]],"date-time":"2024-07-13T17:06:45Z","timestamp":1720890405000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11241-024-09418-4"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,2,6]]},"references-count":77,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2024,6]]}},"alternative-id":["9418"],"URL":"https:\/\/doi.org\/10.1007\/s11241-024-09418-4","relation":{},"ISSN":["0922-6443","1573-1383"],"issn-type":[{"value":"0922-6443","type":"print"},{"value":"1573-1383","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,2,6]]},"assertion":[{"value":"25 October 2023","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"6 February 2024","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}