{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,2]],"date-time":"2026-04-02T15:54:31Z","timestamp":1775145271109,"version":"3.50.1"},"reference-count":56,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2023,3,27]],"date-time":"2023-03-27T00:00:00Z","timestamp":1679875200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"NSF","award":["2211302, 2211888, 2213636, 2105494"],"award-info":[{"award-number":["2211302, 2211888, 2213636, 2105494"]}]},{"name":"U.S. Army Contract","award":["W911NF-17-2-0196"],"award-info":[{"award-number":["W911NF-17-2-0196"]}]},{"name":"Adobe and VMware"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Auton. Adapt. Syst."],"published-print":{"date-parts":[[2023,3,31]]},"abstract":"<jats:p>Since emerging edge applications such as Internet of Things (IoT) analytics and augmented reality have tight latency constraints, hardware AI accelerators have been recently proposed to speed up deep neural network (DNN) inference run by these applications. Resource-constrained edge servers and accelerators tend to be multiplexed across multiple IoT applications, introducing the potential for performance interference between latency-sensitive workloads. In this article, we design analytic models to capture the performance of DNN inference workloads on shared edge accelerators, such as GPU and edgeTPU, under different multiplexing and concurrency behaviors. After validating our models using extensive experiments, we use them to design various cluster resource management algorithms to intelligently manage multiple applications on edge accelerators while respecting their latency constraints. 
We implement a prototype of our system in Kubernetes and show that our system can host 2.3\u00d7 more DNN applications in heterogeneous multi-tenant edge clusters with no latency violations when compared to traditional knapsack hosting algorithms.<\/jats:p>","DOI":"10.1145\/3582080","type":"journal-article","created":{"date-parts":[[2023,1,25]],"date-time":"2023-01-25T11:52:04Z","timestamp":1674647524000},"page":"1-26","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":50,"title":["Model-driven Cluster Resource Management for AI Workloads in Edge Clouds"],"prefix":"10.1145","volume":"18","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4702-5689","authenticated-orcid":false,"given":"Qianlin","family":"Liang","sequence":"first","affiliation":[{"name":"University of Massachusetts Amherst, Amherst, MA, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5765-8194","authenticated-orcid":false,"given":"Walid A.","family":"Hanafy","sequence":"additional","affiliation":[{"name":"University of Massachusetts Amherst, Amherst, MA, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2774-9284","authenticated-orcid":false,"given":"Ahmed","family":"Ali-Eldin","sequence":"additional","affiliation":[{"name":"University of Massachusetts Amherst, Amherst, MA, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5435-1901","authenticated-orcid":false,"given":"Prashant","family":"Shenoy","sequence":"additional","affiliation":[{"name":"University of Massachusetts Amherst, Amherst, MA, USA"}]}],"member":"320","published-online":{"date-parts":[[2023,3,27]]},"reference":[{"key":"e_1_3_2_2_2","first-page":"265","volume-title":"Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation (OSDI\u201916)","author":"Abadi Mart\u00edn","year":"2016","unstructured":"Mart\u00edn Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, 
Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek G. Murray, Benoit Steiner, Paul Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. 2016. TensorFlow: A system for large-scale machine learning. In Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation (OSDI\u201916). USENIX Association, USA, 265\u2013283."},{"issue":"2","key":"e_1_3_2_3_2","doi-asserted-by":"crossref","first-page":"77","DOI":"10.1016\/j.dcan.2017.07.001","article-title":"Edge computing technologies for Internet of Things: A primer","volume":"4","author":"Ai Yuan","year":"2018","unstructured":"Yuan Ai, Mugen Peng, and Kecheng Zhang. 2018. Edge computing technologies for Internet of Things: A primer. Dig. Commun. Netw. 4, 2 (2018), 77\u201386.","journal-title":"Dig. Commun. Netw."},{"key":"e_1_3_2_4_2","volume-title":"Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC\u201920)","author":"Ambati Pradeep","year":"2020","unstructured":"Pradeep Ambati, Noman Bashir, David W. Irwin, and Prashant J. Shenoy. 2020. Waiting game: Optimally provisioning fixed resources for cloud-enabled schedulers. In Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC\u201920)."},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/RTSS.2017.00017"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2015.10"},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.1137\/S0097539700382820"},{"issue":"8","key":"e_1_3_2_8_2","doi-asserted-by":"crossref","first-page":"1655","DOI":"10.1109\/JPROC.2019.2921977","article-title":"Deep learning with edge computing: A review","volume":"107","author":"Chen J.","year":"2019","unstructured":"J. Chen and X. Ran. 2019. Deep learning with edge computing: A review. Proc. IEEE 107, 8 (2019), 1655\u20131674.","journal-title":"Proc. 
IEEE"},{"key":"e_1_3_2_9_2","doi-asserted-by":"crossref","first-page":"292","DOI":"10.1145\/3274783.3274834","volume-title":"Proceedings of the 16th ACM Conference on Embedded Networked Sensor Systems (SenSys\u201918)","author":"Chen Kaifei","year":"2018","unstructured":"Kaifei Chen, Tong Li, Hyung-Sin Kim, David E. Culler, and Randy H. Katz. 2018. MARVEL: Enabling mobile augmented reality with low energy and low latency. In Proceedings of the 16th ACM Conference on Embedded Networked Sensor Systems (SenSys\u201918). ACM, New York, NY, 292\u2013304. 10.1145\/3274783.3274834"},{"key":"e_1_3_2_10_2","first-page":"220","article-title":"PREMA: A predictive multi-task scheduling algorithm for preemptible neural processing units","author":"Choi Yujeong","year":"2020","unstructured":"Yujeong Choi and Minsoo Rhu. 2020. PREMA: A predictive multi-task scheduling algorithm for preemptible neural processing units. In Proceedings of the IEEE International Symposium on High Performance Computer Architecture (HPCA\u201920), 220\u2013233.","journal-title":"Proceedings of the IEEE International Symposium on High Performance Computer Architecture (HPCA\u201920)"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1145\/3132747.3132772"},{"key":"e_1_3_2_12_2","volume-title":"Proceedings of the USENIX Conference on Networked Systems Design and Implementation (NSDI\u201917)","author":"Crankshaw Daniel","year":"2017","unstructured":"Daniel Crankshaw, Xin Wang, Giulio Zhou, Michael J. Franklin, Joseph E. Gonzalez, and Ion Stoica. 2017. Clipper: A low-latency online prediction serving system. In Proceedings of the USENIX Conference on Networked Systems Design and Implementation (NSDI\u201917)."},{"key":"e_1_3_2_13_2","first-page":"248","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Deng Jia","year":"2009","unstructured":"Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. 
ImageNet: A large-scale hierarchical image database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 248\u2013255. 10.1109\/CVPR.2009.5206848"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1145\/3419111.3421284"},{"key":"e_1_3_2_15_2","first-page":"1","volume-title":"Proceedings of the IEEE Conference on Hot Chips (HCS\u201908)","author":"Fatica M.","year":"2008","unstructured":"M. Fatica. 2008. CUDA toolkit and libraries. In Proceedings of the IEEE Conference on Hot Chips (HCS\u201908). 1\u201322."},{"key":"e_1_3_2_16_2","first-page":"1","volume-title":"Proceedings of the 20th International Middleware Conference Tutorials (Middleware\u201919)","author":"Gandhi Anshul","year":"2019","unstructured":"Anshul Gandhi and Amoghvarsha Suresh. 2019. Leveraging queueing theory and OS profiling to reduce application latency. In Proceedings of the 20th International Middleware Conference Tutorials (Middleware\u201919). ACM, New York, NY, 1\u20135. 10.1145\/3366625.3368853"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPSW.2016.94"},{"key":"e_1_3_2_18_2","doi-asserted-by":"crossref","DOI":"10.1002\/9781118625651","volume-title":"Fundamentals of Queueing Theory (4th ed.)","author":"Gross Donald","year":"2008","unstructured":"Donald Gross, John F. Shortle, James M. Thompson, and Carl M. Harris. 2008. Fundamentals of Queueing Theory (4th ed.). Wiley-Interscience, USA."},{"key":"e_1_3_2_19_2","volume-title":"Proceedings of the USENIX Conference on Operating Systems Design and Implementation (OSDI\u201920)","author":"Gujarati Arpan","year":"2020","unstructured":"Arpan Gujarati, Reza Karimi, Safya Alzayat, Antoine Kaufmann, Ymir Vigfusson, and Jonathan Mace. 2020. Serving DNNs like Clockwork: Performance predictability from the bottom up. 
In Proceedings of the USENIX Conference on Operating Systems Design and Implementation (OSDI\u201920)."},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/IISWC47752.2019.9041955"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1145\/3447555.3465326"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-69814-2_1"},{"key":"e_1_3_2_23_2","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9781139226424","volume-title":"Performance Modeling and Design of Computer Systems: Queueing Theory in Action (1st ed.)","author":"Harchol-Balter Mor","year":"2013","unstructured":"Mor Harchol-Balter. 2013. Performance Modeling and Design of Computer Systems: Queueing Theory in Action (1st ed.). Cambridge University Press, USA."},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11134-005-2898-7"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1145\/3068281"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1145\/3274808.3274813"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.peva.2020.102183"},{"key":"e_1_3_2_29_2","first-page":"675","volume-title":"Proceedings of the 22nd ACM International Conference on Multimedia (MM\u201914)","author":"Jia Yangqing","year":"2014","unstructured":"Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. 2014. Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the 22nd ACM International Conference on Multimedia (MM\u201914). ACM, New York, NY, 675\u2013678. 10.1145\/2647868.2654889"},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-981-10-5152-4"},{"key":"e_1_3_2_31_2","article-title":"Learning Multiple Layers of Features from Tiny Images","author":"Krizhevsky Alex","year":"2012","unstructured":"Alex Krizhevsky. 2012. 
Learning Multiple Layers of Features from Tiny Images. University of Toronto.","journal-title":"University of Toronto"},{"key":"e_1_3_2_32_2","first-page":"145","volume-title":"Proceedings of the International Symposium on Workload Characterization","author":"Liang Qianlin","year":"2020","unstructured":"Qianlin Liang, Prashant J. Shenoy, and David E. Irwin. 2020. AI on the edge: Characterizing AI-based IoT applications using specialized edge architectures. In Proceedings of the International Symposium on Workload Characterization. IEEE, 145\u2013156."},{"key":"e_1_3_2_33_2","volume-title":"Proceedings of the European Conference on Computer Vision (ECCV\u201914)","author":"Lin Tsung-Yi","year":"2014","unstructured":"Tsung-Yi Lin, Michael Maire, Serge J. Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Doll\u00e1r, and C. Lawrence Zitnick. 2014. Microsoft COCO: Common objects in context. In Proceedings of the European Conference on Computer Vision (ECCV\u201914)."},{"issue":"2","key":"e_1_3_2_34_2","doi-asserted-by":"crossref","first-page":"431","DOI":"10.1109\/TPDS.2016.2563428","article-title":"CloudFog: Leveraging fog to extend cloud gaming for thin-client MMOG with high quality of service","volume":"28","author":"Lin Yuhua","year":"2016","unstructured":"Yuhua Lin and Haiying Shen. 2016. CloudFog: Leveraging fog to extend cloud gaming for thin-client MMOG with high quality of service. IEEE Trans. Parallel Distrib. Syst. 28, 2 (2016), 431\u2013445.","journal-title":"IEEE Trans. Parallel Distrib. Syst."},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2020.2964608"},{"key":"e_1_3_2_36_2","unstructured":"Nvidia. 2020. NVIDIA Jetson Modules. Retrieved October 19, 2020 from https:\/\/developer.nvidia.com\/embedded\/jetson-modules."},{"key":"e_1_3_2_37_2","unstructured":"Nvidia. 2020. Multi Process Service. 
Retrieved from https:\/\/docs.nvidia.com\/deploy\/pdf\/CUDA_Multi_Process_Service_Overview.pdf."},{"key":"e_1_3_2_38_2","volume-title":"Advances in Neural Information Processing Systems","author":"Paszke Adam","year":"2019","unstructured":"Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas K\u00f6pf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. 2019. PyTorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems, H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alch\u00e9-Buc, E. Fox, and R. Garnett (Eds.), Vol. 32."},{"key":"e_1_3_2_39_2","first-page":"1","volume-title":"Proceedings of the International Green Computing Conference (IGCC\u201912)","author":"Pawlish Michael","year":"2012","unstructured":"Michael Pawlish, Aparna S. Varde, and Stefan A. Robila. 2012. Analyzing utilization rates in data centers for optimizing energy management. In Proceedings of the International Green Computing Conference (IGCC\u201912). 1\u20136. 10.1109\/IGCC.2012.6322248"},{"key":"e_1_3_2_40_2","first-page":"779","volume-title":"Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition","author":"Redmon Joseph","year":"2016","unstructured":"Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. 2016. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. 
779\u2013788."},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.1145\/3366626.3368131"},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00474"},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1109\/MC.2017.9"},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.1109\/MPRV.2009.82"},{"key":"e_1_3_2_45_2","doi-asserted-by":"crossref","first-page":"322","DOI":"10.1145\/3341301.3359658","volume-title":"Proceedings of the 27th ACM Symposium on Operating Systems Principles (SOSP\u201919)","author":"Shen Haichen","year":"2019","unstructured":"Haichen Shen, Lequn Chen, Yuchen Jin, Liangyu Zhao, Bingyu Kong, Matthai Philipose, Arvind Krishnamurthy, and Ravi Sundaram. 2019. Nexus: A GPU cluster engine for accelerating DNN-based video analysis. In Proceedings of the 27th ACM Symposium on Operating Systems Principles (SOSP\u201919). 322\u2013337. 10.1145\/3341301.3359658"},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1145\/2038916.2038921"},{"key":"e_1_3_2_47_2","first-page":"15","volume-title":"Proceedings of the USENIX Conference on Operational Machine Learning (OpML\u201919)","author":"Soifer Jonathan","year":"2019","unstructured":"Jonathan Soifer, Jason Li, Mingqin Li, Jeffrey Zhu, Yingnan Li, Yuxiong He, Elton Zheng, Adi Oltean, Maya Mosyak, Chris Barnes, Thomas Liu, and Junhua Wang. 2019. Deep learning inference service at Microsoft. In Proceedings of the USENIX Conference on Operational Machine Learning (OpML\u201919). Santa Clara, CA, 15\u201317."},{"key":"e_1_3_2_48_2","first-page":"2818","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201916)","author":"Szegedy C.","year":"2016","unstructured":"C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. 2016. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201916). 2818\u20132826. 
10.1109\/CVPR.2016.308"},{"key":"e_1_3_2_49_2","unstructured":"Olivier Temam, Harshit Khaitan, Ravi Narayanaswami, and Dong Hyuk Woo. 2019. Neural network accelerator with parameters resident on chip. U.S. Patent No. US20190050717A1."},{"key":"e_1_3_2_50_2","doi-asserted-by":"crossref","first-page":"291","DOI":"10.1145\/1064212.1064252","volume-title":"Proceedings of the ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS\u201905)","author":"Urgaonkar Bhuvan","year":"2005","unstructured":"Bhuvan Urgaonkar, Giovanni Pacifici, Prashant Shenoy, Mike Spreitzer, and Asser Tantawi. 2005. An analytical model for multi-tier internet services and its applications. In Proceedings of the ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS\u201905). 291\u2013302. 10.1145\/1064212.1064252"},{"key":"e_1_3_2_51_2","doi-asserted-by":"publisher","DOI":"10.1142\/S012905410700511X"},{"key":"e_1_3_2_52_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDCS.2019.00180"},{"key":"e_1_3_2_53_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNSM.2018.2808352"},{"key":"e_1_3_2_54_2","doi-asserted-by":"publisher","DOI":"10.4230\/LIPIcs.ECRTS.2018.20"},{"key":"e_1_3_2_55_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCC.2020.3006751"},{"key":"e_1_3_2_56_2","doi-asserted-by":"publisher","DOI":"10.1145\/3132211.3134463"},{"key":"e_1_3_2_57_2","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2019.2918951"}],"container-title":["ACM Transactions on Autonomous and Adaptive 
Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3582080","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3582080","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:46:46Z","timestamp":1750178806000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3582080"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,3,27]]},"references-count":56,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2023,3,31]]}},"alternative-id":["10.1145\/3582080"],"URL":"https:\/\/doi.org\/10.1145\/3582080","relation":{},"ISSN":["1556-4665","1556-4703"],"issn-type":[{"value":"1556-4665","type":"print"},{"value":"1556-4703","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,3,27]]},"assertion":[{"value":"2021-12-22","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-01-10","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-03-27","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}