{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,18]],"date-time":"2025-12-18T08:44:14Z","timestamp":1766047454992,"version":"3.41.0"},"reference-count":45,"publisher":"Association for Computing Machinery (ACM)","issue":"5","license":[{"start":{"date-parts":[[2022,9,30]],"date-time":"2022-09-30T00:00:00Z","timestamp":1664496000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000185","name":"DARPA","doi-asserted-by":"crossref","award":["304259-00001"],"award-info":[{"award-number":["304259-00001"]}],"id":[{"id":"10.13039\/100000185","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Embed. Comput. Syst."],"published-print":{"date-parts":[[2022,9,30]]},"abstract":"<jats:p>Real-time machine vision applications running on resource-constrained embedded systems face challenges for maintaining performance. An especially challenging scenario arises when multiple applications execute at the same time, creating contention for the computational resources of the system. This contention results in increase in inference delay of the machine vision applications, which can be unacceptable for time-critical tasks. To address this challenge, we propose an adaptive model selection framework that mitigates the impact of system contention and prevents unexpected increases in inference delay by trading off the application accuracy minimally. The framework has two parts, which are performed pre-deployment and at runtime. The pre-deployment part profiles the system for contention in a black-box manner and produces a model set that is specifically optimized for the contention levels observed in the system. 
The runtime part predicts the inference delays of each model considering the system contention and selects the best model according to the predictions for each frame. Compared to a fixed individual model with similar accuracy, our framework improves the performance by significantly reducing the inference delay violations against a specified threshold. We implement our framework on the Nvidia Jetson TX2 platform and show that our approach achieves greater than 20% reductions in delay violations over the individual baseline models.<\/jats:p>","DOI":"10.1145\/3520134","type":"journal-article","created":{"date-parts":[[2022,3,26]],"date-time":"2022-03-26T11:20:19Z","timestamp":1648293619000},"page":"1-29","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":9,"title":["Contention Grading and Adaptive Model Selection for Machine Vision in Embedded Systems"],"prefix":"10.1145","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3868-421X","authenticated-orcid":false,"given":"Basar","family":"Kutukcu","sequence":"first","affiliation":[{"name":"University of California, San Diego, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sabur","family":"Baidya","sequence":"additional","affiliation":[{"name":"University of Louisville, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Anand","family":"Raghunathan","sequence":"additional","affiliation":[{"name":"Purdue University, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sujit","family":"Dey","sequence":"additional","affiliation":[{"name":"University of California, San Diego, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2022,10,8]]},"reference":[{"key":"e_1_3_1_2_2","first-page":"7948","volume-title":"Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 
(NeurIPS\u201919)","author":"Banner Ron","year":"2019","unstructured":"Ron Banner, Yury Nahshan, and Daniel Soudry. 2019. Post training 4-bit quantization of convolutional networks for rapid-deployment. In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems (NeurIPS\u201919), Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d\u2019Alch\u00e9-Buc, Emily B. Fox, and Roman Garnett (Eds.). 7948\u20137956."},{"key":"e_1_3_1_3_2","volume-title":"Proceedings of the Machine Learning and Systems (MLSys\u201920)","author":"Blalock Davis W.","year":"2020","unstructured":"Davis W. Blalock, Jose Javier Gonzalez Ortiz, Jonathan Frankle, and John V. Guttag. 2020. What is the state of neural network pruning? In Proceedings of the Machine Learning and Systems (MLSys\u201920), Inderjit S. Dhillon, Dimitris S. Papailiopoulos, and Vivienne Sze (Eds.)."},{"unstructured":"Matthieu Courbariaux and Yoshua Bengio. 2016. Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to  \\( + \\) 1 or  \\( - \\) 1. arXiv:1602.02830. Retrieved from http:\/\/arxiv.org\/abs\/1602.02830.","key":"e_1_3_1_4_2"},{"doi-asserted-by":"publisher","key":"e_1_3_1_5_2","DOI":"10.1109\/CVPR.2019.00020"},{"doi-asserted-by":"publisher","key":"e_1_3_1_6_2","DOI":"10.1109\/ICCD50377.2020.00053"},{"doi-asserted-by":"publisher","key":"e_1_3_1_7_2","DOI":"10.3390\/s20154220"},{"unstructured":"Boyuan Feng Kun Wan Shu Yang and Yufei Ding. 2018. SECS: Efficient deep stream processing via class skew dichotomy. arXiv:1809.06691. Retrieved from http:\/\/arxiv.org\/abs\/1809.06691.","key":"e_1_3_1_8_2"},{"doi-asserted-by":"crossref","unstructured":"Amir Gholami Sehoon Kim Zhen Dong Zhewei Yao Michael W. Mahoney and Kurt Keutzer. 2021. A survey of quantization methods for efficient neural network inference. arXiv:2103.13630. 
Retrieved from https:\/\/arxiv.org\/abs\/2103.13630.","key":"e_1_3_1_9_2","DOI":"10.1201\/9781003162810-13"},{"key":"e_1_3_1_10_2","first-page":"1135","volume-title":"Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems","author":"Han Song","year":"2015","unstructured":"Song Han, Jeff Pool, John Tran, and William J. Dally. 2015. Learning both weights and connections for efficient neural network. In Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems, Corinna Cortes, Neil D. Lawrence, Daniel D. Lee, Masashi Sugiyama, and Roman Garnett (Eds.). 1135\u20131143."},{"doi-asserted-by":"publisher","key":"e_1_3_1_11_2","DOI":"10.1007\/978-3-319-46493-0_38"},{"unstructured":"Andrew G. Howard Menglong Zhu Bo Chen Dmitry Kalenichenko Weijun Wang Tobias Weyand Marco Andreetto and Hartwig Adam. 2017. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv:1704.04861. Retrieved from http:\/\/arxiv.org\/abs\/1704.04861.","key":"e_1_3_1_12_2"},{"doi-asserted-by":"publisher","key":"e_1_3_1_13_2","DOI":"10.1109\/CVPR.2017.243"},{"unstructured":"Forrest N. Iandola Matthew W. Moskewicz Khalid Ashraf Song Han William J. Dally and Kurt Keutzer. 2016. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and  \\( \\lt \\) 0.5MB model size. arXiv:1602.07360. Retrieved from http:\/\/arxiv.org\/abs\/1602.07360.","key":"e_1_3_1_14_2"},{"unstructured":"Forrest N. Iandola Matthew W. Moskewicz Sergey Karayev Ross B. Girshick Trevor Darrell and Kurt Keutzer. 2014. DenseNet: Implementing efficient convnet descriptor pyramids. arXiv:1404.1869. 
Retrieved from http:\/\/arxiv.org\/abs\/1404.1869.","key":"e_1_3_1_15_2"},{"doi-asserted-by":"publisher","key":"e_1_3_1_16_2","DOI":"10.3390\/s19204357"},{"key":"e_1_3_1_17_2","volume-title":"Proceedings of the 3rd International Conference on Learning Representations (ICLR\u201915)","author":"Kingma Diederik P.","year":"2015","unstructured":"Diederik P. Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In Proceedings of the 3rd International Conference on Learning Representations (ICLR\u201915), Yoshua Bengio and Yann LeCun (Eds.)."},{"doi-asserted-by":"publisher","key":"e_1_3_1_18_2","DOI":"10.1109\/AICAS51828.2021.9458468"},{"doi-asserted-by":"publisher","key":"e_1_3_1_19_2","DOI":"10.1109\/ACC.2015.7171097"},{"doi-asserted-by":"publisher","key":"e_1_3_1_20_2","DOI":"10.1007\/978-3-319-46448-0_2"},{"doi-asserted-by":"publisher","key":"e_1_3_1_21_2","DOI":"10.1109\/ICCV.2017.541"},{"key":"e_1_3_1_22_2","series-title":"Proceedings of the 36th International Conference on Machine Learning (ICML\u201919),","first-page":"4486","volume":"97","author":"Meller Eldad","year":"2019","unstructured":"Eldad Meller, Alexander Finkelstein, Uri Almog, and Mark Grobman. 2019. Same, same but different: recovering neural network quantization error through weight factorization. In Proceedings of the 36th International Conference on Machine Learning (ICML\u201919), Proceedings of Machine Learning Research, Vol. 97, Kamalika Chaudhuri and Ruslan Salakhutdinov (Eds.). PMLR, 4486\u20134495."},{"doi-asserted-by":"publisher","key":"e_1_3_1_23_2","DOI":"10.1109\/CODESISSS.2015.7331375"},{"doi-asserted-by":"publisher","key":"e_1_3_1_24_2","DOI":"10.1007\/978-3-319-46493-0_32"},{"key":"e_1_3_1_25_2","series-title":"Proceedings of the 36th International Conference on Machine Learning (ICML\u201919),","first-page":"5389","volume":"97","author":"Recht Benjamin","year":"2019","unstructured":"Benjamin Recht, Rebecca Roelofs, Ludwig Schmidt, and Vaishaal Shankar. 2019. 
Do ImageNet classifiers generalize to ImageNet? In Proceedings of the 36th International Conference on Machine Learning (ICML\u201919), Proceedings of Machine Learning Research, Vol. 97, Kamalika Chaudhuri and Ruslan Salakhutdinov (Eds.). PMLR, 5389\u20135400."},{"unstructured":"Joseph Redmon and Ali Farhadi. 2018. YOLOv3: An incremental improvement. arXiv:1804.02767. Retrieved from http:\/\/arxiv.org\/abs\/1804.02767.","key":"e_1_3_1_26_2"},{"doi-asserted-by":"publisher","key":"e_1_3_1_27_2","DOI":"10.1007\/s11263-015-0816-y"},{"doi-asserted-by":"publisher","key":"e_1_3_1_28_2","DOI":"10.1109\/CVPR.2018.00474"},{"doi-asserted-by":"publisher","key":"e_1_3_1_29_2","DOI":"10.5555\/2627435.2670313"},{"doi-asserted-by":"publisher","key":"e_1_3_1_30_2","DOI":"10.1109\/CVPR.2016.308"},{"key":"e_1_3_1_31_2","volume-title":"Proceedings of the 9th International Conference on Learning Representations (ICLR\u201921)","author":"Tailor Shyam Anil","year":"2021","unstructured":"Shyam Anil Tailor, Javier Fern\u00e1ndez-Marqu\u00e9s, and Nicholas Donald Lane. 2021. Degree-Quant: Quantization-Aware training for graph neural networks. In Proceedings of the 9th International Conference on Learning Representations (ICLR\u201921)."},{"key":"e_1_3_1_32_2","series-title":"Proceedings of the 36th International Conference on Machine Learning (ICML\u201919)","first-page":"6105","volume":"97","author":"Tan Mingxing","year":"2019","unstructured":"Mingxing Tan and Quoc V. Le. 2019. EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the 36th International Conference on Machine Learning (ICML\u201919), Proceedings of Machine Learning Research, Vol. 97, Kamalika Chaudhuri and Ruslan Salakhutdinov (Eds.). 
PMLR, 6105\u20136114."},{"doi-asserted-by":"publisher","key":"e_1_3_1_33_2","DOI":"10.1145\/3211332.3211336"},{"doi-asserted-by":"publisher","key":"e_1_3_1_34_2","DOI":"10.1109\/ICPR.2016.7900006"},{"key":"e_1_3_1_35_2","first-page":"1967","volume-title":"Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems (NeurIPS\u201918)","author":"Wang Jun","year":"2018","unstructured":"Jun Wang, Tanner A. Bohn, and Charles X. Ling. 2018. Pelee: A real-time object detection system on mobile devices. In Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems (NeurIPS\u201918), Samy Bengio, Hanna M. Wallach, Hugo Larochelle, Kristen Grauman, Nicol\u00f2 Cesa-Bianchi, and Roman Garnett (Eds.). 1967\u20131976."},{"unstructured":"Ran Xu Jinkyu Koo Rakesh Kumar Peter Bai Subrata Mitra Ganga Maghanath and Saurabh Bagchi. 2019. ApproxNet: Content and Contention-Aware Video Analytics System for Embedded Clients. arXiv:1909.02068. Retrieved from http:\/\/arxiv.org\/abs\/1909.02068.","key":"e_1_3_1_36_2"},{"doi-asserted-by":"publisher","key":"e_1_3_1_37_2","DOI":"10.3390\/s21062140"},{"doi-asserted-by":"publisher","key":"e_1_3_1_38_2","DOI":"10.1007\/978-3-030-58583-9_43"},{"unstructured":"Jiahui Yu and Thomas S. Huang. 2019. AutoSlim: Towards One-Shot Architecture Search for Channel Numbers. arXiv:1903.11728. Retrieved from http:\/\/arxiv.org\/abs\/1903.11728.","key":"e_1_3_1_39_2"},{"doi-asserted-by":"publisher","key":"e_1_3_1_40_2","DOI":"10.1109\/ICCV.2019.00189"},{"key":"e_1_3_1_41_2","volume-title":"Proceedings of the 7th International Conference on Learning Representations (ICLR\u201919)","author":"Yu Jiahui","year":"2019","unstructured":"Jiahui Yu, Linjie Yang, Ning Xu, Jianchao Yang, and Thomas S. Huang. 2019. Slimmable neural networks. 
In Proceedings of the 7th International Conference on Learning Representations (ICLR\u201919)."},{"doi-asserted-by":"publisher","key":"e_1_3_1_42_2","DOI":"10.1007\/978-3-030-58536-5_11"},{"key":"e_1_3_1_43_2","volume-title":"Proceedings of the 12th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud\u201920)","author":"Zhang Jeff","year":"2020","unstructured":"Jeff Zhang, Sameh Elnikety, Shuayb Zarar, Atul Gupta, and Siddharth Garg. 2020. Model-Switching: Dealing with fluctuating workloads in machine-learning-as-a-service systems. In Proceedings of the 12th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud\u201920), Amar Phanishayee and Ryan Stutsman (Eds.). USENIX Association."},{"doi-asserted-by":"publisher","key":"e_1_3_1_44_2","DOI":"10.1109\/CVPR.2018.00716"},{"key":"e_1_3_1_45_2","volume-title":"Proceedings of the 5th International Conference on Learning Representations (ICLR\u201917)","author":"Zhu Chenzhuo","year":"2017","unstructured":"Chenzhuo Zhu, Song Han, Huizi Mao, and William J. Dally. 2017. Trained ternary quantization. 
In Proceedings of the 5th International Conference on Learning Representations (ICLR\u201917)."},{"doi-asserted-by":"publisher","key":"e_1_3_1_46_2","DOI":"10.1109\/CVPR.2018.00907"}],"container-title":["ACM Transactions on Embedded Computing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3520134","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3520134","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3520134","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T18:10:32Z","timestamp":1750183832000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3520134"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,9,30]]},"references-count":45,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2022,9,30]]}},"alternative-id":["10.1145\/3520134"],"URL":"https:\/\/doi.org\/10.1145\/3520134","relation":{},"ISSN":["1539-9087","1558-3465"],"issn-type":[{"type":"print","value":"1539-9087"},{"type":"electronic","value":"1558-3465"}],"subject":[],"published":{"date-parts":[[2022,9,30]]},"assertion":[{"value":"2021-07-30","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-02-19","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-10-08","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}