{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,6]],"date-time":"2026-03-06T18:37:51Z","timestamp":1772822271638,"version":"3.50.1"},"reference-count":43,"publisher":"Association for Computing Machinery (ACM)","issue":"6","license":[{"start":{"date-parts":[[2024,9,11]],"date-time":"2024-09-11T00:00:00Z","timestamp":1726012800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Embed. Comput. Syst."],"published-print":{"date-parts":[[2024,11,30]]},"abstract":"<jats:p>Deep learning is a proven method in many applications. However, it requires high computation resources and usually has a constant architecture. Mobile systems are good candidates to benefit from deep learning applications since they are closely integrated in people\u2019s life. However, mobile systems experience varying conditions for the same reason. Constant deep learning architectures against varying resources cannot satisfy the requirements of the applications, so dynamic deep learning architectures are needed. In this work, we propose SLEXNet, a slimmable early exit neural network architecture. SLEXNet combines dynamic depth and width architectures to adapt to varying time and power conditions. Moreover, we propose a runtime scheduling algorithm that can estimate inference time and power consumption of SLEXNet variations on runtime. We train SLEXNet on real aerial drone images and implement the runtime on NVIDIA Jetson Orin. We show that our approach achieves significantly better responses to time and power requirements in varying conditions than baseline dynamic depth and width techniques in a wide range of experiments.<\/jats:p>","DOI":"10.1145\/3689632","type":"journal-article","created":{"date-parts":[[2024,8,24]],"date-time":"2024-08-24T10:00:39Z","timestamp":1724493639000},"page":"1-29","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["SLEXNet: Adaptive Inference Using Slimmable Early Exit Neural Networks"],"prefix":"10.1145","volume":"23","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3868-421X","authenticated-orcid":false,"given":"Basar","family":"Kutukcu","sequence":"first","affiliation":[{"name":"Electrical and Computer Engineering, University of California San Diego, La Jolla, United States"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0245-2903","authenticated-orcid":false,"given":"Sabur","family":"Baidya","sequence":"additional","affiliation":[{"name":"Computer Science and Engineering, University of Louisville, Louisville, United States"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9671-3950","authenticated-orcid":false,"given":"Sujit","family":"Dey","sequence":"additional","affiliation":[{"name":"Electrical and Computer Engineering, University of California San Diego, La Jolla, United States"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2024,9,11]]},"reference":[{"key":"e_1_3_1_2_2","unstructured":"Ron Banner Yury Nahshan and Daniel Soudry. 2019. Post training 4-bit quantization of convolutional networks for rapid-deployment. In Proceedings of the 33rd International Conference on Neural Information Processing Systems. 7950\u20137958. https:\/\/proceedings.neurips.cc\/paper\/2019\/hash\/c0a62e133894cdce435bcb4a5df1db2d-Abstract.html"},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","unstructured":"Ali Ehteshami Bejnordi and Ralf Krestel. 2020. Dynamic channel and layer gating in convolutional neural networks. In KI 2020: Advances in Artificial Intelligence. Lecture Notes in Computer Science Vol. 12325. Springer 33\u201345. 10.1007\/978-3-030-58285-2_3","DOI":"10.1007\/978-3-030-58285-2_3"},{"key":"e_1_3_1_4_2","unstructured":"Davis W. Blalock Jose Javier Gonzalez Ortiz Jonathan Frankle and John V. Guttag. 2020. What is the state of neural network pruning? In Proceedings of the Conference on Machine Learning and Systems (MLSys\u201920)."},{"key":"e_1_3_1_5_2","unstructured":"Tolga Bolukbasi Joseph Wang Ofer Dekel and Venkatesh Saligrama. 2017. Adaptive neural networks for efficient inference. In Proceedings of the 34th International Conference on Machine Learning. 527\u2013536. http:\/\/proceedings.mlr.press\/v70\/bolukbasi17a.html"},{"key":"e_1_3_1_6_2","volume-title":"Proceedings of the 8th International Conference on Learning Representations (ICLR\u201920)","author":"Cai Han","year":"2020","unstructured":"Han Cai, Chuang Gan, Tianzhe Wang, Zhekai Zhang, and Song Han. 2020. Once-for-all: Train one network and specialize it for efficient deployment. In Proceedings of the 8th International Conference on Learning Representations (ICLR\u201920). https:\/\/openreview.net\/forum?id=HylxE1HKwS"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.mlwa.2021.100134"},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00939"},{"key":"e_1_3_1_9_2","article-title":"BinaryNet: Training deep neural networks with weights and activations constrained to +1 or \u20131","volume":"1602","author":"Courbariaux Matthieu","year":"2016","unstructured":"Matthieu Courbariaux and Yoshua Bengio. 2016. BinaryNet: Training deep neural networks with weights and activations constrained to +1 or \u20131. CoRR abs\/1602.02830 (2016). http:\/\/arxiv.org\/abs\/1602.02830","journal-title":"CoRR"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","unstructured":"Rostand A. K. Fezeu Eman Ramadan Wei Ye Benjamin Minneci Jack Xie Arvind Narayanan Ahmad Hassan Feng Qian Zhi-Li Zhang Jaideep Chandrashekar and Myungjin Lee. 2023. An in-depth measurement analysis of 5G mmWave PHY latency and its impact on end-to-end delay. In Passive and Active Measurement. Lecture Notes in Computer Science Vol. 13882. Springer 284\u2013312. 10.1007\/978-3-031-28486-1_13","DOI":"10.1007\/978-3-031-28486-1_13"},{"key":"e_1_3_1_11_2","article-title":"A survey of quantization methods for efficient neural network inference","volume":"2103","author":"Gholami Amir","year":"2021","unstructured":"Amir Gholami, Sehoon Kim, Zhen Dong, Zhewei Yao, Michael W. Mahoney, and Kurt Keutzer. 2021. A survey of quantization methods for efficient neural network inference. CoRR abs\/2103.13630 (2021). https:\/\/arxiv.org\/abs\/2103.13630","journal-title":"CoRR"},{"key":"e_1_3_1_12_2","unstructured":"Song Han Jeff Pool John Tran and William J. Dally. 2015. Learning both weights and connections for efficient neural network. In Proceedings of the 28th International Conference on Neural Information Processing Systems (NIPS\u201915). 1135\u20131143. https:\/\/proceedings.neurips.cc\/paper\/2015\/hash\/ae0eb3eed39d2bcef4622b2499a05fe6-Abstract.html"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2021.3117837"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/COMST.2021.3097916"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","unstructured":"Kaiming He Xiangyu Zhang Shaoqing Ren and Jian Sun. 2016. Identity mappings in deep residual networks. In Computer Vision\u2014ECCV 2016. Lecture Notes in Computer Science Vol. 9908. Springer 630\u2013645. 10.1007\/978-3-319-46493-0_38","DOI":"10.1007\/978-3-319-46493-0_38"},{"key":"e_1_3_1_16_2","unstructured":"Gao Huang Danlu Chen Tianhong Li Felix Wu Laurens van der Maaten and Kilian Q. Weinberger. 2018. Multi-scale dense networks for resource efficient image classification. In Proceedings of the 6th International Conference on Learning Representations: Conference Track (ICLR\u201918). https:\/\/openreview.net\/forum?id=Hk2aImxAb"},{"key":"e_1_3_1_17_2","unstructured":"Alexander B. Jung Kentaro Wada Jon Crall Satoshi Tanaka Jake Graving Christoph Reinders Sarthak Yadav Joy Banerjee G\u00e1bor Vecsei Adam Kraft Zheng Rui Jirka Borovec Christian Vallentin Semen Zhydenko Kilian Pfeiffer Ben Cook Ismael Fern\u00e1ndez Fran\u00e7ois-Michel De Rainville Chi-Hung Weng Abner Ayala-Acevedo Raphael Meudec and Matias Laporte. 2020. Imgaug. Retrieved February 1 2020 from https:\/\/github.com\/aleju\/imgaug"},{"key":"e_1_3_1_18_2","unstructured":"Diederik P. Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In Proceedings of the 3rd International Conference on Learning Representations: Conference Track (ICLR\u201915). http:\/\/arxiv.org\/abs\/1412.6980"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1145\/3520134"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSTARS.2020.2969809"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00850"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICUAS54217.2022.9836119"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v32i1.11630"},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.541"},{"key":"e_1_3_1_25_2","unstructured":"Mason McGill and Pietro Perona. 2017. Deciding how to decide: Dynamic routing in artificial neural networks. In Proceedings of the 34th International Conference on Machine Learning (ICML\u201917). 2363\u20132372. http:\/\/proceedings.mlr.press\/v70\/mcgill17a.html"},{"key":"e_1_3_1_26_2","unstructured":"Eldad Meller Alexander Finkelstein Uri Almog and Mark Grobman. 2019. Same same but different: Recovering neural network quantization error through weight factorization. In Proceedings of the 36th International Conference on Machine Learning (ICML\u201919). 4486\u20134495. http:\/\/proceedings.mlr.press\/v97\/meller19a.html"},{"key":"e_1_3_1_27_2","unstructured":"Augustus Odena Dieterich Lawson and Christopher Olah. 2017. Changing model behavior at test-time using reinforcement learning. In Proceedings of the 5th International Conference on Learning Representations: Workshop Track (ICLR\u201917). https:\/\/openreview.net\/forum?id=Hk8-lkHKe"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/DCOSS52077.2021.00062"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/CODESISSS.2015.7331375"},{"key":"e_1_3_1_30_2","article-title":"End-to-end speech recognition: A survey","author":"Prabhavalkar Rohit","year":"2023","unstructured":"Rohit Prabhavalkar, Takaaki Hori, Tara N. Sainath, Ralf Schl\u00fcter, and Shinji Watanabe. 2023. End-to-end speech recognition: A survey. arXiv preprint arXiv:2303.03329 (2023).","journal-title":"arXiv preprint arXiv:2303.03329"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00474"},{"key":"e_1_3_1_32_2","doi-asserted-by":"crossref","first-page":"219","DOI":"10.1109\/SMARTCOMP58114.2023.00053","volume-title":"Proceedings of the 2023 IEEE International Conference on Smart Computing (SMARTCOMP\u201923)","author":"Spicer Elijah","year":"2023","unstructured":"Elijah Spicer and Sabur Baidya. 2023. Performance tradeoff in DNN-based coexisting applications in resource-constrained cyber-physical systems. In Proceedings of the 2023 IEEE International Conference on Smart Computing (SMARTCOMP\u201923). IEEE, 219\u2013221."},{"key":"e_1_3_1_33_2","volume-title":"Proceedings of the 9th International Conference on Learning Representations (ICLR\u201921)","author":"Tailor Shyam Anil","year":"2021","unstructured":"Shyam Anil Tailor, Javier Fern\u00e1ndez-Marqu\u00e9s, and Nicholas Donald Lane. 2021. Degree-Quant: Quantization-aware training for graph neural networks. In Proceedings of the 9th International Conference on Learning Representations (ICLR\u201921). https:\/\/openreview.net\/forum?id=NSBrFgJAHg"},{"key":"e_1_3_1_34_2","unstructured":"Mingxing Tan and Quoc V. Le. 2019. EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the 36th International Conference on Machine Learning (ICML\u201919). 6105\u20136114. http:\/\/proceedings.mlr.press\/v97\/tan19a.html"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1145\/3211332.3211336"},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICPR.2016.7900006"},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","unstructured":"Xin Wang Fisher Yu Zi-Yi Dou Trevor Darrell and Joseph E. Gonzalez. 2018. SkipNet: Learning dynamic routing in convolutional networks. In Computer Vision\u2014ECCV 2018. Lecture Notes in Computer Science Vol. 11217. Springer 420\u2013436. 10.1007\/978-3-030-01261-8_25","DOI":"10.1007\/978-3-030-01261-8_25"},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSTSP.2020.2979669"},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1109\/TETC.2021.3056031"},{"key":"e_1_3_1_40_2","article-title":"ApproxNet: Content and contention aware video analytics system for the edge","volume":"1909","author":"Xu Ran","year":"2019","unstructured":"Ran Xu, Jinkyu Koo, Rakesh Kumar, Peter Bai, Subrata Mitra, Ganga Maghanath, and Saurabh Bagchi. 2019. ApproxNet: Content and contention aware video analytics system for the edge. CoRR abs\/1909.02068 (2019). http:\/\/arxiv.org\/abs\/1909.02068","journal-title":"CoRR"},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00189"},{"key":"e_1_3_1_42_2","volume-title":"Proceedings of the 7th International Conference on Learning Representations (ICLR\u201919)","author":"Yu Jiahui","year":"2019","unstructured":"Jiahui Yu, Linjie Yang, Ning Xu, Jianchao Yang, and Thomas S. Huang. 2019. Slimmable neural networks. In Proceedings of the 7th International Conference on Learning Representations (ICLR\u201919). https:\/\/openreview.net\/forum?id=H1gMCsAqY7"},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","unstructured":"Zhihang Yuan Bingzhe Wu Guangyu Sun Zheng Liang Shiwan Zhao and Weichen Bi. 2020. S2DNAS: Transforming static CNN model for dynamic inference via neural architecture search. In Computer Vision\u2014ECCV 2020. Lecture Notes in Computer Science Vol. 12347. Springer 175\u2013192. 10.1007\/978-3-030-58536-5_11","DOI":"10.1007\/978-3-030-58536-5_11"},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2303.18223"}],"container-title":["ACM Transactions on Embedded Computing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3689632","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3689632","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:09:47Z","timestamp":1750295387000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3689632"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,9,11]]},"references-count":43,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2024,11,30]]}},"alternative-id":["10.1145\/3689632"],"URL":"https:\/\/doi.org\/10.1145\/3689632","relation":{},"ISSN":["1539-9087","1558-3465"],"issn-type":[{"value":"1539-9087","type":"print"},{"value":"1558-3465","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,9,11]]},"assertion":[{"value":"2023-11-14","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-08-13","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-09-11","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}