{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T02:10:05Z","timestamp":1750299005653,"version":"3.41.0"},"reference-count":54,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2025,5,21]],"date-time":"2025-05-21T00:00:00Z","timestamp":1747785600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100012166","name":"National Key R&D Program of China","doi-asserted-by":"crossref","award":["No. 2021ZD0110400"],"award-info":[{"award-number":["No. 2021ZD0110400"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Innovation Program for Quantum Science and Technology","award":["2021ZD0302900"],"award-info":[{"award-number":["2021ZD0302900"]}]},{"DOI":"10.13039\/501100001809","name":"China National Natural Science Foundation","doi-asserted-by":"crossref","award":["No. 62132018, 62231015, U23A20308, 623B2093"],"award-info":[{"award-number":["No. 62132018, 62231015, U23A20308, 623B2093"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"\u201cPioneer\u201d and \u201cLeading Goose\u201d R&D Program of Zhejiang","award":["2023C01029, and 2023C01143"],"award-info":[{"award-number":["2023C01029, and 2023C01143"]}]},{"name":"Fundamental Research Funds"},{"name":"Central Universities and the Plans for Major Provincial Science&Technology Projects","award":["No. 202303a07020006"],"award-info":[{"award-number":["No. 202303a07020006"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Sen. 
Netw."],"published-print":{"date-parts":[[2025,5,31]]},"abstract":"<jats:p>Deploying deep neural networks (DNNs) on IoT devices for model serving is a promising solution for intelligent applications with high real-time requirements and bandwidth sensitivity. To cope with the prohibitive computation and storage overheads of modern DNNs, great efforts have been devoted to the model compression technique. Most existing model compression approaches focus on minimizing the model size and maximizing the average accuracy on all the inference tasks. However, real-world IoT tasks have various service-level objectives (SLOs). Models compressed by existing methods struggle to simultaneously meet SLOs in multiple dimensions, such as latency and accuracy. In this work, we study model compression with a joint consideration of SLO awareness and task adaptation. Through our extensive experience with model compression across various IoT tasks, we observe that the importance of individual channels in contributing to accuracy is heavily influenced by task-specific data distribution. Therefore, we design a channel Shapley algorithm to estimate the importance of individual channels in DNNs and propose a deep reinforcement learning based controller to incorporate SLOs into the compression objective. Integrating these designs, we propose and prototype ChannelZip, the first SLO-aware channel compression framework. Extensive evaluations on real IoT model serving systems show the effectiveness in task adaptation of ChannelZip. ChannelZip outperforms strong model compression baselines by 3.77% accuracy and achieves a 69% average parameter compression ratio. 
Real-world deployment on different IoT devices shows that ChannelZip meets all task SLOs and achieves up to 2.32 \u00d7 inference speedup.<\/jats:p>","DOI":"10.1145\/3729534","type":"journal-article","created":{"date-parts":[[2025,4,16]],"date-time":"2025-04-16T11:00:11Z","timestamp":1744801211000},"page":"1-27","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["ChannelZip: SLO-Aware Channel Compression for Task-Adaptive Model Serving on IoT Devices"],"prefix":"10.1145","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0009-0007-6885-1185","authenticated-orcid":false,"given":"Puhan","family":"Luo","sequence":"first","affiliation":[{"name":"School of Computer Science and Technology, University of Science and Technology of China, Hefei, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3340-8585","authenticated-orcid":false,"given":"Jiahui","family":"Hou","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, University of Science and Technology of China, Hefei, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3133-1430","authenticated-orcid":false,"given":"Haisheng","family":"Tan","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, University of Science and Technology of China, Hefei, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2624-8755","authenticated-orcid":false,"given":"Mu","family":"Yuan","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, University of Science and Technology of China, Hefei, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3603-3886","authenticated-orcid":false,"given":"Guangyu","family":"Wu","sequence":"additional","affiliation":[{"name":"School 
of Computer Science and Technology, University of Science and Technology of China, Hefei, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8759-8874","authenticated-orcid":false,"given":"Kaiwen","family":"Guo","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, University of Science and Technology of China, Hefei, China and School of Computer Science and Technology, Ocean University of China, Qingdao, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0398-2631","authenticated-orcid":false,"given":"Zhiqiang","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, University of Science and Technology of China, Hefei, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6070-6625","authenticated-orcid":false,"given":"XiangYang","family":"Li","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, University of Science and Technology of China, Hefei, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,5,21]]},"reference":[{"key":"e_1_3_2_2_2","article-title":"Shapley value as principled metric for structured network pruning","author":"Ancona Marco","year":"2020","unstructured":"Marco Ancona, Cengiz \u00d6ztireli, and Markus Gross. 2020. Shapley value as principled metric for structured network pruning. 
arXiv preprint arXiv:2006.01795 (2020).","journal-title":"arXiv preprint arXiv:2006.01795"},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/MSP.2017.2765695"},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.195"},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2020.2976475"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.iot.2021.100461"},{"key":"e_1_3_2_7_2","first-page":"16091","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR \u201923)","author":"Fang Gongfan","year":"2023","unstructured":"Gongfan Fang, Xinyin Ma, Mingli Song, Michael Bi Mi, and Xinchao Wang. 2023. DepGraph: Towards any structural pruning. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR \u201923). 16091\u201316101."},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.artint.2008.05.003"},{"key":"e_1_3_2_9_2","first-page":"2242","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Ghorbani Amirata","year":"2019","unstructured":"Amirata Ghorbani and James Zou. 2019. Data Shapley: Equitable valuation of data for machine learning. In Proceedings of the International Conference on Machine Learning. 2242\u20132251."},{"key":"e_1_3_2_10_2","unstructured":"Amirata Ghorbani and James Y. Zou. 2020. Neuron Shapley: Discovering the responsible neurons. Advances in Neural Information Processing Systems 33 (2020) 5922\u20135932. https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2020\/file\/41c542dfe6e4fc3deb251d64cf6ed2e4-Paper.pdf"},{"key":"e_1_3_2_11_2","first-page":"5922","article-title":"Neuron Shapley: Discovering the responsible neurons","volume":"33","author":"Ghorbani Amirata","year":"2020","unstructured":"Amirata Ghorbani and James Y. Zou. 2020. Neuron Shapley: Discovering the responsible neurons. 
Advances in Neural Information Processing Systems 33 (2020), 5922\u20135932.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00165"},{"key":"e_1_3_2_13_2","article-title":"Learning both weights and connections for efficient neural network","volume":"28","author":"Han Song","year":"2015","unstructured":"Song Han, Jeff Pool, John Tran, and William Dally. 2015. Learning both weights and connections for efficient neural network. Advances in Neural Information Processing Systems 28 (2015), 1135\u20131143.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_14_2","article-title":"Second order derivatives for network pruning: Optimal brain surgeon","volume":"5","author":"Hassibi Babak","year":"1992","unstructured":"Babak Hassibi and David Stork. 1992. Second order derivatives for network pruning: Optimal brain surgeon. Advances in Neural Information Processing Systems 5 (1992), 164\u2013171.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01234-2_48"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00447"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.155"},{"key":"e_1_3_2_19_2","unstructured":"GitHub. 2023. MindSpore Lite. Retrieved April 18 2025 from https:\/\/github.com\/mindsporeai\/mindspore"},{"key":"e_1_3_2_20_2","article-title":"Network trimming: A data-driven neuron pruning approach towards efficient deep architectures","author":"Hu Hengyuan","year":"2016","unstructured":"Hengyuan Hu, Rui Peng, Yu-Wing Tai, and Chi-Keung Tang. 2016. Network trimming: A data-driven neuron pruning approach towards efficient deep architectures. 
arXiv preprint arXiv:1607.03250 (2016).","journal-title":"arXiv preprint arXiv:1607.03250"},{"key":"e_1_3_2_21_2","first-page":"1","article-title":"A review and state of art of Internet of Things (IoT)","author":"Laghari Asif Ali","year":"2021","unstructured":"Asif Ali Laghari, Kaishan Wu, Rashid Ali Laghari, Mureed Ali, and Abdullah Ayub Khan. 2021. A review and state of art of Internet of Things (IoT). In Archives of Computational Methods in Engineering. Springer, 1\u201319.","journal-title":"Archives of Computational Methods in Engineering."},{"key":"e_1_3_2_22_2","article-title":"Optimal brain damage","volume":"2","author":"LeCun Yann","year":"1989","unstructured":"Yann LeCun, John Denker, and Sara Solla. 1989. Optimal brain damage. Advances in Neural Information Processing Systems 2 (1989), 598\u2013605.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v32i1.11713"},{"key":"e_1_3_2_24_2","article-title":"Pruning filters for efficient ConvNets","author":"Li Hao","year":"2016","unstructured":"Hao Li, Asim Kadav, Igor Durdanovic, Hanan Samet, and Hans Peter Graf. 2016. Pruning filters for efficient ConvNets. arXiv preprint arXiv:1608.08710 (2016).","journal-title":"arXiv preprint arXiv:1608.08710"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2021.07.045"},{"key":"e_1_3_2_26_2","doi-asserted-by":"crossref","first-page":"283","DOI":"10.1145\/3447993.3483243","volume-title":"Proceedings of the 27th Annual International Conference on Mobile Computing and Networking","author":"Liao Zimo","year":"2021","unstructured":"Zimo Liao, Zhicheng Luo, Qianyi Huang, Linfeng Zhang, Fan Wu, Qian Zhang, and Yi Wang. 2021. SMART: Screen-based gesture recognition on commodity mobile devices. In Proceedings of the 27th Annual International Conference on Mobile Computing and Networking. 283\u2013295."},{"key":"e_1_3_2_27_2","unstructured":"Timothy P. 
Lillicrap Jonathan J. Hunt Alexander Pritzel Nicolas Heess Tom Erez Yuval Tassa David Silver and Daan Wierstra. 2019. Continuous control with deep reinforcement learning. arxiv:1509.02971[cs.LG] (2019)."},{"key":"e_1_3_2_28_2","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR \u201920)","author":"Lin Mingbao","year":"2020","unstructured":"Mingbao Lin, Rongrong Ji, Yan Wang, Yichen Zhang, Baochang Zhang, Yonghong Tian, and Ling Shao. 2020. HRank: Filter pruning using high-rank feature map. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR \u201920)."},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2022.3195774"},{"key":"e_1_3_2_30_2","doi-asserted-by":"crossref","unstructured":"Tsung-Yi Lin Michael Maire Serge Belongie James Hays Pietro Perona Deva Ramanan Piotr Doll\u00e1r and C. Lawrence Zitnick. 2014. Microsoft COCO: Common objects in context. In Computer Vision\u2014ECCV 2014. Lecture Notes in Computer Science Vol. 8693. Springer 740\u2013755.","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i04.5924"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","unstructured":"Sicong Liu Yingyan Lin Zimu Zhou Kaiming Nan Hui Liu and Junzhao Du. 2018. On-demand deep model compression for mobile devices: A usage-driven model selection framework. In Proceedings of the 16th Annual International Conference on Mobile Systems Applications and Services (MobiSys \u201918). ACM New York NY USA 389\u2013400. DOI:10.1145\/3210240.3210337","DOI":"10.1145\/3210240.3210337"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00339"},{"key":"e_1_3_2_34_2","article-title":"Rethinking the value of network pruning","author":"Liu Zhuang","year":"2018","unstructured":"Zhuang Liu, Mingjie Sun, Tinghui Zhou, Gao Huang, and Trevor Darrell. 2018. 
Rethinking the value of network pruning. arXiv preprint arXiv:1810.05270 (2018).","journal-title":"arXiv preprint arXiv:1810.05270"},{"key":"e_1_3_2_35_2","article-title":"Frequency-domain dynamic pruning for convolutional neural networks","volume":"31","author":"Liu Zhenhua","year":"2018","unstructured":"Zhenhua Liu, Jizheng Xu, Xiulian Peng, and Ruiqin Xiong. 2018. Frequency-domain dynamic pruning for convolutional neural networks. Advances in Neural Information Processing Systems 31 (2018), 1\u201311.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.541"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.1145\/3666025.3699319"},{"key":"e_1_3_2_38_2","doi-asserted-by":"crossref","first-page":"17","DOI":"10.1007\/978-3-030-57321-8_2","volume-title":"Proceedings of the International Cross-Domain Conference for Machine Learning and Knowledge Extraction","author":"Merrick Luke","year":"2020","unstructured":"Luke Merrick and Ankur Taly. 2020. The explanation game: Explaining machine learning models using Shapley values. In Proceedings of the International Cross-Domain Conference for Machine Learning and Knowledge Extraction. 17\u201338."},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1145\/3554980"},{"issue":"5","key":"e_1_3_2_40_2","doi-asserted-by":"crossref","first-page":"64","DOI":"10.1287\/mnsc.18.5.64","article-title":"Multilinear extensions of games","volume":"18","author":"Owen Guillermo","year":"1972","unstructured":"Guillermo Owen. 1972. Multilinear extensions of games. Management Science 18, 5 Pt. 2 (1972), 64\u201379.","journal-title":"Management Science"},{"key":"e_1_3_2_41_2","unstructured":"Adam Paszke Sam Gross Soumith Chintala Gregory Chanan Edward Yang Zachary DeVito Zeming Lin Alban Desmaison Luca Antiga and Adam Lerer. 2017. Automatic differentiation in PyTorch. 
In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS \u201917)."},{"key":"e_1_3_2_42_2","unstructured":"RangiLyu. 2021. NanoDet-Plus: Super Fast and High Accuracy Lightweight Anchor-Free Object Detection Model. Retrieved April 18 2025 from https:\/\/github.com\/RangiLyu\/nanodet"},{"key":"e_1_3_2_43_2","doi-asserted-by":"crossref","unstructured":"L. Shapley. 1953. Quota solutions of n-person games. In Contributions to the Theory of Games Volume II edited by Harold William Kuhn and Albert William Tucker. Princeton University Press 343\u2013360.","DOI":"10.1515\/9781400881970-021"},{"key":"e_1_3_2_44_2","unstructured":"Maying Shen Hongxu Yin Pavlo Molchanov Lei Mao Jianna Liu and Jose M. Alvarez. 2021. HALP: Hardware-aware latency pruning. arXiv:2110.10811 (2021). https:\/\/arxiv.org\/abs\/2110.10811"},{"key":"e_1_3_2_45_2","article-title":"Very deep convolutional networks for large-scale image recognition","author":"Simonyan Karen","year":"2014","unstructured":"Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014).","journal-title":"arXiv preprint arXiv:1409.1556"},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00293"},{"key":"e_1_3_2_47_2","first-page":"6105","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Tan Mingxing","year":"2019","unstructured":"Mingxing Tan and Quoc Le. 2019. EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning. 6105\u20136114."},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01079"},{"key":"e_1_3_2_49_2","first-page":"2597","volume-title":"Proceedings of the 2019 IEEE International Conference on Big Data (Big Data \u201919)","author":"Wang Guan","year":"2019","unstructured":"Guan Wang, Charlie Xiaoqian Dang, and Ziye Zhou. 
2019. Measure contribution of participants in federated learning. In Proceedings of the 2019 IEEE International Conference on Big Data (Big Data \u201919). IEEE, 2597\u20132604."},{"key":"e_1_3_2_50_2","volume-title":"Proceedings of the European Conference on Computer Vision (ECCV \u201918)","author":"Yang Tien-Ju","year":"2018","unstructured":"Tien-Ju Yang, Andrew Howard, Bo Chen, Xiao Zhang, Alec Go, Mark Sandler, Vivienne Sze, and Hartwig Adam. 2018. NetAdapt: Platform-aware neural network adaptation for mobile applications. In Proceedings of the European Conference on Computer Vision (ECCV \u201918)."},{"key":"e_1_3_2_51_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00958"},{"key":"e_1_3_2_52_2","doi-asserted-by":"crossref","first-page":"228","DOI":"10.1145\/3495243.3517016","volume-title":"Proceedings of the 28th Annual International Conference on Mobile Computing And Networking","author":"Yuan Mu","year":"2022","unstructured":"Mu Yuan, Lan Zhang, Fengxiang He, Xueting Tong, and Xiang-Yang Li. 2022. InFi: End-to-end learnable input filter for resource-efficient mobile-centric inference. In Proceedings of the 28th Annual International Conference on Mobile Computing And Networking. 
228\u2013241."},{"key":"e_1_3_2_53_2","doi-asserted-by":"publisher","DOI":"10.1145\/3603269.3604825"},{"key":"e_1_3_2_54_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.image.2018.03.017"},{"key":"e_1_3_2_55_2","doi-asserted-by":"publisher","DOI":"10.1145\/3460200"}],"container-title":["ACM Transactions on Sensor Networks"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3729534","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3729534","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:56:57Z","timestamp":1750298217000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3729534"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,5,21]]},"references-count":54,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2025,5,31]]}},"alternative-id":["10.1145\/3729534"],"URL":"https:\/\/doi.org\/10.1145\/3729534","relation":{},"ISSN":["1550-4859","1550-4867"],"issn-type":[{"type":"print","value":"1550-4859"},{"type":"electronic","value":"1550-4867"}],"subject":[],"published":{"date-parts":[[2025,5,21]]},"assertion":[{"value":"2024-04-15","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-04-04","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-05-21","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}