{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,5]],"date-time":"2026-02-05T02:11:42Z","timestamp":1770257502408,"version":"3.49.0"},"reference-count":147,"publisher":"Association for Computing Machinery (ACM)","issue":"8","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Comput. Surv."],"published-print":{"date-parts":[[2026,6,30]]},"abstract":"<jats:p>With rapid advancements in artificial intelligence and Internet of Things technologies, the deployment of deep neural network (DNN) models on the edge nodes and the end nodes has become an essential trend. However, the limited computational power, storage capacity, and resource constraints of these devices present significant challenges for deep learning inference. Traditional acceleration methods, such as model compression and hardware optimization, often struggle to balance real-time performance, accuracy, and cost-effectiveness. To address these challenges, collaborative inference through DNN partitioning has emerged as a promising solution. This article provides a comprehensive overview of architectural frameworks for DNN partitioning in collaborative inference. We establish a unified mathematical framework to describe various architectures, DNN models, and their associated optimization problems. In addition, we systematically classify and analyze existing partitioning strategies based on partition count and granularity. Furthermore, we summarize commonly used experimental setups and tools, offering practical insight into implementation. Finally, we discuss key challenges and open issues in DNN partitioning for collaborative inference, such as ensuring data security and privacy and efficiently partitioning large-scale models, providing valuable guidance for future research.<\/jats:p>","DOI":"10.1145\/3786145","type":"journal-article","created":{"date-parts":[[2025,12,30]],"date-time":"2025-12-30T21:24:55Z","timestamp":1767129895000},"page":"1-34","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["DNN Partitioning for Cooperative Inference in Edge Intelligence: Modeling, Solutions, Toolchains"],"prefix":"10.1145","volume":"58","author":[{"ORCID":"https:\/\/orcid.org\/0009-0003-3835-3434","authenticated-orcid":false,"given":"Yuntao","family":"Hao","sequence":"first","affiliation":[{"name":"Dalian University of Technology School of Computer Science and Technology","place":["Dalian, China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9958-8224","authenticated-orcid":false,"given":"Nan","family":"Ding","sequence":"additional","affiliation":[{"name":"Dalian University of Technology School of Computer Science and Technology","place":["Dalian, China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9970-7684","authenticated-orcid":false,"given":"Weiguo","family":"Xia","sequence":"additional","affiliation":[{"name":"Dalian University of Technology School of Control Science and Engineering","place":["Dalian, China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8937-1515","authenticated-orcid":false,"given":"Hongwei","family":"Ge","sequence":"additional","affiliation":[{"name":"Dalian University of Technology School of Computer Science and Technology","place":["Dalian, China"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-6710-8032","authenticated-orcid":false,"given":"Li","family":"Xu","sequence":"additional","affiliation":[{"name":"Beijing Bytedance Technology Co Ltd","place":["Beijing, China"]}]}],"member":"320","published-online":{"date-parts":[[2026,2,4]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1155\/2021\/8812542"},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.4236\/jdaip.2021.92006"},{"key":"e_1_3_1_4_2","first-page":"1","article-title":"Design possibilities and challenges of DNN models: A review on the perspective of end devices","author":"Hussain Hanan","year":"2022","unstructured":"Hanan Hussain, P. S. Tamizharasan, and C. S. Rahul. 2022. Design possibilities and challenges of DNN models: A review on the perspective of end devices. Artificial Intelligence Review (2022), 1\u201359.","journal-title":"Artificial Intelligence Review"},{"key":"e_1_3_1_5_2","article-title":"Clothes-changing person re-identification via universal framework with association and forgetting learning","author":"Liu Yuxuan","year":"2023","unstructured":"Yuxuan Liu, Hongwei Ge, Zhen Wang, Yaqing Hou, and Mingde Zhao. 2023. Clothes-changing person re-identification via universal framework with association and forgetting learning. IEEE Transactions on Multimedia (2023).","journal-title":"IEEE Transactions on Multimedia"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1162\/dint_a_00213"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2020.2983149"},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.artmed.2020.101964"},{"key":"e_1_3_1_9_2","article-title":"Imagenet classification with deep convolutional neural networks","volume":"25","author":"Krizhevsky Alex","year":"2012","unstructured":"Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. Imagenet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems 25 (2012), 1097\u20131105.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_10_2","unstructured":"Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556. Retrieved from https:\/\/arxiv.org\/abs\/1409.1556"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.fmre.2021.11.011"},{"key":"e_1_3_1_12_2","unstructured":"Ziheng Jiang Tianqi Chen and Mu Li. 2018. Efficient deep learning inference on edge devices. In OSDI 2016 Conference Paper (2018)."},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2020.2983149"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICFIRTP56122.2022.10059434"},{"key":"e_1_3_1_15_2","article-title":"Deep learning with limited numerical precision","author":"Gupta Suyog","year":"2015","unstructured":"Suyog Gupta, Ankur Agrawal, Kailash Gopalakrishnan, and Pritish Narayanan. 2015. Deep learning with limited numerical precision. In Proceedings of the International Conference on Machine Learning (ICML).","journal-title":"Proceedings of the International Conference on Machine Learning (ICML)"},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00286"},{"key":"e_1_3_1_17_2","unstructured":"Markus Nagel Raoul Amjad Mart van Baalen Tijmen Blankevoort and Max Welling. 2021. A white paper on neural network quantization. arXiv:2106.08295. Retrieved from https:\/\/arxiv.org\/abs\/2106.08295"},{"key":"e_1_3_1_18_2","article-title":"Learning both weights and connections for efficient neural networks","author":"Han Song","year":"2015","unstructured":"Song Han, Jeff Pool, John Tran, and William Dally. 2015. Learning both weights and connections for efficient neural networks. In Proceedings of the International Conference on Neural Information Processing Systems (NeurIPS).","journal-title":"Proceedings of the International Conference on Neural Information Processing Systems (NeurIPS)"},{"key":"e_1_3_1_19_2","volume-title":"Proceedings of the International Conference on Learning Representations (ICLR)","author":"Li Hao","year":"2017","unstructured":"Hao Li, Asim Kadav, Igor Durdanovic, Hanan Samet, and Hans Peter Graf. 2017. Pruning filters for efficient convnets. In Proceedings of the International Conference on Learning Representations (ICLR)."},{"key":"e_1_3_1_20_2","article-title":"Learning sparse neural networks through  \\(L_0\\)  regularization","author":"Louizos Christos","year":"2018","unstructured":"Christos Louizos, Max Welling, and Diederik P. Kingma. 2018. Learning sparse neural networks through \\(L_0\\) regularization. In Proceedings of the International Conference on Learning Representations (ICLR).","journal-title":"Proceedings of the International Conference on Learning Representations (ICLR)"},{"key":"e_1_3_1_21_2","unstructured":"Geoffrey Hinton Oriol Vinyals and Jeff Dean. 2015. Distilling the knowledge in a neural network. arXiv:1503.02531. Retrieved from https:\/\/arxiv.org\/abs\/1503.02531"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-021-01453-z"},{"key":"e_1_3_1_23_2","volume-title":"Proceedings of the International Conference on Learning Representations (ICLR)","author":"Romero Adriana","year":"2015","unstructured":"Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, Carlo Gatta, and Yoshua Bengio. 2015. Fitnets: Hints for thin deep nets. In Proceedings of the International Conference on Learning Representations (ICLR)."},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICPR.2016.7900006"},{"key":"e_1_3_1_25_2","article-title":"Should I leave this layer? A theoretical and practical perspective on early-exit strategies for deep neural networks","author":"Scardapane Simone","year":"2020","unstructured":"Simone Scardapane, Dian Wang, Yao Wang, and Aurelio Uncini. 2020. Should I leave this layer? A theoretical and practical perspective on early-exit strategies for deep neural networks. IEEE Transactions on Neural Networks and Learning Systems (2020).","journal-title":"IEEE Transactions on Neural Networks and Learning Systems"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1145\/1365490.1365500"},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1145\/3079856.3080246"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1145\/2684746.2689060"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1145\/3186332"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1145\/2541940.2541967"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1145\/3007787.3001163"},{"key":"e_1_3_1_32_2","doi-asserted-by":"crossref","unstructured":"Ranjan Sapkota Rizwan Qureshi Marco Flores Calero Muhammad Hussain Chetan Badjugar Upesh Nepal Alwin Poulose Peter Zeno Uday Bhanu Prakash Vaddevolu Hong Yan et\u00a0al. 2024. Yolov10 to its genesis: A decadal and comprehensive review of the you only look once series. arXiv:2406.19407. Retrieved from https:\/\/arxiv.org\/abs\/2406.19407","DOI":"10.20944\/preprints202406.1366.v1"},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2023.3244497"},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/JIOT.2023.3280746"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/COMST.2020.2970550"},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2019.2918951"},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.3390\/s23041911"},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10462-022-10141-4"},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2022.3226481"},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/COMST.2022.3200740"},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11633-022-1391-7"},{"key":"e_1_3_1_42_2","unstructured":"Di Xu Xiang He Tonghua Su and Zhongjie Wang. 2023. A survey on deep neural network partition over cloud edge and end devices. arXiv:2304.10020. Retrieved from https:\/\/arxiv.org\/abs\/2304.10020"},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2019.2921977"},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.1109\/COMST.2022.3218527"},{"key":"e_1_3_1_45_2","unstructured":"Federico Nicol\u00e1s Peccia and Oliver Bringmann. 2024. Embedded distributed inference of deep neural networks: A systematic review. arXiv:2405.03360. Retrieved from https:\/\/arxiv.org\/abs\/2405.03360"},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/COMST.2024.3393230"},{"key":"e_1_3_1_47_2","article-title":"A survey on deep learning in edge-cloud collaboration: Model partitioning, privacy preservation, and prospects","author":"Zhang Xichen","year":"2025","unstructured":"Xichen Zhang, Roozbeh Razavi-Far, Haruna Isah, Amir David, Griffin Higgins, and Michael Zhang. 2025. A survey on deep learning in edge-cloud collaboration: Model partitioning, privacy preservation, and prospects. Knowledge-Based Systems 310, Article 112965 (2025).","journal-title":"Knowledge-Based Systems"},{"key":"e_1_3_1_48_2","article-title":"DNN partitioning, task offloading, and resource allocation in dynamic vehicular networks: A Lyapunov-guided diffusion-based reinforcement learning approach","author":"Liu Zhang","year":"2024","unstructured":"Zhang Liu, Hongyang Du, Junzhe Lin, Zhibin Gao, Lianfen Huang, Seyyedali Hosseinalipour, and Dusit Niyato. 2024. DNN partitioning, task offloading, and resource allocation in dynamic vehicular networks: A Lyapunov-guided diffusion-based reinforcement learning approach. IEEE Transactions on Mobile Computing (2024).","journal-title":"IEEE Transactions on Mobile Computing"},{"key":"e_1_3_1_49_2","doi-asserted-by":"publisher","DOI":"10.1109\/DASC\/PiCom\/DataCom\/CyberSciTec.2018.000-4"},{"key":"e_1_3_1_50_2","unstructured":"Erxue Min Runfa Chen Yatao Bian Tingyang Xu Kangfei Zhao Wenbing Huang Peilin Zhao Junzhou Huang Sophia Ananiadou and Yu Rong. 2022. Transformer for graphs: An overview from architecture perspective. arXiv:2202.08455. Retrieved from https:\/\/arxiv.org\/abs\/2202.08455"},{"key":"e_1_3_1_51_2","doi-asserted-by":"publisher","DOI":"10.1109\/DSD57027.2022.00048"},{"key":"e_1_3_1_52_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2024.naacl-industry.1"},{"key":"e_1_3_1_53_2","article-title":"Edgeshard: Efficient LLM inference via collaborative edge computing","author":"Zhang Mingjin","year":"2024","unstructured":"Mingjin Zhang, Xiaoming Shen, Jiannong Cao, Zeyang Cui, and Shan Jiang. 2024. Edgeshard: Efficient LLM inference via collaborative edge computing. IEEE Internet of Things Journal (2024).","journal-title":"IEEE Internet of Things Journal"},{"key":"e_1_3_1_54_2","unstructured":"Yuxuan Chen Rongpeng Li Xiaoxue Yu Zhifeng Zhao and Honggang Zhang. 2024. Adaptive layer splitting for wireless LLM inference in edge computing: A model-based reinforcement learning approach. arXiv:2406.02616. Retrieved from https:\/\/arxiv.org\/abs\/2406.02616"},{"key":"e_1_3_1_55_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_1_56_2","doi-asserted-by":"crossref","unstructured":"C. Szegedy W. Liu Y. Jia P. Sermanet S. Reed D. Anguelov D. Erhan V. Vanhoucke and A. Rabinovich. 2015. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 1\u20139.","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"e_1_3_1_57_2","unstructured":"Andrew G. Howard. 2017. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv:1704.04861. Retrieved from https:\/\/arxiv.org\/abs\/1704.04861"},{"key":"e_1_3_1_58_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNET.2020.3042320"},{"key":"e_1_3_1_59_2","doi-asserted-by":"publisher","DOI":"10.1145\/3650200.3656628"},{"key":"e_1_3_1_60_2","doi-asserted-by":"publisher","DOI":"10.1145\/3093337.3037698"},{"key":"e_1_3_1_61_2","doi-asserted-by":"publisher","DOI":"10.1109\/TWC.2019.2946140"},{"key":"e_1_3_1_62_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jnca.2023.103679"},{"key":"e_1_3_1_63_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCCS52626.2021.9449178"},{"key":"e_1_3_1_64_2","doi-asserted-by":"publisher","DOI":"10.1109\/INFOCOM.2019.8737614"},{"key":"e_1_3_1_65_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMC.2019.2947893"},{"key":"e_1_3_1_66_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMC.2021.3125949"},{"key":"e_1_3_1_67_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDCS54860.2022.00053"},{"key":"e_1_3_1_68_2","doi-asserted-by":"publisher","DOI":"10.1109\/IJCNN55064.2022.9892582"},{"key":"e_1_3_1_69_2","doi-asserted-by":"publisher","DOI":"10.3390\/electronics12173598"},{"key":"e_1_3_1_70_2","doi-asserted-by":"publisher","DOI":"10.1109\/JIOT.2020.3010258"},{"issue":"9","key":"e_1_3_1_71_2","first-page":"5389","article-title":"Multi-exit DNN inference acceleration based on multi-dimensional optimization for edge intelligence","volume":"22","author":"Dong Fang","year":"2022","unstructured":"Fang Dong, Huitian Wang, Dian Shen, Zhaowu Huang, Qiang He, Jinghui Zhang, Liangsheng Wen, and Tingting Zhang. 2022. Multi-exit DNN inference acceleration based on multi-dimensional optimization for edge intelligence. IEEE Transactions on Mobile Computing 22, 9 (2022), 5389\u20135405.","journal-title":"IEEE Transactions on Mobile Computing"},{"key":"e_1_3_1_72_2","doi-asserted-by":"publisher","DOI":"10.1145\/3634704"},{"key":"e_1_3_1_73_2","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2024.3354033"},{"key":"e_1_3_1_74_2","doi-asserted-by":"publisher","DOI":"10.23919\/APNOMS52696.2021.9562657"},{"key":"e_1_3_1_75_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNSM.2022.3220521"},{"key":"e_1_3_1_76_2","unstructured":"Xiang Yang Dezhi Chen Qi Qi Jingyu Wang Haifeng Sun Jianxin Liao and Song Guo. 2023. Adaptive DNN surgery for selfish inference acceleration with on-demand edge resource. arXiv:2306.12185. Retrieved from https:\/\/arxiv.org\/abs\/2306.12185"},{"key":"e_1_3_1_77_2","doi-asserted-by":"publisher","DOI":"10.1109\/CSCWD57460.2023.10152842"},{"key":"e_1_3_1_78_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMC.2023.3276937"},{"key":"e_1_3_1_79_2","doi-asserted-by":"publisher","DOI":"10.1109\/JIOT.2023.3279271"},{"key":"e_1_3_1_80_2","doi-asserted-by":"publisher","DOI":"10.1155\/2022\/8804530"},{"key":"e_1_3_1_81_2","doi-asserted-by":"publisher","DOI":"10.1186\/s13677-023-00493-9"},{"key":"e_1_3_1_82_2","doi-asserted-by":"publisher","DOI":"10.1109\/TVT.2022.3219058"},{"key":"e_1_3_1_83_2","article-title":"Adaptive device-edge collaboration on DNN inference in AIoT: A digital twin-assisted approach","author":"Hu Shisheng","year":"2023","unstructured":"Shisheng Hu, Mushu Li, Jie Gao, Conghao Zhou, and Xuemin Sherman Shen. 2023. Adaptive device-edge collaboration on DNN inference in AIoT: A digital twin-assisted approach. IEEE Internet of Things Journal (2023).","journal-title":"IEEE Internet of Things Journal"},{"key":"e_1_3_1_84_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.comnet.2023.109801"},{"key":"e_1_3_1_85_2","doi-asserted-by":"publisher","DOI":"10.1109\/INFOCOM41043.2020.9155237"},{"key":"e_1_3_1_86_2","doi-asserted-by":"publisher","DOI":"10.3390\/s21010229"},{"key":"e_1_3_1_87_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICPADS60453.2023.00150"},{"key":"e_1_3_1_88_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNET.2023.3279512"},{"key":"e_1_3_1_89_2","doi-asserted-by":"publisher","DOI":"10.1145\/3630098"},{"key":"e_1_3_1_90_2","doi-asserted-by":"publisher","DOI":"10.1109\/ITSC57777.2023.10422172"},{"key":"e_1_3_1_91_2","doi-asserted-by":"publisher","DOI":"10.1109\/JIOT.2020.2981338"},{"key":"e_1_3_1_92_2","doi-asserted-by":"publisher","DOI":"10.1109\/JIOT.2023.3235993"},{"key":"e_1_3_1_93_2","doi-asserted-by":"publisher","DOI":"10.1109\/JIOT.2023.3237361"},{"key":"e_1_3_1_94_2","doi-asserted-by":"publisher","DOI":"10.1109\/Ucom59132.2023.10257649"},{"key":"e_1_3_1_95_2","doi-asserted-by":"publisher","DOI":"10.1109\/TITS.2023.3266795"},{"key":"e_1_3_1_96_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICPADS51040.2020.00097"},{"key":"e_1_3_1_97_2","doi-asserted-by":"publisher","DOI":"10.1109\/IWCMC55113.2022.9824945"},{"key":"e_1_3_1_98_2","doi-asserted-by":"publisher","DOI":"10.1109\/GLOBECOM46510.2021.9685971"},{"key":"e_1_3_1_99_2","article-title":"PDD: Partitioning DAG-topology DNNs for streaming tasks","author":"Wu Liantao","year":"2023","unstructured":"Liantao Wu, Guoliang Gao, Jing Yu, Fangtong Zhou, Yang Yang, and Tengfei Wang. 2023. PDD: Partitioning DAG-topology DNNs for streaming tasks. IEEE Internet of Things Journal (2023).","journal-title":"IEEE Internet of Things Journal"},{"key":"e_1_3_1_100_2","doi-asserted-by":"publisher","DOI":"10.3390\/app122010619"},{"key":"e_1_3_1_101_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v37i6.25815"},{"key":"e_1_3_1_102_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICPADS60453.2023.00330"},{"key":"e_1_3_1_103_2","article-title":"Distributed computation of DNN via DRL with spatio-temporal state embedding","author":"Kim Suhwan","year":"2023","unstructured":"Suhwan Kim, Sehun Jung, and Hyang-Won Lee. 2023. Distributed computation of DNN via DRL with spatio-temporal state embedding. IEEE Internet of Things Journal (2023).","journal-title":"IEEE Internet of Things Journal"},{"key":"e_1_3_1_104_2","doi-asserted-by":"publisher","DOI":"10.1109\/JIOT.2023.3237572"},{"key":"e_1_3_1_105_2","article-title":"Joint optimization of device placement and model partitioning for cooperative DNN inference in heterogeneous edge computing","author":"Dai Penglin","year":"2024","unstructured":"Penglin Dai, Biao Han, Ke Li, Xincao Xu, Huanlai Xing, and Kai Liu. 2024. Joint optimization of device placement and model partitioning for cooperative DNN inference in heterogeneous edge computing. IEEE Transactions on Mobile Computing (2024).","journal-title":"IEEE Transactions on Mobile Computing"},{"key":"e_1_3_1_106_2","doi-asserted-by":"publisher","DOI":"10.1109\/PADSW.2018.8645013"},{"key":"e_1_3_1_107_2","article-title":"Learning-based edge-device collaborative DNN inference in IoVT networks","author":"Xu Xiaodong","year":"2023","unstructured":"Xiaodong Xu, Kaiwen Yan, Shujun Han, Bizhu Wang, Xiaofeng Tao, and Ping Zhang. 2023. Learning-based edge-device collaborative DNN inference in IoVT networks. IEEE Internet of Things Journal (2023).","journal-title":"IEEE Internet of Things Journal"},{"key":"e_1_3_1_108_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jpdc.2024.104850"},{"key":"e_1_3_1_109_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSC.2021.3116597"},{"key":"e_1_3_1_110_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2024.3409057"},{"key":"e_1_3_1_111_2","doi-asserted-by":"publisher","DOI":"10.1109\/TII.2022.3192882"},{"key":"e_1_3_1_112_2","doi-asserted-by":"publisher","DOI":"10.1109\/GLOBECOM42002.2020.9322591"},{"key":"e_1_3_1_113_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jnca.2023.103720"},{"key":"e_1_3_1_114_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNSM.2021.3116665"},{"key":"e_1_3_1_115_2","doi-asserted-by":"publisher","DOI":"10.1109\/GLOBECOM48099.2022.10000741"},{"key":"e_1_3_1_116_2","doi-asserted-by":"publisher","DOI":"10.1109\/TGCN.2021.3111731"},{"key":"e_1_3_1_117_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDCSW53096.2021.00010"},{"key":"e_1_3_1_118_2","doi-asserted-by":"publisher","DOI":"10.23919\/DATE.2017.7927211"},{"key":"e_1_3_1_119_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCAD.2017.8203852"},{"key":"e_1_3_1_120_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2018.2858384"},{"key":"e_1_3_1_121_2","doi-asserted-by":"publisher","DOI":"10.1145\/3318216.3363312"},{"key":"e_1_3_1_122_2","doi-asserted-by":"publisher","DOI":"10.1109\/INFOCOM48880.2022.9796896"},{"key":"e_1_3_1_123_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDCS60910.2024.00017"},{"key":"e_1_3_1_124_2","article-title":"Co-designing transformer architectures for distributed inference with low communication","author":"Du Jiangsu","year":"2024","unstructured":"Jiangsu Du, Yuanxin Wei, Shengyuan Ye, Jiazhi Jiang, Xu Chen, Dan Huang, and Yutong Lu. 2024. Co-designing transformer architectures for distributed inference with low communication. IEEE Transactions on Parallel and Distributed Systems (2024).","journal-title":"IEEE Transactions on Parallel and Distributed Systems"},{"key":"e_1_3_1_125_2","first-page":"606","article-title":"Efficiently scaling transformer inference","volume":"5","author":"Pope Reiner","year":"2023","unstructured":"Reiner Pope, Sholto Douglas, Aakanksha Chowdhery, Jacob Devlin, James Bradbury, Jonathan Heek, Kefan Xiao, Shivani Agrawal, and Jeff Dean. 2023. Efficiently scaling transformer inference. Proceedings of Machine Learning and Systems 5 (2023), 606\u2013624.","journal-title":"Proceedings of Machine Learning and Systems"},{"key":"e_1_3_1_126_2","unstructured":"Mohammad Shoeybi Mostofa Patwary Raul Puri Patrick LeGresley Jared Casper and Bryan Catanzaro. 2019. Megatron-lm: Training multi-billion parameter language models using model parallelism. arXiv:1909.08053. Retrieved from https:\/\/arxiv.org\/abs\/1909.08053"},{"key":"e_1_3_1_127_2","first-page":"4171","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 4171\u20134186."},{"key":"e_1_3_1_128_2","unstructured":"Alexey Dosovitskiy Lucas Beyer Alexander Kolesnikov Dirk Weissenborn Xiaohua Zhai Thomas Unterthiner Mostafa Dehghani Matthias Minderer Georg Heigold Sylvain Gelly et\u00a0al. 2020. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv:2010.11929. Retrieved from https:\/\/arxiv.org\/abs\/2010.11929"},{"key":"e_1_3_1_129_2","unstructured":"Hugo Touvron Louis Martin Kevin Stone Peter Albert Amjad Almahairi Yasmine Babaei Nikolay Bashlykov Soumya Batra Prajjwal Bhargava Shruti Bhosale et\u00a0al. 2023. Llama 2: Open foundation and fine-tuned chat models. arXiv:2307.09288. Retrieved from https:\/\/arxiv.org\/abs\/2307.09288"},{"key":"e_1_3_1_130_2","unstructured":"A. Paszke. 2019. Pytorch: An imperative style high-performance deep learning library. arXiv:1912.01703. Retrieved from https:\/\/arxiv.org\/abs\/1912.01703"},{"key":"e_1_3_1_131_2","article-title":"TensorFlow: Large-scale machine learning on heterogeneous systems","volume":"7","author":"Mart\u00edn Abadi","year":"2015","unstructured":"Abadi Mart\u00edn, Agarwal Ashish, Barham Paul, Brevdo Eugene, Chen Zhifeng, Citro Craig, S. Corrado Greg, Davis Andy, Dean Jeffrey, Devin Matthieu, et\u00a0al. 2015. TensorFlow: Large-scale machine learning on heterogeneous systems. Software Available from Tensorflow. org 7 (2015).","journal-title":"Software Available from Tensorflow. org"},{"key":"e_1_3_1_132_2","doi-asserted-by":"publisher","DOI":"10.1145\/2647868.2654889"},{"key":"e_1_3_1_133_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICPR.2016.7900006"},{"key":"e_1_3_1_134_2","first-page":"1","volume-title":"Proceedings of Workshop on Machine Learning Systems (LearningSys) in the 29th Annual Conference on Neural Information Processing Systems (NIPS)","volume":"5","author":"Tokui Seiya","year":"2015","unstructured":"Seiya Tokui, Kenta Oono, Shohei Hido, and Justin Clayton. 2015. Chainer: A next-generation open source framework for deep learning. In Proceedings of Workshop on Machine Learning Systems (LearningSys) in the 29th Annual Conference on Neural Information Processing Systems (NIPS), Vol. 5. 1\u20136."},{"key":"e_1_3_1_135_2","doi-asserted-by":"publisher","DOI":"10.1109\/IC2E52221.2021.00026"},{"key":"e_1_3_1_136_2","article-title":"Joint optimization of device placement and model partitioning for cooperative DNN inference in heterogeneous edge computing","author":"Dai Penglin","year":"2024","unstructured":"Penglin Dai, Biao Han, Ke Li, Xincao Xu, Huanlai Xing, and Kai Liu. 2024. Joint optimization of device placement and model partitioning for cooperative DNN inference in heterogeneous edge computing. IEEE Transactions on Mobile Computing (2024).","journal-title":"IEEE Transactions on Mobile Computing"},{"key":"e_1_3_1_137_2","doi-asserted-by":"publisher","DOI":"10.1109\/JIOT.2020.2987070"},{"key":"e_1_3_1_138_2","first-page":"767","volume-title":"Proceedings of the International Conference on Recent Trends in Machine Learning, IoT, Smart Cities and Applications: ICMISC 2020","author":"Ghallab Abdullatif","year":"2021","unstructured":"Abdullatif Ghallab, Mohammed H. Saif, and Abdulqader Mohsen. 2021. Data integrity and security in distributed cloud computing\u2013A review. In Proceedings of the International Conference on Recent Trends in Machine Learning, IoT, Smart Cities and Applications: ICMISC 2020. Springer, 767\u2013784."},{"key":"e_1_3_1_139_2","doi-asserted-by":"publisher","DOI":"10.1145\/2810103.2813677"},{"key":"e_1_3_1_140_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00175"},{"issue":"6","key":"e_1_3_1_141_2","first-page":"162","article-title":"MitM attacks on federated learning: Challenges and research opportunities","volume":"33","author":"Yu Lei","year":"2019","unstructured":"Lei Yu, Ximeng Liu, Yixian Yang, and Jianfeng Ma. 2019. MitM attacks on federated learning: Challenges and research opportunities. IEEE Network 33, 6 (2019), 162\u2013167.","journal-title":"IEEE Network"},{"key":"e_1_3_1_142_2","doi-asserted-by":"publisher","DOI":"10.1109\/MCSE.2017.3421554"},{"key":"e_1_3_1_143_2","doi-asserted-by":"publisher","DOI":"10.1145\/3490237"},{"key":"e_1_3_1_144_2","doi-asserted-by":"publisher","DOI":"10.1145\/3214303"},{"key":"e_1_3_1_145_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-981-15-2767-8_40"},{"key":"e_1_3_1_146_2","doi-asserted-by":"publisher","DOI":"10.1109\/ASP-DAC58780.2024.10473970"},{"key":"e_1_3_1_147_2","doi-asserted-by":"publisher","DOI":"10.1109\/EDGE55608.2022.00029"},{"key":"e_1_3_1_148_2","doi-asserted-by":"publisher","DOI":"10.1109\/CLOUD53861.2021.00017"}],"container-title":["ACM Computing Surveys"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3786145","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,2,4]],"date-time":"2026-02-04T12:18:11Z","timestamp":1770207491000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3786145"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,2,4]]},"references-count":147,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2026,6,30]]}},"alternative-id":["10.1145\/3786145"],"URL":"https:\/\/doi.org\/10.1145\/3786145","relation":{},"ISSN":["0360-0300","1557-7341"],"issn-type":[{"value":"0360-0300","type":"print"},{"value":"1557-7341","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,2,4]]},"assertion":[{"value":"2025-03-27","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-11-07","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2026-02-04","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}