{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T01:09:05Z","timestamp":1760058545096,"version":"build-2065373602"},"reference-count":18,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2025,4,11]],"date-time":"2025-04-11T00:00:00Z","timestamp":1744329600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Future Internet"],"abstract":"<jats:p>Distributed inference in resource-constrained heterogeneous edge clusters is fundamentally limited by disparities in device capabilities and load imbalance issues. Existing methods predominantly focus on optimizing single-pipeline allocation schemes for partitioned sub-models. However, such approaches often lead to load imbalance and suboptimal resource utilization under concurrent batch processing scenarios. To address these challenges, we propose a non-uniform deployment inference framework (NUDIF), which achieves high-throughput distributed inference service by adapting to heterogeneous resources and balancing inter-stage processing capabilities. Formulated as a mixed-integer nonlinear programming (MINLP) problem, NUDIF is responsible for planning the number of instances for each sub-model and determining the specific devices for deploying these instances, while considering computational capacity, memory constraints, and communication latency. This optimization minimizes inter-stage processing discrepancies and maximizes resource utilization. 
Experimental evaluations demonstrate that NUDIF enhances system throughput by an average of 9.95% compared to traditional single-pipeline optimization methods under various scales of cluster device configurations.<\/jats:p>","DOI":"10.3390\/fi17040168","type":"journal-article","created":{"date-parts":[[2025,4,11]],"date-time":"2025-04-11T06:45:31Z","timestamp":1744353931000},"page":"168","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["NUDIF: A Non-Uniform Deployment Framework for Distributed Inference in Heterogeneous Edge Clusters"],"prefix":"10.3390","volume":"17","author":[{"given":"Peng","family":"Li","sequence":"first","affiliation":[{"name":"National Key Laboratory of Complex Aviation System Simulation, Chengdu 610036, China"},{"name":"Southwest China Institute of Electronic Technology, Chengdu 610036, China"}]},{"given":"Chen","family":"Qing","sequence":"additional","affiliation":[{"name":"National Key Laboratory of Complex Aviation System Simulation, Chengdu 610036, China"},{"name":"Southwest China Institute of Electronic Technology, Chengdu 610036, China"}]},{"given":"Hao","family":"Liu","sequence":"additional","affiliation":[{"name":"School of Computer Science (National Pilot Software Engineering School), Beijing University of Posts and Telecommunications (BUPT), Beijing 100876, China"}]}],"member":"1968","published-online":{"date-parts":[[2025,4,11]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"1738","DOI":"10.1109\/JPROC.2019.2918951","article-title":"Edge intelligence: Paving the last mile of artificial intelligence with edge computing","volume":"107","author":"Zhou","year":"2019","journal-title":"Proc. 
IEEE"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"565","DOI":"10.1109\/TMC.2019.2947893","article-title":"JointDNN: An efficient training and inference engine for intelligent mobile cloud computing services","volume":"20","author":"Eshratifar","year":"2019","journal-title":"IEEE Trans. Mob. Comput."},{"doi-asserted-by":"crossref","unstructured":"Hu, Y., Imes, C., Zhao, X., Kundu, S., Beerel, P.A., Crago, S.P., and Walters, J.P. (2022, August 31\u2013September 2). PipeEdge: Pipeline parallelism for large-scale model inference on heterogeneous edge devices. Proceedings of the 2022 25th IEEE Euromicro Conference on Digital System Design (DSD), Maspalomas, Spain.","key":"ref_3","DOI":"10.1109\/DSD57027.2022.00048"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"591","DOI":"10.1109\/COMST.2022.3218527","article-title":"Distributed artificial intelligence empowered by end-edge-cloud computing: A survey","volume":"25","author":"Duan","year":"2022","journal-title":"IEEE Commun. Surv. Tutor."},{"doi-asserted-by":"crossref","unstructured":"Zhao, J., Wan, B., Peng, Y., Lin, H., and Wu, C. (2024). LLM-PQ: Serving LLM on heterogeneous clusters with phase-aware partition and adaptive quantization. arXiv.","key":"ref_5","DOI":"10.1145\/3627535.3638480"},{"doi-asserted-by":"crossref","unstructured":"Zhang, M., Shen, X., Cao, J., Cui, Z., and Jiang, S. (2024). EdgeShard: Efficient LLM inference via collaborative edge computing. IEEE Internet Things J.","key":"ref_6","DOI":"10.1109\/JIOT.2024.3524255"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"103366","DOI":"10.1016\/j.jnca.2022.103366","article-title":"Computation offloading in mobile edge computing networks: A survey","volume":"202","author":"Feng","year":"2022","journal-title":"J. Netw. Comput. 
Appl."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"701","DOI":"10.1016\/j.ejor.2015.12.018","article-title":"Global optimization advances in mixed-integer nonlinear programming, MINLP, and constrained derivative-free optimization, CDFO","volume":"252","author":"Boukouvala","year":"2016","journal-title":"Eur. J. Oper. Res."},{"doi-asserted-by":"crossref","unstructured":"Yang, C.Y., Kuo, J.J., Sheu, J.P., and Zheng, K.J. (2021, June 14\u201323). Cooperative distributed deep neural network deployment with edge computing. Proceedings of the ICC 2021-IEEE International Conference on Communications, Montreal, QC, Canada.","key":"ref_9","DOI":"10.1109\/ICC42927.2021.9500668"},{"doi-asserted-by":"crossref","unstructured":"Liu, H., Zheng, H., Jiao, M., and Chi, G. (2018, October 8\u201311). SCADS: Simultaneous computing and distribution strategy for task offloading in mobile-edge computing system. Proceedings of the 2018 IEEE 18th International Conference on Communication Technology (ICCT), Chongqing, China.","key":"ref_10","DOI":"10.1109\/ICCT.2018.8599958"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"248","DOI":"10.1109\/TGCN.2021.3111731","article-title":"EosDNN: An efficient offloading scheme for DNN inference acceleration in local-edge-cloud collaborative environments","volume":"6","author":"Xue","year":"2021","journal-title":"IEEE Trans. Green Commun. Netw."},{"doi-asserted-by":"crossref","unstructured":"Mohammed, T., Joe-Wong, C., Babbar, R., and Di Francesco, M. (2020, July 6\u20139). Distributed inference acceleration with adaptive DNN partitioning and offloading. Proceedings of the IEEE INFOCOM 2020-IEEE Conference on Computer Communications, Toronto, ON, Canada.","key":"ref_12","DOI":"10.1109\/INFOCOM41043.2020.9155237"},{"doi-asserted-by":"crossref","unstructured":"Lin, C.Y., Wang, T.C., Chen, K.C., Lee, B.Y., and Kuo, J.J. (2019, July 2). 
Distributed deep neural network deployment for smart devices from the edge to the cloud. Proceedings of the ACM MobiHoc Workshop on Pervasive Systems in the IoT Era, Catania, Italy.","key":"ref_13","DOI":"10.1145\/3331052.3332477"},{"doi-asserted-by":"crossref","unstructured":"Dhuheir, M., Baccour, E., Erbad, A., Sabeeh, S., and Hamdi, M. (2021, June 28\u2013July 2). Efficient real-time image recognition using collaborative swarm of UAVs and convolutional networks. Proceedings of the 2021 IEEE International Wireless Communications and Mobile Computing (IWCMC), Harbin, China.","key":"ref_14","DOI":"10.1109\/IWCMC51323.2021.9498967"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"1227","DOI":"10.1109\/JIOT.2021.3079164","article-title":"Distributed CNN inference on resource-constrained UAVs for surveillance systems: Design and optimization","volume":"9","author":"Jouhari","year":"2021","journal-title":"IEEE Internet Things J."},{"doi-asserted-by":"crossref","unstructured":"Hemmat, M., Davoodi, A., and Hu, Y.H. (2022, January 17\u201320). EdgenAI: Distributed Inference with Local Edge Devices and Minimal Latency. Proceedings of the 2022 27th IEEE Asia and South Pacific Design Automation Conference (ASP-DAC), Taipei, Taiwan.","key":"ref_16","DOI":"10.1109\/ASP-DAC52403.2022.9712496"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"2348","DOI":"10.1109\/TCAD.2018.2858384","article-title":"DeepThings: Distributed adaptive deep learning inference on resource-constrained IoT edge clusters","volume":"37","author":"Zhao","year":"2018","journal-title":"IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst."},{"unstructured":"Gurobi Optimization, LLC (2025, April 06). Gurobi Optimizer Reference Manual. 
Available online: https:\/\/www.gurobi.com.","key":"ref_18"}],"container-title":["Future Internet"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-5903\/17\/4\/168\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T17:12:47Z","timestamp":1760029967000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-5903\/17\/4\/168"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,4,11]]},"references-count":18,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2025,4]]}},"alternative-id":["fi17040168"],"URL":"https:\/\/doi.org\/10.3390\/fi17040168","relation":{},"ISSN":["1999-5903"],"issn-type":[{"type":"electronic","value":"1999-5903"}],"subject":[],"published":{"date-parts":[[2025,4,11]]}}}