{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,8]],"date-time":"2025-12-08T09:39:23Z","timestamp":1765186763149,"version":"3.46.0"},"reference-count":32,"publisher":"MDPI AG","issue":"12","license":[{"start":{"date-parts":[[2025,12,8]],"date-time":"2025-12-08T00:00:00Z","timestamp":1765152000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Ningxia Natural Science Foundation of China","award":["2023AAC03846"],"award-info":[{"award-number":["2023AAC03846"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>In the field of communication maintenance, Augmented Reality (AR) applications are critical for enhancing operational safety and efficiency. However, deploying the required multimodal models on resource-constrained terminal devices is challenging, as traditional cloud or on-device strategies fail to balance low latency and energy consumption. This paper proposes a Cloud-Edge-End collaborative inference framework tailored to multimodal model deployment. A subgraph partitioning strategy is introduced to systematically decompose complex multimodal models into functionally independent sub-units. Subsequently, a fine-grained performance estimation model is employed to accurately characterize both computation and communication costs across heterogeneous devices. And, a joint optimization problem is formulated to minimize end-to-end inference latency and terminal energy consumption. To solve this problem efficiently, a Hybrid Genetic Algorithm for DNN Partitioning (HGA-DP) evolved over 100 generations is designed, incorporating constraint-aware repair mechanisms and local neighborhood search to navigate the exponential search space of possible deployment combinations. Experimental results on a simulated three-tier collaborative computing platform demonstrate that, compared to traditional full on-device deployment, the proposed method reduces end-to-end inference latency by 70\u201380% and terminal energy consumption by 81.1%, achieving a 4.86\u00d7 improvement in overall fitness score. Against the latency-optimized DADS heuristic, HGA-DP achieves 41.3% lower latency while reducing energy by 59.9%. Compared to the All-Cloud strategy, our approach delivers 71.5% latency reduction with only marginal additional terminal energy cost. This framework provides an adaptive and effective solution for real-time multimodal inference in resource-constrained scenarios, laying a foundation for intelligent, resource-aware deployment.<\/jats:p>","DOI":"10.3390\/info16121091","type":"journal-article","created":{"date-parts":[[2025,12,8]],"date-time":"2025-12-08T09:17:36Z","timestamp":1765185456000},"page":"1091","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["HGA-DP: Optimal Partitioning of Multimodal DNNs Enabling Real-Time Image Inference for AR-Assisted Communication Maintenance on Cloud-Edge-End Systems"],"prefix":"10.3390","volume":"16","author":[{"given":"Cong","family":"Ye","sequence":"first","affiliation":[{"name":"Information & Communication Company of State Grid Ningxia Electric Power Co., Ltd., Yinchuan 750001, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ruihang","family":"Zhang","sequence":"additional","affiliation":[{"name":"State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xiao","family":"Li","sequence":"additional","affiliation":[{"name":"Information & Communication Company of State Grid Ningxia Electric Power Co., Ltd., Yinchuan 750001, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Wenlong","family":"Deng","sequence":"additional","affiliation":[{"name":"Information & Communication Company of State Grid Ningxia Electric Power Co., Ltd., Yinchuan 750001, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jianlei","family":"Wang","sequence":"additional","affiliation":[{"name":"Information & Communication Company of State Grid Ningxia Electric Power Co., Ltd., Yinchuan 750001, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3945-0706","authenticated-orcid":false,"given":"Sujie","family":"Shao","sequence":"additional","affiliation":[{"name":"State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2025,12,8]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"1160","DOI":"10.1109\/COMST.2021.3061981","article-title":"A Survey on Mobile Augmented Reality with 5G Mobile Edge Computing: Architectures, Applications, and Technical Aspects","volume":"23","author":"Siriwardhana","year":"2021","journal-title":"IEEE Commun. Surv. Tutor."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"63373","DOI":"10.1109\/ACCESS.2019.2916887","article-title":"Deep Multimodal Representation Learning: A Survey","volume":"7","author":"Guo","year":"2019","journal-title":"IEEE Access"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Oh, S., Kim, M., Kim, D., Jeong, M., and Lee, M. (2017, January 8\u201310). Investigation on performance and energy efficiency of CNN-based object detection on embedded device. Proceedings of the 2017 4th International Conference on Computer Applications and Information Processing Technology (CAIPT), Bali, Indonesia.","DOI":"10.1109\/CAIPT.2017.8320657"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"84","DOI":"10.1016\/j.future.2012.05.023","article-title":"Mobile cloud computing: A survey","volume":"29","author":"Fernando","year":"2013","journal-title":"Future Gener. Comput. Syst."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"2647","DOI":"10.1109\/COMST.2024.3393230","article-title":"End-Edge-Cloud Collaborative Computing for Deep Learning: A Comprehensive Survey","volume":"26","author":"Wang","year":"2024","journal-title":"IEEE Commun. Surv. Tutor."},{"key":"ref_6","unstructured":"Kar, B., Yahya, W., Lin, Y.-D., and Ali, A. (2022). A Survey on Offloading in Federated Cloud-Edge-Fog Systems with Traditional Optimization and Machine Learning. arXiv."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"121","DOI":"10.1016\/j.future.2024.01.025","article-title":"Joint optimization of multi-dimensional resource allocation and task offloading for QoE enhancement in Cloud-Edge-End collaboration","volume":"155","author":"Zeng","year":"2024","journal-title":"Future Gener. Comput. Syst."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"105189","DOI":"10.1016\/j.jpdc.2025.105189","article-title":"Multi-modal model partition strategy for end-edge collaborative inference","volume":"208","author":"Huo","year":"2026","journal-title":"J. Parallel Distrib. Comput."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"80","DOI":"10.1186\/s13638-023-02284-x","article-title":"Partitioning multi-layer edge network for neural network collaborative computing","volume":"2023","author":"Li","year":"2023","journal-title":"EURASIP J. Wirel. Commun. Netw."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Zhang, S.-F., Zhai, J.-H., Xie, B.-J., Zhan, Y., and Wang, X. (2019, January 7\u201310). Multimodal Representation Learning: Advances, Trends and Challenges. Proceedings of the 2019 International Conference on Machine Learning and Cybernetics (ICMLC), Kobe, Japan.","DOI":"10.1109\/ICMLC48188.2019.8949228"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"422","DOI":"10.1016\/j.future.2022.10.033","article-title":"An adaptive DNN inference acceleration framework with end\u2013edge\u2013cloud collaborative computing","volume":"140","author":"Liu","year":"2023","journal-title":"Future Gener. Comput. Syst."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"108177","DOI":"10.1016\/j.comnet.2021.108177","article-title":"Task offloading in Edge and Cloud Computing: A survey on mathematical, artificial intelligence and control theory solutions","volume":"195","author":"Saeik","year":"2021","journal-title":"Comput. Netw."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3555802","article-title":"Edge Computing with Artificial Intelligence: A Machine Learning Perspective","volume":"55","author":"Hua","year":"2023","journal-title":"ACM Comput Surv"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Guo, T. (2018, January 17\u201320). Cloud-Based or On-Device: An Empirical Study of Mobile Deep Inference. Proceedings of the 2018 IEEE International Conference on Cloud Engineering (IC2E), Orlando, FL, USA.","DOI":"10.1109\/IC2E.2018.00042"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Teerapittayanon, S., McDanel, B., and Kung, H.T. (2017, January 5\u20138). Distributed Deep Neural Networks Over the Cloud, the Edge and End Devices. Proceedings of the 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS), Atlanta, GA, USA.","DOI":"10.1109\/ICDCS.2017.226"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"2348","DOI":"10.1109\/TCAD.2018.2858384","article-title":"DeepThings: Distributed Adaptive Deep Learning Inference on Resource-Constrained IoT Edge Clusters","volume":"37","author":"Zhao","year":"2018","journal-title":"IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Hu, S., Dong, C., and Wen, W. (2021, January 23\u201326). Enable Pipeline Processing of DNN Co-inference Tasks in the Mobile-Edge Cloud. Proceedings of the 2021 IEEE 6th International Conference on Computer and Communication Systems (ICCCS), Chengdu, China.","DOI":"10.1109\/ICCCS52626.2021.9449178"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"1199","DOI":"10.1109\/COMST.2023.3239579","article-title":"Offloading Using Traditional Optimization and Machine Learning in Federated Cloud\u2013Edge\u2013Fog Systems: A Survey","volume":"25","author":"Kar","year":"2023","journal-title":"IEEE Commun. Surv. Tutor."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Younis, A., Tran, T.X., and Pompili, D. (2019, January 4\u20137). Energy-Latency-Aware Task Offloading and Approximate Computing at the Mobile Edge. Proceedings of the 2019 IEEE 16th International Conference on Mobile Ad Hoc and Sensor Systems (MASS), Monterey, CA, USA.","DOI":"10.1109\/MASS.2019.00043"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Na, J., Zhang, H., Lian, J., and Zhang, B. (2022). Partitioning DNNs for Optimizing Distributed Inference Performance on Cooperative Edge Devices: A Genetic Algorithm Approach. Appl. Sci., 12.","DOI":"10.3390\/app122010619"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"615","DOI":"10.1145\/3093337.3037698","article-title":"Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge","volume":"45","author":"Kang","year":"2017","journal-title":"ACM SIGARCH Comput. Archit. News"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Banitalebi-Dehkordi, A., Vedula, N., Pei, J., Xia, F., Wang, L., and Zhang, Y. (2021). Auto-Split: A General Framework of Collaborative Edge-Cloud AI. arXiv.","DOI":"10.1145\/3447548.3467078"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Zhao, Z., Wang, K., Ling, N., and Xing, G. (2021, January 18\u201321). EdgeML: An AutoML Framework for Real-Time Deep Learning on the Edge. Proceedings of the International Conference on Internet-of-Things Design and Implementation, Nashville, TN, USA.","DOI":"10.1145\/3450268.3453520"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"9060","DOI":"10.1109\/TMC.2024.3357874","article-title":"Distributed DNN Inference with Fine-Grained Model Partitioning in Mobile Edge Computing Networks","volume":"23","author":"Li","year":"2024","journal-title":"IEEE Trans. Mob. Comput."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Hu, C., Bao, W., Wang, D., and Liu, F. (May, January 29). Dynamic Adaptive DNN Surgery for Inference Acceleration on the Edge. Proceedings of the IEEE INFOCOM 2019-IEEE Conference on Computer Communications, Paris, France.","DOI":"10.1109\/INFOCOM.2019.8737614"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3432192","article-title":"Towards Real-time Cooperative Deep Inference over the Cloud and Edge End Devices","volume":"4","author":"Zhang","year":"2020","journal-title":"Proc. ACM Interact. Mob. Wearable Ubiquitous Technol."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Parthasarathy, A., and Krishnamachari, B. (December, January 30). Partitioning and Placement of Deep Neural Networks on Distributed Edge Devices to Maximize Inference Throughput. Proceedings of the 2022 32nd International Telecommunication Networks and Applications Conference (ITNAC), Wellington, New Zealand.","DOI":"10.1109\/ITNAC55475.2022.9998427"},{"key":"ref_28","unstructured":"Sada, A.B., Khelloufi, A., Naouri, A., Ning, H., and Dhelim, S. (2024). Selective Task offloading for Maximum Inference Accuracy and Energy efficient Real-Time IoT Sensing Systems. arXiv."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"13076","DOI":"10.1109\/TMC.2024.3430103","article-title":"Real-Time Adaptive Partition and Resource Allocation for Multi-User End-Cloud Inference Collaboration in Mobile Environment","volume":"23","author":"Li","year":"2024","journal-title":"IEEE Trans. Mob. Comput."},{"key":"ref_30","unstructured":"Fudala, T., Tsouvalas, V., and Meratnia, N. (2025). Fine-tuning Multimodal Transformers on Edge: A Parallel Split Learning Approach. arXiv."},{"key":"ref_31","first-page":"89","article-title":"Joint Optimization of Radio and Computational Resources for Multicell Mobile-Edge Computing","volume":"1","author":"Sardellitti","year":"2015","journal-title":"IEEE Trans. Signal Inf. Process. Netw."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"13273","DOI":"10.1109\/JIOT.2025.3535623","article-title":"Generative AI-Aided Multimodal Parallel Offloading for AIGC Metaverse Service in IoT Networks","volume":"12","author":"Zeng","year":"2025","journal-title":"IEEE Internet Things J."}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/16\/12\/1091\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,8]],"date-time":"2025-12-08T09:33:09Z","timestamp":1765186389000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/16\/12\/1091"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,12,8]]},"references-count":32,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["info16121091"],"URL":"https:\/\/doi.org\/10.3390\/info16121091","relation":{},"ISSN":["2078-2489"],"issn-type":[{"value":"2078-2489","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,12,8]]}}}