{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,7]],"date-time":"2026-01-07T17:21:58Z","timestamp":1767806518675,"version":"3.49.0"},"reference-count":48,"publisher":"Association for Computing Machinery (ACM)","issue":"1","funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62301307"],"award-info":[{"award-number":["62301307"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100013285","name":"Program for Professor of Special Appointment (Eastern Scholar) at Shanghai Institutions of Higher Learning","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100013285","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100003399","name":"Science and Technology Commission of Shanghai Municipality","doi-asserted-by":"crossref","award":["23XD1420100"],"award-info":[{"award-number":["23XD1420100"]}],"id":[{"id":"10.13039\/501100003399","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Shanghai Pujiang Programme","award":["23PJD041"],"award-info":[{"award-number":["23PJD041"]}]},{"name":"Chenguang Program of Shanghai Education Development Foundation and Shanghai Municipal Education Commission","award":["23CGA60"],"award-info":[{"award-number":["23CGA60"]}]},{"name":"AI-Enhanced Research Program\u00a0of Shanghai Municipal Education Commission","award":["SMEC-AI-DHUY-10"],"award-info":[{"award-number":["SMEC-AI-DHUY-10"]}]},{"DOI":"10.13039\/501100012226","name":"Fundamental Research Funds for the Central Universities","doi-asserted-by":"crossref","award":["2232025D-47"],"award-info":[{"award-number":["2232025D-47"]}],"id":[{"id":"10.13039\/501100012226","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Embed. Comput. 
Syst."],"published-print":{"date-parts":[[2026,1,31]]},"abstract":"<jats:p>The emergence of deploying Deep neural network (DNN) services on edge servers has spurred research into efficiently provisioning inference services. However, previous studies have neglected to consider the implications of different types of DNN and varying quality of service (QoS) requirements on QoS violation rates. In this article, we propose a novel framework, named Coinf, for scheduling heterogeneous DNN inference tasks on edge servers. Coinf has the following four advantages to effectively handle attribute analysis, performance balancing, parallel execution, and model accuracy: (1) It enables efficient profiling of domain-specific attributes of various DNN tasks during the offline stage, achieved by constructing a regression model to predict the end-to-end latency of each task. (2) By utilizing the predicted execution time, Coinf achieves a commendable balance among inference latency, system throughput, and QoS violation rate. (3) It employs emerging deep reinforcement learning (DRL) to aggregate individual DNN tasks into batches, enabling concurrent parallel execution. (4) Coinf preserves the accuracies of the provided DNN models by not modifying them. 
Numerical experiments are constructed to validate the reliability and efficiency of Coinf in handling heterogeneous inference tasks.<\/jats:p>\n                  <jats:p\/>","DOI":"10.1145\/3777373","type":"journal-article","created":{"date-parts":[[2025,11,17]],"date-time":"2025-11-17T11:53:13Z","timestamp":1763380393000},"page":"1-20","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Coinf: QoS-aware DRL-based Inference Task Scheduling Framework with Batching Processing"],"prefix":"10.1145","volume":"25","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4095-6843","authenticated-orcid":false,"given":"Guanglin","family":"Zhang","sequence":"first","affiliation":[{"name":"Donghua University","place":["Shanghai, China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-0552-1130","authenticated-orcid":false,"given":"Yuhao","family":"Zhang","sequence":"additional","affiliation":[{"name":"Donghua University","place":["Shanghai, China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6323-7070","authenticated-orcid":false,"given":"Xiaowen","family":"Huang","sequence":"additional","affiliation":[{"name":"Donghua University","place":["Shanghai, China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2007-6478","authenticated-orcid":false,"given":"Wenqian","family":"Zhang","sequence":"additional","affiliation":[{"name":"Donghua University","place":["Shanghai, 
China"]}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2026,1,7]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2023.3276759"},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2021.3095970"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1109\/MCAS.2023.3267921"},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2023.109886"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1109\/LCN52139.2021.9524928"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/TVT.2022.3219058"},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2022.3195664"},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2022.3177782"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2017.2761740"},{"key":"e_1_3_1_11_2","unstructured":"Neural Processor. 2024. [Online]. Available. Retrieved 19 June 2024 from https:\/\/en.wikichip.org\/wiki\/neural_processor"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/TWC.2022.3192613"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA51647.2021.00049"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1145\/3489517.3530510"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2022.3232715"},{"key":"e_1_3_1_16_2","first-page":"199","volume-title":"Proceedings of the 2022 USENIX Annual Technical Conference (USENIX ATC 22)","author":"Choi Seungbeom","year":"2022","unstructured":"Seungbeom Choi, Sunho Lee, Yeonjae Kim, Jongse Park, Youngjin Kwon, and Jaehyuk Huh. 2022. Serving heterogeneous machine learning models on Multi-GPU servers with spatio-temporal sharing. In Proceedings of the 2022 USENIX Annual Technical Conference (USENIX ATC 22). 
199\u2013216."},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2020.3047638"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2019.2931558"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1145\/3341301.3359658"},{"key":"e_1_3_1_20_2","first-page":"613","volume-title":"Proceedings of the 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17)","author":"Crankshaw Daniel","year":"2017","unstructured":"Daniel Crankshaw, Xin Wang, Giulio Zhou, Michael J. Franklin, Joseph E. Gonzalez, and Ion Stoica. 2017. Clipper: A Low-Latency online prediction serving system. In Proceedings of the 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17). 613\u2013627."},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1145\/3419111.3421284"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA56546.2023.10071121"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/INFOCOM48880.2022.9796661"},{"key":"e_1_3_1_24_2","unstructured":"TensorFlow. 2016. [Online]. Retrieved 27 June 2024 from https:\/\/github.com\/tensorflow\/serving\/"},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1145\/3634704"},{"issue":"3","key":"e_1_3_1_26_2","first-page":"1275","article-title":"A co-scheduling framework for DNN models on mobile and edge devices with heterogeneous hardware","volume":"22","author":"Xu Zhiyuan","year":"2021","unstructured":"Zhiyuan Xu, Dejun Yang, Chengxiang Yin, Jian Tang, Yanzhi Wang, and Guoliang Xue. 2021. A co-scheduling framework for DNN models on mobile and edge devices with heterogeneous hardware. 
IEEE Transactions on Mobile Computing 22, 3 (2021), 1275\u20131288.","journal-title":"IEEE Transactions on Mobile Computing"},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.future.2023.10.002"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMC.2022.3183098"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2022.3222509"},{"key":"e_1_3_1_30_2","unstructured":"DPDK. 2010. [online]. Retrieved 28 June 2024 from https:\/\/www.dpdk.org\/"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA56546.2023.10070943"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1145\/3627703.3629584"},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_1_34_2","unstructured":"NVIDIA Triton. 2018. [Online]. Retrieved 28 June 2024 from https:\/\/github.com\/triton-inference-server"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACPR.2015.7486599"},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00474"},{"key":"e_1_3_1_38_2","unstructured":"ONNX Runtime. 2017. [Online]. Retrieved 7 July 2024 from https:\/\/onnxruntime.ai\/"},{"key":"e_1_3_1_39_2","unstructured":"2017. OpenAI Baselines: ACKTR and A2C - openai.com. Retrieved 7 July 2024 from https:\/\/openai.com\/research\/openai-baselines-acktr-a2c"},{"key":"e_1_3_1_40_2","first-page":"395","volume-title":"Proceedings of the 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI 15).","author":"Dong Mo","year":"2015","unstructured":"Mo Dong, Qingxi Li, Doron Zarchy, P. Brighten Godfrey, and Michael Schapira. 2015. PCC: Re-architecting congestion control for consistent high performance. 
In Proceedings of the 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI 15). 395\u2013408."},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.1145\/2534169.2486020"},{"key":"e_1_3_1_42_2","unstructured":"TensorFlow. 2025. [Online]. Retrieved 1 July 2024 from https:\/\/tensorflow.google.cn\/"},{"key":"e_1_3_1_43_2","unstructured":"PyTorch. 2016. [Online]. Retrieved 5 July 2024 from https:\/\/pytorch.org\/"},{"key":"e_1_3_1_44_2","unstructured":"Python.org. 1994. [Online]. Retrieved 5 July 2024 from https:\/\/www.python.org\/"},{"key":"e_1_3_1_45_2","unstructured":"ONNX. 2017. [Online]. Retrieved 7 July 2024 from https:\/\/onnx.ai\/"},{"key":"e_1_3_1_46_2","unstructured":"NVIDIA TensorRT. 2016. [Online]. Retrieved 7 July 2024 from https:\/\/developer.nvidia.com\/tensorrt"},{"key":"e_1_3_1_47_2","unstructured":"XGBoost. 2014. [Online]. Retrieved 7 July 2024 from https:\/\/xgboost.readthedocs.io\/en\/stable\/"},{"key":"e_1_3_1_48_2","unstructured":"scikit-learn: machine learning in Python. 2010. [Online]. 
Retrieved 7 July 2024 from https:\/\/scikit-learn.org\/stable\/index.html"},{"key":"e_1_3_1_49_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"}],"container-title":["ACM Transactions on Embedded Computing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3777373","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,1,7]],"date-time":"2026-01-07T15:58:59Z","timestamp":1767801539000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3777373"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,1,7]]},"references-count":48,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2026,1,31]]}},"alternative-id":["10.1145\/3777373"],"URL":"https:\/\/doi.org\/10.1145\/3777373","relation":{},"ISSN":["1539-9087","1558-3465"],"issn-type":[{"value":"1539-9087","type":"print"},{"value":"1558-3465","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,1,7]]},"assertion":[{"value":"2024-09-09","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-11-03","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2026-01-07","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}