{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,2]],"date-time":"2026-04-02T18:36:39Z","timestamp":1775154999795,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":28,"publisher":"ACM","license":[{"start":{"date-parts":[[2020,8,3]],"date-time":"2020-08-03T00:00:00Z","timestamp":1596412800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2020,8,3]]},"DOI":"10.1145\/3411029.3411035","type":"proceedings-article","created":{"date-parts":[[2020,8,11]],"date-time":"2020-08-11T16:10:28Z","timestamp":1597162228000},"page":"36-43","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":16,"title":["Irina: Accelerating DNN Inference with Efficient Online Scheduling"],"prefix":"10.1145","author":[{"given":"Xiaorui","family":"Wu","sequence":"first","affiliation":[{"name":"City University of Hong Kong"}]},{"given":"Hong","family":"Xu","sequence":"additional","affiliation":[{"name":"City University of Hong Kong"}]},{"given":"Yi","family":"Wang","sequence":"additional","affiliation":[{"name":"Peng Cheng Laboratory, China"}]}],"member":"320","published-online":{"date-parts":[[2020,8,11]]},"reference":[
{"key":"e_1_3_2_1_1_1","unstructured":"[n.d.]. 11 Reasons Cloud Video Surveillance is Moving to the Cloud. https:\/\/www.een.com\/vsaas-video-surveillance-moving-to-cloud\/."},
{"key":"e_1_3_2_1_2_1","unstructured":"[n.d.]. CNN model inference benchmarks for some popular deep learning frameworks. https:\/\/github.com\/nicklhy\/DLInfBench\/tree\/master\/results\/."},
{"key":"e_1_3_2_1_3_1","volume-title":"MLPerf Inference v0.5 Results. https:\/\/www.mlperf.org\/inference-results\/ November 6th","year":"2019","unstructured":"[n.d.]. MLPerf Inference v0.5 Results. https:\/\/www.mlperf.org\/inference-results\/ November 6th, 2019."},
{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/3097983.3098021"},
{"key":"e_1_3_2_1_5_1","article-title":"Optimal memory-aware backpropagation of deep join networks","volume":"378","author":"Beaumont Olivier","year":"2020","unstructured":"Olivier Beaumont, Julien Herrmann, Guillaume Pallez, and Alena Shilova. 2020. Optimal memory-aware backpropagation of deep join networks. Philosophical Transactions of the Royal Society A 378, 2166 (2020), 20190049.","journal-title":"Philosophical Transactions of the Royal Society A"},
{"key":"e_1_3_2_1_6_1","volume-title":"Clipper: A low-latency online prediction serving system. In 14th {USENIX} Symposium on Networked Systems Design and Implementation ({NSDI} 17). 613\u2013627.","author":"Crankshaw Daniel","year":"2017","unstructured":"Daniel Crankshaw, Xin Wang, Guilio Zhou, Michael\u00a0J Franklin, Joseph\u00a0E Gonzalez, and Ion Stoica. 2017. Clipper: A low-latency online prediction serving system. In 14th {USENIX} Symposium on Networked Systems Design and Implementation ({NSDI} 17). 613\u2013627."},
{"key":"e_1_3_2_1_7_1","volume-title":"NVIDIA","author":"Guide Design","year":"2013","unstructured":"Design Guide. 2013. Cuda c programming guide. NVIDIA, July (2013)."},
{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/3204949.3204975"},
{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},
{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/3274808.3274813"},
{"key":"e_1_3_2_1_11_1","unstructured":"Paras Jain, Xiangxi Mo, Ajay Jain, Harikaran Subbaraj, Rehan\u00a0Sohail Durrani, Alexey Tumanov, Joseph Gonzalez, and Ion Stoica. 2018. Dynamic space-time scheduling for gpu inference. arXiv preprint arXiv:1901.00041 (2018)."},
{"key":"e_1_3_2_1_12_1","unstructured":"Paras Jain, Xiangxi Mo, Ajay Jain, Alexey Tumanov, Joseph\u00a0E Gonzalez, and Ion Stoica. 2019. The OoO VLIW JIT Compiler for GPU Inference. arXiv preprint arXiv:1901.10008 (2019)."},
{"key":"e_1_3_2_1_13_1","volume-title":"Christopher Canel, Lilia Tang, Ishan Misra, Michael Kaminsky, Michael\u00a0A Kozuch, Padmanabhan Pillai, David\u00a0G Andersen, and Gregory\u00a0R Ganger.","author":"Jiang H","year":"2018","unstructured":"Angela\u00a0H Jiang, Daniel L-K Wong, Christopher Canel, Lilia Tang, Ishan Misra, Michael Kaminsky, Michael\u00a0A Kozuch, Padmanabhan Pillai, David\u00a0G Andersen, and Gregory\u00a0R Ganger. 2018. Mainstream: Dynamic stem-sharing for multi-tenant video processing. In 2018 {USENIX} Annual Technical Conference ({USENIX}{ATC} 18). 29\u201342."},
{"key":"e_1_3_2_1_14_1","doi-asserted-by":"crossref","unstructured":"Daniel Kang, John Emmons, Firas Abuzaid, Peter Bailis, and Matei Zaharia. 2017. Noscope: optimizing neural network queries over video at scale. arXiv preprint arXiv:1703.02529 (2017).","DOI":"10.14778\/3137628.3137664"},
{"key":"e_1_3_2_1_15_1","volume-title":"International Conference on Computer Science, Engineering and Education Applications. Springer, 658\u2013668","author":"Kochura Yuriy","year":"2019","unstructured":"Yuriy Kochura, Yuri Gordienko, Vlad Taran, Nikita Gordienko, Alexandr Rokovyi, Oleg Alienin, and Sergii Stirenko. 2019. Batch size influence on performance of graphic and tensor processing units during training and inference phases. In International Conference on Computer Science, Engineering and Education Applications. Springer, 658\u2013668."},
{"key":"e_1_3_2_1_16_1","unstructured":"Alex Krizhevsky, Ilya Sutskever, and Geoffrey\u00a0E Hinton. 2012. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems. 1097\u20131105."},
{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/5.726791"},
{"key":"e_1_3_2_1_18_1","unstructured":"Moshe Looks, Marcello Herreshoff, DeLesley Hutchins, and Peter Norvig. 2017. Deep learning with dynamic computation graphs. arXiv preprint arXiv:1702.02181 (2017)."},
{"key":"e_1_3_2_1_19_1","unstructured":"Joseph Redmon and Ali Farhadi. 2018. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767 (2018)."},
{"key":"e_1_3_2_1_20_1","unstructured":"Francisco Romero, Qian Li, Neeraja\u00a0J Yadwadkar, and Christos Kozyrakis. 2019. INFaaS: A Model-less Inference Serving System. arXiv preprint arXiv:1905.13348 (2019)."},
{"key":"e_1_3_2_1_21_1","unstructured":"Haichen Shen, Yuchen Jin, Bingyu Kong, Matthai Philipose, Arvind Krishnamurthy, and Ravi Sundaram. [n.d.]. Nexus: A GPU Cluster for Accelerating Neural Networks for Video Analysis. ([n.\u00a0d.])."},
{"key":"e_1_3_2_1_22_1","unstructured":"Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)."},
{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298594"},
{"key":"e_1_3_2_1_24_1","volume-title":"Nanily: A QoS-Aware Scheduling for DNN Inference Workload in Clouds. In 2019 IEEE 21st International Conference on High Performance Computing and Communications","author":"Tang Xuehai","year":"2019","unstructured":"Xuehai Tang, Peng Wang, Qiuyang Liu, Wang Wang, and Jizhong Han. 2019. Nanily: A QoS-Aware Scheduling for DNN Inference Workload in Clouds. In 2019 IEEE 21st International Conference on High Performance Computing and Communications; IEEE 17th International Conference on Smart City; IEEE 5th International Conference on Data Science and Systems (HPCC\/SmartCity\/DSS). IEEE, 2395\u20132402."},
{"key":"e_1_3_2_1_25_1","unstructured":"Sil\u00a0C van\u00a0de Leemput, Jonas Teuwen, and Rashindra Manniesing. 2018. Memcnn: a framework for developing memory efficient deep invertible networks. (2018)."},
{"key":"e_1_3_2_1_26_1","volume-title":"Salus: Fine-Grained GPU Sharing Primitives for Deep Learning Applications. arXiv preprint arXiv:1902.04610(2019).","author":"Yu Peifeng","year":"2019","unstructured":"Peifeng Yu and Mosharaf Chowdhury. 2019. Salus: Fine-Grained GPU Sharing Primitives for Deep Learning Applications. arXiv preprint arXiv:1902.04610 (2019)."},
{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/3230543.3230554"},
{"key":"e_1_3_2_1_28_1","unstructured":"Corey Zumar. 2018. InferLine: ML Inference Pipeline Composition Framework. (2018)."}
],"event":{"name":"APNet '20: 4th Asia-Pacific Workshop on Networking","location":"Seoul Republic of Korea","acronym":"APNet '20"},"container-title":["4th Asia-Pacific Workshop on Networking"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3411029.3411035","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3411029.3411035","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T21:28:11Z","timestamp":1750195691000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3411029.3411035"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,8,3]]},"references-count":28,"alternative-id":["10.1145\/3411029.3411035","10.1145\/3411029"],"URL":"https:\/\/doi.org\/10.1145\/3411029.3411035","relation":{},"subject":[],"published":{"date-parts":[[2020,8,3]]},"assertion":[{"value":"2020-08-11","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}