{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,1]],"date-time":"2026-04-01T18:10:12Z","timestamp":1775067012953,"version":"3.50.1"},"reference-count":97,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2023,12,7]],"date-time":"2023-12-07T00:00:00Z","timestamp":1701907200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Meas. Anal. Comput. Syst."],"published-print":{"date-parts":[[2023,12,7]]},"abstract":"<jats:p>On-Device Artificial Intelligence (AI) services such as face recognition, object tracking and voice recognition are rapidly scaling up deployments on embedded, memory-constrained hardware devices. These services typically delegate AI inference models for execution on CPU and GPU computing backends. While GPU delegation is a common practice to achieve high-speed computation, the approach suffers from degraded throughput and completion times under multi-model scenarios, i.e., concurrently executing services. This paper introduces a solution to sustain performance in multi-model, on-device AI contexts by dynamically allocating a combination of CPU and GPU backends per model. The allocation is feedback-driven and guided by knowledge of model-specific, multi-objective Pareto fronts comprising inference latency and memory consumption. The primary contribution of this paper is a backend allocation algorithm that runs online per model and achieves a 25-100% improvement in throughput over static allocations as well as load-balancing scheduler solutions targeting multi-model scenarios. 
Other noteworthy contributions include a novel Pareto front estimator that runs on-device, and also a software-based GPU profiler with a lightweight algorithm to detect changing GPU workloads. Specifically, the Pareto front estimator outperforms the state-of-the-art algorithms NSGA-II and SPEA2 by 94% on Pareto coverage, and by almost 2x on computational overhead.<\/jats:p>","DOI":"10.1145\/3626793","type":"journal-article","created":{"date-parts":[[2023,12,12]],"date-time":"2023-12-12T15:20:29Z","timestamp":1702394429000},"page":"1-33","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["Automated Backend Allocation for Multi-Model, On-Device AI Inference"],"prefix":"10.1145","volume":"7","author":[{"ORCID":"https:\/\/orcid.org\/0009-0004-8707-3196","authenticated-orcid":false,"given":"Venkatraman","family":"Iyer","sequence":"first","affiliation":[{"name":"Samsung Electronics, Seoul, South Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-3808-1592","authenticated-orcid":false,"given":"Sungho","family":"Lee","sequence":"additional","affiliation":[{"name":"Samsung Electronics, Seoul, South Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-3550-7958","authenticated-orcid":false,"given":"Semun","family":"Lee","sequence":"additional","affiliation":[{"name":"Samsung Electronics, Seoul, South Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-5709-0875","authenticated-orcid":false,"given":"Juitem Joonwoo","family":"Kim","sequence":"additional","affiliation":[{"name":"Samsung Electronics, Seoul, South Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-3539-2679","authenticated-orcid":false,"given":"Hyunjun","family":"Kim","sequence":"additional","affiliation":[{"name":"Samsung Electronics, Seoul, South 
Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-5749-2995","authenticated-orcid":false,"given":"Youngjae","family":"Shin","sequence":"additional","affiliation":[{"name":"Samsung Electronics, Seoul, South Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2023,12,12]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"2011. netem tc manual. https:\/\/man7.org\/linux\/man-pages\/man8\/tc-netem.8.html. (2011)."},{"key":"e_1_2_1_2_1","unstructured":"2013. OpenCL 2.0 specification. https:\/\/registry.khronos.org\/OpenCL\/specs\/opencl-2.0.pdf. (2013)."},{"key":"e_1_2_1_3_1","unstructured":"2015. grpc - A high performance open source general-purpose RPC framework. https:\/\/github.com\/grpc. (2015)."},{"key":"e_1_2_1_4_1","unstructured":"2016. ARM Developer Streamline Performance Analyzer. https:\/\/developer.arm.com\/Tools%20and%20Software\/Streamline%20Performance%20Analyzer. (2016)."},{"key":"e_1_2_1_5_1","unstructured":"2016. SnapDragon Profiler. https:\/\/developer.qualcomm.com\/software\/snapdragon-profiler. (2016)."},{"key":"e_1_2_1_6_1","unstructured":"2017. Open Neural Network Exchange. https:\/\/github.com\/onnx. (2017)."},{"key":"e_1_2_1_7_1","unstructured":"2018. Mobile AI Compute Engine. https:\/\/github.com\/XiaoMi\/mace. (2018)."},{"key":"e_1_2_1_8_1","unstructured":"2019. VIM3 Pro. https:\/\/www.khadas.com\/vim3. (2019)."},{"key":"e_1_2_1_9_1","volume-title":"https:\/\/www.cltracer.com\/","year":"2020","unstructured":"2020. CLTuner. (2020). https:\/\/www.cltracer.com\/"},{"key":"e_1_2_1_10_1","unstructured":"2020. ONE - On-device Neural Engine. https:\/\/github.com\/Samsung\/ONE. 
(2020)."},{"key":"e_1_2_1_11_1","volume-title":"et al","author":"Abadi Mart\u00edn","year":"2016","unstructured":"Mart\u00edn Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al . 2016. Tensorflow: A system for large-scale machine learning. In 12th {USENIX} symposium on operating systems design and implementation ({OSDI} 16). 265--283."},{"key":"e_1_2_1_12_1","volume-title":"Learning the number of neurons in deep networks. Advances in Neural Information Processing Systems 29","author":"Alvarez Jose M","year":"2016","unstructured":"Jose M Alvarez and Mathieu Salzmann. 2016. Learning the number of neurons in deep networks. Advances in Neural Information Processing Systems 29 (2016)."},{"key":"e_1_2_1_13_1","volume-title":"Accelerating neural architecture search using performance prediction. arXiv preprint arXiv:1705.10823","author":"Baker Bowen","year":"2017","unstructured":"Bowen Baker, Otkrist Gupta, Ramesh Raskar, and Nikhil Naik. 2017. Accelerating neural architecture search using performance prediction. arXiv preprint arXiv:1705.10823 (2017)."},{"key":"e_1_2_1_14_1","volume-title":"Multimodal machine learning: A survey and taxonomy","author":"Ahuja Chaitanya","year":"2018","unstructured":"Chaitanya Ahuja, and Louis-Philippe Morency. 2018. Multimodal machine learning: A survey and taxonomy. IEEE transactions on pattern analysis and machine intelligence 41, 2 (2018), 423--443."},{"key":"e_1_2_1_15_1","volume-title":"Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934","author":"Bochkovskiy Alexey","year":"2020","unstructured":"Alexey Bochkovskiy, Chien-Yao Wang, and Hong-Yuan Mark Liao. 2020. Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020)."},{"key":"e_1_2_1_16_1","unstructured":"Shaileshh Bojja Venkatakrishnan Shreyan Gupta Hongzi Mao Mohammad Alizadeh et al. 2019. 
Learning Generalizable Device Placement Algorithms for Distributed Machine Learning. Advances in Neural Information Processing Systems 32 (2019)."},{"key":"e_1_2_1_17_1","volume-title":"Once-for-all: Train one network and specialize it for efficient deployment. arXiv preprint arXiv:1908.09791","author":"Cai Han","year":"2019","unstructured":"Han Cai, Chuang Gan, Tianzhe Wang, Zhekai Zhang, and Song Han. 2019. Once-for-all: Train one network and specialize it for efficient deployment. arXiv preprint arXiv:1908.09791 (2019)."},{"key":"e_1_2_1_18_1","volume-title":"Kernel change-point detection with auxiliary deep generative models. arXiv preprint arXiv:1901.06077","author":"Chang Wei-Cheng","year":"2019","unstructured":"Wei-Cheng Chang, Chun-Liang Li, Yiming Yang, and Barnab\u00e1s P\u00f3czos. 2019. Kernel change-point detection with auxiliary deep generative models. arXiv preprint arXiv:1901.06077 (2019)."},{"key":"e_1_2_1_19_1","volume-title":"International conference on machine learning. PMLR, 2285--2294","author":"Chen Wenlin","year":"2015","unstructured":"Wenlin Chen, James Wilson, Stephen Tyree, Kilian Weinberger, and Yixin Chen. 2015. Compressing neural networks with the hashing trick. In International conference on machine learning. PMLR, 2285--2294."},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00307"},{"key":"e_1_2_1_21_1","volume-title":"Multi-model Machine Learning Inference Serving with GPU Spatial Partitioning. arXiv preprint arXiv:2109.01611","author":"Choi Seungbeom","year":"2021","unstructured":"Seungbeom Choi, Sunho Lee, Yeonjae Kim, Jongse Park, Youngjin Kwon, and Jaehyuk Huh. 2021. Multi-model Machine Learning Inference Serving with GPU Spatial Partitioning. 
arXiv preprint arXiv:2109.01611 (2021)."},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.pmcj.2022.101594"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.01166"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSP.2021.3087031"},{"key":"e_1_2_1_25_1","volume-title":"A fast and elitist multiobjective genetic algorithm: NSGA-II","author":"Deb Kalyanmoy","year":"2002","unstructured":"Kalyanmoy Deb, Amrit Pratap, Sameer Agarwal, and TAMT Meyarivan. 2002. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE transactions on evolutionary computation 6, 2 (2002), 182--197."},{"key":"e_1_2_1_26_1","volume-title":"On-device machine learning: An algorithms and learning theory perspective. arXiv preprint arXiv:1911.00623","author":"Dhar Sauptik","year":"2019","unstructured":"Sauptik Dhar, Junyao Guo, Jiayi Liu, Samarth Tripathi, Unmesh Kurup, and Mohak Shah. 2019. On-device machine learning: An algorithms and learning theory perspective. arXiv preprint arXiv:1911.00623 (2019)."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/3241539.3241559"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.peva.2021.102234"},{"key":"e_1_2_1_29_1","volume-title":"Proceedings of the Sixth International Conference on Learning Representations","author":"Goldie A","year":"2018","unstructured":"A Goldie, A Mirhoseini, B Steiner, H Pham, J Dean, and QV Le. 2018. Hierarchical planning for device placement. In Proceedings of the Sixth International Conference on Learning Representations, Vancouver, BC, Canada. 1--11."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00171"},{"key":"e_1_2_1_31_1","volume-title":"Mistify: Automating DNN Model Porting for On-Device Inference at the Edge.. In NSDI. 705--719.","author":"Guo Peizhen","year":"2021","unstructured":"Peizhen Guo, Bo Hu, and Wenjun Hu. 2021. 
Mistify: Automating DNN Model Porting for On-Device Inference at the Edge.. In NSDI. 705--719."},{"key":"e_1_2_1_32_1","volume-title":"et al","author":"Ham MyungJoo","year":"2021","unstructured":"MyungJoo Ham, Jijoong Moon, Geunsik Lim, Jaeyun Jung, Hyoungjoo Ahn, Wook Song, Sangjung Woo, Parichay Kapoor, Dongju Chae, Gichan Jang, et al . 2021. NNStreamer: Efficient and Agile Development of On-Device AI Systems. In 2021 IEEE\/ACM 43rd International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP). IEEE, 198--207."},{"key":"e_1_2_1_33_1","volume-title":"Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. arXiv preprint arXiv:1510.00149","author":"Han Song","year":"2015","unstructured":"Song Han, Huizi Mao, and William J Dally. 2015. Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. arXiv preprint arXiv:1510.00149 (2015)."},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/2906388.2906396"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_2_1_36_1","volume-title":"Binarized neural networks. Advances in neural information processing systems 29","author":"Hubara Itay","year":"2016","unstructured":"Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio. 2016. Binarized neural networks. Advances in neural information processing systems 29 (2016)."},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/SEC.2018.00016"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-10762-2_58"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00286"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/3419111.3421302"},{"key":"e_1_2_1_41_1","volume-title":"Accelerating Multi-Model Inference by Merging DNNs of Different Weights. 
arXiv preprint arXiv:2009.13062","author":"Jeong Joo Seong","year":"2020","unstructured":"Joo Seong Jeong, Soojeong Kim, Gyeong-In Yu, Yunseong Lee, and Byung-Gon Chun. 2020. Accelerating Multi-Model Inference by Merging DNNs of Different Weights. arXiv preprint arXiv:2009.13062 (2020)."},{"key":"e_1_2_1_42_1","volume-title":"Mnn: A universal and efficient inference engine. arXiv preprint arXiv:2002.12418","author":"Jiang Xiaotang","year":"2020","unstructured":"Xiaotang Jiang, Huan Wang, Yiliu Chen, Ziqi Wu, Lichuan Wang, Bin Zou, Yafeng Yang, Zongyang Cui, Yu Cai, Tianhang Yu, et al. 2020. Mnn: A universal and efficient inference engine. arXiv preprint arXiv:2002.12418 (2020)."},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/3093337.3037698"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSP.2020.2990597"},{"key":"e_1_2_1_45_1","volume-title":"International Conference on Machine Learning. PMLR, 2718--2727","author":"Knoblauch Jeremias","year":"2018","unstructured":"Jeremias Knoblauch and Theodoros Damoulas. 2018. Spatio-temporal Bayesian on-line changepoint detection with model selection. In International Conference on Machine Learning. PMLR, 2718--2727."},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA51647.2021.00016"},{"key":"e_1_2_1_47_1","volume-title":"Nimble: Lightweight and parallel gpu task scheduling for deep learning. arXiv preprint arXiv:2012.02732","author":"Kwon Woosuk","year":"2020","unstructured":"Woosuk Kwon, Gyeong-In Yu, Eunji Jeong, and Byung-Gon Chun. 2020. Nimble: Lightweight and parallel gpu task scheduling for deep learning. arXiv preprint arXiv:2012.02732 (2020)."},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPSN.2016.7460664"},{"key":"e_1_2_1_49_1","volume-title":"On-device neural net inference with mobile gpus. 
arXiv preprint arXiv:1907.01989","author":"Lee Juhyun","year":"2019","unstructured":"Juhyun Lee, Nikolay Chirkov, Ekaterina Ignasheva, Yury Pisarchyk, Mogan Shieh, Fabio Riccardi, Raman Sarokin, Andrei Kulik, and Matthias Grundmann. 2019. On-device neural net inference with mobile gpus. arXiv preprint arXiv:1907.01989 (2019)."},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/3469116.3470014"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1109\/TWC.2019.2946140"},{"key":"e_1_2_1_52_1","volume-title":"Ternary weight networks. arXiv preprint arXiv:1605.04711","author":"Li Fengfu","year":"2016","unstructured":"Fengfu Li, Bo Zhang, and Bin Liu. 2016. Ternary weight networks. arXiv preprint arXiv:1605.04711 (2016)."},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/3290605.3300917"},{"key":"e_1_2_1_54_1","volume-title":"Random Forests for Change Point Detection. arXiv preprint arXiv:2205.04997","author":"Londschien Malte","year":"2022","unstructured":"Malte Londschien, Peter B\u00fchlmann, and Solt Kov\u00e1cs. 2022. Random Forests for Change Point Detection. arXiv preprint arXiv:2205.04997 (2022)."},{"key":"e_1_2_1_55_1","volume-title":"CalcGraph: taming the high costs of deep learning using models. Software and Systems Modeling","author":"Lorentz Joe","year":"2022","unstructured":"Joe Lorentz, Thomas Hartmann, Assaad Moawad, Francois Fouquet, Djamila Aouada, and Yves Le Traon. 2022. CalcGraph: taming the high costs of deep learning using models. Software and Systems Modeling (2022), 1--24."},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10732-009-9103-9"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/3081333.3081359"},{"key":"e_1_2_1_58_1","volume-title":"Alessandro Montanari, and Fahim Kawsar.","author":"Min Chulhong","year":"2021","unstructured":"Chulhong Min, Akhil Mathur, Utku Gunay Acer, Alessandro Montanari, and Fahim Kawsar. 2021. 
SensiX: Bringing MLOPs and Multi-tenant Model Serving to Sensory Edge Devices. arXiv preprint arXiv:2109.03947 (2021)."},{"key":"e_1_2_1_59_1","volume-title":"Utku Gunay Acer, and Fahim Kawsar","author":"Min Chulhong","year":"2020","unstructured":"Chulhong Min, Akhil Mathur, Alessandro Montanari, Utku Gunay Acer, and Fahim Kawsar. 2020. Sensix: A platform for collaborative machine learning on the edge. arXiv preprint arXiv:2012.06035 (2020)."},{"key":"e_1_2_1_60_1","volume-title":"International Conference on Machine Learning. PMLR, 2430--2439","author":"Mirhoseini Azalia","year":"2017","unstructured":"Azalia Mirhoseini, Hieu Pham, Quoc V Le, Benoit Steiner, Rasmus Larsen, Yuefeng Zhou, Naveen Kumar, Mohammad Norouzi, Samy Bengio, and Jeff Dean. 2017. Device placement optimization with reinforcement learning. In International Conference on Machine Learning. PMLR, 2430--2439."},{"key":"e_1_2_1_61_1","unstructured":"Deepak Narayanan Keshav Santhanam Amar Phanishayee and Matei Zaharia. 2018. Accelerating Deep Learning Workloads Through Efficient Multi-Model Execution. In NeurIPS Workshop on Systems for Machine Learning. 
https:\/\/www.microsoft.com\/en-us\/research\/publication\/accelerating-deep-learning-workloads-through-efficient-multi-model-execution\/"},{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00013"},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1109\/MCSoC.2015.10"},{"key":"e_1_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01264-9_17"},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1145\/3503222.3507754"},{"key":"e_1_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.1145\/2786763.2694346"},{"key":"e_1_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46493-0_32"},{"key":"e_1_2_1_68_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00474"},{"key":"e_1_2_1_69_1","doi-asserted-by":"publisher","DOI":"10.1145\/3459637.3482240"},{"key":"e_1_2_1_70_1","doi-asserted-by":"publisher","DOI":"10.1145\/3460352"},{"key":"e_1_2_1_71_1","doi-asserted-by":"publisher","DOI":"10.1145\/3447993.3483272"},{"key":"e_1_2_1_72_1","volume-title":"Parallel Pareto local search based on decomposition","author":"Shi Jialong","year":"2018","unstructured":"Jialong Shi, Qingfu Zhang, and Jianyong Sun. 2018. PPLS\/D: Parallel Pareto local search based on decomposition. IEEE transactions on cybernetics 50, 3 (2018), 1060--1071."},{"key":"e_1_2_1_73_1","volume-title":"Edge computing: Vision and challenges","author":"Shi Weisong","year":"2016","unstructured":"Weisong Shi, Jie Cao, Quan Zhang, Youhuizi Li, and Lanyu Xu. 2016. Edge computing: Vision and challenges. IEEE internet of things journal 3, 5 (2016), 637--646."},{"key":"e_1_2_1_74_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jcss.2014.06.011"},{"key":"e_1_2_1_75_1","volume-title":"Mobilebert: a compact task-agnostic bert for resource-limited devices. arXiv preprint arXiv:2004.02984","author":"Sun Zhiqing","year":"2020","unstructured":"Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, and Denny Zhou. 2020. 
Mobilebert: a compact task-agnostic bert for resource-limited devices. arXiv preprint arXiv:2004.02984 (2020)."},{"key":"e_1_2_1_76_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.308"},{"key":"e_1_2_1_77_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00293"},{"key":"e_1_2_1_78_1","doi-asserted-by":"publisher","DOI":"10.1145\/2678373.2665702"},{"key":"e_1_2_1_79_1","first-page":"15451","article-title":"Efficient algorithms for device placement of dnn graph operators","volume":"33","author":"Tarnawski Jakub M","year":"2020","unstructured":"Jakub M Tarnawski, Amar Phanishayee, Nikhil Devanur, Divya Mahajan, and Fanny Nina Paravecino. 2020. Efficient algorithms for device placement of dnn graph operators. Advances in Neural Information Processing Systems 33 (2020), 15451--15463.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_1_80_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.future.2018.08.004"},{"key":"e_1_2_1_81_1","doi-asserted-by":"crossref","unstructured":"Guibin Wang YiSong Lin and Wei Yi. 2010. Kernel fusion: An effective method for better power efficiency on multithreaded GPU. In 2010 IEEE\/ACM Int'l Conference on Green Computing and Communications & Int'l Conference on Cyber Physical and Social Computing. IEEE 344--350.","DOI":"10.1109\/GreenCom-CPSCom.2010.102"},{"key":"e_1_2_1_82_1","volume-title":"HSIP: A novel task scheduling algorithm for heterogeneous computing. Scientific Programming 2016","author":"Wang Guan","year":"2016","unstructured":"Guan Wang, Yuxin Wang, Hui Liu, and He Guo. 2016. HSIP: A novel task scheduling algorithm for heterogeneous computing. 
Scientific Programming 2016 (2016)."},{"key":"e_1_2_1_83_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2016.7446078"},{"key":"e_1_2_1_84_1","doi-asserted-by":"publisher","DOI":"10.1109\/RTSS46320.2019.00042"},{"key":"e_1_2_1_85_1","doi-asserted-by":"publisher","DOI":"10.1145\/775047.775148"},{"key":"e_1_2_1_86_1","doi-asserted-by":"publisher","DOI":"10.1145\/3274783.3274840"},{"key":"e_1_2_1_87_1","doi-asserted-by":"publisher","DOI":"10.1145\/3372224.3380881"},{"key":"e_1_2_1_88_1","doi-asserted-by":"publisher","DOI":"10.1145\/3372224.3419192"},{"key":"e_1_2_1_89_1","doi-asserted-by":"publisher","DOI":"10.1145\/3485730.3493453"},{"key":"e_1_2_1_90_1","volume-title":"14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17)","author":"Zhang Haoyu","year":"2017","unstructured":"Haoyu Zhang, Ganesh Ananthanarayanan, Peter Bodik, Matthai Philipose, Paramvir Bahl, and Michael J Freedman. 2017. Live video analytics at scale with approximation and {Delay-Tolerance}. In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17). 377--392."},{"key":"e_1_2_1_91_1","doi-asserted-by":"publisher","DOI":"10.1145\/3127479.3127490"},{"key":"e_1_2_1_92_1","doi-asserted-by":"publisher","DOI":"10.1145\/3503222.3507723"},{"key":"e_1_2_1_93_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00191"},{"key":"e_1_2_1_94_1","volume-title":"SPEA2: Improving the strength Pareto evolutionary algorithm. TIK-report 103","author":"Zitzler Eckart","year":"2001","unstructured":"Eckart Zitzler, Marco Laumanns, and Lothar Thiele. 2001. SPEA2: Improving the strength Pareto evolutionary algorithm. TIK-report 103 (2001)."},{"key":"e_1_2_1_95_1","volume-title":"Multiobjective evolutionary algorithms: a comparative case study and the strength Pareto approach","author":"Zitzler Eckart","year":"1999","unstructured":"Eckart Zitzler and Lothar Thiele. 1999. 
Multiobjective evolutionary algorithms: a comparative case study and the strength Pareto approach. IEEE transactions on Evolutionary Computation 3, 4 (1999), 257--271."},{"key":"e_1_2_1_96_1","doi-asserted-by":"publisher","DOI":"10.1109\/TEVC.2003.810758"},{"key":"e_1_2_1_97_1","volume-title":"Neural architecture search with reinforcement learning. arXiv preprint arXiv:1611.01578","author":"Zoph Barret","year":"2016","unstructured":"Barret Zoph and Quoc V Le. 2016. Neural architecture search with reinforcement learning. arXiv preprint arXiv:1611.01578 (2016)."}],"container-title":["Proceedings of the ACM on Measurement and Analysis of Computing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3626793","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3626793","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,23]],"date-time":"2025-08-23T00:14:46Z","timestamp":1755908086000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3626793"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,12,7]]},"references-count":97,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2023,12,7]]}},"alternative-id":["10.1145\/3626793"],"URL":"https:\/\/doi.org\/10.1145\/3626793","relation":{},"ISSN":["2476-1249"],"issn-type":[{"value":"2476-1249","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,12,7]]},"assertion":[{"value":"2023-12-12","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}