{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,11]],"date-time":"2026-06-11T16:03:39Z","timestamp":1781193819113,"version":"3.54.1"},"reference-count":8,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2022,3,30]],"date-time":"2022-03-30T00:00:00Z","timestamp":1648598400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["GetMobile: Mobile Comp. and Comm."],"published-print":{"date-parts":[[2022,3,30]]},"abstract":"<jats:p>Inference latency has become a crucial metric in running Deep Neural Network (DNN) models on various mobile and edge devices. To this end, latency prediction of DNN inference is highly desirable for many tasks where measuring the latency on real devices is infeasible or too costly. Yet it is very challenging and existing approaches fail to achieve a high accuracy of prediction, due to the varying model-inference latency caused by the runtime optimizations on diverse edge devices. In this paper, we propose and develop nn-Meter, a novel and efficient system to accurately predict the DNN inference latency on diverse edge devices. The key idea of nn-Meter is dividing a whole model inference into kernels, i.e., the execution units on a device, and conducting kernel-level prediction. nn-Meter builds atop two key techniques: (i) kernel detection to automatically detect the execution unit of model inference via a set of well-designed test cases; and (ii) adaptive sampling to efficiently sample the most beneficial configurations from a large space to build accurate kernel-level latency predictors. 
nn-Meter achieves significantly high prediction accuracy on four types of edge devices.<\/jats:p>","DOI":"10.1145\/3529706.3529712","type":"journal-article","created":{"date-parts":[[2022,3,30]],"date-time":"2022-03-30T22:24:07Z","timestamp":1648679047000},"page":"19-23","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":7,"title":["nn-METER"],"prefix":"10.1145","volume":"25","author":[{"given":"Li Lyna","family":"Zhang","sequence":"first","affiliation":[{"name":"Microsoft Research"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Shihao","family":"Han","sequence":"additional","affiliation":[{"name":"Microsoft Research, Rose-Hulman Institute of Technology"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jianyu","family":"Wei","sequence":"additional","affiliation":[{"name":"Microsoft Research, University of Science and Technology of China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Ningxin","family":"Zheng","sequence":"additional","affiliation":[{"name":"Microsoft Research"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Ting","family":"Cao","sequence":"additional","affiliation":[{"name":"Microsoft Research"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Yunxin","family":"Liu","sequence":"additional","affiliation":[{"name":"Institute for AI Industry Research (AIR), Tsinghua University"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2022,3,30]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/3308558.3313591"},{"key":"e_1_2_1_2_1","volume-title":"International Conference on Learning Representations (ICLR).","author":"Han Song","year":"2016","unstructured":"Song Han, Huizi Mao, and William J. Dally. 2016. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. International Conference on Learning Representations (ICLR)."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01234-2_48"},{"key":"e_1_2_1_4_1","volume-title":"International Conference on Learning Representations (ICLR).","author":"Cai Han","year":"2019","unstructured":"Han Cai, Ligeng Zhu, and Song Han. 2019. ProxylessNAS: Direct neural architecture search on target task and hardware. In International Conference on Learning Representations (ICLR)."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.01099"},{"key":"e_1_2_1_6_1","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV).","author":"Liu Zechun","year":"2019","unstructured":"Zechun Liu, Haoyuan Mu, Xiangyu Zhang, Zichao Guo, Tim Kwang-Ting Cheng, Xin Yang, and Jian Sun. 2019. MetaPruning: Meta learning for automatic neural architecture channel pruning. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV)."},{"key":"e_1_2_1_7_1","unstructured":"Lukasz Dudziak, Thomas Chau, Mohamed Abdelfattah, Royson Lee, Hyeji Kim, and Nicholas Lane. 2020. BRP-NAS: Prediction-based NAS using GCNs. Advances in Neural Information Processing Systems (NeurIPS)."},{"key":"e_1_2_1_8_1","volume-title":"International Conference on Learning Representations (ICLR).","author":"Dong Xuanyi","year":"2020","unstructured":"Xuanyi Dong and Yi Yang. 2020. NAS-Bench-201: Extending the scope of reproducible neural architecture search. In International Conference on Learning Representations (ICLR)."}],"container-title":["GetMobile: Mobile Computing and Communications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3529706.3529712","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3529706.3529712","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T18:09:13Z","timestamp":1750183753000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3529706.3529712"}},"subtitle":["Towards Accurate Latency Prediction of DNN Inference on Diverse Edge Devices"],"short-title":[],"issued":{"date-parts":[[2022,3,30]]},"references-count":8,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2022,3,30]]}},"alternative-id":["10.1145\/3529706.3529712"],"URL":"https:\/\/doi.org\/10.1145\/3529706.3529712","relation":{},"ISSN":["2375-0529","2375-0537"],"issn-type":[{"value":"2375-0529","type":"print"},{"value":"2375-0537","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,3,30]]},"assertion":[{"value":"2022-03-30","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}