{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,11]],"date-time":"2025-12-11T07:41:15Z","timestamp":1765438875835,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":58,"publisher":"ACM","license":[{"start":{"date-parts":[[2023,6,18]],"date-time":"2023-06-18T00:00:00Z","timestamp":1687046400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"National Key R&D Program of China","award":["2022YFF0604501"],"award-info":[{"award-number":["2022YFF0604501"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62272261"],"award-info":[{"award-number":["62272261"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2023,6,18]]},"DOI":"10.1145\/3581791.3596831","type":"proceedings-article","created":{"date-parts":[[2023,6,16]],"date-time":"2023-06-16T17:52:21Z","timestamp":1686937941000},"page":"503-515","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["ConvReLU++: Reference-based Lossless Acceleration of Conv-ReLU Operations on Mobile CPU"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0009-0003-2889-2266","authenticated-orcid":false,"given":"Rui","family":"Kong","sequence":"first","affiliation":[{"name":"Shanghai Jiao Tong University, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1591-2526","authenticated-orcid":false,"given":"Yuanchun","family":"Li","sequence":"additional","affiliation":[{"name":"Institute for AI Industry Research (AIR), Tsinghua University, Beijing, 
China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-3714-8044","authenticated-orcid":false,"given":"Yizhen","family":"Yuan","sequence":"additional","affiliation":[{"name":"Institute for AI Industry Research (AIR), Tsinghua University, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9266-3044","authenticated-orcid":false,"given":"Linghe","family":"Kong","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, Shanghai, China"}]}],"member":"320","published-online":{"date-parts":[[2023,6,18]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2018.00061"},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPSN.2018.00049"},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2019.01.110"},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW.2012.6239191"},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2021.3066883"},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.01147"},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2019.2903421"},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/3240508.3240654"},{"key":"e_1_3_2_1_9_1","volume-title":"cuDNN: Efficient Primitives for Deep Learning. CoRR abs\/1410.0759","author":"Chetlur Sharan","year":"2014","unstructured":"Sharan Chetlur, Cliff Woolley, Philippe Vandermersch, Jonathan Cohen, John Tran, Bryan Catanzaro, and Evan Shelhamer. 2014. cuDNN: Efficient Primitives for Deep Learning. CoRR abs\/1410.0759 (2014). 
arXiv:1410.0759 http:\/\/arxiv.org\/abs\/1410.0759"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/99.660313"},{"key":"e_1_3_2_1_11_1","first-page":"800","article-title":"TensorFlow Lite Micro: Embedded Machine Learning for TinyML Systems","volume":"3","author":"David Robert","year":"2021","unstructured":"Robert David, Jared Duke, Advait Jain, Vijay Janapa Reddi, Nat Jeffries, Jian Li, Nick Kreeger, Ian Nappier, Meghna Natraj, Tiezhen Wang, Pete Warden, and Rocky Rhodes. 2021. TensorFlow Lite Micro: Embedded Machine Learning for TinyML Systems. In Proceedings of Machine Learning and Systems, Vol. 3. 800--811.","journal-title":"Proceedings of Machine Learning and Systems"},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2021.3062227"},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.205"},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/3297858.3304011"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00272"},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.2307\/2346830"},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"volume-title":"Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming.","author":"Hong Changwan","key":"e_1_3_2_1_19_1","unstructured":"Changwan Hong, Aravind Sukumaran-Rajam, Israt Nisa, Kunal Singh, and P. Sadayappan. 2019. Adaptive Sparse Tiling for Sparse Matrix Multiplication. 
In Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming."},{"key":"e_1_3_2_1_20_1","volume-title":"MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. CoRR abs\/1704.04861","author":"Howard Andrew G.","year":"2017","unstructured":"Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. 2017. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. CoRR abs\/1704.04861 (2017). arXiv:1704.04861 http:\/\/arxiv.org\/abs\/1704.04861"},{"key":"e_1_3_2_1_21_1","volume-title":"SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size. CoRR abs\/1602.07360","author":"Iandola Forrest N.","year":"2016","unstructured":"Forrest N. Iandola, Matthew W. Moskewicz, Khalid Ashraf, Song Han, William J. Dally, and Kurt Keutzer. 2016. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size. CoRR abs\/1602.07360 (2016). 
arXiv:1602.07360 http:\/\/arxiv.org\/abs\/1602.07360"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/3447993.3483274"},{"key":"e_1_3_2_1_23_1","first-page":"1","article-title":"MNN: A Universal and Efficient Inference Engine","volume":"2","author":"Jiang Xiaotang","year":"2020","unstructured":"Xiaotang Jiang, Huan Wang, Yiliu Chen, Ziqi Wu, Lichuan Wang, Bin Zou, Yafeng Yang, Zongyang Cui, Yu Cai, Tianhang Yu, Chengfei Lyu, and Zhihua Wu. 2020. MNN: A Universal and Efficient Inference Engine. In Proceedings of Machine Learning and Systems, Vol. 2. 1--13.","journal-title":"Proceedings of Machine Learning and Systems"},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2021.3092205"},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/1273496.1273556"},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/5.726791"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/3417313.3429382"},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/34.481536"},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCAS.2017.8050797"},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/3432228"},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACPR.2015.7486599"},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2013.09.037"},{"key":"e_1_3_2_1_34_1","unstructured":"Nihui. 2018. NCNN is a high-performance neural network inference framework optimized for the mobile platform. http:\/\/github.com\/tencent\/ncnn.
"},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/3330345.3330384"},{"key":"e_1_3_2_1_36_1","unstructured":"NVIDIA. 2022. cuBLAS Library. https:\/\/developer.nvidia.com\/cublas"},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00166"},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01217"},{"key":"e_1_3_2_1_39_1","volume-title":"Proceedings of the 28th International Conference on Neural Information Processing Systems -","volume":"1","author":"Ren Shaoqing","year":"2015","unstructured":"Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. 2015. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. In Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 1 (Montreal, Canada) (NIPS'15). MIT Press, Cambridge, MA, USA, 91--99."},{"key":"e_1_3_2_1_40_1","volume-title":"INFaaS: Automated Model-less Inference Serving. In 2021 USENIX Annual Technical Conference, USENIX ATC 2021","author":"Romero Francisco","year":"2021","unstructured":"Francisco Romero, Qian Li 0027, Neeraja J. 
Yadwadkar, and Christos Kozyrakis. 2021. INFaaS: Automated Model-less Inference Serving. In 2021 USENIX Annual Technical Conference, USENIX ATC 2021, July 14--16, 2021. USENIX Association, 397--411. https:\/\/www.usenix.org\/conference\/atc21\/presentation\/romero"},{"key":"e_1_3_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"e_1_3_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/2897937.2897995"},{"key":"e_1_3_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2018.00068"},{"key":"e_1_3_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASLP.2021.3082282"},{"key":"e_1_3_2_1_45_1","unstructured":"TIANCHI-Alibaba. 2018. Industrial Defect Dataset. https:\/\/tianchi.aliyun.com\/competition\/entrance\/231682\/introduction?lang=en-us"},{"key":"e_1_3_2_1_46_1","first-page":"860","article-title":"Accelerate Inference of CNNs for Video Analysis While Preserving Exactness Exploiting Activation Sparsity","volume":"3","author":"Wakatsuki Toshiaki","year":"2021","unstructured":"Toshiaki Wakatsuki, Sekitoshi Kanai, and Yasuhiro Fujiwara. 2021. Accelerate Inference of CNNs for Video Analysis While Preserving Exactness Exploiting Activation Sparsity. In Proceedings of Machine Learning and Systems, Vol. 3. 
860--872.","journal-title":"Proceedings of Machine Learning and Systems"},{"key":"e_1_3_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2016.2587683"},{"key":"e_1_3_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00142"},{"key":"e_1_3_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIM.2020.3007292"},{"key":"e_1_3_2_1_50_1","volume-title":"AdaptiveNet: Post-deployment Neural Architecture Adaptation for Diverse Edge Environments. arXiv preprint arXiv:2303.07129","author":"Wen Hao","year":"2023","unstructured":"Hao Wen, Yuanchun Li, Zunshuai Zhang, Shiqi Jiang, Xiaozhou Ye, Ye Ouyang, Ya-Qin Zhang, and Yunxin Liu. 2023. AdaptiveNet: Post-deployment Neural Architecture Adaptation for Diverse Edge Environments. arXiv preprint arXiv:2303.07129 (2023)."},{"key":"e_1_3_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.23919\/DATE54114.2022.9774744"},{"key":"e_1_3_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIE.2020.2982115"},{"key":"e_1_3_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/3241539.3241563"},{"key":"e_1_3_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1145\/3495243.3560517"},{"key":"e_1_3_2_1_55_1","volume-title":"2018 USENIX Annual Technical Conference, USENIX ATC 2018","author":"Zhang Minjia","year":"2018","unstructured":"Minjia Zhang, Samyam Rajbhandari, Wenhan Wang, and Yuxiong He. 2018. 
DeepCPU: Serving RNN-based Deep Learning Models 10x Faster. In 2018 USENIX Annual Technical Conference, USENIX ATC 2018, Boston, MA, USA, July 11--13, 2018. USENIX Association, 951--965. https:\/\/www.usenix.org\/conference\/atc18\/presentation\/zhang-minjia"},{"key":"e_1_3_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.23919\/DATE.2018.8342010"},{"key":"e_1_3_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1109\/ASPDAC.2016.7428074"},{"key":"e_1_3_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.232"}],"event":{"name":"MobiSys '23: 21st Annual International Conference on Mobile Systems, Applications and Services","sponsor":["SIGMOBILE ACM Special Interest Group on Mobility of Systems, Users, Data and Computing","SIGOPS ACM Special Interest Group on Operating Systems"],"location":"Helsinki Finland","acronym":"MobiSys '23"},"container-title":["Proceedings of the 21st Annual International Conference on Mobile Systems, Applications and Services"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3581791.3596831","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:36:30Z","timestamp":1750178190000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3581791.3596831"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,6,18]]},"references-count":58,"alternative-id":["10.1145\/3581791.3596831","10.1145\/3581791"],"URL":"https:\/\/doi.org\/10.1145\/3581791.3596831","relation":{},"subject":[],"published":{"date-parts":[[2023,6,18]]},"assertion":[{"value":"2023-06-18","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}