{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,3]],"date-time":"2026-04-03T15:08:14Z","timestamp":1775228894113,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":64,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,9,9]],"date-time":"2021-09-09T00:00:00Z","timestamp":1631145600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61872180"],"award-info":[{"award-number":["61872180"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,9,9]]},"DOI":"10.1145\/3447993.3448625","type":"proceedings-article","created":{"date-parts":[[2021,3,12]],"date-time":"2021-03-12T23:41:25Z","timestamp":1615592485000},"page":"215-228","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":65,"title":["AsyMo"],"prefix":"10.1145","author":[{"given":"Manni","family":"Wang","sequence":"first","affiliation":[{"name":"Xi'an Jiao Tong University"}]},{"given":"Shaohua","family":"Ding","sequence":"additional","affiliation":[{"name":"Nanjing University"}]},{"given":"Ting","family":"Cao","sequence":"additional","affiliation":[{"name":"Microsoft Research"}]},{"given":"Yunxin","family":"Liu","sequence":"additional","affiliation":[{"name":"Microsoft Research"}]},{"given":"Fengyuan","family":"Xu","sequence":"additional","affiliation":[{"name":"Nanjing University"}]}],"member":"320","published-online":{"date-parts":[[2021,9,9]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"2019. stress-android. https:\/\/github.com\/m-ric\/stress-android\/  2019. stress-android . https:\/\/github.com\/m-ric\/stress-android\/"},{"key":"e_1_3_2_1_2_1","unstructured":"ARM. 2017. ARM documentation set for DynamIQ Shared Unit. http:\/\/infocenter.arm.com\/help\/index.jsp?topic=\/com.arm.doc.subset.cortexa.dsunit\/index.html  ARM. 2017. ARM documentation set for DynamIQ Shared Unit . http:\/\/infocenter.arm.com\/help\/index.jsp?topic=\/com.arm.doc.subset.cortexa.dsunit\/index.html"},{"key":"e_1_3_2_1_3_1","unstructured":"ARM. 2019. Energy Aware Scheduling (EAS). https:\/\/developer.arm.com\/tools-and-software\/open-source-software\/linux-kernel\/energy-aware-scheduling  ARM. 2019. Energy Aware Scheduling (EAS) . https:\/\/developer.arm.com\/tools-and-software\/open-source-software\/linux-kernel\/energy-aware-scheduling"},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.micpro.2020.102996"},{"key":"e_1_3_2_1_5_1","volume-title":"PredJoule: A Timing-Predictable Energy Optimization Framework for Deep Neural Networks","author":"Bateni Soroush","unstructured":"Soroush Bateni , Husheng Zhou , Yuankun Zhu , and Cong Liu . 2018. PredJoule: A Timing-Predictable Energy Optimization Framework for Deep Neural Networks . In RTSS. IEEE Computer Society , 107--118. Soroush Bateni, Husheng Zhou, Yuankun Zhu, and Cong Liu. 2018. PredJoule: A Timing-Predictable Energy Optimization Framework for Deep Neural Networks. In RTSS. IEEE Computer Society, 107--118."},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2018.2877890"},{"key":"e_1_3_2_1_7_1","volume-title":"ANSI C Coding Methodology. In International Conference on Supercomputing. ACM, 340--347","author":"Bilmes Jeff A.","year":"1997","unstructured":"Jeff A. Bilmes , Krste Asanovic , Chee-Whye Chin , and James Demmel . 1997 . Optimizing Matrix Multiply Using PHiPAC: A Portable, High-Performance , ANSI C Coding Methodology. In International Conference on Supercomputing. ACM, 340--347 . Jeff A. Bilmes, Krste Asanovic, Chee-Whye Chin, and James Demmel. 1997. Optimizing Matrix Multiply Using PHiPAC: A Portable, High-Performance, ANSI C Coding Methodology. In International Conference on Supercomputing. ACM, 340--347."},{"key":"e_1_3_2_1_8_1","unstructured":"Dominik Brodowski. 2020. CPUFreq Governors. https:\/\/www.kernel.org\/doc\/Documentation\/cpu-freq\/governors.txt  Dominik Brodowski. 2020. CPUFreq Governors . https:\/\/www.kernel.org\/doc\/Documentation\/cpu-freq\/governors.txt"},{"key":"e_1_3_2_1_9_1","volume-title":"High Performance Convolutional Neural Networks for Document Processing. In Tenth International Workshop on Frontiers in Handwriting Recognition.","author":"Chellapilla Kumar","year":"2006","unstructured":"Kumar Chellapilla , Sidd Puri , and Patrice Simard . 2006 . High Performance Convolutional Neural Networks for Document Processing. In Tenth International Workshop on Frontiers in Handwriting Recognition. Kumar Chellapilla, Sidd Puri, and Patrice Simard. 2006. High Performance Convolutional Neural Networks for Document Processing. In Tenth International Workshop on Frontiers in Handwriting Recognition."},{"key":"e_1_3_2_1_10_1","volume-title":"TVM: An Automated End-to-End Optimizing Compiler for Deep Learning. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18)","author":"Chen Tianqi","year":"2018","unstructured":"Tianqi Chen , Thierry Moreau , Ziheng Jiang , Lianmin Zheng , Eddie Yan , Haichen Shen , Meghan Cowan , Leyuan Wang , Yuwei Hu , Luis Ceze , Carlos Guestrin , and Arvind Krishnamurthy . 2018 . TVM: An Automated End-to-End Optimizing Compiler for Deep Learning. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18) . USENIX Association, Carlsbad, CA, 578--594. Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Haichen Shen, Meghan Cowan, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, and Arvind Krishnamurthy. 2018. TVM: An Automated End-to-End Optimizing Compiler for Deep Learning. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18). USENIX Association, Carlsbad, CA, 578--594."},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/3007787.3001177"},{"key":"e_1_3_2_1_12_1","unstructured":"Intel Corporation. 2004. Enhanced Intel Speed Step Technology for the Intel Pentium M Processor (White Paper).  Intel Corporation. 2004. Enhanced Intel Speed Step Technology for the Intel Pentium M Processor (White Paper)."},{"key":"e_1_3_2_1_13_1","volume-title":"Autotuning OpenCL Workgroup Size for Stencil Patterns. CoRR abs\/1511.02490","author":"Cummins Chris","year":"2015","unstructured":"Chris Cummins , Pavlos Petoumenos , Michel Steuwer , and Hugh Leather . 2015. Autotuning OpenCL Workgroup Size for Stencil Patterns. CoRR abs\/1511.02490 ( 2015 ). Chris Cummins, Pavlos Petoumenos, Michel Steuwer, and Hugh Leather. 2015. Autotuning OpenCL Workgroup Size for Stencil Patterns. CoRR abs\/1511.02490 (2015)."},{"key":"e_1_3_2_1_14_1","unstructured":"Marat Dukhan. 2018. NNPack acceleration package for neural networks on multi-core CPUs. https:\/\/github.com\/Maratyszcza\/NNPACK  Marat Dukhan. 2018. NNPack acceleration package for neural networks on multi-core CPUs . https:\/\/github.com\/Maratyszcza\/NNPACK"},{"key":"e_1_3_2_1_15_1","unstructured":"Eigen. 2020. Eigen. https:\/\/eigen.tuxfamily.org\/  Eigen. 2020. Eigen . https:\/\/eigen.tuxfamily.org\/"},{"key":"e_1_3_2_1_16_1","volume-title":"Machine Learning Based Auto-Tuning for Enhanced OpenCL Performance Portability. In IPDPS Workshops. IEEE Computer Society, 1231--1240","author":"Thomas","unstructured":"Thomas L. Falch and Anne C. Elster. 2015 . Machine Learning Based Auto-Tuning for Enhanced OpenCL Performance Portability. In IPDPS Workshops. IEEE Computer Society, 1231--1240 . Thomas L. Falch and Anne C. Elster. 2015. Machine Learning Based Auto-Tuning for Enhanced OpenCL Performance Portability. In IPDPS Workshops. IEEE Computer Society, 1231--1240."},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE.2013.6606553"},{"key":"e_1_3_2_1_18_1","unstructured":"Google. 2019. TensorFlow: An end-to-end open source machine learning platform. https:\/\/www.tensorflow.org\/  Google. 2019. TensorFlow: An end-to-end open source machine learning platform . https:\/\/www.tensorflow.org\/"},{"key":"e_1_3_2_1_19_1","unstructured":"Google. 2019. TensorFlow Lite: Deploy machine learning models on mobile and IoT devices. https:\/\/www.tensorflow.org\/lite  Google. 2019. TensorFlow Lite: Deploy machine learning models on mobile and IoT devices . https:\/\/www.tensorflow.org\/lite"},{"key":"e_1_3_2_1_20_1","unstructured":"Google. 2020. Edge TPU. https:\/\/cloud.google.com\/edge-tpu\/  Google. 2020. Edge TPU . https:\/\/cloud.google.com\/edge-tpu\/"},{"key":"e_1_3_2_1_21_1","unstructured":"Peter Greenhalgh. 2011. Big.LITTLE Processing with ARM CortexTM-A15 & Cortex-A7. https:\/\/www.cl.cam.ac.uk\/~rdm34\/big.LITTLE.pdf  Peter Greenhalgh. 2011. Big.LITTLE Processing with ARM Cortex TM -A15 & Cortex-A7 . https:\/\/www.cl.cam.ac.uk\/~rdm34\/big.LITTLE.pdf"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_1_23_1","unstructured":"HiSilicon. 2019. Kirin. http:\/\/www.hisilicon.com\/en\/Products\/ProductList\/Kirin  HiSilicon. 2019. Kirin . http:\/\/www.hisilicon.com\/en\/Products\/ProductList\/Kirin"},{"key":"e_1_3_2_1_24_1","first-page":"1","article-title":"GRNN","volume":"41","author":"Holmes Connor","year":"2019","unstructured":"Connor Holmes , Daniel Mawhirter , Yuxiong He , Feng Yan , and Bo Wu . 2019 . GRNN : Low-Latency and Scalable RNN Inference on GPUs. In EuroSys. ACM , 41 : 1 -- 41 :16. Connor Holmes, Daniel Mawhirter, Yuxiong He, Feng Yan, and Bo Wu. 2019. GRNN: Low-Latency and Scalable RNN Inference on GPUs. In EuroSys. ACM, 41:1--41:16.","journal-title":"Low-Latency and Scalable RNN Inference on GPUs. In EuroSys. ACM"},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"crossref","unstructured":"Andrew Howard Mark Sandler Grace Chu Liang-Chieh Chen Bo Chen Mingxing Tan Weijun Wang Yukun Zhu Ruoming Pang Vijay Vasudevan Quoc V. Le and Hartwig Adam. 2019. Searching for MobileNetV3. arXiv preprint arXiv:1905.02244.  Andrew Howard Mark Sandler Grace Chu Liang-Chieh Chen Bo Chen Mingxing Tan Weijun Wang Yukun Zhu Ruoming Pang Vijay Vasudevan Quoc V. Le and Hartwig Adam. 2019. Searching for MobileNetV3. arXiv preprint arXiv:1905.02244.","DOI":"10.1109\/ICCV.2019.00140"},{"key":"e_1_3_2_1_26_1","unstructured":"Andrew G. Howard Menglong Zhu Bo Chen Dmitry Kalenichenko Weijun Wang Tobias Weyand Marco Andreetto and Hartwig Adam. [n.d.]. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. CoRR ([n. d.]). http:\/\/arxiv.org\/abs\/1704.04861  Andrew G. Howard Menglong Zhu Bo Chen Dmitry Kalenichenko Weijun Wang Tobias Weyand Marco Andreetto and Hartwig Adam. [n.d.]. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. CoRR ([n. d.]). http:\/\/arxiv.org\/abs\/1704.04861"},{"key":"e_1_3_2_1_27_1","volume-title":"SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and &lt;1MB model size. CoRR abs\/1602.07360","author":"Iandola Forrest N.","year":"2016","unstructured":"Forrest N. Iandola , Matthew W. Moskewicz , Khalid Ashraf , Song Han , William J. Dally , and Kurt Keutzer . 2016. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and &lt;1MB model size. CoRR abs\/1602.07360 ( 2016 ). http:\/\/arxiv.org\/abs\/1602.07360 Forrest N. Iandola, Matthew W. Moskewicz, Khalid Ashraf, Song Han, William J. Dally, and Kurt Keutzer. 2016. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and &lt;1MB model size. CoRR abs\/1602.07360 (2016). http:\/\/arxiv.org\/abs\/1602.07360"},{"key":"e_1_3_2_1_28_1","unstructured":"Monsoon Solutions Inc. 2019. Monsoon. https:\/\/www.msoon.com\/online-store  Monsoon Solutions Inc. 2019. Monsoon . https:\/\/www.msoon.com\/online-store"},{"key":"e_1_3_2_1_29_1","unstructured":"Intel. 2020. OpenVINO Deploy high-performance deep learning inference. https:\/\/software.intel.com\/en-us\/openvino-toolkit  Intel. 2020. OpenVINO Deploy high-performance deep learning inference . https:\/\/software.intel.com\/en-us\/openvino-toolkit"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/2854038.2854047"},{"key":"e_1_3_2_1_31_1","volume-title":"2016 IEEE\/ACM International Symposium on Code Generation and Optimization (CGO). 24--35","author":"Jibaja I.","unstructured":"I. Jibaja , T. Cao , S. M. Blackburn , and K. S. McKinley . 2016. Portable performance on Asymmetric Multicore Processors . In 2016 IEEE\/ACM International Symposium on Code Generation and Optimization (CGO). 24--35 . I. Jibaja, T. Cao, S. M. Blackburn, and K. S. McKinley. 2016. Portable performance on Asymmetric Multicore Processors. In 2016 IEEE\/ACM International Symposium on Code Generation and Optimization (CGO). 24--35."},{"key":"e_1_3_2_1_32_1","volume-title":"Proceedings of the 25th International Conference on Neural Information Processing Systems -","volume":"1","author":"Krizhevsky Alex","unstructured":"Alex Krizhevsky , Ilya Sutskever , and Geoffrey E. Hinton . 2012. ImageNet Classification with Deep Convolutional Neural Networks . In Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 1 (Lake Tahoe, Nevada) (NIPS'12). Curran Associates Inc., USA, 1097--1105. http:\/\/dl.acm.org\/citation.cfm?id=2999134.2999257 Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. ImageNet Classification with Deep Convolutional Neural Networks. In Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 1 (Lake Tahoe, Nevada) (NIPS'12). Curran Associates Inc., USA, 1097--1105. http:\/\/dl.acm.org\/citation.cfm?id=2999134.2999257"},{"key":"e_1_3_2_1_33_1","volume-title":"Farkas","author":"Kumar Rakesh","year":"2004","unstructured":"Rakesh Kumar , Dean M. Tullsen , Parthasarathy Ranganathan , Norman P. Jouppi , and Keith I . Farkas . 2004 . Single-ISA Heterogeneous Multi-Core Architectures for Multithreaded Workload Performance. In ISCA. IEEE Computer Society , 64--75. Rakesh Kumar, Dean M. Tullsen, Parthasarathy Ranganathan, Norman P. Jouppi, and Keith I. Farkas. 2004. Single-ISA Heterogeneous Multi-Core Architectures for Multithreaded Workload Performance. In ISCA. IEEE Computer Society, 64--75."},{"key":"e_1_3_2_1_34_1","volume-title":"Special Issue","author":"Lam Monica D","year":"1991","unstructured":"Monica D Lam , Edward E Rothberg , and Michael E Wolf . 1991. The cache performance and optimizations of blocked algorithms. ACM SIGOPS Operating Systems Review 25 , Special Issue ( 1991 ), 63--74. Monica D Lam, Edward E Rothberg, and Michael E Wolf. 1991. The cache performance and optimizations of blocked algorithms. ACM SIGOPS Operating Systems Review 25, Special Issue (1991), 63--74."},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2019.2939785"},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-01970-8_89"},{"key":"e_1_3_2_1_37_1","unstructured":"Sicong Liu Yingyan Lin Zimu Zhou Kaiming Nan Hui Liu and Junzhao Du. 2018. On-Demand Deep Model Compression for Mobile Devices: A Usage-Driven Model Selection Framework. In MobiSys. ACM 389--400.  Sicong Liu Yingyan Lin Zimu Zhou Kaiming Nan Hui Liu and Junzhao Du. 2018. On-Demand Deep Model Compression for Mobile Devices: A Usage-Driven Model Selection Framework. In MobiSys . ACM 389--400."},{"key":"e_1_3_2_1_38_1","volume-title":"Berg","author":"Liu Wei","year":"2016","unstructured":"Wei Liu , Dragomir Anguelov , Dumitru Erhan , Christian Szegedy , Scott Reed , Cheng-Yang Fu , and Alexander C . Berg . 2016 . SSD : Single Shot MultiBox Detector. http:\/\/arxiv.org\/abs\/1512.02325 To appear. Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C. Berg. 2016. SSD: Single Shot MultiBox Detector. http:\/\/arxiv.org\/abs\/1512.02325 To appear."},{"key":"e_1_3_2_1_39_1","volume-title":"Optimizing CNN Model Inference on CPUs. In 2019 USENIX Annual Technical Conference (USENIX ATC 19)","author":"Liu Yizhi","year":"2019","unstructured":"Yizhi Liu , Yao Wang , Ruofei Yu , Mu Li , Vin Sharma , and Yida Wang . 2019 . Optimizing CNN Model Inference on CPUs. In 2019 USENIX Annual Technical Conference (USENIX ATC 19) . USENIX Association, Renton, WA, 1025--1040. https:\/\/www.usenix.org\/conference\/atc19\/presentation\/liu-yizhi Yizhi Liu, Yao Wang, Ruofei Yu, Mu Li, Vin Sharma, and Yida Wang. 2019. Optimizing CNN Model Inference on CPUs. In 2019 USENIX Annual Technical Conference (USENIX ATC 19). USENIX Association, Renton, WA, 1025--1040. https:\/\/www.usenix.org\/conference\/atc19\/presentation\/liu-yizhi"},{"key":"e_1_3_2_1_40_1","volume-title":"Yiming Wu and Bert Maher","author":"Marat Dukhan Hao Lu","year":"2018","unstructured":"Hao Lu Marat Dukhan , Yiming Wu and Bert Maher . 2018 . Quantized Neural Network PACKage . https:\/\/github.com\/pytorch\/QNNPACK Hao Lu Marat Dukhan, Yiming Wu and Bert Maher. 2018. Quantized Neural Network PACKage. https:\/\/github.com\/pytorch\/QNNPACK"},{"key":"e_1_3_2_1_41_1","volume-title":"Proceedings of the 1996 Annual Conference on USENIX Annual Technical Conference (USENIX ATC'96).","author":"McVoy Larry","year":"1996","unstructured":"Larry McVoy and Carl Staelin . 1996 . Lmbench: Portable Tools for Performance Analysis . In Proceedings of the 1996 Annual Conference on USENIX Annual Technical Conference (USENIX ATC'96). Larry McVoy and Carl Staelin. 1996. Lmbench: Portable Tools for Performance Analysis. In Proceedings of the 1996 Annual Conference on USENIX Annual Technical Conference (USENIX ATC'96)."},{"key":"e_1_3_2_1_42_1","unstructured":"Microsoft. 2019. ONNX Runtime. https:\/\/github.com\/microsoft\/onnxruntime  Microsoft. 2019. ONNX Runtime . https:\/\/github.com\/microsoft\/onnxruntime"},{"key":"e_1_3_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2010-343"},{"key":"e_1_3_2_1_44_1","unstructured":"Intel Movidius. 2020. Ultimate Performance at Ultra-Low PowerIntel Movidius Myriad X VPU. https:\/\/www.movidius.com\/myriadx  Intel Movidius. 2020. Ultimate Performance at Ultra-Low PowerIntel Movidius Myriad X VPU . https:\/\/www.movidius.com\/myriadx"},{"key":"e_1_3_2_1_45_1","volume-title":"CLTune: A Generic Auto-Tuner for OpenCL Kernels","author":"Nugteren Cedric","unstructured":"Cedric Nugteren and Valeriu Codreanu . 2015. CLTune: A Generic Auto-Tuner for OpenCL Kernels . In MCSoC. IEEE Computer Society , 195--202. Cedric Nugteren and Valeriu Codreanu. 2015. CLTune: A Generic Auto-Tuner for OpenCL Kernels. In MCSoC. IEEE Computer Society, 195--202."},{"key":"e_1_3_2_1_46_1","unstructured":"OpenMP. 2020. The OpenMP API specification for parallel programming. https:\/\/www.openmp.org\/  OpenMP. 2020. The OpenMP API specification for parallel programming . https:\/\/www.openmp.org\/"},{"key":"e_1_3_2_1_47_1","unstructured":"Qualcomm. 2019. Snapdragon 845 Mobile Platform. https:\/\/www.qualcomm.com\/products\/snapdragon-845-mobile-platform  Qualcomm. 2019. Snapdragon 845 Mobile Platform . https:\/\/www.qualcomm.com\/products\/snapdragon-845-mobile-platform"},{"key":"e_1_3_2_1_48_1","unstructured":"Qualcomm. 2020. Snapdragon Neural Processing Engine SDK. https:\/\/developer.qualcomm.com\/docs\/snpe\/overview.html  Qualcomm. 2020. Snapdragon Neural Processing Engine SDK . https:\/\/developer.qualcomm.com\/docs\/snpe\/overview.html"},{"key":"e_1_3_2_1_49_1","unstructured":"Rockchip. 2020. High performance AI development platform. http:\/\/t.rockchips.com\/en\/  Rockchip. 2020. High performance AI development platform . http:\/\/t.rockchips.com\/en\/"},{"key":"e_1_3_2_1_50_1","volume-title":"Very Deep Convolutional Networks for Large-Scale Image Recognition. In 3rd International Conference on Learning Representations, ICLR 2015","author":"Simonyan Karen","year":"2015","unstructured":"Karen Simonyan and Andrew Zisserman . 2015 . Very Deep Convolutional Networks for Large-Scale Image Recognition. In 3rd International Conference on Learning Representations, ICLR 2015 , San Diego, CA, USA, May 7--9 , 2015. http:\/\/arxiv.org\/abs\/1409.1556 Karen Simonyan and Andrew Zisserman. 2015. Very Deep Convolutional Networks for Large-Scale Image Recognition. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7--9, 2015. http:\/\/arxiv.org\/abs\/1409.1556"},{"key":"e_1_3_2_1_51_1","volume-title":"Towards Pervasive and User Satisfactory CNN across GPU Microarchitectures","author":"Song Mingcong","unstructured":"Mingcong Song , Yang Hu , Huixiang Chen , and Tao Li. 2017. Towards Pervasive and User Satisfactory CNN across GPU Microarchitectures . In HPCA. IEEE Computer Society , 1--12. Mingcong Song, Yang Hu, Huixiang Chen, and Tao Li. 2017. Towards Pervasive and User Satisfactory CNN across GPU Microarchitectures. In HPCA. IEEE Computer Society, 1--12."},{"key":"e_1_3_2_1_52_1","volume-title":"SOL: Effortless Device Support for AI Frameworks without Source Code Changes. arXiv:2003.10688 [cs.DC]","author":"Weber Nicolas","year":"2020","unstructured":"Nicolas Weber and Felipe Huici . 2020 . SOL: Effortless Device Support for AI Frameworks without Source Code Changes. arXiv:2003.10688 [cs.DC] Nicolas Weber and Felipe Huici. 2020. SOL: Effortless Device Support for AI Frameworks without Source Code Changes. arXiv:2003.10688 [cs.DC]"},{"key":"e_1_3_2_1_53_1","volume-title":"Seinstra","author":"Werkhoven Benvan","year":"2014","unstructured":"Benvan Werkhoven , Jason Maassen , Henri E. Bal , and Frank J . Seinstra . 2014 . Optimizing convolution operations on GPUs using adaptive tiling. In Future Generation Computer System . 14--26. Benvan Werkhoven, Jason Maassen, Henri E.Bal, and Frank J.Seinstra. 2014. Optimizing convolution operations on GPUs using adaptive tiling. In Future Generation Computer System. 14--26."},{"key":"e_1_3_2_1_54_1","volume-title":"SC'98: Proceedings of the 1998 ACM\/IEEE conference on Supercomputing. IEEE, 38--38","author":"Clinton Whaley R","year":"1998","unstructured":"R Clinton Whaley and Jack J Dongarra . 1998 . Automatically tuned linear algebra software . In SC'98: Proceedings of the 1998 ACM\/IEEE conference on Supercomputing. IEEE, 38--38 . R Clinton Whaley and Jack J Dongarra. 1998. Automatically tuned linear algebra software. In SC'98: Proceedings of the 1998 ACM\/IEEE conference on Supercomputing. IEEE, 38--38."},{"key":"e_1_3_2_1_55_1","volume-title":"Dongarra","author":"Clinton Whaley R.","year":"1999","unstructured":"R. Clinton Whaley and Jack J . Dongarra . 1999 . Automatically Tuned Linear Algebra Software. In PPSC. SIAM. R. Clinton Whaley and Jack J. Dongarra. 1999. Automatically Tuned Linear Algebra Software. In PPSC. SIAM."},{"key":"e_1_3_2_1_56_1","first-page":"2001","article-title":"Automated Empirical Optimization of Software and the ATLAS Project","volume":"27","author":"Whaley R. Clint","year":"2000","unstructured":"R. Clint Whaley , Antoine Petitet , and Jack J. Dongarra . 2000 . Automated Empirical Optimization of Software and the ATLAS Project . PARALLEL COMPUTING 27 (2000), 2001 . R. Clint Whaley, Antoine Petitet, and Jack J. Dongarra. 2000. Automated Empirical Optimization of Software and the ATLAS Project. PARALLEL COMPUTING 27 (2000), 2001.","journal-title":"PARALLEL COMPUTING"},{"key":"e_1_3_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/1498765.1498785"},{"key":"e_1_3_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2019.00048"},{"key":"e_1_3_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1145\/3308558.3313591"},{"key":"e_1_3_2_1_60_1","volume-title":"A method to estimate the energy consumption of deep neural networks","author":"Yang Tien-Ju","year":"1916","unstructured":"Tien-Ju Yang , Yu-Hsin Chen , Joel S. Emer , and Vivienne Sze . 2017. A method to estimate the energy consumption of deep neural networks . In ACSSC. IEEE , 1916 --1920. Tien-Ju Yang, Yu-Hsin Chen, Joel S. Emer, and Vivienne Sze. 2017. A method to estimate the energy consumption of deep neural networks. In ACSSC. IEEE, 1916--1920."},{"key":"e_1_3_2_1_61_1","volume-title":"Designing Energy-Efficient Convolutional Neural Networks Using Energy-Aware Pruning","author":"Yang Tien-Ju","unstructured":"Tien-Ju Yang , Yu-Hsin Chen , and Vivienne Sze . 2017. Designing Energy-Efficient Convolutional Neural Networks Using Energy-Aware Pruning . In CVPR. IEEE Computer Society , 6071--6079. Tien-Ju Yang, Yu-Hsin Chen, and Vivienne Sze. 2017. Designing Energy-Efficient Convolutional Neural Networks Using Energy-Aware Pruning. In CVPR. IEEE Computer Society, 6071--6079."},{"key":"e_1_3_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2020.3045279"},{"key":"e_1_3_2_1_63_1","unstructured":"ZeptoLab. 2020. Cut the Rope. https:\/\/cuttherope.net\/#ctr  ZeptoLab. 2020. Cut the Rope . https:\/\/cuttherope.net\/#ctr"},{"key":"e_1_3_2_1_64_1","volume-title":"2018 USENIX Annual Technical Conference (USENIX ATC 18)","author":"Zhang Minjia","year":"2018","unstructured":"Minjia Zhang , Samyam Rajbhandari , Wenhan Wang , and Yuxiong He . 2018 . Deep-CPU: Serving RNN-based Deep Learning Models 10x Faster . In 2018 USENIX Annual Technical Conference (USENIX ATC 18) . USENIX Association, Boston, MA, 951--965. https:\/\/www.usenix.org\/conference\/atc18\/presentation\/zhang-minjia Minjia Zhang, Samyam Rajbhandari, Wenhan Wang, and Yuxiong He. 2018. Deep-CPU: Serving RNN-based Deep Learning Models 10x Faster. In 2018 USENIX Annual Technical Conference (USENIX ATC 18). USENIX Association, Boston, MA, 951--965. https:\/\/www.usenix.org\/conference\/atc18\/presentation\/zhang-minjia"}],"event":{"name":"ACM MobiCom '21: The 27th Annual International Conference on Mobile Computing and Networking","location":"New Orleans Louisiana","acronym":"ACM MobiCom '21","sponsor":["SIGMOBILE ACM Special Interest Group on Mobility of Systems, Users, Data and Computing"]},"container-title":["Proceedings of the 27th Annual International Conference on Mobile Computing and Networking"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3447993.3448625","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3447993.3448625","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T21:28:24Z","timestamp":1750195704000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3447993.3448625"}},"subtitle":["scalable and efficient deep-learning inference on asymmetric mobile CPUs"],"short-title":[],"issued":{"date-parts":[[2021,9,9]]},"references-count":64,"alternative-id":["10.1145\/3447993.3448625","10.1145\/3447993"],"URL":"https:\/\/doi.org\/10.1145\/3447993.3448625","relation":{},"subject":[],"published":{"date-parts":[[2021,9,9]]},"assertion":[{"value":"2021-09-09","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}