{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,5]],"date-time":"2026-04-05T05:36:21Z","timestamp":1775367381860,"version":"3.50.1"},"reference-count":256,"publisher":"Association for Computing Machinery (ACM)","issue":"10","license":[{"start":{"date-parts":[[2024,6,24]],"date-time":"2024-06-24T00:00:00Z","timestamp":1719187200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"National Science and Technology Council, Taiwan","award":["NSTC-112-2628-E-002-033-MY4, NSTC-112-2634-F-002-002-MBK, and NSTC-112-2218-E-A49-023"],"award-info":[{"award-number":["NSTC-112-2628-E-002-033-MY4, NSTC-112-2634-F-002-002-MBK, and NSTC-112-2218-E-A49-023"]}]},{"name":"National Key Fields Industry-University Cooperation and Skilled Personnel Training Act"},{"name":"Ministry of Education (MOE) and industry partners in Taiwan"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Comput. Surv."],"published-print":{"date-parts":[[2024,10,31]]},"abstract":"<jats:p>Over the past decade, the dominance of deep learning has prevailed across various domains of artificial intelligence, including natural language processing, computer vision, and biomedical signal processing. While there have been remarkable improvements in model accuracy, deploying these models on lightweight devices, such as mobile phones and microcontrollers, is constrained by limited resources. In this survey, we provide comprehensive design guidance tailored for these devices, detailing the meticulous design of lightweight models, compression methods, and hardware acceleration strategies. The principal goal of this work is to explore methods and concepts for getting around hardware constraints without compromising the model\u2019s accuracy. Additionally, we explore two notable paths for lightweight deep learning in the future: deployment techniques for TinyML and Large Language Models. 
Although these paths undoubtedly have potential, they also present significant challenges, encouraging research into unexplored areas.<\/jats:p>","DOI":"10.1145\/3657282","type":"journal-article","created":{"date-parts":[[2024,5,11]],"date-time":"2024-05-11T11:09:59Z","timestamp":1715425799000},"page":"1-42","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":154,"title":["Lightweight Deep Learning for Resource-Constrained Environments: A Survey"],"prefix":"10.1145","volume":"56","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2101-2997","authenticated-orcid":false,"given":"Hou-I","family":"Liu","sequence":"first","affiliation":[{"name":"National Yang Ming Chiao Tung University, Hsinchu, Taiwan"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-9907-3253","authenticated-orcid":false,"given":"Marco","family":"Galindo","sequence":"additional","affiliation":[{"name":"National Yang Ming Chiao Tung University, Hsinchu, Taiwan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5652-4327","authenticated-orcid":false,"given":"Hongxia","family":"Xie","sequence":"additional","affiliation":[{"name":"Jilin University, Changchun, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4517-0391","authenticated-orcid":false,"given":"Lai-Kuan","family":"Wong","sequence":"additional","affiliation":[{"name":"Multimedia University, Cyberjaya, Malaysia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2216-077X","authenticated-orcid":false,"given":"Hong-Han","family":"Shuai","sequence":"additional","affiliation":[{"name":"National Yang Ming Chiao Tung University, Hsinchu, Taiwan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0475-3689","authenticated-orcid":false,"given":"Yung-Hui","family":"Li","sequence":"additional","affiliation":[{"name":"Foxconn Research, Taipei, Taiwan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4662-7875","authenticated-orcid":false,"given":"Wen-Huang","family":"Cheng","sequence":"additional","affiliation":[{"name":"National Taiwan University, Taipei, Taiwan"}]}],"member":"320","published-online":{"date-parts":[[2024,6,24]]},"reference":[{"key":"e_1_3_1_2_2","first-page":"265","volume-title":"OSDI","author":"Abadi M.","year":"2016","unstructured":"M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, M. Kudlur, J. Levenberg, R. Monga, S. Moore, D. G. Murray, B. Steiner, P. Tucker, V. Vasudevan, P. Warden, M. Wicke, Y. Yu, and X. Zheng. 2016. TensorFlow: A system for large-scale machine learning. In OSDI. 265\u2013283."},{"key":"e_1_3_1_3_2","unstructured":"M. S. Abdelfattah A. Mehrotra \u0141. Dudziak and N. D. Lane. 2021. Zero-cost proxies for lightweight NAS. In ICLR."},{"key":"e_1_3_1_4_2","volume-title":"Advances in Image Manipulation Workshop in Conjunction with ECCV 2022","unstructured":"2024. Advances in Image Manipulation Workshop in Conjunction with ECCV 2022. Retrieved from https:\/\/data.vision.ee.ethz.ch\/cvl\/aim22\/"},{"key":"e_1_3_1_5_2","volume-title":"AI and Compute","author":"Amodei D.","year":"2018","unstructured":"D. Amodei and D. Hernandez. 2018. AI and Compute. Retrieved from https:\/\/openai.com\/blog\/ai-and-compute"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1109\/TITS.2021.3139001"},{"key":"e_1_3_1_7_2","article-title":"PaLM 2 technical report","author":"Anil R.","year":"2023","unstructured":"R. Anil, A. M. Dai, O. Firat, M. Johnson, D. Lepikhin, A. Passos, S. Shakeri, E. Taropa, P. Bailey, Z. Chen, E. Chu, J. H. Clark, L. E. Shafey, Y. Huang, K. 
Meier-Hellstern, G. Mishra, E. Moreira, M. Omernick, K. Robinson, S. Ruder, Y. Tay, K. Xiao, Y. Xu, Y. Zhang, G. H. Abrego, J. Ahn, J. Austin, P. Barham, J. Botha, J. Bradbury, S. Brahma, K. Brooks, M. Catasta, Y. Cheng, C. Cherry, C. A. Choquette-Choo, A. Chowdhery, C. Crepy, S. Dave, M. Dehghani, S. Dev, J. Devlin, M. D\u00edaz, N. Du, E. Dyer, V. Feinberg, F. Feng, V. Fienber, M. Freitag, X. Garcia, S. Gehrmann, L. Gonzalez, G. Gur-Ari, S. Hand, H. Hashemi, L. Hou, J. Howland, A. Hu, J. Hui, J. Hurwitz, M. Isard, A. Ittycheriah, M. Jagielski, W. Jia, K. Kenealy, M. Krikun, S. Kudugunta, C. Lan, K. Lee, B. Lee, E. Li, M. Li, W. Li, Y. Li, J. Li, H. Lim, H. Lin, Z. Liu, F. Liu, M. Maggioni, A. Mahendru, J. Maynez, V. Misra, M. Moussalem, Z. Nado, J. Nham, E. Ni, A. Nystrom, A. Parrish, M. Pellat, M. Polacek, A. Polozov, R. Pope, S. Qiao, E. Reif, B. Richter, P. Riley, A. C. Ros, A. Roy, B. Saeta, R. Samuel, R. Shelby, A. Slone, D. Smilkov, D. R. So, D. Sohn, S. Tokumine, D. Valter, V. Vasudevan, K. Vodrahalli, X. Wang, P. Wang, Z. Wang, T. Wang, J. Wieting, Y. Wu, K. Xu, Y. Xu, L. Xue, P. Yin, J. Yu, Q. Zhang, S. Zheng, C. Zheng, W. Zhou, D. Zhou, S. Petrov, Y. Wu. 2023. PaLM 2 technical report. Google. arXiv preprint arXiv:2305.10403 (2023).","journal-title":"arXiv preprint arXiv:2305.10403"},{"key":"e_1_3_1_8_2","first-page":"86","volume-title":"LOD","author":"Asperti A.","year":"2021","unstructured":"A. Asperti, D. Evangelista, and M. Marzolla. 2021. Dissecting FLOPs along input dimensions for GreenAI cost estimations. In LOD. 86\u2013100."},{"key":"e_1_3_1_9_2","article-title":"MicroNets: Neural network architectures for deploying TinyML applications on commodity microcontrollers","author":"Banbury C.","year":"2021","unstructured":"C. Banbury, C. Zhou, I. Fedorov, R. Matas, U. Thakker, D. Gope, V. Janapa Reddi, M. Mattina, and P. Whatmough. 2021. MicroNets: Neural network architectures for deploying TinyML applications on commodity microcontrollers. In Annual Conference on Machine Learning and Systems.","journal-title":"Annual Conference on Machine Learning and Systems"},{"key":"e_1_3_1_10_2","article-title":"Scalable methods for 8-bit training of neural networks","author":"Banner R.","year":"2018","unstructured":"R. Banner, I. Hubara, E. Hoffer, and D. Soudry. 2018. Scalable methods for 8-bit training of neural networks. In Annual Conference on Neural Information Processing Systems.","journal-title":"Annual Conference on Neural Information Processing Systems"},{"key":"e_1_3_1_11_2","volume-title":"GPT-4 Has More than a Trillion Parameters - Report","author":"Bastian M.","year":"2024","unstructured":"M. Bastian. 2024. GPT-4 Has More than a Trillion Parameters - Report. Retrieved from https:\/\/the-decoder.com\/gpt-4-has-a-trillion-parameters\/"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11265-020-01596-1"},{"issue":"5","key":"e_1_3_1_13_2","first-page":"73","article-title":"An improving method for loop unrolling","volume":"11","author":"Booshehri M.","year":"2013","unstructured":"M. Booshehri, A. Malekpour, and P. Luksch. 2013. An improving method for loop unrolling. Int. J. Comput. Sci. Inf. Secur. 11, 5 (2013), 73\u201376.","journal-title":"Int. J. Comput. Sci. Inf. Secur."},{"key":"e_1_3_1_14_2","volume-title":"ICLR","author":"Cai H.","year":"2020","unstructured":"H. Cai, C. Gan, T. Wang, Z. Zhang, and S. Han. 2020. Once-for-all: Train one network and specialize it for efficient deployment. 
In ICLR."},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01318"},{"issue":"5","key":"e_1_3_1_16_2","first-page":"871","article-title":"CMix-NN: Mixed low-precision CNN library for memory-constrained edge devices","volume":"67","author":"Capotondi A.","year":"2020","unstructured":"A. Capotondi, M. Rusci, M. Fariselli, and L. Benini. 2020. CMix-NN: Mixed low-precision CNN library for memory-constrained edge devices. IEEE Trans. Circ. Syst. II 67, 5 (2020), 871\u2013875.","journal-title":"IEEE Trans. Circ. Syst. II"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2020.3039858"},{"key":"e_1_3_1_18_2","first-page":"291","article-title":"SLIDE: In defense of smart algorithms over hardware acceleration for large-scale deep learning systems","author":"Chen B.","year":"2020","unstructured":"B. Chen, T. Medini, J. Farwell, C. Tai, A. Shrivastava, et\u00a0al. 2020. SLIDE: In defense of smart algorithms over hardware acceleration for large-scale deep learning systems. Annual Conference on Machine Learning and Systems. 291\u2013306.","journal-title":"Annual Conference on Machine Learning and Systems"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.01355"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01163"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i8.16865"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00154"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00497"},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1145\/2654822.2541967"},{"key":"e_1_3_1_25_2","article-title":"MXNet: A flexible and efficient machine learning library for heterogeneous distributed systems","author":"Chen T.","year":"2016","unstructured":"T. Chen, M. Li, Y. Li, M. Lin, N. Wang, M. Wang, T. Xiao, B. Xu, C. Zhang, and Z. Zhang. 2016. MXNet: A flexible and efficient machine learning library for heterogeneous distributed systems. In NIPSW.","journal-title":"NIPSW"},{"key":"e_1_3_1_26_2","first-page":"1283","volume-title":"DATE","author":"Chen W.","year":"2020","unstructured":"W. Chen, Y. Wang, S. Yang, C. Liu, and L. Zhang. 2020. You only search once: A fast automation framework for single-stage DNN\/Accelerator co-design. In DATE. 1283\u20131286."},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00741"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.00520"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2014.58"},{"key":"e_1_3_1_30_2","article-title":"DetNAS: Neural architecture search on object detection","author":"Chen Y.","year":"2019","unstructured":"Y. Chen, T. Yang, X. Zhang, G. Meng, C. Pan, and J. Sun. 2019. DetNAS: Neural architecture search on object detection. In NIPS. 4\u20131.","journal-title":"NIPS"},{"key":"e_1_3_1_31_2","article-title":"cuDNN: Efficient primitives for deep learning","author":"Chetlur S.","year":"2014","unstructured":"S. Chetlur, C. Woolley, P. Vandermersch, J. Cohen, J. Tran, B. Catanzaro, and E. Shelhamer. 2014. cuDNN: Efficient primitives for deep learning. arXiv preprint arXiv:1410.0759 (2014).","journal-title":"arXiv preprint arXiv:1410.0759"},{"key":"e_1_3_1_32_2","article-title":"Generating long sequences with sparse transformers","author":"Child R.","year":"2019","unstructured":"R. Child, S. Gray, A. Radford, and I. Sutskever. 2019. 
Generating long sequences with sparse transformers. arXiv preprint arXiv:1904.10509 (2019).","journal-title":"arXiv preprint arXiv:1904.10509"},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.3390\/electronics10030230"},{"key":"e_1_3_1_34_2","article-title":"PACT: Parameterized clipping activation for quantized neural networks","author":"Choi J.","year":"2018","unstructured":"J. Choi, Z. Wang, S. Venkataramani, P. I.-J. Chuang, V. Srinivasan, and K. Gopalakrishnan. 2018. PACT: Parameterized clipping activation for quantized neural networks. arXiv preprint arXiv:1805.06085 (2018).","journal-title":"arXiv preprint arXiv:1805.06085"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/DAC18074.2021.9586121"},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.195"},{"key":"e_1_3_1_37_2","volume-title":"ICLR","author":"Choromanski K.","year":"2021","unstructured":"K. Choromanski, V. Likhosherstov, D. Dohan, X. Song, A. Gane, T. Sarlos, P. Hawkins, J. Davis, A. Mohiuddin, L. Kaiser, D. Belanger, L. Colwell, and A. Weller. 2021. Rethinking attention with performers. In ICLR."},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01601"},{"key":"e_1_3_1_39_2","first-page":"3965","article-title":"CoAtNet: Marrying convolution and attention for all data sizes","author":"Dai Z.","year":"2021","unstructured":"Z. Dai, H. Liu, Q. V. Le, and M. Tan. 2021. CoAtNet: Marrying convolution and attention for all data sizes. In NIPS. 3965\u20133977.","journal-title":"NIPS"},{"key":"e_1_3_1_40_2","first-page":"800","volume-title":"MLSys","author":"David R.","year":"2021","unstructured":"R. David, J. Duke, A. Jain, V. Janapa Reddi, N. Jeffries, J. Li, N. Kreeger, I. Nappier, M. Natraj, T. Wang, P. Warden, and R. Rhodes. 2021. TensorFlow Lite Micro: Embedded machine learning for TinyML systems. In MLSys. 800\u2013811."},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00408"},{"key":"e_1_3_1_42_2","article-title":"Learning to prune deep neural networks via layer-wise optimal brain surgeon","author":"Dong X.","year":"2017","unstructured":"X. Dong, S. Chen, and S. Pan. 2017. Learning to prune deep neural networks via layer-wise optimal brain surgeon. In NIPS.","journal-title":"NIPS"},{"key":"e_1_3_1_43_2","first-page":"18518","article-title":"HAWQ-V2: Hessian aware trace-weighted quantization of neural networks","author":"Dong Z.","year":"2020","unstructured":"Z. Dong, Z. Yao, D. Arfeen, A. Gholami, M. W. Mahoney, and K. Keutzer. 2020. HAWQ-V2: Hessian aware trace-weighted quantization of neural networks. In NIPS. 18518\u201318529.","journal-title":"NIPS"},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00038"},{"key":"e_1_3_1_45_2","volume-title":"ICLR","author":"Dosovitskiy A.","year":"2021","unstructured":"A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby. 2021. An image is worth 16x16 words: Transformers for image recognition at scale. In ICLR."},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSI.2017.2735490"},{"issue":"1","key":"e_1_3_1_47_2","first-page":"107","article-title":"Application of microcontroller in assembly line for safety and controlling","volume":"6","author":"Dubey S.","year":"2019","unstructured":"S. Dubey, V. K. Soni, and B. K. Dubey. 2019. Application of microcontroller in assembly line for safety and controlling. 
Int. J. Res. Analyt. Rev. 6, 1 (2019), 107\u2013111.","journal-title":"Int. J. Res. Analyt. Rev."},{"key":"e_1_3_1_48_2","first-page":"2286","volume-title":"ICML","author":"d\u2019Ascoli S.","year":"2021","unstructured":"S. d\u2019Ascoli, H. Touvron, M. L. Leavitt, A. S. Morcos, G. Biroli, and L. Sagun. 2021. ConViT: Improving vision transformers with soft convolutional inductive biases. In ICML. 2286\u20132296."},{"key":"e_1_3_1_49_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW53098.2021.00268"},{"key":"e_1_3_1_50_2","first-page":"3174","article-title":"Adaptive gradient quantization for data-parallel SGD","author":"Faghri F.","year":"2020","unstructured":"F. Faghri, I. Tabrizian, I. Markov, D. Alistarh, D. M. Roy, and A. Ramezani-Kebrya. 2020. Adaptive gradient quantization for data-parallel SGD. In NIPS. 3174\u20133185.","journal-title":"NIPS"},{"key":"e_1_3_1_51_2","first-page":"3212","volume-title":"SMC","author":"Fan Z.","year":"2021","unstructured":"Z. Fan, W. Hu, H. Guo, F. Liu, and D. Xu. 2021. Hardware and algorithm co-optimization for pointwise convolution and channel shuffle in ShuffleNet V2. In SMC. 3212\u20133217."},{"key":"e_1_3_1_52_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-05318-5_6"},{"key":"e_1_3_1_53_2","volume-title":"ONNX","unstructured":"2024. ONNX. Retrieved from https:\/\/onnx.ai\/"},{"key":"e_1_3_1_54_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10994-021-06097-1"},{"key":"e_1_3_1_55_2","article-title":"The lottery ticket hypothesis: Finding sparse, trainable neural networks","author":"Frankle J.","year":"2019","unstructured":"J. Frankle and M. Carbin. 2019. The lottery ticket hypothesis: Finding sparse, trainable neural networks. In ICLR.","journal-title":"ICLR"},{"key":"e_1_3_1_56_2","article-title":"SparseGPT: Massive language models can be accurately pruned in one-shot","author":"Frantar E.","year":"2023","unstructured":"E. Frantar and D. Alistarh. 2023. SparseGPT: Massive language models can be accurately pruned in one-shot. arXiv preprint arXiv:2301.00774 (2023).","journal-title":"arXiv preprint arXiv:2301.00774"},{"key":"e_1_3_1_57_2","doi-asserted-by":"publisher","DOI":"10.2298\/CSIS220131065F"},{"key":"e_1_3_1_58_2","volume-title":"ICLR","author":"Getzner J.","year":"2023","unstructured":"J. Getzner, B. Charpentier, and S. G\u00fcnnemann. 2023. Accuracy is not the only metric that matters: Estimating the energy consumption of deep learning models. In ICLR."},{"key":"e_1_3_1_59_2","doi-asserted-by":"crossref","unstructured":"A. Gholami S. Kim Z. Dong Z. Yao M. W. Mahoney and K. Keutzer. 2022. A survey of quantization methods for efficient neural network inference. LPCV 291\u2013326.","DOI":"10.1201\/9781003162810-13"},{"key":"e_1_3_1_60_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW.2018.00215"},{"key":"e_1_3_1_61_2","article-title":"The reversible residual network: Backpropagation without storing activations","author":"Gomez A. N.","year":"2017","unstructured":"A. N. Gomez, M. Ren, R. Urtasun, and R. B. Grosse. 2017. The reversible residual network: Backpropagation without storing activations. In NIPS.","journal-title":"NIPS"},{"key":"e_1_3_1_62_2","volume-title":"Post-training Quantization | TensorFlow Lite","year":"2023","unstructured":"Google. 2023. Post-training Quantization | TensorFlow Lite. 
Retrieved from https:\/\/www.tensorflow.org\/lite\/performance\/post_training_quantization"},{"key":"e_1_3_1_63_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-021-01453-z"},{"key":"e_1_3_1_64_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.01204"},{"key":"e_1_3_1_65_2","doi-asserted-by":"publisher","DOI":"10.1109\/18.720541"},{"key":"e_1_3_1_66_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2017.2705069"},{"key":"e_1_3_1_67_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01103"},{"key":"e_1_3_1_68_2","article-title":"Dynamic network surgery for efficient DNNs","author":"Guo Y.","year":"2016","unstructured":"Y. Guo, A. Yao, and Y. Chen. 2016. Dynamic network surgery for efficient DNNs. In NIPS.","journal-title":"NIPS"},{"key":"e_1_3_1_69_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v37i1.25152"},{"key":"e_1_3_1_70_2","doi-asserted-by":"publisher","DOI":"10.1145\/3487045"},{"key":"e_1_3_1_71_2","unstructured":"S. Gupta A. Agrawal K. Gopalakrishnan and P. Narayanan. 2015. Deep learning with limited numerical precision. In ICML 1737\u20131746."},{"key":"e_1_3_1_72_2","article-title":"Accelerator-aware neural network design using AutoML","author":"Gupta S.","year":"2020","unstructured":"S. Gupta and B. Akin. 2020. Accelerator-aware neural network design using AutoML. In Annual Conference on Machine Learning and Systems Workshop.","journal-title":"Annual Conference on Machine Learning and Systems Workshop"},{"key":"e_1_3_1_73_2","first-page":"328","volume-title":"HPCA","author":"Ham T. J.","year":"2020","unstructured":"T. J. Ham, S. J. Jung, S. Kim, Y. H. Oh, Y. Park, Y. Song, J.-H. Park, S. Lee, K. Park, J. W. Lee, and D.-K. Jeong. 2020. A\u00b3: Accelerating attention mechanisms in neural networks with approximation. In HPCA. 328\u2013341."},{"key":"e_1_3_1_74_2","first-page":"692","volume-title":"ISCA","author":"Ham T. J.","year":"2021","unstructured":"T. J. Ham, Y. Lee, S. H. Seo, S. Kim, H. Choi, S. J. Jung, and J. W. Lee. 2021. ELSA: Hardware-Software co-design for efficient, lightweight self-attention mechanism in neural networks. In ISCA. 692\u2013705."},{"key":"e_1_3_1_75_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2022.3152247"},{"key":"e_1_3_1_76_2","volume-title":"ICLR","author":"Han S.","year":"2016","unstructured":"S. Han, H. Mao, and W. J. Dally. 2016. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. In ICLR."},{"key":"e_1_3_1_77_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICNN.1993.298572"},{"key":"e_1_3_1_78_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_1_79_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2020.106622"},{"key":"e_1_3_1_80_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00208"},{"key":"e_1_3_1_81_2","first-page":"2234","article-title":"Soft filter pruning for accelerating deep convolutional neural networks","author":"He Y.","year":"2018","unstructured":"Y. He, G. Kang, X. Dong, Y. Fu, and Y. Yang. 2018. Soft filter pruning for accelerating deep convolutional neural networks. In IJCAI. 
2234\u20132240.","journal-title":"IJCAI"},{"key":"e_1_3_1_82_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00447"},{"key":"e_1_3_1_83_2","doi-asserted-by":"publisher","DOI":"10.1109\/WACV.2019.00134"},{"key":"e_1_3_1_84_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.155"},{"key":"e_1_3_1_85_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2020.2980195"},{"key":"e_1_3_1_86_2","article-title":"Distilling the knowledge in a neural network","author":"Hinton G.","year":"2015","unstructured":"G. Hinton, O. Vinyals, and J. Dean. 2015. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015).","journal-title":"arXiv preprint arXiv:1503.02531"},{"key":"e_1_3_1_87_2","first-page":"6840","article-title":"Denoising diffusion probabilistic models","author":"Ho J.","year":"2020","unstructured":"J. Ho, A. Jain, and P. Abbeel. 2020. Denoising diffusion probabilistic models. In NIPS. 6840\u20136851.","journal-title":"NIPS"},{"key":"e_1_3_1_88_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00110"},{"key":"e_1_3_1_89_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00140"},{"key":"e_1_3_1_90_2","article-title":"MobileNets: Efficient convolutional neural networks for mobile vision applications","author":"Howard A. G.","year":"2017","unstructured":"A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam. 2017. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017).","journal-title":"arXiv preprint arXiv:1704.04861"},{"key":"e_1_3_1_91_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.sysarc.2020.101831"},{"key":"e_1_3_1_92_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00745"},{"key":"e_1_3_1_93_2","doi-asserted-by":"publisher","unstructured":"W. Hu Z. Che N. Liu M. Li J. Tang C. Zhang and J. Wang. 2023. CATRO: Channel pruning via class-aware trace ratio optimization. Trans. Neural. Netw. Learn. Syst. (2023) 1\u201313. DOI:10.1109\/TNNLS.2023.3262952","DOI":"10.1109\/TNNLS.2023.3262952"},{"key":"e_1_3_1_94_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00291"},{"key":"e_1_3_1_95_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.243"},{"key":"e_1_3_1_96_2","doi-asserted-by":"publisher","DOI":"10.1109\/ASSET.1999.756775"},{"key":"e_1_3_1_97_2","volume-title":"ICLR","author":"Huang Z.","year":"2019","unstructured":"Z. Huang and N. Wang. 2019. Like what you like: Knowledge distill via neuron selectivity transfer. In ICLR."},{"key":"e_1_3_1_98_2","first-page":"4114","volume-title":"NIPS","author":"Hubara I.","year":"2016","unstructured":"I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio. 2016. Binarized neural networks. In NIPS. 4114\u20134122."},{"key":"e_1_3_1_99_2","volume-title":"ICLR","author":"Iandola F. N.","year":"2017","unstructured":"F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally, and K. Keutzer. 2017. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size. In ICLR."},{"key":"e_1_3_1_100_2","doi-asserted-by":"crossref","unstructured":"B. Jacob S. Kligys B. Chen M. Zhu M. Tang A. Howard H. Adam and D. Kalenichenko. 2018. Quantization and training of neural networks for efficient integer-arithmetic-only inference. In CVPR 2704\u20132713.","DOI":"10.1109\/CVPR.2018.00286"},{"key":"e_1_3_1_101_2","article-title":"Constructing fast network through deconstruction of convolution","author":"Jeon Y.","year":"2018","unstructured":"Y. 
Jeon and J. Kim. 2018. Constructing fast network through deconstruction of convolution. In NIPS.","journal-title":"NIPS"},{"key":"e_1_3_1_102_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-19827-4_41"},{"key":"e_1_3_1_103_2","first-page":"1","volume-title":"ISCA","author":"Jouppi N.","year":"2023","unstructured":"N. Jouppi, G. Kurian, S. Li, P. Ma, R. Nagarajan, L. Nai, N. Patil, S. Subramanian, A. Swing, B. Towles, C. Young, X. Zhou, Z. Zhou, and D. Patterson. 2023. TPU v4: An optically reconfigurable supercomputer for machine learning with hardware support for embeddings. In ISCA. 1\u201314."},{"key":"e_1_3_1_104_2","first-page":"1","volume-title":"ISCA","author":"Jouppi N. P.","year":"2017","unstructured":"N. P. Jouppi, C. Young, N. Patil, D. Patterson, G. Agrawal, R. Bajwa, S. Bates, S. Bhatia, N. Boden, A. Borchers, R. Boyle, P.-l. Cantin, C. Chao, C. Clark, J. Coriell, M. Daley, M. Dau, J. Dean, B. Gelb, T. V. Ghaemmaghami, R. Gottipati, W. Gulland, R. Hagmann, C. R. Ho, D. Hogberg, J. Hu, R. Hundt, D. Hurt, J. Ibarz, A. Jaffey, A. Jaworski, A. Kaplan, H. Khaitan, A. Koch, N. Kumar, S. Lacy, J. Laudon, J. Law, D. Le, C. Leary, Z. Liu, K. Lucke, A. Lundin, G. MacKean, A. Maggiore, M. Mahony, K. Miller, R. Nagarajan, R. Narayanaswami, R. Ni, K. Nix, T. Norrie, M. Omernick, N. Penukonda, A. Phelps, J. Ross, M. Ross, A. Salek, E. Samadiani, C. Severn, G. Sizikov, M. Snelham, J. Souter, D. Steinberg, A. Swing, M. Tan, G. Thorson, B. Tian, H. Toma, E. Tuttle, V. Vasudevan, R. Walter, W. Wang, E. Wilcox, and D. H. Yoon. 2017. In-datacenter performance analysis of a tensor processing unit. In ISCA. 1\u201312."},{"key":"e_1_3_1_105_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00448"},{"key":"e_1_3_1_106_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00881"},{"key":"e_1_3_1_107_2","first-page":"7021","volume-title":"ICML","author":"Kang M.","year":"2020","unstructured":"M. Kang and B. Han. 2020. Operation-aware soft channel pruning using differentiable masks. In ICML. 7021\u20137032."},{"key":"e_1_3_1_108_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00650"},{"key":"e_1_3_1_109_2","volume-title":"ICLR","author":"Kitaev N.","year":"2020","unstructured":"N. Kitaev, \u0141. Kaiser, and A. Levskaya. 2020. Reformer: The efficient transformer. In ICLR."},{"key":"e_1_3_1_110_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-05318-5_4"},{"key":"e_1_3_1_111_2","first-page":"1097","article-title":"ImageNet classification with deep convolutional neural networks","author":"Krizhevsky A.","year":"2012","unstructured":"A. Krizhevsky, I. Sutskever, and G. E. Hinton. 2012. ImageNet classification with deep convolutional neural networks. In NIPS. 1097\u20131105.","journal-title":"NIPS"},{"key":"e_1_3_1_112_2","article-title":"Scale MLPerf-0.6 models on Google TPU-v3 pods","author":"Kumar S.","year":"2019","unstructured":"S. Kumar, V. Bitorff, D. Chen, C. Chou, B. Hechtman, H. Lee, N. Kumar, P. Mattson, S. Wang, T. Wang, et\u00a0al. 2019. Scale MLPerf-0.6 models on Google TPU-v3 pods. arXiv preprint arXiv:1909.09756 (2019).","journal-title":"arXiv preprint arXiv:1909.09756"},{"key":"e_1_3_1_113_2","article-title":"CMSIS-NN: Efficient neural network kernels for ARM Cortex-M CPUs","author":"Lai L.","year":"2018","unstructured":"L. Lai, N. Suda, and V. Chandra. 2018. CMSIS-NN: Efficient neural network kernels for ARM Cortex-M CPUs. 
arXiv preprint arXiv:1801.06601 (2018).","journal-title":"arXiv preprint arXiv:1801.06601"},{"key":"e_1_3_1_114_2","article-title":"Optimal brain damage","author":"LeCun Y.","year":"1989","unstructured":"Y. LeCun, J. Denker, and S. Solla. 1989. Optimal brain damage. In NIPS.","journal-title":"NIPS"},{"key":"e_1_3_1_115_2","article-title":"SNIP: Single-shot network pruning based on connection sensitivity","author":"Lee N.","year":"2019","unstructured":"N. Lee, T. Ajanthan, and P. H. Torr. 2019. SNIP: Single-shot network pruning based on connection sensitivity. In ICLR.","journal-title":"ICLR"},{"key":"e_1_3_1_116_2","volume-title":"ICLR","author":"Li H.","year":"2017","unstructured":"H. Li, A. Kadav, I. Durdanovic, H. Samet, and H. P. Graf. 2017. Pruning filters for efficient ConvNets. In ICLR."},{"key":"e_1_3_1_117_2","doi-asserted-by":"publisher","DOI":"10.1109\/SSIAI.2016.7459201"},{"key":"e_1_3_1_118_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2022.3152732"},{"key":"e_1_3_1_119_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00799"},{"key":"e_1_3_1_120_2","doi-asserted-by":"publisher","DOI":"10.1109\/DAC18072.2020.9218749"},{"key":"e_1_3_1_121_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.sysarc.2022.102520"},{"key":"e_1_3_1_122_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2021.07.045"},{"key":"e_1_3_1_123_2","volume-title":"ICLR","author":"Liang Y.","year":"2021","unstructured":"Y. Liang, G. Chongjian, Z. Tong, Y. Song, J. Wang, and P. Xie. 2021. EViT: Expediting vision transformers via token reorganizations. In ICLR."},{"key":"e_1_3_1_124_2","volume-title":"NIPS","author":"Lin J.","year":"2021","unstructured":"J. Lin, W.-M. Chen, H. Cai, C. Gan, and S. Han. 2021. MCUNetV2: Memory-efficient patch-based inference for tiny deep learning. In NIPS."},{"key":"e_1_3_1_125_2","first-page":"11711","article-title":"MCUNet: Tiny deep learning on IoT devices","author":"Lin J.","year":"2020","unstructured":"J. Lin, W.-M. Chen, Y. Lin, C. Gan, S. Han, et\u00a0al. 2020. MCUNet: Tiny deep learning on IoT devices. In NIPS. 11711\u201311722.","journal-title":"NIPS"},{"key":"e_1_3_1_126_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01064"},{"key":"e_1_3_1_127_2","article-title":"Neural-hardware architecture search","author":"Lin Y.","year":"2020","unstructured":"Y. Lin, D. Hafdi, K. Wang, Z. Liu, and S. Han. 2020. Neural-hardware architecture search. In NIPSWS.","journal-title":"NIPSWS"},{"issue":"5","key":"e_1_3_1_128_2","first-page":"1642","article-title":"Data and hardware efficient design for convolutional neural network","volume":"65","author":"Lin Y.-J.","year":"2017","unstructured":"Y.-J. Lin and T. S. Chang. 2017. Data and hardware efficient design for convolutional neural network. IEEE Trans. Circ. Syst. I 65, 5 (2017), 1642\u20131651.","journal-title":"IEEE Trans. Circ. Syst. I"},{"key":"e_1_3_1_129_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP49357.2023.10094626"},{"key":"e_1_3_1_130_2","unstructured":"H. Liu K. Simonyan and Y. Yang. 2019. DARTS: Differentiable architecture search. In ICLR."},{"key":"e_1_3_1_131_2","volume-title":"ICML","author":"Liu L.","year":"2021","unstructured":"L. Liu, S. Zhang, Z. Kuang, A. Zhou, J.-H. Xue, X. Wang, Y. Chen, W. Yang, Q. Liao, and W. Zhang. 2021. Group Fisher pruning for practical network compression. 
In ICML."},{"key":"e_1_3_1_132_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i10.17054"},{"key":"e_1_3_1_133_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01170"},{"key":"e_1_3_1_134_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"e_1_3_1_135_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00339"},{"key":"e_1_3_1_136_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2021.3139234"},{"key":"e_1_3_1_137_2","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2016.2574353"},{"key":"e_1_3_1_138_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01264-9_8"},{"key":"e_1_3_1_139_2","volume-title":"Mobile AI Workshop 2021","year":"2021","unstructured":"2021. Mobile AI Workshop 2021. Retrieved from https:\/\/ai-benchmark.com\/workshops\/mai\/2021\/#challenges"},{"key":"e_1_3_1_140_2","volume-title":"Mobile AI Workshop 2022","year":"2022","unstructured":"2022. Mobile AI Workshop 2022. Retrieved from https:\/\/ai-benchmark.com\/workshops\/mai\/2022\/#challenges"},{"key":"e_1_3_1_141_2","volume-title":"Mobile AI Workshop 2023","year":"2023","unstructured":"2023. Mobile AI Workshop 2023. Retrieved from https:\/\/ai-benchmark.com\/workshops\/mai\/2023\/#challenges"},{"key":"e_1_3_1_142_2","volume-title":"ICLR","author":"Mehta S.","year":"2021","unstructured":"S. Mehta, M. Ghazvininejad, S. Iyer, L. Zettlemoyer, and H. Hajishirzi. 2021. DeLighT: Very deep and light-weight transformer. In ICLR."},{"key":"e_1_3_1_143_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D18-1491"},{"key":"e_1_3_1_144_2","volume-title":"ICLR","author":"Mehta S.","year":"2020","unstructured":"S. Mehta, R. Koncel-Kedziorski, M. Rastegari, and H. Hajishirzi. 2020. DeFINE: Deep factorized input token embeddings for neural sequence modeling. In ICLR."},{"key":"e_1_3_1_145_2","volume-title":"ICLR","author":"Mehta S.","year":"2022","unstructured":"S. Mehta and M. Rastegari. 2022. MobileViT: Light-weight, general-purpose, and mobile-friendly vision transformer. In ICLR."},{"key":"e_1_3_1_146_2","doi-asserted-by":"publisher","DOI":"10.1145\/3578360.3580257"},{"key":"e_1_3_1_147_2","unstructured":"P. Micikevicius S. Narang J. Alben G. Diamos E. Elsen D. Garcia B. Ginsburg M. Houston O. Kuchaiev G. Venkatesh and H. Wu. 2018. Mixed precision training. In ICLR."},{"key":"e_1_3_1_148_2","article-title":"Intriguing properties of vision transformers","author":"Naseer M. M.","year":"2021","unstructured":"M. M. Naseer, K. Ranasinghe, S. H. Khan, M. Hayat, F. Shahbaz Khan, and M.-H. Yang. 2021. Intriguing properties of vision transformers. In NIPS.","journal-title":"NIPS"},{"key":"e_1_3_1_149_2","doi-asserted-by":"publisher","DOI":"10.1145\/3613963"},{"key":"e_1_3_1_150_2","volume-title":"NVIDIA CUDA-X: GPU Accelerated Libraries","year":"2023","unstructured":"NVIDIA. 2023. NVIDIA CUDA-X: GPU Accelerated Libraries. Retrieved from https:\/\/developer.nvidia.com\/gpu-accelerated-libraries"},{"key":"e_1_3_1_151_2","unstructured":"OpenAI. 2023. GPT-4 technical report. OpenAI. (2023)."},{"key":"e_1_3_1_152_2","doi-asserted-by":"publisher","DOI":"10.1145\/3140659.3080254"},{"key":"e_1_3_1_153_2","article-title":"PyTorch: An imperative style, high-performance deep learning library","author":"Paszke A.","year":"2019","unstructured":"A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala. 2019. 
PyTorch: An imperative style, high-performance deep learning library. In NIPS.","journal-title":"NIPS"},{"key":"e_1_3_1_154_2","first-page":"5113","volume-title":"ICML","author":"Peng H.","year":"2019","unstructured":"H. Peng, J. Wu, S. Chen, and J. Huang. 2019. Collaborative channel pruning for deep networks. In ICML. 5113\u20135122."},{"key":"e_1_3_1_155_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW50498.2020.00357"},{"key":"e_1_3_1_156_2","doi-asserted-by":"publisher","DOI":"10.1155\/2022\/1291103"},{"key":"e_1_3_1_157_2","doi-asserted-by":"publisher","DOI":"10.1145\/2847263.2847265"},{"key":"e_1_3_1_158_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01044"},{"key":"e_1_3_1_159_2","first-page":"13937","article-title":"DynamicViT: Efficient vision transformers with dynamic token sparsification","author":"Rao Y.","year":"2021","unstructured":"Y. Rao, W. Zhao, B. Liu, J. Lu, J. Zhou, and C.-J. Hsieh. 2021. DynamicViT: Efficient vision transformers with dynamic token sparsification. In NIPS. 13937\u201313949.","journal-title":"NIPS"},{"key":"e_1_3_1_160_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46493-0_32"},{"key":"e_1_3_1_161_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jksuci.2021.11.019"},{"key":"e_1_3_1_162_2","first-page":"2902","volume-title":"ICML","author":"Real E.","year":"2017","unstructured":"E. Real, S. Moore, A. Selle, S. Saxena, Y. L. Suematsu, J. Tan, Q. V. Le, and A. Kurakin. 2017. Large-scale evolution of image classifiers. In ICML. 2902\u20132911."},{"key":"e_1_3_1_163_2","doi-asserted-by":"publisher","DOI":"10.1145\/3447582"},{"key":"e_1_3_1_164_2","doi-asserted-by":"publisher","DOI":"10.1109\/PerComWorkshops53856.2022.9767398"},{"key":"e_1_3_1_165_2","doi-asserted-by":"publisher","DOI":"10.1145\/3623402"},{"key":"e_1_3_1_166_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01042"},{"key":"e_1_3_1_167_2","first-page":"19123","volume-title":"ICML","author":"Sakr C.","year":"2022","unstructured":"C. Sakr, S. Dai, R. Venkatesan, B. Zimmer, W. Dally, and B. Khailany. 2022. Optimal clipping and magnitude-aware differentiation for improved quantization-aware training. In ICML. 19123\u201319138."},{"key":"e_1_3_1_168_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00474"},{"key":"e_1_3_1_169_2","doi-asserted-by":"publisher","DOI":"10.1145\/3381831"},{"key":"e_1_3_1_170_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2021.3126685"},{"key":"e_1_3_1_171_2","doi-asserted-by":"publisher","DOI":"10.3390\/electronics10080895"},{"key":"e_1_3_1_172_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.00196"},{"key":"e_1_3_1_173_2","volume-title":"ICLR","author":"Simonyan K.","year":"2015","unstructured":"K. Simonyan and A. Zisserman. 2015. Very deep convolutional networks for large-scale image recognition. In ICLR."},{"key":"e_1_3_1_174_2","volume-title":"State of IoT 2023: Number of Connected IoT Devices Growing 16% to 16.7 Billion Globally","author":"Sinha S.","year":"2023","unstructured":"S. Sinha. 2023. State of IoT 2023: Number of Connected IoT Devices Growing 16% to 16.7 Billion Globally. Retrieved from https:\/\/iot-analytics.com\/number-connected-iot-devices\/"},{"key":"e_1_3_1_175_2","volume-title":"ICLR","author":"Song Y.","year":"2021","unstructured":"Y. Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, and B. Poole. 2021. Score-based generative modeling through stochastic differential equations. 
In ICLR."},{"key":"e_1_3_1_176_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01625"},{"key":"e_1_3_1_177_2","article-title":"Optimally scheduling CNN convolutions for efficient memory access","author":"Stoutchinin A.","year":"2019","unstructured":"A. Stoutchinin, F. Conti, and L. Benini. 2019. Optimally scheduling CNN convolutions for efficient memory access. arXiv preprint arXiv:1902.01492 (2019).","journal-title":"arXiv preprint arXiv:1902.01492"},{"key":"e_1_3_1_178_2","article-title":"Energy and policy considerations for deep learning in NLP","author":"Strubell E.","year":"2019","unstructured":"E. Strubell, A. Ganesh, and A. McCallum. 2019. Energy and policy considerations for deep learning in NLP. In ACL.","journal-title":"ACL"},{"key":"e_1_3_1_179_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58539-6_9"},{"key":"e_1_3_1_180_2","first-page":"3068","volume-title":"ACCV","author":"Sultana M.","year":"2022","unstructured":"M. Sultana, M. Naseer, M. H. Khan, S. Khan, and F. S. Khan. 2022. Self-distilled vision transformer for domain generalization. In ACCV. 3068\u20133085."},{"key":"e_1_3_1_181_2","article-title":"A simple and effective pruning approach for large language models","author":"Sun M.","year":"2023","unstructured":"M. Sun, Z. Liu, A. Bair, and J. Z. Kolter. 2023. A simple and effective pruning approach for large language models. arXiv preprint arXiv:2306.11695 (2023).","journal-title":"arXiv preprint arXiv:2306.11695"},{"key":"e_1_3_1_182_2","article-title":"VAQF: Fully automatic software-hardware co-design framework for low-bit vision transformer","author":"Sun M.","year":"2022","unstructured":"M. Sun, H. Ma, G. Kang, Y. Jiang, T. Chen, X. Ma, Z. Wang, and Y. Wang. 2022. VAQF: Fully automatic software-hardware co-design framework for low-bit vision transformer. arXiv preprint arXiv:2201.06618 (2022).","journal-title":"arXiv preprint arXiv:2201.06618"},{"key":"e_1_3_1_183_2","doi-asserted-by":"publisher","DOI":"10.1109\/TEVC.2019.2924461"},{"key":"e_1_3_1_184_2","doi-asserted-by":"publisher","DOI":"10.1109\/MSSC.2020.3002140"},{"key":"e_1_3_1_185_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v31i1.11231"},{"key":"e_1_3_1_186_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"e_1_3_1_187_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.308"},{"key":"e_1_3_1_188_2","volume-title":"The Push for Energy Efficient \u201cGreen AI.\u201d","author":"Talwalkar A.","year":"2020","unstructured":"A. Talwalkar. 2020. The Push for Energy Efficient \u201cGreen AI.\u201d Retrieved from https:\/\/spectrum.ieee.org\/energy-efficient-green-ai-strategies"},{"key":"e_1_3_1_189_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCI.2018.2889933"},{"key":"e_1_3_1_190_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00293"},{"key":"e_1_3_1_191_2","first-page":"6105","volume-title":"ICML","author":"Tan M.","year":"2019","unstructured":"M. Tan and Q. Le. 2019. EfficientNet: Rethinking model scaling for convolutional neural networks. In ICML. 6105\u20136114."},{"key":"e_1_3_1_192_2","first-page":"10096","volume-title":"ICML","author":"Tan M.","year":"2021","unstructured":"M. Tan and Q. Le. 2021. EfficientNetV2: Smaller models and faster training. In ICML. 10096\u201310106."},{"key":"e_1_3_1_193_2","unstructured":"M. Tan and Q. V. Le. 2019. MixConv: Mixed depthwise convolutional kernels. 
In BMVC."},{"key":"e_1_3_1_194_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.acl-long.331"},{"key":"e_1_3_1_195_2","volume-title":"NIPS","author":"Tarvainen A.","year":"2017","unstructured":"A. Tarvainen and H. Valpola. 2017. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. In NIPS."},{"issue":"4","key":"e_1_3_1_196_2","first-page":"1","article-title":"Efficient transformers: A survey","volume":"54","author":"Tay Y.","year":"2021","unstructured":"Y. Tay, M. Dehghani, D. Bahri, and D. Metzler. 2021. Efficient transformers: A survey. Comput. Surv. 54, 4 (2021), 1\u201341.","journal-title":"Comput. Surv."},{"key":"e_1_3_1_197_2","unstructured":"Y. Tian D. Krishnan and P. Isola. 2020. Contrastive representation distillation. In ICLR."},{"key":"e_1_3_1_198_2","first-page":"10347","volume-title":"ICML","author":"Touvron H.","year":"2021","unstructured":"H. Touvron, M. Cord, M. Douze, F. Massa, A. Sablayrolles, and H. J\u00e9gou. 2021. Training data-efficient image transformers & distillation through attention. In ICML. 10347\u201310357."},{"key":"e_1_3_1_199_2","article-title":"Llama: Open and efficient foundation language models","author":"Touvron H.","year":"2023","unstructured":"H. Touvron, T. Lavril, G. Izacard, X. Martinet, M.-A. Lachaux, T. Lacroix, B. Rozi\u00e8re, N. Goyal, E. Hambro, F. Azhar, A. Rodriguez, A. Joulin, E. Grave, and G. Lample. 2023. Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971 (2023).","journal-title":"arXiv preprint arXiv:2302.13971"},{"key":"e_1_3_1_200_2","article-title":"Llama 2: Open foundation and fine-tuned chat models","author":"Touvron H.","year":"2023","unstructured":"H. Touvron, L. Martin, K. Stone, P. Albert, A. Almahairi, Y. Babaei, N. Bashlykov, S. Batra, P. Bhargava, S. Bhosale, D. Bikel, L. Blecher, C. C. Ferrer, M. Chen, G. Cucurull, D. Esiobu, J. Fernandes, J. Fu, W. Fu, B. Fuller, C. Gao, V. Goswami, N. Goyal, A. Hartshorn, S. Hosseini, R. Hou, H. Inan, M. Kardas, V. Kerkez, M. Khabsa, I. Kloumann, A. Korenev, P. S. Koura, M.-A. Lachaux, T. Lavril, J. Lee, D. Liskovich, Y. Lu, Y. Mao, X. Martinet, T. Mihaylov, P. Mishra, I. Molybog, Y. Nie, A. Poulton, J. Reizenstein, R. Rungta, K. Saladi, A. Schelten, R. Silva, E. M. Smith, R. Subramanian, X. E. Tan, B. Tang, R. Taylor, A. Williams, J. X. Kuan, P. Xu, Z. Yan, I. Zarov, Y. Zhang, A. Fan, M. Kambadur, S. Narang, A. Rodriguez, R. Stojnic, S. Edunov, and T. Scialom. 2023. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288 (2023).","journal-title":"arXiv preprint arXiv:2307.09288"},{"issue":"5","key":"e_1_3_1_201_2","first-page":"1605","article-title":"A 43.1 TOPS\/W energy-efficient absolute-difference-accumulation operation computing-in-memory with computation reuse","volume":"68","author":"Um S.","year":"2021","unstructured":"S. Um, S. Kim, S. Kim, and H.-J. Yoo. 2021. A 43.1 TOPS\/W energy-efficient absolute-difference-accumulation operation computing-in-memory with computation reuse. IEEE Trans. Circ. Syst. II 68, 5 (2021), 1605\u20131609.","journal-title":"IEEE Trans. Circ. Syst. II"},{"key":"e_1_3_1_202_2","volume-title":"GPU Technology Conference","author":"Vanholder H.","year":"2016","unstructured":"H. Vanholder. 2016. Efficient inference with tensorrt. In GPU Technology Conference."},{"key":"e_1_3_1_203_2","article-title":"Attention is all you need","author":"Vaswani A.","year":"2017","unstructured":"A. Vaswani, N. Shazeer, N. Parmar, J. 
Uszkoreit, L. Jones, A. N. Gomez, \u0141. Kaiser, and I. Polosukhin. 2017. Attention is all you need. In NIPS.","journal-title":"NIPS"},{"key":"e_1_3_1_204_2","doi-asserted-by":"publisher","DOI":"10.1109\/KSE53942.2021.9648656"},{"key":"e_1_3_1_205_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01298"},{"key":"e_1_3_1_206_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA51647.2021.00018"},{"key":"e_1_3_1_207_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01210"},{"key":"e_1_3_1_208_2","first-page":"7686","volume-title":"NIPS","author":"Wang N.","year":"2018","unstructured":"N. Wang, J. Choi, D. Brand, C.-Y. Chen, and K. Gopalakrishnan. 2018. Training deep neural networks with 8-bit floating point numbers. In NIPS. 7686\u20137695."},{"key":"e_1_3_1_209_2","article-title":"Linformer: Self-attention with linear complexity","author":"Wang S.","year":"2020","unstructured":"S. Wang, B. Z. Li, M. Khabsa, H. Fang, and H. Ma. 2020. Linformer: Self-attention with linear complexity. arXiv preprint arXiv:2006.04768 (2020).","journal-title":"arXiv preprint arXiv:2006.04768"},{"key":"e_1_3_1_210_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00926"},{"key":"e_1_3_1_211_2","doi-asserted-by":"publisher","DOI":"10.1145\/3508396.3512869"},{"key":"e_1_3_1_212_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2020.3004198"},{"key":"e_1_3_1_213_2","doi-asserted-by":"publisher","DOI":"10.1145\/3061639.3062207"},{"key":"e_1_3_1_214_2","doi-asserted-by":"publisher","DOI":"10.1145\/113445.113449"},{"key":"e_1_3_1_215_2","unstructured":"M. Wortsman G. Ilharco S. Y. Gadre R. Roelofs R. Gontijo-Lopes A. S. Morcos H. Namkoong A. Farhadi Y. Carmon S. Kornblith and L. Schmidt. 2022. Model soups: Averaging weights of multiple fine-tuned models improves accuracy without increasing inference time. In ICML 23965\u201323998."},{"key":"e_1_3_1_216_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.01099"},{"key":"e_1_3_1_217_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00951"},{"key":"e_1_3_1_218_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00009"},{"key":"e_1_3_1_219_2","article-title":"Understanding INT4 quantization for transformer models: Latency speedup, composability, and failure cases","author":"Wu X.","year":"2023","unstructured":"X. Wu, C. Li, R. Y. Aminabadi, Z. Yao, and Y. He. 2023. Understanding INT4 quantization for transformer models: Latency speedup, composability, and failure cases. arXiv preprint arXiv:2301.12017 (2023).","journal-title":"arXiv preprint arXiv:2301.12017"},{"key":"e_1_3_1_220_2","article-title":"Lite transformer with long-short range attention","author":"Wu Z.","year":"2020","unstructured":"Z. Wu, Z. Liu, J. Lin, Y. Lin, and S. Han. 2020. Lite transformer with long-short range attention. In ICLR.","journal-title":"ICLR"},{"key":"e_1_3_1_221_2","article-title":"Early convolutions help transformers see better","author":"Xiao T.","year":"2021","unstructured":"T. Xiao, P. Dollar, M. Singh, E. Mintun, T. Darrell, and R. Girshick. 2021. Early convolutions help transformers see better. 
In NIPS.","journal-title":"NIPS"},{"key":"e_1_3_1_222_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.01883"},{"key":"e_1_3_1_223_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.sysarc.2022.102799"},{"key":"e_1_3_1_224_2","doi-asserted-by":"publisher","DOI":"10.1109\/TEVC.2023.3252612"},{"key":"e_1_3_1_225_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00297"},{"key":"e_1_3_1_226_2","volume-title":"ICLR","author":"Yang J.","year":"2021","unstructured":"J. Yang, B. Martinez, A. Bulat, G. Tzimiropoulos. 2021. Knowledge distillation via softmax regression representation learning. In ICLR."},{"key":"e_1_3_1_227_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00357"},{"key":"e_1_3_1_228_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01249-6_18"},{"key":"e_1_3_1_229_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00243"},{"key":"e_1_3_1_230_2","first-page":"11875","volume-title":"ICML","author":"Yao Z.","year":"2021","unstructured":"Z. Yao, Z. Dong, Z. Zheng, A. Gholami, J. Yu, E. Tan, L. Wang, Q. Huang, Y. Wang, M. Mahoney. 2021. HAWQ-V3: Dyadic neural network quantization. In ICML. 11875\u201311886."},{"key":"e_1_3_1_231_2","article-title":"A comprehensive capability analysis of GPT-3 and GPT-3.5 series models","author":"Ye J.","year":"2023","unstructured":"J. Ye, X. Chen, N. Xu, C. Zu, Z. Shao, S. Liu, Y. Cui, Z. Zhou, C. Gong, Y. Shen, J. Zhou, S. Chen, T. Gui, Q. Zhang, and X. Huang. 2023. A comprehensive capability analysis of GPT-3 and GPT-3.5 series models. arXiv preprint arXiv:2303.10420 (2023).","journal-title":"arXiv preprint arXiv:2303.10420"},{"key":"e_1_3_1_232_2","volume-title":"ICLR","author":"Ye J.","year":"2018","unstructured":"J. Ye, X. Lu, Z. Lin, and J. Z. Wang. 2018. Rethinking the smaller-norm-less-informative assumption in channel pruning of convolution layers. In ICLR."},{"key":"e_1_3_1_233_2","doi-asserted-by":"crossref","unstructured":"H. Yin A. Vahdat J. Alvarez A. Mallya J. Kautz and P. Molchanov. 2022. AdaViT: Adaptive tokens for efficient vision transformer. In CVPR 10809\u201310818.","DOI":"10.1109\/CVPR52688.2022.01054"},{"key":"e_1_3_1_234_2","doi-asserted-by":"publisher","DOI":"10.1109\/WACV51458.2022.00175"},{"key":"e_1_3_1_235_2","first-page":"2771","article-title":"ShiftAddNet: A hardware-inspired deep network","author":"You H.","year":"2020","unstructured":"H. You, X. Chen, Y. Zhang, C. Li, S. Li, Z. Liu, Z. Wang, and Y. Lin. 2020. ShiftAddNet: A hardware-inspired deep network. In NIPS. 2771\u20132783.","journal-title":"NIPS"},{"key":"e_1_3_1_236_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.findings-acl.15"},{"key":"e_1_3_1_237_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-20077-9_37"},{"key":"e_1_3_1_238_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00060"},{"key":"e_1_3_1_239_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00396"},{"key":"e_1_3_1_240_2","doi-asserted-by":"publisher","DOI":"10.1111\/j.1467-9868.2005.00532.x"},{"key":"e_1_3_1_241_2","doi-asserted-by":"publisher","DOI":"10.1145\/2684746.2689060"},{"key":"e_1_3_1_242_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2017.2785257"},{"key":"e_1_3_1_243_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2020.107659"},{"key":"e_1_3_1_244_2","volume-title":"ICLR","author":"Zhang H.","year":"2023","unstructured":"H. Zhang, F. Li, S. Liu, L. Zhang, H. Su, J. Zhu, L. M. Ni, and H.-Y. Shum. 2023. 
DINO: DETR with improved DeNoising anchor boxes for end-to-end object detection. In ICLR."},{"key":"e_1_3_1_245_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00355"},{"key":"e_1_3_1_246_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00381"},{"key":"e_1_3_1_247_2","doi-asserted-by":"publisher","DOI":"10.5555\/3195638.3195662"},{"key":"e_1_3_1_248_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00716"},{"key":"e_1_3_1_250_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00454"},{"key":"e_1_3_1_251_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00364"},{"key":"e_1_3_1_252_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01165"},{"key":"e_1_3_1_253_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58580-8_40"},{"key":"e_1_3_1_254_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2018.00011"},{"key":"e_1_3_1_255_2","article-title":"Rethinking co-design of neural architectures and hardware accelerators","author":"Zhou Y.","year":"2021","unstructured":"Y. Zhou, X. Dong, B. Akin, M. Tan, D. Peng, T. Meng, A. Yazdanbakhsh, D. Huang, R. Narayanaswami, and J. Laudon. 2021. Rethinking co-design of neural architectures and hardware accelerators. arXiv preprint arXiv:2102.08619 (2021).","journal-title":"arXiv preprint arXiv:2102.08619"},{"key":"e_1_3_1_256_2","volume-title":"ICLR","author":"Zhu C.","year":"2017","unstructured":"C. Zhu, S. Han, H. Mao, and W. J. Dally. 2017. Trained ternary quantization. In ICLR."},{"key":"e_1_3_1_257_2","article-title":"Neural architecture search with reinforcement learning","author":"Zoph B.","year":"2017","unstructured":"B. Zoph and Q. V. Le. 2017. Neural architecture search with reinforcement learning. In ICLR.","journal-title":"ICLR"}],"container-title":["ACM Computing Surveys"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3657282","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3657282","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:17:39Z","timestamp":1750295859000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3657282"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,6,24]]},"references-count":255,"journal-issue":{"issue":"10","published-print":{"date-parts":[[2024,10,31]]}},"alternative-id":["10.1145\/3657282"],"URL":"https:\/\/doi.org\/10.1145\/3657282","relation":{},"ISSN":["0360-0300","1557-7341"],"issn-type":[{"value":"0360-0300","type":"print"},{"value":"1557-7341","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,6,24]]},"assertion":[{"value":"2022-12-15","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-04-02","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-06-24","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}