{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,7]],"date-time":"2026-04-07T21:26:44Z","timestamp":1775597204118,"version":"3.50.1"},"reference-count":191,"publisher":"Association for Computing Machinery (ACM)","issue":"6","license":[{"start":{"date-parts":[[2023,11,14]],"date-time":"2023-11-14T00:00:00Z","timestamp":1699920000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Intell. Syst. Technol."],"published-print":{"date-parts":[[2023,12,31]]},"abstract":"<jats:p>Recent advancements in machine learning achieved by Deep Neural Networks (DNNs) have been significant. While demonstrating high accuracy, DNNs are associated with a huge number of parameters and computations, which leads to high memory usage and energy consumption. As a result, deploying DNNs on devices with constrained hardware resources poses significant challenges. To overcome this, various compression techniques have been widely employed to optimize DNN accelerators. A promising approach is quantization, in which the full-precision values are stored in low bit-width precision. Quantization not only reduces memory requirements but also replaces high-cost operations with low-cost ones. DNN quantization offers flexibility and efficiency in hardware design, making it a widely adopted technique in various methods. Since quantization has been extensively utilized in previous works, there is a need for an integrated report that provides an understanding, analysis, and comparison of different quantization approaches. Consequently, we present a comprehensive survey of quantization concepts and methods, with a focus on image classification. 
We describe clustering-based quantization methods and explore the use of a scale factor parameter for approximating full-precision values. Moreover, we thoroughly review the training of a quantized DNN, including the use of a straight-through estimator and quantization regularization. We explain the replacement of floating-point operations with low-cost bitwise operations in a quantized DNN and the sensitivity of different layers in quantization. Furthermore, we highlight the evaluation metrics for quantization methods and important benchmarks in the image classification task. We also present the accuracy of the state-of-the-art methods on CIFAR-10 and ImageNet. This article attempts to make the readers familiar with the basic and advanced concepts of quantization, introduce important works in DNN quantization, and highlight challenges for future research in this field.<\/jats:p>","DOI":"10.1145\/3623402","type":"journal-article","created":{"date-parts":[[2023,9,11]],"date-time":"2023-09-11T11:49:36Z","timestamp":1694432976000},"page":"1-50","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":178,"title":["A Comprehensive Survey on Model Quantization for Deep Neural Networks in Image Classification"],"prefix":"10.1145","volume":"14","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2481-5362","authenticated-orcid":false,"given":"Babak","family":"Rokh","sequence":"first","affiliation":[{"name":"Department of Computer Engineering, University of Zanjan, Zanjan, Iran"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4166-7528","authenticated-orcid":false,"given":"Ali","family":"Azarpeyvand","sequence":"additional","affiliation":[{"name":"Department of Electrical and Computer Engineering, University of Zanjan, Zanjan, Iran"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6811-9196","authenticated-orcid":false,"given":"Alireza","family":"Khanteymoori","sequence":"additional","affiliation":[{"name":"Neurozentrum Department, 
Universit\u00e4tsklinikum Freiburg, Freiburg, Germany"}]}],"member":"320","published-online":{"date-parts":[[2023,11,14]]},"reference":[{"key":"e_1_3_1_2_2","first-page":"1097","article-title":"ImageNet classification with deep convolutional neural networks","volume":"25","author":"Krizhevsky Alex","year":"2012","unstructured":"Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 25 (2012), 1097\u20131105.","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"e_1_3_1_4_2","doi-asserted-by":"crossref","unstructured":"Alexis Conneau Holger Schwenk Lo\u00efc Barrault and Yann LeCun. 2016. Very deep convolutional networks for natural language processing. arXiv:1606.01781 . http:\/\/arxiv.org\/abs\/1606.01781","DOI":"10.18653\/v1\/E17-1104"},{"key":"e_1_3_1_5_2","unstructured":"Xiaodong Liu Pengcheng He Weizhu Chen and Jianfeng Gao. 2019. Improving multi-task deep neural networks via knowledge distillation for natural language understanding. arXiv:1904.09482 . http:\/\/arxiv.org\/abs\/1904.09482"},{"key":"e_1_3_1_6_2","unstructured":"Xiaodong Liu Yu Wang Jianshu Ji Hao Cheng Xueyun Zhu Emmanuel Awa Pengcheng He Weizhu Chen Hoifung Poon Guihong Cao and Jianfeng Gao. 2020. The Microsoft toolkit of multi-task deep neural networks for natural language understanding. arXiv:2002.07972 . http:\/\/arxiv.org\/abs\/2002.07972"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/MSP.2012.2205597"},{"key":"e_1_3_1_8_2","doi-asserted-by":"crossref","unstructured":"Ying Zhang Mohammad Pezeshki Philemon Brakel Saizheng Zhang Cesar Laurent Yoshua Bengio and Aaron Courville. 2017. Towards end-to-end speech recognition with deep convolutional neural networks. arXiv:1701.02720 . 
http:\/\/arxiv.org\/abs\/1701.02720","DOI":"10.21437\/Interspeech.2016-1446"},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00745"},{"key":"e_1_3_1_11_2","unstructured":"Pierre Sermanet David Eigen Xiang Zhang. 2013. OverFeat: Integrated recognition localization and detection using convolutional networks. arXiv:1312.6229 . http:\/\/arxiv.org\/abs\/1312.6229"},{"key":"e_1_3_1_12_2","unstructured":"Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556 . http:\/\/arxiv.org\/abs\/1409.1556"},{"key":"e_1_3_1_13_2","unstructured":"Forrest N. Iandola Song Han Matthew W. Moskewicz Khalid Ashraf William J. Dally and Kurt Keutzer. 2016. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size. arXiv:1602.07360 . http:\/\/arxiv.org\/abs\/1602.07360"},{"key":"e_1_3_1_14_2","unstructured":"Andrew G. Howard Menglong Zhu Bo Chen Dmitry Kalenichenko Weijun Wang Tobias Weyand Marco Andreetto and Hartwig Adam. 2017. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv:1704.04861 . http:\/\/arxiv.org\/abs\/1704.04861"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00716"},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/SiPS.2014.6986082"},{"key":"e_1_3_1_17_2","unstructured":"Yunchao Gong Liu Liu Ming Yang and Lubomir Bourdev. 2014. Compressing deep convolutional networks using vector quantization. arXiv:1412.6115 . http:\/\/arxiv.org\/abs\/1412.6115"},{"key":"e_1_3_1_18_2","unstructured":"Song Han Huizi Mao and William J. Dally. 2015. Deep compression: Compressing deep neural networks with pruning trained quantization and Huffman coding. arXiv:1510.00149 . 
http:\/\/arxiv.org\/abs\/1510.00149"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.5555\/3157096.3157251"},{"key":"e_1_3_1_20_2","volume-title":"Proceedings of the 5th International Conference on Learning Representations (ICLR\u201917)","year":"2017","unstructured":"Pavlo Molchanov, Stephen Tyree, Tero Karras, Timo Aila, and Jan Kautz. 2017. Pruning convolutional neural networks for resource efficient inference. In Proceedings of the 5th International Conference on Learning Representations (ICLR\u201917)."},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.541"},{"key":"e_1_3_1_22_2","first-page":"3123","article-title":"BinaryConnect: Training deep neural networks with binary weights during propagations","volume":"28","author":"Courbariaux Matthieu","year":"2015","unstructured":"Matthieu Courbariaux, Yoshua Bengio, and Jean-Pierre David. 2015. BinaryConnect: Training deep neural networks with binary weights during propagations. Adv. Neural Inf. Process. Syst. 28 (2015), 3123\u20133131.","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46493-0_32"},{"key":"e_1_3_1_24_2","unstructured":"Shuchang Zhou Yuxin Wu Zekun Ni Xinyu Zhou He Wen and Yuheng Zou. 2016. DoReFa-Net: Training low bitwidth convolutional neural networks with low bitwidth gradients. arXiv:1606.06160 . 
http:\/\/arxiv.org\/abs\/1606.06160"},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00826"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-019-01227-8"},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1145\/2654822.2541967"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1145\/2749469.2750389"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1145\/3240765.3240801"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10766-018-00624-9"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1145\/3195970.3196120"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/JETCAS.2019.2910232"},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/TVLSI.2020.2993045"},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCAS45731.2020.9180868"},{"key":"e_1_3_1_35_2","unstructured":"Tianqi Chen Ian Goodfellow and Jonathon Shlens. 2015. Net2Net: Accelerating learning via knowledge transfer. arXiv:1511.05641 . http:\/\/arxiv.org\/abs\/1511.05641"},{"key":"e_1_3_1_36_2","volume-title":"Proceedings of the 30th AAAI Conference on Artificial Intelligence","year":"2016","unstructured":"Ping Luo, Zhenyao Zhu, Ziwei Liu, Xiaogang Wang, and Xiaoou Tang. 2016. Face model compression by distilling knowledge from neurons. In Proceedings of the 30th AAAI Conference on Artificial Intelligence."},{"key":"e_1_3_1_37_2","unstructured":"Zheng Xu Yen-Chang Hsu and Jiawei Huang. 2017. Training shallow and thin networks for acceleration via knowledge distillation with conditional adversarial networks. arXiv:1709.00513 . http:\/\/arxiv.org\/abs\/1709.00513"},{"key":"e_1_3_1_38_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Mishra Asit","year":"2018","unstructured":"Asit Mishra and Debbie Marr. 2018. 
Apprentice: Using knowledge distillation techniques to improve low-precision network accuracy. In Proceedings of the International Conference on Learning Representations."},{"key":"e_1_3_1_39_2","first-page":"2765","volume-title":"Proceedings of the 32nd International Conference on Neural Information Processing Systems","author":"Kim Jangho","year":"2018","unstructured":"Jangho Kim, SeongUk Park, and Nojun Kwak. 2018. Paraphrasing complex network: Network compression via factor transfer. In Proceedings of the 32nd International Conference on Neural Information Processing Systems. 2765\u20132774."},{"key":"e_1_3_1_40_2","unstructured":"Raphael Tang Yao Lu and Linqing Liu. 2019. Distilling task-specific knowledge from BERT into simple neural networks. arXiv:1903.12136 . http:\/\/arxiv.org\/abs\/1903.12136"},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/FPL.2009.5272559"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2014.58"},{"key":"e_1_3_1_43_2","unstructured":"Matthieu Courbariaux Itay Hubara Daniel Soudry Ran El-Yaniv and Yoshua Bengio. 2016. Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or -1. arXiv:1602.02830 . http:\/\/arxiv.org\/abs\/1602.02830"},{"key":"e_1_3_1_44_2","first-page":"2285","volume-title":"Proceedings of the International Conference on Machine Learning","year":"2015","unstructured":"Wenlin Chen, James Wilson, Stephen Tyree, Kilian Weinberger, and Yixin Chen. 2015. Compressing neural networks with the hashing trick. In Proceedings of the International Conference on Machine Learning. 
PMLR, 2285\u20132294."},{"key":"e_1_3_1_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2015.2494536"},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.155"},{"key":"e_1_3_1_47_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.15"},{"key":"e_1_3_1_48_2","first-page":"598","article-title":"Optimal brain damage","volume":"2","author":"LeCun Yann","year":"1989","unstructured":"Yann LeCun, John Denker, and Sara Solla. 1989. Optimal brain damage. Adv. Neural Inf. Process. Syst. 2 (1989), 598\u2013605.","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"e_1_3_1_49_2","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1992.4.4.473"},{"key":"e_1_3_1_50_2","doi-asserted-by":"publisher","DOI":"10.1162\/089976698300017124"},{"key":"e_1_3_1_51_2","first-page":"164","article-title":"Second order derivatives for network pruning: Optimal brain surgeon","volume":"5","author":"Hassibi Babak","year":"1992","unstructured":"Babak Hassibi and David Stork. 1992. Second order derivatives for network pruning: Optimal brain surgeon. Adv. Neural Inf. Process. Syst. 5 (1992), 164\u2013171.","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"e_1_3_1_52_2","unstructured":"Michael Zhu and Suyog Gupta. 2017. To prune or not to prune: Exploring the efficacy of pruning for model compression. arXiv:1710.01878 . http:\/\/arxiv.org\/abs\/1710.01878"},{"key":"e_1_3_1_53_2","doi-asserted-by":"publisher","DOI":"10.1145\/3005348"},{"key":"e_1_3_1_54_2","unstructured":"Hao Li Asim Kadav Igor Durdanovic Hanan Samet and Hans Peter Graf. 2016. Pruning filters for efficient ConvNets. arXiv:1608.08710 . http:\/\/arxiv.org\/abs\/1608.08710"},{"key":"e_1_3_1_55_2","unstructured":"Hengyuan Hu Rui Peng Yu-Wing Tai and Chi-Keung Tang. 2016. Network trimming: A data-driven neuron pruning approach towards efficient deep architectures. arXiv:1607.03250 . 
http:\/\/arxiv.org\/abs\/1607.03250"},{"key":"e_1_3_1_56_2","first-page":"1135","article-title":"Learning both weights and connections for efficient neural network","volume":"28","author":"Han Song","year":"2015","unstructured":"Song Han, Jeff Pool, John Tran, and William Dally. 2015. Learning both weights and connections for efficient neural network. Adv. Neural Inf. Process. Syst. 28 (2015), 1135\u20131143.","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"e_1_3_1_57_2","unstructured":"Lucas Theis Iryna Korshunova Alykhan Tejani and Ferenc Husz\u00e1r. 2018. Faster gaze prediction with dense networks and Fisher pruning. arXiv:1801.05787 . http:\/\/arxiv.org\/abs\/1801.05787"},{"key":"e_1_3_1_58_2","doi-asserted-by":"crossref","unstructured":"Suraj Srinivas and R. Venkatesh Babu. 2015. Data-free parameter pruning for deep neural networks. arXiv:1507.06149 . http:\/\/arxiv.org\/abs\/1507.06149","DOI":"10.5244\/C.29.31"},{"key":"e_1_3_1_59_2","unstructured":"Sajid Anwar and Wonyong Sung. 2016. Compact deep convolutional neural networks with coarse pruning. arXiv:1610.09639 . http:\/\/arxiv.org\/abs\/1610.09639"},{"key":"e_1_3_1_60_2","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2019.2954495"},{"key":"e_1_3_1_61_2","unstructured":"Shixing Yu Zhewei Yao Amir Gholami Zhen Dong Sehoon Kim Michael W. Mahoney and Kurt Keutzer. 2021. Hessian-aware pruning and optimal neural implant. arXiv:2101.08940 . http:\/\/arxiv.org\/abs\/2101.08940"},{"key":"e_1_3_1_62_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2015.2502579"},{"key":"e_1_3_1_63_2","unstructured":"Yong-Deok Kim Eunhyeok Park Sungjoo Yoo Taelim Choi Lu Yang and Dongjun Shin. 2015. Compression of deep convolutional neural networks for fast and low power mobile applications. arXiv:1511.06530 . 
http:\/\/arxiv.org\/abs\/1511.06530"},{"key":"e_1_3_1_64_2","first-page":"1269","article-title":"Exploiting linear structure within convolutional networks for efficient evaluation","volume":"27","year":"2014","unstructured":"Emily L. Denton, Wojciech Zaremba, Joan Bruna, Yann LeCun, and Rob Fergus. 2014. Exploiting linear structure within convolutional networks for efficient evaluation. Adv. Neural Inf. Process. Syst. 27 (2014), 1269\u20131277.","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"e_1_3_1_65_2","first-page":"442","article-title":"Tensorizing neural networks","volume":"28","author":"Novikov Alexander","year":"2015","unstructured":"Alexander Novikov, Dmitrii Podoprikhin, Anton Osokin, and Dmitry P. Vetrov. 2015. Tensorizing neural networks. Adv. Neural Inf. Process. Syst. 28 (2015), 442\u2013450.","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"e_1_3_1_66_2","volume-title":"Proceedings of the 4th International Conference on Learning Representations (ICLR\u201916)","author":"Tai Cheng","year":"2016","unstructured":"Cheng Tai, Tong Xiao, and Yi Zhang, 2016. Convolutional neural networks with low-rank regularization. In Proceedings of the 4th International Conference on Learning Representations (ICLR\u201916)."},{"key":"e_1_3_1_67_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00977"},{"key":"e_1_3_1_68_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00225"},{"key":"e_1_3_1_69_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00801"},{"key":"e_1_3_1_70_2","volume-title":"Proceedings of the 6th International Conference on Learning Representations","author":"Polino Antonio","year":"2018","unstructured":"Antonio Polino, Razvan Pascanu, and Dan Alistarh. 2018. Model compression via distillation and quantization. 
In Proceedings of the 6th International Conference on Learning Representations."},{"key":"e_1_3_1_71_2","first-page":"2078","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","year":"2020","unstructured":"Tianzhe Wang, Kuan Wang, Han Cai, Ji Lin, Zhijian Liu, Hanrui Wang, Yujun Lin, and Song Han. 2020. Apq: Joint search for network architecture, pruning and quantization policy. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 2078\u20132087."},{"key":"e_1_3_1_72_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58526-6_16"},{"key":"e_1_3_1_73_2","doi-asserted-by":"publisher","DOI":"10.1145\/3524066"},{"key":"e_1_3_1_74_2","doi-asserted-by":"publisher","DOI":"10.1117\/12.20700"},{"key":"e_1_3_1_75_2","doi-asserted-by":"publisher","DOI":"10.1016\/0893-6080(91)90077-I"},{"key":"e_1_3_1_76_2","doi-asserted-by":"publisher","DOI":"10.1109\/78.229903"},{"key":"e_1_3_1_77_2","doi-asserted-by":"publisher","DOI":"10.1109\/72.182695"},{"key":"e_1_3_1_78_2","unstructured":"Miguel A. Carreira-Perpin\u00e1n. 2017. Model compression as constrained optimization with application to neural nets. Part I: General framework. arXiv:1707.01209 . http:\/\/arxiv.org\/abs\/1707.01209"},{"key":"e_1_3_1_79_2","doi-asserted-by":"publisher","DOI":"10.5555\/3122009.3242044"},{"key":"e_1_3_1_80_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11390-017-1750-y"},{"key":"e_1_3_1_81_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.761"},{"key":"e_1_3_1_82_2","doi-asserted-by":"publisher","DOI":"10.1007\/s00521-023-08718-3"},{"key":"e_1_3_1_83_2","unstructured":"Yu Cheng Duo Wang Pan Zhou and Tao Zhang. 2017. A survey of model compression and acceleration for deep neural networks. arXiv:1710.09282 . 
http:\/\/arxiv.org\/abs\/1710.09282"},{"key":"e_1_3_1_84_2","doi-asserted-by":"publisher","DOI":"10.1109\/INFOTEH.2018.8345545"},{"key":"e_1_3_1_85_2","doi-asserted-by":"publisher","DOI":"10.1109\/MSP.2017.2765695"},{"issue":"6","key":"e_1_3_1_86_2","article-title":"Speeding-up convolutional neural networks: A survey","volume":"66","author":"Lebedev Vadim","year":"2018","unstructured":"Vadim Lebedev and Victor Lempitsky. 2018. Speeding-up convolutional neural networks: A survey. Bull. Polish Acad. Sci. Techni. Sci. 66, 6 (2018).","journal-title":"Bull. Polish Acad. Sci. Techni. Sci."},{"key":"e_1_3_1_87_2","doi-asserted-by":"publisher","DOI":"10.1631\/FITEE.1700789"},{"key":"e_1_3_1_88_2","unstructured":"Jian Cheng Peisong Wang Gang Li Qinghao Hu and Hanqing Lu. 2018. A survey on acceleration of deep convolutional neural networks. arXiv:1802.00939 . http:\/\/arxiv.org\/abs\/1802.00939"},{"key":"e_1_3_1_89_2","doi-asserted-by":"publisher","DOI":"10.1088\/1742-6596\/1213\/5\/052003"},{"key":"e_1_3_1_90_2","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2020.2976475"},{"key":"e_1_3_1_91_2","first-page":"1","article-title":"A comprehensive survey on model compression and acceleration","author":"Choudhary Tejalal","year":"2020","unstructured":"Tejalal Choudhary, Vipul Mishra, Anurag Goswami, and Jagannathan Sarangapani. 2020. A comprehensive survey on model compression and acceleration. Artif. Intell. Rev. 53, 7 (2020), 1\u201343.","journal-title":"Artif. Intell. Rev"},{"key":"e_1_3_1_92_2","doi-asserted-by":"publisher","DOI":"10.1109\/WF-IoT48130.2020.9221198"},{"key":"e_1_3_1_93_2","unstructured":"Manish Gupta and Puneet Agrawal. 2020. Compression of deep learning models for text: A survey. arXiv:2008.05221 . 
http:\/\/arxiv.org\/abs\/2008.05221"},{"key":"e_1_3_1_94_2","doi-asserted-by":"publisher","DOI":"10.3390\/electronics11060945"},{"key":"e_1_3_1_95_2","doi-asserted-by":"publisher","DOI":"10.1145\/3503044"},{"key":"e_1_3_1_96_2","doi-asserted-by":"publisher","DOI":"10.3390\/computers12030060"},{"key":"e_1_3_1_97_2","doi-asserted-by":"crossref","unstructured":"Tailin Liang John Glossner Lei Wang and Shaobo Shi. 2021. Pruning and quantization for deep neural network acceleration: A survey. arXiv:2101.09671 . http:\/\/arxiv.org\/abs\/2101.09671","DOI":"10.1016\/j.neucom.2021.07.045"},{"key":"e_1_3_1_98_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2020.107281"},{"key":"e_1_3_1_99_2","unstructured":"Mariam Rakka Mohammed E. Fouda Pramod Khargonekar and Fadi Kurdahi. 2022. Mixed-precision neural networks: A survey. arXiv:2208.06064 . https:\/\/arxiv.org\/abs\/2208.06064"},{"key":"e_1_3_1_100_2","unstructured":"Yunhui Guo. 2018. A survey on methods and theories of quantized neural networks. arXiv:1808.04752 . http:\/\/arxiv.org\/abs\/1808.04752"},{"key":"e_1_3_1_101_2","doi-asserted-by":"crossref","unstructured":"Amir Gholami Sehoon Kim Zhen Dong Zhewei Yao Michael W. Mahoney and Kurt Keutzer. 2021. A survey of quantization methods for efficient neural network inference. arXiv:2103.13630 . http:\/\/arxiv.org\/abs\/2103.13630","DOI":"10.1201\/9781003162810-13"},{"key":"e_1_3_1_102_2","doi-asserted-by":"publisher","DOI":"10.1109\/29.21701"},{"key":"e_1_3_1_103_2","doi-asserted-by":"publisher","DOI":"10.1109\/5.726791"},{"key":"e_1_3_1_104_2","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2017.2761740"},{"key":"e_1_3_1_105_2","unstructured":"Fei-Fei Li Andrej Karpathy and Justin Johnson. 2015. Convolutional Neural Networks for Visual Recognition . CS231n Stanford University. 
Retrieved from http:\/\/cs231n.stanford.edu"},{"key":"e_1_3_1_106_2","doi-asserted-by":"publisher","DOI":"10.1016\/B978-0-12-804291-5.00010-6"},{"key":"e_1_3_1_107_2","unstructured":"Minje Kim and Paris Smaragdis. 2016. Bitwise neural networks. arXiv:1601.06071 . http:\/\/arxiv.org\/abs\/1601.06071"},{"key":"e_1_3_1_108_2","unstructured":"Jungwook Choi Zhuo Wang Swagath Venkataramani Pierce I-Jen Chuang Vijayalakshmi Srinivasan and Kailash Gopalakrishnan. 2018. PACT: Parameterized clipping activation for quantized neural networks. arXiv:1805.06085 . http:\/\/arxiv.org\/abs\/1805.06085"},{"key":"e_1_3_1_109_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Mishra Asit","year":"2018","unstructured":"Asit Mishra, Eriko Nurvitadhi, Jeffrey J. Cook, and Debbie Marr. 2018. WRPN: Wide reduced-precision networks. In Proceedings of the International Conference on Learning Representations."},{"key":"e_1_3_1_110_2","unstructured":"Zhouhan Lin Matthieu Courbariaux Roland Memisevic and Yoshua Bengio. 2015. Neural networks with few multiplications. arXiv:1510.03009 . http:\/\/arxiv.org\/abs\/1510.03009"},{"key":"e_1_3_1_111_2","unstructured":"Daisuke Miyashita Edward H. Lee and Boris Murmann. 2016. Convolutional neural networks using logarithmic data representation. arXiv:1603.01025 . http:\/\/arxiv.org\/abs\/1603.01025"},{"key":"e_1_3_1_112_2","unstructured":"Wonyong Sung Sungho Shin and Kyuyeon Hwang. 2015. Resiliency of deep neural networks under quantization. arXiv:1511.06488 . http:\/\/arxiv.org\/abs\/1511.06488"},{"key":"e_1_3_1_113_2","unstructured":"Fengfu Li and Bin Liu. 2016. Ternary weight networks. arXiv:1605.04711 . http:\/\/arxiv.org\/abs\/1605.04711"},{"key":"e_1_3_1_114_2","unstructured":"Chenzhuo Zhu Song Han Huizi Mao and William J. Dally. 2016. Trained ternary quantization. arXiv:1612.01064 . 
http:\/\/arxiv.org\/abs\/1612.01064"},{"key":"e_1_3_1_115_2","first-page":"5900","volume-title":"Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP\u201917)","year":"2017","unstructured":"Edward H. Lee, Daisuke Miyashita, Elaina Chai, Boris Murmann, and S. Simon Wong. 2017. LogNet: Energy-efficient neural networks using logarithmic computation. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP\u201917). 5900\u20135904."},{"key":"e_1_3_1_116_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v31i1.10862"},{"key":"e_1_3_1_117_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.574"},{"key":"e_1_3_1_118_2","doi-asserted-by":"publisher","DOI":"10.5555\/3294771.3294804"},{"key":"e_1_3_1_119_2","unstructured":"Aojun Zhou Anbang Yao and Yiwen Guo. 2017. Incremental network quantization: Towards lossless CNNs with low-precision weights. arXiv:1702.03044 . http:\/\/arxiv.org\/abs\/1702.03044"},{"key":"e_1_3_1_120_2","unstructured":"Naveen Mellempudi Abhisek Kundu Dheevatsa Mudigere Dipankar Das Bharat Kaul and Pradeep Dubey. 2017. Ternary neural networks with fine-grained quantization. arXiv:1705.01462 . http:\/\/arxiv.org\/abs\/1705.01462"},{"key":"e_1_3_1_121_2","volume-title":"Proc. AAAI Conf. Artif. Intell.","volume":"32","year":"2018","unstructured":"Yuhui Xu, Yongzhuang Wang, Aojun Zhou, Weiyao Lin, and Kai Xiong. 2018. Deep neural network compression with single and multiple level quantization. Proc. AAAI Conf. Artif. Intell. 32, 1 (2018)."},{"key":"e_1_3_1_122_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00982"},{"key":"e_1_3_1_123_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01237-3_23"},{"key":"e_1_3_1_124_2","unstructured":"Yuhang Li Xin Dong and Wei Wang. 2019. Additive powers-of-two quantization: An efficient non-uniform discretization for neural networks. arXiv:1909.13144 . 
https:\/\/arxiv.org\/abs\/1909.13144"},{"key":"e_1_3_1_125_2","unstructured":"Milo\u0161 Nikoli\u0107 Ghouthi Boukli Hacene Ciaran Bannon Alberto Delmas Lascorz Matthieu Courbariaux Yoshua Bengio Vincent Gripon and Andreas Moshovos. 2020. Bitpruning: Learning bitlengths for aggressive and accurate quantization. arXiv:2002.03090 . https:\/\/arxiv.org\/abs\/2002.03090"},{"key":"e_1_3_1_126_2","first-page":"2247","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR\u201920)","year":"2020","unstructured":"Haotong Qin, Ruihao Gong, Xianglong Liu, Mingzhu Shen, Ziran Wei, Fengwei Yu, and Jingkuan Song. 2020. Forward and backward information retention for accurate binary neural networks. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR\u201920). IEEE Computer Society, 2247\u20132256."},{"key":"e_1_3_1_127_2","first-page":"9281","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Sijie Zhao","year":"2021","unstructured":"Zhao Sijie, Tao Yue, and Xuemei Hu. 2021. Distribution-aware adaptive multi-bit quantization. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 9281\u20139290."},{"key":"e_1_3_1_128_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW53098.2021.00519"},{"key":"e_1_3_1_129_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2017.7952679"},{"key":"e_1_3_1_130_2","unstructured":"Tapani Raiko Mathias Berglund Guillaume Alain and Laurent Dinh. 2014. Techniques for learning binary stochastic feedforward neural networks. arXiv:1406.2989 . http:\/\/arxiv.org\/abs\/1406.2989"},{"key":"e_1_3_1_131_2","first-page":"52","volume-title":"Proceedings of the 5th Workshop on Energy Efficient Machine Learning and Cognitive Computing-NeurIPS Edition (EMC2-NIPS\u201919)","author":"Prateeth Nayak","year":"2019","unstructured":"Nayak Prateeth, David Zhang, and Sek Chai. 2019. 
Bit efficient quantization for deep neural networks. In Proceedings of the 5th Workshop on Energy Efficient Machine Learning and Cognitive Computing-NeurIPS Edition (EMC2-NIPS\u201919). IEEE, 52\u201356."},{"key":"e_1_3_1_132_2","doi-asserted-by":"publisher","DOI":"10.3390\/app9122559"},{"key":"e_1_3_1_133_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.123"},{"key":"e_1_3_1_134_2","volume-title":"Parallel Distributed Processing: Explorations in the Microstructure of Cognition: Foundations","author":"Rumelhart David E.","year":"1985","unstructured":"David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. 1985. Learning internal representations by error propagation. In Parallel Distributed Processing: Explorations in the Microstructure of Cognition: Foundations. University of California-San Diego, La Jolla Institute for Cognitive Science."},{"key":"e_1_3_1_135_2","volume-title":"The Perceptron, a Perceiving and Recognizing Automaton","author":"Rosenblatt Frank","year":"1957","unstructured":"Frank Rosenblatt. 1957. The Perceptron, a Perceiving and Recognizing Automaton. Cornell Aeronautical Laboratory."},{"key":"e_1_3_1_136_2","unstructured":"Geoffrey Hinton. 2012. Neural networks for machine learning. Coursera Video Lectures. Retrieved from https:\/\/www.cs.toronto.edu\/~hinton\/coursera_lectures"},{"key":"e_1_3_1_137_2","unstructured":"Yoshua Bengio Nicholas L\u00e9onard and Aaron Courville. 2013. Estimating or propagating gradients through stochastic neurons for conditional computation. arXiv:1308.3432 . http:\/\/arxiv.org\/abs\/1308.3432"},{"key":"e_1_3_1_138_2","doi-asserted-by":"publisher","DOI":"10.1007\/s40687-018-0177-6"},{"key":"e_1_3_1_139_2","volume-title":"Proceedings of the International Conference on Learning Representations","year":"2018","unstructured":"Penghang Yin, Jiancheng Lyu, Shuai Zhang, Stanley Osher, Yingyong Qi, and Jack Xin. 2018. Understanding straight-through estimator in training activation quantized neural nets. 
In Proceedings of the International Conference on Learning Representations."},{"key":"e_1_3_1_140_2","unstructured":"Pengyu Cheng Chang Liu Chunyuan Li Dinghan Shen Ricardo Henao and Lawrence Carin. 2019. Straight-through estimator as projected Wasserstein gradient flow. arXiv:1910.02176 . http:\/\/arxiv.org\/abs\/1910.02176"},{"key":"e_1_3_1_141_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Yang Huanrui","year":"2020","unstructured":"Huanrui Yang, Lin Duan, Yiran Chen, and Hai Li. 2020. BSQ: Exploring bit-level sparsity for mixed-precision neural network quantization. In Proceedings of the International Conference on Learning Representations."},{"key":"e_1_3_1_142_2","unstructured":"Maxim Naumov Utku Diril Jongsoo Park Benjamin Ray Jedrzej Jablonski and Andrew Tulloch. 2018. On periodic functions as regularizers for quantization of neural networks. arXiv:1811.09862 . https:\/\/arxiv.org\/abs\/1811.09862"},{"key":"e_1_3_1_143_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Bai Yu","year":"2018","unstructured":"Yu Bai, Yu-Xiang Wang, and Edo Liberty. 2018. ProxQuant: Quantized neural networks via proximal operators. In Proceedings of the International Conference on Learning Representations."},{"key":"e_1_3_1_144_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2018.2857080"},{"key":"e_1_3_1_145_2","unstructured":"Stefan Uhlich Lukas Mauch Kazuki Yoshiyama Fabien Cardinaux Javier Alonso Garcia Stephen Tiedemann Thomas Kemp and Akira Nakamura. 2019. Differentiable quantization of deep neural networks. arXiv:1905.11452 . https:\/\/arxiv.org\/abs\/1905.11452"},{"key":"e_1_3_1_146_2","volume-title":"Proceedings of the International Conference on Learning Representations","year":"2020","unstructured":"Yuhang Li, Ruihao Gong, Xu Tan, Yang Yang, Peng Hu, Qi Zhang, Fengwei Yu, Wei Wang, and Shi Gu. 2020. 
BRECQ: Pushing the limit of post-training quantization by block reconstruction. In Proceedings of the International Conference on Learning Representations."},{"key":"e_1_3_1_147_2","first-page":"4017","volume-title":"Proceedings of the International Conference on Machine Learning. PMLR","year":"2020","unstructured":"Kai Han, Yunhe Wang, Yixing Xu, Chunjing Xu, Enhua Wu, and Chang Xu. 2020. Training binary neural networks through learning with noisy supervision. In Proceedings of the International Conference on Machine Learning. PMLR. 4017\u20134026."},{"key":"e_1_3_1_148_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00495"},{"key":"e_1_3_1_149_2","unstructured":"Bichen Wu Yanghan Wang Peizhao Zhang Yuandong Tian Peter Vajda and Kurt Keutzer. 2018. Mixed precision quantization of ConvNets via differentiable neural architecture search. arXiv:1812.00090 . https:\/\/arxiv.org\/abs\/1812.00090"},{"key":"e_1_3_1_150_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP40776.2020.9054599"},{"key":"e_1_3_1_151_2","first-page":"11875","volume-title":"Proceedings of the International Conference on Machine Learning. PMLR","year":"2021","unstructured":"Zhewei Yao, Zhen Dong, Zhangcheng Zheng, Amir Gholami, Jiali Yu, Eric Tan, Leyuan Wang, Qijing Huang, Yida Wang, Michael Mahoney, and Kurt Keutzer. 2021. HAWQ-V3: Dyadic neural network quantization. In Proceedings of the International Conference on Machine Learning. PMLR, 11875\u201311886."},{"key":"e_1_3_1_152_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00222"},{"key":"e_1_3_1_153_2","first-page":"85","volume-title":"Proceedings of the European Conference on Computer Vision","year":"2020","unstructured":"Ting-Wu Chin, Pierce I-Jen Chuang, Vikas Chandra, and Diana Marculescu. 2020. One weight bitwidth to rule them all. In Proceedings of the European Conference on Computer Vision. Springer, Cham, 85\u2013103."},{"key":"e_1_3_1_154_2","unstructured":"Min Lin Qiang Chen and Shuicheng Yan. 2013. 
Network in network. arXiv:1312.4400 . http:\/\/arxiv.org\/abs\/1312.4400"},{"key":"e_1_3_1_155_2","first-page":"293","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision","year":"2019","unstructured":"Zhen Dong, Zhewei Yao, Amir Gholami, Michael W. Mahoney, and Kurt Keutzer. 2019. HAWQ: Hessian AWare Quantization of neural networks with mixed-precision. In Proceedings of the IEEE\/CVF International Conference on Computer Vision. 293\u2013302."},{"key":"e_1_3_1_156_2","doi-asserted-by":"crossref","unstructured":"Zhen Dong Zhewei Yao Daiyaan Arfeen Amir Gholami Michael W. Mahoney and Kurt Keutzer. 2019. HAWQ-v2: Hessian AWare trace-weighted Quantization of neural networks. arXiv:1911.03852 . http:\/\/arxiv.org\/abs\/1911.03852","DOI":"10.1109\/ICCV.2019.00038"},{"key":"e_1_3_1_157_2","volume-title":"Proceedings of the International Conference on Learning Representations","year":"2019","unstructured":"Stefan Uhlich, Lukas Mauch, Fabien Cardinaux, Kazuki Yoshiyama, Javier Alonso Garcia, Stephen Tiedemann, Thomas Kemp, and Akira Nakamura. 2019. Mixed Precision DNNs: All you need is a good parametrization. In Proceedings of the International Conference on Learning Representations."},{"key":"e_1_3_1_158_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00881"},{"key":"e_1_3_1_159_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58574-7_27"},{"key":"e_1_3_1_160_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW53098.2021.00273"},{"key":"e_1_3_1_161_2","first-page":"259","volume-title":"Proceedings of the European Conference on Computer Vision","year":"2022","unstructured":"Chen Tang, Kai Ouyang, Zhi Wang, Yifei Zhu, Wen Ji, Yaowei Wang, and Wenwu Zhu. 2022. Mixed-precision neural network quantization via learned layer-wise importance. In Proceedings of the European Conference on Computer Vision. 
Springer Nature Switzerland, 259\u2013275."},{"key":"e_1_3_1_162_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-19775-8_1"},{"key":"e_1_3_1_163_2","first-page":"109780","article-title":"Data-Free quantization via mixed-precision compensation without fine-tuning","year":"2023","unstructured":"Jun Chen, Shipeng Bai, Tianxin Huang, Mengmeng Wang, Guanzhong Tian, and Yong Liu. 2023. Data-Free quantization via mixed-precision compensation without fine-tuning. Pattern Recog. 143 (2023), 109780.","journal-title":"Pattern Recog."},{"key":"e_1_3_1_164_2","doi-asserted-by":"publisher","DOI":"10.5555\/2834535"},{"key":"e_1_3_1_165_2","first-page":"3009","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision Workshop (ICCVW\u201919)","author":"Yoni Choukroun","year":"2019","unstructured":"Choukroun Yoni, Eli Kravchik, Fan Yang, and Pavel Kisilev. 2019. Low-bit quantization of neural networks for efficient inference. In Proceedings of the IEEE\/CVF International Conference on Computer Vision Workshop (ICCVW\u201919). 3009\u20133018."},{"key":"e_1_3_1_166_2","first-page":"2810","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","year":"2019","unstructured":"Rundong Li, Yan Wang, Feng Liang, Hongwei Qin, Junjie Yan, and Rui Fan. 2019. Fully quantized network for object detection. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 2810\u20132819."},{"key":"e_1_3_1_167_2","unstructured":"Qian Lou Feng Guo Lantao Liu Minje Kim and Lei Jiang. 2019. AutoQ: Automated kernel-wise neural network quantization. arXiv:1902.05690 . https:\/\/arxiv.org\/abs\/1902.05690"},{"key":"e_1_3_1_168_2","unstructured":"Pierre Stock Armand Joulin R\u00e9mi Gribonval Benjamin Graham and Herv\u00e9 J\u00e9gou. 2019. And the bit goes down: Revisiting the quantization of neural networks. arXiv:1907.05686 . 
http:\/\/arxiv.org\/abs\/1907.05686"},{"key":"e_1_3_1_169_2","unstructured":"Nianhui Guo Joseph Bethge Haojin Yang Kai Zhong Xuefei Ning Christoph Meinel and Yu Wang. 2021. BoolNet: Minimizing the energy consumption of binary neural networks. arXiv:2106.06991 . http:\/\/arxiv.org\/abs\/2106.06991"},{"key":"e_1_3_1_170_2","article-title":"Distillation-guided residual learning for binary convolutional neural networks","author":"Ye Jianming","year":"2021","unstructured":"Jianming Ye, Jingdong Wang, and Shiliang Zhang. 2021. Distillation-guided residual learning for binary convolutional neural networks. IEEE Trans. Neural Netw. Learn. Syst. 33, 12 (2021).","journal-title":"IEEE Trans. Neural Netw. Learn. Syst"},{"key":"e_1_3_1_171_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Bulat Adrian","year":"2020","unstructured":"Adrian Bulat, Brais Martinez, and Georgios Tzimiropoulos. 2020. High-capacity expert binary networks. In Proceedings of the International Conference on Learning Representations."},{"key":"e_1_3_1_172_2","volume-title":"Learning Multiple Layers of Features from Tiny Images. Master's thesis","author":"Krizhevsky Alex","year":"2009","unstructured":"Alex Krizhevsky. 2009. Learning Multiple Layers of Features from Tiny Images. Master's thesis, University of Toronto."},{"key":"e_1_3_1_173_2","unstructured":"Reading digits in natural images with unsupervised feature learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2011."},{"key":"e_1_3_1_174_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_3_1_175_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-015-0816-y"},{"key":"e_1_3_1_176_2","unstructured":"Rotated binary neural network. Adv. Neural Inf. Process. Syst. 33 (2020)."},{"key":"e_1_3_1_177_2","doi-asserted-by":"publisher","DOI":"10.1137\/18M1166134"},{"key":"e_1_3_1_178_2","unstructured":"Zhaohui Yang Yunhe Wang Kai Han Chunjing Xu Chao Xu Dacheng Tao and Chang Xu. 2020. Searching for low-bit weights in quantized neural networks. arXiv:2009.08695 . http:\/\/arxiv.org\/abs\/2009.08695"},{"key":"e_1_3_1_179_2","first-page":"7300","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR\u201919)","year":"2019","unstructured":"Jiwei Yang, Xu Shen, Jun Xing, Xinmei Tian, Houqiang Li, Bing Deng, Jianqiang Huang, and Xian-sheng Hua. 2019. Quantization networks. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR\u201919). IEEE Computer Society, 7300\u20137308."},{"key":"e_1_3_1_180_2","unstructured":"Steven K. Esser Jeffrey L. McKinstry Deepika Bablani Rathinakumar Appuswamy and Dharmendra S. Modha. 2019. Learned step size quantization. arXiv:1902.08153 . http:\/\/arxiv.org\/abs\/1902.08153"},{"key":"e_1_3_1_181_2","first-page":"7197","volume-title":"Proceedings of the International Conference on Machine Learning. PMLR","year":"2020","unstructured":"Markus Nagel, Rana Ali Amjad, Mart Van Baalen, Christos Louizos, and Tijmen Blankevoort. 2020. Up or down? Adaptive rounding for post-training quantization. In Proceedings of the International Conference on Machine Learning. PMLR. 7197\u20137206."},{"key":"e_1_3_1_182_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-19775-8_2"},{"key":"e_1_3_1_183_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.00489"},{"key":"e_1_3_1_184_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2022.3159369"},{"key":"e_1_3_1_185_2","first-page":"658","volume-title":"Proceedings of the European Conference on Computer Vision","year":"2022","unstructured":"Sangyun Oh, Hyeonuk Sim, Jounghyun Kim, and Jongeun Lee. 2022.
Non-uniform step size quantization for accurate post-training quantization. In Proceedings of the European Conference on Computer Vision. Springer Nature Switzerland, Cham, 658\u2013673."},{"key":"e_1_3_1_186_2","first-page":"24427","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","year":"2023","unstructured":"Jiawei Liu, Lin Niu, Zhihang Yuan, Dawei Yang, Xinggang Wang, and Wenyu Liu. 2023. PD-Quant: Post-training quantization based on prediction difference metric. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 24427\u201324437."},{"key":"e_1_3_1_187_2","unstructured":"Qing Jin Linjie Yang and Zhenyu Liao. 2019. Towards efficient training for neural network quantization. arXiv:1912.10207. http:\/\/arxiv.org\/abs\/1912.10207"},{"key":"e_1_3_1_188_2","first-page":"69","volume-title":"Proceedings of the European Conference on Computer Vision","year":"2020","unstructured":"Jun Fang, Ali Shafiee, Hamzah Abdel-Aziz, David Thorsley, Georgios Georgiadis, and Joseph H. Hassoun. 2020. Post-training piecewise linear quantization for deep neural networks. In Proceedings of the European Conference on Computer Vision. Springer, Cham, 69\u201386."},{"key":"e_1_3_1_189_2","first-page":"13420","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","year":"2020","unstructured":"Hai Phan, Zechun Liu, Dang Huynh, Marios Savvides, Kwang-Ting Cheng, and Zhiqiang Shen. 2020. Binarizing MobileNet via evolution-based searching. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 
13420\u201313429."},{"key":"e_1_3_1_190_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58568-6_9"},{"key":"e_1_3_1_191_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00474"},{"key":"e_1_3_1_192_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.243"}],"container-title":["ACM Transactions on Intelligent Systems and Technology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3623402","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3623402","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:36:26Z","timestamp":1750178186000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3623402"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,11,14]]},"references-count":191,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2023,12,31]]}},"alternative-id":["10.1145\/3623402"],"URL":"https:\/\/doi.org\/10.1145\/3623402","relation":{},"ISSN":["2157-6904","2157-6912"],"issn-type":[{"value":"2157-6904","type":"print"},{"value":"2157-6912","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,11,14]]},"assertion":[{"value":"2022-04-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-08-11","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-11-14","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}