{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,1]],"date-time":"2026-04-01T00:27:51Z","timestamp":1775003271845,"version":"3.50.1"},"reference-count":38,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2023,1,4]],"date-time":"2023-01-04T00:00:00Z","timestamp":1672790400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Chinese Universities Industry-University-Research Innovation Foud","award":["2020HYA02011"],"award-info":[{"award-number":["2020HYA02011"]}]},{"name":"Chinese Universities Industry-University-Research Innovation Foud","award":["ZR2019LZH002"],"award-info":[{"award-number":["ZR2019LZH002"]}]},{"name":"Natural Science Foundation of Shandong Province","award":["2020HYA02011"],"award-info":[{"award-number":["2020HYA02011"]}]},{"name":"Natural Science Foundation of Shandong Province","award":["ZR2019LZH002"],"award-info":[{"award-number":["ZR2019LZH002"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>Current edge devices for neural networks such as FPGA, CPLD, and ASIC can support low bit-width computing to improve the execution latency and energy efficiency, but traditional linear quantization can only maintain the inference accuracy of neural networks at a bit-width above 6 bits. Different from previous studies that address this problem by clipping the outliers, this paper proposes a two-stage quantization method. Before converting the weights into fixed-point numbers, this paper first prunes the network by unstructured pruning and then uses the K-means algorithm to cluster the weights in advance to protect the distribution of the weights. To solve the instability problem of the K-means results, the PSO (particle swarm optimization) algorithm is exploited to obtain the initial cluster centroids. The experimental results on baseline deep networks such as ResNet-50, Inception-v3, and DenseNet-121 show the proposed optimized quantization method can generate a 5-bit network with an accuracy loss of less than 5% and a 4-bit network with only 10% accuracy loss as compared to 8-bit quantization. By quantization and pruning, this method reduces the model bit-width from 32 to 4 and the number of neurons by 80%. 
Additionally, it can be easily integrated into frameworks such as TensorRT and TensorFlow Lite for low bit-width network quantization.<\/jats:p>","DOI":"10.3390\/a16010031","type":"journal-article","created":{"date-parts":[[2023,1,5]],"date-time":"2023-01-05T02:00:57Z","timestamp":1672884057000},"page":"31","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Optimization of Linear Quantization for General and Effective Low Bit-Width Network Compression"],"prefix":"10.3390","volume":"16","author":[{"given":"Wenxin","family":"Yang","sequence":"first","affiliation":[{"name":"School of Computer Engineering & Science, Shanghai University, Shanghai 200444, China"}]},{"given":"Xiaoli","family":"Zhi","sequence":"additional","affiliation":[{"name":"Shanghai Engineering Research Center of Intelligent Computing System, Shanghai University, Shanghai 200444, China"}]},{"given":"Weiqin","family":"Tong","sequence":"additional","affiliation":[{"name":"Shanghai Engineering Research Center of Intelligent Computing System, Shanghai University, Shanghai 200444, China"}]}],"member":"1968","published-online":{"date-parts":[[2023,1,4]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"436","DOI":"10.1038\/nature14539","article-title":"Deep learning","volume":"521","author":"Lecun","year":"2015","journal-title":"Nature"},{"key":"ref_2","first-page":"16","article-title":"Speed control of three phase induction motor using neural network","volume":"16","author":"Sallam","year":"2018","journal-title":"IJCSIS"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Sallam, N.M., Saleh, A.I., Arafat Ali, H., and Abdelsalam, M.M. (2022). An Efficient Strategy for Blood Diseases Detection Based on Grey Wolf Optimization as Feature Selection and Machine Learning Techniques. Appl. Sci., 12.","DOI":"10.3390\/app122110760"},{"key":"ref_4","first-page":"3","article-title":"Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding","volume":"56","author":"Han","year":"2015","journal-title":"Fiber"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"216","DOI":"10.1038\/s41928-018-0059-3","article-title":"Scaling for edge inference of deep neural networks","volume":"1","author":"Xu","year":"2018","journal-title":"Nat. Electron."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"740","DOI":"10.1109\/72.248452","article-title":"Pruning algorithms-a survey","volume":"4","author":"Reed","year":"1993","journal-title":"IEEE Trans. Neural Netw."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Maarif, M.R., Listyanda, R.F., Kang, Y.-S., and Syafrudin, M. (2022). Artificial Neural Network Training Using Structural Learning with Forgetting for Parameter Analysis of Injection Molding Quality Prediction. Information, 13.","DOI":"10.3390\/info13100488"},{"key":"ref_8","unstructured":"Zhu, M., and Gupta, S. (2017). To prune, or not to prune: Exploring the efficacy of pruning for model compression. arXiv."},{"key":"ref_9","unstructured":"Vanhoucke, V., and Mao, M.Z. (2011, December 12\u201317). Improving the speed of neural networks on CPUs. Proceedings of the Deep Learning and Unsupervised Feature Learning Workshop, NIPS 2011, Granada, Spain."},{"key":"ref_10","unstructured":"Courbariaux, M., Bengio, Y., and David, J.P. (2015, December 7\u201312). BinaryConnect: Training Deep Neural Networks with binary weights during propagations.
Proceedings of the International Conference on Neural Information Processing Systems, Montreal, QC, Canada."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"525","DOI":"10.1007\/978-3-319-46493-0_32","article-title":"XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks","volume":"Volume 9908","author":"Leibe","year":"2016","journal-title":"Proceedings of the Computer Vision\u2014ECCV 2016"},{"key":"ref_12","unstructured":"Li, F., and Liu, B. (2016). Ternary Weight Networks. arXiv."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Jacob, B., Kligys, S., Chen, B., Zhu, M., Tang, M., Howard, A., Adam, H., and Kalenichenko, D. (2018, June 18\u201322). Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00286"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Chang, S.E., Li, Y., Sun, M., Shi, R., So, H.K.-H., Qian, X., Wang, Y., and Lin, X. (2021, February 27\u2013March 3). Mix and Match: A Novel FPGA-Centric Deep Neural Network Quantization Framework. Proceedings of the 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA), Seoul, Republic of Korea.","DOI":"10.1109\/HPCA51647.2021.00027"},{"key":"ref_15","unstructured":"Migacz, S. (2017, May 8\u201311). 8-bit inference with TensorRT. Proceedings of the GPU Technology Conference, San Jose, CA, USA."},{"key":"ref_16","unstructured":"Han, S., Pool, J., Tran, J., and Dally, W. (2015). Learning both Weights and Connections for Efficient Neural Networks. Advances in Neural Information Processing Systems, MIT Press."},{"key":"ref_17","unstructured":"Zmora, N., Jacob, G., Elharar, B., Zlotnik, L., Novik, G., Barad, H., Chen, Y., Muchsel, R., Fan, T.J., and Chavez, R. (2021, January 01). NervanaSystems\/Distiller (v0.3.2). Zenodo. Available online: https:\/\/doi.org\/10.5281\/zenodo.3268730."},{"key":"ref_18","unstructured":"Miyashita, D., Lee, E.H., and Murmann, B. (2016). Convolutional Neural Networks using Logarithmic Data Representation. arXiv."},{"key":"ref_19","unstructured":"Chen, W., Wilson, J., Tyree, S., Weinberger, K., and Chen, Y. (2015, July 6\u201311). Compressing Neural Networks with the Hashing Trick. Proceedings of the International Conference on Machine Learning, Lille, France."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Wu, J., Leng, C., Wang, Y., Hu, Q., and Cheng, J. (2016, June 27\u201330). Quantized Convolutional Neural Networks for Mobile Devices. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.521"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Shin, S., Hwang, K., and Sung, W. (2016, March 20\u201325). Fixed-point performance analysis of recurrent neural networks. Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Shanghai, China.","DOI":"10.1109\/ICASSP.2016.7471821"},{"key":"ref_22","unstructured":"Banner, R., Nahshan, Y., and Soudry, D. (2019). Post training 4-bit quantization of convolutional networks for rapid-deployment. arXiv."},{"key":"ref_23","unstructured":"Zhao, R. (2019). Improving Neural Network Quantization without Retraining using Outlier Channel Splitting.
arXiv."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"533","DOI":"10.1016\/0305-0548(86)90048-1","article-title":"Future paths for integer programming and links to artificial intelligence","volume":"13","author":"Glover","year":"1986","journal-title":"Comput. Oper. Res."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"105622","DOI":"10.1016\/j.engappai.2022.105622","article-title":"A survey of recently developed metaheuristics and their comparative analysis","volume":"117","author":"Alorf","year":"2023","journal-title":"Eng. Appl. Artif. Intell."},{"key":"ref_26","unstructured":"Dorigo, M. (1992). Optimization, Learning and Natural Algorithms. [Ph.D. Thesis, Politecnico di Milano]."},{"key":"ref_27","unstructured":"Kennedy, J., and Eberhart, R.C. (December, January 27). Particle Swarm Optimization. Proceedings of the IEEE International Joint Conference on Neural Networks, Perth, WA, Australia."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"148","DOI":"10.1016\/j.swevo.2018.02.013","article-title":"A novel nature-inspired algorithm for optimization: Squirrel search algorithm","volume":"44","author":"Jain","year":"2019","journal-title":"Swarm Evol. Comput."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"103300","DOI":"10.1016\/j.engappai.2019.103300","article-title":"Manta ray foraging optimization: An effective bio-inspired optimizer for engineering applications","volume":"87","author":"Zhao","year":"2020","journal-title":"Eng. Appl. Artif. Intell."},{"key":"ref_30","unstructured":"Omran, M., Salman, A., and Engelbrecht, A.P. (2002, January 18\u201322). Image Classification using Particle Swarm Optimization. Proceedings of the 4th Asia-Pacific Conference on Simulated Evolution and Learning, Singapore."},{"key":"ref_31","unstructured":"Ballardini, A.L. (2018). A tutorial on Particle Swarm Optimization Clusterin. arXiv."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"53","DOI":"10.1016\/0377-0427(87)90125-7","article-title":"Silhouettes: A graphical aid to the interpretation and validation of cluster analysis","volume":"20","author":"Rousseeuw","year":"1987","journal-title":"J. Comput. Appl. Math."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Huang, G., Liu, Z., Weinberger, K.Q., and van der Maaten, L. (2017, January 21\u201326). Densely connected convolutional networks. Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.243"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. (2015, January 7\u201312). Going Deeper with Convolutions. Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Li, F.-F. (2009, January 20\u201325). ImageNet: A Large-Scale Hierarchical Image Database. 
Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), Miami, FL, USA.","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"ref_37","unstructured":"Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., and Lerer, A. (2017, December 4\u20139). Automatic Differentiation in PyTorch. Proceedings of the Advances in Neural Information Processing Systems Workshops (NIPS-W), Long Beach, CA, USA."},{"key":"ref_38","unstructured":"Sung, W., Shin, S., and Hwang, K. (2015). Resiliency of Deep Neural Networks under Quantization. arXiv."}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/16\/1\/31\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T17:59:16Z","timestamp":1760119156000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/16\/1\/31"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,1,4]]},"references-count":38,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2023,1]]}},"alternative-id":["a16010031"],"URL":"https:\/\/doi.org\/10.3390\/a16010031","relation":{},"ISSN":["1999-4893"],"issn-type":[{"value":"1999-4893","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,1,4]]}}}
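The two-stage scheme summarized in the abstract (unstructured magnitude pruning, then PSO-seeded K-means clustering of the surviving weights before fixed-point coding) can be sketched in a few lines. The following Python code is a minimal illustration, not the authors' implementation: it assumes a standard global-best PSO with inertia, cognitive, and social coefficients, Lloyd-style K-means on 1-D weight values, an 80% pruning ratio, and 2**4 clusters for a 4-bit code; every function name and hyperparameter is an assumption for illustration.

```python
# Minimal sketch of the two-stage scheme from the abstract (assumed details:
# global-best PSO, Lloyd K-means on 1-D weights; not the authors' code).
import numpy as np

def magnitude_prune(weights, sparsity=0.8):
    """Unstructured pruning: zero out the smallest-magnitude weights."""
    threshold = np.quantile(np.abs(weights), sparsity)
    return np.where(np.abs(weights) >= threshold, weights, 0.0)

def clustering_sse(centroids, samples):
    """Sum of squared distances from each weight to its nearest centroid."""
    d = np.abs(samples[:, None] - centroids[None, :])
    return float((d.min(axis=1) ** 2).sum())

def pso_init_centroids(samples, k, n_particles=20, iters=50, seed=0):
    """Search for good initial centroids with a small global-best PSO,
    so that the subsequent K-means run is stable across restarts."""
    rng = np.random.default_rng(seed)
    lo, hi = samples.min(), samples.max()
    pos = rng.uniform(lo, hi, size=(n_particles, k))  # candidate centroid sets
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_f = np.array([clustering_sse(p, samples) for p in pos])
    gbest = pbest[pbest_f.argmin()].copy()
    w, c1, c2 = 0.7, 1.5, 1.5  # inertia, cognitive, social weights (assumed)
    for _ in range(iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)
        f = np.array([clustering_sse(p, samples) for p in pos])
        better = f < pbest_f
        pbest[better], pbest_f[better] = pos[better], f[better]
        gbest = pbest[pbest_f.argmin()].copy()
    return np.sort(gbest)

def kmeans(samples, centroids, iters=30):
    """Plain Lloyd iterations starting from the PSO-found centroids."""
    for _ in range(iters):
        assign = np.abs(samples[:, None] - centroids[None, :]).argmin(axis=1)
        for j in range(len(centroids)):
            members = samples[assign == j]
            if members.size:
                centroids[j] = members.mean()
    return centroids, assign

# Example: compress a synthetic weight tensor to a 4-bit code (2**4 levels).
weights = np.random.default_rng(1).normal(0.0, 0.05, size=4096)
pruned = magnitude_prune(weights, sparsity=0.8)  # prune 80% of the weights
survivors = pruned[pruned != 0.0]
centroids = pso_init_centroids(survivors, k=2 ** 4)
centroids, assign = kmeans(survivors, centroids)
quantized = centroids[assign]  # each weight mapped to its nearest centroid
```

In a deployment, each surviving weight would then be stored as a 4-bit index into the centroid table, the centroids converted to fixed-point numbers, and the pruned positions kept in a sparse mask. Clustering before quantization is what preserves the weight distribution that, per the abstract, plain linear quantization loses below 6 bits.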