{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,8]],"date-time":"2026-03-08T22:19:45Z","timestamp":1773008385001,"version":"3.50.1"},"reference-count":37,"publisher":"Institution of Engineering and Technology (IET)","issue":"1","license":[{"start":{"date-parts":[[2025,11,20]],"date-time":"2025-11-20T00:00:00Z","timestamp":1763596800000},"content-version":"vor","delay-in-days":323,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"},{"start":{"date-parts":[[2025,1,1]],"date-time":"2025-01-01T00:00:00Z","timestamp":1735689600000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/doi.wiley.com\/10.1002\/tdm_license_1.1"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62301165"],"award-info":[{"award-number":["62301165"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100021171","name":"Basic and Applied Basic Research Foundation of Guangdong Province","doi-asserted-by":"publisher","award":["2022A1515110774"],"award-info":[{"award-number":["2022A1515110774"]}],"id":[{"id":"10.13039\/501100021171","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["ietresearch.onlinelibrary.wiley.com"],"crossmark-restriction":true},"short-container-title":["IET Circuits, Devices &amp; Systems"],"published-print":{"date-parts":[[2025,1]]},"abstract":"<jats:p>In the field of hardware accelerators for convolutional neural network (CNN) inference, quantization techniques have been widely employed to enhance the performance. The prevailing quantization scheme of the accelerator at present is using signed 8\u2010bit integer variables (INT8). CNN accelerators support INT8, while lower precision INT4 is less common. Accelerators supporting INT4 depthwise separable convolution (DWC) are even rarer. 
Therefore, this article presents a high\u2010performance CNN accelerator that not only supports 8\u2010bit and 4\u2010bit data but also supports standard convolution (SC) and DWC. Additionally, in order to improve the transmission efficiency of DWC, an intermediate cache strategy is proposed, using a pointwise convolution (PW) input buffer (PW BUF) to store output data from depthwise convolution (DW) to avoid off\u2010chip transmission. Furthermore, to address the issue that a DSP cannot perform two 4\u2009\u00d7\u20094\u2010bit multiplications when dealing with DW, a processing element (PE) is designed to make full use of DSP hardware resources. Finally, this accelerator is implemented on ZYNQ ZC706 with a frequency of 200\u2009MHz. Experimental results show that it achieves a performance up to 307.88 giga operations per second (GOPS) on VGG, reaching 97.9% of peak performance, while on MobileNet it achieves 206.43 GOPS with only 392 DSPs. Compared with mainstream CNN accelerators, it increases DSP utilization rate (GOPS\/DSP) by 1.5\u00d7 to 33.5\u00d7.<\/jats:p>","DOI":"10.1049\/cds2\/5433740","type":"journal-article","created":{"date-parts":[[2025,11,21]],"date-time":"2025-11-21T04:37:04Z","timestamp":1763699824000},"update-policy":"https:\/\/doi.org\/10.1002\/crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["A High\u2010Efficiency CNN Accelerator With Mixed Low\u2010Precision 
Quantization"],"prefix":"10.1049","volume":"2025","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1237-4945","authenticated-orcid":false,"given":"Xianghong","family":"Hu","sequence":"first","affiliation":[]},{"given":"Jinhui","family":"Pan","sequence":"additional","affiliation":[]},{"given":"Yue","family":"Ding","sequence":"additional","affiliation":[]},{"given":"Wenji","family":"Huang","sequence":"additional","affiliation":[]},{"given":"Zhejun","family":"Zheng","sequence":"additional","affiliation":[]},{"given":"Xueming","family":"Li","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8034-0616","authenticated-orcid":false,"given":"Hongmin","family":"Huang","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2421-7621","authenticated-orcid":false,"given":"Xiaoming","family":"Xiong","sequence":"additional","affiliation":[]}],"member":"265","published-online":{"date-parts":[[2025,11,20]]},"reference":[{"key":"e_1_2_10_1_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.asoc.2023.110326"},{"key":"e_1_2_10_2_2","unstructured":"DongZ. YaoZ. ArfeenD. GholamiA. MahoneyM. W. andKeutzerK. HAWQ-V2: Hessian Aware Trace-Weighted Quantization of Neural Networks."},{"key":"e_1_2_10_3_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.isprsjprs.2024.12.002"},{"key":"e_1_2_10_4_2","doi-asserted-by":"publisher","DOI":"10.1109\/TGRS.2025.3585119"},{"key":"e_1_2_10_5_2","unstructured":"KrizhevskyI. S.andHintonG. E. ImageNet Classification With Deep Convolutional Neural Networks Proceedings of the International Conference on Neural Information Processing Systems 2012 IEEE Access Advances in Neural Information Processing Systems 1097\u20131105."},{"key":"e_1_2_10_6_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2020.2987080"},{"key":"e_1_2_10_7_2","unstructured":"DaiJ. LiY. HeK. andSunJ. 
R-FCN: Object Detection via Region Based Fully Convolutional Networks Proceedings of Neural Information Processing Systems 2016 Curran Associates Inc. 379\u2013387."},{"key":"e_1_2_10_8_2","first-page":"2287","article-title":"Stereo Matching by Training a Convolutional Neural Network to Compare Image Patches","volume":"17","author":"\u017dbontar J.","year":"2016","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_2_10_9_2","first-page":"115","article-title":"Dermatologist-Level Classification of Skin Cancer with Deep Neural Networks","volume":"542","author":"Andre E.","year":"2019","journal-title":"Nature"},{"key":"e_1_2_10_10_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.mejo.2021.105319"},{"key":"e_1_2_10_11_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.mejo.2023.105805"},{"key":"e_1_2_10_12_2","unstructured":"ZissermanA.andSimonyanK. Very Deep Convolutional Networks for Large-Scale Image Recognition International Conference on Learning Representations 2015 San Diego CA 64\u201369."},{"key":"e_1_2_10_13_2","doi-asserted-by":"crossref","unstructured":"HeK. ZhangX. RenS. andSunJ. Deep Residual Learning for Image Recognition Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2016 770\u2013778.","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_2_10_14_2","doi-asserted-by":"crossref","unstructured":"MaY. CaoY. VrudhulaS. andSeoJ.-s. An Automatic RTL Compiler for High-Throughput FPGA Implementation of Diverse Deep Convolutional Neural Networks Field Programmable Logic and Applications (FPL) 2017 27th International Conference 2017 IEEE 1\u20138.","DOI":"10.23919\/FPL.2017.8056824"},{"key":"e_1_2_10_15_2","doi-asserted-by":"crossref","unstructured":"MousouliotisP. G. PanayiotouK. L. TsardouliasE. G. PetrouL. P. andSymeonidisA. L. 
Expanding a Robot\u2019s Life: Low Power Object Recognition via FPGA-Based DCNN Deployment 2018 7th International Conference on Modern Circuits and Systems Technologies (MOCAST) 2018 IEEE 1\u20134.","DOI":"10.1109\/MOCAST.2018.8376612"},{"key":"e_1_2_10_16_2","first-page":"1","article-title":"Accelerating Deep Convolutional Neural Networks Using Specialized Hardware","volume":"2","author":"Ovtcharov K.","year":"2015","journal-title":"Microsoft Research Whitepaper"},{"key":"e_1_2_10_17_2","unstructured":"HowardA. G. et al.MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications 2017 arXiv April 16 2017."},{"key":"e_1_2_10_18_2","doi-asserted-by":"crossref","unstructured":"SuJ. FaraoneJ. andLiuJ. et al.Re-Dundancy-Reduced MobileNet Acceleration on Reconfigurable Logic for ImageNet Classification International Symposium on Applied Reconfigurable Computing 2018 Springer 16\u201328.","DOI":"10.1007\/978-3-319-78890-6_2"},{"key":"e_1_2_10_19_2","doi-asserted-by":"crossref","unstructured":"ZhaoR. NiuX. andLukW. Automatic Optimising CNN With Depthwise Separable Convolution on FPGA: (Abstract Only) Proceedings of the 2018 ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays 2018 ACM.","DOI":"10.1145\/3174243.3174959"},{"key":"e_1_2_10_20_2","doi-asserted-by":"crossref","unstructured":"YuY. ZhaoT. WangK. andHeL. Light-OPU: An FPGA-Based Overlay Processor for Lightweight Convolutional Neural Networks Proceedings of the 2020 ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays 2020 Seaside CA USA ACM 122\u2013132.","DOI":"10.1145\/3373087.3375311"},{"key":"e_1_2_10_21_2","doi-asserted-by":"crossref","unstructured":"WuC. ZhuangJ. WangK. andHeL. 
MP-OPU: A Mixed Precision FPGA-based Overlay Processor for Convolutional Neural Networks 2021 31st International Conference on Field-Programmable Logic and Applications (FPL) 2021 Dresden Germany IEEE 33\u201337.","DOI":"10.1109\/FPL53798.2021.00014"},{"key":"e_1_2_10_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/TVLSI.2019.2939726"},{"key":"e_1_2_10_23_2","doi-asserted-by":"crossref","unstructured":"RastegariM. OrdonezV. RedmonJ. andFarhadiA. et al.XNOR-Net: Imagenet Classification Using Binary Convolutional Neural Networks Proceedings of the European Conference on Computer Vision 2016 Cham Springer International Publishing 525\u2013542 European Conference on Computer Vision.","DOI":"10.1007\/978-3-319-46493-0_32"},{"key":"e_1_2_10_24_2","unstructured":"ZhouS. WuY. NiZ. ZhouX. WenH. andZouY. DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients 2018 arXiv February 01 2018 Accessed: July 15 2024."},{"key":"e_1_2_10_25_2","unstructured":"LouQ. GuoF. KimM. LiuL. andJiangL. AutoQ: Automated Kernel-Wise Neural Network Quantization 2019 International Conference on Learning Representations (ICLR) arXiv preprint arXiv:1902.05690."},{"key":"e_1_2_10_26_2","doi-asserted-by":"crossref","unstructured":"WangK. LiuZ. LinY. LinJ. andHanS. HAQ: Hardware-Aware Automated Quantization With Mixed Precision 2019 International Conference on Computer Vision and Pattern Recognition (CVPR) 2019 Long Beach CA USA IEEE 8604\u20138612.","DOI":"10.1109\/CVPR.2019.00881"},{"key":"e_1_2_10_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSI.2021.3131581"},{"key":"e_1_2_10_28_2","doi-asserted-by":"crossref","unstructured":"HeZ. ShenA. LiQ. ChengQ. andYuH. 
Agile Hardware and Software Co-Design for RISC-V-Based Multi-Precision Deep Learning Microprocessor Proceedings of the 28th Asia and South Pacific Design Automation Conference 2023 ACM Tokyo Japan 490\u2013495.","DOI":"10.1145\/3566097.3567871"},{"key":"e_1_2_10_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSI.2021.3078541"},{"key":"e_1_2_10_30_2","first-page":"4668","article-title":"Mobile-X: Dedicated FPGA Implementation of the MobileNet Accelerator Optimizing Depthwise Separable Convolution","volume":"71","author":"Hong H.","year":"2024","journal-title":"IEEE Transactions on Circuits and Systems"},{"key":"e_1_2_10_31_2","doi-asserted-by":"crossref","unstructured":"SunM. LiZ. andLuA. et al.Film-QNN: Efficient FPGA Acceleration of Deep Neural Networks With Intra-layer Mixed-Precision Quantization Proceedings of the ACM\/SIGDA International Symposium on Field Programmable Gate Arrays 2022 FPGA 134\u2013145.","DOI":"10.1145\/3490422.3502364"},{"key":"e_1_2_10_32_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.vlsi.2025.102383"},{"key":"e_1_2_10_33_2","unstructured":"YaoZ. DongZ. andZhengZ. et al.MeilaM.andZhangT. HAWQ-V3: Dyadic Neural Network Quantization Proceedings of Machine Learning Research 139 International Conference On Machine Learning 2021."},{"key":"e_1_2_10_34_2","doi-asserted-by":"crossref","unstructured":"WuJ. ZhouJ. GaoY. DingY. WongN. andSoH. K.-H. MSD: Mixing Signed Digit Representations for Hardware-efficient DNN Acceleration on FPGA With Heterogeneous Resources 2023 IEEE 31st Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) 2023 Marina Del Rey CA USA IEEE 94\u2013104.","DOI":"10.1109\/FCCM57271.2023.00019"},{"key":"e_1_2_10_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISPA-BDCloud-SocialCom-SustainCom52081.2021.00033"},{"key":"e_1_2_10_36_2","doi-asserted-by":"crossref","unstructured":"LiuW. LiY. YangY. ZhuJ. andLiuL. 
Design an Efficient DNN Inference Framework With PS-PL Synergies in FPGA for Edge Computing Proceedings Chinese Automation Congress 2022 CAC 4186\u20134190.","DOI":"10.1109\/CAC57257.2022.10055526"},{"key":"e_1_2_10_37_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11554-023-01378-5"}],"container-title":["IET Circuits, Devices &amp; Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/ietresearch.onlinelibrary.wiley.com\/doi\/pdf\/10.1049\/cds2\/5433740","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/ietresearch.onlinelibrary.wiley.com\/doi\/full-xml\/10.1049\/cds2\/5433740","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/ietresearch.onlinelibrary.wiley.com\/doi\/pdf\/10.1049\/cds2\/5433740","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,8]],"date-time":"2026-03-08T20:21:11Z","timestamp":1773001271000},"score":1,"resource":{"primary":{"URL":"https:\/\/ietresearch.onlinelibrary.wiley.com\/doi\/10.1049\/cds2\/5433740"}},"subtitle":[],"editor":[{"given":"Siew-Kei","family":"Lam","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2025,1]]},"references-count":37,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2025,1]]}},"alternative-id":["10.1049\/cds2\/5433740"],"URL":"https:\/\/doi.org\/10.1049\/cds2\/5433740","archive":["Portico"],"relation":{},"ISSN":["1751-858X","1751-8598"],"issn-type":[{"value":"1751-858X","type":"print"},{"value":"1751-8598","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,1]]},"assertion":[{"value":"2025-05-13","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2025-10-24","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-11-20","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}],"article-number":"5433740"}}