{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:22:18Z","timestamp":1750220538526,"version":"3.41.0"},"reference-count":38,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2021,9,13]],"date-time":"2021-09-13T00:00:00Z","timestamp":1631491200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Reconfigurable Technol. Syst."],"published-print":{"date-parts":[[2021,12,31]]},"abstract":"<jats:p>Field-programmable Gate Array\u00a0(FPGA) is a high-performance computing platform for Convolution Neural Networks\u00a0(CNNs) inference. Winograd algorithm, weight pruning, and quantization are widely adopted to reduce the storage and arithmetic overhead of CNNs on FPGAs. Recent studies strive to prune the weights in the Winograd domain, however, resulting in irregular sparse patterns and leading to low parallelism and reduced utilization of resources. Besides, there are few works to discuss a suitable quantization scheme for Winograd.<\/jats:p>\n          <jats:p>In this article, we propose a regular sparse pruning pattern in the Winograd-based CNN, namely, Sub-row-balanced Sparsity\u00a0(SRBS) pattern, to overcome the challenge of the irregular sparse pattern. Then, we develop a two-step hardware co-optimization approach to improve the model accuracy using the SRBS pattern. Based on the pruned model, we implement a mixed precision quantization to further reduce the computational complexity of bit operations. Finally, we design an FPGA accelerator that takes both the advantage of the SRBS pattern to eliminate low-parallelism computation and the irregular memory accesses, as well as the mixed precision quantization to get a layer-wise bit width. Experimental results on VGG16\/VGG-nagadomi with CIFAR-10 and ResNet-18\/34\/50 with ImageNet show up to 11.8\u00d7\/8.67\u00d7 and 8.17\u00d7\/8.31\u00d7\/10.6\u00d7 speedup, 12.74\u00d7\/9.19\u00d7 and 8.75\u00d7\/8.81\u00d7\/11.1\u00d7 energy efficiency improvement, respectively, compared with the state-of-the-art dense Winograd accelerator\u00a0[20] with negligible loss of model accuracy. 
We also show that our design achieves a 4.11\u00d7 speedup over the state-of-the-art sparse Winograd accelerator [19] on VGG16.<\/jats:p>","DOI":"10.1145\/3467476","type":"journal-article","created":{"date-parts":[[2021,9,14]],"date-time":"2021-09-14T00:51:51Z","timestamp":1631580711000},"page":"1-28","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["BISWSRBS: A Winograd-based CNN Accelerator with a Fine-grained Regular Sparsity Pattern and Mixed Precision Quantization"],"prefix":"10.1145","volume":"14","author":[{"given":"Tao","family":"Yang","sequence":"first","affiliation":[{"name":"School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China"}]},{"given":"Zhezhi","family":"He","sequence":"additional","affiliation":[{"name":"School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China"}]},{"given":"Tengchuan","family":"Kou","sequence":"additional","affiliation":[{"name":"School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China"}]},{"given":"Qingzheng","family":"Li","sequence":"additional","affiliation":[{"name":"SenseTime Group Limited, Shanghai, China"}]},{"given":"Qi","family":"Han","sequence":"additional","affiliation":[{"name":"SenseTime Group Limited, Shanghai, China"}]},{"given":"Haibao","family":"Yu","sequence":"additional","affiliation":[{"name":"SenseTime Group Limited, Shanghai, China"}]},{"given":"Fangxin","family":"Liu","sequence":"additional","affiliation":[{"name":"School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China"}]},{"given":"Yun","family":"Liang","sequence":"additional","affiliation":[{"name":"School of EECS, Peking University, Beijing, China"}]},{"given":"Li","family":"Jiang","sequence":"additional","affiliation":[{"name":"School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China"}]}],"member":"320","published-online":{"date-parts":[[2021,9,13]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Wikipedia. 2018. Sparse matrix. Retrieved from https:\/\/en.wikipedia.org\/wiki\/Sparse_matrix."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/3289602.3293898"},{"key":"e_1_2_1_3_1","volume-title":"PACT: Parameterized clipping activation for quantized neural networks.","author":"Choi Jungwook","year":"2018","unstructured":"Jungwook Choi, Zhuo Wang, Swagath Venkataramani, Pierce I.-Jen Chuang, Vijayalakshmi Srinivasan, and Kailash Gopalakrishnan. 2018. PACT: Parameterized clipping activation for quantized neural networks. 
Retrieved from http:\/\/arxiv.org\/abs\/1805.06085."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2020.2976475"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/3289185"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2018.2808319"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/3007787.3001163"},{"volume-title":"Proceedings of the 4th International Conference on Learning Representations (ICLR'16)","author":"Han Song","key":"e_1_2_1_8_1","unstructured":"Song Han, Huizi Mao, and William J. Dally. 2016. Deep Compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. In Proceedings of the 4th International Conference on Learning Representations (ICLR'16)."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00286"},{"volume-title":"Learning multiple layers of features from tiny images. Technical report","author":"Krizhevsky Alex","key":"e_1_2_1_10_1","unstructured":"Alex Krizhevsky. 2012. Learning multiple layers of features from tiny images. Technical report, University of Toronto."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.435"},{"volume-title":"Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP'20)","author":"Li G.","key":"e_1_2_1_12_1","unstructured":"G. Li, L. Liu, X. Wang, X. Ma, and X. Feng. 2020. LANCE: Efficient low-precision quantized Winograd convolution for neural networks based on graphics processing units. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP'20). 3842\u20133846."},{"key":"e_1_2_1_13_1","volume-title":"Proceedings of the 5th International Conference on Learning Representations (ICLR'17)","author":"Li Hao","year":"2017","unstructured":"Hao Li, Asim Kadav, Igor Durdanovic, Hanan Samet, and Hans Peter Graf. 2017. Pruning filters for efficient ConvNets. In Proceedings of the 5th International Conference on Learning Representations (ICLR'17)."},{"key":"e_1_2_1_14_1","unstructured":"Sheng R. Li, Jongsoo Park, and Ping Tak Peter Tang. 2017. Enabling sparse Winograd convolution by native pruning. Retrieved from https:\/\/arxiv.org\/abs\/1702.08597."},{"key":"e_1_2_1_15_1","unstructured":"Hanxiao Liu, Karen Simonyan, and Yiming Yang. 2018. DARTS: Differentiable architecture search. Retrieved from http:\/\/arxiv.org\/abs\/1806.09055.
"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2016.12.038"},{"volume-title":"Proceedings of the 6th International Conference on Learning Representations (ICLR'18)","author":"Liu Xingyu","key":"e_1_2_1_17_1","unstructured":"Xingyu Liu, Jeff Pool, Song Han, and William J. Dally. 2018. Efficient sparse-Winograd convolutional neural networks. In Proceedings of the 6th International Conference on Learning Representations (ICLR'18)."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.298"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/3195970.3196120"},{"key":"e_1_2_1_20_1","volume-title":"Proceedings of the IEEE 25th Annual International Symposium on Field-programmable Custom Computing Machines (FCCM'17)","author":"Lu L.","year":"2017","unstructured":"L. Lu, Y. Liang, Q. Xiao, and S. Yan. 2017. Evaluating fast algorithms for convolutional neural networks on FPGAs. In Proceedings of the IEEE 25th Annual International Symposium on Field-programmable Custom Computing Machines (FCCM'17). 101\u2013108. DOI:https:\/\/doi.org\/10.1109\/FCCM.2017.64"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.5555\/3157382.3157645"},{"key":"e_1_2_1_22_1","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW'17)","author":"Mao H.","year":"2017","unstructured":"H. Mao, S. Han, J. Pool, W. Li, X. Liu, Y. Wang, and W. J. Dally. 2017. Exploring the granularity of sparsity in convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW'17). 1927\u20131934. DOI:https:\/\/doi.org\/10.1109\/CVPRW.2017.241"},{"volume-title":"Code for kaggle-cifar10 competition","key":"e_1_2_1_23_1","unstructured":"Nagadomi. 2014. Code for kaggle-cifar10 competition (5th place). 
Retrieved from https:\/\/github.com\/nagadomi\/kaggle-cifar10-torch7."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2018.00063"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.5555\/3454287.3455008"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-015-0816-y"},{"key":"e_1_2_1_27_1","volume-title":"Proceedings of the 3rd International Conference on Learning Representations (ICLR'15)","author":"Simonyan Karen","year":"2015","unstructured":"Karen Simonyan and Andrew Zisserman. 2015. Very deep convolutional networks for large-scale image recognition. In Proceedings of the 3rd International Conference on Learning Representations (ICLR'15)."},{"key":"e_1_2_1_28_1","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'15)","author":"Szegedy C.","year":"2015","unstructured":"C. Szegedy, Wei Liu, Yangqing Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. 2015. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'15). 1\u20139. DOI:https:\/\/doi.org\/10.1109\/CVPR.2015.7298594"},{"key":"e_1_2_1_29_1","unstructured":"Yaman Umuroglu and Magnus Jahre. 2017. Streamlined deployment for quantized neural networks. Retrieved from http:\/\/arxiv.org\/abs\/1709.04060."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2019.8683512"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00881"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/PACRIM.1991.160742"},{"volume-title":"Proceedings of the 30th International Conference on Field-programmable Logic and Applications (FPL'20)","author":"Yang T.","key":"e_1_2_1_33_1","unstructured":"T. Yang, Y. Liao, J. Shi, Y. Liang, N. Jing, and L. Jiang. 2020. A Winograd-based CNN accelerator with a fine-grained regular sparsity pattern. In Proceedings of the 30th International Conference on Field-programmable Logic and Applications (FPL'20). 254\u2013261."},{"key":"e_1_2_1_34_1","unstructured":"Haibao Yu, Qi Han, Jianbo Li, Jianping Shi, Guangliang Cheng, and Bin Fan. 2020. Search What You Want: Barrier Penalty NAS for Mixed Precision Quantization. 
Retrieved from https:\/\/arxiv.org\/abs\/2007.10026."},{"key":"e_1_2_1_35_1","unstructured":"Jiecao Yu, Jongsoo Park, and Maxim Naumov. 2018. Spatial-Winograd pruning enabling sparse Winograd convolution. Retrieved from https:\/\/arxiv.org\/abs\/1901.02132."},{"key":"e_1_2_1_36_1","unstructured":"Aojun Zhou, Anbang Yao, Yiwen Guo, Lin Xu, and Yurong Chen. 2017. Incremental network quantization: Towards lossless CNNs with low-precision weights. Retrieved from http:\/\/arxiv.org\/abs\/1702.03044."},{"key":"e_1_2_1_37_1","unstructured":"Shuchang Zhou, Zekun Ni, Xinyu Zhou, He Wen, Yuxin Wu, and Yuheng Zou. 2016. DoReFa-Net: Training low bitwidth convolutional neural networks with low bitwidth gradients. Retrieved from http:\/\/arxiv.org\/abs\/1606.06160."},{"key":"e_1_2_1_38_1","author":"Zhu Chenzhuo","year":"2016","unstructured":"Chenzhuo Zhu, Song Han, Huizi Mao, and William J. Dally. 2016. Trained ternary quantization. Retrieved from http:\/\/arxiv.org\/abs\/1612.01064."}],"container-title":["ACM Transactions on Reconfigurable Technology and Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3467476","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3467476","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T21:25:07Z","timestamp":1750195507000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3467476"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,9,13]]},"references-count":38,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2021,12,31]]}},"alternative-id":["10.1145\/3467476"],"URL":"https:\/\/doi.org\/10.1145\/3467476","relation":{},"ISSN":["1936-7406","1936-7414"],"issn-type":[{"type":"print","value":"1936-7406"},{"type":"electronic","value":"1936-7414"}],"subject":[],"published":{"date-parts":[[2021,9,13]]},"assertion":[{"value":"2020-10-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-05-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-09-13","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}