{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,17]],"date-time":"2026-03-17T00:45:37Z","timestamp":1773708337512,"version":"3.50.1"},"reference-count":54,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2024,3,23]],"date-time":"2024-03-23T00:00:00Z","timestamp":1711152000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation","doi-asserted-by":"crossref","award":["62372253, 62002175"],"award-info":[{"award-number":["62372253, 62002175"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Natural Science Foundation of Tianjin Fund","award":["23JCYBJC00010"],"award-info":[{"award-number":["23JCYBJC00010"]}]},{"name":"CCF-Baidu Open Fund","award":["CCF-Baidu202310"],"award-info":[{"award-number":["CCF-Baidu202310"]}]},{"name":"Open Project Fund of State Key Laboratory of Computer Architecture"},{"name":"Institute of Computing Technology"},{"DOI":"10.13039\/501100002367","name":"Chinese Academy of Sciences","doi-asserted-by":"crossref","award":["CARCHB202016"],"award-info":[{"award-number":["CARCHB202016"]}],"id":[{"id":"10.13039\/501100002367","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2024,6,30]]},"abstract":"<jats:p>Convolutional Neural Networks (CNNs) can benefit from the computational reductions provided by the Winograd minimal filtering algorithm and weight pruning. However, harnessing the potential of both methods simultaneously introduces complexity in designing pruning algorithms and accelerators. 
Prior studies aimed to establish regular sparsity patterns in the Winograd domain, but they were primarily suited for small tiles, with domain transformation dictating the sparsity ratio. The irregularities in data access and domain transformation pose challenges in accelerator design, especially for larger Winograd tiles. This paper introduces \u201cWinols,\u201d an innovative algorithm-hardware co-design strategy that emphasizes the strengths of the large-tiling Winograd algorithm. Through a spatial-to-Winograd relevance degree evaluation, we extensively explore domain transformation and propose a cross-domain pruning technique that retains sparsity across both spatial and Winograd domains. To compress pruned weight matrices, we invent a relative column encoding scheme. We further design an FPGA-based accelerator for CNN models with large Winograd tiles and sparse matrix-vector operations. Evaluations indicate our pruning method achieves up to 80% weight tile sparsity in the Winograd domain without compromising accuracy. Our Winols accelerator outperforms a dense accelerator by a factor of 31.7\u00d7 in inference latency. When compared with prevailing sparse Winograd accelerators, Winols reduces latency by an average of 10.9\u00d7, and improves DSP and energy efficiencies by over 5.6\u00d7 and 5.7\u00d7, respectively. 
When compared with the CPU and GPU platform, Winols accelerator with tile size 8\u00d7 8 achieves 24.6\u00d7 and 2.84\u00d7 energy efficiency improvements, respectively.<\/jats:p>","DOI":"10.1145\/3643682","type":"journal-article","created":{"date-parts":[[2024,1,31]],"date-time":"2024-01-31T11:58:51Z","timestamp":1706702331000},"page":"1-24","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["Winols: A Large-Tiling Sparse Winograd CNN Accelerator on FPGAs"],"prefix":"10.1145","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9857-5352","authenticated-orcid":false,"given":"Kunpeng","family":"Xie","sequence":"first","affiliation":[{"name":"Nankai University, Tianjin Key Laboratory of Network and Data Security Technology, and the Key Laboratory of Data and Intelligent System Security, Ministry of Education, Tianjin, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0805-6394","authenticated-orcid":false,"given":"Ye","family":"Lu","sequence":"additional","affiliation":[{"name":"Nankai University, Tianjin Key Laboratory of Network and Data Security Technology, and the Key Laboratory of Data and Intelligent System Security, Ministry of Education, Tianjin, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9540-2093","authenticated-orcid":false,"given":"Xinyu","family":"He","sequence":"additional","affiliation":[{"name":"Nankai University, Tianjin Key Laboratory of Network and Data Security Technology, and the Key Laboratory of Data and Intelligent System Security, Ministry of Education, Tianjin, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-3553-734X","authenticated-orcid":false,"given":"Dezhi","family":"Yi","sequence":"additional","affiliation":[{"name":"Nankai University, Tianjin Key Laboratory of Network and Data Security 
Technology, and the Key Laboratory of Data and Intelligent System Security, Ministry of Education, Tianjin, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-0397-4875","authenticated-orcid":false,"given":"Huijuan","family":"Dong","sequence":"additional","affiliation":[{"name":"Nankai University, Tianjin Key Laboratory of Network and Data Security Technology, and the Key Laboratory of Data and Intelligent System Security, Ministry of Education, Tianjin, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5798-2282","authenticated-orcid":false,"given":"Yao","family":"Chen","sequence":"additional","affiliation":[{"name":"National University of Singapore, Singapore, Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2024,3,23]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"crossref","first-page":"687","DOI":"10.1109\/ICACCS48705.2020.9074315","volume-title":"2020 6th International Conference on Advanced Computing and Communication Systems (ICACCS)","author":"Adarsh Pranav","year":"2020","unstructured":"Pranav Adarsh, Pratibha Rathi, and Manoj Kumar. 2020. YOLO v3-tiny: Object detection and recognition using one stage improved model. In 2020 6th International Conference on Advanced Computing and Communication Systems (ICACCS). 687\u2013694."},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-00296-0_5"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2021.3100298"},{"key":"e_1_3_1_5_2","series-title":"Proceedings of the 2021 on Great Lakes Symposium on VLSI","first-page":"157","author":"Chen Yao","year":"2021","unstructured":"Yao Chen, Cole Hawkins, Kaiqi Zhang, Zheng Zhang, and Cong Hao. 2021. 3U-EdgeAI: Ultra-low memory training, ultra-low bitwidth quantization, and ultra-low latency acceleration. 
In Proceedings of the 2021 on Great Lakes Symposium on VLSI (Virtual Event, USA) (GLSVLSI \u201921). Association for Computing Machinery, New York, NY, USA, 157\u2013162."},{"key":"e_1_3_1_6_2","series-title":"Proceedings of the 2019 ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays","first-page":"73","author":"Chen Yao","year":"2019","unstructured":"Yao Chen, Jiong He, Xiaofan Zhang, Cong Hao, and Deming Chen. 2019. Cloud-DNN: An open framework for mapping DNN models to cloud FPGAs. In Proceedings of the 2019 ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays (Seaside, CA, USA) (FPGA \u201919). Association for Computing Machinery, New York, NY, USA, 73\u201382."},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISVLSI.2019.00012"},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSSC.2016.2616357"},{"key":"e_1_3_1_9_2","article-title":"cuDNN: Efficient primitives for deep learning","volume":"1410","author":"Chetlur Sharan","year":"2014","unstructured":"Sharan Chetlur, Cliff Woolley, Philippe Vandermersch, Jonathan Cohen, John Tran, Bryan Catanzaro, and Evan Shelhamer. 2014. cuDNN: Efficient primitives for deep learning. CoRR abs\/1410.0759 (2014). arXiv:1410.0759","journal-title":"CoRR"},{"issue":"3","key":"e_1_3_1_10_2","first-page":"34","article-title":"An FPGA overlay for CNN inference with fine-grained flexible parallelism","volume":"19","author":"Choudhury Ziaul","year":"2022","unstructured":"Ziaul Choudhury, Shashwat Shrivastava, Lavanya Ramapantulu, and Suresh Purini. 2022. An FPGA overlay for CNN inference with fine-grained flexible parallelism. ACM Trans. Archit. Code Optim. 19, 3, Article 34 (May2022), 26 pages.","journal-title":"ACM Trans. Archit. Code Optim."},{"key":"e_1_3_1_11_2","first-page":"1","volume-title":"2017 54th ACM\/EDAC\/IEEE Design Automation Conference (DAC)","author":"Cong Jason","year":"2017","unstructured":"Jason Cong, Peng Wei, Cody Hao Yu, and Peipei Zhou. 
2017. Bandwidth optimization through on-chip memory restructuring for HLS. In 2017 54th ACM\/EDAC\/IEEE Design Automation Conference (DAC). 1\u20136. DOI:10.1145\/3061639.3062208"},{"key":"e_1_3_1_12_2","doi-asserted-by":"crossref","first-page":"1311","DOI":"10.1109\/WACV51458.2022.00138","volume-title":"2022 IEEE\/CVF Winter Conference on Applications of Computer Vision (WACV)","author":"Ganesh Prakhar","year":"2022","unstructured":"Prakhar Ganesh, Yao Chen, Yin Yang, Deming Chen, and Marianne Winslett. 2022. YOLO-ReT: Towards high accuracy real-time object detection on edge GPUs. In 2022 IEEE\/CVF Winter Conference on Applications of Computer Vision (WACV). IEEE Computer Society, Los Alamitos, CA, USA, 1311\u20131321."},{"key":"e_1_3_1_13_2","first-page":"151","volume-title":"Proceedings of the 52nd Annual IEEE\/ACM International Symposium on Microarchitecture, MICRO 2019, Columbus, OH, USA, October 12\u201316, 2019","author":"Gondimalla Ashish","year":"2019","unstructured":"Ashish Gondimalla, Noah Chesnut, Mithuna Thottethodi, and T. N. Vijaykumar. 2019. SparTen: A sparse tensor accelerator for convolutional neural networks. In Proceedings of the 52nd Annual IEEE\/ACM International Symposium on Microarchitecture, MICRO 2019, Columbus, OH, USA, October 12\u201316, 2019. ACM, 151\u2013165."},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2021.3129615"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2020.3046762"},{"key":"e_1_3_1_16_2","volume-title":"4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2\u20134, 2016, Conference Track Proceedings","author":"Han Song","year":"2016","unstructured":"Song Han, Huizi Mao, and William J. Dally. 2016. Deep compression: Compressing deep neural network with pruning, trained quantization and Huffman coding. 
In 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2\u20134, 2016, Conference Track Proceedings, Yoshua Bengio and Yann LeCun (Eds.)."},{"key":"e_1_3_1_17_2","first-page":"770","volume-title":"2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27\u201330, 2016","author":"He Kaiming","year":"2016","unstructured":"Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27\u201330, 2016. IEEE Computer Society, 770\u2013778."},{"key":"e_1_3_1_18_2","first-page":"4174","volume-title":"The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, New York, NY, USA, February 7\u201312, 2020","author":"Huang Di","year":"2020","unstructured":"Di Huang, Xishan Zhang, Rui Zhang, Tian Zhi, Deyuan He, Jiaming Guo, Chang Liu, Qi Guo, Zidong Du, Shaoli Liu, Tianshi Chen, and Yunji Chen. 2020. DWM: A decomposable Winograd method for convolution acceleration. In The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, New York, NY, USA, February 7\u201312, 2020. AAAI Press, 4174\u20134181."},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1145\/3564606"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/TVLSI.2019.2941250"},{"key":"e_1_3_1_21_2","first-page":"1106","volume-title":"Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012. 
Proceedings of a meeting held December 3-6, 2012, Lake Tahoe, Nevada, United States","author":"Krizhevsky Alex","year":"2012","unstructured":"Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012. Proceedings of a meeting held December 3-6, 2012, Lake Tahoe, Nevada, United States, Peter L. Bartlett, Fernando C. N. Pereira, Christopher J. C. Burges, L\u00e9on Bottou, and Kilian Q. Weinberger (Eds.). 1106\u20131114."},{"key":"e_1_3_1_22_2","series-title":"Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event","first-page":"5544","volume":"119","author":"Kusupati Aditya","year":"2020","unstructured":"Aditya Kusupati, Vivek Ramanujan, Raghav Somani, Mitchell Wortsman, Prateek Jain, Sham M. Kakade, and Ali Farhadi. 2020. Soft threshold weight reparameterization for learnable sparsity. In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event(Proceedings of Machine Learning Research, Vol. 119). PMLR, 5544\u20135555."},{"key":"e_1_3_1_23_2","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Lavin Andrew","year":"2016","unstructured":"Andrew Lavin and Scott Gray. 2016. Fast algorithms for convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)."},{"key":"e_1_3_1_24_2","volume-title":"5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24\u201326, 2017, Conference Track Proceedings","author":"Li Hao","year":"2017","unstructured":"Hao Li, Asim Kadav, Igor Durdanovic, Hanan Samet, and Hans Peter Graf. 2017. Pruning filters for efficient ConvNets. 
In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24\u201326, 2017, Conference Track Proceedings. OpenReview.net."},{"key":"e_1_3_1_25_2","series-title":"Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event","first-page":"5863","volume":"119","author":"Li Shiyu","year":"2020","unstructured":"Shiyu Li, Edward Hanson, Hai Li, and Yiran Chen. 2020. PENNI: Pruned kernel sharing for efficient CNN inference. In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event(Proceedings of Machine Learning Research, Vol. 119). PMLR, 5863\u20135873."},{"key":"e_1_3_1_26_2","first-page":"992","volume-title":"MICRO \u201921: 54th Annual IEEE\/ACM International Symposium on Microarchitecture, Virtual Event, Greece, October 18\u201322, 2021","author":"Li Shiyu","year":"2021","unstructured":"Shiyu Li, Edward Hanson, Xuehai Qian, Hai (Helen) Li, and Yiran Chen. 2021. ESCALATE: Boosting the efficiency of sparse CNN accelerator with kernel decomposition. In MICRO \u201921: 54th Annual IEEE\/ACM International Symposium on Microarchitecture, Virtual Event, Greece, October 18\u201322, 2021. ACM, 992\u20131004."},{"key":"e_1_3_1_27_2","article-title":"Enabling sparse Winograd convolution by native pruning","volume":"1702","author":"Li Sheng R.","year":"2017","unstructured":"Sheng R. Li, Jongsoo Park, and Ping Tak Peter Tang. 2017. Enabling sparse Winograd convolution by native pruning. CoRR abs\/1702.08597 (2017). arXiv:1702.08597","journal-title":"CoRR"},{"key":"e_1_3_1_28_2","first-page":"258","volume-title":"32nd IEEE International Conference on Application-specific Systems, Architectures and Processors, ASAP 2021, Virtual Conference, USA, July 7\u20139, 2021","author":"Liu Xinheng","year":"2021","unstructured":"Xinheng Liu, Yao Chen, Cong Hao, Ashutosh Dhar, and Deming Chen. 2021. 
WinoCNN: Kernel sharing Winograd systolic array for efficient convolutional neural network acceleration on FPGAs. In 32nd IEEE International Conference on Application-specific Systems, Architectures and Processors, ASAP 2021, Virtual Conference, USA, July 7\u20139, 2021. IEEE, 258\u2013265."},{"key":"e_1_3_1_29_2","first-page":"1","volume-title":"Proc. Convolutional Neural Netw. Vis. Recognit.","author":"Liu Xingyu","year":"2016","unstructured":"Xingyu Liu and Yatish Turakhia. 2016. Pruning of Winograd and FFT based convolution algorithm. In Proc. Convolutional Neural Netw. Vis. Recognit.1\u20137."},{"key":"e_1_3_1_30_2","first-page":"2755","volume-title":"IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22\u201329, 2017","author":"Liu Zhuang","year":"2017","unstructured":"Zhuang Liu, Jianguo Li, Zhiqiang Shen, Gao Huang, Shoumeng Yan, and Changshui Zhang. 2017. Learning efficient convolutional networks through network slimming. In IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22\u201329, 2017. IEEE Computer Society, 2755\u20132763."},{"key":"e_1_3_1_31_2","first-page":"1","volume-title":"2018 55th ACM\/ESDA\/IEEE Design Automation Conference (DAC)","author":"Lu Liqiang","year":"2018","unstructured":"Liqiang Lu and Yun Liang. 2018. SpWA: An efficient sparse Winograd convolutional neural networks accelerator on FPGAs. In 2018 55th ACM\/ESDA\/IEEE Design Automation Conference (DAC). 1\u20136."},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2017.64"},{"key":"e_1_3_1_33_2","first-page":"1","article-title":"Non-structured DNN weight pruning\u2013is it beneficial in any platform?","author":"Ma Xiaolong","year":"2021","unstructured":"Xiaolong Ma, Sheng Lin, Shaokai Ye, Zhezhi He, Linfeng Zhang, Geng Yuan, Sia Huat Tan, Zhengang Li, Deliang Fan, Xuehai Qian, Xue Lin, Kaisheng Ma, and Yanzhi Wang. 2021. Non-structured DNN weight pruning\u2013is it beneficial in any platform? 
IEEE Transactions on Neural Networks and Learning Systems (2021), 1\u201315.","journal-title":"IEEE Transactions on Neural Networks and Learning Systems"},{"key":"e_1_3_1_34_2","doi-asserted-by":"crossref","first-page":"327","DOI":"10.1109\/IMCSIT.2010.5680039","volume-title":"Proceedings of the International Multiconference on Computer Science and Information Technology","author":"Martone Michele","year":"2010","unstructured":"Michele Martone, Salvatore Filippone, Salvatore Tucci, Pawe\u0142 Gepner, and Marcin Paprzycki. 2010. Use of hybrid recursive CSR\/COO data structures in sparse matrix-vector multiplication. In Proceedings of the International Multiconference on Computer Science and Information Technology. 327\u2013335. DOI:10.1109\/IMCSIT.2010.5680039"},{"key":"e_1_3_1_35_2","first-page":"751","volume-title":"2020 53rd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO)","author":"Mo Huiyu","year":"2020","unstructured":"Huiyu Mo, Leibo Liu, Wenjing Hu, Wenping Zhu, Qiang Li, Ang Li, Shouyi Yin, Jian Chen, Xiaowei Jiang, and Shaojun Wei. 2020. TFE: Energy-efficient transferred filter-based engine to compress and accelerate convolutional neural networks. In 2020 53rd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO). 751\u2013765."},{"key":"e_1_3_1_36_2","series-title":"Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems","first-page":"907","author":"Niu Wei","year":"2020","unstructured":"Wei Niu, Xiaolong Ma, Sheng Lin, Shihao Wang, Xuehai Qian, Xue Lin, Yanzhi Wang, and Bin Ren. 2020. PatDNN: Achieving real-time DNN execution on mobile devices with pattern-based weight pruning. In Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems (Lausanne, Switzerland) (ASPLOS \u201920). 
Association for Computing Machinery, New York, NY, USA, 907\u2013922."},{"key":"e_1_3_1_37_2","first-page":"27","volume-title":"Proceedings of the 44th Annual International Symposium on Computer Architecture, ISCA 2017, Toronto, ON, Canada, June 24\u201328, 2017","author":"Parashar Angshuman","year":"2017","unstructured":"Angshuman Parashar, Minsoo Rhu, Anurag Mukkara, Antonio Puglielli, Rangharajan Venkatesan, Brucek Khailany, Joel S. Emer, Stephen W. Keckler, and William J. Dally. 2017. SCNN: An accelerator for compressed-sparse convolutional neural networks. In Proceedings of the 44th Annual International Symposium on Computer Architecture, ISCA 2017, Toronto, ON, Canada, June 24\u201328, 2017. ACM, 27\u201340."},{"key":"e_1_3_1_38_2","article-title":"Faster CNNs with direct sparse convolutions and guided pruning","author":"Park Jongsoo","year":"2016","unstructured":"Jongsoo Park, Sheng Li, Wei Wen, Ping Tak Peter Tang, Hai Li, Yiran Chen, and Pradeep Dubey. 2016. Faster CNNs with direct sparse convolutions and guided pruning. arXiv preprint arXiv:1608.01409 (2016).","journal-title":"arXiv preprint arXiv:1608.01409"},{"key":"e_1_3_1_39_2","volume-title":"3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7\u20139, 2015, Conference Track Proceedings","author":"Simonyan Karen","year":"2015","unstructured":"Karen Simonyan and Andrew Zisserman. 2015. Very deep convolutional networks for large-scale image recognition. 
In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7\u20139, 2015, Conference Track Proceedings, Yoshua Bengio and Yann LeCun (Eds.)."},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-06486-4_7"},{"key":"e_1_3_1_41_2","first-page":"1448","volume-title":"IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2019, Brighton, United Kingdom, May 12-17, 2019","author":"Wang Haonan","year":"2019","unstructured":"Haonan Wang, Wenjian Liu, Tianyi Xu, Jun Lin, and Zhongfeng Wang. 2019. A low-latency sparse-Winograd accelerator for convolutional neural networks. In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2019, Brighton, United Kingdom, May 12-17, 2019. IEEE, 1448\u20131452. DOI:10.1109\/ICASSP.2019.8683512"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA52012.2021.00088"},{"key":"e_1_3_1_43_2","first-page":"29:1\u201329:6","volume-title":"Proceedings of the 54th Annual Design Automation Conference, DAC 2017, Austin, TX, USA, June 18\u201322, 2017","author":"Wei Xuechao","year":"2017","unstructured":"Xuechao Wei, Cody Hao Yu, Peng Zhang, Youxiang Chen, Yuxin Wang, Han Hu, Yun Liang, and Jason Cong. 2017. Automated systolic array architecture synthesis for high throughput CNN inference on FPGAs. In Proceedings of the 54th Annual Design Automation Conference, DAC 2017, Austin, TX, USA, June 18\u201322, 2017. ACM, 29:1\u201329:6."},{"issue":"5","key":"e_1_3_1_44_2","doi-asserted-by":"crossref","first-page":"936","DOI":"10.1109\/TVLSI.2021.3060041","article-title":"SWM: A high-performance sparse-Winograd matrix multiplication CNN accelerator","volume":"29","author":"Wu Di","year":"2021","unstructured":"Di Wu, Xitian Fan, Wei Cao, and Lingli Wang. 2021. SWM: A high-performance sparse-Winograd matrix multiplication CNN accelerator. 
IEEE Transactions on Very Large Scale Integration (VLSI) Systems 29, 5 (2021), 936\u2013949.","journal-title":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems"},{"key":"e_1_3_1_45_2","first-page":"570","volume-title":"2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA)","author":"Xie Xinfeng","year":"2021","unstructured":"Xinfeng Xie, Zheng Liang, Peng Gu, Abanti Basak, Lei Deng, Ling Liang, Xing Hu, and Yuan Xie. 2021. SpaceA: Sparse matrix vector multiplication on processing-in-memory accelerator. In 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA). 570\u2013583."},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1145\/3600092"},{"issue":"4","key":"e_1_3_1_47_2","first-page":"18:1\u201318:28","article-title":"BISWSRBS: A Winograd-based CNN accelerator with a fine-grained regular sparsity pattern and mixed precision quantization","volume":"14","author":"Yang Tao","year":"2021","unstructured":"Tao Yang, Zhezhi He, Tengchuan Kou, Qingzheng Li, Qi Han, Haibao Yu, Fangxin Liu, Yun Liang, and Li Jiang. 2021. BISWSRBS: A Winograd-based CNN accelerator with a fine-grained regular sparsity pattern and mixed precision quantization. ACM Trans. Reconfigurable Technol. Syst. 14, 4 (2021), 18:1\u201318:28.","journal-title":"ACM Trans. Reconfigurable Technol. Syst."},{"key":"e_1_3_1_48_2","doi-asserted-by":"crossref","first-page":"254","DOI":"10.1109\/FPL50879.2020.00050","volume-title":"2020 30th International Conference on Field-Programmable Logic and Applications (FPL)","author":"Yang Tao","year":"2020","unstructured":"Tao Yang, Yunkun Liao, Jianping Shi, Yun Liang, Naifeng Jing, and Li Jiang. 2020. A Winograd-based CNN accelerator with a fine-grained regular sparsity pattern. In 2020 30th International Conference on Field-Programmable Logic and Applications (FPL). 
254\u2013261."},{"key":"e_1_3_1_49_2","article-title":"Spatial-Winograd pruning enabling sparse Winograd convolution","volume":"1901","author":"Yu Jiecao","year":"2019","unstructured":"Jiecao Yu, Jongsoo Park, and Maxim Naumov. 2019. Spatial-Winograd pruning enabling sparse Winograd convolution. CoRR abs\/1901.02132 (2019). arXiv:1901.02132","journal-title":"CoRR"},{"issue":"11","key":"e_1_3_1_50_2","doi-asserted-by":"crossref","first-page":"2072","DOI":"10.1109\/TCAD.2017.2785257","article-title":"Caffeine: Toward uniformed representation and acceleration for deep convolutional neural networks","volume":"38","author":"Zhang Chen","year":"2019","unstructured":"Chen Zhang, Guangyu Sun, Zhenman Fang, Peipei Zhou, Peichen Pan, and Jason Cong. 2019. Caffeine: Toward uniformed representation and acceleration for deep convolutional neural networks. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 38, 11 (2019), 2072\u20132085.","journal-title":"IEEE Trans. Comput. Aided Des. Integr. Circuits Syst."},{"key":"e_1_3_1_51_2","first-page":"20:1\u201320:12","volume-title":"49th Annual IEEE\/ACM International Symposium on Microarchitecture, MICRO 2016, Taipei, Taiwan, October 15\u201319, 2016","author":"Zhang Shijin","year":"2016","unstructured":"Shijin Zhang, Zidong Du, Lei Zhang, Huiying Lan, Shaoli Liu, Ling Li, Qi Guo, Tianshi Chen, and Yunji Chen. 2016. Cambricon-X: An accelerator for sparse neural networks. In 49th Annual IEEE\/ACM International Symposium on Microarchitecture, MICRO 2016, Taipei, Taiwan, October 15\u201319, 2016. IEEE Computer Society, 20:1\u201320:12."},{"key":"e_1_3_1_52_2","series-title":"Computer Vision - ECCV 2018-15th European Conference, Munich, Germany, September 8\u201314, 2018, Proceedings, Part VIII","first-page":"191","volume":"11212","author":"Zhang Tianyun","year":"2018","unstructured":"Tianyun Zhang, Shaokai Ye, Kaiqi Zhang, Jian Tang, Wujie Wen, Makan Fardad, and Yanzhi Wang. 2018. 
A systematic DNN weight pruning framework using alternating direction method of multipliers. In Computer Vision - ECCV 2018-15th European Conference, Munich, Germany, September 8\u201314, 2018, Proceedings, Part VIII(Lecture Notes in Computer Science, Vol. 11212), Vittorio Ferrari, Martial Hebert, Cristian Sminchisescu, and Yair Weiss (Eds.). Springer, 191\u2013207."},{"key":"e_1_3_1_53_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2019.2912916"},{"key":"e_1_3_1_54_2","first-page":"15","volume-title":"51st Annual IEEE\/ACM International Symposium on Microarchitecture, MICRO 2018, Fukuoka, Japan, October 20\u201324, 2018","author":"Zhou Xuda","year":"2018","unstructured":"Xuda Zhou, Zidong Du, Qi Guo, Shaoli Liu, Chengsi Liu, Chao Wang, Xuehai Zhou, Ling Li, Tianshi Chen, and Yunji Chen. 2018. Cambricon-S: Addressing irregularity in sparse neural networks through a cooperative software\/hardware approach. In 51st Annual IEEE\/ACM International Symposium on Microarchitecture, MICRO 2018, Fukuoka, Japan, October 20\u201324, 2018. 
IEEE Computer Society, 15\u201328."},{"key":"e_1_3_1_55_2","doi-asserted-by":"publisher","DOI":"10.1109\/TVLSI.2020.3002779"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3643682","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3643682","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T00:05:33Z","timestamp":1750291533000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3643682"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,3,23]]},"references-count":54,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2024,6,30]]}},"alternative-id":["10.1145\/3643682"],"URL":"https:\/\/doi.org\/10.1145\/3643682","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"value":"1544-3566","type":"print"},{"value":"1544-3973","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,3,23]]},"assertion":[{"value":"2023-09-13","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-01-18","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-03-23","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}