{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,10]],"date-time":"2026-01-10T00:10:16Z","timestamp":1768003816563,"version":"3.49.0"},"publisher-location":"New York, NY, USA","reference-count":47,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,8,9]],"date-time":"2021-08-09T00:00:00Z","timestamp":1628467200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Science Fund for Creative Research Groups of the National Natural Science Foundation of China","award":["61521092"],"award-info":[{"award-number":["61521092"]}]},{"name":"National Key R&D Program of China","award":["2017YFB1003103"],"award-info":[{"award-number":["2017YFB1003103"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,8,9]]},"DOI":"10.1145\/3472456.3472464","type":"proceedings-article","created":{"date-parts":[[2021,10,5]],"date-time":"2021-10-05T18:46:04Z","timestamp":1633459564000},"page":"1-11","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":14,"title":["LoWino: Towards Efficient Low-Precision Winograd Convolutions on Modern CPUs"],"prefix":"10.1145","author":[{"given":"Guangli","family":"Li","sequence":"first","affiliation":[{"name":"Institute of Computing Technology, Chinese Academy of Sciences and University of Chinese Academy of Sciences, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zhen","family":"Jia","sequence":"additional","affiliation":[{"name":"Amazon, United States of America"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xiaobing","family":"Feng","sequence":"additional","affiliation":[{"name":"Institute of Computing Technology, Chinese Academy of Sciences and University of Chinese Academy of Sciences, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yida","family":"Wang","sequence":"additional","affiliation":[{"name":"Amazon, United States of America"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2021,10,5]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/3412380"},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/2996864"},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1631\/FITEE.1700789"},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCVW.2019.00363"},{"key":"e_1_3_2_1_5_1","unstructured":"Matthieu Courbariaux Itay Hubara Daniel Soudry Ran El-Yaniv and Yoshua Bengio. 2016. Binarized neural networks: Training deep neural networks with weights and activations constrained to+ 1 or-1. arXiv preprint arXiv:1602.02830(2016).  Matthieu Courbariaux Itay Hubara Daniel Soudry Ran El-Yaniv and Yoshua Bengio. 2016. Binarized neural networks: Training deep neural networks with weights and activations constrained to+ 1 or-1. arXiv preprint arXiv:1602.02830(2016)."},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_3_2_1_7_1","volume-title":"Proceedings of Machine Learning and Systems. 1\u201316","author":"Fern\u00e1ndez-Marqu\u00e9s Javier","year":"2020","unstructured":"Javier Fern\u00e1ndez-Marqu\u00e9s , Paul\u00a0 N. Whatmough , Andrew Mundy , and Matthew Mattina . 2020 . Searching for Winograd-aware Quantized Networks . In Proceedings of Machine Learning and Systems. 1\u201316 . Javier Fern\u00e1ndez-Marqu\u00e9s, Paul\u00a0N. Whatmough, Andrew Mundy, and Matthew Mattina. 2020. Searching for Winograd-aware Quantized Networks. In Proceedings of Machine Learning and Systems. 1\u201316."},{"key":"e_1_3_2_1_8_1","unstructured":"Song Han Huizi Mao and William\u00a0J Dally. 2015. Deep compression: Compressing deep neural networks with pruning trained quantization and huffman coding. arXiv preprint arXiv:1510.00149(2015).  Song Han Huizi Mao and William\u00a0J Dally. 2015. Deep compression: Compressing deep neural networks with pruning trained quantization and huffman coding. arXiv preprint arXiv:1510.00149(2015)."},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i04.5838"},{"key":"e_1_3_2_1_11_1","volume-title":"Retrieved","year":"2021","unstructured":"Intel. 2021 . Intrinsics Guide . Retrieved March 29, 2021 from https:\/\/software.intel.com\/sites\/landingpage\/IntrinsicsGuide\/ Intel. 2021. Intrinsics Guide. Retrieved March 29, 2021 from https:\/\/software.intel.com\/sites\/landingpage\/IntrinsicsGuide\/"},{"key":"e_1_3_2_1_12_1","volume-title":"Retrieved","year":"2021","unstructured":"Intel. 2021 . Introduction to Intel Deep Learning Boost on Second Generation Intel Xeon Scalable Processors . Retrieved March 24, 2021 from https:\/\/software.intel.com\/content\/www\/us\/en\/develop\/articles\/introduction-to-intel-deep-learning-boost-on-second-generation-intel-xeon-scalable.html Intel. 2021. Introduction to Intel Deep Learning Boost on Second Generation Intel Xeon Scalable Processors. Retrieved March 24, 2021 from https:\/\/software.intel.com\/content\/www\/us\/en\/develop\/articles\/introduction-to-intel-deep-learning-boost-on-second-generation-intel-xeon-scalable.html"},{"key":"e_1_3_2_1_13_1","volume-title":"Retrieved","year":"2021","unstructured":"Intel. 2021 . oneAPI Deep Neural Network Library (oneDNN) . Retrieved February 27, 2021 from https:\/\/github.com\/oneapi-src\/oneDNN Intel. 2021. oneAPI Deep Neural Network Library (oneDNN). Retrieved February 27, 2021 from https:\/\/github.com\/oneapi-src\/oneDNN"},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00286"},{"key":"e_1_3_2_1_15_1","unstructured":"Animesh Jain Shoubhik Bhattacharya Masahiro Masuda Vin Sharma and Yida Wang. 2020. Efficient Execution of Quantized Deep Learning Models: A Compiler Approach. (2020). arxiv:2006.10226\u00a0[cs.DC]  Animesh Jain Shoubhik Bhattacharya Masahiro Masuda Vin Sharma and Yida Wang. 2020. Efficient Execution of Quantized Deep Learning Models: A Compiler Approach. (2020). arxiv:2006.10226\u00a0[cs.DC]"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2020.2973144"},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3178487.3178496"},{"key":"e_1_3_2_1_18_1","unstructured":"Zhen Jia Aleksandar Zlateski Fredo Durand and Kai Li. 2018. Towards Optimal Winograd Convolution on Manycores.  Zhen Jia Aleksandar Zlateski Fredo Durand and Kai Li. 2018. Towards Optimal Winograd Convolution on Manycores."},{"key":"e_1_3_2_1_19_1","unstructured":"Raghuraman Krishnamoorthi. 2018. Quantizing deep convolutional networks for efficient inference: A whitepaper. arXiv preprint arXiv:1806.08342(2018).  Raghuraman Krishnamoorthi. 2018. Quantizing deep convolutional networks for efficient inference: A whitepaper. arXiv preprint arXiv:1806.08342(2018)."},{"key":"e_1_3_2_1_20_1","first-page":"1097","article-title":"Imagenet classification with deep convolutional neural networks","volume":"25","author":"Krizhevsky Alex","year":"2012","unstructured":"Alex Krizhevsky , Ilya Sutskever , and Geoffrey\u00a0 E Hinton . 2012 . Imagenet classification with deep convolutional neural networks . Advances in Neural Information Processing Systems 25 (2012), 1097 \u2013 1105 . Alex Krizhevsky, Ilya Sutskever, and Geoffrey\u00a0E Hinton. 2012. Imagenet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems 25 (2012), 1097\u20131105.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_1_21_1","volume-title":"Information theory and statistics","author":"Kullback Solomon","unstructured":"Solomon Kullback . 1997. Information theory and statistics . Courier Corporation . Solomon Kullback. 1997. Information theory and statistics. Courier Corporation."},{"key":"e_1_3_2_1_22_1","volume-title":"Retrieved","author":"Lavin Andrew","year":"2021","unstructured":"Andrew Lavin . 2021 . wincnn . Retrieved February 27, 2021 from https:\/\/github.com\/andravin\/wincnn Andrew Lavin. 2021. wincnn. Retrieved February 27, 2021 from https:\/\/github.com\/andravin\/wincnn"},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.435"},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP40776.2020.9054562"},{"key":"e_1_3_2_1_25_1","volume-title":"Unleashing the Low-Precision Computation Potential of Tensor Cores on GPUs. In 2021 IEEE\/ACM International Symposium on Code Generation and Optimization. IEEE, 90\u2013102","author":"Li Guangli","year":"2021","unstructured":"Guangli Li , Jingling Xue , Lei Liu , Xueying Wang , Xiu Ma , Xiao Dong , Jiansong Li , and Xiaobing Feng . 2021 . Unleashing the Low-Precision Computation Potential of Tensor Cores on GPUs. In 2021 IEEE\/ACM International Symposium on Code Generation and Optimization. IEEE, 90\u2013102 . Guangli Li, Jingling Xue, Lei Liu, Xueying Wang, Xiu Ma, Xiao Dong, Jiansong Li, and Xiaobing Feng. 2021. Unleashing the Low-Precision Computation Potential of Tensor Cores on GPUs. In 2021 IEEE\/ACM International Symposium on Code Generation and Optimization. IEEE, 90\u2013102."},{"key":"e_1_3_2_1_26_1","volume-title":"2019 USENIX Annual Technical Conference (USENIX ATC 19)","author":"Liu Yizhi","year":"2019","unstructured":"Yizhi Liu , Yao Wang , Ruofei Yu , Mu Li , Vin Sharma , and Yida Wang . 2019 . Optimizing CNN model inference on cpus . In 2019 USENIX Annual Technical Conference (USENIX ATC 19) . 1025\u20131040. Yizhi Liu, Yao Wang, Ruofei Yu, Mu Li, Vin Sharma, and Yida Wang. 2019. Optimizing CNN model inference on cpus. In 2019 USENIX Annual Technical Conference (USENIX ATC 19). 1025\u20131040."},{"key":"e_1_3_2_1_27_1","volume-title":"2nd International Conference on Learning Representations.","author":"Mathieu Michael","year":"2014","unstructured":"Michael Mathieu , Mikael Henaff , and Yann LeCun . 2014 . Fast training of convolutional networks through FFTS: International Conference on Learning Representations . In 2nd International Conference on Learning Representations. Michael Mathieu, Mikael Henaff, and Yann LeCun. 2014. Fast training of convolutional networks through FFTS: International Conference on Learning Representations. In 2nd International Conference on Learning Representations."},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/3342195.3387549"},{"key":"e_1_3_2_1_29_1","unstructured":"Szymon Migacz. 2017. 8-bit inference with tensorrt. In GPU technology conference Vol.\u00a02. 5.  Szymon Migacz. 2017. 8-bit inference with tensorrt. In GPU technology conference Vol.\u00a02. 5."},{"key":"e_1_3_2_1_30_1","volume-title":"Retrieved","author":"NVIDIA.","year":"2021","unstructured":"NVIDIA. 2021 . CUDA C++ Programming Guide . Retrieved March 29, 2021 from https:\/\/docs.nvidia.com\/cuda\/cuda-c-programming-guide\/index.html NVIDIA. 2021. CUDA C++ Programming Guide. Retrieved March 29, 2021 from https:\/\/docs.nvidia.com\/cuda\/cuda-c-programming-guide\/index.html"},{"key":"e_1_3_2_1_31_1","volume-title":"Computer Vision - ECCV 2018 - 15th European Conference. 608\u2013624.","author":"Park Eunhyeok","unstructured":"Eunhyeok Park , Sungjoo Yoo , and Peter Vajda . 2018. Value-Aware Quantization for Training and Inference of Neural Networks . In Computer Vision - ECCV 2018 - 15th European Conference. 608\u2013624. Eunhyeok Park, Sungjoo Yoo, and Peter Vajda. 2018. Value-Aware Quantization for Training and Inference of Neural Networks. In Computer Vision - ECCV 2018 - 15th European Conference. 608\u2013624."},{"key":"e_1_3_2_1_32_1","volume-title":"PyTorch: An Imperative Style","author":"Paszke Adam","unstructured":"Adam Paszke , Sam Gross , Francisco Massa , Adam Lerer , James Bradbury , Gregory Chanan , Trevor Killeen , Zeming Lin , Natalia Gimelshein , Luca Antiga , Alban Desmaison , Andreas K\u00f6pf , Edward Yang , Zachary DeVito , Martin Raison , Alykhan Tejani , Sasank Chilamkurthy , Benoit Steiner , Lu Fang , Junjie Bai , and Soumith Chintala . 2019. PyTorch: An Imperative Style , High-Performance Deep Learning Library . In Advances in Neural Information Processing Systems. 8024\u20138035. Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas K\u00f6pf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. 2019. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Advances in Neural Information Processing Systems. 8024\u20138035."},{"key":"e_1_3_2_1_33_1","volume-title":"Fusionnet: A deep fully residual convolutional neural network for image segmentation in connectomics. arXiv preprint arXiv:1612.05360(2016).","author":"Quan Tran\u00a0Minh","year":"2016","unstructured":"Tran\u00a0Minh Quan , David\u00a0 GC Hildebrand , and Won-Ki Jeong . 2016 . Fusionnet: A deep fully residual convolutional neural network for image segmentation in connectomics. arXiv preprint arXiv:1612.05360(2016). Tran\u00a0Minh Quan, David\u00a0GC Hildebrand, and Won-Ki Jeong. 2016. Fusionnet: A deep fully residual convolutional neural network for image segmentation in connectomics. arXiv preprint arXiv:1612.05360(2016)."},{"key":"e_1_3_2_1_34_1","unstructured":"Joseph Redmon and Ali Farhadi. 2018. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767(2018).  Joseph Redmon and Ali Farhadi. 2018. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767(2018)."},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"e_1_3_2_1_36_1","volume-title":"Very Deep Convolutional Networks for Large-Scale Image Recognition. In 3rd International Conference on Learning Representations.","author":"Simonyan Karen","year":"2015","unstructured":"Karen Simonyan and Andrew Zisserman . 2015 . Very Deep Convolutional Networks for Large-Scale Image Recognition. In 3rd International Conference on Learning Representations. Karen Simonyan and Andrew Zisserman. 2015. Very Deep Convolutional Networks for Large-Scale Image Recognition. In 3rd International Conference on Learning Representations."},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"e_1_3_2_1_38_1","volume-title":"Retrieved","year":"2021","unstructured":"Tencent. 2021 . ncnn . Retrieved February 27, 2021 from https:\/\/github.com\/Tencent\/ncnn Tencent. 2021. ncnn. Retrieved February 27, 2021 from https:\/\/github.com\/Tencent\/ncnn"},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/2807591.2807631"},{"key":"e_1_3_2_1_40_1","volume-title":"UNIT: Unifying Tensorized Instruction Compilation. In 2021 IEEE\/ACM International Symposium on Code Generation and Optimization. IEEE, 77\u201389","author":"Weng Jian","year":"2021","unstructured":"Jian Weng , Animesh Jain , Jie Wang , Leyuan Wang , Yida Wang , and Tony Nowatzki . 2021 . UNIT: Unifying Tensorized Instruction Compilation. In 2021 IEEE\/ACM International Symposium on Code Generation and Optimization. IEEE, 77\u201389 . Jian Weng, Animesh Jain, Jie Wang, Leyuan Wang, Yida Wang, and Tony Nowatzki. 2021. UNIT: Unifying Tensorized Instruction Compilation. In 2021 IEEE\/ACM International Symposium on Code Generation and Optimization. IEEE, 77\u201389."},{"key":"e_1_3_2_1_41_1","doi-asserted-by":"crossref","unstructured":"Shmuel Winograd. 1980. Arithmetic complexity of computations. Vol.\u00a033. Siam.  Shmuel Winograd. 1980. Arithmetic complexity of computations. Vol.\u00a033. Siam.","DOI":"10.1137\/1.9781611970364"},{"key":"e_1_3_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3332466.3374520"},{"key":"e_1_3_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/3123266.3129393"},{"key":"e_1_3_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICME.2019.00306"},{"key":"e_1_3_2_1_45_1","unstructured":"Zhewei Yao Zhen Dong Zhangcheng Zheng Amir Gholami Jiali Yu Eric Tan Leyuan Wang Qijing Huang Yida Wang Michael\u00a0W Mahoney 2020. HAWQV3: Dyadic Neural Network Quantization. arXiv preprint arXiv:2011.10680(2020).  Zhewei Yao Zhen Dong Zhangcheng Zheng Amir Gholami Jiali Yu Eric Tan Leyuan Wang Qijing Huang Yida Wang Michael\u00a0W Mahoney 2020. HAWQV3: Dyadic Neural Network Quantization. arXiv preprint arXiv:2011.10680(2020)."},{"key":"e_1_3_2_1_46_1","unstructured":"Aleksandar Zlateski Zhen Jia Kai Li and Fredo Durand. 2018. A Deeper Look at FFT and Winograd Convolutions.  Aleksandar Zlateski Zhen Jia Kai Li and Fredo Durand. 2018. A Deeper Look at FFT and Winograd Convolutions."},{"key":"e_1_3_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/3330345.3330382"}],"event":{"name":"ICPP 2021: 50th International Conference on Parallel Processing","location":"Lemont IL USA","acronym":"ICPP 2021"},"container-title":["50th International Conference on Parallel Processing"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3472456.3472464","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3472456.3472464","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:48:11Z","timestamp":1750193291000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3472456.3472464"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,8,9]]},"references-count":47,"alternative-id":["10.1145\/3472456.3472464","10.1145\/3472456"],"URL":"https:\/\/doi.org\/10.1145\/3472456.3472464","relation":{},"subject":[],"published":{"date-parts":[[2021,8,9]]},"assertion":[{"value":"2021-10-05","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}