{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,28]],"date-time":"2026-07-28T10:48:26Z","timestamp":1785235706680,"version":"3.55.0"},"reference-count":60,"publisher":"Association for Computing Machinery (ACM)","issue":"5s","license":[{"start":{"date-parts":[[2023,9,9]],"date-time":"2023-09-09T00:00:00Z","timestamp":1694217600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Embed. Comput. Syst."],"published-print":{"date-parts":[[2023,10,31]]},"abstract":"<jats:p>Convolutional Neural Networks (CNNs) have demonstrated remarkable performance across a wide range of machine learning tasks. However, the high accuracy usually comes at the cost of substantial computation and energy consumption, making it difficult to be deployed on mobile and embedded devices. In CNNs, the compute-intensive convolutional layers are usually followed by a ReLU activation layer, which clamps negative outputs to zeros, resulting in large activation sparsity. By exploiting such sparsity in CNN models, we propose a software-hardware co-design BitSET, that aggressively saves energy during CNN inference. The bit-serial BitSET accelerator adopts a prediction-based bit-level early termination technique that terminates the ineffectual computation of negative outputs early. To assist the algorithm, we propose a novel weight encoding that allows more accurate predictions with fewer bits. BitSET leverages the bit-level computation reduction both in the predictive early termination algorithm and in the non-predictive, energy-efficient bit-serial architecture. Compared to UNPU, an energy-efficient bit-serial CNN accelerator, BitSET yields an average 1.5\u00d7 speedup and 1.4\u00d7 energy efficiency improvement with no accuracy loss due to a 48% reduction in bit-level computations. Relaxing the allowed accuracy loss to 1% increases the gains to an average of 1.6\u00d7 speedup and 1.4\u00d7 energy efficiency improvement.<\/jats:p>","DOI":"10.1145\/3609093","type":"journal-article","created":{"date-parts":[[2023,9,9]],"date-time":"2023-09-09T13:33:18Z","timestamp":1694266398000},"page":"1-24","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":15,"title":["BitSET: Bit-Serial Early Termination for Computation Reduction in Convolutional Neural Networks"],"prefix":"10.1145","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0009-0002-9351-431X","authenticated-orcid":false,"given":"Yunjie","family":"Pan","sequence":"first","affiliation":[{"name":"University of Michigan, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2085-0312","authenticated-orcid":false,"given":"Jiecao","family":"Yu","sequence":"additional","affiliation":[{"name":"Facebook Inc., USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-1167-3999","authenticated-orcid":false,"given":"Andrew","family":"Lukefahr","sequence":"additional","affiliation":[{"name":"Indiana University, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5894-8342","authenticated-orcid":false,"given":"Reetuparna","family":"Das","sequence":"additional","affiliation":[{"name":"University of Michigan, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0438-0616","authenticated-orcid":false,"given":"Scott","family":"Mahlke","sequence":"additional","affiliation":[{"name":"University of Michigan, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2023,9,9]]},"reference":[{"key":"e_1_3_1_2_2","first-page":"662","volume-title":"ISCA\u201918","author":"Akhlaghi Vahideh","year":"2018","unstructured":"Vahideh Akhlaghi, Amir Yazdanbakhsh, Kambiz Samadi, Rajesh K. Gupta, and Hadi Esmaeilzadeh. 2018. Snapea: Predictive early activation for reducing computation in deep convolutional neural networks. In ISCA\u201918. IEEE, 662\u2013673."},{"key":"e_1_3_1_3_2","first-page":"382","volume-title":"MICRO\u201917","author":"Albericio Jorge","year":"2017","unstructured":"Jorge Albericio, Alberto Delm\u00e1s, Patrick Judd, Sayeh Sharify, Gerard O\u2019Leary, Roman Genov, and Andreas Moshovos. 2017. Bit-pragmatic deep neural network computing. In MICRO\u201917. 382\u2013394."},{"key":"e_1_3_1_4_2","doi-asserted-by":"crossref","first-page":"29","DOI":"10.1007\/978-3-642-25446-8_4","volume-title":"International Workshop on Human Behavior Understanding","author":"Baccouche Moez","year":"2011","unstructured":"Moez Baccouche, Franck Mamalet, Christian Wolf, Christophe Garcia, and Atilla Baskurt. 2011. Sequential deep learning for human action recognition. In International Workshop on Human Behavior Understanding. Springer, 29\u201339."},{"key":"e_1_3_1_5_2","first-page":"527","volume-title":"International Conference on Machine Learning","author":"Bolukbasi Tolga","year":"2017","unstructured":"Tolga Bolukbasi, Joseph Wang, Ofer Dekel, and Venkatesh Saligrama. 2017. Adaptive neural networks for efficient inference. In International Conference on Machine Learning. PMLR, 527\u2013536."},{"issue":"1","key":"e_1_3_1_6_2","first-page":"127","article-title":"Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks","volume":"52","author":"Chen Yu-Hsin","year":"2016","unstructured":"Yu-Hsin Chen, Tushar Krishna, Joel S. Emer, and Vivienne Sze. 2016. Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks. JSSC 52, 1 (2016), 127\u2013138.","journal-title":"JSSC"},{"key":"e_1_3_1_7_2","article-title":"Binarized neural networks: Training deep neural networks with weights and activations constrained to+ 1 or-1","author":"Courbariaux Matthieu","year":"2016","unstructured":"Matthieu Courbariaux, Itay Hubara, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio. 2016. Binarized neural networks: Training deep neural networks with weights and activations constrained to+ 1 or-1. arXiv preprint arXiv:1602.02830 (2016).","journal-title":"arXiv preprint arXiv:1602.02830"},{"key":"e_1_3_1_8_2","first-page":"749","volume-title":"ASPLOS\u201919","author":"Lascorz Alberto Delmas","year":"2019","unstructured":"Alberto Delmas Lascorz, Patrick Judd, Dylan Malone Stuart, Zissis Poulos, Mostafa Mahmoud, Sayeh Sharify, Milos Nikolic, Kevin Siu, and Andreas Moshovos. 2019. Bit-tactical: A software\/hardware approach to exploiting value and bit sparsity in neural networks. In ASPLOS\u201919. 749\u2013763."},{"key":"e_1_3_1_9_2","first-page":"5840","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Dong Xuanyi","year":"2017","unstructured":"Xuanyi Dong, Junshi Huang, Yi Yang, and Shuicheng Yan. 2017. More is less: A more complicated network with less inference complexity. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 5840\u20135848."},{"key":"e_1_3_1_10_2","doi-asserted-by":"crossref","first-page":"161","DOI":"10.1145\/2554688.2554785","volume-title":"Proceedings of the 2014 ACM\/SIGDA International Symposium on Field-programmable Gate Arrays","author":"Dorrance Richard","year":"2014","unstructured":"Richard Dorrance, Fengbo Ren, and Dejan Markovi\u0107. 2014. A scalable sparse matrix-vector multiplication kernel for energy-efficient sparse-blas on FPGAs. In Proceedings of the 2014 ACM\/SIGDA International Symposium on Field-programmable Gate Arrays. 161\u2013170."},{"key":"e_1_3_1_11_2","volume-title":"Deep Learning","author":"Goodfellow Ian","year":"2016","unstructured":"Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. MIT press."},{"key":"e_1_3_1_12_2","article-title":"Hardware-oriented approximation of convolutional neural networks","author":"Gysel Philipp","year":"2016","unstructured":"Philipp Gysel, Mohammad Motamedi, and Soheil Ghiasi. 2016. Hardware-oriented approximation of convolutional neural networks. arXiv preprint arXiv:1604.03168 (2016).","journal-title":"arXiv preprint arXiv:1604.03168"},{"issue":"3","key":"e_1_3_1_13_2","doi-asserted-by":"crossref","first-page":"243","DOI":"10.1145\/3007787.3001163","article-title":"EIE: Efficient inference engine on compressed deep neural network","volume":"44","author":"Han Song","year":"2016","unstructured":"Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark A. Horowitz, and William J. Dally. 2016. EIE: Efficient inference engine on compressed deep neural network. ACM SIGARCH Computer Architecture News 44, 3 (2016), 243\u2013254.","journal-title":"ACM SIGARCH Computer Architecture News"},{"key":"e_1_3_1_14_2","article-title":"Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding","author":"Han Song","year":"2015","unstructured":"Song Han, Huizi Mao, and William J. Dally. 2015. Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. arXiv preprint arXiv:1510.00149 (2015).","journal-title":"arXiv preprint arXiv:1510.00149"},{"key":"e_1_3_1_15_2","first-page":"2961","volume-title":"Proceedings of the IEEE International Conference on Computer Vision","author":"He Kaiming","year":"2017","unstructured":"Kaiming He, Georgia Gkioxari, Piotr Doll\u00e1r, and Ross Girshick. 2017. Mask r-cnn. In Proceedings of the IEEE International Conference on Computer Vision. 2961\u20132969."},{"key":"e_1_3_1_16_2","first-page":"770","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"He Kaiming","year":"2016","unstructured":"Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 770\u2013778."},{"issue":"1","key":"e_1_3_1_17_2","first-page":"10882","article-title":"Sparsity in deep learning: Pruning and growth for efficient inference and training in neural networks","volume":"22","author":"Hoefler Torsten","year":"2021","unstructured":"Torsten Hoefler, Dan Alistarh, Tal Ben-Nun, Nikoli Dryden, and Alexandra Peste. 2021. Sparsity in deep learning: Pruning and growth for efficient inference and training in neural networks. The Journal of Machine Learning Research 22, 1 (2021), 10882\u201311005.","journal-title":"The Journal of Machine Learning Research"},{"key":"e_1_3_1_18_2","first-page":"10","volume-title":"ISSCC\u201914","author":"Horowitz Mark","year":"2014","unstructured":"Mark Horowitz. 2014. 1.1 computing\u2019s energy problem (and what we can do about it). In ISSCC\u201914. IEEE, 10\u201314."},{"key":"e_1_3_1_19_2","first-page":"1314","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision","author":"Howard Andrew","year":"2019","unstructured":"Andrew Howard, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang, Yukun Zhu, Ruoming Pang, Vijay Vasudevan, et\u00a0al. 2019. Searching for mobilenetv3. In Proceedings of the IEEE\/CVF International Conference on Computer Vision. 1314\u20131324."},{"key":"e_1_3_1_20_2","article-title":"Batch normalization: Accelerating deep network training by reducing internal covariate shift","author":"Ioffe Sergey","year":"2015","unstructured":"Sergey Ioffe and Christian Szegedy. 2015. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167 (2015).","journal-title":"arXiv preprint arXiv:1502.03167"},{"key":"e_1_3_1_21_2","first-page":"1","volume-title":"MICRO\u201916","author":"Judd Patrick","year":"2016","unstructured":"Patrick Judd, Jorge Albericio, Tayler Hetherington, Tor M. Aamodt, and Andreas Moshovos. 2016. Stripes: Bit-serial deep neural network computing. In MICRO\u201916. IEEE, 1\u201312."},{"key":"e_1_3_1_22_2","first-page":"6399","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Kirillov Alexander","year":"2019","unstructured":"Alexander Kirillov, Ross Girshick, Kaiming He, and Piotr Doll\u00e1r. 2019. Panoptic feature pyramid networks. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 6399\u20136408."},{"key":"e_1_3_1_23_2","article-title":"Quantizing deep convolutional networks for efficient inference: A whitepaper","author":"Krishnamoorthi Raghuraman","year":"2018","unstructured":"Raghuraman Krishnamoorthi. 2018. Quantizing deep convolutional networks for efficient inference: A whitepaper. arXiv preprint arXiv:1806.08342 (2018).","journal-title":"arXiv preprint arXiv:1806.08342"},{"key":"e_1_3_1_24_2","unstructured":"Alex Krizhevsky Geoffrey Hinton et\u00a0al. 2009. Learning multiple layers of features from tiny images. (2009)."},{"key":"e_1_3_1_25_2","first-page":"1097","volume-title":"Advances in Neural Information Processing Systems","author":"Krizhevsky Alex","year":"2012","unstructured":"Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems. 1097\u20131105."},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/5.726791"},{"key":"e_1_3_1_27_2","first-page":"139","volume-title":"ICS\u201918","author":"Lee Dongwoo","year":"2018","unstructured":"Dongwoo Lee, Sungbum Kang, and Kiyoung Choi. 2018. ComPEND: Computation pruning through early negative detection for ReLU in a deep neural network accelerator. In ICS\u201918. 139\u2013148."},{"issue":"1","key":"e_1_3_1_28_2","first-page":"173","article-title":"UNPU: An energy-efficient deep neural network accelerator with fully variable weight bit precision","volume":"54","author":"Lee Jinmook","year":"2018","unstructured":"Jinmook Lee, Changhyeon Kim, Sanghoon Kang, Dongjoo Shin, Sangyeob Kim, and Hoi-Jun Yoo. 2018. UNPU: An energy-efficient deep neural network accelerator with fully variable weight bit precision. JSSC 54, 1 (2018), 173\u2013185.","journal-title":"JSSC"},{"key":"e_1_3_1_29_2","first-page":"1","volume-title":"2017 IEEE International Symposium on Circuits and Systems (ISCAS\u201917)","author":"Lin Yingyan","year":"2017","unstructured":"Yingyan Lin, Charbel Sakr, Yongjune Kim, and Naresh Shanbhag. 2017. Predictivenet: An energy-efficient convolutional neural network via zero prediction. In 2017 IEEE International Symposium on Circuits and Systems (ISCAS\u201917). IEEE, 1\u20134."},{"key":"e_1_3_1_30_2","first-page":"738","volume-title":"MICRO\u201920","author":"Liu Liu","year":"2020","unstructured":"Liu Liu, Zheng Qu, Lei Deng, Fengbin Tu, Shuangchen Li, Xing Hu, Zhenyu Gu, Yufei Ding, and Yuan Xie. 2020. Duet: Boosting deep neural network efficiency on dual-module architecture. In MICRO\u201920. IEEE, 738\u2013750."},{"key":"e_1_3_1_31_2","article-title":"Pruning convolutional neural networks for resource efficient inference","author":"Molchanov Pavlo","year":"2016","unstructured":"Pavlo Molchanov, Stephen Tyree, Tero Karras, Timo Aila, and Jan Kautz. 2016. Pruning convolutional neural networks for resource efficient inference. arXiv preprint arXiv:1611.06440 (2016).","journal-title":"arXiv preprint arXiv:1611.06440"},{"key":"e_1_3_1_32_2","first-page":"807","volume-title":"Proceedings of the 27th International Conference on Machine Learning (ICML\u201910)","author":"Nair Vinod","year":"2010","unstructured":"Vinod Nair and Geoffrey E. Hinton. 2010. Rectified linear units improve restricted boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML\u201910). 807\u2013814."},{"issue":"2","key":"e_1_3_1_33_2","doi-asserted-by":"crossref","first-page":"27","DOI":"10.1145\/3140659.3080254","article-title":"Scnn: An accelerator for compressed-sparse convolutional neural networks","volume":"45","author":"Parashar Angshuman","year":"2017","unstructured":"Angshuman Parashar, Minsoo Rhu, Anurag Mukkara, Antonio Puglielli, Rangharajan Venkatesan, Brucek Khailany, Joel Emer, Stephen W. Keckler, and William J. Dally. 2017. Scnn: An accelerator for compressed-sparse convolutional neural networks. ACM SIGARCH Computer Architecture News 45, 2 (2017), 27\u201340.","journal-title":"ACM SIGARCH Computer Architecture News"},{"key":"e_1_3_1_34_2","first-page":"8024","volume-title":"Advances in Neural Information Processing Systems 32","author":"Paszke Adam","year":"2019","unstructured":"Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. 2019. PyTorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32, H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alch\u00e9-Buc, E. Fox, and R. Garnett (Eds.). Curran Associates, Inc., 8024\u20138035."},{"key":"e_1_3_1_35_2","first-page":"58","volume-title":"2020 IEEE International Symposium on High Performance Computer Architecture (HPCA\u201920)","author":"Qin Eric","year":"2020","unstructured":"Eric Qin, Ananda Samajdar, Hyoukjun Kwon, Vineet Nadella, Sudarshan Srinivasan, Dipankar Das, Bharat Kaul, and Tushar Krishna. 2020. Sigma: A sparse and irregular gemm accelerator with flexible interconnects for dnn training. In 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA\u201920). IEEE, 58\u201370."},{"key":"e_1_3_1_36_2","article-title":"Searching for activation functions","author":"Ramachandran Prajit","year":"2017","unstructured":"Prajit Ramachandran, Barret Zoph, and Quoc V Le. 2017. Searching for activation functions. arXiv preprint arXiv:1710.05941 (2017).","journal-title":"arXiv preprint arXiv:1710.05941"},{"key":"e_1_3_1_37_2","first-page":"525","volume-title":"European Conference on Computer Vision","author":"Rastegari Mohammad","year":"2016","unstructured":"Mohammad Rastegari, Vicente Ordonez, Joseph Redmon, and Ali Farhadi. 2016. Xnor-net: Imagenet classification using binary convolutional neural networks. In European Conference on Computer Vision. Springer, 525\u2013542."},{"key":"e_1_3_1_38_2","first-page":"779","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Redmon Joseph","year":"2016","unstructured":"Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. 2016. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 779\u2013788."},{"key":"e_1_3_1_39_2","article-title":"Yolov3: An incremental improvement","author":"Redmon Joseph","year":"2018","unstructured":"Joseph Redmon and Ali Farhadi. 2018. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767 (2018).","journal-title":"arXiv preprint arXiv:1804.02767"},{"key":"e_1_3_1_40_2","doi-asserted-by":"crossref","DOI":"10.7551\/mitpress\/4937.001.0001","volume-title":"Neural Smithing: Supervised Learning in Feedforward Artificial Neural Networks","author":"Reed Russell","year":"1999","unstructured":"Russell Reed and Robert J. MarksII. 1999. Neural Smithing: Supervised Learning in Feedforward Artificial Neural Networks. Mit Press."},{"key":"e_1_3_1_41_2","doi-asserted-by":"crossref","first-page":"925","DOI":"10.1145\/3297858.3304076","volume-title":"Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems","author":"Ren Ao","year":"2019","unstructured":"Ao Ren, Tianyun Zhang, Shaokai Ye, Jiayu Li, Wenyao Xu, Xuehai Qian, Xue Lin, and Yanzhi Wang. 2019. Admm-nn: An algorithm-hardware co-design framework of dnns using alternating direction methods of multipliers. In Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems. 925\u2013938."},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-015-0816-y"},{"key":"e_1_3_1_43_2","first-page":"4510","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Sandler Mark","year":"2018","unstructured":"Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. 2018. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 4510\u20134520."},{"key":"e_1_3_1_44_2","first-page":"304","volume-title":"ISCA\u201919","author":"Sharify Sayeh","year":"2019","unstructured":"Sayeh Sharify, Alberto Delmas Lascorz, Mostafa Mahmoud, Milos Nikolic, Kevin Siu, Dylan Malone Stuart, Zissis Poulos, and Andreas Moshovos. 2019. Laconic deep learning inference acceleration. In ISCA\u201919. IEEE, 304\u2013317."},{"key":"e_1_3_1_45_2","first-page":"1","volume-title":"2016 49th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201916)","author":"Sharma Hardik","year":"2016","unstructured":"Hardik Sharma, Jongse Park, Divya Mahajan, Emmanuel Amaro, Joon Kyung Kim, Chenkai Shao, Asit Mishra, and Hadi Esmaeilzadeh. 2016. From high-level deep neural models to FPGAs. In 2016 49th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201916). IEEE, 1\u201312."},{"key":"e_1_3_1_46_2","first-page":"752","volume-title":"ISCA\u201918","author":"Song Mingcong","year":"2018","unstructured":"Mingcong Song, Jiechen Zhao, Yang Hu, Jiaqi Zhang, and Tao Li. 2018. Prediction based execution on deep neural networks. In ISCA\u201918. IEEE, 752\u2013763."},{"key":"e_1_3_1_47_2","first-page":"1010","volume-title":"2020 ACM\/IEEE 47th Annual International Symposium on Computer Architecture (ISCA\u201920)","author":"Song Zhuoran","year":"2020","unstructured":"Zhuoran Song, Bangqi Fu, Feiyang Wu, Zhaoming Jiang, Li Jiang, Naifeng Jing, and Xiaoyao Liang. 2020. Drq: Dynamic region-based quantization for deep neural network acceleration. In 2020 ACM\/IEEE 47th Annual International Symposium on Computer Architecture (ISCA\u201920). IEEE, 1010\u20131021."},{"key":"e_1_3_1_48_2","article-title":"Striving for simplicity: The all convolutional net","author":"Springenberg Jost Tobias","year":"2014","unstructured":"Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. 2014. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806 (2014).","journal-title":"arXiv preprint arXiv:1412.6806"},{"key":"e_1_3_1_49_2","first-page":"1","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Szegedy Christian","year":"2015","unstructured":"Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. 2015. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1\u20139."},{"key":"e_1_3_1_50_2","first-page":"21","article-title":"To bridge neural network design and real-world performance: A behaviour study for neural networks","volume":"3","author":"Tang Xiaohu","year":"2021","unstructured":"Xiaohu Tang, Shihao Han, Li Lyna Zhang, Ting Cao, and Yunxin Liu. 2021. To bridge neural network design and real-world performance: A behaviour study for neural networks. Proceedings of Machine Learning and Systems 3 (2021), 21\u201337.","journal-title":"Proceedings of Machine Learning and Systems"},{"key":"e_1_3_1_51_2","article-title":"Attention is all you need","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. arXiv preprint arXiv:1706.03762 (2017).","journal-title":"arXiv preprint arXiv:1706.03762"},{"key":"e_1_3_1_52_2","first-page":"8612","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Wang Kuan","year":"2019","unstructured":"Kuan Wang, Zhijian Liu, Yujun Lin, Ji Lin, and Song Han. 2019. Haq: Hardware-aware automated quantization with mixed precision. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 8612\u20138620."},{"key":"e_1_3_1_53_2","first-page":"409","volume-title":"Proceedings of the European Conference on Computer Vision (ECCV\u201918)","author":"Wang Xin","year":"2018","unstructured":"Xin Wang, Fisher Yu, Zi-Yi Dou, Trevor Darrell, and Joseph E. Gonzalez. 2018. Skipnet: Learning dynamic routing in convolutional networks. In Proceedings of the European Conference on Computer Vision (ECCV\u201918). 409\u2013424."},{"key":"e_1_3_1_54_2","doi-asserted-by":"crossref","first-page":"81","DOI":"10.1109\/HPCA.2019.00029","volume-title":"2019 IEEE International Symposium on High Performance Computer Architecture (HPCA\u201919)","author":"Wang Xiaowei","year":"2019","unstructured":"Xiaowei Wang, Jiecao Yu, Charles Augustine, Ravi Iyer, and Reetuparna Das. 2019. Bit prudent in-cache acceleration of deep convolutional neural networks. In 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA\u201919). IEEE, 81\u201393."},{"key":"e_1_3_1_55_2","article-title":"Empirical evaluation of rectified activations in convolutional network","author":"Xu Bing","year":"2015","unstructured":"Bing Xu, Naiyan Wang, Tianqi Chen, and Mu Li. 2015. Empirical evaluation of rectified activations in convolutional network. arXiv preprint arXiv:1505.00853 (2015).","journal-title":"arXiv preprint arXiv:1505.00853"},{"key":"e_1_3_1_56_2","doi-asserted-by":"publisher","DOI":"10.1145\/3079856.3080215"},{"key":"e_1_3_1_57_2","first-page":"365","volume-title":"ECCV\u201918","author":"Zhang Dongqing","year":"2018","unstructured":"Dongqing Zhang, Jiaolong Yang, Dongqiangzi Ye, and Gang Hua. 2018. Lq-nets: Learned quantization for highly accurate and compact deep neural networks. In ECCV\u201918. 365\u2013382."},{"key":"e_1_3_1_58_2","first-page":"292","volume-title":"2019 ACM\/IEEE 46th Annual International Symposium on Computer Architecture (ISCA\u201919)","author":"Zhang Jiaqi","year":"2019","unstructured":"Jiaqi Zhang, Xiangru Chen, Mingcong Song, and Tao Li. 2019. Eager pruning: Algorithm and architecture support for fast training of deep neural networks. In 2019 ACM\/IEEE 46th Annual International Symposium on Computer Architecture (ISCA\u201919). IEEE, 292\u2013303."},{"issue":"2","key":"e_1_3_1_59_2","doi-asserted-by":"crossref","first-page":"636","DOI":"10.1109\/JSSC.2020.3043870","article-title":"Snap: An efficient sparse neural acceleration processor for unstructured sparse deep neural network inference","volume":"56","author":"Zhang Jie-Fang","year":"2020","unstructured":"Jie-Fang Zhang, Ching-En Lee, Chester Liu, Yakun Sophia Shao, Stephen W. Keckler, and Zhengya Zhang. 2020. Snap: An efficient sparse neural acceleration processor for unstructured sparse deep neural network inference. IEEE Journal of Solid-State Circuits 56, 2 (2020), 636\u2013647.","journal-title":"IEEE Journal of Solid-State Circuits"},{"key":"e_1_3_1_60_2","article-title":"Incremental network quantization: Towards lossless cnns with low-precision weights","author":"Zhou Aojun","year":"2017","unstructured":"Aojun Zhou, Anbang Yao, Yiwen Guo, Lin Xu, and Yurong Chen. 2017. Incremental network quantization: Towards lossless cnns with low-precision weights. International Conference on Learning Representations (ICLR) (2017).","journal-title":"International Conference on Learning Representations (ICLR)"},{"key":"e_1_3_1_61_2","article-title":"DoReFa-Net: Training low bitwidth convolutional neural networks with low bitwidth gradients","volume":"1606","author":"Zhou Shuchang","year":"2016","unstructured":"Shuchang Zhou, Yuxin Wu, Zekun Ni, Xinyu Zhou, He Wen, and Yuheng Zou. 2016. DoReFa-Net: Training low bitwidth convolutional neural networks with low bitwidth gradients. CoRR abs\/1606.06160 (2016).","journal-title":"CoRR"}],"container-title":["ACM Transactions on Embedded Computing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3609093","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3609093","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T17:48:58Z","timestamp":1750182538000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3609093"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,9,9]]},"references-count":60,"journal-issue":{"issue":"5s","published-print":{"date-parts":[[2023,10,31]]}},"alternative-id":["10.1145\/3609093"],"URL":"https:\/\/doi.org\/10.1145\/3609093","relation":{},"ISSN":["1539-9087","1558-3465"],"issn-type":[{"value":"1539-9087","type":"print"},{"value":"1558-3465","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,9,9]]},"assertion":[{"value":"2023-03-23","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-06-30","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-09-09","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}