{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,5]],"date-time":"2026-03-05T15:44:32Z","timestamp":1772725472233,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":90,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,6,11]],"date-time":"2022-06-11T00:00:00Z","timestamp":1654905600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"National Sciences and Engineering Research Council of Canada (NSERC)","award":["NETGP485577-15"],"award-info":[{"award-number":["NETGP485577-15"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,6,18]]},"DOI":"10.1145\/3470496.3527404","type":"proceedings-article","created":{"date-parts":[[2022,5,31]],"date-time":"2022-05-31T19:06:01Z","timestamp":1654023961000},"page":"536-551","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":9,"title":["Anticipating and eliminating redundant computations in accelerated sparse training"],"prefix":"10.1145","author":[{"given":"Jonathan S.","family":"Lew","sequence":"first","affiliation":[{"name":"University of British Columbia, Vancouver, BC, Canada"}]},{"given":"Yunpeng","family":"Liu","sequence":"additional","affiliation":[{"name":"University of British Columbia, Vancouver, BC, Canada"}]},{"given":"Wenyi","family":"Gong","sequence":"additional","affiliation":[{"name":"University of British Columbia, Vancouver, BC, Canada"}]},{"given":"Negar","family":"Goli","sequence":"additional","affiliation":[{"name":"Huawei Technologies, Vancouver, BC, Canada"}]},{"given":"R. David","family":"Evans","sequence":"additional","affiliation":[{"name":"Borealis AI, Vancouver, BC, Canada"}]},{"given":"Tor M.","family":"Aamodt","sequence":"additional","affiliation":[{"name":"University of British Columbia, Vancouver, BC, Canada"}]}],"member":"320","published-online":{"date-parts":[[2022,6,11]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"2020. NVIDIA A100 Tensor Core GPU Architecture. https:\/\/www.nvidia.com\/content\/dam\/en-zz\/Solutions\/Data-Center\/nvidia-ampere-architecture-whitepaper.pdf.  2020. NVIDIA A100 Tensor Core GPU Architecture. https:\/\/www.nvidia.com\/content\/dam\/en-zz\/Solutions\/Data-Center\/nvidia-ampere-architecture-whitepaper.pdf."},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/3123939.3123982"},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/3007787.3001138"},{"key":"e_1_3_2_1_4_1","first-page":"12449","article-title":"Wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations","volume":"33","author":"Baevski Alexei","year":"2020","unstructured":"Alexei Baevski , Yuhao Zhou , Abdelrahman Mohamed , and Michael Auli . 2020 . Wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations . Advances in Neural Information Processing Systems 33 (2020), 12449 -- 12460 . Alexei Baevski, Yuhao Zhou, Abdelrahman Mohamed, and Michael Auli. 2020. Wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations. Advances in Neural Information Processing Systems 33 (2020), 12449--12460.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_1_5_1","unstructured":"Christian Bartz. 2021. chainer-transformer.  Christian Bartz. 2021. 
chainer-transformer."},{"key":"e_1_3_2_1_6_1","volume-title":"Davide Del Testa","author":"Bojarski Mariusz","year":"2016","unstructured":"Mariusz Bojarski , Davide Del Testa , Daniel Dworakowski, Bernhard Firner , Beat Flepp, Prasoon Goyal, Lawrence D. Jackel, Mathew Monfort, Urs Muller, Jiakai Zhang, Xin Zhang, Jake Zhao, and Karol Zieba. 2016 . End to End Learning for Self-Driving Cars . arXiv:1604.07316 [cs] (April 2016). arXiv:1604.07316 [cs] Mariusz Bojarski, Davide Del Testa, Daniel Dworakowski, Bernhard Firner, Beat Flepp, Prasoon Goyal, Lawrence D. Jackel, Mathew Monfort, Urs Muller, Jiakai Zhang, Xin Zhang, Jake Zhao, and Karol Zieba. 2016. End to End Learning for Self-Driving Cars. arXiv:1604.07316 [cs] (April 2016). arXiv:1604.07316 [cs]"},{"key":"e_1_3_2_1_7_1","volume-title":"Training Deep Nets with Sublinear Memory Cost. arXiv:1604.06174 [cs] (April","author":"Chen Tianqi","year":"2016","unstructured":"Tianqi Chen , Bing Xu , Chiyuan Zhang , and Carlos Guestrin . 2016. Training Deep Nets with Sublinear Memory Cost. arXiv:1604.06174 [cs] (April 2016 ). arXiv:1604.06174 [cs] Tianqi Chen, Bing Xu, Chiyuan Zhang, and Carlos Guestrin. 2016. Training Deep Nets with Sublinear Memory Cost. arXiv:1604.06174 [cs] (April 2016). arXiv:1604.06174 [cs]"},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/JSSC.2016.2616357"},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2014.58"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/JETCAS.2019.2910232"},{"key":"e_1_3_2_1_11_1","volume-title":"Neural Gradients Are Near-Lognormal: Improved Quantized and Sparse Training. In International Conference on Learning Representations.","author":"Chmiel Brian","year":"2020","unstructured":"Brian Chmiel , Liad Ben-Uri , Moran Shkolnik , Elad Hoffer , Ron Banner , and Daniel Soudry . 2020 . Neural Gradients Are Near-Lognormal: Improved Quantized and Sparse Training. In International Conference on Learning Representations. Brian Chmiel, Liad Ben-Uri, Moran Shkolnik, Elad Hoffer, Ron Banner, and Daniel Soudry. 2020. Neural Gradients Are Near-Lognormal: Improved Quantized and Sparse Training. In International Conference on Learning Representations."},{"key":"e_1_3_2_1_12_1","volume-title":"Bridging the Accuracy Gap for 2-Bit Quantized Neural Networks (QNN). arXiv:1807.06964 [cs] (July","author":"Choi Jungwook","year":"2018","unstructured":"Jungwook Choi , Pierce I.- Jen Chuang , Zhuo Wang , Swagath Venkataramani , Vijayalakshmi Srinivasan , and Kailash Gopalakrishnan . 2018. Bridging the Accuracy Gap for 2-Bit Quantized Neural Networks (QNN). arXiv:1807.06964 [cs] (July 2018 ). arXiv:1807.06964 [cs] Jungwook Choi, Pierce I.-Jen Chuang, Zhuo Wang, Swagath Venkataramani, Vijayalakshmi Srinivasan, and Kailash Gopalakrishnan. 2018. Bridging the Accuracy Gap for 2-Bit Quantized Neural Networks (QNN). arXiv:1807.06964 [cs] (July 2018). arXiv:1807.06964 [cs]"},{"key":"e_1_3_2_1_13_1","volume-title":"BinaryConnect: Training Deep Neural Networks with Binary Weights during Propagations. arXiv:1511.00363 [cs] (April","author":"Courbariaux Matthieu","year":"2016","unstructured":"Matthieu Courbariaux , Yoshua Bengio , and Jean-Pierre David . 2016. BinaryConnect: Training Deep Neural Networks with Binary Weights during Propagations. arXiv:1511.00363 [cs] (April 2016 ). arXiv:1511.00363 [cs] Matthieu Courbariaux, Yoshua Bengio, and Jean-Pierre David. 2016. BinaryConnect: Training Deep Neural Networks with Binary Weights during Propagations. 
arXiv:1511.00363 [cs] (April 2016). arXiv:1511.00363 [cs]"},{"key":"e_1_3_2_1_14_1","volume-title":"Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1. arXiv:1602.02830 [cs] (March","author":"Courbariaux Matthieu","year":"2016","unstructured":"Matthieu Courbariaux, Itay Hubara, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio. 2016. Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1. arXiv:1602.02830 [cs] (March 2016). arXiv:1602.02830 [cs]"},{"key":"e_1_3_2_1_15_1","volume-title":"Aamodt","author":"Dally William J.","year":"2016","unstructured":"William J. Dally, R. Curtis Harting, and Tor M. Aamodt. 2016. Digital Design Using VHDL: A Systems Approach (1st ed.). Cambridge University Press, USA."},{"key":"e_1_3_2_1_16_1","volume-title":"Mixed Precision Training of Convolutional Neural Networks Using Integer Operations. In International Conference on Learning Representations.","author":"Das Dipankar","year":"2018","unstructured":"Dipankar Das, Naveen Mellempudi, Dheevatsa Mudigere, Dhiraj Kalamkar, Sasikanth Avancha, Kunal Banerjee, Srinivas Sridharan, Karthik Vaidyanathan, Bharat Kaul, Evangelos Georganas, Alexander Heinecke, Pradeep Dubey, Jesus Corbal, Nikita Shustrov, Roma Dubtsov, Evarist Fomenko, and Vadim Pirogov. 2018. Mixed Precision Training of Convolutional Neural Networks Using Integer Operations. In International Conference on Learning Representations."},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3297858.3304041"},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA52012.2021.00090"},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_3_2_1_20_1","volume-title":"Sparse Networks from Scratch: Faster Training without Losing Performance. arXiv:1907.04840 [cs, stat] (Aug","author":"Dettmers Tim","year":"2019","unstructured":"Tim Dettmers and Luke Zettlemoyer. 2019. Sparse Networks from Scratch: Faster Training without Losing Performance. arXiv:1907.04840 [cs, stat] (Aug. 2019). arXiv:1907.04840 [cs, stat]"},{"key":"e_1_3_2_1_21_1","volume-title":"BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. arXiv:1810.04805 [cs] (May","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019.
BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. arXiv:1810.04805 [cs] (May 2019). arXiv:1810.04805 [cs]"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA45697.2020.00075"},{"key":"e_1_3_2_1_23_1","volume-title":"Rigging the Lottery: Making All Tickets Winners. In International Conference on Machine Learning. PMLR, 2943--2952","author":"Evci Utku","year":"2020","unstructured":"Utku Evci, Trevor Gale, Jacob Menick, Pablo Samuel Castro, and Erich Elsen. 2020. Rigging the Lottery: Making All Tickets Winners. In International Conference on Machine Learning. PMLR, 2943--2952."},{"key":"e_1_3_2_1_24_1","unstructured":"Andrew Feldman. 2020. Cerebras Wafer Scale Engine: Why We Need Big Chips for Deep Learning."},{"key":"e_1_3_2_1_25_1","volume-title":"Trainable Neural Networks. arXiv:1803.03635 [cs] (March","author":"Frankle Jonathan","year":"2019","unstructured":"Jonathan Frankle and Michael Carbin. 2019. The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks. arXiv:1803.03635 [cs] (March 2019). arXiv:1803.03635 [cs]"},{"key":"e_1_3_2_1_26_1","unstructured":"J.P. Fricker and A. Hock. 2019. Building a Wafer-Scale Deep Learning System: Lessons Learned."},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00725"},{"key":"e_1_3_2_1_28_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 1548--1558","author":"Goli Negar","unstructured":"Negar Goli and Tor M. Aamodt. 2020. ReSprop: Reuse Sparsified Backpropagation. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 1548--1558."},{"key":"e_1_3_2_1_29_1","volume-title":"Proceedings of Machine Learning and Systems 1 (April","author":"Golub Maximilian","year":"2019","unstructured":"Maximilian Golub, Guy Lemieux, and Mieszko Lis. 2019. Full Deep Neural Network Training On A Pruned Weight Budget. Proceedings of Machine Learning and Systems 1 (April 2019), 252--263."},{"key":"e_1_3_2_1_30_1","volume-title":"Grosse","author":"Gomez Aidan N.","year":"2017","unstructured":"Aidan N. Gomez, Mengye Ren, Raquel Urtasun, and Roger B. Grosse. 2017.
The Reversible Residual Network: Backpropagation Without Storing Activations. In NIPS."},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/3352460.3358291"},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/PACT.2019.00009"},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/3007787.3001163"},{"key":"e_1_3_2_1_34_1","volume-title":"Dally","author":"Han Song","year":"2016","unstructured":"Song Han, Huizi Mao, and William J. Dally. 2016. Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding. arXiv:1510.00149 [cs] (Feb. 2016). arXiv:1510.00149 [cs]"},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2018\/309"},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.155"},{"key":"e_1_3_2_1_38_1","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).","author":"Huang Gao","unstructured":"Gao Huang, Zhuang Liu, Laurens van der Maaten, and Kilian Q. Weinberger. 2017. Densely Connected Convolutional Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)."},{"key":"e_1_3_2_1_39_1","volume-title":"Dehao Chen, HyoukJoong Lee, Jiquan Ngiam, Quoc V. Le, Yonghui Wu, and Zhifeng Chen.","author":"Huang Yanping","year":"2019","unstructured":"Yanping Huang, Youlong Cheng, Ankur Bapna, Orhan Firat, Mia Xu Chen, Dehao Chen, HyoukJoong Lee, Jiquan Ngiam, Quoc V. Le, Yonghui Wu, and Zhifeng Chen. 2019. GPipe: Efficient Training of Giant Neural Networks Using Pipeline Parallelism. arXiv:1811.06965 [cs] (July 2019). arXiv:1811.06965 [cs]"},{"key":"e_1_3_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/2647868.2654984"},{"key":"e_1_3_2_1_41_1","first-page":"1","article-title":"Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations","volume":"18","author":"Hubara Itay","year":"2018","unstructured":"Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio. 2018. Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations. Journal of Machine Learning Research 18, 187 (2018), 1--30.","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_2_1_42_1","volume-title":"SqueezeNet: AlexNet-Level Accuracy with 50x Fewer Parameters and <0.5MB Model Size.
arXiv:1602.07360 [cs] (Nov","author":"Iandola Forrest N.","year":"2016","unstructured":"Forrest N. Iandola, Song Han, Matthew W. Moskewicz, Khalid Ashraf, William J. Dally, and Kurt Keutzer. 2016. SqueezeNet: AlexNet-Level Accuracy with 50x Fewer Parameters and <0.5MB Model Size. arXiv:1602.07360 [cs] (Nov. 2016). arXiv:1602.07360 [cs]"},{"key":"e_1_3_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA52012.2021.00010"},{"key":"e_1_3_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/3360307"},{"key":"e_1_3_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/3079856.3080246"},{"key":"e_1_3_2_1_46_1","unstructured":"Alex Krizhevsky and Geoffrey Hinton. 2009. Learning Multiple Layers of Features from Tiny Images. (2009)."},{"key":"e_1_3_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/3297858.3304028"},{"key":"e_1_3_2_1_48_1","volume-title":"Inducing and Exploiting Activation Sparsity for Fast Inference on Deep Neural Networks. In International Conference on Machine Learning. PMLR, 5533--5543","author":"Kurtz Mark","year":"2020","unstructured":"Mark Kurtz, Justin Kopinsky, Rati Gelashvili, Alexander Matveev, John Carr, Michael Goin, William Leiserson, Sage Moore, Nir Shavit, and Dan Alistarh. 2020. Inducing and Exploiting Activation Sparsity for Fast Inference on Deep Neural Networks. In International Conference on Machine Learning. PMLR, 5533--5543."},{"key":"e_1_3_2_1_49_1","volume-title":"Proceedings of the 2nd International Conference on Neural Information Processing Systems (NIPS'89)","author":"Cun Yann Le","unstructured":"Yann Le Cun, John S. Denker, and Sara A. Solla. 1989. Optimal Brain Damage. In Proceedings of the 2nd International Conference on Neural Information Processing Systems (NIPS'89). MIT Press, Cambridge, MA, USA, 598--605."},{"key":"e_1_3_2_1_50_1","volume-title":"Ternary Weight Networks. arXiv:1605.04711 [cs] (Nov","author":"Li Fengfu","year":"2016","unstructured":"Fengfu Li, Bo Zhang, and Bin Liu. 2016. Ternary Weight Networks. arXiv:1605.04711 [cs] (Nov. 2016). arXiv:1605.04711 [cs]"},{"key":"e_1_3_2_1_51_1","volume-title":"Dally","author":"Lin Yujun","year":"2020","unstructured":"Yujun Lin, Song Han, Huizi Mao, Yu Wang, and William J. Dally. 2020.
Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training. arXiv:1712.01887 [cs, stat] (June 2020). arXiv:1712.01887 [cs, stat]"},{"key":"e_1_3_2_1_52_1","volume-title":"So","author":"Liu Junjie","year":"2020","unstructured":"Junjie Liu, Zhe Xu, Runbin Shi, Ray C. C. Cheung, and Hayden K. H. So. 2020. Dynamic Sparse Training: Find Efficient Sparse Network From Scratch With Trainable Masked Layers. arXiv:2005.06870 [cs, stat] (May 2020). arXiv:2005.06870 [cs, stat]"},{"key":"e_1_3_2_1_53_1","volume-title":"Dynamic Sparse Graph for Efficient Deep Learning. arXiv:1810.00859 [cs, stat] (May","author":"Liu Liu","year":"2019","unstructured":"Liu Liu, Lei Deng, Xing Hu, Maohua Zhu, Guoqi Li, Yufei Ding, and Yuan Xie. 2019. Dynamic Sparse Graph for Efficient Deep Learning. arXiv:1810.00859 [cs, stat] (May 2019). arXiv:1810.00859 [cs, stat]"},{"key":"e_1_3_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2016.42"},{"key":"e_1_3_2_1_55_1","volume-title":"Kingma","author":"Louizos Christos","year":"2018","unstructured":"Christos Louizos, Max Welling, and Diederik P. Kingma. 2018. Learning Sparse Neural Networks through $L_0$ Regularization. arXiv:1712.01312 [cs, stat] (June 2018). arXiv:1712.01312 [cs, stat]"},{"key":"e_1_3_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.5555\/2002472.2002491"},{"key":"e_1_3_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO50266.2020.00069"},{"key":"e_1_3_2_1_58_1","volume-title":"Dally","author":"Mao Huizi","year":"2017","unstructured":"Huizi Mao, Song Han, Jeff Pool, Wenshuo Li, Xingyu Liu, Yu Wang, and William J. Dally. 2017. Exploring the Regularity of Sparse Structure in Convolutional Neural Networks. arXiv:1705.08922 [cs, stat] (June 2017). arXiv:1705.08922 [cs, stat]"},{"key":"e_1_3_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1145\/2717764.2717783"},{"key":"e_1_3_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2020.2975185"},{"key":"e_1_3_2_1_61_1","volume-title":"Variational Dropout Sparsifies Deep Neural Networks. In International Conference on Machine Learning. PMLR, 2498--2507","author":"Molchanov Dmitry","year":"2017","unstructured":"Dmitry Molchanov, Arsenii Ashukha, and Dmitry Vetrov. 2017. Variational Dropout Sparsifies Deep Neural Networks. In International Conference on Machine Learning.
PMLR, 2498--2507."},{"key":"e_1_3_2_1_62_1","volume-title":"Pruning Convolutional Neural Networks for Resource Efficient Inference. arXiv:1611.06440 [cs, stat] (June","author":"Molchanov Pavlo","year":"2017","unstructured":"Pavlo Molchanov, Stephen Tyree, Tero Karras, Timo Aila, and Jan Kautz. 2017. Pruning Convolutional Neural Networks for Resource Efficient Inference. arXiv:1611.06440 [cs, stat] (June 2017). arXiv:1611.06440 [cs, stat]"},{"key":"e_1_3_2_1_63_1","volume-title":"International Conference on Machine Learning. PMLR, 4646--4655","author":"Mostafa Hesham","year":"2019","unstructured":"Hesham Mostafa and Xin Wang. 2019. Parameter Efficient Training of Deep Convolutional Neural Networks by Dynamic Sparse Reparameterization. In International Conference on Machine Learning. PMLR, 4646--4655."},{"key":"e_1_3_2_1_64_1","volume-title":"Hinton","author":"Nair Vinod","year":"2010","unstructured":"Vinod Nair and Geoffrey E. Hinton. 2010. Rectified Linear Units Improve Restricted Boltzmann Machines. In ICML."},{"key":"e_1_3_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2021.3058217"},{"key":"e_1_3_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2018.00067"},{"key":"e_1_3_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.1145\/3079856.3080254"},{"key":"e_1_3_2_1_68_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA47549.2020.00015"},{"key":"e_1_3_2_1_69_1","volume-title":"Lin (Eds.)","volume":"33","author":"Raihan Md Aamir","year":"2020","unstructured":"Md Aamir Raihan and Tor Aamodt. 2020. Sparse Weight Activation Training. In Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin (Eds.), Vol. 33. Curran Associates, Inc., 15625--15638."},{"key":"e_1_3_2_1_70_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2018.00017"},{"key":"e_1_3_2_1_71_1","doi-asserted-by":"publisher","DOI":"10.1038\/323533a0"},{"key":"e_1_3_2_1_72_1","volume-title":"Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm. arXiv:1712.01815 [cs] (Dec","author":"Silver David","year":"2017","unstructured":"David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, and Demis Hassabis. 2017.
Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm. arXiv:1712.01815 [cs] (Dec. 2017). arXiv:1712.01815 [cs]"},{"key":"e_1_3_2_1_73_1","volume-title":"Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv:1409.1556 [cs] (Sept","author":"Simonyan Karen","year":"2014","unstructured":"Karen Simonyan and Andrew Zisserman. 2014. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv:1409.1556 [cs] (Sept. 2014). arXiv:1409.1556 [cs]"},{"key":"e_1_3_2_1_74_1","first-page":"1","article-title":"Dropout: A Simple Way to Prevent Neural Networks from Overfitting","volume":"15","author":"Srivastava Nitish","year":"2014","unstructured":"Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout: A Simple Way to Prevent Neural Networks from Overfitting. The Journal of Machine Learning Research 15, 1 (Jan. 2014), 1929--1958.","journal-title":"The Journal of Machine Learning Research"},{"key":"e_1_3_2_1_75_1","doi-asserted-by":"publisher","DOI":"10.1109\/MSE.2007.44"},{"key":"e_1_3_2_1_76_1","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2017.2761740"},{"key":"e_1_3_2_1_77_1","volume-title":"EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In International Conference on Machine Learning. PMLR, 6105--6114","author":"Tan Mingxing","year":"2019","unstructured":"Mingxing Tan and Quoc Le. 2019. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In International Conference on Machine Learning. PMLR, 6105--6114."},{"key":"e_1_3_2_1_78_1","doi-asserted-by":"publisher","DOI":"10.1145\/3292500.3330756"},{"key":"e_1_3_2_1_79_1","volume-title":"Attention is all you need. Advances in neural information processing systems 30","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in neural information processing systems 30 (2017)."},{"key":"e_1_3_2_1_80_1","unstructured":"Isak Edo Vivancos, Ali Hadizaden, and Omar Mohamed Awad. 2021. DNNSim. https:\/\/github.com\/isakedo\/DNNsim."},{"key":"e_1_3_2_1_81_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA52012.2021.00088"},{"key":"e_1_3_2_1_82_1","volume-title":"Training and Inference with Integers in Deep Neural Networks. In International Conference on Learning Representations.","author":"Wu Shuang","year":"2018","unstructured":"Shuang Wu, Guoqi Li, Feng Chen, and Luping Shi. 2018.
Training and Inference with Integers in Deep Neural Networks. In International Conference on Learning Representations."},{"key":"e_1_3_2_1_83_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO50266.2020.00064"},{"key":"e_1_3_2_1_84_1","doi-asserted-by":"publisher","DOI":"10.1145\/3373376.3378514"},{"key":"e_1_3_2_1_85_1","volume-title":"Wide Residual Networks. arXiv:1605.07146 [cs] (June","author":"Zagoruyko Sergey","year":"2017","unstructured":"Sergey Zagoruyko and Nikos Komodakis. 2017. Wide Residual Networks. arXiv:1605.07146 [cs] (June 2017). arXiv:1605.07146 [cs]"},{"key":"e_1_3_2_1_86_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2016.7783723"},{"key":"e_1_3_2_1_87_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01237-3_12"},{"key":"e_1_3_2_1_88_1","volume-title":"DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients. arXiv:1606.06160 [cs] (Feb","author":"Zhou Shuchang","year":"2018","unstructured":"Shuchang Zhou, Yuxin Wu, Zekun Ni, Xinyu Zhou, He Wen, and Yuheng Zou. 2018. DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients. arXiv:1606.06160 [cs] (Feb. 2018). arXiv:1606.06160 [cs]"},{"key":"e_1_3_2_1_89_1","doi-asserted-by":"publisher","DOI":"10.1145\/3352460.3358269"},{"key":"e_1_3_2_1_90_1","volume-title":"Le","author":"Zoph Barret","year":"2018","unstructured":"Barret Zoph, Vijay Vasudevan, Jonathon Shlens, and Quoc V. Le. 2018. Learning Transferable Architectures for Scalable Image Recognition. arXiv:1707.07012 [cs, stat] (April 2018).
arXiv:1707.07012 [cs, stat]"}],"event":{"name":"ISCA '22: The 49th Annual International Symposium on Computer Architecture","location":"New York New York","acronym":"ISCA '22","sponsor":["SIGARCH ACM Special Interest Group on Computer Architecture","IEEE CS TCAA IEEE CS technical committee on architectural acoustics"]},"container-title":["Proceedings of the 49th Annual International Symposium on Computer Architecture"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3470496.3527404","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3470496.3527404","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:30:28Z","timestamp":1750188628000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3470496.3527404"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,6,11]]},"references-count":90,"alternative-id":["10.1145\/3470496.3527404","10.1145\/3470496"],"URL":"https:\/\/doi.org\/10.1145\/3470496.3527404","relation":{},"subject":[],"published":{"date-parts":[[2022,6,11]]},"assertion":[{"value":"2022-06-11","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}