{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2022,3,29]],"date-time":"2022-03-29T19:14:07Z","timestamp":1648581247211},"reference-count":59,"publisher":"Association for Computing Machinery (ACM)","issue":"4","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["J. Emerg. Technol. Comput. Syst."],"published-print":{"date-parts":[[2021,7,8]]},"abstract":"Micro-controllers (MCUs) make up most of the processors in the world with widespread applicability from automobile to medical devices. The Internet of Things promises to enable these resource-constrained MCUs with machine learning algorithms to provide always-on intelligence. Many Internet of Things applications consume time-series data that are naturally suitable for recurrent neural networks (RNNs) like LSTMs and GRUs. However, RNNs can be large and difficult to deploy on these devices, as they have few kilobytes of memory. As a result, there is a need for compression techniques that can significantly compress RNNs without negatively impacting task accuracy. This article introduces a method to compress RNNs for resource-constrained environments using the Kronecker product (KP). KPs can compress RNN layers by 16\u00d7 to 38\u00d7 with minimal accuracy loss. By quantizing the resulting models to 8 bits, we further push the compression factor to 50\u00d7. We compare KP with other state-of-the-art compression techniques across seven benchmarks spanning five different applications and show that KP can beat the task accuracy achieved by other techniques by a large margin while simultaneously improving the inference runtime. Sometimes the KP compression mechanism can introduce an accuracy loss. We develop a hybrid KP approach to mitigate this. 
Our hybrid KP algorithm provides fine-grained control over the compression ratio, enabling us to regain accuracy lost during compression by adding a small number of model parameters.<\/jats:p>","DOI":"10.1145\/3440016","type":"journal-article","created":{"date-parts":[[2021,7,14]],"date-time":"2021-07-14T16:27:00Z","timestamp":1626280020000},"page":"1-18","source":"Crossref","is-referenced-by-count":1,"title":["Compressing RNNs to Kilobyte Budget for IoT Devices Using Kronecker Products"],"prefix":"10.1145","volume":"17","author":[{"given":"Urmish","family":"Thakker","sequence":"first","affiliation":[{"name":"Arm ML Research Lab, Austin, TX"}]},{"given":"Igor","family":"Fedorov","sequence":"additional","affiliation":[{"name":"Arm ML Research Lab, Austin, TX"}]},{"given":"Chu","family":"Zhou","sequence":"additional","affiliation":[{"name":"Arm ML Research Lab, Austin, TX"}]},{"given":"Dibakar","family":"Gope","sequence":"additional","affiliation":[{"name":"Arm ML Research Lab, Austin, TX"}]},{"given":"Matthew","family":"Mattina","sequence":"additional","affiliation":[{"name":"Arm ML Research Lab, Austin, TX"}]},{"given":"Ganesh","family":"Dasika","sequence":"additional","affiliation":[{"name":"AMD Research, Austin, TX"}]},{"given":"Jesse","family":"Beu","sequence":"additional","affiliation":[{"name":"Arm ML Research Lab, Boston, MA"}]}],"member":"320","reference":[{"key":"e_1_2_1_1_1","volume-title":"Yelp Review Dataset. Retrieved","year":"2020"},{"key":"e_1_2_1_2_1","volume-title":"et\u00a0al","author":"Abadi Mart\u00edn","year":"2015"},{"key":"e_1_2_1_3_1","volume-title":"Max Lam, William Fu, Amin Fazel, Jeremy Holleman, Xinyuan Huang, et\u00a0al.","author":"Banbury Colby R.","year":"2020"},{"key":"e_1_2_1_4_1","volume-title":"Mandic","author":"Calvi Giuseppe Giovanni","year":"2019"},{"key":"e_1_2_1_5_1","unstructured":"Soravit Changpinyo Mark Sandler and Andrey Zhmoginov. 2017. The power of sparsity in convolutional neural networks. arxiv:1702.06257."},{"key":"e_1_2_1_6_1","volume-title":"Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV\u201915)","author":"Cheng Y.","year":"2015"},{"key":"e_1_2_1_7_1","unstructured":"Kyunghyun Cho Bart van Merrienboer \u00c7aglar G\u00fcl\u00e7ehre Fethi Bougares Holger Schwenk and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arxiv:1406.1078."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.5555\/3294771.3294792"},{"key":"e_1_2_1_9_1","unstructured":"Matthieu Courbariaux and Yoshua Bengio. 2016. BinaryNet: Training deep neural networks with weights and activations constrained to +1 or \u20131. arxiv:1602.02830."},{"key":"e_1_2_1_10_1","unstructured":"Misha Denil Babak Shakibi Laurent Dinh Marc\u2019Aurelio Ranzato and Nando de Freitas. 2013. Predicting parameters in deep learning. arxiv:1306.0543."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/3194554.3194625"},{"key":"e_1_2_1_12_1","unstructured":"Trevor Gale Erich Elsen and Sara Hooker. 2019. The state of sparsity in deep neural networks. arxiv:1902.09574."},{"key":"e_1_2_1_13_1","volume-title":"Deep Learning","author":"Goodfellow Ian"},{"key":"e_1_2_1_14_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPR\u201920)","author":"Gope Dibakar","year":"2020"},{"key":"e_1_2_1_15_1","volume-title":"Retrieved","author":"Guennebau Gael","year":"2009"},{"key":"e_1_2_1_16_1","volume-title":"Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI\u201916)","author":"Hammerla Nils Y.","year":"2016"},{"key":"e_1_2_1_17_1","volume-title":"Proceedings of the International Conference on Learning Representations (ICLR\u201916)","author":"Han Song"},{"key":"e_1_2_1_18_1","unstructured":"Qinyao He He Wen Shuchang Zhou Yuxin Wu Cong Yao Xinyu Zhou and Yuheng Zou. 2016. Effective quantization methods for recurrent neural networks. arxiv:1611.10176."},{"key":"e_1_2_1_19_1","unstructured":"Qinyao He He Wen Shuchang Zhou Yuxin Wu Cong Yao Xinyu Zhou and Yuheng Zou. 2016. Effective quantization methods for recurrent neural networks. arxiv:1611.10176."},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"e_1_2_1_21_1","volume-title":"Proceedings of the 2nd International Workshop on Challenges in Artificial Intelligence and Machine Learning for Internet of Things. ACM","author":"Huang Xueqin","year":"2020"},{"key":"e_1_2_1_22_1","doi-asserted-by":"crossref","unstructured":"Itay Hubara Matthieu Courbariaux Daniel Soudry Ran El-Yaniv and Yoshua Bengio. 2016. Quantized neural networks: Training neural networks with low precision weights and activations. arxiv:1609.07061. https:\/\/doi.org\/10.1145\/3417313.3429380","DOI":"10.1145\/3417313.3429380"},{"key":"e_1_2_1_23_1","first-page":"1","article-title":"Quantized neural networks: Training neural networks with low precision weights and activations","volume":"18","author":"Hubara Itay","year":"2017","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/34.291440"},{"key":"e_1_2_1_25_1","unstructured":"Cijo Jose Moustapha Ciss\u00e9 and Fran\u00e7ois Fleuret. 2017. Kronecker recurrent units. arxiv:1705.10142."},{"key":"e_1_2_1_26_1","unstructured":"Oleksii Kuchaiev and Boris Ginsburg. 2017. Factorization tricks for LSTM networks. arxiv:1703.10722."},{"key":"e_1_2_1_27_1","volume-title":"Proceedings of the 34th International Conference on Machine Learning (ICML\u201917)","volume":"70","author":"Kumar Ashish","year":"2017"},{"key":"e_1_2_1_28_1","unstructured":"Aditya Kusupati Manish Singh Kush Bhatia Ashish Kumar Prateek Jain and Manik Varma. 2019. FastGRNN: A fast accurate stable and tiny kilobyte sized gated recurrent neural network. arxiv:1901.02358."},{"key":"e_1_2_1_29_1","volume-title":"Matrix Analysis for Scientists and Engineers","author":"Laub Alan J."},{"key":"e_1_2_1_30_1","unstructured":"V. Lebedev Y. Ganin M. Rakhuba I. Oseledets and V. Lempitsky. 2014. Speeding-up convolutional neural networks using fine-tuned cp-decomposition. arxiv:cs.CV\/1412.6553."},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/5.726791"},{"key":"e_1_2_1_32_1","doi-asserted-by":"crossref","unstructured":"Zhouhan Lin Matthieu Courbariaux Roland Memisevic and Yoshua Bengio. 2015. Neural networks with few multiplications. arxiv:1510.03009. https:\/\/doi.org\/10.1109\/5.726791","DOI":"10.1109\/5.726791"},{"key":"e_1_2_1_33_1","volume-title":"Kingma","author":"Louizos Christos","year":"2017"},{"key":"e_1_2_1_34_1","volume-title":"Retrieved","author":"Nagy James","year":"2010"},{"key":"e_1_2_1_35_1","volume-title":"Diamos","author":"Narang Sharan","year":"2017"},{"key":"e_1_2_1_36_1","unstructured":"Kirill Neklyudov Dmitry Molchanov Arsenii Ashukha and Dmitry Vetrov. 2017. Structured Bayesian pruning via log-normal multiplicative noise. arxiv:1705.07283."},{"key":"e_1_2_1_37_1","unstructured":"Razvan Pascanu Tomas Mikolov and Yoshua Bengio. 2012. Understanding the exploding gradient problem. arxiv:1211.5063."},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/3417313.3429381"},{"key":"e_1_2_1_39_1","volume-title":"Proceedings of the 2010 7th International Conference on Networked Sensing Systems (INSS\u201910)","author":"Roggen D.","year":"2010"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/78.650093"},{"key":"e_1_2_1_41_1","volume-title":"Advances in Neural Information Processing Systems 28","author":"Sindhwani Vikas"},{"key":"e_1_2_1_42_1","unstructured":"Yu Tang Zhigang Kan Dequan Sun Linbo Qiao Jingjing Xiao Zhiquan Lai and Dongsheng Li. 2020. ADMMiRNN: Training RNN with stable convergence via an efficient ADMM approach. arxiv:2006.05622."},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/3362743.3362965"},{"key":"e_1_2_1_44_1","volume-title":"Proceedings of the Workshop on Simple and Efficient Natural Language Processing (SustaiNLP\u201920)","author":"Thakker Urmish","year":"2020"},{"key":"e_1_2_1_45_1","doi-asserted-by":"crossref","unstructured":"Urmish Thakker Jesse G. Beu Dibakar Gope Ganesh Dasika and Matthew Mattina. 2019. Run-time efficient RNN compression for inference on edge devices. arxiv:1906.04886.","DOI":"10.1109\/EMC249363.2019.00013"},{"key":"e_1_2_1_46_1","unstructured":"Urmish Thakker Ganesh Dasika Jesse G. Beu and Matthew Mattina. 2019. Measuring scheduling efficiency of RNNs for NLP applications. arxiv:1904.03302."},{"key":"e_1_2_1_47_1","unstructured":"Urmish Thakker Paul Whatmough Matthew Mattina and Jesse Beu. 2020. Compressing language models using doped Kronecker products. arxiv:cs.LG\/2001.08896."},{"key":"e_1_2_1_48_1","volume-title":"Advances in Neural Information Processing Systems 31","author":"Thomas Anna"},{"key":"e_1_2_1_49_1","doi-asserted-by":"crossref","unstructured":"Lloyd Trefethen and David Bau. 1997. Numerical Linear Algebra. SIAM.","DOI":"10.1137\/1.9780898719574"},{"key":"e_1_2_1_50_1","volume-title":"Proceedings of the Deep Learning and Unsupervised Feature Learning Workshop (NIPS\u201911)","author":"Vanhoucke Vincent"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/3174243.3174253"},{"key":"e_1_2_1_52_1","volume-title":"Speech Commands: A dataset for limited-vocabulary speech recognition. arxiv:1804.03209.","author":"Warden Pete","year":"2018"},{"key":"e_1_2_1_53_1","volume-title":"Jonathan Le Roux, and Les Atlas","author":"Wisdom Scott","year":"2016"},{"key":"e_1_2_1_54_1","unstructured":"Wojciech Zaremba Ilya Sutskever and Oriol Vinyals. 2014. Recurrent neural network regularization. arxiv:1409.2329."},{"key":"e_1_2_1_55_1","volume-title":"Dhillon","author":"Zhang Jiong","year":"2018"},{"key":"e_1_2_1_56_1","volume-title":"Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV\u201915)","author":"Zhang X.","year":"2015"},{"key":"e_1_2_1_57_1","volume-title":"Hello Edge: Keyword spotting on microcontrollers. arxiv:1711.07128.","author":"Zhang Yundong","year":"2017"},{"key":"e_1_2_1_58_1","unstructured":"Shuchang Zhou and Jia-Nan Wu. 2015. Compression of fully-connected layer in neural network by Kronecker product. arxiv:1507.05775."},{"key":"e_1_2_1_59_1","unstructured":"Michael Zhu and Suyog Gupta. 2017. To prune or not to prune: Exploring the efficacy of pruning for model compression. arxiv:1710.01878."}],"container-title":["ACM Journal on Emerging Technologies in Computing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3440016","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,7,14]],"date-time":"2021-07-14T16:27:57Z","timestamp":1626280077000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3440016"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,7,8]]},"references-count":59,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2021,7,8]]}},"alternative-id":["10.1145\/3440016"],"URL":"http:\/\/dx.doi.org\/10.1145\/3440016","relation":{},"ISSN":["1550-4832","1550-4840"],"issn-type":[{"value":"1550-4832","type":"print"},{"value":"1550-4840","type":"electronic"}],"subject":["Electrical and Electronic Engineering","Hardware and Architecture","Software"],"published":{"date-parts":[[2021,7,8]]}}}