{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,5]],"date-time":"2026-03-05T19:15:46Z","timestamp":1772738146517,"version":"3.50.1"},"reference-count":60,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2021,7,14]],"date-time":"2021-07-14T00:00:00Z","timestamp":1626220800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["J. Emerg. Technol. Comput. Syst."],"published-print":{"date-parts":[[2021,10,31]]},"abstract":"<jats:p>Micro-controllers (MCUs) make up most of the processors in the world with widespread applicability from automobile to medical devices. The Internet of Things promises to enable these resource-constrained MCUs with machine learning algorithms to provide always-on intelligence. Many Internet of Things applications consume time-series data that are naturally suitable for recurrent neural networks (RNNs) like LSTMs and GRUs. However, RNNs can be large and difficult to deploy on these devices, as they have few kilobytes of memory. As a result, there is a need for compression techniques that can significantly compress RNNs without negatively impacting task accuracy. This article introduces a method to compress RNNs for resource-constrained environments using the Kronecker product (KP). KPs can compress RNN layers by 16\u00d7 to 38\u00d7 with minimal accuracy loss. By quantizing the resulting models to 8 bits, we further push the compression factor to 50\u00d7. We compare KP with other state-of-the-art compression techniques across seven benchmarks spanning five different applications and show that KP can beat the task accuracy achieved by other techniques by a large margin while simultaneously improving the inference runtime. 
Sometimes the KP compression mechanism can introduce an accuracy loss. We develop a hybrid KP approach to mitigate this. Our hybrid KP algorithm provides fine-grained control over the compression ratio, enabling us to regain accuracy lost during compression by adding a small number of model parameters.<\/jats:p>","DOI":"10.1145\/3440016","type":"journal-article","created":{"date-parts":[[2021,7,14]],"date-time":"2021-07-14T16:27:00Z","timestamp":1626280020000},"page":"1-18","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":8,"title":["Compressing RNNs to Kilobyte Budget for IoT Devices Using Kronecker Products"],"prefix":"10.1145","volume":"17","author":[{"given":"Urmish","family":"Thakker","sequence":"first","affiliation":[{"name":"Arm ML Research Lab, Austin, TX"}]},{"given":"Igor","family":"Fedorov","sequence":"additional","affiliation":[{"name":"Arm ML Research Lab, Austin, TX"}]},{"given":"Chu","family":"Zhou","sequence":"additional","affiliation":[{"name":"Arm ML Research Lab, Austin, TX"}]},{"given":"Dibakar","family":"Gope","sequence":"additional","affiliation":[{"name":"Arm ML Research Lab, Austin, TX"}]},{"given":"Matthew","family":"Mattina","sequence":"additional","affiliation":[{"name":"Arm ML Research Lab, Austin, TX"}]},{"given":"Ganesh","family":"Dasika","sequence":"additional","affiliation":[{"name":"AMD Research, Austin, TX"}]},{"given":"Jesse","family":"Beu","sequence":"additional","affiliation":[{"name":"Arm ML Research Lab, Boston, MA"}]}],"member":"320","published-online":{"date-parts":[[2021,7,14]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Yelp Review Dataset. Retrieved","year":"2020","unstructured":"Kaggle. 2020. Yelp Review Dataset. Retrieved August 3, 2020 from https:\/\/www.kaggle.com\/yelp-dataset\/yelp-dataset."},
{"key":"e_1_2_1_2_1","volume-title":"et\u00a0al","author":"Abadi Mart\u00edn","year":"2015","unstructured":"Mart\u00edn Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, et\u00a0al. 2015. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. Retrieved June 1, 2021 from https:\/\/www.tensorflow.org\/. (Software available from tensorflow.org.)"},{"key":"e_1_2_1_3_1","volume-title":"Max Lam, William Fu, Amin Fazel, Jeremy Holleman, Xinyuan Huang, et\u00a0al.","author":"Banbury Colby R.","year":"2020","unstructured":"Colby R. Banbury, Vijay Janapa Reddi, Max Lam, William Fu, Amin Fazel, Jeremy Holleman, Xinyuan Huang, et\u00a0al. 2020. Benchmarking TinyML systems: Challenges and direction. arxiv:cs.PF\/2003.04821."},{"key":"e_1_2_1_4_1","volume-title":"Mandic","author":"Calvi Giuseppe Giovanni","year":"2019","unstructured":"Giuseppe Giovanni Calvi, Ahmad Moniri, Mahmoud Mahfouz, Zeyang Yu, Qibin Zhao, and Danilo P. Mandic. 2019. Tucker tensor layer in fully connected neural networks. arxiv:1903.06133."},{"key":"e_1_2_1_5_1","unstructured":"Soravit Changpinyo, Mark Sandler, and Andrey Zhmoginov. 2017. 
The power of sparsity in convolutional neural networks. arxiv:1702.06257."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.327"},{"key":"e_1_2_1_7_1","unstructured":"Kyunghyun Cho, Bart van Merrienboer, \u00c7aglar G\u00fcl\u00e7ehre, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arxiv:1406.1078."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.5555\/3294771.3294792"},{"key":"e_1_2_1_9_1","unstructured":"Matthieu Courbariaux and Yoshua Bengio. 2016. BinaryNet: Training deep neural networks with weights and activations constrained to +1 or \u20131. arxiv:1602.02830."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.5555\/2999792.2999852"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/3194554.3194625"},{"key":"e_1_2_1_12_1","unstructured":"Trevor Gale, Erich Elsen, and Sara Hooker. 2019. The state of sparsity in deep neural networks. arxiv:1902.09574."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.5555\/3086952"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW50498.2020.00362"},{"key":"e_1_2_1_15_1","volume-title":"Retrieved","author":"Guennebau Gael","year":"2009","unstructured":"Gael Guennebau and Benoit Jacob. 
2009. Eigen Library. Retrieved December 21, 2018 from http:\/\/eigen.tuxfamily.org\/."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.5555\/3060832.3060835"},{"key":"e_1_2_1_17_1","volume-title":"Proceedings of the International Conference on Learning Representations (ICLR\u201916)","author":"Han Song","unstructured":"Song Han, Huizi Mao, and William J. Dally. 2016. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. In Proceedings of the International Conference on Learning Representations (ICLR\u201916)."},{"key":"e_1_2_1_18_1","unstructured":"Qinyao He, He Wen, Shuchang Zhou, Yuxin Wu, Cong Yao, Xinyu Zhou, and Yuheng Zou. 2016. Effective quantization methods for recurrent neural networks. arxiv:1611.10176."},{"key":"e_1_2_1_19_1","unstructured":"Qinyao He, He Wen, Shuchang Zhou, Yuxin Wu, Cong Yao, Xinyu Zhou, and Yuheng Zou. 2016. Effective quantization methods for recurrent neural networks. arxiv:1611.10176."},
{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/3417313.3429380"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.5555\/3122009.3242044"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.5555\/3122009.3242044"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/34.291440"},{"key":"e_1_2_1_25_1","unstructured":"Cijo Jose, Moustapha Ciss\u00e9, and Fran\u00e7ois Fleuret. 2017. Kronecker recurrent units. arxiv:1705.10142."},{"key":"e_1_2_1_26_1","unstructured":"Oleksii Kuchaiev and Boris Ginsburg. 2017. Factorization tricks for LSTM networks. arxiv:1703.10722."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.5555\/3305381.3305581"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.5555\/3327546.3327577"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.5555\/1062366"},{"key":"e_1_2_1_30_1","unstructured":"V. Lebedev, Y. Ganin, M. Rakhuba, I. Oseledets, and V. Lempitsky. 2014. Speeding-up convolutional neural networks using fine-tuned cp-decomposition. arxiv:cs.CV\/1412.6553."},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/5.726791"},{"key":"#cr-split#-e_1_2_1_32_1.1","doi-asserted-by":"crossref","unstructured":"Zhouhan Lin, Matthieu Courbariaux, Roland Memisevic, and Yoshua Bengio. 2015. Neural networks with few multiplications. arxiv:1510.03009. 
https:\/\/doi.org\/10.1109\/5.726791","DOI":"10.1109\/5.726791"},{"key":"#cr-split#-e_1_2_1_32_1.2","doi-asserted-by":"crossref","unstructured":"Zhouhan Lin, Matthieu Courbariaux, Roland Memisevic, and Yoshua Bengio. 2015. Neural networks with few multiplications. arxiv:1510.03009. https:\/\/doi.org\/10.1109\/5.726791","DOI":"10.1109\/5.726791"},{"key":"e_1_2_1_33_1","volume-title":"Kingma","author":"Louizos Christos","year":"2017","unstructured":"Christos Louizos, Max Welling, and Diederik P. Kingma. 2017. Learning sparse neural networks through L0 regularization. arxiv:1712.01312."},{"key":"e_1_2_1_34_1","volume-title":"Retrieved","author":"Nagy James","year":"2010","unstructured":"James Nagy. 2010. Introduction to Kronecker Products. Retrieved May 20, 2019 from http:\/\/www.mathcs.emory.edu\/ nagy\/courses\/fall10\/515\/KroneckerIntro.pdf."},{"key":"e_1_2_1_35_1","volume-title":"Diamos","author":"Narang Sharan","year":"2017","unstructured":"Sharan Narang, Eric Undersander, and Gregory F. Diamos. 2017. Block-sparse recurrent neural networks. arxiv:1711.02782."},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.5555\/3295222.3295422"},{"key":"e_1_2_1_37_1","unstructured":"Razvan Pascanu, Tomas Mikolov, and Yoshua Bengio. 2012. Understanding the exploding gradient problem. arxiv:1211.5063."},
{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/3417313.3429381"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/INSS.2010.5573462"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/78.650093"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.5555\/2969442.2969584"},{"key":"e_1_2_1_42_1","unstructured":"Yu Tang, Zhigang Kan, Dequan Sun, Linbo Qiao, Jingjing Xiao, Zhiquan Lai, and Dongsheng Li. 2020. ADMMiRNN: Training RNN with stable convergence via an efficient ADMM approach. arxiv:2006.05622."},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/3362743.3362965"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.sustainlp-1.2"},{"key":"e_1_2_1_45_1","doi-asserted-by":"crossref","unstructured":"Urmish Thakker, Jesse G. Beu, Dibakar Gope, Ganesh Dasika, and Matthew Mattina. 2019. Run-time efficient RNN compression for inference on edge devices. arxiv:1906.04886.","DOI":"10.1109\/EMC249363.2019.00013"},{"key":"e_1_2_1_46_1","unstructured":"Urmish Thakker, Ganesh Dasika, Jesse G. Beu, and Matthew Mattina. 2019. Measuring scheduling efficiency of RNNs for NLP applications. arxiv:1904.03302."},{"key":"e_1_2_1_47_1","unstructured":"Urmish Thakker, Paul Whatmough, Matthew Mattina, and Jesse Beu. 2020. Compressing language models using doped Kronecker products. arxiv:cs.LG\/2001.08896."},
{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.5555\/3327546.3327580"},{"key":"e_1_2_1_49_1","doi-asserted-by":"crossref","unstructured":"Lloyd Trefethen and David Bau. 1997. Numerical Linear Algebra. SIAM.","DOI":"10.1137\/1.9780898719574"},{"key":"e_1_2_1_50_1","volume-title":"Proceedings of the Deep Learning and Unsupervised Feature Learning Workshop (NIPS\u201911)","author":"Vanhoucke Vincent","unstructured":"Vincent Vanhoucke, Andrew Senior, and Mark Z. Mao. 2011. Improving the speed of neural networks on CPUs. In Proceedings of the Deep Learning and Unsupervised Feature Learning Workshop (NIPS\u201911)."},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/3174243.3174253"},{"key":"e_1_2_1_52_1","volume-title":"Speech Commands: A dataset for limited-vocabulary speech recognition. arxiv:1804.03209.","author":"Warden Pete","year":"2018","unstructured":"Pete Warden. 2018. Speech Commands: A dataset for limited-vocabulary speech recognition. arxiv:1804.03209."},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.5555\/3157382.3157643"},{"key":"e_1_2_1_54_1","unstructured":"Wojciech Zaremba, Ilya Sutskever, and Oriol Vinyals. 2014. Recurrent neural network regularization. arxiv:1409.2329."},{"key":"e_1_2_1_55_1","volume-title":"Dhillon","author":"Zhang Jiong","year":"2018","unstructured":"Jiong Zhang, Qi Lei, and Inderjit S. Dhillon. 2018. 
Stabilizing gradients for deep neural networks via efficient SVD parameterization. arxiv:1803.09327."},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.335"},{"key":"e_1_2_1_57_1","volume-title":"Hello Edge: Keyword spotting on microcontrollers. arxiv:1711.07128.","author":"Zhang Yundong","year":"2017","unstructured":"Yundong Zhang, Naveen Suda, Liangzhen Lai, and Vikas Chandra. 2017. Hello Edge: Keyword spotting on microcontrollers. arxiv:1711.07128."},{"key":"e_1_2_1_58_1","unstructured":"Shuchang Zhou and Jia-Nan Wu. 2015. Compression of fully-connected layer in neural network by Kronecker product. arxiv:1507.05775."},{"key":"e_1_2_1_59_1","unstructured":"Michael Zhu and Suyog Gupta. 2017. To prune or not to prune: Exploring the efficacy of pruning for model compression. arxiv:1710.01878."}],
"container-title":["ACM Journal on Emerging Technologies in Computing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3440016","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3440016","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:02:17Z","timestamp":1750197737000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3440016"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,7,14]]},"references-count":60,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2021,10,31]]}},"alternative-id":["10.1145\/3440016"],"URL":"https:\/\/doi.org\/10.1145\/3440016","relation":{},"ISSN":["1550-4832","1550-4840"],"issn-type":[{"value":"1550-4832","type":"print"},{"value":"1550-4840","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,7,14]]},"assertion":[{"value":"2020-04-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-11-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-07-14","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}