{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,20]],"date-time":"2025-09-20T21:52:20Z","timestamp":1758405140788,"version":"3.41.0"},"reference-count":51,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2017,12,7]],"date-time":"2017-12-07T00:00:00Z","timestamp":1512604800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61672048"],"award-info":[{"award-number":["61672048"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Embed. Comput. Syst."],"published-print":{"date-parts":[[2018,3,31]]},"abstract":"<jats:p>Convolutional neural networks (CNNs) are widely employed in many image recognition applications. With the proliferation of embedded and mobile devices, such applications are becoming commonplace on mobile devices. Network pruning is a commonly used strategy to reduce the memory and storage footprints of CNNs on mobile devices. In this article, we propose customized versions of the sparse matrix multiplication algorithm to speed up inference on mobile devices and make it more energy efficient. Specifically, we propose a Block Compressed Sparse Column algorithm and a bit-representation-based algorithm (BitsGEMM) that exploit sparsity to accelerate the fully connected layers of a network on the NVIDIA Jetson TK1 platform. We evaluate the proposed algorithms using real-world object classification and object detection applications. Experiments show that performance speedups can be achieved over the original baseline implementation using cuBLAS. 
On object detection CNNs, an average speedup of 1.82\u00d7 is obtained over baseline cuBLAS in the fully connected layer of the VGG model, whereas on classification CNNs, an average speedup of 1.51\u00d7 is achieved for the fully connected layer of the pruned-VGG model. Energy consumption reduction of 43--46% is also observed due to decreased computational and memory bandwidth demands.<\/jats:p>","DOI":"10.1145\/3122788","type":"journal-article","created":{"date-parts":[[2017,12,11]],"date-time":"2017-12-11T13:26:47Z","timestamp":1512998807000},"page":"1-25","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":9,"title":["Exploiting Sparsity to Accelerate Fully Connected Layers of CNN-Based Applications on Mobile SoCs"],"prefix":"10.1145","volume":"17","author":[{"given":"Xinfeng","family":"Xie","sequence":"first","affiliation":[{"name":"School of EECS, Peking University, China"}]},{"given":"Dayou","family":"Du","sequence":"additional","affiliation":[{"name":"School of EECS, Peking University, China"}]},{"given":"Qian","family":"Li","sequence":"additional","affiliation":[{"name":"School of EECS, Peking University, China"}]},{"given":"Yun","family":"Liang","sequence":"additional","affiliation":[{"name":"School of EECS, Peking University, China"}]},{"given":"Wai Teng","family":"Tang","sequence":"additional","affiliation":[{"name":"Institute of High Performance Computing, A*STAR, Singapore"}]},{"given":"Zhong Liang","family":"Ong","sequence":"additional","affiliation":[{"name":"Institute of High Performance Computing, A*STAR, Singapore"}]},{"given":"Mian","family":"Lu","sequence":"additional","affiliation":[{"name":"Huawei Singapore Research Centre"}]},{"given":"Huynh Phung","family":"Huynh","sequence":"additional","affiliation":[{"name":"Institute of High Performance Computing, A*STAR, Singapore"}]},{"given":"Rick Siow Mong","family":"Goh","sequence":"additional","affiliation":[{"name":"Institute of High 
Performance Computing, A*STAR, Singapore"}]}],"member":"320","published-online":{"date-parts":[[2017,12,7]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Structured pruning of deep convolutional neural networks. CoRR abs\/1512.08571","author":"Anwar Sajid","year":"2015","unstructured":"Sajid Anwar, Kyuyeon Hwang, and Wonyong Sung. 2015. Structured pruning of deep convolutional neural networks. CoRR abs\/1512.08571 (2015)."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/2688500.2688521"},{"key":"e_1_2_1_3_1","first-page":"105","article-title":"Pruning algorithms of neural networks: a comparative study","volume":"3","author":"Gethsiyal Augasta M","year":"2013","unstructured":"M Gethsiyal Augasta and Thangairulappan Kathirvalavakumar. 2013. Pruning algorithms of neural networks: a comparative study. Central Eur. J. Comput. Sci. 3, 3 (2013), 105--115.","journal-title":"Central Eur. J. Comput. Sci."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/3015144"},{"key":"e_1_2_1_6_1","volume-title":"Theano: New features and speed improvements. arXiv preprint arXiv:1211.5590.","author":"Bastien Fr\u00e9d\u00e9ric","year":"2012","unstructured":"Fr\u00e9d\u00e9ric Bastien, Pascal Lamblin, Razvan Pascanu, James Bergstra, Ian J. Goodfellow, Arnaud Bergeron, Nicolas Bouchard, and Yoshua Bengio. 2012. Theano: New features and speed improvements. 
arXiv preprint arXiv:1211.5590."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/1654059.1654078"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICPP.2008.45"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/1854273.1854309"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/1815961.1815993"},{"key":"e_1_2_1_11_1","volume-title":"Proceedings of The 32nd International Conference on Machine Learning (ICML\u201913)","author":"Chen Wenlin","year":"2015","unstructured":"Wenlin Chen, James Wilson, Stephen Tyree, Kilian Weinberger, and Yixin Chen. 2015. Compressing neural networks with the hashing trick. In Proceedings of The 32nd International Conference on Machine Learning (ICML\u201913). 2285--2294."},{"key":"e_1_2_1_12_1","volume-title":"cuDNN: Efficient primitives for deep learning. CoRR abs\/1410.0759","author":"Chetlur Sharan","year":"2014","unstructured":"Sharan Chetlur, Cliff Woolley, Philippe Vandermersch, Jonathan Cohen, John Tran, Bryan Catanzaro, and Evan Shelhamer. 2014. cuDNN: Efficient primitives for deep learning. CoRR abs\/1410.0759 (2014)."},{"volume-title":"Proceedings of the 30th International Conference on Machine Learning (ICML\u201913)","author":"Coates Adam","key":"e_1_2_1_13_1","unstructured":"
Adam Coates, Brody Huval, Tao Wang, David J. Wu, Bryan C. Catanzaro, and Andrew Y. Ng. 2013. Deep learning with COTS HPC systems. In Proceedings of the 30th International Conference on Machine Learning (ICML\u201913). 1337--1345."},{"key":"e_1_2_1_14_1","volume-title":"Proceedings of the BigLearn, Neural Information Processing Systems Workshop (NIPS\u201911)","author":"Collobert Ronan","year":"2011","unstructured":"Ronan Collobert, Koray Kavukcuoglu, and Cl\u00e9ment Farabet. 2011. Torch7: A matlab-like environment for machine learning. In Proceedings of the BigLearn, Neural Information Processing Systems Workshop (NIPS\u201911)."},{"key":"e_1_2_1_15_1","volume-title":"Proceedings of the PASCAL Visual Object Classes Challenge 2007 (VOC\u201907)","author":"Everingham Mark","year":"2007","unstructured":"Mark Everingham, Luc Van Gool, Christopher KI Williams, John Winn, and Andrew Zisserman. 2007. 
Proceedings of the PASCAL Visual Object Classes Challenge 2007 (VOC\u201907)."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.169"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.81"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/2159430.2159436"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2014.68"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/3020078.3021745"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2016.30"},{"key":"e_1_2_1_22_1","volume-title":"Dally","author":"Han Song","year":"2015","unstructured":"Song Han, Huizi Mao, and William J. Dally. 2015. Deep compression: Compressing deep neural network with pruning, trained quantization and huffman coding. CoRR abs\/1510.00149 (2015)."},{"key":"e_1_2_1_23_1","volume-title":"Deep residual learning for image recognition. CoRR abs\/1512.03385","author":"He Kaiming","year":"2015","unstructured":"Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015a. Deep residual learning for image recognition. CoRR abs\/1512.03385 (2015)."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.123"},{"volume-title":"Proceedings of the International Conference on Computational Sciences-Part I (ICCS\u201901)","author":"Im Eun-Jin","key":"e_1_2_1_26_1","unstructured":"Eun-Jin Im and Katherine A. 
Yelick. 2001. Optimizing sparse matrix computations for register reuse in SPARSITY. In Proceedings of the International Conference on Computational Sciences-Part I (ICCS\u201901). 127--136."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/2647868.2654889"},{"volume-title":"Proceedings of the Conference on Advances in Neural Information Processing Systems (NIPS\u201912)","author":"Krizhevsky Alex","key":"e_1_2_1_28_1","unstructured":"Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. ImageNet classification with deep convolutional neural networks. In Proceedings of the Conference on Advances in Neural Information Processing Systems (NIPS\u201912)."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/5.726791"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/2464996.2465013"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2017.64"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1177\/1094342004038951"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/2851141.2851190"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.procs.2016.05.304"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2011.88"},{"key":"e_1_2_1_36_1","unstructured":"NVIDIA. 2016. DIGITS\u2014Interactive Deep Learning GPU Training System. 
Retrieved from https:\/\/developer.nvidia.com\/digits."},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/2503210.2503281"},{"key":"e_1_2_1_38_1","volume-title":"Proceedings of the Conference on Advances in Neural Information Processing Systems (NIPS\u201915)","author":"Ren Shaoqing","year":"2015","unstructured":"Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. 2015. Faster R-CNN: Towards real-time object detection with region proposal networks. In Proceedings of the Conference on Advances in Neural Information Processing Systems (NIPS\u201915)."},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-015-0816-y"},{"key":"e_1_2_1_40_1","volume-title":"Iterative Methods for Sparse Linear Systems","author":"Saad Yousef","unstructured":"Yousef Saad. 2003. Iterative Methods for Sparse Linear Systems (2nd ed.). Society for Industrial and Applied Mathematics, Philadelphia, PA.","edition":"2"},{"volume-title":"Parallel Processing and Applied Mathematics","author":"Saule Erik","key":"e_1_2_1_41_1","unstructured":"Erik Saule, Kamer Kaya, and \u00dcmit V. \u00c7ataly\u00fcrek. 2014. Performance evaluation of sparse matrix multiplication kernels on intel xeon phi. In Parallel Processing and Applied Mathematics. 
Springer, Berlin, 559--570."},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.5555\/2627435.2670313"},{"key":"e_1_2_1_43_1","doi-asserted-by":"crossref","unstructured":"Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. 2015. Going deeper with convolutions. In Computer Vision and Pattern Recognition (CVPR\u201915).","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.5555\/2738600.2738618"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2015.115"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2011.08.003"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/3061639.3062207"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/1362622.1362674"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/3061639.3062244"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/2555243.2555255"},{"volume-title":"Proceedings of the 13th European Conference in Computer Vision. 818--833","author":"Matthew","key":"e_1_2_1_51_1","unstructured":"Matthew D. Zeiler and Rob Fergus. 2014. Visualizing and understanding convolutional networks. In Proceedings of the 13th European Conference in Computer Vision. 
818--833."},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/2966986.2967011"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/2684746.2689060"}],"container-title":["ACM Transactions on Embedded Computing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3122788","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3122788","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T19:05:08Z","timestamp":1750273508000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3122788"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,12,7]]},"references-count":51,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2018,3,31]]}},"alternative-id":["10.1145\/3122788"],"URL":"https:\/\/doi.org\/10.1145\/3122788","relation":{},"ISSN":["1539-9087","1558-3465"],"issn-type":[{"type":"print","value":"1539-9087"},{"type":"electronic","value":"1558-3465"}],"subject":[],"published":{"date-parts":[[2017,12,7]]},"assertion":[{"value":"2016-11-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2017-06-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2017-12-07","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}