{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,2]],"date-time":"2026-04-02T14:25:26Z","timestamp":1775139926881,"version":"3.50.1"},"reference-count":31,"publisher":"Association for Computing Machinery (ACM)","issue":"5s","license":[{"start":{"date-parts":[[2021,9,17]],"date-time":"2021-09-17T00:00:00Z","timestamp":1631836800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Embed. Comput. Syst."],"published-print":{"date-parts":[[2021,10,31]]},"abstract":"<jats:p>\n            Over the past several years, the need for on-device deep learning has been rapidly increasing, and efficient CNN inference on mobile platforms has been actively researched. Sparsity exploitation has been one of the most active research themes, but the studies mostly focus on weight sparsity by weight pruning. Activation sparsity, on the contrary, requires compression at runtime for every input tensor. Hence, the research on activation sparsity mainly targets NPUs that can efficiently process this with their own hardware logic. In this paper, we observe that it is difficult to accelerate CNN inference on mobile GPUs with natural activation sparsity and that the widely used CSR-based sparse convolution is not sufficiently effective due to the compression overhead. We propose several novel sparsification methods that can boost activation sparsity without harming accuracy. In particular, we selectively sparsify some layers with an extremely high sparsity and adopt sparse convolution or dense convolution depending on the layers. Further, we present an efficient sparse convolution method without compression and demonstrate that it can be faster than the CSR implementation. 
With ResNet-50, we achieved 1.88\n            <jats:inline-formula>\n              <jats:alternatives>\n                <jats:tex-math>\n                  \\times\n                <\/jats:tex-math>\n              <\/jats:alternatives>\n            <\/jats:inline-formula>\n            speedup compared to TFLite on a Mali-G76 GPU.\n          <\/jats:p>","DOI":"10.1145\/3477008","type":"journal-article","created":{"date-parts":[[2021,9,17]],"date-time":"2021-09-17T18:36:51Z","timestamp":1631903811000},"page":"1-25","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["Exploiting Activation Sparsity for Fast CNN Inference on Mobile GPUs"],"prefix":"10.1145","volume":"20","author":[{"given":"Chanyoung","family":"Oh","sequence":"first","affiliation":[{"name":"KT AI2XL, Republic of Korea"}]},{"given":"Junhyuk","family":"So","sequence":"additional","affiliation":[{"name":"University of Seoul, Republic of Korea"}]},{"given":"Sumin","family":"Kim","sequence":"additional","affiliation":[{"name":"University of Seoul, Republic of Korea"}]},{"given":"Youngmin","family":"Yi","sequence":"additional","affiliation":[{"name":"University of Seoul, Republic of Korea"}]}],"member":"320","published-online":{"date-parts":[[2021,9,17]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"2016. ARM Compute Library. https:\/\/github.com\/ARM-software\/ComputeLibrary."},{"key":"e_1_2_1_2_1","unstructured":"2017. TensorFlow Lite. https:\/\/www.tensorflow.org\/lite."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/3292500.3330701"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/3007787.3001138"},{"key":"e_1_2_1_5_1","volume-title":"Once-for-all: Train one network and specialize it for efficient deployment. 
arXiv preprint arxiv:1908.09791","author":"Cai Han","year":"2019","unstructured":"Han Cai, Chuang Gan, Tianzhe Wang, Zhekai Zhang, and Song Han. 2019. Once-for-all: Train one network and specialize it for efficient deployment. arXiv preprint arxiv:1908.09791 (2019)."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.01147"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/4235.996017"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01464"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.194"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00725"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/2909437.2909442"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00140"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/MDAT.2017.2741463"},{"key":"e_1_2_1_16_1","volume-title":"International Conference on Machine Learning. PMLR, 5533\u20135543","author":"Kurtz Mark","year":"2020","unstructured":"Mark Kurtz, Justin Kopinsky, Rati Gelashvili, Alexander Matveev, John Carr, Michael Goin, William Leiserson, Sage Moore, Nir Shavit, and Dan Alistarh. 2020. Inducing and exploiting activation sparsity for fast inference on deep neural networks. In International Conference on Machine Learning. PMLR, 5533\u20135543."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.435"},{"key":"e_1_2_1_18_1","volume-title":"Pruning filters for efficient convnets. arXiv preprint arxiv:1608.08710","author":"Li Hao","year":"2016","unstructured":"Hao Li, Asim Kadav, Igor Durdanovic, Hanan Samet, and Hans Peter Graf. 2016. Pruning filters for efficient convnets. arXiv preprint arxiv:1608.08710 (2016)."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00290"},{"key":"e_1_2_1_20_1","volume-title":"Exploring the regularity of sparse structure in convolutional neural networks. arXiv preprint arxiv:1705.08922","author":"Mao Huizi","year":"2017","unstructured":"Huizi Mao, Song Han, Jeff Pool, Wenshuo Li, Xingyu Liu, Yu Wang, and William J Dally. 2017. Exploring the regularity of sparse structure in convolutional neural networks. arXiv preprint arxiv:1705.08922 (2017)."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/3373376.3378534"},{"key":"e_1_2_1_22_1","volume-title":"2017 5th International Conference on Learning Representations","author":"Park Jongsoo","year":"2017","unstructured":"Jongsoo Park, Sheng Li, Wei Wen, Ping Tak Peter Tang, Hai Li, Yiran Chen, and Pradeep Dubey. 2017. Faster cnns with direct sparse convolutions and guided pruning. 2017 5th International Conference on Learning Representations (2017)."},{"key":"e_1_2_1_23_1","unstructured":"Qualcomm Technologies Inc. 2017. Qualcomm\u00ae Snapdragon\u2122 Mobile Platform OpenCL General Programming and Optimization. https:\/\/developer.qualcomm.com\/qfile\/33472\/80-nb295-11_a.pdf."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00908"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00239"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/3410463.3414654"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-29611-7_6"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICTAI.2019.00197"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/3140659.3080215"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.23919\/DATE.2018.8342010"}],"container-title":["ACM Transactions on Embedded Computing 
Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3477008","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3477008","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:30:46Z","timestamp":1750188646000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3477008"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,9,17]]},"references-count":31,"journal-issue":{"issue":"5s","published-print":{"date-parts":[[2021,10,31]]}},"alternative-id":["10.1145\/3477008"],"URL":"https:\/\/doi.org\/10.1145\/3477008","relation":{},"ISSN":["1539-9087","1558-3465"],"issn-type":[{"value":"1539-9087","type":"print"},{"value":"1558-3465","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,9,17]]},"assertion":[{"value":"2021-04-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-07-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-09-17","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}