{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,21]],"date-time":"2025-10-21T15:49:51Z","timestamp":1761061791685,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":35,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,10,8]],"date-time":"2022-10-08T00:00:00Z","timestamp":1665187200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["2018016"],"award-info":[{"award-number":["2018016"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,10,8]]},"DOI":"10.1145\/3559009.3569667","type":"proceedings-article","created":{"date-parts":[[2023,1,27]],"date-time":"2023-01-27T14:02:50Z","timestamp":1674828170000},"page":"265-278","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["High-Performance Architecture Aware Sparse Convolutional Neural Networks for GPUs"],"prefix":"10.1145","author":[{"given":"Lizhi","family":"Xiang","sequence":"first","affiliation":[{"name":"University of Utah"}]},{"given":"P.","family":"Sadayappan","sequence":"additional","affiliation":[{"name":"University of Utah"}]},{"given":"Aravind","family":"Sukumaran-Rajam","sequence":"additional","affiliation":[{"name":"Meta Platforms"}]}],"member":"320","published-online":{"date-parts":[[2023,1,27]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"TensorFlow: Large-scale machine learning on heterogeneous systems","author":"Abadi Mart\u00edn","year":"2015","unstructured":"Mart\u00edn Abadi , Ashish Agarwal , Paul Barham , Eugene Brevdo , Zhifeng Chen , Craig Citro , Greg S. Corrado , Andy Davis , Jeffrey Dean , Matthieu Devin , Sanjay Ghemawat , Ian Goodfellow , Andrew Harp , Geoffrey Irving , Michael Isard , Yangqing Jia , Rafal Jozefowicz , Lukasz Kaiser , Manjunath Kudlur , Josh Levenberg , Dandelion Man\u00e9 , Rajat Monga , Sherry Moore , Derek Murray , Chris Olah , Mike Schuster , Jonathon Shlens , Benoit Steiner , Ilya Sutskever , Kunal Talwar , Paul Tucker , Vincent Vanhoucke , Vijay Vasudevan , Fernanda Vi\u00e9gas , Oriol Vinyals , Pete Warden , Martin Wattenberg , Martin Wicke , Yuan Yu , and Xiaoqiang Zheng . TensorFlow: Large-scale machine learning on heterogeneous systems , 2015 . Software available from tensorflow.org. Mart\u00edn Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dandelion Man\u00e9, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Vi\u00e9gas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org."},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2019.8661197"},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2017.7989163"},{"key":"e_1_3_2_1_4_1","volume-title":"Escort: Efficient sparse convolutional neural networks on gpus. CoRR, abs\/1802.10280","author":"Chen Xuhao","year":"2018","unstructured":"Xuhao Chen . Escort: Efficient sparse convolutional neural networks on gpus. CoRR, abs\/1802.10280 , 2018 . Xuhao Chen. Escort: Efficient sparse convolutional neural networks on gpus. CoRR, abs\/1802.10280, 2018."},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_3_2_1_6_1","volume-title":"Submanifold sparse convolutional networks. CoRR, abs\/1706.01307","author":"Graham Benjamin","year":"2017","unstructured":"Benjamin Graham and Laurens van der Maaten . Submanifold sparse convolutional networks. CoRR, abs\/1706.01307 , 2017 . Benjamin Graham and Laurens van der Maaten. Submanifold sparse convolutional networks. CoRR, abs\/1706.01307, 2017."},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2013.6638947"},{"key":"e_1_3_2_1_8_1","volume-title":"Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding","author":"Han Song","year":"2016","unstructured":"Song Han , Huizi Mao , and William J. Dally . Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding , 2016 . Song Han, Huizi Mao, and William J. Dally. Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding, 2016."},{"key":"e_1_3_2_1_9_1","first-page":"1135","volume-title":"Advances in Neural Information Processing Systems 28","author":"Han Song","year":"2015","unstructured":"Song Han , Jeff Pool , John Tran , and William Dally . Learning both weights and connections for efficient neural network. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, editors , Advances in Neural Information Processing Systems 28 , pages 1135 -- 1143 . Curran Associates, Inc. , 2015 . Song Han, Jeff Pool, John Tran, and William Dally. Learning both weights and connections for efficient neural network. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, editors, Advances in Neural Information Processing Systems 28, pages 1135--1143. Curran Associates, Inc., 2015."},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/SPIN.2017.8049931"},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.155"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.243"},{"key":"e_1_3_2_1_14_1","volume-title":"An empirical evaluation of deep learning on highway driving. CoRR, abs\/1504.01716","author":"Huval Brody","year":"2015","unstructured":"Brody Huval , Tao Wang , Sameep Tandon , Jeff Kiske , Will Song , Joel Pazhayampallil , Mykhaylo Andriluka , Pranav Rajpurkar , Toki Migimatsu , Royce Cheng-Yue , Fernando A. Mujica , Adam Coates , and Andrew Y. Ng . An empirical evaluation of deep learning on highway driving. CoRR, abs\/1504.01716 , 2015 . Brody Huval, Tao Wang, Sameep Tandon, Jeff Kiske, Will Song, Joel Pazhayampallil, Mykhaylo Andriluka, Pranav Rajpurkar, Toki Migimatsu, Royce Cheng-Yue, Fernando A. Mujica, Adam Coates, and Andrew Y. Ng. An empirical evaluation of deep learning on highway driving. CoRR, abs\/1504.01716, 2015."},{"key":"e_1_3_2_1_15_1","volume-title":"Deep roots: Improving CNN efficiency with hierarchical filter groups. CoRR, abs\/1605.06489","author":"Ioannou Yani","year":"2016","unstructured":"Yani Ioannou , Duncan P. Robertson , Roberto Cipolla , and Antonio Criminisi . Deep roots: Improving CNN efficiency with hierarchical filter groups. CoRR, abs\/1605.06489 , 2016 . Yani Ioannou, Duncan P. Robertson, Roberto Cipolla, and Antonio Criminisi. Deep roots: Improving CNN efficiency with hierarchical filter groups. CoRR, abs\/1605.06489, 2016."},{"key":"e_1_3_2_1_16_1","volume-title":"Speeding up convolutional neural networks with low rank expansions. ArXiv, abs\/1405.3866","author":"Jaderberg Max","year":"2014","unstructured":"Max Jaderberg , Andrea Vedaldi , and Andrew Zisserman . Speeding up convolutional neural networks with low rank expansions. ArXiv, abs\/1405.3866 , 2014 . Max Jaderberg, Andrea Vedaldi, and Andrew Zisserman. Speeding up convolutional neural networks with low rank expansions. ArXiv, abs\/1405.3866, 2014."},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/ASE.2017.8115709"},{"key":"e_1_3_2_1_18_1","volume-title":"Pruning filters for efficient convnets. CoRR, abs\/1608.08710","author":"Li Hao","year":"2016","unstructured":"Hao Li , Asim Kadav , Igor Durdanovic , Hanan Samet , and Hans Peter Graf . Pruning filters for efficient convnets. CoRR, abs\/1608.08710 , 2016 . Hao Li, Asim Kadav, Igor Durdanovic, Hanan Samet, and Hans Peter Graf. Pruning filters for efficient convnets. CoRR, abs\/1608.08710, 2016."},{"key":"e_1_3_2_1_19_1","volume-title":"Dynamic runtime feature map pruning. arXiv preprint arXiv:1812.09922","author":"Liang Tailin","year":"2018","unstructured":"Tailin Liang , Lei Wang , Shaobo Shi , and John Glossner . Dynamic runtime feature map pruning. arXiv preprint arXiv:1812.09922 , 2018 . Tailin Liang, Lei Wang, Shaobo Shi, and John Glossner. Dynamic runtime feature map pruning. arXiv preprint arXiv:1812.09922, 2018."},{"key":"e_1_3_2_1_20_1","volume-title":"Top 500: HPCG - june","author":"Top","year":"2022","unstructured":"Top 500 list. Top 500: HPCG - june 2022 . https:\/\/www.top500.org\/lists\/hpcg\/2022\/06\/, 2022. Top 500 list. Top 500: HPCG - june 2022. https:\/\/www.top500.org\/lists\/hpcg\/2022\/06\/, 2022."},{"key":"e_1_3_2_1_21_1","volume-title":"The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Liu Baoyuan","year":"2015","unstructured":"Baoyuan Liu , Min Wang , Hassan Foroosh , Marshall Tappen , and Marianna Pensky . Sparse convolutional neural networks . In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , June 2015 . Baoyuan Liu, Min Wang, Hassan Foroosh, Marshall Tappen, and Marianna Pensky. Sparse convolutional neural networks. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015."},{"key":"e_1_3_2_1_22_1","volume-title":"Efficient sparse-winograd convolutional neural networks. CoRR, abs\/1802.06367","author":"Liu Xingyu","year":"2018","unstructured":"Xingyu Liu , Jeff Pool , Song Han , and William J. Dally . Efficient sparse-winograd convolutional neural networks. CoRR, abs\/1802.06367 , 2018 . Xingyu Liu, Jeff Pool, Song Han, and William J. Dally. Efficient sparse-winograd convolutional neural networks. CoRR, abs\/1802.06367, 2018."},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.541"},{"key":"e_1_3_2_1_24_1","unstructured":"Nvidia. Nvidia management library. https:\/\/developer.nvidia.com\/nvidia-management-library-nvml.  Nvidia. Nvidia management library. https:\/\/developer.nvidia.com\/nvidia-management-library-nvml."},{"key":"e_1_3_2_1_25_1","article-title":"A survey of the usages of deep learning for natural language processing","author":"Otter D. W.","year":"2020","unstructured":"D. W. Otter , J. R. Medina , and J. K. Kalita . A survey of the usages of deep learning for natural language processing . IEEE Transactions on Neural Networks and Learning Systems, pages 1--21 , 2020 . D. W. Otter, J. R. Medina, and J. K. Kalita. A survey of the usages of deep learning for natural language processing. IEEE Transactions on Neural Networks and Learning Systems, pages 1--21, 2020.","journal-title":"IEEE Transactions on Neural Networks and Learning Systems, pages 1--21"},{"key":"e_1_3_2_1_26_1","volume-title":"Holistic sparsecnn: Forging the trident of accuracy, speed, and size. CoRR, abs\/1608.01409","author":"Park Jongsoo","year":"2016","unstructured":"Jongsoo Park , Sheng R. Li , Wei Wen , Hai Li , Yiran Chen , and Pradeep Dubey . Holistic sparsecnn: Forging the trident of accuracy, speed, and size. CoRR, abs\/1608.01409 , 2016 . Jongsoo Park, Sheng R. Li, Wei Wen, Hai Li, Yiran Chen, and Pradeep Dubey. Holistic sparsecnn: Forging the trident of accuracy, speed, and size. CoRR, abs\/1608.01409, 2016."},{"key":"e_1_3_2_1_27_1","volume-title":"Sbnet: Sparse blocks network for fast inference. CoRR, abs\/1801.02108","author":"Ren Mengye","year":"2018","unstructured":"Mengye Ren , Andrei Pokrovsky , Bin Yang , and Raquel Urtasun . Sbnet: Sparse blocks network for fast inference. CoRR, abs\/1801.02108 , 2018 . Mengye Ren, Andrei Pokrovsky, Bin Yang, and Raquel Urtasun. Sbnet: Sparse blocks network for fast inference. CoRR, abs\/1801.02108, 2018."},{"key":"e_1_3_2_1_28_1","volume-title":"Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556","author":"Simonyan Karen","year":"2014","unstructured":"Karen Simonyan and Andrew Zisserman . Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 , 2014 . Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014."},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMI.2016.2525803"},{"key":"e_1_3_2_1_30_1","first-page":"241","volume-title":"Medical Imaging 2016: Computer-Aided Diagnosis","author":"Sun Wenqing","year":"2016","unstructured":"Wenqing Sun , Bin Zheng , and Wei Qian . Computer aided lung cancer diagnosis with deep learning algorithms . In Georgia D. Tourassi and Samuel G. Armato III, editors, Medical Imaging 2016: Computer-Aided Diagnosis , volume 9785 , pages 241 -- 248 . International Society for Optics and Photonics, SPIE , 2016 . Wenqing Sun, Bin Zheng, and Wei Qian. Computer aided lung cancer diagnosis with deep learning algorithms. In Georgia D. Tourassi and Samuel G. Armato III, editors, Medical Imaging 2016: Computer-Aided Diagnosis, volume 9785, pages 241 -- 248. International Society for Optics and Photonics, SPIE, 2016."},{"key":"e_1_3_2_1_31_1","volume-title":"May","author":"Team Theano Development","year":"2016","unstructured":"Theano Development Team . Theano: A Python framework for fast computation of mathematical expressions. arXiv e-prints, abs\/1605.02688 , May 2016 . Theano Development Team. Theano: A Python framework for fast computation of mathematical expressions. arXiv e-prints, abs\/1605.02688, May 2016."},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.5555\/3157096.3157329"},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46723-8_14"},{"key":"e_1_3_2_1_34_1","volume-title":"Accelerating convolutional neural network by exploiting sparsity on gpus. arXiv preprint arXiv:1909.09927","author":"Xu Weizhi","year":"2019","unstructured":"Weizhi Xu , Shengyu Fan , Hui Yu , and Xin Fu . Accelerating convolutional neural network by exploiting sparsity on gpus. arXiv preprint arXiv:1909.09927 , 2019 . Weizhi Xu, Shengyu Fan, Hui Yu, and Xin Fu. Accelerating convolutional neural network by exploiting sparsity on gpus. arXiv preprint arXiv:1909.09927, 2019."},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/MCI.2018.2840738"}],"event":{"name":"PACT '22: International Conference on Parallel Architectures and Compilation Techniques","sponsor":["SIGARCH ACM Special Interest Group on Computer Architecture","IFIP WG 10.3 IFIP WG 10.3","IEEE CS"],"location":"Chicago Illinois","acronym":"PACT '22"},"container-title":["Proceedings of the International Conference on Parallel Architectures and Compilation Techniques"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3559009.3569667","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3559009.3569667","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3559009.3569667","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:02:38Z","timestamp":1750186958000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3559009.3569667"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,10,8]]},"references-count":35,"alternative-id":["10.1145\/3559009.3569667","10.1145\/3559009"],"URL":"https:\/\/doi.org\/10.1145\/3559009.3569667","relation":{},"subject":[],"published":{"date-parts":[[2022,10,8]]},"assertion":[{"value":"2023-01-27","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}