{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2023,12,1]],"date-time":"2023-12-01T05:14:14Z","timestamp":1701407654087},"publisher-location":"New York, NY, USA","reference-count":31,"publisher":"ACM","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2023,8,7]]},"DOI":"10.1145\/3605573.3605632","type":"proceedings-article","created":{"date-parts":[[2023,9,13]],"date-time":"2023-09-13T16:21:16Z","timestamp":1694622076000},"update-policy":"http:\/\/dx.doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["EC-SpMM: Efficient Compilation of SpMM Kernel on GPUs"],"prefix":"10.1145","author":[{"ORCID":"http:\/\/orcid.org\/0009-0007-1455-8725","authenticated-orcid":false,"given":"Junqing","family":"Lin","sequence":"first","affiliation":[{"name":"University of Science and Technology of China, China"}]},{"ORCID":"http:\/\/orcid.org\/0009-0000-5707-0606","authenticated-orcid":false,"given":"Honghe","family":"Zhang","sequence":"additional","affiliation":[{"name":"University of Science and Technology of China, China"}]},{"ORCID":"http:\/\/orcid.org\/0009-0001-3516-8765","authenticated-orcid":false,"given":"Xiaolong","family":"Shi","sequence":"additional","affiliation":[{"name":"University of Science and Technology of China, China"}]},{"ORCID":"http:\/\/orcid.org\/0000-0001-5098-1503","authenticated-orcid":false,"given":"Jingwei","family":"Sun","sequence":"additional","affiliation":[{"name":"University of Science and Technology of China, China"}]},{"ORCID":"http:\/\/orcid.org\/0000-0002-1497-5525","authenticated-orcid":false,"given":"Xianzhi","family":"Yu","sequence":"additional","affiliation":[{"name":"Huawei Noah's Ark Lab, China"}]},{"ORCID":"http:\/\/orcid.org\/0000-0002-6924-2468","authenticated-orcid":false,"given":"Jun","family":"Yao","sequence":"additional","affiliation":[{"name":"Huawei Noah's Ark Lab, China"}]},{"ORCID":"http:\/\/orcid.org\/0000-0002-0794-7681","authenticated-orcid":false,"given":"Guangzhong","family":"Sun","sequence":"additional","affiliation":[{"name":"University of Science and Technology of China, China"}]}],"member":"320","published-online":{"date-parts":[[2023,9,13]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"2022. Basic Linear Algebra on NVIDIA GPUs. https:\/\/docs.nvidia.com\/cuda\/cublas\/index.html."},{"key":"e_1_3_2_1_2_1","unstructured":"2022. A High-Performance CUDA Library for Sparse Matrix-Matrix Multiplication. https:\/\/docs.nvidia.com\/cuda\/cusparse\/index.html."},{"key":"e_1_3_2_1_3_1","unstructured":"2022. The SDK for high-performance deep learning inference. https:\/\/docs.nvidia.com\/deeplearning\/tensorrt\/."},{"key":"e_1_3_2_1_4_1","volume-title":"Dual Lottery Ticket Hypothesis. In The Tenth International Conference on Learning Representations, ICLR 2022","author":"Bai Yue","year":"2022","unstructured":"Yue Bai, Huan Wang, Zhiqiang Tao, Kunpeng Li, and Yun Fu. 2022. Dual Lottery Ticket Hypothesis. In The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022. OpenReview.net."},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2021\/592"},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/1273496.1273513"},{"key":"e_1_3_2_1_7_1","volume-title":"TVM: An Automated End-to-End Optimizing Compiler for Deep Learning. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18)","author":"Chen Tianqi","year":"2018","unstructured":"Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Haichen Shen, Meghan Cowan, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, and Arvind Krishnamurthy. 2018. TVM: An Automated End-to-End Optimizing Compiler for Deep Learning. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18)."},{"key":"e_1_3_2_1_8_1","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019","volume":"1","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers). Association for Computational Linguistics, 4171\u20134186."},{"key":"e_1_3_2_1_9_1","volume-title":"RTMobile: Beyond Real-Time Mobile Acceleration of RNNs for Speech Recognition. In 57th ACM\/IEEE Design Automation Conference, DAC 2020","author":"Dong Peiyan","year":"2020","unstructured":"Peiyan Dong, Siyue Wang, Wei Niu, Chengming Zhang, Sheng Lin, Zhengang Li, Yifan Gong, Bin Ren, Xue Lin, and Dingwen Tao. 2020. RTMobile: Beyond Real-Time Mobile Acceleration of RNNs for Speech Recognition. In 57th ACM\/IEEE Design Automation Conference, DAC 2020, San Francisco, CA, USA, July 20-24, 2020. IEEE, 1\u20136."},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11023-020-09548-1"},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC41405.2020.00021"},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_1_13_1","volume-title":"Structured Pruning for Deep Convolutional Neural Networks: A survey. arXiv preprint arXiv:2303.00566","author":"He Yang","year":"2023","unstructured":"Yang He and Lingao Xiao. 2023. Structured Pruning for Deep Convolutional Neural Networks: A survey. arXiv preprint arXiv:2303.00566 (2023)."},{"key":"e_1_3_2_1_14_1","article-title":"Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks","volume":"22","author":"Hoefler Torsten","year":"2021","unstructured":"Torsten Hoefler, Dan Alistarh, Tal Ben-Nun, Nikoli Dryden, and Alexandra Peste. 2021. Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks. J. Mach. Learn. Res. 22 (2021), 241:1\u2013241:124.","journal-title":"J. Mach. Learn. Res."},{"key":"e_1_3_2_1_15_1","volume-title":"Proceedings of the 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2019","author":"Hong Changwan","year":"2019","unstructured":"Changwan Hong, Aravind Sukumaran-Rajam, Israt Nisa, Kunal Singh, and P. Sadayappan. 2019. Adaptive sparse tiling for sparse matrix multiplication. In Proceedings of the 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2019, Washington, DC, USA, February 16-20, 2019. ACM, 300\u2013314."},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/ASE.2017.8115709"},{"key":"e_1_3_2_1_17_1","volume-title":"Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2020","author":"Kurt S\u00fcreyya\u00a0Emre","year":"2020","unstructured":"S\u00fcreyya\u00a0Emre Kurt, Aravind Sukumaran-Rajam, Fabrice Rastello, and P. Sadayappan. 2020. Efficient tiled sparse matrix multiplication through matrix signatures. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2020, Virtual Event \/ Atlanta, Georgia, USA, November 9-19, 2020. IEEE\/ACM, 87."},{"key":"e_1_3_2_1_18_1","volume-title":"HW-NAS-Bench: Hardware-Aware Neural Architecture Search Benchmark. In 9th International Conference on Learning Representations, ICLR 2021","author":"Li Chaojian","year":"2021","unstructured":"Chaojian Li, Zhongzhi Yu, Yonggan Fu, Yongan Zhang, Yang Zhao, Haoran You, Qixuan Yu, Yue Wang, Cong Hao, and Yingyan Lin. 2021. HW-NAS-Bench: Hardware-Aware Neural Architecture Search Benchmark. In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net."},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2020\/94"},{"key":"e_1_3_2_1_20_1","volume-title":"14th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2020","author":"Ma Lingxiao","year":"2020","unstructured":"Lingxiao Ma, Zhiqiang Xie, Zhi Yang, Jilong Xue, Youshan Miao, Wei Cui, Wenxiang Hu, Fan Yang, Lintao Zhang, and Lidong Zhou. 2020. Rammer: Enabling Holistic Deep Learning Compiler Optimizations with rTasks. In 14th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2020, Virtual Event, November 4-6, 2020. USENIX Association, 881\u2013897."},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01264-9_8"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS51385.2021.00016"},{"key":"e_1_3_2_1_23_1","volume-title":"Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems 32","author":"Paszke Adam","year":"2019","unstructured":"Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. 2019. Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems 32 (2019)."},{"key":"e_1_3_2_1_24_1","volume-title":"Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020","author":"Sanh Victor","year":"2020","unstructured":"Victor Sanh, Thomas Wolf, and Alexander\u00a0M. Rush. 2020. Movement Pruning: Adaptive Sparsity by Fine-Tuning. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, Hugo Larochelle, Marc\u2019Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin (Eds.)."},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2022\/786"},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/3410463.3414654"},{"key":"e_1_3_2_1_27_1","volume-title":"SparseDNN: Fast Sparse Deep Learning Inference on CPUs. CoRR abs\/2101.07948","author":"Wang Ziheng","year":"2021","unstructured":"Ziheng Wang. 2021. SparseDNN: Fast Sparse Deep Learning Inference on CPUs. CoRR abs\/2101.07948 (2021)."},{"key":"e_1_3_2_1_28_1","volume-title":"Fast Sparse Deep Neural Network Inference with Flexible SpMM Optimization Space Exploration. In 2021 IEEE High Performance Extreme Computing Conference, HPEC 2021","author":"Xin Jie","year":"2021","unstructured":"Jie Xin, Xianqi Ye, Long Zheng, Qinggang Wang, Yu Huang, Pengcheng Yao, Linchen Yu, Xiaofei Liao, and Hai Jin. 2021. Fast Sparse Deep Neural Network Inference with Flexible SpMM Optimization Space Exploration. In 2021 IEEE High Performance Extreme Computing Conference, HPEC 2021, Waltham, MA, USA, September 20-24, 2021. IEEE, 1\u20137."},{"key":"e_1_3_2_1_29_1","volume-title":"Ansor: Generating High-Performance Tensor Programs for Deep Learning. In 14th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2020","author":"Zheng Lianmin","year":"2020","unstructured":"Lianmin Zheng, Chengfan Jia, Minmin Sun, Zhao Wu, Cody\u00a0Hao Yu, Ameer Haj-Ali, Yida Wang, Jun Yang, Danyang Zhuo, Koushik Sen, Joseph\u00a0E. Gonzalez, and Ion Stoica. 2020. Ansor: Generating High-Performance Tensor Programs for Deep Learning. In 14th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2020, Virtual Event, November 4-6, 2020. USENIX Association, 863\u2013879."},{"key":"e_1_3_2_1_30_1","volume-title":"16th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2022","author":"Zheng Ningxin","year":"2022","unstructured":"Ningxin Zheng, Bin Lin, Quanlu Zhang, Lingxiao Ma, Yuqing Yang, Fan Yang, Yang Wang, Mao Yang, and Lidong Zhou. 2022. SparTA: Deep-Learning Model Sparsity via Tensor-with-Sparsity-Attribute. In 16th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2022, Carlsbad, CA, USA, July 11-13, 2022. USENIX Association, 213\u2013232."},{"key":"e_1_3_2_1_31_1","volume-title":"ROLLER: Fast and Efficient Tensor Compilation for Deep Learning. In 16th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2022","author":"Zhu Hongyu","year":"2022","unstructured":"Hongyu Zhu, Ruofan Wu, Yijia Diao, Shanbin Ke, Haoyu Li, Chen Zhang, Jilong Xue, Lingxiao Ma, Yuqing Xia, Wei Cui, Fan Yang, Mao Yang, Lidong Zhou, Asaf Cidon, and Gennady Pekhimenko. 2022. ROLLER: Fast and Efficient Tensor Compilation for Deep Learning. In 16th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2022, Carlsbad, CA, USA, July 11-13, 2022. USENIX Association, 233\u2013248."}],"event":{"name":"ICPP 2023: 52nd International Conference on Parallel Processing","location":"Salt Lake City UT USA","acronym":"ICPP 2023"},"container-title":["Proceedings of the 52nd International Conference on Parallel Processing"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3605573.3605632","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,12,1]],"date-time":"2023-12-01T02:10:15Z","timestamp":1701396615000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3605573.3605632"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,8,7]]},"references-count":31,"alternative-id":["10.1145\/3605573.3605632","10.1145\/3605573"],"URL":"http:\/\/dx.doi.org\/10.1145\/3605573.3605632","relation":{},"published":{"date-parts":[[2023,8,7]]},"assertion":[{"value":"2023-09-13","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}