{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,22]],"date-time":"2025-08-22T13:40:13Z","timestamp":1755870013868,"version":"3.44.0"},"publisher-location":"New York, NY, USA","reference-count":30,"publisher":"ACM","funder":[{"name":"Sichuan Natural Science Foundation for Distinguished Young Scholar","award":["2023NSFSC1966"],"award-info":[{"award-number":["2023NSFSC1966"]}]},{"name":"National Natural Science Foundation of China","award":["61672438"],"award-info":[{"award-number":["61672438"]}]},{"name":"Postgraduate Innovation Fund Project by Southwest University of Science and Technology","award":["24ycx1137"],"award-info":[{"award-number":["24ycx1137"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2025,6,8]]},"DOI":"10.1145\/3721145.3725770","type":"proceedings-article","created":{"date-parts":[[2025,8,22]],"date-time":"2025-08-22T12:57:17Z","timestamp":1755867437000},"page":"161-172","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["HR-SpMM: Adaptive Row Partitioning and Hybrid Kernel Design for Sparse Matrix Multiplication"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0009-0003-0204-6678","authenticated-orcid":false,"given":"Qi","family":"Wang","sequence":"first","affiliation":[{"name":"School of Computer Science and Technology, Southwest University of Science and Technology, Mianyang, Sichuan, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-6602-9994","authenticated-orcid":false,"given":"Yaobin","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Southwest University of Science and Technology, Mianyang, Sichuan, 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-2800-6976","authenticated-orcid":false,"given":"Yi","family":"Luo","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Southwest University of Science and Technology, Mianyang, Sichuan, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-2307-7674","authenticated-orcid":false,"given":"Rong","family":"Luo","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Southwest University of Science and Technology, Mianyang, Sichuan, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5888-975X","authenticated-orcid":false,"given":"Pingping","family":"Tang","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Southwest University of Science and Technology, Mianyang, Sichuan, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,8,22]]},"reference":[{"key":"e_1_3_3_1_2_2","volume-title":"Nvidia cuda toolkit","unstructured":"[n. d.]. Nvidia cuda toolkit. https:\/\/docs.nvidia.com\/cuda\/index.html."},{"key":"e_1_3_3_1_3_2","unstructured":"Willow Ahrens and Erik\u00a0G. Boman. 2020. On Optimal Partitioning For Sparse Matrices In Variable Block Row Format. ArXiv abs\/2005.12414 (2020). https:\/\/api.semanticscholar.org\/CorpusID:218889474"},{"key":"e_1_3_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2014.125"},{"key":"e_1_3_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1145\/1654059.1654078"},{"key":"e_1_3_3_1_6_2","volume-title":"Nvidia","author":"api. Nvidia blocked-sparse","unstructured":"Nvidia blocked-sparse api.[n. d.]. Nvidia. 
https:\/\/docs.nvidia.com\/cuda\/cusparse\/index.html."},{"key":"e_1_3_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1145\/1693453.1693471"},{"key":"e_1_3_3_1_8_2","doi-asserted-by":"publisher","unstructured":"Jee\u00a0W. Choi Amik Singh and Richard\u00a0W. Vuduc. 2010. Model-driven autotuning of sparse matrix-vector multiply on GPUs. SIGPLAN Not. 45 5 (Jan. 2010) 115\u2013126. 10.1145\/1837853.1693471","DOI":"10.1145\/1837853.1693471"},{"key":"e_1_3_3_1_9_2","volume-title":"NVIDIA","author":"library. Nvshmem communication","unstructured":"Nvshmem communication library. [n. d.]. NVIDIA. https:\/\/developer.nvidia.com\/nvshmem."},{"key":"e_1_3_3_1_10_2","doi-asserted-by":"publisher","unstructured":"Gunduz Vehbi Demirci Aparajita Haldar and Hakan Ferhatosmanoglu. 2022. Scalable Graph Convolutional Network Training on Distributed-Memory Systems. Proceedings of the VLDB Endowment 16 4 (1 Dec. 2022) 711\u2013724. 10.14778\/3574245.3574256. The 49th International Conference on Very Large Data Bases (VLDB 2023); conference date: 28-08-2023 through 01-09-2023.","DOI":"10.14778\/3574245.3574256"},{"key":"e_1_3_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1145\/3502181.3531467"},{"key":"e_1_3_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/SC41405.2020.00021"},{"key":"e_1_3_3_1_13_2","first-page":"1","volume-title":"IEEE INFOCOM 2023-IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS)","author":"Guo Mingfeng","year":"2023","unstructured":"Mingfeng Guo, Yaobin Wang, Yajun Gu, Yufang Chen, Huan Liu, Huarong Chen, Dongxuan Han, Hengyang Xu, Chunhua Deng, Pingping Tang, et\u00a0al. 2023. Bs-SpMM: Accelerate Sparse Matrix-Matrix Multiplication by Balanced Split Strategy on the GPU. In IEEE INFOCOM 2023-IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS). 
IEEE, 1\u20136."},{"key":"e_1_3_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1145\/3208040.3208062"},{"key":"e_1_3_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1145\/3293883.3295712"},{"key":"e_1_3_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/SC41405.2020.00076"},{"key":"e_1_3_3_1_17_2","doi-asserted-by":"crossref","unstructured":"Zhiyuan Li Xun\u00a0Jian Yue Wang Yingxia Shao and Lei Chen. 2024. DAHA: Accelerating GNN Training with Data and Hardware Aware Execution Planning. PVLDB 17 6 (February 2024) 1364\u20131376. https:\/\/www.vldb.org\/pvldb\/vol17\/p1364-li.pdf","DOI":"10.14778\/3648160.3648176"},{"key":"e_1_3_3_1_18_2","volume-title":"Nvidia","author":"accumulate(wmma). Warp matrix\u00a0multiply","unstructured":"Warp matrix\u00a0multiply accumulate (wmma). [n. d.]. Nvidia. https:\/\/docs.nvidia.com\/cuda\/cuda-c-programming-guide\/index.html."},{"key":"e_1_3_3_1_19_2","volume-title":"GPU Technology Conference","volume":"12","author":"Naumov Maxim","year":"2010","unstructured":"Maxim Naumov, L Chien, Philippe Vandermersch, and Ujval Kapasi. 2010. Cusparse library. In GPU Technology Conference, Vol.\u00a012."},{"key":"e_1_3_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISPA63168.2024.00066"},{"key":"e_1_3_3_1_21_2","unstructured":"Narayanan Sundaram Nadathur Satish Md. Mostofa\u00a0Ali Patwary Subramanya\u00a0R. Dulloor Michael\u00a0J. Anderson Satya\u00a0Gautam Vadlamudi Dipankar Das and Pradeep\u00a0K. Dubey. 2015. GraphMat: High performance graph analytics made productive. ArXiv abs\/1503.07241 (2015). https:\/\/api.semanticscholar.org\/CorpusID:8312489"},{"key":"e_1_3_3_1_22_2","volume-title":"NVIDIA","author":"operations. Improved tensor\u00a0core","unstructured":"Improved tensor\u00a0core operations. [n. d.]. NVIDIA. https:\/\/docs.nvidia.com\/cuda\/ampere-tuning-guide\/index.html."},{"key":"e_1_3_3_1_23_2","unstructured":"Kiran\u00a0Koshy Thekumparampil Chong Wang Sewoong Oh and Li-Jia Li. 2018. 
Attention-based Graph Neural Network for Semi-supervised Learning. ArXiv abs\/1803.03735 (2018). https:\/\/api.semanticscholar.org\/CorpusID:3847272"},{"key":"e_1_3_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2008.5214359"},{"key":"e_1_3_3_1_25_2","doi-asserted-by":"publisher","unstructured":"Aristidis\u00a0G. Vrahatis Konstantinos Lazaros and Sotiris Kotsiantis. 2024. Graph Attention Networks: A Comprehensive Review of Methods and Applications. Future Internet 16 9 (2024). 10.3390\/fi16090318","DOI":"10.3390\/fi16090318"},{"key":"e_1_3_3_1_26_2","unstructured":"Minjie Wang Da Zheng Zihao Ye Quan Gan Mufei Li Xiang Song Jinjing Zhou Chao Ma Lingfan Yu Yu Gai Tianjun Xiao Tong He George Karypis Jinyang Li and Zheng Zhang. 2019. Deep Graph Library: Towards Efficient and Scalable Deep Learning on Graphs. (2019)."},{"key":"e_1_3_3_1_27_2","first-page":"515","volume-title":"15th USENIX Symposium on Operating Systems Design and Implementation (OSDI 21)","author":"Wang Yuke","year":"2021","unstructured":"Yuke Wang, Boyuan Feng, Gushu Li, Shuangchen Li, Lei Deng, Yuan Xie, and Yufei Ding. 2021. GNNAdvisor: An Adaptive and Efficient Runtime System for GNN Acceleration on GPUs. In 15th USENIX Symposium on Operating Systems Design and Implementation (OSDI 21). USENIX Association, 515\u2013531. https:\/\/www.usenix.org\/conference\/osdi21\/presentation\/wang-yuke"},{"key":"e_1_3_3_1_28_2","first-page":"149","volume-title":"2023 USENIX Annual Technical Conference (USENIX ATC 23)","author":"Wang Yuke","year":"2023","unstructured":"Yuke Wang, Boyuan Feng, Zheng Wang, Guyue Huang, and Yufei Ding. 2023. TC-GNN: Bridging Sparse GNN Computation and Dense Tensor Cores on GPUs. In 2023 USENIX Annual Technical Conference (USENIX ATC 23). USENIX Association, Boston, MA, 149\u2013164. https:\/\/www.usenix.org\/conference\/atc23\/presentation\/wang-yuke"},{"key":"e_1_3_3_1_29_2","unstructured":"Keyulu Xu Weihua Hu Jure Leskovec and Stefanie Jegelka. 2018. 
How Powerful are Graph Neural Networks? ArXiv abs\/1810.00826 (2018). https:\/\/api.semanticscholar.org\/CorpusID:52895589"},{"key":"e_1_3_3_1_30_2","doi-asserted-by":"crossref","unstructured":"Carl Yang Ayd\u0131n Bulu\u00e7 and John\u00a0Douglas Owens. 2019. GraphBLAST: A High-Performance Linear Algebra-based Graph Framework on the GPU. ACM Transactions on Mathematical Software (TOMS) 48 (2019) 1 \u2013 51. https:\/\/api.semanticscholar.org\/CorpusID:198167536","DOI":"10.1145\/3466795"},{"key":"e_1_3_3_1_31_2","doi-asserted-by":"crossref","unstructured":"Yi Yang Ping Xiang Jingfei Kong Mike Mantor and Huiyang Zhou. 2012. A unified optimizing compiler framework for different GPGPU architectures. ACM Transactions on Architecture and Code Optimization (TACO) 9 2 (2012) 1\u201333.","DOI":"10.1145\/2207222.2207225"}],"event":{"name":"ICS '25: 2025 International Conference on Supercomputing","location":"Salt Lake City USA","acronym":"ICS '25","sponsor":["SIGARCH ACM Special Interest Group on Computer Architecture"]},"container-title":["Proceedings of the 39th ACM International Conference on Supercomputing"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3721145.3725770","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,22]],"date-time":"2025-08-22T13:02:25Z","timestamp":1755867745000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3721145.3725770"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,8]]},"references-count":30,"alternative-id":["10.1145\/3721145.3725770","10.1145\/3721145"],"URL":"https:\/\/doi.org\/10.1145\/3721145.3725770","relation":{},"subject":[],"published":{"date-parts":[[2025,6,8]]},"assertion":[{"value":"2025-08-22","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}