{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,31]],"date-time":"2025-12-31T12:12:32Z","timestamp":1767183152706,"version":"3.41.0"},"reference-count":44,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2025,3,20]],"date-time":"2025-03-20T00:00:00Z","timestamp":1742428800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-sa\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62072204"],"award-info":[{"award-number":["62072204"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"National Key Research and Development Program of China","award":["2021YFB1714600"],"award-info":[{"award-number":["2021YFB1714600"]}]},{"name":"National Supercomputing Center in Zhengzhou"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2025,3,31]]},"abstract":"<jats:p>\n            The\n            <jats:italic>Sparse General Matrix-Matrix multiplication<\/jats:italic>\n            (SpGEMM) is a fundamental component for many applications, such as\n            <jats:italic>algebraic multigrid methods<\/jats:italic>\n            (AMG), graphic processing, and deep learning. However, the unbearable latency of computing high-dimensional, large-scale sparse matrix multiplication on GPUs hinders the development of these applications. 
An effective approach is collaborative computing on heterogeneous cores, but this method must address three issues: (1) irregularly distributed non-zero elements lead to load imbalance and irregular memory access, (2) differences in computing latency between cores reduce computational parallelism, and (3) temporary data transfer between different cores introduces additional latency overhead. In this work, we propose ApSpGEMM, a framework for collaborative large-scale sparse matrix multiplication on CPU-GPU heterogeneous cores. ApSpGEMM is based on sparsity rules and uses reordering and splitting algorithms to eliminate the impact of the non-zero element distribution on load and memory access. Adaptive panel allocation with affinity constraints among cores then improves computational parallelism. Finally, carefully arranged asynchronous data transmission and computation balance the communication overhead. Compared with state-of-the-art SpGEMM methods, our approach provides excellent absolute performance on matrices with different sparse structures. 
On heterogeneous cores, the GFlops of large-scale sparse matrix multiplication is improved by 2.25 to 7.21 times.\n          <\/jats:p>","DOI":"10.1145\/3703352","type":"journal-article","created":{"date-parts":[[2024,11,6]],"date-time":"2024-11-06T11:01:42Z","timestamp":1730890902000},"page":"1-23","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["ApSpGEMM: Accelerating Large-scale SpGEMM with Heterogeneous Collaboration and Adaptive Panel"],"prefix":"10.1145","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0336-0522","authenticated-orcid":false,"given":"Dezhong","family":"Yao","sequence":"first","affiliation":[{"name":"School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-7735-3187","authenticated-orcid":false,"given":"Sifan","family":"Zhao","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-1651-7784","authenticated-orcid":false,"given":"Tongtong","family":"Liu","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6615-0699","authenticated-orcid":false,"given":"Gang","family":"Wu","sequence":"additional","affiliation":[{"name":"National Super Computing Center in Zhengzhou, Zhengzhou University, Zhengzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3934-7605","authenticated-orcid":false,"given":"Hai","family":"Jin","sequence":"additional","affiliation":[{"name":"School of computer science and 
technology, Huazhong University of Science and Technology, Wuhan, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,3,20]]},"reference":[{"issue":"5","key":"e_1_3_2_2_2","first-page":"C568\u2013C590","article-title":"Simultaneous input and output matrix partitioning for outer-product-parallel sparse matrix-matrix multiplication","volume":"36","author":"Akbudak Kadir","year":"2014","unstructured":"Kadir Akbudak and Cevdet Aykanat. 2014. Simultaneous input and output matrix partitioning for outer-product-parallel sparse matrix-matrix multiplication. SIAM Journal on Scientific Computing 36, 5 (2014), C568\u2013C590.","journal-title":"SIAM Journal on Scientific Computing"},{"issue":"3","key":"e_1_3_2_3_2","first-page":"13:1\u201313:34","article-title":"Partitioning models for scaling parallel sparse matrix-matrix multiplication","volume":"4","author":"Akbudak Kadir","year":"2018","unstructured":"Kadir Akbudak, Oguz Selvitopi, and Cevdet Aykanat. 2018. Partitioning models for scaling parallel sparse matrix-matrix multiplication. ACM Transactions on Parallel Computing 4, 3 (2018), 13:1\u201313:34.","journal-title":"ACM Transactions on Parallel Computing"},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2022.3223512"},{"issue":"2","key":"e_1_3_2_5_2","doi-asserted-by":"crossref","first-page":"1012","DOI":"10.11591\/ijeecs.v28.i2.pp1012-1019","article-title":"Matrix-matrix multiplication on graphics processing unit platform using tiling technique","volume":"28","author":"Balagafshe Rahman Ghasempour","year":"2022","unstructured":"Rahman Ghasempour Balagafshe, Alireza Akoushideh, and Asadollah Shahbahrami. 2022. Matrix-matrix multiplication on graphics processing unit platform using tiling technique. 
IAES Indonesian Journal of Electrical Engineering and Computer Science 28, 2 (2022), 1012\u20131019.","journal-title":"IAES Indonesian Journal of Electrical Engineering and Computer Science"},{"key":"e_1_3_2_6_2","first-page":"222","volume-title":"Proc. of the 2013 Symposium on Parallelism in Algorithms and Architectures (SPAA\u201913)","author":"Ballard Grey","year":"2013","unstructured":"Grey Ballard, Aydin Bulu\u00e7, James Demmel, Laura Grigori, Benjamin Lipshitz, Oded Schwartz, and Sivan Toledo. 2013. Communication optimal parallel multiplication of sparse random matrices. In Proc. of the 2013 Symposium on Parallelism in Algorithms and Architectures (SPAA\u201913). 222\u2013231."},{"issue":"3","key":"e_1_3_2_7_2","first-page":"18:1\u201318:34","article-title":"Hypergraph partitioning for sparse matrix-matrix multiplication","volume":"3","author":"Ballard Grey","year":"2016","unstructured":"Grey Ballard, Alex Druinsky, Nicholas Knight, and Oded Schwartz. 2016. Hypergraph partitioning for sparse matrix-matrix multiplication. ACM Transactions on Parallel Computing 3, 3 (2016), 18:1\u201318:34.","journal-title":"ACM Transactions on Parallel Computing"},{"issue":"4","key":"e_1_3_2_8_2","first-page":"C123\u2013C152","article-title":"Exposing fine-grained parallelism in algebraic multigrid methods","volume":"34","author":"Bell Nathan","year":"2012","unstructured":"Nathan Bell, Steven Dalton, and Luke N. Olson. 2012. Exposing fine-grained parallelism in algebraic multigrid methods. SIAM Journal on Scientific Computing 34, 4 (2012), C123\u2013C152.","journal-title":"SIAM Journal on Scientific Computing"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1177\/1094342019886628"},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41586-022-05172-4"},{"key":"e_1_3_2_11_2","first-page":"101","volume-title":"Proc. 
of the 2023 International Symposium on High-Performance Parallel and Distributed Computing (HPDC\u201923)","author":"F\u00e8vre Valentin Le","year":"2023","unstructured":"Valentin Le F\u00e8vre and Marc Casas. 2023. Efficient execution of SpGEMM on long vector architectures. In Proc. of the 2023 International Symposium on High-Performance Parallel and Distributed Computing (HPDC\u201923). 101\u2013113."},{"issue":"1","key":"e_1_3_2_12_2","first-page":"C54\u2013C71","article-title":"GPU-accelerated sparse matrix-matrix multiplication by iterative row merging","volume":"37","author":"Gremse Felix","year":"2015","unstructured":"Felix Gremse, Andreas H\u00f6fter, Lars Ole Schwen, Fabian Kiessling, and Uwe Naumann. 2015. GPU-accelerated sparse matrix-matrix multiplication by iterative row merging. SIAM Journal on Scientific Computing 37, 1 (2015), C54\u2013C71.","journal-title":"SIAM Journal on Scientific Computing"},{"issue":"4","key":"e_1_3_2_13_2","first-page":"C429\u2013C449","article-title":"Memory-efficient sparse matrix-matrix multiplication by row merging on many-core architectures","volume":"40","author":"Gremse Felix","year":"2018","unstructured":"Felix Gremse, Kerstin K\u00fcpper, and Uwe Naumann. 2018. Memory-efficient sparse matrix-matrix multiplication by row merging on many-core architectures. SIAM Journal on Scientific Computing 40, 4 (2018), C429\u2013C449.","journal-title":"SIAM Journal on Scientific Computing"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1145\/355791.355796"},{"key":"e_1_3_2_15_2","first-page":"300","volume-title":"Proc. of the 2019 SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP\u201919)","author":"Hong Changwan","year":"2019","unstructured":"Changwan Hong, Aravind Sukumaran-Rajam, Israt Nisa, Kunal Singh, and P. Sadayappan. 2019. Adaptive sparse tiling for sparse matrix multiplication. In Proc. 
of the 2019 SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP\u201919). 300\u2013314."},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCOMM.2023.3236385"},{"key":"e_1_3_2_17_2","doi-asserted-by":"crossref","first-page":"599","DOI":"10.1007\/978-3-030-58601-0_36","volume-title":"Proc. of the Computer Vision 2020 European Conference (ECCV\u201920)","author":"Ishimtsev Vladislav","year":"2020","unstructured":"Vladislav Ishimtsev, Alexey Bokhovkin, Alexey Artemov, Savva Ignatyev, Matthias Nie\u00dfner, Denis Zorin, and Evgeny Burnaev. 2020. CAD-Deform: Deformable fitting of CAD models to 3D scans. In Proc. of the Computer Vision 2020 European Conference (ECCV\u201920). 599\u2013628."},{"issue":"12","key":"e_1_3_2_18_2","first-page":"2607","article-title":"HPMaX: Heterogeneous parallel matrix multiplication using CPUs and GPUs","volume":"102","author":"Kang Homin","year":"2020","unstructured":"Homin Kang, Hyuck-Chan Kwon, and Duksu Kim. 2020. HPMaX: Heterogeneous parallel matrix multiplication using CPUs and GPUs. Springer Computing 102, 12 (2020), 2607\u20132631.","journal-title":"Springer Computing"},{"key":"e_1_3_2_19_2","first-page":"1","volume-title":"Proc. of the 2016 High Performance Extreme Computing Conference (HPEC\u201916)","author":"Kepner Jeremy","year":"2016","unstructured":"Jeremy Kepner, Peter Aaltonen, David A. Bader, Aydin Bulu\u00e7, Franz Franchetti, John R. Gilbert, Dylan Hutchison, Manoj Kumar, Andrew Lumsdaine, Henning Meyerhenke, Scott McMillan, Carl Yang, John D. Owens, Marcin Zalewski, Timothy G. Mattson, and Jos\u00e9 E. Moreira. 2016. Mathematical foundations of the GraphBLAS. In Proc. of the 2016 High Performance Extreme Computing Conference (HPEC\u201916). 1\u20139."},{"key":"e_1_3_2_20_2","doi-asserted-by":"crossref","first-page":"204","DOI":"10.1109\/FCCM57271.2023.00032","volume-title":"Proc. 
of the 2023 Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM\u201923)","author":"Kono Fumiya","year":"2023","unstructured":"Fumiya Kono, Naohito Nakasato, and Maho Nakata. 2023. Accelerating 128-bit floating-point matrix multiplication on FPGAs. In Proc. of the 2023 Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM\u201923). 204."},{"key":"e_1_3_2_21_2","first-page":"925","volume-title":"Proc. of the 2020 International Conference on Data Engineering (ICDE\u201920)","author":"Lee Jeongmyung","year":"2020","unstructured":"Jeongmyung Lee, Seokwon Kang, Yongseung Yu, Yong-Yeon Jo, Sang-Wook Kim, and Yongjun Park. 2020. Optimization of GPU-based sparse matrix multiplication for large sparse networks. In Proc. of the 2020 International Conference on Data Engineering (ICDE\u201920). 925\u2013936."},{"issue":"2","key":"e_1_3_2_22_2","doi-asserted-by":"crossref","first-page":"122402","DOI":"10.1007\/s11432-021-3374-x","article-title":"A memristive neural network based matrix equation solver with high versatility and high energy efficiency","volume":"66","author":"Li Jiancong","year":"2023","unstructured":"Jiancong Li, Houji Zhou, Yi Li, and Xiangshui Miao. 2023. A memristive neural network based matrix equation solver with high versatility and high energy efficiency. 
Science China Information Sciences 66, 2 (2023), 122402.","journal-title":"Science China Information Sciences"},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2023.3248282"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2014.47"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jpdc.2015.06.010"},{"issue":"1","key":"e_1_3_2_26_2","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1007\/s10723-023-09646-1","article-title":"An intelligent framework for oversubscription management in CPU-GPU unified memory","volume":"21","author":"Long Xinjian","year":"2023","unstructured":"Xinjian Long, Xiangyang Gong, Bo Zhang, and Huiyang Zhou. 2023. An intelligent framework for oversubscription management in CPU-GPU unified memory. Springer Journal of Grid Computing 21, 1 (2023), 11.","journal-title":"Springer Journal of Grid Computing"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1007\/s42514-023-00151-1"},{"key":"e_1_3_2_28_2","doi-asserted-by":"crossref","first-page":"102545","DOI":"10.1016\/j.parco.2019.102545","article-title":"Performance optimization, modeling and analysis of sparse matrix-matrix products on multi-core and many-core processors","volume":"90","author":"Nagasaka Yusuke","year":"2019","unstructured":"Yusuke Nagasaka, Satoshi Matsuoka, Ariful Azad, and Ayd\u0131n Bulu\u00e7. 2019. Performance optimization, modeling and analysis of sparse matrix-matrix products on multi-core and many-core processors. Parallel Comput. 90, C (2019), 102545.","journal-title":"Parallel Comput."},{"key":"e_1_3_2_29_2","first-page":"101","volume-title":"Proc. of the 2017 International Conference on Parallel Processing (ICPP\u201917)","author":"Nagasaka Yusuke","year":"2017","unstructured":"Yusuke Nagasaka, Akira Nukada, and Satoshi Matsuoka. 2017. High-performance and memory-saving sparse general matrix-matrix multiplication for NVIDIA pascal GPU. In Proc. 
of the 2017 International Conference on Parallel Processing (ICPP\u201917). 101\u2013110."},{"key":"e_1_3_2_30_2","first-page":"90","volume-title":"Proc. of the 2022 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP\u201922)","author":"Niu Yuyao","year":"2022","unstructured":"Yuyao Niu, Zhengyang Lu, Haonan Ji, Shuhui Song, Zhou Jin, and Weifeng Liu. 2022. TileSpGEMM: A tiled algorithm for parallel sparse general matrix-matrix multiplication on GPUs. In Proc. of the 2022 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP\u201922). 90\u2013106."},{"key":"e_1_3_2_31_2","first-page":"724","volume-title":"Proc. of the 2018 International Symposium on High Performance Computer Architecture (HPCA\u201918)","author":"Pal Subhankar","year":"2018","unstructured":"Subhankar Pal, Jonathan Beaumont, Dong-Hyeon Park, Aporva Amarnath, Siying Feng, Chaitali Chakrabarti, Hun-Seok Kim, David T. Blaauw, Trevor N. Mudge, and Ronald G. Dreslinski. 2018. OuterSPACE: An outer product based sparse matrix multiplication accelerator. In Proc. of the 2018 International Symposium on High Performance Computer Architecture (HPCA\u201918). 724\u2013736."},{"key":"e_1_3_2_32_2","first-page":"362","volume-title":"Proc. of the 2020 SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP\u201920)","author":"Parger Mathias","year":"2020","unstructured":"Mathias Parger, Martin Winter, Daniel Mlakar, and Markus Steinberger. 2020. spECK: Accelerating GPU sparse matrix-matrix multiplication through lightweight analysis. In Proc. of the 2020 SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP\u201920). 362\u2013375."},{"key":"e_1_3_2_33_2","first-page":"110","volume-title":"Proc. of the 2021 International Conference on High Performance Computing in Asia-Pacific Region (HPC-Asia\u201921)","author":"Rasouli Majid","year":"2021","unstructured":"Majid Rasouli, Robert M. Kirby, and Hari Sundar. 2021. 
A compressed, divide and conquer algorithm for scalable distributed matrix-matrix multiplication. In Proc. of the 2021 International Conference on High Performance Computing in Asia-Pacific Region (HPC-Asia\u201921). 110\u2013119."},{"key":"e_1_3_2_34_2","doi-asserted-by":"crossref","first-page":"70","DOI":"10.1016\/j.jpdc.2021.02.013","article-title":"TSM2X: High-performance tall-and-skinny matrix-matrix multiplication on GPUs","volume":"151","author":"Rivera Cody","year":"2021","unstructured":"Cody Rivera, Jieyang Chen, Nan Xiong, Jing Zhang, Shuaiwen Leon Song, and Dingwen Tao. 2021. TSM2X: High-performance tall-and-skinny matrix-matrix multiplication on GPUs. Journal of Parallel and Distributed Computing 151, C (2021), 70\u201385.","journal-title":"Journal of Parallel and Distributed Computing"},{"key":"e_1_3_2_35_2","doi-asserted-by":"crossref","first-page":"247","DOI":"10.1109\/HPCA56546.2023.10070977","volume-title":"Proc. of the 2023 International Symposium on High-Performance Computer Architecture (HPCA\u201923)","author":"Shabani Hesam","year":"2023","unstructured":"Hesam Shabani, Abhishek Singh, Bishoy Youhana, and Xiaochen Guo. 2023. HIRAC: A hierarchical accelerator with sorting-based packing for SpGEMMs in DNN applications. In Proc. of the 2023 International Symposium on High-Performance Computer Architecture (HPCA\u201923). 247\u2013258."},{"key":"e_1_3_2_36_2","first-page":"1","volume-title":"Proc. of the 2023 Interregional NEWCAS Conference (NEWCAS\u201923)","author":"Shin Banseok","year":"2023","unstructured":"Banseok Shin, Sehun Park, and Jaeha Kung. 2023. Improving hardware efficiency of a sparse training accelerator by restructuring a reduction network. In Proc. of the 2023 Interregional NEWCAS Conference (NEWCAS\u201923). 1\u20135."},{"key":"e_1_3_2_37_2","doi-asserted-by":"crossref","first-page":"221","DOI":"10.1109\/CLUSTER.2014.6968748","volume-title":"Proc. 
of the 2014 International Conference on Cluster Computing (CLUSTER\u201914)","author":"Shirahata Koichi","year":"2014","unstructured":"Koichi Shirahata, Hitoshi Sato, and Satoshi Matsuoka. 2014. Out-of-core GPU memory management for MapReduce-based large-scale graph processing. In Proc. of the 2014 International Conference on Cluster Computing (CLUSTER\u201914). 221\u2013229."},{"key":"e_1_3_2_38_2","doi-asserted-by":"crossref","first-page":"65","DOI":"10.1145\/3490422.3502357","volume-title":"Proc. of the 2022 International Symposium on Field-Programmable Gate Arrays (FPGA\u201922)","author":"Song Linghao","year":"2022","unstructured":"Linghao Song, Yuze Chi, Atefeh Sohrabizadeh, Young-kyu Choi, Jason Lau, and Jason Cong. 2022. Sextans: A streaming accelerator for general-purpose sparse-matrix dense-matrix multiplication. In Proc. of the 2022 International Symposium on Field-Programmable Gate Arrays (FPGA\u201922). 65\u201377."},{"key":"e_1_3_2_39_2","doi-asserted-by":"crossref","first-page":"429","DOI":"10.1109\/CLUSTER51413.2022.00053","volume-title":"Proc. of the 2022 International Conference on Cluster Computing (CLUSTER\u201922)","author":"Grinten Alexander van der","year":"2022","unstructured":"Alexander van der Grinten, Geert Custers, Duy Le Thanh, and Henning Meyerhenke. 2022. Fast dynamic updates and dynamic SpGEMM on MPI-Distributed graphs. In Proc. of the 2022 International Conference on Cluster Computing (CLUSTER\u201922). 429\u2013439."},{"key":"e_1_3_2_40_2","first-page":"1083","volume-title":"Proc. of the 2021 ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA\u201921)","author":"Wang Yang","year":"2021","unstructured":"Yang Wang, Chen Zhang, Zhiqiang Xie, Cong Guo, Yunxin Liu, and Jingwen Leng. 2021. Dual-side sparse tensor core. In Proc. of the 2021 ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA\u201921). 1083\u20131095."},{"key":"e_1_3_2_41_2","first-page":"68","volume-title":"Proc. 
of the 2019 SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP\u201919)","author":"Winter Martin","year":"2019","unstructured":"Martin Winter, Daniel Mlakar, Rhaleb Zayer, Hans-Peter Seidel, and Markus Steinberger. 2019. Adaptive sparse matrix-matrix multiplication on the GPU. In Proc. of the 2019 SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP\u201919). 68\u201381."},{"key":"e_1_3_2_42_2","first-page":"392","volume-title":"Proc. of the 2021 International Parallel and Distributed Processing Symposium (IPDPS\u201921)","author":"Xia Yang","year":"2021","unstructured":"Yang Xia, Peng Jiang, Gagan Agrawal, and Rajiv Ramnath. 2021. Scaling sparse matrix multiplication on CPU-GPU nodes. In Proc. of the 2021 International Parallel and Distributed Processing Symposium (IPDPS\u201921). 392\u2013401."},{"key":"e_1_3_2_43_2","first-page":"329","volume-title":"Proc. of the 2023 SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP\u201923)","author":"Yesil Serif","year":"2023","unstructured":"Serif Yesil, Azin Heidarshenas, Adam Morrison, and Josep Torrellas. 2023. WISE: Predicting the performance of sparse matrix vector multiplication with machine learning. In Proc. of the 2023 SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP\u201923). 329\u2013341."},{"issue":"4","key":"e_1_3_2_44_2","first-page":"49:1\u201349:24","article-title":"ASA: Accelerating sparse accumulation in column-wise SpGEMM","volume":"19","author":"Zhang Chao","year":"2022","unstructured":"Chao Zhang, Maximilian H. Bremer, Cy P. Chan, John Shalf, and Xiaochen Guo. 2022. ASA: Accelerating sparse accumulation in column-wise SpGEMM. ACM Transactions on Architecture and Code Optimization 19, 4 (2022), 49:1\u201349:24.","journal-title":"ACM Transactions on Architecture and Code Optimization"},{"key":"e_1_3_2_45_2","first-page":"379","volume-title":"Proc. 
of the 2023 International Parallel and Distributed Processing Symposium (IPDPS\u201923)","author":"Zhang Yichen","year":"2023","unstructured":"Yichen Zhang, Shengguo Li, Fan Yuan, Dezun Dong, Xiaojian Yang, Tiejun Li, and Zheng Wang. 2023. Memory-aware optimization for sequences of sparse matrix-vector multiplications. In Proc. of the 2023 International Parallel and Distributed Processing Symposium (IPDPS\u201923). 379\u2013389."}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3703352","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3703352","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:19:03Z","timestamp":1750295943000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3703352"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,3,20]]},"references-count":44,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2025,3,31]]}},"alternative-id":["10.1145\/3703352"],"URL":"https:\/\/doi.org\/10.1145\/3703352","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2025,3,20]]},"assertion":[{"value":"2024-03-03","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-10-17","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-03-20","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}