{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,30]],"date-time":"2025-07-30T13:46:42Z","timestamp":1753883202520,"version":"3.41.2"},"reference-count":60,"publisher":"Association for Computing Machinery (ACM)","issue":"2","funder":[{"name":"National Key Research and Development Program of China","award":["2023YFB3001502"],"award-info":[{"award-number":["2023YFB3001502"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["U23B2020, 62090024, 62232015, 62302479"],"award-info":[{"award-number":["U23B2020, 62090024, 62232015, 62302479"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Innovation Funding of ICT, CAS","award":["E361010, E261110"],"award-info":[{"award-number":["E361010, E261110"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2025,6,30]]},"abstract":"<jats:p>\n            Sparse matrix-vector semiring computation is a key operation in sparse matrix computations, with performance strongly dependent on both program design and the features of the sparse matrices. Given the diversity of sparse matrices, designing a tailored program for each matrix is challenging. To address this, we propose SRSparse,\n            <jats:xref ref-type=\"fn\">\n              <jats:sup>1<\/jats:sup>\n            <\/jats:xref>\n            a program generator that creates tailored programs by automatically combining program designing methods to fit specific input matrices. It provides two components: the\n            <jats:italic toggle=\"yes\">problem definition configuration<\/jats:italic>\n            , which declares the computation, and the\n            <jats:italic toggle=\"yes\">scheduling language<\/jats:italic>\n            , which can be leveraged by an auto-tuner to specify the program designs. The two are lowered to the intermediate representations of SRSparse, the\n            <jats:italic toggle=\"yes\">Format IR<\/jats:italic>\n            and\n            <jats:italic toggle=\"yes\">Kernel IR<\/jats:italic>\n            , which respectively generate format conversion routine and kernel code. We evaluate SRSparse on four representative sparse kernels and three format conversion routines. For sparse kernels, SRSparse achieves median speedups over handwritten programs: COO (3.50\u00d7), CSR-Adaptive (5.36\u00d7), CSR5 (2.06\u00d7), ELL (1.63\u00d7), Gunrock (1.57\u00d7), and GraphBLAST (1.96\u00d7); over an auto-tuner: AlphaSparse (1.16\u00d7); and over a compiler: TACO (1.71\u00d7). For format conversion routines, SRSparse achieves median speedups over handwritten implementations: Intel MKL (7.60\u00d7), SPARSKIT (2.61\u00d7), CUSP (2.77\u00d7), and Ginkgo (1.74\u00d7); and over a compiler: TACO (4.04\u00d7).\n          <\/jats:p>","DOI":"10.1145\/3722114","type":"journal-article","created":{"date-parts":[[2025,3,7]],"date-time":"2025-03-07T11:14:16Z","timestamp":1741346056000},"page":"1-26","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["SRSparse: Generating Codes for High-Performance Sparse Matrix-Vector Semiring Computations"],"prefix":"10.1145","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3590-3616","authenticated-orcid":false,"given":"Zhen","family":"Du","sequence":"first","affiliation":[{"name":"Institute of Computing Technology, Chinese Academy of Sciences","place":["Beijing, China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1599-9665","authenticated-orcid":false,"given":"Ying","family":"Liu","sequence":"additional","affiliation":[{"name":"Institute of Computing Technology, Chinese Academy of Sciences","place":["Beijing, China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1953-1392","authenticated-orcid":false,"given":"Ninghui","family":"Sun","sequence":"additional","affiliation":[{"name":"Institute of Computing Technology, Chinese Academy of Sciences","place":["Beijing, China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2491-7679","authenticated-orcid":false,"given":"Huimin","family":"Cui","sequence":"additional","affiliation":[{"name":"Institute of Computing Technology, Chinese Academy of Sciences","place":["Beijing, China"]},{"name":"School of Computer and Control Engineering, University of Chinese Academy of Sciences","place":["Beijing, China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2909-7750","authenticated-orcid":false,"given":"Xiaobing","family":"Feng","sequence":"additional","affiliation":[{"name":"Institute of Computing Technology, Chinese Academy of Sciences","place":["Beijing, China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1270-4147","authenticated-orcid":false,"given":"Jiajia","family":"Li","sequence":"additional","affiliation":[{"name":"Department of Computer Science, North Carolina State University (NCSU)","place":["Raleigh, United States"]}]}],"member":"320","published-online":{"date-parts":[[2025,7,2]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1145\/3480935"},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2014.69"},{"key":"e_1_3_2_4_2","volume-title":"Efficient Sparse Matrix-vector Multiplication on CUDA","author":"Bell Nathan","year":"2008","unstructured":"Nathan Bell and Michael Garland. 2008. Efficient Sparse Matrix-vector Multiplication on CUDA. Technical Report. Citeseer."},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1145\/3226228"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.5555\/865221"},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2008.4536313"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/HiPC.2015.55"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/99.660313"},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2013.09.005"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1145\/2500365.2500613"},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/SC41404.2022.00071"},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1145\/3017994"},{"key":"e_1_3_2_14_2","article-title":"A systematic literature survey of sparse matrix-vector multiplication","author":"Gao Jianhua","year":"2024","unstructured":"Jianhua Gao, Bingjie Liu, Weixing Ji, and Hua Huang. 2024. A systematic literature survey of sparse matrix-vector multiplication. arXiv preprint arXiv:2404.06047 (2024).","journal-title":"arXiv preprint arXiv:2404.06047"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2014.68"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2006.88"},{"key":"e_1_3_2_17_2","article-title":"Adaptive row-grouped CSR format for storing of sparse matrices on GPU","author":"Heller Martin","year":"2012","unstructured":"Martin Heller and Tom\u00e1\u0161 Oberhuber. 2012. Adaptive row-grouped CSR format for storing of sparse matrices on GPU. arXiv preprint arXiv:1203.5737 (2012).","journal-title":"arXiv preprint arXiv:1203.5737"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1145\/2380776.2380778"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.5555\/2039367"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1029\/2001GL013552"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1145\/3133901"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.21105\/joss.01244"},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1137\/130930352"},{"key":"e_1_3_2_24_2","first-page":"7577","article-title":"A survey on performance modelling and optimization techniques for SpMV on GPUs","volume":"5","author":"Kulkarni Aditi V.","year":"2014","unstructured":"Aditi V. Kulkarni and C. R. Barde. 2014. A survey on performance modelling and optimization techniques for SpMV on GPUs. Int. J. Comput. Sci. Inf. Technol. 5 (2014), 7577\u20137582.","journal-title":"Int. J. Comput. Sci. Inf. Technol."},{"key":"e_1_3_2_25_2","first-page":"31","volume-title":"10th USENIX Symposium on Operating Systems Design and Implementation (OSDI\u201912)","author":"Kyrola Aapo","year":"2012","unstructured":"Aapo Kyrola, Guy Blelloch, and Carlos Guestrin. 2012. GraphChi: Large-Scale graph computation on just a PC. In 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI\u201912). 31\u201346."},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2015.2401575"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2004.1281665"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1038\/nature14539"},{"key":"e_1_3_2_29_2","article-title":"Comparison of SPMV performance on matrices with different matrix format using CUSP, cuSPARSE and ViennaCL","author":"Li Ang","year":"2015","unstructured":"Ang Li, Hammad Mazhar, Radu Serban, and Dan Negrut. 2015. Comparison of SPMV performance on matrices with different matrix format using CUSP, cuSPARSE and ViennaCL. University of Wisconsin, Madison (2015).","journal-title":"University of Wisconsin, Madison"},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1145\/2491956.2462181"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2017.2681072"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1145\/2751205.2751209"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/ASAP.2015.7245713"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICPP.2013.10"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1145\/3016078.2851190"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1145\/3314221.3314646"},{"key":"e_1_3_2_37_2","volume-title":"GPU Technology Conference","author":"Naumov Maxim","year":"2010","unstructured":"Maxim Naumov, L. Chien, Philippe Vandermersch, and Ujval Kapasi. 2010. Cusparse library. In GPU Technology Conference."},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.1002\/wics.13"},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCS.2018.00072"},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1145\/2499370.2462176"},{"key":"e_1_3_2_41_2","first-page":"51","volume-title":"Intl. Workshop on GPUs and Scientific Applications","author":"Rupp Karl","year":"2010","unstructured":"Karl Rupp, Florian Rudolf, and Josef Weinbub. 2010. ViennaCL-a high level linear algebra library for GPUs and multi-core CPUs. In Intl. Workshop on GPUs and Scientific Applications. 51\u201356."},{"key":"e_1_3_2_42_2","article-title":"SPARSKIT: A basic tool kit for sparse computations (Version 2)","author":"Saad Youcef","year":"1994","unstructured":"Youcef Saad. 1994. SPARSKIT: A basic tool kit for sparse computations (Version 2). Computer Science Department, University of Minnesota (1994).","journal-title":"Computer Science Department, University of Minnesota"},{"issue":"333","key":"e_1_3_2_43_2","first-page":"10","article-title":"Hill-climbing search","volume":"81","author":"Selman Bart","year":"2006","unstructured":"Bart Selman and Carla P. Gomes. 2006. Hill-climbing search. Encyclopedia of Cognitive Science 81, 333-335 (2006), 10.","journal-title":"Encyclopedia of Cognitive Science"},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCC.2012.192"},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.1145\/3578360.3580264"},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1145\/3527333"},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-45545-0_23"},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1007\/11596110_7"},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2018.2857721"},{"key":"e_1_3_2_50_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2016.02.004"},{"key":"e_1_3_2_51_2","doi-asserted-by":"publisher","DOI":"10.5555\/374106"},{"key":"e_1_3_2_52_2","doi-asserted-by":"publisher","DOI":"10.1002\/cpe.1658"},{"key":"e_1_3_2_53_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-06486-4_7"},{"key":"e_1_3_2_54_2","doi-asserted-by":"publisher","DOI":"10.1145\/2851141.2851145"},{"key":"e_1_3_2_55_2","doi-asserted-by":"publisher","DOI":"10.1111\/j.1467-8659.2012.03227.x"},{"key":"e_1_3_2_56_2","doi-asserted-by":"publisher","DOI":"10.1145\/3466795"},{"key":"e_1_3_2_57_2","doi-asserted-by":"publisher","DOI":"10.1177\/1094342013501126"},{"key":"e_1_3_2_58_2","doi-asserted-by":"publisher","DOI":"10.1145\/3582016.3582047"},{"key":"e_1_3_2_59_2","doi-asserted-by":"publisher","DOI":"10.1145\/3276491"},{"key":"e_1_3_2_60_2","doi-asserted-by":"publisher","DOI":"10.1145\/3178487.3178495"},{"key":"e_1_3_2_61_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jpdc.2014.03.002"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3722114","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,2]],"date-time":"2025-07-02T12:20:40Z","timestamp":1751458840000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3722114"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,30]]},"references-count":60,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2025,6,30]]}},"alternative-id":["10.1145\/3722114"],"URL":"https:\/\/doi.org\/10.1145\/3722114","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2025,6,30]]},"assertion":[{"value":"2024-09-10","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-02-24","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-07-02","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}