{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,28]],"date-time":"2026-05-28T10:18:34Z","timestamp":1779963514605,"version":"3.53.1"},"reference-count":23,"publisher":"Springer Science and Business Media LLC","issue":"5","license":[{"start":{"date-parts":[[2026,4,13]],"date-time":"2026-04-13T00:00:00Z","timestamp":1776038400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2026,4,13]],"date-time":"2026-04-13T00:00:00Z","timestamp":1776038400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100004233","name":"Universitat Polit\u00e8cnica de Val\u00e8ncia","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100004233","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Computing"],"published-print":{"date-parts":[[2026,5]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>\n                    The sparse matrix\u2013vector multiplication (\n                    <jats:sc>SpMV<\/jats:sc>\n                    ) kernel is a fundamental operation in scientific and engineering applications and forms the core of many iterative solvers for linear systems and eigenvalue problems. Due to its low arithmetic intensity and irregular memory access patterns,\n                    <jats:sc>SpMV<\/jats:sc>\n                    remains memory-bound on modern architectures, making its efficient implementation particularly challenging. This paper presents vectorized\n                    <jats:sc>SpMV<\/jats:sc>\n                    routines for RISC-V processors with SIMD support, exploiting the RISC-V Vector Extension (RVV 1.0). 
We implement and evaluate three storage formats\u2014CSR (Compressed Sparse Row), SELL-\n                    <jats:italic>p<\/jats:italic>\n                    (a vector-friendly variant of ELLPACK), and JDS (Jagged Diagonal Storage)\u2014providing low-level implementations that leverage RVV intrinsics. Performance is assessed on two commercial RISC-V platforms (CanMV-K230 and BananaPi F3) with 128-bit and 256-bit vector registers, and on the EPAC research system featuring 16,384-bit vectors. Results show that the vectorized routines significantly outperform scalar baselines, with speed-ups that depend on the storage format and the architecture. These findings highlight the potential of open RISC-V architectures for high-performance sparse linear algebra and provide a foundation for future vector-aware sparse kernel optimizations.\n                  <\/jats:p>","DOI":"10.1007\/s00607-026-01658-5","type":"journal-article","created":{"date-parts":[[2026,4,13]],"date-time":"2026-04-13T02:54:25Z","timestamp":1776048865000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Sparse matrix\u2013vector product on RISC-V processors with SIMD units"],"prefix":"10.1007","volume":"108","author":[{"given":"Andr\u00e9s 
E.","family":"Tom\u00e1s","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"H\u00e9ctor","family":"Mart\u00ednez","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Sandra","family":"Catal\u00e1n","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Patricia","family":"Siwinska","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Adri\u00e1n","family":"Castell\u00f3","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Marc","family":"Casas","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Enrique S.","family":"Quintana-Ort\u00ed","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2026,4,13]]},"reference":[{"key":"1658_CR1","doi-asserted-by":"publisher","DOI":"10.1137\/1.9781611971538","volume-title":"Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods","author":"R Barrett","year":"1994","unstructured":"Barrett R, Berry M, Chan T, Demmel J, Donato J, Dongarra J, Eijkhout V, Pozo R, Romine C, Vorst H (1994) Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods. SIAM, Philadelphia, PA"},{"issue":"170","key":"1658_CR2","doi-asserted-by":"publisher","first-page":"417","DOI":"10.1090\/S0025-5718-1985-0777273-9","volume":"44","author":"Y Saad","year":"1985","unstructured":"Saad Y, Schultz MH (1985) Conjugate Gradient-like algorithms for solving nonsymmetric linear systems. 
Math Comput 44(170):417\u2013434","journal-title":"Math Comput"},{"key":"1658_CR3","doi-asserted-by":"publisher","DOI":"10.1515\/9781400830329","volume-title":"Google\u2019s PageRank and Beyond: The Science of Search Engine Rankings","author":"AN Langville","year":"2006","unstructured":"Langville AN, Meyer CD (2006) Google\u2019s PageRank and Beyond: The Science of Search Engine Rankings. Princeton University Press, Princeton, NJ"},{"key":"1658_CR4","doi-asserted-by":"crossref","unstructured":"Bell N, Garland M (2009) Implementing sparse matrix-vector multiplication on throughput-oriented processors. In: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis. SC \u201909, pp. 18:1\u201318:11. ACM, New York, NY, USA","DOI":"10.1145\/1654059.1654078"},{"key":"1658_CR5","doi-asserted-by":"publisher","unstructured":"Bulu\u00e7 A, Williams S, Oliker L, Demmel J (2011) Reduced-bandwidth multithreaded algorithms for sparse matrix-vector multiplication. In: Proceedings of the IEEE International Parallel & Distributed Processing Symposium (IPDPS), pp. 721\u2013733. IEEE, Anchorage, AK, USA. https:\/\/doi.org\/10.1109\/IPDPS.2011.73","DOI":"10.1109\/IPDPS.2011.73"},{"issue":"4","key":"1658_CR6","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3017994","volume":"43","author":"S Filippone","year":"2017","unstructured":"Filippone S, Cardellini V, Barbieri D, Fanfarillo A (2017) Sparse matrix-vector multiplication on GPGPUs. ACM Transactions on Mathematical Software 43(4):1\u201349. https:\/\/doi.org\/10.1145\/3017994","journal-title":"ACM Transactions on Mathematical Software"},{"key":"1658_CR7","doi-asserted-by":"publisher","unstructured":"Choi J, Singh A, Vuduc RW (2010) Model-driven autotuning of sparse matrix-vector multiply on GPUs. In: Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), pp. 115\u2013126. ACM, Bangalore, India. 
https:\/\/doi.org\/10.1145\/1693453.1693471","DOI":"10.1145\/1693453.1693471"},{"key":"1658_CR8","unstructured":"Grossman M, Thiele C, Araya-Polo M, Frank F, Alpak FO, Sarkar V (2016) A survey of sparse matrix-vector multiplication performance on large matrices. CoRR abs\/1608.00636"},{"key":"1658_CR9","doi-asserted-by":"publisher","unstructured":"Liu W, Vinter B (2015) CSR5: An efficient storage format for cross-platform sparse matrix-vector multiplication. In: Proceedings of the 29th ACM International Conference on Supercomputing (ICS), pp. 339\u2013350. ACM, Sorrento, Italy. https:\/\/doi.org\/10.1145\/2751205.2751240","DOI":"10.1145\/2751205.2751240"},{"key":"1658_CR10","doi-asserted-by":"publisher","unstructured":"Gao J, Liu B, Ji W, Huang H (2024) A systematic literature survey of sparse matrix-vector multiplication. CoRR https:\/\/doi.org\/10.48550\/arXiv.2404.06047, arXiv:2404.06047 [cs.DC]","DOI":"10.48550\/arXiv.2404.06047"},{"key":"1658_CR11","doi-asserted-by":"publisher","unstructured":"Flegar G, Quintana-Ort\u00ed ES (2016) Balanced CSR sparse matrix-vector product on graphics processors. In: High-Performance Computing \u2013 ISC 2016 Workshops. Lecture Notes in Computer Science, pp. 697\u2013709. Springer, Cham, Switzerland. https:\/\/doi.org\/10.1007\/978-3-319-64203-1_50","DOI":"10.1007\/978-3-319-64203-1_50"},{"issue":"5\u20136","key":"1658_CR12","doi-asserted-by":"publisher","first-page":"284","DOI":"10.1016\/j.parco.2011.03.004","volume":"37","author":"K Bergmans","year":"2011","unstructured":"Bergmans K, Meerbergen K, Vandebril R (2011) Algorithms for parallel shared-memory sparse matrix-vector multiplication on unstructured matrices. Parallel Comput 37(5\u20136):284\u2013299. 
https:\/\/doi.org\/10.1016\/j.parco.2011.03.004","journal-title":"Parallel Comput"},{"issue":"2","key":"1658_CR13","doi-asserted-by":"publisher","first-page":"35","DOI":"10.1145\/2907071","volume":"49","author":"S Mittal","year":"2016","unstructured":"Mittal S (2016) A survey of recent prefetching techniques for processor caches. ACM Comput Surv 49(2):35:1\u201335:35. https:\/\/doi.org\/10.1145\/2907071","journal-title":"ACM Comput Surv"},{"issue":"5","key":"1658_CR14","doi-asserted-by":"publisher","first-page":"609","DOI":"10.1109\/12.384268","volume":"44","author":"T-YF Chen","year":"1995","unstructured":"Chen T-YF, Baer J-L (1995) Effective hardware-based data prefetching for high-performance processors. IEEE Trans Comput 44(5):609\u2013623. https:\/\/doi.org\/10.1109\/12.384268","journal-title":"IEEE Trans Comput"},{"issue":"5","key":"1658_CR15","doi-asserted-by":"publisher","first-page":"408","DOI":"10.1137\/130930352","volume":"36","author":"M Kreutzer","year":"2014","unstructured":"Kreutzer M, Hager G, Wellein G, Fehske H, Bishop AR (2014) A unified sparse matrix data format for efficient general sparse matrix-vector multiplication on modern processors with wide SIMD units. SIAM J Sci Comput 36(5):408\u2013432. https:\/\/doi.org\/10.1137\/130930352","journal-title":"SIAM J Sci Comput"},{"key":"1658_CR16","unstructured":"Anzt H, Tomov S, Dongarra J (2014) Implementing a sparse matrix vector product for the SELL-C\/SELL-C-\u03c3 formats on NVIDIA GPUs. Technical Report UT-EECS-14-727, University of Tennessee Computer Science Technical Report. https:\/\/smartech.gatech.edu\/handle\/1853\/47468"},{"key":"1658_CR17","unstructured":"Saad Y. Jagged diagonal storage (JDS) format. In: Bai, Z., Demmel, J., Dongarra, J., Ruhe, A., Vorst, H. (eds.) Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA (2000). Chap. 2.8.3.1.5. 
See section on Jagged Diagonal Storage for vector and parallel architectures"},{"key":"1658_CR18","doi-asserted-by":"publisher","unstructured":"Bian H, Huang J, Dong R, Liu L, Wang X (2020) CSR2: A new format for SIMD-accelerated SpMV. In: Proceedings of the 20th IEEE\/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID), pp. 350\u2013359. IEEE\/ACM, Melbourne, Australia. https:\/\/doi.org\/10.1109\/CCGrid49817.2020.00-58","DOI":"10.1109\/CCGrid49817.2020.00-58"},{"key":"1658_CR19","doi-asserted-by":"publisher","first-page":"126","DOI":"10.1016\/j.jpdc.2021.08.002","volume":"158","author":"Y Zhang","year":"2021","unstructured":"Zhang Y, Yang W, Tang D, Li K, Li K et al (2021) Performance analysis and optimization for SpMV based on aligned storage formats on an ARM processor. Journal of Parallel and Distributed Computing 158:126\u2013137. https:\/\/doi.org\/10.1016\/j.jpdc.2021.08.002","journal-title":"Journal of Parallel and Distributed Computing"},{"key":"1658_CR20","doi-asserted-by":"publisher","unstructured":"G\u00f3mez C, Mantovani F, Focht E, Casas M (2021) Efficiently running SpMV on long vector architectures. In: Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. PPoPP \u201921, pp. 292\u2013303. Association for Computing Machinery, New York, NY, USA. https:\/\/doi.org\/10.1145\/3437801.3441592","DOI":"10.1145\/3437801.3441592"},{"issue":"1","key":"1658_CR21","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/2049662.2049663","volume":"38","author":"TA Davis","year":"2011","unstructured":"Davis TA, Hu Y (2011) The University of Florida sparse matrix collection. ACM Transactions on Mathematical Software (TOMS) 38(1):1\u201325. 
https:\/\/doi.org\/10.1145\/2049662.2049663","journal-title":"ACM Transactions on Mathematical Software (TOMS)"},{"issue":"4","key":"1658_CR22","doi-asserted-by":"publisher","first-page":"65","DOI":"10.1145\/1498765.1498785","volume":"52","author":"S Williams","year":"2009","unstructured":"Williams S, Waterman A, Patterson D (2009) Roofline: An insightful visual performance model for multicore architectures. Commun ACM 52(4):65\u201376. https:\/\/doi.org\/10.1145\/1498765.1498785","journal-title":"Commun ACM"},{"issue":"1","key":"1658_CR23","doi-asserted-by":"publisher","first-page":"21","DOI":"10.1109\/L-CA.2013.6","volume":"13","author":"A Ilic","year":"2014","unstructured":"Ilic A, Pratas F, Sousa L (2014) Cache-aware roofline model: Upgrading the loft. IEEE Comput Archit Lett 13(1):21\u201324. https:\/\/doi.org\/10.1109\/L-CA.2013.6","journal-title":"IEEE Comput Archit Lett"}],"container-title":["Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00607-026-01658-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s00607-026-01658-5","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00607-026-01658-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,5,28]],"date-time":"2026-05-28T09:58:40Z","timestamp":1779962320000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s00607-026-01658-5"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,4,13]]},"references-count":23,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2026,5]]}},"alternative-id":["1658"],"URL":"https:\/\/doi.org\/10.1007\/s00607-026-01658-5","relation":{},"ISSN":["0010-485X","1436-5057"],"
issn-type":[{"value":"0010-485X","type":"print"},{"value":"1436-5057","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,4,13]]},"assertion":[{"value":"18 November 2025","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"31 March 2026","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"13 April 2026","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}],"article-number":"66"}}