{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,27]],"date-time":"2026-03-27T07:06:08Z","timestamp":1774595168088,"version":"3.50.1"},"reference-count":28,"publisher":"SAGE Publications","issue":"1","license":[{"start":{"date-parts":[[2004,2,1]],"date-time":"2004-02-01T00:00:00Z","timestamp":1075593600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["The International Journal of High Performance Computing Applications"],"published-print":{"date-parts":[[2004,2]]},"abstract":"<jats:p> Sparse matrix\u2013vector multiplication is an important computational kernel that performs poorly on most modern processors due to a low compute-to-memory ratio and irregular memory access patterns. Optimization is difficult because of the complexity of cache-based memory systems and because performance is highly dependent on the non-zero structure of the matrix. The SPARSITY system is designed to address these problems by allowing users to automatically build sparse matrix kernels that are tuned to their matrices and machines. SPARSITY combines traditional techniques such as loop transformations with data structure transformations and optimization heuristics that are specific to sparse matrices. It provides a novel framework for selecting optimization parameters, such as block size, using a combination of performance models and search. <\/jats:p><jats:p> In this paper we discuss the optimization of two operations: a sparse matrix times a dense vector and a sparse matrix times a set of dense vectors. Our experience indicates that register level optimizations are effective for matrices arising in certain scientific simulations, in particular finite-element problems. 
Cache level optimizations are important when the vector used in multiplication is larger than the cache size, especially for matrices in which the non-zero structure is random. For applications involving multiple vectors, reorganizing the computation to perform the entire set of multiplications as a single operation produces significant speedups. We describe the different optimizations and parameter selection techniques and evaluate them on several machines using over 40 matrices taken from a broad set of application domains. Our results demonstrate speedups of up to 4\u00d7 for the single vector case and up to 10\u00d7 for the multiple vector case. <\/jats:p>","DOI":"10.1177\/1094342004041296","type":"journal-article","created":{"date-parts":[[2004,4,22]],"date-time":"2004-04-22T01:27:18Z","timestamp":1082597238000},"page":"135-158","source":"Crossref","is-referenced-by-count":250,"title":["Sparsity: Optimization Framework for Sparse Matrix Kernels"],"prefix":"10.1177","volume":"18","author":[{"given":"Eun-Jin","family":"Im","sequence":"first","affiliation":[{"name":"SCHOOL OF COMPUTER SCIENCE KOOKMIN UNIVERSITY, SEOUL, KOREA"}]},{"given":"Katherine","family":"Yelick","sequence":"additional","affiliation":[]},{"given":"Richard","family":"Vuduc","sequence":"additional","affiliation":[{"name":"COMPUTER SCIENCE DIVISION UNIVERSITY OF CALIFORNIA, BERKELEY, CA, USA"}]}],"member":"179","published-online":{"date-parts":[[2004,2,1]]},"reference":[{"key":"atypb1","doi-asserted-by":"crossref","unstructured":"Bai, Z., Demmel, J., Dongarra, J., Ruhe, A., and van der Vorst, H., editors. 2000. Templates for the Solution of Algebraic Eigenvalue Problems: A Practical Guide, SIAM, Philadelphia, PA .","DOI":"10.1137\/1.9780898719581"},{"key":"atypb2","unstructured":"Baker, A. H., Jessup, E. R., and Manteuffel, T. January 2003. 
A technique for accelerating the convergence of restarted GMRES, Technical Report CU-CS-045-03, University of Colorado, Department of Computer Science ."},{"key":"atypb3","unstructured":"Balay, S., Gropp, W. D., McInnes, L. C., and Smith, B. F. 2000. PETSc 2.0 users manual, Technical Report ANL-95\/11-Revision 2.0.28, Argonne National Laboratory ."},{"key":"atypb4","doi-asserted-by":"publisher","DOI":"10.1137\/1037127"},{"key":"atypb5","unstructured":"Bik, A. J. C. 1996. Compiler support for sparse matrix computations, PhD thesis, Leiden University."},{"key":"atypb6","doi-asserted-by":"crossref","unstructured":"Bilmes, J. A., Asanovic, K., Demmel, J., Chin, C., and Lam, D. July 1997. Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology . In International Conference on Supercomputing, Vienna, Austria.","DOI":"10.1145\/263580.263662"},{"key":"atypb7","unstructured":"BLAST Forum. October 1999. Documentation for the Basic Linear Algebra Subprograms (BLAS), http:\/\/www.netlib.org\/blas\/blast-forum."},{"key":"atypb8","doi-asserted-by":"crossref","unstructured":"Golub, G. H. and Underwood, R. 1977. The block Lanczos method for computing eigenvalues. In Mathematical Software III, J. R. Rice, editor, Academic, New York , pp. 361\u2013377.","DOI":"10.1016\/B978-0-12-587260-7.50018-2"},{"key":"atypb9","unstructured":"Goto, K. and van de Geijn, R. November 2002. On reducing TLB misses in matrix multiplication, Technical Report TR-2002-55, University of Texas at Austin ."},{"key":"atypb10","doi-asserted-by":"publisher","DOI":"10.1137\/S0895479888151111"},{"key":"atypb11","doi-asserted-by":"publisher","DOI":"10.1002\/nme.1620260503"},{"key":"atypb12","unstructured":"Hennessy, J. L. and Patterson, D. A. 1996. Computer Architecture: A Quantitative Approach, 2nd edition, Morgan Kaufmann, San Mateo, CA ."},{"key":"atypb13","doi-asserted-by":"crossref","unstructured":"Heras, D. B., Perez, V. B., Dominguez, J. C. C., and Rivera, F. F. 1999. 
Modeling and improving locality for irregular problems: sparse matrix\u2013vector product on cache memories as a case study. In HPCN Europe, Amsterdam, The Netherlands , pp. 201\u2013210.","DOI":"10.1007\/BFb0100581"},{"key":"atypb14","unstructured":"Im, E.J. May 2000. Optimizing the performance of sparse matrix\u2013vector multiplication, Ph.D. Thesis, University of California at Berkeley."},{"key":"atypb15","unstructured":"Im, E.J. and Yelick, K. March 1999. Optimizing sparse matrix vector multiplication on SMPs . In 9th SIAM Conference on Parallel Processing for Scientific Computing, San Antonio, TX."},{"key":"atypb16","doi-asserted-by":"crossref","unstructured":"Kotlyar, V., Pingali, K., and Stodghill, P. 1997. Compiling parallel code for sparse matrix applications . In Proceedings of Supercomputing San Jose, CA.","DOI":"10.1145\/509593.509603"},{"key":"atypb17","doi-asserted-by":"crossref","unstructured":"Lam, M. S., Rothberg, E. E., and Wolf, M. E. April 1991. The cache performance and optimizations of blocked algorithms . In Proceedings of the 4th International Conference on Architectural Support for Programming Languages and Operating Systems Santa Clara, CA.","DOI":"10.1145\/106972.106981"},{"key":"atypb18","unstructured":"Lehoucq, R. and Maschhoff, K. 1997. Implementation of an implicitly restarted block Arnoldi method, Preprint MCSP649-0297, Argonne National Lab."},{"key":"atypb19","unstructured":"Marques, O. A. 1995. BLZPACK: Description and User\u2019s guide, Technical Report TR\/PA\/95\/30, CERFACS."},{"key":"atypb20","doi-asserted-by":"crossref","unstructured":"Oliker, L., Li, X., Heber, G., and Biswas, R. 2000. Ordering unstructured meshes for sparse matrix computations on leading parallel systems. In Parallel and Distributed Processing, 15 IPDPS 2000 Workshops, Lecture Notes in Computer Science Vol. 1800, Springer-Verlag, Berlin , pp. 
497\u2013503.","DOI":"10.1007\/3-540-45591-4_66"},{"key":"atypb21","doi-asserted-by":"crossref","unstructured":"Pinar, A. and Heath, M. 1999. Improving performance of sparse matrix\u2013vector multiplication . In Proceedings of Supercomputing Portland, OR.","DOI":"10.1145\/331532.331562"},{"key":"atypb22","unstructured":"Pozo, R. and Remington, K. 1997. NIST Sparse BLAS, http:\/\/math.nist.gov\/spblas."},{"key":"atypb23","doi-asserted-by":"crossref","unstructured":"Pugh, W. and Shpeisman, T. August 1998. Generation of efficient code for sparse matrix computations . In Proceedings of the 11th Workshop on Languages and Compilers for Parallel Computing, LNCS, Springer-Verlag, Berlin.","DOI":"10.1007\/3-540-48319-5_14"},{"key":"atypb24","doi-asserted-by":"publisher","DOI":"10.1007\/BF01388686"},{"key":"atypb25","doi-asserted-by":"publisher","DOI":"10.1007\/BF01388687"},{"key":"atypb26","unstructured":"Shi, J. and Malik, J. January 1998. Motion segmentation and tracking using normalized cuts . In International Conference on Computer Vision Bombay, India."},{"key":"atypb27","unstructured":"Toledo, S. March 1997. Improving memory-system performance of sparse matrix\u2013vector multiplication . In Proceedings of the 8th SIAM Conference on Parallel Processing for Scientific Computing Minneapolis, MN."},{"key":"atypb28","unstructured":"Whaley, R. C., Petitet, A., and Dongarra, J. J. September 2000. 
Automated empirical optimization of software and the ATLAS project, http:\/\/math-atlas.sourceforge.net."}],"container-title":["The International Journal of High Performance Computing Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342004041296","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342004041296","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,4]],"date-time":"2025-03-04T09:53:37Z","timestamp":1741082017000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/1094342004041296"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2004,2]]},"references-count":28,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2004,2]]}},"alternative-id":["10.1177\/1094342004041296"],"URL":"https:\/\/doi.org\/10.1177\/1094342004041296","relation":{},"ISSN":["1094-3420","1741-2846"],"issn-type":[{"value":"1094-3420","type":"print"},{"value":"1741-2846","type":"electronic"}],"subject":[],"published":{"date-parts":[[2004,2]]}}}