{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,1]],"date-time":"2026-05-01T23:00:58Z","timestamp":1777676458379,"version":"3.51.4"},"reference-count":33,"publisher":"SAGE Publications","issue":"3","license":[{"start":{"date-parts":[[2011,8,1]],"date-time":"2011-08-01T00:00:00Z","timestamp":1312156800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["The International Journal of High Performance Computing Applications"],"published-print":{"date-parts":[[2011,8]]},"abstract":"<jats:p>The execution time of many scientific computing applications is dominated by the time spent in performing sparse matrix vector multiplication (SMV; y \u2190 A \u00b7 x). We consider improving the performance of SMV on multicores by exploiting the dense substructures that are inherently present in many sparse matrices derived from partial differential equation models. First, we identify indistinguishable vertices, i.e., vertices with the same adjacency structure, in a graph representation of the sparse matrix (A) and group them into a supernode. Next, we identify effectively dense blocks within the matrix by grouping rows and columns in each supernode. Finally, by using a suitable data structure for this representation of the matrix, we reduce the number of load operations during SMV while exactly preserving the original sparsity structure of A. In addition, we use ordering techniques to enhance locality in accesses to the vector, x, to yield an SMV kernel that exploits the effectively dense substructures in the matrix. We evaluate our scheme on Intel Nehalem and AMD Shanghai processors. We observe that for larger matrices on the Intel Nehalem processor, our method improves performance on average by 37.35% compared with the traditional compressed sparse row scheme (a blocked compressed form improves performance on average by 30.27%). Benefits of our new format are similar for the AMD processor. More importantly, if we pick for each matrix the best among our method and the blocked compressed scheme, the average performance improvements increase to 40.85%. Additional results indicate that the best performing scheme varies depending on the matrix and the system. We therefore propose an effective density measure that could be used for method selection, thus adding to the variety of options for an auto-tuned optimized SMV kernel that can exploit sparse matrix properties and hardware attributes for high performance.<\/jats:p>","DOI":"10.1177\/1094342011414748","type":"journal-article","created":{"date-parts":[[2011,8,5]],"date-time":"2011-08-05T06:09:32Z","timestamp":1312524572000},"page":"328-341","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":7,"title":["Exploiting dense substructures for fast sparse matrix vector multiplication"],"prefix":"10.1177","volume":"25","author":[{"given":"Manu","family":"Shantharam","sequence":"first","affiliation":[{"name":"Department of Computer Science and Engineering The Pennsylvania State University University Park, PA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Anirban","family":"Chatterjee","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering The Pennsylvania State University University Park, PA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Padma","family":"Raghavan","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering The Pennsylvania State University University Park, PA, USA,"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"179","published-online":{"date-parts":[[2011,8,5]]},"reference":[{"key":"atypb1","doi-asserted-by":"publisher","DOI":"10.1137\/0916081"},{"key":"atypb2","first-page":"100","volume":"09","author":"Belgin M.","journal-title":"In ICS\u2019"},{"key":"atypb3","volume-title":"The Mathematical Theory of Finite Element Methods","author":"Brenner S.","year":"2007","edition":"3"},{"key":"atypb4","doi-asserted-by":"publisher","DOI":"10.1177\/109434200001400303"},{"key":"atypb5","volume-title":"First the Tick, Now the Tock: Next Generation Intel Microarchitecture (Nehalem)","author":"Casazza J.","year":"2008"},{"key":"atypb6","volume-title":"Proceedings of the 1969 24th National Conference","author":"Cuthill E."},{"key":"atypb7","doi-asserted-by":"publisher","DOI":"10.2514\/3.12012"},{"key":"atypb8","unstructured":"Davis T. ( 1997) The University of Florida Sparse Matrix Collection . NA Digest, p. 97."},{"key":"atypb9","volume-title":"Proceedings of IPDPS","author":"Demmel J."},{"key":"atypb10","volume-title":"Conference on Linux Clusters: The HPC Revolution","author":"Dongarra J."},{"key":"atypb11","volume-title":"Computer Solution of Large Sparse Positive Definite Systems","author":"George A.","year":"1981"},{"key":"atypb12","doi-asserted-by":"publisher","DOI":"10.1137\/1031001"},{"key":"atypb13","volume-title":"Euromicro Conference on Parallel, Distributed, and Network-Based Processing","author":"Goumas G."},{"key":"atypb14","doi-asserted-by":"publisher","DOI":"10.1177\/1094342010374847"},{"key":"atypb15","doi-asserted-by":"publisher","DOI":"10.2307\/2346830"},{"key":"atypb16","doi-asserted-by":"publisher","DOI":"10.1137\/1033099"},{"key":"atypb17","doi-asserted-by":"publisher","DOI":"10.1137\/1.9780898718133"},{"key":"atypb18","volume-title":"Proceedings of the International Conference on Computational Science","author":"Im E."},{"key":"atypb19","doi-asserted-by":"publisher","DOI":"10.1177\/1094342004041296"},{"key":"atypb20","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-44520-X_1"},{"key":"atypb21","volume-title":"37th International Conference on Parallel Processing, 2008 (ICPP\u201908)","author":"Kourtis K."},{"key":"atypb22","first-page":"113","volume":"06","author":"Langou J.","year":"2006","journal-title":"In SuperComputing\u2019"},{"key":"atypb23","volume-title":"International Conference on Parallel Processing","author":"Lee BC"},{"key":"atypb24","volume-title":"Department of Defense HPCMP Users Group Conference","author":"Mucci PJ"},{"key":"atypb25","first-page":"30","volume":"99","author":"Pinar A.","year":"1999","journal-title":"In Supercomputing\u2019"},{"key":"atypb26","volume-title":"An Introduction to the Conjugate Gradient Method Without the Agonizing Pain. Technical Report","author":"Shewchuk JR","year":"1994"},{"key":"atypb27","doi-asserted-by":"publisher","DOI":"10.1147\/rd.416.0711"},{"key":"atypb28","doi-asserted-by":"publisher","DOI":"10.1088\/1742-6596\/16\/1\/071"},{"key":"atypb29","volume-title":"Proceedings of the High Performance Computing and Communications","author":"Vuduc R."},{"key":"atypb30","volume-title":"AMD\u2019s Shanghai 45nm Opterons versus Intel\u2019s latest 45nm Xeons","author":"Wasson S.","year":"2008"},{"key":"atypb31","first-page":"307","volume":"06","author":"Willcock J.","year":"2006","journal-title":"In ICS\u2019"},{"key":"atypb32","doi-asserted-by":"publisher","DOI":"10.1145\/1362622.1362674"},{"key":"atypb33","doi-asserted-by":"publisher","DOI":"10.1137\/080733243"}],"container-title":["The International Journal of High Performance Computing Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342011414748","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342011414748","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T08:19:00Z","timestamp":1777450740000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/1094342011414748"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2011,8]]},"references-count":33,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2011,8]]}},"alternative-id":["10.1177\/1094342011414748"],"URL":"https:\/\/doi.org\/10.1177\/1094342011414748","relation":{},"ISSN":["1094-3420","1741-2846"],"issn-type":[{"value":"1094-3420","type":"print"},{"value":"1741-2846","type":"electronic"}],"subject":[],"published":{"date-parts":[[2011,8]]}}}