{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T12:53:09Z","timestamp":1760100789800,"version":"3.38.0"},"reference-count":25,"publisher":"SAGE Publications","issue":"1","license":[{"start":{"date-parts":[[2015,7,14]],"date-time":"2015-07-14T00:00:00Z","timestamp":1436832000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["The International Journal of High Performance Computing Applications"],"published-print":{"date-parts":[[2016,2]]},"abstract":"<jats:p>In this paper, we present a new sparse matrix data format that leads to improved memory coalescing and more efficient sparse matrix-vector multiplication for a wide range of problems on high-throughput architectures such as a GPU. The sparse matrix structure is constructed by sorting the rows based on the row length (defined as the number of non-zero elements in a matrix row), followed by a partition into two ranges: short rows and long rows. Based on this partition, the matrix entries are then transformed into ELLPACK or vectorized compressed sparse row format. In addition, the number of threads is adaptively selected according to the row length, in order to balance the workload for each graphics processing unit thread. Several computational experiments are presented to support this approach, and the results suggest a notable improvement over a wide range of matrix structures.<\/jats:p>","DOI":"10.1177\/1094342015593156","type":"journal-article","created":{"date-parts":[[2015,7,15]],"date-time":"2015-07-15T01:02:27Z","timestamp":1436922147000},"page":"103-120","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":21,"title":["A hybrid format for better performance of sparse matrix-vector multiplication on a GPU"],"prefix":"10.1177","volume":"30","author":[{"given":"Dahai","family":"Guo","sequence":"first","affiliation":[{"name":"National Center for Supercomputing Applications, University of Illinois, Urbana, IL, USA"}]},{"given":"William","family":"Gropp","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Urbana, IL, USA"}]},{"given":"Luke N","family":"Olson","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Urbana, IL, USA"}]}],"member":"179","published-online":{"date-parts":[[2015,7,14]]},"reference":[{"key":"bibr1-1094342015593156","doi-asserted-by":"publisher","DOI":"10.1145\/331532.331600"},{"key":"bibr2-1094342015593156","doi-asserted-by":"publisher","DOI":"10.1145\/2597652.2597678"},{"key":"bibr3-1094342015593156","unstructured":"Balay S, Abhyankar S, Adams MF, et al. PETSc users manual. Technical Report. Report no. ANL-95\/11 - Revision 3.5, 2014. Argonne: Argonne National Laboratory."},{"key":"bibr4-1094342015593156","unstructured":"Baskaran MM, Bordawekar R. Optimizing sparse matrix-vector multiplication on GPUs. IBM Research Report. Report no. RC2470, 2009. Yorktown: IBM Watson Research Center."},{"key":"bibr5-1094342015593156","doi-asserted-by":"publisher","DOI":"10.1137\/110838844"},{"key":"bibr6-1094342015593156","doi-asserted-by":"publisher","DOI":"10.1145\/1654059.1654078"},{"key":"bibr7-1094342015593156","unstructured":"Blelloch GE, Heroux MA, Zagha M. Segmented operations for sparse matrix computation on vector multiprocessors. Technical Report. Report no. CMU-CS-93-173, 1993. Pittsburgh: Carnegie Mellon University."},{"key":"bibr8-1094342015593156","unstructured":"Blue Waters (n.d.) National Center for Supercomputing Applications. Available at: https:\/\/bluewaters.ncsa.illinois.edu\/"},{"key":"bibr9-1094342015593156","doi-asserted-by":"publisher","DOI":"10.1145\/1693453.1693471"},{"key":"bibr10-1094342015593156","unstructured":"NVIDIA Corporation (2012) NVIDIA\u2019s Next Generation CUDA Compute Architecture: Kepler GK110. Available at: http:\/\/www.nvidia.com\/content\/PDF\/kepler\/NVIDIA-Kepler-GK110-Architecture-Whitepaper.pdf"},{"key":"bibr11-1094342015593156","unstructured":"Cusp (2015) Cusp: Generic parallel algorithms for sparse matrix and graph computations. Available at: https:\/\/github.com\/cusplibrary\/cusplibrary"},{"key":"bibr12-1094342015593156","unstructured":"NVIDIA Corporation (n.d.) cuSPARSE. Available at: http:\/\/docs.nvidia.com\/cuda\/cusparse"},{"key":"bibr13-1094342015593156","unstructured":"Davis T, Hu Y (2015) The University of Florida sparse matrix collection. Technical Report, University of Florida."},{"key":"bibr14-1094342015593156","doi-asserted-by":"publisher","DOI":"10.2528\/PIER11031607"},{"volume-title":"Proceedings of the extreme scaling workshop","year":"2012","author":"Guo D","key":"bibr15-1094342015593156"},{"key":"bibr16-1094342015593156","doi-asserted-by":"publisher","DOI":"10.1137\/130930352"},{"key":"bibr17-1094342015593156","doi-asserted-by":"publisher","DOI":"10.1007\/s11227-012-0825-3"},{"key":"bibr18-1094342015593156","doi-asserted-by":"publisher","DOI":"10.1109\/ICPP.2013.10"},{"key":"bibr19-1094342015593156","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-662-44917-2_13"},{"volume-title":"GPU technology conference 2012","year":"2012","author":"Micikevicius P","key":"bibr20-1094342015593156"},{"key":"bibr21-1094342015593156","doi-asserted-by":"crossref","unstructured":"Reguly I, Giles M. Efficient sparse matrix-vector multiplication on cache-based GPUs. In: Innovative Parallel Computing, San Jose, CA, 13\u201314 May 2012.","DOI":"10.1109\/InPar.2012.6339602"},{"volume-title":"GPU technology conference, 2012","year":"2012","author":"Rennich S","key":"bibr22-1094342015593156"},{"key":"bibr23-1094342015593156","unstructured":"GP STREAM benchmark (n.d.). NVIDIA Corporation. Available at: https:\/\/devtalk.nvidia.com\/default\/topic\/381934\/stream-benchmark\/"},{"key":"bibr24-1094342015593156","doi-asserted-by":"publisher","DOI":"10.1109\/ICPP.2011.53"},{"key":"bibr25-1094342015593156","doi-asserted-by":"publisher","DOI":"10.1145\/1498765.1498785"}],"container-title":["The International Journal of High Performance Computing Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342015593156","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/1094342015593156","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342015593156","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,1]],"date-time":"2025-03-01T02:05:27Z","timestamp":1740794727000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/1094342015593156"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2015,7,14]]},"references-count":25,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2016,2]]}},"alternative-id":["10.1177\/1094342015593156"],"URL":"https:\/\/doi.org\/10.1177\/1094342015593156","relation":{},"ISSN":["1094-3420","1741-2846"],"issn-type":[{"type":"print","value":"1094-3420"},{"type":"electronic","value":"1741-2846"}],"subject":[],"published":{"date-parts":[[2015,7,14]]}}}