{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,29]],"date-time":"2025-09-29T08:27:40Z","timestamp":1759134460929,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":40,"publisher":"ACM","license":[{"start":{"date-parts":[[2018,8,13]],"date-time":"2018-08-13T00:00:00Z","timestamp":1534118400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2018,8,13]]},"DOI":"10.1145\/3225058.3225074","type":"proceedings-article","created":{"date-parts":[[2018,8,8]],"date-time":"2018-08-08T19:13:06Z","timestamp":1533755586000},"page":"1-10","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":11,"title":["Bandwidth Reduced Parallel SpMV on the SW26010 Many-Core Platform"],"prefix":"10.1145","author":[{"given":"Qiao","family":"Sun","sequence":"first","affiliation":[{"name":"Institute of Software, Chinese Academy of Sciences, Haidian Qu, Beijing Shi, China"}]},{"given":"Changyou","family":"Zhang","sequence":"additional","affiliation":[{"name":"Institute of Software, Chinese Academy of Sciences, Haidian Qu, Beijing Shi, China"}]},{"given":"Changmao","family":"Wu","sequence":"additional","affiliation":[{"name":"Institute of Software, Chinese Academy of Sciences, Haidian Qu, Beijing Shi, China"}]},{"given":"Jiajia","family":"Zhang","sequence":"additional","affiliation":[{"name":"Institute of Software, Chinese Academy of Sciences, Haidian Qu, Beijing Shi, China"}]},{"given":"Leisheng","family":"Li","sequence":"additional","affiliation":[{"name":"Institute of Software, Chinese Academy of Sciences, Haidian Qu, Beijing Shi, China"}]}],"member":"320","published-online":{"date-parts":[[2018,8,13]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"2014. CUSP: A C++ Templated Sparse Matrix Library.  2014. CUSP: A C++ Templated Sparse Matrix Library."},{"key":"e_1_3_2_1_2_1","unstructured":"2014. The Open Standard for Parallel Programming of Heterogeneous Systems. https:\/\/www.khronos.org\/opencl.  2014. The Open Standard for Parallel Programming of Heterogeneous Systems. https:\/\/www.khronos.org\/opencl."},{"volume-title":"Top-500 supercomputer list","year":"2017","key":"e_1_3_2_1_3_1","unstructured":"2017. Top-500 supercomputer list in 2017 . https:\/\/www.top500.org\/lists\/2017\/06\/. 2017. Top-500 supercomputer list in 2017. https:\/\/www.top500.org\/lists\/2017\/06\/."},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/2925426.2926273"},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2014.69"},{"key":"e_1_3_2_1_6_1","volume-title":"Karl Rupp, Barry F. Smith, Stefano Zampini, Hong Zhang, and Hong Zhang.","author":"Balay Satish","year":"2016","unstructured":"Satish Balay , Shrirang Abhyankar , Mark F. Adams , Jed Brown , Peter Brune , Kris Buschelman , Lisandro Dalcin , Victor Eijkhout , William D. Gropp , Dinesh Kaushik , Matthew G. Knepley , Lois Curfman McInnes , Karl Rupp, Barry F. Smith, Stefano Zampini, Hong Zhang, and Hong Zhang. 2016 . PETSc Web page. http:\/\/www.mcs.anl.gov\/petsc. http:\/\/www.mcs.anl.gov\/petsc Satish Balay, Shrirang Abhyankar, Mark F. Adams, Jed Brown, Peter Brune, Kris Buschelman, Lisandro Dalcin, Victor Eijkhout, William D. Gropp, Dinesh Kaushik, Matthew G. Knepley, Lois Curfman McInnes, Karl Rupp, Barry F. Smith, Stefano Zampini, Hong Zhang, and Hong Zhang. 2016. PETSc Web page. http:\/\/www.mcs.anl.gov\/petsc. http:\/\/www.mcs.anl.gov\/petsc"},{"key":"e_1_3_2_1_7_1","volume-title":"Ribeiro","author":"Batista Vicente H. F.","year":"2010","unstructured":"Vicente H. F. Batista , George O. Ainsworth Jr , and Fernando L. B . Ribeiro . 2010 . Parallel structurally-symmetric sparse matrix-vector products on multi-core processors. Computer Science ( 2010). Vicente H. F. Batista, George O. Ainsworth Jr, and Fernando L. B. Ribeiro. 2010. Parallel structurally-symmetric sparse matrix-vector products on multi-core processors. Computer Science (2010)."},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/1654059.1654078"},{"key":"e_1_3_2_1_9_1","volume-title":"International Conference on Computer Application and System Modeling. V11-161--V11-165","author":"Cao Wei","year":"2010","unstructured":"Wei Cao , Lu Yao , Zongzhe Li , Yongxian Wang , and Zhenghua Wang . 2010 . Implementing Sparse Matrix-Vector multiplication using CUDA based on a hybrid sparse matrix format . In International Conference on Computer Application and System Modeling. V11-161--V11-165 . Wei Cao, Lu Yao, Zongzhe Li, Yongxian Wang, and Zhenghua Wang. 2010. Implementing Sparse Matrix-Vector multiplication using CUDA based on a hybrid sparse matrix format. In International Conference on Computer Application and System Modeling. V11-161--V11-165."},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2013.09.005"},{"key":"e_1_3_2_1_11_1","unstructured":"T. Davis and Y. Hu. 2018. University of Florida Sparse Matrix Collection. http:\/\/www.cise.ufl.edu\/research\/sparse\/matrices\/.  T. Davis and Y. Hu. 2018. University of Florida Sparse Matrix Collection. http:\/\/www.cise.ufl.edu\/research\/sparse\/matrices\/."},{"volume-title":"International Conference on Computational Science. 632--641","author":"Robert","key":"e_1_3_2_1_12_1","unstructured":"Robert D. Falgout and Ulrike Meier Yang. 2002. hypre: A Library of High Performance Preconditioners . In International Conference on Computational Science. 632--641 . Robert D. Falgout and Ulrike Meier Yang. 2002. hypre: A Library of High Performance Preconditioners. In International Conference on Computational Science. 632--641."},{"key":"e_1_3_2_1_13_1","volume-title":"International Workshop, Para 2006","author":"Gilbert John R.","year":"2006","unstructured":"John R. Gilbert , Steve Reinhardt , and Viral B. Shah . 2007. High-Performance Graph Algorithms from Parallel Sparse Matrices. In Applied Parallel Computing. State of the Art in Scientific Computing , International Workshop, Para 2006 , Ume\u00e5, Sweden , June 18-21, 2006 , Revised Selected Papers. 260--269. John R. Gilbert, Steve Reinhardt, and Viral B. Shah. 2007. High-Performance Graph Algorithms from Parallel Sparse Matrices. In Applied Parallel Computing. State of the Art in Scientific Computing, International Workshop, Para 2006, Ume\u00e5, Sweden, June 18-21, 2006, Revised Selected Papers. 260--269."},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"crossref","unstructured":"W. D. Gropp D. K. Kaushik D. E. Keyes and B. F. Smith. 2000. Towards Realistic Performance Bounds for Implicit CFD Codes. Parallel Computational Fluid Dynamics (2000) 241--248.  W. D. Gropp D. K. Kaushik D. E. Keyes and B. F. Smith. 2000. Towards Realistic Performance Bounds for Implicit CFD Codes. Parallel Computational Fluid Dynamics (2000) 241--248.","DOI":"10.1016\/B978-044482851-4.50030-X"},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11432-016-5588-7"},{"key":"e_1_3_2_1_16_1","first-page":"1095","article-title":"An Overview of Trilinos","volume":"30","author":"Heroux Michael","year":"2003","unstructured":"Michael Heroux , Roscoe Bartlett , Vicki Howle , Robert Hoekstra , Jonathan Hu , Tamara Kolda , Richard Lehoucq , Kevin Long , Roger Pawlowski , and Eric Phipps . 2003 . An Overview of Trilinos . Sandia National Laboratories 30 , 1 (2003), 1095 -- 1101 . Michael Heroux, Roscoe Bartlett, Vicki Howle, Robert Hoekstra, Jonathan Hu, Tamara Kolda, Richard Lehoucq, Kevin Long, Roger Pawlowski, and Eric Phipps. 2003. An Overview of Trilinos. Sandia National Laboratories 30, 1 (2003), 1095--1101.","journal-title":"Sandia National Laboratories"},{"key":"e_1_3_2_1_17_1","volume-title":"Optimization of Sparse Matrix Kernels for Data Mining. In Siam Conf on Data Mining.","author":"Im Eun Jin","year":"2000","unstructured":"Eun Jin Im and Katherine Yelick . 2000 . Optimization of Sparse Matrix Kernels for Data Mining. In Siam Conf on Data Mining. Eun Jin Im and Katherine Yelick. 2000. Optimization of Sparse Matrix Kernels for Data Mining. In Siam Conf on Data Mining."},{"key":"e_1_3_2_1_18_1","unstructured":"Intel. 2018. Math Kernel Library MKL.  Intel. 2018. Math Kernel Library MKL."},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.5555\/2029256.2029301"},{"key":"e_1_3_2_1_20_1","unstructured":"Pramod Kumbhar. 2011. Performance of PETSc GPU Implementation with Sparse Matrix Storage Schemes. (2011).  Pramod Kumbhar. 2011. Performance of PETSc GPU Implementation with Sparse Matrix Storage Schemes. (2011)."},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2015.2401575"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2015.2401575"},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11227-012-0825-3"},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/2751205.2751209"},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2015.04.004"},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/2464996.2465013"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/ASAP.2015.7245713"},{"volume-title":"Parallel Sparse Matrix-Vector Multiplication Using Accelerators","author":"Maeda Hiroshi","key":"e_1_3_2_1_28_1","unstructured":"Hiroshi Maeda and Daisuke Takahashi . 2016. Parallel Sparse Matrix-Vector Multiplication Using Accelerators . Springer International Publishing . Hiroshi Maeda and Daisuke Takahashi. 2016. Parallel Sparse Matrix-Vector Multiplication Using Accelerators. Springer International Publishing."},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/2851141.2851190"},{"key":"e_1_3_2_1_31_1","unstructured":"Nvida. 2018. CUSparse Library.  Nvida. 2018. CUSparse Library."},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/MCSoC.2014.43"},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/2304576.2304624"},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1002\/cpe.1658"},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2008.12.006"},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/2555243.2555255"},{"key":"e_1_3_2_1_37_1","volume-title":"Petiton","author":"Ye Fan","year":"2014","unstructured":"Fan Ye , Christophe Calvin , and Serge G . Petiton . 2014 . A Study of SpMV Implementation Using MPI and OpenMP on Intel Many-Core Architecture. Springer International Publishing . 43--56 pages. Fan Ye, Christophe Calvin, and Serge G. Petiton. 2014. A Study of SpMV Implementation Using MPI and OpenMP on Intel Many-Core Architecture. Springer International Publishing. 43--56 pages."},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"crossref","unstructured":"J. Zhang C. Zhou Y. Wang L. Ju Q. Du X. Chi D. Xu D. Chen Y. Liu and Z. Liu. 2016. Extreme-Scale Phase Field Simulations of Coarsening Dynamics on the Sunway TaihuLight Supercomputer. (Nov 2016) 34--45.   J. Zhang C. Zhou Y. Wang L. Ju Q. Du X. Chi D. Xu D. Chen Y. Liu and Z. Liu. 2016. Extreme-Scale Phase Field Simulations of Coarsening Dynamics on the Sunway TaihuLight Supercomputer. (Nov 2016) 34--45.","DOI":"10.1109\/SC.2016.3"},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/2994148"},{"key":"e_1_3_2_1_40_1","first-page":"1906","article-title":"Optimizations on Sparse Matrix-Vector Multiplication Based on CUDA","volume":"18","author":"Zhou Hong","year":"2010","unstructured":"Hong Zhou , Xiaoya Fan , and Lili Zhao . 2010 . Optimizations on Sparse Matrix-Vector Multiplication Based on CUDA . Computer Measurement & Control 18 , 8 (2010), 1906 -- 1895 . Hong Zhou, Xiaoya Fan, and Lili Zhao. 2010. Optimizations on Sparse Matrix-Vector Multiplication Based on CUDA. Computer Measurement & Control 18, 8 (2010), 1906--1895.","journal-title":"Computer Measurement & Control"},{"key":"e_1_3_2_1_41_1","doi-asserted-by":"crossref","unstructured":"Francisco Zquez Jos Ndez Garz and Ester M N. 2012. Automatic tuning of the sparse matrix vector product on GPUs based on the ELLR-T approach. Elsevier Science Publishers B. V. 408--420 pages.  Francisco Zquez Jos Ndez Garz and Ester M N. 2012. Automatic tuning of the sparse matrix vector product on GPUs based on the ELLR-T approach. Elsevier Science Publishers B. V. 408--420 pages.","DOI":"10.1016\/j.parco.2011.08.003"}],"event":{"name":"ICPP 2018: 47th International Conference on Parallel Processing","sponsor":["University of Oregon University of Oregon"],"location":"Eugene OR USA","acronym":"ICPP 2018"},"container-title":["Proceedings of the 47th International Conference on Parallel Processing"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3225058.3225074","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3225058.3225074","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T01:39:06Z","timestamp":1750210746000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3225058.3225074"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,8,13]]},"references-count":40,"alternative-id":["10.1145\/3225058.3225074","10.1145\/3225058"],"URL":"https:\/\/doi.org\/10.1145\/3225058.3225074","relation":{},"subject":[],"published":{"date-parts":[[2018,8,13]]},"assertion":[{"value":"2018-08-13","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}