{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:32:52Z","timestamp":1750307572291,"version":"3.41.0"},"reference-count":21,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2010,1,1]],"date-time":"2010-01-01T00:00:00Z","timestamp":1262304000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Reconfigurable Technol. Syst."],"published-print":{"date-parts":[[2010,1]]},"abstract":"<jats:p>Double precision floating point Sparse Matrix-Vector Multiplication (SMVM) is a critical computational kernel used in iterative solvers for systems of sparse linear equations. The poor data locality exhibited by sparse matrices along with the high memory bandwidth requirements of SMVM result in poor performance on general purpose processors. Field Programmable Gate Arrays (FPGAs) offer a possible alternative with their customizable and application-targeted memory sub-system and processing elements. In this work we investigate two separate implementations of the SMVM on an SRC-6 MAPStation workstation. The first implementation investigates the peak performance capability, while the second implementation balances the amount of instantiated logic with the available sustained bandwidth of the FPGA subsystem. Both implementations yield the same sustained performance with the second producing a much more efficient solution. The metrics of processor and application balance are introduced to help provide some insight into the efficiencies of the FPGA and CPU based solutions explicitly showing the tight coupling of the available bandwidth to peak floating point performance. 
Due to the FPGA's ability to balance the amount of implemented logic to the available memory bandwidth, it can provide a much more efficient solution. Finally, making use of the lessons learned implementing the SMVM, we present a fully implemented non-preconditioned Conjugate Gradient Algorithm utilizing the second SMVM design.<\/jats:p>","DOI":"10.1145\/1661438.1661440","type":"journal-article","created":{"date-parts":[[2010,1,26]],"date-time":"2010-01-26T14:01:38Z","timestamp":1264514498000},"page":"1-31","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":7,"title":["Sparse Matrix-Vector Multiplication on a Reconfigurable Supercomputer with Application"],"prefix":"10.1145","volume":"3","author":[{"given":"David","family":"Dubois","sequence":"first","affiliation":[{"name":"Los Alamos National Laboratory"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Andrew","family":"Dubois","sequence":"additional","affiliation":[{"name":"Los Alamos National Laboratory"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Thomas","family":"Boorman","sequence":"additional","affiliation":[{"name":"Los Alamos National Laboratory"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Carolyn","family":"Connor","sequence":"additional","affiliation":[{"name":"Los Alamos National Laboratory"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Steve","family":"Poole","sequence":"additional","affiliation":[{"name":"Oak Ridge National Laboratory"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2010,1]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Achronix Semiconductor Corporation. 2008. Speedster data sheet. http:\/\/www.achronix.com. Achronix Semiconductor Corporation . 2008. Speedster data sheet. 
http:\/\/www.achronix.com."},{"volume-title":"Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods","author":"Barrett R.","key":"e_1_2_1_2_1","unstructured":"Barrett , R. , Berry , M. , Chan , T. , Demmel , J. , Donato , J. , Dongarra , J. , Eijkhout , V. , Pozo , R. , Romine , C. , and Ven der Vorst , H. 1994. Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods . SIAM , Philadelphia, PA . Barrett, R., Berry, M., Chan, T., Demmel, J., Donato, J., Dongarra, J., Eijkhout, V., Pozo, R., Romine, C., and Ven der Vorst, H. 1994. Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods. SIAM, Philadelphia, PA."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/1046192.1046203"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2008.54"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2008.53"},{"key":"e_1_2_1_6_1","unstructured":"Fettig J. Kwok W.-Y. and Saied F. 2002. Scaling Behavior of Linear Solvers on Large Linux Clusters. National Center for Supercomputing Applications at the University of Illinois at Urbana-Champaign IL. Fettig J. Kwok W.-Y. and Saied F. 2002. Scaling Behavior of Linear Solvers on Large Linux Clusters . National Center for Supercomputing Applications at the University of Illinois at Urbana-Champaign IL."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/968280.968304"},{"key":"e_1_2_1_8_1","volume-title":"Computer Architecture: A Quantitative Approach","author":"Hennessy J.","year":"2003","unstructured":"Hennessy , J. and Patterson , D . 2003 . Computer Architecture: A Quantitative Approach , 3 rd Ed. Morgan Kaufmann . Hennessy, J. and Patterson, D. 2003. Computer Architecture: A Quantitative Approach, 3rd Ed. 
Morgan Kaufmann.","edition":"3"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1080\/00029890.1998.12004985"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1007\/11752578_63"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/977091.977115"},{"key":"e_1_2_1_12_1","unstructured":"Mills R. T. D\u2019Azevedo E. F. and Fahey M. R. 2005. Progress towards optimizing the PETSc numerical toolkit on the Cray X1. Cray Users Group http:\/\/www.ccs.ornl.gov\/~rmills\/pubs\/cug2005.pdf. Mills R. T. D\u2019Azevedo E. F. and Fahey M. R. 2005. Progress towards optimizing the PETSc numerical toolkit on the Cray X1. Cray Users Group http:\/\/www.ccs.ornl.gov\/~rmills\/pubs\/cug2005.pdf."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2006.8"},{"volume-title":"An introduction to the conjugate gradient method without the agonizing pain. Tech. rep. UMI Order Number: CS-94-125","author":"Shewchuk J. R.","key":"e_1_2_1_14_1","unstructured":"Shewchuk , J. R. 1994. An introduction to the conjugate gradient method without the agonizing pain. Tech. rep. UMI Order Number: CS-94-125 , Carnegie Mellon University . Shewchuk, J. R. 1994. An introduction to the conjugate gradient method without the agonizing pain. Tech. rep. UMI Order Number: CS-94-125, Carnegie Mellon University."},{"key":"e_1_2_1_15_1","unstructured":"SRC Computers Inc. 2008. Product page. http:\/\/www.srccomp.com\/products\/products.asp. SRC Computers Inc. 2008. Product page. http:\/\/www.srccomp.com\/products\/products.asp."},{"key":"e_1_2_1_16_1","unstructured":"SRC Computers Inc. SRC C Programming Environment v. 2.1 Guide. SRC Computers Inc. SRC Computers Inc . SRC C Programming Environment v. 2.1 Guide . 
SRC Computers Inc."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2007.60"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1147\/rd.416.0711"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/968280.968305"},{"key":"e_1_2_1_20_1","unstructured":"Wellein G. Hager G. and Zeiser T. 2005. Basic principles of modern processors: Memory hierarchy optimization of data access. http:\/\/www.rrze.unierlangen.de\/ausbildung\/vorlesungen\/04-25_2005_ptfs.pdf. Wellein G. Hager G. and Zeiser T. 2005. Basic principles of modern processors: Memory hierarchy optimization of data access. http:\/\/www.rrze.unierlangen.de\/ausbildung\/vorlesungen\/04-25_2005_ptfs.pdf."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/1046192.1046202"}],"container-title":["ACM Transactions on Reconfigurable Technology and Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1661438.1661440","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1661438.1661440","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T12:41:03Z","timestamp":1750250463000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1661438.1661440"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010,1]]},"references-count":21,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2010,1]]}},"alternative-id":["10.1145\/1661438.1661440"],"URL":"https:\/\/doi.org\/10.1145\/1661438.1661440","relation":{},"ISSN":["1936-7406","1936-7414"],"issn-type":[{"type":"print","value":"1936-7406"},{"type":"electronic","value":"1936-7414"}],"subject":[],"published":{"date-parts":[[2010,1]]},"assertion":[{"value":"2008-07-01","order":0,"name":"received","label":"Received","group":{"name":"publ
ication_history","label":"Publication History"}},{"value":"2009-02-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2010-01-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}