{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:26:35Z","timestamp":1750307195281,"version":"3.41.0"},"reference-count":30,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2012,3,1]],"date-time":"2012-03-01T00:00:00Z","timestamp":1330560000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Reconfigurable Technol. Syst."],"published-print":{"date-parts":[[2012,3]]},"abstract":"<jats:p>FPGAs have the potential to serve as a platform for accelerating many computations including scientific applications. However, the large development cost and short life span for FPGA designs have limited their adoption by the scientific computing community. FPGA-based scientific computing and many kinds of embedded computing could become more practical if there were hardware libraries that were portable to any FPGA-based system with performance that scaled with the size of the FPGA. To illustrate this idea, we have implemented one common supercomputing library function: the LU factorization method for solving systems of linear equations. This paper describes a method for making the design both portable and scalable that should be illustrative if such libraries are to be built in the future. The design is a software-based generator that leverages both the flexibility of a software programming language and the parameters inherent in a hardware description language. The generator accepts parameters that describe the FPGA capacity and external memory capabilities. We compare the performance of our engine executing on the largest FPGA available at the time of this work (an Altera Stratix III 3S340) to a single processor core fabricated in the same 65nm IC process running a highly optimized software implementation from the processor vendor. For single precision matrices on the order of 10,000 \u00d7 10,000 elements, the FPGA implementation is 2.2 times faster and the energy dissipated per useful GFLOP is a factor of 5 lower. For double precision, the FPGA implementation is 1.7 times faster and 3.5 times more energy efficient.<\/jats:p>","DOI":"10.1145\/2133352.2133358","type":"journal-article","created":{"date-parts":[[2012,3,20]],"date-time":"2012-03-20T12:04:05Z","timestamp":1332245045000},"page":"1-26","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":24,"title":["Portable and scalable FPGA-based acceleration of a direct linear system solver"],"prefix":"10.1145","volume":"5","author":[{"given":"Wei","family":"Zhang","sequence":"first","affiliation":[{"name":"University of Toronto, ON, Canada"}]},{"given":"Vaughn","family":"Betz","sequence":"additional","affiliation":[{"name":"University of Toronto, ON, Canada"}]},{"given":"Jonathan","family":"Rose","sequence":"additional","affiliation":[{"name":"University of Toronto, ON, Canada"}]}],"member":"320","published-online":{"date-parts":[[2012,3,23]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Agility Design Solutions Inc. 2008. Handel-C. http:\/\/www.agilityds.com\/products\/c_based_products\/dk_design_suite\/handel-c.aspx."},{"key":"e_1_2_1_2_1","unstructured":"Altera. 2008. Netlist optimizations and physical synthesis. Tech. rep. Altera Corporation. http:\/\/www.altera.com\/literature\/hb\/qts\/qts_qii52007.pdf."},{"key":"e_1_2_1_3_1","unstructured":"Altera Corporation. 2008. Intellectual property solutions. http:\/\/www.altera.com\/products\/ip\/ipm-index.html."},{"key":"e_1_2_1_4_1","unstructured":"AutoESL. 2008. AutoPilot synthesis tool. http:\/\/www.autoesl.com\/."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/1117201.1117204"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/567806.567807"},{"key":"e_1_2_1_7_1","unstructured":"Cray Inc. 2008. http:\/\/www.cray.com."},{"volume-title":"Proceedings of the International Conference on Engineering of Reconfigurable Systems and Algorithms.","author":"Daga V.","key":"e_1_2_1_8_1"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/1046192.1046203"},{"key":"e_1_2_1_10_1","unstructured":"Diersch H. J. G. 2008. Error norm. http:\/\/www1.wasy.de\/deutsch\/produkte\/feflow\/hilfe\/general\/theory\/whitepapers\/error_norms\/enornorm.html."},{"key":"e_1_2_1_11_1","doi-asserted-by":"crossref","unstructured":"Dongarra J. J. Duff I. S. Sorensen D. C. and van der Vorst H. A. 1998. Numerical Linear Algebra for High-Performance Computers. SIAM Philadelphia PA.","DOI":"10.1137\/1.9780898719611"},{"key":"e_1_2_1_12_1","unstructured":"Hager W. W. 1988. Applied Numerical Linear Algebra. Prentice Hall Englewood Cliffs NJ."},{"key":"e_1_2_1_13_1","unstructured":"Intel. 2008. Intel Math Kernel Library. http:\/\/www.intel.com\/cd\/software\/products\/asmo-na\/eng\/307757.htm."},{"key":"e_1_2_1_14_1","unstructured":"Intel Corporation. 2008. Intel Xeon processor 5160. http:\/\/processorfinder.intel.com\/Details.aspx?sSpec=SLABS."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/TVLSI.2003.812306"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-78610-8_10"},{"volume-title":"Proceedings of the 6th Annual IEEE Symposium on FPGAs for Custom Computing Machines. 485--498","author":"Mencer O.","key":"e_1_2_1_17_1"},{"key":"e_1_2_1_18_1","unstructured":"Mentor Graphics. 2008. Catapult synthesis. http:\/\/www.mentor.com\/products\/esl\/high_level_synthesis\/catapult_synthesis\/index.cfm."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/MC.2007.110"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/MC.2007.103"},{"key":"e_1_2_1_21_1","unstructured":"NVIDIA Corporation. 2011. GeForce GTX 280. http:\/\/www.nvidia.com\/object\/product_geforce_gtx_280_us.html."},{"key":"e_1_2_1_22_1","unstructured":"SRC Computers. 2008. http:\/\/www.srccomp.com."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2008.89"},{"volume-title":"Proceedings of the ACM\/IEEE Conference on Supercomputing (SC'08)","author":"Volkov V.","key":"e_1_2_1_24_1"},{"key":"e_1_2_1_25_1","unstructured":"XtremeData Inc. 2008. http:\/\/www.xtremedatainc.com."},{"key":"e_1_2_1_26_1","unstructured":"Zhang W. 2008. Portable and scalable FPGA-based acceleration of a direct linear system solver. M.A.Sc. Thesis University of Toronto."},{"volume-title":"Proceedings of the International Conference on Field-Programmable Technology. 17--24","author":"Zhang W.","key":"e_1_2_1_27_1"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2008.55"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/1046192.1046202"},{"volume-title":"Proceedings of the International Conference on Field Programmable Logic and Applications. 1--6.","author":"Zhuo L.","key":"e_1_2_1_30_1"}],"container-title":["ACM Transactions on Reconfigurable Technology and Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2133352.2133358","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2133352.2133358","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T10:06:05Z","timestamp":1750241165000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2133352.2133358"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,3]]},"references-count":30,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2012,3]]}},"alternative-id":["10.1145\/2133352.2133358"],"URL":"https:\/\/doi.org\/10.1145\/2133352.2133358","relation":{},"ISSN":["1936-7406","1936-7414"],"issn-type":[{"type":"print","value":"1936-7406"},{"type":"electronic","value":"1936-7414"}],"subject":[],"published":{"date-parts":[[2012,3]]},"assertion":[{"value":"2010-10-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2011-07-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2012-03-23","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}