{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:40:08Z","timestamp":1750297208757,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":36,"publisher":"ACM","license":[{"start":{"date-parts":[[2024,10,27]],"date-time":"2024-10-27T00:00:00Z","timestamp":1729987200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2024,10,27]]},"DOI":"10.1145\/3676536.3676780","type":"proceedings-article","created":{"date-parts":[[2025,4,9]],"date-time":"2025-04-09T12:53:56Z","timestamp":1744203236000},"page":"1-9","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["MatFactory: A Framework for High-Performance Matrix Factorization on FPGAs"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0009-0007-1464-5271","authenticated-orcid":false,"given":"Mingzhe","family":"Zhang","sequence":"first","affiliation":[{"name":"Department of Computer Science and Technology, Tsinghua University, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-2127-6011","authenticated-orcid":false,"given":"Xiaochen","family":"Hao","sequence":"additional","affiliation":[{"name":"Peking University, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3275-7791","authenticated-orcid":false,"given":"Hongbo","family":"Rong","sequence":"additional","affiliation":[{"name":"Parallel Computing Lab, Intel, San Jose, CA, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4281-1018","authenticated-orcid":false,"given":"Wenguang","family":"Chen","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Technology\/SIGS, Tsinghua University, Beijing, China"}]}],"member":"320","published-online":{"date-parts":[[2025,4,9]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACSSC.1999.832392"},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/T-C.1971.223171"},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/EE.1941.6434611"},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2020.3039409"},{"key":"e_1_3_2_1_5_1","volume-title":"SC20: International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, 1--13","author":"Matteis Tiziano De","year":"2020","unstructured":"Tiziano De Matteis, Johannes de Fine Licht, and Torsten Hoefler. 2020. FBLAS: Streaming linear algebra on FPGA. In SC20: International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, 1--13."},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/1046192.1046204"},{"key":"e_1_3_2_1_7_1","unstructured":"The Khronos Group. 2024. SYCL. https:\/\/www.khronos.org\/sycl"},{"key":"e_1_3_2_1_8_1","volume-title":"Lasa: Abstraction and Specialization for Productive and Performant Linear Algebra on FPGAs. In 2023 IEEE 31st Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)","author":"Hao Xiaochen","year":"2023","unstructured":"Xiaochen Hao, Mingzhe Zhang, Ce Sun, Zhuofu Tao, Hongbo Rong, Yu Zhang, Lei He, Eric Petit, Wenguang Chen, and Yun Liang. 2023. Lasa: Abstraction and Specialization for Productive and Performant Linear Algebra on FPGAs. In 2023 IEEE 31st Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM). IEEE, 34--40."},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCSI.2018.2852260"},{"key":"e_1_3_2_1_10_1","unstructured":"Intel. 2024. Developer Clouds for Accelerated Computing. https:\/\/devcloud.intel.com"},{"key":"e_1_3_2_1_11_1","unstructured":"Intel. 2024. Intel Arria 10 FPGA and SoC Product Table. https:\/\/cdrdv2-public.intel.com\/714167\/arria-10-product-table.pdf"},{"key":"e_1_3_2_1_12_1","unstructured":"Intel. 2024. oneAPI Programming Model. https:\/\/www.oneapi.io"},{"key":"e_1_3_2_1_13_1","unstructured":"Intel. 2024. oneAPI Samples. https:\/\/github.com\/oneapi-src\/oneAPI-samples"},{"key":"e_1_3_2_1_14_1","unstructured":"Intel. 2024. Removing Loop-Carried Dependencies Caused by Accesses to Memory Arrays. https:\/\/www.intel.com\/content\/www\/us\/en\/docs\/programmable\/683521\/21-4\/removing-loop-carried-dependencies-caused.html"},{"key":"e_1_3_2_1_15_1","unstructured":"Intel. 2024. Thermal and Power Guidelines. https:\/\/www.intel.com\/content\/www\/us\/en\/docs\/programmable\/683795\/current\/introduction.html"},{"key":"e_1_3_2_1_16_1","volume-title":"Waqas Ahmed TOOR, et al","author":"Irfan Raafia","year":"2017","unstructured":"Raafia Irfan, Waqas Ahmed TOOR, et al. 2017. FPGA-based Low Latency Inverse QRD Architecture for Adaptive Beamforming in Phased Array Radars. radioengineering 26, 3 (2017)."},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2011.24"},{"key":"e_1_3_2_1_18_1","volume-title":"International Workshop on State-of-the-Art in Scientific and Parallel Computing (PARA). 1--12","author":"Johnson Jeremy","year":"2008","unstructured":"Jeremy Johnson, Timothy Chagnon, Petya Vachranukunkiet, Prawat Nagvajara, and Chika Nwankpa. 2008. Sparse LU decomposition using FPGA. In International Workshop on State-of-the-Art in Scientific and Parallel Computing (PARA). 1--12."},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/VLSI-TSA\/VLSI-DAT57221.2023.10134072"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/MC.1982.1653825"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/3174243.3174273"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/3571856"},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCAS.2006.1692603"},{"volume-title":"Floating point STAP implementation on FPGAs. In 2011 IEEE RadarCon (RADAR)","author":"Mauer Volker","key":"e_1_3_2_1_24_1","unstructured":"Volker Mauer and Michael Parker. 2011. Floating point STAP implementation on FPGAs. In 2011 IEEE RadarCon (RADAR). IEEE, 901--904."},{"key":"e_1_3_2_1_25_1","unstructured":"NVIDIA. 2024. Basic Linear Algebra on GPUs. https:\/\/developer.nvidia.com\/cublas"},{"key":"e_1_3_2_1_26_1","unstructured":"NVIDIA. 2024. Direct Linear Solvers on NVIDIA GPUs. https:\/\/developer.nvidia.com\/cusolver"},{"key":"e_1_3_2_1_27_1","unstructured":"Hiroyuki Ootomo and Rio Yokota. 2019. TSQR on TensorCores. In SC19 research poster."},{"key":"e_1_3_2_1_28_1","unstructured":"E Peise and P Bientinesi. 2016. Recursive Algorithms for Dense Linear Algebra: The ReLAPACK Collection.(2016)."},{"key":"e_1_3_2_1_29_1","unstructured":"Jimmy Pettersson and Ian Wainwright. 2010. Radar signal processing with graphics processors (GPUs)."},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/FPL.2012.6339142"},{"key":"e_1_3_2_1_31_1","first-page":"447","article-title":"An fpga-based unscented kalman filter for system-on-chip applications","volume":"64","author":"Soh Jeremy","year":"2016","unstructured":"Jeremy Soh and Xiaofeng Wu. 2016. An fpga-based unscented kalman filter for system-on-chip applications. IEEE Transactions on Circuits and Systems II: Express Briefs 64, 4 (2016), 447--451.","journal-title":"IEEE Transactions on Circuits and Systems II: Express Briefs"},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/3431920.3439292"},{"key":"e_1_3_2_1_33_1","volume-title":"A high performance and memory efficient LU decomposer on FPGAs","author":"Wu Guiming","year":"2010","unstructured":"Guiming Wu, Yong Dou, Junqing Sun, and Gregory D Peterson. 2010. A high performance and memory efficient LU decomposer on FPGAs. IEEE transactions on computers 61, 3 (2010), 366--378."},{"key":"e_1_3_2_1_34_1","unstructured":"Xilinx. 2024. AMD Alveo U280 Product Brief. https:\/\/www.xilinx.com\/content\/dam\/xilinx\/publications\/product-briefs\/alveo-u280-product-brief.pdf"},{"key":"e_1_3_2_1_35_1","unstructured":"Xilinx. 2024. Vitis Accelerated Libraries. https:\/\/www.xilinx.com\/products\/design-tools\/vitis\/vitis-libraries.html"},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/2133352.2133358"}],"event":{"name":"ICCAD '24: 43rd IEEE\/ACM International Conference on Computer-Aided Design","sponsor":["SIGDA ACM Special Interest Group on Design Automation","IEEE CAS","IEEE CEDA","IEEE EDS"],"location":"Newark Liberty International Airport Marriott New York NY USA","acronym":"ICCAD '24"},"container-title":["Proceedings of the 43rd IEEE\/ACM International Conference on Computer-Aided Design"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3676536.3676780","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3676536.3676780","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:18:44Z","timestamp":1750295924000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3676536.3676780"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,10,27]]},"references-count":36,"alternative-id":["10.1145\/3676536.3676780","10.1145\/3676536"],"URL":"https:\/\/doi.org\/10.1145\/3676536.3676780","relation":{},"subject":[],"published":{"date-parts":[[2024,10,27]]},"assertion":[{"value":"2025-04-09","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}