{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,11]],"date-time":"2026-03-11T23:25:27Z","timestamp":1773271527777,"version":"3.50.1"},"reference-count":71,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2024,11,18]],"date-time":"2024-11-18T00:00:00Z","timestamp":1731888000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Reconfigurable Technol. Syst."],"published-print":{"date-parts":[[2024,12,31]]},"abstract":"<jats:p>Sparse matrix multiplication (SpMM) plays a critical role in high-performance computing applications, such as deep learning, image processing, and physical simulation. Field-Programmable Gate Arrays (FPGAs), with their configurable hardware resources, can be tailored to accelerate SpMMs. There has been considerable research on deploying sparse matrix multipliers across various FPGA platforms. However, the FPGA-based design of sparse matrix multipliers still presents numerous challenges. Therefore, it is necessary to summarize and organize the current work to provide a reference for further research. This article first introduces the computational method of SpMM and categorizes the different challenges of FPGA deployment. Following this, we introduce and analyze a variety of state-of-the-art FPGA-based accelerators tailored for SpMMs. In addition, a comparative analysis of these accelerators is performed, examining metrics including compression rate, throughput, and resource utilization. Finally, we propose potential research directions and challenges for further study of FPGA-based SpMM accelerators.<\/jats:p>","DOI":"10.1145\/3687480","type":"journal-article","created":{"date-parts":[[2024,8,28]],"date-time":"2024-08-28T17:02:39Z","timestamp":1724864559000},"page":"1-37","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":10,"title":["FPGA-Based Sparse Matrix Multiplication Accelerators: From State-of-the-Art to Future Opportunities"],"prefix":"10.1145","volume":"17","author":[{"ORCID":"https:\/\/orcid.org\/0009-0002-1380-7105","authenticated-orcid":false,"given":"Yajing","family":"Liu","sequence":"first","affiliation":[{"name":"Fuzhou University, Fuzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6837-5675","authenticated-orcid":false,"given":"Ruiqi","family":"Chen","sequence":"additional","affiliation":[{"name":"Vrije University Brussel, Brussel, Belgium"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-1815-2324","authenticated-orcid":false,"given":"Shuyang","family":"Li","sequence":"additional","affiliation":[{"name":"Fudan University, Shanghai, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-7116-4699","authenticated-orcid":false,"given":"Jing","family":"Yang","sequence":"additional","affiliation":[{"name":"Fuzhou University, Fuzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9003-8966","authenticated-orcid":false,"given":"Shun","family":"Li","sequence":"additional","affiliation":[{"name":"Fuzhou University, Fuzhou, China and VeriMake Innovation Lab, Nanjing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4877-9688","authenticated-orcid":false,"given":"Bruno","family":"da Silva","sequence":"additional","affiliation":[{"name":"Vrije University Brussel, Brussel, Belgium"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2024,11,18]]},"reference":[{"key":"e_1_3_1_2_2","volume-title":"Choosing the Right Architecture for Real-Time Signal Processing Designs","author":"Adams Leon","year":"2002","unstructured":"Leon Adams and Strategic Marketing. 2002. Choosing the Right Architecture for Real-Time Signal Processing Designs. Texas Instruments, Dallas, TX, USA."},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPSW.2015.75"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1177\/1094342011403516"},{"key":"e_1_3_1_5_2","doi-asserted-by":"crossref","first-page":"228","DOI":"10.1109\/FPL60245.2023.00039","volume-title":"2023 33rd International Conference on Field-Programmable Logic and Applications (FPL)","author":"Chen Ruiqi","year":"2023","unstructured":"Ruiqi Chen, Haoyang Zhang, Shun Li, Enhao Tang, Jun Yu, and Kun Wang. 2023a. Graph-OPU: A highly integrated FPGA-based overlay processor for graph neural networks. In 2023 33rd International Conference on Field-Programmable Logic and Applications (FPL). IEEE, Gothenburg, Sweden, 228\u2013234. DOI: 10.1109\/FPL60245.2023.00039"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCAS46773.2023.10181734"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1016\/J.INS.2020.03.020"},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1007\/s00521-019-04121-z"},{"key":"e_1_3_1_9_2","doi-asserted-by":"crossref","first-page":"823","DOI":"10.1145\/3385412.3385963","volume-title":"41st ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)","author":"Chou Stephen","year":"2020","unstructured":"Stephen Chou, Fredrik Kjolstad, and Saman Amarasinghe. 2020. Automatic generation of efficient sparse tensor format conversion routines. In 41st ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI). ACM, Copenhagen, Denmark, 823\u2013838. DOI: 10.1145\/3385412.3385963"},{"key":"e_1_3_1_10_2","first-page":"1","volume-title":"55th Annual Design Automation Conference (DAC)","author":"Cong Jason","year":"2018","unstructured":"Jason Cong, Peng Wei, Cody Hao Yu, and Peng Zhang. 2018. Automated accelerator generation and optimization with composable, parallel and pipeline architecture. In 55th Annual Design Automation Conference (DAC). ACM, New York, NY, 1\u20136. DOI: 10.1145\/3195970.3195999"},{"key":"e_1_3_1_11_2","unstructured":"Paolo D\u2019Alberto Abhishek Jain Ismail Bustany Henri Fraisse and Mansimran Benipal. 2023. Entropy maximization in sparse matrix by vector multiplication ( \\(ESpMV\\) ). arXiv:2308.00106 1\u201326. Retrieved from https:\/\/doi.org\/10.48550\/arXiv.2308.00106"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/2049662.2049663"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","unstructured":"Mehmet Deveci Simon D. Hammond Michael M. Wolf and Sivasankaran Rajamanickam. 2018. Sparse matrix-matrix multiplication on multilevel memory architectures: Algorithms and experiments. arXiv:1804.00695 1\u201324. DOI: 10.48550\/arXiv.1804.00695","DOI":"10.48550\/arXiv.1804.00695"},{"key":"e_1_3_1_14_2","first-page":"54","volume-title":"2022 ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA)","author":"Du Yixiao","year":"2022","unstructured":"Yixiao Du, Yuwei Hu, Zhongchun Zhou, and Zhiru Zhang. 2022. High-performance sparse linear algebra on HBM-equipped FPGAs using HLS: A case study on SpMV. In 2022 ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA). ACM, New York, NY, 54\u201364. DOI: 10.1145\/3490422.3502368"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1145\/567806.567810"},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/ScalA.2018.00011"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2023.3281714"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2013.43"},{"key":"e_1_3_1_19_2","first-page":"293","volume-title":"32nd ACM Symposium on Parallelism in Algorithms and Architectures (SPAA)","author":"Gu Zhixiang","year":"2020","unstructured":"Zhixiang Gu, Jose Moreira, David Edelsohn, and Ariful Azad. 2020. Bandwidth optimized parallel algorithms for sparse matrix-matrix multiplication using propagation blocking. In 32nd ACM Symposium on Parallelism in Algorithms and Architectures (SPAA). ACM, Virtual Event, USA, 293\u2013303. DOI: 10.1145\/3350755.3400216"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1145\/3289185"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1145\/355791.355796"},{"key":"e_1_3_1_22_2","first-page":"1","volume-title":"34th ACM International Conference on Supercomputing (ICS)","author":"He Xin","year":"2020","unstructured":"Xin He, Subhankar Pal, Aporva Amarnath, Siying Feng, Dong-Hyeon Park, Austin Rovinski, Haojie Ye, Yuhan Chen, Ronald Dreslinski, and Trevor Mudge. 2020. Sparse-TPU: Adapting systolic arrays for sparse matrices. In 34th ACM International Conference on Supercomputing (ICS). ACM, Barcelona, Spain, 1\u201312. DOI: 10.1145\/3392717.3392751"},{"key":"e_1_3_1_23_2","doi-asserted-by":"crossref","first-page":"84","DOI":"10.1109\/HPCA51647.2021.00017","volume-title":"2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA)","author":"Hojabr Reza","year":"2021","unstructured":"Reza Hojabr, Ali Sedaghati, Amirali Sharifian, Ahmad Khonsari, and Arrvindh Shriraman. 2021. SPAGHETTI: Streaming accelerators for highly sparse GEMM on FPGAs. In 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA). IEEE, Seoul, Korea (South), 84\u201396. DOI: 10.1109\/HPCA51647.2021.00017"},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2019.2912923"},{"key":"e_1_3_1_25_2","doi-asserted-by":"crossref","unstructured":"Mohammad Hosseinabady Mohd Amiruddin Bin Zainol and Jose Nunez-Yanez. 2019. Heterogeneous FPGA+ GPU embedded systems: Challenges and opportunities. arXiv:1901.06331 1\u201310. Retrieved from https:\/\/doi.org\/10.48550\/arXiv.1901.06331","DOI":"10.1145\/3182172"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1287\/opre.9.6.841"},{"key":"e_1_3_1_27_2","first-page":"1","volume-title":"2021 IEEE\/ACM International Conference on Computer Aided Design (ICCAD)","author":"Hu Yuwei","year":"2021","unstructured":"Yuwei Hu, Yixiao Du, Ecenur Ustun, and Zhiru Zhang. 2021. GraphLily: Accelerating graph linear algebra on HBM-equipped FPGAs. In 2021 IEEE\/ACM International Conference on Computer Aided Design (ICCAD). IEEE, Munich, Germany, 1\u20139. DOI: 10.1109\/ICCAD51958.2021.9643582"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1016\/0010-4655(95)00031-A"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM57271.2023.00023"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM51124.2021.00026"},{"key":"e_1_3_1_31_2","first-page":"1","volume-title":"44th Annual International Symposium on Computer Architecture (ISCA)","author":"Jouppi Norman P.","year":"2017","unstructured":"Norman P. Jouppi, Cliff Young, Nishant Patil, David Patterson, Gaurav Agrawal, Raminder Bajwa, Sarah Bates, Suresh Bhatia, Nan Boden, Al Borchers, Rick Boyle, Pierre-luc Cantin, Clifford Chao, Chris Clark, Jeremy Coriell, Mike Daley, Matt Dau, Jeffrey Dean, Ben Gelb, Tara Vazir Ghaemmaghami, Rajendra Gottipati, William Gulland, Robert Hagmann, C. Richard Ho, Doug Hogberg, John Hu, Robert Hundt, Dan Hurt, Julian Ibarz, Aaron Jaffey, Alek Jaworski, Alexander Kaplan, Harshit Khaitan, Andy Koch, Naveen Kumar, Steve Lacy, James Laudon, James Law, Diemthu Le, Chris Leary, Zhuyuan Liu, Kyle Lucke, Alan Lundin, Gordon MacKean, Adriana Maggiore, Maire Mahony, Kieran Miller, Rahul Nagarajan, Ravi Narayanaswami, Ray Ni, Kathy Nix, Thomas Norrie, Mark Omernick, Narayana Penukonda, Andy Phelps, Jonathan Ross, Matt Ross, Amir Salek, Emad Samadiani, Chris Severn, Gregory Sizikov, Matthew Snelham, Jed Souter, Dan Steinberg, Andy Swing, Mercedes Tan, Gregory Thorson, Bo Tian, Horia Toma, Erick Tuttle, Vijay Vasudevan, Richard Walter, Walter Wang, Eric Wilcox, and Doe Hyun Yoon. 2017. In-datacenter performance analysis of a tensor processing unit. In 44th Annual International Symposium on Computer Architecture (ISCA). ACM, Toronto, Canada, 1\u201312. DOI: 10.1145\/3079856.3080246"},{"key":"e_1_3_1_32_2","first-page":"1","volume-title":"2019 IEEE High Performance Extreme Computing Conference (HPEC)","author":"Kepner Jeremy","year":"2019","unstructured":"Jeremy Kepner, Simon Alford, Vijay Gadepally, Michael Jones, Lauren Milechin, Ryan Robinett, and Sid Samsi. 2019. Sparse deep neural network graph challenge. In 2019 IEEE High Performance Extreme Computing Conference (HPEC). IEEE, Waltham, MA, USA, 1\u20137. DOI: 10.1109\/HPEC.2019.8916336"},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2012.12"},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2023.3281719"},{"key":"e_1_3_1_35_2","first-page":"1","volume-title":"2021 IEEE\/ACM International Conference on Computer Aided Design (ICCAD)","author":"Li Shiqing","year":"2021","unstructured":"Shiqing Li, Di Liu, and Weichen Liu. 2021. Optimized data reuse via reordering for sparse matrix-vector multiplication on FPGAs. In 2021 IEEE\/ACM International Conference on Computer Aided Design (ICCAD). IEEE, Munich, Germany, 1\u20139. DOI: 10.1109\/ICCAD51958.2021.9643453"},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2023.3281715"},{"key":"e_1_3_1_37_2","first-page":"1","volume-title":"2023 Design, Automation & Test in Europe Conference & Exhibition (DATE)","author":"Li Shiqing","year":"2023","unstructured":"Shiqing Li and Weichen Liu. 2023. Accelerating Gustavson-based SpMM on embedded FPGAs with element-wise parallelism and access pattern-aware caches. In 2023 Design, Automation & Test in Europe Conference & Exhibition (DATE). IEEE, Antwerp, Belgium, 1\u20136. DOI: 10.23919\/DATE56975.2023.10136958"},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCC-DSS-SmartCity-DependSys57074.2022.00171"},{"key":"e_1_3_1_39_2","first-page":"1","volume-title":"50th International Conference on Parallel Processing Workshop","author":"Liao Hui-Hsin","year":"2021","unstructured":"Hui-Hsin Liao, Chao-Lin Lee, Jenq-Kuen Lee, Wei-Chih Lai, Ming-Yu Hung, and Chung-Wen Huang. 2021. Support convolution of CNN with compression sparse matrix multiplication flow in TVM. In 50th International Conference on Parallel Processing Workshop. ACM, New York, NY, 1\u20137. DOI: 10.1145\/3458744.3473352"},{"key":"e_1_3_1_40_2","first-page":"8773","volume-title":"AAAI Conference on Artificial Intelligence","author":"Likhosherstov Valerii","year":"2023","unstructured":"Valerii Likhosherstov, Krzysztof Choromanski, and Adrian Weller. 2023. On the expressive flexibility of self-attention matrices. In AAAI Conference on Artificial Intelligence. PKP, Washington DC, USA, 8773\u20138781. DOI: 10.1609\/aaai.v37i7.26055"},{"key":"e_1_3_1_41_2","first-page":"33","volume-title":"28th Asia and South Pacific Design Automation Conference (ASP-DAC)","author":"Liu Bowen","year":"2023","unstructured":"Bowen Liu and Dajiang Liu. 2023. Towards high-bandwidth-utilization SpMV on FPGAs via partial vector duplication. In 28th Asia and South Pacific Design Automation Conference (ASP-DAC). ACM, New York, NY, 33\u201338. DOI: 10.1145\/3566097.3567839"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10766-018-0604-8"},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","DOI":"10.1145\/3355399"},{"key":"e_1_3_1_44_2","first-page":"01","volume-title":"2023 IEEE Guwahati Subsection Conference (GCON)","author":"Mandal Uditnarayan","year":"2023","unstructured":"Uditnarayan Mandal and Arighna Deb. 2023. ReMCOO: An efficient representation of sparse matrix-vector multiplication. In 2023 IEEE Guwahati Subsection Conference (GCON). IEEE, Guwahati, India, 01\u201306. DOI: 10.1109\/GCON58516.2023.10183488"},{"key":"e_1_3_1_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSII.2023.3344681"},{"key":"e_1_3_1_46_2","first-page":"65","volume-title":"2010 Eurographics\/ACM SIGGRAPH Symposium on Computer Animation (SCA)","author":"McAdams Aleka","year":"2010","unstructured":"Aleka McAdams, Eftychios Sifakis, and Joseph Teran. 2010. A parallel multigrid Poisson solver for fluids simulation on large grids. In 2010 Eurographics\/ACM SIGGRAPH Symposium on Computer Animation (SCA). Eurographics Association, Madrid, Spain, 65\u201373. DOI: 10.2312\/SCA\/SCA10\/065-073"},{"key":"e_1_3_1_47_2","first-page":"90","volume-title":"27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming","author":"Niu Yuyao","year":"2022","unstructured":"Yuyao Niu, Zhengyang Lu, Haonan Ji, Shuhui Song, Zhou Jin, and Weifeng Liu. 2022. TileSpGEMM: A tiled algorithm for parallel sparse general matrix-matrix multiplication on GPUs. In 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. ACM, Seoul, Republic of Korea, 90\u2013106. DOI: 10.1145\/3503221.3508431"},{"key":"e_1_3_1_48_2","doi-asserted-by":"crossref","first-page":"151","DOI":"10.1109\/FPL60245.2023.00029","volume-title":"2023 33rd International Conference on Field-Programmable Logic and Applications (FPL)","author":"Oliver Jos\u00e9","year":"2023","unstructured":"Jos\u00e9 Oliver, Carlos \u00c1lvarez, Teresa Cervero, Xavier Martorell, John D. Davis, and Eduard Ayguad\u00e9. 2023. Accelerating SpMV on FPGAs through block-row compress: A task-based approach. In 2023 33rd International Conference on Field-Programmable Logic and Applications (FPL). IEEE, Gothenburg, Sweden, 151\u2013158. DOI: 10.1109\/FPL60245.2023.00029"},{"key":"e_1_3_1_49_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2018.00067"},{"key":"e_1_3_1_50_2","doi-asserted-by":"publisher","DOI":"10.1145\/3140659.3080254"},{"key":"e_1_3_1_51_2","first-page":"362","volume-title":"25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP)","author":"Parger Mathias","year":"2020","unstructured":"Mathias Parger, Martin Winter, Daniel Mlakar, and Markus Steinberger. 2020. spECK: Accelerating GPU sparse matrix-matrix multiplication through lightweight analysis. In 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP). ACM, San Diego, CA, USA, 362\u2013375. DOI: 10.1145\/3332466.3374521"},{"key":"e_1_3_1_52_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCAS45731.2020.9181266"},{"key":"e_1_3_1_53_2","unstructured":"Jeff Pool. 2020. Accelerating sparsity in the NVIDIA Ampere architecture. GTC 2020 (2020). Retrieved from https:\/\/developer.download.nvidia.com\/video\/gputechconf\/gtc\/2020\/presentations\/s22085-accelerating-sparsity-in-the-nvidia-ampere-architecture%E2%80%8B.pdf"},{"key":"e_1_3_1_54_2","volume-title":"Numerical Methods for Large Eigenvalue Problems","author":"Saad Yousef","year":"1992","unstructured":"Yousef Saad. 1992. Numerical Methods for Large Eigenvalue Problems. Manchester University Press, Manchester, UK."},{"key":"e_1_3_1_55_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA56546.2023.10071015"},{"key":"e_1_3_1_56_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSIPN.2017.2731051"},{"key":"e_1_3_1_57_2","unstructured":"Mohammadreza Soltaniyeh Richard P. Martin and Santosh Nagarakatte. 2020. Synergistic CPU-FPGA acceleration of sparse linear algebra. arXiv:2004.13907 1\u201312. Retrieved from https:\/\/doi.org\/10.48550\/arXiv.2004.13907"},{"key":"e_1_3_1_58_2","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1145\/3489517.3530420","volume-title":"59th ACM\/IEEE Design Automation Conference (DAC)","author":"Song Linghao","year":"2022","unstructured":"Linghao Song, Yuze Chi, Licheng Guo, and Jason Cong. 2022a. Serpens: A high bandwidth memory based accelerator for general-purpose sparse matrix-vector multiplication. In 59th ACM\/IEEE Design Automation Conference (DAC). ACM, San Francisco, CA, USA, 211\u2013216. DOI: 10.1145\/3489517.3530420"},{"key":"e_1_3_1_59_2","doi-asserted-by":"crossref","first-page":"65","DOI":"10.1145\/3490422.3502357","volume-title":"2022 ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA)","author":"Song Linghao","year":"2022","unstructured":"Linghao Song, Yuze Chi, Atefeh Sohrabizadeh, Young-kyu Choi, Jason Lau, and Jason Cong. 2022b. Sextans: A streaming accelerator for general-purpose sparse-matrix dense-matrix multiplication. In 2022 ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA). ACM, Virtual Event, USA, 65\u201377. DOI: 10.1145\/3490422.3502357"},{"key":"e_1_3_1_60_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO50266.2020.00068"},{"key":"e_1_3_1_61_2","doi-asserted-by":"publisher","DOI":"10.1145\/3550075"},{"key":"e_1_3_1_62_2","doi-asserted-by":"publisher","DOI":"10.1109\/TVLSI.2024.3355499"},{"key":"e_1_3_1_63_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSTSP.2010.2103924"},{"key":"e_1_3_1_64_2","unstructured":"Minjie Wang Da Zheng Zihao Ye Quan Gan Mufei Li Xiang Song Jinjing Zhou Chao Ma Lingfan Yu Yu Gai Tianjun Xiao Tong He George Karypis Jinyang Li and Zheng Zhang. 2019. Deep graph library: A graph-centric highly-performant package for graph neural networks. arXiv:1909.01315 1\u201318. Retrieved from https:\/\/doi.org\/10.48550\/arXiv.1909.01315"},{"key":"e_1_3_1_65_2","first-page":"445","volume-title":"2022 32nd International Conference on Field-Programmable Logic and Applications (FPL)","author":"Wu Chen","year":"2022","unstructured":"Chen Wu, Zhuofu Tao, Kun Wang, and Lei He. 2022. SkeletonGCN: A simple yet effective accelerator for GCN training. In 2022 32nd International Conference on Field-Programmable Logic and Applications (FPL). IEEE, Belfast, United Kingdom, 445\u2013451. DOI: 10.1109\/FPL57034.2022.00073"},{"key":"e_1_3_1_66_2","unstructured":"Xilinx. 2023a. xbutil Utility. Retrieved from https:\/\/www.xilinx.com\/video\/software\/xilinx-board-utility-introduction.html"},{"key":"e_1_3_1_67_2","unstructured":"Xilinx. 2023b. Xilinx Power Estimator. Retrieved from https:\/\/www.xilinx.com\/products\/technology\/power\/xpe.html"},{"key":"e_1_3_1_68_2","doi-asserted-by":"crossref","first-page":"223","DOI":"10.1145\/3626202.3637562","volume-title":"2024 ACM\/SIGDA International Symposium on Field Programmable Gate Arrays (FPGA)","author":"Zeng Shulin","year":"2024","unstructured":"Shulin Zeng, Jun Liu, Guohao Dai, Xinhao Yang, Tianyu Fu, Hongyi Wang, Wenheng Ma, Hanbo Sun, Shiyao Li, Zixiao Huang, Yadong Dai, Jintao Li, Zehao Wang, Ruoyu Zhang, Kairui Wen, Xuefei Ning, and Yu Wang. 2024. FlightLLM: Efficient large language model inference with a complete mapping flow on FPGAs. In 2024 ACM\/SIGDA International Symposium on Field Programmable Gate Arrays (FPGA). ACM, Monterey, CA, USA, 223\u2013234. DOI: 10.1145\/3626202.3637562"},{"key":"e_1_3_1_69_2","unstructured":"Wei Zeng Xiaozhe Ren Teng Su Hui Wang Yi Liao Zhiwei Wang Xin Jiang ZhenZhang Yang Kaisheng Wang Xiaoda Zhang Chen Li Ziyan Gong Yifan Yao Xinjing Huang Jun Wang Jianfeng Yu Qi Guo Yue Yu Yan Zhang Jin Wang Hengtao Tao Dasen Yan Zexuan Yi Fang Peng Fangqing Jiang Han Zhang Lingfeng Deng Yehong Zhang Zhe Lin Chao Zhang Shaojie Zhang Mingyue Guo Shanzhi Gu Gaojun Fan Yaowei Wang Xuefeng Jin Qun Liu and Yonghong Tian. 2021. PanGu- \\(\\alpha\\) : Large-scale autoregressive pretrained Chinese language models with auto-parallel computation. arXiv:2104.12369 1\u201323. Retrieved from https:\/\/doi.org\/10.48550\/arXiv.2104.12369"},{"key":"e_1_3_1_70_2","first-page":"233","volume-title":"IEEE International Parallel and Distributed Processing Symposium (IPDPS)","author":"Zhang Bingyi","year":"2023","unstructured":"Bingyi Zhang and Viktor K. Prasanna. 2023. Dynasparse: Accelerating GNN inference through dynamic sparsity exploitation. In IEEE International Parallel and Distributed Processing Symposium (IPDPS). IEEE, St. Petersburg, FL, USA, 233\u2013244. DOI: 10.1109\/IPDPS54959.2023.00032"},{"key":"e_1_3_1_71_2","doi-asserted-by":"crossref","first-page":"687","DOI":"10.1145\/3445814.3446702","volume-title":"26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)","author":"Zhang Guowei","year":"2021","unstructured":"Guowei Zhang, Nithya Attaluri, Joel S. Emer, and Daniel Sanchez. 2021. Gamma: Leveraging Gustavson\u2019s algorithm to accelerate sparse matrix multiplication. In 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS). ACM, Virtual Event, USA, 687\u2013701. DOI: 10.1145\/3445814.3446702"},{"key":"e_1_3_1_72_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA47549.2020.00030"}],"container-title":["ACM Transactions on Reconfigurable Technology and Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3687480","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3687480","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:17:44Z","timestamp":1750295864000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3687480"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,11,18]]},"references-count":71,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2024,12,31]]}},"alternative-id":["10.1145\/3687480"],"URL":"https:\/\/doi.org\/10.1145\/3687480","relation":{},"ISSN":["1936-7406","1936-7414"],"issn-type":[{"value":"1936-7406","type":"print"},{"value":"1936-7414","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,11,18]]},"assertion":[{"value":"2024-01-21","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-07-13","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-11-18","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}