{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,26]],"date-time":"2026-04-26T07:13:23Z","timestamp":1777187603780,"version":"3.51.4"},"publisher-location":"New York, NY, USA","reference-count":92,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,2,11]],"date-time":"2022-02-11T00:00:00Z","timestamp":1644537600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-sa\/4.0\/"}],"funder":[{"name":"Xilinx XACC Program"},{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["CCF-1937599"],"award-info":[{"award-number":["CCF-1937599"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"name":"CDSC industrial partners (https:\/\/cdsc.ucla.edu\/partners)"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,2,13]]},"DOI":"10.1145\/3490422.3502357","type":"proceedings-article","created":{"date-parts":[[2022,2,12]],"date-time":"2022-02-12T05:09:21Z","timestamp":1644642561000},"page":"65-77","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":75,"title":["Sextans: A Streaming Accelerator for General-Purpose Sparse-Matrix Dense-Matrix Multiplication"],"prefix":"10.1145","author":[{"given":"Linghao","family":"Song","sequence":"first","affiliation":[{"name":"University of California, Los Angeles, Los Angeles, CA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yuze","family":"Chi","sequence":"additional","affiliation":[{"name":"University of California, Los Angeles, Los Angeles, CA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Atefeh","family":"Sohrabizadeh","sequence":"additional","affiliation":[{"name":"University of California, Los Angeles, Los Angeles, CA, 
USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Young-kyu","family":"Choi","sequence":"additional","affiliation":[{"name":"Inha University, Incheon, South Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jason","family":"Lau","sequence":"additional","affiliation":[{"name":"University of California, Los Angeles, Los Angeles, CA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jason","family":"Cong","sequence":"additional","affiliation":[{"name":"University of California, Los Angeles, Los Angeles, CA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2022,2,11]]},"reference":[{"key":"e_1_3_2_2_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/2749469.2750386"},{"key":"e_1_3_2_2_2_1","volume-title":"The 2021 ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays . 23--33","author":"Arora Aman","unstructured":"Aman Arora , Samidh Mehta , Vaughn Betz , and Lizy K. John . 2021. Tensor Slices to the Rescue: Supercharging ML Acceleration on FPGAs . In The 2021 ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays . 23--33 . Aman Arora, Samidh Mehta, Vaughn Betz, and Lizy K. John. 2021. Tensor Slices to the Rescue: Supercharging ML Acceleration on FPGAs. In The 2021 ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays . 23--33."},{"key":"e_1_3_2_2_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/1654059.1654078"},{"key":"e_1_3_2_2_4_1","volume-title":"SISA: Set-Centric Instruction Set Architecture for Graph Mining on Processing-in-Memory Systems. In MICRO-54: 54th Annual IEEE\/ACM International Symposium on Microarchitecture. 
282--297","author":"Besta Maciej","year":"2021","unstructured":"Maciej Besta , Raghavendra Kanakagiri , Grzegorz Kwasniewski , Rachata Ausavarungnirun , Jakub Ber\u00e1nek , Konstantinos Kanellopoulos , Kacper Janda , Zur Vonarburg-Shmaria , Lukas Gianinazzi , Ioana Stefan , Juan G\u00f3mez Luna , Jakub Golinowski , Marcin Copik , Lukas Kapp-Schwoerer , Salvatore Di Girolamo , Nils Blach , Marek Konieczny , Onur Mutlu , and Torsten Hoefler . 2021 . SISA: Set-Centric Instruction Set Architecture for Graph Mining on Processing-in-Memory Systems. In MICRO-54: 54th Annual IEEE\/ACM International Symposium on Microarchitecture. 282--297 . Maciej Besta, Raghavendra Kanakagiri, Grzegorz Kwasniewski, Rachata Ausavarungnirun, Jakub Ber\u00e1nek, Konstantinos Kanellopoulos, Kacper Janda, Zur Vonarburg-Shmaria, Lukas Gianinazzi, Ioana Stefan, Juan G\u00f3mez Luna, Jakub Golinowski, Marcin Copik, Lukas Kapp-Schwoerer, Salvatore Di Girolamo, Nils Blach, Marek Konieczny, Onur Mutlu, and Torsten Hoefler. 2021. SISA: Set-Centric Instruction Set Architecture for Graph Mining on Processing-in-Memory Systems. In MICRO-54: 54th Annual IEEE\/ACM International Symposium on Microarchitecture. 282--297."},{"key":"e_1_3_2_2_5_1","volume-title":"Low-Power Approximate String Matching Acceleration Framework for Genome Sequence Analysis. In 2020 53rd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO). IEEE, 951--966","author":"Cali Damla Senol","year":"2020","unstructured":"Damla Senol Cali , Gurpreet S. Kalsi , Z\u00fclal Bing\u00f6l , Can Firtina , Lavanya Subramanian , Jeremie S. Kim , Rachata Ausavarungnirun , Mohammed Alser , Juan Gomez-Luna , Amirali Boroumand , 2020 . GenASM: A High-Performance , Low-Power Approximate String Matching Acceleration Framework for Genome Sequence Analysis. In 2020 53rd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO). IEEE, 951--966 . Damla Senol Cali, Gurpreet S. 
Kalsi, Z\u00fclal Bing\u00f6l, Can Firtina, Lavanya Subramanian, Jeremie S. Kim, Rachata Ausavarungnirun, Mohammed Alser, Juan Gomez-Luna, Amirali Boroumand, et al. 2020. GenASM: A High-Performance, Low-Power Approximate String Matching Acceleration Framework for Genome Sequence Analysis. In 2020 53rd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO). IEEE, 951--966."},{"key":"e_1_3_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/ASP-DAC47756.2020.9045555"},{"key":"e_1_3_2_2_7_1","volume-title":"Sorting Large Data Sets with FPGA-Accelerated Samplesort. In 2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM). IEEE, 326--326","author":"Chen Han","year":"2019","unstructured":"Han Chen , Sergey Madaminov , Michael Ferdman , and Peter Milder . 2019 b . Sorting Large Data Sets with FPGA-Accelerated Samplesort. In 2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM). IEEE, 326--326 . Han Chen, Sergey Madaminov, Michael Ferdman, and Peter Milder. 2019b. Sorting Large Data Sets with FPGA-Accelerated Samplesort. In 2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM). IEEE, 326--326."},{"key":"e_1_3_2_2_8_1","volume-title":"DaDianNao: A Machine-Learning Supercomputer. In 2014 47th Annual IEEE\/ACM International Symposium on Microarchitecture. IEEE, 609--622","author":"Chen Yunji","year":"2014","unstructured":"Yunji Chen , Tao Luo , Shaoli Liu , Shijin Zhang , Liqiang He , Jia Wang , Ling Li , Tianshi Chen , Zhiwei Xu , Ninghui Sun , 2014 . DaDianNao: A Machine-Learning Supercomputer. In 2014 47th Annual IEEE\/ACM International Symposium on Microarchitecture. IEEE, 609--622 . Yunji Chen, Tao Luo, Shaoli Liu, Shijin Zhang, Liqiang He, Jia Wang, Ling Li, Tianshi Chen, Zhiwei Xu, Ninghui Sun, et al. 2014. DaDianNao: A Machine-Learning Supercomputer. 
In 2014 47th Annual IEEE\/ACM International Symposium on Microarchitecture. IEEE, 609--622."},{"key":"e_1_3_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/JSSC.2016.2616357"},{"key":"e_1_3_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/3370748.3406552"},{"key":"e_1_3_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/3289602.3293919"},{"key":"e_1_3_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1021\/jm4004285"},{"key":"e_1_3_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2016.7498258"},{"key":"e_1_3_2_2_14_1","volume-title":"Extending High-Level Synthesis for Task-Parallel Programs. In 2021 IEEE 29th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM). IEEE, 204--213","author":"Chi Yuze","year":"2021","unstructured":"Yuze Chi , Licheng Guo , Jason Lau , Young-kyu Choi, Jie Wang , and Jason Cong . 2021 . Extending High-Level Synthesis for Task-Parallel Programs. In 2021 IEEE 29th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM). IEEE, 204--213 . Yuze Chi, Licheng Guo, Jason Lau, Young-kyu Choi, Jie Wang, and Jason Cong. 2021. Extending High-Level Synthesis for Task-Parallel Programs. In 2021 IEEE 29th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM). IEEE, 204--213."},{"key":"e_1_3_2_2_15_1","volume-title":"HBM Connect: High-Performance HLS Interconnect for FPGA HBM. In The 2021 ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays. 116--126","author":"Chi Yuze","year":"2021","unstructured":"Young-kyu Choi, Yuze Chi , Weikang Qiao , Nikola Samardzic , and Jason Cong . 2021 . HBM Connect: High-Performance HLS Interconnect for FPGA HBM. In The 2021 ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays. 116--126 . Young-kyu Choi, Yuze Chi, Weikang Qiao, Nikola Samardzic, and Jason Cong. 2021. HBM Connect: High-Performance HLS Interconnect for FPGA HBM. 
In The 2021 ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays. 116--126."},{"key":"e_1_3_2_2_16_1","volume-title":"When HLS Meets FPGA HBM: Benchmarking and Bandwidth Optimization . arXiv preprint arXiv:2010.06075","author":"Chi Yuze","year":"2020","unstructured":"Young-kyu Choi, Yuze Chi , Jie Wang , Licheng Guo , and Jason Cong . 2020. When HLS Meets FPGA HBM: Benchmarking and Bandwidth Optimization . arXiv preprint arXiv:2010.06075 ( 2020 ). Young-kyu Choi, Yuze Chi, Jie Wang, Licheng Guo, and Jason Cong. 2020. When HLS Meets FPGA HBM: Benchmarking and Bandwidth Optimization . arXiv preprint arXiv:2010.06075 (2020)."},{"key":"e_1_3_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1039\/C8SC04228D"},{"key":"e_1_3_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2018.2876372"},{"key":"e_1_3_2_2_19_1","volume-title":"Accelerator-Rich Architectures: Opportunities and Progresses. In 2014 51st ACM\/EDAC\/IEEE Design Automation Conference (DAC). IEEE, 1--6.","author":"Cong Jason","year":"2014","unstructured":"Jason Cong , Mohammad Ali Ghodrat , Michael Gill , Beayna Grigorian , Karthik Gururaj , and Glenn Reinman . 2014 . Accelerator-Rich Architectures: Opportunities and Progresses. In 2014 51st ACM\/EDAC\/IEEE Design Automation Conference (DAC). IEEE, 1--6. Jason Cong, Mohammad Ali Ghodrat, Michael Gill, Beayna Grigorian, Karthik Gururaj, and Glenn Reinman. 2014. Accelerator-Rich Architectures: Opportunities and Progresses. In 2014 51st ACM\/EDAC\/IEEE Design Automation Conference (DAC). IEEE, 1--6."},{"key":"e_1_3_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/3240765.3240838"},{"key":"e_1_3_2_2_21_1","volume-title":"Latte: Locality Aware Transformation for High-Level Synthesis. In 2018 IEEE 26th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM). IEEE, 125--128","author":"Cong Jason","year":"2018","unstructured":"Jason Cong , Peng Wei , Cody Hao Yu , and Peipei Zhou . 2018 b. 
Latte: Locality Aware Transformation for High-Level Synthesis. In 2018 IEEE 26th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM). IEEE, 125--128 . Jason Cong, Peng Wei, Cody Hao Yu, and Peipei Zhou. 2018b. Latte: Locality Aware Transformation for High-Level Synthesis. In 2018 IEEE 26th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM). IEEE, 125--128."},{"key":"e_1_3_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/2847263.2847339"},{"key":"e_1_3_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2018.2821565"},{"key":"e_1_3_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/3361682"},{"key":"e_1_3_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/2049662.2049670"},{"key":"e_1_3_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/3373087.3375296"},{"key":"e_1_3_2_2_27_1","volume-title":"FBLAS: Streaming Linear Algebra on FPGA. In SC20: International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, 1--13","author":"Matteis Tiziano De","year":"2020","unstructured":"Tiziano De Matteis , Johannes de Fine Licht , and Torsten Hoefler . 2020 . FBLAS: Streaming Linear Algebra on FPGA. In SC20: International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, 1--13 . Tiziano De Matteis, Johannes de Fine Licht, and Torsten Hoefler. 2020. FBLAS: Streaming Linear Algebra on FPGA. In SC20: International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, 1--13."},{"key":"e_1_3_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/3289602.3293904"},{"key":"e_1_3_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.5555\/2650280.2650344"},{"key":"e_1_3_2_2_30_1","volume-title":"GenAx: A Genome Sequencing Accelerator. In 2018 ACM\/IEEE 45th Annual International Symposium on Computer Architecture (ISCA). 
IEEE, 69--82","author":"Fujiki Daichi","year":"2018","unstructured":"Daichi Fujiki , Arun Subramaniyan , Tianjun Zhang , Yu Zeng , Reetuparna Das , David Blaauw , and Satish Narayanasamy . 2018 . GenAx: A Genome Sequencing Accelerator. In 2018 ACM\/IEEE 45th Annual International Symposium on Computer Architecture (ISCA). IEEE, 69--82 . Daichi Fujiki, Arun Subramaniyan, Tianjun Zhang, Yu Zeng, Reetuparna Das, David Blaauw, and Satish Narayanasamy. 2018. GenAx: A Genome Sequencing Accelerator. In 2018 ACM\/IEEE 45th Annual International Symposium on Computer Architecture (ISCA). IEEE, 69--82."},{"key":"e_1_3_2_2_31_1","volume-title":"Sparse GPU Kernels for Deep Learning. In SC20: International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, 1--14","author":"Gale Trevor","year":"2020","unstructured":"Trevor Gale , Matei Zaharia , Cliff Young , and Erich Elsen . 2020 . Sparse GPU Kernels for Deep Learning. In SC20: International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, 1--14 . Trevor Gale, Matei Zaharia, Cliff Young, and Erich Elsen. 2020. Sparse GPU Kernels for Deep Learning. In SC20: International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, 1--14."},{"key":"e_1_3_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/DAC18074.2021.9586216"},{"key":"e_1_3_2_2_33_1","volume-title":"AWB-GCN: A Graph Convolutional Network Accelerator with Runtime Workload Rebalancing. In 2020 53rd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO). IEEE, 922--936","author":"Geng Tong","year":"2020","unstructured":"Tong Geng , Ang Li , Runbin Shi , Chunshu Wu , Tianqi Wang , Yanfei Li , Pouya Haghi , Antonino Tumeo , Shuai Che , Steve Reinhardt , 2020 . AWB-GCN: A Graph Convolutional Network Accelerator with Runtime Workload Rebalancing. In 2020 53rd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO). IEEE, 922--936 . 
Tong Geng, Ang Li, Runbin Shi, Chunshu Wu, Tianqi Wang, Yanfei Li, Pouya Haghi, Antonino Tumeo, Shuai Che, Steve Reinhardt, et al. 2020. AWB-GCN: A Graph Convolutional Network Accelerator with Runtime Workload Rebalancing. In 2020 53rd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO). IEEE, 922--936."},{"key":"e_1_3_2_2_34_1","unstructured":"Licheng Guo Yuze Chi Jie Wang Jason Lau Weikang Qiao Ecenur Ustun Zhiru Zhang and Jason Cong. 2021. AutoBridge: Coupling Coarse-Grained Floorplanning and Pipelining for High-Frequency HLS Design on Multi-Die FPGAs. In The 2021 ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays. 81--92.  Licheng Guo Yuze Chi Jie Wang Jason Lau Weikang Qiao Ecenur Ustun Zhiru Zhang and Jason Cong. 2021. AutoBridge: Coupling Coarse-Grained Floorplanning and Pipelining for High-Frequency HLS Design on Multi-Die FPGAs. In The 2021 ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays. 81--92."},{"key":"e_1_3_2_2_35_1","unstructured":"Licheng Guo Jason Lau Zhenyuan Ruan Peng Wei and Jason Cong. 2019. Hardware Acceleration of Long Read Pairwise Overlapping in Genome Sequencing: A Race Between FPGA and GPU. In 2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM). IEEE 127--135.  Licheng Guo Jason Lau Zhenyuan Ruan Peng Wei and Jason Cong. 2019. Hardware Acceleration of Long Read Pairwise Overlapping in Genome Sequencing: A Race Between FPGA and GPU. In 2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM). IEEE 127--135."},{"key":"e_1_3_2_2_36_1","volume-title":"Graphicionado: A High-Performance and Energy-Efficient Accelerator for Graph Analytics. In 2016 49th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO). IEEE, 1--13","author":"Ham Tae Jun","year":"2016","unstructured":"Tae Jun Ham , Lisa Wu , Narayanan Sundaram , Nadathur Satish , and Margaret Martonosi . 2016 . 
Graphicionado: A High-Performance and Energy-Efficient Accelerator for Graph Analytics. In 2016 49th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO). IEEE, 1--13 . Tae Jun Ham, Lisa Wu, Narayanan Sundaram, Nadathur Satish, and Margaret Martonosi. 2016. Graphicionado: A High-Performance and Energy-Efficient Accelerator for Graph Analytics. In 2016 49th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO). IEEE, 1--13."},{"key":"e_1_3_2_2_37_1","volume-title":"Proceedings of the 31st International Conference on Neural Information Processing Systems. 1025--1035","author":"Hamilton William L.","year":"2017","unstructured":"William L. Hamilton , Rex Ying , and Jure Leskovec . 2017 . Inductive Representation Learning on Large Graphs . In Proceedings of the 31st International Conference on Neural Information Processing Systems. 1025--1035 . William L. Hamilton, Rex Ying, and Jure Leskovec. 2017. Inductive Representation Learning on Large Graphs. In Proceedings of the 31st International Conference on Neural Information Processing Systems. 1025--1035."},{"key":"e_1_3_2_2_38_1","volume-title":"EIE: Efficient Inference Engine on Compressed Deep Neural Network. In 2016 ACM\/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA). IEEE Computer Society, 243--254","author":"Han Song","unstructured":"Song Han , Xingyu Liu , Huizi Mao , Jing Pu , Ardavan Pedram , Mark A. Horowitz , and William J. Dally . 2016 . EIE: Efficient Inference Engine on Compressed Deep Neural Network. In 2016 ACM\/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA). IEEE Computer Society, 243--254 . Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark A. Horowitz, and William J. Dally. 2016. EIE: Efficient Inference Engine on Compressed Deep Neural Network. In 2016 ACM\/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA). 
IEEE Computer Society, 243--254."},{"key":"e_1_3_2_2_39_1","volume-title":"Dally","author":"Han Song","year":"2015","unstructured":"Song Han , Huizi Mao , and William J . Dally . 2015 a. Deep Compression : Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding . arXiv preprint arXiv:1510.00149 (2015). Song Han, Huizi Mao, and William J. Dally. 2015a. Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding. arXiv preprint arXiv:1510.00149 (2015)."},{"key":"e_1_3_2_2_40_1","volume-title":"Proceedings of the 28th International Conference on Neural Information Processing Systems-Volume 1 . 1135--1143","author":"Han Song","unstructured":"Song Han , Jeff Pool , John Tran , and William J. Dally . 2015b. Learning both Weights and Connections for Efficient Neural Networks . In Proceedings of the 28th International Conference on Neural Information Processing Systems-Volume 1 . 1135--1143 . Song Han, Jeff Pool, John Tran, and William J. Dally. 2015b. Learning both Weights and Connections for Efficient Neural Networks. In Proceedings of the 28th International Conference on Neural Information Processing Systems-Volume 1 . 1135--1143."},{"key":"e_1_3_2_2_41_1","volume-title":"Proceedings of the 52nd Annual IEEE\/ACM International Symposium on Microarchitecture . 319--333","author":"Hegde Kartik","unstructured":"Kartik Hegde , Hadi Asghari-Moghaddam , Michael Pellauer , Neal Crago , Aamer Jaleel , Edgar Solomonik , Joel Emer , and Christopher W. Fletcher . 2019. ExTensor: An Accelerator for Sparse Tensor Algebra . In Proceedings of the 52nd Annual IEEE\/ACM International Symposium on Microarchitecture . 319--333 . Kartik Hegde, Hadi Asghari-Moghaddam, Michael Pellauer, Neal Crago, Aamer Jaleel, Edgar Solomonik, Joel Emer, and Christopher W. Fletcher. 2019. ExTensor: An Accelerator for Sparse Tensor Algebra. In Proceedings of the 52nd Annual IEEE\/ACM International Symposium on Microarchitecture . 
319--333."},{"key":"e_1_3_2_2_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3282307"},{"key":"e_1_3_2_2_43_1","volume-title":"SPAGHETTI: Streaming Accelerators for Highly Sparse GEMM on FPGAs. In 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA). IEEE, 84--96","author":"Hojabr Reza","year":"2021","unstructured":"Reza Hojabr , Ali Sedaghati , Amirali Sharifian , Ahmad Khonsari , and Arrvindh Shriraman . 2021 . SPAGHETTI: Streaming Accelerators for Highly Sparse GEMM on FPGAs. In 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA). IEEE, 84--96 . Reza Hojabr, Ali Sedaghati, Amirali Sharifian, Ahmad Khonsari, and Arrvindh Shriraman. 2021. SPAGHETTI: Streaming Accelerators for Highly Sparse GEMM on FPGAs. In 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA). IEEE, 84--96."},{"key":"e_1_3_2_2_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/3458817.3476177"},{"key":"e_1_3_2_2_45_1","volume-title":"GraphLily: Accelerating Graph Linear Algebra on HBM-Equipped FPGAs. In 2021 IEEE\/ACM International Conference on Computer-Aided Design (ICCAD). IEEE, 1--9.","author":"Hu Yuwei","year":"2021","unstructured":"Yuwei Hu , Yixiao Du , Ecenur Ustun , and Zhiru Zhang . 2021 . GraphLily: Accelerating Graph Linear Algebra on HBM-Equipped FPGAs. In 2021 IEEE\/ACM International Conference on Computer-Aided Design (ICCAD). IEEE, 1--9. Yuwei Hu, Yixiao Du, Ecenur Ustun, and Zhiru Zhang. 2021. GraphLily: Accelerating Graph Linear Algebra on HBM-Equipped FPGAs. In 2021 IEEE\/ACM International Conference on Computer-Aided Design (ICCAD). IEEE, 1--9."},{"key":"e_1_3_2_2_46_1","volume-title":"SC20: International Conference for High Performance Computing, Networking, Storage and Analysis","author":"Huang Guyue","unstructured":"Guyue Huang , Guohao Dai , Yu Wang , and Huazhong Yang . 2020. GE-SpMM: General-Purpose Sparse Matrix-Matrix Multiplication on GPUs for Graph Neural Networks . 
In SC20: International Conference for High Performance Computing, Networking, Storage and Analysis . IEEE , 1--12. Guyue Huang, Guohao Dai, Yu Wang, and Huazhong Yang. 2020. GE-SpMM: General-Purpose Sparse Matrix-Matrix Multiplication on GPUs for Graph Neural Networks. In SC20: International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, 1--12."},{"key":"e_1_3_2_2_47_1","volume-title":"Shuhai: A Tool for Benchmarking High Bandwidth Memory on FPGAs","author":"Huang Hongjing","year":"2021","unstructured":"Hongjing Huang , Zeke Wang , Jie Zhang , Zhenhao He , Chao Wu , Jun Xiao , and Gustavo Alonso . 2021 . Shuhai: A Tool for Benchmarking High Bandwidth Memory on FPGAs . IEEE Trans. Comput . (2021). Hongjing Huang, Zeke Wang, Jie Zhang, Zhenhao He, Chao Wu, Jun Xiao, and Gustavo Alonso. 2021. Shuhai: A Tool for Benchmarking High Bandwidth Memory on FPGAs. IEEE Trans. Comput. (2021)."},{"key":"e_1_3_2_2_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/3352460.3358329"},{"key":"e_1_3_2_2_49_1","volume-title":"Terabyte Sort on FPGA-Accelerated Flash Storage. In 2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM). IEEE, 17--24","author":"Jun Sang-Woo","year":"2017","unstructured":"Sang-Woo Jun , Shuotao Xu , 2017 . Terabyte Sort on FPGA-Accelerated Flash Storage. In 2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM). IEEE, 17--24 . Sang-Woo Jun, Shuotao Xu, et al. 2017. Terabyte Sort on FPGA-Accelerated Flash Storage. In 2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM). IEEE, 17--24."},{"key":"e_1_3_2_2_50_1","volume-title":"Kipf and Max Welling","author":"Thomas","year":"2016","unstructured":"Thomas N. Kipf and Max Welling . 2016 . Semi-Supervised Classification with Graph Convolutional Networks . arXiv preprint arXiv:1609.02907 (2016). Thomas N. Kipf and Max Welling. 2016. 
Semi-Supervised Classification with Graph Convolutional Networks. arXiv preprint arXiv:1609.02907 (2016)."},{"key":"e_1_3_2_2_51_1","unstructured":"Jure Leskovec and Andrej Krevl. 2014. SNAP Datasets: Stanford Large Network Dataset Collection. http:\/\/snap.stanford.edu\/data .  Jure Leskovec and Andrej Krevl. 2014. SNAP Datasets: Stanford Large Network Dataset Collection. http:\/\/snap.stanford.edu\/data ."},{"key":"e_1_3_2_2_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/2751205.2751209"},{"key":"e_1_3_2_2_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/3307650.3322275"},{"key":"e_1_3_2_2_54_1","volume-title":"CUSPARSE Library. In GPU Technology Conference .","author":"Naumov Maxim","year":"2010","unstructured":"Maxim Naumov , L. Chien , Philippe Vandermersch , and Ujval Kapasi . 2010 . CUSPARSE Library. In GPU Technology Conference . Maxim Naumov, L. Chien, Philippe Vandermersch, and Ujval Kapasi. 2010. CUSPARSE Library. In GPU Technology Conference ."},{"key":"e_1_3_2_2_55_1","volume-title":"Jianyu Huang, Narayanan Sundaraman, Jongsoo Park, Xiaodong Wang, Udit Gupta, Carole-Jean Wu, Alisson G Azzolini, et al.","author":"Naumov Maxim","year":"2019","unstructured":"Maxim Naumov , Dheevatsa Mudigere , Hao-Jun Michael Shi , Jianyu Huang, Narayanan Sundaraman, Jongsoo Park, Xiaodong Wang, Udit Gupta, Carole-Jean Wu, Alisson G Azzolini, et al. 2019 . Deep Learning Recommendation Model for Personalization and Recommendation Systems . arXiv preprint arXiv:1906.00091 (2019). Maxim Naumov, Dheevatsa Mudigere, Hao-Jun Michael Shi, Jianyu Huang, Narayanan Sundaraman, Jongsoo Park, Xiaodong Wang, Udit Gupta, Carole-Jean Wu, Alisson G Azzolini, et al. 2019. Deep Learning Recommendation Model for Personalization and Recommendation Systems. arXiv preprint arXiv:1906.00091 (2019)."},{"key":"e_1_3_2_2_56_1","volume-title":"OuterSPACE: An Outer Product Based Sparse Matrix Multiplication Accelerator. 
In 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA) . IEEE, 724--736","author":"Pal Subhankar","year":"2018","unstructured":"Subhankar Pal , Jonathan Beaumont , Dong-Hyeon Park , Aporva Amarnath , Siying Feng , Chaitali Chakrabarti , Hun-Seok Kim , David Blaauw , Trevor Mudge , and Ronald Dreslinski . 2018 . OuterSPACE: An Outer Product Based Sparse Matrix Multiplication Accelerator. In 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA) . IEEE, 724--736 . Subhankar Pal, Jonathan Beaumont, Dong-Hyeon Park, Aporva Amarnath, Siying Feng, Chaitali Chakrabarti, Hun-Seok Kim, David Blaauw, Trevor Mudge, and Ronald Dreslinski. 2018. OuterSPACE: An Outer Product Based Sparse Matrix Multiplication Accelerator. In 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA) . IEEE, 724--736."},{"key":"e_1_3_2_2_57_1","volume-title":"FANS: FPGA-Accelerated Near-Storage Sorting. In 2021 IEEE 29th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM). IEEE, 106--114","author":"Qiao Weikang","year":"2021","unstructured":"Weikang Qiao , Jihun Oh , Licheng Guo , Mau-Chung Frank Chang , and Jason Cong . 2021 . FANS: FPGA-Accelerated Near-Storage Sorting. In 2021 IEEE 29th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM). IEEE, 106--114 . Weikang Qiao, Jihun Oh, Licheng Guo, Mau-Chung Frank Chang, and Jason Cong. 2021. FANS: FPGA-Accelerated Near-Storage Sorting. In 2021 IEEE 29th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM). IEEE, 106--114."},{"key":"e_1_3_2_2_58_1","volume-title":"SIGMA: A Sparse and Irregular GEMM Accelerator with Flexible Interconnects for DNN Training. In 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA). 
IEEE, 58--70","author":"Qin Eric","year":"2020","unstructured":"Eric Qin , Ananda Samajdar , Hyoukjun Kwon , Vineet Nadella , Sudarshan Srinivasan , Dipankar Das , Bharat Kaul , and Tushar Krishna . 2020 . SIGMA: A Sparse and Irregular GEMM Accelerator with Flexible Interconnects for DNN Training. In 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 58--70 . Eric Qin, Ananda Samajdar, Hyoukjun Kwon, Vineet Nadella, Sudarshan Srinivasan, Dipankar Das, Bharat Kaul, and Tushar Krishna. 2020. SIGMA: A Sparse and Irregular GEMM Accelerator with Flexible Interconnects for DNN Training. In 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 58--70."},{"key":"e_1_3_2_2_59_1","doi-asserted-by":"publisher","DOI":"10.1145\/2517349.2522740"},{"key":"e_1_3_2_2_60_1","volume-title":"Bonsai: High-Performance Adaptive Merge Tree Sorting. In 2020 ACM\/IEEE 47th Annual International Symposium on Computer Architecture (ISCA). IEEE, 282--294","author":"Samardzic Nikola","year":"2020","unstructured":"Nikola Samardzic , Weikang Qiao , Vaibhav Aggarwal , Mau-Chung Frank Chang , and Jason Cong . 2020 . Bonsai: High-Performance Adaptive Merge Tree Sorting. In 2020 ACM\/IEEE 47th Annual International Symposium on Computer Architecture (ISCA). IEEE, 282--294 . Nikola Samardzic, Weikang Qiao, Vaibhav Aggarwal, Mau-Chung Frank Chang, and Jason Cong. 2020. Bonsai: High-Performance Adaptive Merge Tree Sorting. In 2020 ACM\/IEEE 47th Annual International Symposium on Computer Architecture (ISCA). IEEE, 282--294."},{"key":"e_1_3_2_2_61_1","volume-title":"Multi-Chip-Module-Based Architecture. In Proceedings of the 52nd Annual IEEE\/ACM International Symposium on Microarchitecture. 
14--27","author":"Shao Yakun Sophia","year":"2019","unstructured":"Yakun Sophia Shao , Jason Clemons , Rangharajan Venkatesan , Brian Zimmer , Matthew Fojtik , Nan Jiang , Ben Keller , Alicia Klinefelter , Nathaniel Pinckney , Priyanka Raina , 2019 . Simba: Scaling Deep-Learning Inference with Multi-Chip-Module-Based Architecture. In Proceedings of the 52nd Annual IEEE\/ACM International Symposium on Microarchitecture. 14--27 . Yakun Sophia Shao, Jason Clemons, Rangharajan Venkatesan, Brian Zimmer, Matthew Fojtik, Nan Jiang, Ben Keller, Alicia Klinefelter, Nathaniel Pinckney, Priyanka Raina, et al. 2019. Simba: Scaling Deep-Learning Inference with Multi-Chip-Module-Based Architecture. In Proceedings of the 52nd Annual IEEE\/ACM International Symposium on Microarchitecture. 14--27."},{"key":"e_1_3_2_2_62_1","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2017.47"},{"key":"e_1_3_2_2_63_1","volume-title":"Maximizing CNN Accelerator Efficiency Through Resource Partitioning. In 2017 ACM\/IEEE 44th Annual International Symposium on Computer Architecture (ISCA). IEEE, 535--547","author":"Shen Yongming","year":"2017","unstructured":"Yongming Shen , Michael Ferdman , and Peter Milder . 2017 b. Maximizing CNN Accelerator Efficiency Through Resource Partitioning. In 2017 ACM\/IEEE 44th Annual International Symposium on Computer Architecture (ISCA). IEEE, 535--547 . Yongming Shen, Michael Ferdman, and Peter Milder. 2017b. Maximizing CNN Accelerator Efficiency Through Resource Partitioning. In 2017 ACM\/IEEE 44th Annual International Symposium on Computer Architecture (ISCA). IEEE, 535--547."},{"key":"e_1_3_2_2_64_1","volume-title":"AccPar: Tensor Partitioning for Heterogeneous Deep Learning Accelerators. In 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 342--355","author":"Song Linghao","year":"2020","unstructured":"Linghao Song , Fan Chen , Youwei Zhuo , Xuehai Qian , Hai Li , and Yiran Chen . 2020 . 
AccPar: Tensor Partitioning for Heterogeneous Deep Learning Accelerators. In 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 342--355 . Linghao Song, Fan Chen, Youwei Zhuo, Xuehai Qian, Hai Li, and Yiran Chen. 2020. AccPar: Tensor Partitioning for Heterogeneous Deep Learning Accelerators. In 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 342--355."},{"key":"e_1_3_2_2_65_1","volume-title":"HyPar: Towards Hybrid Parallelism for Deep Learning Accelerator Array. In 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA) . IEEE, 56--68","author":"Song Linghao","year":"2019","unstructured":"Linghao Song , Jiachen Mao , Youwei Zhuo , Xuehai Qian , Hai Li , and Yiran Chen . 2019 . HyPar: Towards Hybrid Parallelism for Deep Learning Accelerator Array. In 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA) . IEEE, 56--68 . Linghao Song, Jiachen Mao, Youwei Zhuo, Xuehai Qian, Hai Li, and Yiran Chen. 2019. HyPar: Towards Hybrid Parallelism for Deep Learning Accelerator Array. In 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA) . IEEE, 56--68."},{"key":"e_1_3_2_2_66_1","volume-title":"PipeLayer: A Pipelined ReRAM-Based Accelerator for Deep Learning. In 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 541--552","author":"Song Linghao","year":"2017","unstructured":"Linghao Song , Xuehai Qian , Hai Li , and Yiran Chen . 2017 . PipeLayer: A Pipelined ReRAM-Based Accelerator for Deep Learning. In 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 541--552 . Linghao Song, Xuehai Qian, Hai Li, and Yiran Chen. 2017. PipeLayer: A Pipelined ReRAM-Based Accelerator for Deep Learning. In 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA). 
IEEE, 541--552."},{"key":"e_1_3_2_2_67_1","volume-title":"GraphR: Accelerating Graph Processing Using ReRAM. In 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 531--543","author":"Song Linghao","year":"2018","unstructured":"Linghao Song , Youwei Zhuo , Xuehai Qian , Hai Li , and Yiran Chen . 2018 . GraphR: Accelerating Graph Processing Using ReRAM. In 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 531--543 . Linghao Song, Youwei Zhuo, Xuehai Qian, Hai Li, and Yiran Chen. 2018. GraphR: Accelerating Graph Processing Using ReRAM. In 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 531--543."},{"key":"e_1_3_2_2_68_1","volume-title":"Tensaurus: A Versatile Accelerator for Mixed Sparse-Dense Tensor Computations. In 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA) . IEEE, 689--702","author":"Srivastava Nitish","year":"2020","unstructured":"Nitish Srivastava , Hanchen Jin , Shaden Smith , Hongbo Rong , David Albonesi , and Zhiru Zhang . 2020 . Tensaurus: A Versatile Accelerator for Mixed Sparse-Dense Tensor Computations. In 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA) . IEEE, 689--702 . Nitish Srivastava, Hanchen Jin, Shaden Smith, Hongbo Rong, David Albonesi, and Zhiru Zhang. 2020. Tensaurus: A Versatile Accelerator for Mixed Sparse-Dense Tensor Computations. In 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA) . 
IEEE, 689--702."},{"key":"e_1_3_2_2_69_1","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2019.00033"},{"key":"e_1_3_2_2_70_1","doi-asserted-by":"publisher","DOI":"10.1109\/MCSE.2010.69"},{"key":"e_1_3_2_2_71_1","doi-asserted-by":"publisher","DOI":"10.1145\/2304576.2304624"},{"key":"e_1_3_2_2_72_1","doi-asserted-by":"publisher","DOI":"10.1147\/rd.111.0025"},{"key":"e_1_3_2_2_73_1","volume-title":"Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems. 199--213","author":"Turakhia Yatish","unstructured":"Yatish Turakhia , Gill Bejerano , and William J. Dally . 2018. Darwin: A Genomics Co-processor Provides up to 15,000 X Acceleration on Long Read Assembly . In Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems. 199--213 . Yatish Turakhia, Gill Bejerano, and William J. Dally. 2018. Darwin: A Genomics Co-processor Provides up to 15,000 X Acceleration on Long Read Assembly. In Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems. 199--213."},{"key":"e_1_3_2_2_74_1","volume-title":"AutoSA: A Polyhedral Compiler for High-Performance Systolic Arrays on FPGA. In The 2021 ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays . 93--104","author":"Wang Jie","year":"2021","unstructured":"Jie Wang , Licheng Guo , and Jason Cong . 2021 b . AutoSA: A Polyhedral Compiler for High-Performance Systolic Arrays on FPGA. In The 2021 ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays . 93--104 . Jie Wang, Licheng Guo, and Jason Cong. 2021 b. AutoSA: A Polyhedral Compiler for High-Performance Systolic Arrays on FPGA. In The 2021 ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays . 
93--104."},{"key":"e_1_3_2_2_75_1","volume-title":"15th USENIX Symposium on Operating Systems Design and Implementation (OSDI 21). 515--531.","author":"Wang Yuke","unstructured":"Yuke Wang , Boyuan Feng , Gushu Li , Shuangchen Li , Lei Deng , Yuan Xie , and Yufei Ding . 2021 a. GNNAdvisor: An Adaptive and Efficient Runtime System for GNN Acceleration on GPUs . In 15th USENIX Symposium on Operating Systems Design and Implementation (OSDI 21). 515--531. Yuke Wang, Boyuan Feng, Gushu Li, Shuangchen Li, Lei Deng, Yuan Xie, and Yufei Ding. 2021 a. GNNAdvisor: An Adaptive and Efficient Runtime System for GNN Acceleration on GPUs. In 15th USENIX Symposium on Operating Systems Design and Implementation (OSDI 21). 515--531."},{"key":"e_1_3_2_2_76_1","volume-title":"Proceedings of the 30th International Conference on Neural Information Processing Systems . 2082--2090","author":"Wen Wei","year":"2016","unstructured":"Wei Wen , Chunpeng Wu , Yandan Wang , Yiran Chen , and Hai Li . 2016 . Learning Structured Sparsity in Deep Neural Networks . In Proceedings of the 30th International Conference on Neural Information Processing Systems . 2082--2090 . Wei Wen, Chunpeng Wu, Yandan Wang, Yiran Chen, and Hai Li. 2016. Learning Structured Sparsity in Deep Neural Networks. In Proceedings of the 30th International Conference on Neural Information Processing Systems . 2082--2090."},{"key":"e_1_3_2_2_77_1","volume-title":"Optimization of Sparse Matrix-Vector Multiplication on Emerging Multicore Platforms. In SC'07: Proceedings of the 2007 ACM\/IEEE Conference on Supercomputing. IEEE, 1--12","author":"Williams Samuel","year":"2007","unstructured":"Samuel Williams , Leonid Oliker , Richard Vuduc , John Shalf , Katherine Yelick , and James Demmel . 2007 . Optimization of Sparse Matrix-Vector Multiplication on Emerging Multicore Platforms. In SC'07: Proceedings of the 2007 ACM\/IEEE Conference on Supercomputing. IEEE, 1--12 . 
Samuel Williams, Leonid Oliker, Richard Vuduc, John Shalf, Katherine Yelick, and James Demmel. 2007. Optimization of Sparse Matrix-Vector Multiplication on Emerging Multicore Platforms. In SC'07: Proceedings of the 2007 ACM\/IEEE Conference on Supercomputing. IEEE, 1--12."},{"key":"e_1_3_2_2_78_1","volume-title":"FPGA Accelerated INDEL Realignment in the Cloud. In 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 277--290","author":"Wu Lisa","year":"2019","unstructured":"Lisa Wu , David Bruns-Smith , Frank A. Nothaft , Qijing Huang , Sagar Karandikar , Johnny Le , Andrew Lin , Howard Mao , Brendan Sweeney , Krste Asanovi\u0107 , 2019 . FPGA Accelerated INDEL Realignment in the Cloud. In 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 277--290 . Lisa Wu, David Bruns-Smith, Frank A. Nothaft, Qijing Huang, Sagar Karandikar, Johnny Le, Andrew Lin, Howard Mao, Brendan Sweeney, Krste Asanovi\u0107, et al. 2019. FPGA Accelerated INDEL Realignment in the Cloud. In 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 277--290."},{"key":"e_1_3_2_2_79_1","volume-title":"SpaceA: Sparse Matrix Vector Multiplication on Processing-in-Memory Accelerator. In 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA) . IEEE, 570--583","author":"Xie Xinfeng","year":"2021","unstructured":"Xinfeng Xie , Zheng Liang , Peng Gu , Abanti Basak , Lei Deng , Ling Liang , Xing Hu , and Yuan Xie . 2021 . SpaceA: Sparse Matrix Vector Multiplication on Processing-in-Memory Accelerator. In 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA) . IEEE, 570--583 . Xinfeng Xie, Zheng Liang, Peng Gu, Abanti Basak, Lei Deng, Ling Liang, Xing Hu, and Yuan Xie. 2021. SpaceA: Sparse Matrix Vector Multiplication on Processing-in-Memory Accelerator. 
In 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA) . IEEE, 570--583."},{"key":"e_1_3_2_2_80_1","volume-title":"International Conference on Learning Representations .","author":"Xu Keyulu","year":"2018","unstructured":"Keyulu Xu , Weihua Hu , Jure Leskovec , and Stefanie Jegelka . 2018 . How Powerful are Graph Neural Networks? . In International Conference on Learning Representations . Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. 2018. How Powerful are Graph Neural Networks?. In International Conference on Learning Representations ."},{"key":"e_1_3_2_2_81_1","volume-title":"Modeling and Discovering Vulnerabilities with Code Property Graphs. In 2014 IEEE Symposium on Security and Privacy. IEEE, 590--604","author":"Yamaguchi Fabian","year":"2014","unstructured":"Fabian Yamaguchi , Nico Golde , Daniel Arp , and Konrad Rieck . 2014 . Modeling and Discovering Vulnerabilities with Code Property Graphs. In 2014 IEEE Symposium on Security and Privacy. IEEE, 590--604 . Fabian Yamaguchi, Nico Golde, Daniel Arp, and Konrad Rieck. 2014. Modeling and Discovering Vulnerabilities with Code Property Graphs. In 2014 IEEE Symposium on Security and Privacy. IEEE, 590--604."},{"key":"e_1_3_2_2_82_1","volume-title":"HyGCN: A GCN Accelerator with Hybrid Architecture. In 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 15--29","author":"Yan Mingyu","year":"2020","unstructured":"Mingyu Yan , Lei Deng , Xing Hu , Ling Liang , Yujing Feng , Xiaochun Ye , Zhimin Zhang , Dongrui Fan , and Yuan Xie . 2020 . HyGCN: A GCN Accelerator with Hybrid Architecture. In 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 15--29 . Mingyu Yan, Lei Deng, Xing Hu, Ling Liang, Yujing Feng, Xiaochun Ye, Zhimin Zhang, Dongrui Fan, and Yuan Xie. 2020. HyGCN: A GCN Accelerator with Hybrid Architecture. In 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA). 
IEEE, 15--29."},{"key":"e_1_3_2_2_83_1","volume-title":"Owens","author":"Yang Carl","year":"2018","unstructured":"Carl Yang , Aydin Bulu\u00e7 , and John D . Owens . 2018 . Design Principles for Sparse Matrix Multiplication on the GPU. In European Conference on Parallel Processing. Springer , 672--687. Carl Yang, Aydin Bulu\u00e7, and John D. Owens. 2018. Design Principles for Sparse Matrix Multiplication on the GPU. In European Conference on Parallel Processing. Springer, 672--687."},{"key":"e_1_3_2_2_84_1","doi-asserted-by":"publisher","DOI":"10.1145\/2684746.2689060"},{"key":"e_1_3_2_2_85_1","doi-asserted-by":"publisher","DOI":"10.1145\/3020078.3021698"},{"key":"e_1_3_2_2_86_1","volume-title":"GraphP: Reducing Communication for PIM-Based Graph Processing with Efficient Data Partition. In 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 544--557","author":"Zhang Mingxing","year":"2018","unstructured":"Mingxing Zhang , Youwei Zhuo , Chao Wang , Mingyu Gao , Yongwei Wu , Kang Chen , Christos Kozyrakis , and Xuehai Qian . 2018 . GraphP: Reducing Communication for PIM-Based Graph Processing with Efficient Data Partition. In 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 544--557 . Mingxing Zhang, Youwei Zhuo, Chao Wang, Mingyu Gao, Yongwei Wu, Kang Chen, Christos Kozyrakis, and Xuehai Qian. 2018. GraphP: Reducing Communication for PIM-Based Graph Processing with Efficient Data Partition. In 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 544--557."},{"key":"e_1_3_2_2_87_1","volume-title":"SpArch: Efficient Architecture for Sparse Matrix Multiplication. In 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 261--274","author":"Zhang Zhekai","unstructured":"Zhekai Zhang , Hanrui Wang , Song Han , and William J. Dally . 2020 . SpArch: Efficient Architecture for Sparse Matrix Multiplication. 
In 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 261--274 . Zhekai Zhang, Hanrui Wang, Song Han, and William J. Dally. 2020. SpArch: Efficient Architecture for Sparse Matrix Multiplication. In 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 261--274."},{"key":"e_1_3_2_2_88_1","volume-title":"High-Throughput and Energy-Efficient Graph Processing on FPGA. In 2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM). IEEE, 103--110","author":"Zhou Shijie","unstructured":"Shijie Zhou , Charalampos Chelmis , and Viktor K. Prasanna . 2016 . High-Throughput and Energy-Efficient Graph Processing on FPGA. In 2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM). IEEE, 103--110 . Shijie Zhou, Charalampos Chelmis, and Viktor K. Prasanna. 2016. High-Throughput and Energy-Efficient Graph Processing on FPGA. In 2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM). IEEE, 103--110."},{"key":"e_1_3_2_2_89_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2019.2910068"},{"key":"e_1_3_2_2_90_1","doi-asserted-by":"publisher","DOI":"10.1145\/3352460.3358269"},{"key":"e_1_3_2_2_91_1","unstructured":"Xiaowei Zhu, Wentao Han, and Wenguang Chen. 2015. GridGraph: Large-Scale Graph Processing on a Single Machine Using 2-Level Hierarchical Partitioning. In 2015 USENIX Annual Technical Conference (USENIX ATC 15). 375--386. Xiaowei Zhu, Wentao Han, and Wenguang Chen. 2015. GridGraph: Large-Scale Graph Processing on a Single Machine Using 2-Level Hierarchical Partitioning. In 2015 USENIX Annual Technical Conference (USENIX ATC 15). 
375--386."},{"key":"e_1_3_2_2_92_1","doi-asserted-by":"publisher","DOI":"10.1145\/3352460.3358256"}],"event":{"name":"FPGA '22: The 2022 ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays","location":"Virtual Event USA","acronym":"FPGA '22","sponsor":["SIGDA ACM Special Interest Group on Design Automation"]},"container-title":["Proceedings of the 2022 ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3490422.3502357","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3490422.3502357","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3490422.3502357","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:31:03Z","timestamp":1750188663000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3490422.3502357"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,2,11]]},"references-count":92,"alternative-id":["10.1145\/3490422.3502357","10.1145\/3490422"],"URL":"https:\/\/doi.org\/10.1145\/3490422.3502357","relation":{},"subject":[],"published":{"date-parts":[[2022,2,11]]},"assertion":[{"value":"2022-02-11","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}