{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,6]],"date-time":"2026-02-06T03:44:47Z","timestamp":1770349487487,"version":"3.49.0"},"reference-count":67,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2024,5,21]],"date-time":"2024-05-21T00:00:00Z","timestamp":1716249600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"name":"Institute of Information & communications Technology Planning & Evaluation","award":["IITP2017-0-00466 SW StarLab and IITP2021-0-01817 Development of Next-Generation Computing Techniques for Hyper-Composable Datacenters"],"award-info":[{"award-number":["IITP2017-0-00466 SW StarLab and IITP2021-0-01817 Development of Next-Generation Computing Techniques for Hyper-Composable Datacenters"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2024,6,30]]},"abstract":"<jats:p>The multiplication of sparse matrix and vector (SpMV) is one of the most widely used kernels in high-performance computing as well as machine learning acceleration for sparse neural networks. The design space of SpMV accelerators has two axes: algorithm and matrix representation. There have been two widely used algorithms and data representations. Two algorithms, scalar multiplication and dot product, can be combined with two sparse data representations, compressed sparse and bitmap formats for the matrix and vector. Although the prior accelerators adopted one of the possible designs, it is yet to be investigated which design is the best one across different hardware resources and workload characteristics. This paper first investigates the impact of design choices with respect to the algorithm and data representation. 
Our evaluation shows that no single design always outperforms the others across different workloads, but the two best designs (i.e., compressed sparse format and bitmap format with dot product) have complementary performance with trade-offs incurred by the matrix characteristics. Based on the analysis, this study proposes Cerberus, a triple-mode accelerator supporting two sparse operation modes in addition to the base dense mode. To allow such multi-mode operation, it proposes a prediction model based on matrix characteristics under a given hardware configuration, which statically selects the best mode for a given sparse matrix with its dimension and density information. Our experimental results show that Cerberus provides 12.1\u00d7 performance improvements from a dense-only accelerator, and 1.5\u00d7 improvements from a fixed best SpMV design.<\/jats:p>","DOI":"10.1145\/3653020","type":"journal-article","created":{"date-parts":[[2024,3,17]],"date-time":"2024-03-17T07:43:03Z","timestamp":1710661383000},"page":"1-24","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["Cerberus: Triple Mode Acceleration of Sparse Matrix and Vector Multiplication"],"prefix":"10.1145","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0009-0005-0886-6794","authenticated-orcid":false,"given":"Soojin","family":"Hwang","sequence":"first","affiliation":[{"name":"KAIST, Yuseong-gu, Republic of Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-0460-3809","authenticated-orcid":false,"given":"Daehyeon","family":"Baek","sequence":"additional","affiliation":[{"name":"KAIST, Yuseong-gu Republic of Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6629-449X","authenticated-orcid":false,"given":"Jongse","family":"Park","sequence":"additional","affiliation":[{"name":"KAIST, Yuseong-gu Republic of 
Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1742-047X","authenticated-orcid":false,"given":"Jaehyuk","family":"Huh","sequence":"additional","affiliation":[{"name":"KAIST, Yuseong-gu Republic of Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2024,5,21]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2020.102710"},{"key":"e_1_3_1_3_2","unstructured":"AMD. 2021. Xilinx Vivado 2021.2. https:\/\/www.xilinx.com\/products\/design-tools\/vivado.html"},{"key":"e_1_3_1_4_2","first-page":"116","volume-title":"Proceedings of the 30th International Conference on Parallel Architectures and Compilation Techniques (PACT\u201921)","author":"Baek Daehyeon","year":"2021","unstructured":"Daehyeon Baek, Soojin Hwang, Taekyung Heo, Daehoon Kim, and Jaehyuk Huh. 2021. InnerSP: A memory efficient sparse matrix multiplication accelerator with locality-aware inner product processing. In Proceedings of the 30th International Conference on Parallel Architectures and Compilation Techniques (PACT\u201921). 116\u2013128."},{"issue":"2","key":"e_1_3_1_5_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1371\/journal.pone.0228343","article-title":"Speed, energy, and area optimized early output quasi-delay-insensitive array multipliers","volume":"15","author":"Balasubramanian P.","year":"2020","unstructured":"P. Balasubramanian, D. L. Maskell, and N. E. Mastorakis. 2020. Speed, energy, and area optimized early output quasi-delay-insensitive array multipliers. PLoS ONE 15, 2 (2020), 1\u201320.","journal-title":"PLoS ONE"},{"key":"e_1_3_1_6_2","first-page":"1","volume-title":"SC\u201909: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis","author":"Bell Nathan","year":"2009","unstructured":"Nathan Bell and Michael Garland. 2009. 
Implementing sparse matrix-vector multiplication on throughput-oriented processors. In SC\u201909: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. 1\u201311."},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1145\/2503210.2503293"},{"key":"e_1_3_1_8_2","unstructured":"Aric Coady. 2004. Process and System for Sparse Vector and Matrix Representation of Document Indexing and Retrieval. US Patent 6 751 628."},{"issue":"1","key":"e_1_3_1_9_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/2049662.2049663","article-title":"The University of Florida sparse matrix collection","volume":"38","author":"Davis Timothy A.","year":"2011","unstructured":"Timothy A. Davis and Yifan Hu. 2011. The University of Florida sparse matrix collection. ACM Transactions on Mathematical Software (TOMS) 38, 1 (2011), 1\u201325.","journal-title":"ACM Transactions on Mathematical Software (TOMS)"},{"key":"e_1_3_1_10_2","first-page":"1110","volume-title":"Proceedings of the 48th International Symposium on Computer Architecture (ISCA\u201921)","author":"Deng Chunhua","year":"2021","unstructured":"Chunhua Deng, Yang Sui, Siyu Liao, Xuehai Qian, and Bo Yuan. 2021. GoSPA: An energy-efficient high-performance globally optimized SParse convolutional neural network accelerator. In Proceedings of the 48th International Symposium on Computer Architecture (ISCA\u201921). 1110\u20131123."},{"key":"e_1_3_1_11_2","unstructured":"Jacob Devlin Ming-Wei Chang Kenton Lee and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arxiv:1810.04805 [cs.CL]"},{"key":"e_1_3_1_12_2","article-title":"Auto-selection of an optimal sparse matrix format in the neuro-simulator ANNarchy","volume":"16","author":"Dinkelbach Helge \u00dclo","year":"2022","unstructured":"Helge \u00dclo Dinkelbach, Badr-Eddine Bouhlal, Julien Vitay, and Fred H. Hamker. 2022. 
Auto-selection of an optimal sparse matrix format in the neuro-simulator ANNarchy. Frontiers in Neuroinformatics 16 (2022).","journal-title":"Frontiers in Neuroinformatics"},{"issue":"1","key":"e_1_3_1_13_2","first-page":"1","article-title":"The original Harwell-Boeing collection.","volume":"14","author":"Duff Iain","year":"1989","unstructured":"Iain Duff, Roger Grimes, and John Lewis. 1989. The original Harwell-Boeing collection. ACM Trans. Math. Software 14, 1 (1989), 1\u201314.","journal-title":"ACM Trans. Math. Software"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","unstructured":"Jean-Guillaume Dumas. 2017. SIMC: Sparse integer matrix collection. (2017). DOI:10.18709\/PERSCIDO.2017.11.DS185","DOI":"10.18709\/PERSCIDO.2017.11.DS185"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2018.00012"},{"key":"e_1_3_1_16_2","doi-asserted-by":"crossref","first-page":"151","DOI":"10.1145\/3352460.3358291","volume-title":"Proceedings of the 52nd International Symposium on Microarchitecture (MICRO\u201919)","author":"Gondimalla Ashish","year":"2019","unstructured":"Ashish Gondimalla, Noah Chesnut, Mithuna Thottethodi, and T. N. Vijaykumar. 2019. SparTen: A sparse tensor accelerator for convolution neural networks. In Proceedings of the 52nd International Symposium on Microarchitecture (MICRO\u201919). 151\u2013165."},{"key":"e_1_3_1_17_2","first-page":"1","volume-title":"Proceedings of the 28th International Conference on Parallel Architectures and Compilation Techniques (PACT\u201919)","author":"Gupta Udit","year":"2019","unstructured":"Udit Gupta, Brandon Reagen, Lillian Pentecost, Marco Donato, Thierry Tambe, Alexander M. Rush, Gu-Yeon Wei, and David Brooks. 2019. MASR: A modular accelerator for sparse RNNs. In Proceedings of the 28th International Conference on Parallel Architectures and Compilation Techniques (PACT\u201919). 
1\u201314."},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1145\/3007787.3001163"},{"key":"e_1_3_1_19_2","first-page":"1","article-title":"Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding","author":"Han Song","year":"2016","unstructured":"Song Han, Huizi Mao, and William J. Dally. 2016. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. Proceedings of the International Conference on Learning Representations (ICLR) (2016), 1\u201314.","journal-title":"Proceedings of the International Conference on Learning Representations (ICLR)"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.5555\/2969239.2969366"},{"key":"e_1_3_1_21_2","unstructured":"Kaiming He Xiangyu Zhang Shaoqing Ren and Jian Sun. 2015. Deep Residual Learning for Image Recognition. arxiv:1512.03385 [cs.CV]"},{"key":"e_1_3_1_22_2","first-page":"1","volume-title":"The Proceedings of the International Conference on Supercomputing (ICS\u201920)","author":"He Xin","year":"2020","unstructured":"Xin He, Subhankar Pal, Aporva Amarnath, Siying Feng, Dong-Hyeon Park, Austin Rovinski, Haojie Ye, Yuhan Chen, Ronald Dreslinski, and Trevor Mudge. 2020. Sparse-TPU: Adapting systolic arrays for sparse matrices. In The Proceedings of the International Conference on Supercomputing (ICS\u201920). 1\u201312."},{"key":"e_1_3_1_23_2","doi-asserted-by":"crossref","first-page":"319","DOI":"10.1145\/3352460.3358275","volume-title":"Proceedings of the 52nd International Symposium on Microarchitecture (MICRO\u201919)","author":"Hegde Kartik","year":"2019","unstructured":"Kartik Hegde, Hadi Asghari-Moghaddam, Michael Pellauer, Neal Crago, Aamer Jaleel, Edgar Solomonik, Joel Emer, and Christopher W. Fletcher. 2019. ExTensor: An accelerator for sparse tensor Algebra. In Proceedings of the 52nd International Symposium on Microarchitecture (MICRO\u201919). 
319\u2013333."},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2018.00062"},{"key":"e_1_3_1_25_2","unstructured":"Geoffrey Hinton Oriol Vinyals and Jeff Dean. 2015. Distilling the Knowledge in a Neural Network. arxiv:1503.02531 [stat.ML]"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1145\/3208040.3208062"},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO56248.2022.00051"},{"key":"e_1_3_1_28_2","unstructured":"Andrew G. Howard Menglong Zhu Bo Chen Dmitry Kalenichenko Weijun Wang Tobias Weyand Marco Andreetto and Hartwig Adam. 2017. MobileNets: Efficient Convolution Neural Networks for Mobile Vision Applications. arxiv:1704.04861 [cs.CV]"},{"key":"e_1_3_1_29_2","first-page":"1109","volume-title":"Proceedings of the 25th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS\u201920)","author":"Hyun Bongjoon","year":"2020","unstructured":"Bongjoon Hyun, Youngeun Kwon, Yujeong Choi, John Kim, and Minsoo Rhu. 2020. NeuMMU: Architectural support for efficient address translations in neural processing units. In Proceedings of the 25th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS\u201920). 1109\u20131124."},{"key":"e_1_3_1_30_2","unstructured":"Forrest N. Iandola Song Han Matthew W. Moskewicz Khalid Ashraf William J. Dally and Kurt Keutzer. 2016. SqueezeNet: AlexNet-level Accuracy with 50x Fewer Parameters and \\(\\lt\\) 0.5MB Model Size. arxiv:1602.07360 [cs.CV]"},{"key":"e_1_3_1_31_2","unstructured":"Intel. 2019. Intel Math Kernel Library. https:\/\/software.intel.com\/content\/www\/us\/en\/develop\/tools\/math-kernel-library.html"},{"key":"e_1_3_1_32_2","doi-asserted-by":"crossref","unstructured":"Alexander Kozlov Ivan Lazarevich Vasily Shamporov Nikolay Lyalyushkin and Yury Gorbachev. 2020. Neural Network Compression Framework for Fast Model Inference. 
arxiv:2002.08679 [cs.CV]","DOI":"10.1007\/978-3-030-80129-8_17"},{"key":"e_1_3_1_33_2","first-page":"1","volume-title":"Proceedings of Neural Information Processing Systems (NeurIPS\u201912)","author":"Krizhevsky Alex","year":"2012","unstructured":"Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. ImageNet classification with deep convolutional neural networks. In Proceedings of Neural Information Processing Systems (NeurIPS\u201912). 1\u20139."},{"issue":"3","key":"e_1_3_1_34_2","doi-asserted-by":"crossref","first-page":"103","DOI":"10.1109\/CJECE.2013.6704691","article-title":"Leading one detectors and leading one position detectors - an evolutionary design methodology","volume":"36","author":"Kunaraj Kumarasamy","year":"2013","unstructured":"Kumarasamy Kunaraj and R. Seshasayanan. 2013. Leading one detectors and leading one position detectors - an evolutionary design methodology. Canadian Journal of Electrical and Computer Engineering 36, 3 (2013), 103\u2013110.","journal-title":"Canadian Journal of Electrical and Computer Engineering"},{"issue":"1","key":"e_1_3_1_35_2","first-page":"1","article-title":"A novel cross-latch shift register scheme for low power applications","volume":"11","author":"Kuo Po-Yu","year":"2021","unstructured":"Po-Yu Kuo, Ming-Hwa Sheu, Chang-Ming Tsai, Ming-Yan Tsai, and Jin-Fa Lin. 2021. A novel cross-latch shift register scheme for low power applications. Applied Sciences 11, 1 (2021), 1\u201311.","journal-title":"Applied Sciences"},{"key":"e_1_3_1_36_2","first-page":"1","volume-title":"SC18: International Conference for High Performance Computing, Networking, Storage, and Analysis","author":"Li Jiajia","year":"2018","unstructured":"Jiajia Li, Jimeng Sun, and Richard Vuduc. 2018. HiCOO: Hierarchical storage of sparse tensors. In SC18: International Conference for High Performance Computing, Networking, Storage, and Analysis. 
1\u201315."},{"key":"e_1_3_1_37_2","doi-asserted-by":"crossref","unstructured":"Min Li Yulong Ao and Chao Yang. 2020. Adaptive SpMV\/SpMSpV on GPUs for input vectors of varied sparsity. (2020). arxiv:2006.16767 [cs.DC]","DOI":"10.1109\/TPDS.2020.3040150"},{"key":"e_1_3_1_38_2","first-page":"1","volume-title":"Proceedings of the 38th International Conference of Machine Learning","author":"Liu Shiwei","year":"2021","unstructured":"Shiwei Liu, Decebal Constantin Mocanu, Yulong Pei, and Mykola Pechenizkiy. 2021. Selfish sparse RNN training. In Proceedings of the 38th International Conference of Machine Learning. 1\u201312."},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2015.04.004"},{"key":"e_1_3_1_40_2","first-page":"977","volume-title":"Proceedings of the 54th International Symposium on Microarchitecture (MICRO\u201921)","author":"Lu Liqiang","year":"2021","unstructured":"Liqiang Lu, Yicheng Jin, Hangrui Bi, Zizhang Luo, Peng Li, Tao Wang, and Yun Liang. 2021. Sanger: A co-design framework for enabling sparse attention using reconfigurable architecture. In Proceedings of the 54th International Symposium on Microarchitecture (MICRO\u201921). 977\u2013991."},{"issue":"3","key":"e_1_3_1_41_2","first-page":"579","article-title":"A nonnegative latent factor model for large-scale sparse matrices in recommender systems via alternating direction method","volume":"27","author":"Luo Xin","year":"2015","unstructured":"Xin Luo, MengChu Zhou, Shuai Li, Zhuhong You, Yunni Xia, and Qingsheng Zhu. 2015. A nonnegative latent factor model for large-scale sparse matrices in recommender systems via alternating direction method. IEEE Transactions on Neural Networks and Learning Systems 27, 3 (2015), 579\u2013592.","journal-title":"IEEE Transactions on Neural Networks and Learning Systems"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.5555\/3014904.3014982"},{"key":"e_1_3_1_43_2","article-title":"Linear Programming Problems from C. 
M\u00e9sz\u00e1ros","author":"M\u00e9sz\u00e1ros Csaba","year":"2004","unstructured":"Csaba M\u00e9sz\u00e1ros. 2004. Linear Programming Problems from C. M\u00e9sz\u00e1ros. http:\/\/www.sztaki.hu\/ meszaros\/public_ftp\/lptestset","journal-title":"http:\/\/www.sztaki.hu\/ meszaros\/public_ftp\/lptestset"},{"key":"e_1_3_1_44_2","first-page":"252","volume-title":"Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS\u201923)","author":"Mu\u00f1oz-Mart\u00ednez Francisco","year":"2023","unstructured":"Francisco Mu\u00f1oz-Mart\u00ednez, Raveesh Garg, Michael Pellauer, Jos\u00e9 L. Abell\u00e1n, Manuel E. Acacio, and Tushar Krishna. 2023. Flexagon: A multi-dataflow sparse-sparse matrix multiplication accelerator for efficient DNN processing. In Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS\u201923). 252\u2013265."},{"key":"e_1_3_1_45_2","first-page":"122","volume-title":"2021 IEEE International Symposium on Workload Characterization (IISWC\u201921)","author":"Mu\u00f1oz-Mart\u00ednez Francisco","year":"2021","unstructured":"Francisco Mu\u00f1oz-Mart\u00ednez, Jos\u00e9 L. Abell\u00e1n, Manuel E. Acacio, and Tushar Krishna. 2021. STONNE: Enabling cycle-level microarchitectural simulation for DNN inference accelerators. In 2021 IEEE International Symposium on Workload Characterization (IISWC\u201921). 122\u2013125."},{"key":"e_1_3_1_46_2","unstructured":"Markus Nagel Marios Fournarakis Rana Ali Amjad Yelysei Bondarenko Mart van Baalen and Tijmen Blankevoort. 2021. A White Paper on Neural Network Quantization. arxiv:2106.08295 [cs.LG]"},{"key":"e_1_3_1_47_2","unstructured":"M. Naumov L. S. Chien P. Vandermersch and U. Kapasi. 2022. CUSPARSE Library. https:\/\/developer.nvidia.com\/cusparse"},{"key":"e_1_3_1_48_2","unstructured":"NVIDIA. 2023. cuBLAS. 
https:\/\/developer.nvidia.com\/cublas"},{"key":"e_1_3_1_49_2","first-page":"724","volume-title":"Proceedings of the International Symposium on High Performance Computer Architecture (HPCA\u201918)","author":"Pal Subhankar","year":"2018","unstructured":"Subhankar Pal, Jonathan Beaumont, Dong-Hyeon Park, Aporva Amarnath, Siying Feng, Chaitali Chakrabarti, Hun-Seok Kim, David Blaauw, Trevor Mudge, and Ronald Dreslinski. 2018. OuterSPACE: An outer product based sparse matrix multiplication accelerator. In Proceedings of the International Symposium on High Performance Computer Architecture (HPCA\u201918). 724\u2013736."},{"key":"e_1_3_1_50_2","doi-asserted-by":"publisher","DOI":"10.1145\/3079856.3080254"},{"key":"e_1_3_1_51_2","first-page":"1014","volume-title":"Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS\u201921)","author":"Qin Eric","year":"2021","unstructured":"Eric Qin, Geonhwa Jeong, William Won, Sheng-Chun Kao, Hyoukjun Kwon, Sudarshan Srinivasan, Dipankar Das, Gordon E. Moon, Sivasankaran Rajamanickam, and Tushar Krishna. 2021. Extending sparse tensor accelerators to support multiple compression formats. In Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS\u201921). 1014\u20131024."},{"key":"e_1_3_1_52_2","first-page":"58","volume-title":"Proceedings of the 26th International Symposium on High Performance Computer Architecture (HPCA\u201920)","author":"Qin Eric","year":"2020","unstructured":"Eric Qin, Ananda Samajdar, Hyoukjun Kwon, Vineet Nadella, Sudarshan Srinivasan, Dipankar Das, Bharat Kaul, and Tushar Krishna. 2020. SIGMA: A sparse and irregular GEMM accelerator with flexible interconnects for DNN training. In Proceedings of the 26th International Symposium on High Performance Computer Architecture (HPCA\u201920). 58\u201370."},{"key":"e_1_3_1_53_2","unstructured":"Alec Radford Jeffrey Wu Rewon Child David Luan Dario Amodei and Ilya Sutskever. 2019. 
Language Models are Unsupervised Multitask Learners. https:\/\/github.com\/openai\/gpt-2"},{"key":"e_1_3_1_54_2","doi-asserted-by":"crossref","first-page":"347","DOI":"10.1145\/3352460.3358330","volume-title":"Proceedings of the 52nd International Symposium on Microarchitecture (MICRO\u201919)","author":"Sadi Fazle","year":"2019","unstructured":"Fazle Sadi, Joe Sweeney, Tze Meng Low, James C. Hoe, Larry Pileggi, and Franz Franchetti. 2019. Efficient SpMV operation for large and highly sparse matrices using scalable multi-way merge parallelization. In Proceedings of the 52nd International Symposium on Microarchitecture (MICRO\u201919). 347\u2013358."},{"key":"e_1_3_1_55_2","doi-asserted-by":"publisher","DOI":"10.1155\/2016\/1260879"},{"key":"e_1_3_1_56_2","unstructured":"Ananda Samajdar Yuhao Zhu Paul Whatmough Matthew Mattina and Tushar Krishna. 2018. SCALE-Sim: Systolic CNN Accelerator Simulator. arxiv:1811.02883 [cs.DC]"},{"key":"e_1_3_1_57_2","volume-title":"Digital Design with Chisel","author":"Schoeberl Martin","year":"2019","unstructured":"Martin Schoeberl. 2019. Digital Design with Chisel. Kindle Direct Publishing."},{"key":"e_1_3_1_58_2","doi-asserted-by":"publisher","DOI":"10.1145\/2751205.2751244"},{"key":"e_1_3_1_59_2","doi-asserted-by":"crossref","first-page":"766","DOI":"10.1109\/MICRO50266.2020.00068","volume-title":"2020 53rd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201920)","author":"Srivastava Nitish","year":"2020","unstructured":"Nitish Srivastava, Hanchen Jin, Jie Liu, David Albonesi, and Zhiru Zhang. 2020. MatRaptor: A sparse-sparse matrix multiplication accelerator based on row-wise product. In 2020 53rd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201920). 
766\u2013780."},{"key":"e_1_3_1_60_2","first-page":"689","volume-title":"Proceedings of the 26th International Symposium on High Performance Computer Architecture (HPCA\u201920)","author":"Srivastava Nitish","year":"2020","unstructured":"Nitish Srivastava, Hanchen Jin, Shaden Smith, Hongbo Rong, David Albonesi, and Zhiru Zhang. 2020. Tensaurus: A versatile accelerator for mixed sparse-dense tensor computations. In Proceedings of the 26th International Symposium on High Performance Computer Architecture (HPCA\u201920). 689\u2013702."},{"key":"e_1_3_1_61_2","first-page":"97","volume-title":"Proceedings of the International Symposium on High Performance Computer Architecture (HPCA\u201920)","author":"Wang Hanrui","year":"2020","unstructured":"Hanrui Wang, Zhekai Zhang, and Song Han. 2020. SpAtten: Efficient sparse attention architecture with cascade token and head pruning. In Proceedings of the International Symposium on High Performance Computer Architecture (HPCA\u201920). 97\u2013110."},{"key":"e_1_3_1_62_2","doi-asserted-by":"publisher","DOI":"10.1145\/1362622.1362674"},{"key":"e_1_3_1_63_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO56248.2022.00096"},{"key":"e_1_3_1_64_2","first-page":"570","volume-title":"Proceedings of the 27th International Symposium on High Performance Computer Architecture (HPCA\u201921)","author":"Xie Xinfeng","year":"2021","unstructured":"Xinfeng Xie, Zheng Liang, Peng Gu, Abanti Basak, Lei Deng, Ling Liang, Xing Hu, and Yuan Xie. 2021. SpaceA: Sparse matrix vector multiplication on processing-in-memory accelerator. In Proceedings of the 27th International Symposium on High Performance Computer Architecture (HPCA\u201921). 570\u2013583."},{"key":"e_1_3_1_65_2","first-page":"711","volume-title":"Proceedings of the 53rd International Symposium on Microarchitecture (MICRO\u201920)","author":"Yang Dingqing","year":"2020","unstructured":"Dingqing Yang, Amin Ghasemazar, Xiaowei Ren, Maximilian Golub, Guy Lemieux, and Mieszko Lis. 2020. 
Procrustes: A dataflow and accelerator for sparse deep neural network training. In Proceedings of the 53rd International Symposium on Microarchitecture (MICRO\u201920). 711\u2013724."},{"key":"e_1_3_1_66_2","first-page":"1","volume-title":"Proceedings of the 49th International Symposium on Microarchitecture (MICRO\u201916)","author":"Zhang Shijin","year":"2016","unstructured":"Shijin Zhang, Zidong Du, Lei Zhang, Huiying Lan, Shaoli Liu, Ling Li, Qi Guo, Tianshi Chen, and Yunji Chen. 2016. Cambricon-X: An accelerator for sparse neural networks. In Proceedings of the 49th International Symposium on Microarchitecture (MICRO\u201916). 1\u201312."},{"key":"e_1_3_1_67_2","first-page":"261","volume-title":"Proceedings of the International Symposium on High Performance Computer Architecture (HPCA\u201920)","author":"Zhang Zhekai","year":"2020","unstructured":"Zhekai Zhang, Hanrui Wang, Song Han, and William J. Dally. 2020. SpArch: Efficient architecture for sparse matrix multiplication. In Proceedings of the International Symposium on High Performance Computer Architecture (HPCA\u201920). 261\u2013274."},{"key":"e_1_3_1_68_2","unstructured":"Neta Zmora Guy Jacob Lev Zlotnik Bar Elharar and Gal Novik. 2019. Neural Network Distiller: A Python Package for DNN Compression Research. 
arxiv:1910.12232 [cs.LG]"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3653020","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3653020","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T23:56:55Z","timestamp":1750291015000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3653020"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,5,21]]},"references-count":67,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2024,6,30]]}},"alternative-id":["10.1145\/3653020"],"URL":"https:\/\/doi.org\/10.1145\/3653020","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"value":"1544-3566","type":"print"},{"value":"1544-3973","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,5,21]]},"assertion":[{"value":"2023-10-25","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-02-27","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-05-21","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}