{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T13:28:10Z","timestamp":1773840490605,"version":"3.50.1"},"reference-count":46,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2022,5,25]],"date-time":"2022-05-25T00:00:00Z","timestamp":1653436800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2022,9,30]]},"abstract":"<jats:p>\n            This article proposes a novel hardware accelerator for the inference task with sparse convolutional neural networks (CNNs) by building a hardware unit to perform Image to Column (\n            <jats:sc>Im2Col<\/jats:sc>\n            ) transformation of the input feature map coupled with a systolic-array-based general matrix-matrix multiplication (GEMM) unit. Our design carefully overlaps the\n            <jats:sc>Im2Col<\/jats:sc>\n            transformation with the GEMM computation to maximize parallelism. We propose a novel design for the\n            <jats:sc>Im2Col<\/jats:sc>\n            unit that uses a set of distributed local memories connected by a ring network, which improves energy efficiency and latency by streaming the input feature map only once. The systolic-array-based GEMM unit in the accelerator can be dynamically configured as multiple GEMM units with square-shaped systolic arrays or as a single GEMM unit with a tall systolic array. This dynamic reconfigurability enables effective pipelining of\n            <jats:sc>Im2Col<\/jats:sc>\n            and GEMM operations and attains high processing element utilization for a wide range of CNNs. Further, our accelerator is sparsity aware, improving performance and energy efficiency by effectively mapping the sparse feature maps and weights to the processing elements, skipping ineffectual operations and unnecessary data movements involving zeros. Our prototype, SPOTS, is on average 2.16\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\( \\times \\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            , 1.74\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\( \\times \\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            , and 1.63\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\( \\times \\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            faster than Gemmini, Eyeriss, and Sparse-PE, which are prior hardware accelerators for dense and sparse CNNs, respectively. 
SPOTS is also 78\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\( \\times \\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            and 12\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\( \\times \\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            more energy-efficient when compared to CPU and GPU implementations, respectively.\n          <\/jats:p>","DOI":"10.1145\/3532863","type":"journal-article","created":{"date-parts":[[2022,4,25]],"date-time":"2022-04-25T16:30:15Z","timestamp":1650904215000},"page":"1-26","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":25,"title":["An Accelerator for Sparse Convolutional Neural Networks Leveraging Systolic General Matrix-matrix Multiplication"],"prefix":"10.1145","volume":"19","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2608-7322","authenticated-orcid":false,"given":"Mohammadreza","family":"Soltaniyeh","sequence":"first","affiliation":[{"name":"Rutgers University, Busch Campus, Piscataway, NJ, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9290-3984","authenticated-orcid":false,"given":"Richard P.","family":"Martin","sequence":"additional","affiliation":[{"name":"Rutgers University, Busch Campus, Piscataway, NJ, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5048-8548","authenticated-orcid":false,"given":"Santosh","family":"Nagarakatte","sequence":"additional","affiliation":[{"name":"Rutgers University, Busch Campus, Piscataway, NJ, USA"}]}],"member":"320","published-online":{"date-parts":[[2022,5,25]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2018.2852335"},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1145\/3007787.3001138"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1145\/3085572"},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1145\/567806.567807"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2014.58"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2016.40"},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSSC.2016.2616357"},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.5555\/1953048.2078186"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA52012.2021.00090"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.5555\/2391541.2391560"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1145\/2749469.2750389"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1145\/3297858.3304014"},{"key":"e_1_3_1_15_2","doi-asserted-by":"crossref","unstructured":"Hasan Genc Seah Kim Alon Amid Ameer Haj-Ali Vighnesh Iyer Pranav Prakash Jerry Zhao Daniel Grubb Harrison Liew Howard Mao Albert Ou Colin Schmidt Samuel Steffl John Wright Ion Stoica Jonathan Ragan-Kelley Krste Asanovic Borivoje Nikolic and Yakun Sophia Shao. 2021. Gemmini: Enabling Systematic Deep-Learning Architecture Evaluation via Full-Stack Integration. 
arxiv:1911.09925 [cs.DC]","DOI":"10.1109\/DAC18074.2021.9586216"},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1145\/3352460.3358291"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2016.30"},{"key":"e_1_3_1_18_2","article-title":"Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding","author":"Han Song","year":"2016","unstructured":"Song Han, Huizi Mao, and William J. Dally. 2016. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. In International Conference on Learning Representations (ICLR\u201916).","journal-title":"International Conference on Learning Representations (ICLR\u201916)"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1145\/3352460.3358263"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2019.2911674"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4939-2163-8_1"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1145\/3065386"},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1145\/3297858.3304028"},{"key":"e_1_3_1_25_2","unstructured":"Hyoukjun Kwon Liangzhen Lai Michael Pellauer Tushar Krishna Yu-Hsin Chen and Vikas Chandra. 2019. Heterogeneous Dataflow Accelerators for Multi-DNN Workloads. arXiv:arXiv:1909.07437"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1145\/3296957.3173176"},{"key":"e_1_3_1_27_2","unstructured":"Zhi-Gang Liu Paul N. Whatmough and Matthew Mattina. 2020. Sparse Systolic Tensor Array for Efficient CNN Hardware Acceleration. arxiv:2009.02381 [cs.AR]"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/LCA.2020.2979965"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2017.29"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO50266.2020.00067"},{"key":"e_1_3_1_31_2","unstructured":"NVIDIA Peter Vingelmann and Frank H. P. Fitzek. 2020. CUDA release: 10.2.89. (2020). https:\/\/developer.nvidia.com\/cuda-toolkit."},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1145\/3079856.3080254"},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA47549.2020.00015"},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2021.3126708"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2016.32"},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2016.7783720"},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/LCA.2019.2924007"},{"key":"e_1_3_1_38_2","unstructured":"Karen Simonyan and Andrew Zisserman. 2015. Very Deep Convolutional Networks for Large-Scale Image Recognition. arxiv:1409.1556 [cs.CV]"},{"key":"e_1_3_1_39_2","unstructured":"Mohammadreza Soltaniyeh Richard P. Martin and Santosh Nagarakatte. 2020. Synergistic CPU-FPGA Acceleration of Sparse Linear Algebra. 
arxiv:2004.13907 [cs.DC] A Rutgers Department of Computer Science Technical Report DCS-TR-750."},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/MSE.2007.44"},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"e_1_3_1_42_2","first-page":"9","volume-title":"Proceedings of the 30th International Conference on Neural Information Processing Systems (NIPS\u201916)","author":"Wen Wei","year":"2016","unstructured":"Wei Wen, Chunpeng Wu, Yandan Wang, Yiran Chen, and Hai Li. 2016. Learning structured sparsity in deep neural networks. In Proceedings of the 30th International Conference on Neural Information Processing Systems (NIPS\u201916). Curran Associates Inc., Red Hook, NY, 9 pages."},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSI.2021.3074300"},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.1145\/3460776"},{"key":"e_1_3_1_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2016.7783723"},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2018.00011"},{"key":"e_1_3_1_47_2","doi-asserted-by":"crossref","first-page":"214","DOI":"10.1109\/IISWC53511.2021.00029","volume-title":"2021 IEEE International Symposium on Workload Characterization (IISWC\u201921)","author":"Zhou Y.","year":"2021","unstructured":"Y. Zhou, M. Yang, C. Guo, J. Leng, Y. Liang, Q. Chen, M. Guo, and Y. Zhu. 2021. Characterizing and demystifying the implicit convolution algorithm on commercial matrix-multiplication accelerators. In 2021 IEEE International Symposium on Workload Characterization (IISWC\u201921). IEEE Computer Society, Los Alamitos, CA, 214\u2013225."}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3532863","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3532863","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:00:37Z","timestamp":1750186837000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3532863"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,5,25]]},"references-count":46,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2022,9,30]]}},"alternative-id":["10.1145\/3532863"],"URL":"https:\/\/doi.org\/10.1145\/3532863","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"value":"1544-3566","type":"print"},{"value":"1544-3973","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,5,25]]},"assertion":[{"value":"2021-11-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-04-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-05-25","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}
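The lowering the abstract relies on, turning convolution into GEMM via Im2Col, is the standard algorithm; the sketch below is a minimal NumPy illustration of what the Im2Col unit computes, not the paper's hardware pipeline, and the function and parameter names (im2col, kh, kw, stride) are our own for illustration. Each column of the unrolled matrix holds one receptive-field patch, so convolving K filters reduces to a single matrix product.

import numpy as np

def im2col(x, kh, kw, stride=1):
    """Unroll a (C, H, W) feature map into a (C*kh*kw, out_h*out_w) matrix."""
    c, h, w = x.shape
    out_h = (h - kh) // stride + 1
    out_w = (w - kw) // stride + 1
    cols = np.empty((c * kh * kw, out_h * out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            # One receptive-field patch per output position, flattened into a column.
            patch = x[:, i*stride:i*stride+kh, j*stride:j*stride+kw]
            cols[:, i * out_w + j] = patch.ravel()
    return cols

# Convolution as GEMM: reshape K filters to (K, C*kh*kw) and multiply.
rng = np.random.default_rng(0)
x = rng.standard_normal((3, 8, 8)).astype(np.float32)      # input feature map
w = rng.standard_normal((16, 3, 3, 3)).astype(np.float32)  # 16 filters of 3x3
y = w.reshape(16, -1) @ im2col(x, 3, 3)                    # (16, 36)
assert y.shape == (16, 6 * 6)  # reshape to (16, 6, 6) for the output feature map

Materializing the whole unrolled matrix in memory, as this sketch does, duplicates overlapping patches; the accelerator instead streams the input once and overlaps Im2Col with the systolic GEMM, which is the source of the bandwidth and latency savings the abstract claims.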