{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,1]],"date-time":"2025-07-01T10:40:07Z","timestamp":1751366407222,"version":"3.41.0"},"reference-count":45,"publisher":"Association for Computing Machinery (ACM)","issue":"2","funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62272477"],"award-info":[{"award-number":["62272477"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"NSF of Hunan Province","award":["2022JJ10066"],"award-info":[{"award-number":["2022JJ10066"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2025,6,30]]},"abstract":"<jats:p>Multiplication plays a critical role in SRAM-based Computing-in-Memory (CIM) architectures. However, current SRAM-based CIMs face three major limitations. First, they do not fully exploit bit-level sparsity, resulting in unnecessary overhead in both latency and energy consumption. Second, the generation of numerous zero-dot products is superfluous. Third, the irregular organization of SRAM complicates the implementation.<\/jats:p>\n          <jats:p\/>\n          <jats:p>To address these issues, we propose Shift-CIM, a general-purpose approach that fully leverages bit-level sparsity within SRAM-based multiplications. Shift-CIM aligns the multipliers within the SRAM array, accumulating only the required dot products based on the non-zero bits of the multipliers. Shift-CIM achieves a regular SRAM organization by assembling two irregular SRAM arrays in a transposed manner.<\/jats:p>\n          <jats:p>Our evaluations show that Shift-CIM is highly efficient, operating at a supply voltage of 0.9 V and a frequency of 833 MHz, while incurring only a 4.8% area overhead. 
Despite these modest requirements, Shift-CIM significantly accelerates multiplication operations, achieving up to a 3.08\u00d7 performance improvement and a 60% reduction in energy consumption compared to state-of-the-art designs.<\/jats:p>","DOI":"10.1145\/3719654","type":"journal-article","created":{"date-parts":[[2025,3,1]],"date-time":"2025-03-01T11:01:36Z","timestamp":1740826896000},"page":"1-25","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Shift-CIM: In-SRAM Alignment To Support General-Purpose Bit-level Sparsity Exploration in SRAM Multiplication"],"prefix":"10.1145","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-5620-2355","authenticated-orcid":false,"given":"Gaoyang","family":"Zhao","sequence":"first","affiliation":[{"name":"National University of Defense Technology College of Computer Science and Technology","place":["Changsha, China"]},{"name":"Key Laboratory of Advanced Microprocessor Chips and Systems, National University of Defense Technology","place":["Changsha, China"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-2174-4559","authenticated-orcid":false,"given":"Qiuran","family":"Li","sequence":"additional","affiliation":[{"name":"National University of Defense Technology College of Computer Science and Technology","place":["Changsha, China"]},{"name":"Key Laboratory of Advanced Microprocessor Chips and Systems, National University of Defense Technology","place":["Changsha, China"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-5561-4685","authenticated-orcid":false,"given":"Rongzhen","family":"Lin","sequence":"additional","affiliation":[{"name":"College of Computer Science and Technology, National University of Defense Technology","place":["Changsha, China"]},{"name":"Key Laboratory of Advanced Microprocessor Chips and Systems, National University of Defense Technology","place":["Changsha, 
China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9556-5535","authenticated-orcid":false,"given":"Yaohua","family":"Wang","sequence":"additional","affiliation":[{"name":"National University of Defense Technology College of Computer Science and Technology","place":["Changsha, China"]},{"name":"Key Laboratory of Advanced Microprocessor Chips and Systems, National University of Defense Technology","place":["Changsha, China"]}]}],"member":"320","published-online":{"date-parts":[[2025,6,28]]},"reference":[{"key":"e_1_3_3_2_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2017.21"},{"key":"e_1_3_3_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/TVLSI.2022.3164756"},{"key":"e_1_3_3_4_2","unstructured":"Tianqi Chen Mu Li Yutian Li Min Lin Naiyan Wang Minjie Wang Tianjun Xiao Bing Xu Chiyuan Zhang and Zheng Zhang. 2015. MXNet: A flexible and efficient machine learning library for heterogeneous distributed systems. arXiv:1512.01274v1 [cs.DC] https:\/\/arxiv.org\/abs\/1512.01274v1"},{"key":"e_1_3_3_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISSCC42613.2021.9365766"},{"key":"e_1_3_3_6_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2012.2185930"},{"key":"e_1_3_3_7_2","volume-title":"Proceedings of the 9th International Conference on Learning Representations (ICLR\u201921)","author":"Dosovitskiy Alexey","year":"2021","unstructured":"Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al.. 2021. An image is worth 16x16 words: Transformers for image recognition at scale. In Proceedings of the 9th International Conference on Learning Representations (ICLR\u201921). 
OpenReview.net."},{"key":"e_1_3_3_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2018.00040"},{"key":"e_1_3_3_9_2","doi-asserted-by":"publisher","DOI":"10.2200\/S01109ED1V01Y202106CAC057"},{"key":"e_1_3_3_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCAS45731.2020.9180870"},{"key":"e_1_3_3_11_2","doi-asserted-by":"publisher","DOI":"10.1145\/3352460.3358291"},{"key":"e_1_3_3_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISSCC42615.2023.10067260"},{"key":"e_1_3_3_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISSCC42613.2021.9365989"},{"key":"e_1_3_3_14_2","doi-asserted-by":"publisher","DOI":"10.1145\/3007787.3001163"},{"key":"e_1_3_3_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_3_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISSCC42615.2023.10067305"},{"key":"e_1_3_3_17_2","unstructured":"Andrew G. Howard Menglong Zhu Bo Chen Dmitry Kalenichenko Weijun Wang Tobias Weyand Marco Andreetto and Hartwig Adam. 2017. MobileNets: Efficient convolutional neural networks for mobile vision applications. CoRR abs\/1704.04861 (2017). arXiv:1704.04861 http:\/\/arxiv.org\/abs\/1704.04861"},{"key":"e_1_3_3_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSSC.2020.3039206"},{"key":"e_1_3_3_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/DAC18074.2021.9586103"},{"key":"e_1_3_3_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/DAC18072.2020.9218567"},{"key":"e_1_3_3_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/TVLSI.2023.3266651"},{"key":"e_1_3_3_22_2","doi-asserted-by":"publisher","DOI":"10.1145\/3489517.3530660"},{"key":"e_1_3_3_23_2","unstructured":"Zhuang Liu Hanzi Mao Chao-Yuan Wu Christoph Feichtenhofer Trevor Darrell and Saining Xie. 2022. A ConvNet for the 2020s. arxiv:2201.03545 [cs.CV]. 
Retrieved from https:\/\/arxiv.org\/abs\/2201.03545"},{"key":"e_1_3_3_24_2","doi-asserted-by":"publisher","DOI":"10.1145\/3466752.3480123"},{"key":"e_1_3_3_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2023.3250437"},{"key":"e_1_3_3_26_2","doi-asserted-by":"publisher","unstructured":"Chen Nie Chenyu Tang Jie Lin Huan Hu Chenyang Lv Ting Cao Weifeng Zhang Li Jiang Xiaoyao Liang Weikang Qian Yanan Sun and Zhezhi He. 2024. VSPIM: SRAM Processing-in-memory DNN acceleration via vector-scalar operations. IEEE Trans. Comput. 73 10 (2024) 2378\u20132390. DOI:10.1109\/TC.2023.3285095","DOI":"10.1109\/TC.2023.3285095"},{"key":"e_1_3_3_27_2","unstructured":"Adam Paszke Sam Gross Soumith Chintala Gregory Chanan Edward Yang Zeming Lin Alban Desmaison Zachary DeVito Luca Antiga and Adam Lerer. 2017. Automatic differentiation in PyTorch. In 31st Conference on Neural Information Processing Systems (NIPS 2017). Long Beach CA USA."},{"key":"e_1_3_3_28_2","doi-asserted-by":"publisher","DOI":"10.5555\/1481567"},{"key":"e_1_3_3_29_2","doi-asserted-by":"publisher","DOI":"10.1145\/3007787.3001139"},{"key":"e_1_3_3_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISSCC.2019.8662392"},{"key":"e_1_3_3_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/TVLSI.2019.2935251"},{"key":"e_1_3_3_32_2","article-title":"Very deep convolutional networks for large-scale image recognition","volume":"1409","author":"Simonyan Karen","year":"2014","unstructured":"Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. 
CoRR abs\/1409.1556 (2014).","journal-title":"CoRR"},{"key":"e_1_3_3_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/ESSCIRC55480.2022.9911440"},{"key":"e_1_3_3_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISSCC42614.2022.9731645"},{"key":"e_1_3_3_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISSCC42614.2022.9731645"},{"key":"e_1_3_3_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISSCC42615.2023.10067842"},{"key":"e_1_3_3_37_2","doi-asserted-by":"publisher","DOI":"10.1145\/2503210.2503219"},{"key":"e_1_3_3_38_2","volume-title":"Hacker\u2019s Delight (2 ed.)","author":"Warren H. S.","year":"2013","unstructured":"H. S. Warren. 2013. Hacker\u2019s Delight (2 ed.). Addison-Wesley."},{"key":"e_1_3_3_39_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSSC.2018.2841824"},{"key":"e_1_3_3_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSII.2012.2228398"},{"key":"e_1_3_3_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCE-Taiwan49838.2020.9258135"},{"key":"e_1_3_3_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISSCC42614.2022.9731545"},{"key":"e_1_3_3_43_2","doi-asserted-by":"publisher","DOI":"10.1145\/3307650.3322271"},{"key":"e_1_3_3_44_2","doi-asserted-by":"publisher","DOI":"10.1109\/ASSCC.2017.8240204"},{"key":"e_1_3_3_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/ASSCC.2014.7008851"},{"key":"e_1_3_3_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISSCC42613.2021.9365958"}],"container-title":["ACM Transactions on Architecture and Code 
Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3719654","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,1]],"date-time":"2025-07-01T10:17:59Z","timestamp":1751365079000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3719654"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,28]]},"references-count":45,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2025,6,30]]}},"alternative-id":["10.1145\/3719654"],"URL":"https:\/\/doi.org\/10.1145\/3719654","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2025,6,28]]},"assertion":[{"value":"2024-04-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-02-10","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-06-28","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}