{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,19]],"date-time":"2026-06-19T15:56:02Z","timestamp":1781884562092,"version":"3.54.5"},"reference-count":44,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2022,4,28]],"date-time":"2022-04-28T00:00:00Z","timestamp":1651104000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["J. Emerg. Technol. Comput. Syst."],"published-print":{"date-parts":[[2022,4,30]]},"abstract":"<jats:p>This article presents Computational SRAM (C-SRAM) solution combining In- and Near-Memory Computing approaches. It allows performing arithmetic, logic, and complex memory operations inside or next to the memory without transferring data over the system bus, leading to significant energy reduction. Operations are performed on large vectors of data occupying the entire physical row of C-SRAM array, leading to high performance gains. We introduce the C-SRAM solution in this article as an integrated vector processing unit to be used by a scalar processor as an energy-efficient and high performing co-processor. We detail the C-SRAM system design on different levels: (i) circuit design and silicon proof of concept, (ii) system interface and instruction set architecture, and (iii) high-level software programming and simulation. 
Experimental results on two complete memory-bound applications, AES and MobileNetV2, show that the C-SRAM implementation achieves up to 70\u00d7 timing speedup and 37\u00d7 energy reduction compared to scalar architecture, and up to 17\u00d7 timing speedup and 5\u00d7 energy reduction compared to SIMD architecture.<\/jats:p>","DOI":"10.1145\/3485823","type":"journal-article","created":{"date-parts":[[2022,4,28]],"date-time":"2022-04-28T10:08:04Z","timestamp":1651140484000},"page":"1-26","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":12,"title":["Towards a Truly Integrated Vector Processing Unit for Memory-bound Applications Based on a Cost-competitive Computational SRAM Design Solution"],"prefix":"10.1145","volume":"18","author":[{"given":"Maha","family":"Kooli","sequence":"first","affiliation":[{"name":"Univ. Grenoble Alpes, CEA, LIST, F-38000 Grenoble"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Antoine","family":"Heraud","sequence":"additional","affiliation":[{"name":"Univ. Grenoble Alpes, CEA, LIST, F-38000 Grenoble"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Henri-Pierre","family":"Charles","sequence":"additional","affiliation":[{"name":"Univ. Grenoble Alpes, CEA, LIST, F-38000 Grenoble"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Bastien","family":"Giraud","sequence":"additional","affiliation":[{"name":"Univ. Grenoble Alpes, CEA, LIST, F-38000 Grenoble"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Roman","family":"Gauchi","sequence":"additional","affiliation":[{"name":"Univ. Grenoble Alpes, CEA, LIST, F-38000 Grenoble"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Mona","family":"Ezzadeen","sequence":"additional","affiliation":[{"name":"Univ. 
Grenoble Alpes, CEA, LIST, F-38000 Grenoble"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Kevin","family":"Mambu","sequence":"additional","affiliation":[{"name":"Univ. Grenoble Alpes, CEA, LIST, F-38000 Grenoble"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Valentin","family":"Egloff","sequence":"additional","affiliation":[{"name":"Univ. Grenoble Alpes, CEA, LIST, F-38000 Grenoble"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jean-Philippe","family":"Noel","sequence":"additional","affiliation":[{"name":"Univ. Grenoble Alpes, CEA, LIST, F-38000 Grenoble"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2022,4,28]]},"reference":[{"key":"e_1_3_1_2_2","unstructured":"[n.d.]. ARM: Cortex-M0+ Processor. Retrieved from https:\/\/developer.arm.com\/products\/processors\/cortex-m\/cortex-m0-plus."},{"key":"e_1_3_1_3_2","unstructured":"[n.d.]. CUDA. Retrieved from https:\/\/developer.nvidia.com\/cuda-zone."},{"key":"e_1_3_1_4_2","unstructured":"[n.d.]. Intel Advanced Vector Extensions 512. Retrieved from https:\/\/www.intel.co.uk\/content\/www\/uk\/en\/architecture-and-technology\/avx-512-overview.html."},{"key":"e_1_3_1_5_2","unstructured":"[n.d.]. LLVM Language Reference Manual. Retrieved from www.llvm.org\/docs\/LangRef.html."},{"key":"e_1_3_1_6_2","unstructured":"[n.d.]. OpenCL. Retrieved from https:\/\/www.khronos.org\/opencl\/."},{"key":"e_1_3_1_7_2","unstructured":"[n.d.]. Porting and Optimizing HPC Applications for ARM SVE . Retrieved from https:\/\/developer.arm.com\/documentation\/101726\/0210\/Learn-about-the-Scalable-Vector-Extension--SVE-?lang=en."},{"key":"e_1_3_1_8_2","unstructured":"[n.d.]. RISC-V \u201cV\u201d Vector Extension. Retrieved from https:\/\/riscv.github.io\/documents\/riscv-v-spec\/."},{"key":"e_1_3_1_9_2","unstructured":"2018. UPMEM. Retrieved Dec. 
2021 from www.upmem.com\/."},{"key":"e_1_3_1_10_2","first-page":"481","volume-title":"IEEE International Symposium on High-Performance Computer Architecture","author":"Aga Shaizeen","year":"2017","unstructured":"Shaizeen Aga et\u00a0al. 2017. Compute caches. In IEEE International Symposium on High-Performance Computer Architecture. 481\u2013492."},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRC.2016.7738698"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/2465787.2465794"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1002\/j.1538-7305.1964.tb04102.x"},{"key":"e_1_3_1_14_2","volume-title":"IEEE International Electron Devices Meeting","author":"Chen Wei-Hao","year":"2017","unstructured":"Wei-Hao Chen et\u00a0al. 2017. A 16Mb dual-mode ReRAM macro with sub-14ns computing-in-memory and memory functions enabled by self-write termination scheme. In IEEE International Electron Devices Meeting. 28\u20132."},{"key":"e_1_3_1_15_2","volume-title":"The Design of Rijndael: AES\u2014The Advanced Encryption Standard","author":"Daemen Joan","year":"2013","unstructured":"Joan Daemen and Vincent Rijmen. 2013. The Design of Rijndael: AES\u2014The Advanced Encryption Standard. Springer Science & Business Media."},{"key":"e_1_3_1_16_2","first-page":"383","volume-title":"IEEE International Symposium on Computer Architecture","author":"Eckert Charles","year":"2018","unstructured":"Charles Eckert et\u00a0al. 2018. Neural cache: Bit-serial in-cache acceleration of deep neural networks. In IEEE International Symposium on Computer Architecture. 
383\u2013396."},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.23919\/DATE51398.2021.9473992"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/DATE.2005.170"},{"key":"e_1_3_1_19_2","first-page":"427","volume-title":"IEEE Design, Automation and Test in Europe Conference","author":"Gaillardon Pierre-Emmanuel","year":"2016","unstructured":"Pierre-Emmanuel Gaillardon et\u00a0al. 2016. The programmable logic-in-memory (PLiM) computer. In IEEE Design, Automation and Test in Europe Conference. 427\u2013432."},{"key":"e_1_3_1_20_2","volume-title":"Computer Architecture: A Quantitative Approach, 6th Edition","author":"Hennessy John L.","year":"2018","unstructured":"John L. Hennessy and David Patterson. 2018. Computer Architecture: A Quantitative Approach, 6th Edition. Elsevier."},{"key":"e_1_3_1_21_2","volume-title":"IEEE International Solid-State Circuits Conference","author":"Horowitz Mark","year":"2014","unstructured":"Mark Horowitz. 2014. 1.1 Computing\u2019s energy problem (and what we can do about it). In IEEE International Solid-State Circuits Conference."},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.3390\/jlpea10040030"},{"key":"e_1_3_1_23_2","volume-title":"Analysing and Supporting the Reliability Decision-making Process in Computing Systems with a Reliability Evaluation Framework","author":"Kooli Maha","year":"2016","unstructured":"Maha Kooli. 2016. Analysing and Supporting the Reliability Decision-making Process in Computing Systems with a Reliability Evaluation Framework. Thesis. Universit\u00e9 Montpellier."},{"key":"e_1_3_1_24_2","volume-title":"IEEE\/ACM RSP-ESWEEK","author":"Kooli Maha","year":"2017","unstructured":"Maha Kooli et\u00a0al. 2017. Software Platform Dedicated for In-Memory Computing Circuit Evaluation. 
In IEEE\/ACM RSP-ESWEEK."},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.23919\/DATE.2018.8342276"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/TVLSI.2013.2282132"},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSII.2014.2357292"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2004.1281665"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1145\/2897937.2898064"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1145\/3316781.3323476"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.23919\/DATE48585.2020.9116506"},{"key":"e_1_3_1_32_2","first-page":"286","article-title":"A 35.6 TOPS\/W\/mm2 3-stage pipelined computational SRAM with adjustable form factor for highly data-centric applications","volume":"3","author":"Noel J.-P.","year":"2020","unstructured":"J.-P. Noel, M. Pezzin, R. Gauchi, J.-F. Christmann, M. Kooli, H.-P. Charles, L. Ciampolini, M. Diallo, F. Lepin, B. Blampey et\u00a0al. 2020. A 35.6 TOPS\/W\/mm2 3-stage pipelined computational SRAM with adjustable form factor for highly data-centric applications. IEEE Solid-State Circuits Lett. 3 (2020), 286\u2013289.","journal-title":"IEEE Solid-State Circuits Lett."},{"key":"e_1_3_1_33_2","unstructured":"P. Prinz T. Crawford J. L. Hennessy and D. A. Patterson. 2018. Computer Architecture: A Quantitative Approach (6th Edition)."},{"key":"e_1_3_1_34_2","doi-asserted-by":"crossref","first-page":"121","DOI":"10.1145\/3370748.3406550","volume-title":"ACM\/IEEE International Symposium on Low Power Electronics and Design (ISLPED)","author":"Gauchi Roman","year":"2020","unstructured":"Roman Gauchi et\u00a0al. 2020. Reconfigurable tiles of computing-in-memory SRAM architecture for scalable vectorization. In ACM\/IEEE International Symposium on Low Power Electronics and Design (ISLPED). 
121\u2013126."},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00474"},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41565-020-0655-z"},{"key":"e_1_3_1_37_2","doi-asserted-by":"crossref","first-page":"273","DOI":"10.1145\/3123939.3124544","volume-title":"IEEE\/ACM International Symposium on Microarchitecture","author":"Seshadri Vivek","year":"2017","unstructured":"Vivek Seshadri et\u00a0al. 2017. Ambit: In-memory accelerator for bulk bitwise operations using commodity DRAM technology. In IEEE\/ACM International Symposium on Microarchitecture. 273\u2013287."},{"key":"e_1_3_1_38_2","doi-asserted-by":"crossref","first-page":"295","DOI":"10.1145\/2989081.2989087","volume-title":"2nd International Symposium on Memory Systems","author":"Siegl Patrick","year":"2016","unstructured":"Patrick Siegl et\u00a0al. 2016. Data-centric computing frontiers: A survey on processing-in-memory. In 2nd International Symposium on Memory Systems. Association for Computing Machinery, New York, NY, 295\u2013308."},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSTQE.2010.2051217"},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISSCC.2019.8662419"},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/TVLSI.2018.2882194"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISVLSI.2017.39"},{"issue":"1","key":"e_1_3_1_43_2","first-page":"203","article-title":"Embedded 1-Mb ReRAM-based computing-in-memory macro with multibit input and weight for CNN-based AI edge processors","volume":"55","author":"Xue Cheng-Xin","year":"2019","unstructured":"Cheng-Xin Xue et\u00a0al. 2019. Embedded 1-Mb ReRAM-based computing-in-memory macro with multibit input and weight for CNN-based AI edge processors. IEEE JSSCC J. 
55, 1 (2019), 203\u2013215.","journal-title":"IEEE JSSCC J."},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.1145\/2964284.2967243"},{"key":"e_1_3_1_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSSC.2017.2776302"}],"container-title":["ACM Journal on Emerging Technologies in Computing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3485823","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3485823","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:11:43Z","timestamp":1750191103000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3485823"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,4,28]]},"references-count":44,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2022,4,30]]}},"alternative-id":["10.1145\/3485823"],"URL":"https:\/\/doi.org\/10.1145\/3485823","relation":{},"ISSN":["1550-4832","1550-4840"],"issn-type":[{"value":"1550-4832","type":"print"},{"value":"1550-4840","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,4,28]]},"assertion":[{"value":"2020-09-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-09-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-04-28","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}