{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,30]],"date-time":"2026-01-30T00:34:19Z","timestamp":1769733259805,"version":"3.49.0"},"reference-count":36,"publisher":"Association for Computing Machinery (ACM)","issue":"6","license":[{"start":{"date-parts":[[2021,6,28]],"date-time":"2021-06-28T00:00:00Z","timestamp":1624838400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Des. Autom. Electron. Syst."],"published-print":{"date-parts":[[2021,11,30]]},"abstract":"<jats:p>Compute-in-memory (CIM) is an attractive solution to address the \u201cmemory wall\u201d challenges for the extensive computation in deep learning hardware accelerators. For custom ASIC design, a specific chip instance is restricted to a specific network during runtime. However, the development cycle of the hardware is normally far behind the emergence of new algorithms. Although some of the reported CIM-based architectures can adapt to different deep neural network (DNN) models, few details about the dataflow or control were disclosed to enable such an assumption. Instruction set architecture (ISA) could support high flexibility, but its complexity would be an obstacle to efficiency. In this article, a runtime reconfigurable design methodology of CIM-based accelerators is proposed to support a class of convolutional neural networks running on one prefabricated chip instance with ASIC-like efficiency. First, several design aspects are investigated: (1) the reconfigurable weight mapping method; (2) the input side of data transmission, mainly about the weight reloading; and (3) the output side of data processing, mainly about the reconfigurable accumulation. 
Then, a system-level performance benchmark is performed for the inference of different DNN models, such as VGG-8 on a CIFAR-10 dataset and AlexNet, GoogLeNet, ResNet-18, and DenseNet-121 on an ImageNet dataset to measure the trade-offs between runtime reconfigurability, chip area, memory utilization, throughput, and energy efficiency.<\/jats:p>","DOI":"10.1145\/3460436","type":"journal-article","created":{"date-parts":[[2021,6,28]],"date-time":"2021-06-28T17:06:56Z","timestamp":1624900016000},"page":"1-18","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":11,"title":["A Runtime Reconfigurable Design of Compute-in-Memory\u2013Based Hardware Accelerator for Deep Learning Inference"],"prefix":"10.1145","volume":"26","author":[{"given":"Anni","family":"Lu","sequence":"first","affiliation":[{"name":"Georgia Institute of Technology, Atlanta, GA, US"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xiaochen","family":"Peng","sequence":"additional","affiliation":[{"name":"Georgia Institute of Technology, Atlanta, GA, US"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yandong","family":"Luo","sequence":"additional","affiliation":[{"name":"Georgia Institute of Technology, Atlanta, GA, US"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shanshi","family":"Huang","sequence":"additional","affiliation":[{"name":"Georgia Institute of Technology, Atlanta, GA, US"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shimeng","family":"Yu","sequence":"additional","affiliation":[{"name":"Georgia Institute of Technology, Atlanta, GA, 
US"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2021,6,28]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2020.2976475"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISSCC19947.2020.9062985"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISSCC19947.2020.9063078"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/TED.2015.2439635"},{"key":"e_1_2_1_5_1","first-page":"3","article-title":"2020. Drain-erase scheme in ferroelectric field-effect transistor\u2014Part I: Device characterization","volume":"67","author":"Wang P.","year":"2020","unstructured":"P. Wang 2020. Drain-erase scheme in ferroelectric field-effect transistor\u2014Part I: Device characterization . IEEE Transactions on Electron Devices 67 , 3 ( 2020 ), 955\u2013961. P. Wang et al. 2020. Drain-erase scheme in ferroelectric field-effect transistor\u2014Part I: Device characterization. IEEE Transactions on Electron Devices 67, 3 (2020), 955\u2013961.","journal-title":"IEEE Transactions on Electron Devices"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.5555\/2999134.2999257"},{"key":"e_1_2_1_7_1","volume-title":"International Conference for Learning Representations (ICLR\u201915)","author":"Simonyan K.","unstructured":"K. Simonyan and A. Zisserman . 2015. Very deep convolutional networks for large-scale image recognition . In International Conference for Learning Representations (ICLR\u201915) , San Diego, CA, USA. 1--14. K. Simonyan and A. Zisserman. 2015. Very deep convolutional networks for large-scale image recognition. In International Conference for Learning Representations (ICLR\u201915), San Diego, CA, USA. 1--14."},{"key":"e_1_2_1_8_1","volume-title":"IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201915)","unstructured":"Szegedy Christian et al. 2015. Going deeper with convolutions . 
In IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201915) , Boston, MA, USA. 1--9. Szegedy Christian et al. 2015. Going deeper with convolutions. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201915), Boston, MA, USA. 1--9."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.243"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00474"},{"key":"e_1_2_1_12_1","volume-title":"International Conference on Machine Learning (ICML\u201919)","author":"Tan M.","year":"2019","unstructured":"M. Tan 2019 . EfficientNet: Rethinking model scaling for convolutional neural networks . International Conference on Machine Learning (ICML\u201919) , Long Beach, CA, USA. 6105--6114. M. Tan et al. 2019. EfficientNet: Rethinking model scaling for convolutional neural networks. International Conference on Machine Learning (ICML\u201919), Long Beach, CA, USA. 
6105--6114."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/3007787.3001139"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2017.55"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/3307650.3322271"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/3352460.3358328"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3007787.3001179"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2018.00015"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/3296957.3173171"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/3297858.3304049"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2020.2998456"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/IEDM19573.2019.8993491"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2020.3026667"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISSCC.2018.8310401"},{"key":"e_1_2_1_25_1","volume-title":"IEEE International Electron Devices Meeting (IEDM\u201916)","author":"Trentzsch M.","year":"2016","unstructured":"M. Trentzsch 2016 . A 28 nm HKMG super low power embedded NVM technology based on ferroelectric FETs . In IEEE International Electron Devices Meeting (IEDM\u201916) , San Francisco, CA, USA. 11.5. M. Trentzsch et al. 2016. A 28 nm HKMG super low power embedded NVM technology based on ferroelectric FETs. In IEEE International Electron Devices Meeting (IEDM\u201916), San Francisco, CA, USA. 11.5."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.3389\/fnins.2017.00538"},{"key":"e_1_2_1_27_1","first-page":"4","article-title":"2019. Optimizing weight mapping and data flow for convolutional neural networks on processing-in-memory architectures","volume":"67","author":"Peng X.","year":"2019","unstructured":"X. Peng 2019. 
Optimizing weight mapping and data flow for convolutional neural networks on processing-in-memory architectures . IEEE Transactions on Circuits and Systems I: Regular Papers 67 , 4 ( 2019 ), 1333\u20131343. X. Peng et al. 2019. Optimizing weight mapping and data flow for convolutional neural networks on processing-in-memory architectures. IEEE Transactions on Circuits and Systems I: Regular Papers 67, 4 (2019), 1333\u20131343.","journal-title":"IEEE Transactions on Circuits and Systems I: Regular Papers"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/TVLSI.2020.3001526"},{"key":"e_1_2_1_29_1","volume-title":"JEDEC Mobile Forum","author":"Skinner D.","unstructured":"D. Skinner . 2013. LPDDR4 moves mobile . In JEDEC Mobile Forum , Santa Clara, CA, USA . D. Skinner. 2013. LPDDR4 moves mobile. In JEDEC Mobile Forum, Santa Clara, CA, USA."},{"key":"e_1_2_1_30_1","first-page":"4","article-title":"2020. Device-circuit-architecture co-exploration for computing-in-memory neural accelerators","volume":"70","author":"Jiang W.","year":"2020","unstructured":"W. Jiang 2020. Device-circuit-architecture co-exploration for computing-in-memory neural accelerators . IEEE Transactions on Computers 70 , 4 ( 2020 ), 595--605. W. Jiang et al. 2020. Device-circuit-architecture co-exploration for computing-in-memory neural accelerators. IEEE Transactions on Computers 70, 4 (2020), 595--605.","journal-title":"IEEE Transactions on Computers"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCAD45719.2019.8942065"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1049\/iet-cdt.2016.0095"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISSCC.2018.8310262"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISSCC.2019.8662302"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/3289602.3293970"},{"key":"e_1_2_1_36_1","unstructured":"NVIDIA. 2020. NVIDIA A100 tensor core GPU. 
Retrieved 2021 from https:\/\/www.nvidia.com\/en-us\/data-center\/a100\/.  NVIDIA. 2020. NVIDIA A100 tensor core GPU. Retrieved 2021 from https:\/\/www.nvidia.com\/en-us\/data-center\/a100\/."}],"container-title":["ACM Transactions on Design Automation of Electronic Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3460436","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3460436","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:17:04Z","timestamp":1750191424000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3460436"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,6,28]]},"references-count":36,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2021,11,30]]}},"alternative-id":["10.1145\/3460436"],"URL":"https:\/\/doi.org\/10.1145\/3460436","relation":{},"ISSN":["1084-4309","1557-7309"],"issn-type":[{"value":"1084-4309","type":"print"},{"value":"1557-7309","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,6,28]]},"assertion":[{"value":"2021-10-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-04-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-06-28","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}