{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T14:20:10Z","timestamp":1773843610561,"version":"3.50.1"},"reference-count":46,"publisher":"Association for Computing Machinery (ACM)","issue":"5s","license":[{"start":{"date-parts":[[2021,9,17]],"date-time":"2021-09-17T00:00:00Z","timestamp":1631836800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000028","name":"Semiconductor Research Corporation","doi-asserted-by":"crossref","id":[{"id":"10.13039\/100000028","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Embed. Comput. Syst."],"published-print":{"date-parts":[[2021,10,31]]},"abstract":"<jats:p>In-memory computing (IMC) on a monolithic chip for deep learning faces dramatic challenges on area, yield, and on-chip interconnection cost due to the ever-increasing model sizes. 2.5D integration or chiplet-based architectures interconnect multiple small chips (i.e., chiplets) to form a large computing system, presenting a feasible solution beyond a monolithic IMC architecture to accelerate large deep learning models. This paper presents a new benchmarking simulator, SIAM, to evaluate the performance of chiplet-based IMC architectures and explore the potential of such a paradigm shift in IMC architecture design. SIAM integrates device, circuit, architecture, network-on-chip (NoC), network-on-package (NoP), and DRAM access models to realize an end-to-end system. SIAM is scalable in its support of a wide range of deep neural networks (DNNs), customizable to various network structures and configurations, and capable of efficient design space exploration. We demonstrate the flexibility, scalability, and simulation speed of SIAM by benchmarking different state-of-the-art DNNs with CIFAR-10, CIFAR-100, and ImageNet datasets. We further calibrate the simulation results with a published silicon result, SIMBA. 
The chiplet-based IMC architecture obtained through SIAM shows 130\u00d7 and 72\u00d7 improvement in energy-efficiency for ResNet-50 on the ImageNet dataset compared to Nvidia V100 and T4 GPUs.<\/jats:p>","DOI":"10.1145\/3476999","type":"journal-article","created":{"date-parts":[[2021,9,17]],"date-time":"2021-09-17T18:36:51Z","timestamp":1631903811000},"page":"1-24","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":59,"title":["SIAM: Chiplet-based Scalable In-Memory Acceleration with Mesh for Deep Neural Networks"],"prefix":"10.1145","volume":"20","author":[{"given":"Gokul","family":"Krishnan","sequence":"first","affiliation":[{"name":"Arizona State University, Tempe, AZ, USA"}]},{"given":"Sumit K.","family":"Mandal","sequence":"additional","affiliation":[{"name":"University of Wisconsin-Madison, Madison, WI, USA"}]},{"given":"Manvitha","family":"Pannala","sequence":"additional","affiliation":[{"name":"Arizona State University, Tempe, AZ, USA"}]},{"given":"Chaitali","family":"Chakrabarti","sequence":"additional","affiliation":[{"name":"Arizona State University, Tempe, AZ, USA"}]},{"given":"Jae-Sun","family":"Seo","sequence":"additional","affiliation":[{"name":"Arizona State University, Tempe, AZ, USA"}]},{"given":"Umit Y.","family":"Ogras","sequence":"additional","affiliation":[{"name":"University of Wisconsin-Madison, Madison, WI, USA"}]},{"given":"Yu","family":"Cao","sequence":"additional","affiliation":[{"name":"Arizona State University, Tempe, AZ, USA"}]}],"member":"320","published-online":{"date-parts":[[2021,9,17]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Fabrication Cost. https:\/\/anysilicon.com\/die-per-wafer-formula-free-calculators\/. Accessed","year":"2021","unstructured":"AnySilicon. 2011. Fabrication Cost. https:\/\/anysilicon.com\/die-per-wafer-formula-free-calculators\/. Accessed 29 Mar. 2021."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/3323439.3323989"},{"key":"e_1_2_1_3_1","volume-title":"\u2018Zeppelin\u2019: An SoC for multichip architectures. In 2018 IEEE ISSCC","author":"Beck Noah","unstructured":"Noah Beck, Sean White, Milam Paraschou, and Samuel Naffziger. 2018. \u2018Zeppelin\u2019: An SoC for multichip architectures. In 2018 IEEE ISSCC. IEEE, 40\u201342."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/2024716.2024718"},{"key":"e_1_2_1_5_1","volume-title":"Explaining how a deep neural network trained with end-to-end learning steers a car. arXiv preprint arXiv:1704.07911","author":"\u00a0al Mariusz Bojarski","year":"2017","unstructured":"Mariusz Bojarski et\u00a0al. 2017. Explaining how a deep neural network trained with end-to-end learning steers a car. arXiv preprint arXiv:1704.07911 (2017)."},
{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.5555\/3437539.3437559"},{"key":"e_1_2_1_7_1","volume-title":"2018","author":"\u00a0al Marc Erett","unstructured":"Marc Erett et\u00a0al. 2018. A 126mW 56Gb\/s NRZ wireline transceiver for synchronous short-reach applications in 16nm FinFET. In 2018 IEEE ISSCC. IEEE, 274\u2013276."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/3224419"},{"key":"e_1_2_1_9_1","volume-title":"2017 IEEE ISSCC","author":"\u00a0al David Greenhill","unstructured":"David Greenhill et\u00a0al. 2017. 3.3 A 14nm 1GHz FPGA with 2.5 D transceiver integration. In 2017 IEEE ISSCC. IEEE."},{"key":"e_1_2_1_10_1","volume-title":"Deep residual learning for image recognition","author":"\u00a0al Kaiming He","unstructured":"Kaiming He et\u00a0al. 2016. Deep residual learning for image recognition. In IEEE CVPR. 770\u2013778."},{"key":"e_1_2_1_11_1","volume-title":"Proceedings of the IEEE\/CVF ICCV. 1314\u20131324","author":"\u00a0al Andrew Howard","year":"2019","unstructured":"Andrew Howard et\u00a0al. 2019. Searching for mobilenetv3. In Proceedings of the IEEE\/CVF ICCV. 1314\u20131324."},{"key":"e_1_2_1_12_1","volume-title":"Densely connected convolutional networks","author":"\u00a0al Gao Huang","unstructured":"Gao Huang et\u00a0al. 2017. Densely connected convolutional networks. In IEEE CVPR. 4700\u20134708."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA45697.2020.00083"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/3307650.3322237"},{"key":"e_1_2_1_15_1","volume-title":"RxNN: A framework for evaluating deep neural networks on resistive crossbars","author":"Jain Shubham","year":"2020","unstructured":"Shubham Jain, Abhronil Sengupta, Kaushik Roy, and Anand Raghunathan. 2020. RxNN: A framework for evaluating deep neural networks on resistive crossbars. IEEE TCAD (2020)."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.5555\/3050856"},{"key":"e_1_2_1_17_1","volume-title":"A detailed and flexible cycle-accurate network-on-chip simulator","author":"\u00a0al Nan Jiang","unstructured":"Nan Jiang et\u00a0al. 2013. A detailed and flexible cycle-accurate network-on-chip simulator. In IEEE ISPASS. 86\u201396."},
{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/2830772.2830808"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/LCA.2015.2414456"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/MDAT.2020.3001559"},{"key":"e_1_2_1_21_1","volume-title":"2021 China Semiconductor Technology International Conference (CSTIC). IEEE, 1\u20134.","author":"\u00a0al Gokul Krishnan","year":"2021","unstructured":"Gokul Krishnan et\u00a0al. 2021. Interconnect-centric benchmarking of in-memory acceleration for DNNS. In 2021 China Semiconductor Technology International Conference (CSTIC). IEEE, 1\u20134."},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/JSSC.2019.2960207"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/JETCAS.2020.3015509"},{"key":"e_1_2_1_24_1","volume-title":"Energy-efficient networks-on-chip architectures: Design and run-time optimization. Network-on-Chip Security and Privacy","author":"Mandal Sumit K","year":"2021","unstructured":"Sumit K Mandal, Anish Krishnakumar, and Umit Y Ogras. 2021. Energy-efficient networks-on-chip architectures: Design and run-time optimization. Network-on-Chip Security and Privacy (2021), 55."},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2008.2010691"},{"key":"e_1_2_1_26_1","volume-title":"Datasheet for DDR3 Model. https:\/\/media-www.micron.com\/-\/media\/client\/global\/documents\/products\/data-sheet\/dram\/ddr3\/2gb_ddr3l-rs.pdf?rev=f43686e89394458caff410138d9d2152. Accessed","author":"MICRON.","year":"2021","unstructured":"MICRON. 2011. Datasheet for DDR3 Model. https:\/\/media-www.micron.com\/-\/media\/client\/global\/documents\/products\/data-sheet\/dram\/ddr3\/2gb_ddr3l-rs.pdf?rev=f43686e89394458caff410138d9d2152. Accessed 29 Mar. 2021."},{"key":"e_1_2_1_27_1","volume-title":"Datasheet for DDR4 Model. https:\/\/www.micron.com\/-\/media\/client\/global\/documents\/products\/data-sheet\/dram\/ddr4\/4gb_ddr4_dram_2e0d.pdf. Accessed","author":"MICRON.","year":"2021","unstructured":"MICRON. 2014. Datasheet for DDR4 Model. https:\/\/www.micron.com\/-\/media\/client\/global\/documents\/products\/data-sheet\/dram\/ddr4\/4gb_ddr4_dram_2e0d.pdf. Accessed 29 Mar. 2021."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/JETCAS.2020.3022920"},{"key":"e_1_2_1_29_1","volume-title":"An end-to-end benchmarking framework for compute-in-memory accelerators with versatile device technologies. In 2019 IEEE IEDM","author":"Peng Xiaochen","unstructured":"Xiaochen Peng, Shanshi Huang, Yandong Luo, Xiaoyu Sun, and Shimeng Yu. 2019. DNN+ neurosim: An end-to-end benchmarking framework for compute-in-memory accelerators with versatile device technologies. In 2019 IEEE IEDM. IEEE, 32\u20135."},
{"key":"e_1_2_1_30_1","volume-title":"2013","author":"\u00a0al John W","unstructured":"John W Poulton et\u00a0al. 2013. A 0.54 pJ\/b 20Gb\/s ground-referenced single-ended short-haul serial link in 28nm CMOS for advanced packaging applications. In 2013 IEEE ISSCC. IEEE, 404\u2013405."},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.5555\/3338075.3338083"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/PerComWorkshops51409.2021.9431078"},{"key":"e_1_2_1_33_1","volume-title":"Scale-sim: Systolic CNN accelerator simulator. arXiv preprint arXiv:1811.02883","author":"Samajdar Ananda","year":"2018","unstructured":"Ananda Samajdar, Yuhao Zhu, Paul Whatmough, Matthew Mattina, and Tushar Krishna. 2018. Scale-sim: Systolic CNN accelerator simulator. arXiv preprint arXiv:1811.02883 (2018)."},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/3007787.3001139"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/3352460.3358302"},{"key":"e_1_2_1_36_1","volume-title":"Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556","author":"Simonyan Karen","year":"2014","unstructured":"Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)."},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/2228360.2228414"},{"key":"e_1_2_1_38_1","volume-title":"Pipelayer: A pipelined reram-based accelerator for deep learning","author":"\u00a0al Linghao Song","year":"2017","unstructured":"Linghao Song et\u00a0al. 2017. Pipelayer: A pipelined reram-based accelerator for deep learning. In IEEE HPCA. 541\u2013552."},{"key":"e_1_2_1_39_1","doi-asserted-by":"crossref","unstructured":"Simon M Tam et\u00a0al. 2018. SkyLake-SP: A 14nm 28-core xeon\u00ae Processor. In 2018 IEEE ISSCC. 34\u201336.","DOI":"10.1109\/ISSCC.2018.8310170"},{"key":"e_1_2_1_40_1","volume-title":"Ground-referenced signaling for intra-chip and short-reach chip-to-chip interconnects. In 2018 IEEE CICC","author":"\u00a0al Walker J","unstructured":"Walker J Turner et\u00a0al. 2018. Ground-referenced signaling for intra-chip and short-reach chip-to-chip interconnects. In 2018 IEEE CICC. IEEE, 1\u20138."},
{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00137"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2018.00066"},{"key":"e_1_2_1_43_1","first-page":"48","article-title":"Vesti: Energy-efficient in-memory computing accelerator for deep neural networks","volume":"28","author":"\u00a0al Shihui Yin","year":"2019","unstructured":"Shihui Yin et\u00a0al. 2019. Vesti: Energy-efficient in-memory computing accelerator for deep neural networks. IEEE TVLSI 28, 1 (2019), 48\u201361.","journal-title":"IEEE TVLSI"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/3386263.3407647"},{"key":"e_1_2_1_45_1","volume-title":"2019 Symposium on VLSI Circuits. IEEE, C300\u2013C301","author":"\u00a0al Brian Zimmer","year":"2019","unstructured":"Brian Zimmer et\u00a0al. 2019. A 0.11 PJ\/Op, 0.32-128 TOPS, scalable multi-chip-module-based deep neural network accelerator with ground-reference signaling in 16nm. In 2019 Symposium on VLSI Circuits. IEEE, C300\u2013C301."},{"key":"e_1_2_1_46_1","volume-title":"Neural architecture search with reinforcement learning. arXiv preprint arXiv:1611.01578","author":"Zoph Barret","year":"2016","unstructured":"Barret Zoph and Quoc V Le. 2016. Neural architecture search with reinforcement learning. arXiv preprint arXiv:1611.01578 (2016)."}],"container-title":["ACM Transactions on Embedded Computing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3476999","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3476999","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:30:46Z","timestamp":1750188646000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3476999"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,9,17]]},"references-count":46,"journal-issue":{"issue":"5s","published-print":{"date-parts":[[2021,10,31]]}},"alternative-id":["10.1145\/3476999"],"URL":"https:\/\/doi.org\/10.1145\/3476999","relation":{},"ISSN":["1539-9087","1558-3465"],"issn-type":[{"value":"1539-9087","type":"print"},{"value":"1558-3465","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,9,17]]},"assertion":[{"value":"2021-04-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-07-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-09-17","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}