{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,16]],"date-time":"2026-05-16T16:53:22Z","timestamp":1778950402950,"version":"3.51.4"},"reference-count":59,"publisher":"Association for Computing Machinery (ACM)","issue":"6","license":[{"start":{"date-parts":[[2024,9,18]],"date-time":"2024-09-18T00:00:00Z","timestamp":1726617600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Des. Autom. Electron. Syst."],"published-print":{"date-parts":[[2024,11,30]]},"abstract":"<jats:p>\n            Augmented Reality and Virtual Reality have emerged as the next frontier of intelligent image sensors and computer systems. In these systems, 3D die stacking stands out as a compelling solution, enabling\n            <jats:italic>in situ<\/jats:italic>\n            processing capability of the sensory data for tasks such as image classification and object detection at low power, low latency, and a small form factor. These intelligent 3D CMOS Image Sensor (CIS) systems present a wide design space, encompassing multiple domains (e.g., computer vision algorithms, circuit design, system architecture, and semiconductor technology, including 3D stacking) that have not been explored in-depth so far. This article aims to fill this gap. We first present an analytical evaluation framework, STAR-3DSim, dedicated to rapid pre-RTL evaluation of 3D-CIS systems capturing the entire stack from the pixel layer to the on-sensor processor layer. With STAR-3DSim, we then propose several knobs for PPA (power, performance, area) improvement of the Deep Neural Network (DNN) accelerator that can provide up to 53%, 41%, and 63% reduction in energy, latency, and area, respectively, across a broad set of relevant AR\/VR workloads. Last, we present full-system evaluation results by taking image sensing, cross-tier data transfer, and off-sensor communication into consideration.\n          <\/jats:p>","DOI":"10.1145\/3670404","type":"journal-article","created":{"date-parts":[[2024,6,7]],"date-time":"2024-06-07T11:42:34Z","timestamp":1717760554000},"page":"1-27","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["Estimating Power, Performance, and Area for On-Sensor Deployment of AR\/VR Workloads Using an Analytical Framework"],"prefix":"10.1145","volume":"29","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-5337-5680","authenticated-orcid":false,"given":"Xiaoyu","family":"Sun","sequence":"first","affiliation":[{"name":"Taiwan Semiconductor Manufacturing Company North America, San Jose, United States"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6148-7711","authenticated-orcid":false,"given":"Xiaochen","family":"Peng","sequence":"additional","affiliation":[{"name":"Taiwan Semiconductor Manufacturing Company North America, San Jose, United States"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4815-9235","authenticated-orcid":false,"given":"Sai Qian","family":"Zhang","sequence":"additional","affiliation":[{"name":"Meta Reality Labs, Redmond, United States"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7918-4655","authenticated-orcid":false,"given":"Jorge","family":"Gomez","sequence":"additional","affiliation":[{"name":"Meta Reality Labs, Redmond, United States"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6283-3564","authenticated-orcid":false,"given":"Win-San","family":"Khwa","sequence":"additional","affiliation":[{"name":"Taiwan Semiconductor Manufacturing Co Ltd, Hsinchu, Taiwan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0086-1076","authenticated-orcid":false,"given":"Syed Shakib","family":"Sarwar","sequence":"additional","affiliation":[{"name":"Meta Reality Labs, Redmond, United States"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6070-6310","authenticated-orcid":false,"given":"Ziyun","family":"Li","sequence":"additional","affiliation":[{"name":"Meta Reality Labs, Redmond, United States"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7539-8250","authenticated-orcid":false,"given":"Weidong","family":"Cao","sequence":"additional","affiliation":[{"name":"The George Washington University, Washington, United States"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-9733-7941","authenticated-orcid":false,"given":"Zhao","family":"Wang","sequence":"additional","affiliation":[{"name":"Meta Reality Labs, Redmond, United States"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0358-1165","authenticated-orcid":false,"given":"Chiao","family":"Liu","sequence":"additional","affiliation":[{"name":"Meta Reality Labs, Redmond, United States"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6905-6350","authenticated-orcid":false,"given":"Meng-Fan","family":"Chang","sequence":"additional","affiliation":[{"name":"Taiwan Semiconductor Manufacturing Co Ltd, Hsinchu, Taiwan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0810-9903","authenticated-orcid":false,"given":"Barbara","family":"De Salvo","sequence":"additional","affiliation":[{"name":"Meta Reality Labs, Redmond, United States"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5957-826X","authenticated-orcid":false,"given":"Kerem","family":"Akarvardar","sequence":"additional","affiliation":[{"name":"Taiwan Semiconductor Manufacturing Company North America, San Jose, United States"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0096-1472","authenticated-orcid":false,"given":"H.-S. Philip","family":"Wong","sequence":"additional","affiliation":[{"name":"Taiwan Semiconductor Manufacturing Company North America, San Jose, United States"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2024,9,18]]},"reference":[{"key":"e_1_3_1_2_2","article-title":"Creating the future: Augmented reality, the next human-machine interface","author":"Abrash M.","year":"2021","unstructured":"M. Abrash. 2021. Creating the future: Augmented reality, the next human-machine interface. In IEEE International Electron Devices Meeting (IEDM\u201921).","journal-title":"IEEE International Electron Devices Meeting (IEDM\u201921)"},{"issue":"15","key":"e_1_3_1_3_2","doi-asserted-by":"crossref","DOI":"10.3390\/s21155197","article-title":"A 3.0 Gsymbol\/s\/lane MIPI C-PHY receiver with adaptive level-dependent equalizer for mobile CMOS image sensor","volume":"21","author":"Choi S.","year":"2021","unstructured":"S. Choi, C. Song, and Y.-C. Jang. 2021. A 3.0 Gsymbol\/s\/lane MIPI C-PHY receiver with adaptive level-dependent equalizer for mobile CMOS image sensor. Sensors 21, 15 (2021).","journal-title":"Sensors"},{"key":"e_1_3_1_4_2","article-title":"Augmented reality-the next frontier of image sensors and compute systems","author":"Liu C.","year":"2022","unstructured":"C. Liu, S. Chen, T.-H. Tsai, B. D. Salvo, and J. Gomez. 2022. Augmented reality-the next frontier of image sensors and compute systems. In IEEE International Solid-State Circuits Conference (ISSCC\u201922).","journal-title":"IEEE International Solid-State Circuits Conference (ISSCC\u201922)"},{"key":"e_1_3_1_5_2","unstructured":"J. Gomez S. Patel S. S. Sarwar Z. Li R. Capoccia Z. Wang R. Pinkham A. Berkovich T.-H. Tsai B. D. Salvo and C. Liu. 2022. Distributed on-sensor compute system for AR\/VR devices: A semi-analytical simulation framework for power estimation. arXiv preprint arXiv:2203.07474."},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSEN.2018.2817632"},{"key":"e_1_3_1_7_2","article-title":"3D stacked high throughput pixel parallel image sensor with integrated ReRAM based neural accelerator","author":"Amir M. F.","year":"2018","unstructured":"M. F. Amir and S. Mukhopadhyay. 2018. 3D stacked high throughput pixel parallel image sensor with integrated ReRAM based neural accelerator. In IEEE SOI-3D-Subthreshold Microelectronics Technology Unified Conference (S3S\u201918).","journal-title":"IEEE SOI-3D-Subthreshold Microelectronics Technology Unified Conference (S3S\u201918)"},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/JETCAS.2021.3121259"},{"key":"e_1_3_1_9_2","unstructured":"Sony. 2020. Sony to release world's first intelligent vision sensors with AI processing functionality. Retrieved from https:\/\/www.sony.com\/en\/SonyInfo\/News\/Press\/202005\/20-037E\/"},{"key":"e_1_3_1_10_2","volume-title":"Symposia on VLSI technology and Circuits","author":"Beyne E.","year":"2020","unstructured":"E. Beyne. 2020. Heterogeneous system partitioning and the 3D interconnect technology landscape. In Symposia on VLSI technology and Circuits."},{"key":"e_1_3_1_11_2","article-title":"A 4.6 \u03bcm, 512\u00d7 512, ultra-low power stacked digital pixel sensor with triple quantization and 127dB dynamic range","author":"Liu C.","year":"2020","unstructured":"C. Liu, L. Bainbridge, A. Berkovich, S. Chen, W. Gao, T.-H. Tsai, K. Mori, R. Ikeno, M. Uno, T. Isozaki, Y.-L. Tsai, I. Takayanagi, and J. Nakamura. 2020. A 4.6 \u03bcm, 512\u00d7 512, ultra-low power stacked digital pixel sensor with triple quantization and 127dB dynamic range. In IEEE International Electron Devices Meeting (IEDM\u201920).","journal-title":"IEEE International Electron Devices Meeting (IEDM\u201920)"},{"key":"e_1_3_1_12_2","article-title":"3D integration technologies for the stacked CMOS image sensors","author":"Kagawa Y.","year":"2019","unstructured":"Y. Kagawa and H. Iwamoto. 2019. 3D integration technologies for the stacked CMOS image sensors. In International 3D Systems Integration Conference (3DIC\u201919).","journal-title":"International 3D Systems Integration Conference (3DIC\u201919)"},{"key":"e_1_3_1_13_2","article-title":"A 6.9 \u03bcm pixel-pitch 3D stacked global shutter CMOS image sensor with 3M Cu-Cu connections","author":"Miura T.","year":"2019","unstructured":"T. Miura, M. Sakakibara, H. Takahashi, T. Taura, K. Tatani, Y. Oike, and T. Ezaki. 2019. A 6.9 \u03bcm pixel-pitch 3D stacked global shutter CMOS image sensor with 3M Cu-Cu connections. In International 3D Systems Integration Conference (3DIC\u201919).","journal-title":"International 3D Systems Integration Conference (3DIC\u201919)"},{"key":"e_1_3_1_14_2","article-title":"RITnet: Real-time semantic segmentation of the eye for gaze tracking","author":"Chaudhary A. K.","year":"2019","unstructured":"A. K. Chaudhary, R. Kothari, M. Acharya, S. Dangi, N. Nair, R. Bailey, C. Kanan, G. Diaz, and J. B. Pelz. 2019. RITnet: Real-time semantic segmentation of the eye for gaze tracking. In IEEE\/CVF International Conference on Computer Vision Workshop (ICCVW\u201919).","journal-title":"IEEE\/CVF International Conference on Computer Vision Workshop (ICCVW\u201919)"},{"key":"e_1_3_1_15_2","article-title":"U-Net: Convolutional networks for biomedical image segmentation","author":"Ronneberger O.","year":"2015","unstructured":"O. Ronneberger, P. Fischer, and T. Brox. 2015. U-Net: Convolutional networks for biomedical image segmentation. In Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI\u201915).","journal-title":"Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI\u201915)"},{"key":"e_1_3_1_16_2","article-title":"Densely connected convolutional networks","author":"Huang G.","year":"2017","unstructured":"G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger. 2017. Densely connected convolutional networks. In IEEE Conference on Computer Vision and Pattern Recognition.","journal-title":"IEEE Conference on Computer Vision and Pattern Recognition"},{"key":"e_1_3_1_17_2","article-title":"MobileNetV2: Inverted residuals and linear bottlenecks","author":"Sandler M.","year":"2018","unstructured":"M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen. 2018. MobileNetV2: Inverted residuals and linear bottlenecks. In IEEE Conference on Computer Vision and Pattern Recognition.","journal-title":"IEEE Conference on Computer Vision and Pattern Recognition"},{"issue":"4","key":"e_1_3_1_18_2","article-title":"MEgATrack: monochrome egocentric articulated hand-tracking for virtual reality","volume":"39","author":"Han S.","year":"2020","unstructured":"S. Han, B. Liu, R. Cabezas, C. D. Twigg, P. Zhang, J. Petkau, T.-H. Yu, C.-J. Tai, M. Akbay, Z. Wang, A. Nitzan, G. Dong, Y. Ye, L. Tao, C. Wan, and R. Wang. 2020. MEgATrack: monochrome egocentric articulated hand-tracking for virtual reality. ACM Trans. Graph. 39, 4 (2020).","journal-title":"ACM Trans. Graph."},{"key":"e_1_3_1_19_2","article-title":"SOSNet: Second order similarity regularization for local descriptor learning","author":"Tian Y.","year":"2019","unstructured":"Y. Tian, X. Yu, B. Fan, F. Wu, H. Heijnen, and V. Balntas. 2019. SOSNet: Second order similarity regularization for local descriptor learning. In IEEE\/CVF Conference on Computer Vision and Pattern Recognition.","journal-title":"IEEE\/CVF Conference on Computer Vision and Pattern Recognition"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2020.3007421"},{"key":"e_1_3_1_21_2","article-title":"Deep residual learning for image recognition","author":"He K.","year":"2016","unstructured":"K. He, X. Zhang, S. Ren, and Sun Jian. 2016. Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition.","journal-title":"IEEE Conference on Computer Vision and Pattern Recognition"},{"key":"e_1_3_1_22_2","unstructured":"S. Mehta and M. Rastegari. 2021. MobileViT: Light-weight general-purpose and mobile-friendly vision transformer. arXiv:2110.02178."},{"key":"e_1_3_1_23_2","unstructured":"A. Dosovitskiy L. Beyer A. Kolesnikov D. Weissenborn X. Zhai T. Unterthiner M. Dehghani M. Minderer G. Heigold S. Gelly J. Uszkoreit and N. Houlsby. 2020. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv:2010.11929."},{"key":"e_1_3_1_24_2","unstructured":"P. Vasu J. Gabriel J. Zhu O. Tuzel and A. Ranjan. 2023. FastViT: A fast hybrid vision transformer using structural reparameterization. arXiv:2303.14189."},{"key":"e_1_3_1_25_2","article-title":"A systematic methodology for characterizing scalability of DNN accelerators using SCALE-SIM","author":"Samajdar A.","year":"2020","unstructured":"A. Samajdar, J. M. Joseph, Y. Zhu, P. Whatmough, M. Mattina, and T. Krishna. 2020. A systematic methodology for characterizing scalability of DNN accelerators using SCALE-SIM. In IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS\u201920).","journal-title":"IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS\u201920)"},{"key":"e_1_3_1_26_2","article-title":"Efficient processing of MLPerf mobile workloads using digital compute-in-memory macros","author":"Sun X.","year":"2024","unstructured":"X. Sun, W. Cao, B. Crafton, K. Akarvardar, H. Mori, H. Fujiwara, H. Noguchi, Y.-D. Chih, M.-F. Chang, Y. Wang, and T.-Y. J. Chang. 2024. Efficient processing of MLPerf mobile workloads using digital compute-in-memory macros. IEEE Trans. Comput.-Aid. Des. Integ. Circ. Sys. 43, 4 (2024), 1191--1205.","journal-title":"IEEE Trans. Comput.-Aid. Des. Integ. Circ. Sys."},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSSC.2016.2616357"},{"key":"e_1_3_1_28_2","doi-asserted-by":"crossref","DOI":"10.1145\/3079856.3080246","article-title":"In-datacenter performance analysis of a tensor processing unit","author":"Jouppi N. P.","year":"2017","unstructured":"N. P. Jouppi, C. Young, N. Patil, D. Patterson, G. Agrawal, and R. Bajwa. 2017. In-datacenter performance analysis of a tensor processing unit. In 44th Annual International Symposium on Computer Architecture.","journal-title":"44th Annual International Symposium on Computer Architecture"},{"key":"e_1_3_1_29_2","article-title":"Ten lessons from three generations shaped Google's TPUv4i: Industrial product","author":"Jouppi N. P.","year":"2021","unstructured":"N. P. Jouppi, D. H. Yoon, M. Ashcraft, M. Gottscho, T. B. Jablin, and G. Kurian. 2021. Ten lessons from three generations shaped Google's TPUv4i: Industrial product. In ACM\/IEEE 48th Annual International Symposium on Computer Architecture (ISCA\u201921).","journal-title":"ACM\/IEEE 48th Annual International Symposium on Computer Architecture (ISCA\u201921)"},{"key":"e_1_3_1_30_2","article-title":"System-level design and integration of a prototype AR\/VR hardware featuring a custom low-power DNN accelerator chip in 7nm technology for codec avatars","author":"Sumbul E. H.","year":"2022","unstructured":"E. H. Sumbul, T. F. Wu, Y. Li, S. S. Sarwar, W. Koven, E. Murphy-Trotzky, X. Cai, E. Ansari, D. H. Morris, H. Liu, D. Kim, and E. Beigne. 2022. System-level design and integration of a prototype AR\/VR hardware featuring a custom low-power DNN accelerator chip in 7nm technology for codec avatars. In IEEE Custom Integrated Circuits Conference (CICC\u201922).","journal-title":"IEEE Custom Integrated Circuits Conference (CICC\u201922)"},{"key":"e_1_3_1_31_2","doi-asserted-by":"crossref","DOI":"10.1145\/3316781.3317874","article-title":"On-chip memory technology design space explorations for mobile deep neural network accelerators","author":"Li H.","year":"2019","unstructured":"H. Li, M. Bhargava, P. N. Whatmough, and H.-S. P. Wong. 2019. On-chip memory technology design space explorations for mobile deep neural network accelerators. In 56th Annual Design Automation Conference (DAC\u201919).","journal-title":"56th Annual Design Automation Conference (DAC\u201919)"},{"key":"e_1_3_1_32_2","article-title":"A 16nm 32Mb embedded STT-MRAM with a 6ns read access time, 1M-cycle write endurance, 20-year retention at 150\u00b0C and MTJ-OTP solutions for magnetic immunity","author":"Lee P.-H.","year":"2023","unstructured":"P.-H. Lee, C.-F. Lee, Y.-C. Shih, H.-J. Lin, Y.-A. Chang, C.-H. Lu, Y.-L. Chen, C.-P. Lo, C.-C. Chen, C.-H. Kuo, T.-L. Chou, C.-Y. Wang, J. Wu, R. Wang, H. Chuang, Y. Wang, Y.-D. Chih, and T.-Y. J. Chang. 2023. A 16nm 32Mb embedded STT-MRAM with a 6ns read access time, 1M-cycle write endurance, 20-year retention at 150\u00b0C and MTJ-OTP solutions for magnetic immunity. In IEEE International Solid-State Circuits Conference (ISSCC\u201923).","journal-title":"IEEE International Solid-State Circuits Conference (ISSCC\u201923)"},{"key":"e_1_3_1_33_2","volume-title":"12th MRAM Global Innovation Forum","author":"Guedj J.","year":"2021","unstructured":"J. Guedj. 2021. A 2MB, 16nm, sub 5ns reads, MRAM-based memory IP core. In 12th MRAM Global Innovation Forum."},{"key":"e_1_3_1_34_2","article-title":"A 20Mb embedded STT-MRAM array achieving 72% write energy reduction with self-termination write schemes in 16nm FinFET logic process","author":"Ito T.","year":"2021","unstructured":"T. Ito, T. Saito, Y. Taito, K. Sonoda, G. Watanabe, K. Matsubara, A. Kanda, T. Shimoi, K. Takeda, and T. Kono. 2021. A 20Mb embedded STT-MRAM array achieving 72% write energy reduction with self-termination write schemes in 16nm FinFET logic process. In IEEE International Electron Devices Meeting (IEDM\u201921).","journal-title":"IEEE International Electron Devices Meeting (IEDM\u201921)"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/MSSC.2016.2546199"},{"key":"e_1_3_1_36_2","article-title":"28-nm 0.08 mm2\/Mb embedded MRAM for frame buffer memory","author":"Han S.","year":"2020","unstructured":"S. Han, J. Lee, H. Shin, J. Lee, K. Suh, K. Nam, B. Kwon, M. Cho, J. Lee, J. Jeong, J.-H. Park, S. Oh, S.-O. Park, S. Hwang, S. Pyo, H. Jung, Y. Ji, J. Bak, D. Kim, W. Ham, Y. Kim, K. Lee, K. Lee, Y. Song, G.-H. Koh, Y. Hong, and G. Jeong. 2020. 28-nm 0.08 mm2\/Mb embedded MRAM for frame buffer memory. In IEEE International Electron Devices Meeting (IEDM\u201920).","journal-title":"IEEE International Electron Devices Meeting (IEDM\u201920)"},{"key":"e_1_3_1_37_2","article-title":"3D stacked CIS compatible 40nm embedded STT-MRAM for buffer memory","author":"Oka M.","year":"2021","unstructured":"M. Oka, Y. Namba, Y. Sato, H. Uchida, T. Doi, T. Tatsuno, W. Nakazawa, A. Tamura, R. Haga, M. Kuroda, and M. Hosomi. 2021. 3D stacked CIS compatible 40nm embedded STT-MRAM for buffer memory. In Symposium on VLSI Technology.","journal-title":"Symposium on VLSI Technology"},{"issue":"5","key":"e_1_3_1_38_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3358191","article-title":"Achieving lossless accuracy with lossy programming for efficient neural-network training on NVM-based systems","volume":"19","author":"Wang W.-C.","year":"2019","unstructured":"W.-C. Wang, Y.-H. Chang, T.-W. Kuo, C.-C. Ho, Y.-M. Chang, and H.-S. Chang. 2019. Achieving lossless accuracy with lossy programming for efficient neural-network training on NVM-based systems. ACM Trans. Embed. Comput. Syst. 19, 5s (2019), 1\u201322.","journal-title":"ACM Trans. Embed. Comput. Syst."},{"key":"e_1_3_1_39_2","unstructured":"ARM. 2020. Ethos-U55 NPU. Retrieved from https:\/\/developer.arm.com\/Processors\/Ethos-U55"},{"key":"e_1_3_1_40_2","unstructured":"NVIDIA. 2017. NVDLA Primer. Retrieved from http:\/\/nvdla.org\/primer.html"},{"key":"e_1_3_1_41_2","article-title":"FuSeConv: Fully separable convolutions for fast inference on systolic arrays","author":"Selvam S.","year":"2021","unstructured":"S. Selvam, V. Ganesan, and P. Kumar. 2021. FuSeConv: Fully separable convolutions for fast inference on systolic arrays. In Design, Automation & Test in Europe Conference & Exhibition (DATE\u201921).","journal-title":"Design, Automation & Test in Europe Conference & Exhibition (DATE\u201921)"},{"issue":"11","key":"e_1_3_1_42_2","first-page":"2860","article-title":"Heterogeneous systolic array architecture for compact CNNs hardware accelerators","volume":"33","author":"Xu R.","year":"2021","unstructured":"R. Xu, S. Ma, Y. Wang, Y. Guo, D. Li, and Y. Qiao. 2021. Heterogeneous systolic array architecture for compact CNNs hardware accelerators. IEEE Trans. Parallel Distrib. Syst. 33, 11 (2021), 2860\u20132871.","journal-title":"IEEE Trans. Parallel Distrib. Syst."},{"issue":"5","key":"e_1_3_1_43_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3497745","article-title":"MVP: An efficient CNN accelerator with matrix, vector, and processing-near-memory units","volume":"27","author":"Lee S.","year":"2022","unstructured":"S. Lee, J. Choi, W. Jung, B. Kim, J. Park, H. Kim, and J. H. Ahn. 2022. MVP: An efficient CNN accelerator with matrix, vector, and processing-near-memory units. ACM Trans. Des. Automat. Electron. Syst. 27, 5 (2022), 1\u201325.","journal-title":"ACM Trans. Des. Automat. Electron. Syst."},{"key":"e_1_3_1_44_2","article-title":"A 351TOPS\/W and 372.4 GOPS compute-in-memory SRAM macro in 7nm FinFET CMOS for machine-learning applications","author":"Dong Q.","year":"2020","unstructured":"Q. Dong, M. E. Sinangil, B. Erbagci, D. Sun, W.-S. Khwa, H.-J. Liao, Y. Wang, and T.-Y. J. Chang. 2020. A 351TOPS\/W and 372.4 GOPS compute-in-memory SRAM macro in 7nm FinFET CMOS for machine-learning applications. In IEEE International Solid-State Circuits Conference (ISSCC\u201920).","journal-title":"IEEE International Solid-State Circuits Conference (ISSCC\u201920)"},{"key":"e_1_3_1_45_2","article-title":"A 5-nm 254-TOPS\/W 221-TOPS\/mm 2 fully-digital Computing-in-memory macro supporting Wide-range dynamic-voltage-frequency scaling and simultaneous MAC and write operations","author":"Fujiwara H.","year":"2022","unstructured":"H. Fujiwara, H. Mori, W.-C. Zhao, M.-C. Chuang, R. Naous, C.-K. Chuang, T. Hashizume, D. Sun, C.-F. Lee, K. Akarvardar, S. Adham, T.-L. Chou, M. E. Sinangil, Y. Wang, Y.-D. Chih, Y.-H. Chen, H.-J. Liao, and T.-Y. J. Chang. 2022. A 5-nm 254-TOPS\/W 221-TOPS\/mm 2 fully-digital Computing-in-memory macro supporting Wide-range dynamic-voltage-frequency scaling and simultaneous MAC and write operations. In IEEE International Solid-State Circuits Conference (ISSCC\u201922).","journal-title":"IEEE International Solid-State Circuits Conference (ISSCC\u201922)"},{"key":"e_1_3_1_46_2","article-title":"15.4 a 5.99-to-691.1 TOPS\/W tensor-train in-memory-computing processor using bit-level-sparsity-based optimization and variable-precision quantization","author":"Guo R.","year":"2021","unstructured":"R. Guo, Z. Yue, X. Si, T. Hu, H. Li, L. Tang, Y. Wang, L. Liu, M.-F. Chang, Q. Li, S. Wei, and S. Yin. 2021. 15.4 a 5.99-to-691.1 TOPS\/W tensor-train in-memory-computing processor using bit-level-sparsity-based optimization and variable-precision quantization. In IEEE International Solid-State Circuits Conference (ISSCC\u201921).","journal-title":"IEEE International Solid-State Circuits Conference (ISSCC\u201921)"},{"key":"e_1_3_1_47_2","article-title":"A 22nm 4Mb STT-MRAM data-encrypted near-memory computation macro with a 192GB\/s read-and-decryption bandwidth and 25.1-55.1 TOPS\/W 8b MAC for AI operations","author":"Chiu Y.-C.","year":"2022","unstructured":"Y.-C. Chiu, C.-S. Yang, S.-H. Teng, H.-Y. Huang, F.-C. Chang, Y. Wu, and M.-F. Chang. 2022. A 22nm 4Mb STT-MRAM data-encrypted near-memory computation macro with a 192GB\/s read-and-decryption bandwidth and 25.1-55.1 TOPS\/W 8b MAC for AI operations. In IEEE International Solid-State Circuits Conference (ISSCC\u201922).","journal-title":"IEEE International Solid-State Circuits Conference (ISSCC\u201922)"},{"key":"e_1_3_1_48_2","doi-asserted-by":"publisher","DOI":"10.1109\/TED.2016.2556709"},{"key":"e_1_3_1_49_2","doi-asserted-by":"publisher","DOI":"10.1109\/TED.2018.2876688"},{"key":"e_1_3_1_50_2","first-page":"594","article-title":"System on integrated chips (SoIC (TM) for 3D heterogeneous integration","author":"Chen M.-F.","year":"2019","unstructured":"M.-F. Chen, F.-C. Chen, W.-C. Chiou, and D. C. Yu. 2019. System on integrated chips (SoIC (TM) for 3D heterogeneous integration. In IEEE 69th Electronic Components and Technology Conference (ECTC\u201919). 594\u2013599.","journal-title":"IEEE 69th Electronic Components and Technology Conference (ECTC\u201919)"},{"key":"e_1_3_1_51_2","article-title":"Pearl: Towards optimization of DNN-accelerators via closed-form analytical representation","author":"Dutt A.","year":"2022","unstructured":"A. Dutt, S. Nandy, and M. M. Sabry. 2022. Pearl: Towards optimization of DNN-accelerators via closed-form analytical representation. In 27th Asia and South Pacific Design Automation Conference (ASP-DAC\u201922).","journal-title":"27th Asia and South Pacific Design Automation Conference (ASP-DAC\u201922)"},{"key":"e_1_3_1_52_2","article-title":"Timeloop: A systematic approach to DNN accelerator evaluation","author":"Parashar A.","year":"2019","unstructured":"A. Parashar, P. Raina, Y. S. Shao, Y.-H. Chen, V. A. Ying, A. Mukkara, R. Venkatesan, B. Khailany, S. W. Keckler, and J. Emer. 2019. Timeloop: A systematic approach to DNN accelerator evaluation. In IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS\u201919).","journal-title":"IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS\u201919)"},{"key":"e_1_3_1_53_2","doi-asserted-by":"publisher","DOI":"10.1145\/3358198"},{"key":"e_1_3_1_54_2","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2021.3059962"},{"key":"e_1_3_1_55_2","article-title":"STONNE: Enabling cycle-level microarchitectural simulation for dnn inference accelerators","author":"Francisco M.-M.","year":"2021","unstructured":"M.-M. Francisco, J. L. Abellan, M. E. Acacio, and T. Krishna. 2021. STONNE: Enabling cycle-level microarchitectural simulation for dnn inference accelerators. In IEEE International Symposium on Workload Characterization (IISWC\u201921).","journal-title":"IEEE International Symposium on Workload Characterization (IISWC\u201921)"},{"key":"e_1_3_1_56_2","doi-asserted-by":"publisher","DOI":"10.1145\/3296957.3173176"},{"key":"e_1_3_1_57_2","article-title":"SIGMA: A sparse and irregular GEMM accelerator with flexible interconnects for dnn training","author":"Qin E.","year":"2020","unstructured":"E. Qin, A. Samajdar, H. Kwon, V. Nadella, S. Srinivasan, D. Das, B. Kaul, and T. Krishna. 2020. SIGMA: A sparse and irregular GEMM accelerator with flexible interconnects for dnn training. In IEEE International Symposium on High Performance Computer Architecture (HPCA\u201920).","journal-title":"IEEE International Symposium on High Performance Computer Architecture (HPCA\u201920)"},{"key":"e_1_3_1_58_2","article-title":"Sparseloop: An analytical approach to sparse tensor accelerator modeling","author":"Wu Y. N.","year":"2020","unstructured":"Y. N. Wu, P.-A. Tsai, A. Parashar, V. Sze, and J. S. Emer. 2020. Sparseloop: An analytical approach to sparse tensor accelerator modeling. In 55th IEEE\/ACM International Symposium on Microarchitecture.","journal-title":"55th IEEE\/ACM International Symposium on Microarchitecture"},{"key":"e_1_3_1_59_2","doi-asserted-by":"publisher","DOI":"10.1145\/3424669"},{"key":"e_1_3_1_60_2","doi-asserted-by":"crossref","DOI":"10.1145\/3195970.3195997","article-title":"Ares: A framework for quantifying the resilience of deep neural networks","author":"Reagen B.","year":"2018","unstructured":"B. Reagen, U. Gupta, L. Pentecost, P. Whatmough, S. K. Lee, N. Mulholland, D. Brooks, and G.-Y. Wei. 2018. Ares: A framework for quantifying the resilience of deep neural networks. In 55th Annual Design Automation Conference.","journal-title":"55th Annual Design Automation Conference"}],"container-title":["ACM Transactions on Design Automation of Electronic Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3670404","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3670404","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T00:05:38Z","timestamp":1750291538000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3670404"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,9,18]]},"references-count":59,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2024,11,30]]}},"alternative-id":["10.1145\/3670404"],"URL":"https:\/\/doi.org\/10.1145\/3670404","relation":{},"ISSN":["1084-4309","1557-7309"],"issn-type":[{"value":"1084-4309","type":"print"},{"value":"1557-7309","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,9,18]]},"assertion":[{"value":"2024-01-19","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-05-23","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-09-18","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}