{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,7]],"date-time":"2026-04-07T21:55:19Z","timestamp":1775598919952,"version":"3.50.1"},"reference-count":227,"publisher":"Association for Computing Machinery (ACM)","issue":"11","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Comput. Surv."],"published-print":{"date-parts":[[2025,11,30]]},"abstract":"<jats:p>Recent trends in deep learning (DL) have made hardware accelerators essential for various high-performance computing (HPC) applications, including image classification, computer vision, and speech recognition. This survey summarizes and classifies the most recent developments in DL accelerators, focusing on their role in meeting the performance demands of HPC applications. We explore cutting-edge approaches to DL acceleration, covering not only GPU- and TPU-based platforms but also specialized hardware such as FPGA- and ASIC-based accelerators, Neural Processing Units, open hardware RISC-V-based accelerators, and co-processors. This survey also describes accelerators leveraging emerging memory technologies and computing paradigms, including 3D-stacked Processor-In-Memory, non-volatile memories like Resistive RAM and Phase Change Memories used for in-memory computing, as well as Neuromorphic Processing Units, and Multi-Chip Module-based accelerators. Furthermore, we provide insights into emerging quantum-based accelerators and photonics. 
Finally, this survey categorizes the most influential architectures and technologies from recent years, offering readers a comprehensive perspective on the rapidly evolving field of deep learning acceleration.<\/jats:p>","DOI":"10.1145\/3729215","type":"journal-article","created":{"date-parts":[[2025,4,18]],"date-time":"2025-04-18T11:28:26Z","timestamp":1744975706000},"page":"1-39","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":54,"title":["A Survey on Deep Learning Hardware Accelerators for Heterogeneous HPC Platforms"],"prefix":"10.1145","volume":"57","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1668-0883","authenticated-orcid":false,"given":"Cristina","family":"Silvano","sequence":"first","affiliation":[{"name":"DEIB, Politecnico di Milano, Milano, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1853-1614","authenticated-orcid":false,"given":"Daniele","family":"Ielmini","sequence":"additional","affiliation":[{"name":"DEIB, Politecnico di Milano, Milano, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0301-4419","authenticated-orcid":false,"given":"Fabrizio","family":"Ferrandi","sequence":"additional","affiliation":[{"name":"DEIB, Politecnico di Milano, Milano, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2630-1509","authenticated-orcid":false,"given":"Leandro","family":"Fiorin","sequence":"additional","affiliation":[{"name":"DEIB, Politecnico di Milano, Milano, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8202-1627","authenticated-orcid":false,"given":"Serena","family":"Curzel","sequence":"additional","affiliation":[{"name":"DEIB, Politecnico di Milano, Milano, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8068-3806","authenticated-orcid":false,"given":"Luca","family":"Benini","sequence":"additional","affiliation":[{"name":"Alma Mater Studiorum Universit\u00e0 di Bologna, Bologna, 
Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7924-933X","authenticated-orcid":false,"given":"Francesco","family":"Conti","sequence":"additional","affiliation":[{"name":"Alma Mater Studiorum Universit\u00e0 di Bologna, Bologna, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7495-6895","authenticated-orcid":false,"given":"Angelo","family":"Garofalo","sequence":"additional","affiliation":[{"name":"Alma Mater Studiorum Universit\u00e0 di Bologna, Bologna, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8755-0504","authenticated-orcid":false,"given":"Cristian","family":"Zambelli","sequence":"additional","affiliation":[{"name":"Universit\u00e0 degli Studi di Ferrara, Ferrara, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2301-3838","authenticated-orcid":false,"given":"Enrico","family":"Calore","sequence":"additional","affiliation":[{"name":"INFN, Ferrara, Italy"},{"name":"Universit\u00e0 degli Studi di Ferrara, Ferrara Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0132-9196","authenticated-orcid":false,"given":"Sebastiano","family":"Schifano","sequence":"additional","affiliation":[{"name":"Universit\u00e0 degli Studi di Ferrara, Ferrara Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3129-0664","authenticated-orcid":false,"given":"Maurizio","family":"Palesi","sequence":"additional","affiliation":[{"name":"Universit\u00e0 degli Studi di Catania, Catania, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7452-5828","authenticated-orcid":false,"given":"Giuseppe","family":"Ascia","sequence":"additional","affiliation":[{"name":"Universit\u00e0 degli Studi di Catania, Catania, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0874-7793","authenticated-orcid":false,"given":"Davide","family":"Patti","sequence":"additional","affiliation":[{"name":"Universit\u00e0 degli Studi di Catania, Catania, 
Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7167-9530","authenticated-orcid":false,"given":"Nicola","family":"Petra","sequence":"additional","affiliation":[{"name":"Universit\u00e0 degli Studi di Napoli Federico II, Napoli, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0204-0949","authenticated-orcid":false,"given":"Davide","family":"De Caro","sequence":"additional","affiliation":[{"name":"Universit\u00e0 degli Studi di Napoli Federico II, Napoli, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9762-6522","authenticated-orcid":false,"given":"Luciano","family":"Lavagno","sequence":"additional","affiliation":[{"name":"Politecnico di Torino, Torino, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-4366-1102","authenticated-orcid":false,"given":"Teodoro","family":"Urso","sequence":"additional","affiliation":[{"name":"Politecnico di Torino, Torino, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6870-7083","authenticated-orcid":false,"given":"Valeria","family":"Cardellini","sequence":"additional","affiliation":[{"name":"University of Rome Tor Vergata, Roma Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7444-876X","authenticated-orcid":false,"given":"Gian Carlo","family":"Cardarilli","sequence":"additional","affiliation":[{"name":"University of Rome Tor Vergata, Roma Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1144-3707","authenticated-orcid":false,"given":"Robert","family":"Birke","sequence":"additional","affiliation":[{"name":"Universit\u00e0 degli Studi di Torino, Torino, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1363-9201","authenticated-orcid":false,"given":"Stefania","family":"Perri","sequence":"additional","affiliation":[{"name":"Universit\u00e0 degli Studi della Calabria, Arcavacata di Rende Italy"}]}],"member":"320","published-online":{"date-parts":[[2025,6,13]]},"reference":[{"key":"e_1_3_3_2_2","doi-asserted-by":"crossref","first-page":"144","DOI":"10.1109\/ISSCC42613.2021.9365791","volume-title":"2021 IEEE International 
Solid-State Circuits Conference (ISSCC)","volume":"64","author":"Agrawal Ankur","year":"2021","unstructured":"Ankur Agrawal, Sae Kyu Lee, Joel Silberman, Matthew Ziegler, Mingu Kang, Swagath Venkataramani, Nianzheng Cao, Bruce Fleischer, Michael Guillorn, Matthew Cohen, Ophir Erez, Thomas Fox, George Gristede, Howard Haynie, Vicktoria Ivanov, Siyu Koswatta, Shih-Hsien Lo, Martin Lutz, Gary Maier, Alex Mesh, Yevgeny Nustov, Scot Rider, Marcel Schaal, Michael Scheuermann, Xiao Sun, Naigang Wang, Fanchieh Yee, Ching Zhou, Vinay Shah, Brian Curran, Vijayalakshmi Srinivasan, Pong-Fei Lu, Sunil Shukla, Kailash Gopalakrishnan, and Leland Chang. 2021. 9.1 A 7nm 4-core AI chip with 25.6 TFLOPS hybrid FP8 training, 102.4 TOPS INT4 inference and workload-aware throttling. In 2021 IEEE International Solid-State Circuits Conference (ISSCC), Vol. 64. IEEE, 144\u2013146. DOI:10.1109\/ISSCC42613.2021.9365791"},{"key":"e_1_3_3_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2018.2852335"},{"key":"e_1_3_3_4_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2015.2474396"},{"key":"e_1_3_3_5_2","volume-title":"Proceedings of the 43rd International Symposium on Computer Architecture (ISCA\u201916)","author":"Albericio Jorge","year":"2016","unstructured":"Jorge Albericio, Patrick Judd, Tayler Hetherington, Tor Aamodt, Natalie Enright Jerger, and Andreas Moshovos. 2016. Cnvlutin: Ineffectual-neuron-free deep neural network computing. In Proceedings of the 43rd International Symposium on Computer Architecture (ISCA\u201916). 13. DOI:10.1109\/ISCA.2016.11"},{"key":"e_1_3_3_6_2","first-page":"1249","volume-title":"2016 Design, Automation & Test in Europe Conference & Exhibition (DATE)","author":"Alves Marco A. Z.","year":"2016","unstructured":"Marco A. Z. Alves, Matthias Diener, Paulo C. Santos, and Luigi Carro. 2016. Large vector extensions inside the HMC. In 2016 Design, Automation & Test in Europe Conference & Exhibition (DATE). 
1249\u20131254."},{"key":"e_1_3_3_7_2","doi-asserted-by":"publisher","DOI":"10.1155\/2010\/372652"},{"key":"e_1_3_3_8_2","unstructured":"AMD. 2021. AMD Instinct MI200 series accelerator. (Jan2021). Retrieved May 25 2023 from https:\/\/www.amd.com\/system\/files\/documents\/amd-instinct-mi200-datasheet.pdf"},{"key":"e_1_3_3_9_2","unstructured":"Michael Andersch Greg Palmer Ronny Krashinsky Nick Stam Vishal Mehta Gonzalo Brito and Sridhar Ramaswamy. 2022. NVIDIA Hopper Architecture In-Depth. (Mar2022). Retrieved Apr 16 2023 from https:\/\/developer.nvidia.com\/blog\/nvidia-hopper-architecture-in-depth\/"},{"key":"e_1_3_3_10_2","first-page":"320","volume-title":"2017 ACM\/IEEE 44th Annual International Symposium on Computer Architecture (ISCA)","author":"Arunkumar Akhil","year":"2017","unstructured":"Akhil Arunkumar, Evgeny Bolotin, Benjamin Cho, Ugljesa Milic, Eiman Ebrahimi, Oreste Villa, Aamer Jaleel, Carole-Jean Wu, and David Nellans. 2017. MCM-GPU: Multi-chip-module GPUs for continued performance scalability. In 2017 ACM\/IEEE 44th Annual International Symposium on Computer Architecture (ISCA). 320\u2013332."},{"key":"e_1_3_3_11_2","unstructured":"Imad Al Assir Mohamad El Iskandarani Hadi Rayan Al Sandid and Mazen A. R. Saghir. 2021. Arrow: A RISC-V vector accelerator for machine learning inference. arXiv:2107.07169. Retrieved from https:\/\/arxiv.org\/abs\/2107.07169"},{"key":"e_1_3_3_12_2","doi-asserted-by":"publisher","DOI":"10.1561\/2200000006"},{"key":"e_1_3_3_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/ARITH54963.2022.00010"},{"key":"e_1_3_3_14_2","doi-asserted-by":"publisher","DOI":"10.1038\/nature23474"},{"key":"e_1_3_3_15_2","unstructured":"Braket 2023. Quantum Computing Service\u2014Amazon Braket\u2014AWS. (2023). Retrieved from https:\/\/aws.amazon.com\/braket\/"},{"key":"e_1_3_3_16_2","doi-asserted-by":"publisher","unstructured":"Michael Broughton Guillaume Verdon Trevor McCourt Antonio J. Martinez Jae Hyeon Yoo Sergei V. 
Isakov Philip Massey Ramin Halavati Murphy Yuezhen Niu Alexander Zlokapa Evan Peters Owen Lockwood Andrea Skolik Sofiene Jerbi Vedran Dunjko Martin Leib Michael Streif David Von Dollen Hongxiang Chen Shuxiang Cao Roeland Wiersema Hsin-Yuan Huang Jarrod R. McClean Ryan Babbush Sergio Boixo Dave Bacon Alan K. Ho Hartmut Neven and Masoud Mohseni. 2021. TensorFlow Quantum: A Software Framework for Quantum Machine Learning. (2021). arXiv:2003.02989. DOI:10.48550\/arXiv.2003.02989","DOI":"10.48550\/arXiv.2003.02989"},{"key":"e_1_3_3_17_2","first-page":"1","volume-title":"Design, Automation & Test in Europe Conference & Exhibition, DATE","author":"Bruschi Nazareno","year":"2023","unstructured":"Nazareno Bruschi, Giuseppe Tagliavini, Angelo Garofalo, Francesco Conti, Irem Boybat, Luca Benini, and Davide Rossi. 2023. End-to-end DNN Inference on a massively parallel analog in memory computing architecture. In Design, Automation & Test in Europe Conference & Exhibition, DATE. IEEE, 1\u20136. DOI:10.23919\/DATE56975.2023.10137208"},{"key":"e_1_3_3_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2021.3122905"},{"key":"e_1_3_3_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/VLSITechnologyandCir46769.2022.9830509"},{"key":"e_1_3_3_20_2","first-page":"1","volume-title":"2013 International Joint Conference on Neural Networks (IJCNN)","author":"Cassidy Andrew S.","year":"2013","unstructured":"Andrew S. Cassidy, Paul Merolla, John V. Arthur, Steve K. Esser, Bryan Jackson, Rodrigo Alvarez-Icaza, Pallab Datta, Jun Sawada, Theodore M. Wong, Vitaly Feldman, Arnon Amir, Daniel Ben-Dayan Rubin, Filipp Akopyan, Emmett McQuinn, William P. Risk, and Dharmendra S. Modha. 2013. Cognitive computing building block: A versatile and efficient digital neuron model for neurosynaptic cores. In 2013 International Joint Conference on Neural Networks (IJCNN). 1\u201310. 
DOI:10.1109\/IJCNN.2013.6707077"},{"key":"e_1_3_3_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/TVLSI.2019.2950087"},{"key":"e_1_3_3_22_2","first-page":"1","volume-title":"41st IEEE\/ACM International Conference on Computer-Aided Design","author":"Cavalcante Matheus","year":"2022","unstructured":"Matheus Cavalcante, Domenic W\u00fcthrich, Matteo Perotti, Samuel Riedel, and Luca Benini. 2022. Spatz: A compact vector processing unit for high-performance and energy-efficient shared-L1 clusters. In 41st IEEE\/ACM International Conference on Computer-Aided Design. ACM, San Diego California, 1\u20139. DOI:10.1145\/3508352.3549367"},{"key":"e_1_3_3_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2016.2592330"},{"key":"e_1_3_3_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2018.2888898"},{"key":"e_1_3_3_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/HCS52781.2021.9567417"},{"key":"e_1_3_3_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA45697.2020.00016"},{"key":"e_1_3_3_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSSC.2022.3228765"},{"key":"e_1_3_3_28_2","doi-asserted-by":"publisher","DOI":"10.1145\/2996864"},{"key":"e_1_3_3_29_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.eng.2020.01.007"},{"key":"e_1_3_3_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSSC.2016.2616357"},{"key":"e_1_3_3_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/JETCAS.2019.2910232"},{"key":"e_1_3_3_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2016.13"},{"key":"e_1_3_3_33_2","doi-asserted-by":"publisher","DOI":"10.1145\/3524500"},{"key":"e_1_3_3_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2023.3256796"},{"key":"e_1_3_3_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2018.022071134"},{"key":"e_1_3_3_36_2","doi-asserted-by":"crossref","first-page":"48","DOI":"10.1109\/ISSCC42613.2021.9365803","volume-title":"2021 IEEE International Solid- State Circuits Conference (ISSCC)","volume":"64","author":"Choquette 
Jack","year":"2021","unstructured":"Jack Choquette, Edward Lee, Ronny Krashinsky, Vishnu Balan, and Brucek Khailany. 2021. 3.2 The A100 datacenter GPU and ampere architecture. In 2021 IEEE International Solid- State Circuits Conference (ISSCC), Vol. 64. 48\u201350. DOI:10.1109\/ISSCC42613.2021.9365803"},{"key":"e_1_3_3_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/TETC.2021.3120538"},{"key":"e_1_3_3_38_2","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1109\/ISSCC42615.2023.10067643","volume-title":"2023 IEEE International Solid- State Circuits Conference (ISSCC)","author":"Conti Francesco","year":"2023","unstructured":"Francesco Conti, Davide Rossi, Gianna Paulin, Angelo Garofalo, Alfio Di Mauro, Georg Rutishauser, Gian marco Ottavi, Manuel Eggimann, Hayate Okuhara, Vincent Huard, Olivier Montfort, Lionel Jure, Nils Exibard, Pascal Gouedo, Mathieu Louvat, Emmanuel Botte, and Luca Benini. 2023. 22.1 A 12.4TOPS\/W @ 136GOPS AI-IoT system-on-chip with 16 RISC-V, 2-to-8b precision-scalable DNN acceleration and 30%-boost adaptive body biasing. In 2023 IEEE International Solid- State Circuits Conference (ISSCC). 21\u201323. DOI:10.1109\/ISSCC42615.2023.10067643"},{"key":"e_1_3_3_39_2","doi-asserted-by":"crossref","first-page":"212","DOI":"10.1109\/PDP52278.2021.00041","volume-title":"2021 29th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)","author":"Cordeiro Aline S.","year":"2021","unstructured":"Aline S. Cordeiro, Sairo R. dos Santos, Francis B. Moreira, Paulo C. Santos, Luigi Carro, and Marco A. Z. Alves. 2021. Machine learning migration for efficient near-data processing. In 2021 29th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP). 212\u2013219. DOI:10.1109\/PDP52278.2021.00041"},{"key":"e_1_3_3_40_2","unstructured":"D-WAVE 2023. D-Wave Systems - The Practical Quantum Computing Company. (2023). 
https:\/\/www.dwavesys.com\/"},{"key":"e_1_3_3_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2018.022071133"},{"key":"e_1_3_3_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2018.112130359"},{"key":"e_1_3_3_43_2","doi-asserted-by":"crossref","first-page":"749","DOI":"10.1145\/3297858.3304041","volume-title":"Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS\u201919)","author":"Lascorz Alberto Delmas","year":"2019","unstructured":"Alberto Delmas Lascorz, Patrick Judd, Dylan Malone Stuart, Zissis Poulos, Mostafa Mahmoud, Sayeh Sharify, Milos Nikolic, Kevin Siu, and Andreas Moshovos. 2019. Bit-tactical: A software\/hardware approach to exploiting value and bit sparsity in neural networks. In Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS\u201919). 749\u2013763. DOI:10.1145\/3297858.3304041"},{"key":"e_1_3_3_44_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2018.00024"},{"key":"e_1_3_3_45_2","volume-title":"Proceedings of the 55th Annual Design Automation Conference (DAC\u201918)","author":"Deng Quan","year":"2018","unstructured":"Quan Deng, Lei Jiang, Youtao Zhang, Minxuan Zhang, and Jun Yang. 2018. DrAcc: A DRAM based accelerator for accurate CNN inference. In Proceedings of the 55th Annual Design Automation Conference (DAC\u201918). Association for Computing Machinery, New York, NY, USA, Article 168, 6 pages. DOI:10.1145\/3195970.3196029"},{"key":"e_1_3_3_46_2","doi-asserted-by":"crossref","first-page":"260","DOI":"10.1109\/ISSCC42615.2023.10067422","volume-title":"2023 IEEE International Solid-State Circuits Conference (ISSCC)","author":"Desoli Giuseppe","year":"2023","unstructured":"Giuseppe Desoli, Nitin Chawla, Thomas Boesch, Manuj Ayodhyawasi, Harsh Rawat, Hitesh Chawla, VS Abhijith, Paolo Zambotti, Akhilesh Sharma, Carmine Cappetta, Michele Rossi, Antonio De Vita, and Francesca Girardi. 2023. 
A 40-310TOPS\/W SRAM-based all-digital up to 4b in-memory computing multi-tiled NN accelerator in FD-SOI 18nm for deep-learning edge applications. In 2023 IEEE International Solid-State Circuits Conference (ISSCC). 260\u2013262. DOI:10.1109\/ISSCC42615.2023.10067422"},{"key":"e_1_3_3_47_2","doi-asserted-by":"crossref","first-page":"238","DOI":"10.1109\/ISSCC.2017.7870349","volume-title":"2017 IEEE International Solid-State Circuits Conference (ISSCC)","author":"Desoli Giuseppe","year":"2017","unstructured":"Giuseppe Desoli, Nitin Chawla, Thomas Boesch, Surinder-pal Singh, Elio Guidetti, Fabio De Ambroggi, Tommaso Majo, Paolo Zambotti, Manuj Ayodhyawasi, Harvinder Singh, and Nalin Aggarwal. 2017. A 2.9TOPS\/W deep convolutional neural network SoC in FD-SOI 28nm for intelligent embedded systems. In 2017 IEEE International Solid-State Circuits Conference (ISSCC). 238\u2013239. DOI:10.1109\/ISSCC.2017.7870349"},{"key":"e_1_3_3_48_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2022.3229767"},{"key":"e_1_3_3_49_2","doi-asserted-by":"publisher","unstructured":"Alfio Di Mauro Moritz Scherer Davide Rossi and Luca Benini. 2022. Kraken: A direct event\/frame-based multi-sensor fusion SoC for ultra-efficient visual processing in Nano-UAVs. In 2022 IEEE Hot Chips 34 Symposium (HCS). 1\u201319. 
DOI:10.1109\/HCS55958.2022.9895621","DOI":"10.1109\/HCS55958.2022.9895621"},{"key":"e_1_3_3_50_2","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2022.3140674"},{"key":"e_1_3_3_51_2","doi-asserted-by":"crossref","first-page":"262","DOI":"10.1109\/ISSCC49657.2024.10454572","volume-title":"2024 IEEE International Solid-State Circuits Conference (ISSCC)","volume":"67","author":"Santos Maico Cassel Dos","year":"2024","unstructured":"Maico Cassel Dos Santos, Tianyu Jia, Joseph Zuckerman, Martin Cochet, Davide Giri, Erik Jens Loscalzo, Karthik Swaminathan, Thierry Tambe, Jeff Jun Zhang, Alper Buyuktosunoglu, Kuan-Lin Chiu, Giuseppe Di Guglielmo, Paolo Mantovani, Luca Piccolboni, Gabriele Tombesi, David Trilla, John-David Wellman, En-Yu Yang, Aporva Amarnath, Ying Jing, Bakshree Mishra, Joshua Park, Vignesh Suresh, Sarita Adve, Pradip Bose, David Brooks, Luca P. Carloni, Kenneth L. Shepard, and Gu-Yeon Wei. 2024. 14.5 A 12nm Linux-SMP-Capable RISC-V SoC with 14 accelerator types, distributed hardware power management and flexible noc-based data orchestration. In 2024 IEEE International Solid-State Circuits Conference (ISSCC), Vol. 67. 262\u2013264. DOI:10.1109\/ISSCC49657.2024.10454572"},{"key":"e_1_3_3_52_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSI.2017.2735490"},{"key":"e_1_3_3_53_2","doi-asserted-by":"publisher","DOI":"10.1109\/MCSE.2022.3163817"},{"issue":"120","key":"e_1_3_3_54_2","first-page":"1","article-title":"Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity","volume":"23","author":"Fedus William","year":"2022","unstructured":"William Fedus, Barret Zoph, and Noam Shazeer. 2022. Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity. Journal of Machine Learning Research 23, 120 (2022), 1\u201339. 
Retrieved from http:\/\/jmlr.org\/papers\/v23\/21-0998.html","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_3_55_2","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2017.3211117"},{"key":"e_1_3_3_56_2","first-page":"1","volume-title":"ACM\/IEEE 45th Annual International Symposium on Computer Architecture (ISCA)","author":"Fowers Jeremy","year":"2018","unstructured":"Jeremy Fowers, Kalin Ovtcharov, Michael Papamichael, Todd Massengill, Ming Liu, Daniel Lo, Shlomi Alkalay, Michael Haselman, Logan Adams, Mahdi Ghandi, Stephen Heil, Prerak Patel, Adam Sapek, Gabriel Weisz, Lisa Woods, Sitaram Lanka, Steven K. Reinhardt, Adrian M. Caulfield, Eric S. Chung, and Doug Burger. 2018. A configurable cloud-scale DNN processor for real-time AI. In ACM\/IEEE 45th Annual International Symposium on Computer Architecture (ISCA). 1\u201314."},{"key":"e_1_3_3_57_2","article-title":"Bottom-up and top-down neural processing systems design: Neuromorphic intelligence as the convergence of natural and artificial intelligence","volume":"2106","author":"Frenkel Charlotte","year":"2021","unstructured":"Charlotte Frenkel, David Bol, and Giacomo Indiveri. 2021. Bottom-up and top-down neural processing systems design: Neuromorphic intelligence as the convergence of natural and artificial intelligence. CoRR abs\/2106.01288 (2021).","journal-title":"CoRR"},{"key":"e_1_3_3_58_2","doi-asserted-by":"publisher","DOI":"10.1109\/TBCAS.2018.2880425"},{"key":"e_1_3_3_59_2","doi-asserted-by":"publisher","DOI":"10.1109\/TBCAS.2019.2928793"},{"key":"e_1_3_3_60_2","doi-asserted-by":"crossref","unstructured":"Manuel Le Gallo Riduan Khaddam-Aljameh Milos Stanisavljevic Athanasios Vasilopoulos Benedikt Kersting Martino Dazzi Geethan Karunaratne Matthias Braendli Abhairaj Singh Silvia M Mueller et\u00a0al. 2022. A 64-core mixed-signal in-memory compute chip based on phase-change memory for deep neural network inference. arXiv:2212.02872. 
Retrieved from https:\/\/arxiv.org\/abs\/2212.02872","DOI":"10.1038\/s41928-023-01010-1"},{"key":"e_1_3_3_61_2","first-page":"1","volume-title":"2023 IEEE Custom Integrated Circuits Conference (CICC)","author":"Gao Fei","year":"2023","unstructured":"Fei Gao, Ting-Jung Chang, Ang Li, Marcelo Orenes-Vera, Davide Giri, Paul J. Jackson, August Ning, Georgios Tziantzioulis, Joseph Zuckerman, Jinzheng Tu, Kaifeng Xu, Grigory Chirkov, Gabriele Tombesi, Jonathan Balkind, Margaret Martonosi, Luca Carloni, and David Wentzlaff. 2023. DECADES: A 67mm2, 1.46TOPS, 55 giga cache-coherent 64-Bit RISC-V instructions per second, heterogeneous manycore SoC with 109 Tiles Including Accelerators, Intelligent Storage, and eFPGA in 12nm FinFET. In 2023 IEEE Custom Integrated Circuits Conference (CICC). 1\u20132. DOI:10.1109\/CICC57935.2023.10121257"},{"key":"e_1_3_3_62_2","doi-asserted-by":"publisher","DOI":"10.1145\/3571157"},{"key":"e_1_3_3_63_2","doi-asserted-by":"publisher","DOI":"10.1145\/3093337.3037702"},{"key":"e_1_3_3_64_2","first-page":"807","volume-title":"Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS\u201919)","author":"Gao Mingyu","year":"2019","unstructured":"Mingyu Gao, Xuan Yang, Jing Pu, Mark Horowitz, and Christos Kozyrakis. 2019. TANGRAM: Optimized coarse-grained dataflow for scalable NN accelerators. In Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS\u201919). Association for Computing Machinery, New York, NY, USA, 807\u2013820. 
DOI:10.1145\/3297858.3304014"},{"key":"e_1_3_3_65_2","doi-asserted-by":"publisher","DOI":"10.1109\/JETCAS.2022.3170152"},{"key":"e_1_3_3_66_2","doi-asserted-by":"publisher","DOI":"10.1109\/OJSSCS.2022.3210082"},{"key":"e_1_3_3_67_2","doi-asserted-by":"crossref","first-page":"769","DOI":"10.1109\/DAC18074.2021.9586216","volume-title":"2021 58th ACM\/IEEE Design Automation Conference (DAC)","author":"Genc Hasan","year":"2021","unstructured":"Hasan Genc, Seah Kim, Alon Amid, Ameer Haj-Ali, Vighnesh Iyer, Pranav Prakash, Jerry Zhao, Daniel Grubb, Harrison Liew, Howard Mao, Albert Ou, Colin Schmidt, Samuel Steffl, John Wright, Ion Stoica, Jonathan Ragan-Kelley, Krste Asanovic, Borivoje Nikolic, and Yakun Sophia Shao. 2021. Gemmini: Enabling systematic deep-learning architecture evaluation via full-stack integration. In 2021 58th ACM\/IEEE Design Automation Conference (DAC). 769\u2013774. DOI:10.1109\/DAC18074.2021.9586216"},{"key":"e_1_3_3_68_2","unstructured":"Amir Gholami Sehoon Kim Zhen Dong Zhewei Yao Michael W. Mahoney and Kurt Keutzer. 2021. A Survey of Quantization Methods for Efficient Neural Network Inference. (2021). arXiv:2103.13630. Retrieved from https:\/\/arxiv.org\/abs\/2103.13630"},{"key":"e_1_3_3_69_2","first-page":"1049","volume-title":"2020 Design, Automation & Test in Europe Conference & Exhibition (DATE)","author":"Giri Davide","year":"2020","unstructured":"Davide Giri, Kuan-Lin Chiu, Giuseppe Di Guglielmo, Paolo Mantovani, and Luca P. Carloni. 2020. ESP4ML: Platform-based design of systems-on-chip for embedded machine learning. In 2020 Design, Automation & Test in Europe Conference & Exhibition (DATE). 1049\u20131054. 
DOI:10.23919\/DATE48585.2020.9116317"},{"key":"e_1_3_3_70_2","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2021.3073893"},{"key":"e_1_3_3_71_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCAS46773.2023.10181809"},{"key":"e_1_3_3_72_2","doi-asserted-by":"publisher","DOI":"10.1145\/3352460.3358291"},{"key":"e_1_3_3_73_2","doi-asserted-by":"crossref","first-page":"259","DOI":"10.1109\/ESSCIRC53450.2021.9567768","volume-title":"ESSCIRC 2021 - IEEE 47th European Solid State Circuits Conference (ESSCIRC)","author":"Gonzalez Abraham","year":"2021","unstructured":"Abraham Gonzalez, Jerry Zhao, Ben Korpan, Hasan Genc, Colin Schmidt, John Wright, Ayan Biswas, Alon Amid, Farhana Sheikh, Anton Sorokin, Sirisha Kale, Mani Yalamanchi, Ramya Yarlagadda, Mark Flannigan, Larry Abramowitz, Elad Alon, Yakun Sophia Shao, Krste Asanovic, and Borivoje Nikolic. 2021. A 16mm 2 106.1 GOPS\/W heterogeneous RISC-V multi-core multi-accelerator soc in low-power 22nm FinFET. In ESSCIRC 2021 - IEEE 47th European Solid State Circuits Conference (ESSCIRC). IEEE, Grenoble, France, 259\u2013262. DOI:10.1109\/ESSCIRC53450.2021.9567768"},{"key":"e_1_3_3_74_2","volume-title":"Deep Learning","author":"Goodfellow Ian","year":"2016","unstructured":"Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. MIT Press. http:\/\/www.deeplearningbook.org."},{"key":"e_1_3_3_75_2","unstructured":"GreenWaves Technologies GAP9 Processor. 2023. Retrieved from https:\/\/greenwaves-technologies.com\/gap9_processor\/. (2023). Accessed: 2023-04-18."},{"key":"e_1_3_3_76_2","volume-title":"International Conference for High Performance Computing, Networking, Storage and Analysis (SC\u201920)","author":"Guo Cong","year":"2020","unstructured":"Cong Guo, Bo Yang Hsueh, Jingwen Leng, Yuxian Qiu, Yue Guan, Zehuan Wang, Xiaoying Jia, Xipeng Li, Minyi Guo, and Yuhao Zhu. 2020. Accelerating sparse DNN models without hardware-support via tile-wise sparsity. 
In International Conference for High Performance Computing, Networking, Storage and Analysis (SC\u201920). Article 16, 15 pages."},{"key":"e_1_3_3_77_2","unstructured":"K. Guo W. Li K. Zhong Z. Zhu S. Zeng S. Han Y. Xie P. Debacker M. Verhelst and Y. Wang. 2023. Neural Network Accelerator Comparison. (2023). Retrieved May 9 2024 from https:\/\/nicsefc.ee.tsinghua.edu.cn\/project.html"},{"key":"e_1_3_3_78_2","first-page":"1737","volume-title":"32nd International Conference on Machine Learning (Proceedings of Machine Learning Research)","volume":"37","author":"Gupta Suyog","year":"2015","unstructured":"Suyog Gupta, Ankur Agrawal, Kailash Gopalakrishnan, and Pritish Narayanan. 2015. Deep learning with limited numerical precision. In 32nd International Conference on Machine Learning (Proceedings of Machine Learning Research), Vol. 37. 1737\u20131746."},{"key":"e_1_3_3_79_2","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2024.3383964"},{"key":"e_1_3_3_80_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.cosrev.2018.11.002"},{"key":"e_1_3_3_81_2","doi-asserted-by":"crossref","first-page":"212","DOI":"10.1109\/ISSCC49657.2024.10454395","volume-title":"2024 IEEE International Solid-State Circuits Conference (ISSCC)","volume":"67","author":"Hager Pascal Alexander","year":"2024","unstructured":"Pascal Alexander Hager, Bert Moons, Stefan Cosemans, Ioannis A. 
Papistas, Bram Rooseleer, Jeroen Van Loon, Roel Uytterhoeven, Florian Zaruba, Spyridoula Koumousi, Milos Stanisavljevic, Stefan Mach, Sebastiaan Mutsaards, Riduan Khaddam Aljameh, Gua Hao Khov, Brecht Machiels, Cristian Olar, Anastasios Psarras, Sander Geursen, Jeroen Vermeeren, Yi Lu, Abhishek Maringanti, Deepak Ameta, Leonidas Katselas, Noah H\u00fctter, Manuel Schmuck, Swetha Sivadas, Karishma Sharma, Manuel Oliveira, Ramon Aerne, Nitish Sharma, Timir Soni, Beatrice Bussolino, Djordje Pesut, Michele Pallaro, Andrei Podlesnii, Alexios Lyrakis, Yannick Ruiner, Martino Dazzi, Johannes Thiele, Koen Goetschalckx, Nazareno Bruschi, Jonas Doevenspeck, Bram Verhoef, Stefan Linz, Giuseppe Garcea, Jonathan Ferguson, Ioannis Koltsidas, and Evangelos Eleftheriou. 2024. 11.3 Metis AIPU: A 12nm 15TOPS\/W 209.6TOPS SoC for Cost- and energy-efficient inference at the edge. In 2024 IEEE International Solid-State Circuits Conference (ISSCC), Vol. 67. 212\u2013214. DOI:10.1109\/ISSCC49657.2024.10454395"},{"key":"e_1_3_3_82_2","first-page":"329","volume-title":"Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS\u201921)","author":"Hajinazar Nastaran","year":"2021","unstructured":"Nastaran Hajinazar, Geraldo F. Oliveira, Sven Gregorio, Jo\u00e3o Dinis Ferreira, Nika Mansouri Ghiasi, Minesh Patel, Mohammed Alser, Saugata Ghose, Juan G\u00f3mez-Luna, and Onur Mutlu. 2021. SIMDRAM: A framework for bit-serial SIMD processing using DRAM. In Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS\u201921). Association for Computing Machinery, New York, NY, USA, 329\u2013345. 
DOI:10.1145\/3445814.3446749"},{"key":"e_1_3_3_83_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2016.30"},{"key":"e_1_3_3_84_2","doi-asserted-by":"publisher","DOI":"10.3390\/make4010004"},{"key":"e_1_3_3_85_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO50266.2020.00040"},{"key":"e_1_3_3_86_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2018.00062"},{"key":"e_1_3_3_87_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSSC.2022.3214064"},{"key":"e_1_3_3_88_2","unstructured":"HPCWIRE 2022. Quantum computers emerging as accelerators in HPC. (2022). https:\/\/www.hpcwire.com\/2022\/06\/07\/quantum-computers-emerging-as-accelerators-in-hpc\/"},{"key":"e_1_3_3_89_2","unstructured":"Intel. Lunar Lake processor specifications. (n.d.). Retrieved Jun 29 2024 from https:\/\/download.intel.com\/newsroom\/2024\/client-computing\/Lunar-Lake-Architecture-Fact-Sheet.pd"},{"key":"e_1_3_3_90_2","unstructured":"Intel. 2022. Intel Arc A770 Graphics 16GB. (Jul2022). Retrieved May 25 2023 from https:\/\/ark.intel.com\/content\/www\/us\/en\/ark\/products\/229151\/intel-arc-a770-graphics-16gb.html"},{"key":"e_1_3_3_91_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSSC.2023.3236566"},{"key":"e_1_3_3_92_2","first-page":"269","volume-title":"ESSCIRC 2022- IEEE 48th European Solid State Circuits Conference (ESSCIRC)","author":"Jia Tianyu","year":"2022","unstructured":"Tianyu Jia, Paolo Mantovani, Maico Cassel Dos Santos, Davide Giri, Joseph Zuckerman, Erik Jens Loscalzo, Martin Cochet, Karthik Swaminathan, Gabriele Tombesi, Jeff Jun Zhang, Nandhini Chandramoorthy, John-David Wellman, Kevin Tien, Luca Carloni, Kenneth Shepard, David Brooks, Gu-Yeon Wei, and Pradip Bose. 2022. A 12nm agile-designed soc for swarm-based perception with heterogeneous IP blocks, a reconfigurable memory hierarchy, and an 800MHz multi-plane NoC. In ESSCIRC 2022- IEEE 48th European Solid State Circuits Conference (ESSCIRC). 269\u2013272. 
DOI:10.1109\/ESSCIRC55480.2022.9911456"},{"key":"e_1_3_3_93_2","doi-asserted-by":"publisher","unstructured":"Zhe Jia Blake Tillman Marco Maggioni and Daniele Paolo Scarpazza. 2019. Dissecting the Graphcore IPU Architecture via Microbenchmarking. (Dec.2019). arXiv:1912.03413. DOI:10.48550\/arXiv.1912.03413","DOI":"10.48550\/arXiv.1912.03413"},{"key":"e_1_3_3_94_2","doi-asserted-by":"crossref","first-page":"1565","DOI":"10.1109\/SMC52423.2021.9658643","volume-title":"2021 IEEE International Conference on Systems, Man, and Cybernetics (SMC)","author":"Jiao Qiang","year":"2021","unstructured":"Qiang Jiao, Wei Hu, Fang Liu, and Yong Dong. 2021. RISC-VTF: RISC-V based extended instruction set for transformer. In 2021 IEEE International Conference on Systems, Man, and Cybernetics (SMC). 1565\u20131570. DOI:10.1109\/SMC52423.2021.9658643"},{"key":"e_1_3_3_95_2","doi-asserted-by":"crossref","first-page":"136","DOI":"10.1109\/ISSCC19947.2020.9062984","volume-title":"2020 IEEE International Solid- State Circuits Conference - (ISSCC)","author":"Jiao Yang","year":"2020","unstructured":"Yang Jiao, Liang Han, Rong Jin, Yi-Jung Su, Chiente Ho, Li Yin, Yun Li, Long Chen, Zhen Chen, Lu Liu, Zhuyu He, Yu Yan, Jun He, Jun Mao, Xiaotao Zai, Xuejun Wu, Yongquan Zhou, Mingqiu Gu, Guocai Zhu, Rong Zhong, Wenyuan Lee, Ping Chen, Yiping Chen, Weiliang Li, Deyu Xiao, Qing Yan, Mingyuan Zhuang, Jiejun Chen, Yun Tian, Yingzi Lin, Wei Wu, Hao Li, and Zesheng Dou. 2020. A 12nm programmable convolution-efficient neural-processing-unit chip achieving 825TOPS. In 2020 IEEE International Solid- State Circuits Conference - (ISSCC). 136\u2013140. DOI:10.1109\/ISSCC19947.2020.9062984"},{"key":"e_1_3_3_96_2","volume-title":"International Conference on Learning Representations","author":"Jin Qing","year":"2022","unstructured":"Qing Jin, Jian Ren, Richard Zhuang, Sumant Hanumante, Zhengang Li, Zhiyu Chen, Yanzhi Wang, Kaiyuan Yang, and Sergey Tulyakov. 2022. 
F8Net: Fixed-point 8-bit only multiplication for network quantization. In International Conference on Learning Representations. Retrieved from https:\/\/openreview.net\/forum?id=_CfpJazzXT2"},{"key":"e_1_3_3_97_2","doi-asserted-by":"publisher","DOI":"10.1145\/3079856.3080246"},{"key":"e_1_3_3_98_2","doi-asserted-by":"publisher","DOI":"10.1145\/3579371.3589350"},{"key":"e_1_3_3_99_2","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2018.032271057"},{"key":"e_1_3_3_100_2","doi-asserted-by":"publisher","DOI":"10.1145\/3360307"},{"key":"e_1_3_3_101_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSSC.2022.3214170"},{"issue":"4","key":"e_1_3_3_102_2","doi-asserted-by":"crossref","first-page":"1027","DOI":"10.1109\/JSSC.2022.3140414","article-title":"HERMES-core\u2013A 1.59-TOPS\/mm\u00b2 PCM on 14-nm CMOS in-memory compute core using 300-ps\/LSB linearized CCO-based ADCs","volume":"57","author":"Khaddam-Aljameh Riduan","year":"2022","unstructured":"Riduan Khaddam-Aljameh, Milos Stanisavljevic, Jordi Fornt Mas, Geethan Karunaratne, Matthias Br\u00e4ndli, Feng Liu, Abhairaj Singh, Silvia M M\u00fcller, Urs Egger, Anastasios Petropoulos, Theodore Antonakopoulos, Kevin Brew, Samuel Choi, Injo Ok, Fee Li Lie, Nicole Saulnier, Victor Chan, Ishtiaq Ahsan, Vijay Narayanan, S. R. Nandakumar, Manuel Le Gallo, Pier Andrea Francese, Abu Sebastian, and Evangelos Eleftheriou. 2022. HERMES-core\u2013A 1.59-TOPS\/mm\u00b2 PCM on 14-nm CMOS in-memory compute core using 300-ps\/LSB linearized CCO-based ADCs. 
IEEE Journal of Solid-State Circuits 57, 4 (2022), 1027\u20131038.","journal-title":"IEEE Journal of Solid-State Circuits"},{"key":"e_1_3_3_103_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jpdc.2018.11.012"},{"key":"e_1_3_3_104_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2016.41"},{"key":"e_1_3_3_105_2","doi-asserted-by":"publisher","DOI":"10.1109\/JETCAS.2022.3160455"},{"key":"e_1_3_3_106_2","unstructured":"Sehoon Kim, Coleman Hooper, Thanakul Wattanawong, Minwoo Kang, Ruohan Yan, Hasan Genc, Grace Dinh, Qijing Huang, Kurt Keutzer, Michael W. Mahoney, Yakun Sophia Shao, and Amir Gholami. 2023. Full Stack Optimization of Transformer Inference: a Survey. (2023). arxiv:cs.CL\/2302.14017 https:\/\/arxiv.org\/abs\/2302.14017"},{"key":"e_1_3_3_107_2","doi-asserted-by":"publisher","DOI":"10.1109\/VLSITechnologyandCir46769.2022.9830276"},{"key":"e_1_3_3_108_2","doi-asserted-by":"publisher","DOI":"10.3389\/fnins.2018.00941"},{"key":"e_1_3_3_109_2","doi-asserted-by":"publisher","DOI":"10.1109\/HCS52781.2021.9567075"},{"key":"e_1_3_3_110_2","first-page":"1097","volume-title":"Advances in Neural Information Processing Systems 25","author":"Krizhevsky Alex","year":"2012","unstructured":"Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25, F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger (Eds.). Curran Associates, Inc., 1097\u20131105. 
Retrieved from http:\/\/papers.nips.cc\/paper\/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf"},{"key":"e_1_3_3_111_2","volume-title":"2023 International Conference on Electronics, Information, and Communication (ICEIC)","author":"Kwon Youngsu","year":"2023","unstructured":"Youngsu Kwon, Jinho Han, Yongcheol Peter Cho, Juyeob Kim, Jaehoon Chung, Jaewoong Choi, Sujin Park, Igyeong Kim, Hyunjeong Kwon, Jinkyu Kim, Hyunmi Kim, Won Jeon, Youngdeuk Jeon, Minhyung Cho, and Minseok Choi. 2023. Chiplet heterogeneous-integration AI processor. In 2023 International Conference on Electronics, Information, and Communication (ICEIC)."},{"key":"e_1_3_3_112_2","doi-asserted-by":"crossref","first-page":"350","DOI":"10.1109\/ISSCC42613.2021.9365862","volume-title":"2021 IEEE International Solid-State Circuits Conference (ISSCC)","volume":"64","author":"Kwon Young-Cheon","year":"2021","unstructured":"Young-Cheon Kwon, Suk Han Lee, Jaehoon Lee, Sang-Hyuk Kwon, Je Min Ryu, Jong-Pil Son, O Seongil, Hak-Soo Yu, Haesuk Lee, Soo Young Kim, Youngmin Cho, Jin Guk Kim, Jongyoon Choi, Hyun-Sung Shin, Jin Kim, BengSeng Phuah, HyoungMin Kim, Myeong Jun Song, Ahn Choi, Daeho Kim, SooYoung Kim, Eun-Bong Kim, David Wang, Shinhaeng Kang, Yuhwan Ro, Seungwoo Seo, JoonHo Song, Jaeyoun Youn, Kyomin Sohn, and Nam Sung Kim. 2021. 25.4 A 20nm 6GB function-in-memory DRAM, based on HBM2 with a 1.2TFLOPS programmable computing unit using bank-level parallelism, for machine learning applications. In 2021 IEEE International Solid-State Circuits Conference (ISSCC), Vol. 64. 350\u2013352. DOI:10.1109\/ISSCC42613.2021.9365862"},{"key":"e_1_3_3_113_2","first-page":"410","volume-title":"2021 IEEE 23rd Electronics Packaging Technology Conference (EPTC)","author":"Lan Jingjing","year":"2021","unstructured":"Jingjing Lan, Vishnu P. Nambiar, Rheeshaalaen Sabapathy, Mihai Dragos Rotaru, and Anh Tuan Do. 2021. Chiplet-based architecture design for multi-core neuromorphic processor. 
In 2021 IEEE 23rd Electronics Packaging Technology Conference (EPTC). 410\u2013412. DOI:10.1109\/EPTC53413.2021.9663898"},{"key":"e_1_3_3_114_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA53966.2022.00063"},{"issue":"7553","key":"e_1_3_3_115_2","doi-asserted-by":"crossref","first-page":"436","DOI":"10.1038\/nature14539","article-title":"Deep learning","volume":"521","author":"LeCun Yann","year":"2015","unstructured":"Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. 2015. Deep learning. Nature 521, 7553 (2015), 436.","journal-title":"Nature"},{"key":"e_1_3_3_116_2","doi-asserted-by":"publisher","DOI":"10.1109\/5.726791"},{"key":"e_1_3_3_117_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSSC.2021.3120113"},{"key":"e_1_3_3_118_2","doi-asserted-by":"publisher","DOI":"10.1109\/OJSSCS.2022.3216798"},{"key":"e_1_3_3_119_2","first-page":"199","volume-title":"ESSCIRC 2014-40th European Solid State Circuits Conference (ESSCIRC)","author":"Lee Yunsup","year":"2014","unstructured":"Yunsup Lee, Andrew Waterman, Rimas Avizienis, Henry Cook, Chen Sun, Vladimir Stojanovi\u0107, and Krste Asanovi\u0107. 2014. A 45nm 1.3 GHz 16.7 double-precision GFLOPS\/W RISC-V processor with vector accelerators. In ESSCIRC 2014-40th European Solid State Circuits Conference (ESSCIRC). IEEE, 199\u2013202."},{"key":"e_1_3_3_120_2","doi-asserted-by":"publisher","DOI":"10.1109\/LSSC.2023.3303111"},{"key":"e_1_3_3_121_2","article-title":"Block convolution: Towards memory-efficient inference of large-scale CNNs on FPGA","volume":"2105","author":"Li Gang","year":"2021","unstructured":"Gang Li, Zejian Liu, Fanrong Li, and Jian Cheng. 2021. Block convolution: Towards memory-efficient inference of large-scale CNNs on FPGA. CoRR abs\/2105.08937 (2021). 
arXiv:2105.08937 https:\/\/arxiv.org\/abs\/2105.08937","journal-title":"CoRR"},{"key":"e_1_3_3_122_2","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2019.2924215"},{"key":"e_1_3_3_123_2","first-page":"288","volume-title":"Proceedings of the 50th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO-50\u201917)","author":"Li Shuangchen","year":"2017","unstructured":"Shuangchen Li, Dimin Niu, Krishna T. Malladi, Hongzhong Zheng, Bob Brennan, and Yuan Xie. 2017. DRISA: A DRAM-based reconfigurable in-situ accelerator. In Proceedings of the 50th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO-50\u201917). Association for Computing Machinery, New York, NY, USA, 288\u2013301. DOI:10.1145\/3123939.3123977"},{"issue":"10","key":"e_1_3_3_124_2","first-page":"2332","article-title":"SPRINT: A high-performance, energy-efficient, and scalable chiplet-based accelerator with photonic interconnects for CNN inference","volume":"33","author":"Li Yuan","year":"2021","unstructured":"Yuan Li, Ahmed Louri, and Avinash Karanth. 2021. SPRINT: A high-performance, energy-efficient, and scalable chiplet-based accelerator with photonic interconnects for CNN inference. IEEE Transactions on Parallel and Distributed Systems 33, 10 (2021), 2332\u20132345.","journal-title":"IEEE Transactions on Parallel and Distributed Systems"},{"key":"e_1_3_3_125_2","first-page":"134","volume-title":"2020 IEEE International Solid- State Circuits Conference - (ISSCC)","author":"Lin Chien-Hung","year":"2020","unstructured":"Chien-Hung Lin, Chih-Chung Cheng, Yi-Min Tsai, Sheng-Je Hung, Yu-Ting Kuo, Perry H Wang, Pei-Kuei Tsung, Jeng-Yun Hsu, Wei-Chih Lai, Chia-Hung Liu, Shao-Yu Wang, Chin-Hua Kuo, Chih-Yu Chang, Ming-Hsien Lee, Tsung-Yao Lin, and Chih-Cheng Chen. 2020. A 3.4-to-13.3TOPS\/W 3.6TOPS dual-core deep-learning accelerator for versatile AI applications in 7nm 5G smartphone SoC. In 2020 IEEE International Solid- State Circuits Conference - (ISSCC). 134\u2013136. 
DOI:10.1109\/ISSCC19947.2020.9063111"},{"key":"e_1_3_3_126_2","volume-title":"2019 Symposium on VLSI Circuits","author":"Lin Mu-Shan","year":"2019","unstructured":"Mu-Shan Lin, Tze-Chiang Huang, Chien-Chun Tsai, King-Ho Tam, Cheng-Hsiang Hsieh, Tom Chen, Wen-Hung Huang, Jack Hu, Yu-Chi Chen, Sandeep Kumar Goel, Chin-Ming Fu, Stefan Rusu, Chao-Chieh Li, Sheng-Yao Yang, Mei Wong, Shu-Chun Yang, and Frank Lee. 2019. A 7nm 4GHz Arm-core-based CoWoS chiplet design for high performance computing. In 2019 Symposium on VLSI Circuits."},{"key":"e_1_3_3_127_2","doi-asserted-by":"publisher","DOI":"10.3389\/fnins.2018.00840"},{"key":"e_1_3_3_128_2","doi-asserted-by":"publisher","DOI":"10.1145\/3357375"},{"key":"e_1_3_3_129_2","first-page":"1","volume-title":"2015 52nd ACM\/EDAC\/IEEE Design Automation Conference (DAC)","author":"Liu Xiaoxiao","year":"2015","unstructured":"Xiaoxiao Liu, Mengjie Mao, Beiye Liu, Hai Li, Yiran Chen, Boxun Li, Yu Wang, Hao Jiang, Mark Barnell, Qing Wu, and Jianhua Yang. 2015. RENO: A high-efficient reconfigurable neuromorphic computing accelerator design. In 2015 52nd ACM\/EDAC\/IEEE Design Automation Conference (DAC). 1\u20136. DOI:10.1145\/2744769.2744900"},{"key":"e_1_3_3_130_2","first-page":"4932","volume-title":"2022 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Liu Zechun","year":"2022","unstructured":"Zechun Liu, Kwang-Ting Cheng, Dong Huang, Eric Xing, and Zhiqiang Shen. 2022. Nonuniform-to-uniform quantization: Towards accurate quantization via generalized straight-through estimation. In 2022 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 4932\u20134942. 
DOI:10.1109\/CVPR52688.2022.00489"},{"key":"e_1_3_3_131_2","doi-asserted-by":"publisher","DOI":"10.1109\/TVLSI.2018.2815603"},{"key":"e_1_3_3_132_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.micpro.2022.104441"},{"key":"e_1_3_3_133_2","doi-asserted-by":"publisher","DOI":"10.1109\/TETC.2022.3187199"},{"key":"e_1_3_3_134_2","volume-title":"2020 IEEE Symposium on VLSI Technology","author":"Martinez P. Y.","year":"2020","unstructured":"P. Y. Martinez, Y. Beilliard, M. Godard, D. Danovitch, D. Drouin, J. Charbonnier, P. Coudrain, A. Garnier, D. Lattard, P. Vivet, S. Cheramy, E. Guthmuller, C. Fuguet Tortolero, V. Mengue, J. Durupt, A. Philippe, and D. Dutoit. 2020. ExaNoDe: Combined integration of chiplets on active interposer with bare dice in a multi-chip-module for heterogeneous and scalable high performance compute nodes. In 2020 IEEE Symposium on VLSI Technology."},{"key":"e_1_3_3_135_2","first-page":"336","article-title":"MLPerf training benchmark","volume":"2","author":"Mattson Peter","year":"2020","unstructured":"Peter Mattson, Christine Cheng, Gregory Diamos, Cody Coleman, Paulius Micikevicius, David Patterson, Hanlin Tang, Gu-Yeon Wei, Peter Bailis, Victor Bittorf, et\u00a0al. 2020. MLPerf training benchmark. Proceedings of Machine Learning and Systems 2 (2020), 336\u2013349.","journal-title":"Proceedings of Machine Learning and Systems"},{"key":"e_1_3_3_136_2","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2020.2974843"},{"key":"e_1_3_3_137_2","doi-asserted-by":"publisher","DOI":"10.1109\/HOTCHIPS.2019.8875670"},{"key":"e_1_3_3_138_2","first-page":"394","volume-title":"24th Asia and South Pacific Design Automation Conference","author":"Min Chuhan","year":"2019","unstructured":"Chuhan Min, Jiachen Mao, Hai Li, and Yiran Chen. 2019. NeuralHMC: An efficient HMC-based accelerator for deep neural networks. In 24th Asia and South Pacific Design Automation Conference. ACM, 394\u2013399. 
DOI:10.1145\/3287624.3287642"},{"key":"e_1_3_3_139_2","doi-asserted-by":"publisher","DOI":"10.1145\/3575861"},{"key":"e_1_3_3_140_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSSC.2022.3198505"},{"key":"e_1_3_3_141_2","unstructured":"Asit K. Mishra Jorge Albericio Latorre Jeff Pool Darko Stosic Dusan Stosic Ganesh Venkatesh Chong Yu and Paulius Micikevicius. 2021. Accelerating sparse deep neural networks. arXiv:2104.08378. Retrieved from https:\/\/arxiv.org\/abs\/2104.08378"},{"key":"e_1_3_3_142_2","first-page":"1","volume-title":"2016 IEEE Aerospace Conference","author":"Mounce Gabriel","year":"2016","unstructured":"Gabriel Mounce, Jim Lyke, Stephen Horan, Wes Powell, Rich Doyle, and Rafi Some. 2016. Chiplet based approach for heterogeneous processing and packaging architectures. In 2016 IEEE Aerospace Conference. 1\u201312."},{"key":"e_1_3_3_143_2","first-page":"252","volume-title":"28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3 (ASPLOS 2023)","author":"Mart\u00ednez Francisco Mu\u00f1oz","year":"2023","unstructured":"Francisco Mu\u00f1oz Mart\u00ednez, Raveesh Garg, Michael Pellauer, Jos\u00e9 L. Abell\u00e1n, Manuel E. Acacio, and Tushar Krishna. 2023. Flexagon: A multi-dataflow sparse-sparse matrix multiplication accelerator for efficient DNN processing. In 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3 (ASPLOS 2023). 252\u2013265. 
DOI:10.1145\/3582016.3582069"},{"key":"e_1_3_3_144_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISVLSI59464.2023.10238679"},{"key":"e_1_3_3_145_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSTQE.2019.2941485"},{"key":"e_1_3_3_146_2","doi-asserted-by":"publisher","DOI":"10.1109\/TED.2021.3115993"},{"key":"e_1_3_3_147_2","doi-asserted-by":"crossref","first-page":"199","DOI":"10.1109\/FCCM.2019.00035","volume-title":"2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)","author":"Nurvitadhi Eriko","year":"2019","unstructured":"Eriko Nurvitadhi, Dongup Kwon, Ali Jafari, Andrew Boutros, Jaewoong Sim, Phillip Tomson, Huseyin Sumbul, Gregory Chen, Phil Knag, Raghavan Kumar, Ram Krishnamurthy, Sergey Gribok, Bogdan Pasca, Martin Langhammer, Debbie Marr, and Aravind Dasu. 2019. Why compete when you can work together: FPGA-ASIC integration for persistent RNNs. In 2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM). 199\u2013207."},{"key":"e_1_3_3_148_2","doi-asserted-by":"publisher","DOI":"10.1109\/VLSICircuits18222.2020.9162917"},{"key":"e_1_3_3_149_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSI.2023.3254810"},{"key":"e_1_3_3_150_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSSC.2013.2259038"},{"key":"e_1_3_3_151_2","first-page":"27","volume-title":"Proceedings of the 2017 ACM\/IEEE 44th Annual International Symposium on Computer Architecture (ISCA\u201917)","author":"Parashar Angshuman","year":"2017","unstructured":"Angshuman Parashar, Minsoo Rhu, Anurag Mukkara, Antonio Puglielli, Rangharajan Venkatesan, Brucek Khailany, Joel Emer, Stephen W. Keckler, and William J. Dally. 2017. SCNN: An accelerator for compressed-sparse convolutional neural networks. In Proceedings of the 2017 ACM\/IEEE 44th Annual International Symposium on Computer Architecture (ISCA\u201917). 27\u201340. 
DOI:10.1145\/3079856.3080254"},{"key":"e_1_3_3_152_2","doi-asserted-by":"crossref","first-page":"152","DOI":"10.1109\/ISSCC42613.2021.9365928","volume-title":"2021 IEEE International Solid- State Circuits Conference (ISSCC)","volume":"64","author":"Park Jun-Seok","year":"2021","unstructured":"Jun-Seok Park, Jun-Woo Jang, Heonsoo Lee, Dongwoo Lee, Sehwan Lee, Hanwoong Jung, Seungwon Lee, Suknam Kwon, Kyungah Jeong, Joon-Ho Song, SukHwan Lim, and Inyup Kang. 2021. A 6K-MAC feature-map-sparsity-aware neural processing unit in 5nm flagship mobile soc. In 2021 IEEE International Solid- State Circuits Conference (ISSCC), Vol. 64. 152\u2013154. DOI:10.1109\/ISSCC42613.2021.9365928"},{"key":"e_1_3_3_153_2","doi-asserted-by":"crossref","first-page":"246","DOI":"10.1109\/ISSCC42614.2022.9731639","volume-title":"2022 IEEE International Solid-State Circuits Conference (ISSCC)","volume":"65","author":"Park Jun-Seok","year":"2022","unstructured":"Jun-Seok Park, Changsoo Park, Suknam Kwon, Hyeong-Seok Kim, Taeho Jeon, Yesung Kang, Heonsoo Lee, Dongwoo Lee, James Kim, YoungJong Lee, Sangkyu Park, Jun-Woo Jang, SangHyuck Ha, MinSeong Kim, Jihoon Bang, Suk Hwan Lim, and Inyup Kang. 2022. A multi-mode 8K-MAC HW-utilization-aware neural processing unit with a unified multi-precision datapath in 4nm flagship mobile soc. In 2022 IEEE International Solid-State Circuits Conference (ISSCC), Vol. 65. 246\u2013248. DOI:10.1109\/ISSCC42614.2022.9731639"},{"key":"e_1_3_3_154_2","doi-asserted-by":"publisher","DOI":"10.1109\/TBCAS.2015.2504563"},{"key":"e_1_3_3_155_2","doi-asserted-by":"publisher","DOI":"10.1109\/TVLSI.2021.3093242"},{"key":"e_1_3_3_156_2","doi-asserted-by":"publisher","unstructured":"Gianna Paulin Paul Scheffler Thomas Benz Matheus Cavalcante Tim Fischer Manuel Eggimann Yichao Zhang Nils Wistoff Luca Bertaccini Luca Colagrande Gianmarco Ottavi Frank K. G\u00fcrkaynak Davide Rossi and Luca Benini. 2024. 
Occamy: A 432-Core 28.1 DP-GFLOP\/s\/W 83% FPU Utilization Dual-Chiplet Dual-HBM2E RISC-V-Based accelerator for stencil and sparse linear algebra computations with 8-to-64-bit floating-point support in 12nm FinFET. In 2024 IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits). 12. DOI:10.1109\/VLSITechnologyandCir46783.2024.10631529","DOI":"10.1109\/VLSITechnologyandCir46783.2024.10631529"},{"key":"e_1_3_3_157_2","doi-asserted-by":"crossref","first-page":"13","DOI":"10.1109\/ICCD.2013.6657019","volume-title":"2013 IEEE 31st International Conference on Computer Design (ICCD)","author":"Peemen Maurice","year":"2013","unstructured":"Maurice Peemen, Arnaud A. A. Setio, Bart Mesman, and Henk Corporaal. 2013. Memory-centric accelerator design for convolutional neural networks. In 2013 IEEE 31st International Conference on Computer Design (ICCD). 13\u201319. DOI:10.1109\/ICCD.2013.6657019"},{"key":"e_1_3_3_158_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSII.2023.3292579"},{"key":"e_1_3_3_159_2","doi-asserted-by":"crossref","first-page":"43","DOI":"10.1109\/ASAP54787.2022.00017","volume-title":"2022 IEEE 33rd International Conference on Application-specific Systems, Architectures and Processors (ASAP)","author":"Perotti Matteo","year":"2022","unstructured":"Matteo Perotti, Matheus Cavalcante, Nils Wistoff, Renzo Andri, Lukas Cavigelli, and Luca Benini. 2022. A \u201cNew Ara\u201d for vector computing: An open source highly efficient RISC-V V 1.0 vector processor design. In 2022 IEEE 33rd International Conference on Application-specific Systems, Architectures and Processors (ASAP). 43\u201351. 
DOI:10.1109\/ASAP54787.2022.00017"},{"key":"e_1_3_3_160_2","doi-asserted-by":"publisher","DOI":"10.3390\/jimaging6090085"},{"issue":"4","key":"e_1_3_3_161_2","doi-asserted-by":"crossref","first-page":"1013","DOI":"10.1109\/JSSC.2022.3140753","article-title":"CHIMERA: A 0.92-TOPS, 2.2-TOPS\/W edge AI accelerator with 2-MByte on-chip foundry resistive RAM for efficient training and inference","volume":"57","author":"Prabhu Kartik","year":"2022","unstructured":"Kartik Prabhu, Albert Gural, Zainab F Khan, Robert M Radway, Massimo Giordano, Kalhan Koul, Rohan Doshi, John W Kustin, Timothy Liu, Gregorio B. Lopes, Victor Turbiner, Win-San Khwa, Yu-Der Chih, Meng-Fan Chang, Gu\u00e9nol\u00e9 Lallement, Boris Murmann, Subhasish Mitra, and Priyanka Raina. 2022. CHIMERA: A 0.92-TOPS, 2.2-TOPS\/W edge AI accelerator with 2-MByte on-chip foundry resistive RAM for efficient training and inference. IEEE Journal of Solid-State Circuits 57, 4 (2022), 1013\u20131026.","journal-title":"IEEE Journal of Solid-State Circuits"},{"key":"e_1_3_3_162_2","volume-title":"Proceedings of the 2023 Design Automation Conference (DAC 2023), to Appear","author":"Prasad Arpan","year":"2023","unstructured":"Arpan Prasad, Luca Benini, and Francesco Conti. 2023. Specialization meets flexibility: A heterogeneous architecture for high-efficiency, high-flexibility AR\/VR processing. In Proceedings of the 2023 Design Automation Conference (DAC 2023), to Appear."},{"key":"e_1_3_3_163_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSSC.2024.3385987"},{"key":"e_1_3_3_164_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA47549.2020.00015"},{"key":"e_1_3_3_165_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2020.107281"},{"key":"e_1_3_3_166_2","unstructured":"QISKIT 2023. IBM Qiskit Simulator. (2023). 
https:\/\/qiskit.org\/"},{"key":"e_1_3_3_167_2","article-title":"Going deeper with embedded FPGA platform for convolutional neural network","author":"Qiu Jiantao","year":"2016","unstructured":"Jiantao Qiu, Jie Wang, Song Yao, Kaiyuan Guo, Boxun Li, Erjin Zhou, Jincheng Yu, Tianqi Tang, Ningyi Xu, Sen Song, Yu Wang, and Huazhong Yang. 2016. Going deeper with embedded FPGA platform for convolutional neural network. 2016 ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays (2016).","journal-title":"2016 ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays"},{"key":"e_1_3_3_168_2","first-page":"1147","volume-title":"Design, Automation & Test in Europe Conference & Exhibition (DATE)","author":"Rahman Atul","year":"2017","unstructured":"Atul Rahman, Sangyun Oh, Jongeun Lee, and Kiyoung Choi. 2017. Design space exploration of FPGA accelerators for convolutional neural networks. In Design, Automation & Test in Europe Conference & Exhibition (DATE). 1147\u20131152. DOI:10.23919\/DATE.2017.7927162"},{"key":"e_1_3_3_169_2","doi-asserted-by":"publisher","DOI":"10.1145\/3571155"},{"key":"e_1_3_3_170_2","unstructured":"IBM Research. A new chip architecture points to faster more energy-efficient AI. (n.d.). Retrieved from https:\/\/research.ibm.com\/blog\/northpole-ibm-ai-chi"},{"key":"e_1_3_3_171_2","first-page":"1","volume-title":"2022 IEEE High Performance Extreme Computing Conference (HPEC)","author":"Reuther Albert","year":"2022","unstructured":"Albert Reuther, Peter Michaleas, Michael Jones, Vijay Gadepally, Siddharth Samsi, and Jeremy Kepner. 2022. AI and ML accelerator survey and trends. In 2022 IEEE High Performance Extreme Computing Conference (HPEC). 1\u201310. DOI:10.1109\/HPEC55821.2022.9926331"},{"key":"e_1_3_3_172_2","volume-title":"The perceptron - A perceiving and recognizing automaton","author":"Rosenblatt F.","year":"1957","unstructured":"F. Rosenblatt. 1957. The perceptron - A perceiving and recognizing automaton. 
Technical Report 85-460-1. Cornell Aeronautical Laboratory, Ithaca, New York."},{"key":"e_1_3_3_173_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSSC.2021.3114881"},{"key":"e_1_3_3_174_2","doi-asserted-by":"publisher","DOI":"10.1109\/JETCAS.2021.3127517"},{"key":"e_1_3_3_175_2","first-page":"53","volume-title":"20th IEEE International Conference on Application-Specific Systems, Architectures and Processors","author":"Sankaradas Murugan","year":"2009","unstructured":"Murugan Sankaradas, Venkata Jakkula, Srihari Cadambi, Srimat Chakradhar, Igor Durdanovic, Eric Cosatto, and Hans Peter Graf. 2009. A massively parallel coprocessor for convolutional neural networks. In 20th IEEE International Conference on Application-Specific Systems, Architectures and Processors. IEEE Computer Society, USA, 53\u201360. DOI:10.1109\/ASAP.2009.25"},{"key":"e_1_3_3_176_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neunet.2014.09.003"},{"key":"e_1_3_3_177_2","first-page":"1","volume-title":"2011 IEEE Custom Integrated Circuits Conference (CICC)","author":"Seo Jae-sun","year":"2011","unstructured":"Jae-sun Seo, Bernard Brezzo, Yong Liu, Benjamin D. Parker, Steven K. Esser, Robert K. Montoye, Bipin Rajendran, Jos\u00e9 A. Tierno, Leland Chang, Dharmendra S. Modha, and Daniel J. Friedman. 2011. A 45nm CMOS neuromorphic chip with a scalable architecture for learning in networks of spiking neurons. In 2011 IEEE Custom Integrated Circuits Conference (CICC). 1\u20134. DOI:10.1109\/CICC.2011.6055293"},{"key":"e_1_3_3_178_2","first-page":"185","volume-title":"2013 46th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO)","author":"Seshadri Vivek","year":"2013","unstructured":"Vivek Seshadri, Yoongu Kim, Chris Fallin, Donghyuk Lee, Rachata Ausavarungnirun, Gennady Pekhimenko, Yixin Luo, Onur Mutlu, Phillip B. Gibbons, Michael A. Kozuch, and Todd C. Mowry. 2013. RowClone: Fast and energy-efficient in-DRAM bulk data copy and initialization. 
In 2013 46th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO). 185\u2013197."},{"key":"e_1_3_3_179_2","first-page":"273","volume-title":"2017 50th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO)","author":"Seshadri Vivek","year":"2017","unstructured":"Vivek Seshadri, Donghyuk Lee, Thomas Mullins, Hasan Hassan, Amirali Boroumand, Jeremie Kim, Michael A. Kozuch, Onur Mutlu, Phillip B. Gibbons, and Todd C. Mowry. 2017. Ambit: In-memory accelerator for bulk bitwise operations using commodity DRAM technology. In 2017 50th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO). 273\u2013287."},{"issue":"10","key":"e_1_3_3_180_2","article-title":"Design of flexible hardware accelerators for image convolutions and transposed convolutions","volume":"7","author":"Sestito Cristian","year":"2021","unstructured":"Cristian Sestito, Fanny Spagnolo, and Stefania Perri. 2021. Design of flexible hardware accelerators for image convolutions and transposed convolutions. Journal of Imaging 7, 10 (2021), 210. Retrieved from https:\/\/www.mdpi.com\/2313-433X\/7\/10\/210","journal-title":"Journal of Imaging"},{"key":"e_1_3_3_181_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2016.12"},{"key":"e_1_3_3_182_2","first-page":"1","volume-title":"2019 56th ACM\/IEEE Design Automation Conference (DAC)","author":"Shan Junnan","year":"2019","unstructured":"Junnan Shan, Mario R. Casu, Jordi Cortadella, Luciano Lavagno, and Mihai T. Lazarescu. 2019. Exact and heuristic allocation of multi-kernel applications to multi-FPGA platforms. In 2019 56th ACM\/IEEE Design Automation Conference (DAC). 
1\u20136."},{"key":"e_1_3_3_183_2","doi-asserted-by":"publisher","DOI":"10.1145\/3352460.3358302"},{"issue":"11","key":"e_1_3_3_184_2","doi-asserted-by":"crossref","first-page":"4145","DOI":"10.1109\/TCAD.2022.3197500","article-title":"SWAP: A server-scale communication-aware chiplet-based manycore PIM accelerator","volume":"41","author":"Sharma Harsh","year":"2022","unstructured":"Harsh Sharma, Sumit K Mandal, Janardhan Rao Doppa, Umit Y Ogras, and Partha Pratim Pande. 2022. SWAP: A server-scale communication-aware chiplet-based manycore PIM accelerator. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 41, 11 (2022), 4145\u20134156.","journal-title":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems"},{"key":"e_1_3_3_185_2","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2023.3268092"},{"key":"e_1_3_3_186_2","doi-asserted-by":"crossref","first-page":"490","DOI":"10.1109\/ISSCC49657.2024.10454441","volume-title":"2024 IEEE International Solid-State Circuits Conference (ISSCC)","volume":"67","author":"Smith Alan","year":"2024","unstructured":"Alan Smith, Eric Chapman, Chintan Patel, Raja Swaminathan, John Wuu, Tyrone Huang, Wonjun Jung, Alexander Kaganov, Hugh McIntyre, and Ramon Mangaser. 2024. 11.1 AMD Instinct\u2122 MI300 series modular chiplet package \u2013 HPC and AI accelerator for exa-class systems. In 2024 IEEE International Solid-State Circuits Conference (ISSCC), Vol. 67. 490\u2013492. DOI:10.1109\/ISSCC49657.2024.10454441"},{"key":"e_1_3_3_187_2","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2023.3255864"},{"key":"e_1_3_3_188_2","doi-asserted-by":"crossref","first-page":"130","DOI":"10.1109\/ISSCC.2019.8662476","volume-title":"2019 IEEE International Solid- State Circuits Conference - (ISSCC)","author":"Song Jinook","year":"2019","unstructured":"Jinook Song, Yunkyo Cho, Jun-Seok Park, Jun-Woo Jang, Sehwan Lee, Joon-Ho Song, Jae-Gon Lee, and Inyup Kang. 2019. 
An 11.5TOPS\/W 1024-MAC butterfly structure dual-core sparsity-aware neural processing unit in 8nm flagship mobile soc. In 2019 IEEE International Solid- State Circuits Conference - (ISSCC). 130\u2013132. DOI:10.1109\/ISSCC.2019.8662476"},{"key":"e_1_3_3_189_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2017.55"},{"key":"e_1_3_3_190_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.vlsi.2020.04.008"},{"key":"e_1_3_3_191_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSII.2021.3120495"},{"key":"e_1_3_3_192_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2022.3142292"},{"key":"e_1_3_3_193_2","first-page":"273","volume-title":"2010 International Conference on Field-Programmable Technology","author":"Sriram Vinay","year":"2011","unstructured":"Vinay Sriram, David Cox, Kuen Tsoi, and Wayne Luk. 2011. Towards an embedded biologically-inspired machine vision processor. In 2010 International Conference on Field-Programmable Technology. 273\u2013278. DOI:10.1109\/FPT.2010.5681487"},{"key":"e_1_3_3_194_2","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2017.2761740"},{"key":"e_1_3_3_195_2","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2023.3258906"},{"key":"e_1_3_3_196_2","doi-asserted-by":"crossref","first-page":"342","DOI":"10.1109\/ISSCC42615.2023.10067817","volume-title":"2023 IEEE International Solid- State Circuits Conference (ISSCC)","author":"Tambe Thierry","year":"2023","unstructured":"Thierry Tambe, Jeff Zhang, Coleman Hooper, Tianyu Jia, Paul N. Whatmough, Joseph Zuckerman, Maico Cassel Dos Santos, Erik Jens Loscalzo, Davide Giri, Kenneth Shepard, Luca Carloni, Alexander Rush, David Brooks, and Gu-Yeon Wei. 2023. 22.9 A 12nm 18.1TFLOPs\/W sparse transformer processor with entropy-based early exit, mixed-precision predication and fine-grained power management. In 2023 IEEE International Solid- State Circuits Conference (ISSCC). IEEE, San Francisco, CA, USA, 342\u2013344. 
DOI:10.1109\/ISSCC42615.2023.10067817"},{"key":"e_1_3_3_197_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSSC.2023.3343457"},{"key":"e_1_3_3_198_2","doi-asserted-by":"publisher","DOI":"10.3390\/electronics11223833"},{"key":"e_1_3_3_199_2","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2301.03904"},{"key":"e_1_3_3_200_2","doi-asserted-by":"crossref","first-page":"1099","DOI":"10.23919\/DATE54114.2022.9774759","volume-title":"2022 Conference & Exhibition on Design, Automation & Test in Europe","author":"Tortorella Yvan","year":"2022","unstructured":"Yvan Tortorella, Luca Bertaccini, Davide Rossi, Luca Benini, and Francesco Conti. 2022. RedMulE: A compact FP16 matrix-multiplication accelerator for adaptive deep learning on RISC-V-based ultra-low-power SoCs. In 2022 Conference & Exhibition on Design, Automation & Test in Europe. European Design and Automation Association, 1099\u20131102."},{"key":"e_1_3_3_201_2","doi-asserted-by":"publisher","DOI":"10.1145\/3020078.3021744"},{"key":"e_1_3_3_202_2","unstructured":"Paramita Basak Upama, Md Jobair Hossain Faruk, Mohammad Nazim, Mohammad Masum, Hossain Shahriar, Gias Uddin, Shabir Barzanjeh, Sheikh Iqbal Ahamed, and Akond Rahman. 2022. Evolution of Quantum Computing: A Systematic Survey on the Use of Quantum Computing Tools. (2022). arxiv:cs.SE\/2204.01856"},{"key":"e_1_3_3_203_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSI.2024.3359044"},{"key":"e_1_3_3_204_2","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2021.3061912"},{"key":"e_1_3_3_205_2","first-page":"5998","volume-title":"Advances in Neural Information Processing Systems 30","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems 30, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.). 
Curran Associates, Inc., 5998\u20136008."},{"key":"e_1_3_3_206_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2018.2844093"},{"key":"e_1_3_3_207_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA52012.2021.00021"},{"key":"e_1_3_3_208_2","unstructured":"Ventana Micro. 2023. Retrieved from https:\/\/www.ventanamicro.com\/. (2023). Accessed: 2023-04-18."},{"issue":"4","key":"e_1_3_3_209_2","doi-asserted-by":"crossref","first-page":"18","DOI":"10.1109\/MSSC.2022.3201783","article-title":"ML processors are going multi-core: A performance dream or a scheduling nightmare?","volume":"14","author":"Verhelst Marian","year":"2022","unstructured":"Marian Verhelst, Man Shi, and Linyan Mei. 2022. ML processors are going multi-core: A performance dream or a scheduling nightmare? IEEE Solid-State Circuits Magazine 14, 4 (2022), 18\u201327.","journal-title":"IEEE Solid-State Circuits Magazine"},{"key":"e_1_3_3_210_2","doi-asserted-by":"crossref","first-page":"85","DOI":"10.1109\/HPCA.2017.42","volume-title":"2017 IEEE International Symposium on High Performance Computer Architecture (HPCA)","author":"Vijayaraghavan Thiruvengadam","year":"2017","unstructured":"Thiruvengadam Vijayaraghavan, Yasuko Eckert, Gabriel H. Loh, Michael J. Schulte, Mike Ignatowski, Bradford M. Beckmann, William C. Brantley, Joseph L. Greathouse, Wei Huang, Arun Karunanithi, Onur Kayiran, Mitesh Meswani, Indrani Paul, Matthew Poremba, Steven Raasch, Steven K. Reinhardt, Greg Sadowski, and Vilas Sridharan. 2017. Design and analysis of an APU for exascale computing. In 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA). 
85\u201396."},{"issue":"1","key":"e_1_3_3_211_2","doi-asserted-by":"crossref","first-page":"79","DOI":"10.1109\/JSSC.2020.3036341","article-title":"IntAct: A 96-core processor with six chiplets 3D-stacked on an active interposer with distributed interconnects and integrated power management","volume":"56","author":"Vivet Pascal","year":"2021","unstructured":"Pascal Vivet, Eric Guthmuller, Yvain Thonnart, Gael Pillonnet, C\u00e9sar Fuguet, Ivan Miro-Panades, Guillaume Moritz, Jean Durupt, Christian Bernard, Didier Varreau, Julian Pontes, S\u00e9bastien Thuries, David Coriat, Michel Harrand, Denis Dutoit, Didier Lattard, Lucile Arnaud, Jean Charbonnier, Perceval Coudrain, Arnaud Garnier, Fr\u00e9d\u00e9ric Berger, Alain Gueugnot, Alain Greiner, Quentin L. Meunier, Alexis Farcy, Alexandre Arriordaz, S\u00e9verine Ch\u00e9ramy, and Fabien Clermidy. 2021. IntAct: A 96-core processor with six chiplets 3D-stacked on an active interposer with distributed interconnects and integrated power management. IEEE Journal of Solid-State Circuits 56, 1 (2021), 79\u201397.","journal-title":"IEEE Journal of Solid-State Circuits"},{"key":"e_1_3_3_212_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41586-022-04992-8"},{"key":"e_1_3_3_213_2","doi-asserted-by":"crossref","first-page":"65","DOI":"10.1109\/ASAP52443.2021.00018","volume-title":"2021 IEEE 32nd International Conference on Application-specific Systems, Architectures and Processors (ASAP)","author":"Wang Shihang","year":"2021","unstructured":"Shihang Wang, Jianghan Zhu, Qi Wang, Can He, and Terry Tao Ye. 2021. Customized instruction on RISC-V for winograd-based convolution acceleration. In 2021 IEEE 32nd International Conference on Application-specific Systems, Architectures and Processors (ASAP). 65\u201368. 
DOI:10.1109\/ASAP52443.2021.00018"},{"key":"e_1_3_3_214_2","first-page":"288","article-title":"FPAP: A folded architecture for energy-quality scalable convolutional neural networks","volume":"66","author":"Wang Yizhi","year":"2019","unstructured":"Yizhi Wang, Jun Lin, and Zhongfeng Wang. 2019. FPAP: A folded architecture for energy-quality scalable convolutional neural networks. IEEE Transactions on Circuits and Systems I: Regular Papers 66 (2019), 288\u2013301.","journal-title":"IEEE Transactions on Circuits and Systems I: Regular Papers"},{"key":"e_1_3_3_215_2","doi-asserted-by":"crossref","first-page":"492","DOI":"10.1109\/ISSCC49657.2024.10454387","volume-title":"2024 IEEE International Solid-State Circuits Conference (ISSCC)","author":"Wang Yipeng","year":"2024","unstructured":"Yipeng Wang, Mengtian Yang, Chieh-Pu Lo, and Jaydeep P. Kulkarni. 2024. 30.6 Vecim: A 289.13GOPS\/W RISC-V Vector co-processor with compute-in-memory vector register file for efficient high-performance computing. In 2024 IEEE International Solid-State Circuits Conference (ISSCC). IEEE, San Francisco, CA, USA, 492\u2013494. DOI:10.1109\/ISSCC49657.2024.10454387"},{"key":"e_1_3_3_216_2","unstructured":"Sally Ward-Foxton. 2022. Axelera Demos AI Test Chip After Taping Out in Four Months. (2022)."},{"key":"e_1_3_3_217_2","first-page":"1","volume-title":"2017 54th ACM\/EDAC\/IEEE Design Automation Conference (DAC)","author":"Wei Xuechao","year":"2017","unstructured":"Xuechao Wei, Cody Hao Yu, Peng Zhang, Youxiang Chen, Yuxin Wang, Han Hu, Yun Liang, and Jason Cong. 2017. Automated systolic array architecture synthesis for high throughput CNN inference on FPGAs. In 2017 54th ACM\/EDAC\/IEEE Design Automation Conference (DAC). 1\u20136. 
DOI:10.1145\/3061639.3062207"},{"key":"e_1_3_3_218_2","doi-asserted-by":"crossref","first-page":"173","DOI":"10.1109\/ISOCC56007.2022.10031596","volume-title":"2022 19th International SoC Design Conference (ISOCC)","author":"Xuan Zhou Yu","year":"2022","unstructured":"Zhou Yu Xuan, Ching-Jui Lee, and Tsung Tai Yeh. 2022. Lego: Dynamic tensor-splitting multi-tenant DNN models on multi-chip-module architecture. In 2022 19th International SoC Design Conference (ISOCC). 173\u2013174. DOI:10.1109\/ISOCC56007.2022.10031596"},{"key":"e_1_3_3_219_2","first-page":"388","volume-title":"2019 IEEE International Solid- State Circuits Conference - (ISSCC)","author":"Xue Cheng-Xin","year":"2019","unstructured":"Cheng-Xin Xue, Wei-Hao Chen, Je-Syu Liu, Jia-Fang Li, Wei-Yu Lin, Wei-En Lin, Jing-Hong Wang, Wei-Chen Wei, Ting-Wei Chang, Tung-Cheng Chang, Tsung-Yuan Huang, Hui-Yao Kao, Shih-Ying Wei, Yen-Cheng Chiu, Chun-Ying Lee, Chung-Chuan Lo, Ya-Chin King, Chorng-Jung Lin, Ren-Shuo Liu, Chih-Cheng Hsieh, Kea-Tiong Tang, and Meng-Fan Chang. 2019. A 1Mb multibit ReRAM computing-in-memory macro with 14.6ns parallel MAC computing time for CNN based AI edge processors. In 2019 IEEE International Solid- State Circuits Conference - (ISSCC). 388\u2013390. 
DOI:10.1109\/ISSCC.2019.8662395"},{"key":"e_1_3_3_220_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2018.00060"},{"key":"e_1_3_3_221_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSI.2024.3350664"},{"key":"e_1_3_3_222_2","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2020.3045564"},{"key":"e_1_3_3_223_2","doi-asserted-by":"publisher","DOI":"10.1145\/2934583.2934644"},{"key":"e_1_3_3_224_2","doi-asserted-by":"publisher","DOI":"10.23919\/VLSIC.2019.8778193"},{"key":"e_1_3_3_225_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2016.7783723"},{"key":"e_1_3_3_226_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA52012.2021.00061"},{"key":"e_1_3_3_227_2","first-page":"1","volume-title":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","author":"Zhou Haoxiang","year":"2023","unstructured":"Haoxiang Zhou, Haiqiao Hong, Dingbang Liu, Hang Liu, Yu Xia, Kai Li, Jun Liu, Shaobo Luo, Wei Mao, and Hao Yu. 2023. RISC-V based fully-parallel SRAM computing-in-memory accelerator with high hardware utilization and data reuse rate. In 2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS). 1\u20135. 
DOI:10.1109\/AICAS57966.2023.10168630"},{"key":"e_1_3_3_228_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSSC.2019.2960488"}],"container-title":["ACM Computing Surveys"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3729215","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,13]],"date-time":"2025-06-13T12:27:23Z","timestamp":1749817643000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3729215"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,13]]},"references-count":227,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2025,11,30]]}},"alternative-id":["10.1145\/3729215"],"URL":"https:\/\/doi.org\/10.1145\/3729215","relation":{},"ISSN":["0360-0300","1557-7341"],"issn-type":[{"value":"0360-0300","type":"print"},{"value":"1557-7341","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,6,13]]},"assertion":[{"value":"2023-06-22","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-03-22","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-06-13","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}