{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,8]],"date-time":"2026-04-08T02:44:13Z","timestamp":1775616253000,"version":"3.50.1"},"reference-count":102,"publisher":"MDPI AG","issue":"7","license":[{"start":{"date-parts":[[2020,7,7]],"date-time":"2020-07-07T00:00:00Z","timestamp":1594080000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Future Internet"],"abstract":"<jats:p>Deep Neural Networks (DNNs) are nowadays a common practice in most of the Artificial Intelligence (AI) applications. Their ability to go beyond human precision has made these networks a milestone in the history of AI. However, while on the one hand they present cutting edge performance, on the other hand they require enormous computing power. For this reason, numerous optimization techniques at the hardware and software level, and specialized architectures, have been developed to process these models with high performance and power\/energy efficiency without affecting their accuracy. In the past, multiple surveys have been reported to provide an overview of different architectures and optimization techniques for efficient execution of Deep Learning (DL) algorithms. This work aims at providing an up-to-date survey, especially covering the prominent works from the last 3 years of the hardware architectures research for DNNs. 
In this paper, the reader will first understand what a hardware accelerator is, and what are its main components, followed by the latest techniques in the field of dataflow, reconfigurability, variable bit-width, and sparsity.<\/jats:p>","DOI":"10.3390\/fi12070113","type":"journal-article","created":{"date-parts":[[2020,7,7]],"date-time":"2020-07-07T10:41:09Z","timestamp":1594118469000},"page":"113","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":171,"title":["An Updated Survey of Efficient Hardware Architectures for Accelerating Deep Convolutional Neural Networks"],"prefix":"10.3390","volume":"12","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2500-2283","authenticated-orcid":false,"given":"Maurizio","family":"Capra","sequence":"first","affiliation":[{"name":"Department of Electrical, Electronics and Telecommunication Engineering, Politecnico di Torino, 10129 Torino, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2608-820X","authenticated-orcid":false,"given":"Beatrice","family":"Bussolino","sequence":"additional","affiliation":[{"name":"Department of Electrical, Electronics and Telecommunication Engineering, Politecnico di Torino, 10129 Torino, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0689-4776","authenticated-orcid":false,"given":"Alberto","family":"Marchisio","sequence":"additional","affiliation":[{"name":"Embedded Computing Systems, Institute of Computer Engineering, Technische Universit\u00e4t Wien (TU Wien), 1040 Vienna, Austria"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2607-8135","authenticated-orcid":false,"given":"Muhammad","family":"Shafique","sequence":"additional","affiliation":[{"name":"Embedded Computing Systems, Institute of Computer Engineering, Technische Universit\u00e4t Wien (TU Wien), 1040 Vienna, 
Austria"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2238-9443","authenticated-orcid":false,"given":"Guido","family":"Masera","sequence":"additional","affiliation":[{"name":"Department of Electrical, Electronics and Telecommunication Engineering, Politecnico di Torino, 10129 Torino, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3069-0319","authenticated-orcid":false,"given":"Maurizio","family":"Martina","sequence":"additional","affiliation":[{"name":"Department of Electrical, Electronics and Telecommunication Engineering, Politecnico di Torino, 10129 Torino, Italy"}]}],"member":"1968","published-online":{"date-parts":[[2020,7,7]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"436","DOI":"10.1038\/nature14539","article-title":"Deep Learning","volume":"521","author":"LeCun","year":"2015","journal-title":"Nature"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Zanc, R., Cioara, T., and Anghel, I. (2019, January 5\u20137). Forecasting Financial Markets using Deep Learning. Proceedings of the 2019 IEEE 15th International Conference on Intelligent Computer Communication and Processing (ICCP), Cluj-Napoca, Romania.","DOI":"10.1109\/ICCP48234.2019.8959715"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Ying, J.J., Huang, P., Chang, C., and Yang, D. (2017, January 11\u201314). A preliminary study on deep learning for predicting social insurance payment behavior. Proceedings of the 2017 IEEE International Conference on Big Data (Big Data), Boston, MA, USA.","DOI":"10.1109\/BigData.2017.8258131"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Ha, V., Lu, D., Choi, G.S., Nguyen, H., and Yoon, B. (2019, January 17\u201320). Improving Credit Risk Prediction in Online Peer-to-Peer (P2P) Lending Using Feature selection with Deep learning. 
Proceedings of the 2019 21st International Conference on Advanced Communication Technology (ICACT), PyeongChang Kwangwoon_Do, Korea.","DOI":"10.23919\/ICACT.2019.8701943"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Arslan, A.K., Ya\u015far, \u015e., and \u00c7olak, C. (2019, January 21\u201322). An Intelligent System for the Classification of Lung Cancer Based on Deep Learning Strategy. Proceedings of the 2019 International Artificial Intelligence and Data Processing Symposium (IDAP), Malatya, Turkey.","DOI":"10.1109\/IDAP.2019.8875896"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"68","DOI":"10.1016\/j.fcij.2017.12.001","article-title":"Classification using Deep Learning Neural Networks for Brain Tumors","volume":"3","author":"Mohsen","year":"2018","journal-title":"Future Comput. Inform. J."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Barata, C., and Marques, J.S. (2019, January 8\u201311). Deep Learning For Skin Cancer Diagnosis With Hierarchical Architectures. Proceedings of the 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019), Venice, Italy.","DOI":"10.1109\/ISBI.2019.8759561"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"362","DOI":"10.1002\/rob.21918","article-title":"A survey of deep learning techniques for autonomous driving","volume":"37","author":"Grigorescu","year":"2020","journal-title":"J. Field Robot."},{"key":"ref_9","unstructured":"Palossi, D., Loquercio, A., Conti, F., Flamand, E., Scaramuzza, D., and Benini, L. (2018). Ultra Low Power Deep-Learning-powered Autonomous Nano Drones. arXiv."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Zhang, D., and Liu, S. (2018, January 13\u201315). Top-Down Saliency Object Localization Based on Deep-Learned Features. 
Proceedings of the 2018 11th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), Beijing, China.","DOI":"10.1109\/CISP-BMEI.2018.8633218"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Minaee, S., Boykov, Y., Porikli, F., Plaza, A., Kehtarnavaz, N., and Terzopoulos, D. (2020). Image Segmentation Using Deep Learning: A Survey. arXiv.","DOI":"10.1109\/TPAMI.2021.3059968"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Kaskavalci, H.C., and G\u00f6ren, S. (2019, January 26\u201328). A Deep Learning Based Distributed Smart Surveillance Architecture using Edge and Cloud Computing. Proceedings of the 2019 International Conference on Deep Learning and Machine Learning in Emerging Applications (Deep-ML), Istanbul, Turkey.","DOI":"10.1109\/Deep-ML.2019.00009"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Capra, M., Peloso, R., Masera, G., Ruo Roch, M., and Martina, M. (2019). Edge Computing: A Survey On the Hardware Requirements in the Internet of Things World. Future Internet, 11.","DOI":"10.3390\/fi11040100"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Shafique, M., Theocharides, T., Bouganis, C., Hanif, M.A., Khalid, F., Haf\u0131z, R., and Rehman, S. (2018, January 19\u201323). An overview of next-generation architectures for machine learning: Roadmap, opportunities and challenges in the IoT era. Proceedings of the 2018 Design, Automation Test in Europe Conference Exhibition (DATE), Dresden, Germany.","DOI":"10.23919\/DATE.2018.8342120"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Marchisio, A., Hanif, M.A., Khalid, F., Plastiras, G., Kyrkou, C., Theocharides, T., and Shafique, M. (2019, January 15\u201317). Deep Learning for Edge Computing: Current Trends, Cross-Layer Optimizations, and Open Research Challenges. 
Proceedings of the 2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), Miami, FL, USA.","DOI":"10.1109\/ISVLSI.2019.00105"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Zhang, J.J., Liu, K., Khalid, F., Hanif, M.A., Rehman, S., Theocharides, T., Artussi, A., Shafique, M., and Garg, S. (2019, January 2\u20136). Building Robust Machine Learning Systems: Current Progress, Research Challenges, and Opportunities. Proceedings of the 56th Annual Design Automation Conference, Las Vegas, NV, USA.","DOI":"10.1145\/3316781.3323472"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"30","DOI":"10.1109\/MDAT.2020.2971217","article-title":"Robust Machine Learning Systems: Challenges, Current Trends, Perspectives, and the Road Ahead","volume":"37","author":"Shafique","year":"2020","journal-title":"IEEE Des. Test"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"2295","DOI":"10.1109\/JPROC.2017.2761740","article-title":"Efficient Processing of Deep Neural Networks: A Tutorial and Survey","volume":"105","author":"Sze","year":"2017","journal-title":"Proc. IEEE"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"485","DOI":"10.1109\/JPROC.2020.2976475","article-title":"Model Compression and Hardware Acceleration for Neural Networks: A Comprehensive Survey","volume":"108","author":"Deng","year":"2020","journal-title":"Proc. IEEE"},{"key":"ref_20","unstructured":"Schuman, C.D., Potok, T.E., Patton, R.M., Birdwell, J.D., Dean, M.E., Rose, G.S., and Plank, J.S. (2017). A Survey of Neuromorphic Computing and Neural Networks in Hardware. arXiv."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"264","DOI":"10.1016\/j.eng.2020.01.007","article-title":"A Survey of Accelerator Architectures for Deep Neural Networks","volume":"6","author":"Chen","year":"2020","journal-title":"Engineering"},{"key":"ref_22","unstructured":"Han, S., Mao, H., and Dally, W.J. (2015). 
Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding. arXiv."},{"key":"ref_23","unstructured":"Cai, H., Gan, C., and Han, S. (2019). Once for All: Train One Network and Specialize it for Efficient Deployment. arXiv."},{"key":"ref_24","unstructured":"Parashar, A., Rhu, M., Mukkara, A., Puglielli, A., Venkatesan, R., Khailany, B., Emer, J., Keckler, S.W., and Dally, W.J. (2017, January 24\u201328). SCNN: An accelerator for compressed-sparse convolutional neural networks. Proceedings of the 2017 ACM\/IEEE 44th Annual International Symposium on Computer Architecture (ISCA), Toronto, ON, Canada."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"1663","DOI":"10.1109\/TC.2019.2924215","article-title":"SqueezeFlow: A Sparse CNN Accelerator Exploiting Concise Convolution Rules","volume":"68","author":"Li","year":"2019","journal-title":"IEEE Trans. Comput."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"644","DOI":"10.1109\/TNNLS.2018.2852335","article-title":"NullHop: A Flexible Convolutional Neural Network Accelerator Based on Sparse Representations of Feature Maps","volume":"30","author":"Aimar","year":"2019","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1561\/2200000006","article-title":"Learning Deep Architectures for AI","volume":"2","author":"Bengio","year":"2009","journal-title":"Found. Trends Mach. Learn."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"2278","DOI":"10.1109\/5.726791","article-title":"Gradient-Based Learning Applied to Document Recognition","volume":"86","author":"Lecun","year":"1998","journal-title":"Proc. 
IEEE"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"145","DOI":"10.1016\/S0893-6080(98)00116-6","article-title":"On the momentum term in gradient descent learning algorithms","volume":"12","author":"Qian","year":"1999","journal-title":"Neural Netw."},{"key":"ref_30","unstructured":"Kingma, D., and Ba, J. (2014, January 14\u201316). Adam: A Method for Stochastic Optimization. Proceedings of the International Conference on Learning Representations, Banff, AB, Canada."},{"key":"ref_31","unstructured":"Krizhevsky, A., Sutskever, I., and Hinton, G.E. (2012). ImageNet Classification with Deep Convolutional Neural Networks. Proceedings of the 25th International Conference on Neural Information Processing Systems\u2014Volume 1, Curran Associates Inc."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1007\/s11263-015-0816-y","article-title":"ImageNet Large Scale Visual Recognition Challenge","volume":"115","author":"Russakovsky","year":"2015","journal-title":"Int. J. Comput. Vis."},{"key":"ref_33","unstructured":"Simonyan, K., and Zisserman, A. (2014). Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. (2015, January 7\u201312). Going deeper with convolutions. Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., and Wojna, Z. (2016). Rethinking the Inception Architecture for Computer Vision. 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, 27\u201330 June 2016, IEEE Computer Society.","DOI":"10.1109\/CVPR.2016.308"},{"key":"ref_36","unstructured":"Singh, S.P., and Markovitch, S. 
(2017, January 4\u20139). Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, CA, USA."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep Residual Learning for Image Recognition. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Chollet, F. (2017, January 21\u201326). Xception: Deep Learning with Depthwise Separable Convolutions. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.195"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Xie, S., Girshick, R., Doll\u00e1r, P., Tu, Z., and He, K. (2017, January 21\u201326). Aggregated Residual Transformations for Deep Neural Networks. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.634"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Huang, G., Liu, Z., and Weinberger, K.Q. (2017, January 21\u201326). Densely Connected Convolutional Networks. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.243"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Hu, J., Shen, L., and Sun, G. (2018, January 18\u201322). Squeeze-and-Excitation Networks. Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00745"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Zoph, B., Vasudevan, V., Shlens, J., and Le, Q.V. (2018, January 18\u201322). 
Learning Transferable Architectures for Scalable Image Recognition. Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00907"},{"key":"ref_43","unstructured":"Burstein, J., Doran, C., and Solorio, T. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, 2\u20137 June 2019, Volume 1 (Long and Short Papers), Association for Computational Linguistics."},{"key":"ref_44","unstructured":"Shoeybi, M., Patwary, M., Puri, R., LeGresley, P., Casper, J., and Catanzaro, B. (2019). Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism. arXiv."},{"key":"ref_45","first-page":"1","article-title":"Lower Numerical Precision Deep Learning Inference and Training","volume":"3","author":"Rodriguez","year":"2018","journal-title":"Intel White Paper"},{"key":"ref_46","unstructured":"Lorette, G. (2006). High Performance Convolutional Neural Networks for Document Processing. Tenth International Workshop on Frontiers in Handwriting Recognition, Universit\u00e9 de Rennes 1, Suvisoft. Available online: http:\/\/www.suvisoft.com."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Vasudevan, A., Anderson, A., and Gregg, D. (2017, January 10\u201312). Parallel Multi Channel convolution using General Matrix Multiplication. Proceedings of the 2017 IEEE 28th International Conference on Application-specific Systems, Architectures and Processors (ASAP), Seattle, WA, USA.","DOI":"10.1109\/ASAP.2017.7995254"},{"key":"ref_48","unstructured":"Mathieu, M., Henaff, M., and LeCun, Y. (2013). Fast Training of Convolutional Networks through FFTs. arXiv."},{"key":"ref_49","unstructured":"James, R. (2020, June 06). Intel AVX-512 Instructions. 
Available online: https:\/\/software.intel.com\/content\/www\/cn\/zh\/develop\/articles\/intel-avx-512-instructions.html."},{"key":"ref_50","unstructured":"(2020, June 06). bfloat16\u2014Hardware Numerics Definition. Available online: https:\/\/software.intel.com\/content\/www\/us\/en\/develop\/download\/bfloat16-hardware-numerics-definition.html."},{"key":"ref_51","unstructured":"Gogar, S.L. (2020, June 06). BigDL\u2014Scale-out Deep Learning on Apache Spark* Cluster. Available online: https:\/\/software.intel.com\/content\/www\/us\/en\/develop\/articles\/bigdl-scale-out-deep-learning-on-apache-spark-cluster.html."},{"key":"ref_52","unstructured":"Hua, K.A., Rui, Y., Steinmetz, R., Hanjalic, A., Natsev, A., and Zhu, W. (2014, January 3\u20137). Caffe: Convolutional Architecture for Fast Feature Embedding. Proceedings of the ACM International Conference on Multimedia, MM\u201914, Orlando, FL, USA."},{"key":"ref_53","unstructured":"Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G.S., Davis, A., Dean, J., and Devin, M. (2020, June 06). TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. Available online: tensorflow.org."},{"key":"ref_54","unstructured":"Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., and Lerer, A. (2017, January 4\u20139). Automatic Differentiation in PyTorch. Proceedings of the NIPS 2017 Workshop on Autodiff, Long Beach, CA, USA."},{"key":"ref_55","unstructured":"Chetlur, S., Woolley, C., Vandermersch, P., Cohen, J., Tran, J., Catanzaro, B., and Shelhamer, E. (2014). cuDNN: Efficient Primitives for Deep Learning. arXiv."},{"key":"ref_56","unstructured":"(2020, June 06). Available online: https:\/\/developer.nvidia.com\/gpu-accelerated-libraries."},{"key":"ref_57","unstructured":"(2020, June 06). NVIDIA TESLA V100 GPU ARCHITECTURE. 
Available online: https:\/\/images.nvidia.com\/content\/technologies\/volta\/pdf\/437317-Volta-V100-DS-NV-US-WEB.pdf."},{"key":"ref_58","unstructured":"(2020, June 06). NVIDIA A100 Tensor Core GPU Architecture. Available online: https:\/\/www.nvidia.com\/content\/dam\/en-zz\/Solutions\/Data-Center\/nvidia-ampere-architecture-whitepaper.pdf."},{"key":"ref_59","doi-asserted-by":"crossref","unstructured":"Gokhale, V., Jin, J., Dundar, A., Martini, B., and Culurciello, E. (2014, January 23\u201328). A 240 G-ops\/s Mobile Coprocessor for Deep Neural Networks. Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops, Columbus, OH, USA.","DOI":"10.1109\/CVPRW.2014.106"},{"key":"ref_60","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3140659.3080246","article-title":"In-Datacenter Performance Analysis of a Tensor Processing Unit","volume":"45","author":"Jouppi","year":"2017","journal-title":"SIGARCH Comput. Archit. News"},{"key":"ref_61","doi-asserted-by":"crossref","unstructured":"Du, Z., Fasthuber, R., Chen, T., Ienne, P., Li, L., Luo, T., Feng, X., Chen, Y., and Temam, O. (2015, January 13\u201317). ShiDianNao: Shifting vision processing closer to the sensor. Proceedings of the 2015 ACM\/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA), Portland, OR, USA.","DOI":"10.1145\/2749469.2750389"},{"key":"ref_62","doi-asserted-by":"crossref","first-page":"2461","DOI":"10.1109\/TCSVT.2016.2592330","article-title":"Origami: A 803-GOp\/s\/W Convolutional Network Accelerator","volume":"27","author":"Cavigelli","year":"2017","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_63","doi-asserted-by":"crossref","first-page":"127","DOI":"10.1109\/JSSC.2016.2616357","article-title":"Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks","volume":"52","author":"Chen","year":"2017","journal-title":"IEEE J. 
Solid-State Circuits"},{"key":"ref_64","first-page":"269","article-title":"DianNao: A Small-Footprint High-Throughput Accelerator for Ubiquitous Machine-Learning","volume":"49","author":"Chen","year":"2014","journal-title":"Int. Conf. Archit. Support Program. Lang. Oper. Syst."},{"key":"ref_65","unstructured":"Han, S., Pool, J., Tran, J., and Dally, W.J. (2015). Learning Both Weights and Connections for Efficient Neural Networks. Proceedings of the 28th International Conference on Neural Information Processing Systems\u2014Volume 1, MIT Press."},{"key":"ref_66","unstructured":"Xie, X., Jones, M.W., and Tam, G.K.L. (2015, January 7\u201310). Data-free Parameter Pruning for Deep Neural Networks. Proceedings of the British Machine Vision Conference 2015, BMVC 2015, Swansea, UK."},{"key":"ref_67","doi-asserted-by":"crossref","unstructured":"Marchisio, A., Hanif, M.A., Martina, M., and Shafique, M. (2018, January 8\u201313). PruNet: Class-Blind Pruning Method For Deep Neural Networks. Proceedings of the 2018 International Joint Conference on Neural Networks (IJCNN), Rio de Janeiro, Brazil.","DOI":"10.1109\/IJCNN.2018.8489764"},{"key":"ref_68","doi-asserted-by":"crossref","unstructured":"Albericio, J., Judd, P., Hetherington, T., Aamodt, T., Jerger, N.E., and Moshovos, A. (2016, January 18\u201322). Cnvlutin: Ineffectual-Neuron-Free Deep Neural Network Computing. Proceedings of the 2016 ACM\/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA), Seoul, Korea.","DOI":"10.1109\/ISCA.2016.11"},{"key":"ref_69","doi-asserted-by":"crossref","unstructured":"Zhang, S., Du, Z., Zhang, L., Lan, H., Liu, S., Li, L., Guo, Q., Chen, T., and Chen, Y. (2016, January 15\u201319). Cambricon-X: An accelerator for sparse neural networks. 
Proceedings of the 2016 49th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO), Taipei, Taiwan.","DOI":"10.1109\/MICRO.2016.7783723"},{"key":"ref_70","doi-asserted-by":"crossref","unstructured":"Gondimalla, A., Chesnut, N., Thottethodi, M., and Vijaykumar, T.N. (2019, January 12\u201316). SparTen: A Sparse Tensor Accelerator for Convolutional Neural Networks. Proceedings of the 52nd Annual IEEE\/ACM International Symposium on Microarchitecture, Columbus, OH, USA.","DOI":"10.1145\/3352460.3358291"},{"key":"ref_71","doi-asserted-by":"crossref","unstructured":"Han, S., Liu, X., Mao, H., Pu, J., Pedram, A., Horowitz, M.A., and Dally, W.J. (2016, January 18\u201322). EIE: Efficient Inference Engine on Compressed Deep Neural Network. Proceedings of the 43rd ACM\/IEEE Annual International Symposium on Computer Architecture, ISCA 2016, Seoul, Korea.","DOI":"10.1109\/ISCA.2016.30"},{"key":"ref_72","doi-asserted-by":"crossref","first-page":"39","DOI":"10.1109\/MDAT.2017.2741463","article-title":"ZeNA: Zero-Aware Neural Network Accelerator","volume":"35","author":"Kim","year":"2018","journal-title":"IEEE Des. Test"},{"key":"ref_73","doi-asserted-by":"crossref","first-page":"292","DOI":"10.1109\/JETCAS.2019.2910232","article-title":"Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices","volume":"9","author":"Chen","year":"2019","journal-title":"IEEE J. Emerg. Sel. Top. Circuits Syst."},{"key":"ref_74","doi-asserted-by":"crossref","unstructured":"Hegde, K., Yu, J., Agrawal, R., Yan, M., Pellauer, M., and Fletcher, C. (2018, January 1\u20136). UCNN: Exploiting Computational Reuse in Deep Neural Networks via Weight Repetition. Proceedings of the 2018 ACM\/IEEE 45th Annual International Symposium on Computer Architecture (ISCA), Los Angeles, CA, USA.","DOI":"10.1109\/ISCA.2018.00062"},{"key":"ref_75","doi-asserted-by":"crossref","unstructured":"Granas, A., and Dugundji, J. (2003). 
Fixed Point Theory, Springer.","DOI":"10.1007\/978-0-387-21593-8"},{"key":"ref_76","doi-asserted-by":"crossref","unstructured":"Horowitz, M. (2014, January 9\u201313). 1.1 Computing\u2019s energy problem (and what we can do about it). Proceedings of the 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), San Francisco, CA, USA.","DOI":"10.1109\/ISSCC.2014.6757323"},{"key":"ref_77","doi-asserted-by":"crossref","first-page":"520","DOI":"10.1166\/jolpe.2018.1575","article-title":"X-DNNs: Systematic Cross-Layer Approximations for Energy-Efficient Deep Neural Networks","volume":"14","author":"Hanif","year":"2018","journal-title":"J. Low Power Electron."},{"key":"ref_78","doi-asserted-by":"crossref","first-page":"5784","DOI":"10.1109\/TNNLS.2018.2808319","article-title":"Ristretto: A Framework for Empirical Study of Resource-Efficient Inference in Convolutional Neural Networks","volume":"29","author":"Gysel","year":"2018","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_79","doi-asserted-by":"crossref","unstructured":"Jacob, B., Kligys, S., Chen, B., Zhu, M., Tang, M., Howard, A., Adam, H., and Kalenichenko, D. (2018, January 18\u201323). Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference. Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00286"},{"key":"ref_80","doi-asserted-by":"crossref","unstructured":"Sankaradas, M., Jakkula, V., Cadambi, S., Chakradhar, S., Durdanovic, I., Cosatto, E., and Graf, H.P. (2009, January 7\u20139). A Massively Parallel Coprocessor for Convolutional Neural Networks. Proceedings of the 2009 20th IEEE International Conference on Application-specific Systems, Architectures and Processors, Boston, MA, USA.","DOI":"10.1109\/ASAP.2009.25"},{"key":"ref_81","unstructured":"Sakr, C., and Shanbhag, N. (2019). 
Per-Tensor Fixed-Point Quantization of the Back-Propagation Algorithm. arXiv."},{"key":"ref_82","doi-asserted-by":"crossref","first-page":"80","DOI":"10.1109\/LCA.2016.2597140","article-title":"Stripes: Bit-Serial Deep Neural Network Computing","volume":"16","author":"Judd","year":"2017","journal-title":"IEEE Comput. Archit. Lett."},{"key":"ref_83","doi-asserted-by":"crossref","first-page":"173","DOI":"10.1109\/JSSC.2018.2865489","article-title":"UNPU: An Energy-Efficient Deep Neural Network Accelerator With Fully Variable Weight Bit Precision","volume":"54","author":"Lee","year":"2019","journal-title":"IEEE J. Solid-State Circuits"},{"key":"ref_84","doi-asserted-by":"crossref","unstructured":"Sharify, S., Lascorz, A.D., Siu, K., Judd, P., and Moshovos, A. (2018, January 24\u201328). Loom: Exploiting Weight and Activation Precisions to Accelerate Convolutional Neural Networks. Proceedings of the 2018 55th ACM\/ESDA\/IEEE Design Automation Conference (DAC), San Francisco, CA, USA.","DOI":"10.1109\/DAC.2018.8465915"},{"key":"ref_85","doi-asserted-by":"crossref","unstructured":"Sharma, H., Park, J., Suda, N., Lai, L., Chau, B., Chandra, V., and Esmaeilzadeh, H. (2018, January 1\u20136). Bit Fusion: Bit-Level Dynamically Composable Architecture for Accelerating Deep Neural Network. Proceedings of the 2018 ACM\/IEEE 45th Annual International Symposium on Computer Architecture (ISCA), Los Angeles, CA, USA.","DOI":"10.1109\/ISCA.2018.00069"},{"key":"ref_86","doi-asserted-by":"crossref","unstructured":"Ryu, S., Kim, H., Yi, W., and Kim, J. (2019, January 2\u20136). BitBlade: Area and Energy-Efficient Precision-Scalable Neural Network Accelerator with Bitwise Summation. Proceedings of the 2019 56th ACM\/IEEE Design Automation Conference (DAC), Las Vegas, NV, USA.","DOI":"10.1145\/3316781.3317784"},{"key":"ref_87","doi-asserted-by":"crossref","unstructured":"Lu, W., Yan, G., Li, J., Gong, S., Han, Y., and Li, X. (2017, January 4\u20138). 
FlexFlow: A Flexible Dataflow Accelerator Architecture for Convolutional Neural Networks. Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA), Austin, TX, USA.","DOI":"10.1109\/HPCA.2017.29"},{"key":"ref_88","doi-asserted-by":"crossref","first-page":"2220","DOI":"10.1109\/TVLSI.2017.2688340","article-title":"Deep Convolutional Neural Network Architecture With Reconfigurable Computation Patterns","volume":"25","author":"Tu","year":"2017","journal-title":"IEEE Trans. Very Large Scale Integr. (VLSI) Syst."},{"key":"ref_89","unstructured":"Hanif, M.A., Putra, R.V.W., Tanvir, M., Hafiz, R., Rehman, S., and Shafique, M. (2018). MPNA: A Massively-Parallel Neural Array Accelerator with Dataflow Optimization for Convolutional Neural Networks. arXiv."},{"key":"ref_90","doi-asserted-by":"crossref","unstructured":"Yin, S., Ouyang, P., Tang, S., Tu, F., Li, X., Liu, L., and Wei, S. (2017, January 5\u20138). A 1.06-to-5.09 TOPS\/W reconfigurable hybrid-neural-network processor for deep learning applications. Proceedings of the 2017 Symposium on VLSI Circuits, Kyoto, Japan.","DOI":"10.23919\/VLSIC.2017.8008534"},{"key":"ref_91","doi-asserted-by":"crossref","unstructured":"Fowers, J., Ovtcharov, K., Papamichael, M., Massengill, T., Liu, M., Lo, D., Alkalay, S., Haselman, M., Adams, L., and Ghandi, M. (2018, January 1\u20136). A Configurable Cloud-Scale DNN Processor for Real-Time AI. Proceedings of the 2018 ACM\/IEEE 45th Annual International Symposium on Computer Architecture (ISCA), Los Angeles, CA, USA.","DOI":"10.1109\/ISCA.2018.00012"},{"key":"ref_92","doi-asserted-by":"crossref","unstructured":"Kwon, H., Samajdar, A., and Krishna, T. (2018). MAERI: Enabling Flexible Dataflow Mapping over DNN Accelerators via Reconfigurable Interconnects. 
Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, Association for Computing Machinery.","DOI":"10.1145\/3173162.3173176"},{"key":"ref_93","doi-asserted-by":"crossref","unstructured":"Qin, E., Samajdar, A., Kwon, H., Nadella, V., Srinivasan, S., Das, D., Kaul, B., and Krishna, T. (2020, January 22\u201326). SIGMA: A Sparse and Irregular GEMM Accelerator with Flexible Interconnects for DNN Training. Proceedings of the 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA), San Diego, CA, USA.","DOI":"10.1109\/HPCA47549.2020.00015"},{"key":"ref_94","doi-asserted-by":"crossref","first-page":"85","DOI":"10.1109\/MM.2018.053631145","article-title":"DNPU: An Energy-Efficient Deep-Learning Processor with Heterogeneous Multi-Core Architecture","volume":"38","author":"Shin","year":"2018","journal-title":"IEEE Micro"},{"key":"ref_95","unstructured":"Stoutchinin, A., Conti, F., and Benini, L. (2019). Optimally Scheduling CNN Convolutions for Efficient Memory Access. arXiv."},{"key":"ref_96","doi-asserted-by":"crossref","unstructured":"Li, J., Yan, G., Lu, W., Jiang, S., Gong, S., Wu, J., and Li, X. (2018, January 19\u201323). SmartShuttle: Optimizing off-chip memory accesses for deep learning accelerators. Proceedings of the 2018 Design, Automation Test in Europe Conference Exhibition (DATE), Dresden, Germany.","DOI":"10.23919\/DATE.2018.8342033"},{"key":"ref_97","unstructured":"Putra, R.V.W., Hanif, M.A., and Shafique, M. (2020, January 19\u201323). DRMap: A Generic DRAM Data Mapping Policy for Energy-Efficient Processing of Convolutional Neural Networks. Proceedings of the 57th Annual Design Automation Conference 2020, San Francisco, CA, USA."},{"key":"ref_98","doi-asserted-by":"crossref","unstructured":"Wei, X., Liang, Y., and Cong, J. (2019, January 2\u20136). Overcoming Data Transfer Bottlenecks in FPGA-based DNN Accelerators via Layer Conscious Memory Management. 
Proceedings of the 2019 56th ACM\/IEEE Design Automation Conference (DAC), Las Vegas, NV, USA.","DOI":"10.1145\/3316781.3317875"},{"key":"ref_99","doi-asserted-by":"crossref","unstructured":"Khwa, W., Chen, J., Li, J., Si, X., Yang, E., Sun, X., Liu, R., Chen, P., Li, Q., and Yu, S. (2018, January 11\u201315). A 65 nm 4 Kb algorithm-dependent computing-in-memory SRAM unit-macro with 2.3 ns and 55.8 TOPS\/W fully parallel product-sum operation for binary DNN edge processors. Proceedings of the 2018 IEEE International Solid-State Circuits Conference\u2014(ISSCC), San Francisco, CA, USA.","DOI":"10.1109\/ISSCC.2018.8310401"},{"key":"ref_100","doi-asserted-by":"crossref","first-page":"43","DOI":"10.1109\/MSSC.2016.2546199","article-title":"Emerging Memory Technologies: Recent Trends and Prospects","volume":"8","author":"Yu","year":"2016","journal-title":"IEEE Solid-State Circuits Mag."},{"key":"ref_101","doi-asserted-by":"crossref","unstructured":"Lee, H.J., Kim, C.H., and Kim, S.W. (2020, January 19\u201322). Design of Floating-Point MAC Unit for Computing DNN Applications in PIM. Proceedings of the 2020 International Conference on Electronics, Information, and Communication (ICEIC), Barcelona, Spain.","DOI":"10.1109\/ICEIC49074.2020.9050989"},{"key":"ref_102","doi-asserted-by":"crossref","unstructured":"Schabel, J., Baker, L., Dey, S., Li, W., and Franzon, P.D. (2016, January 17\u201319). Processor-in-memory support for artificial neural networks. 
Proceedings of the 2016 IEEE International Conference on Rebooting Computing (ICRC), San Diego, CA, USA.","DOI":"10.1109\/ICRC.2016.7738697"}],"container-title":["Future Internet"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-5903\/12\/7\/113\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T09:48:22Z","timestamp":1760176102000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-5903\/12\/7\/113"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,7,7]]},"references-count":102,"journal-issue":{"issue":"7","published-online":{"date-parts":[[2020,7]]}},"alternative-id":["fi12070113"],"URL":"https:\/\/doi.org\/10.3390\/fi12070113","relation":{},"ISSN":["1999-5903"],"issn-type":[{"value":"1999-5903","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,7,7]]}}}