{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,3]],"date-time":"2026-04-03T15:33:28Z","timestamp":1775230408718,"version":"3.50.1"},"reference-count":39,"publisher":"MDPI AG","issue":"11","license":[{"start":{"date-parts":[[2019,11,9]],"date-time":"2019-11-09T00:00:00Z","timestamp":1573257600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Electronics"],"abstract":"<jats:p>Edge devices are becoming smarter with the integration of machine learning methods, such as deep learning, and are therefore used in many application domains where decisions have to be made without human intervention. Deep learning and, in particular, convolutional neural networks (CNN) are more efficient than previous algorithms for several computer vision applications such as security and surveillance, where image and video analysis are required. This better efficiency comes with a cost of high computation and memory requirements. Hence, running CNNs in embedded computing devices is a challenge for both algorithm and hardware designers. New processing devices, dedicated system architectures and optimization of the networks have been researched to deal with these computation requirements. In this paper, we improve the inference execution times of CNNs in low density FPGAs (Field-Programmable Gate Arrays) using fixed-point arithmetic, zero-skipping and weight pruning. The developed architecture supports the execution of large CNNs in FPGA devices with reduced on-chip memory and computing resources. With the proposed architecture, it is possible to infer an image in AlexNet in 2.9 ms in a ZYNQ7020 and 1.0 ms in a ZYNQ7045 with less than 1% accuracy degradation. 
These results improve previous state-of-the-art architectures for CNN inference.<\/jats:p>","DOI":"10.3390\/electronics8111321","type":"journal-article","created":{"date-parts":[[2019,11,12]],"date-time":"2019-11-12T04:07:07Z","timestamp":1573531627000},"page":"1321","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":18,"title":["Fast Convolutional Neural Networks in Low Density FPGAs Using Zero-Skipping and Weight Pruning"],"prefix":"10.3390","volume":"8","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8556-4507","authenticated-orcid":false,"given":"M\u00e1rio P.","family":"V\u00e9stias","sequence":"first","affiliation":[{"name":"INESC-ID, Instituto Superior de Engenharia de Lisboa, Instituto Polit\u00e9cnico de Lisboa, 1959-007 Lisbon, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7060-4745","authenticated-orcid":false,"given":"Rui Policarpo","family":"Duarte","sequence":"additional","affiliation":[{"name":"INESC-ID, Instituto Superior T\u00e9cnico, Universidade de Lisboa, 1049-001 Lisbon, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7525-7546","authenticated-orcid":false,"given":"Jos\u00e9 T.","family":"de Sousa","sequence":"additional","affiliation":[{"name":"INESC-ID, Instituto Superior T\u00e9cnico, Universidade de Lisboa, 1049-001 Lisbon, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3621-8322","authenticated-orcid":false,"given":"Hor\u00e1cio C.","family":"Neto","sequence":"additional","affiliation":[{"name":"INESC-ID, Instituto Superior T\u00e9cnico, Universidade de Lisboa, 1049-001 Lisbon, Portugal"}]}],"member":"1968","published-online":{"date-parts":[[2019,11,9]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1007\/s11263-015-0816-y","article-title":"ImageNet Large Scale Visual Recognition Challenge","volume":"115","author":"Russakovsky","year":"2015","journal-title":"Int. J. Comput. 
Vis."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1109\/35.41400","article-title":"Handwritten digit recognition: applications of neural network chips and automatic learning","volume":"27","author":"Cun","year":"1989","journal-title":"IEEE Commun. Mag."},{"key":"ref_3","unstructured":"Krizhevsky, A., Sutskever, I., and Hinton, G.E. (2012, January 3\u20136). ImageNet Classification with Deep Convolutional Neural Networks. Proceedings of the 25th International Conference on Neural Information Processing Systems\u2014Volume 1, Lake Tahoe, Nevada."},{"key":"ref_4","unstructured":"Simonyan, K., and Zisserman, A. (2015, January 7\u20139). Very Deep Convolutional Networks for Large-Scale Image Recognition. Proceedings of the 3rd International Conference on Learning Representations, San Diego, CA, USA."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. (2015, January 7\u201312). Going deeper with convolutions. Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep Residual Learning for Image Recognition. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_7","unstructured":"(1900). Omitted for blind review."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"247","DOI":"10.1145\/1816038.1815993","article-title":"A Dynamically Configurable Coprocessor for Convolutional Neural Networks","volume":"38","author":"Chakradhar","year":"2010","journal-title":"SIGARCH Comput. Archit. 
News"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Chen, Y., Luo, T., Liu, S., Zhang, S., He, L., Wang, J., Li, L., Chen, T., Xu, Z., and Sun, N. (2014, January 13\u201317). DaDianNao: A Machine-Learning Supercomputer. Proceedings of the 2014 47th Annual IEEE\/ACM International Symposium on Microarchitecture, Cambridge, UK.","DOI":"10.1109\/MICRO.2014.58"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Zhang, C., Li, P., Sun, G., Guan, Y., Xiao, B., and Cong, J. (2015, January 22\u201324). Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks. Proceedings of the 2015 ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, CA, USA.","DOI":"10.1145\/2684746.2689060"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Liu, B., Zou, D., Feng, L., Feng, S., Fu, P., and Li, J. (2019). An FPGA-Based CNN Accelerator Integrating Depthwise Separable Convolution. Electronics, 8.","DOI":"10.3390\/electronics8030281"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Rivera-Acosta, M., Ortega-Cisneros, S., and Rivera, J. (2019). Automatic Tool for Fast Generation of Custom Convolutional Neural Networks Accelerators for FPGA. Electronics, 8.","DOI":"10.3390\/electronics8060641"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Qiu, J., Wang, J., Yao, S., Guo, K., Li, B., Zhou, E., Yu, J., Tang, T., Xu, N., and Song, S. (2016, January 21\u201323). Going Deeper with Embedded FPGA Platform for Convolutional Neural Network. Proceedings of the 2016 ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, CA, USA.","DOI":"10.1145\/2847263.2847265"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Suda, N., Chandra, V., Dasika, G., Mohanty, A., Ma, Y., Vrudhula, S., Seo, J.S., and Cao, Y. (2016, January 21\u201323). Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks. 
Proceedings of the 2016 ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, CA, USA.","DOI":"10.1145\/2847263.2847276"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"e3850","DOI":"10.1002\/cpe.3850","article-title":"FPGA-accelerated deep convolutional neural networks for high throughput and energy efficiency","volume":"29","author":"Qiao","year":"2017","journal-title":"Concurr. Comput. Pract. Exp."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"17:1","DOI":"10.1145\/3079758","article-title":"Throughput-Optimized FPGA Accelerator for Deep Convolutional Neural Networks","volume":"10","author":"Liu","year":"2017","journal-title":"ACM Trans. Reconfigurable Technol. Syst."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Alwani, M., Chen, H., Ferdman, M., and Milder, P. (2016, January 15\u201319). Fused-layer CNN accelerators. Proceedings of the 2016 49th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO), Taipei, Taiwan.","DOI":"10.1109\/MICRO.2016.7783725"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"535","DOI":"10.1145\/3140659.3080221","article-title":"Maximizing CNN Accelerator Efficiency Through Resource Partitioning","volume":"45","author":"Shen","year":"2017","journal-title":"SIGARCH Comput. Archit. News"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"2601","DOI":"10.1109\/TCAD.2018.2857078","article-title":"MALOC: A Fully Pipelined FPGA Accelerator for Convolutional Neural Networks With All Layers Mapped on Chip","volume":"37","author":"Gong","year":"2018","journal-title":"IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Venieris, S.I., and Bouganis, C. (2018). fpgaConvNet: Mapping Regular and Irregular Convolutional Neural Networks on FPGAs. IEEE Trans. Neural Netw. Learn. 
Syst., 1\u201317.","DOI":"10.1145\/3020078.3021791"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"35","DOI":"10.1109\/TCAD.2017.2705069","article-title":"Angel-Eye: A Complete Design Flow for Mapping CNN Onto Embedded FPGA","volume":"37","author":"Guo","year":"2018","journal-title":"IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst."},{"key":"ref_22","unstructured":"Gysel, P., Motamedi, M., and Ghiasi, S. (2016, January 2\u20134). Hardware-oriented Approximation of Convolutional Neural Networks. Proceedings of the 4th International Conference on Learning Representations, San Juan, Puerto Rico."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Wang, J., Lou, Q., Zhang, X., Zhu, C., Lin, Y., and Chen, D. (2018, January 27\u201331). A Design Flow of Accelerating Hybrid Extremely Low Bit-width Neural Network in Embedded FPGA. Proceedings of the 28th International Conference on Field-Programmable Logic and Applications, Dublin, Ireland.","DOI":"10.1109\/FPL.2018.00035"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Umuroglu, Y., Fraser, N.J., Gambardella, G., Blott, M., Leong, P., Jahre, M., and Vissers, K. (2017, January 22\u201324). FINN: A Framework for Fast, Scalable Binarized Neural Network Inference. Proceedings of the 2017 ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, CA, USA.","DOI":"10.1145\/3020078.3021744"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"1941","DOI":"10.1109\/TCSI.2017.2767204","article-title":"Efficient Hardware Architectures for Deep Convolutional Neural Network","volume":"65","author":"Wang","year":"2018","journal-title":"IEEE Trans. Circuits Syst. I Regul. Pap."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"1874","DOI":"10.1109\/TVLSI.2019.2913958","article-title":"High-Performance FPGA-Based CNN Accelerator With Block-Floating-Point Arithmetic","volume":"27","author":"Lian","year":"2019","journal-title":"IEEE Trans. 
Very Large Scale Integr. (VLSI) Syst."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Winograd, S. (1980). Arithmetic Complexity of Computations, Siam.","DOI":"10.1137\/1.9781611970364"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Lavin, A., and Gray, S. (2016, January 27\u201330). Fast Algorithms for Convolutional Neural Networks. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.435"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Zhang, C., and Prasanna, V. (2017, January 22\u201324). Frequency Domain Acceleration of Convolutional Neural Networks on CPU-FPGA Shared Memory System. Proceedings of the 2017 ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, CA, USA.","DOI":"10.1145\/3020078.3021727"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Lu, L., Liang, Y., Xiao, Q., and Yan, S. (May, January 30). Evaluating Fast Algorithms for Convolutional Neural Networks on FPGAs. Proceedings of the 2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), Napa, CA, USA.","DOI":"10.1109\/FCCM.2017.64"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Liang, Y., Lu, L., Xiao, Q., and Yan, S. (2019). Evaluating Fast Algorithms for Convolutional Neural Networks on FPGAs. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., 1.","DOI":"10.1109\/TCAD.2019.2897701"},{"key":"ref_32","unstructured":"Han, S., Mao, H., and Dally, W.J. (2015). Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding. CoRR arXiv."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"548","DOI":"10.1145\/3140659.3080215","article-title":"Scalpel: Customizing DNN Pruning to the Underlying Hardware Parallelism","volume":"45","author":"Yu","year":"2017","journal-title":"SIGARCH Comput. Archit. 
News"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Albericio, J., Judd, P., Hetherington, T., Aamodt, T., Jerger, N.E., and Moshovos, A. (2016, January 18\u201322). Cnvlutin: Ineffectual-Neuron-Free Deep Neural Network Computing. Proceedings of the 2016 ACM\/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA), Seoul, Korea.","DOI":"10.1109\/ISCA.2016.11"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Nurvitadhi, E., Venkatesh, G., Sim, J., Marr, D., Huang, R., Ong Gee Hock, J., Liew, Y.T., Srivatsan, K., Moss, D., and Subhaschandra, S. (2017, January 22\u201324). Can FPGAs Beat GPUs in Accelerating Next-Generation Deep Neural Networks?. Proceedings of the 2017 ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, CA, USA.","DOI":"10.1145\/3020078.3021740"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"644","DOI":"10.1109\/TNNLS.2018.2852335","article-title":"NullHop: A Flexible Convolutional Neural Network Accelerator Based on Sparse Representations of Feature Maps","volume":"30","author":"Aimar","year":"2019","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Lu, L., Xie, J., Huang, R., Zhang, J., Lin, W., and Liang, Y. (May, January 28). An Efficient Hardware Accelerator for Sparse Convolutional Neural Networks on FPGAs. Proceedings of the 2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), San Diego, CA, USA.","DOI":"10.1109\/FCCM.2019.00013"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Wang, Y., Xu, J., Han, Y., Li, H., and Li, X. (2016, January 5\u20139). DeepBurning: Automatic generation of FPGA-based learning accelerators for the Neural Network family. 
Proceedings of the 2016 53rd ACM\/EDAC\/IEEE Design Automation Conference (DAC), Austin, TX, USA.","DOI":"10.1145\/2897937.2898003"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Kouris, A., Venieris, S.I., and Bouganis, C. (2018, January 27\u201331). CascadeCNN: Pushing the Performance Limits of Quantisation in Convolutional Neural Networks. Proceedings of the 2018 28th International Conference on Field Programmable Logic and Applications (FPL), Dublin, Ireland.","DOI":"10.1109\/FPL.2018.00034"}],"container-title":["Electronics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2079-9292\/8\/11\/1321\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T13:33:15Z","timestamp":1760189595000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2079-9292\/8\/11\/1321"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,11,9]]},"references-count":39,"journal-issue":{"issue":"11","published-online":{"date-parts":[[2019,11]]}},"alternative-id":["electronics8111321"],"URL":"https:\/\/doi.org\/10.3390\/electronics8111321","relation":{},"ISSN":["2079-9292"],"issn-type":[{"value":"2079-9292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,11,9]]}}}