{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,27]],"date-time":"2026-03-27T17:06:19Z","timestamp":1774631179182,"version":"3.50.1"},"reference-count":34,"publisher":"Springer Science and Business Media LLC","issue":"10","license":[{"start":{"date-parts":[[2023,8,4]],"date-time":"2023-08-04T00:00:00Z","timestamp":1691107200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,8,4]],"date-time":"2023-08-04T00:00:00Z","timestamp":1691107200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Sign Process Syst"],"published-print":{"date-parts":[[2023,10]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Deep Learning (DL) is pervasive across a wide variety of domains. Convolutional Neural Networks (CNNs) are often used for image processing DL applications. Modern CNN models are growing to meet the needs of more sophisticated tasks, e.g. using Transposed Convolutions (TCONVs) for image decompression and image generation. Such state-of-the-art DL models often target GPU-based high-performance architectures, due to the high computational and hardware resource needs of TCONV layers. To avoid prohibitive GPU energy costs, CNNs are increasingly deployed to decentralized embedded autonomous devices, such as Field Programmable Gate Arrays (FPGAs). However, this poses challenges for designing efficient hardware implementations of TCONV layers. This paper presents a parameterized design and implementation of a new TCONV module, which is synthesizable onto FPGAs. It is implemented using the High-Level Synthesis (HLS), through a C++\u2009template to parameterize its functional and non-functional properties. 
These parameters allow kernel sizes, image sizes, quantization and parallelism to be varied by users. Through a systematic exploration of this design space, we find an optimal instance of this TCONV module that achieves 6.25 Giga Outputs per Second (<jats:italic>Gout\/s<\/jats:italic>) using just 1.53 W of power. We then use our TCONV layer in two neural networks for image decompression and image generation. Image decompression achieves a throughput of more than 30K frames-per-second (<jats:italic>fps<\/jats:italic>) using only 16% of resources on average, while image generation achieves an energy efficiency of 324 <jats:italic>fps<\/jats:italic>\/W and outperforms comparable state-of-the-art models by at least 7.3\u00d7.<\/jats:p>","DOI":"10.1007\/s11265-023-01883-7","type":"journal-article","created":{"date-parts":[[2023,8,4]],"date-time":"2023-08-04T01:01:47Z","timestamp":1691110907000},"page":"1245-1263","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["FPGA Design of Transposed Convolutions for Deep Learning Using High-Level Synthesis"],"prefix":"10.1007","volume":"95","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7731-0002","authenticated-orcid":false,"given":"Cristian","family":"Sestito","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1363-9201","authenticated-orcid":false,"given":"Stefania","family":"Perri","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0365-693X","authenticated-orcid":false,"given":"Robert","family":"Stewart","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,8,4]]},"reference":[{"key":"1883_CR1","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1155\/2018\/7068349","volume":"2018","author":"A Voulodimos","year":"2018","unstructured":"Voulodimos, A., Doulamis, N., Doulamis, A., & Protopapadakis, E. (2018). 
Deep Learning For Computer Vision: A Brief Review. Computational Intelligence and Neuroscience, 2018, 1\u201313. https:\/\/doi.org\/10.1155\/2018\/7068349","journal-title":"Computational Intelligence and Neuroscience"},{"key":"1883_CR2","doi-asserted-by":"publisher","first-page":"19143","DOI":"10.1109\/ACCESS.2019.2896880","volume":"7","author":"AB Nassif","year":"2019","unstructured":"Nassif, A. B., Shahin, I., Attili, I., Azzeh, M., & Shaalan, K. (2019). Speech recognition using deep neural networks: A systematic review. IEEE Access, 7, 19143\u201319165. https:\/\/doi.org\/10.1109\/ACCESS.2019.2896880","journal-title":"IEEE Access"},{"issue":"12","key":"1883_CR3","doi-asserted-by":"publisher","first-page":"1959","DOI":"10.1007\/s11548-018-1860-1","volume":"13","author":"Z Wang","year":"2018","unstructured":"Wang, Z., & Majewicz Fey, A. (2018). Deep learning with convolutional neural network for objective skill evaluation in robot-assisted surgery. International Journal of Computer Assisted Radiology and Surgery, 13(12), 1959\u20131970. https:\/\/doi.org\/10.1007\/s11548-018-1860-1","journal-title":"International Journal of Computer Assisted Radiology and Surgery"},{"issue":"1","key":"1883_CR4","doi-asserted-by":"publisher","first-page":"53","DOI":"10.1109\/MSP.2017.2765202","volume":"35","author":"A Creswell","year":"2018","unstructured":"Creswell, A., White, T., Dumoulin, V., Arulkumaran, K., Sengupta, B., & Bharath, A. A. (2018). Generative adversarial networks: An overview. IEEE Signal Processing Magazine, 35(1), 53\u201365. https:\/\/doi.org\/10.1109\/MSP.2017.2765202","journal-title":"IEEE Signal Processing Magazine"},{"key":"1883_CR5","doi-asserted-by":"publisher","first-page":"2153","DOI":"10.1016\/j.procs.2023.01.191","volume":"218","author":"M Kumar","year":"2023","unstructured":"Kumar, M., & Sharma, H. K. (2023). A GAN-Based Model of Deepfake Detection in Social Media. Procedia Computer Science, 218, 2153\u20132162. 
https:\/\/doi.org\/10.1016\/j.procs.2023.01.191","journal-title":"Procedia Computer Science"},{"issue":"10","key":"1883_CR6","doi-asserted-by":"publisher","first-page":"3471","DOI":"10.1109\/TCSI.2020.2991189","volume":"67","author":"D Im","year":"2020","unstructured":"Im, D., Han, D., Choi, S., Kang, S., & Yoo, H. J. (2020). DT-CNN: An energy-efficient dilated and transposed convolutional neural network processor for region of interest based image segmentation. IEEE Transactions on Circuits and Systems I: Regular Papers, 67(10), 3471\u20133483. https:\/\/doi.org\/10.1109\/TCSI.2020.2991189","journal-title":"IEEE Transactions on Circuits and Systems I: Regular Papers"},{"issue":"10","key":"1883_CR7","doi-asserted-by":"publisher","first-page":"2281","DOI":"10.1109\/TMI.2019.2903562","volume":"38","author":"Z Gu","year":"2019","unstructured":"Gu, Z., Cheng, J., Fu, H., Zhou, K., Hao, H., Zhao, Y., Zhang, T., Gao, S., & Liu, J. (2019). Ce-net: Context encoder network for 2d medical image segmentation. IEEE Transactions on Medical Imaging, 38(10), 2281\u20132292. https:\/\/doi.org\/10.1109\/TMI.2019.2903562","journal-title":"IEEE Transactions on Medical Imaging"},{"key":"1883_CR8","doi-asserted-by":"publisher","unstructured":"Dong, C., Loy, C. C., & Tang, X. (2016). Accelerating the super-resolution convolutional neural network. In European Conference on Computer Vision (ECCV) (pp. 391\u2013407). Springer, Cham. https:\/\/doi.org\/10.1007\/978-3-319-46475-6_25","DOI":"10.1007\/978-3-319-46475-6_25"},{"issue":"8","key":"1883_CR9","doi-asserted-by":"publisher","first-page":"9009","DOI":"10.1109\/JSEN.2023.3256524","volume":"23","author":"F Spagnolo","year":"2023","unstructured":"Spagnolo, F., Corsonello, P., Frustaci, F., & Perri, S. (2023). Design of a Low-power Super-Resolution Architecture for Virtual Reality Wearable Devices. IEEE Sensors Journal, 23(8), 9009\u20139016. 
https:\/\/doi.org\/10.1109\/JSEN.2023.3256524","journal-title":"IEEE Sensors Journal"},{"issue":"1","key":"1883_CR10","doi-asserted-by":"publisher","first-page":"281","DOI":"10.1109\/TCSVT.2018.2888898","volume":"30","author":"JW Chang","year":"2020","unstructured":"Chang, J. W., Kang, K. W., & Kang, S. J. (2020). An energy-efficient FPGA-based deconvolutional neural networks accelerator for single image super-resolution. IEEE Transactions on Circuits and Systems for Video Technology, 30(1), 281\u2013295. https:\/\/doi.org\/10.1109\/TCSVT.2018.2888898","journal-title":"IEEE Transactions on Circuits and Systems for Video Technology"},{"key":"1883_CR11","doi-asserted-by":"publisher","unstructured":"Nurvitadhi, E., Venkatesh, G., Sim, J., Marr, D., Huang, R., Ong Gee Hock, J., Liew, Y. T., Srivatsan, K., Moss, D., Subhaschandra, S., & Boudoukh, G. (2017). Can FPGAs beat GPUs in accelerating next-generation deep neural networks? In Proceedings of the 2017 ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA) (pp. 5\u201314). ACM. https:\/\/doi.org\/10.1145\/3020078.3021740","DOI":"10.1145\/3020078.3021740"},{"key":"1883_CR12","doi-asserted-by":"publisher","unstructured":"Yazdanbakhsh, A., Brzozowski, M., Khaleghi, B., Ghodrati, S., Samadi, K., Kim, N. S., & Esmaeilzadeh, H. (2018). FlexiGAN: An end-to-end solution for FPGA acceleration of generative adversarial networks. In 2018 IEEE 26th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) (pp. 65\u201372). IEEE. https:\/\/doi.org\/10.1109\/FCCM.2018.00019","DOI":"10.1109\/FCCM.2018.00019"},{"key":"1883_CR13","doi-asserted-by":"publisher","unstructured":"Sestito, C., Spagnolo, F., & Perri, S. (2021). Design of Flexible Hardware Accelerators for Image Convolutions and Transposed Convolutions. Journal of Imaging, 7(10):210, 1\u201316. 
https:\/\/doi.org\/10.3390\/jimaging7100210","DOI":"10.3390\/jimaging7100210"},{"key":"1883_CR14","unstructured":"Zhang, X., Das, S., Neopane, O., & Kreutz-Delgado, K. (2017). A Design Methodology for Efficient Implementation of Deconvolutional Neural Networks on an FPGA. arXiv preprint arXiv:1705.02583."},{"issue":"3","key":"1883_CR15","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3242897","volume":"11","author":"M Blott","year":"2018","unstructured":"Blott, M., Preu\u00dfer, T. B., Fraser, N. J., Gambardella, G., & O\u2019brien, K., Umuroglu, Y., Leeser, M., & Vissers, K. (2018). FINN-R: An end-to-end deep-learning framework for fast exploration of quantized neural networks. ACM Transactions on Reconfigurable Technology and Systems (TRETS), 11(3), 1\u201323. https:\/\/doi.org\/10.1145\/3242897","journal-title":"ACM Transactions on Reconfigurable Technology and Systems (TRETS)"},{"key":"1883_CR16","doi-asserted-by":"publisher","unstructured":"Stewart, R., Nowlan, A., Bacchus, P., Ducasse, Q., & Komendantskaya, E. (2021). Optimising hardware accelerated neural networks with quantisation and a knowledge distillation evolutionary algorithm. Electronics, 10(4):396, 1\u201321. https:\/\/doi.org\/10.3390\/electronics10040396","DOI":"10.3390\/electronics10040396"},{"key":"1883_CR17","doi-asserted-by":"publisher","unstructured":"Sestito, C., Perri, S., & Stewart, R. (2022). Design-Space Exploration of Quantized Transposed Convolutional Neural Networks for FPGA-based Systems-on-Chip. In 2022 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC\/PiCom\/CBDCom\/CyberSciTech) (pp. 1\u20136). IEEE. https:\/\/doi.org\/10.1109\/DASC\/PiCom\/CBDCom\/Cy55231.2022.9927825","DOI":"10.1109\/DASC\/PiCom\/CBDCom\/Cy55231.2022.9927825"},{"key":"1883_CR18","unstructured":"LeCun, Y., Cortes, C., & Burges, C. J. 
(1998). The MNIST database of handwritten digits. Retrieved from http:\/\/yann.lecun.com\/exdb\/mnist\/"},{"key":"1883_CR19","doi-asserted-by":"publisher","unstructured":"Xiao, H., Rasul, K., & Vollgraf, R. (2017). Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747. https:\/\/doi.org\/10.48550\/arXiv.1708.07747","DOI":"10.48550\/arXiv.1708.07747"},{"key":"1883_CR20","doi-asserted-by":"publisher","unstructured":"Radford, A., Metz, L., & Chintala, S. (2015). Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434. https:\/\/doi.org\/10.48550\/arXiv.1511.06434","DOI":"10.48550\/arXiv.1511.06434"},{"key":"1883_CR21","doi-asserted-by":"publisher","unstructured":"Meng, Y., Kuppannagari, S., Kannan, R., & Prasanna, V. (2021, December). How to Avoid Zero-Spacing in Fractionally-Strided Convolution? A Hardware-Algorithm Co-Design Methodology. In 2021 IEEE 28th International Conference on High Performance Computing, Data, and Analytics (HiPC) (pp. 81\u201390). IEEE. https:\/\/doi.org\/10.1109\/HiPC53243.2021.00022","DOI":"10.1109\/HiPC53243.2021.00022"},{"key":"1883_CR22","doi-asserted-by":"publisher","unstructured":"Mao, W., Lin, J., & Wang, Z. (2020). F-DNA: Fast convolution architecture for deconvolutional network acceleration. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 28(8), 1867\u20131880. https:\/\/doi.org\/10.1109\/TVLSI.2020.3000519","DOI":"10.1109\/TVLSI.2020.3000519"},{"key":"1883_CR23","doi-asserted-by":"publisher","unstructured":"Yu, Y., Zhao, T., Wang, M., Wang, K., & He, L. (2020). Uni-OPU: An FPGA-based uniform accelerator for convolutional and transposed convolutional networks. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 28(7), 1545\u20131556. 
https:\/\/doi.org\/10.1109\/TVLSI.2020.2995741","DOI":"10.1109\/TVLSI.2020.2995741"},{"key":"1883_CR24","doi-asserted-by":"publisher","unstructured":"Di, X., Yang, H. G., Jia, Y., Huang, Z., & Mao, N. (2020). Exploring efficient acceleration architecture for Winograd-transformed transposed convolution of GANs on FPGAs. Electronics, 9(2):286, 1\u201321. https:\/\/doi.org\/10.3390\/electronics9020286","DOI":"10.3390\/electronics9020286"},{"key":"1883_CR25","doi-asserted-by":"publisher","unstructured":"Marrazzo, E., Spagnolo, F., & Perri, S. (2022). Runtime Reconfigurable Hardware Accelerator for Energy-Efficient Transposed Convolutions. In 2022 17th Conference on Ph. D Research in Microelectronics and Electronics (PRIME) (pp. 141\u2013144). IEEE. https:\/\/doi.org\/10.1109\/PRIME55000.2022.9816800","DOI":"10.1109\/PRIME55000.2022.9816800"},{"issue":"11","key":"1883_CR26","doi-asserted-by":"publisher","first-page":"2519","DOI":"10.1109\/TCAD.2018.2857258","volume":"37","author":"J Yan","year":"2018","unstructured":"Yan, J., Yin, S., Tu, F., Liu, L., & Wei, S. (2018). GNA: Reconfigurable and efficient architecture for generative network acceleration. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 37(11), 2519\u20132529. https:\/\/doi.org\/10.1109\/TCAD.2018.2857258","journal-title":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems"},{"key":"1883_CR27","doi-asserted-by":"publisher","unstructured":"Perri, S., Sestito, C., Spagnolo, F., & Corsonello, P. (2020). Efficient deconvolution architecture for heterogeneous systems-on-chip. Journal of Imaging, 6(9):85, 1\u201317. https:\/\/doi.org\/10.3390\/jimaging6090085","DOI":"10.3390\/jimaging6090085"},{"key":"1883_CR28","doi-asserted-by":"publisher","unstructured":"Wang, D., Shen, J., Wen, M., & Zhang, C. (2019). Efficient implementation of 2D and 3D sparse deconvolutional neural networks with a uniform architecture on FPGAs. Electronics, 8(7):803, 1\u201313. 
https:\/\/doi.org\/10.3390\/electronics8070803","DOI":"10.3390\/electronics8070803"},{"key":"1883_CR29","doi-asserted-by":"publisher","unstructured":"Lavin, A., & Gray, S. (2016). Fast algorithms for convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 4013\u20134021). IEEE. https:\/\/doi.org\/10.1109\/CVPR.2016.435","DOI":"10.1109\/CVPR.2016.435"},{"issue":"3","key":"1883_CR30","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3242900","volume":"11","author":"S Liu","year":"2018","unstructured":"Liu, S., Fan, H., Niu, X., Ng, H. C., Chu, Y., & Luk, W. (2018). Optimizing CNN-based segmentation with deeply customized convolutional and deconvolutional architectures on FPGA. ACM Transactions on Reconfigurable Technology and Systems (TRETS), 11(3), 1\u201322. https:\/\/doi.org\/10.1145\/3242900","journal-title":"ACM Transactions on Reconfigurable Technology and Systems (TRETS)"},{"key":"1883_CR31","unstructured":"ARM. (2012). AMBA 4 AXI4, AXI4-Lite, and AXI4-Stream Protocol Assertions User Guide. Retrieved from https:\/\/developer.arm.com\/documentation\/dui0534\/b\/"},{"key":"1883_CR32","doi-asserted-by":"publisher","unstructured":"Hara, K., Saito, D., & Shouno, H. (2015). Analysis of function of rectified linear unit used in deep learning. In 2015 International Joint Conference on Neural Networks (IJCNN) (pp. 1\u20138). IEEE. https:\/\/doi.org\/10.1109\/IJCNN.2015.7280578","DOI":"10.1109\/IJCNN.2015.7280578"},{"key":"1883_CR33","unstructured":"AMD Xilinx. (2020). Vivado Design Suite User Guide: High-Level Synthesis. UG902 (v2019.2). Retrieved from https:\/\/www.xilinx.com\/content\/dam\/xilinx\/support\/documents\/sw_manuals\/xilinx2019_2\/ug902-vivado-high-level-synthesis.pdf"},{"key":"1883_CR34","doi-asserted-by":"publisher","unstructured":"Sestito, C., Perri, S., & Stewart, R. (2022). Accuracy Evaluation of Transposed Convolution-Based Quantized Neural Networks. 
In 2022 International Joint Conference on Neural Networks (IJCNN) (pp. 1\u20138). IEEE. https:\/\/doi.org\/10.1109\/IJCNN55064.2022.9892671","DOI":"10.1109\/IJCNN55064.2022.9892671"}],"container-title":["Journal of Signal Processing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11265-023-01883-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11265-023-01883-7\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11265-023-01883-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,12,2]],"date-time":"2023-12-02T09:11:50Z","timestamp":1701508310000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11265-023-01883-7"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,8,4]]},"references-count":34,"journal-issue":{"issue":"10","published-print":{"date-parts":[[2023,10]]}},"alternative-id":["1883"],"URL":"https:\/\/doi.org\/10.1007\/s11265-023-01883-7","relation":{},"ISSN":["1939-8018","1939-8115"],"issn-type":[{"value":"1939-8018","type":"print"},{"value":"1939-8115","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,8,4]]},"assertion":[{"value":"30 November 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"28 April 2023","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"13 July 2023","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"4 August 2023","order":4,"name":"first_online","label":"First 
Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no conflicts of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing Interests"}}]}}