{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,3]],"date-time":"2026-07-03T17:09:27Z","timestamp":1783098567901,"version":"3.54.6"},"reference-count":57,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2024,5,3]],"date-time":"2024-05-03T00:00:00Z","timestamp":1714694400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Shenzhen Science and Technology Program","award":["KQTD20210811090149095"],"award-info":[{"award-number":["KQTD20210811090149095"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Des. Autom. Electron. Syst."],"published-print":{"date-parts":[[2024,5,31]]},"abstract":"<jats:p>In the digital era, the prevalence of low-quality images contrasts with the widespread use of high-definition displays, primarily due to low-resolution cameras and compression technologies. Image super-resolution (SR) techniques, particularly those leveraging deep learning, aim to enhance these images for high-definition presentation. However, real-time execution of deep neural network (DNN)-based SR methods at the edge poses challenges due to their high computational and storage requirements. To address this, field-programmable gate arrays (FPGAs) have emerged as a promising platform, offering flexibility, programmability, and adaptability to evolving models. Previous FPGA-based SR solutions have focused on reducing computational and memory costs through aggressive simplification techniques, often sacrificing the quality of the reconstructed images. This paper introduces a novel SR network specifically designed for edge applications, which maintains reconstruction performance while managing computation costs effectively. 
Additionally, we propose an architectural design that enables the real-time and end-to-end inference of the proposed SR network on embedded FPGAs. Our key contributions include a tailored SR algorithm optimized for embedded FPGAs, a DSP-enhanced design that achieves a significant four-fold speedup, a novel scalable cache strategy for handling large feature maps, optimization of DSP cascade consumption, and a constraint optimization approach for resource allocation. Experimental results demonstrate that our FPGA-specific accelerator surpasses existing solutions, delivering superior throughput, energy efficiency, and image quality.<\/jats:p>","DOI":"10.1145\/3652855","type":"journal-article","created":{"date-parts":[[2024,3,16]],"date-time":"2024-03-16T11:20:20Z","timestamp":1710588020000},"page":"1-25","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":17,"title":["A High-Performance Accelerator for Real-Time Super-Resolution on Edge FPGAs"],"prefix":"10.1145","volume":"29","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7291-3712","authenticated-orcid":false,"given":"Hongduo","family":"Liu","sequence":"first","affiliation":[{"name":"The Chinese University of Hong Kong, Hong Kong, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3687-8590","authenticated-orcid":false,"given":"Yijian","family":"Qian","sequence":"additional","affiliation":[{"name":"SmartMore, Shenzhen, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-2646-7808","authenticated-orcid":false,"given":"Youqiang","family":"Liang","sequence":"additional","affiliation":[{"name":"SmartMore, Shenzhen, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-4426-8347","authenticated-orcid":false,"given":"Bin","family":"Zhang","sequence":"additional","affiliation":[{"name":"SmartMore, Shenzhen, 
China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-8310-2616","authenticated-orcid":false,"given":"Zhaohan","family":"Liu","sequence":"additional","affiliation":[{"name":"SmartMore, Shenzhen, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2946-4541","authenticated-orcid":false,"given":"Tao","family":"He","sequence":"additional","affiliation":[{"name":"SmartMore, Shenzhen, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9501-9254","authenticated-orcid":false,"given":"Wenqian","family":"Zhao","sequence":"additional","affiliation":[{"name":"The Chinese University of Hong Kong, Hong Kong, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0048-3140","authenticated-orcid":false,"given":"Jiangbo","family":"Lu","sequence":"additional","affiliation":[{"name":"SmartMore, Shenzhen, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6406-4810","authenticated-orcid":false,"given":"Bei","family":"Yu","sequence":"additional","affiliation":[{"name":"The Chinese University of Hong Kong, Hong Kong, China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2024,5,3]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-10593-2_13"},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01234-2_18"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2018.2865304"},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00344"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.01132"},{"key":"e_1_3_1_7_2","first-page":"181","volume-title":"2018 IEEE 26th Annual International Symposium on Field-Programmable Custom Computing Machines 
(FCCM)","author":"He Zhuolun","year":"2018","unstructured":"Zhuolun He, Hanxian Huang, Ming Jiang, Yuanchao Bai, and Guojie Luo. 2018. FPGA-based real-time super-resolution system for ultra high definition videos. In 2018 IEEE 26th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM). IEEE, 181\u2013188."},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2018.2864321"},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2018.2888898"},{"key":"e_1_3_1_10_2","first-page":"1121","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Nguyen Ngoc Long","year":"2021","unstructured":"Ngoc Long Nguyen, J\u00e9r\u00e9my Anger, Axel Davy, Pablo Arias, and Gabriele Facciolo. 2021. Self-supervised multi-image super-resolution for push-frame satellite images. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 1121\u20131131."},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.sigpro.2009.09.002"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-41778-3_18"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1093\/comjnl\/bxm075"},{"key":"e_1_3_1_14_2","first-page":"6070","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Huang Yawen","year":"2017","unstructured":"Yawen Huang, Ling Shao, and Alejandro F. Frangi. 2017. Simultaneous super-resolution and cross-modality synthesis of 3D medical images using weakly-supervised joint convolutional sparse coding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 6070\u20136079."},{"key":"e_1_3_1_15_2","first-page":"1","volume-title":"2015 International Conference on Technologies for Sustainable Development (ICTSD)","author":"Isaac Jithin Saji","year":"2015","unstructured":"Jithin Saji Isaac and Ramesh Kulkarni. 2015. 
Super resolution techniques for medical image processing. In 2015 International Conference on Technologies for Sustainable Development (ICTSD). IEEE, 1\u20136."},{"key":"e_1_3_1_16_2","first-page":"645","volume-title":"13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18)","author":"Yeo Hyunho","year":"2018","unstructured":"Hyunho Yeo, Youngmok Jung, Jaehong Kim, Jinwoo Shin, and Dongsu Han. 2018. Neural adaptive content-aware internet video delivery. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18). 645\u2013661."},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/INFOCOM41043.2020.9155384"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2015.2439281"},{"key":"e_1_3_1_19_2","doi-asserted-by":"crossref","first-page":"391","DOI":"10.1007\/978-3-319-46475-6_25","volume-title":"Computer Vision\u2013ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part II 14","author":"Dong Chao","year":"2016","unstructured":"Chao Dong, Chen Change Loy, and Xiaoou Tang. 2016. Accelerating the super-resolution convolutional neural network. In Computer Vision\u2013ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part II 14. Springer, 391\u2013407."},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.182"},{"key":"e_1_3_1_21_2","doi-asserted-by":"crossref","unstructured":"C. Ledig L. Theis F. Huszar J. Caballero A. Cunningham A. Acosta A. Aitken A. Tejani J. Totz Z. Wang and W. Shi. 2017. Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 
4681\u20134690.","DOI":"10.1109\/CVPR.2017.19"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00340"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00265"},{"key":"e_1_3_1_24_2","doi-asserted-by":"crossref","unstructured":"Marco Bevilacqua Aline Roumy Christine Guillemot and Marie-Line Alberi-Morel. 2012. Low-complexity single-image super-resolution based on nonnegative neighbor embedding. In British Machine Vision Conference BMVC 2012 Surrey UK September 3-7 2012 BMVA Press 1\u201310.","DOI":"10.5244\/C.26.135"},{"key":"e_1_3_1_25_2","first-page":"20343","article-title":"LAPAR: Linearly-assembled pixel-adaptive regression network for single image super-resolution and beyond","volume":"33","author":"Li Wenbo","year":"2020","unstructured":"Wenbo Li, Kun Zhou, Lu Qi, Nianjuan Jiang, Jiangbo Lu, and Jiaya Jia. 2020. LAPAR: Linearly-assembled pixel-adaptive regression network for single image super-resolution and beyond. Advances in Neural Information Processing Systems 33 (2020), 20343\u201320355.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_26_2","first-page":"99","volume-title":"European Conference on Computer Vision","author":"Chu Xiangxiang","year":"2020","unstructured":"Xiangxiang Chu, Bo Zhang, and Ruijun Xu. 2020. Multi-objective reinforced evolution in mobile neural architecture search. In European Conference on Computer Vision. Springer, 99\u2013113."},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1145\/2847263.2847276"},{"key":"e_1_3_1_28_2","doi-asserted-by":"crossref","unstructured":"Yifan Yang Qijing Huang Bichen Wu Tianjun Zhang Liang Ma Giulio Gambardella Michaela Blott Luciano Lavagno Kees Vissers John Wawrzynek and Kurt Keutzer. 2019. Synetgy: Algorithm-hardware co-design for convnet accelerators on embedded fpgas. In Proceedings of the 2019 ACM\/SIGDA International Symposium on Field Programmable Gate Arrays. 
23\u201332.","DOI":"10.1145\/3289602.3293902"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1145\/3289602.3293915"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1145\/3431920.3439296"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/TVLSI.2019.2905242"},{"key":"e_1_3_1_32_2","first-page":"14","volume-title":"2018 International Conference on Field-Programmable Technology (FPT)","author":"Fan Hongxiang","year":"2018","unstructured":"Hongxiang Fan, Shuanglong Liu, Martin Ferianc, Ho-Cheung Ng, Zhiqiang Que, Shen Liu, Xinyu Niu, and Wayne Luk. 2018. A real-time object detection accelerator with compressed SSDLite on FPGA. In 2018 International Conference on Field-Programmable Technology (FPT). IEEE, 14\u201321."},{"key":"e_1_3_1_33_2","doi-asserted-by":"crossref","first-page":"76","DOI":"10.1109\/FPL53798.2021.00021","volume-title":"2021 31st International Conference on Field-Programmable Logic and Applications (FPL)","author":"Anupreetham Anupreetham","year":"2021","unstructured":"Anupreetham Anupreetham, Mohamed Ibrahim, Mathew Hall, Andrew Boutros, Ajay Kuzhively, Abinash Mohanty, Eriko Nurvitadhi, Vaughn Betz, Yu Cao, and Jae-sun Seo. 2021. End-to-end FPGA-based object detection using pipelined CNN and non-maximum suppression. In 2021 31st International Conference on Field-Programmable Logic and Applications (FPL). IEEE, 76\u201382."},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/TVLSI.2021.3064639"},{"key":"e_1_3_1_35_2","unstructured":"Song Han Junlong Kang Huizi Mao Yiming Hu Xin Li Yubin Li Dongliang Xie Hong Luo Song Yao Yu Wang Huazhong Yang and William (Bill) J. Dally. 2017. ESE: Efficient speech recognition engine with sparse LSTM on FPGA. In Proceedings of the 2017 ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays. 
75\u201384."},{"key":"e_1_3_1_36_2","first-page":"1","volume-title":"2018 14th IEEE International Conference on Solid-State and Integrated Circuit Technology (ICSICT)","author":"Li Chen-Lu","year":"2018","unstructured":"Chen-Lu Li, Yu-Jie Huang, Yu-Jie Cai, Jun Han, and Xiao-Yang Zeng. 2018. FPGA implementation of LSTM based on automatic speech recognition. In 2018 14th IEEE International Conference on Solid-State and Integrated Circuit Technology (ICSICT). IEEE, 1\u20133."},{"key":"e_1_3_1_37_2","first-page":"012010","volume-title":"Journal of Physics: Conference Series","volume":"2171","author":"Hu Huaixiang","year":"2022","unstructured":"Huaixiang Hu, Jiatong Li, Chunchun Wu, Xueyang Li, and Yuping Chen. 2022. Design and implementation of intelligent speech recognition system based on FPGA. In Journal of Physics: Conference Series, Vol. 2171. IOP Publishing, 012010."},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.1145\/1498765.1498785"},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1145\/2684746.2689060"},{"key":"e_1_3_1_40_2","first-page":"1","volume-title":"2019 IEEE International Symposium on Circuits and Systems (ISCAS)","author":"Zhang Jiaxi","year":"2019","unstructured":"Jiaxi Zhang, Wentai Zhang, Guojie Luo, Xuechao Wei, Yun Liang, and Jason Cong. 2019. Frequency improvement of systolic array-based CNNs on FPGAs. In 2019 IEEE International Symposium on Circuits and Systems (ISCAS). IEEE, 1\u20134."},{"key":"e_1_3_1_41_2","article-title":"8-bit dot-product acceleration","author":"Fu Yao","year":"2017","unstructured":"Yao Fu, Ephrem Wu, and Ashish Sirasao. 2017. 8-bit dot-product acceleration. 
Xilinx Inc.: San Jose, CA, USA (2017).","journal-title":"Xilinx Inc.: San Jose, CA, USA"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1145\/3240765.3240801"},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","DOI":"10.1109\/DAC18072.2020.9218684"},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW.2017.150"},{"key":"e_1_3_1_45_2","doi-asserted-by":"crossref","first-page":"711","DOI":"10.1007\/978-3-642-27413-8_47","volume-title":"Curves and Surfaces: 7th International Conference, Avignon, France, June 24-30, 2010, Revised Selected Papers 7","author":"Zeyde Roman","year":"2012","unstructured":"Roman Zeyde, Michael Elad, and Matan Protter. 2012. On single image scale-up using sparse-representations. In Curves and Surfaces: 7th International Conference, Avignon, France, June 24-30, 2010, Revised Selected Papers 7. Springer, 711\u2013730."},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2001.937655"},{"key":"e_1_3_1_47_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7299156"},{"key":"e_1_3_1_48_2","unstructured":"([n. d.]). NVIDIA-NX. https:\/\/developer.nvidia.com\/embedded\/jetson-xavier-nx"},{"key":"e_1_3_1_49_2","unstructured":"([n. d.]). Atlas 200 DK. https:\/\/e.huawei.com\/hk\/products\/cloud-computing-dc\/atlas\/ascend-310"},{"key":"e_1_3_1_50_2","doi-asserted-by":"publisher","DOI":"10.1145\/3373087.3375887"},{"key":"e_1_3_1_51_2","unstructured":"([n. d.]). Xilinx DPU. https:\/\/www.xilinx.com\/products\/intellectual-property\/dpu.html"},{"key":"e_1_3_1_52_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSII.2018.2799577"},{"issue":"4","key":"e_1_3_1_53_2","first-page":"1739","article-title":"An FPGA-based residual recurrent neural network for real-time video super-resolution","volume":"32","author":"Sun Kaicong","year":"2021","unstructured":"Kaicong Sun, Maurice Koch, Zhe Wang, Slavisa Jovanovic, Hassan Rabah, and Sven Simon. 2021. 
An FPGA-based residual recurrent neural network for real-time video super-resolution. IEEE Transactions on Circuits and Systems for Video Technology 32, 4 (2021), 1739\u20131750.","journal-title":"IEEE Transactions on Circuits and Systems for Video Technology"},{"key":"e_1_3_1_54_2","first-page":"7197","volume-title":"International Conference on Machine Learning","author":"Nagel Markus","year":"2020","unstructured":"Markus Nagel, Rana Ali Amjad, Mart Van Baalen, Christos Louizos, and Tijmen Blankevoort. 2020. Up or down? Adaptive rounding for post-training quantization. In International Conference on Machine Learning. PMLR, 7197\u20137206."},{"key":"e_1_3_1_55_2","unstructured":"Jiantao Qiu Jie Wang Song Yao Kaiyuan Guo Boxun Li Erjin Zhou Jincheng Yu Tianqi Tang Ningyi Xu Sen Song Yu Wang and Huazhong Yang. 2016. Going deeper with embedded FPGA platform for convolutional neural network. In Proceedings of the 2016 ACM\/SIGDA International Symposium on Field Programmable Gate Arrays. 26\u201335."},{"key":"e_1_3_1_56_2","doi-asserted-by":"publisher","DOI":"10.1145\/3061639.3062244"},{"key":"e_1_3_1_57_2","first-page":"1","volume-title":"2017 IEEE International Symposium on Circuits and Systems (ISCAS)","author":"Ma Yufei","year":"2017","unstructured":"Yufei Ma, Minkyu Kim, Yu Cao, Sarma Vrudhula, and Jae-sun Seo. 2017. End-to-end scalable FPGA accelerator for deep residual networks. In 2017 IEEE International Symposium on Circuits and Systems (ISCAS). 
IEEE, 1\u20134."},{"key":"e_1_3_1_58_2","doi-asserted-by":"publisher","DOI":"10.1109\/LCA.2022.3215718"}],"container-title":["ACM Transactions on Design Automation of Electronic Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3652855","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3652855","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T22:53:56Z","timestamp":1750287236000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3652855"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,5,3]]},"references-count":57,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2024,5,31]]}},"alternative-id":["10.1145\/3652855"],"URL":"https:\/\/doi.org\/10.1145\/3652855","relation":{},"ISSN":["1084-4309","1557-7309"],"issn-type":[{"value":"1084-4309","type":"print"},{"value":"1557-7309","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,5,3]]},"assertion":[{"value":"2023-04-22","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-03-06","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-05-03","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}