{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,24]],"date-time":"2026-03-24T11:40:57Z","timestamp":1774352457575,"version":"3.50.1"},"reference-count":34,"publisher":"Institution of Engineering and Technology (IET)","issue":"1","license":[{"start":{"date-parts":[[2024,5,27]],"date-time":"2024-05-27T00:00:00Z","timestamp":1716768000000},"content-version":"vor","delay-in-days":147,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100012166","name":"National Key Research and Development Program of China","doi-asserted-by":"publisher","award":["2022YFD2000700"],"award-info":[{"award-number":["2022YFD2000700"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["ietresearch.onlinelibrary.wiley.com"],"crossmark-restriction":true},"short-container-title":["IET Computers &amp; Digital Techniques"],"published-print":{"date-parts":[[2024,1]]},"abstract":"<jats:p>The asynchronous advantage actor\u2010critic (A3C) algorithm is widely regarded as one of the most effective and powerful algorithms among various deep reinforcement learning algorithms. However, the distributed and asynchronous nature of the A3C algorithm brings increased algorithm complexity and computational requirements, which not only leads to an increased training cost but also amplifies the difficulty of deploying the algorithm on resource\u2010limited field programmable gate array (FPGA) platforms. In addition, the resource wastage problem caused by the distributed training characteristics of A3C algorithms and the resource allocation problem affected by the imbalance between the computational amount of inference and training need to be carefully considered when designing accelerators. In this paper, we introduce a deployment strategy designed for distributed algorithms aimed at enhancing the resource utilization of hardware devices. 
Subsequently, an FPGA architecture is constructed specifically for accelerating the inference and training processes of the A3C algorithm. The experimental results show that our proposed deployment strategy reduces resource consumption by 62.5% and decreases the number of agents waiting for training by 32.2%, and the proposed A3C accelerator achieves 1.83\u00d7 and 2.39\u00d7 improvements in speedup compared to CPU (Intel i9\u201013900K) and GPU (NVIDIA RTX 4090) with less power consumption, respectively. Furthermore, our design shows superior resource efficiency compared to existing works.<\/jats:p>","DOI":"10.1049\/2024\/7855250","type":"journal-article","created":{"date-parts":[[2024,5,27]],"date-time":"2024-05-27T23:06:08Z","timestamp":1716851168000},"update-policy":"https:\/\/doi.org\/10.1002\/crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["A FPGA Accelerator of Distributed A3C Algorithm with Optimal Resource Deployment"],"prefix":"10.1049","volume":"2024","author":[{"ORCID":"https:\/\/orcid.org\/0009-0002-3162-8389","authenticated-orcid":false,"given":"Fen","family":"Ge","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0009-0009-6831-0009","authenticated-orcid":false,"given":"Guohui","family":"Zhang","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0009-0006-5953-1246","authenticated-orcid":false,"given":"Ziyu","family":"Li","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8274-1846","authenticated-orcid":false,"given":"Fang","family":"Zhou","sequence":"additional","affiliation":[]}],"member":"265","published-online":{"date-parts":[[2024,5,27]]},"reference":[{"key":"e_1_2_10_1_2","unstructured":"MnihV. BadiaA. P. MirzaM. GravesA. LillicrapT. HarleyT. andKavukcuogluK. 
Asynchronous methods for deep reinforcement learning Proceedings of The 33rd International Conference on Machine Learning 2016 PMLR 1928\u20131937."},{"key":"e_1_2_10_2_2","doi-asserted-by":"crossref","unstructured":"KendallA. HawkeJ. JanzD. MazurP. andRedaD. Learning to drive in a day IEEE International Conference on Robotics and Automation (ICRA) 2019 Montreal QC Canada IEEE 8248\u20138254 https:\/\/doi.org\/10.1109\/ICRA.2019.8793742 2-s2.0-85071490861.","DOI":"10.1109\/ICRA.2019.8793742"},{"key":"e_1_2_10_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/IVS.2018.8500718"},{"key":"e_1_2_10_4_2","doi-asserted-by":"crossref","unstructured":"GandhiD. PintoL. andGuptaA. Learning to fly by crashing 2017 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS) 2017 Vancouver BC Canada IEEE 3948\u20133955 https:\/\/doi.org\/10.1109\/IROS.2017.8206247 2-s2.0-85041949482.","DOI":"10.1109\/IROS.2017.8206247"},{"key":"e_1_2_10_5_2","doi-asserted-by":"crossref","unstructured":"AnwarM. A.andRaychowdhuryA. NavREn-Rl: Learning to fly in real environment via end-to-end deep reinforcement learning using monocular images IEEE International Conference on Mechatronics and Machine Vision in Practice (M2VIP) 2018 Stuttgart Germany IEEE 1\u20136 https:\/\/doi.org\/10.1109\/M2VIP.2018.8600838 2-s2.0-85061818380.","DOI":"10.1109\/M2VIP.2018.8600838"},{"key":"e_1_2_10_6_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10846\u2010018\u20100891\u20108"},{"key":"e_1_2_10_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/APCCAS.2018.8605639"},{"key":"e_1_2_10_8_2","doi-asserted-by":"crossref","unstructured":"ZhangT. McCarthyZ. JowO. LeeD. ChenX. GoldbergK. andAbbeelP. 
Deep imitation learning for complex manipulation tasks from virtual reality teleoperation IEEE International Conference on Robotics and Automation (ICRA) 2018 Brisbane QLD Australia IEEE 5628\u20135635 https:\/\/doi.org\/10.1109\/ICRA.2018.8461249 2-s2.0-85059374758.","DOI":"10.1109\/ICRA.2018.8461249"},{"key":"e_1_2_10_9_2","doi-asserted-by":"crossref","unstructured":"MarchesiniE.andFarinelliA. Enhancing deep reinforcement learning approaches for multi-robot navigation via single-robot evolutionary policy search International Conference on Robotics and Automation (ICRA) 2022 Philadelphia PA USA IEEE 5525\u20135531 https:\/\/doi.org\/10.1109\/ICRA46639.2022.9812341.","DOI":"10.1109\/ICRA46639.2022.9812341"},{"key":"e_1_2_10_10_2","doi-asserted-by":"publisher","DOI":"10.1038\/nature16961"},{"key":"e_1_2_10_11_2","doi-asserted-by":"publisher","DOI":"10.1007\/s12555\u2010020\u20100277\u20100"},{"key":"e_1_2_10_12_2","unstructured":"BabaeizadehM. FrosioI. TyreeS. ClemonsJ. andKautzJ. Reinforcement learning through asynchronous advantage actor-critic on a gpu 2016 arXiv preprint arXiv: 1611.06256."},{"key":"e_1_2_10_13_2","unstructured":"AbdelouahabK. PelcatM. SerotJ. andBerryF. Accelerating CNN inference on FPGAs: a survey 2018 arXiv preprint arXiv: 1806.01683."},{"key":"e_1_2_10_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2019.2961174"},{"key":"e_1_2_10_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/OJCAS.2020.3043737"},{"key":"e_1_2_10_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICIP.2018.8451053"},{"key":"e_1_2_10_17_2","doi-asserted-by":"crossref","unstructured":"ChangA. X. M.andCulurcielloE. 
Hardware accelerators for recurrent neural networks on FPGA IEEE International Symposium on Circuits and Systems (ISCAS) 2017 Baltimore MD USA IEEE 1\u20134 https:\/\/doi.org\/10.1109\/ISCAS.2017.8050816 2-s2.0-85032656330.","DOI":"10.1109\/ISCAS.2017.8050816"},{"key":"e_1_2_10_18_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2007.11.026"},{"key":"e_1_2_10_19_2","doi-asserted-by":"publisher","DOI":"10.1038\/nature14236"},{"key":"e_1_2_10_20_2","doi-asserted-by":"crossref","unstructured":"MengY. KuppannagariS. RajatR. SrivastavaA. KannanR. andPrasannaV. QTAccel: a generic FPGA based design for Q-table based reinforcement learning accelerators IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) 2020 New Orleans LA USA IEEE 107\u2013114 https:\/\/doi.org\/10.1109\/IPDPSW50202.2020.00024.","DOI":"10.1109\/IPDPSW50202.2020.00024"},{"key":"e_1_2_10_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2020.3028350"},{"key":"e_1_2_10_22_2","doi-asserted-by":"crossref","unstructured":"ShiriA. PrakashB. MazumderA. N. WaytowichN. R. OatesT. andMohseninT. An energy-efficient hardware accelerator for hierarchical deep reinforcement learning IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS) 2021 Washington DC DC USA IEEE 1\u20134 https:\/\/doi.org\/10.1109\/AICAS51828.2021.9458548.","DOI":"10.1109\/AICAS51828.2021.9458548"},{"key":"e_1_2_10_23_2","doi-asserted-by":"crossref","unstructured":"LealD. P. SugayaM. AmanoH. andOhkawaT. FPGA acceleration of ROS2-based reinforcement learning agents Eighth International Symposium on Computing and Networking Workshops (CANDARW) 2020 Naha Japan IEEE 106\u2013112 https:\/\/doi.org\/10.1109\/CANDARW51189.2020.00031.","DOI":"10.1109\/CANDARW51189.2020.00031"},{"key":"e_1_2_10_24_2","doi-asserted-by":"crossref","unstructured":"LiM.-J. LiA.-H. HuangY.-J. andChuS.-I. 
Implementation of deep reinforcement learning International Conference on Information Science and Systems 2019 ACM 232\u2013236 https:\/\/doi.org\/10.1145\/3322645.3322693 2-s2.0-85066946272.","DOI":"10.1145\/3322645.3322693"},{"key":"e_1_2_10_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/MDAT.2021.3063363"},{"key":"e_1_2_10_26_2","doi-asserted-by":"publisher","DOI":"10.1145\/3039902.3039915"},{"key":"e_1_2_10_27_2","doi-asserted-by":"crossref","unstructured":"MengY.andKuppannagariS. Accelerating proximal policy optimization on CPU-FPGA heterogeneous platforms IEEE International Symposium on Field-Programmable Custom Computing Machines (FCCM) 2020 Fayetteville AR USA IEEE 19\u201327 https:\/\/doi.org\/10.1109\/FCCM48280.2020.00012.","DOI":"10.1109\/FCCM48280.2020.00012"},{"key":"e_1_2_10_28_2","doi-asserted-by":"crossref","unstructured":"YangJ. HongS. andKimJ.-Y. FIXAR: a fixed-point deep reinforcement learning platform with quantization-aware training and adaptive parallelism ACM\/IEEE Design Automation Conference (DAC) 2021 San Francisco CA USA IEEE 259\u2013264 https:\/\/doi.org\/10.1109\/DAC18074.2021.9586213.","DOI":"10.1109\/DAC18074.2021.9586213"},{"key":"e_1_2_10_29_2","doi-asserted-by":"crossref","unstructured":"ChoH. OhP. ParkJ. JungW. andLeeJ. FA3C: FPGA-accelerated deep reinforcement learning International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS) 2019 ACM 499\u2013513 https:\/\/doi.org\/10.1145\/3297858.3304058 2-s2.0-85064698044.","DOI":"10.1145\/3297858.3304058"},{"key":"e_1_2_10_30_2","doi-asserted-by":"crossref","unstructured":"WangY. WangM. LiB. LiH. andLiX. A many-core accelerator design for on-chip deep reinforcement learning International Conference on Computer-Aided Design 2020 San Diego CA USA IEEE 1\u20137.","DOI":"10.1145\/3400302.3415636"},{"key":"e_1_2_10_31_2","doi-asserted-by":"crossref","unstructured":"ChenH. IssaM. NiY. andImaniM. 
DARL: distributed reconfigurable accelerator for hyperdimensional reinforcement learning IEEE\/ACM International Conference on Computer-Aided Design 2022 San Diego CA USA IEEE 1\u20139.","DOI":"10.1145\/3508352.3549437"},{"key":"e_1_2_10_32_2","doi-asserted-by":"crossref","unstructured":"LiY. LiuI.-J. YuanY. ChenD. SchwingA. andHuangJ. Accelerating distributed reinforcement learning with in-switch computing 2019 ACM\/IEEE 46th Annual International Symposium on Computer Architecture (ISCA) 2019 Phoenix AZ USA IEEE 279\u2013291.","DOI":"10.1145\/3307650.3322259"},{"key":"e_1_2_10_33_2","first-page":"26","article-title":"Lecture 6.5-rmsprop: divide the gradient by a running average of its recent magnitude","volume":"4","author":"Tieleman T.","year":"2012","journal-title":"COURSERA: Neural Networks for Machine Learning"},{"key":"e_1_2_10_34_2","doi-asserted-by":"crossref","unstructured":"GankidiP. R.andThangavelauthamJ. FPGA architecture for deep learning and its application to planetary robotics IEEE Aerospace Conference 2017 Big Sky MT USA IEEE 1\u20139 https:\/\/doi.org\/10.1109\/AERO.2017.7943929 2-s2.0-85021192643.","DOI":"10.1109\/AERO.2017.7943929"}],"container-title":["IET Computers &amp; Digital 
Techniques"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/downloads.hindawi.com\/journals\/ietcdt\/2024\/7855250.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/downloads.hindawi.com\/journals\/ietcdt\/2024\/7855250.xml","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/ietresearch.onlinelibrary.wiley.com\/doi\/pdf\/10.1049\/2024\/7855250","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,11,5]],"date-time":"2025-11-05T09:13:51Z","timestamp":1762334031000},"score":1,"resource":{"primary":{"URL":"https:\/\/ietresearch.onlinelibrary.wiley.com\/doi\/10.1049\/2024\/7855250"}},"subtitle":[],"editor":[{"given":"Roger","family":"Woods","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2024,1]]},"references-count":34,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2024,1]]}},"alternative-id":["10.1049\/2024\/7855250"],"URL":"https:\/\/doi.org\/10.1049\/2024\/7855250","archive":["Portico"],"relation":{},"ISSN":["1751-8601","1751-861X"],"issn-type":[{"value":"1751-8601","type":"print"},{"value":"1751-861X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,1]]},"assertion":[{"value":"2023-11-25","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-05-03","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-05-27","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}],"article-number":"7855250"}}