{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,7]],"date-time":"2026-04-07T11:37:22Z","timestamp":1775561842134,"version":"3.50.1"},"reference-count":34,"publisher":"MDPI AG","issue":"5","license":[{"start":{"date-parts":[[2020,3,2]],"date-time":"2020-03-02T00:00:00Z","timestamp":1583107200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100003725","name":"National Research Foundation of Korea","doi-asserted-by":"publisher","award":["2018R1A6A1A03025526"],"award-info":[{"award-number":["2018R1A6A1A03025526"]}],"id":[{"id":"10.13039\/501100003725","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Reinforcement learning has recently been studied in various fields and also used to optimally control IoT devices supporting the expansion of Internet connection beyond the usual standard devices. In this paper, we try to allow multiple reinforcement learning agents to learn optimal control policy on their own IoT devices of the same type but with slightly different dynamics. For such multiple IoT devices, there is no guarantee that an agent who interacts only with one IoT device and learns the optimal control policy will also control another IoT device well. Therefore, we may need to apply independent reinforcement learning to each IoT device individually, which requires a costly or time-consuming effort. To solve this problem, we propose a new federated reinforcement learning architecture where each agent working on its independent IoT device shares their learning experience (i.e., the gradient of loss function) with each other, and transfers a mature policy model parameters into other agents. They accelerate its learning process by using mature parameters. 
We incorporate the actor\u2013critic proximal policy optimization (Actor\u2013Critic PPO) algorithm into each agent in the proposed collaborative architecture and propose an efficient procedure for the gradient sharing and the model transfer. Using multiple rotary inverted pendulum devices interconnected via a network switch, we demonstrate that the proposed federated reinforcement learning scheme can effectively facilitate the learning process for multiple IoT devices and that the learning speed can be faster if more agents are involved.<\/jats:p>","DOI":"10.3390\/s20051359","type":"journal-article","created":{"date-parts":[[2020,3,3]],"date-time":"2020-03-03T03:13:28Z","timestamp":1583205208000},"page":"1359","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":63,"title":["Federated Reinforcement Learning for Training Control Policies on Multiple IoT Devices"],"prefix":"10.3390","volume":"20","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8807-1158","authenticated-orcid":false,"given":"Hyun-Kyo","family":"Lim","sequence":"first","affiliation":[{"name":"Department of Interdisciplinary Program in Creative Engineering, Korea University of Technology and Education, Cheonan 31253, Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ju-Bong","family":"Kim","sequence":"additional","affiliation":[{"name":"Department of Computer Science Engineering, Korea University of Technology and Education, Cheonan 31253, Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Joo-Seong","family":"Heo","sequence":"additional","affiliation":[{"name":"Department of Interdisciplinary Program in Creative Engineering, Korea University of Technology and Education, Cheonan 31253, 
Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5835-7972","authenticated-orcid":false,"given":"Youn-Hee","family":"Han","sequence":"additional","affiliation":[{"name":"Department of Computer Science Engineering, Korea University of Technology and Education, Cheonan 31253, Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2020,3,2]]},"reference":[{"key":"ref_1","unstructured":"Sutton, R.S., and Barto, A.G. (2018). Reinforcement Learning: An Introduction, The MIT Press."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"529","DOI":"10.1038\/nature14236","article-title":"Human-Level Control through Deep Reinforcement Learning","volume":"518","author":"Mnih","year":"2015","journal-title":"Nature"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"484","DOI":"10.1038\/nature16961","article-title":"Mastering the Game of Go with Deep Neural Networks and Tree Search","volume":"529","author":"Silver","year":"2016","journal-title":"Nature"},{"key":"ref_4","unstructured":"Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., Lanctot, M., Sifre, L., Kumaran, D., and Graepel, T. (2017). Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm. arXiv."},{"key":"ref_5","unstructured":"Vinyals, O., Babuschkin, I., Chung, J., Mathieu, M., Jaderberg, M., Czarnecki, W., Dudzik, A., Huang, A., Georgiev, P., and Powell, R. (2019, February 02). AlphaStar: Mastering the Real-Time Strategy Game StarCraft II. Available online: https:\/\/deepmind.com\/blog\/alphastar-mastering-real-time-strategy-game-starcraft-ii\/."},{"key":"ref_6","unstructured":"Wang, L., Liu, Y., and Zhai, X. (2015, January 28\u201330). Design of Reinforce Learning Control Algorithm and Verified in Inverted Pendulum. 
Proceedings of the 34th Chinese Control Conference (CCC), Hangzhou, China."},{"key":"ref_7","first-page":"1","article-title":"Reinforcement Learning-based Control of Nonlinear Systems using Lyapunov Stability Concept and Fuzzy Reward Scheme","volume":"99","author":"Chen","year":"2019","journal-title":"IEEE Trans. Circuits Syst. II Express Briefs"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Puriel-Gil, G., Yu, W., and Sossa, H. (2018, January 5\u20137). Reinforcement Learning Compensation based PD Control for Inverted Pendulum. Proceedings of the 15th International Conference on Electrical Engineering, Computing Science and Automatic Control (CCE), Mexico City, Mexico.","DOI":"10.1109\/ICEEE.2018.8533946"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Kersandt, K., Mu\u00f1oz, G., and Barrado, C. (2018, January 23\u201327). Self-training by Reinforcement Learning for Full-autonomous Drones of the Future. Proceedings of the IEEE\/AIAA 37th Digital Avionics Systems Conference (DASC), London, UK.","DOI":"10.1109\/DASC.2018.8569503"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"36682","DOI":"10.1109\/ACCESS.2019.2905621","article-title":"Imitation Reinforcement Learning-Based Remote Rotary Inverted Pendulum Control in OpenFlow Network","volume":"7","author":"Kim","year":"2019","journal-title":"IEEE Access"},{"key":"ref_11","unstructured":"Bonawitz, K., Eichner, H., and Grieskamp, W. (2019). Towards Federated Learning at Scale: System Design. arXiv."},{"key":"ref_12","unstructured":"Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and Klimov, O. (2017). Proximal Policy Optimization Algorithms. arXiv."},{"key":"ref_13","unstructured":"Konda, V. (2002). Actor-Critic Algorithms. [Ph.D. Thesis, Massachusetts Institute of Technology]."},{"key":"ref_14","unstructured":"Lowe, R., Wu, Y., Tamar, A., Harb, J., Abbeel, P., and Mordatch, I. (2017, January 17). Multi-agent Actor-Critic for Mixed Cooperative-Competitive Environments. 
Proceedings of the 31st International Conference on Neural Information Processing Systems, Red Hook, NY, USA."},{"key":"ref_15","unstructured":"Schulman, J., Levine, S., Abbeel, P., Jordan, M., and Moritz, P. (2015, January 7\u20139). Trust Region Policy Optimization. Proceedings of the 32nd International Conference on Machine Learning, Lille, France."},{"key":"ref_16","unstructured":"Kone\u010dn\u00fd, J., McMahan, H.B., and Ramage, D. (2015). Federated Optimization: Distributed Optimization Beyond the Datacenter. arXiv."},{"key":"ref_17","unstructured":"McMahan, H.B., Moore, E., Ramage, D., and y Arcas, B.A. (2016). Federated Learning of Deep Networks using Model Averaging. arXiv."},{"key":"ref_18","unstructured":"Kone\u010dn\u00fd, J., McMahan, H.B., Yu, F.X., Richt\u00e1rik, P., Suresh, A.T., and Bacon, D. (2016). Federated Learning: Strategies for Improving Communication Efficiency. arXiv."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Torrey, L., and Shavlik, J. (2009). Handbook of Research on Machine Learning Applications, IGI Global.","DOI":"10.4018\/978-1-60566-766-9.ch011"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Glatt, R., Silva, F., and Costa, A. (2016, January 9\u201312). Towards Knowledge Transfer in Deep Reinforcement Learning. Proceedings of the 2016 5th Brazilian Conference on Intelligent Systems (BRACIS), Recife, Brazil.","DOI":"10.1109\/BRACIS.2016.027"},{"key":"ref_21","unstructured":"Nair, A., Srinivasan, P., Blackwell, S., Alcicek, C., Fearon, R., Maria, A.D., Panneershelvam, V., Suleyman, M., Beattie, C., and Petersen, S. (2015). Massively Parallel Methods for Deep Reinforcement Learning. arXiv."},{"key":"ref_22","unstructured":"Zhuo, H.H., Feng, W., Xu, Q., Yang, Q., and Lin, Y. (2019). Federated Reinforcement Learning. arXiv."},{"key":"ref_23","unstructured":"Liang, X., Liu, Y., Chen, T., Liu, M., and Yang, Q. (2019). Federated Transfer Reinforcement Learning for Autonomous Driving. 
arXiv."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"4555","DOI":"10.1109\/LRA.2019.2931179","article-title":"Lifelong Federated Reinforcement Learning: A Learning Architecture for Navigation in Cloud Robotic Systems","volume":"4","author":"Liu","year":"2019","journal-title":"IEEE Rob. Autom. Lett."},{"key":"ref_25","unstructured":"Mnih, V., Badia, A.P., Mirza, M., Graves, A., Lillicrap, T.P., Harley, T., Silver, D., and Kavukcuoglu, K. (2016, January 19\u201324). Asynchronous Methods for Deep Reinforcement Learning. Proceedings of the International Conference on Machine Learning, New York City, NY, USA."},{"key":"ref_26","unstructured":"Quanser (2016, March 02). QUBE - Servo 2. Available online: https:\/\/www.quanser.com\/products\/qube-servo-2\/."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Strom, N. (2015, January 6\u201310). Scalable Distributed DNN Training using Commodity GPU Cloud Computing. Proceedings of the INTERSPEECH. ISCA, Dresden, Germany.","DOI":"10.21437\/Interspeech.2015-354"},{"key":"ref_28","unstructured":"Zheng, S., Meng, Q., Wang, T., Chen, W., Yu, N., Ma, Z.M., and Liu, T.Y. (2017, January 6\u201311). Asynchronous Stochastic Gradient Descent with Delay Compensation. Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia."},{"key":"ref_29","unstructured":"Srinivasan, A., Jain, A., and Barekatain, P. (2018, April 30\u2013May 3). An Analysis of the Delayed Gradients Problem in Asynchronous SGD. Proceedings of the ICLR 2018 Workshop, Vancouver, BC, Canada."},{"key":"ref_30","unstructured":"Schulman, J., Moritz, P., Levine, S., Jordan, M.I., and Abbeel, P. (2015). High-Dimensional Continuous Control Using Generalized Advantage Estimation. arXiv."},{"key":"ref_31","unstructured":"Kingma, D.P., and Ba, J. (2015, January 7\u20139). Adam: A Method for Stochastic Optimization. 
Proceedings of the 3rd International Conference on Learning Representations, San Diego, CA, USA."},{"key":"ref_32","first-page":"825","article-title":"Information Theory and Statistics. Solomon Kullback","volume":"54","author":"Lindley","year":"1959","journal-title":"J. Am. Stat. Assoc."},{"key":"ref_33","unstructured":"Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences, Lawrence Erlbaum Associates."},{"key":"ref_34","unstructured":"Lim, H.K., Kim, J.B., and Han, Y.H. (2019, January 23\u201324). Learning Performance Improvement Using Federated Reinforcement Learning Based on Distributed Multi-Agents. Proceedings of the KICS Fall Conference 2019, Seoul, Korea."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/20\/5\/1359\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T09:03:13Z","timestamp":1760173393000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/20\/5\/1359"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,3,2]]},"references-count":34,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2020,3]]}},"alternative-id":["s20051359"],"URL":"https:\/\/doi.org\/10.3390\/s20051359","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,3,2]]}}}