{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,28]],"date-time":"2025-09-28T00:04:09Z","timestamp":1759017849012,"version":"3.44.0"},"reference-count":40,"publisher":"Springer Science and Business Media LLC","issue":"5","license":[{"start":{"date-parts":[[2025,9,27]],"date-time":"2025-09-27T00:00:00Z","timestamp":1758931200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,9,27]],"date-time":"2025-09-27T00:00:00Z","timestamp":1758931200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Mach. Intell. Res."],"published-print":{"date-parts":[[2025,10]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>Federated reinforcement learning (FedRL) is an emerging paradigm in data-driven control where a group of decision-making agents cooperate to learn optimal control laws through a distributed reinforcement learning procedure, under the constraint of not sharing any process\/control data. In the typical FedRL setting, a centralized entity is responsible for orchestrating the distributed training process. To remove this design limitation, this work proposes a solution that enables a fully decentralized approach by leveraging results from consensus theory. The proposed algorithm, named FedRLCon, can then deal with: 1) scenarios with homogeneous agents, which can share their actor and, possibly, their critic networks; 2) scenarios with heterogeneous agents, in which agents may share their critic network only. The proposed algorithms are validated on two scenarios, consisting of a resource management problem in a communication network and a smart grid case study. 
Our tests show that practically no performance is lost due to the decentralization.<\/jats:p>","DOI":"10.1007\/s11633-025-1550-8","type":"journal-article","created":{"date-parts":[[2025,9,27]],"date-time":"2025-09-27T06:48:04Z","timestamp":1758955684000},"page":"929-940","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Enhancing Federated Reinforcement Learning: A Consensus-based Approach for Both Homogeneous and Heterogeneous Agents"],"prefix":"10.1007","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-5503-8506","authenticated-orcid":false,"given":"Alessandro","family":"Giuseppi","sequence":"first","affiliation":[]},{"given":"Danilo","family":"Menegatti","sequence":"additional","affiliation":[]},{"given":"Antonio","family":"Pietrabissa","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,9,27]]},"reference":[{"key":"1550_CR1","volume-title":"Reinforcement Learning: An Introduction","author":"R S Sutton","year":"1998","unstructured":"R. S. Sutton, A. G. Barto. Reinforcement Learning: An Introduction, Cambridge, USA: MIT Press, 1998."},{"key":"1550_CR2","first-page":"1008","volume-title":"Proceedings of the 13th International Conference on Neural Information Processing Systems","author":"V Konda","year":"1999","unstructured":"V. Konda, J. Tsitsiklis. Actor-critic algorithms. In Proceedings of the 13th International Conference on Neural Information Processing Systems, Denver, USA, pp. 1008\u20131014, 1999."},{"key":"1550_CR3","first-page":"1273","volume-title":"Proceedings of the 20th International Conference on Artificial Intelligence and Statistics","author":"H B McMahan","year":"2017","unstructured":"H. B. McMahan, E. Moore, D. Ramage, S. Hampson, B. A. Y. Arcas. Communication-efficient learning of deep networks from decentralized data. 
In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, USA, pp. 1273\u20131282, 2017."},{"key":"1550_CR4","volume-title":"Massively parallel methods for deep reinforcement learning","author":"A Nair","year":"2015","unstructured":"A. Nair, P. Srinivasan, S. Blackwell, C. Alcicek, R. Fearon, A. De Maria, V. Panneershelvam, M. Suleyman, C. Beattie, S. Petersen, S. Legg, V. Mnih, K. Kavukcuoglu, D. Silver. Massively parallel methods for deep reinforcement learning, [Online], Available: https:\/\/arxiv.org\/abs\/1507.04296, 2015."},{"key":"1550_CR5","volume-title":"Proceedings of the 6th International Conference on Learning Representations","author":"D Horgan","year":"2018","unstructured":"D. Horgan, J. Quan, D. Budden, G. Barth-Maron, M. Hessel, H. van Hasselt, D. Silver. Distributed prioritized experience replay. In Proceedings of the 6th International Conference on Learning Representations, Vancouver, Canada, 2018."},{"issue":"1","key":"1550_CR6","doi-asserted-by":"publisher","first-page":"18","DOI":"10.20517\/ir.2021.02","volume":"1","author":"J Qi","year":"2021","unstructured":"J. Qi, Q. Zhou, L. Lei, K. Zheng. Federated reinforcement learning: Techniques, applications, and open challenges. Intelligence & Robotics, vol. 1, no. 1, pp. 18\u201357, 2021. DOI: https:\/\/doi.org\/10.20517\/ir.2021.02.","journal-title":"Intelligence & Robotics"},{"key":"1550_CR7","volume-title":"Proceedings of the 34th International Conference on Neural Information Processing Systems","author":"J Geiping","year":"2020","unstructured":"J. Geiping, H. Bauermeister, H. Dr\u00f6ge, M. Moeller. Inverting gradients - how easy is it to break privacy in federated learning? 
In Proceedings of the 34th International Conference on Neural Information Processing Systems, Vancouver, Canada, Article number 1421, 2020."},{"issue":"4","key":"1550_CR8","doi-asserted-by":"publisher","first-page":"319","DOI":"10.1007\/s11633-022-1338-z","volume":"19","author":"A Giuseppi","year":"2022","unstructured":"A. Giuseppi, S. Manfredi, A. Pietrabissa. A weighted average consensus approach for decentralized federated learning. Machine Intelligence Research, vol. 19, no. 4, pp. 319\u2013330, 2022. DOI: https:\/\/doi.org\/10.1007\/s11633-022-1338-z.","journal-title":"Machine Intelligence Research"},{"key":"1550_CR9","volume-title":"Federated deep reinforcement learning","author":"H H Zhuo","year":"2020","unstructured":"H. H. Zhuo, W. Feng, Y. Lin, Q. Xu, Q. Yang. Federated deep reinforcement learning, [Online], Available: https:\/\/arxiv.org\/abs\/1901.08277, 2020."},{"key":"1550_CR10","doi-asserted-by":"publisher","first-page":"78","DOI":"10.23919\/acc55779.2023.10156236","volume-title":"Proceedings of American Control Conference","author":"Z Yuan","year":"2023","unstructured":"Z. Yuan, S. Xu, M. Zhu. Federated reinforcement learning for generalizable motion planning. In Proceedings of American Control Conference, San Diego, USA, pp. 78\u201383, 2023. DOI: https:\/\/doi.org\/10.23919\/acc55779.2023.10156236."},{"issue":"11","key":"1550_CR11","doi-asserted-by":"publisher","first-page":"12321","DOI":"10.1109\/TVT.2022.3190557","volume":"71","author":"R Luo","year":"2022","unstructured":"R. Luo, W. Ni, H. Tian, J. Cheng. Federated deep reinforcement learning for RIS-assisted indoor multi-robot communication systems. IEEE Transactions on Vehicular Technology, vol. 71, no. 11, pp. 12321\u201312326, 2022. 
DOI: https:\/\/doi.org\/10.1109\/tvt.2022.3190557.","journal-title":"IEEE Transactions on Vehicular Technology"},{"issue":"22","key":"1550_CR12","doi-asserted-by":"publisher","first-page":"22095","DOI":"10.1109\/JIOT.2021.3081626","volume":"9","author":"M Xu","year":"2022","unstructured":"M. Xu, J. Peng, B. B. Gupta, J. Kang, Z. Xiong, Z. Li, A. A. Abd El-Latif. Multiagent federated reinforcement learning for secure incentive mechanism in intelligent cyber\u2013physical systems. IEEE Internet of Things Journal, vol. 9, no. 22, pp. 22095\u201322108, 2022. DOI: https:\/\/doi.org\/10.1109\/jiot.2021.3081626.","journal-title":"IEEE Internet of Things Journal"},{"issue":"1","key":"1550_CR13","doi-asserted-by":"publisher","first-page":"142","DOI":"10.1109\/JSTSP.2022.3224591","volume":"17","author":"M Xu","year":"2023","unstructured":"M. Xu, D. Niyato, Z. Yang, Z. Xiong, J. Kang, D. I. Kim, X. Shen. Privacy-preserving intelligent resource allocation for federated edge learning in quantum internet. IEEE Journal of Selected Topics in Signal Processing, vol. 17, no. 1, pp. 142\u2013157, 2023. DOI: https:\/\/doi.org\/10.1109\/jstsp.2022.3224591.","journal-title":"IEEE Journal of Selected Topics in Signal Processing"},{"issue":"8","key":"1550_CR14","doi-asserted-by":"publisher","first-page":"5572","DOI":"10.1109\/TII.2020.3032165","volume":"17","author":"S Messaoud","year":"2021","unstructured":"S. Messaoud, A. Bradai, O. B. Ahmed, P. T. A. Quang, M. Atri, M. S. Hossain. Deep federated Q-learning-based network slicing for industrial IoT. IEEE Transactions on Industrial Informatics, vol. 17, no. 8, pp. 5572\u20135582, 2021. DOI: https:\/\/doi.org\/10.1109\/tii.2020.3032165.","journal-title":"IEEE Transactions on Industrial Informatics"},{"issue":"5","key":"1550_CR15","doi-asserted-by":"publisher","first-page":"144","DOI":"10.1109\/MNET.001.2200194","volume":"36","author":"X Li","year":"2022","unstructured":"X. Li, C. Sun, J. Wen, X. Wang, M. Guizani, V. C. M. Leung. 
Multi-user QoE enhancement: Federated multiagent reinforcement learning for cooperative edge intelligence. IEEE Network, vol. 36, no. 5, pp. 144\u2013151, 2022. DOI: https:\/\/doi.org\/10.1109\/mnet.001.2200194.","journal-title":"IEEE Network"},{"key":"1550_CR16","volume-title":"Playing Atari with deep reinforcement learning","author":"V Mnih","year":"2013","unstructured":"V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, M. Riedmiller. Playing Atari with deep reinforcement learning, [Online], Available: https:\/\/arxiv.org\/abs\/1312.5602, 2013."},{"key":"1550_CR17","doi-asserted-by":"publisher","first-page":"76296","DOI":"10.1109\/ACCESS.2021.3083087","volume":"9","author":"H K Lim","year":"2021","unstructured":"H. K. Lim, J. B. Kim, I. Ullah, J. S. Heo, Y. H. Han. Federated reinforcement learning acceleration method for precise control of multiple devices. IEEE Access, vol. 9, pp. 76296\u201376306, 2021. DOI: https:\/\/doi.org\/10.1109\/access.2021.3083087.","journal-title":"IEEE Access"},{"key":"1550_CR18","volume-title":"Proximal policy optimization algorithms","author":"J Schulman","year":"2017","unstructured":"J. Schulman, F. Wolski, P. Dhariwal, A. Radford, O. Klimov. Proximal policy optimization algorithms, [Online], Available: https:\/\/arxiv.org\/abs\/1707.06347, 2017."},{"key":"1550_CR19","first-page":"2810","volume-title":"Proceedings of International Conference on Autonomous Agents and Multiagent Systems","author":"F X Fan","year":"2023","unstructured":"F. X. Fan, Y. Ma, Z. Dai, C. Tan, B. K. H. Low. FedHQL: Federated heterogeneous Q-learning. In Proceedings of International Conference on Autonomous Agents and Multiagent Systems, London, UK, pp. 2810\u20132812, 2023."},{"issue":"3","key":"1550_CR20","doi-asserted-by":"publisher","first-page":"279","DOI":"10.1007\/BF00992698","volume":"8","author":"C J C H Watkins","year":"1992","unstructured":"C. J. C. H. Watkins, P. Dayan. Q-learning. Machine Learning, vol. 8, no. 3, pp. 
279\u2013292, 1992. DOI: https:\/\/doi.org\/10.1007\/BF00992698.","journal-title":"Machine Learning"},{"key":"1550_CR21","doi-asserted-by":"publisher","DOI":"10.1109\/ciss56502.2023.10089771","volume-title":"Proceedings of the 57th Annual Conference on Information Sciences and Systems","author":"Y Zhu","year":"2023","unstructured":"Y. Zhu, X. Gong. Distributed policy gradient with heterogeneous computations for federated reinforcement learning. In Proceedings of the 57th Annual Conference on Information Sciences and Systems, Baltimore, USA, 2023. DOI: https:\/\/doi.org\/10.1109\/ciss56502.2023.10089771."},{"key":"1550_CR22","first-page":"18","volume-title":"Proceedings of the 25th International Conference on Artificial Intelligence and Statistics","author":"H Jin","year":"2022","unstructured":"H. Jin, Y. Peng, W. Yang, S. Wang, Z. Zhang. Federated reinforcement learning with environment heterogeneity. In Proceedings of the 25th International Conference on Artificial Intelligence and Statistics, Cambridge, USA, pp. 18\u201337, 2022."},{"issue":"1","key":"1550_CR23","doi-asserted-by":"publisher","first-page":"398","DOI":"10.1109\/TNET.2020.3035770","volume":"29","author":"C T Dinh","year":"2021","unstructured":"C. T. Dinh, N. H. Tran, M. N. H. Nguyen, C. S. Hong, W. Bao, A. Y. Zomaya, V. Gramoli. Federated learning over wireless networks: Convergence analysis and resource allocation. IEEE\/ACM Transactions on Networking, vol. 29, no. 1, pp. 398\u2013409, 2021. DOI: https:\/\/doi.org\/10.1109\/tnet.2020.3035770.","journal-title":"IEEE\/ACM Transactions on Networking"},{"issue":"9","key":"1550_CR24","doi-asserted-by":"publisher","first-page":"1520","DOI":"10.1109\/TAC.2004.834113","volume":"49","author":"R Olfati-Saber","year":"2004","unstructured":"R. Olfati-Saber, R. M. Murray. Consensus problems in networks of agents with switching topology and time-delays. IEEE Transactions on Automatic Control, vol. 49, no. 9, pp. 1520\u20131533, 2004. 
DOI: https:\/\/doi.org\/10.1109\/TAC.2004.834113.","journal-title":"IEEE Transactions on Automatic Control"},{"issue":"1","key":"1550_CR25","first-page":"13","volume":"4","author":"F Pedroche S\u00e1nchez","year":"2014","unstructured":"F. Pedroche S\u00e1nchez, M. Rebollo Pedruelo, C. Carrascosa Casamayor, A. Palomares Chust. Convergence of weighted-average consensus for undirected graphs. International Journal of Complex Systems in Science, vol. 4, no. 1, pp. 13\u201316, 2014.","journal-title":"International Journal of Complex Systems in Science"},{"key":"1550_CR26","doi-asserted-by":"publisher","first-page":"2094","DOI":"10.1609\/aaai.v30i1.10295","volume-title":"Proceedings of the 30th AAAI Conference on Artificial Intelligence","author":"H van Hasselt","year":"2016","unstructured":"H. van Hasselt, A. Guez, D. Silver. Deep reinforcement learning with double Q-learning. In Proceedings of the 30th AAAI Conference on Artificial Intelligence, Phoenix, USA, pp. 2094\u20132100, 2016. DOI: https:\/\/doi.org\/10.1609\/aaai.v30i1.10295."},{"issue":"3","key":"1550_CR27","doi-asserted-by":"publisher","first-page":"229","DOI":"10.1023\/A:1022672621406","volume":"8","author":"R J Williams","year":"1992","unstructured":"R. J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, vol. 8, no. 3, pp. 229\u2013256, 1992. DOI: https:\/\/doi.org\/10.1007\/BF00992696.","journal-title":"Machine Learning"},{"key":"1550_CR28","first-page":"1928","volume-title":"Proceedings of the 33rd International Conference on Machine Learning","author":"V Mnih","year":"2016","unstructured":"V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. P. Lillicrap, T. Harley, D. Silver, K. Kavukcuoglu. Asynchronous methods for deep reinforcement learning. In Proceedings of the 33rd International Conference on Machine Learning, New York City, USA, pp. 
1928\u20131937, 2016."},{"key":"1550_CR29","volume-title":"Proceedings of the 4th International Conference on Learning Representations","author":"T P Lillicrap","year":"2016","unstructured":"T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, D. Wierstra. Continuous control with deep reinforcement learning. In Proceedings of the 4th International Conference on Learning Representations, San Juan, USA, 2016."},{"key":"1550_CR30","first-page":"1587","volume-title":"Proceedings of the 35th International Conference on Machine Learning","author":"S Fujimoto","year":"2018","unstructured":"S. Fujimoto, H. van Hoof, D. Meger. Addressing function approximation error in actor-critic methods. In Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, pp. 1587\u20131596, 2018."},{"key":"1550_CR31","first-page":"5039","volume-title":"Proceedings of the 39th International Conference on Machine Learning","author":"S Di-Castro","year":"2022","unstructured":"S. Di-Castro, S. Mannor, D. Di Castro. Analysis of stochastic processes through replay buffers. In Proceedings of the 39th International Conference on Machine Learning, Baltimore, USA, pp. 5039\u20135060, 2022."},{"key":"1550_CR32","first-page":"12621","volume-title":"Proceedings of the 38th International Conference on Machine Learning","author":"S Zhang","year":"2021","unstructured":"S. Zhang, H. Yao, S. Whiteson. Breaking the deadly triad with a target network. In Proceedings of the 38th International Conference on Machine Learning, pp. 12621\u201312631, 2021."},{"issue":"4","key":"1550_CR33","doi-asserted-by":"publisher","first-page":"340","DOI":"10.2307\/2266585","volume":"18","author":"A Mostowski","year":"1953","unstructured":"A. Mostowski. A. A. Markov. The theory of algorithms. The Journal of Symbolic Logic, vol. 18, no. 4, pp. 340\u2013341, 1953. 
DOI: https:\/\/doi.org\/10.2307\/2266585.","journal-title":"The Journal of Symbolic Logic"},{"issue":"2","key":"1550_CR34","doi-asserted-by":"publisher","first-page":"156","DOI":"10.1109\/TSMCC.2007.913919","volume":"38","author":"L Busoniu","year":"2008","unstructured":"L. Busoniu, R. Babuska, B. De Schutter. A comprehensive survey of multiagent reinforcement learning. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 38, no. 2, pp. 156\u2013172, 2008. DOI: https:\/\/doi.org\/10.1109\/TSMCC.2007.913919.","journal-title":"IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews)"},{"issue":"5","key":"1550_CR35","doi-asserted-by":"publisher","first-page":"823","DOI":"10.1103\/PhysRev.36.823","volume":"36","author":"G E Uhlenbeck","year":"1930","unstructured":"G. E. Uhlenbeck, L. S. Ornstein. On the theory of the Brownian motion. Physical Review, vol. 36, no. 5, pp. 823\u2013841, 1930. DOI: https:\/\/doi.org\/10.1103\/physrev.36.823.","journal-title":"Physical Review"},{"key":"1550_CR36","doi-asserted-by":"publisher","unstructured":"S. Bakri, B. Brik, A. Ksentini. On using reinforcement learning for network slice admission control in 5G: Offline vs. online. International Journal of Communication Systems, vol. 34, no. 7, Article number e4757, 2021. DOI: https:\/\/doi.org\/10.1002\/dac.4757.","DOI":"10.1002\/dac.4757"},{"issue":"4","key":"1550_CR37","doi-asserted-by":"publisher","first-page":"3163","DOI":"10.1109\/TSG.2021.3061619","volume":"12","author":"R Dai","year":"2021","unstructured":"R. Dai, R. Esmaeilbeigi, H. Charkhgard. The utilization of shared energy storage in energy systems: A comprehensive review. IEEE Transactions on Smart Grid, vol. 12, no. 4, pp. 3163\u20133174, 2021. 
DOI: https:\/\/doi.org\/10.1109\/tsg.2021.3061619.","journal-title":"IEEE Transactions on Smart Grid"},{"key":"1550_CR38","doi-asserted-by":"publisher","first-page":"3447","DOI":"10.1109\/LCSYS.2023.3329072","volume":"7","author":"A Joshi","year":"2023","unstructured":"A. Joshi, M. Tipaldi, L. Glielmo. A consensus Q-learning approach for decentralized control of shared energy storage. IEEE Control Systems Letters, vol. 7, pp. 3447\u20133452, 2023. DOI: https:\/\/doi.org\/10.1109\/lcsys.2023.3329072.","journal-title":"IEEE Control Systems Letters"},{"issue":"8","key":"1550_CR39","doi-asserted-by":"publisher","first-page":"787","DOI":"10.1080\/14786451.2015.1100196","volume":"36","author":"E L Ratnam","year":"2017","unstructured":"E. L. Ratnam, S. R. Weller, C. M. Kellett, A. T. Murray. Residential load and rooftop PV generation: An Australian distribution network dataset. International Journal of Sustainable Energy, vol. 36, no. 8, pp. 787\u2013806, 2017. DOI: https:\/\/doi.org\/10.1080\/14786451.2015.1100196.","journal-title":"International Journal of Sustainable Energy"},{"key":"1550_CR40","volume-title":"Using solar and load predictions in battery scheduling at the residential level","author":"R Bean","year":"2018","unstructured":"R. Bean, H. Khan. 
Using solar and load predictions in battery scheduling at the residential level, [Online], Available: https:\/\/arxiv.org\/abs\/1810.11178, 2018."}],"container-title":["Machine Intelligence Research"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11633-025-1550-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11633-025-1550-8\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11633-025-1550-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,27]],"date-time":"2025-09-27T06:48:08Z","timestamp":1758955688000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11633-025-1550-8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,9,27]]},"references-count":40,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2025,10]]}},"alternative-id":["1550"],"URL":"https:\/\/doi.org\/10.1007\/s11633-025-1550-8","relation":{},"ISSN":["2731-538X","2731-5398"],"issn-type":[{"value":"2731-538X","type":"print"},{"value":"2731-5398","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,9,27]]},"assertion":[{"value":"30 May 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"25 February 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"27 September 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"The authors declared that they have no conflicts of interest to this 
work.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations of conflict of interest"}}]}}