{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,8]],"date-time":"2025-11-08T23:00:59Z","timestamp":1762642859785,"version":"build-2065373602"},"reference-count":19,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2023,3,30]],"date-time":"2023-03-30T00:00:00Z","timestamp":1680134400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100004489","name":"Mitacs","doi-asserted-by":"publisher","award":["IT28020"],"award-info":[{"award-number":["IT28020"]}],"id":[{"id":"10.13039\/501100004489","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Systems"],"abstract":"<jats:p>This paper addresses the issue of choosing an appropriate reward function in multi-agent reinforcement learning. The traditional approach of using joint rewards for team performance is questioned due to a lack of theoretical backing. The authors explore the impact of changing the reward function from joint to individual on learning centralized decentralized execution algorithms in a Level-Based Foraging environment. Empirical results reveal that individual rewards contain more variance, but may have less bias compared to joint rewards. The findings show that different algorithms are affected differently, with value factorization methods and PPO-based methods taking advantage of the increased variance to achieve better performance. 
This study sheds light on the importance of considering the choice of a reward function and its impact on multi-agent reinforcement learning systems.<\/jats:p>","DOI":"10.3390\/systems11040180","type":"journal-article","created":{"date-parts":[[2023,3,30]],"date-time":"2023-03-30T04:45:35Z","timestamp":1680151535000},"page":"180","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["It\u2019s All about Reward: Contrasting Joint Rewards and Individual Reward in Centralized Learning Decentralized Execution Algorithms"],"prefix":"10.3390","volume":"11","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2572-529X","authenticated-orcid":false,"given":"Peter","family":"Atrazhev","sequence":"first","affiliation":[{"name":"Electrical and Computer Engineering, University of Alberta, Edmonton, AB T6G 1H9, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7780-5048","authenticated-orcid":false,"given":"Petr","family":"Musilek","sequence":"additional","affiliation":[{"name":"Electrical and Computer Engineering, University of Alberta, Edmonton, AB T6G 1H9, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2023,3,30]]},"reference":[{"key":"ref_1","unstructured":"Samvelyan, M., Rashid, T., de Witt, C.S., Farquhar, G., Nardelli, N., Rudner, T.G.J., Hung, C.M., Torr, P.H.S., Foerster, J., and Whiteson, S. (2019, January 13\u201317). The StarCraft Multi-Agent Challenge. Proceedings of the 18th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2019), Montr\u00e9al, QC, Canada."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"253","DOI":"10.1613\/jair.3912","article-title":"The arcade learning environment: An evaluation platform for general agents","volume":"47","author":"Bellemare","year":"2013","journal-title":"J. Artif. Intell. 
Res."},{"key":"ref_3","unstructured":"Ellis, B., Moalla, S., Samvelyan, M., Sun, M., Mahajan, A., Foerster, J.N., and Whiteson, S. (2022). SMACv2: An Improved Benchmark for Cooperative Multi-Agent Reinforcement Learning. arXiv."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Oliehoek, F.A., and Amato, C. (2016). A Concise Introduction to Decentralized POMDPs, Springer International Publishing.","DOI":"10.1007\/978-3-319-28929-8"},{"key":"ref_5","unstructured":"Papoudakis, G., Christianos, F., Sch\u00e4fer, L., and Albrecht, S.V. (2020, January 17). Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks. Proceedings of the RLEM\u201920: Proceedings of the 1st International Workshop on Reinforcement Learning for Energy Management in Buildings & Cities, Virtual."},{"key":"ref_6","first-page":"1928","article-title":"Asynchronous Methods for Deep Reinforcement Learning","volume":"Volume 48","author":"Balcan","year":"2016","journal-title":"Proceedings of the 33rd International Conference on Machine Learning"},{"key":"ref_7","first-page":"24611","article-title":"The Surprising Effectiveness of MAPPO in Cooperative, Multi-Agent Games","volume":"35","author":"Yu","year":"2022","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_8","unstructured":"Lowe, R., Wu, Y., Tamar, A., Harb, J., Abbeel, P., and Mordatch, I. (2017, January 4\u20139). Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments. Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA."},{"key":"ref_9","unstructured":"Rashid, T., Samvelyan, M., de Witt, C.S., Farquhar, G., Foerster, J.N., and Whiteson, S. (2018, January 10\u201315). QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning. 
Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden."},{"key":"ref_10","unstructured":"Sunehag, P., Lever, G., Gruslys, A., Czarnecki, W.M., Zambaldi, V., Jaderberg, M., Lanctot, M., Sonnerat, N., Leibo, J.Z., and Tuyls, K. (2017). Value-decomposition networks for cooperative multi-agent learning. arXiv."},{"key":"ref_11","unstructured":"Lyu, X., Xiao, Y., Daley, B., and Amato, C. (2021, January 3\u20137). Contrasting Centralized and Decentralized Critics in Multi-Agent Reinforcement Learning. Proceedings of the AAMAS \u201921: Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems, Virtual."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Tumer, K., and Wolpert, D. (2004). A survey of collectives. Collectives and the Design of Complex Systems, Springer.","DOI":"10.1007\/978-1-4419-8909-3"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Colby, M., Duchow-Pressley, T., Chung, J.J., and Tumer, K. (2016, January 9\u201313). Local approximation of difference evaluation functions. Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems, Singapore.","DOI":"10.2514\/1.I010379"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"320","DOI":"10.1007\/s10458-008-9046-9","article-title":"Analyzing and visualizing multiagent rewards in dynamic and stochastic domains","volume":"17","author":"Agogino","year":"2008","journal-title":"Auton. Agents-Multi-Agent Syst."},{"key":"ref_15","unstructured":"Proper, S., and Tumer, K. (2012, January 4\u20138). Modeling difference rewards for multiagent learning. 
Proceedings of the AAMAS, Valencia, Spain."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"529","DOI":"10.1038\/nature14236","article-title":"Human-level control through deep reinforcement learning","volume":"518","author":"Mnih","year":"2015","journal-title":"Nature"},{"key":"ref_17","unstructured":"Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and Klimov, O. (2017). Proximal Policy Optimization Algorithms. arXiv."},{"key":"ref_18","unstructured":"Christianos, F., Sch\u00e4fer, L., and Albrecht, S.V. (2020, January 6\u201312). Shared Experience Actor-Critic for Multi-Agent Reinforcement Learning. Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Virtual."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Atrazhev, P., and Musilek, P. (2022, January 13\u201315). Investigating Effects of Centralized Learning Decentralized Execution on Team Coordination in the Level Based Foraging Environment as a Sequential Social Dilemma. Proceedings of the International Conference on Practical Applications of Agents and Multi-Agent Systems, L\u2019Aquila, Italy.","DOI":"10.1007\/978-3-031-18192-4_2"}],"container-title":["Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2079-8954\/11\/4\/180\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T19:06:56Z","timestamp":1760123216000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2079-8954\/11\/4\/180"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,3,30]]},"references-count":19,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2023,4]]}},"alternative-id":["systems11040180"],"URL":"https:\/\/doi.org\/10.3390\/systems11040180","relation":{},"ISSN":["2079-8954"],"issn-type":[{"type":"electronic","value":"2079-8954"}],"subject":[],"published":{"date-parts":[[2023,3,30]]}}}