{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,5]],"date-time":"2026-01-05T18:28:01Z","timestamp":1767637681584,"version":"3.48.0"},"reference-count":20,"publisher":"Maximum Academic Press","issue":"1","license":[{"start":{"date-parts":[[2016,2,11]],"date-time":"2016-02-11T00:00:00Z","timestamp":1455148800000},"content-version":"unspecified","delay-in-days":41,"URL":"https:\/\/www.cambridge.org\/core\/terms"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["The Knowledge Engineering Review"],"published-print":{"date-parts":[[2016,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Potential-based reward shaping is a commonly used approach in reinforcement learning to direct exploration based on prior knowledge. Both in single and multi-agent settings this technique speeds up learning without losing any theoretical convergence guarantees. However, if speed ups through reward shaping are to be achieved in multi-agent environments, a different shaping signal should be used for each context in which agents have a different subgoal or when agents are involved in a different interaction situation.<\/jats:p>\n                  <jats:p>\n                    This paper describes the use of context-aware potential functions in a multi-agent system in which the interactions between agents are sparse. This means that, unknown to the agents\n                    <jats:italic>a priori<\/jats:italic>\n                    , the interactions between the agents only occur sporadically in certain regions of the state space. During these interactions, agents need to coordinate in order to reach the global optimal solution.\n                  <\/jats:p>\n                  <jats:p>We demonstrate how different reward shaping functions can be used on top of Future Coordinating Q-learning (FCQ-learning); an algorithm capable of automatically detecting when agents should take each other into consideration. Using FCQ-learning, coordination problems can even be anticipated before the actual problems occur, allowing the problems to be solved timely. We evaluate our approach on a range of gridworld problems, as well as a simulation of air traffic control.<\/jats:p>","DOI":"10.1017\/s0269888915000193","type":"journal-article","created":{"date-parts":[[2016,2,11]],"date-time":"2016-02-11T20:33:12Z","timestamp":1455222792000},"page":"59-76","source":"Crossref","is-referenced-by-count":6,"title":["Context-sensitive reward shaping for sparse interaction multi-agent systems"],"prefix":"10.48130","volume":"31","author":[{"given":"Yann-Micha\u00ebl","family":"de Hauwere","sequence":"first","affiliation":[]},{"given":"Sam","family":"Devlin","sequence":"additional","affiliation":[]},{"given":"Daniel","family":"Kudenko","sequence":"additional","affiliation":[]},{"given":"Ann","family":"Now\u00e9","sequence":"additional","affiliation":[]}],"member":"27968","published-online":{"date-parts":[[2016,2,11]]},"reference":[{"key":"S0269888915000193_ref19","doi-asserted-by":"publisher","DOI":"10.1109\/TSMCB.2008.920998"},{"key":"S0269888915000193_ref18","doi-asserted-by":"publisher","DOI":"10.1142\/S0219525909002301"},{"key":"S0269888915000193_ref15","unstructured":"Ng A. Y. , Harada D. & Russell S. 1999. Policy invariance under reward transformations: theory and application to reward shaping. In Proceedings of the 16th International Conference on Machine Learning, 278\u2013287. 
Morgan Kaufmann."},{"key":"S0269888915000193_ref14","unstructured":"Melo F. & Veloso M. 2010. Local Multiagent Coordination in Decentralised MDPs with Sparse Interactions. Technical report CMU-CS-10-133, School of Computer Science, Carnegie Mellon University."},{"key":"S0269888915000193_ref13","unstructured":"Melo F. & Veloso M. 2009. Learning of coordination: exploiting sparse interactions in multiagent systems. In Proceedings of the 8th International Conference on Autonomous Agents and Multi-Agent Systems, 773\u2013780."},{"key":"S0269888915000193_ref10","doi-asserted-by":"crossref","unstructured":"Grzes M. & Kudenko D. 2008. Plan-based reward shaping for reinforcement learning. In 4th International IEEE Conference on Intelligent Systems, 2008. IS\u201908, 2, 10\u201322\u201310\u201329.","DOI":"10.1109\/IS.2008.4670492"},{"key":"S0269888915000193_ref7","unstructured":"Devlin S. & Kudenko D. 2011. Theoretical considerations of potential-based reward shaping for multiagent systems. In The 10th International Conference on Autonomous Agents and Multiagent Systems\u2014Volume 1, 225\u2013232."},{"key":"S0269888915000193_ref6","doi-asserted-by":"crossref","unstructured":"De Hauwere Y.-M. , Vrancx P. & Now\u00e9 A. 2011c. Solving sparse delayed coordination problems in multi-agent reinforcement learning. In Adaptive Agents and Multi-Agent Systems V, Lecture Notes in Artificial Intelligence Volume 7113, 45\u201352. Springer-Verlag.","DOI":"10.1007\/978-3-642-28499-1_8"},{"key":"S0269888915000193_ref5","unstructured":"De Hauwere Y.-M. , Vrancx P. & Now\u00e9 A. 2011b. Solving delayed coordination problems in MAS (extended abstract). In Proceedings of the 10th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 1115\u20131116."},{"key":"S0269888915000193_ref4","unstructured":"De Hauwere Y.-M. , Vrancx P. & Now\u00e9 A. 2011a. Adaptive state representations for multi-agent reinforcement learning. In Proceedings of the 3th International Conference on Agents and Artificial Intelligence, 181\u2013189."},{"key":"S0269888915000193_ref3","unstructured":"De Hauwere Y.-M. , Vrancx P. & Now\u00e9 A. 2010. Learning multi-agent state space representations. In The 9th International Conference on Autonomous Agents and Multiagent Systems, 715\u2013722."},{"key":"S0269888915000193_ref2","unstructured":"Claus C. & Boutilier C. 1998. The dynamics of reinforcement learning in cooperative multiagent systems. In Proceedings of the 15th National Conference on Artificial Intelligence, 746\u2013752. AAAI Press."},{"key":"S0269888915000193_ref1","unstructured":"Boutilier C. 1996. Planning, learning and coordination in multiagent decision processes. In Proceedings of the 6th Conference on Theoretical Aspects of Rationality and Knowledge, 195\u2013210."},{"key":"S0269888915000193_ref11","first-page":"1039","article-title":"Nash Q-learning for general-sum stochastic games","volume":"4","author":"Hu","year":"2003","journal-title":"Journal of Machine Learning Research"},{"key":"S0269888915000193_ref8","doi-asserted-by":"crossref","unstructured":"Devlin S. & Kudenko D. (In Press), Plan-based reward shaping for multi-agent reinforcement learning. Knowledge Engineering Review.","DOI":"10.1017\/S0269888915000181"},{"key":"S0269888915000193_ref17","doi-asserted-by":"publisher","DOI":"10.1007\/BF00993306"},{"key":"S0269888915000193_ref12","unstructured":"Kok J. , \u2019t Hoen P. , Bakker B. & Vlassis N. 2005. Utile coordination: learning interdependencies among cooperative agents. 
In Proceedings of the IEEE Symposium on Computational Intelligence and Games (CIG\u201905), 29\u201336."},{"key":"S0269888915000193_ref16","unstructured":"Randl\u00f8v J. & Alstr\u00f8m P. 1998. Learning to drive a bicycle using reinforcement learning and shaping. In Proceedings of the 15th International Conference on Machine Learning, ICML\u201998, 463\u2013471. Morgan Kaufmann."},{"key":"S0269888915000193_ref9","unstructured":"Greenwald A. & Hall K. 2003. Correlated-Q learning. In AAAI Spring Symposium, 242\u2013249. AAAI Press."},{"key":"S0269888915000193_ref20","unstructured":"Watkins C. 1989. Learning from Delayed Rewards. PhD thesis, University of Cambridge."}],"container-title":["The Knowledge Engineering Review"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.cambridge.org\/core\/services\/aop-cambridge-core\/content\/view\/S0269888915000193","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,1,5]],"date-time":"2026-01-05T14:42:06Z","timestamp":1767624126000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.cambridge.org\/core\/product\/identifier\/S0269888915000193\/type\/journal_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,1]]},"references-count":20,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2016,1]]}},"alternative-id":["S0269888915000193"],"URL":"https:\/\/doi.org\/10.1017\/s0269888915000193","relation":{},"ISSN":["0269-8889","1469-8005"],"issn-type":[{"type":"print","value":"0269-8889"},{"type":"electronic","value":"1469-8005"}],"subject":[],"published":{"date-parts":[[2016,1]]}}}
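For context, the record above has the shape returned by the public Crossref REST API's /works/{doi} route. A minimal sketch of retrieving and reading a few of its fields (the URL and field names come from the record itself; polite-pool headers and error handling are omitted):

```python
import json
import urllib.request

# Fetch the Crossref metadata record shown above via the public REST API.
url = "https://api.crossref.org/works/10.1017/s0269888915000193"
with urllib.request.urlopen(url) as resp:
    record = json.load(resp)

work = record["message"]
print(work["title"][0])                      # article title
print(work["container-title"][0])            # journal name
print(", ".join(f'{a["given"]} {a["family"]}' for a in work["author"]))
print(work["is-referenced-by-count"], "citations at indexing time")
```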
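The abstract builds on potential-based reward shaping (Ng, Harada & Russell 1999, ref15 above), where a shaping term F(s, s') = gamma * Phi(s') - Phi(s) is added to the environment reward without changing the optimal policy. Below is a minimal single-agent sketch of that idea on tabular Q-learning; the 5x5 gridworld, the reward values, and the Manhattan-distance potential are illustrative assumptions, and this is not the paper's context-aware, multi-agent FCQ-learning method.

```python
import random

# Minimal sketch of potential-based reward shaping on tabular Q-learning.
# Gridworld layout, rewards, and potential are assumptions for illustration.

GRID, GOAL = 5, (4, 4)
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]   # right, left, down, up
GAMMA, ALPHA, EPSILON = 0.95, 0.1, 0.1

def step(state, move):
    """Deterministic transition with walls; -1 per step, +10 at the goal."""
    nx = min(max(state[0] + move[0], 0), GRID - 1)
    ny = min(max(state[1] + move[1], 0), GRID - 1)
    nxt = (nx, ny)
    return nxt, (10.0 if nxt == GOAL else -1.0), nxt == GOAL

def phi(state):
    # Potential = negative Manhattan distance to the goal (prior knowledge).
    # phi(GOAL) == 0, as required for policy invariance in episodic tasks.
    return -(abs(state[0] - GOAL[0]) + abs(state[1] - GOAL[1]))

Q = {((x, y), a): 0.0
     for x in range(GRID) for y in range(GRID)
     for a in range(len(ACTIONS))}

for episode in range(500):
    s = (0, 0)
    for _ in range(200):                        # step cap per episode
        a = (random.randrange(len(ACTIONS)) if random.random() < EPSILON
             else max(range(len(ACTIONS)), key=lambda i: Q[(s, i)]))
        s2, r, done = step(s, ACTIONS[a])
        # Shaping term F(s, s') = gamma * phi(s') - phi(s); adding F to the
        # environment reward leaves the optimal policy unchanged.
        shaped = r + GAMMA * phi(s2) - phi(s)
        best_next = 0.0 if done else max(Q[(s2, i)] for i in range(len(ACTIONS)))
        Q[(s, a)] += ALPHA * (shaped + GAMMA * best_next - Q[(s, a)])
        s = s2
        if done:
            break
```

The context-sensitive scheme the paper describes would, roughly, swap in a different potential function for each detected interaction context rather than using the single fixed phi assumed here.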