{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,4]],"date-time":"2026-04-04T09:45:04Z","timestamp":1775295904942,"version":"3.50.1"},"reference-count":54,"publisher":"Maximum Academic Press","license":[{"start":{"date-parts":[[2018,12,4]],"date-time":"2018-12-04T00:00:00Z","timestamp":1543881600000},"content-version":"unspecified","delay-in-days":337,"URL":"https:\/\/www.cambridge.org\/core\/terms"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["The Knowledge Engineering Review"],"published-print":{"date-parts":[[2018]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>The majority of multi-agent reinforcement learning (MARL) implementations aim to optimize systems with respect to a single objective, despite the fact that many real-world problems are inherently multi-objective in nature. Research into multi-objective MARL is still in its infancy, and few studies to date have dealt with the issue of credit assignment. Reward shaping has been proposed as a means to address the credit assignment problem in single-objective MARL, however it has been shown to alter the intended goals of a domain if misused, leading to unintended behaviour. Two popular shaping methods are potential-based reward shaping and difference rewards, and both have been repeatedly shown to improve learning speed and the quality of joint policies learned by agents in single-objective MARL domains. This work discusses the theoretical implications of applying these shaping approaches to cooperative multi-objective MARL problems, and evaluates their efficacy using two benchmark domains. Our results constitute the first empirical evidence that agents using these shaping methodologies can sample true Pareto optimal solutions in cooperative multi-objective stochastic games.<\/jats:p>","DOI":"10.1017\/s0269888918000292","type":"journal-article","created":{"date-parts":[[2018,12,4]],"date-time":"2018-12-04T06:42:17Z","timestamp":1543905737000},"source":"Crossref","is-referenced-by-count":50,"title":["Reward shaping for knowledge-based multi-objective multi-agent reinforcement learning"],"prefix":"10.48130","volume":"33","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7951-878X","authenticated-orcid":false,"given":"Patrick","family":"Mannion","sequence":"first","affiliation":[]},{"given":"Sam","family":"Devlin","sequence":"additional","affiliation":[]},{"given":"Jim","family":"Duggan","sequence":"additional","affiliation":[]},{"given":"Enda","family":"Howley","sequence":"additional","affiliation":[]}],"member":"27968","published-online":{"date-parts":[[2018,12,4]]},"reference":[{"key":"S0269888918000292_ref54","doi-asserted-by":"publisher","DOI":"10.1007\/s00500-016-2124-z"},{"key":"S0269888918000292_ref34","doi-asserted-by":"crossref","unstructured":"Rahmattalabi A. , Chung J. J. , Colby M. & Tumer K. 2016. D++: Structural credit assignment in tightly coupled multiagent domains. In 2016 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), 4424\u20134429. IEEE.","DOI":"10.1109\/IROS.2016.7759651"},{"key":"S0269888918000292_ref27","doi-asserted-by":"publisher","DOI":"10.1007\/s00158-003-0368-6"},{"key":"S0269888918000292_ref51","doi-asserted-by":"publisher","DOI":"10.1613\/jair.995"},{"key":"S0269888918000292_ref23","doi-asserted-by":"crossref","unstructured":"Mannion P. , Mason K. , Devlin S. , Duggan J. & Howley E. 2016c. Dynamic economic emissions dispatch optimisation using multi-agent reinforcement learning. In Proceedings of the Adaptive and Learning Agents workshop (at AAMAS 2016).","DOI":"10.65109\/INZJ2338"},{"key":"S0269888918000292_ref3","doi-asserted-by":"publisher","DOI":"10.1016\/j.ijepes.2007.06.009"},{"key":"S0269888918000292_ref24","unstructured":"Mannion P. , Mason K. , Devlin S. , Duggan J. & Howley E. 2016d. Multi-objective dynamic dispatch optimisation using multi-agent reinforcement learning. In Proceedings of the 15th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 1345\u20131346."},{"key":"S0269888918000292_ref1","doi-asserted-by":"publisher","DOI":"10.1007\/s10458-008-9046-9"},{"key":"S0269888918000292_ref32","unstructured":"Ng A. Y. , Harada D. & Russell S. J. 1999. Policy invariance under reward transformations: Theory and application to reward shaping. In Proceedings of the Sixteenth International Conference on Machine Learning, ICML \u201999, 278\u2013287. Morgan Kaufmann Publishers Inc."},{"key":"S0269888918000292_ref53","volume-title":"Introduction to Multiagent Systems","author":"Wooldridge","year":"2001"},{"key":"S0269888918000292_ref38","doi-asserted-by":"crossref","unstructured":"Roijers D. M. , Whiteson S. & Oliehoek F. A. 2014. Linear support for multi-objective coordination graphs. In Proceedings of the 2014 International Conference on Autonomous Agents and Multi-Agent Systems, 1297\u20131304. International Foundation for Autonomous Agents and Multiagent Systems.","DOI":"10.65109\/PNFC2600"},{"key":"S0269888918000292_ref12","doi-asserted-by":"publisher","DOI":"10.1142\/S0219525911002998"},{"key":"S0269888918000292_ref4","doi-asserted-by":"publisher","DOI":"10.1080\/09540091.2014.885282"},{"key":"S0269888918000292_ref15","first-page":"59","volume-title":"Complex Decision Making: Theory and Practice","author":"Duggan","year":"2008"},{"key":"S0269888918000292_ref30","unstructured":"Mitchell T. M. 1997. Machine Learning. McGraw-Hill Series in Computer Science. McGraw-Hill."},{"key":"S0269888918000292_ref14","doi-asserted-by":"crossref","unstructured":"Devlin S. , Yliniemi L. , Kudenko D. & Tumer K. 2014. Potential-based difference rewards for multiagent reinforcement learning. In Proceedings of the 13th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 165\u2013172. International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS).","DOI":"10.65109\/ADIB2802"},{"key":"S0269888918000292_ref45","first-page":"3483","article-title":"Multi-objective reinforcement learning using sets of pareto dominating policies","volume":"15","author":"Van Moffaert","year":"2014","journal-title":"The Journal of Machine Learning Research"},{"key":"S0269888918000292_ref11","unstructured":"Devlin S. & Kudenko D. 2012. Dynamic potential-based reward shaping. In Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 433\u2013440."},{"key":"S0269888918000292_ref42","doi-asserted-by":"crossref","unstructured":"Taylor A. , Dusparic I. , Galv\u00e1n-L\u00f3pez E. , Clarke S. & Cahill V. 2014. Accelerating learning in multi-objective systems through transfer learning. In Neural Networks (IJCNN), 2014 International Joint Conference on, 2298\u20132305. IEEE.","DOI":"10.1109\/IJCNN.2014.6889438"},{"key":"S0269888918000292_ref17","doi-asserted-by":"crossref","unstructured":"Grze\u015b M. 2017. Reward shaping in episodic reinforcement learning. In Proceedings of the 16th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 565\u2013573. International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS).","DOI":"10.65109\/CZHK5726"},{"key":"S0269888918000292_ref25","doi-asserted-by":"publisher","DOI":"10.1017\/S026988891700011X"},{"key":"S0269888918000292_ref20","unstructured":"Mannion P. , Devlin S. , Duggan J. & Howley E. 2016. Avoiding the tragedy of the commons using reward shaping. In Proceedings of the Adaptive and Learning Agents workshop (at AAMAS 2016)."},{"key":"S0269888918000292_ref22","unstructured":"Mannion P. , Duggan J. & Howley E. 2016b. Generating multi-agent potential functions using counterfactual estimates. In Proceedings of Learning, Inference and Control of Multi-Agent Systems (at NIPS 2016)."},{"key":"S0269888918000292_ref50","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-27645-3"},{"key":"S0269888918000292_ref39","doi-asserted-by":"publisher","DOI":"10.1613\/jair.4550"},{"key":"S0269888918000292_ref31","doi-asserted-by":"publisher","DOI":"10.2307\/1969529"},{"key":"S0269888918000292_ref43","doi-asserted-by":"crossref","unstructured":"Tumer K. & Agogino A. 2007. Distributed agent-based air traffic flow management. In Proceedings of the 6th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 330\u2013337. ACM.","DOI":"10.1145\/1329125.1329434"},{"key":"S0269888918000292_ref28","unstructured":"Mason K. 2015. Avoidance Techniques and Neighbourhood Topologies in Particle Swarm Optimisation. Master\u2019s thesis. National University of Ireland Galway."},{"key":"S0269888918000292_ref6","unstructured":"Claus C. & Boutilier C. 1998. The dynamics of reinforcement learning in cooperative multiagent systems. In Proceedings of the Fifteenth National\/Tenth Conference on Artificial Intelligence\/Innovative Applications of Artificial Intelligence, AAAI \u201998\/IAAI, 746\u2013752."},{"key":"S0269888918000292_ref13","doi-asserted-by":"crossref","unstructured":"Devlin S. , Grzes M. & Kudenko D. 2011b. Multi-agent, potential-based reward shaping for robocup keepaway (extended abstract). In Proceedings of the 10th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 1227\u20131228. International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS).","DOI":"10.65109\/SETL1494"},{"key":"S0269888918000292_ref7","doi-asserted-by":"crossref","unstructured":"Colby M. & Tumer K. 2015. An evolutionary game theoretic analysis of difference evaluation functions. In Proceedings of the 2015 Annual Conference on Genetic and Evolutionary Computation, 1391\u20131398. ACM.","DOI":"10.1145\/2739480.2754770"},{"key":"S0269888918000292_ref2","first-page":"406","article-title":"Inductive reasoning and bounded rationality","volume":"84","author":"Arthur","year":"1994","journal-title":"The American Economic Review"},{"key":"S0269888918000292_ref5","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-14435-6_7"},{"key":"S0269888918000292_ref41","first-page":"41","article-title":"Penalty functions","volume":"2","author":"Smith","year":"2000","journal-title":"Evolutionary Computation"},{"key":"S0269888918000292_ref19","doi-asserted-by":"crossref","unstructured":"Malialis K. , Devlin S. & Kudenko D. 2016. Resource abstraction for reinforcement learning in multiagent congestion problems. In Proceedings of the 15th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 503\u2013511. International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS).","DOI":"10.65109\/FQEN6076"},{"key":"S0269888918000292_ref33","volume-title":"Manual of political economy","author":"Pareto","year":"1906"},{"key":"S0269888918000292_ref37","unstructured":"Roijers D. M. , Whiteson S. & Oliehoek F. A. 2013. Computing convex coverage sets for multi-objective coordination graphs. In International Conference on Algorithmic Decision Theory, 309\u2013323."},{"key":"S0269888918000292_ref46","doi-asserted-by":"crossref","unstructured":"Van Moffaert K. , Drugan M. M. & Now\u00e9 A. 2013. Scalarized multi-objective reinforcement learning: Novel design techniques. In 2013 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), 191\u2013199. IEEE.","DOI":"10.1109\/ADPRL.2013.6615007"},{"key":"S0269888918000292_ref36","doi-asserted-by":"publisher","DOI":"10.1613\/jair.3987"},{"key":"S0269888918000292_ref52","doi-asserted-by":"publisher","DOI":"10.1209\/epl\/i2000-00208-x"},{"key":"S0269888918000292_ref48","doi-asserted-by":"publisher","DOI":"10.1109\/59.260861"},{"key":"S0269888918000292_ref44","doi-asserted-by":"publisher","DOI":"10.1007\/s10994-010-5232-5"},{"key":"S0269888918000292_ref49","unstructured":"Watkins C. J. C. H. 1989. Learning from Delayed Rewards. PhD thesis. King\u2019s College, Cambridge."},{"key":"S0269888918000292_ref16","unstructured":"G\u00e1bor Z. , Kalm\u00e1r Z. & Szepesv\u00e1ri C. 1998. Multi-criteria reinforcement learning. In Proceedings of the Fifteenth International Conference on Machine Learning, 197\u2013205."},{"key":"S0269888918000292_ref35","unstructured":"Randl\u00f8v J. & Alstr\u00f8m P. 1998. Learning to drive a bicycle using reinforcement learning and shaping. In Proceedings of the Fifteenth International Conference on Machine Learning, ICML \u201998, 463\u2013471. Morgan Kaufmann Publishers Inc."},{"key":"S0269888918000292_ref8","doi-asserted-by":"crossref","unstructured":"Colby M. , Duchow-Pressley T. , Chung J. J. & Tumer K. 2016. Local approximation of difference evaluation functions. In Proceedings of the 15th International Conference on Autonomous Agents & Multiagent Systems (AAMAS), 521\u2013529. International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS).","DOI":"10.65109\/ODSN4129"},{"key":"S0269888918000292_ref29","unstructured":"Mason K. , Mannion P. , Duggan J. & Howley E. 2016. Applying multi-agent reinforcement learning to watershed management. In Proceedings of the Adaptive and Learning Agents workshop (at AAMAS 2016)."},{"key":"S0269888918000292_ref47","unstructured":"Van Moffaert K. , Brys T. , Chandra A. , Esterle L. , Lewis P. R. & Now\u00e9 A. 2014. A novel adaptive weight selection algorithm for multi-objective multi-agent reinforcement learning. In Neural Networks (IJCNN), 2014 International Joint Conference, 2306\u20132314."},{"key":"S0269888918000292_ref18","doi-asserted-by":"publisher","DOI":"10.1016\/j.engappai.2014.01.007"},{"key":"S0269888918000292_ref40","doi-asserted-by":"publisher","DOI":"10.1016\/j.artint.2006.02.006"},{"key":"S0269888918000292_ref10","doi-asserted-by":"crossref","unstructured":"Devlin S. & Kudenko D. 2011. Theoretical considerations of potential-based reward shaping for multi-agent systems. In Proceedings of the 10th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 225\u2013232. International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS).","DOI":"10.65109\/VDID3904"},{"key":"S0269888918000292_ref9","unstructured":"Devlin S. 2013. Potential-Based Reward Shaping for Knowledge-Based, Multi-Agent Reinforcement Learning. PhD thesis, University of York."},{"key":"S0269888918000292_ref21","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-25808-9_4"},{"key":"S0269888918000292_ref26","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2017.05.090"}],"container-title":["The Knowledge Engineering Review"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.cambridge.org\/core\/services\/aop-cambridge-core\/content\/view\/S0269888918000292","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,4]],"date-time":"2026-04-04T08:46:37Z","timestamp":1775292397000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.cambridge.org\/core\/product\/identifier\/S0269888918000292\/type\/journal_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018]]},"references-count":54,"alternative-id":["S0269888918000292"],"URL":"https:\/\/doi.org\/10.1017\/s0269888918000292","relation":{},"ISSN":["0269-8889","1469-8005"],"issn-type":[{"value":"0269-8889","type":"print"},{"value":"1469-8005","type":"electronic"}],"subject":[],"published":{"date-parts":[[2018]]},"article-number":"e23"}}