{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,12]],"date-time":"2026-01-12T21:36:41Z","timestamp":1768253801494,"version":"3.49.0"},"reference-count":80,"publisher":"Springer Science and Business Media LLC","issue":"31","license":[{"start":{"date-parts":[[2023,8,23]],"date-time":"2023-08-23T00:00:00Z","timestamp":1692748800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,8,23]],"date-time":"2023-08-23T00:00:00Z","timestamp":1692748800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100018693","name":"HORIZON EUROPE Framework Programme","doi-asserted-by":"publisher","award":["VALAWAI (HE-101070930)"],"award-info":[{"award-number":["VALAWAI (HE-101070930)"]}],"id":[{"id":"10.13039\/100018693","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100010661","name":"Horizon 2020 Framework Programme","doi-asserted-by":"publisher","award":["Crowd4SDG (H2020-872944)"],"award-info":[{"award-number":["Crowd4SDG (H2020-872944)"]}],"id":[{"id":"10.13039\/100010661","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100010661","name":"Horizon 2020 Framework Programme","doi-asserted-by":"publisher","award":["TAILOR (H2020-952215)"],"award-info":[{"award-number":["TAILOR (H2020-952215)"]}],"id":[{"id":"10.13039\/100010661","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100010661","name":"Horizon 2020 Framework Programme","doi-asserted-by":"publisher","award":["COREDEM (H2020-785907)"],"award-info":[{"award-number":["COREDEM (H2020-785907)"]}],"id":[{"id":"10.13039\/100010661","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100009563","name":"Fundaci\u00f3n para la Formaci\u00f3n e Investigaci\u00f3n Sanitarias de la Regi\u00f3n de 
Murcia","doi-asserted-by":"publisher","award":["22S01386-001"],"award-info":[{"award-number":["22S01386-001"]}],"id":[{"id":"10.13039\/501100009563","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100010198","name":"Ministerio de Asuntos Econ\u00f3micos y Transformaci\u00f3n Digital, Gobierno de Espa\u00f1a","doi-asserted-by":"publisher","award":["PID2019-104156GB-I00"],"award-info":[{"award-number":["PID2019-104156GB-I00"]}],"id":[{"id":"10.13039\/501100010198","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100014440","name":"Ministerio de Ciencia, Innovaci\u00f3n y Universidades","doi-asserted-by":"publisher","award":["FPU18\/03387"],"award-info":[{"award-number":["FPU18\/03387"]}],"id":[{"id":"10.13039\/100014440","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100003339","name":"Consejo Superior de Investigaciones Cientificas","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100003339","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Neural Comput &amp; Applic"],"published-print":{"date-parts":[[2025,11]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>This paper tackles the open problem of value alignment in multi-agent systems. In particular, we propose an approach to build an <jats:italic>ethical<\/jats:italic> environment that guarantees that agents in the system learn a joint ethically-aligned behaviour while pursuing their respective individual objectives. Our contributions are founded in the framework of Multi-Objective Multi-Agent Reinforcement Learning. Firstly, we characterise a family of Multi-Objective Markov Games (MOMGs), the so-called <jats:italic>ethical<\/jats:italic> MOMGs, for which we can formally guarantee the learning of ethical behaviours. 
Secondly, based on our characterisation we specify the process for building single-objective ethical environments that simplify the learning in the multi-agent system. We illustrate our process with an ethical variation of the Gathering Game, where agents manage to compensate for social inequalities by learning to behave in alignment with the moral value of beneficence.<\/jats:p>","DOI":"10.1007\/s00521-023-08898-y","type":"journal-article","created":{"date-parts":[[2023,8,23]],"date-time":"2023-08-23T12:02:28Z","timestamp":1692792148000},"page":"25619-25644","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Multi-objective reinforcement learning for designing ethical multi-agent environments"],"prefix":"10.1007","volume":"37","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1339-2018","authenticated-orcid":false,"given":"Manel","family":"Rodriguez-Soto","sequence":"first","affiliation":[]},{"given":"Maite","family":"Lopez-Sanchez","sequence":"additional","affiliation":[]},{"given":"Juan A.","family":"Rodriguez-Aguilar","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,8,23]]},"reference":[{"key":"8898_CR1","doi-asserted-by":"publisher","unstructured":"Abbeel P, Ng AY (2004) Apprenticeship learning via inverse reinforcement learning. In: Proceedings of the Twenty-first International Conference on Machine Learning, ICML \u201904. ACM, New York, NY, USA. https:\/\/doi.org\/10.1145\/1015330.1015430","DOI":"10.1145\/1015330.1015430"},{"key":"8898_CR2","unstructured":"Abel D, MacGlashan J, Littman ML (2016) Reinforcement learning as a framework for ethical decision making. 
In: AAAI Workshops: AI, Ethics, and Society, Association for the Advancement of Artificial Intelligence, vol 92"},{"key":"8898_CR3","doi-asserted-by":"publisher","first-page":"149","DOI":"10.1007\/s10676-006-0004-4","volume":"7","author":"C Allen","year":"2005","unstructured":"Allen C, Smit I, Wallach W (2005) Artificial morality: top\u2013down, bottom\u2013up, and hybrid approaches. Ethics Inform Technol 7:149\u2013155. https:\/\/doi.org\/10.1007\/s10676-006-0004-4","journal-title":"Ethics Inform Technol"},{"key":"8898_CR4","doi-asserted-by":"crossref","unstructured":"Alshiekh M, Bloem R, Ehlers R, K\u00f6nighofer B, Niekum S, Topcu U (2018) Safe reinforcement learning via shielding. In: Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence","DOI":"10.1609\/aaai.v32i1.11797"},{"key":"8898_CR5","unstructured":"Amodei D, Olah C, Steinhardt J, Christiano PF, Schulman J, Man\u00e9 D (2016) Concrete problems in AI safety. CoRR abs\/1606.06565"},{"key":"8898_CR6","unstructured":"Arnold T, Kasenberg D, Scheutz M (2017) Value alignment or misalignment\u2014what will keep systems accountable? In: AAAI Workshops 2017, Association for the Advancement of Artificial Intelligence. https:\/\/hrilab.tufts.edu\/publications\/arnoldetal17aiethics.pdf. Accessed 16 May 2020"},{"key":"8898_CR7","volume-title":"The Cambridge dictionary of philosophy","author":"R Audi","year":"1999","unstructured":"Audi R (1999) The Cambridge dictionary of philosophy. Cambridge University Press, Cambridge"},{"key":"8898_CR8","unstructured":"Bai A, Srivastava S, Russell S (2016) Markovian state and action abstractions for MDPs via hierarchical MCTS. In: Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, IJCAI\u201916. 
AAAI Press, pp 3029\u20133037"},{"key":"8898_CR9","doi-asserted-by":"publisher","first-page":"3","DOI":"10.1609\/aaai.v33i01.33013","volume":"33","author":"A Balakrishnan","year":"2019","unstructured":"Balakrishnan A, Bouneffouf D, Mattei N, Rossi F (2019) Incorporating behavioral constraints in online AI systems. Proc AAAI Confer Artif Intell 33:3\u201311. https:\/\/doi.org\/10.1609\/aaai.v33i01.33013","journal-title":"Proc AAAI Confer Artif Intell"},{"key":"8898_CR10","doi-asserted-by":"publisher","unstructured":"Barrett L, Narayanan S (2008) Learning all optimal policies with multiple criteria. Proceedings of the 25th International Conference on Machine Learning, pp 41\u201347. https:\/\/doi.org\/10.1145\/1390156.1390162","DOI":"10.1145\/1390156.1390162"},{"issue":"5","key":"8898_CR11","first-page":"679","volume":"6","author":"R Bellman","year":"1957","unstructured":"Bellman R (1957) A markovian decision process. J Math Mech 6(5):679\u2013684","journal-title":"J Math Mech"},{"key":"8898_CR12","doi-asserted-by":"publisher","first-page":"101726","DOI":"10.1016\/j.techsoc.2021.101726","volume":"67","author":"JP Boada","year":"2021","unstructured":"Boada JP, Maestre BR, Gen\u00eds CT (2021) The ethical issues of social assistive robotics: a critical literature review. Technol Soc 67:101726","journal-title":"Technol Soc"},{"key":"8898_CR13","doi-asserted-by":"crossref","unstructured":"Casas-Roma J, Conesa J (2020) Towards the design of ethically-aware pedagogical conversational agents. In: International Conference on P2P, Parallel, Grid, Cloud and Internet Computing. Springer, pp 188\u2013198","DOI":"10.1007\/978-3-030-61105-7_19"},{"key":"8898_CR14","unstructured":"Castelletti A, Corani G, Rizzoli A, Sessa RS, Weber E (2002) Reinforcement learning in the operational management of a water system. 
In: Modelling and Control in Environmental Issues 2001, Pergamon Press, pp 325\u2013330"},{"key":"8898_CR15","doi-asserted-by":"crossref","unstructured":"Chatila R, Dignum V, Fisher M, Giannotti F, Morik K, Russell S, Yeung K (2021) Trustworthy AI. In: Reflections on Artificial Intelligence for Humanity. Springer, Berlin, pp 13\u201339","DOI":"10.1007\/978-3-030-69128-8_2"},{"issue":"1","key":"8898_CR16","first-page":"1","volume":"5","author":"RM Chisholm","year":"1963","unstructured":"Chisholm RM (1963) Supererogation and offence: a conceptual scheme for ethics. Ratio (Misc.) 5(1):1","journal-title":"Ratio (Misc.)"},{"key":"8898_CR17","unstructured":"Chow Y, Nachum O, Duenez-Guzman E, Ghavamzadeh M (2018) A Lyapunov-based approach to safe reinforcement learning. In: NeurIPS 2018"},{"key":"8898_CR18","unstructured":"European Commission (2021) Artificial Intelligence Act. https:\/\/eur-lex.europa.eu\/legal-content\/EN\/TXT\/?qid=1623335154975&uri=CELEX%3A52021PC0206. Accessed 29 June, 2021"},{"key":"8898_CR19","unstructured":"Damgaard C (2022) Gini coefficient. https:\/\/mathworld.wolfram.com\/GiniCoefficient.html. Accessed 30 Apr, 2022"},{"issue":"6","key":"8898_CR20","doi-asserted-by":"publisher","first-page":"40","DOI":"10.1109\/MIS.2003.1249168","volume":"18","author":"RK Dash","year":"2003","unstructured":"Dash RK, Jennings NR, Parkes DC (2003) Computational-mechanism design: a call to arms. IEEE Intell Syst 18(6):40\u201347. https:\/\/doi.org\/10.1109\/MIS.2003.1249168","journal-title":"IEEE Intell Syst"},{"key":"8898_CR21","unstructured":"Ecoffet A, Lehman J (2021) Reinforcement learning under moral uncertainty. In: Meila M, Zhang T (eds) Proceedings of the 38th International Conference on Machine Learning, Proceedings of Machine Learning Research, vol 139. PMLR, pp 2926\u20132936. 
https:\/\/proceedings.mlr.press\/v139\/ecoffet21a.html"},{"key":"8898_CR22","unstructured":"Elsayed-Aly I, Bharadwaj S, Amato C, Ehlers R, Topcu U, Feng L (2021) Safe multi-agent reinforcement learning via shielding. In: Proceedings of the 20th International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS 2021), Main track, pp 483\u2013491"},{"issue":"9","key":"8898_CR23","doi-asserted-by":"publisher","first-page":"29","DOI":"10.1145\/2955091","volume":"59","author":"A Etzioni","year":"2016","unstructured":"Etzioni A, Etzioni O (2016) Designing AI systems that obey our laws and values. Commun ACM 59(9):29\u201331. https:\/\/doi.org\/10.1145\/2955091","journal-title":"Commun ACM"},{"key":"8898_CR24","doi-asserted-by":"publisher","first-page":"411","DOI":"10.1007\/s11023-020-09539-2","volume":"30","author":"I Gabriel","year":"2020","unstructured":"Gabriel I (2020) Artificial intelligence, values, and alignment. Minds Mach 30:411\u2013437. https:\/\/doi.org\/10.1007\/s11023-020-09539-2","journal-title":"Minds Mach"},{"issue":"1","key":"8898_CR25","first-page":"1437","volume":"16","author":"J Garc\u00eda","year":"2015","unstructured":"Garc\u00eda J, Fern\u00e1ndez F (2015) A comprehensive survey on safe reinforcement learning. J Mach Learn Res 16(1):1437\u20131480","journal-title":"J Mach Learn Res"},{"key":"8898_CR26","doi-asserted-by":"publisher","DOI":"10.1007\/s11023-020-09524-9","author":"J Haas","year":"2020","unstructured":"Haas J (2020) Moral gridworlds: a theoretical proposal for modeling artificial moral cognition. Minds Mach. https:\/\/doi.org\/10.1007\/s11023-020-09524-9","journal-title":"Minds Mach"},{"key":"8898_CR27","unstructured":"Hadfield-Menell D, Russell SJ, Abbeel P, Dragan A (2016) Cooperative inverse reinforcement learning. 
Adv Neural Inform Process Syst 29:3909\u20133917"},{"key":"8898_CR28","volume-title":"The righteous mind: why good people are divided by politics and religion","author":"J Haidt","year":"2012","unstructured":"Haidt J (2012) The righteous mind: why good people are divided by politics and religion. Vintage, New York"},{"key":"8898_CR29","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511498466","volume-title":"The structure of values and norms. Cambridge studies in probability, induction and decision theory","author":"SO Hansson","year":"2001","unstructured":"Hansson SO (2001) The structure of values and norms. Cambridge studies in probability, induction and decision theory. Cambridge University Press, Cambridge. https:\/\/doi.org\/10.1017\/CBO9780511498466"},{"key":"8898_CR30","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-77434-3","volume-title":"Introduction to formal philosophy","author":"SO Hansson","year":"2018","unstructured":"Hansson SO, Hendricks V (2018) Introduction to formal philosophy. Springer, Berlin"},{"key":"8898_CR31","doi-asserted-by":"crossref","unstructured":"Hayes C, R\u0103dulescu R, Bargiacchi E, K\u00e4llstr\u00f6m J, Macfarlane M, Reymond M, Verstraeten T, Zintgraf L, Dazeley R, Heintz F, Howley E, Irissappane A, Mannion P, Now\u00e9 A, Ramos G, Restelli M, Vamplew P, Roijers D (2021) A practical guide to multi-objective reinforcement learning and planning. In: Autonomous Agents and Multi-Agent Systems, ISSN 1387-2532, E-ISSN 1573-7454, vol 36, no 1","DOI":"10.1007\/s10458-022-09552-y"},{"key":"8898_CR32","doi-asserted-by":"crossref","unstructured":"Hostetler J, Fern A, Dietterich T (2014) State aggregation in monte carlo tree search. In: Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, AAAI\u201914. 
AAAI Press, pp 2446\u20132452","DOI":"10.1609\/aaai.v28i1.9066"},{"key":"8898_CR33","first-page":"1039","volume":"4","author":"J Hu","year":"2003","unstructured":"Hu J, Wellman MP (2003) Nash Q-learning for general-sum stochastic games. J Mach Learn Res 4:1039\u20131069","journal-title":"J Mach Learn Res"},{"key":"8898_CR34","unstructured":"Hughes E, Leibo JZ, Phillips M, Tuyls K, Du\u00e9\u00f1ez-Guzm\u00e1n EA, Casta\u00f1eda AG, Dunning I, Zhu T, McKee KR, Koster R, Roff H, Graepel T (2018) Inequity aversion improves cooperation in intertemporal social dilemmas. In: Proceedings of the 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), vol 31, pp 1\u201311"},{"key":"8898_CR35","unstructured":"IEEE (2019) IEEE global initiative on ethics of autonomous and intelligent systems. https:\/\/standards.ieee.org\/industry-connections\/ec\/autonomous-systems.html. Accessed 29 June 2021"},{"key":"8898_CR36","unstructured":"Jaques N, Lazaridou A, Hughes E, G\u00fcl\u00e7ehre \u00c7, Ortega PA, Strouse D, Leibo JZ, de Freitas N (2019) Social influence as intrinsic motivation for multi-agent deep reinforcement learning. In: Proceedings of the 36th International Conference on Machine Learning, PMLR, vol 97, pp 3040\u20133049"},{"issue":"1","key":"8898_CR37","first-page":"237","volume":"4","author":"LP Kaelbling","year":"1996","unstructured":"Kaelbling LP, Littman ML, Moore AW (1996) Reinforcement learning: a survey. J Artif Int Res 4(1):237\u2013285","journal-title":"J Artif Int Res"},{"key":"8898_CR38","unstructured":"Krakovna V, Orseau L, Martic M, Legg S (2019) Penalizing side effects using stepwise relative reachability. arXiv preprint"},{"key":"8898_CR39","doi-asserted-by":"crossref","unstructured":"Busoniu L, Babuska R, De Schutter B (2010) Multi-agent reinforcement learning: an overview. 
Innov Multi-Agent Syst Appl 1:183\u2013221","DOI":"10.1007\/978-3-642-14435-6_7"},{"key":"8898_CR40","unstructured":"Leibo JZ, Zambaldi VF, Lanctot M, Marecki J, Graepel T (2017) Multi-agent reinforcement learning in sequential social dilemmas. CoRR abs\/1702.03037. arXiv:1702.03037"},{"key":"8898_CR41","unstructured":"Leike J, Martic M, Krakovna V, Ortega P, Everitt T, Lefrancq A, Orseau L, Legg S (2017) AI safety gridworlds. arXiv:1711.09883"},{"key":"8898_CR42","unstructured":"Li L, Walsh TJ, Littman ML (2006) Towards a unified theory of state abstraction for MDPs. In: Proceedings of the Ninth International Symposium on Artificial Intelligence and Mathematics, pp 531\u2013539"},{"key":"8898_CR43","unstructured":"Liscio E, Meer MVD, Siebert LC, Jonker C, Mouter N, Murukannaiah PK (2021) Axies: identifying and evaluating context-specific values. In: Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS '21), Main track, pp 799\u2013808"},{"key":"8898_CR44","doi-asserted-by":"crossref","unstructured":"Littman ML (1994) Markov games as a framework for multi-agent reinforcement learning. In: Proceedings of the Eleventh International Conference on International Conference on Machine Learning, ICML\u201994. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, pp 157\u2013163. http:\/\/dl.acm.org\/citation.cfm?id=3091574.3091594","DOI":"10.1016\/B978-1-55860-335-6.50027-1"},{"key":"8898_CR45","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511794216","volume-title":"Game theory","author":"M Maschler","year":"2013","unstructured":"Maschler M, Solan E, Zamir S (2013) Game theory, 2nd edn. Cambridge University Press, Cambridge","edition":"2"},{"key":"8898_CR46","unstructured":"McKee KR, Gemp I, McWilliams B, Du\u00e9\u00f1ez-Guzm\u00e1n EA, Hughes E, Leibo JZ (2020) Social diversity and social preferences in mixed-motive reinforcement learning. 
AAMAS \u201920. International Foundation for Autonomous Agents and Multiagent Systems, pp 869\u2013877"},{"issue":"1","key":"8898_CR47","doi-asserted-by":"publisher","first-page":"1","DOI":"10.18564\/jasss.3929","volume":"22","author":"R Mercuur","year":"2019","unstructured":"Mercuur R, Dignum V, Jonker C et al (2019) The value of values and norms in social simulation. J Artif Soc Soc Simul 22(1):1\u20139","journal-title":"J Artif Soc Soc Simul"},{"key":"8898_CR48","doi-asserted-by":"crossref","unstructured":"Nashed SB, Svegliato J, Zilberstein S (2021) Ethically compliant sequential decision making. In: Proceedings of the 4th Conference on AI, Ethics, and Society (AIES)","DOI":"10.1609\/aaai.v35i13.17386"},{"key":"8898_CR49","doi-asserted-by":"publisher","unstructured":"Natarajan S, Tadepalli P (2005) Dynamic preferences in multi-criteria reinforcement learning. In: Proceedings of the 22nd International Conference on Machine Learning, ICML \u201905. Association for Computing Machinery, New York, NY, USA, pp 601\u2013608. https:\/\/doi.org\/10.1145\/1102351.1102427","DOI":"10.1145\/1102351.1102427"},{"key":"8898_CR50","unstructured":"Neto G (2005) From single-agent to multi-agent reinforcement learning: foundational concepts and methods.  http:\/\/users.isr.ist.utl.pt\/~mtjspaan\/readingGroup\/learningNeto05.pdf. Accessed 18 May 2021"},{"key":"8898_CR51","doi-asserted-by":"publisher","first-page":"6377","DOI":"10.1147\/JRD.2019.2940428","volume":"PP","author":"R Noothigattu","year":"2019","unstructured":"Noothigattu R, Bouneffouf D, Mattei N, Chandra R, Madan P, Kush R, Campbell M, Singh M, Rossi F (2019) Teaching AI agents ethical values using reinforcement learning and policy orchestration. IBM J Res Dev PP:6377\u20136381. https:\/\/doi.org\/10.1147\/JRD.2019.2940428","journal-title":"IBM J Res Dev"},{"key":"8898_CR52","unstructured":"Peysakhovich A, Lerer A (2017) Prosocial learning agents solve generalized stag hunts better than selfish ones. 
In: Proceedings of the 17th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2018), Main track extended abstract, pp 2043\u20132044"},{"key":"8898_CR53","volume-title":"Ethics, technology, and engineering: an introduction","author":"I van de Poel","year":"2011","unstructured":"van de Poel I, Royakkers L (2011) Ethics, technology, and engineering: an introduction. Wiley-Blackwell, New York"},{"key":"8898_CR54","unstructured":"Riedl MO, Harrison B (2016) Using stories to teach human values to artificial agents. In: AI, Ethics, and Society, Papers from the 2016 AAAI Workshop"},{"key":"8898_CR55","doi-asserted-by":"crossref","unstructured":"Rodriguez-Soto M, Lopez-Sanchez M, Rodriguez-Aguilar JA (2021) Multi-objective reinforcement learning for designing ethical environments. In: Zhou ZH (eds) Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, IJCAI-21. International Joint Conferences on Artificial Intelligence Organization. Main Track, pp 545\u2013551","DOI":"10.24963\/ijcai.2021\/76"},{"key":"8898_CR56","doi-asserted-by":"publisher","unstructured":"Roijers D, Whiteson S (2017) Multi-objective decision making. Synthesis Lectures on Artificial Intelligence and Machine Learning. Morgan and Claypool, California, USA. https:\/\/doi.org\/10.2200\/S00765ED1V01Y201704AIM034. http:\/\/www.morganclaypool.com\/doi\/abs\/10.2200\/S00765ED1V01Y201704AIM034","DOI":"10.2200\/S00765ED1V01Y201704AIM034"},{"issue":"1","key":"8898_CR57","first-page":"67","volume":"48","author":"DM Roijers","year":"2013","unstructured":"Roijers DM, Vamplew P, Whiteson S, Dazeley R (2013) A survey of multi-objective sequential decision-making. J Artif Int Res 48(1):67\u2013113","journal-title":"J Artif Int Res"},{"key":"8898_CR58","doi-asserted-by":"publisher","first-page":"9785","DOI":"10.1609\/aaai.v33i01.33019785","volume":"33","author":"F Rossi","year":"2019","unstructured":"Rossi F, Mattei N (2019) Building ethically bounded AI. 
Proc AAAI Confer Artif Intell 33:9785\u20139789. https:\/\/doi.org\/10.1609\/aaai.v33i01.33019785","journal-title":"Proc AAAI Confer Artif Intell"},{"key":"8898_CR59","doi-asserted-by":"publisher","first-page":"105","DOI":"10.1609\/aimag.v36i4.2577","volume":"36","author":"S Russell","year":"2015","unstructured":"Russell S, Dewey D, Tegmark M (2015) Research priorities for robust and beneficial artificial intelligence. AI Mag 36:105\u2013114. https:\/\/doi.org\/10.1609\/aimag.v36i4.2577","journal-title":"AI Mag"},{"key":"8898_CR60","unstructured":"R\u0103dulescu R (2021) Decision making in multi-objective multi-agent systems: a utility-based perspective. Ph.D. thesis, Vrije Universiteit Brussel"},{"key":"8898_CR61","first-page":"1","volume":"34","author":"R R\u0103dulescu","year":"2019","unstructured":"R\u0103dulescu R, Mannion P, Roijers DM, Now\u00e9 A (2019) Multi-objective multi-agent decision making: a utility-based analysis and survey. Auton Agents Multi-Agent Syst 34:1\u201352","journal-title":"Auton Agents Multi-Agent Syst"},{"key":"8898_CR62","doi-asserted-by":"publisher","unstructured":"Saisubramanian S, Kamar E, Zilberstein S (2020) A multi-objective approach to mitigate negative side effects. In: Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, pp 354\u2013361. https:\/\/doi.org\/10.24963\/ijcai.2020\/50","DOI":"10.24963\/ijcai.2020\/50"},{"key":"8898_CR63","unstructured":"Saisubramanian S, Zilberstein S (2021) Mitigating negative side effects via environment shaping. International Foundation for Autonomous Agents and Multiagent Systems, pp 1640\u20131642"},{"key":"8898_CR64","unstructured":"Sierra C, Osman N, Noriega P, Sabater-Mir J, Perello-Moragues A (2019) Value alignment: a formal approach. Responsible Artificial Intelligence Agents Workshop (RAIA) in AAMAS 2019"},{"issue":"3","key":"8898_CR65","first-page":"229","volume":"1","author":"P Singer","year":"1972","unstructured":"Singer P (1972) Famine, affluence and morality. 
Philos Public Aff 1(3):229\u2013243","journal-title":"Philos Public Aff"},{"key":"8898_CR66","unstructured":"Soares N, Fallenstein B (2014) Aligning superintelligence with human interests: a technical research agenda. Machine Intelligence Research Institute (MIRI) technical report 8"},{"key":"8898_CR67","doi-asserted-by":"crossref","unstructured":"Sun FY, Chang YY, Wu YH, Lin SD (2018) Designing non-greedy reinforcement learning agents with diminishing reward shaping. In: Proceedings of the 2018 AAAI\/ACM Conference on AI, Ethics, and Society (AIES 2018), pp 297\u2013302","DOI":"10.1145\/3278721.3278759"},{"key":"8898_CR68","unstructured":"Sun FY, Chang YY, Wu YH, Lin SD (2019) A regulation enforcement solution for multi-agent reinforcement learning. In: Proceedings of the 18th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2019), Main track extended abstract, pp. 2201\u20132203"},{"key":"8898_CR69","doi-asserted-by":"publisher","first-page":"54","DOI":"10.11590\/abhps.2020.2.04","volume":"8","author":"M Sutrop","year":"2020","unstructured":"Sutrop M (2020) Challenges of aligning artificial intelligence with human values. Acta Baltica Historiae et Philosophiae Scientiarum 8:54\u201372. https:\/\/doi.org\/10.11590\/abhps.2020.2.04","journal-title":"Acta Baltica Historiae et Philosophiae Scientiarum"},{"key":"8898_CR70","volume-title":"Reinforcement learning\u2014an introduction. Adaptive computation and machine learning","author":"RS Sutton","year":"1998","unstructured":"Sutton RS, Barto AG (1998) Reinforcement learning\u2014an introduction. Adaptive computation and machine learning. MIT Press, Cambridge"},{"key":"8898_CR71","doi-asserted-by":"crossref","unstructured":"Svegliato J, Nashed SB, Zilberstein S (2021) Ethically compliant sequential decision making. 
In: Proceedings of the 35th AAAI International Conference on Artificial Intelligence","DOI":"10.1609\/aaai.v35i13.17386"},{"key":"8898_CR72","doi-asserted-by":"publisher","unstructured":"Tolmeijer S, Kneer M, Sarasua C, Christen M, Bernstein A (2021) Implementations in machine ethics: a survey. ACM Comput Surv. https:\/\/doi.org\/10.1145\/3419633","DOI":"10.1145\/3419633"},{"key":"8898_CR73","doi-asserted-by":"publisher","unstructured":"Vamplew P, Dazeley R, Foale C, Firmin S, Mummery J (2018) Human-aligned artificial intelligence is a multiobjective problem. Ethics Inform Technol. https:\/\/doi.org\/10.1007\/s10676-017-9440-6","DOI":"10.1007\/s10676-017-9440-6"},{"key":"8898_CR74","doi-asserted-by":"publisher","unstructured":"Vamplew P, Foale C, Dazeley R, Bignold A (2021) Potential-based multiobjective reinforcement learning approaches to low-impact agents for AI safety. Eng Appl Artif Intell. https:\/\/doi.org\/10.1016\/j.engappai.2021.104186","DOI":"10.1016\/j.engappai.2021.104186"},{"key":"8898_CR75","doi-asserted-by":"publisher","unstructured":"Vamplew P, Yearwood J, Dazeley R, Berry A (2008) On the limitations of scalarisation for multi-objective reinforcement learning of Pareto fronts. https:\/\/doi.org\/10.1007\/978-3-540-89378-3_37","DOI":"10.1007\/978-3-540-89378-3_37"},{"key":"8898_CR76","unstructured":"Vlassis NA (2009) A concise introduction to multiagent systems and distributed artificial intelligence. In: A Concise Introduction to Multiagent Systems and Distributed Artificial Intelligence"},{"key":"8898_CR77","doi-asserted-by":"publisher","first-page":"279","DOI":"10.1007\/BF00992698","volume":"8","author":"CJCH Watkins","year":"1992","unstructured":"Watkins CJCH, Dayan P (1992) Technical note: Q-learning. Mach Learn 8:279\u2013292. 
https:\/\/doi.org\/10.1007\/BF00992698","journal-title":"Mach Learn"},{"key":"8898_CR78","doi-asserted-by":"crossref","unstructured":"Wu YH, Lin SD (2018) A low-cost ethics shaping approach for designing reinforcement learning agents. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol 32","DOI":"10.1609\/aaai.v32i1.11498"},{"key":"8898_CR79","doi-asserted-by":"crossref","unstructured":"Yu H, Shen Z, Miao C, Leung C, Lesser VR, Yang Q (2018) Building ethics into artificial intelligence. In: IJCAI, pp 5527\u20135533","DOI":"10.24963\/ijcai.2018\/779"},{"key":"8898_CR80","doi-asserted-by":"publisher","first-page":"321","DOI":"10.1007\/978-3-030-60990-0_12","volume-title":"Multi-agent reinforcement learning: a selective overview of theories and algorithms","author":"K Zhang","year":"2021","unstructured":"Zhang K, Yang Z, Ba\u015far T (2021) Multi-agent reinforcement learning: a selective overview of theories and algorithms. Springer International Publishing, Cham, pp 321\u2013384. 
https:\/\/doi.org\/10.1007\/978-3-030-60990-0_12"}],"container-title":["Neural Computing and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00521-023-08898-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s00521-023-08898-y\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00521-023-08898-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,19]],"date-time":"2025-10-19T05:02:36Z","timestamp":1760850156000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s00521-023-08898-y"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,8,23]]},"references-count":80,"journal-issue":{"issue":"31","published-print":{"date-parts":[[2025,11]]}},"alternative-id":["8898"],"URL":"https:\/\/doi.org\/10.1007\/s00521-023-08898-y","relation":{},"ISSN":["0941-0643","1433-3058"],"issn-type":[{"value":"0941-0643","type":"print"},{"value":"1433-3058","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,8,23]]},"assertion":[{"value":"13 October 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"14 July 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"23 August 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors have no conflicts of interest to declare that are relevant to the content of this 
article.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}