{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,28]],"date-time":"2026-07-28T07:59:36Z","timestamp":1785225576117,"version":"3.55.0"},"reference-count":218,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2022,4,1]],"date-time":"2022-04-01T00:00:00Z","timestamp":1648771200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,4,13]],"date-time":"2022-04-13T00:00:00Z","timestamp":1649808000000},"content-version":"vor","delay-in-days":12,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100011878","name":"Vlaamse regering","doi-asserted-by":"publisher","award":["Onderzoeksprogramma Artifici\u00eble Intelligentie (AI) Vlaanderen"],"award-info":[{"award-number":["Onderzoeksprogramma Artifici\u00eble Intelligentie (AI) Vlaanderen"]}],"id":[{"id":"10.13039\/501100011878","id-type":"DOI","asserted-by":"publisher"}]},{"name":"National University Ireland, Galway"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Auton Agent Multi-Agent Syst"],"published-print":{"date-parts":[[2022,4]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Real-world sequential decision-making tasks are generally complex, requiring trade-offs between multiple, often conflicting, objectives. Despite this, the majority of research in reinforcement learning and decision-theoretic planning either assumes only a single objective, or that multiple objectives can be adequately handled via a simple linear combination. Such approaches may oversimplify the underlying problem and hence produce suboptimal results. 
This paper serves as a guide to the application of multi-objective methods to difficult problems, and is aimed at researchers who are already familiar with single-objective reinforcement learning and planning methods who wish to adopt a multi-objective perspective on their research, as well as practitioners who encounter multi-objective decision problems in practice. It identifies the factors that may influence the nature of the desired solution, and illustrates by example how these influence the design of multi-objective decision-making systems for complex problems.<\/jats:p>","DOI":"10.1007\/s10458-022-09552-y","type":"journal-article","created":{"date-parts":[[2022,4,13]],"date-time":"2022-04-13T04:03:08Z","timestamp":1649822588000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":339,"title":["A practical guide to multi-objective reinforcement learning and planning"],"prefix":"10.1007","volume":"36","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4783-7126","authenticated-orcid":false,"given":"Conor 
F.","family":"Hayes","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Roxana","family":"R\u0103dulescu","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Eugenio","family":"Bargiacchi","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Johan","family":"K\u00e4llstr\u00f6m","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Matthew","family":"Macfarlane","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Mathieu","family":"Reymond","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Timothy","family":"Verstraeten","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Luisa M.","family":"Zintgraf","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Richard","family":"Dazeley","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Fredrik","family":"Heintz","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Enda","family":"Howley","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Athirai 
A.","family":"Irissappane","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7951-878X","authenticated-orcid":false,"given":"Patrick","family":"Mannion","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Ann","family":"Now\u00e9","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Gabriel","family":"Ramos","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Marcello","family":"Restelli","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Peter","family":"Vamplew","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Diederik M.","family":"Roijers","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2022,4,13]]},"reference":[{"key":"9552_CR1","unstructured":"Abdelfattah, S., Merrick, K., & Hu, J. (2019). Intrinsically motivated hierarchical policy learning in multi-objective markov decision processes. IEEE Transactions on Cognitive and Developmental Systems."},{"key":"9552_CR2","unstructured":"Abdolmaleki, A., Huang, S., Hasenclever, L., Neunert, M., Song, F., Zambelli, M., Martins, M., Heess, N., Hadsell, R., & Riedmiller, M. (2020). A distributional view on multi-objective policy optimization. In: International Conference on Machine Learning, (pp. 11\u201322). PMLR."},{"issue":"5","key":"9552_CR3","doi-asserted-by":"publisher","first-page":"3220","DOI":"10.1016\/j.rser.2012.02.016","volume":"16","author":"M Abdullah","year":"2012","unstructured":"Abdullah, M., Yatim, A., Tan, C., & Saidur, R. (2012). A review of maximum power point tracking algorithms for wind energy systems. 
Renewable and Sustainable Energy Reviews, 16(5), 3220\u20133227.","journal-title":"Renewable and Sustainable Energy Reviews"},{"key":"9552_CR4","unstructured":"Abels, A., Roijers, D., Lenaerts, T., Now\u00e9, A., & Steckelmacher, D. (2019). Dynamic weights in multi-objective deep reinforcement learning. In: International Conference on Machine Learning, (pp. 11\u201320). PMLR."},{"key":"9552_CR5","doi-asserted-by":"crossref","unstructured":"Aho, J., Buckspan, A., Laks, J., Fleming, P., Jeong, Y., Dunne, F., Churchfield, M., Pao, L., & Johnson, K. (2012). A tutorial of wind turbine control for supporting grid frequency through active power control. In: American Control Conference (ACC), pp. 3120\u20133131.","DOI":"10.1109\/ACC.2012.6315180"},{"key":"9552_CR6","unstructured":"Aissani, N., Beldjilali, B., & Trentesaux, D. (2008). Efficient and effective reactive scheduling of manufacturing system using sarsa-multi-objective agents. In: MOSIM\u201908: 7th Conference Internationale de Modelisation et Simulation, pp. 698\u2013707."},{"issue":"6","key":"9552_CR7","doi-asserted-by":"publisher","first-page":"851","DOI":"10.1109\/TEVC.2017.2767023","volume":"22","author":"LM Antonio","year":"2017","unstructured":"Antonio, L. M., & Coello, C. A. C. (2017). Coevolutionary multiobjective evolutionary algorithms: Survey of the state-of-the-art. IEEE Transactions on Evolutionary Computation, 22(6), 851\u2013865.","journal-title":"IEEE Transactions on Evolutionary Computation"},{"key":"9552_CR8","unstructured":"Aoki, K., Kimura, H., & Kobayashi, S. (2004). Distributed reinforcement learning using bi-directional decision making for multi-criteria control of multi-stage flow systems. In: The 8th Conference on Intelligent Autonomous Systems, pp. 281\u2013290."},{"key":"9552_CR9","doi-asserted-by":"crossref","unstructured":"Aumann, R.J. (1987). Correlated equilibrium as an expression of bayesian rationality. Econometrica: Journal of the Econometric Society, pp. 
1\u201318.","DOI":"10.2307\/1911154"},{"key":"9552_CR10","doi-asserted-by":"crossref","unstructured":"Avigad, G., Eisenstadt, E., & Cohen, M.W. (2011). Optimal strategies for multi objective games and their search by evolutionary multi objective optimization. In: 2011 IEEE Conference on Computational Intelligence and Games (CIG\u201911), pp. 166\u2013173. IEEE.","DOI":"10.1109\/CIG.2011.6032003"},{"key":"9552_CR11","unstructured":"Barreto, A., Dabney, W., Munos, R., Hunt, J.J., Schaul, T., van Hasselt, H.P., & Silver, D. (2017). Successor features for transfer in reinforcement learning. In: Advances in Neural Information Processing Systems, pp. 4055\u20134065."},{"key":"9552_CR12","doi-asserted-by":"crossref","unstructured":"Barrett, L., & Narayanan, S. (2008). Learning all optimal policies with multiple criteria. In: Proceedings of the 25th International Conference on Machine Learning, pp. 41\u201347.","DOI":"10.1145\/1390156.1390162"},{"key":"9552_CR13","doi-asserted-by":"publisher","unstructured":"Beliakov, G., Bowsell, S., Cao, T., Dazeley, R., Mak-Hau, V., Nguyen, M.T., Wilkin, T., & Yearwood, J. (2019). Aggregation of dependent criteria in multicriteria decision making problems by means of capacities. In: 23rd International Congress on Modelling and Simulation. Modelling and Simulation Society of Australia and New Zealand. https:\/\/doi.org\/10.36334\/modsim.2019.B3.beliakov","DOI":"10.36334\/modsim.2019.B3.beliakov"},{"key":"9552_CR14","unstructured":"Borsa, D., Barreto, A., Quan, J., Mankowitz, D.J., van Hasselt, H., Munos, R., Silver, D., & Schaul, T. (2019). Universal successor features approximators. In: International Conference on Learning Representations."},{"key":"9552_CR15","doi-asserted-by":"crossref","unstructured":"Bouneffouf, D., Rish, I., & Aggarwal, C. (2020). Survey on applications of multi-armed and contextual bandits. In: 2020 IEEE Congress on Evolutionary Computation (CEC), pp. 1\u20138. 
IEEE.","DOI":"10.1109\/CEC48606.2020.9185782"},{"key":"9552_CR16","unstructured":"Bryce, D., Cushing, W., & Kambhampati, S. (2007). Probabilistic planning is multi-objective. Arizona State University, Tech. Rep. ASU-CSE, 07-006."},{"key":"9552_CR17","doi-asserted-by":"crossref","unstructured":"Brys, T., Van\u00a0Moffaert, K., Van\u00a0Vaerenbergh, K., & Now\u00e9, A. (2013). On the behaviour of scalarization methods for the engagement of a wet clutch. In: 2013 12th International Conference on Machine Learning and Applications, vol.\u00a01, pp. 258\u2013263. IEEE.","DOI":"10.1109\/ICMLA.2013.52"},{"key":"9552_CR18","doi-asserted-by":"crossref","unstructured":"Castelletti, A., Pianosi, F., & Restelli, M. (2012). Tree-based Fitted Q-iteration for Multi-Objective Markov Decision problems. In: IJCNN, pp. 1\u20138. IEEE.","DOI":"10.1109\/IJCNN.2012.6252759"},{"issue":"6","key":"9552_CR19","doi-asserted-by":"publisher","first-page":"3476","DOI":"10.1002\/wrcr.20295","volume":"49","author":"A Castelletti","year":"2013","unstructured":"Castelletti, A., Pianosi, F., & Restelli, M. (2013). A multiobjective reinforcement learning approach to water resources systems operation: Pareto frontier approximation in a single run. Water Resources Research, 49(6), 3476\u20133486.","journal-title":"Water Resources Research"},{"issue":"6","key":"9552_CR20","doi-asserted-by":"publisher","first-page":"1595","DOI":"10.1016\/j.automatica.2008.03.003","volume":"44","author":"A Castelletti","year":"2008","unstructured":"Castelletti, A., Pianosi, F., & Soncini-Sessa, R. (2008). Water reservoir control under economic, social and environmental constraints. Automatica, 44(6), 1595\u20131607.","journal-title":"Automatica"},{"key":"9552_CR21","doi-asserted-by":"crossref","unstructured":"Chen, W., & Liu, L. (2019). Pareto monte carlo tree search for multi-objective informative planning. 
In: Robotics: Science and Systems.","DOI":"10.15607\/RSS.2019.XV.072"},{"key":"9552_CR22","doi-asserted-by":"crossref","unstructured":"Chen, X., Ghadirzadeh, A., Bj\u00f6rkman, M., & Jensfelt, P. (2019). Meta-learning for multi-objective reinforcement learning. In: 2019 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 977\u2013983. IEEE.","DOI":"10.1109\/IROS40897.2019.8968092"},{"key":"9552_CR23","doi-asserted-by":"crossref","unstructured":"Chen, D., Wang, Y., & Gao, W. (2020). Combining a gradient-based method and an evolution strategy for multi-objective reinforcement learning. Applied Intelligence.","DOI":"10.1007\/s10489-020-01702-7"},{"key":"9552_CR24","unstructured":"Cheng, H.T. (1988). Algorithms for partially observable Markov decision processes. Ph.D. thesis, University of British Columbia."},{"key":"9552_CR25","doi-asserted-by":"publisher","unstructured":"Cohen, J. E. (1998). Cooperation and self-interest: Pareto-inefficiency of nash equilibria in finite random games. Proceedings of the National Academy of Sciences, 95(17), 9724\u20139731. https:\/\/doi.org\/10.1073\/pnas.95.17.9724. URL https:\/\/www.pnas.org\/content\/95\/17\/9724","DOI":"10.1073\/pnas.95.17.9724"},{"key":"9552_CR26","doi-asserted-by":"crossref","unstructured":"Cruz, F., Dazeley, R., & Vamplew, P. (2019). Memory-based explainable reinforcement learning. In: Australasian Joint Conference on Artificial Intelligence, pp. 66\u201377. Springer.","DOI":"10.1007\/978-3-030-35288-2_6"},{"key":"9552_CR27","doi-asserted-by":"crossref","unstructured":"da\u00a0Silva\u00a0Veith, A., de\u00a0Souza, F.R., de\u00a0Assun\u00e7\u00e3o, M.D., Lef\u00e8vre, L., & dos Anjos, J.C.S. (2019). Multi-objective reinforcement learning for reconfiguring data stream analytics on edge computing. In: Proceedings of the 48th International Conference on Parallel Processing, pp. 
1\u201310.","DOI":"10.1145\/3337821.3337894"},{"key":"9552_CR28","unstructured":"Dazeley, R., Vamplew, P., & Cruz, F. (2021). Explainable reinforcement learning for broad-xai: A conceptual framework and survey. arXiv preprint arXiv:2108.09003."},{"key":"9552_CR29","doi-asserted-by":"publisher","first-page":"103525","DOI":"10.1016\/j.artint.2021.103525","volume":"299","author":"R Dazeley","year":"2021","unstructured":"Dazeley, R., Vamplew, P., Foale, C., Young, C., Aryal, S., & Cruz, F. (2021). Levels of explainable artificial intelligence for human-aligned conversational explanations. Artificial Intelligence, 299, 103525.","journal-title":"Artificial Intelligence"},{"key":"9552_CR30","doi-asserted-by":"crossref","unstructured":"Deb, K. (2011). Multi-objective optimisation using evolutionary algorithms: an introduction. In: Multi-objective evolutionary optimisation for product design and manufacturing, pp. 3\u201334. Springer.","DOI":"10.1007\/978-0-85729-652-8_1"},{"issue":"2","key":"9552_CR31","doi-asserted-by":"publisher","first-page":"182","DOI":"10.1109\/4235.996017","volume":"6","author":"K Deb","year":"2002","unstructured":"Deb, K., Pratap, A., Agarwal, S., & Meyarivan, T. (2002). A fast and elitist multi-objective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6(2), 182\u2013197.","journal-title":"IEEE Transactions on Evolutionary Computation"},{"key":"9552_CR32","doi-asserted-by":"crossref","unstructured":"Deisenroth, M.P., Neumann, G., Peters, J., et\u00a0al. (2013). A survey on policy search for robotics. Foundations and Trends\u00ae in Robotics, 2(1\u20132), 1\u2013142.","DOI":"10.1561\/2300000021"},{"key":"9552_CR33","unstructured":"Delle\u00a0Fave, F., Stranders, R., Rogers, A., & Jennings, N. (2011). Bounded decentralised coordination over multiple objectives. In: Proceedings of the Tenth International Joint Conference on Autonomous Agents and Multiagent Systems, pp. 
371\u2013378."},{"key":"9552_CR34","doi-asserted-by":"crossref","unstructured":"Deng, Z., & Liu, M. (2018). An integrated generation-compensation optimization strategy for enhanced short-term voltage security of large-scale power systems using multi-objective reinforcement learning method. In: 2018 International Conference on Power System Technology (POWERCON), pp. 4099\u20134106. IEEE.","DOI":"10.1109\/POWERCON.2018.8601814"},{"key":"9552_CR35","doi-asserted-by":"publisher","first-page":"34770","DOI":"10.1109\/ACCESS.2020.2974503","volume":"8","author":"Z Deng","year":"2020","unstructured":"Deng, Z., Lu, Z., Guo, Z., Yao, W., Zhao, W., Zhou, B., & Hong, C. (2020). Coordinated optimization of generation and compensation to enhance short-term voltage security of power systems using accelerated multi-objective reinforcement learning. IEEE Access, 8, 34770\u201334782.","journal-title":"IEEE Access"},{"key":"9552_CR36","doi-asserted-by":"crossref","unstructured":"Dornheim, J., & Link, N. (2018). Multiobjective reinforcement learning for reconfigurable adaptive optimal control of manufacturing processes. In: 2018 International Symposium on Electronics and Telecommunications (ISETC), pp. 1\u20135. IEEE.","DOI":"10.1109\/ISETC.2018.8583854"},{"key":"9552_CR37","doi-asserted-by":"crossref","unstructured":"Drugan, M.M., & Nowe, A. (2013). Designing multi-objective multi-armed bandits algorithms: A study. In: The 2013 International Joint Conference on Neural Networks (IJCNN), pp. 1\u20138. IEEE.","DOI":"10.1109\/IJCNN.2013.6707036"},{"issue":"1","key":"9552_CR38","doi-asserted-by":"publisher","first-page":"29","DOI":"10.1109\/TCC.2014.2303077","volume":"2","author":"R Duan","year":"2014","unstructured":"Duan, R., Prodan, R., & Li, X. (2014). Multi-objective game theoretic scheduling of bag-of-tasks workflows on hybrid clouds. 
IEEE Transactions on Cloud Computing, 2(1), 29\u201342.","journal-title":"IEEE Transactions on Cloud Computing"},{"issue":"3","key":"9552_CR39","doi-asserted-by":"publisher","first-page":"285","DOI":"10.1016\/0304-4068(90)90004-S","volume":"19","author":"P Dubey","year":"1990","unstructured":"Dubey, P., & Rogawski, J. (1990). Inefficiency of smooth market mechanisms. Journal of Mathematical Economics, 19(3), 285\u2013304.","journal-title":"Journal of Mathematical Economics"},{"key":"9552_CR40","doi-asserted-by":"crossref","unstructured":"Dusparic, I., & Cahill, V. (2009). Distributed w-learning: Multi-policy optimization in self-organizing systems. In: 2009 Third IEEE International Conference on Self-Adaptive and Self-Organizing Systems, pp. 20\u201329. IEEE.","DOI":"10.1109\/SASO.2009.23"},{"key":"9552_CR41","doi-asserted-by":"crossref","unstructured":"Eisenstadt, E., Moshaiov, A., & Avigad, G. (2015). Co-evolution of strategies for multi-objective games under postponed objective preferences. In: 2015 IEEE Conference on Computational Intelligence and Games (CIG), pp. 461\u2013468. IEEE.","DOI":"10.1109\/CIG.2015.7317915"},{"key":"9552_CR42","doi-asserted-by":"crossref","unstructured":"Elfwing, S., & Seymour, B. (2017). Parallel reward and punishment control in humans and robots: Safe reinforcement learning using the maxpain algorithm. In: 2017 Joint IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob), pp. 140\u2013147. IEEE.","DOI":"10.1109\/DEVLRN.2017.8329799"},{"key":"9552_CR43","first-page":"503","volume":"6","author":"D Ernst","year":"2005","unstructured":"Ernst, D., Geurts, P., & Wehenkel, L. (2005). Tree-based batch mode reinforcement learning. 
Journal of Machine Learning Research, 6, 503\u2013556.","journal-title":"Journal of Machine Learning Research"},{"issue":"2","key":"9552_CR44","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3376916","volume":"53","author":"JG Falc\u00f3n-Cardona","year":"2020","unstructured":"Falc\u00f3n-Cardona, J. G., & Coello, C. A. C. (2020). Indicator-based multi-objective evolutionary algorithms: A comprehensive survey. ACM Computing Surveys (CSUR), 53(2), 1\u201335.","journal-title":"ACM Computing Surveys (CSUR)"},{"issue":"5","key":"9552_CR45","doi-asserted-by":"publisher","first-page":"70","DOI":"10.1109\/MCOM.2019.1800796","volume":"57","author":"PVR Ferreira","year":"2019","unstructured":"Ferreira, P. V. R., Paffenroth, R., Wyglinski, A. M., Hackett, T. M., Bilen, S. G., Reinhart, R. C., & Mortensen, D. J. (2019). Reinforcement learning for satellite communications: from leo to deep space operations. IEEE Communications Magazine, 57(5), 70\u201375.","journal-title":"IEEE Communications Magazine"},{"key":"9552_CR46","unstructured":"Finn, C., Abbeel, P., & Levine, S. (2017). Model-agnostic meta-learning for fast adaptation of deep networks. In: International Conference on Machine Learning, pp. 1126\u20131135."},{"key":"9552_CR47","unstructured":"G\u00e1bor, Z., Kalm\u00e1r, Z., & Szepesv\u00e1ri, C. (1998). Multi-criteria reinforcement learning. In: ICML,98, 197\u2013205."},{"key":"9552_CR48","doi-asserted-by":"crossref","unstructured":"Galand, L., & Lust, T. (2015). Exact methods for computing all lorenz optimal solutions to biobjective problems. In: International Conference on Algorithmic Decision Theory, pp. 305\u2013321. Springer.","DOI":"10.1007\/978-3-319-23114-3_19"},{"issue":"1","key":"9552_CR49","first-page":"1437","volume":"16","author":"J Garc\u00eda","year":"2015","unstructured":"Garc\u00eda, J., & Fern\u00e1ndez, F. (2015). A comprehensive survey on safe reinforcement learning. 
Journal of Machine Learning Research, 16(1), 1437\u20131480.","journal-title":"Journal of Machine Learning Research"},{"key":"9552_CR50","doi-asserted-by":"crossref","unstructured":"Geibel, P. (2006). Reinforcement learning for MDPs with constraints. In: European Conference on Machine Learning, pp. 646\u2013653. Springer.","DOI":"10.1007\/11871842_63"},{"key":"9552_CR51","doi-asserted-by":"publisher","first-page":"81","DOI":"10.1613\/jair.1666","volume":"24","author":"P Geibel","year":"2005","unstructured":"Geibel, P., & Wysotzki, F. (2005). Risk-sensitive reinforcement learning applied to control under constraints. Journal of Artificial Intelligence Research, 24, 81\u2013108.","journal-title":"Journal of Artificial Intelligence Research"},{"issue":"2","key":"9552_CR52","doi-asserted-by":"publisher","first-page":"04015050","DOI":"10.1061\/(ASCE)WR.1943-5452.0000570","volume":"142","author":"M Giuliani","year":"2016","unstructured":"Giuliani, M., Castelletti, A., Pianosi, F., Mason, E., & Reed, P. M. (2016). Curses, tradeoffs, and scalable management: Advancing evolutionary multiobjective direct policy search to improve water reservoir operations. Journal of Water Resources Planning and Management, 142(2), 04015050.","journal-title":"Journal of Water Resources Planning and Management"},{"key":"9552_CR53","doi-asserted-by":"publisher","first-page":"101","DOI":"10.1016\/j.envsoft.2014.02.011","volume":"57","author":"M Giuliani","year":"2014","unstructured":"Giuliani, M., Galelli, S., & Soncini-Sessa, R. (2014). A dimensionality reduction approach for many-objective markov decision processes: Application to a water reservoir operation problem. Environmental Modelling & Software, 57, 101\u2013114.","journal-title":"Environmental Modelling & Software"},{"key":"9552_CR54","doi-asserted-by":"crossref","unstructured":"Govindaiah, S., & Petty, M.D. (2019). 
Applying reinforcement learning to plan manufacturing material handling part 1: Background and formal problem specification. In: Proceedings of the 2019 ACM Southeast Conference, pp. 168\u2013171.","DOI":"10.1145\/3299815.3314451"},{"key":"9552_CR55","doi-asserted-by":"crossref","unstructured":"Grandoni, F., Krysta, P., Leonardi, S., & Ventre, C. (2010). Utilitarian mechanism design for multi-objective optimization. In: Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms, pp. 573\u2013584. Society for Industrial and Applied Mathematics.","DOI":"10.1137\/1.9781611973075.48"},{"issue":"2","key":"9552_CR56","doi-asserted-by":"publisher","first-page":"55","DOI":"10.4018\/jats.2009040104","volume":"1","author":"Y Guo","year":"2009","unstructured":"Guo, Y., Zeman, A., & Li, R. (2009). A reinforcement learning approach to setting multi-objective goals for energy demand management. International Journal of Agent Technologies and Systems (IJATS), 1(2), 55\u201370.","journal-title":"International Journal of Agent Technologies and Systems (IJATS)"},{"key":"9552_CR57","doi-asserted-by":"publisher","first-page":"107","DOI":"10.1016\/j.engappai.2019.08.014","volume":"86","author":"MM Hasan","year":"2019","unstructured":"Hasan, M. M., Lwin, K., Imani, M., Shabut, A., Bittencourt, L. F., & Hossain, M. A. (2019). Dynamic multi-objective optimisation using deep reinforcement learning: benchmark, algorithm and an application to identify vulnerable zones based on water quality. Engineering Applications of Artificial Intelligence, 86, 107\u2013135.","journal-title":"Engineering Applications of Artificial Intelligence"},{"key":"9552_CR58","first-page":"2613","volume":"23","author":"H Hasselt","year":"2010","unstructured":"Hasselt, H. (2010). Double q-learning. 
Advances in Neural Information Processing Systems, 23, 2613\u20132621.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"9552_CR59","unstructured":"Hayes, C.F., Reymond, M., Roijers, D.M., Howley, E., & Mannion, P. (2021). Distributional monte carlo tree search for risk-aware and multi-objective reinforcement learning. In: Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems, pp. 1530\u20131532."},{"key":"9552_CR60","unstructured":"Hayes, C.F., Reymond, M., Roijers, D.M., Howley, E., & Mannion, P. (2021). Risk-aware and multi-objective decision making with distributional monte carlo tree search. arXiv preprint arXiv:2102.00966."},{"key":"9552_CR61","doi-asserted-by":"crossref","unstructured":"Horie, N., Matsui, T., Moriyama, K., Mutoh, A., & Inuzuka, N. (2019). Multi-objective safe reinforcement learning. Artificial Life and Robotics pp. 1\u20139.","DOI":"10.1007\/s10015-019-00523-3"},{"key":"9552_CR62","doi-asserted-by":"crossref","unstructured":"Horwood, J., & Noutahi, E. (2020). Molecular design in synthetically accessible chemical space via deep reinforcement learning. arXiv preprint arXiv:2004.14308.","DOI":"10.1021\/acsomega.0c04153"},{"key":"9552_CR63","doi-asserted-by":"crossref","unstructured":"Hu, X., Zhang, Y., Liao, X., Liu, Z., Wang, W., & Ghannouchi, F.M. (2020). Dynamic beam hopping method based on multi-objective deep reinforcement learning for next generation satellite broadband systems. IEEE Transactions on Broadcasting.","DOI":"10.1109\/TBC.2019.2960940"},{"key":"9552_CR64","unstructured":"Huang, S.H., Zambelli, M., Kay, J., Martins, M.F., Tassa, Y., Pilarski, P.M., & Hadsell, R. (2019). Learning gentle object manipulation with curiosity-driven deep reinforcement learning. arXiv preprint arXiv:1903.08542."},{"key":"9552_CR65","doi-asserted-by":"crossref","unstructured":"Igarashi, A., & Roijers, D.M. (2017). Multi-criteria coalition formation games. 
In: International Conference on Algorithmic Decision Theory, pp. 197\u2013213. Springer.","DOI":"10.1007\/978-3-319-67504-6_14"},{"key":"9552_CR66","doi-asserted-by":"crossref","unstructured":"Ikenaga, A., & Arai, S. (2018). Inverse reinforcement learning approach for elicitation of preferences in multi-objective sequential optimization. In: 2018 IEEE International Conference on Agents (ICA), pp. 117\u2013118. IEEE.","DOI":"10.1109\/AGENTS.2018.8460075"},{"key":"9552_CR67","doi-asserted-by":"crossref","unstructured":"Inja, M., Kooijman, C., de\u00a0Waard, M., Roijers, D.M., & Whiteson, S. (2014). Queued pareto local search for multi-objective optimization. In: International Conference on Parallel Problem Solving from Nature, pp. 589\u2013599. Springer.","DOI":"10.1007\/978-3-319-10762-2_58"},{"key":"9552_CR68","doi-asserted-by":"crossref","unstructured":"Issabekov, R., & Vamplew, P. (2012). An empirical comparison of two common multiobjective reinforcement learning algorithms. In: Australasian Joint Conference on Artificial Intelligence, pp. 626\u2013636. Springer.","DOI":"10.1007\/978-3-642-35101-3_53"},{"issue":"5","key":"9552_CR69","doi-asserted-by":"publisher","first-page":"1071","DOI":"10.1080\/0952813X.2017.1292319","volume":"29","author":"A Jalalimanesh","year":"2017","unstructured":"Jalalimanesh, A., Haghighi, H. S., Ahmadi, A., Hejazian, H., & Soltani, M. (2017). Multi-objective optimization of radiotherapy: distributed q-learning and agent-based simulation. Journal of Experimental & Theoretical artificial intelligence, 29(5), 1071\u20131086.","journal-title":"Journal of Experimental & Theoretical artificial intelligence"},{"issue":"10","key":"9552_CR70","doi-asserted-by":"publisher","first-page":"3900","DOI":"10.1109\/TITS.2019.2906260","volume":"20","author":"J Jin","year":"2019","unstructured":"Jin, J., & Ma, X. (2019). A multi-objective agent-based control approach with application in intelligent traffic signal system. 
IEEE Transactions on Intelligent Transportation Systems, 20(10), 3900\u20133912.","journal-title":"IEEE Transactions on Intelligent Transportation Systems"},{"key":"9552_CR71","doi-asserted-by":"crossref","unstructured":"Jonker, C.M., Aydo\u011fan, R., Baarslag, T., Fujita, K., Ito, T., & Hindriks, K. (2017). Automated negotiating agents competition (anac). In: Thirty-First AAAI Conference on Artificial Intelligence.","DOI":"10.1609\/aaai.v31i1.10637"},{"key":"9552_CR72","unstructured":"Juozapaitis, Z., Koul, A., Fern, A., Erwig, M., & Doshi-Velez, F. (2019). Explainable reinforcement learning via reward decomposition. In: IJCAI\/ECAI Workshop on Explainable Artificial Intelligence."},{"key":"9552_CR73","doi-asserted-by":"publisher","first-page":"39","DOI":"10.1016\/j.neucom.2017.04.074","volume":"263","author":"TG Karimpanal","year":"2017","unstructured":"Karimpanal, T. G., & Wilhelm, E. (2017). Identification and off-policy learning of multiple objectives using adaptive clustering. Neurocomputing, 263, 39\u201347.","journal-title":"Neurocomputing"},{"key":"9552_CR74","unstructured":"Kluyver, T., Ragan-Kelley, B., P\u00e9rez, F., Granger, B., Bussonnier, M., Frederic, J., Kelley, K., Hamrick, J., Grout, J., Corlay, S., Ivanov, P., Avila, D., Abdalla, S., Willing, C., & development team, J. (2016). Jupyter notebooks - a publishing format for reproducible computational workflows. In: F.\u00a0Loizides, B.\u00a0Schmidt (eds.) Positioning and Power in Academic Publishing: Players, Agents and Agendas, pp. 87\u201390. IOS Press, Netherlands. URL https:\/\/eprints.soton.ac.uk\/403913\/"},{"key":"9552_CR75","unstructured":"Kooijman, C., de\u00a0Waard, M., Inja, M., Roijers, D., & Whiteson, S. (2015). Pareto local policy search for momdp planning. In: ESANN 2015: Proceedings of the 23rd European Symposium on Artificial Neural Networks, Special Session on Emerging Techniques and Applications in Multi-Objective Reinforcement Learning, pp. 53\u201358. 
URL http:\/\/www.cs.ox.ac.uk\/people\/shimon.whiteson\/pubs\/kooijmanesann15.pdf"},{"key":"9552_CR76","doi-asserted-by":"publisher","first-page":"8","DOI":"10.1016\/j.engappai.2019.01.010","volume":"80","author":"E Krasheninnikova","year":"2019","unstructured":"Krasheninnikova, E., Garc\u00eda, J., Maestre, R., & Fern\u00e1ndez, F. (2019). Reinforcement learning for pricing strategy optimization in the insurance industry. Engineering Applications of Artificial Intelligence, 80, 8\u201319.","journal-title":"Engineering Applications of Artificial Intelligence"},{"issue":"1","key":"9552_CR77","doi-asserted-by":"publisher","first-page":"53","DOI":"10.1111\/biom.12132","volume":"70","author":"EB Laber","year":"2014","unstructured":"Laber, E. B., Lizotte, D. J., & Ferguson, B. (2014). Set-valued dynamic treatment regimes for competing outcomes. Biometrics, 70(1), 53\u201361.","journal-title":"Biometrics"},{"key":"9552_CR78","doi-asserted-by":"publisher","first-page":"12","DOI":"10.1016\/j.neucom.2016.12.076","volume":"246","author":"A Lacerda","year":"2017","unstructured":"Lacerda, A. (2017). Multi-objective ranked bandits for recommender systems. Neurocomputing, 246, 12\u201324.","journal-title":"Neurocomputing"},{"issue":"6","key":"9552_CR79","doi-asserted-by":"publisher","first-page":"608","DOI":"10.1016\/j.chemosphere.2012.01.014","volume":"87","author":"CS Lee","year":"2012","unstructured":"Lee, C. S. (2012). Multi-objective game-theory models for conflict analysis in reservoir watershed management. Chemosphere, 87(6), 608\u2013613.","journal-title":"Chemosphere"},{"key":"9552_CR80","doi-asserted-by":"crossref","unstructured":"Lepenioti, K., Pertselakis, M., Bousdekis, A., Louca, A., Lampathaki, F., Apostolou, D., Mentzas, G., & Anastasiou, S. (2020). Machine learning for predictive and prescriptive analytics of operational data in smart manufacturing. In: International Conference on Advanced Information Systems Engineering, pp. 5\u201316. 
Springer.","DOI":"10.1007\/978-3-030-49165-9_1"},{"key":"9552_CR81","unstructured":"Li, C., & Czarnecki, K. (2019). Urban driving with multi-objective deep reinforcement learning. In: Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, pp. 359\u2013367. International Foundation for Autonomous Agents and Multiagent Systems."},{"key":"9552_CR82","unstructured":"Li, K., Zhang, T., & Wang, R. (2020). Deep reinforcement learning for multiobjective optimization. IEEE Transactions on Cybernetics."},{"issue":"1","key":"9552_CR83","doi-asserted-by":"publisher","first-page":"288","DOI":"10.1016\/j.eswa.2011.07.019","volume":"39","author":"X Li","year":"2012","unstructured":"Li, X., Gao, L., & Li, W. (2012). Application of game theory based hybrid algorithm for multi-objective integrated process planning and scheduling. Expert Systems with Applications, 39(1), 288\u2013297.","journal-title":"Expert Systems with Applications"},{"issue":"1","key":"9552_CR84","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/2792984","volume":"48","author":"B Li","year":"2015","unstructured":"Li, B., Li, J., Tang, K., & Yao, X. (2015). Many-objective evolutionary algorithms: A survey. ACM Computing Surveys (CSUR), 48(1), 1\u201335.","journal-title":"ACM Computing Surveys (CSUR)"},{"key":"9552_CR85","unstructured":"Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., & Wierstra, D. (2015). Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971."},{"key":"9552_CR86","unstructured":"Lizotte, D.J., Bowling, M.H., & Murphy, S.A. (2010). Efficient reinforcement learning with multiple reward functions for randomized controlled trial analysis. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 695\u2013702. Citeseer."},{"key":"9552_CR87","unstructured":"Ma, C., Wen, J., & Bengio, Y. (2018). 
Universal successor representations for transfer reinforcement learning. arXiv preprint arXiv:1804.03758."},{"key":"9552_CR88","doi-asserted-by":"crossref","unstructured":"Mandel, T., Liu, Y.E., Brunskill, E., & Popovic, Z. (2017). Where to add actions in human-in-the-loop reinforcement learning. In: AAAI, pp. 2322\u20132328.","DOI":"10.1609\/aaai.v31i1.10945"},{"key":"9552_CR89","doi-asserted-by":"crossref","unstructured":"Mandow, L., & P\u00e9rez-de-la Cruz, J.L. (2018). Pruning dominated policies in multiobjective Pareto q-learning. In: Conference of the Spanish Association for Artificial Intelligence, pp. 240\u2013250. Springer.","DOI":"10.1007\/978-3-030-00374-6_23"},{"key":"9552_CR90","doi-asserted-by":"crossref","unstructured":"Mannion, P., Devlin, S., Duggan, J., & Howley, E. (2018). Reward shaping for knowledge-based multi-objective multi-agent reinforcement learning. The Knowledge Engineering Review, 33(e23). URL https:\/\/doi.org\/10.1017\/S0269888918000292.","DOI":"10.1017\/S0269888918000292"},{"key":"9552_CR91","doi-asserted-by":"crossref","unstructured":"Mannion, P., Devlin, S., Mason, K., Duggan, J., & Howley, E. (2017). Policy invariance under reward transformations for multi-objective reinforcement learning. Neurocomputing, 263.","DOI":"10.1016\/j.neucom.2017.05.090"},{"key":"9552_CR92","doi-asserted-by":"publisher","unstructured":"Mannion, P., Duggan, J., & Howley, E. (2016). An experimental review of reinforcement learning algorithms for adaptive traffic signal control. In: Autonomic Road Transport Support Systems, pp. 47\u201366. Springer, Cham. https:\/\/doi.org\/10.1007\/978-3-319-25808-9_4","DOI":"10.1007\/978-3-319-25808-9_4"},{"key":"9552_CR93","unstructured":"Mannion, P., Heintz, F., Karimpanal, T.G., & Vamplew, P. (2021). Multi-objective decision making for trustworthy ai. In: Proceedings of the Multi-Objective Decision Making (MODeM) Workshop."},{"key":"9552_CR94","doi-asserted-by":"crossref","unstructured":"Marinescu, R. (2009). 
Exploiting problem decomposition in multi-objective constraint optimization. In: International Conference on Principles and Practice of Constraint Programming, pp. 592\u2013607. Springer.","DOI":"10.1007\/978-3-642-04244-7_47"},{"key":"9552_CR95","doi-asserted-by":"crossref","unstructured":"Marinescu, R. (2011). Efficient approximation algorithms for multi-objective constraint optimization. In: ADT 2011: Proceedings of the Second International Conference on Algorithmic Decision Theory, pp. 150\u2013164.","DOI":"10.1007\/978-3-642-24873-3_12"},{"key":"9552_CR96","doi-asserted-by":"crossref","unstructured":"Matsui, T. (2019). A study of joint policies considering bottlenecks and fairness. In: ICAART (1), pp. 80\u201390.","DOI":"10.5220\/0007577800800090"},{"key":"9552_CR97","unstructured":"Mello, F., Apostolopoulou, D., & Alonso, E. (2020). Cost efficient distributed load frequency control in power systems. In: 21st IFAC World Congress."},{"key":"9552_CR98","doi-asserted-by":"crossref","unstructured":"M\u00e9ndez-Hern\u00e1ndez, B.M., Rodr\u00edguez-Bazan, E.D., Martinez-Jimenez, Y., Libin, P., & Now\u00e9, A. (2019). A multi-objective reinforcement learning algorithm for jssp. In: International Conference on Artificial Neural Networks, pp. 567\u2013584. Springer.","DOI":"10.1007\/978-3-030-30487-4_44"},{"key":"9552_CR99","doi-asserted-by":"publisher","first-page":"945","DOI":"10.1016\/j.jclepro.2017.10.297","volume":"174","author":"EJN Menezes","year":"2018","unstructured":"Menezes, E. J. N., Ara\u00fajo, A. M., & da Silva, N. S. B. (2018). A review on wind turbine control and its associated methods. Journal of Cleaner Production, 174, 945\u2013953.","journal-title":"Journal of Cleaner Production"},{"key":"9552_CR100","doi-asserted-by":"crossref","unstructured":"Messikh, C., & Zarour, N. (2018). Towards a multi-objective reinforcement learning based routing protocol for cognitive radio networks. 
In: 2018 International Conference on Smart Communications in Network Technologies (SaCoNeT), pp. 84\u201389. IEEE.","DOI":"10.1109\/SaCoNeT.2018.8585717"},{"issue":"7540","key":"9552_CR101","doi-asserted-by":"publisher","first-page":"529","DOI":"10.1038\/nature14236","volume":"518","author":"V Mnih","year":"2015","unstructured":"Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., et al. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529\u2013533.","journal-title":"Nature"},{"key":"9552_CR102","doi-asserted-by":"crossref","unstructured":"Moghaddam, A., Yalaoui, F., & Amodeo, L. (2011). Lorenz versus pareto dominance in a single machine scheduling problem with rejection. In: International Conference on Evolutionary Multi-Criterion Optimization, pp. 520\u2013534. Springer.","DOI":"10.1007\/978-3-642-19893-9_36"},{"key":"9552_CR103","unstructured":"Mossalam, H., Assael, Y.M., Roijers, D.M., & Whiteson, S. (2016). Multi-objective deep reinforcement learning. In: NIPS 2016 Workshop on Deep Reinforcement Learning."},{"key":"9552_CR104","doi-asserted-by":"crossref","unstructured":"Economides, A.A., Silvester, J.A., et al. (1991). Multi-objective routing in integrated services networks: A game theory approach. In: Infocom '91, pp. 1220\u20131227.","DOI":"10.1109\/INFCOM.1991.147643"},{"key":"9552_CR105","unstructured":"Nagabandi, A., Clavera, I., Liu, S., Fearing, R.S., Abbeel, P., Levine, S., & Finn, C. (2019). Learning to adapt in dynamic, real-world environments through meta-reinforcement learning. In: Proceedings of Seventh International Conference on Learning Representations."},{"issue":"2","key":"9552_CR106","doi-asserted-by":"publisher","first-page":"286","DOI":"10.2307\/1969529","volume":"54","author":"J Nash","year":"1951","unstructured":"Nash, J. (1951). Non-cooperative games. 
Annals of Mathematics, 54(2), 286\u2013295.","journal-title":"Annals of Mathematics"},{"key":"9552_CR107","doi-asserted-by":"crossref","unstructured":"Natarajan, S., & Tadepalli, P. (2005). Dynamic preferences in multi-criteria reinforcement learning. In: Proceedings of the 22nd international conference on Machine learning, pp. 601\u2013608.","DOI":"10.1145\/1102351.1102427"},{"key":"9552_CR108","unstructured":"Nguyen, M., & Cao, T. (2017). A hybrid decision making model for evaluating land combat vehicle system. In: 22nd International Congress on Modelling and Simulation, MODSIM2017, Modelling and Simulation Society of Australia and New Zealand, pp. 1399\u20131405."},{"key":"9552_CR109","doi-asserted-by":"publisher","first-page":"103915","DOI":"10.1016\/j.engappai.2020.103915","volume":"96","author":"TT Nguyen","year":"2020","unstructured":"Nguyen, T. T., Nguyen, N. D., Vamplew, P., Nahavandi, S., Dazeley, R., & Lim, C. P. (2020). A multi-objective deep reinforcement learning framework. Engineering Applications of Artificial Intelligence, 96, 103915.","journal-title":"Engineering Applications of Artificial Intelligence"},{"key":"9552_CR110","unstructured":"Nian, X., Irissappane, A.A., & Roijers, D. (2020). DCRAC: Deep conditioned recurrent actor-critic for multi-objective partially observable environments. In: Proceedings of the 19th International Conference on Autonomous Agents and MultiAgent Systems, pp. 931\u2013938."},{"key":"9552_CR111","unstructured":"Noothigattu, R., Bouneffouf, D., Mattei, N., Chandra, R., Madan, P., Varshney, K., Campbell, M., Singh, M., & Rossi, F. (2018). Interpretable multi-objective reinforcement learning through policy orchestration. arXiv preprint arXiv:1809.08343."},{"key":"9552_CR112","doi-asserted-by":"crossref","unstructured":"Ort\u00fazar, J.d.D., & Willumsen, L.G. (2011). Modelling transport (4th ed.). 
Chichester, UK: John Wiley & Sons.","DOI":"10.1002\/9781119993308"},{"key":"9552_CR113","doi-asserted-by":"publisher","first-page":"105392","DOI":"10.1016\/j.knosys.2019.105392","volume":"193","author":"A Pan","year":"2020","unstructured":"Pan, A., Xu, W., Wang, L., & Ren, H. (2020). Additional planning with multiple objectives for reinforcement learning. Knowledge-Based Systems, 193, 105392.","journal-title":"Knowledge-Based Systems"},{"key":"9552_CR114","doi-asserted-by":"crossref","unstructured":"Parisi, S., Pirotta, M., Smacchia, N., Bascetta, L., Restelli, M. (2014). Policy gradient approaches for multi-objective sequential decision making. In: IJCNN, pp. 2323\u20132330. IEEE.","DOI":"10.1109\/IJCNN.2014.6889738"},{"key":"9552_CR115","doi-asserted-by":"publisher","first-page":"3","DOI":"10.1016\/j.neucom.2016.11.094","volume":"263","author":"S Parisi","year":"2017","unstructured":"Parisi, S., Pirotta, M., & Peters, J. (2017). Manifold-based multi-objective policy search with sample reuse. Neurocomputing, 263, 3\u201314.","journal-title":"Neurocomputing"},{"key":"9552_CR116","doi-asserted-by":"publisher","first-page":"187","DOI":"10.1613\/jair.4961","volume":"57","author":"S Parisi","year":"2016","unstructured":"Parisi, S., Pirotta, M., & Restelli, M. (2016). Multi-objective reinforcement learning through continuous pareto manifold approximation. Journal of Artificial Intelligence Research, 57, 187\u2013227.","journal-title":"Journal of Artificial Intelligence Research"},{"key":"9552_CR117","doi-asserted-by":"crossref","unstructured":"Perez, J., Germain-Renaud, C., K\u00e9gl, B., & Loomis, C. (2009). Responsive elastic computing. In: Proceedings of the 6th International Conference Industry Session on Grids Meets Autonomic Computing, pp. 55\u201364.","DOI":"10.1145\/1555301.1555311"},{"key":"9552_CR118","doi-asserted-by":"crossref","unstructured":"Perez, D., Samothrakis, S., & Lucas, S. (2013). 
Online and offline learning in multi-objective monte carlo tree search. In: 2013 IEEE Conference on Computational Inteligence in Games (CIG), pp. 1\u20138. IEEE.","DOI":"10.1109\/CIG.2013.6633621"},{"issue":"3","key":"9552_CR119","doi-asserted-by":"publisher","first-page":"473","DOI":"10.1007\/s10723-010-9161-0","volume":"8","author":"J Perez","year":"2010","unstructured":"Perez, J., Germain-Renaud, C., K\u00e9gl, B., & Loomis, C. (2010). Multi-objective reinforcement learning for responsive grids. Journal of Grid Computing, 8(3), 473\u2013492.","journal-title":"Journal of Grid Computing"},{"key":"9552_CR120","unstructured":"Perny, P., & Weng, P. (2010). On finding compromise solutions in multiobjective markov decision processes. In: ECAI, vol. 215, pp. 969\u2013970."},{"key":"9552_CR121","unstructured":"Perny, P., Weng, P., Goldsmith, J., & Hanna, J. (2013). Approximation of lorenz-optimal solutions in multiobjective markov decision processes. In: Proceedings of the 27th AAAI Conference on Artificial Intelligence, pp. 92\u201394."},{"issue":"2","key":"9552_CR122","doi-asserted-by":"publisher","first-page":"258","DOI":"10.2166\/hydro.2013.169","volume":"15","author":"F Pianosi","year":"2013","unstructured":"Pianosi, F., Castelletti, A., & Restelli, M. (2013). Tree-based fitted Q-iteration for multi-objective Markov decision processes in water resource management. Journal of Hydroinformatics, 15(2), 258\u2013270.","journal-title":"Journal of Hydroinformatics"},{"key":"9552_CR123","doi-asserted-by":"crossref","unstructured":"Pla, A., Lopez, B., & Murillo, J. (2012). Multi criteria operators for multi-attribute auctions. In: International Conference on Modeling Decisions for Artificial Intelligence, pp. 318\u2013328. 
Springer.","DOI":"10.1007\/978-3-642-34620-0_29"},{"issue":"1","key":"9552_CR124","doi-asserted-by":"publisher","first-page":"455","DOI":"10.1007\/s11227-019-03033-y","volume":"76","author":"Y Qin","year":"2020","unstructured":"Qin, Y., Wang, H., Yi, S., Li, X., & Zhai, L. (2020). An energy-aware scheduling algorithm for budget-constrained scientific workflows based on multi-objective reinforcement learning. The Journal of Supercomputing, 76(1), 455\u2013480.","journal-title":"The Journal of Supercomputing"},{"issue":"9","key":"9552_CR125","doi-asserted-by":"publisher","first-page":"e0138970","DOI":"10.1371\/journal.pone.0138970","volume":"10","author":"S Qu","year":"2015","unstructured":"Qu, S., Ji, Y., & Goh, M. (2015). The robust weighted multi-objective game. PloS one, 10(9), e0138970.","journal-title":"PloS one"},{"key":"9552_CR126","doi-asserted-by":"crossref","unstructured":"R\u0103dulescu, R., Mannion, P., Roijers, D.M., & Now\u00e9, A. (2020). Multi-objective multi-agent decision making: a utility-based analysis and survey. Autonomous Agents and Multi-Agent Systems, 34(10).","DOI":"10.1007\/s10458-019-09433-x"},{"key":"9552_CR127","doi-asserted-by":"publisher","first-page":"e32","DOI":"10.1017\/S0269888920000351","volume":"35","author":"R R\u0103dulescu","year":"2020","unstructured":"R\u0103dulescu, R., Mannion, P., Zhang, Y., Roijers, D. M., & Now\u00e9, A. (2020). A utility-based analysis of equilibria in multi-objective normal-form games. The Knowledge Engineering Review, 35, e32. https:\/\/doi.org\/10.1017\/S0269888920000351.","journal-title":"The Knowledge Engineering Review"},{"key":"9552_CR128","doi-asserted-by":"publisher","DOI":"10.1007\/s00521-021-06184-3","author":"R R\u0103dulescu","year":"2021","unstructured":"R\u0103dulescu, R., Verstraeten, T., Zhang, Y., Mannion, P., Roijers, D. M., & Now\u00e9, A. (2021). Opponent learning awareness and modelling in multi-objective normal form games. Neural Computing and Applications. 
https:\/\/doi.org\/10.1007\/s00521-021-06184-3.","journal-title":"Neural Computing and Applications"},{"issue":"1","key":"9552_CR129","doi-asserted-by":"publisher","first-page":"144","DOI":"10.1007\/s10776-019-00463-6","volume":"27","author":"RN Raj","year":"2020","unstructured":"Raj, R. N., Nayak, A., & Kumar, M. S. (2020). A survey and performance evaluation of reinforcement learning based spectrum aware routing in cognitive radio ad hoc networks. International Journal of Wireless Information Networks, 27(1), 144\u2013163.","journal-title":"International Journal of Wireless Information Networks"},{"key":"9552_CR130","doi-asserted-by":"publisher","unstructured":"Ramos, G.de.O., da Silva, B.C., R\u0103dulescu, R., Bazzan, A.L.C., & Now\u00e9, A. (2020). Toll-based reinforcement learning for efficient equilibria in route choice. The Knowledge Engineering Review, 35, e8. https:\/\/doi.org\/10.1017\/S0269888920000119.","DOI":"10.1017\/S0269888920000119"},{"key":"9552_CR131","unstructured":"Ramos, G.de.O., R\u0103dulescu, R., Now\u00e9, A., & Tavares, A.R. (2020). Toll-based learning for minimising congestion under heterogeneous preferences. In: B.\u00a0An, N.\u00a0Yorke-Smith, A.\u00a0El\u00a0Fallah\u00a0Seghrouchni, G.\u00a0Sukthankar (eds.) Proc. of the 19th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2020), pp. 1098\u20131106. IFAAMAS, Auckland, New Zealand."},{"key":"9552_CR132","doi-asserted-by":"crossref","unstructured":"Ravichandran, N.B., Yang, F., Peters, C., Lansner, A., & Herman, P. (2018). Pedestrian simulation as multi-objective reinforcement learning. In: Proceedings of the 18th International Conference on Intelligent Virtual Agents, pp. 307\u2013312.","DOI":"10.1145\/3267851.3267914"},{"issue":"6","key":"9552_CR133","doi-asserted-by":"publisher","first-page":"861","DOI":"10.1007\/s11269-005-9011-1","volume":"20","author":"MJ Reddy","year":"2006","unstructured":"Reddy, M. J., & Kumar, D. N. (2006). 
Optimal reservoir operation using multi-objective evolutionary algorithm. Water Resources Management, 20(6), 861\u2013878.","journal-title":"Water Resources Management"},{"key":"9552_CR134","unstructured":"Reymond, M., & Now\u00e9, A. (2019). Pareto-DQN: Approximating the pareto front in complex multi-objective decision problems. In: Proceedings of the adaptive and learning agents workshop (ALA-19) at AAMAS."},{"key":"9552_CR135","unstructured":"Reymond, M., Hayes, C., Roijers, D.M., Steckelmacher, D., & Now\u00e9, A. (2021). Actor-critic multi-objective reinforcement learning for non-linear utility functions. In: Multi-Objective Decision Making Workshop (MODeM 2021)."},{"key":"9552_CR136","doi-asserted-by":"crossref","unstructured":"Roijers, D.M. (2016). Multi-objective decision-theoretic planning. Ph.D. thesis, University of Amsterdam.","DOI":"10.1145\/3008665.3008670"},{"key":"9552_CR137","unstructured":"Roijers, D.M., R\u00f6pke, W., Now\u00e9, A., & R\u0103dulescu, R. (2021). On following pareto-optimal policies in multi-objective planning and reinforcement learning. In: Proceedings of the Multi-Objective Decision Making (MODeM) Workshop."},{"key":"9552_CR138","unstructured":"Roijers, D.M., Steckelmacher, D., & Now\u00e9, A. (2018). Multi-objective reinforcement learning for the expected utility of the return. In: Proceedings of the Adaptive and Learning Agents workshop at FAIM, vol. 2018."},{"key":"9552_CR139","doi-asserted-by":"crossref","unstructured":"Roijers, D.M., Walraven, E., & Spaan, M.T.J. (2018). Bootstrapping LPs in value iteration for multi-objective and partially observable MDPs. In: Proceedings of the Twenty-Eighth International Conference on Automated Planning and Scheduling (ICAPS), pp. 218\u2013226.","DOI":"10.1609\/icaps.v28i1.13903"},{"key":"9552_CR140","unstructured":"Roijers, D.M., Whiteson, S., & Oliehoek, F.A. (2015). Point-based planning for multi-objective pomdps. 
In: Proceedings of the twenty-fourth international joint conference on artificial intelligence (IJCAI), pp. 1666\u20131672."},{"key":"9552_CR141","doi-asserted-by":"crossref","unstructured":"Roijers, D.M., Zintgraf, L.M., & Now\u00e9, A. (2017). Interactive thompson sampling for multi-objective multi-armed bandits. In: International Conference on Algorithmic Decision Theory, pp. 18\u201334. Springer.","DOI":"10.1007\/978-3-319-67504-6_2"},{"key":"9552_CR142","unstructured":"Roijers, D., Zintgraf, L., Libin, P., & Nowe, A. (2018). Interactive multi-objective reinforcement learning in multi-armed bandits for any utility function. In: Proceedings of the adaptive and learning agents workshop (ALA-18) at AAMAS."},{"key":"9552_CR143","doi-asserted-by":"crossref","unstructured":"Roijers, D.M., Zintgraf, L.M., Libin, P., Reymond, M., Bargiacchi, E., & Now\u00e9, A. (2020). Interactive multi-objective reinforcement learning in multi-armed bandits with gaussian process utility models. In: ECML-PKDD 2020: Proceedings of the 2020 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, p.\u00a016.","DOI":"10.1007\/978-3-030-67664-3_28"},{"key":"9552_CR144","doi-asserted-by":"publisher","first-page":"67","DOI":"10.1613\/jair.3987","volume":"48","author":"DM Roijers","year":"2013","unstructured":"Roijers, D. M., Vamplew, P., Whiteson, S., & Dazeley, R. (2013). A survey of multi-objective sequential decision-making. Journal of Artificial Intelligence Research, 48, 67\u2013113.","journal-title":"Journal of Artificial Intelligence Research"},{"issue":"1","key":"9552_CR145","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/978-3-031-01576-2","volume":"11","author":"DM Roijers","year":"2017","unstructured":"Roijers, D. M., & Whiteson, S. (2017). Multi-objective decision making. 
Synthesis Lectures on Artificial Intelligence and Machine Learning, 11(1), 1\u2013129.","journal-title":"Synthesis Lectures on Artificial Intelligence and Machine Learning"},{"key":"9552_CR146","doi-asserted-by":"publisher","first-page":"399","DOI":"10.1613\/jair.4550","volume":"52","author":"DM Roijers","year":"2015","unstructured":"Roijers, D. M., Whiteson, S., & Oliehoek, F. A. (2015). Computing convex coverage sets for faster multi-objective coordination. Journal of Artificial Intelligence Research, 52, 399\u2013443.","journal-title":"Journal of Artificial Intelligence Research"},{"key":"9552_CR147","unstructured":"Roll\u00f3n, E. (2008). Multi-objective optimization for graphical models. Ph.D. thesis, Universitat Polit\u00e8cnica de Catalunya, Barcelona."},{"key":"9552_CR148","unstructured":"Rollon, E., & Larrosa, J. (2007). Multi-objective russian doll search. In: AAAI, pp. 249\u2013254."},{"key":"9552_CR149","unstructured":"Rollon, E., & Larrosa, J. (2008). Constraint optimization techniques for multiobjective branch and bound search. In: International conference on logic programming, ICLP."},{"key":"9552_CR150","doi-asserted-by":"publisher","first-page":"307","DOI":"10.1007\/s10732-006-6726-y","volume":"12","author":"E Roll\u00f3n","year":"2006","unstructured":"Roll\u00f3n, E., & Larrosa, J. (2006). Bucket elimination for multiobjective optimization problems. Journal of Heuristics, 12, 307\u2013328.","journal-title":"Journal of Heuristics"},{"key":"9552_CR151","unstructured":"Rowe, J., Smith, A., Pokorny, B., Mott, B., & Lester, J. (2018). Toward automated scenario generation with deep reinforcement learning in gift. In: Proceedings of the Sixth Annual GIFT User Symposium, pp. 65\u201374."},{"key":"9552_CR152","doi-asserted-by":"publisher","first-page":"15","DOI":"10.1016\/j.neucom.2016.10.100","volume":"263","author":"M Ruiz-Montiel","year":"2017","unstructured":"Ruiz-Montiel, M., Mandow, L., & P\u00e9rez-de-la Cruz, J. L. (2017). 
A temporal difference method for multi-objective reinforcement learning. Neurocomputing, 263, 15\u201325.","journal-title":"Neurocomputing"},{"key":"9552_CR153","doi-asserted-by":"crossref","unstructured":"Saisubramanian, S., Kamar, E., & Zilberstein, S. (2020). A multi-objective approach to mitigate negative side effects. In: Proceedings of the 29th International Joint Conference on Artificial Intelligence.","DOI":"10.24963\/ijcai.2020\/50"},{"key":"9552_CR154","unstructured":"Schaul, T., Horgan, D., Gregor, K., & Silver, D. (2015). Universal value function approximators. In: International conference on machine learning, pp. 1312\u20131320."},{"key":"9552_CR155","unstructured":"Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347."},{"key":"9552_CR156","unstructured":"Shabani, N. (2009). Incorporating flood control rule curves of the Columbia River hydroelectric system in a multireservoir reinforcement learning optimization model. Ph.D. thesis, University of British Columbia."},{"key":"9552_CR157","unstructured":"Siddique, U., Weng, P., & Zimmer, M. (2020). Learning fair policies in multiobjective (deep) reinforcement learning with average and discounted rewards. In: International Conference on Machine Learning."},{"key":"9552_CR158","doi-asserted-by":"publisher","unstructured":"Silver, D., Singh, S., Precup, D., & Sutton, R. S. (2021). Reward is enough. Artificial Intelligence, 299, 103535. https:\/\/doi.org\/10.1016\/j.artint.2021.103535. URL https:\/\/www.sciencedirect.com\/science\/article\/pii\/S0004370221000862","DOI":"10.1016\/j.artint.2021.103535"},{"key":"9552_CR159","unstructured":"Smith, B. J., Klassert, R., & Pihlakas, R. (2021). Soft maximin approaches to multi-objective decision-making for encoding human intuitive values. In: Multi-Objective Decision Making Workshop."},{"key":"9552_CR160","doi-asserted-by":"crossref","unstructured":"Soh, H., & Demiris, Y. 
(2011). Evolving policies for multi-reward partially observable markov decision processes (MR-POMDPs). In: Proceedings of the 13th annual conference on Genetic and evolutionary computation, pp. 713\u2013720.","DOI":"10.1145\/2001576.2001674"},{"key":"9552_CR161","doi-asserted-by":"crossref","unstructured":"Soh, H., & Demiris, Y. (2011). Multi-reward policies for medical applications: Anthrax attacks and smart wheelchairs. In: Proceedings of the 13th annual conference companion on Genetic and evolutionary computation, pp. 471\u2013478.","DOI":"10.1145\/2001858.2002036"},{"issue":"1","key":"9552_CR162","doi-asserted-by":"publisher","first-page":"136","DOI":"10.3390\/app8010136","volume":"8","author":"Y Sun","year":"2018","unstructured":"Sun, Y., Li, Y., Xiong, W., Yao, Z., Moniz, K., & Zahir, A. (2018). Pareto optimal solutions for network defense strategy selection simulator in multi-objective reinforcement learning. Applied Sciences, 8(1), 136.","journal-title":"Applied Sciences"},{"key":"9552_CR163","volume-title":"Reinforcement learning: An introduction","author":"RS Sutton","year":"2018","unstructured":"Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction. Cambridge: MIT Press."},{"key":"9552_CR164","doi-asserted-by":"crossref","unstructured":"Tajmajer, T. (2018). Modular multi-objective deep reinforcement learning with decision values. In: Federated conference on computer science and information systems (FedCSIS), pp. 85\u201393. IEEE.","DOI":"10.15439\/2018F231"},{"key":"9552_CR165","doi-asserted-by":"crossref","unstructured":"Taylor, A., Dusparic, I., Galv\u00e1n-L\u00f3pez, E., Clarke, S., & Cahill, V. (2014). Accelerating learning in multi-objective systems through transfer learning. In: Neural Networks (IJCNN), 2014 International Joint Conference on, pp. 2298\u20132305. IEEE.","DOI":"10.1109\/IJCNN.2014.6889438"},{"key":"9552_CR166","unstructured":"Tesauro, G., Das, R., Chan, H., Kephart, J., Levine, D., Rawson, F., & Lefurgy, C. 
(2008). Managing power consumption and performance of computing systems using reinforcement learning. In: Advances in Neural Information Processing Systems, pp. 1497\u20131504."},{"key":"9552_CR167","unstructured":"Thomas, L. (1982). Constrained Markov decision processes as multi-objective problems. Department of Decision Theory: University of Manchester."},{"key":"9552_CR168","doi-asserted-by":"publisher","first-page":"371","DOI":"10.1016\/j.eswa.2016.10.045","volume":"72","author":"B Tozer","year":"2017","unstructured":"Tozer, B., Mazzuchi, T., & Sarkani, S. (2017). Many-objective stochastic path finding using reinforcement learning. Expert Systems with Applications, 72, 371\u2013382.","journal-title":"Expert Systems with Applications"},{"issue":"3","key":"9552_CR169","first-page":"440","volume":"21","author":"A Trivedi","year":"2016","unstructured":"Trivedi, A., Srinivasan, D., Sanyal, K., & Ghosh, A. (2016). A survey of multiobjective evolutionary algorithms based on decomposition. IEEE Transactions on Evolutionary Computation, 21(3), 440\u2013462.","journal-title":"IEEE Transactions on Evolutionary Computation"},{"key":"9552_CR170","unstructured":"Turgay, E., Oner, D., & Tekin, C. (2018). Multi-objective contextual bandit problem with similarity information. In: International Conference on Artificial Intelligence and Statistics, pp. 1673\u20131681."},{"key":"9552_CR171","doi-asserted-by":"crossref","unstructured":"Vamplew, P., Dazeley, R., Barker, E., & Kelarev, A. (2009). Constructing stochastic mixture policies for episodic multiobjective reinforcement learning tasks. In: Australasian Joint Conference on Artificial Intelligence, pp. 340\u2013349. Springer.","DOI":"10.1007\/978-3-642-10439-8_35"},{"key":"9552_CR172","doi-asserted-by":"publisher","unstructured":"Vamplew, P., Foale, C., Dazeley, R., & Bignold, A. (2021). Potential-based multiobjective reinforcement learning approaches to low-impact agents for AI safety. 
Engineering Applications of Artificial Intelligence, 100. https:\/\/doi.org\/10.1016\/j.engappai.2021.104186","DOI":"10.1016\/j.engappai.2021.104186"},{"key":"9552_CR173","doi-asserted-by":"crossref","unstructured":"Vamplew, P., Issabekov, R., Dazeley, R., & Foale, C. (2015). Reinforcement learning of Pareto-optimal multiobjective policies using steering. In: Australasian Joint Conference on Artificial Intelligence, pp. 596\u2013608. Springer.","DOI":"10.1007\/978-3-319-26350-2_53"},{"key":"9552_CR174","doi-asserted-by":"crossref","unstructured":"Vamplew, P., Yearwood, J., Dazeley, R., & Berry, A. (2008). On the limitations of scalarisation for multi-objective reinforcement learning of Pareto fronts. In: Australasian Joint Conference on Artificial Intelligence, pp. 372\u2013378. Springer.","DOI":"10.1007\/978-3-540-89378-3_37"},{"issue":"1\u20132","key":"9552_CR175","doi-asserted-by":"publisher","first-page":"51","DOI":"10.1007\/s10994-010-5232-5","volume":"84","author":"P Vamplew","year":"2011","unstructured":"Vamplew, P., Dazeley, R., Berry, A., Issabekov, R., & Dekker, E. (2011). Empirical evaluation methods for multiobjective reinforcement learning algorithms. Machine Learning, 84(1\u20132), 51\u201380.","journal-title":"Machine Learning"},{"key":"9552_CR176","doi-asserted-by":"publisher","first-page":"74","DOI":"10.1016\/j.neucom.2016.09.141","volume":"263","author":"P Vamplew","year":"2017","unstructured":"Vamplew, P., Dazeley, R., & Foale, C. (2017). Softmax exploration strategies for multiobjective reinforcement learning. Neurocomputing, 263, 74\u201386.","journal-title":"Neurocomputing"},{"issue":"1","key":"9552_CR177","doi-asserted-by":"publisher","first-page":"27","DOI":"10.1007\/s10676-017-9440-6","volume":"20","author":"P Vamplew","year":"2018","unstructured":"Vamplew, P., Dazeley, R., Foale, C., Firmin, S., & Mummery, J. (2018). Human-aligned artificial intelligence is a multiobjective problem. 
Ethics and Information Technology, 20(1), 27\u201340.","journal-title":"Ethics and Information Technology"},{"key":"9552_CR178","doi-asserted-by":"publisher","DOI":"10.1007\/s00521-021-05859-1","author":"P Vamplew","year":"2021","unstructured":"Vamplew, P., Foale, C., & Dazeley, R. (2021). The impact of environmental stochasticity on value-based multiobjective reinforcement learning. Neural Computing and Applications. https:\/\/doi.org\/10.1007\/s00521-021-05859-1.","journal-title":"Neural Computing and Applications"},{"key":"9552_CR179","doi-asserted-by":"publisher","first-page":"26","DOI":"10.1016\/j.neucom.2016.08.152","volume":"263","author":"P Vamplew","year":"2017","unstructured":"Vamplew, P., Issabekov, R., Dazeley, R., Foale, C., Berry, A., Moore, T., & Creighton, D. (2017). Steering approaches to Pareto-optimal multiobjective reinforcement learning. Neurocomputing, 263, 26\u201338.","journal-title":"Neurocomputing"},{"key":"9552_CR180","doi-asserted-by":"crossref","unstructured":"van Dijk, M.T., van Wingerden, J.W., Ashuri, T., Li, Y., & Rotea, M.A. (2016). Yaw-misalignment and its impact on wind turbine loads and wind farm power output. Journal of Physics: Conference Series, 753(6).","DOI":"10.1088\/1742-6596\/753\/6\/062013"},{"key":"9552_CR181","doi-asserted-by":"crossref","unstructured":"Van Dijk, M.T., van Wingerden, J.W., Ashuri, T., Li, Y., & Rotea, M.A. (2016). Yaw-misalignment and its impact on wind turbine loads and wind farm power output. Journal of Physics: Conference Series, 753(6).","DOI":"10.1088\/1742-6596\/753\/6\/062013"},{"issue":"1","key":"9552_CR182","first-page":"3483","volume":"15","author":"K Van Moffaert","year":"2014","unstructured":"Van Moffaert, K., & Now\u00e9, A. (2014). Multi-objective reinforcement learning using sets of pareto dominating policies. 
The Journal of Machine Learning Research, 15(1), 3483\u20133512.","journal-title":"The Journal of Machine Learning Research"},{"key":"9552_CR183","unstructured":"Van\u00a0der Maaten, L., & Hinton, G. (2008). Visualizing data using t-SNE. Journal of machine learning research, 9(11)."},{"key":"9552_CR184","doi-asserted-by":"crossref","unstructured":"Van\u00a0Moffaert, K., Brys, T., Chandra, A., Esterle, L., Lewis, P.R., & Now\u00e9, A. (2014). A novel adaptive weight selection algorithm for multi-objective multi-agent reinforcement learning. In: 2014 International joint conference on neural networks (IJCNN), pp. 2306\u20132314. IEEE.","DOI":"10.1109\/IJCNN.2014.6889637"},{"key":"9552_CR185","doi-asserted-by":"crossref","unstructured":"Van\u00a0Moffaert, K., Drugan, M. M., & Now\u00e9, A. (2013). Hypervolume-based multi-objective reinforcement learning. In: International Conference on Evolutionary Multi-Criterion Optimization, pp. 352\u2013366. Springer.","DOI":"10.1007\/978-3-642-37140-0_28"},{"key":"9552_CR186","doi-asserted-by":"crossref","unstructured":"Van\u00a0Moffaert, K., Drugan, M. M., & Now\u00e9, A. (2013). Scalarized multi-objective reinforcement learning: Novel design techniques. In: 2013 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), pp. 191\u2013199. IEEE.","DOI":"10.1109\/ADPRL.2013.6615007"},{"key":"9552_CR187","doi-asserted-by":"crossref","unstructured":"Van\u00a0Vaerenbergh, K., Rodr\u00edguez, A., Gagliolo, M., Vrancx, P., Now\u00e9, A., Stoev, J., Goossens, S., Pinte, G., & Symens, W. (2012). Improving wet clutch engagement with reinforcement learning. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1\u20138. IEEE.","DOI":"10.1109\/IJCNN.2012.6252825"},{"key":"9552_CR188","unstructured":"Verstraeten, T., Daems, P.J., Bargiacchi, E., Roijers, D.M., Libin, P.J., & Helsen, J. (2021). Scalable optimization for wind farm control using coordination graphs. 
In: Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems, pp. 1362\u20131370."},{"key":"9552_CR189","doi-asserted-by":"publisher","first-page":"428","DOI":"10.1016\/j.rser.2019.03.019","volume":"109","author":"T Verstraeten","year":"2019","unstructured":"Verstraeten, T., Now\u00e9, A., Keller, J., Guo, Y., Sheng, S., & Helsen, J. (2019). Fleetwide data-enabled reliability improvement of wind turbines. Renewable and Sustainable Energy Reviews, 109, 428\u2013437.","journal-title":"Renewable and Sustainable Energy Reviews"},{"issue":"3","key":"9552_CR190","first-page":"707","volume":"58","author":"C Von L\u00fccken","year":"2014","unstructured":"Von L\u00fccken, C., Bar\u00e1n, B., & Brizuela, C. (2014). A survey on multi-objective evolutionary algorithms for many-objective problems. Computational optimization and applications, 58(3), 707\u2013756.","journal-title":"Computational optimization and applications"},{"key":"9552_CR191","volume-title":"Moral machines: Teaching robots right from wrong","author":"W Wallach","year":"2008","unstructured":"Wallach, W., & Allen, C. (2008). Moral machines: Teaching robots right from wrong. Oxford: Oxford University Press."},{"key":"9552_CR192","unstructured":"Wang, W., & Sebag, M. (2012). Multi-objective Monte-Carlo tree search. In: Asian Conference on Machine Learning (pp. 507-522). PMLR, Singapore."},{"key":"9552_CR193","doi-asserted-by":"publisher","first-page":"17480","DOI":"10.1109\/ACCESS.2019.2894756","volume":"7","author":"H Wang","year":"2019","unstructured":"Wang, H., Lei, Z., Zhang, X., Peng, J., & Jiang, H. (2019). Multiobjective reinforcement learning-based intelligent approach for optimization of activation rules in automatic generation control. 
IEEE Access, 7, 17480\u201317492.","journal-title":"IEEE Access"},{"issue":"2\u20133","key":"9552_CR194","doi-asserted-by":"publisher","first-page":"403","DOI":"10.1007\/s10994-013-5369-0","volume":"92","author":"W Wang","year":"2013","unstructured":"Wang, W., & Sebag, M. (2013). Hypervolume indicator and dominance reward based multi-objective monte-carlo tree search. Machine Learning, 92(2\u20133), 403\u2013429.","journal-title":"Machine Learning"},{"key":"9552_CR195","doi-asserted-by":"crossref","unstructured":"Wanigasekara, N., Liang, Y., Goh, S.T., Liu, Y., Williams, J.J., & Rosenblum, D.S. (2019). Learning multi-objective rewards and user utility function in contextual bandits for personalized ranking. In: Proceedings of the 28th International Joint Conference on Artificial Intelligence, pp. 3835\u20133841. AAAI Press.","DOI":"10.24963\/ijcai.2019\/532"},{"key":"9552_CR196","doi-asserted-by":"crossref","unstructured":"Weng, D., Chen, R., Zhang, J., Bao, J., Zheng, Y., & Wu, Y. (2020). Pareto-optimal transit route planning with multi-objective monte-carlo tree search. IEEE Transactions on Intelligent Transportation Systems.","DOI":"10.1109\/TITS.2020.2964012"},{"issue":"2","key":"9552_CR197","doi-asserted-by":"publisher","first-page":"639","DOI":"10.1016\/0022-247X(82)90122-6","volume":"89","author":"D White","year":"1982","unstructured":"White, D. (1982). Multi-objective infinite-horizon discounted markov decision processes. Journal of Mathematical Analysis and Applications, 89(2), 639\u2013647.","journal-title":"Journal of Mathematical Analysis and Applications"},{"key":"9552_CR198","first-page":"129","volume":"1","author":"CC White","year":"1980","unstructured":"White, C. C., & Kim, K. W. (1980). Solution procedures for vector criterion Markov decision processes. Large Scale Systems, 1, 129\u2013140.","journal-title":"Large Scale Systems"},{"key":"9552_CR199","doi-asserted-by":"crossref","unstructured":"Wiering, M. A., & De\u00a0Jong, E. D. (2007). 
Computing optimal stationary policies for multi-objective markov decision processes. In: 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, pp. 158\u2013165. IEEE.","DOI":"10.1109\/ADPRL.2007.368183"},{"key":"9552_CR200","doi-asserted-by":"crossref","unstructured":"Wiering, M. A., Withagen, M., & Drugan, M. M. (2014). Model-based multi-objective reinforcement learning. In: 2014 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), pp. 1\u20136. IEEE.","DOI":"10.1109\/ADPRL.2014.7010622"},{"issue":"136","key":"9552_CR201","first-page":"1","volume":"18","author":"C Wirth","year":"2017","unstructured":"Wirth, C., Akrour, R., Neumann, G., F\u00fcrnkranz, J., et al. (2017). A survey of preference-based reinforcement learning methods. Journal of Machine Learning Research, 18(136), 1\u201346.","journal-title":"Journal of Machine Learning Research"},{"key":"9552_CR202","doi-asserted-by":"crossref","unstructured":"Wray, K. H., & Zilberstein, S. (2015). Multi-objective pomdps with lexicographic reward preferences. In: Twenty-Fourth International Joint Conference on Artificial Intelligence.","DOI":"10.1609\/aaai.v29i1.9647"},{"key":"9552_CR203","doi-asserted-by":"crossref","unstructured":"Wray, K. H., Zilberstein, S., & Mouaddib, A. I. (2015). Multi-objective mdps with conditional lexicographic reward preferences. In: Twenty-ninth AAAI conference on artificial intelligence.","DOI":"10.1609\/aaai.v29i1.9647"},{"key":"9552_CR204","unstructured":"Xu, J., Tian, Y., Ma, P., Rus, D., Sueda, S., & Matusik, W. (2020). Prediction-guided multi-objective reinforcement learning for continuous robot control. In: Proceedings of the 37th International Conference on Machine Learning."},{"key":"9552_CR205","doi-asserted-by":"crossref","unstructured":"Yahyaa, S. Q., Drugan, M. M., & Manderick, B. (2014). Knowledge gradient for multi-objective multi-armed bandit algorithms. In: ICAART (1), pp. 
74\u201383.","DOI":"10.1109\/ADPRL.2014.7010619"},{"key":"9552_CR206","doi-asserted-by":"crossref","unstructured":"Yamaguchi, T., Nagahama, S., Ichikawa, Y., Takadama, K. (2019). Model-based multi-objective reinforcement learning with unknown weights. In: International Conference on Human-Computer Interaction, pp. 311\u2013321. Springer.","DOI":"10.1007\/978-3-030-22649-7_25"},{"key":"9552_CR207","unstructured":"Yang, C., Lu, J., Gao, X., Liu, H., Chen, Q., Liu, G., & Chen, G. (2020). MoTiAC: Multi-objective actor-critics for real-time bidding. arXiv preprint arXiv:2002.07408."},{"key":"9552_CR208","unstructured":"Yang, R., Sun, X., & Narasimhan, K. (2019). A generalized algorithm for multi-objective reinforcement learning and policy adaptation. In: Advances in Neural Information Processing Systems, pp. 14636\u201314647."},{"issue":"10","key":"9552_CR209","doi-asserted-by":"publisher","first-page":"3869","DOI":"10.1007\/s00500-016-2124-z","volume":"20","author":"L Yliniemi","year":"2016","unstructured":"Yliniemi, L., & Tumer, K. (2016). Multi-objective multiagent credit assignment in reinforcement learning and nsga-ii. Soft Computing, 20(10), 3869\u20133887.","journal-title":"Soft Computing"},{"issue":"1","key":"9552_CR210","doi-asserted-by":"publisher","first-page":"272","DOI":"10.1007\/s10957-012-0234-z","volume":"159","author":"H Yu","year":"2013","unstructured":"Yu, H., & Liu, H. (2013). Robust multiple objective game theory. Journal of Optimization Theory and Applications, 159(1), 272\u2013280.","journal-title":"Journal of Optimization Theory and Applications"},{"key":"9552_CR211","unstructured":"Zhan, H., & Cao, Y. (2019). Relationship explainable multi-objective reinforcement learning with semantic explainability generation. arXiv preprint arXiv:1909.12268."},{"key":"9552_CR212","unstructured":"Zhang, Y., R\u0103dulescu, R., Mannion, P., Roijers, D. M., & Now\u00e9, A. (2020). 
Opponent modelling for reinforcement learning in multi-objective normal form games. In: Proceedings of the 19th International Conference on Autonomous Agents and MultiAgent Systems, pp. 2080\u20132082."},{"key":"9552_CR213","doi-asserted-by":"publisher","first-page":"472","DOI":"10.1016\/j.enbuild.2019.07.029","volume":"199","author":"Z Zhang","year":"2019","unstructured":"Zhang, Z., Chong, A., Pan, Y., Zhang, C., & Lam, K. P. (2019). Whole building energy model for hvac optimal control: A practical framework based on deep reinforcement learning. Energy and Buildings, 199, 472\u2013490.","journal-title":"Energy and Buildings"},{"issue":"1","key":"9552_CR214","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41598-018-37186-2","volume":"9","author":"Z Zhou","year":"2019","unstructured":"Zhou, Z., Kearnes, S., Li, L., Zare, R. N., & Riley, P. (2019). Optimization of molecules via deep reinforcement learning. Scientific Reports, 9(1), 1\u201310.","journal-title":"Scientific Reports"},{"key":"9552_CR215","unstructured":"Zintgraf, L. M., Kanters, T. V., Roijers, D. M., Oliehoek, F., & Beau, P. (2015). Quality assessment of MORL algorithms: A utility-based approach. In: Benelearn 2015: Proceedings of the 24th Annual Machine Learning Conference of Belgium and the Netherlands."},{"key":"9552_CR216","unstructured":"Zintgraf, L. M., Roijers, D. M., Linders, S., Jonker, C. M., & Now\u00e9, A. (2018). Ordered preference elicitation strategies for supporting multi-objective decision making. In: Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, pp. 1477\u20131485. International Foundation for Autonomous Agents and Multiagent Systems."},{"key":"9552_CR217","doi-asserted-by":"crossref","unstructured":"Zitzler, E., Knowles, J., & Thiele, L. (2008). Quality assessment of pareto set approximations. In: Multiobjective Optimization, pp. 373\u2013404. 
Springer.","DOI":"10.1007\/978-3-540-88908-3_14"},{"issue":"4","key":"9552_CR218","doi-asserted-by":"publisher","first-page":"257","DOI":"10.1109\/4235.797969","volume":"3","author":"E Zitzler","year":"1999","unstructured":"Zitzler, E., & Thiele, L. (1999). Multiobjective evolutionary algorithms: a comparative case study and the strength pareto approach. IEEE Transactions on Evolutionary Computation, 3(4), 257\u2013271.","journal-title":"IEEE Transactions on Evolutionary Computation"}],"container-title":["Autonomous Agents and Multi-Agent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10458-022-09552-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10458-022-09552-y\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10458-022-09552-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,9,22]],"date-time":"2024-09-22T02:41:19Z","timestamp":1726972879000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10458-022-09552-y"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,4]]},"references-count":218,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2022,4]]}},"alternative-id":["9552"],"URL":"https:\/\/doi.org\/10.1007\/s10458-022-09552-y","relation":{},"ISSN":["1387-2532","1573-7454"],"issn-type":[{"value":"1387-2532","type":"print"},{"value":"1573-7454","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,4]]},"assertion":[{"value":"11 February 2022","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"13 April 
2022","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"26"}}