{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,25]],"date-time":"2026-03-25T06:31:07Z","timestamp":1774420267742,"version":"3.50.1"},"reference-count":26,"publisher":"Maximum Academic Press","license":[{"start":{"date-parts":[[2022,6,13]],"date-time":"2022-06-13T00:00:00Z","timestamp":1655078400000},"content-version":"unspecified","delay-in-days":163,"URL":"https:\/\/www.cambridge.org\/core\/terms"}],"content-domain":{"domain":["cambridge.org"],"crossmark-restriction":true},"short-container-title":["The Knowledge Engineering Review"],"published-print":{"date-parts":[[2022]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Conventional reinforcement learning focuses on problems with single objective. However, many problems have multiple objectives or criteria that may be independent, related, or contradictory. In such cases, multi-objective reinforcement learning is used to propose a compromise among the solutions to balance the objectives. TOPSIS is a multi-criteria decision method that selects the alternative with minimum distance from the positive ideal solution and the maximum distance from the negative ideal solution, so it can be used effectively in the decision-making process to select the next action. In this research a single-policy algorithm called TOPSIS Q-Learning is provided with focus on its performance in online mode. Unlike all single-policy methods, in the first version of the algorithm, there is no need for the user to specify the weights of the objectives. The user\u2019s preferences may not be completely definite, so all weight preferences are combined together as decision criteria and a solution is generated by considering all these preferences at once and user can model the uncertainty and weight changes of objectives around their specified preferences of objectives. If the user only wants to apply the algorithm for a specific set of weights the second version of the algorithm efficiently accomplishes that.<\/jats:p>","DOI":"10.1017\/s0269888921000163","type":"journal-article","created":{"date-parts":[[2022,6,13]],"date-time":"2022-06-13T15:42:12Z","timestamp":1655134932000},"update-policy":"https:\/\/doi.org\/10.1017\/policypage","source":"Crossref","is-referenced-by-count":12,"title":["An online scalarization multi-objective reinforcement learning algorithm: TOPSIS Q-learning"],"prefix":"10.48130","volume":"37","author":[{"given":"Mohammad","family":"Mirzanejad","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7683-443X","authenticated-orcid":false,"given":"Morteza","family":"Ebrahimi","sequence":"additional","affiliation":[]},{"given":"Peter","family":"Vamplew","sequence":"additional","affiliation":[]},{"given":"Hadi","family":"Veisi","sequence":"additional","affiliation":[]}],"member":"27968","published-online":{"date-parts":[[2022,6,13]]},"reference":[{"key":"S0269888921000163_ref21","doi-asserted-by":"publisher","DOI":"10.1007\/s10994-010-5232-5"},{"key":"S0269888921000163_ref8","doi-asserted-by":"publisher","DOI":"10.2307\/2296469"},{"key":"S0269888921000163_ref23","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2016.08.152"},{"key":"S0269888921000163_ref24","first-page":"372","article-title":"On the limitations of scalarization for multi-objective reinforcement learning of Pareto fronts","volume":"5360","author":"Vamplew","year":"2008","journal-title":"AI 2008: Advances in Artificial Intelligence."},{"key":"S0269888921000163_ref2","unstructured":"Gabor, Z. , Kalmar, Z. & Szepesvari, C. 1998. Multi-criteria reinforcement learning. In The Fifteenth International Conference on Machine Learning, San Francisco, CA, USA, pp. 197\u2013205."},{"key":"S0269888921000163_ref10","unstructured":"Moffaert, K. V. 2014. Multi-criteria reinforcement learning for sequential decision making problems, Ph.D. dissertation, Dept. Comput. Sci., Vrije Universiteit Brussel., Brussels, Belgium."},{"key":"S0269888921000163_ref12","first-page":"3483","article-title":"Multi-objective reinforcement learning using sets of pareto dominating policies","volume":"15","author":"Moffaert","year":"2014","journal-title":"Journal of Machine Learning Research"},{"key":"S0269888921000163_ref1","doi-asserted-by":"crossref","unstructured":"Barrett, L. & Narayanan, S. 2008. Learning all optimal policies with multiple criteria. In Proceedings of the 25th International Conference on Machine Learning, New York, NY, USA, pp. 41\u201347.","DOI":"10.1145\/1390156.1390162"},{"key":"S0269888921000163_ref11","unstructured":"Moffaert, K. V. , Drugan, M. M. & Now\u00e9, A. 2013. Scalarized multi-objective reinforcement learning: Novel design techniques. In IEEE ADPRL, Singapore, pp. 191\u2013199."},{"key":"S0269888921000163_ref20","doi-asserted-by":"publisher","DOI":"10.1007\/BF00993306"},{"key":"S0269888921000163_ref5","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-48318-9"},{"key":"S0269888921000163_ref9","first-page":"123","volume-title":"Trade-off Analysis: The Indifference and Preferred Proportions Approaches, Conflicting Objectives in Decisions","author":"MacCrimmon","year":"1977"},{"key":"S0269888921000163_ref26","unstructured":"Yoon, K. 1980. Systems selection by multiple attribute decision making, Ph.D. Dissertation, Kansas State University, Manhattan, Kansas."},{"key":"S0269888921000163_ref3","article-title":"Reinforcement learning for MDPs with Constraints. In Machine Learning: ECML 2006, Lecture Notes in","volume":"4212","author":"Geibel","year":"2006","journal-title":"Computer Science"},{"key":"S0269888921000163_ref13","first-page":"96","article-title":"A multi-objective deep reinforcement learning framework","author":"Nguyen","year":"2020","journal-title":"Engineering Applications of Artificial Intelligence"},{"key":"S0269888921000163_ref7","volume-title":"Decision with Multiple Objectives: Preferences and Value Tradeoffs","author":"Keeney","year":"1976"},{"key":"S0269888921000163_ref6","first-page":"626","article-title":"An empirical comparison of two common multiobjective reinforcement learning algorithms","volume":"7691","author":"Issabekov","year":"2012","journal-title":"AI 2012: Advances in Artificial Intelligence."},{"key":"S0269888921000163_ref17","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-67664-3_28"},{"key":"S0269888921000163_ref14","unstructured":"Roijers, D. M. , R\u00f6pke, W. , Nowe, A. & Radulescu, R. 2021. On following pareto-optimal policies in multi-objective planning and reinforcement learning. Paper Presented at Multi-Objective Decision Making Workshop 2021."},{"key":"S0269888921000163_ref18","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-67504-6_2"},{"key":"S0269888921000163_ref22","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2016.09.141"},{"key":"S0269888921000163_ref4","volume-title":"Systems","author":"Hwang","year":"1981"},{"key":"S0269888921000163_ref15","doi-asserted-by":"publisher","DOI":"10.1613\/jair.3987"},{"key":"S0269888921000163_ref25","unstructured":"Watkins, C. 1989. Learning from delayed rewards, Ph.D. thesis, University of Cambridge, England."},{"key":"S0269888921000163_ref16","unstructured":"Roijers, D. M. , Zintgraf, L. M. , Libin, P. & Now\u00e9, A. 2018. Interactive multi-objective reinforcement learning in multi-armed bandits for any utility function. In ALA Workshop at FAIM, vol. 8."},{"key":"S0269888921000163_ref19","volume-title":"Adaptive Computation and Machine Learning","author":"Sutton","year":"1998"}],"container-title":["The Knowledge Engineering Review"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.cambridge.org\/core\/services\/aop-cambridge-core\/content\/view\/S0269888921000163","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,1,5]],"date-time":"2026-01-05T14:42:22Z","timestamp":1767624142000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.cambridge.org\/core\/product\/identifier\/S0269888921000163\/type\/journal_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022]]},"references-count":26,"alternative-id":["S0269888921000163"],"URL":"https:\/\/doi.org\/10.1017\/s0269888921000163","relation":{},"ISSN":["0269-8889","1469-8005"],"issn-type":[{"value":"0269-8889","type":"print"},{"value":"1469-8005","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022]]},"assertion":[{"value":"\u00a9 The Author(s), 2022. Published by Cambridge University Press","name":"copyright","label":"Copyright","group":{"name":"copyright_and_licensing","label":"Copyright and Licensing"}}],"article-number":"e7"}}