{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T17:10:12Z","timestamp":1750180212011,"version":"3.41.0"},"reference-count":72,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2023,6,30]],"date-time":"2023-06-30T00:00:00Z","timestamp":1688083200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-sa\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100000266","name":"Engineering and Physical Sciences Research Council","doi-asserted-by":"crossref","award":["EP\/P020909\/1 and EP\/X017796\/1"],"award-info":[{"award-number":["EP\/P020909\/1 and EP\/X017796\/1"]}],"id":[{"id":"10.13039\/501100000266","id-type":"DOI","asserted-by":"crossref"}]},{"name":"National Science Foundation","award":["CCF-2009022 and CCF-2146563"],"award-info":[{"award-number":["CCF-2009022 and CCF-2146563"]}]},{"name":"European Union\u2019s Horizon 2020 research and innovation programme","award":["101032464 (SyGaST), 864075 (CAESAR), and 956123 (FOCETA)"],"award-info":[{"award-number":["101032464 (SyGaST), 864075 (CAESAR), and 956123 (FOCETA)"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Form. Asp. Comput."],"published-print":{"date-parts":[[2023,6,30]]},"abstract":"<jats:p>\n            The expanding role of reinforcement learning (RL) in safety-critical system design has promoted \u03c9-automata as a way to express learning requirements\u2014often non-Markovian\u2014with greater ease of expression and interpretation than scalar reward signals. However, real-world sequential decision making situations often involve multiple, potentially conflicting, objectives. 
Two dominant approaches to express relative preferences over multiple objectives are: (1)\n            <jats:italic>weighted preference<\/jats:italic>\n            , where the decision maker provides scalar weights for various objectives, and (2)\n            <jats:italic>lexicographic preference<\/jats:italic>\n            , where the decision maker provides an order over the objectives such that any amount of satisfaction of a higher-ordered objective is preferable to any amount of a lower-ordered one. In this article, we study and develop RL algorithms to compute optimal strategies in Markov decision processes against multiple \u03c9-regular objectives under weighted and lexicographic preferences. We provide a translation from multiple \u03c9-regular objectives to a scalar reward signal that is both\n            <jats:italic>faithful<\/jats:italic>\n            (maximising reward means maximising probability of achieving the objectives under the corresponding preference) and\n            <jats:italic>effective<\/jats:italic>\n            (RL quickly converges to optimal strategies). 
We have implemented the translations in a formal reinforcement learning tool,\n            <jats:sc>Mungojerrie<\/jats:sc>\n            , and we present an experimental evaluation of our technique on benchmark learning problems.\n          <\/jats:p>","DOI":"10.1145\/3605950","type":"journal-article","created":{"date-parts":[[2023,6,26]],"date-time":"2023-06-26T12:05:17Z","timestamp":1687781117000},"page":"1-24","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["Multi-objective \u03c9-Regular Reinforcement Learning"],"prefix":"10.1145","volume":"35","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9348-7684","authenticated-orcid":false,"given":"Ernst Moritz","family":"Hahn","sequence":"first","affiliation":[{"name":"University of Twente, The Netherlands"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4220-3212","authenticated-orcid":false,"given":"Mateo","family":"Perez","sequence":"additional","affiliation":[{"name":"University of Colorado Boulder, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9093-9518","authenticated-orcid":false,"given":"Sven","family":"Schewe","sequence":"additional","affiliation":[{"name":"University of Liverpool, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2085-2003","authenticated-orcid":false,"given":"Fabio","family":"Somenzi","sequence":"additional","affiliation":[{"name":"University of Colorado Boulder, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9346-0126","authenticated-orcid":false,"given":"Ashutosh","family":"Trivedi","sequence":"additional","affiliation":[{"name":"University of Colorado Boulder, 
USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5560-0546","authenticated-orcid":false,"given":"Dominik","family":"Wojtczak","sequence":"additional","affiliation":[{"name":"University of Liverpool, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2023,7,18]]},"reference":[{"key":"e_1_3_1_2_2","first-page":"2669","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","author":"Alshiekh M.","year":"2018","unstructured":"M. Alshiekh, R. Bloem, R. Ehlers, B. K\u00f6nighofer, S. Niekum, and U. Topcu. 2018. Safe reinforcement learning via shielding. In Proceedings of the AAAI Conference on Artificial Intelligence. 2669\u20132678."},{"key":"e_1_3_1_3_2","doi-asserted-by":"crossref","first-page":"112","DOI":"10.1007\/978-3-642-10631-6_13","volume-title":"Algorithms and Computation","author":"Andersson D.","year":"2009","unstructured":"D. Andersson and P. B. Miltersen. 2009. The complexity of solving stochastic games on graphs. In Algorithms and Computation. 112\u2013121."},{"key":"e_1_3_1_4_2","doi-asserted-by":"crossref","first-page":"479","DOI":"10.1007\/978-3-319-21690-4_31","volume-title":"Proceedings of the International Conference on Computer Aided Verification (CAV\u201915)","author":"Babiak T.","year":"2015","unstructured":"T. Babiak, F. Blahoudek, A. Duret-Lutz, J. Klein, J. K\u0159et\u00ednsk\u00fd, D. M\u00fcller, D. Parker, and J. Strej\u010dek. 2015. The Hanoi \\(\\omega\\) -automata format. In Proceedings of the International Conference on Computer Aided Verification (CAV\u201915). 479\u2013486. LNCS 9206."},{"key":"e_1_3_1_5_2","first-page":"137","volume-title":"Proceedings of the Conference on Logic in Computer Science (LICS\u201905)","author":"Baier Ch.","year":"2005","unstructured":"Ch. Baier and M. Gr\u00f6\u00dfer. 2005. Recognizing \\(\\omega\\) -regular languages with probabilistic automata. 
In Proceedings of the Conference on Logic in Computer Science (LICS\u201905). 137\u2013146."},{"key":"e_1_3_1_6_2","volume-title":"Principles of Model Checking","author":"Baier Ch.","year":"2008","unstructured":"Ch. Baier and J.-P. Katoen. 2008. Principles of Model Checking. MIT Press."},{"issue":"1","key":"e_1_3_1_7_2","doi-asserted-by":"crossref","first-page":"133","DOI":"10.4064\/fm-3-1-133-181","article-title":"Sur les op\u00e9rations dans les ensembles abstraits et leur application aux \u00e9quations int\u00e9grales","volume":"3","author":"Banach Stefan","year":"1922","unstructured":"Stefan Banach. 1922. Sur les op\u00e9rations dans les ensembles abstraits et leur application aux \u00e9quations int\u00e9grales. Fundamenta Mathematicae 3, 1 (1922), 133\u2013181. http:\/\/eudml.org\/doc\/213289.","journal-title":"Fundamenta Mathematicae"},{"key":"e_1_3_1_8_2","doi-asserted-by":"crossref","first-page":"140","DOI":"10.1007\/978-3-642-02658-4_14","volume-title":"Proceedings of the International Conference on Computer Aided Verification (CAV\u201909)","author":"Bloem Roderick","year":"2009","unstructured":"Roderick Bloem, Krishnendu Chatterjee, Thomas A. Henzinger, and Barbara Jobstmann. 2009. Better quality in synthesis through quantitative objectives. In Proceedings of the International Conference on Computer Aided Verification (CAV\u201909). Springer, 140\u2013156."},{"issue":"1","key":"e_1_3_1_9_2","doi-asserted-by":"crossref","first-page":"74","DOI":"10.1038\/npp.2010.151","article-title":"Opponency revisited: Competition and cooperation between dopamine and serotonin","volume":"36","author":"Boureau Y.-Lan","year":"2011","unstructured":"Y.-Lan Boureau and Peter Dayan. 2011. Opponency revisited: Competition and cooperation between dopamine and serotonin. 
Neuropsychopharmacology 36, 1 (2011), 74\u201397.","journal-title":"Neuropsychopharmacology"},{"key":"e_1_3_1_10_2","article-title":"Model-free learning of safe yet effective controllers","author":"Bozkurt Alper Kamil","year":"2021","unstructured":"Alper Kamil Bozkurt, Yu Wang, and Miroslav Pajic. 2021. Model-free learning of safe yet effective controllers. Retrieved from https:\/\/arXiv:2103.14600.","journal-title":"R"},{"key":"e_1_3_1_11_2","first-page":"10349","volume-title":"Proceedings of the IEEE International Conference on Robotics and Automation (ICRA\u201920)","author":"Bozkurt Alper Kamil","year":"2020","unstructured":"Alper Kamil Bozkurt, Yu Wang, Michael M. Zavlanos, and Miroslav Pajic. 2020. Control synthesis from linear temporal logic specifications using model-free reinforcement learning. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA\u201920). 10349\u201310355. DOI:10.1109\/ICRA40945.2020.9196796"},{"key":"e_1_3_1_12_2","doi-asserted-by":"crossref","first-page":"259","DOI":"10.1016\/j.ic.2016.10.011","article-title":"Meet your expectations with guarantees: Beyond worst-case synthesis in quantitative games","volume":"254","author":"Bruyere V\u00e9ronique","year":"2017","unstructured":"V\u00e9ronique Bruyere, Emmanuel Filiot, Mickael Randour, and Jean-Fran\u00e7ois Raskin. 2017. Meet your expectations with guarantees: Beyond worst-case synthesis in quantitative games. Info. Comput. 254 (2017), 259\u2013295.","journal-title":"Info. Comput."},{"key":"e_1_3_1_13_2","article-title":"Parameterized complexity of games with monotonically ordered  \\(\\omega\\) -regular objectives","author":"Bruy\u00e8re V\u00e9ronique","year":"2017","unstructured":"V\u00e9ronique Bruy\u00e8re, Quentin Hautem, and Jean-Fran\u00e7ois Raskin. 2017. Parameterized complexity of games with monotonically ordered \\(\\omega\\) -regular objectives. 
Retrieved from https:\/\/arXiv:1707.05968.","journal-title":"R"},{"key":"e_1_3_1_14_2","first-page":"6065","volume-title":"Proceedings of the Joint Conference on Artificial Intelligence","author":"Camacho A.","year":"2019","unstructured":"A. Camacho, R. Toro Icarte, T. Q. Klassen, R. Valenzano, and S. A. McIlraith. 2019. LTL and beyond: Formal languages for reward function specification in reinforcement learning. In Proceedings of the Joint Conference on Artificial Intelligence. 6065\u20136073."},{"key":"e_1_3_1_15_2","unstructured":"S. Carr S. Junges N. Jansen and U. Topcu. 2022. Safe reinforcement learning via shielding under partial observability. Retrieved from https:\/\/arxiv.org\/pdf\/2204.00755.pdf."},{"key":"e_1_3_1_16_2","first-page":"473","volume-title":"Proceedings of the Foundations of Software Technology and Theoretical Computer Science (FSTTCS\u201907)","author":"Chatterjee Krishnendu","year":"2007","unstructured":"Krishnendu Chatterjee. 2007. Markov decision processes with multiple long-run average objectives. In Proceedings of the Foundations of Software Technology and Theoretical Computer Science (FSTTCS\u201907), V. Arvind and Sanjiva Prasad (Eds.). Springer, Berlin, 473\u2013484."},{"key":"e_1_3_1_17_2","doi-asserted-by":"crossref","first-page":"398","DOI":"10.1007\/978-3-030-53291-8_21","volume-title":"Proceedings of the International Conference on Computer Aided Verification","author":"Chatterjee Krishnendu","year":"2020","unstructured":"Krishnendu Chatterjee, Joost-Pieter Katoen, Maximilian Weininger, and Tobias Winkler. 2020. Stochastic games with lexicographic reachability-safety objectives. In Proceedings of the International Conference on Computer Aided Verification. 
Springer, 398\u2013420."},{"key":"e_1_3_1_18_2","first-page":"325","volume-title":"Proceedings of the Annual Symposium on Theoretical Aspects of Computer Science","author":"Chatterjee Krishnendu","year":"2006","unstructured":"Krishnendu Chatterjee, Rupak Majumdar, and Thomas A. Henzinger. 2006. Markov decision processes with multiple objectives. In Proceedings of the Annual Symposium on Theoretical Aspects of Computer Science. Springer, 325\u2013336."},{"issue":"1","key":"e_1_3_1_19_2","doi-asserted-by":"crossref","first-page":"63","DOI":"10.1007\/BF01197559","article-title":"A closer look at drawbacks of minimizing weighted sums of objectives for Pareto set generation in multicriteria optimization problems","volume":"14","author":"Das Indraneel","year":"1997","unstructured":"Indraneel Das and John E. Dennis. 1997. A closer look at drawbacks of minimizing weighted sums of objectives for Pareto set generation in multicriteria optimization problems. Struct. Optimiz. 14, 1 (1997), 63\u201369.","journal-title":"Struct. Optimiz."},{"key":"e_1_3_1_20_2","volume-title":"Reinforcement Learning Models of the Dopamine System and their Behavioral Implications","author":"Daw Nathaniel D.","year":"2003","unstructured":"Nathaniel D. Daw. 2003. Reinforcement Learning Models of the Dopamine System and their Behavioral Implications. Carnegie Mellon University."},{"issue":"4","key":"e_1_3_1_21_2","first-page":"603","article-title":"Opponent interactions between serotonin and dopamine","volume":"15","author":"Daw Nathaniel D.","year":"2002","unstructured":"Nathaniel D. Daw, Sham Kakade, and Peter Dayan. 2002. Opponent interactions between serotonin and dopamine. Neural Netw. 15, 4-6 (2002), 603\u2013616.","journal-title":"Neural Netw."},{"key":"e_1_3_1_22_2","volume-title":"Formal Verification of Probabilistic Systems","author":"Alfaro L. de","year":"1998","unstructured":"L. de Alfaro. 1998. Formal Verification of Probabilistic Systems. Ph.D. Dissertation. 
Stanford University."},{"key":"e_1_3_1_23_2","doi-asserted-by":"crossref","first-page":"50","DOI":"10.1007\/978-3-540-71209-1_6","volume-title":"Tools and Algorithms for the Construction and Analysis of Systems","author":"Etessami K.","year":"2007","unstructured":"K. Etessami, M. Kwiatkowska, M. Y. Vardi, and M. Yannakakis. 2007. Multi-objective model checking of Markov decision processes. In Tools and Algorithms for the Construction and Analysis of Systems, Orna Grumberg and Michael Huth (Eds.). Springer, Berlin, 50\u201365."},{"key":"e_1_3_1_24_2","first-page":"573","article-title":"The seven bridges of K\u00f6nigsberg","volume":"1","author":"Euler Leonhard","year":"1956","unstructured":"Leonhard Euler. 1956. The seven bridges of K\u00f6nigsberg. World Math. 1 (1956), 573\u2013580.","journal-title":"World Math."},{"key":"e_1_3_1_25_2","doi-asserted-by":"crossref","DOI":"10.1007\/978-1-4615-0805-2","volume-title":"Handbook of Markov Decision Processes","author":"Feinberg E. A.","year":"2002","unstructured":"E. A. Feinberg and A. Shwartz (Eds.). 2002. Handbook of Markov Decision Processes. Springer."},{"key":"e_1_3_1_26_2","doi-asserted-by":"crossref","first-page":"70","DOI":"10.1145\/2735960.2735973","volume-title":"Proceedings of the ACM\/IEEE 6th International Conference on Cyber-Physical Systems (ICCPS\u201915)","author":"Feng Lu","year":"2015","unstructured":"Lu Feng, Clemens Wiltsche, Laura R. Humphrey, and Ufuk Topcu. 2015. Controller synthesis for autonomous systems interacting with human operators. In Proceedings of the ACM\/IEEE 6th International Conference on Cyber-Physical Systems (ICCPS\u201915), Alexandre M. Bayen and Michael S. Branicky (Eds.). ACM, 70\u201379. 
DOI:10.1145\/2735960.2735973"},{"issue":"11","key":"e_1_3_1_27_2","doi-asserted-by":"crossref","first-page":"1442","DOI":"10.1287\/mnsc.20.11.1442","article-title":"Exceptional paper\u2013Lexicographic orders, utilities, and decision rules: A survey","volume":"20","author":"Fishburn Peter C.","year":"1974","unstructured":"Peter C. Fishburn. 1974. Exceptional paper\u2013Lexicographic orders, utilities, and decision rules: A survey. Manage. Sci. 20, 11 (1974), 1442\u20131471.","journal-title":"Manage. Sci."},{"key":"e_1_3_1_28_2","doi-asserted-by":"crossref","first-page":"112","DOI":"10.1007\/978-3-642-19835-9_11","volume-title":"Tools and Algorithms for the Construction and Analysis of Systems","author":"Forejt Vojtech","year":"2011","unstructured":"Vojtech Forejt, Marta Kwiatkowska, Gethin Norman, David Parker, and Hongyang Qu. 2011. Quantitative multi-objective verification for probabilistic systems. In Tools and Algorithms for the Construction and Analysis of Systems, Parosh Aziz Abdulla and K. Rustan M. Leino (Eds.). Springer, Berlin, 112\u2013127."},{"key":"e_1_3_1_29_2","volume-title":"Proceedings of Robotics: Science and Systems","author":"Fu J.","year":"2014","unstructured":"J. Fu and U. Topcu. 2014. Probably approximately correct MDP learning and control with temporal logic constraints. In Proceedings of Robotics: Science and Systems\u2014A Robotics Conference (RSS\u201914)."},{"key":"e_1_3_1_30_2","first-page":"197","volume-title":"Proceedings of the International Conference on Machine Learning (ICML\u201998)","volume":"98","author":"G\u00e1bor Zolt\u00e1n","year":"1998","unstructured":"Zolt\u00e1n G\u00e1bor, Zsolt Kalm\u00e1r, and Csaba Szepesv\u00e1ri. 1998. Multi-criteria reinforcement learning. In Proceedings of the International Conference on Machine Learning (ICML\u201998), Vol. 98. 
Citeseer, 197\u2013205."},{"key":"e_1_3_1_31_2","first-page":"1437","article-title":"A comprehensive survey on safe reinforcement learning","volume":"16","author":"Garcia J.","year":"2015","unstructured":"J. Garcia and F. Fern\u00e1ndez. 2015. A comprehensive survey on safe reinforcement learning. J. Mach. Learn. Res. 16 (2015), 1437\u20131480.","journal-title":"J. Mach. Learn. Res."},{"issue":"3","key":"e_1_3_1_32_2","doi-asserted-by":"crossref","first-page":"755","DOI":"10.1109\/LCSYS.2020.2979635","article-title":"Chance-constrained control with lexicographic deep reinforcement learning","volume":"4","author":"Giuseppi Alessandro","year":"2020","unstructured":"Alessandro Giuseppi and Antonio Pietrabissa. 2020. Chance-constrained control with lexicographic deep reinforcement learning. IEEE Control Syst. Lett. 4, 3 (2020), 755\u2013760.","journal-title":"IEEE Control Syst. Lett."},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.1014269108"},{"key":"e_1_3_1_34_2","volume-title":"Deep Learning","author":"Goodfellow I.","year":"2016","unstructured":"I. Goodfellow, Y. Bengio, and A. Courville. 2016. Deep Learning. MIT Press."},{"key":"e_1_3_1_35_2","first-page":"354","volume-title":"Proceedings of the International Conference on Concurrency Theory (CONCUR\u201915)","author":"Hahn E. M.","year":"2015","unstructured":"E. M. Hahn, G. Li, S. Schewe, A. Turrini, and L. Zhang. 2015. Lazy probabilistic model checking without determinisation. In Proceedings of the International Conference on Concurrency Theory (CONCUR\u201915). 354\u2013367."},{"key":"e_1_3_1_36_2","first-page":"395","volume-title":"Proceedings of the International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS\u201919)","author":"Hahn E. M.","year":"2019","unstructured":"E. M. Hahn, M. Perez, S. Schewe, F. Somenzi, A. Trivedi, and D. Wojtczak. 2019. \\(\\omega\\) -Regular objectives in model-free reinforcement learning. 
In Proceedings of the International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS\u201919). 395\u2013412. LNCS 11427."},{"key":"e_1_3_1_37_2","first-page":"108","volume-title":"Proceedings of the 18th International Symposium on Automated Technology for Verification and Analysis (ATVA\u201920) (Lecture Notes in Computer Science)","volume":"12302","author":"Hahn Ernst Moritz","year":"2020","unstructured":"Ernst Moritz Hahn, Mateo Perez, Sven Schewe, Fabio Somenzi, Ashutosh Trivedi, and Dominik Wojtczak. 2020. Faithful and effective reward schemes for model-free reinforcement learning of \\(\\omega\\) -regular objectives. In Proceedings of the 18th International Symposium on Automated Technology for Verification and Analysis (ATVA\u201920) (Lecture Notes in Computer Science), Dang Van Hung and Oleg Sokolsky (Eds.), Vol. 12302. Springer, 108\u2013124. DOI:10.1007\/978-3-030-59152-6_6"},{"key":"e_1_3_1_38_2","volume-title":"Proceedings of the International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS\u201920)","author":"Hahn E. M.","year":"2020","unstructured":"E. M. Hahn, M. Perez, S. Schewe, F. Somenzi, A. Trivedi, and D. Wojtczak. 2020. Good-for-MDPs automata for probabilistic analysis and reinforcement learning. In Proceedings of the International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS\u201920)."},{"key":"e_1_3_1_39_2","first-page":"142","volume-title":"Proceedings of the International Symposium on Formal Methods","author":"Hahn Ernst Moritz","year":"2021","unstructured":"Ernst Moritz Hahn, Mateo Perez, Sven Schewe, Fabio Somenzi, Ashutosh Trivedi, and Dominik Wojtczak. 2021. Model-free reinforcement learning for lexicographic \\(\\omega\\) -regular objectives. In Proceedings of the International Symposium on Formal Methods. 
Springer, 142\u2013159."},{"key":"e_1_3_1_40_2","volume-title":"Proceedings of the International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS\u201923)","author":"Hahn Ernst Moritz","year":"2023","unstructured":"Ernst Moritz Hahn, Mateo Perez, Sven Schewe, Fabio Somenzi, Ashutosh Trivedi, and Dominik Wojtczak. 2023. Mungojerrie: Reinforcement learning of linear-time objectives. In Proceedings of the International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS\u201923). Retrieved from https:\/\/plv.colorado.edu\/wwwmungojerrie\/."},{"key":"e_1_3_1_41_2","unstructured":"M. Hasanbeig A. Abate and D. Kroening. 2018. Logically-correct reinforcement learning. Retrieved from http:\/\/arxiv.org\/abs\/1801.08099."},{"key":"e_1_3_1_42_2","unstructured":"M. Hasanbeig A. Abate and D. Kroening. 2019. Certified reinforcement learning with logic guidance. Retrieved from https:\/\/arXiv:1902.00778."},{"key":"e_1_3_1_43_2","doi-asserted-by":"crossref","first-page":"231","DOI":"10.1007\/978-1-4615-0805-2_8","volume-title":"Handbook of Markov Decision Processes: Methods and Applications","author":"Hordijk A.","year":"2002","unstructured":"A. Hordijk and A. A. Yushkevich. 2002. Handbook of Markov Decision Processes: Methods and Applications. Springer, 231\u2013267."},{"key":"e_1_3_1_44_2","first-page":"3:1\u20133:16","volume-title":"Proceedings of the International Conference on Concurrency Theory (CONCUR\u201920)","author":"Jansen N.","year":"2020","unstructured":"N. Jansen, B. K\u00f6nighofer, S. Junges, A. Serban, and R. Bloem. 2020. Safe reinforcement learning using probabilistic shields. In Proceedings of the International Conference on Concurrency Theory (CONCUR\u201920). 3:1\u20133:16."},{"key":"e_1_3_1_45_2","unstructured":"Lukasz Kaiser Mohammad Babaeizadeh Piotr Milos Blazej Osinski Roy H. Campbell Konrad Czechowski Dumitru Erhan Chelsea Finn Piotr Kozakowski Sergey Levine et\u00a0al. 2019. 
Model-based reinforcement learning for Atari. Retrieved from https:\/\/arXiv:1903.00374."},{"key":"e_1_3_1_46_2","first-page":"290","volume-title":"Proceedings of the International Symposium on Leveraging Applications of Formal Methods","author":"K\u00f6nighofer Bettina","year":"2020","unstructured":"Bettina K\u00f6nighofer, Florian Lorber, Nils Jansen, and Roderick Bloem. 2020. Shield synthesis for reinforcement learning. In Proceedings of the International Symposium on Leveraging Applications of Formal Methods. Springer, 290\u2013306."},{"key":"e_1_3_1_47_2","first-page":"8:1\u20138:18","volume-title":"Proceedings of the 29th International Conference on Concurrency Theory (CONCUR\u201918) (LIPIcs)","volume":"118","author":"Kret\u00ednsk\u00fd Jan","year":"2018","unstructured":"Jan Kret\u00ednsk\u00fd, Guillermo A. P\u00e9rez, and Jean-Fran\u00e7ois Raskin. 2018. Learning-based mean-payoff optimization in an unknown MDP under \\(\\omega\\) -regular constraints. In Proceedings of the 29th International Conference on Concurrency Theory (CONCUR\u201918) (LIPIcs), Sven Schewe and Lijun Zhang (Eds.), Vol. 118. Schloss Dagstuhl-Leibniz-Zentrum f\u00fcr Informatik, 8:1\u20138:18. DOI:10.4230\/LIPIcs.CONCUR.2018.8"},{"key":"e_1_3_1_48_2","doi-asserted-by":"crossref","first-page":"585","DOI":"10.1007\/978-3-642-22110-1_47","volume-title":"Proceedings of the Conference on Computer Aided Verification (CAV\u201911)","author":"Kwiatkowska M.","year":"2011","unstructured":"M. Kwiatkowska, G. Norman, and D. Parker. 2011. PRISM 4.0: Verification of probabilistic real-time systems. In Proceedings of the Conference on Computer Aided Verification (CAV\u201911). 585\u2013591. LNCS 6806."},{"issue":"12","key":"e_1_3_1_49_2","doi-asserted-by":"crossref","first-page":"1272","DOI":"10.1016\/j.tcs.2008.12.058","article-title":"Probabilistic mobile ambients","volume":"410","author":"Kwiatkowska M.","year":"2009","unstructured":"M. Kwiatkowska, G. Norman, D. Parker, and M. G. Vigliotti. 
2009. Probabilistic mobile ambients. Theor. Comput. Sci. 410, 12\u201313 (2009), 1272\u20131303.","journal-title":"Theor. Comput. Sci."},{"key":"e_1_3_1_50_2","doi-asserted-by":"crossref","first-page":"89","DOI":"10.1007\/978-1-4615-0805-2_3","volume-title":"Handbook of Markov Decision Processes","author":"Lewis M. E.","year":"2002","unstructured":"M. E. Lewis. 2002. Bias optimality. In Handbook of Markov Decision Processes, E. A. Feinberg and A. Shwartz (Eds.). Springer, 89\u2013111."},{"issue":"4","key":"e_1_3_1_51_2","doi-asserted-by":"crossref","first-page":"604","DOI":"10.1137\/1011093","article-title":"Short notes: Stochastic games with perfect information and time average payoff","volume":"11","author":"Liggett T. M.","year":"1969","unstructured":"T. M. Liggett and S. A. Lippman. 1969. Short notes: Stochastic games with perfect information and time average payoff. SIAM Rev. 11, 4 (1969), 604\u2013607.","journal-title":"SIAM Rev."},{"issue":"3","key":"e_1_3_1_52_2","doi-asserted-by":"crossref","first-page":"385","DOI":"10.1109\/TSMC.2014.2358639","article-title":"Multiobjective reinforcement learning: A comprehensive overview","volume":"45","author":"Liu Chunming","year":"2014","unstructured":"Chunming Liu, Xin Xu, and Dewen Hu. 2014. Multiobjective reinforcement learning: A comprehensive overview. IEEE Trans. Syst., Man, Cybernet.: Syst. 45, 3 (2014), 385\u2013398.","journal-title":"IEEE Trans. Syst., Man, Cybernet.: Syst."},{"issue":"2","key":"e_1_3_1_53_2","doi-asserted-by":"crossref","first-page":"553","DOI":"10.1007\/s00199-020-01256-2","article-title":"The lexicographic method in preference theory","volume":"71","author":"Mandler Michael","year":"2021","unstructured":"Michael Mandler. 2021. The lexicographic method in preference theory. Econ. Theory 71, 2 (2021), 553\u2013577.","journal-title":"Econ. 
Theory"},{"key":"e_1_3_1_54_2","volume-title":"The Temporal Logic of Reactive and Concurrent Systems Specification","author":"Manna Z.","year":"1991","unstructured":"Z. Manna and A. Pnueli. 1991. The Temporal Logic of Reactive and Concurrent Systems Specification. Springer."},{"key":"e_1_3_1_55_2","unstructured":"MITtr18. 10 Breakthrough Technologies 2017. Retrieved from https:\/\/www.technologyreview.com\/10-breakthrough-technologies\/2017\/. Date accessed: 07-24-2022."},{"issue":"7540","key":"e_1_3_1_56_2","doi-asserted-by":"crossref","first-page":"529","DOI":"10.1038\/nature14236","article-title":"Human-level control through deep reinforcement learning","volume":"518","author":"Mnih Volodymyr","year":"2015","unstructured":"Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, et\u00a0al. 2015. Human-level control through deep reinforcement learning. Nature 518, 7540 (2015), 529\u2013533.","journal-title":"Nature"},{"key":"e_1_3_1_57_2","unstructured":"Nvidia20. NVIDIA: Paving the Way for Smarter Safer Autonomous Vehicles. Retrieved from https:\/\/www.nvidia.com\/en-us\/industries\/transportation\/. Date accessed: 07-07-2020."},{"key":"e_1_3_1_58_2","doi-asserted-by":"crossref","first-page":"357","DOI":"10.1007\/978-3-319-13823-7_31","volume-title":"Proceedings of the International Conference on Modelling and Simulation for Autonomous Systems (MESAS\u201914)","author":"Pecka M.","year":"2014","unstructured":"M. Pecka and T. Svoboda. 2014. Safe exploration techniques for reinforcement learning\u2014An overview. In Proceedings of the International Conference on Modelling and Simulation for Autonomous Systems (MESAS\u201914). 357\u2013375."},{"key":"e_1_3_1_59_2","doi-asserted-by":"crossref","DOI":"10.1002\/9780470316887","volume-title":"Markov Decision Processes: Discrete Stochastic Dynamic Programming","author":"Puterman M. 
L.","year":"1994","unstructured":"M. L. Puterman. 1994. Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley."},{"key":"e_1_3_1_60_2","doi-asserted-by":"crossref","first-page":"67","DOI":"10.1613\/jair.3987","article-title":"A survey of multi-objective sequential decision-making","volume":"48","author":"Roijers Diederik M.","year":"2013","unstructured":"Diederik M. Roijers, Peter Vamplew, Shimon Whiteson, and Richard Dazeley. 2013. A survey of multi-objective sequential decision-making. J. Artific. Intell. Res. 48 (2013), 67\u2013113.","journal-title":"J. Artific. Intell. Res."},{"key":"e_1_3_1_61_2","first-page":"1091","volume-title":"Proceedings of the IEEE Conference on Decision and Control (CDC\u201914)","author":"Sadigh D.","year":"2014","unstructured":"D. Sadigh, E. Kim, S. Coogan, S. S. Sastry, and S. A. Seshia. 2014. A learning based approach to control synthesis of Markov decision processes for linear temporal logic specifications. In Proceedings of the IEEE Conference on Decision and Control (CDC\u201914). 1091\u20131096."},{"key":"e_1_3_1_62_2","doi-asserted-by":"crossref","first-page":"130","DOI":"10.1007\/978-3-319-46520-3_9","volume-title":"Automated Technology for Verification and Analysis","author":"Sickert S.","year":"2016","unstructured":"S. Sickert and J. K\u0159et\u00ednsk\u00fd. 2016. MoChiBA: Probabilistic LTL model checking using limit-deterministic B\u00fcchi automata. In Automated Technology for Verification and Analysis. 130\u2013137. LNCS 9938."},{"key":"e_1_3_1_63_2","doi-asserted-by":"crossref","first-page":"484","DOI":"10.1038\/nature16961","article-title":"Mastering the game of Go with deep neural networks and tree search","volume":"529","author":"Silver D.","year":"2016","unstructured":"D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. 
Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis. 2016. Mastering the game of Go with deep neural networks and tree search. Nature 529 (Jan. 2016), 484\u2013489.","journal-title":"Nature"},{"key":"e_1_3_1_64_2","first-page":"1226","volume-title":"Proceedings of the International Conference on Autonomous Agents and Multiagent Systems: AAMAS","author":"Sim\u00e3o T. D.","year":"2021","unstructured":"T. D. Sim\u00e3o, N. Jansen, and M. T. J. Spaan. 2021. AlwaysSafe: Reinforcement learning without safety constraint violations during training. In Proceedings of the International Conference on Autonomous Agents and Multiagent Systems: AAMAS. 1226\u20131235."},{"key":"e_1_3_1_65_2","first-page":"3430","volume-title":"Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI\u201922)","author":"Skalse J.","year":"2022","unstructured":"J. Skalse, L. Hammond, C. Griffin, and A. Abate. 2022. Lexicographic multi-objective reinforcement learning. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI\u201922). 3430\u20133436."},{"key":"e_1_3_1_66_2","volume-title":"Reinforcement Learning: An Introduction (2nd ed.)","author":"Sutton R. S.","year":"2018","unstructured":"R. S. Sutton and A. G. Barto. 2018. Reinforcement Learning: An Introduction (2nd ed.). MIT Press."},{"key":"e_1_3_1_67_2","article-title":"Managing power consumption and performance of computing systems using reinforcement learning","volume":"20","author":"Tesauro Gerald","year":"2007","unstructured":"Gerald Tesauro, Rajarshi Das, Hoi Chan, Jeffrey Kephart, David Levine, Freeman Rawson, and Charles Lefurgy. 2007. Managing power consumption and performance of computing systems using reinforcement learning. Adv. Neural Info. Process. Syst. 20 (2007).","journal-title":"Adv. Neural Info. Process. 
Syst."},{"issue":"1","key":"e_1_3_1_68_2","first-page":"3483","article-title":"Multi-objective reinforcement learning using sets of pareto dominating policies","volume":"15","author":"Moffaert Kristof Van","year":"2014","unstructured":"Kristof Van Moffaert and Ann Now\u00e9. 2014. Multi-objective reinforcement learning using sets of pareto dominating policies. J. Mach. Learn. Res. 15, 1 (2014), 3483\u20133512.","journal-title":"J. Mach. Learn. Res."},{"issue":"7782","key":"e_1_3_1_69_2","doi-asserted-by":"crossref","first-page":"350","DOI":"10.1038\/s41586-019-1724-z","article-title":"Grandmaster level in StarCraft II using multi-agent reinforcement learning","volume":"575","author":"Vinyals Oriol","year":"2019","unstructured":"Oriol Vinyals, Igor Babuschkin, Wojciech M. Czarnecki, Micha\u00ebl Mathieu, Andrew Dudzik, Junyoung Chung, David H. Choi, Richard Powell, Timo Ewalds, and Petko Georgiev. 2019. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575, 7782 (2019), 350\u2013354.","journal-title":"Nature"},{"issue":"3","key":"e_1_3_1_70_2","doi-asserted-by":"crossref","first-page":"279","DOI":"10.1007\/BF00992698","article-title":"Q-learning","volume":"8","author":"Watkins Christopher J. C. H.","year":"1992","unstructured":"Christopher J. C. H. Watkins and Peter Dayan. 1992. Q-learning. Machine Learning 8, 3-4 (1992), 279\u2013292.","journal-title":"Machine Learning"},{"key":"e_1_3_1_71_2","volume-title":"Learning from Delayed Rewards","author":"Watkins C. J. C. H.","year":"1989","unstructured":"C. J. C. H. Watkins. 1989. Learning from Delayed Rewards. Ph.D. Dissertation. King\u2019s College, Cambridge, UK."},{"key":"e_1_3_1_72_2","unstructured":"Wayve18. Wayve: Learning to Drive in a Day with Reinforcement Learning. Retrieved from https:\/\/wayve.ai\/blog\/learning-to-drive-in-a-day-with-reinforcement-learning. 
Date accessed: 11-05-2018."},{"key":"e_1_3_1_73_2","volume-title":"Proceedings of the 24th International Joint Conference on Artificial Intelligence","author":"Wray Kyle Hollins","year":"2015","unstructured":"Kyle Hollins Wray and Shlomo Zilberstein. 2015. Multi-objective POMDPs with lexicographic reward preferences. In Proceedings of the 24th International Joint Conference on Artificial Intelligence."}],"container-title":["Formal Aspects of Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3605950","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3605950","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:36:19Z","timestamp":1750178179000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3605950"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,6,30]]},"references-count":72,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2023,6,30]]}},"alternative-id":["10.1145\/3605950"],"URL":"https:\/\/doi.org\/10.1145\/3605950","relation":{},"ISSN":["0934-5043","1433-299X"],"issn-type":[{"type":"print","value":"0934-5043"},{"type":"electronic","value":"1433-299X"}],"subject":[],"published":{"date-parts":[[2023,6,30]]},"assertion":[{"value":"2022-07-26","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-06-11","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-07-18","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}