{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,30]],"date-time":"2026-04-30T14:39:56Z","timestamp":1777559996909,"version":"3.51.4"},"reference-count":24,"publisher":"SAGE Publications","issue":"4","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["AIC"],"published-print":{"date-parts":[[2022,3,18]]},"abstract":"<jats:p>Markov Decision Process Models (MDPs) are a powerful tool for planning tasks and sequential decision-making issues. In this work we deal with MDPs with imprecise rewards, often used when dealing with situations where the data is uncertain. In this context, we provide algorithms for finding the policy that minimizes the maximum regret. To the best of our knowledge, all the regret-based methods proposed in the literature focus on providing an optimal stochastic policy. We introduce for the first time a method to calculate an optimal deterministic policy using optimization approaches. Deterministic policies are easily interpretable for users because for a given state they provide a unique choice. To better motivate the use of an exact procedure for finding a deterministic policy, we show some (theoretical and experimental) cases where the intuitive idea of using a deterministic policy obtained after \u201cdeterminizing\u201d the optimal stochastic policy leads to a policy far from the exact deterministic policy.<\/jats:p>","DOI":"10.3233\/aic-190632","type":"journal-article","created":{"date-parts":[[2021,9,21]],"date-time":"2021-09-21T13:08:12Z","timestamp":1632229692000},"page":"229-244","source":"Crossref","is-referenced-by-count":0,"title":["Deterministic policies based on maximum regrets in MDPs with imprecise rewards"],"prefix":"10.1177","volume":"34","author":[{"given":"Pegah","family":"Alizadeh","sequence":"first","affiliation":[{"name":"L\u00e9onard de Vinci P\u00f4le Universitaire, Research Center, 92 916 Paris, La D\u00e9fense, France. 
E-mail:\u00a0pegah.alizadeh@devinci.fr"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Emiliano","family":"Traversi","sequence":"additional","affiliation":[{"name":"LIPN-UMR CNRS 7030, Universit\u00e9 Sorbonne Paris Nord, Villetaneuse, France. E-mails:\u00a0emiliano.traversi@lipn.univ-paris13.fr,\u00a0aomar.osmani@lipn.univ-paris13.fr"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Aomar","family":"Osmani","sequence":"additional","affiliation":[{"name":"LIPN-UMR CNRS 7030, Universit\u00e9 Sorbonne Paris Nord, Villetaneuse, France. E-mails:\u00a0emiliano.traversi@lipn.univ-paris13.fr,\u00a0aomar.osmani@lipn.univ-paris13.fr"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"179","reference":[{"key":"10.3233\/AIC-190632_ref1","doi-asserted-by":"publisher","first-page":"229","DOI":"10.1613\/jair.5242","article-title":"Sampling based approaches for minimizing regret in uncertain Markov decision processes (MDPs)","volume":"59","author":"Ahmed","year":"2017","journal-title":"J. Artif. Intell. Res."},{"key":"10.3233\/AIC-190632_ref2","doi-asserted-by":"publisher","DOI":"10.1109\/IJCNN.2016.7727695"},{"key":"10.3233\/AIC-190632_ref3","doi-asserted-by":"crossref","unstructured":"P. Alizadeh, Y. Chevaleyre and J. Zucker, Approximate regret based elicitation in Markov decision process, in: RIVF, IEEE, 2015, pp. 47\u201352.","DOI":"10.1109\/RIVF.2015.7049873"},{"issue":"5","key":"10.3233\/AIC-190632_ref4","doi-asserted-by":"publisher","first-page":"961","DOI":"10.1287\/opre.30.5.961","article-title":"Regret in decision making under uncertainty","volume":"30","author":"Bell","year":"1982","journal-title":"Operations Research"},{"key":"10.3233\/AIC-190632_ref5","doi-asserted-by":"crossref","unstructured":"F. Benavent and B. 
Zanuttini, An experimental study of advice in sequential decision-making under uncertainty, in: 32nd AAAI Conference on Artificial Intelligence, 2018.","DOI":"10.1609\/aaai.v32i1.12118"},{"issue":"1","key":"10.3233\/AIC-190632_ref6","doi-asserted-by":"publisher","first-page":"238","DOI":"10.1007\/BF01386316","article-title":"Partitioning procedures for solving mixed-variables programming problems","volume":"4","author":"Benders","year":"1962","journal-title":"Numer. Math."},{"key":"10.3233\/AIC-190632_ref7","unstructured":"D. Bertsimas and R. Weismantel, Optimization over Integers, Dynamic Ideas, 2005. ISBN 9780975914625."},{"key":"10.3233\/AIC-190632_ref8","doi-asserted-by":"crossref","unstructured":"V.F. da Silva and A.H.R. Costa, A geometric approach to find nondominated policies to imprecise reward MDPs, in: Proceedings of the 2011 European Conference on Machine Learning and Knowledge Discovery in Databases \u2013 Volume Part I, ECML PKDD\u201911, Springer-Verlag, Berlin, Heidelberg, 2011, pp. 439\u2013454.","DOI":"10.1007\/978-3-642-23780-5_38"},{"key":"10.3233\/AIC-190632_ref9","doi-asserted-by":"publisher","DOI":"10.1145\/1273496.1273525"},{"key":"10.3233\/AIC-190632_ref10","unstructured":"D. Dolgov and E. Durfee, Stationary deterministic policies for constrained MDPs with multiple rewards, costs, and discount factors, in: Proceedings of the 19th International Joint Conference on Artificial Intelligence, IJCAI\u201905, Morgan Kaufmann Publishers Inc., 2005, pp. 
1326\u20131331."},{"issue":"1","key":"10.3233\/AIC-190632_ref11","doi-asserted-by":"publisher","first-page":"123","DOI":"10.1007\/s10994-012-5313-8","article-title":"Preference-based reinforcement learning: A formal framework and a policy iteration algorithm","volume":"89","author":"F\u00fcrnkranz","year":"2012","journal-title":"Machine Learning"},{"issue":"1","key":"10.3233\/AIC-190632_ref12","doi-asserted-by":"publisher","first-page":"71","DOI":"10.1016\/S0004-3702(00)00047-3","article-title":"Bounded-parameter Markov decision processes","volume":"122","author":"Givan","year":"2000","journal-title":"Artificial Intelligence"},{"issue":"2","key":"10.3233\/AIC-190632_ref13","doi-asserted-by":"publisher","first-page":"257","DOI":"10.1287\/moor.1040.0129","article-title":"Robust dynamic programming","volume":"30","author":"Iyengar","year":"2005","journal-title":"Mathematics of Operations Research"},{"key":"10.3233\/AIC-190632_ref14","unstructured":"S. Mannor, O. Mebel and H. Xu, Lightning does not strike twice: Robust MDPs with coupled uncertainty, in: Proceedings of the 29th International Coference on International Conference on Machine Learning, ICML\u201912, 2012, pp. 451\u2013458."},{"issue":"2","key":"10.3233\/AIC-190632_ref15","doi-asserted-by":"publisher","first-page":"308","DOI":"10.1287\/mnsc.1060.0614","article-title":"Bias and variance approximation in value function estimates","volume":"53","author":"Mannor","year":"2007","journal-title":"Management Science"},{"issue":"5","key":"10.3233\/AIC-190632_ref18","doi-asserted-by":"publisher","first-page":"780","DOI":"10.1287\/opre.1050.0216","article-title":"Robust control of Markov decision processes with uncertain transition matrices","volume":"53","author":"Nilim","year":"2005","journal-title":"Operations Research"},{"key":"10.3233\/AIC-190632_ref19","doi-asserted-by":"crossref","unstructured":"M.L. 
Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1st edn, Wiley, New York, NY, USA, 1994. ISBN 0471619779.","DOI":"10.1002\/9780470316887"},{"key":"10.3233\/AIC-190632_ref20","unstructured":"K. Regan and C. Boutilier, Regret-based reward elicitation for Markov decision processes, in: UAI, AUAI Press, 2009, pp. 444\u2013451."},{"key":"10.3233\/AIC-190632_ref21","doi-asserted-by":"crossref","unstructured":"K. Regan and C. Boutilier, Robust policy computation in reward-uncertain MDPs using nondominated policies, in: AAAI, AAAI Press, 2010.","DOI":"10.1609\/aaai.v24i1.7740"},{"key":"10.3233\/AIC-190632_ref22","unstructured":"R.S. Sutton and A.G. Barto, Introduction to Reinforcement Learning, 1st edn, MIT Press, Cambridge, MA, USA, 1998. ISBN 0262193981."},{"key":"10.3233\/AIC-190632_ref23","unstructured":"P. Weng, Ordinal decision models for Markov decision processes, in: ECAI, Frontiers in Artificial Intelligence and Applications, Vol. 242, IOS Press, 2012, pp. 828\u2013833."},{"issue":"1","key":"10.3233\/AIC-190632_ref24","first-page":"2415","article-title":"Interactive value iteration for Markov decision processes with unknown rewards","volume":"4","author":"Weng","year":"2013","journal-title":"IJCAI\/AAAI"},{"issue":"1","key":"10.3233\/AIC-190632_ref25","doi-asserted-by":"publisher","first-page":"153","DOI":"10.1287\/moor.1120.0566","article-title":"Robust Markov decision processes","volume":"38","author":"Wiesemann","year":"2013","journal-title":"Mathematics of Operations Research"},{"key":"10.3233\/AIC-190632_ref26","doi-asserted-by":"crossref","unstructured":"H. Xu and S. Mannor, Parametric regret in uncertain Markov decision processes, in: CDC, IEEE, 2009, pp. 
3606\u20133613.","DOI":"10.1109\/CDC.2009.5400796"}],"container-title":["AI Communications"],"original-title":[],"link":[{"URL":"https:\/\/content.iospress.com\/download?id=10.3233\/AIC-190632","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,28]],"date-time":"2026-04-28T18:28:00Z","timestamp":1777400880000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/full\/10.3233\/AIC-190632"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,3,18]]},"references-count":24,"journal-issue":{"issue":"4"},"URL":"https:\/\/doi.org\/10.3233\/aic-190632","relation":{},"ISSN":["1875-8452","0921-7126"],"issn-type":[{"value":"1875-8452","type":"electronic"},{"value":"0921-7126","type":"print"}],"subject":[],"published":{"date-parts":[[2022,3,18]]}}}