{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,5,1]],"date-time":"2025-05-01T16:10:53Z","timestamp":1746115853918,"version":"3.40.4"},"publisher-location":"Cham","reference-count":56,"publisher":"Springer Nature Switzerland","isbn-type":[{"type":"print","value":"9783031906596"},{"type":"electronic","value":"9783031906602"}],"license":[{"start":{"date-parts":[[2025,1,1]],"date-time":"2025-01-01T00:00:00Z","timestamp":1735689600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,5,1]],"date-time":"2025-05-01T00:00:00Z","timestamp":1746057600000},"content-version":"vor","delay-in-days":120,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>We present a data-driven approach for producing policies that are provably robust across unknown stochastic environments. Existing approaches can learn models of a single environment as an interval Markov decision process (IMDP) and produce a robust policy with a probably approximately correct (PAC) guarantee on its performance. However, these are unable to reason about the impact of environmental parameters underlying the uncertainty. We propose a framework based on parametric Markov decision processes with unknown distributions over parameters. We learn and analyse IMDPs for a set of unknown sample environments induced by parameters. The key challenge is then to produce meaningful performance guarantees that combine the two layers of uncertainty: (1) multiple environments induced by parameters with an unknown distribution; (2) unknown induced environments which are approximated by IMDPs. 
We present a novel approach based on scenario optimisation that yields a single PAC guarantee quantifying the risk level for which a specified performance level can be assured in unseen environments, plus a means to trade-off risk and performance. We implement and evaluate our framework using multiple robust policy generation methods on a range of benchmarks. We show that our approach produces tight bounds on a policy\u2019s performance with high confidence.<\/jats:p>","DOI":"10.1007\/978-3-031-90660-2_4","type":"book-chapter","created":{"date-parts":[[2025,4,30]],"date-time":"2025-04-30T09:37:43Z","timestamp":1746005863000},"page":"63-83","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Certifiably Robust Policies for Uncertain Parametric Environments"],"prefix":"10.1007","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7406-3440","authenticated-orcid":false,"given":"Yannik","family":"Schnitzer","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5627-9093","authenticated-orcid":false,"given":"Alessandro","family":"Abate","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4137-8862","authenticated-orcid":false,"given":"David","family":"Parker","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,5,1]]},"reference":[{"key":"4_CR1","doi-asserted-by":"crossref","unstructured":"Agha, G., Palmskog, K.: A survey of statistical model checking. ACM Trans. Model. Comput. Simul. 28(1), 6:1\u20136:39 (2018)","DOI":"10.1145\/3158668"},{"key":"4_CR2","doi-asserted-by":"crossref","unstructured":"Araya-L\u00f3pez, M., Buffet, O., Thomas, V., Charpillet, F.: Active learning of MDP models. In: EWRL. Lecture Notes in Computer Science, vol.\u00a07188, pp. 42\u201353. 
Springer (2011)","DOI":"10.1007\/978-3-642-29946-9_8"},{"key":"4_CR3","doi-asserted-by":"crossref","unstructured":"Ashok, P., Kret\u00ednsk\u00fd, J., Weininger, M.: PAC statistical model checking for Markov decision processes and stochastic games. In: CAV (1). Lecture Notes in Computer Science, vol. 11561, pp. 497\u2013519. Springer (2019)","DOI":"10.1007\/978-3-030-25540-4_29"},{"key":"4_CR4","doi-asserted-by":"crossref","unstructured":"Bacci, G., Hansen, M., Larsen, K.G.: Model checking constrained Markov reward models with uncertainties. In: QEST. Lecture Notes in Computer Science, vol. 11785, pp. 37\u201351. Springer (2019)","DOI":"10.1007\/978-3-030-30281-8_3"},{"key":"4_CR5","doi-asserted-by":"crossref","unstructured":"Badings, T., Sim\u00e3o, T.D., Suilen, M., Jansen, N.: Decision-making under uncertainty: Beyond probabilities (2023)","DOI":"10.1007\/s10009-023-00704-3"},{"key":"4_CR6","doi-asserted-by":"crossref","unstructured":"Badings, T.S., Cubuktepe, M., Jansen, N., Junges, S., Katoen, J., Topcu, U.: Scenario-based verification of uncertain parametric MDPs. Int. J. Softw. Tools Technol. Transf. 24(5), 803\u2013819 (2022)","DOI":"10.1007\/s10009-022-00673-z"},{"key":"4_CR7","doi-asserted-by":"crossref","unstructured":"Badings, T.S., Jansen, N., Junges, S., Stoelinga, M., Volk, M.: Sampling-based verification of CTMCs with uncertain rates. In: CAV (2). Lecture Notes in Computer Science, vol. 13372, pp. 26\u201347. Springer (2022)","DOI":"10.1007\/978-3-031-13188-2_2"},{"key":"4_CR8","doi-asserted-by":"crossref","unstructured":"B\u00e4uerle, N., Ott, J.: Markov decision processes with average-value-at-risk criteria. Math. Methods Oper. Res. 74(3), 361\u2013379 (2011)","DOI":"10.1007\/s00186-011-0367-0"},{"key":"4_CR9","unstructured":"Beck, J., Vuorio, R., Liu, E.Z., Xiong, Z., Zintgraf, L.M., Finn, C., Whiteson, S.: A survey of meta-reinforcement learning. 
CoRR abs\/2301.08028 (2023)"},{"key":"4_CR10","doi-asserted-by":"crossref","unstructured":"Boucheron, S., Lugosi, G., Massart, P.: Concentration Inequalities - A Nonasymptotic Theory of Independence. Oxford University Press (2013)","DOI":"10.1093\/acprof:oso\/9780199535255.001.0001"},{"key":"4_CR11","doi-asserted-by":"crossref","unstructured":"Campi, M.C., Garatti, S.: The exact feasibility of randomized solutions of uncertain convex programs. SIAM J. Optim. 19(3), 1211\u20131230 (2008)","DOI":"10.1137\/07069821X"},{"key":"4_CR12","doi-asserted-by":"crossref","unstructured":"Campi, M.C., Garatti, S.: A sampling-and-discarding approach to chanceconstrained optimization: Feasibility and optimality. J. Optim. Theory Appl. 148(2), 257\u2013280 (2011)","DOI":"10.1007\/s10957-010-9754-6"},{"key":"4_CR13","doi-asserted-by":"crossref","unstructured":"Campi, M.C., Garatti, S.: Introduction to the scenario approach. SIAM (2018)","DOI":"10.1137\/1.9781611975444"},{"key":"4_CR14","doi-asserted-by":"crossref","unstructured":"Chi, Z., Liu, Y., Turrini, A., Zhang, L., Jansen, D.N.: A scenario approach for parametric Markov decision processes. In: Principles of Verification (2). Lecture Notes in Computer Science, vol. 15261, pp. 234\u2013266. Springer (2024)","DOI":"10.1007\/978-3-031-75775-4_11"},{"key":"4_CR15","unstructured":"Collins, L., Mokhtari, A., Shakkottai, S.: Task-robust model-agnostic meta-learning. In: NeurIPS (2020)"},{"key":"4_CR16","doi-asserted-by":"crossref","unstructured":"Costen, C., Rigter, M., Lacerda, B., Hawes, N.: Planning with hidden parameter polynomial MDPs. Proceedings of the AAAI Conference on Artificial Intelligence 37(10), 11963\u201311971 (Jun 2023)","DOI":"10.1609\/aaai.v37i10.26411"},{"key":"4_CR17","doi-asserted-by":"crossref","unstructured":"Daca, P., Henzinger, T.A., Kret\u00ednsk\u00fd, J., Petrov, T.: Faster statistical model checking for unbounded temporal properties. In: TACAS. Lecture Notes in Computer Science, vol. 9636, pp. 
112\u2013129. Springer (2016)","DOI":"10.1007\/978-3-662-49674-9_7"},{"key":"4_CR18","unstructured":"Derman, E., Mankowitz, D., Mann, T., Mannor, S.: A Bayesian approach to robust reinforcement learning (2019)"},{"key":"4_CR19","unstructured":"Finn, C., Abbeel, P., Levine, S.: Model-agnostic meta-learning for fast adaptation of deep networks. In: ICML. Proceedings of Machine Learning Research, vol.\u00a070, pp. 1126\u20131135. PMLR (2017)"},{"key":"4_CR20","unstructured":"Ghosh, D., Rahme, J., Kumar, A., Zhang, A., Adams, R.P., Levine, S.: Why generalization in RL is difficult: Epistemic POMDPs and implicit partial observability (2021)"},{"key":"4_CR21","doi-asserted-by":"crossref","unstructured":"Givan, R., Leach, S.M., Dean, T.L.: Bounded-parameter Markov decision processes. Artif. Intell. 122(1-2), 71\u2013109 (2000)","DOI":"10.1016\/S0004-3702(00)00047-3"},{"key":"4_CR22","unstructured":"Greenberg, I., Mannor, S., Chechik, G., Meirom, E.A.: Train hard, fight easy: Robust meta reinforcement learning. In: NeurIPS (2023)"},{"key":"4_CR23","unstructured":"Gupta, A., Mendonca, R., Liu, Y., Abbeel, P., Levine, S.: Meta-reinforcement learning of structured exploration strategies. In: NeurIPS (2018)"},{"key":"4_CR24","doi-asserted-by":"crossref","unstructured":"Hansson, H., Jonsson, B.: A logic for reasoning about time and reliability. Formal Aspects Comput. 6(5), 512\u2013535 (1994)","DOI":"10.1007\/BF01211866"},{"key":"4_CR25","doi-asserted-by":"crossref","unstructured":"Hartmanns, A., Klauck, M., Parker, D., Quatmann, T., Ruijters, E.: The quantitative verification benchmark set. In: TACAS (1). Lecture Notes in Computer Science, vol. 11427, pp. 344\u2013350. Springer (2019)","DOI":"10.1007\/978-3-030-17462-0_20"},{"key":"4_CR26","doi-asserted-by":"crossref","unstructured":"H\u00e9rault, T., Lassaigne, R., Magniette, F., Peyronnet, S.: Approximate probabilistic model checking. In: VMCAI. Lecture Notes in Computer Science, vol. 2937, pp. 73\u201384. 
Springer (2004)","DOI":"10.1007\/978-3-540-24622-0_8"},{"key":"4_CR27","doi-asserted-by":"crossref","unstructured":"Hoeffding, W.: Probability inequalities for sums of bounded random variables. Springer Series in Statistics (1994)","DOI":"10.1007\/978-1-4612-0865-5_26"},{"key":"4_CR28","doi-asserted-by":"crossref","unstructured":"Iyengar, G.N.: Robust dynamic programming. Math. Oper. Res. 30(2), 257\u2013280 (2005)","DOI":"10.1287\/moor.1040.0129"},{"key":"4_CR29","doi-asserted-by":"crossref","unstructured":"Junges, S., Jansen, N., Dehnert, C., Topcu, U., Katoen, J.: Safety-constrained reinforcement learning for MDPs. In: TACAS. Lecture Notes in Computer Science, vol. 9636, pp. 130\u2013146. Springer (2016)","DOI":"10.1007\/978-3-662-49674-9_8"},{"key":"4_CR30","doi-asserted-by":"crossref","unstructured":"Kochenderfer, M.: Decision Making Under Uncertainty: Theory and Application (2015)","DOI":"10.7551\/mitpress\/10187.001.0001"},{"key":"4_CR31","doi-asserted-by":"crossref","unstructured":"Kwiatkowska, M.Z., Norman, G., Parker, D.: PRISM 4.0: Verification of probabilistic real-time systems. In: CAV. Lecture Notes in Computer Science, vol. 6806, pp. 585\u2013591. Springer (2011)","DOI":"10.1007\/978-3-642-22110-1_47"},{"key":"4_CR32","doi-asserted-by":"crossref","unstructured":"Legay, A., Delahaye, B., Bensalem, S.: Statistical model checking: An overview. In: RV. Lecture Notes in Computer Science, vol. 6418, pp. 122\u2013135. Springer (2010)","DOI":"10.1007\/978-3-642-16612-9_11"},{"key":"4_CR33","unstructured":"Meggendorfer, T., Weininger, M., Wienh\u00f6ft, P.: What are the odds? improving the foundations of statistical model checking. CoRR abs\/2404.05424 (2024)"},{"key":"4_CR34","doi-asserted-by":"crossref","unstructured":"Nilim, A., Ghaoui, L.E.: Robust control of Markov decision processes with uncertain transition matrices. Oper. Res. 
53(5), 780\u2013798 (2005)","DOI":"10.1287\/opre.1050.0216"},{"key":"4_CR35","doi-asserted-by":"crossref","unstructured":"Pnueli, A.: The temporal logic of programs. In: FOCS. pp. 46\u201357. IEEE Computer Society (1977)","DOI":"10.1109\/SFCS.1977.32"},{"key":"4_CR36","doi-asserted-by":"crossref","unstructured":"Polgreen, E., Wijesuriya, V.B., Haesaert, S., Abate, A.: Data-efficient Bayesian verification of parametric Markov chains. In: QEST. Lecture Notes in Computer Science, vol. 9826, pp. 35\u201351. Springer (2016)","DOI":"10.1007\/978-3-319-43425-4_3"},{"key":"4_CR37","doi-asserted-by":"crossref","unstructured":"Poupart, P., Vlassis, N., Hoey, J., Regan, K.: An analytic solution to discrete Bayesian reinforcement learning. In: ICML. ACM International Conference Proceeding Series, vol.\u00a0148, pp. 697\u2013704. ACM (2006)","DOI":"10.1145\/1143844.1143932"},{"key":"4_CR38","unstructured":"Raskin, J., Sankur, O.: Multiple-environment Markov decision processes. In: FSTTCS. LIPIcs, vol.\u00a029, pp. 531\u2013543. Schloss Dagstuhl - Leibniz-Zentrum f\u00fcr Informatik (2014)"},{"key":"4_CR39","unstructured":"Rickard, L., Abate, A., Margellos, K.: Learning robust policies for uncertain parametric Markov decision processes. In: Proc. L4DC\u201924. Proceedings of Machine Learning Research, vol.\u00a0242, pp. 876\u2013889. PMLR (2024)"},{"key":"4_CR40","doi-asserted-by":"crossref","unstructured":"Ross, S.M.: Introduction to Probability Models. Academic Press (2014)","DOI":"10.1016\/B978-0-12-407948-9.00001-3"},{"key":"4_CR41","doi-asserted-by":"crossref","unstructured":"Scheftelowitsch, D., Buchholz, P., Hashemi, V., Hermanns, H.: Multi-objective approaches to Markov decision processes with uncertain transition parameters. In: VALUETOOLS. pp. 44\u201351. ACM (2017)","DOI":"10.1145\/3150928.3150945"},{"key":"4_CR42","unstructured":"Schnitzer, Y., Abate, A., Parker, D.: Learning provably robust policies in uncertain parametric environments. 
CoRR abs\/2408.03093 (2024)"},{"key":"4_CR43","doi-asserted-by":"crossref","unstructured":"Sen, K., Viswanathan, M., Agha, G.: Statistical model checking of black-box probabilistic systems. In: CAV. Lecture Notes in Computer Science, vol. 3114, pp. 202\u2013215. Springer (2004)","DOI":"10.1007\/978-3-540-27813-9_16"},{"key":"4_CR44","doi-asserted-by":"crossref","unstructured":"Steimle, L.N., Kaufman, D.L., Denton, B.T.: Multi-model Markov decision processes. IISE Trans. 53(10), 1124\u20131139 (2021)","DOI":"10.1080\/24725854.2020.1869351"},{"key":"4_CR45","doi-asserted-by":"crossref","unstructured":"Strehl, A.L., Littman, M.L.: A theoretical analysis of model-based interval estimation. In: ICML. ACM International Conference Proceeding Series, vol.\u00a0119, pp. 856\u2013863. ACM (2005)","DOI":"10.1145\/1102351.1102459"},{"key":"4_CR46","doi-asserted-by":"crossref","unstructured":"St\u00fcckler, J., Schwarz, M., Schadler, M., Topalidou-Kyniazopoulou, A., Behnke, S.: Nimbro explorer: Semiautonomous exploration and mobile manipulation in rough terrain. Journal of Field Robotics (2015)","DOI":"10.1002\/rob.21592"},{"key":"4_CR47","doi-asserted-by":"crossref","unstructured":"Suilen, M., Badings, T.S., Bovy, E.M., Parker, D., Jansen, N.: Robust markov decision processes: A place where AI and formal methods meet. In: Principles of Verification (3). Lecture Notes in Computer Science, vol. 15262, pp. 126\u2013154. Springer (2024)","DOI":"10.1007\/978-3-031-75778-5_7"},{"key":"4_CR48","unstructured":"Suilen, M., Sim\u00e3o, T.D., Parker, D., Jansen, N.: Robust anytime learning of Markov decision processes. In: NeurIPS (2022)"},{"key":"4_CR49","unstructured":"Sutton, R.S., McAllester, D.A., Singh, S., Mansour, Y.: Policy gradient methods for reinforcement learning with function approximation. In: NIPS. pp. 1057\u20131063. 
The MIT Press (1999)"},{"key":"4_CR50","unstructured":"Teh, Y.W., Bapst, V., Czarnecki, W.M., Quan, J., Kirkpatrick, J., Hadsell, R., Heess, N., Pascanu, R.: Distral: Robust multitask reinforcement learning. In: NIPS. pp. 4496\u20134506 (2017)"},{"key":"4_CR51","unstructured":"Towers, M., Kwiatkowski, A., Terry, J.K., Balis, J.U., Cola, G.D., Deleu, T., Goul\u00e3o, M., Kallinteris, A., Krimmel, M., KG, A., Perez-Vicente, R., Pierr\u00e9, A., Schulhoff, S., Tai, J.J., Tan, H., Younis, O.G.: Gymnasium: A standard interface for reinforcement learning environments. CoRR abs\/2407.17032 (2024)"},{"key":"4_CR52","doi-asserted-by":"crossref","unstructured":"Wang, Y., Zou, S.: Online robust reinforcement learning with model uncertainty (2021)","DOI":"10.1115\/1.0001814V"},{"key":"4_CR53","doi-asserted-by":"crossref","unstructured":"Wiesemann, W., Kuhn, D., Rustem, B.: Robust Markov decision processes. Math. Oper. Res. 38(1), 153\u2013183 (2013)","DOI":"10.1287\/moor.1120.0566"},{"key":"4_CR54","doi-asserted-by":"crossref","unstructured":"Wilson, E.B.: Probable inference, the law of succession, and statistical inference. Journal of the American Statistical Association (1927)","DOI":"10.2307\/2276774"},{"key":"4_CR55","doi-asserted-by":"crossref","unstructured":"Wolff, E.M., Topcu, U., Murray, R.M.: Robust control of uncertain markov decision processes with temporal logic specifications. In: CDC. IEEE (2012)","DOI":"10.1109\/CDC.2012.6426174"},{"key":"4_CR56","doi-asserted-by":"crossref","unstructured":"Younes, H.L.S., Simmons, R.G.: Probabilistic verification of discrete event systems using acceptance sampling. In: CAV. Lecture Notes in Computer Science, vol. 2404, pp. 223\u2013235. 
Springer (2002)","DOI":"10.1007\/3-540-45657-0_17"}],"container-title":["Lecture Notes in Computer Science","Tools and Algorithms for the Construction and Analysis of Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/978-3-031-90660-2_4","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,4,30]],"date-time":"2025-04-30T09:38:11Z","timestamp":1746005891000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/978-3-031-90660-2_4"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025]]},"ISBN":["9783031906596","9783031906602"],"references-count":56,"URL":"https:\/\/doi.org\/10.1007\/978-3-031-90660-2_4","relation":{},"ISSN":["0302-9743","1611-3349"],"issn-type":[{"type":"print","value":"0302-9743"},{"type":"electronic","value":"1611-3349"}],"subject":[],"published":{"date-parts":[[2025]]},"assertion":[{"value":"1 May 2025","order":1,"name":"first_online","label":"First Online","group":{"name":"ChapterHistory","label":"Chapter History"}},{"value":"TACAS","order":1,"name":"conference_acronym","label":"Conference Acronym","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"International Conference on Tools and Algorithms for the Construction and Analysis of Systems","order":2,"name":"conference_name","label":"Conference Name","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Hamilton, ON","order":3,"name":"conference_city","label":"Conference City","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Canada","order":4,"name":"conference_country","label":"Conference Country","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"2025","order":5,"name":"conference_year","label":"Conference Year","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"3 May 
2025","order":7,"name":"conference_start_date","label":"Conference Start Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"8 May 2025","order":8,"name":"conference_end_date","label":"Conference End Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"31","order":9,"name":"conference_number","label":"Conference Number","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"tacas2025","order":10,"name":"conference_id","label":"Conference ID","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"https:\/\/etaps.org\/2025\/conferences\/tacas\/","order":11,"name":"conference_url","label":"Conference URL","group":{"name":"ConferenceInfo","label":"Conference Information"}}]}}