{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,11]],"date-time":"2025-09-11T19:34:22Z","timestamp":1757619262513,"version":"3.44.0"},"publisher-location":"Cham","reference-count":41,"publisher":"Springer Nature Switzerland","isbn-type":[{"type":"print","value":"9783031986789"},{"type":"electronic","value":"9783031986796"}],"license":[{"start":{"date-parts":[[2025,1,1]],"date-time":"2025-01-01T00:00:00Z","timestamp":1735689600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,7,22]],"date-time":"2025-07-22T00:00:00Z","timestamp":1753142400000},"content-version":"vor","delay-in-days":202,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>Markov decision processes (MDPs) describe decision making subject to probabilistic uncertainty. A classical problem on MDPs is to compute a policy, selecting actions in every state, that maximizes the probability of reaching a dedicated set of target states. Computing such policies in tabular form is efficiently possible via standard algorithms. However, for further processing by either humans or machines, policies should be represented concisely, e.g., as a decision tree. This paper considers finding (almost) optimal decision trees of minimal depth and contributes a deductive synthesis approach. Technically, we combine pruning the space of concise policies with an abstraction-refinement loop with an SMT-encoding that maps candidate policies into decision trees. Our experiments show that this approach beats the state-of-the-art solver using an MILP encoding by orders of magnitude. 
The approach also pairs well with heuristic approaches that map a fixed policy into a decision tree: for an MDP with 1.5M states, our approach reduces the size of the given tree by 90%, while sacrificing only 1% of the optimal performance.<\/jats:p>","DOI":"10.1007\/978-3-031-98679-6_8","type":"book-chapter","created":{"date-parts":[[2025,7,21]],"date-time":"2025-07-21T09:19:58Z","timestamp":1753089598000},"page":"169-192","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Small Decision Trees for\u00a0MDPs with\u00a0Deductive Synthesis"],"prefix":"10.1007","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1286-934X","authenticated-orcid":false,"given":"Roman","family":"Andriushchenko","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0300-9727","authenticated-orcid":false,"given":"Milan","family":"\u010ce\u0161ka","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0978-8466","authenticated-orcid":false,"given":"Sebastian","family":"Junges","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0009-0004-4277-2751","authenticated-orcid":false,"given":"Filip","family":"Mac\u00e1k","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,7,22]]},"reference":[{"key":"8_CR1","doi-asserted-by":"crossref","unstructured":"Andriushchenko, R., \u010ce\u0161ka, M., Junges, S., Katoen, J.P.: Inductive synthesis for probabilistic programs reaches new horizons. In: TACAS. LNCS, vol. 12651, pp. 191\u2013209. Springer (2021)","DOI":"10.1007\/978-3-030-72016-2_11"},{"key":"8_CR2","unstructured":"Andriushchenko, R., \u010ce\u0161ka, M., Junges, S., Katoen, J.P.: Inductive synthesis of finite-state controllers for POMDPs. In: UAI, vol.\u00a0180, pp. 85\u201395. 
PMLR (2022)"},{"key":"8_CR3","doi-asserted-by":"crossref","unstructured":"Andriushchenko, R., \u010ce\u0161ka, M., Junges, S., Katoen, J.P., Stupinsk\u00fd, \u0160.: PAYNT: a tool for inductive synthesis of probabilistic programs. In: CAV. LNCS, vol. 12759, pp. 856\u2013869. Springer (2021)","DOI":"10.1007\/978-3-030-81685-8_40"},{"key":"8_CR4","unstructured":"Andriushchenko, R., \u010ce\u0161ka, M., Chakraborty, D., Junges, S., K\u0159et\u00ednsk\u00fd, J., Mac\u00e1k, F.: Symbiotic local search for small decision tree policies in MDPs. In: UAI (to appear) (2025)"},{"key":"8_CR5","doi-asserted-by":"crossref","unstructured":"Andriushchenko, R., \u010ce\u0161ka, M., Junges, S., Mac\u00e1k, F.: Small decision trees for MDPs with deductive synthesis (2025). https:\/\/arxiv.org\/abs\/2501.10126","DOI":"10.1007\/978-3-031-98679-6_8"},{"key":"8_CR6","doi-asserted-by":"crossref","unstructured":"Ashok, P., Jackermeier, M., K\u0159et\u00ednsk\u00fd, J., Weinhuber, C., Weininger, M.: dtcontrol 2.0: explainable strategy representation via decision tree learning steered by experts. In: TACAS. LNCS, vol. 12652, pp. 326\u2013345. Springer (2021)","DOI":"10.1007\/978-3-030-72013-1_17"},{"key":"8_CR7","doi-asserted-by":"crossref","unstructured":"Ashok, P., K\u0159et\u00ednsk\u00fd, J., Larsen, K.G., Le\u00a0Co\u00ebnt, A., Taankvist, J.H., Weininger, M.: SOS: safe, optimal and small strategies for hybrid Markov decision processes. In: QEST, pp. 147\u2013164. Springer (2019)","DOI":"10.1007\/978-3-030-30281-8_9"},{"key":"8_CR8","doi-asserted-by":"crossref","unstructured":"Baier, C., de\u00a0Alfaro, L., Forejt, V., Kwiatkowska, M.: Model checking probabilistic systems. In: Handbook of Model Checking, pp. 963\u2013999. Springer (2018)","DOI":"10.1007\/978-3-319-10575-8_28"},{"key":"8_CR9","unstructured":"Bastani, O., Pu, Y., Solar-Lezama, A.: Verifiable reinforcement learning via policy extraction. In: Advances in Neural Information Processing Systems, vol. 
31 (2018)"},{"key":"8_CR10","doi-asserted-by":"crossref","unstructured":"Batz, K., Biskup, T.J., Katoen, J.P., Winkler, T.: Programmatic strategy synthesis: resolving nondeterminism in probabilistic programs. Proc. ACM Program. Lang. 8(POPL), 2792\u20132820 (2024)","DOI":"10.1145\/3632935"},{"key":"8_CR11","unstructured":"Biere, A., Heule, M., van Maaren, H.: Handbook of Satisfiability, vol.\u00a0185. IOS Press (2009)"},{"key":"8_CR12","doi-asserted-by":"crossref","unstructured":"Br\u00e1zdil, T., Chatterjee, K., Chmel\u00edk, M., Fellner, A., K\u0159et\u00ednsk\u00fd, J.: Counterexample explanation by learning small strategies in Markov decision processes. In: CAV, pp. 158\u2013177. Springer (2015)","DOI":"10.1007\/978-3-319-21690-4_10"},{"key":"8_CR13","doi-asserted-by":"crossref","unstructured":"Budde, C.E., et al.: On correctness, precision, and performance in quantitative verification: QComp 2020 competition report. In: ISoLA, pp. 216\u2013241. Springer (2020)","DOI":"10.1007\/978-3-030-83723-5_15"},{"key":"8_CR14","doi-asserted-by":"crossref","unstructured":"\u010ce\u0161ka, M., Jansen, N., Junges, S., Katoen, J.P.: Shepherding hordes of Markov chains. In: TACAS. LNCS, vol. 11428, pp. 172\u2013190. Springer (2019)","DOI":"10.1007\/978-3-030-17465-1_10"},{"key":"8_CR15","doi-asserted-by":"crossref","unstructured":"Chatterjee, K., Chmelik, M., Topcu, U.: Sensor synthesis for POMDPs with reachability objectives. In: ICAPS, pp. 47\u201355. AAAI Press (2018)","DOI":"10.1609\/icaps.v28i1.13875"},{"key":"8_CR16","doi-asserted-by":"crossref","unstructured":"David, A., Jensen, P.G., Larsen, K.G., Miku\u010dionis, M., Taankvist, J.H.: Uppaal stratego. In: TACAS, pp. 206\u2013211. Springer (2015)","DOI":"10.1007\/978-3-662-46681-0_16"},{"key":"8_CR17","doi-asserted-by":"crossref","unstructured":"Dehnert, C., Jansen, N., Wimmer, R., \u00c1brah\u00e1m, E., Katoen, J.P.: Fast debugging of PRISM models. In: ATVA. LNCS, vol.\u00a08837, pp. 146\u2013162. 
Springer (2014)","DOI":"10.1007\/978-3-319-11936-6_11"},{"key":"8_CR18","doi-asserted-by":"crossref","unstructured":"Delgrange, F., Katoen, J.P., Quatmann, T., Randour, M.: Simple strategies in multi-objective MDPs. In: TACAS, pp. 346\u2013364. Springer (2020)","DOI":"10.1007\/978-3-030-45190-5_19"},{"key":"8_CR19","doi-asserted-by":"crossref","unstructured":"Dr\u00e4ger, K., Forejt, V., Kwiatkowska, M., Parker, D., Ujma, M.: Permissive controller synthesis for probabilistic systems. Log. Methods Comput. Sci. 11 (2015)","DOI":"10.2168\/LMCS-11(2:16)2015"},{"key":"8_CR20","doi-asserted-by":"crossref","unstructured":"Dubslaff, C., Kl\u00f6s, V., P\u00e4\u00dfler, J.: Template decision diagrams for meta control and explainability. In: XAI, pp. 219\u2013242. Springer (2024)","DOI":"10.1007\/978-3-031-63797-1_12"},{"key":"8_CR21","doi-asserted-by":"crossref","unstructured":"Gupta, U.D., Talvitie, E., Bowling, M.: Policy tree: adaptive representation for policy gradient. In: AAAI, vol.\u00a029 (2015)","DOI":"10.1609\/aaai.v29i1.9613"},{"issue":"2","key":"8_CR22","doi-asserted-by":"publisher","first-page":"114","DOI":"10.1006\/inco.1996.0040","volume":"126","author":"T Hancock","year":"1996","unstructured":"Hancock, T., Jiang, T., Li, M., Tromp, J.: Lower bounds on learning decision lists and trees. Inf. Comput. 126(2), 114\u2013122 (1996)","journal-title":"Inf. Comput."},{"key":"8_CR23","unstructured":"Hauskrecht, M.: Incremental methods for computing bounds in partially observable Markov decision processes. In: AAAI\/IAAI, pp. 734\u2013739 (1997)"},{"issue":"4","key":"8_CR24","doi-asserted-by":"publisher","first-page":"589","DOI":"10.1007\/s10009-021-00633-z","volume":"24","author":"C Hensel","year":"2022","unstructured":"Hensel, C., Junges, S., Katoen, J., Quatmann, T., Volk, M.: The probabilistic model checker Storm. Int. J. Softw. Tools Technol. Transf. 24(4), 589\u2013610 (2022)","journal-title":"Int. J. Softw. Tools Technol. 
Transf."},{"key":"8_CR25","doi-asserted-by":"crossref","unstructured":"Junges, S., Jansen, N., Dehnert, C., Topcu, U., Katoen, J.: Safety-constrained reinforcement learning for MDPs. In: TACAS. LNCS, vol.\u00a09636, pp. 130\u2013146. Springer (2016)","DOI":"10.1007\/978-3-662-49674-9_8"},{"key":"8_CR26","unstructured":"Kohler, H., Delfosse, Q., Akrour, R., Kersting, K., Preux, P.: Interpretable and editable programmatic tree policies for reinforcement learning. Preprint arXiv:2405.14956 (2024)"},{"key":"8_CR27","doi-asserted-by":"crossref","unstructured":"Konsta, A.M., Lluch\u00a0Lafuente, A., Matheja, C.: What should be observed for optimal reward in POMDPs? In: CAV, pp. 373\u2013394. Springer (2024)","DOI":"10.1007\/978-3-031-65633-0_17"},{"key":"8_CR28","doi-asserted-by":"crossref","unstructured":"Kumar, A., Zilberstein, S.: History-based controller design and optimization for partially observable MDPs. In: ICAPS, pp. 156\u2013164. AAAI Press (2015)","DOI":"10.1609\/icaps.v25i1.13730"},{"key":"8_CR29","doi-asserted-by":"crossref","unstructured":"Kwiatkowska, M., Norman, G., Parker, D.: PRISM 4.0: verification of probabilistic real-time systems. In: CAV. LNCS, vol.\u00a06806, pp. 585\u2013591. Springer (2011)","DOI":"10.1007\/978-3-642-22110-1_47"},{"issue":"1","key":"8_CR30","doi-asserted-by":"publisher","first-page":"15","DOI":"10.1016\/0020-0190(76)90095-8","volume":"5","author":"L Hyafil","year":"1976","unstructured":"Hyafil, L., Rivest, R.L.: Constructing optimal binary decision trees is NP-complete. Inf. Process. Lett. 5(1), 15\u201317 (1976)","journal-title":"Inf. Process. Lett."},{"issue":"3","key":"8_CR31","doi-asserted-by":"publisher","first-page":"556","DOI":"10.1016\/j.ejor.2010.12.014","volume":"211","author":"Y Li","year":"2011","unstructured":"Li, Y., Yin, B., Xi, H.: Finding optimal memoryless policies of POMDPs under the expected average reward criterion. Eur. J. Oper. Res. 211(3), 556\u2013567 (2011)","journal-title":"Eur. J. Oper. 
Res."},{"key":"8_CR32","doi-asserted-by":"publisher","DOI":"10.1016\/j.robot.2020.103568","volume":"131","author":"A Likmeta","year":"2020","unstructured":"Likmeta, A., Metelli, A.M., Tirinzoni, A., Giol, R., Restelli, M., Romano, D.: Combining reinforcement learning with rule-based controllers for transparent and general decision-making in autonomous driving. Robot. Auton. Syst. 131, 103568 (2020)","journal-title":"Robot. Auton. Syst."},{"key":"8_CR33","doi-asserted-by":"crossref","unstructured":"de\u00a0Moura, L., Bj\u00f8rner, N.: Z3: an efficient SMT solver. In: TACAS, pp. 337\u2013340. Springer (2008)","DOI":"10.1007\/978-3-540-78800-3_24"},{"key":"8_CR34","doi-asserted-by":"crossref","unstructured":"Narodytska, N., Ignatiev, A., Pereira, F., Marques-Silva, J.: Learning optimal decision trees with SAT. In: IJCAI, pp. 1362\u20131368. AAAI Press (2018)","DOI":"10.24963\/ijcai.2018\/189"},{"key":"8_CR35","doi-asserted-by":"crossref","unstructured":"Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley Series in Probability and Statistics, Wiley (1994)","DOI":"10.1002\/9780470316887"},{"key":"8_CR36","unstructured":"Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley (2014)"},{"key":"8_CR37","unstructured":"Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press (2018)"},{"key":"8_CR38","doi-asserted-by":"crossref","unstructured":"Topin, N., Milani, S., Fang, F., Veloso, M.: Iterative bounding MDPs: learning interpretable policies via non-interpretable methods. In: AAAI, vol.\u00a035, pp. 9923\u20139931 (2021)","DOI":"10.1609\/aaai.v35i11.17192"},{"issue":"4\u20135","key":"8_CR39","doi-asserted-by":"publisher","first-page":"637","DOI":"10.1007\/s00165-021-00547-2","volume":"33","author":"M \u010ce\u0161ka","year":"2021","unstructured":"\u010ce\u0161ka, M., Hensel, C., Junges, S., Katoen, J.P.: Counterexample-guided inductive synthesis for probabilistic systems. Form. 
Asp. Comput. 33(4\u20135), 637\u2013667 (2021)","journal-title":"Form. Asp. Comput."},{"key":"8_CR40","unstructured":"Verma, A., Murali, V., Singh, R., Kohli, P., Chaudhuri, S.: Programmatically interpretable reinforcement learning. In: ICML, vol.\u00a080, pp. 5052\u20135061. PMLR (2018)"},{"key":"8_CR41","doi-asserted-by":"crossref","unstructured":"Vos, D., Verwer, S.: Optimal decision tree policies for Markov decision processes. In: IJCAI, pp. 5457\u20135465 (2023)","DOI":"10.24963\/ijcai.2023\/606"}],"container-title":["Lecture Notes in Computer Science","Computer Aided Verification"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/978-3-031-98679-6_8","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,7]],"date-time":"2025-09-07T16:17:41Z","timestamp":1757261861000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/978-3-031-98679-6_8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025]]},"ISBN":["9783031986789","9783031986796"],"references-count":41,"URL":"https:\/\/doi.org\/10.1007\/978-3-031-98679-6_8","relation":{},"ISSN":["0302-9743","1611-3349"],"issn-type":[{"type":"print","value":"0302-9743"},{"type":"electronic","value":"1611-3349"}],"subject":[],"published":{"date-parts":[[2025]]},"assertion":[{"value":"22 July 2025","order":1,"name":"first_online","label":"First Online","group":{"name":"ChapterHistory","label":"Chapter History"}},{"value":"The authors have no competing interests to declare that are relevant to the content of this article.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Disclosure of Interests"}},{"value":"CAV","order":1,"name":"conference_acronym","label":"Conference Acronym","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"International Conference on Computer Aided 
Verification","order":2,"name":"conference_name","label":"Conference Name","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Zagreb","order":3,"name":"conference_city","label":"Conference City","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Croatia","order":4,"name":"conference_country","label":"Conference Country","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"2025","order":5,"name":"conference_year","label":"Conference Year","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"21 July 2025","order":7,"name":"conference_start_date","label":"Conference Start Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"25 July 2025","order":8,"name":"conference_end_date","label":"Conference End Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"37","order":9,"name":"conference_number","label":"Conference Number","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"cav2025","order":10,"name":"conference_id","label":"Conference ID","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"https:\/\/conferences.i-cav.org\/2025\/","order":11,"name":"conference_url","label":"Conference URL","group":{"name":"ConferenceInfo","label":"Conference Information"}}]}}