{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,11]],"date-time":"2025-09-11T18:54:19Z","timestamp":1757616859935,"version":"3.44.0"},"reference-count":57,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2024,8,17]],"date-time":"2024-08-17T00:00:00Z","timestamp":1723852800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,8,17]],"date-time":"2024-08-17T00:00:00Z","timestamp":1723852800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001843","name":"Science and Engineering Research Board","doi-asserted-by":"publisher","award":["SRG\/2021\/000466"],"award-info":[{"award-number":["SRG\/2021\/000466"]}],"id":[{"id":"10.13039\/501100001843","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001659","name":"Deutsche Forschungsgemeinschaft","doi-asserted-by":"publisher","award":["427755713"],"award-info":[{"award-number":["427755713"]}],"id":[{"id":"10.13039\/501100001659","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Grant Agency of Masaryk University","award":["MUNI\/I\/1757\/2021"],"award-info":[{"award-number":["MUNI\/I\/1757\/2021"]}]},{"DOI":"10.13039\/501100001405","name":"Tata Institute of Fundamental Research","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100001405","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Form Methods Syst Des"],"published-print":{"date-parts":[[2025,8]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>Markov decision processes (MDPs) and continuous-time MDP (CTMDPs) are the fundamental models for non-deterministic systems with probabilistic uncertainty. Mean payoff (a.k.a. 
long-run average reward) is one of the most classic objectives considered in their context. We provide the first practical algorithm to compute mean payoff probably approximately correctly in unknown MDPs. Our algorithm is anytime in the sense that if terminated prematurely, it returns an approximate value with the required confidence. Further, we extend it to unknown CTMDPs. We do not require any knowledge of the state or number of successors of a state, but only a lower bound on the minimum transition probability, which has been advocated in the literature. Our algorithm learns the unknown MDP\/CTMDP through repeated, directed sampling, thus spending less time on learning components with smaller impact on the mean payoff. In addition to providing probably approximately correct (PAC) bounds for our algorithm, we also demonstrate its practical nature by running experiments on standard benchmarks.<\/jats:p>","DOI":"10.1007\/s10703-024-00463-0","type":"journal-article","created":{"date-parts":[[2024,8,16]],"date-time":"2024-08-16T23:24:19Z","timestamp":1723850659000},"page":"195-237","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["PAC statistical model checking of mean payoff in discrete- and continuous-time MDP"],"prefix":"10.1007","volume":"66","author":[{"given":"Chaitanya","family":"Agarwal","sequence":"first","affiliation":[]},{"given":"Shibashis","family":"Guha","sequence":"additional","affiliation":[]},{"given":"Jan","family":"K\u0159et\u00ednsk\u00fd","sequence":"additional","affiliation":[]},{"given":"M.","family":"Pazhamalai","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,8,17]]},"reference":[{"key":"463_CR1","doi-asserted-by":"crossref","unstructured":"Agarwal C, Guha S, Kret\u00ednsk\u00fd J, Pazhamalai M (2022) PAC statistical model checking of mean payoff in discrete- and continuous-time MDP. 
arXiv:2206.01465","DOI":"10.1007\/978-3-031-13188-2_1"},{"key":"463_CR2","doi-asserted-by":"crossref","unstructured":"Israeli A, Jalfon M (1990) Token management schemes and random walks yield self-stabilizing mutual exclusion. In PODC, pp 119\u2013131","DOI":"10.1145\/93385.93409"},{"key":"463_CR3","doi-asserted-by":"crossref","unstructured":"Ashok P, Chatterjee K, Daca P, Kret\u00ednsk\u00fd J, Meggendorfer T (2017) Value iteration for long-run average reward in Markov decision processes. In CAV (1). Springer, pp 201\u2013221","DOI":"10.1007\/978-3-319-63387-9_10"},{"key":"463_CR4","doi-asserted-by":"crossref","unstructured":"Ashok P, Kret\u00ednsk\u00fd J, Weininger M (2019) PAC statistical model checking for Markov decision processes and stochastic games. In CAV, Part I, vol 11561 of LNCS. Springer, pp 497\u2013519","DOI":"10.1007\/978-3-030-25540-4_29"},{"issue":"1","key":"463_CR5","doi-asserted-by":"publisher","first-page":"441","DOI":"10.1016\/0196-6774(90)90021-6","volume":"15","author":"J Aspnes","year":"1990","unstructured":"Aspnes J, Herlihy M (1990) Fast randomized consensus using shared memory. J Algorithms 15(1):441\u2013460","journal-title":"J Algorithms"},{"key":"463_CR6","doi-asserted-by":"crossref","unstructured":"Auer P, Ortner R (2006) Logarithmic online regret bounds for undiscounted reinforcement learning. In NIPS. MIT Press, pp 49\u201356","DOI":"10.7551\/mitpress\/7503.003.0011"},{"issue":"9","key":"463_CR7","doi-asserted-by":"publisher","first-page":"76","DOI":"10.1145\/1810891.1810912","volume":"53","author":"C Baier","year":"2010","unstructured":"Baier C, Haverkort BR, Hermanns H, Katoen J-P (2010) Performance evaluation and model checking join forces. Commun ACM 53(9):76\u201385","journal-title":"Commun ACM"},{"key":"463_CR8","unstructured":"Baier C, Katoen J-P (2008) Principles of model checking. MIT Press"},{"key":"463_CR9","unstructured":"Bertsekas DP (1995) Dynamic programming and optimal control, vol\u00a0II. 
Athena Scientific"},{"issue":"1","key":"463_CR10","first-page":"1","volume":"10","author":"T Br\u00e1zdil","year":"2014","unstructured":"Br\u00e1zdil T, Bro\u017eek V, Chatterjee K, Forejt V, Ku\u010dera A (2014) Two views on multiple mean-payoff objectives in Markov decision processes. LMCS 10(1):1\u201329","journal-title":"LMCS"},{"key":"463_CR11","doi-asserted-by":"crossref","unstructured":"Br\u00e1zdil T, Chatterjee K, Chmelik M, Forejt V, K\u0159et\u00ednsk\u00fd J, Kwiatkowska MZ, Parker D, Ujma M (2014) Verification of Markov decision processes using learning algorithms. In ATVA. Springer, pp 98\u2013114","DOI":"10.1007\/978-3-319-11936-6_8"},{"key":"463_CR12","doi-asserted-by":"crossref","unstructured":"Br\u00e1zdil T, Chatterjee K, Chmelik M, Forejt V, K\u0159et\u00ednsk\u00fd J, Kwiatkowska MZ, Parker D, Ujma M (2014) Verification of Markov decision processes using learning algorithms. arXiv:1402.2967","DOI":"10.1007\/978-3-319-11936-6_8"},{"issue":"1","key":"463_CR13","doi-asserted-by":"publisher","first-page":"100","DOI":"10.1145\/322234.322242","volume":"28","author":"JL Bruno","year":"1981","unstructured":"Bruno JL, Downey PJ, Frederickson GN (1981) Sequencing tasks with exponential service times to minimize the expected flow time or makespan. J ACM 28(1):100\u2013113","journal-title":"J ACM"},{"key":"463_CR14","unstructured":"Butkova Y (2020) Towards efficient analysis of Markov automata. PhD thesis, Saarland University, Saarbr\u00fccken, Germany"},{"key":"463_CR15","doi-asserted-by":"crossref","unstructured":"Chatterjee K (2012) Robustness of structurally equivalent concurrent parity games. In FOSSACS, pp 270\u2013285","DOI":"10.1007\/978-3-642-28729-9_18"},{"key":"463_CR16","doi-asserted-by":"crossref","unstructured":"Daca P, Henzinger TA, K\u0159et\u00ednsk\u00fd J, Petrov T (2016) Faster statistical model checking for unbounded temporal properties. In TACAS. 
Springer Nature, pp 112\u2013129","DOI":"10.1007\/978-3-662-49674-9_7"},{"key":"463_CR17","doi-asserted-by":"crossref","unstructured":"Dehnert C, Junges S, Katoen J-P, Volk M (2017) A storm is coming: a modern probabilistic model checker. In CAV","DOI":"10.1007\/978-3-319-63390-9_31"},{"key":"463_CR18","doi-asserted-by":"crossref","unstructured":"Dembo A, Zeitouni O (2010) Large deviations techniques and applications. Springer","DOI":"10.1007\/978-3-642-03311-7"},{"issue":"3","key":"463_CR19","doi-asserted-by":"publisher","first-page":"492","DOI":"10.1287\/moor.1040.0089","volume":"29","author":"EA Feinberg","year":"2004","unstructured":"Feinberg EA (2004) Continuous time discounted jump Markov decision processes: a discrete-event approach. Math Oper Res 29(3):492\u2013524","journal-title":"Math Oper Res"},{"key":"463_CR20","unstructured":"Feinberg EA, Shwartz A (2012) Handbook of Markov decision processes: methods and applications, vol\u00a040. Springer Science & Business Media"},{"key":"463_CR21","doi-asserted-by":"crossref","unstructured":"Fu J, Topcu U (2014) Probably approximately correct MDP learning and control with temporal logic constraints. Sci Syst Robot","DOI":"10.15607\/RSS.2014.X.039"},{"key":"463_CR22","doi-asserted-by":"crossref","unstructured":"Ghemawat S, Gobioff H, Leung S (2003) The google file system. In SOSP","DOI":"10.1145\/945449.945450"},{"key":"463_CR23","doi-asserted-by":"crossref","unstructured":"Guck D, Hatefi H, Hermanns H, Katoen J, Timmer M (2013) Modelling, reduction and analysis of Markov automata. In QEST, vol 8054 of LNCS. Springer, pp 55\u201371","DOI":"10.1007\/978-3-642-40196-1_5"},{"key":"463_CR24","unstructured":"Hahn EM, Perez M, Schewe S, Somenzi F, Trivedi A, Wojtczak D (2021) Mungojerrie: Reinforcement learning of linear-time objectives. 
arXiv:2106.09161"},{"key":"463_CR25","doi-asserted-by":"crossref","unstructured":"Hahn EM, Perez M, Schewe S, Somenzi F, Trivedi A, Wojtczak D (2019) Omega-regular objectives in model-free reinforcement learning. In TACAS (1). Springer, pp 395\u2013412","DOI":"10.1007\/978-3-030-17462-0_27"},{"key":"463_CR26","doi-asserted-by":"crossref","unstructured":"Hartmanns A, Klauck M, Parker D, Quatmann T, Ruijters E (2019) The quantitative verification benchmark set. In TACAS, Proceedings, Part I, vol 11427 of LNCS. Springer, pp 344\u2013350","DOI":"10.1007\/978-3-030-17462-0_20"},{"key":"463_CR27","unstructured":"Hatefi H, Hermanns H (2012) Model checking algorithms for Markov automata. Electron Commun Eur Assoc Softw Sci Technol 53"},{"key":"463_CR28","doi-asserted-by":"crossref","unstructured":"Haverkort BR, Hermanns H, Katoen J-P (2000) On the use of model checking techniques for dependability evaluation. In SRDS\u201900","DOI":"10.1109\/RELDI.2000.885410"},{"key":"463_CR29","doi-asserted-by":"crossref","unstructured":"Henriques D, Martins JG, Zuliani P, Platzer A, Clarke EM (2012) Statistical model checking for Markov decision processes. In QEST. IEEE Computer Society, pp 84\u201393","DOI":"10.1109\/QEST.2012.19"},{"key":"463_CR30","unstructured":"Hopper A, Temple S, Wheeler D, Williamson R (1986) Local area network design. Addison-Wesley"},{"key":"463_CR31","unstructured":"Jansen N, K\u00f6nighofer B, Junges S, Serban AC, Bloem R (2019) Safe reinforcement learning via probabilistic shields"},{"key":"463_CR32","doi-asserted-by":"crossref","unstructured":"Kelmendi E, Kr\u00e4mer J, Kret\u00ednsk\u00fd J, Weininger M (2018) Value iteration for simple stochastic games: stopping criterion and learning algorithm. In CAV, 2018, Proceedings, Part I, vol 10981 of LNCS. 
Springer, pp 623\u2013642","DOI":"10.1007\/978-3-319-96145-3_36"},{"key":"463_CR33","doi-asserted-by":"publisher","DOI":"10.1002\/9780470517147","volume-title":"Synchronization and arbitration in digital systems","author":"DJ Kinniment","year":"2007","unstructured":"Kinniment DJ (2007) Synchronization and arbitration in digital systems. Wiley-Blackwell, Hoboken"},{"key":"463_CR34","doi-asserted-by":"crossref","unstructured":"Komuravelli A, P\u0103s\u0103reanu CS, Clarke EM (2012) Assume-guarantee abstraction refinement for probabilistic systems. In CAV. Springer, pp 310\u2013326","DOI":"10.1007\/978-3-642-31424-7_25"},{"key":"463_CR35","unstructured":"Kret\u00ednsk\u00fd J, Michel F, Michel L, P\u00e9rez GA (2020) Finite-memory near-optimal learning for Markov decision processes with long-run average reward. In UAI, vol 124 of proceedings of machine learning research. AUAI Press, pp 1149\u20131158"},{"key":"463_CR36","unstructured":"K\u0159et\u00ednsk\u00fd J, P\u00e9rez GA, Raskin J-F (2018) Learning-based mean-payoff optimization in an unknown MDP under omega-regular constraints. In CONCUR. Dagstuhl, pp 8:1\u20138:18"},{"key":"463_CR37","doi-asserted-by":"publisher","first-page":"33","DOI":"10.1007\/s10703-006-0005-2","volume":"29","author":"M Kwiatkowska","year":"2006","unstructured":"Kwiatkowska M, Norman G, Parker D, Sproston J (2006) Performance analysis of probabilistic timed automata using digital clocks. Formal Methods Syst Des 29:33\u201378","journal-title":"Formal Methods Syst Des"},{"issue":"12\u201313","key":"463_CR38","doi-asserted-by":"publisher","first-page":"1272","DOI":"10.1016\/j.tcs.2008.12.058","volume":"410","author":"M Kwiatkowska","year":"2009","unstructured":"Kwiatkowska M, Norman G, Parker D, Vigliotti MG (2009) Probabilistic mobile ambients. 
Theoret Comput Sci 410(12\u201313):1272\u20131303","journal-title":"Theoret Comput Sci"},{"key":"463_CR39","doi-asserted-by":"crossref","unstructured":"Kwiatkowska MZ, Norman G, Parker D (2002) PRISM: probabilistic symbolic model checker. In TOOLS, volume 2324 of LNCS. Springer, pp 200\u2013204","DOI":"10.1007\/3-540-46029-2_13"},{"key":"463_CR40","doi-asserted-by":"crossref","unstructured":"Kwiatkowska MZ, Norman G, Parker D (2012) The PRISM benchmark suite. In QEST. IEEE Computer Society, pp 203\u2013204","DOI":"10.1109\/QEST.2012.14"},{"key":"463_CR41","doi-asserted-by":"crossref","unstructured":"Kwiatkowska MZ, Norman G, Sproston J (2002) Probabilistic model checking of the IEEE 802.11 wireless local area network protocol. In PAPM-PROBMIV, vol 2399 of LNCS. Springer, pp 169\u2013187","DOI":"10.1007\/3-540-45605-8_11"},{"key":"463_CR42","doi-asserted-by":"crossref","unstructured":"Lassaigne R, Peyronnet S (2012) Approximate planning and verification for large Markov decision processes. In SAC. ACM, pp 1314\u20131319","DOI":"10.1145\/2245276.2231984"},{"key":"463_CR43","doi-asserted-by":"crossref","unstructured":"McIver A, Morgan C (2002) Games, probability and the quantitative mu-calculus QMU. In Baaz M, Voronkov A (eds) Proceedings of LPAR\u201902, vol 2514 of LNAI. Springer","DOI":"10.1007\/3-540-36078-6_20"},{"key":"463_CR44","doi-asserted-by":"crossref","unstructured":"McIver A, Morgan C (2007) Results on the quantitative $$\\mu$$-calculus qMu. ACM Trans Comput Logic 8(1)","DOI":"10.1145\/1182613.1182616"},{"key":"463_CR45","doi-asserted-by":"crossref","unstructured":"McMahan HB, Likhachev M, Gordon GJ (2005) Bounded real-time dynamic programming: RTDP with monotone upper bounds and performance guarantees. In ICML, pp 569\u2013576","DOI":"10.1145\/1102351.1102423"},{"key":"463_CR46","doi-asserted-by":"crossref","unstructured":"Puterman ML (1994) Markov decision processes: discrete stochastic dynamic programming. 
Wiley","DOI":"10.1002\/9780470316887"},{"key":"463_CR47","doi-asserted-by":"crossref","unstructured":"Qiu Q, Wu Q, Pedram M (1999) Stochastic modeling of a power-managed system: construction and optimization. In International symposium on low power electronics and design. ACM, pp 194\u2013199","DOI":"10.1145\/313817.313923"},{"issue":"10","key":"463_CR48","doi-asserted-by":"publisher","first-page":"1200","DOI":"10.1109\/43.952737","volume":"20","author":"Q Qiu","year":"2001","unstructured":"Qiu Q, Qu Q, Pedram M (2001) Stochastic modeling of a power-managed system-construction and optimization. IEEE Trans CAD Integr Circ Syst 20(10):1200\u20131217","journal-title":"IEEE Trans CAD Integr Circ Syst"},{"key":"463_CR49","doi-asserted-by":"crossref","unstructured":"Quatmann T, Junges S, Katoen J (2017) Markov automata with multiple objectives. In CAV - Part I. Springer, pp 140\u2013159","DOI":"10.1007\/978-3-319-63387-9_7"},{"key":"463_CR50","volume-title":"Stochastic dynamic programming and the control of queueing systems","author":"LI Sennott","year":"1999","unstructured":"Sennott LI (1999) Stochastic dynamic programming and the control of queueing systems. Wiley-Interscience, New York"},{"key":"463_CR51","doi-asserted-by":"publisher","first-page":"831","DOI":"10.1023\/B:JOTP.0000011995.28536.ef","volume":"16","author":"E Solan","year":"2003","unstructured":"Solan E (2003) Continuity of the value of competitive Markov decision processes. J Theor Probab 16:831\u2013845","journal-title":"J Theor Probab"},{"key":"463_CR52","doi-asserted-by":"crossref","unstructured":"Strehl AL, Li L, Wiewiora E, Langford J, Littman ML (2006) PAC model-free reinforcement learning. In ICML. ACM, pp 881\u2013888","DOI":"10.1145\/1143844.1143955"},{"key":"463_CR53","doi-asserted-by":"crossref","unstructured":"Sutton RS, Barto AG (1998) Reinforcement learning: an introduction. Adaptive computation and machine learning. 
MIT Press","DOI":"10.1109\/TNN.1998.712192"},{"key":"463_CR54","unstructured":"Ortner R, Jaksch T, Auer P (2010) Near-optimal regret bounds for reinforcement learning. J Mach Learn Res, pp 1563\u20131600"},{"key":"463_CR55","doi-asserted-by":"crossref","unstructured":"Younes HLS, Simmons RG (2002) Probabilistic verification of discrete event systems using acceptance sampling. In CAV. Springer, pp 223\u2013235","DOI":"10.1007\/3-540-45657-0_17"},{"key":"463_CR56","doi-asserted-by":"crossref","unstructured":"Zhang L, Neuh\u00e4u\u00dfer MR (2010) Model checking interactive Markov chains. In TACAS. Springer, pp 53\u201368","DOI":"10.1007\/978-3-642-12002-2_5"},{"key":"463_CR57","unstructured":"Sen K, Viswanathan M (2004) Statistical model checking of black-box probabilistic systems. In CAV. Springer, pp 202\u2013215"}],"container-title":["Formal Methods in System Design"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10703-024-00463-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10703-024-00463-0\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10703-024-00463-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,5]],"date-time":"2025-09-05T20:53:44Z","timestamp":1757105624000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10703-024-00463-0"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,8,17]]},"references-count":57,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2025,8]]}},"alternative-id":["463"],"URL":"https:\/\/doi.org\/10.1007\/s10703-024-00463-0","relation":{},"ISSN":["0925-9856","1572-8102"],"issn-ty
pe":[{"type":"print","value":"0925-9856"},{"type":"electronic","value":"1572-8102"}],"subject":[],"published":{"date-parts":[[2024,8,17]]},"assertion":[{"value":"15 June 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"19 July 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"17 August 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethical Approval"}},{"value":"The authors have no relevant financial or non-financial interests to disclose.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}