{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,10]],"date-time":"2026-06-10T07:20:01Z","timestamp":1781076001563,"version":"3.54.1"},"publisher-location":"Cham","reference-count":32,"publisher":"Springer International Publishing","isbn-type":[{"value":"9783030816872","type":"print"},{"value":"9783030816889","type":"electronic"}],"license":[{"start":{"date-parts":[[2021,1,1]],"date-time":"2021-01-01T00:00:00Z","timestamp":1609459200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,7,15]],"date-time":"2021-07-15T00:00:00Z","timestamp":1626307200000},"content-version":"vor","delay-in-days":195,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>We study reinforcement learning for the optimal control of Branching Markov Decision Processes (BMDPs), a natural extension of (multitype) Branching Markov Chains (BMCs). The state of a (discrete-time) BMCs is a collection of entities of various types that, while spawning other entities, generate a payoff. In comparison with BMCs, where the evolution of a each entity of the same type follows the same probabilistic pattern, BMDPs allow an external controller to pick from a range of options. This permits us to study the best\/worst behaviour of the system. We generalise model-free reinforcement learning techniques to compute an optimal control strategy of an unknown BMDP in the limit. We present results of an implementation that demonstrate the practicality of the approach.<\/jats:p>","DOI":"10.1007\/978-3-030-81688-9_30","type":"book-chapter","created":{"date-parts":[[2021,7,16]],"date-time":"2021-07-16T16:20:47Z","timestamp":1626452447000},"page":"651-673","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Model-Free Reinforcement Learning for Branching Markov Decision Processes"],"prefix":"10.1007","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9348-7684","authenticated-orcid":false,"given":"Ernst Moritz","family":"Hahn","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4220-3212","authenticated-orcid":false,"given":"Mateo","family":"Perez","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9093-9518","authenticated-orcid":false,"given":"Sven","family":"Schewe","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2085-2003","authenticated-orcid":false,"given":"Fabio","family":"Somenzi","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9346-0126","authenticated-orcid":false,"given":"Ashutosh","family":"Trivedi","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5560-0546","authenticated-orcid":false,"given":"Dominik","family":"Wojtczak","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2021,7,15]]},"reference":[{"key":"30_CR1","doi-asserted-by":"crossref","unstructured":"Becker, N.: Estimation for discrete time branching processes with application to epidemics. In: Biometrics, pp. 515\u2013522 (1977)","DOI":"10.2307\/2529366"},{"key":"30_CR2","doi-asserted-by":"publisher","unstructured":"Br\u00e1zdil, T., Kiefer, S.: Stabilization of branching queueing networks. In: 29th International Symposium on Theoretical Aspects of Computer Science (STACS 2012), vol. 14, pp. 507\u2013518 (2012). https:\/\/doi.org\/10.4230\/LIPIcs.STACS.2012.507","DOI":"10.4230\/LIPIcs.STACS.2012.507"},{"key":"30_CR3","unstructured":"Brockman, G., Cheung, V., Pettersson, L., Schneider, J., Schulman, J., Tang, J., Zaremba, W.: OpenAI Gym. CoRR abs\/1606.01540 (2016)"},{"key":"30_CR4","series-title":"Lecture Notes in Computer Science","doi-asserted-by":"publisher","first-page":"271","DOI":"10.1007\/978-3-642-32589-2_26","volume-title":"Mathematical Foundations of Computer Science 2012","author":"T Chen","year":"2012","unstructured":"Chen, T., Dr\u00e4ger, K., Kiefer, S.: Model checking stochastic branching processes. In: Rovan, B., Sassone, V., Widmayer, P. (eds.) MFCS 2012. LNCS, vol. 7464, pp. 271\u2013282. Springer, Heidelberg (2012). https:\/\/doi.org\/10.1007\/978-3-642-32589-2_26"},{"issue":"10\u201311","key":"30_CR5","doi-asserted-by":"publisher","first-page":"381","DOI":"10.1016\/j.ipl.2013.02.015","volume":"113","author":"J Esparza","year":"2013","unstructured":"Esparza, J., Gaiser, A., Kiefer, S.: A strongly polynomial algorithm for criticality of branching processes and consistency of stochastic context-free grammars. Inf. Process. Lett. 113(10\u201311), 381\u2013385 (2013)","journal-title":"Inf. Process. Lett."},{"key":"30_CR6","doi-asserted-by":"publisher","first-page":"355","DOI":"10.1016\/j.ic.2018.02.013","volume":"261","author":"K Etessami","year":"2018","unstructured":"Etessami, K., Stewart, A., Yannakakis, M.: Greatest fixed points of probabilistic min\/max polynomial equations, and reachability for branching Markov decision processes. Inf. Comput. 261, 355\u2013382 (2018). https:\/\/doi.org\/10.1016\/j.ic.2018.02.013","journal-title":"Inf. Comput."},{"issue":"1","key":"30_CR7","doi-asserted-by":"publisher","first-page":"34","DOI":"10.1287\/moor.2018.0970","volume":"45","author":"K Etessami","year":"2020","unstructured":"Etessami, K., Stewart, A., Yannakakis, M.: Polynomial time algorithms for branching Markov decision processes and probabilistic min(max) polynomial bellman equations. Math. Oper. Res. 45(1), 34\u201362 (2020). https:\/\/doi.org\/10.1287\/moor.2018.0970","journal-title":"Math. Oper. Res."},{"key":"30_CR8","doi-asserted-by":"publisher","first-page":"308","DOI":"10.1016\/j.tcs.2018.12.018","volume":"777","author":"K Etessami","year":"2019","unstructured":"Etessami, K., Wojtczak, D., Yannakakis, M.: Recursive stochastic games with positive rewards. Theor. Comput. Sci. 777, 308\u2013328 (2019). https:\/\/doi.org\/10.1016\/j.tcs.2018.12.018","journal-title":"Theor. Comput. Sci."},{"issue":"1","key":"30_CR9","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/1462153.1462154","volume":"56","author":"K Etessami","year":"2009","unstructured":"Etessami, K., Yannakakis, M.: Recursive Markov chains, stochastic grammars, and monotone systems of nonlinear equations. J. ACM 56(1), 1\u201366 (2009)","journal-title":"J. ACM"},{"issue":"2","key":"30_CR10","doi-asserted-by":"publisher","first-page":"11:1","DOI":"10.1145\/2699431","volume":"62","author":"K Etessami","year":"2015","unstructured":"Etessami, K., Yannakakis, M.: Recursive Markov decision processes and recursive stochastic games. J. ACM 62(2), 11:1\u201311:69 (2015). https:\/\/doi.org\/10.1145\/2699431","journal-title":"J. ACM"},{"key":"30_CR11","unstructured":"Even-Dar, E., Mansour, Y., Bartlett, P.: Learning rates for q-learning. J. Mach. Learn. Res. 5(1) (2003)"},{"key":"30_CR12","doi-asserted-by":"crossref","unstructured":"Haccou, P., Haccou, P., Jagers, P., Vatutin, V.: Branching processes: variation, growth, and extinction of populations. No. 5 in Cambridge Studies in Adaptive Dynamics, Cambridge University Press (2005)","DOI":"10.1017\/CBO9780511629136"},{"key":"30_CR13","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-51866-9","volume-title":"The Theory of Branching Processes","author":"TE Harris","year":"1963","unstructured":"Harris, T.E.: The Theory of Branching Processes. Springer, Berlin (1963)"},{"key":"30_CR14","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4684-9469-3","volume-title":"I. J. Bienaym\u00e9: Statistical Theory Anticipated","author":"CC Heyde","year":"1977","unstructured":"Heyde, C.C., Seneta, E.: I. J. Bienaym\u00e9: Statistical Theory Anticipated. Springer, Heidelberg (1977). https:\/\/doi.org\/10.1007\/978-1-4684-9469-3"},{"key":"30_CR15","doi-asserted-by":"crossref","unstructured":"Jo, K.Y.: Optimal control of service in branching exponential queueing networks. In: 26th IEEE Conference on Decision and Control, vol. 26, pp. 1092\u20131097. IEEE (1987)","DOI":"10.1109\/CDC.1987.272570"},{"key":"30_CR16","series-title":"Lecture Notes in Computer Science","doi-asserted-by":"publisher","first-page":"296","DOI":"10.1007\/978-3-642-19835-9_28","volume-title":"Tools and Algorithms for the Construction and Analysis of Systems","author":"S Kiefer","year":"2011","unstructured":"Kiefer, S., Wojtczak, D.: On probabilistic parallel programs with process creation and synchronisation. In: Abdulla, P.A., Leino, K.R.M. (eds.) TACAS 2011. LNCS, vol. 6605, pp. 296\u2013310. Springer, Heidelberg (2011). https:\/\/doi.org\/10.1007\/978-3-642-19835-9_28"},{"key":"30_CR17","unstructured":"Kolmogorov, A.N., Sevastyanov, B.A.: The calculation of final probabilities for branching random processes. Doklady Akad. Nauk. U.S.S.R. (N.S.) 56, 783\u2013786 (1947)"},{"key":"30_CR18","series-title":"Lecture Notes in Computer Science","doi-asserted-by":"publisher","first-page":"585","DOI":"10.1007\/978-3-642-22110-1_47","volume-title":"Computer Aided Verification","author":"M Kwiatkowska","year":"2011","unstructured":"Kwiatkowska, M., Norman, G., Parker, D.: PRISM 4.0: verification of probabilistic real-time systems. In: Gopalakrishnan, G., Qadeer, S. (eds.) CAV 2011. LNCS, vol. 6806, pp. 585\u2013591. Springer, Heidelberg (2011). https:\/\/doi.org\/10.1007\/978-3-642-22110-1_47"},{"key":"30_CR19","doi-asserted-by":"crossref","unstructured":"Munsky, B., Khammash, M.: The finite state projection algorithm for the solution of the chemical master equation. J. Chem. Phys. 124(4), 044104+ (2006)","DOI":"10.1063\/1.2145882"},{"key":"30_CR20","series-title":"International Series in Operations Research & Management Science","doi-asserted-by":"publisher","first-page":"419","DOI":"10.1007\/978-1-4939-2483-7_19","volume-title":"Handbook of Operations Research in Agriculture and the Agri-Food Industry","author":"LR Nielsen","year":"2015","unstructured":"Nielsen, L.R., Kristensen, A.R.: Markov decision processes to model livestock systems. In: Pl\u00e0-Aragon\u00e9s, L.M. (ed.) Handbook of Operations Research in Agriculture and the Agri-Food Industry. ISORMS, vol. 224, pp. 419\u2013454. Springer, New York (2015). https:\/\/doi.org\/10.1007\/978-1-4939-2483-7_19"},{"key":"30_CR21","unstructured":"Perez, M., Somenzi, F., Trivedi, A.: Mungojerrie: formal reinforcement learning (2021). https:\/\/plv.colorado.edu\/mungojerrie\/. University of Colorado Boulder"},{"key":"30_CR22","unstructured":"Perron, L., Furnon, V.: Or-tools (version 7.2) (2019). https:\/\/developers.google.com\/optimization. Google"},{"issue":"2","key":"30_CR23","doi-asserted-by":"publisher","first-page":"117","DOI":"10.1287\/mnsc.23.2.117","volume":"23","author":"SR Pliska","year":"1976","unstructured":"Pliska, S.R.: Optimization of multitype branching processes. Manag. Sci. 23(2), 117\u2013124 (1976)","journal-title":"Manag. Sci."},{"key":"30_CR24","doi-asserted-by":"publisher","DOI":"10.1002\/9780470316887","volume-title":"Markov Decision Processes: Discrete Stochastic Dynamic Programming","author":"ML Puterman","year":"1994","unstructured":"Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, Hoboken (1994)"},{"issue":"4","key":"30_CR25","first-page":"595","volume":"44","author":"A Rao","year":"2008","unstructured":"Rao, A., Bauch, C.T.: Classical Galton-Watson branching process and vaccination. Int. J. Pure Appl. Math. 44(4), 595 (2008)","journal-title":"Int. J. Pure Appl. Math."},{"issue":"4","key":"30_CR26","doi-asserted-by":"publisher","first-page":"582","DOI":"10.1287\/moor.7.4.582","volume":"7","author":"UG Rothblum","year":"1982","unstructured":"Rothblum, U.G., Whittle, P.: Growth optimality for branching Markov decision chains. Math. Oper. Res. 7(4), 582\u2013601 (1982)","journal-title":"Math. Oper. Res."},{"key":"30_CR27","volume-title":"Reinforcement Learning: An Introduction","author":"RS Sutton","year":"2018","unstructured":"Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction, 2nd edn. MIT Press, Cambridge (2018)","edition":"2"},{"key":"30_CR28","doi-asserted-by":"crossref","unstructured":"Trivedi, A., Wojtczak, D.: Timed branching processes. In: 2010 Seventh International Conference on the Quantitative Evaluation of Systems, pp. 219\u2013228. IEEE (2010)","DOI":"10.1109\/QEST.2010.36"},{"issue":"2","key":"30_CR29","first-page":"31","volume":"4","author":"AU Udom","year":"2013","unstructured":"Udom, A.U.: A Markov decision process approach to optimal control of a multi-level hierarchical manpower system. CBN J. Appl. Stat. 4(2), 31\u201349 (2013)","journal-title":"CBN J. Appl. Stat."},{"issue":"3\u20134","key":"30_CR30","doi-asserted-by":"publisher","first-page":"279","DOI":"10.1007\/BF00992698","volume":"8","author":"CJ Watkins","year":"1992","unstructured":"Watkins, C.J., Dayan, P.: Q-learning. Mach. Learn. 8(3\u20134), 279\u2013292 (1992). https:\/\/doi.org\/10.1007\/BF00992698","journal-title":"Mach. Learn."},{"key":"30_CR31","first-page":"138","volume":"4","author":"HW Watson","year":"1874","unstructured":"Watson, H.W., Galton, F.: On the probability of the extinction of families. J. Anthrop. Inst. 4, 138\u2013144 (1874)","journal-title":"J. Anthrop. Inst."},{"key":"30_CR32","unstructured":"Wojtczak, D.: Recursive probabilistic models : efficient analysis and implementation. Ph.D. thesis, University of Edinburgh, UK (2009). http:\/\/hdl.handle.net\/1842\/3217"}],"container-title":["Lecture Notes in Computer Science","Computer Aided Verification"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/978-3-030-81688-9_30","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,7,16]],"date-time":"2021-07-16T16:28:15Z","timestamp":1626452895000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/978-3-030-81688-9_30"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021]]},"ISBN":["9783030816872","9783030816889"],"references-count":32,"URL":"https:\/\/doi.org\/10.1007\/978-3-030-81688-9_30","relation":{},"ISSN":["0302-9743","1611-3349"],"issn-type":[{"value":"0302-9743","type":"print"},{"value":"1611-3349","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021]]},"assertion":[{"value":"15 July 2021","order":1,"name":"first_online","label":"First Online","group":{"name":"ChapterHistory","label":"Chapter History"}},{"value":"CAV","order":1,"name":"conference_acronym","label":"Conference Acronym","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"International Conference on Computer Aided Verification","order":2,"name":"conference_name","label":"Conference Name","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"2021","order":5,"name":"conference_year","label":"Conference Year","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"20 July 2021","order":7,"name":"conference_start_date","label":"Conference Start Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"23 July 2021","order":8,"name":"conference_end_date","label":"Conference End Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"33","order":9,"name":"conference_number","label":"Conference Number","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"cav2021","order":10,"name":"conference_id","label":"Conference ID","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"http:\/\/i-cav.org\/2021\/","order":11,"name":"conference_url","label":"Conference URL","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Double-blind","order":1,"name":"type","label":"Type","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"EasyChair","order":2,"name":"conference_management_system","label":"Conference Management System","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"290","order":3,"name":"number_of_submissions_sent_for_review","label":"Number of Submissions Sent for Review","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"63","order":4,"name":"number_of_full_papers_accepted","label":"Number of Full Papers Accepted","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"0","order":5,"name":"number_of_short_papers_accepted","label":"Number of Short Papers Accepted","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"22% - The value is computed by the equation \"Number of Full Papers Accepted \/ Number of Submissions Sent for Review * 100\" and then rounded to a whole number.","order":6,"name":"acceptance_rate_of_full_papers","label":"Acceptance Rate of Full Papers","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"3","order":7,"name":"average_number_of_reviews_per_paper","label":"Average Number of Reviews per Paper","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"12","order":8,"name":"average_number_of_papers_per_reviewer","label":"Average Number of Papers per Reviewer","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"Yes","order":9,"name":"external_reviewers_involved","label":"External Reviewers Involved","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"16 tool papers and 5 invited papers are also included.","order":10,"name":"additional_info_on_review_process","label":"Additional Info on Review Process","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}}]}}