{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,14]],"date-time":"2026-01-14T17:15:54Z","timestamp":1768410954848,"version":"3.49.0"},"publisher-location":"California","reference-count":0,"publisher":"International Joint Conferences on Artificial Intelligence Organization","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,8]]},"abstract":"<jats:p>In many real-world reinforcement learning (RL) problems, in addition to maximizing the objective, the learning agent must also satisfy certain safety constraints. We formulate the problem of learning a safe policy as an infinite-horizon discounted Constrained Markov Decision Process (CMDP) with an unknown transition probability matrix, where the safety requirements are modeled as constraints on expected cumulative costs. We propose two model-based constrained reinforcement learning (CRL) algorithms for learning a safe policy: (i) the GM-CRL algorithm, which has access to a generative model, and (ii) the UC-CRL algorithm, which learns the model using an upper-confidence-style online exploration method. We characterize the sample complexity of these algorithms, i.e., the number of samples needed to ensure a desired level of accuracy with high probability, with respect to both objective maximization and constraint satisfaction.<\/jats:p>","DOI":"10.24963\/ijcai.2021\/347","type":"proceedings-article","created":{"date-parts":[[2021,8,11]],"date-time":"2021-08-11T11:00:49Z","timestamp":1628679649000},"page":"2519-2525","source":"Crossref","is-referenced-by-count":4,"title":["Model-Based Reinforcement Learning for Infinite-Horizon Discounted Constrained Markov Decision Processes"],"prefix":"10.24963","author":[{"given":"Aria","family":"HasanzadeZonuzy","sequence":"first","affiliation":[{"name":"Texas A&M University"}]},{"given":"Dileep","family":"Kalathil","sequence":"additional","affiliation":[{"name":"Texas A&M University"}]},{"given":"Srinivas","family":"Shakkottai","sequence":"additional","affiliation":[{"name":"Texas A&M University"}]}],"member":"10584","event":{"name":"Thirtieth International Joint Conference on Artificial Intelligence {IJCAI-21}","theme":"Artificial Intelligence","location":"Montreal, Canada","acronym":"IJCAI-2021","number":"30","sponsor":["International Joint Conferences on Artificial Intelligence Organization (IJCAI)"],"start":{"date-parts":[[2021,8,19]]},"end":{"date-parts":[[2021,8,27]]}},"container-title":["Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence"],"original-title":[],"deposited":{"date-parts":[[2021,8,11]],"date-time":"2021-08-11T11:02:45Z","timestamp":1628679765000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.ijcai.org\/proceedings\/2021\/347"}},"subtitle":[],"proceedings-subject":"Artificial Intelligence Research Articles","short-title":[],"issued":{"date-parts":[[2021,8]]},"references-count":0,"URL":"https:\/\/doi.org\/10.24963\/ijcai.2021\/347","relation":{},"subject":[],"published":{"date-parts":[[2021,8]]}}}