{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:24:15Z","timestamp":1750307055055,"version":"3.41.0"},"reference-count":33,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2013,1,1]],"date-time":"2013-01-01T00:00:00Z","timestamp":1356998400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001665","name":"Agence Nationale de la Recherche","doi-asserted-by":"publisher","award":["ANS-008-BLAN-0218 BigMC"],"award-info":[{"award-number":["ANS-008-BLAN-0218 BigMC"]}],"id":[{"id":"10.13039\/501100001665","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Model. Comput. Simul."],"published-print":{"date-parts":[[2013,1]]},"abstract":"<jats:p>We consider the inverse reinforcement learning problem, that is, the problem of learning from, and then predicting or mimicking a controller based on state\/action data. We propose a statistical model for such data, derived from the structure of a Markov decision process. Adopting a Bayesian approach to inference, we show how latent variables of the model can be estimated, and how predictions about actions can be made, in a unified framework. A new Markov chain Monte Carlo (MCMC) sampler is devised for simulation from the posterior distribution. This step includes a parameter expansion step, which is shown to be essential for good convergence properties of the MCMC sampler. 
As an illustration, the method is applied to learning a human controller.<\/jats:p>","DOI":"10.1145\/2414416.2414420","type":"journal-article","created":{"date-parts":[[2013,1,29]],"date-time":"2013-01-29T16:20:55Z","timestamp":1359476455000},"page":"1-25","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Bayesian Learning of Noisy Markov Decision Processes"],"prefix":"10.1145","volume":"23","author":[{"given":"Sumeetpal S.","family":"Singh","sequence":"first","affiliation":[{"name":"University of Cambridge"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Nicolas","family":"Chopin","sequence":"additional","affiliation":[{"name":"CREST---ENSAE and HEC Paris"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Nick","family":"Whiteley","sequence":"additional","affiliation":[{"name":"University of Bristol"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2013,1]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/1015330.1015430"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1111\/1468-0262.00340"},{"key":"e_1_2_1_3_1","article-title":"Bayesian analysis of binary and polychotomous response data","volume":"88","author":"Albert J.","year":"1993","journal-title":"J. Amer. Statis. 
Assn."},{"edition":"3","volume-title":"Dynamic Programming and Optimal Control","author":"Bertsekas D.","key":"e_1_2_1_4_1"},{"edition":"3","volume-title":"Dynamic Programming and Optimal Control","author":"Bertsekas D.","key":"e_1_2_1_5_1"},{"volume-title":"Eds","year":"1996","author":"Bertsekas D.","key":"e_1_2_1_6_1"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1093\/biomet\/89.3.539"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11222-009-9168-1"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/1538788.1538812"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1111\/j.1467-9868.2006.00553.x"},{"key":"e_1_2_1_11_1","doi-asserted-by":"crossref","unstructured":"Geweke J. and Keane M. 1996. Bayesian inference for dynamic discrete choice models without the need for dynamic programming. Working Paper 564 Federal Reserve Bank of Minneapolis.","DOI":"10.21034\/wp.564"},{"volume-title":"Econometrics: Methods and Applications","year":"2000","author":"Geweke J.","key":"e_1_2_1_12_1"},{"key":"e_1_2_1_13_1","doi-asserted-by":"crossref","unstructured":"Geweke J. Keane M. and Runkle D. 1994. Alternative computational approaches to inference in the multinomial probit model. 
In Review of Economics and Statistics 609--632.","DOI":"10.2307\/2109766"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1016\/0165-1765(80)90070-1"},{"volume-title":"Handbook of Markov Chain Monte Carlo","author":"Hobert J.","key":"e_1_2_1_15_1"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1214\/009053607000000569"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.2307\/2298122"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jeconom.2004.02.002"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.3982\/ECTA5658"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1080\/01621459.1999.10473879"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0304-4076(00)00034-8"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1016\/0304-4076(94)90064-7"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1093\/biomet\/86.2.301"},{"volume-title":"Proceedings of the 17th International Conference on Machine Learning, Morgan Kaufmann","author":"Ng A.","key":"e_1_2_1_24_1"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1008905311214"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.spl.2011.09.009"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.2307\/1911259"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1137\/0326056"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1177\/105971239700600103"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1016\/0022-247X(85)90317-8"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1080\/01621459.1987.10478458"},{"key":"e_1_2_1_32_1","unstructured":"Tsitsiklis J. and Roy B. V. 1994. 
Feature-based methods for large scale dynamic programming. Tech. rep. LIDS-P 2277 Laboratory for Information and Decision Systems. Massachusetts Institute of Technology."},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1086\/261262"}],"container-title":["ACM Transactions on Modeling and Computer Simulation"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2414416.2414420","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2414416.2414420","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T09:21:10Z","timestamp":1750238470000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2414416.2414420"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,1]]},"references-count":33,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2013,1]]}},"alternative-id":["10.1145\/2414416.2414420"],"URL":"https:\/\/doi.org\/10.1145\/2414416.2414420","relation":{},"ISSN":["1049-3301","1558-1195"],"issn-type":[{"type":"print","value":"1049-3301"},{"type":"electronic","value":"1558-1195"}],"subject":[],"published":{"date-parts":[[2013,1]]},"assertion":[{"value":"2011-11-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2012-11-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2013-01-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}