{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,13]],"date-time":"2026-06-13T18:21:40Z","timestamp":1781374900890,"version":"3.54.1"},"reference-count":89,"publisher":"MIT Press","issue":"3","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Neural Computation"],"published-print":{"date-parts":[[2021,3]]},"abstract":"<jats:p>Active inference is a first principle account of how autonomous agents operate in dynamic, nonstationary environments. This problem is also considered in reinforcement learning, but limited work exists on comparing the two approaches on the same discrete-state environments. In this letter, we provide (1) an accessible overview of the discrete-state formulation of active inference, highlighting natural behaviors in active inference that are generally engineered in reinforcement learning, and (2) an explicit discrete-state comparison between active inference and reinforcement learning on an OpenAI gym baseline. We begin by providing a condensed overview of the active inference literature, in particular viewing the various natural behaviors of active inference agents through the lens of reinforcement learning. We show that by operating in a pure belief-based setting, active inference agents can carry out epistemic exploration\u2014and account for uncertainty about their environment\u2014in a Bayes-optimal fashion. Furthermore, we show that the reliance on an explicit reward signal in reinforcement learning is removed in active inference, where reward can simply be treated as another observation we have a preference over; even in the total absence of rewards, agent behaviors are learned through preference learning. 
We make these properties explicit by showing two scenarios in which active inference agents can infer behaviors in reward-free environments compared to both Q-learning and Bayesian model-based reinforcement learning agents and by placing zero prior preferences over rewards and learning the prior preferences over the observations corresponding to reward. We conclude by noting that this formalism can be applied to more complex settings (e.g., robotic arm movement, Atari games) if appropriate generative models can be formulated. In short, we aim to demystify the behavior of active inference agents by presenting an accessible discrete state-space and time formulation and demonstrate these behaviors in an OpenAI gym environment, alongside reinforcement learning agents.<\/jats:p>","DOI":"10.1162\/neco_a_01357","type":"journal-article","created":{"date-parts":[[2021,1,5]],"date-time":"2021-01-05T22:38:38Z","timestamp":1609886318000},"page":"674-712","source":"Crossref","is-referenced-by-count":138,"title":["Active Inference: Demystified and Compared"],"prefix":"10.1162","volume":"33","author":[{"given":"Noor","family":"Sajid","sequence":"first","affiliation":[{"name":"Wellcome Centre for Human Neuroimaging, UCL Queen Square Institute of Neurology, London, WC1N 3AR, U.K."}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Philip J.","family":"Ball","sequence":"additional","affiliation":[{"name":"Machine Learning Research Group, Department of Engineering Science, University of Oxford, Oxford OX1 3PJ, U.K."}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Thomas","family":"Parr","sequence":"additional","affiliation":[{"name":"Wellcome Centre for Human Neuroimaging, UCL Queen Square Institute of Neurology, London, WC1N 3AR, U.K."}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Karl J.","family":"Friston","sequence":"additional","affiliation":[{"name":"Wellcome Centre for Human Neuroimaging, UCL Queen Square Institute of Neurology, 
London, WC1N 3AR, U.K."}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"281","reference":[{"key":"B1","author":"Al-Shedivat M.","year":"2017","journal-title":"Continuous adaptation via meta-learning in nonstationary and competitive environments"},{"key":"B2","doi-asserted-by":"publisher","DOI":"10.1177\/0272989X09353194"},{"key":"B3","author":"Amodei D.","year":"2016","journal-title":"Concrete problems in AI safety"},{"key":"B4","doi-asserted-by":"publisher","DOI":"10.1016\/0022-247X(65)90154-X"},{"key":"B5","author":"Attias H.","year":"2003","journal-title":"AISTATS"},{"key":"B6","doi-asserted-by":"publisher","DOI":"10.1016\/B978-0-12-398532-3.00007-5"},{"key":"B7","author":"Beal M. J.","year":"2003","journal-title":"Variational algorithms for approximate Bayesian inference"},{"key":"B8","first-page":"3059","volume-title":"Advances in neural information processing systems","volume":"25","author":"Beck J.","year":"2012"},{"key":"B9","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.38.8.716"},{"key":"B10","author":"Blau T.","year":"2019","journal-title":"Bayesian curiosity for efficient exploration in reinforcement learning."},{"key":"B11","doi-asserted-by":"publisher","DOI":"10.1080\/01621459.2017.1285773"},{"key":"B12","doi-asserted-by":"publisher","DOI":"10.1016\/j.jmp.2015.11.003"},{"key":"B13","doi-asserted-by":"publisher","DOI":"10.1016\/j.tics.2012.08.006"},{"key":"B14","doi-asserted-by":"publisher","DOI":"10.1016\/j.jmp.2017.09.004"},{"key":"B15","author":"Burda Y.","year":"2018","journal-title":"Exploration by random network distillation"},{"key":"B16","author":"Camacho E. 
F.","year":"2013","journal-title":"Model predictive control"},{"key":"B17","first-page":"73","volume-title":"Advances in neural information processing systems","volume":"25","author":"Cao F.","year":"2012"},{"key":"B18","first-page":"6284","volume":"25","author":"Cesa-Bianchi N.","year":"2017","journal-title":"Advances in neural information processing systems"},{"key":"B19","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.1512144113"},{"key":"B20","doi-asserted-by":"publisher","DOI":"10.1007\/BF01193705"},{"key":"B21","doi-asserted-by":"publisher","DOI":"10.1016\/j.bpsc.2018.06.010"},{"key":"B22","author":"Da Costa L.","year":"2020","journal-title":"Active inference on discrete state-spaces: A synthesis"},{"key":"B23","doi-asserted-by":"publisher","DOI":"10.1162\/neco.2006.18.7.1637"},{"key":"B24","doi-asserted-by":"publisher","DOI":"10.1016\/j.neuron.2011.02.027"},{"key":"B25","author":"Dearden R.","year":"2013","journal-title":"Model-based Bayesian exploration"},{"key":"B26","first-page":"761","author":"Dearden R.","year":"1998","journal-title":"Proceedings of the 15th National Conference on Artificial Intelligence"},{"key":"B27","author":"Foerster J. 
N.","year":"2017","journal-title":"Learning with opponent-learning awareness"},{"key":"B28","author":"Friston K.","year":"2019","journal-title":"A free energy principle for a particular physics"},{"key":"B29","doi-asserted-by":"publisher","DOI":"10.1016\/j.neubiorev.2016.06.022"},{"key":"B30","doi-asserted-by":"publisher","DOI":"10.1162\/NECO_a_00912"},{"key":"B31","doi-asserted-by":"publisher","DOI":"10.1007\/s00422-011-0424-z"},{"key":"B32","doi-asserted-by":"publisher","DOI":"10.1162\/NETN_a_00018"},{"key":"B33","doi-asserted-by":"publisher","DOI":"10.1080\/17588928.2015.1020053"},{"key":"B34","doi-asserted-by":"publisher","DOI":"10.1016\/j.neubiorev.2017.04.009"},{"key":"B35","doi-asserted-by":"publisher","DOI":"10.1007\/s00422-012-0512-8"},{"key":"B36","doi-asserted-by":"publisher","DOI":"10.1098\/rstb.2013.0481"},{"key":"B37","doi-asserted-by":"publisher","DOI":"10.1162\/neco.2007.19.12.3173"},{"key":"B38","doi-asserted-by":"publisher","DOI":"10.1007\/s10994-012-5313-8"},{"key":"B39","doi-asserted-by":"publisher","DOI":"10.1002\/9781119159193.ch33"},{"key":"B40","doi-asserted-by":"publisher","DOI":"10.1146\/annurev-psych-122414-033625"},{"key":"B41","doi-asserted-by":"publisher","DOI":"10.1016\/j.conb.2010.02.008"},{"key":"B42","doi-asserted-by":"publisher","DOI":"10.1016\/j.cobeha.2015.07.007"},{"key":"B43","doi-asserted-by":"publisher","DOI":"10.1561\/2200000049"},{"key":"B44","author":"Haarnoja T.","year":"2018","journal-title":"Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor"},{"key":"B45","first-page":"13978","volume-title":"Advances in neural information processing systems","volume":"32","author":"Igl M.","year":"2019"},{"key":"B46","author":"Igl M.","year":"2018","journal-title":"Deep variational reinforcement learning for POMDPs"},{"key":"B47","doi-asserted-by":"crossref","first-page":"513","DOI":"10.1145\/1553374.1553441","author":"Kolter J. 
Z.","year":"2009","journal-title":"Proceedings of the 26th Annual International Conference on Machine Learning"},{"key":"B48","author":"Lee L.","year":"2019","journal-title":"Efficient exploration via state marginal matching"},{"key":"B49","author":"Levine S.","year":"2018","journal-title":"Reinforcement learning and control as probabilistic inference: Tutorial and review"},{"key":"B50","doi-asserted-by":"publisher","DOI":"10.1098\/rsif.2013.0069"},{"key":"B51","author":"Millidge B.","year":"2020","journal-title":"Whence the expected free energy?"},{"key":"B52","author":"Millidge B.","year":"2020","journal-title":"On the relationship between active inference and control as inference"},{"key":"B53","doi-asserted-by":"publisher","DOI":"10.3389\/fncom.2016.00056"},{"key":"B54","first-page":"1928","author":"Mnih V.","year":"2016","journal-title":"Proceedings of the International Conference on Machine Learning"},{"key":"B55","author":"Mnih V.","year":"2013","journal-title":"Playing Atari with deep reinforcement learning"},{"key":"B56","unstructured":"Mohamed, S. & Rezende, D. J. (2015). Variational information maximisation for intrinsically motivated reinforcement learning. arXiv:1509.08731."},{"key":"B57","doi-asserted-by":"publisher","DOI":"10.3389\/fnhum.2014.00160"},{"key":"B58","author":"Ng A. 
Y.","year":"2003","journal-title":"Shaping and policy search in reinforcement learning"},{"key":"B59","author":"O'Donoghue B.","year":"2020","journal-title":"Making sense of reinforcement learning and probabilistic inference"},{"key":"B60","first-page":"3836","author":"O'Donoghue B.","year":"2018","journal-title":"Proceedings of the International Conference on Machine Learning"},{"key":"B61","first-page":"4026","volume-title":"Advances in neural information processing systems","volume":"29","author":"Osband I.","year":"2016"},{"key":"B62","author":"Padakandla S.","year":"2019","journal-title":"Reinforcement learning in non-stationary environments"},{"key":"B63","doi-asserted-by":"publisher","DOI":"10.1098\/rsif.2017.0376"},{"key":"B64","doi-asserted-by":"publisher","DOI":"10.1162\/neco_a_01102"},{"key":"B65","doi-asserted-by":"publisher","DOI":"10.1007\/s00213-019-05240-0"},{"key":"B66","doi-asserted-by":"publisher","DOI":"10.1007\/s00422-019-00805-w"},{"key":"B67","doi-asserted-by":"publisher","DOI":"10.1038\/s41598-018-38246-3"},{"key":"B68","first-page":"16","author":"Pathak D.","year":"2017","journal-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops"},{"key":"B69","unstructured":"Poupart, P. (2018). 
Lecture slides on Bayesian reinforcement learning from cs885: https:\/\/cs.uwaterloo.ca\/ppoupart\/teaching\/cs885-spring18\/slides\/cs885-lecture10.pdf"},{"key":"B70","first-page":"5331","author":"Rakelly K.","year":"2019","journal-title":"Proceedings of the International Conference on Machine Learning"},{"key":"B71","first-page":"1225","volume-title":"Advances in neural information processing systems","volume":"20","author":"Ross S.","year":"2008"},{"key":"B72","doi-asserted-by":"publisher","DOI":"10.1080\/09540090600768658"},{"key":"B73","doi-asserted-by":"publisher","DOI":"10.1016\/j.mehy.2014.12.007"},{"key":"B74","doi-asserted-by":"publisher","DOI":"10.7554\/eLife.41703"},{"key":"B75","doi-asserted-by":"publisher","DOI":"10.1088\/0034-4885\/75\/12\/126001"},{"key":"B76","author":"Sekar R.","year":"2020","journal-title":"Planning to explore via self-supervised world models"},{"key":"B77","author":"Sorg J.","year":"2012","journal-title":"Variance-based rewards for approximate Bayesian reinforcement learning"},{"key":"B78","doi-asserted-by":"publisher","DOI":"10.1007\/s12064-011-0142-z"},{"key":"B79","doi-asserted-by":"publisher","DOI":"10.1016\/B978-1-55860-141-3.50030-4"},{"key":"B80","author":"Sutton R. S.","year":"1998","journal-title":"Introduction to reinforcement learning"},{"key":"B81","author":"Sutton R. S.","year":"2018","journal-title":"Reinforcement learning: An introduction"},{"key":"B82","doi-asserted-by":"publisher","DOI":"10.1016\/j.conb.2016.01.014"},{"key":"B83","first-page":"1","author":"Tijsma A. D.","year":"2016","journal-title":"Proceedings of the IEEE Symposium Series on Computational Intelligence"},{"key":"B84","doi-asserted-by":"publisher","DOI":"10.1007\/s00422-018-0785-7"},{"key":"B85","first-page":"437","author":"Vermorel J.","year":"2005","journal-title":"Proceedings of the European Conference on Machine Learning"},{"key":"B86","author":"Watkins C. J. C. 
H.","year":"1989","journal-title":"Learning from delayed rewards"},{"key":"B87","doi-asserted-by":"publisher","DOI":"10.1007\/BF00992698"},{"key":"B88","first-page":"223","volume":"6","author":"Wiering M.","year":"1998","journal-title":"Proceedings of the Sixth International Conference on Simulation of Adaptive Behavior: From Animals to Animats"},{"key":"B89","author":"Zintgraf L.","year":"2019","journal-title":"VariBAD: A very good method for Bayes-adaptive deep RL via meta-learning"}],"container-title":["Neural Computation"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mitpressjournals.org\/doi\/pdf\/10.1162\/neco_a_01357","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,10,16]],"date-time":"2023-10-16T23:40:30Z","timestamp":1697499630000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/neco\/article\/33\/3\/674-712\/97486"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,3]]},"references-count":89,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2021,3]]}},"alternative-id":["10.1162\/neco_a_01357"],"URL":"https:\/\/doi.org\/10.1162\/neco_a_01357","relation":{},"ISSN":["0899-7667","1530-888X"],"issn-type":[{"value":"0899-7667","type":"print"},{"value":"1530-888X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,3]]}}}