{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T02:55:16Z","timestamp":1760151316132,"version":"build-2065373602"},"reference-count":43,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2022,3,13]],"date-time":"2022-03-13T00:00:00Z","timestamp":1647129600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Estonian Centre of Excellence in IT (EXCITE)","award":["TK148"],"award-info":[{"award-number":["TK148"]}]},{"name":"TRUST-AI project from the European Union's Horizon 2020 research and innovation programme","award":["952060"],"award-info":[{"award-number":["952060"]}]},{"name":"European Social Fund via IT Academy Programme","award":["SLTAT18311"],"award-info":[{"award-number":["SLTAT18311"]}]},{"name":"Nieders\u00e4chsisches Vorab of the VolkswagenStiftung","award":["ZN3326","ZN3371"],"award-info":[{"award-number":["ZN3326","ZN3371"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>Intuitively, the level of autonomy of an agent is related to the degree to which the agent\u2019s goals and behaviour are decoupled from the immediate control by the environment. Here, we capitalise on a recent information-theoretic formulation of autonomy and introduce an algorithm for calculating autonomy in a limiting process of time step approaching infinity. We tackle the question of how the autonomy level of an agent changes during training. In particular, in this work, we use the partial information decomposition (PID) framework to monitor the levels of autonomy and environment internalisation of reinforcement-learning (RL) agents. We performed experiments on two environments: a grid world, in which the agent has to collect food, and a repeating-pattern environment, in which the agent has to learn to imitate a sequence of actions by memorising the sequence. PID also allows us to answer how much the agent relies on its internal memory (versus how much it relies on the observations) when transitioning to its next internal state. The experiments show that specific terms of PID strongly correlate with the obtained reward and with the agent\u2019s behaviour against perturbations in the observations.<\/jats:p>","DOI":"10.3390\/e24030401","type":"journal-article","created":{"date-parts":[[2022,3,13]],"date-time":"2022-03-13T22:29:43Z","timestamp":1647210583000},"page":"401","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Quantifying Reinforcement-Learning Agent\u2019s Autonomy, Reliance on Memory and Internalisation of the Environment"],"prefix":"10.3390","volume":"24","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3207-3269","authenticated-orcid":false,"given":"Anti","family":"Ingel","sequence":"first","affiliation":[{"name":"Institute of Computer Science, University of Tartu, Narva mnt 18, 51009 Tartu, Estonia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3581-8262","authenticated-orcid":false,"given":"Abdullah","family":"Makkeh","sequence":"additional","affiliation":[{"name":"G\u00f6ttingen Campus Institute for Dynamics of Biological Networks, University of G\u00f6ttingen, 37075 G\u00f6ttingen, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5552-7123","authenticated-orcid":false,"given":"Oriol","family":"Corcoll","sequence":"additional","affiliation":[{"name":"Institute of Computer Science, University of Tartu, Narva mnt 18, 51009 Tartu, Estonia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2497-0007","authenticated-orcid":false,"given":"Raul","family":"Vicente","sequence":"additional","affiliation":[{"name":"Institute of Computer Science, University of Tartu, Narva mnt 18, 51009 Tartu, Estonia"}]}],"member":"1968","published-online":{"date-parts":[[2022,3,13]]},"reference":[{"key":"ref_1","unstructured":"Sutton, R.S., and Barto, A.G. (2018). Reinforcement Learning: An Introduction, MIT Press. [2nd ed.]. Adaptive Computation and Machine Learning (Francis Bach Series Editor)."},{"key":"ref_2","unstructured":"Baker, B., Kanitscheider, I., Markov, T., Wu, Y., Powell, G., McGrew, B., and Mordatch, I. (2019). Emergent Tool Use From Multi-Agent Autocurricula. arXiv."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"350","DOI":"10.1038\/s41586-019-1724-z","article-title":"Grandmaster level in StarCraft II using multi-agent reinforcement learning","volume":"575","author":"Vinyals","year":"2019","journal-title":"Nature"},{"key":"ref_4","unstructured":"Berner, C., Brockman, G., Chan, B., Cheung, V., Debiak, P., Dennison, C., Farhi, D., Fischer, Q., Hashme, S., and Hesse, C. (2019). Dota 2 with Large Scale Deep Reinforcement Learning. arXiv."},{"key":"ref_5","unstructured":"Stooke, A., Mahajan, A., Barros, C., Deck, C., Bauer, J., Sygnowski, J., Trebacz, M., Jaderberg, M., Mathieu, M., and McAleese, N. (2021). Open-Ended Learning Leads to Generally Capable Agents. arXiv."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"331","DOI":"10.1016\/j.biosystems.2007.05.018","article-title":"Autonomy: An information theoretic perspective","volume":"91","author":"Bertschinger","year":"2008","journal-title":"Biosystems"},{"key":"ref_7","unstructured":"Klyubin, A.S., Polani, D., and Nehaniv, C.L. (2005, January 2\u20135). Empowerment: A universal agent-centric measure of control. Proceedings of the 2005 IEEE Congress on Evolutionary Computation, Edinburgh, UK."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"16","DOI":"10.1177\/1059712310392389","article-title":"Empowerment for continuous agent\u2014Environment systems","volume":"19","author":"Jung","year":"2011","journal-title":"Adapt. Behav."},{"key":"ref_9","first-page":"2125","article-title":"Variational Information Maximisation for Intrinsically Motivated Reinforcement Learning","volume":"Volume 2","author":"Mohamed","year":"2015","journal-title":"Proceedings of the 28th International Conference on Neural Information Processing Systems"},{"key":"ref_10","unstructured":"Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., and Garnett, R. (2016). VIME: Variational Information Maximizing Exploration. Advances in Neural Information Processing Systems, Curran Associates, Inc."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"461","DOI":"10.1103\/PhysRevLett.85.461","article-title":"Measuring information transfer","volume":"85","author":"Schreiber","year":"2000","journal-title":"Phys. Rev. Lett."},{"key":"ref_12","unstructured":"Jaques, N., Lazaridou, A., Hughes, E., Gulcehre, C., Ortega, P., Strouse, D., Leibo, J.Z., and De Freitas, N. (2019, January 9\u201315). Social influence as intrinsic motivation for multi-agent deep reinforcement learning. Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Sootla, S., Theis, D.O., and Vicente, R. (2017). Analyzing Information Distribution in Complex Systems. Entropy, 19.","DOI":"10.3390\/e19120636"},{"key":"ref_14","unstructured":"Zhao, R., Gao, Y., Abbeel, P., Tresp, V., and Xu, W. (2021). Mutual Information State Intrinsic Control. International Conference on Learning Representations. arXiv."},{"key":"ref_15","unstructured":"Williams, P.L., and Beer, R.D. (2010). Nonnegative decomposition of multivariate information. arXiv."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1016\/j.bandc.2015.09.004","article-title":"Partial Information Decomposition as a Unified Approach to the Specification of Neural Goal Functions","volume":"112","author":"Wibral","year":"2017","journal-title":"Brain Cogn."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Wibral, M., Finn, C., Wollstadt, P., Lizier, J.T., and Priesemann, V. (2017). Quantifying Information Modification in Developing Neural Networks via Partial Information Decomposition. Entropy, 19.","DOI":"10.3390\/e19090494"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Tax, T.M.S., Mediano, P.A.M., and Shanahan, M. (2017). The Partial Information Decomposition of Generative Neural Network Models. Entropy, 19.","DOI":"10.3390\/e19090474"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"466","DOI":"10.1016\/j.artint.2008.12.001","article-title":"Enactive artificial intelligence: Investigating the systemic organization of life and mind","volume":"173","author":"Froese","year":"2009","journal-title":"Artif. Intell."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"366","DOI":"10.1016\/j.cogsys.2019.08.006","article-title":"CASH only: Constitutive autonomy through motorsensory self-programming","volume":"58","author":"Georgeon","year":"2019","journal-title":"Cogn. Syst. Res."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"387","DOI":"10.1080\/09528130600926066","article-title":"The metacognitive loop I: Enhancing reinforcement learning with metacognitive monitoring and control for improved perturbation tolerance","volume":"18","author":"Anderson","year":"2006","journal-title":"J. Exp. Theor. Artif. Intell."},{"key":"ref_22","first-page":"20210110","article-title":"Bits and pieces: Understanding information decomposition from part-whole relationships and formal logic","volume":"477","author":"Gutknecht","year":"2021","journal-title":"Proc. R. Soc. A Math. Phys. Eng. Sci."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"012130","DOI":"10.1103\/PhysRevE.87.012130","article-title":"Bivariate measure of redundant information","volume":"87","author":"Harder","year":"2013","journal-title":"Phys. Rev. E"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"2161","DOI":"10.3390\/e16042161","article-title":"Quantifying unique information","volume":"16","author":"Bertschinger","year":"2014","journal-title":"Entropy"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Prokopenko, M. (2014). Quantifying Synergistic Mutual Information. Guided Self-Organization: Inception, Springer.","DOI":"10.1007\/978-3-642-53734-9"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"35","DOI":"10.3389\/frobt.2015.00035","article-title":"Hierarchical quantification of synergy in channels","volume":"2","author":"Perrone","year":"2016","journal-title":"Front. Robot. AI"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Ince, R. (2017). Measuring multivariate redundant information with pointwise common change in surprisal. Entropy, 19.","DOI":"10.3390\/e19070318"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Quax, R., Har-Shemesh, O., and Sloot, P.M.A. (2017). Quantifying synergistic information using intermediate stochastic variables. Entropy, 19.","DOI":"10.3390\/e19020085"},{"key":"ref_29","unstructured":"Chicharro, D. (2018). Quantifying multivariate redundancy with maximum entropy decompositions of mutual information. arXiv."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Finn, C., and Lizier, J. (2018). Pointwise partial information decomposition using the specificity and ambiguity lattices. Entropy, 20.","DOI":"10.3390\/e20040297"},{"key":"ref_31","unstructured":"Kolchinsky, A. (2020). A novel approach to multivariate redundancy and synergy. arXiv."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Sigtermans, D. (2020). A path-based partial information decomposition. Entropy, 22.","DOI":"10.3390\/e22090952"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"032149","DOI":"10.1103\/PhysRevE.103.032149","article-title":"Introducing a differentiable measure of pointwise shared information","volume":"103","author":"Makkeh","year":"2021","journal-title":"Phys. Rev. E"},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long Short-Term Memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Comput."},{"key":"ref_35","unstructured":"Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., and Riedmiller, M.A. (2013). Playing Atari with Deep Reinforcement Learning. NIPS Deep Learning Workshop. arXiv."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Makkeh, A., Theis, D.O., and Vicente, R. (2018). BROJA-2PID: A robust estimator for bivariate partial information decomposition. Entropy, 20.","DOI":"10.3390\/e20040271"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"179","DOI":"10.1162\/artl.2010.16.2.16204","article-title":"Measuring Autonomy and Emergence via Granger Causality","volume":"16","author":"Seth","year":"2010","journal-title":"Artif. Life"},{"key":"ref_38","unstructured":"Dittrich, P., and Artmann, S. (2006). Information and closure in systems theory. Explorations in the Complexity of Possible Life, IOS Press."},{"key":"ref_39","unstructured":"Ari, P., Amin, N., Dar, G., Abdullah, M., Luca, M., Michael, W., and Elad, S. (2021, January 6\u201314). Estimating the unique information of continuous variables. Proceedings of the 35th Conference on Neural Information Processing Systems (NeurIPS), online."},{"key":"ref_40","unstructured":"Schick-Poland, K., Makkeh, A., Gutknecht, A.J., Wollstadt, P., Sturm, A., and Wibral, M. (2021). A partial information decomposition for discrete and continuous variables. arXiv."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Corcoll, O., Makkeh, A., Aru, J., Oliver Theis, D., and Vicente, R. (2019, January 13\u201316). Attention manipulation in reinforcement learning agents. Proceedings of the Conference on Cognitive Computational Neuroscience, CCN, Berlin, Germany.","DOI":"10.32470\/CCN.2019.1274-0"},{"key":"ref_42","unstructured":"Gilbert, T., Kirkilionis, M., and Nicolis, G. Shared Information\u2014New Insights and Problems in Decomposing Information in Complex Systems. Proceedings of the European Conference on Complex Systems 2012."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Makkeh, A., Theis, D.O., and Vicente, R. (2017). Bivariate partial information decomposition: The optimization perspective. Entropy, 19.","DOI":"10.3390\/e19100530"}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/24\/3\/401\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T22:35:52Z","timestamp":1760135752000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/24\/3\/401"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,3,13]]},"references-count":43,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2022,3]]}},"alternative-id":["e24030401"],"URL":"https:\/\/doi.org\/10.3390\/e24030401","relation":{},"ISSN":["1099-4300"],"issn-type":[{"type":"electronic","value":"1099-4300"}],"subject":[],"published":{"date-parts":[[2022,3,13]]}}}