{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,27]],"date-time":"2025-06-27T16:40:03Z","timestamp":1751042403990,"version":"3.41.0"},"reference-count":45,"publisher":"Springer Science and Business Media LLC","issue":"19","license":[{"start":{"date-parts":[[2022,9,4]],"date-time":"2022-09-04T00:00:00Z","timestamp":1662249600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,9,4]],"date-time":"2022-09-04T00:00:00Z","timestamp":1662249600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100010663","name":"H2020 European Research Council","doi-asserted-by":"publisher","award":["758824"],"award-info":[{"award-number":["758824"]}],"id":[{"id":"10.13039\/100010663","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Neural Comput &amp; Applic"],"published-print":{"date-parts":[[2025,7]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>Due to its perceptual limitations, an agent may have too little information about the environment to act optimally. In such cases, it is important to keep track of the action-observation history to uncover hidden state information. Recent deep reinforcement learning methods use recurrent neural networks (RNN) to memorize past observations. However, these models are expensive to train and have convergence difficulties, especially when dealing with high dimensional data. In this paper, we propose <jats:italic>influence-aware memory<\/jats:italic>, a theoretically inspired memory architecture that alleviates the training difficulties by restricting the input of the recurrent layers to those variables that influence the hidden state information. Moreover, as opposed to standard RNNs, in which every piece of information used for estimating <jats:italic>Q<\/jats:italic> values is inevitably fed back into the network for the next prediction, our model allows information to flow without being necessarily stored in the RNN\u2019s internal memory. Results indicate that, by letting the recurrent layers focus on a small fraction of the observation variables while processing the rest of the information with a feedforward neural network, we can outperform standard recurrent architectures both in training speed and policy performance. This approach also reduces runtime and obtains better scores than methods that stack multiple observations to remove partial observability.<\/jats:p>","DOI":"10.1007\/s00521-022-07691-7","type":"journal-article","created":{"date-parts":[[2022,9,4]],"date-time":"2022-09-04T11:02:23Z","timestamp":1662289343000},"page":"13145-13161","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Influence-aware memory architectures for deep reinforcement learning in POMDPs"],"prefix":"10.1007","volume":"37","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2607-8992","authenticated-orcid":false,"given":"Miguel","family":"Suau","sequence":"first","affiliation":[]},{"given":"Jinke","family":"He","sequence":"additional","affiliation":[]},{"given":"Elena","family":"Congeduti","sequence":"additional","affiliation":[]},{"given":"Rolf A. 
N.","family":"Starre","sequence":"additional","affiliation":[]},{"given":"Aleksander","family":"Czechowski","sequence":"additional","affiliation":[]},{"given":"Frans A.","family":"Oliehoek","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,9,4]]},"reference":[{"key":"7691_CR1","volume-title":"Advances in neural information processing systems","author":"B Bakker","year":"2001","unstructured":"Bakker B (2001) Reinforcement learning with long short-term memory. In: Dietterich T, Becker S, Ghahramani Z (eds) Advances in neural information processing systems. MIT Press, London"},{"key":"7691_CR2","doi-asserted-by":"publisher","first-page":"253","DOI":"10.1613\/jair.3912","volume":"47","author":"MG Bellemare","year":"2013","unstructured":"Bellemare MG, Naddaf Y, Veness J et al (2013) The Arcade Learning Environment: an evaluation platform for general agents. J Artif Intell Res 47:253\u2013279","journal-title":"J Artif Intell Res"},{"key":"7691_CR3","volume-title":"Pattern recognition and machine learning","author":"CM Bishop","year":"2006","unstructured":"Bishop CM (2006) Pattern recognition and machine learning. Springer, New York"},{"key":"7691_CR4","unstructured":"Boutilier C,   David P (1996) Computing Optimal Policies for Partially Observable Decision Processes Using Compact Representations. In: Proceedings of the National Conference on Artificial Intelligence, pp 1168\u20131175"},{"key":"7691_CR5","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1613\/jair.575","volume":"11","author":"C Boutilier","year":"1999","unstructured":"Boutilier C, Dean T, Hanks S (1999) Decision-theoretic planning: structural assumptions and computational leverage. J Artif Intell Res 11:1\u201394","journal-title":"J Artif Intell Res"},{"key":"7691_CR6","unstructured":"Chevalier-Boisvert M, Willems L, Pal S (2018) Minimalistic gridworld environment for openai gym. https:\/\/github.com\/maximecb\/gym-minigrid"},{"key":"7691_CR7","doi-asserted-by":"crossref","unstructured":"Cho K, van Merrienboer B, Gulcehre C, et\u00a0al (2014) Learning phrase representations using rnn encoder-decoder for statistical machine translation. In: Conference on Empirical Methods in Natural Language Processing (EMNLP 2014)","DOI":"10.3115\/v1\/D14-1179"},{"key":"7691_CR8","unstructured":"Hausknecht M, Stone P (2015) Deep recurrent Q-learning for partially observable MDPs. In: Proceedings of the twenty-ninth AAAI conference on artificial intelligence"},{"issue":"8","key":"7691_CR9","doi-asserted-by":"publisher","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","volume":"9","author":"S Hochreiter","year":"1997","unstructured":"Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735\u20131780","journal-title":"Neural Comput"},{"key":"7691_CR10","doi-asserted-by":"publisher","first-page":"162","DOI":"10.1007\/978-1-4612-4380-9_14","volume-title":"Breakthroughs in statistics","author":"H Hotelling","year":"1992","unstructured":"Hotelling H (1992) Relations between two sets of variates. Breakthroughs in statistics. Springer, New York, pp 162\u2013190"},{"key":"7691_CR11","unstructured":"Igl M, Zintgraf L, Le TA, et\u00a0al (2018) Deep variational reinforcement learning for POMDPs. In: Proceedings of the 35th international conference on machine learning, pp 2117\u20132126"},{"key":"7691_CR12","unstructured":"Iqbal S, Sha F (2019) Actor-attention-critic for multi-agent reinforcement learning. 
In: Proceedings of the 36th international conference on machine learning, pp 2961\u20132970"},{"key":"7691_CR13","unstructured":"Jaakkola T, Singh SP, Jordan MI (1995) Reinforcement learning algorithm for partially observable markov decision problems. In: Advances in neural information processing systems, pp 345\u2013352"},{"issue":"6443","key":"7691_CR14","doi-asserted-by":"publisher","first-page":"859","DOI":"10.1126\/science.aau6249","volume":"364","author":"M Jaderberg","year":"2019","unstructured":"Jaderberg M, Czarnecki WM, Dunning I et al (2019) Human-level performance in 3d multiplayer games with population-based reinforcement learning. Science 364(6443):859\u2013865","journal-title":"Science"},{"key":"7691_CR15","first-page":"237","volume":"4","author":"LP Kaelbling","year":"1996","unstructured":"Kaelbling LP, Littman M, Moore A (1996) Reinforcement learning: a survey. J AI Res 4:237\u2013285","journal-title":"J AI Res"},{"key":"7691_CR16","doi-asserted-by":"crossref","unstructured":"Lample G, Chaplot DS (2017) Playing fps games with deep reinforcement learning. In: Proceedings of the thirty-first AAAI conference on artificial intelligence, pp 2140\u20132146","DOI":"10.1609\/aaai.v31i1.10827"},{"key":"7691_CR17","first-page":"271","volume":"2","author":"LJ Lin","year":"1993","unstructured":"Lin LJ, Mitchell TM (1993) Reinforcement learning with hidden states. From Anim Animats 2:271\u2013280","journal-title":"From Anim Animats"},{"key":"7691_CR18","doi-asserted-by":"crossref","unstructured":"Littman ML (1994) Memoryless policies: Theoretical limitations and practical results. In: Proceedings of the third international conference on simulation of adaptive behavior : from animals to animats 3, pp 238\u2013245","DOI":"10.7551\/mitpress\/3117.003.0041"},{"key":"7691_CR19","doi-asserted-by":"crossref","unstructured":"Lopez PA, Behrisch M, Bieker-Walz L, et\u00a0al (2018) Microscopic traffic simulation using sumo. In: The 21st IEEE international conference on intelligent transportation systems. IEEE","DOI":"10.1109\/ITSC.2018.8569938"},{"key":"7691_CR20","doi-asserted-by":"crossref","unstructured":"Luong MT, Pham H, Manning CD (2015) Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025","DOI":"10.18653\/v1\/D15-1166"},{"key":"7691_CR21","doi-asserted-by":"crossref","unstructured":"McCallum AK (1995a) Instance-based utile distinctions for reinforcement learning with hidden state. In: Machine learning proceedings 1995. Elsevier, pp 387\u2013395","DOI":"10.1016\/B978-1-55860-377-6.50055-4"},{"key":"7691_CR22","unstructured":"McCallum AK (1995b) Reinforcement learning with selective perception and hidden state. PhD thesis, University of Rochester"},{"issue":"7540","key":"7691_CR23","doi-asserted-by":"publisher","first-page":"529","DOI":"10.1038\/nature14236","volume":"518","author":"V Mnih","year":"2015","unstructured":"Mnih V, Kavukcuoglu K, Silver D et al (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529","journal-title":"Nature"},{"key":"7691_CR24","unstructured":"Mott A, Zoran D, Chrzanowski M, et\u00a0al (2019) Towards interpretable reinforcement learning using attention augmented agents. In: Advances in neural information processing systems, pp 12329\u201312338"},{"key":"7691_CR25","unstructured":"Ng AY, Jordan M (2000) PEGASUS: a policy search method for large MDPs and POMDPs. In: Proceedings of the sixteenth conference on uncertainty in artificial intelligence. 
Morgan Kaufmann Publishers Inc., pp 406\u2013415"},{"key":"7691_CR26","unstructured":"Oh J, Chockalingam V, Satinder, et\u00a0al (2016) Control of memory, active perception, and action in minecraft. In: Proceedings of The 33rd international conference on machine learning"},{"key":"7691_CR27","doi-asserted-by":"publisher","first-page":"789","DOI":"10.1613\/jair.1.12136","volume":"70","author":"F Oliehoek","year":"2021","unstructured":"Oliehoek F, Witwicki S, Kaelbling L (2021) A sufficient statistic for influence in structured multiagent environments. J Artif Intell Res 70:789\u2013870","journal-title":"J Artif Intell Res"},{"key":"7691_CR28","unstructured":"Oliehoek FA, Witwicki SJ, Kaelbling LP (2012) Influence-based abstraction for multiagent systems. In: AAAI12"},{"key":"7691_CR29","doi-asserted-by":"crossref","unstructured":"Pearl J (1988) Probabilistic reasoning in intelligent systems: networks of plausible inference. Morgan Kaufmann","DOI":"10.1016\/B978-0-08-051489-5.50008-4"},{"key":"7691_CR30","unstructured":"Pineau J, Gordon G, Thrun S (2003) Point-based value iteration: An anytime algorithm for POMDPs. In: Proceedings of the international joint conference on artificial intelligence, pp 1025\u20131032"},{"key":"7691_CR31","doi-asserted-by":"publisher","DOI":"10.1002\/9780470316887","volume-title":"Markov decision processes-discrete stochastic dynamic programming","author":"ML Puterman","year":"1994","unstructured":"Puterman ML (1994) Markov decision processes-discrete stochastic dynamic programming. Wiley, Hoboken"},{"key":"7691_CR32","unstructured":"Schmidhuber J (1991) Reinforcement learning in Markovian and non-Markovian environments. In: Lippman DS, Moody JE, Touretzky DS (eds) Advances in neural information processing systems 3 (NIPS 3). Morgan Kaufmann, pp 500\u2013506"},{"key":"7691_CR33","unstructured":"Schulman J, Wolski F, Dhariwal P, et\u00a0al (2017) Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347"},{"key":"7691_CR34","unstructured":"Silver D, Veness J (2010) Monte-carlo planning in large pomdps. In: Advances in neural information processing systems, pp 2164\u20132172"},{"key":"7691_CR35","doi-asserted-by":"crossref","unstructured":"Singh SP, Jaakkola T, Jordan MI (1994) Learning without state-estimation in partially observable Markovian decision processes. In: Proceedings of the international conference on machine learning. Morgan Kaufmann, pp 284\u2013292","DOI":"10.1016\/B978-1-55860-335-6.50042-8"},{"key":"7691_CR36","unstructured":"Sorokin I, Seleznev A, Pavlov M, et\u00a0al (2015) Deep attention recurrent q-network. arXiv preprint arXiv:1512.01693"},{"key":"7691_CR37","doi-asserted-by":"crossref","unstructured":"Steckelmacher D, Roijers D, Harutyunyan A, et\u00a0al (2018) Reinforcement learning in pomdps with memoryless options and option-observation initiation sets. In: Proceedings of the AAAI conference on artificial intelligence","DOI":"10.1609\/aaai.v32i1.11606"},{"key":"7691_CR38","volume-title":"Reinforcement learning: an introduction","author":"RS Sutton","year":"1998","unstructured":"Sutton RS, Barto AG (1998) Reinforcement learning: an introduction. The MIT Press, Cambridge"},{"issue":"1","key":"7691_CR39","doi-asserted-by":"publisher","first-page":"181","DOI":"10.1016\/S0004-3702(99)00052-1","volume":"112","author":"RS Sutton","year":"1999","unstructured":"Sutton RS, Precup D, Singh S (1999) Between mdps and semi-mdps: a framework for temporal abstraction in reinforcement learning. 
Artif Intell 112(1):181\u2013211","journal-title":"Artif Intell"},{"key":"7691_CR40","doi-asserted-by":"crossref","unstructured":"Tang Y, Nguyen D, Ha D (2020) Neuroevolution of self-interpretable agents. arXiv preprint arXiv:2003.08165","DOI":"10.1145\/3377930.3389847"},{"key":"7691_CR41","doi-asserted-by":"crossref","unstructured":"Tishby N, Zaslavsky N (2015) Deep learning and the information bottleneck principle. In: 2015 IEEE information theory workshop (ITW), IEEE, pp 1\u20135","DOI":"10.1109\/ITW.2015.7133169"},{"key":"7691_CR42","first-page":"5998","volume":"17","author":"A Vaswani","year":"2017","unstructured":"Vaswani A, Shazeer N, Parmar N et al (2017) Attention is all you need. Adv Neural Inf Process Syst 17:5998\u20136008","journal-title":"Adv Neural Inf Process Syst"},{"key":"7691_CR43","doi-asserted-by":"crossref","unstructured":"Witwicki SJ, Durfee EH (2010) Influence-based policy abstraction for weakly-coupled Dec-POMDPs. In: Proceedings of the international conference on automated planning and scheduling, pp 185\u2013192","DOI":"10.1609\/icaps.v20i1.13419"},{"key":"7691_CR44","unstructured":"Xu K, Ba J, Kiros R, et\u00a0al (2015) Show, attend and tell: Neural image caption generation with visual attention. In: Proceedings of the 32nd international conference on machine learning, pp 2048\u20132057"},{"key":"7691_CR45","unstructured":"Zhu P, Li X, Poupart P (2017) On improving deep reinforcement learning for POMDPs. ArXiv preprint arXiv:1704.07978"}],"container-title":["Neural Computing and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00521-022-07691-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s00521-022-07691-7\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00521-022-07691-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,27]],"date-time":"2025-06-27T16:02:44Z","timestamp":1751040164000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s00521-022-07691-7"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,9,4]]},"references-count":45,"journal-issue":{"issue":"19","published-print":{"date-parts":[[2025,7]]}},"alternative-id":["7691"],"URL":"https:\/\/doi.org\/10.1007\/s00521-022-07691-7","relation":{},"ISSN":["0941-0643","1433-3058"],"issn-type":[{"type":"print","value":"0941-0643"},{"type":"electronic","value":"1433-3058"}],"subject":[],"published":{"date-parts":[[2022,9,4]]},"assertion":[{"value":"23 November 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"29 July 2022","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"4 September 2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflicts of interest"}}]}}
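The abstract describes the influence-aware memory idea only at a high level: route the few observation variables that carry hidden-state information through a recurrent layer, and let the rest bypass memory through a feedforward branch before estimating Q values. Below is a minimal sketch of that idea, assuming a PyTorch implementation; the class name `InfluenceAwareMemory`, the GRU cell, the layer sizes, and the `influence_idx` parameter are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class InfluenceAwareMemory(nn.Module):
    """Sketch: a recurrent branch over the assumed 'influence source'
    variables and a feedforward branch over the remaining observation
    variables; the two summaries are concatenated before the Q-value head."""

    def __init__(self, obs_dim, influence_idx, hidden_dim, num_actions):
        super().__init__()
        self.influence_idx = list(influence_idx)
        rest = torch.ones(obs_dim, dtype=torch.bool)
        rest[self.influence_idx] = False  # complement of the influence sources
        self.register_buffer("rest_mask", rest)
        # Memory branch: only the variables that (by assumption) influence
        # the hidden state are fed back through the recurrence.
        self.rnn = nn.GRU(len(self.influence_idx), hidden_dim, batch_first=True)
        # Memoryless branch: the rest of the observation flows straight through.
        self.fnn = nn.Sequential(
            nn.Linear(obs_dim - len(self.influence_idx), hidden_dim),
            nn.ReLU(),
        )
        self.q_head = nn.Linear(2 * hidden_dim, num_actions)

    def forward(self, obs_seq, h0=None):
        # obs_seq: (batch, time, obs_dim)
        memory, hn = self.rnn(obs_seq[..., self.influence_idx], h0)
        direct = self.fnn(obs_seq[..., self.rest_mask])
        q_values = self.q_head(torch.cat([memory, direct], dim=-1))
        return q_values, hn

# Hypothetical usage: 16 observation variables, of which only the first two
# are treated as influence sources and enter the RNN.
net = InfluenceAwareMemory(obs_dim=16, influence_idx=[0, 1],
                           hidden_dim=32, num_actions=4)
q, h = net(torch.randn(8, 10, 16))  # q has shape (8, 10, 4)
```

Because only a small slice of each observation enters the recurrence, the RNN's input (and with it the hardest part of the training burden) shrinks, while the feedforward path still delivers the full observation to the Q-value head; this is the mechanism behind the training-speed and performance gains the abstract reports.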