{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,30]],"date-time":"2026-04-30T04:21:50Z","timestamp":1777522910682,"version":"3.51.4"},"reference-count":58,"publisher":"SAGE Publications","issue":"3","license":[{"start":{"date-parts":[[2019,12,30]],"date-time":"2019-12-30T00:00:00Z","timestamp":1577664000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"funder":[{"DOI":"10.13039\/100007224","name":"national foundation for science and technology development","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100007224","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100003725","name":"national research foundation of korea","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100003725","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100000288","name":"royal society","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100000288","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["Adaptive Behavior"],"published-print":{"date-parts":[[2021,6]]},"abstract":"<jats:p>Learning to make decisions in partially observable environments is a notorious problem that requires a complex representation of controllers. In most work, the controllers are designed as a non-linear mapping from a sequence of temporal observations to actions. These problems can, in principle, be formulated as a partially observable Markov decision process whose policy can be parameterised through the use of recurrent neural networks. 
In this paper, we propose an alternative framework that (a) uses the Long Short-Term Memory (LSTM) Encoder-Decoder framework to learn an internal state representation for historical observations and then (b) integrates it into existing recurrent policy models to improve task performance. The LSTM Encoder encodes a history of observations into a representation of internal states. The LSTM Decoder can perform one of two alternative decoding tasks: reconstructing the input observation sequence or predicting future observation sequences. The first proposed decoder acts like an auto-encoder that guides and constrains the learning of a useful internal state for the policy optimisation task. The second proposed decoder decodes the internal state learnt by the encoder to predict future observation sequences, which makes the network act like a non-linear predictive state representation model. Both decoders introduce constraints on the policy representation and thereby help guide both policy optimisation and latent state representation learning. 
The integration of representation learning and policy optimisation aims to help learn more complex policies and improve the performance of policy learning tasks.<\/jats:p>","DOI":"10.1177\/1059712319891641","type":"journal-article","created":{"date-parts":[[2019,12,30]],"date-time":"2019-12-30T03:27:30Z","timestamp":1577676450000},"page":"253-265","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":0,"title":["Constrained representation learning for recurrent policy optimisation under uncertainty"],"prefix":"10.1177","volume":"29","author":[{"given":"Viet-Hung","family":"Dang","sequence":"first","affiliation":[{"name":"Institute of Research and Development, DuyTan University, Vietnam"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9646-267X","authenticated-orcid":false,"given":"Ngo Anh","family":"Vien","sequence":"additional","affiliation":[{"name":"School of EEECS, Queen\u2019s University Belfast, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"TaeChoong","family":"Chung","sequence":"additional","affiliation":[{"name":"Artificial Intelligent Lab, Department of Computer Engineering, Kyung Hee University, Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"179","published-online":{"date-parts":[[2019,12,30]]},"reference":[{"key":"bibr1-1059712319891641","first-page":"265","volume-title":"OSDI","author":"Abadi M.","year":"2016"},{"key":"bibr2-1059712319891641","author":"Azizzadenesheli K.","year":"2018","journal-title":"arXiv preprint arXiv:1810.07900"},{"key":"bibr3-1059712319891641","volume-title":"3rd International Conference on Learning Representations, ICLR 2015","author":"Bahdanau D.","year":"2015"},{"key":"bibr4-1059712319891641","first-page":"1475","volume-title":"Advances in Neural Information Processing Systems 14 (NIPS)","author":"Bakker 
B.","year":"2001"},{"key":"bibr5-1059712319891641","doi-asserted-by":"publisher","DOI":"10.1109\/IRDS.2002.1041511"},{"key":"bibr6-1059712319891641","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0180944"},{"key":"bibr7-1059712319891641","doi-asserted-by":"publisher","DOI":"10.1613\/jair.806"},{"key":"bibr8-1059712319891641","volume-title":"Spectral approaches to learning predictive representations","author":"Boots B.","year":"2012"},{"key":"bibr9-1059712319891641","volume-title":"Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence, UAI 2013","author":"Boots B.","year":"2013"},{"key":"bibr10-1059712319891641","doi-asserted-by":"publisher","DOI":"10.1145\/1553374.1553383"},{"key":"bibr11-1059712319891641","author":"Brockman G.","year":"2016","journal-title":"arXiv preprint arXiv:1606.01540"},{"key":"bibr12-1059712319891641","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1179"},{"key":"bibr13-1059712319891641","first-page":"183","volume-title":"The Tenth National Conference on Artificial Intelligence (AAAI)","author":"Chrisman L.","year":"1992"},{"key":"bibr14-1059712319891641","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2013.191"},{"key":"bibr15-1059712319891641","first-page":"913","volume-title":"The 28th International Conference on Machine Learning (ICML)","author":"Doshi-Velez F.","year":"2011"},{"key":"bibr16-1059712319891641","first-page":"6055","author":"Downey C.","year":"2017","journal-title":"Advances in Neural Information Processing Systems 30 (NIPS)"},{"key":"bibr17-1059712319891641","first-page":"1329","volume-title":"International Conference on Machine Learning (ICML)","author":"Duan Y.","year":"2016"},{"key":"bibr18-1059712319891641","author":"Hausknecht M. 
J.","year":"2015","journal-title":"CoRR abs\/1507.06527"},{"key":"bibr19-1059712319891641","author":"Heess N.","year":"2015","journal-title":"arXiv preprint arXiv:1512.04455"},{"key":"bibr20-1059712319891641","first-page":"1963","author":"Hefny A.","year":"2015","journal-title":"Advances in Neural Information Processing Systems 28 (NIPS)"},{"key":"bibr21-1059712319891641","first-page":"1954","volume-title":"Proceedings of the 35th International Conference on Machine Learning (ICML)","author":"Hefny A.","year":"2018"},{"key":"bibr22-1059712319891641","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"bibr23-1059712319891641","first-page":"2122","volume-title":"Proceedings of the 35th International Conference on Machine Learning, ICML 2018","author":"Igl M.","year":"2018"},{"key":"bibr24-1059712319891641","doi-asserted-by":"publisher","DOI":"10.1016\/S0004-3702(98)00023-X"},{"key":"bibr25-1059712319891641","first-page":"1700","volume-title":"Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013","author":"Kalchbrenner N.","year":"2013"},{"key":"bibr26-1059712319891641","doi-asserted-by":"publisher","DOI":"10.1109\/IJCNN.2010.5596468"},{"key":"bibr27-1059712319891641","volume-title":"Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, AISTATS 2015","author":"Lever G.","year":"2015"},{"key":"bibr28-1059712319891641","first-page":"1555","volume-title":"Advances in neural information processing systems (NIPS)","author":"Littman M. 
L.","year":"2002"},{"key":"bibr29-1059712319891641","doi-asserted-by":"publisher","DOI":"10.1155\/2017\/4694860"},{"key":"bibr30-1059712319891641","first-page":"190","volume-title":"The Tenth International Conference on Machine Learning (ICML)","author":"McCallum A.","year":"1993"},{"key":"bibr31-1059712319891641","first-page":"377","volume-title":"Advances in Neural Information Processing Systems 7","author":"McCallum A.","year":"1994"},{"key":"bibr32-1059712319891641","doi-asserted-by":"publisher","DOI":"10.1016\/B978-1-55860-377-6.50055-4"},{"key":"bibr33-1059712319891641","doi-asserted-by":"publisher","DOI":"10.1038\/nature14236"},{"key":"bibr34-1059712319891641","first-page":"1310","volume-title":"Proceedings of the 30th International Conference on Machine Learning, ICML 2013","author":"Pascanu R.","year":"2013"},{"key":"bibr35-1059712319891641","doi-asserted-by":"publisher","DOI":"10.1109\/5.18626"},{"key":"bibr36-1059712319891641","first-page":"855","author":"Rudary M. R.","year":"2004","journal-title":"Advances in neural information processing systems (NIPS)"},{"key":"bibr37-1059712319891641","doi-asserted-by":"publisher","DOI":"10.21236\/ADA164453"},{"key":"bibr38-1059712319891641","doi-asserted-by":"publisher","DOI":"10.1038\/323533a0"},{"key":"bibr39-1059712319891641","first-page":"387","volume-title":"Proceedings of the 31th International Conference on Machine Learning, ICML 2014","author":"Silver D.","year":"2014"},{"key":"bibr40-1059712319891641","doi-asserted-by":"publisher","DOI":"10.1287\/opre.21.5.1071"},{"key":"bibr41-1059712319891641","volume-title":"The optimal control of partially observable Markov processes","author":"Sondik E. 
J.","year":"1971"},{"key":"bibr42-1059712319891641","doi-asserted-by":"publisher","DOI":"10.1109\/MSP.2013.2252713"},{"key":"bibr43-1059712319891641","first-page":"843","volume-title":"International conference on machine learning","author":"Srivastava N.","year":"2015"},{"key":"bibr44-1059712319891641","first-page":"1197","volume-title":"ICML, JMLR Workshop and Conference Proceedings","volume":"48","author":"Sun W.","year":"2016"},{"key":"bibr45-1059712319891641","volume-title":"Training recurrent neural networks","author":"Sutskever I.","year":"2013"},{"key":"bibr46-1059712319891641","first-page":"3104","volume-title":"Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014","author":"Sutskever I.","year":"2014"},{"key":"bibr47-1059712319891641","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2018.2854283"},{"key":"bibr48-1059712319891641","first-page":"1172","volume-title":"Advances in Neural Information Processing Systems 30 (NIPS)","author":"Venkatraman A.","year":"2017"},{"key":"bibr49-1059712319891641","first-page":"2089","volume-title":"Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, IJCAI 2016","author":"Vien N. A.","year":"2016"},{"key":"bibr50-1059712319891641","doi-asserted-by":"publisher","DOI":"10.1587\/transinf.E93.D.271"},{"key":"bibr51-1059712319891641","first-page":"559","volume-title":"Proceedings of the 31th International Conference on Machine Learning, ICML 2014","author":"Vien N. 
A.","year":"2014"},{"key":"bibr52-1059712319891641","author":"Wahlstr\u00f6m N.","year":"2015","journal-title":"arXiv preprint arXiv:1502.02251"},{"key":"bibr53-1059712319891641","first-page":"2746","author":"Watter M.","year":"2015","journal-title":"Advances in Neural Information Processing Systems 28 (NIPS)"},{"key":"bibr54-1059712319891641","doi-asserted-by":"publisher","DOI":"10.1109\/5.58337"},{"key":"bibr55-1059712319891641","doi-asserted-by":"publisher","DOI":"10.1093\/jigpal\/jzp049"},{"key":"bibr56-1059712319891641","doi-asserted-by":"publisher","DOI":"10.1007\/BF00992696"},{"key":"bibr57-1059712319891641","author":"Zhang M.","year":"2015","journal-title":"CoRR, abs\/1507.01273"},{"key":"bibr58-1059712319891641","author":"Zhu P.","year":"2018","journal-title":"arXiv preprint arXiv:1804.06309"}],"container-title":["Adaptive Behavior"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1059712319891641","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/1059712319891641","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1059712319891641","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,28]],"date-time":"2026-04-28T16:19:01Z","timestamp":1777393141000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/1059712319891641"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,12,30]]},"references-count":58,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2021,6]]}},"alternative-id":["10.1177\/1059712319891641"],"URL":"https:\/\/doi.org\/10.1177\/1059712319891641","relation":{},"ISSN":["1059-7123","1741-2633"],"issn-type":[{"value":"1059-7123","type":"print"},{"
value":"1741-2633","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,12,30]]}}}