{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,4]],"date-time":"2026-04-04T17:36:59Z","timestamp":1775324219652,"version":"3.50.1"},"reference-count":33,"publisher":"Walter de Gruyter GmbH","issue":"1","license":[{"start":{"date-parts":[[2018,12,1]],"date-time":"2018-12-01T00:00:00Z","timestamp":1543622400000},"content-version":"unspecified","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc-nd\/4.0"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2018,12,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Spoken language is one of the most efficient ways to instruct robots about performing domestic tasks. However, the state of the environment has to be considered to plan and execute actions successfully. We propose a system that learns to recognise the user\u2019s intention and map it to a goal. A reinforcement learning (RL) system then generates a sequence of actions toward this goal, considering the state of the environment. A novel contribution in this paper is the use of symbolic representations for both input and output of a neural Deep Q-network (DQN), which enables it to be used in a hybrid system. To show the effectiveness of our approach, the Tell-Me-Dave corpus is used to train an intention detection model and, in a second step, an RL agent generates the sequences of actions towards the detected objective, represented by a set of state predicates. 
We show that the system can successfully recognise command sequences from this corpus as well as train the deep RL network with symbolic input. We further show that the performance can be significantly increased by exploiting the symbolic representation to generate intermediate rewards.<\/jats:p>","DOI":"10.1515\/pjbr-2018-0026","type":"journal-article","created":{"date-parts":[[2018,12,19]],"date-time":"2018-12-19T04:04:10Z","timestamp":1545192250000},"page":"358-373","source":"Crossref","is-referenced-by-count":2,"title":["Deep reinforcement learning using compositional representations for performing instructions"],"prefix":"10.1515","volume":"9","author":[{"given":"Mohammad Ali","family":"Zamani","sequence":"first","affiliation":[{"name":"Knowledge Technology, Department of Informatics, University of Hamburg, Vogt-Koelln-Str. 30, Hamburg , Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sven","family":"Magg","sequence":"additional","affiliation":[{"name":"Knowledge Technology, Department of Informatics, University of Hamburg, Vogt-Koelln-Str. 30, Hamburg , Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Cornelius","family":"Weber","sequence":"additional","affiliation":[{"name":"Knowledge Technology, Department of Informatics, University of Hamburg, Vogt-Koelln-Str. 30, Hamburg , Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Stefan","family":"Wermter","sequence":"additional","affiliation":[{"name":"Knowledge Technology, Department of Informatics, University of Hamburg, Vogt-Koelln-Str. 
30, Hamburg , Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Di","family":"Fu","sequence":"additional","affiliation":[{"name":"CAS Key Laboratory of Behavioral Science, Chinese Academy of Sciences, Beijing , China"},{"name":"Department of Psychology, University of Chinese Academy of Sciences, Beijing , China"},{"name":"Knowledge Technology, Department of Informatics, University of Hamburg, Vogt-Koelln-Str. 30, Hamburg , Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"374","published-online":{"date-parts":[[2018,12,6]]},"reference":[{"key":"2022042712092638225_j_pjbr-2018-0026_ref_001_w2aab3b7c26b1b6b1ab1ab1Aa","doi-asserted-by":"crossref","unstructured":"[1] S. Schaal, The new robotics - towards human-centered machines, HFSP Journal, 2007, 1(2), 115-126, 10.2976\/1.2748612","DOI":"10.2976\/1.2748612"},{"key":"2022042712092638225_j_pjbr-2018-0026_ref_002_w2aab3b7c26b1b6b1ab1ab2Aa","doi-asserted-by":"crossref","unstructured":"[2] S. Schaal, C. G. Atkeson, Learning control in robotics, IEEE Robotics & Automation Magazine, 2010, 17(2), 20-29, 10.1109\/MRA.2010.936957","DOI":"10.1109\/MRA.2010.936957"},{"key":"2022042712092638225_j_pjbr-2018-0026_ref_003_w2aab3b7c26b1b6b1ab1ab3Aa","doi-asserted-by":"crossref","unstructured":"[3] J. Peters, S. Schaal, Learning to control in operational space, The International Journal of Robotics Research, 2008, 27(2), 197-212, 10.1177\/0278364907087548","DOI":"10.1177\/0278364907087548"},{"key":"2022042712092638225_j_pjbr-2018-0026_ref_004_w2aab3b7c26b1b6b1ab1ab4Aa","doi-asserted-by":"crossref","unstructured":"[4] S. Lauria, G. Bugmann, T. Kyriacou, E. Klein, Mobile robot programming using natural language, Robotics and Autonomous Systems, 2002, 38(3), 171-181, 10.1016\/S0921-8890(02)00166-5","DOI":"10.1016\/S0921-8890(02)00166-5"},{"key":"2022042712092638225_j_pjbr-2018-0026_ref_005_w2aab3b7c26b1b6b1ab1ab5Aa","doi-asserted-by":"crossref","unstructured":"[5] S. Lauria, G. Bugmann, T. 
Kyriacou, J. Bos, E. Klein, Converting natural language route instructions into robot executable procedures, In: Proceedings of the 11th IEEE International Workshop on Robot and Human Interactive Communication, IEEE, 2002, 223-228","DOI":"10.1109\/ROMAN.2002.1045626"},{"key":"2022042712092638225_j_pjbr-2018-0026_ref_006_w2aab3b7c26b1b6b1ab1ab6Aa","doi-asserted-by":"crossref","unstructured":"[6] T. Nishizawa, K. Kishita, Y. Takano, Y. Fujita, S. Yuta, Proposed system of unlocking potentially hazardous function of robot based on verbal communication, In: 2011 IEEE\/SICE International Symposium on System Integration (SII), IEEE, 2011, 1208-1213, 10.1109\/SII.2011.6147621","DOI":"10.1109\/SII.2011.6147621"},{"key":"2022042712092638225_j_pjbr-2018-0026_ref_007_w2aab3b7c26b1b6b1ab1ab7Aa","doi-asserted-by":"crossref","unstructured":"[7] W. Hua, Z. Wang, H. Wang, K. Zheng, X. Zhou, Short text understanding through lexical semantic analysis, In: 2015 IEEE 31st International Conference on Data Engineering (ICDE), IEEE, 2015, 495-506, 10.1109\/ICDE.2015.7113309","DOI":"10.1109\/ICDE.2015.7113309"},{"key":"2022042712092638225_j_pjbr-2018-0026_ref_008_w2aab3b7c26b1b6b1ab1ab8Aa","unstructured":"[8] A. Abdulkader, A. Lakshmiratan, J. Zhang, Introducing DeepText: Facebook\u2019s text understanding engine, https:\/\/code.facebook.com\/posts\/181565595577955\/introducingdeeptext-facebook-s-textunderstanding-engine [Accessed: 2018-01-30]"},{"key":"2022042712092638225_j_pjbr-2018-0026_ref_009_w2aab3b7c26b1b6b1ab1ab9Aa","unstructured":"[9] R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, P. Kuksa, Natural language processing (almost) from scratch, Journal of Machine Learning Research, 2011, 12, 2493-2537"},{"key":"2022042712092638225_j_pjbr-2018-0026_ref_010_w2aab3b7c26b1b6b1ab1ac10Aa","doi-asserted-by":"crossref","unstructured":"[10] Y. LeCun, Y. Bengio, G. 
Hinton, Deep learning, Nature, 2015, 521(7553), 436-444, 10.1038\/nature14539","DOI":"10.1038\/nature14539"},{"key":"2022042712092638225_j_pjbr-2018-0026_ref_011_w2aab3b7c26b1b6b1ab1ac11Aa","unstructured":"[11] I. Sutskever, O. Vinyals, Q. V. Le, Sequence to sequence learning with neural networks, In: NIPS\u201914 Proceedings of the 27th International Conference on Neural Information Processing Systems, 2014, 2, 3104-3112"},{"key":"2022042712092638225_j_pjbr-2018-0026_ref_012_w2aab3b7c26b1b6b1ab1ac12Aa","doi-asserted-by":"crossref","unstructured":"[12] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Computation, 1997, 9(8), 1735-1780, 10.1162\/neco.1997.9.8.1735","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"2022042712092638225_j_pjbr-2018-0026_ref_013_w2aab3b7c26b1b6b1ab1ac13Aa","unstructured":"[13] R. S. Sutton, A. G. Barto, Reinforcement Learning: An Introduction, vol. 1, MIT Press, Cambridge, 1998"},{"key":"2022042712092638225_j_pjbr-2018-0026_ref_014_w2aab3b7c26b1b6b1ab1ac14Aa","unstructured":"[14] A. L. Thomaz, G. Hoffman, C. Breazeal, Real-time interactive reinforcement learning for robots, In: AAAI 2005 Workshop on Human Comprehensible Machine Learning, 2005"},{"key":"2022042712092638225_j_pjbr-2018-0026_ref_015_w2aab3b7c26b1b6b1ab1ac15Aa","doi-asserted-by":"crossref","unstructured":"[15] A. L. Thomaz, C. Breazeal, Teachable robots: understanding human teaching behavior to build more effective robot learners, Artificial Intelligence, 2008, 172(6-7), 716-737, 10.1016\/j.artint.2007.09.009","DOI":"10.1016\/j.artint.2007.09.009"},{"key":"2022042712092638225_j_pjbr-2018-0026_ref_016_w2aab3b7c26b1b6b1ab1ac16Aa","doi-asserted-by":"crossref","unstructured":"[16] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. 
Bellemare, et al., Human-level control through deep reinforcement learning, Nature, 2015, 518(7540), 529-533, 10.1038\/nature14236","DOI":"10.1038\/nature14236"},{"key":"2022042712092638225_j_pjbr-2018-0026_ref_017_w2aab3b7c26b1b6b1ab1ac17Aa","doi-asserted-by":"crossref","unstructured":"[17] K. Narasimhan, T. Kulkarni, R. Barzilay, Language understanding for text-based games using deep reinforcement learning, In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 2015, 1-11, 10.18653\/v1\/D15-1001","DOI":"10.18653\/v1\/D15-1001"},{"key":"2022042712092638225_j_pjbr-2018-0026_ref_018_w2aab3b7c26b1b6b1ab1ac18Aa","doi-asserted-by":"crossref","unstructured":"[18] A. Kumar, T. Oates, Connecting deep neural networks with symbolic knowledge, In: The 2017 International Joint Conference on Neural Networks (IJCNN), May 2017, 3601-3608, 10.1109\/IJCNN.2017.7966309","DOI":"10.1109\/IJCNN.2017.7966309"},{"key":"2022042712092638225_j_pjbr-2018-0026_ref_019_w2aab3b7c26b1b6b1ab1ac19Aa","unstructured":"[19] M. Garnelo, K. Arulkumaran, M. Shanahan, Towards deep symbolic reinforcement learning, arXiv:1609.05518, 2016"},{"key":"2022042712092638225_j_pjbr-2018-0026_ref_020_w2aab3b7c26b1b6b1ab1ac20Aa","unstructured":"[20] E. Bastianelli, G. Castellucci, D. Croce, L. Iocchi, R. Basili, D. Nardi, HuRIC: a human robot interaction corpus, In: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC\u201914), Reykjavik, Iceland, 26-31 May, 2014, 4519-4526"},{"key":"2022042712092638225_j_pjbr-2018-0026_ref_021_w2aab3b7c26b1b6b1ab1ac21Aa","doi-asserted-by":"crossref","unstructured":"[21] D. K. Misra, J. Sung, K. Lee, A. 
Saxena, Tell me Dave: Context-sensitive grounding of natural language to manipulation instructions, The International Journal of Robotics Research, 2016, 35(1-3), 281-300, 10.1177\/0278364915602060","DOI":"10.1177\/0278364915602060"},{"key":"2022042712092638225_j_pjbr-2018-0026_ref_022_w2aab3b7c26b1b6b1ab1ac22Aa","doi-asserted-by":"crossref","unstructured":"[22] D. K. Misra, K. Tao, P. Liang, A. Saxena, Environment-driven lexicon induction for high-level instructions, In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, Beijing, China, July 26-31, 2015, 992-1002, 10.3115\/v1\/P15-1096","DOI":"10.3115\/v1\/P15-1096"},{"key":"2022042712092638225_j_pjbr-2018-0026_ref_023_w2aab3b7c26b1b6b1ab1ac23Aa","doi-asserted-by":"crossref","unstructured":"[23] D. Rasmussen, A. Voelker, C. Eliasmith, A neural model of hierarchical reinforcement learning, PLOS ONE, 2017, 12(7), 1-39, 10.1371\/journal.pone.0180234","DOI":"10.1371\/journal.pone.0180234"},{"key":"2022042712092638225_j_pjbr-2018-0026_ref_024_w2aab3b7c26b1b6b1ab1ac24Aa","unstructured":"[24] E. Kolve, R. Mottaghi, D. Gordon, Y. Zhu, A. Gupta, A. Farhadi, AI2-THOR: An interactive 3D environment for visual AI, arXiv:1712.05474, 2017"},{"key":"2022042712092638225_j_pjbr-2018-0026_ref_025_w2aab3b7c26b1b6b1ab1ac25Aa","unstructured":"[25] X. Glorot, A. Bordes, Y. Bengio, Deep sparse rectifier neural networks, In: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics (AISTATS 2011), 2011, 315-323"},{"key":"2022042712092638225_j_pjbr-2018-0026_ref_026_w2aab3b7c26b1b6b1ab1ac26Aa","unstructured":"[26] D. Kingma, J. 
Ba, Adam: a method for stochastic optimization, In: 3rd International Conference for Learning Representations, San Diego, 2015"},{"key":"2022042712092638225_j_pjbr-2018-0026_ref_027_w2aab3b7c26b1b6b1ab1ac27Aa","unstructured":"[27] M. Ghallab, A. Howe, C. Knoblock, D. McDermott, A. Ram, M. Veloso, et al., PDDL - The Planning Domain Definition Language, Technical Report TR-98-003, Yale Center for Computational Vision and Control, 1998"},{"key":"2022042712092638225_j_pjbr-2018-0026_ref_028_w2aab3b7c26b1b6b1ab1ac28Aa","unstructured":"[28] G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, W. Zaremba, OpenAI Gym, arXiv:1606.01540, 2016"},{"key":"2022042712092638225_j_pjbr-2018-0026_ref_029_w2aab3b7c26b1b6b1ab1ac29Aa","doi-asserted-by":"crossref","unstructured":"[29] H. van Hasselt, A. Guez, D. Silver, Deep reinforcement learning with double Q-learning, In: AAAI\u201916 Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016, 16, 2094-2100","DOI":"10.1609\/aaai.v30i1.10295"},{"key":"2022042712092638225_j_pjbr-2018-0026_ref_030_w2aab3b7c26b1b6b1ab1ac30Aa","unstructured":"[30] T. Schaul, J. Quan, I. Antonoglou, D. Silver, Prioritized experience replay, In: International Conference on Learning Representations (ICLR), May 2016"},{"key":"2022042712092638225_j_pjbr-2018-0026_ref_031_w2aab3b7c26b1b6b1ab1ac31Aa","doi-asserted-by":"crossref","unstructured":"[31] M. Khamassi, G. Velentzas, T. Tsitsimis, C. Tzafestas, Active exploration and parameterized reinforcement learning applied to a simulated human-robot interaction task, In: 2017 First IEEE International Conference on Robotic Computing (IRC), April 2017, 28-35, 10.1109\/IRC.2017.33","DOI":"10.1109\/IRC.2017.33"},{"key":"2022042712092638225_j_pjbr-2018-0026_ref_032_w2aab3b7c26b1b6b1ab1ac32Aa","doi-asserted-by":"crossref","unstructured":"[32] J. Pennington, R. Socher, C. D. 
Manning, GloVe: global vectors for word representation, In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, 2014, 1532-1543, ISSN 10495258, 10.3115\/v1\/D14-1162","DOI":"10.3115\/v1\/D14-1162"},{"key":"2022042712092638225_j_pjbr-2018-0026_ref_033_w2aab3b7c26b1b6b1ab1ac33Aa","unstructured":"[33] J. Chung, C. Gulcehre, K. Cho, Y. Bengio, Empirical evaluation of gated recurrent neural networks on sequence modeling, In: NIPS 2014 Workshop on Deep Learning, December 2014"}],"container-title":["Paladyn, Journal of Behavioral Robotics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/www.degruyter.com\/view\/j\/pjbr.2018.9.issue-1\/pjbr-2018-0026\/pjbr-2018-0026.xml","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.degruyter.com\/document\/doi\/10.1515\/pjbr-2018-0026\/xml","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.degruyter.com\/document\/doi\/10.1515\/pjbr-2018-0026\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,4]],"date-time":"2026-04-04T16:26:56Z","timestamp":1775320016000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.degruyter.com\/document\/doi\/10.1515\/pjbr-2018-0026\/html"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,12,1]]},"references-count":33,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2018,7,25]]},"published-print":{"date-parts":[[2018,7,1]]}},"alternative-id":["10.1515\/pjbr-2018-0026"],"URL":"https:\/\/doi.org\/10.1515\/pjbr-2018-0026","relation":{},"ISSN":["2081-4836"],"issn-type":[{"value":"2081-4836","type":"electronic"}],"subject":[],"published":{"date-parts":[[2018,12,1]]}}}