{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,5]],"date-time":"2026-06-05T05:40:40Z","timestamp":1780638040070,"version":"3.54.1"},"reference-count":50,"publisher":"Springer Science and Business Media LLC","issue":"4","license":[{"start":{"date-parts":[[2022,4,1]],"date-time":"2022-04-01T00:00:00Z","timestamp":1648771200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,4,7]],"date-time":"2022-04-07T00:00:00Z","timestamp":1649289600000},"content-version":"vor","delay-in-days":6,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Mach Learn"],"published-print":{"date-parts":[[2022,4]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>In this paper we introduce Detect, Understand, Act (DUA), a neuro-symbolic reinforcement learning framework. The Detect component is composed of a traditional computer vision object detector and tracker. The Act component houses a set of options, high-level actions enacted by pre-trained deep reinforcement learning (DRL) policies. The Understand component provides a novel answer set programming (ASP) paradigm for symbolically implementing a meta-policy over options and effectively learning it using inductive logic programming (ILP). We evaluate our framework on the Animal-AI (AAI) competition testbed, a set of physical cognitive reasoning problems. Given a set of pre-trained DRL policies, DUA requires only a few examples to learn a meta-policy that allows it to improve the state-of-the-art on multiple of the most challenging categories from the testbed. 
DUA constitutes the first holistic hybrid integration of computer vision, ILP and DRL applied to an AAI-like environment and sets the foundations for further use of ILP in complex DRL challenges.<\/jats:p>","DOI":"10.1007\/s10994-022-06142-7","type":"journal-article","created":{"date-parts":[[2022,4,7]],"date-time":"2022-04-07T18:03:51Z","timestamp":1649354631000},"page":"1523-1549","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":17,"title":["Detect, Understand, Act: A Neuro-symbolic Hierarchical Reinforcement Learning Framework"],"prefix":"10.1007","volume":"111","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5968-9776","authenticated-orcid":false,"given":"Ludovico","family":"Mitchener","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"David","family":"Tuckey","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Matthew","family":"Crosby","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Alessandra","family":"Russo","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2022,4,7]]},"reference":[{"key":"6142_CR1","first-page":"6172","volume":"33","author":"G Anderson","year":"2020","unstructured":"Anderson, G., Verma, A., Dillig, I., & Chaudhuri, S. (2020). Neurosymbolic reinforcement learning with formally verified exploration. Advances in Neural Information Processing Systems, 33, 6172\u20136183.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"6142_CR2","unstructured":"Andreas, J., Klein, D., & Levine, S. (2017). Modular multitask reinforcement learning with policy sketches. 
In Proceedings of the 34th International Conference on Machine Learning."},{"key":"6142_CR3","unstructured":"Berner, C., Brockman, G., Chan, B., Cheung, V., D\u0119biak, P., Dennison, C. et al. (2019). Dota 2 with large scale deep reinforcement learning. Retrieved from arXiv:1912.06680."},{"key":"6142_CR4","doi-asserted-by":"crossref","unstructured":"Booch, G., Fabiano, F., Horesh, L., Kate, K., Lenchner, J., Linck, N., Srivastava, B. (2020). Thinking fast and slow in AI.","DOI":"10.1609\/aaai.v35i17.17765"},{"issue":"2","key":"6142_CR5","doi-asserted-by":"publisher","first-page":"33","DOI":"10.1145\/3167132.3167165","volume":"18","author":"N Bougie","year":"2018","unstructured":"Bougie, N., Cheng, L. K., & Ichise, R. (2018). Combining deep reinforcement learning with prior knowledge and reasoning. ACM SIGAPP Applied Computing Review, 18(2), 33\u201345. https:\/\/doi.org\/10.1145\/3167132.3167165","journal-title":"ACM SIGAPP Applied Computing Review"},{"key":"6142_CR6","doi-asserted-by":"crossref","unstructured":"Calimeri, F., Faber, W., Gebser, M., Ianni, G., Kaminski, R., Krennwallner, T., Schaub, T. (2019) Asp-core-2 input language format. Retrieved from http:\/\/arxiv.org\/abs\/1911.04326.","DOI":"10.1017\/S1471068419000450"},{"key":"6142_CR7","unstructured":"Clark, K. (1987). Negation as failure. In readings in nonmonotonic reasoning (pp. 311\u2013325)."},{"key":"6142_CR8","unstructured":"Clark, P., Etzioni, O., Khashabi, D., Khot, T., Mishra, B. D., Richardson, K.,... Schmitz, M. (2019, sep). From \u2018F\u2019 to \u2018A\u2019 on the N.Y. regents science exams: An overview of the aristo project. Retrieved from https:\/\/arxiv.org\/abs\/1909.01958."},{"key":"6142_CR9","unstructured":"Cranmer, M. D., Xu, R., Battaglia, P., & Ho, S. (2019). Learning symbolic physics with graph networks. 
Retrieved from https:\/\/arxiv.org\/abs\/1909.05862."},{"key":"6142_CR10","doi-asserted-by":"publisher","DOI":"10.1038\/s42256-019-0050-3","author":"M Crosby","year":"2019","unstructured":"Crosby, M., Beyret, B., & Halina, M. (2019). The Animal-AI olympics. Nature Machine Intelligence. https:\/\/doi.org\/10.1038\/s42256-019-0050-3","journal-title":"Nature Machine Intelligence"},{"key":"6142_CR11","unstructured":"Crosby, M., Beyret, B., Shanahan, M., Hern\u00e1ndez-Orallo, J., Cheke, L., & Halina, M. (2020). The Animal-AI testbed and competition. In Neurips 2019 competition and demonstration track (pp. 164\u2013176)."},{"key":"6142_CR12","unstructured":"Cunnington, D., Russo, A., Law, M., Lobo, J., & Kaplan, L. (2020). NSL: Hybrid interpretable learning from noisy raw data. Retrieved from https:\/\/arxiv.org\/abs\/2012.05023."},{"issue":"4","key":"6142_CR13","first-page":"611","volume":"6","author":"A d\u2019Avila Garcez","year":"2019","unstructured":"d\u2019Avila Garcez, A., Gori, M., Lamb, L. C., Serafini, L., Spranger, M., & Tran, S. N. (2019). Neural-symbolic computing: an effective methodology for principled integration of machine learning and reasoning. IfCoLoG Journal of Logics and their Applications, 6(4), 611\u2013631.","journal-title":"IfCoLoG Journal of Logics and their Applications"},{"key":"6142_CR14","unstructured":"Dong, H., Mao, J., Lin, T., Wang, C., Li, L., & Zhou, D. (2019). Neural logic machines. In 7th International Conference on Learning Representations, ICLR 2019. Retrieved from https:\/\/arxiv.org\/abs\/1904.11694."},{"key":"6142_CR15","unstructured":"Fawzi, A., Malinowski, M., Fawzi, H., & Fawzi, O. (2019, jun). Learning dynamic polynomial proofs. Retrieved from http:\/\/arxiv.org\/abs\/1906.01681."},{"key":"6142_CR16","doi-asserted-by":"publisher","first-page":"1031","DOI":"10.1613\/jair.1.12372","volume":"70","author":"D Furelos-Blanco","year":"2021","unstructured":"Furelos-Blanco, D., Law, M., Jonsson, A., Broda, K., & Russo, A. (2021). 
Induction and exploitation of subgoal automata for reinforcement learning. Journal of Artificial Intelligence Research, 70, 1031\u20131116.","journal-title":"Journal of Artificial Intelligence Research"},{"key":"6142_CR17","unstructured":"Garnelo, M., Arulkumaran, K., & Shanahan, M. (2016). Towards deep symbolic reinforcement learning. Retrieved from https:\/\/arxiv.org\/abs\/1609.05518."},{"key":"6142_CR18","doi-asserted-by":"publisher","first-page":"17","DOI":"10.1016\/j.cobeha.2018.12.010","volume":"29","author":"M Garnelo","year":"2019","unstructured":"Garnelo, M., & Shanahan, M. (2019). Reconciling deep learning with symbolic artificial intelligence: Representing objects and relations. Current Opinion in Behavioral Sciences, 29, 17\u201323. https:\/\/doi.org\/10.1016\/j.cobeha.2018.12.010.","journal-title":"Current Opinion in Behavioral Sciences"},{"issue":"1","key":"6142_CR19","first-page":"274","volume":"57","author":"M Gelfond","year":"2000","unstructured":"Gelfond, M., & Lifschitz, V. (2000). Logic programming: The stable model semantics for logic programming. The Journal of Symbolic Logic, 57(1), 274\u2013277.","journal-title":"The Journal of Symbolic Logic"},{"key":"6142_CR20","unstructured":"Gupta, N., Lin, K., Roth, D., Singh, S., & Gardner, M. (2019). Neural module networks for reasoning over text. Retrieved from https:\/\/arxiv.org\/abs\/1912.04971"},{"key":"6142_CR21","unstructured":"Han, C., Mao, J., Csail, M., Gan, C., Tenenbaum, J. B., Bcs, M., & Wu, J. (n.d.). Visual Concept-Metaconcept Learning (Tech. Rep.). Retrieved from http:\/\/vcml.csail.mit.edu."},{"key":"6142_CR22","doi-asserted-by":"crossref","unstructured":"Hart, P., & Knoll, A. (2020). Graph neural networks and reinforcement learning for behavior generation in semantic environments. Retrieved from https:\/\/arxiv.org\/abs\/2006.12576.","DOI":"10.1109\/IV47402.2020.9304738"},{"key":"6142_CR23","unstructured":"Hasanbeig, M., Jeppu, N. Y., Abate, A., Melham, T., & Kroening, D. (2019). 
Deepsynth: Program synthesis for automatic task segmentation in deep reinforcement learning. CoRR, abs\/1911.10244. Retrieved from https:\/\/arxiv.org\/abs\/1911.10244"},{"key":"6142_CR45","doi-asserted-by":"publisher","unstructured":"Hengst, B. (2011). Hierarchical reinforcement learning. In Encyclopedia of machine learning (pp. 495\u2013502). Springer US. Retrieved from https:\/\/doi.org\/10.1007\/978-0-387-30164-8_363","DOI":"10.1007\/978-0-387-30164-8_363"},{"key":"6142_CR24","unstructured":"Icarte, R. T., Klassen, T. Q., Valenzano, R., & McIlraith, S. A. (2018). Using reward machines for high-level task specification and decomposition in reinforcement learning. In 35th International Conference on Machine Learning, ICML 2018."},{"key":"6142_CR25","unstructured":"Jiang, J., Dun, C., Huang, T., & Lu, Z. (2018). Graph convolutional Reinforcement Learning. https:\/\/arxiv.org\/abs\/1810.09202"},{"key":"6142_CR26","unstructured":"Juliani, A., Berges, V.-P., Vckay, E., Gao, Y., Henry, H., Mattar, M., & Lange, D. (2018). Unity: A general platform for intelligent agents. Retrieved from http:\/\/arxiv.org\/abs\/1809.02627."},{"key":"6142_CR27","unstructured":"Kahneman, D. (2011). Thinking, fast and slow. New York: Farrar, Straus and Giroux. Retrieved from https:\/\/www.amazon.de\/Thinking-Fast-Slow-Daniel-Kahneman\/dp\/0374275637\/ref=wl_it_dp_o_pdT1_nS_nC?ie=UTF8&colid=151193SNGKJT9&coliid=I3OCESLZCVDFL7"},{"key":"6142_CR28","doi-asserted-by":"crossref","unstructured":"Kowalski, R., & Sergot, M. (1989). A logic-based calculus of events. In Foundations of Knowledge Base Management (pp. 23\u201355). Springer.","DOI":"10.1007\/978-3-642-83397-7_2"},{"key":"6142_CR29","doi-asserted-by":"publisher","first-page":"110","DOI":"10.1016\/j.artint.2018.03.005","volume":"259","author":"M Law","year":"2018","unstructured":"Law, M., Russo, A., & Broda, K. (2018). The complexity and generality of learning answer set programs. 
Artificial Intelligence, 259, 110\u2013146.","journal-title":"Artificial Intelligence"},{"key":"6142_CR30","doi-asserted-by":"crossref","unstructured":"Law, M., Russo, A., & Broda, K. (2020). The ilasp system for inductive learning of answer set programs.","DOI":"10.1007\/978-3-030-31423-1_6"},{"key":"6142_CR31","unstructured":"Liao, Q., & Poggio, T. (2017). Object-oriented deep learning. Retrieved from https:\/\/dspace.mit.edu\/handle\/1721.1\/1121037."},{"key":"6142_CR32","unstructured":"Manhaeve, R., Leuven, K. U., Dumancit, S., Ku Leuven, D., Kimmig, A., Demeester, T., & De Raedt, L. (2018). DeepProbLog: Neural Probabilistic Logic Programming (Tech. Rep.). Retrieved from https:\/\/bitbucket.org\/problog\/deepproblog."},{"key":"6142_CR33","unstructured":"Mao, J., Gan, C., Kohli, P., Tenenbaum, J. B., & Wu, J. (2019). The neuro-symbolic concept learner: Interpreting scenes, words, and sentences from natural supervision. In 7th International Conference on Learning Representations, ICLR 2019."},{"key":"6142_CR34","unstructured":"Marcus, G. (2020). The next decade in AI: Four steps towards robust artificial intelligence. Retrieved from https:\/\/arxiv.org\/abs\/2002.06177."},{"key":"6142_CR35","unstructured":"Minervini, P., Bo\u0161njak, M., Rockt\u00e4schel, T., Riedel, S., & Grefenstette, E. (2019). Differentiable reasoning on large knowledge bases and natural language. Retrieved from http:\/\/arxiv.org\/abs\/1912.10824."},{"key":"6142_CR36","doi-asserted-by":"publisher","unstructured":"Nascimento, J. C., Abrantes, A. J., & Marques, J. S. (1999). Algorithm for centroid-based tracking of moving objects. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 6, 3305\u20133308. https:\/\/doi.org\/10.1109\/icassp.1999.757548.","DOI":"10.1109\/icassp.1999.757548"},{"key":"6142_CR37","doi-asserted-by":"crossref","unstructured":"Sadri, F., & Kowalski, R. A. (1995). Variants of the event calculus. In ICIP (pp. 
67\u201381).","DOI":"10.7551\/mitpress\/4298.003.0017"},{"key":"6142_CR38","doi-asserted-by":"publisher","first-page":"604","DOI":"10.1038\/s41586-020-03051-4","volume":"7839","author":"J Schrittwieser","year":"2020","unstructured":"Schrittwieser, J., Antonoglou, I., Hubert, T., Simonyan, K., Sifre, L., Schmitt, S., et al. (2020). Mastering atari, go, chess and shogi by planning with a learned model. Nature, 7839, 604\u2013609.","journal-title":"Nature"},{"key":"6142_CR39","unstructured":"Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal policy optimization algorithms. Retrieved from http:\/\/arxiv.org\/abs\/1707.06347"},{"key":"6142_CR40","unstructured":"Shanahan, M., Nikiforou, K., Deepmind, A. C., Kaplanis, C., Deepmind, D. B., & Deepmind, M. G. (2020). An explicitly relational neural network architecture. Retrieved from https:\/\/arxiv.org\/abs\/1905.10307"},{"issue":"4","key":"6142_CR41","doi-asserted-by":"publisher","DOI":"10.3233\/aic-2011-0508","volume":"24","author":"S Srivastava","year":"2011","unstructured":"Srivastava, S. (2011). Foundations and applications of generalized planning. AI Communications, 24(4), 349\u2013351. https:\/\/doi.org\/10.3233\/aic-2011-0508","journal-title":"AI Communications"},{"key":"6142_CR42","unstructured":"Sun, S.-H., Wu, T.-L., & Lim, J. J. (2020). Program guided agent. Retrieved from https:\/\/openreview.net\/forum?id=BkxUvnEYDH"},{"key":"6142_CR43","volume-title":"Reinforcement learning: An introduction","author":"RS Sutton","year":"2018","unstructured":"Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction. MIT Press."},{"key":"6142_CR44","doi-asserted-by":"publisher","first-page":"181","DOI":"10.1016\/S0004-3702(99)00052-1","volume":"112","author":"RS Sutton","year":"1999","unstructured":"Sutton, R. S., Precup, D., & Singh, S. (1999). Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. 
Artificial Intelligence, 112, 181\u2013211.","journal-title":"Artificial Intelligence"},{"key":"6142_CR46","doi-asserted-by":"crossref","unstructured":"Xu, Z., Gavran, I., Ahmad, Y., Majumdar, R., Neider, D., Topcu, U., & Wu, B. (2020). Joint inference of reward machines and policies for reinforcement learning. In Proceedings of the International Conference on Automated Planning and Scheduling (Vol. 30, pp. 590\u2013598).","DOI":"10.1609\/icaps.v30i1.6756"},{"key":"6142_CR47","unstructured":"Yang, Y., Inala, J. P., Bastani, O., Pu, Y., Solar-Lezama, A., & Rinard, M. (2021). Program synthesis guided reinforcement learning."},{"key":"6142_CR48","doi-asserted-by":"publisher","unstructured":"Yi, K., Wu, J., Gan, C., Torralba, A., Deepmind, P. K., & Tenenbaum, J. B. (n.d.). Neural-symbolic VQA: Disentangling reasoning from vision and language understanding (Tech. Rep.). Retrieved from https:\/\/link.springer.com\/, https:\/\/doi.org\/10.1007\/978-0-387-30164-8_363.","DOI":"10.1007\/978-0-387-30164-8_363"},{"key":"6142_CR49","unstructured":"Zamani, M. A., Magg, S., Weber, C., & Wermter, S. (2017). Deep reinforcement learning using symbolic representation for performing spoken language instructions (Tech. Rep.). Retrieved from https:\/\/code.facebook.com\/posts\/181565595577955\/introducing."},{"key":"6142_CR50","unstructured":"Zhang, Q., & Sornette, D. (2017). Learning like humans with Deep Symbolic Networks. 
Retrieved from http:\/\/arxiv.org\/abs\/1707.03377."}],"container-title":["Machine Learning"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10994-022-06142-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10994-022-06142-7\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10994-022-06142-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,9,21]],"date-time":"2024-09-21T19:49:32Z","timestamp":1726948172000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10994-022-06142-7"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,4]]},"references-count":50,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2022,4]]}},"alternative-id":["6142"],"URL":"https:\/\/doi.org\/10.1007\/s10994-022-06142-7","relation":{},"ISSN":["0885-6125","1573-0565"],"issn-type":[{"value":"0885-6125","type":"print"},{"value":"1573-0565","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,4]]},"assertion":[{"value":"2 June 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"18 October 2021","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"7 February 2022","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"7 April 2022","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors have no relevant 
financial or non-financial interests to disclose.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}