{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,25]],"date-time":"2026-03-25T16:16:10Z","timestamp":1774455370871,"version":"3.50.1"},"reference-count":51,"publisher":"SAGE Publications","issue":"3","license":[{"start":{"date-parts":[[2024,9,15]],"date-time":"2024-09-15T00:00:00Z","timestamp":1726358400000},"content-version":"vor","delay-in-days":366,"URL":"http:\/\/www.sagepub.com\/licence-information-for-chorus"}],"funder":[{"name":"NSF","award":["IIS- 1914792, DMS-1664644, CNS-1645681 and NSF-182"],"award-info":[{"award-number":["IIS- 1914792, DMS-1664644, CNS-1645681 and NSF-182"]}]},{"DOI":"10.13039\/100007297","name":"Office of Naval Research Global","doi-asserted-by":"publisher","award":["N00014-17-1-2304"],"award-info":[{"award-number":["N00014-17-1-2304"]}],"id":[{"id":"10.13039\/100007297","id-type":"DOI","asserted-by":"publisher"}]},{"name":"ONR","award":["N00014-19-1-2571 and N00014-21-1-2844"],"award-info":[{"award-number":["N00014-19-1-2571 and N00014-21-1-2844"]}]},{"name":"NIH","award":["R01 GM135930 and UL54 TR004130"],"award-info":[{"award-number":["R01 GM135930 and UL54 TR004130"]}]}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["Adaptive Behavior"],"published-print":{"date-parts":[[2024,6]]},"abstract":"<jats:p> We develop a framework to learn bio-inspired foraging policies using human data. We conduct an experiment where humans are virtually immersed in an open field foraging environment and are trained to collect the highest amount of rewards. A Markov Decision Process (MDP) framework is introduced to model the human decision dynamics. Then, Imitation Learning (IL) based on maximum likelihood estimation is used to train Neural Networks (NN) that map human decisions to observed states. The results show that passive imitation substantially underperforms humans. We further refine the human-inspired policies via Reinforcement Learning (RL) using the on-policy Proximal Policy Optimization (PPO) algorithm which shows better stability than other algorithms and can steadily improve the policies pre-trained with IL. We show that the combination of IL and RL match human performance and that the artificial agents trained with our approach can quickly adapt to reward distribution shift. We finally show that good performance and robustness to reward distribution shift strongly depend on combining allocentric information with an egocentric representation of the environment. <\/jats:p>","DOI":"10.1177\/10597123231201655","type":"journal-article","created":{"date-parts":[[2023,9,15]],"date-time":"2023-09-15T09:38:59Z","timestamp":1694770739000},"page":"251-263","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":4,"title":["Combining imitation and deep reinforcement learning to human-level performance on a virtual foraging task"],"prefix":"10.1177","volume":"32","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7082-3262","authenticated-orcid":false,"given":"Vittorio","family":"Giammarino","sequence":"first","affiliation":[{"name":"Division of Systems Engineering, Boston University, Boston, MA, USA"}]},{"given":"Matthew F","family":"Dunne","sequence":"additional","affiliation":[{"name":"Cognitive Neuroimaging Center, Boston University, Boston, MA, USA"},{"name":"Graduate Program for Neuroscience, Boston University, Boston, MA, USA"},{"name":"Center for Systems Neuroscience, Boston University, Boston, MA, USA"}]},{"given":"Kylie N","family":"Moore","sequence":"additional","affiliation":[{"name":"Cognitive Neuroimaging Center, Boston University, Boston, MA, USA"},{"name":"Graduate Program for Neuroscience, Boston University, Boston, MA, USA"},{"name":"Center for Systems Neuroscience, Boston University, Boston, MA, USA"}]},{"given":"Michael E","family":"Hasselmo","sequence":"additional","affiliation":[{"name":"Center for Systems Neuroscience, Boston University, Boston, MA, USA"}]},{"given":"Chantal E","family":"Stern","sequence":"additional","affiliation":[{"name":"Cognitive Neuroimaging Center, Boston University, Boston, MA, USA"},{"name":"Graduate Program for Neuroscience, Boston University, Boston, MA, USA"}]},{"given":"Ioannis Ch","family":"Paschalidis","sequence":"additional","affiliation":[{"name":"Division of Systems Engineering, Boston University, Boston, MA, USA"},{"name":"Department of Electrical and Computer Engineering, Boston University, Boston, MA, USA"},{"name":"Department of Biomedical Engineering, Boston University, Boston, MA, USA"}]}],"member":"179","published-online":{"date-parts":[[2023,9,15]]},"reference":[{"key":"bibr1-10597123231201655","doi-asserted-by":"publisher","DOI":"10.1177\/0278364910371999"},{"key":"bibr2-10597123231201655","doi-asserted-by":"publisher","DOI":"10.1145\/1015330.1015430"},{"key":"bibr3-10597123231201655","doi-asserted-by":"publisher","DOI":"10.1126\/sciadv.aaz2322"},{"key":"bibr4-10597123231201655","volume-title":"What matters in on-policy reinforcement learning? A large-scale empirical study","author":"Andrychowicz M.","year":"2020"},{"key":"bibr5-10597123231201655","doi-asserted-by":"publisher","DOI":"10.1016\/j.tics.2019.02.006"},{"key":"bibr6-10597123231201655","volume-title":"Uncertainty in artificial intelligence","author":"Cheng C.","year":"2019"},{"key":"bibr7-10597123231201655","volume-title":"Challenges of real-world reinforcement learning","author":"Dulac-Arnold G.","year":"2019"},{"key":"bibr8-10597123231201655","unstructured":"Engstrom L., Ilyas A., Santurkar S., Tsipras D., Janoos F., Rudolph L., Madry A. (2020). Implementation matters in deep policy gradients: A case study on ppo and trpo. In: International Conference ON Learning Representations. Virtual Conference, 2020."},{"key":"bibr9-10597123231201655","doi-asserted-by":"publisher","DOI":"10.1037\/0894-4105.18.3.462"},{"key":"bibr10-10597123231201655","unstructured":"Finn C., Levine S., Abbeel P. (2016). Guided cost learning: Deep inverse optimal control via policy optimization. In: International Conference on Machine Learning. New York City, NY, USA, 2016, (pp. 49\u201358)."},{"key":"bibr11-10597123231201655","unstructured":"Fujimoto S., Hoof H., Meger D. (2018). Addressing function approximation error in actor-critic methods. In: International Conference on Machine Learning. Stockholmsm\u00e4ssan, Stockholm, Sweden, 2018, (pp. 1587\u20131596)."},{"key":"bibr12-10597123231201655","unstructured":"Ghasemipour S. K. S., Zemel R., Gu S. (2020). A divergence minimization perspective on imitation learning methods. In: Conference on Robot Learning, Virtual Event \/ Cambridge, MA, USA, 2020, (pp. 1259\u20131277)."},{"key":"bibr13-10597123231201655","doi-asserted-by":"publisher","DOI":"10.1111\/cdev.13412"},{"key":"bibr14-10597123231201655","doi-asserted-by":"publisher","DOI":"10.1145\/3422622"},{"key":"bibr15-10597123231201655","doi-asserted-by":"publisher","DOI":"10.1177\/0963721414556653"},{"key":"bibr16-10597123231201655","unstructured":"Haarnoja T., Zhou A., Abbeel P., Levine S. (2018). Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In: International Conference on Machine Learning. Stockholmsm\u00e4ssan, Stockholm, Sweden, 2018, (pp. 1861\u20131870)."},{"key":"bibr17-10597123231201655","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v32i1.11757"},{"key":"bibr18-10597123231201655","author":"Ho J.","year":"2016","journal-title":"Advances in Neural Information Processing Systems"},{"key":"bibr19-10597123231201655","doi-asserted-by":"publisher","DOI":"10.1098\/rstb.2009.0045"},{"key":"bibr20-10597123231201655","unstructured":"Kang B., Jie Z., Feng J. (2018). Policy optimization with demonstrations. In: International Conference on Machine Learning, Stockholmsm\u00e4ssan, Stockholm, Sweden, 2018, (pp. 2469\u20132478)."},{"key":"bibr21-10597123231201655","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-05181-4_10"},{"key":"bibr22-10597123231201655","volume-title":"Offline reinforcement learning: Tutorial, review, and perspectives on open problems","author":"Levine S.","year":"2020"},{"key":"bibr23-10597123231201655","unstructured":"Libardi G., De Fabritiis G., Dittert S. (2021). Guided exploration with proximal policy optimization using a single demonstration. In: International Conference on Machine Learning. Virtual Event, 2021, (pp. 6611\u20136620)."},{"key":"bibr24-10597123231201655","volume-title":"Playing atari with deep reinforcement learning","author":"Mnih V.","year":"2013"},{"key":"bibr25-10597123231201655","volume-title":"Virtual human foraging behavior follows predictions for heavy-tailed search","author":"Moore K.","year":"2021"},{"key":"bibr26-10597123231201655","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2018.8463162"},{"key":"bibr27-10597123231201655","doi-asserted-by":"publisher","DOI":"10.1016\/S0167-2681(97)00109-1"},{"key":"bibr28-10597123231201655","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2013.6696804"},{"key":"bibr29-10597123231201655","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1991.3.1.88"},{"key":"bibr30-10597123231201655","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i11.17130"},{"key":"bibr31-10597123231201655","volume-title":"Learning complex dexterous manipulation with deep reinforcement learning and demonstrations","author":"Rajeswaran A.","year":"2017"},{"key":"bibr32-10597123231201655","doi-asserted-by":"publisher","DOI":"10.1145\/1143844.1143936"},{"key":"bibr33-10597123231201655","unstructured":"Ross S., Bagnell D. (2010). Efficient reductions for imitation learning. In: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics. Sardinia, Italy, 2010, (pp. 661\u2013668)."},{"key":"bibr34-10597123231201655","volume-title":"Reinforcement and imitation learning via interactive no-regret learning","author":"Ross S.","year":"2014"},{"key":"bibr35-10597123231201655","unstructured":"Ross S., Gordon G., Bagnell D. (2011). A reduction of imitation learning and structured prediction to no-regret online learning. In: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics (pp. 627\u2013635)."},{"key":"bibr36-10597123231201655","doi-asserted-by":"publisher","DOI":"10.31234\/osf.io\/uzdvp"},{"key":"bibr37-10597123231201655","doi-asserted-by":"publisher","DOI":"10.1016\/S1364-6613(99)01327-3"},{"key":"bibr38-10597123231201655","unstructured":"Schulman J., Levine S., Abbeel P., Jordan M., Moritz P. (2015). Trust region policy optimization. In: International Conference on Machine learning. Lille, France, 2015, (pp. 1889\u20131897)."},{"key":"bibr39-10597123231201655","volume-title":"Proximal policy optimization algorithms","author":"Schulman J.","year":"2017"},{"key":"bibr40-10597123231201655","doi-asserted-by":"publisher","DOI":"10.1109\/WOWMOM.2010.5534926"},{"key":"bibr41-10597123231201655","doi-asserted-by":"publisher","DOI":"10.1177\/1059712319859650"},{"key":"bibr42-10597123231201655","doi-asserted-by":"publisher","DOI":"10.1038\/nature24270"},{"key":"bibr43-10597123231201655","unstructured":"Subramanian K., Isbell C. L.Jr, Thomaz A. L. (2016). Exploration from demonstration for interactive reinforcement learning. In: Proceedings of the 2016 International Conference on autonomous agents & Multiagent systems. Singapore, 2016, (pp. 447\u2013456)."},{"key":"bibr44-10597123231201655","unstructured":"Sun W., Bagnell J. A., Boots B. (2018). Truncated horizon policy search: Combining reinforcement learning and imitation learning. In: International Conference on learning representations. Vancouver Convention Center, Vancouver, Canada, 2018."},{"key":"bibr45-10597123231201655","volume-title":"Reinforcement learning: An introduction","author":"Sutton R. S.","year":"2018"},{"key":"bibr46-10597123231201655","author":"Syed U.","year":"2010","journal-title":"Advances in Neural Information Processing Systems"},{"key":"bibr47-10597123231201655","volume-title":"NIPS Workshop on Robot Learning: Self-Supervised and Lifelong Learning","author":"Uchendu I.","year":"2021"},{"key":"bibr48-10597123231201655","volume-title":"Jump-start reinforcement learning","author":"Uchendu I.","year":"2022"},{"key":"bibr49-10597123231201655","volume-title":"Leveraging demonstrations for deep reinforcement learning on robotics problems with sparse rewards","author":"Vecerik M.","year":"2017"},{"key":"bibr50-10597123231201655","unstructured":"Walker C. M., Williams J. J., Lombrozo T., Gopnik A. (2012). Explaining influences children\u2019s reliance on evidence and prior knowledge in causal induction. In: Proceedings of the Annual Meeting of the Cognitive Science Society. Sapporo, Japan, 2012."},{"key":"bibr51-10597123231201655","unstructured":"Ziebart B. D., Maas A. L., Bagnell J. A., Dey A. K. (2008). Maximum entropy inverse reinforcement learning. In: Proceedings of the AAAI Conference on Artificial Intelligence. Chicago, IL, 2008."}],"container-title":["Adaptive Behavior"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/10597123231201655","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/10597123231201655","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/10597123231201655","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/10597123231201655","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,1]],"date-time":"2025-03-01T10:03:31Z","timestamp":1740823411000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/10597123231201655"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,9,15]]},"references-count":51,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2024,6]]}},"alternative-id":["10.1177\/10597123231201655"],"URL":"https:\/\/doi.org\/10.1177\/10597123231201655","relation":{},"ISSN":["1059-7123","1741-2633"],"issn-type":[{"value":"1059-7123","type":"print"},{"value":"1741-2633","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,9,15]]}}}