{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,18]],"date-time":"2025-12-18T14:26:41Z","timestamp":1766068001912,"version":"3.37.3"},"reference-count":22,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2024,6,26]],"date-time":"2024-06-26T00:00:00Z","timestamp":1719360000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,6,26]],"date-time":"2024-06-26T00:00:00Z","timestamp":1719360000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100014013","name":"UK Research and Innovation","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100014013","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Auton Agent Multi-Agent Syst"],"published-print":{"date-parts":[[2024,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>A significant challenge for real-world automated vehicles (AVs) is their interaction with human pedestrians. This paper develops a methodology to directly elicit the AV behaviour pedestrians find suitable by collecting quantitative data that can be used to measure and improve an algorithm's performance. Starting with a Deep Q Network (DQN) trained on a simple Pygame\/Python-based pedestrian crossing environment, the reward structure was adapted to allow adjustment by human feedback. Feedback was collected by eliciting behavioural judgements from people in a controlled environment. The reward was shaped by the interaction vector, decomposed into feature aspects for relevant behaviours, thereby facilitating both implicit preference selection and explicit task discovery in tandem. 
Using computational RL and behavioural-science techniques, we harness a formal iterative feedback loop in which the rewards were repeatedly adapted based on human behavioural judgements. Experiments conducted with 124 participants showed a strong initial improvement in the judgement of AV behaviours with the adaptive reward structure. The results indicate that the primary avenue for enhancing vehicle behaviour lies in the predictability of its movements when introduced. More broadly, recognising AV behaviours that receive favourable human judgements can pave the way for enhanced performance.<\/jats:p>","DOI":"10.1007\/s10458-024-09659-4","type":"journal-article","created":{"date-parts":[[2024,6,26]],"date-time":"2024-06-26T03:39:51Z","timestamp":1719373191000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["Assimilating human feedback from autonomous vehicle interaction in reinforcement learning models"],"prefix":"10.1007","volume":"38","author":[{"given":"Richard","family":"Fox","sequence":"first","affiliation":[]},{"given":"Elliot A.","family":"Ludvig","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,6,26]]},"reference":[{"issue":"6293","key":"9659_CR1","doi-asserted-by":"publisher","first-page":"1573","DOI":"10.1126\/science.aaf2654","volume":"352","author":"J-F Bonnefon","year":"2016","unstructured":"Bonnefon, J.-F., Shariff, A., & Rahwan, I. (2016). The social dilemma of autonomous vehicles. Science, 352(6293), 1573\u20131576. https:\/\/doi.org\/10.1126\/science.aaf2654","journal-title":"Science"},{"key":"9659_CR2","unstructured":"Pal, A., Philion, J., Liao, Y.-H., & Fidler, S. (2020). Emergent road rules in multi-agent driving environments. ArXiv201110753 Cs. Accessed: 23 Feb, 2021. 
Available: http:\/\/arxiv.org\/abs\/2011.10753"},{"issue":"2","key":"9659_CR3","doi-asserted-by":"publisher","first-page":"93","DOI":"10.1016\/j.tics.2017.11.008","volume":"22","author":"N Chater","year":"2018","unstructured":"Chater, N., Misyak, J., Watson, D., Griffiths, N., & Mouzakitis, A. (2018). Negotiating the traffic: can cognitive science help make autonomous vehicles a reality? Trends in Cognitive Sciences, 22(2), 93\u201395. https:\/\/doi.org\/10.1016\/j.tics.2017.11.008","journal-title":"Trends in Cognitive Sciences"},{"key":"9659_CR4","doi-asserted-by":"publisher","first-page":"406","DOI":"10.1016\/j.trf.2019.09.016","volume":"66","author":"OT Ritchie","year":"2019","unstructured":"Ritchie, O. T.,\u00a0Watson, D. G., Griffiths, N., Misyak, J., Chater, N., Xu, Z., & Mouzakitis, A. (2019). How should autonomous vehicles overtake other drivers? Transportation Research Part F: Traffic Psychology and Behaviour, 66, 406\u2013418. https:\/\/doi.org\/10.1016\/j.trf.2019.09.016","journal-title":"Transportation Research Part F: Traffic Psychology and Behaviour"},{"key":"9659_CR5","unstructured":"Knox, W. B., Allievi, A., Banzhaf, H., Schmitt, F., & Stone, P. (2021). Reward (Mis)design for autonomous driving. ArXiv210413906 Cs. Accessed 26 Jul, 2021. Available: http:\/\/arxiv.org\/abs\/2104.13906"},{"issue":"9","key":"9659_CR6","doi-asserted-by":"publisher","first-page":"3634","DOI":"10.1109\/TITS.2019.2930310","volume":"21","author":"J Li","year":"2020","unstructured":"Li, J., Zhan, W., Hu, Y., & Tomizuka, M. (2020). Generic tracking and probabilistic prediction framework and its application in autonomous driving. IEEE Transactions on Intelligent Transportation Systems, 21(9), 3634\u20133649. 
https:\/\/doi.org\/10.1109\/TITS.2019.2930310","journal-title":"IEEE Transactions on Intelligent Transportation Systems"},{"key":"9659_CR7","doi-asserted-by":"publisher","DOI":"10.1016\/j.trc.2021.103008","volume":"125","author":"X Di","year":"2021","unstructured":"Di, X., & Shi, R. (2021). A survey on autonomous vehicle control in the era of mixed-autonomy: From physics-based to AI-guided driving policy learning. Transportation Research Part C: Emerging Technologies, 125, 103008. https:\/\/doi.org\/10.1016\/j.trc.2021.103008","journal-title":"Transportation Research Part C: Emerging Technologies"},{"key":"9659_CR8","doi-asserted-by":"publisher","DOI":"10.1109\/TITS.2021.3054625","author":"BR Kiran","year":"2021","unstructured":"Kiran, B. R., Sobh, I., Talpaert, V., Mannion, P., Al Sallab, A. A., Yogamani, S., & P\u00e9rez, P. (2021). Deep reinforcement learning for autonomous driving: A survey. IEEE Transactions on Intelligent Transportation Systems. https:\/\/doi.org\/10.1109\/TITS.2021.3054625","journal-title":"IEEE Transactions on Intelligent Transportation Systems"},{"key":"9659_CR9","unstructured":"Zhou, M., Liu, Z., Sui, P., Li, Y., & Chung, Y. Y. (2020). Learning implicit credit assignment for cooperative multi-agent reinforcement learning. ArXiv200702529 Cs Stat. Accessed 23 Feb, 2021. Available: http:\/\/arxiv.org\/abs\/2007.02529"},{"key":"9659_CR10","unstructured":"Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: an introduction, 2nd edn. Adaptive computation and machine learning series. The MIT Press"},{"key":"9659_CR11","unstructured":"Brown, D., Goo, W., Nagarajan, P., Niekum, S. (2019). Extrapolating beyond suboptimal demonstrations via inverse reinforcement learning from observations. In Proceedings of the 36th international conference on machine learning (pp. 783\u2013792). PMLR. Accessed 22 Feb, 2023. 
Available: https:\/\/proceedings.mlr.press\/v97\/brown19a.html"},{"issue":"10","key":"9659_CR12","doi-asserted-by":"publisher","first-page":"1296","DOI":"10.1177\/0278364915581193","volume":"34","author":"A Jain","year":"2015","unstructured":"Jain, A., Sharma, S., Joachims, T., & Saxena, A. (2015). Learning preferences for manipulation tasks from online coactive feedback. International Journal of Robotics Research, 34(10), 1296\u20131313. https:\/\/doi.org\/10.1177\/0278364915581193","journal-title":"International Journal of Robotics Research"},{"key":"9659_CR13","unstructured":"Suresh, J., Creech, C., Tilbury, D., Yang, X. J., Pradhan, A., Tsui, K., & Robert, L. (2019). Pedestrian trust in automated vehicles: role of traffic signal and Av driving behaviour. Social Science Research Network, Rochester, NY, SSRN Scholarly Paper ID 3478133. Accessed: 03 Feb, 2022. Available: https:\/\/papers.ssrn.com\/abstract=3478133"},{"key":"9659_CR14","doi-asserted-by":"publisher","unstructured":"Suresh, J., Robert, L., Yang, J., & Tilbury, D. (2021). Automated vehicle behavior design for pedestrian interactions at unsignalized crosswalks. Rochester, NY. https:\/\/doi.org\/10.2139\/ssrn.3859366.","DOI":"10.2139\/ssrn.3859366"},{"key":"9659_CR15","unstructured":"Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., & Riedmiller, M. (2013). Playing atari with deep reinforcement learning. ArXiv13125602 Cs. Accessed: 16 Mar, 2021. Available: http:\/\/arxiv.org\/abs\/1312.5602"},{"key":"9659_CR16","unstructured":"Shinners, P. (2011). PyGame\u2014Python game development. Available: http:\/\/www.pygame.org"},{"key":"9659_CR17","doi-asserted-by":"publisher","unstructured":"Huang, S., & Onta\u00f1\u00f3n, S. (2022). A closer look at invalid action masking in policy gradient algorithms. The International FLAIRS Conference Proceedings, 35. 
https:\/\/doi.org\/10.32473\/flairs.v35i.130584.","DOI":"10.32473\/flairs.v35i.130584"},{"key":"9659_CR18","unstructured":"Raffin, A., Hill, A., Gleave, A., Kanervisto, A., Ernestus, M., & Dormann, N., Stable-Baselines3: Reliable reinforcement learning implementations."},{"issue":"1\u20132","key":"9659_CR19","doi-asserted-by":"publisher","first-page":"8","DOI":"10.1016\/j.jneumeth.2006.11.017","volume":"162","author":"JW Peirce","year":"2007","unstructured":"Peirce, J. W. (2007). PsychoPy\u2014Psychophysics software in Python. Journal of Neuroscience Methods, 162(1\u20132), 8\u201313. https:\/\/doi.org\/10.1016\/j.jneumeth.2006.11.017","journal-title":"Journal of Neuroscience Methods"},{"key":"9659_CR20","unstructured":"Fox, R., Ludvig, E.A. (2022). Using human behaviour to guide reward functions for autonomous vehicles. Presented at the RLDM 2022 \u2013 2.141 (Poster), Providence, Rhode Island. Available: https:\/\/rldm.org\/"},{"key":"9659_CR21","unstructured":"Devlin, S., Georgescu, R., Momennejad, I., Rzepecki, J., Zuniga, E., Costello, G., Leroy, G., Shaw, A., & Hofmann, K. (2021). Navigation turing test (ntt): Learning to evaluate human-like navigation. In International Conference on Machine Learning (pp. 2644\u20132653). PMLR. ArXiv210509637 Cs. Accessed: 26 Jul, 2021. Available: http:\/\/arxiv.org\/abs\/2105.09637"},{"issue":"4","key":"9659_CR22","doi-asserted-by":"publisher","first-page":"1453","DOI":"10.1007\/s00146-022-01607-8","volume":"38","author":"V Yazdanpanah","year":"2023","unstructured":"Yazdanpanah, V., Gerding, E. H., Stein, S., Dastani, M., Jonker, C. M., Norman, T. J., & Ramchurn, S. D. (2023). Reasoning about responsibility in autonomous systems: challenges and opportunities. AI & SOCIETY, 38(4), 1453\u20131464. 
https:\/\/doi.org\/10.1007\/s00146-022-01607-8","journal-title":"AI & Society"}],"container-title":["Autonomous Agents and Multi-Agent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10458-024-09659-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10458-024-09659-4\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10458-024-09659-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,11,13]],"date-time":"2024-11-13T15:20:24Z","timestamp":1731511224000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10458-024-09659-4"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,6,26]]},"references-count":22,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2024,12]]}},"alternative-id":["9659"],"URL":"https:\/\/doi.org\/10.1007\/s10458-024-09659-4","relation":{},"ISSN":["1387-2532","1573-7454"],"issn-type":[{"type":"print","value":"1387-2532"},{"type":"electronic","value":"1573-7454"}],"subject":[],"published":{"date-parts":[[2024,6,26]]},"assertion":[{"value":"13 June 2024","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"26 June 2024","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"26"}}