{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,19]],"date-time":"2026-05-19T17:32:14Z","timestamp":1779211934941,"version":"3.51.4"},"reference-count":43,"publisher":"Maximum Academic Press","license":[{"start":{"date-parts":[[2018,9,17]],"date-time":"2018-09-17T00:00:00Z","timestamp":1537142400000},"content-version":"unspecified","delay-in-days":259,"URL":"https:\/\/www.cambridge.org\/core\/terms"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["The Knowledge Engineering Review"],"published-print":{"date-parts":[[2018]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>\n                    Reinforcement learning (RL) can be extremely effective in solving complex, real-world problems. However, injecting human knowledge into an RL agent may require extensive effort and expertise on the human designer\u2019s part. To date, human factors are generally not considered in the development and evaluation of possible RL approaches. In this article, we set out to investigate how different methods for injecting human knowledge are applied,\n                    <jats:italic>in practice<\/jats:italic>\n                    , by human designers of varying levels of knowledge and skill. We perform the first empirical evaluation of several methods, including a newly proposed method named\n                    <jats:italic>State Action Similarity Solutions (SASS)<\/jats:italic>\n                    which is based on the notion of similarities in the agent\u2019s state\u2013action space. Through this human study, consisting of 51 human participants, we shed new light on the human factors that play a key role in RL. We find that the classical\n                    <jats:italic>reward shaping<\/jats:italic>\n                    technique seems to be the most natural method for most designers, both expert and non-expert, to speed up RL. 
However, we further find that our proposed method SASS can be effectively and efficiently combined with reward shaping, and provides a beneficial alternative to using only a single-speedup method with minimal human designer effort overhead.\n                  <\/jats:p>","DOI":"10.1017\/s0269888918000206","type":"journal-article","created":{"date-parts":[[2018,10,1]],"date-time":"2018-10-01T06:47:01Z","timestamp":1538376421000},"source":"Crossref","is-referenced-by-count":16,"title":["Leveraging human knowledge in tabular reinforcement learning: a study of human subjects"],"prefix":"10.48130","volume":"33","author":[{"given":"Ariel","family":"Rosenfeld","sequence":"first","affiliation":[]},{"given":"Moshe","family":"Cohen","sequence":"additional","affiliation":[]},{"given":"Matthew E.","family":"Taylor","sequence":"additional","affiliation":[]},{"given":"Sarit","family":"Kraus","sequence":"additional","affiliation":[]}],"member":"27968","published-online":{"date-parts":[[2018,9,17]]},"reference":[{"key":"S0269888918000206_ref8","doi-asserted-by":"publisher","DOI":"10.1201\/9781439821091"},{"key":"S0269888918000206_ref13","doi-asserted-by":"publisher","DOI":"10.1007\/s10994-012-5322-7"},{"key":"S0269888918000206_ref37","volume-title":"Reinforcement Learning: An Introduction","author":"Sutton","year":"1998"},{"key":"S0269888918000206_ref11","doi-asserted-by":"publisher","DOI":"10.1109\/TSMCB.2007.899419"},{"key":"S0269888918000206_ref15","doi-asserted-by":"publisher","DOI":"10.1109\/TCIAIG.2012.2188528"},{"key":"S0269888918000206_ref12","doi-asserted-by":"publisher","DOI":"10.1016\/S0166-4115(08)62386-9"},{"key":"S0269888918000206_ref35","unstructured":"Stone P. , Kuhlmann G. , Taylor M. E. & Liu Y. 2006. Keepaway soccer: from machine learning testbed to benchmark. In RoboCup-2005: Robot Soccer World Cup IX, I. Noda, A. Jacoff, A. Bredenfeld & Y. Takahashi (eds). Springer Verlag 4020, 93\u2013105."},{"key":"S0269888918000206_ref30","unstructured":"Rosenfeld A. 
, Taylor M. E. & Kraus S. 2017a. Leveraging human knowledge in tabular reinforcement learning: a study of human subjects. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI 2017, Melbourne, Australia, August 19\u201325, 2017, 3823\u20133830."},{"key":"S0269888918000206_ref17","unstructured":"Knox W. B. & Stone P. 2010. Combining manual feedback with subsequent MDP reward signals for reinforcement learning. In Proceedings of AAMAS."},{"key":"S0269888918000206_ref19","unstructured":"Littman M. L. 1994. Markov games as a framework for multi-agent reinforcement learning. ICML 157, 157\u2013163."},{"key":"S0269888918000206_ref36","unstructured":"Suay H. B. , Brys T. , Taylor M. E. & Chernova S. 2016. Learning from demonstration for shaping through inverse reinforcement learning. In AAMAS, 429\u2013437."},{"key":"S0269888918000206_ref39","doi-asserted-by":"crossref","unstructured":"Tamassia M. , Zambetta F. , Raffe W. , Mueller F. & Li X. 2016. Dynamic choice of state abstraction in Q-learning. In ECAI.","DOI":"10.3233\/978-1-61499-672-9-46"},{"key":"S0269888918000206_ref2","unstructured":"Benda M. 1985. On Optimal Cooperation of Knowledge Sources. Technical report BCS-G2010-28."},{"key":"S0269888918000206_ref16","volume-title":"Personal Construct Psychology","author":"Kelly","year":"1955"},{"key":"S0269888918000206_ref21","unstructured":"Mataric M. J. 1994. Reward functions for accelerated learning. In Machine Learning: Proceedings of the Eleventh International Conference, 181\u2013189."},{"key":"S0269888918000206_ref9","unstructured":"Devlin S. , Grze\u015b M. & Kudenko D. 2011. Multi-agent, reward shaping for robocup keepaway. In The 10th International Conference on Autonomous Agents and Multiagent Systems-Volume 3, 1227\u20131228. 
International Foundation for Autonomous Agents and Multiagent Systems."},{"key":"S0269888918000206_ref1","volume-title":"Brains, Behavior and Robotics","author":"Albus","year":"1981"},{"key":"S0269888918000206_ref18","unstructured":"Leffler B. R. , Littman M. L. & Edmunds T. 2007. Efficient reinforcement learning with relocatable action models. AAAI 7, 572\u2013577."},{"key":"S0269888918000206_ref6","unstructured":"Brys T. , Harutyunyan A. , Suay H. B. , Chernova S. , Taylor M. E. & Now\u00e9 A. 2015. Reinforcement learning from demonstration through shaping. In IJCAI, 3352\u20133358."},{"key":"S0269888918000206_ref40","first-page":"2133","article-title":"RL-Glue : language-independent software for reinforcement-learning experiments","volume":"10","author":"Tanner","year":"2009","journal-title":"Journal of Machine Learning Research"},{"key":"S0269888918000206_ref23","unstructured":"Narayanamurthy S. M. & Ravindran B. 2008. On the hardness of finding symmetries in Markov decision processes. In ICML, 688\u2013695."},{"key":"S0269888918000206_ref14","doi-asserted-by":"crossref","unstructured":"Jong N. K. & Stone P. 2007. Model-based function approximation in reinforcement learning. In AAMAS, 95. ACM.","DOI":"10.1145\/1329125.1329242"},{"key":"S0269888918000206_ref22","doi-asserted-by":"publisher","DOI":"10.1038\/nature14236"},{"key":"S0269888918000206_ref10","unstructured":"Geramifard A. , Klein R. H. , Dann C. , Dabney W. & How J. P. 2013. RLPy: The Reinforcement Learning Library for Education and Research. http:\/\/acl.mit.edu\/RLPy."},{"key":"S0269888918000206_ref5","first-page":"119","article-title":"Going beyond the information given","volume":"1","author":"Bruner","year":"1957","journal-title":"Contemporary Approaches to Cognition"},{"key":"S0269888918000206_ref7","unstructured":"Brys T. , Now\u00e9 A. , Kudenko D. & Taylor M. E. 2014. Combining multiple correlated reward and shaping signals by measuring confidence. 
In AAAI, 1687\u20131693."},{"key":"S0269888918000206_ref24","unstructured":"Ng A. Y. , Harada D. & Russell S. 1999. Policy invariance under reward transformations: theory and application to reward shaping. ICML. 99, 278\u2013287."},{"key":"S0269888918000206_ref25","unstructured":"Peng B. , MacGlashan J. , Loftin R. , Littman M. L. , Roberts D. L. & Taylor M. E. 2016. A need for speed: adapting agent action speed to improve task learning from non-expert humans. In AAMAS, 957\u2013965."},{"key":"S0269888918000206_ref26","unstructured":"Randl\u00f8v J. & Alstr\u00f8m P. 1998. Learning to drive a bicycle using reinforcement learning and shaping. ICML 98, 463\u2013471."},{"key":"S0269888918000206_ref28","unstructured":"Ribeiro C. & Szepesv\u00e1ri C. 1996. Q-learning combined with spreading: convergence and results. In Proceedings of the ISRF-IEE International Conference on Intelligent and Cognitive Systems (Neural Networks Symposium), 32\u201336."},{"key":"S0269888918000206_ref29","doi-asserted-by":"publisher","DOI":"10.2200\/S00820ED1V01Y201712AIM036"},{"key":"S0269888918000206_ref31","unstructured":"Rosenfeld A. , Taylor M. E. & Kraus S. 2017b. Speeding up tabular reinforcement learning using state\u2013action similarities. In AAMAS, 1722\u20131724."},{"key":"S0269888918000206_ref32","unstructured":"Schaul T. , Bayer J. , Wierstra D. , Sun Y. , Felder M. , Sehnke F. , R\u00fcckstie\u00df T. & Schmidhuber J. 2010. PyBrain, Journal of Machine Learning Research 11, 743\u2013746."},{"key":"S0269888918000206_ref33","doi-asserted-by":"crossref","unstructured":"Sequeira P. , Melo F. S. & Paiva A. 2013. An associative state-space metric for learning in factored MDPs. In Portuguese Conference on Artificial Intelligence, 163\u2013174. 
Springer.","DOI":"10.1007\/978-3-642-40669-0_15"},{"key":"S0269888918000206_ref34","doi-asserted-by":"publisher","DOI":"10.1037\/h0049039"},{"key":"S0269888918000206_ref38","doi-asserted-by":"publisher","DOI":"10.1162\/089976699300016070"},{"key":"S0269888918000206_ref41","unstructured":"Watkins C. J. C. H. 1989. Learning from Delayed Rewards. PhD thesis, University of Cambridge."},{"key":"S0269888918000206_ref42","volume-title":"Data Mining: Practical Machine Learning Tools and Techniques","author":"Witten","year":"2016"},{"key":"S0269888918000206_ref20","doi-asserted-by":"crossref","unstructured":"Martins M. F. & Bianchi R. A. 2013. Heuristically-accelerated reinforcement learning: a comparative analysis of performance. In Conference Towards Autonomous Robotic Systems, 15\u201327. Springer.","DOI":"10.1007\/978-3-662-43645-5_2"},{"key":"S0269888918000206_ref4","unstructured":"Brockman G. , Cheung V. , Pettersson L. , Schneider J. , Schulman J. , Tang J. & Zaremba W. 2016. OpenAI Gym. https:\/\/gym.openai.com (accessed 24 October 2017)."},{"key":"S0269888918000206_ref43","unstructured":"Zinkevich M. & Balch T. 2001. Symmetry in Markov decision processes and its implications for single agent and multi agent learning. In ICML."},{"key":"S0269888918000206_ref3","doi-asserted-by":"publisher","DOI":"10.1109\/TCYB.2013.2253094"},{"key":"S0269888918000206_ref27","unstructured":"Ribeiro C. H. 1995. Attentional mechanisms as a strategy for generalisation in the Q-learning algorithm. 
Proceedings of ICANN 95, 455\u2013460."}],"container-title":["The Knowledge Engineering Review"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.cambridge.org\/core\/services\/aop-cambridge-core\/content\/view\/S0269888918000206","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,1,5]],"date-time":"2026-01-05T14:42:09Z","timestamp":1767624129000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.cambridge.org\/core\/product\/identifier\/S0269888918000206\/type\/journal_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018]]},"references-count":43,"alternative-id":["S0269888918000206"],"URL":"https:\/\/doi.org\/10.1017\/s0269888918000206","relation":{},"ISSN":["0269-8889","1469-8005"],"issn-type":[{"value":"0269-8889","type":"print"},{"value":"1469-8005","type":"electronic"}],"subject":[],"published":{"date-parts":[[2018]]},"article-number":"e14"}}