{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,7]],"date-time":"2026-03-07T18:14:38Z","timestamp":1772907278810,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":117,"publisher":"ACM","license":[{"start":{"date-parts":[[2020,7,3]],"date-time":"2020-07-03T00:00:00Z","timestamp":1593734400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2020,7,3]]},"DOI":"10.1145\/3357236.3395525","type":"proceedings-article","created":{"date-parts":[[2020,7,5]],"date-time":"2020-07-05T19:29:16Z","timestamp":1593977356000},"page":"1195-1209","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":65,"title":["A Survey on Interactive Reinforcement Learning"],"prefix":"10.1145","author":[{"given":"Christian","family":"Arzate Cruz","sequence":"first","affiliation":[{"name":"The University of Tokyo, Tokyo, Japan"}]},{"given":"Takeo","family":"Igarashi","sequence":"additional","affiliation":[{"name":"The University of Tokyo, Tokyo, Japan"}]}],"member":"320","published-online":{"date-parts":[[2020,7,3]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/1015330.1015430"},{"key":"e_1_3_2_1_2_1","volume-title":"Agent-agnostic human-in-the-loop reinforcement learning. arXiv preprint arXiv:1701.04079","author":"Abel David","year":"2017","unstructured":"David Abel , John Salvatier , Andreas Stuhlm\u00fcller , and Owain Evans . 2017. Agent-agnostic human-in-the-loop reinforcement learning. arXiv preprint arXiv:1701.04079 ( 2017 ). David Abel, John Salvatier, Andreas Stuhlm\u00fcller, and Owain Evans. 2017. Agent-agnostic human-in-the-loop reinforcement learning. 
arXiv preprint arXiv:1701.04079 (2017)."},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2018.2870052"},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.5555\/1018410.1018852"},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1609\/aimag.v35i4.2513"},{"key":"e_1_3_2_1_6_1","volume-title":"Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems. International Foundation for Autonomous Agents and Multiagent Systems, 1168--1176","author":"Amir Dan","year":"2018","unstructured":"Dan Amir and Ofra Amir . 2018 . Highlights: Summarizing agent behavior to people . In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems. International Foundation for Autonomous Agents and Multiagent Systems, 1168--1176 . Dan Amir and Ofra Amir. 2018. Highlights: Summarizing agent behavior to people. In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems. International Foundation for Autonomous Agents and Multiagent Systems, 1168--1176."},{"key":"e_1_3_2_1_7_1","unstructured":"Ofra Amir Ece Kamar Andrey Kolobov and Barbara Grosz. 2016. Interactive Teaching Strategies for Agent Training. In In Proceedings of IJCAI 2016 (in proceedings of ijcai 2016 ed.). https:\/\/www.microsoft.com\/en-us\/research\/publication\/interactive-teaching-strategies-agent-training\/  Ofra Amir Ece Kamar Andrey Kolobov and Barbara Grosz. 2016. Interactive Teaching Strategies for Agent Training. In In Proceedings of IJCAI 2016 (in proceedings of ijcai 2016 ed.). https:\/\/www.microsoft.com\/en-us\/research\/publication\/interactive-teaching-strategies-agent-training\/"},{"key":"e_1_3_2_1_8_1","volume-title":"Concrete problems in AI safety. arXiv preprint arXiv:1606.06565","author":"Amodei Dario","year":"2016","unstructured":"Dario Amodei , Chris Olah , Jacob Steinhardt , Paul Christiano , John Schulman , and Dan Man\u00e9 . 2016. 
Concrete problems in AI safety. arXiv preprint arXiv:1606.06565 ( 2016 ). Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, and Dan Man\u00e9. 2016. Concrete problems in AI safety. arXiv preprint arXiv:1606.06565 (2016)."},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"crossref","unstructured":"Sule Anjomshoae Kary Fr\u00e4mling and Amro Najjar. 2019. Explanations of Black-Box Model Predictions by Contextual Importance and Utility.  Sule Anjomshoae Kary Fr\u00e4mling and Amro Najjar. 2019. Explanations of Black-Box Model Predictions by Contextual Importance and Utility.","DOI":"10.1007\/978-3-030-30391-4_6"},{"key":"e_1_3_2_1_10_1","volume-title":"DQN-TAMER: Human-in-the-Loop Reinforcement Learning with Intractable Feedback. arXiv preprint arXiv:1810.11748","author":"Arakawa Riku","year":"2018","unstructured":"Riku Arakawa , Sosuke Kobayashi , Yuya Unno , Yuta Tsuboi , and Shin-ichi Maeda. 2018. DQN-TAMER: Human-in-the-Loop Reinforcement Learning with Intractable Feedback. arXiv preprint arXiv:1810.11748 ( 2018 ). Riku Arakawa, Sosuke Kobayashi, Yuya Unno, Yuta Tsuboi, and Shin-ichi Maeda. 2018. DQN-TAMER: Human-in-the-Loop Reinforcement Learning with Intractable Feedback. arXiv preprint arXiv:1810.11748 (2018)."},{"key":"e_1_3_2_1_11_1","volume-title":"Miles Brundage, and Anil Anthony Bharath.","author":"Arulkumaran Kai","year":"2017","unstructured":"Kai Arulkumaran , Marc Peter Deisenroth , Miles Brundage, and Anil Anthony Bharath. 2017 . A brief survey of deep reinforcement learning. arXiv preprint arXiv:1708.05866 (2017). Kai Arulkumaran, Marc Peter Deisenroth, Miles Brundage, and Anil Anthony Bharath. 2017. A brief survey of deep reinforcement learning. arXiv preprint arXiv:1708.05866 (2017)."},{"key":"e_1_3_2_1_12_1","volume-title":"Sophie Saskin, and Michael L Littman.","author":"Arumugam Dilip","year":"2019","unstructured":"Dilip Arumugam , Jun Ki Lee , Sophie Saskin, and Michael L Littman. 2019 . 
Deep reinforcement learning from policy-dependent human feedback. arXiv preprint arXiv:1902.04257 (2019). Dilip Arumugam, Jun Ki Lee, Sophie Saskin, and Michael L Littman. 2019. Deep reinforcement learning from policy-dependent human feedback. arXiv preprint arXiv:1902.04257 (2019)."},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.3390\/app8122453"},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/2717316"},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.automatica.2009.07.008"},{"key":"e_1_3_2_1_16_1","volume-title":"Carlos HC Ribeiro, and Anna HR Costa","author":"Bianchi Reinaldo AC","year":"2013","unstructured":"Reinaldo AC Bianchi , Murilo F Martins , Carlos HC Ribeiro, and Anna HR Costa . 2013 . Heuristically-accelerated multiagent reinforcement learning. IEEE transactions on cybernetics 44, 2 (2013), 252--265. Reinaldo AC Bianchi, Murilo F Martins, Carlos HC Ribeiro, and Anna HR Costa. 2013. Heuristically-accelerated multiagent reinforcement learning. IEEE transactions on cybernetics 44, 2 (2013), 252--265."},{"key":"e_1_3_2_1_17_1","unstructured":"Greg Brockman Vicki Cheung Ludwig Pettersson Jonas Schneider John Schulman Jie Tang and Wojciech Zaremba. 2016. OpenAI Gym. (2016).  Greg Brockman Vicki Cheung Ludwig Pettersson Jonas Schneider John Schulman Jie Tang and Wojciech Zaremba. 2016. OpenAI Gym. (2016)."},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.5555\/2832581.2832716"},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/3025453.3026044"},{"key":"e_1_3_2_1_20_1","unstructured":"Paul F Christiano Jan Leike Tom Brown Miljan Martic Shane Legg and Dario Amodei. 2017. Deep reinforcement learning from human preferences. In Advances in Neural Information Processing Systems. 4299--4307.  Paul F Christiano Jan Leike Tom Brown Miljan Martic Shane Legg and Dario Amodei. 2017. Deep reinforcement learning from human preferences. 
In Advances in Neural Information Processing Systems. 4299--4307."},{"key":"e_1_3_2_1_21_1","unstructured":"Jack Clark and Dario Amodei. 2016. Faulty Reward Functions in the Wild. (2016). https:\/\/openai.com\/blog\/faulty-reward-functions\/Accessed: 2019-08--21.  Jack Clark and Dario Amodei. 2016. Faulty Reward Functions in the Wild. (2016). https:\/\/openai.com\/blog\/faulty-reward-functions\/Accessed: 2019-08--21."},{"key":"e_1_3_2_1_22_1","unstructured":"European Commission. 2018. 2018 reform of EU data protection rules. (2018). https:\/\/ec.europa.eu\/commission\/sites\/beta-political\/files\/data-protection-factsheet-changes_en.pdf Accessed: 2019-06-17.  European Commission. 2018. 2018 reform of EU data protection rules. (2018). https:\/\/ec.europa.eu\/commission\/sites\/beta-political\/files\/data-protection-factsheet-changes_en.pdf Accessed: 2019-06-17."},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCDS.2016.2543839"},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/IJCNN.2015.7280477"},{"key":"e_1_3_2_1_25_1","volume-title":"Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems. International Foundation for Autonomous Agents and Multiagent Systems, 998--1006","author":"Cuccu Giuseppe","year":"2019","unstructured":"Giuseppe Cuccu , Julian Togelius , and Philippe Cudr\u00e9-Mauroux . 2019 . Playing atari with six neurons . In Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems. International Foundation for Autonomous Agents and Multiagent Systems, 998--1006 . Giuseppe Cuccu, Julian Togelius, and Philippe Cudr\u00e9-Mauroux. 2019. Playing atari with six neurons. In Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems. International Foundation for Autonomous Agents and Multiagent Systems, 998--1006."},{"key":"e_1_3_2_1_26_1","unstructured":"Richard Dearden Nir Friedman and Stuart Russell. 1998. 
Bayesian Q-learning. In Aaai\/iaai. 761--768.  Richard Dearden Nir Friedman and Stuart Russell. 1998. Bayesian Q-learning. In Aaai\/iaai. 761--768."},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1613\/jair.639"},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-24873-3_4"},{"key":"e_1_3_2_1_29_1","volume-title":"Investigating human priors for playing video games. arXiv preprint arXiv:1802.10217","author":"Dubey Rachit","year":"2018","unstructured":"Rachit Dubey , Pulkit Agrawal , Deepak Pathak , Thomas L Griffiths , and Alexei A Efros . 2018. Investigating human priors for playing video games. arXiv preprint arXiv:1802.10217 ( 2018 ). Rachit Dubey, Pulkit Agrawal, Deepak Pathak, Thomas L Griffiths, and Alexei A Efros. 2018. Investigating human priors for playing video games. arXiv preprint arXiv:1802.10217 (2018)."},{"key":"e_1_3_2_1_30_1","unstructured":"Francisco Elizalde and Luis Enrique Sucar. 2009. Expert Evaluation of Probabilistic Explanations.. In ExaCt. 1--12.  Francisco Elizalde and Luis Enrique Sucar. 2009. Expert Evaluation of Probabilistic Explanations.. In ExaCt. 1--12."},{"key":"e_1_3_2_1_31_1","volume-title":"Proceedings of the 4th European Workshop on Probabilistic Graphical Models (PGM","author":"Elizalde Francisco","year":"2008","unstructured":"Francisco Elizalde , L Enrique Sucar , Manuel Luque , J Diez , and Alberto Reyes . 2008 . Policy explanation in factored Markov decision processes . In Proceedings of the 4th European Workshop on Probabilistic Graphical Models (PGM 2008). 97--104. Francisco Elizalde, L Enrique Sucar, Manuel Luque, J Diez, and Alberto Reyes. 2008. Policy explanation in factored Markov decision processes. In Proceedings of the 4th European Workshop on Probabilistic Graphical Models (PGM 2008). 
97--104."},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.3390\/make1010002"},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/604045.604056"},{"key":"e_1_3_2_1_34_1","first-page":"1","article-title":"A Comprehensive Survey on Safe Reinforcement Learning","volume":"16","author":"Garc\u00eda Javier","year":"2015","unstructured":"Javier Garc\u00eda and Fernando Fern\u00e1ndez . 2015 . A Comprehensive Survey on Safe Reinforcement Learning . J. Mach. Learn. Res. 16 , 1 (Jan. 2015), 1437--1480. http:\/\/dl.acm.org\/citation.cfm?id=2789272.2886795 Javier Garc\u00eda and Fernando Fern\u00e1ndez. 2015. A Comprehensive Survey on Safe Reinforcement Learning. J. Mach. Learn. Res. 16, 1 (Jan. 2015), 1437--1480. http:\/\/dl.acm.org\/citation.cfm?id=2789272.2886795","journal-title":"J. Mach. Learn. Res."},{"key":"e_1_3_2_1_35_1","volume-title":"Bayesian reinforcement learning: A survey. Foundations and Trends\u00ae in Machine Learning 8, 5--6","author":"Ghavamzadeh Mohammad","year":"2015","unstructured":"Mohammad Ghavamzadeh , Shie Mannor , Joelle Pineau , Aviv Tamar , and others. 2015. Bayesian reinforcement learning: A survey. Foundations and Trends\u00ae in Machine Learning 8, 5--6 ( 2015 ), 359--483. Mohammad Ghavamzadeh, Shie Mannor, Joelle Pineau, Aviv Tamar, and others. 2015. Bayesian reinforcement learning: A survey. Foundations and Trends\u00ae in Machine Learning 8, 5--6 (2015), 359--483."},{"key":"e_1_3_2_1_36_1","volume-title":"In Proceedings of the International Conference on Neural Information Processing Systems (NIPS.","author":"Griffith Shane","year":"2013","unstructured":"Shane Griffith , Kaushik Subramanian , Jonathan Scholz , Charles L. Isbell , and Andrea Thomaz . 2013 a. Policy shaping: Integrating human feedback with reinforcement learning . In In Proceedings of the International Conference on Neural Information Processing Systems (NIPS. Shane Griffith, Kaushik Subramanian, Jonathan Scholz, Charles L. 
Isbell, and Andrea Thomaz. 2013a. Policy shaping: Integrating human feedback with reinforcement learning. In In Proceedings of the International Conference on Neural Information Processing Systems (NIPS."},{"key":"e_1_3_2_1_37_1","unstructured":"Shane Griffith Kaushik Subramanian Jonathan Scholz Charles L Isbell and Andrea L Thomaz. 2013b. Policy shaping: Integrating human feedback with reinforcement learning. In Advances in neural information processing systems. 2625--2633.  Shane Griffith Kaushik Subramanian Jonathan Scholz Charles L Isbell and Andrea L Thomaz. 2013b. Policy shaping: Integrating human feedback with reinforcement learning. In Advances in neural information processing systems. 2625--2633."},{"key":"e_1_3_2_1_38_1","first-page":"I","article-title":"Inverse Reward Design","volume":"30","author":"Hadfield-Menell Dylan","year":"2017","unstructured":"Dylan Hadfield-Menell , Smitha Milli , Pieter Abbeel , Stuart J Russell , and Anca Dragan . 2017 . Inverse Reward Design . In Advances in Neural Information Processing Systems 30 , I . Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.). Curran Associates, Inc., 6765--6774. http:\/\/papers.nips.cc\/paper\/7253-inverse-reward-design.pdf Dylan Hadfield-Menell, Smitha Milli, Pieter Abbeel, Stuart J Russell, and Anca Dragan. 2017. Inverse Reward Design. In Advances in Neural Information Processing Systems 30, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.). Curran Associates, Inc., 6765--6774. http:\/\/papers.nips.cc\/paper\/7253-inverse-reward-design.pdf","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_1_39_1","unstructured":"Mark K Ho Michael L Littman Fiery Cushman and Joseph L Austerweil. 2015. Teaching with rewards and punishments: Reinforcement or communication?. In CogSci.  Mark K Ho Michael L Littman Fiery Cushman and Joseph L Austerweil. 2015. 
Teaching with rewards and punishments: Reinforcement or communication?. In CogSci."},{"key":"e_1_3_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1007\/s40708-016-0042-6"},{"key":"e_1_3_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-45507-5_6"},{"key":"e_1_3_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10458-006-0005-z"},{"key":"e_1_3_2_1_43_1","volume-title":"Cobot: A social reinforcement learning agent. In Advances in neural information processing systems. 1393--1400.","author":"Isbell Charles Lee","year":"2002","unstructured":"Charles Lee Isbell Jr and Christian R Shelton . 2002 . Cobot: A social reinforcement learning agent. In Advances in neural information processing systems. 1393--1400. Charles Lee Isbell Jr and Christian R Shelton. 2002. Cobot: A social reinforcement learning agent. In Advances in neural information processing systems. 1393--1400."},{"key":"e_1_3_2_1_44_1","unstructured":"Natasha Jaques Shixiang Gu Richard E Turner and Douglas Eck. 2016. Generating music by fine-tuning recurrent neural networks with reinforcement learning. (2016).  Natasha Jaques Shixiang Gu Richard E Turner and Douglas Eck. 2016. Generating music by fine-tuning recurrent neural networks with reinforcement learning. (2016)."},{"key":"e_1_3_2_1_45_1","volume-title":"Social influence as intrinsic motivation for multi-agent deep reinforcement learning. arXiv preprint arXiv:1810.08647","author":"Jaques Natasha","year":"2018","unstructured":"Natasha Jaques , Angeliki Lazaridou , Edward Hughes , Caglar Gulcehre , Pedro A Ortega , DJ Strouse , Joel Z Leibo , and Nando De Freitas . 2018. Social influence as intrinsic motivation for multi-agent deep reinforcement learning. arXiv preprint arXiv:1810.08647 ( 2018 ). Natasha Jaques, Angeliki Lazaridou, Edward Hughes, Caglar Gulcehre, Pedro A Ortega, DJ Strouse, Joel Z Leibo, and Nando De Freitas. 2018. Social influence as intrinsic motivation for multi-agent deep reinforcement learning. 
arXiv preprint arXiv:1810.08647 (2018)."},{"key":"e_1_3_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.5555\/1622737.1622748"},{"key":"e_1_3_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-22362-4_31"},{"key":"e_1_3_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCIAIG.2012.2188528"},{"key":"e_1_3_2_1_49_1","volume-title":"Pcgrl: Procedural content generation via reinforcement learning. arXiv preprint arXiv:2001.09212","author":"Khalifa Ahmed","year":"2020","unstructured":"Ahmed Khalifa , Philip Bontrager , Sam Earle , and Julian Togelius . 2020 . Pcgrl: Procedural content generation via reinforcement learning. arXiv preprint arXiv:2001.09212 (2020). Ahmed Khalifa, Philip Bontrager, Sam Earle, and Julian Togelius. 2020. Pcgrl: Procedural content generation via reinforcement learning. arXiv preprint arXiv:2001.09212 (2020)."},{"key":"e_1_3_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1007\/s12369-012-0163-x"},{"key":"e_1_3_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/1597735.1597738"},{"key":"e_1_3_2_1_52_1","volume-title":"Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems","volume":"1","author":"Bradley Knox W","year":"2010","unstructured":"W Bradley Knox and Peter Stone . 2010 . Combining manual feedback with subsequent MDP reward signals for reinforcement learning . In Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems : volume 1-Volume 1 . International Foundation for Autonomous Agents and Multiagent Systems, 5--12. W Bradley Knox and Peter Stone. 2010. Combining manual feedback with subsequent MDP reward signals for reinforcement learning. In Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems: volume 1-Volume 1. 
International Foundation for Autonomous Agents and Multiagent Systems, 5--12."},{"key":"e_1_3_2_1_53_1","volume-title":"Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems-Volume 1. International Foundation for Autonomous Agents and Multiagent Systems, 475--482","author":"Bradley Knox W","year":"2012","unstructured":"W Bradley Knox and Peter Stone . 2012 . Reinforcement learning from simultaneous human and MDP reward . In Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems-Volume 1. International Foundation for Autonomous Agents and Multiagent Systems, 475--482 . W Bradley Knox and Peter Stone. 2012. Reinforcement learning from simultaneous human and MDP reward. In Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems-Volume 1. International Foundation for Autonomous Agents and Multiagent Systems, 475--482."},{"key":"e_1_3_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-02675-6_46"},{"key":"e_1_3_2_1_55_1","volume-title":"Why: Natural explanations from a robot navigator. arXiv preprint arXiv:1709.09741","author":"Korpan Raj","year":"2017","unstructured":"Raj Korpan , Susan L Epstein , Anoop Aroor , and Gil Dekel . 2017 . Why: Natural explanations from a robot navigator. arXiv preprint arXiv:1709.09741 (2017). Raj Korpan, Susan L Epstein, Anoop Aroor, and Gil Dekel. 2017. Why: Natural explanations from a robot navigator. arXiv preprint arXiv:1709.09741 (2017)."},{"key":"e_1_3_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1145\/3277904"},{"key":"e_1_3_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/3322276.3322379"},{"key":"e_1_3_2_1_58_1","volume-title":"Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems. 
International Foundation for Autonomous Agents and Multiagent Systems, 720--727","author":"Krening Samantha","year":"2019","unstructured":"Samantha Krening and Karen M Feigh . 2019 b. Newtonian Action Advice: Integrating Human Verbal Instruction with Reinforcement Learning . In Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems. International Foundation for Autonomous Agents and Multiagent Systems, 720--727 . Samantha Krening and Karen M Feigh. 2019b. Newtonian Action Advice: Integrating Human Verbal Instruction with Reinforcement Learning. In Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems. International Foundation for Autonomous Agents and Multiagent Systems, 720--727."},{"key":"e_1_3_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.5555\/2021442.2021502"},{"key":"e_1_3_2_1_60_1","volume-title":"Scalable agent alignment via reward modeling: a research direction. arXiv preprint arXiv:1811.07871","author":"Leike Jan","year":"2018","unstructured":"Jan Leike , David Krueger , Tom Everitt , Miljan Martic , Vishal Maini , and Shane Legg . 2018. Scalable agent alignment via reward modeling: a research direction. arXiv preprint arXiv:1811.07871 ( 2018 ). Jan Leike, David Krueger, Tom Everitt, Miljan Martic, Vishal Maini, and Shane Legg. 2018. Scalable agent alignment via reward modeling: a research direction. arXiv preprint arXiv:1811.07871 (2018)."},{"key":"e_1_3_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1109\/TG.2017.2783361"},{"key":"e_1_3_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.1145\/3290605.3300380"},{"key":"e_1_3_2_1_63_1","volume-title":"Proceedings of the 2013 International Conference on Autonomous Agents and Multi-agent Systems (AAMAS '13)","author":"Li Guangliang","unstructured":"Guangliang Li , Hayley Hung , Shimon Whiteson , and W. Bradley Knox . 2013. Using Informative Behavior to Increase Engagement in the Tamer Framework . 
In Proceedings of the 2013 International Conference on Autonomous Agents and Multi-agent Systems (AAMAS '13) . International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC, 909--916. http:\/\/dl.acm.org\/citation.cfm?id=2484920.2485064 Guangliang Li, Hayley Hung, Shimon Whiteson, and W. Bradley Knox. 2013. Using Informative Behavior to Increase Engagement in the Tamer Framework. In Proceedings of the 2013 International Conference on Autonomous Agents and Multi-agent Systems (AAMAS '13). International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC, 909--916. http:\/\/dl.acm.org\/citation.cfm?id=2484920.2485064"},{"key":"e_1_3_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10458-015--9308--2"},{"key":"e_1_3_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10458-017-9374-8"},{"key":"e_1_3_2_1_66_1","volume-title":"Deep reinforcement learning for dialogue generation. arXiv preprint arXiv:1606.01541","author":"Li Jiwei","year":"2016","unstructured":"Jiwei Li , Will Monroe , Alan Ritter , Michel Galley , Jianfeng Gao , and Dan Jurafsky . 2016a. Deep reinforcement learning for dialogue generation. arXiv preprint arXiv:1606.01541 ( 2016 ). Jiwei Li, Will Monroe, Alan Ritter, Michel Galley, Jianfeng Gao, and Dan Jurafsky. 2016a. Deep reinforcement learning for dialogue generation. arXiv preprint arXiv:1606.01541 (2016)."},{"key":"e_1_3_2_1_67_1","volume-title":"Self-improving reactive agents based on reinforcement learning, planning and teaching. Machine learning 8, 3--4","author":"Lin Long-Ji","year":"1992","unstructured":"Long-Ji Lin . 1992. Self-improving reactive agents based on reinforcement learning, planning and teaching. Machine learning 8, 3--4 ( 1992 ), 293--321. Long-Ji Lin. 1992. Self-improving reactive agents based on reinforcement learning, planning and teaching. 
Machine learning 8, 3--4 (1992), 293--321."},{"key":"e_1_3_2_1_68_1","first-page":"45","article-title":"Experience-based Causality Learning for Intelligent Agents","volume":"18","author":"Liu Yang","year":"2019","unstructured":"Yang Liu , Shaonan Wang , Jiajun Zhang , and Chengqing Zong . 2019 a. Experience-based Causality Learning for Intelligent Agents . ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP) 18 , 4 (2019), 45 . Yang Liu, Shaonan Wang, Jiajun Zhang, and Chengqing Zong. 2019a. Experience-based Causality Learning for Intelligent Agents. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP) 18, 4 (2019), 45.","journal-title":"ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP)"},{"key":"e_1_3_2_1_69_1","doi-asserted-by":"publisher","DOI":"10.1145\/3314943"},{"key":"e_1_3_2_1_70_1","volume-title":"Learning behaviors via human-delivered discrete feedback: modeling implicit feedback strategies to speed up learning. Autonomous agents and multi-agent systems 30, 1","author":"Loftin Robert","year":"2016","unstructured":"Robert Loftin , Bei Peng , James MacGlashan , Michael L Littman , Matthew E Taylor , Jeff Huang , and David L Roberts . 2016. Learning behaviors via human-delivered discrete feedback: modeling implicit feedback strategies to speed up learning. Autonomous agents and multi-agent systems 30, 1 ( 2016 ), 30--59. Robert Loftin, Bei Peng, James MacGlashan, Michael L Littman, Matthew E Taylor, Jeff Huang, and David L Roberts. 2016. Learning behaviors via human-delivered discrete feedback: modeling implicit feedback strategies to speed up learning. Autonomous agents and multi-agent systems 30, 1 (2016), 30--59."},{"key":"e_1_3_2_1_71_1","doi-asserted-by":"publisher","DOI":"10.5555\/3305890.3305917"},{"key":"e_1_3_2_1_72_1","volume-title":"Conference Towards Autonomous Robotic Systems. 
Springer, 15--27","author":"Martins Murilo Fernandes","year":"2013","unstructured":"Murilo Fernandes Martins and Reinaldo AC Bianchi . 2013 . Heuristically-accelerated reinforcement learning: A comparative analysis of performance . In Conference Towards Autonomous Robotic Systems. Springer, 15--27 . Murilo Fernandes Martins and Reinaldo AC Bianchi. 2013. Heuristically-accelerated reinforcement learning: A comparative analysis of performance. In Conference Towards Autonomous Robotic Systems. Springer, 15--27."},{"key":"e_1_3_2_1_73_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jvlc.2016.10.007"},{"key":"e_1_3_2_1_74_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00521-013-1504-x"},{"key":"e_1_3_2_1_75_1","unstructured":"Raymond G Miltenberger. 2011. Behavior modification: Principles and procedures. Cengage Learning.  Raymond G Miltenberger. 2011. Behavior modification: Principles and procedures. Cengage Learning."},{"key":"e_1_3_2_1_76_1","volume-title":"Active Inverse Reward Design. arXiv preprint arXiv:1809.03060","author":"Mindermann S\u00f6ren","year":"2018","unstructured":"S\u00f6ren Mindermann , Rohin Shah , Adam Gleave , and Dylan Hadfield-Menell . 2018. Active Inverse Reward Design. arXiv preprint arXiv:1809.03060 ( 2018 ). S\u00f6ren Mindermann, Rohin Shah, Adam Gleave, and Dylan Hadfield-Menell. 2018. Active Inverse Reward Design. arXiv preprint arXiv:1809.03060 (2018)."},{"key":"e_1_3_2_1_77_1","volume-title":"Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602","author":"Mnih Volodymyr","year":"2013","unstructured":"Volodymyr Mnih , Koray Kavukcuoglu , David Silver , Alex Graves , Ioannis Antonoglou , Daan Wierstra , and Martin Riedmiller . 2013. Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602 ( 2013 ). Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. 2013. Playing atari with deep reinforcement learning. 
arXiv preprint arXiv:1312.5602 (2013)."},{"key":"e_1_3_2_1_78_1","volume-title":"Human-level control through deep reinforcement learning. Nature 518, 7540","author":"Mnih Volodymyr","year":"2015","unstructured":"Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, and others. 2015. Human-level control through deep reinforcement learning. Nature 518, 7540 (2015), 529."},{"key":"e_1_3_2_1_79_1","doi-asserted-by":"publisher","DOI":"10.1145\/3322276.3322345"},{"key":"e_1_3_2_1_80_1","volume-title":"Anushay Furqan, Sebastian Risi, and Jichen Zhu.","author":"Myers Chelsea M","year":"2020","unstructured":"Chelsea M Myers, Evan Freed, Luis Fernando Laris Pardo, Anushay Furqan, Sebastian Risi, and Jichen Zhu. 2020. Revealing Neural Network Bias to Non-Experts Through Interactive Counterfactual Examples. arXiv preprint arXiv:2001.02271 (2020)."},{"key":"e_1_3_2_1_81_1","first-page":"278","article-title":"Policy invariance under reward transformations: Theory and application to reward shaping","volume":"99","author":"Ng Andrew Y","year":"1999","unstructured":"Andrew Y Ng, Daishi Harada, and Stuart Russell. 1999. Policy invariance under reward transformations: Theory and application to reward shaping. In ICML, Vol. 99. 278--287.","journal-title":"ICML"},{"key":"e_1_3_2_1_82_1","volume-title":"Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems. International Foundation for Autonomous Agents and Multiagent Systems, 957--965","author":"Peng Bei","year":"2016","unstructured":"Bei Peng, James MacGlashan, Robert Loftin, Michael L Littman, David L Roberts, and Matthew E Taylor. 2016. A need for speed: Adapting agent action speed to improve task learning from non-expert humans. In Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems. International Foundation for Autonomous Agents and Multiagent Systems, 957--965."},{"key":"e_1_3_2_1_83_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10846-017-0468-y"},{"key":"e_1_3_2_1_84_1","volume-title":"Markov Decision Processes.: Discrete Stochastic Dynamic Programming","author":"Puterman Martin L","unstructured":"Martin L Puterman. 2014. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons."},{"key":"e_1_3_2_1_85_1","volume-title":"Leveraging human knowledge in tabular reinforcement learning: A study of human subjects. The Knowledge Engineering Review 33","author":"Rosenfeld Ariel","year":"2018","unstructured":"Ariel Rosenfeld, Moshe Cohen, Matthew E Taylor, and Sarit Kraus. 2018. Leveraging human knowledge in tabular reinforcement learning: A study of human subjects. The Knowledge Engineering Review 33 (2018)."},{"key":"e_1_3_2_1_86_1","volume-title":"On-line Q-learning using connectionist systems","author":"Rummery Gavin A","unstructured":"Gavin A Rummery and Mahesan Niranjan. 1994. On-line Q-learning using connectionist systems. Vol. 37. University of Cambridge, Department of Engineering, Cambridge, England."},{"key":"e_1_3_2_1_87_1","volume-title":"Artificial intelligence: a modern approach. Malaysia","author":"Russell Stuart J","unstructured":"Stuart J Russell and Peter Norvig. 2016. Artificial intelligence: a modern approach. Malaysia: Pearson Education Limited."},{"key":"e_1_3_2_1_88_1","doi-asserted-by":"publisher","DOI":"10.1145\/3290605.3300391"},{"key":"e_1_3_2_1_89_1","unstructured":"Pararth Shah, Dilek Hakkani-Tur, and Larry Heck. 2016. Interactive reinforcement learning for task-oriented dialogue management. (2016)."},{"key":"e_1_3_2_1_90_1","volume-title":"Soroush Ghorashi, Christopher Meek, Gonzalo Ramos, Jina Suh, Johan Verwey, Mo Wang, and others.","author":"Simard Patrice Y","year":"2017","unstructured":"Patrice Y Simard, Saleema Amershi, David M Chickering, Alicia Edelman Pelton, Soroush Ghorashi, Christopher Meek, Gonzalo Ramos, Jina Suh, Johan Verwey, Mo Wang, and others. 2017. Machine teaching: A new paradigm for building machine learning systems. arXiv preprint arXiv:1707.06742 (2017)."},{"key":"e_1_3_2_1_91_1","volume-title":"Reinforcement learning with replacing eligibility traces. Machine learning 22, 1--3","author":"Singh Satinder P","year":"1996","unstructured":"Satinder P Singh and Richard S Sutton. 1996. Reinforcement learning with replacing eligibility traces. Machine learning 22, 1--3 (1996), 123--158."},{"key":"e_1_3_2_1_92_1","doi-asserted-by":"publisher","DOI":"10.1109\/CIG.2016.7860436"},{"key":"e_1_3_2_1_93_1","volume-title":"Effect of human guidance and state space size on interactive reinforcement learning. In 2011 Ro-Man","author":"Suay Halit Bener","unstructured":"Halit Bener Suay and Sonia Chernova. 2011. Effect of human guidance and state space size on interactive reinforcement learning. In 2011 Ro-Man. IEEE, 1--6."},{"key":"e_1_3_2_1_94_1","volume-title":"The value-function hypothesis. (2019)","author":"Sutton Richard","unstructured":"Richard Sutton and JAMH. 2019. The value-function hypothesis. (2019). http:\/\/incompleteideas.net\/rlai.cs.ualberta.ca\/RLAI\/valuefunctionhypothesis.html Accessed: 2019-08-21."},{"key":"e_1_3_2_1_95_1","volume-title":"The reward hypothesis. (2019)","author":"Sutton Richard","unstructured":"Richard Sutton, Michael Littman, and Al Paris. 2019. The reward hypothesis. (2019). http:\/\/incompleteideas.net\/rlai.cs.ualberta.ca\/RLAI\/rewardhypothesis.html Accessed: 2019-08-21."},{"key":"e_1_3_2_1_96_1","unstructured":"Richard S Sutton. 1985. Temporal Credit Assignment in Reinforcement Learning. (1985)."},{"key":"e_1_3_2_1_97_1","unstructured":"Richard S Sutton. 1996. Generalization in reinforcement learning: Successful examples using sparse coarse coding. In Advances in neural information processing systems. 1038--1044."},{"key":"e_1_3_2_1_98_1","unstructured":"Richard S Sutton and Andrew G Barto. 2011. Reinforcement learning: An introduction. (2011)."},{"key":"e_1_3_2_1_99_1","volume-title":"Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial intelligence 112, 1--2","author":"Sutton Richard S","year":"1999","unstructured":"Richard S Sutton, Doina Precup, and Satinder Singh. 1999. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial intelligence 112, 1--2 (1999), 181--211."},{"key":"e_1_3_2_1_100_1","doi-asserted-by":"publisher","DOI":"10.1145\/1273496.1273607"},{"key":"e_1_3_2_1_101_1","volume-title":"The 10th International Conference on Autonomous Agents and Multiagent Systems-Volume 2. International Foundation for Autonomous Agents and Multiagent Systems, 617--624","author":"Taylor Matthew E","year":"2011","unstructured":"Matthew E Taylor, Halit Bener Suay, and Sonia Chernova. 2011. Integrating reinforcement learning with human demonstrations of varying ability. In The 10th International Conference on Autonomous Agents and Multiagent Systems-Volume 2. International Foundation for Autonomous Agents and Multiagent Systems, 617--624."},{"key":"e_1_3_2_1_102_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-16952-6_49"},{"key":"e_1_3_2_1_103_1","volume-title":"Proceedings of the Twentieth Conference on Artificial Intelligence (AAAI).","author":"Thomaz AL","year":"2006","unstructured":"AL Thomaz and C Breazeal. 2006. Adding guidance to interactive reinforcement learning. In Proceedings of the Twentieth Conference on Artificial Intelligence (AAAI)."},{"key":"e_1_3_2_1_104_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.artint.2007.09.009"},{"key":"e_1_3_2_1_105_1","volume-title":"AAAI 2005 workshop on human comprehensible machine learning.","author":"Thomaz Andrea Lockerd","year":"2005","unstructured":"Andrea Lockerd Thomaz, Guy Hoffman, and Cynthia Breazeal. 2005. Real-time interactive reinforcement learning for robots. In AAAI 2005 workshop on human comprehensible machine learning."},{"key":"e_1_3_2_1_106_1","volume-title":"Episodic exploration for deep deterministic policies: An application to starcraft micromanagement tasks. arXiv preprint arXiv:1609.02993","author":"Usunier Nicolas","year":"2016","unstructured":"Nicolas Usunier, Gabriel Synnaeve, Zeming Lin, and Soumith Chintala. 2016. Episodic exploration for deep deterministic policies: An application to starcraft micromanagement tasks. arXiv preprint arXiv:1609.02993 (2016)."},{"key":"e_1_3_2_1_107_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-78978-1_5"},{"key":"e_1_3_2_1_108_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v32i1.11485"},{"key":"e_1_3_2_1_109_1","volume-title":"Machine learning 8, 3--4","author":"Watkins Christopher JCH","year":"1992","unstructured":"Christopher JCH Watkins and Peter Dayan. 1992. Q-learning. Machine learning 8, 3--4 (1992), 279--292."},{"key":"e_1_3_2_1_110_1","volume-title":"Proceedings of the 20th International Conference on Machine Learning (ICML-03)","author":"Wiewiora Eric","year":"2003","unstructured":"Eric Wiewiora, Garrison W Cottrell, and Charles Elkan. 2003. Principled methods for advising reinforcement learning agents. In Proceedings of the 20th International Conference on Machine Learning (ICML-03). 792--799."},{"key":"e_1_3_2_1_111_1","unstructured":"Aaron Wilson, Alan Fern, and Prasad Tadepalli. 2012. A bayesian approach for policy learning from trajectory preference queries. In Advances in neural information processing systems. 1133--1141."},{"key":"e_1_3_2_1_112_1","volume-title":"2017 AAAI Spring Symposium Series.","author":"Yang Qian","year":"2017","unstructured":"Qian Yang. 2017. The role of design in creating machine-learning-enhanced user experience. In 2017 AAAI Spring Symposium Series."},{"key":"e_1_3_2_1_113_1","doi-asserted-by":"publisher","DOI":"10.1145\/3196709.3196729"},{"key":"e_1_3_2_1_114_1","volume-title":"Learning Shaping Strategies in Human-in-the-loop Interactive Reinforcement Learning. arXiv preprint arXiv:1811.04272","author":"Yu Chao","year":"2018","unstructured":"Chao Yu, Tianpei Yang, Wenxuan Zhu, Guangliang Li, and others. 2018. Learning Shaping Strategies in Human-in-the-loop Interactive Reinforcement Learning. arXiv preprint arXiv:1811.04272 (2018)."},{"key":"e_1_3_2_1_115_1","doi-asserted-by":"publisher","DOI":"10.1108\/17563781211255862"},{"key":"e_1_3_2_1_116_1","unstructured":"Brian D Ziebart, Andrew Maas, J Andrew Bagnell, and Anind K Dey. 2008. Maximum entropy inverse reinforcement learning. (2008)."},{"key":"e_1_3_2_1_117_1","volume-title":"AAAI Spring Symposium: Human Behavior Modeling. 92","author":"Ziebart Brian D","year":"2009","unstructured":"Brian D Ziebart, Andrew L Maas, J Andrew Bagnell, and Anind K Dey. 2009. Human Behavior Modeling with Maximum Entropy Inverse Optimal Control. In AAAI Spring Symposium: Human Behavior Modeling.
92."}],"event":{"name":"DIS '20: Designing Interactive Systems Conference 2020","location":"Eindhoven Netherlands","acronym":"DIS '20","sponsor":["SIGCHI ACM Special Interest Group on Computer-Human Interaction"]},"container-title":["Proceedings of the 2020 ACM Designing Interactive Systems Conference"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3357236.3395525","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3357236.3395525","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:41:26Z","timestamp":1750200086000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3357236.3395525"}},"subtitle":["Design Principles and Open Challenges"],"short-title":[],"issued":{"date-parts":[[2020,7,3]]},"references-count":117,"alternative-id":["10.1145\/3357236.3395525","10.1145\/3357236"],"URL":"https:\/\/doi.org\/10.1145\/3357236.3395525","relation":{},"subject":[],"published":{"date-parts":[[2020,7,3]]},"assertion":[{"value":"2020-07-03","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}