{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,10]],"date-time":"2026-01-10T01:26:15Z","timestamp":1768008375754,"version":"3.49.0"},"publisher-location":"New York, NY, USA","reference-count":79,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,6,28]],"date-time":"2021-06-28T00:00:00Z","timestamp":1624838400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,6,28]]},"DOI":"10.1145\/3461778.3462135","type":"proceedings-article","created":{"date-parts":[[2021,6,28]],"date-time":"2021-06-28T20:27:20Z","timestamp":1624912040000},"page":"1579-1590","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":18,"title":["A Survey of Collaborative Reinforcement Learning: Interactive Methods and Design Patterns"],"prefix":"10.1145","author":[{"given":"Zhaoxing","family":"Li","sequence":"first","affiliation":[{"name":"Durham university Department of Computer Science, United Kingdom"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Lei","family":"Shi","sequence":"additional","affiliation":[{"name":"Durham University, United Kingdom"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Alexandra I.","family":"Cristea","sequence":"additional","affiliation":[{"name":"Durham University, United Kingdom"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yunzhan","family":"Zhou","sequence":"additional","affiliation":[{"name":"Durham university Department of Computer Science, United 
Kingdom"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2021,6,28]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2018.2870052"},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1609\/aimag.v35i4.2513"},{"key":"e_1_3_2_1_3_1","volume-title":"Dqn-tamer: Human-in-the-loop reinforcement learning with intractable feedback. arXiv preprint arXiv:1810.11748(2018).","author":"Arakawa Riku","year":"2018","unstructured":"Riku Arakawa , Sosuke Kobayashi , Yuya Unno , Yuta Tsuboi , and Shin-ichi Maeda. 2018 . Dqn-tamer: Human-in-the-loop reinforcement learning with intractable feedback. arXiv preprint arXiv:1810.11748(2018). Riku Arakawa, Sosuke Kobayashi, Yuya Unno, Yuta Tsuboi, and Shin-ichi Maeda. 2018. Dqn-tamer: Human-in-the-loop reinforcement learning with intractable feedback. arXiv preprint arXiv:1810.11748(2018)."},{"key":"e_1_3_2_1_4_1","volume-title":"A survey of robot learning from demonstration. Robotics and autonomous systems 57, 5","author":"Argall D","year":"2009","unstructured":"Brenna\u00a0 D Argall , Sonia Chernova , Manuela Veloso , and Brett Browning . 2009. A survey of robot learning from demonstration. Robotics and autonomous systems 57, 5 ( 2009 ), 469\u2013483. Brenna\u00a0D Argall, Sonia Chernova, Manuela Veloso, and Brett Browning. 2009. A survey of robot learning from demonstration. Robotics and autonomous systems 57, 5 (2009), 469\u2013483."},{"key":"e_1_3_2_1_5_1","unstructured":"Dilip Arumugam Jun\u00a0Ki Lee Sophie Saskin and Michael\u00a0L Littman. 2019. Deep reinforcement learning from policy-dependent human feedback. arXiv preprint arXiv:1902.04257(2019).  Dilip Arumugam Jun\u00a0Ki Lee Sophie Saskin and Michael\u00a0L Littman. 2019. Deep reinforcement learning from policy-dependent human feedback. 
arXiv preprint arXiv:1902.04257(2019)."},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/3357236.3395525"},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1117\/12.2309945"},{"key":"e_1_3_2_1_8_1","unstructured":"Gwern Branwen. 2020. GPT-3 Creative Fiction. (2020)."},{"key":"e_1_3_2_1_9_1","volume-title":"Tenth Artificial Intelligence and Interactive Digital Entertainment Conference.","author":"Cardona-Rivera Rogelio\u00a0Enrique","year":"2014","unstructured":"Rogelio\u00a0Enrique Cardona-Rivera and Robert\u00a0Michael Young. 2014. Games as conversation. In Tenth Artificial Intelligence and Interactive Digital Entertainment Conference."},{"key":"e_1_3_2_1_10_1","volume-title":"Proceedings of the AIIDE workshop on Experimental AI in Games.","author":"Chac\u00f3n Pablo\u00a0Sauma","year":"2019","unstructured":"Pablo\u00a0Sauma Chac\u00f3n and Markus Eger . 2019 . Pandemic as a challenge for human-AI cooperation . In Proceedings of the AIIDE workshop on Experimental AI in Games. Pablo\u00a0Sauma Chac\u00f3n and Markus Eger. 2019. Pandemic as a challenge for human-AI cooperation. 
In Proceedings of the AIIDE workshop on Experimental AI in Games."},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-78743-3_2"},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2016.7759137"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/MSP.2012.2211477"},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICSM.2005.40"},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1002\/(SICI)1098-2736(199909)36:7<806::AID-TEA5>3.0.CO;2-2"},{"key":"e_1_3_2_1_16_1","first-page":"385","article-title":"Learning behavior-selection by emotions and cognition in a multi-goal robot task","author":"Gadanho Sandra\u00a0Clara","year":"2003","unstructured":"Sandra\u00a0Clara Gadanho . 2003 . Learning behavior-selection by emotions and cognition in a multi-goal robot task . Journal of Machine Learning Research 4 , Jul (2003), 385 \u2013 412 . Sandra\u00a0Clara Gadanho. 2003. Learning behavior-selection by emotions and cognition in a multi-goal robot task. Journal of Machine Learning Research 4, Jul (2003), 385\u2013412.","journal-title":"Journal of Machine Learning Research 4"},{"key":"e_1_3_2_1_17_1","volume-title":"International Journal of Intelligent Computing and Cybernetics","author":"Gao Yang","year":"2012","unstructured":"Yang Gao , Jan Peters , Antonios Tsourdos , Shao Zhifei , and Er\u00a0Meng Joo . 2012. A survey of inverse reinforcement learning techniques . International Journal of Intelligent Computing and Cybernetics ( 2012 ). Yang Gao, Jan Peters, Antonios Tsourdos, Shao Zhifei, and Er\u00a0Meng Joo. 2012. A survey of inverse reinforcement learning techniques. 
International Journal of Intelligent Computing and Cybernetics (2012)."},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.5555\/2789272.2886795"},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1038\/529445a"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v30i1.9914"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"crossref","unstructured":"Prasoon Goyal Scott Niekum and Raymond\u00a0J Mooney. 2019. Using natural language for reward shaping in reinforcement learning. arXiv preprint arXiv:1903.02020(2019).  Prasoon Goyal Scott Niekum and Raymond\u00a0J Mooney. 2019. Using natural language for reward shaping in reinforcement learning. arXiv preprint arXiv:1903.02020(2019).","DOI":"10.24963\/ijcai.2019\/331"},{"key":"e_1_3_2_1_22_1","unstructured":"Shane Griffith Kaushik Subramanian Jonathan Scholz Charles\u00a0L Isbell and Andrea\u00a0L Thomaz. 2013. Policy shaping: Integrating human feedback with reinforcement learning. In Advances in neural information processing systems. 2625\u20132633.  Shane Griffith Kaushik Subramanian Jonathan Scholz Charles\u00a0L Isbell and Andrea\u00a0L Thomaz. 2013. Policy shaping: Integrating human feedback with reinforcement learning. In Advances in neural information processing systems. 2625\u20132633."},{"key":"e_1_3_2_1_23_1","unstructured":"Mark\u00a0K Ho Michael\u00a0L Littman Fiery Cushman and Joseph\u00a0L Austerweil. 2015. Teaching with rewards and punishments: Reinforcement or communication?. In CogSci.  Mark\u00a0K Ho Michael\u00a0L Littman Fiery Cushman and Joseph\u00a0L Austerweil. 2015. Teaching with rewards and punishments: Reinforcement or communication?. In CogSci."},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0020-7373(83)80034-0"},{"key":"e_1_3_2_1_25_1","volume-title":"From machine learning to explainable AI. 
In 2018 world symposium on digital intelligence for systems and machines (DISA)","author":"Holzinger Andreas","unstructured":"Andreas Holzinger . 2018. From machine learning to explainable AI. In 2018 world symposium on digital intelligence for systems and machines (DISA) . IEEE , 55\u201366. Andreas Holzinger. 2018. From machine learning to explainable AI. In 2018 world symposium on digital intelligence for systems and machines (DISA). IEEE, 55\u201366."},{"key":"e_1_3_2_1_26_1","unstructured":"Geoffrey Irving Paul Christiano and Dario Amodei. 2018. AI safety via debate. arXiv preprint arXiv:1805.00899(2018).  Geoffrey Irving Paul Christiano and Dario Amodei. 2018. AI safety via debate. arXiv preprint arXiv:1805.00899(2018)."},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/375735.376334"},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.5898\/JHRI.3.1.Johnson"},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-10203-5_26"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"crossref","first-page":"29","DOI":"10.12700\/APH.15.8.2018.8.2","article-title":"Unsupervised clustering for deep learning: A tutorial survey","volume":"15","author":"K\u00e1roly Art\u00far\u00a0Istv\u00e1n","year":"2018","unstructured":"Art\u00far\u00a0Istv\u00e1n K\u00e1roly , R\u00f3bert Full\u00e9r , and P\u00e9ter Galambos . 2018 . Unsupervised clustering for deep learning: A tutorial survey . Acta Polytechnica Hungarica 15 , 8 (2018), 29 \u2013 53 . Art\u00far\u00a0Istv\u00e1n K\u00e1roly, R\u00f3bert Full\u00e9r, and P\u00e9ter Galambos. 2018. Unsupervised clustering for deep learning: A tutorial survey. Acta Polytechnica Hungarica 15, 8 (2018), 29\u201353.","journal-title":"Acta Polytechnica Hungarica"},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10846-010-9422-y"},{"key":"e_1_3_2_1_32_1","unstructured":"R KLING. 1981. ROUTINE DECISION-MAKING-THE FUTURE OF BUREAUCRACY-INBAR M.  R KLING. 1981. 
ROUTINE DECISION-MAKING-THE FUTURE OF BUREAUCRACY-INBAR M."},{"key":"e_1_3_2_1_33_1","unstructured":"William\u00a0Bradley Knox. 2012. Learning from human-generated reward. (2012)."},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/1597735.1597738"},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1177\/0278364913495721"},{"key":"e_1_3_2_1_36_1","volume-title":"The AAAI-2004 workshop on supervisory control of learning and adaptive systems","author":"Kuhlmann Gregory","year":"2004","unstructured":"Gregory Kuhlmann, Peter Stone, Raymond Mooney, and Jude Shavlik. 2004. Guiding a reinforcement learner with natural language advice: Initial results in RoboCup soccer. In The AAAI-2004 workshop on supervisory control of learning and adaptive systems. San Jose, CA."},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v31i1.10827"},{"key":"e_1_3_2_1_38_1","unstructured":"Jan Leike David Krueger Tom Everitt Miljan Martic Vishal Maini and Shane Legg. 2018. Scalable agent alignment via reward modeling: a research direction. arXiv preprint arXiv:1811.07871(2018).  Jan Leike David Krueger Tom Everitt Miljan Martic Vishal Maini and Shane Legg. 2018. Scalable agent alignment via reward modeling: a research direction. 
arXiv preprint arXiv:1811.07871(2018)."},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/3290605.3300380"},{"key":"e_1_3_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10458-020-09447-w"},{"key":"e_1_3_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/THMS.2019.2912447"},{"key":"e_1_3_2_1_42_1","unstructured":"Mao Li Tim Brys and Daniel Kudenko. 2018. Introspective Reinforcement Learning and Learning from Demonstration.. In AAMAS. 1992\u20131994.  Mao Li Tim Brys and Daniel Kudenko. 2018. Introspective Reinforcement Learning and Learning from Demonstration.. In AAMAS. 1992\u20131994."},{"key":"e_1_3_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/3290605.3300325"},{"key":"e_1_3_2_1_44_1","volume-title":"Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 414\u2013429","author":"Liu Guiliang","year":"2018","unstructured":"Guiliang Liu , Oliver Schulte , Wang Zhu , and Qingcan Li . 2018 . Toward interpretable deep reinforcement learning with linear model u-trees . In Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 414\u2013429 . Guiliang Liu, Oliver Schulte, Wang Zhu, and Qingcan Li. 2018. Toward interpretable deep reinforcement learning with linear model u-trees. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 414\u2013429."},{"key":"e_1_3_2_1_45_1","volume-title":"Learning behaviors via human-delivered discrete feedback: modeling implicit feedback strategies to speed up learning. Autonomous agents and multi-agent systems 30, 1","author":"Loftin Robert","year":"2016","unstructured":"Robert Loftin , Bei Peng , James MacGlashan , Michael\u00a0 L Littman , Matthew\u00a0 E Taylor , Jeff Huang , and David\u00a0 L Roberts . 2016. Learning behaviors via human-delivered discrete feedback: modeling implicit feedback strategies to speed up learning. 
Autonomous agents and multi-agent systems 30, 1 ( 2016 ), 30\u201359. Robert Loftin, Bei Peng, James MacGlashan, Michael\u00a0L Littman, Matthew\u00a0E Taylor, Jeff Huang, and David\u00a0L Roberts. 2016. Learning behaviors via human-delivered discrete feedback: modeling implicit feedback strategies to speed up learning. Autonomous agents and multi-agent systems 30, 1 (2016), 30\u201359."},{"key":"e_1_3_2_1_46_1","unstructured":"James MacGlashan Mark\u00a0K Ho Robert Loftin Bei Peng David Roberts Matthew\u00a0E Taylor and Michael\u00a0L Littman. 2017. Interactive learning from policy-dependent human feedback. arXiv preprint arXiv:1701.06049(2017).  James MacGlashan Mark\u00a0K Ho Robert Loftin Bei Peng David Roberts Matthew\u00a0E Taylor and Michael\u00a0L Littman. 2017. Interactive learning from policy-dependent human feedback. arXiv preprint arXiv:1701.06049(2017)."},{"key":"e_1_3_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF00114730"},{"key":"e_1_3_2_1_48_1","unstructured":"Prashan Madumal Tim Miller Liz Sonenberg and Frank Vetere. 2019. Explainable reinforcement learning through a causal lens. arXiv preprint arXiv:1905.10958(2019).  Prashan Madumal Tim Miller Liz Sonenberg and Frank Vetere. 2019. Explainable reinforcement learning through a causal lens. arXiv preprint arXiv:1905.10958(2019)."},{"key":"e_1_3_2_1_49_1","unstructured":"Volodymyr Mnih Koray Kavukcuoglu David Silver Alex Graves Ioannis Antonoglou Daan Wierstra and Martin Riedmiller. 2013. Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602(2013).  Volodymyr Mnih Koray Kavukcuoglu David Silver Alex Graves Ioannis Antonoglou Daan Wierstra and Martin Riedmiller. 2013. Playing atari with deep reinforcement learning. 
arXiv preprint arXiv:1312.5602(2013)."},{"key":"e_1_3_2_1_50_1","volume-title":"Proceedings 2003 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS 2003)","author":"Moon Inhyuk","year":"2003","unstructured":"Inhyuk Moon , Myungjoon Lee , Jeicheong Ryu , and Museong Mun . 2003 . Intelligent robotic wheelchair with EMG-, gesture-, and voice-based interfaces . In Proceedings 2003 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS 2003) (Cat. No. 03CH37453), Vol.\u00a04. IEEE, 3453\u20133458. Inhyuk Moon, Myungjoon Lee, Jeicheong Ryu, and Museong Mun. 2003. Intelligent robotic wheelchair with EMG-, gesture-, and voice-based interfaces. In Proceedings 2003 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS 2003)(Cat. No. 03CH37453), Vol.\u00a04. IEEE, 3453\u20133458."},{"key":"e_1_3_2_1_51_1","unstructured":"Anis Najar and Mohamed Chetouani. 2020. Reinforcement learning with human advice. A survey. arXiv preprint arXiv:2005.11016(2020).  Anis Najar and Mohamed Chetouani. 2020. Reinforcement learning with human advice. A survey. arXiv preprint arXiv:2005.11016(2020)."},{"key":"e_1_3_2_1_52_1","unstructured":"Erika Puiutta and Eric Veith. 2020. Explainable Reinforcement Learning: A Survey. arXiv preprint arXiv:2005.06247(2020).  Erika Puiutta and Eric Veith. 2020. Explainable Reinforcement Learning: A Survey. arXiv preprint arXiv:2005.06247(2020)."},{"key":"e_1_3_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/568513.568514"},{"key":"e_1_3_2_1_54_1","volume-title":"Recogym: A reinforcement learning environment for the problem of product recommendation in online advertising. arXiv preprint arXiv:1808.00720(2018).","author":"Rohde David","year":"2018","unstructured":"David Rohde , Stephen Bonner , Travis Dunlop , Flavian Vasile , and Alexandros Karatzoglou . 2018 . Recogym: A reinforcement learning environment for the problem of product recommendation in online advertising. 
arXiv preprint arXiv:1808.00720(2018). David Rohde, Stephen Bonner, Travis Dunlop, Flavian Vasile, and Alexandros Karatzoglou. 2018. Recogym: A reinforcement learning environment for the problem of product recommendation in online advertising. arXiv preprint arXiv:1808.00720(2018)."},{"key":"e_1_3_2_1_55_1","volume-title":"Cooperative work: A conceptual framework. Distributed decision making: Cognitive models for cooperative work","author":"Schmidt Kjeld","year":"1991","unstructured":"Kjeld Schmidt , J Rasmussen , B Brehmer , and J Leplat . 1991. Cooperative work: A conceptual framework. Distributed decision making: Cognitive models for cooperative work ( 1991 ), 75\u2013110. Kjeld Schmidt, J Rasmussen, B Brehmer, and J Leplat. 1991. Cooperative work: A conceptual framework. Distributed decision making: Cognitive models for cooperative work (1991), 75\u2013110."},{"key":"e_1_3_2_1_56_1","volume-title":"10th European Congress on Embedded Real Time Software and Systems (ERTS","author":"Schwalbe Gesina","year":"2020","unstructured":"Gesina Schwalbe and Martin Schels . 2020 . A survey on methods for the safety assurance of machine learning based systems . In 10th European Congress on Embedded Real Time Software and Systems (ERTS 2020). Gesina Schwalbe and Martin Schels. 2020. A survey on methods for the safety assurance of machine learning based systems. In 10th European Congress on Embedded Real Time Software and Systems (ERTS 2020)."},{"key":"e_1_3_2_1_57_1","unstructured":"David Silver Thomas Hubert Julian Schrittwieser Ioannis Antonoglou Matthew Lai Arthur Guez Marc Lanctot Laurent Sifre Dharshan Kumaran Thore Graepel 2017. Mastering chess and shogi by self-play with a general reinforcement learning algorithm. arXiv preprint arXiv:1712.01815(2017).  David Silver Thomas Hubert Julian Schrittwieser Ioannis Antonoglou Matthew Lai Arthur Guez Marc Lanctot Laurent Sifre Dharshan Kumaran Thore Graepel 2017. 
Mastering chess and shogi by self-play with a general reinforcement learning algorithm. arXiv preprint arXiv:1712.01815(2017)."},{"key":"e_1_3_2_1_58_1","volume-title":"Did Searle attack strong strong or weak strong AI. Artificial Intelligence and Its Applications","author":"Sloman Aaron","year":"1986","unstructured":"Aaron Sloman . 1986. Did Searle attack strong strong or weak strong AI. Artificial Intelligence and Its Applications , John Wiley and Sons ( 1986 ). Aaron Sloman. 1986. Did Searle attack strong strong or weak strong AI. Artificial Intelligence and Its Applications, John Wiley and Sons (1986)."},{"key":"e_1_3_2_1_59_1","unstructured":"Nisan Stiennon Long Ouyang Jeff Wu Daniel\u00a0M Ziegler Ryan Lowe Chelsea Voss Alec Radford Dario Amodei and Paul Christiano. 2020. Learning to summarize from human feedback. arXiv preprint arXiv:2009.01325(2020).  Nisan Stiennon Long Ouyang Jeff Wu Daniel\u00a0M Ziegler Ryan Lowe Chelsea Voss Alec Radford Dario Amodei and Paul Christiano. 2020. Learning to summarize from human feedback. arXiv preprint arXiv:2009.01325(2020)."},{"key":"e_1_3_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1145\/3368986"},{"key":"e_1_3_2_1_61_1","volume-title":"Reinforcement learning: An introduction","author":"Sutton S","unstructured":"Richard\u00a0 S Sutton and Andrew\u00a0 G Barto . 2018. Reinforcement learning: An introduction . MIT press . Richard\u00a0S Sutton and Andrew\u00a0G Barto. 2018. Reinforcement learning: An introduction. MIT press."},{"key":"e_1_3_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.5555\/2031678.2031705"},{"key":"e_1_3_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1145\/203330.203343"},{"key":"e_1_3_2_1_64_1","volume-title":"Proceedings of the Twentieth Conference on Artificial Intelligence (AAAI).","author":"Thomaz L","year":"2006","unstructured":"Andrea\u00a0 L Thomaz and Cynthia Breazeal . 2006 . Adding guidance to interactive reinforcement learning . 
In Proceedings of the Twentieth Conference on Artificial Intelligence (AAAI). Andrea\u00a0L Thomaz and Cynthia Breazeal. 2006. Adding guidance to interactive reinforcement learning. In Proceedings of the Twentieth Conference on Artificial Intelligence (AAAI)."},{"key":"e_1_3_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.artint.2007.09.009"},{"key":"e_1_3_2_1_66_1","volume-title":"AAAI 2005 workshop on human comprehensible machine learning.","author":"Thomaz Andrea\u00a0Lockerd","year":"2005","unstructured":"Andrea\u00a0Lockerd Thomaz , Guy Hoffman , and Cynthia Breazeal . 2005 . Real-time interactive reinforcement learning for robots . In AAAI 2005 workshop on human comprehensible machine learning. Andrea\u00a0Lockerd Thomaz, Guy Hoffman, and Cynthia Breazeal. 2005. Real-time interactive reinforcement learning for robots. In AAAI 2005 workshop on human comprehensible machine learning."},{"key":"e_1_3_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.1109\/ROMAN.2006.314459"},{"key":"e_1_3_2_1_68_1","unstructured":"Karel van\u00a0den Bosch and Adelbert Bronkhorst. 2018. Human-AI cooperation to benefit military decision making. NATO.  Karel van\u00a0den Bosch and Adelbert Bronkhorst. 2018. Human-AI cooperation to benefit military decision making. NATO."},{"key":"e_1_3_2_1_69_1","volume-title":"Six Challenges for Human-AI Co-learning. In International Conference on Human-Computer Interaction. Springer, 572\u2013589","author":"van\u00a0den Bosch Karel","year":"2019","unstructured":"Karel van\u00a0den Bosch , Tjeerd Schoonderwoerd , Romy Blankendaal , and Mark Neerincx . 2019 . Six Challenges for Human-AI Co-learning. In International Conference on Human-Computer Interaction. Springer, 572\u2013589 . Karel van\u00a0den Bosch, Tjeerd Schoonderwoerd, Romy Blankendaal, and Mark Neerincx. 2019. Six Challenges for Human-AI Co-learning. In International Conference on Human-Computer Interaction. 
Springer, 572\u2013589."},{"key":"e_1_3_2_1_70_1","unstructured":"Abhinav Verma Vijayaraghavan Murali Rishabh Singh Pushmeet Kohli and Swarat Chaudhuri. 2018. Programmatically interpretable reinforcement learning. arXiv preprint arXiv:1804.02477(2018)."},{"key":"e_1_3_2_1_71_1","doi-asserted-by":"publisher","DOI":"10.1109\/ROBOT.1999.770058"},{"key":"e_1_3_2_1_72_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P18-5007"},{"key":"e_1_3_2_1_73_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v32i1.11485"},{"key":"e_1_3_2_1_74_1","unstructured":"Klaus Weber Hannes Ritschel Florian Lingenfelser and Elisabeth Andr\u00e9. 2018. Real-time adaptation of a robotic joke teller based on human social signals. (2018)."},{"key":"e_1_3_2_1_75_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2018.8460937"},{"key":"e_1_3_2_1_76_1","doi-asserted-by":"crossref","unstructured":"Deirdre Wilson and Dan Sperber. 1981. On Grice\u2019s theory of conversation. Conversation and discourse(1981) 155\u201378.","DOI":"10.4324\/9781003291039-11"},{"key":"e_1_3_2_1_77_1","doi-asserted-by":"crossref","unstructured":"Mike Wu Sonali Parbhoo Michael\u00a0C Hughes Ryan Kindle Leo\u00a0A Celi Maurizio Zazzi Volker Roth and Finale Doshi-Velez. 2020. Regional Tree Regularization for Interpretability in Deep Neural Networks.. In AAAI. 6413\u20136421.  Mike Wu Sonali Parbhoo Michael\u00a0C Hughes Ryan Kindle Leo\u00a0A Celi Maurizio Zazzi Volker Roth and Finale Doshi-Velez. 2020. 
Regional Tree Regularization for Interpretability in Deep Neural Networks.. In AAAI. 6413\u20136421.","DOI":"10.1609\/aaai.v34i04.6112"},{"key":"e_1_3_2_1_78_1","unstructured":"George Zarkadakis. 2015. In our own image: will artificial intelligence save or destroy us?Random House.  George Zarkadakis. 2015. In our own image: will artificial intelligence save or destroy us?Random House."},{"key":"e_1_3_2_1_79_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10111-009-0134-7"}],"event":{"name":"DIS '21: Designing Interactive Systems Conference 2021","location":"Virtual Event USA","acronym":"DIS '21","sponsor":["SIGCHI ACM Special Interest Group on Computer-Human Interaction"]},"container-title":["Designing Interactive Systems Conference 2021"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3461778.3462135","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3461778.3462135","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:17:33Z","timestamp":1750191453000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3461778.3462135"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,6,28]]},"references-count":79,"alternative-id":["10.1145\/3461778.3462135","10.1145\/3461778"],"URL":"https:\/\/doi.org\/10.1145\/3461778.3462135","relation":{},"subject":[],"published":{"date-parts":[[2021,6,28]]},"assertion":[{"value":"2021-06-28","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}