{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,29]],"date-time":"2025-08-29T09:52:30Z","timestamp":1756461150258,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":34,"publisher":"ACM","license":[{"start":{"date-parts":[[2020,3,9]],"date-time":"2020-03-09T00:00:00Z","timestamp":1583712000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"DARPA I2O","award":["FA8750-17-C-0018","FA8750-19-2-1006"],"award-info":[{"award-number":["FA8750-17-C-0018","FA8750-19-2-1006"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2020,3,9]]},"DOI":"10.1145\/3319502.3374824","type":"proceedings-article","created":{"date-parts":[[2020,3,7]],"date-time":"2020-03-07T01:30:31Z","timestamp":1583544631000},"page":"649-657","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["Teaching a Robot Tasks of Arbitrary Complexity via Human Feedback"],"prefix":"10.1145","author":[{"given":"Guan","family":"Wang","sequence":"first","affiliation":[{"name":"Brown University, Providence, RI, USA"}]},{"given":"Carl","family":"Trimbach","sequence":"additional","affiliation":[{"name":"Brown University, Providence, RI, USA"}]},{"given":"Jun Ki","family":"Lee","sequence":"additional","affiliation":[{"name":"Brown University, Providence, RI, USA"}]},{"given":"Mark K.","family":"Ho","sequence":"additional","affiliation":[{"name":"Princeton University, Princeton, NJ, USA"}]},{"given":"Michael L.","family":"Littman","sequence":"additional","affiliation":[{"name":"Brown University, Providence, RI, 
USA"}]}],"member":"320","published-online":{"date-parts":[[2020,3,9]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/1015330.1015430"},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-23780-5_11"},{"key":"e_1_3_2_1_3_1","unstructured":"Kareem Amin, Nan Jiang, and Satinder Singh. 2017. Repeated Inverse Reinforcement Learning. (2017). arXiv preprint arXiv:1705.05427."},{"volume-title":"Theory and Practice of Formal Methods","author":"Ancona Davide","key":"e_1_3_2_1_4_1","unstructured":"Davide Ancona, Angelo Ferrando, and Viviana Mascardi. 2016. Comparing trace expressions and linear temporal logic for runtime verification. In Theory and Practice of Formal Methods. Springer, 47--64."},{"key":"e_1_3_2_1_5_1","volume-title":"Proceedings of the International Conference on Machine Learning. 897--904","author":"Babes Monica","year":"2011","unstructured":"Monica Babes, Vukosi N. Marivate, Michael L. Littman, and Kaushik Subramanian. 2011. Apprenticeship Learning About Multiple Intentions. In Proceedings of the International Conference on Machine Learning. 897--904."},{"key":"e_1_3_2_1_6_1","volume-title":"Rewarding Behaviors. In Proceedings of the Thirteenth National Conference on Artificial Intelligence. AAAI Press\/The MIT Press, 1160--1167","author":"Bacchus Fahiem","year":"1996","unstructured":"Fahiem Bacchus, Craig Boutilier, and Adam Grove. 1996. Rewarding Behaviors. In Proceedings of the Thirteenth National Conference on Artificial Intelligence. AAAI Press\/The MIT Press, 1160--1167."},{"key":"e_1_3_2_1_7_1","unstructured":"Alberto Camacho and Sheila A. McIlraith. 2019. Learning Interpretable Models Expressed in Linear Temporal Logic. ICAPS."},{"key":"e_1_3_2_1_8_1","volume-title":"Carlos and Jens Kober","author":"Solar Celemin Javier","year":"2019","unstructured":"Carlos Celemin, Javier Ruiz-del-Solar, and Jens Kober. 2019. A fast hybrid reinforcement learning framework with human corrective feedback. Autonomous Robots. Springer, 1173--1186."},{"key":"e_1_3_2_1_9_1","unstructured":"Paul Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, and Dario Amodei. 2017. Deep reinforcement learning from human preferences. In Advances in Neural Information Processing Systems. 4302--4310."},{"key":"e_1_3_2_1_10_1","unstructured":"Dylan Hadfield-Menell, Stuart J. Russell, Pieter Abbeel, and Anca Dragan. 2016. Cooperative inverse reinforcement learning. In Advances in Neural Information Processing Systems. 3909--3917."},{"volume-title":"Proceedings of the 37th Annual Meeting of the Cognitive Science Society.","author":"Ho Mark K","key":"e_1_3_2_1_11_1","unstructured":"Mark K. Ho, Michael L. Littman, Fiery Cushman, and Joseph L. Austerweil. 2015. Teaching with rewards and punishments: Reinforcement or communication? In Proceedings of the 37th Annual Meeting of the Cognitive Science Society."},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.cognition.2017.03.006"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10458-006-0005-z"},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/CDC.2017.8264386"},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/CDC.2017.8264386"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/1597735.1597738"},{"key":"e_1_3_2_1_17_1","volume-title":"Proceedings of the 2013 International Conference on Intelligent User Interfaces. 191--202","author":"Bradley Knox W","year":"2013","unstructured":"W. Bradley Knox and Peter Stone. 2013. Learning non-myopically from human-generated reward. In Proceedings of the 2013 International Conference on Intelligent User Interfaces. 191--202."},{"volume-title":"Proceedings of the Eleventh National Conference on Artificial Intelligence. AAAI Press\/MIT Press","author":"Koenig Sven","key":"e_1_3_2_1_18_1","unstructured":"Sven Koenig and Reid G. Simmons. 1993. Complexity Analysis of Real-time Reinforcement Learning. In Proceedings of the Eleventh National Conference on Artificial Intelligence. AAAI Press\/MIT Press, Menlo Park, CA, 99--105."},{"key":"e_1_3_2_1_19_1","volume-title":"Genetic Programming: On the Programming of Computers by Means of Natural Selection","author":"Koza John R.","year":"1992","unstructured":"John R. Koza. 1992. Genetic Programming: On the Programming of Computers by Means of Natural Selection. The MIT Press."},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/TRO.2009.2030225"},{"key":"e_1_3_2_1_21_1","volume-title":"Nature","volume":"521","author":"Littman Michael L.","year":"2015","unstructured":"Michael L. Littman. 2015. Reinforcement learning improves behaviour from evaluative feedback. Nature, Vol. 521, 7553 (2015), 394--556."},{"key":"e_1_3_2_1_22_1","unstructured":"Michael L. Littman, Ufuk Topcu, Jie Fu, Charles Isbell, Min Wen, and James MacGlashan. 2017. Environment-Independent Task Specifications via GLTL. (2017). arXiv preprint arXiv:1704.04341."},{"key":"e_1_3_2_1_23_1","volume-title":"Proceedings of the Twenty-Eighth Association for the Advancement of Artificial Intelligence Conference.","author":"Loftin Robert","year":"2014","unstructured":"Robert Loftin, James MacGlashan, Michael L. Littman, Matthew E. Taylor, David L. Roberts, and Jeff Huang. 2014. A Strategy-Aware Technique for Learning Behaviors from Discrete Human Feedback. In Proceedings of the Twenty-Eighth Association for the Advancement of Artificial Intelligence Conference."},{"volume-title":"Proceedings of the Thirty-Fourth International Conference on Machine Learning.","author":"MacGlashan James","key":"e_1_3_2_1_24_1","unstructured":"James MacGlashan, Mark K. Ho, Robert Loftin, Bei Peng, Guan Wang, David L. Roberts, Matthew E. Taylor, and Michael L. Littman. 2017. Interactive Learning from Policy-Dependent Human Feedback. In Proceedings of the Thirty-Fourth International Conference on Machine Learning."},{"volume-title":"Learning linear temporal properties. In 2018 Formal Methods in Computer Aided Design (FMCAD)","author":"Neider Daniel","key":"e_1_3_2_1_25_1","unstructured":"Daniel Neider and Ivan Gavran. 2018. Learning linear temporal properties. In 2018 Formal Methods in Computer Aided Design (FMCAD). IEEE, 1--10."},{"volume-title":"International Conference on Machine Learning. 663--670","author":"Andrew","key":"e_1_3_2_1_26_1","unstructured":"Andrew Y. Ng and Stuart Russell. 2000. Algorithms for inverse reinforcement learning. In International Conference on Machine Learning. 663--670."},{"volume-title":"Proceedings of the 16th International Conference on Autonomous Agents and Multiagent Systems.","author":"Peng Bei","key":"e_1_3_2_1_27_1","unstructured":"Bei Peng, James MacGlashan, Robert Loftin, Michael L. Littman, David L. Roberts, and Matthew E. Taylor. 2017. Curriculum Design for Machine Learners in Sequential Decision Tasks. In Proceedings of the 16th International Conference on Autonomous Agents and Multiagent Systems."},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1006\/inco.1994.1081"},{"key":"e_1_3_2_1_29_1","unstructured":"Ankit Shah, Pritish Kamath, Julie A. Shah, and Shen Li. 2018. Bayesian inference of temporal task specifications from demonstrations. In Advances in Neural Information Processing Systems. 3804--3813."},{"key":"e_1_3_2_1_30_1","volume-title":"Proceedings of the Annual Conference of the Cognitive Science Society.","author":"Singh S","year":"2009","unstructured":"S. Singh, R. L. Lewis, and A. G. Barto. 2009. Where do rewards come from? In Proceedings of the Annual Conference of the Cognitive Science Society."},{"key":"e_1_3_2_1_31_1","volume-title":"Barto","author":"Sutton Richard S.","year":"1998","unstructured":"Richard S. Sutton and Andrew G. Barto. 1998. Reinforcement Learning: An Introduction. The MIT Press."},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.artint.2007.09.009"},{"key":"e_1_3_2_1_33_1","volume-title":"Proceedings of the ECML\/PKDD-13 Workshop on Reinforcement Learning from Generalized Feedback: Beyond Numeric Rewards.","author":"Wirth Christian","year":"2013","unstructured":"Christian Wirth and Johannes F\u00fcrnkranz. 2013. Preference-based reinforcement learning: A preliminary survey. In Proceedings of the ECML\/PKDD-13 Workshop on Reinforcement Learning from Generalized Feedback: Beyond Numeric Rewards."},{"key":"e_1_3_2_1_34_1","first-page":"1433","article-title":"Maximum Entropy Inverse Reinforcement Learning","volume":"8","author":"Ziebart Brian D","year":"2008","unstructured":"Brian D. Ziebart, Andrew L. Maas, J. Andrew Bagnell, and Anind K. Dey. 2008. Maximum Entropy Inverse Reinforcement Learning. In AAAI, Vol. 8. 1433--1438.","journal-title":"AAAI"}],"event":{"name":"HRI '20: ACM\/IEEE International Conference on Human-Robot Interaction","sponsor":["SIGAI ACM Special Interest Group on Artificial Intelligence","SIGCHI ACM Special Interest Group on Computer-Human Interaction","IEEE-RAS Robotics and Automation"],"location":"Cambridge United Kingdom","acronym":"HRI '20"},"container-title":["Proceedings of the 2020 ACM\/IEEE International Conference on Human-Robot Interaction"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3319502.3374824","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3319502.3374824","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:38:22Z","timestamp":1750199902000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3319502.3374824"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,3,9]]},"references-count":34,"alternative-id":["10.1145\/3319502.3374824","10.1145\/3319502"],"URL":"https:\/\/doi.org\/10.1145\/3319502.3374824","relation":{},"subject":[],"published":{"date-parts":[[2020,3,9]]},"assertion":[{"value":"2020-03-09","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}