{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,24]],"date-time":"2026-03-24T15:18:31Z","timestamp":1774365511786,"version":"3.50.1"},"reference-count":53,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2022,9,8]],"date-time":"2022-09-08T00:00:00Z","timestamp":1662595200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["J. Hum.-Robot Interact."],"published-print":{"date-parts":[[2022,12,31]]},"abstract":"<jats:p>\n            Robots can learn from humans by asking questions. In these questions, the robot demonstrates a few different behaviors and asks the human for their favorite. But how should robots choose which questions to ask? Today\u2019s robots optimize for\n            <jats:italic>informative<\/jats:italic>\n            questions that actively probe the human\u2019s preferences as efficiently as possible. But while informative questions make sense from the robot\u2019s perspective, human onlookers may find them arbitrary and\n            <jats:italic>misleading<\/jats:italic>\n            . For example, consider an assistive robot learning to put away the dishes. Based on your answers to previous questions this robot knows where it should stack each dish; however, the robot is unsure about the right height to carry these dishes. A robot optimizing only for informative questions focuses purely on this height: it shows trajectories that carry the plates near or far from the table, regardless of whether or not they stack the dishes correctly. As a result, when we see this question, we mistakenly think that the robot is still confused about where to stack the dishes! In this article, we formalize active preference-based learning from the human\u2019s perspective. 
We hypothesize that, from the human\u2019s point of view, the robot\u2019s questions\n            <jats:italic>reveal<\/jats:italic>\n            what the robot has and has not learned. Our insight enables robots to use questions to make their learning process\n            <jats:italic>transparent<\/jats:italic>\n            to the human operator. We develop and test a model that robots can leverage to relate the questions they ask to the information these questions reveal. We then introduce a tradeoff between informative and revealing questions that considers both human and robot perspectives: a robot that optimizes for this tradeoff actively gathers information from the human while simultaneously keeping the human up to date with what it has learned. We evaluate our approach across simulations, online surveys, and in-person user studies. We find that robots that consider the human\u2019s point of view learn just as quickly as state-of-the-art baselines while also communicating what they have learned to the human operator. 
Videos of our user studies and results are available here: https:\/\/youtu.be\/tC6y_jHN7Vw.\n          <\/jats:p>","DOI":"10.1145\/3526107","type":"journal-article","created":{"date-parts":[[2022,3,26]],"date-time":"2022-03-26T11:19:37Z","timestamp":1648293577000},"page":"1-28","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":13,"title":["<i>Here\u2019s What I\u2019ve Learned:<\/i>\n            Asking Questions that Reveal Reward Learning"],"prefix":"10.1145","volume":"11","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4103-4664","authenticated-orcid":false,"given":"Soheil","family":"Habibian","sequence":"first","affiliation":[{"name":"Virginia Tech, Blacksburg, VA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0711-2051","authenticated-orcid":false,"given":"Ananth","family":"Jonnavittula","sequence":"additional","affiliation":[{"name":"Virginia Tech, Blacksburg, VA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8787-5293","authenticated-orcid":false,"given":"Dylan P.","family":"Losey","sequence":"additional","affiliation":[{"name":"Virginia Tech, Blacksburg, VA"}]}],"member":"320","published-online":{"date-parts":[[2022,9,8]]},"reference":[{"key":"e_1_3_3_2_2","doi-asserted-by":"publisher","DOI":"10.1145\/1015330.1015430"},{"key":"e_1_3_3_3_2","doi-asserted-by":"publisher","DOI":"10.5555\/2188385.2188390"},{"key":"e_1_3_3_4_2","doi-asserted-by":"publisher","DOI":"10.1007\/s12369-012-0160-0"},{"key":"e_1_3_3_5_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.inffus.2019.12.012"},{"key":"e_1_3_3_6_2","first-page":"141","volume-title":"Proceedings of the ACM\/IEEE International Conference on Human-Robot Interaction","author":"Bajcsy Andrea","year":"2018","unstructured":"Andrea Bajcsy, Dylan P. Losey, Marcia K. O\u2019Malley, and Anca D. Dragan. 2018. Learning from physical human corrections, one feature at a time. In Proceedings of the ACM\/IEEE International Conference on Human-Robot Interaction. 
141\u2013149."},{"key":"e_1_3_3_7_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.cognition.2009.07.005"},{"key":"e_1_3_3_8_2","first-page":"132","volume-title":"Proceedings of the ACM\/IEEE International Conference on Human-Robot Interaction","author":"Basu Chandrayee","year":"2018","unstructured":"Chandrayee Basu, Mukesh Singhal, and Anca D. Dragan. 2018. Learning from richer human guidance: Augmenting comparison-based learning with feature queries. In Proceedings of the ACM\/IEEE International Conference on Human-Robot Interaction. 132\u2013140."},{"key":"e_1_3_3_9_2","first-page":"417","volume-title":"Proceedings of the ACM\/IEEE International Conference on Human-Robot Interaction","author":"Basu Chandrayee","year":"2017","unstructured":"Chandrayee Basu, Qian Yang, David Hungerman, Mukesh Singhal, and Anca D. Dragan. 2017. Do you want your autonomous car to drive like you? In Proceedings of the ACM\/IEEE International Conference on Human-Robot Interaction. 417\u2013425."},{"key":"e_1_3_3_10_2","doi-asserted-by":"publisher","DOI":"10.1177\/02783649211041652"},{"key":"e_1_3_3_11_2","first-page":"1177","volume-title":"Proceedings of the Conference on Robot Learning","author":"B\u0131y\u0131k Erdem","year":"2019","unstructured":"Erdem B\u0131y\u0131k, Malayandi Palan, Nicholas C. Landolfi, Dylan P. Losey, and Dorsa Sadigh. 2019. Asking easy questions: A user-friendly approach to active reward learning. In Proceedings of the Conference on Robot Learning. 1177\u20131190."},{"key":"e_1_3_3_12_2","article-title":"Extrapolating beyond suboptimal demonstrations via inverse reinforcement learning from observations","author":"Brown Daniel S.","year":"2019","unstructured":"Daniel S. Brown, Wonjoon Goo, Prabhat Nagarajan, and Scott Niekum. 2019. Extrapolating beyond suboptimal demonstrations via inverse reinforcement learning from observations. 
In Proceedings of the International Conference on Machine Learning (2019).","journal-title":"Proceedings of the International Conference on Machine Learning"},{"key":"e_1_3_3_13_2","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2019\/283"},{"key":"e_1_3_3_14_2","first-page":"17","volume-title":"Proceedings of the ACM\/IEEE International Conference on Human-Robot Interaction","author":"Cakmak Maya","year":"2012","unstructured":"Maya Cakmak and Andrea L. Thomaz. 2012. Designing robot learners that ask good questions. In Proceedings of the ACM\/IEEE International Conference on Human-Robot Interaction. 17\u201324."},{"key":"e_1_3_3_15_2","first-page":"4299","volume-title":"Proceedings of the Advances in Neural Information Processing Systems","author":"Christiano Paul F.","year":"2017","unstructured":"Paul F. Christiano, Jan Leike, Tom Brown, Miljan Martic, Shane Legg, and Dario Amodei. 2017. Deep reinforcement learning from human preferences. In Proceedings of the Advances in Neural Information Processing Systems. 4299\u20134307."},{"key":"e_1_3_3_16_2","article-title":"Beyond question answering","author":"Cohen Philip R.","year":"1981","unstructured":"Philip R. Cohen, C. Raymond Perrault, and James F. Allen. 1981. Beyond question answering. Strategies for Natural Language Processing (1981), 245\u2013274.","journal-title":"Strategies for Natural Language Processing"},{"key":"e_1_3_3_17_2","unstructured":"Erwin Coumans and Yunfei Bai. 2016. PyBullet, a Python module for physics simulation for games, robotics and machine learning. http:\/\/pybullet.org."},{"key":"e_1_3_3_18_2","volume-title":"Elements of Information Theory","author":"Cover Thomas M.","year":"2012","unstructured":"Thomas M. Cover. 2012. Elements of Information Theory. 
John Wiley & Sons."},{"key":"e_1_3_3_19_2","doi-asserted-by":"publisher","DOI":"10.15607\/RSS.2014.X.031"},{"key":"e_1_3_3_20_2","first-page":"301","volume-title":"Proceedings of the ACM\/IEEE International Conference on Human-Robot Interaction","author":"Dragan Anca D.","year":"2013","unstructured":"Anca D. Dragan, Kenton C. T. Lee, and Siddhartha S. Srinivasa. 2013. Legibility and predictability of robot motion. In Proceedings of the ACM\/IEEE International Conference on Human-Robot Interaction. 301\u2013308."},{"key":"e_1_3_3_21_2","doi-asserted-by":"publisher","DOI":"10.1515\/pjbr-2018-0009"},{"key":"e_1_3_3_22_2","volume-title":"Proceedings of the RSS Workshop on Model Learning for Human-Robot Communication","author":"Holladay Rachel","year":"2016","unstructured":"Rachel Holladay, Shervin Javdani, Anca Dragan, and Siddhartha Srinivasa. 2016. Active comparison based learning incorporating user uncertainty and noise. In Proceedings of the RSS Workshop on Model Learning for Human-Robot Communication."},{"key":"e_1_3_3_23_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10514-018-9771-0"},{"key":"e_1_3_3_24_2","first-page":"8011","article-title":"Reward learning from human preferences and demonstrations in Atari","volume":"31","author":"Ibarz Borja","year":"2018","unstructured":"Borja Ibarz, Jan Leike, Tobias Pohlen, Geoffrey Irving, Shane Legg, and Dario Amodei. 2018. Reward learning from human preferences and demonstrations in Atari. In Advances in Neural Information Processing Systems (NeurIPS). Vol. 31. Curran Associates, Inc., 8011\u20138023.","journal-title":"Advances in Neural Information Processing Systems (NeurIPS)"},{"key":"e_1_3_3_25_2","doi-asserted-by":"publisher","DOI":"10.1177\/0278364915581193"},{"key":"e_1_3_3_26_2","volume-title":"Proceedings of the Advances in Neural Information Processing Systems","author":"Jeon Hong Jun","year":"2020","unstructured":"Hong Jun Jeon, Smitha Milli, and Anca D. Dragan. 2020. 
Reward-rational (implicit) choice: A unifying formalism for reward learning. In Proceedings of the Advances in Neural Information Processing Systems."},{"key":"e_1_3_3_27_2","volume-title":"Proceedings of the IEEE International Conference on Robotics and Automation","author":"Jonnavittula Ananth","year":"2021","unstructured":"Ananth Jonnavittula and Dylan P. Losey. 2021. Learning human objectives by (under)estimating their choice set. In Proceedings of the IEEE International Conference on Robotics and Automation."},{"key":"e_1_3_3_28_2","doi-asserted-by":"publisher","DOI":"10.1142\/9789814417358_0006"},{"key":"e_1_3_3_29_2","doi-asserted-by":"publisher","DOI":"10.1287\/mnsc.23.11.1224"},{"key":"e_1_3_3_30_2","first-page":"43","volume-title":"Proceedings of the ACM\/IEEE International Conference on Human-Robot Interaction","author":"Kwon Minae","year":"2020","unstructured":"Minae Kwon, Erdem Biyik, Aditi Talati, Karan Bhasin, Dylan P. Losey, and Dorsa Sadigh. 2020. When humans aren\u2019t optimal: Robots that collaborate with risk-aware humans. In Proceedings of the ACM\/IEEE International Conference on Human-Robot Interaction. 43\u201352."},{"key":"e_1_3_3_31_2","first-page":"87","volume-title":"Proceedings of the ACM\/IEEE International Conference on Human-Robot Interaction","author":"Kwon Minae","year":"2018","unstructured":"Minae Kwon, Sandy H. Huang, and Anca D. Dragan. 2018. Expressing robot incapability. In Proceedings of the ACM\/IEEE International Conference on Human-Robot Interaction. 87\u201395."},{"key":"e_1_3_3_32_2","volume-title":"Proceedings of the IEEE\/RSJ International Conference on Intelligent Robots and Systems","author":"Losey Dylan P.","year":"2019","unstructured":"Dylan P. Losey and Dorsa Sadigh. 2019. Robots that take advantage of human trust. In Proceedings of the IEEE\/RSJ International Conference on Intelligent Robots and Systems."},{"key":"e_1_3_3_33_2","volume-title":"Individual Choice Behavior: A Theoretical Analysis","author":"Luce R. 
Duncan","year":"2012","unstructured":"R. Duncan Luce. 2012. Individual Choice Behavior: A Theoretical Analysis. Courier Corporation."},{"key":"e_1_3_3_34_2","doi-asserted-by":"publisher","DOI":"10.1007\/BF01588971"},{"key":"e_1_3_3_35_2","doi-asserted-by":"publisher","DOI":"10.1177\/0278364917690593"},{"key":"e_1_3_3_36_2","doi-asserted-by":"publisher","DOI":"10.1145\/3203305"},{"key":"e_1_3_3_37_2","doi-asserted-by":"publisher","DOI":"10.1561\/2300000053"},{"key":"e_1_3_3_38_2","first-page":"817","volume-title":"Proceedings of the International Joint Conference on Artificial Intelligence","volume":"5","author":"Powers Rob","year":"2005","unstructured":"Rob Powers and Yoav Shoham. 2005. Learning against opponents with bounded memory. In Proceedings of the International Joint Conference on Artificial Intelligence 5. 817\u2013822."},{"key":"e_1_3_3_39_2","first-page":"123","volume-title":"Proceedings of the ACM\/IEEE International Conference on Human-Robot Interaction","author":"Racca Mattia","year":"2018","unstructured":"Mattia Racca and Ville Kyrki. 2018. Active robot learning for temporal task models. In Proceedings of the ACM\/IEEE International Conference on Human-Robot Interaction. 123\u2013131."},{"key":"e_1_3_3_40_2","first-page":"335","volume-title":"Proceedings of the ACM\/IEEE International Conference on Human-Robot Interaction","author":"Racca Mattia","year":"2019","unstructured":"Mattia Racca, Antti Oulasvirta, and Ville Kyrki. 2019. Teacher-aware active robot learning. In Proceedings of the ACM\/IEEE International Conference on Human-Robot Interaction. 335\u2013343."},{"key":"e_1_3_3_41_2","doi-asserted-by":"publisher","DOI":"10.5555\/2026506.2026545"},{"key":"e_1_3_3_42_2","first-page":"2586","volume-title":"Proceedings of the International Joint Conference on Artificial Intelligence","author":"Ramachandran Deepak","year":"2007","unstructured":"Deepak Ramachandran and Eyal Amir. 2007. Bayesian inverse reinforcement learning. 
In Proceedings of the International Joint Conference on Artificial Intelligence. 2586\u20132591."},{"key":"e_1_3_3_43_2","doi-asserted-by":"publisher","DOI":"10.1177\/0278364919842925"},{"key":"e_1_3_3_44_2","doi-asserted-by":"publisher","DOI":"10.15607\/RSS.2017.XIII.053"},{"key":"e_1_3_3_45_2","doi-asserted-by":"publisher","DOI":"10.1016\/0004-3702(78)90012-7"},{"key":"e_1_3_3_46_2","first-page":"43","volume-title":"Proceedings of the ACM\/IEEE International Conference on Human-Robot Interaction","author":"Schrum Mariah L.","year":"2020","unstructured":"Mariah L. Schrum, Michael Johnson, Muyleng Ghuy, and Matthew C. Gombolay. 2020. Four years in review: Statistical practices of likert scales in human-robot interaction studies. In Proceedings of the ACM\/IEEE International Conference on Human-Robot Interaction. 43\u201352."},{"key":"e_1_3_3_47_2","article-title":"Interactive robot training for non-markov tasks","author":"Shah Ankit","year":"2020","unstructured":"Ankit Shah and Julie Shah. 2020. Interactive robot training for non-markov tasks. arXiv:2003.02232. Retrieved from https:\/\/arxiv.org\/abs\/2003.02232.","journal-title":"arXiv:2003.02232"},{"key":"e_1_3_3_48_2","doi-asserted-by":"publisher","DOI":"10.15607\/RSS.2014.X.024"},{"key":"e_1_3_3_49_2","first-page":"67","volume-title":"Proceedings of the Conference on Robot Learning","author":"Thomason Jesse","year":"2017","unstructured":"Jesse Thomason, Aishwarya Padmakumar, Jivko Sinapov, Justin Hart, Peter Stone, and Raymond J. Mooney. 2017. Opportunistic active learning for grounding natural language descriptions. In Proceedings of the Conference on Robot Learning. 
67\u201376."},{"key":"e_1_3_3_50_2","doi-asserted-by":"publisher","DOI":"10.5688\/ajpe777155"},{"key":"e_1_3_3_51_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA40945.2020.9196661"},{"key":"e_1_3_3_52_2","first-page":"316","volume-title":"Proceedings of the ACM\/IEEE International Conference on Human-Robot Interaction","author":"Walker Michael","year":"2018","unstructured":"Michael Walker, Hooman Hedayati, Jennifer Lee, and Daniel Szafir. 2018. Communicating robot motion intent with augmented reality. In Proceedings of the ACM\/IEEE International Conference on Human-Robot Interaction. 316\u2013324."},{"key":"e_1_3_3_53_2","volume-title":"Proceedings of the IEEE\/RSJ International Conference on Intelligent Robots and Systems","author":"Wilde Nils","year":"2020","unstructured":"Nils Wilde, Dana Kulic, and Stephen L. Smith. 2020. Active preference learning using maximum regret. In Proceedings of the IEEE\/RSJ International Conference on Intelligent Robots and Systems."},{"key":"e_1_3_3_54_2","unstructured":"Brian D. Ziebart, Andrew L. Maas, J. Andrew Bagnell, and Anind K. Dey. 2008. Maximum entropy inverse reinforcement learning. In Proceedings of the 23rd National Conference on Artificial Intelligence. 
1433\u20131438."}],"container-title":["ACM Transactions on Human-Robot Interaction"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3526107","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3526107","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T18:43:46Z","timestamp":1750272226000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3526107"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,9,8]]},"references-count":53,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2022,12,31]]}},"alternative-id":["10.1145\/3526107"],"URL":"https:\/\/doi.org\/10.1145\/3526107","relation":{},"ISSN":["2573-9522","2573-9522"],"issn-type":[{"value":"2573-9522","type":"print"},{"value":"2573-9522","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,9,8]]},"assertion":[{"value":"2021-06-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-01-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-09-08","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}