{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,2]],"date-time":"2026-05-02T09:58:33Z","timestamp":1777715913073,"version":"3.51.4"},"reference-count":60,"publisher":"SAGE Publications","issue":"1","license":[{"start":{"date-parts":[[2021,8,28]],"date-time":"2021-08-28T00:00:00Z","timestamp":1630108800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"funder":[{"name":"FLI","award":["RFP2-000"],"award-info":[{"award-number":["RFP2-000"]}]},{"DOI":"10.13039\/100015599","name":"toyota research institute","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100015599","id-type":"DOI","asserted-by":"publisher"}]},{"name":"NSF","award":["#1849952"],"award-info":[{"award-number":["#1849952"]}]}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["The International Journal of Robotics Research"],"published-print":{"date-parts":[[2022,1]]},"abstract":"<jats:p>Reward functions are a common way to specify the objective of a robot. As designing reward functions can be extremely challenging, a more promising approach is to directly learn reward functions from human teachers. Importantly, data from human teachers can be collected either passively or actively in a variety of forms: passive data sources include demonstrations (e.g., kinesthetic guidance), whereas preferences (e.g., comparative rankings) are actively elicited. Prior research has independently applied reward learning to these different data sources. However, there exist many domains where multiple sources are complementary and expressive. Motivated by this general problem, we present a framework to integrate multiple sources of information, which are either passively or actively collected from human users. 
In particular, we present an algorithm that first utilizes user demonstrations to initialize a belief about the reward function, and then actively probes the user with preference queries to zero in on their true reward. This algorithm not only enables us to combine multiple data sources, but it also informs the robot when it should leverage each type of information. Further, our approach accounts for the human\u2019s ability to provide data, yielding user-friendly preference queries which are also theoretically optimal. Our extensive simulated experiments and user studies on a Fetch mobile manipulator demonstrate the superiority and the usability of our integrated framework.<\/jats:p>","DOI":"10.1177\/02783649211041652","type":"journal-article","created":{"date-parts":[[2021,8,28]],"date-time":"2021-08-28T07:56:16Z","timestamp":1630137376000},"page":"45-67","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":63,"title":["Learning reward functions from diverse sources of human feedback: Optimally integrating demonstrations and preferences"],"prefix":"10.1177","volume":"41","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9516-3130","authenticated-orcid":false,"given":"Erdem","family":"B\u0131y\u0131k","sequence":"first","affiliation":[{"name":"Department of Electrical Engineering, Stanford University, Stanford, CA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Dylan P.","family":"Losey","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Stanford University, Stanford, CA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Malayandi","family":"Palan","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Stanford University, Stanford, CA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Nicholas C.","family":"Landolfi","sequence":"additional","affiliation":[{"name":"Department of Computer 
Science, Stanford University, Stanford, CA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Gleb","family":"Shevchuk","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Stanford University, Stanford, CA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Dorsa","family":"Sadigh","sequence":"additional","affiliation":[{"name":"Department of Electrical Engineering, Stanford University, Stanford, CA, USA"},{"name":"Department of Computer Science, Stanford University, Stanford, CA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"179","published-online":{"date-parts":[[2021,8,28]]},"reference":[{"key":"bibr1-02783649211041652","doi-asserted-by":"publisher","DOI":"10.1145\/1015330.1015430"},{"key":"bibr2-02783649211041652","doi-asserted-by":"publisher","DOI":"10.1145\/1102351.1102352"},{"key":"bibr3-02783649211041652","first-page":"137","volume":"13","author":"Ailon N","year":"2012","journal-title":"Journal of Machine Learning Research"},{"key":"bibr4-02783649211041652","doi-asserted-by":"publisher","DOI":"10.1007\/s12369-012-0160-0"},{"key":"bibr5-02783649211041652","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-33486-3_8"},{"key":"bibr6-02783649211041652","doi-asserted-by":"publisher","DOI":"10.1145\/3171221.3171267"},{"key":"bibr7-02783649211041652","first-page":"217","volume":"78","author":"Bajcsy A","year":"2017","journal-title":"Proceedings of Machine Learning Research"},{"key":"bibr8-02783649211041652","doi-asserted-by":"publisher","DOI":"10.1109\/IROS40897.2019.8968522"},{"key":"bibr9-02783649211041652","doi-asserted-by":"publisher","DOI":"10.1145\/2909824.3020250"},{"key":"bibr10-02783649211041652","volume-title":"Discrete Choice Analysis: Theory and Application to Travel Demand ( Transportation Studies Series","volume":"9","author":"Ben-Akiva 
ME","year":"1985"},{"key":"bibr11-02783649211041652","doi-asserted-by":"publisher","DOI":"10.15607\/RSS.2020.XVI.041"},{"key":"bibr12-02783649211041652","doi-asserted-by":"publisher","DOI":"10.1109\/CDC40024.2019.9030169"},{"key":"bibr13-02783649211041652","author":"Biyik E","year":"2019","journal-title":"Proceedings of the 3rd Conference on Robot Learning (CoRL)"},{"key":"bibr14-02783649211041652","author":"Biyik E","year":"2018","journal-title":"Conference on Robot Learning (CoRL)"},{"key":"bibr15-02783649211041652","author":"Biyik E","year":"2019","journal-title":"arXiv preprint arXiv:1906.07975"},{"key":"bibr16-02783649211041652","first-page":"796","author":"Bobu A","year":"2018","journal-title":"Conference on Robot Learning"},{"key":"bibr17-02783649211041652","author":"Brockman G","year":"2016","journal-title":"arXiv preprint arXiv:1606.01540"},{"key":"bibr18-02783649211041652","first-page":"783","author":"Brown D","year":"2019","journal-title":"International Conference on Machine Learning"},{"key":"bibr19-02783649211041652","first-page":"330","author":"Brown DS","year":"2020","journal-title":"Conference on Robot Learning"},{"key":"bibr20-02783649211041652","author":"Brown DS","year":"2019","journal-title":"Workshop on Safety and Robustness in Decision Making at the 33rd Conference on Neural Information Processing Systems (NeurIPS) 2019"},{"key":"bibr21-02783649211041652","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2011.6094735"},{"key":"bibr22-02783649211041652","author":"Chen L","year":"2020","journal-title":"Conference on Robot Learning"},{"key":"bibr23-02783649211041652","doi-asserted-by":"publisher","DOI":"10.1109\/HRI.2019.8673256"},{"key":"bibr24-02783649211041652","first-page":"4299","author":"Christiano PF","year":"2017","journal-title":"Advances in Neural Information Processing Systems"},{"key":"bibr25-02783649211041652","first-page":"1019","volume":"6","author":"Chu W","year":"2005","journal-title":"Journal of Machine Learning 
Research"},{"key":"bibr26-02783649211041652","volume-title":"Elements of Information Theory","author":"Cover TM","year":"2012"},{"key":"bibr27-02783649211041652","doi-asserted-by":"publisher","DOI":"10.1038\/nature04766"},{"key":"bibr28-02783649211041652","doi-asserted-by":"publisher","DOI":"10.15607\/RSS.2012.VIII.010"},{"key":"bibr29-02783649211041652","first-page":"289","author":"Guo S","year":"2010","journal-title":"Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics"},{"key":"bibr30-02783649211041652","author":"Habibian S","year":"2021","journal-title":"arXiv preprint arXiv:2107.01995"},{"key":"bibr31-02783649211041652","author":"Holladay R","year":"2016","journal-title":"RSS Workshop on Model Learning for Human\u2013Robot Communication"},{"key":"bibr32-02783649211041652","first-page":"8011","author":"Ibarz B","year":"2018","journal-title":"Advances in Neural Information Processing Systems"},{"key":"bibr33-02783649211041652","author":"Javdani S","year":"2015","journal-title":"Robotics Science and Systems: Online Proceedings"},{"key":"bibr34-02783649211041652","author":"Katz S","year":"2021","journal-title":"arXiv preprint arXiv:2103.02727"},{"key":"bibr35-02783649211041652","doi-asserted-by":"publisher","DOI":"10.1109\/DASC43569.2019.9081648"},{"key":"bibr36-02783649211041652","doi-asserted-by":"publisher","DOI":"10.1162\/PRES_a_00223"},{"key":"bibr37-02783649211041652","doi-asserted-by":"publisher","DOI":"10.1287\/mnsc.23.11.1224"},{"key":"bibr38-02783649211041652","doi-asserted-by":"publisher","DOI":"10.1145\/2556288.2557238"},{"key":"bibr39-02783649211041652","doi-asserted-by":"publisher","DOI":"10.1145\/3319502.3374832"},{"key":"bibr40-02783649211041652","doi-asserted-by":"publisher","DOI":"10.2514\/1.I010363"},{"key":"bibr41-02783649211041652","author":"Li K","year":"2021","journal-title":"International Conference on Robotics and Automation (ICRA)"},{"key":"bibr42-02783649211041652","author":"Li 
M","year":"2021","journal-title":"International Conference on Robotics and Automation (ICRA)"},{"key":"bibr43-02783649211041652","first-page":"985","author":"Lucas CG","year":"2009","journal-title":"Advances in Neural Information Processing Systems"},{"key":"bibr44-02783649211041652","volume-title":"Individual Choice Behavior: A Theoretical Analysis","author":"Luce RD","year":"2012"},{"key":"bibr45-02783649211041652","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-33486-3_10"},{"key":"bibr46-02783649211041652","first-page":"2","volume":"1","author":"Ng AY","year":"2000","journal-title":"International Conference on Machine Learning"},{"key":"bibr47-02783649211041652","doi-asserted-by":"publisher","DOI":"10.1145\/2696454.2696455"},{"key":"bibr48-02783649211041652","doi-asserted-by":"publisher","DOI":"10.15607\/RSS.2019.XV.023"},{"key":"bibr49-02783649211041652","first-page":"1005","volume":"100","author":"Park D","year":"2020","journal-title":"Proceedings of the Conference on Robot Learning (Proceedings of Machine Learning Research"},{"key":"bibr50-02783649211041652","first-page":"2586","volume":"7","author":"Ramachandran D","year":"2007","journal-title":"International Joint Conference on Artificial Intelligence"},{"key":"bibr51-02783649211041652","doi-asserted-by":"publisher","DOI":"10.15607\/RSS.2017.XIII.053"},{"key":"bibr52-02783649211041652","doi-asserted-by":"publisher","DOI":"10.15607\/RSS.2016.XII.029"},{"key":"bibr53-02783649211041652","author":"Schulman J","year":"2017","journal-title":"arXiv preprint arXiv:1707.06347"},{"key":"bibr54-02783649211041652","author":"Shah A","year":"2020","journal-title":"arXiv preprint arXiv:2003.02232"},{"key":"bibr55-02783649211041652","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2012.6386109"},{"key":"bibr56-02783649211041652","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA40945.2020.9196661"},{"key":"bibr57-02783649211041652","first-page":"2352","author":"Viappiani 
P","year":"2010","journal-title":"Advances in Neural Information Processing Systems"},{"key":"bibr58-02783649211041652","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2019.2897342"},{"key":"bibr59-02783649211041652","author":"Wise M","year":"2016","journal-title":"Workshop on Autonomous Mobile Service Robots"},{"key":"bibr60-02783649211041652","first-page":"1433","volume-title":"Proceedings of the AAAI","volume":"8","author":"Ziebart BD","year":"2008"}],"container-title":["The International Journal of Robotics Research"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/02783649211041652","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/02783649211041652","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/02783649211041652","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T10:16:50Z","timestamp":1777457810000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/02783649211041652"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,8,28]]},"references-count":60,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2022,1]]}},"alternative-id":["10.1177\/02783649211041652"],"URL":"https:\/\/doi.org\/10.1177\/02783649211041652","relation":{},"ISSN":["0278-3649","1741-3176"],"issn-type":[{"value":"0278-3649","type":"print"},{"value":"1741-3176","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,8,28]]}}}