{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,21]],"date-time":"2026-04-21T03:02:14Z","timestamp":1776740534027,"version":"3.51.2"},"reference-count":89,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2024,6,14]],"date-time":"2024-06-14T00:00:00Z","timestamp":1718323200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"ONR"},{"name":"AFOSR YIP"},{"name":"DARPA YFA"},{"name":"NSF","award":["No. 2218760 and No. 2125511"],"award-info":[{"award-number":["No. 2218760 and No. 2125511"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["J. Hum.-Robot Interact."],"published-print":{"date-parts":[[2024,6,30]]},"abstract":"<jats:p>\n            Data generation and labeling are often expensive in robot learning. Preference-based learning is a concept that enables reliable labeling by querying users with preference questions. Active querying methods are commonly employed in preference-based learning to generate more informative data at the expense of parallelization and computation time. In this article, we develop a set of novel algorithms,\n            <jats:italic>batch active preference-based learning<\/jats:italic>\n            methods, that enable efficient learning of reward functions using as few data samples as possible while still having short query generation times and also retaining parallelizability. We introduce a method based on determinantal point processes for active batch generation and several heuristic-based alternatives. Finally, we present our experimental results for a variety of robotics tasks in simulation. Our results suggest that our batch active learning algorithm requires only a few queries that are computed in a short amount of time. We showcase one of our algorithms in a study to learn human users\u2019 preferences.\n          <\/jats:p>","DOI":"10.1145\/3649885","type":"journal-article","created":{"date-parts":[[2024,2,29]],"date-time":"2024-02-29T12:26:10Z","timestamp":1709209570000},"page":"1-27","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["Batch Active Learning of Reward Functions from Human Preferences"],"prefix":"10.1145","volume":"13","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9516-3130","authenticated-orcid":false,"given":"Erdem","family":"Biyik","sequence":"first","affiliation":[{"name":"Thomas Lord Department of Computer Science, University of Southern California, Los Angeles, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4394-3530","authenticated-orcid":false,"given":"Nima","family":"Anari","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Stanford University, Stanford, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7802-9183","authenticated-orcid":false,"given":"Dorsa","family":"Sadigh","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Stanford University, Stanford, USA"}]}],"member":"320","published-online":{"date-parts":[[2024,6,14]]},"reference":[{"key":"e_1_3_4_2_2","first-page":"137","article-title":"An active learning algorithm for ranking from pairwise preferences with an almost optimal query complexity","volume":"13","author":"Ailon Nir","year":"2012","unstructured":"Nir Ailon. 2012. An active learning algorithm for ranking from pairwise preferences with an almost optimal query complexity. J. Mach. Learn. Res. 13(Jan.2012), 137\u2013164.","journal-title":"J. Mach. Learn. Res."},{"key":"e_1_3_4_3_2","doi-asserted-by":"publisher","DOI":"10.1007\/s12369-012-0160-0"},{"key":"e_1_3_4_4_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-23780-5_11"},{"key":"e_1_3_4_5_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-33486-3_8"},{"key":"e_1_3_4_6_2","first-page":"103","volume-title":"Proceedings of the Conference on Learning Theory","author":"Anari Nima","year":"2016","unstructured":"Nima Anari, Shayan Oveis Gharan, and Alireza Rezaei. 2016. Monte carlo markov chain algorithms for sampling strongly rayleigh distributions and determinantal point processes. In Proceedings of the Conference on Learning Theory. 103\u2013115."},{"key":"e_1_3_4_7_2","doi-asserted-by":"publisher","DOI":"10.1145\/3313276.3316385"},{"key":"e_1_3_4_8_2","doi-asserted-by":"publisher","DOI":"10.1145\/1273496.1273501"},{"key":"e_1_3_4_9_2","article-title":"Medoids in almost linear time via multi-armed bandits","author":"Bagaria Vivek","year":"2017","unstructured":"Vivek Bagaria, Govinda M. Kamath, Vasilis Ntranos, Martin J. Zhang, and David Tse. 2017. Medoids in almost linear time via multi-armed bandits. arXiv preprint arXiv:1711.00817 (2017).","journal-title":"arXiv preprint arXiv:1711.00817"},{"key":"e_1_3_4_10_2","first-page":"141","volume-title":"Proceedings of the ACM\/IEEE International Conference on Human-Robot Interaction","author":"Bajcsy Andrea","year":"2018","unstructured":"Andrea Bajcsy, Dylan P. Losey, Marcia K. O\u2019Malley, and Anca D. Dragan. 2018. Learning from physical human corrections, one feature at a time. In Proceedings of the ACM\/IEEE International Conference on Human-Robot Interaction. 141\u2013149."},{"key":"e_1_3_4_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/IROS40897.2019.8968522"},{"key":"e_1_3_4_12_2","first-page":"417","volume-title":"Proceedings of the ACM\/IEEE International Conference on Human-Robot Interaction","author":"Basu Chandrayee","year":"2017","unstructured":"Chandrayee Basu, Qian Yang, David Hungerman, Mukesh Singhal, and Anca D. Dragan. 2017. Do you want your autonomous car to drive like you? In Proceedings of the ACM\/IEEE International Conference on Human-Robot Interaction. ACM, 417\u2013425."},{"key":"e_1_3_4_13_2","article-title":"Numpy\/scipy recipes for data science: k-medoids clustering","author":"Bauckhage Christian","year":"2015","unstructured":"Christian Bauckhage. 2015. Numpy\/scipy recipes for data science: k-medoids clustering. Researchgate.net. Retrieved from https:\/\/www.researchgate.net\/publication\/272351873_NumPy_SciPy_Recipes_for_Data_Science_k-Medoids_Clustering","journal-title":"Researchgate.net"},{"key":"e_1_3_4_14_2","doi-asserted-by":"publisher","DOI":"10.1214\/ss\/1177011077"},{"key":"e_1_3_4_15_2","doi-asserted-by":"publisher","DOI":"10.15607\/RSS.2020.XVI.041"},{"key":"e_1_3_4_16_2","doi-asserted-by":"crossref","unstructured":"E. B\u0131y\u0131k N. Huynh M. J. Kochenderfer and D. Sadigh. 2023. Active preference-based gaussian process regression for reward learning and optimization. The International Journal of Robotics Research 0 0 (2023). DOI:10.1177\/02783649231208729","DOI":"10.1177\/02783649231208729"},{"key":"e_1_3_4_17_2","doi-asserted-by":"publisher","DOI":"10.1177\/02783649211041652"},{"key":"e_1_3_4_18_2","volume-title":"Proceedings of the 3rd Conference on Robot Learning (CoRL\u201919)","author":"Biyik Erdem","year":"2019","unstructured":"Erdem Biyik, Malayandi Palan, Nicholas C. Landolfi, Dylan P. Losey, and Dorsa Sadigh. 2019. Asking easy questions: A user-friendly approach to active reward learning. In Proceedings of the 3rd Conference on Robot Learning (CoRL\u201919)."},{"key":"e_1_3_4_19_2","first-page":"519","volume-title":"Proceedings of the 2nd Conference on Robot Learning (CoRL\u201918) (Proceedings of Machine Learning Research)","volume":"87","author":"Biyik Erdem","year":"2018","unstructured":"Erdem Biyik and Dorsa Sadigh. 2018. Batch active preference-based learning of reward functions. In Proceedings of the 2nd Conference on Robot Learning (CoRL\u201918) (Proceedings of Machine Learning Research), Vol. 87. PMLR, 519\u2013528."},{"key":"e_1_3_4_20_2","first-page":"613","volume-title":"Proceedings of the ACM\/IEEE International Conference on Human-Robot Interaction","author":"B\u0131y\u0131k Erdem","year":"2022","unstructured":"Erdem B\u0131y\u0131k, Aditi Talati, and Dorsa Sadigh. 2022. APReL: A library for active preference-based reward learning algorithms. In Proceedings of the ACM\/IEEE International Conference on Human-Robot Interaction. 613\u2013617."},{"key":"e_1_3_4_21_2","unstructured":"Erdem Biyik Kenneth Wang Nima Anari and Dorsa Sadigh. 2019. Batch active learning using determinantal point processes. Retrieved from https:\/\/arXiv:1906.07975"},{"key":"e_1_3_4_22_2","unstructured":"Erdem Biyik Fan Yao Yinlam Chow Alex Haig Chih-wei Hsu Mohammad Ghavamzadeh and Craig Boutilier. 2023. Preference elicitation with soft attributes in interactive recommendation. Retrieved from https:\/\/arXiv:2311.02085"},{"key":"e_1_3_4_23_2","doi-asserted-by":"publisher","DOI":"10.1145\/2213556.2213580"},{"key":"e_1_3_4_24_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10955-005-7583-z"},{"key":"e_1_3_4_25_2","unstructured":"Greg Brockman Vicki Cheung Ludwig Pettersson Jonas Schneider John Schulman Jie Tang and Wojciech Zaremba. 2016. Openai gym. Retrieved from https:\/\/arXiv:1606.01540"},{"key":"e_1_3_4_26_2","first-page":"330","volume-title":"Proceedings of the Conference on Robot Learning","author":"Brown Daniel S.","year":"2020","unstructured":"Daniel S. Brown, Wonjoon Goo, and Scott Niekum. 2020. Better-than-demonstrator imitation learning via automatically-ranked demonstrations. In Proceedings of the Conference on Robot Learning. PMLR, 330\u2013359."},{"key":"e_1_3_4_27_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.ins.2016.10.037"},{"key":"e_1_3_4_28_2","unstructured":"Stephen Casper Xander Davies Claudia Shi Thomas Krendl Gilbert J\u00e9r\u00e9my Scheurer Javier Rando Rachel Freedman Tomasz Korbak David Lindner Pedro Freire Tony Tong Wang Samuel Marks Charbel-Raphael Segerie Micah Carroll Andi Peng Phillip Christoffersen Mehul Damani Stewart Slocum Usman Anwar Anand Siththaranjan Max Nadeau Eric J. Michaud Jacob Pfau Dmitrii Krasheninnikov Xin Chen Lauro Langosco Peter Hase Erdem Biyik Anca Dragan David Krueger Dorsa Sadigh and Dylan Hadfield-Menell. 2023. Open problems and fundamental limitations of reinforcement learning from human feedback. Transactions on Machine Learning Research (2023). Retrieved from https:\/\/openreview.net\/forum?id=bx24KpJ4Eb"},{"key":"e_1_3_4_29_2","doi-asserted-by":"publisher","DOI":"10.1137\/1.9781611974782.9"},{"key":"e_1_3_4_30_2","unstructured":"Yuxin Chen and Andreas Krause. 2013. Near-optimal batch mode active learning and adaptive submodular optimization. In Proceedings of the International Conference on Machine Learning (ICML\u201913). 160\u2013168."},{"key":"e_1_3_4_31_2","first-page":"4302","volume-title":"Advances in Neural Information Processing Systems","author":"Christiano Paul F","year":"2017","unstructured":"Paul F Christiano, Jan Leike, Tom Brown, Miljan Martic, Shane Legg, and Dario Amodei. 2017. Deep reinforcement learning from human preferences. In Advances in Neural Information Processing Systems. 4302\u20134310."},{"key":"e_1_3_4_32_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.tcs.2009.06.018"},{"key":"e_1_3_4_33_2","doi-asserted-by":"publisher","DOI":"10.1007\/s00453-011-9582-6"},{"key":"e_1_3_4_34_2","volume-title":"Elements of Information Theory","author":"Cover Thomas M.","year":"1999","unstructured":"Thomas M. Cover. 1999. Elements of Information Theory. John Wiley & Sons."},{"key":"e_1_3_4_35_2","first-page":"1457","volume-title":"Advances in Neural Information Processing Systems","author":"Cuong Nguyen Viet","year":"2013","unstructured":"Nguyen Viet Cuong, Wee Sun Lee, Nan Ye, Kian Ming A. Chai, and Hai Leong Chieu. 2013. Active learning for probabilistic hypotheses using the maximum Gibbs error criterion. In Advances in Neural Information Processing Systems. 1457\u20131465."},{"key":"e_1_3_4_36_2","article-title":"Preference learning in recommender systems","volume":"41","author":"Gemmis Marco De","year":"2009","unstructured":"Marco De Gemmis, Leo Iaquinta, Pasquale Lops, Cataldo Musto, Fedelucio Narducci, and Giovanni Semeraro. 2009. Preference learning in recommender systems. Pref. Learn. 41 (2009).","journal-title":"Pref. Learn."},{"key":"e_1_3_4_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2015.2511748"},{"key":"e_1_3_4_38_2","volume-title":"Proceedings of the International Conference on Robotics and Automation (ICRA\u201924)","author":"Ellis Evan","year":"2024","unstructured":"Evan Ellis, Gaurav R. Ghosal, Stuart J. Russell, Anca Dragan, and Erdem Biyik. 2024. A generalized acquisition function for preference-based reward learning. In Proceedings of the International Conference on Robotics and Automation (ICRA\u201924)."},{"key":"e_1_3_4_39_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10994-012-5313-8"},{"key":"e_1_3_4_40_2","doi-asserted-by":"publisher","DOI":"10.1145\/1526709.1526761"},{"key":"e_1_3_4_41_2","first-page":"593","volume-title":"Advances in Neural Information Processing Systems","author":"Guo Yuhong","year":"2008","unstructured":"Yuhong Guo and Dale Schuurmans. 2008. Discriminative batch mode active learning. In Advances in Neural Information Processing Systems. 593\u2013600."},{"key":"e_1_3_4_42_2","doi-asserted-by":"publisher","DOI":"10.2307\/3318737"},{"key":"e_1_3_4_43_2","volume-title":"Proceedings of the International Conference on Learning Representations (ICLR\u201924)","author":"Hejna Joey","year":"2024","unstructured":"Joey Hejna, Rafael Rafailov, Harshit Sikchi, Chelsea Finn, Scott Niekum, W. Bradley Knox, and Dorsa Sadigh. 2024. Contrastive preference learning: Learning from human feedback without RL. In Proceedings of the International Conference on Learning Representations (ICLR\u201924)."},{"key":"e_1_3_4_44_2","unstructured":"Jonathan Hermon and Justin Salez. 2019. Modified log-sobolev inequalities for strong-rayleigh measures. Retrieved from https:\/\/arXiv:1902.02775"},{"key":"e_1_3_4_45_2","volume-title":"Proceedings of the RSS Workshop on Model Learning for Human-Robot Communication","author":"Holladay Rachel","year":"2016","unstructured":"Rachel Holladay, Shervin Javdani, Anca Dragan, and Siddhartha Srinivasa. 2016. Active comparison based learning incorporating user uncertainty and noise. In Proceedings of the RSS Workshop on Model Learning for Human-Robot Communication."},{"key":"e_1_3_4_46_2","doi-asserted-by":"publisher","DOI":"10.1177\/0278364915581193"},{"key":"e_1_3_4_47_2","unstructured":"Sydney M. Katz Amir Maleki Erdem B\u0131y\u0131k and Mykel J. Kochenderfer. 2021. Preference-based learning of reward function features. Retrieved from https:\/\/arXiv:2103.02727"},{"key":"e_1_3_4_48_2","volume-title":"Clustering by Means of Medoids","author":"Kaufman Leonard","year":"1987","unstructured":"Leonard Kaufman and Peter Rousseeuw. 1987. Clustering by Means of Medoids. North-Holland."},{"key":"e_1_3_4_49_2","doi-asserted-by":"publisher","DOI":"10.1287\/opre.43.4.684"},{"key":"e_1_3_4_50_2","doi-asserted-by":"publisher","DOI":"10.5555\/3351864"},{"key":"e_1_3_4_51_2","volume-title":"Proceedings of the NIPS Workshop on the Future of Interactive Learning Machines","author":"Krueger David","year":"2016","unstructured":"David Krueger, Jan Leike, Owain Evans, and John Salvatier. 2016. Active reinforcement learning: Observing rewards at a cost. In Proceedings of the NIPS Workshop on the Future of Interactive Learning Machines."},{"key":"e_1_3_4_52_2","first-page":"1193","volume-title":"Proceedings of the 28th International Conference on Machine Learning (ICML\u201911)","author":"Kulesza Alex","year":"2011","unstructured":"Alex Kulesza and Ben Taskar. 2011. k-DPPs: Fixed-size determinantal point processes. In Proceedings of the 28th International Conference on Machine Learning (ICML\u201911). 1193\u20131200."},{"key":"e_1_3_4_53_2","doi-asserted-by":"publisher","DOI":"10.5555\/2481023"},{"key":"e_1_3_4_54_2","doi-asserted-by":"publisher","DOI":"10.1145\/3319502.3374832"},{"key":"e_1_3_4_55_2","volume-title":"Proceedings of the Conference on Neural Information Processing Systems (NeurIPS\u201921)","author":"Lee Kimin","year":"2021","unstructured":"Kimin Lee, Laura Smith, Anca Dragan, and Pieter Abbeel. 2021. B-Pref: Benchmarking preference-based reinforcement learning. In Proceedings of the Conference on Neural Information Processing Systems (NeurIPS\u201921)."},{"key":"e_1_3_4_56_2","first-page":"475","volume-title":"Proceedings of the 29th International Conference on Machine Learning","author":"Levine Sergey","year":"2012","unstructured":"Sergey Levine and Vladlen Koltun. 2012. Continuous inverse optimal control with locally optimal examples. In Proceedings of the 29th International Conference on Machine Learning. 475\u2013482."},{"key":"e_1_3_4_57_2","first-page":"4188","volume-title":"Advances in Neural Information Processing Systems","author":"Li Chengtao","year":"2016","unstructured":"Chengtao Li, Suvrit Sra, and Stefanie Jegelka. 2016. Fast mixing Markov chains for strongly rayleigh measures, DPPs, and constrained sampling. In Advances in Neural Information Processing Systems. 4188\u20134196."},{"key":"e_1_3_4_58_2","unstructured":"Kejun Li Maegan Tucker Erdem Biyik Ellen Novoseller Joel W. Burdick Yanan Sui Dorsa Sadigh Yisong Yue and Aaron D. Ames. 2020. ROIAL: Region of interest active learning for characterizing exoskeleton gait preference landscapes. Retrieved from https:\/\/arXiv:2011.04812"},{"key":"e_1_3_4_59_2","doi-asserted-by":"crossref","unstructured":"Yi Liu Gaurav Datta Ellen Novoseller and Daniel S. Brown. 2023. Efficient preference-based reinforcement learning using learned dynamics models. Retrieved from https:\/\/arXiv:2301.04741","DOI":"10.1109\/ICRA48891.2023.10161081"},{"key":"e_1_3_4_60_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIT.1982.1056489"},{"key":"e_1_3_4_61_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA40945.2020.9197197"},{"key":"e_1_3_4_62_2","first-page":"4464","volume-title":"Advances in Neural Information Processing Systems","author":"Mariet Zelda E","year":"2018","unstructured":"Zelda E Mariet, Suvrit Sra, and Stefanie Jegelka. 2018. Exponentiated strongly rayleigh distributions. In Advances in Neural Information Processing Systems. 4464\u20134474."},{"key":"e_1_3_4_63_2","first-page":"342","volume-title":"Proceedings of the Conference on Robot Learning","author":"Myers Vivek","year":"2022","unstructured":"Vivek Myers, Erdem Biyik, Nima Anari, and Dorsa Sadigh. 2022. Learning multimodal rewards from rankings. In Proceedings of the Conference on Robot Learning. PMLR, 342\u2013352."},{"key":"e_1_3_4_64_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA48891.2023.10160439"},{"key":"e_1_3_4_65_2","doi-asserted-by":"publisher","DOI":"10.1145\/2746539.2746628"},{"key":"e_1_3_4_66_2","doi-asserted-by":"publisher","DOI":"10.15607\/rss.2019.xv.023"},{"key":"e_1_3_4_67_2","unstructured":"Rafael Rafailov Archit Sharma Eric Mitchell Stefano Ermon Christopher D. Manning and Chelsea Finn. 2023. Direct preference optimization: Your language model is secretly a reward model. Retrieved from https:\/\/arXiv:2305.18290"},{"key":"e_1_3_4_68_2","volume-title":"Safe and Interactive Autonomy: Control, Learning, and Verification","author":"Sadigh Dorsa","year":"2017","unstructured":"Dorsa Sadigh. 2017. Safe and Interactive Autonomy: Control, Learning, and Verification. Ph.D. Dissertation. EECS Department, University of California, Berkeley."},{"key":"e_1_3_4_69_2","doi-asserted-by":"publisher","DOI":"10.15607\/RSS.2017.XIII.053"},{"key":"e_1_3_4_70_2","doi-asserted-by":"publisher","DOI":"10.15607\/RSS.2016.XII.029"},{"key":"e_1_3_4_71_2","unstructured":"Ozan Sener and Silvio Savarese. 2017. A geometric approach to active learning for convolutional neural networks. Retrieved from https:\/\/arXiv:1708.00489"},{"key":"e_1_3_4_72_2","doi-asserted-by":"crossref","unstructured":"Ankit Shah Samir Wadhwania and Julie Shah. 2020. Interactive robot training for non-markov tasks. Retrieved from https:\/\/arXiv:2003.02232","DOI":"10.1145\/3371382.3377437"},{"key":"e_1_3_4_73_2","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2012-72"},{"key":"e_1_3_4_74_2","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2012.6386109"},{"key":"e_1_3_4_75_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA40945.2020.9196661"},{"key":"e_1_3_4_76_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2009.4960685"},{"key":"e_1_3_4_77_2","doi-asserted-by":"publisher","DOI":"10.3844\/jcssp.2010.363.368"},{"key":"e_1_3_4_78_2","unstructured":"Yufei Wang Zhanyi Sun Jesse Zhang Zhou Xian Erdem Biyik David Held and Zackory Erickson. 2024. RL-VLM-F: Reinforcement learning from vision language foundation model feedback. Retrieved from https:\/\/arXiv:2402.03681"},{"key":"e_1_3_4_79_2","first-page":"1954","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Wei Kai","year":"2015","unstructured":"Kai Wei, Rishabh Iyer, and Jeff Bilmes. 2015. Submodularity in data subset selection and active learning. In Proceedings of the International Conference on Machine Learning. 1954\u20131963."},{"key":"e_1_3_4_80_2","doi-asserted-by":"publisher","DOI":"10.2307\/3001968"},{"key":"e_1_3_4_81_2","volume-title":"Proceedings of the 6th Annual Conference on Robot Learning","author":"Wilde Nils","year":"2022","unstructured":"Nils Wilde and Javier Alonso-Mora. 2022. Do we use the right measure? Challenges in evaluating reward learning algorithms. In Proceedings of the 6th Annual Conference on Robot Learning. Retrieved from https:\/\/openreview.net\/forum?id=1vV0JRA2HY0"},{"key":"e_1_3_4_82_2","volume-title":"Proceedings of the 5th Conference on Robot Learning (CoRL\u201921)","author":"Wilde Nils","year":"2021","unstructured":"Nils Wilde and Erdem Biyik. 2021. Learning reward functions from scale feedback. In Proceedings of the 5th Conference on Robot Learning (CoRL\u201921)."},{"key":"e_1_3_4_83_2","doi-asserted-by":"publisher","DOI":"10.1109\/IROS45743.2020.9341530"},{"key":"e_1_3_4_84_2","first-page":"1133","volume-title":"Advances in Neural Information Processing Systems","author":"Wilson Aaron","year":"2012","unstructured":"Aaron Wilson, Alan Fern, and Prasad Tadepalli. 2012. A bayesian approach for policy learning from trajectory preference queries. In Advances in Neural Information Processing Systems. 1133\u20131141."},{"key":"e_1_3_4_85_2","volume-title":"Proceedings of the Workshop on Autonomous Mobile Service Robots","author":"Wise Melonee","year":"2016","unstructured":"Melonee Wise, Michael Ferguson, Derek King, Eric Diehr, and David Dymesich. 2016. Fetch and freight: Standard platforms for service robot applications. In Proceedings of the Workshop on Autonomous Mobile Service Robots."},{"key":"e_1_3_4_86_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2018.12.027"},{"key":"e_1_3_4_87_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-014-0781-x"},{"key":"e_1_3_4_88_2","volume-title":"Proceedings of the 33rd Conference on Uncertainty in Artificial Intelligence (UAI\u201917)","author":"Zhang Cheng","year":"2017","unstructured":"Cheng Zhang, Hedvig Kjellstr\u00f6m, and Stephan Mandt. 2017. Determinantal point processes for mini-batch diversification. In Proceedings of the 33rd Conference on Uncertainty in Artificial Intelligence (UAI\u201917). AUAI Press, Corvallis."},{"key":"e_1_3_4_89_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33015741"},{"key":"e_1_3_4_90_2","first-page":"1433","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence (AAAI\u201908)","volume":"8","author":"Ziebart Brian D.","year":"2008","unstructured":"Brian D. Ziebart, Andrew L. Maas, J. Andrew Bagnell, and Anind K. Dey. 2008. Maximum entropy inverse reinforcement learning. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI\u201908), Vol. 8. 1433\u20131438."}],"container-title":["ACM Transactions on Human-Robot Interaction"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3649885","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3649885","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3649885","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T22:54:07Z","timestamp":1750287247000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3649885"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,6,14]]},"references-count":89,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2024,6,30]]}},"alternative-id":["10.1145\/3649885"],"URL":"https:\/\/doi.org\/10.1145\/3649885","relation":{},"ISSN":["2573-9522"],"issn-type":[{"value":"2573-9522","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,6,14]]},"assertion":[{"value":"2022-08-14","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-02-15","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-06-14","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}