{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,9]],"date-time":"2026-04-09T04:40:47Z","timestamp":1775709647215,"version":"3.50.1"},"reference-count":50,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2024,8,26]],"date-time":"2024-08-26T00:00:00Z","timestamp":1724630400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"NSF","award":["#2129201"],"award-info":[{"award-number":["#2129201"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["J. Hum.-Robot Interact."],"published-print":{"date-parts":[[2024,9,30]]},"abstract":"<jats:p>\n            Humans can leverage physical interaction to teach robot arms. This physical interaction takes multiple forms depending on the task, the user, and what the robot has learned so far. State-of-the-art approaches focus on learning from a single modality, or combine some interaction types. Some methods do so by assuming that the robot has prior information about the features of the task and the reward structure. By contrast, in this article, we introduce an algorithmic formalism that unites learning from demonstrations, corrections, and preferences. Our approach makes no assumptions about the tasks the human wants to teach the robot; instead, we learn a reward model from scratch by comparing the human\u2019s input to nearby alternatives, i.e., trajectories close to the human\u2019s feedback. We first derive a loss function that trains an ensemble of reward models to match the human\u2019s demonstrations, corrections, and preferences. The type and order of feedback is up to the human teacher: We enable the robot to collect this feedback passively or actively. We then apply constrained optimization to convert our learned reward into a desired robot trajectory. 
Through simulations and a user study, we demonstrate that our proposed approach more accurately learns manipulation tasks from physical human interaction than existing baselines, particularly when the robot is faced with new or unexpected objectives. Videos of our user study are available at\n            <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"url\" xlink:href=\"https:\/\/youtu.be\/FSUJsTYvEKU\">https:\/\/youtu.be\/FSUJsTYvEKU<\/jats:ext-link>\n            .\n          <\/jats:p>","DOI":"10.1145\/3623384","type":"journal-article","created":{"date-parts":[[2023,9,23]],"date-time":"2023-09-23T04:34:17Z","timestamp":1695443657000},"page":"1-25","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":17,"title":["Unified Learning from Demonstrations, Corrections, and Preferences during Physical Human\u2013Robot Interaction"],"prefix":"10.1145","volume":"13","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3160-6879","authenticated-orcid":false,"given":"Shaunak A.","family":"Mehta","sequence":"first","affiliation":[{"name":"Virginia Tech, Blacksburg, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8787-5293","authenticated-orcid":false,"given":"Dylan P.","family":"Losey","sequence":"additional","affiliation":[{"name":"Virginia Tech, Blacksburg, USA"}]}],"member":"320","published-online":{"date-parts":[[2024,8,26]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1145\/1015330.1015430"},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.1007\/s12369-012-0160-0"},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.robot.2008.10.024"},{"key":"e_1_3_2_5_2","doi-asserted-by":"crossref","unstructured":"Erdem B\u0131y\u0131k Dylan P. Losey Malayandi Palan Nicholas C. Landolfi Gleb Shevchuk and Dorsa Sadigh. 2022. Learning reward functions from diverse sources of human feedback: Optimally integrating demonstrations and preferences. 
The International Journal of Robotics Research 41 1 (2022) 45\u201367.","DOI":"10.1177\/02783649211041652"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1109\/TRO.2020.2971415"},{"key":"e_1_3_2_7_2","doi-asserted-by":"crossref","unstructured":"Andreea Bobu Marius Wiggert Claire Tomlin and Anca D. Dragan. 2022. Inducing structure in reward learning by learning features. The International Journal of Robotics Research 41 5 (2022) 497\u2013518.","DOI":"10.1177\/02783649221078031"},{"key":"e_1_3_2_8_2","first-page":"783","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Brown Daniel","year":"2019","unstructured":"Daniel Brown, Wonjoon Goo, Prabhat Nagarajan, and Scott Niekum. 2019. Extrapolating beyond suboptimal demonstrations via inverse reinforcement learning from observations. In Proceedings of the International Conference on Machine Learning. 783\u2013792."},{"key":"e_1_3_2_9_2","first-page":"330","volume-title":"Proceedings of the Conference on Robot Learning","author":"Brown Daniel S.","year":"2020","unstructured":"Daniel S. Brown, Wonjoon Goo, and Scott Niekum. 2020. Better-than-demonstrator imitation learning via automatically-ranked demonstrations. In Proceedings of the Conference on Robot Learning. PMLR, 330\u2013359."},{"key":"e_1_3_2_10_2","first-page":"1262","volume-title":"Proceedings of the Conference on Robot Learning","author":"Chen Letian","year":"2021","unstructured":"Letian Chen, Rohan Paleja, and Matthew Gombolay. 2021. Learning from suboptimal demonstration via self-supervised reward regression. In Proceedings of the Conference on Robot Learning. 1262\u20131277."},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.5555\/3294996.3295184"},{"key":"e_1_3_2_12_2","volume-title":"Elements of Information Theory","author":"Cover Thomas M.","year":"1999","unstructured":"Thomas M. Cover. 1999. Elements of Information Theory. 
John Wiley & Sons."},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.mechmachtheory.2007.03.003"},{"key":"e_1_3_2_14_2","first-page":"2339","volume-title":"Proceedings of the IEEE International Conference on Robotics and Automation","author":"Dragan Anca D.","year":"2015","unstructured":"Anca D. Dragan, Katharina Muelling, J. Andrew Bagnell, and Siddhartha S. Srinivasa. 2015. Movement primitives via optimization. In Proceedings of the IEEE International Conference on Robotics and Automation. 2339\u20132346."},{"key":"e_1_3_2_15_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Fu Justin","year":"2018","unstructured":"Justin Fu, Katie Luo, and Sergey Levine. 2018. Learning robust rewards with adversarial inverse reinforcement learning. In Proceedings of the International Conference on Learning Representations."},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1137\/S0036144504446096"},{"key":"e_1_3_2_17_2","first-page":"3356","volume-title":"Proceedings of the IEEE\/RSJ International Conference on Intelligent Robots and Systems","author":"Haddadin Sami","year":"2008","unstructured":"Sami Haddadin, Alin Albu-Schaffer, Alessandro De Luca, and Gerd Hirzinger. 2008. Collision detection and reaction: A contribution to safe physical human\u2013robot interaction. In Proceedings of the IEEE\/RSJ International Conference on Intelligent Robots and Systems. 3356\u20133363."},{"key":"e_1_3_2_18_2","doi-asserted-by":"crossref","unstructured":"Sami Haddadin and Elizabeth Croft. 2016. Physical Human\u2013Robot Interaction. In Springer Handbook of Robotics Bruno Siciliano and Oussama Khatib (Eds.). 
Springer International Publishing Cham 1835\u20131874.","DOI":"10.1007\/978-3-319-32552-1_69"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2021.3064500"},{"key":"e_1_3_2_20_2","first-page":"304","volume-title":"Proceedings of the American Control Conference","author":"Hogan Neville","year":"1984","unstructured":"Neville Hogan. 1984. Impedance control: An approach to manipulation. In Proceedings of the American Control Conference. 304\u2013313."},{"key":"e_1_3_2_21_2","volume-title":"Proceedings of the Conference on Robot Learning","author":"Hoque Ryan","year":"2021","unstructured":"Ryan Hoque, Ashwin Balakrishna, Ellen Novoseller, Albert Wilcox, Daniel S. Brown, and Ken Goldberg. 2021. ThriftyDAgger: Budget-aware novelty and risk gating for interactive imitation learning. In Proceedings of the Conference on Robot Learning."},{"key":"e_1_3_2_22_2","first-page":"7674","volume-title":"Proceedings of the IEEE\/RSJ International Conference on Intelligent Robots and Systems","author":"Howell Taylor A.","year":"2019","unstructured":"Taylor A. Howell, Brian E. Jackson, and Zachary Manchester. 2019. ALTRO: A fast solver for constrained trajectory optimization. In Proceedings of the IEEE\/RSJ International Conference on Intelligent Robots and Systems. 7674\u20137679."},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.5555\/3327757.3327897"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1177\/0278364915581193"},{"key":"e_1_3_2_25_2","first-page":"4415","volume-title":"Proceedings of the 34th International Conference on Neural Information Processing Systems","author":"Jeon Hong Jun","year":"2020","unstructured":"Hong Jun Jeon, Smitha Milli, and Anca Dragan. 2020. Reward-rational (implicit) choice: A unifying formalism for reward learning. In Proceedings of the 34th International Conference on Neural Information Processing Systems. 
4415\u20134426."},{"key":"e_1_3_2_26_2","first-page":"1331","volume-title":"Proceedings of the IEEE International Conference on Robotics and Automation","author":"Kalakrishnan Mrinal","year":"2013","unstructured":"Mrinal Kalakrishnan, Peter Pastor, Ludovic Righetti, and Stefan Schaal. 2013. Learning objective functions for manipulation. In Proceedings of the IEEE International Conference on Robotics and Automation. 1331\u20131336."},{"key":"e_1_3_2_27_2","first-page":"8077","volume-title":"Proceedings of the International Conference on Robotics and Automation","author":"Kelly Michael","year":"2019","unstructured":"Michael Kelly, Chelsea Sidrane, Katherine Driggs-Campbell, and Mykel J. Kochenderfer. 2019. HG-DAgger: Interactive imitation learning with human experts. In Proceedings of the International Conference on Robotics and Automation. 8077\u20138083."},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10514-018-9764-z"},{"key":"e_1_3_2_29_2","unstructured":"Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)."},{"key":"e_1_3_2_30_2","first-page":"6152","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Lee Kimin","year":"2021","unstructured":"Kimin Lee, Laura M. Smith, and Pieter Abbeel. 2021. PEBBLE: Feedback-efficient interactive reinforcement learning via relabeling experience and unsupervised pre-training. In Proceedings of the International Conference on Machine Learning. 6152\u20136163."},{"key":"e_1_3_2_31_2","first-page":"2877","volume-title":"Proceedings of the IEEE International Conference on Robotics and Automation","author":"Li Mengxi","year":"2021","unstructured":"Mengxi Li, Alper Canberk, Dylan P. Losey, and Dorsa Sadigh. 2021. Learning human objectives from sequences of physical corrections. In Proceedings of the IEEE International Conference on Robotics and Automation. 
2877\u20132883."},{"issue":"1","key":"e_1_3_2_32_2","doi-asserted-by":"crossref","first-page":"36","DOI":"10.1038\/s42256-018-0010-3","article-title":"Differential game theory for versatile physical human\u2013robot interaction","volume":"1","author":"Li Yanan","year":"2019","unstructured":"Yanan Li, Gerolamo Carboni, Franck Gonzalez, Domenico Campolo, and Etienne Burdet. 2019. Differential game theory for versatile physical human\u2013robot interaction. Nature Machine Intelligence 1, 1 (2019), 36\u201343.","journal-title":"Nature Machine Intelligence"},{"key":"e_1_3_2_33_2","doi-asserted-by":"crossref","unstructured":"Dylan P. Losey Andrea Bajcsy Marcia K. O\u2019Malley and Anca D. Dragan. 2022. Physical interaction as communication: Learning robot objectives online from human corrections. The International Journal of Robotics Research 41 1 (2022) 20\u201344.","DOI":"10.1177\/02783649211050958"},{"key":"e_1_3_2_34_2","doi-asserted-by":"crossref","unstructured":"Dylan P. Losey Craig G. McDonald Edoardo Battaglia and Marcia K. O\u2019Malley. 2018. A review of intent detection arbitration and communication aspects of shared control for physical human\u2013robot interaction. Applied Mechanics Reviews 70 1 (2018) 010804.","DOI":"10.1115\/1.4039225"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/TRO.2017.2765335"},{"issue":"1","key":"e_1_3_2_36_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3354139","article-title":"Learning the correct robot trajectory in real-time from physical human interactions","volume":"9","author":"Losey Dylan P.","year":"2019","unstructured":"Dylan P. Losey and Marcia K. O\u2019Malley. 2019. Learning the correct robot trajectory in real-time from physical human interactions. 
ACM Transactions on Human-Robot Interaction 9, 1 (2019), 1\u201319.","journal-title":"ACM Transactions on Human-Robot Interaction"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.5555\/2981780.2981903"},{"key":"e_1_3_2_38_2","volume-title":"Individual Choice Behavior: A Theoretical Analysis","author":"Luce R. Duncan","year":"2012","unstructured":"R. Duncan Luce. 2012. Individual Choice Behavior: A Theoretical Analysis. Courier Corporation."},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1177\/0278364912455366"},{"key":"e_1_3_2_40_2","doi-asserted-by":"crossref","unstructured":"Selma Musi\u0107 and Sandra Hirche. 2017. Control sharing in human-robot team interaction. Annual Reviews in Control 44 (2017) 342\u2013354.","DOI":"10.1016\/j.arcontrol.2017.09.017"},{"key":"e_1_3_2_41_2","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Ng Andrew Y.","year":"2000","unstructured":"Andrew Y. Ng and Stuart J. Russell. 2000. Algorithms for inverse reinforcement learning. In Proceedings of the International Conference on Machine Learning."},{"key":"e_1_3_2_42_2","doi-asserted-by":"crossref","unstructured":"Takayuki Osa Joni Pajarinen Gerhard Neumann J. Andrew Bagnell Pieter Abbeel Jan Peters and others. 2018. An algorithmic perspective on imitation learning. Foundations and Trends\u00ae in Robotics 7 1\u20132 (2018) 1\u2013179.","DOI":"10.1561\/2300000053"},{"key":"e_1_3_2_43_2","first-page":"627","volume-title":"Proceedings of the International Conference on Artificial Intelligence and Statistics","author":"Ross St\u00e9phane","year":"2011","unstructured":"St\u00e9phane Ross, Geoffrey Gordon, and Drew Bagnell. 2011. A reduction of imitation learning and structured prediction to no-regret online learning. In Proceedings of the International Conference on Artificial Intelligence and Statistics. 
627\u2013635."},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.1109\/HRI53351.2022.9889616"},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.1145\/3371382.3380739"},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1177\/0278364914528132"},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","DOI":"10.1007\/BF02288967"},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10514-021-10006-9"},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10514-018-9757-y"},{"key":"e_1_3_2_50_2","volume-title":"Proceedings of the Conference on Neural Information Processing Systems","author":"Zhang Songyuan","year":"2021","unstructured":"Songyuan Zhang, Zhangjie Cao, Dorsa Sadigh, and Yanan Sui. 2021. Confidence-aware imitation learning from demonstrations with varying optimality. In Proceedings of the Conference on Neural Information Processing Systems."},{"key":"e_1_3_2_51_2","first-page":"1433","volume-title":"Proceedings of the 23rd National Conference on Artificial Intelligence","author":"Ziebart Brian D.","year":"2008","unstructured":"Brian D. Ziebart, Andrew L. Maas, J. Andrew Bagnell, and Anind K. Dey. 2008. Maximum entropy inverse reinforcement learning. In Proceedings of the 23rd National Conference on Artificial Intelligence. 
1433\u20131438."}],"container-title":["ACM Transactions on Human-Robot Interaction"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3623384","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3623384","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:36:26Z","timestamp":1750178186000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3623384"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,8,26]]},"references-count":50,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2024,9,30]]}},"alternative-id":["10.1145\/3623384"],"URL":"https:\/\/doi.org\/10.1145\/3623384","relation":{},"ISSN":["2573-9522"],"issn-type":[{"value":"2573-9522","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,8,26]]},"assertion":[{"value":"2022-06-14","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-08-11","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-08-26","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}