{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,24]],"date-time":"2025-12-24T12:42:23Z","timestamp":1766580143059,"version":"build-2065373602"},"reference-count":32,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2023,4,20]],"date-time":"2023-04-20T00:00:00Z","timestamp":1681948800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100000781","name":"European Research Council","doi-asserted-by":"publisher","award":["804907"],"award-info":[{"award-number":["804907"]}],"id":[{"id":"10.13039\/501100000781","id-type":"DOI","asserted-by":"publisher"}]},{"name":"European Space Agency","award":["804907"],"award-info":[{"award-number":["804907"]}]},{"name":"Honda Research Institute Europe","award":["804907"],"award-info":[{"award-number":["804907"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Robotics"],"abstract":"<jats:p>Humans often demonstrate diverse behaviors due to their personal preferences, for instance, related to their individual execution style or personal margin for safety. In this paper, we consider the problem of integrating both path and velocity preferences into trajectory planning for robotic manipulators. We first learn reward functions that represent the user path and velocity preferences from kinesthetic demonstration. We then optimize the trajectory in two steps, first the path and then the velocity, to produce trajectories that adhere to both task requirements and user preferences. We design a set of parameterized features that capture the fundamental preferences in a pick-and-place type of object transportation task, both in the shape and timing of the motion. We demonstrate that our method is capable of generalizing such preferences to new scenarios. We implement our algorithm on a Franka Emika 7-DoF robot arm and validate the functionality and flexibility of our approach in a user study. 
The results show that non-expert users are able to teach the robot their preferences with just a few iterations of feedback.<\/jats:p>","DOI":"10.3390\/robotics12020061","type":"journal-article","created":{"date-parts":[[2023,4,21]],"date-time":"2023-04-21T02:05:31Z","timestamp":1682042731000},"page":"61","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":11,"title":["An Incremental Inverse Reinforcement Learning Approach for Motion Planning with Separated Path and Velocity Preferences"],"prefix":"10.3390","volume":"12","author":[{"given":"Armin","family":"Avaei","sequence":"first","affiliation":[{"name":"Cognitive Robotics, Delft University of Technology, 2628 CD Delft, The Netherlands"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6062-3141","authenticated-orcid":false,"given":"Linda","family":"van der Spaa","sequence":"additional","affiliation":[{"name":"Cognitive Robotics, Delft University of Technology, 2628 CD Delft, The Netherlands"},{"name":"Honda Research Institute Europe, 63073 Offenbach am Main, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8696-3689","authenticated-orcid":false,"given":"Luka","family":"Peternel","sequence":"additional","affiliation":[{"name":"Cognitive Robotics, Delft University of Technology, 2628 CD Delft, The Netherlands"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7257-5434","authenticated-orcid":false,"given":"Jens","family":"Kober","sequence":"additional","affiliation":[{"name":"Cognitive Robotics, Delft University of Technology, 2628 CD Delft, The Netherlands"}]}],"member":"1968","published-online":{"date-parts":[[2023,4,20]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"1296","DOI":"10.1177\/0278364915581193","article-title":"Learning preferences for manipulation tasks from online coactive feedback","volume":"34","author":"Jain","year":"2015","journal-title":"Int. J. Robot. Res."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Duchaine, V., and Gosselin, C.M. (2007, January 22\u201324). General model of human\u2013robot cooperation using a novel velocity based variable impedance control. Proceedings of the Second Joint EuroHaptics Conference and Symp. on Haptic Interfaces for Virtual Environment and Teleoperator Systems, Tsukuba, Japan.","DOI":"10.1109\/WHC.2007.59"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"123","DOI":"10.1007\/s10514-013-9361-0","article-title":"Teaching robots to cooperate with humans in dynamic manipulation tasks based on multi-modal human-in-the-loop approach","volume":"36","author":"Peternel","year":"2014","journal-title":"Auton. Robot."},{"key":"ref_4","unstructured":"Bajcsy, A., Losey, D.P., O\u2019Malley, M.K., and Dragan, A.D. (2017, January 13\u201315). Learning robot objectives from physical human interaction. Proceedings of the Conference on Robot Learning, Mountain View, CA, USA."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3354139","article-title":"Learning the correct robot trajectory in real-time from physical human interactions","volume":"9","author":"Losey","year":"2019","journal-title":"ACM Trans. Hum.-Robot Interact."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"20","DOI":"10.1177\/02783649211050958","article-title":"Physical interaction as communication: Learning robot objectives online from human corrections","volume":"41","author":"Losey","year":"2022","journal-title":"Int. J. Robot. 
Res."},{"key":"ref_7","unstructured":"Ijspeert, A.J., Nakanishi, J., and Schaal, S. (2002, January 11\u201315). Movement imitation with nonlinear dynamical systems in humanoid robots. Proceedings of the IEEE International Conference on Robotics and Automation, Washington, DC, USA."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"1450","DOI":"10.1016\/j.robot.2013.07.009","article-title":"Interaction learning for dynamic movement primitives used in cooperative robotic tasks","volume":"61","author":"Kulvicius","year":"2013","journal-title":"Robot. Auton. Syst."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"340","DOI":"10.1016\/j.robot.2015.09.011","article-title":"Adaptation and coaching of periodic motion primitives through physical and visual interaction","volume":"75","author":"Gams","year":"2016","journal-title":"Robot. Auton. Syst."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"1023","DOI":"10.1007\/s10514-017-9676-3","article-title":"Human robot cooperation with compliance adaptation along the motion trajectory","volume":"42","author":"Nemec","year":"2018","journal-title":"Auton. Robot."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Ewerton, M., Maeda, G., Kollegger, G., Wiemeyer, J., and Peters, J. (2016, January 15\u201317). Incremental imitation learning of context-dependent motor skills. Proceedings of the 2016 IEEE-RAS 16th International Conference on Humanoid Robots (Humanoids), Cancun, Mexico.","DOI":"10.1109\/HUMANOIDS.2016.7803300"},{"key":"ref_12","first-page":"1","article-title":"A survey of preference-based reinforcement learning methods","volume":"18","author":"Wirth","year":"2017","journal-title":"J. Mach. Learn. Res."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Ratliff, N.D., Bagnell, J.A., and Zinkevich, M.A. (2006, January 25\u201329). Maximum margin planning. Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, PA, USA.","DOI":"10.1145\/1143844.1143936"},{"key":"ref_14","unstructured":"Ziebart, B.D., Maas, A.L., Bagnell, J.A., and Dey, A.K. (2008, January 13\u201317). Maximum entropy inverse reinforcement learning. Proceedings of the AAAI, Chicago, IL, USA."},{"key":"ref_15","first-page":"4415","article-title":"Reward-rational (implicit) choice: A unifying formalism for reward learning","volume":"33","author":"Jeon","year":"2020","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_16","unstructured":"Ibarz, B., Leike, J., Pohlen, T., Irving, G., Legg, S., and Amodei, D. (2018). Reward Learning from Human Preferences and Demonstrations in Atari. arXiv."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"45","DOI":"10.1177\/02783649211041652","article-title":"Learning reward functions from diverse sources of human feedback: Optimally integrating demonstrations and preferences","volume":"41","author":"Losey","year":"2022","journal-title":"Int. J. Robot. Res."},{"key":"ref_18","unstructured":"Kirby, R., Simmons, R., and Forlizzi, J. (October, January 27). Companion: A constraint-optimizing method for person-acceptable navigation. Proceedings of the 18th IEEE International Symp. on Robot and Human Interactive Communication, Toyama, Japan."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"1289","DOI":"10.1177\/0278364915619772","article-title":"Socially compliant mobile robot navigation via inverse reinforcement learning","volume":"35","author":"Kretzschmar","year":"2016","journal-title":"Int. J. Robot. 
Res."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1613\/jair.4539","article-title":"Coactive learning","volume":"53","author":"Shivaswamy","year":"2015","journal-title":"J. Artif. Intell. Res."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Palan, M., Landolfi, N.C., Shevchuk, G., and Sadigh, D. (2019). Learning reward functions by integrating human demonstrations and preferences. arXiv.","DOI":"10.15607\/RSS.2019.XV.023"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Fahad, M., Chen, Z., and Guo, Y. (2018, January 1\u20135). Learning how pedestrians navigate: A deep inverse reinforcement learning approach. Proceedings of the IEEE\/RSJ International Conference on Intelligent Robots and Systems, Madrid, Spain.","DOI":"10.1109\/IROS.2018.8593438"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"1164","DOI":"10.1177\/0278364913488805","article-title":"CHOMP: Covariant Hamiltonian Optimization for Motion Planning","volume":"32","author":"Zucker","year":"2013","journal-title":"Int. J. Robot. Res."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Vasquez, D., Okal, B., and Arras, K.O. (2014, January 14\u201318). Inverse reinforcement learning algorithms and features for robot navigation in crowds: An experimental comparison. Proceedings of the IEEE\/RSJ International Conference on Intelligent Robots and Systems, Chicago, IL, USA.","DOI":"10.1109\/IROS.2014.6942731"},{"key":"ref_25","unstructured":"(2021, August 03). MathWorks. Waypoint Trajectory Generator, 2018. Available online: https:\/\/www.mathworks.com\/help\/fusion\/ref\/waypointtrajectory-system-object.html."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"139","DOI":"10.1016\/S0166-4115(08)62386-9","article-title":"Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research","volume":"Volume 52","author":"Hart","year":"1988","journal-title":"Advances in Psychology"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"343","DOI":"10.1007\/s12369-012-0160-0","article-title":"Keyframe-based learning from demonstration","volume":"4","author":"Akgun","year":"2012","journal-title":"Int. J. Soc. Robot."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"263","DOI":"10.1177\/0278364912472380","article-title":"Learning to select and generalize striking movements in robot table tennis","volume":"32","author":"Kober","year":"2013","journal-title":"Int. J. Robot. Res."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"497","DOI":"10.1177\/02783649221078031","article-title":"Inducing structure in reward learning by learning features","volume":"41","author":"Bobu","year":"2022","journal-title":"Int. J. Robot. Res."},{"key":"ref_30","unstructured":"Katz, S.M., Maleki, A., B\u0131y\u0131k, E., and Kochenderfer, M.J. (2021). Preference-based learning of reward function features. arXiv."},{"key":"ref_31","unstructured":"Vadakkepat, P., and Goswami, A. (2019). Humanoid Robotics: A Reference, Springer."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"B\u0131y\u0131k, E., Huynh, N., Kochenderfer, M.J., and Sadigh, D. (2020). Active Preference-based Gaussian Process Regression for Reward Learning. 
arXiv.","DOI":"10.15607\/RSS.2020.XVI.041"}],"container-title":["Robotics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2218-6581\/12\/2\/61\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T19:20:12Z","timestamp":1760124012000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2218-6581\/12\/2\/61"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,4,20]]},"references-count":32,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2023,4]]}},"alternative-id":["robotics12020061"],"URL":"https:\/\/doi.org\/10.3390\/robotics12020061","relation":{},"ISSN":["2218-6581"],"issn-type":[{"type":"electronic","value":"2218-6581"}],"subject":[],"published":{"date-parts":[[2023,4,20]]}}}