{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,2]],"date-time":"2026-05-02T09:59:12Z","timestamp":1777715952084,"version":"3.51.4"},"reference-count":77,"publisher":"SAGE Publications","issue":"10","license":[{"start":{"date-parts":[[2023,6,26]],"date-time":"2023-06-26T00:00:00Z","timestamp":1687737600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["The International Journal of Robotics Research"],"published-print":{"date-parts":[[2023,9]]},"abstract":"<jats:p>In this work, we introduce composable energy policies (CEP), a novel framework for multi-objective motion generation. We frame the problem of composing multiple policy components from a probabilistic view. We consider a set of stochastic policies represented in arbitrary task spaces, where each policy represents a distribution of the actions to solve a particular task. Then, we aim to find the action in the configuration space that optimally satisfies all the policy components. The presented framework allows the fusion of motion generators from different sources: optimal control, data-driven policies, motion planning, and handcrafted policies. Classically, the problem of multi-objective motion generation is solved by the composition of a set of deterministic policies, rather than stochastic policies. However, there are common situations where different policy components have conflicting behaviors, leading to oscillations or the robot getting stuck in an undesirable state. While our approach is not directly able to solve the conflicting policies problem, we claim that modeling each policy as a stochastic policy allows more expressive representations for each component in contrast with the classical reactive motion generation approaches. In some tasks, such as reaching a target in a cluttered environment, we show experimentally that CEP additional expressivity allows us to model policies that reduce these conflicting behaviors. A field that benefits from these reactive motion generators is the one of robot reinforcement learning. Integrating these policy architectures with reinforcement learning allows us to include a set of inductive biases in the learning problem. These inductive biases guide the reinforcement learning agent towards informative regions or improve collision safety while exploring. In our work, we show how to integrate our proposed reactive motion generator as a structured policy for reinforcement learning. Combining the reinforcement learning agent exploration with the prior-based CEP, we can improve the learning performance and explore safer.<\/jats:p>","DOI":"10.1177\/02783649231179499","type":"journal-article","created":{"date-parts":[[2023,6,26]],"date-time":"2023-06-26T18:49:31Z","timestamp":1687805371000},"page":"827-858","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":5,"title":["Composable energy policies for reactive motion generation and reinforcement learning"],"prefix":"10.1177","volume":"42","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1135-6654","authenticated-orcid":false,"given":"Julen","family":"Urain","sequence":"first","affiliation":[{"name":"Computer Science Department, Institute for Intelligent Autonomous Systems, TU Darmstadt, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8580-4750","authenticated-orcid":false,"given":"Anqi","family":"Li","sequence":"additional","affiliation":[{"name":"Robot Learning Lab, University of Washington, Seattle, WA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Puze","family":"Liu","sequence":"additional","affiliation":[{"name":"Computer Science Department, Institute for Intelligent Autonomous Systems, TU Darmstadt, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Carlo","family":"D\u2019Eramo","sequence":"additional","affiliation":[{"name":"Institute of Computer Science, University of W\u00fcrzburg, Germany"},{"name":"Hessian.AI, The Hessian Center for Artificial Intelligence, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jan","family":"Peters","sequence":"additional","affiliation":[{"name":"Computer Science Department, Institute for Intelligent Autonomous Systems, TU Darmstadt, Germany"},{"name":"Hessian.AI, The Hessian Center for Artificial Intelligence, Germany"},{"name":"Research Department: Systems AI for Robot LearningGerman Research Center for AI (DFKI), Darmstadt, Germany"},{"name":"Centre for Cognitive Science, Darmstadt, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"179","published-online":{"date-parts":[[2023,6,26]]},"reference":[{"key":"bibr1-02783649231179499","first-page":"2040","volume-title":"Conference on Robot Learning","author":"Aljalbout E","year":"2021"},{"key":"bibr2-02783649231179499","first-page":"9","volume-title":"International Workshop on Artificial Intelligence and Statistics","author":"Attias H","year":"2003"},{"key":"bibr3-02783649231179499","unstructured":"Bahl S, Mukadam M, Gupta A, et al. (2020) Neural dynamic policies for End-To-End sensorimotor learning.\n                      arXiv preprint arXiv:2012.02788\n                      ."},{"key":"bibr4-02783649231179499","first-page":"750","volume-title":"Proceedings of the 5th Conference on Robot Learning, Proceedings of Machine Learning Research","volume":"164","author":"Bhardwaj M","year":"2022"},{"key":"bibr5-02783649231179499","first-page":"35","volume-title":"Handbook of Statistics","volume":"31","author":"Botev ZI","year":"2013"},{"key":"bibr6-02783649231179499","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA48506.2021.9561320"},{"key":"bibr7-02783649231179499","doi-asserted-by":"publisher","DOI":"10.1109\/SII.2019.8700380"},{"key":"bibr8-02783649231179499","unstructured":"Cheng CA, Mukadam M, Issac J, et al. (2018) Rmpflow: a computational graph for automatic motion policy generation. In: International Workshop on the Algorithmic Foundations of Robotics, December 9\u201311, 2018, Universidad Polit\u00e9cnica de Yucat\u00e1n in M\u00e9rida, M\u00e9xico."},{"key":"bibr9-02783649231179499","doi-asserted-by":"publisher","DOI":"10.1145\/1531326.1531388"},{"key":"bibr10-02783649231179499","volume":"34","author":"Dalal M","year":"2021","journal-title":"Advances in Neural Information Processing Systems"},{"key":"bibr11-02783649231179499","first-page":"1","volume":"17","author":"Daniel C","year":"2016","journal-title":"Journal of Machine Learning Research"},{"key":"bibr12-02783649231179499","doi-asserted-by":"publisher","DOI":"10.1007\/s10479-005-5724-z"},{"key":"bibr13-02783649231179499","doi-asserted-by":"publisher","DOI":"10.1109\/CCA.2016.7587882"},{"issue":"131","key":"bibr14-02783649231179499","first-page":"1","volume":"22","author":"D\u2019Eramo C","year":"2021","journal-title":"Journal of Machine Learning Research"},{"key":"bibr15-02783649231179499","unstructured":"Du Y, Li S, Mordatch I (2020) Compositional visual generation and inference with energy based models. In: Conference on Neural Information Processing Systems, 6\u201312 December 2020."},{"key":"bibr16-02783649231179499","doi-asserted-by":"publisher","DOI":"10.1109\/HUMANOIDS.2013.7029990"},{"key":"bibr17-02783649231179499","first-page":"158","volume-title":"Conference on Robot Learning","author":"Florence P","year":"2022"},{"key":"bibr18-02783649231179499","doi-asserted-by":"publisher","DOI":"10.1109\/100.580977"},{"key":"bibr19-02783649231179499","first-page":"1587","volume-title":"International Conference on Machine Learning","author":"Fujimoto S","year":"2018"},{"key":"bibr20-02783649231179499","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2021.3129139"},{"key":"bibr21-02783649231179499","doi-asserted-by":"publisher","DOI":"10.1016\/0005-1098(89)90002-2"},{"key":"bibr22-02783649231179499","doi-asserted-by":"publisher","DOI":"10.1023\/A:1020564024509"},{"key":"bibr23-02783649231179499","volume-title":"Elementary Principles in Statistical Mechanics: Developed with Especial Reference to the Rational Foundations of Thermodynamics","author":"Gibbs JW","year":"1902"},{"key":"bibr24-02783649231179499","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2018.8460756"},{"key":"bibr25-02783649231179499","first-page":"1352","volume-title":"International Conference on Machine Learning","author":"Haarnoja T","year":"2017"},{"key":"bibr26-02783649231179499","first-page":"1861","volume-title":"International Conference on Machine Learning","author":"Haarnoja T","year":"2018"},{"key":"bibr27-02783649231179499","doi-asserted-by":"publisher","DOI":"10.1162\/089976602760128018"},{"key":"bibr28-02783649231179499","doi-asserted-by":"publisher","DOI":"10.3390\/app9020348"},{"key":"bibr29-02783649231179499","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRev.106.620"},{"key":"bibr30-02783649231179499","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2019.8794127"},{"key":"bibr31-02783649231179499","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2011.5980391"},{"key":"bibr32-02783649231179499","doi-asserted-by":"publisher","DOI":"10.1177\/0278364913484072"},{"key":"bibr33-02783649231179499","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2011.5980280"},{"key":"bibr34-02783649231179499","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2018.2795645"},{"key":"bibr35-02783649231179499","doi-asserted-by":"publisher","DOI":"10.1109\/70.508439"},{"key":"bibr36-02783649231179499","doi-asserted-by":"publisher","DOI":"10.1109\/ROBOT.1985.1087247"},{"key":"bibr37-02783649231179499","doi-asserted-by":"publisher","DOI":"10.1109\/JRA.1987.1087068"},{"key":"bibr38-02783649231179499","first-page":"1278","volume-title":"Conference on Robot Learning","author":"Lambert A","year":"2021"},{"key":"bibr39-02783649231179499","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511546877"},{"key":"bibr40-02783649231179499","first-page":"293","volume":"5","author":"LaValle SM","year":"2001","journal-title":"Algorithmic and Computational Robotics: New Directions"},{"key":"bibr41-02783649231179499","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2019.8793485"},{"key":"bibr42-02783649231179499","unstructured":"Levine S (2018) Reinforcement learning and control as probabilistic inference: tutorial and review.\n                      arXiv preprint arXiv:1805.00909\n                      ."},{"key":"bibr43-02783649231179499","unstructured":"Li A, Cheng CA, Rana MA, et al. (2021) RMP2: a structured composable policy class for robot learning. In: Robotics: Science and Systems (R:SS), Jul 12\u201316 2021, Virtual."},{"key":"bibr44-02783649231179499","unstructured":"Lillicrap TP, Hunt JJ, Pritzel A, et al. (2015) Continuous control with deep reinforcement learning.\n                      arXiv preprint arXiv:1509.02971\n                      ."},{"key":"bibr45-02783649231179499","first-page":"7224","volume-title":"International Conference on Machine Learning","author":"Lutter M","year":"2021"},{"key":"bibr46-02783649231179499","doi-asserted-by":"publisher","DOI":"10.1016\/S0098-1354(98)00301-9"},{"key":"bibr47-02783649231179499","doi-asserted-by":"publisher","DOI":"10.1177\/0278364918790369"},{"key":"bibr48-02783649231179499","doi-asserted-by":"publisher","DOI":"10.1177\/0278364912472380"},{"key":"bibr49-02783649231179499","doi-asserted-by":"publisher","DOI":"10.1016\/j.automatica.2003.11.005"},{"key":"bibr50-02783649231179499","unstructured":"Paraschos A, Daniel C, Peters JR, et al. (2013) Probabilistic movement primitives. In: Advances in Neural Information Processing Systems, Dec 5\u201310 2013, Harrahs and Harveys, Lake Tahoe US, pp. 2616\u20132624."},{"key":"bibr51-02783649231179499","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00025"},{"key":"bibr52-02783649231179499","volume":"32","author":"Peng XB","year":"2019","journal-title":"Advances in Neural Information Processing Systems"},{"key":"bibr53-02783649231179499","first-page":"188","volume-title":"Conference on Robot Learning","author":"Pertsch K","year":"2021"},{"key":"bibr54-02783649231179499","doi-asserted-by":"publisher","DOI":"10.1145\/1273496.1273590"},{"key":"bibr55-02783649231179499","doi-asserted-by":"publisher","DOI":"10.1109\/AMC.2000.862901"},{"key":"bibr56-02783649231179499","doi-asserted-by":"publisher","DOI":"10.1109\/ROBOT.2009.5152817"},{"key":"bibr57-02783649231179499","unstructured":"Ratliff ND, Issac J, Kappler D, et al. (2018) Riemannian motion policies.\n                      arXiv preprint arXiv:1801.02854\n                      ."},{"key":"bibr58-02783649231179499","unstructured":"Ratliff ND, Van Wyk K, Xie M, et al. (2020) Optimization fabrics.\n                      arXiv preprint arXiv:2008.02399\n                      ."},{"key":"bibr59-02783649231179499","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA48506.2021.9561543"},{"key":"bibr60-02783649231179499","author":"Rawlik K","year":"2012","journal-title":"Proceedings of Robotics: Science and Systems"},{"key":"bibr61-02783649231179499","doi-asserted-by":"publisher","DOI":"10.15607\/RSS.2020.XVI.072"},{"key":"bibr62-02783649231179499","doi-asserted-by":"publisher","DOI":"10.1177\/0278364914528132"},{"key":"bibr63-02783649231179499","unstructured":"Schulman J, Wolski F, Dhariwal P, et al. (2017) Proximal policy optimization algorithms.\n                      arXiv preprint arXiv:1707.06347\n                      ."},{"key":"bibr64-02783649231179499","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA46639.2022.9811986"},{"key":"bibr65-02783649231179499","unstructured":"Silver T, Allen K, Tenenbaum J, et al. (2018) Residual policy learning.\n                      arXiv preprint arXiv:1812.06298\n                      ."},{"key":"bibr66-02783649231179499","unstructured":"Sola J, Deray J, Atchuthan D (2018) A micro lie theory for state estimation in robotics.\n                      arXiv preprint arXiv:1812.01537\n                      ."},{"key":"bibr67-02783649231179499","doi-asserted-by":"publisher","DOI":"10.1016\/S0004-3702(99)00052-1"},{"key":"bibr68-02783649231179499","unstructured":"Tasse GN, James S, Rosman B (2020) A boolean task algebra for reinforcement learning. In: Conference on Neural Information Processing Systems, December 6\u201312, 2020, Virtual."},{"key":"bibr69-02783649231179499","first-page":"1856","volume":"22","author":"Todorov E","year":"2009","journal-title":"Advances in Neural Information Processing Systems"},{"key":"bibr70-02783649231179499","doi-asserted-by":"publisher","DOI":"10.1145\/1553374.1553508"},{"key":"bibr71-02783649231179499","doi-asserted-by":"publisher","DOI":"10.1109\/IROS45743.2020.9341035"},{"key":"bibr72-02783649231179499","doi-asserted-by":"publisher","DOI":"10.15607\/RSS.2021.XVII.052"},{"key":"bibr73-02783649231179499","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-19457-3_1"},{"key":"bibr74-02783649231179499","unstructured":"Van Niekerk B, James S, Earle A, et al. (2019) Composing value functions in reinforcement learning. In: International Conference on Machine Learning, 9\u201315 June 2019, Long Beach, California, USA, pp. 6401\u20136409."},{"key":"bibr75-02783649231179499","doi-asserted-by":"publisher","DOI":"10.2514\/1.G001921"},{"key":"bibr76-02783649231179499","unstructured":"Xie M, Van Wyk K, Li A, et al. (2020) Geometric fabrics for the acceleration-based design of robotic motion.\n                      arXiv preprint arXiv:2010.14750\n                      ."},{"key":"bibr77-02783649231179499","unstructured":"Ziebart BD, Bagnell JA, Dey AK (2010) Modeling interaction via the principle of maximum causal entropy. In: International Conference on Machine Learning, 21\u201324 June 2010, Haifa, Israel."}],"container-title":["The International Journal of Robotics Research"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/02783649231179499","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/02783649231179499","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/02783649231179499","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T10:16:58Z","timestamp":1777457818000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/02783649231179499"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,6,26]]},"references-count":77,"journal-issue":{"issue":"10","published-print":{"date-parts":[[2023,9]]}},"alternative-id":["10.1177\/02783649231179499"],"URL":"https:\/\/doi.org\/10.1177\/02783649231179499","relation":{},"ISSN":["0278-3649","1741-3176"],"issn-type":[{"value":"0278-3649","type":"print"},{"value":"1741-3176","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,6,26]]}}}