{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,28]],"date-time":"2026-04-28T20:19:16Z","timestamp":1777407556978,"version":"3.51.4"},"reference-count":64,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2021,11,24]],"date-time":"2021-11-24T00:00:00Z","timestamp":1637712000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,11,24]],"date-time":"2021-11-24T00:00:00Z","timestamp":1637712000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Mach Learn"],"published-print":{"date-parts":[[2022,2]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Mastering robotic manipulation skills through reinforcement learning (RL) typically requires the design of shaped reward functions. Recent developments in this area have demonstrated that using sparse rewards, i.e. rewarding the agent only when the task has been successfully completed, can lead to better policies. However, state-action space exploration is more difficult in this case. Recent RL approaches to learning with sparse rewards have leveraged high-quality human demonstrations for the task, but these can be costly, time consuming or even impossible to obtain. In this paper, we propose a novel and effective approach that does not require human demonstrations. We observe that every robotic manipulation task could be seen as involving a locomotion task from the perspective of the object being manipulated, i.e. the object could learn how to reach a target state on its own. In order to exploit this idea, we introduce a framework whereby an object locomotion policy is initially obtained using a realistic physics simulator. 
This policy is then used to generate auxiliary rewards, called simulated locomotion demonstration rewards (SLDRs), which enable us to learn the robot manipulation policy. The proposed approach has been evaluated on 13 tasks of increasing complexity, and can achieve higher success rate and faster learning rates compared to alternative algorithms. SLDRs are especially beneficial for tasks like multi-object stacking and non-rigid object manipulation.<\/jats:p>","DOI":"10.1007\/s10994-021-06116-1","type":"journal-article","created":{"date-parts":[[2021,11,24]],"date-time":"2021-11-24T20:47:22Z","timestamp":1637786842000},"page":"465-486","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":23,"title":["Reinforcement learning for robotic manipulation using simulated locomotion demonstrations"],"prefix":"10.1007","volume":"111","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5269-2382","authenticated-orcid":false,"given":"Ozsel","family":"Kilinc","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Giovanni","family":"Montana","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2021,11,24]]},"reference":[{"key":"6116_CR1","doi-asserted-by":"crossref","unstructured":"Abbeel, P., & Ng, A.\u00a0Y. (2004). Apprenticeship learning via inverse reinforcement learning. In International conference on machine learning.","DOI":"10.1145\/1015330.1015430"},{"key":"6116_CR2","doi-asserted-by":"crossref","unstructured":"Andrychowicz, M., Baker, B., Chociej, M., Jozefowicz, R., McGrew, B., Pachocki, J., Petron, A., Plappert, M., Powell, G., Ray, A., et\u00a0al. (2018). Learning dexterous in-hand manipulation. 
arXiv preprint arXiv:1808.00177.","DOI":"10.1177\/0278364919887447"},{"key":"6116_CR3","unstructured":"Andrychowicz, M., Crow, D., Ray, A., Schneider, J., Fong, R., Welinder, P., McGrew, B., Tobin, J., Abbeel, P., & Zaremba, W. (2017). Hindsight experience replay. In Advances in neural information processing systems (pp. 5055\u20135065)."},{"key":"6116_CR4","doi-asserted-by":"crossref","unstructured":"Belter, D., Kopicki, M., Zurek, S., & Wyatt, J. (2014). Kinematically optimised predictions of object motion. In 2014 IEEE\/RSJ international conference on intelligent robots and systems (pp. 4422\u20134427). IEEE.","DOI":"10.1109\/IROS.2014.6943188"},{"key":"6116_CR5","unstructured":"Bojarski, M., Del Testa, D., Dworakowski, D., Firner, B., Flepp, B., Goyal, P., Jackel, L. D., Monfort, M., Muller, U., Zhang, J., Zhang, X., & Zhao, J. (2016). End to end learning for self-driving cars. arXiv preprint arXiv:1604.07316."},{"key":"6116_CR6","unstructured":"Brockman, G., Cheung, V., Pettersson, L., Schneider, J., Schulman, J., Tang, J., & Zaremba, W. (2016). OpenAI Gym. arXiv preprint arXiv:1606.01540."},{"key":"6116_CR7","unstructured":"Brown, D. S., Goo, W., Nagarajan, P., & Niekum, S. (2019). Extrapolating beyond suboptimal demonstrations via inverse reinforcement learning from observations. In International conference on learning representations."},{"key":"6116_CR8","unstructured":"Burda, Y., Edwards, H., Storkey, A., & Klimov, O. (2018a). Exploration by random network distillation. arXiv preprint arXiv:1810.12894."},{"key":"6116_CR9","unstructured":"Burda, Y., Edwards, H., Pathak, D., Storkey, A., Darrell, T., & Efros, A.\u00a0A. (2018b). Large-scale study of curiosity-driven learning. arXiv preprint arXiv:1808.04355."},{"key":"6116_CR10","doi-asserted-by":"crossref","first-page":"564","DOI":"10.1109\/TRO.2019.2891173","volume":"35","author":"S Choi","year":"2019","unstructured":"Choi, S., Lee, K., & Oh, S. (2019). 
Robust learning from demonstrations with mixed qualities using leveraged Gaussian processes. IEEE Transactions on Robotics, 35, 564\u2013576.","journal-title":"IEEE Transactions on Robotics"},{"key":"6116_CR11","unstructured":"Chua, K., Calandra, R., McAllister, R., & Levine, S. (2019). Deep reinforcement learning in a handful of trials using probabilistic dynamics models. In International conference on machine learning."},{"key":"6116_CR12","unstructured":"Coumans, E., & Bai, Y. (2017). PyBullet, a Python module for physics simulation in robotics, games and machine learning. https:\/\/pybullet.org."},{"key":"6116_CR13","unstructured":"Duan, Y., Andrychowicz, M., Stadie, B.\u00a0C., Ho, J., Schneider, J., Sutskever, I., Abbeel, P., & Zaremba, W. (2017). One-shot imitation learning. In Advances in neural information processing systems (pp. 1087\u20131098)."},{"key":"6116_CR14","unstructured":"Fang, M., Zhou, T., Du, Y., Han, L., & Zhang, Z. (2019). Curriculum-guided hindsight experience replay."},{"key":"6116_CR15","unstructured":"Finn, C., Levine, S., & Abbeel, P. (2016). Guided cost learning: Deep inverse optimal control via policy optimization. In International conference on machine learning (pp. 49\u201358)."},{"key":"6116_CR16","unstructured":"Florensa, C., Held, D., Geng, X., & Abbeel, P. (2018). Automatic goal generation for reinforcement learning agents. In International conference on machine learning (pp. 1514\u20131523)."},{"key":"6116_CR17","doi-asserted-by":"crossref","unstructured":"Fu, J., Levine, S., & Abbeel, P. (2016). One-shot learning of manipulation skills with online dynamics adaptation and neural network priors. In 2016 IEEE\/RSJ International conference on intelligent robots and systems (IROS) (pp. 4019\u20134026). IEEE.","DOI":"10.1109\/IROS.2016.7759592"},{"key":"6116_CR18","unstructured":"Gao, Y., Xu, H., Lin, J., Yu, F., Levine, S., & Darrell, T. (2018). Reinforcement learning from imperfect demonstrations. 
In International conference on learning representations."},{"key":"6116_CR19","doi-asserted-by":"crossref","unstructured":"Grollman, D.\u00a0H., & Billard, A. (2011). Donut as I Do: Learning from failed demonstrations. In IEEE international conference on robotics and automation.","DOI":"10.1109\/ICRA.2011.5979757"},{"key":"6116_CR20","doi-asserted-by":"crossref","unstructured":"Gu, S., Holly, E., Lillicrap, T.\u00a0P., & Levine, S. (2017). Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates. In International conference on robotics and automation (pp. 3389\u20133396).","DOI":"10.1109\/ICRA.2017.7989385"},{"key":"6116_CR21","unstructured":"Haarnoja, T., Zhou, A., Abbeel, P., & Levine, S. (2018). Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In International conference on machine learning."},{"key":"6116_CR22","doi-asserted-by":"crossref","unstructured":"Hester, T., et\u00a0al. (2018). Deep Q-learning from demonstrations. In AAAI Conference on artificial intelligence (pp. 3223\u20133230).","DOI":"10.1609\/aaai.v32i1.11757"},{"key":"6116_CR23","unstructured":"Ho, J., & Ermon, S. (2016). Generative adversarial imitation learning. In Advances in neural information processing systems (pp. 4565\u20134573)."},{"key":"6116_CR24","unstructured":"Houthooft, R., Chen, X., Duan, Y., Schulman, J., De Turck, F., & Abbeel, P. (2016). VIME: Variational information maximizing exploration. In Advances in neural information processing systems."},{"key":"6116_CR25","unstructured":"Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. CoRR. arXiv:1412.6980."},{"key":"6116_CR26","doi-asserted-by":"crossref","unstructured":"Koenig, N., & Howard, A. (2004). Design and use paradigms for Gazebo, an open-source multi-robot simulator. 
In IEEE\/RSJ international conference on intelligent robots and systems (IROS).","DOI":"10.1109\/IROS.2004.1389727"},{"key":"6116_CR27","unstructured":"Kopicki, M., Wyatt, J., & Stolkin, R. (2009). Prediction learning in robotic pushing manipulation. In 2009 International conference on advanced robotics (pp. 1\u20136). IEEE."},{"key":"6116_CR28","doi-asserted-by":"crossref","unstructured":"Kopicki, M., Zurek, S., Stolkin, R., M\u00f6rwald, T., & Wyatt, J. (2011). Learning to predict how rigid objects behave under simple manipulation. In 2011 IEEE international conference on robotics and automation (pp. 5722\u20135729). IEEE.","DOI":"10.1109\/ICRA.2011.5980295"},{"key":"6116_CR29","unstructured":"Kroemer, O., Niekum, S., & Konidaris, G. (2019). A review of robot learning for manipulation: Challenges, representations, and algorithms. arXiv preprint arXiv:1907.03146."},{"issue":"1","key":"6116_CR30","first-page":"1334","volume":"17","author":"S Levine","year":"2016","unstructured":"Levine, S., Finn, C., Darrell, T., & Abbeel, P. (2016). End-to-end training of deep visuomotor policies. The Journal of Machine Learning Research, 17(1), 1334\u20131373.","journal-title":"The Journal of Machine Learning Research"},{"key":"6116_CR31","unstructured":"Li, Y., Wu, J., Tedrake, R., Tenenbaum, J.\u00a0B., & Torralba, A. (2018). Learning particle dynamics for manipulating rigid bodies, deformable objects, and fluids. arXiv preprint arXiv:1810.01566."},{"key":"6116_CR32","unstructured":"Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., & Wierstra, D. (2015). Continuous control with deep reinforcement learning. CoRR. arXiv:1509.02971."},{"issue":"7540","key":"6116_CR33","doi-asserted-by":"crossref","first-page":"529","DOI":"10.1038\/nature14236","volume":"518","author":"V Mnih","year":"2015","unstructured":"Mnih, V., et al. (2015). Human-level control through deep reinforcement learning. 
Nature, 518(7540), 529\u2013533.","journal-title":"Nature"},{"key":"6116_CR34","unstructured":"Mnih, V., Badia, A.\u00a0P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., Silver, D., & Kavukcuoglu, K. (2016). Asynchronous methods for deep reinforcement learning. In International conference on machine learning (pp. 1928\u20131937)."},{"key":"6116_CR35","doi-asserted-by":"crossref","unstructured":"Nair, A., McGrew, B., Andrychowicz, M., Zaremba, W., & Abbeel, P. (2018a). Overcoming exploration in reinforcement learning with demonstrations. In International conference on robotics and automation (pp. 6292\u20136299).","DOI":"10.1109\/ICRA.2018.8463162"},{"key":"6116_CR36","unstructured":"Nair, A., Pong, V., Dalal, M., Bahl, S., Lin, S., & Levine, S. (2018b). Visual reinforcement learning with imagined goals. In Advances in neural information processing systems (pp. 9209\u20139220)."},{"key":"6116_CR37","unstructured":"Ng, A.\u00a0Y., & Russell, S.\u00a0J. (2000). Algorithms for inverse reinforcement learning. In International conference on machine learning (pp. 663\u2013670)."},{"key":"6116_CR38","doi-asserted-by":"crossref","unstructured":"Pathak, D., Agrawal, P., Efros, A.\u00a0A., & Darrell, T. (2017). Curiosity-driven exploration by self-supervised prediction. In IEEE Conference on computer vision and pattern recognition workshops (pp. 16\u201317).","DOI":"10.1109\/CVPRW.2017.70"},{"key":"6116_CR39","unstructured":"Plappert, M., et al. (2018). Multi-goal reinforcement learning: Challenging robotics environments and request for research. CoRR. arXiv:1802.09464."},{"key":"6116_CR40","unstructured":"Pomerleau, D. (1988). ALVINN: An autonomous land vehicle in a neural network. In Advances in neural information processing systems (pp. 305\u2013313)."},{"key":"6116_CR41","unstructured":"Popov, I., Heess, N., Lillicrap, T. P., Hafner, R., Barth-Maron, G., Vecerik, M., Lampe, T., Tassa, Y., Erez, T., & Riedmiller, M. A. (2017). 
Data-efficient deep reinforcement learning for dexterous manipulation. CoRR. arXiv:1704.03073."},{"key":"6116_CR42","doi-asserted-by":"crossref","unstructured":"Ratliff, N.\u00a0D., Bagnell, J.\u00a0A., & Srinivasa, S.\u00a0S. (2007). Imitation learning for locomotion and manipulation. In International conference on humanoid robots (pp. 392\u2013397).","DOI":"10.21236\/ADA528601"},{"key":"6116_CR43","unstructured":"Reddy, S., Dragan, A.\u00a0D., & Levine, S. (2019). SQIL: Imitation learning via regularized behavioral cloning. arXiv preprint arXiv:1905.11108."},{"key":"6116_CR44","unstructured":"Riedmiller, M. A., et al. (2018). Learning by playing: Solving sparse reward tasks from scratch. In International conference on machine learning (pp. 4341\u20134350)."},{"key":"6116_CR45","unstructured":"Ross, S., Gordon, G.\u00a0J., & Bagnell, D. (2011). A reduction of imitation learning and structured prediction to no-regret online learning. In International conference on artificial intelligence and statistics (pp. 627\u2013635)."},{"key":"6116_CR46","unstructured":"Sasaki, F., Yohira, T., & Kawaguchi, A. (2019). Sample efficient imitation learning for continuous control. In International conference on learning representations."},{"key":"6116_CR47","unstructured":"Savinov, N., Raichuk, A., Marinier, R., Vincent, D., Pollefeys, M., Lillicrap, T., & Gelly, S. (2018). Episodic curiosity through reachability. arXiv preprint arXiv:1810.02274."},{"key":"6116_CR48","unstructured":"Schaul, T., Horgan, D., Gregor, K., & Silver, D. (2015). Universal value function approximators. In Proceedings of the 32nd international conference on machine learning, ICML 2015, Lille, France, 6\u201311 July 2015 (pp. 1312\u20131320)."},{"key":"6116_CR49","doi-asserted-by":"crossref","unstructured":"Schmidhuber, J. (1991). Curious model-building control systems. In IEEE international joint conference on neural networks (pp. 1458\u20131463). 
IEEE.","DOI":"10.1109\/IJCNN.1991.170605"},{"key":"6116_CR50","unstructured":"Schulman, J., Levine, S., Abbeel, P., Jordan, M., & Moritz, P. (2015). Trust region policy optimization. In International conference on machine learning (pp. 1889\u20131897)."},{"key":"6116_CR51","unstructured":"Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347."},{"key":"6116_CR52","unstructured":"Shiarlis, K., Messias, J., & Whiteson, S. (2016). Inverse reinforcement learning from failure. In International conference on autonomous agents & multiagent systems."},{"key":"6116_CR53","unstructured":"Silver, D., Lever, G., Heess, N., Degris, T., Wierstra, D., & Riedmiller, M. A. (2014). Deterministic policy gradient algorithms. In International conference on machine learning (pp. 387\u2013395)."},{"key":"6116_CR54","doi-asserted-by":"crossref","first-page":"484","DOI":"10.1038\/nature16961","volume":"529","author":"D Silver","year":"2016","unstructured":"Silver, D., et al. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), 484\u2013489.","journal-title":"Nature"},{"key":"6116_CR55","unstructured":"Sukhbaatar, S., Lin, Z., Kostrikov, I., Synnaeve, G., Szlam, A., & Fergus, R. (2018). Intrinsic motivation and automatic curricula via asymmetric self-play. In International conference on learning representations."},{"key":"6116_CR56","unstructured":"Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction (2nd ed.). MIT Press."},{"key":"6116_CR57","doi-asserted-by":"crossref","unstructured":"Todorov, E., Erez, T., & Tassa, Y. (2012). MuJoCo: A physics engine for model-based control. In 2012 IEEE\/RSJ international conference on intelligent robots and systems (pp. 5026\u20135033). IEEE.","DOI":"10.1109\/IROS.2012.6386109"},{"key":"6116_CR58","unstructured":"Vecerik, M., et al. (2017). 
Leveraging demonstrations for deep reinforcement learning on robotics problems with sparse rewards. CoRR. arXiv:1707.08817."},{"key":"6116_CR59","unstructured":"Wang, Z., Merel, J., Reed, S., Wayne, G., de Freitas, N., & Heess, N. (2017). Robust imitation of diverse behaviors. arXiv preprint arXiv:1707.02747."},{"key":"6116_CR60","first-page":"279","volume":"8","author":"CJCH Watkins","year":"1992","unstructured":"Watkins, C. J. C. H., & Dayan, P. (1992). Technical note: Q-learning. Machine Learning, 8, 279\u2013292.","journal-title":"Machine Learning"},{"key":"6116_CR61","doi-asserted-by":"crossref","unstructured":"Xu, H., Gao, Y., Yu, F., & Darrell, T. (2017). End-to-end learning of driving models from large-scale video datasets. In IEEE Conference on computer vision and pattern recognition.","DOI":"10.1109\/CVPR.2017.376"},{"key":"6116_CR62","unstructured":"Zhang, M., Vikram, S., Smith, L., Abbeel, P., Johnson, M., & Levine, S. (2019). SOLAR: Deep structured representations for model-based reinforcement learning. In International conference on machine learning."},{"key":"6116_CR63","doi-asserted-by":"crossref","unstructured":"Zheng, J., Liu, S., & Ni, L. M. (2014). Robust Bayesian inverse reinforcement learning with sparse behavior noise. In AAAI conference on artificial intelligence.","DOI":"10.1609\/aaai.v28i1.8979"},{"key":"6116_CR64","doi-asserted-by":"crossref","unstructured":"Zhu, H., Gupta, A., Rajeswaran, A., Levine, S., & Kumar, V. (2018). Dexterous manipulation with deep reinforcement learning: Efficient, general, and low-cost. 
arXiv preprint arXiv:1810.06045.","DOI":"10.1109\/ICRA.2019.8794102"}],"container-title":["Machine Learning"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10994-021-06116-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10994-021-06116-1\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10994-021-06116-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,16]],"date-time":"2023-01-16T05:40:22Z","timestamp":1673847622000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10994-021-06116-1"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,11,24]]},"references-count":64,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2022,2]]}},"alternative-id":["6116"],"URL":"https:\/\/doi.org\/10.1007\/s10994-021-06116-1","relation":{},"ISSN":["0885-6125","1573-0565"],"issn-type":[{"value":"0885-6125","type":"print"},{"value":"1573-0565","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,11,24]]},"assertion":[{"value":"31 January 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"4 August 2021","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"25 October 2021","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"24 November 2021","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}