{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T18:21:03Z","timestamp":1773858063292,"version":"3.50.1"},"reference-count":43,"publisher":"Springer Science and Business Media LLC","issue":"5","license":[{"start":{"date-parts":[[2023,2,6]],"date-time":"2023-02-06T00:00:00Z","timestamp":1675641600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,2,6]],"date-time":"2023-02-06T00:00:00Z","timestamp":1675641600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Auton Robot"],"published-print":{"date-parts":[[2023,6]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>In order to provide adaptive and user-friendly solutions to robotic manipulation, it is important that the agent can learn to accomplish tasks even if they are only provided with very sparse instruction signals. To address the issues reinforcement learning algorithms face when task rewards are sparse, this paper proposes an intrinsic motivation approach that can be easily integrated into any standard reinforcement learning algorithm and can allow robotic manipulators to learn useful manipulation skills with only sparse extrinsic rewards. Through integrating and balancing empowerment and curiosity, this approach shows superior performance compared to other state-of-the-art intrinsic exploration approaches during extensive empirical testing. When combined with other strategies for tackling the exploration challenge, e.g. curriculum learning, our approach is able to further improve the exploration efficiency and task success rate. 
Qualitative analysis also shows that when combined with diversity-driven intrinsic motivations, this approach can help manipulators learn a set of diverse skills that could potentially be applied to other more complicated manipulation tasks and accelerate their learning process.<\/jats:p>","DOI":"10.1007\/s10514-023-10087-8","type":"journal-article","created":{"date-parts":[[2023,2,6]],"date-time":"2023-02-06T08:03:16Z","timestamp":1675670596000},"page":"617-633","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":9,"title":["An empowerment-based solution to robotic manipulation tasks with sparse rewards"],"prefix":"10.1007","volume":"47","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0924-3629","authenticated-orcid":false,"given":"Siyu","family":"Dai","sequence":"first","affiliation":[]},{"given":"Wei","family":"Xu","sequence":"additional","affiliation":[]},{"given":"Andreas","family":"Hofmann","sequence":"additional","affiliation":[]},{"given":"Brian","family":"Williams","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,2,6]]},"reference":[{"key":"10087_CR1","unstructured":"Andrychowicz, M., Wolski, F., Ray, A., Schneider, J., Fong, R., Welinder, P., McGrew, B., Tobin, J., Abbeel, P., & Zaremba, W. (2017). Hindsight experience replay. In Advances in Neural Information Processing Systems, pp. 5048\u20135058."},{"key":"10087_CR2","doi-asserted-by":"crossref","unstructured":"Bacon, P.-L., Harb, J., & Precup, D. (2017). The option-critic architecture. In Thirty-First AAAI Conference on Artificial Intelligence.","DOI":"10.1609\/aaai.v31i1.10916"},{"key":"10087_CR3","unstructured":"Belghazi, M.\u00a0I., Baratin, A., Rajeswar, S., Ozair, S., Bengio, Y., Courville, A., & Hjelm, R.\u00a0D. (2018). Mutual information neural estimation. In International Conference on Machine Learning, pp. 
531\u2013540."},{"key":"10087_CR4","unstructured":"Brockman, G., Cheung, V., Pettersson, L., Schneider, J., Schulman, J., Tang, J., & Zaremba, W. (2016). OpenAI Gym."},{"key":"10087_CR5","doi-asserted-by":"crossref","unstructured":"Chitnis, R., Kaelbling, L.\u00a0P., & Lozano-P\u00e9rez, T. (2019). Learning quickly to plan quickly using modular meta-learning. In 2019 IEEE International Conference on Robotics and Automation (ICRA), pp. 7865\u20137871. IEEE.","DOI":"10.1109\/ICRA.2019.8794342"},{"key":"10087_CR6","unstructured":"Colas, C., Oudeyer, P.-Y., Sigaud, O., Fournier, P., & Chetouani, M. (2019). Curious: Intrinsically motivated modular multi-goal reinforcement learning. In International Conference on Machine Learning, pp. 1331\u20131340."},{"key":"10087_CR7","volume-title":"Elements of information theory","author":"TM Cover","year":"2012","unstructured":"Cover, T. M., & Thomas, J. A. (2012). Elements of information theory. London: Wiley."},{"key":"10087_CR8","unstructured":"Dai, S., Hofmann, A., & Williams, B. (2021a). Automatic curricula via expert demonstrations. arXiv:2106.09159."},{"key":"10087_CR9","doi-asserted-by":"publisher","unstructured":"Dai, S., Xu, W., Hofmann, A., & Williams, B.\u00a0C. (2021b). An empowerment-based solution to robotic manipulation tasks with sparse rewards. In Proceedings of Robotics: Science and Systems, Virtual. https:\/\/doi.org\/10.15607\/RSS.2021.XVII.001.","DOI":"10.15607\/RSS.2021.XVII.001"},{"key":"10087_CR10","unstructured":"Dauphin, Y.\u00a0N., Fan, A., Auli, M., & Grangier, D. (2017). Language modeling with gated convolutional networks. In International Conference on Machine Learning, pp. 933\u2013941."},{"key":"10087_CR11","unstructured":"Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., & Zhokhov, P. (2017). OpenAI Baselines. https:\/\/github.com\/openai\/baselines."},{"key":"10087_CR12","unstructured":"Eysenbach, B., Gupta, A., Ibarz, J., & Levine, S. 
(2019). Diversity is all you need: Learning skills without a reward function. In International Conference on Learning Representations."},{"key":"10087_CR13","unstructured":"Finn, C., Abbeel, P., & Levine, S. (2017). Model-agnostic meta-learning for fast adaptation of deep networks. In International Conference on Machine Learning, pp. 1126\u20131135."},{"key":"10087_CR14","unstructured":"Florensa, C., Held, D., Wulfmeier, M., Zhang, M., & Abbeel, P. (2017). Reverse curriculum generation for reinforcement learning. In Conference on Robot Learning, pp. 482\u2013495."},{"key":"10087_CR15","unstructured":"Galashov, A., Jayakumar, S., Hasenclever, L., Tirumala, D., Schwarz, J., Desjardins, G., Czarnecki, W.\u00a0M., Teh, Y.\u00a0W., Pascanu, R., & Heess, N. (2019). Information asymmetry in KL-regularized RL. In International Conference on Learning Representations."},{"key":"10087_CR16","first-page":"199","volume":"12","author":"IM Gel\u2019Fand","year":"1959","unstructured":"Gel\u2019Fand, I. M., & Yaglom, A. M. (1959). Calculation of amount of information about a random function contained in another such function. Eleven Papers on Analysis, Probability and Topology, 12, 199.","journal-title":"Eleven Papers on Analysis, Probability and Topology"},{"key":"10087_CR17","unstructured":"Goyal, A., Islam, R., Strouse, D.\u00a0J., Ahmed, Z., Larochelle, H., Botvinick, M., Levine, S., & Bengio, Y. (2019). Transfer and exploration via the information bottleneck. In International Conference on Learning Representations."},{"key":"10087_CR18","unstructured":"Graves, A., Bellemare, M.\u00a0G, Menick, J., Munos, R., & Kavukcuoglu, K. (2017). Automated curriculum learning for neural networks. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 1311\u20131320. JMLR. org."},{"key":"10087_CR19","unstructured":"Houthooft, R., Chen, X., Duan, Y., Schulman, J., De\u00a0Turck, F., & Abbeel, P. (2016). Vime: Variational information maximizing exploration. 
In Advances in Neural Information Processing Systems, pp. 1109\u20131117."},{"key":"10087_CR20","doi-asserted-by":"crossref","unstructured":"Ivanovic, B., Harrison, J., Sharma, A., Chen, M., & Pavone, M. (2019). Barc: Backward reachability curriculum for robotic reinforcement learning. In 2019 International Conference on Robotics and Automation (ICRA), pp. 15\u201321. IEEE.","DOI":"10.1109\/ICRA.2019.8794206"},{"issue":"9\u201310","key":"10087_CR21","doi-asserted-by":"publisher","first-page":"1194","DOI":"10.1177\/0278364913484072","volume":"32","author":"LP Kaelbling","year":"2013","unstructured":"Kaelbling, L. P., & Lozano-P\u00e9rez, T. (2013). Integrated task and motion planning in belief space. The International Journal of Robotics Research, 32(9\u201310), 1194\u20131227.","journal-title":"The International Journal of Robotics Research"},{"key":"10087_CR22","unstructured":"Kim, H., Kim, J., Jeong, Y., Levine, S., & Song, H.\u00a0O. (2019a). Emi: Exploration with mutual information. In Proceedings of the 36th International Conference on Machine Learning, pp. 3360\u20133369."},{"key":"10087_CR23","unstructured":"Kim, Y., Nam, W., Kim, H., Kim, J.-H., & Kim, G. (2019b). Curiosity-bottleneck: Exploration by distilling task-specific novelty. In International Conference on Machine Learning, pp. 3379\u20133388."},{"key":"10087_CR24","doi-asserted-by":"crossref","unstructured":"Klyubin, A.\u00a0S., Polani, D., & Nehaniv, C.\u00a0L.: Empowerment: A universal agent-centric measure of control. In 2005 IEEE Congress on Evolutionary Computation, volume\u00a01, pp. 128\u2013135.","DOI":"10.1109\/CEC.2005.1554676"},{"issue":"1","key":"10087_CR25","first-page":"1334","volume":"17","author":"S Levine","year":"2016","unstructured":"Levine, S., Finn, C., Darrell, T., & Abbeel, P. (2016). End-to-end training of deep visuomotor policies. 
The Journal of Machine Learning Research, 17(1), 1334\u20131373.","journal-title":"The Journal of Machine Learning Research"},{"issue":"10","key":"10087_CR26","doi-asserted-by":"publisher","first-page":"4394","DOI":"10.1109\/TIT.2006.881731","volume":"52","author":"F Liese","year":"2006","unstructured":"Liese, F., & Vajda, I. (2006). On divergences and informations in statistics and information theory. IEEE Transactions on Information Theory, 52(10), 4394\u20134412.","journal-title":"IEEE Transactions on Information Theory"},{"key":"10087_CR27","unstructured":"Mirowski, P., Grimes, M., Malinowski, M., Hermann, K.\u00a0M., Anderson, K., Teplyashin, D., Simonyan, K., Zisserman, A., & Hadsell, R., et\u00a0al. (2018). Learning to navigate in cities without a map. In Advances in Neural Information Processing Systems, pp. 2419\u20132430."},{"key":"10087_CR28","unstructured":"Mohamed, S., & Rezende, D.\u00a0J. (2015). Variational information maximisation for intrinsically motivated reinforcement learning. In Advances in Neural Information Processing Systems."},{"issue":"11","key":"10087_CR29","doi-asserted-by":"publisher","first-page":"5847","DOI":"10.1109\/TIT.2010.2068870","volume":"56","author":"XL Nguyen","year":"2010","unstructured":"Nguyen, X. L., Wainwright, M. J., & Jordan, M. I. (2010). Estimating divergence functionals and the likelihood ratio by convex risk minimization. IEEE Transactions on Information Theory, 56(11), 5847\u20135861.","journal-title":"IEEE Transactions on Information Theory"},{"key":"10087_CR30","unstructured":"Nowozin, S., Cseke, B., & Tomioka, R. (2016). f-gan: Training generative neural samplers using variational divergence minimization. In Advances in neural information processing systems, pp. 271\u2013279."},{"issue":"6","key":"10087_CR31","doi-asserted-by":"publisher","first-page":"1191","DOI":"10.1162\/089976603321780272","volume":"15","author":"L Paninski","year":"2003","unstructured":"Paninski, L. (2003). 
Estimation of entropy and mutual information. Neural Computation, 15(6), 1191\u20131253.","journal-title":"Neural Computation"},{"key":"10087_CR32","doi-asserted-by":"crossref","unstructured":"Pathak, D., Agrawal, P., Efros, A.\u00a0A., & Darrell, T. (2017). Curiosity-driven exploration by self-supervised prediction. In International Conference on Machine Learning, pp. 2778\u20132787.","DOI":"10.1109\/CVPRW.2017.70"},{"key":"10087_CR33","unstructured":"Pathak, D., Gandhi, D., & Gupta, A. (2019). Self-supervised exploration via disagreement. arXiv:1906.04161."},{"key":"10087_CR34","unstructured":"Riedmiller, M., Hafner, R., Lampe, T., Neunert, M., Degrave, J., Wiele, T., Mnih, V., Heess, N., & Springenberg, J.\u00a0T. (2018). Learning by playing solving sparse reward tasks from scratch. In International Conference on Machine Learning, pp. 4341\u20134350."},{"key":"10087_CR35","unstructured":"Russo, D., & Van\u00a0Roy, B. (2014). Learning to optimize via information-directed sampling. Advances in Neural Information Processing Systems, 27."},{"key":"10087_CR36","unstructured":"Savinov, N., Raichuk, A., Vincent, D., Marinier, R., Pollefeys, M., Lillicrap, T., & Gelly, S. (2019). Episodic curiosity through reachability. In International Conference on Learning Representations."},{"key":"10087_CR37","unstructured":"Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal policy optimization algorithms. arXiv:1707.06347."},{"key":"10087_CR38","unstructured":"Sharma, A., Gu, S., Levine, S., Kumar, V., & Hausman, K. (2020). Dynamics-aware unsupervised skill discovery. In International Conference on Learning Representations."},{"key":"10087_CR39","unstructured":"Sukhbaatar, S., Lin, Z., Kostrikov, I., Synnaeve, G., Szlam, A., & Fergus, R. (2018). Intrinsic motivation and automatic curricula via asymmetric self-play. 
In International Conference on Learning Representations."},{"key":"10087_CR40","unstructured":"Tang, H., Houthooft, R., Foote, D., Stooke, A., Chen, X., Duan, Y., Schulman, J., De\u00a0Turck, F., & Abbeel, P. (2017). # exploration: A study of count-based exploration for deep reinforcement learning. In Advances in Neural Information Processing Systems, pp. 2753\u20132762."},{"key":"10087_CR41","unstructured":"Wang, R., Lehman, J., Clune, J., & Stanley, K.\u00a0O. (2019). Paired open-ended trailblazer (POET): Endlessly generating increasingly complex and diverse learning environments and their solutions. arXiv:1901.01753."},{"key":"10087_CR42","unstructured":"Weerakoon, K., Chakraborty, S., Karapetyan, N., Sathyamoorthy, A.\u00a0J., Bedi, A.\u00a0S., & Manocha, D. (2022). HTRON: Efficient outdoor navigation with sparse rewards via heavy tailed adaptive reinforce algorithm. arXiv:2207.03694."},{"key":"10087_CR43","unstructured":"Bengio, Y., Louradour, J., Collobert, R., & Weston, J. (2009). Curriculum learning. In Proceedings of the 26th Annual International Conference on Machine Learning, pp. 41\u201348. 
ACM."}],"container-title":["Autonomous Robots"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10514-023-10087-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10514-023-10087-8\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10514-023-10087-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,10,13]],"date-time":"2024-10-13T17:16:02Z","timestamp":1728839762000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10514-023-10087-8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,2,6]]},"references-count":43,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2023,6]]}},"alternative-id":["10087"],"URL":"https:\/\/doi.org\/10.1007\/s10514-023-10087-8","relation":{},"ISSN":["0929-5593","1573-7527"],"issn-type":[{"value":"0929-5593","type":"print"},{"value":"1573-7527","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,2,6]]},"assertion":[{"value":"25 February 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"12 January 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"6 February 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflicts of interest"}}]}}