{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,7]],"date-time":"2026-02-07T13:35:39Z","timestamp":1770471339644,"version":"3.49.0"},"reference-count":76,"publisher":"Maximum Academic Press","license":[{"start":{"date-parts":[[2024,11,20]],"date-time":"2024-11-20T00:00:00Z","timestamp":1732060800000},"content-version":"unspecified","delay-in-days":324,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc-nd\/4.0\/"}],"content-domain":{"domain":["cambridge.org"],"crossmark-restriction":true},"short-container-title":["The Knowledge Engineering Review"],"published-print":{"date-parts":[[2024]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Recently, the field of robotics development and control has been advancing rapidly. Even though humans effortlessly manipulate everyday objects, enabling robots to interact with human-made objects in real-world environments remains a challenge despite years of dedicated research. For example, typing on a keyboard requires adapting to various external conditions, such as the size and position of the keyboard, and demands high accuracy from a robot to be able to use it properly. This paper introduces a novel hierarchical reinforcement learning algorithm based on the Deep Deterministic Policy Gradient (DDPG) algorithm to address the dual-arm robot typing problem. In this regard, the proposed algorithm employs a Convolutional Auto-Encoder (CAE) to deal with the associated complexities of continuous state and action spaces at the first stage, and then a DDPG algorithm serves as a strategy controller for the typing problem. Using a dual-arm humanoid robot, we have extensively evaluated our proposed algorithm in simulation and real-world experiments. The results showcase the high efficiency of our approach, boasting an average success rate of 96.14% in simulations and 92.2% in real-world settings. Furthermore, we demonstrate that our proposed algorithm outperforms DDPG and Deep Q-Learning, two frequently employed algorithms in robotic applications.<\/jats:p>","DOI":"10.1017\/s0269888924000080","type":"journal-article","created":{"date-parts":[[2024,11,20]],"date-time":"2024-11-20T02:40:42Z","timestamp":1732070442000},"update-policy":"https:\/\/doi.org\/10.1017\/policypage","source":"Crossref","is-referenced-by-count":2,"title":["A hierarchical deep reinforcement learning algorithm for typing with a dual-arm humanoid robot"],"prefix":"10.48130","volume":"39","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3360-1892","authenticated-orcid":false,"given":"Jacky","family":"Baltes","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hanjaya","family":"Mandala","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8294-8625","authenticated-orcid":false,"given":"Saeed","family":"Saeedvand","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"27968","published-online":{"date-parts":[[2024,11,20]]},"reference":[{"key":"S0269888924000080_ref42","doi-asserted-by":"publisher","DOI":"10.1109\/ICDCS.2017.123"},{"key":"S0269888924000080_ref18","doi-asserted-by":"publisher","DOI":"10.1126\/science.1127647"},{"key":"S0269888924000080_ref4","doi-asserted-by":"publisher","DOI":"10.1109\/HUMANOIDS.2015.7363472"},{"key":"S0269888924000080_ref30","doi-asserted-by":"publisher","DOI":"10.1109\/ISESD.2017.8253313"},{"key":"S0269888924000080_ref67","doi-asserted-by":"publisher","DOI":"10.1145\/1390156.1390294"},{"key":"S0269888924000080_ref44","doi-asserted-by":"publisher","DOI":"10.1109\/HUMANOIDS.2014.7041477"},{"key":"S0269888924000080_ref32","unstructured":"Laschi, C. , et al. 2000. Grasping and manipulation in humanoid robotics. Scuola Superiore Sant Anna, Italia."},{"key":"S0269888924000080_ref48","unstructured":"Moosavian, S. A. A. , Semsarilar, H. & Kalantari, A. 2006. Design and manufacturing of a mobile rescue robot. In 2006 IEEE\/RSJ International Conference on Intelligent Robots and Systems."},{"key":"S0269888924000080_ref33","doi-asserted-by":"publisher","DOI":"10.1109\/5.726791"},{"key":"S0269888924000080_ref36","unstructured":"Li, Y. & Chuang, L. 2013. Controller design for music playing robot \u2014 Applied to the anthropomorphic piano robot. In 2013 IEEE 10th International Conference on Power Electronics and Drive Systems (PEDS)."},{"key":"S0269888924000080_ref5","unstructured":"Bochkovskiy, A. , Wang, C.-Y. & Liao, H.-Y.M. 2020. YOLOv4: Optimal Speed and Accuracy of Object Detection."},{"key":"S0269888924000080_ref53","doi-asserted-by":"publisher","DOI":"10.1016\/j.rcim.2018.12.017"},{"key":"S0269888924000080_ref28","doi-asserted-by":"publisher","DOI":"10.1002\/rob.21558"},{"key":"S0269888924000080_ref38","unstructured":"Lillicrap, T. P. , et al. 2015. Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971."},{"key":"S0269888924000080_ref31","doi-asserted-by":"publisher","DOI":"10.1109\/QIR.2017.8168506"},{"key":"S0269888924000080_ref27","doi-asserted-by":"publisher","DOI":"10.1177\/0278364913495721"},{"key":"S0269888924000080_ref37","doi-asserted-by":"publisher","DOI":"10.1109\/TCDS.2017.2770168"},{"key":"S0269888924000080_ref41","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-019-01247-4"},{"key":"S0269888924000080_ref3","doi-asserted-by":"publisher","DOI":"10.1109\/ICHR.2010.5686330"},{"key":"S0269888924000080_ref16","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-47426-3_59"},{"key":"S0269888924000080_ref11","doi-asserted-by":"publisher","DOI":"10.1177\/0278364919872545"},{"key":"S0269888924000080_ref63","unstructured":"Silver, D. , et al. 2014. Deterministic policy gradient algorithms."},{"key":"S0269888924000080_ref17","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-21735-7_6"},{"key":"S0269888924000080_ref12","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2016.7487173"},{"key":"S0269888924000080_ref58","doi-asserted-by":"publisher","DOI":"10.1007\/s10489-019-01475-8"},{"key":"S0269888924000080_ref65","unstructured":"Sugano, S. & Kato, I. 1987. WABOT-2: Autonomous robot with dexterous finger-arm\u2013Finger-arm coordination control in keyboard performance. In Proceedings. 1987 IEEE International Conference on Robotics and Automation. IEEE."},{"key":"S0269888924000080_ref64","doi-asserted-by":"publisher","DOI":"10.1109\/TRO.2012.2210294"},{"key":"S0269888924000080_ref71","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2016.7759419"},{"key":"S0269888924000080_ref45","doi-asserted-by":"crossref","unstructured":"Mandala, H. , Saeedvand, S. & Baltes, J . 2020. Synchronous dual-arm manipulation by adult-sized humanoid robot. In 2020 International Conference on Advanced Robotics and Intelligent Systems (ARIS). IEEE.","DOI":"10.1109\/ARIS50834.2020.9205783"},{"key":"S0269888924000080_ref8","unstructured":"Brockman, G. , Sutskever, I. & Altman, S. 2020. (5\/18\/2020), [website], OpenAI, Retrieved from https:\/\/gym.openai.com\/."},{"key":"S0269888924000080_ref60","doi-asserted-by":"publisher","DOI":"10.1016\/j.asoc.2021.107601"},{"key":"S0269888924000080_ref50","doi-asserted-by":"crossref","first-page":"965","DOI":"10.1109\/TRO.2005.852263","article-title":"Control of an object with parallel surfaces by a pair of finger robots without object sensing","volume":"21","author":"Ozawa","year":"2005","journal-title":"IEEE Transactions on Robotics"},{"key":"S0269888924000080_ref69","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-78963-7_59"},{"key":"S0269888924000080_ref40","first-page":"1601","volume-title":"Intelligent Autonomous Systems,","volume":"13","author":"Lioutikov","year":"2016."},{"key":"S0269888924000080_ref19","unstructured":"Hoof, H. V. , et al. 2015. Learning robot in-hand manipulation with tactile features. In 2015 IEEE-RAS 15th International Conference on Humanoid Robots (Humanoids)."},{"key":"S0269888924000080_ref24","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2011.6095096"},{"key":"S0269888924000080_ref68","doi-asserted-by":"publisher","DOI":"10.3390\/s20030939"},{"key":"S0269888924000080_ref46","doi-asserted-by":"crossref","unstructured":"Masci, J. , et al. 2011. Stacked convolutional auto-encoders for hierarchical feature extraction. In International Conference on Artificial Neural Networks. Springer.","DOI":"10.1007\/978-3-642-21735-7_7"},{"key":"S0269888924000080_ref55","doi-asserted-by":"crossref","unstructured":"Ranzato, M.A. , et al. 2007. Unsupervised learning of invariant feature hierarchies with applications to object recognition. In 2007 IEEE Conference on Computer Vision and Pattern Recognition. IEEE.","DOI":"10.1109\/CVPR.2007.383157"},{"key":"S0269888924000080_ref22","doi-asserted-by":"publisher","DOI":"10.1002\/rob.21571"},{"key":"S0269888924000080_ref35","first-page":"1334","article-title":"End-to-end training of deep visuomotor policies","volume":"17","author":"Levine","year":"2016","journal-title":"The Journal of Machine Learning Research"},{"key":"S0269888924000080_ref59","doi-asserted-by":"publisher","DOI":"10.1016\/j.asoc.2020.106700"},{"key":"S0269888924000080_ref43","doi-asserted-by":"publisher","DOI":"10.1007\/978-981-13-9267-2_2"},{"key":"S0269888924000080_ref7","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v29i1.9378"},{"key":"S0269888924000080_ref61","doi-asserted-by":"publisher","DOI":"10.1177\/0278364907087172"},{"key":"S0269888924000080_ref73","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2011.5980342"},{"key":"S0269888924000080_ref47","doi-asserted-by":"publisher","DOI":"10.1038\/nature14236"},{"key":"S0269888924000080_ref39","doi-asserted-by":"publisher","DOI":"10.1109\/3CA.2010.5533457"},{"key":"S0269888924000080_ref62","doi-asserted-by":"publisher","DOI":"10.1016\/j.robot.2019.07.003"},{"key":"S0269888924000080_ref76","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2019.8794102"},{"key":"S0269888924000080_ref9","doi-asserted-by":"publisher","DOI":"10.1016\/j.procir.2018.01.021"},{"key":"S0269888924000080_ref25","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA40945.2020.9197415"},{"key":"S0269888924000080_ref66","volume-title":"Advances in Neural Information Processing Systems.","author":"Sutton","year":"2000"},{"key":"S0269888924000080_ref54","doi-asserted-by":"crossref","unstructured":"Rajeswaran, A. , et al. 2017. Learning complex dexterous manipulation with deep reinforcement learning and demonstrations. arXiv preprint arXiv:1709.10087.","DOI":"10.15607\/RSS.2018.XIV.049"},{"key":"S0269888924000080_ref70","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2019.01.006"},{"key":"S0269888924000080_ref52","doi-asserted-by":"crossref","first-page":"140","DOI":"10.1109\/41.824136","article-title":"Global minimum-jerk trajectory planning of robot manipulators","volume":"47","author":"Piazzi","year":"2000","journal-title":"IEEE Transactions on Industrial Electronics"},{"key":"S0269888924000080_ref10","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-26326-3_3"},{"key":"S0269888924000080_ref6","doi-asserted-by":"publisher","DOI":"10.1108\/IR-01-2015-0010"},{"key":"S0269888924000080_ref29","doi-asserted-by":"publisher","DOI":"10.1109\/SIET.2017.8304167"},{"key":"S0269888924000080_ref21","unstructured":"Jiang, Y. , Moseson, S. & Saxena, A. 2011. Efficient grasping from rgbd images: Learning using a new rectangle representation. In 2011 IEEE International Conference on Robotics and Automation. IEEE."},{"key":"S0269888924000080_ref15","unstructured":"Hafner, D. , et al. 2020. Mastering atari with discrete world models. arXiv preprint arXiv:2010.02193."},{"key":"S0269888924000080_ref26","doi-asserted-by":"publisher","DOI":"10.1016\/j.compeleceng.2019.106460"},{"key":"S0269888924000080_ref1","doi-asserted-by":"publisher","DOI":"10.1109\/URSIGASS.2014.6929384"},{"key":"S0269888924000080_ref57","doi-asserted-by":"publisher","DOI":"10.1017\/S0269888919000158"},{"key":"S0269888924000080_ref13","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2017.7989385"},{"key":"S0269888924000080_ref56","doi-asserted-by":"crossref","first-page":"315","DOI":"10.3390\/aerospace9060315","article-title":"Research on dual-arm control of lunar assisted robot based on hierarchical reinforcement learning under unstructured environment","volume":"9","author":"Ren","year":"2022","journal-title":"Aerospace"},{"key":"S0269888924000080_ref49","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2012.6386301"},{"key":"S0269888924000080_ref72","doi-asserted-by":"crossref","unstructured":"Ye, G. , Thobbi, A. & Sheng, W. 2011. Human-robot collaborative manipulation through imitation and reinforcement learning. In 2011 IEEE International Conference on Information and Automation.","DOI":"10.1109\/ICINFA.2011.5948979"},{"key":"S0269888924000080_ref75","unstructured":"Zhang, F. , et al. 2015. Towards vision-based deep reinforcement learning for robotic motion control. arXiv preprint arXiv:1511.03791."},{"key":"S0269888924000080_ref14","doi-asserted-by":"publisher","DOI":"10.1016\/j.cviu.2019.102805"},{"key":"S0269888924000080_ref2","doi-asserted-by":"publisher","DOI":"10.1016\/j.engappai.2023.106941"},{"key":"S0269888924000080_ref23","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-54536-8"},{"key":"S0269888924000080_ref20","unstructured":"Huang, S. H. , et al. 2019. Learning Gentle Object Manipulation with Curiosity-Driven Deep Reinforcement Learning. arXiv preprint arXiv:1903.08542."},{"key":"S0269888924000080_ref51","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2011.5980200"},{"key":"S0269888924000080_ref74","doi-asserted-by":"publisher","DOI":"10.1109\/ICINFA.2009.5205022"},{"key":"S0269888924000080_ref34","doi-asserted-by":"publisher","DOI":"10.1177\/0278364914549607"}],"container-title":["The Knowledge Engineering Review"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.cambridge.org\/core\/services\/aop-cambridge-core\/content\/view\/S0269888924000080","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,1,5]],"date-time":"2026-01-05T14:42:25Z","timestamp":1767624145000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.cambridge.org\/core\/product\/identifier\/S0269888924000080\/type\/journal_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024]]},"references-count":76,"alternative-id":["S0269888924000080"],"URL":"https:\/\/doi.org\/10.1017\/s0269888924000080","relation":{},"ISSN":["0269-8889","1469-8005"],"issn-type":[{"value":"0269-8889","type":"print"},{"value":"1469-8005","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024]]},"assertion":[{"value":"\u00a9 The Author(s), 2024. Published by Cambridge University Press","name":"copyright","label":"Copyright","group":{"name":"copyright_and_licensing","label":"Copyright and Licensing"}},{"value":"This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives licence (https:\/\/creativecommons.org\/licenses\/by-nc-nd\/4.0\/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided that no alterations are made and the original article is properly cited. The written permission of Cambridge University Press must be obtained prior to any commercial use and\/or adaptation of the article.","name":"license","label":"License","group":{"name":"copyright_and_licensing","label":"Copyright and Licensing"}},{"value":"This content has been made available to all.","name":"free","label":"Free to read"}],"article-number":"e7"}}