{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,15]],"date-time":"2026-05-15T08:34:21Z","timestamp":1778834061605,"version":"3.51.4"},"reference-count":37,"publisher":"MDPI AG","issue":"8","license":[{"start":{"date-parts":[[2021,7,24]],"date-time":"2021-07-24T00:00:00Z","timestamp":1627084800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Symmetry"],"abstract":"<jats:p>In robotics, obstacle avoidance is an essential ability for distance sensor-based robots. This type of robot has axisymmetrically distributed distance sensors to acquire obstacle distance, so the state is symmetrical. Training the control policy with a reinforcement learning method is a trend. Considering the complexity of environments, such as narrow paths and right-angle turns, robots will have a better ability if the control policy can control the steering direction and speed simultaneously. This paper proposes the multi-dimensional action control (MDAC) approach based on a reinforcement learning technique, which can be used in multiple continuous action space tasks. It adopts a hierarchical structure, which has high and low-level modules. Low-level policies output concrete actions and the high-level policy determines when to invoke low-level modules according to the environment\u2019s features. We design robot navigation experiments with continuous action spaces to test the method\u2019s performance. 
It is an end-to-end approach and can solve complex obstacle avoidance tasks in navigation.<\/jats:p>","DOI":"10.3390\/sym13081335","type":"journal-article","created":{"date-parts":[[2021,7,25]],"date-time":"2021-07-25T22:07:00Z","timestamp":1627250820000},"page":"1335","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["The Multi-Dimensional Actions Control Approach for Obstacle Avoidance Based on Reinforcement Learning"],"prefix":"10.3390","volume":"13","author":[{"given":"Menghao","family":"Wu","sequence":"first","affiliation":[{"name":"College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China"},{"name":"Department of Computer Science, Aalto University, 02150 Espoo, Finland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yanbin","family":"Gao","sequence":"additional","affiliation":[{"name":"College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1899-8134","authenticated-orcid":false,"given":"Pengfei","family":"Wang","sequence":"additional","affiliation":[{"name":"College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Fan","family":"Zhang","sequence":"additional","affiliation":[{"name":"College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zhejun","family":"Liu","sequence":"additional","affiliation":[{"name":"College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2021,7,24]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Khatib, O. (1986). Real-time obstacle avoidance for manipulators and mobile robots. Autonomous Robot Vehicles, Springer.","DOI":"10.1007\/978-1-4613-8997-2_29"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Stulp, F., and Schaal, S. (2011, October 26\u201328). Hierarchical reinforcement learning with movement primitives. Proceedings of the 2011 11th IEEE-RAS International Conference on Humanoid Robots, Bled, Slovenia.","DOI":"10.1109\/Humanoids.2011.6100841"},{"key":"ref_3","unstructured":"Sutton, R.S., and Barto, A.G. (2013). [Draft-2] Reinforcement Learning: An Introduction, The MIT Press."},{"key":"ref_4","unstructured":"Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., and Riedmiller, M. (2013). Playing Atari with deep reinforcement learning. arXiv."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"529","DOI":"10.1038\/nature14236","article-title":"Human-level control through deep reinforcement learning","volume":"518","author":"Mnih","year":"2015","journal-title":"Nature"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Gu, S., Holly, E., Lillicrap, T., and Levine, S. (2017, May 29\u2013June 3). Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates. Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore.","DOI":"10.1109\/ICRA.2017.7989385"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Wu, M., Gao, Y., Jung, A., Zhang, Q., and Du, S. (2019). The Actor-Dueling-Critic Method for Reinforcement Learning. Sensors, 19.","DOI":"10.3390\/s19071547"},{"key":"ref_8","unstructured":"Wang, Z., Schaul, T., Hessel, M., Hasselt, H., Lanctot, M., and Freitas, N. (2015). 
Dueling network architectures for deep reinforcement learning. arXiv."},{"key":"ref_9","first-page":"2613","article-title":"Double Q-learning","volume":"23","author":"Hasselt","year":"2010","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_10","unstructured":"Haarnoja, T., Zhou, A., Hartikainen, K., Tucker, G., Ha, S., Tan, J., Kumar, V., Zhu, H., Gupta, A., and Abbeel, P. (2018). Soft actor\u2013critic algorithms and applications. arXiv."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1023\/A:1022140919877","article-title":"Recent advances in hierarchical reinforcement learning","volume":"13","author":"Barto","year":"2003","journal-title":"Discret. Event Dyn. Syst."},{"key":"ref_12","first-page":"1057","article-title":"Policy gradient methods for reinforcement learning with function approximation","volume":"12","author":"Sutton","year":"1999","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"2471","DOI":"10.1016\/j.automatica.2009.07.008","article-title":"Natural actor\u2013critic algorithms","volume":"45","author":"Bhatnagar","year":"2009","journal-title":"Automatica"},{"key":"ref_14","unstructured":"O\u2019Donoghue, B., Munos, R., Kavukcuoglu, K., and Mnih, V. (2016). Combining policy gradient and Q-learning. arXiv."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Al-Emran, M. (2015). Hierarchical reinforcement learning: A survey. Int. J. Comput. Digit. Syst., 4.","DOI":"10.12785\/ijcds\/040207"},{"key":"ref_16","unstructured":"Marthi, B., Russell, S.J., Latham, D., and Guestrin, C. (2005, July 30\u2013August 5). Concurrent hierarchical reinforcement learning. Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence, Edinburgh, Scotland."},{"key":"ref_17","unstructured":"Bakker, B., and Schmidhuber, J. (2010, January 24). Hierarchical reinforcement learning based on subgoal discovery and subpolicy specialization. 
Proceedings of the 8th Conference on Intelligent Autonomous Systems, Amsterdam, The Netherlands."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Tavakoli, A., Pardo, F., and Kormushev, P. (2018, February 2\u20137). Action branching architectures for deep reinforcement learning. Proceedings of the AAAI Conference on Artificial Intelligence, Hilton New Orleans Riverside, New Orleans, LA, USA.","DOI":"10.1609\/aaai.v32i1.11798"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Tampuu, A., Matiisen, T., Kodelja, D., Kuzovkin, I., Korjus, K., Aru, J., Aru, J., and Vicente, R. (2017). Multiagent cooperation and competition with deep reinforcement learning. PLoS ONE, 12.","DOI":"10.1371\/journal.pone.0172395"},{"key":"ref_20","unstructured":"Metz, L., Ibarz, J., Jaitly, N., and Davidson, J. (2017). Discrete sequential prediction of continuous actions for deep RL. arXiv."},{"key":"ref_21","unstructured":"Vezhnevets, A.S., Osindero, S., Schaul, T., Heess, N., Jaderberg, M., Silver, D., and Kavukcuoglu, K. (2017, August 6\u201311). Feudal networks for hierarchical reinforcement learning. Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Dietterich, T.G. (2000). An Overview of MAXQ Hierarchical Reinforcement Learning. International Symposium on Abstraction, Reformulation, and Approximation, Springer.","DOI":"10.1007\/3-540-44914-0_2"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"37","DOI":"10.1016\/S0921-8890(01)00113-0","article-title":"Acquisition of stand-up behavior by a real robot using hierarchical reinforcement learning","volume":"36","author":"Morimoto","year":"2001","journal-title":"Robot. Auton. Syst."},{"key":"ref_24","first-page":"3675","article-title":"Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation","volume":"29","author":"Kulkarni","year":"2016","journal-title":"Adv. Neural Inf. Process. 
Syst."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"227","DOI":"10.1613\/jair.639","article-title":"Hierarchical reinforcement learning with the MAXQ value function decomposition","volume":"13","author":"Dietterich","year":"2000","journal-title":"J. Artif. Intell. Res."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"5174","DOI":"10.1109\/TNNLS.2018.2805379","article-title":"Hierarchical deep reinforcement learning for continuous action control","volume":"29","author":"Yang","year":"2018","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Bacon, P.L., Harb, J., and Precup, D. (2017, February 4\u20139). The option-critic architecture. Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, CA, USA.","DOI":"10.1609\/aaai.v31i1.10916"},{"key":"ref_28","first-page":"3303","article-title":"Data-efficient hierarchical reinforcement learning","volume":"31","author":"Nachum","year":"2018","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_29","unstructured":"Haarnoja, T., Zhou, A., Abbeel, P., and Levine, S. (2018). Soft actor\u2013critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. arXiv."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Van Hasselt, H., Guez, A., and Silver, D. (2015). Deep reinforcement learning with double Q-learning. arXiv.","DOI":"10.1609\/aaai.v30i1.10295"},{"key":"ref_31","first-page":"2154","article-title":"Value iteration networks","volume":"29","author":"Tamar","year":"2016","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_32","unstructured":"Schaul, T., Quan, J., Antonoglou, I., and Silver, D. (2015). Prioritized experience replay. arXiv."},{"key":"ref_33","unstructured":"Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., and Wierstra, D. (2015). Continuous control with deep reinforcement learning. 
arXiv."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Diuk, C., Strehl, A.L., and Littman, M.L. (2006, May 8\u201312). A hierarchical approach to efficient reinforcement learning in deterministic domains. Proceedings of the Fifth International Joint Conference on Autonomous Agents and Multiagent Systems, Future University, Hakodate, Japan.","DOI":"10.1145\/1160633.1160686"},{"key":"ref_35","unstructured":"Eysenbach, B., Gupta, A., Ibarz, J., and Levine, S. (2018). Diversity is all you need: Learning skills without a reward function. arXiv."},{"key":"ref_36","unstructured":"Florensa, C., Duan, Y., and Abbeel, P. (2017). Stochastic neural networks for hierarchical reinforcement learning. arXiv."},{"key":"ref_37","unstructured":"Heess, N., Wayne, G., Tassa, Y., Lillicrap, T., Riedmiller, M., and Silver, D. (2016). Learning and transfer of modulated locomotor controllers. arXiv."}],"container-title":["Symmetry"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2073-8994\/13\/8\/1335\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T06:34:19Z","timestamp":1760164459000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2073-8994\/13\/8\/1335"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,7,24]]},"references-count":37,"journal-issue":{"issue":"8","published-online":{"date-parts":[[2021,8]]}},"alternative-id":["sym13081335"],"URL":"https:\/\/doi.org\/10.3390\/sym13081335","relation":{},"ISSN":["2073-8994"],"issn-type":[{"value":"2073-8994","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,7,24]]}}}