{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,21]],"date-time":"2026-01-21T12:27:47Z","timestamp":1768998467698,"version":"3.49.0"},"reference-count":37,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2024,1,22]],"date-time":"2024-01-22T00:00:00Z","timestamp":1705881600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Ningbo K&amp;D Project","award":["2023Z116"],"award-info":[{"award-number":["2023Z116"]}]},{"name":"Open Foundation of the State Key Laboratory of Fluid Power and Mechatronic Systems","award":["2023Z116"],"award-info":[{"award-number":["2023Z116"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>In the research of robot systems, path planning and obstacle avoidance are important research directions, especially in unknown dynamic environments where flexibility and rapid decision making are required. In this paper, a state attention network (SAN) was developed to extract features to represent the interaction between an intelligent robot and its obstacles. An auxiliary actor discriminator (AAD) was developed to calculate the probability of a collision. Goal-directed and gap-based navigation strategies were proposed to guide robotic exploration. The proposed policy was trained through simulated scenarios and updated by the Soft Actor-Critic (SAC) algorithm. The robot executed the action depending on the AAD output. Heuristic knowledge (HK) was developed to prevent blind exploration of the robot. Compared to other methods, adopting our approach in robot systems can help robots converge towards an optimal action strategy. 
Furthermore, it enables them to explore paths in unknown environments with fewer moving steps (showing a decrease of 33.9%) and achieve higher average rewards (showing an increase of 29.15%).<\/jats:p>","DOI":"10.3390\/s24020700","type":"journal-article","created":{"date-parts":[[2024,1,22]],"date-time":"2024-01-22T12:01:13Z","timestamp":1705924873000},"page":"700","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["Deep Reinforcement Learning for Autonomous Driving with an Auxiliary Actor Discriminator"],"prefix":"10.3390","volume":"24","author":[{"given":"Qiming","family":"Gao","sequence":"first","affiliation":[{"name":"Ningbo Innovation Center, Zhejiang University, Ningbo 315100, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5149-255X","authenticated-orcid":false,"given":"Fangle","family":"Chang","sequence":"additional","affiliation":[{"name":"Ningbo Innovation Center, Zhejiang University, Ningbo 315100, China"},{"name":"State Key Laboratory of Fluid Power and Mechatronic Systems, Zhejiang University, Hangzhou 310027, China"}]},{"given":"Jiahong","family":"Yang","sequence":"additional","affiliation":[{"name":"Ningbo Innovation Center, Zhejiang University, Ningbo 315100, China"},{"name":"Polytechnic Institute, Zhejiang University, Hangzhou 310013, China"}]},{"given":"Yu","family":"Tao","sequence":"additional","affiliation":[{"name":"Ningbo Innovation Center, Zhejiang University, Ningbo 315100, China"}]},{"given":"Longhua","family":"Ma","sequence":"additional","affiliation":[{"name":"Ningbo Innovation Center, Zhejiang University, Ningbo 315100, China"},{"name":"Institute of Intelligent Automation, NingboTech University, Ningbo 315100, China"}]},{"given":"Hongye","family":"Su","sequence":"additional","affiliation":[{"name":"Ningbo Innovation Center, Zhejiang University, Ningbo 315100, 
China"}]}],"member":"1968","published-online":{"date-parts":[[2024,1,22]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"1508","DOI":"10.1109\/TNNLS.2013.2293499","article-title":"Distributed neural network control for adaptive synchronization of uncertain dynamical multiagent systems","volume":"25","author":"Peng","year":"2013","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"1179","DOI":"10.1109\/JAS.2019.1911732","article-title":"Path planning for intelligent robots based on deep q-learning with experience replay and heuristic knowledge","volume":"7","author":"Jiang","year":"2019","journal-title":"IEEE\/CAA J. Autom. Sin."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"468","DOI":"10.1109\/JAS.2021.1003841","article-title":"Boundary Gap Based Reactive Navigation in Unknown Environments","volume":"8","author":"Gao","year":"2021","journal-title":"IEEE\/CAA J. Autom. Sin."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Bounini, F., Gingras, D., Pollart, H., and Gruyer, D. (2017, January 11\u201317). Modified artificial potential field method for online path planning applications. Proceedings of the 2017 IEEE Intelligent Vehicles Symposium (IV), Los Angeles, CA, USA.","DOI":"10.1109\/IVS.2017.7995717"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"9198","DOI":"10.1109\/TNNLS.2022.3156907","article-title":"Research on Obstacle Detection and Avoidance of Autonomous Underwater Vehicle Based on Forward-Looking Sonar","volume":"34","author":"Cao","year":"2022","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"628","DOI":"10.1177\/027836499101000604","article-title":"Robot motion planning: A distributed representation approach","volume":"10","author":"Barraquand","year":"1991","journal-title":"Int. J. Robot. 
Res."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Zeng, J., Ju, R., Qin, L., Hu, Y., and Hu, C. (2019). Navigation in unknown dynamic environments based on deep reinforcement learning. Sensors, 19.","DOI":"10.3390\/s19183837"},{"key":"ref_8","unstructured":"Berg, J., Guy, S.J., Lin, M., and Manocha, D. (2011). Robotics Research, Springer."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"1045","DOI":"10.1109\/JAS.2020.1003246","article-title":"Simulation and field testing of multiple vehicles collision avoidance algorithms","volume":"7","author":"Zu","year":"2020","journal-title":"IEEE\/CAA J. Autom. Sin."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Jin, J., Kim, Y.G., Wee, S.G., and Gans, N. (2015, January 26\u201330). Decentralized cooperative mean approach to collision avoidance for nonholonomic mobile robots. Proceedings of the 2015 IEEE International Conference on Robotics and Automation (ICRA), Washington, DC, USA.","DOI":"10.1109\/ICRA.2015.7138977"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"113","DOI":"10.3233\/IFS-2010-0440","article-title":"A new mobile robot navigation method using fuzzy logic and a modified Q-learning algorithm","volume":"21","author":"Boubertakh","year":"2010","journal-title":"J. Intell. Fuzzy Syst."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"385","DOI":"10.1109\/JAS.2014.7004666","article-title":"An adaptive obstacle avoidance algorithm for unmanned surface vehicle in complicated marine environments","volume":"1","author":"Zhang","year":"2014","journal-title":"IEEE\/CAA J. Autom. 
Sin."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"38200","DOI":"10.1109\/ACCESS.2018.2853146","article-title":"Scalable coverage path planning for cleaning robots using rectangular map decomposition on large environments","volume":"6","author":"Miao","year":"2018","journal-title":"IEEE Access"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"263","DOI":"10.1109\/12.663776","article-title":"A note on the complexity of Dijkstra\u2019s algorithm for graphs with weighted vertices","volume":"47","author":"Barbehenn","year":"1998","journal-title":"IEEE Trans. Comput."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"47","DOI":"10.1016\/0020-0255(84)90009-4","article-title":"A result on the computational complexity of heuristic estimates for the A* algorithm","volume":"34","author":"Valtorta","year":"1984","journal-title":"Inf. Sci."},{"key":"ref_16","unstructured":"Stentz, A. (1997). Intelligent Unmanned Ground Vehicles, Springer."},{"key":"ref_17","first-page":"293","article-title":"Rapidly-exploring random trees: Progress and prospects","volume":"5","author":"LaValle","year":"2001","journal-title":"Algorithmic Comput. Robot. New Dir."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"566","DOI":"10.1109\/70.508439","article-title":"Probabilistic roadmaps for path planning in high-dimensional configuration spaces","volume":"12","author":"Kavraki","year":"1996","journal-title":"IEEE Trans. Robot. Autom."},{"key":"ref_19","unstructured":"Khatib, O. (1986). Autonomous Robot Vehicles, Springer."},{"key":"ref_20","unstructured":"Alonso-Mora, J., Breitenmoser, A., Rufli, M., and Beardsley, P. (2013). Distributed Autonomous Robotic Systems, Springer."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Han, R., Chen, S., and Hao, Q. (2020, January 25\u201329). A Distributed Range-Only Collision Avoidance Approach for Low-cost Large-scale Multi-Robot Systems. 
Proceedings of the 2020 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA.","DOI":"10.1109\/IROS45743.2020.9341539"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Ataka, A., Lam, H.K., and Althoefer, K. (2018, January 21\u201325). Reactive magnetic-field-inspired navigation for non-holonomic mobile robots in unknown environments. Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia.","DOI":"10.1109\/ICRA.2018.8463203"},{"key":"ref_23","unstructured":"Tai, L., and Liu, M. (2016). Deep-learning in mobile robotics-from perception to control systems: A survey on why and why not. arXiv."},{"key":"ref_24","first-page":"6382","article-title":"Multi-agent actor-critic for mixed cooperative-competitive environments","volume":"30","author":"Lowe","year":"2017","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Everett, M., Chen, Y.F., and How, J.P. (2018, January 1\u20135). Motion planning among dynamic, decision-making agents with deep reinforcement learning. Proceedings of the 2018 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain.","DOI":"10.1109\/IROS.2018.8593871"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"529","DOI":"10.1038\/nature14236","article-title":"Human-level control through deep reinforcement learning","volume":"518","author":"Mnih","year":"2015","journal-title":"Nature"},{"key":"ref_27","first-page":"729","article-title":"Reinforcement learning","volume":"12","author":"Wiering","year":"2012","journal-title":"Adapt. Learn. Optim."},{"key":"ref_28","unstructured":"Haarnoja, T., Zhou, A., Hartikainen, K., Tucker, G., Ha, S., Tan, J., Kumar, V., Zhu, H., Gupta, A., and Abbeel, P. (2018). Soft actor-critic algorithms and applications. arXiv."},{"key":"ref_29","unstructured":"Christodoulou, P. (2019). 
Soft actor-critic for discrete action settings. arXiv."},{"key":"ref_30","unstructured":"Yarats, D., Zhang, A., Kostrikov, I., Amos, B., Pineau, J., and Fergus, R. (2019). Improving sample efficiency in model-free reinforcement learning from images. arXiv."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"2042","DOI":"10.1109\/TNNLS.2017.2773458","article-title":"Optimal and autonomous control using reinforcement learning: A survey","volume":"29","author":"Kiumarsi","year":"2017","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"663","DOI":"10.1007\/s11370-021-00387-2","article-title":"Reinforcement learning-based dynamic obstacle avoidance and integration of path planning","volume":"14","author":"Choi","year":"2021","journal-title":"Intell. Serv. Robot."},{"key":"ref_33","unstructured":"Zhelo, O., Zhang, J., Tai, L., Liu, M., and Burgard, W. (2018). Curiosity-driven exploration for mapless navigation with deep reinforcement learning. arXiv."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Wang, C., Wang, J., Zhang, X., and Zhang, X. (2017, January 14\u201316). Autonomous navigation of UAV in large-scale unknown complex environment with deep reinforcement learning. Proceedings of the 2017 IEEE Global Conference on Signal and Information Processing (Glob-alSIP), Montreal, QC, Canada.","DOI":"10.1109\/GlobalSIP.2017.8309082"},{"key":"ref_35","first-page":"012173","article-title":"An overview of the attention mechanisms in computer vision","volume":"Volume 1693","author":"Yang","year":"2020","journal-title":"Journal of Physics: Conference Series"},{"key":"ref_36","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., and Polosukhin, I. (2017, January 4\u20139). Attention is all you need. 
Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"1361","DOI":"10.1109\/JAS.2020.1003300","article-title":"A recurrent attention and interaction model for pedestrian trajectory prediction","volume":"7","author":"Li","year":"2020","journal-title":"IEEE\/CAA J. Autom. Sin."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/24\/2\/700\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T13:47:14Z","timestamp":1760104034000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/24\/2\/700"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,1,22]]},"references-count":37,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2024,1]]}},"alternative-id":["s24020700"],"URL":"https:\/\/doi.org\/10.3390\/s24020700","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,1,22]]}}}