{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,12]],"date-time":"2026-03-12T17:13:24Z","timestamp":1773335604608,"version":"3.50.1"},"reference-count":38,"publisher":"MDPI AG","issue":"18","license":[{"start":{"date-parts":[[2019,9,5]],"date-time":"2019-09-05T00:00:00Z","timestamp":1567641600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"National Science Foundation of Hunan Province","award":["2017JJ3371"],"award-info":[{"award-number":["2017JJ3371"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>In this paper, we propose a novel Deep Reinforcement Learning (DRL) algorithm that navigates non-holonomic robots with continuous control in unknown dynamic environments with moving obstacles. We call the approach MK-A3C (Memory and Knowledge-based Asynchronous Advantage Actor-Critic) for short. As its first component, MK-A3C builds a GRU-based memory neural network to enhance the robot\u2019s capability for temporal reasoning. Robots without such memory tend to behave irrationally in the face of incomplete and noisy state estimates in complex environments, whereas the memory ability endowed by MK-A3C enables robots to avoid local-minimum traps by estimating a model of the environment. Secondly, MK-A3C combines a domain-knowledge-based reward function with a transfer-learning-based training task architecture, which resolves the policy non-convergence problem caused by sparse rewards. Together, these improvements allow MK-A3C to navigate robots efficiently in unknown dynamic environments and to satisfy kinetic constraints while handling moving objects. 
Simulation experiments show that, compared with existing methods, MK-A3C achieves successful robotic navigation in unknown and challenging environments by outputting continuous acceleration commands.<\/jats:p>","DOI":"10.3390\/s19183837","type":"journal-article","created":{"date-parts":[[2019,9,6]],"date-time":"2019-09-06T02:59:22Z","timestamp":1567738762000},"page":"3837","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":60,"title":["Navigation in Unknown Dynamic Environments Based on Deep Reinforcement Learning"],"prefix":"10.3390","volume":"19","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4353-1093","authenticated-orcid":false,"given":"Junjie","family":"Zeng","sequence":"first","affiliation":[{"name":"College of Systems Engineering, National University of Defense Technology, Changsha 410073, China"}]},{"given":"Rusheng","family":"Ju","sequence":"additional","affiliation":[{"name":"College of Systems Engineering, National University of Defense Technology, Changsha 410073, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1245-6622","authenticated-orcid":false,"given":"Long","family":"Qin","sequence":"additional","affiliation":[{"name":"College of Systems Engineering, National University of Defense Technology, Changsha 410073, China"}]},{"given":"Yue","family":"Hu","sequence":"additional","affiliation":[{"name":"College of Systems Engineering, National University of Defense Technology, Changsha 410073, China"}]},{"given":"Quanjun","family":"Yin","sequence":"additional","affiliation":[{"name":"College of Systems Engineering, National University of Defense Technology, Changsha 410073, China"}]},{"given":"Cong","family":"Hu","sequence":"additional","affiliation":[{"name":"College of Systems Engineering, National University of Defense Technology, Changsha 410073, 
China"}]}],"member":"1968","published-online":{"date-parts":[[2019,9,5]]},"reference":[{"key":"ref_1","first-page":"75","article-title":"A Review of Real-Time Strategy Game AI","volume":"35","author":"Robertson","year":"2014","journal-title":"AI Mag."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"463","DOI":"10.1017\/S0263574714000289","article-title":"Algorithms for collision-free navigation of mobile robots in complex cluttered environments: A survey","volume":"33","author":"Hoy","year":"2015","journal-title":"Robotica"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"1568","DOI":"10.1016\/j.robot.2014.05.006","article-title":"Seeking a path through the crowd: Robot navigation in unknown dynamic environments with moving obstacles based on an integrated environment representation","volume":"62","author":"Savkin","year":"2014","journal-title":"Robot. Auton. Syst."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Uras, T., Koenig, S., and Hernandez, C. (2013, January 10\u201314). Subgoal graphs for optimal pathfinding in eight-neighbour grids. Proceedings of the 23rd International Conference on Automated Planning and Scheduling (ICAPS \u201913), Rome, Italy.","DOI":"10.1609\/icaps.v23i1.13568"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Uras, T., and Koenig, S. (2014, January 27\u201331). Identifying hierarchies for fast optimal search. Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, Quebec City, QC, Canada.","DOI":"10.1609\/aaai.v28i1.8845"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"566","DOI":"10.1109\/70.508439","article-title":"Probabilistic roadmaps for path planning in high-dimensional configuration spaces","volume":"12","author":"Kavraki","year":"1994","journal-title":"IEEE Trans. Robot. Autom."},{"key":"ref_7","unstructured":"Lavalle, S.M. (2000, January 16\u201318). Rapidly-exploring random trees: Progress and prospects. 
Proceedings of the 4th International Workshop on Algorithmic Foundations of Robotics, Hanover, Germany."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"119","DOI":"10.1613\/jair.1789","article-title":"Learning in real-time search: A unifying framework","volume":"25","author":"Bulitko","year":"2006","journal-title":"J. Artif. Intell. Res."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"99","DOI":"10.1109\/TRO.2010.2076851","article-title":"Human-Centered Robot Navigation\u2014Towards a Harmoniously Human\u2013Robot Coexisting Environment","volume":"27","author":"Lam","year":"2011","journal-title":"IEEE Trans. Robot."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Van Den Berg, J., Guy, S.J., Lin, M., and Manocha, D. (2011). Reciprocal nbody collision avoidance. Robotics Research, Springer.","DOI":"10.1007\/978-3-642-19457-3_1"},{"key":"ref_11","unstructured":"Tai, L., and Liu, M. (2016). Deep-learning in Mobile Robotics\u2014From Perception to Control Systems: A Survey on Why and Why not. arXiv."},{"key":"ref_12","unstructured":"Li, Y. (2017). Deep Reinforcement Learning: An Overview. arXiv."},{"key":"ref_13","unstructured":"Mnih, V., Badia, A.P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., Silver, D., and Kavukcuoglu, K. (2016, January 19\u201324). Asynchronous methods for deep reinforcement learning. Proceedings of the International Conference on Machine Learning, New York, NY, USA."},{"key":"ref_14","unstructured":"Jaderberg, M., Mnih, V., Czarnecki, W.M., Schaul, T., Leibo, J.Z., Silver, D., and Kavukcuoglu, K. (2017, January 24\u201326). Reinforcement Learning with Unsupervised Auxiliary Tasks. Proceedings of the 5th International Conference on Learning Representations, Toulon, France."},{"key":"ref_15","unstructured":"Mirowski, P., Pascanu, R., Viola, F., Soyer, H., Ballard, A.J., Banino, A., Denil, M., Goroshin, R., Sifre, L., and Kavukcuoglu, K. (2016). Learning to Navigate in Complex Environments. 
arXiv."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Gu, S., Holly, E., Lillicrap, T.P., and Levine, S. (June, January 29). Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates. Proceedings of the 2017 International Conference on Robotics and Automation, Singapore.","DOI":"10.1109\/ICRA.2017.7989385"},{"key":"ref_17","unstructured":"Mirowski, P., Grimes, M., Malinowski, M., Hermann, K.M., Anderson, K., Teplyashin, D., Simonyan, K., Zisserman, A., and Hadsell, R. (2018). Learning to Navigate in Cities Without a Map. arXiv."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Cho, K., Van Merrienboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., and Bengio, Y. (2014). Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. arXiv.","DOI":"10.3115\/v1\/D14-1179"},{"key":"ref_19","unstructured":"Ng, A.Y., Harada, D., and Russell, S.J. (1999, January 27\u201330). Policy Invariance under Reward Transformations: Theory and Application to Reward Shaping. Proceedings of the International Conference on Machine Learning, Bled, Slovenia."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"1345","DOI":"10.1109\/TKDE.2009.191","article-title":"A Survey on Transfer Learning","volume":"22","author":"Pan","year":"2010","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Xu, H., Yu, F., Darrell, T., and Gao, Y. (2017, January 21\u201326). End-to-End Learning of Driving Models from Large-Scale Video Datasets. Proceedings of the Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.376"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Tai, L., Li, S., and Liu, M. (2016, January 9\u201314). A deep-network solution towards model-less obstacle avoidance. 
Proceedings of the 2016 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Daejeon, South Korea.","DOI":"10.1109\/IROS.2016.7759428"},{"key":"ref_23","unstructured":"Agrawal, P., Nair, A., Abbeel, P., Malik, J., and Levine, S. (2016, January 5\u201310). Learning to Poke by Poking: Experiential Learning of Intuitive Physics. Proceedings of the 30th International Conference on Neural Information Processing Systems, Barcelona, Spain."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Tai, L., Paolo, G., and Liu, M. (2017, January 24\u201328). Virtual-to-real deep reinforcement learning: Continuous control of mobile robots for mapless navigation. Proceedings of the 2017 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada.","DOI":"10.1109\/IROS.2017.8202134"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"2124","DOI":"10.1109\/TVT.2018.2890773","article-title":"Autonomous Navigation of UAVs in Large-Scale Complex Environments: A Deep Reinforcement Learning Approach","volume":"68","author":"Wang","year":"2019","journal-title":"IEEE Trans. Veh. Technol."},{"key":"ref_26","unstructured":"Zhelo, O., Zhang, J., Tai, L., Liu, M., and Burgard, W. (2018). Curiosity-driven Exploration for Mapless Navigation with Deep Reinforcement Learning. arXiv."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Canny, J. (1988). The Complexity of Robot Motion Planning, MIT Press.","DOI":"10.1109\/SFCS.1988.21947"},{"key":"ref_28","first-page":"A187","article-title":"Continuous control with deep reinforcement learning","volume":"8","author":"Lillicrap","year":"2015","journal-title":"Comput. Sci."},{"key":"ref_29","unstructured":"Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and Klimov, O. (2017). Proximal Policy Optimization Algorithms. 
arXiv."},{"key":"ref_30","unstructured":"Heess, N., Tb, D., Sriram, S., Lemmon, J., Merel, J., Wayne, G., Tassa, Y., Erez, T., Wang, Z., and Eslami, S.M.A. (2017). Emergence of Locomotion Behaviours in Rich Environments. arXiv."},{"key":"ref_31","unstructured":"Sutton, R.S., and Barto, A.G. (2018). Reinforcement Learning: An Introduction, MIT Press. [2nd ed.]."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"529","DOI":"10.1038\/nature14236","article-title":"Human-level control through deep reinforcement learning","volume":"518","author":"Mnih","year":"2015","journal-title":"Nature"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Van Hasselt, H., Guez, A., and Silver, D. (2015). Deep Reinforcement Learning with Double Q-learning. arXiv.","DOI":"10.1609\/aaai.v30i1.10295"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Hessel, M., Modayil, J., Van Hasselt, H., Schaul, T., Ostrovski, G., Dabney, W., Horgan, D., Piot, B., Azar, M., and Silver, D. (2018, January 2\u20137). Rainbow: Combining Improvements in Deep Reinforcement Learning. Proceedings of the National Conference on Artificial Intelligence, New Orleans, LA, USA.","DOI":"10.1609\/aaai.v32i1.11796"},{"key":"ref_35","unstructured":"Schulman, J., Levine, S., Moritz, P., Jordan, M.I., and Abbeel, P. (2015, January 6\u201311). Trust region policy optimization. Proceedings of the International Conference on Machine Learning, Lugano, Switzerland."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Bengio, Y., Louradour, J., Collobert, R., and Weston, J. (2009, January 14\u201318). Curriculum learning. Proceedings of the 26th Annual International Conference on Machine Learning, (ICML 2009), Montreal, QC, Canada.","DOI":"10.1145\/1553374.1553380"},{"key":"ref_37","unstructured":"Zaremba, W., and Sutskever, I. (2014). Learning to Execute. 
arXiv."},{"key":"ref_38","unstructured":"Vezhnevets, A., Osindero, S., Schaul, T., Heess, N., Jaderberg, M., Silver, D., and Kavukcuoglu, K. (2017, January 6\u201311). FeUdal networks for hierarchical reinforcement learning. Proceedings of the 2017 International Conference on Machine Learning, Sydney, Australia."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/19\/18\/3837\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T13:17:04Z","timestamp":1760188624000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/19\/18\/3837"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,9,5]]},"references-count":38,"journal-issue":{"issue":"18","published-online":{"date-parts":[[2019,9]]}},"alternative-id":["s19183837"],"URL":"https:\/\/doi.org\/10.3390\/s19183837","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,9,5]]}}}