{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,13]],"date-time":"2026-05-13T16:57:31Z","timestamp":1778691451015,"version":"3.51.4"},"reference-count":44,"publisher":"MDPI AG","issue":"19","license":[{"start":{"date-parts":[[2020,9,25]],"date-time":"2020-09-25T00:00:00Z","timestamp":1600992000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"China Scholarship Council (CSC)","award":["201908440537"],"award-info":[{"award-number":["201908440537"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61803103"],"award-info":[{"award-number":["61803103"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>This paper proposes a novel incremental training mode to address the problem of Deep Reinforcement Learning (DRL) based path planning for a mobile robot. Firstly, we evaluate the related graphic search algorithms and Reinforcement Learning (RL) algorithms in a lightweight 2D environment. Then, we design the algorithm based on DRL, including observation states, reward function, network structure as well as parameters optimization, in a 2D environment to circumvent the time-consuming works for a 3D environment. We transfer the designed algorithm to a simple 3D environment for retraining to obtain the converged network parameters, including the weights and biases of deep neural network (DNN), etc. Using these parameters as initial values, we continue to train the model in a complex 3D environment. 
To improve the generalization of the model across different scenes, we propose combining the DRL algorithm Twin Delayed Deep Deterministic Policy Gradient (TD3) with the traditional global path planning algorithm Probabilistic Roadmap (PRM) into a novel path planner (PRM+TD3). Experimental results show that the incremental training mode notably improves development efficiency. Moreover, the PRM+TD3 path planner effectively improves the generalization of the model.<\/jats:p>","DOI":"10.3390\/s20195493","type":"journal-article","created":{"date-parts":[[2020,9,25]],"date-time":"2020-09-25T08:57:32Z","timestamp":1601024252000},"page":"5493","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":188,"title":["Deep Reinforcement Learning for Indoor Mobile Robot Path Planning"],"prefix":"10.3390","volume":"20","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0327-3616","authenticated-orcid":false,"given":"Junli","family":"Gao","sequence":"first","affiliation":[{"name":"School of Automation, Guangdong University of Technology, Guangzhou 510006, China"}]},{"given":"Weijie","family":"Ye","sequence":"additional","affiliation":[{"name":"School of Automation, Guangdong University of Technology, Guangzhou 510006, China"}]},{"given":"Jing","family":"Guo","sequence":"additional","affiliation":[{"name":"School of Automation, Guangdong University of Technology, Guangzhou 510006, China"}]},{"given":"Zhongjuan","family":"Li","sequence":"additional","affiliation":[{"name":"School of Automation, Guangdong University of Technology, Guangzhou 510006, China"}]}],"member":"1968","published-online":{"date-parts":[[2020,9,25]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Zhang, L., Chen, Z., Cui, W., Li, B., Chen, C.Y., Cao, Z., and Gao, K. (2020). WiFi-Based Indoor Robot Positioning Using Deep Fuzzy Forests. 
IEEE Internet Things J.","DOI":"10.1109\/JIOT.2020.2986685"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Dissanayake, G., Huang, S., Wang, Z., and Ranasinghe, R. (2011, January 16\u201319). A review of recent developments in Simultaneous Localization and Mapping. Proceedings of the 2011 6th International Conference on Industrial and Information Systems, Kandy, Sri Lanka.","DOI":"10.1109\/ICIINFS.2011.6038117"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"229","DOI":"10.1017\/S0263574711000567","article-title":"Self-adaptive Monte Carlo localization for mobile robots using range finders","volume":"30","author":"Zhang","year":"2012","journal-title":"Robotica"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"182","DOI":"10.1016\/j.neucom.2019.01.023","article-title":"Using FTOC to track shuttlecock for the badminton robot","volume":"334","author":"Chen","year":"2019","journal-title":"Neurocomputing"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"113833","DOI":"10.1016\/j.eswa.2020.113833","article-title":"Detecting the shuttlecock for a badminton robot: A YOLO based approach","volume":"164","author":"Cao","year":"2020","journal-title":"Expert Syst. Appl."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"421","DOI":"10.1177\/0278364917710318","article-title":"Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection","volume":"37","author":"Levine","year":"2018","journal-title":"Int. J. Robot. Res."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Henderson, P., Islam, R., Bachman, P., Pineau, J., Precup, D., and Meger, D. (2018). Deep Reinforcement Learning that Matters. arXiv.","DOI":"10.1609\/aaai.v32i1.11694"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Fadzli, S.A., Abdulkadir, S.I., Makhtar, M., and Jamal, A.A. (2015, January 14\u201316). Robotic Indoor Path Planning Using Dijkstra\u2019s Algorithm with Multi-Layer Dictionaries. 
Proceedings of the 2015 2nd International Conference on Information Science and Security (ICISS), Seoul, Korea.","DOI":"10.1109\/ICISSEC.2015.7371031"},{"key":"ref_9","unstructured":"Kavraki, L.E., and Latombe, J.C. (1998). Probabilistic roadmaps for robot path planning. Practical Motion Planning in Robotics: Current Approaches and Future Challenges, Wiley."},{"key":"ref_10","first-page":"11","article-title":"Rapidly-exploring random trees: A new tool for path planning","volume":"98","author":"Lavalle","year":"1998","journal-title":"Comput. Sci. Dept. Oct."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"1688","DOI":"10.1109\/TITS.2015.2498160","article-title":"Finding the Shortest Path in Stochastic Vehicle Routing: A Cardinality Minimization Approach","volume":"17","author":"Cao","year":"2016","journal-title":"IEEE Trans. Intell. Transp. Syst."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"3993","DOI":"10.1109\/TVT.2015.2480964","article-title":"Improving the Efficiency of Stochastic Vehicle Routing: A Partial Lagrange Multiplier Method","volume":"65","author":"Cao","year":"2016","journal-title":"IEEE Trans. Veh. Technol."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"501","DOI":"10.1109\/70.163777","article-title":"Exact robot navigation using artificial potential functions","volume":"8","author":"Rimon","year":"1992","journal-title":"IEEE Trans. Robot. Autom."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"23","DOI":"10.1109\/100.580977","article-title":"The dynamic window approach to collision avoidance","volume":"4","author":"Fox","year":"1997","journal-title":"IEEE Robot. Autom. Mag."},{"key":"ref_15","unstructured":"R\u00f6smann, C., Feiten, W., W\u00f6sch, T., Hoffmann, F., and Bertram, T. (2012, January 21\u201322). Trajectory modification considering dynamic constraints of autonomous robots. 
Proceedings of the 7th German Conference on Robotics, Munich, Germany."},{"key":"ref_16","first-page":"961","article-title":"Survey on technology of mobile robot path planning","volume":"25","author":"Zhu","year":"2010","journal-title":"Control Decis."},{"key":"ref_17","unstructured":"Wu, Y., Song, W., Cao, Z., Zhang, J., and Lim, A. (2019). Learning Improvement Heuristics for Solving Routing Problems. arXiv."},{"key":"ref_18","unstructured":"Bao, Q.Y., Li, S.M., Shen, H., and Men, X.H. (2009). Survey of local path planning of autonomous mobile robot. Transducer Microsyst. Technol., 9."},{"key":"ref_19","first-page":"1","article-title":"Present Situation and Future Development of Mobile Robot Path Planning Technology","volume":"10","author":"Xu","year":"2006","journal-title":"Comput. Simul."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Tai, L., and Liu, M. (2016). Towards Cognitive Exploration through Deep Reinforcement Learning for Mobile Robots. arXiv.","DOI":"10.1186\/s40638-016-0055-x"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Pfeiffer, M., Schaeuble, M., Nieto, J., Siegwart, R., and Cadena, C. (June, January 29). From perception to decision: A data-driven approach to end-to-end motion planning for autonomous ground robots. Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore.","DOI":"10.1109\/ICRA.2017.7989182"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Tai, L., and Liu, M. (2016, January 6\u201310). A robot exploration strategy based on Q-learning network. Proceedings of the 2016 IEEE International Conference on Real-time Computing and Robotics (RCAR), Angkor Wat, Cambodia.","DOI":"10.1109\/RCAR.2016.7784001"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Tai, L., Paolo, G., and Liu, M. (2017, January 24\u201328). Virtual-to-real deep reinforcement learning: Continuous control of mobile robots for mapless navigation. 
Proceedings of the 2017 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada.","DOI":"10.1109\/IROS.2017.8202134"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Faust, A., Ram\u00edrez, O., Fiser, M., Oslund, K., Francis, A., Davidson, J.O., and Tapia, L. (2018, January 21\u201325). PRM-RL: Long-range Robotic Navigation Tasks by Combining Reinforcement Learning and Sampling-Based Planning. Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia.","DOI":"10.1109\/ICRA.2018.8461096"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"1115","DOI":"10.1109\/TRO.2020.2975428","article-title":"Long-Range Indoor Navigation With PRM-RL","volume":"36","author":"Francis","year":"2020","journal-title":"IEEE Trans. Robot."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Zeng, J., Qin, L., Hu, Y., Yin, Q., and Hu, C. (2019). Integrating a Path Planner and an Adaptive Motion Controller for Navigation in Dynamic Environments. Appl. Sci., 9.","DOI":"10.3390\/app9071384"},{"key":"ref_27","unstructured":"Iyer, A., and Mahadevan, A. (2020). Collision Avoidance Robotics Via Meta-Learning (CARML). arXiv."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"33","DOI":"10.1007\/s10846-018-0809-5","article-title":"A Real-Time 3D Path Planning Solution for Collision-Free Navigation of Multirotor Aerial Robots in Dynamic Environments","volume":"93","author":"Wang","year":"2019","journal-title":"J. Intell. Robot. Syst."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Kim, M., Han, D.K., Park, J., and Kim, J.S. (2020). Motion Planning of Robot Manipulators for a Smoother Path Using a Twin Delayed Deep Deterministic Policy Gradient with Hindsight Experience Replay. Appl. 
Sci., 10.","DOI":"10.3390\/app10020575"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Zhu, Y., Mottaghi, R., Kolve, E., Lim, J.J., Gupta, A., Fei-Fei, L., and Farhadi, A. (June, January 29). Target-driven visual navigation in indoor scenes using deep reinforcement learning. Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore.","DOI":"10.1109\/ICRA.2017.7989381"},{"key":"ref_31","unstructured":"Stooke, A., and Abbeel, P. (2018). Accelerated Methods for Deep Reinforcement Learning. arXiv."},{"key":"ref_32","unstructured":"Florensa, C., Held, D., Wulfmeier, M., Zhang, M., and Abbeel, P. (2017). Reverse Curriculum Generation for Reinforcement Learning. arXiv."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"4298","DOI":"10.1109\/LRA.2019.2931199","article-title":"RL-RRT: Kinodynamic Motion Planning via Learning Reachability Estimators From RL Policies","volume":"4","author":"Chiang","year":"2019","journal-title":"IEEE Robot. Autom. Lett."},{"key":"ref_34","unstructured":"Fujimoto, S., Hoof, H., and Meger, D. (2018, January 10\u201315). Addressing Function Approximation Error in Actor-Critic Methods. Proceedings of the International Conference on Machine Learning, Stockholm, Sweden."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Bandyopadhyay, T., Won, K., Frazzoli, E., Hsu, D., Lee, W.S., and Rus, D. (2013). Intention-Aware Motion Planning. Algorithmic Foundations of Robotics X, Springer.","DOI":"10.1007\/978-3-642-36279-8_29"},{"key":"ref_36","unstructured":"Quigley, M. (2009, January 12\u201317). ROS: An open-source Robot Operating System. Proceedings of the ICRA 2009, Kobe, Japan."},{"key":"ref_37","unstructured":"Brockman, G., Cheung, V., Pettersson, L., Schneider, J., Schulman, J., Tang, J., and Zaremba, W. (2016). OpenAI Gym. arXiv."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Ye, W. (2020). 
Research on Path Planning of Indoor Mobile Robot Based on Deep Reinforcement Learning. [Master\u2019s Thesis, Guangdong University of Technology].","DOI":"10.1109\/ICMA49215.2020.9233738"},{"key":"ref_39","unstructured":"Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and Klimov, O. (2017). Proximal Policy Optimization Algorithms. arXiv."},{"key":"ref_40","unstructured":"Lillicrap, T., Hunt, J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., and Wierstra, D. (2016). Continuous control with deep reinforcement learning. arXiv."},{"key":"ref_41","unstructured":"Haarnoja, T., Zhou, A., Abbeel, P., and Levine, S. (2018). Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. arXiv."},{"key":"ref_42","unstructured":"Konda, V.R., and Tsitsiklis, J. (December, January 29). Actor-Critic Algorithms. Proceedings of the NIPS 1999, Denver, CO, USA."},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"1958","DOI":"10.1109\/TITS.2016.2613997","article-title":"A Unified Framework for Vehicle Rerouting and Traffic Light Control to Reduce Traffic Congestion","volume":"18","author":"Cao","year":"2017","journal-title":"IEEE Trans. Intell. Transp. Syst."},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Cao, Z., Guo, H., Zhang, J., and Fastenrath, U. (2016, January 12\u201317). Multiagent-Based Route Guidance for Increasing the Chance of Arrival on Time. 
Proceedings of the AAAI 2016, Phoenix, AZ, USA.","DOI":"10.1609\/aaai.v30i1.9893"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/20\/19\/5493\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T10:13:32Z","timestamp":1760177612000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/20\/19\/5493"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,9,25]]},"references-count":44,"journal-issue":{"issue":"19","published-online":{"date-parts":[[2020,10]]}},"alternative-id":["s20195493"],"URL":"https:\/\/doi.org\/10.3390\/s20195493","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,9,25]]}}}