{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,3]],"date-time":"2026-04-03T05:16:08Z","timestamp":1775193368882,"version":"3.50.1"},"reference-count":31,"publisher":"Springer Science and Business Media LLC","issue":"4","license":[{"start":{"date-parts":[[2022,12,30]],"date-time":"2022-12-30T00:00:00Z","timestamp":1672358400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,12,30]],"date-time":"2022-12-30T00:00:00Z","timestamp":1672358400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61906021"],"award-info":[{"award-number":["61906021"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Complex Intell. Syst."],"published-print":{"date-parts":[[2023,8]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>In order to achieve collision-free path planning in complex environment, Munchausen deep Q-learning network (M-DQN) is applied to mobile robot to learn the best decision. On the basis of Soft-DQN, M-DQN adds the scaled log-policy to the immediate reward. The method allows agent to do more exploration. However, the M-DQN algorithm has the problem of slow convergence. A new and improved M-DQN algorithm (DM-DQN) is proposed in the paper to address the problem. First, its network structure was improved on the basis of M-DQN by decomposing the network structure into a value function and an advantage function, thus decoupling action selection and action evaluation and speeding up its convergence, giving it better generalization performance and enabling it to learn the best decision faster. Second, to address the problem of the robot\u2019s trajectory being too close to the edge of the obstacle, a method of using an artificial potential field to set a reward function is proposed to drive the robot\u2019s trajectory away from the vicinity of the obstacle. The result of simulation experiment shows that the method learns more efficiently and converges faster than DQN, Dueling DQN and M-DQN in both static and dynamic environments, and is able to plan collision-free paths away from obstacles.<\/jats:p>","DOI":"10.1007\/s40747-022-00948-7","type":"journal-article","created":{"date-parts":[[2022,12,30]],"date-time":"2022-12-30T10:14:45Z","timestamp":1672395285000},"page":"4287-4300","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":44,"title":["DM-DQN: Dueling Munchausen deep Q network for robot path planning"],"prefix":"10.1007","volume":"9","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3148-9414","authenticated-orcid":false,"given":"Yuwan","family":"Gu","sequence":"first","affiliation":[]},{"given":"Zhitao","family":"Zhu","sequence":"additional","affiliation":[]},{"given":"Jidong","family":"Lv","sequence":"additional","affiliation":[]},{"given":"Lin","family":"Shi","sequence":"additional","affiliation":[]},{"given":"Zhenjie","family":"Hou","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7119-1006","authenticated-orcid":false,"given":"Shoukun","family":"Xu","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,12,30]]},"reference":[{"key":"948_CR1","doi-asserted-by":"publisher","first-page":"3","DOI":"10.1007\/978-3-319-77042-0_1","volume":"772","author":"A Koubaa","year":"2018","unstructured":"Koubaa A, Bennaceur H, Chaari I et al (2018) Introduction to mobile robot path planning. Robot Path Plan Cooperation 772:3\u201312","journal-title":"Robot Path Plan Cooperation"},{"key":"948_CR2","first-page":"1398","volume":"2","author":"Y Koren","year":"1991","unstructured":"Koren Y, Borenstein J (1991) Potential field methods and their inherent limitations for mobile robot navigation. IEEE Int Conf Robot Automation 2:1398\u20131404","journal-title":"IEEE Int Conf Robot Automation"},{"key":"948_CR3","doi-asserted-by":"publisher","first-page":"637","DOI":"10.1016\/j.apm.2022.03.014","volume":"107","author":"XL Fu","year":"2022","unstructured":"Fu XL, Huang JZ, Jing ZL (2022) Complex switching dynamics and chatter alarm for aerial agents with artificial potential field method. Appl Math Model 107:637\u2013649","journal-title":"Appl Math Model"},{"issue":"3","key":"948_CR4","doi-asserted-by":"crossref","first-page":"65","DOI":"10.14569\/IJARAI.2013.020310","volume":"2","author":"A Reshamwala","year":"2013","unstructured":"Reshamwala A, Vinchurkar DP (2013) robot path planning using an ant colony optimization approach: a survey. Int J Adv Res Artif Intell 2(3):65\u201371","journal-title":"Int J Adv Res Artif Intell"},{"key":"948_CR5","doi-asserted-by":"publisher","first-page":"269","DOI":"10.1007\/s00500-006-0068-4","volume":"11","author":"O Castillo","year":"2007","unstructured":"Castillo O, Leonardo T, Patricia M (2007) Multiple objective genetic algorithms for path-planning optimization in autonomous mobile robots. Soft Comput 11:269\u2013279","journal-title":"Soft Comput"},{"issue":"1","key":"948_CR6","doi-asserted-by":"publisher","first-page":"58","DOI":"10.1109\/4235.985692","volume":"6","author":"M Clerc","year":"2002","unstructured":"Clerc M, Kennedy J (2002) The particle swarm explosion, stability, and convergence in a multidimensional complex space. IEEE Trans Evolution Comput 6(1):58\u201373","journal-title":"IEEE Trans Evolution Comput"},{"issue":"2","key":"948_CR7","doi-asserted-by":"publisher","first-page":"401","DOI":"10.1016\/j.ejor.2021.07.016","volume":"298","author":"RN Boute","year":"2022","unstructured":"Boute RN, Gijsbrechts J, van Jaarsveld W et al (2022) Deep reinforcement learning for inventory control: a roadmap. Eur J Oper Res 298(2):401\u2013412","journal-title":"Eur J Oper Res"},{"key":"948_CR8","doi-asserted-by":"publisher","first-page":"13","DOI":"10.1016\/j.neunet.2022.05.013","volume":"153","author":"T Rupprecht","year":"2022","unstructured":"Rupprecht T, Yanzhi W (2022) A survey for deep reinforcement learning in markovian cyber-physical systems: Common problems and solutions. Neural Netw Off J Int Neural Netw Soc 153:13\u201336","journal-title":"Neural Netw Off J Int Neural Netw Soc"},{"key":"948_CR9","doi-asserted-by":"publisher","first-page":"19572","DOI":"10.1109\/ACCESS.2022.3151248","volume":"10","author":"A Halbouni","year":"2022","unstructured":"Halbouni A, Gunawan TS, Habaebi MH et al (2022) Machine learning and deep learning approaches for cybersecurity: a review. IEEE Access 10:19572\u201319585","journal-title":"IEEE Access"},{"key":"948_CR10","doi-asserted-by":"publisher","first-page":"411","DOI":"10.1146\/annurev-control-042920-020211","volume":"5","author":"L Brunke","year":"2022","unstructured":"Brunke L, Greeff M, Hall AW et al (2022) Safe Learning in robotics: from learning-based control to safe reinforcement learning. Annu Rev Control Robot Autonom Syst 5:411\u2013444","journal-title":"Annu Rev Control Robot Autonom Syst"},{"issue":"6","key":"948_CR11","first-page":"1406","volume":"42","author":"JW Liu","year":"2019","unstructured":"Liu JW, Gao F, Luo XL (2019) Survey of deep reinforcement learning based on value function and policy gradient. Chin J Comput 42(6):1406\u20131438","journal-title":"Chin J Comput"},{"issue":"7540","key":"948_CR12","doi-asserted-by":"publisher","first-page":"529","DOI":"10.1038\/nature14236","volume":"518","author":"V Mnih","year":"2015","unstructured":"Mnih V et al (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529\u2013533","journal-title":"Nature"},{"key":"948_CR13","doi-asserted-by":"publisher","first-page":"279","DOI":"10.1007\/BF00992698","volume":"8","author":"CJCH Watkins","year":"1992","unstructured":"Watkins CJCH, Dayan P (1992) Q-learning. Mach Learn 8:279\u2013292","journal-title":"Mach Learn"},{"key":"948_CR14","unstructured":"Wang Z, Schaul T et al (2016) Dueling network architectures for deep reinforcement learning. In: Proceedings of the 33rd international conference on international conference on machine learning. IEEE"},{"key":"948_CR15","unstructured":"Haarnoja T, Zhou A, Abbeel P, et al (2018) Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor. In: 35th International conference on machine learning"},{"key":"948_CR16","unstructured":"Vieillard N, Pietquin O, Geist M (2020) Munchausen reinforcement learning. In: 34th advances in neural information processing systems"},{"issue":"3","key":"948_CR17","doi-asserted-by":"publisher","first-page":"749","DOI":"10.1109\/JSAC.2022.3142348","volume":"40","author":"SH Liu","year":"2022","unstructured":"Liu SH, Zheng C, Huang YM et al (2022) Distributed reinforcement learning for privacy-preserving dynamic edge caching. IEEE J Sel Areas Commun 40(3):749\u2013760","journal-title":"IEEE J Sel Areas Commun"},{"key":"948_CR18","first-page":"552","volume":"42","author":"Y Dong","year":"2021","unstructured":"Dong Y, Yang C et al (2021) Robot path planning based on improved DQN. J Comput Des Eng 42:552\u2013558","journal-title":"J Comput Des Eng"},{"key":"948_CR19","doi-asserted-by":"publisher","first-page":"94","DOI":"10.1016\/j.neucom.2021.12.039","volume":"474","author":"HL Wu","year":"2022","unstructured":"Wu HL, Zhang JW, Wang Z et al (2022) Sub-AVG: overestimation reduction for cooperative multi-agent reinforcement learning. Neurocomputing 474:94\u2013106","journal-title":"Neurocomputing"},{"key":"948_CR20","doi-asserted-by":"publisher","DOI":"10.1002\/oca.2781","author":"RN Huang","year":"2021","unstructured":"Huang RN, Qin CX, Li JL, Lan XJ (2021) Path planning of mobile robot in unknown dynamic continuous environment using reward-modified deep Q-network. Optim Control Appl Methods. https:\/\/doi.org\/10.1002\/oca.2781","journal-title":"Optim Control Appl Methods"},{"issue":"6","key":"948_CR21","doi-asserted-by":"publisher","first-page":"5773","DOI":"10.3233\/JIFS-192171","volume":"41","author":"P Lou","year":"2021","unstructured":"Lou P, Xu K et al (2021) Path planning in an unknown environment based on deep reinforcement learning with prior knowledge. J Intell Fuzzy Syst 41(6):5773\u20135789","journal-title":"J Intell Fuzzy Syst"},{"key":"948_CR22","first-page":"1","volume":"2021","author":"N Yan","year":"2021","unstructured":"Yan N, Huang SB, Kong C (2021) Reinforcement learning-based autonomous navigation and obstacle avoidance for USVS under partially observable conditions. Math Problems Eng 2021:1\u201313","journal-title":"Math Problems Eng"},{"issue":"2","key":"948_CR23","doi-asserted-by":"publisher","first-page":"297","DOI":"10.1007\/s10846-019-01073-3","volume":"98","author":"C Yan","year":"2020","unstructured":"Yan C, Xiang XJ, Wang C (2020) Towards real-time path planning through deep reinforcement learning for a UAV in dynamic environments. J Intell Rob Syst 98(2):297\u2013309","journal-title":"J Intell Rob Syst"},{"issue":"10","key":"948_CR24","doi-asserted-by":"publisher","first-page":"7788","DOI":"10.1109\/TIE.2018.2884240","volume":"66","author":"YB Hu","year":"2018","unstructured":"Hu YB, Wu XY, Geng P et al (2018) Evolution strategies learning with variable impedance control for grasping under uncertainty. IEEE Trans Ind Electron 66(10):7788\u20137799","journal-title":"IEEE Trans Ind Electron"},{"issue":"12","key":"948_CR25","doi-asserted-by":"publisher","first-page":"12636","DOI":"10.1109\/TIE.2020.3044776","volume":"68","author":"YB Hu","year":"2020","unstructured":"Hu YB, Su H, Fu JL et al (2020) Nonlinear model predictive control for mobile medical robot using neural optimization. IEEE Trans Ind Electron 68(12):12636\u201312645","journal-title":"IEEE Trans Ind Electron"},{"issue":"11","key":"948_CR26","doi-asserted-by":"publisher","first-page":"2058","DOI":"10.1111\/2041-210X.13692","volume":"12","author":"I Chades","year":"2021","unstructured":"Chades I, Pascal LV, Nicol S et al (2021) A primer on partially observable Markov decision processes. Methods Ecol Evol 12(11):2058\u20132072","journal-title":"Methods Ecol Evol"},{"key":"948_CR27","doi-asserted-by":"publisher","first-page":"72","DOI":"10.1016\/j.spl.2016.01.007","volume":"111","author":"PG Sankaran","year":"2016","unstructured":"Sankaran PG, Sunoj SM, Nair NU (2016) Kullback\u2013Leibler divergence: a quantile approach. Stat Prob Lett 111:72\u201379","journal-title":"Stat Prob Lett"},{"key":"948_CR28","unstructured":"Schulman J, Levine S, Abbeel P, Jordan M, Moritz P (2015) Trust region policy optimization. In: 32nd International conference on machine learning"},{"key":"948_CR29","unstructured":"Abdolmaleki A, Springenberg JT, Tassa Y, Munos R, Heess N, Riedmiller M (2018) Maximum a posteriori policy optimisation. In: 8th International conference on learning representations"},{"key":"948_CR30","unstructured":"Hasselt HV, Guez A, Silver D (2016) Deep reinforcement learning with double Q-learning. In: The association for the advancement of artificial intelligence"},{"issue":"1","key":"948_CR31","doi-asserted-by":"publisher","first-page":"90","DOI":"10.1177\/027836498600500106","volume":"5","author":"O Khatib","year":"1986","unstructured":"Khatib O (1986) Real-time obstacle avoidance for manipulators and mobile robots. Int J Robot Res 5(1):90\u201398","journal-title":"Int J Robot Res"}],"container-title":["Complex &amp; Intelligent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-022-00948-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s40747-022-00948-7\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-022-00948-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,12,3]],"date-time":"2023-12-03T19:13:23Z","timestamp":1701630803000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s40747-022-00948-7"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,12,30]]},"references-count":31,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2023,8]]}},"alternative-id":["948"],"URL":"https:\/\/doi.org\/10.1007\/s40747-022-00948-7","relation":{},"ISSN":["2199-4536","2198-6053"],"issn-type":[{"value":"2199-4536","type":"print"},{"value":"2198-6053","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,12,30]]},"assertion":[{"value":"1 April 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"3 December 2022","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"30 December 2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}