{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,6]],"date-time":"2026-05-06T12:30:28Z","timestamp":1778070628014,"version":"3.51.4"},"reference-count":38,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2026,1,4]],"date-time":"2026-01-04T00:00:00Z","timestamp":1767484800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2026,1,20]],"date-time":"2026-01-20T00:00:00Z","timestamp":1768867200000},"content-version":"vor","delay-in-days":16,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100014013","name":"UK Research and Innovation","doi-asserted-by":"crossref","award":["EP\/W020408\/1"],"award-info":[{"award-number":["EP\/W020408\/1"]}],"id":[{"id":"10.13039\/100014013","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Neural Process Lett"],"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Path selection and planning are crucial for autonomous mobile robots (AMRs) to navigate efficiently and avoid obstacles. Traditional methods rely on analytical search to identify the shortest distance. However, Reinforcement learning enhances performance by optimizing a sequence of actions efficiently. It is an iterative approach used for computational sequence modeling and dynamic programming. RL received sensory input from the environment in the form of observation or state. The agent interpreted every reward or penalty through trial-and-error interaction. Policy maximizes the rewards and selects the optimal action among all possible actions. A challenging problem in traditional reinforcement learning is environment generalization for dynamic systems. 
Q-learning faces challenges in dynamic environments because it relies on rewards or penalties based on the entire sequence of actions from the start to the end state. This approach often fails to produce optimal results when the environment changes unexpectedly due to state transitions, iterations, or blocked routes. Such limitations make Q-learning less effective for dynamic path planning. To overcome these challenges, this study focuses on optimizing reward functions in RL-based path planning to improve navigation efficiency and obstacle avoidance. The proposed method evaluates the shortest decision path by considering total steps, counted steps, and discount rates in dynamic environments. Using RL with this optimized reward mechanism, the study analyzes state reward values across different environments and evaluates the effect on state-action pair-based Q-learning and on neural networks using Deep Q-Learning algorithms. The results demonstrate that the optimized reward function decreases the number of iterations and episodes while achieving a 30% to 70% reduction in overall trajectory distance. These results highlight the effectiveness of reward-based reinforcement learning and its potential to improve path optimization, learning rate, episode completion, and decision accuracy in intelligent navigation systems. 
Q-learning-based reinforcement learning becomes more effective by combining multiple agents and utilizing decision-making techniques such as federated and transfer learning on larger maps to ensure convergence.<\/jats:p>","DOI":"10.1007\/s11063-025-11821-2","type":"journal-article","created":{"date-parts":[[2026,1,4]],"date-time":"2026-01-04T08:54:38Z","timestamp":1767516878000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Reinforcement Learning-Based Intelligent Path Planning for Optimal Navigation in Dynamic Environments"],"prefix":"10.1007","volume":"58","author":[{"given":"Anil Kumar","family":"Yadav","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Purushottam","family":"Sharma","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xiaochun","family":"Cheng","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shiv Shankar Prasad","family":"Shukla","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2026,1,4]]},"reference":[{"key":"11821_CR1","doi-asserted-by":"crossref","unstructured":"Mohanty PK, Singh AK, Kumar A, Mahto MK, Kundu S (2021, December) Path planning techniques for mobile robots: a review. In: International conference on soft computing and pattern recognition, pp 657\u2013667. Springer International Publishing, Cham","DOI":"10.1007\/978-3-030-96302-6_62"},{"key":"11821_CR2","doi-asserted-by":"publisher","DOI":"10.1016\/j.oceaneng.2021.109355","volume":"235","author":"C Cheng","year":"2021","unstructured":"Cheng C, Sha Q, He B, Li G (2021) Path planning and obstacle avoidance for AUV: a review. 
Ocean Eng 235:109355","journal-title":"Ocean Eng"},{"key":"11821_CR3","volume":"40","author":"A Loganathan","year":"2023","unstructured":"Loganathan A, Ahmad NS (2023) A systematic review on recent advances in autonomous mobile robot navigation. Int J Eng Sci Technol 40:101343","journal-title":"Int J Eng Sci Technol"},{"issue":"6","key":"11821_CR4","first-page":"648","volume":"43","author":"M Wu","year":"2023","unstructured":"Wu M, Yeong CF, Su ELM, Holderbaum W, Yang C (2023) A review on energy efficiency in autonomous mobile robots. Robot Intell Autom 43(6):648\u2013668","journal-title":"Robot Intell Autom"},{"key":"11821_CR5","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2023.120254","volume":"227","author":"L Liu","year":"2023","unstructured":"Liu L, Wang X, Yang X, Liu H, Li J, Wang P (2023) Path planning techniques for mobile robots: review and prospect. Expert Syst Appl 227:120254","journal-title":"Expert Syst Appl"},{"key":"11821_CR6","doi-asserted-by":"publisher","first-page":"149982","DOI":"10.1109\/ACCESS.2021.3125105","volume":"9","author":"OA Salama","year":"2021","unstructured":"Salama OA, Eltaib ME, Mohamed HA, Salah O (2021) RCD: radial cell decomposition algorithm for mobile robot path planning. IEEE Access 9:149982\u2013149992","journal-title":"IEEE Access"},{"key":"11821_CR7","doi-asserted-by":"publisher","DOI":"10.1016\/j.rcim.2021.102196","volume":"72","author":"G Chen","year":"2021","unstructured":"Chen G, Luo N, Liu D, Zhao Z, Liang C (2021) Path planning for manipulators based on an improved probabilistic roadmap method. 
Robot Comput Integr Manuf 72:102196","journal-title":"Robot Comput Integr Manuf"},{"issue":"4","key":"11821_CR8","doi-asserted-by":"publisher","first-page":"1558","DOI":"10.3390\/s22041558","volume":"22","author":"RMJA Souza","year":"2022","unstructured":"Souza RMJA, Lima GV, Morais AS, Oliveira-Lopes LC, Ramos DC, Tofoli FL (2022) Modified artificial potential field for the path planning of aircraft swarms in three-dimensional environments. Sensors 22(4):1558","journal-title":"Sensors"},{"key":"11821_CR9","doi-asserted-by":"crossref","unstructured":"Lindqvist B, Agha-Mohammadi AA, Nikolakopoulos G (2021, September) Exploration-RRT: a multi-objective path planning and exploration framework for unknown and unstructured environments. In: 2021 IEEE\/RSJ international conference on intelligent robots and systems (IROS), pp 3429\u20133435. IEEE.","DOI":"10.1109\/IROS51168.2021.9636243"},{"key":"11821_CR10","doi-asserted-by":"publisher","DOI":"10.1016\/j.cie.2023.109338","volume":"181","author":"ES Low","year":"2023","unstructured":"Low ES, Ong P, Low CY (2023) A modified Q-learning path planning approach using distortion concept and optimization in dynamic environment for autonomous mobile robot. Comput Ind Eng 181:109338","journal-title":"Comput Ind Eng"},{"issue":"17","key":"11821_CR11","doi-asserted-by":"publisher","first-page":"7654","DOI":"10.3390\/app14177654","volume":"14","author":"R Jaramillo-Mart\u00ednez","year":"2024","unstructured":"Jaramillo-Mart\u00ednez R, Chavero-Navarrete E, Ibarra-P\u00e9rez T (2024) Reinforcement-learning-based path planning: a reward function strategy. Appl Sci 14(17):7654","journal-title":"Appl Sci"},{"key":"11821_CR12","doi-asserted-by":"publisher","DOI":"10.1007\/s11831-021-09694-4","author":"AG Gad","year":"2022","unstructured":"Gad AG (2022) Particle swarm optimization algorithm and its applications: a systematic review. Arch Comput Methods Eng. 
https:\/\/doi.org\/10.1007\/s11831-021-09694-4","journal-title":"Arch Comput Methods Eng"},{"issue":"6","key":"11821_CR13","doi-asserted-by":"publisher","first-page":"26","DOI":"10.1109\/MSP.2017.2743240","volume":"34","author":"K Arulkumaran","year":"2017","unstructured":"Arulkumaran K, Deisenroth MP, Brundage M, Bharath AA (2017) Deep reinforcement learning: a brief survey. IEEE Signal Process Mag 34(6):26\u201338","journal-title":"IEEE Signal Process Mag"},{"key":"11821_CR14","doi-asserted-by":"publisher","DOI":"10.1007\/978-981-19-7784-8","volume-title":"Reinforcement learning for sequential decision and optimal control","author":"SE Li","year":"2023","unstructured":"Li SE (2023) Reinforcement learning for sequential decision and optimal control. Springer, Berlin"},{"issue":"2","key":"11821_CR15","doi-asserted-by":"publisher","first-page":"2277","DOI":"10.1007\/s11042-022-13290-4","volume":"82","author":"P Sharma","year":"2023","unstructured":"Sharma P, Alshehri M, Sharma R (2023) Activities tracking by smartphone and smartwatch biometric sensors using fuzzy set theory. Multimed Tools Appl 82(2):2277\u20132302. https:\/\/doi.org\/10.1007\/s11042-022-13290-4","journal-title":"Multimed Tools Appl"},{"key":"11821_CR16","doi-asserted-by":"publisher","DOI":"10.1016\/j.oceaneng.2022.112226","volume":"262","author":"W Lan","year":"2022","unstructured":"Lan W, Jin X, Chang X, Wang T, Zhou H, Tian W, Zhou L (2022) Path planning for underwater gliders in time-varying ocean current using deep reinforcement learning. Ocean Eng 262:112226","journal-title":"Ocean Eng"},{"issue":"1","key":"11821_CR17","doi-asserted-by":"publisher","first-page":"213","DOI":"10.1109\/TSG.2021.3119972","volume":"13","author":"Z Li","year":"2021","unstructured":"Li Z, Wu L, Xu Y, Moazeni S, Tang Z (2021) Multi-stage real-time operation of a multi-energy microgrid with electrical and thermal energy storage assets: a data-driven MPC-ADP approach. 
IEEE Trans Smart Grid 13(1):213\u2013226","journal-title":"IEEE Trans Smart Grid"},{"issue":"6","key":"11821_CR18","doi-asserted-by":"publisher","first-page":"7064","DOI":"10.1109\/TPWRS.2024.3371093","volume":"39","author":"H Gao","year":"2024","unstructured":"Gao H, Jiang S, Li Z, Wang R, Liu Y, Liu J (2024) A two-stage multi-agent deep reinforcement learning method for urban distribution network reconfiguration considering switch contribution. IEEE Trans Power Syst 39(6):7064\u20137076","journal-title":"IEEE Trans Power Syst"},{"issue":"5","key":"11821_CR19","doi-asserted-by":"publisher","first-page":"984","DOI":"10.1007\/s11431-020-1729-2","volume":"64","author":"C Xu","year":"2021","unstructured":"Xu C, Zhao W, Chen Q, Wang C (2021) An actor-critic based learning method for decision-making and planning of autonomous vehicles. Sci China Technol Sci 64(5):984\u2013994","journal-title":"Sci China Technol Sci"},{"issue":"3","key":"11821_CR20","doi-asserted-by":"publisher","first-page":"2525","DOI":"10.32604\/cmc.2021.012469","volume":"66","author":"M Alshehri","year":"2021","unstructured":"Alshehri M, Sharma P, Sharma R, Alfarraj O (2021) Motion-based activities monitoring through biometric sensors using genetic algorithm. Comput Mater Continua 66(3):2525\u20132538. https:\/\/doi.org\/10.32604\/cmc.2021.012469","journal-title":"Comput Mater Continua"},{"key":"11821_CR21","doi-asserted-by":"publisher","DOI":"10.1016\/j.dajour.2023.100314","volume":"8","author":"ES Low","year":"2023","unstructured":"Low ES, Ong P, Low CY (2023) An empirical evaluation of Q-learning in autonomous mobile robots in static and dynamic environments using simulation. 
Decis Anal J 8:100314","journal-title":"Decis Anal J"},{"key":"11821_CR22","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2022.117191","volume":"199","author":"ES Low","year":"2022","unstructured":"Low ES, Ong P, Low CY, Omar R (2022) Modified Q-learning with distance metric and virtual target on path planning of mobile robot. Expert Syst Appl 199:117191","journal-title":"Expert Syst Appl"},{"key":"11821_CR23","doi-asserted-by":"publisher","DOI":"10.1016\/j.asoc.2020.106796","volume":"97","author":"A Maoudj","year":"2020","unstructured":"Maoudj A, Hentout A (2020) Optimal path planning approach based on Q-learning algorithm for mobile robots. Appl Soft Comput 97:106796","journal-title":"Appl Soft Comput"},{"key":"11821_CR24","doi-asserted-by":"publisher","first-page":"143","DOI":"10.1016\/j.robot.2019.02.013","volume":"115","author":"ES Low","year":"2019","unstructured":"Low ES, Ong P, Cheah KC (2019) Solving the optimal path planning of a mobile robot using improved Q-learning. Robot Auton Syst 115:143\u2013161","journal-title":"Robot Auton Syst"},{"key":"11821_CR25","doi-asserted-by":"publisher","DOI":"10.1016\/j.oceaneng.2019.106299","volume":"189","author":"C Chen","year":"2019","unstructured":"Chen C, Chen X-Q, Ma F, Zeng X-J, Wang J (2019) A knowledge-free path planning approach for smart ships based on reinforcement learning. Ocean Eng 189:106299","journal-title":"Ocean Eng"},{"key":"11821_CR26","doi-asserted-by":"publisher","DOI":"10.1016\/j.robot.2024.104655","volume":"175","author":"F Huo","year":"2024","unstructured":"Huo F, Zhu S, Dong H, Ren W (2024) A new approach to smooth path planning of Ackerman mobile robot based on improved ACO algorithm and B-spline curve. 
Robot Auton Syst 175:104655","journal-title":"Robot Auton Syst"},{"key":"11821_CR27","first-page":"339","volume":"25","author":"M Elhoseny","year":"2018","unstructured":"Elhoseny M, Tharwat A, Hassanien AE (2018) Bezier curve-based path planning in a dynamic field using modified genetic algorithm J. Comput Sci 25:339\u2013350","journal-title":"Comput Sci"},{"issue":"3","key":"11821_CR28","doi-asserted-by":"publisher","first-page":"1532","DOI":"10.3390\/en16031532","volume":"16","author":"A Rapalski","year":"2023","unstructured":"Rapalski A, Dudzik S (2023) Energy consumption analysis of the selected navigation algorithms for wheeled mobile robots. Energies 16(3):1532","journal-title":"Energies"},{"issue":"2","key":"11821_CR29","doi-asserted-by":"publisher","first-page":"6177","DOI":"10.1016\/j.ifacol.2020.12.1704","volume":"53","author":"R Kubo","year":"2020","unstructured":"Kubo R, Fujii Y, Nakamura H (2020) Control Lyapunov function design for trajectory tracking problems of wheeled mobile robot. IFAC Pap Online 53(2):6177\u20136182","journal-title":"IFAC Pap Online"},{"issue":"11","key":"11821_CR30","doi-asserted-by":"publisher","first-page":"2048","DOI":"10.3390\/sym15112048","volume":"15","author":"Z Wu","year":"2023","unstructured":"Wu Z, Yin Y, Liu J, Zhang De, Chen J, Jiang W (2023) A novel path planning approach for mobile robot in radioactive environment based on improved deep Q network algorithm. Symmetry 15(11):2048","journal-title":"Symmetry"},{"issue":"1","key":"11821_CR31","volume":"2022","author":"W Wang","year":"2022","unstructured":"Wang W, Wu Z, Luo H, Zhang B (2022) Path planning method of mobile robot using improved deep reinforcement learning. 
J Electr Comput Eng 2022(1):5433988","journal-title":"J Electr Comput Eng"},{"key":"11821_CR32","doi-asserted-by":"publisher","first-page":"121922","DOI":"10.1109\/ACCESS.2019.2938240","volume":"7","author":"X Li","year":"2019","unstructured":"Li X, Lv Z, Wang S, Wei Z, Wu L (2019) A reinforcement learning model based on temporal difference algorithm. IEEE Access 7:121922\u2013121930","journal-title":"IEEE Access"},{"issue":"14","key":"11821_CR33","doi-asserted-by":"publisher","first-page":"2120","DOI":"10.3390\/electronics11142120","volume":"11","author":"X Zhang","year":"2022","unstructured":"Zhang X, Shi X, Zhang Z, Wang Z, Zhang L (2022) A DDQN path planning algorithm based on experience classification and multi steps for mobile robots. Electronics 11(14):2120","journal-title":"Electronics"},{"key":"11821_CR34","doi-asserted-by":"publisher","DOI":"10.1016\/j.rineng.2025.107750","author":"S Venu","year":"2025","unstructured":"Venu S, Gurusamy M (2025) A comprehensive review of path planning algorithms for autonomous navigation. Results Eng. https:\/\/doi.org\/10.1016\/j.rineng.2025.107750","journal-title":"Results Eng"},{"key":"11821_CR35","doi-asserted-by":"crossref","unstructured":"Chen Z, Sheng K, Zhou R, Dong H, Wang J (2024, September) Exploring urban UAV navigation: SAC-based static obstacle avoidance in height-restricted areas using a forward camera. In: 2024 6th international symposium on robotics and intelligent manufacturing technology (ISRIMT), pp 182\u2013186. IEEE","DOI":"10.1109\/ISRIMT63979.2024.10875300"},{"key":"11821_CR36","doi-asserted-by":"crossref","unstructured":"Benjumea DC (2024, June) Formalising safety requirements for robotic autonomous systems in highly regulated domains. In: 2024 IEEE 32nd international requirements engineering conference (RE), pp 512\u2013516. 
IEEE","DOI":"10.1109\/RE59067.2024.00066"},{"issue":"1","key":"11821_CR37","doi-asserted-by":"publisher","first-page":"3305430","DOI":"10.1155\/jece\/3305430","volume":"2025","author":"AK Yadav","year":"2025","unstructured":"Yadav AK, Sharma P, Cheng X, Gupta NK (2025) Hybrid reinforcement learning with optimized SARSA for improved face recognition systems. J Electr Comput Eng 2025(1):3305430","journal-title":"J Electr Comput Eng"},{"issue":"3","key":"11821_CR38","doi-asserted-by":"publisher","first-page":"1198","DOI":"10.1007\/s13198-021-01414-2","volume":"13","author":"AK Yadav","year":"2022","unstructured":"Yadav AK, Sharma P, Yadav RK (2022) A novel algorithm for wireless sensor network routing protocols based on reinforcement learning. Int J Syst Assur Eng Manag 13(3):1198\u20131204","journal-title":"Int J Syst Assur Eng Manag"}],"container-title":["Neural Processing Letters"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11063-025-11821-2","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11063-025-11821-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11063-025-11821-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,2,14]],"date-time":"2026-02-14T07:25:14Z","timestamp":1771053914000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11063-025-11821-2"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,1,4]]},"references-count":38,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2026,2]]}},"alternative-id":["11821"],"URL":"https:\/\/doi.org\/10.1007\/s11063-025-11821-2","relation":{},"ISSN":["1573-773X"],"issn-type":[{"value":"1573-773X","type":"e
lectronic"}],"subject":[],"published":{"date-parts":[[2026,1,4]]},"assertion":[{"value":"3 September 2025","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"10 December 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"4 January 2026","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"All images within this manuscript are original works created by the author(s) unless otherwise stated. The authors retain all copyrights to these images.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent to Publish"}},{"value":"This is an observational study. The research involves no human or animal subjects; therefore, no ethical approval is required.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethical Approval"}},{"value":"The authors declare no competing interests.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"10"}}