{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,31]],"date-time":"2026-01-31T09:03:35Z","timestamp":1769850215572,"version":"3.49.0"},"reference-count":37,"publisher":"MDPI AG","issue":"9","license":[{"start":{"date-parts":[[2020,9,22]],"date-time":"2020-09-22T00:00:00Z","timestamp":1600732800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61603406"],"award-info":[{"award-number":["61603406"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>Reinforcement learning, as a branch of machine learning, has been gradually applied in the control field. However, in the practical application of the algorithm, the hyperparametric approach to network settings for deep reinforcement learning still follows the empirical attempts of traditional machine learning (supervised learning and unsupervised learning). This method ignores part of the information generated by agents exploring the environment contained in the updating of the reinforcement learning value function, which will affect the performance of the convergence and cumulative return of reinforcement learning. The reinforcement learning algorithm based on dynamic parameter adjustment is a new method for setting learning rate parameters of deep reinforcement learning. 
Building on the traditional method of setting parameters for reinforcement learning, this method analyzes the advantages of different learning rates at different stages of reinforcement learning and dynamically adjusts the learning rate according to the temporal-difference (TD) error, so that each stage benefits from the learning rate best suited to it and the algorithm becomes more reasonable in practical application. At the same time, by combining the Robbins\u2013Monro approximation algorithm with the deep reinforcement learning algorithm, it is proved that the algorithm with a dynamically adjusted learning rate can theoretically meet the convergence requirements of the intelligent control algorithm. In the experiment, the effect of this method is analyzed in a continuous control scenario in the standard reinforcement learning environment \u201cCar-on-The-Hill\u201d, and it is verified that the new method can achieve better results than traditional reinforcement learning in practical application. According to the model characteristics of deep reinforcement learning, a more suitable method for setting the learning rate of the deep reinforcement learning network is proposed. The feasibility of the method has been proved both in theory and in application. 
Therefore, the method of setting the learning rate parameter is worthy of further development and research.<\/jats:p>","DOI":"10.3390\/a13090239","type":"journal-article","created":{"date-parts":[[2020,9,22]],"date-time":"2020-09-22T09:40:56Z","timestamp":1600767656000},"page":"239","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["Feasibility Analysis and Application of Reinforcement Learning Algorithm Based on Dynamic Parameter Adjustment"],"prefix":"10.3390","volume":"13","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3307-5490","authenticated-orcid":false,"given":"Menglin","family":"Li","sequence":"first","affiliation":[{"name":"College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xueqiang","family":"Gu","sequence":"additional","affiliation":[{"name":"College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chengyi","family":"Zeng","sequence":"additional","affiliation":[{"name":"College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yuan","family":"Feng","sequence":"additional","affiliation":[{"name":"College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2020,9,22]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"529","DOI":"10.1038\/nature14236","article-title":"Human-level control through deep reinforcement 
learning","volume":"518","author":"Mnih","year":"2015","journal-title":"Nature"},{"key":"ref_2","first-page":"2834","article-title":"A review of reinforcement learning research","volume":"27","author":"Chen","year":"2010","journal-title":"Appl. Res. Comput."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"860","DOI":"10.1038\/s41593-018-0147-8","article-title":"Prefrontal cortex as a meta-reinforcement learning system","volume":"21","author":"Wang","year":"2018","journal-title":"Nat. Neuroence"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"436","DOI":"10.1038\/nature14539","article-title":"Deep learning","volume":"521","author":"LeCun","year":"2015","journal-title":"Nature"},{"key":"ref_5","unstructured":"Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., and Riedmiller, M.A. (2013). Playing Atari with Deep Reinforcement Learning. arXiv."},{"key":"ref_6","unstructured":"Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G.S., Davis, A., Dean, J., and Devin, M. (2016). TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems. arXiv."},{"key":"ref_7","unstructured":"Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., and Antiga, L. (2019, January 8\u201314). PyTorch: An Imperative Style, High-Performance Deep Learning Library. Proceedings of the Advances in Neural Information Processing Systems 32 (NIPS 2019), Vancouver, BC, Canada."},{"key":"ref_8","first-page":"255","article-title":"Model Selection and Hyper-parameter Optimization based on Reinforcement learning","volume":"49","author":"Jia","year":"2020","journal-title":"J. Univ. Electron. Sci. Technol. China"},{"key":"ref_9","first-page":"281","article-title":"Random Search for Hyper-Parameter Optimization","volume":"13","author":"Bergstra","year":"2012","journal-title":"J. Mach. Learn. 
Res."},{"key":"ref_10","unstructured":"Jomaa, H.S., Grabocka, J., and Schmidt-Thieme, L. (2019). Hyp-RL: Hyperparameter Optimization by Reinforcement Learning. arXiv."},{"key":"ref_11","unstructured":"Bernstein, A., Chen, Y., Colombino, M., Dall\u2019Anese, E., Mehta, P., and Meyn, S.P. (2019). Optimal Rate of Convergence for Quasi-Stochastic Approximation. arXiv."},{"key":"ref_12","unstructured":"Pohlen, T., Piot, B., Hester, T., Azar, M.G., Horgan, D., Budden, D., Barth-Maron, G., van Hasselt, H., Quan, J., and Vecer\u00edk, M. (2018). Observe and Look Further: Achieving Consistent Performance on Atari. arXiv."},{"key":"ref_13","unstructured":"Marco, W., and Otterlo, V. (2012). Reinforcement Learning: State of the Art, Springer."},{"key":"ref_14","unstructured":"Leslie, K. (1993). Learning in Embedded Systems, MIT Press."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"237","DOI":"10.1613\/jair.301","article-title":"Reinforcement Learning: A Survey","volume":"4","author":"Kaelbling","year":"1996","journal-title":"J. Artif. Intell. Res."},{"key":"ref_16","unstructured":"Watkins, C. (1989). Learning From Delayed Rewards. [Ph.D. Thesis, University of Cambridge]."},{"key":"ref_17","first-page":"226","article-title":"Incremental multi-step Q-learning","volume":"22","author":"Peng","year":"1996","journal-title":"Mach. Learn. Proc. 1994"},{"key":"ref_18","first-page":"190","article-title":"Design of Heuristic Return Function in Reinforcement Learning Algorithm and Its Convergence Analysis","volume":"32","author":"Yingzi","year":"2005","journal-title":"Comput. Sci."},{"key":"ref_19","unstructured":"Liu, S., Grzelak, L., and Oosterlee, C. (2020). The Seven-League Scheme: Deep learning for large time step Monte Carlo simulations of stochastic differential equations. 
arXiv."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"353","DOI":"10.1080\/00401706.1995.10484354","article-title":"Markov Decision Processes: Discrete Stochastic Dynamic Programming","volume":"37","author":"Baxter","year":"1995","journal-title":"Technometrics"},{"key":"ref_21","first-page":"1345","article-title":"Rationality, optimism and guarantees in general reinforcement learning","volume":"16","author":"Sunehag","year":"2015","journal-title":"J. Mach. Learn. Res."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.jet.2004.03.008","article-title":"On the convergence of reinforcement learning","volume":"122","author":"Beggs","year":"2002","journal-title":"J. Econ. Theory"},{"key":"ref_23","unstructured":"Matignon, L., Laurent, G.J., and Fort-Piat, N.L. (November, January 29). Hysteretic q-learning: An algorithm for decentralized reinforcement learning in cooperative multi-agent teams. Proceedings of the IEEE\/RSJ International Conference on Intelligent Robots and Systems, San Diego, CA, USA."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1115\/1.3426540","article-title":"A Rationale for Fuzzy Control","volume":"94","author":"Zadeh","year":"1972","journal-title":"J. Dyn. Syst. Meas. Control"},{"key":"ref_25","first-page":"13","article-title":"Research and development of parameter self-adjusting method for fuzzy controller","volume":"1","author":"Yingshi","year":"2006","journal-title":"Harbin Railw. Sci. Technol."},{"key":"ref_26","first-page":"1","article-title":"Comparison Between Genetic Fuzzy Methodology and Q-Learning for Collaborative Control Design","volume":"10","author":"Sathyan","year":"2019","journal-title":"Int. J. Artif. Intell. Appl."},{"key":"ref_27","first-page":"1582","article-title":"Convergence analysis of multi steps reinforcement learning algorithm","volume":"47","author":"Rui","year":"2019","journal-title":"Comput. Digit. 
Eng."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Sutton, R., and Barto, A. (1998). Reinforcement Learning: An Introduction, MIT Press.","DOI":"10.1109\/TNN.1998.712192"},{"key":"ref_29","unstructured":"Brockman, G., Cheung, V., Pettersson, L., Schneider, J., Schulman, J., Tang, J., and Zaremba, W. (2016). OpenAI Gym. arXiv."},{"key":"ref_30","first-page":"1107","article-title":"Least-Squares Policy Iteration","volume":"4","author":"Lagoudakis","year":"2003","journal-title":"J. Mach. Learn. Res."},{"key":"ref_31","unstructured":"Michail, L., and Ronald, P. (2020, September 21). Reinforcement Learning as Classification: Leveraging Modern Classifiers. Available online: https:\/\/www.aaai.org\/Papers\/ICML\/2003\/ICML03-057.pdf."},{"key":"ref_32","first-page":"600","article-title":"Deep reinforcement learning method based on resampling optimization cache experience playback mechanism","volume":"33","author":"Xiliang","year":"2018","journal-title":"Control Decis."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"199","DOI":"10.1007\/BF00993591","article-title":"The Parti game Algorithm for Variable Resolution Reinforcement Learning in Multidimensional State spaces","volume":"21","author":"Moore","year":"1995","journal-title":"Mach. Learn."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"455","DOI":"10.1057\/jors.1984.92","article-title":"Problem Complexity and Method Efficiency in Optimizationby (AS Nemirovsky and DB Yudin)","volume":"35","author":"Darzentas","year":"1984","journal-title":"J. Oper. Res. Soc."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Zhang, T. (2004, January 4\u20138). Solving Large Scale Linear Prediction Problems Using Stochastic 2004. Proceedings of the Twenty-First International Conference on MACHINE Learning, Banff, AB, Canada.","DOI":"10.1145\/1015330.1015332"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Kolesnikov, A., Zhai, X., and Beyer, L. 
(2019, January 16\u201320). Revisiting Self-Supervised Visual Representation Learning. Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00202"},{"key":"ref_37","unstructured":"Lu, Y. (2015). Unsupervised Learning on Neural Network Outputs. arXiv."}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/13\/9\/239\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T10:12:16Z","timestamp":1760177536000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/13\/9\/239"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,9,22]]},"references-count":37,"journal-issue":{"issue":"9","published-online":{"date-parts":[[2020,9]]}},"alternative-id":["a13090239"],"URL":"https:\/\/doi.org\/10.3390\/a13090239","relation":{},"ISSN":["1999-4893"],"issn-type":[{"value":"1999-4893","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,9,22]]}}}