{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,11]],"date-time":"2026-03-11T19:00:19Z","timestamp":1773255619463,"version":"3.50.1"},"reference-count":28,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2022,3,22]],"date-time":"2022-03-22T00:00:00Z","timestamp":1647907200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>In the field of reinforcement learning, we propose a Correct Proximal Policy Optimization (CPPO) algorithm based on a modified penalty factor \u03b2 and relative entropy, in order to address the robustness and stationarity problems of traditional algorithms. Firstly, in the process of reinforcement learning, this paper establishes a strategy evaluation mechanism through the policy distribution function. Secondly, the state space function is quantified by introducing entropy, whereby the approximation policy is used to approximate the real policy distribution, and kernel function estimation together with the calculation of relative entropy is used to fit the reward function for complex problems. Finally, through a comparative analysis on classic test cases, we demonstrate that our proposed algorithm is effective, converges faster, and performs better than the traditional PPO algorithm, and that the relative entropy measure can reveal the differences. In addition, it can use the information of a complex environment more efficiently to learn policies. 
At the same time, our paper not only explains the rationality of the policy distribution theory; the proposed framework can also balance iteration steps, computational complexity, and convergence speed, and we introduce an effective measure of performance based on the relative entropy concept.<\/jats:p>","DOI":"10.3390\/e24040440","type":"journal-article","created":{"date-parts":[[2022,3,22]],"date-time":"2022-03-22T14:55:35Z","timestamp":1647960935000},"page":"440","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":12,"title":["Relative Entropy of Correct Proximal Policy Optimization Algorithms with Modified Penalty Factor in Complex Environment"],"prefix":"10.3390","volume":"24","author":[{"given":"Weimin","family":"Chen","sequence":"first","affiliation":[{"name":"School of Information and Electronics, Hunan City University, Yiyang 413000, China"}]},{"given":"Kelvin Kian Loong","family":"Wong","sequence":"additional","affiliation":[{"name":"School of Information and Electronics, Hunan City University, Yiyang 413000, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4450-5345","authenticated-orcid":false,"given":"Sifan","family":"Long","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, Central South University, Changsha 410075, China"},{"name":"School of Computer Science, National University of Defense Technology, Changsha 410073, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1909-4946","authenticated-orcid":false,"given":"Zhili","family":"Sun","sequence":"additional","affiliation":[{"name":"5G&6G Innovation Centre, Department of Electrical and Electronic Engineering, Institute for Communication Systems, University of Surrey, Guildford GU2 7XH, UK"}]}],"member":"1968","published-online":{"date-parts":[[2022,3,22]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"354","DOI":"10.1038\/nature24270","article-title":"Mastering 
the game of Go without human knowledge","volume":"550","author":"Silver","year":"2017","journal-title":"Nature"},{"key":"ref_2","first-page":"1329","article-title":"Benchmarking Deep Reinforcement Learning for Continuous Control","volume":"48","author":"Yan","year":"2016","journal-title":"Proc. Mach. Learn. Res."},{"key":"ref_3","unstructured":"Hussain, Q.A., Nakamura, Y., Yoshikawa, Y., and Ishiguro, H. (2017). Robot gains Social Intelligence through Multimodal Deep Reinforcement Learning. arXiv."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"529","DOI":"10.1038\/nature14236","article-title":"Human-level control through deep reinforcement learning","volume":"518","author":"Mnih","year":"2015","journal-title":"Nature"},{"key":"ref_5","unstructured":"Li, Y. (2017). Deep reinforcement learning: An overview. arXiv."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Hou, Y., Liu, L., Wei, Q., Xu, X., and Chen, C. (2017, January 5\u20138). A novel DDPG method with prioritized experience replay. Proceedings of the 2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Banff, Canada.","DOI":"10.1109\/SMC.2017.8122622"},{"key":"ref_7","first-page":"1889","article-title":"Trust Region Policy Optimization","volume":"37","author":"Schulman","year":"2015","journal-title":"Proc. Mach. Learn. Res."},{"key":"ref_8","unstructured":"Wang, Z., Bapst, V., Heess, N., Mnih, V., Munos, R., Kavukcuoglu, K., and de Freitas, N. (2016). Sample efficient actor-critic with experience replay. arXiv."},{"key":"ref_9","unstructured":"Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and Klimov, O. (2017). Proximal policy optimization algorithms. arXiv."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Youlve, C., Kaiyun, B., and Zhaoyang, L. (2021, January 5\u20137). Asynchronous Distributed Proximal Policy Optimization Training Framework Based on GPU. 
Proceedings of the 2021 Chinese Intelligent Automation Conference, Zhanjiang, China.","DOI":"10.1007\/978-981-16-6372-7_67"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Wei, Z., Xu, J., Lan, Y., Guo, J., and Cheng, X. (2017, January 7\u201311). Reinforcement Learning to Rank with Markov Decision Process. Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, Shinjuku, Tokyo, Japan.","DOI":"10.1145\/3077136.3080685"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"806","DOI":"10.1093\/jigpal\/jzx022","article-title":"Logical information theory: New logical foundations for information theory","volume":"25","author":"Ellerman","year":"2017","journal-title":"Log. J. IGPL"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"126621","DOI":"10.1016\/j.physa.2021.126621","article-title":"Entropy analysis of Boolean network reduction according to the determinative power of nodes","volume":"589","author":"Pelz","year":"2022","journal-title":"Phys. A Stat. Mech. Appl."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"182","DOI":"10.1016\/j.jspi.2021.05.009","article-title":"The properties of entropy as a measure of randomness in a clinical trial","volume":"216","author":"Hoberman","year":"2022","journal-title":"J. Stat. Plan. Inference"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Dai, E., Jin, W., Liu, H., and Wang, S. (2022). Towards Robust Graph Neural Networks for Noisy Graphs with Sparse Labels. arXiv.","DOI":"10.1145\/3488560.3498408"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"698","DOI":"10.1007\/s12555-015-0371-x","article-title":"Maximum likelihood estimation method for dual-rate Hammerstein systems","volume":"15","author":"Wang","year":"2017","journal-title":"Int. J. Control Autom. Syst."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Vestner, M., Litman, R., Rodola, E., Bronstein, A., and Cremers, D. 
(2017, January 21\u201326). Product manifold filter: Non-rigid shape correspondence via kernel density estimation in the product space. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.707"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1016\/j.neucom.2016.12.038","article-title":"A survey of deep neural network architectures and their applications","volume":"234","author":"Liu","year":"2017","journal-title":"Neurocomputing"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"3306","DOI":"10.1007\/s10489-018-1140-3","article-title":"Nonlinear feature selection using Gaussian kernel SVM-RFE for fault diagnosis","volume":"48","author":"Xue","year":"2018","journal-title":"Appl. Intell."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"3014","DOI":"10.1162\/neco_a_01002","article-title":"A robust regression framework with laplace kernel-induced loss","volume":"29","author":"Yang","year":"2017","journal-title":"Neural Comput."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"1543","DOI":"10.1109\/LSP.2016.2606661","article-title":"Guaranteed bounds on the Kullback\u2013Leibler divergence of univariate mixtures","volume":"23","author":"Nielsen","year":"2016","journal-title":"IEEE Signal Process. Lett."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Yu, D., Yao, K., Su, H., Li, G., and Seide, F. (2013, January 26\u201331). KL-divergence regularized deep neural network adaptation for improved large vocabulary speech recognition. 
Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver, Canada.","DOI":"10.1109\/ICASSP.2013.6639201"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"1664","DOI":"10.1109\/TKDE.2016.2545657","article-title":"Entropy optimized feature-based bag-of-words representation for information retrieval","volume":"28","author":"Passalis","year":"2016","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"ref_24","first-page":"99","article-title":"Kullback-Leibler Divergence-based Attacks against Remote State Estimation in Cyber-physical Systems","volume":"69","author":"Zhang","year":"2021","journal-title":"IEEE Trans. Ind. Electron."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"37","DOI":"10.1016\/j.energy.2019.06.051","article-title":"The multi-objective optimization of combustion system operations based on deep data-driven models","volume":"182","author":"Tang","year":"2019","journal-title":"Energy"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Shang, H., Li, Y., Xu, J., Qi, B., and Yin, J. (2020). A novel hybrid approach for partial discharge signal detection based on complete ensemble empirical mode decomposition with adaptive noise and approximate entropy. Entropy, 22.","DOI":"10.3390\/e22091039"},{"key":"ref_27","first-page":"203","article-title":"Filter-Based Feature Selection Using Information Theory and Binary Cuckoo Optimisation Algorithm","volume":"14","author":"Usman","year":"2022","journal-title":"J. Inf. Technol. Manag."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"108203","DOI":"10.1016\/j.asoc.2021.108203","article-title":"A dissimilarity-based approach to automatic classification of biosignal modalities","volume":"115","author":"Bota","year":"2022","journal-title":"Appl. 
Soft Comput."}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/24\/4\/440\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T22:41:11Z","timestamp":1760136071000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/24\/4\/440"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,3,22]]},"references-count":28,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2022,4]]}},"alternative-id":["e24040440"],"URL":"https:\/\/doi.org\/10.3390\/e24040440","relation":{},"ISSN":["1099-4300"],"issn-type":[{"value":"1099-4300","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,3,22]]}}}