{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,28]],"date-time":"2026-02-28T04:17:32Z","timestamp":1772252252894,"version":"3.50.1"},"reference-count":21,"publisher":"Wiley","issue":"1","license":[{"start":{"date-parts":[[2021,4,1]],"date-time":"2021-04-01T00:00:00Z","timestamp":1617235200000},"content-version":"vor","delay-in-days":90,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61672123"],"award-info":[{"award-number":["61672123"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61602083"],"award-info":[{"award-number":["61602083"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62002044"],"award-info":[{"award-number":["62002044"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100012226","name":"Fundamental Research Funds for the Central Universities","doi-asserted-by":"publisher","award":["DUT20LAB136"],"award-info":[{"award-number":["DUT20LAB136"]}],"id":[{"id":"10.13039\/501100012226","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100004543","name":"China Scholarship Council","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100004543","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["onlinelibrary.wiley.com"],"crossmark-restriction":true},"short-container-title":["Complexity"],"published-print":{"date-parts":[[2021,1]]},"abstract":"<jats:p>With the advent of the era of artificial intelligence, deep reinforcement learning (DRL) has achieved unprecedented success in high\u2010dimensional and large\u2010scale artificial intelligence tasks. However, the insecurity and instability of the DRL algorithm have an important impact on its performance. The Soft Actor\u2010Critic (SAC) algorithm uses advanced functions to update the policy and value network to alleviate some of these problems. However, SAC still has some problems. In order to reduce the error caused by the overestimation of SAC, we propose a new SAC algorithm called Averaged\u2010SAC. By averaging the previously learned action\u2010state estimates, it reduces the overestimation problem of soft Q\u2010learning, thereby contributing to a more stable training process and improving performance. We evaluate the performance of Averaged\u2010SAC through some games in the MuJoCo environment. The experimental results show that the Averaged\u2010SAC algorithm effectively improves the performance of the SAC algorithm and the stability of the training process.<\/jats:p>","DOI":"10.1155\/2021\/6658724","type":"journal-article","created":{"date-parts":[[2021,4,1]],"date-time":"2021-04-01T22:35:05Z","timestamp":1617316505000},"update-policy":"https:\/\/doi.org\/10.1002\/crossmark_policy","source":"Crossref","is-referenced-by-count":21,"title":["Averaged Soft Actor\u2010Critic for Deep Reinforcement Learning"],"prefix":"10.1155","volume":"2021","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7642-4182","authenticated-orcid":false,"given":"Feng","family":"Ding","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9750-5851","authenticated-orcid":false,"given":"Guanfeng","family":"Ma","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9209-2189","authenticated-orcid":false,"given":"Zhikui","family":"Chen","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5099-6991","authenticated-orcid":false,"given":"Jing","family":"Gao","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7138-430X","authenticated-orcid":false,"given":"Peng","family":"Li","sequence":"additional","affiliation":[]}],"member":"311","published-online":{"date-parts":[[2021,4]]},"reference":[{"key":"e_1_2_10_1_2","doi-asserted-by":"publisher","DOI":"10.1109\/tsmc.1983.6313077"},{"key":"e_1_2_10_2_2","unstructured":"MnihV. KavukcuogluK. SilverD.et al. Playing atari with deep reinforcement learning 2013 https:\/\/arxiv.org\/abs\/1312.5602."},{"key":"e_1_2_10_3_2","doi-asserted-by":"publisher","DOI":"10.1038\/nature14236"},{"key":"e_1_2_10_4_2","unstructured":"HaarnojaT. ZhouA. AbbeelP.et al. Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor 2018 https:\/\/arxiv.org\/abs\/1801.01290."},{"key":"e_1_2_10_5_2","unstructured":"HasseltH. V. DoubleQ-learning Proceedings of the Advances in Neural Information Processing Systems (NIPS) December 2010 Vancouver CA USA 2613\u20132621."},{"key":"e_1_2_10_6_2","unstructured":"LevineS.andKoltunV. Guided policy search Proceedings of the International conference on machine learning (ICML) June 2013 Atlanta GA USA 1\u20139."},{"key":"e_1_2_10_7_2","unstructured":"HendersonP. IslamR. BachmanP. PineauJ. PrecupD. andMegerD. Deep reinforcement learning that matters 2017 https:\/\/arxiv.org\/abs\/1709.06560."},{"key":"e_1_2_10_8_2","unstructured":"MazoureB. DoanT. DurandA. HjelmR. D. andPineauJ. Leveraging exploration in off-policy algorithms via normalizing flows 2019 https:\/\/arxiv.org\/abs\/1905.06893."},{"key":"e_1_2_10_9_2","article-title":"A distributional code for value in dopamine-based reinforcement learning","volume":"2020","author":"Dabney W.","year":"2020","journal-title":"Nature"},{"key":"e_1_2_10_10_2","doi-asserted-by":"publisher","DOI":"10.1049\/iet-its.2019.0317"},{"key":"e_1_2_10_11_2","unstructured":"AnschelO. BaramN. andShimkinN. Averaged-DQN: variance reduction and stabilization for deep reinforcement learning Proceedings of the International Conference on Machine Learning August 2017 Sydney Australia 176\u2013185."},{"key":"e_1_2_10_12_2","unstructured":"HeessN. WayneG. SilverD. LillicrapT. ErezT. andTassaY. Learning continuous control policies by stochastic value gradients Proceedings of the Advances in Neural Information Processing Systems (NIPS) December 2015 Montreal CA USA 2944\u20132952."},{"key":"e_1_2_10_13_2","first-page":"1","article-title":"End-to-end training of deep visuomotor policies","volume":"17","author":"Levine S.","year":"2016","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_2_10_14_2","unstructured":"KingmaD.andBaJ. A. A method for stochastic optimization Proceedings of the International Conference for Learning Presentations (ICLR) May 2015 San Diego CA USA."},{"key":"e_1_2_10_15_2","unstructured":"KondaV. R.andTsitsiklisJ. N. Actor-critic algorithms Proceedings of the Advances in Neural Information Processing Systems June 2000 Denver CO USA 1008\u20131014."},{"key":"e_1_2_10_16_2","unstructured":"SilverD. LeverG. HeessN. DegrisT. WierstraD. andRiedmillerM. Deterministic policy gradient algorithms Proceedings of the International Conference on Machine Learning June 2014 Beijing China 387\u2013395."},{"key":"e_1_2_10_17_2","unstructured":"SchulmanJ. LevineS. AbbeelP. JordanM. I. andMoritzP. Trust region policy optimization Proceedings of the In International Conference on Machine Learning (ICML) July 2015 Lille France 1889\u20131897."},{"key":"e_1_2_10_18_2","unstructured":"SchulmanJ. WolskiF. DhariwalP. RadfordA. andKlimovO. Proximal policy optimization algorithms 2017 https:\/\/arxiv.org\/abs\/1707.06347."},{"key":"e_1_2_10_19_2","unstructured":"ThomasP. Bias in natural actor-critic algorithms Proceedings of the International Conference on Machine Learning (ICML) June 2014 Beijing China 441\u2013448."},{"key":"e_1_2_10_20_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neunet.2008.02.003"},{"key":"e_1_2_10_21_2","unstructured":"HaarnojaT. ZhouA. HartikainenK.et al. Soft actor-critic algorithms and applications 2018 https:\/\/arxiv.org\/abs\/1812.05905."}],"container-title":["Complexity"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/downloads.hindawi.com\/journals\/complexity\/2021\/6658724.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/downloads.hindawi.com\/journals\/complexity\/2021\/6658724.xml","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/pdf\/10.1155\/2021\/6658724","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,8,9]],"date-time":"2024-08-09T22:35:25Z","timestamp":1723242925000},"score":1,"resource":{"primary":{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/10.1155\/2021\/6658724"}},"subtitle":[],"editor":[{"given":"Ning","family":"Cai","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2021,1]]},"references-count":21,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2021,1]]}},"alternative-id":["10.1155\/2021\/6658724"],"URL":"https:\/\/doi.org\/10.1155\/2021\/6658724","archive":["Portico"],"relation":{},"ISSN":["1076-2787","1099-0526"],"issn-type":[{"value":"1076-2787","type":"print"},{"value":"1099-0526","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,1]]},"assertion":[{"value":"2020-11-16","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-03-18","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-04-01","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}],"article-number":"6658724"}}