{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,3]],"date-time":"2026-04-03T03:19:15Z","timestamp":1775186355399,"version":"3.50.1"},"reference-count":37,"publisher":"Springer Science and Business Media LLC","issue":"5","license":[{"start":{"date-parts":[[2023,2,22]],"date-time":"2023-02-22T00:00:00Z","timestamp":1677024000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,2,22]],"date-time":"2023-02-22T00:00:00Z","timestamp":1677024000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["11901578"],"award-info":[{"award-number":["11901578"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Complex Intell. Syst."],"published-print":{"date-parts":[[2023,10]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>In the complex tasks environment, efficient state feature learning is a key factor to improve the performance of the agent\u2019s policy. When encountering a similar new environment, reinforcement learning agents usually need to learn from scratch. However, humans naturally have a common sense of the environment and are able to use prior knowledge to extract environmental state features. Although the prior knowledge may not be fully applicable to the new environment, it is able to speed up the learning process of the state feature. Taking this inspiration, we propose an artificial potential field-based reinforcement learning (APF-RL) method. The method consists of an artificial potential field state feature abstractor (APF-SA) and an artificial potential field intrinsic reward model (APF-IR). The APF-SA can introduce human knowledge to accelerate the learning process of the state feature. The APF-IR can generate an intrinsic reward to reduce the invalid exploration and guide the learning of the agent\u2019s policy. We conduct experiments on PySC2 with different mini-games. 
The experimental results show that the APF-RL method achieves improvement in the learning efficiency compared to the benchmarks.<\/jats:p>","DOI":"10.1007\/s40747-023-00995-8","type":"journal-article","created":{"date-parts":[[2023,2,22]],"date-time":"2023-02-22T06:06:46Z","timestamp":1677046006000},"page":"4911-4922","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":7,"title":["Efficient state representation with artificial potential fields for reinforcement learning"],"prefix":"10.1007","volume":"9","author":[{"given":"Hao","family":"Jiang","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3145-8044","authenticated-orcid":false,"given":"Shengze","family":"Li","sequence":"additional","affiliation":[]},{"given":"Jieyuan","family":"Zhang","sequence":"additional","affiliation":[]},{"given":"Yuqi","family":"Zhu","sequence":"additional","affiliation":[]},{"given":"Xinhai","family":"Xu","sequence":"additional","affiliation":[]},{"given":"Donghong","family":"Liu","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,2,22]]},"reference":[{"issue":"7540","key":"995_CR1","doi-asserted-by":"publisher","first-page":"529","DOI":"10.1038\/nature14236","volume":"518","author":"V Mnih","year":"2015","unstructured":"Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, Riedmiller M, Fidjeland AK, Ostrovski G et al (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529\u2013533","journal-title":"Nature"},{"issue":"7587","key":"995_CR2","doi-asserted-by":"publisher","first-page":"484","DOI":"10.1038\/nature16961","volume":"529","author":"D Silver","year":"2016","unstructured":"Silver D, Huang A, Maddison CJ, Guez A, Sifre L, Van Den Driessche G, Schrittwieser J, Antonoglou I, Panneershelvam V, Lanctot M et al (2016) Mastering the game of go with deep neural networks and tree search. Nature 529(7587):484\u2013489","journal-title":"Nature"},{"key":"995_CR3","unstructured":"Zhang F, Leitner J, Milford M, Upcroft B, Corke P (2015) Towards vision-based deep reinforcement learning for robotic motion control. arXiv preprint arXiv:1511.03791"},{"key":"995_CR4","doi-asserted-by":"crossref","unstructured":"Chen J, Yuan B, Tomizuka M (2019) Model-free deep reinforcement learning for urban autonomous driving. In: 2019 IEEE intelligent transportation systems conference (ITSC). IEEE, pp 2765\u20132771","DOI":"10.1109\/ITSC.2019.8917306"},{"key":"995_CR5","unstructured":"Chen X, Li S, Li H, Jiang S, Qi Y, Song L (2019) Generative adversarial user model for reinforcement learning based recommendation system. In: International conference on machine learning. PMLR, pp 1052\u20131061"},{"key":"995_CR6","doi-asserted-by":"crossref","unstructured":"Chen M, Beutel A, Covington P, Jain S, Belletti F, Chi E.H (2019) Top-k off-policy correction for a reinforce recommender system. In: Proceedings of the twelfth ACM International conference on web search and data mining, pp 456\u2013464","DOI":"10.1145\/3289600.3290999"},{"issue":"16","key":"995_CR7","doi-asserted-by":"publisher","first-page":"6683","DOI":"10.1002\/rnc.5131","volume":"30","author":"V Stojanovic","year":"2020","unstructured":"Stojanovic V, He S, Zhang B (2020) State and parameter joint estimation of linear stochastic systems in presence of faults and non-gaussian noises. 
Int J Robust Nonlinear Control 30(16):6683\u20136700","journal-title":"Int J Robust Nonlinear Control"},{"key":"995_CR8","doi-asserted-by":"crossref","unstructured":"Hu Z, Ma X, Liu Z, Hovy E, Xing E (2016) Harnessing deep neural networks with logic rules. arXiv preprint arXiv:1603.06318","DOI":"10.18653\/v1\/P16-1228"},{"key":"995_CR9","unstructured":"Fischer M, Balunovic M, Drachsler-Cohen D, Gehr T, Zhang C, Vechev M (2019) Dl2: training and querying neural networks with logic. In: International conference on machine learning. PMLR, pp 1931\u20131941"},{"key":"995_CR10","doi-asserted-by":"crossref","unstructured":"Hester T, Vecerik M, Pietquin O, Lanctot M, Schaul T, Piot B, Sendonaris A, Dulac-Arnold G, Osband I, Agapiou JP, Leibo JZ, Gruslys A (2017) A Learning from demonstrations for real world reinforcement learning. arXiv:1704.03732","DOI":"10.1609\/aaai.v32i1.11757"},{"key":"995_CR11","doi-asserted-by":"crossref","unstructured":"Zhang P, Hao J, Wang W, Tang H. Ma Y, Duan Y, Zheng Y (2020) Kogun: accelerating deep reinforcement learning via integrating human suboptimal knowledge. arXiv preprint arXiv:2002.07418","DOI":"10.24963\/ijcai.2020\/317"},{"key":"995_CR12","doi-asserted-by":"crossref","first-page":"126537","DOI":"10.1016\/j.amc.2021.126537","volume":"412","author":"X Xin","year":"2022","unstructured":"Xin X, Tu Y, Stojanovic V, Wang H, Shi K, He S, Pan T (2022) Online reinforcement learning multiplayer non-zero sum games of continuous-time Markov jump linear systems. Appl Math Comput 412:126537","journal-title":"Appl Math Comput"},{"key":"995_CR13","doi-asserted-by":"publisher","unstructured":"Zhuang Z, Tao H, Chen Y, Stojanovic V, Paszke W (2022) An optimal iterative learning control approach for linear systems with nonuniform trial lengths under input constraints. IEEE Trans Syst Man Cybern Syst. https:\/\/doi.org\/10.1109\/TSMC.2022.3225381","DOI":"10.1109\/TSMC.2022.3225381"},{"issue":"4","key":"995_CR14","doi-asserted-by":"publisher","first-page":"552","DOI":"10.1007\/s11771-008-0104-x","volume":"15","author":"L-J Xie","year":"2008","unstructured":"Xie L-J, Xie G-R, Chen H-W, Li X-L (2008) Solution to reinforcement learning problems with artificial potential field. J Cent South Univ Technol 15(4):552\u2013557","journal-title":"J Cent South Univ Technol"},{"key":"995_CR15","doi-asserted-by":"crossref","unstructured":"Noguchi Y, Maki T (2019) Path planning method based on artificial potential field and reinforcement learning for intervention auvs. In: 2019 IEEE underwater technology (UT). IEEE, pp 1\u20136","DOI":"10.1109\/UT.2019.8734314"},{"issue":"1","key":"995_CR16","doi-asserted-by":"publisher","first-page":"90","DOI":"10.1177\/027836498600500106","volume":"5","author":"O Khatib","year":"1986","unstructured":"Khatib O (1986) Real-time obstacle avoidance for manipulators and mobile robots. Int J Robot Res 5(1):90\u201398","journal-title":"Int J Robot Res"},{"key":"995_CR17","unstructured":"Vinyals O, Ewalds T, Bartunov S, Georgiev P, Vezhnevets A.S, Yeo M, Makhzani A, K\u00fcttler H, Agapiou J, Schrittwieser J et al (2017) Starcraft ii: a new challenge for reinforcement learning. arXiv preprint arXiv:1708.04782"},{"key":"995_CR18","doi-asserted-by":"crossref","unstructured":"Fran\u00e7ois-Lavet V, Henderson P, Islam R, Bellemare M.G, Pineau J (2018) An introduction to deep reinforcement learning. 
arXiv preprint arXiv:1811.12560","DOI":"10.1561\/9781680835397"},{"issue":"7839","key":"995_CR19","doi-asserted-by":"publisher","first-page":"604","DOI":"10.1038\/s41586-020-03051-4","volume":"588","author":"J Schrittwieser","year":"2020","unstructured":"Schrittwieser J, Antonoglou I, Hubert T, Simonyan K, Sifre L, Schmitt S, Guez A, Lockhart E, Hassabis D, Graepel T et al (2020) Mastering atari, go, chess and shogi by planning with a learned model. Nature 588(7839):604\u2013609","journal-title":"Nature"},{"key":"995_CR20","unstructured":"Srinivas A, Laskin M, Abbeel P (2020) Curl: contrastive unsupervised representations for reinforcement learning. arXiv preprint arXiv:2004.04136"},{"key":"995_CR21","unstructured":"Anand A, Racah E, Ozair S, Bengio Y, C\u00f4t\u00e9 M-A, Hjelm RD (2019) Unsupervised state representation learning in atari. arXiv preprint arXiv:1906.08226"},{"key":"995_CR22","unstructured":"Stooke A, Lee K, Abbeel P, Laskin M (2021) Decoupling representation learning from reinforcement learning. In: International conference on machine learning, pp 9870\u20139879"},{"key":"995_CR23","volume-title":"Robot shaping: an experiment in behavior engineering","author":"M Dorigo","year":"1998","unstructured":"Dorigo M, Colombetti M (1998) Robot shaping: an experiment in behavior engineering. MIT Press, Cambridge"},{"key":"995_CR24","first-page":"278","volume":"99","author":"AY Ng","year":"1999","unstructured":"Ng AY, Harada D, Russell S (1999) Policy invariance under reward transformations: theory and application to reward shaping. Icml 99:278\u2013287","journal-title":"Icml"},{"key":"995_CR25","unstructured":"Wiewiora E, Cottrell GW, Elkan C (2003) Principled methods for advising reinforcement learning agents. In: Proceedings of the 20th international conference on machine learning (ICML-03), pp 792\u2013799"},{"key":"995_CR26","unstructured":"Devlin S.M, Kudenko D (2012) Dynamic potential-based reward shaping. In: Proceedings of the 11th international conference on autonomous agents and multiagent systems. IFAAMAS, pp 433\u2013440"},{"key":"995_CR27","doi-asserted-by":"crossref","unstructured":"Harutyunyan A, Devlin S, Vrancx P, Now\u00e9 A (2015) Expressing arbitrary reward functions as potential-based advice. In: Proceedings of the AAAI conference on artificial intelligence, vol 29","DOI":"10.1609\/aaai.v29i1.9628"},{"key":"995_CR28","first-page":"15931","volume":"33","author":"Y Hu","year":"2020","unstructured":"Hu Y, Wang W, Jia H, Wang Y, Chen Y, Hao J, Wu F, Fan C (2020) Learning to utilize shaping rewards: a new approach of reward shaping. Adv Neural Inf Process Syst 33:15931\u201315941","journal-title":"Adv Neural Inf Process Syst"},{"key":"995_CR29","volume-title":"Introduction to reinforcement learning","author":"RS Sutton","year":"1998","unstructured":"Sutton RS, Barto AG et al (1998) Introduction to reinforcement learning, vol 135. MIT Press, Cambridge"},{"key":"995_CR30","volume-title":"Markov decision processes: discrete stochastic dynamic programming","author":"ML Puterman","year":"2014","unstructured":"Puterman ML (2014) Markov decision processes: discrete stochastic dynamic programming. Wiley, New York"},{"key":"995_CR31","unstructured":"Silver D, Lever G, Heess N, Degris T, Wierstra D, Riedmiller M (2014) Deterministic policy gradient algorithms. In: International conference on machine learning. PMLR, pp 387\u2013395"},{"key":"995_CR32","unstructured":"Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O (2017) Proximal policy optimization algorithms. 
arXiv preprint arXiv:1707.06347"},{"issue":"3","key":"995_CR33","doi-asserted-by":"publisher","first-page":"58","DOI":"10.1145\/203330.203343","volume":"38","author":"G Tesauro","year":"1995","unstructured":"Tesauro G et al (1995) Temporal difference learning and td-gammon. Commun ACM 38(3):58\u201368","journal-title":"Commun ACM"},{"key":"995_CR34","unstructured":"Schulman J, Moritz P, Levine S, Jordan M, Abbeel P (2015) High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:1506.02438"},{"issue":"1","key":"995_CR35","first-page":"2529","volume":"13","author":"J Kim","year":"2012","unstructured":"Kim J, Scott CD (2012) Robust kernel density estimation. J Mach Learn Res 13(1):2529\u20132565","journal-title":"J Mach Learn Res"},{"key":"995_CR36","unstructured":"Simonmeister (2018) pysc2-rl-agents. https:\/\/github.com\/simonmeister\/pysc2-rl-agents. Accessed 28 Apr 2018"},{"key":"995_CR37","unstructured":"Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980"}],"container-title":["Complex &amp; Intelligent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-023-00995-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s40747-023-00995-8\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-023-00995-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,12,7]],"date-time":"2023-12-07T10:55:15Z","timestamp":1701946515000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s40747-023-00995-8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,2,22]]},"references-count":37,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2023,10]]}},"alternative-id":["995"],"URL":"https:\/\/doi.org\/10.1007\/s40747-023-00995-8","relation":{},"ISSN":["2199-4536","2198-6053"],"issn-type":[{"value":"2199-4536","type":"print"},{"value":"2198-6053","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,2,22]]},"assertion":[{"value":"26 May 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"7 February 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"22 February 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"We all declare that we have no conflict of interest in this paper.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}