{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,9]],"date-time":"2026-04-09T14:38:47Z","timestamp":1775745527432,"version":"3.50.1"},"reference-count":42,"publisher":"Springer Science and Business Media LLC","issue":"4","license":[{"start":{"date-parts":[[2025,1,7]],"date-time":"2025-01-07T00:00:00Z","timestamp":1736208000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/www.springernature.com\/gp\/researchers\/text-and-data-mining"},{"start":{"date-parts":[[2025,1,7]],"date-time":"2025-01-07T00:00:00Z","timestamp":1736208000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.springernature.com\/gp\/researchers\/text-and-data-mining"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Mach. Intell. Res."],"published-print":{"date-parts":[[2025,8]]},"DOI":"10.1007\/s11633-024-1503-7","type":"journal-article","created":{"date-parts":[[2025,1,7]],"date-time":"2025-01-07T11:08:44Z","timestamp":1736248124000},"page":"797-816","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Guided Proximal Policy Optimization with Structured Action Graph for Complex 
Decision-making"],"prefix":"10.1007","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1359-0364","authenticated-orcid":false,"given":"Yiming","family":"Yang","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8251-9118","authenticated-orcid":false,"given":"Dengpeng","family":"Xing","sequence":"additional","affiliation":[]},{"given":"Wannian","family":"Xia","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8265-9866","authenticated-orcid":false,"given":"Peng","family":"Wang","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,1,7]]},"reference":[{"key":"1503_CR1","first-page":"287","volume-title":"Proceedings of the 6th Conference on Robot Learning","author":"B Ichter","year":"2022","unstructured":"B. Ichter, A. Brohan, Y. Chebotar, C. Finn, K. Hausman, A. Herzog, D. Ho, J. Ibarz, A. Irpan, E. Jang, R. Julian, D. Kalashnikov, S. Levine, Y. Lu, C. Parada, K. Rao, P. Sermanet, A. Toshev, V. Vanhoucke, F. Xia, T. Xiao, P. Xu, M. Y. Yan, N. Brown, M. Ahn, O. Cortes, N. Sievers, C. Tan, S. C. Xu, D. Reyes, J. Rettinghouse, J. Quiambao, P. Pastor, L. Luu, K. H. Lee, Y. H. Kuang, S. Jesmonth, N. J. Joshi, K. Jeffrey, R. J. Ruano, J. Hsu, K. Gopalakrishnan, B. David, A. Zeng, C. K. Fu. Do as I can, not as I say: Grounding language in robotic affordances. In Proceedings of the 6th Conference on Robot Learning, Auckland, New Zealand, pp. 287\u2013318, 2022."},{"key":"1503_CR2","first-page":"477","volume-title":"Proceedings of the 5th Conference on Robot Learning","author":"S Srivastava","year":"2021","unstructured":"S. Srivastava, C. S. Li, M. Lingelbach, R. Mart\u00edn-Mart\u00edn, F. Xia, K. E. Vainio, Z. Lian, C. Gokmen, S. Buch, C. K. Liu, S. Savarese, H. Gweon, J. J. Wu, F. F. Li. Behavior: Benchmark for everyday household activities in virtual, interactive, and ecological environments. 
In Proceedings of the 5th Conference on Robot Learning, London, UK, pp. 477\u2013490, 2021."},{"key":"1503_CR3","volume-title":"Open-ended learning leads to generally capable agents","author":"A Stooke","year":"2021","unstructured":"A. Stooke, A. Mahajan, C. Barros, C. Deck, J. Bauer, J. Sygnowski, M. Trebacz, M. Jaderberg, M. Mathieu, N. McAleese, N. Bradley-Schmieg, N. Wong, N. Porcel, R. Raileanu, S. Hughes-Fitt, V. Dalibard, W. M. Czarnecki. Open-ended learning leads to generally capable agents, [Online], Available: https:\/\/arxiv.org\/abs\/2107.12808, 2021."},{"key":"1503_CR4","volume-title":"Gpt-4 technical report","author":"OpenAI","year":"2023","unstructured":"OpenAI. Gpt-4 technical report, [Online], Available: https:\/\/arxiv.org\/abs\/2303.08774, 2023."},{"key":"1503_CR5","first-page":"1752","volume-title":"Proceedings of the 5th Conference on Robot Learning","author":"S Levine","year":"2021","unstructured":"S. Levine. Understanding the world through action. In Proceedings of the 5th Conference on Robot Learning, London, UK, pp. 1752\u20131757, 2021."},{"key":"1503_CR6","first-page":"27730","volume-title":"Proceedings of the 36th International Conference on Neural Information Processing Systems","author":"L Ouyang","year":"2022","unstructured":"L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. L. Wainwright, P. Mishkin, C. Zhang, S. Agarwal, K. Slama, A. Ray, J. Schulman, J. Hilton, F. Kelton, L. Miller, M. Simens, A. Askell, P. Welinder, P. F. Christiano, J. Leike, R. Lowe. Training language models to follow instructions with human feedback. In Proceedings of the 36th International Conference on Neural Information Processing Systems, New Orleans, USA, pp. 27730\u201327744, 2022."},{"key":"1503_CR7","doi-asserted-by":"publisher","first-page":"1057","DOI":"10.5555\/3009657.3009806","volume-title":"Proceedings of the 12th International Conference on Neural Information Processing Systems","author":"R S Sutton","year":"1999","unstructured":"R. S. Sutton, D. 
McAllester, S. Singh, Y. Mansour. Policy gradient methods for reinforcement learning with function approximation. In Proceedings of the 12th International Conference on Neural Information Processing Systems, Denver, USA, pp. 1057\u20131063, 1999. DOI: https:\/\/doi.org\/10.5555\/3009657.3009806."},{"issue":"1","key":"1503_CR8","first-page":"273","volume":"82","author":"S Lohmann","year":"1992","unstructured":"S. Lohmann. Optimal commitment in monetary policy: Credibility versus flexibility. The American Economic Review, vol. 82, no. 1, pp. 273\u2013286, 1992.","journal-title":"The American Economic Review"},{"key":"1503_CR9","doi-asserted-by":"publisher","first-page":"1889","DOI":"10.5555\/3045118.3045319","volume":"37","author":"J Schulman","year":"2015","unstructured":"J. Schulman, S. Levine, P. Moritz, M. Jordan, P. Abbeel. Trust region policy optimization. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, vol. 37, pp. 1889\u20131897, 2015. DOI: https:\/\/doi.org\/10.5555\/3045118.3045319.","journal-title":"Proceedings of the 32nd International Conference on Machine Learning"},{"key":"1503_CR10","volume-title":"Proximal policy optimization algorithms","author":"J Schulman","year":"2017","unstructured":"J. Schulman, F. Wolski, P. Dhariwal, A. Radford, O. Klimov. Proximal policy optimization algorithms, [Online], Available: https:\/\/arxiv.org\/abs\/1707.06347, 2017."},{"issue":"7782","key":"1503_CR11","doi-asserted-by":"publisher","first-page":"350","DOI":"10.1038\/s41586-019-1724-z","volume":"575","author":"O Vinyals","year":"2019","unstructured":"O. Vinyals, I. Babuschkin, W. M. Czarnecki, M. Mathieu, A. Dudzik, J. Chung, D. H. Choi, R. Powell, T. Ewalds, P. Georgiev, J. Oh, D. Horgan, M. Kroiss, I. Danihelka, A. Huang, L. Sifre, T. Cai, J. P. Agapiou, M. Jaderberg, A. S. Vezhnevets, R. Leblond, T. Pohlen, V. Dalibard, D. Budden, Y. Sulsky, J. Molloy, T. L. Paine, C. Gulcehre, Z. Y. Wang, T. Pfaff, Y. H. Wu, R. Ring, D. 
Yogatama, D. W\u00fcnsch, K. McKinney, O. Smith, T. Schaul, T. Lillicrap, K. Kavukcuoglu, D. Hassabis, C. Apps, D. Silver. Grandmaster level in starcraft II using multi-agent reinforcement learning. Nature, vol. 575, no. 7782, pp. 350\u2013354, 2019. DOI: https:\/\/doi.org\/10.1038\/s41586-019-1724-z.","journal-title":"Nature"},{"key":"1503_CR12","first-page":"10905","volume-title":"Proceedings of the 38th International Conference on Machine Learning","author":"X J Wang","year":"2021","unstructured":"X. J. Wang, J. X. Song, P. H. Qi, P. Peng, Z. K. Tang, W. Zhang, W. M. Li, X. J. Pi, J. J. He, C. Gao, H. T. Long, Q. Yuan. SCC: An efficient deep reinforcement learning agent mastering the game of starcraft II. In Proceedings of the 38th International Conference on Machine Learning, pp. 10905\u201310915, 2021."},{"key":"1503_CR13","unstructured":"DI-star Contributors: DI-star: An Open-source Reinforcement Learning Framework for StarCraft II. GitHub, 2021. https:\/\/github.com\/opendilab\/DI-star"},{"key":"1503_CR14","volume-title":"Dota 2 with large scale deep reinforcement learning","author":"OpenAI","year":"2019","unstructured":"OpenAI. Dota 2 with large scale deep reinforcement learning, [Online], Available: https:\/\/arxiv.org\/abs\/1912.06680, 2019."},{"key":"1503_CR15","doi-asserted-by":"publisher","first-page":"6672","DOI":"10.1609\/aaai.v34i04.6144","volume-title":"Proceedings of the 34th AAAI Conference on Artificial Intelligence","author":"D H Ye","year":"2020","unstructured":"D. H. Ye, Z. Liu, M. F. Sun, B. Shi, P. L. Zhao, H. Wu, H. S. Yu, S. J. Yang, X. P. Wu, Q. W. Guo, Q. B. Chen, Y. Y. T. Yin, H. Zhang, T. F. Shi, L. Wang, Q. Fu, W. Yang, L. X. Huang. Mastering complex control in moba games with deep reinforcement learning. In Proceedings of the 34th AAAI Conference on Artificial Intelligence, New York, USA, pp. 6672\u20136679, 2020. 
DOI: https:\/\/doi.org\/10.1609\/aaai.v34i04.6144."},{"key":"1503_CR16","volume-title":"Proceedings of the 35th Conference on Neural Information Processing Systems","author":"M Mathieu","year":"2021","unstructured":"M. Mathieu, S. Ozair, S. Srinivasan, C. Gulcehre, S. T. Zhang, R. Jiang, T. Le Paine, K. \u017bolna, R. Powell, J. Schrittwieser, D. Choi, P. Georgiev, D. Toyama, A. Huang, R. Ring, I. Babuschkin, T. Ewalds, M. Bordbar, S. Henderson, S. G. Colmenarejo, A. Van Den Oord, W. M. Czarnecki, N. De Freitas, O. Vinyals. Starcraft II unplugged: Large scale offline reinforcement learning. In Proceedings of the 35th Conference on Neural Information Processing Systems, Sydney, Australia, 2021."},{"key":"1503_CR17","volume-title":"Proceedings of the 7th International Conference on Learning Representations","author":"C Eisenach","year":"2019","unstructured":"C. Eisenach, H. C. Yang, J. Liu, H. Liu. Marginal policy gradients: A unified family of estimators for bounded action spaces with applications. In Proceedings of the 7th International Conference on Learning Representations, New Orleans, USA, 2019."},{"key":"1503_CR18","first-page":"1","volume-title":"Proceedings of the 10th International Conference on Learning Representations","author":"J G Kuba","year":"2022","unstructured":"J. G. Kuba, R. Q. Chen, M. N. Wen, Y. Wen, F. L. Sun, J. Wang, Y. D. Yang. Trust region policy optimisation in multi-agent reinforcement learning. In Proceedings of the 10th International Conference on Learning Representations, pp.1\u201327, 2022."},{"key":"1503_CR19","doi-asserted-by":"publisher","first-page":"4691","DOI":"10.1609\/aaai.v33i01.33014691","volume-title":"Proceedings of the 33rd AAAI Conference on Artificial Intelligence","author":"Z J Pang","year":"2019","unstructured":"Z. J. Pang, R. Z. Liu, Z. Y. Meng, Y. Zhang, Y. Yu, T. Lu. On reinforcement learning for full-length game of Star-Craft. 
In Proceedings of the 33rd AAAI Conference on Artificial Intelligence, Honolulu, USA, pp. 4691\u20134698, 2019. DOI: https:\/\/doi.org\/10.1609\/aaai.v33i01.33014691."},{"key":"1503_CR20","first-page":"10026","volume-title":"Proceedings of the 34th International Conference on Neural Information Processing Systems","author":"K Jothimurugan","year":"2021","unstructured":"K. Jothimurugan, S. Bansal, O. Bastani, R. Alur. Compositional reinforcement learning from logical specifications. In Proceedings of the 34th International Conference on Neural Information Processing Systems, pp. 10026\u201310039, 2021."},{"key":"1503_CR21","doi-asserted-by":"publisher","first-page":"1553","DOI":"10.5555\/3298239.3298465","volume-title":"Proceedings of the 31st AAAI Conference on Artificial Intelligence","author":"C Tessler","year":"2017","unstructured":"C. Tessler, S. Givony, T. Zahavy, D. J. Mankowitz, S. Mannor. A deep hierarchical approach to lifelong learning in minecraft. In Proceedings of the 31st AAAI Conference on Artificial Intelligence, San Francisco, USA, pp. 1553\u20131561, 2017. DOI: https:\/\/doi.org\/10.5555\/3298239.3298465."},{"key":"1503_CR22","doi-asserted-by":"publisher","first-page":"1206","DOI":"10.1609\/aaai.v33i01.33011206","volume-title":"Proceedings of the 33rd AAAI Conference on Artificial Intelligence","author":"B Wu","year":"2019","unstructured":"B. Wu. Hierarchical macro strategy model for MOBA game AI. In Proceedings of the 33rd AAAI Conference on Artificial Intelligence, Honolulu, USA, pp. 1206\u20131213, 2019. DOI: https:\/\/doi.org\/10.1609\/aaai.v33i01.33011206."},{"key":"1503_CR23","volume-title":"Discrete sequential prediction of continuous actions for deep RL","author":"L Metz","year":"2017","unstructured":"L. Metz, J. Ibarz, N. Jaitly, J. Davidson. 
Discrete sequential prediction of continuous actions for deep RL, [Online], Available: https:\/\/arxiv.org\/abs\/1705.05035, 2017."},{"key":"1503_CR24","volume-title":"Large language models play starcraft II: Benchmarks and a chain of summarization approach","author":"W Y Ma","year":"2023","unstructured":"W. Y. Ma, Q. R. Mi, X. Yan, Y. Q. Wu, R. J. Lin, H. F. Zhang, J. Wang. Large language models play starcraft II: Benchmarks and a chain of summarization approach, [Online], Available: https:\/\/arxiv.org\/abs\/2312.11865, 2023."},{"key":"1503_CR25","doi-asserted-by":"publisher","first-page":"13734","DOI":"10.1109\/CVPR52729.2023.01320","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"S F Cai","year":"2023","unstructured":"S. F. Cai, Z. H. Wang, X. J. Ma, A. J. Liu, Y. T. Liang. Open-world multi-task control through goal-aware representation learning and adaptive horizon prediction. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, Canada, pp. 13734\u201313744, 2023. DOI: https:\/\/doi.org\/10.1109\/CVPR52729.2023.01320."},{"key":"1503_CR26","first-page":"1","volume-title":"Proceedings of the Foundation Models for Decision Making Workshop","author":"H Yuan","year":"2023","unstructured":"H. Yuan, C. Zhang, H. Wang, F. Xie, P. Cai, H. Dong, Z. Lu. Skill reinforcement learning and planning for open-world long-horizon tasks. In Proceedings of the Foundation Models for Decision Making Workshop, New Orleans, USA, pp. 1\u201324, 2023."},{"key":"1503_CR27","doi-asserted-by":"publisher","first-page":"2970","DOI":"10.1609\/aaai.v33i01.33012970","volume-title":"Proceedings of the 33rd AAAI Conference on Artificial Intelligence","author":"D M Lyu","year":"2019","unstructured":"D. M. Lyu, F. K. Yang, B. Liu, S. Gustafson. SDRL: Interpretable and data-efficient deep reinforcement learning leveraging symbolic planning. 
In Proceedings of the 33rd AAAI Conference on Artificial Intelligence, Honolulu, USA, pp. 2970\u20132977, 2019. DOI: https:\/\/doi.org\/10.1609\/aaai.v33i01.33012970."},{"key":"1503_CR28","volume-title":"Learning symbolic rules for interpretable deep reinforcement learning","author":"Z H Ma","year":"2021","unstructured":"Z. H. Ma, Y. Z. Zhuang, P. Weng, H. H. Zhuo, D. Li, W. L. Liu, J. Y. Hao. Learning symbolic rules for interpretable deep reinforcement learning, [Online], Available: https:\/\/arxiv.org\/abs\/2103.08228, 2021."},{"key":"1503_CR29","first-page":"29669","volume-title":"Proceedings of the 34th International Conference on Neural Information Processing Systems","author":"Y C Yang","year":"2021","unstructured":"Y. C. Yang, J. P. Inala, O. Bastani, Y. W. Pu, A. Solar-Lezama, M. C. Rinard. Program synthesis guided reinforcement learning for partially observed environments. In Proceedings of the 34th International Conference on Neural Information Processing Systems, pp. 29669\u201329683, 2021."},{"key":"1503_CR30","doi-asserted-by":"publisher","first-page":"1704","DOI":"10.5555\/3305381.3305557","volume":"70","author":"N Jiang","year":"2017","unstructured":"N. Jiang, A. Krishnamurthy, A. Agarwal, J. Langford, R. E. Schapire. Contextual decision processes with low bellman rank are PAC-learnable. In Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, vol. 70, pp. 1704\u20131713, 2017. DOI: https:\/\/doi.org\/10.5555\/3305381.3305557.","journal-title":"Proceedings of the 34th International Conference on Machine Learning"},{"key":"1503_CR31","first-page":"13399","volume-title":"Proceedings of the 34th International Conference on Neural Information Processing Systems","author":"A Agarwal","year":"2020","unstructured":"A. Agarwal, M. Henaff, S. M. Kakade, W. Sun. PC-PG: Policy cover directed exploration for provable policy gradient learning. 
In Proceedings of the 34th International Conference on Neural Information Processing Systems, pp. 13399\u201313412, 2020."},{"key":"1503_CR32","first-page":"34556","volume-title":"Proceedings of the 40th International Conference on Machine Learning","author":"I Uchendu","year":"2023","unstructured":"I. Uchendu, T. Xiao, Y. Lu, B. H. Zhu, M. Y. Yan, J. Simon, M. Bennice, C. Y. Fu, C. Ma, J. T. Jiao, S. Levine, K. Hausman. Jump-start reinforcement learning. In Proceedings of the 40th International Conference on Machine Learning, Honolulu, USA, pp. 34556\u201334583, 2023."},{"key":"1503_CR33","first-page":"11909","volume-title":"Proceedings of the 34th International Conference on Neural Information Processing Systems","author":"J Queeney","year":"2021","unstructured":"J. Queeney, Y. Paschalidis, C. G. Cassandras. Generalized proximal policy optimization with sample reuse. In Proceedings of the 34th International Conference on Neural Information Processing Systems, pp. 11909\u201311919, 2021."},{"key":"1503_CR34","volume-title":"Starcraft II: A new challenge for reinforcement learning","author":"O Vinyals","year":"2017","unstructured":"O. Vinyals, T. Ewalds, S. Bartunov, P. Georgiev, A. S. Vezhnevets, M. Yeo, A. Makhzani, H. K\u00fcttler, J. P. Agapiou, J. Schrittwieser, J. Quan, S. Gaffney, S. Petersen, K. Simonyan, T. Schaul, H. Van Hasselt, D. Silver, T. P. Lillicrap, K. Calderone, P. Keet, A. Brunasso, D. Lawrence, A. Ekermo, J. Repp, R. Tsing. Starcraft II: A new challenge for reinforcement learning, [Online], Available: https:\/\/arxiv.org\/abs\/1708.04782, 2017."},{"key":"1503_CR35","doi-asserted-by":"publisher","first-page":"2974","DOI":"10.1609\/aaai.v32i1.11794","volume-title":"Proceedings of the 32nd AAAI Conference on Artificial Intelligence","author":"J Foerster","year":"2018","unstructured":"J. Foerster, G. Farquhar, T. Afouras, N. Nardelli, S. Whiteson. Counterfactual multi-agent policy gradients. 
In Proceedings of the 32nd AAAI Conference on Artificial Intelligence, New Orleans, USA, pp. 2974\u20132982, 2018. DOI: https:\/\/doi.org\/10.1609\/aaai.v32i1.11794."},{"key":"1503_CR36","first-page":"13458","volume-title":"Proceedings of the 34th International Conference on Neural Information Processing Systems","author":"J G Kuba","year":"2021","unstructured":"J. G. Kuba, M. N. Wen, L. H. Meng, S. D. Gu, H. F. Zhang, D. Mguni, J. Wang, Y. D. Yang. Settling the variance of multi-agent policy gradients. In Proceedings of the 34th International Conference on Neural Information Processing Systems, pp. 13458\u201313470, 2021."},{"key":"1503_CR37","first-page":"26437","volume-title":"Proceedings of the 34th International Conference on Neural Information Processing Systems","author":"Z F Wu","year":"2021","unstructured":"Z. F. Wu, C. Yu, D. H. Ye, J. G. Zhang, H. Y. Piao, H. H. Zhuo. Coordinated proximal policy optimization. In Proceedings of the 34th International Conference on Neural Information Processing Systems, pp. 26437\u201326448, 2021."},{"key":"1503_CR38","first-page":"1407","volume-title":"Proceedings of the 35th International Conference on Machine Learning","author":"L Espeholt","year":"2018","unstructured":"L. Espeholt, H. Soyer, R. Munos, K. Simonyan, V. Mnih, T. Ward, Y. Doron, V. Firoiu, T. Harley, I. Dunning, S. Legg, K. Kavukcuoglu. IMPALA: Scalable distributed deep-RL with importance weighted actor-learner architectures. In Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, pp. 1407\u20131416, 2018."},{"key":"1503_CR39","doi-asserted-by":"publisher","first-page":"2720","DOI":"10.3233\/FAIA230581","volume-title":"Proceedings of the 26th European Conference on Artificial Intelligence","author":"W N Xia","year":"2023","unstructured":"W. N. Xia, Y. M. Yang, J. Q. Ruan, D. P. Xing, B. Xu. Cardsformer: Grounding language to learn a generalizable policy in hearthstone. 
In Proceedings of the 26th European Conference on Artificial Intelligence, Krak\u00f3w, Poland, pp. 2720\u20132727, 2023. DOI: https:\/\/doi.org\/10.3233\/FAIA230581."},{"key":"1503_CR40","doi-asserted-by":"publisher","unstructured":"S. Tunyasuvunakool, A. Muldal, Y. Doron, S. Q. Liu, S. Bohez, J. Merel, T. Erez, T. Lillicrap, N. Heess, Y. Tassa. DM_Control: Software and tasks for continuous control. Software Impacts, vol. 6, Article number 100022, 2020. DOI: https:\/\/doi.org\/10.1016\/j.simpa.2020.100022.","DOI":"10.1016\/j.simpa.2020.100022"},{"key":"1503_CR41","doi-asserted-by":"publisher","first-page":"1866","DOI":"10.1090\/mbk\/107","volume-title":"Markov Chains and Mixing Times","author":"D A Levin","year":"2017","unstructured":"D. A. Levin, Y. Peres. Markov Chains and Mixing Times, 2nd ed., Providence, USA: American Mathematical Society, pp. 1866\u20137414, 2017.","edition":"2nd ed."},{"key":"1503_CR42","first-page":"8545","volume-title":"Proceedings of the 37th International Conference on Machine Learning","author":"S Schmitt","year":"2020","unstructured":"S. Schmitt, M. Hessel, K. Simonyan. Off-policy actor-critic with shared experience replay. In Proceedings of the 37th International Conference on Machine Learning, pp. 
8545\u20138554, 2020."}],"container-title":["Machine Intelligence Research"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11633-024-1503-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11633-024-1503-7\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11633-024-1503-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,6]],"date-time":"2025-09-06T02:42:24Z","timestamp":1757126544000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11633-024-1503-7"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,1,7]]},"references-count":42,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2025,8]]}},"alternative-id":["1503"],"URL":"https:\/\/doi.org\/10.1007\/s11633-024-1503-7","relation":{},"ISSN":["2731-538X","2731-5398"],"issn-type":[{"value":"2731-538X","type":"print"},{"value":"2731-5398","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,1,7]]},"assertion":[{"value":"3 November 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"19 March 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"7 January 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"The authors declared that they have no conflicts of interest to this work.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations of conflict of interest"}}]}}