{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,26]],"date-time":"2025-11-26T16:43:24Z","timestamp":1764175404716,"version":"3.41.0"},"reference-count":36,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2023,5,28]],"date-time":"2023-05-28T00:00:00Z","timestamp":1685232000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"National Key Research & Development Program of China","award":["2019YFB1404904"],"award-info":[{"award-number":["2019YFB1404904"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Auton. Adapt. Syst."],"published-print":{"date-parts":[[2023,6,30]]},"abstract":"<jats:p>In the implementation of deep reinforcement learning (DRL), action persistence strategies are often adopted so agents maintain their actions for a fixed or variable number of steps. The choice of the persistent duration for agent actions usually has notable effects on the performance of reinforcement learning algorithms. Aiming at the research gap of global dynamic optimal action persistence and its application in multi-agent systems, we propose a novel framework: global dynamic action persistence (GLDAP), which achieves global action persistence adaptation for deep reinforcement learning. We introduce a closed-loop method that is used to learn the estimated value and the corresponding policy of each candidate action persistence. Our experiment shows that GLDAP achieves an average of 2.5%~90.7% performance improvement and 3~20 times higher sampling efficiency over several baselines across various single-agent and multi-agent domains. 
We also validate the ability of GLDAP to determine the optimal action persistence through multiple experiments.<\/jats:p>","DOI":"10.1145\/3590154","type":"journal-article","created":{"date-parts":[[2023,4,3]],"date-time":"2023-04-03T12:19:51Z","timestamp":1680524391000},"page":"1-18","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["GLDAP: Global Dynamic Action Persistence Adaptation for Deep Reinforcement Learning"],"prefix":"10.1145","volume":"18","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7433-0788","authenticated-orcid":false,"given":"Junbo","family":"Tong","sequence":"first","affiliation":[{"name":"Department of Automation, Tsinghua University, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8327-555X","authenticated-orcid":false,"given":"Daming","family":"Shi","sequence":"additional","affiliation":[{"name":"Department of Automation, Tsinghua University, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-4797-7826","authenticated-orcid":false,"given":"Yi","family":"Liu","sequence":"additional","affiliation":[{"name":"Department of Automation, Tsinghua University, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0040-5759","authenticated-orcid":false,"given":"Wenhui","family":"Fan","sequence":"additional","affiliation":[{"name":"Department of Automation, Tsinghua University, China"}]}],"member":"320","published-online":{"date-parts":[[2023,5,28]]},"reference":[{"key":"e_1_3_1_2_2","unstructured":"Ross E. Allen Jayesh K. Gupta Jaime Pena Yutai Zhou Javona White Bear and Mykel J. Kochenderfer. 2019. Health-informed policy gradients for multi-agent reinforcement learning. Retrieved from https:\/\/arxiv.org\/abs\/1908.01022."},{"key":"e_1_3_1_3_2","unstructured":"Greg Brockman Vicki Cheung Ludwig Pettersson Jonas Schneider John Schulman Jie Tang and Wojciech Zaremba. 2016. OpenAI Gym. 
Retrieved from https:\/\/arxiv.org\/abs\/1606.01540."},{"key":"e_1_3_1_4_2","volume-title":"Advances in Neural Information Processing Systems","author":"Buckland Kenneth","year":"1993","unstructured":"Kenneth Buckland and Peter Lawrence. 1993. Transition point dynamic programming. In Advances in Neural Information Processing Systems, Vol. 6. Morgan-Kaufmann."},{"key":"e_1_3_1_5_2","volume-title":"Optimal Control of Dynamic Systems Through the Reinforcement Learning of Transition Points","author":"Buckland Kenneth M.","year":"1994","unstructured":"Kenneth M. Buckland. 1994. Optimal Control of Dynamic Systems Through the Reinforcement Learning of Transition Points. Ph. D. Dissertation. University of British Columbia."},{"key":"e_1_3_1_6_2","unstructured":"Will Dabney Georg Ostrovski and Andr\u00e9 Barreto. 2020. Temporally-extended  \\(\\epsilon\\) -greedy exploration. Retrieved from https:\/\/arxiv.org\/abs\/2006.01782."},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2017.7989385"},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-71682-4_5"},{"key":"e_1_3_1_9_2","article-title":"On inductive biases in deep reinforcement learning","author":"Hessel Matteo","year":"2019","unstructured":"Matteo Hessel, Hado van Hasselt, Joseph Modayil, and David Silver. 2019. On inductive biases in deep reinforcement learning. Retrieved from https:\/\/arxiv.org\/abs\/1907.02908.","journal-title":"Retrieved from https:\/\/arxiv.org\/abs\/1907.02908"},{"key":"e_1_3_1_10_2","unstructured":"Shivaram Kalyanakrishnan Siddharth Aravindan Vishwajeet Bagdawat Varun Bhatt Harshith Goka Archit Gupta Kalpesh Krishna and Vihari Piratla. 2021. An analysis of frame-skipping in reinforcement learning. 
Retrieved from https:\/\/arxiv.org\/abs\/2102.03718."},{"key":"e_1_3_1_11_2","volume-title":"Reinforcement Learning Control with Approximation of Time-dependent Agent Dynamics","author":"Kirkpatrick Kenton Conrad","year":"2013","unstructured":"Kenton Conrad Kirkpatrick. 2013. Reinforcement Learning Control with Approximation of Time-dependent Agent Dynamics. Texas A&M University."},{"key":"e_1_3_1_12_2","volume-title":"Learning Motor Skills: From Algorithms to Robot Experiments","author":"Kober Jens","year":"2013","unstructured":"Jens Kober and Jan Peters. 2013. Learning Motor Skills: From Algorithms to Robot Experiments. Vol. 97. Springer."},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v31i1.10918"},{"key":"e_1_3_1_14_2","first-page":"3254","volume-title":"Advances in Neural Information Processing Systems","author":"Lee Jongmin","year":"2020","unstructured":"Jongmin Lee, Byung-Jun Lee, and Kee-Eung Kim. 2020. Reinforcement learning for control with multiple frequencies. In Advances in Neural Information Processing Systems, Vol. 33. Curran Associates, Inc. 3254\u20133264."},{"key":"e_1_3_1_15_2","unstructured":"Timothy P. Lillicrap Jonathan J. Hunt Alexander Pritzel Nicolas Heess Tom Erez Yuval Tassa David Silver and Daan Wierstra. 2015. Continuous control with deep reinforcement learning. Retrieved from https:\/\/arxiv.org\/abs\/1509.02971."},{"key":"e_1_3_1_16_2","unstructured":"Yilun Lin Xingyuan Dai Li Li and Fei-Yue Wang. 2018. An efficient deep reinforcement learning model for urban traffic control. Retrieved from https:\/\/arxiv.org\/abs\/1808.01876."},{"key":"e_1_3_1_17_2","volume-title":"Advances in Neural Information Processing Systems","author":"Lowe Ryan","year":"2017","unstructured":"Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, OpenAI Pieter Abbeel, and Igor Mordatch. 2017. Multi-agent actor-critic for mixed cooperative-competitive environments. In Advances in Neural Information Processing Systems, Vol. 30. 
Curran Associates, Inc."},{"key":"e_1_3_1_18_2","volume-title":"Introduction to Dynamic Systems; Theory, Models, and Applications","author":"Luenberger David G.","year":"1979","unstructured":"David G. Luenberger. 1979. Introduction to Dynamic Systems; Theory, Models, and Applications. Technical Report. John Wiley & Sons Chichester."},{"key":"e_1_3_1_19_2","first-page":"6862","volume-title":"37th International Conference on Machine Learning","volume":"119","author":"Metelli Alberto Maria","year":"2020","unstructured":"Alberto Maria Metelli, Flavio Mazzolini, Lorenzo Bisi, Luca Sabbioni, and Marcello Restelli. 2020. Control frequency adaptation via action persistence in batch reinforcement learning. In 37th International Conference on Machine Learning, Vol. 119. 6862\u20136873."},{"key":"e_1_3_1_20_2","first-page":"1928","volume-title":"33rd International Conference on Machine Learning","volume":"48","author":"Mnih Volodymyr","year":"2016","unstructured":"Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. 2016. Asynchronous methods for deep reinforcement learning. In 33rd International Conference on Machine Learning, Vol. 48. 1928\u20131937."},{"key":"e_1_3_1_21_2","unstructured":"Volodymyr Mnih Koray Kavukcuoglu David Silver Alex Graves Ioannis Antonoglou Daan Wierstra and Martin Riedmiller. 2013. Playing Atari with deep reinforcement learning. Retrieved from https:\/\/arxiv.org\/abs\/1312.5602."},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v32i1.11492"},{"key":"e_1_3_1_23_2","volume-title":"Markov Decision Processes: Discrete Stochastic Dynamic Programming","author":"Puterman Martin L.","year":"2014","unstructured":"Martin L. Puterman. 2014. Markov Decision Processes: Discrete Stochastic Dynamic Programming. 
John Wiley & Sons."},{"key":"e_1_3_1_24_2","first-page":"4295","volume-title":"35th International Conference on Machine Learning","volume":"80","author":"Rashid Tabish","year":"2018","unstructured":"Tabish Rashid, Mikayel Samvelyan, Christian Schroeder, Gregory Farquhar, Jakob Foerster, and Shimon Whiteson. 2018. QMIX: Monotonic value function factorisation for deep multi-agent reinforcement learning. In 35th International Conference on Machine Learning, Vol. 80. 4295\u20134304."},{"key":"e_1_3_1_25_2","first-page":"1889","volume-title":"32nd International Conference on Machine Learning","volume":"37","author":"Schulman John","year":"2015","unstructured":"John Schulman, Sergey Levine, Pieter Abbeel, Michael Jordan, and Philipp Moritz. 2015. Trust region policy optimization. In 32nd International Conference on Machine Learning, Vol. 37. 1889\u20131897."},{"key":"e_1_3_1_26_2","unstructured":"John Schulman Filip Wolski Prafulla Dhariwal Alec Radford and Oleg Klimov. 2017. Proximal policy optimization algorithms. Retrieved from https:\/\/arxiv.org\/abs\/1707.06347."},{"key":"e_1_3_1_27_2","unstructured":"Sahil Sharma Aravind Srinivas and Balaraman Ravindran. 2017. Learning to repeat: Fine grained action repetition for deep reinforcement learning. Retrieved from https:\/\/arxiv.org\/abs\/1702.06054."},{"key":"e_1_3_1_28_2","first-page":"387","volume-title":"31st International Conference on Machine Learning","volume":"32","author":"Silver David","year":"2014","unstructured":"David Silver, Guy Lever, Nicolas Heess, Thomas Degris, Daan Wierstra, and Martin Riedmiller. 2014. Deterministic policy gradient algorithms. In 31st International Conference on Machine Learning, Vol. 32. 387\u2013395."},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNN.1998.712192"},{"key":"e_1_3_1_30_2","doi-asserted-by":"crossref","unstructured":"Richard S. Sutton Doina Precup and Satinder Singh. 1999. 
Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artif. Intell. 112 1-2 (1999) 181\u2013211.","DOI":"10.1016\/S0004-3702(99)00052-1"},{"key":"e_1_3_1_31_2","first-page":"6096","volume-title":"36th International Conference on Machine Learning","volume":"97","author":"Tallec Corentin","year":"2019","unstructured":"Corentin Tallec, L\u00e9onard Blier, and Yann Ollivier. 2019. Making deep q-learning methods robust to time discretization. In 36th International Conference on Machine Learning, Vol. 97. 6096\u20136104."},{"key":"e_1_3_1_32_2","unstructured":"Yuval Tassa Yotam Doron Alistair Muldal Tom Erez Yazhe Li Diego de Las Casas David Budden Abbas Abdolmaleki Josh Merel Andrew Lefrancq Timothy Lillicrap and Martin Riedmiller. 2018. DeepMind control suite. Retrieved from https:\/\/arxiv.org\/abs\/1801.00690."},{"key":"e_1_3_1_33_2","volume-title":"Advances in Neural Information Processing Systems","author":"Terry Justin K.","year":"2021","unstructured":"Justin K. Terry, Benjamin Black, Nathaniel Grammel, Mario Jayakumar, Ananth Hari, Ryan Sullivan, Luis Santos, Clemens Dieffendahl, Caroline Horsch, Rodrigo Perez-Vicente, et\u00a0al. 2021. PettingZoo: Gym for multi-agent reinforcement learning. In Advances in Neural Information Processing Systems, Vol. 34. Curran Associates, Inc."},{"key":"e_1_3_1_34_2","doi-asserted-by":"crossref","unstructured":"Onder Tutsoy. 2021. COVID-19 epidemic and opening of the schools: Artificial intelligence-based long-term adaptive policy making to control the pandemic diseases. IEEE Access 9 (2021) 68461\u201368471.","DOI":"10.1109\/ACCESS.2021.3078080"},{"key":"e_1_3_1_35_2","doi-asserted-by":"crossref","unstructured":"Onder Tutsoy. 2021. Pharmacological non-pharmacological policies and mutation: An artificial intelligence based multi-dimensional policy making algorithm for controlling the casualties of the pandemic diseases. IEEE Trans. Pattern Anal. Mach. Intell. 
44 12 (2021) 9477\u20139488.","DOI":"10.1109\/TPAMI.2021.3127674"},{"key":"e_1_3_1_36_2","doi-asserted-by":"crossref","unstructured":"Onder Tutsoy Duygun Erol Barkana and Kemal Balikci. 2021. A novel exploration-exploitation-based adaptive law for intelligent model-free control approaches. IEEE Trans. Cybern. 53 1 (2021) 329\u2013337.","DOI":"10.1109\/TCYB.2021.3091680"},{"key":"e_1_3_1_37_2","volume-title":"Advances in Neural Information Processing Systems","author":"Yu Haonan","year":"2021","unstructured":"Haonan Yu, Wei Xu, and Haichao Zhang. 2021. TAAC: Temporally abstract actor-critic for continuous control. In Advances in Neural Information Processing Systems, Vol. 34."}],"container-title":["ACM Transactions on Autonomous and Adaptive Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3590154","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3590154","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:36:26Z","timestamp":1750178186000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3590154"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,5,28]]},"references-count":36,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2023,6,30]]}},"alternative-id":["10.1145\/3590154"],"URL":"https:\/\/doi.org\/10.1145\/3590154","relation":{},"ISSN":["1556-4665","1556-4703"],"issn-type":[{"type":"print","value":"1556-4665"},{"type":"electronic","value":"1556-4703"}],"subject":[],"published":{"date-parts":[[2023,5,28]]},"assertion":[{"value":"2021-11-17","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2023-03-29","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-05-28","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}