{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:14:59Z","timestamp":1750220099690,"version":"3.41.0"},"reference-count":35,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2022,3,31]],"date-time":"2022-03-31T00:00:00Z","timestamp":1648684800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Ministry of Science and Technology","award":["111-2628-E-007-010"],"award-info":[{"award-number":["111-2628-E-007-010"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Evol. Learn. Optim."],"published-print":{"date-parts":[[2022,3,31]]},"abstract":"<jats:p>\n            Conventional reinforcement learning (RL) typically determines an appropriate primitive action at each timestep. However, by using a proper macro action, defined as a sequence of primitive actions, an RL agent is able to bypass intermediate states to a farther state and facilitate its learning procedure. The problem we would like to investigate is what associated beneficial properties that macro actions may possess. In this article, we unveil the properties of\n            <jats:italic>reusability<\/jats:italic>\n            and\n            <jats:italic>transferability<\/jats:italic>\n            of macro actions. The first property,\n            <jats:italic>reusability<\/jats:italic>\n            , means that a macro action derived along with one RL method can be reused by another RL method for training, while the second one,\n            <jats:italic>transferability<\/jats:italic>\n            , indicates that a macro action can be utilized for training agents in similar environments with different reward settings. In our experiments, we first derive macro actions along with RL methods. We then provide a set of analyses to reveal the properties of\n            <jats:italic>reusability<\/jats:italic>\n            and\n            <jats:italic>transferability<\/jats:italic>\n            of the derived macro actions.\n          <\/jats:p>","DOI":"10.1145\/3514260","type":"journal-article","created":{"date-parts":[[2022,2,24]],"date-time":"2022-02-24T16:13:11Z","timestamp":1645719191000},"page":"1-16","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Reusability and Transferability of Macro Actions for Reinforcement Learning"],"prefix":"10.1145","volume":"2","author":[{"given":"Yi-Hsiang","family":"Chang","sequence":"first","affiliation":[{"name":"National Tsing Hua University, Hsinchu, Taiwan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1120-2970","authenticated-orcid":false,"given":"Kuan-Yu","family":"Chang","sequence":"additional","affiliation":[{"name":"National Tsing Hua University, Hsinchu, Taiwan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4667-4794","authenticated-orcid":false,"given":"Henry","family":"Kuo","sequence":"additional","affiliation":[{"name":"Harvard University, Cambridge, Massachusetts"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4680-4800","authenticated-orcid":false,"given":"Chun-Yi","family":"Lee","sequence":"additional","affiliation":[{"name":"National Tsing Hua University, Hsinchu, Taiwan"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2022,4,5]]},"reference":[{"key":"e_1_3_1_2_2","first-page":"16","volume-title":"Proceedings of the International Conference on Automated Planning and Scheduling (ICAPS)","author":"Asai M.","year":"2015","unstructured":"M. Asai and A. Fukunaga. 2015. Solving large-scale planning problems by decomposition and macro generation. In Proceedings of the International Conference on Automated Planning and Scheduling (ICAPS). 16\u201324."},{"key":"e_1_3_1_3_2","first-page":"1471","article-title":"Unifying count-based exploration and intrinsic motivation","volume":"29","author":"Bellemare Marc","year":"2016","unstructured":"Marc Bellemare, Sriram Srinivasan, Georg Ostrovski, Tom Schaul, David Saxton, and Remi Munos. 2016. Unifying count-based exploration and intrinsic motivation. Advances in Neural Information Processing Systems 29 (2016), 1471\u20131479.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_4_2","doi-asserted-by":"crossref","first-page":"253","DOI":"10.1613\/jair.3912","article-title":"The arcade learning environment: An evaluation platform for general agents","volume":"47","author":"Bellemare M. G.","year":"2013","unstructured":"M. G. Bellemare, Y. Naddaf, J. Veness, and M. Bowling. 2013. The arcade learning environment: An evaluation platform for general agents. J. Artificial Intelligence Research (JAIR) 47 (Jun. 2013), 253\u2013279.","journal-title":"J. Artificial Intelligence Research (JAIR)"},{"key":"e_1_3_1_5_2","doi-asserted-by":"crossref","first-page":"581","DOI":"10.1613\/jair.1696","article-title":"Macro-FF: Improving AI planning with automatically learned macro-operators","volume":"24","author":"Botea A.","year":"2005","unstructured":"A. Botea, M. Enzenberger, M. M\u00fcller, and J. Schaeffer. 2005. Macro-FF: Improving AI planning with automatically learned macro-operators. J. Artificial Intelligence Research (JAIR) 24 (Oct. 2005), 581\u2013621.","journal-title":"J. Artificial Intelligence Research (JAIR)"},{"key":"e_1_3_1_6_2","article-title":"Large-scale study of curiosity-driven learning","author":"Burda Yuri","year":"2018","unstructured":"Yuri Burda, Harri Edwards, Deepak Pathak, Amos Storkey, Trevor Darrell, and Alexei A. Efros. 2018. Large-scale study of curiosity-driven learning. arXiv preprint arXiv:1808.04355 (2018).","journal-title":"arXiv preprint arXiv:1808.04355"},{"key":"e_1_3_1_7_2","article-title":"Exploration by random network distillation","author":"Burda Yuri","year":"2018","unstructured":"Yuri Burda, Harrison Edwards, Amos Storkey, and Oleg Klimov. 2018. Exploration by random network distillation. arXiv preprint arXiv:1810.12894 (2018).","journal-title":"arXiv preprint arXiv:1810.12894"},{"key":"e_1_3_1_8_2","first-page":"7546","volume-title":"Proceedings of the 33rd AAAI Conference on Artificial Intelligence (AAAI-19)","volume":"33","author":"Chrpa L.","year":"2019","unstructured":"L. Chrpa and M. Vallati. 2019. Improving domain-independent planning via critical section macro-operators. In Proceedings of the 33rd AAAI Conference on Artificial Intelligence (AAAI-19), Vol. 33. 7546\u20137553."},{"key":"e_1_3_1_9_2","doi-asserted-by":"crossref","first-page":"119","DOI":"10.1613\/jair.2077","article-title":"MARVIN: A heuristic search planner with online macro-action learning","volume":"28","author":"Coles A. I.","year":"2007","unstructured":"A. I. Coles and A. J. Smith. 2007. MARVIN: A heuristic search planner with online macro-action learning. J. Artificial Intelligence Research (JAIR) 28 (2007), 119\u2013156.","journal-title":"J. Artificial Intelligence Research (JAIR)"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1007\/BF00114116"},{"key":"e_1_3_1_11_2","volume-title":"Feedback and Control Systems","author":"DiStefano Joseph J.","year":"2012","unstructured":"Joseph J. DiStefano, Allen R. Stubberud, and Ivan J. Williams. 2012. Feedback and Control Systems. McGraw-Hill Education."},{"key":"e_1_3_1_12_2","article-title":"Deep reinforcement learning with macro-actions","author":"Durugkar I. P.","year":"2016","unstructured":"I. P. Durugkar, C. Rosenbaum, S. Dernbach, and S. Mahadevan. 2016. Deep reinforcement learning with macro-actions. arXiv:1606.04615 (Jun. 2016).","journal-title":"arXiv:1606.04615"},{"key":"e_1_3_1_13_2","article-title":"Macro action reinforcement learning with sequence disentanglement using variational autoencoder","author":"Heecheol K.","year":"2019","unstructured":"K. Heecheol, M. Yamada, K. Miyoshi, and H. Yamakawa. 2019. Macro action reinforcement learning with sequence disentanglement using variational autoencoder. arXiv:1903.09366 (May 2019).","journal-title":"arXiv:1903.09366"},{"key":"e_1_3_1_14_2","article-title":"Stable baselines","author":"Hill A.","year":"2018","unstructured":"A. Hill, A. Raffin, M. Ernestus, A. Gleave, A. Kanervisto, R. Traore, P. Dhariwal, C. Hesse, O. Klimov, A. Nichol, M. Plappert, A. Radford, J. Schulman, S. Sidor, and Y. Wu. 2018. Stable baselines. https:\/\/github.com\/hill-a\/stable-baselines. (2018).","journal-title":"https:\/\/github.com\/hill-a\/stable-baselines"},{"key":"e_1_3_1_15_2","first-page":"1109","volume-title":"Proceedings of the Conference on Advances in Neural Information Processing Systems (NeurIPS)","author":"Houthooft R.","year":"2016","unstructured":"R. Houthooft, X. Chen, Y. Duan, J. Schulman, F. De Turck, and P. Abbeel. 2016. VIME: Variational Information Maximizing Exploration. In Proceedings of the Conference on Advances in Neural Information Processing Systems (NeurIPS). 1109\u20131117."},{"key":"e_1_3_1_16_2","first-page":"167","volume-title":"Proceedings of the 10th International Conference on Machine Learning","volume":"951","author":"Kaelbling L. P.","year":"1993","unstructured":"L. P. Kaelbling. 1993. Hierarchical learning in stochastic domains: Preliminary results. In Proceedings of the 10th International Conference on Machine Learning, 951, 167\u2013173."},{"key":"e_1_3_1_17_2","first-page":"4444","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","author":"Khetarpal Khimya","year":"2020","unstructured":"Khimya Khetarpal, Martin Klissarov, Maxime Chevalier-Boisvert, Pierre-Luc Bacon, and Doina Precup. 2020. Options of interest: Temporal abstraction with interest functions. In Proceedings of the AAAI Conference on Artificial Intelligence. 4444\u20134451."},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1016\/0004-3702(85)90012-8"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.5555\/522098"},{"key":"e_1_3_1_20_2","first-page":"1928","volume-title":"Proceedings of the International Conference on Machine Learning (ICML)","author":"Mnih V.","year":"2016","unstructured":"V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. Lillicrap, T. Harley, D. Silver, and K. Kavukcuoglu. 2016. Asynchronous methods for deep reinforcement learning. In Proceedings of the International Conference on Machine Learning (ICML). 1928\u20131937."},{"key":"e_1_3_1_21_2","article-title":"Playing Atari with deep reinforcement learning","author":"Mnih V.","year":"2013","unstructured":"V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller. 2013. Playing Atari with deep reinforcement learning. arXiv:1312.5602 (Dec. 2013).","journal-title":"arXiv:1312.5602"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1038\/nature14236"},{"key":"e_1_3_1_23_2","doi-asserted-by":"crossref","first-page":"241","DOI":"10.1613\/jair.613","article-title":"Evolutionary algorithms for reinforcement learning","volume":"11","author":"Moriarty D. E.","year":"1999","unstructured":"D. E. Moriarty, A. C. Schultz, and J. J. Grefenstette. 1999. Evolutionary algorithms for reinforcement learning. J. Artificial Intelligence Research (JAIR) 11 (1999), 241\u2013276.","journal-title":"J. Artificial Intelligence Research (JAIR)"},{"key":"e_1_3_1_24_2","first-page":"256","volume-title":"Proceedings of the International Conference on Automated Planning and Scheduling (ICAPS)","author":"Newton M. A. H.","year":"2007","unstructured":"M. A. H. Newton, J. Levine, M. Fox, and D. Long. 2007. Learning macro-actions for arbitrary planners and domains. In Proceedings of the International Conference on Automated Planning and Scheduling (ICAPS). 256\u2013263."},{"key":"e_1_3_1_25_2","first-page":"2721","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Ostrovski Georg","year":"2017","unstructured":"Georg Ostrovski, Marc G. Bellemare, A\u00e4ron Oord, and R\u00e9mi Munos. 2017. Count-based exploration with neural density models. In Proceedings of the International Conference on Machine Learning(PMLR), 2721\u20132730."},{"key":"e_1_3_1_26_2","volume-title":"Proceedings of the International Conference on Machine Learning (ICML)","author":"Pathak D.","year":"2017","unstructured":"D. Pathak, P. Agrawal, A. A. Efros, and T. Darrell. 2017. Curiosity-driven exploration by self-supervised prediction. In Proceedings of the International Conference on Machine Learning (ICML)."},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1016\/0004-3702(74)90026-5"},{"key":"e_1_3_1_28_2","article-title":"Evolution strategies as a scalable alternative to reinforcement learning","author":"Salimans T.","year":"2017","unstructured":"T. Salimans, J. Ho, X. Chen, S. Sidor, and I. Sutskever. 2017. Evolution strategies as a scalable alternative to reinforcement learning. arXiv:1703.03864 (Sept. 2017).","journal-title":"arXiv:1703.03864"},{"key":"e_1_3_1_29_2","article-title":"Proximal policy optimization algorithms","author":"Schulman J.","year":"2017","unstructured":"J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov. 2017. Proximal policy optimization algorithms. arXiv:1707.06347 (Aug. 2017).","journal-title":"arXiv:1707.06347"},{"key":"e_1_3_1_30_2","article-title":"Accelerated methods for deep reinforcement learning","author":"Stooke Adam","year":"2018","unstructured":"Adam Stooke and Pieter Abbeel. 2018. Accelerated methods for deep reinforcement learning. arXiv preprint arXiv:1803.02811 (2018).","journal-title":"arXiv preprint arXiv:1803.02811"},{"key":"e_1_3_1_31_2","article-title":"Deep neuroevolution: Genetic algorithms are a competitive alternative for training deep neural networks for reinforcement learning","author":"Such F. P.","year":"2018","unstructured":"F. P. Such, V. Madhavan, E. Conti, J. Lehman, K. O. Stanley, and J. Clune. 2018. Deep neuroevolution: Genetic algorithms are a competitive alternative for training deep neural networks for reinforcement learning. arXiv:1712.06567 (Apr. 2018).","journal-title":"arXiv:1712.06567"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNN.1998.712192"},{"key":"e_1_3_1_33_2","first-page":"1057","volume-title":"Advances in Neural Information Processing Systems","author":"Sutton Richard S.","year":"2000","unstructured":"Richard S. Sutton, David A. McAllester, Satinder P. Singh, and Yishay Mansour. 2000. Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems. 1057\u20131063."},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.1016\/S0004-3702(99)00052-1"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1007\/BF00992696"},{"key":"e_1_3_1_36_2","first-page":"94","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment","volume":"15","author":"Xu S.","year":"2019","unstructured":"S. Xu, H. Kuang, Z. Zhi, R. Hu, Y. Liu, and H. Sun. 2019. Macro action selection with deep reinforcement learning in StarCraft. In Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, vol. 15. 94\u201399."}],"container-title":["ACM Transactions on Evolutionary Learning and Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3514260","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3514260","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T18:10:14Z","timestamp":1750183814000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3514260"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,3,31]]},"references-count":35,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2022,3,31]]}},"alternative-id":["10.1145\/3514260"],"URL":"https:\/\/doi.org\/10.1145\/3514260","relation":{},"ISSN":["2688-299X","2688-3007"],"issn-type":[{"type":"print","value":"2688-299X"},{"type":"electronic","value":"2688-3007"}],"subject":[],"published":{"date-parts":[[2022,3,31]]},"assertion":[{"value":"2021-05-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-01-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-04-05","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}