{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,21]],"date-time":"2026-02-21T18:14:41Z","timestamp":1771697681928,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":51,"publisher":"ACM","license":[{"start":{"date-parts":[[2023,8,4]],"date-time":"2023-08-04T00:00:00Z","timestamp":1691107200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"National Natural Science Foundation of China","award":["U21A20519 and 61772072"],"award-info":[{"award-number":["U21A20519 and 61772072"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2023,8,6]]},"DOI":"10.1145\/3580305.3599379","type":"proceedings-article","created":{"date-parts":[[2023,8,4]],"date-time":"2023-08-04T18:10:58Z","timestamp":1691172658000},"page":"3239-3248","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["HiMacMic: Hierarchical Multi-Agent Deep Reinforcement Learning with Dynamic Asynchronous Macro Strategy"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0009-0009-4479-5056","authenticated-orcid":false,"given":"Hancheng","family":"Zhang","sequence":"first","affiliation":[{"name":"Beijing Inst. of Tech., Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6663-6712","authenticated-orcid":false,"given":"Guozheng","family":"Li","sequence":"additional","affiliation":[{"name":"Beijing Inst. of Tech., Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0252-329X","authenticated-orcid":false,"given":"Chi Harold","family":"Liu","sequence":"additional","affiliation":[{"name":"Beijing Inst. 
of Tech., Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0181-8379","authenticated-orcid":false,"given":"Guoren","family":"Wang","sequence":"additional","affiliation":[{"name":"Beijing Inst. of Tech., Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4418-0114","authenticated-orcid":false,"given":"Jian","family":"Tang","sequence":"additional","affiliation":[{"name":"Midea Group, Beijing, China"}]}],"member":"320","published-online":{"date-parts":[[2023,8,4]]},"reference":[{"key":"e_1_3_2_2_1_1","unstructured":"2015. ECML\/PKDD 15: Taxi Trajectory. https:\/\/www.kaggle.com\/competitions\/pkdd-15-predict-taxi-service-trajectory-i\/data. Accessed on 6 April 2023."},{"key":"e_1_3_2_2_2_1","volume-title":"Feudal Multi-Agent Hierarchies for Cooperative Reinforcement Learning. CoRR","author":"Ahilan Sanjeevan","year":"2019","unstructured":"Sanjeevan Ahilan and Peter Dayan. 2019. Feudal Multi-Agent Hierarchies for Cooperative Reinforcement Learning. CoRR, Vol. abs\/1901.08492 (2019)."},{"key":"e_1_3_2_2_3_1","volume-title":"OpenAI Pieter Abbeel, and Wojciech Zaremba","author":"Andrychowicz Marcin","year":"2017","unstructured":"Marcin Andrychowicz, Filip Wolski, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, OpenAI Pieter Abbeel, and Wojciech Zaremba. 2017. Hindsight Experience Replay. In NeurIPS'17, Vol. 30. 
5048--5058."},{"key":"e_1_3_2_2_4_1","volume-title":"ICLR'19","author":"Bansal Trapit","year":"2018","unstructured":"Trapit Bansal, Jakub Pachocki, Szymon Sidor, Ilya Sutskever, and Igor Mordatch. 2018. Emergent Complexity via Multi-Agent Competition. In ICLR'19."},{"key":"e_1_3_2_2_5_1","volume-title":"ICML'21","volume":"139","author":"Biedenkapp Andr\u00e9","year":"2021","unstructured":"Andr\u00e9 Biedenkapp, Raghu Rajan, Frank Hutter, and Marius Lindauer. 2021. TempoRL: Learning When to Act. In ICML'21, Vol. 139. 914--924."},{"key":"e_1_3_2_2_6_1","volume-title":"Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. In NIPS'14 Deep Learning and Representation Learning Workshop.","author":"Chung Junyoung","unstructured":"Junyoung Chung, \u00c7a\u011flar G\u00fcl\u00e7ehre, KyungHyun Cho, and Yoshua Bengio. 2014. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. In NIPS'14 Deep Learning and Representation Learning Workshop."},{"key":"e_1_3_2_2_7_1","volume-title":"CURIOUS: Intrinsically Motivated Modular Multi-Goal Reinforcement Learning. In ICML'19","volume":"97","author":"Colas C\u00e9dric","year":"2019","unstructured":"
C\u00e9dric Colas, Pierre-Yves Oudeyer, Olivier Sigaud, Pierre Fournier, and Mohamed Chetouani. 2019. CURIOUS: Intrinsically Motivated Modular Multi-Goal Reinforcement Learning. In ICML'19, Vol. 97. 1331--1340."},{"key":"e_1_3_2_2_8_1","volume-title":"Hinton","author":"Dayan Peter","year":"1992","unstructured":"Peter Dayan and Geoffrey E. Hinton. 1992. Feudal Reinforcement Learning. In NIPS'92, Vol. 5. 271--278."},{"key":"e_1_3_2_2_9_1","volume-title":"SMACv2: An Improved Benchmark for Cooperative Multi-Agent Reinforcement Learning. CoRR","author":"Ellis Benjamin","year":"2022","unstructured":"Benjamin Ellis, Skander Moalla, Mikayel Samvelyan, Mingfei Sun, Anuj Mahajan, Jakob N. Foerster, and Shimon Whiteson. 2022. SMACv2: An Improved Benchmark for Cooperative Multi-Agent Reinforcement Learning. CoRR, Vol. abs\/2212.07489 (2022)."},{"key":"e_1_3_2_2_10_1","volume-title":"ICLR'19","author":"Eysenbach Benjamin","year":"2019","unstructured":"Benjamin Eysenbach, Abhishek Gupta, Julian Ibarz, and Sergey Levine. 2019. Diversity is All You Need: Learning Skills without a Reward Function. In ICLR'19."},{"key":"e_1_3_2_2_11_1","volume-title":"Automatic Goal Generation for Reinforcement Learning Agents. In ICML'19","volume":"80","author":"Florensa Carlos","year":"2018","unstructured":"
Carlos Florensa, David Held, Xinyang Geng, and Pieter Abbeel. 2018. Automatic Goal Generation for Reinforcement Learning Agents. In ICML'19, Vol. 80. 1514--1523."},{"key":"e_1_3_2_2_12_1","volume-title":"Counterfactual Multi-Agent Policy Gradients. In AAAI'18","author":"Foerster Jakob N.","year":"2018","unstructured":"Jakob N. Foerster, Gregory Farquhar, Triantafyllos Afouras, Nantas Nardelli, and Shimon Whiteson. 2018. Counterfactual Multi-Agent Policy Gradients. In AAAI'18. 2974--2982."},{"key":"e_1_3_2_2_13_1","volume-title":"NeurIPS'21","volume":"34","author":"Gao Yiming","year":"2021","unstructured":"Yiming Gao, Bei Shi, Xueying Du, Liang Wang, Guangwei Chen, Zhenjie Lian, Fuhao Qiu, Guoan Han, Weixuan Wang, Deheng Ye, Qiang Fu, Wei Yang, and Lanxiao Huang. 2021. Learning Diverse Policies in MOBA Games via Macro-Goals. In NeurIPS'21, Vol. 34. 16171--16182."},{"key":"e_1_3_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10462-021-09996-w"},{"key":"e_1_3_2_2_15_1","volume-title":"NeurIPS'21","volume":"34","author":"G\u00fcrtler Nico","year":"2021","unstructured":"Nico G\u00fcrtler, Dieter B\u00fcchler, and Georg Martius. 2021. Hierarchical Reinforcement Learning with Timed Subgoals. In NeurIPS'21, Vol. 34. 
21732--21743."},{"key":"e_1_3_2_2_16_1","volume-title":"API: Boosting Multi-Agent Reinforcement Learning via Agent-Permutation-Invariant Networks. CoRR","author":"Hao Xiaotian","year":"2022","unstructured":"Xiaotian Hao, Weixun Wang, Hangyu Mao, Yaodong Yang, Dong Li, Yan Zheng, Zhen Wang, and Jianye Hao. 2022. API: Boosting Multi-Agent Reinforcement Learning via Agent-Permutation-Invariant Networks. CoRR, Vol. abs\/2203.05285 (2022)."},{"key":"e_1_3_2_2_17_1","volume-title":"ICML'22","volume":"162","author":"Jeon Jeewon","year":"2022","unstructured":"Jeewon Jeon, Woojun Kim, Whiyoung Jung, and Youngchul Sung. 2022. MASER: Multi-Agent Reinforcement Learning with Subgoals Generated from Experience Replay Buffer. In ICML'22, Vol. 162. 10041--10052."},{"key":"e_1_3_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2016.01.031"},{"key":"e_1_3_2_2_19_1","volume-title":"Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation. In NIPS'16","author":"Kulkarni Tejas D.","year":"2016","unstructured":"Tejas D. Kulkarni, Karthik Narasimhan, Ardavan Saeedi, and Josh Tenenbaum. 2016. Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation. In NIPS'16. 
3675--3683."},{"key":"e_1_3_2_2_20_1","volume-title":"Google Research Football: A Novel Reinforcement Learning Environment. In AAAI'20","author":"Kurach Karol","year":"2020","unstructured":"Karol Kurach, Anton Raichuk, Piotr Stanczyk, Michal Zajac, Olivier Bachem, Lasse Espeholt, Carlos Riquelme, Damien Vincent, Marcin Michalski, Olivier Bousquet, and Sylvain Gelly. 2020. Google Research Football: A Novel Reinforcement Learning Environment. In AAAI'20. 4501--4510."},{"key":"e_1_3_2_2_21_1","volume-title":"Shapley Counterfactual Credits for Multi-Agent Reinforcement Learning. In KDD'21","author":"Li Jiahui","year":"2021","unstructured":"Jiahui Li, Kun Kuang, Baoxiang Wang, Furui Liu, Long Chen, Fei Wu, and Jun Xiao. 2021. Shapley Counterfactual Credits for Multi-Agent Reinforcement Learning. In KDD'21. 934--942."},{"key":"e_1_3_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/JSAC.2018.2864373"},{"key":"e_1_3_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMC.2019.2908171"},{"key":"e_1_3_2_2_24_1","volume-title":"NeurIPS'17","volume":"30","author":"Lowe Ryan","year":"2017","unstructured":"Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, and Igor Mordatch. 2017a. Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments. In NeurIPS'17, Vol. 30. 
6379--6390."},{"key":"e_1_3_2_2_25_1","volume-title":"OpenAI Pieter Abbeel, and Igor Mordatch","author":"Lowe Ryan","year":"2017","unstructured":"Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, OpenAI Pieter Abbeel, and Igor Mordatch. 2017b. Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments. In NeurIPS'17, Vol. 30. 6379--6390."},{"key":"e_1_3_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/3534678.3539481"},{"key":"e_1_3_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.1038\/nature14236"},{"key":"e_1_3_2_2_28_1","volume-title":"Oliehoek and Christopher Amato","author":"Frans","year":"2016","unstructured":"Frans A. Oliehoek and Christopher Amato. 2016. A Concise Introduction to Decentralized POMDPs. Springer."},{"key":"e_1_3_2_2_29_1","volume-title":"Inferences on a multidimensional social hierarchy use a grid-like code. Nature neuroscience","author":"Park Seongmin A","year":"2021","unstructured":"Seongmin A Park, Douglas S Miller, and Erie D Boorman. 2021. Inferences on a multidimensional social hierarchy use a grid-like code. Nature neuroscience, Vol. 24, 9 (2021), 1292--1301."},{"key":"e_1_3_2_2_30_1","volume-title":"Pierre-Alexandre Kamienny, Philip H. S. Torr, Wendelin Boehmer, and Shimon Whiteson.","author":"Peng Bei","year":"2021","unstructured":"
Bei Peng, Tabish Rashid, Christian Schr\u00f6der de Witt, Pierre-Alexandre Kamienny, Philip H. S. Torr, Wendelin Boehmer, and Shimon Whiteson. 2021. FACMAC: Factored Multi-Agent Centralised Policy Gradients. In NeurIPS'21, Vol. 34. 12208--12221."},{"key":"e_1_3_2_2_31_1","volume-title":"Abhishek Gupta, Pengfei Wei, Zhu Sun, and Zejun Ma.","author":"Qu Xinghua","year":"2022","unstructured":"Xinghua Qu, Yew Soon Ong, Abhishek Gupta, Pengfei Wei, Zhu Sun, and Zejun Ma. 2022. Importance Prioritized Policy Distillation. In KDD'22. 1420--1429."},{"key":"e_1_3_2_2_32_1","volume-title":"ICML'19","volume":"80","author":"Raileanu Roberta","year":"2018","unstructured":"Roberta Raileanu, Emily Denton, Arthur Szlam, and Rob Fergus. 2018. Modeling Others using Oneself in Multi-Agent Reinforcement Learning. In ICML'19, Vol. 80. 4254--4263."},{"key":"e_1_3_2_2_33_1","volume-title":"NeurIPS'20","volume":"33","author":"Rashid Tabish","year":"2020","unstructured":"Tabish Rashid, Gregory Farquhar, Bei Peng, and Shimon Whiteson. 2020. Weighted QMIX: Expanding Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning. In NeurIPS'20, Vol. 33. 10199--10210."},{"key":"e_1_3_2_2_34_1","volume-title":"Gregory Farquhar, Jakob N. 
Foerster, and Shimon Whiteson.","author":"Rashid Tabish","year":"2018","unstructured":"Tabish Rashid, Mikayel Samvelyan, Christian Schr\u00f6der de Witt, Gregory Farquhar, Jakob N. Foerster, and Shimon Whiteson. 2018. QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning. In ICML'18, Vol. 80. 4292--4301."},{"key":"e_1_3_2_2_35_1","volume-title":"Q-Decomposition for Reinforcement Learning Agents. In ICML'03","author":"Russell Stuart","year":"2003","unstructured":"Stuart Russell and Andrew Zimdars. 2003. Q-Decomposition for Reinforcement Learning Agents. In ICML'03. 656--663."},{"key":"e_1_3_2_2_36_1","volume-title":"The StarCraft Multi-Agent Challenge. In AAMAS'19","author":"Samvelyan Mikayel","year":"2019","unstructured":"Mikayel Samvelyan, Tabish Rashid, Christian Schr\u00f6der de Witt, Gregory Farquhar, Nantas Nardelli, Tim G. J. Rudner, Chia-Man Hung, Philip H. S. Torr, Jakob N. Foerster, and Shimon Whiteson. 2019. The StarCraft Multi-Agent Challenge. In AAMAS'19. 2186--2188."},{"key":"e_1_3_2_2_37_1","volume-title":"David Hostallero, and Yung Yi.","author":"Son Kyunghwan","year":"2019","unstructured":"
Kyunghwan Son, Daewoo Kim, Wan Ju Kang, David Hostallero, and Yung Yi. 2019. QTRAN: Learning to Factorize with Transformation for Cooperative Multi-Agent Reinforcement Learning. In ICML'19, Vol. 97. 5887--5896."},{"key":"e_1_3_2_2_38_1","volume-title":"Value-Decomposition Networks For Cooperative Multi-Agent Learning Based On Team Reward. In AAMAS'18","author":"Sunehag Peter","year":"2018","unstructured":"Peter Sunehag, Guy Lever, Audrunas Gruslys, Wojciech Marian Czarnecki, Vin\u00edcius Flores Zambaldi, Max Jaderberg, Marc Lanctot, Nicolas Sonnerat, Joel Z. Leibo, Karl Tuyls, and Thore Graepel. 2018. Value-Decomposition Networks For Cooperative Multi-Agent Learning Based On Team Reward. In AAMAS'18. 2085--2087."},{"key":"e_1_3_2_2_39_1","first-page":"1","article-title":"Between MDPs and Semi-MDPs","volume":"112","author":"Sutton Richard S.","year":"1999","unstructured":"Richard S. Sutton, Doina Precup, and Satinder Singh. 1999. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning. Artif. Intell., Vol. 112, 1--2 (1999), 181--211.","journal-title":"Artif. Intell."},{"key":"e_1_3_2_2_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/3447548.3467096"},{"key":"e_1_3_2_2_41_1","volume-title":"Multi-Objective Model-Based Reinforcement Learning for Infectious Disease Control. 
In KDD'21","author":"Wan Runzhe","year":"2021","unstructured":"Runzhe Wan, Xinyu Zhang, and Rui Song. 2021. Multi-Objective Model-Based Reinforcement Learning for Infectious Disease Control. In KDD'21. 1634--1644."},{"key":"e_1_3_2_2_42_1","volume-title":"Zipeng Dai, Jian Tang, and Guoren Wang.","author":"Wang Hao","year":"2021","unstructured":"Hao Wang, Chi Harold Liu, Zipeng Dai, Jian Tang, and Guoren Wang. 2021b. Energy-Efficient 3D Vehicular Crowdsourcing for Disaster Response by Distributed Deep Reinforcement Learning. In KDD'21. 3679--3687."},{"key":"e_1_3_2_2_43_1","volume-title":"QPLEX: Duplex Dueling Multi-Agent Q-Learning. In ICLR'21","author":"Wang Jianhao","year":"2021","unstructured":"Jianhao Wang, Zhizhou Ren, Terry Liu, Yang Yu, and Chongjie Zhang. 2021c. QPLEX: Duplex Dueling Multi-Agent Q-Learning. In ICLR'21."},{"key":"e_1_3_2_2_44_1","volume-title":"Too Many Cooks: Coordinating Multi-agent Collaboration Through Inverse Planning. In AAMAS'20","author":"Wang Rose E.","year":"2020","unstructured":"Rose E. Wang, Sarah A. Wu, James A. Evans, Joshua B. Tenenbaum, David C. Parkes, and Max Kleiman-Weiner. 2020. Too Many Cooks: Coordinating Multi-agent Collaboration Through Inverse Planning. In AAMAS'20. 
2032--2034."},{"key":"e_1_3_2_2_45_1","volume-title":"ICLR'21","author":"Wang Tonghan","year":"2021","unstructured":"Tonghan Wang, Tarun Gupta, Anuj Mahajan, Bei Peng, Shimon Whiteson, and Chongjie Zhang. 2021a. RODE: Learning Roles to Decompose Multi-Agent Tasks. In ICLR'21."},{"key":"e_1_3_2_2_46_1","doi-asserted-by":"crossref","unstructured":"Yuchen Xiao, Weihao Tan, and Christopher Amato. 2022. Asynchronous Actor-Critic for Multi-Agent Reinforcement Learning. In NeurIPS'22.","DOI":"10.1109\/MRS50823.2021.9620607"},{"key":"e_1_3_2_2_47_1","volume-title":"Hierarchical Cooperative Multi-Agent Reinforcement Learning with Skill Discovery. In AAMAS'20","author":"Yang Jiachen","year":"2020","unstructured":"Jiachen Yang, Igor Borovikov, and Hongyuan Zha. 2020a. Hierarchical Cooperative Multi-Agent Reinforcement Learning with Skill Discovery. In AAMAS'20. 1566--1574."},{"key":"e_1_3_2_2_48_1","volume-title":"ICLR'20","author":"Yang Jiachen","year":"2020","unstructured":"Jiachen Yang, Alireza Nakhaei, David Isele, Kikuo Fujimura, and Hongyuan Zha. 2020b. CM3: Cooperative Multi-goal Multi-stage Multi-agent Reinforcement Learning. In ICLR'20."},{"key":"e_1_3_2_2_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/3534678.3539391"},{"key":"e_1_3_2_2_50_1","volume-title":"Multi-Agent Games. 
CoRR","author":"Yu Chao","year":"2021","unstructured":"Chao Yu, Akash Velu, Eugene Vinitsky, Yu Wang, Alexandre M. Bayen, and Yi Wu. 2021. The Surprising Effectiveness of MAPPO in Cooperative, Multi-Agent Games. CoRR, Vol. abs\/2103.01955 (2021)."},{"key":"e_1_3_2_2_51_1","doi-asserted-by":"publisher","DOI":"10.1109\/TII.2017.2783439"}],"event":{"name":"KDD '23: The 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining","location":"Long Beach CA USA","acronym":"KDD '23","sponsor":["SIGMOD ACM Special Interest Group on Management of Data","SIGKDD ACM Special Interest Group on Knowledge Discovery in Data"]},"container-title":["Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3580305.3599379","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3580305.3599379","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:37:48Z","timestamp":1750178268000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3580305.3599379"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,8,4]]},"references-count":51,"alternative-id":["10.1145\/3580305.3599379","10.1145\/3580305"],"URL":"https:\/\/doi.org\/10.1145\/3580305.3599379","relation":{},"subject":[],"published":{"date-parts":[[2023,8,4]]},"assertion":[{"value":"2023-08-04","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}