{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,27]],"date-time":"2025-07-27T07:46:12Z","timestamp":1753602372157,"version":"3.41.0"},"reference-count":42,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2020,6,30]],"date-time":"2020-06-30T00:00:00Z","timestamp":1593475200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Fundamental Research Funds for the Central Universities, SCUT","award":["D2182480"],"award-info":[{"award-number":["D2182480"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62076100"],"award-info":[{"award-number":["62076100"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"the Science and Technology Programs of Guangzhou","award":["201802010027, 201902010046"],"award-info":[{"award-number":["201802010027, 201902010046"]}]},{"name":"the Research Grants Council of the Hong Kong Special Administrative Region, China","award":["CUHK 14209321"],"award-info":[{"award-number":["CUHK 14209321"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Auton. Adapt. Syst."],"published-print":{"date-parts":[[2020,6,30]]},"abstract":"<jats:p>In a teacher-student framework, a more experienced agent (teacher) helps accelerate the learning of another agent (student) by suggesting actions to take in certain states. In cooperative multi-agent reinforcement learning (MARL), where agents must cooperate with one another, a student could fail to cooperate effectively with others even by following a teacher\u2019s suggested actions, as the policies of all agents can change before convergence. 
When the number of times that agents communicate with one another is limited (i.e., there are budget constraints), an advising strategy that uses actions as advice could be less effective. We propose a partaker-sharer advising framework (PSAF) for cooperative MARL agents learning with budget constraints. In PSAF, each Q-learner can decide when to ask for and share its Q-values. We perform experiments in three typical multi-agent learning problems. The evaluation results indicate that the proposed PSAF approach outperforms existing advising methods under both constrained and unconstrained budgets. Moreover, we analyse the influence of advising actions and sharing Q-values on agent learning.<\/jats:p>","DOI":"10.1145\/3447268","type":"journal-article","created":{"date-parts":[[2021,4,19]],"date-time":"2021-04-19T23:26:18Z","timestamp":1618874778000},"page":"1-28","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["A Q-values Sharing Framework for Multi-agent Reinforcement Learning under Budget Constraint"],"prefix":"10.1145","volume":"15","author":[{"given":"Changxi","family":"Zhu","sequence":"first","affiliation":[{"name":"School of Software Engineering, South China University of Technology, Guangdong, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ho-Fung","family":"Leung","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, The Chinese University of Hong Kong and Department of Sociology, The Chinese University of Hong Kong, Hong Kong SAR, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shuyue","family":"Hu","sequence":"additional","affiliation":[{"name":"Department of Computer Science, National University of Singapore, Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yi","family":"Cai","sequence":"additional","affiliation":[{"name":"School of Software Engineering, South China 
University of Technology and Key Laboratory of Big Data and Intelligent Robot (South China University of Technology), Ministry of Education, Guangzhou, Guangdong, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2021,4,19]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/3477.979961"},{"key":"e_1_2_1_2_1","unstructured":"Hidehisa Akiyama. 2012. Agent2d base code. Retrieved from https:\/\/zh.osdn.net\/projects\/rctools\/."},{"volume-title":"Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI\u201916)","author":"Amir Ofra","key":"e_1_2_1_3_1","unstructured":"Ofra Amir, Ece Kamar, Andrey Kolobov, and Barbara J. Grosz. 2016. Interactive teaching strategies for agent training. In Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI\u201916). 804--811."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICC.2018.8422864"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.engappai.2009.11.009"},{"volume-title":"Proceedings of the 28th AAAI Conference on Artificial Intelligence. 1687--1693","author":"Brys Tim","key":"e_1_2_1_6_1","unstructured":"Tim Brys, Ann Now\u00e9, Daniel Kudenko, and Matthew E. Taylor. 2014. Combining multiple correlated reward and shaping signals by measuring confidence. 
In Proceedings of the 28th AAAI Conference on Artificial Intelligence. 1687--1693."},{"key":"e_1_2_1_7_1","volume-title":"Proceedings of the National Conference on Artificial Intelligence. 746--752","author":"Claus Caroline","year":"1998","unstructured":"Caroline Claus and Craig Boutilier. 1998. The dynamics of reinforcement learning in cooperative multiagent systems. In Proceedings of the National Conference on Artificial Intelligence. 746--752."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/WI-IAT.2012.28"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1613\/jair.1.11396"},{"key":"e_1_2_1_11_1","volume-title":"Proceedings of the 16th International Conference on Autonomous Agents and MultiAgent Systems. 1100--1108","author":"da Silva Felipe Leno","year":"2017","unstructured":"Felipe Leno da Silva, Ruben Glatt, and Anna Helena Reali Costa. 2017. Simultaneously learning and advising in multiagent reinforcement learning. In Proceedings of the 16th International Conference on Autonomous Agents and MultiAgent Systems. 1100--1108."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.4236\/ica.2016.74012"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.3390\/make1010002"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/3310090"},{"volume-title":"Proceedings of the International Conference on Autonomous Agents and Multiagent Systems. 66--83","author":"Gupta Jayesh K.","key":"e_1_2_1_15_1","unstructured":"Jayesh K. 
Gupta, Maxim Egorov, and Mykel J. Kochenderfer. 2017. Cooperative multi-agent control using deep reinforcement learning. In Proceedings of the International Conference on Autonomous Agents and Multiagent Systems. 66--83."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/2644819"},{"key":"e_1_2_1_17_1","volume-title":"Proceedings of the AAMAS Adaptive Learning Agents (ALA) Workshop.","author":"Hausknecht Matthew","year":"2016","unstructured":"Matthew Hausknecht, Prannoy Mupparaju, Sandeep Subramanian, Shivaram Kalyanakrishnan, and Peter Stone. 2016. Half field offense: An environment for multiagent learning and ad hoc teamwork. In Proceedings of the AAMAS Adaptive Learning Agents (ALA) Workshop. Retrieved from http:\/\/www.cs.utexas.edu\/users\/ai-lab?hausknecht:aamasws16."},{"key":"e_1_2_1_18_1","volume-title":"Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems 2","author":"De Hauwere Yann-Micha\u00ebl","year":"2010","unstructured":"Yann-Micha\u00ebl De Hauwere, Peter Vrancx, and Ann Now\u00e9. 2010. Learning multi-agent state space representations. 
In Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems 2 (2010), 715--722."},{"volume-title":"Adaptive and Learning Agents","author":"De Hauwere Yann-Micha\u00ebl","key":"e_1_2_1_19_1","unstructured":"Yann-Micha\u00ebl De Hauwere, Peter Vrancx, and Ann Now\u00e9. 2011. Solving sparse delayed coordination problems in multi-agent reinforcement learning. In Adaptive and Learning Agents. Springer Berlin, 114--133."},{"key":"e_1_2_1_20_1","volume-title":"Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems. 1388--1396","author":"Hong Zhang-Wei","year":"2018","unstructured":"Zhang-Wei Hong, Shih-Yang Su, Tzu-Yun Shann, Yi-Hsiang Chang, and Chun-Yi Lee. 2018. A deep policy inference Q-network for multi-agent systems. In Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems. 1388--1396."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/CIG.2019.8847988"},{"key":"e_1_2_1_22_1","volume-title":"Proceedings of the Robot Soccer World Cup I (RoboCup\u201997)","author":"Kitano Hiroaki","year":"1997","unstructured":"Hiroaki Kitano, Minoru Asada, Yasuo Kuniyoshi, Itsuki Noda, Eiichi Osawa, and Hitoshi Matsubara. 1997. 
RoboCup: A challenge problem for AI. In Proceedings of the Robot Soccer World Cup I (RoboCup\u201997). 1--19."},{"volume-title":"Proceedings of the Machine Learning Conference of Belgium and the Netherlands. 65--71","author":"Jelle","key":"e_1_2_1_23_1","unstructured":"Jelle R. Kok and Nikos Vlassis. 2004. Sparse tabular multiagent Q-learning. In Proceedings of the Machine Learning Conference of Belgium and the Netherlands. 65--71."},{"key":"e_1_2_1_24_1","volume-title":"Proceedings of the 31st AAAI Conference on Artificial Intelligence. 2140--2146","author":"Lample Guillaume","year":"2017","unstructured":"Guillaume Lample and Devendra Singh Chaplot. 2017. Playing FPS games with deep reinforcement learning. In Proceedings of the 31st AAAI Conference on Artificial Intelligence. 2140--2146."},{"key":"e_1_2_1_25_1","volume-title":"Proceedings of the 34th International Conference on Machine Learning. 1995--2003","author":"Le Hoang Minh","year":"2017","unstructured":"Hoang Minh Le, Yisong Yue, Peter Carr, and Patrick Lucey. 2017. Coordinated multi-agent imitation learning. In Proceedings of the 34th International Conference on Machine Learning. 
1995--2003."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/3070861"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1017\/S0269888912000057"},{"key":"e_1_2_1_28_1","volume-title":"Oliehoek and Christopher Amato","author":"Frans","year":"2016","unstructured":"Frans A. Oliehoek and Christopher Amato. 2016. A Concise Introduction to Decentralized POMDPs. (SpringerBriefs in Intelligent Systems)."},{"volume-title":"Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence. 6128--6136","author":"Omidshafiei Shayegan","key":"e_1_2_1_29_1","unstructured":"Shayegan Omidshafiei, Dong-Ki Kim, Miao Liu, Gerald Tesauro, Matthew Riemer, Christopher Amato, Murray Campbell, and Jonathan P. How. 2019. Learning to teach in cooperative multiagent reinforcement learning. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence. 6128--6136."},{"key":"e_1_2_1_30_1","volume-title":"Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems. 1722--1724","author":"Rosenfeld Ariel","year":"2017","unstructured":"Ariel Rosenfeld, Matthew E. Taylor, and Sarit Kraus. 2017. Speeding up tabular reinforcement learning using state-action similarities. 
In Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems. 1722--1724."},{"volume-title":"Proceedings of the Symposium on Abstraction, Reformulation, and Approximation (SARA\u201905)","author":"Alexander","key":"e_1_2_1_31_1","unstructured":"Alexander A. Sherstov and Peter Stone. 2005. Function approximation via tile coding: Automating parameter choice. In Proceedings of the Symposium on Abstraction, Reformulation, and Approximation (SARA\u201905). 194--205."},{"key":"e_1_2_1_32_1","volume-title":"Barto","author":"Sutton Richard S.","year":"1998","unstructured":"Richard S. Sutton and Andrew G. Barto. 1998. Reinforcement Learning: An Introduction (1st. ed.). The MIT Press, Cambridge, MA."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1016\/B978-1-55860-307-3.50049-6"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/IJCNN.2019.8851784"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/IJCNN.2014.6889438"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.5555\/1314498.1314569"},{"volume-title":"Proceedings of the Adaptive and Learning Agents Workshop (ALA\u201912)","author":"Torrey Lisa","key":"e_1_2_1_37_1","unstructured":"Lisa Torrey and Matthew E. Taylor. 2012. Help an agent out: Student\/teacher learning in sequential decision tasks. In Proceedings of the Adaptive and Learning Agents Workshop (ALA\u201912). 
41--48."},{"volume-title":"Proceedings of the 12th International Conference on Autonomous Agents and Multiagent Systems. 1053--1060","author":"Torrey Lisa","key":"e_1_2_1_38_1","unstructured":"Lisa Torrey and Matthew E. Taylor. 2013. Teaching on a budget: Agents advising agents in reinforcement learning. In Proceedings of the 12th International Conference on Autonomous Agents and Multiagent Systems. 1053--1060."},{"key":"e_1_2_1_39_1","volume-title":"Double Q-learning. In Proceedings of the International Conference on Advances in Neural Information Processing Systems. 2613--2621","author":"van Hasselt Hado","year":"2010","unstructured":"Hado van Hasselt. 2010. Double Q-learning. In Proceedings of the International Conference on Advances in Neural Information Processing Systems. 2613--2621."},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF00992698"},{"volume-title":"Proceedings of the 7th International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS\u201908)","author":"Zhang Chongjie","key":"e_1_2_1_41_1","unstructured":"Chongjie Zhang, Sherief Abdallah, and Victor R. Lesser. 2008. Efficient multi-agent reinforcement learning through automated supervision. In Proceedings of the 7th International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS\u201908). 
1365--1370."},{"volume-title":"Proceedings of the 26th International Joint Conference on Artificial Intelligence. 3455--3461","author":"Zhang Zongzhang","key":"e_1_2_1_42_1","unstructured":"Zongzhang Zhang, Zhiyuan Pan, and Mykel J. Kochenderfer. 2017. Weighted double Q-learning. In Proceedings of the 26th International Joint Conference on Artificial Intelligence. 3455--3461."},{"key":"e_1_2_1_43_1","volume-title":"Proceedings of the AAMAS Workshop on Autonomous Robots & Multirobot Systems.","author":"Zimmer Matthieu","year":"2014","unstructured":"Matthieu Zimmer, Paolo Viappiani, and Paul Weng. 2014. Teacher-student framework: A reinforcement learning approach. 
In Proceedings of the AAMAS Workshop on Autonomous Robots & Multirobot Systems."}],"container-title":["ACM Transactions on Autonomous and Adaptive Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3447268","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3447268","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T21:28:27Z","timestamp":1750195707000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3447268"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,6,30]]},"references-count":42,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2020,6,30]]}},"alternative-id":["10.1145\/3447268"],"URL":"https:\/\/doi.org\/10.1145\/3447268","relation":{},"ISSN":["1556-4665","1556-4703"],"issn-type":[{"type":"print","value":"1556-4665"},{"type":"electronic","value":"1556-4703"}],"subject":[],"published":{"date-parts":[[2020,6,30]]},"assertion":[{"value":"2019-08-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-01-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-04-19","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}