{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T01:53:42Z","timestamp":1760234022494,"version":"build-2065373602"},"reference-count":33,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2021,3,21]],"date-time":"2021-03-21T00:00:00Z","timestamp":1616284800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100012166","name":"National Key Research and Development Program of China","doi-asserted-by":"publisher","award":["Grant 2018AAA0102402"],"award-info":[{"award-number":["Grant 2018AAA0102402"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"publisher"}]},{"name":"External cooperation key project of Chinese Academy Sciences","award":["No. 173211KYSB20200002"],"award-info":[{"award-number":["No. 173211KYSB20200002"]}]},{"name":"Innovation Academy for Light-duty Gas Turbine,Chinese Academy of Sciences","award":["No.CXYJJ19-ZD-02 and No.CXYJJ20-QN-05"],"award-info":[{"award-number":["No.CXYJJ19-ZD-02 and No.CXYJJ20-QN-05"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>Multiagent cooperation is one of the most attractive research fields in multiagent systems. There are many attempts made by researchers in this field to promote cooperation behavior. However, several issues still exist, such as complex interactions among different groups of agents, redundant communication contents of irrelevant agents, which prevents the learning and convergence of agent cooperation behaviors. To address the limitations above, a novel method called multiagent hierarchical cognition difference policy (MA-HCDP) is proposed in this paper. It includes a hierarchical group network (HGN), a cognition difference network (CDN), and a soft communication network (SCN). HGN is designed to distinguish different underlying information of diverse groups\u2019 observations (including friendly group, enemy group, and object group) and extract different high-dimensional state representations of different groups. CDN is designed based on a variational auto-encoder to allow each agent to choose its neighbors (communication targets) adaptively with its environment cognition difference. SCN is designed to handle the complex interactions among the agents with a soft attention mechanism. 
The results of simulations demonstrate the superior effectiveness of our method compared with existing methods.<\/jats:p>","DOI":"10.3390\/a14030098","type":"journal-article","created":{"date-parts":[[2021,3,21]],"date-time":"2021-03-21T22:00:37Z","timestamp":1616364037000},"page":"98","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Multiagent Hierarchical Cognition Difference Policy for Multiagent Cooperation"],"prefix":"10.3390","volume":"14","author":[{"given":"Huimu","family":"Wang","sequence":"first","affiliation":[{"name":"School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China"},{"name":"Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China"}]},{"given":"Zhen","family":"Liu","sequence":"additional","affiliation":[{"name":"Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3268-9482","authenticated-orcid":false,"given":"Jianqiang","family":"Yi","sequence":"additional","affiliation":[{"name":"School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China"},{"name":"Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China"}]},{"given":"Zhiqiang","family":"Pu","sequence":"additional","affiliation":[{"name":"School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China"},{"name":"Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China"}]}],"member":"1968","published-online":{"date-parts":[[2021,3,21]]},"reference":[{"key":"ref_1","first-page":"569","article-title":"Recurrent Deep Multiagent Q-Learning for Autonomous Brokers in Smart Grid","volume":"18","author":"Yang","year":"2018","journal-title":"IJCAI"},{"key":"ref_2","unstructured":"Li, X., Zhang, J., Bian, J., Tong, Y., and Liu, T.Y. (2019, January 13\u201317). A Cooperative Multi-Agent Reinforcement Learning Framework for Resource Balancing in Complex Logistics Network. Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems. International Foundation for Autonomous Agents and Multiagent Systems, Montreal, QC, Canada."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"350","DOI":"10.1038\/s41586-019-1724-z","article-title":"Grandmaster level in StarCraft II using multi-agent reinforcement learning","volume":"575","author":"Vinyals","year":"2019","journal-title":"Nature"},{"key":"ref_4","unstructured":"Ye, D., Chen, G., Zhao, P., Qiu, F., Yuan, B., Zhang, W., Chen, S., Sun, M., Li, X., and Li, S. (2020). Supervised Learning Achieves Human-Level Performance in MOBA Games: A Case Study of Honor of Kings. IEEE Trans. Neural Netw. Learn. Syst., 1\u201311."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"529","DOI":"10.1038\/nature14236","article-title":"Human-level control through deep reinforcement learning","volume":"518","author":"Mnih","year":"2015","journal-title":"Nature"},{"key":"ref_6","unstructured":"Mnih, V., Badia, A.P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., Silver, D., and Kavukcuoglu, K. (2016, January 20\u201322). Asynchronous methods for deep reinforcement learning. Proceedings of the International Conference on Machine Learning, New York, NY, USA."},{"key":"ref_7","unstructured":"Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., and Wierstra, D. (2015). 
Continuous Control with Deep Reinforcement Learning. arXiv."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Gu, S., Holly, E., Lillicrap, T., and Levine, S. (2016). Deep Reinforcement Learning for Robotic Manipulation with Asynchronous Off-Policy Updates. arXiv.","DOI":"10.1109\/ICRA.2017.7989385"},{"key":"ref_9","unstructured":"Lowe, R., Wu, Y., Tamar, A., Harb, J., Abbeel, O.P., and Mordatch, I. (2017). Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments. arXiv."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Foerster, J.N., Farquhar, G., Afouras, T., Nardelli, N., and Whiteson, S. (2018, January 2\u20137). Counterfactual multi-agent policy gradients. Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, LA, USA.","DOI":"10.1609\/aaai.v32i1.11794"},{"key":"ref_11","unstructured":"Yang, Y., Luo, R., Li, M., Zhou, M., Zhang, W., and Wang, J. (2018, January 10\u201315). Mean Field Multi-Agent Reinforcement Learning. Proceedings of the International Conference on Machine Learning, Stockholm, Sweden."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Ryu, H., Shin, H., and Park, J. (2020, January 7\u201312). Multi-agent actor-critic with hierarchical graph attention network. Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA.","DOI":"10.1609\/aaai.v34i05.6214"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Wu, S., Pu, Z., Yi, J., and Wang, H. (2020, January 18\u201322). Multi-agent Cooperation and Competition with Two-Level Attention Network. Proceedings of the International Conference on Neural Information Processing, Bangkok, Thailand.","DOI":"10.1007\/978-3-030-63833-7_44"},{"key":"ref_14","unstructured":"Sukhbaatar, S., Szlam, A., and Fergus, R. (2016, January 5\u201310). Learning multiagent communication with backpropagation. Proceedings of the Advances in Neural Information Processing Systems, Barcelona, Spain."},{"key":"ref_15","unstructured":"Agarwal, A., Kumar, S., Sycara, K., and Lewis, M. (2020, January 9\u201313). Learning Transferable Cooperative Behavior in Multi-Agent Teams. Proceedings of the 19th International Conference on Autonomous Agents and MultiAgent Systems, Auckland, New Zealand."},{"key":"ref_16","unstructured":"Jiang, J., and Lu, Z. (2018, January 3\u20138). Learning attentional communication for multi-agent cooperation. Proceedings of the Advances in Neural Information Processing Systems, Montr\u00e9al, QC, Canada."},{"key":"ref_17","unstructured":"Iqbal, S., and Sha, F. (2019, January 10\u201315). Actor-Attention-Critic for Multi-Agent Reinforcement Learning. Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA."},{"key":"ref_18","unstructured":"Das, A., Gervet, T., Romoff, J., Batra, D., Parikh, D., Rabbat, M., and Pineau, J. (2019, January 10\u201315). TarMAC: Targeted Multi-Agent Communication. Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA."},{"key":"ref_19","unstructured":"Peng, P., Wen, Y., Yang, Y., Yuan, Q., Tang, Z., Long, H., and Wang, J. (2017). Multiagent bidirectionally-coordinated nets: Emergence of human-level coordination in learning to play starcraft combat games. arXiv."},{"key":"ref_20","unstructured":"Kong, X., Xin, B., Liu, F., and Wang, Y. (2017). Revisiting the Master-Slave Architecture in Multi-Agent Deep Reinforcement Learning. arXiv."},{"key":"ref_21","unstructured":"Kingma, D.P., and Welling, M. (2014). 
Auto-Encoding Variational Bayes. arXiv."},{"key":"ref_22","unstructured":"Kullback, S. (1997). Information Theory and Statistics, Courier Corporation."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Littman, M.L. (1994). Markov games as a framework for multi-agent reinforcement learning. Machine Learning Proceedings 1994, Elsevier.","DOI":"10.1016\/B978-1-55860-335-6.50027-1"},{"key":"ref_24","unstructured":"Sutton, R.S., and Barto, A.G. (2018). Reinforcement Learning: An Introduction, MIT Press."},{"key":"ref_25","unstructured":"Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and Klimov, O. (2017). Proximal policy optimization algorithms. arXiv."},{"key":"ref_26","unstructured":"Mnih, V., Heess, N., Graves, A., and Kavukcuoglu, K. (2014, January 8\u201313). Recurrent models of visual attention. Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada."},{"key":"ref_27","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, \u0141., and Polosukhin, I. (2017, January 4\u20139). Attention is all you need. Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA."},{"key":"ref_28","unstructured":"Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A., Salakhudinov, R., Zemel, R., and Bengio, Y. (2015, January 7\u20139). Show, attend and tell: Neural image caption generation with visual attention. Proceedings of the International Conference on Machine Learning, Lille, France."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Liu, Y., Wang, W., Hu, Y., Hao, J., Chen, X., and Gao, Y. (2020, January 7\u201312). Multi-agent game abstraction via graph attention neural network. Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA.","DOI":"10.1609\/aaai.v34i05.6211"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Malysheva, A., Kudenko, D., and Shpilman, A. (2019, January 21\u201325). MAGNet: Multi-agent Graph Network for Deep Multi-agent Reinforcement Learning. Proceedings of the 2019 XVI International Symposium \u201cProblems of Redundancy in Information and Control Systems\u201d (REDUNDANCY), Moscow, Russia.","DOI":"10.1109\/REDUNDANCY48165.2019.9003345"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"859","DOI":"10.1080\/01621459.2017.1285773","article-title":"Variational Inference: A Review for Statisticians","volume":"112","author":"Blei","year":"2017","journal-title":"J. Am. Stat. Assoc."},{"key":"ref_32","unstructured":"Jang, E., Gu, S., and Poole, B. (2016). Categorical Reparameterization with Gumbel-softmax. arXiv."},{"key":"ref_33","unstructured":"Veli\u010dkovi\u0107, P., Cucurull, G., Casanova, A., Romero, A., Li\u00f2, P., and Bengio, Y. (May, January 30). Graph Attention Networks. 
Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada."}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/14\/3\/98\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T05:38:52Z","timestamp":1760161132000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/14\/3\/98"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,3,21]]},"references-count":33,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2021,3]]}},"alternative-id":["a14030098"],"URL":"https:\/\/doi.org\/10.3390\/a14030098","relation":{},"ISSN":["1999-4893"],"issn-type":[{"type":"electronic","value":"1999-4893"}],"subject":[],"published":{"date-parts":[[2021,3,21]]}}}
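
Note on the abstract in the record above: it describes CDN (a variational-auto-encoder-based module that lets each agent pick communication targets from its "environment cognition difference") and SCN (soft-attention handling of agent interactions), but it gives no equations or code. The sketch below is only an illustrative reading, not the authors' implementation: it assumes the cognition difference between two agents is the KL divergence between their diagonal-Gaussian VAE latents, that agents communicate when this difference exceeds a threshold, and that neighbor messages are pooled with scaled dot-product soft attention. The threshold, dimensions, and function names are invented for the example.

# Illustrative sketch only; NOT the paper's released code. Assumptions are noted above.
import numpy as np

def diag_gaussian_kl(mu_p, logvar_p, mu_q, logvar_q):
    """KL( N(mu_p, var_p) || N(mu_q, var_q) ) for diagonal Gaussians (closed form)."""
    var_p, var_q = np.exp(logvar_p), np.exp(logvar_q)
    return 0.5 * np.sum(logvar_q - logvar_p + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)

def select_neighbors(mus, logvars, agent_i, threshold=1.0):
    """Pick communication targets whose cognition difference from agent_i exceeds a threshold (assumed rule)."""
    neighbors = []
    for j in range(len(mus)):
        if j == agent_i:
            continue
        diff = diag_gaussian_kl(mus[agent_i], logvars[agent_i], mus[j], logvars[j])
        if diff > threshold:
            neighbors.append(j)
    return neighbors

def soft_attention_aggregate(query, keys, values):
    """Scaled dot-product soft attention: weight neighbor messages by relevance to the query."""
    scores = keys @ query / np.sqrt(query.shape[-1])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ values  # weighted sum of neighbor messages

# Tiny usage example with random data: 4 agents, 8-dimensional latents and messages.
rng = np.random.default_rng(0)
n_agents, latent_dim = 4, 8
mus = rng.normal(size=(n_agents, latent_dim))
logvars = rng.normal(scale=0.1, size=(n_agents, latent_dim))
msgs = rng.normal(size=(n_agents, latent_dim))

i = 0
nbrs = select_neighbors(mus, logvars, i, threshold=1.0)
if nbrs:
    pooled = soft_attention_aggregate(msgs[i], msgs[nbrs], msgs[nbrs])
    print("agent 0 communicates with", nbrs, "-> pooled message shape", pooled.shape)
else:
    print("agent 0 communicates with no one this step")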