{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,10]],"date-time":"2025-12-10T08:19:05Z","timestamp":1765354745425,"version":"3.37.3"},"reference-count":41,"publisher":"Springer Science and Business Media LLC","issue":"7","license":[{"start":{"date-parts":[[2021,1,4]],"date-time":"2021-01-04T00:00:00Z","timestamp":1609718400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/www.springer.com\/tdm"},{"start":{"date-parts":[[2021,1,4]],"date-time":"2021-01-04T00:00:00Z","timestamp":1609718400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.springer.com\/tdm"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Appl Intell"],"published-print":{"date-parts":[[2021,7]]},"DOI":"10.1007\/s10489-020-02034-2","type":"journal-article","created":{"date-parts":[[2021,1,4]],"date-time":"2021-01-04T04:15:53Z","timestamp":1609733753000},"page":"4434-4452","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":14,"title":["Bottom-up multi-agent reinforcement learning by reward shaping for cooperative-competitive tasks"],"prefix":"10.1007","volume":"51","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5774-767X","authenticated-orcid":false,"given":"Takumi","family":"Aotani","sequence":"first","affiliation":[]},{"given":"Taisuke","family":"Kobayashi","sequence":"additional","affiliation":[]},{"given":"Kenji","family":"Sugimoto","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2021,1,4]]},"reference":[{"key":"2034_CR1","doi-asserted-by":"crossref","unstructured":"Darmanin RN, Bugeja MK (2017) A review on multi-robot systems categorised by application domain. In: Mediterranean Conference on Control and Automation, pp 701\u2013706, IEEE","DOI":"10.1109\/MED.2017.7984200"},{"issue":"4","key":"2034_CR2","doi-asserted-by":"publisher","first-page":"742","DOI":"10.1109\/TRO.2010.2052169","volume":"26","author":"H Bai","year":"2010","unstructured":"Bai H, Wen JT (2010) Cooperative load transport: A formation-control perspective. IEEE Trans Robot 26(4):742\u2013750","journal-title":"IEEE Trans Robot"},{"issue":"6","key":"2034_CR3","doi-asserted-by":"publisher","first-page":"492","DOI":"10.1016\/j.isprsjprs.2010.09.003","volume":"65","author":"R Sandau","year":"2010","unstructured":"Sandau R, Brie\u00df K., D\u2019Errico M (2010) Small satellites for global coverage: Potential and limits. ISPRS Journal of photogrammetry and Remote Sensing 65(6):492\u2013504","journal-title":"ISPRS Journal of photogrammetry and Remote Sensing"},{"issue":"4","key":"2034_CR4","doi-asserted-by":"publisher","first-page":"277","DOI":"10.1007\/s10514-012-9320-1","volume":"34","author":"KM Wurm","year":"2013","unstructured":"Wurm KM, Dornhege C, Nebel B, Burgard W, Stachniss C (2013) Coordinating heterogeneous teams of robots using temporal symbolic planning. Auton Robot 34(4):277\u2013294","journal-title":"Auton Robot"},{"key":"2034_CR5","doi-asserted-by":"crossref","unstructured":"Littman ML (1994) Markov games as a framework for multi-agent reinforcement learning. In: Machine learning proceedings, pp 157\u2013163, Elsevier","DOI":"10.1016\/B978-1-55860-335-6.50027-1"},{"key":"2034_CR6","volume-title":"Reinforcement learning: An introduction","author":"RS Sutton","year":"2018","unstructured":"Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. 
MIT press, Cambridge"},{"key":"2034_CR7","unstructured":"Sen S, Sekaran M, Hale J, et al. (1994) Learning to coordinate without sharing information. In: AAAI Conference on Artificial Intelligence, 94, pp 426\u2013431"},{"issue":"1","key":"2034_CR8","doi-asserted-by":"publisher","first-page":"73","DOI":"10.1023\/A:1008819414322","volume":"4","author":"MJ Matari\u0107","year":"1997","unstructured":"Matari\u0107 M.J. (1997) Reinforcement learning in the multi-robot domain. Auton Robot 4(1):73\u201383","journal-title":"Auton Robot"},{"issue":"7540","key":"2034_CR9","doi-asserted-by":"publisher","first-page":"529","DOI":"10.1038\/nature14236","volume":"518","author":"V Mnih","year":"2015","unstructured":"Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, Riedmiller M, Fidjeland AK, Ostrovski G, et al. (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529","journal-title":"Nature"},{"key":"2034_CR10","unstructured":"Schulman J, Levine S, Abbeel P, Jordan M, Moritz P (2015) Trust region policy optimization. In: International Conference on Machine Learning, pp 1889\u20131897"},{"key":"2034_CR11","unstructured":"Foerster J, Assael IA, de Freitas N, Whiteson S (2016) Learning to communicate with deep multi-agent reinforcement learning. In: Advances in Neural Information Processing Systems, pp 2137\u20132145"},{"key":"2034_CR12","doi-asserted-by":"crossref","unstructured":"Gupta JK, Egorov M, Kochenderfer M (2017) Cooperative multi-agent control using deep reinforcement learning. In: International Conference on Autonomous Agents and Multiagent Systems, pp 66\u201383, Springer","DOI":"10.1007\/978-3-319-71682-4_5"},{"key":"2034_CR13","doi-asserted-by":"publisher","first-page":"855","DOI":"10.1016\/j.procs.2016.05.376","volume":"80","author":"DM Guisi","year":"2016","unstructured":"Guisi DM, Ribeiro R, Teixeira M, Borges AP, Enembreck F (2016) Reinforcement learning with multiple shared rewards. Procedia Computer Science 80:855\u2013864","journal-title":"Procedia Computer Science"},{"key":"2034_CR14","unstructured":"Raileanu R, Denton E, Szlam A, Fergus R (2018) Modeling others using oneself in multi-agent reinforcement learning. In: International Conference on Machine Learning, pp 4254\u20134263"},{"key":"2034_CR15","unstructured":"Lowe R, Wu Y, Tamar A, Harb J, Abbeel OpenAI Pieter, Mordatch I (2017) Multi-agent actor-critic for mixed cooperative-competitive environments. In: Advances in Neural Information Processing Systems, pp 6379\u20136390"},{"key":"2034_CR16","unstructured":"Devlin S, Yliniemi L, Kudenko D, Tumer K (2014) Potential-based difference rewards for multiagent reinforcement learning. In: Proceedings of the 2014 international conference on Autonomous agents and multi-agent systems, pp 165\u2013172"},{"key":"2034_CR17","unstructured":"Devlin S, Kudenko D (2011) Theoretical considerations of potential-based reward shaping for multi-agent systems. In: The 10th International Conference on Autonomous Agents and Multiagent Systems-Volume 1, pp 225\u2013232, International Foundation for Autonomous Agents and Multiagent Systems"},{"issue":"2","key":"2034_CR18","doi-asserted-by":"publisher","first-page":"320","DOI":"10.1007\/s10458-008-9046-9","volume":"17","author":"AK Agogino","year":"2008","unstructured":"Agogino AK, Tumer K (2008) Analyzing and visualizing multiagent rewards in dynamic and stochastic domains. 
Auton Agent Multi-Agent Syst 17(2):320\u2013338","journal-title":"Auton Agent Multi-Agent Syst"},{"key":"2034_CR19","doi-asserted-by":"crossref","unstructured":"Aotani T, Kobayashi T, Sugimoto K (2018) Bottom-up multi-agent reinforcement learning for selective cooperation. In: 2018 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp 3590\u20133595, IEEE","DOI":"10.1109\/SMC.2018.00607"},{"issue":"4","key":"2034_CR20","doi-asserted-by":"publisher","first-page":"e0172395","DOI":"10.1371\/journal.pone.0172395","volume":"12","author":"A Tampuu","year":"2017","unstructured":"Tampuu A, Matiisen T, Kodelja D, Kuzovkin I, Korjus K, Aru J, Aru J, Vicente R (2017) Multiagent cooperation and competition with deep reinforcement learning. PloS one 12(4):e0172395","journal-title":"PloS one"},{"key":"2034_CR21","unstructured":"Foerster JN, Song F, Hughes E, Burch N, Dunning I, Whiteson S, Botvinick M, Bowling M (2018) Bayesian action decoder for deep multi-agent reinforcement learning. arXiv preprint arXiv:1811.01458"},{"key":"2034_CR22","doi-asserted-by":"crossref","unstructured":"Zhang K, Yang Z, Liu H, Zhang T, Ba\u015far T (2018) Fully decentralized multi-agent reinforcement learning with networked agents. arXiv preprint arXiv:1802.08757","DOI":"10.1109\/CDC.2018.8619581"},{"key":"2034_CR23","unstructured":"Iqbal S, Sha F (2018) Actor-attention-critic for multi-agent reinforcement learning. arXiv preprint arXiv:1810.02912"},{"key":"2034_CR24","unstructured":"Lanctot M, Zambaldi V, Gruslys A, Lazaridou A, Tuyls K, P\u00e9rolat J, Silver D, Graepel T (2017) A unified game-theoretic approach to multiagent reinforcement learning. In: Advances in Neural Information Processing Systems, pp 4190\u20134203"},{"key":"2034_CR25","unstructured":"Omidshafiei S, Pazis J, Amato C, How JP, Vian J (2017) Deep decentralized multi-task multi-agent reinforcement learning under partial observability. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp 2681\u20132690, JMLR. org"},{"key":"2034_CR26","unstructured":"Da Silva FL, Glatt R, Costa AHR (2017) Simultaneously learning and advising in multiagent reinforcement learning. In: International Conference on Autonomous Agents and MultiAgent Systems, pp 1100\u20131108"},{"key":"2034_CR27","unstructured":"Yang Y, Luo R, Li M, Zhou M, Zhang W, Wang J (2018) Mean field multi-agent reinforcement learning. arXiv preprint arXiv:1802.05438"},{"key":"2034_CR28","doi-asserted-by":"crossref","unstructured":"Bu\u015foniu L, Babu\u0161ka R, De Schutter B (2010) Multi-agent reinforcement learning: An overview. In: Innovations in multi-agent systems and applications-1, pp 183\u2013221, Springer","DOI":"10.1007\/978-3-642-14435-6_7"},{"key":"2034_CR29","unstructured":"Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O (2017) Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347"},{"issue":"3-4","key":"2034_CR30","doi-asserted-by":"publisher","first-page":"229","DOI":"10.1007\/BF00992696","volume":"8","author":"RJ Williams","year":"1992","unstructured":"Williams RJ (1992) Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine learning 8(3-4):229\u2013256","journal-title":"Machine learning"},{"issue":"145","key":"2034_CR31","first-page":"1","volume":"17","author":"H Van Seijen","year":"2016","unstructured":"Van Seijen H, Mahmood AR, Pilarski PM, Machado MC, Sutton RS (2016) True online temporal-difference learning. 
J Mach Learn Res 17(145):1\u201340","journal-title":"J Mach Learn Res"},{"issue":"12","key":"2034_CR32","doi-asserted-by":"publisher","first-page":"4335","DOI":"10.1007\/s10489-019-01510-8","volume":"49","author":"T Kobayashi","year":"2019","unstructured":"Kobayashi T (2019) Student-t policy in reinforcement learning to acquire global optimum of robot control. Appl Intell 49(12):4335\u20134347","journal-title":"Appl Intell"},{"key":"2034_CR33","unstructured":"Achiam J, Sastry S (2017) Surprise-based intrinsic motivation for deep reinforcement learning. arXiv preprint arXiv:1703.01732"},{"key":"2034_CR34","volume-title":"Seminumerical algorithms, vol. 2: The art of the computer programming","author":"DE Knuth","year":"1981","unstructured":"Knuth DE (1981) Seminumerical algorithms, vol. 2: The art of the computer programming. Addison-Wesley, Boston"},{"issue":"3","key":"2034_CR35","doi-asserted-by":"publisher","first-page":"177","DOI":"10.1016\/0197-2456(86)90046-2","volume":"7","author":"R DerSimonian","year":"1986","unstructured":"DerSimonian R, Laird N (1986) Meta-analysis in clinical trials. Controlled clinical trials 7 (3):177\u2013188","journal-title":"Controlled clinical trials"},{"key":"2034_CR36","unstructured":"Brockman G, Cheung V, Pettersson L, Schneider J, Schulman J, Tang J, Zaremba W (2016) Openai gym. arXiv preprint arXiv:1606.01540"},{"issue":"3","key":"2034_CR37","doi-asserted-by":"publisher","first-page":"127","DOI":"10.1016\/j.cosrev.2009.03.005","volume":"3","author":"M Luko\u0161evi\u010dius","year":"2009","unstructured":"Luko\u0161evi\u010dius M, Jaeger H (2009) Reservoir computing approaches to recurrent neural network training. Computer Science Review 3(3):127\u2013149","journal-title":"Computer Science Review"},{"key":"2034_CR38","doi-asserted-by":"crossref","unstructured":"Mordatch I, Abbeel P (2018) Emergence of grounded compositional language in multi-agent populations. In: Thirty-Second AAAI Conference on Artificial Intelligence","DOI":"10.1609\/aaai.v32i1.11492"},{"key":"2034_CR39","unstructured":"Khan A, Zhang C, Lee DD, Kumar V, Ribeiro A (2018) Scalable centralized deep multi-agent reinforcement learning via policy gradients. arXiv preprint arXiv:1805.08776"},{"key":"2034_CR40","doi-asserted-by":"crossref","unstructured":"Malysheva A, Kudenko D, Shpilman A (2019) Magnet: Multi-agent graph network for deep multi-agent reinforcement learning. In: 2019 XVI International Symposium\u201d Problems of Redundancy in Information and Control Systems\u201d (REDUNDANCY), pp 171\u2013176, IEEE","DOI":"10.1109\/REDUNDANCY48165.2019.9003345"},{"key":"2034_CR41","unstructured":"Quigley M, Conley K, Gerkey B, Faust J, Foote T, Leibs J, Wheeler R, Ng AY (2009) Ros: an open-source robot operating system. 
In: ICRA workshop on open source software, 3, p 5, Kobe, Japan"}],"container-title":["Applied Intelligence"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10489-020-02034-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10489-020-02034-2\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10489-020-02034-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,10]],"date-time":"2022-12-10T09:22:02Z","timestamp":1670664122000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10489-020-02034-2"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,1,4]]},"references-count":41,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2021,7]]}},"alternative-id":["2034"],"URL":"https:\/\/doi.org\/10.1007\/s10489-020-02034-2","relation":{},"ISSN":["0924-669X","1573-7497"],"issn-type":[{"type":"print","value":"0924-669X"},{"type":"electronic","value":"1573-7497"}],"subject":[],"published":{"date-parts":[[2021,1,4]]},"assertion":[{"value":"21 October 2020","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"4 January 2021","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}
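
The record above is the standard Crossref REST API envelope for a work: the payload sits under the "message" key, scalar fields such as "DOI" and "volume" are plain values, "title" and "container-title" are single-element lists, and each "reference" entry carries either structured fields or a free-text "unstructured" string. The same object should also be retrievable live from https://api.crossref.org/works/10.1007/s10489-020-02034-2. What follows is a minimal Python sketch for pulling the commonly used bibliographic fields out of such a record; the filename record.json is a hypothetical stand-in for wherever the response was saved, and only keys that actually appear in the record above are accessed.

import json

# Load the Crossref response shown above (filename is hypothetical);
# the work object lives under the "message" key of the envelope.
with open("record.json") as f:
    work = json.load(f)["message"]

doi = work["DOI"]
title = work["title"][0]               # "title" is a single-element list
journal = work["container-title"][0]   # likewise "container-title"

# Author entries carry "given"/"family"; ORCID appears only on some entries,
# so it is not accessed here.
authors = ", ".join(f'{a["given"]} {a["family"]}' for a in work["author"])

# Reference entries mix structured fields with a free-text "unstructured"
# string; fall back from one to the other ("key" is always present).
refs = [r.get("unstructured") or r.get("DOI", r["key"]) for r in work["reference"]]

year = work["published-print"]["date-parts"][0][0]
print(f"{authors}: {title}. {journal} {work['volume']}({work['issue']}):{work['page']}, {year}")
print(f"https://doi.org/{doi} | cited by {work['is-referenced-by-count']}; {len(refs)} references")

Run against the record above, this should print the article as "Takumi Aotani, Taisuke Kobayashi, Kenji Sugimoto: Bottom-up multi-agent reinforcement learning by reward shaping for cooperative-competitive tasks. Applied Intelligence 51(7):4434-4452, 2021", cited by 14, with 41 references.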