{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,8]],"date-time":"2026-05-08T15:54:40Z","timestamp":1778255680795,"version":"3.51.4"},"reference-count":192,"publisher":"Springer Science and Business Media LLC","issue":"11","license":[{"start":{"date-parts":[[2025,8,27]],"date-time":"2025-08-27T00:00:00Z","timestamp":1756252800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,8,27]],"date-time":"2025-08-27T00:00:00Z","timestamp":1756252800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001787","name":"University of South Australia","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100001787","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Artif Intell Rev"],"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Multi-Agent Reinforcement Learning (MARL) has become a powerful framework for numerous real-world applications, modeling distributed decision-making and learning from interactions with complex environments. Resource Allocation Optimization (RAO) benefits significantly from MARL\u2019s ability to tackle dynamic and decentralized contexts. MARL-based approaches are increasingly applied to RAO challenges across sectors playing a pivotal role in industry 4.0 developments. This survey provides a comprehensive review of recent MARL algorithms for RAO, encompassing core concepts, classifications, design steps and benchmarks. 
By outlining the current research landscape and identifying primary challenges and future directions, this survey aims to support researchers and practitioners in leveraging MARL\u2019s potential to advance resource allocation solutions.<\/jats:p>","DOI":"10.1007\/s10462-025-11340-5","type":"journal-article","created":{"date-parts":[[2025,8,27]],"date-time":"2025-08-27T05:33:45Z","timestamp":1756272825000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":35,"title":["Multi-agent reinforcement learning for resources allocation optimization: a survey"],"prefix":"10.1007","volume":"58","author":[{"given":"Mohamad A.","family":"Hady","sequence":"first","affiliation":[]},{"given":"Siyi","family":"Hu","sequence":"additional","affiliation":[]},{"given":"Mahardhika","family":"Pratama","sequence":"additional","affiliation":[]},{"given":"Zehong","family":"Cao","sequence":"additional","affiliation":[]},{"given":"Ryszard","family":"Kowalczyk","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,8,27]]},"reference":[{"key":"11340_CR1","doi-asserted-by":"publisher","DOI":"10.1016\/j.apenergy.2023.122029","volume":"353","author":"MS Abid","year":"2024","unstructured":"Abid MS, Apon HJ, Hossain S, Ahmed A, Ahshan R, Lipu MH (2024) A novel multi-objective optimization based multi-agent deep reinforcement learning approach for microgrid resources planning. Appl Energy 353:122029","journal-title":"Appl Energy"},{"issue":"6","key":"11340_CR2","first-page":"101420","volume":"35","author":"M Ahmed","year":"2023","unstructured":"Ahmed M, Liu J, Mirza MA, Khan WU, Al-Wesabi FN (2023) Marl based resource allocation scheme leveraging vehicular cloudlet in automotive-industry 5.0. 
Journal of King Saud University 35(6):101420","journal-title":"Journal of King Saud University"},{"issue":"1","key":"11340_CR3","doi-asserted-by":"publisher","first-page":"659","DOI":"10.1109\/TII.2020.2977104","volume":"17","author":"M Ahrarinouri","year":"2020","unstructured":"Ahrarinouri M, Rastegar M, Seifi AR (2020) Multiagent reinforcement learning for energy management in residential buildings. IEEE Trans Ind Inf 17(1):659\u2013666","journal-title":"IEEE Trans Ind Inf"},{"issue":"1","key":"11340_CR4","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/2897165","volume":"49","author":"MR Alam","year":"2016","unstructured":"Alam MR, St-Hilaire M, Kunz T (2016) Computational methods for residential energy cost optimization in smart grids: a survey. ACM Comput Surv 49(1):1\u201334","journal-title":"ACM Comput Surv"},{"key":"11340_CR5","doi-asserted-by":"publisher","first-page":"83","DOI":"10.1023\/A:1010949931021","volume":"102","author":"J Alcaraz","year":"2001","unstructured":"Alcaraz J, Maroto C (2001) A robust genetic algorithm for resource allocation in project scheduling. Ann Oper Res 102:83\u2013109","journal-title":"Ann Oper Res"},{"issue":"2","key":"11340_CR6","doi-asserted-by":"publisher","first-page":"1287","DOI":"10.1109\/TCCN.2022.3155727","volume":"8","author":"MS Allahham","year":"2022","unstructured":"Allahham MS, Abdellatif AA, Mhaisen N, Mohamed A, Erbad A, Guizani M (2022) Multi-agent reinforcement learning for network selection and resource allocation in heterogeneous multi-rat networks. IEEE Trans Cogn Commun Netw 8(2):1287\u20131300","journal-title":"IEEE Trans Cogn Commun Netw"},{"issue":"7","key":"11340_CR7","doi-asserted-by":"publisher","first-page":"7033","DOI":"10.1109\/TVT.2022.3169907","volume":"71","author":"GP Antonio","year":"2022","unstructured":"Antonio GP, Maria-Dolores C (2022) Multi-agent deep reinforcement learning to manage connected autonomous vehicles at tomorrow\u2019s intersections. 
IEEE Trans Veh Technol 71(7):7033\u20137043","journal-title":"IEEE Trans Veh Technol"},{"issue":"10","key":"11340_CR8","doi-asserted-by":"publisher","first-page":"1259","DOI":"10.1016\/j.jpdc.2006.06.006","volume":"66","author":"G Attiya","year":"2006","unstructured":"Attiya G, Hamam Y (2006) Task allocation for maximizing reliability of distributed systems: a simulated annealing approach. J Parallel Distrib Comput 66(10):1259\u20131266","journal-title":"J Parallel Distrib Comput"},{"issue":"5","key":"11340_CR9","doi-asserted-by":"publisher","first-page":"3774","DOI":"10.1109\/JIOT.2020.3024223","volume":"8","author":"J Bi","year":"2020","unstructured":"Bi J, Yuan H, Duanmu S, Zhou M, Abusorrah A (2020) Energy-optimized partial computation offloading in mobile-edge computing with genetic simulated-annealing-based particle swarm optimization. IEEE Internet Things J 8(5):3774\u20133785","journal-title":"IEEE Internet Things J"},{"issue":"21","key":"11340_CR10","doi-asserted-by":"publisher","first-page":"8099","DOI":"10.3390\/s22218099","volume":"22","author":"SS Binyamin","year":"2022","unstructured":"Binyamin SS, Ben Slama S (2022) Multi-agent systems for resource allocation and scheduling in a smart grid. Sensors 22(21):8099","journal-title":"Sensors"},{"key":"11340_CR11","doi-asserted-by":"crossref","unstructured":"Bratton D, Kennedy J (2007) Defining a standard for particle swarm optimization. In: 2007 IEEE swarm intelligence symposium. IEEE, pp 120\u2013127","DOI":"10.1109\/SIS.2007.368035"},{"issue":"2","key":"11340_CR12","doi-asserted-by":"publisher","first-page":"156","DOI":"10.1109\/TSMCC.2007.913919","volume":"38","author":"L Bu","year":"2008","unstructured":"Bu L, Babu R, De Schutter B et al (2008) A comprehensive survey of multiagent reinforcement learning. 
IEEE Trans Syst Man Cybern C Appl Rev 38(2):156\u2013172","journal-title":"IEEE Trans Syst Man Cybern C Appl Rev"},{"issue":"7","key":"11340_CR13","doi-asserted-by":"publisher","first-page":"6201","DOI":"10.1109\/JIOT.2020.2968951","volume":"7","author":"Z Cao","year":"2020","unstructured":"Cao Z, Zhou P, Li R, Huang S, Wu D (2020) Multiagent deep reinforcement learning for joint multichannel access and task offloading of mobile-edge computing in industry 4.0. IEEE Internet Things J 7(7):6201\u20136213","journal-title":"IEEE Internet Things J"},{"issue":"2\u20133","key":"11340_CR14","doi-asserted-by":"publisher","first-page":"179","DOI":"10.1016\/S0921-8890(00)00088-9","volume":"33","author":"A Cardon","year":"2000","unstructured":"Cardon A, Galinho T, Vacher JP (2000) Genetic algorithms using multi-objectives in a multi-agent system. Robot Auton Syst 33(2\u20133):179\u2013190","journal-title":"Robot Auton Syst"},{"key":"11340_CR15","doi-asserted-by":"crossref","unstructured":"Cesana M, Malanchini I, Capone A (2008) Modelling network selection and resource allocation in wireless access networks with non-cooperative games. In: 2008 5th IEEE international conference on mobile ad hoc and sensor systems. IEEE, pp 404\u2013409","DOI":"10.1109\/MAHSS.2008.4660055"},{"key":"11340_CR16","doi-asserted-by":"publisher","DOI":"10.1016\/j.apenergy.2022.118825","volume":"314","author":"F Charbonnier","year":"2022","unstructured":"Charbonnier F, Morstyn T, McCulloch MD (2022) Scalable multi-agent reinforcement learning for distributed control of residential energy flexibility. Appl Energy 314:118825","journal-title":"Appl Energy"},{"issue":"12","key":"11340_CR17","doi-asserted-by":"publisher","first-page":"3579","DOI":"10.1109\/JSAC.2021.3118346","volume":"39","author":"M Chen","year":"2021","unstructured":"Chen M, G\u00fcnd\u00fcz D, Huang K, Saad W, Bennis M, Feljan AV, Poor HV (2021) Distributed learning in wireless networks: recent progress and future challenges. 
IEEE J Sel Areas Commun 39(12):3579\u20133605","journal-title":"IEEE J Sel Areas Commun"},{"issue":"7","key":"11340_CR18","doi-asserted-by":"publisher","first-page":"838","DOI":"10.1111\/mice.12702","volume":"36","author":"S Chen","year":"2021","unstructured":"Chen S, Dong J, Ha P, Li Y, Labi S (2021) Graph neural network and reinforcement learning for multi-agent cooperative control of connected autonomous vehicles. Comput-Aided Civil Infrastruct Eng 36(7):838\u2013857","journal-title":"Comput-Aided Civil Infrastruct Eng"},{"issue":"4","key":"11340_CR19","doi-asserted-by":"publisher","first-page":"2656","DOI":"10.1109\/TSG.2022.3228636","volume":"14","author":"P Chen","year":"2022","unstructured":"Chen P, Liu S, Wang X, Kamwa I (2022) Physics-shielded multi-agent deep reinforcement learning for safe active voltage control with photovoltaic\/battery energy storage systems. IEEE Trans Smart Grid 14(4):2656\u20132667","journal-title":"IEEE Trans Smart Grid"},{"issue":"11","key":"11340_CR20","doi-asserted-by":"publisher","first-page":"11623","DOI":"10.1109\/TITS.2023.3285442","volume":"24","author":"D Chen","year":"2023","unstructured":"Chen D, Hajidavalloo MR, Li Z, Chen K, Wang Y, Jiang L, Wang Y (2023) Deep multi-agent reinforcement learning for highway on-ramp merging in mixed traffic. IEEE Trans Intell Transp Syst 24(11):11623\u201311638","journal-title":"IEEE Trans Intell Transp Syst"},{"key":"11340_CR21","doi-asserted-by":"publisher","DOI":"10.1109\/TNSE.2024.3375374","author":"Y Chen","year":"2024","unstructured":"Chen Y, Sun Y, Yu H, Taleb T (2024) Joint task and computing resource allocation in distributed edge computing systems via multi-agent deep reinforcement learning. IEEE Trans Netw Sci Eng. 
https:\/\/doi.org\/10.1109\/TNSE.2024.3375374","journal-title":"IEEE Trans Netw Sci Eng"},{"key":"11340_CR22","doi-asserted-by":"crossref","unstructured":"Costa B, Carvalho L, Rosa M, Araujo A, et\u00a0al (2022) Computational resource allocation in fog computing: a comprehensive survey. ACM computing surveys","DOI":"10.1145\/3486221"},{"key":"11340_CR23","doi-asserted-by":"publisher","DOI":"10.5281\/zenodo.15164624","author":"Y Deng","year":"2025","unstructured":"Deng Y (2025) A reinforcement learning approach to traffic scheduling in complex data center topologies. J Comput Technol Softw. https:\/\/doi.org\/10.5281\/zenodo.15164624","journal-title":"J Comput Technol Softw"},{"issue":"3","key":"11340_CR24","doi-asserted-by":"publisher","first-page":"1900","DOI":"10.1109\/TWC.2022.3207918","volume":"22","author":"X Du","year":"2022","unstructured":"Du X, Wang T, Feng Q, Ye C, Tao T, Wang L, Shi Y, Chen M (2022) Multi-agent reinforcement learning for dynamic resource management in 6G in-X subnetworks. IEEE Trans Wireless Commun 22(3):1900\u20131914","journal-title":"IEEE Trans Wireless Commun"},{"issue":"2","key":"11340_CR25","doi-asserted-by":"publisher","first-page":"134","DOI":"10.3934\/jdg.2024017","volume":"12","author":"S Erg\u00fcn","year":"2025","unstructured":"Erg\u00fcn S (2025) Resource allocation optimization for effective vehicle network communications using multi-agent deep reinforcement learning. J Dyn Games 12(2):134\u2013156","journal-title":"J Dyn Games"},{"issue":"2","key":"11340_CR26","doi-asserted-by":"publisher","first-page":"1226","DOI":"10.1109\/COMST.2021.3063822","volume":"23","author":"A Feriani","year":"2021","unstructured":"Feriani A, Hossain E (2021) Single and multi-agent deep reinforcement learning for AI-enabled wireless networks: a tutorial. 
IEEE Commun Surv Tutor 23(2):1226\u20131252","journal-title":"IEEE Commun Surv Tutor"},{"key":"11340_CR27","unstructured":"Finn C, Abbeel P, Levine S (2017) Model-agnostic meta-learning for fast adaptation of deep networks. In: International conference on machine learning. PMLR, pp 1126\u20131135"},{"issue":"3","key":"11340_CR28","doi-asserted-by":"publisher","first-page":"692","DOI":"10.1109\/TPDS.2020.3030920","volume":"32","author":"X Gao","year":"2020","unstructured":"Gao X, Liu R, Kaushik A (2020) Hierarchical multi-agent optimization for resource allocation in cloud computing. IEEE Trans Parallel Distrib Syst 32(3):692\u2013707","journal-title":"IEEE Trans Parallel Distrib Syst"},{"issue":"8","key":"11340_CR29","doi-asserted-by":"publisher","first-page":"6818","DOI":"10.1109\/JIOT.2022.3228246","volume":"10","author":"Z Gao","year":"2022","unstructured":"Gao Z, Yang L, Dai Y (2022) Fast adaptive task offloading and resource allocation via multiagent reinforcement learning in heterogeneous vehicular fog computing. IEEE Internet Things J 10(8):6818\u20136835","journal-title":"IEEE Internet Things J"},{"issue":"6","key":"11340_CR30","doi-asserted-by":"publisher","first-page":"3425","DOI":"10.1109\/TMC.2022.3141080","volume":"22","author":"Z Gao","year":"2022","unstructured":"Gao Z, Yang L, Dai Y (2022) Large-scale computation offloading using a multi-agent reinforcement learning in heterogeneous multi-access edge computing. IEEE Trans Mob Comput 22(6):3425\u20133443","journal-title":"IEEE Trans Mob Comput"},{"key":"11340_CR31","doi-asserted-by":"publisher","first-page":"2303","DOI":"10.1109\/JIOT.2023.3292387","volume":"11","author":"Z Gao","year":"2023","unstructured":"Gao Z, Yang L, Dai Y (2023) Large-scale cooperative task offloading and resource allocation in heterogeneous MEC systems via multi-agent reinforcement learning. 
IEEE Internet Things J 11:2303","journal-title":"IEEE Internet Things J"},{"key":"11340_CR32","doi-asserted-by":"crossref","unstructured":"Gao H, Wang X, Wei W, Al-Dulaimi A, Xu Y (2023) Com-ddpg: task offloading based on multiagent reinforcement learning for information-communication-enhanced mobile edge computing in the internet of vehicles. IEEE transactions on vehicular technology","DOI":"10.1109\/TVT.2023.3309321"},{"issue":"6","key":"11340_CR33","doi-asserted-by":"publisher","first-page":"801","DOI":"10.1109\/TEVC.2012.2185052","volume":"16","author":"YJ Gong","year":"2012","unstructured":"Gong YJ, Zhang J, Chung HSH, Chen WN, Zhan ZH, Li Y, Shi YH (2012) An efficient resource allocation scheme using particle swarm optimization. IEEE Trans Evol Comput 16(6):801\u2013816","journal-title":"IEEE Trans Evol Comput"},{"issue":"1","key":"11340_CR34","doi-asserted-by":"publisher","DOI":"10.1186\/1478-7547-10-9","volume":"10","author":"LA Guindo","year":"2012","unstructured":"Guindo LA, Wagner M, Baltussen R, Rindress D, van Til J, Kind P, Goetghebeur MM (2012) From efficacy to equity: literature review of decision criteria for resource allocation and healthcare decisionmaking. Cost Eff Resour Alloc 10(1):9. https:\/\/doi.org\/10.1186\/1478-7547-10-9","journal-title":"Cost Eff Resour Alloc"},{"issue":"11","key":"11340_CR35","doi-asserted-by":"publisher","first-page":"13124","DOI":"10.1109\/TVT.2020.3020400","volume":"69","author":"D Guo","year":"2020","unstructured":"Guo D, Tang L, Zhang X, Liang YC (2020) Joint optimization of handover control and power allocation based on multi-agent deep reinforcement learning. IEEE Trans Veh Technol 69(11):13124\u201313138","journal-title":"IEEE Trans Veh Technol"},{"key":"11340_CR36","doi-asserted-by":"crossref","unstructured":"Guo J, Chen Y, Hao Y, Yin Z, Yu Y, Li S (2022) Towards comprehensive testing on the robustness of cooperative multi-agent reinforcement learning. 
In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition. pp 115\u2013122","DOI":"10.1109\/CVPRW56347.2022.00022"},{"issue":"3","key":"11340_CR37","doi-asserted-by":"publisher","first-page":"627","DOI":"10.1109\/JSAC.2019.2894305","volume":"37","author":"H Halabian","year":"2019","unstructured":"Halabian H (2019) Distributed resource allocation optimization in 5G virtualized networks. IEEE J Sel Areas Commun 37(3):627\u2013642","journal-title":"IEEE J Sel Areas Commun"},{"key":"11340_CR38","first-page":"709","volume":"4","author":"EA Hansen","year":"2004","unstructured":"Hansen EA, Bernstein DS, Zilberstein S (2004) Dynamic programming for partially observable stochastic games. In AAAI 4:709\u2013715","journal-title":"In AAAI"},{"key":"11340_CR39","doi-asserted-by":"crossref","unstructured":"Hao J, Yang T, Tang H, Bai C, Liu J, Meng Z, Liu P, Wang Z (2023) Exploration in deep reinforcement learning: from single-agent to multiagent domain. IEEE transactions on neural networks and learning systems","DOI":"10.1109\/TNNLS.2023.3236361"},{"key":"11340_CR40","doi-asserted-by":"crossref","unstructured":"Heik D, Bahrpeyma F, Reichelt D (2024) Adaptive manufacturing: dynamic resource allocation using multi-agent reinforcement learning","DOI":"10.33968\/2024.52"},{"key":"11340_CR41","doi-asserted-by":"crossref","unstructured":"Herrmann A, Stephenson M, Schaub H (2023) Reinforcement learning for multi-satellite agile earth observing scheduling under various communication assumptions. In: AAS Rocky Mountain GN&C conference","DOI":"10.1109\/TAES.2023.3251307"},{"issue":"1","key":"11340_CR42","doi-asserted-by":"publisher","first-page":"114","DOI":"10.2514\/1.A35736","volume":"61","author":"A Herrmann","year":"2024","unstructured":"Herrmann A, Stephenson MA, Schaub H (2024) Single-agent reinforcement learning for scalable earth-observing satellite constellation operations. 
J Spacecr Rocket 61(1):114\u2013132","journal-title":"J Spacecr Rocket"},{"key":"11340_CR43","first-page":"32438","volume":"35","author":"Y Hong","year":"2022","unstructured":"Hong Y, Jin Y, Tang Y (2022) Rethinking individual global max in cooperative multi-agent reinforcement learning. Adv Neural Inf Process Syst 35:32438\u201332449","journal-title":"Adv Neural Inf Process Syst"},{"key":"11340_CR44","first-page":"1039","volume":"4","author":"J Hu","year":"2003","unstructured":"Hu J, Wellman MP (2003) Nash q-learning for general-sum stochastic games. J Mach Learn Res 4:1039\u20131069","journal-title":"J Mach Learn Res"},{"issue":"11","key":"11340_CR45","doi-asserted-by":"publisher","first-page":"6807","DOI":"10.1109\/TCOMM.2020.3013599","volume":"68","author":"J Hu","year":"2020","unstructured":"Hu J, Zhang H, Song L, Schober R, Poor HV (2020) Cooperative internet of UAVs: distributed trajectory design by multi-agent deep reinforcement learning. IEEE Trans Commun 68(11):6807\u20136821","journal-title":"IEEE Trans Commun"},{"key":"11340_CR46","unstructured":"Hu S, Zhu F, Chang X, Liang X (2021) Updet: universal multi-agent reinforcement learning via policy decoupling with transformers. arXiv:2101.08001"},{"key":"11340_CR47","doi-asserted-by":"publisher","DOI":"10.1109\/JIOT.2024.3432728","author":"B Hu","year":"2024","unstructured":"Hu B, Zhang W, Gao Y, Du J, Chu X (2024) Multi-agent deep deterministic policy gradient-based computation offloading and resource allocation for ISAC-aided 6G V2X networks. IEEE Internet Things J. https:\/\/doi.org\/10.1109\/JIOT.2024.3432728","journal-title":"IEEE Internet Things J"},{"key":"11340_CR48","doi-asserted-by":"publisher","DOI":"10.1016\/j.ijepes.2023.109531","volume":"155","author":"D Hu","year":"2024","unstructured":"Hu D, Li Z, Ye Z, Peng Y, Xi W, Cai T (2024) Multi-agent graph reinforcement learning for decentralized volt-var control in power distribution systems. 
Int J Electr Power Energy Syst 155:109531","journal-title":"Int J Electr Power Energy Syst"},{"key":"11340_CR49","doi-asserted-by":"publisher","DOI":"10.1109\/TASE.2025.3574280","author":"M Hua","year":"2025","unstructured":"Hua M, Qi X, Chen D, Jiang K, Liu ZE, Sun H, Zhou Q, Xu H (2025) Multi-agent reinforcement learning for connected and automated vehicles control: recent advancements and future prospects. IEEE Trans Automat Sci Eng. https:\/\/doi.org\/10.1109\/TASE.2025.3574280","journal-title":"IEEE Trans Automat Sci Eng"},{"issue":"11","key":"11340_CR50","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3570326","volume":"55","author":"B Huang","year":"2023","unstructured":"Huang B, Zhou M, Lu XS, Abusorrah A (2023) Scheduling of resource allocation systems with timed petri nets: a survey. ACM Comput Surv 55(11):1\u201327","journal-title":"ACM Comput Surv"},{"key":"11340_CR51","doi-asserted-by":"publisher","DOI":"10.1109\/JIOT.2025.3527016","author":"S Hwang","year":"2025","unstructured":"Hwang S, Lee H, Kim M, Lee I (2025) Multi-agent deep reinforcement learning for decentralized multi-UAV mobile edge computing networks. IEEE Internet Things J. https:\/\/doi.org\/10.1109\/JIOT.2025.3527016","journal-title":"IEEE Internet Things J"},{"key":"11340_CR52","volume-title":"Resource allocation problems: algorithmic approaches","author":"T Ibaraki","year":"1988","unstructured":"Ibaraki T, Katoh N (1988) Resource allocation problems: algorithmic approaches. MIT press, Cambridge"},{"issue":"1","key":"11340_CR53","doi-asserted-by":"publisher","first-page":"7","DOI":"10.1007\/s10922-022-09696-y","volume":"31","author":"V Jain","year":"2023","unstructured":"Jain V, Kumar B (2023) Qos-aware task offloading in fog environment using multi-agent deep reinforcement learning. 
J Netw Syst Manage 31(1):7","journal-title":"J Netw Syst Manage"},{"key":"11340_CR54","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2024.3360448","author":"A Jayanetti","year":"2024","unstructured":"Jayanetti A, Halgamuge S, Buyya R (2024) Multi-agent deep reinforcement learning framework for renewable energy-aware workflow scheduling on distributed cloud data centers. IEEE Trans Parallel Distrib Syst. https:\/\/doi.org\/10.1109\/TPDS.2024.3360448","journal-title":"IEEE Trans Parallel Distrib Syst"},{"key":"11340_CR55","doi-asserted-by":"publisher","DOI":"10.1016\/j.apenergy.2022.120500","volume":"332","author":"I Jendoubi","year":"2023","unstructured":"Jendoubi I, Bouffard F (2023) Multi-agent hierarchical reinforcement learning for energy management. Appl Energy 332:120500","journal-title":"Appl Energy"},{"issue":"10","key":"11340_CR56","doi-asserted-by":"publisher","first-page":"13447","DOI":"10.1109\/TVT.2023.3275546","volume":"72","author":"Y Ji","year":"2023","unstructured":"Ji Y, Wang Y, Zhao H, Gui G, Gacanin H, Sari H, Adachi F (2023) Multi-agent reinforcement learning resources allocation method using dueling double deep q-network in vehicular networks. IEEE Trans Veh Technol 72(10):13447\u201313460","journal-title":"IEEE Trans Veh Technol"},{"issue":"6","key":"11340_CR57","doi-asserted-by":"publisher","first-page":"1421","DOI":"10.23919\/JSEE.2021.000121","volume":"32","author":"Z Jiandong","year":"2021","unstructured":"Jiandong Z, Qiming Y, Guoqing S, Yi L, Yong W (2021) UAV cooperative air combat maneuver decision based on multi-agent reinforcement learning. J Syst Eng Electron 32(6):1421\u20131438","journal-title":"J Syst Eng Electron"},{"issue":"2","key":"11340_CR58","doi-asserted-by":"publisher","first-page":"585","DOI":"10.1109\/TPDS.2015.2407900","volume":"27","author":"Y Jiang","year":"2015","unstructured":"Jiang Y (2015) A survey of task allocation and load balancing in distributed systems. 
IEEE Trans Parallel Distrib Syst 27(2):585\u2013599","journal-title":"IEEE Trans Parallel Distrib Syst"},{"issue":"3","key":"11340_CR59","doi-asserted-by":"publisher","first-page":"6520","DOI":"10.1016\/j.eswa.2008.07.036","volume":"36","author":"C Jiang","year":"2009","unstructured":"Jiang C, Sheng Z (2009) Case-based reinforcement learning for dynamic inventory control in a multi-agent supply-chain system. Expert Syst Appl 36(3):6520\u20136526","journal-title":"Expert Syst Appl"},{"key":"11340_CR60","doi-asserted-by":"crossref","unstructured":"Jiang W, Zhan Y, Fang X (2025) Satellite edge computing for mobile multimedia communications: a multi-agent federated reinforcement learning approach. ACM Transactions on Autonomous and Adaptive Systems","DOI":"10.1145\/3715146"},{"issue":"1","key":"11340_CR61","doi-asserted-by":"publisher","first-page":"75","DOI":"10.1007\/s10845-022-02037-5","volume":"35","author":"X Jing","year":"2024","unstructured":"Jing X, Yao X, Liu M, Zhou J (2024) Multi-agent reinforcement learning based on graph convolutional network for flexible job shop scheduling. J Intell Manuf 35(1):75\u201393","journal-title":"J Intell Manuf"},{"issue":"5","key":"11340_CR62","doi-asserted-by":"publisher","first-page":"5555","DOI":"10.1109\/TITS.2023.3242997","volume":"24","author":"Y Ju","year":"2023","unstructured":"Ju Y, Chen Y, Cao Z, Liu L, Pei Q, Xiao M, Ota K, Dong M, Leung VC (2023) Joint secure offloading and resource allocation for vehicular edge computing network: a multi-agent deep reinforcement learning approach. IEEE Trans Intell Transp Syst 24(5):5555\u20135569","journal-title":"IEEE Trans Intell Transp Syst"},{"issue":"1","key":"11340_CR63","doi-asserted-by":"publisher","first-page":"192","DOI":"10.3390\/electronics14010192","volume":"14","author":"W Jun-Han","year":"2025","unstructured":"Jun-Han W, He H, Cha J, Jeong I, Chang-Jun A (2025) Multi-agent reinforcement learning for efficient resource allocation in internet of vehicles. 
Electronics 14(1):192","journal-title":"Electronics"},{"issue":"12","key":"11340_CR64","doi-asserted-by":"publisher","first-page":"10497","DOI":"10.1109\/JIOT.2023.3240173","volume":"10","author":"H Kang","year":"2023","unstructured":"Kang H, Chang X, Mi\u0161i\u0107 J, Mi\u0161i\u0107 VB, Fan J, Liu Y (2023) Cooperative UAV resource allocation and task offloading in hierarchical aerial computing systems: a mappo-based approach. IEEE Internet Things J 10(12):10497\u201310509","journal-title":"IEEE Internet Things J"},{"key":"11340_CR65","doi-asserted-by":"crossref","unstructured":"Kennedy J, Eberhart R (1995) Particle swarm optimization. In: Proceedings of ICNN\u201995-international conference on neural networks, vol 4. IEEE, pp 1942\u20131948","DOI":"10.1109\/ICNN.1995.488968"},{"key":"11340_CR66","doi-asserted-by":"crossref","unstructured":"Khan SU, Ahmad I (2006) Non-cooperative, semi-cooperative, and cooperative games-based grid resource allocation. In: Proceedings 20th IEEE international parallel & distributed processing symposium. IEEE, pp 10\u2013pp","DOI":"10.1109\/IPDPS.2006.1639358"},{"issue":"7","key":"11340_CR67","doi-asserted-by":"publisher","first-page":"6964","DOI":"10.1109\/TVT.2019.2915194","volume":"68","author":"AA Khan","year":"2019","unstructured":"Khan AA, Abolhasan M, Ni W, Lipman J, Jamalipour A (2019) A hybrid-fuzzy logic guided genetic algorithm (h-flga) approach for resource optimization in 5g vanets. IEEE Trans Veh Technol 68(7):6964\u20136974","journal-title":"IEEE Trans Veh Technol"},{"issue":"15","key":"11340_CR68","doi-asserted-by":"publisher","DOI":"10.1002\/cpe.7995","volume":"36","author":"M Khani","year":"2024","unstructured":"Khani M, Sadr MM, Jamali S (2024) Deep reinforcement learning-based resource allocation in multi-access edge computing. 
Concurr Comput Pract Exp 36(15):e7995","journal-title":"Concurr Comput Pract Exp"},{"key":"11340_CR69","doi-asserted-by":"publisher","first-page":"56178","DOI":"10.1109\/ACCESS.2021.3072435","volume":"9","author":"Y Kim","year":"2021","unstructured":"Kim Y, Lim H (2021) Multi-agent reinforcement learning-based resource management for end-to-end network slicing. IEEE Access 9:56178\u201356190. https:\/\/doi.org\/10.1109\/ACCESS.2021.3072435","journal-title":"IEEE Access"},{"issue":"4598","key":"11340_CR70","doi-asserted-by":"publisher","first-page":"671","DOI":"10.1126\/science.220.4598.671","volume":"220","author":"S Kirkpatrick","year":"1983","unstructured":"Kirkpatrick S, Gelatt CD Jr, Vecchi MP (1983) Optimization by simulated annealing. Science 220(4598):671\u2013680","journal-title":"Science"},{"key":"11340_CR71","unstructured":"Konda V, Tsitsiklis J (1999) Actor-critic algorithms. Adv Neural Inf Process Syst 12"},{"issue":"1","key":"11340_CR72","doi-asserted-by":"publisher","first-page":"79","DOI":"10.1007\/s10479-022-04612-8","volume":"339","author":"F Kosanoglu","year":"2024","unstructured":"Kosanoglu F, Atmis M, Turan HH (2024) A deep reinforcement learning assisted simulated annealing algorithm for a maintenance planning problem. Ann Oper Res 339(1):79\u2013110","journal-title":"Ann Oper Res"},{"key":"11340_CR73","doi-asserted-by":"publisher","DOI":"10.1016\/j.jobe.2024.109031","volume":"87","author":"A Kumari","year":"2024","unstructured":"Kumari A, Kakkar R, Tanwar S, Garg D, Polkowski Z, Alqahtani F, Tolba A (2024) Multi-agent-based decentralized residential energy management using deep reinforcement learning. J Build Eng 87:109031","journal-title":"J Build Eng"},{"key":"11340_CR74","doi-asserted-by":"publisher","DOI":"10.1016\/j.apenergy.2024.124431","volume":"377","author":"N Lee","year":"2025","unstructured":"Lee N, Woo J, Kim S (2025) A deep reinforcement learning ensemble for maintenance scheduling in offshore wind farms. 
Appl Energy 377:124431","journal-title":"Appl Energy"},{"issue":"3","key":"11340_CR75","doi-asserted-by":"publisher","first-page":"1722","DOI":"10.1109\/COMST.2020.2988367","volume":"22","author":"L Lei","year":"2020","unstructured":"Lei L, Tan Y, Zheng K, Liu S, Zhang K, Shen X (2020) Deep reinforcement learning for autonomous internet of things: model, applications and challenges. IEEE Commun Surv Tutor 22(3):1722\u20131760","journal-title":"IEEE Commun Surv Tutor"},{"issue":"2","key":"11340_CR76","doi-asserted-by":"publisher","first-page":"1240","DOI":"10.1109\/COMST.2022.3160697","volume":"24","author":"T Li","year":"2022","unstructured":"Li T, Zhu K, Luong NC, Niyato D, Wu Q, Zhang Y, Chen B (2022) Applications of multi-agent reinforcement learning in future internet: a comprehensive survey. IEEE Commun Surv Tutor 24(2):1240\u20131279","journal-title":"IEEE Commun Surv Tutor"},{"issue":"8","key":"11340_CR77","doi-asserted-by":"publisher","first-page":"8810","DOI":"10.1109\/TVT.2022.3173057","volume":"71","author":"X Li","year":"2022","unstructured":"Li X, Lu L, Ni W, Jamalipour A, Zhang D, Du H (2022) Federated multi-agent deep reinforcement learning for resource allocation of vehicle-to-vehicle communications. IEEE Trans Veh Technol 71(8):8810\u20138824","journal-title":"IEEE Trans Veh Technol"},{"key":"11340_CR78","doi-asserted-by":"publisher","DOI":"10.1109\/JIOT.2025.3535148","author":"S Li","year":"2025","unstructured":"Li S, Jin J, Afrin M, Ge X, Fu J, Tian YC (2025) Mobility-as-a-resilience-service in internet of robotic things through robust multi-agent deep reinforcement learning. IEEE Internet Things J. 
https:\/\/doi.org\/10.1109\/JIOT.2025.3535148","journal-title":"IEEE Internet Things J"},{"issue":"12","key":"11340_CR79","doi-asserted-by":"publisher","first-page":"2785","DOI":"10.1109\/LCOMM.2020.3019437","volume":"24","author":"X Liao","year":"2020","unstructured":"Liao X, Hu X, Liu Z, Ma S, Xu L, Li X, Wang W, Ghannouchi FM (2020) Distributed intelligence: a verification for multi-agent drl-based multibeam satellite resource allocation. IEEE Commun Lett 24(12):2785\u20132789","journal-title":"IEEE Commun Lett"},{"issue":"3","key":"11340_CR80","doi-asserted-by":"publisher","first-page":"481","DOI":"10.1007\/s10845-015-1124-7","volume":"29","author":"JT Lin","year":"2018","unstructured":"Lin JT, Chiu CC (2018) A hybrid particle swarm optimization with local search for stochastic resource allocation problem. J Intell Manuf 29(3):481\u2013495","journal-title":"J Intell Manuf"},{"key":"11340_CR81","doi-asserted-by":"publisher","DOI":"10.1016\/j.vehcom.2025.100895","volume":"53","author":"Z Liu","year":"2025","unstructured":"Liu Z, Deng Y (2025) Resource allocation strategy for vehicular communication networks based on multi-agent deep reinforcement learning. Vehicular Commun 53:100895","journal-title":"Vehicular Commun"},{"issue":"2","key":"11340_CR82","doi-asserted-by":"publisher","first-page":"1000","DOI":"10.1109\/TCYB.2022.3193888","volume":"53","author":"XF Liu","year":"2022","unstructured":"Liu XF, Zhang J, Wang J (2022) Cooperative particle swarm optimization with a bilevel resource allocation mechanism for large-scale dynamic optimization. 
IEEE Trans Cybern 53(2):1000\u20131011","journal-title":"IEEE Trans Cybern"},{"key":"11340_CR83","doi-asserted-by":"publisher","DOI":"10.1016\/j.rcim.2023.102605","volume":"84","author":"Y Liu","year":"2023","unstructured":"Liu Y, Fan J, Zhao L, Shen W, Zhang C (2023) Integration of deep reinforcement learning and multi-agent system for dynamic scheduling of re-entrant hybrid flow shop considering worker fatigue and skill levels. Robotics and Computer-Integrated Manufacturing 84:102605","journal-title":"Robotics and Computer-Integrated Manufacturing"},{"key":"11340_CR84","doi-asserted-by":"publisher","DOI":"10.1109\/TWC.2024.3371791","author":"P Liu","year":"2024","unstructured":"Liu P, An K, Lei J, Sun Y, Liu W, Chatzinotas S (2024) Computation rate maximization for SCMA-aided edge computing in IoT networks: a multi-agent reinforcement learning approach. IEEE Trans Wireless Commun. https:\/\/doi.org\/10.1109\/TWC.2024.3371791","journal-title":"IEEE Trans Wireless Commun"},{"key":"11340_CR85","doi-asserted-by":"crossref","unstructured":"Lotfi F, Afghah F (2025) Meta reinforcement learning approach for adaptive resource optimization in o-ran. In: 2025 IEEE wireless communications and networking conference (WCNC). IEEE, pp 1\u20136","DOI":"10.1109\/WCNC61545.2025.10978365"},{"key":"11340_CR86","unstructured":"Lowe R, Wu Y, Tamar A, Harb J, Abbeel OP, Mordatch I (2017) Multi-agent actor-critic for mixed cooperative-competitive environments. In: Advances in neural information processing systems, pp 6379\u20136390"},{"key":"11340_CR87","doi-asserted-by":"publisher","DOI":"10.1016\/j.energy.2023.127087","volume":"271","author":"Y Lu","year":"2023","unstructured":"Lu Y, Xiang Y, Huang Y, Yu B, Weng L, Liu J (2023) Deep reinforcement learning based optimal scheduling of active distribution system considering distributed generation, energy storage and flexible load. 
Energy 271:127087","journal-title":"Energy"},{"key":"11340_CR88","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s42256-024-00796-9","volume":"6","author":"C Ma","year":"2024","unstructured":"Ma C, Li A, Du Y, Dong H, Yang Y (2024) Efficient and scalable reinforcement learning for large-scale network control. Nat Mach Intell 6:1\u201315","journal-title":"Nat Mach Intell"},{"key":"11340_CR89","doi-asserted-by":"crossref","unstructured":"Mao H, Alizadeh M, Menache I, Kandula S (2016) Resource management with deep reinforcement learning. In: Proceedings of the 15th ACM workshop on hot topics in networks, pp 50\u201356","DOI":"10.1145\/3005745.3005750"},{"key":"11340_CR90","doi-asserted-by":"publisher","DOI":"10.1109\/TVT.2024.3486156","author":"R Mei","year":"2024","unstructured":"Mei R, Wang Z (2024) Multi-agent deep reinforcement learning-based resource allocation for cognitive radio networks. IEEE Trans Veh Technol. https:\/\/doi.org\/10.1109\/TVT.2024.3486156","journal-title":"IEEE Trans Veh Technol"},{"key":"11340_CR91","first-page":"13367","volume":"35","author":"D Melcer","year":"2022","unstructured":"Melcer D, Amato C, Tripakis S (2022) Shield decentralization for safe multi-agent reinforcement learning. Adv Neural Inf Process Syst 35:13367\u201313379","journal-title":"Adv Neural Inf Process Syst"},{"issue":"10","key":"11340_CR92","doi-asserted-by":"publisher","first-page":"6255","DOI":"10.1109\/TWC.2020.3001736","volume":"19","author":"F Meng","year":"2020","unstructured":"Meng F, Chen P, Wu L, Cheng J (2020) Power allocation in multi-user cellular networks: deep reinforcement learning approaches. IEEE Trans Wireless Commun 19(10):6255\u20136267","journal-title":"IEEE Trans Wireless Commun"},{"key":"11340_CR93","unstructured":"Mnih V, Badia AP, Mirza M, Graves A, Lillicrap T, Harley T, Silver D, Kavukcuoglu K (2016) Asynchronous methods for deep reinforcement learning. In: International conference on machine learning. 
PMLR, pp 1928\u20131937"},{"key":"11340_CR94","doi-asserted-by":"crossref","unstructured":"Mondal A, Mishra D, Alexandropoulos GC, Al-Nahari A, J\u00e4ntti R (2025) Multi-agent reinforcement learning for offloading cellular communications with cooperating UAVs. IEEE Transactions on aerospace and electronic systems","DOI":"10.1109\/TAES.2025.3554150"},{"issue":"6","key":"11340_CR95","doi-asserted-by":"publisher","first-page":"3507","DOI":"10.1109\/TWC.2021.3051163","volume":"20","author":"N Naderializadeh","year":"2021","unstructured":"Naderializadeh N, Sydir JJ, Simsek M, Nikopour H (2021) Resource management in wireless networks via multi-agent deep reinforcement learning. IEEE Trans Wireless Commun 20(6):3507\u20133523","journal-title":"IEEE Trans Wireless Commun"},{"issue":"1","key":"11340_CR96","doi-asserted-by":"publisher","DOI":"10.1111\/exsy.13362","volume":"42","author":"S Nagarajan","year":"2025","unstructured":"Nagarajan S, Rani PS, Vinmathi M, Subba Reddy V, Saleth ALM, Abdus Subhahan D (2025) Multi agent deep reinforcement learning for resource allocation in container-based clouds environments. Expert Syst 42(1):e13362","journal-title":"Expert Syst"},{"key":"11340_CR97","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/s40866-018-0052-y","volume":"3","author":"AS Nair","year":"2018","unstructured":"Nair AS, Hossen T, Campion M, Selvaraj DF, Goveas N, Kaabouch N, Ranganathan P (2018) Multi-agent systems for resource allocation and scheduling in a smart grid. Technol Econ Smart Grids Sustain Energy 3:1\u201315","journal-title":"Technol Econ Smart Grids Sustain Energy"},{"issue":"10","key":"11340_CR98","doi-asserted-by":"publisher","first-page":"2239","DOI":"10.1109\/JSAC.2019.2933973","volume":"37","author":"YS Nasir","year":"2019","unstructured":"Nasir YS, Guo D (2019) Multi-agent deep reinforcement learning for dynamic power allocation in wireless networks. 
IEEE J Sel Areas Commun 37(10):2239\u20132250","journal-title":"IEEE J Sel Areas Commun"},{"issue":"9","key":"11340_CR99","doi-asserted-by":"publisher","first-page":"3826","DOI":"10.1109\/TCYB.2020.2977374","volume":"50","author":"TT Nguyen","year":"2020","unstructured":"Nguyen TT, Nguyen ND, Nahavandi S (2020) Deep reinforcement learning for multiagent systems: a review of challenges, solutions, and applications. IEEE Trans Cybern 50(9):3826\u20133839","journal-title":"IEEE Trans Cybern"},{"key":"11340_CR100","doi-asserted-by":"publisher","DOI":"10.1016\/j.jai.2024.02.003","author":"Z Ning","year":"2024","unstructured":"Ning Z, Xie L (2024) A survey on multi-agent reinforcement learning and its application. J Autom Intell. https:\/\/doi.org\/10.1016\/j.jai.2024.02.003","journal-title":"J Autom Intell"},{"issue":"2","key":"11340_CR101","doi-asserted-by":"publisher","first-page":"701","DOI":"10.1109\/TITS.2020.3019322","volume":"23","author":"M Noor-A-Rahim","year":"2020","unstructured":"Noor-A-Rahim M, Liu Z, Lee H, Ali GMN, Pesch D, Xiao P (2020) A survey on resource allocation in vehicular networks. IEEE Trans Intell Transp Syst 23(2):701\u2013721","journal-title":"IEEE Trans Intell Transp Syst"},{"issue":"7","key":"11340_CR102","doi-asserted-by":"publisher","DOI":"10.3390\/s23073625","volume":"23","author":"J Orr","year":"2023","unstructured":"Orr J, Dutta A (2023) Multi-agent deep reinforcement learning for multi-robot applications: a survey. Sensors (Basel) 23(7):3625","journal-title":"Sensors (Basel)"},{"issue":"8","key":"11340_CR103","doi-asserted-by":"publisher","first-page":"9880","DOI":"10.1109\/TVT.2023.3259688","volume":"72","author":"M Parvini","year":"2023","unstructured":"Parvini M, Javan MR, Mokari N, Abbasi B, Jorswieck EA (2023) Aoi-aware resource allocation for platoon-based c-v2x networks via multi-agent multi-task reinforcement learning. 
IEEE Trans Veh Technol 72(8):9880\u20139896","journal-title":"IEEE Trans Veh Technol"},{"issue":"1","key":"11340_CR104","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/j.ejor.2006.12.006","volume":"185","author":"M Patriksson","year":"2008","unstructured":"Patriksson M (2008) A survey on the continuous nonlinear resource allocation problem. Eur J Oper Res 185(1):1\u201346","journal-title":"Eur J Oper Res"},{"key":"11340_CR105","doi-asserted-by":"publisher","first-page":"78","DOI":"10.1007\/978-3-031-53969-5_7","volume-title":"Machine learning, optimization, and data science","author":"A Pendyala","year":"2024","unstructured":"Pendyala A, Dettmer J, Glasmachers T, Atamna A (2024) Containergym: a real-world reinforcement learning benchmark for resource allocation. In: Nicosia G, Ojha V, La Malfa E, La Malfa G, Pardalos PM, Umeton R (eds) Machine learning, optimization, and data science. Springer, Cham, pp 78\u201392"},{"key":"11340_CR106","doi-asserted-by":"publisher","DOI":"10.1109\/TAES.2024.3418944","author":"N Rao","year":"2024","unstructured":"Rao N, Xu H, Qi Z, Wang D, Zhang Y (2024) Fast adaptive jamming resource allocation against frequency-hopping spread spectrum in wireless sensor networks via meta deep reinforcement learning. IEEE Trans Aerosp Electron Syst. https:\/\/doi.org\/10.1109\/TAES.2024.3418944","journal-title":"IEEE Trans Aerosp Electron Syst"},{"issue":"178","key":"11340_CR107","first-page":"1","volume":"21","author":"T Rashid","year":"2020","unstructured":"Rashid T, Samvelyan M, De Witt CS, Farquhar G, Foerster J, Whiteson S (2020) Monotonic value function factorisation for deep multi-agent reinforcement learning. 
J Mach Learn Res 21(178):1\u201351","journal-title":"J Mach Learn Res"},{"issue":"9","key":"11340_CR108","doi-asserted-by":"publisher","first-page":"16410","DOI":"10.1109\/TITS.2022.3150151","volume":"23","author":"L Ren","year":"2022","unstructured":"Ren L, Fan X, Cui J, Shen Z, Lv Y, Xiong G (2022) A multi-agent reinforcement learning method with route recorders for vehicle routing in supply chain management. IEEE Trans Intell Transp Syst 23(9):16410\u201316420","journal-title":"IEEE Trans Intell Transp Syst"},{"issue":"19","key":"11340_CR109","doi-asserted-by":"publisher","DOI":"10.3390\/app10196900","volume":"10","author":"M Roesch","year":"2020","unstructured":"Roesch M, Linder C, Zimmermann R, Rudolf A, Hohmann A, Reinhart G (2020) Smart grid for industry using multi-agent reinforcement learning. Appl Sci 10(19):6900","journal-title":"Appl Sci"},{"issue":"3","key":"11340_CR110","doi-asserted-by":"publisher","first-page":"169","DOI":"10.1016\/S0038-0121(02)00039-3","volume":"37","author":"TL Saaty","year":"2003","unstructured":"Saaty TL, Vargas LG, Dellmann K (2003) The allocation of intangible resources: the analytic hierarchy process and linear programming. Socioecon Plann Sci 37(3):169\u2013184","journal-title":"Socioecon Plann Sci"},{"issue":"2","key":"11340_CR111","doi-asserted-by":"publisher","first-page":"450","DOI":"10.1016\/j.dcan.2022.03.003","volume":"9","author":"K Sadatdiynov","year":"2023","unstructured":"Sadatdiynov K, Cui L, Zhang L, Huang JZ, Salloum S, Mahmud MS (2023) A review of optimization methods for computation offloading in edge computing networks. Digit Commun Netw 9(2):450\u2013461","journal-title":"Digit Commun Netw"},{"key":"11340_CR112","doi-asserted-by":"publisher","DOI":"10.1016\/j.comnet.2023.109720","volume":"227","author":"A Sarah","year":"2023","unstructured":"Sarah A, Nencioni G, Khan MMI (2023) Resource allocation in multi-access edge computing for 5g-and-beyond networks. 
Comput Netw 227:109720","journal-title":"Comput Netw"},{"key":"11340_CR113","unstructured":"Schulman J (2015) Trust region policy optimization. arXiv:1502.05477"},{"key":"11340_CR114","unstructured":"Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O (2017) Proximal policy optimization algorithms. arXiv:1707.06347"},{"issue":"4","key":"11340_CR115","doi-asserted-by":"publisher","first-page":"4531","DOI":"10.1109\/TNSM.2021.3096673","volume":"18","author":"AM Seid","year":"2021","unstructured":"Seid AM, Boateng GO, Mareri B, Sun G, Jiang W (2021) Multi-agent DRL for task offloading and resource allocation in multi-UAV enabled IoT edge network. IEEE Trans Netw Serv Manag 18(4):4531\u20134547","journal-title":"IEEE Trans Netw Serv Manag"},{"issue":"1","key":"11340_CR116","doi-asserted-by":"publisher","DOI":"10.1038\/s41598-024-51778-1","volume":"14","author":"X Shao","year":"2024","unstructured":"Shao X, Kshitij FS, Kim CS (2024) Gails: an effective multi-object job shop scheduler based on genetic algorithm and iterative local search. Sci Rep 14(1):2068","journal-title":"Sci Rep"},{"issue":"2","key":"11340_CR117","doi-asserted-by":"publisher","first-page":"67","DOI":"10.4316\/AECE.2021.02008","volume":"21","author":"S Sharma","year":"2021","unstructured":"Sharma S, Wonsik Y (2021) Multiobjective optimization for resource allocation in full-duplex large distributed MIMO systems. Adv Electr Comput Eng 21(2):67","journal-title":"Adv Electr Comput Eng"},{"key":"11340_CR118","first-page":"2035","volume":"11","author":"S Sharma","year":"2018","unstructured":"Sharma S, Yoon W (2018) Multi-objective energy efficient resource allocation for wpcn. 
Int J Eng Res Tech 11:2035\u20132043","journal-title":"Int J Eng Res Tech"},{"key":"11340_CR119","doi-asserted-by":"publisher","DOI":"10.1016\/j.apenergy.2022.118724","volume":"312","author":"R Shen","year":"2022","unstructured":"Shen R, Zhong S, Wen X, An Q, Zheng R, Li Y, Zhao J (2022) Multi-agent deep reinforcement learning optimization framework for building energy system with renewable energy. Appl Energy 312:118724","journal-title":"Appl Energy"},{"key":"11340_CR120","doi-asserted-by":"publisher","DOI":"10.1145\/3057267","author":"AK Singh","year":"2017","unstructured":"Singh AK, Dziurzanski P, Mendis HR, Indrusiak LS (2017) A survey and comparative study of hard and soft real-time dynamic resource allocation strategies for multi-\/many-core systems. ACM Comput Surv. https:\/\/doi.org\/10.1145\/3057267","journal-title":"ACM Comput Surv"},{"issue":"3","key":"11340_CR121","doi-asserted-by":"publisher","first-page":"509","DOI":"10.1080\/002075400189284","volume":"38","author":"D Spinellis","year":"2000","unstructured":"Spinellis D, Papadopoulos C, Smith JM (2000) Large production line optimization using simulated annealing. Int J Prod Res 38(3):509\u2013541","journal-title":"Int J Prod Res"},{"key":"11340_CR122","unstructured":"Stephenson M, Schaub H (2024a) Reinforcement learning for earth-observing satellite autonomy with event-based task intervals. In: AAS Rocky Mountain GN&C conference, Breckenridge, CO"},{"key":"11340_CR123","doi-asserted-by":"crossref","unstructured":"Stephenson MA, Schaub H (2024b) Bsk-rl: modular, high-fidelity reinforcement learning environments for spacecraft tasking. 
In: 75th international astronautical congress, Milan, Italy, IAF","DOI":"10.52202\/078372-0120"},{"issue":"10","key":"11340_CR124","doi-asserted-by":"publisher","first-page":"1143","DOI":"10.1057\/palgrave.jors.2602068","volume":"57","author":"B Suman","year":"2006","unstructured":"Suman B, Kumar P (2006) A survey of simulated annealing as a tool for single and multiobjective optimization. J Oper Res Soc 57(10):1143\u20131160","journal-title":"J Oper Res Soc"},{"issue":"4","key":"11340_CR125","doi-asserted-by":"publisher","first-page":"2903","DOI":"10.1109\/TSG.2021.3052998","volume":"12","author":"X Sun","year":"2021","unstructured":"Sun X, Qiu J (2021) Two-stage volt\/var control in active distribution networks with multi-agent deep reinforcement learning method. IEEE Trans Smart Grid 12(4):2903\u20132912","journal-title":"IEEE Trans Smart Grid"},{"key":"11340_CR126","unstructured":"Sunehag P, Lever G, Gruslys A, Czarnecki WM, Zambaldi V, Jaderberg M, Lanctot M, Sonnerat N, Leibo JZ, Tuyls K, Graepel T (2018) Value-decomposition networks for cooperative multi-agent learning based on team reward. In: Proceedings of the 17th international conference on autonomous agents and MultiAgent systems. International Foundation for Autonomous Agents and Multiagent Systems, AAMAS \u201918, Richland, SC, pp 2085\u20132087"},{"key":"11340_CR127","doi-asserted-by":"publisher","first-page":"9","DOI":"10.1023\/A:1022633531479","volume":"3","author":"RS Sutton","year":"1988","unstructured":"Sutton RS (1988) Learning to predict by the methods of temporal differences. Mach Learn 3:9\u201344","journal-title":"Mach Learn"},{"key":"11340_CR128","volume-title":"Reinforcement learning: an introduction","author":"RS Sutton","year":"2018","unstructured":"Sutton RS, Barto AG (2018) Reinforcement learning: an introduction. 
MIT press, Cambridge"},{"issue":"3","key":"11340_CR129","doi-asserted-by":"publisher","first-page":"1982","DOI":"10.1109\/TNSM.2022.3149243","volume":"19","author":"A Suzuki","year":"2022","unstructured":"Suzuki A, Kawahara R, Harada S (2022) Cooperative multi-agent deep reinforcement learning for dynamic virtual network allocation with traffic fluctuations. IEEE Trans Netw Serv Manage 19(3):1982\u20132000","journal-title":"IEEE Trans Netw Serv Manage"},{"issue":"10","key":"11340_CR130","doi-asserted-by":"publisher","first-page":"2104","DOI":"10.1109\/JSAC.2015.2435351","volume":"33","author":"J Tang","year":"2015","unstructured":"Tang J, So DK, Alsusa E, Hamdi KA, Shojaeifard A (2015) Resource allocation for energy efficiency optimization in heterogeneous networks. IEEE J Sel Areas Commun 33(10):2104\u20132117","journal-title":"IEEE J Sel Areas Commun"},{"key":"11340_CR131","unstructured":"Towers M, Kwiatkowski A, Terry J, Balis JU, De\u00a0Cola G, Deleu T, Goul\u00e3o M, Kallinteris A, Krimmel M, KG A et al (2024) Gymnasium: a standard interface for reinforcement learning environments. arXiv:2407.17032"},{"issue":"2","key":"11340_CR132","doi-asserted-by":"publisher","first-page":"1688","DOI":"10.1109\/JSYST.2017.2722476","volume":"12","author":"FH Tseng","year":"2017","unstructured":"Tseng FH, Wang X, Chou LD, Chao HC, Leung VC (2017) Dynamic resource prediction and allocation for cloud data center using the multiobjective genetic algorithm. IEEE Syst J 12(2):1688\u20131699","journal-title":"IEEE Syst J"},{"key":"11340_CR133","doi-asserted-by":"publisher","DOI":"10.1016\/j.icte.2025.01.010","author":"YH Tu","year":"2025","unstructured":"Tu YH, Ma YW (2025) A comprehensive multi-agent deep reinforcement learning framework with adaptive interaction strategies for contention window optimization in IEEE 802.11 wireless lans. ICT Express. 
https:\/\/doi.org\/10.1016\/j.icte.2025.01.010","journal-title":"ICT Express"},{"key":"11340_CR134","doi-asserted-by":"publisher","DOI":"10.1002\/9781118400715","volume-title":"Optimal resource allocation: with practical statistical applications and theory","author":"IA Ushakov","year":"2013","unstructured":"Ushakov IA (2013) Optimal resource allocation: with practical statistical applications and theory. Wiley, Hoboken"},{"key":"11340_CR135","doi-asserted-by":"crossref","unstructured":"Van Hasselt H, Guez A, Silver D (2016) Deep reinforcement learning with double q-learning. In: Proceedings of the AAAI conference on artificial intelligence, vol 30","DOI":"10.1609\/aaai.v30i1.10295"},{"issue":"3","key":"11340_CR136","doi-asserted-by":"publisher","first-page":"383","DOI":"10.1016\/j.engappai.2006.06.019","volume":"20","author":"D Vengerov","year":"2007","unstructured":"Vengerov D (2007) A reinforcement learning approach to dynamic resource allocation. Eng Appl Artif Intell 20(3):383\u2013390","journal-title":"Eng Appl Artif Intell"},{"key":"11340_CR137","unstructured":"Wang Z, Schaul T, Hessel M, Hasselt H, Lanctot M, Freitas N (2016) Dueling network architectures for deep reinforcement learning. In: International conference on machine learning. PMLR, pp 1995\u20132003"},{"issue":"1","key":"11340_CR138","doi-asserted-by":"publisher","first-page":"174","DOI":"10.1109\/TCYB.2020.3015811","volume":"51","author":"X Wang","year":"2020","unstructured":"Wang X, Ke L, Qiao Z, Chai X (2020) Large-scale traffic signal control using a novel multiagent reinforcement learning. 
IEEE Trans Cybern 51(1):174\u2013187","journal-title":"IEEE Trans Cybern"},{"issue":"6","key":"11340_CR139","doi-asserted-by":"publisher","first-page":"2228","DOI":"10.1109\/TMC.2020.3033782","volume":"21","author":"Y Wang","year":"2020","unstructured":"Wang Y, Xu T, Niu X, Tan C, Chen E, Xiong H (2020) Stmarl: a spatio-temporal multi-agent reinforcement learning approach for cooperative traffic light control. IEEE Trans Mob Comput 21(6):2228\u20132242","journal-title":"IEEE Trans Mob Comput"},{"key":"11340_CR140","first-page":"3271","volume-title":"Adv Neural Inf Process Syst","author":"J Wang","year":"2021","unstructured":"Wang J, Xu W, Gu Y, Song W, Green TC (2021) Multi-agent reinforcement learning for active voltage control on power distribution networks. In: Ranzato M, Beygelzimer A, Dauphin Y, Liang P, Vaughan JW (eds) Adv Neural Inf Process Syst, vol 34. Curran Associates Inc, New York, pp 3271\u20133284"},{"key":"11340_CR141","doi-asserted-by":"publisher","DOI":"10.1109\/TVT.2024.3414447","author":"L Wang","year":"2024","unstructured":"Wang L, Liang H, Mao G, Zhao D, Liu Q, Yao Y, Zhang H (2024) Resource allocation for dynamic platoon digital twin networks: a multi-agent deep reinforcement learning method. IEEE Trans Veh Technol. https:\/\/doi.org\/10.1109\/TVT.2024.3414447","journal-title":"IEEE Trans Veh Technol"},{"key":"11340_CR142","doi-asserted-by":"publisher","DOI":"10.1016\/j.comnet.2024.110526","volume":"250","author":"Q Wang","year":"2024","unstructured":"Wang Q, Li W, Mohajer A (2024) Load-aware continuous-time optimization for multi-agent systems: toward dynamic resource allocation and real-time adaptability. 
Comput Netw 250:110526","journal-title":"Comput Netw"},{"key":"11340_CR143","doi-asserted-by":"publisher","DOI":"10.1016\/j.energy.2024.134165","volume":"314","author":"C Wang","year":"2025","unstructured":"Wang C, Wang M, Wang A, Zhang X, Zhang J, Ma H, Yang N, Zhao Z, Lai CS, Lai LL (2025) Multiagent deep reinforcement learning-based cooperative optimal operation with strong scalability for residential microgrid clusters. Energy 314:134165","journal-title":"Energy"},{"key":"11340_CR144","doi-asserted-by":"crossref","unstructured":"Wei H, Chen C, Zheng G, Wu K, Gayah V, Xu K, Li Z (2019) Presslight: learning max pressure control to coordinate traffic signals in arterial network. In: Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining. Association for Computing Machinery, KDD \u201919, New York, NY, USA, pp 1290\u20131298","DOI":"10.1145\/3292500.3330949"},{"issue":"12","key":"11340_CR145","doi-asserted-by":"publisher","first-page":"25536","DOI":"10.1109\/TITS.2021.3091321","volume":"23","author":"W Wei","year":"2021","unstructured":"Wei W, Yang R, Gu H, Zhao W, Chen C, Wan S (2021) Multi-objective optimization for resource allocation in vehicular cloud computing networks. IEEE Trans Intell Transp Syst 23(12):25536\u201325545","journal-title":"IEEE Trans Intell Transp Syst"},{"issue":"3","key":"11340_CR146","doi-asserted-by":"publisher","first-page":"2107","DOI":"10.1109\/TMC.2023.3250495","volume":"23","author":"Z Wei","year":"2023","unstructured":"Wei Z, Li B, Zhang R, Cheng X, Yang L (2023) Many-to-many task offloading in vehicular fog computing: a multi-agent deep reinforcement learning approach. IEEE Trans Mob Comput 23(3):2107\u20132122","journal-title":"IEEE Trans Mob Comput"},{"issue":"4","key":"11340_CR147","volume":"2","author":"G Wen","year":"2021","unstructured":"Wen G, Fu J, Dai P, Zhou J (2021) Dtde: a new cooperative multi-agent reinforcement learning framework. 
Innov 2(4):100162","journal-title":"Innov"},{"key":"11340_CR148","first-page":"16509","volume":"35","author":"M Wen","year":"2022","unstructured":"Wen M, Kuba J, Lin R, Zhang W, Wen Y, Wang J, Yang Y (2022) Multi-agent reinforcement learning is a sequence modeling problem. Adv Neural Inf Process Syst 35:16509\u201316521","journal-title":"Adv Neural Inf Process Syst"},{"issue":"6","key":"11340_CR149","doi-asserted-by":"publisher","first-page":"5023","DOI":"10.1007\/s10462-022-10299-x","volume":"56","author":"A Wong","year":"2023","unstructured":"Wong A, B\u00e4ck T, Kononova AV, Plaat A (2023) Deep multiagent reinforcement learning: challenges and directions. Artif Intell Rev 56(6):5023\u20135056","journal-title":"Artif Intell Rev"},{"key":"11340_CR150","doi-asserted-by":"publisher","first-page":"538","DOI":"10.1016\/j.asoc.2018.07.008","volume":"71","author":"H Wu","year":"2018","unstructured":"Wu H, Pang GKH, Choy KL, Lam HY (2018) Dynamic resource allocation for parking lot electric vehicle recharging using heuristic fuzzy particle swarm optimization algorithm. Appl Soft Comput 71:538\u2013552","journal-title":"Appl Soft Comput"},{"issue":"8","key":"11340_CR151","doi-asserted-by":"publisher","first-page":"8243","DOI":"10.1109\/TVT.2020.2997896","volume":"69","author":"T Wu","year":"2020","unstructured":"Wu T, Zhou P, Liu K, Yuan Y, Wang X, Huang H, Wu DO (2020) Multi-agent deep reinforcement learning for urban traffic light control in vehicular networks. IEEE Trans Veh Technol 69(8):8243\u20138256","journal-title":"IEEE Trans Veh Technol"},{"key":"11340_CR152","doi-asserted-by":"publisher","first-page":"80","DOI":"10.1016\/j.jpdc.2023.02.008","volume":"176","author":"G Wu","year":"2023","unstructured":"Wu G, Xu Z, Zhang H, Shen S, Yu S (2023) Multi-agent drl for joint completion delay and energy consumption with queuing theory in mec-based iiot. 
J Parallel Distrib Comput 176:80\u201394","journal-title":"J Parallel Distrib Comput"},{"key":"11340_CR153","doi-asserted-by":"publisher","DOI":"10.1016\/j.apenergy.2024.123998","volume":"374","author":"H Wu","year":"2024","unstructured":"Wu H, Qiu D, Zhang L, Sun M (2024) Adaptive multi-agent reinforcement learning for flexible resource management in a virtual power plant with dynamic participating multi-energy buildings. Appl Energy 374:123998","journal-title":"Appl Energy"},{"issue":"8","key":"11340_CR154","doi-asserted-by":"publisher","first-page":"5414","DOI":"10.1109\/TWC.2022.3233853","volume":"22","author":"Y Xiao","year":"2023","unstructured":"Xiao Y, Song Y, Liu J (2023) Multi-agent deep reinforcement learning based resource allocation for ultra-reliable low-latency internet of controllable things. IEEE Trans Wireless Commun 22(8):5414\u20135430","journal-title":"IEEE Trans Wireless Commun"},{"issue":"2","key":"11340_CR155","doi-asserted-by":"publisher","first-page":"1883","DOI":"10.1007\/s10586-023-04030-w","volume":"27","author":"H Xu","year":"2024","unstructured":"Xu H, Jian C (2024) A meta reinforcement learning-based virtual machine placement algorithm in mobile edge computing. Clust Comput 27(2):1883\u20131896","journal-title":"Clust Comput"},{"key":"11340_CR156","doi-asserted-by":"publisher","first-page":"213","DOI":"10.1007\/s10586-008-0060-0","volume":"11","author":"J Xu","year":"2008","unstructured":"Xu J, Zhao M, Fortes J, Carpenter R, Yousif M (2008) Autonomic resource management in virtualized data centers using fuzzy logic-based approaches. Clust Comput 11:213\u2013227","journal-title":"Clust Comput"},{"issue":"4","key":"11340_CR157","doi-asserted-by":"publisher","first-page":"3201","DOI":"10.1109\/TSG.2020.2971427","volume":"11","author":"X Xu","year":"2020","unstructured":"Xu X, Jia Y, Xu Y, Xu Z, Chai S, Lai CS (2020) A multi-agent reinforcement learning-based data-driven method for home energy management. 
IEEE Trans Smart Grid 11(4):3201\u20133211","journal-title":"IEEE Trans Smart Grid"},{"key":"11340_CR158","doi-asserted-by":"publisher","DOI":"10.1016\/j.apenergy.2024.123923","volume":"375","author":"X Xu","year":"2024","unstructured":"Xu X, Xu K, Zeng Z, Tang J, He Y, Shi G, Zhang T (2024) Collaborative optimization of multi-energy multi-microgrid system: a hierarchical trust-region multi-agent reinforcement learning approach. Appl Energy 375:123923","journal-title":"Appl Energy"},{"key":"11340_CR159","first-page":"20147","volume":"35","author":"K Xue","year":"2022","unstructured":"Xue K, Xu J, Yuan L, Li M, Qian C, Zhang Z, Yu Y (2022) Multi-agent dynamic algorithm configuration. Adv Neural Inf Process Syst 35:20147\u201320161","journal-title":"Adv Neural Inf Process Syst"},{"key":"11340_CR160","doi-asserted-by":"publisher","DOI":"10.1016\/j.comcom.2025.108081","volume":"234","author":"J Xue","year":"2025","unstructured":"Xue J, Wang L, Yu Q, Mao P (2025) Multi-agent deep reinforcement learning-based partial offloading and resource allocation in vehicular edge computing networks. Comput Commun 234:108081","journal-title":"Comput Commun"},{"issue":"4","key":"11340_CR161","doi-asserted-by":"publisher","first-page":"3509","DOI":"10.1109\/JIOT.2020.2972776","volume":"7","author":"W Y\u00e1nez","year":"2020","unstructured":"Y\u00e1nez W, Mahmud R, Bahsoon R, Zhang Y, Buyya R (2020) Data allocation mechanism for internet-of-things systems with blockchain. IEEE Internet Things J 7(4):3509\u20133522. https:\/\/doi.org\/10.1109\/JIOT.2020.2972776","journal-title":"IEEE Internet Things J"},{"key":"11340_CR162","unstructured":"Yang Y, Luo R, Li M, Zhou M, Zhang W, Wang J (2018) Mean field multi-agent reinforcement learning. In: International conference on machine learning. 
PMLR, pp 5571\u20135580"},{"issue":"6","key":"11340_CR163","doi-asserted-by":"publisher","first-page":"1345","DOI":"10.1016\/j.future.2013.02.004","volume":"29","author":"D Ye","year":"2013","unstructured":"Ye D, Chen J (2013) Non-cooperative games on multidimensional resource allocation. Futur Gener Comput Syst 29(6):1345\u20131352","journal-title":"Futur Gener Comput Syst"},{"issue":"4","key":"11340_CR164","doi-asserted-by":"publisher","first-page":"2933","DOI":"10.1109\/JIOT.2021.3094651","volume":"9","author":"S Yin","year":"2021","unstructured":"Yin S, Yu FR (2021) Resource allocation and trajectory design in UAV-aided cellular networks based on multiagent reinforcement learning. IEEE Internet Things J 9(4):2933\u20132943","journal-title":"IEEE Internet Things J"},{"issue":"2","key":"11340_CR165","doi-asserted-by":"publisher","first-page":"855","DOI":"10.1109\/TSMC.2020.3012832","volume":"52","author":"X You","year":"2020","unstructured":"You X, Li X, Xu Y, Feng H, Zhao J, Yan H (2020) Toward packet routing with fully distributed multiagent deep reinforcement learning. IEEE Trans Syst Man Cybern Syst 52(2):855\u2013868","journal-title":"IEEE Trans Syst Man Cybern Syst"},{"issue":"15","key":"11340_CR166","doi-asserted-by":"publisher","first-page":"12046","DOI":"10.1109\/JIOT.2021.3078462","volume":"8","author":"L Yu","year":"2021","unstructured":"Yu L, Qin S, Zhang M, Shen C, Jiang T, Guan X (2021) A review of deep reinforcement learning for smart building energy management. IEEE Internet Things J 8(15):12046\u201312063","journal-title":"IEEE Internet Things J"},{"key":"11340_CR167","first-page":"24611","volume":"35","author":"C Yu","year":"2022","unstructured":"Yu C, Velu A, Vinitsky E, Gao J, Wang Y, Bayen A, Wu Y (2022) The surprising effectiveness of PPO in cooperative multi-agent games. 
Adv Neural Inf Process Syst 35:24611\u201324624","journal-title":"Adv Neural Inf Process Syst"},{"issue":"11","key":"11340_CR168","doi-asserted-by":"publisher","first-page":"9942","DOI":"10.1109\/JIOT.2023.3234911","volume":"10","author":"WJ Yun","year":"2023","unstructured":"Yun WJ, Kim JP, Jung S, Kim JH, Kim J (2023) Quantum multiagent actor-critic neural networks for internet-connected multirobot coordination in smart factory management. IEEE Internet Things J 10(11):9942\u20139952","journal-title":"IEEE Internet Things J"},{"key":"11340_CR169","doi-asserted-by":"publisher","DOI":"10.1145\/3603703","author":"Z Zabihi","year":"2023","unstructured":"Zabihi Z, Eftekhari Moghadam AM, Rezvani MH (2023) Reinforcement learning methods for computation offloading: a systematic review. ACM Comput Surv. https:\/\/doi.org\/10.1145\/3603703","journal-title":"ACM Comput Surv"},{"key":"11340_CR170","doi-asserted-by":"publisher","DOI":"10.1016\/j.jnca.2022.103497","volume":"207","author":"A Zeynivand","year":"2022","unstructured":"Zeynivand A, Javadpour A, Bolouki S, Sangaiah AK, Ja\u2019fari F, Pinto P, Zhang W (2022) Traffic flow control using multi-agent reinforcement learning. J Netw Comput Appl 207:103497","journal-title":"J Netw Comput Appl"},{"issue":"13s","key":"11340_CR171","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3589639","volume":"55","author":"X Zhang","year":"2023","unstructured":"Zhang X, Debroy S (2023) Resource management in mobile edge computing: a comprehensive survey. ACM Comput Surv 55(13s):1\u201337","journal-title":"ACM Comput Surv"},{"key":"11340_CR172","doi-asserted-by":"publisher","DOI":"10.1109\/TCOMM.2025.3534565","author":"Y Zhang","year":"2025","unstructured":"Zhang Y, Guo D (2025) Multi-agent reinforcement learning for multi-cell spectrum and power allocation. IEEE Trans Commun. 
https:\/\/doi.org\/10.1109\/TCOMM.2025.3534565","journal-title":"IEEE Trans Commun"},{"issue":"2","key":"11340_CR173","doi-asserted-by":"publisher","first-page":"38","DOI":"10.1109\/MWC.2012.6189411","volume":"19","author":"G Zhang","year":"2012","unstructured":"Zhang G, Yang K, Chen HH (2012) Resource allocation for wireless cooperative networks: a unified cooperative bargaining game theoretic framework. IEEE Wirel Commun 19(2):38\u201343","journal-title":"IEEE Wirel Commun"},{"issue":"6","key":"11340_CR174","doi-asserted-by":"publisher","first-page":"3481","DOI":"10.1109\/TWC.2015.2407355","volume":"14","author":"H Zhang","year":"2015","unstructured":"Zhang H, Jiang C, Beaulieu NC, Chu X, Wang X, Quek TQ (2015) Resource allocation for cognitive small cell networks: a cooperative bargaining game theoretic approach. IEEE Trans Wirel Commun 14(6):3481\u20133493","journal-title":"IEEE Trans Wirel Commun"},{"key":"11340_CR175","doi-asserted-by":"crossref","unstructured":"Zhang H, Feng S, Liu C, Ding Y, Zhu Y, Zhou Z, Zhang W, Yu Y, Jin H, Li Z (2019) Cityflow: a multi-agent reinforcement learning environment for large scale city traffic scenario. In: The world wide web conference. Association for Computing Machinery, WWW \u201919, New York, NY, USA, pp 3620\u20133624","DOI":"10.1145\/3308558.3314139"},{"key":"11340_CR176","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1909.02682","author":"SQ Zhang","year":"2019","unstructured":"Zhang SQ, Zhang Q, Lin J (2019) Efficient communication in multi-agent reinforcement learning via variance based control. Adv Neural Inf Process Syst. 
https:\/\/doi.org\/10.48550\/arXiv.1909.02682","journal-title":"Adv Neural Inf Process Syst"},{"issue":"10","key":"11340_CR177","doi-asserted-by":"publisher","first-page":"11599","DOI":"10.1109\/TVT.2020.3014788","volume":"69","author":"Y Zhang","year":"2020","unstructured":"Zhang Y, Mou Z, Gao F, Jiang J, Ding R, Han Z (2020) Uav-enabled secure communications by multi-agent deep reinforcement learning. IEEE Trans Veh Technol 69(10):11599\u201311611","journal-title":"IEEE Trans Veh Technol"},{"issue":"2","key":"11340_CR178","doi-asserted-by":"publisher","first-page":"1405","DOI":"10.1109\/TII.2021.3088407","volume":"18","author":"K Zhang","year":"2021","unstructured":"Zhang K, Cao J, Zhang Y (2021) Adaptive digital twin and multiagent deep reinforcement learning for vehicular edge computing and networks. IEEE Trans Ind Inform 18(2):1405\u20131413","journal-title":"IEEE Trans Ind Inform"},{"issue":"8","key":"11340_CR179","doi-asserted-by":"publisher","first-page":"2501","DOI":"10.1109\/JSAC.2021.3087244","volume":"39","author":"M Zhang","year":"2021","unstructured":"Zhang M, Dou Y, Chong PHJ, Chan HC, Seet BC (2021) Fuzzy logic-based resource allocation algorithm for V2X communications in 5G cellular networks. IEEE J Sel Areas Commun 39(8):2501\u20132513","journal-title":"IEEE J Sel Areas Commun"},{"issue":"12","key":"11340_CR180","doi-asserted-by":"publisher","first-page":"3688","DOI":"10.1109\/JSAC.2021.3118352","volume":"39","author":"W Zhang","year":"2021","unstructured":"Zhang W, Yang D, Wu W, Peng H, Zhang N, Zhang H, Shen X (2021) Optimizing federated learning in distributed industrial iot: a multi-agent approach. 
IEEE J Sel Areas Commun 39(12):3688\u20133703","journal-title":"IEEE J Sel Areas Commun"},{"key":"11340_CR181","doi-asserted-by":"publisher","DOI":"10.1016\/j.rcim.2022.102412","volume":"78","author":"Y Zhang","year":"2022","unstructured":"Zhang Y, Zhu H, Tang D, Zhou T, Gui Y (2022) Dynamic job shop scheduling based on deep reinforcement learning for multi-agent manufacturing systems. Robotics and Computer-Integrated Manufacturing 78:102412","journal-title":"Robotics and Computer-Integrated Manufacturing"},{"key":"11340_CR182","doi-asserted-by":"publisher","DOI":"10.1016\/j.enconman.2022.116647","volume":"277","author":"B Zhang","year":"2023","unstructured":"Zhang B, Hu W, Ghias AM, Xu X, Chen Z (2023) Multi-agent deep reinforcement learning based distributed control architecture for interconnected multi-energy microgrid energy management and optimization. Energy Convers Manage 277:116647","journal-title":"Energy Convers Manage"},{"key":"11340_CR183","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2022.110083","volume":"259","author":"JD Zhang","year":"2023","unstructured":"Zhang JD, He Z, Chan WH, Chow CY (2023) Deepmag: deep reinforcement learning with multi-agent graphs for flexible job shop scheduling. Knowledge-Based Systems 259:110083","journal-title":"Knowledge-Based Systems"},{"key":"11340_CR184","doi-asserted-by":"publisher","DOI":"10.1109\/TVT.2024.3405642","author":"H Zhang","year":"2024","unstructured":"Zhang H, Zhao H, Liu R, Kaushik A, Gao X, Xu S (2024) Collaborative task offloading optimization for satellite mobile edge computing using multi-agent deep reinforcement learning. IEEE Trans Veh Technol. 
https:\/\/doi.org\/10.1109\/TVT.2024.3405642","journal-title":"IEEE Trans Veh Technol"},{"key":"11340_CR185","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2024.3392587","author":"Y Zhang","year":"2024","unstructured":"Zhang Y, Zheng G, Liu Z, Li Q, Zeng H (2024) Marlens: understanding multi-agent reinforcement learning for traffic signal control via visual analytics. IEEE Trans Vis Comput Graph. https:\/\/doi.org\/10.1109\/TVCG.2024.3392587","journal-title":"IEEE Trans Vis Comput Graph"},{"key":"11340_CR186","unstructured":"Zhang J, Liu Z, Zhu Y, Shi E, Xu B, Yuen C, Niyato D, Debbah M, Jin S, Ai B et al (2025) Multi-agent reinforcement learning in wireless distributed networks for 6g. arXiv:2502.05812"},{"issue":"9","key":"11340_CR187","doi-asserted-by":"publisher","first-page":"6949","DOI":"10.1109\/TWC.2022.3153316","volume":"21","author":"N Zhao","year":"2022","unstructured":"Zhao N, Ye Z, Pei Y, Liang YC, Niyato D (2022) Multi-agent deep reinforcement learning for task offloading in UAV-assisted mobile edge computing. IEEE Trans Wirel Commun 21(9):6949\u20136960","journal-title":"IEEE Trans Wirel Commun"},{"key":"11340_CR188","unstructured":"Zhao J, Hu F, Li J, Nie Y (2023) Multi-agent deep reinforcement learning based resource management in heterogeneous v2x networks. Digit Commun Netw"},{"key":"11340_CR189","doi-asserted-by":"publisher","first-page":"16","DOI":"10.1016\/j.jmsy.2021.08.002","volume":"61","author":"P Zheng","year":"2021","unstructured":"Zheng P, Xia L, Li C, Li X, Liu B (2021) Towards self-x cognitive manufacturing network: an industrial knowledge graph-based multi-agent reinforcement learning approach. J Manuf Syst 61:16\u201326","journal-title":"J Manuf Syst"},{"key":"11340_CR190","first-page":"1","volume":"25","author":"Y Zhong","year":"2024","unstructured":"Zhong Y, Kuba JG, Feng X, Hu S, Ji J, Yang Y (2024) Heterogeneous-agent reinforcement learning. 
J Mach Learn Res 25:1\u201367","journal-title":"J Mach Learn Res"},{"issue":"12","key":"11340_CR191","doi-asserted-by":"publisher","first-page":"9595","DOI":"10.1109\/TWC.2023.3272348","volume":"22","author":"H Zhou","year":"2023","unstructured":"Zhou H, Jiang K, He S, Min G, Wu J (2023) Distributed deep multi-agent reinforcement learning for cooperative edge caching in internet-of-vehicles. IEEE Trans Wireless Commun 22(12):9595\u20139609","journal-title":"IEEE Trans Wireless Commun"},{"issue":"12","key":"11340_CR192","doi-asserted-by":"publisher","first-page":"9763","DOI":"10.1109\/JIOT.2020.3040768","volume":"8","author":"X Zhu","year":"2020","unstructured":"Zhu X, Luo Y, Liu A, Bhuiyan MZA, Zhang S (2020) Multiagent deep reinforcement learning for vehicular computation offloading in IoT. IEEE Internet Things J 8(12):9763\u20139773","journal-title":"IEEE Internet Things J"}],"container-title":["Artificial Intelligence Review"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10462-025-11340-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10462-025-11340-5\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10462-025-11340-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,25]],"date-time":"2025-10-25T03:32:34Z","timestamp":1761363154000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10462-025-11340-5"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,8,27]]},"references-count":192,"journal-issue":{"issue":"11","published-online":{"date-parts":[[2025,11]]}},"alternative-id":["11340"],"URL":"https:\/\/doi.org\/10.1007\/s10462-025-11340-5","relation
":{},"ISSN":["1573-7462"],"issn-type":[{"value":"1573-7462","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,8,27]]},"assertion":[{"value":"29 July 2025","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"27 August 2025","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}],"article-number":"354"}}