{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,28]],"date-time":"2026-03-28T16:01:05Z","timestamp":1774713665548,"version":"3.50.1"},"reference-count":40,"publisher":"MDPI AG","issue":"5","license":[{"start":{"date-parts":[[2023,2,21]],"date-time":"2023-02-21T00:00:00Z","timestamp":1676937600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Intelligent traffic management systems have become one of the main applications of Intelligent Transportation Systems (ITS). There is a growing interest in Reinforcement Learning (RL) based control methods in ITS applications such as autonomous driving and traffic management solutions. Deep learning helps in approximating substantially complex nonlinear functions from complicated data sets and tackling complex control issues. In this paper, we propose an approach based on Multi-Agent Reinforcement Learning (MARL) and smart routing to improve the flow of autonomous vehicles on road networks. We evaluate Multi-Agent Advantage Actor-Critic (MA2C) and Independent Advantage Actor-Critic (IA2C), recently suggested Multi-Agent Reinforcement Learning techniques with smart routing for traffic signal optimization to determine its potential. We investigate the framework offered by non-Markov decision processes, enabling a more in-depth understanding of the algorithms. We conduct a critical analysis to observe the robustness and effectiveness of the method. The method\u2019s efficacy and reliability are demonstrated by simulations using SUMO, a software modeling tool for traffic simulations. We used a road network that contains seven intersections. 
Our findings show that MA2C, when trained on pseudo-random vehicle flows, is a viable methodology that outperforms competing techniques.<\/jats:p>","DOI":"10.3390\/s23052373","type":"journal-article","created":{"date-parts":[[2023,2,22]],"date-time":"2023-02-22T02:08:34Z","timestamp":1677031714000},"page":"2373","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":40,"title":["Multi-Agent Reinforcement Learning for Traffic Flow Management of Autonomous Vehicles"],"prefix":"10.3390","volume":"23","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4503-7523","authenticated-orcid":false,"given":"Anum","family":"Mushtaq","sequence":"first","affiliation":[{"name":"Pakistan Institute of Engineering and Applied Sciences, Islamabad 44000, Pakistan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5142-3965","authenticated-orcid":false,"given":"Irfan Ul","family":"Haq","sequence":"additional","affiliation":[{"name":"Pakistan Institute of Engineering and Applied Sciences, Islamabad 44000, Pakistan"}]},{"given":"Muhammad Azeem","family":"Sarwar","sequence":"additional","affiliation":[{"name":"Pakistan Institute of Engineering and Applied Sciences, Islamabad 44000, Pakistan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2039-5305","authenticated-orcid":false,"given":"Asifullah","family":"Khan","sequence":"additional","affiliation":[{"name":"Pakistan Institute of Engineering and Applied Sciences, Islamabad 44000, Pakistan"},{"name":"PIEAS Artificial Intelligence Center (PAIC), Islamabad 44000, Pakistan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2274-6770","authenticated-orcid":false,"given":"Wajeeha","family":"Khalil","sequence":"additional","affiliation":[{"name":"Department of CS and IT, University of Engineering and Technology, Peshawar 25000, Pakistan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6148-9634","authenticated-orcid":false,"given":"Muhammad 
Abid","family":"Mughal","sequence":"additional","affiliation":[{"name":"Pakistan Institute of Engineering and Applied Sciences, Islamabad 44000, Pakistan"}]}],"member":"1968","published-online":{"date-parts":[[2023,2,21]]},"reference":[{"key":"ref_1","first-page":"446","article-title":"The economics of traffic congestion","volume":"82","author":"Arnott","year":"1994","journal-title":"Am. Sci."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"284","DOI":"10.1007\/s40534-016-0117-3","article-title":"Autonomous vehicles: Challenges, opportunities, and future implications for transportation policies","volume":"24","author":"Bagloee","year":"2016","journal-title":"J. Mod. Transp."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"167","DOI":"10.1016\/j.tra.2015.04.003","article-title":"Preparing a nation for autonomous vehicles: Opportunities, barriers and policy recommendations","volume":"77","author":"Fagnant","year":"2015","journal-title":"Transp. Res. Part A Policy Pract."},{"key":"ref_4","unstructured":"Anderson, J.M., Nidhi, K., Stanley, K.D., Sorensen, P., Samaras, C., and Oluwatola, O.A. (2014). Autonomous Vehicle Technology: A Guide for Policymakers, Rand Corporation."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Folsom, T.C. (2011, January 23\u201325). Social ramifications of autonomous urban land vehicles. Proceedings of the 2011 IEEE International Symposium on Technology and Society (ISTAS), Chicago, IL, USA.","DOI":"10.1109\/ISTAS.2011.7160596"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"659","DOI":"10.1080\/01441640801987825","article-title":"Advanced driver assistance systems from autonomous to cooperative approach","volume":"28","author":"Piao","year":"2008","journal-title":"Transp. 
Rev."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"744","DOI":"10.3390\/vehicles4030042","article-title":"Exploring Smart Tires as a Tool to Assist Safe Driving and Monitor Tire\u2013Road Friction","volume":"4","author":"Pomoni","year":"2022","journal-title":"Vehicles"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Mushtaq, A., Sarwar, M.A., Khan, A., and Shafiq, O. (2022). Traffic Management of Autonomous Vehicles Using Policy Based Deep Reinforcement Learning and Intelligent Routing. arXiv.","DOI":"10.1109\/ACCESS.2021.3063463"},{"key":"ref_9","unstructured":"Lijding, M., Benz, H., Meratnia, N., and Havinga, P. (2006). Smart Signs: Showing the Way in Smart Surroundings, Centre for Telematics and Information Technology, University of Twente. Technical Report TR-CTIT-06-20v."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Bimbraw, K. (2015, January 21\u201323). Autonomous cars: Past, present and future a review of the developments in the last century, the present scenario and the expected future of autonomous vehicle technology. Proceedings of the 2015 12th International Conference on Informatics in Control, Automation and Robotics (ICINCO), Colmar, France.","DOI":"10.5220\/0005540501910198"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"417","DOI":"10.1049\/iet-its.2017.0153","article-title":"Traffic light control using deep policy-gradient and value-function-based reinforcement learning","volume":"11","author":"Mousavi","year":"2017","journal-title":"IET Intell. Transp. Syst."},{"key":"ref_12","unstructured":"Genders, W., and Razavi, S. (2016). Using a deep reinforcement learning agent for traffic signal control. arXiv."},{"key":"ref_13","unstructured":"Bakker, B., Whiteson, S., Kester, L., and Groen, F.C. (2010). Interactive Collaborative Information Systems, Springer."},{"key":"ref_14","unstructured":"Casas, N. (2017). Deep deterministic policy gradient for urban traffic light control. 
arXiv."},{"key":"ref_15","first-page":"1","article-title":"Algorithms for reinforcement learning","volume":"4","year":"2010","journal-title":"Synth. Lect. Artif. Intell. Mach. Learn."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Canese, L., Cardarilli, G.C., Di Nunzio, L., Fazzolari, R., Giardino, D., Re, M., and Span\u00f2, S. (2021). Multi-Agent Reinforcement Learning: A Review of Challenges and Applications. Appl. Sci., 11.","DOI":"10.1038\/s41598-021-94691-7"},{"key":"ref_17","unstructured":"Zhang, K., Yang, Z., and Ba\u015far, T. (2021). Handbook of Reinforcement Learning and Control, Springer."},{"key":"ref_18","unstructured":"Bu\u015foniu, L., Babu\u0161ka, R., and De Schutter, B. (2010). Innovations in Multi-Agent Systems and Applications-1, Springer."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Tan, M. (1993, January 27\u201329). Multi-Agent Reinforcement Learning: Independent vs. cooperative agents. Proceedings of the Tenth International Conference on Machine Learning, Amherst, MA, USA.","DOI":"10.1016\/B978-1-55860-307-3.50049-6"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"569","DOI":"10.1613\/jair.898","article-title":"Accelerating reinforcement learning through implicit imitation","volume":"19","author":"Price","year":"2003","journal-title":"J. Artif. Intell. Res."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"1243","DOI":"10.1109\/TVT.2018.2890726","article-title":"A deep reinforcement learning network for traffic light cycle control","volume":"68","author":"Liang","year":"2019","journal-title":"IEEE Trans. Veh. Technol."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"342","DOI":"10.1007\/s10458-008-9062-9","article-title":"Opportunities for multiagent systems and multiagent reinforcement learning in traffic control","volume":"18","author":"Bazzan","year":"2009","journal-title":"Auton. Agents Multi-Agent Syst."},{"key":"ref_23","unstructured":"Hu, J., and Wellman, M.P. 
(1998, January 24\u201327). Multiagent reinforcement learning: Theoretical framework and an algorithm. Proceedings of the Fifteenth International Conference on Machine Learning (ICML 1998), Madison, WI, USA."},{"key":"ref_24","unstructured":"Lanctot, M., Zambaldi, V., Gruslys, A., Lazaridou, A., Tuyls, K., P\u00e9rolat, J., Silver, D., and Graepel, T. (2017). A unified game-theoretic approach to multiagent reinforcement learning. arXiv."},{"key":"ref_25","unstructured":"Foerster, J.N., Assael, Y.M., De Freitas, N., and Whiteson, S. (2016). Learning to communicate with deep Multi-Agent Reinforcement Learning. arXiv."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"156","DOI":"10.1109\/TSMCC.2007.913919","article-title":"A comprehensive survey of multiagent reinforcement learning","volume":"38","author":"Busoniu","year":"2008","journal-title":"IEEE Trans. Syst. Man Cybern. Part C (Appl. Rev.)"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Prabuchandran, K., AN, H.K., and Bhatnagar, S. (2014, January 8\u201311). Multi-Agent Reinforcement Learning for traffic signal control. Proceedings of the 17th International IEEE Conference on Intelligent Transportation Systems (ITSC), Qingdao, China.","DOI":"10.1109\/ITSC.2014.6958095"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"174","DOI":"10.1109\/TCYB.2020.3015811","article-title":"Large-scale traffic signal control using a novel multiagent reinforcement learning","volume":"51","author":"Wang","year":"2020","journal-title":"IEEE Trans. Cybern."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"1140","DOI":"10.1109\/TITS.2013.2255286","article-title":"Multiagent reinforcement learning for integrated network of adaptive traffic signal controllers (MARLIN-ATSC): Methodology and large-scale application on downtown Toronto","volume":"14","author":"Abdulhai","year":"2013","journal-title":"IEEE Trans. Intell. Transp. 
Syst."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"51005","DOI":"10.1109\/ACCESS.2021.3063463","article-title":"Traffic flow management of autonomous vehicles using deep reinforcement learning and smart rerouting","volume":"9","author":"Mushtaq","year":"2021","journal-title":"IEEE Access"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Mushtaq, A., Haq, I.U., Nabi, W.U., Khan, A., and Shafiq, O. (2021). Traffic Flow Management of Autonomous Vehicles Using Platooning and Collision Avoidance Strategies. Electronics, 10.","DOI":"10.3390\/electronics10101221"},{"key":"ref_32","unstructured":"O\u2019Donoghue, B., Osband, I., Munos, R., and Mnih, V. (2018, January 10\u201315). The uncertainty bellman equation and exploration. Proceedings of the International Conference on Machine Learning, Stockholm, Sweden."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"1231","DOI":"10.1016\/j.asoc.2011.11.011","article-title":"Fuzzy Dijkstra algorithm for shortest path problem under uncertain environment","volume":"12","author":"Deng","year":"2012","journal-title":"Appl. Soft Comput."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Clark, J., and Daigle, G. (1997, January 7\u201310). The importance of simulation techniques in ITS research and analysis. Proceedings of the 29th Conference on Winter Simulation, Atlanta, GA, USA.","DOI":"10.1145\/268437.268766"},{"key":"ref_35","unstructured":"Krajzewicz, D. (2010). 
Fundamentals of Traffic Simulation, Springer."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"153","DOI":"10.1016\/j.proeng.2016.01.053","article-title":"Simulation of traffic flows on the road network of urban area","volume":"134","author":"Ugnenko","year":"2016","journal-title":"Procedia Eng."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"1086","DOI":"10.1109\/TITS.2019.2901791","article-title":"Multi-Agent deep reinforcement learning for large-scale traffic signal control","volume":"21","author":"Chu","year":"2019","journal-title":"IEEE Trans. Intell. Transp. Syst."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"2182","DOI":"10.1109\/TITS.2016.2517079","article-title":"A hierarchical model predictive control approach for signal splits optimization in large-scale urban road networks","volume":"17","author":"Ye","year":"2016","journal-title":"IEEE Trans. Intell. Transp. Syst."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"19","DOI":"10.1016\/j.trc.2004.12.004","article-title":"A fuzzy logic multi-phased signal control model for isolated junctions","volume":"13","author":"Murat","year":"2005","journal-title":"Transp. Res. Part C Emerg. Technol."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"50","DOI":"10.1186\/s12544-020-00440-8","article-title":"The traffic signal control problem for intersections: A review","volume":"12","author":"Eom","year":"2020","journal-title":"Eur. Transp. Res. 
Rev."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/5\/2373\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T18:38:05Z","timestamp":1760121485000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/5\/2373"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,2,21]]},"references-count":40,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2023,3]]}},"alternative-id":["s23052373"],"URL":"https:\/\/doi.org\/10.3390\/s23052373","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,2,21]]}}}