{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,15]],"date-time":"2026-04-15T21:44:19Z","timestamp":1776289459403,"version":"3.50.1"},"reference-count":58,"publisher":"MDPI AG","issue":"7","license":[{"start":{"date-parts":[[2022,4,6]],"date-time":"2022-04-06T00:00:00Z","timestamp":1649203200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Existing inefficient traffic signal plans cause traffic congestion in many urban areas. In recent years, many deep reinforcement learning (RL) methods have been proposed to control traffic signals in real time by interacting with the environment. However, most existing state-of-the-art RL methods use complex state definitions and reward functions and\/or neglect real-world constraints such as cyclic phase order and minimum\/maximum duration for each traffic phase. These issues make existing methods infeasible to implement in real-world applications. In this paper, we propose an RL-based multi-intersection traffic light control model with a simple yet effective combination of state, reward, and action definitions. The proposed model uses a novel pressure method called Biased Pressure (BP). We use a state-of-the-art advantage actor-critic learning mechanism in our model. Due to the decentralized nature of our state, reward, and action definitions, we achieve a scalable model. The performance of the proposed method is compared with related methods using both synthetic and real-world datasets. Experimental results show that our method outperforms the existing cyclic phase control methods by a significant margin in terms of throughput and average travel time. 
Moreover, we conduct ablation studies to justify the superiority of the BP method over the existing pressure methods.<\/jats:p>","DOI":"10.3390\/s22072818","type":"journal-article","created":{"date-parts":[[2022,4,7]],"date-time":"2022-04-07T21:08:22Z","timestamp":1649365702000},"page":"2818","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":16,"title":["Biased Pressure: Cyclic Reinforcement Learning Model for Intelligent Traffic Signal Control"],"prefix":"10.3390","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7380-6916","authenticated-orcid":false,"given":"Bunyodbek","family":"Ibrokhimov","sequence":"first","affiliation":[{"name":"Department of Computer Engineering, Inha University, Inha-ro 100, Nam-gu, Incheon 22212, Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Young-Joo","family":"Kim","sequence":"additional","affiliation":[{"name":"Electronics and Telecommunications Research Institute (ETRI), 218 Gajeong-ro, Yuseong-gu, Daejeon 34129, Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sanggil","family":"Kang","sequence":"additional","affiliation":[{"name":"Department of Computer Engineering, Inha University, Inha-ro 100, Nam-gu, Incheon 22212, Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2022,4,6]]},"reference":[{"key":"ref_1","unstructured":"McCarthy, N. (2022, January 29). Traffic Congestion Costs U.S. Cities Billions of Dollars Every Year. Forbes. Available online: https:\/\/www.forbes.com\/sites\/niallmccarthy\/2020\/03\/10\/traffic-congestion-costs-us-cities-billions-of-dollars-every-year-infographic."},{"key":"ref_2","unstructured":"Lee, S. (2022, January 29). Transport System Management (TSM). Seoul Solution. Available online: https:\/\/www.seoulsolution.kr\/en\/node\/6537."},{"key":"ref_3","unstructured":"Wei, H., Zheng, G., Gayah, V., and Li, Z. (2019). 
A survey on traffic signal control methods. arXiv."},{"key":"ref_4","unstructured":"Schrank, D., Eisele, B., Lomax, T., and Bak, J. (2015). 2015 Urban Mobility Scorecard, The Texas A&M Transportation Institute and INRIX."},{"key":"ref_5","unstructured":"Lowrie, P.R. (1992). SCATS\u2013A Traffic Responsive Method of Controlling Urban Traffic, Roads and Traffic Authority."},{"key":"ref_6","first-page":"190","article-title":"The SCOOT on-line traffic signal optimisation technique","volume":"23","author":"Hunt","year":"1982","journal-title":"Traffic Eng. Control."},{"key":"ref_7","unstructured":"Koonce, P., and Rodegerdts, L. (2008). Traffic Signal Timing Manual (No. FHWA-HOP-08-024), Federal Highway Administration."},{"key":"ref_8","unstructured":"Roess, R.P., Prassas, E.S., and McShane, W.R. (2004). Traffic Engineering, Pearson\/Prentice Hall."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"607","DOI":"10.1016\/j.trc.2009.04.001","article-title":"Evaluation of green wave policy in real-time railway traffic management","volume":"17","author":"Corman","year":"2009","journal-title":"Transp. Res. Part C Emerg. Technol."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"14","DOI":"10.4236\/wjet.2014.23B003","article-title":"Green-wave traffic theory optimization and analysis","volume":"2","author":"Wu","year":"2014","journal-title":"World J. Eng. Technol."},{"key":"ref_11","unstructured":"Vinod Chandra, S.S. (2020). A multi-agent ant colony optimization algorithm for effective vehicular traffic management. Proceedings of the International Conference on Swarm Intelligence, Belgrade, Serbia, 14\u201320 July 2020, Springer."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Sattari, M.R.J., Malakooti, H., Jalooli, A., and Noor, R.M. (2014). A Dynamic Vehicular Traffic Control Using Ant Colony and Traffic Light Optimization. 
Advances in Systems Science, Springer.","DOI":"10.1007\/978-3-319-01857-7_6"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Gao, K., Zhang, Y., Sadollah, A., and Su, R. (2017, June 5\u20138). Improved artificial bee colony algorithm for solving urban traffic light scheduling problem. Proceedings of the 2017 IEEE Congress on Evolutionary Computation (CEC), Donostia, Spain.","DOI":"10.1109\/CEC.2017.7969339"},{"key":"ref_14","unstructured":"Zhao, C., Hu, X., and Wang, G. (2020). PDLight: A Deep Reinforcement Learning Traffic Light Control Algorithm with Pressure and Dynamic Light Duration. arXiv."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"1575","DOI":"10.1016\/j.engappai.2013.01.007","article-title":"Holonic multi-agent system for traffic signals control","volume":"26","author":"Abdoos","year":"2013","journal-title":"Eng. Appl. Artif. Intell."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Wei, H., Zheng, G., Yao, H., and Li, Z. (2018, August 19\u201323). IntelliLight: A reinforcement learning approach for intelligent traffic light control. Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK.","DOI":"10.1145\/3219819.3220096"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"732","DOI":"10.1016\/j.trc.2017.09.020","article-title":"Adaptive traffic signal control with actor-critic methods in a real-world traffic network with different traffic disruption events","volume":"85","author":"Aslani","year":"2017","journal-title":"Transp. Res. Part C Emerg. Technol."},{"key":"ref_18","unstructured":"Casas, N. (2017). Deep deterministic policy gradient for urban traffic light control. arXiv."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"247","DOI":"10.1109\/JAS.2016.7508798","article-title":"Traffic signal timing via deep reinforcement learning","volume":"3","author":"Li","year":"2016","journal-title":"IEEE\/CAA J. Autom. 
Sin."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"417","DOI":"10.1049\/iet-its.2017.0153","article-title":"Traffic light control using deep policy-gradient and value-function-based reinforcement learning","volume":"11","author":"Mousavi","year":"2017","journal-title":"IET Intell. Transp. Syst."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"1243","DOI":"10.1109\/TVT.2018.2890726","article-title":"A Deep Reinforcement Learning Network for Traffic Light Cycle Control","volume":"68","author":"Liang","year":"2019","journal-title":"IEEE Trans. Veh. Technol."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Abdoos, M., Mozayani, N., and Bazzan, A.L. (2011, October 5\u20137). Traffic light control in non-stationary environments based on multi agent Q-learning. Proceedings of the 2011 14th International IEEE Conference on Intelligent Transportation Systems (ITSC), Washington, DC, USA.","DOI":"10.1109\/ITSC.2011.6083114"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Wei, H., Xu, N., Zhang, H., Zheng, G., Zang, X., Chen, C., Zhang, W., Zhu, Y., Xu, K., and Li, Z. (2019, November 3\u20137). CoLight: Learning network-level cooperation for traffic signal control. Proceedings of the 28th ACM International Conference on Information and Knowledge Management, Beijing, China.","DOI":"10.1145\/3357384.3357902"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Chen, C., Wei, H., Xu, N., Zheng, G., Yang, M., Xiong, Y., Xu, K., and Li, Z. (2020, February 7\u201312). Toward a Thousand Lights: Decentralized Deep Reinforcement Learning for Large-Scale Traffic Signal Control. Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA.","DOI":"10.1609\/aaai.v34i04.5744"},{"key":"ref_25","unstructured":"Van der Pol, E., and Oliehoek, F.A. (2016). Coordinated deep reinforcement learners for traffic light control. 
Proceedings of the Learning, Inference and Control of Multi-Agent Systems (at NIPS 2016), 9 December 2016, Barcelona, Spain."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"279","DOI":"10.1007\/BF00992698","article-title":"Q-learning","volume":"8","author":"Watkins","year":"1992","journal-title":"Mach. Learn."},{"key":"ref_27","unstructured":"Sutton, R.S., McAllester, D.A., Singh, S.P., and Mansour, Y. (1999). Policy gradient methods for reinforcement learning with function approximation. Proceedings of NIPS 1999, Denver, CO, USA."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"373","DOI":"10.1057\/jors.1963.61","article-title":"Settings for fixed-cycle traffic signals","volume":"14","author":"Miller","year":"1963","journal-title":"J. Oper. Res. Soc."},{"key":"ref_29","unstructured":"Little, J.D., Kelson, M.D., and Gartner, N.H. (1981). MAXBAND: A Versatile Program for Setting Signals on Arteries and Triangular Networks, Springer."},{"key":"ref_30","unstructured":"Gershenson, C. (2004). Self-organizing traffic lights. arXiv."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Cools, S.-B., Gershenson, C., and D\u2019Hooghe, B. (2013). Self-Organizing Traffic Lights: A Realistic Simulation. Advances in Applied Self-Organizing Systems, Springer.","DOI":"10.1007\/978-1-4471-5113-5_3"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Varaiya, P. (2013). The Max-Pressure Controller for Arbitrary Networks of Signalized Intersections. Advances in Dynamic Network Modeling in Complex Transportation Systems, Springer.","DOI":"10.1007\/978-1-4614-6243-9_2"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"177","DOI":"10.1016\/j.trc.2013.08.014","article-title":"Max pressure control of a network of signalized intersections","volume":"36","author":"Varaiya","year":"2013","journal-title":"Transp. Res. Part C Emerg. 
Technol."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"1086","DOI":"10.1109\/TITS.2019.2901791","article-title":"Multi-Agent Deep Reinforcement Learning for Large-Scale Traffic Signal Control","volume":"21","author":"Chu","year":"2019","journal-title":"IEEE Trans. Intell. Transp. Syst."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"431","DOI":"10.1016\/j.trc.2014.11.009","article-title":"Decentralized signal control for urban road networks","volume":"58","author":"Le","year":"2015","journal-title":"Transp. Res. Part C Emerg. Technol."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Kuyer, L., Whiteson, S., Bakker, B., and Vlassis, N. (2008). Multiagent reinforcement learning for urban traffic control using coordination graphs. Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Springer.","DOI":"10.1007\/978-3-540-87479-9_61"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"128","DOI":"10.1049\/iet-its.2009.0070","article-title":"Reinforcement learning-based multi-agent system for network traffic signal control","volume":"4","author":"Arel","year":"2010","journal-title":"IET Intell. Transp. Syst."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"1140","DOI":"10.1109\/TITS.2013.2255286","article-title":"Multiagent Reinforcement Learning for Integrated Network of Adaptive Traffic Signal Controllers (MARLIN-ATSC): Methodology and Large-Scale Application on Downtown Toronto","volume":"14","author":"El-Tantawy","year":"2013","journal-title":"IEEE Trans. Intell. Transp. Syst."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Wei, H., Chen, C., Zheng, G., Wu, K., Gayah, V., Xu, K., and Li, Z. (2019, August 4\u20138). PressLight: Learning max pressure control to coordinate traffic signals in arterial network. 
Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA.","DOI":"10.1145\/3292500.3330949"},{"key":"ref_40","unstructured":"Krajzewicz, D., Erdmann, J., Behrisch, M., and Bieker, L. (2012). Recent development and applications of SUMO \u2013 Simulation of Urban MObility. Int. J. Adv. Syst. Meas., 5."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"201","DOI":"10.1007\/s10489-013-0455-3","article-title":"Hierarchical control of traffic signals using Q-learning with tile coding","volume":"40","author":"Abdoos","year":"2014","journal-title":"Appl. Intell."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"278","DOI":"10.1061\/(ASCE)0733-947X(2003)129:3(278)","article-title":"Reinforcement Learning for True Adaptive Traffic Signal Control","volume":"129","author":"Abdulhai","year":"2003","journal-title":"J. Transp. Eng."},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"962869","DOI":"10.1155\/2013\/962869","article-title":"The Study of Reinforcement Learning for Traffic Self-Adaptive Control under Multiagent Markov Game Environment","volume":"2013","author":"Xu","year":"2013","journal-title":"Math. Probl. Eng."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"332","DOI":"10.3182\/20140514-3-FR-4046.00128","article-title":"Control experiments for a network of signalized intersections using the \u2018Q\u2019 simulator","volume":"47","author":"Lioris","year":"2014","journal-title":"IFAC Proc. Vol."},{"key":"ref_45","unstructured":"Kalashnikov, D., Irpan, A., Pastor, P., Ibarz, J., Herzog, A., Jang, E., Quillen, D., Holly, E., Kalakrishnan, M., and Vanhoucke, V. (2018). QT-Opt: Scalable deep reinforcement learning for vision-based robotic manipulation. 
arXiv."},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"653","DOI":"10.1109\/TNNLS.2016.2522401","article-title":"Deep Direct Reinforcement Learning for Financial Signal Representation and Trading","volume":"28","author":"Deng","year":"2016","journal-title":"IEEE Trans. Neural Networks Learn. Syst."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Pan, X., You, Y., Wang, Z., and Lu, C. (2017). Virtual to Real Reinforcement Learning for Autonomous Driving. arXiv.","DOI":"10.5244\/C.31.11"},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"16","DOI":"10.1038\/s41591-018-0310-5","article-title":"Guidelines for reinforcement learning in healthcare","volume":"25","author":"Gottesman","year":"2019","journal-title":"Nat. Med."},{"key":"ref_49","unstructured":"Fran\u00e7ois-Lavet, V., Taralla, D., Ernst, D., and Fonteneau, R. (2016, December 3\u20134). Deep reinforcement learning solutions for energy microgrids management. Proceedings of the European Workshop on Reinforcement Learning (EWRL 2016), Barcelona, Spain."},{"key":"ref_50","unstructured":"Gauci, J., Conti, E., Liang, Y., Virochsiri, K., He, Y., Kaden, Z., Narayanan, V., Ye, X., Chen, Z., and Fujimoto, S. (2018). Horizon: Facebook\u2019s open source applied reinforcement learning platform. arXiv."},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"58","DOI":"10.1145\/203330.203343","article-title":"Temporal difference learning and TD-Gammon","volume":"38","author":"Tesauro","year":"1995","journal-title":"Commun. 
ACM"},{"key":"ref_52","doi-asserted-by":"crossref","first-page":"529","DOI":"10.1038\/nature14236","article-title":"Human-level control through deep reinforcement learning","volume":"518","author":"Mnih","year":"2015","journal-title":"Nature"},{"key":"ref_53","doi-asserted-by":"crossref","first-page":"484","DOI":"10.1038\/nature16961","article-title":"Mastering the game of Go with deep neural networks and tree search","volume":"529","author":"Silver","year":"2016","journal-title":"Nature"},{"key":"ref_54","doi-asserted-by":"crossref","first-page":"229","DOI":"10.1007\/BF00992696","article-title":"Simple statistical gradient-following algorithms for connectionist reinforcement learning","volume":"8","author":"Williams","year":"1992","journal-title":"Mach. Learn."},{"key":"ref_55","unstructured":"Konda, V.R., and Tsitsiklis, J.N. (2000). Actor-critic algorithms. Advances in Neural Information Processing Systems, MIT Press."},{"key":"ref_56","doi-asserted-by":"crossref","first-page":"27","DOI":"10.1177\/0361198196153800104","article-title":"Field studies of pedestrian walking speed and start-up time","volume":"1538","author":"Knoblauch","year":"1996","journal-title":"Transp. Res. Rec."},{"key":"ref_57","doi-asserted-by":"crossref","unstructured":"Kuutti, S., Bowden, R., Joshi, H., de Temple, R., and Fallah, S. (2019, October 27\u201330). End-to-end Reinforcement Learning for Autonomous Longitudinal Control Using Advantage Actor Critic with Temporal Context. Proceedings of the 2019 IEEE Intelligent Transportation Systems Conference (ITSC), Auckland, New Zealand.","DOI":"10.1109\/ITSC.2019.8917387"},{"key":"ref_58","doi-asserted-by":"crossref","unstructured":"Zhang, H., Feng, S., Liu, C., Ding, Y., Zhu, Y., Zhou, Z., Zhang, W., Yu, Y., Jin, H., and Li, Z. (2019, May 13\u201317). CityFlow: A Multi-Agent Reinforcement Learning Environment for Large Scale City Traffic Scenario. 
Proceedings of the World Wide Web Conference, San Francisco, CA, USA.","DOI":"10.1145\/3308558.3314139"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/7\/2818\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T22:49:24Z","timestamp":1760136564000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/7\/2818"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,4,6]]},"references-count":58,"journal-issue":{"issue":"7","published-online":{"date-parts":[[2022,4]]}},"alternative-id":["s22072818"],"URL":"https:\/\/doi.org\/10.3390\/s22072818","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,4,6]]}}}