{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,16]],"date-time":"2026-03-16T15:01:17Z","timestamp":1773673277518,"version":"3.50.1"},"reference-count":35,"publisher":"MDPI AG","issue":"10","license":[{"start":{"date-parts":[[2023,5,11]],"date-time":"2023-05-11T00:00:00Z","timestamp":1683763200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100012166","name":"National Key Research and Development Program of China","doi-asserted-by":"publisher","award":["2019YFB1600100"],"award-info":[{"award-number":["2019YFB1600100"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>In advanced transportation-management systems, variable speed limits are a crucial application. Deep reinforcement learning methods have been shown to have superior performance in many applications, as they are an effective approach to learning environment dynamics for decision-making and control. However, they face two significant difficulties in traffic-control applications: reward engineering with delayed reward and brittle convergence properties with gradient descent. To address these challenges, evolutionary strategies are well suited as a class of black-box optimization techniques inspired by natural evolution. Additionally, the traditional deep reinforcement learning framework struggles to handle the delayed reward setting. This paper proposes a novel approach using covariance matrix adaptation evolution strategy (CMA-ES), a gradient-free global optimization method, to handle the task of multi-lane differential variable speed limit control. The proposed method uses a deep-learning-based method to dynamically learn optimal and distinct speed limits among lanes. 
The parameters of the neural network are sampled using a multivariate normal distribution, and the dependencies between the variables are represented by a covariance matrix that is optimized dynamically by CMA-ES based on the freeway\u2019s throughput. The proposed approach is tested on a freeway with simulated recurrent bottlenecks, and the experimental results show that it outperforms deep reinforcement learning-based approaches, traditional evolutionary search methods, and the no-control scenario. Our proposed method demonstrates a 23% improvement in average travel time and an average of a 4% improvement in CO, HC, and NOx emission. Furthermore, the proposed method produces explainable speed limits and has desirable generalization power.<\/jats:p>","DOI":"10.3390\/s23104659","type":"journal-article","created":{"date-parts":[[2023,5,11]],"date-time":"2023-05-11T05:05:02Z","timestamp":1683781502000},"page":"4659","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["Multi-Lane Differential Variable Speed Limit Control via Deep Neural Networks Optimized by an Adaptive Evolutionary Strategy"],"prefix":"10.3390","volume":"23","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3651-4751","authenticated-orcid":false,"given":"Jianshuai","family":"Feng","sequence":"first","affiliation":[{"name":"School of Mechanical Engineering, Beijing Institute of Technology, Beijing 100081, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Tianyu","family":"Shi","sequence":"additional","affiliation":[{"name":"Intelligent Transportation Systems Centre, University of Toronto, Toronto, ON M5S 1A4, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yuankai","family":"Wu","sequence":"additional","affiliation":[{"name":"National Key Laboratory of Fundamental Science on Synthetic Vision, Sichuan University, Chengdu 610065, 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xiang","family":"Xie","sequence":"additional","affiliation":[{"name":"School of Information and Electronics, Beijing Institute of Technology, Beijing 100081, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2874-1858","authenticated-orcid":false,"given":"Hongwen","family":"He","sequence":"additional","affiliation":[{"name":"School of Mechanical Engineering, Beijing Institute of Technology, Beijing 100081, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Huachun","family":"Tan","sequence":"additional","affiliation":[{"name":"Advanced Research Institute of Multidisciplinary Sciences, Beijing Institute of Technology, Beijing 100081, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2023,5,11]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"1198","DOI":"10.1109\/TITS.2017.2725912","article-title":"Expert level control of ramp metering based on multi-task deep reinforcement learning","volume":"19","author":"Belletti","year":"2017","journal-title":"IEEE Trans. Intell. Transp. Syst."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"364","DOI":"10.1016\/j.sbspro.2011.08.042","article-title":"Application of ALINEA ramp control algorithm to freeway traffic flow on approaches to Bosphorus strait crossing bridges","volume":"20","author":"Demiral","year":"2011","journal-title":"Procedia-Soc. Behav. Sci."},{"key":"ref_3","unstructured":"Ran, B., Cheng, Y., Li, S., Zhang, Z., Ding, F., Tan, H., Wu, Y., Dong, S., Ye, L., and Li, X. (2020). Intelligent Road Infrastructure System (IRIS): Systems and Methods. (10,692,365), U.S. 
Patent."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"185","DOI":"10.1016\/j.trc.2004.08.001","article-title":"Model predictive control for optimal coordination of ramp metering and variable speed limits","volume":"13","author":"Hegyi","year":"2005","journal-title":"Transp. Res. Part C Emerg. Technol."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"102","DOI":"10.1109\/TITS.2004.842408","article-title":"Optimal coordination of variable speed limits to suppress shock waves","volume":"6","author":"Hegyi","year":"2005","journal-title":"IEEE Trans. Intell. Transp. Syst."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"607","DOI":"10.1016\/j.trpro.2017.03.051","article-title":"Simulation-based variable speed limit systems modelling: An overview and a case study on Istanbul freeways","volume":"22","author":"Sadat","year":"2017","journal-title":"Transp. Res. Procedia"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1016\/j.ifacol.2021.06.005","article-title":"Integral Input-to-State Stability of Traffic Flow with Variable Speed Limit","volume":"54","author":"Silgu","year":"2021","journal-title":"IFAC-PapersOnLine"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"11011","DOI":"10.1109\/TITS.2021.3098640","article-title":"Combined control of freeway traffic involving cooperative adaptive cruise controlled and human driven vehicles using feedback control through SUMO","volume":"23","author":"Silgu","year":"2021","journal-title":"IEEE Trans. Intell. Transp. Syst."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"19","DOI":"10.1016\/j.ifacol.2021.06.003","article-title":"H\u221e State Feedback Controller for ODE Model of Traffic Flow","volume":"54","author":"Silgu","year":"2021","journal-title":"IFAC-PapersOnLine"},{"key":"ref_10","unstructured":"Sutton, R.S., and Barto, A.G. (2018). 
Reinforcement Learning: An Introduction, MIT Press."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"26","DOI":"10.1016\/j.neucom.2021.04.133","article-title":"Reinforcement learning-based finite-time tracking control of an unknown unmanned surface vehicle with input constraints","volume":"484","author":"Wang","year":"2022","journal-title":"Neurocomputing"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"328","DOI":"10.1016\/j.neucom.2020.06.094","article-title":"Deep reinforcement learning based lane detection and localization","volume":"413","author":"Zhao","year":"2020","journal-title":"Neurocomputing"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"390","DOI":"10.1016\/j.neucom.2021.11.106","article-title":"A distributed deep reinforcement learning method for traffic light control","volume":"490","author":"Liu","year":"2022","journal-title":"Neurocomputing"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"3204","DOI":"10.1109\/TITS.2017.2687620","article-title":"Reinforcement learning-based variable speed limit control strategy to reduce traffic congestion at freeway recurrent bottlenecks","volume":"18","author":"Li","year":"2017","journal-title":"IEEE Trans. Intell. Transp. Syst."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"102649","DOI":"10.1016\/j.trc.2020.102649","article-title":"Differential variable speed limits control for freeway recurrent bottlenecks via deep actor-critic algorithm","volume":"117","author":"Wu","year":"2020","journal-title":"Transp. Res. Part C Emerg. Technol."},{"key":"ref_16","first-page":"1","article-title":"Decentralised reinforcement learning for ramp metering and variable speed limits on highways","volume":"14","year":"2015","journal-title":"IEEE Trans. Intell. Transp. 
Syst."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"529","DOI":"10.1038\/nature14236","article-title":"Human-level control through deep reinforcement learning","volume":"518","author":"Mnih","year":"2015","journal-title":"Nature"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"4684","DOI":"10.1109\/TITS.2020.2990598","article-title":"Enhancing transferability of deep reinforcement learning-based variable speed limit control using transfer learning","volume":"22","author":"Ke","year":"2020","journal-title":"IEEE Trans. Intell. Transp. Syst."},{"key":"ref_19","unstructured":"Wu, Y., Tan, H., and Ran, B. (2018). Differential variable speed limits control for freeway recurrent bottlenecks via deep reinforcement learning. arXiv."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"104850","DOI":"10.1016\/j.engappai.2022.104850","article-title":"Impact of Deep Reinforcement Learning on Variable Speed Limit strategies in connected vehicles environments","volume":"112","author":"Ivanjko","year":"2022","journal-title":"Eng. Appl. Artif. Intell."},{"key":"ref_21","unstructured":"Zhang, H., Chen, H., Boning, D., and Hsieh, C.J. (2021). Robust reinforcement learning on state observations with learned optimal adversary. arXiv."},{"key":"ref_22","unstructured":"Jaafra, Y., Laurent, J.L., Deruyver, A., and Naceur, M.S. (2019, January 6). Robust Reinforcement Learning for Autonomous Driving. Proceedings of the ICLR 2019 Workshop on Deep RL Meets Structured Prediction, New Orleans, LA, USA."},{"key":"ref_23","unstructured":"Kulkarni, T.D., Narasimhan, K., Saeedi, A., and Tenenbaum, J. (2016, January 5\u201310). Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation. Proceedings of the Advances in Neural Information Processing Systems, Barcelona, Spain."},{"key":"ref_24","unstructured":"Han, B., Ren, Z., Wu, Z., Zhou, Y., and Peng, J. (2022, January 17\u201323). 
Off-policy reinforcement learning with delayed rewards. Proceedings of the International Conference on Machine Learning, PMLR, Baltimore, MD, USA."},{"key":"ref_25","unstructured":"Salimans, T., Ho, J., Chen, X., Sidor, S., and Sutskever, I. (2017). Evolution strategies as a scalable alternative to reinforcement learning. arXiv."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"64","DOI":"10.1109\/MITS.2019.2907631","article-title":"Reinforcement learning-based variable speed limits control to reduce crash risks near traffic oscillations on freeways","volume":"13","author":"Li","year":"2020","journal-title":"IEEE Intell. Transp. Syst. Mag."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"41947","DOI":"10.1109\/ACCESS.2019.2904619","article-title":"A new solution for freeway congestion: Cooperative speed limit control using distributed reinforcement learning","volume":"7","author":"Wang","year":"2019","journal-title":"IEEE Access"},{"key":"ref_28","unstructured":"Such, F.P., Madhavan, V., Conti, E., Lehman, J., Stanley, K.O., and Clune, J. (2017). Deep neuroevolution: Genetic algorithms are a competitive alternative for training deep neural networks for reinforcement learning. arXiv."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"159","DOI":"10.1162\/106365601750190398","article-title":"Completely derandomized self-adaptation in evolution strategies","volume":"9","author":"Hansen","year":"2001","journal-title":"Evol. Comput."},{"key":"ref_30","first-page":"1793","article-title":"Statistical normalization and back propagation for classification","volume":"3","author":"Jayalakshmi","year":"2011","journal-title":"Int. J. Comput. Theory Eng."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Song, J., Wu, Y., Xu, Z., and Lin, X. (2014, January 14\u201316). Research on car-following model based on SUMO. 
Proceedings of the 7th IEEE\/International Conference on Advanced Infocomm Technology, Fuzhou, China.","DOI":"10.1109\/ICAIT.2014.7019528"},{"key":"ref_32","unstructured":"Notter, B., Keller, M., and Cox, B. (2021, September 05). Handbook Emission Factors for Road Transport 4.1. Quick Ref. Bern, Germany, 28 June 2019. Available online: https:\/\/www.hbefa.net\/e\/help\/HBEFA41_help_en.pdf."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Lee, S., Ha, M.H., and Moon, B. (2020, January 4\u20138). Understanding features on evolutionary policy optimizations: Feature learning difference between gradient-based and evolutionary policy optimizations. Proceedings of the 35th Annual ACM Symposium on Applied Computing, Pisa, Italy.","DOI":"10.1145\/3341105.3373966"},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s40649-019-0069-y","article-title":"Graph convolutional networks: A comprehensive review","volume":"6","author":"Zhang","year":"2019","journal-title":"Comput. Soc. Netw."},{"key":"ref_35","unstructured":"Shi, T., Wang, J., Wu, Y., and Sun, L. (2020). Towards efficient connected and automated driving system via multi-agent graph reinforcement learning. 
arXiv."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/10\/4659\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T19:32:52Z","timestamp":1760124772000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/10\/4659"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,5,11]]},"references-count":35,"journal-issue":{"issue":"10","published-online":{"date-parts":[[2023,5]]}},"alternative-id":["s23104659"],"URL":"https:\/\/doi.org\/10.3390\/s23104659","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,5,11]]}}}