{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,6]],"date-time":"2026-05-06T15:26:15Z","timestamp":1778081175610,"version":"3.51.4"},"reference-count":24,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2023,4,19]],"date-time":"2023-04-19T00:00:00Z","timestamp":1681862400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"National Science Foundation","award":["CNS-1837244"],"award-info":[{"award-number":["CNS-1837244"]}]},{"name":"AWS Machine Learning Research"},{"name":"U.S. Department of Energy\u2019s Office of Energy Efficiency and Renewable Energy","award":["CID DE-EE0008872"],"award-info":[{"award-number":["CID DE-EE0008872"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Cyber-Phys. Syst."],"published-print":{"date-parts":[[2023,4,30]]},"abstract":"<jats:p>\n            We study the ability of autonomous vehicles to improve the throughput of a bottleneck using a fully decentralized control scheme in a mixed autonomy setting. We consider the problem of improving the throughput of a scaled model of the San Francisco\u2013Oakland Bay Bridge: a two-stage bottleneck where four lanes reduce to two and then reduce to one. Although there is extensive work examining variants of bottleneck control in a centralized setting, there is less study of the challenging multi-agent setting where the large number of interacting AVs leads to significant optimization difficulties for reinforcement learning methods. We apply multi-agent reinforcement algorithms to this problem and demonstrate that significant improvements in bottleneck throughput, from 20% at a 5% penetration rate to 33% at a 40% penetration rate, can be achieved. 
We compare our results to a hand-designed feedback controller and demonstrate that our results sharply outperform the feedback controller despite extensive tuning. Additionally, we demonstrate that the RL-based controllers adopt a robust strategy that works across penetration rates whereas the feedback controllers degrade immediately upon penetration rate variation. We investigate the feasibility of both action and observation decentralization and demonstrate that effective strategies are possible using purely local sensing. Finally, we open-source our code at\n            <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" xlink:href=\"https:\/\/github.com\/eugenevinitsky\/decentralized_bottlenecks\">https:\/\/github.com\/eugenevinitsky\/decentralized_bottlenecks<\/jats:ext-link>\n            .\n          <\/jats:p>","DOI":"10.1145\/3582576","type":"journal-article","created":{"date-parts":[[2023,2,9]],"date-time":"2023-02-09T13:45:05Z","timestamp":1675950305000},"page":"1-22","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":15,"title":["Optimizing Mixed Autonomy Traffic Flow with Decentralized Autonomous Vehicles and Multi-Agent Reinforcement Learning"],"prefix":"10.1145","volume":"7","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2372-4944","authenticated-orcid":false,"given":"Eugene","family":"Vinitsky","sequence":"first","affiliation":[{"name":"UC Berkeley, Mechanical Engineering, Berkeley, CA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5521-6517","authenticated-orcid":false,"given":"Nathan","family":"Lichtl\u00e9","sequence":"additional","affiliation":[{"name":"\u00c9cole des Ponts ParisTech, Champs-sur-Marne, France"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1446-0609","authenticated-orcid":false,"given":"Kanaad","family":"Parvate","sequence":"additional","affiliation":[{"name":"UC 
Berkeley, Berkeley, CA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6697-222X","authenticated-orcid":false,"given":"Alexandre","family":"Bayen","sequence":"additional","affiliation":[{"name":"UC Berkeley EECS, Institute of Transportation Systems, Berkeley, CA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2023,4,19]]},"reference":[{"key":"e_1_3_4_2_2","unstructured":"Joshua Achiam Ethan Knight and Pieter Abbeel. 2019. Towards characterizing divergence in deep q-learning. arXiv preprint arXiv:1903.08894 (2019)."},{"key":"e_1_3_4_3_2","doi-asserted-by":"crossref","first-page":"57","DOI":"10.1007\/0-387-24109-4_3","volume-title":"Simulation Approaches in Transportation Analysis","author":"Barcel\u00f3 Jaime","year":"2005","unstructured":"Jaime Barcel\u00f3 and Jordi Casas. 2005. Dynamic network simulation with AIMSUN. In Simulation Approaches in Transportation Analysis. R. Kitamura and M. Kuwahara (Eds.), Springer, 57\u201398."},{"key":"e_1_3_4_4_2","first-page":"679","article-title":"A Markovian decision process","author":"Bellman Richard","year":"1957","unstructured":"Richard Bellman. 1957. A Markovian decision process. Journal of Mathematics and Mechanics 6, 5 (1957), 679\u2013684.","journal-title":"Journal of Mathematics and Mechanics"},{"key":"e_1_3_4_5_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.trb.2006.02.011"},{"key":"e_1_3_4_6_2","unstructured":"Mladen \u010ci\u010di\u0107 Li Jin and Karl Henrik Johansson. 2019. Coordinating vehicle platoons for highway bottleneck decongestion and throughput improvement. arXiv preprint arXiv:1907.13049 (2019)."},{"key":"e_1_3_4_7_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-15024-6_7"},{"key":"e_1_3_4_8_2","unstructured":"Scott Fujimoto Herke Van Hoof and David Meger. 2018. Addressing function approximation error in actor-critic methods. 
arXiv preprint arXiv:1802.09477 (2018)."},{"issue":"1320","key":"e_1_3_4_9_2","article-title":"Freeway capacity drop and the definition of capacity","author":"Hall Fred L.","year":"1991","unstructured":"Fred L. Hall and Kwaku Agyemang-Duah. 1991. Freeway capacity drop and the definition of capacity. Transportation Research Record 1320 (1991), 91\u201398.","journal-title":"Transportation Research Record"},{"issue":"2","key":"e_1_3_4_10_2","first-page":"610","article-title":"Feedback-based mainstream traffic flow control for multiple bottlenecks on motorways","volume":"16","author":"Iordanidou Georgia-Roumpini","year":"2014","unstructured":"Georgia-Roumpini Iordanidou, Claudio Roncoli, Ioannis Papamichail, and Markos Papageorgiou. 2014. Feedback-based mainstream traffic flow control for multiple bottlenecks on motorways. IEEE Transactions on Intelligent Transportation Systems 16, 2 (2014), 610\u2013621.","journal-title":"IEEE Transactions on Intelligent Transportation Systems"},{"key":"e_1_3_4_11_2","volume-title":"Microscopic Modeling of Traffic Flow: Investigation of Collision Free Vehicle Dynamics","author":"Krau\u00df Stefan","year":"1998","unstructured":"Stefan Krau\u00df. 1998. Microscopic Modeling of Traffic Flow: Investigation of Collision Free Vehicle Dynamics. Ph.D. Dissertation."},{"key":"e_1_3_4_12_2","unstructured":"Eric Liang Richard Liaw Philipp Moritz Robert Nishihara Roy Fox Ken Goldberg Joseph E. Gonzalez Michael I. Jordan and Ion Stoica. 2017. RLlib: Abstractions for Distributed Reinforcement Learning. arXiv:1712.09381 [cs.AI]"},{"key":"e_1_3_4_13_2","first-page":"3053","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Liang Eric","year":"2018","unstructured":"Eric Liang, Richard Liaw, Robert Nishihara, Philipp Moritz, Roy Fox, Ken Goldberg, Joseph Gonzalez, Michael Jordan, and Ion Stoica. 2018. RLlib: Abstractions for distributed reinforcement learning. 
In Proceedings of the International Conference on Machine Learning. 3053\u20133062."},{"key":"e_1_3_4_14_2","unstructured":"Timothy P. Lillicrap Jonathan J. Hunt Alexander Pritzel Nicolas Heess Tom Erez Yuval Tassa David Silver and Daan Wierstra. 2015. Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015)."},{"key":"e_1_3_4_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/ITSC.2018.8569938"},{"key":"e_1_3_4_16_2","first-page":"6379","volume-title":"Proceedings of the 31st International Conference on Neural Information Processing Systems","author":"Lowe Ryan","year":"2017","unstructured":"Ryan Lowe, Yi I. Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, and Igor Mordatch. 2017. Multi-agent actor-critic for mixed cooperative-competitive environments. In Proceedings of the 31st International Conference on Neural Information Processing Systems. 6379\u20136390."},{"key":"e_1_3_4_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/ITSC.2010.5625107"},{"key":"e_1_3_4_18_2","unstructured":"Volodymyr Mnih Koray Kavukcuoglu David Silver Alex Graves Ioannis Antonoglou Daan Wierstra and Martin Riedmiller. 2013. Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602 (2013)."},{"issue":"1","key":"e_1_3_4_19_2","first-page":"58","article-title":"ALINEA: A local feedback control law for on-ramp metering","volume":"1320","author":"Papageorgiou Markos","year":"1991","unstructured":"Markos Papageorgiou, Habib Hadj-Salem, Jean-Marc Blosseville. 1991. ALINEA: A local feedback control law for on-ramp metering. Transportation Research Record 1320, 1 (1991), 58\u201367.","journal-title":"Transportation Research Record"},{"key":"e_1_3_4_20_2","unstructured":"Tabish Rashid Mikayel Samvelyan Christian Schroeder De Witt Gregory Farquhar Jakob Foerster and Shimon Whiteson. 2018. QMIX: Monotonic value function factorisation for deep multi-agent reinforcement learning. 
arXiv preprint arXiv:1803.11485 (2018)."},{"key":"e_1_3_4_21_2","unstructured":"Tom Schaul John Quan Ioannis Antonoglou and David Silver. 2015. Prioritized experience replay. arXiv preprint arXiv:1511.05952 (2015)."},{"key":"e_1_3_4_22_2","doi-asserted-by":"publisher","DOI":"10.1016\/0191-2615(90)90023-R"},{"key":"e_1_3_4_23_2","doi-asserted-by":"publisher","DOI":"10.1061\/(ASCE)0733-947X(2010)136:1(67)"},{"key":"e_1_3_4_24_2","unstructured":"Cathy Wu Aboudy Kreidieh Kanaad Parvate Eugene Vinitsky and Alexandre M. Bayen. 2017. Flow: A modular learning framework for autonomy in traffic. arXiv preprint arXiv:1710.05465 (2017)."},{"key":"e_1_3_4_25_2","unstructured":"Yuankai Wu Huachun Tan and Bin Ran. 2018. Differential variable speed limits control for freeway recurrent bottlenecks via deep reinforcement learning. arXiv preprint arXiv:1810.10952 (2018)."}],"container-title":["ACM Transactions on Cyber-Physical Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3582576","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3582576","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T18:09:13Z","timestamp":1750183753000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3582576"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,4,19]]},"references-count":24,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2023,4,30]]}},"alternative-id":["10.1145\/3582576"],"URL":"https:\/\/doi.org\/10.1145\/3582576","relation":{},"ISSN":["2378-962X","2378-9638"],"issn-type":[{"value":"2378-962X","type":"print"},{"value":"2378-9638","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,4,19]]},"assertion":[{"value":"2021-10-13","order":0,"name":"received","label":"Received","group
":{"name":"publication_history","label":"Publication History"}},{"value":"2023-01-03","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-04-19","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}