{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:30:06Z","timestamp":1750221006311,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":24,"publisher":"ACM","license":[{"start":{"date-parts":[[2019,7,25]],"date-time":"2019-07-25T00:00:00Z","timestamp":1564012800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2019,7,25]]},"DOI":"10.1145\/3292500.3332299","type":"proceedings-article","created":{"date-parts":[[2019,7,26]],"date-time":"2019-07-26T13:17:26Z","timestamp":1564147046000},"page":"3201-3202","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":12,"title":["Deep Reinforcement Learning with Applications in Transportation"],"prefix":"10.1145","author":[{"given":"Zhiwei (Tony)","family":"Qin","sequence":"first","affiliation":[{"name":"DiDi AI Labs, Didi Chuxing, Mountain View, CA, USA"}]},{"given":"Jian","family":"Tang","sequence":"additional","affiliation":[{"name":"DiDi AI Labs, Didi Chuxing & Syracuse University, Beijing, China"}]},{"given":"Jieping","family":"Ye","sequence":"additional","affiliation":[{"name":"DiDi AI Labs, Didi Chuxing & University of Michigan, Beijing, China"}]}],"member":"320","published-online":{"date-parts":[[2019,7,25]]},"reference":[{"volume-title":"Dynamic programming and optimal control","author":"Bertsekas Dimitri P","key":"e_1_3_2_1_1_1","unstructured":"Dimitri P Bertsekas. 2005. 
Dynamic programming and optimal control. Vol. 1. Athena Scientific, Belmont, MA."},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/CDC.1995.478953"},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.4236\/jdaip.2016.44014"},{"volume-title":"NeurIPS 2018 Deep Reinforcement Learning Workshop","author":"Holler J.","key":"e_1_3_2_1_4_1","unstructured":"J. Holler, Z. Qin, X. Tang, Y. Jiao, T. Jin, S. Singh, C. Wang, and J. Ye. 2018. Deep Q-Learning Approaches to Dynamic Multi-Driver Dispatching and Repositioning. In NeurIPS 2018 Deep Reinforcement Learning Workshop."},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/BigData.2018.8622481"},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/3308558.3313433"},{"key":"e_1_3_2_1_7_1","volume-title":"Deep reinforcement learning: An overview. arXiv preprint arXiv:1701.07274","author":"Yuxi Li.","year":"2017","unstructured":"Yuxi Li. 2017. Deep reinforcement learning: An overview. arXiv preprint arXiv:1701.07274 (2017)."},{"key":"e_1_3_2_1_8_1","volume-title":"Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971","author":"Lillicrap Timothy P","year":"2015","unstructured":"Timothy P Lillicrap, Jonathan J Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. 2015. Continuous control with deep reinforcement learning. 
arXiv preprint arXiv:1509.02971 (2015)."},{"key":"e_1_3_2_1_9_1","volume-title":"International conference on machine learning. 1928--1937","author":"Mnih Volodymyr","year":"2016","unstructured":"Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. 2016. Asynchronous methods for deep reinforcement learning. In International conference on machine learning. 1928--1937."},{"key":"e_1_3_2_1_10_1","volume-title":"et al.","author":"Mnih Volodymyr","year":"2015","unstructured":"Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, et al. 2015. Human-level control through deep reinforcement learning. Nature, Vol. 518, 7540 (2015), 529--533."},{"volume-title":"Approximate Dynamic Programming: Solving the curses of dimensionality","author":"Powell Warren B","key":"e_1_3_2_1_11_1","unstructured":"Warren B Powell. 2007. Approximate Dynamic Programming: Solving the curses of dimensionality. Vol. 703. John Wiley & Sons."},{"key":"e_1_3_2_1_12_1","volume-title":"Prioritized experience replay. 
arXiv preprint arXiv:1511.05952","author":"Schaul Tom","year":"2015","unstructured":"Tom Schaul, John Quan, Ioannis Antonoglou, and David Silver. 2015. Prioritized experience replay. arXiv preprint arXiv:1511.05952 (2015)."},{"key":"e_1_3_2_1_13_1","volume-title":"Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347","author":"Schulman John","year":"2017","unstructured":"John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. 2017. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347 (2017)."},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1022633531479"},{"key":"e_1_3_2_1_15_1","volume-title":"et al.","author":"Sutton Richard S","year":"1998","unstructured":"Richard S Sutton, Andrew G Barto, et al. 1998. Reinforcement learning: An introduction. MIT Press."},{"key":"e_1_3_2_1_16_1","volume-title":"Algorithms for reinforcement learning. Synthesis lectures on artificial intelligence and machine learning","author":"Szepesv\u00e1ri Csaba","year":"2010","unstructured":"Csaba Szepesv\u00e1ri. 2010. Algorithms for reinforcement learning. Synthesis lectures on artificial intelligence and machine learning, Vol. 4, 1 (2010), 1--103."},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3292500.3330724"},{"key":"e_1_3_2_1_18_1","unstructured":"John N Tsitsiklis and Benjamin Van Roy. 
1997. Analysis of temporal-difference learning with function approximation. In Advances in neural information processing systems. 1075--1081."},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"crossref","unstructured":"Hado Van Hasselt, Arthur Guez, and David Silver. 2016. Deep Reinforcement Learning with Double Q-Learning. In AAAI. 2094--2100.","DOI":"10.1609\/aaai.v30i1.10295"},{"key":"e_1_3_2_1_20_1","volume-title":"Deep Reinforcement Learning with Knowledge Transfer for Online Rides Order Dispatching. In International Conference on Data Mining. IEEE.","author":"Wang Zhaodong","year":"2018","unstructured":"Zhaodong Wang, Zhiwei Qin, Xiaocheng Tang, Jieping Ye, and Hongtu Zhu. 2018. Deep Reinforcement Learning with Knowledge Transfer for Online Rides Order Dispatching. In International Conference on Data Mining. IEEE."},{"key":"e_1_3_2_1_21_1","volume-title":"Machine learning","author":"Watkins Christopher JCH","year":"1992","unstructured":"Christopher JCH Watkins and Peter Dayan. 1992. Q-learning. Machine learning, Vol. 8, 3--4 (1992), 279--292."},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/3219819.3219824"},{"key":"e_1_3_2_1_23_1","volume-title":"Mean Field Multi-Agent Reinforcement Learning. 
CoRR","author":"Yang Yaodong","year":"2018","unstructured":"Yaodong Yang, Rui Luo, Minne Li, Ming Zhou, Weinan Zhang, and Jun Wang. 2018. Mean Field Multi-Agent Reinforcement Learning. CoRR, Vol. abs\/1802.05438 (2018). arXiv:1802.05438 http:\/\/arxiv.org\/abs\/1802.05438"},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/SSCI.2016.7849837"}],"event":{"name":"KDD '19: The 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining","sponsor":["SIGMOD ACM Special Interest Group on Management of Data","SIGKDD ACM Special Interest Group on Knowledge Discovery in Data"],"location":"Anchorage AK USA","acronym":"KDD '19"},"container-title":["Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3292500.3332299","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3292500.3332299","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T00:25:56Z","timestamp":1750206356000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3292500.3332299"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,7,25]]},"references-count":24,"alternative-id":["10.1145\/3292500.3332299","10.1145\/3292500"],"URL":"https:\/\/doi.org\/10.1145\/3292500.3332299","relation":{},"subject":[],"published":{"date-parts":[[2019,7,25]]},"assertion":[{"value":"2019-07-25","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}