{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,16]],"date-time":"2026-04-16T16:10:31Z","timestamp":1776355831619,"version":"3.51.2"},"publisher-location":"New York, NY, USA","reference-count":40,"publisher":"ACM","license":[{"start":{"date-parts":[[2019,11,3]],"date-time":"2019-11-03T00:00:00Z","timestamp":1572739200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"NSFC","award":["61702327,61772333,61632017"],"award-info":[{"award-number":["61702327,61772333,61632017"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2019,11,3]]},"DOI":"10.1145\/3357384.3357799","type":"proceedings-article","created":{"date-parts":[[2019,11,4]],"date-time":"2019-11-04T14:11:35Z","timestamp":1572876695000},"page":"2645-2653","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":73,"title":["Multi-Agent Reinforcement Learning for Order-dispatching via Order-Vehicle Distribution Matching"],"prefix":"10.1145","author":[{"given":"Ming","family":"Zhou","sequence":"first","affiliation":[{"name":"Shanghai Jiao Tong University, Shanghai, China"}]},{"given":"Jiarui","family":"Jin","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, Shanghai, China"}]},{"given":"Weinan","family":"Zhang","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, Shanghai, China"}]},{"given":"Zhiwei","family":"Qin","sequence":"additional","affiliation":[{"name":"DiDi AI Labs, Beijing, China"}]},{"given":"Yan","family":"Jiao","sequence":"additional","affiliation":[{"name":"DiDi AI Labs, Beijing, China"}]},{"given":"Chenxi","family":"Wang","sequence":"additional","affiliation":[{"name":"DiDi AI Labs, Beijing, 
China"}]},{"given":"Guobin","family":"Wu","sequence":"additional","affiliation":[{"name":"DiDi Research, Beijing, China"}]},{"given":"Yong","family":"Yu","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, Shanghai, China"}]},{"given":"Jieping","family":"Ye","sequence":"additional","affiliation":[{"name":"DiDi AI Labs, Beijing, China"}]}],"member":"320","published-online":{"date-parts":[[2019,11,3]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"In 8th international conference on autonomous agents and multiagent systems. 21--28","author":"Alshamsi Aamena","year":"2009","unstructured":"Aamena Alshamsi, Sherief Abdallah, and Iyad Rahwan. 2009. Multiagent self-organization for a taxi dispatch system. In 8th international conference on autonomous agents and multiagent systems. 21--28."},{"key":"e_1_3_2_1_2_1","volume-title":"Opponent modeling in poker. Aaai\/iaai 493","author":"Billings Darse","year":"1998","unstructured":"Darse Billings, Denis Papp, Jonathan Schaeffer, and Duane Szafron. 1998. Opponent modeling in poker. Aaai\/iaai 493 (1998), 499."},{"key":"e_1_3_2_1_3_1","volume-title":"Robotics and Vision, 2006. ICARCV'06. 9th International Conference on. IEEE, 1--6.","author":"Busoniu Lucian","year":"2006","unstructured":"Lucian Busoniu, Robert Babuska, and Bart De Schutter. 2006. Multi-agent reinforcement learning: A survey. In Control, Automation, Robotics and Vision, 2006. ICARCV'06. 
9th International Conference on. IEEE, 1--6."},{"key":"e_1_3_2_1_4_1","first-page":"549","article-title":"Context-aware distributive taxicab dispatching. (March 19 2015)","volume":"14","author":"Chadwick Stephen C","year":"2015","unstructured":"Stephen C Chadwick and Charles Baron. 2015. Context-aware distributive taxicab dispatching. (March 19 2015). US Patent App. 14\/125,549.","journal-title":"US Patent App."},{"key":"e_1_3_2_1_5_1","unstructured":"Lee Chean Chung. 2005. GPS taxi dispatch system based on A* shortest path algorithm. Ph.D. Dissertation. Master's thesis submitted to the Department of Transportation and Logistics at Malaysia University of Science and Technology (MUST) in partial fulfillment of the requirements for the degree of Master of Science in Transportation and Logistics."},{"key":"e_1_3_2_1_6_1","volume-title":"Nando de Freitas, and Shimon Whiteson.","author":"Foerster Jakob","year":"2016","unstructured":"Jakob Foerster, Ioannis Alexandros Assael, Nando de Freitas, and Shimon Whiteson. 2016. Learning to communicate with deep multi-agent reinforcement learning. In Advances in Neural Information Processing Systems. 2137--2145."},{"key":"e_1_3_2_1_7_1","unstructured":"Jakob N Foerster, Yannis M Assael, Nando de Freitas, and Shimon Whiteson. 2016. 
Learning to communicate to solve riddles with deep distributed recurrent Q-networks. arXiv preprint arXiv:1602.02672 (2016)."},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-71682-4_5"},{"key":"e_1_3_2_1_9_1","unstructured":"Matthew John Hausknecht. 2016. Cooperation and communication in multiagent deep reinforcement learning. Ph.D. Dissertation."},{"key":"e_1_3_2_1_10_1","volume-title":"1998. Multiagent reinforcement learning: theoretical framework and an algorithm. In ICML","author":"Hu Junling","unstructured":"Junling Hu, Michael P Wellman, et al. 1998. Multiagent reinforcement learning: theoretical framework and an algorithm. In ICML, Vol. 98. Citeseer, 242--250"},{"key":"e_1_3_2_1_11_1","volume-title":"Taxi dispatch system based on current demands and real-time traffic conditions. Transportation Research Record: Journal of the Transportation Research Board 1882","author":"Lee Der-Horng","year":"2004","unstructured":"Der-Horng Lee, Hao Wang, Ruey Cheu, and Siew Teo. 2004. Taxi dispatch system based on current demands and real-time traffic conditions. 
Transportation Research Record: Journal of the Transportation Research Board 1882 (2004), 193--200."},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/PERCOMW.2011.5766967"},{"key":"e_1_3_2_1_13_1","unstructured":"Minne Li, Yan Jiao, Yaodong Yang, Zhichen Gong, Jun Wang, Chenxi Wang, Guobin Wu, Jieping Ye, et al. 2019. Efficient Ridesharing Order Dispatching with Mean Field Multi-Agent Reinforcement Learning. arXiv preprint arXiv:1901.11454 (2019)."},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/17.946533"},{"key":"e_1_3_2_1_15_1","volume-title":"Real-time taxi dispatching using global positioning systems. Commun. ACM 46, 5","author":"Liao Ziqi","year":"2003","unstructured":"Ziqi Liao. 2003. Real-time taxi dispatching using global positioning systems. Commun. ACM 46, 5 (2003), 81--83."},{"key":"e_1_3_2_1_16_1","unstructured":"Kaixiang Lin, Renyu Zhao, Zhe Xu, and Jiayu Zhou. 2018. Efficient Large-Scale Fleet Management via Multi-Agent Deep Reinforcement Learning. arXiv preprint arXiv:1802.06444 (2018)."},{"key":"e_1_3_2_1_17_1","volume-title":"OpenAI Pieter Abbeel, and Igor Mordatch","author":"Lowe Ryan","year":"2017","unstructured":"
Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, OpenAI Pieter Abbeel, and Igor Mordatch. 2017. Multi-agent actor-critic for mixed cooperative-competitive environments. In Advances in Neural Information Processing Systems. 6379--6390."},{"key":"e_1_3_2_1_18_1","volume-title":"Single point of failure: The 10 essential laws of supply chain risk management","author":"Lynch Gary S","unstructured":"Gary S Lynch. 2009. Single point of failure: The 10 essential laws of supply chain risk management. John Wiley & Sons."},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASE.2016.2529580"},{"key":"e_1_3_2_1_20_1","volume-title":"2015. Human-level control through deep reinforcement learning. Nature 518, 7540","author":"Mnih Volodymyr","year":"2015","unstructured":"Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, et al. 2015. Human-level control through deep reinforcement learning. Nature 518, 7540 (2015), 529."},{"key":"e_1_3_2_1_21_1","unstructured":"Igor Mordatch and Pieter Abbeel. 2017. Emergence of Grounded Compositional Language in Multi-Agent Populations. 
arXiv preprint arXiv:1703.04908 (2017)."},{"key":"e_1_3_2_1_22_1","volume-title":"Algorithms for the assignment and transportation problems. Journal of the society for industrial and applied mathematics 5, 1","author":"Munkres James","year":"1957","unstructured":"James Munkres. 1957. Algorithms for the assignment and transportation problems. Journal of the society for industrial and applied mathematics 5, 1 (1957), 32--38."},{"key":"e_1_3_2_1_23_1","first-page":"442","article-title":"Automatic optimal taxicab mobile location based dispatching system. (May 14 2013)","volume":"8","author":"Myr David","year":"2013","unstructured":"David Myr. 2013. Automatic optimal taxicab mobile location based dispatching system. (May 14 2013). US Patent 8,442,848.","journal-title":"US Patent"},{"key":"e_1_3_2_1_24_1","unstructured":"Takuma Oda and Yulia Tachibana. 2018. Distributed Fleet Control with Maximum Entropy Deep Reinforcement Learning. (2018)."},{"key":"e_1_3_2_1_25_1","volume-title":"Combinatorial optimization: algorithms and complexity","author":"Papadimitriou Christos H","unstructured":"Christos H Papadimitriou and Kenneth Steiglitz. 1998. Combinatorial optimization: algorithms and complexity. Courier Corporation."},{"key":"e_1_3_2_1_26_1","unstructured":"Frederik Schadd, Sander Bakkes, and Pieter Spronck. 2007. 
Opponent Modeling in Real-Time Strategy Games. In GAMEON. 61--70."},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASE.2009.2028577"},{"key":"e_1_3_2_1_28_1","volume-title":"Proceedings of the national academy of sciences 39","author":"Shapley Lloyd S","year":"1953","unstructured":"Lloyd S Shapley. 1953. Stochastic games. Proceedings of the national academy of sciences 39, 10 (1953), 1095--1100."},{"key":"e_1_3_2_1_29_1","volume-title":"Reinforcement Learning","author":"Spaan Matthijs TJ","unstructured":"Matthijs TJ Spaan. 2012. Partially observable Markov decision processes. In Reinforcement Learning. Springer, 387--414."},{"key":"e_1_3_2_1_30_1","unstructured":"Sainbayar Sukhbaatar, Rob Fergus, et al. 2016. Learning multiagent communication with backpropagation. In Advances in Neural Information Processing Systems. 2244--2252."},{"key":"e_1_3_2_1_31_1","unstructured":"Gerald Tesauro. 2004. Extending Q-learning to general adaptive multi-agent systems. In Advances in neural information processing systems. 871--878."},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.14778\/3236187.3236211"},{"key":"e_1_3_2_1_33_1","volume-title":"AAAI","volume":"2","author":"Hasselt Hado Van","year":"2016","unstructured":"
Hado Van Hasselt, Arthur Guez, and David Silver. 2016. Deep Reinforcement Learning with Double Q-Learning. In AAAI, Vol. 2. Phoenix, AZ, 5."},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/3219819.3219900"},{"key":"e_1_3_2_1_35_1","volume-title":"Deep Reinforcement Learning with Knowledge Transfer for Online Rides Order Dispatching. In 2018 IEEE International Conference on Data Mining (ICDM). IEEE, 617--626","author":"Wang Zhaodong","year":"2018","unstructured":"Zhaodong Wang, Zhiwei Qin, Xiaocheng Tang, Jieping Ye, and Hongtu Zhu. 2018. Deep Reinforcement Learning with Knowledge Transfer for Online Rides Order Dispatching. In 2018 IEEE International Conference on Data Mining (ICDM). IEEE, 617--626."},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2017.2769666"},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/3219819.3219824"},{"key":"e_1_3_2_1_38_1","unstructured":"Yaodong Yang, Rui Luo, Minne Li, Ming Zhou, Weinan Zhang, and Jun Wang. 2018. Mean Field Multi-Agent Reinforcement Learning. arXiv preprint arXiv:1802.05438 (2018)."},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/3097983.3098138"},{"key":"e_1_3_2_1_40_1","doi-asserted-by":"crossref","unstructured":"Lianmin Zheng, Jiacheng Yang, Han Cai, Weinan Zhang, Jun Wang, and Yong Yu. 2017. 
MAgent: A Many-Agent Reinforcement Learning Platform for Artificial Collective Intelligence. NIPS Demo(2017).","DOI":"10.1609\/aaai.v32i1.11371"}],"event":{"name":"CIKM '19: The 28th ACM International Conference on Information and Knowledge Management","location":"Beijing China","acronym":"CIKM '19","sponsor":["SIGWEB ACM Special Interest Group on Hypertext, Hypermedia, and Web","SIGIR ACM Special Interest Group on Information Retrieval"]},"container-title":["Proceedings of the 28th ACM International Conference on Information and Knowledge Management"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3357384.3357799","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3357384.3357799","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T23:44:43Z","timestamp":1750203883000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3357384.3357799"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,11,3]]},"references-count":40,"alternative-id":["10.1145\/3357384.3357799","10.1145\/3357384"],"URL":"https:\/\/doi.org\/10.1145\/3357384.3357799","relation":{},"subject":[],"published":{"date-parts":[[2019,11,3]]},"assertion":[{"value":"2019-11-03","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}