{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,4]],"date-time":"2026-04-04T03:47:45Z","timestamp":1775274465242,"version":"3.50.1"},"reference-count":48,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2023,2,22]],"date-time":"2023-02-22T00:00:00Z","timestamp":1677024000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"The National Key Research and Development Program of China","award":["2020AAA0106000"],"award-info":[{"award-number":["2020AAA0106000"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["U20B2060, 61971267, 61972223"],"award-info":[{"award-number":["U20B2060, 61971267, 61972223"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Knowl. Discov. Data"],"published-print":{"date-parts":[[2023,6,30]]},"abstract":"<jats:p>In real-world express systems, couriers need to satisfy not only the delivery demands but also the pick-up demands of customers. Delivery and pickup tasks are usually mixed together within integrated routing plans. Such a mixed routing problem can be abstracted and formulated as Vehicle Routing Problem with Mixed Delivery and Pickup (VRPMDP), which is an NP-hard combinatorial optimization problem. To solve VRPMDP, there are three major challenges as below. (a) Even though successive pickup and delivery tasks are independent to accomplish, the inter-influence between choosing pickup task or delivery task to deal with still exists. (b) Due to the two-way flow of goods between the depot and customers, the loading rate of vehicles leaving the depot affects routing decisions. (c) The proportion of deliveries and pickups will change due to the complex demand situation in real-world scenarios, which requires robustness of the algorithm. To solve the challenges above, we design an encoder-decoder based framework to generate high-quality and robust VRPMDP solutions. First, we consider a VRPMDP instance as a graph and utilize a GNN encoder to extract the feature of the instance effectively. The detailed routing solutions are further decoded as a sequence by the decoder with attention mechanism. Second, we propose a Coordinated Decision of Loading and Routing (CDLR) mechanism to determine the loading rate dynamically after the vehicle returns to the depot, thus avoiding the influence of improper loading rate settings. Finally, the model equipped with a GNN encoder and CDLR simultaneously can adapt to the changes in the proportion of deliveries and pickups. We conduct the experiments to demonstrate the effectiveness of our model. The experiments show that our method achieves desirable results and generalization ability.<\/jats:p>","DOI":"10.1145\/3546952","type":"journal-article","created":{"date-parts":[[2022,7,7]],"date-time":"2022-07-07T11:28:55Z","timestamp":1657193335000},"page":"1-19","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":19,"title":["Reinforcement Learning for Practical Express Systems with Mixed Deliveries and Pickups"],"prefix":"10.1145","volume":"17","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5855-7093","authenticated-orcid":false,"given":"Jinwei","family":"Chen","sequence":"first","affiliation":[{"name":"Tsinghua University, Beijing Shi, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3236-6640","authenticated-orcid":false,"given":"Zefang","family":"Zong","sequence":"additional","affiliation":[{"name":"Tsinghua University, Beijing Shi, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3798-2920","authenticated-orcid":false,"given":"Yunlin","family":"Zhuang","sequence":"additional","affiliation":[{"name":"Hitachi (China) Research Development Corporation, Beijing Shi, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9626-5676","authenticated-orcid":false,"given":"Huan","family":"Yan","sequence":"additional","affiliation":[{"name":"Tsinghua University, Beijing Shi, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0419-5514","authenticated-orcid":false,"given":"Depeng","family":"Jin","sequence":"additional","affiliation":[{"name":"Tsinghua University, Beijing Shi, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5617-1659","authenticated-orcid":false,"given":"Yong","family":"Li","sequence":"additional","affiliation":[{"name":"Tsinghua University, Beijing Shi, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2023,2,22]]},"reference":[{"key":"e_1_3_1_2_2","article-title":"Hierarchical integrated intelligent logistics system platform","year":"2011","unstructured":"Andrzej and Adamski. 2011. Hierarchical integrated intelligent logistics system platform. Procedia Social & Behavioral Sciences 20 (2011), 1004\u20131016.","journal-title":"Procedia Social & Behavioral Sciences"},{"key":"e_1_3_1_3_2","unstructured":"I. Bello H. Pham Q. V. Le M. Norouzi and S. Bengio. 2016. Neural combinatorial optimization with reinforcement learning. arXiv preprint arXiv:1611.09940."},{"key":"e_1_3_1_4_2","unstructured":"X. Bresson and T. Laurent. 2017. Residual gated graph ConvNets. arXiv preprint arXiv:1711.07553."},{"key":"e_1_3_1_5_2","doi-asserted-by":"crossref","unstructured":"D. O. Casco B. L. Golden and E. A. Wasil. 2018. Vehicle routing with backhauls: Models algorithms and case studies. Computers & Operations Research 91 (2018) 79\u201381.","DOI":"10.1016\/j.cor.2017.11.003"},{"key":"e_1_3_1_6_2","unstructured":"Xinyun Chen and Yuandong Tian. 2019. Learning to perform local rewriting for combinatorial optimization. Advances in Neural Information Processing Systems 32 (2019)."},{"key":"e_1_3_1_7_2","unstructured":"Kyunghyun Cho. 2016. Noisy parallel approximate decoding for conditional recurrent language model. arXiv:1605.03835."},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1057\/palgrave.jors.2601935"},{"key":"e_1_3_1_9_2","first-page":"170","volume-title":"Proceedings of the International Conference on the Integration of Constraint Programming, Artificial Intelligence, and Operations Research","author":"Deudon Michel","year":"2018","unstructured":"Michel Deudon, Pierre Cournut, Alexandre Lacoste, Yossiri Adulyasak, and Louis-Martin Rousseau. 2018. Learning heuristics for the TSP by policy gradient. In Proceedings of the International Conference on the Integration of Constraint Programming, Artificial Intelligence, and Operations Research. Springer, 170\u2013181."},{"key":"e_1_3_1_10_2","volume-title":"Proceedings of the KDD\u201920: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining","author":"Duan L.","year":"2020","unstructured":"L. Duan, Y. Zhan, H. Hu, Y. Gong, and Y. Xu. 2020. Efficiently solving the practical vehicle routing problem: A novel joint learning approach. In Proceedings of the KDD\u201920: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining."},{"issue":"2","key":"e_1_3_1_11_2","first-page":"221","article-title":"Tabu search","volume":"106","author":"Glover F.","year":"1997","unstructured":"F. Glover, M. Laguna, and R. Mart\u00ed. 1997. Tabu search. General Information 106, 2 (1997), 221\u2013225.","journal-title":"General Information"},{"key":"e_1_3_1_12_2","unstructured":"Google. 2019. OR-Tools. Retrieved from https:\/\/developers.google.com\/optimization\/."},{"key":"e_1_3_1_13_2","unstructured":"Keld Helsgaun. 2017. An extension of the Lin-Kernighan-Helsgaun TSP solver for constrained traveling salesman and vehicle routing problems. (2017)."},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.dam.2003.09.013"},{"key":"e_1_3_1_15_2","unstructured":"C. K. Joshi T. Laurent and X. Bresson. 2019. An efficient graph convolutional network technique for the travelling salesman problem. arXiv preprint arXiv:1906.01227."},{"key":"e_1_3_1_16_2","first-page":"6348","article-title":"Learning combinatorial optimization algorithms over graphs","volume":"30","author":"Khalil Elias","year":"2017","unstructured":"Elias Khalil, Hanjun Dai, Yuyu Zhang, Bistra Dilkina, and Le Song. 2017. Learning combinatorial optimization algorithms over graphs. Advances in Neural Information Processing Systems 30 (2017), 6348\u20136358.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_17_2","first-page":"1008","volume-title":"Proceedings of the Advances in Neural Information Processing Systems","author":"Konda Vijay R.","year":"2000","unstructured":"Vijay R. Konda and John N. Tsitsiklis. 2000. Actor-critic algorithms. In Proceedings of the Advances in Neural Information Processing Systems. 1008\u20131014."},{"key":"e_1_3_1_18_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Kool Wouter","year":"2018","unstructured":"Wouter Kool, Herke van Hoof, and Max Welling. 2018. Attention, learn to solve routing problems!. In Proceedings of the International Conference on Learning Representations."},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1007\/BF02098290"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1038\/nature14539"},{"key":"e_1_3_1_21_2","doi-asserted-by":"crossref","unstructured":"Kun Lei Peng Guo Yi Wang Xiao Wu and Wenchao Zhao. 2021. Solve routing problems with a residual edge-graph attention neural network. arXiv:2105.02730.","DOI":"10.1016\/j.neucom.2022.08.005"},{"key":"e_1_3_1_22_2","unstructured":"Timothy P. Lillicrap Jonathan J. Hunt Alexander Pritzel Nicolas Heess Tom Erez Yuval Tassa David Silver and Daan Wierstra. 2015. Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015)."},{"issue":"1","key":"e_1_3_1_23_2","first-page":"64","article-title":"A variable depth approach for the single-vehicle pickup and delivery problem with time windows","volume":"86","author":"Ljj Vdb","year":"1990","unstructured":"Vdb Ljj, Jjk Lenstra, and P. P. Schuur. 1990. A variable depth approach for the single-vehicle pickup and delivery problem with time windows. American Journal of Clinical Nutrition 86, 1 (1990), 64\u201373.","journal-title":"American Journal of Clinical Nutrition"},{"key":"e_1_3_1_24_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Lu Hao","year":"2019","unstructured":"Hao Lu, Xingwen Zhang, and Shuang Yang. 2019. A learning-based iterative method for solving vehicle routing problems. In Proceedings of the International Conference on Learning Representations."},{"key":"e_1_3_1_25_2","unstructured":"Qiang Ma Suwen Ge Danyang He Darshan Thaker and Iddo Drori. 2019. Combinatorial optimization by graph pointer networks and hierarchical reinforcement learning. arXiv:1911.04936."},{"key":"e_1_3_1_26_2","article-title":"Adaptive large neighborhood search heuristic for pollution-routing problem with simultaneous pickup and delivery","author":"Majidi S.","year":"2017","unstructured":"S. Majidi, S. M. Hosseini-Motlagh, and J. Ignatius. 2017. Adaptive large neighborhood search heuristic for pollution-routing problem with simultaneous pickup and delivery. Soft Computing 22, 9 (2017), 2851\u20132865.","journal-title":"Soft Computing"},{"key":"e_1_3_1_27_2","doi-asserted-by":"crossref","first-page":"105400","DOI":"10.1016\/j.cor.2021.105400","article-title":"Reinforcement learning for combinatorial optimization: A survey","author":"Mazyavkina Nina","year":"2021","unstructured":"Nina Mazyavkina, Sergey Sviridov, Sergei Ivanov, and Evgeny Burnaev. 2021. Reinforcement learning for combinatorial optimization: A survey. Computers & Operations Research 134 (2021), 105400.","journal-title":"Computers & Operations Research"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.compind.2015.10.002"},{"key":"e_1_3_1_29_2","article-title":"Artificial intelligence human-level control through deep reinforcement learning","author":"Mnih V.","year":"2015","unstructured":"V. Mnih. 2015. Artificial intelligence human-level control through deep reinforcement learning. NATURE -LONDON- 518, 7540 (2015), 529\u2013533.","journal-title":"NATURE -LONDON-"},{"key":"e_1_3_1_30_2","unstructured":"Volodymyr Mnih Koray Kavukcuoglu David Silver Alex Graves Ioannis Antonoglou Daan Wierstra and Martin Riedmiller. 2013. Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602 (2013)."},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1038\/nature14236"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.cor.2004.07.009"},{"key":"e_1_3_1_33_2","first-page":"426","volume-title":"Proceedings of the SAI Intelligent Systems Conference","author":"Mousavi Seyed Sajad","year":"2016","unstructured":"Seyed Sajad Mousavi, Michael Schukat, and Enda Howley. 2016. Deep reinforcement learning: An overview. In Proceedings of the SAI Intelligent Systems Conference. Springer, 426\u2013440."},{"key":"e_1_3_1_34_2","article-title":"Reinforcement learning for solving the vehicle routing problem","volume":"31","author":"Nazari MohammadReza","year":"2018","unstructured":"MohammadReza Nazari, Afshin Oroojlooy, Lawrence Snyder, and Martin Takac. 2018. Reinforcement learning for solving the vehicle routing problem. Advances in Neural Information Processing Systems 31 (2018), 9839\u20139849.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1016\/0377-2217(83)90099-1"},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNN.2008.2005605"},{"key":"e_1_3_1_37_2","volume-title":"Reinforcement Learning: An Introduction","author":"Sutton Richard S.","year":"2018","unstructured":"Richard S. Sutton and Andrew G. Barto. 2018. Reinforcement Learning: An Introduction."},{"key":"e_1_3_1_38_2","first-page":"1057","volume-title":"Proceedings of the Advances in Neural Information Processing Systems","author":"Sutton Richard S.","year":"2000","unstructured":"Richard S. Sutton, David A. McAllester, Satinder P. Singh, and Yishay Mansour. 2000. Policy gradient methods for reinforcement learning with function approximation. In Proceedings of the Advances in Neural Information Processing Systems. 1057\u20131063."},{"key":"e_1_3_1_39_2","first-page":"5998","volume-title":"Proceedings of the Advances in Neural Information Processing Systems","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems. 5998\u20136008."},{"key":"e_1_3_1_40_2","first-page":"2692","article-title":"Pointer networks","volume":"28","author":"Vinyals Oriol","year":"2015","unstructured":"Oriol Vinyals, Meire Fortunato, and Navdeep Jaitly. 2015. Pointer networks. Advances in Neural Information Processing Systems 28 (2015), 2692\u20132700.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_41_2","article-title":"An ant system algorithm for the mixed vehicle routing problem with backhauls.","author":"Wade A.","year":"2003","unstructured":"A. Wade and S. Salhi. 2003. An ant system algorithm for the mixed vehicle routing problem with backhauls. Metaheuristics: Computer Decision-Making 86 (2003), 699\u2013719.","journal-title":"Metaheuristics: Computer Decision-Making"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.cie.2011.08.018"},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.tre.2020.101886"},{"key":"e_1_3_1_44_2","unstructured":"N. A. Wassan and G. Nagy. 2013. The vehicle routing problem with deliveries and pickups: Modelling issues and solution approaches. (2013)."},{"key":"e_1_3_1_45_2","article-title":"A reinforcement learning approach to job-shop scheduling","author":"Wei Z.","year":"1995","unstructured":"Z. Wei and T. G. Dietterich. 1995. A reinforcement learning approach to job-shop scheduling. Morgan Kaufmann Publishers Inc. 95 (1995), 1114\u20131120.","journal-title":"Morgan Kaufmann Publishers Inc."},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1007\/BF00992696"},{"key":"e_1_3_1_47_2","volume-title":"Proceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI)","author":"Xin L.","year":"2021","unstructured":"L. Xin, W. Song, Z. Cao, and J. Zhang. 2021. Multi-decoder attention model with embedding glimpse for solving vehicle routing problems. In Proceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI)."},{"key":"e_1_3_1_48_2","unstructured":"K. Xu W. Hu J. Leskovec and S. Jegelka. 2018. How powerful are graph neural networks? In International Conference on Learning Representations ."},{"key":"e_1_3_1_49_2","doi-asserted-by":"publisher","DOI":"10.1109\/CC.2017.8107642"}],"container-title":["ACM Transactions on Knowledge Discovery from Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3546952","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3546952","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:00:42Z","timestamp":1750186842000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3546952"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,2,22]]},"references-count":48,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2023,6,30]]}},"alternative-id":["10.1145\/3546952"],"URL":"https:\/\/doi.org\/10.1145\/3546952","relation":{},"ISSN":["1556-4681","1556-472X"],"issn-type":[{"value":"1556-4681","type":"print"},{"value":"1556-472X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,2,22]]},"assertion":[{"value":"2022-01-02","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-06-20","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-02-22","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}