{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,2]],"date-time":"2026-03-02T15:33:23Z","timestamp":1772465603633,"version":"3.50.1"},"reference-count":20,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2026,2,28]],"date-time":"2026-02-28T00:00:00Z","timestamp":1772236800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100008462","name":"Fujian University of Technology","doi-asserted-by":"publisher","award":["S90032600821"],"award-info":[{"award-number":["S90032600821"]}],"id":[{"id":"10.13039\/501100008462","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Systems"],"abstract":"<jats:p>The multi-depot vehicle routing problem with soft time windows (MDVRPSTW) has long been a focus in both academic and industrial circles. This paper proposes a deep reinforcement learning framework designed to enhance the efficiency and quality of MDVRPSTW solutions, addressing the limitations of traditional heuristic algorithms in large-scale complex scenarios. The framework first transforms the mathematical model into a sequential decision-making problem through a Markov decision process, then extracts path selection strategies using an encoder\u2013decoder architecture based on attention mechanisms and graph neural networks, and employs unsupervised reinforcement learning for model training. Test results on the Solomon benchmark dataset demonstrate that for small-scale problems (N = 20), our method reduces solving time by over 96% compared to comparative algorithms, with the objective value difference from the generalized variable neighborhood search (GVNS) being less than 9%. For medium-to-large scale problems (N = 50\/100), our method achieves a 27.7 to 96.3 percent improvement over GVNS, maintaining stable solution times within 3 to 10 s. Compared to exact algorithms and meta-heuristic methods, our approach reduces computational costs by 2\u20133 orders of magnitude while demonstrating strong adaptability to variations in the number of depots and vehicles. In summary, this method significantly outperforms baseline models in both solution quality and computational efficiency, providing an efficient end-to-end solution for MDVRPSTW in complex scenarios.<\/jats:p>","DOI":"10.3390\/systems14030261","type":"journal-article","created":{"date-parts":[[2026,3,2]],"date-time":"2026-03-02T14:06:56Z","timestamp":1772460416000},"page":"261","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Research on Distribution Optimization Strategy of Front Warehouse Model Based on Deep Reinforcement Learning"],"prefix":"10.3390","volume":"14","author":[{"given":"Jiaqing","family":"Chen","sequence":"first","affiliation":[{"name":"School of Economics and Finance, Xi\u2019an Jiaotong University, Xi\u2019an 710061, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ming","family":"Jiang","sequence":"additional","affiliation":[{"name":"School of Internet Economics and Business, Fujian University of Technology, Fuzhou 350118, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-5041-9760","authenticated-orcid":false,"given":"Guorong","family":"Chen","sequence":"additional","affiliation":[{"name":"School of Internet Economics and Business, Fujian University of Technology, Fuzhou 350118, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2026,2,28]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"2335","DOI":"10.1142\/S0219622024500020","article-title":"Multi-Objective Last-Mile Vehicle Routing Problem for Fresh Food E-Commerce: A Sustainable Perspective","volume":"23","author":"Liu","year":"2024","journal-title":"Int. J. Inf. Technol. Decis. Mak."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"104773","DOI":"10.1016\/j.iref.2025.104773","article-title":"Evolutionary game analysis of collaborative strategies in fresh e-commerce and cold chain logistics: The role of incentive mechanisms and supervision policies","volume":"104","author":"Huang","year":"2025","journal-title":"Int. Rev. Econ. Financ."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"113107","DOI":"10.1016\/j.asoc.2025.113107","article-title":"Optimizing multi-drone patrol path planning under uncertain flight duration: A robust model and adaptive large neighborhood search with simulated annealing","volume":"176","author":"Li","year":"2025","journal-title":"Appl. Soft Comput."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"129698","DOI":"10.1016\/j.eswa.2025.129698","article-title":"A two-stage evolutionary algorithm based on hybrid penalty strategy and its application to multi-UAV path planning","volume":"298","author":"Guo","year":"2025","journal-title":"Expert Syst. Appl."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"110","DOI":"10.1007\/s12065-025-01092-0","article-title":"Adaptive ant colony optimization for solving dynamic vehicle and drone routing with time window constraints","volume":"18","author":"Hung","year":"2025","journal-title":"Evol. Intell."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"110122","DOI":"10.1016\/j.cie.2024.110122","article-title":"An adaptive large neighborhood search for the multi-depot dynamic vehicle routing problem with time windows","volume":"191","author":"Wang","year":"2024","journal-title":"Comput. Ind. Eng."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"606","DOI":"10.1504\/EJIE.2024.139327","article-title":"An adaptive large neighbourhood search for multi-depot electric vehicle routing problem with time windows","volume":"18","author":"Wang","year":"2024","journal-title":"Eur. J. Ind. Eng."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"106016","DOI":"10.1016\/j.cor.2022.106016","article-title":"A variable neighborhood search-based algorithm with adaptive local search for the Vehicle Routing Problem with Time Windows and multi-depots aiming for vehicle fleet reduction","volume":"149","author":"Bezerra","year":"2022","journal-title":"Comput. Oper. Res."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"698","DOI":"10.1108\/JM2-04-2017-0046","article-title":"A hybrid genetic algorithm for multi-depot vehicle routing problem with considering time window repair and pick-up","volume":"13","author":"Rabbani","year":"2018","journal-title":"J. Model. Manag."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Anuar, W.K., Lee, L.S., Seow, H.-V., and Pickl, S. (2022). A Multi-Depot Dynamic Vehicle Routing Problem with Stochastic Road Capacity: An MDP Model and Dynamic Policy for Post-Decision State Rollout Algorithm in Reinforcement Learning. Mathematics, 10.","DOI":"10.3390\/math10152699"},{"key":"ref_11","first-page":"493","article-title":"A multi-agent deep reinforcement learning approach for solving the multi-depot vehicle routing problem","volume":"10","author":"Arishi","year":"2023","journal-title":"J. Manag. Anal."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"128451","DOI":"10.1016\/j.physa.2023.128451","article-title":"Graph attention reinforcement learning with flexible matching policies for multi-depot vehicle routing problems","volume":"611","author":"Zhang","year":"2023","journal-title":"Phys. A Stat. Mech. Appl."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"956","DOI":"10.1287\/ijoc.2023.0103","article-title":"VRPSolverEasy: A Python Library for the Exact Solution of a Rich Vehicle Routing Problem","volume":"36","author":"Errami","year":"2024","journal-title":"INFORMS J. Comput."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"107193","DOI":"10.1016\/j.cor.2025.107193","article-title":"Generalized variable neighborhood search algorithm for vehicle routing problem with time windows and synchronization","volume":"183","author":"Masmoudi","year":"2025","journal-title":"Comput. Oper. Res."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"128258","DOI":"10.1016\/j.eswa.2025.128258","article-title":"Metaheuristic approaches for the stochastic capacitated multi-depot vehicle routing problem with pickup and delivery","volume":"290","author":"Rios","year":"2025","journal-title":"Expert Syst. Appl."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"100638","DOI":"10.1016\/j.eij.2025.100638","article-title":"A hybrid GRASP and VND heuristic for vehicle routing problem with dynamic requests","volume":"29","author":"Chen","year":"2025","journal-title":"Egypt. Inform. J."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"128660","DOI":"10.1016\/j.eswa.2025.128660","article-title":"Dynamic embedding-based deep reinforcement learning for heterogeneous capacitated VRPs with unloading time constraints","volume":"293","author":"Guan","year":"2025","journal-title":"Expert Syst. Appl."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Cai, H., Xu, P., Tang, X., and Lin, G. (2024). Solving the Vehicle Routing Problem with Stochastic Travel Cost Using Deep Reinforcement Learning. Electronics, 13.","DOI":"10.3390\/electronics13163242"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"112173","DOI":"10.1016\/j.knosys.2024.112173","article-title":"Token-based deep reinforcement learning for Heterogeneous VRP with Service Time Constraints","volume":"300","author":"Wang","year":"2024","journal-title":"Knowl.-Based Syst."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"254","DOI":"10.1287\/opre.35.2.254","article-title":"Algorithms for the vehicle routing and scheduling problems with time window constraints","volume":"35","author":"Solomon","year":"1987","journal-title":"Oper. Res."}],"container-title":["Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2079-8954\/14\/3\/261\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,2]],"date-time":"2026-03-02T14:38:24Z","timestamp":1772462304000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2079-8954\/14\/3\/261"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,2,28]]},"references-count":20,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2026,3]]}},"alternative-id":["systems14030261"],"URL":"https:\/\/doi.org\/10.3390\/systems14030261","relation":{},"ISSN":["2079-8954"],"issn-type":[{"value":"2079-8954","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,2,28]]}}}