{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T00:57:36Z","timestamp":1760057856282,"version":"build-2065373602"},"reference-count":44,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2025,2,27]],"date-time":"2025-02-27T00:00:00Z","timestamp":1740614400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Ministry of Education of the Republic of Korea and the National Research Foundation of Korea","award":["NRF-2018R1A5A7059549"],"award-info":[{"award-number":["NRF-2018R1A5A7059549"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>Deep reinforcement learning (DRL) as a routing problem solver has shown promising results in recent studies. However, an inherent gap exists between computationally driven DRL and optimization-based heuristics. While a DRL algorithm for a certain problem is able to solve several similar problem instances, traditional optimization algorithms focus on optimizing solutions to one specific problem instance. In this paper, we propose an approach, AlphaRouter, which solves routing problems while bridging the gap between reinforcement learning and optimization. Fitting to routing problems, our approach first proposes attention-enabled policy and value networks consisting of a policy network that produces a probability distribution over all possible nodes and a value network that produces the expected distance from any given state. We modify a Monte Carlo tree search (MCTS) for the routing problems, selectively combining it with the routing problems. Our experiments demonstrate that the combined approach is promising and yields better solutions compared to original reinforcement learning (RL) approaches without MCTS, with good performance comparable to classical heuristics.<\/jats:p>","DOI":"10.3390\/e27030251","type":"journal-article","created":{"date-parts":[[2025,2,28]],"date-time":"2025-02-28T02:48:37Z","timestamp":1740710917000},"page":"251","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["AlphaRouter: Bridging the Gap Between Reinforcement Learning and Optimization for Vehicle Routing with Monte Carlo Tree Searches"],"prefix":"10.3390","volume":"27","author":[{"given":"Won-Jun","family":"Kim","sequence":"first","affiliation":[{"name":"Hyundai Glovis, Seoul 685-700, Republic of Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-5670-2653","authenticated-orcid":false,"given":"Junho","family":"Jeong","sequence":"additional","affiliation":[{"name":"Department of Industrial Engineering, College of Engineering, Hanyang University, Seoul 133-791, Republic of Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-7120-9178","authenticated-orcid":false,"given":"Taeyeong","family":"Kim","sequence":"additional","affiliation":[{"name":"Department of Industrial Engineering, College of Engineering, Hanyang University, Seoul 133-791, Republic of Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5184-7151","authenticated-orcid":false,"given":"Kichun","family":"Lee","sequence":"additional","affiliation":[{"name":"Department of Industrial Engineering, College of Engineering, Hanyang University, Seoul 133-791, Republic of Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2025,2,27]]},"reference":[{"key":"ref_1","first-page":"5138","article-title":"Matrix encoding networks for neural combinatorial optimization","volume":"34","author":"Kwon","year":"2021","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_2","first-page":"21188","article-title":"POMO: Policy Optimization with Multiple Optima for Reinforcement Learning","volume":"33","author":"Kwon","year":"2020","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"47","DOI":"10.1038\/s41586-022-05172-4","article-title":"Discovering faster matrix multiplication algorithms with reinforcement learning","volume":"610","author":"Fawzi","year":"2022","journal-title":"Nature"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"484","DOI":"10.1038\/nature16961","article-title":"Mastering the Game of Go with Deep Neural Networks and Tree Search","volume":"529","author":"Silver","year":"2016","journal-title":"Nature"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"354","DOI":"10.1038\/nature24270","article-title":"Mastering the Game of Go without Human Knowledge","volume":"550","author":"Silver","year":"2017","journal-title":"Nature"},{"key":"ref_6","unstructured":"Vinyals, O., Fortunato, M., and Jaitly, N. (2017). Pointer Networks. arXiv."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"604","DOI":"10.1038\/s41586-020-03051-4","article-title":"Mastering atari, go, chess and shogi by planning with a learned model","volume":"588","author":"Schrittwieser","year":"2020","journal-title":"Nature"},{"key":"ref_8","first-page":"66","article-title":"A Survey on the Vehicle Routing Problem and Its Variants","volume":"4","author":"Kumar","year":"2012","journal-title":"Intell. Inf. Manag."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"498","DOI":"10.1287\/opre.21.2.498","article-title":"An Effective Heuristic Algorithm for the Traveling-Salesman Problem","volume":"21","author":"Lin","year":"1973","journal-title":"Oper. Res."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"611","DOI":"10.1287\/opre.1120.1048","article-title":"A Hybrid Genetic Algorithm for Multidepot and Periodic Vehicle Routing Problems","volume":"60","author":"Vidal","year":"2012","journal-title":"Oper. Res."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"105643","DOI":"10.1016\/j.cor.2021.105643","article-title":"Hybrid genetic search for the CVRP: Open-source implementation and SWAP* neighborhood","volume":"140","author":"Vidal","year":"2022","journal-title":"Comput. Oper. Res."},{"key":"ref_12","unstructured":"Dai, H., Khalil, E.B., Zhang, Y., Dilkina, B., and Song, L. (2018). Learning Combinatorial Optimization Algorithms over Graphs. arXiv."},{"key":"ref_13","unstructured":"Sutskever, I., Vinyals, O., and Le, Q.V. (2014). Sequence to Sequence Learning with Neural Networks. arXiv."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Luong, M.T., Pham, H., and Manning, C.D. (2015). Effective Approaches to Attention-based Neural Machine Translation. arXiv.","DOI":"10.18653\/v1\/D15-1166"},{"key":"ref_15","unstructured":"Bello, I., Pham, H., Le, Q.V., Norouzi, M., and Bengio, S. (2017). Neural Combinatorial Optimization with Reinforcement Learning. arXiv."},{"key":"ref_16","first-page":"1057","article-title":"Policy gradient methods for reinforcement learning with function approximation","volume":"12","author":"Sutton","year":"1999","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_17","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., and Polosukhin, I. (2017). Attention Is All You Need. arXiv."},{"key":"ref_18","unstructured":"Kool, W., van Hoof, H., and Welling, M. (May, January 30). Attention, Learn to Solve Routing Problems!. Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada."},{"key":"ref_19","first-page":"10418","article-title":"Learning collaborative policies to solve np-hard routing problems","volume":"34","author":"Kim","year":"2021","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Xin, L., Song, W., Cao, Z., and Zhang, J. (2021, January 19\u201321). Multi-decoder attention model with embedding glimpse for solving vehicle routing problems. Proceedings of the AAAI Conference on Artificial Intelligence, Virtually.","DOI":"10.1609\/aaai.v35i13.17430"},{"key":"ref_21","unstructured":"Hottung, A., Kwon, Y.D., and Tierney, K. (2022). Efficient Active Search for Combinatorial Optimization Problems. arXiv."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Bubeck, S., and Cesa-Bianchi, N. (2012). Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems. arXiv.","DOI":"10.1561\/9781601986276"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Ma\u0144dziuk, J., and \u015awiechowski, M. (2016, January 6\u20139). Simulation-based approach to Vehicle Routing Problem with traffic jams. Proceedings of the 2016 IEEE Symposium Series on Computational Intelligence (SSCI), Athens, Greece.","DOI":"10.1109\/SSCI.2016.7850028"},{"key":"ref_24","first-page":"2102265","article-title":"Hybrid fleet capacitated vehicle routing problem with flexible Monte\u2013Carlo Tree search","volume":"10","author":"Barletta","year":"2023","journal-title":"Int. J. Syst. Sci. Oper. Logist."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"104781","DOI":"10.1016\/j.cor.2019.104781","article-title":"Deep learning assisted heuristic tree search for the container pre-marshalling problem","volume":"113","author":"Hottung","year":"2020","journal-title":"Comput. Oper. Res."},{"key":"ref_26","unstructured":"Luo, G., Wang, Y., Zhang, H., Yuan, Q., and Li, J. (2023, January 7\u201314). AlphaRoute: Large-Scale Coordinated Route Planning via Monte Carlo Tree Search. Proceedings of the AAAI Conference on Artificial Intelligence 2023, Washington, DC, USA."},{"key":"ref_27","unstructured":"Cohen-Solal, Q., and Cazenave, T. (2020). Minimax strikes back. arXiv."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Sinha, A., Azad, U., and Singh, H. (March, January 22). Qubit Routing Using Graph Neural Network Aided Monte Carlo Tree Search. Proceedings of the AAAI Conference on Artificial Intelligence 2022, Online.","DOI":"10.1609\/aaai.v36i9.21231"},{"key":"ref_29","unstructured":"Kilby, P., Prosser, P., and Shaw, P. (1998). Dynamic VRPs: A Study of Scenarios, University of Strathclyde. University of Strathclyde Technical Report."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"463","DOI":"10.17535\/crorr.2017.0029","article-title":"Two models of the capacitated vehicle routing problem","volume":"8","author":"Borcinova","year":"2017","journal-title":"Croat. Oper. Res. Rev."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Cheng, B., Misra, I., Schwing, A.G., Kirillov, A., and Girdhar, R. (2022, January 18\u201324). Masked-attention mask transformer for universal image segmentation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.00135"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"115","DOI":"10.1016\/j.neucom.2022.05.051","article-title":"Full transformer network with masking future for word-level sign language recognition","volume":"500","author":"Du","year":"2022","journal-title":"Neurocomputing"},{"key":"ref_33","unstructured":"Agarap, A.F. (2018). Deep learning using rectified linear units (relu). arXiv."},{"key":"ref_34","unstructured":"Shazeer, N. (2020). Glu variants improve transformer. arXiv."},{"key":"ref_35","unstructured":"Chowdhery, A., Narang, S., Devlin, J., Bosma, M., Mishra, G., Roberts, A., Barham, P., Chung, H.W., Sutton, C., and Gehrmann, S. (2022). Palm: Scaling language modeling with pathways. arXiv."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"229","DOI":"10.1007\/BF00992696","article-title":"Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning","volume":"8","author":"Williams","year":"1992","journal-title":"Mach. Learn."},{"key":"ref_37","unstructured":"Team, T.P.L. (2025, February 19). PyTorch Lightning, Opensource, 2023. Available online: https:\/\/github.com\/Lightning-AI\/lightning."},{"key":"ref_38","unstructured":"Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., and Lerer, A. (2025, February 19). Automatic Differentiation in Pytorch. Available online: https:\/\/openreview.net\/forum?id=BJJsrmfCZ."},{"key":"ref_39","unstructured":"Kingma, D.P., and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv."},{"key":"ref_40","unstructured":"Helsgaun, K. (2017). An Extension of the Lin-Kernighan-Helsgaun TSP Solver for Constrained Traveling Salesman and Vehicle Routing Problems: Technical Report, Roskilde Universitet."},{"key":"ref_41","unstructured":"Applegate, D., Bixby, R., Chv\u00e1tal, V., and Cook, W. (2025, February 19). Concorde Tsp Solver. 03.12.19. Available online: https:\/\/en.wikipedia.org\/wiki\/Concorde_TSP_Solver."},{"key":"ref_42","unstructured":"Furnon, V., and Perron, L. (2025, February 19). OR-Tools Routing Library. Available online: https:\/\/developers.google.com\/optimization."},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"563","DOI":"10.1137\/0206041","article-title":"An Analysis of Several Heuristics for the Traveling Salesman Problem","volume":"6","author":"Rosenkrantz","year":"1977","journal-title":"SIAM J. Comput."},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Chaslot, G.M.B., Winands, M.H., and van Den Herik, H.J. (October, January 29). Parallel Monte-Carlo Tree Search. Proceedings of the Computers and Games: 6th International Conference, CG 2008, Beijing, China.","DOI":"10.1007\/978-3-540-87608-3_6"}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/27\/3\/251\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T16:43:42Z","timestamp":1760028222000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/27\/3\/251"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,2,27]]},"references-count":44,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2025,3]]}},"alternative-id":["e27030251"],"URL":"https:\/\/doi.org\/10.3390\/e27030251","relation":{},"ISSN":["1099-4300"],"issn-type":[{"type":"electronic","value":"1099-4300"}],"subject":[],"published":{"date-parts":[[2025,2,27]]}}}