{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,23]],"date-time":"2026-02-23T23:10:26Z","timestamp":1771888226230,"version":"3.50.1"},"reference-count":39,"publisher":"Springer Science and Business Media LLC","issue":"6","license":[{"start":{"date-parts":[[2024,3,1]],"date-time":"2024-03-01T00:00:00Z","timestamp":1709251200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,3,27]],"date-time":"2024-03-27T00:00:00Z","timestamp":1711497600000},"content-version":"vor","delay-in-days":26,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100003725","name":"National Research Foundation of Korea","doi-asserted-by":"crossref","award":["2020R1A2C1005037"],"award-info":[{"award-number":["2020R1A2C1005037"]}],"id":[{"id":"10.13039\/501100003725","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Korea Advanced Institute of Science and Technology"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Appl Intell"],"published-print":{"date-parts":[[2024,3]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>In this paper, we study the Multi-Start Team Orienteering Problem (MSTOP), a mission re-planning problem where vehicles are initially located away from the depot and have different amounts of fuel. We consider\/assume the goal of multiple vehicles is to travel to maximize the sum of collected profits under resource (e.g., time, fuel) consumption constraints. Such re-planning problems occur in a wide range of intelligent UAS applications where changes in the mission environment force the operation of multiple vehicles to change from the original plan. To solve this problem with deep reinforcement learning (RL), we develop a policy network with self-attention on each partial tour and encoder-decoder attention between the partial tour and the remaining nodes. We propose a modified REINFORCE algorithm where the greedy rollout baseline is replaced by a local mini-batch baseline based on multiple, possibly non-duplicate sample rollouts. By drawing multiple samples per training instance, we can learn faster and obtain a stable policy gradient estimator with significantly fewer instances. The proposed training algorithm outperforms the conventional greedy rollout baseline, even when combined with the maximum entropy objective. The efficiency of our method is further demonstrated in two classical problems \u2013 the Traveling Salesman Problem (TSP) and the Capacitated Vehicle Routing Problem (CVRP). The experimental results show that our method enables models to develop more effective heuristics and performs competitively with the state-of-the-art deep reinforcement learning methods.<\/jats:p>","DOI":"10.1007\/s10489-024-05367-4","type":"journal-article","created":{"date-parts":[[2024,3,27]],"date-time":"2024-03-27T06:40:15Z","timestamp":1711521615000},"page":"4467-4489","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":13,"title":["Multi-start team orienteering problem for UAS mission re-planning with data-efficient deep reinforcement learning"],"prefix":"10.1007","volume":"54","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9045-2574","authenticated-orcid":false,"given":"Dong Ho","family":"Lee","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4971-5130","authenticated-orcid":false,"given":"Jaemyung","family":"Ahn","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,3,27]]},"reference":[{"key":"5367_CR1","doi-asserted-by":"publisher","first-page":"116","DOI":"10.1016\/j.cie.2018.04.037","volume":"120","author":"WP Coutinho","year":"2018","unstructured":"Coutinho WP, Battarra M, Fliege J (2018) The unmanned aerial vehicle routing and trajectory optimisation problem, a taxonomic review. Comput Ind Eng 120:116\u201328. https:\/\/doi.org\/10.1016\/j.cie.2018.04.037","journal-title":"Comput Ind Eng"},{"key":"5367_CR2","doi-asserted-by":"publisher","first-page":"1626","DOI":"10.1111\/itor.12783","volume":"28","author":"D Rojas Viloria","year":"2021","unstructured":"Rojas Viloria D, Solano-Charris EL, Mu\u00f1oz-Villamizar A, Montoya-Torres JR (2021) Unmanned aerial vehicles\/drones in vehicle routing problems: a literature review. Int Trans Oper Res 28:1626\u201357. https:\/\/doi.org\/10.1111\/itor.12783","journal-title":"Int Trans Oper Res"},{"key":"5367_CR3","doi-asserted-by":"publisher","unstructured":"Kool W, Hoof HV, Welling M (2019) Attention, Learn to Solve Routing Problems! In: 2019 International Conference on Learning Representations (ICLR).https:\/\/doi.org\/10.48550\/arXiv.1803.08475","DOI":"10.48550\/arXiv.1803.08475"},{"key":"5367_CR4","doi-asserted-by":"publisher","unstructured":"Kwon Y-D, Choo J, Kim B, Yoon I, Gwon Y, Min S (2020) Pomo: Policy optimization with multiple optima for reinforcement learning. In: Advances in Neural Information Processing Systems (NeurIPS), 21188\u201398. https:\/\/doi.org\/10.48550\/arXiv.2010.16011","DOI":"10.48550\/arXiv.2010.16011"},{"key":"5367_CR5","unstructured":"Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser \u0141, Polosukhin I (2017) Attention is all you need. In: Proceedings of the 31st International Conference on Neural Information Processing Systems, 6000\u201310. Curran Associates Inc, Long Beach, California, USA. https:\/\/dl.acm.org\/doi\/10.5555\/3295222.3295349"},{"key":"5367_CR6","doi-asserted-by":"publisher","unstructured":"Bresson X, Laurent T (2021) The transformer network for the traveling salesman problem. In: ArXiv. https:\/\/doi.org\/10.48550\/arXiv.2103.03012","DOI":"10.48550\/arXiv.2103.03012"},{"key":"5367_CR7","doi-asserted-by":"publisher","unstructured":"Peng B, Wang J, Zhang Z (2020) A deep reinforcement learning algorithm using dynamic attention model for vehicle routing problems. In. https:\/\/doi.org\/10.48550\/arXiv.2002.03282","DOI":"10.48550\/arXiv.2002.03282"},{"key":"5367_CR8","doi-asserted-by":"publisher","unstructured":"Eysenbach B, Levine S (2021) Maximum entropy rl (provably) solves some robust rl problems. In. https:\/\/doi.org\/10.48550\/arXiv.2103.06257","DOI":"10.48550\/arXiv.2103.06257"},{"key":"5367_CR9","doi-asserted-by":"publisher","unstructured":"Ahmed Z, Le Roux N, Norouzi M, Schuurmans D (2019) Understanding the impact of entropy on policy optimization. In: International conference on machine learning, 151\u201360. PMLR. https:\/\/doi.org\/10.48550\/arXiv.1811.11214","DOI":"10.48550\/arXiv.1811.11214"},{"key":"5367_CR10","unstructured":"Archetti C, Grazia Speranza M, Vigo D (n.d.) Chapter 10: Vehicle Routing Problems with Profits. In: Vehicle Routing (MOS-SIAM Series on Optimization). https:\/\/epubs.siam.org\/doi\/abs\/10.1137\/1.9781611973594.ch10"},{"key":"5367_CR11","doi-asserted-by":"publisher","first-page":"547","DOI":"10.1016\/j.dam.2011.12.021","volume":"161","author":"C Archetti","year":"2013","unstructured":"Archetti C, Bianchessi N, Speranza MG (2013) Optimal solutions for routing problems with profits. Discret Appl Math 161:547\u201357. https:\/\/doi.org\/10.1016\/j.dam.2011.12.021","journal-title":"Discret Appl Math"},{"key":"5367_CR12","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/j.ejor.2010.03.045","volume":"209","author":"P Vansteenwegen","year":"2011","unstructured":"Vansteenwegen P, Souffriau W, Van Oudheusden D (2011) The orienteering problem: a survey. Eur J Oper Res 209:1\u201310. https:\/\/doi.org\/10.1016\/j.ejor.2010.03.045","journal-title":"Eur J Oper Res"},{"key":"5367_CR13","doi-asserted-by":"publisher","first-page":"427","DOI":"10.1016\/S0305-0548(98)00071-9","volume":"26","author":"SE Butt","year":"1999","unstructured":"Butt SE, Ryan DM (1999) An optimal solution procedure for the multiple tour maximum collection problem using column generation. Comput Oper Res 26:427\u201341. https:\/\/doi.org\/10.1016\/S0305-0548(98)00071-9","journal-title":"Comput Oper Res"},{"key":"5367_CR14","doi-asserted-by":"publisher","first-page":"211","DOI":"10.1007\/s10288-006-0009-1","volume":"5","author":"S Boussier","year":"2007","unstructured":"Boussier S, Feillet D, Gendreau M (2007) An exact algorithm for team orienteering problems. 4OR 5:211\u201330. https:\/\/doi.org\/10.1007\/s10288-006-0009-1","journal-title":"4OR"},{"key":"5367_CR15","doi-asserted-by":"publisher","first-page":"7804","DOI":"10.1109\/TITS.2020.3009289","volume":"22","author":"G Bono","year":"2021","unstructured":"Bono G, Dibangoye JS, Simonin O, Matignon L, Pereyron F (2021) Solving multi-agent routing problems using deep attention mechanisms. IEEE Trans Intell Transp Syst 22:7804\u201313. https:\/\/doi.org\/10.1109\/TITS.2020.3009289","journal-title":"IEEE Trans Intell Transp Syst"},{"key":"5367_CR16","doi-asserted-by":"publisher","first-page":"1064","DOI":"10.1016\/j.asoc.2012.09.022","volume":"13","author":"S-W Lin","year":"2013","unstructured":"Lin S-W (2013) Solving the team orienteering problem using effective multi-start simulated annealing. Appl Soft Comput 13:1064\u201373. https:\/\/doi.org\/10.1016\/j.asoc.2012.09.022","journal-title":"Appl Soft Comput"},{"key":"5367_CR17","doi-asserted-by":"publisher","first-page":"195","DOI":"10.1016\/j.cie.2017.10.020","volume":"114","author":"S-W Lin","year":"2017","unstructured":"Lin S-W, Yu VF (2017) Solving the team orienteering problem with time windows and mandatory visits by multi-start simulated annealing. Comput Ind Eng 114:195\u2013205. https:\/\/doi.org\/10.1016\/j.cie.2017.10.020","journal-title":"Comput Ind Eng"},{"key":"5367_CR18","doi-asserted-by":"publisher","first-page":"679","DOI":"10.1007\/s40092-019-0315-9","volume":"15","author":"I Hapsari","year":"2019","unstructured":"Hapsari I, Surjandari I, Komarudin K (2019) Solving multi-objective team orienteering problem with time windows using adjustment iterated local search. J Ind Eng Int 15:679\u201393. https:\/\/doi.org\/10.1007\/s40092-019-0315-9","journal-title":"J Ind Eng Int"},{"key":"5367_CR19","doi-asserted-by":"publisher","unstructured":"Bello I, Pham H, Le QV, Norouzi M, Bengio S (2017) Neural Combinatorial Optimization with Reinforcement Learning. In: 2017 International Conference on Learning Representations (ICLR). https:\/\/doi.org\/10.48550\/arXiv.1611.09940","DOI":"10.48550\/arXiv.1611.09940"},{"key":"5367_CR20","doi-asserted-by":"publisher","unstructured":"Vinyals O, Fortunato M, Jaitly N (2015) Pointer networks. In: Advances in Neural Information Processing Systems (NeurIPS). https:\/\/doi.org\/10.48550\/arXiv.1506.03134","DOI":"10.48550\/arXiv.1506.03134"},{"key":"5367_CR21","doi-asserted-by":"publisher","unstructured":"Khalil E, Dai H, Zhang Y, Dilkina B, Song L (2017) Learning combinatorial optimization algorithms over graphs. In: Advances in Neural Information Processing Systems (NeurIPS). https:\/\/doi.org\/10.48550\/arXiv.1704.01665","DOI":"10.48550\/arXiv.1704.01665"},{"key":"5367_CR22","doi-asserted-by":"publisher","unstructured":"Nazari M, Oroojlooy A, Snyder L, Tak\u00e1c M (2018) Reinforcement learning for solving the vehicle routing problem. In: Advances in Neural Information Processing Systems (NeurIPS). https:\/\/doi.org\/10.48550\/arXiv.1802.04240","DOI":"10.48550\/arXiv.1802.04240"},{"key":"5367_CR23","doi-asserted-by":"publisher","unstructured":"Deudon M, Cournut P, Lacoste A, Adulyasak Y, Rousseau LM (2018) Learning heuristics for the tsp by policy gradient. In: International conference on the integration of constraint programming, artificial intelligence, and operations research, 170\u201381. Springer. https:\/\/doi.org\/10.1007\/978-3-319-93031-2_12","DOI":"10.1007\/978-3-319-93031-2_12"},{"key":"5367_CR24","doi-asserted-by":"publisher","unstructured":"Cappart Q, Moisan T, Rousseau L-M, Pr\u00e9mont-Schwarz I, Cire A (2020) Combining reinforcement learning and constraint programming for combinatorial optimization. In: ArXiv. https:\/\/doi.org\/10.48550\/arXiv.2006.01610","DOI":"10.48550\/arXiv.2006.01610"},{"key":"5367_CR25","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2110.02629","author":"J Li","year":"2021","unstructured":"Li J, Ma Y, Gao R, Cao Z, Lim A, Song W, Zhang J (2021) Deep reinforcement learning for solving the heterogeneous capacitated vehicle routing problem. IEEE Trans Cybern. https:\/\/doi.org\/10.48550\/arXiv.2110.02629","journal-title":"IEEE Trans Cybern"},{"key":"5367_CR26","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2102.05875","author":"K Li","year":"2021","unstructured":"Li K, Zhang T, Wang R, Wang Y, Han Y, Wang L (2021) Deep reinforcement learning for combinatorial optimization: covering salesman problems. IEEE Trans Cybern. https:\/\/doi.org\/10.48550\/arXiv.2102.05875","journal-title":"IEEE Trans Cybern"},{"key":"5367_CR27","doi-asserted-by":"publisher","DOI":"10.1109\/TCYB.2021.3089179","author":"Y Xu","year":"2021","unstructured":"Xu Y, Fang M, Chen L, Gangyan X, Yali D, Zhang C (2021) Reinforcement learning with multiple relational attention for solving vehicle routing problems. IEEE Trans Cybern. https:\/\/doi.org\/10.1109\/TCYB.2021.3089179","journal-title":"IEEE Trans Cybern"},{"key":"5367_CR28","doi-asserted-by":"publisher","first-page":"405","DOI":"10.1007\/s10489-022-03456-w","volume":"53","author":"W Pan","year":"2023","unstructured":"Pan W, Liu SQ (2023) Deep reinforcement learning for the dynamic and uncertain vehicle routing problem. Appl Intell 53:405\u201322. https:\/\/doi.org\/10.1007\/s10489-022-03456-w","journal-title":"Appl Intell"},{"key":"5367_CR29","doi-asserted-by":"publisher","first-page":"8910","DOI":"10.1007\/s10489-021-02920-3","volume":"52","author":"Q Wang","year":"2022","unstructured":"Wang Q (2022) VARL: a variational autoencoder-based reinforcement learning Framework for vehicle routing problems. Appl Intell 52:8910\u201323. https:\/\/doi.org\/10.1007\/s10489-021-02920-3","journal-title":"Appl Intell"},{"key":"5367_CR30","doi-asserted-by":"publisher","unstructured":"Joshi CK, Laurent T, Bresson X (2019) On learning paradigms for the travelling salesman problem. In\u00a0ArXiv. https:\/\/doi.org\/10.48550\/arXiv.1910.07210","DOI":"10.48550\/arXiv.1910.07210"},{"key":"5367_CR31","unstructured":"Kool W, van Hoof H, Welling M (2019) Buy 4 reinforce samples, get a baseline for free! In: ICLR 2019 Deep Reinforcement Learning meets Structured Prediction Workshop. https:\/\/openreview.net\/forum?id=r1lgTGL5DE.\u00a0Accessed 23 Jun 2022"},{"key":"5367_CR32","doi-asserted-by":"publisher","unstructured":"Kool W, van Hoof H, Welling M (2019) Stochastic beams and where to find them: The gumbel-top-k trick for sampling sequences without replacement. In: International Conference on Machine Learning (ICML), 3499\u2013508. PMLR. https:\/\/doi.org\/10.48550\/arXiv.1903.06059","DOI":"10.48550\/arXiv.1903.06059"},{"key":"5367_CR33","doi-asserted-by":"crossref","unstructured":"Croes GA (1958) A method for solving traveling-salesman problems. Oper Res 6:791\u2013812.\u00a0https:\/\/www.jstor.org\/stable\/167074.\u00a0Accessed 23 Jun 2022","DOI":"10.1287\/opre.6.6.791"},{"key":"5367_CR34","unstructured":"Gurobi Optimization, LLC (2018) Gurobi optimizer reference manual. https:\/\/www.gurobi.com"},{"key":"5367_CR35","doi-asserted-by":"publisher","first-page":"229","DOI":"10.1007\/BF00992696","volume":"8","author":"RJ Williams","year":"1992","unstructured":"Williams RJ (1992) Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach Learn 8:229\u201356. https:\/\/doi.org\/10.1007\/BF00992696","journal-title":"Mach Learn"},{"key":"5367_CR36","volume-title":"Reinforcement learning: An introduction","author":"RS Sutton","year":"2018","unstructured":"Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. Cambridge, MIT press"},{"key":"5367_CR37","doi-asserted-by":"publisher","unstructured":"Sultana N, Chan J, Sarwar T, Qin AK (2021) Learning to Optimise Routing Problems using Policy Optimisation. In: 2021 International Joint Conference on Neural Networks (IJCNN), 1\u20138. IEEE. https:\/\/doi.org\/10.1109\/IJCNN52387.2021.9534010","DOI":"10.1109\/IJCNN52387.2021.9534010"},{"key":"5367_CR38","doi-asserted-by":"publisher","unstructured":"Kingma DP, Ba J (2014) Adam: A method for stochastic optimization. In. https:\/\/doi.org\/10.48550\/arXiv.1412.6980","DOI":"10.48550\/arXiv.1412.6980"},{"key":"5367_CR39","doi-asserted-by":"crossref","unstructured":"Tsiligirides T (1984) Heuristic methods applied to orienteering. J Oper Res Soc 35:797\u2013809. https:\/\/www.jstor.org\/stable\/2582629.\u00a0Accessed 23 Jun 2022","DOI":"10.1057\/jors.1984.162"}],"container-title":["Applied Intelligence"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10489-024-05367-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10489-024-05367-4\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10489-024-05367-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,4,29]],"date-time":"2024-04-29T13:43:39Z","timestamp":1714398219000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10489-024-05367-4"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,3]]},"references-count":39,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2024,3]]}},"alternative-id":["5367"],"URL":"https:\/\/doi.org\/10.1007\/s10489-024-05367-4","relation":{},"ISSN":["0924-669X","1573-7497"],"issn-type":[{"value":"0924-669X","type":"print"},{"value":"1573-7497","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,3]]},"assertion":[{"value":"26 February 2024","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"27 March 2024","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors have no competing interests to disclose.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflicts of interest"}},{"value":"Our DDTM implementation and training methodology code based on the instance-augmentation baseline with maximum entropy is publicly available at .","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Code"}}]}}