{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,26]],"date-time":"2026-04-26T05:40:41Z","timestamp":1777182041262,"version":"3.51.4"},"reference-count":60,"publisher":"Springer Science and Business Media LLC","issue":"6","license":[{"start":{"date-parts":[[2022,10,29]],"date-time":"2022-10-29T00:00:00Z","timestamp":1667001600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/www.springernature.com\/gp\/researchers\/text-and-data-mining"},{"start":{"date-parts":[[2022,10,29]],"date-time":"2022-10-29T00:00:00Z","timestamp":1667001600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.springernature.com\/gp\/researchers\/text-and-data-mining"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Neural Comput & Applic"],"published-print":{"date-parts":[[2023,2]]},"DOI":"10.1007\/s00521-022-07989-6","type":"journal-article","created":{"date-parts":[[2022,10,29]],"date-time":"2022-10-29T15:04:06Z","timestamp":1667055846000},"page":"4723-4738","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":9,"title":["Mastering construction heuristics with self-play deep reinforcement 
learning"],"prefix":"10.1007","volume":"35","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3249-8459","authenticated-orcid":false,"given":"Qi","family":"Wang","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yuqing","family":"He","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chunlei","family":"Tang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2022,10,29]]},"reference":[{"key":"7989_CR1","doi-asserted-by":"publisher","first-page":"319","DOI":"10.1109\/TEVC.2011.2163638","volume":"16","author":"A Pr\u00fcgel-Bennett","year":"2012","unstructured":"Pr\u00fcgel-Bennett A, Tayarani-Najaran MH (2012) Maximum satisfiability: anatomy of the fitness landscape for a hard combinatorial optimization problem. IEEE Trans Evol Comput 16:319\u2013338. https:\/\/doi.org\/10.1109\/TEVC.2011.2163638","journal-title":"IEEE Trans Evol Comput"},{"key":"7989_CR2","doi-asserted-by":"publisher","first-page":"165","DOI":"10.1109\/TEVC.2015.2433680","volume":"20","author":"L Hernando","year":"2016","unstructured":"Hernando L, Mendiburu A, Lozano JA (2016) A tunable generator of instances of permutation-based combinatorial optimization problems. IEEE Trans Evol Comput 20:165\u2013179. https:\/\/doi.org\/10.1109\/TEVC.2015.2433680","journal-title":"IEEE Trans Evol Comput"},{"key":"7989_CR3","unstructured":"Garey MR, Johnson DS (1979) Computers, Complexity, and Intractability. A Guid. to Theory NPCompleteness, vol 115"},{"key":"7989_CR4","doi-asserted-by":"publisher","first-page":"1583","DOI":"10.1109\/TITS.2020.2972389","volume":"22","author":"X Xu","year":"2021","unstructured":"Xu X, Li J, Zhou MC (2021) Delaunay-triangulation-based variable neighborhood search to solve large-scale general colored traveling salesman problems. IEEE Trans Intell Transp Syst 22:1583\u20131593. 
https:\/\/doi.org\/10.1109\/TITS.2020.2972389","journal-title":"IEEE Trans Intell Transp Syst"},{"key":"7989_CR5","doi-asserted-by":"publisher","first-page":"3775","DOI":"10.1007\/s00500-020-05406-5","volume":"25","author":"N Rokbani","year":"2021","unstructured":"Rokbani N, Kumar R, Abraham A, Alimi AM, Long HV, Priyadarshini I, Son LH (2021) Bi-heuristic ant colony optimization-based approaches for traveling salesman problem. Soft Comput 25:3775\u20133794. https:\/\/doi.org\/10.1007\/s00500-020-05406-5","journal-title":"Soft Comput"},{"key":"7989_CR6","doi-asserted-by":"publisher","first-page":"3806","DOI":"10.1109\/TITS.2019.2909109","volume":"20","author":"JJQ Yu","year":"2019","unstructured":"Yu JJQ, Yu W, Gu J (2019) Online vehicle routing with neural combinatorial optimization and deep reinforcement learning. IEEE Trans Intell Transp Syst 20:3806\u20133817. https:\/\/doi.org\/10.1109\/TITS.2019.2909109","journal-title":"IEEE Trans Intell Transp Syst"},{"key":"7989_CR7","doi-asserted-by":"publisher","first-page":"1654","DOI":"10.1109\/TITS.2015.2395536","volume":"16","author":"G Kim","year":"2015","unstructured":"Kim G, Ong YS, Heng CK, Tan PS, Zhang NA (2015) City vehicle routing problem (city VRP): a review. IEEE Trans Intell Transp Syst 16:1654\u20131666. https:\/\/doi.org\/10.1109\/TITS.2015.2395536","journal-title":"IEEE Trans Intell Transp Syst"},{"key":"7989_CR8","unstructured":"Goyal S (2010) A survey on travelling salesman problem. In: Midwest instruction and computing symposium, pp 1\u20139"},{"key":"7989_CR9","doi-asserted-by":"publisher","first-page":"32","DOI":"10.1016\/j.cor.2019.03.006","volume":"107","author":"F Arnold","year":"2019","unstructured":"Arnold F, Gendreau M, S\u00f6rensen K (2019) Efficiently solving very large-scale routing problems. Comput Oper Res 107:32\u201342. 
https:\/\/doi.org\/10.1016\/j.cor.2019.03.006","journal-title":"Comput Oper Res"},{"key":"7989_CR10","doi-asserted-by":"publisher","first-page":"436","DOI":"10.1038\/nature14539","volume":"521","author":"Y Lecun","year":"2015","unstructured":"Lecun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521:436\u2013444. https:\/\/doi.org\/10.1038\/nature14539","journal-title":"Nature"},{"key":"7989_CR11","doi-asserted-by":"publisher","first-page":"580","DOI":"10.1038\/s41586-020-03157-9","volume":"590","author":"A Ecoffet","year":"2021","unstructured":"Ecoffet A, Huizinga J, Lehman J, Stanley KO, Clune J (2021) First return, then explore. Nature 590:580\u2013586. https:\/\/doi.org\/10.1038\/s41586-020-03157-9","journal-title":"Nature"},{"key":"7989_CR12","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2021.107526","author":"Q Wang","year":"2021","unstructured":"Wang Q, Tang C (2021) Deep reinforcement learning for transportation network combinatorial optimization: a survey. Knowl-Based Syst. https:\/\/doi.org\/10.1016\/j.knosys.2021.107526","journal-title":"Knowl-Based Syst"},{"key":"7989_CR13","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1109\/TCIAIG.2012.2186810","volume":"4","author":"CB Browne","year":"2012","unstructured":"Browne CB, Powley E, Whitehouse D, Lucas SM, Cowling PI, Rohlfshagen P, Tavener S, Perez D, Samothrakis S, Colton S (2012) A survey of Monte Carlo tree search methods. IEEE Trans Comput Intell AI Games 4:1\u201343. 
https:\/\/doi.org\/10.1109\/TCIAIG.2012.2186810","journal-title":"IEEE Trans Comput Intell AI Games"},{"key":"7989_CR14","doi-asserted-by":"publisher","first-page":"484","DOI":"10.1038\/nature16961","volume":"529","author":"D Silver","year":"2016","unstructured":"Silver D, Huang A, Maddison CJ, Guez A, Sifre L, Van Den Driessche G, Schrittwieser J, Antonoglou I, Panneershelvam V, Lanctot M, Dieleman S, Grewe D, Nham J, Kalchbrenner N, Sutskever I, Lillicrap T, Leach M, Kavukcuoglu K, Graepel T, Hassabis D (2016) Mastering the game of Go with deep neural networks and tree search. Nature 529:484\u2013489. https:\/\/doi.org\/10.1038\/nature16961","journal-title":"Nature"},{"key":"7989_CR15","doi-asserted-by":"publisher","first-page":"350","DOI":"10.1038\/s41586-019-1724-z","volume":"575","author":"O Vinyals","year":"2019","unstructured":"Vinyals O, Babuschkin I, Czarnecki WM, Mathieu M, Dudzik A, Chung J, Choi DH, Powell R, Ewalds T, Georgiev P, Oh J, Horgan D, Kroiss M, Danihelka I, Huang A, Sifre L, Cai T, Agapiou JP, Jaderberg M, Vezhnevets AS, Leblond R, Pohlen T, Dalibard V, Budden D, Sulsky Y, Molloy J, Paine TL, Gulcehre C, Wang Z, Pfaff T, Wu Y, Ring R, Yogatama D, W\u00fcnsch D, McKinney K, Smith O, Schaul T, Lillicrap T, Kavukcuoglu K, Hassabis D, Apps C, Silver D (2019) Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575:350\u2013354. https:\/\/doi.org\/10.1038\/s41586-019-1724-z","journal-title":"Nature"},{"key":"7989_CR16","doi-asserted-by":"publisher","first-page":"604","DOI":"10.1038\/s41586-020-03051-4","volume":"588","author":"J Schrittwieser","year":"2020","unstructured":"Schrittwieser J, Antonoglou I, Hubert T, Simonyan K, Sifre L, Schmitt S, Guez A, Lockhart E, Hassabis D, Graepel T, Lillicrap T, Silver D (2020) Mastering Atari, Go, chess and shogi by planning with a learned model. Nature 588:604\u2013609. 
https:\/\/doi.org\/10.1038\/s41586-020-03051-4","journal-title":"Nature"},{"key":"7989_CR17","doi-asserted-by":"publisher","first-page":"354","DOI":"10.1038\/nature24270","volume":"550","author":"D Silver","year":"2017","unstructured":"Silver D, Schrittwieser J, Simonyan K, Antonoglou I, Huang A, Guez A, Hubert T, Baker L, Lai M, Bolton A, Chen Y, Lillicrap T, Hui F, Sifre L, Van Den Driessche G, Graepel T, Hassabis D (2017) Mastering the game of Go without human knowledge. Nature 550:354\u2013359. https:\/\/doi.org\/10.1038\/nature24270","journal-title":"Nature"},{"key":"7989_CR18","doi-asserted-by":"publisher","first-page":"1140","DOI":"10.1126\/science.aar6404","volume":"362","author":"D Silver","year":"2018","unstructured":"Silver D, Hubert T, Schrittwieser J, Antonoglou I, Lai M, Guez A, Lanctot M, Sifre L, Kumaran D, Graepel T, Lillicrap T, Simonyan K, Hassabis D (2018) A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science (80-.) 362:1140\u20131144. https:\/\/doi.org\/10.1126\/science.aar6404","journal-title":"Science (80-.)"},{"key":"7989_CR19","doi-asserted-by":"publisher","DOI":"10.1007\/978-981-15-4095-0_4","author":"Y Huang","year":"2020","unstructured":"Huang Y (2020) Deep Q-networks. Deep Reinf Learn Fundam Res Appl. https:\/\/doi.org\/10.1007\/978-981-15-4095-0_4","journal-title":"Deep Reinf Learn Fundam Res Appl"},{"key":"7989_CR20","doi-asserted-by":"publisher","first-page":"107526","DOI":"10.1016\/j.knosys.2021.107526","volume":"233","author":"Q Wang","year":"2021","unstructured":"Wang Q, Tang C (2021) Deep reinforcement learning for transportation network combinatorial optimization: a survey. Knowl-Based Syst 233:107526. 
https:\/\/doi.org\/10.1016\/j.knosys.2021.107526","journal-title":"Knowl-Based Syst"},{"key":"7989_CR21","doi-asserted-by":"publisher","DOI":"10.1007\/s10489-021-02920-3","author":"Q Wang","year":"2021","unstructured":"Wang Q (2021) VARL: a variational autoencoder-based reinforcement learning Framework for vehicle routing problems. Appl Intell. https:\/\/doi.org\/10.1007\/s10489-021-02920-3","journal-title":"Appl Intell"},{"key":"7989_CR22","doi-asserted-by":"publisher","first-page":"405","DOI":"10.1016\/j.ejor.2020.07.063","volume":"290","author":"Y Bengio","year":"2021","unstructured":"Bengio Y, Lodi A, Prouvost A (2021) Machine learning for combinatorial optimization: a methodological tour d\u2019horizon. Eur J Oper Res 290:405\u2013421. https:\/\/doi.org\/10.1016\/j.ejor.2020.07.063","journal-title":"Eur J Oper Res"},{"key":"7989_CR23","doi-asserted-by":"publisher","first-page":"207","DOI":"10.1007\/s11750-017-0451-6","volume":"25","author":"A Lodi","year":"2017","unstructured":"Lodi A, Zarpellon G (2017) On learning and branching: a survey. TOP 25:207\u2013236. https:\/\/doi.org\/10.1007\/s11750-017-0451-6","journal-title":"TOP"},{"key":"7989_CR24","doi-asserted-by":"publisher","first-page":"487","DOI":"10.1016\/S0166-218X(01)00351-1","volume":"123","author":"P Toth","year":"2002","unstructured":"Toth P, Vigo D (2002) Models, relaxations and exact approaches for the capacitated vehicle routing problem. Discret Appl Math 123:487\u2013512. https:\/\/doi.org\/10.1016\/S0166-218X(01)00351-1","journal-title":"Discret Appl Math"},{"key":"7989_CR25","unstructured":"Gasse M, Ch\u00e9telat D, Ferroni N, Charlin L, Lodi A (2019) Exact combinatorial optimization with graph convolutional neural networks"},{"key":"7989_CR26","doi-asserted-by":"publisher","unstructured":"Ene A, Nagarajan V, Saket R (2018) Approximation algorithms for stochastic k-TSP. In: Leibniz International Proceedings in Informatics, LIPIcs, vol 93, pp 1\u201311. 
https:\/\/doi.org\/10.4230\/LIPIcs.FSTTCS.2017.27","DOI":"10.4230\/LIPIcs.FSTTCS.2017.27"},{"key":"7989_CR27","first-page":"1","volume":"32","author":"R Sato","year":"2019","unstructured":"Sato R, Yamada M, Kashima H (2019) Approximation ratios of graph neural networks for combinatorial problems. Adv Neural Inf Process Syst 32:1\u201315","journal-title":"Adv Neural Inf Process Syst"},{"key":"7989_CR28","doi-asserted-by":"publisher","first-page":"2222","DOI":"10.1109\/TNNLS.2019.2927480","volume":"31","author":"F Sheldon","year":"2020","unstructured":"Sheldon F, Cicotti P, Traversa FL, Di Ventra M (2020) Stress-testing memcomputing on hard combinatorial optimization problems. IEEE Trans Neural Netw Learn Syst 31:2222\u20132226. https:\/\/doi.org\/10.1109\/TNNLS.2019.2927480","journal-title":"IEEE Trans Neural Netw Learn Syst"},{"key":"7989_CR29","doi-asserted-by":"publisher","first-page":"66","DOI":"10.4236\/iim.2012.43010","volume":"04","author":"SN Kumar","year":"2012","unstructured":"Kumar SN, Panneerselvam R (2012) A survey on the vehicle routing problem and its variants. Intell Inf Manag 04:66\u201374. https:\/\/doi.org\/10.4236\/iim.2012.43010","journal-title":"Intell Inf Manag"},{"key":"7989_CR30","doi-asserted-by":"publisher","unstructured":"Helsgaun K (2000) Effective implementation of the Lin\u2013Kernighan traveling salesman heuristic. https:\/\/doi.org\/10.1016\/S0377-2217(99)00284-2","DOI":"10.1016\/S0377-2217(99)00284-2"},{"key":"7989_CR31","doi-asserted-by":"crossref","unstructured":"Zheng J, He K, Zhou J, Jin Y, Li C-M (2020) Combining reinforcement learning with Lin\u2013Kernighan\u2013Helsgaun algorithm for the traveling salesman problem. In: Proceedings of the AAAI conference on artificial intelligence","DOI":"10.1609\/aaai.v35i14.17476"},{"key":"7989_CR32","first-page":"3104","volume":"4","author":"I Sutskever","year":"2014","unstructured":"Sutskever I, Vinyals O, Le QV (2014) Sequence to sequence learning with neural networks. 
Adv Neural Inf Process Syst 4:3104\u20133112","journal-title":"Adv Neural Inf Process Syst"},{"key":"7989_CR33","first-page":"2692","volume":"28","author":"O Vinyals","year":"2015","unstructured":"Vinyals O, Fortunato M, Jaitly N (2015) Pointer networks. Adv Neural Inf Process Syst 28:2692\u20132700","journal-title":"Adv Neural Inf Process Syst"},{"key":"7989_CR34","unstructured":"Bello I, Pham H, Le QV, Norouzi M, Bengio S (2019) Neural combinatorial optimization with reinforcement learning. In: 5th international conference on learning representations, ICLR 2017\u2014workshop track proceedings, pp 1\u201315"},{"key":"7989_CR35","unstructured":"Ivanov S, D\u2019yakonov A (2019) Modern deep reinforcement learning algorithms"},{"key":"7989_CR36","first-page":"9839","volume":"31","author":"M Nazari","year":"2018","unstructured":"Nazari M, Oroojlooy A, Tak\u00e1\u010d M, Snyder LV (2018) Reinforcement learning for solving the vehicle routing problem. Adv Neural Inf Process Syst 31:9839\u20139849","journal-title":"Adv Neural Inf Process Syst"},{"key":"7989_CR37","first-page":"5999","volume":"30","author":"A Vaswani","year":"2017","unstructured":"Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser \u0141, Polosukhin I (2017) Attention is all you need. Adv Neural Inf Process Syst 30:5999\u20136009","journal-title":"Adv Neural Inf Process Syst"},{"key":"7989_CR38","unstructured":"Kool W, Van Hoof H, Welling M (2019) Attention, learn to solve routing problems! In: 7th International conference on learning representations. ICLR 2019, pp 1\u201325"},{"key":"7989_CR39","unstructured":"Veli\u010dkovi\u0107 P, Casanova A, Li\u00f2 P, Cucurull G, Romero A, Bengio Y (2018) Graph attention networks. In: 6th International conference on learning representations. 
ICLR 2018\u2014conference track proceedings, pp 1\u201312"},{"key":"7989_CR40","unstructured":"Battaglia PW, Hamrick JB, Bapst V, Sanchez-Gonzalez A, Zambaldi V, Malinowski M, Tacchetti A, Raposo D, Santoro A, Faulkner R, Gulcehre C, Song F, Ballard A, Gilmer J, Dahl G, Vaswani A, Allen K, Nash C, Langston V, Dyer C, Heess N, Wierstra D, Kohli P, Botvinick M, Vinyals O, Li Y, Pascanu R (2018) Relational inductive biases, deep learning, and graph networks, pp 1\u201338"},{"key":"7989_CR41","unstructured":"Xu K, Jegelka S, Hu W, Leskovec J (2019) How powerful are graph neural networks? In: 7th International conference on learning representations. ICLR 2019"},{"key":"7989_CR42","doi-asserted-by":"publisher","first-page":"833","DOI":"10.1109\/TKDE.2018.2849727","volume":"31","author":"P Cui","year":"2019","unstructured":"Cui P, Wang X, Pei J, Zhu W (2019) A survey on network embedding. IEEE Trans Knowl Data Eng 31:833\u2013852. https:\/\/doi.org\/10.1109\/TKDE.2018.2849727","journal-title":"IEEE Trans Knowl Data Eng"},{"key":"7989_CR43","first-page":"6349","volume":"30","author":"H Dai","year":"2017","unstructured":"Dai H, Khalil EB, Zhang Y, Dilkina B, Song L (2017) Learning combinatorial optimization algorithms over graphs. Adv Neural Inf Process Syst 30:6349\u20136359","journal-title":"Adv Neural Inf Process Syst"},{"key":"7989_CR44","unstructured":"Wu F, Zhang T, de Souza AH, Fifty C, Yu T, Weinberger KQ (2019) Simplifying graph convolutional networks. In: 36th International conference on machine learning. ICML 2019. 2019-June, pp 11884\u201311894"},{"key":"7989_CR45","first-page":"539","volume":"31","author":"Z Li","year":"2018","unstructured":"Li Z, Chen Q, Koltun V (2018) Combinatorial optimization with graph convolutional networks and guided tree search. 
Adv Neural Inf Process Syst 31:539\u2013548","journal-title":"Adv Neural Inf Process Syst"},{"key":"7989_CR46","unstructured":"Manchanda S, Mittal A, Dhawan A, Medya S, Ranu S, Singh A (2019) Learning heuristics over large graphs via deep reinforcement learning. Assoc Adv Artif Intell"},{"key":"7989_CR47","unstructured":"Joshi CK, Laurent T, Bresson X (2019) An efficient graph convolutional network technique for the travelling salesman problem, pp 1\u201317"},{"key":"7989_CR48","doi-asserted-by":"publisher","unstructured":"Drori I, Kharkar A, Sickinger WR, Kates B, Ma Q, Ge S, Dolev E, Dietrich B, Williamson DP, Udell M (2020) Learning to solve combinatorial optimization problems on real-world graphs in linear time. In: Proceedings\u201419th IEEE international conference on machine learning and applications. ICMLA 2020, pp 19\u201324. https:\/\/doi.org\/10.1109\/ICMLA51294.2020.00013","DOI":"10.1109\/ICMLA51294.2020.00013"},{"key":"7989_CR49","doi-asserted-by":"publisher","unstructured":"Duan L, Zhan Y, Hu H, Gong Y, Wei J, Zhang X, Xu Y (2020) Efficiently solving the practical vehicle routing problem: a novel joint learning approach. In: Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 3054\u20133063 https:\/\/doi.org\/10.1145\/3394486.3403356","DOI":"10.1145\/3394486.3403356"},{"key":"7989_CR50","unstructured":"Ma Q, Ge S, He D, Thaker D, Drori I (2019) Combinatorial optimization by graph pointer networks and hierarchical reinforcement learning"},{"key":"7989_CR51","doi-asserted-by":"crossref","first-page":"1","DOI":"10.18196\/iclr.v3i1.11454","volume":"3","author":"H Lu","year":"2020","unstructured":"Lu H, Zhang X, Yang S (2020) A Learning-based iterative method for solving vehicle routing problems. 
ICLR 3:1\u201315","journal-title":"ICLR"},{"key":"7989_CR52","unstructured":"Huang J, Patwary M, Diamos G (2019) Coloring big graphs with AlphaGoZero"},{"key":"7989_CR53","unstructured":"Silver D, Hubert T, Schrittwieser J, Antonoglou I, Lai M, Guez A, Lanctot M, Sifre L, Kumaran D, Graepel T, Lillicrap T, Simonyan K, Hassabis D (2017) Mastering chess and shogi by self-play with a general reinforcement learning algorithm, pp 1\u201319"},{"key":"7989_CR54","unstructured":"Laterre A, Fu Y, Jabri MK, Cohen A-S, Kas D, Hajjar K, Dahl TS, Kerkeni A, Beguir K (2018) Ranked reward: enabling self-play reinforcement learning for combinatorial optimization"},{"key":"7989_CR55","unstructured":"Abe K, Xu Z, Sato I, Sugiyama M (2019) Solving NP-hard problems on graphs with extended AlphaGo Zero, pp 1\u201323"},{"key":"7989_CR56","doi-asserted-by":"publisher","DOI":"10.1109\/tkde.2020.2981333","author":"Z Zhang","year":"2020","unstructured":"Zhang Z, Cui P, Zhu W (2020) Deep learning on graphs: a survey. IEEE Trans Knowl Data Eng. https:\/\/doi.org\/10.1109\/tkde.2020.2981333","journal-title":"IEEE Trans Knowl Data Eng"},{"key":"7989_CR57","first-page":"4863","volume":"31","author":"C Jin","year":"2018","unstructured":"Jin C, Allen-Zhu Z, Bubeck S, Jordan MI (2018) Is Q-learning provably efficient? Adv Neural Inf Process Syst 31:4863\u20134873","journal-title":"Adv Neural Inf Process Syst"},{"key":"7989_CR58","unstructured":"Hausknecht M, Stone P (2015) Deep recurrent q-learning for partially observable MDPs. In: AAAI fall symposium\u2014technical report. FS-15-06, pp 29\u201337"},{"key":"7989_CR59","first-page":"5361","volume":"30","author":"T Anthony","year":"2017","unstructured":"Anthony T, Tian Z, Barber D (2017) Thinking fast and slow with deep learning and tree search. 
Adv Neural Inf Process Syst 30:5361\u20135371","journal-title":"Adv Neural Inf Process Syst"},{"key":"7989_CR60","doi-asserted-by":"publisher","unstructured":"Wu TR, Wei TH, Wu IC (2020) Accelerating and improving alphazero using population based training. https:\/\/doi.org\/10.1609\/aaai.v34i01.5454","DOI":"10.1609\/aaai.v34i01.5454"}],"container-title":["Neural Computing and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00521-022-07989-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s00521-022-07989-6\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00521-022-07989-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,29]],"date-time":"2023-01-29T08:16:56Z","timestamp":1674980216000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s00521-022-07989-6"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,10,29]]},"references-count":60,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2023,2]]}},"alternative-id":["7989"],"URL":"https:\/\/doi.org\/10.1007\/s00521-022-07989-6","relation":{},"ISSN":["0941-0643","1433-3058"],"issn-type":[{"value":"0941-0643","type":"print"},{"value":"1433-3058","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,10,29]]},"assertion":[{"value":"8 December 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"20 October 2022","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"29 October 2022","order":3,"name":"first_online","label":"First 
Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"We wish to confirm no known conflicts of interest associated with this publication. There has been no significant financial support for this work that could have influenced its outcome. We confirm that the manuscript has been read and approved by all named authors and that there are no other persons who satisfied the criteria for authorship but are not listed. We further confirm that all have approved the order of authors listed in our manuscript. We confirm that we have given due consideration to the protection of intellectual property associated with this work. There are no impediments to publication, including the timing of publication, concerning intellectual property. In so doing, we confirm that we have followed the regulations of our institutions concerning intellectual property. We understand that the Corresponding Author is the sole contact for the Editorial process (including Editorial Manager and direct communications with the office). They are responsible for communicating with the other authors about progress, submissions of revisions, and final approval of proofs. We confirm that we have provided a current, correct email address accessible by the Corresponding Author and configured to accept email from 17110240039@fudan.edu.cn.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}},{"value":"Our study did not raise any ethical questions, i.e., none of the subjects were humans or living individuals. This paper only focuses on combinatorial optimization of graphs in computer science. The technologies used are all modern computer technologies, including deep learning, reinforcement learning, and Monte Carlo tree search. 
Our research belongs to theoretical and application innovation in computer science, so it does not involve ethical and moral issues.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethical approval"}},{"value":"All authors are aware of this article and agree to its submission.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Informed consent"}}]}}