{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,2]],"date-time":"2026-03-02T17:14:20Z","timestamp":1772471660302,"version":"3.50.1"},"reference-count":75,"publisher":"Springer Science and Business Media LLC","issue":"13","license":[{"start":{"date-parts":[[2024,2,22]],"date-time":"2024-02-22T00:00:00Z","timestamp":1708560000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/www.springernature.com\/gp\/researchers\/text-and-data-mining"},{"start":{"date-parts":[[2024,2,22]],"date-time":"2024-02-22T00:00:00Z","timestamp":1708560000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.springernature.com\/gp\/researchers\/text-and-data-mining"}],"funder":[{"name":"Open Fund\/Postdoctoral Fund of the Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences","award":["CASIA-KFKT-11"],"award-info":[{"award-number":["CASIA-KFKT-11"]}]},{"DOI":"10.13039\/501100019602","name":"Collaborative Innovation Center of Audit Information Engineering and Technology","doi-asserted-by":"publisher","award":["SC-2023-039"],"award-info":[{"award-number":["SC-2023-039"]}],"id":[{"id":"10.13039\/501100019602","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Neural Comput & Applic"],"published-print":{"date-parts":[[2024,5]]},"DOI":"10.1007\/s00521-024-09455-x","type":"journal-article","created":{"date-parts":[[2024,2,22]],"date-time":"2024-02-22T18:02:40Z","timestamp":1708624960000},"page":"7203-7219","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["From mimic to counteract: a two-stage reinforcement learning algorithm for Google research 
football"],"prefix":"10.1007","volume":"36","author":[{"given":"Junjie","family":"Zhao","sequence":"first","affiliation":[]},{"given":"Jiangwen","family":"Lin","sequence":"additional","affiliation":[]},{"given":"Xinyan","family":"Zhang","sequence":"additional","affiliation":[]},{"given":"Yuanbai","family":"Li","sequence":"additional","affiliation":[]},{"given":"Xianzhong","family":"Zhou","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4897-2007","authenticated-orcid":false,"given":"Yuxiang","family":"Sun","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,2,22]]},"reference":[{"key":"9455_CR1","unstructured":"Agarwal R, Schuurmans D, Norouzi M (2019) Striving for simplicity in off-policy deep reinforcement learning. CoRR. arXiv:1907.04543"},{"key":"9455_CR2","unstructured":"Mnih V, Kavukcuoglu K, Silver D, Graves A, Antonoglou I, Wierstra D, Riedmiller MA (2013) Playing atari with deep reinforcement learning. CoRR. arXiv:1312.5602"},{"key":"9455_CR3","doi-asserted-by":"publisher","DOI":"10.1038\/s41586-019-1724-z","author":"O Vinyals","year":"2019","unstructured":"Vinyals O, Babuschkin I, Czarnecki W, Mathieu M, Dudzik A, Chung J, Choi D, Powell R, Ewalds T, Georgiev P, Oh J, Horgan D, Kroiss M, Danihelka I, Huang A, Sifre L, Cai T, Agapiou J, Jaderberg M, Silver D (2019) Grandmaster level in starcraft ii using multi-agent reinforcement learning. Nature. https:\/\/doi.org\/10.1038\/s41586-019-1724-z","journal-title":"Nature"},{"key":"9455_CR4","doi-asserted-by":"crossref","unstructured":"Kurach K, Raichuk A, Stanczyk P, Zajac M, Bachem O, Espeholt L, Riquelme C, Vincent D, Michalski M, Bousquet O, Gelly S (2019) Google research football: a novel reinforcement learning environment. CoRR. 
arXiv:1907.11180","DOI":"10.1609\/aaai.v34i04.5878"},{"key":"9455_CR5","unstructured":"Berner C, Brockman G, Chan B, Cheung V, Debiak P, Dennison C, Farhi D, Fischer Q, Hashme S, Hesse C, J\u00f3zefowicz R, Gray S, Olsson C, Pachocki J, Petrov M, Oliveira\u00a0Pinto HP, Raiman J, Salimans T, Schlatter J, Schneider J, Sidor S, Sutskever I, Tang J, Wolski F, Zhang S (2019) Dota 2 with large scale deep reinforcement learning. CoRR. arXiv:1912.06680"},{"key":"9455_CR6","unstructured":"Rashid T, Samvelyan M, Witt CS, Farquhar G, Foerster JN, Whiteson S (2018) QMIX: monotonic value function factorisation for deep multi-agent reinforcement learning. CoRR. arXiv:1803.11485"},{"key":"9455_CR7","unstructured":"Mahajan A, Rashid T, Samvelyan M, Whiteson S (2019) MAVEN: multi-agent variational exploration. CoRR. arXiv:1910.07483"},{"key":"9455_CR8","unstructured":"Yu C, Velu A, Vinitsky E, Wang Y, Bayen AM, Wu Y (2021) The surprising effectiveness of MAPPO in cooperative, multi-agent games. CoRR. arXiv:2103.01955"},{"key":"9455_CR9","unstructured":"Ta\u00efga AA, Fedus W, Machado MC, Courville AC, Bellemare MG (2021) On bonus-based exploration methods in the arcade learning environment. CoRR. arXiv:2109.11052"},{"key":"9455_CR10","unstructured":"Zhang T, Xu H, Wang X, Wu Y, Keutzer K, Gonzalez JE. Tian Y (2020) Bebold: Exploration beyond the boundary of explored regions. CoRR. 
arXiv:2012.08621"},{"key":"9455_CR11","doi-asserted-by":"crossref","unstructured":"Zhao R, Song J, Yuan Y, Haifeng H, Gao Y, Wu Y, Sun Z, Wei Y (2022) Maximum entropy population-based training for zero-shot human-AI coordination","DOI":"10.1609\/aaai.v37i5.25758"},{"key":"9455_CR12","unstructured":"Kapturowski S, Campos V, Jiang R, Raki\u0107evi\u0107 N, Hasselt H, Blundell C, Badia AP (2022) Human-level Atari 200x faster"},{"key":"9455_CR13","doi-asserted-by":"publisher","first-page":"484","DOI":"10.1038\/nature16961","volume":"529","author":"D Silver","year":"2016","unstructured":"Silver D, Huang A, Maddison CJ, Guez A, Sifre L, Driessche G, Schrittwieser J, Antonoglou I, Panneershelvam V, Lanctot M, Dieleman S, Grewe D, Nham J, Kalchbrenner N, Sutskever I, Lillicrap TP, Leach M, Kavukcuoglu K, Graepel T, Hassabis D (2016) Mastering the game of go with deep neural networks and tree search. Nature 529:484\u2013489","journal-title":"Nature"},{"key":"9455_CR14","unstructured":"Cobbe K, Hesse C, Hilton J, Schulman J (2020) Leveraging procedural generation to benchmark reinforcement learning"},{"key":"9455_CR15","doi-asserted-by":"crossref","unstructured":"Ye D, Chen G, Zhang W, Chen S, Yuan B, Liu B, Chen J, Liu Z, Qiu F, Yu H, Yin Y, Shi B, Wang L, Shi T, Fu Q, Yang W, Huang L, Liu W (2020) Towards playing full MOBA games with deep reinforcement learning","DOI":"10.1609\/aaai.v34i04.6144"},{"key":"9455_CR16","unstructured":"Huang S, Chen W, Zhang L, Li Z, Zhu F, Ye D, Chen T, Zhu J (2021) Tikick: towards playing multi-agent football full games from single-agent demonstrations. CoRR. arXiv:2110.04507"},{"key":"9455_CR17","unstructured":"Lin F, Huang S, Pearce T, Chen W, Tu W-W (2023) TiZero: mastering multi-agent football with curriculum learning and self-play"},{"key":"9455_CR18","unstructured":"Liu X, Jia H, Wen Y, Yang Y, Hu Y, Chen Y, Fan C, Hu Z (2021) Unifying behavioral and response diversity for open-ended learning in zero-sum games. CoRR. 
arXiv:2106.04958"},{"key":"9455_CR19","unstructured":"Li C, Wu C, Wang T, Yang J, Zhao Q, Zhang C (2021) Celebrating diversity in shared multi-agent reinforcement learning. CoRR. arXiv:2106.02195"},{"key":"9455_CR20","unstructured":"Yang Y, Wang J (2021) An overview of multi-agent reinforcement learning from game theoretical perspective"},{"key":"9455_CR21","doi-asserted-by":"crossref","unstructured":"Kajii Y, Yamada K (2017) Multi-agent reinforcement learning. In: The proceedings of JSME annual conference on robotics and mechatronics (Robomec), pp 2\u2013109","DOI":"10.1299\/jsmermd.2017.2A1-G09"},{"key":"9455_CR22","unstructured":"Uddin\u00a0Mondal W, Aggarwal V, Ukkusuri SV (2022) Mean-field approximation of cooperative constrained multi-agent reinforcement learning (CMARL)"},{"key":"9455_CR23","doi-asserted-by":"crossref","unstructured":"Galliera R, Venable KB, Bassani M, Suri N (2023) Learning collaborative information dissemination with graph-based multi-agent reinforcement learning","DOI":"10.1007\/978-3-031-73903-3_11"},{"key":"9455_CR24","unstructured":"Mishra S, Anand A, Hoffmann J, Heess N, Riedmiller M, Abdolmaleki A, Precup D (2023) Policy composition in reinforcement learning via multi-objective policy optimization"},{"key":"9455_CR25","unstructured":"Maria, Grazia, Vigliotti: decentralized execution of constraint handling rules for ensembles. 
Comput Rev (2014)"},{"key":"9455_CR26","unstructured":"Rashid T, Samvelyan M, De\u00a0Witt CS, Farquhar G, Foerster J, Whiteson S (2018) Qmix: monotonic value function factorisation for deep multi-agent reinforcement learning"},{"key":"9455_CR27","unstructured":"Schreuder N, Brunel V-E, Dalalyan A (2020) Statistical guarantees for generative models without domination"},{"key":"9455_CR28","unstructured":"Jaderberg M, Dalibard V, Osindero S, Czarnecki WM, Donahue J, Razavi A, Vinyals O, Green T, Dunning I, Simonyan K, Fernando C, Kavukcuoglu K (2017) Population based training of neural networks"},{"key":"9455_CR29","doi-asserted-by":"crossref","unstructured":"Yu H, Zhang X, Song L, Jiang L, Huang X, Chen W, Zhang C, Li J, Yang J, Hu Z, Duan Q, Chen W, He X, Fan J, Jiang W, Zhang L, Qiu C, Gu M, Sun W, Zhang Y, Peng G, Shen W, Fu G (2020) Large-scale gastric cancer screening and localization using multi-task deep neural network","DOI":"10.1016\/j.neucom.2021.03.006"},{"issue":"3","key":"9455_CR30","doi-asserted-by":"publisher","first-page":"457","DOI":"10.1007\/s10994-021-05946-3","volume":"110","author":"E H\u00fcllermeier","year":"2021","unstructured":"H\u00fcllermeier E, Waegeman W (2021) Aleatoric and epistemic uncertainty in machine learning: an introduction to concepts and methods. Mach Learn 110(3):457\u2013506. https:\/\/doi.org\/10.1007\/s10994-021-05946-3","journal-title":"Mach Learn"},{"key":"9455_CR31","doi-asserted-by":"crossref","unstructured":"Gehrig M, Shrestha SB, Mouritzen, D, Scaramuzza D (2020) Event-based angular velocity regression with spiking networks","DOI":"10.1109\/ICRA40945.2020.9197133"},{"key":"9455_CR32","doi-asserted-by":"publisher","unstructured":"Ge Y, Xu S, Liu S, Fu Z, Sun F, Zhang Y (2020) Learning personalized risk preferences for recommendation. In: Proceedings of the 43rd international ACM SIGIR conference on research and development in information retrieval. SIGIR \u201920. Association for Computing Machinery, New York, pp. 
409\u2013418. https:\/\/doi.org\/10.1145\/3397271.3401056","DOI":"10.1145\/3397271.3401056"},{"key":"9455_CR33","unstructured":"Li Q, Huang J, Hu J, Gong S (2022) Feature-distribution perturbation and calibration for generalized person ReID"},{"key":"9455_CR34","doi-asserted-by":"publisher","DOI":"10.1016\/j.cnsns.2022.106994","volume":"118","author":"H Gampe","year":"2023","unstructured":"Gampe H, Griffin C (2023) Dynamics of a binary option market with exogenous information and price sensitivity. Commun Nonlinear Sci Numer Simul 118:106994. https:\/\/doi.org\/10.1016\/j.cnsns.2022.106994","journal-title":"Commun Nonlinear Sci Numer Simul"},{"key":"9455_CR35","unstructured":"Liu Z, Li X (2022) A novel Lagrange multiplier approach with relaxation for gradient flows"},{"key":"9455_CR36","doi-asserted-by":"crossref","unstructured":"Hester T, Vecerik M, Pietquin O, Lanctot M, Schaul T, Piot B, Horgan D, Quan J, Sendonaris A, Dulac-Arnold G, Osband I, Agapiou J, Leibo JZ, Gruslys A (2017) Deep Q-learning from demonstrations","DOI":"10.1609\/aaai.v32i1.11757"},{"key":"9455_CR37","unstructured":"Mnih V, Badia AP, Mirza M, Graves A, Lillicrap TP, Harley T, Silver D, Kavukcuoglu K (2016) Asynchronous methods for deep reinforcement learning"},{"issue":"9","key":"9455_CR38","doi-asserted-by":"publisher","first-page":"5546","DOI":"10.1109\/TSMC.2021.3130070","volume":"52","author":"G Wen","year":"2022","unstructured":"Wen G, Li B (2022) Optimized leader-follower consensus control using reinforcement learning for a class of second-order nonlinear multiagent systems. IEEE Trans Syst Man Cybern: Syst 52(9):5546\u20135555. 
https:\/\/doi.org\/10.1109\/TSMC.2021.3130070","journal-title":"IEEE Trans Syst Man Cybern: Syst"},{"key":"9455_CR39","doi-asserted-by":"crossref","unstructured":"Song Z, Ma C, Ding M, Yang HH, Qian Y, Zhou X (2023) Personalized federated deep reinforcement learning-based trajectory optimization for multi-UAV assisted edge computing","DOI":"10.1109\/ICCC57788.2023.10233399"},{"key":"9455_CR40","doi-asserted-by":"crossref","unstructured":"Cazenavette G, Wang T, Torralba A, Efros AA, Zhu J-Y (2022) Dataset distillation by matching training trajectories","DOI":"10.1109\/CVPR52688.2022.01045"},{"key":"9455_CR41","doi-asserted-by":"publisher","unstructured":"Tu V, Pham TL, Dao PN (2022) Disturbance observer-based adaptive reinforcement learning for perturbed uncertain surface vessels. ISA Trans 130:277\u2013292. https:\/\/doi.org\/10.1016\/j.isatra.2022.03.027","DOI":"10.1016\/j.isatra.2022.03.027"},{"key":"9455_CR42","doi-asserted-by":"publisher","first-page":"1062","DOI":"10.1109\/TSTE.2022.3148236","volume":"13","author":"Y Du","year":"2022","unstructured":"Du Y, Wu D (2022) Deep reinforcement learning from demonstrations to assist service restoration in islanded microgrids. IEEE Trans Sustain Energy 13:1062\u20131072","journal-title":"IEEE Trans Sustain Energy"},{"key":"9455_CR43","doi-asserted-by":"publisher","first-page":"3158","DOI":"10.1109\/LRA.2023.3253023","volume":"8","author":"Z Tang","year":"2023","unstructured":"Tang Z, Shi Y, Xu X (2023) CSGP: closed-loop safe grasp planning via attention-based deep reinforcement learning from demonstrations. 
IEEE Robot Autom Lett 8:3158\u20133165","journal-title":"IEEE Robot Autom Lett"},{"key":"9455_CR44","doi-asserted-by":"crossref","unstructured":"Martins FB, Machado MG, Bassani HF, Braga PHM, Barros ES (2021) rSoccer: A framework for studying reinforcement learning in small and very small size robot soccer","DOI":"10.1007\/978-3-030-98682-7_14"},{"issue":"3","key":"9455_CR45","doi-asserted-by":"publisher","first-page":"165","DOI":"10.1177\/105971230501300301","volume":"13","author":"P Stone","year":"2005","unstructured":"Stone P, Sutton RS, Kuhlmann G (2005) Reinforcement learning for robocup soccer keepaway. Adapt Behav 13(3):165\u2013188. https:\/\/doi.org\/10.1177\/105971230501300301","journal-title":"Adapt Behav"},{"key":"9455_CR46","doi-asserted-by":"publisher","unstructured":"Kitano H, Asada M, Kuniyoshi Y, Noda I, Osawa E (1997) Robocup: the robot world cup initiative. In: Proceedings of the first international conference on autonomous agents. AGENTS \u201997. Association for Computing Machinery, New York, pp 340\u2013347. https:\/\/doi.org\/10.1145\/267658.267738","DOI":"10.1145\/267658.267738"},{"key":"9455_CR47","doi-asserted-by":"crossref","unstructured":"Liu S, Lever G, Wang Z, Merel J, Eslami SMA, Hennes D, Czarnecki WM, Tassa Y, Omidshafiei S, Abdolmaleki A, Siegel NY, Hasenclever L, Marris L, Tunyasuvunakool S, Song HF, Wulfmeier M, Muller P, Haarnoja T, Tracey BD, Tuyls K, Graepel T, Heess N (2021) From motor control to team play in simulated humanoid football","DOI":"10.1126\/scirobotics.abo0235"},{"key":"9455_CR48","unstructured":"Fengming\u00a0Zhu ZL, Zhu, K (2020) WeKick. https:\/\/www.kaggle.com\/c\/google-football\/discussion\/202232"},{"key":"9455_CR49","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-28929-8","volume-title":"A concise introduction to decentralized POMDPs","author":"FA Oliehoek","year":"2016","unstructured":"Oliehoek FA, Amato C (2016) A concise introduction to decentralized POMDPs, 1st edn. 
Springer, Berlin","edition":"1"},{"key":"9455_CR50","doi-asserted-by":"crossref","unstructured":"Hester T, Vecerik M, Pietquin O, Lanctot M, Schaul T, Piot B, Horgan D, Quan J, Sendonaris A, Dulac-Arnold G, Osband I, Agapiou J, Leibo JZ, Gruslys A (2017) Deep Q-learning from demonstrations","DOI":"10.1609\/aaai.v32i1.11757"},{"key":"9455_CR51","unstructured":"Vecerik M, Hester T, Scholz J, Wang F, Pietquin O, Piot B, Heess N, Roth\u00f6rl T, Lampe T, Riedmiller M (2018) Leveraging demonstrations for deep reinforcement learning on robotics problems with sparse rewards"},{"key":"9455_CR52","doi-asserted-by":"crossref","unstructured":"Nair A, McGrew B, Andrychowicz M, Zaremba W, Abbeel P (2018) Overcoming exploration in reinforcement learning with demonstrations","DOI":"10.1109\/ICRA.2018.8463162"},{"key":"9455_CR53","doi-asserted-by":"crossref","unstructured":"Liang X, Wang T, Yang L, Xing E (2018) CIRL: controllable imitative reinforcement learning for vision-based self-driving","DOI":"10.1007\/978-3-030-01234-2_36"},{"key":"9455_CR54","unstructured":"Fu J, Luo K, Levine S (2018) Learning robust rewards with adversarial inverse reinforcement learning"},{"key":"9455_CR55","unstructured":"Hausman K, Chebotar Y, Schaal S, Sukhatme G, Lim J (2017) Multi-modal imitation learning from unstructured demonstrations using generative adversarial nets"},{"key":"9455_CR56","doi-asserted-by":"crossref","unstructured":"Zhang M, Wang Y, Ma X, Xia L, Yang J, Li Z, Li X (2020) Wasserstein distance guided adversarial imitation learning with reward shape exploration. CoRR. arXiv:2006.03503","DOI":"10.1109\/DDCLS49620.2020.9275169"},{"key":"9455_CR57","unstructured":"Weng L (2019) From GAN to WGAN. CoRR. 
arXiv:1904.08994"},{"issue":"1","key":"9455_CR58","doi-asserted-by":"publisher","first-page":"405","DOI":"10.1146\/annurev-statistics-030718-104938","volume":"6","author":"VM Panaretos","year":"2019","unstructured":"Panaretos VM, Zemel Y (2019) Statistical aspects of Wasserstein distances. Annu Rev Stat Appl 6(1):405\u2013431. https:\/\/doi.org\/10.1146\/annurev-statistics-030718-104938","journal-title":"Annu Rev Stat Appl"},{"key":"9455_CR59","doi-asserted-by":"publisher","first-page":"228","DOI":"10.1016\/j.neunet.2023.01.025","volume":"161","author":"J Xing","year":"2023","unstructured":"Xing J, Nagata T, Zou X, Neftci E, Krichmar JL (2023) Achieving efficient interpretability of reinforcement learning via policy distillation and selective input gradient regularization. Neural Netw 161:228\u2013241","journal-title":"Neural Netw"},{"key":"9455_CR60","unstructured":"Xing J, Nagata T, Zou X, Neftci E, Krichmar JL (2022) Policy distillation with selective input gradient regularization for efficient interpretability"},{"key":"9455_CR61","unstructured":"Rusu AA, Colmenarejo SG, G\u00fcl\u00e7ehre, Desjardins G, Kirkpatrick J, Pascanu R, Mnih V, Kavukcuoglu K, Hadsell R (2015) Policy distillation. CoRR. arXiv:1511.06295"},{"key":"9455_CR62","unstructured":"Nowozin S, Cseke B, Tomioka R (2016) f-GAN: training generative neural samplers using variational divergence minimization"},{"key":"9455_CR63","unstructured":"Yu C, Velu A, Vinitsky E, Wang Y, Bayen AM, Wu Y (2021) The surprising effectiveness of MAPPO in cooperative, multi-agent games. CoRR. arXiv:2103.01955"},{"key":"9455_CR64","unstructured":"Lowe R, Wu Y, Tamar A, Harb J, Abbeel P, Mordatch I (2017) Multi-agent actor-critic for mixed cooperative-competitive environments. CoRR. 
arXiv:1706.02275"},{"key":"9455_CR65","doi-asserted-by":"crossref","unstructured":"Sandler M, Howard AG, Zhu M, Zhmoginov A, Chen L (2018) Inverted residuals and linear bottlenecks: mobile networks for classification, detection and segmentation. CoRR. arXiv:1801.04381","DOI":"10.1109\/CVPR.2018.00474"},{"key":"9455_CR66","unstructured":"Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam H (2017) Mobilenets: efficient convolutional neural networks for mobile vision applications. CoRR. arXiv:1704.04861"},{"key":"9455_CR67","doi-asserted-by":"crossref","unstructured":"Cho K, Merrienboer B, G\u00fcl\u00e7ehre \u00c7, Bougares F, Schwenk H, Bengio Y (2014) Learning phrase representations using RNN encoder-decoder for statistical machine translation. CoRR. arXiv:1406.1078","DOI":"10.3115\/v1\/D14-1179"},{"key":"9455_CR68","doi-asserted-by":"publisher","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","volume":"9","author":"S Hochreiter","year":"1997","unstructured":"Hochreiter S (1997) Long short-term memory. Neural Comput 9:1735\u201380. https:\/\/doi.org\/10.1162\/neco.1997.9.8.1735","journal-title":"Neural Comput"},{"key":"9455_CR69","doi-asserted-by":"publisher","unstructured":"Yu X, Li G, Chai C, Tang N (2020) Reinforcement learning with tree-LSTM for join order selection. In: 2020 IEEE 36th international conference on data engineering (ICDE), pp 1297\u20131308. https:\/\/doi.org\/10.1109\/ICDE48307.2020.00116","DOI":"10.1109\/ICDE48307.2020.00116"},{"key":"9455_CR70","doi-asserted-by":"publisher","unstructured":"Saxe AM, McClelland JL, Ganguli S (2013) Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. https:\/\/doi.org\/10.48550\/ARXIV.1312.6120. arXiv:1312.6120","DOI":"10.48550\/ARXIV.1312.6120"},{"key":"9455_CR71","unstructured":"Kingma DP, Ba J (2015) Adam: a method for stochastic optimization. CoRR. 
arXiv:1412.6980"},{"key":"9455_CR72","unstructured":"Espeholt L, Marinier R, Stanczyk P, Wang K, Michalski M (2020) SEED RL: scalable and efficient deep-RL with accelerated central inference"},{"key":"9455_CR73","unstructured":"Czarnecki WM, Pascanu R, Osindero S, Jayakumar SM, Swirszcz G, Jaderberg M (2019) Distilling policy distillation. CoRR. arXiv:1902.02186"},{"key":"9455_CR74","unstructured":"Automation CAoS RLChina Reinforcement Learning Community. Institute of Automation, Chinese Academy of Sciences. www.rlchina.org"},{"key":"9455_CR75","unstructured":"Automation CAoS Jidi. Institute of Automation, Chinese Academy of Sciences. http:\/\/www.jidiai.cn\/"}],"container-title":["Neural Computing and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00521-024-09455-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s00521-024-09455-x\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00521-024-09455-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,11,12]],"date-time":"2024-11-12T13:35:55Z","timestamp":1731418555000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s00521-024-09455-x"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,2,22]]},"references-count":75,"journal-issue":{"issue":"13","published-print":{"date-parts":[[2024,5]]}},"alternative-id":["9455"],"URL":"https:\/\/doi.org\/10.1007\/s00521-024-09455-x","relation":{},"ISSN":["0941-0643","1433-3058"],"issn-type":[{"value":"0941-0643","type":"print"},{"value":"1433-3058","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,2,22]]},"assertion":[{"value":"11 May 
2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"14 January 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"22 February 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}},{"value":"This article does not involve human subject for data collection. There is no need for ethical approval.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethical approval"}}]}}