{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,16]],"date-time":"2026-06-16T23:52:46Z","timestamp":1781653966000,"version":"3.54.5"},"reference-count":78,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2026,5,12]],"date-time":"2026-05-12T00:00:00Z","timestamp":1778544000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2026,5,12]],"date-time":"2026-05-12T00:00:00Z","timestamp":1778544000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Deutsches Zentrum f\u00fcr Luft- und Raumfahrt e.V. (DLR)"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Quantum Mach. Intell."],"published-print":{"date-parts":[[2026,6]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Combining quantum computing techniques in the form of amplitude amplification with classical reinforcement learning has led to the so-called \u201chybrid agent for quantum-accessible reinforcement learning\u201d, which achieves a quadratic speedup in sample complexity for certain learning problems. So far, this hybrid agent has only been applied to stationary learning problems, that is, learning problems without any time dependency within components of the Markov decision process. In this work, we investigate the applicability of the hybrid agent to dynamic RL environments and thus to more realistic scenarios. To this end, we enhance the hybrid agent by introducing a dissipation mechanism and, with the resulting learning agent, perform an empirical comparison with a classical RL agent in an RL environment with a time-dependent reward function. 
Our findings suggest that the modified hybrid agent can adapt its behavior to changes in the environment quickly, leading to a higher average success probability compared to its classical counterpart.<\/jats:p>","DOI":"10.1007\/s42484-026-00383-8","type":"journal-article","created":{"date-parts":[[2026,5,12]],"date-time":"2026-05-12T10:46:25Z","timestamp":1778582785000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Quantum reinforcement learning in dynamic environments"],"prefix":"10.1007","volume":"8","author":[{"given":"Oliver","family":"Sefrin","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Manuel","family":"Radons","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Lars","family":"Simon","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Sabine","family":"W\u00f6lk","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2026,5,12]]},"reference":[{"key":"383_CR1","unstructured":"Abel D, Barreto A, Van Roy B, Precup D, Hasselt HP, Singh S (2023) A definition of continual reinforcement learning. Adv Neural Inf Process Syst vol 36. https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2023\/hash\/9d8cf1247786d6dfeefeeb53b8b5f6d7-Abstract-Conference.html"},{"key":"383_CR2","doi-asserted-by":"publisher","unstructured":"Acampora G, Cuzzocrea A, Lapegna M, Schiattarella R, Vitiello A (2025) Fuzzy reward-based reinforcement learning for clifford circuit synthesis. In: IEEE Int Conf Fuzzy Syst (FUZZ), pp 1\u20136. 
https:\/\/doi.org\/10.1109\/FUZZ62266.2025.11152245","DOI":"10.1109\/FUZZ62266.2025.11152245"},{"key":"383_CR3","doi-asserted-by":"publisher","unstructured":"Altmann P, Stein J, K\u00f6lle M, B\u00e4rligea A, Zorn M, Gabor T, Phan T, Feld S, Linnhoff-Popien C (2024) Challenges for reinforcement learning in quantum circuit design. In: IEEE Int Conf Quantum Comput Eng (QCE), vol 01, pp 1600\u20131610. https:\/\/doi.org\/10.1109\/QCE60285.2024.00187","DOI":"10.1109\/QCE60285.2024.00187"},{"key":"383_CR4","doi-asserted-by":"publisher","first-page":"183","DOI":"10.22331\/q-2019-09-02-183","volume":"3","author":"P Andreasson","year":"2019","unstructured":"Andreasson P, Johansson J, Liljestrand S, Granath M (2019) Quantum error correction for the toric code using deep reinforcement learning. Quantum 3:183. https:\/\/doi.org\/10.22331\/q-2019-09-02-183","journal-title":"Quantum"},{"key":"383_CR5","doi-asserted-by":"publisher","unstructured":"Boyer M, Brassard G, H\u00f8yer P, Tapp A (1998) Tight bounds on quantum searching. Fortschr Phys 46(4\u20135):493\u2013505. https:\/\/doi.org\/10.1002\/(SICI)1521-3978(199806)46:4\/5%3C493::AID-PROP493%3E3.0.CO;2-P","DOI":"10.1002\/(SICI)1521-3978(199806)46:4\/5%3C493::AID-PROP493%3E3.0.CO;2-P"},{"key":"383_CR6","doi-asserted-by":"publisher","first-page":"53","DOI":"10.1090\/conm\/305\/05215","volume":"305","author":"G Brassard","year":"2002","unstructured":"Brassard G, H\u00f8yer P, Mosca M, Tapp A (2002) Quantum amplitude amplification and estimation. Contemp Math 305:53\u201374. https:\/\/doi.org\/10.1090\/conm\/305\/05215","journal-title":"Contemp Math"},{"issue":"1","key":"383_CR7","doi-asserted-by":"publisher","first-page":"400","DOI":"10.1038\/srep00400","volume":"2","author":"HJ Briegel","year":"2012","unstructured":"Briegel HJ, De las Cuevas G (2012) Projective simulation for artificial intelligence. Sci Rep 2(1):400. 
https:\/\/doi.org\/10.1038\/srep00400","journal-title":"Sci Rep"},{"key":"383_CR8","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevX.8.031086","volume":"8","author":"M Bukov","year":"2018","unstructured":"Bukov M, Day AGR, Sels D, Weinberg P, Polkovnikov A, Mehta P (2018) Reinforcement learning in different phases of quantum control. Phys Rev X 8:031086. https:\/\/doi.org\/10.1103\/PhysRevX.8.031086","journal-title":"Phys Rev X"},{"issue":"9","key":"383_CR9","doi-asserted-by":"publisher","first-page":"625","DOI":"10.1038\/s42254-021-00348-9","volume":"3","author":"M Cerezo","year":"2021","unstructured":"Cerezo M, Arrasmith A, Babbush R, Benjamin SC, Endo S, Fujii K, McClean JR, Mitarai K, Yuan X, Cincio L, Coles PJ (2021) Variational quantum algorithms. Nat Rev Phys 3(9):625\u2013644. https:\/\/doi.org\/10.1038\/s42254-021-00348-9","journal-title":"Nat Rev Phys"},{"issue":"1","key":"383_CR10","doi-asserted-by":"publisher","first-page":"7907","DOI":"10.1038\/s41467-025-63099-6","volume":"16","author":"M Cerezo","year":"2025","unstructured":"Cerezo M, Larocca M, Garc\u00eda-Mart\u00edn D, Diaz NL, Braccia P, Fontana E, Rudolph MS, Bermejo P, Ijaz A, Thanasilp S, Anschuetz ER, Holmes Z (2025) Does provable absence of barren plateaus imply classical simulability? Nat Commun 16(1):7907. https:\/\/doi.org\/10.1038\/s41467-025-63099-6","journal-title":"Nat Commun"},{"key":"383_CR11","doi-asserted-by":"publisher","first-page":"141007","DOI":"10.1109\/ACCESS.2020.3010470","volume":"8","author":"SY-C Chen","year":"2020","unstructured":"Chen SY-C, Yang C-HH, Qi J, Chen P-Y, Ma X, Goan H-S (2020) Variational quantum circuits for deep reinforcement learning. IEEE Access 8:141007\u2013141024. 
https:\/\/doi.org\/10.1109\/ACCESS.2020.3010470","journal-title":"IEEE Access"},{"issue":"1","key":"383_CR12","doi-asserted-by":"publisher","first-page":"2","DOI":"10.1007\/s42484-023-00137-w","volume":"6","author":"H-Y Chen","year":"2024","unstructured":"Chen H-Y, Chang Y-J, Liao S-W, Chang C-R (2024) Deep Q-learning with hybrid quantum neural network on solving maze problems. Quantum Mach Intell 6(1):2. https:\/\/doi.org\/10.1007\/s42484-023-00137-w","journal-title":"Quantum Mach Intell"},{"issue":"2","key":"383_CR13","doi-asserted-by":"publisher","first-page":"30","DOI":"10.1007\/s42484-023-00116-1","volume":"5","author":"EA Cherrat","year":"2023","unstructured":"Cherrat EA, Kerenidis I, Prakash A (2023) Quantum reinforcement learning via policy iteration. Quantum Mach Intell 5(2):30. https:\/\/doi.org\/10.1007\/s42484-023-00116-1","journal-title":"Quantum Mach Intell"},{"key":"383_CR14","unstructured":"Choi SPM, Yeung D-Y, Zhang NL (1999) Hidden-mode markov decision processes. In: Proc 16th Int Jt Conf Artif Int (IJCAI-99), Stockholm, Sweden"},{"key":"383_CR15","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevA.102.042414","volume":"102","author":"C Ciliberto","year":"2020","unstructured":"Ciliberto C, Rocchetto A, Rudi A, Wossnig L (2020) Statistical limits of supervised quantum learning. Phys Rev A 102:042414. https:\/\/doi.org\/10.1103\/PhysRevA.102.042414","journal-title":"Phys Rev A"},{"key":"383_CR16","doi-asserted-by":"publisher","unstructured":"Cire\u015fan DC, Meier U, Masci J, Gambardella LM, Schmidhuber J (2011) Flexible, high performance convolutional neural networks for image classification. In: Proc 22nd Int Jt Conf Artif Int. https:\/\/doi.org\/10.5591\/978-1-57735-516-8\/IJCAI11-210","DOI":"10.5591\/978-1-57735-516-8\/IJCAI11-210"},{"key":"383_CR17","doi-asserted-by":"publisher","unstructured":"da Silva BC, Basso EW, Bazzan ALC, Engel PM (2006) Dealing with non-stationary environments using context detection. 
In: Proc 23rd Int Conf Mach Learn. https:\/\/doi.org\/10.1145\/1143844.1143872","DOI":"10.1145\/1143844.1143872"},{"issue":"1","key":"383_CR18","doi-asserted-by":"publisher","first-page":"11","DOI":"10.1007\/s42484-022-00068-y","volume":"4","author":"N Dalla Pozza","year":"2022","unstructured":"Dalla Pozza N, Buffoni L, Martina S, Caruso F (2022) Quantum reinforcement learning: the maze problem. Quantum Mach Intell 4(1):11. https:\/\/doi.org\/10.1007\/s42484-022-00068-y","journal-title":"Quantum Mach Intell"},{"issue":"5","key":"383_CR19","doi-asserted-by":"publisher","first-page":"1207","DOI":"10.1109\/TSMCB.2008.925743","volume":"38","author":"D Dong","year":"2008","unstructured":"Dong D, Chen C, Li H, Tarn T-J (2008) Quantum reinforcement learning. IEEE Trans Syst Man Cybern B Cybern 38(5):1207\u20131220. https:\/\/doi.org\/10.1109\/TSMCB.2008.925743","journal-title":"IEEE Trans Syst Man Cybern B Cybern"},{"key":"383_CR20","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevLett.117.130501","volume":"117","author":"V Dunjko","year":"2016","unstructured":"Dunjko V, Taylor JM, Briegel HJ (2016) Quantum-enhanced machine learning. Phys Rev Lett 117:130501. https:\/\/doi.org\/10.1103\/PhysRevLett.117.130501","journal-title":"Phys Rev Lett"},{"key":"383_CR21","unstructured":"Dunjko V, Liu Y-K, Wu X, Taylor JM (2018) Exponential improvements for quantum-accessible reinforcement learning. arxiv:1710.11160"},{"key":"383_CR22","unstructured":"Finn C, Abbeel P, Levine S (2017) Model-agnostic meta-learning for fast adaptation of deep networks. In: Proc 34th Int Conf Mach Learn"},{"key":"383_CR23","unstructured":"F\u00f6sel T, Niu MY, Marquardt F, Li L (2021) Quantum circuit optimization with deep reinforcement learning. arxiv:2103.07585"},{"key":"383_CR24","unstructured":"Ganguly B, Wu Y, Wang D, Aggarwal V (2023) Quantum computing provides exponential regret improvement in episodic reinforcement learning. 
arxiv:2302.08617"},{"key":"383_CR25","unstructured":"Gil-Fuster E, Gyurik C, P\u00e9rez-Salinas A, Dunjko V (2025) On the relation between trainability and dequantization of variational quantum learning models. arxiv:2406.07072"},{"key":"383_CR26","doi-asserted-by":"crossref","unstructured":"Giordano S, Sen K, Martin-Delgado MA (2025) Hybrid reward-driven reinforcement learning for efficient quantum circuit synthesis. arxiv:2507.16641","DOI":"10.1007\/s42484-026-00359-8"},{"key":"383_CR27","doi-asserted-by":"publisher","first-page":"325","DOI":"10.1103\/PhysRevLett.79.325","volume":"79","author":"LK Grover","year":"1997","unstructured":"Grover LK (1997) Quantum mechanics helps in searching for a needle in a haystack. Phys Rev Lett 79:325\u2013328. https:\/\/doi.org\/10.1103\/PhysRevLett.79.325","journal-title":"Phys Rev Lett"},{"issue":"3","key":"383_CR28","doi-asserted-by":"publisher","DOI":"10.1088\/1367-2630\/ac5b56","volume":"24","author":"A Hamann","year":"2022","unstructured":"Hamann A, W\u00f6lk S (2022) Performance analysis of a hybrid agent for quantum-accessible reinforcement learning. New J Phys 24(3):033044. https:\/\/doi.org\/10.1088\/1367-2630\/ac5b56","journal-title":"New J Phys"},{"issue":"2","key":"383_CR29","doi-asserted-by":"publisher","first-page":"22","DOI":"10.1007\/s42484-021-00049-7","volume":"3","author":"A Hamann","year":"2021","unstructured":"Hamann A, Dunjko V, W\u00f6lk S (2021) Quantum-accessible reinforcement learning beyond strictly epochal environments. Quantum Mach Intell 3(2):22. https:\/\/doi.org\/10.1007\/s42484-021-00049-7","journal-title":"Quantum Mach Intell"},{"key":"383_CR30","unstructured":"Jerbi S, Gyurik C, Marshall SC, Briegel HJ, Dunjko V (2021) Parametrized quantum policies for reinforcement learning. In: Adv Neural Inf Process Syst 34, Virtual Event. 
https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2021\/file\/eec96a7f788e88184c0e713456026f3f-Paper.pdf"},{"key":"383_CR31","doi-asserted-by":"publisher","DOI":"10.1103\/PRXQuantum.2.010328","volume":"2","author":"S Jerbi","year":"2021","unstructured":"Jerbi S, Trenkwalder LM, Poulsen Nautrup H, Briegel HJ, Dunjko V (2021) Quantum enhancements for deep reinforcement learning in large spaces. PRX Quantum 2:010328. https:\/\/doi.org\/10.1103\/PRXQuantum.2.010328","journal-title":"PRX Quantum"},{"key":"383_CR32","doi-asserted-by":"publisher","first-page":"237","DOI":"10.1613\/jair.301","volume":"4","author":"LP Kaelbling","year":"1996","unstructured":"Kaelbling LP, Littman ML, Moore AW (1996) Reinforcement learning: A survey. J Artif Intell Res 4:237\u2013285. https:\/\/doi.org\/10.1613\/jair.301","journal-title":"J Artif Intell Res"},{"key":"383_CR33","doi-asserted-by":"publisher","first-page":"1401","DOI":"10.1613\/jair.1.13673","volume":"75","author":"K Khetarpal","year":"2022","unstructured":"Khetarpal K, Riemer M, Rish I, Precup D (2022) Towards continual reinforcement learning: A review and perspectives. J Artif Intell Res 75:1401\u20131476. https:\/\/doi.org\/10.1613\/jair.1.13673","journal-title":"J Artif Intell Res"},{"issue":"6","key":"383_CR34","doi-asserted-by":"publisher","first-page":"84","DOI":"10.1145\/3065386","volume":"60","author":"A Krizhevsky","year":"2017","unstructured":"Krizhevsky A, Sutskever I, Hinton GE (2017) Imagenet classification with deep convolutional neural networks. Commun ACM 60(6):84\u201390. https:\/\/doi.org\/10.1145\/3065386","journal-title":"Commun ACM"},{"issue":"1","key":"383_CR35","doi-asserted-by":"publisher","first-page":"124","DOI":"10.1038\/s41534-025-01065-2","volume":"11","author":"S Li","year":"2025","unstructured":"Li S, Fan Y, Li X, Ruan X, Zhao Q, Peng Z, Wu R-B, Zhang J, Song P (2025) Robust quantum control using reinforcement learning from demonstration. NPJ Quantum Inf 11(1):124. 
https:\/\/doi.org\/10.1038\/s41534-025-01065-2","journal-title":"NPJ Quantum Inf"},{"key":"383_CR36","unstructured":"Lillicrap TP, Hunt JJ, Pritzel A, Heess N, Erez T, Tassa Y, Silver D, Wierstra D (2019) Continuous control with deep reinforcement learning. arxiv:1509.02971"},{"issue":"9","key":"383_CR37","doi-asserted-by":"publisher","first-page":"1013","DOI":"10.1038\/s41567-021-01287-z","volume":"17","author":"Y Liu","year":"2021","unstructured":"Liu Y, Arunachalam S, Temme K (2021) A rigorous and robust quantum speed-up in supervised machine learning. Nat Phys 17(9):1013\u20131017. https:\/\/doi.org\/10.1038\/s41567-021-01287-z","journal-title":"Nat Phys"},{"key":"383_CR38","unstructured":"Lockwood O, Si M (2020a) Playing atari with hybrid quantum-classical reinforcement learning. In: Proc Mach Learn Res, Virtual Event. https:\/\/proceedings.mlr.press\/v148\/lockwood21a.html"},{"key":"383_CR39","doi-asserted-by":"publisher","unstructured":"Lockwood O, Si M (2020b) Reinforcement learning with quantum variational circuit. Proc AAAI Conf Artif Intell Interact Digit Entertain 16(1):245\u2013251. https:\/\/doi.org\/10.1609\/aiide.v16i1.7437","DOI":"10.1609\/aiide.v16i1.7437"},{"key":"383_CR40","unstructured":"Luketina J, Flennerhag S, Schroecker Y, Abel D, Zahavy T, Singh S (2022) Meta-gradients in non-stationary environments. In: Proc 1st conf lifelong learn agents proc mach learn res. PMLR, Montr\u00e9al, Canada, vol 199, pp 886\u2013901. https:\/\/proceedings.mlr.press\/v199\/luketina22a.html"},{"key":"383_CR41","doi-asserted-by":"publisher","first-page":"69","DOI":"10.1007\/s00354-015-0102-0","volume":"33","author":"J Mautner","year":"2015","unstructured":"Mautner J, Makmal A, Manzano D, Tiersch M, Briegel HJ (2015) Projective simulation for classical learning agents: a comprehensive investigation. New Gener Comput 33:69\u2013114. 
https:\/\/doi.org\/10.1007\/s00354-015-0102-0","journal-title":"New Gener Comput"},{"key":"383_CR42","doi-asserted-by":"publisher","unstructured":"McCloskey M, Cohen NJ (1989) Catastrophic interference in connectionist networks: the sequential learning problem. Psychol Learn Motiv Academic Press, vol 24, pp 109\u2013165. https:\/\/doi.org\/10.1016\/S0079-7421(08)60536-8","DOI":"10.1016\/S0079-7421(08)60536-8"},{"key":"383_CR43","doi-asserted-by":"publisher","first-page":"64639","DOI":"10.1109\/ACCESS.2018.2876494","volume":"6","author":"AA Melnikov","year":"2018","unstructured":"Melnikov AA, Makmal A, Briegel HJ (2018) Benchmarking projective simulation in navigation problems. IEEE Access 6:64639\u201364648. https:\/\/doi.org\/10.1109\/ACCESS.2018.2876494","journal-title":"IEEE Access"},{"key":"383_CR44","doi-asserted-by":"publisher","unstructured":"Meyer N, Scherer DD, Plinge A, Mutschler C, Hartmann MJ (2023) Quantum natural policy gradients: towards sample-efficient reinforcement learning. In: IEEE int conf quantum comput eng (QCE). https:\/\/doi.org\/10.1109\/QCE57702.2023.10181","DOI":"10.1109\/QCE57702.2023.10181"},{"issue":"7540","key":"383_CR45","doi-asserted-by":"publisher","first-page":"529","DOI":"10.1038\/nature14236","volume":"518","author":"V Mnih","year":"2015","unstructured":"Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, Riedmiller M, Fidjeland AK, Ostrovski G, Petersen S, Beattie C, Sadik A, Antonoglou I, King H, Kumaran D, Wierstra D, Legg S, Hassabis D (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529\u2013533. https:\/\/doi.org\/10.1038\/nature14236","journal-title":"Nature"},{"key":"383_CR46","unstructured":"Molteni R, Gyurik C, Dunjko V (2024) Exponential quantum advantages in learning quantum observables from classical data. 
arxiv:2405.02027"},{"key":"383_CR47","doi-asserted-by":"publisher","first-page":"215","DOI":"10.22331\/q-2019-12-16-215","volume":"3","author":"HP Nautrup","year":"2019","unstructured":"Nautrup HP, Delfosse N, Dunjko V, Briegel HJ, Friis N (2019) Optimizing quantum error correction codes with reinforcement learning. Quantum 3:215. https:\/\/doi.org\/10.22331\/q-2019-12-16-215","journal-title":"Quantum"},{"issue":"1","key":"383_CR48","doi-asserted-by":"publisher","first-page":"33","DOI":"10.1038\/s41534-019-0141-3","volume":"5","author":"MY Niu","year":"2019","unstructured":"Niu MY, Boixo S, Smelyanskiy VN, Neven H (2019) Universal quantum control through deep reinforcement learning. NPJ Quantum Inf 5(1):33. https:\/\/doi.org\/10.1038\/s41534-019-0141-3","journal-title":"NPJ Quantum Inf"},{"issue":"6","key":"383_CR49","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3459991","volume":"54","author":"S Padakandla","year":"2021","unstructured":"Padakandla S (2021) A survey of reinforcement learning algorithms for dynamically varying environments. ACM Comput Surv 54(6):1\u201325. https:\/\/doi.org\/10.1145\/3459991","journal-title":"ACM Comput Surv"},{"key":"383_CR50","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevX.4.031002","volume":"4","author":"GD Paparo","year":"2014","unstructured":"Paparo GD, Dunjko V, Makmal A, Martin-Delgado MA, Briegel HJ (2014) Quantum speedup for active learning agents. Phys Rev X 4:031002. https:\/\/doi.org\/10.1103\/PhysRevX.4.031002","journal-title":"Phys Rev X"},{"key":"383_CR51","doi-asserted-by":"publisher","unstructured":"Pieters M, Wiering MA (2016) Q-learning with experience replay in a dynamic environment. In: IEEE symp ser comput intell (SSCI). 
https:\/\/doi.org\/10.1109\/SSCI.2016.7849368","DOI":"10.1109\/SSCI.2016.7849368"},{"key":"383_CR52","doi-asserted-by":"publisher","first-page":"79","DOI":"10.22331\/q-2018-08-06-79","volume":"2","author":"J Preskill","year":"2018","unstructured":"Preskill J (2018) Quantum computing in the NISQ era and beyond. Quantum 2:79. https:\/\/doi.org\/10.22331\/q-2018-08-06-79","journal-title":"Quantum"},{"key":"383_CR53","doi-asserted-by":"publisher","DOI":"10.1002\/9780470316887","volume-title":"Markov decision processes: discrete stochastic dynamic programming","author":"ML Puterman","year":"1994","unstructured":"Puterman ML (1994) Markov decision processes: discrete stochastic dynamic programming, 1st edn. John Wiley & Sons Inc, Hoboken, NJ, USA","edition":"1"},{"key":"383_CR54","unstructured":"Ring MB (1994) Continual learning in reinforcement environments. PhD thesis, The University of Texas, Austin, TX, USA"},{"key":"383_CR55","volume-title":"On-line Q-learning using connectionist systems","author":"GA Rummery","year":"1994","unstructured":"Rummery GA, Niranjan M (1994) On-line Q-learning using connectionist systems, vol 37. University of Cambridge, Department of Engineering, Cambridge, UK"},{"issue":"7849","key":"383_CR56","doi-asserted-by":"publisher","first-page":"229","DOI":"10.1038\/s41586-021-03242-7","volume":"591","author":"V Saggio","year":"2021","unstructured":"Saggio V, Asenbeck BE, Hamann A, Str\u00f6mberg T, Schiansky P, Dunjko V, Friis N, Harris NC, Hochberg M, Englund D, W\u00f6lk S, Briegel HJ, Walther P (2021) Experimental quantum speed-up in reinforcement learning agents. Nature 591(7849):229\u2013233. 
https:\/\/doi.org\/10.1038\/s41586-021-03242-7","journal-title":"Nature"},{"issue":"7839","key":"383_CR57","doi-asserted-by":"publisher","first-page":"604","DOI":"10.1038\/s41586-020-03051-4","volume":"588","author":"J Schrittwieser","year":"2020","unstructured":"Schrittwieser J, Antonoglou I, Hubert T, Simonyan K, Sifre L, Schmitt S, Guez A, Lockhart E, Hassabis D, Graepel T, Lillicrap T, Silver D (2020) Mastering atari, go, chess and shogi by planning with a learned model. Nature 588(7839):604\u2013609. https:\/\/doi.org\/10.1038\/s41586-020-03051-4","journal-title":"Nature"},{"key":"383_CR58","doi-asserted-by":"publisher","DOI":"10.1103\/PRXQuantum.3.030101","volume":"3","author":"M Schuld","year":"2022","unstructured":"Schuld M, Killoran N (2022) Is quantum advantage the right goal for quantum machine learning? PRX Quantum 3:030101. https:\/\/doi.org\/10.1103\/PRXQuantum.3.030101","journal-title":"PRX Quantum"},{"key":"383_CR59","doi-asserted-by":"publisher","unstructured":"Schuld M, Petruccione F (2021) Machine learning with quantum computers. Springer, Cham, Switzerland. https:\/\/doi.org\/10.1007\/978-3-030-83098-4","DOI":"10.1007\/978-3-030-83098-4"},{"key":"383_CR60","unstructured":"Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O (2017) Proximal policy optimization algorithms. arxiv:1707.06347"},{"issue":"1","key":"383_CR61","doi-asserted-by":"publisher","first-page":"52","DOI":"10.1007\/s42484-025-00269-1","volume":"7","author":"O Sefrin","year":"2025","unstructured":"Sefrin O, W\u00f6lk S (2025) A hybrid learning agent for episodic learning tasks with unknown target distance. Quantum Mach Intell 7(1):52. https:\/\/doi.org\/10.1007\/s42484-025-00269-1","journal-title":"Quantum Mach Intell"},{"key":"383_CR62","doi-asserted-by":"publisher","unstructured":"Shor PW (1994) Algorithms for quantum computation: discrete logarithms and factoring. In: Proc 35th annu symp found comput sci. 
https:\/\/doi.org\/10.1109\/SFCS.1994.365700","DOI":"10.1109\/SFCS.1994.365700"},{"key":"383_CR63","unstructured":"Singh S, Lewis RL, Barto AG (2009) Where do rewards come from? In: Proc 31st annu conf cogn sci soc, pp 2601\u20132606. https:\/\/all.cs.umass.edu\/pubs\/2009\/singh_l_b_09.pdf"},{"key":"383_CR64","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevX.12.011059","volume":"12","author":"VV Sivak","year":"2022","unstructured":"Sivak VV, Eickbusch A, Liu H, Royer B, Tsioutsios I, Devoret MH (2022) Model-free quantum control with reinforcement learning. Phys Rev X 12:011059. https:\/\/doi.org\/10.1103\/PhysRevX.12.011059","journal-title":"Phys Rev X"},{"key":"383_CR65","doi-asserted-by":"publisher","first-page":"720","DOI":"10.22331\/q-2022-05-24-720","volume":"6","author":"A Skolik","year":"2022","unstructured":"Skolik A, Jerbi S, Dunjko V (2022) Quantum agents in the Gym: a variational quantum algorithm for deep Q-learning. Quantum 6:720. https:\/\/doi.org\/10.22331\/q-2022-05-24-720","journal-title":"Quantum"},{"key":"383_CR66","unstructured":"Sohl-Dickstein J, Weiss E, Maheswaranathan N, Ganguli S (2015) Deep unsupervised learning using nonequilibrium thermodynamics. In: Proc. 32nd int conf mach learn. https:\/\/proceedings.mlr.press\/v37\/sohl-dickstein15.html"},{"issue":"1","key":"383_CR67","doi-asserted-by":"publisher","DOI":"10.1088\/2058-9565\/aaef5e","volume":"4","author":"T Sriarunothai","year":"2018","unstructured":"Sriarunothai T, W\u00f6lk S, Giri GS, Friis N, Dunjko V, Briegel HJ, Wunderlich C (2018) Speeding-up the decision making of a learning agent using an ion trap quantum processor. Quantum Sci Technol 4(1):015014. https:\/\/doi.org\/10.1088\/2058-9565\/aaef5e","journal-title":"Quantum Sci Technol"},{"key":"383_CR68","unstructured":"Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. 2nd Edition. A Bradford Book, Cambridge, MA, USA. 
http:\/\/incompleteideas.net\/book\/the-book-2nd.html"},{"key":"383_CR69","unstructured":"Towers M, Kwiatkowski A, Terry J, Balis JU, Cola GD, Deleu T, Goul\u00e3o M, Kallinteris A, Krimmel M, KG A, Perez-Vicente R, Pierr\u00e9 A, Schulhoff S, Tai JJ, Tan H, Younis OG (2025) Gymnasium: a standard interface for reinforcement learning environments. arxiv:2407.17032"},{"key":"383_CR70","unstructured":"Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser \u0141, Polosukhin I (2017) Attention is all you need. In: Adv neural inf process syst. https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2017\/file\/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf"},{"issue":"3","key":"383_CR71","doi-asserted-by":"publisher","first-page":"279","DOI":"10.1007\/BF00992698","volume":"8","author":"CJCH Watkins","year":"1992","unstructured":"Watkins CJCH, Dayan P (1992) Q-learning. Mach Learn 8(3):279\u2013292. https:\/\/doi.org\/10.1007\/BF00992698","journal-title":"Mach Learn"},{"key":"383_CR72","unstructured":"Wiedemann S, Hein D, Udluft S, Mendl C (2023) Quantum policy iteration via amplitude estimation and grover search \u2013 towards quantum advantage for reinforcement learning. arxiv:2206.04741"},{"key":"383_CR73","doi-asserted-by":"publisher","first-page":"229","DOI":"10.1007\/bf00992696","volume":"8","author":"RJ Williams","year":"1992","unstructured":"Williams RJ (1992) Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach Learn 8:229\u2013256. https:\/\/doi.org\/10.1007\/bf00992696","journal-title":"Mach Learn"},{"key":"383_CR74","doi-asserted-by":"publisher","first-page":"1660","DOI":"10.22331\/q-2025-03-12-1660","volume":"9","author":"S Wu","year":"2025","unstructured":"Wu S, Jin S, Wen D, Han D, Wang X (2025) Quantum reinforcement learning in continuous action space. Quantum 9:1660. 
https:\/\/doi.org\/10.22331\/q-2025-03-12-1660","journal-title":"Quantum"},{"issue":"3","key":"383_CR75","doi-asserted-by":"publisher","first-page":"1087","DOI":"10.1109\/TAI.2022.3225256","volume":"5","author":"H Yu","year":"2024","unstructured":"Yu H, Zhao X (2024) Deep reinforcement learning with reward design for quantum control. IEEE Trans Artif Intell 5(3):1087\u20131101. https:\/\/doi.org\/10.1109\/TAI.2022.3225256","journal-title":"IEEE Trans Artif Intell"},{"issue":"1","key":"383_CR76","doi-asserted-by":"publisher","first-page":"85","DOI":"10.1038\/s41534-019-0201-8","volume":"5","author":"X-M Zhang","year":"2019","unstructured":"Zhang X-M, Wei Z, Asad R, Yang X-C, Wang X (2019) When does reinforcement learning stand out in quantum control? a comparative study on state preparation. NPJ Quantum Inf 5(1):85. https:\/\/doi.org\/10.1038\/s41534-019-0201-8","journal-title":"NPJ Quantum Inf"},{"key":"383_CR77","unstructured":"Zhong H, Hu J, Xue Y, Li T, Wang L (2024) Provably efficient exploration in quantum reinforcement learning with logarithmic worst-case regret. arxiv:2302.10796"},{"key":"383_CR78","unstructured":"Ziegler DM, Stiennon N, Wu J, Brown TB, Radford A, Amodei D, Christiano P, Irving G (2020) Fine-tuning language models from human preferences. 
arxiv:1909.08593"}],"container-title":["Quantum Machine Intelligence"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s42484-026-00383-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s42484-026-00383-8","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s42484-026-00383-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,6,16]],"date-time":"2026-06-16T23:00:21Z","timestamp":1781650821000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s42484-026-00383-8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,5,12]]},"references-count":78,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2026,6]]}},"alternative-id":["383"],"URL":"https:\/\/doi.org\/10.1007\/s42484-026-00383-8","relation":{},"ISSN":["2524-4906","2524-4914"],"issn-type":[{"value":"2524-4906","type":"print"},{"value":"2524-4914","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,5,12]]},"assertion":[{"value":"2 July 2025","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"30 March 2026","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"12 May 2026","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}},{"value":"Not 
applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Clinical trial number"}}],"article-number":"58"}}