{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,12]],"date-time":"2026-05-12T23:09:51Z","timestamp":1778627391563,"version":"3.51.4"},"reference-count":60,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2025,4,11]],"date-time":"2025-04-11T00:00:00Z","timestamp":1744329600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,4,11]],"date-time":"2025-04-11T00:00:00Z","timestamp":1744329600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Deutsches Zentrum f\u00fcr Luft- und Raumfahrt e.V. (DLR)"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Quantum Mach. Intell."],"published-print":{"date-parts":[[2025,6]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>The \u201chybrid agent for quantum-accessible reinforcement learning,\u201d as defined in (Hamann and W\u00f6lk New J Phys 24:033044 2022), provides a proven quasi-quadratic speedup and is experimentally tested. However, the standard version can only be applied to episodic learning tasks with fixed episode length. In many real-world applications, the information about the necessary number of steps within an episode to reach a defined target is not available in advance and especially before reaching the target for the first time. Furthermore, in such scenarios, classical agents have the advantage of observing at which step they reach the target. How to best deal with an unknown target distance in classical and quantum reinforcement learning and whether the hybrid agent can provide an advantage in such learning scenarios is unknown so far. In this work, we introduce a hybrid agent with a stochastic episode length selection strategy to alleviate the need for knowledge about the necessary episode length. Through simulations, we test the adapted hybrid agent\u2019s performance versus classical counterparts with and without similar episode selection strategies. Our simulations demonstrate a speedup in certain scenarios due to our developed episode length selection strategy for classical learning agents as well as an additional speedup for our resulting hybrid learning agent.<\/jats:p>","DOI":"10.1007\/s42484-025-00269-1","type":"journal-article","created":{"date-parts":[[2025,4,11]],"date-time":"2025-04-11T04:44:57Z","timestamp":1744346697000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["A hybrid learning agent for episodic learning tasks with unknown target distance"],"prefix":"10.1007","volume":"7","author":[{"given":"Oliver","family":"Sefrin","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sabine","family":"W\u00f6lk","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2025,4,11]]},"reference":[{"key":"269_CR1","unstructured":"Biamonte J, Bergholm V (2017) Tensor networks in a nutshell. arXiv:1708.00006"},{"key":"269_CR2","doi-asserted-by":"publisher","first-page":"195","DOI":"10.1038\/nature23474","volume":"549","author":"J Biamonte","year":"2017","unstructured":"Biamonte J, Wittek P, Pancotti N, Rebentrost P, Wiebe N, Lloyd S (2017) Quantum machine learning. Nature 549:195\u2013202. https:\/\/doi.org\/10.1038\/nature23474","journal-title":"Nature"},{"key":"269_CR3","doi-asserted-by":"publisher","first-page":"493","DOI":"10.1002\/(SICI)1521-3978(199806)46:4\/5<493::AID-PROP493>3.0.CO;2-P","volume":"46","author":"M Boyer","year":"1998","unstructured":"Boyer M, Brassard G, H\u00f8yer P, Tapp A (1998) Tight bounds on quantum searching. Fortschr Phys 46:493\u2013505. https:\/\/doi.org\/10.1002\/(SICI)1521-3978(199806)46:4\/5<493::AID-PROP493>3.0.CO;2-P","journal-title":"Fortschr Phys"},{"key":"269_CR4","doi-asserted-by":"publisher","first-page":"53","DOI":"10.1090\/conm\/305\/05215","volume":"305","author":"G Brassard","year":"2002","unstructured":"Brassard G, H\u00f8yer P, Mosca M, Tapp A (2002) Quantum amplitude amplification and estimation. Contemp Math 305:53\u201374. https:\/\/doi.org\/10.1090\/conm\/305\/05215","journal-title":"Contemp Math"},{"key":"269_CR5","doi-asserted-by":"publisher","DOI":"10.1088\/1751-8121\/aa6dc3","volume":"50","author":"JC Bridgeman","year":"2017","unstructured":"Bridgeman JC, Chubb CT (2017) Hand-waving and interpretive dance: an introductory course on tensor networks. J Phys A 50:223001. https:\/\/doi.org\/10.1088\/1751-8121\/aa6dc3","journal-title":"J Phys A"},{"key":"269_CR6","doi-asserted-by":"publisher","first-page":"400","DOI":"10.1038\/srep00400","volume":"2","author":"HJ Briegel","year":"2012","unstructured":"Briegel HJ, Cuevas G (2012) Projective simulation for artificial intelligence. Sci Rep 2:400. https:\/\/doi.org\/10.1038\/srep00400","journal-title":"Sci Rep"},{"key":"269_CR7","unstructured":"Brockman G, Cheung V, Pettersson L, Schneider J, Schulman J, Tang J, Zaremba W (2016) OpenAI Gym. arXiv:1606.01540"},{"key":"269_CR8","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevX.8.031086","volume":"8","author":"M Bukov","year":"2018","unstructured":"Bukov M, Day AGR, Sels D, Weinberg P, Polkovnikov A, Mehta P (2018) Reinforcement learning in different phases of quantum control. Phys Rev X 8:031086. https:\/\/doi.org\/10.1103\/PhysRevX.8.031086","journal-title":"Phys Rev X"},{"key":"269_CR9","doi-asserted-by":"publisher","unstructured":"Cerezo M, Arrasmith A, Babbush R, Benjamin SC, Endo S, Fujii K, McClean JR, Mitarai K, Yuan X, Cincio L, Coles PJ (2021) Variational quantum algorithms. Nat Rev Phys 3:625\u2013644. https:\/\/doi.org\/10.1038\/s42254-021-00348-9","DOI":"10.1038\/s42254-021-00348-9"},{"key":"269_CR10","unstructured":"Cerezo M, Larocca M, Garc\u00eda-Mart\u00edn D, Diaz NL, Braccia P, Fontana E, Rudolph MS, Bermejo P, Ijaz A, Thanasilp S, Anschuetz ER, Holmes Z (2024) Does provable absence of barren plateaus imply classical simulability? Or, why we need to rethink variational quantum computing. arXiv:2312.09121"},{"key":"269_CR11","doi-asserted-by":"publisher","first-page":"141007","DOI":"10.1109\/ACCESS.2020.3010470","volume":"8","author":"SY-C Chen","year":"2020","unstructured":"Chen SY-C, Yang C-HH, Qi J, Chen P-Y, Ma X, Goan H-S (2020) Variational quantum circuits for deep reinforcement learning. IEEE Access 8:141007\u2013141024. https:\/\/doi.org\/10.1109\/ACCESS.2020.3010470","journal-title":"IEEE Access"},{"key":"269_CR12","doi-asserted-by":"publisher","first-page":"2","DOI":"10.1007\/s42484-023-00137-w","volume":"6","author":"H-Y Chen","year":"2024","unstructured":"Chen H-Y, Chang Y-J, Liao S-W, Chang C-R (2024) Deep Q-learning with hybrid quantum neural network on solving maze problems. Quant Mach Intell 6:2. https:\/\/doi.org\/10.1007\/s42484-023-00137-w","journal-title":"Quant Mach Intell"},{"key":"269_CR13","doi-asserted-by":"publisher","first-page":"30","DOI":"10.1007\/s42484-023-00116-1","volume":"5","author":"EA Cherrat","year":"2023","unstructured":"Cherrat EA, Kerenidis I, Prakash A (2023) Quantum reinforcement learning via policy iteration. Quant Mach Intell 5:30. https:\/\/doi.org\/10.1007\/s42484-023-00116-1","journal-title":"Quant Mach Intell"},{"key":"269_CR14","doi-asserted-by":"publisher","first-page":"11","DOI":"10.1007\/s42484-022-00068-y","volume":"4","author":"N Dalla Pozza","year":"2022","unstructured":"Dalla Pozza N, Buffoni L, Martina S, Caruso F (2022) Quantum reinforcement learning: the maze problem. Quant Mach Intell 4:11. https:\/\/doi.org\/10.1007\/s42484-022-00068-y","journal-title":"Quant Mach Intell"},{"key":"269_CR15","doi-asserted-by":"publisher","first-page":"1207","DOI":"10.1109\/TSMCB.2008.925743","volume":"38","author":"D Dong","year":"2008","unstructured":"Dong D, Chen C, Li H, Tarn T-J (2008) Quantum reinforcement learning. IEEE Trans Syst Man Cybern B Cybern 38:1207\u20131220. https:\/\/doi.org\/10.1109\/TSMCB.2008.925743","journal-title":"IEEE Trans Syst Man Cybern B Cybern"},{"key":"269_CR16","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevLett.117.130501","volume":"117","author":"V Dunjko","year":"2016","unstructured":"Dunjko V, Taylor JM, Briegel HJ (2016) Quantum-enhanced machine learning. Phys Rev Lett 117:130501. https:\/\/doi.org\/10.1103\/PhysRevLett.117.130501","journal-title":"Phys Rev Lett"},{"key":"269_CR17","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevX.8.031084","volume":"8","author":"T F\u00f6sel","year":"2018","unstructured":"F\u00f6sel T, Tighineanu P, Weiss T, Marquardt F (2018) Reinforcement learning with neural networks for quantum feedback. Phys Rev X 8:031084. https:\/\/doi.org\/10.1103\/PhysRevX.8.031084","journal-title":"Phys Rev X"},{"key":"269_CR18","unstructured":"F\u00f6sel T, Niu MY, Marquardt F, Li L (2021) Quantum circuit optimization with deep reinforcement learning. arXiv:2103.07585"},{"key":"269_CR19","unstructured":"Ganguly B, Wu Y, Wang D, Aggarwal V (2023) Quantum computing provides exponential regret improvement in episodic reinforcement learning. arXiv:2302.08617"},{"key":"269_CR20","doi-asserted-by":"publisher","first-page":"325","DOI":"10.1103\/PhysRevLett.79.325","volume":"79","author":"LK Grover","year":"1997","unstructured":"Grover LK (1997) Quantum mechanics helps in searching for a needle in a haystack. Phys Rev Lett 79:325\u2013328. https:\/\/doi.org\/10.1103\/PhysRevLett.79.325","journal-title":"Phys Rev Lett"},{"key":"269_CR21","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevA.110.012605","volume":"110","author":"M Guatto","year":"2024","unstructured":"Guatto M, Susto GA, Ticozzi F (2024) Improving robustness of quantum feedback control with reinforcement learning. Phys Rev A 110:012605. https:\/\/doi.org\/10.1103\/PhysRevA.110.012605","journal-title":"Phys Rev A"},{"key":"269_CR22","doi-asserted-by":"publisher","first-page":"22","DOI":"10.1007\/s42484-021-00049-7","volume":"3","author":"A Hamann","year":"2021","unstructured":"Hamann A, Dunjko V, W\u00f6lk S (2021) Quantum-accessible reinforcement learning beyond strictly epochal environments. Quant Mach Intell 3:22. https:\/\/doi.org\/10.1007\/s42484-021-00049-7","journal-title":"Quant Mach Intell"},{"key":"269_CR23","doi-asserted-by":"publisher","DOI":"10.1088\/1367-2630\/ac5b56","volume":"24","author":"A Hamann","year":"2022","unstructured":"Hamann A, W\u00f6lk S (2022) Performance analysis of a hybrid agent for quantum-accessible reinforcement learning. New J Phys 24:033044. https:\/\/doi.org\/10.1088\/1367-2630\/ac5b56","journal-title":"New J Phys"},{"key":"269_CR24","doi-asserted-by":"publisher","first-page":"388","DOI":"10.1016\/j.procs.2024.01.038","volume":"232","author":"A H\u00e4mmerle","year":"2024","unstructured":"H\u00e4mmerle A, Heindl C, St\u00fcbl G, Thapa J, Lamon E, Pichler A (2024) Applying grid world based reinforcement learning to real world collaborative transport. Procedia Comput Sci 232:388\u2013396. https:\/\/doi.org\/10.1016\/j.procs.2024.01.038","journal-title":"Procedia Comput Sci"},{"key":"269_CR25","doi-asserted-by":"publisher","first-page":"87217","DOI":"10.1109\/ACCESS.2024.3417808","volume":"12","author":"H Hohenfeld","year":"2024","unstructured":"Hohenfeld H, Heimann D, Wiebe F, Kirchner F (2024) Quantum deep reinforcement learning for robot navigation tasks. IEEE Access 12:87217\u201387236. https:\/\/doi.org\/10.1109\/ACCESS.2024.3417808","journal-title":"IEEE Access"},{"key":"269_CR26","doi-asserted-by":"publisher","DOI":"10.1088\/2058-9565\/aaea94","volume":"4","author":"W Huggins","year":"2019","unstructured":"Huggins W, Patil P, Mitchell B, Whaley KB, Stoudenmire EM (2019) Towards quantum machine learning with tensor networks. Quant Sci Technol 4:024001. https:\/\/doi.org\/10.1088\/2058-9565\/aaea94","journal-title":"Quant Sci Technol"},{"key":"269_CR27","unstructured":"Jerbi S, Gyurik C, Marshall SC, Briegel HJ, Dunjko V (2021) Parametrized quantum policies for reinforcement learning. In: Advances in neural information processing systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, Virtual event, pp 28362\u201328375. https:\/\/proceedings.neurips.cc\/paper\/2021\/hash\/eec96a7f788e88184c0e713456026f3f-Abstract.html"},{"key":"269_CR28","unstructured":"Kitaev AY (1995) Quantum measurements and the Abelian stabilizer problem. https:\/\/arxiv.org\/abs\/quant-ph\/9511026"},{"key":"269_CR29","unstructured":"Kwiatkowski A, Towers M, Terry J, Balis JU, De\u00a0Cola G, Deleu T, Goul\u00e3o M, Kallinteris A, Krimmel M, KG A, Perez-Vicente R, Pierr\u00e9 A, Schulhoff S, Tai JJ, Tan H, Younis OG (2024) Gymnasium: a standard interface for reinforcement learning environments. arXiv:2407.17032"},{"key":"269_CR30","doi-asserted-by":"publisher","unstructured":"Lamata L (2017) Basic protocols in quantum reinforcement learning with superconducting circuits. Sci Rep 7:1609. https:\/\/doi.org\/10.1038\/s41598-017-01711-6","DOI":"10.1038\/s41598-017-01711-6"},{"key":"269_CR31","doi-asserted-by":"publisher","first-page":"294","DOI":"10.1038\/s41562-019-0804-2","volume":"4","author":"J-A Li","year":"2020","unstructured":"Li J-A, Dong D, Wei Z, Liu Y, Pan Y, Nori F, Zhang X (2020) Quantum reinforcement learning during human decision-making. Nat Hum Behav 4:294\u2013307. https:\/\/doi.org\/10.1038\/s41562-019-0804-2","journal-title":"Nat Hum Behav"},{"key":"269_CR32","unstructured":"Lillicrap TP, Hunt JJ, Pritzel A, Heess N, Erez T, Tassa Y, Silver D, Wierstra D (2019) Continuous control with deep reinforcement learning. arXiv:1509.02971"},{"key":"269_CR33","unstructured":"Lockwood O (2022) Optimizing quantum variational circuits with deep reinforcement learning. arXiv:2109.03188"},{"key":"269_CR34","doi-asserted-by":"publisher","unstructured":"Lockwood O, Si M (2020) Reinforcement learning with quantum variational circuits. In: Proceedings of the AAAI conference on artificial intelligence and interactive digital entertainment, Lexington, KY. https:\/\/doi.org\/10.1609\/aiide.v16i1.7437","DOI":"10.1609\/aiide.v16i1.7437"},{"key":"269_CR35","unstructured":"Lockwood O, Si M (2021) Playing atari with hybrid quantum-classical reinforcement learning. In: NeurIPS 2020 Workshop on Pre-registration in Machine Learning, Virtual event. http:\/\/proceedings.mlr.press\/v148\/lockwood21a.html"},{"key":"269_CR36","doi-asserted-by":"crossref","unstructured":"Mandal D, Radanovic G, Gan J, Singla A, Majumdar R (2023) Online reinforcement learning with uncertain episode lengths. arXiv:2302.03608","DOI":"10.1609\/aaai.v37i7.26088"},{"key":"269_CR37","doi-asserted-by":"publisher","first-page":"529","DOI":"10.1038\/nature14236","volume":"518","author":"V Mnih","year":"2015","unstructured":"Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, Riedmiller M, Fidjeland AK, Ostrovski G, Petersen S, Beattie C, Sadik A, Antonoglou I, King H, Kumaran D, Wierstra D, Legg S, Hassabis D (2015) Human-level control through deep reinforcement learning. Nature 518:529\u2013533. https:\/\/doi.org\/10.1038\/nature14236","journal-title":"Nature"},{"key":"269_CR38","doi-asserted-by":"publisher","unstructured":"Nautrup HP, Delfosse N, Dunjko V, Briegel HJ, Friis N (2019) Optimizing quantum error correction codes with reinforcement learning. Quantum 3:215. https:\/\/doi.org\/10.22331\/q-2019-12-16-215","DOI":"10.22331\/q-2019-12-16-215"},{"key":"269_CR39","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511976667","volume-title":"Quantum computation and quantum information: 10th","author":"MA Nielsen","year":"2010","unstructured":"Nielsen MA, Chuang IL (2010) Quantum computation and quantum information: 10th, Anniversary. Univ. Press, Cambridge, Camb. https:\/\/doi.org\/10.1017\/CBO9780511976667","edition":"Anniversary"},{"key":"269_CR40","doi-asserted-by":"publisher","first-page":"117","DOI":"10.1016\/j.aop.2014.06.013","volume":"349","author":"R Or\u00fas","year":"2014","unstructured":"Or\u00fas R (2014) A practical introduction to tensor networks: matrix product states and projected entangled pair states. Ann Phys 349:117\u2013158. https:\/\/doi.org\/10.1016\/j.aop.2014.06.013","journal-title":"Ann Phys"},{"key":"269_CR41","unstructured":"Ostaszewski M, Trenkwalder LM, Masarczyk W, Scerri E, Dunjko V (2021) Reinforcement learning for optimization of variational quantum circuit architectures. In: Advances in neural information processing systems 34: Annual conference on neural information processing systems 2021, NeurIPS 2021, Virtual event, pp 18182\u201318194. https:\/\/proceedings.neurips.cc\/paper\/2021\/hash\/9724412729185d53a2e3e7f889d9f057-Abstract.html"},{"key":"269_CR42","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevX.4.031002","volume":"4","author":"GD Paparo","year":"2014","unstructured":"Paparo GD, Dunjko V, Makmal A, Martin-Delgado MA, Briegel HJ (2014) Quantum speedup for active learning agents. Phys Rev X 4:031002. https:\/\/doi.org\/10.1103\/PhysRevX.4.031002","journal-title":"Phys Rev X"},{"key":"269_CR43","doi-asserted-by":"publisher","first-page":"149","DOI":"10.1007\/BF01458701","volume":"84","author":"G P\u00f3lya","year":"1921","unstructured":"P\u00f3lya G (1921) \u00dcber eine Aufgabe der Wahrscheinlichkeitsrechnung betreffend die Irrfahrt im Stra\u00dfennetz. Math Ann 84:149\u2013160. https:\/\/doi.org\/10.1007\/BF01458701","journal-title":"Math Ann"},{"key":"269_CR44","doi-asserted-by":"publisher","unstructured":"Preskill J (2018) Quantum computing in the NISQ era and beyond. Quantum 2:79. https:\/\/doi.org\/10.22331\/q-2018-08-06-79","DOI":"10.22331\/q-2018-08-06-79"},{"key":"269_CR45","doi-asserted-by":"publisher","DOI":"10.1088\/2632-2153\/adaf75","volume":"6","author":"F Rapp","year":"2025","unstructured":"Rapp F, Kreplin DA, Huber MF, Roth M (2025) Reinforcement learning-based architecture search for quantum machine learning. Mach Learn: Sci Technol 6:015041. https:\/\/doi.org\/10.1088\/2632-2153\/adaf75","journal-title":"Mach Learn: Sci Technol"},{"key":"269_CR46","doi-asserted-by":"crossref","unstructured":"Ruiz FJR, Laakkonen T, Bausch J, Balog M, Barekatain M, Heras FJH, Novikov A, Fitzpatrick N, Romera-Paredes B, Wetering J, Fawzi A, Meichanetzidis K, Kohli P (2024) Quantum circuit optimization with AlphaTensor. arXiv:2402.14396","DOI":"10.1038\/s42256-025-01001-1"},{"key":"269_CR47","doi-asserted-by":"publisher","first-page":"229","DOI":"10.1038\/s41586-021-03242-7","volume":"591","author":"V Saggio","year":"2021","unstructured":"Saggio V, Asenbeck BE, Hamann A, Str\u00f6mberg T, Schiansky P, Dunjko V, Friis N, Harris NC, Hochberg M, Englund D, W\u00f6lk S, Briegel HJ, Walther P (2021) Experimental quantum speed-up in reinforcement learning agents. Nature 591:229\u2013233. https:\/\/doi.org\/10.1038\/s41586-021-03242-7","journal-title":"Nature"},{"key":"269_CR48","doi-asserted-by":"publisher","DOI":"10.1103\/PRXQuantum.3.030101","volume":"3","author":"M Schuld","year":"2022","unstructured":"Schuld M, Killoran N (2022) Is quantum advantage the right goal for quantum machine learning? PRX Quantum 3:030101. https:\/\/doi.org\/10.1103\/PRXQuantum.3.030101","journal-title":"PRX Quantum"},{"key":"269_CR49","unstructured":"Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O (2017) Proximal policy optimization algorithms. arXiv:1707.06347"},{"key":"269_CR50","unstructured":"Silver D, Lever G, Heess N, Degris T, Wierstra D, Riedmiller M (2014) Deterministic policy gradient algorithms. In: Proceedings of the 31st International Conference on Machine Learning, Beijing, China, pp 387\u2013395. https:\/\/proceedings.mlr.press\/v32\/silver14.html"},{"key":"269_CR51","doi-asserted-by":"publisher","first-page":"354","DOI":"10.1038\/nature24270","volume":"550","author":"D Silver","year":"2017","unstructured":"Silver D, Schrittwieser J, Simonyan K, Antonoglou I, Huang A, Guez A, Hubert T, Baker L, Lai M, Bolton A, Chen Y, Lillicrap T, Hui F, Sifre L, Driessche G, Graepel T, Hassabis D (2017) Mastering the game of go without human knowledge. Nature 550:354\u2013359. https:\/\/doi.org\/10.1038\/nature24270","journal-title":"Nature"},{"key":"269_CR52","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevX.12.011059","volume":"12","author":"VV Sivak","year":"2022","unstructured":"Sivak VV, Eickbusch A, Liu H, Royer B, Tsioutsios I, Devoret MH (2022) Model-free quantum control with reinforcement learning. Phys Rev X 12:011059. https:\/\/doi.org\/10.1103\/PhysRevX.12.011059","journal-title":"Phys Rev X"},{"key":"269_CR53","doi-asserted-by":"publisher","first-page":"50","DOI":"10.1038\/s41586-023-05782-6","volume":"616","author":"VV Sivak","year":"2023","unstructured":"Sivak VV, Eickbusch A, Royer B, Singh S, Tsioutsios I, Ganjam S, Miano A, Brock BL, Ding AZ, Frunzio L, Girvin SM, Schoelkopf RJ, Devoret MH (2023) Real-time quantum error correction beyond break-even. Nature 616:50\u201355. https:\/\/doi.org\/10.1038\/s41586-023-05782-6","journal-title":"Nature"},{"key":"269_CR54","doi-asserted-by":"publisher","unstructured":"Skolik A, Jerbi S, Dunjko V (2022) Quantum agents in the gym: a variational quantum algorithm for deep Q-learning. Quantum 6:720. https:\/\/doi.org\/10.22331\/q-2022-05-24-720","DOI":"10.22331\/q-2022-05-24-720"},{"key":"269_CR55","doi-asserted-by":"publisher","DOI":"10.1088\/2058-9565\/aaef5e","volume":"4","author":"T Sriarunothai","year":"2018","unstructured":"Sriarunothai T, W\u00f6lk S, Giri GS, Friis N, Dunjko V, Briegel HJ, Wunderlich C (2018) Speeding-up the decision making of a learning agent using an ion trap quantum processor. Quant Sci Technol 4:015014. https:\/\/doi.org\/10.1088\/2058-9565\/aaef5e","journal-title":"Quant Sci Technol"},{"key":"269_CR56","unstructured":"Sutton RS, Barto AG (2018) Reinforcement learning: an introduction. 2nd Edition. A Bradford Book, Cambridge. http:\/\/incompleteideas.net\/book\/the-book-2nd.html"},{"key":"269_CR57","doi-asserted-by":"publisher","first-page":"279","DOI":"10.1007\/BF00992698","volume":"8","author":"CJCH Watkins","year":"1992","unstructured":"Watkins CJCH, Dayan P (1992) Q-learning. Mach Learn 8:279\u2013292. https:\/\/doi.org\/10.1007\/BF00992698","journal-title":"Mach Learn"},{"key":"269_CR58","unstructured":"Wiedemann S, Hein D, Udluft S, Mendl C (2023) Quantum policy iteration via amplitude estimation and Grover search \u2013 towards quantum advantage for reinforcement learning. arXiv:2206.04741"},{"key":"269_CR59","doi-asserted-by":"publisher","first-page":"61","DOI":"10.1109\/TCST.2024.3437142","volume":"33","author":"H Yu","year":"2025","unstructured":"Yu H, Zhao X, Chen C (2025) Quantum-inspired reinforcement learning for quantum control. IEEE Trans Control Syst Technol 33:61\u201376. https:\/\/doi.org\/10.1109\/TCST.2024.3437142","journal-title":"IEEE Trans Control Syst Technol"},{"key":"269_CR60","unstructured":"Zhong H, Hu J, Xue Y, Li T, Wang L (2024) Provably efficient exploration in quantum reinforcement learning with logarithmic worst-case regret. arXiv:2302.10796"}],"container-title":["Quantum Machine Intelligence"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s42484-025-00269-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s42484-025-00269-1\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s42484-025-00269-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,26]],"date-time":"2025-06-26T14:40:34Z","timestamp":1750948834000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s42484-025-00269-1"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,4,11]]},"references-count":60,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2025,6]]}},"alternative-id":["269"],"URL":"https:\/\/doi.org\/10.1007\/s42484-025-00269-1","relation":{},"ISSN":["2524-4906","2524-4914"],"issn-type":[{"value":"2524-4906","type":"print"},{"value":"2524-4914","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,4,11]]},"assertion":[{"value":"17 December 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"8 March 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"11 April 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}],"article-number":"52"}}