{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,15]],"date-time":"2026-04-15T18:30:34Z","timestamp":1776277834379,"version":"3.50.1"},"reference-count":91,"publisher":"Annual Reviews","issue":"1","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Annu. Rev. Control Robot. Auton. Syst."],"published-print":{"date-parts":[[2019,5,3]]},"abstract":"<jats:p>This article surveys reinforcement learning from the perspective of optimization and control, with a focus on continuous control applications. It reviews the general formulation, terminology, and typical experimental implementations of reinforcement learning as well as competing solution paradigms. In order to compare the relative merits of various techniques, it presents a case study of the linear quadratic regulator (LQR) with unknown dynamics, perhaps the simplest and best-studied problem in optimal control. It also describes how merging techniques from learning theory and control can provide nonasymptotic characterizations of LQR performance and shows that these characterizations tend to match experimental behavior. In turn, when revisiting more complex applications, many of the observed phenomena in LQR persist. In particular, theory and experiment demonstrate the role and importance of models and the cost of generality in reinforcement learning algorithms. The article concludes with a discussion of some of the challenges in designing learning systems that safely and reliably interact with complex and uncertain environments and how tools from reinforcement learning and control might be combined to approach these challenges.<\/jats:p>","DOI":"10.1146\/annurev-control-053018-023825","type":"journal-article","created":{"date-parts":[[2018,12,14]],"date-time":"2018-12-14T18:30:41Z","timestamp":1544812241000},"page":"253-279","source":"Crossref","is-referenced-by-count":439,"title":["A Tour of Reinforcement Learning: The View from Continuous Control"],"prefix":"10.1146","volume":"2","author":[{"given":"Benjamin","family":"Recht","sequence":"first","affiliation":[{"name":"Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, California 94720, USA;"}]}],"member":"22","reference":[{"key":"B1","doi-asserted-by":"publisher","DOI":"10.1038\/nature16961"},{"key":"B2","volume-title":"Dynamic Programming and Optimal Control","author":"Bertsekas DP","year":"2017"},{"key":"B3","volume-title":"Reinforcement Learning: An Introduction","author":"Sutton RS","year":"1998"},{"key":"B4","doi-asserted-by":"publisher","DOI":"10.1002\/9780470316887"},{"key":"B5","first-page":"2818","volume-title":"Advances in Neural Information Processing Systems 28","author":"Dann C","year":"2015"},{"key":"B6","volume-title":"Problem Complexity and Method Efficiency in Optimization","author":"Nemirovski A","year":"1983"},{"key":"B7","volume-title":"Semi-supervised learning literature survey","author":"Zhu X","year":"2005"},{"key":"B8","first-page":"38.1","volume-title":"Proceedings of the 25th Annual Conference on Learning Theory","author":"Hazan E","year":"2012"},{"key":"B9","volume-title":"Neuro-Dynamic Programming","author":"Bertsekas 
DP","year":"1996"},{"key":"B10","doi-asserted-by":"publisher","DOI":"10.1613\/jair.301"},{"key":"B11","doi-asserted-by":"publisher","DOI":"10.1126\/science.1259433"},{"key":"B12","doi-asserted-by":"publisher","DOI":"10.1038\/nature14236"},{"key":"B13","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4757-2379-3_11"},{"key":"B14","first-page":"3207","volume":"14","author":"Bottou L","year":"2013","journal-title":"J. Mach. Learn. Res."},{"key":"B15","first-page":"2217","volume-title":"Advances in Neural Information Processing Systems 23","author":"Strehl A","year":"2010"},{"key":"B16","volume-title":"Dynamic Programming and Optimal Control","author":"Bertsekas DP","year":"2012"},{"key":"B17","volume-title":"System Identification: Theory for the User","author":"Ljung L","year":"1998"},{"key":"B18","doi-asserted-by":"publisher","DOI":"10.1109\/TAC.2002.800750"},{"key":"B19","doi-asserted-by":"publisher","DOI":"10.1016\/j.jprocont.2007.10.009"},{"key":"B20","doi-asserted-by":"publisher","DOI":"10.1007\/BF00993306"},{"key":"B21","doi-asserted-by":"publisher","DOI":"10.1007\/BF00992698"},{"key":"B22","doi-asserted-by":"publisher","DOI":"10.1007\/BF00115009"},{"key":"B23","doi-asserted-by":"publisher","DOI":"10.1007\/BF00992701"},{"key":"B24","doi-asserted-by":"publisher","DOI":"10.1007\/BF00114723"},{"key":"B25","volume-title":"Temporal differences-based policy iteration and applications in neuro-dynamic programming","author":"Bertsekas DP","year":"1996"},{"key":"B26","doi-asserted-by":"publisher","DOI":"10.1109\/TAC.2009.2022097"},{"key":"B27","doi-asserted-by":"publisher","DOI":"10.1007\/BF00992696"},{"key":"B28","first-page":"2672","volume-title":"Advances in Neural Information Processing Systems 25","author":"Jamieson KG","year":"2012"},{"key":"B29","doi-asserted-by":"publisher","DOI":"10.1016\/0022-247X(65)90154-X"},{"key":"B30","doi-asserted-by":"publisher","DOI":"10.1016\/S0004-3702(98)00023-X"},{"key":"B31","first-page":"1467","volume":"24","author":"Rastrigin LA","year":"1963","journal-title":"Avtomat. Telemekh."},{"key":"B32","doi-asserted-by":"publisher","DOI":"10.1007\/s10208-015-9296-2"},{"key":"B33","doi-asserted-by":"publisher","DOI":"10.1023\/A:1015059928466"},{"key":"B34","volume-title":"Evolutionsstrategie und numerische optimierung","author":"Schwefel HP","year":"1975"},{"key":"B35","doi-asserted-by":"publisher","DOI":"10.1109\/9.119632"},{"key":"B36","first-page":"385","volume-title":"Proceedings of the 16th Annual ACM-SIAM Symposium on Discrete Algorithms","author":"Flaxman AD","year":"2005"},{"key":"B37","first-page":"28","volume-title":"COLT 2010: 23rd Conference on Learning Theory","author":"Agarwal A","year":"2010"},{"key":"B38","volume-title":"Robust and Optimal Control","author":"Zhou K","year":"1995"},{"key":"B39","unstructured":"39.\u2002 Dean S, Mania H, Matni N, Recht B, Tu S. 2017. On the sample complexity of the linear quadratic regulator. arXiv:1710.01688 [math.OC]"},{"key":"B40","first-page":"439","volume-title":"Proceedings of the 31st Conference on Learning Theory","author":"Simchowitz M","year":"2018"},{"key":"B41","doi-asserted-by":"publisher","DOI":"10.1109\/CDC.2017.8264168"},{"key":"B42","unstructured":"42.\u2002 Wang YS, Matni N, Doyle JC. 2016. A system level approach to controller synthesis. 
arXiv:1610.04815 [cs.SY]"},{"key":"B43","doi-asserted-by":"publisher","DOI":"10.1109\/ACC.1994.735224"},{"key":"B44","first-page":"5005","volume-title":"Proceedings of the 35th International Conference on Machine Learning","author":"Tu SL","year":"2018"},{"key":"B45","unstructured":"45.\u2002 Kingma D, Ba J. 2014. Adam: a method for stochastic optimization. arXiv:1412.6980 [cs.LG]"},{"key":"B46","doi-asserted-by":"publisher","DOI":"10.1016\/B978-1-4832-1448-1.50011-1"},{"key":"B47","volume-title":"Temporal credit assignment in reinforcement learning","author":"Sutton RS","year":"1984"},{"key":"B48","volume-title":"Toward a theory of reinforcement-learning connectionist systems","author":"Williams RJ","year":"1988"},{"key":"B49","first-page":"1107","volume":"4","author":"Lagoudakis MG","year":"2003","journal-title":"J. Mach. Learn. Res."},{"key":"B50","volume-title":"Machine learning applications for data center optimization","author":"Gao J","year":"2014"},{"key":"B51","doi-asserted-by":"publisher","DOI":"10.1214\/aos\/1176344552"},{"key":"B52","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4612-0795-5"},{"key":"B53","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2012.6386109"},{"key":"B54","doi-asserted-by":"publisher","DOI":"10.1201\/9781315136370"},{"key":"B55","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2004.1389841"},{"key":"B56","first-page":"1","volume-title":"Proceedings of the 30th International Conference on Machine Learning","author":"Levine S","year":"2013"},{"key":"B57","first-page":"387","volume-title":"Proceedings of the 31st International Conference on Machine Learning","author":"Silver D","year":"2014"},{"key":"B58","unstructured":"58.\u2002 Lillicrap TP, Hunt JJ, Pritzel A, Heess N, Erez T, et al. 2015. Continuous control with deep reinforcement learning. arXiv:1509.02971 [cs.LG]"},{"key":"B59","first-page":"1889","volume-title":"Proceedings of the 32nd International Conference on Machine Learning","author":"Schulman J","year":"2015"},{"key":"B60","unstructured":"60.\u2002 Schulman J, Moritz P, Levine S, Jordan M, Abbeel P. 2015. High-dimensional continuous control using generalized advantage estimation. arXiv:1506.02438 [cs.LG]"},{"key":"B61","unstructured":"61.\u2002 Wu Y, Mansimov E, Liao S, Grosse R, Ba J. 2017. Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation. arXiv:1708.05144 [cs.LG]"},{"key":"B62","unstructured":"62.\u2002 Salimans T, Ho J, Chen X, Sutskever I. 2017. Evolution strategies as a scalable alternative to reinforcement learning. arXiv:1703.03864 [stat.ML]"},{"key":"B63","doi-asserted-by":"crossref","first-page":"49","DOI":"10.2478\/pjbr-2013-0003","volume":"4","author":"Stulp F","year":"2013","journal-title":"Paladyn"},{"key":"B64","first-page":"6550","volume-title":"Advances in Neural Information Processing Systems 30","author":"Rajeswaran A","year":"2017"},{"key":"B65","unstructured":"65.\u2002 Mania H, Guy A, Recht B. 2018. Simple random search provides a competitive approach to reinforcement learning. arXiv:1803.07055 [cs.LG]"},{"key":"B66","doi-asserted-by":"crossref","unstructured":"66.\u2002 Henderson P, Islam R, Bachman P, Pineau J, Precup D, Meger D. 2017. Deep reinforcement learning that matters. arXiv:1709.06560 [cs.LG]","DOI":"10.1609\/aaai.v32i1.11694"},{"key":"B67","doi-asserted-by":"crossref","unstructured":"67.\u2002 Islam R, Henderson P, Gomrokchi M, Precup D. 2017. Reproducibility of benchmarked deep reinforcement learning tasks for continuous control. 
arXiv:1708.04133 [cs.LG]","DOI":"10.1609\/aaai.v32i1.11694"},{"key":"B68","doi-asserted-by":"publisher","DOI":"10.1109\/HUMANOIDS.2013.7029990"},{"key":"B69","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2012.6386025"},{"key":"B70","doi-asserted-by":"publisher","DOI":"10.1109\/TAC.2017.2753460"},{"key":"B71","unstructured":"71.\u2002 Bojarski M, Del Testa D, Dworakowski D, Firner B, Flepp B, et al. 2016. End to end learning for self-driving cars. arXiv:1604.07316 [cs.CV]"},{"key":"B72","doi-asserted-by":"publisher","DOI":"10.1016\/S0005-1098(00)00050-9"},{"key":"B73","doi-asserted-by":"publisher","DOI":"10.1287\/moor.12.3.441"},{"key":"B74","first-page":"1","volume":"17","author":"Levine S","year":"2016","journal-title":"J. Mach. Learn. Res."},{"key":"B75","doi-asserted-by":"publisher","DOI":"10.1109\/CDC.2014.7039601"},{"key":"B76","doi-asserted-by":"publisher","DOI":"10.1016\/j.automatica.2013.02.003"},{"key":"B77","first-page":"908","volume-title":"Advances in Neural Information Processing Systems 30","author":"Berkenkamp F","year":"2017"},{"key":"B78","doi-asserted-by":"publisher","DOI":"10.1137\/0319052"},{"key":"B79","first-page":"1","volume-title":"Proceedings of the 24th Annual Conference on Learning Theory","author":"Abbasi-Yadkori Y","year":"2011"},{"key":"B80","doi-asserted-by":"publisher","DOI":"10.1023\/A:1013689704352"},{"key":"B81","doi-asserted-by":"publisher","DOI":"10.1016\/0196-8858(85)90002-8"},{"key":"B82","first-page":"2","volume-title":"Proceedings of the Thirty-First Conference on Uncertainty in Artificial Intelligence","author":"Abbasi-Yadkori Y","year":"2015"},{"key":"B83","first-page":"176","volume-title":"Proceedings of the 20th International Conference on Artificial Intelligence and Statistics","author":"Abeille M","year":"2017"},{"key":"B84","doi-asserted-by":"crossref","unstructured":"84.\u2002 Ouyang Y, Gagrani M, Jain R. 2017. Learning-based control of unknown linear systems with Thompson sampling. arXiv:1709.04047 [cs.SY]","DOI":"10.1109\/ALLERTON.2017.8262873"},{"key":"B85","unstructured":"85.\u2002 Abbasi-Yadkori Y, Lazic N, Szepesv\u00e1ri C. 2018. The return of \u03f5-greedy: sublinear regret for model-free linear quadratic control. arXiv:1804.06021 [cs.LG]"},{"key":"B86","unstructured":"86.\u2002 Dean S, Mania H, Matni N, Recht B, Tu S. 2018. Regret bounds for robust adaptive control of the linear quadratic regulator. 
arXiv:1805.09388 [cs.LG]"},{"key":"B87","doi-asserted-by":"publisher","DOI":"10.15607\/RSS.2016.XII.029"},{"key":"B88","doi-asserted-by":"publisher","DOI":"10.1109\/CDC.1989.70615"},{"key":"B89","doi-asserted-by":"publisher","DOI":"10.1109\/TCST.2017.2723574"},{"key":"B90","doi-asserted-by":"publisher","DOI":"10.1115\/1.3653115"},{"key":"B91","first-page":"663","volume-title":"Proceedings of the Seventeenth International Conference on Machine Learning","author":"Ng AY","year":"2000"}],"container-title":["Annual Review of Control, Robotics, and Autonomous Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.annualreviews.org\/doi\/pdf\/10.1146\/annurev-control-053018-023825","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,4]],"date-time":"2026-04-04T14:13:45Z","timestamp":1775312025000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.annualreviews.org\/doi\/10.1146\/annurev-control-053018-023825"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,5,3]]},"references-count":91,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2019,5,3]]}},"alternative-id":["10.1146\/annurev-control-053018-023825"],"URL":"https:\/\/doi.org\/10.1146\/annurev-control-053018-023825","relation":{},"ISSN":["2573-5144","2573-5144"],"issn-type":[{"value":"2573-5144","type":"print"},{"value":"2573-5144","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,5,3]]}}}
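
The record above is a compact JSON response from the Crossref REST API (message-type "work"). As a minimal sketch of how such a record is retrieved, assuming the third-party Python "requests" package (any HTTP client would do), the same document can be fetched from the public works endpoint:

# Minimal sketch: fetch the Crossref work record shown above.
# Assumes the third-party "requests" package; field names come
# directly from the record itself.
import requests

DOI = "10.1146/annurev-control-053018-023825"
resp = requests.get(f"https://api.crossref.org/works/{DOI}", timeout=30)
resp.raise_for_status()
work = resp.json()["message"]  # corresponds to the "message" object above

print(work["title"][0])                # article title
print(work["container-title"][0])      # journal name
print(work["is-referenced-by-count"])  # citation count at indexing time
print(len(work["reference"]))          # 91 bibliography entries

Note that mutable fields such as "indexed", "deposited", and "is-referenced-by-count" are updated as Crossref re-indexes the work, so a fresh fetch may differ from the snapshot above.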