{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,19]],"date-time":"2025-12-19T09:50:06Z","timestamp":1766137806114,"version":"3.41.0"},"reference-count":40,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2021,9,22]],"date-time":"2021-09-22T00:00:00Z","timestamp":1632268800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"NSF CCF","award":["2008799"],"award-info":[{"award-number":["2008799"]}]},{"name":"NSF","award":["1724237"],"award-info":[{"award-number":["1724237"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Cyber-Phys. Syst."],"published-print":{"date-parts":[[2021,10,31]]},"abstract":"<jats:p>We focus on the problem of reliably training Reinforcement Learning (RL) models (agents) for stable low-level control in embedded systems and test our methods on a high-performance, custom-built quadrotor platform. A common but often under-studied problem in developing RL agents for continuous control is that the control policies developed are not always smooth. This lack of smoothness can be a major problem when learning controllers as it can result in control instability and hardware failure.<\/jats:p>\n          <jats:p>\n            Issues of noisy control are further accentuated when training RL agents in simulation due to simulators ultimately being imperfect representations of reality\u2014what is known as the\n            <jats:italic>reality gap<\/jats:italic>\n            . To combat issues of instability in RL agents, we propose a systematic framework, REinforcement-based transferable Agents through Learning (RE+AL), for designing simulated training environments that preserve the quality of trained agents when transferred to real platforms. 
RE+AL is an evolution of the Neuroflight infrastructure detailed in technical reports prepared by members of our research group. Neuroflight is a state-of-the-art framework for training RL agents for low-level attitude control. RE+AL improves and completes Neuroflight by solving a number of important limitations that hindered the deployment of Neuroflight to real hardware. We benchmark RE+AL on the NF1 racing quadrotor developed as part of Neuroflight. We demonstrate that RE+AL significantly mitigates the previously observed issues of smoothness in RL agents. Additionally, RE+AL is shown to consistently train agents that are flight capable and with minimal degradation in controller quality upon transfer. RE+AL agents also learn to perform better than a tuned PID controller, with better tracking errors, smoother control, and reduced power consumption. To the best of our knowledge, RE+AL agents are the first RL-based controllers trained in simulation to outperform a well-tuned PID controller on a real-world controls problem that is solvable with classical control.\n          <\/jats:p>","DOI":"10.1145\/3466618","type":"journal-article","created":{"date-parts":[[2021,9,22]],"date-time":"2021-09-22T21:36:34Z","timestamp":1632346594000},"page":"1-24","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":13,"title":["How to Train Your Quadrotor: A Framework for Consistently Smooth and Responsive Flight Control via Reinforcement Learning"],"prefix":"10.1145","volume":"5","author":[{"given":"Siddharth","family":"Mysore","sequence":"first","affiliation":[{"name":"Boston University, Boston, MA"}]},{"given":"Bassel","family":"Mabsout","sequence":"additional","affiliation":[{"name":"Boston University, Boston, MA"}]},{"given":"Kate","family":"Saenko","sequence":"additional","affiliation":[{"name":"Boston University, Boston, MA"}]},{"given":"Renato","family":"Mancuso","sequence":"additional","affiliation":[{"name":"Boston 
University, Boston, MA"}]}],"member":"320","published-online":{"date-parts":[[2021,9,22]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Team Betaflight. [n.d.]. Betaflight. https:\/\/betaflight.com\/"},{"key":"e_1_2_1_2_1","unstructured":"Russell L. Smith. [n.d.]. Open Dynamics Engine. https:\/\/www.ode.org\/"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.5555\/3026877.3026899"},{"key":"e_1_2_1_4_1","volume-title":"Dynamic weights in multi-objective deep reinforcement learning. arXiv preprint arXiv:1809.07803","author":"Abels Axel","year":"2018","unstructured":"Axel Abels, Diederik M. Roijers, Tom Lenaerts, Ann Now\u00e9, and Denis Steckelmacher. 2018. Dynamic weights in multi-objective deep reinforcement learning. arXiv preprint arXiv:1809.07803 (2018)."},{"key":"e_1_2_1_5_1","unstructured":"Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, and Dan Man\u00e9. 2016. Concrete Problems in AI Safety. arxiv:cs.AI\/1606.06565"},{"key":"e_1_2_1_6_1","volume-title":"CoRR abs\/1606.01540","author":"Brockman Greg","year":"2016","unstructured":"Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. 2016. OpenAI Gym. CoRR abs\/1606.01540 (2016). arxiv:1606.01540"},{"key":"e_1_2_1_7_1","unstructured":"Erwin Coumans and Yunfei Bai. 2016\u20132019. PyBullet, a Python module for physics simulation for games, robotics and machine learning. http:\/\/pybullet.org."},{"key":"e_1_2_1_8_1","unstructured":"Prafulla Dhariwal, Christopher Hesse, Oleg Klimov, Alex Nichol, Matthias Plappert, Alec Radford, John Schulman, Szymon Sidor, Yuhuai Wu, and Peter Zhokhov. 2017. OpenAI Baselines. https:\/\/github.com\/openai\/baselines."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.5555\/3045390.3045531"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/5666.5673"},{"key":"e_1_2_1_11_1","volume-title":"International Conference on Machine Learning (ICML'18)","author":"Haarnoja Tuomas","year":"2018","unstructured":"Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine. 2018. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In International Conference on Machine Learning (ICML'18)."},{"volume-title":"G\u00f6del Logic (and Boolean Logic)","author":"H\u00e1jek Petr","key":"e_1_2_1_12_1","unstructured":"Petr H\u00e1jek. 1998. Product Logic, G\u00f6del Logic (and Boolean Logic). Springer Netherlands, Dordrecht, 89\u2013107. https:\/\/doi.org\/10.1007\/978-94-011-5300-3_4"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.5555\/2997046.2997187"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2017.2720851"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2019.8793742"},{"key":"e_1_2_1_17_1","volume-title":"Neuroflight: Next generation flight control firmware. CoRR abs\/1901.06553","author":"Koch William","year":"2019","unstructured":"William Koch, Renato Mancuso, and Azer Bestavros. 2019. Neuroflight: Next generation flight control firmware. CoRR abs\/1901.06553 (2019). arxiv:1901.06553"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/3301273"},{"key":"e_1_2_1_19_1","volume-title":"2004 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS'04)","volume":"3","author":"Koenig Nathan","unstructured":"Nathan Koenig and Andrew Howard. [n.d.]. Design and use paradigms for gazebo, an open-source multi-robot simulator. In 2004 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS'04) (IEEE Cat. No. 04CH37566), Vol. 3. IEEE, 2149\u20132154."},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.21105\/joss.00500"},{"key":"e_1_2_1_21_1","volume-title":"International Conference on Learning Representations. arxiv:1509","author":"Lillicrap Timothy P.","year":"2016","unstructured":"Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. 2016. Continuous control with deep reinforcement learning. In International Conference on Learning Representations. arxiv:1509.02971"},{"key":"e_1_2_1_22_1","volume-title":"Conference on Robot Learning. arxiv:1809","author":"Mahmood A. Rupam","year":"2018","unstructured":"A. Rupam Mahmood, Dmytro Korenkevych, Gautham Vasan, William Ma, and James Bergstra. 2018. Benchmarking reinforcement learning algorithms on real-world robots. In Conference on Robot Learning. arxiv:1809.07731"},{"key":"e_1_2_1_23_1","volume-title":"Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602","author":"Mnih Volodymyr","year":"2013","unstructured":"Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. 2013. Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602 (2013)."},{"volume-title":"International Conference on Intelligent Robots and Systems. arxiv:1903","author":"Molchanov Artem","key":"e_1_2_1_24_1","unstructured":"Artem Molchanov, Tao Chen, Wolfgang H\u00f6nig, James A. Preiss, Nora Ayanian, and Gaurav S. Sukhatme. 2019. Sim-to-(multi)-real: Transfer of low-level robust control policies to multiple quadrotors. In International Conference on Intelligent Robots and Systems. arxiv:1903.04628"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/325334.325247"},{"key":"e_1_2_1_26_1","volume-title":"Rummery and Mahesan Niranjan","author":"Gavin","year":"1994","unstructured":"Gavin A. Rummery and Mahesan Niranjan. 1994. On-line Q-learning Using Connectionist Systems. Vol. 37."},{"key":"e_1_2_1_27_1","volume-title":"Sim2Real view invariant visual servoing by recurrent control. CoRR abs\/1712.07642","author":"Sadeghi Fereshteh","year":"2017","unstructured":"Fereshteh Sadeghi, Alexander Toshev, Eric Jang, and Sergey Levine. 2017. Sim2Real view invariant visual servoing by recurrent control. CoRR abs\/1712.07642 (2017). arxiv:1712.07642"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.5555\/3045118.3045319"},{"key":"e_1_2_1_29_1","volume-title":"Proximal policy optimization algorithms. CoRR abs\/1707.06347","author":"Schulman John","year":"2017","unstructured":"John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. 2017. Proximal policy optimization algorithms. CoRR abs\/1707.06347 (2017). arxiv:1707.06347"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.5555\/3008751.3008902"},{"key":"e_1_2_1_31_1","volume-title":"Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et\u00a0al.","author":"Silver David","year":"2016","unstructured":"David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George Van Den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et\u00a0al. 2016. Mastering the game of Go with deep neural networks and tree search. Nature 529, 7587 (2016), 484."},{"key":"e_1_2_1_32_1","volume-title":"et\u00a0al","author":"Silver David","year":"2017","unstructured":"David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, et\u00a0al. 2017. Mastering chess and shogi by self-play with a general reinforcement learning algorithm. arXiv preprint arXiv:1712.01815 (2017)."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.5555\/551283"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.5555\/3009657.3009806"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2012.6386109"},{"key":"e_1_2_1_36_1","volume-title":"et\u00a0al","author":"Vinyals Oriol","year":"2019","unstructured":"Oriol Vinyals, Igor Babuschkin, Wojciech M. Czarnecki, Micha\u00ebl Mathieu, Andrew Dudzik, Junyoung Chung, David H. Choi, Richard Powell, Timo Ewalds, Petko Georgiev, et\u00a0al. 2019. Grandmaster level in Starcraft II using multi-agent reinforcement learning. Nature 575, 7782 (2019), 350\u2013354."},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF00992698"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF00992696"},{"key":"e_1_2_1_39_1","volume-title":"A dissection of overfitting and generalization in continuous reinforcement learning. CoRR abs\/1806.07937","author":"Zhang Amy X.","year":"2018","unstructured":"Amy X. Zhang, Nicolas Ballas, and Joelle Pineau. 2018. A dissection of overfitting and generalization in continuous reinforcement learning. CoRR abs\/1806.07937 (2018)."},{"key":"e_1_2_1_40_1","volume-title":"A study on overfitting in deep reinforcement learning. CoRR abs\/1804.06893","author":"Zhang Chiyuan","year":"2018","unstructured":"Chiyuan Zhang, Oriol Vinyals, R\u00e9mi Munos, and Samy Bengio. 2018. A study on overfitting in deep reinforcement learning. CoRR abs\/1804.06893 (2018)."},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1115\/1.2899060"}],"container-title":["ACM Transactions on Cyber-Physical Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3466618","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3466618","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3466618","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T21:24:52Z","timestamp":1750195492000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3466618"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,9,22]]},"references-count":40,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2021,10,31]]}},"alternative-id":["10.1145\/3466618"],"URL":"https:\/\/doi.org\/10.1145\/3466618","relation":{},"ISSN":["2378-962X","2378-9638"],"issn-type":[{"type":"print","value":"2378-962X"},{"type":"electronic","value":"2378-9638"}],"subject":[],"published":{"date-parts":[[2021,9,22]]},"assertion":[{"value":"2020-08-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-05-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-09-22","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}