{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,25]],"date-time":"2026-01-25T03:32:09Z","timestamp":1769311929715,"version":"3.49.0"},"reference-count":76,"publisher":"Springer Science and Business Media LLC","issue":"9","license":[{"start":{"date-parts":[[2021,7,7]],"date-time":"2021-07-07T00:00:00Z","timestamp":1625616000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,7,7]],"date-time":"2021-07-07T00:00:00Z","timestamp":1625616000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100002418","name":"Intel Corporation","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100002418","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Mach Learn"],"published-print":{"date-parts":[[2021,9]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>We introduce Air Learning, an open-source simulator, and a gym environment for deep reinforcement learning research on resource-constrained aerial robots. Equipped with domain randomization, Air Learning exposes a UAV agent to a diverse set of challenging scenarios. We seed the toolset with point-to-point obstacle avoidance tasks in three different environments and Deep Q Networks (DQN) and Proximal Policy Optimization (PPO) trainers. Air Learning assesses the policies\u2019 performance under various quality-of-flight (QoF) metrics, such as the energy consumed, endurance, and the average trajectory length, on resource-constrained embedded platforms like a Raspberry Pi. 
We find that the trajectories on an embedded Ras-Pi are vastly different from those predicted on a high-end desktop system, resulting in up to<jats:inline-formula><jats:alternatives><jats:tex-math>$$40\\%$$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\"><mml:mrow><mml:mn>40<\/mml:mn><mml:mo>%<\/mml:mo><\/mml:mrow><\/mml:math><\/jats:alternatives><\/jats:inline-formula>longer trajectories in one of the environments. To understand the source of such discrepancies, we use Air Learning to artificially degrade high-end desktop performance to mimic what happens on a low-end embedded system. We then propose a mitigation technique that uses the hardware-in-the-loop to determine the latency distribution of running the policy on the target platform (onboard compute on aerial robot). A randomly sampled latency from the latency distribution is then added as an artificial delay within the training loop. Training the policy with artificial delays allows us to minimize the hardware gap (discrepancy in the flight time metric reduced from 37.73% to 0.5%). Thus, Air Learning with hardware-in-the-loop characterizes those differences and exposes how the onboard compute\u2019s choice affects the aerial robot\u2019s performance. We also conduct reliability studies to assess the effect of sensor failures on the learned policies. All put together, Air Learning enables a broad class of deep RL research on UAVs. 
The source code is available at:\u00a0<jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/github.com\/harvard-edge\/AirLearning\">https:\/\/github.com\/harvard-edge\/AirLearning<\/jats:ext-link>.<\/jats:p>","DOI":"10.1007\/s10994-021-06006-6","type":"journal-article","created":{"date-parts":[[2021,7,7]],"date-time":"2021-07-07T18:01:51Z","timestamp":1625680911000},"page":"2501-2540","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":42,"title":["Air Learning: a deep reinforcement learning gym for autonomous aerial robot visual navigation"],"prefix":"10.1007","volume":"110","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7148-5389","authenticated-orcid":false,"given":"Srivatsan","family":"Krishnan","sequence":"first","affiliation":[]},{"given":"Behzad","family":"Boroujerdian","sequence":"additional","affiliation":[]},{"given":"William","family":"Fu","sequence":"additional","affiliation":[]},{"given":"Aleksandra","family":"Faust","sequence":"additional","affiliation":[]},{"given":"Vijay Janapa","family":"Reddi","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2021,7,7]]},"reference":[{"key":"6006_CR1","unstructured":"Abadi, M.,\u00a0Agarwal, A.,\u00a0Barham, P.,\u00a0Brevdo, E.,\u00a0Chen, Z.,\u00a0Citro C., Corrado, G.\u00a0S.,\u00a0Davis, A.,\u00a0Dean, J.,\u00a0Devin, M.,\u00a0Ghemawat, S.,\u00a0Goodfellow, I.,\u00a0Harp, A.,\u00a0Irving, G.,\u00a0Isard, M.,\u00a0Jia, Y.,\u00a0Jozefowicz, R.,\u00a0Kaiser, L.,\u00a0Kudlur, M.,\u00a0Levenberg, J.,\u00a0Man\u00e9, D.,\u00a0Monga, R.,\u00a0Moore, S.,\u00a0Murray, D.,\u00a0Olah, C.,\u00a0Schuster, M.,\u00a0Shlens, J.,\u00a0Steiner, B.,\u00a0Sutskever, I.,\u00a0Talwar, K.,\u00a0Tucker, P.,\u00a0Vanhoucke, V.,\u00a0Vasudevan, V.,\u00a0Vi\u00e9gas, F.,\u00a0Vinyals, O.,\u00a0Warden, P.,\u00a0Wattenberg, M.,\u00a0Wicke, M.,\u00a0Yu, Y.,&\u00a0Zheng, X. 
(2015). TensorFlow: Large-scale machine learning on heterogeneous systems. Software available from www.tensorflow.org"},{"key":"6006_CR2","unstructured":"Adiprawita, W., Ahmad, A.\u00a0S., & Semibiring, J. (2008). Hardware in the loop simulator in UAV rapid development life cycle. CoRR, vol.\u00a0abs\/0804.3874."},{"key":"6006_CR3","unstructured":"Ahn, M., Zhu, H., Hartikainen, K., Ponte, H., Gupta, A., Levine, S., & Kumar, V. (2020). Robel: Robotics benchmarks for learning with low-cost robots. In Conference on robot learning (pp.\u00a01300\u20131313). PMLR."},{"key":"6006_CR4","first-page":"1475","volume":"5","author":"B Bakker","year":"2002","unstructured":"Bakker, B. (2002). Reinforcement learning with long short-term memory. Advances in Neural Information Processing Systems, 5, 1475\u20131482.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"6006_CR5","unstructured":"Bellemare, M.\u00a0G., Naddaf, Y., Veness, J., & Bowling, M. (2015). The arcade learning environment: An evaluation platform for general agents. In Proceedings of the 24th international conference on artificial intelligence, IJCAI\u201915 (pp.\u00a04148\u20134152). AAAI Press."},{"key":"6006_CR6","doi-asserted-by":"crossref","unstructured":"Bengio, Y., Louradour, J., Collobert, R., & Weston, J. (2009). Curriculum learning. In Proceedings of the 26th annual international conference on machine learning (pp.\u00a041\u201348). ACM.","DOI":"10.1145\/1553374.1553380"},{"key":"6006_CR7","doi-asserted-by":"crossref","unstructured":"Berger, K., Voorhies, R., & Matthies, L.\u00a0H. (2017). Depth from stereo polarization in specular scenes for urban robotics. In 2017 IEEE international conference on robotics and automation (ICRA) (pp.\u00a01966\u20131973). IEEE.","DOI":"10.1109\/ICRA.2017.7989227"},{"key":"6006_CR8","doi-asserted-by":"crossref","unstructured":"Boeing, A., & Br\u00e4unl, T. (2012). Leveraging multiple simulators for crossing the reality gap. 
In 2012 12th international conference on control automation robotics & vision (ICARCV) (pp.\u00a01113\u20131119). IEEE.","DOI":"10.1109\/ICARCV.2012.6485313"},{"key":"6006_CR9","unstructured":"Bojarski, M., Testa, D.\u00a0D., Dworakowski, D., Firner, B., Flepp, B., Goyal, P., Jackel, L.\u00a0D., Monfort, M., Muller, U., Zhang, J., Zhang, X., Zhao, J., & Zieba, K. (2016). End to end learning for self-driving cars. CoRR, vol.\u00a0abs\/1604.07316."},{"key":"6006_CR10","doi-asserted-by":"crossref","unstructured":"Boroujerdian, B., Genc, H., Krishnan, S., Cui, W., Faust, A., & Reddi, V. (2018). Mavbench: Micro aerial vehicle benchmarking. In 2018 51st annual IEEE\/ACM international symposium on microarchitecture (MICRO) (pp.\u00a0894\u2013907). IEEE.","DOI":"10.1109\/MICRO.2018.00077"},{"key":"6006_CR11","unstructured":"Brockman, G., Cheung, V., Pettersson, L., Schneider, J., Schulman, J., Tang, J., & Zaremba, W. (2016). Openai gym. CoRR, vol.\u00a0abs\/1606.01540."},{"issue":"2","key":"6006_CR12","doi-asserted-by":"publisher","first-page":"2007","DOI":"10.1109\/LRA.2019.2899918","volume":"4","author":"H-TL Chiang","year":"2019","unstructured":"Chiang, H.-T.L., Faust, A., Fiser, M., & Francis, A. (2019). Learning navigation behaviors end-to-end with autorl. IEEE Robotics and Automation Letters, 4(2), 2007\u20132014.","journal-title":"IEEE Robotics and Automation Letters"},{"key":"6006_CR13","doi-asserted-by":"publisher","first-page":"2007","DOI":"10.1109\/LRA.2019.2899918","volume":"4","author":"HL Chiang","year":"2019","unstructured":"Chiang, H. L., Faust, A., Fiser, M., & Francis, A. (2019). Learning navigation behaviors end-to-end with autorl. IEEE Robotics and Automation Letters, 4, 2007\u20132014.","journal-title":"IEEE Robotics and Automation Letters"},{"key":"6006_CR14","unstructured":"Chollet, F. (2015). Keras. https:\/\/github.com\/fchollet\/keras"},{"key":"6006_CR15","unstructured":"Crazyflie. (2018). Crazyflie\u00a02.0. 
https:\/\/www.bitcraze.io\/crazyflie-2\/"},{"key":"6006_CR16","unstructured":"DJI. (2018). DJI-mavic pro. https:\/\/www.dji.com\/mavic"},{"key":"6006_CR17","unstructured":"Dosovitskiy, A., Ros, G., Codevilla, F., Lopez, A., & Koltun, V. (2017). CARLA: An open urban driving simulator. In Proceedings of the 1st annual conference on robot learning (pp.\u00a01\u201316)."},{"key":"6006_CR18","unstructured":"Duisterhof, B.\u00a0P., Krishnan, S., Cruz, J.\u00a0J., Banbury, C.\u00a0R., Fu, W., Faust, A., de\u00a0Croon, G.\u00a0C. H.\u00a0E., & Reddi, V.\u00a0J. (2019). Learning to seek: Autonomous source seeking with deep reinforcement learning onboard a nano drone microcontroller. CoRR, vol.\u00a0abs\/1909.11236."},{"key":"6006_CR19","unstructured":"Epic, G. (2018). Ue4 materials. https:\/\/docs.unrealengine.com\/en-US\/Engine\/Basics\/AssetsAndPackages"},{"key":"6006_CR20","unstructured":"Epic, G. (2018). Wire frame. https:\/\/docs.unrealengine.com\/en-us\/Engine\/Rendering\/Materialss"},{"key":"6006_CR21","doi-asserted-by":"publisher","first-page":"381","DOI":"10.1016\/j.artint.2014.11.009","volume":"247","author":"A Faust","year":"2017","unstructured":"Faust, A., Palunko, I., Cruz, P., Fierro, R., & Tapia, L. (2017). Automated aerial suspended cargo delivery through reinforcement learning. Artificial Intelligence, 247, 381\u2013398.","journal-title":"Artificial Intelligence"},{"key":"6006_CR22","unstructured":"Games, E. (2018). Ue4 textures. https:\/\/docs.unrealengine.com\/en-us\/Engine\/Content\/Types\/Textures"},{"key":"6006_CR23","unstructured":"Games, E. (2018). Wire frame. https:\/\/docs.unrealengine.com\/en-us\/Engine\/UI\/LevelEditor\/Viewports\/ViewModes"},{"key":"6006_CR24","doi-asserted-by":"crossref","unstructured":"Gandhi, D., Pinto, L., & Gupta, A. (2017). Learning to fly by crashing. 
CoRR, vol.\u00a0abs\/1704.05588.","DOI":"10.1109\/IROS.2017.8206247"},{"issue":"2","key":"6006_CR25","doi-asserted-by":"publisher","first-page":"661","DOI":"10.1109\/LRA.2015.2509024","volume":"1","author":"A Giusti","year":"2016","unstructured":"Giusti, A., Guzzi, J., Ciresan, D. C., He, F.-L., Rodr\u00edguez, J. P., Fontana, F., et al. (2016). A machine learning approach to visual perception of forest trails for mobile robots. IEEE Robotics and Automation Letters, 1(2), 661\u2013667.","journal-title":"IEEE Robotics and Automation Letters"},{"key":"6006_CR26","doi-asserted-by":"publisher","first-page":"58","DOI":"10.1016\/j.trd.2017.02.017","volume":"61","author":"A Goodchild","year":"2018","unstructured":"Goodchild, A., & Toy, J. (2018). Delivery by drone: An evaluation of unmanned aerial vehicle technology in reducing CO2 emissions in the delivery service industry. Transportation Research Part D: Transport and Environment, 61, 58\u201367.","journal-title":"Transportation Research Part D: Transport and Environment"},{"key":"6006_CR27","doi-asserted-by":"crossref","unstructured":"Gu, S., Holly, E., Lillicrap, T.\u00a0P., & Levine, S. (2016). Deep reinforcement learning for robotic manipulation. CoRR, vol.\u00a0abs\/1610.00633.","DOI":"10.1109\/ICRA.2017.7989385"},{"key":"6006_CR28","doi-asserted-by":"crossref","unstructured":"Ha, S., Kim, J., & Yamane, K. (2018). Automated deep reinforcement learning environment for hardware of a modular legged robot. In 2018 15th international conference on ubiquitous robots (UR) (pp.\u00a0348\u2013354). IEEE.","DOI":"10.1109\/URAI.2018.8442201"},{"key":"6006_CR29","unstructured":"Haarnoja, T., Zhou, A., Hartikainen, K., Tucker, G., Ha, S., Tan, J., Kumar, V., Zhu, H., Gupta, A., Abbeel, P., & Levine, S. (2018). Soft actor-critic algorithms and applications. 
arXiv preprint arXiv:1812.05905"},{"key":"6006_CR30","unstructured":"Hill, A., Raffin, A., Ernestus, M., Gleave, A., Traore, R., Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., & Wu, Y. (2018). Stable baselines. https:\/\/github.com\/hill-a\/stable-baselines"},{"key":"6006_CR31","unstructured":"Hummingbird, A. (2018). Asctec hummingbird. http:\/\/www.asctec.de\/en\/uav-uas-drones-rpas-roav\/asctec-hummingbird\/"},{"issue":"4","key":"6006_CR32","doi-asserted-by":"publisher","first-page":"2096","DOI":"10.1109\/LRA.2017.2720851","volume":"2","author":"J Hwangbo","year":"2017","unstructured":"Hwangbo, J., Sa, I., Siegwart, R., & Hutter, M. (2017). Control of a quadrotor with reinforcement learning. IEEE Robotics and Automation Letters, 2(4), 2096\u20132103.","journal-title":"IEEE Robotics and Automation Letters"},{"key":"6006_CR33","unstructured":"Intel. (2018). Intel aero ready to fly drone. https:\/\/www.intel.com\/content\/www\/us\/en\/products\/drones\/aero-ready-to-fly.html"},{"key":"6006_CR34","doi-asserted-by":"crossref","unstructured":"Judah, K., Fern, A.\u00a0P., Tadepalli, P., Goetschalckx, R. (2014). Imitation learning with demonstrations and shaping rewards. In Twenty-eighth AAAI conference on artificial intelligence.","DOI":"10.1609\/aaai.v28i1.9024"},{"key":"6006_CR35","unstructured":"Kalashnikov, D., Irpan, A., Pastor, P., Ibarz, J., Herzog, A., Jang, E., Quillen, D., Holly, E., Kalakrishnan, M., Vanhoucke, V., & Levine, S. (2018). Qt-opt: Scalable deep reinforcement learning for vision-based robotic manipulation. arXiv preprint arXiv:1806.10293"},{"issue":"1","key":"6006_CR36","doi-asserted-by":"publisher","first-page":"2","DOI":"10.1080\/21693277.2016.1195304","volume":"4","author":"Y Khosiawan","year":"2016","unstructured":"Khosiawan, Y., & Nielsen, I. (2016). A system of uav application in indoor environment. 
Production & Manufacturing Research, 4(1), 2\u201322.","journal-title":"Production & Manufacturing Research"},{"key":"6006_CR37","unstructured":"Kjell, K. (2018). Airgym. http:\/\/github.com\/Kjell-K\/AirGym"},{"key":"6006_CR38","doi-asserted-by":"crossref","unstructured":"Koch, W., Mancuso, R., West, R., & Bestavros, A. (2018). Reinforcement learning for uav attitude control.","DOI":"10.1145\/3301273"},{"key":"6006_CR39","first-page":"2149","volume":"3","author":"N Koenig","year":"2004","unstructured":"Koenig, N., & Howard, A. (2004). Design and use paradigms for gazebo, an open-source multi-robot simulator. IEEE\/RSJ International Conference on Intelligent Robots and Systems, 3, 2149\u20132154.","journal-title":"IEEE\/RSJ International Conference on Intelligent Robots and Systems"},{"key":"6006_CR40","doi-asserted-by":"crossref","unstructured":"Koos, S., Mouret, J.-B., & Doncieux, S. (2010). Crossing the reality gap in evolutionary robotics by promoting transferable controllers. In Proceedings of the 12th annual conference on genetic and evolutionary computation (pp.\u00a0119\u2013126). ACM.","DOI":"10.1145\/1830483.1830505"},{"key":"6006_CR41","unstructured":"Kretchmar, R. M. (2000). A synthesis of reinforcement learning and robust control theory. Colorado State University Fort Collins."},{"issue":"1","key":"6006_CR42","doi-asserted-by":"publisher","first-page":"38","DOI":"10.1109\/LCA.2020.2981022","volume":"19","author":"S Krishnan","year":"2020","unstructured":"Krishnan, S., Wan, Z., Bhardwaj, K., Whatmough, P., Faust, A., Wei, G.-Y., et al. (2020). The sky is not the limit: A visual performance model for cyber-physical co-design in autonomous machines. IEEE Computer Architecture Letters, 19(1), 38\u201342.","journal-title":"IEEE Computer Architecture Letters"},{"key":"6006_CR43","doi-asserted-by":"crossref","unstructured":"Kumar, K.\u00a0R., Sastry, V., Sekhar, O.\u00a0C., Mohanta, D., Rajesh, D., & Varma, M.\u00a0P.\u00a0C. (2016). 
Design and fabrication of coulomb counter for estimation of soc of battery. In 2016 IEEE international conference on power electronics, drives and energy systems (PEDES) (pp.\u00a01\u20136). IEEE.","DOI":"10.1109\/PEDES.2016.7914473"},{"key":"6006_CR44","doi-asserted-by":"crossref","unstructured":"Kundu, T., & Saha, I. (2018). Charging station placement for indoor robotic applications. In 2018 IEEE international conference on robotics and automation (ICRA) (pp.\u00a03029\u20133036). IEEE.","DOI":"10.1109\/ICRA.2018.8461006"},{"key":"6006_CR45","unstructured":"Lai, P.-J., & Fuh, C.-S. (2015). Transparent object detection using regions with convolutional neural network. In IPPR conference on computer vision, graphics, and image processing (pp.\u00a01\u20138)."},{"key":"6006_CR46","unstructured":"Li, X., Li, L., Gao, J., He, X., Chen, J., Deng, L., & He, J. (2015). Recurrent reinforcement learning: A hybrid approach. CoRR, vol.\u00a0abs\/1509.03044."},{"key":"6006_CR47","unstructured":"Liu, S., Watterson, M., Tang, S., & Kumar, V. (2016). High speed navigation for quadrotors with limited onboard sensing. In 2016 IEEE international conference on robotics and automation (ICRA) (pp.\u00a01484\u20131491). IEEE."},{"key":"6006_CR48","unstructured":"Locobot. (2018). An open source low cost robot. http:\/\/www.locobot.org\/."},{"key":"6006_CR49","unstructured":"Mahmood, A.\u00a0R., Korenkevych, D., Komer, B.\u00a0J., & Bergstra, J. (2018). Setting up a reinforcement learning task with a real-world robot. CoRR, vol.\u00a0abs\/1803.07067."},{"key":"6006_CR50","unstructured":"Mahmood, A.\u00a0R., Korenkevych, D., Vasan, G., Ma, W., & Bergstra, J. (2018). Benchmarking reinforcement learning algorithms on real-world robots. CoRR, vol.\u00a0abs\/1809.07731."},{"key":"6006_CR51","unstructured":"Menard, M., & Wagstaff, B. (2015). Game development with unity. 
Nelson Education."},{"key":"6006_CR52","unstructured":"Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., & Riedmiller, M. (2013). Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602"},{"key":"6006_CR53","unstructured":"Murali, A., Chen, T., Alwala, K.\u00a0V., Gandhi, D., Pinto, L., Gupta, S., & Gupta, A. (2019). Pyrobot: An open-source robotics framework for research and benchmarking. arXiv preprint arXiv:1906.08236"},{"key":"6006_CR54","unstructured":"NVIDIA-AI-IOT. (2015). NVIDIA-AI-IOT\/redtail. https:\/\/github.com\/NVIDIA-AI-IOT\/redtail\/wiki\/Skypad-TBS-Discovery-Setup"},{"key":"6006_CR55","unstructured":"NVIDIA. (2019). NVIDIA Xavier. https:\/\/developer.nvidia.com\/embedded\/buy\/jetson-agx-xavier-devkit"},{"key":"6006_CR56","unstructured":"OpenAI. (2018). Openai five. https:\/\/blog.openai.com\/openai-five\/"},{"key":"6006_CR57","doi-asserted-by":"crossref","unstructured":"Palacin, J., Palleja, T., Valgan\u00f3n, I., Pernia, R., & Roca, J. (2005). Measuring coverage performances of a floor cleaning mobile robot using a vision system. In Proceedings of the 2005 IEEE international conference on robotics and automation (pp.\u00a04236\u20134241). IEEE.","DOI":"10.1109\/ROBOT.2005.1570771"},{"key":"6006_CR58","unstructured":"Parrot. (2019). Parrot bebop-2. https:\/\/www.parrot.com\/us\/drones\/parrot-bebop-2-fpv?ref=#parrot-bebop-2-fpv-details"},{"key":"6006_CR59","doi-asserted-by":"crossref","unstructured":"Peng, K., Feng, L., Hsieh, Y., Yang, T., Hsiung, S., Tsai, Y., & Kuo, C. (2017). Unmanned aerial vehicle for infrastructure inspection with image processing for quantification of measurement and formation of facade map. In 2017 international conference on applied system innovation (ICASI) (pp.\u00a01969\u20131972). IEEE.","DOI":"10.1109\/ICASI.2017.7988578"},{"key":"6006_CR60","unstructured":"Plappert, M. (2016). Keras-rl. 
https:\/\/github.com\/keras-rl\/keras-rl"},{"key":"6006_CR61","doi-asserted-by":"crossref","unstructured":"Quillen, D., Jang, E., Nachum, O., Finn, C., Ibarz, J., & Levine, S. (2018). Deep reinforcement learning for vision-based robotic grasping: A simulated comparative evaluation of off-policy methods. CoRR, vol.\u00a0abs\/1802.10264.","DOI":"10.1109\/ICRA.2018.8461039"},{"key":"6006_CR62","doi-asserted-by":"crossref","unstructured":"Riedmiller, M. (2012). 10 steps and some tricks to set up neural reinforcement controllers. In Neural networks: Tricks of the trade (pp.\u00a0735\u2013757). Springer.","DOI":"10.1007\/978-3-642-35289-8_39"},{"key":"6006_CR63","doi-asserted-by":"crossref","unstructured":"Sadeghi, F., & Levine, S. (2016). (cad)$$2$$rl: Real single-image flight without a single real image. CoRR, vol.\u00a0abs\/1611.04201.","DOI":"10.15607\/RSS.2017.XIII.034"},{"key":"6006_CR64","unstructured":"Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal policy optimization algorithms. CoRR, vol.\u00a0abs\/1707.06347."},{"key":"6006_CR65","doi-asserted-by":"crossref","unstructured":"Shah, S., Dey, D., Lovett, C., & Kapoor, A. (2017). AirSim: High-fidelity visual and physical simulation for autonomous vehicles. CoRR, vol.\u00a0abs\/1705.05065.","DOI":"10.1007\/978-3-319-67361-5_40"},{"key":"6006_CR66","doi-asserted-by":"crossref","unstructured":"Su, P.-H., Vandyke, D., Gasic, M., Mrksic, N., Wen, T.-H., & Young, S. (2015). Reward shaping with recurrent neural networks for speeding up on-line policy learning in spoken dialogue systems. arXiv preprint arXiv:1508.03391","DOI":"10.18653\/v1\/W15-4655"},{"key":"6006_CR67","doi-asserted-by":"crossref","unstructured":"Tobin, J., Fong, R., Ray, A., Schneider, J., Zaremba, W., & Abbeel, P. (2017). Domain randomization for transferring deep neural networks from simulation to the real world. 
CoRR, vol.\u00a0abs\/1703.06907.","DOI":"10.1109\/IROS.2017.8202133"},{"key":"6006_CR68","doi-asserted-by":"crossref","unstructured":"Todorov, E., Erez, T., & Tassa, Y. (2012). Mujoco: A physics engine for model-based control. In 2012 IEEE\/RSJ international conference on intelligent robots and systems (pp.\u00a05026\u20135033).","DOI":"10.1109\/IROS.2012.6386109"},{"key":"6006_CR69","doi-asserted-by":"publisher","first-page":"79","DOI":"10.3389\/frobt.2018.00079","volume":"5","author":"JB Travnik","year":"2018","unstructured":"Travnik, J. B., Mathewson, K. W., Sutton, R. S., & Pilarski, P. M. (2018). Reactive reinforcement learning in asynchronous environments. Frontiers in Robotics and AI, 5, 79.","journal-title":"Frontiers in Robotics and AI"},{"key":"6006_CR70","unstructured":"Tseng, C., Chau, C., Elbassioni, K.\u00a0M., & Khonji, M. (2017). Flight tour planning with recharging optimization for battery-operated autonomous drones. CoRR, vol.\u00a0abs\/1703.10049."},{"key":"6006_CR71","unstructured":"Valcasara, N. (2015). Unreal engine game development blueprints. Packt Publishing Ltd."},{"key":"6006_CR72","doi-asserted-by":"crossref","unstructured":"Waharte, S., & Trigoni, N. (2010). Supporting search and rescue operations with uavs. In 2010 international conference on emerging security technologies (pp.\u00a0142\u2013147). IEEE.","DOI":"10.1109\/EST.2010.31"},{"key":"6006_CR73","doi-asserted-by":"crossref","unstructured":"Wu, B., Chen, W., Fan, Y., Zhang, Y., Hou, J., Liu, J., Huang, J., Liu, W., & Zhang, T. (2019). Tencent ml-images: A large-scale multi-label image database for visual representation learning. CoRR, vol.\u00a0abs\/1901.01703.","DOI":"10.1109\/ACCESS.2019.2956775"},{"key":"6006_CR74","doi-asserted-by":"crossref","unstructured":"Yahya, A., Li, A., Kalakrishnan, M., Chebotar, Y., & Levine, S. (2016). Collective robot reinforcement learning with distributed asynchronous guided policy search. 
CoRR, vol.\u00a0abs\/1610.00673.","DOI":"10.1109\/IROS.2017.8202141"},{"key":"6006_CR75","unstructured":"Zeiler, M.\u00a0D., & Fergus, R. (2013). Visualizing and understanding convolutional networks. CoRR, vol.\u00a0abs\/1311.2901."},{"key":"6006_CR76","doi-asserted-by":"crossref","unstructured":"Zoph, B., Vasudevan, V., Shlens, J., & Le, Q.\u00a0V. (2017). Learning transferable architectures for scalable image recognition. CoRR, vol.\u00a0abs\/1707.07012.","DOI":"10.1109\/CVPR.2018.00907"}],"container-title":["Machine Learning"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10994-021-06006-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10994-021-06006-6\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10994-021-06006-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,3]],"date-time":"2023-01-03T07:00:46Z","timestamp":1672729246000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10994-021-06006-6"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,7,7]]},"references-count":76,"journal-issue":{"issue":"9","published-print":{"date-parts":[[2021,9]]}},"alternative-id":["6006"],"URL":"https:\/\/doi.org\/10.1007\/s10994-021-06006-6","relation":{},"ISSN":["0885-6125","1573-0565"],"issn-type":[{"value":"0885-6125","type":"print"},{"value":"1573-0565","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,7,7]]},"assertion":[{"value":"16 March 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"2 January 
2021","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"21 May 2021","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"7 July 2021","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}