{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T03:50:02Z","timestamp":1760241002534,"version":"build-2065373602"},"reference-count":58,"publisher":"MDPI AG","issue":"21","license":[{"start":{"date-parts":[[2019,11,4]],"date-time":"2019-11-04T00:00:00Z","timestamp":1572825600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Deep- and reinforcement-learning techniques have increasingly required large sets of real data to achieve stable convergence and generalization, in the context of image-recognition, object-detection or motion-control strategies. On this subject, the research community lacks robust approaches to overcome unavailable real-world extensive data by means of realistic synthetic-information and domain-adaptation techniques. In this work, synthetic-learning strategies have been used for the vision-based autonomous following of a noncooperative multirotor. The complete maneuver was learned with synthetic images and high-dimensional low-level continuous robot states, with deep- and reinforcement-learning techniques for object detection and motion control, respectively. A novel motion-control strategy for object following is introduced where the camera gimbal movement is coupled with the multirotor motion during the multirotor following. Results confirm that our present framework can be used to deploy a vision-based task in real flight using synthetic data. 
It was extensively validated in both simulated and real-flight scenarios, providing proper results (following a multirotor up to 1.3 m\/s in simulation and 0.3 m\/s in real flights).<\/jats:p>","DOI":"10.3390\/s19214794","type":"journal-article","created":{"date-parts":[[2019,11,4]],"date-time":"2019-11-04T10:49:07Z","timestamp":1572864547000},"page":"4794","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["Vision-Based Multirotor Following Using Synthetic Learning Techniques"],"prefix":"10.3390","volume":"19","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3257-4602","authenticated-orcid":false,"given":"Alejandro","family":"Rodriguez-Ramos","sequence":"first","affiliation":[{"name":"Computer Vision and Aerial Robotics group, Centre for Automation and Robotics, Universidad Polit\u00e9cnica de Madrid (UPM-CSIC), Calle Jose Gutierrez Abascal 2, 28006 Madrid, Spain"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Adrian","family":"Alvarez-Fernandez","sequence":"additional","affiliation":[{"name":"Artificial Intelligence group, University of Groningen, 9712 Groningen, The Netherlands"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hriday","family":"Bavle","sequence":"additional","affiliation":[{"name":"Computer Vision and Aerial Robotics group, Centre for Automation and Robotics, Universidad Polit\u00e9cnica de Madrid (UPM-CSIC), Calle Jose Gutierrez Abascal 2, 28006 Madrid, Spain"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9894-2009","authenticated-orcid":false,"given":"Pascual","family":"Campoy","sequence":"additional","affiliation":[{"name":"Computer Vision and Aerial Robotics group, Centre for Automation and Robotics, Universidad Polit\u00e9cnica de Madrid (UPM-CSIC), Calle Jose Gutierrez Abascal 2, 28006 Madrid, 
Spain"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8576-1930","authenticated-orcid":false,"given":"Jonathan P.","family":"How","sequence":"additional","affiliation":[{"name":"Aerospace Controls Laboratory, Massachusetts Institute of Technology (MIT), 77 Mass. Ave., Cambridge, MA 02139, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2019,11,4]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Tan, J., Zhang, T., Coumans, E., Iscen, A., Bai, Y., Hafner, D., Bohez, S., and Vanhoucke, V. (2018). Sim-to-real: Learning agile locomotion for quadruped robots. arXiv.","DOI":"10.15607\/RSS.2018.XIV.010"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Rodriguez-Ramos, A., Sampedro, C., Bavle, H., Moreno, I.G., and Campoy, P. (2018, January 1\u20135). A Deep Reinforcement Learning Technique for Vision-Based Autonomous Multirotor Landing on a Moving Platform. Proceedings of the 2018 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain.","DOI":"10.1109\/IROS.2018.8594472"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"2096","DOI":"10.1109\/LRA.2017.2720851","article-title":"Control of a quadrotor with reinforcement learning","volume":"2","author":"Hwangbo","year":"2017","journal-title":"IEEE Robot. Autom. Lett."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"24","DOI":"10.1016\/j.cviu.2014.12.006","article-title":"On rendering synthetic images for training an object detector","volume":"137","author":"Rozantsev","year":"2015","journal-title":"Comput. Vis. Image Underst."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Tobin, J., Fong, R., Ray, A., Schneider, J., Zaremba, W., and Abbeel, P. (2017, January 24\u201328). Domain randomization for transferring deep neural networks from simulation to the real world. 
Proceedings of the 2017 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada.","DOI":"10.1109\/IROS.2017.8202133"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Chen, Y., Li, W., Sakaridis, C., Dai, D., and Van Gool, L. (2018, January 18\u201322). Domain adaptive faster r-cnn for object detection in the wild. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake, UT, USA.","DOI":"10.1109\/CVPR.2018.00352"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"436","DOI":"10.1038\/nature14539","article-title":"Deep learning","volume":"521","author":"LeCun","year":"2015","journal-title":"Nature"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"529","DOI":"10.1038\/nature14236","article-title":"Human-level control through deep reinforcement learning","volume":"518","author":"Mnih","year":"2015","journal-title":"Nature"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"1148","DOI":"10.1109\/LRA.2019.2894216","article-title":"Vr-goggles for robots: Real-to-sim domain adaptation for visual control","volume":"4","author":"Zhang","year":"2019","journal-title":"IEEE Robot. Autom. Lett."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"879","DOI":"10.1109\/TPAMI.2016.2564408","article-title":"Detecting flying objects using a single moving camera","volume":"39","author":"Rozantsev","year":"2016","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Li, J., Ye, D.H., Chung, T., Kolsch, M., Wachs, J., and Bouman, C. (2016, January 9\u201314). Multi-target detection and tracking from a single camera in Unmanned Aerial Vehicles (UAVs). Proceedings of the 2016 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Daejeon, Korea.","DOI":"10.1109\/IROS.2016.7759733"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Opromolla, R., Fasano, G., and Accardo, D. 
(2018). A Vision-Based Approach to UAV Detection and Tracking in Cooperative Applications. Sensors, 18.","DOI":"10.3390\/s18103391"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"1049","DOI":"10.1016\/j.robot.2011.08.006","article-title":"Cooperative target pursuit by multiple UAVs in an adversarial environment","volume":"59","author":"Zengin","year":"2011","journal-title":"Robot. Autonom. Syst."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Yamasaki, T., and Balakrishnan, S. (July, January 30). Sliding mode based pure pursuit guidance for UAV rendezvous and chase with a cooperative aircraft. Proceedings of the 2010 American Control Conference, Baltimore, MD, USA.","DOI":"10.1109\/ACC.2010.5530997"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Alexopoulos, A., Schmidt, T., and Badreddin, E. (October, January 28). Cooperative pursue in pursuit-evasion games with unmanned aerial vehicles. Proceedings of the 2015 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Hamburg, Germany.","DOI":"10.1109\/IROS.2015.7354022"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"617","DOI":"10.2514\/1.57598","article-title":"Vision-based cyclic pursuit for cooperative target tracking","volume":"36","author":"Ma","year":"2013","journal-title":"J. Guid. Control Dyn."},{"key":"ref_17","unstructured":"Yamasaki, T., Enomoto, K., Takano, H., Baba, Y., and Balakrishnan, S. (August, January 10). Advanced pure pursuit guidance via sliding mode approach for chase UAV. Proceedings of the AIAA Guidance, Navigation, and Control Conference, Chicago, IL, USA."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Pestana, J., Sanchez-Lopez, J., Saripalli, S., and Campoy, P. (2014, January 4\u20136). Computer vision based general object following for gps-denied multirotor unmanned vehicles. 
Proceedings of the 2014 American Control Conference, Portland, OR, USA.","DOI":"10.1109\/ACC.2014.6858831"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Teuliere, C., Eck, L., and Marchand, E. (2011, January 25\u201330). Chasing a moving target from a flying UAV. Proceedings of the 2011 IEEE\/RSJ International Conference on Intelligent Robots and Systems, San Francisco, CA, USA.","DOI":"10.1109\/IROS.2011.6048050"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"10","DOI":"10.1016\/j.conengprac.2013.09.006","article-title":"UAV guidance using a monocular-vision sensor for aerial target tracking","volume":"22","author":"Choi","year":"2014","journal-title":"Control Eng. Pract."},{"key":"ref_21","unstructured":"Li, R., Pang, M., Zhao, C., Zhou, G., and Fang, L. (July, January 26). Monocular long-term target following on uavs. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Las Vegas, NV, USA."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Mueller, M., Sharma, G., Smith, N., and Ghanem, B. (2016, January 9\u201314). Persistent aerial tracking system for uavs. Proceedings of the 2016 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Daejeon, Korea.","DOI":"10.1109\/IROS.2016.7759253"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"1409","DOI":"10.1109\/TPAMI.2011.239","article-title":"Tracking-learning-detection","volume":"34","author":"Kalal","year":"2011","journal-title":"Pattern Anal. Mach. Intell."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"2096","DOI":"10.1109\/TPAMI.2015.2509974","article-title":"Struck: Structured output tracking with kernels","volume":"38","author":"Hare","year":"2015","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Kassab, M.A., Maher, A., Elkazzaz, F., and Baochang, Z. (2019, January 8\u201312). 
UAV Target Tracking By Detection via Deep Neural Networks. Proceedings of the 2019 IEEE International Conference on Multimedia and Expo (ICME), Shanghai, China.","DOI":"10.1109\/ICME.2019.00032"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"495","DOI":"10.1007\/s11554-018-0780-1","article-title":"Realtime multi-aircraft tracking in aerial scene with deep orientation network","volume":"15","author":"Maher","year":"2018","journal-title":"J. Real-Time Image Process."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Yao, N., Anaya, E., Tao, Q., Cho, S., Zheng, H., and Zhang, F. (June, January 29). Monocular vision-based human following on miniature robotic blimp. Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore.","DOI":"10.1109\/ICRA.2017.7989369"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"N\u00e4geli, T., Oberholzer, S., Pl\u00fcss, S., Alonso-Mora, J., and Hilliges, O. (2018, January 4\u20137). Flycon: Real-Time Environment-Independent Multi-View Human Pose Estimation with Aerial Vehicles. Proceedings of the SIGGRAPH Asia 2018 Technical, Tokyo, Japan.","DOI":"10.1145\/3272127.3275022"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Rafi, F., Khan, S., Shafiq, K., and Shah, M. (2006, January 20\u201321). Autonomous target following by unmanned aerial vehicles. Proceedings of the Unmanned Systems Technology VIII. International Society for Optics and Photonics, San Jose, CA, USA.","DOI":"10.1117\/12.667356"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Qadir, A., Neubert, J., Semke, W., and Schultz, R. (2011). On-Board Visual Tracking with Unmanned Aircraft System (UAS), American Institute of Aeronautics and Astronautics.","DOI":"10.2514\/6.2011-1503"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Andrychowicz, M., Baker, B., Chociej, M., Jozefowicz, R., McGrew, B., Pachocki, J., Petron, A., Plappert, M., Powell, G., and Ray, A. 
(2018). Learning Dexterous in-Hand Manipulation. arXiv.","DOI":"10.1177\/0278364919887447"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"James, S., Wohlhart, P., Kalakrishnan, M., Kalashnikov, D., Irpan, A., Ibarz, J., Levine, S., Hadsell, R., and Bousmalis, K. (2018). Sim-to-Real via Sim-to-Sim: Data-efficient Robotic Grasping via Randomized-to-Canonical Adaptation Networks. arXiv.","DOI":"10.1109\/CVPR.2019.01291"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Sampedro, C., Rodriguez-Ramos, A., Gil, I., Mejias, L., and Campoy, P. (2018, January 1\u20135). Image-Based Visual Servoing Controller for Multirotor Aerial Robots Using Deep Reinforcement Learning. Proceedings of the 2018 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain.","DOI":"10.1109\/IROS.2018.8594249"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Kang, K., Belkhale, S., Kahn, G., Abbeel, P., and Levine, S. (2019). Generalization through simulation: Integrating simulated and real data into deep reinforcement learning for vision-based autonomous flight. arXiv.","DOI":"10.1109\/ICRA.2019.8793735"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Sadeghi, F., and Levine, S. (2016). CAD2RL: Real single-image flight without a single real image. arXiv.","DOI":"10.15607\/RSS.2017.XIII.034"},{"key":"ref_36","unstructured":"Wang, F., Zhou, B., Chen, K., Fan, T., Zhang, X., Li, J., Tian, H., and Pan, J. (2018). Intervention Aided Reinforcement Learning for Safe and Practical Policy Optimization in Navigation. arXiv."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"23805","DOI":"10.3390\/s150923805","article-title":"Vision-based detection and distance estimation of micro unmanned aerial vehicles","volume":"15","author":"Kalkan","year":"2015","journal-title":"Sensors"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Fu, C., Duan, R., Kircali, D., and Kayacan, E. (2016). 
Onboard robust visual tracking for UAVs using a reliable global-local object model. Sensors, 16.","DOI":"10.3390\/s16091406"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Hinterstoisser, S., Lepetit, V., Wohlhart, P., and Konolige, K. (2018, January 8\u201314). On pre-trained image features and synthetic images for deep learning. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-11009-3_42"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Peng, X., Sun, B., Ali, K., and Saenko, K. (2015, January 11\u201318). Learning deep object detectors from 3d models. Proceedings of the IEEE International Conference on Computer Vision, Las Condes, Chile.","DOI":"10.1109\/ICCV.2015.151"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Shrivastava, A., Pfister, T., Tuzel, O., Susskind, J., Wang, W., and Webb, R. (2017, January 21\u201326). Learning from simulated and unsupervised images through adversarial training. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.241"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Tremblay, J., Prakash, A., Acuna, D., Brophy, M., Jampani, V., Anil, C., To, T., Cameracci, E., Boochoon, S., and Birchfield, S. (2018, January 18\u201323). Training deep networks with synthetic data: Bridging the reality gap by domain randomization. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPRW.2018.00143"},{"key":"ref_43","unstructured":"Ren, S., He, K., Girshick, R., and Sun, J. (2015, January 7\u201312). Faster r-cnn: Towards real-time object detection with region proposal networks. Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada."},{"key":"ref_44","unstructured":"Dai, J., Li, Y., He, K., and Sun, J. (2016, January 5\u201310). 
R-fcn: Object detection via region-based fully convolutional networks. Proceedings of the Advances in Neural Information Processing Systems, Barcelona, Spain."},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"He, K., Gkioxari, G., Doll\u00e1r, P., and Girshick, R. (2017, January 22\u201329). Mask r-cnn. Proceedings of the IEEE International Conference on Computer Vision, Venezia, Italy.","DOI":"10.1109\/ICCV.2017.322"},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Alabachi, S., Sukthankar, G., and Sukthankar, R. (2019). Customizing Object Detectors for Indoor Robots. arXiv.","DOI":"10.1109\/ICRA.2019.8793551"},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Lin, T., Goyal, P., Girshick, R., He, K., and Doll\u00e1r, P. (2017, January 22\u201329). Focal loss for dense object detection. Proceedings of the IEEE International Conference on Computer Vision, Venezia, Italy.","DOI":"10.1109\/ICCV.2017.324"},{"key":"ref_48","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (July, January 26). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA."},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Danelljan, M., Bhat, G., Shahbaz Khan, F., and Felsberg, M. (2017, January 21\u201326). Eco: Efficient convolution operators for tracking. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.733"},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Sutton, R.S., and Barto, A.G. (1998). Reinforcement Learning: An Introduction, MIT Press.","DOI":"10.1109\/TNN.1998.712192"},{"key":"ref_51","unstructured":"Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., and Wierstra, D. (2015). Continuous control with deep reinforcement learning. 
arXiv."},{"key":"ref_52","unstructured":"Schulman, J., Levine, S., Abbeel, P., Jordan, M.I., and Moritz, P. (2014, January 21\u201326). Trust Region Policy Optimization. Proceedings of the 31st International Conference on Machine Learning, Beijing, China."},{"key":"ref_53","unstructured":"Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and Klimov, O. (2017). Proximal policy optimization algorithms. arXiv."},{"key":"ref_54","unstructured":"Andrychowicz, M., Wolski, F., Ray, A., Schneider, J., Fong, R., Welinder, P., McGrew, B., Tobin, J., Abbeel, O.P., and Zaremba, W. (2017, January 4\u20139). Hindsight experience replay. Proceedings of the Advances in Neural Information Processing Systems 2017, Long Beach, CA, USA."},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"Dorigo, M., and Colombetti, M. (1998). Robot Shaping: An Experiment in Behavior Engineering, MIT Press.","DOI":"10.7551\/mitpress\/5988.001.0001"},{"key":"ref_56","unstructured":"Quigley, M., Conley, K., Gerkey, B., Faust, J., Foote, T., Leibs, J., Wheeler, R., and Ng, A.Y. (2009, January 12\u201317). ROS: An open-source Robot Operating System. Proceedings of the ICRA Workshop on Open Source Software, Kobe, Japan."},{"key":"ref_57","doi-asserted-by":"crossref","unstructured":"Furrer, F., Burri, M., Achtelik, M., and Siegwart, R. (2016). Rotors\u2014A modular gazebo mav simulator framework. Robot Operating System (ROS), Springer.","DOI":"10.1007\/978-3-319-26054-9_23"},{"key":"ref_58","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Doll\u00e1r, P., and Zitnick, C.L. (2014, January 6\u201312). Microsoft coco: Common objects in context. 
Proceedings of the European Conference on Computer Vision, Zurich, Switzerland.","DOI":"10.1007\/978-3-319-10602-1_48"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/19\/21\/4794\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T13:31:46Z","timestamp":1760189506000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/19\/21\/4794"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,11,4]]},"references-count":58,"journal-issue":{"issue":"21","published-online":{"date-parts":[[2019,11]]}},"alternative-id":["s19214794"],"URL":"https:\/\/doi.org\/10.3390\/s19214794","relation":{},"ISSN":["1424-8220"],"issn-type":[{"type":"electronic","value":"1424-8220"}],"subject":[],"published":{"date-parts":[[2019,11,4]]}}}