{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,2]],"date-time":"2025-08-02T17:23:17Z","timestamp":1754155397417,"version":"3.41.2"},"reference-count":40,"publisher":"Emerald","issue":"2","license":[{"start":{"date-parts":[[2021,9,24]],"date-time":"2021-09-24T00:00:00Z","timestamp":1632441600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/www.emerald.com\/insight\/site-policies"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["IR"],"published-print":{"date-parts":[[2022,2,11]]},"abstract":"<jats:sec>\n<jats:title content-type=\"abstract-subheading\">Purpose<\/jats:title>\n<jats:p>This paper aims to realize fully distributed multi-UAV collision detection and avoidance based on deep reinforcement learning (DRL), to address the low sample efficiency of DRL and speed up training, and to improve the applicability and reliability of the DRL-based approach in multi-UAV control problems.<\/jats:p>\n<\/jats:sec>\n<jats:sec>\n<jats:title content-type=\"abstract-subheading\">Design\/methodology\/approach<\/jats:title>\n<jats:p>In this paper, a fully distributed collision detection and avoidance approach for multiple UAVs based on DRL is proposed. A method that integrates human experience into policy training via a human experience-based adviser is introduced. The authors also propose a hybrid control method that combines the learning-based policy with traditional model-based control. Extensive experiments, including simulations, real flights and comparative experiments, are conducted to evaluate the performance of the approach.<\/jats:p>\n<\/jats:sec>\n<jats:sec>\n<jats:title content-type=\"abstract-subheading\">Findings<\/jats:title>\n<jats:p>A fully distributed multi-UAV collision detection and avoidance method based on DRL is realized. 
The reward curve shows that training is significantly accelerated when human experience is integrated and that the mean episode reward is higher than with the pure DRL method. The experimental results show that the DRL method with human experience integration achieves a significant improvement over the pure DRL method for multi-UAV collision detection and avoidance. Moreover, the safer flight enabled by the hybrid control method is also validated.<\/jats:p>\n<\/jats:sec>\n<jats:sec>\n<jats:title content-type=\"abstract-subheading\">Originality\/value<\/jats:title>\n<jats:p>The fully distributed architecture is suitable for large-scale unmanned aerial vehicle (UAV) swarms and real applications. The DRL method with human experience integration significantly accelerates training compared to the pure DRL method. The proposed hybrid control strategy compensates for the limitations of two-dimensional light detection and ranging (LiDAR) and other practical issues in applications.<\/jats:p>\n<\/jats:sec>","DOI":"10.1108\/ir-06-2021-0116","type":"journal-article","created":{"date-parts":[[2021,9,22]],"date-time":"2021-09-22T12:32:03Z","timestamp":1632313923000},"page":"256-270","source":"Crossref","is-referenced-by-count":6,"title":["Integrating human experience in deep reinforcement learning for multi-UAV collision detection and 
avoidance"],"prefix":"10.1108","volume":"49","author":[{"given":"Guanzheng","family":"Wang","sequence":"first","affiliation":[]},{"given":"Yinbo","family":"Xu","sequence":"additional","affiliation":[]},{"given":"Zhihong","family":"Liu","sequence":"additional","affiliation":[]},{"given":"Xin","family":"Xu","sequence":"additional","affiliation":[]},{"given":"Xiangke","family":"Wang","sequence":"additional","affiliation":[]},{"given":"Jiarun","family":"Yan","sequence":"additional","affiliation":[]}],"member":"140","published-online":{"date-parts":[[2021,9,24]]},"reference":[{"key":"key2022102108063366400_ref001","doi-asserted-by":"crossref","first-page":"399","DOI":"10.1109\/IROS.2008.4651020","article-title":"Learning robot motion control with demonstration and advice-operators","volume-title":"2008 IEEE\/RSJ International Conference on Intelligent Robots and Systems","year":"2008"},{"article-title":"Transfer learning for reinforcement learning on a physical robot","volume-title":"Ninth International Conference on Autonomous Agents and Multiagent Systems-Adaptive Learning Agents Workshop (AAMAS-ALA)","year":"2010","key":"key2022102108063366400_ref002"},{"issue":"1","key":"key2022102108063366400_ref003","doi-asserted-by":"crossref","first-page":"77","DOI":"10.1007\/s10846-018-0839-z","article-title":"An interactive framework for learning continuous actions policies based on corrective feedback","volume":"95","year":"2019","journal-title":"Journal of Intelligent & Robotic Systems"},{"year":"2017","key":"key2022102108063366400_ref004","article-title":"Pre-training neural networks with human demonstrations for deep reinforcement learning"},{"key":"key2022102108063366400_ref005","doi-asserted-by":"crossref","first-page":"1140","DOI":"10.1109\/ICIEA.2011.5975758","article-title":"A hybrid approach of virtual force and a\u2217 search algorithm for UAV path re-planning","volume-title":"2011 6th IEEE Conference on Industrial Electronics and 
Applications","year":"2011"},{"issue":"2","key":"key2022102108063366400_ref006","doi-asserted-by":"crossref","first-page":"161","DOI":"10.1016\/S1672-6529(08)60113-4","article-title":"Max-min adaptive ant colony optimization approach to multi-UAVs coordinated trajectory replanning in dynamic and uncertain environments","volume":"6","year":"2009","journal-title":"Journal of Bionic Engineering"},{"issue":"7","key":"key2022102108063366400_ref007","first-page":"0278364920916531","article-title":"Distributed multi-robot collision avoidance via deep reinforcement learning for navigation in complex scenarios","volume":"39","year":"2020","journal-title":"The International Journal of Robotics Research:"},{"issue":"2","key":"key2022102108063366400_ref008","first-page":"661","article-title":"A machine learning approach to visual perception of Forest trails for mobile robots","volume":"1","year":"2015","journal-title":"IEEE Robotics and Automation Letters"},{"issue":"4","key":"key2022102108063366400_ref009","doi-asserted-by":"crossref","first-page":"553","DOI":"10.1007\/s12555-009-0407-1","article-title":"Proportional navigation-based collision avoidance for UAVs","volume":"7","year":"2009","journal-title":"International Journal of Control, Automation and Systems"},{"key":"key2022102108063366400_ref010","first-page":"788","article-title":"A route planning's method for unmanned aerial vehicles based on improved A-Star algorithm","volume":"29","year":"2008","journal-title":"Acta Armamentarii"},{"key":"key2022102108063366400_ref011","doi-asserted-by":"crossref","first-page":"9","DOI":"10.1145\/1597735.1597738","article-title":"Interactively shaping agents via human reinforcement: the TAMER framework","volume-title":"Proceedings of the Fifth International Conference on Knowledge Capture","year":"2009"},{"key":"key2022102108063366400_ref012","first-page":"2149","article-title":"Design and use paradigms for gazebo, an open-source multi-robot simulator","volume-title":"2004 IEEE\/RSJ 
International Conference on Intelligent Robots and Systems (IROS)(IEEE Cat. No. 04CH37566)","year":"2004"},{"key":"key2022102108063366400_ref013","first-page":"1","article-title":"Mission-oriented miniature fixed-wing UAV swarms: a multilayered and distributed architecture","volume-title":"IEEE Transactions on Systems, Man, and Cybernetics: Systems","year":"2020"},{"issue":"2","key":"key2022102108063366400_ref014","doi-asserted-by":"crossref","first-page":"656","DOI":"10.1109\/LRA.2017.2651371","article-title":"Deep-learned collision avoidance policy for distributed multiagent navigation","volume":"2","year":"2017","journal-title":"IEEE Robotics and Automation Letters"},{"key":"key2022102108063366400_ref015","doi-asserted-by":"crossref","first-page":"6235","DOI":"10.1109\/ICRA.2015.7140074","article-title":"PX4: a node-based multithreaded open source robotics framework for deeply embedded platforms","volume-title":"2015 IEEE International Conference on Robotics and Automation (ICRA)","year":"2015"},{"issue":"4","key":"key2022102108063366400_ref016","first-page":"1480","article-title":"Decentralized multi-UAV flight autonomy for moving convoys search and track","volume":"25","year":"2016","journal-title":"IEEE Transactions on Control Systems Technology"},{"issue":"3\/4","key":"key2022102108063366400_ref017","first-page":"165","article-title":"A cooperative perception system for multiple UAVs: application to automatic detection of forest fires","volume":"23","year":"2006","journal-title":"Journal of Field Robotics"},{"issue":"7540","key":"key2022102108063366400_ref018","doi-asserted-by":"crossref","first-page":"529","DOI":"10.1038\/nature14236","article-title":"Human-level control through deep reinforcement learning","volume":"518","year":"2015","journal-title":"Nature"},{"issue":"3","key":"key2022102108063366400_ref019","doi-asserted-by":"crossref","first-page":"164","DOI":"10.1002\/rob.20235","article-title":"Cooperative use of unmanned sea surface and micro aerial 
vehicles at hurricane wilma","volume":"25","year":"2008","journal-title":"Journal of Field Robotics"},{"key":"key2022102108063366400_ref020","doi-asserted-by":"crossref","first-page":"1398","DOI":"10.1109\/ICRA.2016.7487274","article-title":"Fast nonlinear model predictive control for unified trajectory optimization and tracking","volume-title":"2016 IEEE International Conference on Robotics and Automation (ICRA)","year":"2016"},{"key":"key2022102108063366400_ref021","doi-asserted-by":"crossref","first-page":"289","DOI":"10.1613\/jair.2447","article-title":"Optimal and approximate Q-value functions for decentralized POMDPs","volume":"32","year":"2008","journal-title":"Journal of Artificial Intelligence Research"},{"first-page":"353","article-title":"Interactive learning with corrective feedback for policies based on deep neural networks","year":"2018","key":"key2022102108063366400_ref022"},{"first-page":"1936","article-title":"Aircraft trajectory planning with collision avoidance using mixed integer linear programming","year":"2002","key":"key2022102108063366400_ref023"},{"key":"key2022102108063366400_ref024","doi-asserted-by":"crossref","first-page":"602","DOI":"10.1109\/CDC.2004.1428700","article-title":"An overview of emerging results in cooperative UAV control","volume-title":"2004 43rd IEEE Conference on Decision and Control (CDC) (IEEE Cat. 
No.04CH37601)","year":"2004"},{"article-title":"Proximal policy optimization algorithms","volume-title":"arXiv preprint arXiv:.06347","year":"2017","key":"key2022102108063366400_ref025"},{"issue":"9","key":"key2022102108063366400_ref026","doi-asserted-by":"crossref","first-page":"1084","DOI":"10.1016\/j.conengprac.2009.02.010","article-title":"Co-operative path planning of multiple UAVs using dubins paths with clothoid arcs","volume":"18","year":"2010","journal-title":"Control Engineering Practice"},{"volume-title":"Reinforcement Learning: An Introduction","year":"2018","key":"key2022102108063366400_ref027"},{"first-page":"1000","volume-title":"Reinforcement Learning with Human Teachers: Evidence of Feedback and Guidance with Implications for Learning Performance","year":"2006","key":"key2022102108063366400_ref028"},{"key":"key2022102108063366400_ref029","first-page":"1221","article-title":"Genetic algorithm based path planning for a mobile robot","volume-title":"2003 IEEE International Conference on Robotics and Automation (Cat. No. 
03CH37422)","year":"2003"},{"key":"key2022102108063366400_ref030","first-page":"3","article-title":"Reciprocal n-body collision avoidance","volume-title":"Robotics research","year":"2011"},{"key":"key2022102108063366400_ref031","doi-asserted-by":"crossref","first-page":"1928","DOI":"10.1109\/ROBOT.2008.4543489","article-title":"Reciprocal velocity obstacles for real-time multi-agent navigation","volume-title":"2008 IEEE International Conference on Robotics and Automation","year":"2008"},{"issue":"2\/4","key":"key2022102108063366400_ref032","first-page":"189","article-title":"Massively multi-robot simulation in stage","volume":"2","year":"2008","journal-title":"Swarm Intelligence"},{"key":"key2022102108063366400_ref033","first-page":"153","article-title":"Behavior based control of a mobile robot in unknown environments using fuzzy logic","volume":"13","year":"1996","journal-title":"Control Theory Applications"},{"article-title":"XTDrone: a customizable multi-rotor UAVs simulation platform","volume-title":"arXiv preprint arXiv:.09700","year":"2020","key":"key2022102108063366400_ref034"},{"key":"key2022102108063366400_ref035","first-page":"5739","article-title":"Towards sample efficient reinforcement learning","year":"2018","journal-title":"IJCAI"},{"key":"key2022102108063366400_ref036","doi-asserted-by":"crossref","first-page":"152","DOI":"10.1016\/j.paerosci.2015.01.001","article-title":"Sense and avoid technologies with applications to unmanned aircraft systems: review and prospects","volume":"74","year":"2015","journal-title":"Progress in Aerospace Sciences"},{"key":"key2022102108063366400_ref037","first-page":"471","article-title":"A chaotic genetic algorithm (CGA) for path planning of UAVs (unmanned air vehicles)","volume":"24","year":"2006","journal-title":"Journal-Northwestern Polytechnical University"},{"article-title":"Leveraging human guidance for deep reinforcement learning tasks","volume-title":"arXiv preprint 
arXiv:.09906","year":"2019","key":"key2022102108063366400_ref038"},{"key":"key2022102108063366400_ref039","doi-asserted-by":"crossref","first-page":"122757","DOI":"10.1109\/ACCESS.2020.3007496","article-title":"A novel real-time penetration path planning algorithm for stealth UAV in 3D complex dynamic environment","volume":"8","year":"2020","journal-title":"IEEE Access"},{"key":"key2022102108063366400_ref040","first-page":"152","article-title":"Path planning for nonholonomic mobile robots using artificial potential field method","volume":"27","year":"2010","journal-title":"Control Theory"}],"container-title":["Industrial Robot: the international journal of robotics research and application"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.emerald.com\/insight\/content\/doi\/10.1108\/IR-06-2021-0116\/full\/xml","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.emerald.com\/insight\/content\/doi\/10.1108\/IR-06-2021-0116\/full\/html","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,24]],"date-time":"2025-07-24T21:39:22Z","timestamp":1753393162000},"score":1,"resource":{"primary":{"URL":"http:\/\/www.emerald.com\/ir\/article\/49\/2\/256-270\/186238"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,9,24]]},"references-count":40,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2021,9,24]]},"published-print":{"date-parts":[[2022,2,11]]}},"alternative-id":["10.1108\/IR-06-2021-0116"],"URL":"https:\/\/doi.org\/10.1108\/ir-06-2021-0116","relation":{},"ISSN":["0143-991X","0143-991X"],"issn-type":[{"type":"print","value":"0143-991X"},{"type":"electronic","value":"0143-991X"}],"subject":[],"published":{"date-parts":[[2021,9,24]]}}}