{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,5]],"date-time":"2026-06-05T17:20:16Z","timestamp":1780680016930,"version":"3.54.1"},"reference-count":40,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2025,2,3]],"date-time":"2025-02-03T00:00:00Z","timestamp":1738540800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Robot. AI"],"abstract":"<jats:p>Traditional search and rescue methods in wilderness areas can be time-consuming and have limited coverage. Drones offer a faster and more flexible solution, but optimizing their search paths is crucial for effective operations. This paper proposes a novel algorithm using deep reinforcement learning to create efficient search paths for drones in wilderness environments. Our approach leverages <jats:italic>a priori<\/jats:italic> data about the search area and the missing person in the form of a probability distribution map. This allows the policy to learn optimal flight paths that maximize the probability of finding the missing person quickly. 
Experimental results show that our method achieves a significant improvement in search times compared to traditional coverage planning and search planning algorithms by over <jats:inline-formula><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\" id=\"m1\"><mml:mrow><mml:mn>160<\/mml:mn><mml:mi>%<\/mml:mi><\/mml:mrow><\/mml:math><\/jats:inline-formula>, a difference that can mean life or death in real-world search operations. Additionally, unlike previous work, our approach incorporates a continuous action space enabled by cubature, allowing for more nuanced flight patterns.<\/jats:p>","DOI":"10.3389\/frobt.2024.1527095","type":"journal-article","created":{"date-parts":[[2025,2,3]],"date-time":"2025-02-03T09:03:06Z","timestamp":1738573386000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":10,"title":["Deep reinforcement learning for time-critical wilderness search and rescue using drones"],"prefix":"10.3389","volume":"11","author":[{"given":"Jan-Hendrik","family":"Ewers","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"David","family":"Anderson","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Douglas","family":"Thomson","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1965","published-online":{"date-parts":[[2025,2,3]]},"reference":[{"key":"B1","unstructured":"Deep reinforcement learning at the edge of the statistical precipice\n          \n          \n            \n              Agarwal\n              R.\n            \n            \n              Schwarzer\n              M.\n            \n            \n              Castro\n              P. S.\n            \n            \n              Courville\n              A.\n            \n            \n              Bellemare\n              M. 
G.\n            \n          \n          \n          2022"},{"key":"B2","article-title":"Flying to the rescue: Scottish mountain teams are turning to drones","author":"Carrell","year":"2022","journal-title":"Guard"},{"key":"B3","doi-asserted-by":"publisher","first-page":"97","DOI":"10.1007\/BF01553881","article-title":"Constrained delaunay triangulations","volume":"4","author":"Chew","year":"1987","journal-title":"Algorithmica"},{"key":"B4","doi-asserted-by":"publisher","first-page":"1312","DOI":"10.1109\/TMC.2020.2966989","article-title":"Autonomous uav trajectory for localizing ground objects: a reinforcement learning approach","volume":"20","author":"Ebrahimi","year":"2021","journal-title":"IEEE Trans. Mob. Comput."},{"key":"B5","first-page":"1466","article-title":"GIS data driven probability map generation for search and rescue using agents","volume-title":"IFAC world congress 2023","author":"Ewers","year":""},{"key":"B6","doi-asserted-by":"publisher","first-page":"e167","DOI":"10.1002\/adc2.167","article-title":"Optimal path planning using psychological profiling in drone-assisted missing person search","volume":"5","author":"Ewers","year":"","journal-title":"Adv. 
Control Appl."},{"key":"B7","doi-asserted-by":"publisher","first-page":"7825","DOI":"10.1109\/iros58592.2024.10801978","article-title":"Enhancing reinforcement learning in sensor fusion: a comparative analysis of cubature and sampling-based integration methods for rover search planning","author":"Ewers","year":"2024","journal-title":"arXiv"},{"key":"B8","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1512.08562","article-title":"Taming the noise in reinforcement learning via soft updates","author":"Fox","year":"2017","journal-title":"arXiv"},{"key":"B9","first-page":"3864","article-title":"Full quaternion based attitude control for a quadrotor","author":"Fresk","year":"2013"},{"key":"B10","doi-asserted-by":"publisher","first-page":"1258","DOI":"10.1016\/j.robot.2013.09.004","article-title":"A survey on coverage path planning for robotics","volume":"61","author":"Galceran","year":"2013","journal-title":"Robotics Aut. Syst."},{"key":"B11","doi-asserted-by":"publisher","first-page":"4613","DOI":"10.1007\/s10115-023-01924-4","article-title":"A robust adaptive linear regression method for severe noise","volume":"65","author":"Guo","year":"2023","journal-title":"Knowl. Inf. Syst."},{"key":"B12","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1801.01290","article-title":"Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor","author":"Haarnoja","year":"2018","journal-title":"arXiv"},{"key":"B13","article-title":"Soft actor-critic algorithms and applications","author":"Haarnoja","year":"2019"},{"key":"B14","doi-asserted-by":"publisher","first-page":"5873","DOI":"10.1038\/s41598-022-09502-4","article-title":"An agent-based model reveals lost person behavior based on data from wilderness search and rescue","volume":"12","author":"Hashimoto","year":"2022","journal-title":"Sci. 
Rep."},{"key":"B15","doi-asserted-by":"publisher","first-page":"982","DOI":"10.1038\/s41586-023-06419-4","article-title":"Champion-level drone racing using deep reinforcement learning","volume":"620","author":"Kaufmann","year":"2023","journal-title":"Nature"},{"key":"B16","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1412.6980","article-title":"Adam: a method for stochastic optimization","author":"Kingma","year":"2017","journal-title":"arXiv"},{"key":"B17","article-title":"Sweep width estimation for ground search and rescue","author":"Koester","year":"2004"},{"key":"B18","doi-asserted-by":"crossref","DOI":"10.2514\/6.2010-3360","article-title":"Information-rich path planning with general constraints using rapidly-exploring random trees","volume-title":"AIAA Infotech@Aerospace 2010","author":"Levine","year":"2010"},{"key":"B19","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2110.08825","article-title":"Localization with sampling-argmax","author":"Li","year":"2021","journal-title":"arXiv"},{"key":"B20","doi-asserted-by":"publisher","first-page":"709","DOI":"10.1109\/IROS.2009.5354455","article-title":"UAV intelligent path planning for wilderness search and rescue","author":"Lin","year":"2009","journal-title":"2009 IEEE\/RSJ Int. Conf. Intelligent Robots Syst."},{"key":"B21","doi-asserted-by":"publisher","first-page":"2532","DOI":"10.1109\/TCYB.2014.2309898","article-title":"Hierarchical heuristic search using a Gaussian mixture model for UAV coverage planning","volume":"44","author":"Lin","year":"2014","journal-title":"IEEE Trans. 
Cybern."},{"key":"B22","doi-asserted-by":"publisher","first-page":"529","DOI":"10.1038\/nature14236","article-title":"Human-level control through deep reinforcement learning","volume":"518","author":"Mnih","year":"2015","journal-title":"Nature"},{"key":"B23","doi-asserted-by":"publisher","first-page":"36","DOI":"10.4236\/jilsa.2023.151003","article-title":"A comparison of PPO, TD3 and SAC reinforcement algorithms for quadruped walking gait generation","volume":"15","author":"Mock","year":"2023","journal-title":"J. Intelligent Learn. Syst. Appl."},{"key":"B24","doi-asserted-by":"publisher","first-page":"53","DOI":"10.1016\/j.ejor.2022.06.019","article-title":"Ant colony optimization for path planning in search and rescue operations","volume":"305","author":"Morin","year":"2023","journal-title":"Eur. J. Operational Res."},{"key":"B25","doi-asserted-by":"publisher","first-page":"102","DOI":"10.1109\/SSRR50563.2020.9292613","article-title":"Wilderness search and rescue missions using deep reinforcement learning","author":"Peake","year":"2020","journal-title":"2020 IEEE Int. Symposium Saf. Secur. Rescue Robotics (SSRR)"},{"key":"B26","doi-asserted-by":"publisher","DOI":"10.5281\/zenodo.11077334","article-title":"Stable-Baselines3: reliable reinforcement learning implementations","author":"Raffin","year":"2021","journal-title":"J. Mach. Learn. 
Res"},{"key":"B27","doi-asserted-by":"crossref","first-page":"649","DOI":"10.1007\/978-3-319-28872-7_37","article-title":"Polynomial trajectory planning for aggressive quadrotor flight in dense indoor environments","volume-title":"Robotics research","author":"Richter","year":"2016"},{"key":"B28","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1707.06347","article-title":"Proximal policy optimization algorithms","author":"Schulman","year":"2017","journal-title":"arXiv"},{"key":"B29","doi-asserted-by":"publisher","first-page":"80","DOI":"10.3390\/ijgi10020080","article-title":"Lost person search area prediction based on regression and transfer learning models","volume":"10","author":"\u0160eri\u0107","year":"2021","journal-title":"ISPRS Int. J. Geo-Information"},{"key":"B30","doi-asserted-by":"crossref","DOI":"10.2514\/6.2020-0987","article-title":"A probabilistic path planning framework for optimizing feasible trajectories of autonomous search vehicles leveraging the projected-search reduced hessian method","volume-title":"AIAA scitech 2020 forum","author":"Subramanian","year":"2020"},{"key":"B31","article-title":"A novel methodology for autonomous planetary exploration using multi-robot teams","author":"Swinton","year":"2024"},{"key":"B32","doi-asserted-by":"crossref","first-page":"733","DOI":"10.1007\/978-3-031-22695-3_51","article-title":"Autonomous UAV navigation in wilderness search-and-rescue operations using deep reinforcement learning","volume-title":"AI 2022: advances in artificial intelligence","author":"Talha","year":"2022"},{"key":"B33","article-title":"Underactuated robotics: algorithms for walking","volume-title":"Running, swimming, flying, and manipulation (course notes for MIT 6.832)","author":"Tedrake","year":"2023"},{"key":"B34","first-page":"142","article-title":"Supporting search and rescue operations with 
UAVs","author":"Waharte","year":"2010"},{"key":"B35","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2303.03811","article-title":"Environment transformer and policy optimization for model-based offline reinforcement learning","author":"Wang","year":"2023","journal-title":"arXiv"},{"key":"B36","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3306346.3322940","article-title":"Learning to fly: computational controller design for hybrid UAVs with reinforcement learning","volume":"38","author":"Xu","year":"2019","journal-title":"ACM Trans. Graph."},{"key":"B37","doi-asserted-by":"publisher","first-page":"822","DOI":"10.1109\/TCST.2017.2781655","article-title":"Optimal UAV route planning for coverage search of stationary target in river","volume":"27","author":"Yao","year":"2019","journal-title":"IEEE Trans. Control Syst. Technol."},{"key":"B38","doi-asserted-by":"publisher","first-page":"2387","DOI":"10.1109\/TMECH.2023.3286102","article-title":"Catch planner: catching high-speed targets in the flight","volume":"28","author":"Yu","year":"2023","journal-title":"IEEE\/ASME Trans. Mechatronics"},{"key":"B39","doi-asserted-by":"publisher","first-page":"739","DOI":"10.2514\/1.I010961","article-title":"Cooperative planning for an unmanned combat aerial vehicle fleet using reinforcement learning","volume":"18","author":"Yuksek","year":"2021","journal-title":"J. Aerosp. Inf. Syst."},{"key":"B40","doi-asserted-by":"publisher","first-page":"48","DOI":"10.1109\/ICAA64256.2024.00016","article-title":"Shrinking pomcp: a framework for real-time UAV search and rescue","author":"Zhang","year":"2024","journal-title":"2024 Int. Conf. Assur. Aut. 
(ICAA)"}],"container-title":["Frontiers in Robotics and AI"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2024.1527095\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,2,3]],"date-time":"2025-02-03T09:03:13Z","timestamp":1738573393000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2024.1527095\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,2,3]]},"references-count":40,"alternative-id":["10.3389\/frobt.2024.1527095"],"URL":"https:\/\/doi.org\/10.3389\/frobt.2024.1527095","relation":{},"ISSN":["2296-9144"],"issn-type":[{"value":"2296-9144","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,2,3]]},"article-number":"1527095"}}