{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,20]],"date-time":"2026-03-20T07:16:42Z","timestamp":1773991002034,"version":"3.50.1"},"reference-count":86,"publisher":"Wiley","issue":"11","license":[{"start":{"date-parts":[[2022,9,14]],"date-time":"2022-09-14T00:00:00Z","timestamp":1663113600000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100008982","name":"National Science Foundation of Sri Lanka","doi-asserted-by":"publisher","award":["DBI\u20101942280"],"award-info":[{"award-number":["DBI\u20101942280"]}],"id":[{"id":"10.13039\/501100008982","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["besjournals.onlinelibrary.wiley.com"],"crossmark-restriction":true},"short-container-title":["Methods Ecol Evol"],"published-print":{"date-parts":[[2022,11]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>\n                    <jats:list>\n                      <jats:list-item>\n                        <jats:p>Can machine learning help us make better decisions about a changing planet? In this paper, we illustrate and discuss the potential of a promising corner of machine learning known as deep reinforcement learning (RL) to help tackle the most challenging conservation decision problems. We provide a conceptual and technical introduction to deep RL as well as annotated code so that researchers can adopt, evaluate and extend these approaches.<\/jats:p>\n                      <\/jats:list-item>\n                      <jats:list-item>\n                        <jats:p>RL explicitly focuses on designing an agent who interacts with an environment that is dynamic and uncertain. Deep RL is the subfield of RL that incorporates deep neural networks into the agent. We train deep RL agents to solve sequential decision\u2010making problems in setting fisheries quotas and managing ecological tipping points.<\/jats:p>\n                      <\/jats:list-item>\n                      <jats:list-item>\n                        <jats:p>We show that a deep RL agent is able to learn a nearly optimal solution for the fisheries management problem. For the tipping point problem, we show that a deep RL agent can outperform a sensible rule\u2010of\u2010thumb strategy.<\/jats:p>\n                      <\/jats:list-item>\n                      <jats:list-item>\n                        <jats:p>Our results demonstrate that deep RL has the potential to solve challenging decision problems in conservation. While this potential may be compelling, the challenges involved in successfully deploying RL\u2010based management to realistic scenarios are formidable\u2014the required expertise and computational cost may place these applications beyond the reach of all but large, international technology firms. Ecologists must establish a better understanding of how these algorithms work and fail if we are to realize this potential and avoid the pitfalls such a transition would bring. We ultimately set forth a research framework based on well\u2010posed, public challenges so that ecologists and computer scientists can collaborate towards solving hard decision\u2010making problems in conservation.<\/jats:p>\n                      <\/jats:list-item>\n                    <\/jats:list>\n                  <\/jats:p>","DOI":"10.1111\/2041-210x.13954","type":"journal-article","created":{"date-parts":[[2022,9,15]],"date-time":"2022-09-15T01:05:10Z","timestamp":1663203910000},"page":"2649-2662","update-policy":"https:\/\/doi.org\/10.1002\/crossmark_policy","source":"Crossref","is-referenced-by-count":31,"title":["Deep reinforcement learning for conservation decisions"],"prefix":"10.1111","volume":"13","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2588-7843","authenticated-orcid":false,"given":"Marcus","family":"Lapeyrolerie","sequence":"first","affiliation":[{"name":"Department of Environmental Science, Policy, and Management University of California, Berkeley  Berkeley California USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Melissa S.","family":"Chapman","sequence":"additional","affiliation":[{"name":"Department of Environmental Science, Policy, and Management University of California, Berkeley  Berkeley California USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kari E. A.","family":"Norman","sequence":"additional","affiliation":[{"name":"Department of Environmental Science, Policy, and Management University of California, Berkeley  Berkeley California USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1642-628X","authenticated-orcid":false,"given":"Carl","family":"Boettiger","sequence":"additional","affiliation":[{"name":"Department of Environmental Science, Policy, and Management University of California, Berkeley  Berkeley California USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"311","published-online":{"date-parts":[[2022,9,14]]},"reference":[{"key":"e_1_2_10_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/MSP.2017.2743240"},{"key":"e_1_2_10_3_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.1611.06256"},{"key":"e_1_2_10_4_1","doi-asserted-by":"publisher","DOI":"10.1038\/nature11018"},{"key":"e_1_2_10_5_1","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0095693"},{"key":"e_1_2_10_6_1","article-title":"OpenAI Gym","author":"Brockman Greg","year":"2016","journal-title":"arXiv:1606.01540 [Cs]"},{"key":"e_1_2_10_7_1","doi-asserted-by":"publisher","DOI":"10.1126\/science.1203672"},{"key":"e_1_2_10_8_1","doi-asserted-by":"publisher","DOI":"10.1038\/538020a"},{"key":"e_1_2_10_9_1","doi-asserted-by":"publisher","DOI":"10.1111\/2041\u2010210X.13692"},{"key":"e_1_2_10_10_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.oneear.2021.05.011"},{"key":"e_1_2_10_11_1","doi-asserted-by":"publisher","DOI":"10.1086\/260090"},{"key":"e_1_2_10_12_1","volume-title":"Mathematical bioeconomics: The optimal management of renewable resources","author":"Clark C. W.","year":"1990"},{"key":"e_1_2_10_13_1","volume-title":"Mathematical bioeconomics: The mathematics of conservation","author":"Clark C. W.","year":"2010"},{"key":"e_1_2_10_14_1","doi-asserted-by":"publisher","DOI":"10.1002\/9781118506196"},{"key":"e_1_2_10_15_1","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.1520420113"},{"key":"e_1_2_10_16_1","doi-asserted-by":"publisher","DOI":"10.1126\/science.1219805"},{"key":"e_1_2_10_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_2_10_18_1","doi-asserted-by":"publisher","DOI":"10.1126\/science.1251817"},{"key":"e_1_2_10_19_1","doi-asserted-by":"publisher","DOI":"10.1126\/science.abc3189"},{"key":"e_1_2_10_20_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.1802.01561"},{"key":"e_1_2_10_21_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i17.17735"},{"key":"e_1_2_10_22_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.tree.2009.03.020"},{"key":"e_1_2_10_23_1","doi-asserted-by":"publisher","DOI":"10.1111\/j.1939\u20107445.2005.tb00147.x"},{"key":"e_1_2_10_24_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.beproc.2018.01.008"},{"key":"e_1_2_10_25_1","article-title":"Addressing function approximation error in actor\u2010critic methods","author":"Fujimoto S.","year":"2018","journal-title":"arXiv:1802.09477 [Cs, Stat]"},{"key":"e_1_2_10_26_1","doi-asserted-by":"publisher","DOI":"10.1111\/ele.12893"},{"key":"e_1_2_10_27_1","first-page":"1332","volume-title":"Proceedings of the 31st international conference on machine learning","author":"Grande R.","year":"2014"},{"key":"e_1_2_10_28_1","doi-asserted-by":"publisher","DOI":"10.1002\/9781444398557"},{"key":"e_1_2_10_29_1","article-title":"Q\u2010prop: Sample\u2010efficient policy gradient with an off\u2010policy critic","author":"Gu S.","year":"2017","journal-title":"arXiv:1611.02247 [Cs]"},{"key":"e_1_2_10_30_1","article-title":"Learning to walk in the real world with minimal human effort","author":"Ha S.","year":"2020","journal-title":"arXiv:2002.08550 [Cs]"},{"key":"e_1_2_10_31_1","article-title":"Soft actor\u2010critic: Off\u2010policy maximum entropy deep reinforcement learning with a stochastic actor","author":"Haarnoja T.","year":"2018","journal-title":"arXiv:1801.01290 [Cs, Stat]"},{"key":"e_1_2_10_32_1","article-title":"Inverse Reward Design","author":"Hadfield\u2010Menell D.","year":"2020","journal-title":"arXiv:1711.02827 [Cs]"},{"key":"e_1_2_10_33_1","article-title":"Learning latent dynamics for planning from pixels","author":"Hafner D.","year":"2019","journal-title":"arXiv:1811.04551 [Cs, Stat]"},{"key":"e_1_2_10_34_1","volume-title":"Encyclopedia of theoretical ecology","author":"Hastings A.","year":"2012"},{"key":"e_1_2_10_35_1","article-title":"Deep reinforcement learning that matters","author":"Henderson P.","year":"2019","journal-title":"arXiv:1709.06560 [Cs, Stat]"},{"key":"e_1_2_10_36_1","article-title":"Measuring the algorithmic efficiency of neural networks","author":"Hernandez D.","year":"2020","journal-title":"arXiv:2005.04305 [Cs, Stat]"},{"key":"e_1_2_10_37_1","doi-asserted-by":"publisher","DOI":"10.1016\/0893\u20106080(89)90020\u20108"},{"key":"e_1_2_10_38_1","unstructured":"Huang D. (2018).How Much Did AlphaGo Zero Cost?https:\/\/www.yuzeh.com\/data\/agz\u2010cost.html"},{"key":"e_1_2_10_39_1","article-title":"When to trust your model: Model\u2010based policy optimization","author":"Janner M.","year":"2019","journal-title":"arXiv:1906.08253 [Cs, Stat]"},{"key":"e_1_2_10_40_1","doi-asserted-by":"publisher","DOI":"10.1111\/j.1523\u20101739.2008.01124.x"},{"key":"e_1_2_10_41_1","doi-asserted-by":"publisher","DOI":"10.1111\/ele.13462"},{"key":"e_1_2_10_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3341216.3342218"},{"key":"e_1_2_10_43_1","article-title":"A survey of generalisation in deep reinforcement learning","author":"Kirk R.","year":"2022","journal-title":"arXiv:2111.09794 [Cs]"},{"key":"e_1_2_10_44_1","doi-asserted-by":"publisher","DOI":"10.1111\/j.1461\u20100248.2011.01702.x"},{"key":"e_1_2_10_45_1","doi-asserted-by":"publisher","DOI":"10.5281\/ZENODO.6886892"},{"issue":"4","key":"e_1_2_10_46_1","first-page":"421","article-title":"The strategy of model building in population biology","volume":"54","author":"Levins R.","year":"1966","journal-title":"American Scientist"},{"key":"e_1_2_10_47_1","article-title":"Continuous control with deep reinforcement learning","author":"Lillicrap T. P.","year":"2019","journal-title":"arXiv:1509.02971 [Cs, Stat]"},{"key":"e_1_2_10_48_1","doi-asserted-by":"publisher","DOI":"10.1111\/2041\u2010210X.12082"},{"key":"e_1_2_10_49_1","doi-asserted-by":"publisher","DOI":"10.1038\/d41586\u2010018\u201006870\u20108"},{"key":"e_1_2_10_50_1","doi-asserted-by":"publisher","DOI":"10.1038\/269471a0"},{"key":"e_1_2_10_51_1","doi-asserted-by":"publisher","DOI":"10.1111\/j.1461\u20100248.2004.00624.x"},{"key":"e_1_2_10_52_1","article-title":"Asynchronous methods for deep reinforcement learning","author":"Mnih V.","year":"2016","journal-title":"arXiv:1602.01783 [Cs]"},{"key":"e_1_2_10_53_1","doi-asserted-by":"publisher","DOI":"10.1038\/nature14236"},{"key":"e_1_2_10_54_1","doi-asserted-by":"publisher","DOI":"10.1038\/nature13946"},{"key":"e_1_2_10_55_1","doi-asserted-by":"publisher","DOI":"10.1126\/science.258.5086.1315"},{"key":"e_1_2_10_56_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.1808.00177"},{"key":"e_1_2_10_57_1","doi-asserted-by":"publisher","DOI":"10.2307\/2963479"},{"key":"e_1_2_10_58_1","article-title":"A multi\u2010agent reinforcement learning model of common\u2010Pool resource appropriation","author":"Perolat J.","year":"2017","journal-title":"arXiv:1707.06600 [Cs, q\u2010Bio]"},{"key":"e_1_2_10_59_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.tree.2011.04.007"},{"key":"e_1_2_10_60_1","article-title":"Temporal difference models: Model\u2010free deep RL for model\u2010based control","author":"Pong V.","year":"2020","journal-title":"arXiv:1802.09081 [Cs]"},{"key":"e_1_2_10_61_1","doi-asserted-by":"publisher","DOI":"10.1126\/sciadv.aap7885"},{"key":"e_1_2_10_62_1","doi-asserted-by":"publisher","DOI":"10.1111\/faf.12104"},{"key":"e_1_2_10_63_1","unstructured":"RAM Legacy Stock Assessment Database. (2020).RAM legacy stock assessment database V4.491.https:\/\/doi.org\/10.5281\/zenodo.3676088"},{"key":"e_1_2_10_64_1","doi-asserted-by":"publisher","DOI":"10.1016\/0095\u20100696(79)90014\u20107"},{"key":"e_1_2_10_65_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF02464432"},{"key":"e_1_2_10_66_1","doi-asserted-by":"publisher","DOI":"10.1146\/annurev\u2010ecolsys\u2010112414\u2010054242"},{"key":"e_1_2_10_67_1","article-title":"Trust region policy optimization","author":"Schulman J.","year":"2017","journal-title":"arXiv:1502.05477 [Cs]"},{"key":"e_1_2_10_68_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.1707.06347"},{"key":"e_1_2_10_69_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.cosust.2021.01.009"},{"key":"e_1_2_10_70_1","doi-asserted-by":"publisher","DOI":"10.1038\/nature16961"},{"key":"e_1_2_10_71_1","doi-asserted-by":"publisher","DOI":"10.1126\/science.aar6404"},{"key":"e_1_2_10_72_1","doi-asserted-by":"publisher","DOI":"10.1038\/nature24270"},{"key":"e_1_2_10_73_1","doi-asserted-by":"publisher","DOI":"10.1038\/s41893\u2010022\u201000851\u20106"},{"key":"e_1_2_10_74_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ecolmodel.2015.06.031"},{"key":"e_1_2_10_75_1","volume-title":"Reinforcement learning: An introduction","author":"Sutton R. S.","year":"2018"},{"key":"e_1_2_10_76_1","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pcbi.1007783"},{"key":"e_1_2_10_77_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.anbehav.2016.12.005"},{"key":"e_1_2_10_78_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10588\u2010012\u20109137\u20107"},{"key":"e_1_2_10_79_1","doi-asserted-by":"publisher","DOI":"10.1038\/s41586\u2010019\u20101724\u2010z"},{"key":"e_1_2_10_80_1","doi-asserted-by":"publisher","DOI":"10.1146\/annurev.es.09.110178.001105"},{"key":"e_1_2_10_81_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ecocom.2020.100815"},{"key":"e_1_2_10_82_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.1707.06203"},{"key":"e_1_2_10_83_1","doi-asserted-by":"publisher","DOI":"10.1038\/nature04366"},{"key":"e_1_2_10_84_1","doi-asserted-by":"publisher","DOI":"10.1126\/science.1132294"},{"key":"e_1_2_10_85_1","doi-asserted-by":"publisher","DOI":"10.1126\/science.1173146"},{"key":"e_1_2_10_86_1","article-title":"Robust reinforcement learning under minimax regret for green security","author":"Xu L.","year":"2021","journal-title":"arXiv"},{"key":"e_1_2_10_87_1","doi-asserted-by":"publisher","DOI":"10.1021\/acscentsci.7b00492"}],"container-title":["Methods in Ecology and Evolution"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/pdf\/10.1111\/2041-210X.13954","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/full-xml\/10.1111\/2041-210X.13954","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/besjournals.onlinelibrary.wiley.com\/doi\/pdf\/10.1111\/2041-210X.13954","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,5]],"date-time":"2025-08-05T07:29:50Z","timestamp":1754378990000},"score":1,"resource":{"primary":{"URL":"https:\/\/besjournals.onlinelibrary.wiley.com\/doi\/10.1111\/2041-210X.13954"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,9,14]]},"references-count":86,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2022,11]]}},"alternative-id":["10.1111\/2041-210X.13954"],"URL":"https:\/\/doi.org\/10.1111\/2041-210x.13954","archive":["Portico"],"relation":{"has-review":[{"id-type":"doi","id":"10.1111\/2041-210X.13954\/v3\/decision1","asserted-by":"object"},{"id-type":"doi","id":"10.1111\/2041-210X.13954\/v2\/review3","asserted-by":"object"},{"id-type":"doi","id":"10.1111\/2041-210X.13954\/v2\/review2","asserted-by":"object"},{"id-type":"doi","id":"10.1111\/2041-210X.13954\/v2\/review1","asserted-by":"object"},{"id-type":"doi","id":"10.1111\/2041-210X.13954\/v3\/response1","asserted-by":"object"},{"id-type":"doi","id":"10.1111\/2041-210X.13954\/v2\/decision1","asserted-by":"object"},{"id-type":"doi","id":"10.1111\/2041-210X.13954\/v2\/response1","asserted-by":"object"},{"id-type":"doi","id":"10.1111\/2041-210X.13954\/v1\/decision1","asserted-by":"object"}]},"ISSN":["2041-210X","2041-210X"],"issn-type":[{"value":"2041-210X","type":"print"},{"value":"2041-210X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,9,14]]},"assertion":[{"value":"2022-09-14","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-06-14","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-09-14","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}