{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,27]],"date-time":"2026-02-27T06:25:12Z","timestamp":1772173512504,"version":"3.50.1"},"update-to":[{"DOI":"10.1371\/journal.pcbi.1013454","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2025,9,19]],"date-time":"2025-09-19T00:00:00Z","timestamp":1758240000000}}],"reference-count":56,"publisher":"Public Library of Science (PLoS)","issue":"9","license":[{"start":{"date-parts":[[2025,9,12]],"date-time":"2025-09-12T00:00:00Z","timestamp":1757635200000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100021086","name":"NTT Research","doi-asserted-by":"publisher","award":["A47994"],"award-info":[{"award-number":["A47994"]}],"id":[{"id":"10.13039\/100021086","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["RF1NS128865"],"award-info":[{"award-number":["RF1NS128865"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["R01DC017311"],"award-info":[{"award-number":["R01DC017311"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["www.ploscompbiol.org"],"crossmark-restriction":false},"short-container-title":["PLoS Comput Biol"],"abstract":"<jats:p>Dogs and laboratory mice are commonly trained to perform complex tasks by guiding them through a curriculum of simpler tasks (\u2018shaping\u2019). What are the principles behind effective shaping strategies? Here, we propose a teacher-student framework for shaping behavior, where an autonomous teacher agent decides its student\u2019s task based on the student\u2019s transcript of successes and failures on previously assigned tasks. Using algorithms for Monte Carlo planning under uncertainty, we show that near-optimal shaping algorithms achieve a careful balance between reinforcement and extinction. Near-optimal algorithms track learning rate to adaptively alternate between simpler and harder tasks. Based on this intuition, we derive an adaptive shaping heuristic with minimal parameters, which we show is near-optimal on a sequence learning task and robustly trains deep reinforcement learning agents on navigation tasks that involve sparse, delayed rewards. Extensions to continuous curricula are explored. Our work provides a starting point towards a general computational framework for shaping behavior that applies to both animals and artificial agents.<\/jats:p>","DOI":"10.1371\/journal.pcbi.1013454","type":"journal-article","created":{"date-parts":[[2025,9,12]],"date-time":"2025-09-12T18:02:41Z","timestamp":1757700161000},"page":"e1013454","update-policy":"https:\/\/doi.org\/10.1371\/journal.pcbi.corrections_policy","source":"Crossref","is-referenced-by-count":0,"title":["Adaptive algorithms for shaping behavior"],"prefix":"10.1371","volume":"21","author":[{"given":"William L.","family":"Tong","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2443-4252","authenticated-orcid":true,"given":"Venkatesh N.","family":"Murthy","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1276-9613","authenticated-orcid":true,"given":"Gautam","family":"Reddy","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"340","published-online":{"date-parts":[[2025,9,12]]},"reference":[{"key":"pcbi.1013454.ref001","unstructured":"Cooper JO, Heron TE, Heward WL. Applied behavior analysis. Pearson UK; 2020."},{"key":"pcbi.1013454.ref002","unstructured":"Lindsay SR. Handbook of applied dog behavior and training, adaptation and learning. John Wiley & Sons; 2013."},{"key":"pcbi.1013454.ref003","unstructured":"Skinner B. The behavior of organisms: an experimental analysis. BF Skinner Foundation; 2019."},{"key":"pcbi.1013454.ref004","unstructured":"Pryor K. Don\u2019t shoot the dog: The art of teaching and training. Simon & Schuster; 2019."},{"issue":"3","key":"pcbi.1013454.ref005","doi-asserted-by":"crossref","first-page":"559","DOI":"10.1901\/jaba.2011.44-559","article-title":"A comparison of procedural variations in teaching behavior chains: manual guidance, trainer completion, and no completion of untrained steps","volume":"44","author":"SL Bancroft","year":"2011","journal-title":"J Appl Behav Anal"},{"key":"pcbi.1013454.ref006","doi-asserted-by":"crossref","first-page":"36","DOI":"10.3389\/fnbeh.2018.00036","article-title":"An accumulation-of-evidence task using visual pulses for mice navigating in virtual reality","volume":"12","author":"L Pinto","year":"2018","journal-title":"Front Behav Neurosci"},{"issue":"2","key":"pcbi.1013454.ref007","article-title":"Procedures for behavioral experiments in head-fixed mice","volume":"9","author":"ZV Guo","year":"2014","journal-title":"PLoS One"},{"issue":"9","key":"pcbi.1013454.ref008","doi-asserted-by":"crossref","first-page":"1225","DOI":"10.1038\/nn.3775","article-title":"An olfactory cocktail party: figure-ground segregation of odorants in rodents","volume":"17","author":"D Rokni","year":"2014","journal-title":"Nat Neurosci"},{"key":"pcbi.1013454.ref009","doi-asserted-by":"crossref","DOI":"10.7554\/eLife.63711","article-title":"Standardized and reproducible measurement of decision-making in mice","volume":"10","author":"International Brain Laboratory","year":"2021","journal-title":"Elife"},{"issue":"2","key":"pcbi.1013454.ref010","doi-asserted-by":"crossref","first-page":"330","DOI":"10.1016\/j.neuron.2011.07.010","article-title":"A cortical substrate for memory-guided orienting in the rat","volume":"72","author":"JC Erlich","year":"2011","journal-title":"Neuron"},{"issue":"12","key":"pcbi.1013454.ref011","doi-asserted-by":"crossref","DOI":"10.1371\/journal.pone.0083171","article-title":"A fully automated high-throughput training system for rodents","volume":"8","author":"R Poddar","year":"2013","journal-title":"PLoS One"},{"key":"pcbi.1013454.ref012","unstructured":"Kepple DR, Engelken R, Rajan K. Curriculum learning as a tool to uncover learning principles in the brain. In: 2022."},{"key":"pcbi.1013454.ref013","doi-asserted-by":"crossref","first-page":"102555","DOI":"10.1016\/j.conb.2022.102555","article-title":"Learning, fast and slow","volume":"75","author":"M Meister","year":"2022","journal-title":"Curr Opin Neurobiol"},{"key":"pcbi.1013454.ref014","unstructured":"Selfridge OG, Sutton RS, Barto AG. Ijcai. 1985. p. 670\u20132."},{"key":"pcbi.1013454.ref015","doi-asserted-by":"crossref","unstructured":"Gullapalli V, Barto AG. Shaping as a method for accelerating reinforcement learning. In: Proceedings of the 1992 IEEE International Symposium on Intelligent Control. 1992. p. 554\u20139.","DOI":"10.1109\/ISIC.1992.225046"},{"issue":"1","key":"pcbi.1013454.ref016","doi-asserted-by":"crossref","first-page":"71","DOI":"10.1016\/0010-0277(93)90058-4","article-title":"Learning and development in neural networks: the importance of starting small","volume":"48","author":"JL Elman","year":"1993","journal-title":"Cognition"},{"key":"pcbi.1013454.ref017","unstructured":"Randl\u00f8v J, Alstr\u00f8m P. Learning to drive a bicycle using reinforcement learning and shaping. In: Proceedings of the International Conference on Machine Learning (ICML), 1998. 463\u201371."},{"issue":"3","key":"pcbi.1013454.ref018","doi-asserted-by":"crossref","first-page":"380","DOI":"10.1016\/j.cognition.2008.11.014","article-title":"Flexible shaping: how learning in small steps helps","volume":"110","author":"KA Krueger","year":"2009","journal-title":"Cognition"},{"key":"pcbi.1013454.ref019","doi-asserted-by":"crossref","unstructured":"Dorigo M, Colombetti M. Robot shaping: an experiment in behavior engineering. MIT Press; 1998.","DOI":"10.7551\/mitpress\/5988.001.0001"},{"key":"pcbi.1013454.ref020","doi-asserted-by":"crossref","unstructured":"Bengio Y, Louradour J, Collobert R, Weston J. Curriculum learning. In: Proceedings of the 26th annual international conference on machine learning, 2009. p. 41\u20138.","DOI":"10.1145\/1553374.1553380"},{"key":"pcbi.1013454.ref021","doi-asserted-by":"crossref","unstructured":"Portelas R, Colas C, Weng L, Hofmann K, Oudeyer PY. Automatic curriculum learning for deep rl: a short survey. arXiv preprint 2020. https:\/\/doi.org\/arXiv:200304664","DOI":"10.24963\/ijcai.2020\/671"},{"key":"pcbi.1013454.ref022","unstructured":"Florensa C, Held D, Wulfmeier M, Zhang M, Abbeel P. In: Conference on robot learning. 2017. p. 482\u201395."},{"key":"pcbi.1013454.ref023","doi-asserted-by":"crossref","unstructured":"Ivanovic B, Harrison J, Sharma A, Chen M, Pavone M. Barc: Backward reachability curriculum for robotic reinforcement learning. In: 2019 International Conference on Robotics and Automation (ICRA). 2019. p. 15\u201321.","DOI":"10.1109\/ICRA.2019.8794206"},{"key":"pcbi.1013454.ref024","unstructured":"Salimans T, Chen R. Learning montezuma\u2019s revenge from a single demonstration. arXiv preprint 2018. https:\/\/arxiv.org\/abs\/1812.03381"},{"key":"pcbi.1013454.ref025","article-title":"Intrinsically motivated reinforcement learning","volume":"17","author":"N Chentanez","year":"2004","journal-title":"Advances in Neural Information Processing Systems"},{"issue":"1","key":"pcbi.1013454.ref026","first-page":"6818","article-title":"Intrinsically motivated goal exploration processes with automatic curriculum learning","volume":"23","author":"S Forestier","year":"2022","journal-title":"The Journal of Machine Learning Research"},{"key":"pcbi.1013454.ref027","article-title":"Unifying count-based exploration and intrinsic motivation","volume":"29","author":"M Bellemare","year":"2016","journal-title":"Advances in Neural Information Processing Systems"},{"key":"pcbi.1013454.ref028","unstructured":"Pathak D, Gandhi D, Gupta A. Self-supervised exploration via disagreement. In: International Conference on Machine Learning. 2019. p. 5062\u201371."},{"key":"pcbi.1013454.ref029","unstructured":"Shyam P, Ja\u015bkowski W, Gomez F. Model-based active exploration. In: International Conference on Machine Learning. PMLR; 2019. p. 5779\u201388."},{"key":"pcbi.1013454.ref030","unstructured":"Eysenbach B, Gupta A, Ibarz J, Levine S. Diversity is all you need: learning skills without a reward function. arXiv preprint 2018. https:\/\/arxiv.org\/abs\/1802.06070"},{"key":"pcbi.1013454.ref031","unstructured":"Yang T, Tang H, Bai C, Liu J, Hao J, Meng Z. Exploration in deep reinforcement learning: a comprehensive survey. arXiv preprint 2021. https:\/\/arxiv.org\/abs\/2109.06668"},{"key":"pcbi.1013454.ref032","article-title":"Exploration in deep reinforcement learning: a survey","author":"P Ladosz","year":"2022","journal-title":"Information Fusion"},{"key":"pcbi.1013454.ref033","unstructured":"Ng AY, Harada D, Russell S. Policy invariance under reward transformations: theory and application to reward shaping. In: ICML. vol. 99. 1999. p. 278\u201387."},{"key":"pcbi.1013454.ref034","first-page":"15931","article-title":"Learning to utilize shaping rewards: a new approach of reward shaping","volume":"33","author":"Y Hu","year":"2020","journal-title":"Advances in Neural Information Processing Systems"},{"key":"pcbi.1013454.ref035","unstructured":"Laud AD. Theory and application of reward shaping in reinforcement learning. University of Illinois at Urbana-Champaign; 2004."},{"key":"pcbi.1013454.ref036","unstructured":"Fournier P, Sigaud O, Chetouani M, Oudeyer PY. Accuracy-based curriculum learning in deep reinforcement learning. arXiv preprint. 2018. https:\/\/arxiv.org\/abs\/1806.09614"},{"key":"pcbi.1013454.ref037","doi-asserted-by":"crossref","unstructured":"Nair A, McGrew B, Andrychowicz M, Zaremba W, Abbeel P. Overcoming exploration in reinforcement learning with demonstrations. In: 2018 IEEE International Conference on Robotics and Automation (ICRA). 2018. p. 6292\u20139.","DOI":"10.1109\/ICRA.2018.8463162"},{"key":"pcbi.1013454.ref038","unstructured":"Bajaj V, Sharon G, Stone P. Task phasing: Automated curriculum learning from demonstrations. In: 2022. https:\/\/arxiv.org\/abs\/2210.10999"},{"issue":"9","key":"pcbi.1013454.ref039","doi-asserted-by":"crossref","first-page":"3732","DOI":"10.1109\/TNNLS.2019.2934906","article-title":"Teacher-student curriculum learning","volume":"31","author":"T Matiisen","year":"2020","journal-title":"IEEE Trans Neural Netw Learn Syst"},{"key":"pcbi.1013454.ref040","unstructured":"Portelas R, Colas C, Hofmann K, Oudeyer PY. Teacher algorithms for curriculum learning of deep rl in continuously parameterized environments. In: Conference on Robot Learning. PMLR; 2020. p. 835\u201353."},{"issue":"2","key":"pcbi.1013454.ref041","doi-asserted-by":"crossref","first-page":"265","DOI":"10.1109\/TEVC.2006.890271","article-title":"Intrinsic motivation systems for autonomous mental development","volume":"11","author":"PY Oudeyer","year":"2007","journal-title":"IEEE Transactions on Evolutionary Computation"},{"issue":"49","key":"pcbi.1013454.ref042","doi-asserted-by":"crossref","DOI":"10.1073\/pnas.2215352119","article-title":"A reinforcement-based mechanism for discontinuous learning","volume":"119","author":"G Reddy","year":"2022","journal-title":"Proc Natl Acad Sci U S A"},{"issue":"1","key":"pcbi.1013454.ref043","doi-asserted-by":"crossref","first-page":"174","DOI":"10.1016\/0022-247X(65)90154-X","article-title":"Optimal control of Markov processes with incomplete state information","volume":"10","author":"KJ \u00c5str\u00f6m","year":"1965","journal-title":"Journal of Mathematical Analysis and Applications"},{"issue":"2","key":"pcbi.1013454.ref044","doi-asserted-by":"crossref","first-page":"282","DOI":"10.1287\/opre.26.2.282","article-title":"The optimal control of partially observable Markov processes over the infinite horizon: discounted costs","volume":"26","author":"EJ Sondik","year":"1978","journal-title":"Operations Research"},{"key":"pcbi.1013454.ref045","doi-asserted-by":"crossref","first-page":"99","DOI":"10.1016\/S0004-3702(98)00023-X","article-title":"Planning and acting in partially observable stochastic domains","volume":"101","author":"LP Kaelbling","year":"1998","journal-title":"Artificial Intelligence"},{"key":"pcbi.1013454.ref046","article-title":"Monte-Carlo planning in large POMDPs","volume":"23","author":"D Silver","year":"2010","journal-title":"Advances in Neural Information Processing Systems"},{"issue":"1","key":"pcbi.1013454.ref047","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/TCIAIG.2012.2186810","article-title":"A survey of Monte Carlo tree search methods","volume":"4","author":"CB Browne","year":"2012","journal-title":"IEEE Transactions on Computational Intelligence and AI in Games"},{"issue":"4","key":"pcbi.1013454.ref048","doi-asserted-by":"crossref","DOI":"10.1016\/j.neuron.2020.02.023","article-title":"Tracking the mind\u2019s eye: primate gaze behavior during virtual visuomotor navigation reflects belief dynamics","volume":"106","author":"KJ Lakshminarasimhan","year":"2020","journal-title":"Neuron"},{"issue":"1","key":"pcbi.1013454.ref049","doi-asserted-by":"crossref","first-page":"4646","DOI":"10.1038\/s41467-019-12552-4","article-title":"The eighty five percent rule for optimal learning","volume":"10","author":"RC Wilson","year":"2019","journal-title":"Nat Commun"},{"key":"pcbi.1013454.ref050","unstructured":"Gerritsen R, Haak R. K9 scent training: a manual for training your identification, tracking and detection dog. Dog Training Press; 2015."},{"key":"pcbi.1013454.ref051","unstructured":"Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O. Proximal policy optimization algorithms. 2017. https:\/\/arxiv.org\/abs\/1707.06347"},{"issue":"7126","key":"pcbi.1013454.ref052","doi-asserted-by":"crossref","first-page":"406","DOI":"10.1038\/nature05464","article-title":"\u201cInfotaxis\u201d as a strategy for searching without gradients","volume":"445","author":"M Vergassola","year":"2007","journal-title":"Nature"},{"key":"pcbi.1013454.ref053","doi-asserted-by":"crossref","first-page":"191","DOI":"10.1146\/annurev-conmatphys-031720-032754","article-title":"Olfactory sensing and navigation in turbulent environments","volume":"13","author":"G Reddy","year":"2022","journal-title":"Annual Review of Condensed Matter Physics"},{"issue":"1","key":"pcbi.1013454.ref054","doi-asserted-by":"crossref","first-page":"28","DOI":"10.1037\/h0045345","article-title":"Pigeons in a pelican","volume":"15","author":"BF Skinner","year":"1960","journal-title":"American Psychologist"},{"issue":"7392","key":"pcbi.1013454.ref055","doi-asserted-by":"crossref","first-page":"62","DOI":"10.1038\/nature10918","article-title":"Choice-specific sequences in parietal cortex during a virtual-navigation decision task","volume":"484","author":"CD Harvey","year":"2012","journal-title":"Nature"},{"issue":"12","key":"pcbi.1013454.ref056","doi-asserted-by":"crossref","first-page":"1672","DOI":"10.1038\/nn.4403","article-title":"History-dependent variability in population dynamics during evidence accumulation in cortex","volume":"19","author":"AS Morcos","year":"2016","journal-title":"Nat Neurosci"}],"updated-by":[{"DOI":"10.1371\/journal.pcbi.1013454","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2025,9,19]],"date-time":"2025-09-19T00:00:00Z","timestamp":1758240000000}}],"container-title":["PLOS Computational Biology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1013454","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,19]],"date-time":"2025-09-19T21:53:37Z","timestamp":1758318817000},"score":1,"resource":{"primary":{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1013454"}},"subtitle":[],"editor":[{"given":"Matthieu","family":"Louis","sequence":"first","affiliation":[],"role":[{"role":"editor","vocabulary":"crossref"}]}],"short-title":[],"issued":{"date-parts":[[2025,9,12]]},"references-count":56,"journal-issue":{"issue":"9","published-online":{"date-parts":[[2025,9,12]]}},"URL":"https:\/\/doi.org\/10.1371\/journal.pcbi.1013454","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2023.12.03.569774","asserted-by":"object"}]},"ISSN":["1553-7358"],"issn-type":[{"value":"1553-7358","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,9,12]]}}}