{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,30]],"date-time":"2026-04-30T04:23:35Z","timestamp":1777523015350,"version":"3.51.4"},"reference-count":66,"publisher":"SAGE Publications","issue":"1","license":[{"start":{"date-parts":[[2022,4,27]],"date-time":"2022-04-27T00:00:00Z","timestamp":1651017600000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"CIFAR Canada AI Chair program"},{"DOI":"10.13039\/501100000038","name":"Natural Sciences and Engineering Research Council of Canada","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100000038","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100007631","name":"Canadian Institute for Advanced Research","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100007631","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100017149","name":"DeepMind","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100017149","id-type":"DOI","asserted-by":"publisher"}]},{"name":"NSERC Discovery grant program"},{"DOI":"10.13039\/501100000146","name":"Alberta Innovates - Technology Futures","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100000146","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100013373","name":"Alberta Machine Intelligence Institute","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100013373","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["Adaptive Behavior"],"published-print":{"date-parts":[[2023,2]]},"abstract":"<jats:p>We present three new diagnostic prediction problems inspired by classical-conditioning experiments to facilitate research in online prediction learning. 
Experiments in classical conditioning show that animals such as rabbits, pigeons, and dogs can make long temporal associations that enable multi-step prediction. To replicate this remarkable ability, an agent must construct an internal state representation that summarizes its interaction history. Recurrent neural networks can automatically construct state and learn temporal associations. However, the current training methods are prohibitively expensive for online prediction\u2014continual learning on every time step\u2014which is the focus of this paper. Our proposed problems test the learning capabilities that animals readily exhibit and highlight the limitations of the current recurrent learning methods. While the proposed problems are nontrivial, they are still amenable to extensive testing and analysis in the small-compute regime, thereby enabling researchers to study issues in isolation, ultimately accelerating progress towards scalable online representation learning methods.<\/jats:p>","DOI":"10.1177\/10597123221085039","type":"journal-article","created":{"date-parts":[[2022,4,27]],"date-time":"2022-04-27T15:31:06Z","timestamp":1651073466000},"page":"3-19","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":2,"title":["From eye-blinks to state construction: Diagnostic benchmarks for online representation learning"],"prefix":"10.1177","volume":"31","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4641-7349","authenticated-orcid":false,"given":"Banafsheh","family":"Rafiee","sequence":"first","affiliation":[{"name":"Department of Computing Science and the Alberta Machine Intelligence Institute (Amii), University of Alberta, Edmonton, AB, Canada"}]},{"given":"Zaheer","family":"Abbas","sequence":"additional","affiliation":[{"name":"DeepMind Alberta, Edmonton, AB, Canada"}]},{"given":"Sina","family":"Ghiassian","sequence":"additional","affiliation":[{"name":"Department of Computing Science and the 
Alberta Machine Intelligence Institute (Amii), University of Alberta, Edmonton, AB, Canada"}]},{"given":"Raksha","family":"Kumaraswamy","sequence":"additional","affiliation":[{"name":"Department of Computing Science and the Alberta Machine Intelligence Institute (Amii), University of Alberta, Edmonton, AB, Canada"}]},{"given":"Richard S","family":"Sutton","sequence":"additional","affiliation":[{"name":"Department of Computing Science and the Alberta Machine Intelligence Institute (Amii), University of Alberta, Edmonton, AB, Canada"},{"name":"DeepMind Alberta, Edmonton, AB, Canada"}]},{"given":"Elliot A","family":"Ludvig","sequence":"additional","affiliation":[{"name":"Department of Psychology, University of Warwick, Coventry, UK"}]},{"given":"Adam","family":"White","sequence":"additional","affiliation":[{"name":"Department of Computing Science and the Alberta Machine Intelligence Institute (Amii), University of Alberta, Edmonton, AB, Canada"},{"name":"DeepMind Alberta, Edmonton, AB, Canada"}]}],"member":"179","published-online":{"date-parts":[[2022,4,27]]},"reference":[{"key":"bibr1-10597123221085039","doi-asserted-by":"publisher","DOI":"10.1016\/B978-1-55860-377-6.50013-X"},{"key":"bibr2-10597123221085039","unstructured":"Beattie C., Leibo J. Z., Teplyashin D., Ward T., Wainwright M., K\u00fcttler H., Lefrancq A., Green S., Vald\u00e9s V., Sadik A., Schrittwieser J., Anderson K., York S., Cant M., Cain A., Bolton A., Gaffney S., King H., Hassabis D., Petersen S. (2016). Deepmind lab. arXiv preprint arXiv:1612.03801."},{"key":"bibr3-10597123221085039","doi-asserted-by":"publisher","DOI":"10.1613\/jair.3912"},{"key":"bibr4-10597123221085039","unstructured":"Brockman G., Cheung V., Pettersson L., Schneider J., Schulman J., Tang J., Zaremba W. (2016). Openai gym. arXiv preprint arXiv:1606.01540."},{"key":"bibr5-10597123221085039","unstructured":"Chen L., Lu K., Rajeswaran A., Lee K., Grover A., Laskin M., Abbeel P., Srinivas A., Mordatch I. (2021). 
Decision transformer: Reinforcement learning via sequence modeling. arXiv preprint arXiv:2106.01345."},{"key":"bibr6-10597123221085039","doi-asserted-by":"crossref","unstructured":"Cho K., Van Merri\u00ebnboer B., Bahdanau D., Bengio Y. (2014). On the properties of neural machine translation: Encoder-decoder approaches. arXiv preprint arXiv:1409.1259.","DOI":"10.3115\/v1\/W14-4012"},{"key":"bibr7-10597123221085039","unstructured":"Colas C., Sigaud O., Oudeyer P. (2018). How many random seeds? statistical power analysis in deep reinforcement learning experiments. arXiv preprint arXiv:1806.08295."},{"key":"bibr8-10597123221085039","unstructured":"Dehghani M., Gouws S., Vinyals O., Uszkoreit J., Kaiser L. (2019). Universal transformers. In International conference on learning representations."},{"key":"bibr9-10597123221085039","volume-title":"Contemporary animal learning theory","volume":"1","author":"Dickinson A.","year":"1980"},{"key":"bibr10-10597123221085039","doi-asserted-by":"publisher","DOI":"10.1207\/s15516709cog1402_1"},{"key":"bibr11-10597123221085039","unstructured":"Engstrom L., Ilyas A., Santurkar S., Tsipras D., Janoos F., Rudolph L., Madry A. (2019). Implementation matters in deep rl: A case study on ppo and trpo. In International conference on learning representations."},{"key":"bibr12-10597123221085039","first-page":"1407","volume-title":"Proceedings of the 35th International conference on machine learning, proceedings of machine learning research","volume":"80","author":"Espeholt L.","year":"2018"},{"key":"bibr13-10597123221085039","unstructured":"Fortunato M., Tan M., Faulkner R., Hansen S., Badia A. P., Buttimore G., Deck C., Leibo J. Z., Blundell C. (2019). Generalization of reinforcement learners with working and episodic memory. In Advances in neural information processing systems (pp. 
12469\u201312478)."},{"key":"bibr14-10597123221085039","volume-title":"Memory and the computational brain: Why cognitive science will transform neuroscience","volume":"6","author":"Gallistel C. R.","year":"2011"},{"key":"bibr15-10597123221085039","first-page":"1243","volume-title":"International conference on machine learning","author":"Gehring J.","year":"2017"},{"key":"bibr16-10597123221085039","doi-asserted-by":"publisher","DOI":"10.1037\/0097-7403.34.4.494"},{"key":"bibr17-10597123221085039","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v32i1.11694"},{"key":"bibr18-10597123221085039","unstructured":"Hochreiter S., Bengio Y., Frasconi P., Schmidhuber J. (2001). Gradient flow in recurrent nets: the difficulty of learning long-term dependencies. In A field guide to dynamical recurrent neural networks."},{"key":"bibr19-10597123221085039","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"bibr20-10597123221085039","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.79.8.2554"},{"key":"bibr21-10597123221085039","doi-asserted-by":"publisher","DOI":"10.1037\/a0033621"},{"key":"bibr22-10597123221085039","doi-asserted-by":"publisher","DOI":"10.1037\/h0054032"},{"key":"bibr23-10597123221085039","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33013943"},{"key":"bibr24-10597123221085039","first-page":"1627","volume-title":"International conference on machine learning","author":"Jaderberg M.","year":"2017"},{"key":"bibr25-10597123221085039","unstructured":"Jaeger H. (2001). The \u201cecho state\u201d approach to analysing and training recurrent neural networks-with an erratum note. Bonn, Germany: German National Research Center for Information Technology GMD Technical Report 148(34): 13."},{"key":"bibr26-10597123221085039","doi-asserted-by":"publisher","DOI":"10.1037\/11059-000"},{"key":"bibr27-10597123221085039","unstructured":"Janner M., Li Q., Levine S. (2021). Reinforcement learning as one big sequence modeling problem. 
arXiv preprint arXiv:2106.02039."},{"key":"bibr28-10597123221085039","unstructured":"Kingma D. P., Ba J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980."},{"key":"bibr29-10597123221085039","first-page":"6404","volume-title":"International conference on machine learning","author":"Loynd R.","year":"2020"},{"key":"bibr30-10597123221085039","doi-asserted-by":"publisher","DOI":"10.1162\/neco.2008.11-07-654"},{"key":"bibr31-10597123221085039","doi-asserted-by":"publisher","DOI":"10.3758\/s13420-012-0082-6"},{"key":"bibr32-10597123221085039","unstructured":"Ludvig E. A., Sutton R. S., Verbeek E., Kehoe E. J. (2009). A computational model of hippocampal function in trace conditioning. In Advances in neural information processing systems (pp. 993\u20131000)."},{"key":"bibr33-10597123221085039","volume-title":"The Rescorla-Wagner drift-diffusion model","author":"Luzardo A.","year":"2018"},{"key":"bibr34-10597123221085039","doi-asserted-by":"publisher","DOI":"10.1613\/jair.5699"},{"key":"bibr35-10597123221085039","volume-title":"The psychology of animal learning","author":"Mackintosh N. J.","year":"1974"},{"key":"bibr36-10597123221085039","unstructured":"Mahmood A. R., Sutton R. S. (2013). Representation search through generate and test. In Workshops at the Twenty-Seventh AAAI conference on artificial intelligence."},{"key":"bibr37-10597123221085039","unstructured":"Menick J., Elsen E., Evci U., Osindero S., Simonyan K., Graves A. (2020). Practical real time recurrent learning with a sparse approximation. In International conference on learning representations."},{"key":"bibr38-10597123221085039","doi-asserted-by":"publisher","DOI":"10.1177\/1059712313511648"},{"issue":"4","key":"bibr39-10597123221085039","first-page":"349","volume":"3","author":"Mozer M. C.","year":"1989","journal-title":"Complex Systems"},{"key":"bibr40-10597123221085039","unstructured":"Nath S., Liu V., Chan A., Li X., White A., White M. (2019). 
Training recurrent neural networks online by learning explicit state variables. In International conference on learning representations."},{"key":"bibr41-10597123221085039","unstructured":"Obando-Ceron J. S., Castro P. S. (2020). Revisiting rainbow: Promoting more insightful and inclusive deep reinforcement learning research. arXiv preprint arXiv:2011.14826."},{"key":"bibr42-10597123221085039","unstructured":"Osband I., Doron Y., Hessel M., Aslanides J., Sezener E., Saraiva A., McKinney K., Lattimore T., Szepesvari C., Singh S., Roy B. V., Sutton R., Silver D., Hasselt H. V. (2020). Behaviour suite for reinforcement learning. In International conference on learning representations."},{"issue":"124","key":"bibr43-10597123221085039","first-page":"1","volume":"20","author":"Osband I.","year":"2019","journal-title":"Journal of Machine Learning Research"},{"key":"bibr44-10597123221085039","unstructured":"Parisotto E., Salakhutdinov R. (2021). Efficient transformers in reinforcement learning using actor-learner distillation. arXiv preprint arXiv:2104.01655."},{"key":"bibr45-10597123221085039","first-page":"7487","volume-title":"International conference on machine learning","author":"Parisotto E.","year":"2020"},{"key":"bibr46-10597123221085039","first-page":"7487","volume-title":"Proceedings of the 37th International conference on machine learning, proceedings of machine learning research","volume":"119","author":"Parisotto E.","year":"2020"},{"key":"bibr47-10597123221085039","volume-title":"Conditioned reflexes: An investigation of the physiological activity of the cerebral cortex","volume":"3","author":"Pavlov I. 
P.","year":"1927"},{"key":"bibr48-10597123221085039","doi-asserted-by":"publisher","DOI":"10.1007\/s00422-013-0575-1"},{"key":"bibr49-10597123221085039","doi-asserted-by":"publisher","DOI":"10.1037\/h0023946"},{"key":"bibr50-10597123221085039","doi-asserted-by":"publisher","DOI":"10.1007\/BF00115009"},{"key":"bibr51-10597123221085039","first-page":"171","volume-title":"Proceedings of the Tenth National conference on artificial intelligence","author":"Sutton R. S.","year":"1992"},{"key":"bibr52-10597123221085039","first-page":"497","volume-title":"Learning and computational neuroscience: Foundations of adaptive networks","author":"Sutton R. S.","year":"1990"},{"key":"bibr53-10597123221085039","volume-title":"Reinforcement learning: An introduction","author":"Sutton R. S.","year":"2018"},{"key":"bibr54-10597123221085039","doi-asserted-by":"publisher","DOI":"10.1145\/1273496.1273606"},{"key":"bibr55-10597123221085039","doi-asserted-by":"publisher","DOI":"10.1016\/B978-1-55860-307-3.50047-2"},{"key":"bibr56-10597123221085039","unstructured":"Tallec C., Ollivier Y. (2018). Unbiased online recurrent optimization. In International conference on learning representations."},{"key":"bibr57-10597123221085039","first-page":"5026","volume-title":"IEEE\/RSJ International conference on intelligent robots and systems","author":"Todorov E.","year":"2012"},{"key":"bibr58-10597123221085039","first-page":"5015","volume-title":"International conference on machine learning","author":"Tucker G.","year":"2018"},{"key":"bibr59-10597123221085039","unstructured":"van Hasselt H., Sutton R. S. (2015). Learning to predict independent of span. arXiv preprint arXiv:1508.04582."},{"key":"bibr60-10597123221085039","first-page":"177","volume-title":"Cognitive processes in animal behavior","author":"Wagner A. R.","year":"1978"},{"key":"bibr61-10597123221085039","unstructured":"Wayne G., Hung C., Amos D., Mirza M., Ahuja A., Grabska-Barwinska A., Rae J., Mirowski P., Leibo J. 
Z., Santoro A., Gemici M., Reynolds M., Harley T., Abramson J., Mohamed S., Rezende D., Saxton D., Cain A., Hillier C., Lillicrap T. (2018). Unsupervised predictive memory in a goal-directed agent. arXiv preprint arXiv:1803.10760."},{"key":"bibr62-10597123221085039","doi-asserted-by":"publisher","DOI":"10.1609\/aimag.v31i2.2227"},{"key":"bibr63-10597123221085039","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1990.2.4.490"},{"key":"bibr64-10597123221085039","doi-asserted-by":"publisher","DOI":"10.3758\/s13420-016-0240-3"},{"key":"bibr65-10597123221085039","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1989.1.2.270"},{"key":"bibr66-10597123221085039","unstructured":"Zhang S., Sutton R. S. (2017). A deeper look at experience replay. arXiv preprint arXiv:1712.01275."}],"container-title":["Adaptive Behavior"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/10597123221085039","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/10597123221085039","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/10597123221085039","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,28]],"date-time":"2026-04-28T16:19:19Z","timestamp":1777393159000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/10597123221085039"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,4,27]]},"references-count":66,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2023,2]]}},"alternative-id":["10.1177\/10597123221085039"],"URL":"https:\/\/doi.org\/10.1177\/10597123221085039","relation":{},"ISSN":["1059-7123","1741-2633"],"issn-type":[{"value":"1059-7123","type":"print"},{"value":"1741-2633","type"
:"electronic"}],"subject":[],"published":{"date-parts":[[2022,4,27]]}}}