{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,15]],"date-time":"2026-04-15T18:26:38Z","timestamp":1776277598402,"version":"3.50.1"},"reference-count":108,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T00:00:00Z","timestamp":1672185600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T00:00:00Z","timestamp":1672185600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100010663","name":"H2020 European Research Council","doi-asserted-by":"publisher","award":["ERC-StG-2015-ERC"],"award-info":[{"award-number":["ERC-StG-2015-ERC"]}],"id":[{"id":"10.13039\/100010663","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100008977","name":"Universit\u00e4t Ulm","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100008977","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Mach Learn"],"published-print":{"date-parts":[[2023,2]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>One notable weakness of current machine learning algorithms is the poor ability of models to solve new problems without forgetting previously acquired knowledge. The Continual Learning paradigm has emerged as a protocol to systematically investigate settings where the model sequentially observes samples generated by a series of tasks. In this work, we take a task-agnostic view of continual learning and develop a hierarchical information-theoretic optimality principle that facilitates a trade-off between learning and forgetting. 
We derive this principle from a Bayesian perspective and show its connections to previous approaches to continual learning. Based on this principle, we propose a neural network layer, called the Mixture-of-Variational-Experts layer, that alleviates forgetting by creating a set of information processing paths through the network which is governed by a gating policy. Equipped with a diverse and specialized set of parameters, each path can be regarded as a distinct sub-network that learns to solve tasks. To improve expert allocation, we introduce diversity objectives, which we evaluate in additional ablation studies. Importantly, our approach can operate in a task-agnostic way, i.e., it does not require task-specific knowledge, as is the case with many existing continual learning algorithms. Due to the general formulation based on generic utility functions, we can apply this optimality principle to a large variety of learning problems, including supervised learning, reinforcement learning, and generative modeling. 
We demonstrate the competitive performance of our method on continual reinforcement learning and variants of the MNIST, CIFAR-10, and CIFAR-100 datasets.<\/jats:p>","DOI":"10.1007\/s10994-022-06283-9","type":"journal-article","created":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T20:02:46Z","timestamp":1672257766000},"page":"655-686","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":12,"title":["Hierarchically structured task-agnostic continual learning"],"prefix":"10.1007","volume":"112","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3244-3661","authenticated-orcid":false,"given":"Heinke","family":"Hihn","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Daniel A.","family":"Braun","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2022,12,28]]},"reference":[{"key":"6283_CR1","unstructured":"Ahn, H., Cha, S., Lee, D., & Moon, T. (2019). Uncertainty-based continual learning with adaptive regularization. In Proceedings of the 33rd international conference on neural information processing systems (pp. 4392\u20134402)."},{"key":"6283_CR2","unstructured":"Andrychowicz, M., Wolski, F., Ray, A., Schneider, J., Fong, R., Welinder, P., McGrew, B., Tobin, J., Abbeel, P., & Zaremba, W. (2017). Hindsight experience replay. In Proceedings of the 31st international conference on neural information processing systems (pp. 5055\u20135065)."},{"key":"6283_CR3","doi-asserted-by":"crossref","unstructured":"Antoniak, C. E. (1974). Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. The Annals of Statistics, 1152\u20131174.","DOI":"10.1214\/aos\/1176342871"},{"key":"6283_CR4","unstructured":"Arumugam, D., Henderson, P., & Bacon, P.-L. (2020). An information-theoretic perspective on credit assignment in reinforcement learning. 
In Workshop on biological and artificial reinforcement learning (NeurIPS 2020)."},{"key":"6283_CR5","doi-asserted-by":"crossref","unstructured":"Bang, J., Kim, H., Yoo, Y., Ha, J.-W., & Choi, J. (2021). Rainbow memory: Continual learning with a memory of diverse samples. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (pp. 8218\u20138227).","DOI":"10.1109\/CVPR46437.2021.00812"},{"key":"6283_CR6","doi-asserted-by":"publisher","first-page":"159","DOI":"10.1613\/jair.1.11432","volume":"68","author":"D Benavides-Prado","year":"2020","unstructured":"Benavides-Prado, D., Koh, Y. S., & Riddle, P. (2020). Towards knowledgeable supervised lifelong learning systems. Journal of Artificial Intelligence Research, 68, 159\u2013224.","journal-title":"Journal of Artificial Intelligence Research"},{"key":"6283_CR7","doi-asserted-by":"crossref","unstructured":"Bian, Y., & Chen, H. (2021). When does diversity help generalization in classification ensembles. IEEE Transactions on Cybernetics.","DOI":"10.1109\/TCYB.2021.3053165"},{"key":"6283_CR8","doi-asserted-by":"crossref","unstructured":"Biesialska, M., Biesialska, K., & Costa-juss\u00e0, M. R. (2020). Continual lifelong learning in natural language processing: A survey. In Proceedings of the 28th international conference on computational linguistics (pp. 6523\u20136541).","DOI":"10.18653\/v1\/2020.coling-main.574"},{"key":"6283_CR9","doi-asserted-by":"publisher","first-page":"123","DOI":"10.1023\/A:1018054314350","volume":"24","author":"L Breiman","year":"1996","unstructured":"Breiman, L. (1996). Bagging predictors. Machine Learning, 24, 123\u2013140. https:\/\/doi.org\/10.1023\/A:1018054314350.","journal-title":"Machine Learning"},{"key":"6283_CR10","unstructured":"Cha, S., Hsu, H., Hwang, T., Calmon, F., & Moon, T. (2020). Cpr: Classifier-projection regularization for continual learning. 
In International conference on learning representations."},{"key":"6283_CR11","doi-asserted-by":"crossref","unstructured":"Chaudhry, A., Dokania, P. K., Ajanthan, T., & Torr, P. H. (2018). Riemannian walk for incremental learning: Understanding forgetting and intransigence. In Proceedings of the European conference on computer vision (ECCV) (pp. 532\u2013547).","DOI":"10.1007\/978-3-030-01252-6_33"},{"key":"6283_CR12","doi-asserted-by":"crossref","unstructured":"Chaudhry, A., Gordo, A., Dokania, P., Torr, P., & Lopez-Paz, D. (2021). Using hindsight to anchor past knowledge in continual learning. In Proceedings of the AAAI conference on artificial intelligence (Vol. 35, pp. 6993\u20137001).","DOI":"10.1609\/aaai.v35i8.16861"},{"key":"6283_CR13","unstructured":"Chaudhry, A., Ranzato, M., Rohrbach, M., & Elhoseiny, M. (2018). Efficient lifelong learning with a-gem. In International conference on learning representations."},{"key":"6283_CR14","unstructured":"Collier, M., Kokiopoulou, E., Gesmundo, A., & Berent, J. (2020). Routing networks with co-training for continual learning. In ICML 2020 workshop on continual learning."},{"key":"6283_CR15","unstructured":"Coumans, E., & Bai, Y. (2016\u20132021). PyBullet, a Python module for physics simulation for games, robotics and machine learning. http:\/\/pybullet.org"},{"key":"6283_CR16","volume-title":"Elements of information theory","author":"TM Cover","year":"2012","unstructured":"Cover, T. M., & Thomas, J. A. (2012). Elements of information theory. Wiley."},{"key":"6283_CR17","doi-asserted-by":"crossref","unstructured":"Dai, T., Liu, H., Arulkumaran, K., Ren, G., & Bharath, A. A. (2021). Diversity-based trajectory and goal selection with hindsight experience replay. In Pacific rim international conference on artificial intelligence (pp. 32\u201345). 
Springer.","DOI":"10.1007\/978-3-030-89370-5_3"},{"issue":"7","key":"6283_CR18","first-page":"3366","volume":"44","author":"M De Lange","year":"2021","unstructured":"De Lange, M., Aljundi, R., Masana, M., Parisot, S., Jia, X., Leonardis, A., et al. (2021). A continual learning survey: Defying forgetting in classification tasks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(7), 3366\u20133385.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"6283_CR19","unstructured":"Ebrahimi, S., Elhoseiny, M., Darrell, T., & Rohrbach, M. (2020). Uncertainty-guided continual learning with bayesian neural networks. In International conference on learning representations."},{"key":"6283_CR20","doi-asserted-by":"publisher","first-page":"452","DOI":"10.1007\/978-3-030-27202-9_41","volume-title":"Image analysis and recognition","author":"A El Khatib","year":"2019","unstructured":"El Khatib, A., & Karray, F. (2019). Strategies for improving single-head continual learning performance. In F. Karray, A. Campilho, & A. Yu (Eds.), Image analysis and recognition (pp. 452\u2013460). Cham: Springer."},{"key":"6283_CR21","unstructured":"Ellenberger, B. (2018\u20132019). PyBullet Gymperium. https:\/\/www.github.com\/benelot\/pybullet-gym"},{"key":"6283_CR22","unstructured":"Eysenbach, B., Gupta, A., Ibarz, J., & Levine, S. (2018). Diversity is all you need: Learning skills without a reward function. In International conference on learning representations."},{"key":"6283_CR23","unstructured":"Farquhar, S., & Gal, Y. (2018). Towards robust evaluations of continual learning. In Lifelong learning: A reinforcement learning approach (ICML 2018)."},{"key":"6283_CR24","unstructured":"Fernando, C., Banarse, D., Blundell, C., Zwols, Y., Ha, D., Rusu, A. A., Pritzel, A., & Wierstra, D. (2017). Pathnet: Evolution channels gradient descent in super neural networks. 
arXiv:1701.08734"},{"issue":"4","key":"6283_CR25","doi-asserted-by":"publisher","first-page":"955","DOI":"10.1162\/089976600300015664","volume":"12","author":"JD Freitas","year":"2000","unstructured":"Freitas, J. D., Niranjan, M., Gee, A. H., & Doucet, A. (2000). Sequential monte Carlo methods to train neural network models. Neural Computation, 12(4), 955\u2013993.","journal-title":"Neural Computation"},{"key":"6283_CR26","doi-asserted-by":"crossref","unstructured":"Fu, H., Li, C., Liu, X., Gao, J., Celikyilmaz, A., & Carin, L. (2019). Cyclical annealing schedule: A simple approach to mitigating kl vanishing. In Proceedings of the 2019 conference of the north american chapter of the association for computational linguistics: Human language technologies, Volume 1 (long and short papers) (pp. 240\u2013250).","DOI":"10.18653\/v1\/N19-1021"},{"key":"6283_CR27","unstructured":"Gal, Y., & Ghahramani, Z. (2016). Bayesian convolutional neural networks with Bernoulli approximate variational inference. In ICLR 2016 workshop track."},{"key":"6283_CR28","unstructured":"Galashov, A., Jayakumar, S. M., Hasenclever, L., Tirumala, D., Schwarz, J., Desjardins, G., Czarnecki, W. M., Teh, Y. W., Pascanu, R., & Heess, N. (2019). Information asymmetry in kl-regularized rl. In Proceedings of the international conference on representation learning"},{"key":"6283_CR29","doi-asserted-by":"publisher","first-page":"27","DOI":"10.3389\/frobt.2015.00027","volume":"2","author":"T Genewein","year":"2015","unstructured":"Genewein, T., Leibfried, F., Grau-Moya, J., & Braun, D. A. (2015). Bounded rationality, abstraction, and hierarchical decision-making: An information-theoretic optimality principle. Frontiers in Robotics and AI, 2, 27.","journal-title":"Frontiers in Robotics and AI"},{"key":"6283_CR31","unstructured":"Ghosh, D., Singh, A., Rajeswaran, A., Kumar, V., & Levine, S. (2018). Divide-and-conquer reinforcement learning. 
In International conference on learning representations."},{"key":"6283_CR30","unstructured":"Ghosh, P., Sajjadi, M. S., Vergari, A., Black, M., & Scholkopf, B. (2019). From variational to deterministic autoencoders. In International conference on learning representations."},{"key":"6283_CR32","unstructured":"Glorot, X., & Bengio, Y. (2010). Understanding the difficulty of training deep feedforward neural networks. In Teh, Y. W., & Titterington, M. (Eds.), Proceedings of the thirteenth international conference on artificial intelligence and statistics. proceedings of machine learning research (Vol. 9, pp. 249\u2013256). PMLR, Chia Laguna Resort, Sardinia, Italy. https:\/\/proceedings.mlr.press\/v9\/glorot10a.html"},{"key":"6283_CR33","unstructured":"Golkar, S., Kagan, M., & Cho, K. (2019). Continual learning via neural pruning. In NeurIPS 2019 workshop neuro AI."},{"key":"6283_CR34","unstructured":"Grau-Moya, J., Leibfried, F., & Vrancx, P. (2019). Soft q-learning with mutual-information regularization. In Proceedings of the international conference on learning representations."},{"key":"6283_CR35","unstructured":"Haarnoja, T., Zhou, A., Abbeel, P., & Levine, S. (2018). Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In International conference on machine learning (pp. 1861\u20131870)."},{"key":"6283_CR36","doi-asserted-by":"crossref","unstructured":"Hadjeres, G., Nielsen, F., & Pachet, F. (2017). Glsr-vae: Geodesic latent space regularization for variational autoencoder architectures. In 2017 IEEE symposium series on computational intelligence (SSCI) (pp. 1\u20137). IEEE.","DOI":"10.1109\/SSCI.2017.8280895"},{"key":"6283_CR37","doi-asserted-by":"crossref","unstructured":"Han, X., & Guo, Y. (2021a). Continual learning with dual regularizations. In Joint European conference on machine learning and knowledge discovery in databases (pp. 619\u2013634). 
Springer.","DOI":"10.1007\/978-3-030-86486-6_38"},{"key":"6283_CR38","unstructured":"Han, X., & Guo, Y. (2021b). Contrastive continual learning with feature propagation. arXiv:2112.01713"},{"key":"6283_CR40","doi-asserted-by":"crossref","unstructured":"He, J., & Zhu, F. (2022). Online continual learning via candidates voting. In Proceedings of the IEEE\/CVF winter conference on applications of computer vision (pp. 3154\u20133163).","DOI":"10.1109\/WACV51458.2022.00136"},{"key":"6283_CR39","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., & Sun, J. (2015). Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In Proceedings of the IEEE international conference on computer vision (pp. 1026\u20131034).","DOI":"10.1109\/ICCV.2015.123"},{"key":"6283_CR41","doi-asserted-by":"crossref","unstructured":"Hihn, H., Gottwald, S., & Braun, D. A. (2019). An information-theoretic on-line learning principle for specialization in hierarchical decision-making systems. In 2019 IEEE 58th conference on decision and control (CDC) (pp. 3677\u20133684). IEEE.","DOI":"10.1109\/CDC40024.2019.9029255"},{"key":"6283_CR42","doi-asserted-by":"crossref","unstructured":"Hihn, H., Gottwald, S., & Braun, D. A. (2018). Bounded rational decision-making with adaptive neural network priors. In IAPR workshop on artificial neural networks in pattern recognition (pp. 213\u2013225). Springer.","DOI":"10.1007\/978-3-319-99978-4_17"},{"key":"6283_CR43","unstructured":"Hihn, H., & Braun, D. A. (2020a). Hierarchical expert networks for meta-learning. In 4th ICML workshop on lifelong machine learning."},{"issue":"3","key":"6283_CR44","doi-asserted-by":"publisher","first-page":"2319","DOI":"10.1007\/s11063-020-10351-3","volume":"52","author":"H Hihn","year":"2020","unstructured":"Hihn, H., & Braun, D. A. (2020b). Specialization in hierarchical learning systems. 
Neural Processing Letters, 52(3), 2319\u20132352.","journal-title":"Neural Processing Letters"},{"key":"6283_CR45","unstructured":"Hsu, Y.-C., Liu, Y.-C., Ramasamy, A., & Kira, Z. (2018). Re-evaluating continual learning scenarios: A categorization and case for strong baselines. In Continual learning workshop, 32nd conference on neural information processing systems."},{"issue":"1","key":"6283_CR46","doi-asserted-by":"publisher","first-page":"79","DOI":"10.1162\/neco.1991.3.1.79","volume":"3","author":"RA Jacobs","year":"1991","unstructured":"Jacobs, R. A., Jordan, M. I., Nowlan, S. J., & Hinton, G. E. (1991). Adaptive mixtures of local experts. Neural Computation, 3(1), 79\u201387.","journal-title":"Neural Computation"},{"key":"6283_CR47","unstructured":"Jerfel, G., Grant, E., Griffiths, T., & Heller, K. A. (2019). Reconciling meta-learning and continual learning with online mixtures of tasks. In Advances in neural information processing systems (pp. 9122\u20139133)."},{"key":"6283_CR48","first-page":"3647","volume":"33","author":"S Jung","year":"2020","unstructured":"Jung, S., Ahn, H., Cha, S., & Moon, T. (2020). Continual learning with node-importance based adaptive group sparse regularization. Advances in Neural Information Processing Systems, 33, 3647\u20133658.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"6283_CR49","unstructured":"Kao, T.-C., Jensen, K., van\u00a0de Ven, G., Bernacchia, A., & Hennequin, G. (2021). Natural continual learning: Success is a journey, not (just) a destination. Advances in Neural Information Processing Systems, 34."},{"key":"6283_CR50","unstructured":"Kessler, S., Nguyen, V., Zohren, S., & Roberts, S. J. (2021). Hierarchical Indian buffet neural networks for Bayesian continual learning. In Uncertainty in artificial intelligence (pp. 749\u2013759). PMLR."},{"key":"6283_CR51","unstructured":"Kingma, D. P., & Ba, J. (2015). Adam: A method for stochastic optimization. 
In Proceedings of the 3rd international conference on learning representations."},{"issue":"13","key":"6283_CR52","doi-asserted-by":"publisher","first-page":"3521","DOI":"10.1073\/pnas.1611835114","volume":"114","author":"J Kirkpatrick","year":"2017","unstructured":"Kirkpatrick, J., Pascanu, R., Rabinowitz, N., Veness, J., Desjardins, G., Rusu, A. A., et al. (2017). Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences, 114(13), 3521\u20133526.","journal-title":"Proceedings of the National Academy of Sciences"},{"key":"6283_CR53","first-page":"14374","volume":"33","author":"J Kj","year":"2020","unstructured":"Kj, J., & Balasubramanian, N. V. (2020). Meta-consolidation for continual learning. Advances in Neural Information Processing Systems, 33, 14374\u201314386.","journal-title":"Advances in Neural Information Processing Systems"},{"issue":"2\u20133","key":"6283_CR54","doi-asserted-by":"publisher","first-page":"123","DOI":"10.1561\/2200000044","volume":"5","author":"A Kulesza","year":"2012","unstructured":"Kulesza, A., Taskar, B., et al. (2012). Determinantal point processes for machine learning. Foundations and Trends in Machine Learning, 5(2\u20133), 123\u2013286.","journal-title":"Foundations and Trends in Machine Learning"},{"key":"6283_CR55","doi-asserted-by":"publisher","DOI":"10.1002\/0471660264","volume-title":"Combining pattern classifiers: Methods and algorithms","author":"LI Kuncheva","year":"2004","unstructured":"Kuncheva, L. I. (2004). Combining pattern classifiers: Methods and algorithms. Wiley."},{"issue":"2","key":"6283_CR56","doi-asserted-by":"publisher","first-page":"181","DOI":"10.1023\/A:1022859003006","volume":"51","author":"LI Kuncheva","year":"2003","unstructured":"Kuncheva, L. I., & Whitaker, C. J. (2003). Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. 
Machine Learning, 51(2), 181\u2013207.","journal-title":"Machine Learning"},{"key":"6283_CR57","unstructured":"Lee, S., Ha, J., Zhang, D., & Kim, G. (2020). A neural dirichlet process mixture model for task-free continual learning. In International conference on learning representations. https:\/\/openreview.net\/forum?id=SJxSOJStPr"},{"key":"6283_CR58","unstructured":"Leibfried, F., & Grau-Moya, J. (2019). Mutual-information regularization in Markov decision processes and actor-critic learning. In Proceedings of the conference on robot learning."},{"key":"6283_CR59","unstructured":"Li, H., Krishnan, A., Wu, J., Kolouri, S., Pilly, P. K., & Braverman, V. (2021). Lifelong learning with sketched structural regularization. In Asian conference on machine learning (pp. 985\u20131000). PMLR."},{"issue":"12","key":"6283_CR60","doi-asserted-by":"publisher","first-page":"2935","DOI":"10.1109\/TPAMI.2017.2773081","volume":"40","author":"Z Li","year":"2017","unstructured":"Li, Z., & Hoiem, D. (2017). Learning without forgetting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(12), 2935\u20132947. https:\/\/doi.org\/10.1109\/TPAMI.2017.2773081.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"6283_CR61","unstructured":"Lin, M., Fu, J., & Bengio, Y. (2019). Conditional computation for continual learning. In NeurIPS 2018 continual learning workshop."},{"key":"6283_CR62","doi-asserted-by":"crossref","unstructured":"Liu, Y., Su, Y., Liu, A. -A., Schiele, B., & Sun, Q. (2020). Mnemonics training: Multi-class incremental learning without forgetting. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (pp. 12245\u201312254).","DOI":"10.1109\/CVPR42600.2020.01226"},{"key":"6283_CR63","unstructured":"Lopez-Paz, D., & Ranzato, M. (2017). Gradient episodic memory for continual learning. In Advances in neural information processing systems (pp. 
6467\u20136476)."},{"key":"6283_CR64","unstructured":"Lupu, A., Cui, B., Hu, H., & Foerster, J. (2021). Trajectory diversity for zero-shot coordination. In International conference on machine learning (pp. 7204\u20137213). PMLR."},{"key":"6283_CR65","unstructured":"Maas, A. L., Hannun, A. Y., & Ng, A. Y. (2013). Rectifier nonlinearities improve neural network acoustic models. In Proceedings of the ICML (Vol. 30, p. 3)."},{"issue":"1","key":"6283_CR66","doi-asserted-by":"publisher","first-page":"83","DOI":"10.2307\/1425855","volume":"7","author":"O Macchi","year":"1975","unstructured":"Macchi, O. (1975). The coincidence approach to stochastic point processes. Advances in Applied Probability, 7(1), 83\u2013122.","journal-title":"Advances in Applied Probability"},{"key":"6283_CR67","doi-asserted-by":"crossref","unstructured":"Mazur, M., Pustelnik, \u0141., Knop, S., Pagacz, P., & Spurek, P. (2021). Target layer regularization for continual learning using cramer-wold generator. arXiv:2111.07928","DOI":"10.1016\/j.ins.2022.07.085"},{"key":"6283_CR68","doi-asserted-by":"publisher","first-page":"109","DOI":"10.1016\/S0079-7421(08)60536-8","volume":"24","author":"M McCloskey","year":"1989","unstructured":"McCloskey, M., & Cohen, N. J. (1989). Catastrophic interference in connectionist networks: The sequential learning problem. Psychology of Learning and Motivation, 24, 109\u2013165. https:\/\/doi.org\/10.1016\/S0079-7421(08)60536-8.","journal-title":"Psychology of Learning and Motivation"},{"key":"6283_CR69","doi-asserted-by":"crossref","unstructured":"Narkhede, M. V., Bartakke, P. P., & Sutaone, M. S. (2021). A review on weight initialization strategies for neural networks. Artificial Intelligence Review, 1\u201332.","DOI":"10.1007\/s10462-021-10033-z"},{"key":"6283_CR70","unstructured":"Nguyen, C. V., Li, Y., Bui, T. D., & Turner, R. E. (2017). Variational continual learning. 
In Proceedings of the international conference on representation learning."},{"key":"6283_CR71","doi-asserted-by":"crossref","unstructured":"Ostapenko, O., Puscas, M., Klein, T., Jahnichen, P., & Nabi, M. (2019). Learning to remember: A synaptic plasticity driven framework for continual learning. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (pp. 11321\u201311329).","DOI":"10.1109\/CVPR.2019.01158"},{"key":"6283_CR72","unstructured":"Pang, B., Han, T., Nijkamp, E., Zhu, S.-C., & Wu, Y. N. (2020). Learning latent space energy-based prior model. Advances in Neural Information Processing Systems, 33."},{"key":"6283_CR73","doi-asserted-by":"publisher","first-page":"54","DOI":"10.1016\/j.neunet.2019.01.012","volume":"113","author":"GI Parisi","year":"2019","unstructured":"Parisi, G. I., Kemker, R., Part, J. L., Kanan, C., & Wermter, S. (2019). Continual lifelong learning with neural networks: A review. Neural Networks, 113, 54\u201371.","journal-title":"Neural Networks"},{"key":"6283_CR74","unstructured":"Parker-Holder, J., Pacchiano, A., Choromanski, K. M., & Roberts, S. J. (2020). Effective diversity in population based reinforcement learning. Advances in Neural Information Processing Systems, 33."},{"key":"6283_CR75","unstructured":"Raghavan, K., & Balaprakash, P. (2021). Formalizing the generalization-forgetting trade-off in continual learning. Advances in Neural Information Processing Systems, 34."},{"key":"6283_CR76","unstructured":"Rao, D., Visin, F., Rusu, A., Pascanu, R., Teh, Y. W., & Hadsell, R. (2019). Continual unsupervised representation learning. Advances in Neural Information Processing Systems, 32."},{"issue":"6","key":"6283_CR77","doi-asserted-by":"publisher","first-page":"698","DOI":"10.1016\/j.conb.2007.11.007","volume":"17","author":"B Rasch","year":"2007","unstructured":"Rasch, B., & Born, J. (2007). Maintaining memories by reactivation. 
Current Opinion in Neurobiology, 17(6), 698\u2013703.","journal-title":"Current Opinion in Neurobiology"},{"key":"6283_CR78","doi-asserted-by":"crossref","unstructured":"Rebuffi, S. -A., Kolesnikov, A., Sperl, G., & Lampert, C. H. (2017). iCaRL: Incremental classifier and representation learning. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2001\u20132010).","DOI":"10.1109\/CVPR.2017.587"},{"key":"6283_CR79","unstructured":"Rothfuss, J., Lee, D., Clavera, I., Asfour, T., & Abbeel, P. (2018). Promp: Proximal meta-policy search. In International conference on learning representations."},{"key":"6283_CR80","unstructured":"Rusu, A. A., Rabinowitz, N. C., Desjardins, G., Soyer, H., Kirkpatrick, J., Kavukcuoglu, K., Pascanu, R., & Hadsell, R. (2016). Progressive neural networks. In NIPS deep learning symposium."},{"key":"6283_CR81","first-page":"2234","volume":"29","author":"T Salimans","year":"2016","unstructured":"Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., & Chen, X. (2016). Improved techniques for training gans. Advances in Neural Information Processing Systems, 29, 2234\u20132242.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"6283_CR82","unstructured":"Schaul, T., Quan, J., Antonoglou, I., & Silver, D. (2015). Prioritized experience replay. arXiv:1511.05952"},{"key":"6283_CR83","unstructured":"Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal policy optimization algorithms. arXiv:1707.06347"},{"key":"6283_CR84","unstructured":"Shazeer, N., Mirhoseini, A., Maziarz, K., Davis, A., Le, Q., Hinton, G., & Dean, J. (2017). Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. In Proceedings of the international conference on learning representations (ICLR)."},{"key":"6283_CR85","unstructured":"Shin, H., Lee, J. K., Kim, J., & Kim, J. (2017). Continual learning with deep generative replay. 
In Advances in neural information processing systems (pp. 2990\u20132999)."},{"key":"6283_CR86","unstructured":"Sokar, G., Mocanu, D. C., & Pechenizkiy, M. (2021). Self-attention meta-learner for continual learning. In Proceedings of the 20th international conference on autonomous agents and multiagent systems (pp. 1658\u20131660)."},{"issue":"1","key":"6283_CR87","first-page":"1929","volume":"15","author":"N Srivastava","year":"2014","unstructured":"Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1), 1929\u20131958.","journal-title":"The Journal of Machine Learning Research"},{"key":"6283_CR88","doi-asserted-by":"crossref","unstructured":"Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., & Rabinovich, A. (2015). Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1\u20139).","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"6283_CR89","unstructured":"TensorFlow 2.0 Documentation (2022). https:\/\/www.tensorflow.org\/addons\/api_docs\/python\/tfa\/optimizers\/LazyAdam"},{"key":"6283_CR90","doi-asserted-by":"crossref","unstructured":"Thiam, P., Hihn, H., Braun, D. A., Kestler, H. A., & Schwenker, F. (2021). Multi-modal pain intensity assessment based on physiological signals: A deep learning perspective. Frontiers in Physiology, 12.","DOI":"10.3389\/fphys.2021.720464"},{"key":"6283_CR91","doi-asserted-by":"crossref","unstructured":"Thrun, S. (1998). Lifelong learning algorithms. In Learning to learn (pp. 181\u2013209). Springer.","DOI":"10.1007\/978-1-4615-5529-2_8"},{"key":"6283_CR92","unstructured":"Tsai, Y.-H. H., Wu, Y., Salakhutdinov, R., & Morency, L.-P. (2021). Self-supervised learning from a multi-view perspective. 
In International conference on learning representations."},{"key":"6283_CR93","unstructured":"Vahdat, A., & Kautz, J.: Nvae: A deep hierarchical variational autoencoder. In Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M. F., & Lin, H. (Eds.), Advances in neural information processing systems (Vol. 33, pp. 19667\u201319679). Curran Associates, Inc., Online Conference (2020). https:\/\/proceedings.neurips.cc\/paper\/2020\/file\/e3b21256183cf7c2c7a66be163579d37-Paper.pdf"},{"issue":"1","key":"6283_CR94","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41467-020-17866-2","volume":"11","author":"GM van de Ven","year":"2020","unstructured":"van de Ven, G. M., Siegelmann, H. T., & Tolias, A. S. (2020). Brain-inspired replay for continual learning with artificial neural networks. Nature Communications, 11(1), 1\u201314.","journal-title":"Nature Communications"},{"key":"6283_CR96","unstructured":"van\u00a0de Ven, G. M., & Tolias, A. S. (2018). Generative replay with feedback connections as a general strategy for continual learning. arXiv:1809.10635"},{"issue":"5","key":"6283_CR95","doi-asserted-by":"publisher","first-page":"968","DOI":"10.1016\/j.neuron.2016.10.020","volume":"92","author":"GM van de Ven","year":"2016","unstructured":"van de Ven, G. M., Trouche, S., McNamara, C. G., Allen, K., & Dupret, D. (2016). Hippocampal offline reactivation consolidates recently formed cell assembly patterns during sharp wave-ripples. Neuron, 92(5), 968\u2013974.","journal-title":"Neuron"},{"key":"6283_CR97","doi-asserted-by":"publisher","first-page":"156","DOI":"10.1007\/978-3-030-92600-7_15","volume-title":"Computational intelligence in data science","author":"M Vijayan","year":"2021","unstructured":"Vijayan, M., & Sridhar, S. S. (2021). Continual learning for classification problems: A survey. In V. Krishnamurthy, S. Jaganathan, K. Rajaram, & S. Shunmuganathan (Eds.), Computational intelligence in data science (pp. 156\u2013166). 
Springer."},{"key":"6283_CR99","doi-asserted-by":"crossref","unstructured":"Wang, H.-n., Liu, N., Zhang, Y.-y., Feng, D.-w., Huang, F., Li, D.-s., & Zhang, Y.-m. (2020). Deep reinforcement learning: A survey. Frontiers of Information Technology and Electronic Engineering, 1\u201319.","DOI":"10.1631\/FITEE.1900533"},{"key":"6283_CR98","doi-asserted-by":"crossref","unstructured":"Wang, S., Li, X., Sun, J., & Xu, Z. (2021). Training networks in null space of feature covariance for continual learning. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (pp. 184\u2013193).","DOI":"10.1109\/CVPR46437.2021.00025"},{"key":"6283_CR100","unstructured":"Wen, Y., Vicol, P., Ba, J., Tran, D., & Grosse, R. (2018). Flipout: Efficient pseudo-independent weight perturbations on mini-batches. In International conference on learning representations."},{"issue":"5172","key":"6283_CR101","doi-asserted-by":"publisher","first-page":"676","DOI":"10.1126\/science.8036517","volume":"265","author":"MA Wilson","year":"1994","unstructured":"Wilson, M. A., & McNaughton, B. L. (1994). Reactivation of hippocampal ensemble memories during sleep. Science, 265(5172), 676\u2013679.","journal-title":"Science"},{"key":"6283_CR102","unstructured":"Yao, H., Wei, Y., Huang, J., & Li, Z. (2019). Hierarchically structured meta-learning. In Proceedings of the international conference on machine learning (pp. 7045\u20137054)."},{"key":"6283_CR103","unstructured":"Yoon, J., Yang, E., Lee, J., & Hwang, S. J. (2018). Lifelong learning with dynamically expandable networks. In 6th International conference on learning representations, ICLR 2018."},{"key":"6283_CR104","doi-asserted-by":"crossref","unstructured":"Zacarias, A., & Alexandre, L. A. (2018). Sena-cnn: Overcoming catastrophic forgetting in convolutional neural networks by selective network augmentation. In IAPR workshop on artificial neural networks in pattern recognition (pp. 102\u2013112). 
Springer.","DOI":"10.1007\/978-3-319-99978-4_8"},{"issue":"8","key":"6283_CR105","doi-asserted-by":"publisher","first-page":"364","DOI":"10.1038\/s42256-019-0080-x","volume":"1","author":"G Zeng","year":"2019","unstructured":"Zeng, G., Chen, Y., Cui, B., & Yu, S. (2019). Continual learning of context-dependent processing in neural networks. Nature Machine Intelligence, 1(8), 364\u2013372.","journal-title":"Nature Machine Intelligence"},{"key":"6283_CR106","first-page":"3987","volume":"70","author":"F Zenke","year":"2017","unstructured":"Zenke, F., Poole, B., & Ganguli, S. (2017). Continual learning through synaptic intelligence. Proceedings of Machine Learning Research, 70, 3987.","journal-title":"Proceedings of Machine Learning Research"},{"key":"6283_CR107","doi-asserted-by":"crossref","unstructured":"Zhai, M., Chen, L., Tung, F., He, J., Nawhal, M., & Mori, G. (2019). Lifelong gan: Continual learning for conditional image generation. In Proceedings of the IEEE\/CVF international conference on computer vision (pp. 2759\u20132768).","DOI":"10.1109\/ICCV.2019.00285"},{"key":"6283_CR108","unstructured":"Zhang, G., Sun, S., Duvenaud, D., & Grosse, R. (2018). Noisy natural gradient as variational inference. In International conference on machine learning (pp. 5852\u20135861). 
PMLR."}],"container-title":["Machine Learning"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10994-022-06283-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10994-022-06283-9\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10994-022-06283-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,8,24]],"date-time":"2023-08-24T07:08:31Z","timestamp":1692860911000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10994-022-06283-9"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,12,28]]},"references-count":108,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2023,2]]}},"alternative-id":["6283"],"URL":"https:\/\/doi.org\/10.1007\/s10994-022-06283-9","relation":{},"ISSN":["0885-6125","1573-0565"],"issn-type":[{"value":"0885-6125","type":"print"},{"value":"1573-0565","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,12,28]]},"assertion":[{"value":"17 February 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"1 November 2022","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"12 November 2022","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"28 December 2022","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"All authors certify that they have no affiliations with 
or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval"}},{"value":"Open Source code implementing MoVE Layers is available under .","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Code availability"}},{"value":"Not applicable.","order":5,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent to participate"}},{"value":"Not applicable.","order":6,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}}]}}