{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,11]],"date-time":"2026-04-11T13:12:00Z","timestamp":1775913120677,"version":"3.50.1"},"reference-count":358,"publisher":"Emerald","issue":"2-3","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,4,3]]},"abstract":"<jats:p>While deep reinforcement learning (RL) has fueled multiple high-profile successes in machine learning, it is held back from more widespread adoption by its often poor data efficiency and the limited generality of the policies it produces. A promising approach for alleviating these limitations is to cast the development of better RL algorithms as a machine learning problem itself in a process called meta-RL. Meta-RL is most commonly studied in a problem setting where, given a distribution of tasks, the goal is to learn a policy that is capable of adapting to any new task from the task distribution with as little data as possible. In this survey, we describe the meta-RL problem setting in detail as well as its major variations. We discuss how, at a high level, meta-RL research can be clustered based on the presence of a task distribution and the learning budget available for each individual task. Using these clusters, we then survey meta-RL algorithms and applications. 
We conclude by presenting the open problems on the path to making meta-RL part of the standard toolbox for a deep RL practitioner.<\/jats:p>","DOI":"10.1561\/2200000080","type":"journal-article","created":{"date-parts":[[2025,4,3]],"date-time":"2025-04-03T04:50:49Z","timestamp":1743655849000},"page":"224-384","source":"Crossref","is-referenced-by-count":26,"title":["A Tutorial on Meta-Reinforcement Learning"],"prefix":"10.1561","volume":"18","author":[{"given":"Jacob","family":"Beck","sequence":"first","affiliation":[{"name":"University of Oxford","place":["UK"]}]},{"given":"Risto","family":"Vuorio","sequence":"additional","affiliation":[{"name":"University of Oxford","place":["UK"]}]},{"given":"Evan Zheran","family":"Liu","sequence":"additional","affiliation":[{"name":"Stanford University","place":["USA"]}]},{"given":"Zheng","family":"Xiong","sequence":"additional","affiliation":[{"name":"University of Oxford","place":["UK"]}]},{"given":"Luisa","family":"Zintgraf","sequence":"additional","affiliation":[{"name":"University of Oxford","place":["UK"]}]},{"given":"Chelsea","family":"Finn","sequence":"additional","affiliation":[{"name":"Stanford University","place":["USA"]}]},{"given":"Shimon","family":"Whiteson","sequence":"additional","affiliation":[{"name":"University of Oxford","place":["UK"]}]}],"member":"140","published-online":{"date-parts":[[2025,4,3]]},"reference":[{"key":"2026033012321563800_ref001","first-page":"20095","article-title":"Flambe: Structural complexity and representation learning of low rank mdps","volume":"33","author":"Agarwal","year":"2020","journal-title":"Advances in neural information processing systems."},{"key":"2026033012321563800_ref002","volume-title":"Distributionally Adaptive Meta Reinforcement Learning","author":"Ajay","year":"2022"},{"key":"2026033012321563800_ref003","volume-title":"Solving rubik\u2019s cube with a robot 
hand","author":"Akkaya","year":"2019"},{"key":"2026033012321563800_ref004","first-page":"73","volume-title":"Learning for Dynamics and Control.","author":"Akuzawa","year":"2021"},{"key":"2026033012321563800_ref005","volume-title":"Deep Variational Information Bottleneck","author":"Alemi","year":"2017"},{"key":"2026033012321563800_ref006","volume-title":"Meta-learning curiosity algorithms","author":"Alet","year":"2020"},{"key":"2026033012321563800_ref007","volume-title":"A generalizable approach to learning optimizers","author":"Almeida","year":"2021"},{"key":"2026033012321563800_ref008","volume-title":"A Brief Look at Generalization in Visual Meta-Reinforcement Learning","author":"Alver","year":"2020"},{"key":"2026033012321563800_ref009","volume-title":"An automated measure of mdp similarity for transfer in reinforcement learning","author":"Ammar","year":"2014"},{"key":"2026033012321563800_ref010","article-title":"Learning to learn by gradient descent by gradient descent","volume-title":"Advances in neural information processing systems.","author":"Andrychowicz","year":"2016"},{"key":"2026033012321563800_ref011","first-page":"4568","volume-title":"Meta-Learning for Fast Adaptive Locomotion with Uncertainties in Environments and Robot Dynamics","author":"Anne","year":"2021"},{"key":"2026033012321563800_ref012","doi-asserted-by":"publisher","first-page":"2725","DOI":"10.1109\/ICRA40945.2020.9196540","volume-title":"Meta Reinforcement Learning for Sim-to-real Domain Adaptation","author":"Arndt","year":"2020"},{"key":"2026033012321563800_ref013","doi-asserted-by":"crossref","unstructured":"Arumugam, D. and S.Singh. (2022). \u201cPlanning to the Information Horizon of BAMDPs via Epistemic State Abstraction\u201d. In: Advances in Neural Information Processing Systems. Ed. by A. H.Oh, A.Agarwal, D.Belgrave, and K.Cho. 
url: https:\/\/openreview.net\/forum?id=7eUOC9fEIRO.","DOI":"10.52202\/068431-1489"},{"key":"2026033012321563800_ref014","volume-title":"A survey on intrinsic motivation in reinforcement learning","author":"Aubret","year":"2019"},{"key":"2026033012321563800_ref015","doi-asserted-by":"crossref","first-page":"149","DOI":"10.1613\/jair.731","article-title":"A model of inductive bias learning","volume":"12","author":"Baxter","year":"2000","journal-title":"Journal of artificial intelligence research."},{"key":"2026033012321563800_ref016","first-page":"4161","volume-title":"Meta Learning via Learned Loss","author":"Bechtle","year":"2021"},{"key":"2026033012321563800_ref017","volume-title":"AMRL: Aggregated Memory For Reinforcement Learning","author":"Beck","year":"2020"},{"key":"2026033012321563800_ref018","article-title":"Hypernetworks in Meta-Reinforcement Learning","volume-title":"CoRL.","author":"Beck","year":"2022"},{"key":"2026033012321563800_ref019","volume-title":"SplAgger: Split Aggregation for Meta-Reinforcement Learning","author":"Beck","year":"2024"},{"key":"2026033012321563800_ref020","volume-title":"Metalic: Meta-Learning In-Context with Protein Language Models","author":"Beck","year":"2025"},{"key":"2026033012321563800_ref021","doi-asserted-by":"crossref","DOI":"10.52202\/075280-2714","volume-title":"Recurrent Hypernetworks are Surprisingly Strong in Meta-RL","author":"Beck","year":"2023"},{"issue":"2","key":"2026033012321563800_ref022","doi-asserted-by":"crossref","first-page":"1471","DOI":"10.1109\/LRA.2021.3057046","article-title":"Model-based meta-reinforcement learning for flight with suspended payloads","volume":"6","author":"Belkhale","year":"2021","journal-title":"IEEE Robotics and Automation Letters."},{"key":"2026033012321563800_ref023","article-title":"Long short-term memory and Learning-to-learn in networks of spiking 
neurons","volume-title":"NeurIPS.","author":"Bellec","year":"2018"},{"issue":"7836","key":"2026033012321563800_ref024","doi-asserted-by":"crossref","first-page":"77","DOI":"10.1038\/s41586-020-2939-8","article-title":"Autonomous navigation of stratospheric balloons using reinforcement learning","volume":"588","author":"Bellemare","year":"2020","journal-title":"Nature."},{"key":"2026033012321563800_ref025","doi-asserted-by":"crossref","first-page":"253","DOI":"10.1613\/jair.3912","article-title":"The arcade learning environment: An evaluation platform for general agents","volume":"47","author":"Bellemare","year":"2013","journal-title":"Journal of Artificial Intelligence Research."},{"key":"2026033012321563800_ref026","volume-title":"CARL: A Benchmark for Contextual and Adaptive Reinforcement Learning","author":"Benjamins","year":"2021"},{"key":"2026033012321563800_ref027","volume-title":"Comps: Continual meta policy search","author":"Berseth","year":"2021"},{"key":"2026033012321563800_ref028","doi-asserted-by":"crossref","DOI":"10.1109\/LRA.2020.2977835","article-title":"Learning One-Shot Imitation from Humans without Humans","volume-title":"CVPR.","author":"Bonardi","year":"2020"},{"key":"2026033012321563800_ref029","volume-title":"One Step at a Time: Pros and Cons of Multi-Step Meta-Gradient Reinforcement Learning","author":"Bonnet","year":"2021"},{"key":"2026033012321563800_ref030","volume-title":"Debiasing Meta-Gradient Reinforcement Learning by Learning the Outer Value Function","author":"Bonnet","year":"2022"},{"key":"2026033012321563800_ref031","first-page":"2122","volume-title":"Differentiable Meta-Learning of Bandit Policies","author":"Boutilier","year":"2020"},{"key":"2026033012321563800_ref032","volume-title":"Openai gym","author":"Brockman","year":"2016"},{"key":"2026033012321563800_ref033","article-title":"Language Models are Few-Shot Learners","volume-title":"Advances in Neural Information Processing 
Systems.","author":"Brown","year":"2020"},{"key":"2026033012321563800_ref034","volume-title":"Exploration by random network distillation","author":"Burda","year":"2019"},{"key":"2026033012321563800_ref035","volume-title":"Learning to Prioritize Planning Updates in Model-based Reinforcement Learning","author":"Burega","year":"2022"},{"key":"2026033012321563800_ref036","doi-asserted-by":"crossref","unstructured":"Chalvidal, M., T.Serre, and R.VanRullen. (2022). \u201cMeta-Reinforcement Learning with Self-Modifying Networks\u201d. In: Advances in Neural Information Processing Systems. Ed. by A. H.Oh, A.Agarwal, D.Belgrave, and K.Cho. URL: https:\/\/openreview.net\/forum?id=cYeYzaP-5AF.","DOI":"10.52202\/068431-0569"},{"key":"2026033012321563800_ref037","volume-title":"Transformers generalize differently from information stored in context vs in weights","author":"Chan","year":"2022"},{"key":"2026033012321563800_ref038","first-page":"1478","volume-title":"Learning to Cooperate with Unseen Agents Through Meta-Reinforcement Learning","author":"Charakorn","year":"2021"},{"key":"2026033012321563800_ref039","first-page":"3414","volume-title":"Toward a thousand lights: Decentralized deep reinforcement learning for large-scale traffic signal control","author":"Chen","year":"2020"},{"key":"2026033012321563800_ref040","volume-title":"Understanding Domain Randomization for Sim-to-real Transfer","author":"Chen","year":"2022"},{"key":"2026033012321563800_ref041","first-page":"748","volume-title":"Learning to learn without gradient descent by gradient descent","author":"Chen","year":"2017"},{"key":"2026033012321563800_ref042","doi-asserted-by":"crossref","first-page":"31741","DOI":"10.52202\/068431-2301","article-title":"Provable benefit of multitask representation learning in reinforcement learning","volume":"35","author":"Cheng","year":"2022","journal-title":"Advances in Neural Information Processing Systems."},{"key":"2026033012321563800_ref043","article-title":"ContraBAR: 
Contrastive Bayes-Adaptive Deep RL","volume-title":"ICML.","author":"Choshen","year":"2023"},{"key":"2026033012321563800_ref044","volume-title":"PaLM: Scaling Language Modeling with Pathways","author":"Chowdhery","year":"2022"},{"key":"2026033012321563800_ref045","article-title":"Model-Based Reinforcement Learning via Meta-Policy Optimization","volume-title":"CoRL.","author":"Clavera","year":"2018"},{"key":"2026033012321563800_ref046","first-page":"2048","volume-title":"Leveraging procedural generation to benchmark reinforcement learning","author":"Cobbe","year":"2020"},{"key":"2026033012321563800_ref047","first-page":"4238","volume-title":"MAML and ANIL provably learn representations","author":"Collins","year":"2022"},{"key":"2026033012321563800_ref048","doi-asserted-by":"crossref","DOI":"10.1109\/IROS45743.2020.9341076","article-title":"Self-Adapting Recurrent Models for Object Pushing from Learning in Simulation","volume-title":"IROS.","author":"Cong","year":"2020"},{"key":"2026033012321563800_ref049","first-page":"2376","volume-title":"Demonstration-Conditioned Reinforcement Learning for Few-Shot Imitation","author":"Dance","year":"2021"},{"key":"2026033012321563800_ref050","unstructured":"Dasari, S. and A.Gupta. (2020). \u201cTransformers for One-Shot Visual Imitation\u201d. CoRL. abs\/2011.05970. 
URL: https:\/\/arxiv.org\/abs\/2011.05970."},{"key":"2026033012321563800_ref051","volume-title":"The effects of negative adaptation in model-agnostic meta-learning","author":"Deleu","year":"2018"},{"key":"2026033012321563800_ref052","first-page":"1566","volume-title":"Learning-to-learn stochastic gradient descent with biased regularization","author":"Denevi","year":"2019"},{"key":"2026033012321563800_ref053","first-page":"13049","article-title":"Emergent complexity and zero-shot transfer via unsupervised environment design","volume":"33","author":"Dennis","year":"2020","journal-title":"Advances in neural information processing systems."},{"key":"2026033012321563800_ref054","volume-title":"BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding","author":"Devlin","year":"2019"},{"key":"2026033012321563800_ref055","first-page":"4607","article-title":"Offline Meta Reinforcement Learning-Identifiability Challenges and Effective Data Collection Strategies","volume":"34","author":"Dorfman","year":"2021","journal-title":"Advances in Neural Information Processing Systems."},{"key":"2026033012321563800_ref056","article-title":"One-Shot Imitation Learning","volume-title":"NIPS.","author":"Duan","year":"2017"},{"key":"2026033012321563800_ref057","first-page":"1329","volume-title":"Benchmarking deep reinforcement learning for continuous control","author":"Duan","year":"2016"},{"key":"2026033012321563800_ref058","volume-title":"RL2: Fast reinforcement learning via slow reinforcement learning","author":"Duan","year":"2016"},{"key":"2026033012321563800_ref059","volume-title":"Optimal Learning: Computational Procedures for Bayes-Adaptive Markov Decision Processes","author":"Duff","year":"2002"},{"key":"2026033012321563800_ref060","volume-title":"ReLIC: A Recipe for 64k Steps of In-Context Reinforcement Learning for Embodied AI","author":"Elawady","year":"2024"},{"key":"2026033012321563800_ref061","volume-title":"SMACv2: An Improved Benchmark for 
Cooperative Multi-Agent Reinforcement Learning","author":"Ellis","year":"2022"},{"key":"2026033012321563800_ref062","volume-title":"Successor Feature Neural Episodic Control","author":"Emukpere","year":"2021"},{"key":"2026033012321563800_ref063","first-page":"1407","volume-title":"Impala: Scalable distributed deep-rl with importance weighted actor-learner architectures","author":"Espeholt","year":"2018"},{"key":"2026033012321563800_ref064","article-title":"Diversity is All You Need: Learning Skills without a Reward Function","volume-title":"ICLR.","author":"Eysenbach","year":"2019"},{"key":"2026033012321563800_ref065","volume-title":"Meta-Q-Learning","author":"Fakoor","year":"2020"},{"key":"2026033012321563800_ref066","volume-title":"Provably convergent policy gradient methods for model-agnostic meta-reinforcement learning","author":"Fallah","year":"2020"},{"key":"2026033012321563800_ref067","first-page":"3096","article-title":"On the convergence theory of debiased model-agnostic meta-reinforcement learning","volume":"34","author":"Fallah","year":"2021","journal-title":"Advances in Neural Information Processing Systems."},{"key":"2026033012321563800_ref068","article-title":"Loaded DiCE: Trading off bias and variance in any-order score function gradient estimators for reinforcement learning","volume-title":"Advances in Neural Information Processing Systems.","author":"Farquhar","year":"2019"},{"key":"2026033012321563800_ref069","first-page":"34","article-title":"Neural auto-curricula in two-player zero-sum games","volume-title":"Advances in Neural Information Processing Systems.","author":"Feng","year":"2021"},{"key":"2026033012321563800_ref070","first-page":"1126","volume-title":"Model-agnostic meta-learning for fast adaptation of deep networks","author":"Finn","year":"2017"},{"key":"2026033012321563800_ref071","article-title":"Meta-Learning and Universality: Deep Representations and Gradient Descent can Approximate any Learning 
Algorithm","volume-title":"ICLR.","author":"Finn","year":"2018"},{"key":"2026033012321563800_ref072","first-page":"1920","volume-title":"Online meta-learning","author":"Finn","year":"2019"},{"key":"2026033012321563800_ref073","article-title":"One-Shot Visual Imitation Learning via Meta-Learning","volume-title":"CoRL.","author":"Finn","year":"2017"},{"key":"2026033012321563800_ref074","volume-title":"Meta-Learning with Warped Gradient Descent","author":"Flennerhag","year":"2020"},{"key":"2026033012321563800_ref075","volume-title":"Bootstrapped Meta-Learning","author":"Flennerhag","year":"2021"},{"key":"2026033012321563800_ref076","first-page":"1529","volume-title":"Dice: The infinitely differentiable monte carlo estimator","author":"Foerster","year":"2018"},{"key":"2026033012321563800_ref077","doi-asserted-by":"crossref","DOI":"10.65109\/HGWA8807","article-title":"Learning with Opponent-Learning Awareness","volume-title":"AAMAS.","author":"Foerster","year":"2018"},{"key":"2026033012321563800_ref078","article-title":"Generalization of Reinforcement Learners with Working and Episodic Memory","volume-title":"NeurIPS.","author":"Fortunato","year":"2019"},{"key":"2026033012321563800_ref079","first-page":"7457","volume-title":"Towards effective context for meta-reinforcement learning: an approach based on contrastive learning","author":"Fu","year":"2021"},{"key":"2026033012321563800_ref080","article-title":"Off-Policy Deep Reinforcement Learning without Exploration","volume-title":"ICML.","author":"Fujimoto","year":"2019"},{"key":"2026033012321563800_ref081","volume-title":"Meta-Learning surrogate models for sequential decision making","author":"Galashov","year":"2019"},{"key":"2026033012321563800_ref082","volume-title":"Transferring Hierarchical Structures with Dual Meta Imitation Learning","author":"Gao","year":"2022"},{"key":"2026033012321563800_ref083","first-page":"11154","volume-title":"Modeling and Optimization Tradeoff in 
Meta-learning","author":"Gao","year":"2020"},{"key":"2026033012321563800_ref084","first-page":"305","volume-title":"Fast adaptation with meta-reinforcement learning for trust modelling in human-robot interaction","author":"Gao","year":"2019"},{"key":"2026033012321563800_ref085","volume-title":"Multi-objective evolution for generalizable policy gradient algorithms","author":"Garau-Luis","year":"2022"},{"key":"2026033012321563800_ref086","doi-asserted-by":"crossref","DOI":"10.65109\/IHUS5472","article-title":"A Meta-MDP Approach to Exploration for Lifelong Reinforcement Learning","volume-title":"NeurIPS.","author":"Garcia","year":"2019"},{"key":"2026033012321563800_ref087","article-title":"Neural Processes","volume-title":"ICML.","author":"Garnelo","year":"2018"},{"key":"2026033012321563800_ref088","volume-title":"Meta-RL for Multi-Agent RL: Learning to Adapt to Evolving Agents","author":"Gerstgrasser","year":"2022"},{"key":"2026033012321563800_ref089","doi-asserted-by":"publisher","first-page":"1274","DOI":"10.1109\/IROS51168.2021.9636628","volume-title":"Bayesian Meta-Learning for Few-Shot Policy Adaptation Across Robotic Platforms","author":"Ghadirzadeh","year":"2021"},{"issue":"5-6","key":"2026033012321563800_ref090","doi-asserted-by":"crossref","first-page":"359","DOI":"10.1561\/2200000049","article-title":"Bayesian reinforcement learning: A survey","volume":"8","author":"Ghavamzadeh","year":"2015","journal-title":"Foundations and Trends\u00ae in Machine Learning."},{"key":"2026033012321563800_ref091","first-page":"7513","volume-title":"Offline rl policies should be trained to be adaptive","author":"Ghosh","year":"2022"},{"key":"2026033012321563800_ref092","article-title":"Can Learned Optimization Make Reinforcement Learning Less Difficult?","volume-title":"Advances in Neural Information Processing Systems.","author":"Goldie","year":"2024"},{"key":"2026033012321563800_ref093","first-page":"7755","volume-title":"One-shot learning of multi-step tasks from 
observation via activity localization in auxiliary video","author":"Goo","year":"2019"},{"key":"2026033012321563800_ref094","unstructured":"Graves, A., G.Wayne, and I.Danihelka. (2014). \u201cNeural Turing Machines\u201d. arXiv. abs\/1410.5401. URL: http:\/\/arxiv.org\/abs\/1410.5401."},{"key":"2026033012321563800_ref095","volume-title":"Train Hard, Fight Easy: Robust Meta Reinforcement Learning","author":"Greenberg","year":"2023"},{"key":"2026033012321563800_ref096","volume-title":"Variance-Seeking Meta-Exploration to Handle Out-of-Distribution Tasks","author":"Grewal","year":"2021"},{"key":"2026033012321563800_ref097","article-title":"AMAGO: Scalable In-Context Reinforcement Learning for Adaptive Agents","volume-title":"ICLR.","author":"Grigsby","year":"2024"},{"key":"2026033012321563800_ref098","first-page":"1802","volume-title":"Learning Policy Representations in Multiagent Systems","author":"Grover","year":"2018"},{"key":"2026033012321563800_ref099","doi-asserted-by":"crossref","first-page":"841","DOI":"10.1613\/jair.4117","article-title":"Scalable and efficient Bayes-adaptive reinforcement learning based on Monte-Carlo tree search","volume":"48","author":"Guez","year":"2013","journal-title":"Journal of Artificial Intelligence Research."},{"key":"2026033012321563800_ref100","first-page":"6792","volume-title":"Learning Action Translator for Meta Reinforcement Learning on Sparse-Reward Tasks","author":"Guo","year":"2022"},{"key":"2026033012321563800_ref101","volume-title":"Neural predictive belief representations","author":"Guo","year":"2018"},{"key":"2026033012321563800_ref102","article-title":"Dynamic population-based meta-learning for multi-agent communication with natural language","volume-title":"NeurIPS.","author":"Gupta","year":"2021"},{"key":"2026033012321563800_ref103","volume-title":"Unsupervised Meta-Learning for Reinforcement Learning","author":"Gupta","year":"2018"},{"key":"2026033012321563800_ref104","volume-title":"Meta-Reinforcement Learning of 
Structured Exploration Strategies","author":"Gupta","year":"2018"},{"key":"2026033012321563800_ref105","first-page":"910","volume-title":"MAME: Model-Agnostic Meta-Exploration","author":"Gurumurthy","year":"2020"},{"key":"2026033012321563800_ref106","volume-title":"Hypernetworks","author":"Ha","year":"2017"},{"key":"2026033012321563800_ref107","first-page":"1861","volume-title":"Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor","author":"Haarnoja","year":"2018"},{"key":"2026033012321563800_ref108","volume-title":"Contextual Markov Decision Processes","author":"Hallak","year":"2015"},{"key":"2026033012321563800_ref109","volume-title":"Control adaptation via meta-learning dynamics","author":"Harrison","year":"2018"},{"key":"2026033012321563800_ref110","article-title":"Deep Recurrent Q-Learning for Partially Observable MDPs","volume-title":"AAAI Fall Symposia.","author":"Hausknecht","year":"2015"},{"key":"2026033012321563800_ref111","volume-title":"Learning Representations that Enable Generalization in Assistive Tasks","author":"He","year":"2022"},{"key":"2026033012321563800_ref112","volume-title":"The organization of behavior; a neuropsychological theory.","author":"Hebb","year":"1949"},{"key":"2026033012321563800_ref113","volume-title":"Memory-based control with recurrent neural networks","author":"Heess","year":"2015"},{"key":"2026033012321563800_ref114","volume-title":"Few-Shot Preference Learning for Human-in-the-Loop RL","author":"Hejna","year":"2022"},{"key":"2026033012321563800_ref115","first-page":"3796","volume-title":"Multi-task deep reinforcement learning with popart","author":"Hessel","year":"2019"},{"key":"2026033012321563800_ref116","first-page":"129","volume-title":"Meta-model-based meta-policy optimization","author":"Hiraoka","year":"2021"},{"key":"2026033012321563800_ref117","first-page":"87","volume-title":"Learning to learn using gradient 
descent","author":"Hochreiter","year":"2001"},{"key":"2026033012321563800_ref118","volume-title":"Meta-learning in neural networks: A survey","author":"Hospedales","year":"2020"},{"key":"2026033012321563800_ref119","volume-title":"Evolved Policy Gradients","author":"Houthooft","year":"2018"},{"key":"2026033012321563800_ref120","first-page":"4349","volume-title":"Near-optimal representation learning for linear bandits and linear rl","author":"Hu","year":"2021"},{"issue":"6","key":"2026033012321563800_ref121","doi-asserted-by":"crossref","first-page":"4483","DOI":"10.1007\/s10462-021-10004-4","article-title":"A survey of deep meta-learning","volume":"54","author":"Huisman","year":"2021","journal-title":"Artificial Intelligence Review."},{"key":"2026033012321563800_ref122","volume-title":"Meta reinforcement learning as task inference","author":"Humplik","year":"2019"},{"key":"2026033012321563800_ref123","doi-asserted-by":"crossref","first-page":"49494","DOI":"10.1109\/ACCESS.2022.3170582","article-title":"Off-Policy Meta-Reinforcement Learning With Belief-Based Task Inference","volume":"10","author":"Imagawa","year":"2022","journal-title":"IEEE Access."},{"key":"2026033012321563800_ref124","first-page":"32","article-title":"Unsupervised curricula for visual meta-reinforcement learning","volume-title":"Advances in Neural Information Processing Systems.","author":"Jabri","year":"2019"},{"key":"2026033012321563800_ref125","volume-title":"Discovering General Reinforcement Learning Algorithms with Adversarial Environment Design","author":"Jackson","year":"2023"},{"key":"2026033012321563800_ref126","volume-title":"Reinforcement learning with unsupervised auxiliary tasks","author":"Jaderberg","year":"2016"},{"key":"2026033012321563800_ref127","article-title":"Task-Embedded Control Networks for Few-Shot Imitation 
Learning","volume-title":"CoRL.","author":"James","year":"2018"},{"issue":"2","key":"2026033012321563800_ref128","doi-asserted-by":"crossref","first-page":"3019","DOI":"10.1109\/LRA.2020.2974707","article-title":"Rlbench: The robot learning benchmark & learning environment","volume":"5","author":"James","year":"2020","journal-title":"IEEE Robotics and Automation Letters."},{"key":"2026033012321563800_ref129","volume-title":"BC-Z: Zero-Shot Task Generalization with Robotic Imitation Learning","author":"Jang","year":"2021"},{"key":"2026033012321563800_ref130","first-page":"2137","volume-title":"Provably efficient reinforcement learning with linear function approximation","author":"Jin","year":"2020"},{"key":"2026033012321563800_ref131","first-page":"20813","volume-title":"Probabilistic Active Meta-Learning","author":"Kaddour","year":"2020"},{"key":"2026033012321563800_ref132","volume-title":"Learning adaptive exploration strategies in dynamic environments through informed policy regularization","author":"Kamienny","year":"2020"},{"key":"2026033012321563800_ref133","article-title":"Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention","volume-title":"ICML.","author":"Katharopoulos","year":"2020"},{"key":"2026033012321563800_ref134","first-page":"5269","volume-title":"Fast online adaptation in robotics through meta-learning embeddings of simulated priors","author":"Kaushik","year":"2020"},{"key":"2026033012321563800_ref135","first-page":"32","volume-title":"Advances in Neural Information Processing Systems.","author":"Khodak","year":"2019"},{"key":"2026033012321563800_ref136","first-page":"5541","volume-title":"A policy gradient algorithm for learning to learn in multiagent reinforcement learning","author":"Kim","year":"2021"},{"key":"2026033012321563800_ref137","doi-asserted-by":"crossref","first-page":"201","DOI":"10.1613\/jair.1.14174","article-title":"A survey of zero-shot generalisation in deep reinforcement 
learning","volume":"76","author":"Kirk","year":"2023","journal-title":"Journal of Artificial Intelligence Research."},{"key":"2026033012321563800_ref138","doi-asserted-by":"crossref","DOI":"10.1609\/aaai.v36i7.20681","article-title":"Introducing Symmetries to Black Box Meta Reinforcement Learning","volume-title":"AAAI","author":"Kirsch","year":"2022"},{"key":"2026033012321563800_ref139","volume-title":"Towards General-Purpose In-Context Learning Agents","author":"Kirsch","year":"2023"},{"key":"2026033012321563800_ref140","volume-title":"General-purpose in-context learning by meta-learning transformers","author":"Kirsch","year":"2022"},{"key":"2026033012321563800_ref141","volume-title":"Improving generalization in meta reinforcement learning using learned objectives","author":"Kirsch","year":"2019"},{"key":"2026033012321563800_ref142","volume-title":"Exchangeable Models in Meta Reinforcement Learning","author":"Korshunova","year":"2020"},{"key":"2026033012321563800_ref143","doi-asserted-by":"crossref","DOI":"10.15607\/RSS.2021.XVII.011","article-title":"RMA: Rapid Motor Adaptation for Legged Robots","volume-title":"RSS.","author":"Kumar","year":"2021"},{"key":"2026033012321563800_ref144","volume-title":"Meta-Learning of Compositional Task Distributions in Humans and Machines","author":"Kumar","year":"2020"},{"key":"2026033012321563800_ref145","doi-asserted-by":"publisher","DOI":"10.15607\/RSS.2021.XVII.065","volume-title":"Error-Aware Policy Learning: Zero-Shot Generalization in Partially Observable Dynamic Environments","author":"Kumar","year":"2021"},{"key":"2026033012321563800_ref146","first-page":"7671","article-title":"The nethack learning environment","volume":"33","author":"K\u00fcttler","year":"2020","journal-title":"Advances in Neural Information Processing Systems."},{"key":"2026033012321563800_ref147","first-page":"5884","volume-title":"Meta-Thompson 
Sampling","author":"Kveton","year":"2021"},{"key":"2026033012321563800_ref148","first-page":"2794","volume-title":"Meta reinforcement learning with task embedding and shared policy","author":"Lan","year":"2019"},{"key":"2026033012321563800_ref149","volume-title":"Learning to Optimize for Reinforcement Learning","author":"Lan","year":"2023"},{"key":"2026033012321563800_ref150","volume-title":"Learning not to learn: Nature versus nurture in silico","author":"Lange","year":"2020"},{"key":"2026033012321563800_ref151","article-title":"In-context reinforcement learning with algorithm distillation","volume-title":"ICLR.","author":"Laskin","year":"2023"},{"key":"2026033012321563800_ref152","first-page":"5725","volume-title":"Batch reinforcement learning with hyperparameter gradients","author":"Lee","year":"2020"},{"key":"2026033012321563800_ref153","volume-title":"Bayesian Policy Optimization for Model Uncertainty","author":"Lee","year":"2019"},{"key":"2026033012321563800_ref154","first-page":"36","article-title":"Supervised pretraining can learn in-context reinforcement learning","volume-title":"Advances in Neural Information Processing Systems.","author":"Lee","year":"2024"},{"key":"2026033012321563800_ref155","article-title":"Context-aware Dynamics Model for Generalization in Model-Based Reinforcement Learning","volume-title":"ICML.","author":"Lee","year":"2020"},{"key":"2026033012321563800_ref156","first-page":"34","article-title":"Improving Generalization in Meta-RL with Imaginary Tasks from Latent Dynamics Mixture","volume-title":"Advances in Neural Information Processing Systems.","author":"Lee","year":"2021"},{"key":"2026033012321563800_ref157","volume-title":"Offline reinforcement learning: Tutorial, review, and perspectives on open problems","author":"Levine","year":"2020"},{"key":"2026033012321563800_ref158","volume-title":"Meta-imitation learning by watching video 
demonstrations","author":"Li","year":"2021"},{"key":"2026033012321563800_ref159","volume-title":"Learning to optimize","author":"Li","year":"2016"},{"key":"2026033012321563800_ref160","article-title":"Learning to Optimize","volume-title":"ICLR.","author":"Li","year":"2017"},{"key":"2026033012321563800_ref161","first-page":"6346","volume-title":"MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning","author":"Li","year":"2021"},{"key":"2026033012321563800_ref162","volume-title":"FOCAL: Efficient Fully-Offline Meta-Reinforcement Learning via Distance Metric Learning and Behavior Regularization","author":"Li","year":"2021"},{"key":"2026033012321563800_ref163","unstructured":"Li, Z., F.Zhou, F.Chen, and H.Li. (2017). \u201cMeta-SGD: Learning to Learn Quickly for Few Shot Learning\u201d. CoRR. abs\/1707.09835. url: http:\/\/arxiv.org\/abs\/1707.09835."},{"key":"2026033012321563800_ref164","unstructured":"Lillicrap, T. P., J. J.Hunt, A.Pritzel, N.Heess, T.Erez, Y.Tassa, D.Silver, and D.Wierstra. (2016). \u201cContinuous control with deep reinforcement learning.\u201d In: ICLR (Poster). 
url: http:\/\/arxiv.org\/abs\/1509.02971."},{"key":"2026033012321563800_ref165","volume-title":"Model-Based Offline Meta-Reinforcement Learning with Regularization","author":"Lin","year":"2022"},{"key":"2026033012321563800_ref166","article-title":"Adaptive Auxiliary Task Weighting for Reinforcement Learning","volume-title":"NeurIPS.","author":"Lin","year":"2019"},{"key":"2026033012321563800_ref167","article-title":"Model-based Adversarial Meta-Reinforcement Learning","volume-title":"Advances in Neural Information Processing Systems.","author":"Lin","year":"2020"},{"key":"2026033012321563800_ref168","first-page":"10161","article-title":"Model-based adversarial meta-reinforcement learning","volume":"33","author":"Lin","year":"2020","journal-title":"Advances in Neural Information Processing Systems."},{"key":"2026033012321563800_ref169","first-page":"7525","volume-title":"Lifelong hyper-policy optimization with multiple importance sampling regularization","author":"Liotet","year":"2022"},{"key":"2026033012321563800_ref170","unstructured":"Liu, B., X.Feng, J.Ren, L.Mai, R.Zhu, H.Zhang, J.Wang, and Y.Yang. (2022a). \u201cA Theoretical Understanding of Gradient Bias in Meta-Reinforcement Learning\u201d. In: Advances in Neural Information Processing Systems. Ed. by A. H.Oh, A.Agarwal, D.Belgrave, and K.Cho. 
URL: https:\/\/openreview.net\/forum?id=p9zeOtKQXKs."},{"key":"2026033012321563800_ref171","first-page":"31059","article-title":"A theoretical understanding of gradient bias in meta-reinforcement learning","volume":"35","author":"Liu","year":"2022","journal-title":"Advances in Neural Information Processing Systems."},{"key":"2026033012321563800_ref172","first-page":"6925","volume-title":"Decoupling Exploration and Exploitation for Meta-Reinforcement Learning without Sacrifices","author":"Liu","year":"2021"},{"key":"2026033012321563800_ref173","article-title":"Giving Feedback on Interactive Student Programs with Meta-Exploration","volume-title":"Advances in Neural Information Processing Systems.","author":"Liu","year":"2022"},{"key":"2026033012321563800_ref174","first-page":"4061","volume-title":"Taming MAML: Efficient unbiased meta-reinforcement learning","author":"Liu","year":"2019"},{"key":"2026033012321563800_ref175","volume-title":"Meta-learning from sparse recovery","author":"Lou","year":"2021"},{"key":"2026033012321563800_ref176","first-page":"16455","volume-title":"Discovered Policy Optimisation","author":"Lu","year":"2022"},{"key":"2026033012321563800_ref177","volume-title":"Adversarial Cheap Talk","author":"Lu","year":"2022"},{"key":"2026033012321563800_ref178","first-page":"14398","volume-title":"Model-free opponent shaping","author":"Lu","year":"2022"},{"key":"2026033012321563800_ref179","volume-title":"Meta-Gradients in Non-Stationary Environments","author":"Luketina","year":"2022"},{"key":"2026033012321563800_ref180","first-page":"20532","article-title":"Information-theoretic task selection for meta-reinforcement learning","volume":"33","author":"Luna Gutierrez","year":"2020","journal-title":"Advances in Neural Information Processing Systems."},{"key":"2026033012321563800_ref181","first-page":"7637","volume-title":"Adapt to environment sudden changes by learning a context sensitive 
policy","author":"Luo","year":"2022"},{"key":"2026033012321563800_ref182","volume-title":"On the Effectiveness of Fine-tuning Versus Meta-RL for Robot Manipulation","author":"Mandi","year":"2022"},{"key":"2026033012321563800_ref183","first-page":"4343","volume-title":"A baseline for any order gradient estimation in stochastic computation graphs","author":"Mao","year":"2019"},{"key":"2026033012321563800_ref184","volume-title":"Curriculum in Gradient-Based Meta-Reinforcement Learning","author":"Mehta","year":"2020"},{"key":"2026033012321563800_ref185","doi-asserted-by":"crossref","unstructured":"Meier, R. and A.Mujika. (2022). \u201cOpen-Ended Reinforcement Learning with Neural Reward Functions\u201d. In: Advances in Neural Information Processing Systems. Ed. by A. H.Oh, A.Agarwal, D.Belgrave, and K.Cho. url: https:\/\/openreview.net\/forum?id=NL05_JGVg99.","DOI":"10.52202\/068431-0179"},{"key":"2026033012321563800_ref186","volume-title":"Transformers are Meta-Reinforcement Learners","author":"Melo","year":"2022"},{"key":"2026033012321563800_ref187","volume-title":"Meta-reinforcement learning robust to distributional shift via model identification and experience relabeling","author":"Mendonca","year":"2020"},{"key":"2026033012321563800_ref188","article-title":"Guided Meta-Policy Search","volume-title":"NeurIPS.","author":"Mendonca","year":"2019"},{"key":"2026033012321563800_ref189","volume-title":"Gradients are not all you need","author":"Metz","year":"2021"},{"key":"2026033012321563800_ref190","first-page":"4556","volume-title":"Understanding and correcting pathologies in the training of learned optimizers","author":"Metz","year":"2019"},{"key":"2026033012321563800_ref191","volume-title":"Backpropamine: training self-modifying neural networks with differentiable neuromodulated plasticity","author":"Miconi","year":"2019"},{"key":"2026033012321563800_ref192","first-page":"3559","volume-title":"Differentiable plasticity: training plastic neural networks with 
backpropagation","author":"Miconi","year":"2018"},{"issue":"62","key":"2026033012321563800_ref193","doi-asserted-by":"crossref","DOI":"10.1126\/scirobotics.abk2822","article-title":"Learning robust perceptive locomotion for quadrupedal robots in the wild","volume":"7","author":"Miki","year":"2022","journal-title":"Science Robotics."},{"key":"2026033012321563800_ref194","volume-title":"A Simple Neural Attentive Meta-Learner","author":"Mishra","year":"2018"},{"key":"2026033012321563800_ref195","unstructured":"Mitchell, E., R.Rafailov, X. B.Peng, S.Levine, and C.Finn. (2020). \u201cOffline Meta-Reinforcement Learning with Advantage Weighting\u201d. CoRR. abs\/2008.06043. URL: https:\/\/arxiv.org\/abs\/2008.06043."},{"key":"2026033012321563800_ref196","first-page":"7780","volume-title":"Offline Meta-Reinforcement Learning with Advantage Weighting","author":"Mitchell","year":"2021"},{"key":"2026033012321563800_ref197","first-page":"1928","volume-title":"Asynchronous methods for deep reinforcement learning","author":"Mnih","year":"2016"},{"key":"2026033012321563800_ref198","volume-title":"Neural belief states for partially observed domains","author":"Moreno","year":"2018"},{"key":"2026033012321563800_ref199","unstructured":"Mu, Y., Y.Zhuang, F.Ni, B.Wang, J.Chen, J.Hao, and P.Luo. (2022). \u201cDOMINO: Decomposed Mutual Information Optimization for Generalized Context in Meta-Reinforcement Learning\u201d. In: Advances in Neural Information Processing Systems. Ed. by A. H.Oh, A.Agarwal, D.Belgrave, and K.Cho. 
url: https:\/\/openreview.net\/forum?id=CJGUABT_COm."},{"key":"2026033012321563800_ref200","first-page":"7850","volume-title":"Unsupervised reinforcement learning in multiple environments","author":"Mutti","year":"2022"},{"key":"2026033012321563800_ref201","volume-title":"Learning to Adapt in Dynamic, Real-World Environments through Meta-Reinforcement Learning","author":"Nagabandi","year":"2019"},{"key":"2026033012321563800_ref202","volume-title":"Learning to Adapt in Dynamic, Real-World Environments through Meta-Reinforcement Learning","author":"Nagabandi","year":"2019"},{"key":"2026033012321563800_ref203","volume-title":"Deep Online Learning Via Meta-Learning: Continual Adaptation for Model-Based RL","author":"Nagabandi","year":"2019"},{"key":"2026033012321563800_ref204","first-page":"20719","volume-title":"Meta-Learning through Hebbian Plasticity in Random Networks","author":"Najarro","year":"2020"},{"key":"2026033012321563800_ref205","volume-title":"Leveraging Fully Observable Policies for Learning under Partial Observability","author":"Nguyen","year":"2022"},{"key":"2026033012321563800_ref206","volume-title":"Recurrent Model-Free RL Can Be a Strong Baseline for Many POMDPs","author":"Ni","year":"2022"},{"key":"2026033012321563800_ref207","volume-title":"On first-order meta-learning algorithms","author":"Nichol","year":"2018"},{"key":"2026033012321563800_ref208","volume-title":"Gotta learn fast: A new benchmark for generalization in rl","author":"Nichol","year":"2018"},{"key":"2026033012321563800_ref209","article-title":"Control of Memory, Active Perception, and Action in Minecraft","volume-title":"ICML.","author":"Oh","year":"2016"},{"key":"2026033012321563800_ref210","volume-title":"Discovering reinforcement learning algorithms","author":"Oh","year":"2020"},{"key":"2026033012321563800_ref211","article-title":"Hindsight Task Relabelling: Experience Replay for Sparse Reward 
Meta-RL","author":"Packer","year":"2021"},{"key":"2026033012321563800_ref212","volume-title":"One-shot high-fidelity imitation: Training large-scale deep nets with rl","author":"Paine","year":"2018"},{"key":"2026033012321563800_ref213","volume-title":"Variational Autoencoders for Opponent Modeling in Multi-Agent Systems","author":"Papoudakis","year":"2020"},{"key":"2026033012321563800_ref214","article-title":"Stabilizing Transformers for Reinforcement Learning","volume-title":"ICML.","author":"Parisotto","year":"2020"},{"key":"2026033012321563800_ref215","article-title":"Meta-Curvature","volume-title":"NeurIPS.","author":"Park","year":"2019"},{"key":"2026033012321563800_ref216","volume-title":"METRA: Scalable Unsupervised RL with Metric-Aware Abstraction","author":"Park","year":"2024"},{"key":"2026033012321563800_ref217","volume-title":"Automated Reinforcement Learning (AutoRL): A Survey and Open Problems","author":"Parker-Holder","year":"2022"},{"key":"2026033012321563800_ref218","article-title":"On the difficulty of training recurrent neural networks","volume-title":"ICML.","author":"Pascanu","year":"2013"},{"key":"2026033012321563800_ref219","unstructured":"Peng, M., B.Zhu, and J.Jiao. (2021). \u201cLinear Representation Meta-Reinforcement Learning for Instant Adaptation\u201d. 
url: https:\/\/openreview.net\/forum?id=lNrtNGkr-vw."},{"key":"2026033012321563800_ref220","first-page":"5403","volume-title":"Generalized hidden parameter mdps: Transferable model-based rl in a handful of trials","author":"Perez","year":"2020"},{"issue":"2","key":"2026033012321563800_ref221","doi-asserted-by":"crossref","first-page":"153","DOI":"10.1007\/s10846-017-0468-y","article-title":"Survey of model-based reinforcement learning: Applications on robotics","volume":"86","author":"Polydoros","year":"2017","journal-title":"Journal of Intelligent & Robotic Systems."},{"key":"2026033012321563800_ref222","first-page":"17811","volume-title":"Offline meta-reinforcement learning with online self-supervision","author":"Pong","year":"2022"},{"key":"2026033012321563800_ref223","unstructured":"Prat, A. and E.Johns. (2021). \u201cPERIL: Probabilistic Embeddings for hybrid Meta-Reinforcement and Imitation Learning\u201d. url: https:\/\/openreview.net\/forum?id=B\u03a0wfP55pp."},{"key":"2026033012321563800_ref224","volume-title":"The Reflective Explorer: Online Meta-Exploration from Offline Data in Realistic Robotic Tasks","author":"Rafailov","year":"2021"},{"key":"2026033012321563800_ref225","article-title":"Rapid Learning or Feature Reuse? Towards Understanding the Effectiveness of MAML","volume-title":"ICLR.","author":"Raghu","year":"2020"},{"key":"2026033012321563800_ref226","article-title":"Fast Adaptation via Policy-Dynamics Value Functions","volume-title":"ICML.","author":"Raileanu","year":"2020"},{"key":"2026033012321563800_ref227","doi-asserted-by":"publisher","first-page":"5454","DOI":"10.1609\/aaai.v34i04.5995",
"article-title":"How Should an Agent Practice?","author":"Rajendran","year":"2020"},{"key":"2026033012321563800_ref228","first-page":"5331","volume-title":"Efficient off-policy meta-reinforcement learning via probabilistic context variables","author":"Rakelly","year":"2019"},{"key":"2026033012321563800_ref229","volume-title":"Generalization to New Sequential Decision Making Tasks with In-Context Learning","author":"Raparthy","year":"2023"},{"key":"2026033012321563800_ref230","article-title":"Optimization as a Model for Few-Shot Learning","volume-title":"ICLR.","author":"Ravi","year":"2017"},{"key":"2026033012321563800_ref231","volume-title":"PhD thesis.","author":"Rechenberg","year":"1971"},{"key":"2026033012321563800_ref232","volume-title":"Leveraging Language for Accelerated Learning of Tool Manipulation","author":"Ren","year":"2022"},{"key":"2026033012321563800_ref233","unstructured":"Ren, Z., A.Liu, Y.Liang, J.Peng, and J.Ma. (2022b). \u201cEfficient Meta Reinforcement Learning for Preference-based Fast Adaptation\u201d. In: Advances in Neural Information Processing Systems. Ed. by A. H.Oh, A.Agarwal, D.Belgrave, and K.Cho. URL: https:\/\/openreview.net\/forum?id=6lUwgelotn."},{"key":"2026033012321563800_ref234","doi-asserted-by":"crossref","unstructured":"Rengarajan, D., S.Chaudhary, J.Kim, D.Kalathil, and S.Shakkottai. (2022). \u201cEnhanced Meta Reinforcement Learning via Demonstrations in Sparse Reward Environments\u201d. In: Advances in Neural Information Processing Systems. Ed. by A. H.Oh, A.Agarwal, D.Belgrave, and K.Cho. 
URL: https:\/\/openreview.net\/forum?id=kCtnkLv-WO.","DOI":"10.52202\/068431-0198"},{"key":"2026033012321563800_ref235","volume-title":"Accelerating Online Reinforcement Learning via Model-Based Meta-Learning","author":"Co-Reyes","year":"2021"},{"key":"2026033012321563800_ref236","volume-title":"Evolving Reinforcement Learning Algorithms","author":"Co-Reyes","year":"2021"},{"key":"2026033012321563800_ref237","volume-title":"Learning to Learn without Forgetting By Maximizing Transfer and Minimizing Interference","author":"Riemer","year":"2019"},{"key":"2026033012321563800_ref238","unstructured":"Rimon, Z., A.Tamar, and G.Adler. (2022). \u201cMeta Reinforcement Learning with Finite Training Tasks - a Density Estimation Approach\u201d. In: Advances in Neural Information Processing Systems. Ed. by A. H.Oh, A.Agarwal, D.Belgrave, and K.Cho. url: https:\/\/openreview.net\/forum?id=Y-sdZLIi9R9."},{"key":"2026033012321563800_ref239","volume-title":"Rapid Task-Solving in Novel Environments","author":"Ritter","year":"2021"},{"key":"2026033012321563800_ref240","article-title":"Been There, Done That: Meta-Learning with Episodic Recall","volume-title":"ICML.","author":"Ritter","year":"2018"},{"key":"2026033012321563800_ref241","first-page":"4354","volume-title":"Been There, Done That: Meta-Learning with Episodic Recall","author":"Ritter","year":"2018"},{"key":"2026033012321563800_ref242","first-page":"9048","volume-title":"BIMRL: Brain Inspired Meta Reinforcement Learning","author":"Rohani","year":"2022"},{"key":"2026033012321563800_ref243","volume-title":"ProMP: Proximal Meta-Policy Search","author":"Rothfuss","year":"2019"},{"key":"2026033012321563800_ref244","volume-title":"Meta reinforcement learning with latent variable gaussian processes","author":"Saemundsson","year":"2018"},{"key":"2026033012321563800_ref245","volume-title":"Evolution strategies as a scalable alternative to reinforcement 
learning","author":"Salimans","year":"2017"},{"key":"2026033012321563800_ref246","first-page":"1842","volume-title":"Meta-learning with memory-augmented neural networks","author":"Santoro","year":"2016"},{"key":"2026033012321563800_ref247","volume-title":"PhD thesis.","author":"Schmidhuber","year":"1987"},{"key":"2026033012321563800_ref248","doi-asserted-by":"crossref","first-page":"199","DOI":"10.1007\/978-3-540-68677-4_7","volume-title":"Artificial general intelligence.","author":"Schmidhuber","year":"2007"},{"issue":"1","key":"2026033012321563800_ref249","doi-asserted-by":"crossref","first-page":"105","DOI":"10.1023\/A:1007383707642","article-title":"Shifting inductive bias with success-story algorithm, adaptive Levin search, and incremental self-improvement","volume":"28","author":"Schmidhuber","year":"1997","journal-title":"Machine Learning."},{"key":"2026033012321563800_ref250","first-page":"8545","volume-title":"Off-policy actor-critic with shared experience replay","author":"Schmitt","year":"2020"},{"key":"2026033012321563800_ref251","first-page":"9728","volume-title":"Meta-reinforcement learning for robotic industrial insertion tasks","author":"Schoettler","year":"2020"},{"key":"2026033012321563800_ref252","volume-title":"Proximal policy optimization algorithms","author":"Schulman","year":"2017"},{"key":"2026033012321563800_ref253","doi-asserted-by":"crossref","DOI":"10.1109\/IROS45743.2020.9341057","article-title":"Sim-to-Real with Domain Randomization for Tumbling Robot Control","volume-title":"IROS.","author":"Schwartzwald","year":"2020"},{"key":"2026033012321563800_ref254","volume-title":"Masked World Models for Visual Control","author":"Seo","year":"2022"},{"key":"2026033012321563800_ref255","volume-title":"SMILe: Scalable Meta Inverse Reinforcement Learning through Context-Conditional Policies","author":"Seyed Ghasemipour","year":"2019"},{"key":"2026033012321563800_ref256","volume-title":"AutoRL-Bench 
1.0","author":"Shala","year":"2022"},{"key":"2026033012321563800_ref257","first-page":"9541","volume-title":"Meta-Learning Effective Exploration Strategies for Contextual Bandits","author":"Sharaf","year":"2021"},{"key":"2026033012321563800_ref258","volume-title":"Continuous Adaptation via Meta-Learning in Nonstationary and Competitive Environments","author":"Al-Shedivat","year":"2018"},{"key":"2026033012321563800_ref259","volume-title":"Continuous Adaptation via Meta-Learning in Nonstationary and Competitive Environments","author":"Al-Shedivat","year":"2018"},{"issue":"4","key":"2026033012321563800_ref260","doi-asserted-by":"crossref","first-page":"10065","DOI":"10.1109\/LRA.2022.3191234","article-title":"Infusing model predictive control into meta-reinforcement learning for mobile robots in dynamic environments","volume":"7","author":"Shin","year":"2022","journal-title":"IEEE Robotics and Automation Letters."},{"issue":"7587","key":"2026033012321563800_ref261","doi-asserted-by":"crossref","first-page":"484","DOI":"10.1038\/nature16961","volume":"529","author":"Silver","year":"2016","journal-title":"Nature."},{"key":"2026033012321563800_ref262","first-page":"26382","article-title":"Bayesian decision-making under misspecified priors with applications to meta-learning","volume":"34","author":"Simchowitz","year":"2021","journal-title":"Advances in Neural Information Processing Systems."},{"key":"2026033012321563800_ref263","first-page":"2167","volume-title":"Scalable multi-task imitation learning with autonomous improvement","author":"Singh","year":"2020"},{"key":"2026033012321563800_ref264","first-page":"2601","volume-title":"Where do rewards come from","author":"Singh","year":"2009"},{"key":"2026033012321563800_ref265","article-title":"Prototypical Networks for Few-shot Learning","volume-title":"Advances in Neural Information Processing Systems.","author":"Snell","year":"2017"},{"key":"2026033012321563800_ref358","volume-title":"Meta Reinforcement Learning with 
Autonomous Inference of Subtask Dependencies","author":"Sohn","year":"2020"},{"key":"2026033012321563800_ref266","volume-title":"ES-MAML: Simple Hessian-Free Meta Learning","author":"Song","year":"2020"},{"key":"2026033012321563800_ref267","doi-asserted-by":"crossref","DOI":"10.1109\/IROS45743.2020.9341571","article-title":"Rapidly Adaptable Legged Robots via Evolutionary Meta-Learning","volume-title":"IROS.","author":"Song","year":"2020"},{"key":"2026033012321563800_ref268","article-title":"Some Considerations on Learning to Explore via Meta-Reinforcement Learning","volume-title":"NeurIPS.","author":"Stadie","year":"2018"},{"key":"2026033012321563800_ref269","volume-title":"Meta Arcade: A Configurable Environment Suite for Meta-Learning","author":"Staley","year":"2021"},{"key":"2026033012321563800_ref270","volume-title":"Ad Hoc Autonomous Agent Teams: Collaboration without Pre-Coordination","author":"Stone","year":"2010"},{"issue":"1","key":"2026033012321563800_ref271","first-page":"483","article-title":"Approximate information state for approximate planning and reinforcement learning in partially observed systems","volume":"23","author":"Subramanian","year":"2022","journal-title":"The Journal of Machine Learning Research."},{"key":"2026033012321563800_ref272","doi-asserted-by":"crossref","DOI":"10.1109\/CVPR.2018.00131","article-title":"Learning to Compare: Relation Network for Few-Shot Learning","volume-title":"CVPR.","author":"Sung","year":"2018"},{"key":"2026033012321563800_ref273","unstructured":"Sung, F., L.Zhang, T.Xiang, T. M.Hospedales, and Y.Yang. (2017). \u201cLearning to Learn: Meta-Critic Networks for Sample Efficient Learning\u201d. CoRR. abs\/1706.09529. arXiv: 1706.09529. 
URL: http:\/\/arxiv.org\/abs\/1706.09529."},{"key":"2026033012321563800_ref274","volume-title":"Training recurrent neural networks.","author":"Sutskever","year":"2013"},{"key":"2026033012321563800_ref275","first-page":"171","volume-title":"AAAI.","author":"Sutton","year":"1992"},{"key":"2026033012321563800_ref276","unstructured":"Tack, J., J.Park, H.Lee, J.Lee, and J.Shin. (2022). \u201cMeta-Learning with Self-Improving Momentum Target\u201d. In: Advances in Neural Information Processing Systems. Ed. by A. H.Oh, A.Agarwal, D.Belgrave, and K.Cho. URL: https:\/\/openreview.net\/forum?id=FCNMbF_TsKm."},{"key":"2026033012321563800_ref277","first-page":"8423","volume-title":"Regularization guarantees generalization in bayesian reinforcement learning through algorithmic stability","author":"Tamar","year":"2022"},{"key":"2026033012321563800_ref278","first-page":"21050","volume-title":"Biased Gradient Estimate with Drastic Variance Reduction for Meta Reinforcement Learning","author":"Tang","year":"2022"},{"key":"2026033012321563800_ref279","first-page":"5303","article-title":"Unifying gradient estimators for meta-reinforcement learning via off-policy evaluation","volume":"34","author":"Tang","year":"2021","journal-title":"Advances in Neural Information Processing Systems."},{"key":"2026033012321563800_ref280","volume-title":"Human-Timescale Adaptation in an Open-Ended Task Space","author":"Team","year":"2023"},{"key":"2026033012321563800_ref281","first-page":"30","article-title":"Distral: Robust multitask reinforcement learning","volume-title":"Advances in neural information processing systems.","author":"Teh","year":"2017"},{"key":"2026033012321563800_ref282","volume-title":"HyperMARL: Adaptive Hypernetworks for Multi-Agent RL","author":"Tessera","year":"2024"},{"issue":"3-4","key":"2026033012321563800_ref283","doi-asserted-by":"crossref","first-page":"285","DOI":"10.1093\/biomet\/25.3-4.285","article-title":"On the likelihood that one unknown probability exceeds another 
in view of the evidence of two samples","volume":"25","author":"Thompson","year":"1933","journal-title":"Biometrika."},{"key":"2026033012321563800_ref284","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1007\/978-1-4615-5529-2_1","volume-title":"Learning to learn.","author":"Thrun","year":"1998"},{"key":"2026033012321563800_ref285","first-page":"5026","volume-title":"Mujoco: A physics engine for model-based control","author":"Todorov","year":"2012"},{"key":"2026033012321563800_ref286","first-page":"13","article-title":"Learning one representation to optimize all rewards","volume":"34","author":"Touati","year":"2021","journal-title":"Advances in Neural Information Processing Systems."},{"key":"2026033012321563800_ref287","volume-title":"Meta-dataset: A dataset of datasets for learning to learn from few examples","author":"Triantafillou","year":"2020"},{"key":"2026033012321563800_ref288","first-page":"10434","volume-title":"Provable meta-learning of linear representations","author":"Tripuraneni","year":"2021"},{"issue":"11","key":"2026033012321563800_ref289","doi-asserted-by":"crossref","first-page":"1134","DOI":"10.1145\/1968.1972","article-title":"A theory of the learnable","volume":"27","author":"Valiant","year":"1984","journal-title":"Communications of the ACM."},{"key":"2026033012321563800_ref290","volume-title":"Meta-learning: A survey","author":"Vanschoren","year":"2018"},{"key":"2026033012321563800_ref291","article-title":"Attention Is All You Need","volume-title":"NeurIPS.","author":"Vaswani","year":"2017"},{"key":"2026033012321563800_ref292","volume-title":"CityLearn: Standardizing research in multi-agent reinforcement learning for demand response and urban energy management","author":"V\u00e1zquez-Canteli","year":"2020"},{"key":"2026033012321563800_ref293","doi-asserted-by":"crossref","first-page":"1072","DOI":"10.1016\/j.apenergy.2018.11.002","article-title":"Reinforcement learning for demand response: A review of algorithms and modeling 
techniques","volume":"235","author":"V\u00e1zquez-Canteli","year":"2019","journal-title":"Applied energy."},{"key":"2026033012321563800_ref294","volume-title":"Discovery of Useful Questions as Auxiliary Tasks","author":"Veeriah","year":"2019"},{"key":"2026033012321563800_ref295","first-page":"29861","volume-title":"Discovery of Options via Meta-Learned Subgoals","author":"Veeriah","year":"2021"},{"key":"2026033012321563800_ref296","article-title":"Matching Networks for One Shot Learning","volume-title":"Advances in Neural Information Processing Systems.","author":"Vinyals","year":"2016"},{"key":"2026033012321563800_ref297","volume-title":"No DICE: An Investigation of the Bias-Variance Tradeoff in Meta-Gradients","author":"Vuorio","year":"2021"},{"key":"2026033012321563800_ref298","article-title":"Multimodal Model-Agnostic Meta-Learning via Task-Aware Modulation","volume-title":"NeurIPS.","author":"Vuorio","year":"2019"},{"key":"2026033012321563800_ref299","volume-title":"Don\u2019t Start From Scratch: Leveraging Prior Data to Automate Robotic Reinforcement Learning","author":"Walke","year":"2022"},{"key":"2026033012321563800_ref300","volume-title":"Hindsight Foresight Relabeling for Meta-Reinforcement Learning","author":"Wan","year":"2022"},{"key":"2026033012321563800_ref301","first-page":"1440","volume-title":"Learning Context-aware Task Reasoning for Efficient Meta Reinforcement Learning","author":"Wang","year":"2020"},{"issue":"6","key":"2026033012321563800_ref302","doi-asserted-by":"crossref","first-page":"860","DOI":"10.1038\/s41593-018-0147-8","article-title":"Prefrontal cortex as a meta-reinforcement learning system","volume":"21","author":"Wang","year":"2018","journal-title":"Nature neuroscience."},{"key":"2026033012321563800_ref303","volume-title":"Learning to reinforcement learn","author":"Wang","year":"2016"},{"key":"2026033012321563800_ref304","article-title":"Alchemy: A structured task distribution for meta-reinforcement 
learning","volume-title":"NeurIPS.","author":"Wang","year":"2021"},{"key":"2026033012321563800_ref305","unstructured":"Wang, Q. and H.van Hoof. (2022). \u201cLearning Expressive Meta-Representations with Mixture of Expert Neural Processes\u201d. In: Advances in Neural Information Processing Systems. Ed. by A. H.Oh, A.Agarwal, D.Belgrave, and K.Cho. url: https:\/\/openreview.net\/forum?id=ju38DG3sbg6."},{"key":"2026033012321563800_ref306","article-title":"A Simple Yet Effective Strategy to Robustify the Meta Learning Paradigm","volume-title":"Advances in Neural Information Processing Systems.","author":"Wang","year":"2023"},{"key":"2026033012321563800_ref307","volume-title":"Model-based Meta Reinforcement Learning using Graph Structured Surrogate Models and Amortized Policy Search","author":"Wang","year":"2022"},{"issue":"3","key":"2026033012321563800_ref308","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3386252","article-title":"Generalizing from a few examples: A survey on few-shot learning","volume":"53","author":"Wang","year":"2020","journal-title":"ACM computing surveys (csur)."},{"key":"2026033012321563800_ref309","volume-title":"Bayesian Meta Sampling for Fast Uncertainty Adaptation","author":"Wang","year":"2020"},{"key":"2026033012321563800_ref310","unstructured":"Weihs, L., U.Jain, I.-J.Liu, J.Salvador, S.Lazebnik, A.Kembhavi, and A.Schwing. (2021). \u201cBridging the Imitation Gap by Adaptive Insubordination\u201d. In: Advances in Neural Information Processing Systems. Ed. by A.Beygelzimer, Y.Dauphin, P.Liang, and J. W.Vaughan. 
url: https:\/\/openreview.net\/forum?id=WlxODqiUTD_."},{"key":"2026033012321563800_ref311","first-page":"8987","volume-title":"Improved Robustness and Safety for Pre-Adaptation of Meta Reinforcement Learning with Prior Regularization","author":"Wen","year":"2022"},{"issue":"1","key":"2026033012321563800_ref312","first-page":"949","article-title":"Natural evolution strategies","volume":"15","author":"Wierstra","year":"2014","journal-title":"The Journal of Machine Learning Research."},{"issue":"3","key":"2026033012321563800_ref313","doi-asserted-by":"crossref","first-page":"229","DOI":"10.1023\/A:1022672621406","article-title":"Simple statistical gradient-following algorithms for connectionist reinforcement learning","volume":"8","author":"Williams","year":"1992","journal-title":"Machine learning."},{"key":"2026033012321563800_ref314","first-page":"8657","volume-title":"Structure Learning-Based Task Decomposition for Reinforcement Learning in Non-stationary Environments","author":"Woo","year":"2022"},{"key":"2026033012321563800_ref315","doi-asserted-by":"crossref","DOI":"10.1109\/IROS45743.2020.9340915","article-title":"SQUIRL: Robust and Efficient Learning from Video Demonstration of Long-Horizon Robotic Manipulation Tasks","volume-title":"IROS.","author":"Wu","year":"2020"},{"key":"2026033012321563800_ref316","volume-title":"Aggressive Q-Learning with Ensembles: Achieving Both High Sample Efficiency and High Asymptotic Performance","author":"Wu","year":"2022"},{"key":"2026033012321563800_ref317","volume-title":"Understanding Short-Horizon Bias in Stochastic Meta-Optimization","author":"Wu","year":"2018"},{"key":"2026033012321563800_ref318","volume-title":"HyperDynamics: Meta-Learning Object and Agent Dynamics with Hypernetworks","author":"Xian","year":"2021"},{"key":"2026033012321563800_ref319","first-page":"40","volume-title":"Few-shot goal inference for visuomotor learning and 
planning","author":"Xie","year":"2018"},{"key":"2026033012321563800_ref320","volume-title":"On the Practical Consistency of Meta-Reinforcement Learning Algorithms","author":"Xiong","year":"2021"},{"key":"2026033012321563800_ref321","volume-title":"Learning a Prior over Intent via Meta-Inverse Reinforcement Learning","author":"Xu","year":"2019"},{"key":"2026033012321563800_ref322","first-page":"24631","volume-title":"Prompting decision transformer for few-shot policy generalization","author":"Xu","year":"2022"},{"key":"2026033012321563800_ref323","first-page":"15254","volume-title":"Meta-Gradient Reinforcement Learning with an Objective Discovered Online","author":"Xu","year":"2020"},{"key":"2026033012321563800_ref324","volume-title":"Meta-Gradient Reinforcement Learning","author":"Xu","year":"2018"},{"key":"2026033012321563800_ref325","doi-asserted-by":"crossref","DOI":"10.1109\/IROS45743.2020.9341398","article-title":"Multimodal Aggregation Approach for Memory Vision-Voice Indoor Navigation with Meta-Learning","volume-title":"IROS.","author":"Yan","year":"2020"},{"key":"2026033012321563800_ref326","doi-asserted-by":"crossref","DOI":"10.65109\/ZAQO5222","article-title":"Adaptive Incentive Design with Multi-Agent Meta-Gradient Reinforcement Learning","volume-title":"AAMAS.","author":"Yang","year":"2022"},{"key":"2026033012321563800_ref327","volume-title":"Impact of representation learning in linear bandits","author":"Yang","year":"2020"},{"key":"2026033012321563800_ref328","doi-asserted-by":"crossref","DOI":"10.65109\/ELVR9404","article-title":"NoRML: No-Reward Meta Learning","volume-title":"AAMAS.","author":"Yang","year":"2019"},{"key":"2026033012321563800_ref329","first-page":"39770","volume-title":"On the power of pre-training for generalization in rl: Provable benefits and hardness","author":"Ye","year":"2023"},{"key":"2026033012321563800_ref330","first-page":"10700","volume-title":"Sequential generative exploration model for partially observable reinforcement 
learning","author":"Yin","year":"2021"},{"key":"2026033012321563800_ref331","volume-title":"Bayesian Model-Agnostic Meta-Learning","author":"Yoon","year":"2018"},{"key":"2026033012321563800_ref332","volume-title":"Meta-Inverse Reinforcement Learning with Probabilistic Context Variables","author":"Yu","year":"2019"},{"key":"2026033012321563800_ref333","doi-asserted-by":"publisher","DOI":"10.15607\/RSS.2018.XIV.002","volume-title":"One-Shot Imitation from Observing Humans via Domain-Adaptive Meta-Learning","author":"Yu","year":"2018"},{"key":"2026033012321563800_ref334","first-page":"1094","volume-title":"Meta-world: A benchmark and evaluation for multi-task and meta reinforcement learning","author":"Yu","year":"2020"},{"issue":"2","key":"2026033012321563800_ref335","doi-asserted-by":"crossref","first-page":"2950","DOI":"10.1109\/LRA.2020.2974685","article-title":"Learning fast adaptation with meta strategy optimization","volume":"5","author":"Yu","year":"2020","journal-title":"IEEE Robotics and Automation Letters."},{"key":"2026033012321563800_ref336","doi-asserted-by":"publisher","DOI":"10.15607\/RSS.2017.XIII.048","volume-title":"Preparing for the Unknown: Learning a Universal Policy with Online System Identification","author":"Yu","year":"2017"},{"key":"2026033012321563800_ref337","first-page":"25747","volume-title":"Robust task representations for offline meta-reinforcement learning via contrastive learning","author":"Yuan","year":"2022"},{"key":"2026033012321563800_ref338","volume-title":"A self-tuning actor-critic algorithm","author":"Zahavy","year":"2020"},{"key":"2026033012321563800_ref339","first-page":"1153","volume-title":"Metalight: Value-based meta-reinforcement learning for traffic signal control","author":"Zang","year":"2020"},{"issue":"3","key":"2026033012321563800_ref340","doi-asserted-by":"crossref","first-page":"8194","DOI":"10.1109\/LRA.2022.3185384","article-title":"Temporal logic guided meta q-learning of multiple 
tasks","volume":"7","author":"Zhang","year":"2022","journal-title":"IEEE Robotics and Automation Letters."},{"key":"2026033012321563800_ref341","first-page":"12600","volume-title":"MetaCURE: Meta Reinforcement Learning with Empowerment-Driven Exploration","author":"Zhang","year":"2021"},{"key":"2026033012321563800_ref342","volume-title":"A Meta-Gradient Approach to Learning Cooperative Multi-Agent Communication Topology","author":"Zhang","year":"2021"},{"key":"2026033012321563800_ref343","unstructured":"Zhao, M., P.Abbeel, and S.James. (2022a). \u201cOn the Effectiveness of Fine-tuning Versus Meta-reinforcement Learning\u201d. In: Advances in Neural Information Processing Systems. Ed. by A. H.Oh, A.Agarwal, D.Belgrave, and K.Cho. url: https:\/\/openreview.net\/forum?id=mux7gn3g_3."},{"key":"2026033012321563800_ref344","first-page":"6386","volume-title":"Offline meta-reinforcement learning for industrial insertion","author":"Zhao","year":"2022"},{"key":"2026033012321563800_ref345","first-page":"737","volume-title":"Sim-to-real transfer in deep reinforcement learning for robotics: a survey","author":"Zhao","year":"2020"},{"key":"2026033012321563800_ref346","first-page":"1246","volume-title":"MELD: Meta-Reinforcement Learning from Images via Latent State Models","author":"Zhao","year":"2021"},{"key":"2026033012321563800_ref347","first-page":"11436","article-title":"What can learned intrinsic rewards capture?","author":"Zheng","year":"2020"},{"key":"2026033012321563800_ref348","volume-title":"On Learning Intrinsic Rewards for Policy Gradient Methods","author":"Zheng","year":"2018"},{"key":"2026033012321563800_ref349","article-title":"Watch, Try, Learn: Meta-Learning from Demonstrations and Rewards","volume-title":"ICLR.","author":"Zhou","year":"2020"},{"key":"2026033012321563800_ref350","volume-title":"Environment Probing Interaction Policies","author":"Zhou","year":"2019"},{"key":"2026033012321563800_ref351","first-page":"1712","volume-title":"Deep Interactive 
Bayesian Reinforcement Learning via Meta-Learning","author":"Zintgraf","year":"2021"},{"issue":"289","key":"2026033012321563800_ref352","first-page":"1","article-title":"VariBAD: Variational Bayes-Adaptive Deep RL via Meta-Learning","volume":"22","author":"Zintgraf","year":"2021","journal-title":"Journal of Machine Learning Research."},{"key":"2026033012321563800_ref353","volume-title":"VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning","author":"Zintgraf","year":"2020"},{"key":"2026033012321563800_ref354","first-page":"12991","volume-title":"Exploration in Approximate Hyper-State Space for Meta Reinforcement Learning","author":"Zintgraf","year":"2021"},{"key":"2026033012321563800_ref355","article-title":"Fast Context Adaptation via Meta-Learning","volume-title":"ICLR.","author":"Zintgraf","year":"2019"},{"key":"2026033012321563800_ref356","first-page":"11210","volume-title":"Learning task-distribution reward shaping with meta-learning","author":"Zou","year":"2021"},{"key":"2026033012321563800_ref357","first-page":"20865","volume-title":"Gradient-EM Bayesian Meta-Learning","author":"Zou","year":"2020"}],"container-title":["Foundations and Trends\u00ae in Machine 
Learning"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.emerald.com\/ftmal\/article-pdf\/18\/2-3\/224\/11147306\/2200000080en.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/www.emerald.com\/ftmal\/article-pdf\/18\/2-3\/224\/11147306\/2200000080en.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,30]],"date-time":"2026-03-30T16:33:54Z","timestamp":1774888434000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.emerald.com\/ftmal\/article\/18\/2-3\/224\/1332163\/A-Tutorial-on-Meta-Reinforcement-Learning"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,4,3]]},"references-count":358,"journal-issue":{"issue":"2-3","published-print":{"date-parts":[[2025,4,3]]}},"URL":"https:\/\/doi.org\/10.1561\/2200000080","relation":{},"ISSN":["1935-8237","1935-8245"],"issn-type":[{"value":"1935-8237","type":"print"},{"value":"1935-8245","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,4,3]]}}}