{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,7]],"date-time":"2026-04-07T16:31:39Z","timestamp":1775579499876,"version":"3.50.1"},"reference-count":32,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2025,3,26]],"date-time":"2025-03-26T00:00:00Z","timestamp":1742947200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001833","name":"Vrije Universiteit Amsterdam","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100001833","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Artif. Intell."],"abstract":"<jats:p>Traditional approaches to training agents have generally involved a single, deterministic environment of minimal complexity to solve various tasks such as robot locomotion or computer vision. However, agents trained in static environments lack generalization capabilities, limiting their potential in broader scenarios. Thus, recent benchmarks frequently rely on multiple environments, for instance, by providing stochastic noise, simple permutations, or altogether different settings. In practice, such collections result mainly from costly human-designed processes or the liberal use of random number generators. In this work, we introduce AMaze, a novel benchmark generator in which embodied agents must navigate a maze by interpreting visual signs of arbitrary complexities and deceptiveness. This generator promotes human interaction through the easy generation of feature-specific mazes and an intuitive understanding of the resulting agents' strategies. As a proof-of-concept, we demonstrate the capabilities of the generator in a simple, fully discrete case with limited deceptiveness. Agents were trained under three different regimes (one-shot, scaffolding, and interactive), and the results showed that the latter two cases outperform direct training in terms of generalization capabilities. Indeed, depending on the combination of generalization metric, training regime, and algorithm, the median gain ranged from 50% to 100% and maximal performance was achieved through interactive training, thereby demonstrating the benefits of a controllable human-in-the-loop benchmark generator.<\/jats:p>","DOI":"10.3389\/frai.2025.1511712","type":"journal-article","created":{"date-parts":[[2025,3,26]],"date-time":"2025-03-26T08:03:27Z","timestamp":1742976207000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["AMaze: an intuitive benchmark generator for fast prototyping of generalizable agents"],"prefix":"10.3389","volume":"8","author":[{"given":"Kevin","family":"Godin-Dubois","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Karine","family":"Miras","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Anna V.","family":"Kononova","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1965","published-online":{"date-parts":[[2025,3,26]]},"reference":[{"key":"B1","doi-asserted-by":"crossref","first-page":"1321","DOI":"10.1145\/3205651.3208285","article-title":"\u201cMaze benchmark for testing evolutionary algorithms,\u201d","volume-title":"Proceedings of the Genetic and Evolutionary Computation Conference Companion","author":"Alaguna","year":"2018"},{"key":"B2","doi-asserted-by":"publisher","first-page":"834","DOI":"10.1109\/TSMC.1983.6313077","article-title":"Neuronlike adaptive elements that can solve difficult learning control problems","volume":"13","author":"Barto","year":"1983","journal-title":"IEEE Trans. Syst. Man Cybern"},{"key":"B3","article-title":"DeepMind Lab2D","author":"Beattie","year":"2020","journal-title":"arXiv [Preprint]"},{"key":"B4","article-title":"DeepMind Lab","author":"Beattie","year":"2016","journal-title":"arXiv [Preprint]"},{"key":"B5","doi-asserted-by":"publisher","first-page":"253","DOI":"10.1613\/jair.3912","article-title":"The arcade learning environment: an evaluation platform for general agents","volume":"47","author":"Bellemare","year":"2013","journal-title":"J. Artif. Intell. Res"},{"key":"B6","doi-asserted-by":"publisher","first-page":"289","DOI":"10.1111\/j.2517-6161.1995.tb02031.x","article-title":"Controlling the false discovery rate: a practical and powerful approach to multiple testing","volume":"57","author":"Benjamini","year":"1995","journal-title":"J. Roy. Statist. Soc. Ser. B"},{"key":"B7","first-page":"2026","article-title":"\u201cLeveraging procedural generation to benchmark reinforcement learning,\u201d","author":"Cobbe","year":"2020","journal-title":"37th International Conference on Machine Learning, ICML 2020"},{"key":"B8","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-662-44874-8","volume-title":"Introduction to Evolutionary Computing, volume 28","author":"Eiben","year":"2015"},{"key":"B9","first-page":"2587","article-title":"\u201cAddressing function approximation error in actor-critic methods,\u201d","author":"Fujimoto","year":"2018","journal-title":"35th International Conference on Machine Learning, ICML 2018"},{"key":"B10","article-title":"AMaze: Fully discrete training with three regimes (direct, scaffolding, interactive) and two algorithms (A2C, PPO)","author":"Godin-Dubois","year":"2024","journal-title":"arXiv [Preprint]"},{"key":"B11","doi-asserted-by":"publisher","DOI":"10.1109\/SSCI47803.2020.9308411","article-title":"\u201cBeneficial catastrophes: leveraging abiotic constraints through environment-driven evolutionary selection,\u201d","author":"Godin-Dubois","year":"2020","journal-title":"2020 IEEE Symposium Series on Computational Intelligence (SSCI)"},{"key":"B12","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/CIG.2019.8848048","article-title":"\u201cMazeExplorer: a customisable 3D benchmark for assessing generalisation in reinforcement learning,\u201d","volume-title":"2019 IEEE Conference on Games (CoG)","author":"Harries","year":"2019"},{"key":"B13","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v32i1.11694","article-title":"\u201cDeep reinforcement learning that matters,\u201d","author":"Henderson","year":"2018","journal-title":"Proceedings of the AAAI Conference on Artificial Intelligence"},{"key":"B14","first-page":"2684","article-title":"\u201cObstacle tower: a generalization challenge in vision, control, and planning,\u201d","volume-title":"Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, volume August","author":"Juliani","year":"2019"},{"key":"B15","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/CIG.2016.7860433","article-title":"\u201cViZDoom: a Doom-based AI research platform for visual reinforcement learning,\u201d","volume-title":"2016 IEEE Conference on Computational Intelligence and Games (CIG)","author":"Kempka","year":"2016"},{"key":"B16","article-title":"\u201cURLB: unsupervised reinforcement learning benchmark,\u201d","author":"Laskin","year":"2021","journal-title":"NeurIPS"},{"key":"B17","article-title":"Asynchronous methods for deep reinforcement learning","author":"Mnih","year":"2016","journal-title":"arXiv [Preprint]"},{"key":"B18","doi-asserted-by":"publisher","first-page":"529","DOI":"10.1038\/nature14236","article-title":"Human-level control through deep reinforcement learning","volume":"518","author":"Mnih","year":"2015","journal-title":"Nature"},{"key":"B19","article-title":"Gotta learn fast: a new benchmark for generalization in RL","author":"Nichol","year":"2018","journal-title":"arXiv [Preprint]"},{"key":"B20","first-page":"1","article-title":"Stable-baselines3: reliable reinforcement learning implementations","volume":"22","author":"Raffin","year":"2021","journal-title":"J. Mach. Learn. Res"},{"key":"B21","doi-asserted-by":"publisher","first-page":"428","DOI":"10.1038\/s42256-020-0208-z","article-title":"Increasing generality in machine learning through procedural content generation","volume":"2","author":"Risi","year":"2020","journal-title":"Nat. Mach. Intell"},{"key":"B22","article-title":"Proximal policy optimization algorithms","author":"Schulman","year":"2017","journal-title":"arXiv [Preprint]"},{"key":"B23","doi-asserted-by":"publisher","first-page":"379","DOI":"10.1002\/j.1538-7305.1948.tb01338.x","article-title":"A mathematical theory of communication","volume":"27","author":"Shannon","year":"1948","journal-title":"Bell Syst. Techn. J"},{"key":"B24","volume-title":"Reinforcement Learning: An Introduction","author":"Sutton","year":"2018"},{"key":"B25","article-title":"TorchCraft: a library for machine learning research on real-time strategy games","author":"Synnaeve","year":"2016","journal-title":"arXiv [Preprint]"},{"key":"B26","article-title":"ELF: an extensive, lightweight and flexible research platform for real-time strategy games","author":"Tian","year":"2017","journal-title":"arXiv [Preprint]"},{"key":"B27","doi-asserted-by":"crossref","first-page":"5026","DOI":"10.1109\/IROS.2012.6386109","article-title":"\u201cMuJoCo: a physics engine for model-based control,\u201d","volume-title":"2012 IEEE\/RSJ International Conference on Intelligent Robots and Systems","author":"Todorov","year":"2012"},{"key":"B28","doi-asserted-by":"crossref","first-page":"72","DOI":"10.1109\/CoG51982.2022.9893707","article-title":"\u201cLevDoom: a benchmark for generalization on level difficulty in reinforcement learning,\u201d","volume-title":"2022 IEEE Conference on Games (CoG)","author":"Tomilin","year":"2022"},{"key":"B29","article-title":"Gymnasium: a standard interface for reinforcement learning environments","author":"Towers","year":"2024","journal-title":"arXiv [Preprint]"},{"key":"B30","article-title":"Paired open-ended trailblazer (POET): endlessly generating increasingly complex and diverse learning environments and their solutions","author":"Wang","year":"2019","journal-title":"arXiv [Preprint]"},{"key":"B31","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-04921-7_39","article-title":"\u201cA cat-like robot real-time learning to run,\u201d","author":"Wawrzy\u0144ski","year":"2009","journal-title":"Adaptive and Natural Computing Algorithms. ICANNGA 2009"},{"key":"B32","article-title":"Meta-world: a benchmark and evaluation for multi-task and meta reinforcement learning","author":"Yu","year":"2019","journal-title":"arXiv [Preprint]"}],"container-title":["Frontiers in Artificial Intelligence"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frai.2025.1511712\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,26]],"date-time":"2025-03-26T08:04:34Z","timestamp":1742976274000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frai.2025.1511712\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,3,26]]},"references-count":32,"alternative-id":["10.3389\/frai.2025.1511712"],"URL":"https:\/\/doi.org\/10.3389\/frai.2025.1511712","relation":{},"ISSN":["2624-8212"],"issn-type":[{"value":"2624-8212","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,3,26]]},"article-number":"1511712"}}