{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,16]],"date-time":"2026-06-16T13:25:00Z","timestamp":1781616300326,"version":"3.54.5"},"reference-count":33,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2026,1,1]],"date-time":"2026-01-01T00:00:00Z","timestamp":1767225600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2026,1,7]],"date-time":"2026-01-07T00:00:00Z","timestamp":1767744000000},"content-version":"vor","delay-in-days":6,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Johannes Gutenberg-Universit\u00e4t Mainz"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Mach Learn"],"published-print":{"date-parts":[[2026,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Automating scientific discovery has been one of the motivating tasks in the development of AI methods. The task of Equation Discovery (also called Symbolic Regression) is to learn a free-form symbolic equation from experimental data. Equation Discovery benchmarks, however, assume the experimental data as given. Recent successes in protein folding and material optimization, powered by advancements, amongst others, in reinforcement learning and deep learning, have renewed the broader community\u2019s interest in applications of AI in science. Nonetheless, these successful applications do not necessarily lead to an improved understanding of the underlying phenomena, just as super-human chess engines do not necessarily lead to improved understanding of chess theory and practice. In this paper, we propose Science-Gym: a new testbed for basic physics understanding. To the best of our knowledge, Science-Gym is the first scientific discovery benchmark that requires agents to autonomously perform data collection, experimental design, and discover the underlying equations of phenomena. Science-Gym is a Python software library with Gym-compatible bindings. It offers seven scientific simulations, which reproduce basic physics and epidemiology principles: the law of the lever, projectile motion, the inclined plane, Lagrangian points in space, brachistochrones, the SIRV model, and the friction force of a droplet. In these environments, agents may be evaluated not only on their ability in e.g. balancing objects on the two beams of a lever, but more importantly on finding equations that describe the overall behavior of the dynamical system at hand.<\/jats:p>","DOI":"10.1007\/s10994-025-06914-x","type":"journal-article","created":{"date-parts":[[2026,1,6]],"date-time":"2026-01-06T23:18:12Z","timestamp":1767741492000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Science-Gym: a simple testbed for AI-driven scientific discovery"],"prefix":"10.1007","volume":"115","author":[{"given":"Mattia","family":"Cerrato","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Lennart","family":"Baur","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jannis","family":"Brugger","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Sajjad","family":"Shumaly","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Nicholas","family":"Schmitt","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Edward","family":"Finkelstein","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Selina","family":"Jukic","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Lars","family":"M\u00fcnzel","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Felix Peter","family":"Paul","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Pascal","family":"Pfannes","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Benedikt","family":"Rohr","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Julius","family":"Schellenberg","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Philipp","family":"Wolf","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Stefan","family":"Kramer","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2026,1,7]]},"reference":[{"key":"6914_CR1","unstructured":"Biewald, L. (2020). Experiment Tracking with Weights and Biases. Software available from wandb.com . https:\/\/www.wandb.com\/"},{"key":"6914_CR2","unstructured":"Catto, E.: Box2D. GitHub (2021)."},{"key":"6914_CR3","doi-asserted-by":"publisher","first-page":"129","DOI":"10.1613\/jair.295","volume":"4","author":"DA Cohn","year":"1996","unstructured":"Cohn, D. A., Ghahramani, Z., & Jordan, M. I. (1996). Active learning with statistical models. Journal of Artificial Intelligence Research, 4, 129\u2013145.","journal-title":"Journal of Artificial Intelligence Research"},{"key":"6914_CR4","doi-asserted-by":"publisher","unstructured":"Cranmer, M. (2023). Interpretable Machine Learning for Science with PySR and SymbolicRegression.jl https:\/\/doi.org\/10.48550\/ARXIV.2305.01582 . Publisher: arXiv Version Number: 2. Accessed 2023-05-05","DOI":"10.48550\/ARXIV.2305.01582"},{"issue":"4","key":"6914_CR5","doi-asserted-by":"publisher","first-page":"309","DOI":"10.1016\/0095-8522(62)90011-9","volume":"17","author":"C Furmidge","year":"1962","unstructured":"Furmidge, C. (1962). Studies at phase interfaces i. the sliding of liquid drops on solid surfaces and a theory for spray retention. Journal of Colloid Science, 17(4), 309\u2013324.","journal-title":"Journal of Colloid Science"},{"key":"6914_CR6","unstructured":"Gandhi, K., Li, M.Y., Goodyear, L., Li, L., Bhaskar, A., Zaman, M., & Goodman, N.D. (2025). BoxingGym: Benchmarking Progress in Automated Experimental Design and Model Discovery. https:\/\/github.com\/kanishkg\/boxing-gym"},{"key":"6914_CR7","unstructured":"gplearn. https:\/\/github.com\/trevorstephens\/gplearn. Accessed: 2025-04-22"},{"key":"6914_CR8","unstructured":"Haarnoja, T., et al. (2018). Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In: International Conference on Machine Learning, pp. 1861\u20131870 . PMLR"},{"key":"6914_CR9","unstructured":"Jansen, P.e.a. (2024). DiscoveryWorld: A Virtual Environment for Developing and Evaluating Automated Scientific Discovery Agents."},{"key":"6914_CR10","doi-asserted-by":"crossref","unstructured":"Johnson, J. & et al. (2017). Clevr: A diagnostic dataset for compositional language and elementary visual reasoning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2901\u20132910.","DOI":"10.1109\/CVPR.2017.215"},{"issue":"7873","key":"6914_CR11","doi-asserted-by":"publisher","first-page":"583","DOI":"10.1038\/s41586-021-03819-2","volume":"596","author":"J Jumper","year":"2021","unstructured":"Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., Tunyasuvunakool, K., Bates, R., \u017d\u00eddek, A., Potapenko, A., Bridgland, A., Meyer, C., Kohl, S. A. A., Ballard, A. J., Cowie, A., Romera-Paredes, B., Nikolov, S., Jain, R., Adler, J., \u2026 Hassabis, D. (2021). Highly accurate protein structure prediction with alphafold. Nature, 596(7873), 583\u2013589. https:\/\/doi.org\/10.1038\/s41586-021-03819-2","journal-title":"Nature"},{"key":"6914_CR12","doi-asserted-by":"crossref","unstructured":"Kitano, H. (2021). Nobel turing challenge: creating the engine for scientific discovery. NPJ Systems Biology and Applications 7(29).","DOI":"10.1038\/s41540-021-00189-3"},{"issue":"5","key":"6914_CR13","doi-asserted-by":"publisher","first-page":"402","DOI":"10.1016\/0095-8522(60)90044-1","volume":"15","author":"K Kawasaki","year":"1960","unstructured":"Kawasaki, K. (1960). Study of wettability of polymers by sliding of water drop. Journal of Colloid Science, 15(5), 402\u2013407.","journal-title":"Journal of Colloid Science"},{"issue":"2","key":"6914_CR14","doi-asserted-by":"publisher","first-page":"87","DOI":"10.1007\/BF00175355","volume":"4","author":"JR Koza","year":"1994","unstructured":"Koza, J. R. (1994). Genetic programming as a means for programming computers by natural selection. Statistics and Computing, 4(2), 87\u2013112.","journal-title":"Statistics and Computing"},{"key":"6914_CR15","unstructured":"Kramer, S., Cerrato, M., D\u017eeroski, S., & King, R. (2023). Automated scientific discovery: From equation discovery to autonomous discovery systems."},{"issue":"6","key":"6914_CR16","doi-asserted-by":"publisher","first-page":"562","DOI":"10.1371\/journal.pone.0000562","volume":"2","author":"A Krishnan","year":"2007","unstructured":"Krishnan, A., Giuliani, A., & Tomita, M. (2007). Indeterminacy of reverse engineering of gene regulatory networks: the curse of gene elasticity. PLoS One, 2(6), 562.","journal-title":"PLoS One"},{"key":"6914_CR17","unstructured":"La\u00a0Cava, W., Spector, L., & Danai, K. (2021). Srbench: A living benchmark for symbolic regression. arXiv preprint arXiv:2102.13031"},{"key":"6914_CR18","unstructured":"Langley, P. (1977). Bacon: A production system that discovers empirical laws. In: Proceedings of the 5th International Joint Conference on Artificial Intelligence (IJCAI), pp. 344\u2013350."},{"key":"6914_CR19","doi-asserted-by":"crossref","unstructured":"Langley, P.W., Simon, H.A., Bradshaw, G.L., & Zytkow, J.M. (1987). Scientific Discovery: Computational Explorations of the Creative Processes. MIT Press, ???","DOI":"10.7551\/mitpress\/6090.001.0001"},{"key":"6914_CR20","doi-asserted-by":"publisher","unstructured":"Li, X., Bodziony, F., Yin, M., Marschall, H., Berger, R., & Butt, H.-J. (2023). Kinetic drop friction. Nature Communications,14(1), 4571. https:\/\/doi.org\/10.1038\/s41467-023-40289-8. Number: 1 Publisher: Nature Publishing Group. Accessed 2023-12-12","DOI":"10.1038\/s41467-023-40289-8"},{"key":"6914_CR21","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pntd.0000761","author":"P Luz","year":"2010","unstructured":"Luz, P., Struchiner, C., & Galvani, A. (2010). Modeling transmission dynamics and control of vector-borne neglected tropical diseases. PLoS Neglected Tropical Diseases. https:\/\/doi.org\/10.1371\/journal.pntd.0000761","journal-title":"PLoS Neglected Tropical Diseases"},{"key":"6914_CR22","doi-asserted-by":"crossref","unstructured":"Makke, N., S., C. (2024). Interpretable scientific discovery with symbolic regression: A review. Artificial Intelligence Review 57(2).","DOI":"10.1007\/s10462-023-10622-0"},{"key":"6914_CR23","unstructured":"Microsoft Research\u00a0AI4Science, M.A.Q. (2023). The impact of large language models on scientific discovery: a preliminary study using gpt-4. arXiv preprint arXiv:2311.07361"},{"key":"6914_CR24","unstructured":"Mnih, V., Badia, A.P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., Silver, D., & Kavukcuoglu, K. (2016). Asynchronous methods for deep reinforcement learning. In: Balcan, M.F., Weinberger, K.Q. (eds.) Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1928\u20131937. PMLR, New York, New York, USA . https:\/\/proceedings.mlr.press\/v48\/mniha16.html"},{"key":"6914_CR25","doi-asserted-by":"crossref","unstructured":"Olson, R.S., Bartley, N., Urbanowicz, R.J., & Moore, J.H. (2017). Pmlb: A large benchmark suite for machine learning evaluation and comparison. In: Gecco, pp. 503\u2013510 . ACM","DOI":"10.1186\/s13040-017-0154-4"},{"issue":"268","key":"6914_CR26","first-page":"1","volume":"22","author":"A Raffin","year":"2021","unstructured":"Raffin, A., Hill, A., Gleave, A., Kanervisto, A., Ernestus, M., & Dormann, N. (2021). Stable-baselines3: Reliable reinforcement learning implementations. Journal of Machine Learning Research, 22(268), 1\u20138.","journal-title":"Journal of Machine Learning Research"},{"issue":"5923","key":"6914_CR27","doi-asserted-by":"publisher","first-page":"85","DOI":"10.1126\/science.1165620","volume":"324","author":"K Ross","year":"2009","unstructured":"Ross, K., et al. (2009). The automation of science. Science, 324(5923), 85\u201389.","journal-title":"Science"},{"key":"6914_CR28","unstructured":"Shuaiwen, L.S., et al. (2023). Deepspeed4science initiative: Enabling large-scale scientific discovery through sophisticated AI system technologies."},{"key":"6914_CR29","volume-title":"Nonlinear dynamics and chaos: With applications to physics, biology, chemistry, and engineering","author":"SH Strogatz","year":"1994","unstructured":"Strogatz, S. H. (1994). Nonlinear dynamics and chaos: With applications to physics, biology, chemistry, and engineering. Westview Press."},{"key":"6914_CR30","unstructured":"Sutton, R.S., & Barto, A.G. (2018). Reinforcement learning: An introduction. MIT Press."},{"key":"6914_CR31","doi-asserted-by":"crossref","unstructured":"Udrescu, S.-M., & Tegmark, M. (2020). AI Feynman: A physics-inspired method for symbolic regression. Science Advances 6(16).","DOI":"10.1126\/sciadv.aay2631"},{"key":"6914_CR32","doi-asserted-by":"crossref","unstructured":"Wang, R., Jansen, P., C\u00f4t\u00e9, M.-A., & Ammanabrolu, P. (2022). ScienceWorld: Is your Agent Smarter than a 5th Grader? . https:\/\/arxiv.org\/abs\/2203.07540","DOI":"10.18653\/v1\/2022.emnlp-main.775"},{"key":"6914_CR33","unstructured":"Yi, K., et al. (2020). Clevrer: Collision events for video representation and reasoning. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7661\u20137670."}],"container-title":["Machine Learning"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10994-025-06914-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10994-025-06914-x","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10994-025-06914-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,1,30]],"date-time":"2026-01-30T14:05:33Z","timestamp":1769781933000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10994-025-06914-x"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,1]]},"references-count":33,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2026,1]]}},"alternative-id":["6914"],"URL":"https:\/\/doi.org\/10.1007\/s10994-025-06914-x","relation":{},"ISSN":["0885-6125","1573-0565"],"issn-type":[{"value":"0885-6125","type":"print"},{"value":"1573-0565","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,1]]},"assertion":[{"value":"24 April 2025","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"20 August 2025","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"4 October 2025","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"7 January 2026","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no Conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}],"article-number":"16"}}