{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,18]],"date-time":"2026-06-18T16:51:48Z","timestamp":1781801508981,"version":"3.54.5"},"reference-count":58,"publisher":"Springer Science and Business Media LLC","issue":"10","license":[{"start":{"date-parts":[[2025,8,11]],"date-time":"2025-08-11T00:00:00Z","timestamp":1754870400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,8,11]],"date-time":"2025-08-11T00:00:00Z","timestamp":1754870400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100007569","name":"Carl-Zeiss-Stiftung","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100007569","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001659","name":"Deutsche Forschungsgemeinschaft","doi-asserted-by":"publisher","award":["460197019"],"award-info":[{"award-number":["460197019"]}],"id":[{"id":"10.13039\/501100001659","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100006785","name":"Google","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100006785","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Nat Comput Sci"],"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>Recent advancements in artificial intelligence have sparked interest in scientific assistants that could support researchers across the full spectrum of scientific workflows, from literature review to experimental design and data analysis. A key capability for such systems is the ability to process and reason about scientific information in both visual and textual forms\u2014from interpreting spectroscopic data to understanding laboratory set-ups. Here we introduce MaCBench, a comprehensive benchmark for evaluating how vision language models handle real-world chemistry and materials science tasks across three core aspects: data extraction, experimental execution and results interpretation. Through a systematic evaluation of leading models, we find that although these systems show promising capabilities in basic perception tasks\u2014achieving near-perfect performance in equipment identification and standardized data extraction\u2014they exhibit fundamental limitations in spatial reasoning, cross-modal information synthesis and multi-step logical inference. Our insights have implications beyond chemistry and materials science, suggesting that developing reliable multimodal AI scientific assistants may require advances in curating suitable training data and approaches to training those models.<\/jats:p>","DOI":"10.1038\/s43588-025-00836-3","type":"journal-article","created":{"date-parts":[[2025,8,11]],"date-time":"2025-08-11T09:02:44Z","timestamp":1754902964000},"page":"952-961","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":22,"title":["Probing the limitations of multimodal language models for chemistry and materials research"],"prefix":"10.1038","volume":"5","author":[{"ORCID":"https:\/\/orcid.org\/0009-0001-7697-7315","authenticated-orcid":false,"given":"Nawaf","family":"Alampara","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-4392-5918","authenticated-orcid":false,"given":"Mara","family":"Schilling-Wilhelmi","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1507-4048","authenticated-orcid":false,"given":"Marti\u00f1o","family":"R\u00edos-Garc\u00eda","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8808-4602","authenticated-orcid":false,"given":"Indrajeet","family":"Mandal","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-8817-9134","authenticated-orcid":false,"given":"Pranav","family":"Khetarpal","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Hargun Singh","family":"Grover","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1500-4947","authenticated-orcid":false,"given":"N. M. Anoop","family":"Krishnan","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4894-4660","authenticated-orcid":false,"given":"Kevin Maik","family":"Jablonka","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2025,8,11]]},"reference":[{"key":"836_CR1","doi-asserted-by":"publisher","first-page":"3924","DOI":"10.1038\/s41467-023-39531-0","volume":"14","author":"B Mahjour","year":"2023","unstructured":"Mahjour, B. et al. Rapid planning and analysis of high-throughput experiment arrays for reaction discovery. Nat. Commun. 14, 3924 (2023).","journal-title":"Nat. Commun."},{"key":"836_CR2","doi-asserted-by":"publisher","first-page":"15691","DOI":"10.1021\/acscatal.3c03864","volume":"13","author":"J Lu","year":"2023","unstructured":"Lu, J. & Leitch, D. C. Organopalladium catalysis as a proving ground for data-rich approaches to reaction development and quantitative predictions. ACS Catal. 13, 15691\u201315707 (2023).","journal-title":"ACS Catal."},{"key":"836_CR3","doi-asserted-by":"publisher","first-page":"1082","DOI":"10.1038\/s44160-023-00351-1","volume":"2","author":"N Gesmundo","year":"2023","unstructured":"Gesmundo, N. et al. Miniaturization of popular reactions from the medicinal chemists\u2019 toolbox for ultrahigh-throughput experimentation. Nat. Synth. 2, 1082\u20131091 (2023).","journal-title":"Nat. Synth."},{"key":"836_CR4","doi-asserted-by":"publisher","first-page":"680","DOI":"10.1038\/s41586-022-05263-2","volume":"610","author":"CC Wagen","year":"2022","unstructured":"Wagen, C. C., McMinn, S. E., Kwan, E. E. & Jacobsen, E. N. Screening for generality in asymmetric catalysis. Nature 610, 680\u2013686 (2022).","journal-title":"Nature"},{"key":"836_CR5","unstructured":"Microsoft Research AI4Science & Microsoft Azure Quantum. The impact of large language models on scientific discovery: a preliminary study using GPT-4. Preprint at https:\/\/arxiv.org\/abs\/2311.07361 (2023)."},{"key":"836_CR6","unstructured":"Jimenez, C. E. et al. SWE-Bench: can language models resolve real-world github issues? In The Twelfth International Conference on Learning Representations 6476 (ICLR, 2024)."},{"key":"836_CR7","unstructured":"Laurent, J. M. et al. LAB-Bench: measuring capabilities of language models for biology research. Preprint at https:\/\/arxiv.org\/abs\/2407.10362 (2024)."},{"key":"836_CR8","doi-asserted-by":"publisher","first-page":"991","DOI":"10.1038\/s42256-025-01058-y","volume":"7","author":"S Miret","year":"2025","unstructured":"Miret, S. & Krishnan, N. M. A. Enabling large language models for real-world materials discovery. Nat. Mach. Intell. 7, 991\u2013998 (2025).","journal-title":"Nat. Mach. Intell."},{"key":"836_CR9","doi-asserted-by":"publisher","first-page":"457","DOI":"10.1038\/s41570-023-00502-0","volume":"7","author":"AD White","year":"2023","unstructured":"White, A. D. The future of chemistry is language. Nat. Rev. Chem. 7, 457\u2013458 (2023).","journal-title":"Nat. Rev. Chem."},{"key":"836_CR10","doi-asserted-by":"publisher","first-page":"1233","DOI":"10.1039\/D3DD00113J","volume":"2","author":"KM Jablonka","year":"2023","unstructured":"Jablonka, K. M. et al. 14 Examples of how LLMs can transform materials science and chemistry: a reflection on a large language model hackathon. Digit. Discov. 2, 1233\u20131250 (2023).","journal-title":"Digit. Discov."},{"key":"836_CR11","doi-asserted-by":"publisher","first-page":"2514","DOI":"10.1039\/D4SC03921A","volume":"16","author":"MC Ramos","year":"2025","unstructured":"Ramos, M. C., Collison, C. J. & White, A. D. A review of large language models and autonomous agents in chemistry. Chem. Sci. 16, 2514\u20132572 (2025).","journal-title":"Chem. Sci."},{"key":"836_CR12","unstructured":"Bushuiev, R. et al. MassSpecGym: a benchmark for the discovery and identification of molecules. In The Thirty-eight Conference on Neural Information Processing Systems Datasets and Benchmarks Track 2132 (NeurIPS, 2024)."},{"key":"836_CR13","unstructured":"One-third of college students used ChatGPT for schoolwork during the 2022\u201323 academic date. Intelligent (5 September 2023); https:\/\/www.intelligent.com\/one-third-of-college-students-used-chatgpt-for-schoolwork-during-the-2022-23-academic-date\/"},{"key":"836_CR14","doi-asserted-by":"publisher","first-page":"189","DOI":"10.1038\/s42256-022-00465-9","volume":"4","author":"F Urbina","year":"2022","unstructured":"Urbina, F., Lentzos, F., Invernizzi, C. & Ekins, S. Dual use of artificial-intelligence-powered drug discovery. Nat. Mach. Intell. 4, 189\u2013191 (2022).","journal-title":"Nat. Mach. Intell."},{"key":"836_CR15","unstructured":"Campbell, Q. L., Herington, J. & White, A. D. Censoring chemical data to mitigate dual use risk. Preprint at https:\/\/arxiv.org\/abs\/2304.10510 (2023)."},{"key":"836_CR16","doi-asserted-by":"publisher","first-page":"1125","DOI":"10.1039\/D4CS00913D","volume":"54","author":"M Schilling-Wilhelmi","year":"2025","unstructured":"Schilling-Wilhelmi, M. et al. From text to insight: large language models for chemical data extraction. Chem. Soc. Rev. 54, 1125\u20131150 (2025).","journal-title":"Chem. Soc. Rev."},{"key":"836_CR17","doi-asserted-by":"publisher","first-page":"1569","DOI":"10.1038\/s41467-024-45914-8","volume":"15","author":"MP Polak","year":"2024","unstructured":"Polak, M. P. & Morgan, D. Extracting accurate materials data from research papers with conversational language models and prompt engineering. Nat. Commun. 15, 1569 (2024).","journal-title":"Nat. Commun."},{"key":"836_CR18","unstructured":"Schilling-Wilhelmi, M. & Jablonka, K. M. Using machine-learning and large-language-model extracted data to predict copolymerizations. Preprint at https:\/\/openreview.net\/forum?id=zlutCyZ12H (2024)."},{"key":"836_CR19","doi-asserted-by":"publisher","first-page":"1822","DOI":"10.1039\/D4DD00091A","volume":"3","author":"Q Ai","year":"2024","unstructured":"Ai, Q., Meng, F., Shi, J., Pelkie, B. & Coley, C. W. Extracting structured data from organic synthesis procedures using a fine-tuned large language model. Digital Discovery 3, 1822\u20131831 (2024).","journal-title":"Digital Discovery"},{"key":"836_CR20","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/s41467-024-45563-x","volume":"15","author":"J Dagdelen","year":"2024","unstructured":"Dagdelen, J. et al. Structured information extraction from scientific text with large language models. Nat. Commun. 15, 1\u201312 (2024).","journal-title":"Nat. Commun."},{"key":"836_CR21","doi-asserted-by":"crossref","unstructured":"Caufield, J. H. et al. Structured prompt interrogation and recursive extraction of semantics (SPIRES): a method for populating knowledge bases using zero-shot learning. Bioinformatics40, btae104 (2024).","DOI":"10.1093\/bioinformatics\/btae104"},{"key":"836_CR22","unstructured":"Skarlinski, M. D. et al. Language agents achieve superhuman synthesis of scientific knowledge. Preprint at https:\/\/arxiv.org\/abs\/2409.13740 (2024)."},{"key":"836_CR23","doi-asserted-by":"crossref","unstructured":"Gupta, T., Zaki, M. & Krishnan, N. et al. Discomat: distantly supervised composition extraction from tables in materials science articles. In Proc. 61st Annual Meeting of the Association for Computational Linguistics (eds Rogers, A., Boyd-Graber, J. & Okazaki, N.) 13465\u201313483 (Association for Computational Linguistics, 2023).","DOI":"10.18653\/v1\/2023.acl-long.753"},{"key":"836_CR24","doi-asserted-by":"publisher","first-page":"161","DOI":"10.1038\/s42256-023-00788-1","volume":"6","author":"KM Jablonka","year":"2024","unstructured":"Jablonka, K. M., Schwaller, P., Ortega-Guerrero, A. & Smit, B. Leveraging large language models for predictive chemistry. Nat. Mach. Intell. 6, 161\u2013169 (2024).","journal-title":"Nat. Mach. Intell."},{"key":"836_CR25","unstructured":"Ramos, M. C., Michtavy, S. S., Porosoff, M. D. & White, A. D. Bayesian optimization of catalysts with in-context learning. Preprint at https:\/\/arxiv.org\/abs\/2304.05341 (2023)."},{"key":"836_CR26","unstructured":"Zhong, Z., Zhou, K. & Mottin, D. Benchmarking large language models for molecule prediction tasks. Preprint at https:\/\/arxiv.org\/abs\/2403.05075 (2024)."},{"key":"836_CR27","doi-asserted-by":"publisher","first-page":"500","DOI":"10.1039\/D3SC04610A","volume":"15","author":"Z Xie","year":"2024","unstructured":"Xie, Z. et al. Fine-tuning GPT-3 for machine learning electronic and functional properties of organic molecules. Chem. Sci. 15, 500\u2013510 (2024).","journal-title":"Chem. Sci."},{"key":"836_CR28","unstructured":"Kristiadi, A. et al. A Sober look at LLMs for material discovery: are they actually good for Bayesian optimization over molecules? In Proc. 41st International Conference on Machine Learning 1025 (ICML, 2024)."},{"key":"836_CR29","unstructured":"Gruver, N. et al. Fine-tuned language models generate stable inorganic materials as text. In The Twelfth International Conference on Learning Representations 5580 (ICLR, 2024)."},{"key":"836_CR30","unstructured":"Alampara, N., Miret, S. & Jablonka, K. M. MatText: do language models need more than text and scale for materials modeling? Preprint at https:\/\/arxiv.org\/abs\/2406.17295 (2024)."},{"key":"836_CR31","doi-asserted-by":"publisher","first-page":"570","DOI":"10.1038\/s41586-023-06792-0","volume":"624","author":"DA Boiko","year":"2023","unstructured":"Boiko, D. A., MacKnight, R., Kline, B. & Gomes, G. Autonomous chemical research with large language models. Nature 624, 570\u2013578 (2023).","journal-title":"Nature"},{"key":"836_CR32","doi-asserted-by":"publisher","first-page":"101897","DOI":"10.1016\/j.matt.2024.10.015","volume":"8","author":"K Darvish","year":"2025","unstructured":"Darvish, K. et al. Organa: a robotic assistant for automated chemistry experimentation and characterization. Matter 8, 101897 (2025).","journal-title":"Matter"},{"key":"836_CR33","doi-asserted-by":"publisher","first-page":"525","DOI":"10.1038\/s42256-024-00832-8","volume":"6","author":"A M. Bran","year":"2024","unstructured":"M. Bran, A. et al. Augmenting large language models with chemistry tools. Nat. Mach. Intell. 6, 525\u2013535 (2024).","journal-title":"Nat. Mach. Intell."},{"key":"836_CR34","doi-asserted-by":"crossref","unstructured":"Swanson, K., Wu, W., Bulaong, N. L., Pak, J. E. & Zou, J. The virtual lab: AI agents design new SARS-CoV-2 nanobodies with experimental validation. Preprint at https:\/\/www.biorxiv.org\/content\/10.1101\/2024.11.11.623004v1 (2024).","DOI":"10.1101\/2024.11.11.623004"},{"key":"836_CR35","unstructured":"Lu, P. et al. Learn to explain: multimodal reasoning via thought chains for science question answering. In Proc. 36th International Conference on Neural Information Processing Systems (NIPS '22) 2507\u20132521 (NeurIPS, 2022)."},{"key":"836_CR36","unstructured":"Gupta, H. et al. Polymath: a challenging multi-modal mathematical reasoning benchmark. Preprint at https:\/\/arxiv.org\/abs\/2410.14702 (2024)."},{"key":"836_CR37","doi-asserted-by":"crossref","unstructured":"Cheng, K. et al. Vision-language models can self-improve reasoning via reflection. In Proc. 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (eds Chiruzzo, L., Ritter, A. & Wang, L.) 8876\u20138892 (Association for Computational Linguistics, 2025).","DOI":"10.18653\/v1\/2025.naacl-long.447"},{"key":"836_CR38","unstructured":"Zou, C. et al. DynaMath: a dynamic visual benchmark for evaluating mathematical reasoning robustness of vision language models. In The Thirteenth International Conference on Learning Representations 13293 (ICLR, 2025)."},{"key":"836_CR39","unstructured":"Shao, H. et al. Visual CoT: advancing multi-modal language models with a comprehensive dataset and benchmark for chain-of-thought reasoning. Preprint at https:\/\/arxiv.org\/abs\/2403.16999 (2024)."},{"key":"836_CR40","doi-asserted-by":"publisher","first-page":"1027","DOI":"10.1038\/s41557-025-01815-x","volume":"17","author":"A Mirza","year":"2025","unstructured":"Mirza, A. et al. A framework for evaluating the chemical knowledge and reasoning abilities of large language models against the expertise of chemists. Nat. Chem. 17, 1027\u20131034 (2025).","journal-title":"Nat. Chem."},{"key":"836_CR41","doi-asserted-by":"publisher","first-page":"313","DOI":"10.1039\/D3DD00188A","volume":"3","author":"M Zaki","year":"2024","unstructured":"Zaki, M. & Krishnan, N. M. A. MaScQA: investigating materials science knowledge of large language models. Digital Discov. 3, 313\u2013327 (2024).","journal-title":"Digital Discov."},{"key":"836_CR42","unstructured":"Wang, X. et al. SciBench: evaluating college-level scientific problem-solving abilities of large language models. In Proc. 41st International Conference on Machine Learning (eds Salakhutdinov, R. et al.) 50622\u201350649 (PMLR, 2024)."},{"key":"836_CR43","doi-asserted-by":"crossref","unstructured":"Zhang, R. et al. MathVerse: does your multi-modal LLM truly see the diagrams in visual math problems? in Computer Vision \u2013 ECCV 2024. ECCV 2024. Lecture Notes in Computer Science (eds Leonardis, A. et al.) Vol. 15066 (Springer, 2025).","DOI":"10.1007\/978-3-031-73242-3_10"},{"key":"836_CR44","doi-asserted-by":"crossref","unstructured":"Barrett, A. M., Jackson, K., Murphy, E. R., Madkour, N. & Newman, J. Benchmark early and red team often: a framework for assessing and managing dual-use hazards of AI foundation models. Preprint at https:\/\/arxiv.org\/abs\/2405.10986 (2024).","DOI":"10.70777\/si.v1i1.10601"},{"key":"836_CR45","unstructured":"Sandbrink, J. B. Artificial intelligence and biological misuse: differentiating risks of language models and biological design tools. Preprint at https:\/\/arxiv.org\/abs\/2306.13952 (2023)."},{"key":"836_CR46","unstructured":"McCoy, R. T., Yao, S., Friedman, D., Hardy, M. & Griffiths, T. L. Embers of autoregression: understanding large language models through the problem they are trained to solve. Preprint at https:\/\/arxiv.org\/abs\/2309.13638 (2023)."},{"key":"836_CR47","unstructured":"Anil, C. et al. Exploring length generalization in large language models. In 36th Conference on Neural Information Processing Systems (NeurIPS, 2022)."},{"key":"836_CR48","doi-asserted-by":"publisher","first-page":"365","DOI":"10.1038\/s41557-022-00910-7","volume":"14","author":"KM Jablonka","year":"2022","unstructured":"Jablonka, K. M., Patiny, L. & Smit, B. Making the collective knowledge of chemistry open and machine actionable. Nat. Chem. 14, 365\u2013376 (2022).","journal-title":"Nat. Chem."},{"key":"836_CR49","doi-asserted-by":"publisher","first-page":"9633","DOI":"10.1021\/acs.chemrev.4c00055","volume":"124","author":"G Tom","year":"2024","unstructured":"Tom, G. et al. Self-driving laboratories for chemistry and materials science. Chem. Rev. 124, 9633\u20139732 (2024).","journal-title":"Chem. Rev."},{"key":"836_CR50","unstructured":"Srivastava, A. et al. Beyond the imitation game: quantifying and extrapolating the capabilities of language models. In Transactions on Machine Learning Research 783 (TMLR, 2023)."},{"key":"836_CR51","unstructured":"ProtectAI.com. Fine-Tuned Distilroberta-Base for Rejection in the Output Detection (ProjectAI, 2024); https:\/\/huggingface.co\/ProtectAI\/distilroberta-base-rejection-v1"},{"key":"836_CR52","doi-asserted-by":"crossref","unstructured":"Alampara, N., Schilling-Wilhelmi, M. & Jablonka, K. M. Lessons from the trenches on evaluating machine-learning systems in materials science. Preprint at https:\/\/www.arxiv.org\/abs\/2503.10837 (2025).","DOI":"10.1016\/j.commatsci.2025.114041"},{"key":"836_CR53","unstructured":"lamalab-org\/macbench (GitHub, 2025); https:\/\/github.com\/lamalab-org\/macbench\/blob\/main\/eval-card.md"},{"key":"836_CR54","doi-asserted-by":"publisher","unstructured":"Jablonka, K. et al. MaCBench Revision feb8c43 (Hugging Face, 2025); https:\/\/doi.org\/10.57967\/hf\/4611","DOI":"10.57967\/hf\/4611"},{"key":"836_CR55","doi-asserted-by":"publisher","unstructured":"Jablonka, K. et al. MaCBench-Ablations Revision c52701f (Hugging Face, 2025); https:\/\/doi.org\/10.57967\/hf\/4612","DOI":"10.57967\/hf\/4612"},{"key":"836_CR56","unstructured":"lamalab-org\/chembench(GitHub, 2025); https:\/\/github.com\/lamalab-org\/chembench\/"},{"key":"836_CR57","doi-asserted-by":"publisher","unstructured":"ChemBench authors. Chembench v.0.3.0 Zenodo https:\/\/doi.org\/10.5281\/zenodo.14935487 (2025).","DOI":"10.5281\/zenodo.14935487"},{"key":"836_CR58","unstructured":"Pseudomanifold\/latex-credits (GitHub, 2025); https:\/\/github.com\/Pseudomanifold\/latex-credits"}],"updated-by":[{"DOI":"10.1038\/s43588-025-00869-8","type":"correction","label":"Correction","source":"publisher","updated":{"date-parts":[[2025,8,21]],"date-time":"2025-08-21T00:00:00Z","timestamp":1755734400000}}],"container-title":["Nature Computational Science"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.nature.com\/articles\/s43588-025-00836-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s43588-025-00836-3","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s43588-025-00836-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T03:03:05Z","timestamp":1760065385000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.nature.com\/articles\/s43588-025-00836-3"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,8,11]]},"references-count":58,"journal-issue":{"issue":"10","published-online":{"date-parts":[[2025,10]]}},"alternative-id":["836"],"URL":"https:\/\/doi.org\/10.1038\/s43588-025-00836-3","relation":{"correction":[{"id-type":"doi","id":"10.1038\/s43588-025-00869-8","asserted-by":"object"}]},"ISSN":["2662-8457"],"issn-type":[{"value":"2662-8457","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,8,11]]},"assertion":[{"value":"27 November 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"10 June 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"11 August 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"21 August 2025","order":4,"name":"change_date","label":"Change Date","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"Correction","order":5,"name":"change_type","label":"Change Type","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"A Correction to this paper has been published:","order":6,"name":"change_details","label":"Change Details","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"https:\/\/doi.org\/10.1038\/s43588-025-00869-8","URL":"https:\/\/doi.org\/10.1038\/s43588-025-00869-8","order":7,"name":"change_details","label":"Change Details","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"K.M.J. has been a paid contractor for OpenAI (as part of the red teaming network). The remaining authors declare no competing interests.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}]}}