{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,13]],"date-time":"2026-06-13T00:24:56Z","timestamp":1781310296253,"version":"3.54.1"},"reference-count":49,"publisher":"IOP Publishing","issue":"2","license":[{"start":{"date-parts":[[2025,5,14]],"date-time":"2025-05-14T00:00:00Z","timestamp":1747180800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"},{"start":{"date-parts":[[2025,5,14]],"date-time":"2025-05-14T00:00:00Z","timestamp":1747180800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/iopscience.iop.org\/info\/page\/text-and-data-mining"}],"funder":[{"name":"Schmidt Sciences AI2050 Early Career Fellowship"},{"DOI":"10.13039\/100000105","name":"Office of Advanced Cyberinfrastructure","doi-asserted-by":"crossref","award":["#2118201"],"award-info":[{"award-number":["#2118201"]}],"id":[{"id":"10.13039\/100000105","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["iopscience.iop.org"],"crossmark-restriction":false},"short-container-title":["Mach. Learn.: Sci. Technol."],"published-print":{"date-parts":[[2025,6,30]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Large language models (LLMs) are increasingly being used in materials science. However, little attention has been given to benchmarking and standardized evaluation for LLM-based materials property prediction, which hinders progress. We present LLM4Mat-Bench, the largest benchmark to date for evaluating the performance of LLMs in predicting the properties of crystalline materials. LLM4Mat-Bench contains about 1.9\u2009M crystal structures in total, collected from 10 publicly available materials data sources, and 45 distinct properties. LLM4Mat-Bench features different input modalities: crystal composition, CIF, and crystal text description, with 4.7\u2009M, 615.5\u2009M, and 3.1B tokens in total for each modality, respectively. We use LLM4Mat-Bench to fine-tune models with different sizes, including LLM-Prop and MatBERT, and provide zero-shot and few-shot prompts to evaluate the property prediction capabilities of LLM-chat-like models, including Llama, Gemma, and Mistral. The results highlight the challenges of general-purpose LLMs in materials science and the need for task-specific predictive models and task-specific instruction-tuned LLMs in materials property prediction<jats:sup>7<\/jats:sup>\n                  <jats:fn id=\"mlstadd3bbfn2\">\n                     <jats:label>7<\/jats:label>\n                     <jats:p>The Benchmark and code can be found at: <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/github.com\/vertaix\/LLM4Mat-Bench\">https:\/\/github.com\/vertaix\/LLM4Mat-Bench<\/jats:ext-link>.<\/jats:p>\n                  <\/jats:fn>.<\/jats:p>","DOI":"10.1088\/2632-2153\/add3bb","type":"journal-article","created":{"date-parts":[[2025,5,2]],"date-time":"2025-05-02T22:57:47Z","timestamp":1746226667000},"page":"020501","update-policy":"https:\/\/doi.org\/10.1088\/crossmark-policy","source":"Crossref","is-referenced-by-count":11,"title":["LLM4Mat-bench: benchmarking large language models for materials property prediction"],"prefix":"10.1088","volume":"6","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3608-2039","authenticated-orcid":true,"given":"Andre","family":"Niyongabo Rubungo","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4471-8527","authenticated-orcid":false,"given":"Kangming","family":"Li","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2937-3188","authenticated-orcid":false,"given":"Jason","family":"Hattrick-Simpers","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5687-3554","authenticated-orcid":false,"given":"Adji","family":"Bousso Dieng","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"266","published-online":{"date-parts":[[2025,5,14]]},"reference":[{"key":"mlstadd3bbbib1","article-title":"Gpt-4 technical report","author":"Achiam","year":"2023"},{"key":"mlstadd3bbbib2","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/s41467-024-54639-7","article-title":"Crystal structure generation with autoregressive large language modeling","volume":"15","author":"Antunes","year":"2024","journal-title":"Nat. Comm."},{"key":"mlstadd3bbbib3","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0171501","article-title":"Organic materials database: an open-access online database for data mining","volume":"12","author":"Borysov","year":"2017","journal-title":"PLoS One"},{"key":"mlstadd3bbbib4","doi-asserted-by":"publisher","first-page":"1649","DOI":"10.1021\/acs.jcim.3c00285","article-title":"Do large language models understand chemistry? a conversation with chatgpt","volume":"63","author":"Castro Nascimento","year":"2023","journal-title":"J. Chem. Inf. Model."},{"key":"mlstadd3bbbib5","article-title":"Llamp: large language model made powerful for high-fidelity materials knowledge retrieval and distillation","author":"Chiang","year":"2024"},{"key":"mlstadd3bbbib6","doi-asserted-by":"publisher","first-page":"6909","DOI":"10.1021\/acs.jpclett.4c01126","article-title":"Atomgpt: atomistic generative pretrained transformer for forward and inverse materials design","volume":"15","author":"Choudhary","year":"2024","journal-title":"J. Phys. Chem. Lett."},{"key":"mlstadd3bbbib7","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/s41524-021-00650-1","article-title":"Atomistic line graph neural network for improved materials property predictions","volume":"7","author":"Choudhary","year":"2021","journal-title":"npj Comput. Mater."},{"key":"mlstadd3bbbib8","doi-asserted-by":"publisher","first-page":"5179","DOI":"10.1038\/s41598-017-05402-0","article-title":"High-throughput identification and characterization of two-dimensional materials using density functional theory","volume":"7","author":"Choudhary","year":"2017","journal-title":"Sci. Rep."},{"key":"mlstadd3bbbib9","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/sdata.2018.82","article-title":"Computational screening of high-performance optoelectronic materials using optb88vdw and tb-mbj formalisms","volume":"5","author":"Choudhary","year":"2018","journal-title":"Sci. Data"},{"key":"mlstadd3bbbib10","doi-asserted-by":"publisher","first-page":"93","DOI":"10.1038\/s41524-024-01259-w","article-title":"Jarvis-leaderboard: a large scale benchmark of materials design methods","volume":"10","author":"Choudhary","year":"2024","journal-title":"npj Comput. Mater."},{"key":"mlstadd3bbbib11","first-page":"507","article-title":"Crysmmnet: multimodal representation for crystal property prediction","author":"Das","year":"2023"},{"key":"mlstadd3bbbib12","doi-asserted-by":"publisher","first-page":"4171","DOI":"10.18653\/v1\/N19-1423","article-title":"Bert: pre-training of deep bidirectional transformers for language understanding","volume":"Volume 1 (Long and Short Papers)","author":"Devlin","year":"2019"},{"key":"mlstadd3bbbib13","article-title":"The llama 3 herd of models","author":"Dubey","year":"2024"},{"key":"mlstadd3bbbib14","doi-asserted-by":"publisher","first-page":"138","DOI":"10.1038\/s41524-020-00406-3","article-title":"Benchmarking materials property prediction methods: the matbench test set and automatminer reference algorithm","volume":"6","author":"Dunn","year":"2020","journal-title":"npj Comput. Mater."},{"key":"mlstadd3bbbib15","first-page":"375","article-title":"Translation between molecules and natural language","author":"Edwards","year":"2022"},{"key":"mlstadd3bbbib16","article-title":"Mol-instructions-a large-scale biomolecular instruction dataset for large language models","author":"Fang","year":"2023"},{"key":"mlstadd3bbbib17","article-title":"Language models can generate molecules, materials, and protein binding sites directly in three dimensions as xyz, cif, and pdb files","author":"Flam-Shepherd","year":"2023"},{"key":"mlstadd3bbbib18","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevMaterials.7.044603","article-title":"Fast and accurate prediction of material properties with three-body tight-binding model for the periodic table","volume":"7","author":"Garrity","year":"2023","journal-title":"Phys. Rev. Mater."},{"key":"mlstadd3bbbib19","article-title":"xval: a continuous number encoding for large language models","author":"Golkar","year":"2023"},{"key":"mlstadd3bbbib20","article-title":"Fine-tuned language models generate stable inorganic materials as text","author":"Gruver","year":"2023"},{"key":"mlstadd3bbbib21","doi-asserted-by":"publisher","first-page":"655","DOI":"10.1107\/S010876739101067X","article-title":"The crystallographic information file (cif): a new standard archive file for crystallography","volume":"47","author":"Hall","year":"1991","journal-title":"Found. Crystallogr."},{"key":"mlstadd3bbbib22","doi-asserted-by":"publisher","DOI":"10.1063\/1.4812323","article-title":"The materials project: A materials genome approach to accelerating materials innovation","volume":"1","author":"Jain","year":"2013","journal-title":"APL Mater."},{"key":"mlstadd3bbbib23","article-title":"Mistral 7b","author":"Jiang","year":"2023"},{"key":"mlstadd3bbbib24","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/npjcompumats.2015.10","article-title":"The open quantum materials database (oqmd): assessing the accuracy of dft formation energies","volume":"1","author":"Kirklin","year":"2015","journal-title":"npj Comput. Mater."},{"key":"mlstadd3bbbib25","doi-asserted-by":"publisher","DOI":"10.1016\/j.patter.2023.100803","article-title":"Accurate, interpretable predictions of materials properties within transformer language models","volume":"4","author":"Korolev","year":"2023","journal-title":"Patterns"},{"key":"mlstadd3bbbib26","doi-asserted-by":"publisher","first-page":"12412","DOI":"10.1039\/D4TA00982G","article-title":"Efficient first principles based modeling via machine learning: from simple representations to high entropy materials","volume":"12","author":"Li","year":"2024","journal-title":"J. Mater. Chem. A"},{"key":"mlstadd3bbbib27","doi-asserted-by":"publisher","first-page":"9","DOI":"10.1038\/s43246-024-00731-w","article-title":"Probing out-of-distribution generalization in machine learning for materials","volume":"6","author":"Li","year":"2025","journal-title":"Comm. Mat."},{"key":"mlstadd3bbbib28","article-title":"Language models of protein sequences at the scale of evolution enable accurate structure prediction","volume":"2022","author":"Lin","year":"2022"},{"key":"mlstadd3bbbib29","article-title":"Deepseek-v3 technical report","author":"Liu","year":"2024"},{"key":"mlstadd3bbbib30","article-title":"Prollama: a protein large language model for multi-task protein language processing","author":"Lv","year":"2024"},{"key":"mlstadd3bbbib31","doi-asserted-by":"publisher","first-page":"80","DOI":"10.1038\/s41586-023-06735-9","article-title":"Scaling deep learning for materials discovery","volume":"624","author":"Merchant","year":"2023","journal-title":"Nature"},{"key":"mlstadd3bbbib32","doi-asserted-by":"publisher","DOI":"10.1016\/j.patter.2022.100491","article-title":"Scalable deeper graph neural networks for high-performance materials property prediction","volume":"3","author":"Omee","year":"2022","journal-title":"Patterns"},{"key":"mlstadd3bbbib33","doi-asserted-by":"publisher","first-page":"58","DOI":"10.1038\/s41524-024-01231-8","article-title":"Leveraging language representation for materials exploration and discovery","volume":"10","author":"Qu","year":"2024","journal-title":"npj Comput. Mater."},{"key":"mlstadd3bbbib34","article-title":"Improving language understanding by generative pre-training","author":"Radford","year":"2018"},{"key":"mlstadd3bbbib35","first-page":"1","article-title":"Exploring the limits of transfer learning with a unified text-to-text transformer","volume":"21","author":"Raffel","year":"2020","journal-title":"J. Mach. Learn. Res."},{"key":"mlstadd3bbbib36","doi-asserted-by":"publisher","first-page":"1578","DOI":"10.1016\/j.matt.2021.02.015","article-title":"Machine learning the quantum-chemical properties of metal\u2013organic frameworks for accelerated materials discovery","volume":"4","author":"Rosen","year":"2021","journal-title":"Matter"},{"key":"mlstadd3bbbib37","doi-asserted-by":"publisher","first-page":"112","DOI":"10.1038\/s41524-022-00796-6","article-title":"High-throughput predictions of metal\u2013organic framework electronic properties: theoretical challenges, graph neural networks and data exploration","volume":"8","author":"Rosen","year":"2022","journal-title":"npj Comput. Mater."},{"key":"mlstadd3bbbib38","article-title":"Llm-prop: predicting physical and electronic properties of crystalline solids from their text descriptions","author":"Rubungo","year":"2023"},{"key":"mlstadd3bbbib39","doi-asserted-by":"publisher","first-page":"369","DOI":"10.1117\/12.2520589","article-title":"Super-convergence: very fast training of neural networks using large learning rates","volume":"vol 11006","author":"Smith","year":"2019"},{"key":"mlstadd3bbbib40","article-title":"Gemini 1.5: unlocking multimodal understanding across millions of tokens of context","author":"Team"},{"key":"mlstadd3bbbib41","article-title":"Gemma: open models based on gemini research and technology","author":"Team","year":"2024"},{"key":"mlstadd3bbbib42","article-title":"Gemma 2: improving open language models at a practical size","author":"Team","year":"2024"},{"key":"mlstadd3bbbib43","article-title":"What information is necessary and sufficient to predict materials properties using machine learning?","author":"Tian","year":"2022"},{"key":"mlstadd3bbbib44","article-title":"Llama 2: open foundation and fine-tuned chat models","author":"Touvron","year":"2023"},{"key":"mlstadd3bbbib45","doi-asserted-by":"publisher","DOI":"10.3389\/fbinf.2023.1304099","article-title":"The promises of large language models for protein design and modeling","volume":"3","author":"Valentini","year":"2023","journal-title":"Front. Bioinf."},{"key":"mlstadd3bbbib46","doi-asserted-by":"crossref","DOI":"10.2139\/ssrn.3950755","article-title":"The impact of domain-specific pre-training on named entity recognition tasks in materials science","author":"Walker","year":"2021"},{"key":"mlstadd3bbbib47","doi-asserted-by":"publisher","first-page":"83","DOI":"10.1038\/nchem.1192","article-title":"Large-scale screening of hypothetical metal\u2013organic frameworks","volume":"4","author":"Wilmer","year":"2012","journal-title":"Nat. Chem."},{"key":"mlstadd3bbbib48","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevLett.120.145301","article-title":"Crystal graph convolutional neural networks for an accurate and interpretable prediction of material properties","volume":"120","author":"Xie","year":"2018","journal-title":"Phys. Rev. Lett."},{"key":"mlstadd3bbbib49","article-title":"Darwin series: domain specific large language models for natural science","author":"Xie","year":"2023"}],"container-title":["Machine Learning: Science and Technology"],"original-title":[],"link":[{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/add3bb","content-type":"text\/html","content-version":"am","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/add3bb\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/add3bb","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/add3bb\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/add3bb\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/add3bb\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/add3bb\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"similarity-checking"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/add3bb\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,5,15]],"date-time":"2025-05-15T10:35:43Z","timestamp":1747305343000},"score":1,"resource":{"primary":{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/add3bb"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,5,14]]},"references-count":49,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2025,5,14]]},"published-print":{"date-parts":[[2025,6,30]]}},"URL":"https:\/\/doi.org\/10.1088\/2632-2153\/add3bb","relation":{},"ISSN":["2632-2153"],"issn-type":[{"value":"2632-2153","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,5,14]]},"assertion":[{"value":"LLM4Mat-bench: benchmarking large language models for materials property prediction","name":"article_title","label":"Article Title"},{"value":"Machine Learning: Science and Technology","name":"journal_title","label":"Journal Title"},{"value":"paper","name":"article_type","label":"Article Type"},{"value":"\u00a9 2025 The Author(s). Published by IOP Publishing Ltd","name":"copyright_information","label":"Copyright Information"},{"value":"2024-11-26","name":"date_received","label":"Date Received","group":{"name":"publication_dates","label":"Publication dates"}},{"value":"2025-05-01","name":"date_accepted","label":"Date Accepted","group":{"name":"publication_dates","label":"Publication dates"}},{"value":"2025-05-14","name":"date_epub","label":"Online publication date","group":{"name":"publication_dates","label":"Publication dates"}}]}}