{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,9]],"date-time":"2026-04-09T06:24:21Z","timestamp":1775715861375,"version":"3.50.1"},"reference-count":76,"publisher":"IOP Publishing","issue":"1","license":[{"start":{"date-parts":[[2024,1,10]],"date-time":"2024-01-10T00:00:00Z","timestamp":1704844800000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,1,10]],"date-time":"2024-01-10T00:00:00Z","timestamp":1704844800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/iopscience.iop.org\/info\/page\/text-and-data-mining"}],"funder":[{"DOI":"10.13039\/501100000266","name":"Engineering and Physical Sciences Research Council","doi-asserted-by":"crossref","award":["EP\/T517811\/1"],"award-info":[{"award-number":["EP\/T517811\/1"]}],"id":[{"id":"10.13039\/501100000266","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/100014013","name":"UK Research and Innovation","doi-asserted-by":"crossref","award":["EP\/X016188\/1"],"award-info":[{"award-number":["EP\/X016188\/1"]}],"id":[{"id":"10.13039\/100014013","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["iopscience.iop.org"],"crossmark-restriction":false},"short-container-title":["Mach. Learn.: Sci. Technol."],"published-print":{"date-parts":[[2024,3,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Machine learning (ML) based interatomic potentials have transformed the field of atomistic materials modelling. However, ML potentials depend critically on the quality and quantity of quantum-mechanical reference data with which they are trained, and therefore developing datasets and training pipelines is becoming an increasingly central challenge. Leveraging the idea of \u2018synthetic\u2019 (artificial) data that is common in other areas of ML research, we here show that synthetic atomistic data, themselves obtained at scale with an existing ML potential, constitute a useful pre-training task for neural-network (NN) interatomic potential models. Once pre-trained with a large synthetic dataset, these models can be fine-tuned on a much smaller, quantum-mechanical one, improving numerical accuracy and stability in computational practice. We demonstrate feasibility for a series of equivariant graph-NN potentials for carbon, and we carry out initial experiments to test the limits of the approach.<\/jats:p>","DOI":"10.1088\/2632-2153\/ad1626","type":"journal-article","created":{"date-parts":[[2023,12,15]],"date-time":"2023-12-15T22:30:15Z","timestamp":1702679415000},"page":"015003","update-policy":"https:\/\/doi.org\/10.1088\/crossmark-policy","source":"Crossref","is-referenced-by-count":18,"title":["Synthetic pre-training for neural-network interatomic potentials"],"prefix":"10.1088","volume":"5","author":[{"ORCID":"https:\/\/orcid.org\/0009-0006-7377-7146","authenticated-orcid":true,"given":"John L A","family":"Gardner","sequence":"first","affiliation":[]},{"given":"Kathryn T","family":"Baker","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6873-0278","authenticated-orcid":true,"given":"Volker L","family":"Deringer","sequence":"additional","affiliation":[]}],"member":"266","published-online":{"date-parts":[[2024,1,10]]},"reference":[{"key":"mlstad1626bib1","doi-asserted-by":"publisher","DOI":"10.1002\/anie.201703114","volume":"56","author":"Behler","year":"2017","journal-title":"Angew. Chem., Int. Ed."},{"key":"mlstad1626bib2","doi-asserted-by":"publisher","DOI":"10.1002\/adma.201902765","volume":"31","author":"Deringer","year":"2019","journal-title":"Adv. Mater."},{"key":"mlstad1626bib3","doi-asserted-by":"publisher","first-page":"361","DOI":"10.1146\/annurev-physchem-042018-052331","volume":"71","author":"No\u00e9","year":"2020","journal-title":"Annu. Rev. Phys. Chem."},{"key":"mlstad1626bib4","doi-asserted-by":"publisher","DOI":"10.1021\/acs.chemrev.0c01111","volume":"121","author":"Unke","year":"2021","journal-title":"Chem. Rev."},{"key":"mlstad1626bib5","doi-asserted-by":"publisher","first-page":"750","DOI":"10.1038\/s41563-020-0777-6","volume":"20","author":"Friederich","year":"2021","journal-title":"Nat. Mater."},{"key":"mlstad1626bib6","doi-asserted-by":"publisher","first-page":"217","DOI":"10.1038\/s41586-020-2677-y","volume":"585","author":"Cheng","year":"2020","journal-title":"Nature"},{"key":"mlstad1626bib7","doi-asserted-by":"publisher","DOI":"10.1002\/adma.202107515","volume":"34","author":"Zhou","year":"2022","journal-title":"Adv. Mater."},{"key":"mlstad1626bib8","doi-asserted-by":"publisher","first-page":"914","DOI":"10.1038\/s41557-022-00950-z","volume":"14","author":"Westermayr","year":"2022","journal-title":"Nat. Chem."},{"key":"mlstad1626bib9","doi-asserted-by":"publisher","DOI":"10.1038\/d41586-023-01445-8","article-title":"Synthetic data could be better than real data","author":"Savage","year":"2023"},{"key":"mlstad1626bib10","article-title":"Synthetic data from diffusion models improves imagenet classification","author":"Azizi","year":"2023"},{"key":"mlstad1626bib11","doi-asserted-by":"crossref","DOI":"10.1109\/ICCV51070.2023.00371","article-title":"Segment anything","author":"Kirillov","year":"2023"},{"key":"mlstad1626bib12","doi-asserted-by":"crossref","DOI":"10.18653\/v1\/2023.findings-acl.823","article-title":"Better language models of code through self-improvement","author":"To","year":"2023"},{"key":"mlstad1626bib13","author":"Zhang","year":"2022"},{"key":"mlstad1626bib14","doi-asserted-by":"publisher","first-page":"98","DOI":"10.1039\/D1DD00025J","volume":"1","author":"Aty","year":"2022","journal-title":"Digit. Discovery"},{"key":"mlstad1626bib15","doi-asserted-by":"publisher","first-page":"578","DOI":"10.1039\/D2DD00147K","volume":"2","author":"Anker","year":"2023","journal-title":"Digit. Discovery"},{"key":"mlstad1626bib16","doi-asserted-by":"publisher","first-page":"100","DOI":"10.1038\/s41524-023-01055-y","volume":"9","author":"Schuetzke","year":"2023","journal-title":"npj Comput. Mater."},{"key":"mlstad1626bib17","doi-asserted-by":"publisher","DOI":"10.1063\/5.0099929","volume":"157","author":"Morrow","year":"2022","journal-title":"J. Chem. Phys."},{"key":"mlstad1626bib18","doi-asserted-by":"publisher","first-page":"651","DOI":"10.1039\/D2DD00137C","volume":"2","author":"Gardner","year":"2023","journal-title":"Digit. Discovery"},{"key":"mlstad1626bib19","doi-asserted-by":"publisher","DOI":"10.1039\/D3CC02265J","volume":"59","author":"Faure Beaulieu","year":"2023","journal-title":"Chem. Commun."},{"key":"mlstad1626bib20","author":"Kelvinius","year":"2023"},{"key":"mlstad1626bib21","doi-asserted-by":"publisher","first-page":"5077","DOI":"10.1021\/acs.jctc.3c00289","volume":"19","author":"Wang","year":"2023","journal-title":"J. Chem. Theory Comput."},{"key":"mlstad1626bib22","article-title":"Deep unsupervised learning using nonequilibrium thermodynamics","author":"Sohl-Dickstein","year":"2015"},{"key":"mlstad1626bib23","article-title":"Denoising diffusion probabilistic models","author":"Ho","year":"2020"},{"key":"mlstad1626bib24","article-title":"Pre-training via denoising for molecular property prediction","author":"Zaidi","year":"2022"},{"key":"mlstad1626bib25","doi-asserted-by":"crossref","DOI":"10.1021\/acs.jctc.3c00702","article-title":"Two for one: diffusion models and force fields for coarse-grained molecular dynamics","author":"Arts","year":"2023"},{"key":"mlstad1626bib26","first-page":"pp 14839","volume":"vol 35","author":"Shui","year":"2022"},{"key":"mlstad1626bib27","article-title":"A comprehensive survey on transfer learning","author":"Zhuang","year":"2019"},{"key":"mlstad1626bib28","first-page":"pp 213","author":"Saenko","year":"2010","edition":"ed"},{"key":"mlstad1626bib29","first-page":"pp 3156","author":"Vinyals","year":"2015"},{"key":"mlstad1626bib30","first-page":"pp 1041","author":"Sharma","year":"2007"},{"key":"mlstad1626bib31","doi-asserted-by":"publisher","first-page":"7","DOI":"10.1145\/2746230","volume":"34","author":"Tang","year":"2016","journal-title":"ACM Trans. Inf. Syst."},{"key":"mlstad1626bib32","doi-asserted-by":"publisher","first-page":"2903","DOI":"10.1038\/s41467-019-10827-4","volume":"10","author":"Smith","year":"2019","journal-title":"Nat. Commun."},{"key":"mlstad1626bib33","article-title":"DPA-1: pretraining of attention-based deep potential model for molecular simulation","author":"Zhang","year":"2022"},{"key":"mlstad1626bib34","doi-asserted-by":"publisher","first-page":"4510","DOI":"10.1021\/acs.jctc.2c01203","volume":"19","author":"Chen","year":"2023","journal-title":"J. Chem. Theory Comput."},{"key":"mlstad1626bib35","doi-asserted-by":"publisher","first-page":"5383","DOI":"10.1039\/D2CP05793J","article-title":"Transfer learning for chemically accurate interatomic neural network potentials","volume":"25","author":"Zaverkin","year":"2023","journal-title":"Phys. Chem. Chem. Phys."},{"key":"mlstad1626bib36","doi-asserted-by":"publisher","DOI":"10.1016\/j.isci.2022.105231","volume":"25","author":"Li","year":"2022","journal-title":"iScience"},{"key":"mlstad1626bib37","doi-asserted-by":"publisher","DOI":"10.1063\/1.5020710","volume":"148","author":"Faber","year":"2018","journal-title":"J. Chem. Phys."},{"key":"mlstad1626bib38","doi-asserted-by":"publisher","first-page":"30","DOI":"10.1021\/acs.jpclett.8b02805","volume":"10","author":"Fias","year":"2019","journal-title":"J. Phys. Chem. Lett."},{"key":"mlstad1626bib39","doi-asserted-by":"publisher","first-page":"251","DOI":"10.1016\/0893-6080(91)90009-T","volume":"4","author":"Hornik","year":"1991","journal-title":"Neural Netw."},{"key":"mlstad1626bib40","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevLett.98.146401","volume":"98","author":"Behler","year":"2007","journal-title":"Phys. Rev. Lett."},{"key":"mlstad1626bib41","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevB.87.184115","volume":"87","author":"Bart\u00f3k","year":"2013","journal-title":"Phys. Rev. B"},{"key":"mlstad1626bib42","doi-asserted-by":"publisher","first-page":"178","DOI":"10.1016\/j.cpc.2018.03.016","volume":"228","author":"Wang","year":"2018","journal-title":"Comput. Phys. Commun."},{"key":"mlstad1626bib43","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevB.99.014104","volume":"99","author":"Drautz","year":"2019","journal-title":"Phys. Rev. B"},{"key":"mlstad1626bib44","doi-asserted-by":"publisher","DOI":"10.1063\/1.5019779","volume":"148","author":"Sch\u00fctt","year":"2018","journal-title":"J. Chem. Phys."},{"key":"mlstad1626bib45","doi-asserted-by":"publisher","first-page":"3678","DOI":"10.1021\/acs.jctc.9b00181","volume":"15","author":"Unke","year":"2019","journal-title":"J. Chem. Theory Comput."},{"key":"mlstad1626bib46","doi-asserted-by":"publisher","first-page":"398","DOI":"10.1038\/s41467-020-20427-2","volume":"12","author":"Ko","year":"2021","journal-title":"Nat. Commun."},{"key":"mlstad1626bib47","article-title":"Equivariant message passing for the prediction of tensorial properties and molecular spectra","author":"Sch\u00fctt","year":"2021"},{"key":"mlstad1626bib48","doi-asserted-by":"publisher","first-page":"2453","DOI":"10.1038\/s41467-022-29939-5","volume":"13","author":"Batzner","year":"2022","journal-title":"Nat. Commun."},{"key":"mlstad1626bib49","first-page":"pp 11423","volume":"vol 35","author":"Batatia","year":"2022"},{"key":"mlstad1626bib50","article-title":"The design space of E(3)-equivariant atom-centered interatomic potentials","author":"Batatia","year":"2022"},{"key":"mlstad1626bib51","article-title":"TensorNet: cartesian tensor representations for efficient learning of molecular potentials","author":"Simeon","year":"2023"},{"key":"mlstad1626bib52","article-title":"AutoFreeze: automatically freezing model blocks to accelerate fine-tuning","author":"Liu","year":"2021"},{"key":"mlstad1626bib53","article-title":"The lottery ticket hypothesis: finding sparse, trainable neural networks","author":"Frankle","year":"2019"},{"key":"mlstad1626bib54","doi-asserted-by":"crossref","DOI":"10.18653\/v1\/P18-1031","article-title":"Universal language model fine-tuning for text classification","author":"Howard","year":"2018"},{"key":"mlstad1626bib55","doi-asserted-by":"publisher","DOI":"10.1016\/j.cpc.2021.108171","volume":"271","author":"Thompson","year":"2022","journal-title":"Comput. Phys. Commun."},{"key":"mlstad1626bib56","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevB.95.094203","volume":"95","author":"Deringer","year":"2017","journal-title":"Phys. Rev. B"},{"key":"mlstad1626bib57","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevLett.104.136403","volume":"104","author":"Bart\u00f3k","year":"2010","journal-title":"Phys. Rev. Lett."},{"key":"mlstad1626bib58","doi-asserted-by":"publisher","DOI":"10.1063\/5.0005084","volume":"153","author":"Rowe","year":"2020","journal-title":"J. Chem. Phys."},{"key":"mlstad1626bib59","doi-asserted-by":"publisher","first-page":"5151","DOI":"10.1021\/acs.jctc.2c01149","volume":"19","author":"Qamar","year":"2023","journal-title":"J. Chem. Theory Comput."},{"key":"mlstad1626bib60","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevB.68.024107","volume":"68","author":"Los","year":"2003","journal-title":"Phys. Rev. B"},{"key":"mlstad1626bib61","doi-asserted-by":"publisher","first-page":"4370","DOI":"10.1103\/PhysRevLett.77.4370","volume":"77","author":"Bazant","year":"1996","journal-title":"Phys. Rev. Lett."},{"key":"mlstad1626bib62","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevB.63.035401","volume":"63","author":"Marks","year":"2000","journal-title":"Phys. Rev. B"},{"key":"mlstad1626bib63","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevX.8.041048","volume":"8","author":"Bart\u00f3k","year":"2018","journal-title":"Phys. Rev. X"},{"key":"mlstad1626bib64","author":"Kingma","year":"2017"},{"key":"mlstad1626bib65","article-title":"LAMMPS LCBOP potential for C developed by Los and Fasolino (2003) v000","author":"Karls","year":"2019"},{"key":"mlstad1626bib66","article-title":"Environment-dependent interatomic potential (EDIP) model driver v002","author":"Karls","year":"2018"},{"key":"mlstad1626bib67","doi-asserted-by":"publisher","first-page":"17","DOI":"10.1007\/s11837-011-0102-6","volume":"63","author":"Tadmor","year":"2011","journal-title":"JOM"},{"key":"mlstad1626bib68","article-title":"Knowledgebase of interatomic models (KIM) application programming interface (API)","author":"Elliott","year":"2011"},{"key":"mlstad1626bib69","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevMaterials.6.013804","volume":"6","author":"Bochkarev","year":"2022","journal-title":"Phys. Rev. Mater."},{"key":"mlstad1626bib70","doi-asserted-by":"publisher","first-page":"97","DOI":"10.1038\/s41524-021-00559-9","volume":"7","author":"Lysogorskiy","year":"2021","journal-title":"npj Comput. Mater."},{"key":"mlstad1626bib71","first-page":"1","author":"Cs\u00e1nyi","year":"2007","journal-title":"IoP Comput. Phys. Newsl. Spring"},{"key":"mlstad1626bib72","doi-asserted-by":"publisher","DOI":"10.1088\/1361-648X\/ab82d2","volume":"32","author":"Kermode","year":"2020","journal-title":"J. Phys.: Condens. Matter"},{"key":"mlstad1626bib73","article-title":"UMAP: uniform manifold approximation and projection for dimension reduction","author":"McInnes","year":"2020"},{"key":"mlstad1626bib74","doi-asserted-by":"publisher","DOI":"10.1088\/0965-0393\/18\/1\/015012","volume":"18","author":"Stukowski","year":"2009","journal-title":"Model. Simul. Mater. Sci. Eng."},{"key":"mlstad1626bib75","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevB.79.075430","volume":"79","author":"Powles","year":"2009","journal-title":"Phys. Rev. B"},{"key":"mlstad1626bib76","doi-asserted-by":"publisher","first-page":"681","DOI":"10.1016\/j.carbon.2016.08.024","volume":"109","author":"de Tomas","year":"2016","journal-title":"Carbon"}],"container-title":["Machine Learning: Science and Technology"],"original-title":[],"link":[{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ad1626","content-type":"text\/html","content-version":"am","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ad1626\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ad1626","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ad1626\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ad1626\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ad1626\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ad1626\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"similarity-checking"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ad1626\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,11,6]],"date-time":"2024-11-06T04:10:46Z","timestamp":1730866246000},"score":1,"resource":{"primary":{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ad1626"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,1,10]]},"references-count":76,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2024,1,10]]},"published-print":{"date-parts":[[2024,3,1]]}},"URL":"https:\/\/doi.org\/10.1088\/2632-2153\/ad1626","relation":{},"ISSN":["2632-2153"],"issn-type":[{"value":"2632-2153","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,1,10]]},"assertion":[{"value":"Synthetic pre-training for neural-network interatomic potentials","name":"article_title","label":"Article Title"},{"value":"Machine Learning: Science and Technology","name":"journal_title","label":"Journal Title"},{"value":"paper","name":"article_type","label":"Article Type"},{"value":"\u00a9 2024 The Author(s). Published by IOP Publishing Ltd","name":"copyright_information","label":"Copyright Information"},{"value":"2023-07-31","name":"date_received","label":"Date Received","group":{"name":"publication_dates","label":"Publication dates"}},{"value":"2023-12-15","name":"date_accepted","label":"Date Accepted","group":{"name":"publication_dates","label":"Publication dates"}},{"value":"2024-01-10","name":"date_epub","label":"Online publication date","group":{"name":"publication_dates","label":"Publication dates"}}]}}