{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,10]],"date-time":"2025-11-10T14:01:04Z","timestamp":1762783264666,"version":"3.41.0"},"reference-count":39,"publisher":"IOP Publishing","issue":"2","license":[{"start":{"date-parts":[[2025,6,11]],"date-time":"2025-06-11T00:00:00Z","timestamp":1749600000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"},{"start":{"date-parts":[[2025,6,11]],"date-time":"2025-06-11T00:00:00Z","timestamp":1749600000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/iopscience.iop.org\/info\/page\/text-and-data-mining"}],"funder":[{"name":"Danmarks Frie Forskningsfond","award":["0217-00326B"],"award-info":[{"award-number":["0217-00326B"]}]}],"content-domain":{"domain":["iopscience.iop.org"],"crossmark-restriction":false},"short-container-title":["Mach. Learn.: Sci. Technol."],"published-print":{"date-parts":[[2025,6,30]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Deep generative models are a powerful tool for exploring the chemical space within inverse-design workflows; however, their effectiveness relies on sufficient training data and effective mechanisms for guiding the model to optimize specific properties. We demonstrate that designing an expert-informed data representation and training procedure allows leveraging data augmentation while maintaining the required sampling controllability. We focus our discussion on a specific class of compounds (transition metal complexes), and a popular class of generative models (equivariant diffusion models), although we envision that the approach could be extended to other chemical spaces and model types. Through experiments, we demonstrate that augmenting the training database with generic but related unlabeled data enables a practical level of performance to be reached.<\/jats:p>","DOI":"10.1088\/2632-2153\/addc32","type":"journal-article","created":{"date-parts":[[2025,5,22]],"date-time":"2025-05-22T22:55:22Z","timestamp":1747954522000},"page":"025057","update-policy":"https:\/\/doi.org\/10.1088\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Improving generative inverse design of molecular catalysts in small data regime"],"prefix":"10.1088","volume":"6","author":[{"ORCID":"https:\/\/orcid.org\/0009-0008-6157-862X","authenticated-orcid":true,"given":"Fran\u00e7ois","family":"Cornet","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0009-0004-1719-7310","authenticated-orcid":true,"given":"Pratham","family":"Deshmukh","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1578-9126","authenticated-orcid":false,"given":"Bardi","family":"Benediktsson","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6927-8869","authenticated-orcid":true,"given":"Mikkel N","family":"Schmidt","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3198-5116","authenticated-orcid":true,"given":"Arghya","family":"Bhowmik","sequence":"additional","affiliation":[]}],"member":"266","published-online":{"date-parts":[[2025,6,11]]},"reference":[{"key":"mlstaddc32bib1","doi-asserted-by":"publisher","first-page":"8736","DOI":"10.1021\/jacs.2c13467","article-title":"Generative models as an emerging paradigm in the chemical sciences","volume":"145","author":"Anstine","year":"2023","journal-title":"J. Am. Chem. Soc."},{"key":"mlstaddc32bib2","doi-asserted-by":"publisher","first-page":"6135","DOI":"10.1021\/acs.jcim.0c01041","article-title":"tmQM dataset\u2014quantum geometries and properties of 86k transition metal complexes","volume":"60","author":"Balcells","year":"2020","journal-title":"J. Chem. Inf. Model."},{"article-title":"A foundation model for atomistic materials chemistry","year":"2023","author":"Batatia","key":"mlstaddc32bib3"},{"key":"mlstaddc32bib4","first-page":"p 37","article-title":"Equivariant neural diffusion for molecule generation","author":"Cornet","year":"2024a"},{"key":"mlstaddc32bib5","doi-asserted-by":"publisher","first-page":"1793","DOI":"10.1039\/D4DD00099D","article-title":"Om-diff: inverse-design of organometallic catalysts with guided equivariant denoising diffusion","volume":"3","author":"Cornet","year":"2024b","journal-title":"Digit. Discov."},{"key":"mlstaddc32bib6","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00020","article-title":"Autoaugment: Learning augmentation strategies from data","author":"Cubuk","year":"2019"},{"article-title":"Symphony: symmetry-equivariant point-centered spherical harmonics for 3D molecule generation","year":"2024","author":"Daigavane","key":"mlstaddc32bib7"},{"key":"mlstaddc32bib8","doi-asserted-by":"publisher","first-page":"4584","DOI":"10.1039\/D0SC00445F","article-title":"Machine learning dihydrogen activation in the chemical space surrounding vaska\u2019s complex","volume":"11","author":"Friederich","year":"2020","journal-title":"Chem. Sci."},{"year":"2016","author":"Frisch","key":"mlstaddc32bib9"},{"article-title":"Simple GNN regularisation for 3D molecular property prediction and beyond","year":"2022","author":"Godwin","key":"mlstaddc32bib10"},{"key":"mlstaddc32bib11","doi-asserted-by":"publisher","DOI":"10.1063\/1.3382344","article-title":"A consistent and accurate Ab Initio parametrization of density functional dispersion correction (DFT-D) for the 94 elements H-Pu","volume":"132","author":"Grimme","year":"2010","journal-title":"J. Chem. Phys."},{"key":"mlstaddc32bib12","doi-asserted-by":"publisher","first-page":"171","DOI":"10.1107\/S2052520616003954","article-title":"The cambridge structural database","volume":"72","author":"Groom","year":"2016","journal-title":"Struct. Sci."},{"article-title":"3D equivariant diffusion for target-aware molecule generation and affinity prediction","year":"2022","author":"Guan","key":"mlstaddc32bib13"},{"key":"mlstaddc32bib14","first-page":"pp 8867","article-title":"Equivariant diffusion for molecule generation in 3D","author":"Hoogeboom","year":"2022"},{"article-title":"Forcenet: a graph neural network for large-scale quantum calculations","year":"2021","author":"Hu","key":"mlstaddc32bib15"},{"key":"mlstaddc32bib16","doi-asserted-by":"publisher","first-page":"417","DOI":"10.1038\/s42256-024-00815-9","article-title":"Equivariant 3D-conditional diffusion model for molecular linker design","volume":"6","author":"Igashov","year":"2024","journal-title":"Nat. Mach. Intell."},{"key":"mlstaddc32bib17","doi-asserted-by":"publisher","first-page":"2106","DOI":"10.1002\/jcc.24437","article-title":"molSimplify: a toolkit for automating discovery in inorganic chemistry","volume":"37","author":"Ioannidis","year":"2016","journal-title":"J. Comput. Chem."},{"article-title":"Efficient 3D molecular generation with flow matching and scale optimal transport","year":"2024","author":"Irwin","key":"mlstaddc32bib18"},{"key":"mlstaddc32bib19","doi-asserted-by":"publisher","first-page":"4377","DOI":"10.1021\/acs.jctc.4c00232","article-title":"LigandDiff: de novo ligand design for 3D transition metal complexes with diffusion models","volume":"20","author":"Jin","year":"2024a","journal-title":"J. Chem. Theory Comput."},{"key":"mlstaddc32bib20","doi-asserted-by":"publisher","first-page":"8367\u221277","DOI":"10.1021\/acs.jctc.4c00775","article-title":"Partial to total generation of 3D transition-metal complexes","volume":"20","author":"Jin","year":"2024b","journal-title":"J. Chem. Theory Comput."},{"key":"mlstaddc32bib21","first-page":"pp 5006","article-title":"Distribution augmentation for generative modeling","author":"Jun","year":"2020"},{"key":"mlstaddc32bib22","doi-asserted-by":"publisher","first-page":"12974","DOI":"10.1021\/jp960669l","article-title":"Density functional theory of electronic structure","volume":"100","author":"Kohn","year":"1996","journal-title":"J. Phys. Chem."},{"key":"mlstaddc32bib23","doi-asserted-by":"publisher","first-page":"1486","DOI":"10.1002\/zaac.202100078","article-title":"Metal-ligand cooperative activation of hx (x = h, br, or) bond on mn based pincer complexes","volume":"647","author":"Krieger","year":"2021","journal-title":"Z. Anorg. Allg. Chem."},{"article-title":"RDKit: Open-source cheminformatics","year":"2024","author":"Landrum","key":"mlstaddc32bib24"},{"key":"mlstaddc32bib25","doi-asserted-by":"publisher","DOI":"10.1088\/2632-2153\/ac9c84","article-title":"Auglichem: data augmentation library of chemical structures for machine learning","volume":"3","author":"Magar","year":"2022","journal-title":"Mach. Learn.: Sci. Technol."},{"key":"mlstaddc32bib26","doi-asserted-by":"publisher","first-page":"7069","DOI":"10.1039\/C8SC01949E","article-title":"Machine learning meets volcano plots: computational discovery of cross-coupling catalysts","volume":"9","author":"Meyer","year":"2018","journal-title":"Chem. Sci."},{"key":"mlstaddc32bib27","doi-asserted-by":"publisher","first-page":"13973","DOI":"10.1021\/acs.iecr.8b04015","article-title":"Strategies and software for machine learning accelerated discovery in transition metal chemistry","volume":"57","author":"Nandy","year":"2018","journal-title":"Ind. Eng. Chem. Res."},{"key":"mlstaddc32bib28","doi-asserted-by":"publisher","first-page":"9927","DOI":"10.1021\/acs.chemrev.1c00347","article-title":"Computational discovery of transition-metal complexes: from high-throughput screening to machine learning","volume":"121","author":"Nandy","year":"2021","journal-title":"Chem. Rev."},{"key":"mlstaddc32bib29","doi-asserted-by":"publisher","first-page":"3865","DOI":"10.1103\/PhysRevLett.77.3865","article-title":"Generalized gradient approximation made simple","volume":"77","author":"Perdew","year":"1996","journal-title":"Phys. Rev. Lett."},{"key":"mlstaddc32bib30","doi-asserted-by":"publisher","DOI":"10.5555\/3692070.3693767","article-title":"Molcraft: structure-based drug design in continuous parameter space","author":"Qu","year":"2024"},{"key":"mlstaddc32bib31","doi-asserted-by":"publisher","first-page":"8","DOI":"10.1002\/gamm.202100008","article-title":"An introduction to deep generative modeling","volume":"44","author":"Ruthotto","year":"2021","journal-title":"GAMM-Mitteilungen"},{"key":"mlstaddc32bib32","doi-asserted-by":"publisher","first-page":"2571","DOI":"10.1063\/1.463096","article-title":"Fully optimized contracted Gaussian basis sets for atoms Li to Kr","volume":"97","author":"Sch\u00e4fer","year":"1992","journal-title":"J. Chem. Phys."},{"key":"mlstaddc32bib33","doi-asserted-by":"publisher","first-page":"899","DOI":"10.1038\/s43588-024-00737-x","article-title":"Structure-based drug design with equivariant diffusion models","volume":"4","author":"Schneuing","year":"2024","journal-title":"Nat. Comput. Sci."},{"key":"mlstaddc32bib34","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s40537-019-0197-0","article-title":"A survey on image data augmentation for deep learning","volume":"6","author":"Shorten","year":"2019","journal-title":"J. Big Data"},{"key":"mlstaddc32bib35","doi-asserted-by":"publisher","first-page":"14360","DOI":"10.1021\/acsomega.9b02221","article-title":"Evolving concept of activity cliffs","volume":"4","author":"Stumpfe","year":"2019","journal-title":"ACS Omega"},{"key":"mlstaddc32bib36","doi-asserted-by":"publisher","first-page":"5938","DOI":"10.1021\/acs.jcim.2c01073","article-title":"Exposing the limitations of molecular machine learning with activity cliffs","volume":"62","author":"Van Tilborg","year":"2022","journal-title":"J. Chem. Inf. Model."},{"key":"mlstaddc32bib37","doi-asserted-by":"publisher","first-page":"2784","DOI":"10.1021\/ja01473a054","article-title":"Carbonyl and hydrido-carbonyl complexes of iridium by reaction with alcohols. Hydrido complexes by reaction with acid","volume":"83","author":"Vaska","year":"1961","journal-title":"J. Am. Chem. Soc."},{"key":"mlstaddc32bib38","first-page":"pp 560","article-title":"Midi: mixed graph and 3D denoising diffusion for molecule generation","author":"Vignac","year":"2023"},{"article-title":"Mixup: beyond empirical risk minimization","year":"2018","author":"Zhang","key":"mlstaddc32bib39"}],"container-title":["Machine Learning: Science and Technology"],"original-title":[],"link":[{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/addc32","content-type":"text\/html","content-version":"am","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/addc32\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/addc32","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/addc32\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/addc32\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/addc32\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/addc32\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"similarity-checking"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/addc32\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,11]],"date-time":"2025-06-11T10:27:09Z","timestamp":1749637629000},"score":1,"resource":{"primary":{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/addc32"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,11]]},"references-count":39,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2025,6,11]]},"published-print":{"date-parts":[[2025,6,30]]}},"URL":"https:\/\/doi.org\/10.1088\/2632-2153\/addc32","relation":{},"ISSN":["2632-2153"],"issn-type":[{"type":"electronic","value":"2632-2153"}],"subject":[],"published":{"date-parts":[[2025,6,11]]},"assertion":[{"value":"Improving generative inverse design of molecular catalysts in small data regime","name":"article_title","label":"Article Title"},{"value":"Machine Learning: Science and Technology","name":"journal_title","label":"Journal Title"},{"value":"paper","name":"article_type","label":"Article Type"},{"value":"\u00a9 2025 The Author(s). Published by IOP Publishing Ltd","name":"copyright_information","label":"Copyright Information"},{"value":"2025-01-10","name":"date_received","label":"Date Received","group":{"name":"publication_dates","label":"Publication dates"}},{"value":"2025-05-22","name":"date_accepted","label":"Date Accepted","group":{"name":"publication_dates","label":"Publication dates"}},{"value":"2025-06-11","name":"date_epub","label":"Online publication date","group":{"name":"publication_dates","label":"Publication dates"}}]}}