{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,13]],"date-time":"2026-04-13T20:19:25Z","timestamp":1776111565584,"version":"3.50.1"},"reference-count":24,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2025,4,28]],"date-time":"2025-04-28T00:00:00Z","timestamp":1745798400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,4,28]],"date-time":"2025-04-28T00:00:00Z","timestamp":1745798400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Novo Nordisk Fonden, Denmark","award":["NNF20OC0064104"],"award-info":[{"award-number":["NNF20OC0064104"]}]},{"name":"Independent Research Foundation Denmark","award":["0217-00326B"],"award-info":[{"award-number":["0217-00326B"]}]},{"DOI":"10.13039\/501100001734","name":"Copenhagen University","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100001734","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Cheminform"],"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>We present a method for creating RDKit-parsable SMILES for transition metal complexes (TMCs) based on xyz-coordinates and overall charge of the complex. This can be viewed as an extension to the program xyz2mol that does the same for organic molecules. The only dependency is RDKit, which makes it widely applicable. One thing that has been lacking when it comes to generating SMILES from structure for TMCs is an existing SMILES dataset to compare with. Therefore, sanity-checking a method has required manual work. 
We therefore also generate SMILES in two other ways: one where ligand charges and TMC connectivity are based on natural bond orbital (NBO) analysis from density functional theory (DFT) calculations utilizing recent work by Kneiding et al. (Digit Discov\u00a02:\u00a0618\u2013633, 2023). Another one fixes SMILES available through the Cambridge Structural Database (CSD), making them parsable by RDKit. We compare these three different ways of obtaining SMILES for a subset of the CSD (tmQMg) and find &gt;70% agreement for all three pairs. We utilize these SMILES to make simple molecular fingerprint (FP) and graph-based representations of the molecules to be used in the context of machine learning. Comparing with the graphs made by Kneiding et al., where nodes and edges are featurized with DFT properties, we find that, depending on the target property (polarizability, HOMO-LUMO gap or dipole moment), the SMILES-based representations can perform equally well. This makes them very suitable as baseline models. Finally, we present a dataset of 227k RDKit-parsable SMILES for mononuclear TMCs in the CSD.<\/jats:p>\n                  <jats:p>\n                    <jats:bold>Scientific contribution<\/jats:bold>\n                    We present a method that can create RDKit-parsable SMILES strings of transition metal complexes (TMCs) from Cartesian coordinates and use it to create a dataset of 227k TMC SMILES strings. The RDKit-parsability allows us to perform machine learning studies of TMC properties using \u201cstandard\u201d molecular representations such as fingerprints and 2D-graph convolution. 
We show that these relatively simple representations can perform quite well depending on the target property.\n                  <\/jats:p>","DOI":"10.1186\/s13321-025-01008-1","type":"journal-article","created":{"date-parts":[[2025,4,28]],"date-time":"2025-04-28T12:31:47Z","timestamp":1745843507000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":16,"title":["SMILES all around: structure to SMILES conversion for transition metal complexes"],"prefix":"10.1186","volume":"17","author":[{"given":"Maria H.","family":"Rasmussen","sequence":"first","affiliation":[]},{"given":"Magnus","family":"Strandgaard","sequence":"additional","affiliation":[]},{"given":"Julius","family":"Seumer","sequence":"additional","affiliation":[]},{"given":"Laura K.","family":"Hemmingsen","sequence":"additional","affiliation":[]},{"given":"Angelo","family":"Frei","sequence":"additional","affiliation":[]},{"given":"David","family":"Balcells","sequence":"additional","affiliation":[]},{"given":"Jan H.","family":"Jensen","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,4,28]]},"reference":[{"key":"1008_CR1","unstructured":"RDKit: Open-source cheminformatics. http:\/\/www.rdkit.org"},{"key":"1008_CR2","doi-asserted-by":"publisher","first-page":"171","DOI":"10.1107\/S2052520616003954","volume":"72","author":"CR Groom","year":"2016","unstructured":"Groom CR, Bruno IJ, Lightfoot MP, Ward SC (2016) The Cambridge structural database. 
Acta Crystallogr Sect B Struct Sci Cryst Eng Mater 72:171\u2013179","journal-title":"Acta Crystallogr Sect B Struct Sci Cryst Eng Mater"},{"key":"1008_CR3","doi-asserted-by":"crossref","unstructured":"Vela S, Laplaza R, Cho Y, Corminboeuf C (2022) cell2mol: encoding chemistry to interpret crystallographic data, NPJ Comput Mater 8: 1\u20138","DOI":"10.1038\/s41524-022-00874-9"},{"key":"1008_CR4","doi-asserted-by":"publisher","first-page":"6135","DOI":"10.1021\/acs.jcim.0c01041","volume":"60","author":"D Balcells","year":"2020","unstructured":"Balcells D, Skjelstad BB (2020) TmQM dataset-quantum geometries and properties of 86k transition metal complexes. J Chem Inf Model 60:6135\u20136146","journal-title":"J Chem Inf Model"},{"key":"1008_CR5","doi-asserted-by":"publisher","first-page":"618","DOI":"10.1039\/D2DD00129B","volume":"2","author":"H Kneiding","year":"2023","unstructured":"Kneiding H, Lukin R, Lang L, Reine S, Pedersen TB, De Bin R, Balcells D (2023) Deep learning metal complex properties with natural quantum graphs. Digit Discov 2:618\u2013633","journal-title":"Digit Discov"},{"key":"1008_CR6","doi-asserted-by":"publisher","first-page":"263","DOI":"10.1038\/s43588-024-00616-5","volume":"4","author":"H Kneiding","year":"2024","unstructured":"Kneiding H, Nova A, Balcells D (2024) Directional multiobjective optimization of metal complexes at the billion-system scale. Nat Comput Sci 4:263\u2013273","journal-title":"Nat Comput Sci"},{"key":"1008_CR7","unstructured":"Jensen JH (2021) xyz2mol. https:\/\/github.com\/jensengroup\/xyz2mol"},{"key":"1008_CR8","doi-asserted-by":"publisher","first-page":"1397","DOI":"10.1063\/1.1734456","volume":"39","author":"R Hoffmann","year":"1963","unstructured":"Hoffmann R (1963) An extended H\u00fcckel theory. I. hydrocarbons. 
J Chem Phys 39:1397\u20131412","journal-title":"J Chem Phys"},{"key":"1008_CR9","doi-asserted-by":"publisher","first-page":"742","DOI":"10.1021\/ci100050t","volume":"50","author":"D Rogers","year":"2010","unstructured":"Rogers D, Hahn M (2010) Extended-connectivity fingerprints. J Chem Inf Model 50:742\u2013754","journal-title":"J Chem Inf Model"},{"key":"1008_CR10","doi-asserted-by":"publisher","first-page":"759","DOI":"10.1039\/D2DD00146B","volume":"2","author":"G Tom","year":"2023","unstructured":"Tom G, Hickman RJ, Zinzuwadia A, Mohajeri A, Sanchez-Lengeling B, Aspuru-Guzik A (2023) Calibration and generalizability of probabilistic models on low-data chemical datasets with DIONYSUS. Digit Discov 2:759\u2013774","journal-title":"Digit Discov"},{"key":"1008_CR11","doi-asserted-by":"publisher","first-page":"33","DOI":"10.1186\/1758-2946-3-33","volume":"3","author":"NM O\u2019Boyle","year":"2011","unstructured":"O\u2019Boyle NM, Banck M, James CA, Morley C, Vandermeersch T, Hutchison GR (2011) Open Babel: an open chemical toolbox. J Cheminform 3:33","journal-title":"J Cheminform"},{"key":"1008_CR12","doi-asserted-by":"crossref","unstructured":"Frei A, Orsi M (2024) ELECTRUM: an electron configuration-based universal metal fingerprint for transition metal compounds. ChemRxiv","DOI":"10.26434\/chemrxiv-2024-vqktn"},{"key":"1008_CR13","doi-asserted-by":"publisher","first-page":"18193","DOI":"10.1021\/acs.est.3c02198","volume":"57","author":"S Zhong","year":"2023","unstructured":"Zhong S, Guan X (2023) Count-based Morgan fingerprint: a more efficient and interpretable molecular representation in developing machine learning-based predictive regression models for water contaminants\u2019 activities and properties. Environ Sci Technol 57:18193\u201318202","journal-title":"Environ Sci Technol"},{"key":"1008_CR14","unstructured":"How to turn a SMILES string into a molecular graph for pytorch geometric. 
https:\/\/www.blopig.com\/blog\/2022\/02\/how-to-turn-a-smiles-string-into-a-molecular-graph-for-pytorch-geometric. Accessed 12 Nov 2024."},{"key":"1008_CR15","unstructured":"Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W, Ye Q, Liu T-Y (2017) LightGBM: a highly efficient Gradient Boosting Decision Tree. Proceedings of the 31st international conference on neural information processing systems. 30:3146\u20133154"},{"key":"1008_CR16","unstructured":"Gilmer J, Schoenholz SS, Riley PF, Vinyals O (2017) Neural message passing for quantum chemistry, G.\u00a0E. Dahl in International Conference on Machine Learning, PMLR, pp.\u00a01263\u20131272"},{"key":"1008_CR17","doi-asserted-by":"publisher","first-page":"3167","DOI":"10.1021\/ic052110i","volume":"45","author":"JE Ellis","year":"2006","unstructured":"Ellis JE (2006) Adventures with substances containing metals in negative oxidation states. Inorg Chem. 45:3167\u20133186","journal-title":"Inorg Chem."},{"key":"1008_CR18","doi-asserted-by":"crossref","unstructured":"Yu HS, Truhlar DG (2016) Oxidation state 10 exists. Angewandte Chem (International ed. in English) 55: 9004\u20139006","DOI":"10.1002\/anie.201604670"},{"key":"1008_CR19","doi-asserted-by":"publisher","DOI":"10.7717\/peerj-pchem.30","volume":"5","author":"M Strandgaard","year":"2023","unstructured":"Strandgaard M, Seumer J, Benediktsson B, Bhowmik A, Vegge T, Jensen JH (2023) Genetic algorithm-based re-optimization of the Schrock catalyst for dinitrogen fixation. PeerJ Phys Chem 5:e30","journal-title":"PeerJ Phys Chem"},{"key":"1008_CR20","doi-asserted-by":"crossref","unstructured":"Strandgaard M, Seumer J, Jensen JH (2024) Discovery of molybdenum based nitrogen fixation catalysts with genetic algorithms. Chem Sci (R Soc Chem: 2010). 
15: 10638\u201310650","DOI":"10.1039\/D4SC02227K"},{"key":"1008_CR21","doi-asserted-by":"crossref","unstructured":"Seumer J, Jensen J (2024) Beyond predefined ligand libraries: a genetic algorithm approach for de novo discovery of catalysts for the Suzuki coupling reactions. ChemRxiv","DOI":"10.26434\/chemrxiv-2024-9xh38"},{"key":"1008_CR22","doi-asserted-by":"publisher","first-page":"5714","DOI":"10.1021\/acs.jcim.0c00174","volume":"60","author":"W Gao","year":"2020","unstructured":"Gao W, Coley CW (2020) The synthesizability of molecules proposed by generative models. J Chem Inf Model 60:5714\u20135723","journal-title":"J Chem Inf Model"},{"key":"1008_CR23","doi-asserted-by":"publisher","first-page":"8","DOI":"10.1186\/1758-2946-1-8","volume":"1","author":"P Ertl","year":"2009","unstructured":"Ertl P, Schuffenhauer A (2009) Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions. J Cheminform 1:8","journal-title":"J Cheminform"},{"key":"1008_CR24","doi-asserted-by":"publisher","first-page":"10","DOI":"10.1007\/s10822-024-00549-1","volume":"38","author":"A Kerstjens","year":"2024","unstructured":"Kerstjens A, De Winter H (2024) Molecule auto-correction to facilitate molecular design. 
J Comput Aid Mol Des 38:10","journal-title":"J Comput Aid Mol Des"}],"container-title":["Journal of Cheminformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-025-01008-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s13321-025-01008-1\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-025-01008-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,4,28]],"date-time":"2025-04-28T12:32:02Z","timestamp":1745843522000},"score":1,"resource":{"primary":{"URL":"https:\/\/jcheminf.biomedcentral.com\/articles\/10.1186\/s13321-025-01008-1"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,4,28]]},"references-count":24,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["1008"],"URL":"https:\/\/doi.org\/10.1186\/s13321-025-01008-1","relation":{"has-preprint":[{"id-type":"doi","id":"10.26434\/chemrxiv-2024-c660p","asserted-by":"object"}]},"ISSN":["1758-2946"],"issn-type":[{"value":"1758-2946","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,4,28]]},"assertion":[{"value":"16 January 2025","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"31 March 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"28 April 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no competing 
interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"63"}}