{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T01:22:49Z","timestamp":1760145769991,"version":"build-2065373602"},"reference-count":34,"publisher":"MDPI AG","issue":"16","license":[{"start":{"date-parts":[[2024,8,22]],"date-time":"2024-08-22T00:00:00Z","timestamp":1724284800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Funda\u00e7\u00e3o para a Ci\u00eancia e Tecnologia","award":["LA\/P\/0008\/2020","UIDP\/50006\/2020","Programme ERASMUS2027, ERASMUS-EDU-2021-PEX-EMJM-MOB, Project number 101050809"],"award-info":[{"award-number":["LA\/P\/0008\/2020","UIDP\/50006\/2020","Programme ERASMUS2027, ERASMUS-EDU-2021-PEX-EMJM-MOB, Project number 101050809"]}]},{"DOI":"10.13039\/501100000780","name":"European Union","doi-asserted-by":"publisher","award":["LA\/P\/0008\/2020","UIDP\/50006\/2020","Programme ERASMUS2027, ERASMUS-EDU-2021-PEX-EMJM-MOB, Project number 101050809"],"award-info":[{"award-number":["LA\/P\/0008\/2020","UIDP\/50006\/2020","Programme ERASMUS2027, ERASMUS-EDU-2021-PEX-EMJM-MOB, Project number 101050809"]}],"id":[{"id":"10.13039\/501100000780","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Molecules"],"abstract":"<jats:p>A variational heteroencoder based on recurrent neural networks, trained with SMILES linear notations of molecular structures, was used to derive the following atomic descriptors: delta latent space vectors (DLSVs) obtained from the original SMILES of the whole molecule and the SMILES of the same molecule with the target atom replaced. Different replacements were explored, namely, changing the atomic element, replacement with a character of the model vocabulary not used in the training set, or the removal of the target atom from the SMILES. Unsupervised mapping of the DLSV descriptors with t-distributed stochastic neighbor embedding (t-SNE) revealed a remarkable clustering according to the atomic element, hybridization, atomic type, and aromaticity. Atomic DLSV descriptors were used to train machine learning (ML) models to predict 19F NMR chemical shifts. An R2 of up to 0.89 and mean absolute errors of up to 5.5 ppm were obtained for an independent test set of 1046 molecules with random forests or a gradient-boosting regressor. Intermediate representations from a Transformer model yielded comparable results. Furthermore, DLSVs were applied as molecular operators in the latent space: the DLSV of a halogenation (H\u2192F substitution) was summed to the LSVs of 4135 new molecules with no fluorine atom and decoded into SMILES, yielding 99% of valid SMILES, with 75% of the SMILES incorporating fluorine and 56% of the structures incorporating fluorine with no other structural change.<\/jats:p>","DOI":"10.3390\/molecules29163969","type":"journal-article","created":{"date-parts":[[2024,8,22]],"date-time":"2024-08-22T11:14:41Z","timestamp":1724325281000},"page":"3969","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Exploring Molecular Heteroencoders with Latent Space Arithmetic: Atomic Descriptors and Molecular Operators"],"prefix":"10.3390","volume":"29","author":[{"given":"Xinyue","family":"Gao","sequence":"first","affiliation":[{"name":"Faculty of Sciences, Universit\u00e9 Paris Cit\u00e9, 75013 Paris, France"}]},{"given":"Natalia","family":"Baimacheva","sequence":"additional","affiliation":[{"name":"Faculty of Chemistry, University of Strasbourg, 4, Blaise Pascal Str., 67081 Strasbourg, France"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5887-2966","authenticated-orcid":false,"given":"Joao","family":"Aires-de-Sousa","sequence":"additional","affiliation":[{"name":"LAQV and REQUIMTE, Chemistry Department, NOVA School of Science and Technology, Universidade Nova de Lisboa, 2829-516 Caparica, Portugal"}]}],"member":"1968","published-online":{"date-parts":[[2024,8,22]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"120","DOI":"10.1021\/acscentsci.7b00512","article-title":"Generating focused molecule libraries for drug discovery with recurrent neural networks","volume":"4","author":"Segler","year":"2018","journal-title":"ACS Cent. Sci."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Yoshikai, Y., Mizuno, T., Nemoto, S., and Kusuhara, H. (2024). Difficulty in chirality recognition for Transformer architectures learning chemical structures from string representations. Nat. Commun., 15.","DOI":"10.1038\/s41467-024-45102-8"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"268","DOI":"10.1021\/acscentsci.7b00572","article-title":"Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules","volume":"4","author":"Wei","year":"2018","journal-title":"ACS Cent. Sci."},{"key":"ref_4","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., and Polosukhin, I. (2017). Attention Is All You Need. arXiv."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Irwin, R., Dimitriadis, S., He, J., and Bjerrum, E.J. (2022). Chemformer: A pre-trained transformer for computational chemistry. Mach. Learn. Sci. Technol., 3.","DOI":"10.1088\/2632-2153\/ac3ffb"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Bjerrum, E., and Sattarov, B. (2018). Improving Chemical Autoencoder Latent Space and Molecular De Novo Generation Diversity with Heteroencoders. Biomolecules, 8.","DOI":"10.3390\/biom8040131"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"1692","DOI":"10.1039\/C8SC04175J","article-title":"Learning continuous and data-driven molecular descriptors by translating equivalent chemical representations","volume":"10","author":"Winter","year":"2019","journal-title":"Chem. Sci."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"2099","DOI":"10.1021\/acs.jctc.0c01213","article-title":"Understanding conformational entropy in small molecules","volume":"17","author":"Chan","year":"2021","journal-title":"J. Chem. Theory Comput."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"2539","DOI":"10.1021\/acs.jcim.3c01417","article-title":"HyperPCM: Robust Task-Conditioned Modeling of Drug\u2013Target Interactions","volume":"64","author":"Svensson","year":"2024","journal-title":"J. Chem. Inf. Model."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"8016","DOI":"10.1039\/C9SC01928F","article-title":"Efficient multi-objective molecular optimization in a continuous latent space","volume":"10","author":"Winter","year":"2019","journal-title":"Chem. Sci."},{"key":"ref_11","first-page":"37","article-title":"The art of atom descriptor design","volume":"32\u201333","year":"2019","journal-title":"Drug Discov. Today Technol."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"940","DOI":"10.1021\/ci034228s","article-title":"Structure-based predictions of 1H NMR chemical shifts using feed-forward neural networks","volume":"44","author":"Binev","year":"2004","journal-title":"J. Chem. Inf. Comput. Sci."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"12012","DOI":"10.1039\/D1SC03343C","article-title":"Real-time prediction of 1H and 13C chemical shifts with DFT accuracy using a 3D graph neural network","volume":"12","author":"Guan","year":"2021","journal-title":"Chem. Sci."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"4","DOI":"10.1007\/s10822-023-00542-0","article-title":"QM assisted ML for 19F NMR chemical shift prediction","volume":"38","author":"Penner","year":"2024","journal-title":"J. Comput. Aided. Mol. Des."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Kuhn, S., Egert, B., Neumann, S., and Steinbeck, C. (2008). Building blocks for automated elucidation of metabolites: Machine learning methods for NMR prediction. BMC Bioinform., 9.","DOI":"10.1186\/1471-2105-9-400"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"128","DOI":"10.1021\/ci700256n","article-title":"Toward More Reliable 13C and 1H Chemical Shift Prediction: A Systematic Comparison of Neural-Network and Least-Squares Regression Based Approaches","volume":"48","author":"Smurnyy","year":"2008","journal-title":"J. Chem. Inf. Model."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"158","DOI":"10.1016\/j.chemolab.2014.03.011","article-title":"A QSPR approach for the fast estimation of DFT\/NBO partial atomic charges","volume":"134","author":"Zhang","year":"2014","journal-title":"Chemom. Intell. Lab. Syst."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"4721","DOI":"10.1093\/bioinformatics\/btaa566","article-title":"Fast and accurate prediction of partial charges using Atom-Path-Descriptor-based machine learning","volume":"36","author":"Wang","year":"2020","journal-title":"Bioinformatics"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"62","DOI":"10.1002\/minf.201500113","article-title":"Machine Learning Estimation of Atom Condensed Fukui Functions","volume":"35","author":"Zhang","year":"2016","journal-title":"Mol. Inform."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"426","DOI":"10.1038\/s41467-023-42145-1","article-title":"Predictive Minisci late stage functionalization with transfer learning","volume":"15","author":"Faber","year":"2024","journal-title":"Nat. Commun."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"1832","DOI":"10.1021\/acs.jcim.7b00250","article-title":"FAME 2: Simple and Effective Machine Learning Model of Cytochrome P450 Regioselectivity","volume":"57","author":"Stork","year":"2017","journal-title":"J. Chem. Inf. Model."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"629","DOI":"10.1002\/minf.201600070","article-title":"Predictive Models for the Free Energy of Hydrogen Bonded Complexes with Single and Cooperative Hydrogen Bonds","volume":"35","author":"Glavatskikh","year":"2016","journal-title":"Mol. Inform."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Bauer, C.A., Schneider, G., and G\u00f6ller, A.H. (2019). Machine learning models for hydrogen bond donor and acceptor strengths using large and diverse training data generated by first-principles interaction free energies. J. Cheminform., 11.","DOI":"10.1186\/s13321-019-0381-4"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"2200193","DOI":"10.1002\/minf.202200193","article-title":"Machine Learning to Predict Homolytic Dissociation Energies of C\u2212H Bonds: Calibration of DFT-based Models with Experimental Data","volume":"42","author":"Li","year":"2023","journal-title":"Mol. Inform."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Liu, Z., Luo, P., Wang, X., and Tang, X. (2014). Deep Learning Face Attributes in the Wild. arXiv.","DOI":"10.1109\/ICCV.2015.425"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Bitard-Feildel, T. (2021). Navigating the amino acid sequence space between functional proteins using a deep learning framework. PeerJ Comput. Sci., 7.","DOI":"10.7717\/peerj-cs.684"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"36","DOI":"10.3390\/biochem1010004","article-title":"De Novo Drug Design Using Artificial Intelligence Applied on SARS-CoV-2 Viral Proteins ASYNT-GAN","volume":"1","author":"Jacobs","year":"2021","journal-title":"BioChem"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Ochiai, T., Inukai, T., Akiyama, M., Furui, K., Ohue, M., Matsumori, N., Inuki, S., Uesugi, M., Sunazuka, T., and Kikuchi, K. (2023). Variational autoencoder-based chemical latent space for large molecular structures with 3D complexity. Commun. Chem., 6.","DOI":"10.1038\/s42004-023-01054-6"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"156","DOI":"10.1002\/mrc.1260230304","article-title":"A quantitative empirical treatment of 13C NMR chemical shift variations on successive substitution of methane by halogen atoms","volume":"23","author":"Gasteiger","year":"1985","journal-title":"Magn. Reson. Chem."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"353","DOI":"10.1002\/mrc.1270150408","article-title":"Prediction of proton magnetic resonance shifts: The dependence on hydrogen charges obtained by iterative partial equalization of orbital electronegativity","volume":"15","author":"Gasteiger","year":"1981","journal-title":"Org. Magn. Reson."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Li, Y., Huang, W.-S., Zhang, L., Su, D., Xu, H., and Xue, X.-S. (2024). Prediction of 19F NMR chemical shift by machine learning. Artificial Intell. Chem., 2.","DOI":"10.1016\/j.aichem.2024.100043"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Rull, H., Fischer, M., and Kuhn, S. (2023). NMR shift prediction from small data quantities. J. Cheminform, 15.","DOI":"10.1186\/s13321-023-00785-x"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Polykovskiy, D., Zhebrak, A., Sanchez-Lengeling, B., Golovanov, S., Tatanov, O., Belyaev, S., Kurbanov, R., Artamonov, A., Aladinskiy, V., and Veselov, M. (2020). Molecular Sets (MOSES): A Benchmarking Platform for Molecular Generation Models. Front. Pharmacol., 11.","DOI":"10.3389\/fphar.2020.565644"},{"key":"ref_34","first-page":"2825","article-title":"Machine Learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J. Mach. Learn. Res."}],"container-title":["Molecules"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1420-3049\/29\/16\/3969\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T15:41:30Z","timestamp":1760110890000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1420-3049\/29\/16\/3969"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,8,22]]},"references-count":34,"journal-issue":{"issue":"16","published-online":{"date-parts":[[2024,8]]}},"alternative-id":["molecules29163969"],"URL":"https:\/\/doi.org\/10.3390\/molecules29163969","relation":{},"ISSN":["1420-3049"],"issn-type":[{"type":"electronic","value":"1420-3049"}],"subject":[],"published":{"date-parts":[[2024,8,22]]}}}