{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,15]],"date-time":"2026-04-15T22:54:06Z","timestamp":1776293646032,"version":"3.50.1"},"reference-count":63,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2023,2,14]],"date-time":"2023-02-14T00:00:00Z","timestamp":1676332800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,2,14]],"date-time":"2023-02-14T00:00:00Z","timestamp":1676332800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Cheminform"],"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Generative deep learning models have emerged as a powerful approach for de novo drug design as they aid researchers in finding new molecules with desired properties. Despite continuous improvements in the field, a subset of the outputs that sequence-based de novo generators produce cannot be progressed due to errors. Here, we propose to fix these invalid outputs post hoc. In similar tasks, transformer models from the field of natural language processing have been shown to be very effective. Therefore, here this type of model was trained to translate invalid Simplified Molecular-Input Line-Entry System (SMILES) into valid representations. The performance of this SMILES corrector was evaluated on four representative methods of de novo generation: a recurrent neural network (RNN), a target-directed RNN, a generative adversarial network (GAN), and a variational autoencoder (VAE). This study has found that the percentage of invalid outputs from these specific generative models ranges between 4 and 89%, with different models having different error-type distributions. Post hoc correction of SMILES was shown to increase model validity. The SMILES corrector trained with one error per input alters 60\u201390% of invalid generator outputs and fixes 35\u201380% of them. However, a higher error detection and performance was obtained for transformer models trained with multiple errors per input. In this case, the best model was able to correct 60\u201395% of invalid generator outputs. Further analysis showed that these fixed molecules are comparable to the correct molecules from the de novo generators based on novelty and similarity. Additionally, the SMILES corrector can be used to expand the amount of interesting new molecules within the targeted chemical space. Introducing different errors into existing molecules yields novel analogs with a uniqueness of 39% and a novelty of approximately 20%. The results of this research demonstrate that SMILES correction is a viable post hoc extension and can enhance the search for better drug candidates.<\/jats:p>\n                  <jats:p>\n                    <jats:bold>Graphical Abstract<\/jats:bold>\n                  <\/jats:p>","DOI":"10.1186\/s13321-023-00696-x","type":"journal-article","created":{"date-parts":[[2023,2,14]],"date-time":"2023-02-14T05:04:16Z","timestamp":1676351056000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":31,"title":["UnCorrupt SMILES: a novel approach to de novo design"],"prefix":"10.1186","volume":"15","author":[{"given":"Linde","family":"Schoenmaker","sequence":"first","affiliation":[]},{"given":"Olivier J. M.","family":"B\u00e9quignon","sequence":"additional","affiliation":[]},{"given":"Willem","family":"Jespers","sequence":"additional","affiliation":[]},{"given":"Gerard J. P.","family":"van Westen","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,2,14]]},"reference":[{"key":"696_CR1","doi-asserted-by":"publisher","first-page":"824","DOI":"10.1038\/nature03192","volume":"432","author":"CM Dobson","year":"2004","unstructured":"Dobson CM (2004) Chemical space and biology. Nature 432:824\u2013828. https:\/\/doi.org\/10.1038\/nature03192","journal-title":"Nature"},{"key":"696_CR2","doi-asserted-by":"publisher","first-page":"1241","DOI":"10.1016\/j.drudis.2018.01.039","volume":"23","author":"H Chen","year":"2018","unstructured":"Chen H, Engkvist O, Wang Y, Olivecrona M, Blaschke T (2018) The rise of deep learning in drug discovery. Drug Discov Today 23:1241\u20131250. https:\/\/doi.org\/10.1016\/j.drudis.2018.01.039","journal-title":"Drug Discov Today"},{"key":"696_CR3","doi-asserted-by":"publisher","first-page":"139","DOI":"10.1007\/978-1-0716-0826-5_6","volume-title":"Artificial Neural Networks","author":"X Liu","year":"2021","unstructured":"Liu X, IJzerman AP, van Westen GJP (2021) Computational approaches for de novo drug design: past, present, and future. In: Cartwright H (ed) Artificial Neural Networks. Springer, Berlin, pp 139\u2013165"},{"key":"696_CR4","doi-asserted-by":"publisher","DOI":"10.1016\/j.compbiomed.2022.105403","author":"DD Martinelli","year":"2022","unstructured":"Martinelli DD (2022) Generative machine learning for de novo drug discovery: a systematic review. Comput Biol Med. https:\/\/doi.org\/10.1016\/j.compbiomed.2022.105403","journal-title":"Comput Biol Med"},{"key":"696_CR5","doi-asserted-by":"publisher","first-page":"31","DOI":"10.1021\/ci00057a005","volume":"28","author":"D Weininger","year":"1988","unstructured":"Weininger D (1988) SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J Chem Inf Comput Sci 28:31\u201336. https:\/\/doi.org\/10.1021\/ci00057a005","journal-title":"J Chem Inf Comput Sci"},{"key":"696_CR6","doi-asserted-by":"publisher","first-page":"689","DOI":"10.1016\/j.drudis.2020.01.020","volume":"25","author":"H \u00d6zt\u00fcrk","year":"2020","unstructured":"\u00d6zt\u00fcrk H, \u00d6zg\u00fcr A, Schwaller P, Laino T, Ozkirimli E (2020) Exploring chemical space using natural language processing methodologies for drug discovery. Drug Discov Today 25:689\u2013705. https:\/\/doi.org\/10.1016\/j.drudis.2020.01.020","journal-title":"Drug Discov Today"},{"key":"696_CR7","doi-asserted-by":"publisher","first-page":"1700111","DOI":"10.1002\/minf.201700111","volume":"37","author":"A Gupta","year":"2018","unstructured":"Gupta A, M\u00fcller AT, Huisman BJH, Fuchs JA, Schneider P, Schneider G (2018) Generative recurrent networks for de novo drug design. Mol Inform 37:1700111","journal-title":"Mol Inform"},{"key":"696_CR8","doi-asserted-by":"publisher","first-page":"120","DOI":"10.1021\/acscentsci.7b00512","volume":"4","author":"MHS Segler","year":"2018","unstructured":"Segler MHS, Kogej T, Tyrchan C, Waller MP (2018) Generating focused molecule libraries for drug discovery with recurrent neural networks. ACS Cent Sci 4:120\u2013131. https:\/\/doi.org\/10.1021\/acscentsci.7b00512","journal-title":"ACS Cent Sci"},{"key":"696_CR9","doi-asserted-by":"publisher","first-page":"1096","DOI":"10.1021\/acs.jcim.8b00839","volume":"59","author":"N Brown","year":"2019","unstructured":"Brown N, Fiscato M, Segler MHS, Vaucher AC (2019) GuacaMol: benchmarking models for de novo molecular design. J Chem Inf Model 59:1096\u20131108","journal-title":"J Chem Inf Model"},{"key":"696_CR10","doi-asserted-by":"publisher","first-page":"268","DOI":"10.1021\/acscentsci.7b00572","volume":"4","author":"R G\u00f3mez-Bombarelli","year":"2018","unstructured":"G\u00f3mez-Bombarelli R, Wei JN, Duvenaud D, Hern\u00e1ndez-Lobato JM, S\u00e1nchez-Lengeling B, Sheberla D, Aguilera-Iparraguirre J, Hirzel TD, Adams RP, Aspuru-Guzik A (2018) Automatic chemical design using a data-driven continuous representation of molecules. ACS Cent Sci 4:268\u2013276","journal-title":"ACS Cent Sci"},{"key":"696_CR11","doi-asserted-by":"crossref","unstructured":"O\u2019Boyle N, Dalke A (2018) DeepSMILES: an adaptation of SMILES for use in machine-learning of chemical structures. Accessed 23 Aug 2022","DOI":"10.26434\/chemrxiv.7097960"},{"key":"696_CR12","doi-asserted-by":"publisher","DOI":"10.1088\/2632-2153\/aba947","volume":"1","author":"M Krenn","year":"2020","unstructured":"Krenn M, H\u00e4se F, Nigam A, Friederich P, Aspuru-Guzik A (2020) Self-referencing embedded strings (SELFIES): a 100% robust molecular string representation. Mach Learn Sci Technol 1:045024","journal-title":"Mach Learn Sci Technol"},{"key":"696_CR13","first-page":"01070","volume":"1812","author":"W Jin","year":"2018","unstructured":"Jin W, Yang K, Barzilay R, Jaakkola T (2018) Learning multimodal graph-to-graph translation for molecular optimization. arxiv preprint arXiv 1812:01070","journal-title":"arxiv preprint arXiv"},{"key":"696_CR14","doi-asserted-by":"publisher","first-page":"828","DOI":"10.1039\/C9ME00039A","volume":"4","author":"DC Elton","year":"2019","unstructured":"Elton DC, Boukouvalas Z, Fuge MD, Chung PW (2019) Deep learning for molecular design\u2014a review of the state of the art. Mol Syst Des Eng 4:828\u2013849. https:\/\/doi.org\/10.1039\/C9ME00039A","journal-title":"Mol Syst Des Eng"},{"key":"696_CR15","doi-asserted-by":"publisher","first-page":"14011","DOI":"10.1021\/acs.jmedchem.1c00927","volume":"64","author":"X Tong","year":"2021","unstructured":"Tong X, Liu X, Tan X, Li X, Jiang J, Xiong Z, Xu T, Jiang H, Qiao N, Zheng M (2021) Generative models for de novo drug design. J Med Chem 64:14011\u201314027. https:\/\/doi.org\/10.1021\/acs.jmedchem.1c00927","journal-title":"J Med Chem"},{"key":"696_CR16","doi-asserted-by":"publisher","first-page":"45","DOI":"10.1016\/j.ddtec.2020.11.004","volume":"32\u201333","author":"X Xia","year":"2019","unstructured":"Xia X, Hu J, Wang Y, Zhang L, Liu Z (2019) Graph-based generative models for de Novo drug design. Drug Discov Today Technol 32\u201333:45\u201353. https:\/\/doi.org\/10.1016\/j.ddtec.2020.11.004","journal-title":"Drug Discov Today Technol"},{"key":"696_CR17","doi-asserted-by":"publisher","first-page":"025023","DOI":"10.1088\/2632-2153\/abcf91","volume":"2","author":"R Mercado","year":"2021","unstructured":"Mercado R, Rastemo T, Lindel\u00f6f E, Klambauer G, Engkvist O, Chen H, Bjerrum EJ (2021) Graph networks for molecular design. Mach Learn Sci Technol 2:025023. https:\/\/doi.org\/10.1088\/2632-2153\/abcf91","journal-title":"Mach Learn Sci Technol"},{"key":"696_CR18","unstructured":"Kusner MJ, Paige B, Hern\u00e1ndez-Lobato JM (2017) Grammar variational autoencoder. In: International conference on machine learning. PMLR. 1945\u20131954. Accessed 23 Aug 2022"},{"key":"696_CR19","first-page":"08786","volume":"1802","author":"H Dai","year":"2018","unstructured":"Dai H, Tian Y, Dai B, Skiena S, Song L (2018) Syntax-directed variational autoencoder for structured data. arxiv preprint arXiv 1802:08786","journal-title":"arxiv preprint arXiv"},{"key":"696_CR20","doi-asserted-by":"crossref","unstructured":"Yuan Z, Briscoe T (2016) Grammatical error correction using neural machine translation. In: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. pp 380\u2013386. Accessed 23 Aug 2022","DOI":"10.18653\/v1\/N16-1042"},{"key":"696_CR21","doi-asserted-by":"publisher","first-page":"47","DOI":"10.1021\/acs.jcim.9b00949","volume":"60","author":"S Zheng","year":"2020","unstructured":"Zheng S, Rao J, Zhang Z, Xu J, Yang Y (2020) Predicting retrosynthetic reactions using self-corrected transformer neural networks. J Chem Inf Model 60:47\u201355. https:\/\/doi.org\/10.1021\/acs.jcim.9b00949","journal-title":"J Chem Inf Model"},{"key":"696_CR22","doi-asserted-by":"publisher","first-page":"1572","DOI":"10.1021\/acscentsci.9b00576","volume":"5","author":"P Schwaller","year":"2019","unstructured":"Schwaller P, Laino T, Gaudin T, Bolgar P, Hunter CA, Bekas C, Lee AA (2019) Molecular transformer: a model for uncertainty-calibrated chemical reaction prediction. ACS Cent Sci 5:1572\u20131583. https:\/\/doi.org\/10.1021\/acscentsci.9b00576","journal-title":"ACS Cent Sci"},{"key":"696_CR23","doi-asserted-by":"publisher","first-page":"1692","DOI":"10.1039\/C8SC04175J","volume":"10","author":"R Winter","year":"2019","unstructured":"Winter R, Montanari F, No\u00e9 F, Clevert D-A (2019) Learning continuous and data-driven molecular descriptors by translating equivalent chemical representations. Chem Sci 10:1692\u20131701","journal-title":"Chem Sci"},{"key":"696_CR24","doi-asserted-by":"publisher","first-page":"131","DOI":"10.3390\/biom8040131","volume":"8","author":"EJ Bjerrum","year":"2018","unstructured":"Bjerrum EJ, Sattarov B (2018) Improving chemical autoencoder latent space and molecular de novo generation diversity with heteroencoders. Biomolecules 8:131","journal-title":"Biomolecules"},{"key":"696_CR25","doi-asserted-by":"publisher","first-page":"1371","DOI":"10.1039\/C9RA08535A","volume":"10","author":"H Duan","year":"2020","unstructured":"Duan H, Wang L, Zhang C, Guo L, Li J (2020) Retrosynthesis with attention-based NMT model and chemical analysis of \u201cwrong\u201d predictions. RSC Adv 10:1371\u20131378","journal-title":"RSC Adv"},{"key":"696_CR26","doi-asserted-by":"crossref","unstructured":"Foster J, Andersen \u00d8E (2009) Generrate: Generating errors for use in grammatical error detection. The Association for Computational Linguistics. Accessed 23 Aug 2022","DOI":"10.3115\/1609843.1609855"},{"key":"696_CR27","first-page":"08889","volume":"1907","author":"PM Htut","year":"2019","unstructured":"Htut PM, Tetreault J (2019) The unbearable weight of generating artificial errors for grammatical error correction. arxiv preprint arXiv 1907:08889","journal-title":"arxiv preprint arXiv"},{"key":"696_CR28","first-page":"00353","volume":"1910","author":"J N\u00e1plava","year":"2019","unstructured":"N\u00e1plava J, Straka M (2019) Grammatical error correction in low-resource scenarios. arxiv preprint arXiv 1910:00353","journal-title":"arxiv preprint arXiv"},{"key":"696_CR29","doi-asserted-by":"publisher","first-page":"3","DOI":"10.1186\/s13321-022-00672-x","volume":"15","author":"OJM B\u00e9quignon","year":"2023","unstructured":"B\u00e9quignon OJM, Bongers BJ, Jespers W, IJzerman AP, van der Water B, van Westen GJP (2023) Papyrus: a large-scale curated dataset aimed at bioactivity predictions. J Cheminform 15:3. https:\/\/doi.org\/10.1186\/s13321-022-00672-x","journal-title":"J Cheminform"},{"key":"696_CR30","doi-asserted-by":"publisher","unstructured":"B\u00e9quignon OJM, Bongers BJ, Jespers W, IJzerman AP, van de Water B, van Westen GJP (2022) Accompanying data - papyrus \u2014a large scale curated dataset aimed at bioactivity predictions. https:\/\/doi.org\/10.5281\/zenodo.7019874","DOI":"10.5281\/zenodo.7019874"},{"key":"696_CR31","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13321-020-00456-1","volume":"12","author":"AP Bento","year":"2020","unstructured":"Bento AP, Hersey A, F\u00e9lix E, Landrum G, Gaulton A, Atkinson F, Bellis LJ, de Veij M, Leach AR (2020) An open source chemical structure curation pipeline using RDKit. J Cheminform 12:1\u201316","journal-title":"J Cheminform"},{"key":"696_CR32","doi-asserted-by":"publisher","first-page":"8732","DOI":"10.1021\/ja902302h","volume":"131","author":"LC Blum","year":"2009","unstructured":"Blum LC, Reymond J-L (2009) 970 Million druglike small molecules for virtual screening in the chemical universe database GDB-13. J Am Chem Soc 131:8732\u20138733. https:\/\/doi.org\/10.1021\/ja902302h","journal-title":"J Am Chem Soc"},{"key":"696_CR33","doi-asserted-by":"publisher","first-page":"85","DOI":"10.1186\/s13321-021-00561-9","volume":"13","author":"X Liu","year":"2021","unstructured":"Liu X, Ye K, van Vlijmen HWT, Emmerich MTM, IJzerman AP, van Westen GJP (2021) DrugEx v2: de novo design of drug molecules by pareto-based multi-objective reinforcement learning in polypharmacology. J Cheminform 13:85. https:\/\/doi.org\/10.1186\/s13321-021-00561-9","journal-title":"J Cheminform"},{"key":"696_CR34","doi-asserted-by":"crossref","unstructured":"Sanchez-Lengeling B, Outeiral C, Guimaraes GL, Aspuru-Guzik A (2017) Optimizing distributions over molecular space. An objective-reinforced generative adversarial network for inverse-design chemistry (ORGANIC). Accessed 23 Aug 2022","DOI":"10.26434\/chemrxiv.5309668"},{"key":"696_CR35","doi-asserted-by":"publisher","first-page":"33","DOI":"10.1186\/s13321-020-00438-3","volume":"12","author":"L Burggraaff","year":"2020","unstructured":"Burggraaff L, van Vlijmen HWT, IJzerman AP, van Westen GJP, (2020) Quantitative prediction of selectivity between the A1 and A2A adenosine receptors. J Cheminform 12:33. https:\/\/doi.org\/10.1186\/s13321-020-00438-3","journal-title":"J Cheminform"},{"key":"696_CR36","doi-asserted-by":"publisher","unstructured":"\u0160\u00edcho M, Luukkonen SIM, van den Maagdenberg HW, Liu X, Schoenmaker L, B\u00e9quignon OJM (2022) CDDLeiden\/DrugEx: DrugEx version 3.2.0. https:\/\/doi.org\/10.5281\/ZENODO.7113194","DOI":"10.5281\/ZENODO.7113194"},{"key":"696_CR37","first-page":"2825","volume":"12","author":"F Pedregosa","year":"2011","unstructured":"Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V (2011) Scikit-learn: machine learning in python. J Machine Learn Res 12:2825\u20132830","journal-title":"J Machine Learn Res"},{"key":"696_CR38","doi-asserted-by":"crossref","unstructured":"Akiba T, Sano S, Yanase T, Ohta T, Koyama M (2019) Optuna: A next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining. pp 2623\u20132631. Accessed 23 Aug 2022","DOI":"10.1145\/3292500.3330701"},{"issue":"1038","key":"696_CR39","doi-asserted-by":"publisher","first-page":"1040","DOI":"10.1038\/s41587-019-0224-x","volume":"37","author":"A Zhavoronkov","year":"2019","unstructured":"Zhavoronkov A, Ivanenkov YA, Aliper A, Veselov MS, Aladinskiy VA, Aladinskaya AV, Terentiev VA, Polykovskiy DA, Kuznetsov MD, Asadulaev AV, Zholus Y, Shayakhmetov A, Zhebrak RR, Minaeva A, Zagribelnyy LI, Lee BA, Soll LH, Madge R, Xing D, Guo L, Aspuru-Guzik TA (2019) Deep learning enables rapid identification of potent DDR1 kinase inhibitors. Nat Biotechnol 37(1038):1040. https:\/\/doi.org\/10.1038\/s41587-019-0224-x","journal-title":"Nat Biotechnol"},{"key":"696_CR40","unstructured":"Polykovskiy D, Max K Generative tensorial reinforcement learning (GENTRL) model. https:\/\/github.com\/insilicomedicine\/GENTRL. Accessed 6 Aug 2022"},{"key":"696_CR41","unstructured":"Outeiral C, Sanchez-Lengeling B, Guimaraes G, Aspuru-Guzik A Code repo for optimizing distributions of molecules. https:\/\/github.com\/aspuru-guzik-group\/ORGANIC. Accessed 31 Aug 2022"},{"key":"696_CR42","unstructured":"Landrum G RDKit: Cheminformatics and machine-learning software in C++ and Python. 10.5281\/zenodo.5085999. Accessed 23 Aug 2022"},{"key":"696_CR43","unstructured":"Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, Killeen T, Lin Z, Gimelshein N, Antiga L (2019) Pytorch: An imperative style, high-performance deep learning library. Adv Neural Inf Process Syst 32. Accessed 23 Aug 2022"},{"key":"696_CR44","unstructured":"Trevett B Tutorials on implementing a few sequence-to-sequence (seq2seq) models with PyTorch and TorchText. https:\/\/github.com\/bentrevett\/pytorch-seq2seq. Accessed 25 Jul 2022"},{"key":"696_CR45","doi-asserted-by":"publisher","first-page":"48","DOI":"10.1186\/s13321-017-0235-x","volume":"9","author":"M Olivecrona","year":"2017","unstructured":"Olivecrona M, Blaschke T, Engkvist O, Chen H (2017) Molecular de-novo design through deep reinforcement learning. J Cheminform 9:48. https:\/\/doi.org\/10.1186\/s13321-017-0235-x","journal-title":"J Cheminform"},{"issue":"927","key":"696_CR46","first-page":"933","volume":"30","author":"A Vaswani","year":"2017","unstructured":"Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser \u0141, Polosukhin I (2017) Attention is all you need. Adv Neural Inf Process Syst 30(927):933","journal-title":"Adv Neural Inf Process Syst"},{"key":"696_CR47","doi-asserted-by":"publisher","DOI":"10.3389\/fphar.2020.565644","volume":"11","author":"D Polykovskiy","year":"2020","unstructured":"Polykovskiy D, Zhebrak A, Sanchez-Lengeling B, Golovanov S, Tatanov O, Belyaev S, Kurbanov R, Artamonov A, Aladinskiy V, Veselov M (2020) Molecular sets (MOSES): a benchmarking platform for molecular generation models. Front Pharmacol 11:565644","journal-title":"Front Pharmacol"},{"key":"696_CR48","doi-asserted-by":"publisher","first-page":"5801","DOI":"10.1021\/ja00385a049","volume":"104","author":"SH Bertz","year":"1982","unstructured":"Bertz SH (1982) Convergence, molecular complexity, and synthetic analysis. J Am Chem Soc 104:5801\u20135803. https:\/\/doi.org\/10.1021\/ja00385a049","journal-title":"J Am Chem Soc"},{"key":"696_CR49","doi-asserted-by":"publisher","first-page":"488","DOI":"10.1002\/jcc.540150503","volume":"15","author":"R Abagyan","year":"1994","unstructured":"Abagyan R, Totrov M, Kuznetsov D (1994) ICM\u2014A new method for protein modeling and design: applications to docking and structure prediction from the distorted native conformation. J Comput Chem 15:488\u2013506","journal-title":"J Comput Chem"},{"issue":"5","key":"696_CR50","first-page":"2","volume":"2","author":"LLC Schr\u00f6dinger","year":"2015","unstructured":"Schr\u00f6dinger LLC (2015) The PyMOL molecular graphics system. Version 2(5):2","journal-title":"Version"},{"key":"696_CR51","doi-asserted-by":"publisher","first-page":"577","DOI":"10.1039\/C9SC04026A","volume":"11","author":"R-R Griffiths","year":"2020","unstructured":"Griffiths R-R, Hern\u00e1ndez-Lobato JM (2020) Constrained bayesian optimization for automatic chemical design using variational autoencoders. Chem Sci 11:577\u2013586. https:\/\/doi.org\/10.1039\/C9SC04026A","journal-title":"Chem Sci"},{"key":"696_CR52","doi-asserted-by":"publisher","first-page":"1700123","DOI":"10.1002\/minf.201700123","volume":"37","author":"T Blaschke","year":"2018","unstructured":"Blaschke T, Olivecrona M, Engkvist O, Bajorath J, Chen H (2018) Application of generative autoencoder in de novo molecular design. Mol Inform 37:1700123. https:\/\/doi.org\/10.1002\/minf.201700123","journal-title":"Mol Inform"},{"key":"696_CR53","doi-asserted-by":"publisher","first-page":"5343","DOI":"10.1021\/acs.jcim.0c01496","volume":"61","author":"T Sousa","year":"2021","unstructured":"Sousa T, Correia J, Pereira V, Rocha M (2021) Generative deep learning for targeted compound design. J Chem Inf Model 61:5343\u20135361. https:\/\/doi.org\/10.1021\/acs.jcim.0c01496","journal-title":"J Chem Inf Model"},{"key":"696_CR54","doi-asserted-by":"publisher","unstructured":"HW, Emmerich MTM, van Westen GJP (2023) Artificial intelligence in multi-objective drug design. Curr Opin Struct Biol 79:102537. https:\/\/doi.org\/10.1016\/j.sbi.2023.102537","DOI":"10.1016\/j.sbi.2023.102537"},{"key":"696_CR55","doi-asserted-by":"publisher","first-page":"34591","DOI":"10.1007\/s11042-020-09148-2","volume":"80","author":"C Park","year":"2021","unstructured":"Park C, Kim K, Yang Y, Kang M, Lim H (2021) Neural spelling correction: translating incorrect sentences to correct sentences for multimedia. Multimed Tools Appl 80:34591\u201334608. https:\/\/doi.org\/10.1007\/s11042-020-09148-2","journal-title":"Multimed Tools Appl"},{"key":"696_CR56","first-page":"03031","volume":"2106","author":"M Mita","year":"2021","unstructured":"Mita M, Yanaka H (2021) Do grammatical error correction models realize grammatical generalization\u202f? Arxiv preprint arXiv 2106:03031","journal-title":"Arxiv preprint arXiv"},{"key":"696_CR57","first-page":"05940","volume":"1804","author":"M Junczys-Dowmunt","year":"2018","unstructured":"Junczys-Dowmunt M, Grundkiewicz R, Guha S, Heafield K (2018) Approaching neural grammatical error correction as a low-resource machine translation task. arxiv preprint arXiv 1804:05940","journal-title":"arxiv preprint arXiv"},{"key":"696_CR58","doi-asserted-by":"crossref","unstructured":"Ge T, Wei F, Zhou M (2018) Fluency boost learning and inference for neural grammatical error correction. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 1055\u20131065","DOI":"10.18653\/v1\/P18-1097"},{"key":"696_CR59","volume-title":"Grammatical error correction in non-native English","author":"Z Yuan","year":"2017","unstructured":"Yuan Z (2017) Grammatical error correction in non-native English. University of Cambridge, Computer Laboratory"},{"key":"696_CR60","doi-asserted-by":"publisher","first-page":"2064","DOI":"10.1021\/acs.jcim.1c00600","volume":"62","author":"V Bagal","year":"2022","unstructured":"Bagal V, Aggarwal R, Vinod PK, Priyakumar UD (2022) MolGPT: molecular generation using a transformer-decoder model. J Chem Inf Model 62:2064\u20132076. https:\/\/doi.org\/10.1021\/acs.jcim.1c00600","journal-title":"J Chem Inf Model"},{"key":"696_CR61","doi-asserted-by":"publisher","first-page":"5637","DOI":"10.1021\/acs.jcim.0c01015","volume":"60","author":"M Langevin","year":"2020","unstructured":"Langevin M, Minoux H, Levesque M, Bianciotto M (2020) Scaffold-constrained molecular generation. J Chem Inf Model 60:5637\u20135646. https:\/\/doi.org\/10.1021\/acs.jcim.0c01015","journal-title":"J Chem Inf Model"},{"key":"696_CR62","doi-asserted-by":"publisher","first-page":"1411","DOI":"10.1021\/acs.jcim.2c00205","volume":"62","author":"TM Creanza","year":"2022","unstructured":"Creanza TM, Lamanna G, Delre P, Contino M, Corriero N, Saviano M, Mangiatordi GF, Ancona N (2022) DeLA-Drug: a deep learning algorithm for automated design of druglike analogues. J Chem Inf Model 62:1411\u20131424. https:\/\/doi.org\/10.1021\/acs.jcim.2c00205","journal-title":"J Chem Inf Model"},{"key":"696_CR63","doi-asserted-by":"publisher","first-page":"185","DOI":"10.1021\/acsmedchemlett.0c00540","volume":"12","author":"GM Makara","year":"2021","unstructured":"Makara GM, Kov\u00e1cs L, Szab\u00f3 I, Po\u030bcze G, (2021) Derivatization design of synthetically accessible space for optimization: in silico synthesis vs deep generative design. ACS Med Chem Lett 12:185\u2013194","journal-title":"ACS Med Chem Lett"}],"container-title":["Journal of Cheminformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-023-00696-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s13321-023-00696-x\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-023-00696-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,14]],"date-time":"2023-02-14T05:10:07Z","timestamp":1676351407000},"score":1,"resource":{"primary":{"URL":"https:\/\/jcheminf.biomedcentral.com\/articles\/10.1186\/s13321-023-00696-x"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,2,14]]},"references-count":63,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2023,12]]}},"alternative-id":["696"],"URL":"https:\/\/doi.org\/10.1186\/s13321-023-00696-x","relation":{"has-preprint":[{"id-type":"doi","id":"10.26434\/chemrxiv-2022-x3zng-v2","asserted-by":"object"},{"id-type":"doi","id":"10.26434\/chemrxiv-2022-x3zng","asserted-by":"object"}]},"ISSN":["1758-2946"],"issn-type":[{"value":"1758-2946","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,2,14]]},"assertion":[{"value":"13 October 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"6 February 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"14 February 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"LS and OJBM conceived the study. LS developed and implemented the method. WJ and GJPvW supervised the study. OJBM, WJ, and GJPvW provided feedback and critical input. All authors read and approved the final manuscript.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Author contributions"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Funding"}},{"value":"The data and code required to recreate the results of this paper are available in the following GitHub repository:\n                      \n                      .","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Availability of data and materials"}},{"value":"Not applicable.","order":5,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"The authors declare that they have no competing interests.","order":6,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"22"}}