{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,28]],"date-time":"2026-04-28T19:55:29Z","timestamp":1777406129556,"version":"3.51.4"},"reference-count":58,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2023,2,23]],"date-time":"2023-02-23T00:00:00Z","timestamp":1677110400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,2,23]],"date-time":"2023-02-23T00:00:00Z","timestamp":1677110400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100003654","name":"Korea Environmental Industry and Technology Institute","doi-asserted-by":"publisher","award":["KEITI:2020002960002"],"award-info":[{"award-number":["KEITI:2020002960002"]}],"id":[{"id":"10.13039\/501100003654","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Korea Ministry of Environment","award":["NTIS:1485017120"],"award-info":[{"award-number":["NTIS:1485017120"]}]},{"DOI":"10.13039\/501100003725","name":"National Research Foundation of Korea","doi-asserted-by":"publisher","award":["NRF-2022M3E5F3081268"],"award-info":[{"award-number":["NRF-2022M3E5F3081268"]}],"id":[{"id":"10.13039\/501100003725","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Cheminform"],"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>The simplified molecular-input line-entry system (SMILES) is the most prevalent molecular representation used in AI-based chemical applications. However, there are innate limitations associated with the internal structure of SMILES representations. 
In this context, this study exploits the resolution and robustness of unique molecular representations, i.e., SMILES and SELFIES (SELF-referencIng Embedded strings), reconstructed from a set of structural fingerprints, which are proposed and used herein as vital representational tools for chemical and natural language processing (NLP) applications. This is achieved by restoring the connectivity information lost during fingerprint transformation with high accuracy. Notably, the results reveal that reversing the seemingly irreversible molecule-to-fingerprint conversion is feasible. More specifically, four structural fingerprints (extended connectivity, topological torsion, atom pairs, and atomic environments) can be used as inputs and outputs of chemical NLP applications. Therefore, this comprehensive study addresses the major limitation of structural fingerprints that precludes their use in NLP models. Our findings will facilitate the development of text- or fingerprint-based chemoinformatic models for generative and translational tasks.<\/jats:p>","DOI":"10.1186\/s13321-023-00693-0","type":"journal-article","created":{"date-parts":[[2023,2,23]],"date-time":"2023-02-23T07:02:52Z","timestamp":1677135772000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":25,"title":["Reconstruction of lossless molecular representations from fingerprints"],"prefix":"10.1186","volume":"15","author":[{"given":"Umit V.","family":"Ucak","sequence":"first","affiliation":[]},{"given":"Islambek","family":"Ashyrmamatov","sequence":"additional","affiliation":[]},{"given":"Juyong","family":"Lee","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,2,23]]},"reference":[{"issue":"1","key":"693_CR1","doi-asserted-by":"publisher","first-page":"31","DOI":"10.1021\/ci00057a005","volume":"28","author":"D Weininger","year":"1988","unstructured":"Weininger D (1988) SMILES, a chemical language and 
information system. 1. Introduction to methodology and encoding rules. J Chem Inf Comp Sci 28(1):31\u201336. https:\/\/doi.org\/10.1021\/ci00057a005","journal-title":"J Chem Inf Comp Sci"},{"key":"693_CR2","unstructured":"ChemAxon Extended SMILES and SMARTS CXSMILES and CXSMARTS Documentation. https:\/\/docs.chemaxon.com\/display\/docs\/chemaxon-extended-smiles-and-smarts-cxsmiles-and-cxsmarts.md#src-1806633_ChemAxonExtendedSMILESandSMARTS-CXSMILESandCXSMARTS-Fragmentgrouping. Accessed 10 Feb 2022"},{"key":"693_CR3","unstructured":"OpenSMILES. Home Page https:\/\/opensmiles.org. Accessed 10 Dec 2021"},{"issue":"9","key":"693_CR4","doi-asserted-by":"publisher","first-page":"1523","DOI":"10.1021\/acscentsci.9b00476","volume":"5","author":"T-S Lin","year":"2019","unstructured":"Lin T-S, Coley CW, Mochigase H, Beech HK, Wang W, Wang Z, Woods E, Craig SL, Johnson JA, Kalow JA, Jensen KF, Olsen BD (2019) BigSMILES: a structurally-based line notation for describing macromolecules. ACS Cent Sci 5(9):1523\u20131531. https:\/\/doi.org\/10.1021\/acscentsci.9b00476. (PMID: 31572779)","journal-title":"ACS Cent Sci"},{"issue":"1","key":"693_CR5","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/1758-2946-3-1","volume":"3","author":"A Drefahl","year":"2011","unstructured":"Drefahl A (2011) CurlySMILES: a chemical language to customize and annotate encodings of molecular and nanodevice structures. J Cheminformatics 3(1):1\u20137. https:\/\/doi.org\/10.1186\/1758-2946-3-1","journal-title":"J Cheminformatics"},{"issue":"2","key":"693_CR6","doi-asserted-by":"publisher","first-page":"107","DOI":"10.1021\/c160017a018","volume":"5","author":"HL Morgan","year":"1965","unstructured":"Morgan HL (1965) The generation of a unique machine description for chemical structures-a technique developed at Chemical Abstracts Service. J Chem Doc 5(2):107\u2013113. 
https:\/\/doi.org\/10.1021\/c160017a018","journal-title":"J Chem Doc"},{"issue":"2","key":"693_CR7","doi-asserted-by":"publisher","first-page":"97","DOI":"10.1021\/ci00062a008","volume":"29","author":"D Weininger","year":"1989","unstructured":"Weininger D, Weininger A, Weininger JL (1989) SMILES. 2. Algorithm for generation of unique SMILES notation. J Chem Inf Comp Sci 29(2):97\u2013101. https:\/\/doi.org\/10.1021\/ci00062a008","journal-title":"J Chem Inf Comp Sci"},{"issue":"9","key":"693_CR8","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/1758-2946-4-22","volume":"4","author":"NM O\u2019Boyle","year":"2012","unstructured":"O\u2019Boyle NM (2012) Towards a Universal SMILES representation\u2014a standard method to generate canonical SMILES based on the InChI. J Cheminformatics 4(9):1\u201314. https:\/\/doi.org\/10.1186\/1758-2946-4-22","journal-title":"J Cheminformatics"},{"issue":"10","key":"693_CR9","doi-asserted-by":"publisher","first-page":"2111","DOI":"10.1021\/acs.jcim.5b00543","volume":"55","author":"N Schneider","year":"2015","unstructured":"Schneider N, Sayle RA, Landrum GA (2015) Get your atoms in order-an open-source implementation of a novel and robust molecular canonicalization algorithm. J Chem Inf Model 55(10):2111\u20132120. https:\/\/doi.org\/10.1021\/acs.jcim.5b00543. (PMID: 26441310)","journal-title":"J Chem Inf Model"},{"issue":"2","key":"693_CR10","doi-asserted-by":"publisher","first-page":"88","DOI":"10.1021\/ci00034a005","volume":"22","author":"WJ Wiswesser","year":"1982","unstructured":"Wiswesser WJ (1982) How the WLN Began in 1949 and How It Might Be in 1999. J Chem Inf Comp Sci 22(2):88\u201393. 
https:\/\/doi.org\/10.1021\/ci00034a005","journal-title":"J Chem Inf Comp Sci"},{"issue":"12","key":"693_CR11","doi-asserted-by":"publisher","first-page":"2294","DOI":"10.1021\/ci7004687","volume":"48","author":"RW Homer","year":"2008","unstructured":"Homer RW, Swanson J, Jilek RJ, Hurst T, Clark RD (2008) SYBYL line notation (SLN): a single notation to represent chemical structures, queries, reactions, and virtual libraries. J Chem Inf Model 48(12):2294\u20132307. https:\/\/doi.org\/10.1021\/ci7004687","journal-title":"J Chem Inf Model"},{"issue":"S1","key":"693_CR12","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/1758-2946-6-s1-p4","volume":"6","author":"S Heller","year":"2014","unstructured":"Heller S (2014) InChI\u2014the worldwide chemical structure standard. J Cheminformatics 6(S1):1\u20139. https:\/\/doi.org\/10.1186\/1758-2946-6-s1-p4","journal-title":"J Cheminformatics"},{"issue":"12","key":"693_CR13","doi-asserted-by":"publisher","first-page":"3355","DOI":"10.1039\/c9sc03666k","volume":"11","author":"K Lin","year":"2020","unstructured":"Lin K, Xu Y, Pei J, Lai L (2020) Automatic retrosynthetic route planning using template-free models. Chem Sci 11(12):3355\u20133364. https:\/\/doi.org\/10.1039\/c9sc03666k","journal-title":"Chem Sci"},{"issue":"3","key":"693_CR14","doi-asserted-by":"publisher","first-page":"1205","DOI":"10.1021\/acs.jcim.8b00706","volume":"59","author":"M Skalic","year":"2019","unstructured":"Skalic M, Jim\u00e9nez J, Sabbadin D, De Fabritiis G (2019) Shape-Based Generative Modeling for de Novo Drug Design. J Chem Inf Model 59(3):1205\u20131214. 
https:\/\/doi.org\/10.1021\/acs.jcim.8b00706","journal-title":"J Chem Inf Model"},{"issue":"1","key":"693_CR15","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13321-021-00501-7","volume":"13","author":"Y Kwon","year":"2021","unstructured":"Kwon Y, Lee J (2021) MolFinder: an evolutionary algorithm for the global optimization of molecular properties and the extensive exploration of chemical space using SMILES. J Cheminformatics 13(1):1\u201314. https:\/\/doi.org\/10.1186\/s13321-021-00501-7","journal-title":"J Cheminformatics"},{"issue":"10","key":"693_CR16","doi-asserted-by":"publisher","first-page":"1103","DOI":"10.1021\/acscentsci.7b00303","volume":"3","author":"B Liu","year":"2017","unstructured":"Liu B, Ramsundar B, Kawthekar P, Shi J, Gomes J, Luu Nguyen Q, Ho S, Sloane J, Wender P, Pande V (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS Cent Sci 3(10):1103\u20131113. https:\/\/doi.org\/10.1021\/acscentsci.7b00303","journal-title":"ACS Cent Sci"},{"issue":"9","key":"693_CR17","doi-asserted-by":"publisher","first-page":"1572","DOI":"10.1021\/acscentsci.9b00576","volume":"5","author":"P Schwaller","year":"2019","unstructured":"Schwaller P, Laino T, Gaudin T, Bolgar P, Hunter CA, Bekas C, Lee AA (2019) Molecular transformer: a model for uncertainty-calibrated chemical reaction prediction. ACS Cent Sci 5(9):1572\u20131583. https:\/\/doi.org\/10.1021\/acscentsci.9b00576","journal-title":"ACS Cent Sci"},{"issue":"3","key":"693_CR18","doi-asserted-by":"publisher","first-page":"1096","DOI":"10.1021\/acs.jcim.8b00839","volume":"59","author":"N Brown","year":"2019","unstructured":"Brown N, Fiscato M, Segler MHS, Vaucher AC (2019) GuacaMol: benchmarking models for de novo molecular design. J Chem Inf Model 59(3):1096\u20131108. https:\/\/doi.org\/10.1021\/acs.jcim.8b00839. 
(PMID: 30887799)","journal-title":"J Chem Inf Model"},{"issue":"1","key":"693_CR19","doi-asserted-by":"publisher","first-page":"31","DOI":"10.1186\/s13321-018-0286-7","volume":"10","author":"J Lim","year":"2018","unstructured":"Lim J, Ryu S, Kim JW, Kim WY (2018) Molecular generative model based on conditional variational autoencoder for de novo molecular design. J Cheminformatics 10(1):31. https:\/\/doi.org\/10.1186\/s13321-018-0286-7","journal-title":"J Cheminformatics"},{"key":"693_CR20","doi-asserted-by":"publisher","unstructured":"G\u00f3mez-Bombarelli R, Wei JN, Duvenaud D, Hern\u00e1ndez-Lobato JM, S\u00e1nchez-Lengeling B, Sheberla D, Aguilera-Iparraguirre J, Hirzel TD, Adams RP, Aspuru-Guzik A (2018) Automatic chemical design using a data-driven continuous representation of molecules. ACS Cent Sci 4(2):268\u2013276. https:\/\/doi.org\/10.1021\/acscentsci.7b00572. arXiv:1610.02415","DOI":"10.1021\/acscentsci.7b00572"},{"key":"693_CR21","unstructured":"Alperstein Z, Cherkasov A, Rolfe JT (2019) All SMILES variational autoencoder. arXiv. https:\/\/doi.org\/10.48550\/arXiv.1905.13343"},{"issue":"1","key":"693_CR22","doi-asserted-by":"publisher","first-page":"47","DOI":"10.1021\/acs.jcim.9b00949","volume":"60","author":"S Zheng","year":"2020","unstructured":"Zheng S, Rao J, Zhang Z, Xu J, Yang Y (2020) Predicting retrosynthetic reactions using self-corrected transformer neural networks. J Chem Inf Model 60(1):47\u201355. https:\/\/doi.org\/10.1021\/acs.jcim.9b00949","journal-title":"J Chem Inf Model"},{"issue":"3","key":"693_CR23","doi-asserted-by":"publisher","first-page":"1371","DOI":"10.1039\/c9ra08535a","volume":"10","author":"H Duan","year":"2020","unstructured":"Duan H, Wang L, Zhang C, Guo L, Li J (2020) Retrosynthesis with attention-based NMT model and chemical analysis of \u201cwrong\u201d predictions. RSC Adv 10(3):1371\u20131378. 
https:\/\/doi.org\/10.1039\/c9ra08535a","journal-title":"RSC Adv"},{"issue":"1","key":"693_CR24","doi-asserted-by":"publisher","first-page":"123","DOI":"10.1021\/acs.jcim.0c01074","volume":"61","author":"E Kim","year":"2021","unstructured":"Kim E, Lee D, Kwon Y, Park MS, Choi YS (2021) Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. J Chem Inf Model 61(1):123\u2013133. https:\/\/doi.org\/10.1021\/acs.jcim.0c01074","journal-title":"J Chem Inf Model"},{"issue":"6","key":"693_CR25","doi-asserted-by":"publisher","first-page":"2547","DOI":"10.1021\/acs.jcim.0c01226","volume":"61","author":"AE Bilsland","year":"2021","unstructured":"Bilsland AE, McAulay K, West R, Pugliese A, Bower J (2021) Automated generation of novel fragments using screening data, a dual SMILES autoencoder, transfer learning and syntax correction. J Chem Inf Model 61(6):2547\u20132559. https:\/\/doi.org\/10.1021\/acs.jcim.0c01226","journal-title":"J Chem Inf Model"},{"key":"693_CR26","unstructured":"Dai H, Tian Y, Dai B, Skiena S, Song L (2018) Syntax-directed variational autoencoder for structured data. arXiv. https:\/\/doi.org\/10.48550\/arXiv.1802.08786"},{"key":"693_CR27","unstructured":"Kusner MJ, Paige B, Hern\u00e1ndez-Lobato JM (2017) Grammar variational autoencoder. In: Precup D, Teh YW, eds. Proceedings of the 34th international conference on machine learning. Proceedings of machine learning research, vol 70, pp 1945\u20131954. https:\/\/proceedings.mlr.press\/v70\/kusner17a.html"},{"key":"693_CR28","doi-asserted-by":"publisher","unstructured":"O\u2019Boyle NM, Dalke A (2018) DeepSMILES: an adaptation of SMILES for use in machine-learning of chemical structures. ChemRxiv, 1\u20139. 
https:\/\/doi.org\/10.26434\/chemrxiv.7097960","DOI":"10.26434\/chemrxiv.7097960"},{"issue":"4","key":"693_CR29","doi-asserted-by":"publisher","DOI":"10.1088\/2632-2153\/aba947","volume":"1","author":"M Krenn","year":"2020","unstructured":"Krenn M, H\u00e4se F, Nigam A, Friederich P, Aspuru-Guzik A (2020) Self-referencing embedded strings (SELFIES): a 100% robust molecular string representation. Mach Learn Sci Technol 1(4):045024. https:\/\/doi.org\/10.1088\/2632-2153\/aba947","journal-title":"Mach Learn Sci Technol"},{"issue":"9","key":"693_CR30","doi-asserted-by":"publisher","first-page":"3098","DOI":"10.1021\/acs.molpharmaceut.7b00346","volume":"14","author":"A Kadurin","year":"2017","unstructured":"Kadurin A, Nikolenko S, Khrabrov K, Aliper A, Zhavoronkov A (2017) druGAN: an advanced generative adversarial autoencoder model for de novo generation of new molecules with desired molecular properties in silico. Mol Pharm 14(9):3098\u20133104. https:\/\/doi.org\/10.1021\/acs.molpharmaceut.7b00346","journal-title":"Mol Pharm"},{"issue":"1","key":"693_CR31","doi-asserted-by":"publisher","first-page":"1186","DOI":"10.1038\/s41467-022-28857-w","volume":"13","author":"UV Ucak","year":"2022","unstructured":"Ucak UV, Ashyrmamatov I, Ko J, Lee J (2022) Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nat Commun 13(1):1186. https:\/\/doi.org\/10.1038\/s41467-022-28857-w","journal-title":"Nat Commun"},{"issue":"1","key":"693_CR32","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13321-020-00482-z","volume":"13","author":"UV Ucak","year":"2021","unstructured":"Ucak UV, Kang T, Ko J, Lee J (2021) Substructure-based neural machine translation for retrosynthetic prediction. J Cheminformatics 13(1):1\u201315. 
https:\/\/doi.org\/10.1186\/s13321-020-00482-z","journal-title":"J Cheminformatics"},{"key":"693_CR33","unstructured":"Tu Z, Coley CW (2021) Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. arXiv:2110.09681 [cs]. Accessed 2022-02-10"},{"issue":"1","key":"693_CR34","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/s41467-020-19266-y","volume":"11","author":"IV Tetko","year":"2020","unstructured":"Tetko IV, Karpov P, Van Deursen R, Godin G (2020) State-of-the-art augmented NLP transformer models for direct and single-step retrosynthesis. Nat Commun 11(1):1\u201311. https:\/\/doi.org\/10.1038\/s41467-020-19266-y","journal-title":"Nat Commun"},{"issue":"5","key":"693_CR35","doi-asserted-by":"publisher","first-page":"742","DOI":"10.1021\/ci100050t","volume":"50","author":"D Rogers","year":"2010","unstructured":"Rogers D, Hahn M (2010) Extended-connectivity fingerprints. J Chem Inf Model 50(5):742\u2013754. https:\/\/doi.org\/10.1021\/ci100050t","journal-title":"J Chem Inf Model"},{"issue":"38","key":"693_CR36","doi-asserted-by":"publisher","first-page":"10378","DOI":"10.1039\/d0sc03115a","volume":"11","author":"T Le","year":"2020","unstructured":"Le T, Winter R, No\u00e9 F, Clevert D-A (2020) Neuraldecipher\u2014reverse-engineering extended-connectivity fingerprints (ECFPs) to their molecular structures. Chem Sci 11(38):10378\u201310389. https:\/\/doi.org\/10.1039\/d0sc03115a","journal-title":"Chem Sci"},{"issue":"1","key":"693_CR37","doi-asserted-by":"publisher","first-page":"17304","DOI":"10.1038\/s41598-021-96812-8","volume":"11","author":"Y Kwon","year":"2021","unstructured":"Kwon Y, Kang S, Choi Y-S, Kim I (2021) Evolutionary design of molecules based on deep learning and a genetic algorithm. Sci Rep 11(1):17304. 
https:\/\/doi.org\/10.1038\/s41598-021-96812-8","journal-title":"Sci Rep"},{"key":"693_CR38","doi-asserted-by":"publisher","unstructured":"Cofala T, Kramer O (2022) An evolutionary fragment-based approach to molecular fingerprint reconstruction. In: Proceedings of the genetic and evolutionary computation conference, pp 1156\u20131163. https:\/\/doi.org\/10.1145\/3512290.3528824","DOI":"10.1145\/3512290.3528824"},{"key":"693_CR39","unstructured":"Jaegle A, Gimeno F, Brock A, Zisserman A, Vinyals O, Carreira J (2021) Perceiver: general perception with iterative attention. Preprint at arXiv:2103.03206"},{"key":"693_CR40","unstructured":"Landrum G (2016) RDKit: open-source cheminformatics software. https:\/\/github.com\/rdkit\/rdkit\/releases\/tag\/Release_2020_03_1"},{"issue":"6","key":"693_CR41","doi-asserted-by":"publisher","first-page":"1273","DOI":"10.1021\/ci010132r","volume":"42","author":"JL Durant","year":"2002","unstructured":"Durant JL, Leland BA, Henry DR, Nourse JG (2002) Reoptimization of MDL keys for use in drug discovery. J Chem Inf Comp Sci 42(6):1273\u20131280. https:\/\/doi.org\/10.1021\/ci010132r","journal-title":"J Chem Inf Comp Sci"},{"key":"693_CR42","unstructured":"James CA, Weininger D, Delany JD (2002) Daylight Theory Manual. Daylight Chemical Information Systems Inc. https:\/\/daylight.com\/dayhtml\/doc\/theory\/index.html"},{"issue":"5","key":"693_CR43","doi-asserted-by":"publisher","first-page":"1924","DOI":"10.1021\/ci050413p","volume":"46","author":"P Gedeck","year":"2006","unstructured":"Gedeck P, Rohde B, Bartels C (2006) QSAR\u2014how good is it in practice? Comparison of descriptor sets on an unbiased cross section of corporate data sets. J Chem Inf Model 46(5):1924\u20131936. 
https:\/\/doi.org\/10.1021\/ci050413p","journal-title":"J Chem Inf Model"},{"issue":"2","key":"693_CR44","doi-asserted-by":"publisher","first-page":"64","DOI":"10.1021\/ci00046a002","volume":"25","author":"DH Smith","year":"1985","unstructured":"Smith DH, Carhart RE, Venkataraghavan R (1985) Atom pairs as molecular features in structure-activity studies: definition and applications. J Chem Inf Comp Sci 25(2):64\u201373. https:\/\/doi.org\/10.1021\/ci00046a002","journal-title":"J Chem Inf Comp Sci"},{"issue":"2","key":"693_CR45","doi-asserted-by":"publisher","first-page":"82","DOI":"10.1021\/ci00054a008","volume":"27","author":"R Nilakantan","year":"1987","unstructured":"Nilakantan R, Bauman N, Venkataraghavan R, Dixon JS (1987) Topological torsion: a new molecular descriptor for SAR applications. Comparison with other descriptors. J Chem Inf Comp Sci 27(2):82\u201385. https:\/\/doi.org\/10.1021\/ci00054a008","journal-title":"J Chem Inf Comp Sci"},{"key":"693_CR46","unstructured":"Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser \u0141, Polosukhin I (2017) Attention is all you need. Adv Neural Inf Process Syst 30 (NIPS 2017):5999\u20136009"},{"issue":"D1","key":"693_CR47","doi-asserted-by":"publisher","first-page":"945","DOI":"10.1093\/nar\/gkw1074","volume":"45","author":"A Gaulton","year":"2016","unstructured":"Gaulton A, Hersey A, Nowotka M, Bento AP, Chambers J, Mendez D, Mutowo P, Atkinson F, Bellis LJ, Cibri\u00e1n-Uhalte E, Davies M, Dedman N, Karlsson A, Magari\u00f1os MP, Overington JP, Papadatos G, Smit I, Leach AR (2016) The ChEMBL database in 2017. Nucleic Acids Res 45(D1):945\u2013954. https:\/\/doi.org\/10.1093\/nar\/gkw1074","journal-title":"Nucleic Acids Res"},{"key":"693_CR48","doi-asserted-by":"publisher","unstructured":"Bolton EE, Wang Y, Thiessen PA, Bryant SH (2008) Chapter 12 - PubChem: integrated platform of small molecules and biological activities. 
In: Annual reports in computational chemistry, vol 4, pp 217\u2013241. https:\/\/doi.org\/10.1016\/S1574-1400(08)00012-1","DOI":"10.1016\/S1574-1400(08)00012-1"},{"issue":"23","key":"693_CR49","doi-asserted-by":"publisher","first-page":"12788","DOI":"10.1021\/acs.chemrev.0c00534","volume":"120","author":"S Decherchi","year":"2020","unstructured":"Decherchi S, Cavalli A (2020) Thermodynamics and kinetics of drug-target binding by molecular simulation. Chem Rev 120(23):12788\u201312833. https:\/\/doi.org\/10.1021\/acs.chemrev.0c00534","journal-title":"Chem Rev"},{"key":"693_CR50","doi-asserted-by":"crossref","unstructured":"Vogt M, Bajorath J (2020) ccbmlib\u2014a Python package for modeling Tanimoto similarity value distributions. F1000Research. https:\/\/doi.org\/10.12688\/f1000research.22292.1","DOI":"10.12688\/f1000research.22292.2"},{"key":"693_CR51","unstructured":"Grimsley C, Mayfield E, Bursten JRS (2020) Why attention is not explanation: surgical intervention and causal reasoning about neural models. In: Proceedings of the 12th language resources and evaluation conference, pp 1780\u20131790. European Language Resources Association, Marseille, France. https:\/\/aclanthology.org\/2020.lrec-1.220"},{"key":"693_CR52","doi-asserted-by":"publisher","unstructured":"Jain S, Wallace BC (2019) Attention is not explanation. In: Proceedings of the 2019 conference of the North American chapter of the Association for Computational Linguistics: human language technologies, vol 1 (Long and Short Papers), pp 3543\u20133556. Association for Computational Linguistics, Minneapolis, Minnesota. 
https:\/\/doi.org\/10.18653\/v1\/N19-1357. https:\/\/aclanthology.org\/N19-1357","DOI":"10.18653\/v1\/N19-1357"},{"key":"693_CR53","unstructured":"Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, Killeen T, Lin Z, Gimelshein N, Antiga L, Desmaison A, Kopf A, Yang E, DeVito Z, Raison M, Tejani A, Chilamkurthy S, Steiner B, Fang L, Bai J, Chintala S (2019) PyTorch: an imperative style, high-performance deep learning library. In: Wallach H, Larochelle H, Beygelzimer A, d\u2019Alch\u00e9-Buc F, Fox E, Garnett R, eds. Advances in neural information processing systems, pp 8024\u20138035"},{"key":"693_CR54","doi-asserted-by":"publisher","unstructured":"Rush A (2018) The annotated transformer. In: Proceedings of workshop for NLP open source software (NLP-OSS), pp 52\u201360. Association for Computational Linguistics, Melbourne, Australia. https:\/\/doi.org\/10.18653\/v1\/W18-2509","DOI":"10.18653\/v1\/W18-2509"},{"key":"693_CR55","unstructured":"Xiong R, Yang Y, He D, Zheng K, Zheng S, Xing C, Zhang H, Lan Y, Wang L, Liu T (2020) On layer normalization in the transformer architecture. arXiv:2002.04745"},{"key":"693_CR56","doi-asserted-by":"publisher","unstructured":"Rajbhandari S, Rasley J, Ruwase O, He Y (2020) ZeRO: memory optimizations toward training trillion parameter models. In: International conference for high performance computing, networking, storage and analysis, SC 2020-November, 1\u201324. https:\/\/doi.org\/10.1109\/SC41405.2020.00024. arXiv:1910.02054","DOI":"10.1109\/SC41405.2020.00024"},{"key":"693_CR57","unstructured":"Loshchilov I, Hutter F (2017) SGDR: stochastic gradient descent with warm restarts. 5th international conference on learning representations, ICLR 2017\u2014conference track proceedings, pp 1\u201316"},{"key":"693_CR58","doi-asserted-by":"crossref","unstructured":"Karpov P, Godin G, Tetko IV (2019) A transformer model for retrosynthesis. 
In: Artificial neural networks and machine learning\u2014ICANN 2019: workshop and special sessions, pp 817\u2013830. Springer, Cham","DOI":"10.1007\/978-3-030-30493-5_78"}],"updated-by":[{"DOI":"10.1186\/s13321-023-00739-3","type":"correction","label":"Correction","source":"publisher","updated":{"date-parts":[[2023,7,26]],"date-time":"2023-07-26T00:00:00Z","timestamp":1690329600000}}],"container-title":["Journal of Cheminformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-023-00693-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s13321-023-00693-0\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-023-00693-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,7,27]],"date-time":"2023-07-27T03:05:01Z","timestamp":1690427101000},"score":1,"resource":{"primary":{"URL":"https:\/\/jcheminf.biomedcentral.com\/articles\/10.1186\/s13321-023-00693-0"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,2,23]]},"references-count":58,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2023,12]]}},"alternative-id":["693"],"URL":"https:\/\/doi.org\/10.1186\/s13321-023-00693-0","relation":{"has-preprint":[{"id-type":"doi","id":"10.26434\/chemrxiv-2022-tqv76","asserted-by":"object"},{"id-type":"doi","id":"10.26434\/chemrxiv-2022-tqv76-v2","asserted-by":"object"}]},"ISSN":["1758-2946"],"issn-type":[{"value":"1758-2946","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,2,23]]},"assertion":[{"value":"3 December 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"4 February 
2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"23 February 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"26 July 2023","order":4,"name":"change_date","label":"Change Date","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"Correction","order":5,"name":"change_type","label":"Change Type","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"A Correction to this paper has been published:","order":6,"name":"change_details","label":"Change Details","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"https:\/\/doi.org\/10.1186\/s13321-023-00739-3","URL":"https:\/\/doi.org\/10.1186\/s13321-023-00739-3","order":7,"name":"change_details","label":"Change Details","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"26"}}