{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,16]],"date-time":"2026-06-16T17:49:13Z","timestamp":1781632153666,"version":"3.54.5"},"reference-count":105,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2021,5,13]],"date-time":"2021-05-13T00:00:00Z","timestamp":1620864000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,5,13]],"date-time":"2021-05-13T00:00:00Z","timestamp":1620864000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Cheminform"],"published-print":{"date-parts":[[2021,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>\n                    Deep generative models have shown the ability to devise both valid and novel chemistry, which could significantly accelerate the identification of bioactive compounds. Many current models, however, use molecular descriptors or ligand-based predictive methods to guide molecule generation towards a desirable property space. This restricts their application to relatively data-rich targets, neglecting those where little data is available to sufficiently train a predictor. Moreover, ligand-based approaches often bias molecule generation towards previously established chemical space, thereby limiting their ability to identify truly novel chemotypes. In this work, we assess the ability of using molecular docking via Glide\u2014a structure-based approach\u2014as a scoring function to guide the deep generative model REINVENT and compare model performance and behaviour to a ligand-based scoring function. Additionally, we modify the previously published MOSES benchmarking dataset to remove any induced bias towards non-protonatable groups. 
We also propose a new metric to measure dataset diversity, which is less confounded by the distribution of heavy atom count than the commonly used\n                    <jats:italic>internal diversity<\/jats:italic>\n                    metric. With respect to the main findings, we found that when optimizing the docking score against DRD2, the model improves predicted ligand affinity beyond that of known DRD2 active molecules. In addition, generated molecules occupy complementary chemical and physicochemical space compared to the ligand-based approach, and novel physicochemical space compared to known DRD2 active molecules. Furthermore, the structure-based approach learns to generate molecules that satisfy crucial residue interactions, which is information only available when taking protein structure into account. Overall, this work demonstrates the advantage of using molecular docking to guide de novo molecule generation over ligand-based predictors with respect to predicted affinity, novelty, and the ability to identify key interactions between ligand and protein target. 
Practically, this approach has applications in early hit generation campaigns to enrich a virtual library towards a particular target, and also in novelty-focused projects, where de novo molecule generation either has no prior ligand knowledge available or should not be biased by it.\n                  <\/jats:p>","DOI":"10.1186\/s13321-021-00516-0","type":"journal-article","created":{"date-parts":[[2021,5,13]],"date-time":"2021-05-13T05:02:58Z","timestamp":1620882178000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":58,"title":["Comparison of structure- and ligand-based scoring functions for deep generative models: a GPCR case study"],"prefix":"10.1186","volume":"13","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1610-3499","authenticated-orcid":false,"given":"Morgan","family":"Thomas","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Robert T.","family":"Smith","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Noel M.","family":"O\u2019Boyle","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Chris","family":"de Graaf","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Andreas","family":"Bender","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2021,5,13]]},"reference":[{"key":"516_CR1","doi-asserted-by":"publisher","first-page":"806","DOI":"10.1016\/j.tips.2019.09.004","volume":"40","author":"H Chen","year":"2019","unstructured":"Chen H, Engkvist O (2019) Has drug design augmented by artificial intelligence become a reality? 
Trends Pharmacol Sci 40:806\u2013809","journal-title":"Trends Pharmacol Sci"},{"key":"516_CR2","doi-asserted-by":"publisher","first-page":"1038","DOI":"10.1038\/s41587-019-0224-x","volume":"37","author":"A Zhavoronkov","year":"2019","unstructured":"Zhavoronkov A, Ivanenkov YA, Aliper A, Veselov MS, Aladinskiy VA, Aladinskaya AV et al (2019) Deep learning enables rapid identification of potent DDR1 kinase inhibitors. Nat Biotechnol 37:1038\u20131040","journal-title":"Nat Biotechnol"},{"key":"516_CR3","doi-asserted-by":"publisher","first-page":"688","DOI":"10.1016\/j.cell.2020.01.021","volume":"180","author":"JM Stokes","year":"2020","unstructured":"Stokes JM, Yang K, Swanson K, Jin W, Cubillos-Ruiz A, Donghia NM et al (2020) A deep learning approach to antibiotic discovery. Cell 180:688\u2013702","journal-title":"Cell"},{"key":"516_CR4","doi-asserted-by":"publisher","first-page":"1931","DOI":"10.3389\/fphar.2020.565644","volume":"11","author":"D Polykovskiy","year":"2020","unstructured":"Polykovskiy D, Zhebrak A, Sanchez-Lengeling B, Golovanov S, Tatanov O, Belyaev S et al (2020) Molecular sets (MOSES): A benchmarking platform for molecular generation models. Front Pharmacol 11:1931","journal-title":"Front Pharmacol"},{"key":"516_CR5","doi-asserted-by":"publisher","first-page":"828","DOI":"10.1039\/C9ME00039A","volume":"4","author":"DC Elton","year":"2019","unstructured":"Elton DC, Boukouvalas Z, Fuge MD, Chung PW (2019) Deep learning for molecular design\u2014a review of the state of the art. Mol Syst Des Eng 4:828\u2013849","journal-title":"Mol Syst Des Eng"},{"key":"516_CR6","doi-asserted-by":"publisher","first-page":"120","DOI":"10.1021\/acscentsci.7b00512","volume":"4","author":"MHS Segler","year":"2018","unstructured":"Segler MHS, Kogej T, Tyrchan C, Waller MP (2018) Generating focused molecule libraries for drug discovery with recurrent neural networks. 
ACS Cent Sci 4:120\u2013131","journal-title":"ACS Cent Sci"},{"key":"516_CR7","doi-asserted-by":"publisher","first-page":"48","DOI":"10.1186\/s13321-017-0235-x","volume":"9","author":"M Olivecrona","year":"2017","unstructured":"Olivecrona M, Blaschke T, Engkvist O, Chen H (2017) Molecular de-novo design through deep reinforcement learning. J Cheminform 9:48","journal-title":"J Cheminform"},{"key":"516_CR8","doi-asserted-by":"publisher","first-page":"eaap7885","DOI":"10.1126\/sciadv.aap7885","volume":"4","author":"M Popova","year":"2018","unstructured":"Popova M, Isayev O, Tropsha A (2018) Deep reinforcement learning for de novo drug design. Sci Adv. 4:eaap7885","journal-title":"Sci Adv."},{"key":"516_CR9","doi-asserted-by":"publisher","first-page":"31","DOI":"10.1021\/ci00057a005","volume":"28","author":"D Weininger","year":"1988","unstructured":"Weininger D (1988) SMILES, a chemical language and information system: 1: introduction to methodology and encoding rules. J Chem Inf Comput Sci 28:31\u201336","journal-title":"J Chem Inf Comput Sci"},{"key":"516_CR10","doi-asserted-by":"publisher","first-page":"268","DOI":"10.1021\/acscentsci.7b00572","volume":"4","author":"R G\u00f3mez-Bombarelli","year":"2018","unstructured":"G\u00f3mez-Bombarelli R, Wei JN, Duvenaud D, Hern\u00e1ndez-Lobato JM, S\u00e1nchez-Lengeling B, Sheberla D et al (2018) Automatic chemical design using a data-driven continuous representation of molecules. ACS Cent Sci 4:268\u2013276","journal-title":"ACS Cent Sci"},{"key":"516_CR11","unstructured":"Kusner MJ, Paige B, Hern\u00e1ndez-Lobato JM. Grammar variational autoencoder. arXiv:1703.01925 [stat.ML]"},{"key":"516_CR12","unstructured":"Jin W, Barzilay R, Jaakkola T. Junction tree variational autoencoder for molecular graph generation. arXiv:1802.04364 [cs.LG]"},{"key":"516_CR13","doi-asserted-by":"crossref","unstructured":"Sanchez-Lengeling B, Outeiral C, Guimaraes GL, Aspuru-Guzik A. Optimizing distributions over molecular space. 
An Objective-Reinforced Generative Adversarial Network for Inverse-design Chemistry (ORGANIC). ChemRxiv. 2017","DOI":"10.26434\/chemrxiv.5309668.v2"},{"key":"516_CR14","unstructured":"De Cao N, Kipf T. MolGAN: an implicit generative model for small molecular graphs. arXiv:1805.11973 [stat.ML]"},{"key":"516_CR15","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/s41467-019-13807-w","volume":"11","author":"O M\u00e9ndez-Lucio","year":"2020","unstructured":"M\u00e9ndez-Lucio O, Baillif B, Clevert DA, Rouqui\u00e9 D, Wichard J (2020) De novo generation of hit-like molecules from gene expression signatures using artificial intelligence. Nat Commun 11:1\u201310","journal-title":"Nat Commun"},{"key":"516_CR16","unstructured":"You J, Liu B, Ying R, Pande V, Leskovec J. Graph convolutional policy network for goal-directed molecular graph generation. arXiv:1806.02473 [cs.LG]"},{"key":"516_CR17","doi-asserted-by":"publisher","first-page":"10752","DOI":"10.1038\/s41598-019-47148-x","volume":"9","author":"Z Zhou","year":"2019","unstructured":"Zhou Z, Kearnes S, Li L, Zare RN, Riley P (2019) Optimization of molecules via deep reinforcement learning. Sci Rep 9:10752","journal-title":"Sci Rep"},{"key":"516_CR18","doi-asserted-by":"publisher","first-page":"3166","DOI":"10.1021\/acs.jcim.9b00325","volume":"59","author":"N St\u00e5hl","year":"2019","unstructured":"St\u00e5hl N, Falkman G, Karlsson A, Mathiason G, Bostr\u00f6m J (2019) Deep reinforcement learning for multiparameter optimization in de novo drug design. J Chem Inf Model 59:3166\u20133176","journal-title":"J Chem Inf Model"},{"key":"516_CR19","doi-asserted-by":"publisher","first-page":"74","DOI":"10.1186\/s13321-019-0397-9","volume":"11","author":"O Prykhodko","year":"2019","unstructured":"Prykhodko O, Johansson SV, Kotsias PC, Ar\u00fas-Pous J, Bjerrum EJ, Engkvist O et al (2019) A de novo molecular generation method using latent vector based generative adversarial network. 
J Cheminform 11:74","journal-title":"J Cheminform"},{"key":"516_CR20","unstructured":"Gottipati SK, Sattarov B, Niu S, Pathak Y, Wei H, Liu S, et al. Learning to navigate the synthetically accessible chemical space using reinforcement learning. arXiv:2004.12485 [cs.LG]"},{"key":"516_CR21","doi-asserted-by":"publisher","first-page":"32984","DOI":"10.1021\/acsomega.0c04153","volume":"5","author":"J Horwood","year":"2020","unstructured":"Horwood J, Noutahi E (2020) Molecular Design in Synthetically Accessible Chemical Space via Deep Reinforcement Learning. ACS Omega 5:32984\u201332994","journal-title":"ACS Omega"},{"key":"516_CR22","unstructured":"Jin W, Yang K, Barzilay R, Jaakkola T. Learning multimodal graph-to-graph translation for molecular optimization. arXiv:1812.01070 [cs.LG]"},{"key":"516_CR23","doi-asserted-by":"publisher","first-page":"8016","DOI":"10.1039\/C9SC01928F","volume":"10","author":"R Winter","year":"2019","unstructured":"Winter R, Montanari F, Steffen A, Briem H, No\u00e9 F, Clevert DA (2019) Efficient multi-objective molecular optimization in a continuous latent space. Chem Sci 10:8016\u20138024","journal-title":"Chem Sci"},{"key":"516_CR24","doi-asserted-by":"publisher","first-page":"147","DOI":"10.1007\/s10822-007-9150-y","volume":"22","author":"AE Cleves","year":"2008","unstructured":"Cleves AE, Jain AN (2008) Effects of inductive bias on computational evaluations of ligand-based modeling and on drug discovery. J Comput Aided Mol Des 22:147\u2013159","journal-title":"J Comput Aided Mol Des"},{"key":"516_CR25","doi-asserted-by":"publisher","first-page":"916","DOI":"10.1021\/acs.jcim.7b00403","volume":"58","author":"I Wallach","year":"2018","unstructured":"Wallach I, Heifets A (2018) Most ligand-based classification benchmarks reward memorization rather than generalization. 
J Chem Inf Model 58:916\u2013932","journal-title":"J Chem Inf Model"},{"key":"516_CR26","doi-asserted-by":"publisher","first-page":"1912","DOI":"10.1021\/ci049782w","volume":"44","author":"RP Sheridan","year":"2004","unstructured":"Sheridan RP, Feuston BP, Maiorov VN, Kearsley SK (2004) Similarity to molecules in the training set is a good discriminator for prediction accuracy in QSAR. J Chem Inf Comput Sci 44:1912\u20131928","journal-title":"J Chem Inf Comput Sci"},{"key":"516_CR27","doi-asserted-by":"publisher","first-page":"55","DOI":"10.1016\/j.ddtec.2020.09.003","volume":"32\u201333","author":"P Renz","year":"2019","unstructured":"Renz P, Van Rompaey D, Wegner JK, Hochreiter S, Klambauer G (2019) On failure modes in molecule generation and optimization. Drug Discov Today Technol 32\u201333:55\u201363","journal-title":"Drug Discov Today Technol"},{"key":"516_CR28","doi-asserted-by":"publisher","first-page":"5699","DOI":"10.1021\/acs.jcim.0c00343","volume":"60","author":"S Amabilino","year":"2020","unstructured":"Amabilino S, Pog\u00e1ny P, Pickett SD, Green DVS (2020) Guidelines for recurrent neural network transfer learning-based molecular generation of focused libraries. J Chem Inf Model. 60:5699","journal-title":"J Chem Inf Model."},{"key":"516_CR29","doi-asserted-by":"publisher","first-page":"68","DOI":"10.1186\/s13321-020-00473-0","volume":"12","author":"T Blaschke","year":"2020","unstructured":"Blaschke T, Engkvist O, Bajorath J, Chen H (2020) Memory-assisted reinforcement learning for diverse molecular de novo design. J Cheminform 12:68","journal-title":"J Cheminform"},{"key":"516_CR30","doi-asserted-by":"publisher","first-page":"143","DOI":"10.1038\/s41587-020-0418-2","volume":"38","author":"WP Walters","year":"2020","unstructured":"Walters WP, Murcko M (2020) Assessing the impact of generative AI on medicinal chemistry. 
Nat Biotechnol 38:143\u2013145","journal-title":"Nat Biotechnol"},{"key":"516_CR31","doi-asserted-by":"publisher","first-page":"935","DOI":"10.1038\/nrd1549","volume":"3","author":"DB Kitchen","year":"2004","unstructured":"Kitchen DB, Decornez H, Furr JR, Bajorath J (2004) Docking and scoring in virtual screening for drug discovery: methods and applications. Nat Rev Drug Discov 3:935\u2013949","journal-title":"Nat Rev Drug Discov"},{"key":"516_CR32","doi-asserted-by":"publisher","first-page":"1739","DOI":"10.1021\/jm0306430","volume":"47","author":"RA Friesner","year":"2004","unstructured":"Friesner RA, Banks JL, Murphy RB, Halgren TA, Klicic JJ, Mainz DT et al (2004) Glide: a new approach for rapid, accurate docking and scoring 1. Method and assessment of docking accuracy. J Med Chem. 47:1739\u201349","journal-title":"J Med Chem."},{"key":"516_CR33","doi-asserted-by":"publisher","first-page":"727","DOI":"10.1006\/jmbi.1996.0897","volume":"267","author":"G Jones","year":"1997","unstructured":"Jones G, Willett P, Glen RC, Leach AR, Taylor R (1997) Development and validation of a genetic algorithm for flexible docking. J Mol Biol 267:727\u2013748","journal-title":"J Mol Biol"},{"key":"516_CR34","doi-asserted-by":"crossref","first-page":"455","DOI":"10.1002\/jcc.21334","volume":"31","author":"O Trott","year":"2009","unstructured":"Trott O, Olson AJ (2009) AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J Comput Chem 31:455\u2013461","journal-title":"J Comput Chem"},{"key":"516_CR35","doi-asserted-by":"publisher","first-page":"895","DOI":"10.1021\/acs.jcim.8b00545","volume":"59","author":"M Su","year":"2019","unstructured":"Su M, Yang Q, Du Y, Feng G, Liu Z, Li Y et al (2019) Comparative assessment of scoring functions: the CASF-2016 update. 
J Chem Inf Model 59:895\u2013913","journal-title":"J Chem Inf Model"},{"key":"516_CR36","doi-asserted-by":"publisher","first-page":"161","DOI":"10.1007\/s10822-007-9165-4","volume":"22","author":"IJ Enyedy","year":"2008","unstructured":"Enyedy IJ, Egan WJ (2008) Can we use docking and scoring for hit-to-lead optimization? J Comput Aided Mol Des 22:161\u2013168","journal-title":"J Comput Aided Mol Des"},{"key":"516_CR37","doi-asserted-by":"publisher","first-page":"6582","DOI":"10.1021\/jm300687e","volume":"55","author":"MM Mysinger","year":"2012","unstructured":"Mysinger MM, Carchia M, Irwin JJ, Shoichet BK (2012) Directory of useful decoys, enhanced (DUD-E): better ligands and decoys for better benchmarking. J Med Chem 55:6582\u20136594","journal-title":"J Med Chem"},{"key":"516_CR38","doi-asserted-by":"publisher","first-page":"81","DOI":"10.1002\/jcc.21601","volume":"32","author":"A Bordogna","year":"2011","unstructured":"Bordogna A, Pandini A, Bonati L (2011) Predicting the accuracy of protein-ligand docking on homology models. J Comput Chem 32:81\u201398","journal-title":"J Comput Chem"},{"key":"516_CR39","doi-asserted-by":"publisher","first-page":"77","DOI":"10.1016\/j.ymeth.2014.08.017","volume":"71","author":"H Du","year":"2015","unstructured":"Du H, Brender JR, Zhang J, Zhang Y (2015) Protein structure prediction provides comparable performance to crystallographic structures in docking-based virtual screening. Methods. 71:77\u201384","journal-title":"Methods."},{"key":"516_CR40","doi-asserted-by":"publisher","first-page":"235","DOI":"10.1093\/nar\/28.1.235","volume":"28","author":"HM Berman","year":"2000","unstructured":"Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H et al (2000) The protein data bank. 
Nucleic Acids Res 28:235\u2013242","journal-title":"Nucleic Acids Res"},{"key":"516_CR41","doi-asserted-by":"publisher","first-page":"203","DOI":"10.1038\/d41586-020-03348-4","volume":"588","author":"E Callaway","year":"2020","unstructured":"Callaway E (2020) \u201cIt will change everything\u201d: DeepMind\u2019s AI makes gigantic leap in solving protein structures. Nature 588:203\u2013204","journal-title":"Nature"},{"key":"516_CR42","doi-asserted-by":"crossref","unstructured":"Zhang J, Mercado R, Engkvist O, Chen H. Comparative study of deep generative models on chemical space coverage. ChemRxiv. 2020","DOI":"10.26434\/chemrxiv.13234289.v1"},{"key":"516_CR43","doi-asserted-by":"publisher","first-page":"1893","DOI":"10.1021\/ci300604z","volume":"53","author":"DR Koes","year":"2013","unstructured":"Koes DR, Baumgartner MP, Camacho CJ (2013) Lessons learned in empirical scoring with smina from the CSAR 2011 benchmarking exercise. J Chem Inf Model 53:1893\u20131904","journal-title":"J Chem Inf Model"},{"key":"516_CR44","doi-asserted-by":"publisher","first-page":"254","DOI":"10.1038\/s42256-020-0174-5","volume":"2","author":"P-C Kotsias","year":"2020","unstructured":"Kotsias P-C, Ar\u00fas-Pous J, Chen H, Engkvist O, Tyrchan C, Bjerrum EJ (2020) Direct steering of de novo molecular generation with descriptor conditional recurrent neural networks. Nat Mach Intell 2:254\u2013265","journal-title":"Nat Mach Intell"},{"key":"516_CR45","doi-asserted-by":"publisher","first-page":"269","DOI":"10.1038\/nature25758","volume":"555","author":"S Wang","year":"2018","unstructured":"Wang S, Che T, Levit A, Shoichet BK, Wacker D, Roth BL (2018) Structure of the D2 dopamine receptor bound to the atypical antipsychotic drug risperidone. 
Nature 555:269\u2013273","journal-title":"Nature"},{"key":"516_CR46","doi-asserted-by":"publisher","first-page":"829","DOI":"10.1038\/nrd.2017.178","volume":"16","author":"AS Hauser","year":"2017","unstructured":"Hauser AS, Attwood MM, Rask-Andersen M, Schi\u00f6th HB, Gloriam DE (2017) Trends in GPCR drug discovery: new agents, targets and indications. Nat Rev Drug Discov 16:829\u2013842","journal-title":"Nat Rev Drug Discov"},{"key":"516_CR47","doi-asserted-by":"publisher","first-page":"81","DOI":"10.1016\/j.cell.2020.03.003","volume":"181","author":"M Congreve","year":"2020","unstructured":"Congreve M, de Graaf C, Swain NA, Tate CG (2020) Impact of GPCR structures on drug discovery. Cell 181:81\u201391","journal-title":"Cell"},{"key":"516_CR48","doi-asserted-by":"publisher","first-page":"4311","DOI":"10.1021\/acs.jcim.0c00120","volume":"60","author":"P Ghanakota","year":"2020","unstructured":"Ghanakota P, Bos PH, Konze KD, Staker J, Marques G, Marshall K et al (2020) Combining cloud-based free-energy calculations, synthetically aware enumerations, and goal-directed generative machine learning for rapid large-scale chemical exploration and optimization. J Chem Inf Model 60:4311\u20134325","journal-title":"J Chem Inf Model"},{"key":"516_CR49","doi-asserted-by":"publisher","first-page":"1825","DOI":"10.4155\/fmc-2016-0093","volume":"8","author":"SL Dixon","year":"2016","unstructured":"Dixon SL, Duan J, Smith E, Von Bargen CD, Sherman W, Repasky MP (2016) AutoQSAR: an automated machine learning tool for best-practice quantitative structure-activity relationship modeling. Future Med Chem 8:1825\u20131839","journal-title":"Future Med Chem"},{"key":"516_CR50","doi-asserted-by":"publisher","first-page":"42","DOI":"10.1186\/s13321-020-00446-3","volume":"12","author":"X Li","year":"2020","unstructured":"Li X, Xu Y, Yao H, Lin K (2020) Chemical space exploration based on recurrent neural networks: applications in discovering kinase inhibitors. 
J Cheminform 12:42","journal-title":"J Cheminform"},{"key":"516_CR51","doi-asserted-by":"crossref","unstructured":"Xu Z, Wauchope OR, Frank AT. Navigating chemical space by interfacing generative artificial intelligence and molecular docking. bioRxiv. 2020","DOI":"10.1101\/2020.06.09.143289"},{"key":"516_CR52","unstructured":"Cieplinski T, Danel T, Podlewska S, Jastrz\u0119bski S. We should at least be able to design molecules that dock well. arXiv:2006.16955 [q-bio.BM]"},{"key":"516_CR53","unstructured":"Kusner MJ, Paige B, Hern\u00e1ndez-Lobato JM. Grammar variational autoencoder. arXiv:1703.01925 [stat.ML]"},{"key":"516_CR54","unstructured":"Cieplinski T. smina-docking-benchmark. GitHub. https:\/\/github.com\/cieplinski-tobiasz\/smina-docking-benchmark. Accessed 23 Nov 2020"},{"key":"516_CR55","first-page":"1062","volume":"55","author":"J Boitreaud","year":"2020","unstructured":"Boitreaud J, Mallet V, Oliver C, Waldisp\u00fchl J (2020) OptiMol: optimization of binding affinities in chemical space for drug discovery. J Chem Inf Model 55:1062","journal-title":"J Chem Inf Model"},{"key":"516_CR56","doi-asserted-by":"publisher","DOI":"10.1088\/2632-2153\/aba947","volume":"1","author":"M Krenn","year":"2020","unstructured":"Krenn M, H\u00e4se F, Nigam A, Friederich P, Aspuru-Guzik A (2020) Self-Referencing Embedded Strings (SELFIES): a 100% robust molecular string representation. Mach Learn Sci Technol 1:045024","journal-title":"Mach Learn Sci Technol"},{"key":"516_CR57","doi-asserted-by":"publisher","first-page":"20","DOI":"10.1186\/s13321-019-0341-z","volume":"11","author":"J Ar\u00fas-Pous","year":"2019","unstructured":"Ar\u00fas-Pous J, Blaschke T, Ulander S, Reymond J-L, Chen H, Engkvist O (2019) Exploring the GDB-13 chemical space using deep generative models. 
J Cheminform 11:20","journal-title":"J Cheminform"},{"key":"516_CR58","doi-asserted-by":"publisher","first-page":"17","DOI":"10.1186\/s13321-017-0203-5","volume":"9","author":"J Sun","year":"2017","unstructured":"Sun J, Jeliazkova N, Chupakhin V, Golib-Dzib J-F, Engkvist O, Carlsson L et al (2017) ExCAPE-DB: an integrated large scale dataset facilitating Big Data analysis in chemogenomics. J Cheminform 9:17","journal-title":"J Cheminform"},{"key":"516_CR59","doi-asserted-by":"publisher","first-page":"2324","DOI":"10.1021\/acs.jcim.5b00559","volume":"55","author":"T Sterling","year":"2015","unstructured":"Sterling T, Irwin JJ (2015) ZINC 15-ligand discovery for everyone. J Chem Inf Model 55:2324\u20132337","journal-title":"J Chem Inf Model"},{"key":"516_CR60","doi-asserted-by":"publisher","first-page":"615","DOI":"10.1021\/ci960169p","volume":"37","author":"R Wang","year":"1997","unstructured":"Wang R, Fu Y, Lai L (1997) A new atom-additive method for calculating partition coefficients. J Chem Inf Comput Sci 37:615\u2013621","journal-title":"J Chem Inf Comput Sci"},{"key":"516_CR61","doi-asserted-by":"publisher","first-page":"91","DOI":"10.1517\/17425255.1.1.91","volume":"1","author":"AS Kalgutkar","year":"2005","unstructured":"Kalgutkar AS, Soglia JR (2005) Minimising the potential for metabolic activation in drug discovery. Expert Opin Drug Metab Toxicol 1:91\u2013142","journal-title":"Expert Opin Drug Metab Toxicol"},{"key":"516_CR62","doi-asserted-by":"publisher","first-page":"161","DOI":"10.2174\/1389200054021799","volume":"6","author":"A Kalgutkar","year":"2005","unstructured":"Kalgutkar A, Gardner I, Obach R, Shaffer C, Callegari E, Henne K et al (2005) A comprehensive listing of bioactivation pathways of organic functional groups. 
Curr Drug Metab 6:161\u2013225","journal-title":"Curr Drug Metab"},{"key":"516_CR63","doi-asserted-by":"publisher","first-page":"2719","DOI":"10.1021\/jm901137j","volume":"53","author":"JB Baell","year":"2010","unstructured":"Baell JB, Holloway GA (2010) New substructure filters for removal of pan assay interference compounds (PAINS) from screening libraries and for their exclusion in bioassays. J Med Chem 53:2719\u20132740","journal-title":"J Med Chem"},{"key":"516_CR64","unstructured":"RDKit. Open-source cheminformatics. http:\/\/www.rdkit.org"},{"key":"516_CR65","unstructured":"O\u2019Boyle NM. No charge - A simple approach to neutralising charged molecules. Noel O\u2019Blog. 2019. https:\/\/baoilleach.blogspot.com\/2019\/12\/no-charge-simple-approach-to.html. Accessed 7 Feb 2021"},{"key":"516_CR66","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/srep28288","volume":"6","author":"AJ Kooistra","year":"2016","unstructured":"Kooistra AJ, Vischer HF, McNaught-Flores D, Leurs R, De Esch IJP, De Graaf C (2016) Function-specific virtual screening for GPCR ligands using a combined scoring method. Sci Rep 6:1\u201321","journal-title":"Sci Rep"},{"key":"516_CR67","doi-asserted-by":"publisher","first-page":"59","DOI":"10.1016\/j.coph.2016.07.007","volume":"30","author":"M Vass","year":"2016","unstructured":"Vass M, Kooistra AJ, Ritschel T, Leurs R, De Esch JI, De Graaf C (2016) Molecular interaction fingerprint approaches for GPCR drug discovery. Curr Opin Pharmacol. 30:59\u201368","journal-title":"Curr Opin Pharmacol."},{"key":"516_CR68","doi-asserted-by":"publisher","first-page":"D930","DOI":"10.1093\/nar\/gky1075","volume":"47","author":"D Mendez","year":"2019","unstructured":"Mendez D, Gaulton A, Bento AP, Chambers J, De Veij M, Magari\u00f1os MP et al (2019) ChEMBL: towards direct deposition of bioassay data. Nucleic Acids Res. 
47:D930","journal-title":"Nucleic Acids Res."},{"key":"516_CR69","first-page":"D1102","volume":"47","author":"S Kim","year":"2019","unstructured":"Kim S, Chen J, Cheng T, Gindulyte A, He J, He S et al (2019) PubChem 2019 update: improved access to chemical data. Nucleic Acids Res 47:D1102\u2013D1109","journal-title":"Nucleic Acids Res"},{"key":"516_CR70","unstructured":"Kingma DP, Ba JL. Adam: a method for stochastic optimization. In: 3rd International Conference on Learning Representations, ICLR 2015 - Conference Track Proceedings. International Conference on Learning Representations, ICLR; 2015"},{"key":"516_CR71","doi-asserted-by":"publisher","first-page":"221","DOI":"10.1007\/s10822-013-9644-8","volume":"27","author":"G Madhavi Sastry","year":"2013","unstructured":"Madhavi Sastry G, Adzhigirey M, Day T, Annabhimoju R, Sherman W (2013) Protein and ligand preparation: parameters, protocols, and influence on virtual screening enrichments. J Comput Aided Mol Des 27:221\u2013234","journal-title":"J Comput Aided Mol Des"},{"key":"516_CR72","doi-asserted-by":"publisher","first-page":"681","DOI":"10.1007\/s10822-007-9133-z","volume":"21","author":"JC Shelley","year":"2007","unstructured":"Shelley JC, Cholleti A, Frye LL, Greenwood JR, Timlin MR, Uchimaya M (2007) Epik: a software program for pKa prediction and protonation state generation for drug-like molecules. J Comput Aided Mol Des 21:681\u2013691","journal-title":"J Comput Aided Mol Des"},{"key":"516_CR73","doi-asserted-by":"publisher","first-page":"2284","DOI":"10.1021\/ct200133y","volume":"7","author":"CR Sondergaard","year":"2011","unstructured":"Sondergaard CR, Olsson MHM, Rostkowski M, Jensen JH (2011) Improved treatment of ligands and coupling effects in empirical calculation and rationalization of pKa values. 
J Chem Theory Comput 7:2284\u20132295","journal-title":"J Chem Theory Comput"},{"key":"516_CR74","doi-asserted-by":"publisher","first-page":"1863","DOI":"10.1021\/acs.jctc.8b01026","volume":"15","author":"K Roos","year":"2019","unstructured":"Roos K, Wu C, Damm W, Reboul M, Stevenson JM, Lu C et al (2019) OPLS3e: extending force field coverage for drug-like small molecules. J Chem Theory Comput 15:1863\u20131874","journal-title":"J Chem Theory Comput"},{"key":"516_CR75","unstructured":"Schr\u00f6dinger Release 2019\u20134. LigPrep"},{"key":"516_CR76","unstructured":"Dask Development Team. Dask: Library for dynamic task scheduling. 2016. https:\/\/dask.org"},{"key":"516_CR77","unstructured":"Bender A. How to Lie With Computational Predictive Models in Drug Discovery. DrugDiscovery.NET - AI in Drug Discovery. 2020. http:\/\/www.drugdiscovery.net\/2020\/10\/13\/how-to-lie-with-computational-predictive-models-in-drug-discovery\/. Accessed 19 Nov 2020"},{"key":"516_CR78","doi-asserted-by":"publisher","first-page":"1096","DOI":"10.1021\/acs.jcim.8b00839","volume":"59","author":"N Brown","year":"2019","unstructured":"Brown N, Fiscato M, Segler MHS, Vaucher AC (2019) GuacaMol: benchmarking models for de novo molecular design. J Chem Inf Model 59:1096\u20131108","journal-title":"J Chem Inf Model"},{"key":"516_CR79","doi-asserted-by":"publisher","first-page":"317","DOI":"10.1021\/ci025554v","volume":"43","author":"A Gobbi","year":"2003","unstructured":"Gobbi A, Lee ML (2003) DISE: directed sphere exclusion. J Chem Inf Comput Sci 43:317\u2013323","journal-title":"J Chem Inf Comput Sci"},{"key":"516_CR80","unstructured":"Sayle RA. 2d similarity, diversity and clustering in rdkit. In: RDKit UGM. 
2019"},{"key":"516_CR81","doi-asserted-by":"publisher","first-page":"2887","DOI":"10.1021\/jm980708c","volume":"42","author":"SL Dixon","year":"1999","unstructured":"Dixon SL, Koehler RT (1999) The hidden component of size in two-dimensional fragment descriptors: side effects on sampling in bioactive libraries. J Med Chem 42:2887\u20132900","journal-title":"J Med Chem"},{"key":"516_CR82","doi-asserted-by":"publisher","unstructured":"CHEMBL database release 28. 2021. https:\/\/doi.org\/10.6019\/CHEMBL.database.28","DOI":"10.6019\/CHEMBL.database.28"},{"key":"516_CR83","doi-asserted-by":"publisher","first-page":"2864","DOI":"10.1021\/ci300415d","volume":"52","author":"L Ruddigkeit","year":"2012","unstructured":"Ruddigkeit L, Van Deursen R, Blum LC, Reymond JL (2012) Enumeration of 166 billion organic small molecules in the chemical universe database GDB-17. J Chem Inf Model 52:2864\u20132875","journal-title":"J Chem Inf Model"},{"key":"516_CR84","doi-asserted-by":"publisher","first-page":"8732","DOI":"10.1021\/ja902302h","volume":"131","author":"LC Blum","year":"2009","unstructured":"Blum LC, Reymond JL (2009) 970 Million druglike small molecules for virtual screening in the chemical universe database GDB-13. J Am Chem Soc 131:8732\u20138733","journal-title":"J Am Chem Soc"},{"key":"516_CR85","unstructured":"Diversity Libraries - Enamine. https:\/\/enamine.net\/hit-finding\/diversity-libraries. Accessed 1 Mar 2021"},{"key":"516_CR86","unstructured":"Targeted Libraries - Enamine. https:\/\/enamine.net\/hit-finding\/focused-libraries. Accessed 1 Mar 2021"},{"key":"516_CR87","doi-asserted-by":"publisher","first-page":"463","DOI":"10.1038\/nature04710","volume":"440","author":"MC Sanguinetti","year":"2006","unstructured":"Sanguinetti MC, Tristani-Firouzi M (2006) hERG potassium channels and cardiac arrhythmia. 
Nature 440:463\u2013469","journal-title":"Nature"},{"key":"516_CR88","doi-asserted-by":"publisher","first-page":"2887","DOI":"10.1021\/jm9602928","volume":"39","author":"GW Bemis","year":"1996","unstructured":"Bemis GW, Murcko MA (1996) The properties of known drugs. 1. Molecular frameworks. J Med Chem. 39:2887\u201393","journal-title":"J Med Chem."},{"key":"516_CR89","unstructured":"McInnes L, Healy J, Melville J. UMAP: uniform manifold approximation and projection for dimension reduction. arXiv:1802.03426 [stat.ML]"},{"key":"516_CR90","doi-asserted-by":"publisher","first-page":"987","DOI":"10.1021\/ci025599w","volume":"43","author":"WHB Sauer","year":"2003","unstructured":"Sauer WHB, Schwarz MK (2003) Molecular shape diversity of combinatorial libraries: a prerequisite for broad bioactivity. J Chem Inf Comput Sci 43:987\u20131003","journal-title":"J Chem Inf Comput Sci"},{"key":"516_CR91","doi-asserted-by":"publisher","first-page":"90","DOI":"10.1038\/nchem.1243","volume":"4","author":"GR Bickerton","year":"2012","unstructured":"Bickerton GR, Paolini GV, Besnard J, Muresan S, Hopkins AL (2012) Quantifying the chemical beauty of drugs. Nat Chem 4:90\u201398","journal-title":"Nat Chem"},{"key":"516_CR92","doi-asserted-by":"publisher","first-page":"8","DOI":"10.1186\/1758-2946-1-8","volume":"1","author":"P Ertl","year":"2009","unstructured":"Ertl P, Schuffenhauer A (2009) Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions. J Cheminform 1:8","journal-title":"J Cheminform"},{"key":"516_CR93","doi-asserted-by":"publisher","first-page":"2562","DOI":"10.1021\/acs.jcim.5b00654","volume":"55","author":"S Riniker","year":"2015","unstructured":"Riniker S, Landrum GA (2015) Better informed distance geometry: using what we know to improve conformation generation. 
J Chem Inf Model 55:2562\u20132574","journal-title":"J Chem Inf Model"},{"key":"516_CR94","doi-asserted-by":"publisher","first-page":"337","DOI":"10.1021\/jm030331x","volume":"47","author":"Z Deng","year":"2004","unstructured":"Deng Z, Chuaqui C, Singh J (2004) Structural interaction fingerprint (SIFt): a novel method for analyzing three-dimensional protein-ligand binding interactions. J Med Chem 47:337\u2013344","journal-title":"J Med Chem"},{"key":"516_CR95","doi-asserted-by":"publisher","first-page":"1736","DOI":"10.1021\/acs.jcim.8b00234","volume":"58","author":"K Preuer","year":"2018","unstructured":"Preuer K, Renz P, Unterthiner T, Hochreiter S, Klambauer G (2018) Fr\u00e9chet ChemNet distance: a metric for generative models for molecules in drug discovery. J Chem Inf Model 58:1736\u20131741","journal-title":"J Chem Inf Model"},{"key":"516_CR96","unstructured":"Benhenda M. ChemGAN challenge for drug discovery: can AI reproduce natural chemical diversity? arXiv:1708.08227 [stat.ML]"},{"key":"516_CR97","doi-asserted-by":"publisher","first-page":"3450","DOI":"10.1021\/jm500126s","volume":"57","author":"J Xiao","year":"2014","unstructured":"Xiao J, Free RB, Barnaeva E, Conroy JL, Doyle T, Miller B et al (2014) Discovery, optimization, and characterization of novel D2 dopamine receptor selective antagonists. J Med Chem 57:3450\u20133463","journal-title":"J Med Chem"},{"key":"516_CR98","doi-asserted-by":"publisher","first-page":"2174","DOI":"10.1016\/j.drudis.2020.09.027","volume":"25","author":"A Tomberg","year":"2020","unstructured":"Tomberg A, Bostr\u00f6m J (2020) Can \u2018easy\u2019 chemistry produce complex, diverse, and novel molecules? 
Drug Discov Today 25:2174\u20132181","journal-title":"Drug Discov Today"},{"key":"516_CR99","doi-asserted-by":"publisher","first-page":"483","DOI":"10.1021\/acs.jcim.5b00018","volume":"55","author":"C Kramer","year":"2015","unstructured":"Kramer C, Fuchs JE, Liedl KR (2015) Strong nonadditivity as a key structure-activity relationship feature: distinguishing structural changes from assay artifacts. J Chem Inf Model 55:483\u2013494","journal-title":"J Chem Inf Model"},{"key":"516_CR100","doi-asserted-by":"publisher","first-page":"5714","DOI":"10.1021\/acs.jcim.0c00174","volume":"60","author":"W Gao","year":"2020","unstructured":"Gao W, Coley CW (2020) The synthesizability of molecules proposed by generative models. J Chem Inf Model. 60:5714","journal-title":"J Chem Inf Model."},{"key":"516_CR101","doi-asserted-by":"crossref","unstructured":"Steinmann C, Jensen JH. Using a genetic algorithm to find molecules with good docking scores. ChemRxiv. 2021","DOI":"10.26434\/chemrxiv.13525589"},{"key":"516_CR102","unstructured":"Danel T, Szymczak M, Maziarka \u0141, Podolak I, Tabor J, Jastrz\u02db S. De Novo Drug Design with a Docking Score Proxy. In: Machine Learning for Molecules Workshop at NeurIPS 2020. 2020"},{"key":"516_CR103","doi-asserted-by":"publisher","first-page":"267","DOI":"10.1021\/ci020055f","volume":"43","author":"Y Pan","year":"2003","unstructured":"Pan Y, Huang N, Cho S, MacKerell AD (2003) Consideration of molecular weight during compound selection in virtual target-based database screening. J Chem Inf Comput Sci 43:267\u2013272","journal-title":"J Chem Inf Comput Sci"},{"key":"516_CR104","doi-asserted-by":"publisher","first-page":"1564","DOI":"10.1021\/ci600471m","volume":"47","author":"G Carta","year":"2007","unstructured":"Carta G, Knox AJS, Lloyd DG (2007) Unbiasing scoring functions: a new normalization and rescoring strategy. 
J Chem Inf Model 47:1564\u20131571","journal-title":"J Chem Inf Model"},{"key":"516_CR105","doi-asserted-by":"publisher","first-page":"718","DOI":"10.1002\/cmdc.201500599","volume":"11","author":"AA Kaczor","year":"2016","unstructured":"Kaczor AA, Silva AG, Loza MI, Kolb P, Castro M, Poso A (2016) Structure-based virtual screening for dopamine D2 receptor ligands as potential antipsychotics. ChemMedChem 11:718\u2013729","journal-title":"ChemMedChem"}],"container-title":["Journal of Cheminformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-021-00516-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s13321-021-00516-0\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-021-00516-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,3]],"date-time":"2023-11-03T13:52:50Z","timestamp":1699019570000},"score":1,"resource":{"primary":{"URL":"https:\/\/jcheminf.biomedcentral.com\/articles\/10.1186\/s13321-021-00516-0"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,5,13]]},"references-count":105,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2021,12]]}},"alternative-id":["516"],"URL":"https:\/\/doi.org\/10.1186\/s13321-021-00516-0","relation":{"has-preprint":[{"id-type":"doi","id":"10.26434\/chemrxiv.14138147.v1","asserted-by":"object"}]},"ISSN":["1758-2946"],"issn-type":[{"value":"1758-2946","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,5,13]]},"assertion":[{"value":"2 March 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"2 May 
2021","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"13 May 2021","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"39"}}