{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,26]],"date-time":"2026-03-26T06:09:23Z","timestamp":1774505363566,"version":"3.50.1"},"reference-count":42,"publisher":"Springer Science and Business Media LLC","issue":"7","license":[{"start":{"date-parts":[[2024,4,8]],"date-time":"2024-04-08T00:00:00Z","timestamp":1712534400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,4,8]],"date-time":"2024-04-08T00:00:00Z","timestamp":1712534400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100004063","name":"Knut och Alice Wallenbergs Stiftelse","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100004063","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100002835","name":"Chalmers University of Technology","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100002835","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Mach Learn"],"published-print":{"date-parts":[[2024,7]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Deep learning-based approaches for generating novel drug molecules with specific properties have gained a lot of interest in the last few years. Recent studies have demonstrated promising performance for string-based generation of novel molecules utilizing reinforcement learning. In this paper, we develop a unified framework for using reinforcement learning for de novo drug design, wherein we systematically study various on- and off-policy reinforcement learning algorithms and replay buffers to learn an RNN-based policy to generate novel molecules predicted to be active against the dopamine receptor DRD2. Our findings suggest that it is advantageous to use at least both top-scoring and low-scoring molecules for updating the policy when structural diversity is essential. Using all generated molecules at an iteration seems to enhance performance stability for on-policy algorithms. In addition, when replaying high, intermediate, and low-scoring molecules, off-policy algorithms display the potential of improving the structural diversity and number of active molecules generated, but possibly at the cost of a longer exploration phase. Our work provides an open-source framework enabling researchers to investigate various reinforcement learning methods for de novo drug design.<\/jats:p>","DOI":"10.1007\/s10994-024-06519-w","type":"journal-article","created":{"date-parts":[[2024,4,8]],"date-time":"2024-04-08T19:01:26Z","timestamp":1712602886000},"page":"4811-4843","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":24,"title":["Utilizing reinforcement learning for de novo drug design"],"prefix":"10.1007","volume":"113","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0765-1837","authenticated-orcid":false,"given":"Hampus","family":"Gummesson Svensson","sequence":"first","affiliation":[]},{"given":"Christian","family":"Tyrchan","sequence":"additional","affiliation":[]},{"given":"Ola","family":"Engkvist","sequence":"additional","affiliation":[]},{"given":"Morteza","family":"Haghir Chehreghani","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,4,8]]},"reference":[{"issue":"15","key":"6519_CR1","doi-asserted-by":"publisher","first-page":"2887","DOI":"10.1021\/jm9602928","volume":"39","author":"GW Bemis","year":"1996","unstructured":"Bemis, G. W., & Murcko, M. A. (1996). The properties of known drugs. 1. molecular frameworks. Journal of Medicinal Chemistry, 39(15), 2887\u20132893.","journal-title":"Journal of Medicinal Chemistry"},{"issue":"12","key":"6519_CR2","doi-asserted-by":"publisher","first-page":"5918","DOI":"10.1021\/acs.jcim.0c00915","volume":"60","author":"T Blaschke","year":"2020","unstructured":"Blaschke, T., Ar\u00fas-Pous, J., Chen, H., Margreitter, C., Tyrchan, C., Engkvist, O., Papadopoulos, K., & Patronov, A. (2020). Reinvent 2.0: An ai tool for de novo drug design. Journal of Chemical Information and Modeling, 60(12), 5918\u20135922.","journal-title":"Journal of Chemical Information and Modeling"},{"issue":"1","key":"6519_CR3","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13321-020-00473-0","volume":"12","author":"T Blaschke","year":"2020","unstructured":"Blaschke, T., Engkvist, O., Bajorath, J., & Chen, H. (2020). Memory-assisted reinforcement learning for diverse molecular de novo design. Journal of Cheminformatics, 12(1), 1\u201317.","journal-title":"Journal of Cheminformatics"},{"key":"6519_CR4","first-page":"6852","volume":"33","author":"J Bradshaw","year":"2020","unstructured":"Bradshaw, J., Paige, B., Kusner, M. J., Segler, M., & Hern\u00e1ndez-Lobato, J. M. (2020). Barking up the right tree: An approach to search over molecule synthesis dags. Advances in Neural Information Processing Systems, 33, 6852\u20136866.","journal-title":"Advances in Neural Information Processing Systems"},{"issue":"3","key":"6519_CR5","doi-asserted-by":"publisher","first-page":"1096","DOI":"10.1021\/acs.jcim.8b00839","volume":"59","author":"N Brown","year":"2019","unstructured":"Brown, N., Fiscato, M., Segler, M. H., & Vaucher, A. C. (2019). Guacamol: Benchmarking models for de novo molecular design. Journal of Chemical Information and Modeling, 59(3), 1096\u20131108.","journal-title":"Journal of Chemical Information and Modeling"},{"issue":"6","key":"6519_CR6","doi-asserted-by":"publisher","first-page":"1241","DOI":"10.1016\/j.drudis.2018.01.039","volume":"23","author":"H Chen","year":"2018","unstructured":"Chen, H., Engkvist, O., Wang, Y., Olivecrona, M., & Blaschke, T. (2018). The rise of deep learning in drug discovery. Drug Discovery Today, 23(6), 1241\u20131250.","journal-title":"Drug Discovery Today"},{"key":"6519_CR7","unstructured":"Christodoulou, P. (2019). Soft actor-critic for discrete action settings. arXiv preprint arXiv:1910.07207"},{"key":"6519_CR8","unstructured":"Fedus, W., Ramachandran, P., Agarwal, R., Bengio, Y., Larochelle, H., Rowland, M., & Dabney, W. (2020). Revisiting fundamentals of experience replay. In International Conference on Machine Learning, pp. 3061\u20133071. PMLR."},{"key":"6519_CR9","unstructured":"Gao, W., Fu, T., Sun, J., & Coley, C. W. (2022). Sample efficiency matters: a benchmark for practical molecular optimization. arXiv preprint arXiv:2206.12411"},{"issue":"D1","key":"6519_CR10","doi-asserted-by":"publisher","first-page":"1100","DOI":"10.1093\/nar\/gkr777","volume":"40","author":"A Gaulton","year":"2012","unstructured":"Gaulton, A., Bellis, L. J., Bento, A. P., Chambers, J., Davies, M., Hersey, A., Light, Y., McGlinchey, S., Michalovich, D., Al-Lazikani, B., et al. (2012). Chembl: A large-scale bioactivity database for drug discovery. Nucleic Acids Research, 40(D1), 1100\u20131107.","journal-title":"Nucleic Acids Research"},{"issue":"2","key":"6519_CR11","doi-asserted-by":"publisher","first-page":"268","DOI":"10.1021\/acscentsci.7b00572","volume":"4","author":"R G\u00f3mez-Bombarelli","year":"2018","unstructured":"G\u00f3mez-Bombarelli, R., Wei, J. N., Duvenaud, D., Hern\u00e1ndez-Lobato, J. M., S\u00e1nchez-Lengeling, B., Sheberla, D., Aguilera-Iparraguirre, J., Hirzel, T. D., Adams, R. P., & Aspuru-Guzik, A. (2018). Automatic chemical design using a data-driven continuous representation of molecules. ACS Central Science, 4(2), 268\u2013276.","journal-title":"ACS Central Science"},{"key":"6519_CR12","unstructured":"Gottipati, S. K., Sattarov, B., Niu, S., Pathak, Y., Wei, H., Liu, S., Blackburn, S., Thomas, K., Coley, C., Tang, J., et al. (2020). Learning to navigate the synthetically accessible chemical space using reinforcement learning. In International Conference on Machine Learning, pp. 3668\u20133679 . PMLR."},{"key":"6519_CR13","unstructured":"Haarnoja, T., Zhou, A., Hartikainen, K., Tucker, G., Ha, S., Tan, J., Kumar, V., Zhu, H., Gupta, A., Abbeel, P., et al. (2018). Soft actor-critic algorithms and applications. arXiv preprint arXiv:1812.05905"},{"issue":"8","key":"6519_CR14","doi-asserted-by":"publisher","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","volume":"9","author":"S Hochreiter","year":"1997","unstructured":"Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735\u20131780.","journal-title":"Neural Computation"},{"issue":"51","key":"6519_CR15","doi-asserted-by":"publisher","first-page":"32984","DOI":"10.1021\/acsomega.0c04153","volume":"5","author":"J Horwood","year":"2020","unstructured":"Horwood, J., & Noutahi, E. (2020). Molecular design in synthetically accessible chemical space via deep reinforcement learning. ACS Omega, 5(51), 32984\u201332994.","journal-title":"ACS Omega"},{"issue":"9","key":"6519_CR16","doi-asserted-by":"publisher","first-page":"4062","DOI":"10.1021\/acs.jmedchem.5b01746","volume":"59","author":"Y Hu","year":"2016","unstructured":"Hu, Y., Stumpfe, D., & Bajorath, J. (2016). Computational exploration of molecular scaffolds in medicinal chemistry: Miniperspective. Journal of Medicinal Chemistry, 59(9), 4062\u20134076.","journal-title":"Journal of Medicinal Chemistry"},{"key":"6519_CR17","unstructured":"Jin, W., Barzilay, R., & Jaakkola, T. (2018). Junction tree variational autoencoder for molecular graph generation. In International Conference on Machine Learning, pp. 2323\u20132332. PMLR."},{"key":"6519_CR18","unstructured":"Jin, W., Barzilay, R., & Jaakkola, T. (2020). Multi-objective molecule generation using interpretable substructures. In International Conference on Machine Learning, pp. 4849\u20134859. PMLR"},{"key":"6519_CR19","unstructured":"Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980"},{"key":"6519_CR20","unstructured":"Landrum, G. (2006). RDKit: Open-source Cheminformatics. Retrieved from https:\/\/www.rdkit.org\/docs\/Overview.html"},{"key":"6519_CR21","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13321-019-0370-7","volume":"11","author":"L Liang","year":"2019","unstructured":"Liang, L., Ma, C., Du, T., Zhao, Y., Zhao, X., Liu, M., Wang, Z., & Lin, J. (2019). Bioactivity-explorer: A web application for interactive visualization and exploration of bioactivity data. Journal of Cheminformatics, 11, 1\u20136.","journal-title":"Journal of Cheminformatics"},{"key":"6519_CR22","unstructured":"Maus, N., Jones, H. T., Moore, J. S., Kusner, M. J., Bradshaw, J., & Gardner, J. R. (2022). Local latent space bayesian optimization over structured inputs. arXiv preprint arXiv:2201.11872"},{"key":"6519_CR23","unstructured":"Mnih, V., Badia, A.P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., Silver, D., & Kavukcuoglu, K. (2016). Asynchronous methods for deep reinforcement learning. In: Balcan, M.F., Weinberger, K.Q. (eds.) Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1928\u20131937. PMLR, New York, New York, USA . https:\/\/proceedings.mlr.press\/v48\/mniha16.html"},{"key":"6519_CR24","unstructured":"Neil, D., Segler, M., Guasch, L., Ahmed, M., Plumbley, D., Sellwood, M., & Brown, N. (2018). Exploring deep recurrent models with reinforcement learning for molecule design. In 6th International Conference on Learning Representations."},{"issue":"1","key":"6519_CR25","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13321-017-0235-x","volume":"9","author":"M Olivecrona","year":"2017","unstructured":"Olivecrona, M., Blaschke, T., Engkvist, O., & Chen, H. (2017). Molecular de-novo design through deep reinforcement learning. Journal of Cheminformatics, 9(1), 1\u201314.","journal-title":"Journal of Cheminformatics"},{"key":"6519_CR26","doi-asserted-by":"crossref","unstructured":"Rumelhart, D.E., Hinton, G. E., & Williams, R. J. (1985). Learning internal representations by error propagation. Technical report, California Univ San Diego La Jolla Inst for Cognitive Science.","DOI":"10.21236\/ADA164453"},{"key":"6519_CR27","unstructured":"Schaul, T., Quan, J., Antonoglou, I., & Silver, D. (2015). Prioritized experience replay. arXiv preprint arXiv:1511.05952"},{"issue":"8","key":"6519_CR28","doi-asserted-by":"publisher","first-page":"649","DOI":"10.1038\/nrd1799","volume":"4","author":"G Schneider","year":"2005","unstructured":"Schneider, G., & Fechner, U. (2005). Computer-based de novo design of drug-like molecules. Nature Reviews Drug Discovery, 4(8), 649\u2013663.","journal-title":"Nature Reviews Drug Discovery"},{"key":"6519_CR29","unstructured":"Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347"},{"key":"6519_CR30","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13321-016-0187-6","volume":"9","author":"J Sun","year":"2017","unstructured":"Sun, J., Jeliazkova, N., Chupakhin, V., Golib-Dzib, J.-F., Engkvist, O., Carlsson, L., Wegner, J., Ceulemans, H., Georgiev, I., Jeliazkov, V., et al. (2017). Excape-db: An integrated large scale dataset facilitating big data analysis in chemogenomics. Journal of Cheminformatics, 9, 1\u20139.","journal-title":"Journal of Cheminformatics"},{"key":"6519_CR31","unstructured":"Thomas, M., O\u2019Boyle, N. M., Bender, A., & De\u00a0Graaf, C. (2022). Re-evaluating sample efficiency in de novo molecule generation. arXiv preprint arXiv:2212.01385."},{"issue":"1","key":"6519_CR32","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13321-022-00646-z","volume":"14","author":"M Thomas","year":"2022","unstructured":"Thomas, M., O\u2019Boyle, N. M., Bender, A., & De Graaf, C. (2022). Augmented hill-climb increases reinforcement learning efficiency for language-based de novo molecule generation. Journal of Cheminformatics, 14(1), 1\u201322.","journal-title":"Journal of Cheminformatics"},{"issue":"6","key":"6519_CR33","doi-asserted-by":"publisher","first-page":"463","DOI":"10.1038\/s41573-019-0024-5","volume":"18","author":"J Vamathevan","year":"2019","unstructured":"Vamathevan, J., Clark, D., Czodrowski, P., Dunham, I., Ferran, E., Lee, G., Li, B., Madabhushi, A., Shah, P., Spitzer, M., et al. (2019). Applications of machine learning in drug discovery and development. Nature Reviews Drug Discovery, 18(6), 463\u2013477.","journal-title":"Nature Reviews Drug Discovery"},{"key":"6519_CR34","unstructured":"Wang, Z., Bapst, V., Heess, N., Mnih, V., Munos, R., Kavukcuoglu, K., & de Freitas, N. (2016). Sample efficient actor-critic with experience replay. arXiv preprint arXiv:1611.01224"},{"issue":"D1","key":"6519_CR35","doi-asserted-by":"publisher","first-page":"955","DOI":"10.1093\/nar\/gkw1118","volume":"45","author":"Y Wang","year":"2017","unstructured":"Wang, Y., Bryant, S. H., Cheng, T., Wang, J., Gindulyte, A., Shoemaker, B. A., Thiessen, P. A., He, S., & Zhang, J. (2017). Pubchem bioassay: 2017 update. Nucleic Acids Research, 45(D1), 955\u2013963.","journal-title":"Nucleic Acids Research"},{"issue":"1","key":"6519_CR36","doi-asserted-by":"publisher","first-page":"31","DOI":"10.1021\/ci00057a005","volume":"28","author":"D Weininger","year":"1988","unstructured":"Weininger, D. (1988). Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of Chemical Information and Computer Sciences, 28(1), 31\u201336.","journal-title":"Journal of Chemical Information and Computer Sciences"},{"key":"6519_CR37","first-page":"7924","volume":"34","author":"S Yang","year":"2021","unstructured":"Yang, S., Hwang, D., Lee, S., Ryu, S., & Hwang, S. J. (2021). Hit and lead discovery with explorative rl and fragment-based molecule generation. Advances in Neural Information Processing Systems, 34, 7924\u20137936.","journal-title":"Advances in Neural Information Processing Systems"},{"issue":"18","key":"6519_CR38","doi-asserted-by":"publisher","first-page":"10520","DOI":"10.1021\/acs.chemrev.8b00728","volume":"119","author":"X Yang","year":"2019","unstructured":"Yang, X., Wang, Y., Byrne, R., Schneider, G., & Yang, S. (2019). Concepts of artificial intelligence for computer-assisted drug discovery. Chemical Reviews, 119(18), 10520\u201310594.","journal-title":"Chemical Reviews"},{"key":"6519_CR39","first-page":"6410","volume":"31","author":"J You","year":"2018","unstructured":"You, J., Liu, B., Ying, Z., Pande, V., & Leskovec, J. (2018). Graph convolutional policy network for goal-directed molecular graph generation. Advances in Neural Information Processing Systems, 31, 6410\u20136421.","journal-title":"Advances in Neural Information Processing Systems"},{"issue":"6","key":"6519_CR40","doi-asserted-by":"publisher","first-page":"2572","DOI":"10.1021\/acs.jcim.0c01328","volume":"61","author":"J Zhang","year":"2021","unstructured":"Zhang, J., Mercado, R., Engkvist, O., & Chen, H. (2021). Comparative study of deep generative models on chemical space coverage. Journal of Chemical Information and Modeling, 61(6), 2572\u20132581.","journal-title":"Journal of Chemical Information and Modeling"},{"key":"6519_CR41","unstructured":"Zhou, H., Lin, Z., Li, J., Ye, D., Fu, Q., & Yang, W. (2022). Revisiting discrete soft actor-critic. arXiv preprint arXiv:2209.10081"},{"issue":"1","key":"6519_CR42","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41598-018-37186-2","volume":"9","author":"Z Zhou","year":"2019","unstructured":"Zhou, Z., Kearnes, S., Li, L., Zare, R. N., & Riley, P. (2019). Optimization of molecules via deep reinforcement learning. Scientific Reports, 9(1), 1\u201310.","journal-title":"Scientific Reports"}],"container-title":["Machine Learning"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10994-024-06519-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10994-024-06519-w\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10994-024-06519-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,5,31]],"date-time":"2024-05-31T18:07:01Z","timestamp":1717178821000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10994-024-06519-w"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,4,8]]},"references-count":42,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2024,7]]}},"alternative-id":["6519"],"URL":"https:\/\/doi.org\/10.1007\/s10994-024-06519-w","relation":{},"ISSN":["0885-6125","1573-0565"],"issn-type":[{"value":"0885-6125","type":"print"},{"value":"1573-0565","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,4,8]]},"assertion":[{"value":"29 March 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"23 August 2023","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"14 February 2024","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"8 April 2024","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"There is no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}