{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,30]],"date-time":"2026-04-30T05:21:42Z","timestamp":1777526502038,"version":"3.51.4"},"reference-count":42,"publisher":"Oxford University Press (OUP)","issue":"1","license":[{"start":{"date-parts":[[2025,1,3]],"date-time":"2025-01-03T00:00:00Z","timestamp":1735862400000},"content-version":"vor","delay-in-days":42,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100018537","name":"National Science and Technology Major Project","doi-asserted-by":"publisher","award":["2023ZD0120902"],"award-info":[{"award-number":["2023ZD0120902"]}],"id":[{"id":"10.13039\/501100018537","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["U22A2037"],"award-info":[{"award-number":["U22A2037"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62425204"],"award-info":[{"award-number":["62425204"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62122025"],"award-info":[{"award-number":["62122025"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62450002"],"award-info":[{"award-number":["62450002"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62432011"],"award-info":[{"award-number":["62432011"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100004826","name":"Beijing Natural Science Foundation","doi-asserted-by":"publisher","award":["L248013"],"award-info":[{"award-number":["L248013"]}],"id":[{"id":"10.13039\/501100004826","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,11,22]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Recently, the impressive performance of large language models (LLMs) on a wide range of tasks has attracted an increasing number of attempts to apply LLMs in drug discovery. However, molecule optimization, a critical task in the drug discovery pipeline, is currently an area that has seen little involvement from LLMs. Most of existing approaches focus solely on capturing the underlying patterns in chemical structures provided by the data, without taking advantage of expert feedback. These non-interactive approaches overlook the fact that the drug discovery process is actually one that requires the integration of expert experience and iterative refinement. To address this gap, we propose DrugAssist, an interactive molecule optimization model which performs optimization through human\u2013machine dialogue by leveraging LLM\u2019s strong interactivity and generalizability. DrugAssist has achieved leading results in both single and multiple property optimization, simultaneously showcasing immense potential in transferability and iterative optimization. In addition, we publicly release a large instruction-based dataset called \u2018MolOpt-Instructions\u2019 for fine-tuning language models on molecule optimization tasks. We have made our code and data publicly available at https:\/\/github.com\/blazerye\/DrugAssist, which we hope to pave the way for future research in LLMs\u2019 application for drug discovery.<\/jats:p>","DOI":"10.1093\/bib\/bbae693","type":"journal-article","created":{"date-parts":[[2025,1,3]],"date-time":"2025-01-03T13:35:39Z","timestamp":1735911339000},"source":"Crossref","is-referenced-by-count":27,"title":["DrugAssist: a large language model for molecule optimization"],"prefix":"10.1093","volume":"26","author":[{"given":"Geyan","family":"Ye","sequence":"first","affiliation":[{"name":"Tencent AI Lab, Tencent , Shenzhen 518057 ,","place":["China"]}]},{"given":"Xibao","family":"Cai","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Hunan University , Changsha 410008 ,","place":["China"]}]},{"given":"Houtim","family":"Lai","sequence":"additional","affiliation":[{"name":"Tencent AI Lab, Tencent , Shenzhen 518057 ,","place":["China"]}]},{"given":"Xing","family":"Wang","sequence":"additional","affiliation":[{"name":"Tencent AI Lab, Tencent , Shenzhen 518057 ,","place":["China"]}]},{"given":"Junhong","family":"Huang","sequence":"additional","affiliation":[{"name":"Tencent AI Lab, Tencent , Shenzhen 518057 ,","place":["China"]}]},{"given":"Longyue","family":"Wang","sequence":"additional","affiliation":[{"name":"Tencent AI Lab, Tencent , Shenzhen 518057 ,","place":["China"]}]},{"given":"Wei","family":"Liu","sequence":"additional","affiliation":[{"name":"Tencent AI Lab, Tencent , Shenzhen 518057 ,","place":["China"]}]},{"given":"Xiangxiang","family":"Zeng","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Hunan University , Changsha 410008 ,","place":["China"]}]}],"member":"286","published-online":{"date-parts":[[2025,1,3]]},"reference":[{"key":"2025010313352699500_ref1","first-page":"9","article-title":"Language models are unsupervised multitask learners","volume":"1","author":"Radford","year":"2019","journal-title":"OpenAI blog"},{"key":"2025010313352699500_ref2","doi-asserted-by":"publisher","DOI":"10.7759\/cureus.40895","article-title":"ChatDoctor: a medical chat model fine-tuned on LLaMA model using medical domain knowledge","volume-title":"Cureus","author":"Yunxiang"},{"key":"2025010313352699500_ref3","article-title":"MedAlpaca\u2013an open-source collection of medical conversational AI models and training data","author":"Han","year":"2023"},{"key":"2025010313352699500_ref4","doi-asserted-by":"publisher","DOI":"10.1093\/jamia\/ocae045","article-title":"PMC-LLaMA: towards building open-source language models for medicine","volume-title":"Journal of the American Medical Informatics Association","author":"Wu","year":"2024"},{"key":"2025010313352699500_ref5","article-title":"BioMedGPT: open multimodal generative pre-trained transformer for biomedicine","author":"Luo","year":"2023"},{"key":"2025010313352699500_ref6","article-title":"Interactive molecular discovery with natural language","author":"Zeng","year":"2023"},{"key":"2025010313352699500_ref7","article-title":"ChatGPT-powered conversational drug editing using retrieval and domain feedback","volume-title":"The Twelfth International Conference on Learning Representations","author":"Liu","year":"2024"},{"key":"2025010313352699500_ref8","doi-asserted-by":"publisher","DOI":"10.1186\/s13321-022-00599-3","article-title":"Transformer-based molecular optimization beyond matched molecular pairs","volume":"14","author":"He","year":"2022","journal-title":"J Chem"},{"key":"2025010313352699500_ref9","doi-asserted-by":"crossref","DOI":"10.24963\/ijcai.2020\/380","article-title":"KGNN: knowledge graph neural network for drug-drug interaction prediction","volume-title":"IJCAI","author":"Lin"},{"key":"2025010313352699500_ref10","doi-asserted-by":"publisher","first-page":"107900","DOI":"10.1016\/j.compbiomed.2023.107900","article-title":"An effective framework for predicting drug\u2013drug interactions based on molecular substructures and knowledge graph neural network","volume":"169","author":"Chen","year":"2024","journal-title":"Comput Biol Med"},{"key":"2025010313352699500_ref11","doi-asserted-by":"crossref","first-page":"16","DOI":"10.1002\/ddr.21895","article-title":"Races of small molecule clinical trials for the treatment of Covid-19: an up-to-date comprehensive review","volume":"83","author":"Suwen","year":"2022","journal-title":"Drug Dev Res"},{"key":"2025010313352699500_ref12","doi-asserted-by":"publisher","first-page":"718","DOI":"10.1002\/ddr.22051","article-title":"Combination of chemotherapy and gaseous signaling molecular therapy: novel$\\beta $-elemene nitric oxide donor derivatives against leukemia","volume":"84","author":"Zhu","year":"2023","journal-title":"Drug Dev Res"},{"key":"2025010313352699500_ref13","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13321-021-00497-0","article-title":"Molecular optimization by capturing chemist\u2019s intuition using deep neural networks","volume":"13","author":"He","year":"2021","journal-title":"J Chem"},{"key":"2025010313352699500_ref14","first-page":"4839","article-title":"Hierarchical generation of molecular graphs using structural motifs","volume-title":"International conference on machine learning","author":"Jin","year":"2020"},{"key":"2025010313352699500_ref15","doi-asserted-by":"publisher","DOI":"10.1002\/minf.201880141","article-title":"Generative recurrent networks for de novo drug design","volume":"37","author":"Gupta","year":"2018","journal-title":"Mol Inform"},{"key":"2025010313352699500_ref16","article-title":"Molecular generation with recurrent neural networks (RNNs)","author":"Bjerrum","year":"2017"},{"key":"2025010313352699500_ref17","doi-asserted-by":"publisher","first-page":"120","DOI":"10.1021\/acscentsci.7b00512","article-title":"Generating focused molecule libraries for drug discovery with recurrent neural networks","volume":"4","author":"Segler","year":"2018","journal-title":"ACS Cent Sci"},{"key":"2025010313352699500_ref18","first-page":"2323","article-title":"Junction tree variational autoencoder for molecular graph generation","volume-title":"International conference on machine learning","author":"Jin","year":"2018"},{"key":"2025010313352699500_ref19","article-title":"Learning multimodal graph-to-graph translation for molecular optimization","volume-title":"International Conference on Learning Representations","author":"Jin"},{"key":"2025010313352699500_ref20","article-title":"Syntax-directed variational autoencoder for molecule generation","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Dai","year":"2018"},{"key":"2025010313352699500_ref21","article-title":"Constrained graph variational autoencoders for molecule design","volume":"31","author":"Liu","year":"2018","journal-title":"Adv Neural Inf Process Syst"},{"key":"2025010313352699500_ref22","doi-asserted-by":"crossref","first-page":"412","DOI":"10.1007\/978-3-030-01418-6_41","article-title":"Graphvae: Towards generation of small graphs using variational autoencoders","volume-title":"Artificial Neural Networks and Machine Learning\u2013ICANN 2018: 27th International Conference on Artificial Neural Networks, Rhodes, Greece, October 4-7, 2018, Proceedings, Part I 27","author":"Simonovsky","year":"2018"},{"key":"2025010313352699500_ref23","doi-asserted-by":"publisher","first-page":"297","DOI":"10.1021\/acsmedchemlett.2c00515","article-title":"Accelerated discovery of macrocyclic CDK2 inhibitor QR-6401 by generative models and structure-based drug design","volume":"14","author":"Yang","year":"2023","journal-title":"ACS Med Chem Lett"},{"key":"2025010313352699500_ref24","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13321-017-0235-x","article-title":"Molecular de-novo design through deep reinforcement learning","volume":"9","author":"Olivecrona","year":"2017","journal-title":"J Chem"},{"key":"2025010313352699500_ref25","doi-asserted-by":"publisher","first-page":"1194","DOI":"10.1021\/acs.jcim.7b00690","article-title":"Reinforced adversarial neural computer for de novo molecular design","volume":"58","author":"Putin","year":"2018","journal-title":"J Chem Inf Model"},{"key":"2025010313352699500_ref26","doi-asserted-by":"publisher","first-page":"3098","DOI":"10.1021\/acs.molpharmaceut.7b00346","article-title":"druGAN: an advanced generative adversarial autoencoder model for de novo generation of new molecules with desired molecular properties in silico","volume":"14","author":"Kadurin","year":"2017","journal-title":"Mol Pharm"},{"key":"2025010313352699500_ref27","doi-asserted-by":"publisher","first-page":"902","DOI":"10.1021\/acs.jcim.8b00173","article-title":"mmpdb: an open-source matched molecular pair platform for large multiproperty data sets","volume":"58","author":"Dalke","year":"2018","journal-title":"J Chem Inf Model"},{"key":"2025010313352699500_ref28"},{"key":"2025010313352699500_ref29","article-title":"Mol-Instructions: a large-scale biomolecular instruction dataset for large language models","volume-title":"Proceedings of the Twelfth International Conference on Learning Representations","author":"Fang"},{"key":"2025010313352699500_ref30","doi-asserted-by":"publisher","first-page":"177","DOI":"10.1021\/ci049714+","article-title":"ZINC- a free database of commercially available compounds for virtual screening","volume":"45","author":"Irwin","year":"2005","journal-title":"J Chem Inf Model"},{"key":"2025010313352699500_ref31","first-page":"27730","article-title":"Training language models to follow instructions with human feedback","volume":"35","author":"Ouyang","year":"2022","journal-title":"Advances in Neural Information Processing Systems"},{"key":"2025010313352699500_ref32","first-page":"3366","article-title":"A continual learning survey: defying forgetting in classification tasks","volume":"44","author":"De Lange","year":"2021","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"2025010313352699500_ref33","doi-asserted-by":"crossref","DOI":"10.18653\/v1\/2024.acl-long.12","article-title":"How abilities in large language models are affected by supervised fine-tuning data composition","volume-title":"Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Dong"},{"key":"2025010313352699500_ref34","article-title":"Llama 2: open foundation and fine-tuned chat models","author":"Touvron","year":"2023"},{"key":"2025010313352699500_ref35","article-title":"PaLM: Scaling language modeling with pathways","volume-title":"J Mach Learn Res","author":"Narang"},{"key":"2025010313352699500_ref36","article-title":"LoRA: low-rank adaptation of large language models","volume-title":"International Conference on Learning Representations","author":"Edward","year":"2022"},{"key":"2025010313352699500_ref37","article-title":"Macaw-LLM: multi-modal language modeling with image, audio, video, and text integration","author":"Lyu","year":"2023"},{"key":"2025010313352699500_ref38","first-page":"2023","article-title":"A comprehensive study of GPT-4V\u2019s multimodal capabilities in medical imaging","author":"Li","year":"2023"},{"key":"2025010313352699500_ref39","article-title":"Siren\u2019s song in the AI ocean: a survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions","volume-title":"ACM Trans Inf Syst","author":"Zhang"},{"key":"2025010313352699500_ref40","article-title":"Retrieval-augmented multi-modal chain-of-thoughts reasoning for large language models","author":"Liu","year":"2023"},{"key":"2025010313352699500_ref41","article-title":"A comprehensive evaluation of GPT-4V on knowledge-intensive visual question answering","author":"Li","year":"2023"},{"key":"2025010313352699500_ref42","doi-asserted-by":"publisher","first-page":"133","DOI":"10.1016\/j.ymeth.2024.01.004","article-title":"Comprehensive evaluation of molecule property prediction with ChatGPT","volume":"222","author":"Cai","year":"2023","journal-title":"Methods"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/26\/1\/bbae693\/61326352\/bbae693.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/26\/1\/bbae693\/61326352\/bbae693.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,1,3]],"date-time":"2025-01-03T13:35:46Z","timestamp":1735911346000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbae693\/7942355"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,11,22]]},"references-count":42,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2024,11,22]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbae693","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025,1]]},"published":{"date-parts":[[2024,11,22]]},"article-number":"bbae693"}}