{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,22]],"date-time":"2026-02-22T15:32:37Z","timestamp":1771774357188,"version":"3.50.1"},"reference-count":51,"publisher":"Oxford University Press (OUP)","issue":"9","license":[{"start":{"date-parts":[[2024,9,2]],"date-time":"2024-09-02T00:00:00Z","timestamp":1725235200000},"content-version":"vor","delay-in-days":1,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,9,2]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Natural language is poised to become a key medium for human\u2013machine interactions in the era of large language models. In the field of biochemistry, tasks such as property prediction and molecule mining are critically important yet technically challenging. Bridging molecular expressions in natural language and chemical language can significantly enhance the interpretability and ease of these tasks. Moreover, it can integrate chemical knowledge from various sources, leading to a deeper understanding of molecules.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>Recognizing these advantages, we introduce the concept of conversational molecular design, a novel task that utilizes natural language to describe and edit target molecules. To better accomplish this task, we develop ChatMol, a knowledgeable and versatile generative pretrained model. This model is enhanced by incorporating experimental property information, molecular spatial knowledge, and the associations between natural and chemical languages. Several typical solutions including large language models (e.g. ChatGPT) are evaluated, proving the challenge of conversational molecular design and the effectiveness of our knowledge enhancement approach. Case observations and analysis offer insights and directions for further exploration of natural-language interaction in molecular discovery.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>Codes and data are provided in https:\/\/github.com\/Ellenzzn\/ChatMol\/tree\/main.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btae534","type":"journal-article","created":{"date-parts":[[2024,9,2]],"date-time":"2024-09-02T13:48:17Z","timestamp":1725284897000},"source":"Crossref","is-referenced-by-count":14,"title":["ChatMol: interactive molecular discovery with natural language"],"prefix":"10.1093","volume":"40","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6901-5292","authenticated-orcid":false,"given":"Zheni","family":"Zeng","sequence":"first","affiliation":[{"name":"Department of Computer Science and Technology, Tsinghua University , Beijing 100084,","place":["China"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-2795-7759","authenticated-orcid":false,"given":"Bangchen","family":"Yin","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Technology, Tsinghua University , Beijing 100084,","place":["China"]}]},{"given":"Shipeng","family":"Wang","sequence":"additional","affiliation":[{"name":"PingAn Technology, Beijing 100027, China"}]},{"given":"Jiarui","family":"Liu","sequence":"additional","affiliation":[{"name":"PingAn Technology, Beijing 100027, China"}]},{"given":"Cheng","family":"Yang","sequence":"additional","affiliation":[{"name":"School of Computer Science, Beijing University of Posts and Telecommunications , Beijing 100876,","place":["China"]}]},{"given":"Haishen","family":"Yao","sequence":"additional","affiliation":[{"name":"PingAn Technology, Beijing 100027, China"}]},{"given":"Xingzhi","family":"Sun","sequence":"additional","affiliation":[{"name":"PingAn Technology, Beijing 100027, China"}]},{"given":"Maosong","family":"Sun","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Technology, Tsinghua University , Beijing 100084,","place":["China"]}]},{"given":"Guotong","family":"Xie","sequence":"additional","affiliation":[{"name":"PingAn Technology, Beijing 100027, China"}]},{"given":"Zhiyuan","family":"Liu","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Technology, Tsinghua University , Beijing 100084,","place":["China"]}]}],"member":"286","published-online":{"date-parts":[[2024,9,2]]},"reference":[{"key":"2024102914121334900_btae534-B1","first-page":"65","author":"Banerjee","year":"2005"},{"key":"2024102914121334900_btae534-B2","first-page":"3615","author":"Beltagy","year":"2019"},{"key":"2024102914121334900_btae534-B3","first-page":"668","author":"Danel","year":"2020"},{"key":"2024102914121334900_btae534-B4","doi-asserted-by":"crossref","first-page":"1817","DOI":"10.1016\/j.bmc.2011.12.048","article-title":"Biotin sulfone tagged oligomannosides as immunogens for eliciting antibodies against specific mannan epitopes","volume":"20","author":"Despras","year":"2012","journal-title":"Bioorg Med Chem"},{"key":"2024102914121334900_btae534-B5","author":"Du","year":"2022"},{"key":"2024102914121334900_btae534-B6","doi-asserted-by":"crossref","first-page":"1273","DOI":"10.1021\/ci010132r","article-title":"Reoptimization of mdl keys for use in drug discovery","volume":"42","author":"Durant","year":"2002","journal-title":"J Chem Inf Comput Sci"},{"key":"2024102914121334900_btae534-B7","first-page":"375","author":"Edwards","year":"2022"},{"key":"2024102914121334900_btae534-B8","doi-asserted-by":"crossref","first-page":"828","DOI":"10.1039\/C9ME00039A","article-title":"Deep learning for molecular design\u2014a review of the state of the art","volume":"4","author":"Elton","year":"2019","journal-title":"Mol Syst Des Eng"},{"key":"2024102914121334900_btae534-B9","doi-asserted-by":"crossref","first-page":"127","DOI":"10.1038\/s42256-021-00438-4","article-title":"Geometry-enhanced molecular representation learning for property prediction","volume":"4","author":"Fang","year":"2022","journal-title":"Nat Mach Intell"},{"key":"2024102914121334900_btae534-B10","first-page":"302","author":"Goh","year":"2018"},{"key":"2024102914121334900_btae534-B11","doi-asserted-by":"crossref","first-page":"663","DOI":"10.1038\/s41586-020-2117-z","article-title":"An open-source drug discovery platform enables ultra-large virtual screens","volume":"580","author":"Gorgulla","year":"2020","journal-title":"Nature"},{"key":"2024102914121334900_btae534-B12","first-page":"59662","article-title":"What can large language models do in chemistry? A comprehensive benchmark on eight tasks","volume":"36","author":"Guo","year":"2023","journal-title":"Adv Neural Inf Process Syst"},{"key":"2024102914121334900_btae534-B13","doi-asserted-by":"crossref","first-page":"490","DOI":"10.1002\/(SICI)1096-987X(199604)17:5\/6<490::AID-JCC1>3.0.CO;2-P","article-title":"Merck molecular force field. I. Basis, form, scope, parameterization, and performance of mmff94","volume":"17","author":"Halgren","year":"1996","journal-title":"J Comput Chem"},{"key":"2024102914121334900_btae534-B14","doi-asserted-by":"crossref","first-page":"225","DOI":"10.1016\/j.aiopen.2021.08.002","article-title":"Pre-trained models: past, present and future","volume":"2","author":"Han","year":"2021","journal-title":"AI Open"},{"key":"2024102914121334900_btae534-B15","author":"Hao","year":"2020"},{"key":"2024102914121334900_btae534-B16","article-title":"Dual learning for machine translation","volume":"29","author":"He","year":"2016","journal-title":"Adv NeurIPS"},{"key":"2024102914121334900_btae534-B17","first-page":"1277","author":"Huang","year":"2020"},{"key":"2024102914121334900_btae534-B18","doi-asserted-by":"crossref","first-page":"D1202","DOI":"10.1093\/nar\/gkv951","article-title":"Pubchem substance and compound databases","volume":"44","author":"Kim","year":"2016","journal-title":"Nucleic Acids Res"},{"key":"2024102914121334900_btae534-B19","first-page":"6248","author":"Lai","year":"2021"},{"key":"2024102914121334900_btae534-B20","author":"Landrum","year":"2013"},{"key":"2024102914121334900_btae534-B21","first-page":"1","article-title":"Empowering molecule discovery for molecule-caption translation with large language models: a ChatGPT perspective","author":"Li","year":"2024","journal-title":"IEEE Trans Knowl Data Eng"},{"key":"2024102914121334900_btae534-B22","author":"Li","year":"2024"},{"key":"2024102914121334900_btae534-B23","first-page":"2592","author":"Li","year":"2021"},{"key":"2024102914121334900_btae534-B24","first-page":"74","author":"Lin","year":"2004"},{"key":"2024102914121334900_btae534-B25","doi-asserted-by":"crossref","first-page":"108073","DOI":"10.1016\/j.compbiomed.2024.108073","article-title":"Git-mol: a multi-modal large language model for molecular science with graph, image, and text","volume":"171","author":"Liu","year":"2024","journal-title":"Comput Biol Med"},{"key":"2024102914121334900_btae534-B26","first-page":"319","author":"Neumann","year":"2019"},{"key":"2024102914121334900_btae534-B27","first-page":"311","author":"Papineni","year":"2002"},{"key":"2024102914121334900_btae534-B28","doi-asserted-by":"crossref","first-page":"631","DOI":"10.1146\/annurev.pc.34.100183.003215","article-title":"Density functional theory","volume":"34","author":"Parr","year":"1983","journal-title":"Annu Rev Phys Chem"},{"key":"2024102914121334900_btae534-B29","first-page":"8748","author":"Radford","year":"2021"},{"key":"2024102914121334900_btae534-B30","first-page":"1","article-title":"Exploring the limits of transfer learning with a unified text-to-text transformer","volume":"21","author":"Raffel","year":"2020","journal-title":"J Mach Learn Res"},{"key":"2024102914121334900_btae534-B31","first-page":"12559","article-title":"Self-supervised graph transformer on large-scale molecular data","volume":"33","author":"Rong","year":"2020","journal-title":"Adv NeurIPS"},{"key":"2024102914121334900_btae534-B32","doi-asserted-by":"crossref","first-page":"cwae012","DOI":"10.1093\/glycob\/cwae012","article-title":"Targeting the glycan epitope type in-acetyllactosamine enables immunodepletion of human pluripotent stem cells from early differentiated cells","volume":"34","author":"Rossdam","year":"2024","journal-title":"Glycobiology"},{"key":"2024102914121334900_btae534-B33","doi-asserted-by":"crossref","first-page":"2111","DOI":"10.1021\/acs.jcim.5b00543","article-title":"Get your atoms in order: an open-source implementation of a novel and robust molecular canonicalization algorithm","volume":"55","author":"Schneider","year":"2015","journal-title":"J Chem Inf Model"},{"key":"2024102914121334900_btae534-B34","author":"Su","year":"2022"},{"key":"2024102914121334900_btae534-B35","doi-asserted-by":"crossref","first-page":"4323","DOI":"10.1093\/bioinformatics\/btaa491","article-title":"Chemical\u2013protein interaction extraction via Gaussian probability distribution and external biomedical knowledge","volume":"36","author":"Sun","year":"2020","journal-title":"Bioinformatics"},{"key":"2024102914121334900_btae534-B36","author":"Tanimoto","year":"1958"},{"key":"2024102914121334900_btae534-B37","doi-asserted-by":"crossref","first-page":"220","DOI":"10.1016\/j.synbio.2023.02.004","article-title":"Discovering the next decade\u2019s synthetic biology research trends with ChatGPT","volume":"8","author":"Tong","year":"2023","journal-title":"Synth Syst Biotechnol"},{"key":"2024102914121334900_btae534-B38","author":"Touvron","year":"2023"},{"key":"2024102914121334900_btae534-B39","first-page":"1","article-title":"Pre-trained language models in biomedical domain: a systematic survey","volume":"56","author":"Wang","year":"2023","journal-title":"ACM Comput Surv"},{"key":"2024102914121334900_btae534-B40","doi-asserted-by":"crossref","first-page":"135","DOI":"10.1016\/j.sbi.2021.10.001","article-title":"Deep learning approaches for de novo drug design: an overview","volume":"72","author":"Wang","year":"2022","journal-title":"Curr Opin Struct Biol"},{"key":"2024102914121334900_btae534-B41","first-page":"429","author":"Wang","year":"2019"},{"key":"2024102914121334900_btae534-B42","author":"Wang","year":"2023"},{"key":"2024102914121334900_btae534-B43","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1021\/ci00057a005","article-title":"Smiles, a chemical language and information system. 1. Introduction to methodology and encoding rules","volume":"28","author":"Weininger","year":"1988","journal-title":"J Chem Inf Comput Sci"},{"key":"2024102914121334900_btae534-B44","doi-asserted-by":"crossref","first-page":"513","DOI":"10.1039\/C7SC02664A","article-title":"Moleculenet: a benchmark for molecular machine learning","volume":"9","author":"Wu","year":"2018","journal-title":"Chem Sci"},{"key":"2024102914121334900_btae534-B45","first-page":"3789","author":"Xia","year":"2017"},{"key":"2024102914121334900_btae534-B46","author":"Ye","year":"2023"},{"key":"2024102914121334900_btae534-B47","first-page":"180","author":"Yuan","year":"2021"},{"key":"2024102914121334900_btae534-B48","doi-asserted-by":"crossref","first-page":"862","DOI":"10.1038\/s41467-022-28494-3","article-title":"A deep-learning system bridging molecule structure and biomedical text with comprehension comparable to human professionals","volume":"13","author":"Zeng","year":"2022","journal-title":"Nat Commun"},{"key":"2024102914121334900_btae534-B49","author":"Zhang","year":"2024"},{"key":"2024102914121334900_btae534-B50","doi-asserted-by":"crossref","first-page":"690049","DOI":"10.3389\/fgene.2021.690049","article-title":"Graph neural networks and their current applications in bioinformatics","volume":"12","author":"Zhang","year":"2021","journal-title":"Front Genet"},{"key":"2024102914121334900_btae534-B51","author":"Zhao","year":"2024"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btae534\/58995574\/btae534.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/9\/btae534\/60195114\/btae534.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/9\/btae534\/60195114\/btae534.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,10,29]],"date-time":"2024-10-29T14:12:36Z","timestamp":1730211156000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btae534\/7747661"}},"subtitle":[],"editor":[{"given":"Jonathan","family":"Wren","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2024,9]]},"references-count":51,"journal-issue":{"issue":"9","published-print":{"date-parts":[[2024,9,2]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btae534","relation":{},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024,9]]},"published":{"date-parts":[[2024,9]]},"article-number":"btae534"}}