{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,11]],"date-time":"2026-04-11T22:31:23Z","timestamp":1775946683644,"version":"3.50.1"},"reference-count":51,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2024,1,15]],"date-time":"2024-01-15T00:00:00Z","timestamp":1705276800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,1,15]],"date-time":"2024-01-15T00:00:00Z","timestamp":1705276800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100009592","name":"Beijing Municipal Science and Technology Commission","doi-asserted-by":"publisher","award":["Z211100003521001"],"award-info":[{"award-number":["Z211100003521001"]}],"id":[{"id":"10.13039\/501100009592","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100012166","name":"National Key R&D Program of China","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Nat Mach Intell"],"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Generative models for molecules based on sequential line notation (for example, the simplified molecular-input line-entry system) or graph representation have attracted an increasing interest in the field of structure-based drug design, but they struggle to capture important three-dimensional (3D) spatial interactions and often produce undesirable molecular structures. To address these challenges, we introduce Lingo3DMol, a pocket-based 3D molecule generation method that combines language models and geometric deep learning technology. A new molecular representation, the fragment-based simplified molecular-input line-entry system with local and global coordinates, was developed to assist the model in learning molecular topologies and atomic spatial positions. Additionally, we trained a separate non-covalent interaction predictor to provide essential binding pattern information for the generative model. Lingo3DMol can efficiently traverse drug-like chemical spaces, preventing the formation of unusual structures. The Directory of Useful Decoys-Enhanced dataset was used for evaluation. Lingo3DMol outperformed state-of-the-art methods in terms of drug likeness, synthetic accessibility, pocket binding mode and molecule generation speed.<\/jats:p>","DOI":"10.1038\/s42256-023-00775-6","type":"journal-article","created":{"date-parts":[[2024,1,15]],"date-time":"2024-01-15T11:02:25Z","timestamp":1705316545000},"page":"62-73","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":54,"title":["Generation of 3D molecules in pockets via a language model"],"prefix":"10.1038","volume":"6","author":[{"ORCID":"https:\/\/orcid.org\/0009-0007-1220-1458","authenticated-orcid":false,"given":"Wei","family":"Feng","sequence":"first","affiliation":[]},{"given":"Lvwei","family":"Wang","sequence":"additional","affiliation":[]},{"given":"Zaiyun","family":"Lin","sequence":"additional","affiliation":[]},{"given":"Yanhao","family":"Zhu","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9282-2105","authenticated-orcid":false,"given":"Han","family":"Wang","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0009-0004-8509-2385","authenticated-orcid":false,"given":"Jianqiang","family":"Dong","sequence":"additional","affiliation":[]},{"given":"Rong","family":"Bai","sequence":"additional","affiliation":[]},{"given":"Huting","family":"Wang","sequence":"additional","affiliation":[]},{"given":"Jielong","family":"Zhou","sequence":"additional","affiliation":[]},{"given":"Wei","family":"Peng","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3822-9110","authenticated-orcid":false,"given":"Bo","family":"Huang","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7168-3676","authenticated-orcid":false,"given":"Wenbiao","family":"Zhou","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,1,15]]},"reference":[{"key":"775_CR1","doi-asserted-by":"publisher","first-page":"787","DOI":"10.1016\/j.chembiol.2003.09.002","volume":"10","author":"AC Anderson","year":"2003","unstructured":"Anderson, A. C. The process of structure-based drug design. Chem. Biol. 10, 787\u2013797 (2003).","journal-title":"Chem. Biol."},{"key":"775_CR2","unstructured":"Bjerrum, E. J. & Threlfall, R. Molecular generation with recurrent neural networks (RNNs). Preprint at https:\/\/arxiv.org\/abs\/1705.04612 (2017)."},{"key":"775_CR3","unstructured":"Kusner, M. J., Paige, B. & Hern\u00e1ndez-Lobato, J. M. Grammar variational autoencoder. Preprint at https:\/\/arxiv.org\/abs\/1703.01925 (2017)."},{"key":"775_CR4","doi-asserted-by":"crossref","unstructured":"Segler, M. H., Kogej, T., Tyrchan, C. & Waller, M. P. Generating focused molecule libraries for drug discovery with recurrent neural networks. ACS Cent. Sci. 4, 120\u2013131 (2018).","DOI":"10.1021\/acscentsci.7b00512"},{"key":"775_CR5","doi-asserted-by":"publisher","first-page":"3240","DOI":"10.1021\/acs.jcim.0c01494","volume":"61","author":"M Xu","year":"2021","unstructured":"Xu, M., Ran, T. & Chen, H. De novo molecule design through the molecular generative model conditioned by 3D information of protein binding sites. J. Chem. Inform. Model. 61, 3240\u20133254 (2021).","journal-title":"J. Chem. Inform. Model."},{"key":"775_CR6","unstructured":"Li, Y., Vinyals, O., Dyer, C., Pascanu, R. & Battaglia, P. Learning deep generative models of graphs. Preprint at https:\/\/arxiv.org\/abs\/1803.03324 (2018)."},{"key":"775_CR7","unstructured":"Liu, Q., Allamanis, M., Brockschmidt, M. & Gaunt, A. L. Constrained graph variational autoencoders for molecule design. Preprint at https:\/\/arxiv.org\/abs\/1805.09076 (2018)."},{"key":"775_CR8","unstructured":"Jin, W., Barzilay, R. & Jaakkola, T. Junction tree variational autoencoder for molecular graph generation. Preprint at https:\/\/arxiv.org\/abs\/1802.04364 (2018)."},{"key":"775_CR9","unstructured":"Shi, C. et al. GraphAF: a flow-based autoregressive model for molecular graph generation. Preprint at https:\/\/arxiv.org\/abs\/2001.09382 (2020)."},{"key":"775_CR10","doi-asserted-by":"publisher","first-page":"4200","DOI":"10.1021\/acs.jcim.0c00411","volume":"60","author":"PG Francoeur","year":"2020","unstructured":"Francoeur, P. G. et al. Three-dimensional convolutional neural networks and a cross-docked data set for structure-based drug design. J. Chem. Inform. Model. 60, 4200\u20134215 (2020).","journal-title":"J. Chem. Inform. Model."},{"key":"775_CR11","doi-asserted-by":"crossref","unstructured":"Skalic, M., Sabbadin, D., Sattarov, B., Sciabola, S. & De Fabritiis, G. From target to drug: generative modeling for the multimodal structure-based ligand design. Mol. Pharm. 16, 4282\u20134291 (2019).","DOI":"10.1021\/acs.molpharmaceut.9b00634"},{"key":"775_CR12","unstructured":"Gebauer, N. W. A., Gastegger, M. & Sch\u00fctt, K. T. Symmetry-adapted generation of 3D point sets for the targeted discovery of molecules. Preprint at https:\/\/arxiv.org\/abs\/1906.00957 (2019)."},{"key":"775_CR13","doi-asserted-by":"publisher","first-page":"2701","DOI":"10.1039\/D1SC05976A","volume":"13","author":"M Ragoza","year":"2022","unstructured":"Ragoza, M., Masuda, T. & Koes, D. R. Generating 3D molecules conditional on receptor binding sites with deep generative models. Chem. Sci. 13, 2701\u20132713 (2022).","journal-title":"Chem. Sci."},{"key":"775_CR14","first-page":"6229","volume":"34","author":"S Luo","year":"2021","unstructured":"Luo, S., Guan, J., Ma, J. & Peng, J. A 3D generative model for structure-based drug design. Adv. Neural Inf. Process. Syst. 34, 6229\u20136239 (2021).","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"775_CR15","unstructured":"Liu, M., Luo, Y., Uchino, K., Maruhashi, K. & Ji, S. Generating 3D molecules for target protein binding. Preprint at https:\/\/arxiv.org\/abs\/2204.09410 (2022)."},{"key":"775_CR16","unstructured":"Peng, X. et al. Pocket2Mol: efficient molecular sampling based on 3D protein pockets. Preprint at https:\/\/arxiv.org\/abs\/2205.07249 (2022)."},{"key":"775_CR17","doi-asserted-by":"publisher","first-page":"13664","DOI":"10.1039\/D1SC04444C","volume":"12","author":"Y Li","year":"2021","unstructured":"Li, Y., Pei, J. & Lai, L. Structure-based de novo drug design using 3D deep generative models. Chem. Sci. 12, 13664\u201313675 (2021).","journal-title":"Chem. Sci."},{"key":"775_CR18","unstructured":"Guan, J. et al. 3D equivariant diffusion for target-aware molecule generation and affinity prediction. Preprint at https:\/\/arxiv.org\/abs\/2303.03543 (2023)."},{"key":"775_CR19","unstructured":"Garcia, S. V., Hoogeboom, E. & Welling, M. E(n) equivariant graph neural networks. Preprint at https:\/\/arxiv.org\/abs\/2102.09844 (2021)."},{"key":"775_CR20","unstructured":"Hoogeboom, E., Garcia, S. V., Vignac, C. & Welling, M. Equivariant diffusion for molecule generation in 3D. Preprint at https:\/\/arxiv.org\/abs\/2203.17003 (2022)."},{"key":"775_CR21","doi-asserted-by":"publisher","first-page":"90","DOI":"10.1038\/nchem.1243","volume":"4","author":"GR Bickerton","year":"2012","unstructured":"Bickerton, G. R., Paolini, G. V., Besnard, J., Muresan, S. & Hopkins, A. L. Quantifying the chemical beauty of drugs. Nat. Chem. 4, 90\u201398 (2012).","journal-title":"Nat. Chem."},{"key":"775_CR22","doi-asserted-by":"crossref","unstructured":"Ertl, P. & Schuffenhauer, A. Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions. J. Cheminform. 1, 8 (2009).","DOI":"10.1186\/1758-2946-1-8"},{"key":"775_CR23","doi-asserted-by":"publisher","first-page":"675","DOI":"10.1007\/s10822-013-9672-4","volume":"27","author":"PG Polishchuk","year":"2013","unstructured":"Polishchuk, P. G., Madzhidov, T. I. & Varnek, A. Estimation of the size of drug-like chemical space based on GDB-17 data. J. Comput. Aided Mol. Des. 27, 675\u2013679 (2013).","journal-title":"J. Comput. Aided Mol. Des."},{"key":"775_CR24","doi-asserted-by":"publisher","first-page":"31","DOI":"10.1021\/ci00057a005","volume":"28","author":"D Weininger","year":"1988","unstructured":"Weininger, D. Smiles, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inf. Comput. Sci. 28, 31\u201336 (1988).","journal-title":"J. Chem. Inf. Comput. Sci."},{"key":"775_CR25","unstructured":"Corso, G., St\u00e4rk, H., Jing, B., Barzilay, R. & Jaakkola, T. DiffDock: diffusion steps, twists, and turns for molecular docking. Preprint at https:\/\/arxiv.org\/abs\/2210.01776 (2022)."},{"key":"775_CR26","doi-asserted-by":"publisher","first-page":"1734","DOI":"10.1021\/acs.jcim.1c01406","volume":"62","author":"K Ding","year":"2022","unstructured":"Ding, K. et al. Observing noncovalent interactions in experimental electron density for macromolecular systems: a novel perspective for protein\u2013ligand interaction research. J. Chem. Inf. Model. 62, 1734\u20131743 (2022).","journal-title":"J. Chem. Inf. Model."},{"key":"775_CR27","doi-asserted-by":"crossref","unstructured":"Lewis, M. et al. BART: denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. Preprint at https:\/\/arxiv.org\/abs\/1910.13461 (2019).","DOI":"10.18653\/v1\/2020.acl-main.703"},{"key":"775_CR28","doi-asserted-by":"publisher","first-page":"015022","DOI":"10.1088\/2632-2153\/ac3ffb","volume":"3","author":"R Irwin","year":"2022","unstructured":"Irwin, R., Dimitriadis, S., He, J. & Bjerrum, E. J. Chemformer: a pre-trained transformer for computational chemistry. Mach. Learn. Sci. Technol. 3, 015022 (2022).","journal-title":"Mach. Learn. Sci. Technol."},{"key":"775_CR29","doi-asserted-by":"publisher","first-page":"2977","DOI":"10.1021\/jm030580l","volume":"47","author":"R Wang","year":"2004","unstructured":"Wang, R., Fang, X., Lu, Y. & Wang, S. The PDBbind database:collection of binding affinities for protein-ligand complexes with known three-dimensional structures. J. Med. Chem. 47, 2977\u20132980 (2004).","journal-title":"J. Med. Chem."},{"key":"775_CR30","doi-asserted-by":"publisher","first-page":"534","DOI":"10.1021\/ci100015j","volume":"50","author":"KS Watts","year":"2010","unstructured":"Watts, K. S. et al. Confgen: a conformational search method for efficient generation of bioactive conformers. J. Chem. Inf. Model 50, 534\u2013546 (2010).","journal-title":"J. Chem. Inf. Model"},{"key":"775_CR31","doi-asserted-by":"publisher","first-page":"6582","DOI":"10.1021\/jm300687e","volume":"55","author":"MM Mysinger","year":"2012","unstructured":"Mysinger, M. M., Carchia, M., Irwin, J. J. & Shoichet, B. K. Directory of useful decoys, enhanced (DUD-E): better ligands and decoys for better benchmarking. J. Med. Chem. 55, 6582\u20136594 (2012).","journal-title":"J. Med. Chem."},{"key":"775_CR32","doi-asserted-by":"publisher","first-page":"3029","DOI":"10.1093\/bioinformatics\/btab184","volume":"37","author":"M Mirdita","year":"2021","unstructured":"Mirdita, M., Steinegger, M., Breitwieser, F., S\u00f6ding, J. & Levy Karin, E. Fast and sensitive taxonomic assignment to metagenomic contigs. Bioinformatics 37, 3029\u20133031 (2021).","journal-title":"Bioinformatics"},{"key":"775_CR33","doi-asserted-by":"crossref","unstructured":"Wojcikowski, M., Zielenkiewicz, P. & Siedlecki, P. Open drug discovery toolkit (ODDT): a new open-source player in the drug discovery field. J. Cheminform. 7, 26 (2015).","DOI":"10.1186\/s13321-015-0078-2"},{"key":"775_CR34","doi-asserted-by":"publisher","first-page":"1739","DOI":"10.1021\/jm0306430","volume":"47","author":"RA Friesner","year":"2004","unstructured":"Friesner, R. A. et al. Glide: a new approach for rapid, accurate docking and scoring. 1. Method and assessment of docking accuracy. J. Med. Chem. 47, 1739\u20131749 (2004).","journal-title":"J. Med. Chem."},{"key":"775_CR35","doi-asserted-by":"crossref","unstructured":"Su, M. et al. Comparative assessment of scoring functions: the CASF-2016 update. J. Chem. Inf. Model. 59, 895\u2013913 (2018).","DOI":"10.1021\/acs.jcim.8b00545"},{"key":"775_CR36","doi-asserted-by":"crossref","unstructured":"Shen, C. et al. Beware of the generic machine learning-based scoring functions in structure-based virtual screening. Brief. Bioinform. 22, bbaa070 (2021).","DOI":"10.1093\/bib\/bbaa070"},{"key":"775_CR37","doi-asserted-by":"publisher","first-page":"D1074","DOI":"10.1093\/nar\/gkx1037","volume":"46","author":"DS Wishart","year":"2017","unstructured":"Wishart, D. S. et al. DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res. 46, D1074\u2013D1082 (2017).","journal-title":"Nucleic Acids Res."},{"key":"775_CR38","doi-asserted-by":"publisher","first-page":"1955","DOI":"10.1021\/acs.jmedchem.2c01744","volume":"66","author":"AN Jain","year":"2023","unstructured":"Jain, A. N., Brueckner, A. C., Cleves, A. E., Reibarkh, M. & Sherer, E. C. A distributional model of bound ligand conformational strain: from small molecules up to large peptidic macrocycles. J. Med. Chem. 66, 1955\u20131971 (2023).","journal-title":"J. Med. Chem."},{"key":"775_CR39","doi-asserted-by":"crossref","unstructured":"Gu, S., Smith, M. S., Yang, Y., Irwin, J. J. & Shoichet, B. K. Ligand strain energy in large library docking. J. Chem. Inf. Model. 61, 4331\u20134341 (2021).","DOI":"10.1021\/acs.jcim.1c00368"},{"key":"775_CR40","doi-asserted-by":"publisher","first-page":"5520","DOI":"10.1021\/acs.chemrev.5b00630","volume":"116","author":"U Ryde","year":"2016","unstructured":"Ryde, U. & Soderhjelm, P. Ligand-binding affinity estimates supported by quantum-mechanical methods. Chem. Rev. 116, 5520\u20135566 (2016).","journal-title":"Chem. Rev."},{"key":"775_CR41","doi-asserted-by":"publisher","DOI":"10.1038\/s41598-022-19363-6","volume":"12","author":"L Wang","year":"2022","unstructured":"Wang, L. et al. A pocket-based 3D molecule generative model fueled by experimental electron density. Sci. Rep. 12, 15100 (2022).","journal-title":"Sci. Rep."},{"key":"775_CR42","doi-asserted-by":"publisher","first-page":"173","DOI":"10.1038\/s42004-023-00984-5","volume":"6","author":"W Ma","year":"2023","unstructured":"Ma, W. et al. Using macromolecular electron densities to improve the enrichment of active compounds in virtual screening. Commun. Chem. 6, 173 (2023).","journal-title":"Commun. Chem."},{"key":"775_CR43","unstructured":"Xu, M. et al. GeoDiff: a geometric diffusion model for molecular conformation generation. Preprint at https:\/\/arxiv.org\/abs\/2203.02923 (2022)."},{"key":"775_CR44","unstructured":"Jing, B., Eismann, S., Soni, P. N. & Dror, R. O. Equivariant graph neural networks for 3D macromolecular structure. Preprint at https:\/\/arxiv.org\/abs\/2106.03843 (2021)."},{"key":"775_CR45","doi-asserted-by":"crossref","unstructured":"Deng, C. et al. Vector neurons: a general framework for SO(3)-equivariant networks. Preprint at https:\/\/arxiv.org\/abs\/2104.12229 (2021).","DOI":"10.1109\/ICCV48922.2021.01198"},{"key":"775_CR46","unstructured":"Simm, G. N. C., Pinsler, R., Cs\u00e1nyi, G. & Hern\u00e1ndez-Lobato, J. M. Symmetry-aware actor-critic for 3D molecular design. Preprint at https:\/\/arxiv.org\/abs\/2011.12747 (2020)."},{"key":"775_CR47","unstructured":"Landrum, G. et al. RDKit: open-source cheminformatics software. GitHub https:\/\/github.com\/rdkit\/rdkit (2016)."},{"key":"775_CR48","first-page":"28877","volume":"34","author":"C Ying","year":"2021","unstructured":"Ying, C. et al. Do transformers really perform badly for graph representation? Adv. Neural Inf. Process. Syst. 34, 28877\u201328888 (2021).","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"775_CR49","unstructured":"Feng, W. et al. Data for Lingo3DMol. figshare https:\/\/figshare.com\/articles\/dataset\/Data_for_Lingo3DMol\/24550351 (2023)."},{"key":"775_CR50","unstructured":"Feng, W. et al. Code for Lingo3DMol. figshare https:\/\/figshare.com\/articles\/software\/Code_for_Lingo3DMo\/24633084 (2023)."},{"key":"775_CR51","doi-asserted-by":"crossref","unstructured":"Bajusz, D., Racz, A. & Heberger, K. Why is Tanimoto index an appropriate choice for fingerprint-based similarity calculations? J. Cheminform. 7, 20 (2015).","DOI":"10.1186\/s13321-015-0069-3"}],"container-title":["Nature Machine Intelligence"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.nature.com\/articles\/s42256-023-00775-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s42256-023-00775-6","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s42256-023-00775-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,1,24]],"date-time":"2024-01-24T00:03:38Z","timestamp":1706054618000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.nature.com\/articles\/s42256-023-00775-6"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,1,15]]},"references-count":51,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2024,1]]}},"alternative-id":["775"],"URL":"https:\/\/doi.org\/10.1038\/s42256-023-00775-6","relation":{},"ISSN":["2522-5839"],"issn-type":[{"value":"2522-5839","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,1,15]]},"assertion":[{"value":"14 June 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"27 November 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"15 January 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"The authors declare no competing interests.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}]}}