{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,29]],"date-time":"2026-05-29T15:19:41Z","timestamp":1780067981133,"version":"3.54.0"},"reference-count":58,"publisher":"IOP Publishing","issue":"2","license":[{"start":{"date-parts":[[2025,6,3]],"date-time":"2025-06-03T00:00:00Z","timestamp":1748908800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"},{"start":{"date-parts":[[2025,6,3]],"date-time":"2025-06-03T00:00:00Z","timestamp":1748908800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/iopscience.iop.org\/info\/page\/text-and-data-mining"}],"content-domain":{"domain":["iopscience.iop.org"],"crossmark-restriction":false},"short-container-title":["Mach. Learn.: Sci. Technol."],"published-print":{"date-parts":[[2025,6,30]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Autoregressive models have gained popularity in the field of drug design due to their capability to sample novel molecules from a vast chemical space efficiently. Sampling novel and diverse molecules efficiently is crucial for downstream tasks such as reinforcement learning to identify novel molecules with pre-defined desired properties. Existing sampling strategies like multinomial sampling and beam search often struggle with mode collapse or are computationally inefficient, respectively. To address these limitations, we introduce WEISS (Wasserstein efficient sampling strategy), a framework that seamlessly enables autoregressive models to efficiently sample diverse molecules. Our approach, which draws inspiration from the Wasserstein autoencoder, is compatible with any encoder\u2013decoder-based autoregressive model. We show that WEISS effectively mitigates mode collapse while maintaining token sampling speed 25 times faster than beam search. 
Secondly, we showcase the efficacy of the proposed method for various drug design tasks such as molecular property optimization and single-step retrosynthesis prediction.<\/jats:p>","DOI":"10.1088\/2632-2153\/addc33","type":"journal-article","created":{"date-parts":[[2025,5,22]],"date-time":"2025-05-22T22:55:27Z","timestamp":1747954527000},"page":"025048","update-policy":"https:\/\/doi.org\/10.1088\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["WEISS: Wasserstein efficient sampling strategy for LLMs in drug design"],"prefix":"10.1088","volume":"6","author":[{"ORCID":"https:\/\/orcid.org\/0009-0002-1477-0121","authenticated-orcid":true,"given":"Riccardo","family":"Tedoldi","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Junyong","family":"Li","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4970-6461","authenticated-orcid":true,"given":"Ola","family":"Engkvist","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2765-5395","authenticated-orcid":true,"given":"Andrea","family":"Passerini","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2288-5711","authenticated-orcid":false,"given":"Annie M","family":"Westerlund","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9070-740X","authenticated-orcid":true,"given":"Alessandro","family":"Tibo","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"266","published-online":{"date-parts":[[2025,6,3]]},"reference":[{"key":"mlstaddc33bib1","article-title":"Attention is all you need","volume":"vol 
30","author":"Vaswani","year":"2017"},{"key":"mlstaddc33bib2","first-page":"1877","article-title":"Language models are few-shot learners","volume":"vol 33","author":"Brown","year":"2020"},{"key":"mlstaddc33bib3","doi-asserted-by":"publisher","first-page":"20","DOI":"10.1186\/s13321-019-0341-z","article-title":"Exploring the GDB-13 chemical space using deep generative models","volume":"11","author":"Ar\u00fas-Pous","year":"2019","journal-title":"J. Cheminform."},{"key":"mlstaddc33bib4","doi-asserted-by":"publisher","first-page":"18","DOI":"10.1186\/s13321-022-00599-3","article-title":"Transformer-based molecular optimization beyond matched molecular pairs","volume":"14","author":"He","year":"2022","journal-title":"J. Cheminform."},{"key":"mlstaddc33bib5","doi-asserted-by":"publisher","first-page":"1572","DOI":"10.1021\/acscentsci.9b00576","article-title":"Molecular transformer: a model for uncertainty-calibrated chemical reaction prediction","volume":"5","author":"Schwaller","year":"2019","journal-title":"ACS Cent. Sci."},{"key":"mlstaddc33bib6","doi-asserted-by":"publisher","first-page":"3021","DOI":"10.1021\/acs.jcim.3c01685","article-title":"Do chemformers dream of organic matter? Evaluating a transformer model for multistep retrosynthesis","volume":"64","author":"Westerlund","year":"2024","journal-title":"J. Chem. Inf. Model."},{"key":"mlstaddc33bib7","doi-asserted-by":"publisher","first-page":"38","DOI":"10.1186\/s13321-023-00702-2","article-title":"Deep generative model for drug design from protein target sequence","volume":"15","author":"Chen","year":"2023","journal-title":"J. Cheminform."},{"key":"mlstaddc33bib8","doi-asserted-by":"publisher","first-page":"20","DOI":"10.1186\/s13321-024-00812-5","article-title":"Reinvent 4: modern AI\u2013driven generative molecule design","volume":"16","author":"Loeffler","year":"2024","journal-title":"J. 
Cheminform."},{"key":"mlstaddc33bib9","doi-asserted-by":"publisher","first-page":"95","DOI":"10.1186\/s13321-024-00887-0","article-title":"Evaluation of reinforcement learning in transformer-based molecular design","volume":"16","author":"He","year":"2024","journal-title":"J. Cheminform."},{"key":"mlstaddc33bib10","doi-asserted-by":"publisher","first-page":"541","DOI":"10.1602\/neurorx.2.4.541","article-title":"Medicinal chemical properties of successful central nervous system drugs","volume":"2","author":"Pajouhesh","year":"2005","journal-title":"NeuroRx"},{"key":"mlstaddc33bib11","doi-asserted-by":"publisher","first-page":"224","DOI":"10.1021\/jm030267j","article-title":"Characteristic physical properties and structural fragments of marketed oral drugs","volume":"47","author":"Vieth","year":"2004","journal-title":"J. Med. Chem."},{"key":"mlstaddc33bib12","first-page":"3104","article-title":"Sequence to sequence learning with neural networks","volume":"vol 2","author":"Sutskever","year":"2014"},{"key":"mlstaddc33bib13","doi-asserted-by":"publisher","first-page":"31","DOI":"10.1021\/ci00057a005","article-title":"SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules","volume":"28","author":"Weininger","year":"1988","journal-title":"J. Chem. Inf. Comput. 
Sci."},{"key":"mlstaddc33bib14","article-title":"Spanning tree-based graph generation for molecules","author":"Ahn","year":"2022"},{"key":"mlstaddc33bib15","first-page":"2323","article-title":"Junction tree variational autoencoder for molecular graph generation","author":"Jin","year":"2018"},{"key":"mlstaddc33bib16","article-title":"The curious case of neural text degeneration","author":"Holtzman","year":"2020"},{"key":"mlstaddc33bib17","article-title":"Wasserstein auto-encoders","author":"Tolstikhin","year":"2018"},{"key":"mlstaddc33bib18","article-title":"WEISS: Wasserstein efficient sampling strategy for LLMs in drug design GitHub repository, r1cc4r2o\/weiss","author":"Tedoldi","year":"2025"},{"key":"mlstaddc33bib19","doi-asserted-by":"publisher","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long short-term memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Comput."},{"key":"mlstaddc33bib20","first-page":"1","article-title":"Exploring the limits of transfer learning with a unified text-to-text transformer","volume":"21","author":"Raffel","year":"2019","journal-title":"J. Mach. Learn. Res."},{"key":"mlstaddc33bib21","doi-asserted-by":"publisher","first-page":"7315","DOI":"10.1038\/s41467-024-51672-4","article-title":"Exhaustive local chemical space exploration using a transformer model","volume":"15","author":"Tibo","year":"2024","journal-title":"Nat. 
Commun."},{"key":"mlstaddc33bib22","article-title":"Auto-encoding variational bayes","author":"Kingma","year":"2014"},{"key":"mlstaddc33bib23","doi-asserted-by":"publisher","first-page":"10","DOI":"10.18653\/v1\/K16-1002","article-title":"Generating sentences from a continuous space","author":"Bowman","year":"2016"},{"key":"mlstaddc33bib24","doi-asserted-by":"publisher","first-page":"521","DOI":"10.18653\/v1\/D16-1050","author":"Zhang","year":"2016"},{"key":"mlstaddc33bib25","first-page":"182","article-title":"Latent variable dialogue models and their diversity","volume":"vol 2","author":"Cao","year":"2017"},{"key":"mlstaddc33bib26","doi-asserted-by":"publisher","first-page":"29","DOI":"10.18653\/v1\/2021.insights-1.5","article-title":"Finetuning pretrained transformers into variational autoencoders","author":"Park","year":"2021"},{"key":"mlstaddc33bib27","article-title":"Wasserstein autoregressive models for density time series","author":"Zhang","year":"2020"},{"key":"mlstaddc33bib28","article-title":"Objective-agnostic enhancement of molecule properties via multi-stage vae","author":"Zhou","year":"2023"},{"key":"mlstaddc33bib29","doi-asserted-by":"publisher","first-page":"123","DOI":"10.1021\/acs.jcim.0c01074","article-title":"Valid, plausible and diverse retrosynthesis using tied two-way transformers with latent variables","volume":"61","author":"Kim","year":"2021","journal-title":"J. Chem. Inf. 
Model."},{"key":"mlstaddc33bib30","article-title":"Greedy importance sampling","volume":"vol 12","author":"Schuurmans","year":"1999"},{"key":"mlstaddc33bib31","article-title":"Understanding top-k sparsification in distributed deep learning","author":"Shi","year":"2019"},{"key":"mlstaddc33bib32","first-page":"8785","article-title":"Incremental sampling without replacement for sequence models","author":"Shi","year":"2020"},{"key":"mlstaddc33bib33","first-page":"p ICML\u201923","article-title":"Arithmetic sampling: parallel diverse decoding for large language models","author":"Vilnis","year":"2023"},{"key":"mlstaddc33bib34","doi-asserted-by":"publisher","first-page":"6112","DOI":"10.18653\/v1\/D19-1633","article-title":"Mask-predict: parallel decoding of conditional masked language models","author":"Ghazvininejad","year":"2019"},{"key":"mlstaddc33bib35","doi-asserted-by":"crossref","DOI":"10.1145\/3642970.3655831","article-title":"Priority sampling of large language models for compilers","author":"Grubisic","year":"2024"},{"key":"mlstaddc33bib36","article-title":"Scaling laws for neural language models","author":"Kaplan","year":"2020"},{"key":"mlstaddc33bib37","article-title":"Fast inference from transformers via speculative decoding","author":"Leviathan","year":"2023"},{"key":"mlstaddc33bib38","article-title":"Speculative decoding with big little decoder","author":"Kim","year":"2024"},{"key":"mlstaddc33bib39","article-title":"Accelerated speculative sampling based on tree monte carlo","author":"Hu","year":"2024"},{"key":"mlstaddc33bib40","article-title":"Break the sequential dependency of llm inference using lookahead decoding","author":"Fu","year":"2024"},{"key":"mlstaddc33bib41","article-title":"GPT3.int8: 8-bit matrix multiplication for transformers at scale","author":"Dettmers","year":"2022"},{"key":"mlstaddc33bib42","article-title":"Reducing transformer depth on demand with structured 
dropout","author":"Fan","year":"2020"},{"key":"mlstaddc33bib43","first-page":"24101","article-title":"A fast post-training pruning framework for transformers","volume":"vol 35","author":"Kwon","year":"2022"},{"key":"mlstaddc33bib44","article-title":"Deep encoder, shallow decoder: Reevaluating non-autoregressive machine translation","author":"Kasai","year":"2021"},{"key":"mlstaddc33bib45","article-title":"Better & faster large language models via multi-token prediction","author":"Gloeckle","year":"2024"},{"key":"mlstaddc33bib46","first-page":"5877","article-title":"The evolved transformer","volume":"vol 97","author":"So","year":"2019"},{"key":"mlstaddc33bib47","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13321-016-0187-6","article-title":"ExCAPE-DB: an integrated large scale dataset facilitating big data analysis in chemogenomics","volume":"9","author":"Sun","year":"2017","journal-title":"J. Cheminform."},{"key":"mlstaddc33bib48","doi-asserted-by":"publisher","first-page":"48","DOI":"10.1186\/s13321-017-0235-x","article-title":"Molecular de-novo design through deep reinforcement learning","volume":"9","author":"Olivecrona","year":"2017","journal-title":"J. Cheminform."},{"key":"mlstaddc33bib49","doi-asserted-by":"publisher","first-page":"863","DOI":"10.1016\/j.trechm.2022.07.005","article-title":"When machine learning meets molecular synthesis","volume":"4","author":"Oliveira","year":"2022","journal-title":"Trends Chem."},{"key":"mlstaddc33bib50","doi-asserted-by":"publisher","first-page":"762","DOI":"10.1039\/CT9171100762","article-title":"LXIII\u2013a synthesis of tropinone","volume":"111","author":"Robinson","year":"1917","journal-title":"J. Chem. Soc. Trans."},{"key":"mlstaddc33bib51","doi-asserted-by":"publisher","first-page":"70","DOI":"10.1186\/s13321-020-00472-1","article-title":"AiZynthFinder: a fast, robust and flexible open-source software for retrosynthetic planning","volume":"12","author":"Genheden","year":"2020","journal-title":"J. 
Cheminform."},{"key":"mlstaddc33bib52","doi-asserted-by":"publisher","first-page":"154","DOI":"10.1039\/C9SC04944D","article-title":"Datasets and their influence on the development of computer assisted synthesis planning tools in the pharmaceutical domain","volume":"11","author":"Thakkar","year":"2020","journal-title":"Chem. Sci."},{"key":"mlstaddc33bib53","doi-asserted-by":"publisher","first-page":"5966","DOI":"10.1002\/chem.201605499","article-title":"Neural-symbolic machine learning for retrosynthesis and reaction prediction","volume":"23","author":"Segler","year":"2017","journal-title":"Chem. A Euro. J."},{"key":"mlstaddc33bib54","doi-asserted-by":"publisher","first-page":"604","DOI":"10.1038\/nature25978","article-title":"Planning chemical syntheses with deep neural networks and symbolic ai","volume":"555","author":"Segler","year":"2018","journal-title":"Nature"},{"key":"mlstaddc33bib55","doi-asserted-by":"publisher","DOI":"10.1088\/2632-2153\/ac3ffb","article-title":"Chemformer: a pre-trained transformer for computational chemistry","volume":"3","author":"Irwin","year":"2022","journal-title":"Mach. Learn.: Sci. Technol."},{"key":"mlstaddc33bib56","article-title":"Retrosynthesis prediction with conditional graph logic network","volume":"vol 32","author":"Dai","year":"2019"},{"key":"mlstaddc33bib57","doi-asserted-by":"publisher","first-page":"1841","DOI":"10.1021\/acs.jcim.2c01486","article-title":"AiZynthTrain: robust, reproducible and extensible pipelines for training synthesis prediction models","volume":"63","author":"Genheden","year":"2023","journal-title":"J. Chem. Inf. Model."},{"key":"mlstaddc33bib58","doi-asserted-by":"publisher","first-page":"3316","DOI":"10.1039\/C9SC05704H","article-title":"Predicting retrosynthetic pathways using transformer-based models and a hyper-graph exploration strategy","volume":"11","author":"Schwaller","year":"2020","journal-title":"Chem. 
Sci."}],"container-title":["Machine Learning: Science and Technology"],"original-title":[],"link":[{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/addc33","content-type":"text\/html","content-version":"am","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/addc33\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/addc33","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/addc33\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/addc33\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/addc33\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/addc33\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"similarity-checking"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/addc33\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,3]],"date-time":"2025-06-03T06:15:23Z","timestamp":1748931323000},"score":1,"resource":{"primary":{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/addc33"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,3]]},"references-count":58,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2025,6,3]]},"published-print":{"date-parts":[[2025,6,30]]}},"URL":"https:\/\/doi.org\/10.1088\/2632-2153\/addc33","relation":{},"ISSN":["2
632-2153"],"issn-type":[{"value":"2632-2153","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,6,3]]},"assertion":[{"value":"WEISS: Wasserstein efficient sampling strategy for LLMs in drug design","name":"article_title","label":"Article Title"},{"value":"Machine Learning: Science and Technology","name":"journal_title","label":"Journal Title"},{"value":"paper","name":"article_type","label":"Article Type"},{"value":"\u00a9 2025 The Author(s). Published by IOP Publishing Ltd","name":"copyright_information","label":"Copyright Information"},{"value":"2024-12-12","name":"date_received","label":"Date Received","group":{"name":"publication_dates","label":"Publication dates"}},{"value":"2025-05-22","name":"date_accepted","label":"Date Accepted","group":{"name":"publication_dates","label":"Publication dates"}},{"value":"2025-06-03","name":"date_epub","label":"Online publication date","group":{"name":"publication_dates","label":"Publication dates"}}]}}