{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,6]],"date-time":"2026-03-06T06:50:04Z","timestamp":1772779804663,"version":"3.50.1"},"reference-count":53,"publisher":"Oxford University Press (OUP)","issue":"12","license":[{"start":{"date-parts":[[2022,5,2]],"date-time":"2022-05-02T00:00:00Z","timestamp":1651449600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["1942594"],"award-info":[{"award-number":["1942594"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["1755850"],"award-info":[{"award-number":["1755850"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["1907805"],"award-info":[{"award-number":["1907805"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,6,13]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Motivation<\/jats:title><jats:p>Expanding our knowledge of small molecules beyond what is known in nature or designed in wet laboratories promises to significantly advance cheminformatics, drug discovery, biotechnology and material science. In silico molecular design remains challenging, primarily due to the complexity of the chemical space and the non-trivial relationship between chemical structures and biological properties. Deep generative models that learn directly from data are intriguing, but they have yet to demonstrate interpretability in the learned representation, so we can learn more about the relationship between the chemical and biological space. In this article, we advance research on disentangled representation learning for small molecule generation. We build on recent work by us and others on deep graph generative frameworks, which capture atomic interactions via a graph-based representation of a small molecule. The methodological novelty is how we leverage the concept of disentanglement in the graph variational autoencoder framework both to generate biologically relevant small molecules and to enhance model interpretability.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>Extensive qualitative and quantitative experimental evaluation in comparison with state-of-the-art models demonstrate the superiority of our disentanglement framework. We believe this work is an important step to address key challenges in small molecule generation with deep generative frameworks.<\/jats:p><\/jats:sec><jats:sec><jats:title>Availability and implementation<\/jats:title><jats:p>Training and generated data are made available at https:\/\/ieee-dataport.org\/documents\/dataset-disentangled-representation-learning-interpretable-molecule-generation. All code is made available at https:\/\/anonymous.4open.science\/r\/D-MolVAE-2799\/.<\/jats:p><\/jats:sec><jats:sec><jats:title>Supplementary information<\/jats:title><jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p><\/jats:sec>","DOI":"10.1093\/bioinformatics\/btac296","type":"journal-article","created":{"date-parts":[[2022,4,28]],"date-time":"2022-04-28T03:30:00Z","timestamp":1651116600000},"page":"3200-3208","source":"Crossref","is-referenced-by-count":10,"title":["Small molecule generation via disentangled representation learning"],"prefix":"10.1093","volume":"38","author":[{"given":"Yuanqi","family":"Du","sequence":"first","affiliation":[{"name":"Department of Computer Science, George Mason University , Fairfax, VA 22030, USA"}]},{"given":"Xiaojie","family":"Guo","sequence":"additional","affiliation":[{"name":"Department of Information Technology and Science, George Mason University , Fairfax, VA 22030, USA"}]},{"given":"Yinkai","family":"Wang","sequence":"additional","affiliation":[{"name":"Department of Computer Science, George Mason University , Fairfax, VA 22030, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5230-4610","authenticated-orcid":false,"given":"Amarda","family":"Shehu","sequence":"additional","affiliation":[{"name":"Department of Computer Science, George Mason University , Fairfax, VA 22030, USA"},{"name":"Center for Advancing Human-Machine Partnerships (CAHMP) , Fairfax, VA 22030, USA"},{"name":"Department of Bioengineering, George Mason University , Fairfax, VA 22030, USA"},{"name":"School of System Biology, George Mason University , Manassas, VA 20110, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2648-9989","authenticated-orcid":false,"given":"Liang","family":"Zhao","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Emory University , Atlanta, GA 30322, USA"}]}],"member":"286","published-online":{"date-parts":[[2022,5,2]]},"reference":[{"key":"2023050200260724300_btac296-B1","volume-title":"Deep variational information bottleneck","author":"Alemi","year":"2017"},{"key":"2023050200260724300_btac296-B2","doi-asserted-by":"crossref","first-page":"1700123","DOI":"10.1002\/minf.201700123","article-title":"Application of generative autoencoder in de novo molecular design","volume":"37","author":"Blaschke","year":"2018","journal-title":"Mol. Inf."},{"key":"2023050200260724300_btac296-B3","first-page":"609","author":"Bojchevski","year":"2018"},{"key":"2023050200260724300_btac296-B4","first-page":"2610","author":"Chen","year":"2018"},{"key":"2023050200260724300_btac296-B5","author":"Dai","year":"2018"},{"key":"2023050200260724300_btac296-B6","author":"De Samanta","year":"2018"},{"key":"2023050200260724300_btac296-B7","author":"Doshi-Velez","year":"2017"},{"key":"2023050200260724300_btac296-B8","first-page":"1","author":"Du","year":"2020"},{"key":"2023050200260724300_btac296-B9","author":"Du","year":"2021"},{"key":"2023050200260724300_btac296-B10","author":"Du","year":"2021"},{"key":"2023050200260724300_btac296-B11","author":"Eastwood","year":"2018"},{"key":"2023050200260724300_btac296-B12","doi-asserted-by":"crossref","first-page":"132","DOI":"10.1021\/ar950190w","article-title":"Design, synthesis, and evaluation of small-molecule libraries","volume":"29","author":"Ellman","year":"1996","journal-title":"Acc. Chem. Res"},{"key":"2023050200260724300_btac296-B13","article-title":"Structured disentangled representations","volume":"89","author":"Esmaeili","year":"2019","journal-title":"Proc. Mach. Learn. Res"},{"key":"2023050200260724300_btac296-B14","doi-asserted-by":"crossref","first-page":"D945","DOI":"10.1093\/nar\/gkw1074","article-title":"The chembl database in 2017","volume":"45","author":"Gaulton","year":"2017","journal-title":"Nucleic Acids Res"},{"key":"2023050200260724300_btac296-B15","doi-asserted-by":"crossref","first-page":"268","DOI":"10.1021\/acscentsci.7b00572","article-title":"Automatic chemical design using a data-driven continuous representation of molecules","volume":"4","author":"G\u00f3mez-Bombarelli","year":"2018","journal-title":"ACS Cent. Sci."},{"key":"2023050200260724300_btac296-B16","first-page":"2434","author":"Grover","year":"2019"},{"key":"2023050200260724300_btac296-B17","author":"Guimaraes","year":"2017"},{"key":"2023050200260724300_btac296-B18","author":"Guo","year":"2018"},{"key":"2023050200260724300_btac296-B20","author":"Guo","year":"2020"},{"key":"2023050200260724300_btac296-B21","author":"Guo","year":"2021"},{"key":"2023050200260724300_btac296-B23","author":"Higgins","year":"2017"},{"key":"2023050200260724300_btac296-B24","author":"Honda","year":"2019"},{"key":"2023050200260724300_btac296-B25","doi-asserted-by":"crossref","first-page":"1757","DOI":"10.1021\/ci3001277","article-title":"Zinc: a free tool to discover chemistry for biology","volume":"52","author":"Irwin","year":"2012","journal-title":"J. Chem. Inf. Model."},{"key":"2023050200260724300_btac296-B26","author":"Janz","year":"2017"},{"key":"2023050200260724300_btac296-B27","author":"Jin","year":"2018"},{"key":"2023050200260724300_btac296-B28","author":"Kim","year":"2018"},{"key":"2023050200260724300_btac296-B29","author":"Kingma","year":"2013"},{"key":"2023050200260724300_btac296-B30","author":"Kipf","year":"2016"},{"key":"2023050200260724300_btac296-B31","author":"Kumar","year":"2018"},{"key":"2023050200260724300_btac296-B32","first-page":"1945","author":"Kusner","year":"2017"},{"key":"2023050200260724300_btac296-B33","author":"Li","year":"2018"},{"key":"2023050200260724300_btac296-B34","first-page":"7795","volume-title":"Advances in Neural Information Processing Systems","author":"Liu","year":"2018"},{"key":"2023050200260724300_btac296-B35","author":"Locatello","year":"2018"},{"key":"2023050200260724300_btac296-B36","article-title":"Information constraints on auto-encoding variational bayes","author":"Lopez","year":"2018"},{"key":"2023050200260724300_btac296-B37","author":"Madhawa","year":"2019"},{"key":"2023050200260724300_btac296-B38","doi-asserted-by":"crossref","DOI":"10.3389\/fphar.2020.565644","article-title":"Molecular sets (MOSES): a benchmarking platform for molecular generation models","volume":"11","author":"Polykovskiy","year":"2020","journal-title":"Front. Pharmacol"},{"key":"2023050200260724300_btac296-B39","doi-asserted-by":"crossref","first-page":"140022","DOI":"10.1038\/sdata.2014.22","article-title":"Quantum chemistry structures and properties of 134 kilo molecules","volume":"1","author":"Ramakrishnan","year":"2014","journal-title":"Sci. Data"},{"key":"2023050200260724300_btac296-B40","first-page":"55","article-title":"On failure modes in molecule generation and optimization","volume":"33","author":"Renz","year":"2020","journal-title":"Drug Discov. Today Technol"},{"key":"2023050200260724300_btac296-B41","doi-asserted-by":"crossref","first-page":"717","DOI":"10.1002\/wcms.1104","article-title":"The enumeration of chemical space","volume":"2","author":"Reymond","year":"2012","journal-title":"Wires Comput. Mol. Sci"},{"key":"2023050200260724300_btac296-B42","first-page":"185","author":"Ridgeway","year":"2018"},{"key":"2023050200260724300_btac296-B43","doi-asserted-by":"crossref","first-page":"2864","DOI":"10.1021\/ci300415d","article-title":"Enumeration of 166 billion organic small molecules in the chemical universe database gdb-17","volume":"52","author":"Ruddigkeit","year":"2012","journal-title":"J. Chem. Inf. Model"},{"key":"2023050200260724300_btac296-B44","doi-asserted-by":"crossref","first-page":"4077","DOI":"10.1021\/acs.jmedchem.5b01849","article-title":"De novo design at the edge of chaos","volume":"59","author":"Schneider","year":"2016","journal-title":"J. Med. Chem"},{"key":"2023050200260724300_btac296-B45","doi-asserted-by":"crossref","first-page":"120","DOI":"10.1021\/acscentsci.7b00512","article-title":"Generating focused molecule libraries for drug discovery with recurrent neural networks","volume":"4","author":"Segler","year":"2018","journal-title":"ACS Cent. Sci"},{"key":"2023050200260724300_btac296-B46","volume-title":"International Conference on Learning Representations.","author":"Shi","year":"2019"},{"key":"2023050200260724300_btac296-B47","first-page":"412","author":"Simonovsky","year":"2018"},{"key":"2023050200260724300_btac296-B48","doi-asserted-by":"crossref","first-page":"260","DOI":"10.1002\/wcms.23","article-title":"Similarity searching","volume":"1","author":"Stumpfe","year":"2011","journal-title":"WIREs Comput. Mol. Sci"},{"key":"2023050200260724300_btac296-B49","author":"Sundermeyer","year":"2012"},{"key":"2023050200260724300_btac296-B50","first-page":"31","article-title":"SMILES, a chemical language and information system","volume":"28","author":"Weininger","year":"1988","journal-title":"J. Chem. Inf. Model"},{"key":"2023050200260724300_btac296-B51","doi-asserted-by":"crossref","first-page":"3196","DOI":"10.1002\/anie.201410884","article-title":"Reinventing chemistry","volume":"54","author":"Whitesides","year":"2015","journal-title":"Angew. Chem. Int. Ed. Engl"},{"key":"2023050200260724300_btac296-B52","doi-asserted-by":"crossref","first-page":"e1395","DOI":"10.1002\/wcms.1395","article-title":"Advances and challenges in deep generative models for de novo molecule generation","volume":"9","author":"Xue","year":"2019","journal-title":"Wiley Interdisc. Rev. Comput. Mol. Sci"},{"key":"2023050200260724300_btac296-B53","doi-asserted-by":"crossref","first-page":"1431","DOI":"10.1246\/cl.180665","article-title":"Population-based de novo molecule generation, using grammatical evolution","volume":"47","author":"Yoshikawa","year":"2018","journal-title":"Chem. Lett"},{"key":"2023050200260724300_btac296-B54","author":"You","year":"2018"},{"key":"2023050200260724300_btac296-B55","author":"Zhao","year":"2019"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btac296\/43711680\/btac296.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/12\/3200\/49885313\/btac296.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/12\/3200\/49885313\/btac296.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,20]],"date-time":"2023-11-20T19:30:36Z","timestamp":1700508636000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/38\/12\/3200\/6576627"}},"subtitle":[],"editor":[{"given":"Jinbo","family":"Xu","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2022,5,2]]},"references-count":53,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2022,6,13]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btac296","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2022,6,15]]},"published":{"date-parts":[[2022,5,2]]}}}