{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,19]],"date-time":"2026-01-19T10:49:56Z","timestamp":1768819796044,"version":"3.49.0"},"reference-count":46,"publisher":"Oxford University Press (OUP)","issue":"5","license":[{"start":{"date-parts":[[2021,12,7]],"date-time":"2021-12-07T00:00:00Z","timestamp":1638835200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,2,7]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Motivation<\/jats:title><jats:p>Molecule generation, which is to generate new molecules, is an important problem in bioinformatics. Typical tasks include generating molecules with given properties, molecular property improvement (i.e. improving specific properties of an input molecule), retrosynthesis (i.e. predicting the molecules that can be used to synthesize a target molecule), etc. Recently, deep-learning-based methods received more attention for molecule generation. The labeled data of bioinformatics is usually costly to obtain, but there are millions of unlabeled molecules. Inspired by the success of sequence generation in natural language processing with unlabeled data, we would like to explore an effective way of using unlabeled molecules for molecule generation.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>We propose a new method, back translation for molecule generation, which is a simple yet effective semisupervised method. Let X be the source domain, which is the collection of properties, the molecules to be optimized, etc. Let Y be the target domain which is the collection of molecules. In particular, given a main task which is about to learn a mapping from the source domain X to the target domain Y, we first train a reversed model g for the Y to X mapping. After that, we use g to back translate the unlabeled data in Y to X and obtain more synthetic data. Finally, we combine the synthetic data with the labeled data and train a model for the main task. We conduct experiments on molecular property improvement and retrosynthesis, and we achieve state-of-the-art results on four molecule generation tasks and one retrosynthesis benchmark, USPTO-50k.<\/jats:p><\/jats:sec><jats:sec><jats:title>Availability and implementation<\/jats:title><jats:p>Our code and data are available at https:\/\/github.com\/fyabc\/BT4MolGen.<\/jats:p><\/jats:sec><jats:sec><jats:title>Supplementary information<\/jats:title><jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p><\/jats:sec>","DOI":"10.1093\/bioinformatics\/btab817","type":"journal-article","created":{"date-parts":[[2021,12,1]],"date-time":"2021-12-01T12:34:55Z","timestamp":1638362095000},"page":"1244-1251","source":"Crossref","is-referenced-by-count":8,"title":["Back translation for molecule generation"],"prefix":"10.1093","volume":"38","author":[{"given":"Yang","family":"Fan","sequence":"first","affiliation":[{"name":"University of Science and Technology of China , Hefei, Anhui 230027, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9823-9033","authenticated-orcid":false,"given":"Yingce","family":"Xia","sequence":"additional","affiliation":[{"name":"Microsoft Research , Beijing 100080, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jinhua","family":"Zhu","sequence":"additional","affiliation":[{"name":"University of Science and Technology of China , Hefei, Anhui 230027, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Lijun","family":"Wu","sequence":"additional","affiliation":[{"name":"Microsoft Research , Beijing 100080, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shufang","family":"Xie","sequence":"additional","affiliation":[{"name":"Microsoft Research , Beijing 100080, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Tao","family":"Qin","sequence":"additional","affiliation":[{"name":"Microsoft Research , Beijing 100080, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2021,12,7]]},"reference":[{"key":"2023020108551461800_btab817-B1","doi-asserted-by":"crossref","first-page":"90","DOI":"10.1038\/nchem.1243","article-title":"Quantifying the chemical beauty of drugs","volume":"4","author":"Bickerton","year":"2012","journal-title":"Nat. Chem"},{"key":"2023020108551461800_btab817-B2","doi-asserted-by":"crossref","first-page":"102269","DOI":"10.1016\/j.isci.2021.102269","article-title":"Paccmannrl: de novo generation of hit-like anticancer molecules from transcriptomic data via reinforcement learning","volume":"24","author":"Born","year":"2021","journal-title":"iScience"},{"key":"2023020108551461800_btab817-B3","first-page":"1597","author":"Chen","year":"2020"},{"key":"2023020108551461800_btab817-B4","author":"Chithrananda","year":"2020"},{"key":"2023020108551461800_btab817-B5","doi-asserted-by":"crossref","first-page":"1237","DOI":"10.1021\/acscentsci.7b00355","article-title":"Computer-assisted retrosynthesis based on molecular similarity","volume":"3","author":"Coley","year":"2017","journal-title":"ACS Central Sci"},{"key":"2023020108551461800_btab817-B6","doi-asserted-by":"crossref","first-page":"455","DOI":"10.1002\/anie.199104553","article-title":"The logic of chemical synthesis: multistep synthesis of complex carbogenic molecules (nobel lecture)","volume":"30","author":"Corey","year":"1991","journal-title":"Angew. Chem. Int. Ed. Engl"},{"key":"2023020108551461800_btab817-B7","author":"Dai","year":"2019"},{"key":"2023020108551461800_btab817-B8","doi-asserted-by":"crossref","first-page":"902","DOI":"10.1021\/acs.jcim.8b00173","article-title":"mmpdb: an open-source matched molecular pair platform for large multiproperty data sets","volume":"58","author":"Dalke","year":"2018","journal-title":"J. Chem. Inf. Model"},{"key":"2023020108551461800_btab817-B9","author":"De Cao","year":"2018"},{"key":"2023020108551461800_btab817-B10","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1111\/j.2517-6161.1977.tb01600.x","article-title":"Maximum likelihood from incomplete data via the em algorithm","volume":"39","author":"Dempster","year":"1977","journal-title":"J. R. Stat. Soc. Ser. B"},{"key":"2023020108551461800_btab817-B11","first-page":"4171","author":"Devlin","year":"2019"},{"key":"2023020108551461800_btab817-B12","first-page":"489","author":"Edunov","year":"2018"},{"key":"2023020108551461800_btab817-B13","doi-asserted-by":"crossref","first-page":"268","DOI":"10.1021\/acscentsci.7b00572","article-title":"Automatic chemical design using a data-driven continuous representation of molecules","volume":"4","author":"G\u00f3mez-Bombarelli","year":"2018","journal-title":"ACS Central Sci"},{"key":"2023020108551461800_btab817-B14","doi-asserted-by":"crossref","first-page":"321","DOI":"10.1038\/s41598-020-79682-4","article-title":"Transformer neural network for protein-specific de novo drug generation as a machine translation problem","volume":"11","author":"Grechishnikova","year":"2021","journal-title":"Sci. Rep"},{"key":"2023020108551461800_btab817-B15","first-page":"820","volume-title":"Advances in Neural Information Processing Systems","author":"He","year":"2016"},{"key":"2023020108551461800_btab817-B16","doi-asserted-by":"crossref","first-page":"2134","DOI":"10.1093\/bioinformatics\/btab080","article-title":"Generating property-matched decoy molecules using deep learning","volume":"37","author":"Imrie","year":"2021","journal-title":"Bioinformatics"},{"key":"2023020108551461800_btab817-B17","first-page":"2323","author":"Jin","year":"2018"},{"key":"2023020108551461800_btab817-B18","author":"Jin","year":"2019"},{"key":"2023020108551461800_btab817-B19","first-page":"4839","author":"Jin","year":"2020"},{"key":"2023020108551461800_btab817-B20","doi-asserted-by":"crossref","first-page":"43","DOI":"10.1021\/acs.jcim.8b00263","article-title":"Conditional molecular design with deep generative models","volume":"59","author":"Kang","year":"2019","journal-title":"J. Chem. Inf. Model"},{"key":"2023020108551461800_btab817-B21","doi-asserted-by":"crossref","first-page":"817","DOI":"10.1007\/978-3-030-30493-5_78","volume-title":"Artificial Neural Networks and Machine Learning\u2014ICANN 2019: Workshop and Special Sessions","author":"Karpov","year":"2019"},{"key":"2023020108551461800_btab817-B22","first-page":"817","volume-title":"International Conference on Artificial Neural Networks","author":"Karpov","year":"2019"},{"key":"2023020108551461800_btab817-B23","doi-asserted-by":"crossref","first-page":"254","DOI":"10.1038\/s42256-020-0174-5","article-title":"Direct steering of de novo molecular generation with descriptor conditional recurrent neural networks","volume":"2","author":"Kotsias","year":"2020","journal-title":"Nat. Mach. Intell"},{"key":"2023020108551461800_btab817-B24","first-page":"1097","article-title":"Imagenet classification with deep convolutional neural networks","volume":"25","author":"Krizhevsky","year":"2012","journal-title":"Adv. Neural Inf. Proc. Syst"},{"key":"2023020108551461800_btab817-B25","first-page":"1945","volume-title":"International Conference on Machine Learning","author":"Kusner","year":"2017"},{"key":"2023020108551461800_btab817-B26","author":"Landrum","year":"2016"},{"key":"2023020108551461800_btab817-B27","article-title":"Learn molecular representations from large-scale unlabeled molecules for drug discovery","author":"Li","year":"2020"},{"key":"2023020108551461800_btab817-B28","doi-asserted-by":"crossref","first-page":"1103","DOI":"10.1021\/acscentsci.7b00303","article-title":"Retrosynthetic reaction prediction using neural sequence-to-sequence models","volume":"3","author":"Liu","year":"2017","journal-title":"ACS Central Sci"},{"key":"2023020108551461800_btab817-B29","first-page":"7795","article-title":"Constrained graph variational autoencoders for molecule design","volume":"31","author":"Liu","year":"2018","journal-title":"Adv. Neural Inf. Process. Syst"},{"key":"2023020108551461800_btab817-B30","author":"Liu","year":"2019"},{"key":"2023020108551461800_btab817-B31","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13321-017-0235-x","article-title":"Molecular de-novo design through deep reinforcement learning","volume":"9","author":"Olivecrona","year":"2017","journal-title":"J. Cheminf"},{"key":"2023020108551461800_btab817-B32","doi-asserted-by":"crossref","first-page":"eaap7885","DOI":"10.1126\/sciadv.aap7885","article-title":"Deep reinforcement learning for de novo drug design","volume":"4","author":"Popova","year":"2018","journal-title":"Sci. Adv"},{"key":"2023020108551461800_btab817-B33","doi-asserted-by":"crossref","first-page":"742","DOI":"10.1021\/ci100050t","article-title":"Extended-connectivity fingerprints","volume":"50","author":"Rogers","year":"2010","journal-title":"J. Chem. Inf. Model"},{"key":"2023020108551461800_btab817-B34","doi-asserted-by":"crossref","first-page":"6091","DOI":"10.1039\/C8SC02339E","article-title":"\u201cfound in translation\u201d: predicting outcomes of complex organic chemistry reactions using neural sequence-to-sequence models","volume":"9","author":"Schwaller","year":"2018","journal-title":"Chem. Sci"},{"key":"2023020108551461800_btab817-B35","doi-asserted-by":"crossref","first-page":"5966","DOI":"10.1002\/chem.201605499","article-title":"Neural-symbolic machine learning for retrosynthesis and reaction prediction","volume":"23","author":"Segler","year":"2017","journal-title":"Chemistry\u2013Eur. J"},{"key":"2023020108551461800_btab817-B36","first-page":"86","author":"Sennrich","year":"2016"},{"key":"2023020108551461800_btab817-B37","first-page":"8818","volume-title":"International Conference on Machine Learning","author":"Shi","year":"2020"},{"key":"2023020108551461800_btab817-B38","doi-asserted-by":"crossref","first-page":"2324","DOI":"10.1021\/acs.jcim.5b00559","article-title":"Zinc 15\u2014ligand discovery for everyone","volume":"55","author":"Sterling","year":"2015","journal-title":"J. Chem. Inf. Model"},{"key":"2023020108551461800_btab817-B39","doi-asserted-by":"crossref","first-page":"688","DOI":"10.1016\/j.cell.2020.01.021","article-title":"A deep learning approach to antibiotic discovery","volume":"180","author":"Stokes","year":"2020","journal-title":"Cell"},{"key":"2023020108551461800_btab817-B40","first-page":"5998","volume-title":"Advances in Neural Information Processing Systems","author":"Vaswani","year":"2017"},{"key":"2023020108551461800_btab817-B41","author":"Wang","year":"2019"},{"key":"2023020108551461800_btab817-B42","first-page":"3789","author":"Xia","year":"2017"},{"key":"2023020108551461800_btab817-B43","author":"Xie","year":"2021"},{"key":"2023020108551461800_btab817-B44","first-page":"11248","volume-title":"Advances in Neural Information Processing Systems","author":"Yan","year":"2020"},{"key":"2023020108551461800_btab817-B45","first-page":"6412","author":"You","year":"2018"},{"key":"2023020108551461800_btab817-B46","doi-asserted-by":"crossref","first-page":"47","DOI":"10.1021\/acs.jcim.9b00949","article-title":"Predicting retrosynthetic reactions using self-corrected transformer neural networks","volume":"60","author":"Zheng","year":"2020","journal-title":"J. Chem. Inf. Model"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btab817\/41811801\/btab817.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/5\/1244\/49009351\/btab817.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/5\/1244\/49009351\/btab817.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,9,13]],"date-time":"2024-09-13T13:04:48Z","timestamp":1726232688000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/38\/5\/1244\/6454941"}},"subtitle":[],"editor":[{"given":"Jinbo","family":"Xu","sequence":"additional","affiliation":[],"role":[{"role":"editor","vocabulary":"crossref"}]}],"short-title":[],"issued":{"date-parts":[[2021,12,7]]},"references-count":46,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2022,2,7]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btab817","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2022,3,1]]},"published":{"date-parts":[[2021,12,7]]}}}