{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,11]],"date-time":"2026-04-11T19:09:44Z","timestamp":1775934584848,"version":"3.50.1"},"reference-count":38,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2022,3,28]],"date-time":"2022-03-28T00:00:00Z","timestamp":1648425600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,3,28]],"date-time":"2022-03-28T00:00:00Z","timestamp":1648425600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Cheminform"],"published-print":{"date-parts":[[2022,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>\n                    Molecular optimization aims to improve the drug profile of a starting molecule. It is a fundamental problem in drug discovery but challenging due to (i) the requirement of simultaneous optimization of multiple properties and (ii) the large chemical space to explore. Recently, deep learning methods have been proposed to solve this task by mimicking the chemist\u2019s intuition in terms of matched molecular pairs (MMPs). Although MMPs is a widely used strategy by medicinal chemists, it offers limited capability in terms of exploring the space of structural modifications, therefore does not cover the complete space of solutions. Often more general transformations beyond the nature of MMPs are feasible and\/or necessary,\n                    <jats:italic>e<\/jats:italic>\n                    .\n                    <jats:italic>g<\/jats:italic>\n                    .\u00a0simultaneous modifications of the starting molecule at different places including the core scaffold. This study aims to provide a general methodology that offers more general structural modifications beyond MMPs. In particular, the same Transformer architecture is trained on different datasets. These datasets consist of a set of molecular pairs which reflect different types of transformations. Beyond MMP transformation, datasets reflecting general structural changes are constructed from ChEMBL based on two approaches: Tanimoto similarity (allows for multiple modifications) and scaffold matching (allows for multiple modifications but keep the scaffold constant) respectively. We investigate how the model behavior can be altered by tailoring the dataset while using the same model architecture. Our results show that the models trained on differently prepared datasets transform a given starting molecule in a way that it reflects the nature of the dataset used for training the model. These models could complement each other and unlock the capability for the chemists to pursue different options for improving a starting molecule.\n                  <\/jats:p>","DOI":"10.1186\/s13321-022-00599-3","type":"journal-article","created":{"date-parts":[[2022,3,28]],"date-time":"2022-03-28T22:04:48Z","timestamp":1648505088000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":56,"title":["Transformer-based molecular optimization beyond matched molecular pairs"],"prefix":"10.1186","volume":"14","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-5848-8318","authenticated-orcid":false,"given":"Jiazhen","family":"He","sequence":"first","affiliation":[]},{"given":"Eva","family":"Nittinger","sequence":"additional","affiliation":[]},{"given":"Christian","family":"Tyrchan","sequence":"additional","affiliation":[]},{"given":"Werngard","family":"Czechtizky","sequence":"additional","affiliation":[]},{"given":"Atanas","family":"Patronov","sequence":"additional","affiliation":[]},{"given":"Esben Jannik","family":"Bjerrum","sequence":"additional","affiliation":[]},{"given":"Ola","family":"Engkvist","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,3,28]]},"reference":[{"issue":"8","key":"599_CR1","doi-asserted-by":"publisher","first-page":"675","DOI":"10.1007\/s10822-013-9672-4","volume":"27","author":"PG Polishchuk","year":"2013","unstructured":"Polishchuk PG, Madzhidov TI, Varnek A (2013) Estimation of the size of drug-like chemical space based on gdb-17 data. J comput Aided Mol Des 27(8):675\u2013679","journal-title":"J comput Aided Mol Des"},{"issue":"10","key":"599_CR2","doi-asserted-by":"publisher","first-page":"1006","DOI":"10.1021\/jm00280a002","volume":"15","author":"JG Topliss","year":"1972","unstructured":"Topliss JG (1972) Utilization of operational schemes for analog synthesis in drug design. J Med Chem 15(10):1006\u20131011","journal-title":"J Med Chem"},{"issue":"1","key":"599_CR3","doi-asserted-by":"publisher","first-page":"120","DOI":"10.1021\/acscentsci.7b00512","volume":"4","author":"MH Segler","year":"2018","unstructured":"Segler MH, Kogej T, Tyrchan C, Waller MP (2018) Generating focused molecule libraries for drug discovery with recurrent neural networks. ACS Central Sci 4(1):120\u2013131","journal-title":"ACS Central Sci"},{"issue":"1\u20132","key":"599_CR4","doi-asserted-by":"publisher","first-page":"1700111","DOI":"10.1002\/minf.201700111","volume":"37","author":"A Gupta","year":"2018","unstructured":"Gupta A, M\u00fcller AT, Huisman BJ, Fuchs JA, Schneider P, Schneider G (2018) Generative recurrent networks for de novo drug design. Mol Inform 37(1\u20132):1700111","journal-title":"Mol Inform"},{"key":"599_CR5","unstructured":"Bjerrum EJ, Threlfall R (2017) Molecular generation with recurrent neural networks (RNNs). arXiv preprint arXiv:1705.04612"},{"issue":"2","key":"599_CR6","doi-asserted-by":"publisher","first-page":"268","DOI":"10.1021\/acscentsci.7b00572","volume":"4","author":"R G\u00f3mez-Bombarelli","year":"2018","unstructured":"G\u00f3mez-Bombarelli R, Wei JN, Duvenaud D, Hern\u00e1ndez-Lobato JM, S\u00e1nchez-Lengeling B, Sheberla D, Aguilera-Iparraguirre J, Hirzel TD, Adams RP, Aspuru-Guzik A (2018) Automatic chemical design using a data-driven continuous representation of molecules. ACS Central Sci 4(2):268\u2013276","journal-title":"ACS Central Sci"},{"key":"599_CR7","unstructured":"Dai H, Tian Y, Dai B, Skiena S, Song L (2018) Syntax-directed variational autoencoder for molecule generation. In: Proceedings of the international conference on learning representations"},{"issue":"1","key":"599_CR8","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13321-018-0286-7","volume":"10","author":"J Lim","year":"2018","unstructured":"Lim J, Ryu S, Kim JW, Kim WY (2018) Molecular generative model based on conditional variational autoencoder for de novo molecular design. J Cheminform 10(1):1\u20139","journal-title":"J Cheminform"},{"key":"599_CR9","unstructured":"Jin W, Barzilay R, Jaakkola T (2018) Junction tree variational autoencoder for molecular graph generation. In: International Conference on Machine Learning, pp. 2323\u20132332"},{"key":"599_CR10","unstructured":"Liu Q, Allamanis M, Brockschmidt M, Gaunt A (2018) Constrained graph variational autoencoders for molecule design. In: Advances in neural information processing systems, pp. 7795\u20137804"},{"key":"599_CR11","doi-asserted-by":"crossref","unstructured":"Simonovsky M, Komodakis N (2018) Graphvae: Towards generation of small graphs using variational autoencoders. In: International conference on artificial neural networks, pp. 412\u2013422 . Springer","DOI":"10.1007\/978-3-030-01418-6_41"},{"key":"599_CR12","unstructured":"Guimaraes GL, Sanchez-Lengeling B, Outeiral C, Farias P.L.C., Aspuru-Guzik A (2017) Objective-reinforced generative adversarial networks (organ) for sequence generation models. arXiv preprint arXiv:1705.10843"},{"issue":"6","key":"599_CR13","doi-asserted-by":"publisher","first-page":"1194","DOI":"10.1021\/acs.jcim.7b00690","volume":"58","author":"E Putin","year":"2018","unstructured":"Putin E, Asadulaev A, Ivanenkov Y, Aladinskiy V, Sanchez-Lengeling B, Aspuru-Guzik A, Zhavoronkov A (2018) Reinforced adversarial neural computer for de novo molecular design. J Chem Inf Model 58(6):1194\u20131204","journal-title":"J Chem Inf Model"},{"issue":"10","key":"599_CR14","doi-asserted-by":"publisher","first-page":"4386","DOI":"10.1021\/acs.molpharmaceut.7b01137","volume":"15","author":"E Putin","year":"2018","unstructured":"Putin E, Asadulaev A, Vanhaelen Q, Ivanenkov Y, Aladinskaya AV, Aliper A, Zhavoronkov A (2018) Adversarial threshold neural computer for molecular de novo design. Mol Pharm 15(10):4386\u20134397","journal-title":"Mol Pharm"},{"key":"599_CR15","unstructured":"De\u00a0Cao N, Kipf T (2018) MolGAN: An implicit generative model for small molecular graphs. In: ICML 2018 workshop on theoretical foundations and applications of deep generative models"},{"issue":"1","key":"599_CR16","doi-asserted-by":"publisher","first-page":"48","DOI":"10.1186\/s13321-017-0235-x","volume":"9","author":"M Olivecrona","year":"2017","unstructured":"Olivecrona M, Blaschke T, Engkvist O, Chen H (2017) Molecular de-novo design through deep reinforcement learning. J Cheminform 9(1):48","journal-title":"J Cheminform"},{"key":"599_CR17","unstructured":"Jin W, Yang K, Barzilay R, Jaakkola T (2018) Learning multimodal graph-to-graph translation for molecule optimization. In: International conference on learning representations"},{"issue":"9","key":"599_CR18","doi-asserted-by":"publisher","first-page":"3098","DOI":"10.1021\/acs.molpharmaceut.7b00346","volume":"14","author":"A Kadurin","year":"2017","unstructured":"Kadurin A, Nikolenko S, Khrabrov K, Aliper A, Zhavoronkov A (2017) druGAN: an advanced generative adversarial autoencoder model for de novo generation of new molecules with desired molecular properties in silico. Mol Pharm 14(9):3098\u20133104","journal-title":"Mol Pharm"},{"issue":"1\u20132","key":"599_CR19","doi-asserted-by":"publisher","first-page":"1700123","DOI":"10.1002\/minf.201700123","volume":"37","author":"T Blaschke","year":"2018","unstructured":"Blaschke T, Olivecrona M, Engkvist O, Bajorath J, Chen H (2018) Application of generative autoencoder in de novo molecular design. Mol Inform 37(1\u20132):1700123","journal-title":"Mol Inform"},{"issue":"34","key":"599_CR20","doi-asserted-by":"publisher","first-page":"8016","DOI":"10.1039\/C9SC01928F","volume":"10","author":"R Winter","year":"2019","unstructured":"Winter R, Montanari F, Steffen A, Briem H, No\u00e9 F, Clevert D-A (2019) Efficient multi-objective molecular optimization in a continuous latent space. Chem Sci 10(34):8016\u20138024","journal-title":"Chem Sci"},{"issue":"1","key":"599_CR21","doi-asserted-by":"publisher","first-page":"33","DOI":"10.1186\/s13321-018-0287-6","volume":"10","author":"Y Li","year":"2018","unstructured":"Li Y, Zhang L, Liu Z (2018) Multi-objective de novo drug design with conditional graph generative model. J Cheminform 10(1):33","journal-title":"J Cheminform"},{"issue":"5","key":"599_CR22","doi-asserted-by":"publisher","first-page":"254","DOI":"10.1038\/s42256-020-0174-5","volume":"2","author":"P-C Kotsias","year":"2020","unstructured":"Kotsias P-C, Ar\u00fas-Pous J, Chen H, Engkvist O, Tyrchan C, Bjerrum EJ (2020) Direct steering of de novo molecular generation with descriptor conditional recurrent neural networks. Nat Mach Intell 2(5):254\u2013265","journal-title":"Nat Mach Intell"},{"key":"599_CR23","unstructured":"Jin W, Barzilay R, Jaakkola T (2019) Hierarchical graph-to-graph translation for molecules. arXiv, 1907"},{"key":"599_CR24","unstructured":"Jin W, Barzilay R, Jaakkola T (2020) Hierarchical generation of molecular graphs using structural motifs. In: International conference on machine learning, pp. 4839\u20134848 . PMLR"},{"issue":"1","key":"599_CR25","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13321-021-00497-0","volume":"13","author":"J He","year":"2021","unstructured":"He J, You H, Sandstr\u00f6m E, Nittinger E, Bjerrum EJ, Tyrchan C, Czechtizky W, Engkvist O (2021) Molecular optimization by capturing chemist\u2019s intuition using deep neural networks. J Cheminform 13(1):1\u201317","journal-title":"J Cheminform"},{"key":"599_CR26","doi-asserted-by":"crossref","unstructured":"He J, Mattsson F, Forsberg M, Bjerrum E.J., Engkvist O, Tyrchan C, Czechtizky W, et al. (2021) Transformer neural network for structure constrained molecular optimization. In: ICLR 2021 workshop: machine learning for preventing and combating pandemics","DOI":"10.26434\/chemrxiv.14416133"},{"issue":"1","key":"599_CR27","doi-asserted-by":"publisher","first-page":"31","DOI":"10.1021\/ci00057a005","volume":"28","author":"D Weininger","year":"1988","unstructured":"Weininger D (1988) Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. J Chem Inf Comput Sci 28(1):31\u201336","journal-title":"J Chem Inf Comput Sci"},{"key":"599_CR28","unstructured":"Sutskever I, Vinyals O, Le Q.V. (2014) Sequence to sequence learning with neural networks. In: Advances in neural information processing systems, pp. 3104\u20133112"},{"key":"599_CR29","unstructured":"Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A.N., Kaiser \u0141, Polosukhin I (2017) Attention is all you need. In: Advances in neural information processing systems, pp. 5998\u20136008"},{"key":"599_CR30","doi-asserted-by":"publisher","first-page":"271","DOI":"10.1002\/3527603743.ch11","volume":"23","author":"PW Kenny","year":"2005","unstructured":"Kenny PW, Sadowski J (2005) Structure modification in chemical databases. Chemoinform Drug Discov 23:271\u2013285","journal-title":"Chemoinform Drug Discov"},{"key":"599_CR31","doi-asserted-by":"publisher","first-page":"86","DOI":"10.1016\/j.csbj.2016.12.003","volume":"15","author":"C Tyrchan","year":"2017","unstructured":"Tyrchan C, Evertsson E (2017) Matched molecular pair analysis in short: algorithms, applications and limitations. Comput Structl Biotechnol J 15:86\u201390","journal-title":"Comput Structl Biotechnol J"},{"issue":"15","key":"599_CR32","doi-asserted-by":"publisher","first-page":"2887","DOI":"10.1021\/jm9602928","volume":"39","author":"GW Bemis","year":"1996","unstructured":"Bemis GW, Murcko MA (1996) The properties of known drugs. 1. molecular frameworks. J Med Chem 39(15):2887\u20132893","journal-title":"J Med Chem"},{"issue":"D1","key":"599_CR33","doi-asserted-by":"publisher","first-page":"930","DOI":"10.1093\/nar\/gky1075","volume":"47","author":"D Mendez","year":"2019","unstructured":"Mendez D, Gaulton A, Bento AP, Chambers J, De Veij M, F\u00e9lix E, Magari\u00f1os MP, Mosquera JF, Mutowo P, Nowotka M et al (2019) Chembl: towards direct deposition of bioassay data. Nucl Acids Res 47(D1):930\u2013940","journal-title":"Nucl Acids Res"},{"issue":"12","key":"599_CR34","doi-asserted-by":"publisher","first-page":"948","DOI":"10.1038\/nrd4128","volume":"12","author":"JG Cumming","year":"2013","unstructured":"Cumming JG, Davis AM, Muresan S, Haeberlein M, Chen H (2013) Chemical predictive modelling to improve compound quality. Nat Rev Drug Discov 12(12):948\u2013962","journal-title":"Nat Rev Drug Discov"},{"issue":"23","key":"599_CR35","doi-asserted-by":"publisher","first-page":"14425","DOI":"10.1021\/acs.jmedchem.0c01332","volume":"63","author":"A Schuffenhauer","year":"2020","unstructured":"Schuffenhauer A, Schneider N, Hintermann S, Auld D, Blank J, Cotesta S, Engeloch C, Fechner N, Gaul C, Giovannoni J et al (2020) Evolution of Novartis\u2019 small molecule screening deck design. J Med Chem 63(23):14425\u201314447","journal-title":"J Med Chem"},{"issue":"5","key":"599_CR36","doi-asserted-by":"publisher","first-page":"902","DOI":"10.1021\/acs.jcim.8b00173","volume":"58","author":"A Dalke","year":"2018","unstructured":"Dalke A, Hert J, Kramer C (2018) mmpdb: an open-source matched molecular pair platform for large multiproperty data sets. J Chem Inf Model 58(5):902\u2013910","journal-title":"J Chem Inf Model"},{"issue":"1","key":"599_CR37","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13321-021-00525-z","volume":"13","author":"D Gogishvili","year":"2021","unstructured":"Gogishvili D, Nittinger E, Margreitter C, Tyrchan C (2021) Nonadditivity in public and inhouse data: implications for drug design. J Cheminform 13(1):1\u201318","journal-title":"J Cheminform"},{"issue":"8","key":"599_CR38","doi-asserted-by":"publisher","first-page":"3370","DOI":"10.1021\/acs.jcim.9b00237","volume":"59","author":"K Yang","year":"2019","unstructured":"Yang K, Swanson K, Jin W, Coley C, Eiden P, Gao H, Guzman-Perez A, Hopper T, Kelley B, Mathea M et al (2019) Analyzing learned molecular representations for property prediction. J Chem Inf Model 59(8):3370\u20133388","journal-title":"J Chem Inf Model"}],"container-title":["Journal of Cheminformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-022-00599-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s13321-022-00599-3\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-022-00599-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,3,28]],"date-time":"2022-03-28T22:16:34Z","timestamp":1648505794000},"score":1,"resource":{"primary":{"URL":"https:\/\/jcheminf.biomedcentral.com\/articles\/10.1186\/s13321-022-00599-3"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,3,28]]},"references-count":38,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2022,12]]}},"alternative-id":["599"],"URL":"https:\/\/doi.org\/10.1186\/s13321-022-00599-3","relation":{"has-preprint":[{"id-type":"doi","id":"10.26434\/chemrxiv-2021-z8rk6","asserted-by":"object"}]},"ISSN":["1758-2946"],"issn-type":[{"value":"1758-2946","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,3,28]]},"assertion":[{"value":"24 November 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"11 March 2022","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"28 March 2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"18"}}