{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,13]],"date-time":"2025-11-13T07:26:43Z","timestamp":1763018803840,"version":"3.33.0"},"reference-count":35,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2025,1,15]],"date-time":"2025-01-15T00:00:00Z","timestamp":1736899200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,1,15]],"date-time":"2025-01-15T00:00:00Z","timestamp":1736899200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100008131","name":"Rheinische Friedrich-Wilhelms-Universit\u00e4t Bonn","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100008131","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Cheminform"],"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>Analogue series (AS) are generated during compound optimization in medicinal chemistry and are the major source of structure\u2013activity relationship (SAR) information. Pairs of active AS consisting of compounds with corresponding substituents and comparable potency progression represent SAR transfer events for the same target or across different targets. We report a new computational approach to systematically search for SAR transfer series that combines an AS alignment algorithm with context-depending similarity assessment based on vector embeddings adapted from natural language processing. The methodology comprehensively accounts for substituent similarity, identifies non-classical bioisosteres, captures substituent-property relationships, and generates accurate AS alignments. Context-dependent similarity assessment is conceptually novel in computational medicinal chemistry and should also be of interest for other applications.<\/jats:p>\n          <jats:p>\n            <jats:bold>Scientific contribution<\/jats:bold>\n          <\/jats:p>\n          <jats:p>A method is reported to systematically search for and align analogue series with SAR transfer potential. Central to the approach is the assessment of context-dependent similarity for substituents, a new concept in cheminformatics, which is based upon vector embeddings and word pair relationships adapted from natural language processing.<\/jats:p>","DOI":"10.1186\/s13321-025-00951-3","type":"journal-article","created":{"date-parts":[[2025,1,15]],"date-time":"2025-01-15T11:04:30Z","timestamp":1736939070000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Context-dependent similarity analysis of analogue series for structure\u2013activity relationship transfer based on a concept from natural language processing"],"prefix":"10.1186","volume":"17","author":[{"given":"Atsushi","family":"Yoshimori","sequence":"first","affiliation":[]},{"given":"J\u00fcrgen","family":"Bajorath","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,1,15]]},"reference":[{"key":"951_CR1","doi-asserted-by":"publisher","first-page":"3341","DOI":"10.1002\/1521-3773(20010917)40:18<3341::AID-ANIE3341>3.0.CO;2-D","volume":"40","author":"G Wess","year":"2001","unstructured":"Wess G, Urmann M, Sickenberger B (2001) Medicinal chemistry: challenges and opportunities. Angew Chem Int Ed 40:3341\u20133350. https:\/\/doi.org\/10.1002\/1521-3773(20010917)40:18%3c3341::AID-ANIE3341%3e3.0.CO;2-D","journal-title":"Angew Chem Int Ed"},{"key":"951_CR2","doi-asserted-by":"publisher","first-page":"369","DOI":"10.1038\/nrd1086","volume":"2","author":"KH Bleicher","year":"2003","unstructured":"Bleicher KH, B\u00f6hm HJ, M\u00fcller K, Alanine AI (2003) Hit and lead generation: beyond high-throughput screening. Nat Rev Drug Discov 2:369\u2013378. https:\/\/doi.org\/10.1038\/nrd1086","journal-title":"Nat Rev Drug Discov"},{"key":"951_CR3","volume-title":"The practice of medicinal chemistry, 3rd ed","year":"2011","unstructured":"Wermuth CG (ed) (2011) The practice of medicinal chemistry, 3rd ed. Academic Press-Elsevier, Burlington, San Diego, London"},{"key":"951_CR4","doi-asserted-by":"publisher","first-page":"1239","DOI":"10.1111\/j.1476-5381.2010.01127.x","volume":"162","author":"JP Hughes","year":"2011","unstructured":"Hughes JP, Rees S, Kalindjian SB, Philpott KL (2011) Principles of early drug discovery. Br J Pharmacol 162:1239\u20131249. https:\/\/doi.org\/10.1111\/j.1476-5381.2010.01127.x","journal-title":"Br J Pharmacol"},{"key":"951_CR5","volume-title":"The handbook of medicinal chemistry: principles and practice","year":"2014","unstructured":"Davis A, Ward SE (eds) (2014) The handbook of medicinal chemistry: principles and practice. Royal Society of Chemistry, London"},{"key":"951_CR6","doi-asserted-by":"publisher","first-page":"709","DOI":"10.1038\/nrd.2018.116","volume":"17","author":"J Bostr\u00f6m","year":"2018","unstructured":"Bostr\u00f6m J, Brown DG, Young RJ, Keser\u00fc GM (2018) Expanding the medicinal chemistry synthetic toolbox. Nat Rev Drug Discov 17:709\u2013727. https:\/\/doi.org\/10.1038\/nrd.2018.116","journal-title":"Nat Rev Drug Discov"},{"key":"951_CR7","doi-asserted-by":"publisher","first-page":"348","DOI":"10.1016\/j.drudis.2006.02.006","volume":"11","author":"CG Wermuth","year":"2006","unstructured":"Wermuth CG (2006) Similarity in drugs: reflections on analogue design. Drug Discov Today 11:348\u2013354. https:\/\/doi.org\/10.1016\/j.drudis.2006.02.006","journal-title":"Drug Discov Today"},{"key":"951_CR8","doi-asserted-by":"publisher","first-page":"e419","DOI":"10.1016\/j.ddtec.2013.01.002","volume":"10","author":"J Bajorath","year":"2013","unstructured":"Bajorath J (2013) Large-scale SAR analysis. Drug Discov Today: Technol 10:e419\u2013e426. https:\/\/doi.org\/10.1016\/j.ddtec.2013.01.002","journal-title":"Drug Discov Today: Technol"},{"key":"951_CR9","doi-asserted-by":"publisher","first-page":"2944","DOI":"10.1021\/jm200026b","volume":"54","author":"M Wawer","year":"2011","unstructured":"Wawer M, Bajorath J (2011) Local structural changes, global Data Views: graphical substructure\u2013activity relationship trailing. J Med Chem 54:2944\u20132951. https:\/\/doi.org\/10.1021\/jm200026b","journal-title":"J Med Chem"},{"key":"951_CR10","doi-asserted-by":"publisher","first-page":"339","DOI":"10.1021\/ci900450m","volume":"50","author":"J Hussain","year":"2010","unstructured":"Hussain J, Rea C (2010) Computationally efficient algorithm to identify matched molecular pairs (MMPs) in large data sets. J Chem Inf Model 50:339\u2013348. https:\/\/doi.org\/10.1021\/ci900450m","journal-title":"J Chem Inf Model"},{"key":"951_CR11","doi-asserted-by":"publisher","first-page":"1027","DOI":"10.1021\/acsomega.8b03390","volume":"4","author":"JJ Naveja","year":"2019","unstructured":"Naveja JJ, Vogt M, Stumpfe D, Medina-Franco JL, Bajorath J (2019) Systematic extraction of analogue series from large compound collections using a new computational compound\u2212core relationship method. ACS Omega 4:1027\u20131032. https:\/\/doi.org\/10.1021\/acsomega.8b03390","journal-title":"ACS Omega"},{"key":"951_CR12","doi-asserted-by":"publisher","first-page":"1857","DOI":"10.1021\/ci200254k","volume":"51","author":"AM Wassermann","year":"2011","unstructured":"Wassermann AM, Bajorath J (2011) A data mining method to facilitate SAR transfer. J Chem Inf Model 51:1857\u20131866. https:\/\/doi.org\/10.1021\/ci200254k","journal-title":"J Chem Inf Model"},{"key":"951_CR13","doi-asserted-by":"publisher","first-page":"3138","DOI":"10.1021\/ci300481d","volume":"52","author":"B Zhang","year":"2012","unstructured":"Zhang B, Wassermann AM, Vogt M, Bajorath J (2012) Systematic assessment of compound series with SAR transfer potential. J Chem Inf Model 52:3138\u20133143","journal-title":"J Chem Inf Model"},{"key":"951_CR14","doi-asserted-by":"publisher","first-page":"1388","DOI":"10.1021\/ci300481d","volume":"63","author":"D Bonnani","year":"2020","unstructured":"Bonnani D, Lolli ML, Bajorath J (2020) Computational method for structure-based analysis of SAR transfer. J Med Chem 63:1388\u20131396. https:\/\/doi.org\/10.1021\/ci300481d","journal-title":"J Med Chem"},{"key":"951_CR15","doi-asserted-by":"publisher","DOI":"10.1016\/j.ejmech.2022.114558","volume":"239","author":"A Yoshimori","year":"2022","unstructured":"Yoshimori A, Bajorath J (2022) Computational method for the systematic alignment of analogue series with structure-activity relationship transfer potential across different targets. Eur J Med Chem 239:114558. https:\/\/doi.org\/10.1016\/j.ejmech.2022.114558","journal-title":"Eur J Med Chem"},{"key":"951_CR16","doi-asserted-by":"publisher","first-page":"D1100","DOI":"10.1093\/nar\/gkr777","volume":"40","author":"A Gaulton","year":"2012","unstructured":"Gaulton A, Bellis LJ, Bento AP et al (2012) ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res 40:D1100\u2013D1107. https:\/\/doi.org\/10.1093\/nar\/gkr777","journal-title":"Nucleic Acids Res"},{"key":"951_CR17","unstructured":"RDKit: cheminformatics and machine learning software. (2021) http:\/\/www.rdkit.org\/. Accessed 01 July 2024."},{"key":"951_CR18","doi-asserted-by":"publisher","first-page":"107","DOI":"10.1021\/c160017a018","volume":"5","author":"HL Morgan","year":"1965","unstructured":"Morgan HL (1965) The generation of a unique machine description for chemical structures\u2014a technique developed at Chemical Abstracts Service. J Chem Doc 5:107\u2013113","journal-title":"J Chem Doc"},{"key":"951_CR19","doi-asserted-by":"publisher","first-page":"509","DOI":"10.1021\/ci300513m","volume":"53","author":"M Awale","year":"2013","unstructured":"Awale M, van Deursen R, Reymond JL (2013) MQN-Mapplet: visualization of chemical space with interactive maps of DrugBank, ChEMBL, PubChem, GDB-11, and GDB-13. J Chem Inf Model 53:509\u2013518. https:\/\/doi.org\/10.1021\/ci300513m","journal-title":"J Chem Inf Model"},{"key":"951_CR20","doi-asserted-by":"publisher","first-page":"1115","DOI":"10.1126\/science.132.3434.1115","volume":"132","author":"DJ Rogers","year":"1960","unstructured":"Rogers DJ, Tanimoto TT (1960) A computer program for classifying plants. Science 132:1115\u20131118. https:\/\/doi.org\/10.1126\/science.132.3434.1115","journal-title":"Science"},{"key":"951_CR21","doi-asserted-by":"publisher","first-page":"443","DOI":"10.1016\/0022-2836(70)90057-4","volume":"48","author":"SB Needleman","year":"1970","unstructured":"Needleman SB, Wunsch CD (1970) A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol 48:443\u2013453. https:\/\/doi.org\/10.1016\/0022-2836(70)90057-4","journal-title":"J Mol Biol"},{"key":"951_CR22","doi-asserted-by":"publisher","first-page":"261","DOI":"10.1126\/science.aaa86","volume":"349","author":"J Hirschberg","year":"2015","unstructured":"Hirschberg J, Manning CD (2015) Advances in natural language processing. Science 349:261\u2013266. https:\/\/doi.org\/10.1126\/science.aaa86","journal-title":"Science"},{"key":"951_CR23","unstructured":"Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. Preprint at arXiv:1301.3781v3."},{"key":"951_CR24","doi-asserted-by":"publisher","first-page":"27","DOI":"10.56471\/slujst.v4i.266","volume":"4","author":"HD Abubakar","year":"2022","unstructured":"Abubakar HD, Umar M, Bakale MA (2022) Sentiment classification: review of text vectorization methods: bag of words, Tf-Idf, word2vec and doc2vec. SLU J Sci Technol 4:27\u201333. https:\/\/doi.org\/10.56471\/slujst.v4i.266","journal-title":"SLU J Sci Technol"},{"key":"951_CR25","unstructured":"Rehurek R, Sojka P (2011) Gensim\u2013Python framework for vector space modelling. NLP Centre, Faculty of Informatics, Masaryk University, Brno, Czech Republic. https:\/\/radimrehurek.com\/gensim\/. Accessed 20 Aug 2024."},{"key":"951_CR26","doi-asserted-by":"publisher","first-page":"16","DOI":"10.1109\/ACCESS.2021.3137960","volume":"10","author":"JA Sterling","year":"2021","unstructured":"Sterling JA, Montemore MM (2021) Combining citation network information and text similarity for research article recommender systems. IEEE Access 10:16\u201323. https:\/\/doi.org\/10.1109\/ACCESS.2021.3137960","journal-title":"IEEE Access"},{"key":"951_CR27","doi-asserted-by":"publisher","first-page":"7061","DOI":"10.1021\/acsomega.9b00595","volume":"4","author":"A Yoshimori","year":"2019","unstructured":"Yoshimori A, Tanoue T, Bajorath J (2019) Integrating the structure\u2013activity relationship matrix method with molecular grid maps and activity landscape models for medicinal chemistry applications. ACS Omega 4:7061\u20137069. https:\/\/doi.org\/10.1021\/acsomega.9b00595","journal-title":"ACS Omega"},{"key":"951_CR28","first-page":"2579","volume":"9","author":"L Van der Maaten","year":"2008","unstructured":"Van der Maaten L, Hinton G (2008) Visualizing data using t-SNE. J Mach Learn Res 9:2579\u20132605","journal-title":"J Mach Learn Res"},{"key":"951_CR29","first-page":"2825","volume":"12","author":"F Pedregosa","year":"2011","unstructured":"Pedregosa F, Varoquaux G, Gramfort A et al (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825\u20132830","journal-title":"J Mach Learn Res"},{"key":"951_CR30","doi-asserted-by":"publisher","first-page":"325","DOI":"10.1007\/978-3-642-73778-7_164","volume":"38","author":"R Jonker","year":"1987","unstructured":"Jonker R, Volgenant A (1987) A shortest augmenting path algorithm for dense and sparse linear assignment problems. Computing 38:325\u2013340. https:\/\/doi.org\/10.1007\/978-3-642-73778-7_164","journal-title":"Computing"},{"key":"951_CR31","doi-asserted-by":"publisher","first-page":"3147","DOI":"10.1021\/cr950066q","volume":"96","author":"GA Patani","year":"1996","unstructured":"Patani GA, LaVoie EJ (1996) Bioisosterism: a rational approach in drug design. Chem Rev 96:3147\u20133176. https:\/\/doi.org\/10.1021\/cr950066q","journal-title":"Chem Rev"},{"key":"951_CR32","volume-title":"Medicinal chemistry, 3rd ed","year":"1970","unstructured":"Burger A (ed) (1970) Medicinal chemistry, 3rd ed. Burger, Wiley-Interscience, New York"},{"key":"951_CR33","doi-asserted-by":"publisher","first-page":"14046","DOI":"10.1021\/acs.jmedchem.1c01215","volume":"64","author":"MAM Subbaiah","year":"2021","unstructured":"Subbaiah MAM, Meanwell NA (2021) Bioisosteres of the phenyl ring: recent strategic applications in lead optimization and drug design. J Med Chem 64:14046\u201314128. https:\/\/doi.org\/10.1021\/acs.jmedchem.1c01215","journal-title":"J Med Chem"},{"key":"951_CR34","doi-asserted-by":"publisher","first-page":"5049","DOI":"10.1021\/acs.jmedchem.9b00252","volume":"62","author":"C Tseng","year":"2019","unstructured":"Tseng C, Baillie G, Donvito G et al (2019) The trifluoromethyl group as a bioisosteric replacement of the aliphatic nitro group in CB1 receptor positive allosteric modulators. J Med Chem 62:5049\u20135062. https:\/\/doi.org\/10.1021\/acs.jmedchem.9b00252","journal-title":"J Med Chem"},{"key":"951_CR35","doi-asserted-by":"publisher","first-page":"463","DOI":"10.1021\/jm00214a001","volume":"20","author":"JG Topliss","year":"1977","unstructured":"Topliss JG (1977) A manual method for applying the Hansch approach to drug design. J Med Chem 20:463\u2013469","journal-title":"J Med Chem"}],"container-title":["Journal of Cheminformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-025-00951-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s13321-025-00951-3\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-025-00951-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,1,15]],"date-time":"2025-01-15T11:04:37Z","timestamp":1736939077000},"score":1,"resource":{"primary":{"URL":"https:\/\/jcheminf.biomedcentral.com\/articles\/10.1186\/s13321-025-00951-3"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,1,15]]},"references-count":35,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["951"],"URL":"https:\/\/doi.org\/10.1186\/s13321-025-00951-3","relation":{},"ISSN":["1758-2946"],"issn-type":[{"value":"1758-2946","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,1,15]]},"assertion":[{"value":"4 October 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"7 January 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"15 January 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"5"}}