{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,5,27]],"date-time":"2025-05-27T04:12:36Z","timestamp":1748319156706,"version":"3.41.0"},"reference-count":28,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2025,5,26]],"date-time":"2025-05-26T00:00:00Z","timestamp":1748217600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,5,26]],"date-time":"2025-05-26T00:00:00Z","timestamp":1748217600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100008131","name":"Rheinische Friedrich-Wilhelms-Universit\u00e4t Bonn","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100008131","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Cheminform"],"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>Similarity searching is a mainstay in cheminformatics that is generally used to identify compounds with desired properties. For small molecular fragments, similarity calculations based on standard descriptors often have limited utility for establishing meaningful similarity relationships due to feature sparseness. As an alternative, we have adapted the concept of context-dependent word pair similarity from natural language processing to evaluate similarity relationships between substituents (R-groups) taking latent characteristics into account. Context-dependent similarity assessment is based on vector embeddings as fragment representations generated using neural networks. With active analogue series as a model system to establish a global structure\u2013activity context, we demonstrate that this approach is applicable to systematic similarity searching for substituents and increases the performance of standard descriptor representations. Context-dependent similarity searching is capable of detecting remote and functionally relevant similarity relationships between substituents. Alternative search queries are introduced focusing on individual substituents within a global substituent context or individual sequences of substituents establishing a local context. For similarity searching, different structural or structure\u2013property contexts can be established, providing opportunities for various applications.<\/jats:p>","DOI":"10.1186\/s13321-025-01032-1","type":"journal-article","created":{"date-parts":[[2025,5,26]],"date-time":"2025-05-26T13:58:25Z","timestamp":1748267905000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Context-dependent similarity searching for small molecular fragments"],"prefix":"10.1186","volume":"17","author":[{"given":"Atsushi","family":"Yoshimori","sequence":"first","affiliation":[]},{"given":"J\u00fcrgen","family":"Bajorath","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,5,26]]},"reference":[{"key":"1032_CR1","doi-asserted-by":"publisher","first-page":"3","DOI":"10.1002\/aris.2009.1440430108","volume":"43","author":"P Willett","year":"2009","unstructured":"Willett P (2009) Similarity methods in chemoinformatics. Annu Rev Inform Sci Technol 43:3\u201371. https:\/\/doi.org\/10.1002\/aris.2009.1440430108","journal-title":"Annu Rev Inform Sci Technol"},{"key":"1032_CR2","doi-asserted-by":"publisher","first-page":"3186","DOI":"10.1021\/jm401411z","volume":"57","author":"G Maggiora","year":"2014","unstructured":"Maggiora G, Vogt M, Stumpfe D, Bajorath J (2014) Molecular similarity in medicinal chemistry. J Med Chem 57:3186\u20133204. https:\/\/doi.org\/10.1021\/jm401411z","journal-title":"J Med Chem"},{"key":"1032_CR3","doi-asserted-by":"publisher","first-page":"26","DOI":"10.1021\/acs.jcim.0c01301","volume":"61","author":"E L\u00f3pez-L\u00f3pez","year":"2020","unstructured":"L\u00f3pez-L\u00f3pez E, Bajorath J, Medina-Franco JL (2020) Informatics for chemistry, biology, and biomedical sciences. J Chem Inf Model 61:26\u201335. https:\/\/doi.org\/10.1021\/acs.jcim.0c01301","journal-title":"J Chem Inf Model"},{"key":"1032_CR4","volume-title":"Concepts and applications of molecular similarity","year":"1990","unstructured":"Johnson M, Maggiora GM (eds) (1990) Concepts and applications of molecular similarity. Wiley, New York"},{"key":"1032_CR5","doi-asserted-by":"publisher","first-page":"260","DOI":"10.1002\/wcms.23","volume":"1","author":"D Stumpfe","year":"2011","unstructured":"Stumpfe D, Bajorath J (2011) Similarity searching. WIRES: Comput Mol Sci 1:260\u2013282. https:\/\/doi.org\/10.1002\/wcms.23","journal-title":"WIRES: Comput Mol Sci"},{"key":"1032_CR6","doi-asserted-by":"publisher","first-page":"113","DOI":"10.1016\/S0079-6468(10)49004-9","volume":"49","author":"P Gedeck","year":"2010","unstructured":"Gedeck P, Kramer C, Ertl P (2010) Computational analysis of structure\u2013activity relationships. Prog Med Chem 49:113\u2013160. https:\/\/doi.org\/10.1016\/S0079-6468(10)49004-9","journal-title":"Prog Med Chem"},{"key":"1032_CR7","doi-asserted-by":"publisher","first-page":"e419","DOI":"10.1016\/j.ddtec.2013.01.002","volume":"10","author":"J Bajorath","year":"2013","unstructured":"Bajorath J (2013) Large-scale SAR analysis. Drug Discov Today: Technol 10:e419\u2013e426. https:\/\/doi.org\/10.1016\/j.ddtec.2013.01.002","journal-title":"Drug Discov Today: Technol"},{"key":"1032_CR8","volume-title":"The practice of medicinal chemistry","year":"2011","unstructured":"Wermuth CG (ed) (2011) The practice of medicinal chemistry, 3rd edn. Academic Press, Burlington","edition":"3"},{"key":"1032_CR9","doi-asserted-by":"publisher","first-page":"369","DOI":"10.1038\/nrd1086","volume":"2","author":"KH Bleicher","year":"2003","unstructured":"Bleicher KH, B\u00f6hm HJ, M\u00fcller K, Alanine AI (2003) Hit and lead generation: beyond high-throughput screening. Nat Rev Drug Discov 2:369\u2013378. https:\/\/doi.org\/10.1038\/nrd1086","journal-title":"Nat Rev Drug Discov"},{"key":"1032_CR10","doi-asserted-by":"publisher","first-page":"348","DOI":"10.1016\/j.drudis.2006.02.006","volume":"11","author":"CG Wermuth","year":"2006","unstructured":"Wermuth CG (2006) Similarity in drugs: reflections on analogue design. Drug Discov Today 11:348\u2013354. https:\/\/doi.org\/10.1016\/j.drudis.2006.02.006","journal-title":"Drug Discov Today"},{"key":"1032_CR11","doi-asserted-by":"publisher","first-page":"1027","DOI":"10.1021\/acsomega.8b03390","volume":"4","author":"JJ Naveja","year":"2019","unstructured":"Naveja JJ, Vogt M, Stumpfe D, Medina-Franco JL, Bajorath J (2019) Systematic extraction of analogue series from large compound collections using a new computational compound\u2212core relationship method. ACS Omega 4:1027\u20131032. https:\/\/doi.org\/10.1021\/acsomega.8b03390","journal-title":"ACS Omega"},{"key":"1032_CR12","doi-asserted-by":"publisher","first-page":"114558","DOI":"10.1016\/j.ejmech.2022.114558","volume":"239","author":"A Yoshimori","year":"2022","unstructured":"Yoshimori A, Bajorath J (2022) Computational method for the systematic alignment of analogue series with structure\u2013activity relationship transfer potential across different targets. Eur J Med Chem 239:114558. https:\/\/doi.org\/10.1016\/j.ejmech.2022.114558","journal-title":"Eur J Med Chem"},{"key":"1032_CR13","doi-asserted-by":"publisher","first-page":"118","DOI":"10.1021\/ci950274j","volume":"36","author":"SK Kearsley","year":"1996","unstructured":"Kearsley SK, Sallamack S, Fluder EM, Andose JD, Mosley RT, Sheridan RP (1996) Chemical similarity searching using physicochemical property descriptors. J Chem Inf Comput Sci 36:118\u2013127. https:\/\/doi.org\/10.1021\/ci950274j","journal-title":"J Chem Inf Comput Sci"},{"key":"1032_CR14","doi-asserted-by":"publisher","first-page":"983","DOI":"10.1021\/ci9800211","volume":"38","author":"P Willett","year":"1998","unstructured":"Willett P, Barnard JM, Downs GM (1998) Chemical similarity searching. J Chem Inf Comput Sci 38:983\u2013996. https:\/\/doi.org\/10.1021\/ci9800211","journal-title":"J Chem Inf Comput Sci"},{"key":"1032_CR15","doi-asserted-by":"publisher","first-page":"1115","DOI":"10.1126\/science.132.3434.1115","volume":"132","author":"DJ Rogers","year":"1960","unstructured":"Rogers DJ, Tanimoto TT (1960) A computer program for classifying plants. Science 132:1115\u20131118. https:\/\/doi.org\/10.1126\/science.132.3434.1115","journal-title":"Science"},{"key":"1032_CR16","unstructured":"Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. Preprint at arXiv:1301.3781v3"},{"key":"1032_CR17","doi-asserted-by":"publisher","first-page":"261","DOI":"10.1126\/science.aaa86","volume":"349","author":"J Hirschberg","year":"2015","unstructured":"Hirschberg J, Manning CD (2015) Advances in natural language processing. Science 349:261\u2013266. https:\/\/doi.org\/10.1126\/science.aaa86","journal-title":"Science"},{"key":"1032_CR18","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1186\/s13321-025-00951-3","volume":"17","author":"A Yoshimori","year":"2025","unstructured":"Yoshimori A, Bajorath J (2025) Context-dependent similarity analysis of analogue series for structure-activity relationship transfer based on a concept from natural language processing. J Cheminform 17:5. https:\/\/doi.org\/10.1186\/s13321-025-00951-3","journal-title":"J Cheminform"},{"key":"1032_CR19","doi-asserted-by":"publisher","first-page":"1857","DOI":"10.1021\/ci200254k","volume":"51","author":"AM Wassermann","year":"2011","unstructured":"Wassermann AM, Bajorath J (2011) A data mining method to facilitate SAR transfer. J Chem Inf Model 51:1857\u20131866. https:\/\/doi.org\/10.1021\/ci200254k","journal-title":"J Chem Inf Model"},{"key":"1032_CR20","doi-asserted-by":"publisher","first-page":"3138","DOI":"10.1021\/ci300481d","volume":"52","author":"B Zhang","year":"2012","unstructured":"Zhang B, Wassermann AM, Vogt M, Bajorath J (2012) Systematic assessment of compound series with SAR transfer potential. J Chem Inf Model 52:3138\u20133143. https:\/\/doi.org\/10.1021\/ci300481d","journal-title":"J Chem Inf Model"},{"key":"1032_CR21","doi-asserted-by":"publisher","first-page":"D1100","DOI":"10.1093\/nar\/gkr777","volume":"40","author":"A Gaulton","year":"2012","unstructured":"Gaulton A, Bellis LJ, Bento AP et al (2012) ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res 40:D1100\u2013D1107. https:\/\/doi.org\/10.1093\/nar\/gkr777","journal-title":"Nucleic Acids Res"},{"key":"1032_CR22","doi-asserted-by":"publisher","first-page":"339","DOI":"10.1021\/ci900450m","volume":"50","author":"J Hussain","year":"2010","unstructured":"Hussain J, Rea C (2010) Computationally efficient algorithm to identify matched molecular pairs (MMPs) in large data sets. J Chem Inf Model 50:339\u2013348. https:\/\/doi.org\/10.1021\/ci900450m","journal-title":"J Chem Inf Model"},{"key":"1032_CR23","unstructured":"RDKit: cheminformatics and machine learning software (2021) http:\/\/www.rdkit.org\/. Accessed 01 July 2024"},{"key":"1032_CR24","doi-asserted-by":"publisher","first-page":"107","DOI":"10.1021\/c160017a018","volume":"5","author":"HL Morgan","year":"1965","unstructured":"Morgan HL (1965) The generation of a unique machine description for chemical structures\u2014a technique developed at chemical abstracts service. J Chem Doc 5:107\u2013113","journal-title":"J Chem Doc"},{"key":"1032_CR25","doi-asserted-by":"publisher","first-page":"509","DOI":"10.1021\/ci300513m","volume":"53","author":"M Awale","year":"2013","unstructured":"Awale M, van Deursen R, Reymond JL (2013) MQN-Mapplet: visualization of chemical space with interactive maps of DrugBank, ChEMBL, PubChem, GDB-11, and GDB-13. J Chem Inf Model 53:509\u2013518. https:\/\/doi.org\/10.1021\/ci300513m","journal-title":"J Chem Inf Model"},{"key":"1032_CR26","doi-asserted-by":"publisher","first-page":"27","DOI":"10.56471\/slujst.v4i.266","volume":"4","author":"HD Abubakar","year":"2022","unstructured":"Abubakar HD, Umar M, Bakale MA (2022) Sentiment classification: review of text vectorization methods: bag of words, Tf-Idf, word2vec and doc2vec. SLU J Sci Technol 4:27\u201333. https:\/\/doi.org\/10.56471\/slujst.v4i.266","journal-title":"SLU J Sci Technol"},{"key":"1032_CR27","unstructured":"Rehurek R, Sojka P (2011) Gensim\u2013Python framework for vector space modelling. NLP Centre, Faculty of Informatics, Masaryk University, Brno, Czech Republic. https:\/\/radimrehurek.com\/gensim\/. Accessed 20 Aug 2024"},{"key":"1032_CR28","doi-asserted-by":"publisher","first-page":"31","DOI":"10.1021\/ci00057a005","volume":"28","author":"D Weininger","year":"1988","unstructured":"Weininger D (1988) SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J Chem Inf Comput Sci 28:31\u201336","journal-title":"J Chem Inf Comput Sci"}],"container-title":["Journal of Cheminformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-025-01032-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s13321-025-01032-1\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-025-01032-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,5,26]],"date-time":"2025-05-26T13:58:30Z","timestamp":1748267910000},"score":1,"resource":{"primary":{"URL":"https:\/\/jcheminf.biomedcentral.com\/articles\/10.1186\/s13321-025-01032-1"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,5,26]]},"references-count":28,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["1032"],"URL":"https:\/\/doi.org\/10.1186\/s13321-025-01032-1","relation":{},"ISSN":["1758-2946"],"issn-type":[{"value":"1758-2946","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,5,26]]},"assertion":[{"value":"17 March 2025","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"17 May 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"26 May 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"83"}}