{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,22]],"date-time":"2026-04-22T09:33:59Z","timestamp":1776850439366,"version":"3.51.2"},"reference-count":34,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2021,4,23]],"date-time":"2021-04-23T00:00:00Z","timestamp":1619136000000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"},{"start":{"date-parts":[[2021,4,23]],"date-time":"2021-04-23T00:00:00Z","timestamp":1619136000000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100012550","name":"Nemzeti Kutat\u00e1si, Fejleszt\u00e9si \u00e9s Innovaci\u00f3s Alap","doi-asserted-by":"crossref","award":["OTKA K 134260"],"award-info":[{"award-number":["OTKA K 134260"]}],"id":[{"id":"10.13039\/501100012550","id-type":"DOI","asserted-by":"crossref"}]},{"name":"University of Florida: startup grant"},{"DOI":"10.13039\/501100003825","name":"Magyar Tudom\u00e1nyos Akad\u00e9mia","doi-asserted-by":"crossref","award":["J\u00e1nos Bolyai Research Scholarship"],"award-info":[{"award-number":["J\u00e1nos Bolyai Research Scholarship"]}],"id":[{"id":"10.13039\/501100003825","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Cheminform"],"published-print":{"date-parts":[[2021,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Quantification of the similarity of objects is a key concept in many areas of computational science. This includes cheminformatics, where molecular similarity is usually quantified based on binary fingerprints. 
While there is a wide selection of available molecular representations and similarity metrics, there were no previous efforts to extend the computational framework of similarity calculations to the simultaneous comparison of more than two objects (molecules) at the same time. The present study bridges this gap, by introducing a straightforward computational framework for comparing multiple objects at the same time and providing extended formulas for as many similarity metrics as possible. In the binary case (<jats:italic>i.e.<\/jats:italic> when comparing two molecules pairwise) these are naturally reduced to their well-known formulas. We provide a detailed analysis on the effects of various parameters on the similarity values calculated by the extended formulas. The extended similarity indices are entirely general and do not depend on the fingerprints used. Two types of variance analysis (ANOVA) help to understand the main features of the indices: (i) ANOVA of mean similarity indices; (ii) ANOVA of sum of ranking differences (SRD). Practical aspects and applications of the extended similarity indices are detailed in the accompanying paper: Miranda-Quintana <jats:italic>et al.<\/jats:italic> J Cheminform. 2021. <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"doi\" xlink:href=\"https:\/\/doi.org\/10.1186\/s13321-021-00504-4\">10.1186\/s13321-021-00504-4<\/jats:ext-link>. 
Python code for calculating the extended similarity metrics is freely available at: <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/github.com\/ramirandaq\/MultipleComparisons\">https:\/\/github.com\/ramirandaq\/MultipleComparisons<\/jats:ext-link>.<\/jats:p>","DOI":"10.1186\/s13321-021-00505-3","type":"journal-article","created":{"date-parts":[[2021,4,23]],"date-time":"2021-04-23T12:07:25Z","timestamp":1619179645000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":58,"title":["Extended similarity indices: the benefits of comparing more than two objects simultaneously. Part 1: Theory and characteristics\u2020"],"prefix":"10.1186","volume":"13","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2121-4449","authenticated-orcid":false,"given":"Ram\u00f3n Alain","family":"Miranda-Quintana","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4277-9481","authenticated-orcid":false,"given":"D\u00e1vid","family":"Bajusz","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8271-9841","authenticated-orcid":false,"given":"Anita","family":"R\u00e1cz","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0965-939X","authenticated-orcid":false,"given":"K\u00e1roly","family":"H\u00e9berger","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2021,4,23]]},"reference":[{"key":"505_CR1","doi-asserted-by":"publisher","first-page":"2884","DOI":"10.1021\/ci300261r","volume":"52","author":"R Todeschini","year":"2012","unstructured":"Todeschini R, Consonni V, Xiang H, Holliday J, Buscema M, Willett P (2012) Similarity coefficients for binary chemoinformatics data: overview and extended comparison using simulated and real data sets. 
J Chem Inf Model 52:2884\u20132901","journal-title":"J Chem Inf Model"},{"key":"505_CR2","doi-asserted-by":"publisher","first-page":"48","DOI":"10.1186\/s13321-018-0302-y","volume":"10","author":"A R\u00e1cz","year":"2018","unstructured":"R\u00e1cz A, Bajusz D, H\u00e9berger K (2018) Life beyond the Tanimoto coefficient: similarity measures for interaction fingerprints. J Cheminformatics 10:48","journal-title":"J Cheminformatics"},{"key":"505_CR3","doi-asserted-by":"publisher","first-page":"225","DOI":"10.1016\/j.drudis.2007.01.011","volume":"12","author":"H Eckert","year":"2007","unstructured":"Eckert H, Bajorath J (2007) Molecular similarity analysis in virtual screening: foundations, limitations and novel approaches. Drug Discov Today 12:225\u2013233","journal-title":"Drug Discov Today"},{"key":"505_CR4","doi-asserted-by":"publisher","first-page":"203","DOI":"10.1038\/nrd2796","volume":"8","author":"GM Keser\u00fc","year":"2009","unstructured":"Keser\u00fc GM, Makara GM (2009) The influence of lead discovery strategies on the properties of drug candidates. Nat Rev Drug Discov 8:203\u2013212","journal-title":"Nat Rev Drug Discov"},{"key":"505_CR5","doi-asserted-by":"publisher","first-page":"4977","DOI":"10.1021\/jm4004285","volume":"57","author":"A Cherkasov","year":"2014","unstructured":"Cherkasov A, Muratov E, Fourches D, Varnek A, Baskin I, Cronin M, Dearden J, Gramatica P, Martin YC, Todeschini R, Consonni V, Kuz\u2019min VE, Cramer R, Benigni R, Yang C, Rathman J, Terfloth L, Gasteiger J, Richard A, Tropsha A (2014) QSAR modeling: where have you been? Where are you going to? J Med Chem 57:4977\u20135010","journal-title":"J Med Chem"},{"key":"505_CR6","doi-asserted-by":"publisher","first-page":"2932","DOI":"10.1021\/jm201706b","volume":"55","author":"D Stumpfe","year":"2012","unstructured":"Stumpfe D, Bajorath J (2012) Exploring activity cliffs in medicinal chemistry. 
J Med Chem 55:2932\u20132942","journal-title":"J Med Chem"},{"key":"505_CR7","doi-asserted-by":"publisher","first-page":"2000","DOI":"10.1021\/acs.jcim.8b00376","volume":"58","author":"I Cortes-Ciriano","year":"2018","unstructured":"Cortes-Ciriano I, Firth NC, Bender A, Watson O (2018) Discovering highly potent molecules from an initial set of inactives using iterative screening. J Chem Inf Model 58:2000\u20132014","journal-title":"J Chem Inf Model"},{"key":"505_CR8","doi-asserted-by":"publisher","first-page":"3204","DOI":"10.1039\/b409813g","volume":"2","author":"A Bender","year":"2004","unstructured":"Bender A, Glen RC (2004) Molecular similarity: a key technique in molecular informatics. Org Biomol Chem 2:3204\u20133218","journal-title":"Org Biomol Chem"},{"key":"505_CR9","doi-asserted-by":"publisher","first-page":"927","DOI":"10.1007\/s10910-012-0119-2","volume":"51","author":"F Heidar Zadeh","year":"2013","unstructured":"Heidar Zadeh F, Ayers PW (2013) Molecular alignment as a penalized permutation Procrustes problem. J Math Chem 51:927\u2013936","journal-title":"J Math Chem"},{"key":"505_CR10","first-page":"103","volume":"549","author":"DR Alcoba","year":"2012","unstructured":"Alcoba DR, Lain L, Torre A, Ona OB, Tiznado W (2012) Ground and excited state similarity studies by means of Fukui and dual-descriptor matrices. Chem Phys Lett 549:103\u2013107","journal-title":"Chem Phys Lett"},{"key":"505_CR11","doi-asserted-by":"publisher","first-page":"6","DOI":"10.1007\/s10910-010-9737-8","volume":"49","author":"PW Ayers","year":"2011","unstructured":"Ayers PW, Carbo-Dorca R (2011) The relationship between the eigenvalues and eigenvectors of a similarity matrix and its associated Carbo index matrix. 
J Math Chem 49:6\u201311","journal-title":"J Math Chem"},{"key":"505_CR12","doi-asserted-by":"publisher","first-page":"1344","DOI":"10.1007\/s10910-009-9658-6","volume":"47","author":"RA Miranda-Quintana","year":"2010","unstructured":"Miranda-Quintana RA, Cruz-Rodes R, Codorniu-Hernandez E, Batista-Leyva AJ (2010) Formal theory of the comparative relations: its application to the study of quantum similarity and dissimilarity measures and indices. J Math Chem 47:1344\u20131365","journal-title":"J Math Chem"},{"key":"505_CR13","doi-asserted-by":"publisher","first-page":"234104","DOI":"10.1063\/1.2741536","volume":"126","author":"A Borgoo","year":"2007","unstructured":"Borgoo A, Torrent-Sucarrat M, De Proft F, Geerlings P (2007) Quantum similarity study of atoms: a bridge between hardness and similarity indices. J Chem Phys 126:234104","journal-title":"J Chem Phys"},{"key":"505_CR14","doi-asserted-by":"publisher","first-page":"1185","DOI":"10.1002\/qua.560170612","volume":"17","author":"R Carbo-Dorca","year":"1980","unstructured":"Carbo-Dorca R, Leyda L, Arnau M (1980) How similar is a molecule to another? An electron density measure of similarity between two molecular structures. Int J Quantum Chem 17:1185\u20131189","journal-title":"Int J Quantum Chem"},{"key":"505_CR15","doi-asserted-by":"publisher","first-page":"1046","DOI":"10.1016\/j.drudis.2006.10.005","volume":"11","author":"P Willett","year":"2006","unstructured":"Willett P (2006) Similarity-based virtual screening using 2D fingerprints. Drug Discov Today 11:1046\u20131053","journal-title":"Drug Discov Today"},{"key":"505_CR16","volume-title":"Encyclopedia of analytical chemistry: applications, theory and instrumentation","author":"R Todeschini","year":"2015","unstructured":"Todeschini R, Ballabio D, Consonni V (2015) Encyclopedia of analytical chemistry: applications, theory and instrumentation. 
Wiley, Hoboken"},{"key":"505_CR17","doi-asserted-by":"publisher","first-page":"29","DOI":"10.1007\/s11306-018-1327-y","volume":"14","author":"A R\u00e1cz","year":"2018","unstructured":"R\u00e1cz A, Bajusz D, H\u00e9berger K (2018) Binary similarity measures for fingerprint analysis of qualitative metabolomic profiles. Metabolomics 14:29","journal-title":"Metabolomics"},{"key":"505_CR18","unstructured":"Bajusz D, R\u00e1cz A, H\u00e9berger K (2017) Comprehensive medicinal chemistry III. In: Chackalamannil S, Rotella D, Ward SE (Eds). Elsevier, Amsterdam"},{"key":"505_CR19","doi-asserted-by":"publisher","first-page":"20","DOI":"10.1186\/s13321-015-0069-3","volume":"7","author":"D Bajusz","year":"2015","unstructured":"Bajusz D, R\u00e1cz A, H\u00e9berger K (2015) Why is Tanimoto index an appropriate choice for fingerprint-based similarity calculations? J Cheminformatics 7:20","journal-title":"J Cheminformatics"},{"key":"505_CR20","doi-asserted-by":"publisher","first-page":"1755","DOI":"10.1007\/s10910-019-01035-y","volume":"57","author":"RA Miranda-Quintana","year":"2019","unstructured":"Miranda-Quintana RA, Kim TD, Heidar-Zadeh F, Ayers PW (2019) On the impossibility of unambiguously selecting the best model for fitting data. J Math Chem 57:1755\u20131769","journal-title":"J Math Chem"},{"key":"505_CR21","doi-asserted-by":"publisher","first-page":"025008","DOI":"10.1088\/2632-2153\/ab891b","volume":"1","author":"AE Brereton","year":"2020","unstructured":"Brereton AE, MacKinnon S, Safikhani Z, Reeves S, Alwash S, Shahani V, Windemuth A (2020) Predicting drug properties with parameter-free machine learning: pareto-optimal embedded modeling (POEM). Mach Learn Sci Technol 1:025008","journal-title":"Mach Learn Sci Technol"},{"key":"505_CR22","doi-asserted-by":"publisher","DOI":"10.1186\/s13321-021-00504-4","author":"RA Miranda-Quintana","year":"2021","unstructured":"Miranda-Quintana RA, R\u00e1cz A, Bajusz D, H\u00e9berger K. 
Extended similarity indices: the benefits of comparing more than two objects simultaneously. Part 2: speed, consistency, diversity selection. J Cheminform. 2021. https:\/\/doi.org\/10.1186\/s13321-021-00504-4","journal-title":"J Cheminform"},{"key":"505_CR23","doi-asserted-by":"publisher","first-page":"101","DOI":"10.1016\/j.trac.2009.09.009","volume":"29","author":"K H\u00e9berger","year":"2010","unstructured":"H\u00e9berger K (2010) Sum of ranking differences compares methods or models fairly. Trends Anal Chem 29:101\u2013109","journal-title":"Trends Anal Chem"},{"key":"505_CR24","doi-asserted-by":"publisher","first-page":"139","DOI":"10.1016\/j.chemolab.2013.06.007","volume":"127","author":"K Koll\u00e1r-Hunek","year":"2013","unstructured":"Koll\u00e1r-Hunek K, H\u00e9berger K (2013) Method and model comparison by sum of ranking differences in cases of repeated observations (ties). Chemometr Intell Lab Syst 127:139\u2013146","journal-title":"Chemometr Intell Lab Syst"},{"key":"505_CR25","doi-asserted-by":"publisher","first-page":"151","DOI":"10.1002\/cem.1320","volume":"25","author":"K H\u00e9berger","year":"2011","unstructured":"H\u00e9berger K, Koll\u00e1r-Hunek K (2011) Sum of ranking differences for method discrimination and its validation: comparison of ranks with random numbers. J Chemom 25:151\u2013158","journal-title":"J Chemom"},{"key":"505_CR26","doi-asserted-by":"publisher","first-page":"15","DOI":"10.1016\/j.mrgentox.2014.04.028","volume":"771","author":"K H\u00e9berger","year":"2014","unstructured":"H\u00e9berger K, Kolarevi\u0107 S, Kra\u010dun-Kolarevi\u0107 M, Sunjog K, Ga\u010di\u0107 Z, Kljaji\u0107 Z, Mitri\u0107 M, Vukovi\u0107-Ga\u010di\u0107 B (2014) Evaluation of single cell gel electrophoresis data: combination of variance analysis with sum of ranking differences. 
Mutation Res Genet Toxicol Environ Mutagenesis 771:15\u201322","journal-title":"Mutation Res Genet Toxicol Environ Mutagenesis"},{"key":"505_CR27","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1002\/cem.3104","volume":"33","author":"K H\u00e9berger","year":"2019","unstructured":"H\u00e9berger K, Koll\u00e1r-Hunek K (2019) Comparison of validation variants by sum of ranking differences and ANOVA. J Chemom 33:1\u201314","journal-title":"J Chemom"},{"key":"505_CR28","doi-asserted-by":"publisher","first-page":"683","DOI":"10.1080\/1062936X.2015.1084647","volume":"26","author":"A R\u00e1cz","year":"2015","unstructured":"R\u00e1cz A, Bajusz D, H\u00e9berger K (2015) Consistency of QSAR models: Correct split of training and test sets, ranking of models and performance parameters. SAR QSAR Environ Res 26:683\u2013700","journal-title":"SAR QSAR Environ Res"},{"key":"505_CR29","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1109\/TMAG.2018.2836327","volume":"54","author":"J Louren\u00e7o","year":"2018","unstructured":"Louren\u00e7o J, Lebensztajn L (2018) Post-pareto optimality analysis with sum of ranking differences. IEEE Trans Magn 54:1\u201310","journal-title":"IEEE Trans Magn"},{"key":"505_CR30","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1021\/ci300547g","volume":"53","author":"P Willett","year":"2013","unstructured":"Willett P (2013) Combination of similarity rankings using data fusion. J Chem Inf Model 53:1\u201310","journal-title":"J Chem Inf Model"},{"key":"505_CR31","doi-asserted-by":"publisher","first-page":"81","DOI":"10.1016\/j.jpba.2016.04.001","volume":"127","author":"F Andri\u0107","year":"2016","unstructured":"Andri\u0107 F, Bajusz D, R\u00e1cz A, \u0160egan S, H\u00e9berger K (2016) Multivariate assessment of lipophilicity scales\u2014computational and reversed phase thin-layer chromatographic indices. 
J Pharm Biomed Anal 127:81\u201393","journal-title":"J Pharm Biomed Anal"},{"key":"505_CR32","doi-asserted-by":"publisher","first-page":"432","DOI":"10.1177\/0003702817749232","volume":"72","author":"TD Stokes","year":"2018","unstructured":"Stokes TD, Fotein M, Brownfield B, Kalivas JH, Mousdis G, Amine A, Georgiou C (2018) Feasibility assessment of synchronous fluorescence spectral fusion by application to argan oil for adulteration analysis. Appl Spectrosc 72:432\u2013441","journal-title":"Appl Spectrosc"},{"key":"505_CR33","doi-asserted-by":"publisher","first-page":"e3011","DOI":"10.1002\/cem.3011","volume":"32","author":"L Sipos","year":"2018","unstructured":"Sipos L, Gere A, Popp J, Kov\u00e1cs S (2018) A novel ranking distance measure combining Cayley and Spearman footrule metrics. J Chemom 32:e3011","journal-title":"J Chemom"},{"key":"505_CR34","volume-title":"Analysis of variance in experimental design","author":"HR Lindman","year":"1991","unstructured":"Lindman HR (1991) Analysis of variance in experimental design. 
Springer Verlag, New York"}],"container-title":["Journal of Cheminformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-021-00505-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s13321-021-00505-3\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-021-00505-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,4,23]],"date-time":"2021-04-23T12:22:35Z","timestamp":1619180555000},"score":1,"resource":{"primary":{"URL":"https:\/\/jcheminf.biomedcentral.com\/articles\/10.1186\/s13321-021-00505-3"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,4,23]]},"references-count":34,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2021,12]]}},"alternative-id":["505"],"URL":"https:\/\/doi.org\/10.1186\/s13321-021-00505-3","relation":{},"ISSN":["1758-2946"],"issn-type":[{"value":"1758-2946","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,4,23]]},"assertion":[{"value":"15 September 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"12 March 2021","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"23 April 2021","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"32"}}