{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T02:36:57Z","timestamp":1760150217448,"version":"build-2065373602"},"reference-count":48,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2023,10,13]],"date-time":"2023-10-13T00:00:00Z","timestamp":1697155200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Portuguese national funds","award":["UIDB\/00685\/2020"],"award-info":[{"award-number":["UIDB\/00685\/2020"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Stats"],"abstract":"<jats:p>From the affinity coefficient between two discrete probability distributions proposed by Matusita, Bacelar-Nicolau introduced the affinity coefficient in a cluster analysis context and extended it to different types of data, including for the case of complex and heterogeneous data within the scope of symbolic data analysis (SDA). In this study, we refer to the most significant partitions obtained using the hierarchical cluster analysis (h.c.a.) of two well-known datasets that were taken from the literature on complex (symbolic) data analysis. h.c.a. is based on the weighted generalized affinity coefficient for the case of interval data and on probabilistic aggregation criteria from a VL parametric family. To calculate the values of this coefficient, two alternative algorithms were used and compared. Both algorithms were able to detect clusters of macrodata (aggregated data into groups of interest) that were consistent and consonant with those reported in the literature, but one performed better than the other in some specific cases. Moreover, both approaches allow for the treatment of large microdatabases (non-aggregated data) after their transformation into macrodata from the huge microdata.<\/jats:p>","DOI":"10.3390\/stats6040068","type":"journal-article","created":{"date-parts":[[2023,10,13]],"date-time":"2023-10-13T10:16:12Z","timestamp":1697192172000},"page":"1082-1094","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Comparison between Two Algorithms for Computing the Weighted Generalized Affinity Coefficient in the Case of Interval Data"],"prefix":"10.3390","volume":"6","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3151-5237","authenticated-orcid":false,"given":"\u00c1urea","family":"Sousa","sequence":"first","affiliation":[{"name":"Faculty of Sciences and Technology, CEEAplA and OSEAN, Universidade dos A\u00e7ores, 9500-321 Ponta Delgada, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0269-8153","authenticated-orcid":false,"given":"Osvaldo","family":"Silva","sequence":"additional","affiliation":[{"name":"Faculty of Sciences and Technology, CICSNOVA.UAc, Universidade dos A\u00e7ores, 9500-321 Ponta Delgada, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0421-1262","authenticated-orcid":false,"given":"Leonor","family":"Bacelar-Nicolau","sequence":"additional","affiliation":[{"name":"Faculty of Medicine, Institute of Preventive Medicine and Public Health & ISAMB\/FM-UL, Universidade de Lisboa, 1649-028 Lisboa, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6579-3553","authenticated-orcid":false,"given":"Jo\u00e3o","family":"Cabral","sequence":"additional","affiliation":[{"name":"Faculty of Sciences and Technology, Universidade dos A\u00e7ores, 9500-321 Ponta Delgada, Portugal"},{"name":"CIMA-Research Centre, Mathematics and Applications & Azores University, 9500-321 Ponta Delgada, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9663-3977","authenticated-orcid":false,"given":"Helena","family":"Bacelar-Nicolau","sequence":"additional","affiliation":[{"name":"Faculty of Psychology, Institute of Environmental Health (ISAMB\/FM-UL), Universidade de Lisboa, 1649-013 Lisboa, Portugal"}]}],"member":"1968","published-online":{"date-parts":[[2023,10,13]]},"reference":[{"key":"ref_1","unstructured":"Malerba, D., Esposito, F., Gioviale, V., and Tamma, V. (2001, January 7\u201310). Comparing dissimilarity measures in symbolic data analysis. Proceedings of the Joint Conferences on \u201cNew Techniques and Technologies for Statistics\u201d and \u201cExchange of Technology and Knowhow\u201d (ETK-NTTS\u201901), New York, NY, USA."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Bock, H.H., and Diday, E. (2000). Analysis of Symbolic Data. Exploratory Methods for Extracting Statistical Information from Complex Data, Springer. Studies in Classification, Data Analysis, and Knowledge Organization.","DOI":"10.1007\/978-3-642-57155-8"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Bock, H.H., and Diday, E. (2000). Analysis of Symbolic Data. Exploratory Methods for Extracting Statistical Information from Complex Data, Springer. Studies in Classification, Data Analysis, and Knowledge Organization.","DOI":"10.1007\/978-3-642-57155-8"},{"key":"ref_4","unstructured":"Bock, H.H. (1988). Classification and Related Methods of Data Analysis, Proc. IFCS-87."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Bock, H.-H., and Diday, E. (2000). Analysis of Symbolic Data. Exploratory Methods for Extracting Statistical Information from Complex Data, Springer. [1st ed.]. Studies in Classification, Data Analysis, and Knowledge Organization.","DOI":"10.1007\/978-3-642-57155-8"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"470","DOI":"10.1198\/016214503000242","article-title":"From the statistics of data to the statistics of knowledge","volume":"98","author":"Billard","year":"2003","journal-title":"J. Am. Stat. Assoc."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Billard, L., and Diday, E. (2007). Symbolic Data Analysis: Conceptual Statistics and Data Mining, Wiley. [1st ed.].","DOI":"10.1002\/9780470090183"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"281","DOI":"10.1002\/widm.1133","article-title":"Symbolic data analysis: Another look at the interaction of data mining and statistics","volume":"4","author":"Brito","year":"2014","journal-title":"WIREs Data Min. Knowl. Discov."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Bock, H.H., and Diday, E. (2000). Analysis of Symbolic Data. Exploratory Methods for Extracting Statistical Information from Complex Data, Springer. Studies in Classification, Data Analysis, and Knowledge Organization.","DOI":"10.1007\/978-3-642-57155-8"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Banks, D., House, L., McMorris, F.R., Arabie, P., and Gaul, W. (2004). Classification, Clustering, and Data Mining Applications, Springer. Studies in Classification, Data Analysis, and Knowledge Organization.","DOI":"10.1007\/978-3-642-17103-1"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1007\/s00180-006-0260-0","article-title":"New clustering methods for interval data","volume":"21","author":"Chavent","year":"2006","journal-title":"Comput. Statist."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"14","DOI":"10.1016\/j.csda.2014.04.012","article-title":"Exploratory data analysis of interval-valued symbolic data with matrix visualization","volume":"79","author":"Kao","year":"2014","journal-title":"Comput. Stat. Data Anal."},{"key":"ref_13","first-page":"1","article-title":"Clustering methods and Kohonen maps for symbolic data","volume":"15","author":"Bock","year":"2002","journal-title":"J. Japanese Soc. Comput. Statist"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Diday, E., and Noirhomme, M. (2008). Symbolic Data Analysis and the SODAS Software, Wiley.","DOI":"10.1002\/9780470723562"},{"key":"ref_15","unstructured":"Noirhomme-Fraiture, M., and Rouard, M. (1998, January 4\u20136). Representation of Subpopulations and Correlation with Zoom Star. Proceedings of the NTTS\u2019 98, EUSTAT, Sorrento, Italy."},{"key":"ref_16","first-page":"45","article-title":"Interval Data Clustering","volume":"22","author":"Goswami","year":"2020","journal-title":"IOSR J. Comput. Eng."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"229","DOI":"10.1007\/s00180-021-01121-3","article-title":"A hierarchical clustering method for random intervals based on a similarity measure","volume":"37","year":"2022","journal-title":"Comput. Stat."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"293","DOI":"10.3233\/IDA-150718","article-title":"Probabilistic clustering of interval data","volume":"19","author":"Brito","year":"2015","journal-title":"Intell. Data Anal."},{"key":"ref_19","unstructured":"Anderberg, M.R. (1973). Cluster Analysis for Applications, Academic Press. [1st ed.]."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"231","DOI":"10.5183\/jjscs1988.15.2_231","article-title":"Hierarchical and pyramidal clustering for symbolic data","volume":"15","author":"Brito","year":"2002","journal-title":"J. Jpn. Soc. Comput. Stat."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"104743","DOI":"10.1016\/j.engappai.2022.104743","article-title":"A comprehensive survey of clustering algorithms: State-of-the-art machine learning applications, taxonomy, challenges, and future research prospects","volume":"110","author":"Ezugwu","year":"2022","journal-title":"Eng. Appl. Artif. Intell."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"807","DOI":"10.1007\/s11390-018-1857-9","article-title":"Hierarchical clustering of complex symbolic data and application for emitter","volume":"33","author":"Xu","year":"2018","journal-title":"J. Comput. Sci. Technol."},{"key":"ref_23","first-page":"53","article-title":"Dynamical clustering algorithm of interval data: Optimization of an adequacy criterion based on Hausdorff distance","volume":"Volume 3","author":"Jajuga","year":"2002","journal-title":"Classification, Clustering, and Data Analysis"},{"key":"ref_24","first-page":"5","article-title":"Trois nouvelles m\u00e9thods de classification automatique de donn\u00e9es symboliques de type intervalle","volume":"LI","author":"Chavent","year":"2003","journal-title":"Rev. Stat. Appl."},{"key":"ref_25","first-page":"231","article-title":"Dynamic clustering for interval data based on L2 distance","volume":"21a","author":"Brito","year":"2006","journal-title":"Comput. Stat."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"167","DOI":"10.1016\/j.patrec.2005.08.014","article-title":"Adaptive Hausdorff distances and dynamic clustering of symbolic interval data","volume":"27","author":"Chavent","year":"2006","journal-title":"Pattern Recognit. Lett."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"353","DOI":"10.1016\/j.patrec.2003.10.016","article-title":"Clustering of interval data based on city-block distances","volume":"25","author":"Souza","year":"2004","journal-title":"Pattern Recognit. Lett."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"567","DOI":"10.1016\/0031-3203(91)90022-W","article-title":"Symbolic Clustering Using a New Dissimilarity Measure","volume":"24","author":"Gowda","year":"1991","journal-title":"Pattern Recognit. Lett."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"368","DOI":"10.1109\/21.148412","article-title":"Symbolic Clustering Using a New Similarity Measure","volume":"22","author":"Gowda","year":"1992","journal-title":"IEEE Trans. Syst. Man Cybern."},{"key":"ref_30","first-page":"9","article-title":"Measuring similarity of complex and heterogeneous data in clustering of large data sets","volume":"29","author":"Nicolau","year":"2009","journal-title":"Biocybern. Biomed. Eng."},{"key":"ref_31","first-page":"435","article-title":"Clustering of variables with a three-way approach for health sciences","volume":"21","author":"Nicolau","year":"2014","journal-title":"Test. Psychom. Methodol. Appl. Psychol."},{"key":"ref_32","unstructured":"Gordon, A.D. (1999). Classification, Chapman & Hall\/CRC. [2nd ed.]."},{"key":"ref_33","first-page":"5","article-title":"Sur l\u2019analyse des donn\u00e9es pr\u00e9alable \u00e0 une classification automatique","volume":"32","author":"Lerman","year":"1970","journal-title":"Rev. Math. Sc. Hum."},{"key":"ref_34","first-page":"3","article-title":"\u00c9tude Distributionelle de Statistiques de Proximit\u00e9 Entre Structures Alg\u00e9briques Finies du M\u00eame Type: Apllication \u00e0 la Classification Automatique","volume":"19","author":"Lerman","year":"1973","journal-title":"Cah. Bur. Univ. Rech. Op\u00e9r. S\u00e9rie Rech."},{"key":"ref_35","unstructured":"Lerman, I.C. (1981). Classification et Analyse Ordinale des Donn\u00e9es, Dunod."},{"key":"ref_36","first-page":"37","article-title":"Comparing Taxonomic Data","volume":"150","author":"Lerman","year":"2000","journal-title":"Math Sci. Hum."},{"key":"ref_37","unstructured":"Lerman, I.C. (2016). Advanced Information and Knowledge Processing, Springer."},{"key":"ref_38","first-page":"431","article-title":"Cluster analysis and distribution function","volume":"45","author":"Nicolau","year":"1983","journal-title":"Methods Oper. Res."},{"key":"ref_39","unstructured":"Bock, H.-H. (1988). Classification and Related Methods of Data Analysis, Elsevier Sciences Publishers B.V."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"17","DOI":"10.1007\/BF02949773","article-title":"On the theory of statistical decision functions","volume":"3","author":"Matusita","year":"1951","journal-title":"Ann. Inst. Stat. Math."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"631","DOI":"10.1214\/aoms\/1177728422","article-title":"Decision rules, based on distance for problems of fit, two samples and estimation","volume":"26","author":"Matusita","year":"1955","journal-title":"Ann. Inst. Stat. Math."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Bock, H.H., and Diday, E. (2000). Analysis of Symbolic Data. Exploratory Methods for Extracting Statistical Information from Complex Data, Springer. Chapter Similarity and Dissimilarity.","DOI":"10.1007\/978-3-642-57155-8"},{"key":"ref_43","first-page":"17","article-title":"On clustering interval data with different scales of measures: Experimental results","volume":"4","author":"Sousa","year":"2015","journal-title":"Asian J. Appl. Sci. Eng."},{"key":"ref_44","first-page":"23151","article-title":"Clustering an interval data set: Are the main partitions similar to a priori partition?","volume":"7","author":"Sousa","year":"2015","journal-title":"Int. J. Curr. Res."},{"key":"ref_45","first-page":"45","article-title":"Weighted generalised affinity coefficient in cluster analysis of complex data of the interval type","volume":"47","author":"Sousa","year":"2010","journal-title":"Biom. Lett."},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Bock, H.H., and Diday, E. (2000). Analysis of Symbolic Data. Exploratory Methods for Extracting Statistical Information from Complex Data, Springer. Studies in Classification, Data Analysis, and Knowledge Organization.","DOI":"10.1007\/978-3-642-57155-8"},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"1203","DOI":"10.1016\/j.patrec.2004.03.016","article-title":"Multivalued type proximity measure and concept of mutual similarity value useful for clustering symbolic patterns","volume":"25","author":"Guru","year":"2004","journal-title":"Pattern Recog. Lett."},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"423","DOI":"10.1016\/j.patrec.2006.08.014","article-title":"Fuzzy c-means clustering methods for symbolic interval data","volume":"28","year":"2007","journal-title":"Pattern Recog. Lett."}],"container-title":["Stats"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2571-905X\/6\/4\/68\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T21:06:13Z","timestamp":1760130373000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2571-905X\/6\/4\/68"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,10,13]]},"references-count":48,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2023,12]]}},"alternative-id":["stats6040068"],"URL":"https:\/\/doi.org\/10.3390\/stats6040068","relation":{},"ISSN":["2571-905X"],"issn-type":[{"type":"electronic","value":"2571-905X"}],"subject":[],"published":{"date-parts":[[2023,10,13]]}}}