{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,22]],"date-time":"2026-04-22T20:23:17Z","timestamp":1776889397459,"version":"3.51.2"},"reference-count":46,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2024,7,2]],"date-time":"2024-07-02T00:00:00Z","timestamp":1719878400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,7,2]],"date-time":"2024-07-02T00:00:00Z","timestamp":1719878400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100004505","name":"Universit\u00e0 degli Studi di Catania","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100004505","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Classif"],"published-print":{"date-parts":[[2025,3]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>Compositional data have peculiar characteristics that pose significant challenges to traditional statistical methods and models. Within this framework, we use a convenient mode parametrized Dirichlet distribution across multiple fields of statistics. In particular, we propose finite mixtures of unimodal Dirichlet (UD) distributions for model-based clustering and classification. Then, we introduce the contaminated UD (CUD) distribution, a heavy-tailed generalization of the UD distribution that allows for a more flexible tail behavior in the presence of atypical observations. Thirdly, we propose finite mixtures of CUD distributions to jointly account for the presence of clusters and atypical points in the data. Parameter estimation is carried out by directly maximizing the maximum likelihood or by using an expectation-maximization (EM) algorithm. Two analyses are conducted on simulated data to illustrate the effects of atypical observations on parameter estimation and data classification, and how our proposals address both aspects. Furthermore, two real datasets are investigated and the results obtained via our models are discussed.<\/jats:p>","DOI":"10.1007\/s00357-024-09480-4","type":"journal-article","created":{"date-parts":[[2024,7,2]],"date-time":"2024-07-02T11:02:04Z","timestamp":1719918124000},"page":"31-53","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":10,"title":["A New Look at the Dirichlet Distribution: Robustness, Clustering, and Both Together"],"prefix":"10.1007","volume":"42","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2690-8546","authenticated-orcid":false,"given":"Salvatore D.","family":"Tomarchio","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7742-1821","authenticated-orcid":false,"given":"Antonio","family":"Punzo","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5945-6550","authenticated-orcid":false,"given":"Johannes T.","family":"Ferreira","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4793-5674","authenticated-orcid":false,"given":"Andriette","family":"Bekker","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,7,2]]},"reference":[{"issue":"2","key":"9480_CR1","doi-asserted-by":"crossref","first-page":"139","DOI":"10.1111\/j.2517-6161.1982.tb01195.x","volume":"44","author":"J Aitchison","year":"1982","unstructured":"Aitchison, J. (1982). The statistical analysis of compositional data. Journal of the Royal Statistical Society: Series B (Methodological), 44(2), 139\u2013160.","journal-title":"Journal of the Royal Statistical Society: Series B (Methodological)"},{"issue":"2","key":"9480_CR2","first-page":"129","volume":"34","author":"J Aitchison","year":"1985","unstructured":"Aitchison, J., & Lauder, I. (1985). Kernel density estimation for compositional data. Journal of the Royal Statistical Society: Series C (Applied Statistics), 34(2), 129\u2013137.","journal-title":"Journal of the Royal Statistical Society: Series C (Applied Statistics)"},{"issue":"4","key":"9480_CR3","doi-asserted-by":"crossref","first-page":"1571","DOI":"10.1007\/s00180-012-0367-4","volume":"28","author":"L Bagnato","year":"2013","unstructured":"Bagnato, L., & Punzo, A. (2013). Finite mixtures of unimodal beta and gamma densities and the $$k$$-bumps algorithm. Computational Statistics, 28(4), 1571\u20131597.","journal-title":"Computational Statistics"},{"issue":"3","key":"9480_CR4","doi-asserted-by":"crossref","first-page":"803","DOI":"10.2307\/2532201","volume":"49","author":"JD Banfield","year":"1993","unstructured":"Banfield, J. D., & Raftery, A. E. (1993). Model-based Gaussian and non-Gaussian clustering. Biometrics, 49(3), 803\u2013821.","journal-title":"Biometrics"},{"key":"9480_CR5","doi-asserted-by":"crossref","first-page":"501","DOI":"10.1007\/BF02083658","volume":"28","author":"C Barcel\u00f3","year":"1996","unstructured":"Barcel\u00f3, C., Pawlowsky, V., & Grunsky, E. (1996). Some aspects of transformations of compositional data and the identification of outliers. Mathematical Geology, 28, 501\u2013518.","journal-title":"Mathematical Geology"},{"issue":"105","key":"9480_CR6","first-page":"158","volume":"195","author":"K Bertin","year":"2023","unstructured":"Bertin, K., Genest, C., Klutchnikoff, N., et al. (2023). Minimax properties of Dirichlet kernel density estimators. Journal of Multivariate Analysis, 195(105), 158.","journal-title":"Journal of Multivariate Analysis"},{"issue":"13","key":"9480_CR7","doi-asserted-by":"crossref","first-page":"1493","DOI":"10.3390\/math9131493","volume":"9","author":"T Botha","year":"2021","unstructured":"Botha, T., Ferreira, J., & Bekker, A. (2021). Alternative Dirichlet priors for estimating entropy via a power sum functional. Mathematics, 9(13), 1493.","journal-title":"Mathematics"},{"issue":"11","key":"9480_CR8","doi-asserted-by":"crossref","first-page":"1533","DOI":"10.1109\/TIP.2004.834664","volume":"13","author":"N Bouguila","year":"2004","unstructured":"Bouguila, N., Ziou, D., & Vaillancourt, J. (2004). Unsupervised learning of a finite mixture model based on the Dirichlet distribution and its application. IEEE Transactions on Image Processing, 13(11), 1533\u20131543.","journal-title":"IEEE Transactions on Image Processing"},{"key":"9480_CR9","doi-asserted-by":"crossref","first-page":"115","DOI":"10.1038\/301115a0","volume":"301","author":"S Brazier","year":"1983","unstructured":"Brazier, S., Sparks, R. S. J., Carey, S. N., et al. (1983). Bimodal grain size distribution and secondary thickening in air-fall ash layers. Nature, 301, 115\u2013119.","journal-title":"Nature"},{"issue":"11","key":"9480_CR10","doi-asserted-by":"crossref","first-page":"3091","DOI":"10.1016\/j.renene.2011.03.024","volume":"36","author":"R Calif","year":"2011","unstructured":"Calif, R., Emilion, R., & Soubdhan, T. (2011). Classification of wind speed distributions using a mixture of Dirichlet distributions. Renewable Energy, 36(11), 3091\u20133097.","journal-title":"Renewable Energy"},{"issue":"1","key":"9480_CR11","doi-asserted-by":"crossref","first-page":"122","DOI":"10.1111\/insr.12340","volume":"88","author":"JE Chac\u00f3n","year":"2020","unstructured":"Chac\u00f3n, J. E. (2020). The modal age of statistics. International Statistical Review, 88(1), 122\u2013141.","journal-title":"International Statistical Review"},{"issue":"2","key":"9480_CR12","doi-asserted-by":"crossref","first-page":"131","DOI":"10.1016\/S0167-9473(99)00010-9","volume":"31","author":"SX Chen","year":"1999","unstructured":"Chen, S. X. (1999). Beta kernel estimators for density functions. Computational Statistics & Data Analysis, 31(2), 131\u2013145.","journal-title":"Computational Statistics & Data Analysis"},{"issue":"1","key":"9480_CR13","first-page":"73","volume":"10","author":"SX Chen","year":"2000","unstructured":"Chen, S. X. (2000). Beta kernel smoothers for regression curves. Statistica Sinica, 10(1), 73\u201391.","journal-title":"Statistica Sinica"},{"issue":"423","key":"9480_CR14","doi-asserted-by":"crossref","first-page":"782","DOI":"10.1080\/01621459.1993.10476339","volume":"88","author":"L Davies","year":"1993","unstructured":"Davies, L., & Gather, U. (1993). The identification of multiple outliers. Journal of the American Statistical Association, 88(423), 782\u2013792.","journal-title":"Journal of the American Statistical Association"},{"issue":"1","key":"9480_CR15","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1111\/j.2517-6161.1977.tb01600.x","volume":"39","author":"AP Dempster","year":"1977","unstructured":"Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: series B, 39(1), 1\u201322.","journal-title":"Journal of the Royal Statistical Society: series B"},{"issue":"8","key":"9480_CR16","doi-asserted-by":"crossref","first-page":"1049","DOI":"10.1007\/s11004-020-09861-6","volume":"52","author":"P Filzmoser","year":"2020","unstructured":"Filzmoser, P., & Gregorich, M. (2020). Multivariate outlier detection in applied data analysis: Global, local, compositional and cellwise outliers. Mathematical Geosciences, 52(8), 1049\u20131066.","journal-title":"Mathematical Geosciences"},{"key":"9480_CR17","doi-asserted-by":"crossref","first-page":"233","DOI":"10.1007\/s11004-007-9141-5","volume":"40","author":"P Filzmoser","year":"2008","unstructured":"Filzmoser, P., & Hron, K. (2008). Outlier detection for compositional data using robust methods. Mathematical Geosciences, 40, 233\u2013248.","journal-title":"Mathematical Geosciences"},{"key":"9480_CR18","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-319-96422-5","volume-title":"Applied compositional data analysis","author":"P Filzmoser","year":"2018","unstructured":"Filzmoser, P., Hron, K., & Templ, M. (2018). Applied compositional data analysis. Cham: Springer."},{"issue":"7","key":"9480_CR19","doi-asserted-by":"crossref","first-page":"1137","DOI":"10.1080\/02664760902914532","volume":"37","author":"E Fi\u0161erov\u00e1","year":"2010","unstructured":"Fi\u0161erov\u00e1, E., & Hron, K. (2010). Total least squares solution for compositional data using linear models. Journal of Applied Statistics, 37(7), 1137\u20131152.","journal-title":"Journal of Applied Statistics"},{"key":"9480_CR20","doi-asserted-by":"crossref","first-page":"193","DOI":"10.1007\/BF01908075","volume":"2","author":"L Hubert","year":"1985","unstructured":"Hubert, L., & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2, 193\u2013218.","journal-title":"Journal of Classification"},{"key":"9480_CR21","doi-asserted-by":"crossref","DOI":"10.1007\/978-0-387-78189-1","volume-title":"Modern multivariate statistical techniques: Regression, classification, and manifold learning","author":"AJ Izenman","year":"2008","unstructured":"Izenman, A. J. (2008). Modern multivariate statistical techniques: Regression, classification, and manifold learning. New York: Springer."},{"key":"9480_CR22","volume-title":"Doing Bayesian data analysis: A tutorial with R, JAGS, and Stan","author":"J Kruschke","year":"2014","unstructured":"Kruschke, J. (2014). Doing Bayesian data analysis: A tutorial with R, JAGS, and Stan. Academic Press."},{"issue":"1","key":"9480_CR23","doi-asserted-by":"crossref","first-page":"103","DOI":"10.1111\/j.2517-6161.1975.tb01035.x","volume":"37","author":"RH Lochner","year":"1975","unstructured":"Lochner, R. H. (1975). A generalized Dirichlet distribution in Bayesian life testing. Journal of the Royal Statistical Society Series B: Statistical Methodology, 37(1), 103\u2013113.","journal-title":"Journal of the Royal Statistical Society Series B: Statistical Methodology"},{"key":"9480_CR24","unstructured":"McLachlan, G. J., Basford, K. E. (1988) Mixture models: Inference and applications to clustering. Statistics: A Series of Textbooks and Monographs, Marcel Dekker, New York"},{"key":"9480_CR25","doi-asserted-by":"crossref","DOI":"10.1201\/9781315373577","volume-title":"Mixture model-based classification","author":"PD McNicholas","year":"2016","unstructured":"McNicholas, P. D. (2016). Mixture model-based classification. CRC Press."},{"issue":"4","key":"9480_CR26","doi-asserted-by":"crossref","first-page":"301","DOI":"10.1016\/0021-9681(64)90073-6","volume":"17","author":"EA Murphy","year":"1964","unstructured":"Murphy, E. A. (1964). One cause? Many causes? The argument from the bimodal distribution. Journal of Chronic Diseases, 17(4), 301\u2013324.","journal-title":"Journal of Chronic Diseases"},{"issue":"4","key":"9480_CR27","doi-asserted-by":"crossref","first-page":"308","DOI":"10.1093\/comjnl\/7.4.308","volume":"7","author":"JA Nelder","year":"1965","unstructured":"Nelder, J. A., & Mead, R. (1965). A simplex method for function minimization. The Computer Journal, 7(4), 308\u2013313.","journal-title":"The Computer Journal"},{"key":"9480_CR28","volume-title":"Dirichlet and related distributions: Theory, methods and applications","author":"KW Ng","year":"2011","unstructured":"Ng, K. W., Tian, G. L., & Tang, M. L. (2011). Dirichlet and related distributions: Theory, methods and applications. John Wiley & Sons."},{"issue":"2","key":"9480_CR29","doi-asserted-by":"crossref","first-page":"187","DOI":"10.1016\/S0167-7152(98)00010-8","volume":"38","author":"JP Nolan","year":"1998","unstructured":"Nolan, J. P. (1998). Parameterizations and modes of stable distributions. Statistics & Probability Letters, 38(2), 187\u2013195.","journal-title":"Statistics & Probability Letters"},{"key":"9480_CR30","doi-asserted-by":"crossref","first-page":"412","DOI":"10.1016\/j.jmva.2012.07.007","volume":"114","author":"A Ongaro","year":"2013","unstructured":"Ongaro, A., & Migliorati, S. (2013). A generalization of the Dirichlet distribution. Journal of Multivariate Analysis, 114, 412\u2013426.","journal-title":"Journal of Multivariate Analysis"},{"key":"9480_CR31","doi-asserted-by":"crossref","first-page":"749","DOI":"10.1007\/s11222-019-09920-x","volume":"30","author":"A Ongaro","year":"2020","unstructured":"Ongaro, A., Migliorati, S., & Ascari, R. (2020). A new mixture model on the simplex. Statistics and Computing, 30, 749\u2013770.","journal-title":"Statistics and Computing"},{"issue":"104","key":"9480_CR32","first-page":"832","volume":"187","author":"F Ouimet","year":"2022","unstructured":"Ouimet, F., & Tolosana-Delgado, R. (2022). Asymptotic properties of Dirichlet kernel density estimators. Journal of Multivariate Analysis, 187(104), 832.","journal-title":"Journal of Multivariate Analysis"},{"key":"9480_CR33","doi-asserted-by":"crossref","unstructured":"Pal, S., Heumann, C. (2022) Clustering compositional data using Dirichlet mixture model. Plos one 17(5):e0268,438","DOI":"10.1371\/journal.pone.0268438"},{"key":"9480_CR34","doi-asserted-by":"crossref","unstructured":"Pawlowsky-Glahn, V., Buccianti, A. (2011) Compositional data analysis. Wiley Online Library","DOI":"10.1002\/9781119976462"},{"key":"9480_CR35","doi-asserted-by":"crossref","first-page":"339","DOI":"10.1023\/A:1008981510081","volume":"10","author":"D Peel","year":"2000","unstructured":"Peel, D., & McLachlan, G. J. (2000). Robust mixture modelling using the t distribution. Statistics and Computing, 10, 339\u2013348.","journal-title":"Statistics and Computing"},{"issue":"7","key":"9480_CR36","doi-asserted-by":"crossref","first-page":"1260","DOI":"10.1080\/02664763.2018.1542668","volume":"46","author":"A Punzo","year":"2019","unstructured":"Punzo, A. (2019). A new look at the inverse Gaussian distribution with applications to insurance and economic data. Journal of Applied Statistics, 46(7), 1260\u20131287.","journal-title":"Journal of Applied Statistics"},{"issue":"6","key":"9480_CR37","doi-asserted-by":"crossref","first-page":"1506","DOI":"10.1002\/bimj.201500144","volume":"58","author":"A Punzo","year":"2016","unstructured":"Punzo, A., & McNicholas, P. D. (2016). Parsimonious mixtures of multivariate contaminated normal distributions. Biometrical Journal, 58(6), 1506\u20131537.","journal-title":"Biometrical Journal"},{"key":"9480_CR38","unstructured":"R Core Team (2021) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria, https:\/\/www.R-project.org\/"},{"issue":"5","key":"9480_CR39","first-page":"2042","volume":"33","author":"S Ray","year":"2005","unstructured":"Ray, S., & Lindsay, B. G. (2005). The topography of multivariate normal mixtures. Annals of Statistics, 33(5), 2042\u20132065.","journal-title":"Annals of Statistics"},{"issue":"2","key":"9480_CR40","doi-asserted-by":"crossref","first-page":"461","DOI":"10.1214\/aos\/1176344136","volume":"6","author":"G Schwarz","year":"1978","unstructured":"Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2), 461\u2013464.","journal-title":"The Annals of Statistics"},{"key":"9480_CR41","volume-title":"robCompositions: An R-package for robust statistical analysis of compositional data","author":"M Templ","year":"2011","unstructured":"Templ, M., Hron, K., & Filzmoser, P. (2011). robCompositions: An R-package for robust statistical analysis of compositional data. John Wiley and Sons."},{"issue":"2","key":"9480_CR42","doi-asserted-by":"crossref","first-page":"219","DOI":"10.1093\/petrology\/13.2.219","volume":"13","author":"R Thompson","year":"1972","unstructured":"Thompson, R., Esson, J., & Dunham, A. (1972). Major element chemical variation in the Eocene lavas of the Isle of Skye. Scotland. Journal of Petrology, 13(2), 219\u2013253.","journal-title":"Scotland. Journal of Petrology"},{"key":"9480_CR43","volume-title":"Statistical analysis of finite mixture distributions","author":"DM Titterington","year":"1985","unstructured":"Titterington, D. M., Smith, A. F. M., & Makov, U. E. (1985). Statistical analysis of finite mixture distributions. New York: John Wiley & Sons."},{"issue":"4","key":"9480_CR44","doi-asserted-by":"crossref","first-page":"1247","DOI":"10.1111\/rssa.12466","volume":"182","author":"SD Tomarchio","year":"2019","unstructured":"Tomarchio, S. D., & Punzo, A. (2019). Modelling the loss given default distribution via a family of zero-and-one inflated mixture models. Journal of the Royal Statistical Society: Series A (Statistics in Society), 182(4), 1247\u20131266.","journal-title":"Journal of the Royal Statistical Society: Series A (Statistics in Society)"},{"issue":"13\u201315","key":"9480_CR45","doi-asserted-by":"crossref","first-page":"2328","DOI":"10.1080\/02664763.2020.1789076","volume":"47","author":"SD Tomarchio","year":"2020","unstructured":"Tomarchio, S. D., & Punzo, A. (2020). Dichotomous unimodal compound models: Application to the distribution of insurance losses. Journal of Applied Statistics, 47(13\u201315), 2328\u20132353.","journal-title":"Journal of Applied Statistics"},{"key":"9480_CR46","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-642-36809-7","volume-title":"Analyzing compositional data with R,","author":"KG Van den Boogaart","year":"2013","unstructured":"Van den Boogaart, K. G., & Tolosana-Delgado, R. (2013). Analyzing compositional data with R, (Vol. 122). Springer."}],"container-title":["Journal of Classification"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00357-024-09480-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s00357-024-09480-4\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00357-024-09480-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,19]],"date-time":"2025-03-19T08:10:32Z","timestamp":1742371832000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s00357-024-09480-4"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,7,2]]},"references-count":46,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2025,3]]}},"alternative-id":["9480"],"URL":"https:\/\/doi.org\/10.1007\/s00357-024-09480-4","relation":{},"ISSN":["0176-4268","1432-1343"],"issn-type":[{"value":"0176-4268","type":"print"},{"value":"1432-1343","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,7,2]]},"assertion":[{"value":"8 June 2024","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"2 July 2024","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"24 November 2024","order":3,"name":"change_date","label":"Change Date","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"Update","order":4,"name":"change_type","label":"Change Type","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"Missing Open Access funding information has been added in the Funding Note.","order":5,"name":"change_details","label":"Change Details","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing Interests"}}]}}