{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,17]],"date-time":"2026-03-17T22:42:46Z","timestamp":1773787366839,"version":"3.50.1"},"reference-count":33,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2023,4,3]],"date-time":"2023-04-03T00:00:00Z","timestamp":1680480000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>Gaussian mixture modeling is a generative probabilistic model that assumes that the observed data are generated from a mixture of multiple Gaussian distributions. This mixture model provides a flexible approach to model complex distributions that may not be easily represented by a single Gaussian distribution. The Gaussian mixture model with a noise component refers to a finite mixture that includes an additional noise component to model the background noise or outliers in the data. This additional noise component helps to take into account the presence of anomalies or outliers in the data. This latter aspect is crucial for anomaly detection in situations where a clear, early warning of an abnormal condition is required. This paper proposes a novel entropy-based procedure for initializing the noise component in Gaussian mixture models. Our approach is shown to be easy to implement and effective for anomaly detection. We successfully identify anomalies in both simulated and real-world datasets, even in the presence of significant levels of noise and outliers. We provide a step-by-step description of the proposed data analysis process, along with the corresponding R code, which is publicly available in a GitHub repository.<\/jats:p>","DOI":"10.3390\/a16040195","type":"journal-article","created":{"date-parts":[[2023,4,4]],"date-time":"2023-04-04T01:35:59Z","timestamp":1680572159000},"page":"195","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":11,"title":["Entropy-Based Anomaly Detection for Gaussian Mixture Modeling"],"prefix":"10.3390","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3826-0484","authenticated-orcid":false,"given":"Luca","family":"Scrucca","sequence":"first","affiliation":[{"name":"Department of Economics, Universit\u00e0 degli Studi di Perugia, Via A. Pascoli 20, 06123 Perugia, Italy"}]}],"member":"1968","published-online":{"date-parts":[[2023,4,3]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"McLachlan, G.J., and Peel, D. (2000). Finite Mixture Models, Wiley.","DOI":"10.1002\/0471721182"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"977","DOI":"10.1093\/bioinformatics\/17.10.977","article-title":"Model-based clustering and data transformations for gene expression data","volume":"17","author":"Yeung","year":"2001","journal-title":"Bioinformatics"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"413","DOI":"10.1093\/bioinformatics\/18.3.413","article-title":"A mixture model-based approach to the clustering of microarray expression data","volume":"18","author":"McLachlan","year":"2002","journal-title":"Bioinformatics"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Najarian, K., Zaheri, M., Rad, A.A., Najarian, S., and Dargahi, J. (2004). A novel mixture model method for identification of differentially expressed genes from DNA microarray data. BMC Bioinform., 5.","DOI":"10.1186\/1471-2105-5-201"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Ko, Y., Zhai, C., and Rodriguez-Zas, S.L. (2007, January 2\u20134). Inference of gene pathways using Gaussian mixture models. Proceedings of the 2007 IEEE International Conference on Bioinformatics and Biomedicine (BIBM 2007), Fremont, CA, USA.","DOI":"10.1109\/BIBM.2007.59"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"2184","DOI":"10.1093\/bioinformatics\/btn396","article-title":"Mixture models for protein structure ensembles","volume":"24","author":"Hirsch","year":"2008","journal-title":"Bioinformatics"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"294","DOI":"10.1080\/01621459.1998.10474110","article-title":"Detecting features in spatial point processes with clutter via model-based clustering","volume":"93","author":"Dasgupta","year":"1998","journal-title":"J. Am. Stat. Assoc."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"578","DOI":"10.1093\/comjnl\/41.8.578","article-title":"How many clusters? Which clustering method? Answers via model-based cluster analysis","volume":"41","author":"Fraley","year":"1998","journal-title":"Comput. J."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"1648","DOI":"10.1080\/01621459.2015.1100996","article-title":"Robust improper maximum likelihood: Tuning, computation, and a comparison with other methods for robust Gaussian clustering","volume":"111","author":"Coretto","year":"2016","journal-title":"J. Am. Stat. Assoc."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"1081","DOI":"10.1111\/biom.12351","article-title":"Mixtures of multivariate power exponential distributions","volume":"71","author":"Dang","year":"2015","journal-title":"Biometrics"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"1506","DOI":"10.1002\/bimj.201500144","article-title":"Parsimonious mixtures of multivariate contaminated normal distributions","volume":"58","author":"Punzo","year":"2016","journal-title":"Biom. J."},{"key":"ref_12","first-page":"1324","article-title":"A general trimming approach to robust cluster analysis","volume":"36","author":"Gordaliza","year":"2008","journal-title":"Ann. Stat."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"414","DOI":"10.1080\/00949655.2018.1554659","article-title":"Robust inference for parsimonious model-based clustering","volume":"89","author":"Dotto","year":"2019","journal-title":"J. Stat. Comput. Simul."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"989","DOI":"10.1007\/s11749-019-00693-z","article-title":"Robust model-based clustering with mild and gross outliers","volume":"29","author":"Farcomeni","year":"2020","journal-title":"TEST"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"611","DOI":"10.1198\/016214502760047131","article-title":"Model-based clustering, discriminant analysis, and density estimation","volume":"97","author":"Fraley","year":"2002","journal-title":"J. Am. Stat. Assoc."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"803","DOI":"10.2307\/2532201","article-title":"Model-based Gaussian and non-Gaussian clustering","volume":"49","author":"Banfield","year":"1993","journal-title":"Biometrics"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"781","DOI":"10.1016\/0031-3203(94)00125-6","article-title":"Gaussian parsimonious clustering models","volume":"28","author":"Celeux","year":"1995","journal-title":"Pattern Recognit."},{"key":"ref_18","unstructured":"R Core Team (2022). R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing."},{"key":"ref_19","unstructured":"Fraley, C., Raftery, A.E., and Scrucca, L. (2022). mclust: Gaussian Mixture Modelling for Model-Based Clustering, Classification, and Density Estimation, R Foundation for Statistical Computing. R Package Version 6.0.0."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"205","DOI":"10.32614\/RJ-2016-021","article-title":"mclust 5: Clustering, classification and density estimation using Gaussian finite mixture models","volume":"8","author":"Scrucca","year":"2016","journal-title":"R J."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1111\/j.2517-6161.1977.tb01600.x","article-title":"Maximum likelihood from incomplete data via the EM algorithm (with discussion)","volume":"39","author":"Dempster","year":"1977","journal-title":"J. R. Stat. Soc. Ser. B Stat. Methodol."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"McLachlan, G., and Krishnan, T. (2008). The EM Algorithm and Extensions, Wiley-Interscience. [2nd ed.].","DOI":"10.1002\/9780470191613"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"461","DOI":"10.1214\/aos\/1176344136","article-title":"Estimating the dimension of a model","volume":"6","author":"Schwarz","year":"1978","journal-title":"Ann. Stat."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"719","DOI":"10.1109\/34.865189","article-title":"Assessing a mixture model for clustering with the integrated completed likelihood","volume":"22","author":"Biernacki","year":"2000","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_25","first-page":"1485","article-title":"Nonparametric maximum likelihood estimation of features in spatial point processes using Vorono\u00ef tessellation","volume":"92","author":"Allard","year":"1997","journal-title":"J. Am. Stat. Assoc."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"577","DOI":"10.1080\/01621459.1998.10473711","article-title":"Nearest-neighbor clutter removal for estimating features in spatial point processes","volume":"93","author":"Byers","year":"1998","journal-title":"J. Am. Stat. Assoc."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"994","DOI":"10.1198\/016214502388618780","article-title":"Nearest neighbor variance estimation (NNVE): Robust covariance estimation via nearest neighbor cleaning (with discussion)","volume":"97","author":"Wang","year":"2002","journal-title":"J. Am. Stat. Assoc."},{"key":"ref_28","unstructured":"Cover, T.M., and Thomas, J.A. (2006). Elements of Information Theory, John Wiley & Sons. [2nd ed.]."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Michalowicz, J.V., Nichols, J.M., and Bucholtz, F. (2014). Handbook of Differential Entropy, Chapman & Hall\/CRC.","DOI":"10.1201\/b15991"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"107582","DOI":"10.1016\/j.csda.2022.107582","article-title":"Mixture-based estimation of entropy","volume":"177","author":"Robin","year":"2023","journal-title":"Comput. Stat. Data Anal."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"270","DOI":"10.1137\/S1064827596311451","article-title":"Algorithms for model-based Gaussian hierarchical clustering","volume":"20","author":"Fraley","year":"1998","journal-title":"SIAM J. Sci. Comput."},{"key":"ref_32","unstructured":"Dua, D., and Graff, C. (2023, January 15). UCI Machine Learning Repository. Available online: http:\/\/archive.ics.uci.edu\/ml."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"570","DOI":"10.1287\/opre.43.4.570","article-title":"Breast cancer diagnosis and prognosis via linear programming","volume":"43","author":"Mangasarian","year":"1995","journal-title":"Oper. Res."}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/16\/4\/195\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T19:09:14Z","timestamp":1760123354000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/16\/4\/195"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,4,3]]},"references-count":33,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2023,4]]}},"alternative-id":["a16040195"],"URL":"https:\/\/doi.org\/10.3390\/a16040195","relation":{},"ISSN":["1999-4893"],"issn-type":[{"value":"1999-4893","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,4,3]]}}}