{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,30]],"date-time":"2025-10-30T17:36:10Z","timestamp":1761845770335,"version":"build-2065373602"},"reference-count":24,"publisher":"MDPI AG","issue":"12","license":[{"start":{"date-parts":[[2023,12,11]],"date-time":"2023-12-11T00:00:00Z","timestamp":1702252800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>Imbalanced data present a pervasive challenge in many real-world applications of statistical and machine learning, where the instances of one class significantly outnumber those of the other. This paper examines the impact of class imbalance on the performance of Gaussian mixture models in classification tasks and establishes the need for a strategy to reduce the adverse effects of imbalanced data on the accuracy and reliability of classification outcomes. We explore various strategies to address this problem, including cost-sensitive learning, threshold adjustments, and sampling-based techniques. Through extensive experiments on synthetic and real-world datasets, we evaluate the effectiveness of these methods. Our findings emphasize the need for effective mitigation strategies for class imbalance in supervised Gaussian mixtures, offering valuable insights for practitioners and researchers in improving classification outcomes.<\/jats:p>","DOI":"10.3390\/a16120563","type":"journal-article","created":{"date-parts":[[2023,12,11]],"date-time":"2023-12-11T13:18:21Z","timestamp":1702300701000},"page":"563","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["On the Influence of Data Imbalance on Supervised Gaussian Mixture Models"],"prefix":"10.3390","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3826-0484","authenticated-orcid":false,"given":"Luca","family":"Scrucca","sequence":"first","affiliation":[{"name":"Department of Economics, Universit\u00e0 degli Studi di Perugia, Via A. Pascoli 20, 06123 Perugia, Italy"}]}],"member":"1968","published-online":{"date-parts":[[2023,12,11]]},"reference":[{"doi-asserted-by":"crossref","unstructured":"McLachlan, G.J., and Peel, D. (2000). Finite Mixture Models, Wiley.","key":"ref_1","DOI":"10.1002\/0471721182"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"611","DOI":"10.1198\/016214502760047131","article-title":"Model-based Clustering, Discriminant Analysis, and Density Estimation","volume":"97","author":"Fraley","year":"2002","journal-title":"J. Am. Stat. Assoc."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"355","DOI":"10.1146\/annurev-statistics-031017-100325","article-title":"Finite Mixture Models","volume":"6","author":"McLachlan","year":"2019","journal-title":"Annu. Rev. Stat. Appl."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1111\/j.2517-6161.1977.tb01600.x","article-title":"Maximum likelihood from incomplete data via the EM algorithm (with discussion)","volume":"39","author":"Dempster","year":"1977","journal-title":"J. R. Stat. Soc. Ser. B Stat. Methodol."},{"doi-asserted-by":"crossref","unstructured":"McLachlan, G., and Krishnan, T. (2008). The EM Algorithm and Extensions, Wiley-Interscience. [2nd ed.].","key":"ref_5","DOI":"10.1002\/9780470191613"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"687","DOI":"10.1142\/S0218001409007326","article-title":"Classification of Imbalanced Data: A Review","volume":"23","author":"Sun","year":"2009","journal-title":"Int. J. Pattern Recognit. Artif. Intell."},{"doi-asserted-by":"crossref","unstructured":"Fern\u00e1ndez, A., Garc\u00eda, S., Galar, M., Prati, R.C., Krawczyk, B., and Herrera, F. (2018). Learning from Imbalanced Data Sets, Springer.","key":"ref_7","DOI":"10.1007\/978-3-319-98074-4"},{"doi-asserted-by":"crossref","unstructured":"Pal, B., and Paul, M.K. (2017, January 6\u201318). A Gaussian mixture based boosted classification scheme for imbalanced and oversampled data. Proceedings of the 2017 International Conference on Electrical, Computer and Communication Engineering (ECCE), Cox\u2019s Bazar, Bangladesh.","key":"ref_8","DOI":"10.1109\/ECACE.2017.7912938"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"3687","DOI":"10.1007\/s13042-019-00953-2","article-title":"A Gaussian mixture model based combined resampling algorithm for classification of imbalanced credit data sets","volume":"10","author":"Han","year":"2019","journal-title":"Int. J. Mach. Learn. Cybern."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"155","DOI":"10.1111\/j.2517-6161.1996.tb02073.x","article-title":"Discriminant analysis by Gaussian mixtures","volume":"58","author":"Hastie","year":"1996","journal-title":"J. R. Stat. Soc. Ser. B Stat. Methodol."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"1743","DOI":"10.1080\/01621459.1996.10476746","article-title":"Regularized Gaussian Discriminant Analysis through Eigenvalue Decomposition","volume":"91","author":"Bensmail","year":"1996","journal-title":"J. Am. Stat. Assoc."},{"doi-asserted-by":"crossref","unstructured":"Scrucca, L., Fraley, C., Murphy, T.B., and Raftery, A.E. (2023). Model-Based Clustering, Classification, and Density Estimation Using mclust in R, Chapman & Hall\/CRC.","key":"ref_12","DOI":"10.1201\/9781003277965"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"461","DOI":"10.1214\/aos\/1176344136","article-title":"Estimating the dimension of a model","volume":"6","author":"Schwarz","year":"1978","journal-title":"Ann. Stat."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"205","DOI":"10.32614\/RJ-2016-021","article-title":"mclust 5: Clustering, Classification and Density Estimation Using Gaussian Finite Mixture Models","volume":"8","author":"Scrucca","year":"2016","journal-title":"R J."},{"unstructured":"Fraley, C., Raftery, A.E., and Scrucca, L. (2023). mclust: Gaussian Mixture Modelling for Model-Based Clustering, Classification, and Density Estimation, R Foundation. R Package Version 6.0.0.","key":"ref_15"},{"unstructured":"R Core Team (2022). R: A Language and Environment for Statistical Computing, R Foundation.","key":"ref_16"},{"unstructured":"Provost, F. (2000, January 31). Machine learning from imbalanced data sets 101. Proceedings of the AAAI Workshop on Imbalanced Data Sets, Austin, TX, USA.","key":"ref_17"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1162\/089976602753284446","article-title":"Adjusting the outputs of a classifier to new a priori probabilities: A simple procedure","volume":"14","author":"Saerens","year":"2002","journal-title":"Neural Comput."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"321","DOI":"10.1613\/jair.953","article-title":"SMOTE: Synthetic minority over-sampling technique","volume":"16","author":"Chawla","year":"2002","journal-title":"J. Artif. Intell. Res."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"92","DOI":"10.1007\/s10618-012-0295-5","article-title":"Training and assessing classification rules with imbalanced data","volume":"28","author":"Menardi","year":"2012","journal-title":"Data Min. Knowl. Discov."},{"unstructured":"Murphy, K.P. (2022). Probabilistic Machine Learning: An Introduction, MIT Press.","key":"ref_21"},{"unstructured":"Cortez, P., Cerdeira, A., Almeida, F., Matos, T., and Reis, J. (2009). Wine Quality Data, UC Irvine. UCI Machine Learning Repository.","key":"ref_22"},{"unstructured":"Dua, D., and Graff, C. (2023, September 14). UCI Machine Learning Repository. Available online: http:\/\/archive.ics.uci.edu\/ml.","key":"ref_23"},{"unstructured":"Quinlan, R. (1987). Thyroid Disease, UC Irvine. UCI Machine Learning Repository.","key":"ref_24"}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/16\/12\/563\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T21:36:44Z","timestamp":1760132204000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/16\/12\/563"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,12,11]]},"references-count":24,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2023,12]]}},"alternative-id":["a16120563"],"URL":"https:\/\/doi.org\/10.3390\/a16120563","relation":{},"ISSN":["1999-4893"],"issn-type":[{"type":"electronic","value":"1999-4893"}],"subject":[],"published":{"date-parts":[[2023,12,11]]}}}