{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,11]],"date-time":"2026-03-11T01:54:04Z","timestamp":1773194044253,"version":"3.50.1"},"reference-count":38,"publisher":"MDPI AG","issue":"9","license":[{"start":{"date-parts":[[2024,9,4]],"date-time":"2024-09-04T00:00:00Z","timestamp":1725408000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["BDCC"],"abstract":"<jats:p>Learning distance metrics and distinguishing between samples from different classes are among the most important topics in machine learning. This article proposes a new distance metric learning approach tailored for highly imbalanced datasets. Imbalanced datasets suffer from a lack of data in the minority class, and the differences in class density strongly affect the efficiency of the classification algorithms. Therefore, the density of the classes is considered the main basis of learning the new distance metric. It is possible that the data of one class are composed of several densities, that is, the class is a combination of several normal distributions with different means and variances. In this paper, considering that classes may be multimodal, the distribution of each class is assumed in the form of a mixture of multivariate Gaussian densities. A density-based clustering algorithm is used for determining the number of components followed by the estimation of the parameters of the Gaussian components using maximum a posteriori density estimation. Then, the Bhattacharya distance between the Gaussian mixtures of the classes is maximized using an iterative scheme. To reach a large between-class margin, the distance between the external components is increased while decreasing the distance between the internal components. The proposed method is evaluated on 15 imbalanced datasets using the k-nearest neighbor (KNN) classifier. The results of the experiments show that using the proposed method significantly improves the efficiency of the classifier in imbalance classification problems. Also, when the imbalance ratio is very high and it is not possible to correctly identify minority class samples, the proposed method still provides acceptable performance.<\/jats:p>","DOI":"10.3390\/bdcc8090109","type":"journal-article","created":{"date-parts":[[2024,9,4]],"date-time":"2024-09-04T10:58:46Z","timestamp":1725447526000},"page":"109","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Supervised Density-Based Metric Learning Based on Bhattacharya Distance for Imbalanced Data Classification Problems"],"prefix":"10.3390","volume":"8","author":[{"given":"Atena","family":"Jalali Mojahed","sequence":"first","affiliation":[{"name":"Department of Computer Engineering, Ferdows Branch, Islamic Azad University, Ferdows, Iran"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8968-6744","authenticated-orcid":false,"given":"Mohammad Hossein","family":"Moattar","sequence":"additional","affiliation":[{"name":"Department of Computer Engineering, Mashhad Branch, Islamic Azad University, Mashhad, Iran"}]},{"given":"Hamidreza","family":"Ghaffari","sequence":"additional","affiliation":[{"name":"Department of Computer Engineering, Ferdows Branch, Islamic Azad University, Ferdows, Iran"}]}],"member":"1968","published-online":{"date-parts":[[2024,9,4]]},"reference":[{"key":"ref_1","unstructured":"Russell, S.J., and Norvig, P. (2016). Artificial Intelligence: A Modern Approach, Pearson Education Limited."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Duin, R.P., and Tax, D.M.J. (2005). Statistical pattern recognition. Handbook of Pattern Recognition and Computer Vision, World Scientific Pub Co Inc.","DOI":"10.1142\/9789812775320_0001"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"He, H., and Ma, Y. (2013). Imbalanced Learning: Foundations, Algorithms, and Applications, John Wiley & Sons.","DOI":"10.1002\/9781118646106"},{"key":"ref_4","first-page":"1263","article-title":"Learning from imbalanced data","volume":"21","author":"He","year":"2008","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"ref_5","first-page":"176","article-title":"Classification with class imbalance problem: A review","volume":"7","author":"Ali","year":"2015","journal-title":"Int. J. Adv. Soft Comput. Its Appl."},{"key":"ref_6","unstructured":"Nguyen, G.H., Bouzerdoum, A., and Phung, S.L. (2009). Learning pattern classification tasks with imbalanced datasets. Pattern Recognition, InTech."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"427","DOI":"10.1016\/j.neunet.2007.12.031","article-title":"Training neural network classifiers for medical decision making: The effects of imbalanced datasets on classification performance","volume":"21","author":"Mazurowski","year":"2008","journal-title":"Neural Netw."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"449","DOI":"10.1007\/s11280-012-0178-0","article-title":"Effective detection of sophisticated online banking fraud on extremely imbalanced data","volume":"16","author":"Wei","year":"2013","journal-title":"World Wide Web"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Li, Y., Sun, G., and Zhu, Y. (2010, January 15\u201317). Data imbalance problem in text classification. Proceedings of the Information Processing (ISIP), 2010 Third International Symposium on Information Processing, Qingdao, China.","DOI":"10.1109\/ISIP.2010.47"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"936","DOI":"10.1016\/j.cherd.2010.01.005","article-title":"Fault diagnosis based on imbalance modified kernel Fisher discriminant analysis","volume":"88","author":"Zhu","year":"2010","journal-title":"Chem. Eng. Res. Des."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"516","DOI":"10.1109\/TSMCC.2010.2048428","article-title":"Toward credible evaluation of anomaly-based intrusion-detection methods","volume":"40","author":"Tavallaee","year":"2010","journal-title":"IEEE Trans. Syst. Man Cybern. Part C (Appl. Rev.)"},{"key":"ref_12","first-page":"25","article-title":"Handling imbalanced datasets: A review","volume":"30","author":"Kotsiantis","year":"2006","journal-title":"GESTS Int. Trans. Comput. Sci. Eng."},{"key":"ref_13","unstructured":"Xing, E.P., Jordan, M.I., Russell, S.J., and Ng, A.Y. (2003). Distance metric learning with application to clustering with side-information. Advances in Neural Information Processing Systems, Mit Pr."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Bellet, A., Habrard, A., and Sebban, M. (2015). Metric Learning, Synthesis Lectures on Artificial Intelligence and Machine Learning; Springer.","DOI":"10.1007\/978-3-031-01572-4"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"447","DOI":"10.1016\/j.neunet.2018.06.003","article-title":"Survey and experimental study on metric learning methods","volume":"105","author":"Li","year":"2018","journal-title":"Neural Netw."},{"key":"ref_16","unstructured":"Weinberger, K.Q., Blitzer, J., and Saul, L.K. (2006). Distance metric learning for large margin nearest neighbor classification. Advances in Neural Information Processing Systems, Mit Pr."},{"key":"ref_17","first-page":"207","article-title":"Distance metric learning for large margin nearest neighbor classification","volume":"10","author":"Weinberger","year":"2009","journal-title":"J. Mach. Learn. Res."},{"key":"ref_18","unstructured":"Zadeh, P., Hosseini, R., and Sra, S. (2016, January 19\u201324). Geometric mean metric learning. Proceedings of the International Conference on Machine Learning, PMLR, New York City, NY, USA."},{"key":"ref_19","first-page":"1","article-title":"Distance metric learning with eigenvalue optimization","volume":"13","author":"Ying","year":"2012","journal-title":"J. Mach. Learn. Res."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"215","DOI":"10.1016\/j.patcog.2016.11.010","article-title":"Supervised distance metric learning through maximization of the Jeffrey divergence","volume":"64","author":"Nguyen","year":"2017","journal-title":"Pattern Recognit."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Davis, J.V., Kulis, B., Jain, P., Sra, S., and Dhillon, I.S. (2007, January 17\u201324). Information-theoretic metric learning. Proceedings of the 24th international conference on Machine learning, Corvallis, OR, USA.","DOI":"10.1145\/1273496.1273523"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"844","DOI":"10.1016\/j.patcog.2011.07.026","article-title":"A boosting approach for supervised Mahalanobis distance metric learning","volume":"45","author":"Chang","year":"2012","journal-title":"Pattern Recognit."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"2460","DOI":"10.1109\/TCSVT.2017.2726526","article-title":"SLMOML: Online Metric Learning With Global Convergence","volume":"28","author":"Zhong","year":"2018","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Liu, W., and Tsang, I.W. (2015, January 25\u201330). Large Margin Metric Learning for Multi-Label Prediction. Proceedings of the AAAI, Austin, TX, USA.","DOI":"10.1609\/aaai.v29i1.9610"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Kaya, M., and Bilge, H.\u015e. (2019). Deep metric learning: A survey. Symmetry, 11.","DOI":"10.3390\/sym11091066"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"300","DOI":"10.1016\/j.neucom.2020.08.017","article-title":"A tutorial on distance metric learning: Mathematical foundations, algorithms, experimental analysis, prospects and challenges","volume":"425","author":"Herrera","year":"2021","journal-title":"Neurocomputing"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Ghojogh, B., Ghodsi, A., Karray, F., and Crowley, M. (2022). Spectral, Probabilistic, and Deep Metric Learning: Tutorial and Survey. arXiv.","DOI":"10.1007\/978-3-031-10602-6_11"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"217","DOI":"10.1016\/j.neucom.2019.05.019","article-title":"Hyperspectral imagery classification with deep metric learning","volume":"356","author":"Cao","year":"2019","journal-title":"Neurocomputing"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Wang, N., Zhao, X., Jiang, Y., and Gao, Y. (2018, January 13\u201319). Iterative Metric Learning for Imbalance Data Classification. Proceedings of the 2018 International Joint Conference on Artificial Intelligence IJCAI, Stockholm, Sweden.","DOI":"10.24963\/ijcai.2018\/389"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"2384","DOI":"10.1109\/TSMC.2018.2790914","article-title":"Learning a Distance Metric by Balancing KL-Divergence for Imbalanced Datasets","volume":"49","author":"Feng","year":"2018","journal-title":"IEEE Trans. Syst. Man Cybern. Syst."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"298","DOI":"10.1016\/j.patrec.2020.03.008","article-title":"Metric learning from imbalanced data with generalization guarantees","volume":"133","author":"Gautheron","year":"2020","journal-title":"Pattern Recognit. Lett."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"1487","DOI":"10.1007\/s10489-022-03494-4","article-title":"Borderline-margin loss based deep metric learning framework for imbalanced data","volume":"53","author":"Yan","year":"2022","journal-title":"Appl. Intell."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s40537-022-00617-z","article-title":"Improved cost-sensitive representation of data for solving the imbalanced big data classification problem","volume":"9","author":"Fattahi","year":"2022","journal-title":"J. Big Data"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Wang, K.F., An, J., Wei, Z., Cui, C., Ma, X.H., Ma, C., and Bao, H.Q. (2022). Deep learning-based imbalanced classification with fuzzy support vector machine. Front. Bioeng. Biotechnol., 9.","DOI":"10.3389\/fbioe.2021.802712"},{"key":"ref_35","unstructured":"(2024, July 22). UCI Machine Learning Repository. Available online: https:\/\/archive.ics.uci.edu\/ml\/index.php."},{"key":"ref_36","unstructured":"Navarro, J.R.D., and Noche, J.R. (2024, July 22). Classification of Mixtures of Student Grade Distributions Based on The Gaussian Mixture Model Using The Expectation-Maximization Algorithm. Available online: https:\/\/www.researchgate.net\/publication\/2922541_Classification_of_Mixtures_of_Student_Grade_Distributions_Based_on_the_Gaussian_Mixture_Model_Using_the_Expectation-Maximization_Algorithm."},{"key":"ref_37","unstructured":"Ester, M., Kriegel, H.P., Sander, J., and Xu, X. A density-based algorithm for discovering clusters in large spatial databases with noise. Proceedings of the KDD\u201996: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, Portland, OR, USA, 2\u20134 August 1996."},{"key":"ref_38","first-page":"99","article-title":"On a measure of divergence between two statistical populations defined by their probability distributions","volume":"35","author":"Bhattacharyya","year":"1943","journal-title":"Bull. Calcutta Math. Soc."}],"container-title":["Big Data and Cognitive Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-2289\/8\/9\/109\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T15:48:44Z","timestamp":1760111324000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-2289\/8\/9\/109"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,9,4]]},"references-count":38,"journal-issue":{"issue":"9","published-online":{"date-parts":[[2024,9]]}},"alternative-id":["bdcc8090109"],"URL":"https:\/\/doi.org\/10.3390\/bdcc8090109","relation":{},"ISSN":["2504-2289"],"issn-type":[{"value":"2504-2289","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,9,4]]}}}