{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,25]],"date-time":"2026-03-25T14:26:27Z","timestamp":1774448787313,"version":"3.50.1"},"reference-count":23,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2025,3,13]],"date-time":"2025-03-13T00:00:00Z","timestamp":1741824000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Big Data"],"abstract":"<jats:p>The exponential growth of image and video data motivates the need for practical real-time content-based searching algorithms. Features play a vital role in identifying objects within images. However, feature-based classification faces a challenge due to uneven class instance distribution. Ideally, each class should have an equal number of instances and features to ensure optimal classifier performance. However, real-world scenarios often exhibit class imbalances. Thus, this article explores the classification framework based on image features, analyzing balanced and imbalanced distributions. Through extensive experimentation, we examine the impact of class imbalance on image classification performance, primarily on large datasets. The comprehensive evaluation shows that all models perform better with balancing compared to using an imbalanced dataset, underscoring the importance of dataset balancing for model accuracy. Distributed Gaussian (D-GA) and Distributed Poisson (D-PO) are found to be the most effective techniques, especially in improving Random Forest (RF) and SVM models. The deep learning experiments also show an improvement as such.<\/jats:p>","DOI":"10.3389\/fdata.2025.1455442","type":"journal-article","created":{"date-parts":[[2025,3,13]],"date-time":"2025-03-13T12:57:42Z","timestamp":1741870662000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":13,"title":["Impact of imbalanced features on large datasets"],"prefix":"10.3389","volume":"8","author":[{"given":"Waleed","family":"Albattah","sequence":"first","affiliation":[]},{"given":"Rehan Ullah","family":"Khan","sequence":"additional","affiliation":[]}],"member":"1965","published-online":{"date-parts":[[2025,3,13]]},"reference":[{"key":"B1","doi-asserted-by":"publisher","first-page":"90","DOI":"10.51967\/tepian.v4i2.2216","article-title":"Hyperparameter tuning deep learning for imbalanced data","volume":"4","author":"Achmad","year":"2023","journal-title":"Tepian"},{"key":"B2","author":"Asokan","year":"2021","journal-title":"Handling Class Imbalance Using Generative Adversarial Network (GAN) and Convolutional Neural Network (CNN)"},{"key":"B3","doi-asserted-by":"publisher","first-page":"453","DOI":"10.1016\/j.cviu.2012.09.007","article-title":"Pooling in image representation: the visual codeword point of view","volume":"117","author":"Avila","year":"2013","journal-title":"Comput. Vis. Image Underst."},{"key":"B4","doi-asserted-by":"publisher","first-page":"8614","DOI":"10.3390\/app13158614","article-title":"Sensitivity of modern deep learning neural networks to unbalanced datasets in multiclass classification problems","volume":"13","author":"Barulina","year":"2023","journal-title":"Appl. Sci"},{"key":"B5","doi-asserted-by":"publisher","first-page":"249","DOI":"10.1016\/j.neunet.2018.07.011","article-title":"A systematic study of the class imbalance problem in convolutional neural networks","volume":"106","author":"Buda","year":"2018","journal-title":"Neural Netw"},{"key":"B6","first-page":"1","article-title":"Gender shades: intersectional accuracy disparities in commercial gender classification","volume":"81","author":"Buolamwini","year":"2018","journal-title":"Proc. Mach. Learn. Res."},{"key":"B7","doi-asserted-by":"publisher","first-page":"321","DOI":"10.1613\/jair.953","article-title":"SMOTE: synthetic minority over-sampling technique","volume":"16","author":"Chawla","year":"2002","journal-title":"J. Artif. Intell. Res"},{"key":"B8","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1007\/3-540-45014-9_1","article-title":"\u201cEnsemble methods in machine learning,\u201d","volume-title":"International Workshop on Multiple Classifier Systems","author":"Dietterich","year":"2000"},{"key":"B9","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1155\/2021\/1735386","article-title":"Classification of imbalanced data using deep learning with adding noise","volume":"2021","author":"Fan","year":"2021","journal-title":"J. Sens"},{"key":"B10","article-title":"Generative adversarial nets","author":"Goodfellow","year":"2014","journal-title":"Adv. Neural Inf. Process. Syst"},{"key":"B11","doi-asserted-by":"publisher","first-page":"1263","DOI":"10.1109\/TKDE.2008.239","article-title":"Learning from imbalanced data","volume":"21","author":"He","year":"2009","journal-title":"IEEE Trans. Knowl. Data Eng"},{"key":"B12","doi-asserted-by":"publisher","first-page":"e16807","DOI":"10.1016\/j.heliyon.2023.e16807","article-title":"Dynamic Learning for Imbalance data in learning chest x-ray and CT images","volume":"9","author":"Iqbal","year":"2023","journal-title":"Heliyon"},{"key":"B13","doi-asserted-by":"publisher","first-page":"429","DOI":"10.3233\/IDA-2002-6504","article-title":"The class imbalance problem: a systematic study","volume":"6","author":"Japkowicz","year":"2002","journal-title":"Intell. Data Anal"},{"key":"B14","doi-asserted-by":"publisher","first-page":"436","DOI":"10.1038\/nature14539","article-title":"Deep learning","volume":"521","author":"LeCun","year":"2015","journal-title":"Nature"},{"key":"B15","doi-asserted-by":"publisher","first-page":"60","DOI":"10.1016\/j.media.2017.07.005","article-title":"A survey of deep learning in medical image analysis","volume":"42","author":"Litjens","year":"2017","journal-title":"Med. Image Anal."},{"key":"B16","doi-asserted-by":"publisher","first-page":"539","DOI":"10.1109\/TSMCB.2008.2007853","article-title":"Exploratory undersampling for class-imbalance learning","volume":"39","author":"Liu","year":"2009","journal-title":"IEEE Trans. Syst. Man Cybern. B"},{"key":"B17","author":"Masko","year":"2015","journal-title":"The Impact of Imbalanced Training Data for Convolutional Neural Networks"},{"key":"B18","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s40645-021-00459-y","article-title":"Classification of imbalanced cloud image data using deep neural networks: performance improvement through a data science competition","volume":"8","author":"Matsuoka","year":"2021","journal-title":"Prog. Earth Planet. Sci"},{"key":"B19","doi-asserted-by":"publisher","first-page":"1345","DOI":"10.1109\/TKDE.2009.191","article-title":"A survey on transfer learning","volume":"22","author":"Pan","year":"2010","journal-title":"IEEE Trans. Knowl. Data Eng"},{"key":"B20","article-title":"The effectiveness of data augmentation in image classification using deep learning","author":"Perez","year":"2017","journal-title":"arXiv [Preprint]"},{"key":"B21","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-319-59650-1_19","article-title":"\u201cOn the impact of imbalanced data in convolutional neural networks performance,\u201d","volume-title":"Hybrid Artificial Intelligent Systems. HAIS 2017. Lecture Notes in Computer Science","author":"Pulgar","year":"2017"},{"key":"B22","author":"Zhang","year":"2019","journal-title":"Medical Image Classification Under Class Imbalance"},{"key":"B23","doi-asserted-by":"publisher","first-page":"5449","DOI":"10.1007\/s10489-022-03953-y","article-title":"An empirical study on the joint impact of feature selection and data resampling on imbalance classification","volume":"53","author":"Zhang","year":"2023","journal-title":"Appl. Intell"}],"container-title":["Frontiers in Big Data"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fdata.2025.1455442\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,13]],"date-time":"2025-03-13T12:57:47Z","timestamp":1741870667000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fdata.2025.1455442\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,3,13]]},"references-count":23,"alternative-id":["10.3389\/fdata.2025.1455442"],"URL":"https:\/\/doi.org\/10.3389\/fdata.2025.1455442","relation":{},"ISSN":["2624-909X"],"issn-type":[{"value":"2624-909X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,3,13]]},"article-number":"1455442"}}