{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,14]],"date-time":"2025-11-14T05:19:28Z","timestamp":1763097568280,"version":"3.45.0"},"reference-count":51,"publisher":"MDPI AG","issue":"11","license":[{"start":{"date-parts":[[2025,11,12]],"date-time":"2025-11-12T00:00:00Z","timestamp":1762905600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Symmetry"],"abstract":"<jats:p>For classification problems, an imbalanced dataset can seriously reduce the learning efficiency in machine learning. In order to solve this problem, many scholars have proposed a series of methods mainly from the data and algorithm levels. At the data level, SMOTE is one of the most effective methods; it creates new minority samples through linearly interpolating between existing minority samples. This paper proposes an improved SMOTE-based data-level oversampling method that leverages a symmetrical cube scoring mechanism. This algorithm first exploits the symmetry properties of cubes to construct a new scoring rule based on different symmetric neighboring cubes, thereby dynamically selecting sample points. It then maps back to the original dimensional space, and generates new samples through multiple linear interpolations. This is equivalent to reducing the data to three dimensions, selecting points in that three-dimensional space, and synthesizing new samples by mapping those points back to the corresponding high-dimensional space. Compared to existing SMOTE variants, the proposed method delivers more targeted performance in regions of varying densities and boundary areas. 
In the experimental section, several datasets are selected, samples are synthesized with different oversampling methods, and the performance of these methods is then compared using several evaluation metrics. In addition, to avoid accidental results caused by relying on a single classifier, each oversampling method is tested with three commonly used classifiers (SVM, ELM, and MLP). The experimental results show that, compared with other oversampling methods, CS-SMOTE ranks first on average. Based on 33 datasets, 3 classifiers, and 3 performance metrics, a total of 297 rankings were obtained, and CS-SMOTE ranked first in 179 of them, accounting for 60.27%, which demonstrates its strong capability in addressing class-imbalance problems.<\/jats:p>","DOI":"10.3390\/sym17111941","type":"journal-article","created":{"date-parts":[[2025,11,13]],"date-time":"2025-11-13T09:10:45Z","timestamp":1763025045000},"page":"1941","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["CS-SMOTE: An Improved Oversampling Method Combining SMOTE Method and Symmetrical Cube Scoring Mechanism"],"prefix":"10.3390","volume":"17","author":[{"ORCID":"https:\/\/orcid.org\/0009-0000-6327-6272","authenticated-orcid":false,"given":"Shihao","family":"Song","sequence":"first","affiliation":[{"name":"School of Science, Dalian Maritime University, Dalian 116026, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2301-3803","authenticated-orcid":false,"given":"Sibo","family":"Yang","sequence":"additional","affiliation":[{"name":"School of Science, Dalian Maritime University, Dalian 116026, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mengqi","family":"Sun","sequence":"additional","affiliation":[{"name":"College of Fisheries and Life Science, Dalian Ocean 
University, Dalian 116023, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2025,11,12]]},"reference":[{"key":"ref_1","first-page":"3011","article-title":"Gaussian Processes for Machine Learning (GPML) Toolbox","volume":"11","author":"Rasmussen","year":"2010","journal-title":"J. Mach. Learn. Res."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"105266","DOI":"10.1016\/j.compfluid.2021.105266","article-title":"Machine learning for vortex induced vibration in turbulent flow","volume":"235","author":"Bai","year":"2022","journal-title":"Comput. Fluids"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"118622","DOI":"10.1016\/j.measurement.2025.118622","article-title":"The class labels and spatial information based fault diagnosis of air handling unit via combining kernel Fischer discriminant analysis with an improved graph convolutional neural network","volume":"257","author":"Zhang","year":"2026","journal-title":"Measurement"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"333","DOI":"10.1007\/s10994-011-5256-5","article-title":"Classifier chains for multi-label classification","volume":"85","author":"Read","year":"2011","journal-title":"Mach. Learn."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"183","DOI":"10.1504\/IJCSM.2022.124003","article-title":"To solve multi-class pattern classification problems by grid neural network","volume":"15","author":"Kumar","year":"2022","journal-title":"Int. J. Comput. Sci. Math. IJCSM"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"153","DOI":"10.1007\/s11063-020-10236-5","article-title":"Binary output layer of extreme learning machine for solving multi-class classification problems","volume":"52","author":"Yang","year":"2020","journal-title":"Neural Process. 
Lett."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"108924","DOI":"10.1016\/j.asoc.2022.108924","article-title":"Handling imbalanced data for aircraft predictive maintenance using the BACHE algorithm","volume":"123","author":"Dangut","year":"2022","journal-title":"Appl. Soft Comput."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"055101","DOI":"10.1063\/5.0008935","article-title":"A cluster-based hybrid sampling approach for imbalanced data classification","volume":"91","author":"Feng","year":"2020","journal-title":"Rev. Sci. Instrum."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"xv","DOI":"10.1016\/j.thorsurg.2021.05.002","article-title":"Recent Advances in Small Cell and Non-Small Cell Lung Cancer, Diagnosis, Staging, and Surgical Treatment: A Tribute to Jean Deslauriers Preface","volume":"31","author":"Shamji","year":"2021","journal-title":"Thorac. Surg. Clin."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"217","DOI":"10.1016\/j.patrec.2020.03.004","article-title":"Adjusting the imbalance ratio by the dimensionality of imbalanced data","volume":"133","author":"Zhu","year":"2020","journal-title":"Pattern Recognit. Lett."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"114301","DOI":"10.1016\/j.eswa.2020.114301","article-title":"DBIG-US: A two-stage under-sampling algorithm to face the class imbalance problem","volume":"168","author":"Valdovinos","year":"2021","journal-title":"Expert Syst. Appl."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1142\/S0218488520500026","article-title":"A New Efficient Algorithm based on Multi-classifiers Model for Classification","volume":"28","author":"Zheng","year":"2020","journal-title":"Int. J. Uncertain. 
Fuzziness Knowl.-Based Syst."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"107077","DOI":"10.1016\/j.optlaseng.2022.107077","article-title":"Dynamic spectroscopic characterization for fast spectral variations based on dual asynchronous undersampling with triple optical frequency combs","volume":"156","author":"Yang","year":"2022","journal-title":"Opt. Lasers Eng."},{"key":"ref_14","first-page":"55","article-title":"Handling Class-Imbalance with KNN (Neighbourhood) Under-Sampling for Software Defect Prediction","volume":"3","author":"Goyal","year":"2022","journal-title":"Artif. Intell. Rev. Int. Sci. Eng. J."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"405","DOI":"10.1016\/j.eswa.2015.10.031","article-title":"Adaptive semi-unsupervised weighted oversampling (A-SUWO) for Imbalanced Datasets","volume":"46","author":"Nekooeimehr","year":"2016","journal-title":"Expert Syst. Appl."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"3677","DOI":"10.1109\/TIM.2011.2135050","article-title":"Oversampling Technique for Obtaining Higher Order Derivative of Low-Frequency Signals","volume":"60","author":"Tan","year":"2011","journal-title":"IEEE Trans. Instrum. Meas."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"116982","DOI":"10.1016\/j.eswa.2022.116982","article-title":"A new oversampling method and improved radial basis function classifier for customer consumption behavior prediction","volume":"199","author":"Li","year":"2022","journal-title":"Expert Syst. Appl."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"2259038","DOI":"10.1142\/S0218001422590388","article-title":"Synthetic Minority Oversampling Technique Based on Adaptive Noise Optimization and Fast Search for Local Sets for Random Forest","volume":"37","author":"Luo","year":"2023","journal-title":"Int. J. Pattern Recognit. Artif. 
Intell."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"117023","DOI":"10.1016\/j.eswa.2022.117023","article-title":"A novel SMOTE-based resampling technique trough noise detection and the boosting procedure","volume":"200","author":"Salam","year":"2022","journal-title":"Expert Syst. Appl."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Quan, Y., Zhong, X., Feng, W., Chan, C.W., and Xing, M. (2021). SMOTE-Based Weighted Deep Rotation Forest for the Imbalanced Hyperspectral Data Classification. Remote Sens., 13.","DOI":"10.3390\/rs13030464"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"108511","DOI":"10.1016\/j.patcog.2021.108511","article-title":"FW-SMOTE: A feature-weighted oversampling approach for imbalanced classification","volume":"124","author":"Maldonado","year":"2022","journal-title":"Pattern Recognit. J. Pattern Recognit. Soc."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"358","DOI":"10.1016\/j.neunet.2021.03.030","article-title":"A noisy label and negative sample robust loss function for DNN-based distant supervised relation extraction","volume":"473","author":"Deng","year":"2021","journal-title":"Neural Netw."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"321","DOI":"10.1016\/S0925-2312(03)00433-8","article-title":"A comparison of PCA, KPCA and ICA for dimensionality reduction in support vector machine","volume":"55","author":"Cao","year":"2003","journal-title":"Neurocomputing"},{"key":"ref_24","first-page":"6529","article-title":"Big data precision marketing and consumer behavior analysis based on fuzzy clustering and PCA model","volume":"40","author":"Liu","year":"2021","journal-title":"J. Intell. Fuzzy Syst. Appl. Eng. 
Technol."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"2392","DOI":"10.1109\/TIT.2009.2016060","article-title":"Divergence estimation for multidimensional densities via k-Nearest-Neighbor distances","volume":"55","author":"Wang","year":"2009","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_26","first-page":"409","article-title":"Incremental and Decremental Support Vector Machine Learning","volume":"13","author":"Cauwenberghs","year":"2001","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"489","DOI":"10.1016\/j.neucom.2005.12.126","article-title":"Extreme learning machine: Theory and applications","volume":"70","author":"Huang","year":"2006","journal-title":"Neurocomputing"},{"key":"ref_28","unstructured":"Almeida, L.B. (2020). Multilayer perceptrons. Handbook of Neural Computation, CRC Press."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Johnson, J.M., and Khoshgoftaar, T.M. (2022, January 12\u201314). Cost-sensitive ensemble learning for highly imbalanced classification. Proceedings of the 2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA), Nassau, Bahamas.","DOI":"10.1109\/ICMLA55696.2022.00225"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"80","DOI":"10.1007\/s10462-023-10652-8","article-title":"Cost-sensitive learning for imbalanced medical data: A review","volume":"57","author":"Araf","year":"2024","journal-title":"Artif. Intell. 
Rev."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"108296","DOI":"10.1016\/j.knosys.2022.108296","article-title":"Adaptive cost-sensitive learning: Improving the convergence of intelligent diagnosis models under imbalanced data","volume":"241","author":"Ren","year":"2022","journal-title":"Knowl.-Based Syst."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"923","DOI":"10.1080\/01605682.2019.1705193","article-title":"Multi-class misclassification cost matrix for credit ratings in peer-to-peer lending","volume":"72","author":"Wang","year":"2021","journal-title":"J. Oper. Res. Soc."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"108266","DOI":"10.1016\/j.asoc.2021.108266","article-title":"Cost-sensitive matrixized classification learning with information entropy","volume":"116","author":"Wang","year":"2022","journal-title":"Appl. Soft Comput."},{"key":"ref_34","first-page":"104357","article-title":"Fast 3D time-domain airborne EM forward modeling using random under-sampling","volume":"3","author":"Haoman","year":"2021","journal-title":"J. Appl. Geophys."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Moreo, A., Esuli, A., and Sebastiani, F. (2016, January 17\u201321). Distributional Random Oversampling for Imbalanced Text Classification. Proceedings of the SIGIR\u201916: 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, Pisa, Italy.","DOI":"10.1145\/2911451.2914722"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"882","DOI":"10.1109\/TCDS.2021.3074811","article-title":"Electroencephalogram Emotion Recognition Based on Dispersion Entropy Feature Extraction Using Random Oversampling Imbalanced Data Processing","volume":"14","author":"Ding","year":"2022","journal-title":"IEEE Trans. Cogn. Dev. 
Syst."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"2499","DOI":"10.1007\/s11069-020-04409-7","article-title":"Application of the borderline-SMOTE method in susceptibility assessments of debris flows in Pinggu District, Beijing, China","volume":"105","author":"Li","year":"2021","journal-title":"Nat. Hazards"},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"104428","DOI":"10.1016\/j.chemolab.2021.104428","article-title":"PreCar_Deep: A deep learning framework for prediction of protein carbonylation sites based on Borderline-SMOTE strategy","volume":"218","author":"Song","year":"2021","journal-title":"Chemom. Intell. Lab. Syst."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"70","DOI":"10.1016\/j.ins.2021.04.017","article-title":"Improved CBSO: A Distributed Fuzzy-Based Adaptive Synthetic Oversampling Algorithm for Imbalanced Judicial Data","volume":"569","author":"Dai","year":"2021","journal-title":"Inf. Sci."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"116387","DOI":"10.1016\/j.eswa.2021.116387","article-title":"Geometric SMOTE for regression","volume":"193","author":"Camacho","year":"2022","journal-title":"Expert Syst. Appl."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"115230","DOI":"10.1016\/j.eswa.2021.115230","article-title":"G-SOMO: An oversampling approach based on self-organized maps and geometric SMOTE","volume":"183","author":"Douzas","year":"2021","journal-title":"Expert Syst. 
Appl."},{"key":"ref_42","first-page":"402","article-title":"Research on random forest drug classification prediction model based on KMeans-SMOTE","volume":"Volume 12458","author":"Song","year":"2022","journal-title":"Proceedings of the International Conference on Biomedical and Intelligent Systems (IC-BIS 2022)"},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"39","DOI":"10.1007\/s10064-021-02523-9","article-title":"A hybrid cluster-borderline SMOTE method for imbalanced data of rock groutability classification","volume":"81","author":"Li","year":"2022","journal-title":"Bull. Eng. Geol. Environ."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"405","DOI":"10.1109\/TKDE.2012.232","article-title":"MWMOTE\u2013Majority Weighted Minority Oversampling Technique for Imbalanced Data Set Learning","volume":"26","author":"Barua","year":"2013","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"1263","DOI":"10.1109\/TKDE.2008.239","article-title":"Learning from Imbalanced Data","volume":"21","author":"He","year":"2009","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Lipton, Z.C., Elkan, C., and Naryanaswamy, B. (2014). Optimal Thresholding of Classifiers to Maximize F1 Measure, Springer.","DOI":"10.1007\/978-3-662-44851-9_15"},{"key":"ref_47","unstructured":"Wang, R., and Li, J. (August, January 28). Bayes Test of Precision, Recall, and F1 Measure for Comparison of Two Natural Language Processing Models. Proceedings of the Meeting of the Association for Computational Linguistics, Florence, Italy."},{"key":"ref_48","first-page":"103","article-title":"A Novel Approach to Maximize G-mean in Nonstationary Data with Recurrent Imbalance Shifts","volume":"18","author":"Kulkarni","year":"2021","journal-title":"Int. Arab J. Inf. 
Technol."},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"265","DOI":"10.1016\/j.patrec.2021.06.023","article-title":"A Ratio: Extending area under the ROC curve for probabilistic labels","volume":"150","author":"Rachakonda","year":"2021","journal-title":"Pattern Recognit. Lett."},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"105","DOI":"10.35940\/ijitee.C8403.0110321","article-title":"High Accurate and a Variant of k-fold Cross Validation Technique for Predicting the Decision Tree Classifier Accuracy","volume":"10","author":"Mabuni","year":"2021","journal-title":"Int. J. Innov. Technol. Explor. Eng."},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"109656","DOI":"10.1016\/j.petrol.2021.109656","article-title":"Acoustic impedance and lithology-based reservoir porosity analysis using predictive machine learning algorithms","volume":"208","author":"Agbadze","year":"2022","journal-title":"J. Pet. Sci. Eng."}],"container-title":["Symmetry"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2073-8994\/17\/11\/1941\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,11,14]],"date-time":"2025-11-14T05:17:53Z","timestamp":1763097473000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2073-8994\/17\/11\/1941"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,11,12]]},"references-count":51,"journal-issue":{"issue":"11","published-online":{"date-parts":[[2025,11]]}},"alternative-id":["sym17111941"],"URL":"https:\/\/doi.org\/10.3390\/sym17111941","relation":{},"ISSN":["2073-8994"],"issn-type":[{"type":"electronic","value":"2073-8994"}],"subject":[],"published":{"date-parts":[[2025,11,12]]}}}