{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,3,12]],"date-time":"2025-03-12T04:29:51Z","timestamp":1741753791720,"version":"3.38.0"},"reference-count":23,"publisher":"SAGE Publications","issue":"6","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["IDA"],"published-print":{"date-parts":[[2022,11,12]]},"abstract":"<jats:p>Addressing the problem of imbalanced data category distribution in real applications and the problem of traditional classifiers tending to ensure the accuracy of the majority class while ignoring the accuracy of the minority class when processing imbalanced data, this paper proposes a method called RBSP-Boosting for imbalanced data classification. First, RBSP-Boosting introduces the Shapley value and calculates the Shapley value for each sample of the dataset through the truncated Monte Carlo method. Moreover, the proposed method removes the noise data according to the Shapley value and undersamples the samples with Shapley values less than zero in the majority class. Then, it takes the Shapley value as the weight of the sample and oversamples the minority class according to the weight. Finally, the new dataset is trained on the classifier through the AdaBoost classifier. Experiments are conducted on nine groups of UCI and KEEL datasets, and RBSP-Boosting is compared with four sampling algorithms: Random-OverSampler, SMOTE, Borderline-SMOTE and SVM-SMOTE. Experimental results show that the RBSP-Boosting method in the three evaluation metrics of AUC, F-score and G-mean, compared with the best performance of the four comparison algorithms, increases by 4.69%, 10.3% and 7.86%, respectively. 
The proposed method can significantly improve the effect of imbalanced data classification.<\/jats:p>","DOI":"10.3233\/ida-216092","type":"journal-article","created":{"date-parts":[[2022,11,4]],"date-time":"2022-11-04T15:33:41Z","timestamp":1667576021000},"page":"1579-1595","source":"Crossref","is-referenced-by-count":0,"title":["RBSP-Boosting: A Shapley value-based resampling approach for imbalanced data classification"],"prefix":"10.1177","volume":"26","author":[{"given":"Weitu","family":"Chong","sequence":"first","affiliation":[{"name":"School of Computer and Electronic Information, Guangxi University, Nanning, Guangxi, China"}]},{"given":"Ningjiang","family":"Chen","sequence":"additional","affiliation":[{"name":"School of Computer and Electronic Information, Guangxi University, Nanning, Guangxi, China"},{"name":"Key Laboratory of Parallel and Distributed Computing, Guangxi Colleges and Universities, Nanning, Guangxi, China"}]},{"given":"Chengyun","family":"Fang","sequence":"additional","affiliation":[{"name":"School of Computer and Electronic Information, Guangxi University, Nanning, Guangxi, China"}]}],"member":"179","reference":[{"key":"10.3233\/IDA-216092_ref1","doi-asserted-by":"crossref","first-page":"567","DOI":"10.3233\/IDA-194669","article-title":"Estimating a one-class naive bayes text classifier","volume":"24","author":"Zhang","year":"2020","journal-title":"Intelligent Data Analysis"},{"issue":"34","key":"10.3233\/IDA-216092_ref2","first-page":"1","article-title":"S2OSC: A holistic semi-supervised approach for open set classification","volume":"16","author":"Yang","year":"2021","journal-title":"ACM Trans. Knowl. Discov. Data"},{"issue":"19","key":"10.3233\/IDA-216092_ref3","first-page":"1","article-title":"New multi-view classification method with uncertain data","volume":"16","author":"Liu","year":"2021","journal-title":"ACM Trans. Knowl. Discov. 
Data"},{"key":"10.3233\/IDA-216092_ref4","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1177\/1550147720916404","article-title":"A review on classification of imbalanced data for wireless sensor networks","volume":"16","author":"Patel","year":"2020","journal-title":"International Journal of Distributed Sensor Networks"},{"key":"10.3233\/IDA-216092_ref5","doi-asserted-by":"crossref","first-page":"2832","DOI":"10.1109\/TNNLS.2019.2917524","article-title":"Self-paced balance learning for clinical skin disease recognition","volume":"31","author":"Yang","year":"2020","journal-title":"IEEE Transactions on Neural Networks and Learning Systems"},{"key":"10.3233\/IDA-216092_ref6","doi-asserted-by":"crossref","first-page":"436","DOI":"10.1109\/TSM.2020.2994357","article-title":"A deep convolutional neural network for wafer defect identification on an imbalanced dataset in semiconductor manufacturing processes","volume":"33","author":"Saqlain","year":"2020","journal-title":"IEEE Transactions on Semiconductor Manufacturing"},{"key":"10.3233\/IDA-216092_ref7","doi-asserted-by":"crossref","first-page":"106266","DOI":"10.1016\/j.cie.2019.106266","article-title":"Integrating TANBN with cost sensitive classification algorithm for imbalanced data in medical diagnosis","volume":"140","author":"Gan","year":"2020","journal-title":"Computers & Industrial Engineering"},{"key":"10.3233\/IDA-216092_ref8","doi-asserted-by":"crossref","first-page":"114035","DOI":"10.1016\/j.eswa.2020.114035","article-title":"CDBH: A clustering and density-based hybrid approach for imbalanced data classification","volume":"164","author":"Mirzaei","year":"2021","journal-title":"Expert Systems with Applications"},{"key":"10.3233\/IDA-216092_ref9","doi-asserted-by":"crossref","first-page":"114692","DOI":"10.1109\/ACCESS.2020.3003346","article-title":"A synthetic minority based on probabilistic distribution (SyMProD) oversampling for imbalanced 
datasets","volume":"8","author":"Kunakorntum","year":"2020","journal-title":"IEEE Access"},{"key":"10.3233\/IDA-216092_ref10","unstructured":"M. Kubat and S. Matwin, Addressing the curse of imbalanced training sets: one-sided selection, in: Proceedings of the 14th International Conference on Machine Learning, ACM, Nashville, TN, USA, 1997, pp. 179\u2013186."},{"key":"10.3233\/IDA-216092_ref11","doi-asserted-by":"crossref","first-page":"114301","DOI":"10.1016\/j.eswa.2020.114301","article-title":"DBIG-US: A two-stage under-sampling algorithm to face the class imbalance problem","volume":"168","author":"Guzm\u00e1n-Ponce","year":"2021","journal-title":"Expert Systems with Applications"},{"key":"10.3233\/IDA-216092_ref12","doi-asserted-by":"crossref","first-page":"321","DOI":"10.1613\/jair.953","article-title":"SMOTE: Synthetic minority over-sampling technique","volume":"16","author":"Chawla","year":"2002","journal-title":"Journal of Artificial Intelligence Research"},{"key":"10.3233\/IDA-216092_ref13","doi-asserted-by":"crossref","first-page":"4065","DOI":"10.1109\/TNNLS.2017.2751612","article-title":"Classification of imbalanced data by oversampling in kernel space of support vector machines","volume":"29","author":"Mathew","year":"2018","journal-title":"IEEE Transactions on Neural Networks and Learning Systems"},{"key":"10.3233\/IDA-216092_ref14","doi-asserted-by":"crossref","first-page":"717","DOI":"10.1007\/s10489-019-01543-z","article-title":"An effective distance based feature selection approach for imbalanced data","volume":"50","author":"Shahee","year":"2020","journal-title":"Applied Intelligence"},{"key":"10.3233\/IDA-216092_ref15","doi-asserted-by":"crossref","unstructured":"P.Y. Yang, W. Liu, B.B. Zhou, S. Chawla and A.Y. Zomaya, Ensemble-based wrapper methods for feature selection and class imbalance learning, in: Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer, Gold Coast, QLD, Australia, 2013, pp. 
544\u2013555.","DOI":"10.1007\/978-3-642-37453-1_45"},{"key":"10.3233\/IDA-216092_ref16","doi-asserted-by":"crossref","first-page":"357","DOI":"10.3233\/IDA-183831","article-title":"Cost-sensitive convolutional neural networks for imbalanced time series classification","volume":"23","author":"Geng","year":"2019","journal-title":"Intelligent Data Analysis"},{"key":"10.3233\/IDA-216092_ref17","doi-asserted-by":"crossref","unstructured":"L. Loezer, F. Enembreck, J.P. Barddal and A.D.S. Britto, Cost-sensitive learning for imbalanced data streams, in: Proceedings of the 35th Annual ACM Symposium on Applied Computing (SAC \u201920), ACM, Online event, [Brno, Czech Republic], 2020, pp. 498\u2013504.","DOI":"10.1145\/3341105.3373949"},{"key":"10.3233\/IDA-216092_ref18","doi-asserted-by":"crossref","first-page":"119","DOI":"10.1006\/jcss.1997.1504","article-title":"A decision-theoretic generalization of online learning and an application to boosting","volume":"55","author":"Freund","year":"1999","journal-title":"Journal of Computer & System Sciences"},{"key":"10.3233\/IDA-216092_ref20","first-page":"307","article-title":"A value for n-person games","volume":"2","author":"Shapley","year":"1953","journal-title":"Contributions to the Theory of Games"},{"key":"10.3233\/IDA-216092_ref21","unstructured":"J.X. Wang, J. Wiens and S. Lundberg, Shapley Flow: A Graph-based Approach to Interpreting Model Predictions, in: Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, PMLR, Virtual event, 2021, pp.\u00a0721\u2013729."},{"key":"10.3233\/IDA-216092_ref22","unstructured":"A. Ghorbani and J. Zou, Data Shapley: Equitable Valuation of Data for Machine Learning, in: Proceedings of the 36th International Conference on Machine Learning, PMLR, Long Beach, California, USA, 2019, pp. 2242\u20132251."},{"key":"10.3233\/IDA-216092_ref23","doi-asserted-by":"crossref","unstructured":"R.X. Jia, D. Dao, B.X. Wang, F.A. Hubis, N.M. Gurel, B. Li, C. Zhang, C. 
Spanos and D. Song, Efficient task-specific data valuation for nearest neighbor algorithms, in: Proceedings of the 45th International Conference on Very Large Data Bases, Morgan Kaufmann, Los Angeles, California, USA, 2019, pp. 1610\u20131623.","DOI":"10.14778\/3342263.3342637"},{"key":"10.3233\/IDA-216092_ref24","doi-asserted-by":"crossref","unstructured":"T.S. Song, Y.X. Tong and S.Y. Wei, Profit Allocation for Federated Learning, in: IEEE International Conference on Big Data (Big Data), IEEE, Los Angeles, California, USA, 2019, pp. 2577\u20132586.","DOI":"10.1109\/BigData47090.2019.9006327"}],"container-title":["Intelligent Data Analysis"],"original-title":[],"link":[{"URL":"https:\/\/content.iospress.com\/download?id=10.3233\/IDA-216092","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,11]],"date-time":"2025-03-11T09:15:43Z","timestamp":1741684543000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/full\/10.3233\/IDA-216092"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,11,12]]},"references-count":23,"journal-issue":{"issue":"6"},"URL":"https:\/\/doi.org\/10.3233\/ida-216092","relation":{},"ISSN":["1088-467X","1571-4128"],"issn-type":[{"type":"print","value":"1088-467X"},{"type":"electronic","value":"1571-4128"}],"subject":[],"published":{"date-parts":[[2022,11,12]]}}}