{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,28]],"date-time":"2026-01-28T22:42:44Z","timestamp":1769640164371,"version":"3.49.0"},"reference-count":34,"publisher":"Oxford University Press (OUP)","issue":"5","license":[{"start":{"date-parts":[[2023,10,5]],"date-time":"2023-10-05T00:00:00Z","timestamp":1696464000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/pages\/standard-publication-reuse-rights"}],"funder":[{"DOI":"10.13039\/501100012166","name":"National Key Research and Development Program of China","doi-asserted-by":"publisher","award":["2020YFB1805400"],"award-info":[{"award-number":["2020YFB1805400"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["U19A2068"],"award-info":[{"award-number":["U19A2068"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["U1736212"],"award-info":[{"award-number":["U1736212"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62032002"],"award-info":[{"award-number":["62032002"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62101358"],"award-info":[{"award-number":["62101358"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100002858","name":"China Postdoctoral Science Foundation","doi-asserted-by":"publisher","award":["2020M683345"],"award-info":[{"award-number":["2020M683345"]}],"id":[{"id":"10.13039\/501100002858","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100012226","name":"Fundamental Research Funds for the Central Universities","doi-asserted-by":"publisher","award":["SCU2021D052"],"award-info":[{"award-number":["SCU2021D052"]}],"id":[{"id":"10.13039\/501100012226","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,6,22]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>The problem of data imbalance is common in reality, which greatly affects the performance of classifiers. Most of the solutions are to balance the data set by generating new minority class samples, which are faced with the problems of selecting the appropriate area for generating samples, fuzzy classification boundary and uneven distribution of samples. To solve these problems, we propose a novel oversampling algorithm named space partitioning adaptive weighted synthetic minority oversampling technique (SPAW-SMOTE). We first divide the data space into boundary space and non-boundary space based on spatial partitioning techniques. The number of samples to be generated is assigned to different spaces by the designed adaptive weighting algorithm, which is used to solve the problems of uneven distribution of samples and easy to blur the classification boundary. Finally, we also endeavor to develop a new generation algorithm to reduce the probability of overlapping samples generated when synthesizing new samples and to ensure the diversity of new samples. Experimental results on 18 real-world data sets show that the average performance (G-mean, F1-measure and Area Under Curve) of SPAW-SMOTE is significantly better than other existing oversampling techniques.<\/jats:p>","DOI":"10.1093\/comjnl\/bxad098","type":"journal-article","created":{"date-parts":[[2023,10,7]],"date-time":"2023-10-07T06:52:05Z","timestamp":1696661525000},"page":"1747-1762","source":"Crossref","is-referenced-by-count":1,"title":["SPAW-SMOTE: Space Partitioning Adaptive Weighted Synthetic Minority Oversampling Technique For Imbalanced Data Set Learning"],"prefix":"10.1093","volume":"67","author":[{"given":"Qiang","family":"Zhang","sequence":"first","affiliation":[{"name":"School of Cyber Science and Engineering, Sichuan University , No.24 South Section 1, Yihuan Road, Chengdu 610065 , China"}]},{"given":"Junjiang","family":"He","sequence":"additional","affiliation":[{"name":"School of Cyber Science and Engineering, Sichuan University , No.24 South Section 1, Yihuan Road, Chengdu 610065 , China"}]},{"given":"Tao","family":"Li","sequence":"additional","affiliation":[{"name":"School of Cyber Science and Engineering, Sichuan University , No.24 South Section 1, Yihuan Road, Chengdu 610065 , China"}]},{"given":"Xiaolong","family":"Lan","sequence":"additional","affiliation":[{"name":"School of Cyber Science and Engineering, Sichuan University , No.24 South Section 1, Yihuan Road, Chengdu 610065 , China"}]},{"given":"Wenbo","family":"Fang","sequence":"additional","affiliation":[{"name":"School of Cyber Science and Engineering, Sichuan University , No.24 South Section 1, Yihuan Road, Chengdu 610065 , China"}]},{"given":"Yihong","family":"Li","sequence":"additional","affiliation":[{"name":"School of Cyber Science and Engineering, Sichuan University , No.24 South Section 1, Yihuan Road, Chengdu 610065 , China"}]}],"member":"286","published-online":{"date-parts":[[2023,10,5]]},"reference":[{"key":"2024062312365471300_ref1","article-title":"A systematic review on imbalanced data challenges in machine learning: Applications and solutions","author":"Kaur","year":"2019"},{"key":"2024062312365471300_ref2","first-page":"320","article-title":"Text classification feature extraction method based on deep learning for unbalanced data sets","author":"Lin","year":"2020"},{"key":"2024062312365471300_ref3","first-page":"23","article-title":"C-pugp: a cluster-based positive unlabeled learning method for disease gene prediction and prioritization","volume":"76","author":"Vasighizaker","year":"2018"},{"key":"2024062312365471300_ref4","first-page":"234","article-title":"Sequence classification for credit-card fraud detection","volume":"100","author":"Jurgovsky","year":"2018"},{"key":"2024062312365471300_ref5","article-title":"A fast network intrusion detection system using adaptive synthetic oversampling and lightgbm","volume":"106","author":"Liu","year":"2021"},{"key":"2024062312365471300_ref6","first-page":"220","article-title":"Learning from class-imbalanced data: review of methods and applications","volume":"73","author":"Haixiang","year":"2017"},{"key":"2024062312365471300_ref7","first-page":"429","article-title":"Data imbalance in classification: experimental evaluation","volume":"513","author":"Thabtah","year":"2020"},{"key":"2024062312365471300_ref8","first-page":"863","article-title":"Smote for learning from imbalanced data: progress and challenges, marking the 15-year anniversary","volume":"61","author":"Fern\u00e1ndez","year":"2018"},{"key":"2024062312365471300_ref9","first-page":"321","article-title":"Smote: synthetic minority over-sampling technique","volume":"16","author":"Chawla","year":"2002"},{"key":"2024062312365471300_ref10","first-page":"107588","article-title":"Svdd boundary and dpc clustering technique-based oversampling approach for handling imbalanced and overlapped data","volume":"234","author":"Tao","year":"2021"},{"key":"2024062312365471300_ref11","first-page":"878","article-title":"Borderline-smote: a new over-sampling method in imbalanced data sets learning","author":"Han","year":"2005"},{"key":"2024062312365471300_ref12","first-page":"1322","article-title":"Adasyn: Adaptive synthetic sampling approach for imbalanced learning","author":"He","year":"2008"},{"key":"2024062312365471300_ref13","first-page":"317","article-title":"Prowsyn: Proximity weighted synthetic oversampling technique for imbalanced data set learning","author":"Barua","year":"2013"},{"key":"2024062312365471300_ref14","first-page":"664","article-title":"Kerneladasyn: Kernel based adaptive synthetic data generation for imbalanced learning","author":"Tang","year":"2015"},{"key":"2024062312365471300_ref15","first-page":"405","article-title":"Mwmote\u2013majority weighted minority oversampling technique for imbalanced data set learning","volume":"26","author":"Barua","year":"2012"},{"key":"2024062312365471300_ref16","first-page":"67","article-title":"Multiclass imbalanced classification using fuzzy c-mean and smote with fuzzy support vector machine","author":"Pruengkarn","year":"2017"},{"key":"2024062312365471300_ref17","first-page":"43","article-title":"Adaptive weighted over-sampling for imbalanced datasets based on density peaks clustering with heuristic filtering","volume":"519","author":"Tao","year":"2020"},{"key":"2024062312365471300_ref18","first-page":"1","article-title":"Cure-smote algorithm and hybrid algorithm for feature selection and parameter optimization based on random forests","volume":"18","author":"Ma","year":"2017"},{"key":"2024062312365471300_ref19","first-page":"1325","article-title":"Hybrid prediction model for type 2 diabetes and hypertension using dbscan-based outlier detection, synthetic minority over sampling technique (smote), and random forest","volume":"8","author":"Ijaz","year":"2018"},{"key":"2024062312365471300_ref20","first-page":"1","article-title":"Improving imbalanced learning through a heuristic oversampling method based on k-means and smote","volume":"465","author":"Douzas","year":"2018"},{"key":"2024062312365471300_ref21","first-page":"20","article-title":"A study of the behavior of several methods for balancing machine learning training data","volume":"6","author":"Batista","year":"2004"},{"key":"2024062312365471300_ref22","first-page":"1394","article-title":"Smote-wenn: solving class imbalance and small sample problems by oversampling and distance scaling","volume":"51","author":"Guan","year":"2021"},{"key":"2024062312365471300_ref23","first-page":"245","article-title":"Smote-rsb${^\\ast }$: a hybrid preprocessing approach based on oversampling and undersampling for high imbalanced data-sets using smote and rough sets theory","volume":"33","author":"Ramentol","year":"2012"},{"key":"2024062312365471300_ref24","first-page":"552","article-title":"Instance selection and class balancing techniques for cross project defect prediction","author":"Bispo","year":"2018"},{"key":"2024062312365471300_ref25","first-page":"184","article-title":"Smote\u2013ipf: addressing the noisy and borderline examples problem in imbalanced classification by a re-sampling method with filtering","volume":"291","author":"S\u00e1ez","year":"2015"},{"key":"2024062312365471300_ref26","first-page":"399","article-title":"Enhancing prediction on imbalance data by thresholding technique with noise filtering","author":"Radwan","year":"2017"},{"key":"2024062312365471300_ref27","first-page":"727","article-title":"Ccr: a combined cleaning and resampling algorithm for imbalanced data classification","volume":"27","author":"Koziarski","year":"2017"},{"key":"2024062312365471300_ref28","first-page":"107269","article-title":"Sp-smote: a novel space partitioning based synthetic minority oversampling technique","volume":"228","author":"Li","year":"2021"},{"key":"2024062312365471300_ref29","first-page":"677","article-title":"New oversampling approaches based on polynomial fitting for imbalanced data sets","author":"Gazzah","year":"2008"},{"key":"2024062312365471300_ref30","article-title":"An empirical comparison and evaluation of minority oversampling techniques on a large number of imbalanced datasets","volume":"83","author":"Kov\u00e1cs","year":"2019"},{"key":"2024062312365471300_ref31","first-page":"273","article-title":"Support-vector networks","volume":"20","author":"Cortes","year":"1995"},{"key":"2024062312365471300_ref32","doi-asserted-by":"crossref","DOI":"10.1002\/9781118548387","article-title":"Applied Logistic Regression, 3","author":"Hosmer","year":"2013"},{"key":"2024062312365471300_ref33","first-page":"2825","article-title":"Scikit-learn: machine learning in python","volume":"12","author":"Pedregosa","year":"2011"},{"key":"2024062312365471300_ref34","article-title":"Keel data-mining software tool: data set repository, integration of algorithms and experimental analysis framework","volume":"17","author":"Alcal\u00e1-Fdez","year":"2011"}],"container-title":["The Computer Journal"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/comjnl\/article-pdf\/67\/5\/1747\/58307858\/bxad098.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/comjnl\/article-pdf\/67\/5\/1747\/58307858\/bxad098.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,6,23]],"date-time":"2024-06-23T12:37:28Z","timestamp":1719146248000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/comjnl\/article\/67\/5\/1747\/7292004"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,10,5]]},"references-count":34,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2023,10,5]]},"published-print":{"date-parts":[[2024,6,22]]}},"URL":"https:\/\/doi.org\/10.1093\/comjnl\/bxad098","relation":{},"ISSN":["0010-4620","1460-2067"],"issn-type":[{"value":"0010-4620","type":"print"},{"value":"1460-2067","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024,5]]},"published":{"date-parts":[[2023,10,5]]}}}