{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,10]],"date-time":"2026-06-10T20:52:56Z","timestamp":1781124776585,"version":"3.54.1"},"reference-count":31,"publisher":"SAGE Publications","issue":"6","license":[{"start":{"date-parts":[[2022,6,1]],"date-time":"2022-06-01T00:00:00Z","timestamp":1654041600000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Key scientific research projects of colleges and universities in Henan Province","award":["20A520012"],"award-info":[{"award-number":["20A520012"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["11471102"],"award-info":[{"award-number":["11471102"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["12071112"],"award-info":[{"award-number":["12071112"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["International Journal of Distributed Sensor Networks"],"published-print":{"date-parts":[[2022,6]]},"abstract":"<jats:p> As a new and efficient ensemble learning algorithm, XGBoost has been widely applied for its multitudinous advantages, but its classification effect in the case of data imbalance is often not ideal. Aiming at this problem, an attempt was made to optimize the regularization term of XGBoost, and a classification algorithm based on mixed sampling and ensemble learning is proposed. The main idea is to combine SVM-SMOTE over-sampling and EasyEnsemble under-sampling technologies for data processing, and then obtain the final model based on XGBoost by training and ensemble. At the same time, the optimal parameters are automatically searched and adjusted through the Bayesian optimization algorithm to realize classification prediction. In the experimental stage, the G-mean and area under the curve (AUC) values are used as evaluation indicators to compare and analyze the classification performance of different sampling methods and algorithm models. The experimental results on the public data set also verify the feasibility and effectiveness of the proposed algorithm. <\/jats:p>","DOI":"10.1177\/15501329221106935","type":"journal-article","created":{"date-parts":[[2022,6,30]],"date-time":"2022-06-30T06:10:22Z","timestamp":1656569422000},"page":"155013292211069","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":229,"title":["Research and application of XGBoost in imbalanced data"],"prefix":"10.1177","volume":"18","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8500-8727","authenticated-orcid":false,"given":"Ping","family":"Zhang","sequence":"first","affiliation":[{"name":"School of Mathematics and Statistics, Henan University of Science and Technology, Luoyang, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Yiqiao","family":"Jia","sequence":"additional","affiliation":[{"name":"School of Mathematics and Statistics, Henan University of Science and Technology, Luoyang, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Youlin","family":"Shang","sequence":"additional","affiliation":[{"name":"School of Mathematics and Statistics, Henan University of Science and Technology, Luoyang, China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"179","published-online":{"date-parts":[[2022,6,29]]},"reference":[{"issue":"6","key":"bibr1-15501329221106935","first-page":"98","volume":"47","author":"Song LL","year":"2020","journal-title":"Comput Sci"},{"issue":"7","key":"bibr2-15501329221106935","first-page":"102","volume":"33","author":"Liu DX","year":"2019","journal-title":"J Chongqing Univ Technol Nat Sci"},{"key":"bibr3-15501329221106935","volume-title":"Research on imbalanced dataset classification","author":"Fan XN","year":"2011"},{"key":"bibr4-15501329221106935","volume-title":"Research on imbalanced classification method based on XGBoost","author":"Wan ZC","year":"2018"},{"issue":"24","key":"bibr5-15501329221106935","first-page":"12","volume":"56","author":"Xu LL","year":"2020","journal-title":"Comput Eng Appl"},{"key":"bibr6-15501329221106935","first-page":"107","volume-title":"Proceedings of the European conference on principles of data mining and knowledge discovery","author":"Chawla NV"},{"key":"bibr7-15501329221106935","doi-asserted-by":"publisher","DOI":"10.1155\/2016\/5873769"},{"key":"bibr8-15501329221106935","first-page":"785","volume-title":"Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining","author":"Chen T"},{"issue":"4","key":"bibr9-15501329221106935","first-page":"118","volume":"40","author":"Qu WL","year":"2019","journal-title":"J Jilin Norm Univ Nat Sci Ed"},{"issue":"1","key":"bibr10-15501329221106935","first-page":"73","volume":"39","author":"Yu GL","year":"2019","journal-title":"Electr Pow Autom Equip"},{"issue":"12","key":"bibr11-15501329221106935","first-page":"1486","volume":"23","author":"Ma QQ","year":"2020","journal-title":"Chin Gen Pract"},{"issue":"3","key":"bibr12-15501329221106935","first-page":"814","volume":"37","author":"Yuan LX","year":"2020","journal-title":"Appl Res Comput"},{"key":"bibr13-15501329221106935","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2021.108197"},{"key":"bibr14-15501329221106935","doi-asserted-by":"publisher","DOI":"10.5194\/gmd-14-1493-2021"},{"issue":"12","key":"bibr15-15501329221106935","first-page":"2536","volume":"44","author":"Zhang XL","year":"2018","journal-title":"J Beijing Univ Aeronaut Astronaut"},{"issue":"1","key":"bibr16-15501329221106935","first-page":"228","volume":"28","author":"Liu Y","year":"2019","journal-title":"Comput Syst Appl"},{"key":"bibr17-15501329221106935","doi-asserted-by":"publisher","DOI":"10.1016\/j.asoc.2020.106758"},{"key":"bibr18-15501329221106935","doi-asserted-by":"publisher","DOI":"10.3390\/rs13132577"},{"issue":"1","key":"bibr19-15501329221106935","first-page":"23","volume":"35","author":"Li YZ","year":"2018","journal-title":"J Guangdong Univ Technol"},{"issue":"20","key":"bibr20-15501329221106935","first-page":"202","volume":"55","author":"Wang Y","year":"2019","journal-title":"Comput Eng Appl"},{"issue":"3","key":"bibr21-15501329221106935","first-page":"315","volume":"46","author":"Zhang CF","year":"2020","journal-title":"Comput Eng"},{"key":"bibr22-15501329221106935","volume-title":"Anomaly detection of bolt tightening process for imbalanced data sets","author":"Jia QC","year":"2018"},{"key":"bibr23-15501329221106935","volume-title":"Application of hybrid XGBoost model in unbalanced dataset classification predication","author":"Cui LS","year":"2018"},{"key":"bibr24-15501329221106935","doi-asserted-by":"publisher","DOI":"10.1109\/TSMCA.2009.2029559"},{"key":"bibr25-15501329221106935","unstructured":"Prokhorenkova L, Gusev G, Vorobev A, et al. CatBoost: unbiased boosting with categorical features, 2017, https:\/\/proceedings.neurips.cc\/paper\/2018\/file\/14491b756b3a51daac41c24863285549-Paper.pdf"},{"key":"bibr26-15501329221106935","unstructured":"Ke G, Meng Q, Finley T, et al. LightGBM: a highly efficient gradient boosting decision tree, 2017, https:\/\/proceedings.neurips.cc\/paper\/2017\/file\/6449f44a102fde848669bdd9eb6b76fa-Paper.pdf"},{"key":"bibr27-15501329221106935","volume-title":"Research on XGBoost performance optimization based on imbalanced data","author":"Yue QS","year":"2019"},{"key":"bibr28-15501329221106935","doi-asserted-by":"publisher","DOI":"10.1613\/jair.953"},{"issue":"6","key":"bibr29-15501329221106935","first-page":"1073","volume":"14","author":"Shi H","year":"2019","journal-title":"CAAI Trans Intell Syst"},{"key":"bibr30-15501329221106935","doi-asserted-by":"publisher","DOI":"10.1109\/TSMCB.2008.2007853"},{"key":"bibr31-15501329221106935","doi-asserted-by":"publisher","DOI":"10.1111\/j.1467-9868.2005.00503.x"}],"container-title":["International Journal of Distributed Sensor Networks"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/15501329221106935","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/15501329221106935","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/15501329221106935","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,6,30]],"date-time":"2022-06-30T06:10:48Z","timestamp":1656569448000},"score":1,"resource":{"primary":{"URL":"http:\/\/journals.sagepub.com\/doi\/10.1177\/15501329221106935"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,6]]},"references-count":31,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2022,6]]}},"alternative-id":["10.1177\/15501329221106935"],"URL":"https:\/\/doi.org\/10.1177\/15501329221106935","relation":{},"ISSN":["1550-1329","1550-1477"],"issn-type":[{"value":"1550-1329","type":"print"},{"value":"1550-1477","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,6]]}}}