{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,14]],"date-time":"2026-03-14T22:41:47Z","timestamp":1773528107500,"version":"3.50.1"},"reference-count":29,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2022,4,16]],"date-time":"2022-04-16T00:00:00Z","timestamp":1650067200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["BDCC"],"abstract":"<jats:p>Gradient boosting ensembles have been used in the cyber-security area for many years; nonetheless, their efficacy and accuracy for intrusion detection systems (IDSs) remain questionable, particularly when dealing with problems involving imbalanced data. This article fills the void in the existing body of knowledge by evaluating the performance of gradient boosting-based ensembles, including gradient boosting machine (GBM), extreme gradient boosting (XGBoost), LightGBM, and CatBoost. This paper assesses the performance of various imbalanced data sets using the Matthews correlation coefficient (MCC), area under the receiver operating characteristic curve (AUC), and F1 metrics. The article discusses an example of anomaly detection in an industrial control network and, more specifically, threat detection in a cyber-physical smart power grid. The tests\u2019 results indicate that CatBoost surpassed its competitors, regardless of the imbalance ratio of the data sets. 
Moreover, LightGBM showed a much lower performance value and had more variability across the data sets.<\/jats:p>","DOI":"10.3390\/bdcc6020041","type":"journal-article","created":{"date-parts":[[2022,4,16]],"date-time":"2022-04-16T07:42:41Z","timestamp":1650094961000},"page":"41","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":19,"title":["Revisiting Gradient Boosting-Based Approaches for Learning Imbalanced Data: A Case of Anomaly Detection on Power Grids"],"prefix":"10.3390","volume":"6","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8274-0990","authenticated-orcid":false,"given":"Maya Hilda Lestari","family":"Louk","sequence":"first","affiliation":[{"name":"Department of Informatics Engineering, University of Surabaya, Surabaya 60293, Indonesia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1821-6438","authenticated-orcid":false,"given":"Bayu Adhi","family":"Tama","sequence":"additional","affiliation":[{"name":"Data Science Group, Institute for Basic Science (IBS), Daejeon 34141, Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2022,4,16]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"112296","DOI":"10.1016\/j.enpol.2021.112296","article-title":"Does power grid infrastructure stimulate regional economic growth?","volume":"155","author":"Xu","year":"2021","journal-title":"Energy Policy"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Wei, R., Kelly, T.P., Hawkins, R., and Armengaud, E. (2017). Deis: Dependability engineering innovation for cyber-physical systems. Federation of International Conferences on Software Technologies: Applications and Foundations, Springer.","DOI":"10.1007\/978-3-319-74730-9_37"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Irmak, E., and Erkek, \u0130. (2018, January 22\u201325). 
An overview of cyber-attack vectors on SCADA systems. Proceedings of the 2018 6th International Symposium on Digital Forensic and Security (ISDFS), Antalya, Turkey.","DOI":"10.1109\/ISDFS.2018.8355379"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"352","DOI":"10.1016\/j.ins.2019.12.029","article-title":"Worst-case \u03f5-stealthy false data injection attacks in cyber-physical systems","volume":"515","author":"Li","year":"2020","journal-title":"Inf. Sci."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"107211","DOI":"10.1016\/j.compeleceng.2021.107211","article-title":"Detection of false data cyber-attacks for the assessment of security in smart grid using deep learning","volume":"93","author":"Sengan","year":"2021","journal-title":"Comput. Electr. Eng."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"650","DOI":"10.1109\/TII.2015.2420951","article-title":"Classification of disturbances and cyber-attacks in power systems using heterogeneous time-synchronized data","volume":"11","author":"Pan","year":"2015","journal-title":"IEEE Trans. Ind. Inform."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"3104","DOI":"10.1109\/TSG.2015.2409775","article-title":"Developing a hybrid intrusion detection system using data mining for power systems","volume":"6","author":"Pan","year":"2015","journal-title":"IEEE Trans. Smart Grid"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"66","DOI":"10.1109\/TSUSC.2019.2906657","article-title":"An integrated framework for privacy-preserving based anomaly detection for cyber-physical systems","volume":"6","author":"Keshk","year":"2019","journal-title":"IEEE Trans. Sustain. 
Comput."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"2559","DOI":"10.1109\/TNSE.2021.3099371","article-title":"Intrusion detection in SCADA based power grids: Recursive feature elimination model with majority vote ensemble algorithm","volume":"8","author":"Upadhyay","year":"2021","journal-title":"IEEE Trans. Netw. Sci. Eng."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Xu, Z., Huang, G., Weinberger, K.Q., and Zheng, A.X. (2014, January 24\u201327). Gradient boosted feature selection. Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA.","DOI":"10.1145\/2623330.2623635"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"100357","DOI":"10.1016\/j.cosrev.2020.100357","article-title":"Ensemble learning for intrusion detection systems: A systematic mapping study and cross-benchmark evaluation","volume":"39","author":"Tama","year":"2021","journal-title":"Comput. Sci. Rev."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"1189","DOI":"10.1214\/aos\/1013203451","article-title":"Greedy function approximation: A gradient boosting machine","volume":"29","author":"Friedman","year":"2001","journal-title":"Ann. Stat."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Chen, T., and Guestrin, C. (2016, January 13\u201317). Xgboost: A scalable tree boosting system. Proceedings of the 22nd ACM Sigkdd International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA.","DOI":"10.1145\/2939672.2939785"},{"key":"ref_14","unstructured":"Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., Ye, Q., and Liu, T.Y. (2017). Lightgbm: A highly efficient gradient boosting decision tree. Advances in Neural Information Processing Systems, ACM."},{"key":"ref_15","unstructured":"Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A.V., and Gulin, A. (2018). CatBoost: Unbiased boosting with categorical features. 
Advances in Neural Information Processing Systems, ACM."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"405","DOI":"10.1109\/TKDE.2012.232","article-title":"MWMOTE\u2013majority weighted minority oversampling technique for imbalanced data set learning","volume":"26","author":"Barua","year":"2012","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"ref_17","unstructured":"Hink, R.C.B., Beaver, J.M., Buckner, M.A., Morris, T., Adhikari, U., and Pan, S. (2014, January 19\u201321). Machine learning for power system disturbance and cyber-attack discrimination. Proceedings of the 2014 7th International Symposium on Resilient Control Systems (ISRCS), Denver, CO, USA."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Keshk, M., Moustafa, N., Sitnikova, E., and Creech, G. (2017, January 14\u201316). Privacy preservation intrusion detection technique for SCADA systems. Proceedings of the 2017 Military Communications and Information Systems Conference (MilCIS), Canberra, ACT, Australia.","DOI":"10.1109\/MilCIS.2017.8190422"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"1104","DOI":"10.1109\/TNSM.2020.3032618","article-title":"Gradient boosting feature selection with machine learning classifiers for intrusion detection on power grids","volume":"18","author":"Upadhyay","year":"2020","journal-title":"IEEE Trans. Netw. Serv. Manag."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Louk, M.H.L., and Tama, B.A. (2021). Exploring Ensemble-Based Class Imbalance Learners for Intrusion Detection in Industrial Control Networks. Big Data Cogn. Comput., 5.","DOI":"10.3390\/bdcc5040072"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Chicco, D., and Jurman, G. (2020). The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. 
BMC Genom., 21.","DOI":"10.1186\/s12864-019-6413-7"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"13","DOI":"10.1186\/s13040-021-00244-z","article-title":"The Matthews correlation coefficient (MCC) is more reliable than balanced accuracy, bookmaker informedness, and markedness in two-class confusion matrix evaluation","volume":"14","author":"Chicco","year":"2021","journal-title":"BioData Min."},{"key":"ref_23","first-page":"281","article-title":"Random search for hyper-parameter optimization","volume":"13","author":"Bergstra","year":"2012","journal-title":"J. Mach. Learn. Res."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"1903","DOI":"10.21105\/joss.01903","article-title":"mlr3: A modern object-oriented machine learning framework in R","volume":"4","author":"Lang","year":"2019","journal-title":"J. Open Source Softw."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"274","DOI":"10.1007\/s00357-014-9161-z","article-title":"Ward\u2019s hierarchical agglomerative clustering method: Which algorithms implement Ward\u2019s criterion?","volume":"31","author":"Murtagh","year":"2014","journal-title":"J. Classif."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"86","DOI":"10.1214\/aoms\/1177731944","article-title":"A comparison of alternative tests of significance for the problem of m rankings","volume":"11","author":"Friedman","year":"1940","journal-title":"Ann. Math. Stat."},{"key":"ref_27","first-page":"1","article-title":"Statistical comparisons of classifiers over multiple data sets","volume":"7","year":"2006","journal-title":"J. Mach. Learn. Res."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Japkowicz, N., and Shah, M. (2011). Evaluating Learning Algorithms: A Classification Perspective, Cambridge University Press.","DOI":"10.1017\/CBO9780511921803"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Tama, B.A., and Lim, S. (2020). 
A comparative performance evaluation of classification algorithms for clinical decision support systems. Mathematics, 8.","DOI":"10.3390\/math8101814"}],"container-title":["Big Data and Cognitive Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-2289\/6\/2\/41\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T22:55:14Z","timestamp":1760136914000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-2289\/6\/2\/41"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,4,16]]},"references-count":29,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2022,6]]}},"alternative-id":["bdcc6020041"],"URL":"https:\/\/doi.org\/10.3390\/bdcc6020041","relation":{},"ISSN":["2504-2289"],"issn-type":[{"value":"2504-2289","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,4,16]]}}}