{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,2]],"date-time":"2026-07-02T05:38:28Z","timestamp":1782970708530,"version":"3.54.5"},"reference-count":30,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2023,3,29]],"date-time":"2023-03-29T00:00:00Z","timestamp":1680048000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100009226","name":"2021 NCAE-C-002: Cyber Research Innovation Grant Program","doi-asserted-by":"publisher","award":["H98230-21-1-0170"],"award-info":[{"award-number":["H98230-21-1-0170"]}],"id":[{"id":"10.13039\/100009226","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Future Internet"],"abstract":"<jats:p>This study, focusing on identifying rare attacks in imbalanced network intrusion datasets, explored the effect of using different ratios of oversampled to undersampled data for binary classification. Two designs were compared: random undersampling before splitting the training and testing data and random undersampling after splitting the training and testing data. This study also examined how oversampling\/undersampling ratios affect random forest classification rates in datasets with minority data or rare attacks. 
The results suggest that random undersampling before splitting gives better classification rates; however, random undersampling after oversampling with BSMOTE allows for the use of lower ratios of oversampled data.<\/jats:p>","DOI":"10.3390\/fi15040130","type":"journal-article","created":{"date-parts":[[2023,3,30]],"date-time":"2023-03-30T01:05:26Z","timestamp":1680138326000},"page":"130","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":12,"title":["Resampling Imbalanced Network Intrusion Datasets to Identify Rare Attacks"],"prefix":"10.3390","volume":"15","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1886-4582","authenticated-orcid":false,"given":"Sikha","family":"Bagui","sequence":"first","affiliation":[{"name":"Department of Computer Science, University of West Florida, Pensacola, FL 32514, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0106-3890","authenticated-orcid":false,"given":"Dustin","family":"Mink","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of West Florida, Pensacola, FL 32514, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Subhash","family":"Bagui","sequence":"additional","affiliation":[{"name":"Department of Mathematics and Statistics, University of West Florida, Pensacola, FL 32514, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Sakthivel","family":"Subramaniam","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of West Florida, Pensacola, FL 32514, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Daniel","family":"Wallace","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of West Florida, Pensacola, FL 32514, 
USA"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2023,3,29]]},"reference":[{"key":"ref_1","unstructured":"(2023, March 01). Zippia, How Many People Use the Internet?. Available online: https:\/\/www.zippia.com\/advice\/how-many-people-use-the-internet\/."},{"key":"ref_2","unstructured":"(2023, February 15). CSO, Up to Three Percent of Internet Traffic is Malicious, Researcher Says. Available online: https:\/\/www.csoonline.com\/article\/2122506\/up-to-three-percent-of-internet-traffic-is-malicious--researcher-says.html."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"6","DOI":"10.1186\/s40537-020-00390-x","article-title":"Resampling Imbalanced Data for Network Intrusion Detection Datasets","volume":"8","author":"Bagui","year":"2021","journal-title":"J. Big Data"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Moustafa, N., and Slay, J. (2015, January 10\u201312). UNSW-NB15: A Comprehensive Data Set for Network Intrusion Detection Systems (UNSW-NB15 network data set). Proceedings of the 2015 Military Communications and Information Systems Conference (MilCIS), Canberra, ACT, Australia.","DOI":"10.1109\/MilCIS.2015.7348942"},{"key":"ref_5","unstructured":"(2023, February 01). UWF-ZeekData22 Dataset. Available online: Datasets.uwf.edu."},{"key":"ref_6","unstructured":"(2022, December 12). Machine Learning Mastery Random Oversampling and Undersampling for Imbalanced Classification. Available online: https:\/\/imbalanced-learn.readthedocs.io\/en\/stable\/generated\/imblearn.under_sampling.RandomUnderSampler.html#imblearn.under_sampling.RandomUnderSampler."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"321","DOI":"10.1613\/jair.953","article-title":"SMOTE: Synthetic Minority Over-sampling Technique","volume":"16","author":"Chawla","year":"2002","journal-title":"J. Artif. Intell. 
Res."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Han, H., Wang, W.-Y., and Mao, B.-G. (2005, January 23\u201326). Borderline-SMOTE: A New Over-Sampling Method in Imbalanced Data Sets Learning. Proceedings of the International Conference on Intelligent Computing, Hefei, China.","DOI":"10.1007\/11538059_91"},{"key":"ref_9","unstructured":"He, H., Bai, Y., Garcia, E.A., and Li, S. (2008, January 1\u20138). ADASYN: Adaptive Synthetic Sampling Approach for Imbalanced Learning. Proceedings of the 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), Hong Kong, China."},{"key":"ref_10","first-page":"238","article-title":"To Combat Multi-class Imbalanced Problems by Means of Over-sampling Techniques","volume":"28","author":"Abdi","year":"2016","journal-title":"IEEE"},{"key":"ref_11","unstructured":"(2023, January 05). Imbalanced-Learn, RandomUnderSampler. Available online: https:\/\/imbalanced-learn.org\/stable\/references\/generated\/imblearn.under_sampling.RandomUnderSampler.html."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Shamsudin, H., Yusof, U., Jayalakshmi, A., and Akmal Khalid, M. (2020, January 9\u201311). Combining Oversampling and Undersampling Techniques for Imbalanced Classification: A Comparative Study Using Credit Card Fraudulent Transaction Dataset. Proceedings of the 2020 IEEE 16th International Conference on Control & Automation, Singapore.","DOI":"10.1109\/ICCA51439.2020.9264517"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"849","DOI":"10.1016\/S0031-3203(02)00257-1","article-title":"Strategies for Learning in Class Imbalance Problems","volume":"36","author":"Barandela","year":"2003","journal-title":"Pattern Recognit."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Vandewiele, G., Dehaene, I., Kov\u00e1cs, G., Sterckx, L., Janssens, O., Ongenae, F., De Backere, F., De Turck, F., Roelens, K., and Decruyenaere, J. (2020). 
Overly Optimistic Prediction Results on Imbalanced Data: Flaws and benefits of Applying Over-sampling. Artif. Intell. Med., preprint.","DOI":"10.1016\/j.artmed.2020.101987"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Bajer, D., Zon\u0107, B., Dudjak, M., and Martinovi\u0107, G. (2019, January 5\u20137). Performance Analysis of SMOTE-based Oversampling Techniques When Dealing with Data Imbalance. Proceedings of the 2019 International Conference on Systems, Signals and Image Processing (IWSSIP), Osijek, Croatia.","DOI":"10.1109\/IWSSIP.2019.8787306"},{"key":"ref_16","first-page":"39","article-title":"Classifying UNSW-NB15 Network Traffic in the Big Data Framework Using Random Forest in Spark","volume":"2","author":"Bagui","year":"2021","journal-title":"Int. J. Big Data Intell. Appl."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Koziarski, M. (2021, January 18\u201322). CSMOUTE: Combined Synthetic Oversampling and Undersampling Technique for Imbalanced Data Classification. Proceedings of the 2021 International Joint Conference on Neural Networks (IJCNN), Shenzhen, China.","DOI":"10.1109\/IJCNN52387.2021.9533415"},{"key":"ref_18","unstructured":"Liu, A.Y. (2004). The Effect of Oversampling and Undersampling on Classifying Imbalanced Text Datasets. [Ph.D. Thesis, The University of Texas at Austin]."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"18","DOI":"10.1111\/j.0824-7935.2004.t01-1-00228.x","article-title":"A Multiple Resampling Method for Learning from Imbalanced Data Sets","volume":"20","author":"Estabrooks","year":"2004","journal-title":"Comput. Intell."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Gonzalez-Cuautle, D., Hernandez-Suarez, A., Sanchez-Perez, G., Toscano-Medina, L.K., Portillo-Portillo, J., Olivares-Mercado, J., Perez-Meana, H.M., and Sandoval-Orozco, A.L. (2020). 
Synthetic Minority Oversampling Technique for Optimizing Classification Tasks in Botnet and Intrusion-Detection-System Datasets. Appl. Sci., 10.","DOI":"10.3390\/app10030794"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Bagui, S.S., Mink, D., Bagui, S.C., Ghosh, T., Plenkers, R., McElroy, T., Dulaney, S., and Shabanali, S. (2023). Introducing UWF-ZeekData22: A Comprehensive Network Traffic Dataset Based on the MITRE ATT&CK Framework. Data, 8.","DOI":"10.3390\/data8010018"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Bagui, S., Mink, D., Bagui, S., Ghosh, T., McElroy, T., Paredes, E., Khasnavis, N., and Plenkers, R. (2022). Detecting Reconnaissance and Discovery Tactics from the MITRE ATT&CK Framework in Zeek Conn Logs Using Spark\u2019s Machine Learning in the Big Data Framework. Sensors, 22.","DOI":"10.3390\/s22207999"},{"key":"ref_23","unstructured":"Han, J., Kamber, M., and Pei, J. (2022). Data Mining: Concepts and Techniques, Morgan Kaufmann."},{"key":"ref_24","first-page":"1","article-title":"Random Forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach. Learn."},{"key":"ref_25","unstructured":"(2023, March 01). Apache Spark StringIndexer. Available online: https:\/\/spark.apache.org\/docs\/latest\/api\/python\/reference\/api\/pyspark.ml.feature.StringIndexer.html."},{"key":"ref_26","unstructured":"(2023, March 01). Understand TCP\/IP Addressing and Subnetting Basics. Available online: https:\/\/docs.microsoft.com\/en-us\/troubleshoot\/windows-client\/networking\/tcpip-addressing-and-subnetting."},{"key":"ref_27","unstructured":"(2023, March 02). Service Name and Transport Protocol Port Number Registry. Available online: https:\/\/www.iana.org\/assignments\/service-names-port-numbers\/service-names-port-numbers.xhtml."},{"key":"ref_28","unstructured":"(2023, February 12). Scikit Learn 3.3 Metrics and Scoring: Quantifying the Quality of Predictions. 
Available online: https:\/\/scikit-learn.org\/stable\/modules\/model_evaluation.html#accuracy-score."},{"key":"ref_29","first-page":"37","article-title":"Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness & Correlation","volume":"2","author":"Powers","year":"2011","journal-title":"J. Mach. Learn. Technol."},{"key":"ref_30","unstructured":"(2023, February 12). sklearn.metrics.precision_recall_fscore_support. Available online: https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.metrics.precision_recall_fscore_support.html."}],"container-title":["Future Internet"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-5903\/15\/4\/130\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T19:05:59Z","timestamp":1760123159000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-5903\/15\/4\/130"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,3,29]]},"references-count":30,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2023,4]]}},"alternative-id":["fi15040130"],"URL":"https:\/\/doi.org\/10.3390\/fi15040130","relation":{},"ISSN":["1999-5903"],"issn-type":[{"value":"1999-5903","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,3,29]]}}}