{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,20]],"date-time":"2026-06-20T16:56:30Z","timestamp":1781974590631,"version":"3.54.5"},"reference-count":46,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2020,11,25]],"date-time":"2020-11-25T00:00:00Z","timestamp":1606262400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2020,11,25]],"date-time":"2020-11-25T00:00:00Z","timestamp":1606262400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Big Data"],"published-print":{"date-parts":[[2020,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Computer network intrusion detection systems (IDSs) and intrusion prevention systems (IPSs) are critical aspects that contribute to the success of an organization. Over the past years, IDSs and IPSs using different approaches have been developed and implemented to ensure that computer networks within enterprises are secure, reliable and available. In this paper, we focus on IDSs that are built using machine learning (ML) techniques. IDSs based on ML methods are effective and accurate in detecting network attacks. However, the performance of these systems decreases for high-dimensional data spaces. Therefore, it is crucial to implement an appropriate feature extraction method that can prune some of the features that do not possess a great impact in the classification process. Moreover, many of the ML-based IDSs suffer from an increase in false positive rate and a low detection accuracy when the models are trained on highly imbalanced datasets. 
In this paper, we present an analysis of the UNSW-NB15 intrusion detection dataset that will be used for training and testing our models. Moreover, we apply a filter-based feature reduction technique using the XGBoost algorithm. We then implement the following ML approaches using the reduced feature space: Support Vector Machine (SVM), k-Nearest-Neighbour (kNN), Logistic Regression (LR), Artificial Neural Network (ANN) and Decision Tree (DT). In our experiments, we considered both the binary and multiclass classification configurations. The results demonstrated that the XGBoost-based feature selection method allows for methods such as the DT to increase its test accuracy from 88.13 to 90.85% for the binary classification scheme.<\/jats:p>","DOI":"10.1186\/s40537-020-00379-6","type":"journal-article","created":{"date-parts":[[2020,11,25]],"date-time":"2020-11-25T11:02:48Z","timestamp":1606302168000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":471,"title":["Performance Analysis of Intrusion Detection Systems Using a Feature Selection Method on the UNSW-NB15 Dataset"],"prefix":"10.1186","volume":"7","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8989-5004","authenticated-orcid":false,"given":"Sydney M.","family":"Kasongo","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Yanxia","family":"Sun","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2020,11,25]]},"reference":[{"key":"379_CR1","doi-asserted-by":"publisher","first-page":"38367","DOI":"10.1109\/ACCESS.2018.2854599","volume":"6","author":"Z Wang","year":"2018","unstructured":"Wang Z: Deep learning-based intrusion detection with adversaries. IEEE Access. 
2018;6:38367\u2013384.","journal-title":"IEEE Access"},{"key":"379_CR2","doi-asserted-by":"publisher","DOI":"10.1016\/j.icte.2020.03.002","volume-title":"A deep gated recurrent unit based model for wireless intrusion detection system","author":"SM Kasongo","year":"2020","unstructured":"Kasongo SM, Sun Y. A deep gated recurrent unit based model for wireless intrusion detection system. Cakovec: ICT Express; 2020."},{"key":"379_CR3","doi-asserted-by":"publisher","first-page":"23154","DOI":"10.1109\/ACCESS.2020.2969626","volume":"8","author":"J Ribeiro","year":"2020","unstructured":"Ribeiro J, Saghezchi FB, Mantas G, Rodriguez J, Abd-Alhameed RA. Hidroid: prototyping a behavioral host-based intrusion detection and prevention system for android. IEEE Access. 2020;8:23154\u2013168.","journal-title":"IEEE Access"},{"key":"379_CR4","unstructured":"Van NTT, Thinh TN. Accelerating anomaly-based IDS using neural network on GPU. In: 2015 international conference on advanced computing and applications (ACOMP). IEEE; 2015. pp. 67\u201374."},{"key":"379_CR5","doi-asserted-by":"publisher","first-page":"338","DOI":"10.1016\/j.procs.2015.04.191","volume":"48","author":"J Jabez","year":"2015","unstructured":"Jabez J, Muthukumar B. Intrusion detection system (IDS): anomaly detection using outlier detection approach. Procedia Comput Sci. 2015;48:338\u201346.","journal-title":"Procedia Comput Sci"},{"key":"379_CR6","doi-asserted-by":"crossref","unstructured":"Neelakantan S, Rao S. A threat-aware anomaly-based intrusion-detection approach for obtaining network-specific useful alarms. In: International conference on distributed computing and networking. Springer. 2009; pp. 175\u2013180.","DOI":"10.1007\/978-3-540-92295-7_21"},{"key":"379_CR7","doi-asserted-by":"publisher","first-page":"38597","DOI":"10.1109\/ACCESS.2019.2905633","volume":"7","author":"SM Kasongo","year":"2019","unstructured":"Kasongo SM, Sun Y. 
A deep learning method with filter based feature engineering for wireless intrusion detection system. IEEE Access. 2019; 7:38597\u2013607.","journal-title":"IEEE Access"},{"key":"379_CR8","doi-asserted-by":"publisher","first-page":"3","DOI":"10.1007\/978-3-319-18305-3_1","volume-title":"Machine learning in radiation oncology","author":"I El\u00a0Naqa","year":"2015","unstructured":"El\u00a0Naqa I, Murphy MJ. What is machine learning? In: Machine learning in radiation oncology. Berlin: Springer; 2015. p. 3\u201311."},{"key":"379_CR9","doi-asserted-by":"crossref","unstructured":"Khatri S, Arora A, Agrawal AP. Supervised machine learning algorithms for credit card fraud detection: a comparison. In: 2020 10th international conference on cloud computing, data science & engineering (confluence), IEEE; 2020. pp. 680\u201383.","DOI":"10.1109\/Confluence47617.2020.9057851"},{"key":"379_CR10","doi-asserted-by":"crossref","unstructured":"Singh P. Supervised machine learning. In: Learn PySpark. Springer; 2019. pp. 117\u201359.","DOI":"10.1007\/978-1-4842-4961-1_6"},{"key":"379_CR11","volume-title":"Machine learning in action","author":"P Harrington","year":"2012","unstructured":"Harrington P. Machine learning in action. New York: Manning Publications Co.; 2012."},{"key":"379_CR12","volume-title":"Feature engineering for machine learning and data analytics","author":"G Dong","year":"2018","unstructured":"Dong G, Liu H. Feature engineering for machine learning and data analytics. Boca Raton: CRC Press; 2018."},{"key":"379_CR13","unstructured":"Chen T, Guestrin C. Xgboost: A scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, 2016; pp. 785\u201394."},{"issue":"1","key":"379_CR14","doi-asserted-by":"publisher","first-page":"70","DOI":"10.1109\/TSMCB.2006.883267","volume":"37","author":"Z Zhu","year":"2007","unstructured":"Zhu Z, Ong Y-S, Dash M. 
Wrapper-filter feature selection algorithm using a memetic framework. IEEE Trans Sys Man Cybern Part B (Cybern). 2007;37(1):70\u20136.","journal-title":"IEEE Trans Sys Man Cybern Part B (Cybern)"},{"issue":"3","key":"379_CR15","doi-asserted-by":"publisher","first-page":"4815","DOI":"10.1109\/JIOT.2018.2871719","volume":"6","author":"N Moustafa","year":"2018","unstructured":"Moustafa N, Turnbull B, Choo K-KR. An ensemble intrusion detection technique based on proposed statistical flow features for protecting network traffic of internet of things. IEEE Internet Things J. 2018;6(3):4815\u2013830.","journal-title":"IEEE Internet Things J"},{"issue":"1\u20133","key":"379_CR16","doi-asserted-by":"publisher","first-page":"18","DOI":"10.1080\/19393555.2015.1125974","volume":"25","author":"N Moustafa","year":"2016","unstructured":"Moustafa N, Slay J. The evaluation of network anomaly detection systems: Statistical analysis of the unsw-nb15 data set and the comparison with the KDD99 data set. Inf Secur J A Glob Perspect. 2016;25(1\u20133):18\u201331.","journal-title":"Inf Secur J A Glob Perspect"},{"key":"379_CR17","doi-asserted-by":"publisher","first-page":"255","DOI":"10.1016\/j.cose.2017.06.005","volume":"70","author":"C Khammassi","year":"2017","unstructured":"Khammassi C, Krichen S. A GA-LR wrapper approach for feature selection in network intrusion detection. Comput Secur 2017;70:255\u201377.","journal-title":"Comput Secur"},{"issue":"1","key":"379_CR18","doi-asserted-by":"publisher","first-page":"130","DOI":"10.1186\/s13638-016-0623-3","volume":"2016","author":"O Osanaiye","year":"2016","unstructured":"Osanaiye O, Cai H, Choo K-KR, Dehghantanha A, Xu Z, Dlodlo M. Ensemble-based multi-filter feature selection method for DDOS detection in cloud computing. EURASIP J Wirel Commun Netw. 
2016;2016(1):130.","journal-title":"EURASIP J Wirel Commun Netw"},{"issue":"10","key":"379_CR19","doi-asserted-by":"publisher","first-page":"2986","DOI":"10.1109\/TC.2016.2519914","volume":"65","author":"MA Ambusaidi","year":"2016","unstructured":"Ambusaidi MA, He X, Nanda P, Tan Z. Building an intrusion detection system using a filter-based feature selection algorithm. IEEE Trans Comput. 2016; 65(10):2986\u201398.","journal-title":"IEEE Trans Comput"},{"key":"379_CR20","doi-asserted-by":"crossref","unstructured":"Ingre B, Yadav A. Performance analysis of NSL-KDD dataset using ANN. In: 2015 international conference on signal processing and communication engineering systems, IEEE; 2015. pp. 92\u20136.","DOI":"10.1109\/SPACES.2015.7058223"},{"key":"379_CR21","doi-asserted-by":"publisher","first-page":"113249","DOI":"10.1016\/j.eswa.2020.113249","volume":"148","author":"H Alazzam","year":"2020","unstructured":"Alazzam, H., Sharieh, A., Sabri, K.E.: A feature selection algorithm for intrusion detection system based on pigeon inspired optimizer. Expert Syst Appl. 2020;148:113249.","journal-title":"Expert Syst Appl"},{"issue":"1","key":"379_CR22","doi-asserted-by":"publisher","first-page":"97","DOI":"10.1007\/s11071-016-2670-z","volume":"85","author":"Y Deng","year":"2016","unstructured":"Deng Y, Duan H. Control parameter design for automatic carrier landing system via pigeon-inspired optimization. Nonlinear Dyn. 2016; 85(1):97\u2013106.","journal-title":"Nonlinear Dyn"},{"key":"379_CR23","doi-asserted-by":"crossref","unstructured":"Janarthanan T, Zargari S. Feature selection in UNSW-NB15 and KDDCUP\u201999 datasets. In: 2017 IEEE 26th international symposium on industrial electronics (ISIE). IEEE; 2017. pp. 
1881\u20131886.","DOI":"10.1109\/ISIE.2017.8001537"},{"issue":"2","key":"379_CR24","doi-asserted-by":"publisher","first-page":"1397","DOI":"10.1007\/s10586-019-03008-x","volume":"23","author":"V Kumar","year":"2020","unstructured":"Kumar V, Sinha D, Das AK, Pandey SC, Goswami RT. An integrated rule based intrusion detection system: analysis on UNSW-NB15 data set and the real time online dataset. Cluster Comput. 2020; 23(2):1397\u20131418.","journal-title":"Cluster Comput"},{"issue":"6","key":"379_CR25","doi-asserted-by":"publisher","first-page":"1046","DOI":"10.3390\/sym12061046","volume":"12","author":"O Almomani","year":"2020","unstructured":"Almomani O. A feature selection model for network intrusion detection system based on PSO, GWO, FFA and GA algorithms. Symmetry. 2020;12(6):1046.","journal-title":"Symmetry"},{"key":"379_CR26","doi-asserted-by":"crossref","unstructured":"Khan NM, Negi A, Thaseen IS, et\u00a0al. Analysis on improving the performance of machine learning models using feature selection technique. In: International conference on intelligent systems design and applications. Springer; 2018. pp. 69\u201377.","DOI":"10.1007\/978-3-030-16660-1_7"},{"key":"379_CR27","doi-asserted-by":"publisher","first-page":"94497","DOI":"10.1109\/ACCESS.2019.2928048","volume":"7","author":"BA Tama","year":"2019","unstructured":"Tama BA, Comuzzi M, Rhee, K-H: TSE-IDS: a two-stage classifier ensemble for intelligent anomaly-based intrusion detection system. IEEE Access 2019; 7:94497\u2013507.","journal-title":"IEEE Access"},{"key":"379_CR28","doi-asserted-by":"crossref","unstructured":"Zong W, Chow Y-W, Susilo W. A two-stage classifier approach for network intrusion detection. In: International conference on information security practice and experience. Springer; 2018. pp. 
329\u2013340.","DOI":"10.1007\/978-3-319-99807-7_20"},{"issue":"6","key":"379_CR29","first-page":"389","volume":"8","author":"M Belouch","year":"2017","unstructured":"Belouch M, El Hadaj S, Idhammad M. A two-stage classifier approach using reptree algorithm for network intrusion detection. Int J Adv Comput Sci Appl. 2017;8(6):389\u201394","journal-title":"Int J Adv Comput Sci Appl"},{"issue":"7","key":"379_CR30","doi-asserted-by":"publisher","first-page":"1223","DOI":"10.3390\/en12071223","volume":"12","author":"J Gao","year":"2019","unstructured":"Gao J, Chai S, Zhang B, Xia Y. Research on network intrusion detection based on incremental extreme learning machine and adaptive principal component analysis. Energies 2019;12(7):1223.","journal-title":"Energies"},{"key":"379_CR31","doi-asserted-by":"publisher","first-page":"259","DOI":"10.1016\/j.jpdc.2019.12.008","volume":"137","author":"AS Almogren","year":"2020","unstructured":"Almogren AS. Intrusion detection in edge-of-things computing. J Parallel Distrib Comput. 2020;137:259\u201365.","journal-title":"J Parallel Distrib Comput"},{"key":"379_CR32","doi-asserted-by":"publisher","first-page":"32464","DOI":"10.1109\/ACCESS.2020.2973730","volume":"8","author":"K Jiang","year":"2020","unstructured":"Jiang K, Wang W, Wang A, Wu H. Network intrusion detection combined hybrid sampling with deep hierarchical network. IEEE Access. 2020; 8:32464\u2013476.","journal-title":"IEEE Access"},{"key":"379_CR33","unstructured":"Scikit-Learn, Support Vector Machines. https:\/\/scikit-learn.org\/stable\/modules\/svm.html.\u00a0Accessed 25 Sept 2020."},{"issue":"5\u20136","key":"379_CR34","doi-asserted-by":"publisher","first-page":"352","DOI":"10.1016\/S1532-0464(03)00034-0","volume":"35","author":"S Dreiseitl","year":"2002","unstructured":"Dreiseitl S, Ohno-Machado L. Logistic regression and artificial neural network classification models: a methodology review. J Biomed Inform. 
2002;35(5\u20136):352\u201359.","journal-title":"J Biomed Inform"},{"issue":"1","key":"379_CR35","doi-asserted-by":"publisher","first-page":"3","DOI":"10.1016\/S0167-7012(00)00201-3","volume":"43","author":"IA Basheer","year":"2000","unstructured":"Basheer IA, Hajmeer M. Artificial neural networks: fundamentals, computing, design, and application. J Microbiol Methods. 2000;43(1):3\u201331.","journal-title":"J Microbiol Methods"},{"key":"379_CR36","unstructured":"Li Y, Yuan Y. Convergence analysis of two-layer neural networks with relu activation. In: Advances in neural information processing systems; 2017. pp. 597\u2013607."},{"key":"379_CR37","doi-asserted-by":"publisher","DOI":"10.1142\/8868","volume-title":"Principles of artificial neural networks","author":"D Graupe","year":"2013","unstructured":"Graupe D. Principles of artificial neural networks, vol. 7. Singapore: World Scientific; 2013."},{"issue":"3","key":"379_CR38","doi-asserted-by":"publisher","first-page":"660","DOI":"10.1109\/21.97458","volume":"21","author":"SR Safavian","year":"1991","unstructured":"Safavian SR, Landgrebe D. A survey of decision tree classifier methodology. IEEE Trans Syst Man Cybern. 1991;21(3):660\u201374.","journal-title":"IEEE Trans Syst Man Cybern"},{"key":"379_CR39","unstructured":"Kuang Q, Zhao L. A practical GPU based KNN algorithm. In: Proceedings. The 2009 international symposium on computer science and computational technology (ISCSCI 2009). Citeseer; 2009. p. 151."},{"issue":"11","key":"379_CR40","doi-asserted-by":"publisher","first-page":"2159","DOI":"10.1109\/TPAMI.2014.25","volume":"36","author":"TE Schouten","year":"2014","unstructured":"Schouten TE, Van den Broek, EL. Fast exact euclidean distance (feed): a new class of adaptable distance transforms. IEEE Trans Pattern Anal Mach Intell. 2014;36(11):2159\u201372.","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"379_CR41","doi-asserted-by":"crossref","unstructured":"Moustafa N, Slay J. 
UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). In: 2015 military communications and information systems conference (MilCIS). IEEE; 2015. pp. 1\u20136.","DOI":"10.1109\/MilCIS.2015.7348942"},{"key":"379_CR42","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4614-2053-8","volume-title":"A survey of data leakage detection and prevention solutions","author":"A Shabtai","year":"2012","unstructured":"Shabtai A, Elovici Y, Rokach L. A survey of data leakage detection and prevention solutions. Berlin: Springer; 2012."},{"key":"379_CR43","doi-asserted-by":"publisher","first-page":"256","DOI":"10.1016\/j.proenv.2011.12.040","volume":"11","author":"Z Liu","year":"2011","unstructured":"Liu Z, et al. A method of SVM with normalization in intrusion detection. Procedia Environ Sci. 2011;11:256\u201362.","journal-title":"Procedia Environ Sci"},{"key":"379_CR44","unstructured":"Scikit-Learn,\u00a0 Gradient Boosting Classifier. https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.ensemble.GradientBoostingClassifier.html. Accessed 26 Sept 2020."},{"key":"379_CR45","unstructured":"Scikit Learn, Machine Learning in Python.\u00a0https:\/\/scikit-learn.org\/stable.\u00a0Accessed 26 Sept 2020."},{"key":"379_CR46","unstructured":"UNSW-NB15, Intrusion Detection Dataset. https:\/\/www.unsw.adfa.edu.au\/unsw-canberra-cyber\/cybersecurity\/ADFA-NB15-Datasets\/. 
Accessed 26 Sept 2020."}],"container-title":["Journal of Big Data"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-020-00379-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1186\/s40537-020-00379-6\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-020-00379-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2020,11,25]],"date-time":"2020-11-25T11:33:05Z","timestamp":1606303985000},"score":1,"resource":{"primary":{"URL":"https:\/\/journalofbigdata.springeropen.com\/articles\/10.1186\/s40537-020-00379-6"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,11,25]]},"references-count":46,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2020,12]]}},"alternative-id":["379"],"URL":"https:\/\/doi.org\/10.1186\/s40537-020-00379-6","relation":{},"ISSN":["2196-1115"],"issn-type":[{"value":"2196-1115","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,11,25]]},"assertion":[{"value":"30 July 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"6 November 2020","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"25 November 2020","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"The authors declare that they have no competing interests","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"105"}}