{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,18]],"date-time":"2026-04-18T07:15:12Z","timestamp":1776496512075,"version":"3.51.2"},"reference-count":57,"publisher":"MDPI AG","issue":"13","license":[{"start":{"date-parts":[[2021,6,24]],"date-time":"2021-06-24T00:00:00Z","timestamp":1624492800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"H2020-EU.2.1.1.","award":["833042"],"award-info":[{"award-number":["833042"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Cybersecurity is an arms race, with both the security and the adversaries attempting to outsmart one another, coming up with new attacks, new ways to defend against those attacks, and again with new ways to circumvent those defences. This situation creates a constant need for novel, realistic cybersecurity datasets. This paper introduces the effects of using machine-learning-based intrusion detection methods in network traffic coming from a real-life architecture. The main contribution of this work is a dataset coming from a real-world, academic network. Real-life traffic was collected and, after performing a series of attacks, a dataset was assembled. The dataset contains 44 network features and an unbalanced distribution of classes. In this work, the capability of the dataset for formulating machine-learning-based models was experimentally evaluated. To investigate the stability of the obtained models, cross-validation was performed, and an array of detection metrics were reported. 
The gathered dataset is part of an effort to bring security against novel cyberthreats and was completed in the SIMARGL project.<\/jats:p>","DOI":"10.3390\/s21134319","type":"journal-article","created":{"date-parts":[[2021,6,24]],"date-time":"2021-06-24T11:01:38Z","timestamp":1624532498000},"page":"4319","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":47,"title":["The Proposition and Evaluation of the RoEduNet-SIMARGL2021 Network Intrusion Detection Dataset"],"prefix":"10.3390","volume":"21","author":[{"given":"Maria-Elena","family":"Mihailescu","sequence":"first","affiliation":[{"name":"Department of Computer Science and Engineering, Faculty of Automatic Control and Computer Science, University Politehnica of Bucharest, 060042 Bucharest, Romania"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Darius","family":"Mihai","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, Faculty of Automatic Control and Computer Science, University Politehnica of Bucharest, 060042 Bucharest, Romania"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mihai","family":"Carabas","sequence":"additional","affiliation":[{"name":"RoEduNet, Strada Mendeleev 21-25, 010362 Bucharest, Romania"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Miko\u0142aj","family":"Komisarek","sequence":"additional","affiliation":[{"name":"ITTI Sp. z o.o., ul. Rubie\u017c 46, 61-612 Pozna\u0144, Poland"},{"name":"Institute of Telecommunications and Computer Science, UTP University of Science and Technology, 85-796 Bydgoszcz, Poland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Marek","family":"Pawlicki","sequence":"additional","affiliation":[{"name":"ITTI Sp. z o.o., ul. 
Rubie\u017c 46, 61-612 Pozna\u0144, Poland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Witold","family":"Ho\u0142ubowicz","sequence":"additional","affiliation":[{"name":"Institute of Telecommunications and Computer Science, UTP University of Science and Technology, 85-796 Bydgoszcz, Poland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Rafa\u0142","family":"Kozik","sequence":"additional","affiliation":[{"name":"ITTI Sp. z o.o., ul. Rubie\u017c 46, 61-612 Pozna\u0144, Poland"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2021,6,24]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1016\/j.jnca.2017.02.009","article-title":"A survey of intrusion detection in Internet of Things","volume":"84","author":"Miani","year":"2017","journal-title":"J. Netw. Comput. Appl."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"783","DOI":"10.1007\/s12652-015-0283-x","article-title":"Advanced services for critical infrastructures protection","volume":"6","author":"Kozik","year":"2015","journal-title":"J. Ambient. Intell. Humaniz. Comput."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"179","DOI":"10.1016\/j.jocs.2017.03.025","article-title":"Simulation platform for cyber-security and vulnerability analysis of critical infrastructures","volume":"22","author":"Ficco","year":"2017","journal-title":"J. Comput. 
Sci."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"5371","DOI":"10.1109\/ACCESS.2020.3048319","article-title":"Tight Arms Race: Overview of Current Malware Threats and Trends in Their Detection","volume":"9","author":"Caviglione","year":"2021","journal-title":"IEEE Access"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"705","DOI":"10.1016\/j.neucom.2020.07.138","article-title":"Intrusion detection approach based on optimised artificial neural network","volume":"452","author":"Pawlicki","year":"2021","journal-title":"Neurocomputing"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Kozik, R., Pawlicki, M., and Chora\u015b, M. (2021). A new method of hybrid time window embedding with transformer-based traffic data classification in IoT-networked environment. Pattern Anal. Appl., 1\u20139.","DOI":"10.1007\/s10044-021-00980-2"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Dutta, V., Choras, M., Pawlicki, M., and Kozik, R. (2020). A Deep Learning Ensemble for Network Anomaly and Cyber-Attack Detection. Sensors, 20.","DOI":"10.3390\/s20164583"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"20","DOI":"10.1186\/s42400-019-0038-7","article-title":"Survey of intrusion detection systems: Techniques, datasets and challenges","volume":"2","author":"Khraisat","year":"2019","journal-title":"Cybersecurity"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"106301","DOI":"10.1016\/j.asoc.2020.106301","article-title":"A survey and taxonomy of the fuzzy signature-based Intrusion Detection Systems","volume":"92","author":"Masdari","year":"2020","journal-title":"Appl. Soft Comput."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Daniya, T., Suresh Kumar, K., Santhosh Kumar, B., and Sekhar Kolli, C. (2021). A survey on anomaly based intrusion detection system. Mater. 
Today Proc.","DOI":"10.1016\/j.matpr.2021.03.353"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"102289","DOI":"10.1016\/j.cose.2021.102289","article-title":"A fast network intrusion detection system using adaptive synthetic oversampling and LightGBM","volume":"106","author":"Liu","year":"2021","journal-title":"Comput. Secur."},{"key":"ref_12","unstructured":"He, H., Bai, Y., Garcia, E.A., and Li, S. (2008, January 1\u20136). ADASYN: Adaptive synthetic sampling approach for imbalanced learning. Proceedings of the 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), Hong Kong, China."},{"key":"ref_13","first-page":"3146","article-title":"Lightgbm: A highly efficient gradient boosting decision tree","volume":"30","author":"Ke","year":"2017","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Tavallaee, M., Bagheri, E., Lu, W., and Ghorbani, A.A. (2009, January 8\u201310). A detailed analysis of the KDD CUP 99 data set. Proceedings of the 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications, Ottawa, ON, Canada.","DOI":"10.1109\/CISDA.2009.5356528"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Moustafa, N., and Slay, J. (2015, January 10\u201312). UNSW-NB15: A comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). Proceedings of the 2015 Military Communications and Information Systems Conference (MilCIS), Canberra, Australia.","DOI":"10.1109\/MilCIS.2015.7348942"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Sharafaldin, I., Habibi Lashkari, A., and Ghorbani, A.A. (2018, January 22\u201324). Toward Generating a New Intrusion Detection Dataset and Intrusion Traffic Characterization. 
Proceedings of the 4th International Conference on Information Systems Security and Privacy\u2014Volume 1: ICISSP, INSTICC, SciTePress, Madeira, Portugal.","DOI":"10.5220\/0006639801080116"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"102151","DOI":"10.1016\/j.cose.2020.102151","article-title":"RNNIDS: Enhancing network intrusion detection systems through deep learning","volume":"102","author":"Sohi","year":"2021","journal-title":"Comput. Secur."},{"key":"ref_18","first-page":"e00497","article-title":"Network intrusion detection system using supervised learning paradigm","volume":"9","author":"Mebawondu","year":"2020","journal-title":"Sci. Afr."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"107247","DOI":"10.1016\/j.comnet.2020.107247","article-title":"Building an efficient intrusion detection system based on feature selection and ensemble classifier","volume":"174","author":"Zhou","year":"2020","journal-title":"Comput. Netw."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"184","DOI":"10.1109\/COMST.2015.2402161","article-title":"Intrusion detection in 802.11 networks: Empirical evaluation of threats and a public dataset","volume":"18","author":"Kolias","year":"2016","journal-title":"IEEE Commun. Surv. Tutor."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"100","DOI":"10.1016\/j.cose.2014.05.011","article-title":"An Empirical Comparison of Botnet Detection Methods","volume":"45","author":"Grill","year":"2014","journal-title":"Comput. Secur."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Damasevicius, R., Venckauskas, A., Grigaliunas, S., Toldinas, J., Morkevicius, N., Aleliunas, T., and Smuikys, P. (2020). LITNET-2020: An Annotated Real-World Network Flow Dataset for Network Intrusion Detection. Electronics, 9.","DOI":"10.3390\/electronics9050800"},{"key":"ref_23","unstructured":"McCanne, S. (2021, May 20). libpcap: An Architecture and Optimization Methodology for Packet Capture. 
Available online: http:\/\/sharkfest.wireshark.org\/sharkfest.11\/presentations\/McCanne-Sharkfest%2711_Keynote_Address.pdf."},{"key":"ref_24","unstructured":"(2021, May 20). Okiru Malware Puts Billions of Connected Devices at Risk. Available online: https:\/\/searchsecurity.techtarget.com\/news\/252433491\/Okiru-malware-puts-billions-of-connected-devices-at-risk."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"80","DOI":"10.1109\/MC.2017.201","article-title":"DDoS in the IoT: Mirai and other botnets","volume":"50","author":"Kolias","year":"2017","journal-title":"Computer"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Alomari, E., Manickam, S., Gupta, B., Karuppayah, S., and Alfaris, R. (2012). Botnet-based distributed denial of service (DDoS) attacks on web servers: Classification and art. arXiv.","DOI":"10.5120\/7640-0724"},{"key":"ref_27","unstructured":"Lee, C.B., Roedel, C., and Silenok, E. (2003). Detection and Characterization of Port Scan Attacks, University of California, Department of Computer Science and Engineering."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Allen, L., Heriyanto, T., and Ali, S. (2014). Kali Linux\u2014Assuring Security by Penetration Testing, Packt Publishing Ltd.","DOI":"10.1016\/S1353-4858(14)70077-7"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Haja, D., Szabo, M., Szalay, M., Nagy, A., Kern, A., Toka, L., and Sonkoly, B. (2018, January 15\u201319). How to orchestrate a distributed OpenStack. Proceedings of the IEEE INFOCOM 2018\u2014IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), Honolulu, HI, USA.","DOI":"10.1109\/INFCOMW.2018.8407014"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Tesliuk, A., Bobkov, S., Ilyin, V., Novikov, A., Poyda, A., and Velikhov, V. (2019, January 5\u20136). Kubernetes Container Orchestration as a Framework for Flexible and Effective Scientific Data Analysis. 
Proceedings of the 2019 Ivannikov Ispras Open Conference (ISPRAS), Moscow, Russia.","DOI":"10.1109\/ISPRAS47671.2019.00016"},{"key":"ref_31","unstructured":"Lyon, G.F. (2009). Nmap Network Scanning: The Official Nmap Project Guide to Network Discovery and Security Scanning, Insecure."},{"key":"ref_32","unstructured":"(2021, May 20). robertdavidgraham\/masscan: TCP Port Scanner, Spews SYN Packets Asynchronously, Scanning Entire Internet in under 5 Minutes. Available online: https:\/\/github.com\/robertdavidgraham\/masscan."},{"key":"ref_33","unstructured":"(2021, May 20). CAPEC\u2014CAPEC-287: TCP SYN Scan (Version 3.4). Available online: https:\/\/capec.mitre.org\/data\/definitions\/287.html."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Tarasov, Y., Pakulova, E., and Basov, O. (2019, January 12\u201315). Modeling of Low-Rate DDoS-Attacks. Proceedings of the 12th International Conference on Security of Information and Networks, (SIN\u201919), Sochi, Russia.","DOI":"10.1145\/3357613.3357638"},{"key":"ref_35","unstructured":"Najafabadi, M.M., Khoshgoftaar, T.M., Napolitano, A., and Wheelus, C. (2016, January 16\u201318). Rudy attack: Detection at the network level and its important features. Proceedings of the Twenty-Ninth International Flairs Conference, Key Largo, FL, USA."},{"key":"ref_36","unstructured":"(2021, May 20). Apache Kafka. Available online: https:\/\/kafka.apache.org\/."},{"key":"ref_37","unstructured":"Deri, L., Martinelli, M., and Cardigliano, A. (2014, January 9\u201314). Realtime high-speed network traffic monitoring using ntopng. Proceedings of the 28th Large Installation System Administration Conference (LISA14), Seattle, WA, USA."},{"key":"ref_38","first-page":"3","article-title":"Machine Learning Based Approach to Anomaly and Cyberattack Detection in Streamed Network Traffic Data","volume":"12","author":"Komisarek","year":"2021","journal-title":"J. Wirel. Mob. Netw. Ubiquitous Comput. 
Dependable Appl."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"16","DOI":"10.1016\/j.compeleceng.2013.11.024","article-title":"A survey on feature selection methods","volume":"40","author":"Chandrashekar","year":"2014","journal-title":"Comput. Electr. Eng."},{"key":"ref_40","unstructured":"Longadge, R., and Dongre, S. (2013). Class imbalance problem in data mining review. arXiv."},{"key":"ref_41","unstructured":"Burduk, R. (2020). Classification Performance Metric for Imbalance Data Based on Recall and Selectivity Normalized in Class Labels. arXiv."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"429","DOI":"10.1016\/j.ins.2019.11.004","article-title":"Data imbalance in classification: Experimental evaluation","volume":"513","author":"Thabtah","year":"2020","journal-title":"Inf. Sci."},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"321","DOI":"10.1613\/jair.953","article-title":"SMOTE: Synthetic Minority Over-sampling Technique","volume":"16","author":"Chawla","year":"2002","journal-title":"J. Artif. Intell. Res. (JAIR)"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Mukherjee, M., and Khushi, M. (2021). SMOTE-ENC: A Novel SMOTE-Based Method to Generate Synthetic Data for Nominal and Continuous Features. Appl. Syst. Innov., 4.","DOI":"10.3390\/asi4010018"},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12864-019-6413-7","article-title":"The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation","volume":"21","author":"Chicco","year":"2020","journal-title":"BMC Genom."},{"key":"ref_46","unstructured":"(2021, May 20). sklearn.feature_selection.SelectKBest \u2014Scikit-learn 0.24.2 Documentation. 
Available online: https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.feature_selection.SelectKBest.html."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"115","DOI":"10.1007\/BF02478259","article-title":"A logical calculus of the ideas immanent in nervous activity","volume":"5","author":"McCulloch","year":"1943","journal-title":"Bull. Math. Biophys."},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"947","DOI":"10.2514\/8.5282","article-title":"Gradient theory of optimal flight paths","volume":"30","author":"Kelley","year":"1960","journal-title":"ARS J."},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"436","DOI":"10.1038\/nature14539","article-title":"Deep learning","volume":"521","author":"LeCun","year":"2015","journal-title":"Nature"},{"key":"ref_50","unstructured":"Ho, T.K. (1995, January 14\u201316). Random decision forests. Proceedings of the 3rd International Conference on Document Analysis and Recognition, Montreal, QC, Canada."},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach. Learn."},{"key":"ref_52","first-page":"1612","article-title":"A short introduction to boosting","volume":"14","author":"Freund","year":"1999","journal-title":"J. Jpn. Soc. Artif. Intell."},{"key":"ref_53","doi-asserted-by":"crossref","first-page":"1189","DOI":"10.1214\/aos\/1013203451","article-title":"Greedy function approximation: A gradient boosting machine","volume":"29","author":"Friedman","year":"2001","journal-title":"Ann. Stat."},{"key":"ref_54","doi-asserted-by":"crossref","unstructured":"Pawlicki, M., Chora\u015b, M., Kozik, R., and Ho\u0142ubowicz, W. (2020). On the Impact of Network Data Balancing in Cybersecurity Applications. 
International Conference on Computational Science, Springer Nature.","DOI":"10.1007\/978-3-030-50423-6_15"},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"Kozik, R., Pawlicki, M., and Chora\u015b, M. (2018). Cost-sensitive distributed machine learning for netflow-based botnet activity detection. Secur. Commun. Netw., 2018.","DOI":"10.1155\/2018\/8753870"},{"key":"ref_56","unstructured":"Kingma, D.P., and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv."},{"key":"ref_57","doi-asserted-by":"crossref","first-page":"457","DOI":"10.1007\/s00362-012-0443-4","article-title":"A generalization of the Wilcoxon signed-rank test and its applications","volume":"54","author":"Taheri","year":"2013","journal-title":"Stat. Pap."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/13\/4319\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T06:22:57Z","timestamp":1760163777000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/13\/4319"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,6,24]]},"references-count":57,"journal-issue":{"issue":"13","published-online":{"date-parts":[[2021,7]]}},"alternative-id":["s21134319"],"URL":"https:\/\/doi.org\/10.3390\/s21134319","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,6,24]]}}}