{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,19]],"date-time":"2026-02-19T02:17:25Z","timestamp":1771467445918,"version":"3.50.1"},"reference-count":51,"publisher":"MDPI AG","issue":"5","license":[{"start":{"date-parts":[[2022,2,26]],"date-time":"2022-02-26T00:00:00Z","timestamp":1645833600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>In this paper, we addressed the problem of dataset scarcity for the task of network intrusion detection. Our main contribution was to develop a framework that provides a complete process for generating network traffic datasets based on the aggregation of real network traces. In addition, we proposed a set of tools for attribute extraction and labeling of traffic sessions. A new dataset with botnet network traffic was generated by the framework to assess our proposed method with machine learning algorithms suitable for unbalanced data. 
The performance of the classifiers was evaluated in terms of macro-averages of F1-score (0.97) and the Matthews Correlation Coefficient (0.94), showing a good overall performance average.<\/jats:p>","DOI":"10.3390\/s22051847","type":"journal-article","created":{"date-parts":[[2022,2,27]],"date-time":"2022-02-27T20:48:33Z","timestamp":1645994913000},"page":"1847","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":11,"title":["A Novel Framework for Generating Personalized Network Datasets for NIDS Based on Traffic Aggregation"],"prefix":"10.3390","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1211-1061","authenticated-orcid":false,"given":"Pablo","family":"Velarde-Alvarado","sequence":"first","affiliation":[{"name":"Unidad Acad\u00e9mica de Ciencias B\u00e1sicas e Ingenier\u00edas, Universidad Aut\u00f3noma de Nayarit, Tepic 63000, Mexico"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7322-4019","authenticated-orcid":false,"given":"Hugo","family":"Gonzalez","sequence":"additional","affiliation":[{"name":"Academia de Tecnolog\u00edas de la Informaci\u00f3n y Telem\u00e1tica, Universidad Polit\u00e9cnica de San Luis Potos\u00ed, San Luis Potos\u00ed 78363, Mexico"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2188-9892","authenticated-orcid":false,"given":"Rafael","family":"Mart\u00ednez-Pel\u00e1ez","sequence":"additional","affiliation":[{"name":"Facultad de Ingenier\u00edas y Tecnolog\u00edas, Universidad De La Salle Baj\u00edo, Av. Universidad 602, Le\u00f3n 37150, Mexico"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3244-0129","authenticated-orcid":false,"given":"Luis J.","family":"Mena","sequence":"additional","affiliation":[{"name":"Unidad Acad\u00e9mica de Computaci\u00f3n, Universidad Polit\u00e9cnica de Sinaloa, Ctra. 
Libre Mazatl\u00e1n Higueras Km 3, Mazatl\u00e1n 82199, Mexico"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3327-8562","authenticated-orcid":false,"given":"Alberto","family":"Ochoa-Brust","sequence":"additional","affiliation":[{"name":"Facultad de Ingenier\u00eda Mec\u00e1nica y El\u00e9ctrica, Universidad de Colima, Av. Universidad 333, Colima 28040, Mexico"}]},{"given":"Efra\u00edn","family":"Moreno-Garc\u00eda","sequence":"additional","affiliation":[{"name":"Direcci\u00f3n de Posgrado e investigaci\u00f3n, Instituto Tecnol\u00f3gico de Tepic, Tepic 63175, Mexico"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9118-8042","authenticated-orcid":false,"given":"Vanessa G.","family":"F\u00e9lix","sequence":"additional","affiliation":[{"name":"Unidad Acad\u00e9mica de Computaci\u00f3n, Universidad Polit\u00e9cnica de Sinaloa, Ctra. Libre Mazatl\u00e1n Higueras Km 3, Mazatl\u00e1n 82199, Mexico"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1429-3660","authenticated-orcid":false,"given":"Rodolfo","family":"Ostos","sequence":"additional","affiliation":[{"name":"Unidad Acad\u00e9mica de Computaci\u00f3n, Universidad Polit\u00e9cnica de Sinaloa, Ctra. Libre Mazatl\u00e1n Higueras Km 3, Mazatl\u00e1n 82199, Mexico"}]}],"member":"1968","published-online":{"date-parts":[[2022,2,26]]},"reference":[{"key":"ref_1","unstructured":"Singh, G., and Khare, N. (2021). A survey of intrusion detection from the perspective of intrusion datasets and machine learning techniques. Int. J. Comput. Appl., 1\u201311."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"e4150","DOI":"10.1002\/ett.4150","article-title":"Network intrusion detection system: A systematic study of machine learning and deep learning approaches","volume":"32","author":"Ahmad","year":"2021","journal-title":"Trans. Emerg. Telecommun. 
Technol."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"1269","DOI":"10.1007\/s11276-020-02529-3","article-title":"Intrusion detection techniques in network environment: A systematic review","volume":"27","author":"Ayyagari","year":"2021","journal-title":"Wirel. Netw."},{"key":"ref_4","unstructured":"Goutam, R.K. (2021). Cybersecurity Fundamentals: Understand the Role of Cybersecurity, Its Importance and Modern Techniques Used by Cybersecurity Professionals (English Edition), BPB Publications."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"357","DOI":"10.1016\/j.cose.2011.12.012","article-title":"Toward developing a systematic approach to generate benchmark datasets for intrusion detection","volume":"31","author":"Shiravi","year":"2012","journal-title":"Comput. Secur."},{"key":"ref_6","unstructured":"(2021, December 25). Canadian Institute for Cybersecurity. NSL-KDD. Available online: https:\/\/www.unb.ca\/cic\/datasets\/nsl.html."},{"key":"ref_7","unstructured":"(2021, October 25). Argus. Available online: https:\/\/openargus.org\/."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Hussain, F., Abbas, S.G., Fayyaz, U.U., Shah, G.A., Toqeer, A., and Ali, A. (2020, January 5\u20137). Towards a Universal Features Set for IoT Botnet Attacks Detection. Proceedings of the 2020 IEEE 23rd International Multitopic Conference (INMIC), Bahawalpur, Pakistan.","DOI":"10.1109\/INMIC50486.2020.9318106"},{"key":"ref_9","unstructured":"MIT Lincoln Laboratory (2021, December 26). 1998 DARPA Intrusion Detection Evaluation Dataset. Available online: https:\/\/www.ll.mit.edu\/r-d\/datasets\/1998-darpa-intrusion-detection-evaluation-dataset."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Thomas, C., Sharma, V., and Balakrishnan, N. (2008, January 16\u201320). Usefulness of DARPA dataset for intrusion detection system evaluation. 
Proceedings of the Data Mining, Intrusion Detection, Information Assurance, and Data Networks Security, Orlando, FL, USA.","DOI":"10.1117\/12.777341"},{"key":"ref_11","unstructured":"Al-Dhafian, B., Ahmad, I., and Al-Ghamid, A. (July, January 27\u2013). An Overview of the Current Classification Techniques. Proceedings of the International Conference on Security and Management, Las Vegas, NV, USA."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"579","DOI":"10.1016\/S1389-1286(00)00139-0","article-title":"The 1999 DARPA off-line intrusion detection evaluation","volume":"34","author":"Lippmann","year":"2000","journal-title":"Comput. Netw."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"497","DOI":"10.1007\/s12652-020-02014-x","article-title":"A survey of neural networks usage for intrusion detection systems","volume":"12","year":"2021","journal-title":"J. Ambient. Intell. Humaniz. Comput."},{"key":"ref_14","unstructured":"UCI Knowledge Discovery in Databases (2021, October 20). KDD Cup 1999 Data. Available online: https:\/\/kdd.ics.uci.edu\/databases\/kddcup99\/kddcup99.html."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"580","DOI":"10.5937\/vojtehg66-16670","article-title":"Review of KDD Cup\u201999, NSL-KDD and Kyoto 2006+ datasets","volume":"66","year":"2018","journal-title":"Vojnoteh. Glas."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Song, J., Takakura, H., Okabe, Y., Eto, M., Inoue, D., and Nakao, K. (2011, January 10). Statistical Analysis of Honeypot Data and Building of Kyoto 2006+ Dataset for NIDS Evaluation. 
Proceedings of the First Workshop on Building Analysis Datasets and Gathering Experience Returns for Security, Salzburg, Austria.","DOI":"10.1145\/1978672.1978676"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"100","DOI":"10.1016\/j.cose.2014.05.011","article-title":"An empirical comparison of botnet detection methods","volume":"45","author":"Grill","year":"2014","journal-title":"Comput. Secur."},{"key":"ref_18","unstructured":"The CTU-13 Dataset (2021, October 24). A Labeled Dataset with Botnet, Normal and Background Traffic. Available online: https:\/\/www.stratosphereips.org\/datasets-ctu13."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Kim, J., Sim, C., and Choi, J. (2019, January 24\u201328). Generating Labeled Flow Data from MAWILab Traces for Network Intrusion Detection. Proceedings of the ACM Workshop on Systems and Network Telemetry and Analytics, Phoenix, AZ, USA.","DOI":"10.1145\/3322798.3329251"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3424155","article-title":"On Generating Network Traffic Datasets with Synthetic Attacks for Intrusion Detection","volume":"24","author":"Cordero","year":"2021","journal-title":"ACM Trans. Priv. Secur."},{"key":"ref_21","first-page":"256","article-title":"Novel Bi-directional Flow-based Traffic Generation Framework for IDS Evaluation and Exploratory Data Analysis","volume":"29","author":"Wilailux","year":"2021","journal-title":"J. Inf. Process."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Fontugne, R., Borgnat, P., Abry, P., and Fukuda, K. (2010, January 30). MAWILab: Combining Diverse Anomaly Detectors for Automated Anomaly Labeling and Performance Benchmarking. 
Proceedings of the ACM CoNEXT \u201910, Philadelphia, PA, USA.","DOI":"10.1145\/1921168.1921179"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"3531","DOI":"10.1016\/j.comnet.2012.02.019","article-title":"A tool for the generation of realistic network workload for emerging networking scenarios","volume":"56","author":"Botta","year":"2012","journal-title":"Comput. Netw."},{"key":"ref_24","unstructured":"(2022, January 25). Kali: The most advanced Penetration Testing Distribution. Available online: https:\/\/www.kali.org."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Sadiku, M.N.O., and Musa, S.M. (2013). Self-Similarity of Network Traffic. Performance Analysis of Computer Networks, Springer International Publishing.","DOI":"10.1007\/978-3-319-01646-7_10"},{"key":"ref_26","unstructured":"Roesch, M. (1999, January 7\u201312). Snort: Lightweight Intrusion Detection for Networks. Proceedings of the LISA\u201999: 13th USENIX Conference on System Administration, Berkeley, CA, USA."},{"key":"ref_27","unstructured":"Au, H., and Lee, K. (2017, January 29\u201330). Graph Database Technology and k-Means Clustering for Digital Forensics. Proceedings of the European Conference on Cyber Warfare and Security, Dublin, Ireland."},{"key":"ref_28","unstructured":"(2021, October 30). NETRESEC: Publicly Available PCAP Files. Available online: https:\/\/www.netresec.com\/?page=pcapfiles."},{"key":"ref_29","unstructured":"(2021, October 30). Malware Traffic Analysis: A Source for Pcap Files and Malware Samples\u2026. Available online: https:\/\/www.malware-traffic-analysis.net\/."},{"key":"ref_30","unstructured":"(2021, October 30). Stratosphere Lab: Datasets Overview. Available online: https:\/\/www.stratosphereips.org\/datasets-overview."},{"key":"ref_31","unstructured":"Canadian Institute for Cybersecurity (2021, October 24). CICFlowMeter. 
Available online: https:\/\/github.com\/CanadianInstituteForCybersecurity\/CICFlowMeter."},{"key":"ref_32","unstructured":"Topasna, K. (2021, October 28). Flowmeter Tool. Available online: https:\/\/github.com\/alekzandr\/flowmeter."},{"key":"ref_33","first-page":"831","article-title":"Principles of risk minimization for learning theory","volume":"1992","author":"Vapnik","year":"1992","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_34","first-page":"2635","article-title":"Learnability, stability and uniform convergence","volume":"11","author":"Shamir","year":"2010","journal-title":"J. Mach. Learn. Res."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"273","DOI":"10.1142\/S0218213009000135","article-title":"Symbolic one-class learning from imbalanced datasets: Application in medical diagnosis","volume":"18","author":"Mena","year":"2009","journal-title":"Int. J. Artif. Intell. Tools"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"102499","DOI":"10.1016\/j.cose.2021.102499","article-title":"CSE-IDS: Using cost-sensitive deep learning and ensemble algorithms to handle class imbalance in network-based intrusion detection systems","volume":"112","author":"Gupta","year":"2022","journal-title":"Comput. Secur."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"He, H., and Ma, Y. (2013). Imbalanced Learning: Foundations, Algorithms, and Applications, John Wiley & Sons.","DOI":"10.1002\/9781118646106"},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"91038","DOI":"10.1109\/ACCESS.2021.3092054","article-title":"Developing an Efficient Feature Engineering and Machine Learning Model for Detecting IoT-Botnet Cyber Attacks","volume":"9","author":"Panda","year":"2021","journal-title":"IEEE Access"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Bansal, A., and Mahapatra, S. (2017, January 13\u201315). A Comparative Analysis of Machine Learning Techniques for Botnet Detection. 
Proceedings of the 10th International Conference on Security of Information and Networks, Jaipur, India.","DOI":"10.1145\/3136825.3136874"},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"321","DOI":"10.1613\/jair.953","article-title":"SMOTE: Synthetic minority over-sampling technique","volume":"16","author":"Chawla","year":"2002","journal-title":"J. Artif. Intell. Res."},{"key":"ref_41","unstructured":"Kubat, M., and Matwin, S. (1997, January 8\u201312). Addressing the curse of imbalanced training sets: One-sided selection. Proceedings of the ICML, Nashville, TN, USA."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"515","DOI":"10.1109\/TIT.1968.1054155","article-title":"The condensed nearest neighbor rule (corresp.)","volume":"14","author":"Hart","year":"1968","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"e127","DOI":"10.7717\/peerj-cs.127","article-title":"Accelerating the XGBoost algorithm using GPU computing","volume":"3","author":"Mitchell","year":"2017","journal-title":"PeerJ Comput. Sci."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"2401","DOI":"10.1016\/j.neucom.2017.11.018","article-title":"A LSTM based framework for handling multiclass imbalance in DGA botnet detection","volume":"275","author":"Tran","year":"2018","journal-title":"Neurocomputing"},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"1694","DOI":"10.1007\/s10922-020-09554-9","article-title":"A Two-Stream Network Based on Capsule Networks and Sliced Recurrent Neural Networks for DGA Botnet Detection","volume":"28","author":"Pei","year":"2020","journal-title":"J. Netw. Syst. Manag."},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"102549","DOI":"10.1016\/j.cose.2021.102549","article-title":"On Detecting and Classifying DGA Botnets and their Families","volume":"113","author":"Tuan","year":"2022","journal-title":"Comput. 
Secur."},{"key":"ref_47","unstructured":"Manning, C.D., Raghavan, P., and Sch\u00fctze, H. (2008). Introduction to Information Retrieval, Cambridge University Press."},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Chicco, D., and Jurman, G. (2020). The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genom., 21.","DOI":"10.1186\/s12864-019-6413-7"},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"216","DOI":"10.1016\/j.patcog.2019.02.023","article-title":"The impact of class imbalance in classification performance metrics based on the binary confusion matrix","volume":"91","author":"Luque","year":"2019","journal-title":"Pattern Recognit."},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13040-021-00244-z","article-title":"The Matthews correlation coefficient (MCC) is more reliable than balanced accuracy, bookmaker informedness, and markedness in two-class confusion matrix evaluation","volume":"14","author":"Chicco","year":"2021","journal-title":"BioData Min."},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"71","DOI":"10.1016\/j.patrec.2020.03.030","article-title":"On the performance of Matthews correlation coefficient (MCC) for imbalanced dataset","volume":"136","author":"Zhu","year":"2020","journal-title":"Pattern Recognit. 
Lett."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/5\/1847\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T22:27:58Z","timestamp":1760135278000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/5\/1847"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,2,26]]},"references-count":51,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2022,3]]}},"alternative-id":["s22051847"],"URL":"https:\/\/doi.org\/10.3390\/s22051847","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,2,26]]}}}