{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,7]],"date-time":"2026-05-07T09:21:46Z","timestamp":1778145706610,"version":"3.51.4"},"reference-count":39,"publisher":"MDPI AG","issue":"9","license":[{"start":{"date-parts":[[2023,9,13]],"date-time":"2023-09-13T00:00:00Z","timestamp":1694563200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Agency for Defense Development Institute","award":["9129156"],"award-info":[{"award-number":["9129156"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>Recent advances in the Internet and digital technology have brought a wide variety of activities into cyberspace, but they have also brought a surge in cyberattacks, making it more important than ever to detect and prevent cyberattacks. In this study, a method is proposed to detect anomalies in cyberspace by consolidating BGP (Border Gateway Protocol) data into numerical data that can be trained by machine learning (ML) through a tokenizer. BGP data comprise a mix of numeric and textual data, making it challenging for ML models to learn. To convert the data into a numerical format, a tokenizer, a preprocessing technique from Natural Language Processing (NLP), was employed. This process goes beyond merely replacing letters with numbers; its objective is to preserve the patterns and characteristics of the data. The Synthetic Minority Over-sampling Technique (SMOTE) was subsequently applied to address the issue of imbalanced data. Anomaly detection experiments were conducted on the model using various ML algorithms such as One-Class Support Vector Machine (One-SVM), Convolutional Neural Network\u2013Long Short-Term Memory (CNN\u2013LSTM), Random Forest (RF), and Autoencoder (AE), and excellent performance in detection was demonstrated. 
In the experiments, the proposed method performed best with the AE model, achieving an F1-Score of 0.99. In terms of the Area Under the Receiver Operating Characteristic (AUROC) curve, all ML models performed well, with an average of over 90%. This research is expected to contribute to improved cybersecurity, as it enables the detection and monitoring of cyber anomalies caused by malicious users through BGP data.<\/jats:p>","DOI":"10.3390\/info14090501","type":"journal-article","created":{"date-parts":[[2023,9,13]],"date-time":"2023-09-13T05:31:28Z","timestamp":1694583088000},"page":"501","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["BGP Dataset-Based Malicious User Activity Detection Using Machine Learning"],"prefix":"10.3390","volume":"14","author":[{"given":"Hansol","family":"Park","sequence":"first","affiliation":[{"name":"Department of Computer Engineering, Sejong University, Seoul 05006, Republic of Korea"},{"name":"Department of Convergence Engineering for Intelligent Drones, Sejong University, Seoul 05006, Republic of Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9094-4053","authenticated-orcid":false,"given":"Kookjin","family":"Kim","sequence":"additional","affiliation":[{"name":"Department of Computer Engineering, Sejong University, Seoul 05006, Republic of Korea"},{"name":"Department of Convergence Engineering for Intelligent Drones, Sejong University, Seoul 05006, Republic of Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8621-715X","authenticated-orcid":false,"given":"Dongil","family":"Shin","sequence":"additional","affiliation":[{"name":"Department of Computer Engineering, Sejong University, Seoul 05006, Republic of Korea"},{"name":"Department of Convergence Engineering for Intelligent Drones, Sejong University, Seoul 05006, Republic of 
Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2665-3339","authenticated-orcid":false,"given":"Dongkyoo","family":"Shin","sequence":"additional","affiliation":[{"name":"Department of Computer Engineering, Sejong University, Seoul 05006, Republic of Korea"},{"name":"Department of Convergence Engineering for Intelligent Drones, Sejong University, Seoul 05006, Republic of Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2023,9,13]]},"reference":[{"key":"ref_1","unstructured":"(2023, April 26). Check Point: Third Quarter of 2022 Reveals Increase in Cyberattacks and Unexpected Developments in Global Trends. Available online: https:\/\/blog.checkpoint.com\/2022\/10\/26\/third-quarter-of-2022-reveals-increase-in-cyberattacks\/."},{"key":"ref_2","unstructured":"Scott, K.D. (2018). Joint Publication (JP) 3\u201312 Cyberspace Operation, The Joint Staff."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"10761","DOI":"10.3390\/app122110761","article-title":"Malicious file detection method using machine learning and interworking with MITRE ATT&CK framework","volume":"21","author":"Ahn","year":"2022","journal-title":"Appl. Sci."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Rekhter, Y., Li, T., and Hares, S. (2006). A Border Gateway Protocol 4 (BGP-4), Internet Engineering Task Force. No. rfc4271.","DOI":"10.17487\/rfc4271"},{"key":"ref_5","unstructured":"Lad, M., Massey, D., Pei, D., Wu, Y., Zhang, B., and Zhang, L. (August, January 31). PHAS: A Prefix Hijack Alert System. Proceedings of the 15th USENIX Security Symposium, Vancouver, BC, Canada."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Comarela, G., and Crovella, M. (2014, January 5\u20137). Identifying and analyzing high impact routing events with PathMiner. 
Proceedings of the 2014 Conference on Internet Measurement Conference, Vancouver, BC, Canada.","DOI":"10.1145\/2663716.2663754"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"McGlynn, K., Acharya, H.B., and Kwon, M. (May, January 29). Detecting BGP route anomalies with deep learning. Proceedings of the IEEE INFOCOM 2019-IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), Paris, France.","DOI":"10.1109\/INFCOMW.2019.8845138"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Chen, Z., Yeo, C.K., Lee, B.S., and Lau, C.T. (2018, January 17\u201320). Autoencoder-based network anomaly detection. Proceedings of the 2018 Wireless Telecommunications Symposium (WTS), Phoenix, AZ, USA.","DOI":"10.1109\/WTS.2018.8363930"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Copstein, R., and Zincir-Heywood, N. (2020, January 2\u20136). Temporal representations for detecting BGP blackjack attacks. Proceedings of the 2020 16th International Conference on Network and Service Management (CNSM), Izmir, Turkey.","DOI":"10.23919\/CNSM50824.2020.9269055"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"1561","DOI":"10.1016\/j.procs.2020.03.367","article-title":"Analysis of KDD-Cup\u201999, NSL-KDD and UNSW-NB15 datasets using deep learning in IoT","volume":"167","author":"Choudhary","year":"2020","journal-title":"Procedia Comput. Sci."},{"key":"ref_11","unstructured":"Zhang, J., Zheng, Y., Qi, D., Li, R., and Yi, X. (November, January 31). DNN-based prediction model for spatio-temporal data. Proceedings of the ACM Sigspatial International Conference on Advances in Geographic Information Systems, San Francisco, CA, USA."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Moustafa, N., and Slay, J. (2015, January 10\u201312). UNSW-NB15: A comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). 
Proceedings of the 2015 Military Communications and Information Systems Conference (MilCIS), Canberra, Australia.","DOI":"10.1109\/MilCIS.2015.7348942"},{"key":"ref_13","first-page":"446","article-title":"A study on NSL-KDD dataset for intrusion detection system based on classification algorithms","volume":"4","author":"Dhanabal","year":"2015","journal-title":"Int. J. Adv. Res. Comput. Commun. Eng."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Tavallaee, M., Bagheri, E., Lu, W., and Ghorbani, A.A. (2009, January 8\u201310). A detailed analysis of the KDD CUP 99 data set. Proceedings of the IEEE Symposium on Computational Intelligence for Security and Defense Applications (CISDA), Ottawa, ON, Canada.","DOI":"10.1109\/CISDA.2009.5356528"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"6032","DOI":"10.1109\/TVT.2022.3165526","article-title":"Event-Based Anomaly Detection Using a One-Class SVM for a Hybrid Electric Vehicle","volume":"71","author":"Ji","year":"2022","journal-title":"IEEE Trans. Vehic. Technol."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"121","DOI":"10.1016\/j.patcog.2016.03.028","article-title":"High-dimensional and large-scale anomaly detection using a linear one-class SVM with deep learning","volume":"58","author":"Sarah","year":"2016","journal-title":"Pattern Recognit."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"99837","DOI":"10.1109\/ACCESS.2022.3206425","article-title":"CNN-LSTM: Hybrid Deep Neural Network for Network Intrusion Detection System","volume":"10","author":"Halbouni","year":"2022","journal-title":"IEEE Access"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Yulianto, A., Sukarno, P., and Suwastika, N.A. (2018, January 15\u201316). Improving Adaboost-Based Intrusion Detection System (IDS) Performance on CIC IDS 2017 Dataset. 
Proceedings of the 2nd International Conference on Data and Information Science, Bandung, Indonesia.","DOI":"10.1088\/1742-6596\/1192\/1\/012018"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"4731953","DOI":"10.1155\/2016\/4731953","article-title":"WSN-DS: A Dataset for Intrusion Detection Systems in Wireless Sensor Networks","volume":"2016","author":"Almomani","year":"2016","journal-title":"J. Sens."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"66","DOI":"10.1016\/j.eswa.2018.04.004","article-title":"Web Traffic Anomaly Detection Using C-LSTM Neural Networks","volume":"106","author":"Kim","year":"2018","journal-title":"Expert Syst. Appl."},{"key":"ref_21","unstructured":"Wright, R.E. (1995). Reading and Understanding Multivariate Statistics, American Psychological Association."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"174","DOI":"10.1016\/j.proeng.2012.01.849","article-title":"Network anomaly detection by cascading k-Means clustering and C4. 5 decision tree algorithms","volume":"30","author":"Muniyandi","year":"2012","journal-title":"Procedia Eng."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Anton, S.D.D., Sinha, S., and Schotten, H.D. (2019, January 19\u201321). Anomaly-based intrusion detection in industrial data with SVM and random forests. Proceedings of the 2019 International Conference on Software, Telecommunications and Computer Networks (SoftCOM), Split, Croatia.","DOI":"10.23919\/SOFTCOM.2019.8903672"},{"key":"ref_24","unstructured":"Morris, T.H., Thornton, Z., and Turnipseed, I. (2015, January 3\u20134). Industrial control system simulation and data logging for intrusion detection system research. Proceedings of the 7th Annual Southeastern Cyber Security Summit, Huntsville, AL, USA."},{"key":"ref_25","unstructured":"Anton, S.D., Gundall, M., Fraunholz, D., and Schotten, H.D. (March, January 28). 
Implementing scada scenarios and introducing attacks to obtain training data for intrusion detection methods. Proceedings of the ICCWS 2019 14th International Conference on Cyber Warfare and Security: ICCWS 2019, Stellenbosch, South Africa."},{"key":"ref_26","unstructured":"Zhang, X., Gu, C., and Lin, J. (2006, January 21\u201323). Support vector machines for anomaly detection. Proceedings of the 2006 6th World Congress on Intelligent Control and Automation, Dalian, China."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach. Learn."},{"key":"ref_28","unstructured":"Yassin, W., Udzir, N.I., Muda, Z., and Sulaiman, M.N. (2013, January 28\u201330). Anomaly-based intrusion detection through k-means clustering and naives Bayes classification. Proceedings of the 4th International Conference on Computing and Informatics, ICOCI, Kuching, Malaysia."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"649","DOI":"10.1109\/TSMCC.2008.923876","article-title":"Random-forests-based network intrusion detection systems","volume":"38","author":"Zhang","year":"2008","journal-title":"IEEE Trans. Syst. Man Cybern."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"10880","DOI":"10.1109\/TVT.2021.3106940","article-title":"Anomaly Detection for In-Vehicle Network Using CNN-LSTM with Attention Mechanism","volume":"70","author":"Sun","year":"2021","journal-title":"IEEE Trans. Veh. Technol."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Liu, Y., Kumar, N., Xiong, Z., Lim, W.Y.B., Kang, J., and Niyato, D. (2020, January 7\u201310). Communication-Efficient Federated Learning for Anomaly Detection in Industrial Internet of Things. 
Proceedings of the 2020 IEEE Global Communications Conference, Taipei City, Taiwan.","DOI":"10.1109\/GLOBECOM42002.2020.9348249"},{"key":"ref_32","unstructured":"Li, K.L., Huang, H.K., Tian, S.F., and Xu, W. (2003, January 5). Improving one-class SVM for anomaly detection. Proceedings of the 2003 International Conference on Machine Learning and Cybernetics (IEEE Cat. No. 03EX693), Xi\u2019an, China."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Perdisci, R., Gu, G., and Lee, W. (2006, January 18\u201322). Using an Ensemble of One-Class SVM Classifiers to Harden Payload-based Anomaly Detection Systems. Proceedings of the Sixth International Conference on Data Mining (ICDM\u201906), Hong Kong, China.","DOI":"10.1109\/ICDM.2006.165"},{"key":"ref_34","unstructured":"Tschannen, M., Bachem, O., and Lucic, M. (2018, January 7). Recent advances in autoencoder-based representation learning. Proceedings of the Third Workshop on Bayesian Deep Learning (NeurIPS 2018), Montr\u00e9al, QC, Canada."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"539","DOI":"10.1109\/TSMCB.2008.2007853","article-title":"Exploratory undersampling for class-imbalance learning","volume":"39","author":"Liu","year":"2009","journal-title":"IEEE Trans. Syst. Man Cybern."},{"key":"ref_36","unstructured":"Good, P.I. (2006). Resampling Methods, Springer."},{"key":"ref_37","unstructured":"He, H., Bai, Y., Garcia, E.A., and Li, S. (2008, January 1\u20138). ADASYN: Adaptive synthetic sampling approach for imbalanced learning. Proceedings of the 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), Hong Kong, China."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"863","DOI":"10.1613\/jair.1.11192","article-title":"SMOTE for learning from imbalanced data: Progress and challenges, marking the 15-year anniversary","volume":"61","author":"Garcia","year":"2018","journal-title":"J. Artif. Intell. 
Res."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"321","DOI":"10.1613\/jair.953","article-title":"SMOTE: Synthetic minority over-sampling technique","volume":"16","author":"Chawla","year":"2002","journal-title":"J. Artif. Intell. Res."}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/14\/9\/501\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T20:49:55Z","timestamp":1760129395000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/14\/9\/501"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,9,13]]},"references-count":39,"journal-issue":{"issue":"9","published-online":{"date-parts":[[2023,9]]}},"alternative-id":["info14090501"],"URL":"https:\/\/doi.org\/10.3390\/info14090501","relation":{},"ISSN":["2078-2489"],"issn-type":[{"value":"2078-2489","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,9,13]]}}}