{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,12]],"date-time":"2026-05-12T03:23:10Z","timestamp":1778556190027,"version":"3.51.4"},"reference-count":50,"publisher":"MDPI AG","issue":"9","license":[{"start":{"date-parts":[[2020,4,26]],"date-time":"2020-04-26T00:00:00Z","timestamp":1587859200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61802030"],"award-info":[{"award-number":["61802030"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"the Research Foundation of Education Bureau of Hunan Province, China","award":["19B005"],"award-info":[{"award-number":["19B005"]}]},{"name":"the International Cooperative Project for \"Double First-Class\", CSUST","award":["2018IC24"],"award-info":[{"award-number":["2018IC24"]}]},{"name":"the open research fund of Key Lab of Broadband Wireless Communication and Sensor Network Technology (Nanjing University of Posts and Telecommunications), Ministry of Education","award":["JZNY201905"],"award-info":[{"award-number":["JZNY201905"]}]},{"name":"the Open Research Fund of the Hunan Provincial Key Laboratory of Network Investigational Technology","award":["2018WLZC003"],"award-info":[{"award-number":["2018WLZC003"]}]},{"name":"the Researchers Supporting Project King Saud University","award":["RSP-2019\/102"],"award-info":[{"award-number":["RSP-2019\/102"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Log anomaly detection is an efficient method for managing modern large-scale Internet of Things (IoT) systems. A growing number of works apply natural language processing (NLP) methods, in particular word2vec, to log feature extraction. 
Word2vec can extract the relevance between words and vectorize the words. However, the computing cost of training word2vec is high. Anomalies in logs depend not only on an individual log message but also on the log message sequence. Therefore, the word vectors from word2vec cannot be used directly; they must be transformed into log event vectors and then into log sequence vectors. To reduce computational cost and avoid multiple transformations, we propose an offline feature extraction model, named LogEvent2vec, which takes the log event as the input of word2vec to extract the relevance between log events and vectorize log events directly. LogEvent2vec can work with any coordinate transformation method and anomaly detection model. After obtaining the log event vectors, we transform them into log sequence vectors by bary or tf-idf, and three kinds of supervised models (Random Forests, Naive Bayes, and Neural Networks) are trained to detect anomalies. We have conducted extensive experiments on a real public log dataset from BlueGene\/L (BGL). The experimental results demonstrate that LogEvent2vec can reduce computational time by a factor of 30 and improve accuracy compared with word2vec. 
LogEvent2vec with bary and Random Forest can achieve the best F1-score and LogEvent2vec with tf-idf and Naive Bayes needs the least computational time.<\/jats:p>","DOI":"10.3390\/s20092451","type":"journal-article","created":{"date-parts":[[2020,4,28]],"date-time":"2020-04-28T10:30:58Z","timestamp":1588069858000},"page":"2451","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":90,"title":["LogEvent2vec: LogEvent-to-Vector Based Anomaly Detection for Large-Scale Logs in Internet of Things"],"prefix":"10.3390","volume":"20","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-5473-8738","authenticated-orcid":false,"given":"Jin","family":"Wang","sequence":"first","affiliation":[{"name":"School of Computer and Communication Engineering, Hunan Provincial Key Laboratory of Intelligent Processing of Big Data on Transportation, Changsha University of Science and Technology, Changsha 410114, China"},{"name":"Key Lab of Broadband Wireless Communication and Sensor Network Technology (Nanjing University of Posts and Telecommunications), Ministry of Education, Nanjing 210003, China"}]},{"given":"Yangning","family":"Tang","sequence":"additional","affiliation":[{"name":"School of Computer and Communication Engineering, Hunan Provincial Key Laboratory of Intelligent Processing of Big Data on Transportation, Changsha University of Science and Technology, Changsha 410114, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2572-8041","authenticated-orcid":false,"given":"Shiming","family":"He","sequence":"additional","affiliation":[{"name":"School of Computer and Communication Engineering, Hunan Provincial Key Laboratory of Intelligent Processing of Big Data on Transportation, Changsha University of Science and Technology, Changsha 410114, China"},{"name":"Key Lab of Broadband Wireless Communication and Sensor Network Technology (Nanjing University of Posts and Telecommunications), Ministry of Education, Nanjing 210003, 
China"}]},{"given":"Changqing","family":"Zhao","sequence":"additional","affiliation":[{"name":"School of Computer and Communication Engineering, Hunan Provincial Key Laboratory of Intelligent Processing of Big Data on Transportation, Changsha University of Science and Technology, Changsha 410114, China"}]},{"given":"Pradip Kumar","family":"Sharma","sequence":"additional","affiliation":[{"name":"Department of Computing Science, University of Aberdeen, Aberdeen AB243FX, UK"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6111-8617","authenticated-orcid":false,"given":"Osama","family":"Alfarraj","sequence":"additional","affiliation":[{"name":"Computer Science Department, Community College, King Saud University, Riyadh 11437, Saudi Arabia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3439-6413","authenticated-orcid":false,"given":"Amr","family":"Tolba","sequence":"additional","affiliation":[{"name":"Computer Science Department, Community College, King Saud University, Riyadh 11437, Saudi Arabia"},{"name":"Mathematics and Computer Science Department, Faculty of Science, Menoufia University, Menoufia 32511, Egypt"}]}],"member":"1968","published-online":{"date-parts":[[2020,4,26]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Li, W., Xu, H., Li, H., Yang, Y., Sharma, P.K., and Wang, J. (2019). Complexity and Algorithms for Superposed Data Uploading Problem in Networks with Smart Devices. 
IEEE Internet Things J.","DOI":"10.1109\/JIOT.2019.2949352"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"4844","DOI":"10.1109\/JIOT.2018.2872133","article-title":"Multi-Model Framework for Indoor Localization under Mobile Edge Computing Environment","volume":"6","author":"Li","year":"2019","journal-title":"IEEE Internet Things J."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"17996","DOI":"10.1109\/ACCESS.2018.2820093","article-title":"Energy-aware Routing for SWIPT in Multi-hop Energy-constrained Wireless Network","volume":"6","author":"He","year":"2018","journal-title":"IEEE Access"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"He, S., Tang, Y., Li, Z., Li, F., Xie, K., Kim, H.J., and Kim, G.J. (2019). Interference-Aware Routing for Difficult Wireless Sensor Network Environment with SWIPT. Sensors, 19.","DOI":"10.3390\/s19183978"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Wang, J., Gao, Y., Wang, K., Sangaiah, A.K., and Lim, S.J. (2019). An affinity propagation-based self-adaptive clustering method for wireless sensor networks. Sensors, 19.","DOI":"10.3390\/s19112579"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"18","DOI":"10.1186\/s13673-019-0179-4","article-title":"An empower hamilton loop based data collection algorithm with mobile agent for WSNs","volume":"9","author":"Wang","year":"2019","journal-title":"Human-Centric Comput. Inf. Sci."},{"key":"ref_7","first-page":"695","article-title":"Optimal coverage multi-path scheduling scheme with multiple mobile sinks for WSNs","volume":"62","author":"Wang","year":"2020","journal-title":"Comput. Mater. Cont."},{"key":"ref_8","first-page":"81","article-title":"Smart Security Framework for Educational Institutions Using Internet of Things (IoT)","volume":"61","author":"Badshah","year":"2019","journal-title":"Comput. Mater. 
Cont."},{"key":"ref_9","first-page":"635","article-title":"A novel ensemble learning algorithm based on DS evidence theory for IoT security","volume":"57","author":"Shi","year":"2018","journal-title":"Comput. Mater. Cont."},{"key":"ref_10","first-page":"1","article-title":"A DPN (Delegated Proof of Node) Mechanism for Secure Data Transmission in IoT Services","volume":"60","author":"Kim","year":"2019","journal-title":"CMC Comput. Mater. Cont."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Park, J.S., Youn, T.Y., Kim, H.B., Rhee, K.H., and Shin, S.U. (2018). Smart Contract-Based Review System for an IoT Data Marketplace. Sensors, 18.","DOI":"10.3390\/s18103577"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"He, S., Xie, K., Zhou, X., Semong, T., and Wang, J. (2019). Multi-Source Reliable Multicast Routing with QoS Constraints of NFV in Edge Computing. Electronics, 8.","DOI":"10.3390\/electronics8101106"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"13","DOI":"10.1016\/j.inffus.2018.11.010","article-title":"Short-long term anomaly detection in wireless sensor networks based on machine learning and multi-parameterized edit distance","volume":"52","author":"Cauteruccio","year":"2019","journal-title":"Inf. Fusion"},{"key":"ref_14","first-page":"15","article-title":"Using imbalanced triangle synthetic data for machine learning anomaly detection","volume":"58","author":"Luo","year":"2019","journal-title":"Comput. Mater. Cont."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Zhang, J., Wang, W., Lu, C., Wang, J., and Sangaiah, A.K. (2020). Lightweight deep network for traffic sign classification. Ann. 
Telecommun.","DOI":"10.1007\/s12243-019-00731-9"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"29742","DOI":"10.1109\/ACCESS.2020.2972338","article-title":"A cascaded R-CNN with multiscale attention and imbalanced samples for traffic sign detection","volume":"8","author":"Zhang","year":"2020","journal-title":"IEEE Access"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"4855","DOI":"10.1007\/s12652-018-01171-4","article-title":"The visual object tracking algorithm research based on adaptive combination kernel","volume":"10","author":"Chen","year":"2019","journal-title":"J. Ambient Intell. Humanized Comput."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"393","DOI":"10.1016\/j.jvcir.2019.01.029","article-title":"Multi-camera transfer GAN for person re-identification","volume":"59","author":"Zhou","year":"2019","journal-title":"J. Vis. Commun. Image Represent."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"3794","DOI":"10.1109\/TNET.2017.2761704","article-title":"Fast tensor factorization for accurate internet anomaly detection","volume":"25","author":"Xie","year":"2017","journal-title":"IEEE\/ACM Trans. Netw. (TON)"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"1222","DOI":"10.1109\/TNET.2018.2819507","article-title":"On-line anomaly detection with high accuracy","volume":"26","author":"Xie","year":"2018","journal-title":"IEEE\/ACM Trans. Netw."},{"key":"ref_21","first-page":"829","article-title":"Long Short Term Memory Networks Based Anomaly Detection for KPIs","volume":"61","author":"Zhu","year":"2019","journal-title":"Comput. Mater. Cont."},{"key":"ref_22","first-page":"1171","article-title":"YATA: Yet Another Proposal for Traffic Analysis and Anomaly Detection","volume":"60","author":"Wang","year":"2019","journal-title":"Comput. Mater. 
Cont."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"131","DOI":"10.32604\/csse.2019.34.131","article-title":"Non-deterministic outlier detection method based on the variable precision rough set model","volume":"34","author":"Oliva","year":"2019","journal-title":"Comput. Syst. Sci. Eng."},{"key":"ref_24","first-page":"317","article-title":"Network Embedding-Based Anomalous Density Searching for Multi-Group Collaborative Fraudsters Detection in Social Media","volume":"60","author":"Zhu","year":"2019","journal-title":"Comput. Mater. Cont."},{"key":"ref_25","unstructured":"Zhang, S., Meng, W., Bu, J., Yang, S., Liu, Y., Pei, D., Xu, J., Chen, Y., Dong, H., and Qu, X. (2017, January 14\u201316). Syslog processing for switch failure diagnosis and prediction in datacenter networks. Proceedings of the 2017 IEEE\/ACM 25th International Symposium on Quality of Service (IWQoS), Vilanova i la Geltru, Spain."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"931","DOI":"10.1109\/TDSC.2017.2762673","article-title":"Towards automated log parsing for large-scale log data analysis","volume":"15","author":"He","year":"2017","journal-title":"IEEE Trans. Depend. Secure Comput."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"He, P., Zhu, J., Zheng, Z., and Lyu, M.R. (2017, January 25\u201330). Drain: An Online Log Parsing Approach with Fixed Depth Tree. Proceedings of the IEEE International Conference on Web Services, Honolulu, HI, USA.","DOI":"10.1109\/ICWS.2017.13"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"5","DOI":"10.32604\/csse.2018.33.005","article-title":"A dynamic independent component analysis approach to fault detection with new statistics","volume":"33","author":"Teimoortashloo","year":"2018","journal-title":"Comput. Syst. Sci. Eng."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Zhang, S., Liu, Y., Meng, W., Luo, Z., Bu, J., Yang, S., Liang, P., Pei, D., Xu, J., and Zhang, Y. 
(2018, January 18\u201322). Prefix: Switch failure prediction in datacenter networks. Proceedings of the ACM on Measurement and Analysis of Computing Systems, Irvine, CA, USA.","DOI":"10.1145\/3219617.3219643"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Khatuya, S., Ganguly, N., Basak, J., Bharde, M., and Mitra, B. (2018, January 16\u201319). ADELE: Anomaly Detection from Event Log Empiricism. Proceedings of the IEEE INFOCOM 2018-IEEE Conference on Computer Communications, Honolulu, HI, USA.","DOI":"10.1109\/INFOCOM.2018.8486257"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"He, S., Zhu, J., He, P., and Lyu, M.R. (2016, January 23\u201327). Experience report: system log analysis for anomaly detection. Proceedings of the 2016 IEEE 27th International Symposium on Software Reliability Engineering (ISSRE), Ottawa, ON, Canada.","DOI":"10.1109\/ISSRE.2016.21"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Meng, W., Liu, Y., Zhang, S., Pei, D., Dong, H., Song, L., and Luo, X. (2018, January 4\u20136). Device-agnostic log anomaly classification with partial labels. Proceedings of the 2018 IEEE\/ACM 26th International Symposium on Quality of Service (IWQoS), Banff, AB, Canada.","DOI":"10.1109\/IWQoS.2018.8624141"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Lin, Q., Zhang, H., Lou, J.G., Zhang, Y., and Chen, X. (2016, January 14\u201322). Log clustering based problem identification for online service systems. Proceedings of the 38th International Conference on Software Engineering Companion, Austin, TX, USA.","DOI":"10.1145\/2889160.2889232"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Bertero, C., Roy, M., Sauvanaud, C., and Tr\u00e9dan, G. (2017, January 23\u201326). Experience report: Log mining using natural language processing and application to anomaly detection. 
Proceedings of the 2017 IEEE 28th International Symposium on Software Reliability Engineering (ISSRE), Toulouse, France.","DOI":"10.1109\/ISSRE.2017.43"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Meng, W., Liu, Y., Zhu, Y., Zhang, S., Pei, D., Liu, Y., Chen, Y., Zhang, R., Tao, S., and Sun, P. (2019, January 10\u201316). Loganomaly: Unsupervised detection of sequential and quantitative anomalies in unstructured logs. Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19, Macao, China.","DOI":"10.24963\/ijcai.2019\/658"},{"key":"ref_36","unstructured":"Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Zhu, J., He, S., Liu, J., He, P., Xie, Q., Zheng, Z., and Lyu, M.R. (2019, January 25\u201331). Tools and benchmarks for automated log parsing. Proceedings of the 2019 IEEE\/ACM 41st International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP), Montreal, QC, Canada.","DOI":"10.1109\/ICSE-SEIP.2019.00021"},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"30602","DOI":"10.1109\/ACCESS.2018.2843336","article-title":"An integrated method for anomaly detection from massive system logs","volume":"6","author":"Liu","year":"2018","journal-title":"IEEE Access"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/1361684.1361686","article-title":"Interpreting TF-IDF term weights as making relevance decisions","volume":"26","author":"Wu","year":"2008","journal-title":"Acm Trans. Inf. Syst."},{"key":"ref_40","unstructured":"Soucy, P., and Mineau, G.W. (August, January 30). Beyond TFIDF Weighting for Text Categorization in the Vector Space Model. 
Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence, Edinburgh, Scotland, UK."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Nguyen, K.A., Schulte im Walde, S., and Vu, N.T. (2016, January 25). Integrating Distributional Lexical Contrast into Word Embeddings for Antonym-Synonym Distinction. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Berlin, Germany.","DOI":"10.18653\/v1\/P16-2074"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Debnath, B., Solaimani, M., Gulzar, M.A.G., Arora, N., Lumezanu, C., Xu, J., Zong, B., Zhang, H., Jiang, G., and Khan, L. (2018, January 2\u20136). LogLens: A Real-time Log Analysis System. Proceedings of the 2018 IEEE 38th International Conference on Distributed Computing Systems (ICDCS), Vienna, Austria.","DOI":"10.1109\/ICDCS.2018.00105"},{"key":"ref_43","first-page":"321","article-title":"Parameters Compressing in Deep Learning","volume":"62","author":"He","year":"2020","journal-title":"Comput. Mater. Cont."},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Du, M., Li, F., Zheng, G., and Srikumar, V. (November, January 30). Deeplog: Anomaly detection and diagnosis from system logs through deep learning. Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, Dallas, TX, USA.","DOI":"10.1145\/3133956.3134015"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Vinayakumar, R., Soman, K., and Poornachandran, P. (2017, January 13\u201316). Long short-term memory based operation log anomaly detection. Proceedings of the 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI), Udupi, India.","DOI":"10.1109\/ICACCI.2017.8125846"},{"key":"ref_46","unstructured":"Tuor, A.R., Baerwolf, R., Knowles, N., Hutchinson, B., Nichols, N., and Jasper, R. (2018, January 2\u20133). 
Recurrent neural network language models for open vocabulary event-level cyber anomaly detection. Proceedings of the Workshops at the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, LA, USA."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Hernandez-Suarez, A., and Sanchez-Perez, G. (2019). Using Twitter Data to Monitor Natural Disaster Social Dynamics: A Recurrent Neural Network Approach with Word Embeddings and Kernel Density Estimation. Sensors, 19.","DOI":"10.3390\/s19071746"},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Lu, H., Shi, K., and Zhu, Y. (2018). Sensing Urban Transportation Events from Multi-Channel Social Signals with the Word2vec Fusion Model. Sensors, 18.","DOI":"10.3390\/s18124093"},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Zhou, W., Wang, H., Sun, H., and Sun, T. (2019). A Method of Short Text Representation Based on the Feature Probability Embedded Vector. Sensors, 19.","DOI":"10.3390\/s19173728"},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Oliner, A., and Stearley, J. (2007, January 25\u201328). What supercomputers say: A study of five system logs. 
Proceedings of the 37th Annual IEEE\/IFIP International Conference on Dependable Systems and Networks (DSN\u201907), Edinburgh, UK.","DOI":"10.1109\/DSN.2007.103"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/20\/9\/2451\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,13]],"date-time":"2025-10-13T13:32:01Z","timestamp":1760362321000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/20\/9\/2451"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,4,26]]},"references-count":50,"journal-issue":{"issue":"9","published-online":{"date-parts":[[2020,5]]}},"alternative-id":["s20092451"],"URL":"https:\/\/doi.org\/10.3390\/s20092451","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,4,26]]}}}