{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,7]],"date-time":"2026-02-07T11:57:31Z","timestamp":1770465451073,"version":"3.49.0"},"reference-count":34,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2019,2,2]],"date-time":"2019-02-02T00:00:00Z","timestamp":1549065600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61702235"],"award-info":[{"award-number":["61702235"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61602247"],"award-info":[{"award-number":["61602247"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61472188"],"award-info":[{"award-number":["61472188"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["U1636117"],"award-info":[{"award-number":["U1636117"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100004608","name":"Natural Science Foundation of Jiangsu Province","doi-asserted-by":"publisher","award":["BK20160840"],"award-info":[{"award-number":["BK20160840"]}],"id":[{"id":"10.13039\/501100004608","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100004608","name":"Natural Science Foundation of Jiangsu 
Province","doi-asserted-by":"publisher","award":["BK20150472"],"award-info":[{"award-number":["BK20150472"]}],"id":[{"id":"10.13039\/501100004608","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100012226","name":"Fundamental Research Funds for the Central Universities","doi-asserted-by":"publisher","award":["30920140121006"],"award-info":[{"award-number":["30920140121006"]}],"id":[{"id":"10.13039\/501100012226","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100012226","name":"Fundamental Research Funds for the Central Universities","doi-asserted-by":"publisher","award":["30915012208"],"award-info":[{"award-number":["30915012208"]}],"id":[{"id":"10.13039\/501100012226","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Symmetry"],"abstract":"<jats:p>In highly sophisticated network attacks, command-and-control (C&amp;C) servers always use domain generation algorithms (DGAs) to dynamically produce several candidate domains instead of static hard-coded lists of IP addresses or domain names. Distinguishing the domains generated by DGAs from the legitimate ones is critical for detecting the presence of malware or further locating the hidden attackers. The word-based DGAs disclosed in recent network attack events have shown significantly stronger stealthiness when compared with traditional character-based DGAs. In word-based DGAs, two or more words are randomly chosen from one or more specific dictionaries to form a dynamic domain; these regularly generated domains aim to mimic the characteristics of a legitimate domain. Existing DGA detection schemes, including the state-of-the-art one based on deep learning, still cannot identify these domains accurately while maintaining an acceptable false alarm rate. 
In this study, we exploit the inter-word and inter-domain correlations using semantic analysis approaches, taking word embedding and part-of-speech into consideration. Next, we propose a detection framework for word-based DGAs by incorporating the frequency distribution of the words and that of part-of-speech into the design of the feature set. Using an ensemble classifier constructed from Naive Bayes, Extra-Trees, and Logistic Regression, we benchmark the proposed scheme with malicious and legitimate domain samples extracted from public datasets. The experimental results show that the proposed scheme can achieve significantly higher detection accuracy for word-based DGAs when compared with three state-of-the-art DGA detection schemes.<\/jats:p>","DOI":"10.3390\/sym11020176","type":"journal-article","created":{"date-parts":[[2019,2,5]],"date-time":"2019-02-05T11:31:07Z","timestamp":1549366267000},"page":"176","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":21,"title":["Detecting Word-Based Algorithmically Generated Domains Using Semantic Analysis"],"prefix":"10.3390","volume":"11","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7065-6801","authenticated-orcid":false,"given":"Luhui","family":"Yang","sequence":"first","affiliation":[{"name":"School of Automation, Nanjing University of Science and Technology, Nanjing 210094, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8557-9899","authenticated-orcid":false,"given":"Jiangtao","family":"Zhai","sequence":"additional","affiliation":[{"name":"School of Computer and Software, Nanjing University of Information Science and Technology, Nanjing 210094, China"}]},{"given":"Weiwei","family":"Liu","sequence":"additional","affiliation":[{"name":"School of Automation, Nanjing University of Science and Technology, Nanjing 210094, 
China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6094-4626","authenticated-orcid":false,"given":"Xiaopeng","family":"Ji","sequence":"additional","affiliation":[{"name":"School of Automation, Nanjing University of Science and Technology, Nanjing 210094, China"}]},{"given":"Huiwen","family":"Bai","sequence":"additional","affiliation":[{"name":"School of Automation, Nanjing University of Science and Technology, Nanjing 210094, China"}]},{"given":"Guangjie","family":"Liu","sequence":"additional","affiliation":[{"name":"School of Automation, Nanjing University of Science and Technology, Nanjing 210094, China"}]},{"given":"Yuewei","family":"Dai","sequence":"additional","affiliation":[{"name":"School of Computer and Software, Nanjing University of Information Science and Technology, Nanjing 210094, China"}]}],"member":"1968","published-online":{"date-parts":[[2019,2,2]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"1663","DOI":"10.1109\/TNET.2012.2184552","article-title":"Detecting algorithmically generated domain-flux attacks with DNS traffic analysis","volume":"20","author":"Yadav","year":"2012","journal-title":"IEEE Acm Trans. Netw."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"46","DOI":"10.1109\/MSP.2016.76","article-title":"A taxonomy of domain-generation algorithms","volume":"14","author":"Sood","year":"2016","journal-title":"IEEE Secur. Priv."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"1430","DOI":"10.1109\/TIFS.2017.2668361","article-title":"Stealthy domain generation algorithms","volume":"12","author":"Fu","year":"2017","journal-title":"IEEE Trans. Inf. Forensics Secur."},{"key":"ref_4","unstructured":"Geffner, J. (August, January 27). End-to-end analysis of a domain generating algorithm malware family. Proceedings of the Black Hat USA, Las Vegas, NV, USA."},{"key":"ref_5","unstructured":"Stanislav, S. (2016, May 05). Matsnu Malware ID. Check Point Blog Post. 
Available online: https:\/\/blog.checkpoint.com\/wp-content\/uploads\/2015\/07\/Matsnu-malwareid-technical-brief.pdf."},{"key":"ref_6","unstructured":"Woodbridge, J., Anderson, H.S., Ahuja, A., and Grant, D. (arXiv, 2016). Predicting domain generation algorithms with long short-term memory networks, arXiv."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Yadav, S., Reddy, A.K.K., Reddy, A.L., and Ranjan, S. (2010, January 1\u20133). Detecting algorithmically generated malicious domain names. Proceedings of the 10th ACM SIGCOMM Conference on Internet Measurement, Melbourne, Australia.","DOI":"10.1145\/1879141.1879148"},{"key":"ref_8","unstructured":"Schiavoni, S., Maggi, F., Cavallaro, L., and Zanero, S. (arXiv, 2016). Tracking and characterizing botnets using automatically generated domains, arXiv."},{"key":"ref_9","first-page":"192","article-title":"Phoenix DGA-based botnet tracking and intelligence","volume":"8850","author":"Schiavoni","year":"2014","journal-title":"Detect. Intrusions Malware Vulnerability Assess."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/2584679","article-title":"Exposure: A passive dns analysis service to detect and report malicious domains","volume":"16","author":"Bilge","year":"2014","journal-title":"ACM Trans. Inf. Syst. Secur."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Yu, B., Gray, D.L., Pan, J., De Cock, M., and Nascimento, A.C. (2016, January 18\u201321). Inline dga detection with deep networks. Proceedings of the IEEE International Conference on Data Mining Work, New Orleans, LA, USA.","DOI":"10.1109\/ICDMW.2017.96"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Mowbray, M., and Hagen, J. (2014, January 3\u20136). Finding domain-generation algorithms by looking at length distribution. 
Proceedings of the IEEE International Symposium on Software Reliability Engineering Workshops, Naples, Italy.","DOI":"10.1109\/ISSREW.2014.20"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"423","DOI":"10.1016\/j.jare.2014.01.001","article-title":"Unsupervised, low latency anomaly detection of algorithmically generated domain names by generative probabilistic modeling","volume":"5","author":"Raghuram","year":"2014","journal-title":"J. Adv. Res."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Grill, M., Nikolaev, I., Valeros, V., and Rehak, M. (2015, January 11\u201315). Detecting DGA malware using NetFlow. Proceedings of the IFIP\/IEEE International Symposium on Integrated Network Management (IM), Ottawa, ON, Canada.","DOI":"10.1109\/INM.2015.7140486"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Nguyen, T.D., Cao, T.D., and Nguyen, L.G. (2015, January 3\u20134). DGA botnet detection using collaborative filtering and density-based clustering. Proceedings of the International Symposium on Information and Communication Technology, Hue City, Vietnam.","DOI":"10.1145\/2833258.2833310"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Wang, T., Hu, X., Jang, J., Ji, S., Stoecklin, M., and Taylor, T. (2016, January 27\u201330). BotMeter: Charting DGA-botnet landscapes in large networks. Proceedings of the IEEE 36th International Conference on Distributed Computing Systems (ICDCS), Nara, Japan.","DOI":"10.1109\/ICDCS.2016.77"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"1294","DOI":"10.1109\/TFUZZ.2006.889970","article-title":"Semantic web content analysis: A study in proximity-based collaborative clustering","volume":"15","author":"Loia","year":"2007","journal-title":"IEEE Trans. 
Fuzzy Syst."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"379","DOI":"10.1109\/TC.2011.223","article-title":"An approach to source-code plagiarism detection and investigation using latent semantic analysis","volume":"61","author":"Cosma","year":"2012","journal-title":"IEEE Trans. Comput."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"118","DOI":"10.1016\/j.knosys.2017.01.006","article-title":"Automated essay evaluation with semantic analysis","volume":"120","author":"Zupanc","year":"2017","journal-title":"Knowl.-Based Syst."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Harispe, S., Ranwez, S., Janaqi, S., and Montmain, J. (2015). Semantic similarity from natural language and ontology analysis. Synth. Lect. Hum. Lang. Technol., 8.","DOI":"10.1007\/978-3-031-02156-5"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"977","DOI":"10.1109\/TKDE.2010.172","article-title":"A web search engine-based approach to measure semantic similarity between words","volume":"23","author":"Bollegala","year":"2011","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"1098","DOI":"10.1109\/TFUZZ.2010.2065811","article-title":"Cross-lingual document representation and semantic similarity measure: A fuzzy set and rough set based approach","volume":"18","author":"Huang","year":"2010","journal-title":"IEEE Trans. Fuzzy Syst."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"613","DOI":"10.1145\/361219.361220","article-title":"A vector space model for automatic indexing","volume":"18","author":"Salton","year":"1975","journal-title":"Commun. ACM"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"391","DOI":"10.1002\/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9","article-title":"Indexing by latent semantic analysis","volume":"41","author":"Deerwester","year":"1990","journal-title":"J. Am. Soc. Inf. 
Sci."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"203","DOI":"10.3758\/BF03204766","article-title":"Producing high-dimensional semantic spaces from lexical co-occurrence","volume":"28","author":"Lund","year":"1996","journal-title":"Behav. Res. Methods Instrum. Comput."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"43","DOI":"10.1023\/A:1007537716579","article-title":"Similarity-based models of word cooccurrence probabilities","volume":"34","author":"Dagan","year":"1999","journal-title":"Mach. Learn."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"510","DOI":"10.3758\/BF03193020","article-title":"Extracting semantic representations from word co-occurrence statistics: A computational study","volume":"39","author":"Bullinaria","year":"2007","journal-title":"Behav. Res. Methods"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1561\/2200000006","article-title":"Learning deep architectures for AI","volume":"2","author":"Bengio","year":"2009","journal-title":"Found. Trends Mach. Learn."},{"key":"ref_29","unstructured":"Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013, January 2\u20134). Efficient estimation of word representations in vector space. Proceedings of the Workshop at International Conference on Learning Representations, Scottsdale, AZ, USA."},{"key":"ref_30","unstructured":"Mikolov, T., Sutskever, I., Chen, K., Corrado, G., and Dean, J. (2013, January 5\u201310). Distributed representations of words and phrases and their compositionality. Proceedings of the Neural Information Processing Systems Conference (NIPS), Lake Tahoe, NV, USA."},{"key":"ref_31","unstructured":"Mikolov, T., Yih, W., and Zweig, G. (2013, January 9\u201314). Linguistic regularities in continuous space word representations. 
Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT), Atlanta, GA, USA."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Pennington, J., Socher, R., and Manning, C.D. (2014, January 25\u201329). Glove: Global vectors for word representation. Proceedings of the Empirical Methods in Natural Language Processing (EMNLP 2014), Doha, Qatar.","DOI":"10.3115\/v1\/D14-1162"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Bird, S., and Loper, E. (2004, January 21\u201326). NLTK: The natural language toolkit. Proceedings of the ACL 2004 on Interactive Poster and Demonstration Sessions, Barcelona, Spain.","DOI":"10.3115\/1219044.1219075"},{"key":"ref_34","first-page":"3","article-title":"On methods of Chinese automatic segmentation","volume":"3","author":"Chunyu","year":"1989","journal-title":"J. Chin. Inf. Process."}],"container-title":["Symmetry"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2073-8994\/11\/2\/176\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T12:30:44Z","timestamp":1760185844000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2073-8994\/11\/2\/176"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,2,2]]},"references-count":34,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2019,2]]}},"alternative-id":["sym11020176"],"URL":"https:\/\/doi.org\/10.3390\/sym11020176","relation":{},"ISSN":["2073-8994"],"issn-type":[{"value":"2073-8994","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,2,2]]}}}