{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,2]],"date-time":"2026-05-02T06:50:09Z","timestamp":1777704609480,"version":"3.51.4"},"reference-count":27,"publisher":"SAGE Publications","issue":"6","license":[{"start":{"date-parts":[[2018,7,18]],"date-time":"2018-07-18T00:00:00Z","timestamp":1531872000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["Journal of Intelligent & Fuzzy Systems"],"published-print":{"date-parts":[[2018,12,24]]},"abstract":"<jats:p>Most Intrusion Detection Systems (IDS) nowadays are signature-based. They are very useful and accurate for detecting known attacks. However, they are inefficient with unknown attacks. Moreover, most cyber attacks start with malicious URLs. Attackers try to trick users into clicking on malicious URLs. This gives attackers an easy way to launch attacks. To defend against this, companies and organizations use IDS\/IPS to detect malicious URLs using blacklists of URLs. This is very efficient with known malicious URLs, but useless with unknown malicious URLs. To overcome this problem, a number of malicious Web site detection systems have been proposed. One of the most promising methods is to apply machine learning detection techniques. In this paper, we present a new lexical approach to classify URLs by using machine learning techniques which patternize the URLs. Our approach is based on natural language processing features which use word vector representation and ngram models on the blacklist word as the main features. Using this technique can help a classifier distinguish benign URLs from malicious ones. Our experimentation shows that our approach can achieve a high degree of accuracy at 97.1% in the case of SVM. 
Moreover, it can maintain a high level of robustness with 0.97 precision and 0.93 recall scores.<\/jats:p>","DOI":"10.3233\/jifs-169831","type":"journal-article","created":{"date-parts":[[2018,7,20]],"date-time":"2018-07-20T12:27:45Z","timestamp":1532089665000},"page":"5889-5900","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":13,"title":["Detection of malicious URLs based on word vector representation and ngram"],"prefix":"10.1177","volume":"35","author":[{"given":"Quan Tran","family":"Hai","sequence":"first","affiliation":[{"name":"Department of Electronics and Computer Engineering, Hongik University, Sejong, Korea"}]},{"given":"Seong Oun","family":"Hwang","sequence":"additional","affiliation":[{"name":"Department of Software and Communications Engineering, Hongik University, Sejong, Korea"}]}],"member":"179","published-online":{"date-parts":[[2018,7,18]]},"reference":[{"key":"e_1_3_2_2_2","unstructured":"Korea internet security agency. https:\/\/www.kisa.or.kr."},{"key":"e_1_3_2_3_2","doi-asserted-by":"crossref","unstructured":"Sommer R. and Paxson V. Outside the closed world: On using machine learning for network intrusion detection in Security and Privacy (SP) 2010 IEEE Symposium on IEEE 2010 pp. 305\u2013316.","DOI":"10.1109\/SP.2010.25"},{"key":"e_1_3_2_4_2","unstructured":"Netscape. dmoz open directory project. http:\/\/www.dmoz.org."},{"key":"e_1_3_2_5_2","first-page":"4","article-title":"Behind phishing: An examination of phisher modi operandi","volume":"8","author":"McGrath D.K.","year":"2008","unstructured":"McGrath D.K. and Gupta M., Behind phishing: An examination of phisher modi operandi, LEET 8 (2008), 4.","journal-title":"LEET"},{"key":"e_1_3_2_6_2","unstructured":"Kolari P. Finin T. and Joshi A. Svms for the blogosphere: Blog identification and splog detection in AAAI Spring Symposium: Computational Approaches to Analyzing Weblogs 2006 pp. 
92\u201399."},{"key":"e_1_3_2_7_2","doi-asserted-by":"crossref","unstructured":"Ma J. Saul L.K. Savage S. and Voelker G.M. Identifying suspicious urls: An application of large-scale online learning in Proceedings of the 26th Annual International Conference on Machine Learning ACM 2009 pp. 681\u2013688.","DOI":"10.1145\/1553374.1553462"},{"key":"e_1_3_2_8_2","doi-asserted-by":"crossref","unstructured":"Ma J. Saul L.K. Savage S. and Voelker G.M. Beyond blacklists: Learning to detect malicious web sites from suspicious urls in Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining ACM 2009 pp. 1245\u20131254.","DOI":"10.1145\/1557019.1557153"},{"key":"e_1_3_2_9_2","unstructured":"Tinyurl. url shortener service. http:\/\/tinyurl.com."},{"key":"e_1_3_2_10_2","unstructured":"Google shortener service. https:\/\/goo.gl."},{"key":"e_1_3_2_11_2","doi-asserted-by":"crossref","unstructured":"Garera S. Provos N. Chew M. and Rubin A.D. A framework for detection and measurement of phishing attacks in Proceedings of the 2007 ACM Workshop on Recurring Malcode ACM 2007 pp. 1\u20138.","DOI":"10.1145\/1314389.1314391"},{"key":"e_1_3_2_12_2","unstructured":"Mikolov T. Sutskever I. Chen K. Corrado G.S. and Dean J. Distributed representations of words and phrases and their compositionality in Advances in Neural Information Processing Systems 2013 pp. 3111\u20133119."},{"key":"e_1_3_2_13_2","unstructured":"Word2vec. implementation of words vector representation. https:\/\/code.google.com\/archive\/p\/word2vec\/."},{"key":"e_1_3_2_14_2","unstructured":"Gensim. python library for natural language processing. https:\/\/radimrehurek.com\/gensim\/index.html."},{"key":"e_1_3_2_15_2","unstructured":"Malware domain list url database. http:\/\/www.malwaredomainlist.com."},{"key":"e_1_3_2_16_2","unstructured":"Malc0de url database. http:\/\/malc0de.com\/database."},{"key":"e_1_3_2_17_2","unstructured":"Virustotal. 
https:\/\/www.virustotal.com."},{"key":"e_1_3_2_18_2","unstructured":"CleanMX url database. http:\/\/support.clean-mx.com."},{"key":"e_1_3_2_19_2","first-page":"108","article-title":"API design for machine learning software: Experiences from the scikit-learn project","author":"Buitinck L.","year":"2013","unstructured":"Buitinck L., Louppe G., Blondel M., Pedregosa F., Mueller A., Grisel O., Niculae V., Prettenhofer P., Gramfort A., Grobler J., Layton R., VanderPlas J., Joly A., Holt B. and Varoquaux G., API design for machine learning software: Experiences from the scikit-learn project, in ECML PKDD Workshop: Languages for Data Mining and Machine Learning, 2013, pp. 108\u2013122.","journal-title":"ECML PKDD Workshop: Languages for Data Mining and Machine Learning"},{"key":"e_1_3_2_20_2","unstructured":"Pandas. python data analysis library. http:\/\/pandas.pydata.org."},{"key":"e_1_3_2_21_2","unstructured":"Pyro4. python remote objects library. https:\/\/pythonhosted.org\/Pyro4."},{"key":"e_1_3_2_22_2","unstructured":"Python library for html and xml parsing. http:\/\/www.crummy.com\/software\/BeautifulSoup."},{"key":"e_1_3_2_23_2","unstructured":"Feedparser - python library for rss parsing. https:\/\/pypi.python.org\/pypi\/feedparser."},{"key":"e_1_3_2_24_2","unstructured":"Snort ids. https:\/\/www.snort.org."},{"key":"e_1_3_2_25_2","unstructured":"Alexa. domain ranking by traffic. http:\/\/www.alexa.com\/."},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1145\/2019599.2019606"},{"key":"e_1_3_2_27_2","doi-asserted-by":"crossref","unstructured":"Le A. Markopoulou A. and Faloutsos M. Phishdef: Url names say it all in INFOCOM 2011 Proceedings IEEE IEEE 2011 pp. 191\u2013195.","DOI":"10.1109\/INFCOM.2011.5934995"},{"key":"e_1_3_2_28_2","doi-asserted-by":"crossref","unstructured":"Islam M.R. Singh J. Chonka A. and Zhou W. 
Multiclassifier classification of spam email on a ubiquitous multicore architecture in Network and Parallel Computing 2008 NPC 2008 IFIP International Conference on IEEE 2008 pp. 210\u2013217.","DOI":"10.1109\/NPC.2008.71"}],"container-title":["Journal of Intelligent & Fuzzy Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.3233\/JIFS-169831","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.3233\/JIFS-169831","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.3233\/JIFS-169831","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T09:41:36Z","timestamp":1777455696000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.3233\/JIFS-169831"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,7,18]]},"references-count":27,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2018,12,24]]}},"alternative-id":["10.3233\/JIFS-169831"],"URL":"https:\/\/doi.org\/10.3233\/jifs-169831","relation":{},"ISSN":["1064-1246","1875-8967"],"issn-type":[{"value":"1064-1246","type":"print"},{"value":"1875-8967","type":"electronic"}],"subject":[],"published":{"date-parts":[[2018,7,18]]}}}