{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,27]],"date-time":"2025-11-27T06:38:05Z","timestamp":1764225485158,"version":"3.41.2"},"reference-count":71,"publisher":"Emerald","issue":"3","license":[{"start":{"date-parts":[[2019,4,8]],"date-time":"2019-04-08T00:00:00Z","timestamp":1554681600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/www.emerald.com\/insight\/site-policies"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["IMDS"],"published-print":{"date-parts":[[2019,4,8]]},"abstract":"<jats:sec>\n<jats:title content-type=\"abstract-subheading\">Purpose<\/jats:title>\n<jats:p>Malicious web domain identification is of significant importance to the security protection of internet users. With online credibility and performance data, the purpose of this paper is to investigate the use of machine learning techniques for malicious web domain identification by considering the class imbalance issue (i.e. there are more benign web domains than malicious ones).<\/jats:p>\n<\/jats:sec>\n<jats:sec>\n<jats:title content-type=\"abstract-subheading\">Design\/methodology\/approach<\/jats:title>\n<jats:p>The authors propose an integrated resampling approach to handle class imbalance by combining the synthetic minority oversampling technique (SMOTE) and particle swarm optimisation (PSO), a population-based meta-heuristic algorithm. The authors use SMOTE for oversampling and PSO for undersampling.<\/jats:p>\n<\/jats:sec>\n<jats:sec>\n<jats:title content-type=\"abstract-subheading\">Findings<\/jats:title>\n<jats:p>By applying eight well-known machine learning classifiers, the proposed integrated resampling approach is comprehensively examined using several imbalanced web domain data sets with different imbalance ratios. 
Compared to five other well-known resampling approaches, experimental results confirm that the proposed approach is highly effective.<\/jats:p>\n<\/jats:sec>\n<jats:sec>\n<jats:title content-type=\"abstract-subheading\">Practical implications<\/jats:title>\n<jats:p>This study not only inspires the practical use of online credibility and performance data for identifying malicious web domains but also provides an effective resampling approach for handling the class imbalance issue in the area of malicious web domain identification.<\/jats:p>\n<\/jats:sec>\n<jats:sec>\n<jats:title content-type=\"abstract-subheading\">Originality\/value<\/jats:title>\n<jats:p>Online credibility and performance data are applied to build malicious web domain identification models using machine learning techniques. An integrated resampling approach is proposed to address the class imbalance issue. The performance of the proposed approach is confirmed based on real-world data sets with different imbalance ratios.<\/jats:p>\n<\/jats:sec>","DOI":"10.1108\/imds-02-2018-0072","type":"journal-article","created":{"date-parts":[[2018,12,4]],"date-time":"2018-12-04T11:01:39Z","timestamp":1543921299000},"page":"676-696","source":"Crossref","is-referenced-by-count":25,"title":["Malicious web domain identification using online credibility and performance data by considering the class imbalance 
issue"],"prefix":"10.1108","volume":"119","author":[{"given":"Zhongyi","family":"Hu","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Raymond","family":"Chiong","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ilung","family":"Pranata","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5418-8799","authenticated-orcid":false,"given":"Yukun","family":"Bao","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yuqing","family":"Lin","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"140","reference":[{"key":"key2020092409414964500_ref001","doi-asserted-by":"crossref","first-page":"281","DOI":"10.1016\/j.procs.2017.05.352","article-title":"Using case-based reasoning for phishing detection","volume":"109","year":"2017","journal-title":"Procedia Computer Science"},{"year":"2015","first-page":"226","article-title":"SCUT: multi-class imbalanced data classification using SMOTE and cluster-based undersampling","key":"key2020092409414964500_ref002"},{"key":"key2020092409414964500_ref003","doi-asserted-by":"publisher","DOI":"10.1007\/s00500-018-3084-2","article-title":"Heuristic nonlinear regression strategy for detecting phishing websites","year":"2018","journal-title":"Soft Computing"},{"issue":"3","key":"key2020092409414964500_ref004","doi-asserted-by":"crossref","first-page":"849","DOI":"10.1016\/S0031-3203(02)00257-1","article-title":"Strategies for learning in class imbalance problems","volume":"36","year":"2003","journal-title":"Pattern Recognition"},{"issue":"2","key":"key2020092409414964500_ref005","doi-asserted-by":"crossref","first-page":"405","DOI":"10.1109\/TKDE.2012.232","article-title":"MWMOTE\u2013majority weighted minority oversampling technique for imbalanced data set 
learning","volume":"26","year":"2014","journal-title":"IEEE Transactions on Knowledge and Data Engineering"},{"issue":"1","key":"key2020092409414964500_ref006","doi-asserted-by":"crossref","first-page":"20","DOI":"10.1145\/1007730.1007735","article-title":"A study of the behavior of several methods for balancing machine learning training data","volume":"6","year":"2004","journal-title":"ACM SIGKDD Explorations Newsletter"},{"key":"key2020092409414964500_ref007","first-page":"370","article-title":"An essay towards solving a problem in the doctrine of chances. By the late Rev. Mr Bayes, FRS communicated by Mr Price, in a letter to John Canton, AMFRS","volume":"53","year":"1763","journal-title":"Philosophical Transactions (1683\u20131775)"},{"issue":"6","key":"key2020092409414964500_ref008","doi-asserted-by":"crossref","first-page":"534","DOI":"10.1109\/TSE.2017.2731766","article-title":"MAHAKIL: diversity based oversampling approach to alleviate the class imbalance issue in software defect prediction","volume":"44","year":"2018","journal-title":"IEEE Transactions on Software Engineering"},{"year":"2010","first-page":"54","article-title":"Lexical feature based phishing URL detection using online learning","key":"key2020092409414964500_ref009"},{"issue":"2","key":"key2020092409414964500_ref010","doi-asserted-by":"crossref","first-page":"123","DOI":"10.1007\/BF00058655","article-title":"Bagging predictors","volume":"24","year":"1996","journal-title":"Machine Learning"},{"issue":"1","key":"key2020092409414964500_ref011","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random forests","volume":"45","year":"2001","journal-title":"Machine Learning"},{"issue":"4","key":"key2020092409414964500_ref012","doi-asserted-by":"crossref","first-page":"500","DOI":"10.1057\/s41274-017-0233-4","article-title":"A cost-sensitive multi-criteria quadratic programming model for imbalanced 
data","volume":"69","year":"2018","journal-title":"Journal of the Operational Research Society"},{"key":"key2020092409414964500_ref013","doi-asserted-by":"crossref","first-page":"321","DOI":"10.1613\/jair.953","article-title":"SMOTE: synthetic minority over-sampling technique","volume":"16","year":"2002","journal-title":"Journal of Artificial Intelligence Research"},{"issue":"4","key":"key2020092409414964500_ref014","doi-asserted-by":"crossref","first-page":"646","DOI":"10.1108\/IMDS-06-2015-0222","article-title":"Big data analytics with swarm intelligence","volume":"116","year":"2016","journal-title":"Industrial Management & Data Systems"},{"doi-asserted-by":"crossref","unstructured":"Chiong, R., Neri, F. and McKay, R.I. (2010), \u201cNature that breeds solutions\u201d, in Chiong, R. (Ed.), Nature-Inspired Informatics for Intelligent Applications and Knowledge Discovery: Implications in Business, Science and Engineering, IGI Global, Hershey, PA, pp. 1-24.","key":"key2020092409414964500_ref015","DOI":"10.4018\/978-1-60566-705-8.ch001"},{"key":"key2020092409414964500_ref016","first-page":"1","article-title":"Statistical comparisons of classifiers over multiple data sets","volume":"7","year":"2006","journal-title":"Journal of Machine Learning Research"},{"issue":"7","key":"key2020092409414964500_ref017","doi-asserted-by":"crossref","first-page":"613","DOI":"10.1108\/02635570410550278","article-title":"Internet security: malicious e-mails detection and protection","volume":"104","year":"2004","journal-title":"Industrial Management & Data Systems"},{"key":"key2020092409414964500_ref018","doi-asserted-by":"crossref","first-page":"464","DOI":"10.1016\/j.eswa.2017.09.030","article-title":"Effective data generation for imbalanced learning using conditional generative adversarial networks","volume":"91","year":"2018","journal-title":"Expert Systems with 
Applications"},{"issue":"1","key":"key2020092409414964500_ref019","doi-asserted-by":"crossref","first-page":"119","DOI":"10.1006\/jcss.1997.1504","article-title":"A decision-theoretic generalization of on-line learning and an application to boosting","volume":"55","year":"1997","journal-title":"Journal of Computer and System Sciences"},{"issue":"4","key":"key2020092409414964500_ref020","doi-asserted-by":"crossref","first-page":"301","DOI":"10.1109\/TDSC.2006.50","article-title":"Detecting phishing web pages with visual similarity assessment based on Earth mover\u2019s distance","volume":"3","year":"2006","journal-title":"IEEE Transactions on Dependable and Secure Computing"},{"issue":"7","key":"key2020092409414964500_ref021","first-page":"750","article-title":"A branch and bound algorithm for computing K-nearest neighbors","volume":"100","year":"1975","journal-title":"IEEE Transactions on Computers"},{"issue":"12","key":"key2020092409414964500_ref022","doi-asserted-by":"crossref","first-page":"3460","DOI":"10.1016\/j.patcog.2013.05.006","article-title":"EUSBoost: enhancing ensembles for highly imbalanced data-sets by evolutionary undersampling","volume":"46","year":"2013","journal-title":"Pattern Recognition"},{"issue":"4","key":"key2020092409414964500_ref023","doi-asserted-by":"crossref","first-page":"463","DOI":"10.1109\/TSMCC.2011.2161285","article-title":"A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches","volume":"42","year":"2012","journal-title":"IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews)"},{"issue":"17","key":"key2020092409414964500_ref024","doi-asserted-by":"crossref","first-page":"3456","DOI":"10.1016\/j.neucom.2011.06.010","article-title":"A combined SMOTE and PSO based RBF classifier for two-class imbalanced 
problems","volume":"74","year":"2011","journal-title":"Neurocomputing"},{"issue":"3","key":"key2020092409414964500_ref025","doi-asserted-by":"crossref","first-page":"275","DOI":"10.1162\/evco.2009.17.3.275","article-title":"Evolutionary undersampling for classification with imbalanced datasets: proposals and taxonomy","volume":"17","year":"2009","journal-title":"Evolutionary Computation"},{"key":"key2020092409414964500_ref026","doi-asserted-by":"crossref","first-page":"416","DOI":"10.1016\/j.patcog.2017.11.027","article-title":"A two-dimensional (2-D) learning framework for particle swarm based feature selection","volume":"76","year":"2018","journal-title":"Pattern Recognition"},{"key":"key2020092409414964500_ref027","doi-asserted-by":"crossref","first-page":"220","DOI":"10.1016\/j.eswa.2016.12.035","article-title":"Learning from class-imbalanced data: review of methods and applications","volume":"73","year":"2017","journal-title":"Expert Systems with Applications"},{"key":"key2020092409414964500_ref028","doi-asserted-by":"crossref","first-page":"105","DOI":"10.1016\/j.eswa.2018.01.012","article-title":"A novel ensemble method for credit scoring: adaption of different imbalance ratios","volume":"98","year":"2018","journal-title":"Expert Systems with Applications"},{"year":"2016","first-page":"5186","article-title":"Identifying malicious web domains using machine learning techniques with online credibility and performance data","key":"key2020092409414964500_ref029"},{"issue":"6","key":"key2020092409414964500_ref030","doi-asserted-by":"crossref","first-page":"1425","DOI":"10.1007\/s11424-017-5293-7","article-title":"Profit guided or statistical error guided? 
A study of stock index forecasting using support vector regression","volume":"30","year":"2017","journal-title":"Journal of Systems Science & Complexity"},{"key":"key2020092409414964500_ref031","doi-asserted-by":"crossref","first-page":"24184","DOI":"10.1109\/ACCESS.2018.2817572","article-title":"An ensemble oversampling model for class imbalance problem in software defect prediction","volume":"6","year":"2018","journal-title":"IEEE Access"},{"issue":"5","key":"key2020092409414964500_ref032","doi-asserted-by":"crossref","first-page":"429","DOI":"10.3233\/IDA-2002-6504","article-title":"The class imbalance problem: a systematic study","volume":"6","year":"2002","journal-title":"Intelligent Data Analysis"},{"year":"2013","first-page":"48","article-title":"Streaming malware classification in the presence of concept drift and class imbalance","key":"key2020092409414964500_ref033"},{"year":"1997","first-page":"4104","article-title":"A discrete binary version of the particle swarm algorithm","key":"key2020092409414964500_ref034"},{"issue":"5","key":"key2020092409414964500_ref035","doi-asserted-by":"crossref","first-page":"927","DOI":"10.1108\/IMDS-06-2016-0195","article-title":"Machine learning-based anomaly detection via integration of manufacturing, inspection and after-sales service data","volume":"117","year":"2017","journal-title":"Industrial Management & Data Systems"},{"year":"2018","first-page":"240","article-title":"Finding effective classifier for malicious URL detection","key":"key2020092409414964500_ref036"},{"issue":"4","key":"key2020092409414964500_ref037","doi-asserted-by":"crossref","first-page":"e0122855","DOI":"10.1371\/journal.pone.0122855","article-title":"Using support vector machine ensembles for target audience classification on Twitter","volume":"10","year":"2015","journal-title":"PLOS One"},{"key":"key2020092409414964500_ref038","doi-asserted-by":"crossref","first-page":"113","DOI":"10.1016\/j.ins.2013.07.007","article-title":"An insight into 
classification with imbalanced data: empirical results and current trends on using data intrinsic characteristics","volume":"250","year":"2013","journal-title":"Information Sciences"},{"year":"2009","first-page":"1245","article-title":"Beyond blacklists: learning to detect malicious web sites from suspicious URLs","key":"key2020092409414964500_ref039"},{"year":"2009","first-page":"681","article-title":"Identifying suspicious URLs: an application of large-scale online learning","key":"key2020092409414964500_ref040"},{"issue":"7","key":"key2020092409414964500_ref041","doi-asserted-by":"crossref","first-page":"1060","DOI":"10.1057\/jors.2012.120","article-title":"On the suitability of resampling techniques for the class imbalance problem in credit scoring","volume":"64","year":"2013","journal-title":"Journal of the Operational Research Society"},{"key":"key2020092409414964500_ref042","doi-asserted-by":"crossref","first-page":"231","DOI":"10.1016\/j.eswa.2016.01.028","article-title":"New rule-based phishing detection method","volume":"53","year":"2016","journal-title":"Expert Systems with Applications"},{"issue":"5","key":"key2020092409414964500_ref043","doi-asserted-by":"crossref","first-page":"1233","DOI":"10.1109\/TKDE.2014.2365780","article-title":"Graph-based approaches for over-sampling in the context of ordinal regression","volume":"27","year":"2015","journal-title":"IEEE Transactions on Knowledge and Data Engineering"},{"issue":"1","key":"key2020092409414964500_ref044","doi-asserted-by":"crossref","first-page":"25","DOI":"10.4236\/jis.2012.31004","article-title":"A distributed secure mechanism for resource protection in a digital ecosystem environment","volume":"3","year":"2012","journal-title":"Journal of Information Security"},{"issue":"3","key":"key2020092409414964500_ref045","doi-asserted-by":"crossref","first-page":"333","DOI":"10.1108\/ICS-02-2013-0009","article-title":"Examining the effectiveness of phishing filters against DNS based phishing 
attacks","volume":"23","year":"2015","journal-title":"Information and Computer Security"},{"volume-title":"C4.5: Programs for Machine Learning","year":"1993","key":"key2020092409414964500_ref046"},{"year":"2017","article-title":"Malicious URL detection using machine learning: a survey","key":"key2020092409414964500_ref047"},{"year":"2010","first-page":"187","article-title":"Using domain top-page similarity feature in machine learning-based web phishing detection","key":"key2020092409414964500_ref048"},{"issue":"10","key":"key2020092409414964500_ref049","doi-asserted-by":"crossref","first-page":"2210","DOI":"10.1108\/IMDS-08-2016-0315","article-title":"Curbing electronic shopper perceived opportunism and encouraging trust","volume":"117","year":"2017","journal-title":"Industrial Management & Data Systems"},{"doi-asserted-by":"crossref","unstructured":"Tan, C.L., Chiew, K.L. and Sze, S.N. (2017), \u201cPhishing webpage detection using weighted URL tokens for identity keywords retrieval\u201d, in Ibrahim, H., Iqbal, S., Teoh, S.S. and Mustaffa, M.T., (Eds) 9th International Conference on Robotic, Vision, Signal Processing and Power Applications: Empowering Research and Innovation, Springer, Singapore, pp. 
133-139.","key":"key2020092409414964500_ref050","DOI":"10.1007\/978-981-10-1721-6_15"},{"issue":"Supplement C","key":"key2020092409414964500_ref051","first-page":"18","article-title":"PhishWHO: phishing webpage detection via identity keywords extraction and target domain name finder","volume":"88","year":"2016","journal-title":"Decision Support Systems"},{"issue":"2","key":"key2020092409414964500_ref052","doi-asserted-by":"crossref","first-page":"254","DOI":"10.1287\/isre.1090.0260","article-title":"The effect of online privacy information on purchasing behavior: an experimental study","volume":"22","year":"2011","journal-title":"Information Systems Research"},{"issue":"3","key":"key2020092409414964500_ref053","doi-asserted-by":"crossref","first-page":"528","DOI":"10.1016\/j.ejor.2010.02.032","article-title":"A discrete particle swarm optimization method for feature selection in binary classification problems","volume":"206","year":"2010","journal-title":"European Journal of Operational Research"},{"year":"2007","first-page":"935","article-title":"Experimental perspectives on learning from imbalanced data","key":"key2020092409414964500_ref054"},{"volume-title":"The Nature of Statistical Learning Theory","year":"1995","key":"key2020092409414964500_ref055"},{"key":"key2020092409414964500_ref056","doi-asserted-by":"crossref","first-page":"15","DOI":"10.1016\/j.asoc.2013.09.014","article-title":"A hybrid classifier combining SMOTE with PSO to estimate 5-year survivability of breast cancer patients","volume":"20","year":"2014","journal-title":"Applied Soft Computing"},{"key":"key2020092409414964500_ref057","doi-asserted-by":"crossref","first-page":"235","DOI":"10.1016\/j.neucom.2014.06.071","article-title":"An alternative way of presenting statistical test results when evaluating the performance of stochastic approaches","volume":"147","year":"2015","journal-title":"Neurocomputing"},{"year":"2010","article-title":"Large-scale automatic classification of phishing 
pages","key":"key2020092409414964500_ref058"},{"issue":"3","key":"key2020092409414964500_ref059","doi-asserted-by":"crossref","first-page":"257","DOI":"10.1023\/A:1007626913721","article-title":"Reduction techniques for instance-based learning algorithms","volume":"38","year":"2000","journal-title":"Machine Learning"},{"issue":"6","key":"key2020092409414964500_ref060","doi-asserted-by":"crossref","first-page":"786","DOI":"10.1109\/TKDE.2005.95","article-title":"KBA: kernel boundary alignment considering imbalanced data distribution","volume":"17","year":"2005","journal-title":"IEEE Transactions on Knowledge and Data Engineering"},{"key":"key2020092409414964500_ref061","doi-asserted-by":"crossref","first-page":"30","DOI":"10.1016\/j.elerap.2017.06.004","article-title":"Cost-sensitive boosted tree for loan evaluation in peer-to-peer lending","volume":"24","year":"2017","journal-title":"Electronic Commerce Research and Applications"},{"issue":"2","key":"key2020092409414964500_ref062","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/2019599.2019606","article-title":"CANTINA+: a feature-rich machine learning framework for detecting phishing web sites","volume":"14","year":"2011","journal-title":"ACM Transactions on Information and System Security"},{"year":"2017","article-title":"Phishing website detection using C4.5 decision tree","key":"key2020092409414964500_ref063"},{"issue":"3","key":"key2020092409414964500_ref064","first-page":"1","article-title":"A particle swarm based hybrid system for imbalanced medical data sampling","volume":"10","year":"2009","journal-title":"BMC Genomics"},{"issue":"4","key":"key2020092409414964500_ref065","doi-asserted-by":"crossref","first-page":"597","DOI":"10.1142\/S0219622006002258","article-title":"10 challenging problems in data mining research","volume":"5","year":"2006","journal-title":"International Journal of Information Technology & Decision 
Making"},{"issue":"1","key":"key2020092409414964500_ref066","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1007\/s10844-009-0086-7","article-title":"Hierarchical associative classifier (HAC) for malware detection from the large and imbalanced gray list","volume":"35","year":"2010","journal-title":"Journal of Intelligent Information Systems"},{"issue":"10","key":"key2020092409414964500_ref067","doi-asserted-by":"crossref","first-page":"1532","DOI":"10.1109\/TNN.2011.2161999","article-title":"Textual and visual content based anti-phishing: a Bayesian approach","volume":"22","year":"2011","journal-title":"IEEE Transactions on Neural Networks"},{"year":"2007","first-page":"639","article-title":"Cantina: a content-based approach to detecting phishing web sites","key":"key2020092409414964500_ref068"},{"year":"2013","first-page":"919","article-title":"Cost-sensitive online active learning with application to malicious URL detection","key":"key2020092409414964500_ref069"},{"key":"key2020092409414964500_ref070","first-page":"49","article-title":"Benchmarking sampling techniques for imbalance learning in churn prediction","volume-title":"Journal of the Operational Research Society","year":"2018"},{"key":"key2020092409414964500_ref071","doi-asserted-by":"crossref","first-page":"229","DOI":"10.1016\/j.neucom.2012.08.010","article-title":"Weighted extreme learning machine for imbalance learning","volume":"101","year":"2013","journal-title":"Neurocomputing"}],"container-title":["Industrial Management & Data 
Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.emerald.com\/insight\/content\/doi\/10.1108\/IMDS-02-2018-0072\/full\/xml","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.emerald.com\/insight\/content\/doi\/10.1108\/IMDS-02-2018-0072\/full\/html","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,24]],"date-time":"2025-07-24T21:51:17Z","timestamp":1753393877000},"score":1,"resource":{"primary":{"URL":"http:\/\/www.emerald.com\/imds\/article\/119\/3\/676-696\/176779"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,4,8]]},"references-count":71,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2019,4,8]]}},"alternative-id":["10.1108\/IMDS-02-2018-0072"],"URL":"https:\/\/doi.org\/10.1108\/imds-02-2018-0072","relation":{},"ISSN":["0263-5577"],"issn-type":[{"type":"print","value":"0263-5577"}],"subject":[],"published":{"date-parts":[[2019,4,8]]}}}