{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T01:36:04Z","timestamp":1773797764301,"version":"3.50.1"},"reference-count":65,"publisher":"Emerald","issue":"3","license":[{"start":{"date-parts":[[2018,10,30]],"date-time":"2018-10-30T00:00:00Z","timestamp":1540857600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/www.emerald.com\/insight\/site-policies"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["JSIT"],"published-print":{"date-parts":[[2018,11,14]]},"abstract":"<jats:sec><jats:title content-type=\"abstract-subheading\">Purpose<\/jats:title><jats:p>Email spam classification is now becoming a challenging area in the domain of text classification. Precise and robust classifiers are not only judged by classification accuracy but also by sensitivity (correctly classified legitimate emails) and specificity (correctly classified unsolicited emails) towards the accurate classification, captured by both false positive and false negative rates. This paper aims to present a comparative study between various decision tree classifiers (such as AD tree, decision stump and REP tree) with\/without different boosting algorithms (bagging, boosting with re-sample and AdaBoost).<\/jats:p><\/jats:sec><jats:sec><jats:title content-type=\"abstract-subheading\">Design\/methodology\/approach<\/jats:title><jats:p>Artificial intelligence and text mining approaches have been incorporated in this study. 
Each decision tree classifier in this study is tested on informative words\/features selected from the two publicly available data sets (SpamAssassin and LingSpam) using a greedy step-wise feature search method.<\/jats:p><\/jats:sec><jats:sec><jats:title content-type=\"abstract-subheading\">Findings<\/jats:title><jats:p>Outcomes of this study show that without boosting, the REP tree provides high performance accuracy with the AD tree ranking as the second-best performer. Decision stump is found to be the under-performing classifier of this study. However, with boosting, the combination of REP tree and AdaBoost compares favourably with other classification models. If the metrics false positive rate and performance accuracy are taken together, AD tree and REP tree with AdaBoost were both found to carry out an effective classification task. Greedy stepwise has proven its worth in this study by selecting a subset of valuable features to identify the correct class of emails.<\/jats:p><\/jats:sec><jats:sec><jats:title content-type=\"abstract-subheading\">Research limitations\/implications<\/jats:title><jats:p>This research is focussed on the classification of those email spams that are written in the English language only. The proposed models work with content (words\/features) of email data that is mostly found in the body of the mail. Image spam has not been included in this study. Other messages such as short message service or multi-media messaging service were not included in this study.<\/jats:p><\/jats:sec><jats:sec><jats:title content-type=\"abstract-subheading\">Practical implications<\/jats:title><jats:p>In this research, a boosted decision tree approach has been proposed and used to classify email spam and ham files; this is found to be a highly effective approach in comparison with other state-of-the-art models used in other studies. 
This classifier may be tested for different applications and may provide new insights for developers and researchers.<\/jats:p><\/jats:sec><jats:sec><jats:title content-type=\"abstract-subheading\">Originality\/value<\/jats:title><jats:p>A comparison of decision tree classifiers with\/without ensemble has been presented for spam classification.<\/jats:p><\/jats:sec>","DOI":"10.1108\/jsit-11-2017-0105","type":"journal-article","created":{"date-parts":[[2018,10,30]],"date-time":"2018-10-30T06:07:11Z","timestamp":1540879631000},"page":"298-105","source":"Crossref","is-referenced-by-count":16,"title":["Spam classification: a comparative analysis of different boosted decision tree approaches"],"prefix":"10.1108","volume":"20","author":[{"given":"Shrawan Kumar","family":"Trivedi","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Prabin Kumar","family":"Panigrahi","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"140","published-online":{"date-parts":[[2018,10,30]]},"reference":[{"key":"key2021041507545239900_ref01a","year":"1999","journal-title":"Text categorisation: A survey"},{"issue":"1","key":"key2021041507545239900_ref001","doi-asserted-by":"crossref","first-page":"22","DOI":"10.1109\/TKDE.2010.36","article-title":"Classification using streaming random forests","volume":"23","year":"2011","journal-title":"Ieee Transactions on Knowledge and Data Engineering"},{"key":"key2021041507545239900_ref002","doi-asserted-by":"crossref","first-page":"60","DOI":"10.1145\/1299015.1299021","article-title":"A comparison of machine learning techniques for phishing detection","volume-title":"Proceedings of the Anti-phishing Working Groups 2nd Annual eCrime Researchers Summit","year":"2007"},{"issue":"2","key":"key2021041507545239900_ref003","doi-asserted-by":"crossref","first-page":"123","DOI":"10.1007\/BF00058655","article-title":"Bagging 
predictors","volume":"24","year":"1996","journal-title":"Machine Learning"},{"key":"key2021041507545239900_ref004","first-page":"152","article-title":"A simple named entity extractor using AdaBoost","volume-title":"Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003","year":"2003"},{"key":"key2021041507545239900_ref005","article-title":"Boosting trees for anti-spam email filtering","year":"2001"},{"key":"key2021041507545239900_ref006","doi-asserted-by":"crossref","first-page":"423","DOI":"10.1145\/1277741.1277814","article-title":"Know your neighbors: Web spam detection using the web topology","volume-title":"Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval","year":"2007"},{"issue":"8","key":"key2021041507545239900_ref007","doi-asserted-by":"crossref","first-page":"74","DOI":"10.1145\/280324.280336","article-title":"Spam!","volume":"41","year":"1998","journal-title":"Communications of the Acm"},{"key":"key2021041507545239900_ref008","first-page":"1","article-title":"Network traffic classification in encrypted environment: a case study of google hangout","volume-title":"Communications (NCC), 2015 Twenty First National Conference on","year":"2015"},{"key":"key2021041507545239900_ref009","article-title":"Spam detection using clustering, random forests, and active learning","volume-title":"Sixth Conference on Email and Anti-Spam.","year":"2009"},{"key":"key2021041507545239900_ref010","first-page":"1","article-title":"Statistical comparisons of classifiers over multiple data sets","volume":"7","year":"2006","journal-title":"Journal of Machine Learning Research"},{"issue":"5","key":"key2021041507545239900_ref011","doi-asserted-by":"crossref","first-page":"1048","DOI":"10.1109\/72.788645","article-title":"Support vector machines for spam categorization","volume":"10","year":"1999","journal-title":"Ieee Transactions on Neural 
Networks"},{"key":"key2021041507545239900_ref056a","first-page":"335","article-title":"Pattern classification","volume":"1","year":"2001","journal-title":"International Journal of Computational Intelligence and Applications"},{"key":"key2021041507545239900_ref012a","article-title":"The jackknife, the bootstrap, and other resampling plans","volume-title":"Siam","year":"1982"},{"key":"key2021041507545239900_ref012","doi-asserted-by":"crossref","first-page":"27","DOI":"10.1145\/1964114.1964121","article-title":"Web spam classification: a few features worth more","volume-title":"Proceedings of the 2011 Joint WICOW\/AIRWeb Workshop on Web Quality","year":"2011"},{"issue":"3","key":"key2021041507545239900_ref013","doi-asserted-by":"crossref","first-page":"187","DOI":"10.1016\/j.artmed.2007.04.001","article-title":"A methodology for the automated creation of fuzzy expert systems for ischaemic and arrhythmic beat classification based on a set of rules obtained by a decision tree","volume":"40","year":"2007","journal-title":"Artificial Intelligence in Medicine"},{"key":"key2021041507545239900_ref014","first-page":"124","article-title":"The alternating decision tree learning algorithm","volume":"99","year":"1999","journal-title":"In ICML"},{"key":"key2021041507545239900_ref015","first-page":"148","article-title":"Experiments with a new boosting algorithm","volume":"96","year":"1996","journal-title":"In Icml"},{"issue":"10","key":"key2021041507545239900_ref015a","volume":"1","year":"2001","journal-title":"The Elements of Statistical Learning"},{"key":"key2021041507545239900_ref016","first-page":"509","article-title":"A stochastic algorithm for feature selection in pattern recognition","volume":"8","year":"2007","journal-title":"The Journal of Machine Learning Research"},{"issue":"3","key":"key2021041507545239900_ref017","doi-asserted-by":"crossref","first-page":"1713","DOI":"10.48084\/etasr.1171","article-title":"Detection of spam email by combining harmony search algorithm 
and decision tree. Engineering","volume":"7","year":"2017","journal-title":"Technology and Applied Science Research"},{"key":"key2021041507545239900_ref018","first-page":"522","article-title":"Spam detection using KNN and decision tree mechanism in social network","volume-title":"Parallel, Distributed and Grid Computing (PDGC), 2016 Fourth International Conference on","year":"2016"},{"key":"key2021041507545239900_ref019","first-page":"5186","article-title":"Identifying malicious web domains using machine learning techniques with online credibility and performance data","volume-title":"Proceedings of the IEEE Congress on Evolutionary Computation (CEC)","year":"2016"},{"key":"key2021041507545239900_ref020","first-page":"233","article-title":"Induction of One-Level decision trees","year":"1992","journal-title":"In ML"},{"key":"key2021041507545239900_ref021","first-page":"90","article-title":"On the relationship between feature selection and classification accuracy","volume":"4","year":"2008","journal-title":"Journal of Machine Learning Research-Proceedings Track"},{"issue":"1\/2","key":"key2021041507545239900_ref022","doi-asserted-by":"crossref","first-page":"65","DOI":"10.1108\/JSIT-10-2016-0061","article-title":"Comparison of supervised machine learning techniques for customer churn prediction based on analysis of customer behavior","volume":"19","year":"2017","journal-title":"Journal of Systems and Information Technology"},{"issue":"9","key":"key2021041507545239900_ref023","doi-asserted-by":"crossref","first-page":"1011","DOI":"10.1038\/nbt0908-1011","article-title":"What are decision trees?","volume":"26","year":"2008","journal-title":"Nature Biotechnology"},{"issue":"10","key":"key2021041507545239900_ref024","doi-asserted-by":"crossref","first-page":"2167","DOI":"10.1016\/j.ins.2006.12.005","article-title":"Learning to classify e-mail","volume":"177","year":"2007","journal-title":"Information 
Sciences"},{"key":"key2021041507545239900_ref025","first-page":"14","article-title":"Comparative study on email spam classifier using data mining techniques","volume-title":"Proceedings of the International MultiConference of Engineers and Computer Scientists","year":"2012"},{"issue":"3","key":"key2021041507545239900_ref026","doi-asserted-by":"crossref","first-page":"249","DOI":"10.1016\/j.knosys.2006.05.016","article-title":"An empirical study of three machine learning methods for spam filtering","volume":"20","year":"2007","journal-title":"Knowledge-Based Systems"},{"issue":"8","key":"key2021041507545239900_ref027","doi-asserted-by":"crossref","first-page":"1304","DOI":"10.1109\/TNNLS.2012.2199516","article-title":"Study on the impact of partition-induced dataset shift on $k $-fold cross-validation","volume":"23","year":"2012","journal-title":"IEEE Transactions on Neural Networks and Learning Systems"},{"key":"key2021041507545239900_ref028","doi-asserted-by":"crossref","first-page":"83","DOI":"10.1145\/1135777.1135794","article-title":"Detecting spam web pages through content analysis","volume-title":"Proceedings of the 15th International Conference on World Wide Web","year":"2006"},{"key":"key2021041507545239900_ref029","first-page":"496","article-title":"GUJSTER: a rule based stemmer using dictionary approach","volume-title":"Inventive Communication and Computational Technologies (ICICCT), 2017 International Conference on","year":"2017"},{"issue":"3","key":"key2021041507545239900_ref031a","doi-asserted-by":"crossref","first-page":"221","DOI":"10.1016\/S0020-7373(87)80053-6","article-title":"Simplifying decision trees","volume":"27","year":"1987","journal-title":"International Journal of Man-Machine Studies"},{"key":"key2021041507545239900_ref030","volume-title":"C4.5: Programs for Machine Learning","year":"1993"},{"issue":"5","key":"key2021041507545239900_ref031","article-title":"Feature extraction and duplicate detection for text mining: a 
survey","volume":"16","year":"2017","journal-title":"Global Journal of Computer Science and Technology"},{"key":"key2021041507545239900_ref032","article-title":"Exploring support vector machines and random forests for spam detection","volume-title":"In CEAS.","year":"2004"},{"key":"key2021041507545239900_ref033","first-page":"98","article-title":"A bayesian approach to filtering junk e-mail","volume":"62","year":"1998","journal-title":"Learning for Text Categorization: Papers from the 1998 Workshop"},{"key":"key2021041507545239900_ref034","article-title":"Stacking classifiers for anti-spam filtering of e-mail","year":"2001"},{"issue":"2","key":"key2021041507545239900_ref035a","doi-asserted-by":"crossref","first-page":"197","DOI":"10.1007\/BF00116037","article-title":"The strength of weak learnability","volume":"5","year":"1990","journal-title":"Machine Learning"},{"key":"key2021041507545239900_ref034a","first-page":"7","article-title":"A tutorial on automated text categorisation","year":"1999"},{"key":"key2021041507545239900_ref035","first-page":"23","article-title":"A comparative study of text preprocessing techniques for natural language call routing","volume-title":"Dialogues with Social Robots","year":"2017"},{"key":"key2021041507545239900_ref036","first-page":"265","article-title":"A comparative analysis of various spam classifications","volume-title":"Progress in Intelligent Computing Techniques: Theory, Practice, and Applications","year":"2018"},{"key":"key2021041507545239900_ref037","first-page":"1","article-title":"Support vector machines and random forests modeling for spam senders behavior analysis","volume-title":"In Global Telecommunications Conference, 2008, IEEE GLOBECOM 2008, IEEE","year":"2008"},{"key":"key2021041507545239900_ref038","first-page":"181","article-title":"Sentiment analyis of indian movie review with various feature selection techniques","volume-title":"Advances in Computer Applications (ICACA), IEEE International Conference 
on","year":"2016"},{"key":"key2021041507545239900_ref039","first-page":"176","article-title":"A study of machine learning classifiers for spam detection","volume-title":"Computational and Business Intelligence (ISCBI), 2016 4th International Symposium on","year":"2016"},{"issue":"21","key":"key2021041507545239900_ref040","article-title":"Effect of various kernels and feature selection methods on SVM performance for detecting email spams","volume":"66","year":"2013","journal-title":"International Journal of Computer Applications"},{"issue":"2","key":"key2021041507545239900_ref041","article-title":"Interplay between probabilistic classifiers and boosting algorithms for detecting complex unsolicited emails","volume":"1","year":"2013","journal-title":"Journal of Advances in Computer Networks"},{"key":"key2021041507545239900_ref042","article-title":"An enhanced genetic programming approach for detecting unsolicited emails","volume-title":"Proc. 2013 IEEE 16th International Conference on Computational Science and Engineering","year":"2013"},{"key":"key2021041507545239900_ref043","first-page":"35","article-title":"Effect of feature selection methods on machine learning classifiers for detecting email spams","volume-title":"In Proceedings of the 2013 Research in Adaptive and Convergent Systems","year":"2013"},{"issue":"1","key":"key2021041507545239900_ref044","doi-asserted-by":"crossref","first-page":"53","DOI":"10.1145\/2600617.2600622","article-title":"Interaction between feature subset selection techniques and machine learning classifiers for detecting unsolicited emails","volume":"14","year":"2014","journal-title":"ACM SIGAPP Applied Computing Review"},{"issue":"4","key":"key2021041507545239900_ref047","doi-asserted-by":"crossref","first-page":"524","DOI":"10.1108\/VJIKMS-07-2015-0042","article-title":"A novel committee selection mechanism for combining classifiers to detect unsolicited emails","volume":"46","year":"2016","journal-title":"VINE Journal of Information 
and Knowledge Management Systems"},{"key":"key2021041507545239900_ref045","first-page":"64","article-title":"A comparative study of various supervised feature selection methods for spam classification","volume-title":"Proceedings of the Second International Conference on Information and Communication Technology for Competitive Strategies","year":"2016"},{"key":"key2021041507545239900_ref046","first-page":"355","article-title":"A combining classifiers approach for detecting email spams","volume-title":"Advanced Information Networking and Applications Workshops (WAINA), 2016 30th International Conference on","year":"2016"},{"key":"key2021041507545239900_ref048","volume-title":"Data Mining: practical Machine Learning Tools and Techniques","year":"2005","edition":"2nd ed"},{"key":"key2021041507545239900_ref049","first-page":"1","article-title":"An enhanced deep feature representation for person re-identification","volume-title":"In Applications of Computer Vision (WACV), 2016 IEEE Winter Conference on","year":"2016"},{"key":"key2021041507545239900_ref050","first-page":"861","article-title":"An approach to spam detection by naive bayes ensemble based on decision induction","volume-title":"Intelligent Systems Design and Applications, 2006. ISDA'06. Sixth International Conference on","year":"2006"},{"key":"key2021041507545239900_ref051","first-page":"249","article-title":"Efficient spam email filtering using adaptive ontology","volume-title":"Information Technology, 2007. ITNG'07. 
Fourth International Conference on","year":"2007"},{"issue":"2","key":"key2021041507545239900_ref052","doi-asserted-by":"crossref","first-page":"726","DOI":"10.1166\/asl.2012.1768","article-title":"Spam detection via feature selection and decision tree","volume":"5","year":"2012","journal-title":"Advanced Science Letters"},{"key":"key2021041507545239900_ref053","doi-asserted-by":"crossref","first-page":"22","DOI":"10.1016\/j.knosys.2014.03.015","article-title":"Binary PSO with mutation operator for feature selection using decision tree applied to spam detection","volume":"64","year":"2014","journal-title":"Knowledge-Based Systems"},{"key":"key2021041507545239900_ref055a","volume-title":"Ensemble Methods: Foundations and Algorithms","year":"2012"},{"key":"key2021041507545239900_ref054","unstructured":"Aladdin Knowledge Systems (2018), \u201cAnti-spam white paper\u201d, available at: http:\/\/www.eAladdin.com."},{"key":"key2021041507545239900_ref055","volume-title":"Pattern Classification","year":"2012"},{"key":"key2021041507545239900_ref056","volume-title":"The Elements of Statistical Learning","year":"2009"},{"key":"key2021041507545239900_ref057","first-page":"313","article-title":"Using output codes to boost multiclass learning problems","volume":"97","year":"1997","journal-title":"ICML"}],"container-title":["Journal of Systems and Information 
Technology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.emerald.com\/insight\/content\/doi\/10.1108\/JSIT-11-2017-0105\/full\/xml","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.emerald.com\/insight\/content\/doi\/10.1108\/JSIT-11-2017-0105\/full\/html","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,24]],"date-time":"2025-07-24T22:24:06Z","timestamp":1753395846000},"score":1,"resource":{"primary":{"URL":"http:\/\/www.emerald.com\/jsit\/article\/20\/3\/298-105\/246173"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,10,30]]},"references-count":65,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2018,10,30]]},"published-print":{"date-parts":[[2018,11,14]]}},"alternative-id":["10.1108\/JSIT-11-2017-0105"],"URL":"https:\/\/doi.org\/10.1108\/jsit-11-2017-0105","relation":{},"ISSN":["1328-7265"],"issn-type":[{"value":"1328-7265","type":"print"}],"subject":[],"published":{"date-parts":[[2018,10,30]]}}}