{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,4]],"date-time":"2026-05-04T00:27:45Z","timestamp":1777854465100,"version":"3.51.4"},"reference-count":40,"publisher":"SAGE Publications","issue":"2","license":[{"start":{"date-parts":[[2017,1,9]],"date-time":"2017-01-09T00:00:00Z","timestamp":1483920000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["Journal of Information Science"],"published-print":{"date-parts":[[2018,4]]},"abstract":"<jats:p>Twitter is a social networking website that has gained a lot of popularity around the world in the last decade. This popularity made Twitter a common target for spammers and malicious users to spread unwanted advertisements, viruses and phishing attacks. In this article, we review the latest research works to determine the most effective features that were investigated for spam detection in the literature. These features are collected to build a comprehensive data set that can be used to develop more robust and accurate spammer detection models. The new data set is tested using popular classifiers (Naive Bayes, support vector machines, multilayer perceptron neural networks, Decision Trees, Random forests and k-Nearest Neighbour). The prediction performance of these classifiers is evaluated and compared based on different evaluation metrics. Moreover, a further analysis is carried out to identify the features that have higher impact on the accuracy of spam detection. Three different techniques are used and compared for this analysis: change of mean square error (CoM), information gain (IG) and Relief-F method. Top five features identified by each technique are used again to build the detection models. Experimental results show that most of the developed classifiers obtained high evaluation results based on the comprehensive data set constructed in this work. Experiments also reveal the important role of some features like the reputation of the account, average length of the tweet, average mention per tweet, age of the account, and the average time between posts in the process of identifying spammers in the social network.<\/jats:p>","DOI":"10.1177\/0165551516684296","type":"journal-article","created":{"date-parts":[[2017,1,9]],"date-time":"2017-01-09T10:01:45Z","timestamp":1483956105000},"page":"230-247","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":39,"title":["Feature engineering for detecting spammers on Twitter: Modelling and analysis"],"prefix":"10.1177","volume":"44","author":[{"given":"Wafa","family":"Herzallah","sequence":"first","affiliation":[{"name":"Business Information Technology, King Abdullah II School of Information Technology, The University of Jordan, Jordan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hossam","family":"Faris","sequence":"additional","affiliation":[{"name":"Business Information Technology, King Abdullah II School of Information Technology, The University of Jordan, Jordan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Omar","family":"Adwan","sequence":"additional","affiliation":[{"name":"Business Information Technology, King Abdullah II School of Information Technology, The University of Jordan, Jordan"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"179","published-online":{"date-parts":[[2017,1,9]]},"reference":[{"key":"bibr1-0165551516684296","unstructured":"Social Times. Facebook, Twitter, Instagram, Pinterest, Vine, Snapchat: social media stats 2014, http:\/\/www.adweek.com\/socialtimes\/social-media-statistics-2014\/499230 (2014, accessed January 2016)."},{"key":"bibr2-0165551516684296","doi-asserted-by":"publisher","DOI":"10.1177\/0165551515610513"},{"key":"bibr3-0165551516684296","doi-asserted-by":"publisher","DOI":"10.14445\/22312803\/IJCTT-V9P117"},{"key":"bibr4-0165551516684296","unstructured":"Go A, Bhayani R, Huang L. Twitter sentiment classification using distant supervision. CS224N project report, Stanford University, Stanford, CA, December 2009, p. 12."},{"key":"bibr5-0165551516684296","first-page":"241","volume-title":"Proceedings of the ACM 2012 conference on computer supported cooperative work","author":"De Choudhury M"},{"key":"bibr6-0165551516684296","unstructured":"Wang B, Zubiaga A, Liakata M, Procter R. Making the most of tweet-inherent features for social spam detection on Twitter. arXiv preprint arXiv 2015:1503.07405."},{"key":"bibr7-0165551516684296","first-page":"435","volume-title":"Proceedings of the 33rd international ACM SIGIR conference on research and development in information","author":"Lee K"},{"key":"bibr8-0165551516684296","doi-asserted-by":"publisher","DOI":"10.1177\/0165551512439173"},{"key":"bibr9-0165551516684296","unstructured":"Twitter help Center. The Twitter rules, https:\/\/support.twitter.com\/articles\/18311 (accessed October 2015)."},{"key":"bibr10-0165551516684296","first-page":"841","volume-title":"Proceedings of the 15th international conference on advanced communication technology (ICACT)","author":"Lin PC"},{"key":"bibr11-0165551516684296","unstructured":"Gee G, Teh H. Twitter spammer profile detection. CS229 project report, Stanford University, Stanford, CA, December 2010."},{"key":"bibr12-0165551516684296","first-page":"12","volume-title":"Proceedings of the collaboration, electronic messaging, anti-abuse and spam conference (CEAS)","volume":"6","author":"Benevenuto F"},{"key":"bibr13-0165551516684296","first-page":"1","volume-title":"Proceedings of the international conference on security and cryptography (SECRYPT)","author":"Wang AH"},{"key":"bibr14-0165551516684296","first-page":"1","volume-title":"Proceedings of the 5th international conference on communication systems and networks (COMSNETS)","author":"Amleshwaram AA"},{"key":"bibr15-0165551516684296","unstructured":"Chakraborty A, Sundi J, Satapathy S, SPAM: a framework for social profile abuse monitoring. CSE508 report, Stony Brook University, Stony Brook, NY, 2012."},{"key":"bibr16-0165551516684296","first-page":"175","volume-title":"Proceedings of the 8th international conference on autonomic and trusted computing","author":"Mccord M"},{"key":"bibr17-0165551516684296","doi-asserted-by":"publisher","DOI":"10.1109\/TDSC.2013.3"},{"key":"bibr18-0165551516684296","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2012.02.053"},{"key":"bibr19-0165551516684296","doi-asserted-by":"publisher","DOI":"10.1016\/j.dss.2014.03.001"},{"key":"bibr20-0165551516684296","first-page":"1","author":"Moro S","year":"2016","journal-title":"Neural Comput Appl"},{"key":"bibr21-0165551516684296","doi-asserted-by":"publisher","DOI":"10.1016\/j.jbusres.2016.02.010"},{"key":"bibr22-0165551516684296","unstructured":"Twitter Developers. Documentation, https:\/\/dev.twitter.com\/overview\/documentation (accessed January 2016)."},{"key":"bibr23-0165551516684296","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-29361-0_19"},{"key":"bibr24-0165551516684296","first-page":"27","volume-title":"Proceedings of the 17th ACM conference on computer and communications security","author":"Grier C"},{"key":"bibr25-0165551516684296","first-page":"243","volume-title":"Proceedings of the ACM SIGCOMM conference on Internet measurement conference","author":"Thomas K"},{"key":"bibr26-0165551516684296","doi-asserted-by":"publisher","DOI":"10.1111\/tgis.12101"},{"key":"bibr27-0165551516684296","first-page":"86","volume-title":"Proceedings of the 30th annual computer security applications conference","author":"Yang C"},{"key":"bibr28-0165551516684296","unstructured":"HubSpot. The ultimate list of Email spam trigger words, http:\/\/blog.hubspot.com\/blog\/tabid\/6307\/bid\/30684\/The-Ultimate-List-of-Email-SPAM-Trigger-Words.aspx (accessed January 2016)."},{"key":"bibr29-0165551516684296","volume-title":"Machine learning","author":"Mitchell TM","year":"1997"},{"key":"bibr30-0165551516684296","first-page":"46","volume":"6","author":"Garson DG","year":"1991","journal-title":"AI Expert"},{"key":"bibr31-0165551516684296","first-page":"124","volume-title":"Proceedings of the international conference of machine learning (ICML\u201999)","author":"Freund Y"},{"key":"bibr32-0165551516684296","doi-asserted-by":"publisher","DOI":"10.1023\/A:1010933404324"},{"key":"bibr33-0165551516684296","first-page":"185","volume-title":"Advances in kernel methods: support vector learning","author":"Platt JC","year":"1999"},{"key":"bibr34-0165551516684296","first-page":"69","volume-title":"Proceedings of the 5th conference on message understanding","author":"Chinchor N"},{"key":"bibr35-0165551516684296","first-page":"250","volume-title":"Proceedings of the 9th international conference on collaborative computing: networking, applications and worksharing (collaboratecom)","author":"Wang D"},{"key":"bibr36-0165551516684296","doi-asserted-by":"publisher","DOI":"10.1145\/1656274.1656278"},{"key":"bibr37-0165551516684296","doi-asserted-by":"publisher","DOI":"10.1016\/S0957-4174(98)00041-4"},{"issue":"3","key":"bibr38-0165551516684296","first-page":"75","volume":"11","author":"Adwan O","year":"2014","journal-title":"Life Sci J"},{"key":"bibr39-0165551516684296","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-35488-8_23"},{"key":"bibr40-0165551516684296","doi-asserted-by":"publisher","DOI":"10.2298\/YJOR1101119N"}],"container-title":["Journal of Information Science"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/0165551516684296","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/0165551516684296","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/0165551516684296","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T23:08:28Z","timestamp":1777504108000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/0165551516684296"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,1,9]]},"references-count":40,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2018,4]]}},"alternative-id":["10.1177\/0165551516684296"],"URL":"https:\/\/doi.org\/10.1177\/0165551516684296","relation":{},"ISSN":["0165-5515","1741-6485"],"issn-type":[{"value":"0165-5515","type":"print"},{"value":"1741-6485","type":"electronic"}],"subject":[],"published":{"date-parts":[[2017,1,9]]}}}