{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,24]],"date-time":"2026-02-24T13:58:12Z","timestamp":1771941492184,"version":"3.50.1"},"reference-count":57,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2026,2,24]],"date-time":"2026-02-24T00:00:00Z","timestamp":1771891200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Systems"],"abstract":"<jats:p>Spam messages are unwanted, irrelevant, or potentially harmful messages sent in bulk to large numbers of recipients via email, SMS, or social media. These messages pose a threat of spam to individual users and commercial companies. They threaten digital communication platforms by enabling phishing, malware distribution, service disruption, and unsolicited advertisements. Several mechanisms have been used in the literature to detect spam over digital communication systems. This includes rule-based filtering, Bayesian filtering, heuristic analysis, and machine learning (ML) techniques. Traditional rule-based and heuristic analyses were insufficient to cope with evolving attack patterns. Meanwhile, ML models can present modern, dynamic, appropriate, and efficient solutions in this manner. This study aims to evaluate and compare several basic ML models for spam detection, considering popular benchmark datasets on several communication platforms as a comprehensive comparative study. The experimental results demonstrate that the tested models achieve good accuracy, precision, recall, and F1-score on each investigated benchmark dataset. However, the performance of all models has decreased drastically when the trained models are tested on an unseen dataset. Recommendations for future required enhancements to handle this reduction in the performance of ML techniques for unseen datasets are provided. Finally, extra experimental tests have shown the positive impact of applying some of these recommendations.<\/jats:p>","DOI":"10.3390\/systems14030229","type":"journal-article","created":{"date-parts":[[2026,2,24]],"date-time":"2026-02-24T12:19:23Z","timestamp":1771935563000},"page":"229","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Machine Learning Based Spam Detection in Digital Communication Systems: A Comparative Analysis"],"prefix":"10.3390","volume":"14","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3844-6409","authenticated-orcid":false,"given":"Maram","family":"Bani Younes","sequence":"first","affiliation":[{"name":"Cybersecurity, Information Technology Faculty, American University of Madaba, Amman 11821, Jordan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ahmad","family":"Ababneh","sequence":"additional","affiliation":[{"name":"Computer Science, Information Technology Faculty, American University of Madaba, Amman 11821, Jordan"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2026,2,24]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Hussain, N., Mirza, H.T., Rasool, G., Hussain, I., and Kaleem, M. (2019). Spam review detection techniques: A systematic literature review. Appl. Sci., 9.","DOI":"10.3390\/app9050987"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"473","DOI":"10.3233\/JCS-210022","article-title":"Advances in spam detection for email spam, web spam, social network spam, and review spam: ML-based and nature-inspired-based techniques","volume":"29","author":"Akinyelu","year":"2021","journal-title":"J. Comput. Secur."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Wang, D., Irani, D., and Pu, C. (2011, January 1\u20132). A social-spam detection framework. Proceedings of the 8th Annual Collaboration, Electronic Messaging, Anti-Abuse and Spam Conference, Perth, Australia.","DOI":"10.1145\/2030376.2030382"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"481","DOI":"10.1016\/j.ins.2016.07.033","article-title":"Follow spam detection based on cascaded social information","volume":"369","author":"Jeong","year":"2016","journal-title":"Inf. Sci."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Song, J., Lee, S., and Kim, J. (2011). Spam filtering in twitter using sender-receiver relationship. International Workshop on Recent Advances in Intrusion Detection, Springer.","DOI":"10.1007\/978-3-642-23644-0_16"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Sinha, P., Maini, O., Malik, G., and Kaushal, R. (2016). Ecosystem of spamming on Twitter: Analysis of spam reporters and spam reportees. 2016 International Conference on Advances in Computing, Communications and Informatics (ICACCI), IEEE.","DOI":"10.1109\/ICACCI.2016.7732293"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Rathod, S.B., and Pattewar, T.M. (2015). Content-based spam detection in email using Bayesian classifier. 2015 International Conference on Communications and Signal Processing (ICCSP), IEEE.","DOI":"10.1109\/ICCSP.2015.7322709"},{"key":"ref_8","unstructured":"Wang, A.H. (2010). Don\u2019t follow me: Spam detection in Twitter. 2010 International Conference on Security and Cryptography (SECRYPT), IEEE."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Peng, W., Huang, L., Jia, J., and Ingram, E. (2018). Enhancing the naive bayes spam filter through intelligent text modification detection. 2018 17th IEEE International Conference on Trust, Security and Privacy in Computing and Communications\/12th IEEE International Conference on Big Data Science and Engineering (TrustCom\/BigDataSE), IEEE.","DOI":"10.1109\/TrustCom\/BigDataSE.2018.00122"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3510420","article-title":"A weak-region enhanced Bayesian classification for spam content-based filtering","volume":"22","author":"Nosrati","year":"2023","journal-title":"ACM Trans. Asian -Low-Resour. Lang. Inf. Process."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"020007","DOI":"10.1063\/1.5038979","article-title":"Understanding of the naive Bayes classifier in spam filtering","volume":"Volume 1967","author":"Wei","year":"2018","journal-title":"AIP Conference Proceedings"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Hossain, F., Uddin, M.N., and Halder, R.K. (2021). Analysis of optimized machine learning and deep learning techniques for spam detection. 2021 IEEE International IOT, Electronics and Mechatronics Conference (IEMTRONICS), IEEE.","DOI":"10.1109\/IEMTRONICS52119.2021.9422508"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"7710005","DOI":"10.1155\/2022\/7710005","article-title":"Efficient E-Mail Spam Detection Strategy Using Genetic Decision Tree Processing with NLP Features","volume":"2022","author":"Ismail","year":"2022","journal-title":"Comput. Intell. Neurosci."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Klimt, B., and Yang, Y. (2004). The Enron corpus: A new dataset for email classification research. Proceedings of the European Conference on Machine Learning (ECML 2004), Springer. Available online: https:\/\/www.kaggle.com\/datasets\/advaithsrao\/enron-fraud-email-dataset.","DOI":"10.1007\/978-3-540-30115-8_22"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Cormack, G.V., and Lynam, T.R. (2026, February 15). TREC 2007 Spam Track: Spam Filter Evaluation Corpus [Data set]. National Institute of Standards and Technology (NIST), Text REtrieval Conference (TREC), Available online: https:\/\/trec.nist.gov\/pubs\/trec16\/papers\/SPAM.OVERVIEW16.pdf.","DOI":"10.6028\/NIST.SP.500-274.spam-overview"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Androutsopoulos, I., Koutsias, J., Chandrinos, K.V., and Spyropoulos, C.D. (2000). An Experimental Comparison of Naive Bayesian and Keyword-Based Anti-Spam Filtering with Personal E-mail Messages [Data Set], Athens University of Economics and Business.","DOI":"10.1145\/345508.345569"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"14994","DOI":"10.48084\/etasr.7631","article-title":"Advancing email spam classification using machine learning and deep learning techniques","volume":"14","author":"Alsuwit","year":"2024","journal-title":"Eng. Technol. Appl. Sci. Res."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"100473","DOI":"10.1016\/j.eij.2024.100473","article-title":"Email spam detection by deep learning models using novel feature selection technique and BERT","volume":"26","author":"Nasreen","year":"2024","journal-title":"Egypt. Inform. J."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"157276","DOI":"10.1109\/ACCESS.2025.3605850","article-title":"SHRED: An Ensemble-Based Machine Learning Model to Sift Email Messages for Real-Time Spam Detection","volume":"13","author":"Alam","year":"2025","journal-title":"IEEE Access"},{"key":"ref_20","first-page":"1","article-title":"Efficient email spam detection using machine learning techniques: A comparative analysis of classification models","volume":"24","author":"Raihen","year":"2024","journal-title":"Int. J. Intell. Comput. Inf. Sci."},{"key":"ref_21","first-page":"205","article-title":"ESD: E-mail spam detection using cybersecurity-driven header analysis and machine learning based content analysis","volume":"20","author":"Batra","year":"2024","journal-title":"Int. J. Perform. Eng."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"04013","DOI":"10.1051\/itmconf\/20257004013","article-title":"Enhancing Spam Filtering: A Comparative Study of Modern Advanced Machine Learning Techniques","volume":"Volume 70","author":"Zhang","year":"2025","journal-title":"ITM Web of Conferences"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"189","DOI":"10.22214\/ijraset.2024.62390","article-title":"Machine Learning Based Email Spam Detection: Achieving High Accuracy and Efficiency","volume":"12","author":"Joglekar","year":"2024","journal-title":"Int. J. Res. Appl. Sci. Eng. Technol."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Qader, W.A., Ameen, M.M., and Ahmed, B.I. (2019). An overview of bag of words: Importance, implementation, applications, and challenges. 2019 International Engineering Conference (IEC), IEEE.","DOI":"10.1109\/IEC47844.2019.8950616"},{"key":"ref_25","first-page":"25","article-title":"Text mining: Use of TF-IDF to examine the relevance of words to documents","volume":"181","author":"Qaiser","year":"2018","journal-title":"Int. J. Comput. Appl."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"12320","DOI":"10.1007\/s11227-021-03743-2","article-title":"Considerations about learning Word2Vec","volume":"77","author":"Buonanno","year":"2021","journal-title":"J. Supercomput."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Pennington, J., Socher, R., and Manning, C.D. (2014, January 25\u201329). Glove: Global vectors for word representation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar.","DOI":"10.3115\/v1\/D14-1162"},{"key":"ref_28","unstructured":"Frank, A. (2026, February 15). UCI Machine Learning Repository. Available online: http:\/\/archive.ics.uci.edu\/ml."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Chen, T., and Kan, M.-Y. (2013). Creating a Live, Public Short Message Service (SMS) Corpus: The NUS SMS Corpus [Data Set], Department of Computer Science, National University of Singapore.","DOI":"10.1007\/s10579-012-9197-9"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Ahmadi, M., Khajavi, M., Varmaghani, A., Ala, A., Danesh, K., and Javaheri, D. (2025). Leveraging large language models for cybersecurity: Enhancing sms spam detection with robust and context-aware text classification. arXiv.","DOI":"10.1080\/23335777.2025.2550938"},{"key":"ref_31","first-page":"161","article-title":"Enhancing Spam Detection Using Hybrid of Harris Hawks and Firefly Optimization Algorithms","volume":"5","author":"Abualhaj","year":"2024","journal-title":"J. Soft Comput. Data Min."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"24306","DOI":"10.1109\/ACCESS.2024.3364671","article-title":"Investigating evasive techniques in SMS spam filtering: A comparative analysis of machine learning models","volume":"12","author":"Salman","year":"2024","journal-title":"IEEE Access"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"447","DOI":"10.12928\/telkomnika.v23i2.26615","article-title":"Enhancing spam detection using Harris Hawks optimization algorithm","volume":"23","author":"Abualhaj","year":"2025","journal-title":"TELKOMNIKA Telecommun. Comput. Electron. Control."},{"key":"ref_34","first-page":"1","article-title":"Spam Detection Using Machine Learning","volume":"16","author":"Patil","year":"2025","journal-title":"IJSAT Int. J. Sci.Technol."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"128160","DOI":"10.1016\/j.eswa.2025.128160","article-title":"A Hybrid TwinSVM-HHO Model for Multilingual Spam Review Detection Using Sentiment Features and Pre-trained Embeddings","volume":"287","author":"Mora","year":"2025","journal-title":"Expert Syst. Appl."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"1049","DOI":"10.3390\/ai5030052","article-title":"Arabic Spam Tweets Classification: A Comprehensive Machine Learning Approach","volume":"5","author":"Hantom","year":"2024","journal-title":"AI"},{"key":"ref_37","first-page":"100550","article-title":"Spam detection for YouTube video comments using machine learning approaches","volume":"16","author":"Xiao","year":"2024","journal-title":"Mach. Learn. Appl."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"103993","DOI":"10.1016\/j.rineng.2025.103993","article-title":"Quantum behaved binary gravitational search algorithm with random forest for twitter spammer detection","volume":"25","author":"Sharma","year":"2025","journal-title":"Results Eng."},{"key":"ref_39","unstructured":"(2026, February 15). Available online: https:\/\/www.kaggle.com\/datasets\/khajahussainsk\/facebook-spam-dataset."},{"key":"ref_40","unstructured":"(2026, February 15). Available online: https:\/\/www.kaggle.com\/datasets\/rajumavinmar\/fake-instagram-profile-dataset."},{"key":"ref_41","unstructured":"(2026, February 15). Available online: https:\/\/www.kaggle.com\/datasets\/greyhatboy\/twitter-spam-dataset."},{"key":"ref_42","unstructured":"(2026, February 15). Available online: https:\/\/www.kaggle.com\/datasets\/ruhul20\/youtube-spam-dataset."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Alshattnawi, S., Shatnawi, A., AlSobeh, A.M.R., and Magableh, A.A. (2024). Beyond word-based model embeddings: Contextualized representations for enhanced social media spam detection. Appl. Sci., 14.","DOI":"10.3390\/app14062254"},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"973","DOI":"10.1109\/TCSS.2018.2878852","article-title":"A neural network-based ensemble approach for spam detection in Twitter","volume":"5","author":"Madisetty","year":"2018","journal-title":"IEEE Trans. Comput. Soc. Syst."},{"key":"ref_45","unstructured":"SpamAssassin Project (2002). SpamAssassin Public Corpus [Data Set], The Apache Software Foundation."},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"133","DOI":"10.1080\/19393555.2015.1078017","article-title":"An Adaptive and Collaborative Server-Side SMS Spam Filtering Scheme Using Artificial Immune System","volume":"24","author":"Onashoga","year":"2015","journal-title":"Inf. Secur. J. Glob. Perspect."},{"key":"ref_47","unstructured":"(2026, February 15). Available online: https:\/\/www.kaggle.com\/datasets\/tinu10kumar\/sms-spam-dataset."},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"863","DOI":"10.1613\/jair.1.11192","article-title":"SMOTE for learning from imbalanced data: Progress and challenges, marking the 15-year anniversary","volume":"61","author":"Garcia","year":"2018","journal-title":"J. Artif. Intell. Res."},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Xie, H., Shao, Y., Li, Z., Alomari, Z., and Makanju, A. (2025). Optimization of class imbalance techniques in machine learning models for network intrusion detection. 2025 9th International Conference on Cryptography, Security and Privacy (CSP), IEEE.","DOI":"10.1109\/CSP66295.2025.00025"},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Huang, Y., and Li, L. (2011). Naive Bayes classification algorithm based on small sample set. 2011 IEEE International Conference on Cloud Computing and Intelligence Systems, IEEE.","DOI":"10.1109\/CCIS.2011.6045027"},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"637","DOI":"10.1162\/089976601300014493","article-title":"Improvements to Platt\u2019s SMO algorithm for SVM classifier design","volume":"13","author":"Keerthi","year":"2001","journal-title":"Neural Comput."},{"key":"ref_52","doi-asserted-by":"crossref","first-page":"465","DOI":"10.1016\/j.cmpb.2013.11.004","article-title":"A random forest classifier for lymph diseases","volume":"113","author":"Azar","year":"2014","journal-title":"Comput. Methods Programs Biomed."},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Yigit, H. (2013). A weighting approach for KNN classifier. 2013 International Conference on Electronics, Computer and Computation (ICECCO), IEEE.","DOI":"10.1109\/ICECCO.2013.6718270"},{"key":"ref_54","first-page":"246","article-title":"Decision tree classifier: A detailed survey","volume":"12","author":"Priyanka","year":"2020","journal-title":"Int. J. Inf. Decis. Sci."},{"key":"ref_55","doi-asserted-by":"crossref","first-page":"13965","DOI":"10.1007\/s10586-018-2158-3","article-title":"Feature extraction using LR-PCA hybridization on Twitter data and classification accuracy using machine learning algorithms","volume":"22","author":"Murugan","year":"2019","journal-title":"Clust. Comput."},{"key":"ref_56","doi-asserted-by":"crossref","first-page":"1615","DOI":"10.59934\/jaiea.v4i3.965","article-title":"Analysis of the Application of Machine Learning Algorithm in Spam Detection System: Literature Review","volume":"4","author":"Putra","year":"2025","journal-title":"J. Artif. Intell. Eng. Appl. (JAIEA)"},{"key":"ref_57","doi-asserted-by":"crossref","unstructured":"Galli, C., Donos, N., and Calciolari, E. (2024). Performance of 4 pre-trained sentence transformer models in the semantic query of a systematic review dataset on peri-implantitis. Information, 15.","DOI":"10.3390\/info15020068"}],"container-title":["Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2079-8954\/14\/3\/229\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,2,24]],"date-time":"2026-02-24T13:00:51Z","timestamp":1771938051000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2079-8954\/14\/3\/229"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,2,24]]},"references-count":57,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2026,3]]}},"alternative-id":["systems14030229"],"URL":"https:\/\/doi.org\/10.3390\/systems14030229","relation":{},"ISSN":["2079-8954"],"issn-type":[{"value":"2079-8954","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,2,24]]}}}