{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,10]],"date-time":"2026-04-10T10:00:53Z","timestamp":1775815253943,"version":"3.50.1"},"reference-count":38,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2023,9,20]],"date-time":"2023-09-20T00:00:00Z","timestamp":1695168000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,9,20]],"date-time":"2023-09-20T00:00:00Z","timestamp":1695168000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100007465","name":"UiT The Arctic University of Norway","doi-asserted-by":"crossref","id":[{"id":"10.13039\/100007465","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Int. J. Inf. Secur."],"published-print":{"date-parts":[[2024,2]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Spam emails pose a substantial cybersecurity danger, necessitating accurate classification to reduce unwanted messages and mitigate risks. This study focuses on enhancing spam email classification accuracy using stacking ensemble machine learning techniques. We trained and tested five classifiers: logistic regression, decision tree, K-nearest neighbors (KNN), Gaussian naive Bayes and AdaBoost. To address overfitting, two distinct datasets of spam emails were aggregated and balanced. Evaluating individual classifiers based on recall, precision and F1 score metrics revealed AdaBoost as the top performer. Considering evolving spam technology and new message types challenging traditional approaches, we propose a stacking method. By combining predictions from multiple base models, the stacking method aims to improve classification accuracy. 
The results demonstrate superior performance of the stacking method with the highest accuracy (98.8%), recall (98.8%) and F1 score (98.9%) among tested methods. Additional experiments validated our approach by varying dataset sizes and testing different classifier combinations. Our study presents an innovative combination of classifiers that significantly improves accuracy, contributing to the growing body of research on stacking techniques. Moreover, we compare classifier performances using a unique combination of two datasets, highlighting the potential of ensemble techniques, specifically stacking, in enhancing spam email classification accuracy. The implications extend beyond spam classification systems, offering insights applicable to other classification tasks. Continued research on emerging spam techniques is vital to ensure long-term effectiveness.<\/jats:p>","DOI":"10.1007\/s10207-023-00756-1","type":"journal-article","created":{"date-parts":[[2023,9,20]],"date-time":"2023-09-20T16:02:06Z","timestamp":1695225726000},"page":"505-517","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":45,"title":["Improving spam email classification accuracy using ensemble techniques: a stacking approach"],"prefix":"10.1007","volume":"23","author":[{"given":"Muhammad","family":"Adnan","sequence":"first","affiliation":[]},{"given":"Muhammad Osama","family":"Imam","sequence":"additional","affiliation":[]},{"given":"Muhammad Furqan","family":"Javed","sequence":"additional","affiliation":[]},{"given":"Iqbal","family":"Murtza","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,9,20]]},"reference":[{"issue":"2","key":"756_CR1","doi-asserted-by":"publisher","first-page":"40","DOI":"10.1109\/MSP.2005.38","volume":"3","author":"SL Pfleeger","year":"2005","unstructured":"Pfleeger, S.L., Bloom, G.: Canning spam: proposed solutions to unwanted email. IEEE Secur. Priv. 
3(2), 40\u201347 (2005)","journal-title":"IEEE Secur. Priv."},{"key":"756_CR2","doi-asserted-by":"crossref","unstructured":"Grier, C., Thomas, K., Paxson, V., & Zhang, M. (2010, October). @ spam: the underground on 140 characters or less. in Proceedings of the 17th ACM conference on Computer and communications security (pp. 27\u201337)","DOI":"10.1145\/1866307.1866311"},{"issue":"5","key":"756_CR3","first-page":"16","volume":"136","author":"DK Agarwal","year":"2016","unstructured":"Agarwal, D.K., Kumar, R.: Spam filtering using SVM with different kernel functions. Int. J. Comput. Appl. 136(5), 16\u201323 (2016)","journal-title":"Int. J. Comput. Appl."},{"issue":"3","key":"756_CR4","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/2835375","volume":"48","author":"R Heartfield","year":"2015","unstructured":"Heartfield, R., Loukas, G.: A taxonomy of attacks and a survey of defence mechanisms for semantic social engineering attacks. ACM Comput. Surv. (CSUR) 48(3), 1\u201339 (2015)","journal-title":"ACM Comput. Surv. (CSUR)"},{"key":"756_CR5","unstructured":"John, J. P., Moshchuk, A., Gribble, S. D., & Krishnamurthy, A.: Studying spamming botnets using botlab. in NSDI (Vol. 9, No. 2009) (2009, April)"},{"key":"756_CR6","doi-asserted-by":"crossref","unstructured":"Kumar, N., & Sonowal, S.: Email spam detection using machine learning algorithms. in 2020 Second International Conference on Inventive Research in Computing Applications (ICIRCA)\u00a0(pp. 108\u2013113). IEEE. (2020)","DOI":"10.1109\/ICIRCA48905.2020.9183098"},{"key":"756_CR7","doi-asserted-by":"crossref","unstructured":"Junnarkar, A., Adhikari, S., Fagania, J., Chimurkar, P., & Karia, D.: E-mail spam classification via machine learning and natural language processing. in 2021 Third International Conference on Intelligent Communication Technologies and Virtual Mobile Networks (ICICV)\u00a0(pp. 693\u2013699). IEEE. 
(2021, February)","DOI":"10.1109\/ICICV50876.2021.9388530"},{"issue":"1","key":"756_CR8","first-page":"173","volume":"3","author":"WA Awad","year":"2011","unstructured":"Awad, W.A., ELseuofi, S.M.: Machine learning methods for spam e-mail classification. Int. J. Comput. Sci. Inf. Technol. (IJCSIT) 3(1), 173\u2013184 (2011)","journal-title":"Int. J. Comput. Sci. Inf. Technol. (IJCSIT)"},{"issue":"3","key":"756_CR9","doi-asserted-by":"publisher","first-page":"766","DOI":"10.1109\/TCYB.2015.2415032","volume":"46","author":"F Zhang","year":"2015","unstructured":"Zhang, F., Chan, P.P., Biggio, B., Yeung, D.S., Roli, F.: Adversarial feature selection against evasion attacks. IEEE Trans. Cybern. 46(3), 766\u2013777 (2015)","journal-title":"IEEE Trans. Cybern."},{"key":"756_CR10","doi-asserted-by":"crossref","unstructured":"Shaukat, K., Luo, S., Chen, S., & Liu, D.: Cyber threat detection using machine learning techniques: A performance evaluation perspective. in 2020 international conference on cyber warfare and security (ICCWS) (pp. 1\u20136). IEEE. (2020, October)","DOI":"10.1109\/ICCWS48432.2020.9292388"},{"key":"756_CR11","doi-asserted-by":"publisher","DOI":"10.1155\/2022\/5359540","author":"A Garavand","year":"2022","unstructured":"Garavand, A., Salehnasab, C., Behmanesh, A., Aslani, N., Zadeh, A.H., Ghaderzadeh, M.: Efficient model for coronary artery disease diagnosis: a comparative study of several machine learning algorithms. J. Healthc. Eng. (2022). https:\/\/doi.org\/10.1155\/2022\/5359540","journal-title":"J. Healthc. Eng."},{"key":"756_CR12","doi-asserted-by":"publisher","DOI":"10.1155\/2021\/9942873","author":"M Ghaderzadeh","year":"2021","unstructured":"Ghaderzadeh, M., Aria, M., Asadi, F.: X-ray equipped with artificial intelligence: changing the COVID-19 diagnostic paradigm during the pandemic. BioMed Res. Int. (2021). https:\/\/doi.org\/10.1155\/2021\/9942873","journal-title":"BioMed Res. 
Int."},{"key":"756_CR13","doi-asserted-by":"publisher","first-page":"17259","DOI":"10.1007\/s00521-020-04757-2","volume":"32","author":"P Hajek","year":"2020","unstructured":"Hajek, P., Barushka, A., Munk, M.: Fake consumer review detection using deep neural networks integrating word embeddings and emotion mining. Neural Comput. Appl. 32, 17259\u201317274 (2020)","journal-title":"Neural Comput. Appl."},{"key":"756_CR14","doi-asserted-by":"publisher","first-page":"123","DOI":"10.1016\/j.cose.2012.12.002","volume":"34","author":"V Ramanathan","year":"2013","unstructured":"Ramanathan, V., Wechsler, H.: Phishing detection and impersonated entity discovery using conditional random field and latent Dirichlet allocation. Comput. Secur. 34, 123\u2013139 (2013)","journal-title":"Comput. Secur."},{"issue":"9","key":"756_CR15","doi-asserted-by":"publisher","first-page":"156","DOI":"10.3390\/fi12090156","volume":"12","author":"A Ghourabi","year":"2020","unstructured":"Ghourabi, A., Mahmood, M.A., Alzubi, Q.M.: A hybrid CNN-LSTM model for SMS spam detection in arabic and english messages. Future Internet 12(9), 156 (2020)","journal-title":"Future Internet"},{"key":"756_CR16","doi-asserted-by":"crossref","unstructured":"Madhavan, M. V., Pande, S., Umekar, P., Mahore, T., & Kalyankar, D.: Comparative analysis of detection of email spam with the aid of machine learning approaches. in IOP conference series: materials science and engineering (Vol. 1022, No. 1, p. 012113). IOP Publishing. (2021)","DOI":"10.1088\/1757-899X\/1022\/1\/012113"},{"key":"756_CR17","doi-asserted-by":"publisher","DOI":"10.1155\/2022\/2500772","author":"A Rayan","year":"2022","unstructured":"Rayan, A.: Analysis of e-mail spam detection using a novel machine learning-based hybrid bagging technique. Comput. Intell. Neurosci. (2022). https:\/\/doi.org\/10.1155\/2022\/2500772","journal-title":"Comput. Intell. 
Neurosci."},{"key":"756_CR18","doi-asserted-by":"crossref","unstructured":"Suborna, A.K., Saha, S., Roy, C., Sarkar, S., & Siddique, M.T.H.: An approach to improve the accuracy of detecting spam in online reviews. in 2021 International Conference on Information and Communication Technology for Sustainable Development (ICICT4SD) (pp. 296\u2013299). IEEE. (2021, February)","DOI":"10.1109\/ICICT4SD50815.2021.9396881"},{"key":"756_CR19","doi-asserted-by":"crossref","unstructured":"Fr\u00edas-Blanco, I., Verdecia-Cabrera, A., Ortiz-D\u00edaz, A., & Carvalho, A.: Fast adaptive stacking of ensembles. in Proceedings of the 31st Annual ACM Symposium on Applied Computing\u00a0(pp. 929\u2013934). (2016, April)","DOI":"10.1145\/2851613.2851655"},{"issue":"45","key":"756_CR20","doi-asserted-by":"publisher","first-page":"1242","DOI":"10.21608\/auej.2017.19151","volume":"12","author":"A El-Kareem","year":"2017","unstructured":"El-Kareem, A., Elshenawy, A., Elrfaey, F.: Mail spam detection using stacking classification. J. Al-Azhar Univ. Eng. Sector 12(45), 1242\u20131255 (2017)","journal-title":"J. Al-Azhar Univ. Eng. Sector"},{"key":"756_CR21","doi-asserted-by":"publisher","first-page":"3927","DOI":"10.1007\/s11042-020-09873-8","volume":"80","author":"S Madichetty","year":"2021","unstructured":"Madichetty, S.: A stacked convolutional neural network for detecting the resource tweets during a disaster. Multimed. Tools Appl. 80, 3927\u20133949 (2021)","journal-title":"Multimed. Tools Appl."},{"key":"756_CR22","doi-asserted-by":"publisher","first-page":"144121","DOI":"10.1109\/ACCESS.2021.3121508","volume":"9","author":"H Oh","year":"2021","unstructured":"Oh, H.: A YouTube spam comments detection scheme using cascaded ensemble machine learning model. 
IEEE Access 9, 144121\u2013144128 (2021)","journal-title":"IEEE Access"},{"issue":"3","key":"756_CR23","doi-asserted-by":"publisher","first-page":"936","DOI":"10.3390\/app10030936","volume":"10","author":"C Zhao","year":"2020","unstructured":"Zhao, C., Xin, Y., Li, X., Yang, Y., Chen, Y.: A heterogeneous ensemble learning framework for spam detection in social networks with imbalanced data. Appl. Sci. 10(3), 936 (2020)","journal-title":"Appl. Sci."},{"key":"756_CR24","doi-asserted-by":"publisher","first-page":"35","DOI":"10.1016\/j.cose.2016.12.004","volume":"69","author":"S Liu","year":"2017","unstructured":"Liu, S., Wang, Y., Zhang, J., Chen, C., Xiang, Y.: Addressing the class imbalance problem in twitter spam detection using ensemble learning. Comput. Secur. 69, 35\u201349 (2017)","journal-title":"Comput. Secur."},{"issue":"3","key":"756_CR25","doi-asserted-by":"publisher","first-page":"1971","DOI":"10.3390\/app13031971","volume":"13","author":"TO Omotehinwa","year":"2023","unstructured":"Omotehinwa, T.O., Oyewola, D.O.: Hyperparameter optimization of ensemble models for spam email detection. Appl. Sci. 13(3), 1971 (2023)","journal-title":"Appl. Sci."},{"key":"756_CR26","doi-asserted-by":"publisher","DOI":"10.32604\/cmc.2021.014868","author":"K Sahu","year":"2021","unstructured":"Sahu, K., Alzahrani, F.A., Srivastava, R.K., Kumar, R.: Evaluating the impact of prediction techniques: software reliability perspective. Comput., Mater. Contin. (2021). https:\/\/doi.org\/10.32604\/cmc.2021.014868","journal-title":"Comput., Mater. Contin."},{"issue":"1","key":"756_CR27","doi-asserted-by":"publisher","first-page":"33","DOI":"10.18576\/isl\/090105","volume":"9","author":"K Sahu","year":"2020","unstructured":"Sahu, K., Srivastava, R.K.: Needs and importance of reliability prediction: an industrial perspective. Inf. Sci. Lett. 9(1), 33\u201337 (2020)","journal-title":"Inf. Sci. 
Lett."},{"key":"756_CR28","first-page":"19","volume":"17","author":"K Sahu","year":"2018","unstructured":"Sahu, K., Srivastava, R.K.: Soft computing approach for prediction of software reliability. Neural Netw. 17, 19 (2018)","journal-title":"Neural Netw."},{"key":"756_CR29","unstructured":"Apache Spam Assassin. (2022, November 22) https:\/\/spamassassin.apache.org\/old\/publiccorpus\/"},{"key":"756_CR30","unstructured":"Enron Corp & Cohen, W. W. (2015)\u00a0Enron Email Dataset. United States Federal Energy Regulatory Commissioniler, comp [Philadelphia, PA: William W. Cohen, MLD, CMU] [Software, E-Resource] Retrieved from the Library of Congress, https:\/\/www.loc.gov\/item\/2018487913\/."},{"key":"756_CR31","unstructured":"Scikit-Learn (2022, November 23) https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.feature_extraction.text.TfidfTransformer.html#sklearn.feature_extraction.text.TfidfTransformer."},{"key":"756_CR32","doi-asserted-by":"publisher","unstructured":"Dedeturk, Bilge & Akay, Bahriye. (2020). Spam filtering using a logistic regression model trained by an artificial bee colony algorithm. Applied Soft Computing. 91. 106229. https:\/\/doi.org\/10.1016\/j.asoc.2020.106229.","DOI":"10.1016\/j.asoc.2020.106229"},{"issue":"2","key":"756_CR33","first-page":"79","volume":"14","author":"P Kumar","year":"2017","unstructured":"Kumar, P., Biswas, M.: SVM based image spam detection using kernels: linear, polynomial, RBF, and sigmoid. Int. J. Comput. Sci. Appl. 14(2), 79\u201396 (2017)","journal-title":"Int. J. Comput. Sci. Appl."},{"key":"756_CR34","doi-asserted-by":"publisher","first-page":"106229","DOI":"10.1016\/j.asoc.2020.106229","volume":"91","author":"BK Dedeturk","year":"2020","unstructured":"Dedeturk, B.K., Akay, B.: Spam filtering using a logistic regression model trained by an artificial bee colony algorithm. Appl. Soft Comput. 91, 106229 (2020)","journal-title":"Appl. 
Soft Comput."},{"issue":"1","key":"756_CR35","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s40537-019-0232-1","volume":"6","author":"VM Herrera","year":"2019","unstructured":"Herrera, V.M., Khoshgoftaar, T.M., Villanustre, F., Furht, B.: Random forest implementation and optimization for Big Data analytics on LexisNexis\u2019s high performance computing cluster platform. J. Big Data 6(1), 1\u201336 (2019)","journal-title":"J. Big Data"},{"key":"756_CR36","volume-title":"Machine learning: a probabilistic perspective","author":"KP Murphy","year":"2012","unstructured":"Murphy, K.P.: Machine learning: a probabilistic perspective. MIT press, London (2012)"},{"issue":"1","key":"756_CR37","doi-asserted-by":"publisher","first-page":"119","DOI":"10.1006\/jcss.1997.1504","volume":"55","author":"Y Freund","year":"1997","unstructured":"Freund, Y., Schapire, R.E.: A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 55(1), 119\u2013139 (1997)","journal-title":"J. Comput. Syst. Sci."},{"issue":"4","key":"756_CR38","doi-asserted-by":"publisher","first-page":"427","DOI":"10.1016\/j.ipm.2009.03.002","volume":"45","author":"M Sokolova","year":"2009","unstructured":"Sokolova, M., Lapalme, G.: A systematic analysis of performance measures for classification tasks. Inf. Process. Manage. 45(4), 427\u2013437 (2009)","journal-title":"Inf. Process. 
Manage."}],"container-title":["International Journal of Information Security"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10207-023-00756-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10207-023-00756-1\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10207-023-00756-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,1,23]],"date-time":"2024-01-23T01:10:58Z","timestamp":1705972258000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10207-023-00756-1"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,9,20]]},"references-count":38,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2024,2]]}},"alternative-id":["756"],"URL":"https:\/\/doi.org\/10.1007\/s10207-023-00756-1","relation":{},"ISSN":["1615-5262","1615-5270"],"issn-type":[{"value":"1615-5262","type":"print"},{"value":"1615-5270","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,9,20]]},"assertion":[{"value":"2 September 2023","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"20 September 2023","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Human Participants and\/or Animals"}}]}}