{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,13]],"date-time":"2026-03-13T08:52:27Z","timestamp":1773391947551,"version":"3.50.1"},"reference-count":24,"publisher":"MDPI AG","issue":"5",
"license":[{"start":{"date-parts":[[2025,4,29]],"date-time":"2025-04-29T00:00:00Z","timestamp":1745884800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],
"funder":[{"DOI":"10.13039\/100016721","name":"Universitas Islam Riau","doi-asserted-by":"publisher","award":["939\/KONTRAK\/P-K-KI\/DPPM-UIR\/10-2024"],"award-info":[{"award-number":["939\/KONTRAK\/P-K-KI\/DPPM-UIR\/10-2024"]}],"id":[{"id":"10.13039\/100016721","id-type":"DOI","asserted-by":"publisher"}]}],
"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],
"abstract":"<jats:p>Phishing URL detection is critical due to the severe cybersecurity threats posed by phishing attacks. While traditional methods rely heavily on handcrafted features and supervised machine learning, recent advances in large language models (LLMs) provide promising alternatives. This paper presents a comprehensive benchmarking study of 21 state-of-the-art open-source LLMs\u2014including Llama3, Gemma, Qwen, Phi, DeepSeek, and Mistral\u2014for phishing URL detection. We evaluate four key prompt engineering techniques\u2014zero-shot, role-playing, chain-of-thought, and few-shot prompting\u2014using a balanced, publicly available phishing URL dataset, with no fine-tuning or additional training of the models conducted, reinforcing the zero-shot, prompt-based nature as a distinctive aspect of our study. The results demonstrate that large open-source LLMs (\u226527B parameters) achieve performance exceeding 90% F1-score without fine-tuning, closely matching proprietary models. Among the prompt strategies, few-shot prompting consistently delivers the highest accuracy (91.24% F1 with Llama3.3_70b), whereas chain-of-thought significantly lowers accuracy and increases inference time. Additionally, our analysis highlights smaller models (7B\u201327B parameters) offering strong performance with substantially reduced computational costs. This study underscores the practical potential of open-source LLMs for phishing detection and provides insights for effective prompt engineering in cybersecurity applications.<\/jats:p>",
"DOI":"10.3390\/info16050366","type":"journal-article","created":{"date-parts":[[2025,4,30]],"date-time":"2025-04-30T05:50:17Z","timestamp":1745992217000},"page":"366","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":6,
"title":["Benchmarking 21 Open-Source Large Language Models for Phishing Link Detection with Prompt Engineering"],"prefix":"10.3390","volume":"16",
"author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6283-3217","authenticated-orcid":false,"given":"Arbi Haza","family":"Nasution","sequence":"first","affiliation":[{"name":"Department of Informatics Engineering, Universitas Islam Riau, Pekanbaru 28284, Indonesia"}]},
{"ORCID":"https:\/\/orcid.org\/0000-0002-4637-5735","authenticated-orcid":false,"given":"Winda","family":"Monika","sequence":"additional","affiliation":[{"name":"Department of Library Information, Universitas Lancang Kuning, Pekanbaru 28266, Indonesia"}]},
{"ORCID":"https:\/\/orcid.org\/0000-0002-9434-5880","authenticated-orcid":false,"given":"Aytug","family":"Onan","sequence":"additional","affiliation":[{"name":"Department of Computer Engineering, College of Engineering and Architecture, Izmir Katip Celebi University, Izmir 35620, Turkey"}]},
{"ORCID":"https:\/\/orcid.org\/0000-0002-8310-2007","authenticated-orcid":false,"given":"Yohei","family":"Murakami","sequence":"additional","affiliation":[{"name":"Faculty of Information Science and Engineering, Ritsumeikan University, Kusatsu 525-8577, Japan"}]}],
"member":"1968","published-online":{"date-parts":[[2025,4,29]]},
"reference":[{"key":"ref_1","unstructured":"(2025, March 31). IBM Security. Cost of a Data Breach Report 2024. Available online: https:\/\/www.ibm.com\/reports\/data-breach."},
{"key":"ref_2","doi-asserted-by":"crossref","first-page":"345","DOI":"10.1016\/j.eswa.2018.09.029","article-title":"Machine learning based phishing detection from URLs","volume":"117","author":"Sahingoz","year":"2019","journal-title":"Expert Syst. Appl."},
{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Zhao, X., Langlois, K., Furst, J., McClellan, S., Fleur, R., An, Y., Hu, X., Uribe-Romo, F., Gualdron, D., and Greenberg, J. (2023, January 15\u201318). When LLM Meets Material Science: An Investigation on MOF Synthesis Labeling. Proceedings of the 2023 IEEE International Conference on Big Data (BigData), Sorrento, Italy.","DOI":"10.1109\/BigData59044.2023.10386438"},
{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Hu, H., Yan, J., Zhang, X., Jiao, Z., and Tang, B. (2024). Overview of CHIP2023 Shared Task 4: CHIP-YIER Medical Large Language Model Evaluation. Communications in Computer and Information Science, Springer Nature.","DOI":"10.1007\/978-981-97-1717-0_11"},
{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Schur, A., and Groenjes, S. (2024). Comparative Analysis for Open-Source Large Language Models. Communications in Computer and Information Science, Springer Nature.","DOI":"10.1007\/978-3-031-49215-0_7"},
{"key":"ref_6","doi-asserted-by":"crossref","first-page":"26839","DOI":"10.1109\/ACCESS.2024.3365742","article-title":"A Review on Large Language Models: Architectures, Applications, Taxonomies, Open Issues and Challenges","volume":"12","author":"Raiaan","year":"2024","journal-title":"IEEE Access"},
{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Ahmed, T., Piovesan, N., De Domenico, A., and Choudhury, S. (2024, January 9\u201313). Linguistic Intelligence in Large Language Models for Telecommunications. Proceedings of the 2024 IEEE International Conference on Communications Workshops (ICC Workshops), Denver, CO, USA.","DOI":"10.1109\/ICCWorkshops59551.2024.10615609"},
{"key":"ref_8","doi-asserted-by":"crossref","first-page":"65643","DOI":"10.1109\/ACCESS.2025.3560254","article-title":"Leveraging Large Language Models for Discrepancy Value Prediction in Custody Transfer Systems: A Comparative Analysis of Probabilistic and Point Forecasting Approaches","volume":"13","author":"Hidayat","year":"2025","journal-title":"IEEE Access"},
{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Khalila, Z., Nasution, A.H., Monika, W., Onan, A., Murakami, Y., Radi, Y.B.I., and Osmani, N.M. (2025). Investigating Retrieval-Augmented Generation in Quranic Studies: A Study of 13 Open-Source Large Language Models. Int. J. Adv. Comput. Sci. Appl., 16.","DOI":"10.14569\/IJACSA.2025.01602134"},
{"key":"ref_10","doi-asserted-by":"crossref","first-page":"71876","DOI":"10.1109\/ACCESS.2024.3402809","article-title":"ChatGPT Label: Comparing the Quality of Human-Generated and LLM-Generated Annotations in Low-Resource Language NLP Tasks","volume":"12","author":"Nasution","year":"2024","journal-title":"IEEE Access"},
{"key":"ref_11","doi-asserted-by":"crossref","first-page":"367","DOI":"10.3390\/make6010018","article-title":"Prompt engineering or fine-tuning? a case study on phishing detection with large language models","volume":"6","author":"Trad","year":"2024","journal-title":"Mach. Learn. Knowl. Extr."},
{"key":"ref_12","doi-asserted-by":"crossref","first-page":"124990","DOI":"10.1109\/ACCESS.2024.3444483","article-title":"From sample poverty to rich feature learning: A new metric learning method for few-shot classification","volume":"12","author":"Zhang","year":"2024","journal-title":"IEEE Access"},
{"key":"ref_13","doi-asserted-by":"crossref","first-page":"5948","DOI":"10.1016\/j.eswa.2014.03.019","article-title":"Phishing detection based on rough set theory","volume":"41","author":"Abdelhamid","year":"2014","journal-title":"Expert Syst. Appl."},
{"key":"ref_14","unstructured":"Koide, T., Fukushi, N., Nakano, H., and Chiba, D. (2024). Chatspamdetector: Leveraging large language models for effective phishing email detection. arXiv."},
{"key":"ref_15","unstructured":"Heiding, T., Schiele, T., and Reuter, C. (2023, January 3\u20137). Large Language Models for Phishing Email Detection: GPT-4 vs. Humans. Proceedings of the 2023 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW), Delft, The Netherlands."},
{"key":"ref_16","unstructured":"Trad, B., and Chehab, L. (2023). Prompting Large Language Models for Phishing URL Detection. arXiv."},
{"key":"ref_17","doi-asserted-by":"crossref","first-page":"104347","DOI":"10.1016\/j.engappai.2021.104347","article-title":"Towards benchmark datasets for machine learning based website phishing detection: An experimental study","volume":"104","author":"Hannousse","year":"2021","journal-title":"Eng. Appl. Artif. Intell."},
{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Haq, Q.E.u., Faheem, M.H., and Ahmad, I. (2024). Detecting Phishing URLs Based on a Deep Learning Approach to Prevent Cyber-Attacks. Appl. Sci., 14.","DOI":"10.3390\/app142210086"},
{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Heiding, F., Schneier, B., Vishwanath, A., Bernstein, J., and Park, P.S. (2023). Devising and detecting phishing: Large language models vs. smaller human models. arXiv.","DOI":"10.1109\/ACCESS.2024.3375882"},
{"key":"ref_20","unstructured":"Nicklas, F., Ventulett, N., and Conrad, J. (2024, January 13\u201314). Enhancing Phishing Email Detection with Context-Augmented Open Large Language Models. Proceedings of the Upper-Rhine Artificial Intelligence Symposium, Offenburg, Germany."},
{"key":"ref_21","unstructured":"Liu, R., Geng, J., Wu, A.J., Sucholutsky, I., Lombrozo, T., and Griffiths, T.L. (2024). Mind your step (by step): Chain-of-thought can reduce performance on tasks where thinking makes humans worse. arXiv."},
{"key":"ref_22","first-page":"2021","article-title":"Web page phishing detection","volume":"3","author":"Hannousse","year":"2021","journal-title":"Mendeley Data"},
{"key":"ref_23","first-page":"24824","article-title":"Chain-of-thought prompting elicits reasoning in large language models","volume":"35","author":"Wei","year":"2022","journal-title":"Adv. Neural Inf. Process. Syst."},
{"key":"ref_24","unstructured":"Wang, Y., Liu, X., and Peng, N. (2024). The Pitfalls of Chain-of-Thought Prompting: Contamination, Misinterpretation, and Overthinking. arXiv."}],
"container-title":["Information"],"original-title":[],"language":"en",
"link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/16\/5\/366\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],
"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T17:24:28Z","timestamp":1760030668000},"score":1,
"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/16\/5\/366"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,4,29]]},"references-count":24,
"journal-issue":{"issue":"5","published-online":{"date-parts":[[2025,5]]}},"alternative-id":["info16050366"],"URL":"https:\/\/doi.org\/10.3390\/info16050366","relation":{},"ISSN":["2078-2489"],"issn-type":[{"value":"2078-2489","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,4,29]]}}}