{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,18]],"date-time":"2025-12-18T05:20:46Z","timestamp":1766035246542,"version":"3.48.0"},"reference-count":62,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2025,12,18]],"date-time":"2025-12-18T00:00:00Z","timestamp":1766016000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Big Data"],"abstract":"<jats:sec>\n                    <jats:title>Introduction<\/jats:title>\n                    <jats:p>The exponential growth of heterogeneous, high-velocity CyberSecurity data generated by modern digital infrastructures presents both opportunities and challenges for threat detection, especially against increasingly sophisticated cyber-attacks. Traditional security tools struggle to process such data effectively, highlighting the need for scalable Big Data Analytics and advanced Machine Learning (ML) techniques. However, the black-box nature of many ML models limits interpretability, trust, and regulatory compliance in high-stakes environments.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Methods<\/jats:title>\n                    <jats:p>This study proposes an integrated framework that combines Big Data technologies, ML models, and Explainable Artificial Intelligence (XAI) to enable accurate, transparent, and real-time phishing attack detection. 
The framework leverages distributed computing and stream processing for efficient handling of large and diverse datasets while incorporating XAI methods to generate human-understandable model explanations.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>Experimental evaluation conducted on four publicly available CyberSecurity datasets demonstrates improved phishing detection performance, enhanced interpretability of model decisions, and actionable insights into malicious URL behavior and patterns.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Discussion<\/jats:title>\n                    <jats:p>The proposed approach advances interpretable and scalable CyberSecurity analytics by addressing the gap between predictive accuracy and decision transparency. By integrating Big Data processing with XAI-driven ML, the framework offers a trustworthy solution for real-time threat detection, supporting informed decision-making and regulatory compliance.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.3389\/fdata.2025.1688091","type":"journal-article","created":{"date-parts":[[2025,12,18]],"date-time":"2025-12-18T05:16:15Z","timestamp":1766034975000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Transparent and trustworthy CyberSecurity: an XAI-integrated big data framework for phishing attack detection"],"prefix":"10.3389","volume":"8","author":[{"given":"Muhammad","family":"Nauman","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hafiz Muhammad","family":"Usman 
Akhtar","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Huseyn","family":"Gorbani","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Muhammad","family":"Hadi Ul Hassan","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Muhammad A. B.","family":"Fayyaz","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1965","published-online":{"date-parts":[[2025,12,18]]},"reference":[{"key":"B1","doi-asserted-by":"publisher","first-page":"84","DOI":"10.37502\/IJSMR.2025.8208","article-title":"Harnessing big data analytics for advanced detection of deepfakes and cybersecurity threats across industries","volume":"6","author":"Afolabi","year":"2025","journal-title":"Int. J. Sci. Manag. Res"},{"key":"B2","unstructured":"Agent\n              I.\n            \n          \n          2025 Global Threat Intelligence Report\n          \n          2025"},{"key":"B3","first-page":"5001","article-title":"Intrusion detection in IOT using xgboost and catboost: a comparative study","volume":"12","author":"Ahmad","year":"2024","journal-title":"IEEE Access"},{"key":"B4","doi-asserted-by":"crossref","first-page":"1054","DOI":"10.1145\/3531146.3533168","article-title":"\u201cCounterfactual Shapley Additive Explanations,\u201d","volume-title":"Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency (FAccT '22)","author":"Albini","year":"2022"},{"key":"B5","first-page":"1800","article-title":"Explainable AI for cybersecurity: opportunities and challenges","volume":"20","author":"Ali","year":"","journal-title":"IEEE Trans. Dependable Secure Comput"},{"key":"B6","first-page":"102973","article-title":"State-of-the-art machine learning techniques in cybersecurity: a review","volume":"124","author":"Ali","year":"","journal-title":"Comput. 
Secur"},{"key":"B7","first-page":"45","article-title":"AI-powered cyber attacks: emerging trends and defense strategies","volume":"12","author":"Amara","year":"2024","journal-title":"J. Cybersecur. Res"},{"key":"B8","doi-asserted-by":"publisher","first-page":"82","DOI":"10.1016\/j.inffus.2019.12.012","article-title":"Explainable artificial intelligence (XAI): concepts, taxonomies, opportunities and challenges toward responsible AI","volume":"58","author":"Arrieta","year":"2020","journal-title":"Inf. Fus"},{"key":"B9","doi-asserted-by":"publisher","first-page":"7375","DOI":"10.3390\/su14127375","article-title":"Interpretable machine learning models for malicious domains detection using explainable artificial intelligence (XAI)","volume":"14","author":"Aslam","year":"2022","journal-title":"Sustainability"},{"key":"B10","doi-asserted-by":"publisher","first-page":"112613","DOI":"10.1016\/j.asoc.2024.112613","article-title":"Cybersecurity-aware log management system for critical water infrastructures","volume":"169","author":"Balta","year":"2025","journal-title":"Appl. Soft Comput"},{"key":"B11","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1007\/978-1-4842-4470-8_7","article-title":"\u201cGoogle colaboratory,\u201d","volume-title":"Building Machine Learning and Deep Learning Models on Google Cloud Platform: A Comprehensive Guide for Beginners","author":"Bisong","year":"2019"},{"key":"B12","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3527448","article-title":"Explainable AI: state of the art and challenges","volume":"55","author":"Calzarossa","year":"2023","journal-title":"ACM Comput. 
Surv"},{"key":"B13","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2202.10573","article-title":"Explainable artificial intelligence for cybersecurity: a comprehensive survey","author":"Charmet","year":"2022","journal-title":"arXiv [preprint]"},{"key":"B14","doi-asserted-by":"publisher","first-page":"103080","DOI":"10.1016\/j.simpat.2025.103080","article-title":"Harnessing coloured petri nets to enhance machine learning:a simulation-based method for healthcare and beyond","volume":"140","author":"da Silveira","year":"2025","journal-title":"Simul. Model Pract. Theory"},{"key":"B15","doi-asserted-by":"publisher","first-page":"101666","DOI":"10.1016\/j.giq.2021.101666","article-title":"The perils and pitfalls of explainable AI: strategies for explaining algorithmic decision-making","volume":"39","author":"de Bruijn","year":"2022","journal-title":"Gov. Inf. Q."},{"key":"B16","first-page":"103315","article-title":"Investigation of human-centric cybersecurity risk factors using explainable AI","volume":"133","author":"Fan","year":"2024","journal-title":"Comput. 
Secur"},{"key":"B17","doi-asserted-by":"publisher","first-page":"209517","DOI":"10.14236\/ewic\/icscsr19.16","article-title":"Deep learning techniques for cybersecurity intrusion detection: a systematic review","volume":"8","author":"Ferrag","year":"2020","journal-title":"IEEE Access"},{"key":"B18","first-page":"126327","article-title":"Explainable artificial intelligence (XAI): a bibliometric review","volume":"9","author":"Gianfagna","year":"2021","journal-title":"IEEE Access"},{"key":"B19","doi-asserted-by":"publisher","first-page":"44","DOI":"10.1609\/aimag.v40i2.2850","article-title":"Darpa's explainable artificial intelligence (XAI) program","volume":"40","author":"Gunning","year":"2019","journal-title":"AI Mag"},{"key":"B20","first-page":"1","article-title":"\u201cArtificial intelligence and cybersecurity: the application of lime to detect phishing attacks,\u201d","volume-title":"2020 International Conference on Intelligent Systems and Computer Vision (ISCV)","author":"Hakkoum","year":"2020"},{"key":"B21","doi-asserted-by":"publisher","first-page":"1627078","DOI":"10.3389\/frai.2025.1627078","article-title":"Explainable AI-driven depression detection from social media using natural language processing and black box machine learning models","volume":"8","author":"Hameed","year":"","journal-title":"Front. Artif. Intell"},{"key":"B22","doi-asserted-by":"publisher","first-page":"258","DOI":"10.21015\/vtse.v13i3.2228","article-title":"An explainable deep learning framework for automated classification of ocular diseases in a big data environment","volume":"13","author":"Hameed","year":"","journal-title":"VFAST Trans. Softw. Eng"},{"key":"B23","doi-asserted-by":"publisher","first-page":"98","DOI":"10.1016\/j.is.2014.07.006","article-title":"The rise of big data on cloud computing: review and open research issues","volume":"47","author":"Hashem","year":"2015","journal-title":"Inf. 
Syst"},{"key":"B24","first-page":"4500","article-title":"Dimensionality reduction for industrial iot attack detection using machine learning","volume":"11","author":"Hoang","year":"2024","journal-title":"IEEE Internet Things J"},{"key":"B25","doi-asserted-by":"publisher","first-page":"127817","DOI":"10.1109\/ACCESS.2023.3332030","article-title":"Leveraging big data analytics for enhanced clinical decision-making in healthcare","volume":"11","author":"Hussain","year":"2023","journal-title":"IEEE Access"},{"key":"B26","first-page":"33000","article-title":"Deep learning-based ransomware detection: a comparative study","volume":"11","author":"Hussain","year":"2023","journal-title":"IEEE Access"},{"key":"B27","doi-asserted-by":"publisher","first-page":"35","DOI":"10.3390\/photonics12010035","article-title":"A deep learning-based approach for the detection of various internet of things intrusion attacks through optical networks","volume":"12","author":"Imtiaz","year":"2025","journal-title":"Photonics"},{"key":"B28","first-page":"1","article-title":"Big data analytics in cybersecurity: a survey","volume":"35","author":"Iqbal","year":"2020","journal-title":"J. Comput. Sci. 
Technol"},{"key":"B29","unstructured":"IRONSCALES: A Complete Email Security Solution (Whitepaper)\n          \n          2025"},{"key":"B30","doi-asserted-by":"publisher","first-page":"81","DOI":"10.3390\/technologies12060081","article-title":"A survey of machine learning in edge computing: techniques, frameworks, applications, issues, and research directions","volume":"12","author":"Jouini","year":"2024","journal-title":"Technologies"},{"key":"B31","doi-asserted-by":"publisher","first-page":"36805","DOI":"10.1109\/ACCESS.2023.3252366","article-title":"Phishing detection system through hybrid machine learning based on URL","volume":"11","author":"Karim","year":"2023","journal-title":"IEEE Access"},{"key":"B32","first-page":"48678","article-title":"Guaranteeing explainability in machine learning-based cybersecurity systems: a survey and future directions","volume":"12","author":"Khan","year":"2024","journal-title":"IEEE Access"},{"key":"B33","doi-asserted-by":"publisher","first-page":"90299","DOI":"10.1109\/ACCESS.2024.3420415","article-title":"Guaranteeing correctness in black-box machine learning: a fusion of explainable AI and formal methods for healthcare decision-making","volume":"12","author":"Khan","year":"2024","journal-title":"IEEE Access"},{"key":"B34","first-page":"621","article-title":"Big data analytics and explainable AI for scalable cybersecurity systems","volume":"145","author":"Khan","year":"2024","journal-title":"Future Gener. Comput. Syst"},{"key":"B35","doi-asserted-by":"publisher","first-page":"10245","DOI":"10.1007\/s10115-025-02531-1","article-title":"A comprehensive bibliometric analysis of big data and cyber security: intellectual structure, trends, and global collaborations","volume":"67","author":"Koca","year":"2025","journal-title":"Knowl. Inf. 
Syst"},{"key":"B36","doi-asserted-by":"publisher","first-page":"6300","DOI":"10.3390\/s24196300","article-title":"Securevision: advanced cybersecurity deepfake detection with big data analytics","volume":"24","author":"Kumar","year":"2024","journal-title":"Sensors"},{"key":"B37","article-title":"\u201cA unified approach to interpreting model predictions,\u201d","volume-title":"Advances in Neural Information Processing Systems 30","author":"Lundberg","year":"2017"},{"key":"B38","doi-asserted-by":"publisher","first-page":"618","DOI":"10.1109\/ACCESS.2023.3347028","article-title":"Explainable artificial intelligence (XAI): a systematic literature review on taxonomies and applications in finance","volume":"12","author":"Martins","year":"2024","journal-title":"IEEE Access"},{"key":"B39","doi-asserted-by":"publisher","first-page":"1526221","DOI":"10.3389\/frai.2025.1526221","article-title":"A systematic review on the integration of explainable artificial intelligence in intrusion detection systems to enhancing transparency and interpretability in cybersecurity","volume":"8","author":"Mohale","year":"2025","journal-title":"Front. Artif. Intell."},{"key":"B40","first-page":"1","article-title":"Examining big data analytics for strategic decision-making: a cross-sectoral perspective","volume":"12","author":"Mughal","year":"2025","journal-title":"Big Data Soc"},{"key":"B41","first-page":"700","article-title":"\u201cBig data analysis using apache hadoop,\u201d","volume-title":"2013 IEEE 14th International Conference on Information Reuse &Integration (IRI)","author":"Nandimath","year":"2013"},{"key":"B42","doi-asserted-by":"publisher","first-page":"200","DOI":"10.1109\/ACCESS.2025.3526456","article-title":"The role of big data in transforming diabetes management: a predictive analytics approach","volume":"49","author":"Nauman","year":"2025","journal-title":"J. Med. 
Syst"},{"key":"B43","doi-asserted-by":"publisher","first-page":"143434","DOI":"10.1109\/ACCESS.2021.3121092","article-title":"Improving the correctness of medical diagnostics based on machine learning with coloured petri nets","volume":"9","author":"Nauman","year":"","journal-title":"IEEE Access"},{"key":"B44","doi-asserted-by":"publisher","first-page":"92864","DOI":"10.1109\/ACCESS.2021.3088901","article-title":"Guaranteeing correctness of machine learning based decision making at higher educational institutions","volume":"9","author":"Nauman","year":"","journal-title":"IEEE Access"},{"key":"B45","doi-asserted-by":"publisher","first-page":"1556157","DOI":"10.3389\/fdata.2025.1556157","article-title":"Safeguarding digital livestock farming-a comprehensive cybersecurity roadmap for dairy and poultry industries","volume":"8","author":"Neethirajan","year":"2025","journal-title":"Front Big Data"},{"key":"B46","doi-asserted-by":"publisher","first-page":"1252","DOI":"10.3390\/app13031252","article-title":"Explainable artificial intelligence (XAI) for intrusion detection and mitigation in intelligent connected vehicles: a review","volume":"13","author":"Nwakanma","year":"2023","journal-title":"Appl. Sci."},{"key":"B47","article-title":"Advanced explainable AI techniques for real-time cybersecurity threat detection","author":"Pawlicki","year":"2024","journal-title":"IEEE Trans. Dependable Secure Comput"},{"key":"B48","doi-asserted-by":"publisher","first-page":"179","DOI":"10.1016\/j.jnca.2013.09.016","article-title":"Google drive: forensic analysis of data remnants","volume":"40","author":"Quick","year":"2014","journal-title":"J. Netw. Comput. 
Appl"},{"key":"B49","doi-asserted-by":"crossref","first-page":"1135","DOI":"10.1145\/2939672.2939778","article-title":"\u201c\u201cWhy should i trust you?\u201d: explaining the predictions of any classifier,\u201d","volume-title":"Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","author":"Ribeiro","year":"2016"},{"key":"B50","doi-asserted-by":"publisher","first-page":"145","DOI":"10.1007\/s41060-016-0027-9","article-title":"Big data analytics on apache spark","volume":"1","author":"Salloum","year":"2016","journal-title":"Int. J. Data Sci. Anal"},{"key":"B51","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s40537-020-00318-5","article-title":"Cybersecurity data science: an overview from machine learning perspective","volume":"7","author":"Sarker","year":"2020","journal-title":"J. Big Data"},{"key":"B52","unstructured":"Financial Threat Report 2023: Phishing, PC and Mobile Malware.\n          \n          2023"},{"key":"B53","doi-asserted-by":"publisher","first-page":"173579","DOI":"10.1109\/ACCESS.2020.3041951","article-title":"A survey on machine learning techniques for cybersecurity intrusion detection","volume":"8","author":"Shaukat","year":"2020","journal-title":"IEEE Access"},{"key":"B54","unstructured":"The State of Phishing Report 2022.\n          \n          2022"},{"key":"B55","first-page":"40150","article-title":"Exploring explainable AI with hexadecimal features for malicious url detection","volume":"12","author":"Tashtoush","year":"2024","journal-title":"IEEE Access"},{"key":"B56","first-page":"1200","article-title":"Zero-day attack detection using xgboost and autoencoder: a hybrid approach","volume":"19","author":"Usman","year":"2024","journal-title":"IEEE Trans Inf. 
Forensics Secur"},{"key":"B57","first-page":"3201","article-title":"Lightweight xgboost-based intrusion detection system for iot networks","volume":"13","author":"Usman","year":"","journal-title":"IEEE Internet Things J"},{"key":"B58","first-page":"45500","article-title":"A systematic review of machine learning approaches in cybersecurity: Focus on phishing and social engineering","volume":"13","author":"Usman","year":"","journal-title":"IEEE Access"},{"key":"B59","first-page":"103242","article-title":"Machine learning models for cybersecurity: LS-SVM and beyond","volume":"130","author":"Usman","year":"","journal-title":"Comput. Secur"},{"key":"B60","volume-title":"Python Reference Manual, Volume 111","author":"Van Rossum","year":"1995"},{"key":"B61","first-page":"68070","article-title":"Explainable machine learning for intrusion detection: a comprehensive review","volume":"10","author":"Zhang","year":"2022","journal-title":"IEEE Access"},{"key":"B62","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s40537-015-0013-4","article-title":"Intrusion detection and big heterogeneous data: a survey","volume":"2","author":"Zuech","year":"2015","journal-title":"J. 
Big Data"}],"container-title":["Frontiers in Big Data"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fdata.2025.1688091\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,18]],"date-time":"2025-12-18T05:16:20Z","timestamp":1766034980000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fdata.2025.1688091\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,12,18]]},"references-count":62,"alternative-id":["10.3389\/fdata.2025.1688091"],"URL":"https:\/\/doi.org\/10.3389\/fdata.2025.1688091","relation":{},"ISSN":["2624-909X"],"issn-type":[{"value":"2624-909X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,12,18]]},"article-number":"1688091"}}