{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,5]],"date-time":"2026-01-05T11:19:41Z","timestamp":1767611981101,"version":"build-2065373602"},"reference-count":38,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2025,1,20]],"date-time":"2025-01-20T00:00:00Z","timestamp":1737331200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100009697","name":"Hashemite University","doi-asserted-by":"publisher","award":["H6574"],"award-info":[{"award-number":["H6574"]}],"id":[{"id":"10.13039\/501100009697","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Future Internet"],"abstract":"<jats:p>Electronic health records (EHRs) are widely used in healthcare institutions worldwide, containing vast amounts of unstructured textual data. However, the sensitive nature of Protected Health Information (PHI) embedded within these records presents significant privacy challenges, necessitating robust de-identification techniques. This paper introduces a novel approach, leveraging a Bi-LSTM-CRF model to achieve accurate and reliable PHI de-identification, using the i2b2 dataset sourced from Harvard University. Unlike prior studies that often unify Bi-LSTM and CRF layers, our approach focuses on the individual design, optimization, and hyperparameter tuning of both the Bi-LSTM and CRF components, allowing for precise model performance improvements. This rigorous approach to architectural design and hyperparameter tuning, often underexplored in the existing literature, significantly enhances the model\u2019s capacity for accurate PHI tag detection while preserving the essential clinical context. Comprehensive evaluations are conducted across 23 PHI categories, as defined by HIPAA, ensuring thorough security across critical domains. The optimized model achieves exceptional performance metrics, with a precision of 99%, recall of 98%, and F1-score of 98%, underscoring its effectiveness in balancing recall and precision. By enabling the de-identification of medical records, this research strengthens patient confidentiality, promotes compliance with privacy regulations, and facilitates safe data sharing for research and analysis.<\/jats:p>","DOI":"10.3390\/fi17010047","type":"journal-article","created":{"date-parts":[[2025,1,20]],"date-time":"2025-01-20T10:32:15Z","timestamp":1737369135000},"page":"47","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["Deep Learning Framework for Advanced De-Identification of Protected Health Information"],"prefix":"10.3390","volume":"17","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7942-9018","authenticated-orcid":false,"given":"Ahmad","family":"Aloqaily","sequence":"first","affiliation":[{"name":"Department of Information Technology, Faculty of Prince Al-Hussein Bin Abdullah II for Information Technology, The Hashemite University, P.O. Box 330127, Zarqa 13133, Jordan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1181-9973","authenticated-orcid":false,"given":"Emad E.","family":"Abdallah","sequence":"additional","affiliation":[{"name":"Department of Information Technology, Faculty of Prince Al-Hussein Bin Abdullah II for Information Technology, The Hashemite University, P.O. Box 330127, Zarqa 13133, Jordan"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-0313-628X","authenticated-orcid":false,"given":"Rahaf","family":"Al-Zyoud","sequence":"additional","affiliation":[{"name":"Department of Information Technology, Faculty of Prince Al-Hussein Bin Abdullah II for Information Technology, The Hashemite University, P.O. Box 330127, Zarqa 13133, Jordan"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-0796-8522","authenticated-orcid":false,"given":"Esraa","family":"Abu Elsoud","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Faculty of Information Technology, Zarqa University, P.O. Box 330127, Zarqa 13133, Jordan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7967-5825","authenticated-orcid":false,"given":"Malak","family":"Al-Hassan","sequence":"additional","affiliation":[{"name":"King Abdullah II School of Information Technology, The University of Jordan, Amman 11942, Jordan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8959-9969","authenticated-orcid":false,"given":"Alaa E.","family":"Abdallah","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Faculty of Prince Al-Hussein Bin Abdullah II for Information Technology, The Hashemite University, P.O. Box 330127, Zarqa 13133, Jordan"}]}],"member":"1968","published-online":{"date-parts":[[2025,1,20]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Ahmed, T., Aziz, M.M.A., Mohammed, N., and Jiang, X. (2021, January 1\u20134). Privacy preserving neural networks for electronic health records de-identification. Proceedings of the 12th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, Gainesville, FL, USA.","DOI":"10.1145\/3459930.3469555"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"123","DOI":"10.1016\/j.infoh.2024.05.001","article-title":"Health informatics to enhance the healthcare industry\u2019s culture: An extensive analysis of its features, contributions, applications and limitations","volume":"1","author":"Javaid","year":"2024","journal-title":"Inform. Health"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"231","DOI":"10.62411\/jcta.9509","article-title":"BEHedas: A blockchain electronic health data system for secure medical records exchange","volume":"1","author":"Oladele","year":"2024","journal-title":"J. Comput. Theor. Appl."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"420","DOI":"10.51594\/imsrj.v4i4.1000","article-title":"Reviewing the impact of health information technology on healthcare management efficiency","volume":"4","author":"Okolo","year":"2024","journal-title":"Int. Med. Sci. Res. J."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"1446","DOI":"10.30574\/wjarr.2024.21.2.0592","article-title":"The impact of electronic health records on patient care and outcomes: A comprehensive review","volume":"21","author":"Adeniyi","year":"2024","journal-title":"World J. Adv. Res. Rev."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"9787","DOI":"10.1038\/s41467-024-54071-x","article-title":"Shareable artificial intelligence to extract cancer outcomes from electronic health records","volume":"15","author":"Kehl","year":"2024","journal-title":"Nat. Commun."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"234","DOI":"10.30574\/wjbphs.2024.19.1.0435","article-title":"The role of data-driven initiatives in enhancing healthcare delivery and patient retention","volume":"19","author":"Ajegbile","year":"2024","journal-title":"World J. Biol. Pharm. Health Sci."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Corte-Real, A., Nunes, T., and da Cunha, P.R. (2024). Reflections about Blockchain in Health Data Sharing: Navigating a Disruptive Technology. Int. J. Environ. Res. Public Health, 21.","DOI":"10.20944\/preprints202401.1016.v1"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Isibor, E. (2024). Regulation of Healthcare Data Security: Legal Obligations in A Digital Age. SSRN.","DOI":"10.2139\/ssrn.4957244"},{"key":"ref_10","unstructured":"Alves, V.M.R.G. (2024). De-Identification of Clinical Text Using Sentence Embeddings. [Master\u2019s Thesis, Universidade do Porto]."},{"key":"ref_11","first-page":"2153","article-title":"De-identification of free text data containing personal health information: A scoping review of reviews","volume":"8","author":"Negash","year":"2023","journal-title":"Int. J. Popul. Data Sci."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"1628","DOI":"10.1056\/NEJMsa0900592","article-title":"Use of electronic health records in US hospitals","volume":"360","author":"Jha","year":"2009","journal-title":"N. Engl. J. Med."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"141","DOI":"10.1016\/j.ijmedinf.2008.06.013","article-title":"Inter-organizational future proof EHR systems: A review of the security and privacy related issues","volume":"78","author":"Kalra","year":"2009","journal-title":"Int. J. Med. Inform."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"73","DOI":"10.1186\/s40537-020-00351-4","article-title":"Survey on RNN and CRF models for de-identification of medical free text","volume":"7","author":"Leevy","year":"2020","journal-title":"J. Big Data"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"S34","DOI":"10.1016\/j.jbi.2017.05.023","article-title":"De-identification of clinical notes via recurrent neural network and conditional random field","volume":"75","author":"Liu","year":"2017","journal-title":"J. Biomed. Inform."},{"key":"ref_16","first-page":"857","article-title":"De-identification of clinical text via Bi-LSTM-CRF with neural language models","volume":"Volume 2019","author":"Tang","year":"2019","journal-title":"Proceedings of the AMIA Annual Symposium Proceedings"},{"key":"ref_17","first-page":"98-012","article-title":"Medical name entity recognition based on Bi-LSTM-CRF and attention mechanism","volume":"40","author":"Zhang","year":"2020","journal-title":"J. Comput. Appl."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"(2021). Chinese-named entity recognition from adverse drug event records: Radical embedding-combined dynamic embedding\u2013based BERT in a bidirectional long short-term conditional random field (Bi-LSTM-CRF) model. JMIR Med. Inform., 9, e26407.","DOI":"10.2196\/26407"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Cai, X., Sun, E., and Lei, J. (2022, January 1\u20133). Research on application of named entity recognition of electronic medical records based on BERT-IDCNN-CRF model. Proceedings of the 6th International Conference on Graphics and Signal Processing, Chiba, Japan.","DOI":"10.1145\/3561518.3561531"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Gao, M., Xiao, Q., Wu, S., and Deng, K. (2019, January 17\u201319). An attention-based ID-CNNs-CRF model for named entity recognition on clinical electronic medical records. Proceedings of the International Conference on Artificial Neural Networks, Munich, Germany.","DOI":"10.1007\/978-3-030-30493-5_25"},{"key":"ref_21","unstructured":"Zavala, R.M.R., Mart\u00ednez, P., and Segura-Bedmar, I. (2018, January 18). A Hybrid Bi-LSTM-CRF model for Knowledge Recognition from eHealth documents. Proceedings of the TASS@ SEPLN, Seville, Spain."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"117","DOI":"10.1007\/s13042-020-01160-0","article-title":"Clinical quantitative information recognition and entity-quantity association from Chinese electronic medical records","volume":"12","author":"Liu","year":"2021","journal-title":"Int. J. Mach. Learn. Cybern."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"110100","DOI":"10.1109\/ACCESS.2022.3213676","article-title":"Bi-LSTM-CRF network for clinical event extraction with medical knowledge features","volume":"10","author":"Zhang","year":"2022","journal-title":"IEEE Access"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"12875","DOI":"10.1007\/s11356-021-13875-w","article-title":"Water quality assessment of a river using deep learning Bi-LSTM methodology: Forecasting and validation","volume":"29","author":"Khullar","year":"2022","journal-title":"Environ. Sci. Pollut. Res."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"2205047","DOI":"10.1002\/adma.202205047","article-title":"Artificial neuronal devices based on emerging materials: Neuronal dynamics and applications","volume":"35","author":"Liu","year":"2023","journal-title":"Adv. Mater."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Dhoot, A., Deva, R., and Shukla, V. (2023, January 27\u201329). A Novel Security Model for Healthcare Prediction by Using DL. Proceedings of the International Conference on Cryptology & Network Security with Machine Learning, Kanpur, India.","DOI":"10.1007\/978-981-97-0641-9_53"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"S6","DOI":"10.1016\/j.jbi.2015.09.018","article-title":"Creation of a new longitudinal corpus of clinical narratives","volume":"58","author":"Kumar","year":"2015","journal-title":"J. Biomed. Inform."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"S20","DOI":"10.1016\/j.jbi.2015.07.020","article-title":"Annotating longitudinal clinical narratives for de-identification: The 2014 i2b2\/UTHealth corpus","volume":"58","author":"Stubbs","year":"2015","journal-title":"J. Biomed. Inform."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"S11","DOI":"10.1016\/j.jbi.2015.06.007","article-title":"Automated systems for the de-identification of longitudinal clinical narratives: Overview of 2014 i2b2\/UTHealth shared task Track 1","volume":"58","author":"Stubbs","year":"2015","journal-title":"J. Biomed. Inform."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"De Santis, E., Martino, A., Ronci, F., and Rizzi, A. (2024). From Bag-of-Words to Transformers: A Comparative Study for Text Classification in Healthcare Discussions in Social Media. IEEE Trans. Emerg. Top. Comput. Intell., early access.","DOI":"10.1109\/TETCI.2024.3423444"},{"key":"ref_31","unstructured":"Sumukh, S. (2023). Better Understanding of Code-Mixed Social Media Data via Information Extraction. [Ph.D. Thesis, International Institute of Information Technology Hyderabad]."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"2486","DOI":"10.1016\/j.procs.2020.03.301","article-title":"Extracting aspect terms using CRF and bi-LSTM models","volume":"167","author":"Gandhi","year":"2020","journal-title":"Procedia Comput. Sci."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"229","DOI":"10.62019\/abbdm.v4i1.111","article-title":"Part-Of-Speech Tagging for Balochi Language: A Data driven application of Conditional Random Fields","volume":"4","author":"Ullah","year":"2024","journal-title":"Asian Bull. Big Data Manag."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Azzouzi, M.E., Coatrieux, G., Bellafqira, R., Delamarre, D., Riou, C., Oubenali, N., Cabon, S., Cuggia, M., and Bouzill\u00e9, G. (2024). Automatic de-identification of French electronic health records: A cost-effective approach exploiting distant supervision and deep learning models. BMC Med. Inform. Decis. Mak., 24.","DOI":"10.1186\/s12911-024-02422-5"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"e12692","DOI":"10.1111\/coin.12692","article-title":"Contextual classification of clinical records with bidirectional long short-term memory (Bi-LSTM) and bidirectional encoder representations from transformers (BERT) model","volume":"40","author":"Zalte","year":"2024","journal-title":"Comput. Intell."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"489","DOI":"10.1007\/s12539-024-00624-z","article-title":"MF-MNER: Multi-models Fusion for MNER in Chinese Clinical Electronic Medical Records","volume":"16","author":"Du","year":"2024","journal-title":"Interdiscip. Sci. Comput. Life Sci."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Dai, C., Zhuang, X., and Cai, J. (2022, January 17\u201319). Chinese Electronic Medical Record Named Entity Recognition Based on Bi-RNN-LSTM-RNN-CRF. Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition, Beijing, China.","DOI":"10.1145\/3581807.3581892"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Taheri, R., Arabikhan, F., Gegov, A., and Akbari, N. (2023, January 22\u201323). Robust Aggregation Function in Federated Learning. Proceedings of the International Conference on Information and Knowledge Systems, Portsmouth, UK.","DOI":"10.1007\/978-3-031-51664-1_12"}],"container-title":["Future Internet"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-5903\/17\/1\/47\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,8]],"date-time":"2025-10-08T10:32:05Z","timestamp":1759919525000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-5903\/17\/1\/47"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,1,20]]},"references-count":38,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2025,1]]}},"alternative-id":["fi17010047"],"URL":"https:\/\/doi.org\/10.3390\/fi17010047","relation":{},"ISSN":["1999-5903"],"issn-type":[{"type":"electronic","value":"1999-5903"}],"subject":[],"published":{"date-parts":[[2025,1,20]]}}}