{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,27]],"date-time":"2026-02-27T04:20:11Z","timestamp":1772166011225,"version":"3.50.1"},"reference-count":22,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2025,2,17]],"date-time":"2025-02-17T00:00:00Z","timestamp":1739750400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,2,17]],"date-time":"2025-02-17T00:00:00Z","timestamp":1739750400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Korea Government","award":["RS-2023-00222663"],"award-info":[{"award-number":["RS-2023-00222663"]}]},{"name":"Korea Government","award":["RS-2023-00222663"],"award-info":[{"award-number":["RS-2023-00222663"]}]},{"name":"Korea Government","award":["RS-2023-00222663"],"award-info":[{"award-number":["RS-2023-00222663"]}]},{"DOI":"10.13039\/501100014188","name":"Ministry of Science and ICT","doi-asserted-by":"crossref","award":["2020H1D3A2A03100666"],"award-info":[{"award-number":["2020H1D3A2A03100666"]}],"id":[{"id":"10.13039\/501100014188","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100014188","name":"Ministry of Science and ICT","doi-asserted-by":"crossref","award":["2020H1D3A2A03100666"],"award-info":[{"award-number":["2020H1D3A2A03100666"]}],"id":[{"id":"10.13039\/501100014188","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100014188","name":"Ministry of Science and ICT","doi-asserted-by":"crossref","award":["2020H1D3A2A03100666"],"award-info":[{"award-number":["2020H1D3A2A03100666"]}],"id":[{"id":"10.13039\/501100014188","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Ministry of Health and Welfare,South Korea","award":["HI22C0471"],"award-info":[{"award-number":["HI22C0471"]}]},{"name":"Ministry of Health and Welfare,South Korea","award":["HI22C0471"],"award-info":[{"award-number":["HI22C0471"]}]},{"name":"Ministry of Health and Welfare,South Korea","award":["HI22C0471"],"award-info":[{"award-number":["HI22C0471"]}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Med Inform Decis Mak"],"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Background<\/jats:title>\n                    <jats:p>De-identification of clinical notes is essential to utilize the rich information in unstructured text data in medical research. However, only limited work has been done in removing personal information from clinical notes in Korea.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Methods<\/jats:title>\n                    <jats:p>Our study utilized a comprehensive dataset stored in the Note table of the OMOP Common Data Model at Seoul National University Bundang Hospital. This dataset includes 11,181,617 radiology and 9,282,477 notes from various other departments (non-radiology reports). From this, 0.1% of the reports (11,182) were randomly selected for training and validation purposes. We used two de-identification strategies to improve performance with limited and few annotated data. First, a rule-based approach is used to construct regular expressions on the 1,112 notes annotated by domain experts. Second, by using the regular expressions as label-er, we applied a semi-supervised approach to fine-tune a pre-trained Korean BERT model with pseudo-labeled notes.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>Validation was conducted using 342 radiology and 12 non-radiology notes labeled at the token level. Our rule-based approach achieved 97.2% precision, 93.7% recall, and 96.2% F1 score from the department of radiology notes. For machine learning approach, KoBERT-NER that is fine-tuned with 32,000 automatically pseudo-labeled notes achieved 96.5% precision, 97.6% recall, and 97.1% F1 score.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Conclusion<\/jats:title>\n                    <jats:p>By combining a rule-based approach and machine learning in a semi-supervised way, our results show that the performance of de-identification can be improved.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1186\/s12911-025-02913-z","type":"journal-article","created":{"date-parts":[[2025,2,17]],"date-time":"2025-02-17T00:08:59Z","timestamp":1739750939000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["De-identification of clinical notes with pseudo-labeling using regular expression rules and pre-trained BERT"],"prefix":"10.1186","volume":"25","author":[{"given":"Jiyong","family":"An","sequence":"first","affiliation":[]},{"given":"Jiyun","family":"Kim","sequence":"additional","affiliation":[]},{"given":"Leonard","family":"Sunwoo","sequence":"additional","affiliation":[]},{"given":"Hyunyoung","family":"Baek","sequence":"additional","affiliation":[]},{"given":"Sooyoung","family":"Yoo","sequence":"additional","affiliation":[]},{"given":"Seunggeun","family":"Lee","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,2,17]]},"reference":[{"issue":"3","key":"2913_CR1","doi-asserted-by":"publisher","first-page":"527","DOI":"10.1377\/hlthaff.2011.1314","volume":"31","author":"C Williams","year":"2012","unstructured":"Williams C, Mostashari F, Mertz K, Hogin E, Atwal P. From the office of the national coordinator: the strategy for advancing the exchange of health information. Health Aff. 2012;31(3):527\u201336.","journal-title":"Health Aff"},{"key":"2913_CR2","doi-asserted-by":"crossref","unstructured":"Fuad A, Hsu CY. High rate EHR adoption in Korea and health IT rise in Asia. 2012.","DOI":"10.1016\/j.ijmedinf.2012.04.010"},{"issue":"3","key":"2913_CR3","doi-asserted-by":"publisher","first-page":"196","DOI":"10.1016\/j.ijmedinf.2011.12.002","volume":"81","author":"D Yoon","year":"2012","unstructured":"Yoon D, Chang B-C, Kang SW, Bae H, Park RW. Adoption of electronic health records in Korean tertiary teaching and general hospitals. Int J Med Inf. 2012;81(3):196\u2013203.","journal-title":"Int J Med Inf"},{"issue":"1","key":"2913_CR4","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/sdata.2016.35","volume":"3","author":"AE Johnson","year":"2016","unstructured":"Johnson AE, Pollard TJ, Shen L, et al. MIMIC-III, a freely accessible critical care database. Sci Data. 2016;3(1):1\u20139.","journal-title":"Sci Data"},{"issue":"1","key":"2913_CR5","doi-asserted-by":"publisher","first-page":"57","DOI":"10.1038\/s41746-020-0258-y","volume":"3","author":"B Norgeot","year":"2020","unstructured":"Norgeot B, Muenzen K, Peterson TA, et al. Protected Health Information filter (Philter): accurately and securely de-identifying free-text clinical notes. NPJ Digit Med. 2020;3(1):57.","journal-title":"NPJ Digit Med"},{"key":"2913_CR6","unstructured":"Guo Y, Gaizauskas R, Roberts I, Demetriou G, Hepple M. Identifying personal health information using support vector machines. Paper presented at: i2b2 workshop on challenges in natural language processing for clinical data. 2006. p. 10\u201311."},{"issue":"5","key":"2913_CR7","doi-asserted-by":"publisher","first-page":"550","DOI":"10.1197\/jamia.M2444","volume":"14","author":"\u00d6 Uzuner","year":"2007","unstructured":"Uzuner \u00d6, Luo Y, Szolovits P. Evaluating the state-of-the-art in automatic de-identification. J Am Med Inform Assoc. 2007;14(5):550\u201363.","journal-title":"J Am Med Inform Assoc"},{"key":"2913_CR8","unstructured":"Khin K, Burckhardt P, Padman R. A deep learning architecture for de-identification of patient notes: Implementation and evaluation. arXiv preprint arXiv:1810.01570. 2018."},{"issue":"4","key":"2913_CR9","doi-asserted-by":"publisher","first-page":"1234","DOI":"10.1093\/bioinformatics\/btz682","volume":"36","author":"J Lee","year":"2020","unstructured":"Lee J, Yoon W, Kim S, et al. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics. 2020;36(4):1234\u201340.","journal-title":"Bioinformatics"},{"key":"2913_CR10","doi-asserted-by":"crossref","unstructured":"Alsentzer E, Murphy JR, Boag W, Weng WH, Jin D, Naumann T, McDermott M. Publicly available clinical BERT embeddings. arXiv preprint arXiv:1904.03323. 2019.","DOI":"10.18653\/v1\/W19-1909"},{"key":"2913_CR11","unstructured":"Meaney C, Hakimpour W, Kalia S, Moineddin R. A Comparative Evaluation Of Transformer Models For De-Identification Of Clinical Text Data. arXiv preprint arXiv:2204.07056. 2022."},{"key":"2913_CR12","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12911-018-0723-6","volume":"19","author":"X Yang","year":"2019","unstructured":"Yang X, Lyu T, Li Q, Lee CY, Bian J, Hogan WR, Wu Y. A study of deep learning methods for de-identification of clinical notes in cross-institute settings. BMC Med Inf Decis Mak. 2019;19:1\u20139.","journal-title":"BMC Med Inf Decis Mak"},{"key":"2913_CR13","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12911-019-1002-x","volume":"20","author":"T Hartman","year":"2020","unstructured":"Hartman T, Howell MD, Dean J, Hoory S, Slyper R, Laish I, Gilon O, Vainstein D, Corrado G, Chou K, Po MJ. Customization scenarios for de-identification of clinical notes. BMC Med Inf Decis Mak. 2020;20:1\u20139.","journal-title":"BMC Med Inf Decis Mak"},{"key":"2913_CR14","doi-asserted-by":"publisher","first-page":"214","DOI":"10.1145\/3368555.3384455","volume":"2020","author":"AE Johnson","year":"2020","unstructured":"Johnson AE, Bulgarelli L, Pollard TJ. Deidentification of free-text medical records using pre-trained bidirectional transformers. Proc ACM Conf Health Inference Learn. 2020;2020:214\u201322115.","journal-title":"Proc ACM Conf Health Inference Learn"},{"key":"2913_CR15","doi-asserted-by":"publisher","first-page":"151","DOI":"10.1016\/j.jbi.2013.12.014","volume":"50","author":"C Grouin","year":"2014","unstructured":"Grouin C, N\u00e9v\u00e9ol A. De-identification of clinical notes in French: towards a protocol for reference corpus development. J Biomed Inform. 2014;50:151\u201361.","journal-title":"J Biomed Inform"},{"issue":"8","key":"2913_CR16","doi-asserted-by":"publisher","first-page":"e38154","DOI":"10.2196\/38154","volume":"10","author":"P Wang","year":"2022","unstructured":"Wang P, Li Y, Yang L, Li S, Li L, Zhao Z, Long S, Wang F, Wang H, Li Y, Wang C. An efficient method for deidentifying protected health information in Chinese electronic health records: algorithm development and validation. JMIR Med Inf. 2022;10(8):e38154.","journal-title":"JMIR Med Inf"},{"issue":"1","key":"2913_CR17","doi-asserted-by":"publisher","DOI":"10.3346\/jkms.2015.30.1.7","volume":"30","author":"SY Shin","year":"2015","unstructured":"Shin SY, Park YR, Shin Y, Choi HJ, Park J, Lyu Y, Lee MS, Choi CM, Kim WS, Lee JH. A de-identification method for bilingual clinical texts of various note types. J Korean Med Sci. 2015;30(1): 7.","journal-title":"J Korean Med Sci"},{"key":"2913_CR18","unstructured":"SKTBrain S. Korean BERT pre-trained cased (KoBERT). 2019. Available at:\u00a0https:\/\/github.com\/SKTBrain\/KoBERT."},{"key":"2913_CR19","unstructured":"Park J. KoBERT-NER. 2020. Available at:\u00a0https:\/\/github.com\/monologg\/KoBERT-NER."},{"key":"2913_CR20","unstructured":"Naver. Naver NLP. challenge. 2018. Available at: https:\/\/github.com\/naver\/nlp-challenge."},{"key":"2913_CR21","doi-asserted-by":"crossref","unstructured":"Beltagy I, Lo K, Cohan A. SciBERT: A pretrained language model for scientific text. arXiv preprint arXiv:1903.10676. 2019.","DOI":"10.18653\/v1\/D19-1371"},{"key":"2913_CR22","doi-asserted-by":"crossref","unstructured":"Tai W, Kung HT, Dong XL, Comiter M, Kuo CF. exBERT: Extending pre-trained models with domain-specific vocabulary under constrained training resources. Findings of the Association for Computational Linguistics: EMNLP 2020. 2020;1433\u20139.","DOI":"10.18653\/v1\/2020.findings-emnlp.129"}],"container-title":["BMC Medical Informatics and Decision Making"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12911-025-02913-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s12911-025-02913-z\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12911-025-02913-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,2,17]],"date-time":"2025-02-17T00:09:10Z","timestamp":1739750950000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcmedinformdecismak.biomedcentral.com\/articles\/10.1186\/s12911-025-02913-z"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,2,17]]},"references-count":22,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["2913"],"URL":"https:\/\/doi.org\/10.1186\/s12911-025-02913-z","relation":{"has-preprint":[{"id-type":"doi","id":"10.21203\/rs.3.rs-2672115\/v1","asserted-by":"object"}]},"ISSN":["1472-6947"],"issn-type":[{"value":"1472-6947","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,2,17]]},"assertion":[{"value":"9 March 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"3 February 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"17 February 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"This study was performed in accordance with the relevant guidelines and regulations of SNUBH Institutional Review Board. It was approved by the Institutional Review Board of SNUBH with wavier of informed consent (IRB No.: B-2206-761-002).","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"This study does not contain any individual person\u2019s data in any form (including any individual details, images, or videos). No consent is required for publication.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare no competing interests.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"82"}}