{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,2]],"date-time":"2026-04-02T14:37:36Z","timestamp":1775140656904,"version":"3.50.1"},"reference-count":15,"publisher":"China Science Publishing & Media Ltd.","issue":"3","license":[{"start":{"date-parts":[[2021,6,26]],"date-time":"2021-06-26T00:00:00Z","timestamp":1624665600000},"content-version":"vor","delay-in-days":176,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["direct.mit.edu"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,9,8]]},"abstract":"<jats:p>Medical named entity recognition (NER) is an area in which medical named entities are recognized from medical texts, such as diseases, drugs, surgery reports, anatomical parts, and examination documents. Conventional medical NER methods do not make full use of un-labelled medical texts embedded in medical documents. To address this issue, we proposed a medical NER approach based on pre-trained language models and a domain dictionary. First, we constructed a medical entity dictionary by extracting medical entities from labelled medical texts and collecting medical entities from other resources, such as the Yidu-N4K data set. Second, we employed this dictionary to train domain-specific pre-trained language models using un-labelled medical texts. Third, we employed a pseudo labelling mechanism in un-labelled medical texts to automatically annotate texts and create pseudo labels. Fourth, the BiLSTM-CRF sequence tagging model was used to fine-tune the pre-trained language models. Our experiments on the un-labelled medical texts, which were extracted from Chinese electronic medical records, show that the proposed NER approach enables the strict and relaxed F1 scores to be 88.7% and 95.3%, respectively.<\/jats:p>","DOI":"10.1162\/dint_a_00105","type":"journal-article","created":{"date-parts":[[2021,6,26]],"date-time":"2021-06-26T01:24:20Z","timestamp":1624670660000},"page":"402-417","update-policy":"https:\/\/doi.org\/10.1162\/mitpressjournals.corrections.policy","source":"Crossref","is-referenced-by-count":26,"title":["Medical Named Entity Recognition from Un-labelled Medical Records\n                    based on Pre-trained Language Models and Domain Dictionary"],"prefix":"10.3724","volume":"3","author":[{"given":"Chaojie","family":"Wen","sequence":"first","affiliation":[{"name":"Faculty of Intelligent Manufacturing, Wuyi University, Jiangmen 529020, China"}]},{"given":"Tao","family":"Chen","sequence":"additional","affiliation":[{"name":"Faculty of Intelligent Manufacturing, Wuyi University, Jiangmen 529020, China"}]},{"given":"Xudong","family":"Jia","sequence":"additional","affiliation":[{"name":"Faculty of Intelligent Manufacturing, Wuyi University, Jiangmen 529020, China"}]},{"given":"Jiang","family":"Zhu","sequence":"additional","affiliation":[{"name":"Faculty of Intelligent Manufacturing, Wuyi University, Jiangmen 529020, China"}]}],"member":"2026","published-online":{"date-parts":[[2021,9,8]]},"reference":[{"issue":"5","key":"2021102914232832100_ref1","doi-asserted-by":"crossref","first-page":"808","DOI":"10.1136\/amiajnl-2013-002381","article-title":"A comprehensive study of named entity recognition in Chinese\n                        clinical text","volume":"21","author":"Lei","year":"2014","journal-title":"Journal of the American Medical\n                        Informatics Association"},{"key":"2021102914232832100_ref2","doi-asserted-by":"crossref","first-page":"113942","DOI":"10.1109\/ACCESS.2019.2935223","article-title":"An attention-based BiLSTM-CRF model for Chinese clinic named\n                        entity recognition","volume":"7","author":"Wu","year":"2019","journal-title":"IEEE Access"},{"key":"2021102914232832100_ref3","first-page":"349","article-title":"Recognition of Chinese medicine named entity based on\n                        condition random field","volume":"48","author":"Wang","year":"2009","journal-title":"Journal of Xiamen University\n                        (Natural Science)"},{"key":"2021102914232832100_ref4","first-page":"223","article-title":"A preliminary work on symptom name recognition from free-text\n                        clinical records of traditional Chinese medicine using conditional random\n                        fields and reasonable features","volume-title":"Proceedings of\n                        the 2012 Workshop on Biomedical Natural Language Processing","author":"Wang","year":"2012"},{"issue":"e1","key":"2021102914232832100_ref5","doi-asserted-by":"crossref","first-page":"e84","DOI":"10.1136\/amiajnl-2013-001806","article-title":"Joint segmentation and named entity recognition using dual\n                        decomposition in Chinese discharge summaries","volume":"21","author":"Xu","year":"2014","journal-title":"Journal of the American Medical Informatics Association"},{"key":"2021102914232832100_ref6","first-page":"624","article-title":"Named entity recognition in Chinese clinical text using deep\n                        neural network","volume":"216","author":"Wu","year":"2015","journal-title":"Studies in Health Technology and\n                        Informatics"},{"issue":"11","key":"2021102914232832100_ref7","first-page":"2725","article-title":"Chinese electronic medical record named entity and entity\n                        relationship corpus construction","volume":"27","author":"Yang","year":"2016","journal-title":"Journal of\n                        Software"},{"issue":"20","key":"2021102914232832100_ref8","first-page":"3237","article-title":"Named entity recognition based on bidirectional long\n                        short-term memory combined with case report form","volume":"22","author":"Yang","year":"2018","journal-title":"Chinese Journal of Tissue Engineering Research"},{"key":"2021102914232832100_ref9","doi-asserted-by":"crossref","first-page":"449","DOI":"10.1186\/s12859-018-2467-9","article-title":"A multitask bi-directional RNN model for named entity\n                        recognition on Chinese electronic medical records","volume":"19","author":"Chowdhury","year":"2018","journal-title":"BMC Bioinformatics"},{"issue":"2","key":"2021102914232832100_ref10","first-page":"54","article-title":"The recognition of naming entity of Bi-LSTM Chinese\n                        electronic medical records based on the joint training of Chinese characters\n                        and words","volume":"14","author":"Wan","year":"2019","journal-title":"China Digital Medicine"},{"key":"2021102914232832100_ref11","volume-title":"Google's neural machine translation system: Bridging the gap\n                        between human and machine translation","author":"Wu","year":"2016"},{"key":"2021102914232832100_ref12","first-page":"1","article-title":"Pseudo-label: The simple and efficient semi-supervised\n                        learning method for deep neural networks","volume-title":"Proceedings of ICML 2013 Workshop: Challenges in Representation\n                        Learning (WREPL)","author":"Lee","year":"2013"},{"key":"2021102914232832100_ref13","volume-title":"RoBERTa: A robustly optimized BERT pretraining approach","author":"Liu","year":"2019"},{"key":"2021102914232832100_ref14","volume-title":"Pre-training with whole word masking for Chinese BERT","author":"Cui","year":"2020"},{"key":"2021102914232832100_ref15","doi-asserted-by":"crossref","first-page":"8342","DOI":"10.18653\/v1\/2020.acl-main.740","article-title":"Don't stop pretraining: Adapt language models to\n                        domains and tasks","volume-title":"Proceedings of the 58th\n                        Annual Meeting of the Association for Computational Linguistics","author":"Gururangan","year":"2020"}],"container-title":["Data Intelligence"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/direct.mit.edu\/dint\/article-pdf\/3\/3\/402\/1969110\/dint_a_00105.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"http:\/\/direct.mit.edu\/dint\/article-pdf\/3\/3\/402\/1969110\/dint_a_00105.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,14]],"date-time":"2025-03-14T07:43:56Z","timestamp":1741938236000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.sciengine.com\/doi\/10.1162\/dint_a_00105"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021]]},"references-count":15,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2021,9,8]]}},"URL":"https:\/\/doi.org\/10.1162\/dint_a_00105","relation":{},"ISSN":["2641-435X"],"issn-type":[{"value":"2641-435X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021]]},"published":{"date-parts":[[2021]]}}}