{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,21]],"date-time":"2026-05-21T20:23:47Z","timestamp":1779395027370,"version":"3.53.1"},"reference-count":31,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2021,2,26]],"date-time":"2021-02-26T00:00:00Z","timestamp":1614297600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,2,26]],"date-time":"2021-02-26T00:00:00Z","timestamp":1614297600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["npj Digit. Med."],"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Standard reference terminology of diagnoses and risk factors is crucial for billing, epidemiological studies, and inter\/intranational comparisons of diseases. The International Classification of Disease (ICD) is a standardized and widely used method, but the manual classification is an enormously time-consuming endeavor. Natural language processing together with machine learning allows automated structuring of diagnoses using ICD-10 codes, but the limited performance of machine learning models, the necessity of gigantic datasets, and poor reliability of terminal parts of these codes restricted clinical usability. We aimed to create a high performing pipeline for automated classification of reliable ICD-10 codes in the free medical text in cardiology. We focussed on frequently used and well-defined three- and four-digit ICD-10 codes that still have enough granularity to be clinically relevant such as atrial fibrillation (I48), acute myocardial infarction (I21), or dilated cardiomyopathy (I42.0). 
Our pipeline uses a deep neural network known as a Bidirectional Gated Recurrent Unit Neural Network and was trained and tested with 5548 discharge letters and validated in 5089 discharge and procedural letters. As in clinical practice discharge letters may be labeled with more than one code, we assessed the single- and multilabel performance of main diagnoses and cardiovascular risk factors. We investigated using both the entire body of text and only the summary paragraph, supplemented by age and sex. Given the privacy-sensitive information included in discharge letters, we added a de-identification step. The performance was high, with F1 scores of 0.76\u20130.99 for three-character and 0.87\u20130.98 for four-character ICD-10 codes, and was best when using complete discharge letters. Adding variables age\/sex did not affect results. For model interpretability, word coefficients were provided and qualitative assessment of classification was manually performed. Because of its high performance, this pipeline can be useful to decrease the administrative burden of classifying discharge diagnoses and may serve as a scaffold for reimbursement and research applications.<\/jats:p>","DOI":"10.1038\/s41746-021-00404-9","type":"journal-article","created":{"date-parts":[[2021,2,26]],"date-time":"2021-02-26T11:03:35Z","timestamp":1614337415000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":37,"title":["Automatic multilabel detection of ICD10 codes in Dutch cardiology discharge letters using neural networks"],"prefix":"10.1038","volume":"4","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5557-9342","authenticated-orcid":false,"given":"Arjan","family":"Sammani","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Ayoub","family":"Bagheri","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Peter G. 
M.","family":"van der Heijden","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Anneline S. J. M.","family":"te Riele","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Annette F.","family":"Baas","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"C. A. J.","family":"Oosters","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7467-2297","authenticated-orcid":false,"given":"Daniel","family":"Oberski","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1692-8669","authenticated-orcid":false,"given":"Folkert W.","family":"Asselbergs","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2021,2,26]]},"reference":[{"key":"404_CR1","doi-asserted-by":"publisher","first-page":"395","DOI":"10.1038\/nrg3208","volume":"13","author":"PB Jensen","year":"2012","unstructured":"Jensen, P. B., Jensen, L. J. & Brunak, S. Mining electronic health records: towards better research applications and clinical care. Nat. Rev. Genet. 13, 395\u2013405 (2012).","journal-title":"Nat. Rev. Genet."},{"key":"404_CR2","doi-asserted-by":"crossref","unstructured":"Bagheri, A., Sammani, A., van der Heijden, P. G. M., Asselbergs, F. W. & Oberski, D. L. Automatic ICD-10 classification of diseases from Dutch discharge letters. in BIOINFORMATICS 2020\u201411th International Conference on Bioinformatics Models, Methods and Algorithms, Proceedings; Part of 13th International Joint Conference on Biomedical Engineering Systems and Technologies, BIOSTEC 2020 vol. BIOSTEC202. 
281\u2013289 (SCITEPRESS\u2014Science and Technology Publications, 2020).","DOI":"10.5220\/0009372602810289"},{"key":"404_CR3","doi-asserted-by":"publisher","first-page":"596","DOI":"10.3174\/ajnr.A4696","volume":"37","author":"JA Hirsch","year":"2016","unstructured":"Hirsch, J. A. et al. ICD-10: History and context. Am. J. Neuroradiol. 37, 596\u2013599 (2016).","journal-title":"Am. J. Neuroradiol."},{"key":"404_CR4","doi-asserted-by":"publisher","first-page":"49","DOI":"10.1016\/j.ijmedinf.2019.05.015","volume":"129","author":"A Atutxa","year":"2019","unstructured":"Atutxa, A., de Ilarraza, A. D., Gojenola, K., Oronoz, M. & Perez-de-Vi\u00f1aspre, O. Interpretable deep learning to map diagnostic texts to ICD-10 codes. Int. J. Med. Inform. 129, 49\u201359 (2019).","journal-title":"Int. J. Med. Inform."},{"key":"404_CR5","doi-asserted-by":"publisher","first-page":"50","DOI":"10.1016\/j.ijmedinf.2006.11.005","volume":"77","author":"J Stausberg","year":"2008","unstructured":"Stausberg, J., Lehmann, N., Kaczmarek, D. & Stein, M. Reliability of diagnoses coding with ICD-10. Int. J. Med. Inf. 77, 50\u201357 (2008).","journal-title":"Int. J. Med. Inf."},{"key":"404_CR6","doi-asserted-by":"publisher","first-page":"105264","DOI":"10.1016\/j.cmpb.2019.105264","volume":"188","author":"A Blanco","year":"2020","unstructured":"Blanco, A., Perez-de-Vi\u00f1aspre, O., P\u00e9rez, A. & Casillas, A. Boosting ICD multi-label classification of health records with contextual embeddings and label-granularity. Comput. Methods Prog. Biomed. 188, 105264 (2020).","journal-title":"Comput. Methods Prog. Biomed."},{"key":"404_CR7","doi-asserted-by":"crossref","unstructured":"Koopman, B. et al. Automatic classification of diseases from free-text death certificates for real-time surveillance. BMC Med. Inform. Decis. Mak. 
15, 53 (2015).","DOI":"10.1186\/s12911-015-0174-2"},{"key":"404_CR8","doi-asserted-by":"publisher","first-page":"104135","DOI":"10.1016\/j.ijmedinf.2020.104135","volume":"139","author":"WA Sonabend","year":"2020","unstructured":"Sonabend, W. A. et al. Automated ICD coding via unsupervised knowledge integration (UNITE). Int. J. Med. Inform. 139, 104135 (2020).","journal-title":"Int. J. Med. Inform."},{"key":"404_CR9","first-page":"417","volume":"2019","author":"L Cao","year":"2019","unstructured":"Cao, L., Gu, D., Ni, Y. & Xie, G. Automatic ICD Code Assignment based on ICD\u2019s Hierarchy Structure for Chinese Electronic Medical Records. AMIA Jt. Summits Transl. Sci. Proc. 2019, 417\u2013424 (2019).","journal-title":"AMIA Jt. Summits Transl. Sci. Proc."},{"key":"404_CR10","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0173410","volume":"12","author":"YZ Chen","year":"2017","unstructured":"Chen, Y. Z., Lu, H. J. & Li, L. J. Automatic ICD-10 coding algorithm using an improved longest common subsequence based on semantic similarity. PLoS ONE 12, e0173410 (2017).","journal-title":"PLoS ONE"},{"key":"404_CR11","doi-asserted-by":"publisher","first-page":"1279","DOI":"10.1093\/jamia\/ocz085","volume":"26","author":"J Du","year":"2019","unstructured":"Du, J. et al. ML-Net: multi-label classification of biomedical texts with deep neural networks. J. Am. Med. Inform. Assoc. 26, 1279\u20131285 (2019).","journal-title":"J. Am. Med. Inform. Assoc."},{"key":"404_CR12","doi-asserted-by":"publisher","first-page":"64","DOI":"10.1016\/j.jbi.2018.02.011","volume":"80","author":"F Duarte","year":"2018","unstructured":"Duarte, F., Martins, B., Pinto, C. S. & Silva, M. J. Deep neural models for ICD-10 coding of death certificates and autopsy reports in free-text. J. Biomed. Inform. 80, 64\u201377 (2018).","journal-title":"J. Biomed. Inform."},{"key":"404_CR13","doi-asserted-by":"publisher","unstructured":"Karimi, S., Dai, X., Hassanzadeh, H. & Nguyen, A. 
Automatic Diagnosis Coding of Radiology Reports: A Comparison of Deep Learning and Conventional Classification Methods. in BioNLP 328\u2013332 (Association for Computational Linguistics, 2017) https:\/\/doi.org\/10.18653\/v1\/w17-2342.","DOI":"10.18653\/v1\/w17-2342"},{"key":"404_CR14","doi-asserted-by":"publisher","DOI":"10.2196\/12015","volume":"21","author":"C Lin","year":"2019","unstructured":"Lin, C. et al. Projection word embedding model with hybrid sampling training for classifying ICD-10-CM codes: Longitudinal observational study. J. Med. Internet Res. 21, e14499 (2019).","journal-title":"J. Med. Internet Res."},{"key":"404_CR15","doi-asserted-by":"publisher","first-page":"516","DOI":"10.1197\/jamia.M2077","volume":"13","author":"SVS Pakhomov","year":"2006","unstructured":"Pakhomov, S. V. S., Buntrock, J. D. & Chute, C. G. Automating the assignment of diagnosis codes to patient encounters using example-based and machine learning techniques. J. Am. Med. Inform. Assoc. 13, 516\u2013525 (2006).","journal-title":"J. Am. Med. Inform. Assoc."},{"key":"404_CR16","doi-asserted-by":"publisher","first-page":"231","DOI":"10.1136\/amiajnl-2013-002159","volume":"21","author":"A Perotte","year":"2014","unstructured":"Perotte, A. et al. Diagnosis code assignment: models and evaluation metrics. J. Am. Med. Inform. Assoc. 21, 231\u2013237 (2014).","journal-title":"J. Am. Med. Inform. Assoc."},{"key":"404_CR17","unstructured":"Bhavani Singh, A. K., Guntu, M., Bhimireddy, A. R., Gichoya, J. W. & Purkayastha, S. Multi-label natural language processing to identify diagnosis and procedure codes from MIMIC-III inpatient notes. Preprint at arXiv https:\/\/arxiv.org\/abs\/2003.07507 (2020)."},{"key":"404_CR18","unstructured":"Devlin, J., Chang, M. W., Lee, K. & Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. 
in NAACL HLT 2019\u20142019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies\u2014Proceedings of the Conference. vol. 1 4171\u20134186 (2019)."},{"key":"404_CR19","doi-asserted-by":"publisher","unstructured":"Peters, M. E. et al. Deep contextualized word representations. in NAACL HLT 2018\u20142018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies\u2014Proceedings of the Conference. https:\/\/doi.org\/10.18653\/v1\/n18-1202 (2018).","DOI":"10.18653\/v1\/n18-1202"},{"key":"404_CR20","doi-asserted-by":"publisher","first-page":"584","DOI":"10.1093\/jamia\/ocaa001","volume":"27","author":"C Lin","year":"2020","unstructured":"Lin, C. et al. Does BERT need domain adaptation for clinical negation detection? J. Am. Med. Inform. Assoc. 27, 584\u2013591 (2020).","journal-title":"J. Am. Med. Inform. Assoc."},{"key":"404_CR21","doi-asserted-by":"publisher","DOI":"10.2196\/18055","author":"M Abdalla","year":"2020","unstructured":"Abdalla, M., Abdalla, M., Hirst, G. & Rudzicz, F. Exploring the privacy-preserving properties of word embeddings: algorithmic validation study. J. Med. Internet Res. https:\/\/doi.org\/10.2196\/18055 (2020).","journal-title":"J. Med. Internet Res."},{"key":"404_CR22","doi-asserted-by":"publisher","first-page":"727","DOI":"10.1016\/j.tele.2017.08.002","volume":"35","author":"V Menger","year":"2018","unstructured":"Menger, V., Scheepers, F., van Wijk, L. M. & Spruit, M. DEDUCE: a pattern matching method for automatic de-identification of Dutch medical text. Telemat. Inform. 35, 727\u2013736 (2018).","journal-title":"Telemat. Inform."},{"key":"404_CR23","volume":"21","author":"S Sheikhalishahi","year":"2019","unstructured":"Sheikhalishahi, S. et al. Natural language processing of clinical notes on chronic diseases: systematic review. J. Med. Internet Res. 21, e12239 (2019).","journal-title":"J. Med. 
Internet Res."},{"key":"404_CR24","unstructured":"Cao, S., Kitaev, N. & Klein, D. Multilingual alignment of contextual word representations. Preprint at arXiv https:\/\/arxiv.org\/abs\/2002.03518 (2020)."},{"key":"404_CR25","doi-asserted-by":"crossref","unstructured":"Peng, Y., Yan, S. & Lu, Z. Transfer learning in biomedical natural language processing: an evaluation of BERT and ELMo on ten benchmarking datasets. Preprint at arXiv https:\/\/arxiv.org\/abs\/1906.05474 (2019).","DOI":"10.18653\/v1\/W19-5006"},{"key":"404_CR26","doi-asserted-by":"publisher","first-page":"305","DOI":"10.1186\/s13075-019-2092-7","volume":"21","author":"L Jamian","year":"2019","unstructured":"Jamian, L., Wheless, L., Crofford, L. J. & Barnado, A. Rule-based and machine learning algorithms identify patients with systemic sclerosis accurately in the electronic health record. Arthritis Res. Ther. 21, 305 (2019).","journal-title":"Arthritis Res. Ther."},{"key":"404_CR27","doi-asserted-by":"publisher","first-page":"35","DOI":"10.1038\/s41746-019-0113-1","volume":"2","author":"Y Zhang","year":"2019","unstructured":"Zhang, Y., Nie, A., Zehnder, A., Page, R. L. & Zou, J. VetTag: improving automated veterinary diagnosis coding via large-scale language modeling. npj Digit. Med. 2, 35 (2019).","journal-title":"npj Digit. Med."},{"key":"404_CR28","doi-asserted-by":"publisher","first-page":"426","DOI":"10.1007\/s12471-019-1288-4","volume":"27","author":"A Sammani","year":"2019","unstructured":"Sammani, A. et al. UNRAVEL: big data analytics research data platform to improve care of patients with cardiomyopathies using routine electronic health records and standardised biobanking. Neth. Heart J. 27, 426\u2013434 (2019).","journal-title":"Neth. Heart J."},{"key":"404_CR29","doi-asserted-by":"publisher","unstructured":"Jones, O., Maillardet, R. & Robinson, A. Introduction to Scientific Programming and Simulation Using R. 
https:\/\/doi.org\/10.1201\/9781420068740 (2009).","DOI":"10.1201\/9781420068740"},{"key":"404_CR30","unstructured":"Chollet, F. and others. Keras Documentation: Optimizers. Keras.Io https:\/\/keras.io\/optimizers\/ (2015)."},{"key":"404_CR31","unstructured":"Gal, Y. & Ghahramani, Z. Dropout as a Bayesian approximation: representing model uncertainty in deep learning. Preprint at arXiv https:\/\/arxiv.org\/abs\/1506.02142 (2016)."}],"container-title":["npj Digital Medicine"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.nature.com\/articles\/s41746-021-00404-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s41746-021-00404-9","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s41746-021-00404-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,3]],"date-time":"2022-12-03T18:45:35Z","timestamp":1670093135000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.nature.com\/articles\/s41746-021-00404-9"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,2,26]]},"references-count":31,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2021,12]]}},"alternative-id":["404"],"URL":"https:\/\/doi.org\/10.1038\/s41746-021-00404-9","relation":{},"ISSN":["2398-6352"],"issn-type":[{"value":"2398-6352","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,2,26]]},"assertion":[{"value":"16 June 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"26 January 2021","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"26 February 2021","order":3,"name":"first_online","label":"First 
Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"The authors declare no competing interests.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"37"}}