{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,4]],"date-time":"2026-03-04T15:29:32Z","timestamp":1772638172291,"version":"3.50.1"},"reference-count":53,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2026,3,4]],"date-time":"2026-03-04T00:00:00Z","timestamp":1772582400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Reinventing University Funding (2024) from the Office of the Permanent Secretary, Ministry of Higher Education, Science, Research and Innovation"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Informatics"],"abstract":"<jats:p>Automating ICD-10 coding from discharge summaries remains demanding because coders analyze clinical narratives while justifying decisions. This study compares three automation patterns: PLM-ICD as a standalone deep learning system emitting 15 codes per case, LLM-only generation with full autonomy, and a hybrid approach where PLM-ICD drafts candidates for an agentic LLM audit to accept or reject. All strategies were evaluated on 19,801 MIMIC-IV summaries using four LLMs spanning compact (Qwen2.5-3B-Instruct, Llama-3.2-3B-Instruct, Phi-4-mini-instruct) to large-scale (Sonnet-4.5). Precision guided evaluation because coders still supply any missing diagnoses. PLM-ICD alone reached 55.8% precision while always surfacing 15 suggestions. LLM-only generation lagged severely (1.5\u201334.6% precision) and produced inconsistent output sizes. The agentic audit delivered the best trade-off: compact LLMs reviewed the 15 candidates, discarded weak evidence, and returned 2\u20138 high-confidence codes. Llama-3.2-3B-Instruct, for example, improved from 1.5% as a generator to 55.1% as a verifier while trimming false positives by 73%. These results show that positioning LLMs as quality controllers, rather than primary generators, yields reliable support for clinical coding teams, while formal recall\/F1 reporting remains future work for fully autonomous implementations.<\/jats:p>","DOI":"10.3390\/informatics13030039","type":"journal-article","created":{"date-parts":[[2026,3,4]],"date-time":"2026-03-04T08:51:33Z","timestamp":1772614293000},"page":"39","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Integrating Agentic Artificial Intelligence to Automate International Classification of Diseases, Tenth Revision, Medical Coding"],"prefix":"10.3390","volume":"13","author":[{"ORCID":"https:\/\/orcid.org\/0009-0005-4160-8607","authenticated-orcid":false,"given":"Kitti","family":"Akkhawatthanakun","sequence":"first","affiliation":[{"name":"Department of Computer Engineering, Faculty of Engineering, Mahidol University, Nakorn Pathom 73170, Thailand"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5789-5966","authenticated-orcid":false,"given":"Lalita","family":"Narupiyakul","sequence":"additional","affiliation":[{"name":"Department of Computer Engineering, Faculty of Engineering, Mahidol University, Nakorn Pathom 73170, Thailand"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4444-1396","authenticated-orcid":false,"given":"Konlakorn","family":"Wongpatikaseree","sequence":"additional","affiliation":[{"name":"Department of Computer Engineering, Faculty of Engineering, Mahidol University, Nakorn Pathom 73170, Thailand"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3714-8482","authenticated-orcid":false,"given":"Narit","family":"Hnoohom","sequence":"additional","affiliation":[{"name":"Department of Computer Engineering, Faculty of Engineering, Mahidol University, Nakorn Pathom 73170, Thailand"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1508-3123","authenticated-orcid":false,"given":"Chakkrit","family":"Termritthikun","sequence":"additional","affiliation":[{"name":"School of Renewable Energy and Smart Grid Technology (SGtech), Naresuan University, Phitsanulok 65000, Thailand"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-4218-5176","authenticated-orcid":false,"given":"Paisarn","family":"Muneesawang","sequence":"additional","affiliation":[{"name":"Department of Computer Engineering, Faculty of Engineering, Mahidol University, Nakorn Pathom 73170, Thailand"}]}],"member":"1968","published-online":{"date-parts":[[2026,3,4]]},"reference":[{"key":"ref_1","unstructured":"World Health Organization (2016). International Statistical Classification of Diseases and Related Health Problems 10th Revision (ICD-10)."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"1011","DOI":"10.1097\/01.mlr.0000228018.48783.34","article-title":"Quality of Diagnosis and Procedure Coding in ICD-10 Administrative Data","volume":"44","author":"Henderson","year":"2006","journal-title":"Med. Care"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Edin, J., Junge, A., Havtorn, J.D., Borgholt, L., Maistro, M., Ruotsalo, T., and Maal\u00f8e, L. (2023, January 23\u201327). Automated Medical Coding on MIMIC-III and MIMIC-IV: A Critical Review and Replicability Study. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, Taipei, Taiwan.","DOI":"10.1145\/3539618.3591918"},{"key":"ref_4","unstructured":"Devlin, J., Chang, M.W., Lee, K., and Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Association for Computational Linguistics."},{"key":"ref_5","unstructured":"Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., and Stoyanov, V. (2019). RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"1234","DOI":"10.1093\/bioinformatics\/btz682","article-title":"BioBERT: A pre-trained biomedical language representation model for biomedical text mining","volume":"36","author":"Lee","year":"2019","journal-title":"Bioinformatics"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Alsentzer, E., Murphy, J., Boag, W., Weng, W.H., Jindi, D., Naumann, T., and McDermott, M. (2019). Publicly Available Clinical BERT Embeddings. Proceedings of the 2nd Clinical Natural Language Processing Workshop, Association for Computational Linguistics.","DOI":"10.18653\/v1\/W19-1909"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Huang, C.W., Tsai, S.C., and Chen, Y.N. (2022). PLM-ICD: Automatic ICD Coding with Pretrained Language Models. Proceedings of the 4th Clinical Natural Language Processing Workshop, Association for Computational Linguistics.","DOI":"10.18653\/v1\/2022.clinicalnlp-1.2"},{"key":"ref_9","first-page":"3005","article-title":"Human-in-the-loop machine learning: A state of the art","volume":"56","year":"2022","journal-title":"Artif. Intell. Rev."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"1801","DOI":"10.1093\/jamia\/ocae202","article-title":"Large language models in biomedicine and health: Current research landscape and future directions","volume":"31","author":"Lu","year":"2024","journal-title":"J. Am. Med. Inform. Assoc."},{"key":"ref_11","unstructured":"Nori, H., King, N., McKinney, S.M., Carignan, D., and Horvitz, E. (2023). Capabilities of GPT-4 on Medical Challenge Problems. arXiv."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"172","DOI":"10.1038\/s41586-023-06291-2","article-title":"Large language models encode clinical knowledge","volume":"620","author":"Singhal","year":"2023","journal-title":"Nature"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"231","DOI":"10.1136\/amiajnl-2013-002159","article-title":"Diagnosis code assignment: Models and evaluation metrics","volume":"21","author":"Perotte","year":"2014","journal-title":"J. Am. Med. Inform. Assoc."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Masud, J.H.B., Kuo, C.C., Yeh, C.Y., Yang, H.C., and Lin, M.C. (2023). Applying Deep Learning Model to Predict Diagnosis Code of Medical Records. Diagnostics, 13.","DOI":"10.3390\/diagnostics13132297"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Li, H., Mourad, A., Zhuang, S., Koopman, B., and Zuccon, G. (2022). Pseudo Relevance Feedback with Deep Language Models and Dense Retrievers: Successes and Pitfalls. arXiv.","DOI":"10.1145\/3572960.3572982"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Manning, C.D., Raghavan, P., and Sch\u00fctze, H. (2008). Introduction to Information Retrieval, Cambridge University Press.","DOI":"10.1017\/CBO9780511809071"},{"key":"ref_17","unstructured":"Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013). Efficient Estimation of Word Representations in Vector Space. arXiv."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"604","DOI":"10.1109\/TNNLS.2020.2979670","article-title":"A Survey of the Usages of Deep Learning for Natural Language Processing","volume":"32","author":"Otter","year":"2021","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"37","DOI":"10.1038\/s41746-021-00404-9","article-title":"Automatic Multilabel Detection of ICD-10 Codes in Dutch Cardiology Discharge Letters Using Neural Networks","volume":"4","author":"Sammani","year":"2021","journal-title":"npj Digit. Med."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Xie, P., and Xing, E. (2018). A Neural Architecture for Automated ICD Coding. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics.","DOI":"10.18653\/v1\/P18-1098"},{"key":"ref_21","first-page":"5998","article-title":"Attention is All you Need","volume":"30","author":"Vaswani","year":"2017","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_22","first-page":"1101","article-title":"Explainable Prediction of Medical Codes from Clinical Text","volume":"Volume 1","author":"Mullenbach","year":"2018","journal-title":"Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Vu, T., Nguyen, D.Q., and Nguyen, A. (2020, January 11\u201317). A Label Attention Model for ICD Coding from Clinical Text. Proceedings of the 29th International Joint Conference on Artificial Intelligence, Yokohama, Japan.","DOI":"10.24963\/ijcai.2020\/461"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"104161","DOI":"10.1016\/j.jbi.2022.104161","article-title":"Hierarchical label-wise attention transformer model for explainable ICD coding","volume":"133","author":"Liu","year":"2022","journal-title":"J. Biomed. Inform."},{"key":"ref_25","first-page":"8180","article-title":"ICD Coding from Clinical Text Using Multi-Filter Residual Convolutional Neural Network","volume":"34","author":"Li","year":"2020","journal-title":"Proc. AAAI Conf. Artif. Intell."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Qiu, W., Wu, Y., Li, Y., Niu, K., Zeng, M., and Li, M. (2023, January 5\u20138). DILM-ICD: A Deep Iterative Learning Model for Automatic ICD Coding. Proceedings of the 2023 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Istanbul, Turkiye.","DOI":"10.1109\/BIBM58861.2023.10385585"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Zhang, W., Yan, J., Wang, X., and Zha, H. (2018, January 11\u201314). Deep Extreme Multi-label Learning. Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval, Yokohama, Japan.","DOI":"10.1145\/3206025.3206030"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Pascual, D., Luck, S., and Wattenhofer, R. (2021). Towards BERT-based Automatic ICD Coding: Limitations and Opportunities. Proceedings of the 20th Workshop on Biomedical Language Processing, Association for Computational Linguistics.","DOI":"10.18653\/v1\/2021.bionlp-1.6"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Rios, A., and Kavuluru, R. (2018). Few-Shot and Zero-Shot Multi-Label Learning for Structured Label Spaces. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics.","DOI":"10.18653\/v1\/D18-1352"},{"key":"ref_30","unstructured":"Touvron, H., Martin, L., Stone, K., Albert, P., Almahairi, A., Babaei, Y., Bashlykov, N., Batra, S., Bhargava, P., and Bhosale, S. (2023). Llama 2: Open Foundation and Fine-Tuned Chat Models. arXiv."},{"key":"ref_31","unstructured":"Abdin, M., Aneja, J., Behl, H., Bubeck, S., Eldan, R., Gunasekar, S., Harrison, M., Hewett, R.J., Javaheripi, M., and Kauffmann, P. (2024). Phi-4 Technical Report. arXiv."},{"key":"ref_32","unstructured":"Bai, Y., Kadavath, S., Kundu, S., Askell, A., Kernion, J., Jones, A., Chen, A., Goldie, A., Mirhoseini, A., and McKinnon, C. (2022). Constitutional AI: Harmlessness from AI Feedback. arXiv."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"230","DOI":"10.1136\/svn-2017-000101","article-title":"Artificial intelligence in healthcare: Past, present and future","volume":"2","author":"Jiang","year":"2017","journal-title":"Stroke Vasc. Neurol."},{"key":"ref_34","unstructured":"Kim, Y., Jeong, H., Park, C., Park, E., Zhang, H., Liu, X., Lee, H., McDuff, D., Ghassemi, M., and Breazeal, C. (2025). Tiered Agentic Oversight: A Hierarchical Multi-Agent System for Healthcare Safety. arXiv."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"e72644","DOI":"10.2196\/72644","article-title":"Prompt Engineering in Clinical Practice: Tutorial for Clinicians","volume":"27","author":"Liu","year":"2025","journal-title":"J. Med. Internet Res."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"e55318","DOI":"10.2196\/55318","article-title":"An Empirical Evaluation of Prompting Strategies for Large Language Models in Zero-Shot Clinical Natural Language Processing: Algorithm Development and Validation Study","volume":"12","author":"Sivarajkumar","year":"2024","journal-title":"JMIR Med. Inform."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1038\/s41746-024-01029-4","article-title":"Prompt engineering in consistency and reliability with the evidence-based guideline for LLMs","volume":"7","author":"Wang","year":"2024","journal-title":"npj Digit. Med."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"e60501","DOI":"10.2196\/60501","article-title":"Prompt Engineering Paradigms for Medical Applications: Scoping Review","volume":"26","author":"Zaghir","year":"2024","journal-title":"J. Med. Internet Res."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Zaghir, J., Naguib, M., Bjelogrlic, M., N\u00e9v\u00e9ol, A., Tannier, X., and Lovis, C. (2024). Prompt engineering paradigms for medical applications: Scoping review and recommendations for better practices. arXiv.","DOI":"10.2196\/preprints.60501"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Zouhar, V., Meister, C., Gastaldi, J.L., Du, L., Vieira, T., Sachan, M., and Cotterell, R. (2023). A Formal Perspective on Byte-Pair Encoding. arXiv.","DOI":"10.18653\/v1\/2023.findings-acl.38"},{"key":"ref_41","unstructured":"Nguyen, T.T., Schlegel, V., Kashyap, A., Winkler, S., Huang, S.S., Liu, J.J., and Lin, C.J. (2023). MIMIC-IV-ICD: A New Benchmark for Extreme Multilabel Classification. arXiv."},{"key":"ref_42","unstructured":"Wallach, H., Larochelle, H., Beygelzimer, A., d\u2019Alch\u00e9-Buc, F., Fox, E., and Garnett, R. (2019). PyTorch: An Imperative Style, High-Performance Deep Learning Library. Advances in Neural Information Processing Systems, Curran Associates, Inc."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C., Moi, A., Cistac, P., Rault, T., Louf, R., and Funtowicz, M. (2020). HuggingFace\u2019s Transformers: State-of-the-art Natural Language Processing. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Association for Computational Linguistics.","DOI":"10.18653\/v1\/2020.emnlp-demos.6"},{"key":"ref_44","unstructured":"Bai, J., Bai, S., Chu, Y., Cui, Z., Dang, K., Deng, X., Fan, Y., Ge, W., Han, Y., and Huang, F. (2023). Qwen Technical Report. arXiv."},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Davis, J., and Goadrich, M. (2006, January 25\u201329). The Relationship Between Precision-Recall and ROC Curves. Proceedings of the 23rd International Conference on Machine Learning (ICML \u201906), Pittsburgh, PA, USA.","DOI":"10.1145\/1143844.1143874"},{"key":"ref_46","unstructured":"Johnson, A., Pollard, T., Horng, S., Celi, L.A., and Mark, R. (2023). MIMIC-IV-Note: Deidentified free-text clinical notes. PhysioNet."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41597-022-01899-x","article-title":"MIMIC-IV, a freely accessible electronic health record dataset","volume":"10","author":"Johnson","year":"2023","journal-title":"Sci. Data"},{"key":"ref_48","unstructured":"Pham, H., Wang, G., Lu, Y., Florencio, D., and Zhang, C. (2022). Understanding Long Documents with Different Position-Aware Attentions. arXiv."},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"527","DOI":"10.1136\/jamia.2001.0080527","article-title":"Clinical Decision Support Systems for the Practice of Evidence-based Medicine","volume":"8","author":"Sim","year":"2001","journal-title":"J. Am. Med. Inform. Assoc."},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"AIdbp2300040","DOI":"10.1056\/AIdbp2300040","article-title":"Large Language Models Are Poor Medical Coders\u2014Benchmarking of Medical Code Querying","volume":"1","author":"Soroush","year":"2024","journal-title":"NEJM AI"},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1007\/3-540-45014-9_1","article-title":"Ensemble methods in machine learning","volume":"1857","author":"Dietterich","year":"2000","journal-title":"Lect. Notes Comput. Sci."},{"key":"ref_52","doi-asserted-by":"crossref","first-page":"40082","DOI":"10.57187\/smw.2023.40082","article-title":"Tackling alert fatigue with a semi-automated clinical decision support system: Quantitative evaluation and end-user survey","volume":"153","author":"Dahmke","year":"2023","journal-title":"Swiss Med. Wkly."},{"key":"ref_53","doi-asserted-by":"crossref","first-page":"e078378","DOI":"10.1136\/bmj-2023-078378","article-title":"TRIPOD+AI statement: Updated guidance for reporting clinical prediction models that use regression or machine learning methods","volume":"385","author":"Collins","year":"2024","journal-title":"BMJ"}],"container-title":["Informatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2227-9709\/13\/3\/39\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,4]],"date-time":"2026-03-04T09:32:55Z","timestamp":1772616775000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2227-9709\/13\/3\/39"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,3,4]]},"references-count":53,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2026,3]]}},"alternative-id":["informatics13030039"],"URL":"https:\/\/doi.org\/10.3390\/informatics13030039","relation":{},"ISSN":["2227-9709"],"issn-type":[{"value":"2227-9709","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,3,4]]}}}