{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,13]],"date-time":"2026-03-13T23:59:25Z","timestamp":1773446365891,"version":"3.50.1"},"reference-count":28,"publisher":"Oxford University Press (OUP)","issue":"3","license":[{"start":{"date-parts":[[2022,12,7]],"date-time":"2022-12-07T00:00:00Z","timestamp":1670371200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/pages\/standard-publication-reuse-rights"}],"funder":[{"name":"China Shanghai Science and Technology Development Fund","award":["19511121204"],"award-info":[{"award-number":["19511121204"]}]},{"name":"Major Key Project of Peng Cheng Laboratory","award":["PCL2021A06"],"award-info":[{"award-number":["PCL2021A06"]}]},{"name":"China Shanghai Municipal Health Commission Advanced Appropriate Technology","award":["2019SY020"],"award-info":[{"award-number":["2019SY020"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023,2,16]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Objectives<\/jats:title><jats:p>To develop an unbiased objective for learning automatic coding algorithms from clinical records annotated with only partial relevant International Classification of Diseases codes, as annotation noise in undercoded clinical records used as training data can mislead the learning process of deep neural networks.<\/jats:p><\/jats:sec><jats:sec><jats:title>Materials and Methods<\/jats:title><jats:p>We use Medical Information Mart for Intensive Care III as our dataset. We employ positive-unlabeled learning to achieve unbiased loss estimation, which is free of misleading training signal. We then utilize reweighting mechanism to compensate for the imbalance between positive and negative samples. To further close the performance gap caused by poor quality annotation, we integrate the supervision provided by the automatic annotation tool Medical Concept Annotation Toolkit which can ease the heavy burden of manual validation.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>Our benchmarking results show that positive-unlabeled learning with reweighting outperforms competitive baseline methods over a range of missing label ratios. Integrating supervision provided by annotation tool further boosted the performance.<\/jats:p><\/jats:sec><jats:sec><jats:title>Discussion<\/jats:title><jats:p>Considering the annotation noise and severe imbalance, unbiased loss estimation and reweighting mechanism are both important for learning from undercoded clinical records. Unbiased loss requires the estimation of false negative ratios and estimation through trained models is practical and competitive.<\/jats:p><\/jats:sec><jats:sec><jats:title>Conclusions<\/jats:title><jats:p>The combination of positive-unlabeled learning with reweighting and supervision provided by the annotation tool is a promising solution to learn from undercoded clinical records.<\/jats:p><\/jats:sec>","DOI":"10.1093\/jamia\/ocac230","type":"journal-article","created":{"date-parts":[[2022,12,7]],"date-time":"2022-12-07T18:27:23Z","timestamp":1670437643000},"page":"438-446","source":"Crossref","is-referenced-by-count":5,"title":["Learning from undercoded clinical records for automated International Classification of Diseases (ICD) coding"],"prefix":"10.1093","volume":"30","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2908-2866","authenticated-orcid":false,"given":"Yucheng","family":"Jin","sequence":"first","affiliation":[{"name":"Shanghai Key Laboratory of Data Science, School of Computer Science, Fudan University , Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8575-5415","authenticated-orcid":false,"given":"Yun","family":"Xiong","sequence":"additional","affiliation":[{"name":"Shanghai Key Laboratory of Data Science, School of Computer Science, Fudan University , Shanghai, China"},{"name":"Peng Cheng Laboratory , Shenzhen, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2286-6732","authenticated-orcid":false,"given":"Dan","family":"Shi","sequence":"additional","affiliation":[{"name":"Department of Geriatrics, Yueyang Hospital of Integrated Traditional Chinese Medicine and Western Medicine, Shanghai University of Traditional Chinese Medicine , Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3184-9213","authenticated-orcid":false,"given":"Yifei","family":"Lin","sequence":"additional","affiliation":[{"name":"Medical Device Regulatory Research Center\/Precision Medicine Research Center, West China Hospital of Sichuan University , Sichuan, China"},{"name":"Department of Epidemiology, Harvard T.H. Chan School of Public Health , Boston, Massachusetts, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7810-9071","authenticated-orcid":false,"given":"Lifang","family":"He","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, Lehigh University , Bethlehem, Pennsylvania, USA"}]},{"given":"Yao","family":"Zhang","sequence":"additional","affiliation":[{"name":"Shanghai Key Laboratory of Data Science, School of Computer Science, Fudan University , Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9686-3876","authenticated-orcid":false,"given":"Joseph M","family":"Plasek","sequence":"additional","affiliation":[{"name":"Division of General Internal Medicine and Primary Care, Department of Medicine, Brigham and Women\u2019s Hospital, Harvard Medical School , Boston, Massachusetts, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3874-4833","authenticated-orcid":false,"given":"Li","family":"Zhou","sequence":"additional","affiliation":[{"name":"Division of General Internal Medicine and Primary Care, Department of Medicine, Brigham and Women\u2019s Hospital, Harvard Medical School , Boston, Massachusetts, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6268-1540","authenticated-orcid":false,"given":"David W","family":"Bates","sequence":"additional","affiliation":[{"name":"Division of General Internal Medicine and Primary Care, Department of Medicine, Brigham and Women\u2019s Hospital, Harvard Medical School , Boston, Massachusetts, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6460-0246","authenticated-orcid":false,"given":"Chunlei","family":"Tang","sequence":"additional","affiliation":[{"name":"Division of General Internal Medicine and Primary Care, Department of Medicine, Brigham and Women\u2019s Hospital, Harvard Medical School , Boston, Massachusetts, USA"}]}],"member":"286","published-online":{"date-parts":[[2022,12,7]]},"reference":[{"key":"2023021611064294400_","author":"World Health Organization","year":"2004"},{"key":"2023021611064294400_","first-page":"1101","author":"Mullenbach","year":"2018"},{"issue":"1","key":"2023021611064294400_","doi-asserted-by":"crossref","first-page":"62","DOI":"10.1186\/s12859-019-2617-8","article-title":"Intelligent diagnosis with Chinese electronic medical records based on convolutional neural networks","volume":"20","author":"Li","year":"2019","journal-title":"BMC Bioinformatics"},{"issue":"11","key":"2023021611064294400_","doi-asserted-by":"crossref","first-page":"1272","DOI":"10.1093\/jamia\/ocz072","article-title":"Toward a clinical text encoder: pretraining for clinical natural language processing with applications to substance misuse","volume":"26","author":"Dligach","year":"2019","journal-title":"J Am Med Inform Assoc"},{"issue":"11","key":"2023021611064294400_","doi-asserted-by":"crossref","first-page":"1279","DOI":"10.1093\/jamia\/ocz085","article-title":"ML-Net: multi-label classification of biomedical texts with deep neural networks","volume":"26","author":"Du","year":"2019","journal-title":"J Am Med Inform Assoc"},{"key":"2023021611064294400_","first-page":"1066","author":"Xie","year":"2018"},{"key":"2023021611064294400_","first-page":"3335","author":"Vu","year":"2020"},{"key":"2023021611064294400_","first-page":"8180","author":"Li","year":"2020"},{"key":"2023021611064294400_","first-page":"3393","author":"Yuan","year":"2020"},{"key":"2023021611064294400_","first-page":"3132","author":"Rios","year":"2018"},{"key":"2023021611064294400_","first-page":"4018","author":"Song","year":"2020"},{"key":"2023021611064294400_","first-page":"2935","author":"Lu","year":"2020"},{"key":"2023021611064294400_","first-page":"132","author":"Lima","year":"1998"},{"key":"2023021611064294400_","first-page":"76","author":"Searle"},{"issue":"1","key":"2023021611064294400_","doi-asserted-by":"crossref","first-page":"138","DOI":"10.1093\/pubmed\/fdr054","article-title":"Systematic review of discharge coding accuracy","volume":"34","author":"Burns","year":"2012","journal-title":"J Public Health (Oxf)"},{"issue":"1","key":"2023021611064294400_","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/sdata.2016.35","article-title":"MIMIC-III, a freely accessible critical care database","volume":"3","author":"Johnson","year":"2016","journal-title":"Sci Data"},{"key":"2023021611064294400_","first-page":"5948","author":"Zhou","year":"2021"},{"key":"2023021611064294400_","doi-asserted-by":"crossref","first-page":"1158","DOI":"10.26615\/978-954-452-072-4_130","article-title":"Multi-label diagnosis classification of Swedish discharge summaries \u2013 ICD-10 code assignment using KB-BERT","author":"Remmer","year":"2021","journal-title":"recent advances in natural language processing (RANLP)"},{"key":"2023021611064294400_","first-page":"1964","author":"Wu","year":"2014"},{"key":"2023021611064294400_","first-page":"410","author":"Chen","year":"2018"},{"key":"2023021611064294400_","first-page":"2301","author":"Xu","year":"2013"},{"key":"2023021611064294400_","first-page":"647","author":"Durand","year":"2019"},{"key":"2023021611064294400_","first-page":"9420","author":"Huynh","year":"2020"},{"key":"2023021611064294400_","first-page":"3711","author":"Qaraei","year":"2021"},{"key":"2023021611064294400_","first-page":"2995","author":"Su","year":"2021"},{"key":"2023021611064294400_","first-page":"1675","author":"Kiryo","year":"2017"},{"key":"2023021611064294400_","first-page":"935","author":"Jain","year":"2016"},{"key":"2023021611064294400_","doi-asserted-by":"crossref","first-page":"102083","DOI":"10.1016\/j.artmed.2021.102083","article-title":"Multi-domain clinical natural language processing with MedCAT: the Medical Concept Annotation Toolkit","volume":"117","author":"Kraljevic","year":"2021","journal-title":"Artif Intell Med"}],"container-title":["Journal of the American Medical Informatics Association"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/jamia\/article-pdf\/30\/3\/438\/49198834\/ocac230.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/jamia\/article-pdf\/30\/3\/438\/49198834\/ocac230.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,3,12]],"date-time":"2023-03-12T13:26:30Z","timestamp":1678627590000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/jamia\/article\/30\/3\/438\/6881436"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,12,7]]},"references-count":28,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2022,12,7]]},"published-print":{"date-parts":[[2023,2,16]]}},"URL":"https:\/\/doi.org\/10.1093\/jamia\/ocac230","relation":{},"ISSN":["1067-5027","1527-974X"],"issn-type":[{"value":"1067-5027","type":"print"},{"value":"1527-974X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2023,3,1]]},"published":{"date-parts":[[2022,12,7]]}}}