{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,30]],"date-time":"2026-04-30T07:25:13Z","timestamp":1777533913379,"version":"3.51.4"},"reference-count":41,"publisher":"Oxford University Press (OUP)","issue":"10","license":[{"start":{"date-parts":[[2020,10,1]],"date-time":"2020-10-01T00:00:00Z","timestamp":1601510400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"name":"n2c2 challenge organizers"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2020,10,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Objective<\/jats:title><jats:p>Normalizing clinical mentions to concepts in standardized medical terminologies, in general, is challenging due to the complexity and variety of the terms in narrative medical records. In this article, we introduce our work on a clinical natural language processing (NLP) system to automatically normalize clinical mentions to concept unique identifiers in the Unified Medical Language System. This work was part of the 2019 n2c2 (National NLP Clinical Challenges) Shared-Task and Workshop on Clinical Concept Normalization.<\/jats:p><\/jats:sec><jats:sec><jats:title>Materials and Methods<\/jats:title><jats:p>We developed a hybrid clinical NLP system that combines a generic multilevel matching framework, customizable matching components, and machine learning ranking systems. We explored 2 machine learning ranking systems based on either an ensemble of various similarity features extracted from pretrained encoders or a Siamese attention network, targeting efficient and fast semantic searching\/ranking. 
Besides, we also evaluated the performance of a general-purpose clinical NLP system based on Unstructured Information Management Architecture.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>The systems were evaluated as part of the 2019 n2c2 challenge, and our original best system in the challenge obtained an accuracy of 0.8101, ranked fifth in the challenge. The improved system with newly designed machine learning ranking based on Siamese attention network improved the accuracy to 0.8209.<\/jats:p><\/jats:sec><jats:sec><jats:title>Conclusions<\/jats:title><jats:p>We demonstrate the successful practice of combining multilevel matching and machine learning ranking for clinical concept normalization. Our results indicate the capability and interpretability of our proposed approach, as well as the limitation, suggesting the opportunities of achieving better performance by combining general clinical NLP systems.<\/jats:p><\/jats:sec>","DOI":"10.1093\/jamia\/ocaa155","type":"journal-article","created":{"date-parts":[[2020,7,22]],"date-time":"2020-07-22T19:22:20Z","timestamp":1595445740000},"page":"1576-1584","source":"Crossref","is-referenced-by-count":19,"title":["Clinical concept normalization with a hybrid natural language processing system combining multilevel matching and machine learning ranking"],"prefix":"10.1093","volume":"27","author":[{"given":"Long","family":"Chen","sequence":"first","affiliation":[{"name":"Med Data Quest, San Diego, California, USA"}]},{"given":"Wenbo","family":"Fu","sequence":"additional","affiliation":[{"name":"Med Data Quest, San Diego, California, USA"}]},{"given":"Yu","family":"Gu","sequence":"additional","affiliation":[{"name":"Med Data Quest, San Diego, California, USA"}]},{"given":"Zhiyong","family":"Sun","sequence":"additional","affiliation":[{"name":"Med Data Quest, San Diego, California, USA"}]},{"given":"Haodan","family":"Li","sequence":"additional","affiliation":[{"name":"Med Data Quest, San Diego, 
California, USA"}]},{"given":"Enyu","family":"Li","sequence":"additional","affiliation":[{"name":"Med Data Quest, San Diego, California, USA"}]},{"given":"Li","family":"Jiang","sequence":"additional","affiliation":[{"name":"Med Data Quest, San Diego, California, USA"}]},{"given":"Yuan","family":"Gao","sequence":"additional","affiliation":[{"name":"Med Data Quest, San Diego, California, USA"}]},{"given":"Yang","family":"Huang","sequence":"additional","affiliation":[{"name":"Med Data Quest, San Diego, California, USA"}]}],"member":"286","published-online":{"date-parts":[[2020,10,8]]},"reference":[{"issue":"5","key":"2020110613121184100_ocaa155-B1","doi-asserted-by":"crossref","first-page":"760","DOI":"10.1016\/j.jbi.2009.08.007","article-title":"What can natural language processing do for clinical decision support?","volume":"42","author":"Demner-Fushman","year":"2009","journal-title":"J Biomed Inform"},{"issue":"1","key":"2020110613121184100_ocaa155-B2","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1146\/annurev-publhealth-032315-021353","article-title":"Using electronic health records for population health research: a review of methods and applications","volume":"37","author":"Casey","year":"2016","journal-title":"Annu Rev Public Health"},{"key":"2020110613121184100_ocaa155-B3","doi-asserted-by":"crossref","first-page":"34","DOI":"10.1016\/j.jbi.2017.11.011","article-title":"Clinical information extraction applications: a literature review","volume":"77","author":"Wang","year":"2018","journal-title":"J Biomed Inform"},{"key":"2020110613121184100_ocaa155-B4","doi-asserted-by":"crossref","first-page":"14","DOI":"10.1016\/j.jbi.2017.07.012","article-title":"Natural language processing systems for capturing and standardizing unstructured clinical information: a systematic review","volume":"73","author":"Kreimeyer","year":"2017","journal-title":"J Biomed Inform"},{"key":"2020110613121184100_ocaa155-B5","author":"Unified Medical Language System 
(UMLS","year":"2019"},{"key":"2020110613121184100_ocaa155-B6","author":"N2C2: National NLP Clinical Challenges","year":"2020"},{"key":"2020110613121184100_ocaa155-B7","author":"Apache UIMA"},{"issue":"5","key":"2020110613121184100_ocaa155-B8","doi-asserted-by":"crossref","first-page":"392","DOI":"10.1197\/jamia.M1552","article-title":"Automated encoding of clinical documents based on natural language processing","volume":"11","author":"Friedman","year":"2004","journal-title":"J Am Med Inform Assoc"},{"issue":"3","key":"2020110613121184100_ocaa155-B9","doi-asserted-by":"crossref","first-page":"229","DOI":"10.1136\/jamia.2009.002733","article-title":"An overview of MetaMap: historical perspective and recent advances","volume":"17","author":"Aronson","year":"2010","journal-title":"J Am Med Inform Assoc"},{"issue":"5","key":"2020110613121184100_ocaa155-B10","doi-asserted-by":"crossref","first-page":"507","DOI":"10.1136\/jamia.2009.001560","article-title":"Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications","volume":"17","author":"Savova","year":"2010","journal-title":"J Am Med Inform Assoc"},{"issue":"3","key":"2020110613121184100_ocaa155-B11","doi-asserted-by":"crossref","first-page":"331","DOI":"10.1093\/jamia\/ocx132","article-title":"CLAMP\u2014a toolkit for efficiently building customized clinical natural language processing pipelines","volume":"25","author":"Soysal","year":"2018","journal-title":"J Am Med Inform Assoc"},{"key":"2020110613121184100_ocaa155-B12","first-page":"732","article-title":"A hybrid normalization method for medical concepts in clinical narrative using semantic matching","volume":"2019","author":"Luo","year":"2019","journal-title":"AMIA Jt Summits Transl Sci Proc"},{"issue":"2","key":"2020110613121184100_ocaa155-B13","doi-asserted-by":"crossref","first-page":"380","DOI":"10.1093\/jamia\/ocv108","article-title":"Normalizing clinical terms using learned edit distance 
patterns","volume":"23","author":"Kate","year":"2016","journal-title":"J Am Med Inform Assoc"},{"key":"2020110613121184100_ocaa155-B14","first-page":"212","volume-title":"International Conference of the Cross-Language Evaluation Forum for European Languages","author":"Suominen","year":"2013"},{"key":"2020110613121184100_ocaa155-B15","first-page":"54","author":"Pradhan","year":"2015"},{"key":"2020110613121184100_ocaa155-B16","first-page":"303","author":"Elhadad","year":"2015"},{"issue":"22","key":"2020110613121184100_ocaa155-B17","doi-asserted-by":"crossref","first-page":"2909","DOI":"10.1093\/bioinformatics\/btt474","article-title":"DNorm: Disease name normalization with pairwise learning to rank","volume":"29","author":"Leaman","year":"2013","journal-title":"Bioinformatics"},{"key":"2020110613121184100_ocaa155-B18","first-page":"802","author":"Zhang","year":"2015"},{"key":"2020110613121184100_ocaa155-B19","first-page":"828","author":"Ghiasvand","year":"2015"},{"key":"2020110613121184100_ocaa155-B20","first-page":"297","author":"Souza","year":"2015"},{"issue":"S11","key":"2020110613121184100_ocaa155-B21","doi-asserted-by":"crossref","first-page":"79","DOI":"10.1186\/s12859-017-1805-7","article-title":"CNN-based ranking for biomedical entity normalization","volume":"18","author":"Li","year":"2017","journal-title":"BMC Bioinformatics"},{"key":"2020110613121184100_ocaa155-B22","author":"Ji"},{"key":"2020110613121184100_ocaa155-B23","first-page":"827","author":"Chiticariu","year":"2013"},{"key":"2020110613121184100_ocaa155-B24","doi-asserted-by":"crossref","first-page":"103132","DOI":"10.1016\/j.jbi.2019.103132","article-title":"MCN: a comprehensive corpus for medical concept normalization","volume":"92","author":"Luo","year":"2019","journal-title":"J Biomed 
Inform"},{"key":"2020110613121184100_ocaa155-B25","first-page":"640","author":"Spackman","year":"1997"},{"key":"2020110613121184100_ocaa155-B26","doi-asserted-by":"crossref","first-page":"17","DOI":"10.1109\/MITP.2005.122","article-title":"RxNorm: Prescription for electronic drug information exchange","volume":"7","author":"Liu","year":"2005","journal-title":"IT Prof"},{"key":"2020110613121184100_ocaa155-B27","author":"Apache Lucene"},{"key":"2020110613121184100_ocaa155-B28","author":"Natural Language Toolkit\u2014NLTK"},{"key":"2020110613121184100_ocaa155-B29","author":"List of medical abbreviations\u2014Wikipedia"},{"issue":"8","key":"2020110613121184100_ocaa155-B30","doi-asserted-by":"crossref","first-page":"1138","DOI":"10.1109\/TKDE.2006.130","article-title":"Sentence similarity based on semantic nets and corpus statistics","volume":"18","author":"Li","year":"2006","journal-title":"IEEE Trans Knowl Data Eng"},{"key":"2020110613121184100_ocaa155-B31","author":"Devlin","year":"2019"},{"issue":"4","key":"2020110613121184100_ocaa155-B32","doi-asserted-by":"crossref","first-page":"1234","DOI":"10.1093\/bioinformatics\/btz682","article-title":"BioBERT: a pretrained biomedical language representation model for biomedical text mining","volume":"36","author":"Lee","year":"2020","journal-title":"Bioinformatics"},{"issue":"1","key":"2020110613121184100_ocaa155-B33","doi-asserted-by":"crossref","first-page":"160035","DOI":"10.1038\/sdata.2016.35","article-title":"MIMIC-III, a freely accessible critical care database","volume":"3","author":"Johnson","year":"2016","journal-title":"Sci 
Data"},{"key":"2020110613121184100_ocaa155-B34","first-page":"815","author":"Schroff","year":"2015"},{"key":"2020110613121184100_ocaa155-B35","author":"Zhou","year":"2016"},{"key":"2020110613121184100_ocaa155-B36","first-page":"37","author":"Chen","year":"2019"},{"issue":"1","key":"2020110613121184100_ocaa155-B37","doi-asserted-by":"crossref","first-page":"56","DOI":"10.1093\/jamia\/ocz141","article-title":"Extracting medications and associated adverse drug events using a natural language processing system combining knowledge base and deep learning","volume":"27","author":"Chen","year":"2020","journal-title":"J Am Med Inform Assoc"},{"key":"2020110613121184100_ocaa155-B38","first-page":"24","article-title":"Truth about computer-assisted coding: a consultant, him professional, and vendor weigh in on the real CAC impact","volume":"84","author":"Crawford","year":"2013","journal-title":"J AHIMA"},{"issue":"22","key":"2020110613121184100_ocaa155-B39","doi-asserted-by":"crossref","first-page":"2889","DOI":"10.1093\/bioinformatics\/btq555","article-title":"Graph-based word sense disambiguation of biomedical documents","volume":"26","author":"Agirre","year":"2010","journal-title":"Bioinformatics"},{"key":"2020110613121184100_ocaa155-B40","first-page":"1","volume-title":"Processing","author":"Melamud","year":"2015"},{"issue":"11","key":"2020110613121184100_ocaa155-B41","doi-asserted-by":"crossref","first-page":"1218","DOI":"10.1093\/jamia\/ocz109","article-title":"Clinical trial cohort selection based on multilevel rule-based natural language processing system","volume":"26","author":"Chen","year":"2019","journal-title":"J Am Med Inform Assoc"}],"container-title":["Journal of the American Medical Informatics 
Association"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/jamia\/article-pdf\/27\/10\/1576\/34153790\/ocaa155.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"http:\/\/academic.oup.com\/jamia\/article-pdf\/27\/10\/1576\/34153790\/ocaa155.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,10,4]],"date-time":"2023-10-04T17:44:40Z","timestamp":1696441480000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/jamia\/article\/27\/10\/1576\/5919212"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,10,1]]},"references-count":41,"journal-issue":{"issue":"10","published-online":{"date-parts":[[2020,10,8]]},"published-print":{"date-parts":[[2020,10,1]]}},"URL":"https:\/\/doi.org\/10.1093\/jamia\/ocaa155","relation":{},"ISSN":["1067-5027","1527-974X"],"issn-type":[{"value":"1067-5027","type":"print"},{"value":"1527-974X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2020,10]]},"published":{"date-parts":[[2020,10,1]]}}}