{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,12]],"date-time":"2026-05-12T19:36:03Z","timestamp":1778614563605,"version":"3.51.4"},"reference-count":36,"publisher":"Oxford University Press (OUP)","issue":"1","license":[{"start":{"date-parts":[[2023,9,28]],"date-time":"2023-09-28T00:00:00Z","timestamp":1695859200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"name":"National Institute of Drug Abuse Clinical Trials Network, Tuolc Inc, Roche Inc"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023,12,22]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Objective<\/jats:title><jats:p>While there are currently approaches to handle unstructured clinical data, such as manual abstraction and structured proxy variables, these methods may be time-consuming, not scalable, and imprecise. This article aims to determine whether selective prediction, which gives a model the option to abstain from generating a prediction, can improve the accuracy and efficiency of unstructured clinical data abstraction.<\/jats:p><\/jats:sec><jats:sec><jats:title>Materials and Methods<\/jats:title><jats:p>We trained selective classifiers (logistic regression, random forest, support vector machine) to extract 5 variables from clinical notes: depression (n\u2009=\u20091563), glioblastoma (GBM, n\u2009=\u2009659), rectal adenocarcinoma (DRA, n\u2009=\u2009601), and abdominoperineal resection (APR, n\u2009=\u2009601) and low anterior resection (LAR, n\u2009=\u2009601) of adenocarcinoma. We varied the cost of false positives (FP), false negatives (FN), and abstained notes and measured total misclassification cost.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>The depression selective classifiers abstained on anywhere from 0% to 97% of notes, and the change in total misclassification cost ranged from \u221258% to 9%. Selective classifiers abstained on 5%\u201343% of notes across the GBM and colorectal cancer models. The GBM selective classifier abstained on 43% of notes, which led to improvements in sensitivity (0.94 to 0.96), specificity (0.79 to 0.96), PPV (0.89 to 0.98), and NPV (0.88 to 0.91) when compared to a non-selective classifier and when compared to structured proxy variables.<\/jats:p><\/jats:sec><jats:sec><jats:title>Discussion<\/jats:title><jats:p>We showed that selective classifiers outperformed both non-selective classifiers and structured proxy variables for extracting data from unstructured clinical notes.<\/jats:p><\/jats:sec><jats:sec><jats:title>Conclusion<\/jats:title><jats:p>Selective prediction should be considered when abstaining is preferable to making an incorrect prediction.<\/jats:p><\/jats:sec>","DOI":"10.1093\/jamia\/ocad182","type":"journal-article","created":{"date-parts":[[2023,9,28]],"date-time":"2023-09-28T22:32:12Z","timestamp":1695940332000},"page":"188-197","source":"Crossref","is-referenced-by-count":14,"title":["Selective prediction for extracting unstructured clinical data"],"prefix":"10.1093","volume":"31","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3426-9289","authenticated-orcid":false,"given":"Akshay","family":"Swaminathan","sequence":"first","affiliation":[{"name":"Stanford University School of Medicine , Stanford, CA, United States"},{"name":"Cerebral Inc. Claymont, DE, United States"}]},{"given":"Ivan","family":"Lopez","sequence":"additional","affiliation":[{"name":"Stanford University School of Medicine , Stanford, CA, United States"},{"name":"Cerebral Inc. Claymont, DE, United States"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0878-1257","authenticated-orcid":false,"given":"William","family":"Wang","sequence":"additional","affiliation":[{"name":"Department of Biology, Stanford University , Stanford, CA, United States"},{"name":"Department of Bioengineering, Stanford University , Stanford, CA, United States"}]},{"given":"Ujwal","family":"Srivastava","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Stanford University , Stanford, CA, United States"}]},{"given":"Edward","family":"Tran","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Stanford University , Stanford, CA, United States"},{"name":"Department of Management Science and Engineering, Stanford University , Stanford, CA, United States"}]},{"given":"Aarohi","family":"Bhargava-Shah","sequence":"additional","affiliation":[{"name":"Stanford University School of Medicine , Stanford, CA, United States"}]},{"given":"Janet Y","family":"Wu","sequence":"additional","affiliation":[{"name":"Stanford University School of Medicine , Stanford, CA, United States"}]},{"given":"Alexander L","family":"Ren","sequence":"additional","affiliation":[{"name":"Stanford University School of Medicine , Stanford, CA, United States"}]},{"given":"Kaitlin","family":"Caoili","sequence":"additional","affiliation":[{"name":"Stanford University School of Medicine , Stanford, CA, United States"}]},{"given":"Brandon","family":"Bui","sequence":"additional","affiliation":[{"name":"Department of Human Biology, Stanford University , Stanford, CA, United States"}]},{"given":"Layth","family":"Alkhani","sequence":"additional","affiliation":[{"name":"Department of Bioengineering, Stanford University , Stanford, CA, United States"},{"name":"Department of Chemistry, Stanford University , Stanford, CA, United States"}]},{"given":"Susan","family":"Lee","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Stanford University , Stanford, CA, United States"}]},{"given":"Nathan","family":"Mohit","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Stanford University , Stanford, CA, United States"},{"name":"Department of Human Biology, Stanford University , Stanford, CA, United States"}]},{"given":"Noel","family":"Seo","sequence":"additional","affiliation":[{"name":"Department of Sociology, Stanford University , Stanford, CA, United States"}]},{"given":"Nicholas","family":"Macedo","sequence":"additional","affiliation":[{"name":"Department of Biology, Stanford University , Stanford, CA, United States"},{"name":"Department of Radiology, Stanford University School of Medicine , Stanford, CA, United States"}]},{"given":"Winson","family":"Cheng","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Stanford University , Stanford, CA, United States"},{"name":"Department of Chemistry, Stanford University , Stanford, CA, United States"}]},{"given":"Charles","family":"Liu","sequence":"additional","affiliation":[{"name":"Department of Surgery, Stanford University School of Medicine , Stanford, CA, United States"}]},{"given":"Reena","family":"Thomas","sequence":"additional","affiliation":[{"name":"Department of Neurology and Neurological Sciences, Stanford Health Care , Stanford, CA, United States"}]},{"given":"Jonathan H","family":"Chen","sequence":"additional","affiliation":[{"name":"Stanford Center for Biomedical Informatics Research , Stanford, CA, United States"},{"name":"Division of Hospital Medicine , Stanford, CA, United States"},{"name":"Clinical Excellence Research Center , Stanford, CA, United States"},{"name":"Department of Medicine , Stanford, CA, United States"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9965-5466","authenticated-orcid":false,"given":"Olivier","family":"Gevaert","sequence":"additional","affiliation":[{"name":"Stanford Center for Biomedical Informatics Research , Stanford, CA, United States"},{"name":"Department of Medicine , Stanford, CA, United States"}]}],"member":"286","published-online":{"date-parts":[[2023,9,28]]},"reference":[{"key":"2023122220301599500_ocad182-B1","author":"Improved Diagnostics & Patient Outcomes | HealthIT.gov"},{"issue":"7775","key":"2023122220301599500_ocad182-B2","doi-asserted-by":"crossref","first-page":"S114","DOI":"10.1038\/d41586-019-02876-y","article-title":"The future of electronic health records","volume":"573","author":"Hecht","year":"2019","journal-title":"Nature"},{"issue":"10","key":"2023122220301599500_ocad182-B3","doi-asserted-by":"crossref","first-page":"e65","DOI":"10.1097\/MLR.0000000000000108","article-title":"Overcoming the challenges of unstructured data in multi-site, electronic medical record-based abstraction","volume":"54","author":"Polnaszek","year":"2016","journal-title":"Med Care"},{"issue":"1","key":"2023122220301599500_ocad182-B4","doi-asserted-by":"crossref","first-page":"1","DOI":"10.4258\/hir.2019.25.1.1","article-title":"Managing unstructured big data in healthcare system","volume":"25","author":"Kong","year":"2019","journal-title":"Healthc Inform Res"},{"key":"2023122220301599500_ocad182-B5","author":"Yang","year":"2022"},{"issue":"Spring","key":"2023122220301599500_ocad182-B6","first-page":"1g","article-title":"Electronic health record (EHR) abstraction","volume":"18","author":"Alzu'bi","year":"2021","journal-title":"Perspect Health Inf Manag"},{"key":"2023122220301599500_ocad182-B7","first-page":"33","author":"Kaur"},{"issue":"10","key":"2023122220301599500_ocad182-B8","doi-asserted-by":"crossref","first-page":"1593","DOI":"10.1093\/jamia\/ocaa180","article-title":"Representation of EHR data for predictive modeling: a comparison between UMLS and other terminologies","volume":"27","author":"Rasmy","year":"2020","journal-title":"J Am Med Inform Assoc"},{"issue":"5 Pt 2","key":"2023122220301599500_ocad182-B9","doi-asserted-by":"crossref","first-page":"1620","DOI":"10.1111\/j.1475-6773.2005.00444.x","article-title":"Measuring diagnoses: ICD code accuracy","volume":"40","author":"O'Malley","year":"2005","journal-title":"Health Serv Res"},{"key":"2023122220301599500_ocad182-B10","first-page":"10","author":"King"},{"key":"2023122220301599500_ocad182-B11","author":"Bommasani","year":"2022"},{"issue":"3","key":"2023122220301599500_ocad182-B12","doi-asserted-by":"crossref","first-page":"457","DOI":"10.1093\/jamia\/ocz200","article-title":"Deep learning in clinical natural language processing: a methodical review","volume":"27","author":"Wu","year":"2020","journal-title":"J Am Med Inform Assoc"},{"key":"2023122220301599500_ocad182-B13","author":"Lin","year":"2021"},{"key":"2023122220301599500_ocad182-B14","author":"Pruthi","year":"2019"},{"key":"2023122220301599500_ocad182-B15","doi-asserted-by":"crossref","first-page":"102579","DOI":"10.1109\/ACCESS.2021.3095412","article-title":"Deep learning approach for negation handling in sentiment analysis","volume":"9","author":"Singh","year":"2021","journal-title":"IEEE Access"},{"key":"2023122220301599500_ocad182-B16","author":"Birnbaum","year":"2020"},{"key":"2023122220301599500_ocad182-B17","first-page":"1","article-title":"Secondary use of EHR: data quality issues and informatics opportunities","volume":"2010","author":"Botsis","year":"2010","journal-title":"Summit Translat Bioinforma"},{"issue":"1","key":"2023122220301599500_ocad182-B18","doi-asserted-by":"crossref","first-page":"294","DOI":"10.1186\/s12911-021-01655-y","article-title":"Machine learning with asymmetric abstention for biomedical decision-making","volume":"21","author":"Gandouz","year":"2021","journal-title":"BMC Med Inform Decis Making"},{"key":"2023122220301599500_ocad182-B19","doi-asserted-by":"crossref","first-page":"325","DOI":"10.34768\/amcs-2020-0025","article-title":"Bounded-abstaining classification for breast tumors in imbalanced ultrasound images","volume":"30","author":"Guan","year":"2020","journal-title":"Int J Appl Math Comput Sci"},{"key":"2023122220301599500_ocad182-B20","first-page":"1040","author":"Xin","year":"2021"},{"key":"2023122220301599500_ocad182-B21","author":"Hendrickx","year":"2021"},{"key":"2023122220301599500_ocad182-B22","author":"Moseley"},{"issue":"2","key":"2023122220301599500_ocad182-B23","doi-asserted-by":"crossref","first-page":"e0192360","DOI":"10.1371\/journal.pone.0192360","article-title":"Comparing deep learning and concept extraction based methods for patient phenotyping from clinical narratives","volume":"13","author":"Gehrmann","year":"2018","journal-title":"PLoS One"},{"key":"2023122220301599500_ocad182-B24","author":"Johnson","year":"2015"},{"key":"2023122220301599500_ocad182-B25","author":"MIMIC-III, a freely accessible critical care database | Scientific Data","year":"2023"},{"issue":"3","key":"2023122220301599500_ocad182-B26","doi-asserted-by":"crossref","first-page":"269","DOI":"10.14778\/3157794.3157797","article-title":"Snorkel: rapid training data creation with weak supervision","volume":"11","author":"Ratner","year":"2017","journal-title":"Proc VLDB Endowment"},{"key":"2023122220301599500_ocad182-B27","doi-asserted-by":"crossref","first-page":"51","DOI":"10.1186\/1472-6947-10-51","article-title":"A regret theory approach to decision curve analysis: a novel method for eliciting decision makers\u2019 preferences and decision-making","volume":"10","author":"Tsalatsanis","year":"2010","journal-title":"BMC Med Inform Decis Mak"},{"key":"2023122220301599500_ocad182-B28","author":"2021\/2022 ICD-10-CM Index &gt; \u201cGlioblastoma","year":"2022"},{"key":"2023122220301599500_ocad182-B29","author":"2022 ICD-10-CM Codes C72","year":"2022"},{"key":"2023122220301599500_ocad182-B30","author":"2022 ICD-10-CM Codes C71","year":"2022"},{"key":"2023122220301599500_ocad182-B31","author":"Medical Billing Codes Search\u2014CPT, ICD 9, ICD 10 HCPCS Codes & Articles, Guidelines | Codify by AAPC","year":"2022"},{"issue":"1","key":"2023122220301599500_ocad182-B32","doi-asserted-by":"crossref","first-page":"4","DOI":"10.1038\/s41746-020-00367-3","article-title":"Second opinion needed: communicating uncertainty in medical machine learning","volume":"4","author":"Kompa","year":"2021","journal-title":"NPJ Digit Med"},{"issue":"1","key":"2023122220301599500_ocad182-B33","doi-asserted-by":"crossref","first-page":"13","DOI":"10.1155\/2009\/203790","article-title":"Linear classifier with reject option for the detection of vocal fold paralysis and vocal fold edema","volume":"2009","author":"Kotropoulos","year":"2009","journal-title":"EURASIP J Adv Signal Process"},{"key":"2023122220301599500_ocad182-B34","doi-asserted-by":"crossref","first-page":"i6","DOI":"10.1136\/bmj.i6","article-title":"Net benefit approaches to the evaluation of prediction models, molecular markers, and diagnostic tests","volume":"352","author":"Vickers","year":"2016","journal-title":"BMJ"},{"key":"2023122220301599500_ocad182-B35","first-page":"17","author":"Arnold"},{"issue":"1","key":"2023122220301599500_ocad182-B36","doi-asserted-by":"crossref","first-page":"164","DOI":"10.1007\/s10618-016-0460-3","article-title":"Evidence-based uncertainty sampling for active learning","volume":"31","author":"Sharma","year":"2017","journal-title":"Data Min Knowl Disc"}],"container-title":["Journal of the American Medical Informatics Association"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/jamia\/article-pdf\/31\/1\/188\/54762074\/ocad182.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/jamia\/article-pdf\/31\/1\/188\/54762074\/ocad182.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,10,26]],"date-time":"2024-10-26T20:57:35Z","timestamp":1729976255000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/jamia\/article\/31\/1\/188\/7285661"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,9,28]]},"references-count":36,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2023,9,28]]},"published-print":{"date-parts":[[2023,12,22]]}},"URL":"https:\/\/doi.org\/10.1093\/jamia\/ocad182","relation":{},"ISSN":["1067-5027","1527-974X"],"issn-type":[{"value":"1067-5027","type":"print"},{"value":"1527-974X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024,1,1]]},"published":{"date-parts":[[2023,9,28]]}}}